[Techtalk] Maintaining a production system

Julie jockgrrl at austin.rr.com
Wed Oct 10 21:36:05 EST 2001

Subba Rao wrote:
> I have a linux system that is rolled out into production. It is mounted
> onto the rack and humming away. Now I would like to upgrade the kernel.
> How do I go about upgrading the kernel on a system that is in operation?
> Is it kind of the weekend NT/VMS/MVS maintainence window thing? ;-)

Someone else made a few good comments about telling users you're
going to bring the machine down for a short bit.  These are my
experiences with maintenance windows and production environments.

First, condition your users to expect maintenance periods and to know
when they are, how long they will last, and so on.  How often you
schedule them matters less than having a fixed schedule, for example
"the system will be down from 10pm until midnight on the first
Saturday of each month".  Use those
periods for things like cleaning fan filters, checking UPSes,
adding memory, disk, etc.  Collect maintenance chores until then,
unless you can do them while the machine is live (like cleaning
tape drive heads ...).  Make sure you know the maintenance
intervals for things like tape drive cleaning, fan filters, etc.
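
As a rough sketch of the "first Saturday of each month" example above,
something like the following Python could work out the next window to
announce to users.  It assumes a 10pm to midnight window in local time,
and the function names are made up for illustration.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple

SATURDAY = 5  # date.weekday() counts Monday as 0, so Saturday is 5


def first_saturday(year: int, month: int) -> date:
    """Return the date of the first Saturday in the given month."""
    first = date(year, month, 1)
    offset = (SATURDAY - first.weekday()) % 7
    return first + timedelta(days=offset)


def next_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the next window that has not ended yet.

    Assumed schedule: 22:00 to midnight on the first Saturday of
    each month, in the same (naive, local) time as `now`.
    """
    year, month = now.year, now.month
    while True:
        start = datetime.combine(first_saturday(year, month), time(22, 0))
        end = start + timedelta(hours=2)
        if now < end:
            return start, end
        # This month's window is already over; try the following month.
        month += 1
        if month > 12:
            year, month = year + 1, 1


if __name__ == "__main__":
    start, end = next_window(datetime.now())
    print("Next maintenance window:", start, "to", end)
```

A window that is already in progress is still returned, so a reminder
sent during the outage reports the current window rather than next
month's.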

Second, if you can afford a test system, get one.  Your test
system can be just about anything, so long as it is enough like
what you have to be meaningful.  Boot your new kernel on that machine
and make sure the production software still runs.  Use the
test system to test backups.  Try out upgrades on that system
first.  If something awful happens to your production system you
will have a lot of spare parts lying around in that test machine.
If you have a large production environment, keep spare parts on
hand, or have a contract to supply spare parts on very short
notice.  Some vendors will go as short as 4 hours.

Third, unless you absolutely must change something, don't.  I've
found over years of doing software support that users will learn
to live with minor problems, but that most =hate= to have a
stream of new minor problems in exchange for existing, but well
understood, minor problems.  Think "trailing edge technology",
not "leading edge technology".

Julianne Frances Haugh             Life is either a daring adventure
jockgrrl at austin.rr.com                 or nothing at all.
					    -- Helen Keller
