[Techtalk] mysterious system halts - how to prevent/fix/detect?

Julie Meloni julie at i2ii.com
Wed Jun 5 05:41:38 EST 2002


Hi -

Any suggestions, swift kicks upside the head, etc., would be greatly
appreciated. I have a SuSE-based web/db/mail server that has been up
and running brilliantly for 10 months. In that time I rebooted it
once, for no reason I can now recall.

It's a basic P-III 550 MHz machine with 256 MB RAM and a 30 GB drive.
The load average has never stayed above about 0.75 for more than a few
minutes, except while I was compiling something. As for traffic, it
serves only 3 virtual web hosts and about 87,000 hits per day, and
there are about 25 normal-volume mail accounts in use. In other words,
this is not a particularly busy machine.

However, in the last 3 days it has halted itself for no apparent
reason:

* Sunday night at 8pm; I rebooted it at 4am the next morning.
* Monday night at 7pm; again I rebooted it at 4am the next morning.
* Yesterday at 4pm; I rebooted it at 8pm.
* Just five hours later it halted again; I rebooted it, and it is
still running.

Here is the current output of free (values in KB):
             total       used       free     shared    buffers     cached
Mem:        261732     258348       3384          0      91568     132760
-/+ buffers/cache:      34020     227712
Swap:       238936          0     238936

I don't watch it every minute (though I plan to now), but that is
typical memory usage for this box.
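For what it's worth, the "-/+ buffers/cache" row suggests memory is not the problem: buffers and page cache are reclaimable, so programs themselves hold only about 34 MB, and swap is untouched. A minimal sketch of that arithmetic, assuming the column layout of the free output above (app_memory is a hypothetical helper name):

```python
# Recompute free's "-/+ buffers/cache" row from its "Mem:" row.
# Buffers and cache can be reclaimed by the kernel, so memory held by
# programs is used - buffers - cached, and what is effectively available
# is free + buffers + cached (all values in KB).


def app_memory(mem_row: str) -> tuple:
    """Return (used_by_programs_kb, available_kb) from a free 'Mem:' line."""
    fields = mem_row.split()
    _total, used, free, _shared, buffers, cached = map(int, fields[1:7])
    used_by_programs = used - buffers - cached
    available = free + buffers + cached
    return used_by_programs, available


row = "Mem:        261732     258348       3384          0      91568     132760"
print(app_memory(row))  # (34020, 227712), matching the -/+ buffers/cache row
```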

Here are the process and child settings from my Apache config (I use
the same setup on another machine, which has been fine):

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 5
MaxSpareServers 10
StartServers 10 
MaxClients 150
MaxRequestsPerChild  400
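One way to rule memory exhaustion in or out is to estimate what MaxClients 150 would cost if every child were running at once. A rough sketch, where PER_CHILD_KB is a purely hypothetical per-child size, not a measurement from this server; summing resident sizes also overstates the real total because children share pages:

```python
# Rough worst-case memory estimate for the MaxClients setting above.
# PER_CHILD_KB is a HYPOTHETICAL value; measure actual httpd child sizes
# on the server before drawing any conclusion from the result.

MAX_CLIENTS = 150      # MaxClients from the Apache config
RAM_KB = 261732        # total Mem reported by free
SWAP_KB = 238936       # total Swap reported by free
PER_CHILD_KB = 3000    # hypothetical resident size of one httpd child, in KB


def worst_case_kb(max_clients: int, per_child_kb: int) -> int:
    """Memory needed if every permitted child is resident at once."""
    return max_clients * per_child_kb


def classify(need_kb: int, ram_kb: int, swap_kb: int) -> str:
    """Say whether the worst case fits in RAM, needs swap, or exceeds both."""
    if need_kb <= ram_kb:
        return "fits in RAM"
    if need_kb <= ram_kb + swap_kb:
        return "needs swap"
    return "exceeds RAM + swap"


need = worst_case_kb(MAX_CLIENTS, PER_CHILD_KB)
print(need, classify(need, RAM_KB, SWAP_KB))  # 450000 needs swap
```

With the observed zero swap usage, the server does not appear to be reaching that worst case in practice.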

Like all of our machines, this one has been withstanding the usual
run-of-the-mill port scans, SYN-flood attempts, and other script-kiddie
whatnots, and has never had any issues.

When fsck runs at startup, it finds 6.5% non-contiguous blocks but no
errors to fix.

I see nothing in any of my logs that says "I'm tired, going to halt
now," but I could be looking in all the wrong places.
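One place to look is the last few syslog entries written before each boot. A minimal sketch, assuming the kernel's "Linux version" boot banner gets logged to /var/log/messages on every boot (common with syslogd/klogd, but check your setup); the marker, path, and function name are assumptions. A few syslogd/klogd startup lines may appear just before the banner, so the context window is kept a little wide:

```python
# Show the last few log lines written before each line containing a boot
# marker. Usage: python lastlines.py /var/log/messages
import sys
from collections import deque
from typing import Iterable, List, Tuple


def lines_before_marker(lines: Iterable[str], marker: str = "Linux version",
                        context: int = 5) -> List[Tuple[List[str], str]]:
    """Pair each line containing `marker` with up to `context` lines before it."""
    recent: deque = deque(maxlen=context)  # sliding window of earlier lines
    results: List[Tuple[List[str], str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            results.append((list(recent), line))
            recent.clear()
        else:
            recent.append(line)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for before, boot in lines_before_marker(log):
            print("---- last entries before:", boot)
            for entry in before:
                print("   ", entry)
```

An empty or abruptly cut-off window before a boot would itself hint that the machine died without logging anything, which tends to point at hardware rather than software.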

So, the million-dollar question (I hope this wasn't too much info, and
thanks for reading this far) is...

What does this sound like? The drive? Memory exhaustion? Something
else?

Ideas, things to read, fixes to try, all greatly appreciated.

Thanks,
Julie



