[Techtalk] help!

Jeff Dike jdike at karaya.com
Fri Jan 25 22:15:47 EST 2002


katie at katie-and-rob.org said:
> What do I do from here?

The first thing to do is to find the bad memory and remove it.  Or maybe
the overheating CPU.  Those would be my first two guesses.

Another possibility is a bad disk or bad sectors on it - those should
fill the kernel log with device, block layer, and filesystem errors.

I had something very similar happen a few weeks ago.  I put a new stick of
memory in my laptop, and happily ran with it for a couple of weeks, then
one morning I got massive file corruption.  The memory was the last thing I
thought of.  I exonerated the disk and filesystem first.  By then, the bad
memory had polluted a whole pile of data by corrupting the page cache and
writing it back out to disk.

I have Red Hat installed on it, so after I pulled the bad memory, I spent
an afternoon running rpm -V and rpm -i to figure out which RPMs had
corrupted files and to replace them.
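
For anyone wanting to script that triage, here is a minimal sketch (not
necessarily what was done above). It assumes the usual `rpm -V` output
layout: a column of 8 or 9 flag characters, an optional one-letter file
type marker, then the path, with "missing" in place of the flags for
absent files. The function name corrupted_files is made up for this
example, and the flag layout should be checked against your rpm version.

```python
import re

# Assumed `rpm -V` line layout: 8 or 9 flag characters (test letters,
# '.' for "passed", '?' for "could not test"), an optional one-letter
# file type marker (c, d, g, l, r), then the absolute path.
FLAGS_RE = re.compile(
    r"^(?P<flags>[A-Za-z0-9.?]{8,9})\s+(?:[cdglr]\s+)?(?P<path>/.*)$"
)
MISSING_RE = re.compile(r"^missing\s+(?:[cdglr]\s+)?(?P<path>/.*)$")


def corrupted_files(verify_output):
    """Return paths that are missing or whose content digest differs.

    A '5' in the flags column marks a digest mismatch, which is what
    silent data corruption looks like. Size or mtime changes alone
    are ignored here.
    """
    bad = []
    for line in verify_output.splitlines():
        line = line.rstrip()
        missing = MISSING_RE.match(line)
        if missing:
            bad.append(missing.group("path"))
            continue
        flagged = FLAGS_RE.match(line)
        if flagged and "5" in flagged.group("flags"):
            bad.append(flagged.group("path"))
    return bad


if __name__ == "__main__":
    # Hypothetical sample output, not taken from the machine described.
    sample = (
        "S.5....T.  c /etc/inittab\n"
        ".......T.    /usr/bin/foo\n"
        "missing     /usr/lib/libbar.so\n"
    )
    for path in corrupted_files(sample):
        print(path)
```

Configuration files edited by hand will also show a digest mismatch, so
the list still needs a human look. `rpm -qf <path>` reports which package
owns a given file, which tells you what to reinstall.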

There were some unexpected after-effects.  The next morning, it didn't
boot at all (the kernel didn't even load).  It turned out that the kernel
RPM had failed verification (probably because I had replaced the 2.4.2
kernel it shipped with by 2.4.17), and reinstalling it removed the new
kernel.

I had also replaced the kernel headers with the 2.4.17 headers.  These
also got "fixed", giving me back the wrong TUN/TAP ioctl numbers that I
had wanted to be rid of.

				Jeff



