[Techtalk] Is Linux 2.4.18 Really That Unstable?

Maria Blackmore mariab at cats.meow.at
Thu Oct 17 09:55:11 EST 2002


On 16 Oct 2002, mc wrote:

> I have had similar problems with lockups and errors on my RH 7.3
> 2.4.18-3 install.  I have tested the memory a couple of times and ran
> some IBM diagnostics on the Hard drive.  I am pretty sure it is hardware
> related, but haven't been able to pin it down.

This sounds to me like either a flaky CPU or flaky memory.  I would
especially suspect the memory ...

> FWIW, either the box locks up tighter than a drum, no keyboard action,
> can't even log in remotely, or I will get random seg faults when trying
> to open certain programs, and only a reboot will get them opening again
> without errors.

Random segfaults ... that sounds like memory.

What did you use to test the memory?

I would recommend a quick run of memtest86.  It tests the memory properly,
walking through combinations and checking for effects on the rest of the
memory, and can detect things as annoying as a random short between two
data/address lines, a grounded line somewhere, a single dying memory cell,
or possible crosstalk problems.
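
To give a feel for the kind of check involved, here is a toy sketch of a
"walking ones" test against a simulated memory.  It is not how memtest86
itself is written: the FaultyMemory class and its wired-OR model of a short
between two data lines are made up purely for illustration, and a real
tester works on the bare hardware with many more patterns.

```python
# Toy model only: FaultyMemory and its "shorted data lines" behaviour are
# hypothetical, invented to illustrate the idea of a walking-ones test.

class FaultyMemory:
    """Simulated 8-bit-wide memory, optionally with two data lines shorted."""

    def __init__(self, size, shorted=None):
        self.cells = [0] * size
        self.shorted = shorted  # e.g. (2, 5): data bits 2 and 5 are shorted

    def write(self, addr, value):
        if self.shorted:
            a, b = self.shorted
            # Model the short as a wired-OR: if either bit is set, both are.
            if value & (1 << a) or value & (1 << b):
                value |= (1 << a) | (1 << b)
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        return self.cells[addr]


def walking_ones(mem, size, width=8):
    """Write a single 1 bit in each position and collect any mismatches."""
    errors = []
    for addr in range(size):
        for bit in range(width):
            pattern = 1 << bit
            mem.write(addr, pattern)
            got = mem.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


good = walking_ones(FaultyMemory(4), 4)
bad = walking_ones(FaultyMemory(4, shorted=(2, 5)), 4)
print(len(good), "errors on healthy memory")
print(len(bad), "errors with data lines 2 and 5 shorted")
```

In this model the healthy memory passes, while the shorted one fails on
exactly the two patterns that touch the affected lines, at every address.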

Once installed, it's started through lilo or from a boot floppy, since it
runs straight on the bare metal after a reboot.

Something else which is notoriously good for stressing hardware is a
kernel compile, or possibly X or Gnome or KDE nowadays :)

The kernel is kind of the benchmark, I guess.  Compiling something that
big will stress the CPU to its limits, stress the drives, and use every
conceivable bit of memory (unless you have some obscene amount of memory
installed).  Because of this it will also show up problems with the power
supply, since more work means more power draw.  The kernel compile is so
well tested that, if you have a supported compiler setup, any segfault is
likely to be down to hardware, especially if it occurs a lot but at
different points in the compile.
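
The "fails at different points" check can be automated roughly like this.
It is only a sketch: run_repeatedly and looks_like_hardware are names I
made up, and using the last line of output as a stand-in for "where it
failed" is a crude assumption (a parallel build will scramble it).  The
default command is a harmless placeholder; substitute your real build
command on the command line.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=3):
    """Run cmd several times; return (exit code, last output line) per run."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
        lines = proc.stdout.strip().splitlines()
        results.append((proc.returncode, lines[-1] if lines else ""))
    return results


def looks_like_hardware(results):
    """True if failed runs stopped at more than one distinct point.

    Heuristic only: a genuine source or compiler bug tends to fail at the
    same place every time, so differing failure points hint at hardware.
    """
    failure_points = {last for code, last in results if code != 0}
    return len(failure_points) > 1


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; pass the real build
    # command as arguments instead.
    cmd = sys.argv[1:] or [sys.executable, "-c", "print('stand-in build')"]
    results = run_repeatedly(cmd)
    for code, last in results:
        print(code, last)
    print("suspect hardware:", looks_like_hardware(results))
```

Remember to clean the build tree between runs, otherwise the later runs
have far less work to do than the first.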


In order of probability (imho):

I would suspect the memory, first off, which is why I suggested memtest86.

It could also be the CPU, maybe getting a little too hot.  I would check
that the CPU fan is spinning freely and smoothly, and that the heatsink is
free of dust.  An easy way to check for a CPU heat problem is lm_sensors,
which queries the hardware monitoring chips on the motherboard.  Silicon
will become unhappy at somewhere between 80 and 100 degrees C, at which
point electrons occasionally go straight through insulation on the chip
and end up in the wrong place.
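
If you want to watch for this automatically, something along these lines
could flag hot readings in the text printed by the sensors tool from
lm_sensors.  Treat it as a sketch: the sample text below is made up, the
real labels and layout vary with the chip and the lm_sensors version, and
the 80 degree limit is just the lower end of the range mentioned above.

```python
import re

# Made-up sample in the general style of sensors output; real output
# differs between chips and versions, so adjust the pattern to match.
SAMPLE = """\
temp1:       +41.0 C  (high = +60.0 C)
temp2:       +83.5 C  (high = +70.0 C)
fan1:        4821 RPM
"""

# Label, colon, then the first temperature on the line.  The degree sign
# is optional because some versions print it and some do not.
TEMP_LINE = re.compile(r"^(\w[\w ]*):\s*([+-]?\d+(?:\.\d+)?)\s*°?C\b", re.M)


def hot_sensors(text, limit=80.0):
    """Return (label, degrees C) for every reading at or above limit."""
    readings = [(label, float(value))
                for label, value in TEMP_LINE.findall(text)]
    return [(label, temp) for label, temp in readings if temp >= limit]


for label, temp in hot_sensors(SAMPLE):
    print(label, "is running hot:", temp, "C")
```

Only the reading itself is compared; the "high" limit printed in brackets
is ignored here, as is the fan line.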

Lastly, it could be the power supply.  Some supplies have a truly horrible
habit: under excessive load, instead of sagging, they become noisier, and
you start seeing nasty-shaped AC on the DC power rails, which really
messes things up.  Of course, sagging is bad enough, but this is .. evil.


Good luck

Maria



