[Techtalk] Faster badblock scans?

Tue Aug 26 05:14:34 EST 2003

On Mon, 25 Aug 2003, Julie wrote:

> Does anyone know of a faster way to scan IDE drives for bad blocks?

Hi,

I'm sorry to say this, but don't bother.  Please don't just delete this
post as unhelpful; read it and take note.  It really is for your own
good.

> Failing that, does anyone know of a way to turn bad block reports
> in /var/log/messages into numbers I can feed to fsck from time to
> time?  I'm running e2fsck -c on a 40GB partition right now and it
> looks like it's going to take 10 hours to complete.

This would probably be something you can do with a script, but I cannot
stress enough what a bad idea this is.
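
Purely to show that the script part is feasible, here is a minimal Python
sketch.  Everything specific in it is an assumption rather than something
taken from your logs: the "end_request: I/O error ... sector N" line
format, that the reported sector counts 512-byte units from the start of
the partition, and that the filesystem uses 4096-byte blocks.  The helper
name bad_blocks_from_log is made up for illustration.

```python
import re

# Assumed kernel log format (not taken from the original post), e.g.:
#   kernel: end_request: I/O error, dev 03:01 (hda), sector 1234500
SECTOR_RE = re.compile(r"I/O error, dev \S+ \((\w+)\), sector (\d+)")


def bad_blocks_from_log(lines, device, block_size=4096, sector_size=512):
    """Return sorted, de-duplicated filesystem block numbers for `device`.

    Assumes the logged sector is relative to the start of the partition.
    """
    blocks = set()
    for line in lines:
        match = SECTOR_RE.search(line)
        if match and match.group(1) == device:
            sector = int(match.group(2))
            # Convert a 512-byte sector number into a filesystem block number.
            blocks.add(sector * sector_size // block_size)
    return sorted(blocks)


# Hypothetical sample lines, invented for this example.
sample = [
    "Aug 25 10:01:02 box kernel: end_request: I/O error, dev 03:01 (hda), sector 1234500",
    "Aug 25 10:01:03 box kernel: end_request: I/O error, dev 03:01 (hda), sector 1234501",
    "Aug 25 10:02:00 box kernel: end_request: I/O error, dev 16:01 (hdc), sector 8",
]

for block in bad_blocks_from_log(sample, "hda"):
    print(block)
```

The output is one block number per line, which is the form that e2fsck's
-l option reads.  If the block size or the sector offset is wrong for your
partition, the numbers will point at perfectly healthy blocks, which is
one more reason not to go down this road.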

> Better still, when is someone going to fix ext[23] so that it
> automatically flags bad blocks when it finds them?  I'd love to do
> it myself but IBM says I can't ;-(

It's not broken.  Please don't ask for it to be fixed.  Just because it
doesn't do what you think it ought to, doesn't mean it is wrong.

Now, onto the explanation.

The drive is obviously a modern drive, because you mention a 40 GB
partition.  The thing you need to bear in mind is that modern drives
take care of bad blocks by themselves.  They have very sophisticated
methods of spotting when blocks are going bad, and they keep a store of
backup blocks to replace bad ones.  This is why ext2 is not broken (nor
is ext3, since they're virtually the same).  It would be a bad thing to
try to second-guess something which is already done for you.  Furthermore,
the drive is going to be a whole hell of a lot better at it than a
filesystem, and this is why ...

When the drive spots a block that's gone or going bad, the first thing it will
do is try to recover the data from it.  It'll do this by reading it
multiple times to try to ascertain what's data and what's noise.  Once
it's got the data back from the bad block it will write it back to disc in
one of the spare blocks.  Then it will put a little pointer in so that it
knows the bad block is unusable, and to retrieve the data from the sector
that it's been remapped to.
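
To make the remapping idea concrete, here is a toy model in Python.  It is
only a sketch of the bookkeeping described above, not how any real
firmware works: the ToyDrive class, its method names, and the sector
counts are all invented for illustration, and the "recovery" step is
reduced to a plain read.

```python
class ToyDrive:
    """Hypothetical model of spare-sector remapping; not real firmware."""

    def __init__(self, sectors, spares):
        # Normal sectors are numbered 0..sectors-1; spares follow them.
        self.data = {n: b"" for n in range(sectors)}
        self.free_spares = list(range(sectors, sectors + spares))
        self.remap = {}  # bad sector -> spare sector (the "little pointer")

    def _physical(self, sector):
        # Follow the pointer if this sector has been remapped.
        return self.remap.get(sector, sector)

    def write(self, sector, payload):
        self.data[self._physical(sector)] = payload

    def read(self, sector):
        return self.data[self._physical(sector)]

    def retire(self, sector):
        """Copy the recovered data to a spare and record the pointer."""
        if sector in self.remap:
            return self.remap[sector]
        if not self.free_spares:
            raise RuntimeError("out of spare sectors")
        recovered = self.read(sector)  # stands in for the multi-read recovery
        spare = self.free_spares.pop(0)
        self.data[spare] = recovered
        self.remap[sector] = spare
        return spare


drive = ToyDrive(sectors=8, spares=1)
drive.write(3, b"hello")
drive.retire(3)
print(drive.read(3))  # the same data, now served from the spare sector
```

In this model the caller never sees that sector 3 moved.  Calling
drive.retire(5) next would raise RuntimeError, because the single spare
has already been used.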

Whenever you get a drive, it will already have been scanned during
manufacture, as part of the low-level formatting.  If you happen to have a
SCSI drive, you can have a look at what is called the "defect list".  The
drive will actually have two lists, one that was discovered during
manufacture, and a "grown" list which the drive has built up over time as
it's discovered blocks going bad.  For example, one of my IBM Ultrastars
has 3874 entries in the manufacturer's defect table, and none in the grown
table.  This is actually very interesting to look at, because you can see
obvious trends.  For example, this is a small flaw running across the disc
from the centre towards the edge (Cylinder:Head:Offset):

    2644:0:74240
    2645:0:74240
    2646:0:74240
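
If you want to hunt for that kind of trend without reading thousands of
entries by eye, something like the following Python sketch would do.  It
assumes the defect list has already been dumped as Cylinder:Head:Offset
strings like the three above; the function name radial_runs is made up for
illustration.

```python
from itertools import groupby


def radial_runs(entries):
    """Group Cylinder:Head:Offset entries that share a head and offset
    across consecutive cylinders.

    Returns (head, offset, first_cylinder, last_cylinder) tuples.
    """
    parsed = sorted(
        (int(head), int(offset), int(cyl))
        for cyl, head, offset in (entry.split(":") for entry in entries)
    )
    runs = []
    for (head, offset), group in groupby(parsed, key=lambda t: (t[0], t[1])):
        cylinders = [t[2] for t in group]
        start = prev = cylinders[0]
        for cyl in cylinders[1:]:
            if cyl != prev + 1:
                # A gap in the cylinder numbers ends the current run.
                runs.append((head, offset, start, prev))
                start = cyl
            prev = cyl
        runs.append((head, offset, start, prev))
    return runs


entries = "2644:0:74240 2645:0:74240 2646:0:74240".split()
print(radial_runs(entries))  # [(0, 74240, 2644, 2646)]
```

A run spanning several cylinders at the same head and offset is what a
flaw running from the centre towards the edge looks like in the table.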

Anyway, I digress.  The point I'm making is that if you're starting to see
bad blocks showing up in your use of the drive, and their numbers are
ever-increasing, this is ... a bad thing.  Your drive is dying.  By the
time you start seeing bad blocks manifesting themselves, there are two
possibilities.  Either the drive is (for some strange reason) configured
not to automatically remap bad blocks, or the drive IS configured to remap
bad blocks, and it's run out of spares.

If it's the first one, then there's something pretty strange going on.  
If it's the second one, then your drive is well on its way to silicon
heaven, and the best thing you can do is dump it and buy a new one.

At this point, honesty compels me to point out that many manufacturers
now offer utilities to bring your drives back to full health.  What they
do is have a little chat with the drive and ask it how it's feeling about
life.  The utility will retrieve a list of things the drive is concerned
about and ponder over it, making reassuring-sounding noises towards your
ailing drive.  Then it'll erase the drive's mind and perform a low-level
format.  Sounds dreadfully callous, doesn't it?  The low-level format
involves scanning the entire surface of the drive at a physical level to
determine what is in good condition and what isn't.  It will then rewrite
the addressing marks, format the drive, and build a new list of known
defects for the drive to avoid.  By the time it's done all this, the hard
drive will be ever-so-slightly smaller, but have no errors.  IBM's utility
is one of the best I've seen for this.

Ah, but there's a catch.  If the number of defects is sufficiently large
for you to need to scan the whole drive, and has been steadily increasing,
the best you can hope for from a low-level format is to buy a little
time.  The drive is still dying, and there's still nothing you can do
about it.  With the price of drives nowadays you will be saving yourself
immeasurable amounts of grief by just buying a new one.

You've reached the end of this enormous rambling epistle, so I'll
reiterate.  In my (humble) opinion, your drive is dying.  Anything you do
to it may lengthen the useful life of the drive, but when it comes down to
it, that clicking, clattering noise as the drive skips the head back and
forth over a bad block is the sound of inevitability.  Please buy a new
one.  Your life will be richer for it, and you won't have to entrust your 
precious data to a device you know is failing.  If you feel bad about just
throwing it away, keep it to one side; it may come in handy one day as a
donor of spare parts, if not for you, then for someone else.

Have fun, and I wish you luck.

Maria