[Techtalk] Faster badblock scans?

Wed Aug 27 01:29:11 EST 2003

On Tue, 26 Aug 2003, Julie wrote:

> > I'm sorry to say this, but don't bother.  Don't just delete this post as
> > being unhelpful, please read it and take note, it really is for your own
> > good.
> 
> While I appreciate the rant, and could quite easily have written
> it myself, I'd still like an answer to my question.

You cannot scan a drive any faster than it can read the data; your only
solution would be to use hdparm to speed up read access to the drive.
Also, your read speed will be hampered by the fact that data is being
read a block at a time, which increases overhead.
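
To illustrate the overhead point, here is a rough Python sketch (not what
badblocks does internally) that reads a device or file sequentially in
large chunks rather than one block per call.  The function name
scan_in_chunks and the 1 MiB default are my own assumptions, purely for
illustration.  I believe badblocks itself has a -c option for the number
of blocks tested at a time, but check the man page for your version.

```python
import os

def scan_in_chunks(path, chunk_size=1024 * 1024):
    """Read `path` front to back in chunk_size pieces.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start
    offset of every chunk whose read raised an I/O error.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Note the failing chunk and carry on with the next one.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Fewer, larger reads mean fewer system calls, but a failed chunk only
tells you the error is somewhere inside it, so you would still have to
rescan that chunk at a smaller size to pin down the bad block.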

> For what it's worth, the drives are still in excellent condition,

I wouldn't say that any drive with bad blocks showing up is in excellent
condition.

> with far fewer than 1/100th of 1% of the blocks bad.

What about the blocks that have already been remapped, if your drive is
performing this?

I have a 9 GB Seagate brick here, vintage 1991, that has a full gigabyte
of spare blocks, spread across all 16 surfaces.  Imagine how many your
modern, much larger, drive has, and consider the possibility that the
reason you're seeing them is that the spares on at least one of the
surfaces have been used up.

> Last night's scan was of 10,000,000 blocks, of which 9 came up bad.

Ask yourself, were these 9 bad blocks ones that you already knew about?

> One bad block in 1,000,000 is hardly proof of pending calamity.

No, it's not.

but ...

> Mostly the problem is that of the 60,000,000 blocks of IDE drive on
> this machine, 40 or 50 of them seem to have gone bad over the past 12
> to 18 months.

... is proof, to my mind.  It's not their presence that's the key, it's
the fact that their numbers are increasing.
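
To put numbers on that, here is a quick Python sketch using only the
figures quoted in this thread.  The function names are made up for the
example, and the monthly rate assumes the blocks failed at an even pace,
which is my simplification.

```python
def bad_block_percent(bad, total):
    """Bad blocks as a percentage of all blocks scanned."""
    return bad / total * 100

def new_bad_blocks_per_month(bad_low, bad_high, months_low, months_high):
    """Slowest and fastest average failure rate implied by the ranges."""
    return bad_low / months_high, bad_high / months_low

# Last night's scan: 9 bad out of 10,000,000 blocks.
print(f"{bad_block_percent(9, 10_000_000):.5f}%")

# The whole machine: 40 to 50 bad blocks over 12 to 18 months.
slow, fast = new_bad_blocks_per_month(40, 50, 12, 18)
print(f"{slow:.1f} to {fast:.1f} new bad blocks per month")
```

The percentage is tiny and well under the 1/100th of 1% mentioned above,
but a rate of a few new bad blocks every month is the figure to watch.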

> And while rants are often fun,

I didn't write it for fun, I wrote it because I don't like to see people
losing data.

> in the same sense that a train wreck can be fun to watch,

Thanks.

No, really, I've given my advice, and now it's up to you to decide what to
do with it.

> the proof that Linux isn't ready for enterprise computing is that
> Linux can't survive having a couple of bad spots crop up on a large
> disk array without barfing.

At enterprise level computing, the presence of a single un-remapped bad
block is grounds to replace the drive.  If I had the budget, I would
replace the drive.  If I didn't have the budget, I would run the array in
degraded mode, and place the drive in another machine to perform a low
level format on it, before re-integrating it into the array.

Enterprise level computing does not use consumer grade IDE drives.

Bad blocks are not Linux's problem.  It should never see them, and if it
does see them, what good is that?  You've already lost the data by that
point, because Linux is abstracted from the drive hardware and cannot
control the hardware sufficiently to attempt to recover the data.  Modern
drives use Mega Funky (tm) statistical algorithms to ascertain the most
likely value of a bit of data.  Linux simply can't do this, nor is it
its job to!

Furthermore, at enterprise level, the storage array would have thrown the
drive out of the array at the first sign of a bad block, and screamed blue
murder to have someone replace the drive.

Please don't "blame" Linux for not dealing with shonky hardware, it's not
its problem.

> Seeing as the 240GB on this machine is slowly filling, and I expect to
> have 1TB on it before year's end, I'd like to be running an operating
> system that can survive the occasional non-recoverable disk error.

I'll have a look around for an operating system that allows you to attach
a crystal ball so you can find out what was in the now-dead bits of
unreliable drives.  ;)

In the meantime, I would recommend the purchase of a hardware RAID
solution, or the implementation of the "md" software RAID.  I would then
like to recommend the purchase of some nice happy SCSI drives for great
reliability.  If you can't run to SCSI drives, then you may find that one
of the RAID enclosures that take a whole bunch of IDE drives and present
them as a RAIDed SCSI device might be up your street.

Anyway, as I said, this is my advice, but I am loath to tell people how
to do things that I think are not in their best interests, so you will
have to forgive my vague comments on speed increases.  Perhaps someone
else will answer in more detail, but I really do hope you don't get bitten
by a failing drive.  Still, you have backups, right? :)

Maria
(Listening to the sound of large numbers of 10k RPM SCSI drives :)
(and is a firm believer in getting what you pay for)