[Techtalk] Faster badblock scans?

Wed Aug 27 09:09:20 EST 2003

jas at spamcop.net wrote:
> Quoting Julie <txjulie at austin.rr.com>:
>>While I appreciate the rant, and could quite easily have written
>>it myself, I'd still like an answer to my question.
>>
>>For what it's worth, the drives are still in excellent condition,
>>with far fewer than 1/100th of 1% of the blocks bad.  Last night's
>>scan was of 10,000,000 blocks, of which 9 came up bad.  One bad
>>block in 1,000,000 is hardly proof of pending calamity.  Mostly
>>the problem is that of the 60,000,000 blocks of IDE drive on this
>>machine, 40 or 50 of them seem to have gone bad over the past 12
>>to 18 months.
> 
> 
> Why aren't they being mapped out by the drive's firmware, though? Is that 
> somehow disabled (in which case, just enable it!) - or is that defect list 
> already full, at which point the defects you're finding now are just the tip of 
> the iceberg - and the disk's dying. (Someone said on another list that IBM 
> drives don't do this - if it's an IBM drive, that could also explain the 
> problem.)

It doesn't appear that the drives support it.  I had tried
"hdparm -D /dev/hda" the last time I went through this (and just
tried it again) and I get an error message back from hdparm.  This
is what the drive reports it is --

Model=Maxtor 4G120J6, FwRev=GAK819K0, SerialNo=G6092NCE

and the message from trying to turn on defect management --

/dev/hda:
  setting drive defect-mgmt to 1
  HDIO_DRIVE_CMD(defectmgmt) failed: Input/output error

Which shows what I know -- I thought they were Western Digital
drives ...
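
As a rough check on the block counts quoted at the top of this message,
here is a minimal Python sketch.  The function names are made up for
illustration; the numbers are the ones given in the thread (9 bad out of
10,000,000 scanned, and 40 to 50 bad out of 60,000,000 in total).

```python
# Sanity check of the bad-block rates quoted in the thread.
# Function names are illustrative only; the counts come from the message.

def bad_fraction(bad, total):
    """Fraction of blocks that are bad (0.0 to 1.0)."""
    return bad / total

def one_in(bad, total):
    """Express the rate as 'one bad block in N' (N is rounded down)."""
    return total // bad

# "1/100th of 1%" expressed as a fraction.
threshold = 0.01 / 100

# Last night's scan: 9 bad blocks out of 10,000,000.
scan_rate = bad_fraction(9, 10_000_000)
scan_one_in = one_in(9, 10_000_000)

# Whole machine over 12 to 18 months: 40 to 50 bad out of 60,000,000.
low_rate = bad_fraction(40, 60_000_000)
high_rate = bad_fraction(50, 60_000_000)

print(f"scan: one bad block in {scan_one_in:,}")
print(f"scan below 1/100th of 1%: {scan_rate < threshold}")
print(f"machine: one in {one_in(50, 60_000_000):,} "
      f"to one in {one_in(40, 60_000_000):,}")
```

Both rates come out near one bad block per million, which is well under
the 1/100th of 1% figure.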

>>And while rants are often fun, in the same sense that a train wreck
>>can be fun to watch, the proof that Linux isn't ready for enterprise
>>computing is that Linux can't survive having a couple of bad spots
>>crop up on a large disk array without barfing.  Seeing as the 240GB
>>on this machine is slowly filling, and I expect to have 1TB on it
>>before year's end, I'd like to be running an operating system that
>>can survive the occasional non-recoverable disk error.
> 
> 
> Linux just relies (quite reasonably, IMHO) on the disk's firmware handling that, 
> as it's supposed to. Being more tolerant of defective drives would be nice, 
> though - and both Windows NT/2k/XP and Linux handle this in the same way: you 
> need to be using the fault-tolerant drivers (FTDISK.SYS under NT, software RAID 
> under Linux) to get this extra layer of protection. The ordinary drivers just 
> assume the drive does what it says on the tin. For 1 TB of data, there will be 
> several drives involved anyway: would adding one more to get RAID 5 be a 
> problem? That would avoid the whole issue, it seems: you'd get the extra layer 
> of remapping, and you'd be able to recover the lost data instead of just 
> identifying it.

My problem with Linux is mostly the lack of grace with which it
handles I/O errors.  The entire machine hangs while it struggles to
read a block it isn't going to read.  Paging I/O can't happen, so
I can't get it to do anything.

I've considered using RAID in the past.  At the time I didn't have the
IDE channels to connect enough drives to make it work and have the
amount of storage I wanted.  I finally found out why I couldn't put the
NICs I had in the machine in the 64-bit PCI slots in my motherboard
(Netgear NICs and the Tyan Tiger MPX chipset don't mix), so I now have
enough slots to put another IDE controller in along with several more
drives.

The other issue has been the power supply and case temperature.  I've
not added up all of the power requirements for the devices in the case
in a while.  I can't imagine I have a large enough power supply to
replace two 120GB drives with five 250GB drives.  So that's something else
to consider.  On top of that, things are getting toasty inside the case,
so I need to look into solving that as well.
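
For the drive counts mentioned above, a minimal Python sketch of the
usual RAID 5 capacity rule (one drive's worth of space goes to parity)
might look like this.  The function name is hypothetical, and it assumes
equal-sized drives and ignores filesystem and metadata overhead.

```python
# Usable capacity of a RAID 5 array: (number of drives - 1) * drive size.
# Assumes all drives are the same size; ignores filesystem and RAID
# metadata overhead.  The function name is illustrative only.

def raid5_usable_gb(drive_count, drive_size_gb):
    """Return usable capacity in GB for a RAID 5 array."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb

# Current setup from the message: two 120GB drives, no RAID.
current_gb = 2 * 120

# Proposed setup: five 250GB drives in RAID 5.
proposed_gb = raid5_usable_gb(5, 250)

print(f"current:  {current_gb} GB")
print(f"proposed: {proposed_gb} GB usable, {5 * 250} GB raw")
```

Under those assumptions, five 250GB drives give 1000GB usable, which
lines up with the 1TB target mentioned earlier in the thread.
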
-- 
Julianne Frances Haugh             Life is either a daring adventure
txjulie at austin.rr.com                  or nothing at all.
					    -- Helen Keller