[Techtalk] ide problem?

Maria Pinjanainen maria at tietonoita.fi
Tue Feb 3 13:02:45 UTC 2009


Hi!

I have a problem. I could work around it with other hardware, a new
disk for the RAID array, or a completely new computer, but first I would
like to know what the problem actually is.

First, the computer: Debian Etch on Intel/AMD hardware, 32-bit. It
started running this:

/USR/SBIN/CRON[14992]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] &&
[ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all
--quiet)
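
What the guard in that cron line amounts to, as a small Python sketch.
The function name and the script_is_executable flag are made up for
illustration. It only mirrors the two shell tests; when cron itself fires
is decided by the crontab schedule, which does not appear in the syslog
line.

```python
from datetime import date


def checkarray_would_run(today: date, script_is_executable: bool = True) -> bool:
    """Mirror the two shell tests that guard checkarray in the cron line."""
    # [ -x /usr/share/mdadm/checkarray ]  -> the script exists and is executable
    # [ $(date +%d) -le 7 ]               -> the day of the month is 7 or less
    return script_is_executable and today.day <= 7


if __name__ == "__main__":
    print(checkarray_would_run(date(2009, 2, 1)))  # True: day 1 passes the test
    print(checkarray_would_run(date(2009, 2, 8)))  # False: day 8 is past the 7th
```

So the check only goes ahead during the first seven days of a month,
which matches the Feb 1 run in the log.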

There are these arrays:

Personalities : [raid1] 
md3 : active raid1 sda9[0] sdb9[1]
      145155200 blocks [2/2] [UU]
      
md2 : active raid1 sda8[0] sdb8[1]
      393472 blocks [2/2] [UU]
      
md1 : active raid1 sda6[0] sdb6[1]
      2931712 blocks [2/2] [UU]
      
md0 : active raid1 sda5[0] sdb5[1]
      4883648 blocks [2/2] [UU]
       
md4 : active raid1 hda1[0] hdc1[1]
      488383936 blocks [2/2] [UU]
     
md4 is the one with the trouble.
At least that is what smartd reports:

Device: /dev/hda, 2 Currently unreadable
(pending) sectors
Device: /dev/hda, 2 Offline uncorrectable
sectors

According to the syslog, Etch starts the cron job and it goes fine
until the last array. Then it ends in a kernel panic.

md4 is only a data device. So why does it kill the whole system?
Or is there some other hardware or software trouble?

Feb  1 01:06:01 etch /USR/SBIN/CRON[14992]: (root) CMD
([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ]
&& /usr/share/mdadm/checkarray --cron --all --quiet)
Feb  1 01:06:01 etch kernel: md: syncing RAID array md0
Feb  1 01:06:01 etch kernel: md: minimum _guaranteed_ reconstruction
speed: 1000 KB/sec/disc.
Feb  1 01:06:01 etch kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Feb  1 01:06:01 etch kernel: md: using 128k window, over a total of
4883648 blocks.
Feb  1 01:06:01 etch kernel: md: delaying resync of md1 until md0 has
finished resync (they share one or more physical units)
Feb  1 01:06:01 etch kernel: md: delaying resync of md2 until md0 has
finished resync (they share one or more physical units)
Feb  1 01:06:01 etch kernel: md: delaying resync of md3 until md2 has
finished resync (they share one or more physical units)
Feb  1 01:06:01 etch kernel: md: delaying resync of md2 until md0 has
finished resync (they share one or more physical units)
Feb  1 01:06:01 etch kernel: md: delaying resync of md1 until md2 has
finished resync (they share one or more physical units)
Feb  1 01:06:01 etch kernel: md: syncing RAID array md4
Feb  1 01:06:01 etch kernel: md: minimum _guaranteed_ reconstruction
speed: 1000 KB/sec/disc.
Feb  1 01:06:01 etch kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Feb  1 01:06:01 etch kernel: md: using 128k window, over a total of
488383936 blocks.
Feb  1 01:06:02 etch mdadm: RebuildStarted event detected on md
device /dev/md0
Feb  1 01:06:02 etch mdadm: RebuildStarted event detected on md
device /dev/md4
Feb  1 01:07:01 etch mdadm: Rebuild40 event detected on md
device /dev/md0
Feb  1 01:07:44 etch kernel: md: md0: sync done.
Feb  1 01:07:44 etch kernel: RAID1 conf printout:
Feb  1 01:07:44 etch kernel:  --- wd:2 rd:2
Feb  1 01:07:44 etch kernel:  disk 0, wo:0, o:1, dev:sda5
Feb  1 01:07:44 etch kernel:  disk 1, wo:0, o:1, dev:sdb5
Feb  1 01:07:44 etch kernel: md: delaying resync of md1 until md2 has
finished resync (they share one or more physical units)
Feb  1 01:07:44 etch kernel: md: syncing RAID array md2
Feb  1 01:07:44 etch kernel: md: minimum _guaranteed_ reconstruction
speed: 1000 KB/sec/disc.
Feb  1 01:07:44 etch kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Feb  1 01:07:44 etch kernel: md: using 128k window, over a total of
393472 blocks.
Feb  1 01:07:44 etch kernel: md: delaying resync of md3 until md2 has
finished resync (they share one or more physical units)
Feb  1 01:07:44 etch mdadm: RebuildStarted event detected on md
device /dev/md2
Feb  1 01:07:44 etch mdadm: RebuildFinished event detected on md
device /dev/md0
Feb  1 01:07:54 etch kernel: md: md2: sync done.
Feb  1 01:07:54 etch kernel: md: delaying resync of md3 until md1 has
finished resync (they share one or more physical units)
Feb  1 01:07:54 etch kernel: md: syncing RAID array md1
Feb  1 01:07:54 etch kernel: md: minimum _guaranteed_ reconstruction
speed: 1000 KB/sec/disc.
Feb  1 01:07:54 etch kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Feb  1 01:07:54 etch kernel: md: using 128k window, over a total of
2931712 blocks.
Feb  1 01:07:54 etch kernel: RAID1 conf printout:
Feb  1 01:07:54 etch kernel:  --- wd:2 rd:2
Feb  1 01:07:54 etch kernel:  disk 0, wo:0, o:1, dev:sda8
Feb  1 01:07:54 etch kernel:  disk 1, wo:0, o:1, dev:sdb8
Feb  1 01:07:54 etch mdadm: RebuildFinished event detected on md
device /dev/md2
Feb  1 01:07:54 etch mdadm: RebuildStarted event detected on md
device /dev/md1
Feb  1 01:08:54 etch mdadm: Rebuild60 event detected on md
device /dev/md1
Feb  1 01:09:09 etch kernel: md: md1: sync done.
Feb  1 01:09:09 etch kernel: md: syncing RAID array md3
Feb  1 01:09:09 etch kernel: md: minimum _guaranteed_ reconstruction
speed: 1000 KB/sec/disc.
Feb  1 01:09:09 etch kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Feb  1 01:09:09 etch kernel: md: using 128k window, over a total of
145155200 blocks.
Feb  1 01:09:09 etch kernel: RAID1 conf printout:
Feb  1 01:09:09 etch kernel:  --- wd:2 rd:2
Feb  1 01:09:09 etch kernel:  disk 0, wo:0, o:1, dev:sda6
Feb  1 01:09:09 etch kernel:  disk 1, wo:0, o:1, dev:sdb6
Feb  1 01:09:09 etch mdadm: RebuildStarted event detected on md
device /dev/md3
Feb  1 01:09:09 etch mdadm: RebuildFinished event detected on md
device /dev/md1
Feb  1 01:19:09 etch mdadm: Rebuild20 event detected on md
device /dev/md3
Feb  1 01:29:09 etch mdadm: Rebuild40 event detected on md
device /dev/md3
Feb  1 01:36:09 etch mdadm: Rebuild20 event detected on md
device /dev/md4
Feb  1 01:49:09 etch mdadm: Rebuild80 event detected on md
device /dev/md3
Feb  1 02:00:11 etch kernel: md: md3: sync done.
Feb  1 02:00:11 etch kernel: RAID1 conf printout:
Feb  1 02:00:11 etch kernel:  --- wd:2 rd:2
Feb  1 02:00:11 etch kernel:  disk 0, wo:0, o:1, dev:sda9
Feb  1 02:00:11 etch kernel:  disk 1, wo:0, o:1, dev:sdb9
Feb  1 02:00:11 etch mdadm: RebuildFinished event detected on md
device /dev/md3

All the other RAID arrays are done; only md4 is left.

Feb  1 02:03:11 etch mdadm: Rebuild40 event detected on md
device /dev/md4
Feb  1 02:27:11 etch mdadm: Rebuild60 event detected on md
device /dev/md4
Feb  1 02:54:11 etch mdadm: Rebuild80 event detected on md
device /dev/md4
Feb  1 03:28:09 etch kernel: hda: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Feb  1 03:28:09 etch kernel: hda: dma_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974762815
Feb  1 03:28:09 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:12 etch kernel: hda: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Feb  1 03:28:12 etch kernel: hda: dma_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974762815
Feb  1 03:28:12 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:15 etch kernel: hda: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Feb  1 03:28:15 etch kernel: hda: dma_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974762815
Feb  1 03:28:15 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:18 etch kernel: hda: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Feb  1 03:28:18 etch kernel: hda: dma_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974762815
Feb  1 03:28:18 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:18 etch kernel: hda: DMA disabled
Feb  1 03:28:18 etch kernel: hdb: DMA disabled

The loop starts...

Feb  1 03:28:18 etch kernel: ide0: reset: success
Feb  1 03:28:21 etch kernel: hda: task_in_intr: status=0x59 { DriveReady
SeekComplete DataRequest Error }
Feb  1 03:28:21 etch kernel: hda: task_in_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974763422
Feb  1 03:28:21 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:24 etch kernel: hda: task_in_intr: status=0x59 { DriveReady
SeekComplete DataRequest Error }
Feb  1 03:28:24 etch kernel: hda: task_in_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974763422
Feb  1 03:28:24 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:27 etch kernel: hda: task_in_intr: status=0x59 { DriveReady
SeekComplete DataRequest Error }
Feb  1 03:28:27 etch kernel: hda: task_in_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974763422
Feb  1 03:28:27 etch kernel: ide: failed opcode was: unknown 
Feb  1 03:28:30 etch kernel: hda: task_in_intr: status=0x59 { DriveReady
SeekComplete DataRequest Error }
Feb  1 03:28:30 etch kernel: hda: task_in_intr: error=0x01
{ AddrMarkNotFound }, LBAsect=974763422, high=58, low=1684894,
sector=974763422
Feb  1 03:28:30 etch kernel: ide: failed opcode was: unknown
Feb  1 03:28:30 etch kernel: ide0: reset: success

... and so on, about 10 times, until the kernel panics.
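
Every one of those errors names the same sector, LBAsect=974763422. As a
cross-check, the numbers can be decoded with a few lines of Python. This
is only a sketch under assumptions that are not in the log: 512-byte
sectors, "low" holding the lower 24 bits of the sector number with "high"
above them, and md4's block count being in 1 KiB units. The start offset
of hda1 on the disk is unknown and ignored, so the position is rough. The
function names are made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def lba_from_high_low(high: int, low: int) -> int:
    """Rebuild the sector number from the 'high' and 'low' log fields.

    Assumes 'low' is the lower 24 bits and 'high' the bits above them.
    """
    return (high << 24) | low


def fraction_of_array(lba: int, array_kib_blocks: int) -> float:
    """Rough position of an on-disk sector relative to the array size.

    Ignores the start offset of the partition, which the log does not give.
    """
    array_sectors = array_kib_blocks * 1024 // SECTOR_BYTES
    return lba / array_sectors


if __name__ == "__main__":
    lba = lba_from_high_low(high=58, low=1684894)
    print(lba == 974763422)  # True: the two fields agree with LBAsect
    # md4 is 488383936 blocks according to the array listing
    print(f"{fraction_of_array(lba, 488383936):.3f}")
```

If those assumptions hold, the unreadable sector sits very close to the
end of hda, which would fit the errors showing up only after the
Rebuild80 event on md4.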

For now, at least, I have removed checkarray from cron.

--m



