[Techtalk] RAID data recovery w/ hardware controller

Gina Feichtinger geekgrrl at geekgrrl.priv.at
Wed Feb 13 08:16:45 UTC 2008


On 12.02.2008 at 22:22, Rudy Zijlstra wrote:
> On Tue, 12 Feb 2008, Carla Schroder wrote:
[...]
>> I'm figuring out
>> the pros and cons with Linux software RAID; with Linux RAID you can stuff
>> your drives in any Linux box and re-construct the array, assuming the drives
>> are not damaged and the data not corrupted. Very nice for moving to a new
>> box, or recovering from some other component failure that doesn't affect the
>> hard drives.
> 
> I use both HW raid and SW raid. Generally speaking I'm using HW raid on
> boxes with high availability requirements, and use it for the system
> partition. Big storage RAID is then SW raid. Reasons to use HW raid:
> -1- easy identification of failed disk
> -2- no problems to boot from a raid partition
> -3- configuration / recovery can be done with no OS active
> 
> Example: I have an IBM x-series server with 4 SCSI HDDs. I tried to use SW
> raid on that, and in order to have a simple raid config, used the partitioned
> raid driver. I subsequently ended up creating a specific boot CD, as no
> disk based bootloader supported this configuration... Both grub and lilo
> barfed on it :(

May I ask what you mean by "partitioned raid driver"? I've had an
x-Series machine barf on me after install (an x3650 with RHEL AS to be
precise) only because in that case grub wasn't properly written to the
MBR...

A thing I found useful about SW RAID - switching to bigger disks in a
RAID-1 setup is pretty easy. Split the mirror, switch second disk,
rebuild RAID, switch first disk, rebuild RAID, done (OKOK, more or
less). I've had to do that a few times already when the internal 36GB
disks were getting too small.
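
As a rough sketch of that sequence, here is a small Python helper that
only prints the mdadm commands in order; it does not run anything. The
array name /dev/md0 and the member partitions /dev/sda1 and /dev/sdb1
are placeholders I made up for illustration, and the helper name
raid1_upgrade_plan is hypothetical. The final grow step is my
assumption about how you would actually use the extra space.

```python
def raid1_upgrade_plan(array, old_members, new_members):
    """Return, in order, the steps for swapping both halves of a RAID-1 mirror.

    Lines starting with '#' are manual steps, the rest are mdadm commands.
    All device names are supplied by the caller and are illustrative only.
    """
    plan = []
    for old, new in zip(old_members, new_members):
        # Split the mirror: mark one half as failed and remove it.
        plan.append(f"mdadm {array} --fail {old} --remove {old}")
        plan.append(f"# physically swap the disk, partition it, then add {new}")
        # Rebuild the mirror onto the new, bigger disk.
        plan.append(f"mdadm {array} --add {new}")
        plan.append(f"# wait until /proc/mdstat shows [UU] for {array} before continuing")
    # Assumption: once both halves are bigger, grow the array to use the space.
    plan.append(f"mdadm --grow {array} --size=max")
    plan.append("# then grow the filesystem on top, e.g. with resize2fs for ext3")
    return plan


if __name__ == "__main__":
    steps = raid1_upgrade_plan(
        "/dev/md0",
        ["/dev/sdb1", "/dev/sda1"],  # second disk first, as described above
        ["/dev/sdb1", "/dev/sda1"],
    )
    for step in steps:
        print(step)
```

The wait between the two swaps is the part that matters: pulling the
first disk before the rebuild onto the second has finished leaves no
complete copy of the data.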

> When i had the opportunity to get some modern SCSI-320 raid controllers 
> for a sensible price i ran for it... much easier to maintain solution.
> 
> Also, I've found the beeping that ensues on a disk failure to be a very
> useful warning sign :)
>
> On SW raid, identifying which HDD has failed can be an issue. I've once 
> lost a 1.5T array because we mis-identified which HDD had failed.... All 
> data on the array was lost.

May I ask how this happened? Usually "cat /proc/mdstat" gives you a
good overview of the state of the array.
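
To illustrate what that overview contains, here is a minimal Python
sketch that picks the degraded arrays and the members flagged (F) out of
/proc/mdstat-style text. The sample text is made up for illustration
(it is not output from either of our machines), and the function name
parse_mdstat is hypothetical. It only reports the kernel device name;
mapping that name to a physical drive is a separate step that this
does not cover.

```python
import re

# Made-up sample in the usual /proc/mdstat layout, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      35840000 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      1048576 blocks [2/1] [U_]

unused devices: <none>
"""

HEADER = re.compile(r"^(md\S+)\s*:\s*(\S+)\s+(.*)$")   # "md0 : active raid1 ..."
MEMBER = re.compile(r"(\S+?)\[(\d+)\](\(F\))?")        # "sdb2[1](F)"
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")  # "[2/1] [U_]"


def parse_mdstat(text):
    """Return {array: {"state", "failed", "degraded"}} from mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            name, state, rest = header.groups()
            # Members marked (F) are the ones the kernel considers faulty.
            failed = [dev for dev, _slot, flag in MEMBER.findall(rest) if flag]
            current = {"state": state, "failed": failed, "degraded": False}
            arrays[name] = current
            continue
        status = STATUS.search(line)
        if status and current is not None:
            total, working, flags = status.groups()
            # [n/m] is total vs. working devices; "_" marks a missing slot.
            current["degraded"] = int(working) < int(total) or "_" in flags
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info)
```

On a real box you would read the text from /proc/mdstat instead of the
sample string.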

> On the HW raid, when i accidentally disconnect the wrong disk, i can force 
> that disk online, and incur no data loss. I've so far found no way to do 
> that on SW raid... can be my problem though, as mdadm still has some 
> secrets for me.

mdadm is a bit, how should I say, cryptic at times. What I have come to
like a lot is the monitoring feature, though. Whenever a disk or
partition craps out I get a mail and can react pretty quickly. Since I
don't have to visit the server room that often anymore it could take way
longer for me to see/hear a failed disk in a HW RAID I think.

Just my 2¢,

Gina
-- 
Gina Feichtinger                :: LinuxChix member
System-/SAN-Administrator & DBA :: http://www.linuxchix.org/
http://www.geekgrrl.priv.at/    :: LUGA member
http://nilasae.livejournal.com/ :: http://www.luga.at/



