HOWTO: Rebuild failed Linux software RAID arrays
Detecting a drive failure? No mystery there. A quick glance at the standard logs and stat files is enough to notice one.
/var/log/messages always fills up with a mess of messages, no matter what happened… but when a disk crashes, the kernel reports tons of errors. Some nasty examples (for the masochists):
kernel: scsi0 channel 0 : resetting for second half of retries.
kernel: SCSI bus is being reset for host 0 channel 0.
kernel: scsi0: Sending Bus Device Reset CCB #2666 to Target 0
kernel: scsi0: Bus Device Reset CCB #2666 to Target 0 Completed
kernel: scsi : aborting command due to timeout : pid 2649, scsi0, channel 0, id 0, lun 0 Write (6) 18 33 11 24 00
kernel: scsi0: Aborting CCB #2669 to Target 0
kernel: SCSI host 0 channel 0 reset (pid 2644) timed out - trying harder
kernel: SCSI bus is being reset for host 0 channel 0.
kernel: scsi0: CCB #2669 to Target 0 Aborted
kernel: scsi0: Resetting BusLogic BT-958 due to Target 0
kernel: scsi0: *** BusLogic BT-958 Initialized Successfully ***
Most often, disk failures look like these:
kernel: sidisk I/O error: dev 08:01, sector 1590410
kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002
or these:
kernel: sdb: read_intr: error=0x10 { SectorIdNotFound }, CHS=31563/14/35, sector=0
kernel: sdb: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
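If you would rather not scroll through the log by hand, something like the following can pull out the suspicious lines. It is only a sketch: `find_disk_errors` is a made-up helper name, and the pattern is just a guess built from the sample messages above, so expect to tune it for your own controller and kernel.

```shell
#!/bin/sh
# Hypothetical helper: print kernel log lines that look like disk trouble.
# The pattern is an assumption based on the sample messages in this post;
# it is not an exhaustive list of what a failing disk can log.
find_disk_errors() {
    logfile="${1:-/var/log/messages}"
    grep -E 'kernel: .*(I/O error|SCSI disk error|read_intr|aborting command due to timeout)' "$logfile"
}
```

Call it as `find_disk_errors` for the default log, or `find_disk_errors /path/to/some.log` for another file.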
Today, I had the pleasure of a hard drive failure on a production server. The faulty drive was part of a Linux multidisk (md) software RAID 1. A RAID level 1 array is a set of mirrored drives, so I lost no data and just needed to replace the hardware. However, the array does require manual rebuilding.
When you look at a “normal” array, you see something like this:
[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      4192896 blocks [2/2] [UU]
md2 : active raid1 sdb3[2] sda3[0]
      68308288 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
Above is the normal state and what you want your array to look like.
When a drive has failed and been replaced (by you or a hotspare), it looks like this:
[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0]
      4192896 blocks [2/1] [U_]
md2 : active raid1 sda3[0]
      68308288 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
unused devices: <none>
Notice that it doesn’t list the failed drive’s partitions, and that an underscore appears beside each U. This shows that only one drive is active in each of these arrays… a.k.a. we have no mirror. You’d better do something quickly.
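Spotting the underscore can also be scripted. The sketch below assumes the usual multi-line /proc/mdstat layout (array name on one line, the "blocks" status on the next); `list_degraded_arrays` is a hypothetical name, not something that ships with md.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status
# field (e.g. [U_]) contains an underscore, meaning a member is missing.
# Assumes the standard multi-line /proc/mdstat layout.
list_degraded_arrays() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { name = $1 }                     # remember the current array
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }    # status shows a missing member
    ' "$mdstat"
}
```

An array that is still rebuilding also shows [U_], so it will be listed until the recovery completes.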
A program that will show us the state of the RAID arrays is “mdadm”. We use “mdadm -D” to view the details of an array.
[root@server] [~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Mon Mar 5 05:12:34 2007
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Sat Dec 20 14:46:50 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : 7524fa5f:514b0bd4:f3f5652f:cd1fa7b9
         Events : 0.7700
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        -      removed
As this shows, we currently have only one drive in the array. I already knew that /dev/sdb held the other half of the RAID array; if you don’t, you can look at /etc/raidtab (on older raidtools setups) or at mdadm’s config file, typically /etc/mdadm.conf, to see how the array was defined.
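If you only care about the State line of that report, a small filter will do. This is a sketch under the assumption that the output keeps the "Label : value" layout shown above; `array_state` is a hypothetical helper that reads the report on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read "mdadm -D" output on stdin and print only the
# value of the "State :" line. Assumes the "Label : value" layout above.
array_state() {
    awk -F ' : ' '/^ *State :/ { print $2 }'
}
```

Typical use would be `mdadm -D /dev/md0 | array_state`, which for the array above should print "clean, degraded".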
To get the mirrored drives working properly again, we need to run “fdisk” to see what partitions are on the working drive: /dev/sda.
[root@server] [~]# fdisk /dev/sda
The number of cylinders for this disk is set to 9039.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sda: 74.3 GB, 74355769344 bytes
255 heads, 63 sectors/track, 9039 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14         535     4192965   fd  Linux raid autodetect
/dev/sda3             536        9039    68308380   fd  Linux raid autodetect
Command (m for help):
Now we just have to duplicate that structure on the new blank drive: /dev/sdb. Use “n” to create the partitions, “t” to change their type to “fd” to match, and “a” to mark the first partition bootable as it is on /dev/sda. Remember to use “w” to save the changes and exit fdisk. Using “q” will not save the changes you have made.
[root@server] [~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
The number of cylinders for this disk is set to 9039.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-9039, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-9039, default 9039): 13
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-9039, default 14): 14
Last cylinder or +size or +sizeM or +sizeK (14-9039, default 9039): 535
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (536-9039, default 536): 536
Last cylinder or +size or +sizeM or +sizeK (536-9039, default 9039): 9039
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): L
 0  Empty           1e  Hidden W95 FAT1 75  PC/IX           be  Solaris boot
 1  FAT12           24  NEC DOS         80  Old Minix       bf  Solaris
 2  XENIX root      39  Plan 9          81  Minix / old Lin c1  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  82  Linux swap      c4  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     83  Linux           c6  DRDOS/sec (FAT-
 5  Extended        41  PPC PReP Boot   84  OS/2 hidden C:  c7  Syrinx
 6  FAT16           42  SFS             85  Linux extended  da  Non-FS data
 7  HPFS/NTFS       4d  QNX4.x          86  NTFS volume set db  CP/M / CTOS / .
 8  AIX             4e  QNX4.x 2nd part 87  NTFS volume set de  Dell Utility
 9  AIX bootable    4f  QNX4.x 3rd part 8e  Linux LVM       df  BootIt
 a  OS/2 Boot Manag 50  OnTrack DM      93  Amoeba          e1  DOS access
 b  W95 FAT32       51  OnTrack DM6 Aux 94  Amoeba BBT      e3  DOS R/O
 c  W95 FAT32 (LBA) 52  CP/M            9f  BSD/OS          e4  SpeedStor
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a0  IBM Thinkpad hi eb  BeOS fs
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a5  FreeBSD         ee  EFI GPT
10  OPUS            55  EZ-Drive        a6  OpenBSD         ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a7  NeXTSTEP        f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a8  Darwin UFS      f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       a9  NetBSD          f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys ab  Darwin boot     f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fd  Linux raid auto
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fe  LANstep
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid ff  BBT
1c  Hidden W95 FAT3
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)
Command (m for help): a
Partition number (1-4): 1
Command (m for help): p
Disk /dev/sdb: 74.3 GB, 74355769344 bytes
255 heads, 63 sectors/track, 9039 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14         535     4192965   fd  Linux raid autodetect
/dev/sdb3             536        9039    68308380   fd  Linux raid autodetect
Command (m for help): w
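Typing the layout in by hand works, but it is not what you have to do. A common shortcut (not the method used in this post) is to dump the partition table of the good drive with sfdisk and feed it to the new one. The sketch below wraps that in a hypothetical `clone_partition_table` helper that only prints the command unless DRY_RUN=0 is set. It assumes a DOS (MBR) partition table and a replacement drive at least as large as the original, and it overwrites whatever partition table is on the target, so double-check the device names.

```shell
#!/bin/sh
# Hypothetical helper: copy the partition table from SRC to DST with sfdisk.
# Assumptions: MBR partition table, DST at least as large as SRC.
# By default it only prints the command; set DRY_RUN=0 to really run it.
clone_partition_table() {
    src="$1"
    dst="$2"
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_partition_table SRC DST (two different disks)" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sfdisk -d $src | sfdisk $dst"
    else
        # DESTRUCTIVE: replaces the partition table on $dst
        sfdisk -d "$src" | sfdisk "$dst"
    fi
}
```

For the drives in this post that would be `clone_partition_table /dev/sda /dev/sdb`, followed by a look at `fdisk -l /dev/sdb` to confirm that the result matches /dev/sda.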
Once this is done, we use “mdadm” to add the new drive’s partitions to the arrays (“-a” is short for “--add”). As we add them, md will copy the data from the existing drive to the new drive automatically. (The “raidhotadd” command from the older raidtools package will also work if you have it installed on your machine. Its syntax is “raidhotadd /dev/md0 /dev/sdb1”.)
[root@server] [~]# mdadm /dev/md0 -a /dev/sdb1
mdadm: hot added /dev/sdb1
[root@server] [~]# mdadm /dev/md1 -a /dev/sdb2
mdadm: hot added /dev/sdb2
[root@server] [~]# mdadm /dev/md2 -a /dev/sdb3
mdadm: hot added /dev/sdb3
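The three commands above follow an obvious pattern, so they can be wrapped in a loop. This is a sketch that hard-codes the layout from this post (md0 on partition 1, md1 on partition 2, md2 on partition 3); `readd_partitions` is a hypothetical name, and by default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical helper: add partitions 1-3 of the given disk back into
# md0-md2. The array-to-partition mapping is the one used in this post
# and will differ on other machines.
# By default it only prints the commands; set DRY_RUN=0 to really run them.
readd_partitions() {
    disk="$1"    # e.g. /dev/sdb
    for pair in md0:1 md1:2 md2:3; do
        array="/dev/${pair%%:*}"     # text before the colon
        part="${disk}${pair##*:}"    # disk name plus partition number
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "mdadm $array -a $part"
        else
            mdadm "$array" -a "$part"
        fi
    done
}
```

Run `readd_partitions /dev/sdb` first to review the commands, then `DRY_RUN=0 readd_partitions /dev/sdb` to execute them.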
The rebuild can be watched in /proc/mdstat.
md0, the smallest partition, has already completed rebuilding (UU), while md1 has only begun.
[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
      4192896 blocks [2/1] [U_]
      [===>.................]  recovery = 16.7% (704448/4192896) finish=0.7min speed=78272K/sec
md2 : active raid1 sdb3[2] sda3[0]
      68308288 blocks [2/1] [U_]
        resync=DELAYED
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
md1 is finished… starting md2. md2 is the largest partition and will take about 15 minutes.
[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      4192896 blocks [2/2] [UU]
md2 : active raid1 sdb3[2] sda3[0]
      68308288 blocks [2/1] [U_]
      [===>.................]  recovery = 16.4% (11240768/68308288) finish=12.3min speed=76878K/sec
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
Finished.
[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      4192896 blocks [2/2] [UU]
md2 : active raid1 sdb3[2] sda3[0]
      68308288 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
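Rather than re-running cat by hand, you can leave `watch -n 5 cat /proc/mdstat` open in a terminal. If you only want the percentages, the sketch below pulls them out of the "recovery = …" lines; `rebuild_progress` is a hypothetical helper and assumes the mdstat layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <percent>" for every array that
# currently shows a "recovery = NN.N%" line in mdstat.
# Assumes the multi-line /proc/mdstat layout shown in this post.
rebuild_progress() {
    mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { name = $1 }     # remember the current array
        /recovery =/ {
            for (i = 1; i <= NF; i++)
                if ($i == "recovery") print name, $(i + 2)   # skip the "=" field
        }
    ' "$mdstat"
}
```

It prints nothing once every array is back to [UU], which makes it easy to use in a loop that waits for the rebuild to finish.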
Now reboot, and your md mirror will be working as it did before the drive failure.