
HOWTO: Rebuild failed Linux software RAID arrays


Detecting a drive failure? No mystery there. A quick glance at the standard logs and stat files is usually enough to spot one.

/var/log/messages always fills with a mess of error messages no matter what happens… but when it’s a disk crash, tons of kernel errors get reported. Some nasty examples (for the masochists):

kernel: scsi0 channel 0 : resetting for second half of retries.
kernel: SCSI bus is being reset for host 0 channel 0.
kernel: scsi0: Sending Bus Device Reset CCB #2666 to Target 0
kernel: scsi0: Bus Device Reset CCB #2666 to Target 0 Completed
kernel: scsi : aborting command due to timeout : pid 2649, scsi0, channel 0, id 0, lun 0 Write (6) 18 33 11 24 00
kernel: scsi0: Aborting CCB #2669 to Target 0
kernel: SCSI host 0 channel 0 reset (pid 2644) timed out - trying harder
kernel: SCSI bus is being reset for host 0 channel 0.
kernel: scsi0: CCB #2669 to Target 0 Aborted
kernel: scsi0: Resetting BusLogic BT-958 due to Target 0
kernel: scsi0: *** BusLogic BT-958 Initialized Successfully ***

Most often, disk failures look like these:

kernel: scsidisk I/O error: dev 08:01, sector 1590410
kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002

or these:

kernel: sdb: read_intr: error=0x10 { SectorIdNotFound }, CHS=31563/14/35, sector=0
kernel: sdb: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
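
Beyond syslog, a quick sanity check is to ask the suspect drive directly, assuming smartmontools is installed and the drive in question is /dev/sdb:

[root@server] [~]# smartctl -H /dev/sdb
[root@server] [~]# smartctl -a /dev/sdb | grep -i -E "reallocated|pending|uncorrectable"

A failing health check, or climbing reallocated/pending sector counts, confirms what the kernel logs are already telling you.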

Today, I had the pleasure of a hard drive failure on a production server. The faulty drive was part of a Linux multidisk (md) software RAID 1. A RAID level 1 array is mirrored drives, so I lost no data and just needed to replace the hardware. However, the array then requires a manual rebuild.


When you look at a “normal” array, you see something like this:

[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
4192896 blocks [2/2] [UU]

md2 : active raid1 sdb3[2] sda3[0]
68308288 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

unused devices: <none>

Above is the normal state and what you want your array to look like.

When a drive has failed and been replaced (by you or a hotspare), it looks like this:

[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0]
4192896 blocks [2/1] [U_]

md2 : active raid1 sda3[0]
68308288 blocks [2/1] [U_]

md0 : active raid1 sda1[0]
104320 blocks [2/1] [U_]

unused devices: <none>

Notice that it no longer lists the failed drive’s partitions, and that an underscore appears beside each U. This shows that only one drive is active in these arrays… a.k.a. we have no mirror. You’d better do something quickly.
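
In my case the dead drive had already dropped out of the arrays completely. If your failed partitions are still listed as faulty members, mark them failed and remove them before pulling the hardware; a sketch, assuming the dying drive is /dev/sdb:

[root@server] [~]# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
[root@server] [~]# mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
[root@server] [~]# mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3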

The program that shows us the state of the RAID arrays is “mdadm”. We use “mdadm -D” to view the details of an array.

[root@server] [~]# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Mon Mar  5 05:12:34 2007
Raid Level : raid1
Array Size : 104320 (101.89 MiB 106.82 MB)
Device Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Sat Dec 20 14:46:50 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

UUID : 7524fa5f:514b0bd4:f3f5652f:cd1fa7b9
Events : 0.7700

Number   Major   Minor   RaidDevice State
0       8        1        0      active sync   /dev/sda1
1       0        0        -      removed

As this shows, we currently have only one drive in the array. I already knew that /dev/sdb was the other member of the RAID array, but if you don’t, you can look at /etc/raidtab to see how the RAID was defined.
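
If there is no /etc/raidtab (setups built directly with mdadm keep their configuration in /etc/mdadm.conf instead), you can also ask md itself which devices belong to which array:

[root@server] [~]# mdadm --detail --scan
[root@server] [~]# mdadm --examine /dev/sda1

The --examine output reads the RAID superblock on the member partition and shows the array it belongs to.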

To get the mirrored drives working properly again, we need to run “fdisk” to see what partitions are on the working drive: /dev/sda.

[root@server] [~]# fdisk /dev/sda

The number of cylinders for this disk is set to 9039.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 74.3 GB, 74355769344 bytes
255 heads, 63 sectors/track, 9039 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14         535     4192965   fd  Linux raid autodetect
/dev/sda3             536        9039    68308380   fd  Linux raid autodetect

Command (m for help):

Now we just have to duplicate that structure on the new blank drive: /dev/sdb. Use “n” to create the partitions, “t” to change their type to “fd” to match, and “a” to toggle the bootable flag on partition 1 so it matches /dev/sda1. Remember to use “w” to save the changes and exit fdisk; “q” will quit without saving anything.

[root@server] [~]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 9039.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-9039, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-9039, default 9039): 13

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-9039, default 14): 14
Last cylinder or +size or +sizeM or +sizeK (14-9039, default 9039): 535

Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (536-9039, default 536): 536
Last cylinder or +size or +sizeM or +sizeK (536-9039, default 9039): 9039

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): L

0  Empty           1e  Hidden W95 FAT1 75  PC/IX           be  Solaris boot
1  FAT12           24  NEC DOS         80  Old Minix       bf  Solaris
2  XENIX root      39  Plan 9          81  Minix / old Lin c1  DRDOS/sec (FAT-
3  XENIX usr       3c  PartitionMagic  82  Linux swap      c4  DRDOS/sec (FAT-
4  FAT16 <32M      40  Venix 80286     83  Linux           c6  DRDOS/sec (FAT-
5  Extended        41  PPC PReP Boot   84  OS/2 hidden C:  c7  Syrinx
6  FAT16           42  SFS             85  Linux extended  da  Non-FS data
7  HPFS/NTFS       4d  QNX4.x          86  NTFS volume set db  CP/M / CTOS / .
8  AIX             4e  QNX4.x 2nd part 87  NTFS volume set de  Dell Utility
9  AIX bootable    4f  QNX4.x 3rd part 8e  Linux LVM       df  BootIt
a  OS/2 Boot Manag 50  OnTrack DM      93  Amoeba          e1  DOS access
b  W95 FAT32       51  OnTrack DM6 Aux 94  Amoeba BBT      e3  DOS R/O
c  W95 FAT32 (LBA) 52  CP/M            9f  BSD/OS          e4  SpeedStor
e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a0  IBM Thinkpad hi eb  BeOS fs
f  W95 Ext'd (LBA) 54  OnTrackDM6      a5  FreeBSD         ee  EFI GPT
10  OPUS            55  EZ-Drive        a6  OpenBSD         ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a7  NeXTSTEP        f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a8  Darwin UFS      f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       a9  NetBSD          f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys ab  Darwin boot     f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fd  Linux raid auto
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fe  LANstep
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid ff  BBT
1c  Hidden W95 FAT3
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): a
Partition number (1-4): 1

Command (m for help): p

Disk /dev/sdb: 74.3 GB, 74355769344 bytes
255 heads, 63 sectors/track, 9039 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14         535     4192965   fd  Linux raid autodetect
/dev/sdb3             536        9039    68308380   fd  Linux raid autodetect

Command (m for help): w

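Before touching the arrays, it doesn’t hurt to confirm that the kernel sees the new partitions; either of these will do:

[root@server] [~]# fdisk -l /dev/sdb
[root@server] [~]# cat /proc/partitions
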
Once this is done, we use “mdadm” to add the new drive’s partitions to the arrays. As we add them, md copies the data from the existing drive to the new drive automatically. (The command “raidhotadd” from the old raidtools package will also work if you have it installed; its syntax is “raidhotadd /dev/md0 /dev/sdb1”.)

[root@server] [~]# mdadm /dev/md0 -a /dev/sdb1
mdadm: hot added /dev/sdb1
[root@server] [~]# mdadm /dev/md1 -a /dev/sdb2
mdadm: hot added /dev/sdb2
[root@server] [~]# mdadm /dev/md2 -a /dev/sdb3
mdadm: hot added /dev/sdb3

The rebuilding can be viewed in /proc/mdstat.
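
To keep an eye on it without retyping the command, watch works nicely (assuming it is installed):

[root@server] [~]# watch -n 5 cat /proc/mdstat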

md0, the smallest array, has already completed rebuilding (UU), while md1 has only just begun:

[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[2] sda2[0]
4192896 blocks [2/1] [U_]
[===>.................]  recovery = 16.7% (704448/4192896) finish=0.7min speed=78272K/sec

md2 : active raid1 sdb3[2] sda3[0]
68308288 blocks [2/1] [U_]
resync=DELAYED

md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

unused devices: <none>

md1 is finished… now md2 is rebuilding. md2 is the largest array and will take about 15 minutes:

[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
4192896 blocks [2/2] [UU]

md2 : active raid1 sdb3[2] sda3[0]
68308288 blocks [2/1] [U_]
[===>.................]  recovery = 16.4% (11240768/68308288) finish=12.3min speed=76878K/sec

md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

unused devices: <none>
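
(An aside while we wait: if a rebuild ever crawls, the kernel’s resync speed limits live in /proc/sys/dev/raid/. Raising the minimum forces md to rebuild faster even when other I/O is going on, at the cost of that other I/O:)

[root@server] [~]# cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
[root@server] [~]# echo 50000 > /proc/sys/dev/raid/speed_limit_min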

Finished:

[root@server] [~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
4192896 blocks [2/2] [UU]

md2 : active raid1 sdb3[2] sda3[0]
68308288 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]

unused devices: <none>

Now reboot, and your md mirror will be working just as it was before the drive failure.
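
One last caveat: md mirrors the partition contents, not the master boot record, so if the failed drive carried a copy of the boot loader, put one on the replacement as well. With GRUB legacy (the likely boot loader on a box of this vintage) that is roughly:

[root@server] [~]# grub-install /dev/sdb

Otherwise, if /dev/sda ever dies, the machine may not boot from the surviving drive.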


Reader Feedback

Responses to “HOWTO: Rebuild failed Linux software RAID arrays”

  1. Luigi says:

    Hi, you can also copy the partition table to the new disk with the following command:

    sfdisk -d /dev/sda | sfdisk /dev/sdb

    where /dev/sda is the good drive and /dev/sdb is the replaced one.

