RAID5 Recovery on 2010/05/08
Target
- 500GB x 6 RAID5 array
Timeline
Cause
- With one of the six RAID5 member disks mistakenly left connected, an OS was installed onto a different disk, and (probably the superblock of) that RAID5 member disk was wiped without anyone noticing
- sudo fdisk -l /dev/sdg
/dev/sdg1 1 60789 488287611 fd Linux raid autodetect
- sudo mdadm --examine /dev/sdg1
mdadm: No md superblock detected on /dev/sdg1.
- cat /proc/mdstat
md2 : inactive sdj1[4](S) sdh1[5](S) sdd1[1](S) sdf1[0](S) sdi1[3](S) 2441437440 blocks
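- Note: to confirm which members still carry valid md metadata, one could dump each member's superblock and compare event counts. A minimal sketch, assuming the member partitions seen in this log (sdd1, sdf1, sdg1, sdh1, sdi1, sdj1):
  # print the event count and state recorded in each member's superblock
  for d in /dev/sd[dfghij]1; do
      echo "== $d"
      sudo mdadm --examine "$d" | grep -E 'Events|State'
  done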
Degraded start
- Degraded start (5/6) with mdadm --run
- sudo mdadm --run /dev/md2 --verbose
mdadm: started /dev/md2
- cat /proc/mdstat
md2 : active raid5 sdj1[4] sdh1[5] sdd1[1] sdf1[0] sdi1[3] 2441437440 blocks level 5, 64k chunk, algorithm 2 [6/5] [UU_UUU]
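- Note: an equivalent way to bring an inactive array up in degraded mode is to stop it and reassemble with --run. A sketch, assuming the five surviving members from the mdstat above:
  sudo mdadm --stop /dev/md2
  sudo mdadm --assemble --run /dev/md2 /dev/sdd1 /dev/sdf1 /dev/sdh1 /dev/sdi1 /dev/sdj1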
- While running degraded, I/O errors occurred on yet another disk, and "hard resetting link" appeared on multiple disks
- => kern.log
- The SATA I/F and these disks appear to be a poor match; this had occurred from time to time before, but this time it was fatal
- cat /proc/mdstat
md2 : active raid5 sdj1[6](F) sdh1[7](F) sdd1[1] sdf1[0] sdi1[8](F) 2441437440 blocks level 5, 64k chunk, algorithm 2 [6/2] [UU____]
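- Note: to see which links were being reset while members were failing, the kernel log can be filtered. A sketch (the exact message text varies by kernel version):
  dmesg | grep -iE 'hard resetting link|I/O error'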
- Unable to restart the array
- sudo mdadm --run /dev/md2 --verbose
mdadm: failed to run array /dev/md2: Device or resource busy
OS reboot
- system reboot
- Rebooted the OS, but the array showed 2/6 failed and would not start
- cat /proc/mdstat
md2 : inactive sdc1[3](S) sdb1[5](S) sdd1[4](S) sdh1[1](S) sdj1[0](S) 2441437440 blocks
- sudo mdadm --run /dev/md2 --verbose
mdadm: failed to run array /dev/md2: Input/output error
- dmesg | tail -n 30
[ 78.297675] md: kicking non-fresh sdb1 from array!
[ 78.297685] md: unbind<sdb1>
[ 78.321292] md: export_rdev(sdb1)
[ 78.410382] raid5: device sdc1 operational as raid disk 3
[ 78.410384] raid5: device sdd1 operational as raid disk 4
[ 78.410386] raid5: device sdh1 operational as raid disk 1
[ 78.410388] raid5: device sdj1 operational as raid disk 0
[ 78.410861] raid5: allocated 6386kB for md2
[ 78.410907] 3: w=1 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
[ 78.410910] 4: w=2 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
[ 78.410912] 1: w=3 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
[ 78.410914] 0: w=4 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
[ 78.410916] raid5: not enough operational devices for md2 (2/6 failed)
[ 78.411098] RAID5 conf printout:
[ 78.411100] --- rd:6 wd:4
[ 78.411102] disk 0, o:1, dev:sdj1
[ 78.411103] disk 1, o:1, dev:sdh1
[ 78.411105] disk 3, o:1, dev:sdc1
[ 78.411107] disk 4, o:1, dev:sdd1
[ 78.411529] raid5: failed to run raid set md2
[ 78.411651] md: pers->run() failed ...
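- Note: "non-fresh" means the member's event count lags the rest of the array, so md rejects it during auto-assembly. Instead of re-adding by hand, the array could also be stopped and force-assembled. A sketch, assuming the device names after this reboot; --force accepts a stale member, at the risk of minor inconsistency in data written while it was out:
  sudo mdadm --stop /dev/md2
  sudo mdadm --assemble --force --run /dev/md2 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdh1 /dev/sdj1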
- /dev/sdb1 has disappeared from mdstat
- cat /proc/mdstat
md2 : inactive sda1[6](S) sdc1[3] sdd1[4] sdh1[1] sdj1[0] 2441437440 blocks
- Re-add /dev/sdb1 and restart the array
- sudo mdadm /dev/md2 -a /dev/sdb1
mdadm: re-added /dev/sdb1
- sudo mdadm --run /dev/md2 --verbose
mdadm: started /dev/md2
- cat kern.log
May 8 23:07:16 HOSTNAME kernel: [ 308.856084] md: bind<sdb1>
May 8 23:07:19 HOSTNAME kernel: [ 311.836915] raid5: device sdb1 operational as raid disk 5
May 8 23:07:19 HOSTNAME kernel: [ 311.836923] raid5: device sdc1 operational as raid disk 3
May 8 23:07:19 HOSTNAME kernel: [ 311.836929] raid5: device sdd1 operational as raid disk 4
May 8 23:07:19 HOSTNAME kernel: [ 311.836934] raid5: device sdh1 operational as raid disk 1
May 8 23:07:19 HOSTNAME kernel: [ 311.836939] raid5: device sdj1 operational as raid disk 0
May 8 23:07:19 HOSTNAME kernel: [ 311.838484] raid5: allocated 6386kB for md2
May 8 23:07:19 HOSTNAME kernel: [ 311.838789] 5: w=1 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
May 8 23:07:19 HOSTNAME kernel: [ 311.838796] 3: w=2 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
May 8 23:07:19 HOSTNAME kernel: [ 311.838801] 4: w=3 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
May 8 23:07:19 HOSTNAME kernel: [ 311.838807] 1: w=4 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
May 8 23:07:19 HOSTNAME kernel: [ 311.838812] 0: w=5 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
May 8 23:07:19 HOSTNAME kernel: [ 311.838818] raid5: raid level 5 set md2 active with 5 out of 6 devices, algorithm 2
May 8 23:07:19 HOSTNAME kernel: [ 311.852170] RAID5 conf printout:
May 8 23:07:19 HOSTNAME kernel: [ 311.852174] --- rd:6 wd:5
May 8 23:07:19 HOSTNAME kernel: [ 311.852179] disk 0, o:1, dev:sdj1
May 8 23:07:19 HOSTNAME kernel: [ 311.852184] disk 1, o:1, dev:sdh1
May 8 23:07:19 HOSTNAME kernel: [ 311.852188] disk 3, o:1, dev:sdc1
May 8 23:07:19 HOSTNAME kernel: [ 311.852192] disk 4, o:1, dev:sdd1
May 8 23:07:19 HOSTNAME kernel: [ 311.852196] disk 5, o:1, dev:sdb1
May 8 23:07:19 HOSTNAME kernel: [ 311.852306] md2: detected capacity change from 0 to 2500031938560
May 8 23:07:19 HOSTNAME kernel: [ 311.852642] md2:RAID5 conf printout:
May 8 23:07:19 HOSTNAME kernel: [ 311.853380] --- rd:6 wd:5
May 8 23:07:19 HOSTNAME kernel: [ 311.853386] disk 0, o:1, dev:sdj1
May 8 23:07:19 HOSTNAME kernel: [ 311.853390] disk 1, o:1, dev:sdh1
May 8 23:07:19 HOSTNAME kernel: [ 311.853394] disk 2, o:1, dev:sda1
May 8 23:07:19 HOSTNAME kernel: [ 311.853398] disk 3, o:1, dev:sdc1
May 8 23:07:19 HOSTNAME kernel: [ 311.853402] disk 4, o:1, dev:sdd1
May 8 23:07:19 HOSTNAME kernel: [ 311.853406] disk 5, o:1, dev:sdb1
May 8 23:07:19 HOSTNAME kernel: [ 311.853513] unknown partition table
May 8 23:07:19 HOSTNAME kernel: [ 311.855855] md: recovery of RAID array md2
May 8 23:07:19 HOSTNAME kernel: [ 311.855863] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
May 8 23:07:19 HOSTNAME kernel: [ 311.855868] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
May 8 23:07:19 HOSTNAME kernel: [ 311.855883] md: using 128k window, over a total of 488287488 blocks.
- cat /proc/mdstat
md2 : active raid5 sdb1[5] sda1[6] sdc1[3] sdd1[4] sdh1[1] sdj1[0]
      2441437440 blocks level 5, 64k chunk, algorithm 2 [6/5] [UU_UUU]
      [>....................] recovery = 0.0% (117376/488287488) finish=415.8min speed=19562K/sec
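- Note: the rebuild can be watched, and its speed floor raised if it crawls, via the md sysctls. A sketch (values in KB/s):
  watch -n 10 cat /proc/mdstat
  echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_min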
- Disk I/O errors occurred once more, and "hard resetting link" appeared on multiple disks again
- => kern.log
Physical relocation
- Gave up on the problematic SATA I/F and reconnected all six disks to a different M/B
- cat /proc/mdstat
md2 : inactive sdd1[5](S) sdg1[7](S) sdf1[0](S) sde1[4](S) sdc1[1](S) sdh1[3](S) 2929724928 blocks
- This time 4/6 failed and the array could not be restarted
- sudo mdadm --run /dev/md2
mdadm: failed to run array /dev/md2: Input/output error
- dmesg | tail -n 30
[ 128.378868] md: kicking non-fresh sdd1 from array!
[ 128.378876] md: unbind<sdd1>
[ 128.400016] md: export_rdev(sdd1)
[ 128.400096] md: kicking non-fresh sde1 from array!
[ 128.400101] md: unbind<sde1>
[ 128.430012] md: export_rdev(sde1)
[ 128.430082] md: kicking non-fresh sdh1 from array!
[ 128.430087] md: unbind<sdh1>
[ 128.500012] md: export_rdev(sdh1)
[ 128.564040] raid5: device sdf1 operational as raid disk 0
[ 128.564043] raid5: device sdc1 operational as raid disk 1
[ 128.564449] raid5: allocated 6386kB for md2
[ 128.564469] 0: w=1 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
[ 128.564471] 1: w=2 pa=0 pr=6 m=1 a=2 r=6 op1=0 op2=0
[ 128.564472] raid5: not enough operational devices for md2 (4/6 failed)
[ 128.564720] RAID5 conf printout:
[ 128.564722] --- rd:6 wd:2
[ 128.564723] disk 0, o:1, dev:sdf1
[ 128.564725] disk 1, o:1, dev:sdc1
[ 128.564917] raid5: failed to run raid set md2
[ 128.565090] md: pers->run() failed ...
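- Note: after the move to the other M/B the kernel handed out different sdX names, which makes it easy to re-add the wrong disk. Stable identifiers help here; a sketch, assuming the member partitions from the mdstat above:
  ls -l /dev/disk/by-id/    # serial-number symlinks point at the current sdX names
  for d in /dev/sd[cdefgh]1; do
      echo "== $d"
      sudo mdadm --examine "$d" | grep -E 'UUID|Events'
  done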
- Re-add the devices that were unbound (members still attached to the array report "Device or resource busy" and must first be hot-removed with -r before they can be re-added, as with /dev/sdg1 below)
- sudo mdadm /dev/md2 -a /dev/sdc1
mdadm: Cannot open /dev/sdc1: Device or resource busy
- sudo mdadm /dev/md2 -a /dev/sdd1
mdadm: re-added /dev/sdd1
- sudo mdadm /dev/md2 -a /dev/sde1
mdadm: re-added /dev/sde1
- sudo mdadm /dev/md2 -a /dev/sdf1
mdadm: Cannot open /dev/sdf1: Device or resource busy
- sudo mdadm /dev/md2 -a /dev/sdg1
mdadm: Cannot open /dev/sdg1: Device or resource busy
- sudo mdadm /dev/md2 -a /dev/sdh1
mdadm: re-added /dev/sdh1
- sudo mdadm /dev/md2 -a /dev/sdi1
mdadm: cannot find /dev/sdi1: No such file or directory
- sudo mdadm /dev/md2 -r /dev/sdg1
mdadm: hot removed /dev/sdg1
- sudo mdadm /dev/md2 -a /dev/sdg1
mdadm: re-added /dev/sdg1
- Restart the array
- sudo mdadm --run /dev/md2
mdadm: started /dev/md2
- rebuilding
- cat /proc/mdstat
md2 : active raid5 sdg1[7] sdh1[3] sde1[4] sdd1[5] sdf1[0] sdc1[1]
      2441437440 blocks level 5, 64k chunk, algorithm 2 [6/5] [UU_UUU]
      [>....................] recovery = 1.0% (5284864/488287488) finish=128.6min speed=62558K/sec
- rebuild completed
- cat /proc/mdstat
md2 : active raid5 sdg1[2] sdh1[3] sde1[4] sdd1[5] sdf1[0] sdc1[1] 2441437440 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
- => kern.log
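- Note: after a rebuild like this it is worth persisting the array definition and scheduling a consistency check. A sketch (config path assumes a Debian/Ubuntu-style layout):
  sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
  echo check | sudo tee /sys/block/md2/md/sync_action    # kick off a parity scrub
  cat /sys/block/md2/md/mismatch_cnt                     # inspect the result afterwards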