[[PageOutline]]

= failure test =

== initial state ==

 * cat /proc/mdstat
{{{
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0] sdc1[2](S) sdb2[1]
      1461760 blocks [2/2] [UU]

md0 : active raid0 sda1[0] sdb1[1]
      192512 blocks 64k chunks

unused devices: <none>
}}}
 * sudo mdadm -D /dev/md1
{{{
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Jun 11 22:27:07 2009
     Raid Level : raid1
     Array Size : 1461760 (1427.74 MiB 1496.84 MB)
  Used Dev Size : 1461760 (1427.74 MiB 1496.84 MB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jun 12 02:21:57 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

           UUID : 79c3f25a:e47a02f3:9bb41f43:b191cd1a
         Events : 0.5

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

       2       8       33        -      spare   /dev/sdc1
}}}

== hot-remove spare disk ==

 * sudo mdadm /dev/md1 -r /dev/sdc1
{{{
mdadm: hot removed /dev/sdc1
}}}
 * sudo mdadm -D /dev/md1
{{{
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
}}}
 * sudo mdadm /dev/md1 -a /dev/sdc1
{{{
mdadm: re-added /dev/sdc1
}}}
 * sudo mdadm -D /dev/md1
{{{
    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

       2       8       33        -      spare   /dev/sdc1
}}}

== fail, remove and re-add ==

 * man mdadm
{{{
MANAGE MODE
       Usage: mdadm device options... devices...

       This usage will allow individual devices in an array to be failed,
       removed or added. It is possible to perform multiple operations with
       one command. For example:
             mdadm /dev/md0 -f /dev/hda1 -r /dev/hda1 -a /dev/hda1
       will firstly mark /dev/hda1 as faulty in /dev/md0 and will then
       remove it from the array and finally add it back in as a spare.
       However only one md array can be affected by a single command.
}}}
 * sudo mdadm /dev/md1 -f /dev/sda2 -r /dev/sda2 -a /dev/sda2
{{{
mdadm: set /dev/sda2 faulty in /dev/md1
mdadm: hot removed /dev/sda2
mdadm: re-added /dev/sda2
}}}
 * cat /proc/mdstat
{{{
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[2](S) sdc1[3] sdb2[1]
      1461760 blocks [2/1] [_U]
      [>....................]  recovery =  0.5% (8448/1461760) finish=2.8min speed=8448K/sec

md0 : active raid0 sda1[0] sdb1[1]
      192512 blocks 64k chunks

unused devices: <none>
}}}
 * sudo mdadm -D /dev/md1
{{{
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Jun 11 22:27:07 2009
     Raid Level : raid1
     Array Size : 1461760 (1427.74 MiB 1496.84 MB)
  Used Dev Size : 1461760 (1427.74 MiB 1496.84 MB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jun 12 02:28:20 2009
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 2

 Rebuild Status : 8% complete

           UUID : 79c3f25a:e47a02f3:9bb41f43:b191cd1a
         Events : 0.17

    Number   Major   Minor   RaidDevice State
       3       8       33        0      spare rebuilding   /dev/sdc1
       1       8       18        1      active sync   /dev/sdb2

       2       8        2        -      spare   /dev/sda2
}}}

=== recovering ===

 * dmesg
{{{
[ 6932.978126] raid1: Disk failure on sda2, disabling device.
[ 6932.978134] Operation continuing on 1 devices
[ 6932.994568] RAID1 conf printout:
[ 6932.994603]  --- wd:1 rd:2
[ 6932.995107]  disk 0, wo:1, o:0, dev:sda2
[ 6932.995127]  disk 1, wo:0, o:1, dev:sdb2
[ 6933.022454] RAID1 conf printout:
[ 6933.022462]  --- wd:1 rd:2
[ 6933.022476]  disk 1, wo:0, o:1, dev:sdb2
[ 6933.056753] RAID1 conf printout:
[ 6933.056758]  --- wd:1 rd:2
[ 6933.056762]  disk 0, wo:1, o:1, dev:sdc1
[ 6933.056764]  disk 1, wo:0, o:1, dev:sdb2
[ 6933.061349] md: recovery of RAID array md1
[ 6933.061379] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 6933.061403] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 6933.062332] md: using 128k window, over a total of 1461760 blocks.
[ 6933.173690] md: unbind<sda2>
[ 6933.173708] md: export_rdev(sda2)
[ 6933.945635] md: bind<sda2>
[ 7066.932827] md: md1: recovery done.
[ 7067.055469] RAID1 conf printout:
[ 7067.055475]  --- wd:2 rd:2
[ 7067.055479]  disk 0, wo:0, o:1, dev:sdc1
[ 7067.055481]  disk 1, wo:0, o:1, dev:sdb2
}}}
 * cat /proc/mdstat
{{{
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[2](S) sdc1[0] sdb2[1]
      1461760 blocks [2/2] [UU]

md0 : active raid0 sda1[0] sdb1[1]
      192512 blocks 64k chunks

unused devices: <none>
}}}
 * sudo mdadm -D /dev/md1
{{{
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Jun 11 22:27:07 2009
     Raid Level : raid1
     Array Size : 1461760 (1427.74 MiB 1496.84 MB)
  Used Dev Size : 1461760 (1427.74 MiB 1496.84 MB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jun 12 02:31:09 2009
          State : active
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

           UUID : 79c3f25a:e47a02f3:9bb41f43:b191cd1a
         Events : 0.45

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       18        1      active sync   /dev/sdb2

       2       8        2        -      spare   /dev/sda2
}}}

== reboot with degraded 1 disk ==

 * sudo mdadm /dev/md1 -f /dev/sdc1 /dev/sda2
{{{
mdadm: set /dev/sdc1 faulty in /dev/md1
mdadm: set /dev/sda2 faulty in /dev/md1
}}}
 * sudo mdadm /dev/md1 -r /dev/sdc1 /dev/sda2
{{{
mdadm: hot removed /dev/sdc1
mdadm: hot removed /dev/sda2
}}}
 * sudo mdadm -D /dev/md1
{{{
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
}}}
 * sudo reboot
 * the system boots normally from /dev/sdb2 alone

= hardware failure test =

== initial state ==

 * cat /proc/mdstat
{{{
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0] sdc1[2](S) sdb2[1]
      1461760 blocks [2/2] [UU]

md0 : active raid0 sda1[0] sdb1[1]
      192512 blocks 64k chunks

unused devices: <none>
}}}

== disconnect /dev/sdb and reboot ==
 * attachment:ubuntu-raid-1.png
 * the boot hangs in this state for about 4 minutes (waiting for a timeout?)
   * [https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/67299 Bug #67299 in mdadm (Ubuntu): “mdadm causes boot to hang for 4 minutes”]
 * because sdb was physically disconnected, the disk that was recognized as sdc before the reboot is now recognized as sdb
   * this is why the log shows `[ 22.498171] sdb: sdb1`
   * perhaps because the disks are attached as VMware virtual SCSI devices?
 * attachment:ubuntu-raid-2.png
 * a choice is displayed for about 10 seconds
 * attachment:ubuntu-raid-3.png
 * by default, the boot drops to a BusyBox shell
 * to boot automatically even when degraded => wiki:TipAndDoc/RAID#dpkg-reconfiguremdadm
 * ~~if GRUB is used as the bootloader, GRUB must also be installed in the MBR of the mirror disk~~
   * see wiki:TipAndDoc/RAID#bootloader
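The dpkg-reconfigure step referenced above can be sketched roughly as follows. This is a hedged sketch for the Ubuntu release used in this test: the variable name `BOOT_DEGRADED` and the file path are assumptions based on the Ubuntu mdadm packaging of this era, so verify them on the actual system; the setting lives in the initramfs configuration, which is why the initramfs has to be regenerated afterwards.

```shell
# Hedged sketch (Ubuntu of this era): allow booting from a degraded array
# instead of dropping to the BusyBox shell. BOOT_DEGRADED and the conf.d
# path are assumptions from the Ubuntu mdadm package; check your system.
sudo dpkg-reconfigure mdadm      # answer "yes" to the "boot degraded?" question

# The choice should then be recorded for the initramfs (assumed location):
grep BOOT_DEGRADED /etc/initramfs-tools/conf.d/mdadm

# Regenerate the initramfs so the setting takes effect at the next boot.
sudo update-initramfs -u
```

After this, disconnecting a disk as above should leave the remaining mirror half bootable without the 10-second prompt ending in BusyBox.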