wiki:TipAndDoc/storage/RAID/failure


failure test

Initial state

  • cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md1 : active raid1 sda2[0] sdc1[2](S) sdb2[1]
          1461760 blocks [2/2] [UU]
    
    md0 : active raid0 sda1[0] sdb1[1]
          192512 blocks 64k chunks
    
    unused devices: <none>
    
  • sudo mdadm -D /dev/md1
    /dev/md1:
            Version : 00.90.03
      Creation Time : Thu Jun 11 22:27:07 2009
         Raid Level : raid1
         Array Size : 1461760 (1427.74 MiB 1496.84 MB)
      Used Dev Size : 1461760 (1427.74 MiB 1496.84 MB)
       Raid Devices : 2
      Total Devices : 3
    Preferred Minor : 1
        Persistence : Superblock is persistent
    
        Update Time : Fri Jun 12 02:21:57 2009
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 1
    
               UUID : 79c3f25a:e47a02f3:9bb41f43:b191cd1a
             Events : 0.5
    
        Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       8       18        1      active sync   /dev/sdb2
    
           2       8       33        -      spare   /dev/sdc1
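
In the /proc/mdstat output above, [2/2] [UU] means both of the array's two slots hold working devices; a failed or missing slot shows as _ (for example [2/1] [_U] during the rebuild below). For reference, a two-disk RAID1 with one hot spare like this could be created with something like the following sketch (device names taken from this setup):

    sudo mdadm --create /dev/md1 --level=1 --raid-devices=2 \
        --spare-devices=1 /dev/sda2 /dev/sdb2 /dev/sdc1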
    

hot-remove spare disk

  • sudo mdadm /dev/md1 -r /dev/sdc1
    mdadm: hot removed /dev/sdc1
    
    • sudo mdadm -D /dev/md1
          Number   Major   Minor   RaidDevice State
             0       8        2        0      active sync   /dev/sda2
             1       8       18        1      active sync   /dev/sdb2
      
  • sudo mdadm /dev/md1 -a /dev/sdc1
    mdadm: re-added /dev/sdc1
    
    • sudo mdadm -D /dev/md1
          Number   Major   Minor   RaidDevice State
             0       8        2        0      active sync   /dev/sda2
             1       8       18        1      active sync   /dev/sdb2
      
             2       8       33        -      spare   /dev/sdc1
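
As an aside, the same health information can be checked from a script: per the mdadm man page, --detail combined with --test sets the exit status to reflect the state of the array. A minimal sketch (not part of the original test):

    # non-zero exit status indicates a degraded or failed array
    sudo mdadm --detail --test /dev/md1 > /dev/null
    echo $?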
      

fail, remove and re-add

  • man mdadm
    MANAGE MODE
           Usage: mdadm device options... devices...
    
           This usage will allow individual devices in  an  array  to  be  failed,
           removed  or  added.  It is possible to perform multiple operations with
           one command. For example:
             mdadm /dev/md0 -f /dev/hda1 -r /dev/hda1 -a /dev/hda1
           will firstly mark /dev/hda1 as faulty in /dev/md0 and will then  remove
           it  from the array and finally add it back in as a spare.  However only
           one md array can be affected by a single command.
    

  • sudo mdadm /dev/md1 -f /dev/sda2 -r /dev/sda2 -a /dev/sda2
    mdadm: set /dev/sda2 faulty in /dev/md1
    mdadm: hot removed /dev/sda2
    mdadm: re-added /dev/sda2
    
    • cat /proc/mdstat
      Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
      md1 : active raid1 sda2[2](S) sdc1[3] sdb2[1]
            1461760 blocks [2/1] [_U]
            [>....................]  recovery =  0.5% (8448/1461760) finish=2.8min speed=8448K/sec
      
      md0 : active raid0 sda1[0] sdb1[1]
            192512 blocks 64k chunks
      
      unused devices: <none>
      
    • sudo mdadm -D /dev/md1
      /dev/md1:
              Version : 00.90.03
        Creation Time : Thu Jun 11 22:27:07 2009
           Raid Level : raid1
           Array Size : 1461760 (1427.74 MiB 1496.84 MB)
        Used Dev Size : 1461760 (1427.74 MiB 1496.84 MB)
         Raid Devices : 2
        Total Devices : 3
      Preferred Minor : 1
          Persistence : Superblock is persistent
      
          Update Time : Fri Jun 12 02:28:20 2009
                State : clean, degraded, recovering
       Active Devices : 1
      Working Devices : 3
       Failed Devices : 0
        Spare Devices : 2
      
       Rebuild Status : 8% complete
      
                 UUID : 79c3f25a:e47a02f3:9bb41f43:b191cd1a
               Events : 0.17
      
          Number   Major   Minor   RaidDevice State
             3       8       33        0      spare rebuilding   /dev/sdc1
             1       8       18        1      active sync   /dev/sdb2
      
             2       8        2        -      spare   /dev/sda2
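
While the rebuild above is running, progress can be polled, or mdadm can be told to block until recovery finishes. A small sketch, assuming the same /dev/md1:

    watch -n 5 cat /proc/mdstat    # poll the recovery progress
    sudo mdadm --wait /dev/md1     # returns once resync/recovery is done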
      

recovering

  • dmesg
    [ 6932.978126] raid1: Disk failure on sda2, disabling device.
    [ 6932.978134]  Operation continuing on 1 devices
    [ 6932.994568] RAID1 conf printout:
    [ 6932.994603]  --- wd:1 rd:2
    [ 6932.995107]  disk 0, wo:1, o:0, dev:sda2
    [ 6932.995127]  disk 1, wo:0, o:1, dev:sdb2
    [ 6933.022454] RAID1 conf printout:
    [ 6933.022462]  --- wd:1 rd:2
    [ 6933.022476]  disk 1, wo:0, o:1, dev:sdb2
    [ 6933.056753] RAID1 conf printout:
    [ 6933.056758]  --- wd:1 rd:2
    [ 6933.056762]  disk 0, wo:1, o:1, dev:sdc1
    [ 6933.056764]  disk 1, wo:0, o:1, dev:sdb2
    [ 6933.061349] md: recovery of RAID array md1
    [ 6933.061379] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
    [ 6933.061403] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
    [ 6933.062332] md: using 128k window, over a total of 1461760 blocks.
    [ 6933.173690] md: unbind<sda2>
    [ 6933.173708] md: export_rdev(sda2)
    [ 6933.945635] md: bind<sda2>
    [ 7066.932827] md: md1: recovery done.
    [ 7067.055469] RAID1 conf printout:
    [ 7067.055475]  --- wd:2 rd:2
    [ 7067.055479]  disk 0, wo:0, o:1, dev:sdc1
    [ 7067.055481]  disk 1, wo:0, o:1, dev:sdb2
    
  • cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md1 : active raid1 sda2[2](S) sdc1[0] sdb2[1]
          1461760 blocks [2/2] [UU]
    
    md0 : active raid0 sda1[0] sdb1[1]
          192512 blocks 64k chunks
    
    unused devices: <none>
    
  • sudo mdadm -D /dev/md1
    /dev/md1:
            Version : 00.90.03
      Creation Time : Thu Jun 11 22:27:07 2009
         Raid Level : raid1
         Array Size : 1461760 (1427.74 MiB 1496.84 MB)
      Used Dev Size : 1461760 (1427.74 MiB 1496.84 MB)
       Raid Devices : 2
      Total Devices : 3
    Preferred Minor : 1
        Persistence : Superblock is persistent
    
        Update Time : Fri Jun 12 02:31:09 2009
              State : active
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 1
    
               UUID : 79c3f25a:e47a02f3:9bb41f43:b191cd1a
             Events : 0.45
    
        Number   Major   Minor   RaidDevice State
           0       8       33        0      active sync   /dev/sdc1
           1       8       18        1      active sync   /dev/sdb2
    
           2       8        2        -      spare   /dev/sda2
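
The speed figures in the dmesg output above come from kernel-wide tunables. A sketch of inspecting and temporarily raising them (the defaults are the values the kernel reported):

    cat /proc/sys/dev/raid/speed_limit_min    # 1000 (KB/sec/disk)
    cat /proc/sys/dev/raid/speed_limit_max    # 200000 (KB/sec)
    # raise the guaranteed minimum to speed up a rebuild
    echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_min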
    

reboot with the array degraded to a single disk

  • sudo mdadm /dev/md1 -f /dev/sdc1 /dev/sda2
    mdadm: set /dev/sdc1 faulty in /dev/md1
    mdadm: set /dev/sda2 faulty in /dev/md1
    
  • sudo mdadm /dev/md1 -r /dev/sdc1 /dev/sda2
    mdadm: hot removed /dev/sdc1
    mdadm: hot removed /dev/sda2
    
    • sudo mdadm -D /dev/md1
          Number   Major   Minor   RaidDevice State
             0       0        0        0      removed
             1       8       18        1      active sync   /dev/sdb2
      
  • sudo reboot
    • The system boots normally from /dev/sdb2 alone
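
To restore redundancy after this test, the removed partitions can simply be added back; the first one added should rebuild into the empty slot and the second become the spare again (same commands as in the earlier steps):

    sudo mdadm /dev/md1 -a /dev/sda2
    sudo mdadm /dev/md1 -a /dev/sdc1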

hardware failure test

Initial state

  • cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md1 : active raid1 sda2[0] sdc1[2](S) sdb2[1]
          1461760 blocks [2/2] [UU]
    
    md0 : active raid0 sda1[0] sdb1[1]
          192512 blocks 64k chunks
    
    unused devices: <none>
    

disconnect /dev/sdb and reboot

  • attachment:ubuntu-raid-1.png
    • Boot stalls in this state for about four minutes (waiting for a timeout?)
    • Because sdb was physically disconnected, the disk that was recognized as sdc before the reboot is now detected as sdb (see the mdadm.conf sketch at the end of this section)
      • This is why the log reads [ 22.498171] sdb: sdb1
      • Possibly because the disks are attached as VMware virtual SCSI devices?
  • attachment:ubuntu-raid-2.png
    • A boot menu of choices is displayed for about 10 seconds
  • attachment:ubuntu-raid-3.png
    • By default, the boot drops to a BusyBox shell
  • If GRUB is used as the bootloader, GRUB must also be installed in the MBR of the mirror disk (see the sketch below)
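
A hedged sketch of installing GRUB to the mirror's MBR as well (GRUB legacy, as on Ubuntu at the time; the target disk name is an assumption):

    # without this, the machine cannot boot if the disk holding
    # the original MBR is the one that fails or is disconnected
    sudo grub-install /dev/sdb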
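
On the device renaming noted above: mdadm assembles arrays by the UUID stored in the superblock rather than by device name, which is why the array survives sdc turning into sdb. A sketch of recording the UUID-based definitions in the config file (path as on Ubuntu):

    # appends ARRAY lines keyed by UUID, matching the UUID shown by mdadm -D
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf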
