
== corosync ==
* [http://web.archiveorange.com/archive/v/yYk4BF4JNlUlPQSLhaMo corosync ring marked FAULTY - administrative intervention required - Open SA Forum AIS Services mailing list - ArchiveOrange]

=== interface FAULTY ===
* In a redundant configuration with multiple NICs, when one interface goes down its problem counter starts incrementing toward the threshold (10 by default). Once the counter reaches the threshold, the interface is marked FAULTY and is no longer used.
{{{
Jul 1 18:10:00 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 850 iface 172.16.0.209 to [1 of 10]
Jul 1 18:10:00 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 852 iface 172.16.0.209 to [2 of 10]

(snip)

Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 876 iface 172.16.0.209 to [9 of 10]
Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 878 iface 172.16.0.209 to [10 of 10]
Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Marking seqid 878 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
}}}
* Once an interface has been marked FAULTY, it stays that way even after the link comes back; it has to be re-enabled manually with corosync-cfgtool -r.
* mitty@debian-hab:~$ sudo corosync-cfgtool -s
{{{
Printing ring status.
Local node ID 1358997696
RING ID 0
id = 192.168.0.209
status = ring 0 active with no faults
RING ID 1
id = 172.16.0.209
status = Marking seqid 24 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
}}}
* mitty@debian-hab:~$ sudo corosync-cfgtool -r
{{{
Re-enabling all failed rings.
}}}
* mitty@debian-hab:~$ sudo corosync-cfgtool -s
{{{
Printing ring status.
Local node ID 1358997696
RING ID 0
id = 192.168.0.209
status = ring 0 active with no faults
RING ID 1
id = 172.16.0.209
status = ring 1 active with no faults
}}}
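
The manual check above could be scripted. The following is only a minimal sketch: it assumes the `corosync-cfgtool -s` output keeps the layout shown in these notes, and the function names `parse_ring_status` and `faulty_rings` are made up for illustration. It parses captured text rather than calling the tool, so it can be tried without a cluster.

```python
import re

# Sample text copied from the `corosync-cfgtool -s` output shown in these notes.
SAMPLE = """\
Printing ring status.
Local node ID 1358997696
RING ID 0
id = 192.168.0.209
status = ring 0 active with no faults
RING ID 1
id = 172.16.0.209
status = Marking seqid 24 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
"""


def parse_ring_status(text):
    """Return {ring_id: {"id": address, "status": text}} (hypothetical helper)."""
    rings = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"RING ID (\d+)$", line)
        if header:
            current = int(header.group(1))
            rings[current] = {}
            continue
        # "key = value" lines belong to the most recent "RING ID n" header.
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            rings[current][key.strip()] = value.strip()
    return rings


def faulty_rings(rings):
    """Return the sorted ring ids whose status line mentions FAULTY."""
    return sorted(r for r, info in rings.items() if "FAULTY" in info.get("status", ""))


if __name__ == "__main__":
    rings = parse_ring_status(SAMPLE)
    for ring in faulty_rings(rings):
        print(f"ring {ring} ({rings[ring]['id']}) is FAULTY; re-enable with: corosync-cfgtool -r")
```

In practice the text would come from running `sudo corosync-cfgtool -s`; whether to run `corosync-cfgtool -r` automatically is a separate decision, since the link may still be down.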

* The problem counter threshold can be changed with rrp_problem_count_threshold.
* /etc/corosync/corosync.conf
{{{
totem {

	# (snip)

	rrp_problem_count_threshold: 1000
}
}}}
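
To confirm which threshold a node is actually configured with, the value can be read back from the file. This is a rough sketch only: the helper name `problem_count_threshold` is hypothetical, it does a plain text search instead of parsing the `totem { }` section properly, and it falls back to the default of 10 mentioned above.

```python
import re

DEFAULT_THRESHOLD = 10  # default value stated in these notes

# Matches an uncommented "rrp_problem_count_threshold: N" line.
_PATTERN = re.compile(r"^\s*rrp_problem_count_threshold\s*:\s*(\d+)\s*$", re.MULTILINE)


def problem_count_threshold(conf_text):
    """Return the configured threshold, or the default when the key is absent."""
    match = _PATTERN.search(conf_text)
    return int(match.group(1)) if match else DEFAULT_THRESHOLD


if __name__ == "__main__":
    # Path taken from the notes; reading it may require appropriate permissions.
    with open("/etc/corosync/corosync.conf") as conf:
        print(problem_count_threshold(conf.read()))
```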

* With two interfaces, if both go down, the problem counter stops incrementing, and the ring then (for some reason) returns to "no faults".
* ring 1 goes down first, then ring 0
{{{
Jul 1 18:30:21 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8312 iface 172.16.0.209 to [1 of 1000]
Jul 1 18:30:22 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8314 iface 172.16.0.209 to [2 of 1000]
Jul 1 18:30:22 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8316 iface 172.16.0.209 to [3 of 1000]
Jul 1 18:30:23 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [2 of 1000]
Jul 1 18:30:23 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8318 iface 172.16.0.209 to [3 of 1000]
Jul 1 18:30:24 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8320 iface 172.16.0.209 to [4 of 1000]
Jul 1 18:30:25 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [3 of 1000]
Jul 1 18:30:25 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8322 iface 172.16.0.209 to [4 of 1000]
Jul 1 18:30:26 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8324 iface 172.16.0.209 to [5 of 1000]
Jul 1 18:30:27 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8326 iface 172.16.0.209 to [6 of 1000]
Jul 1 18:30:27 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [5 of 1000]
Jul 1 18:30:27 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8328 iface 172.16.0.209 to [6 of 1000]
Jul 1 18:30:29 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [5 of 1000]
Jul 1 18:30:31 debian-hab corosync[1424]: [TOTEM ] A processor failed, forming new configuration. <---- the second link also goes down here; the node becomes UNCLEAN (offline)
Jul 1 18:30:31 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [4 of 1000]
Jul 1 18:30:33 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [3 of 1000]

(snip)

Jul 1 18:30:35 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [2 of 1000]
Jul 1 18:30:37 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [1 of 1000]
Jul 1 18:30:39 debian-hab corosync[1424]: [TOTEM ] ring 1 active with no faults
}}}
1. When either link comes back, the problem counter for the link that is still down starts incrementing again.
* ring 1 comes back
{{{
Jul 1 18:34:56 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 2 iface 192.168.0.209 to [1 of 1000]
Jul 1 18:34:57 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 4 iface 192.168.0.209 to [2 of 1000]
Jul 1 18:34:58 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 6 iface 192.168.0.209 to [3 of 1000]

(snip)

Jul 1 18:35:02 debian-hab pengine: [1435]: info: determine_online_status: Node debian-haa is online
Jul 1 18:35:02 debian-hab pengine: [1435]: info: determine_online_status: Node debian-hab is online

(snip)

Jul 1 18:35:09 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 42 iface 192.168.0.209 to [15 of 1000]
Jul 1 18:35:09 debian-hab corosync[1424]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 1 18:35:09 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 44 iface 192.168.0.209 to [16 of 1000]
}}}
* When ring 0 also comes back, the ring returns to normal, just as when a single link goes down and recovers before the problem counter reaches the threshold.
{{{
Jul 1 18:35:19 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 76 iface 192.168.0.209 to [27 of 1000]
Jul 1 18:35:21 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [26 of 1000]

(snip)

Jul 1 18:36:09 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [2 of 1000]
Jul 1 18:36:11 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [1 of 1000]
Jul 1 18:36:13 debian-hab corosync[1424]: [TOTEM ] ring 0 active with no faults
}}}
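
The counter behaviour in the logs above is easier to follow when the TOTEM lines are replayed into a per-interface summary. The sketch below is an illustration, not part of corosync: the regular expressions are written against the message wording seen in these logs only (other corosync versions may word them differently), and `replay` is a hypothetical helper name.

```python
import re

# Patterns derived from the log lines quoted in these notes.
COUNTER = re.compile(
    r"(?:Incrementing|Decrementing) problem counter for (?:seqid \d+ )?"
    r"iface (\S+) to \[(\d+) of (\d+)\]"
)
FAULTY = re.compile(r"Marking seqid \d+ ringid \d+ interface (\S+) FAULTY")
NO_FAULTS = re.compile(r"ring (\d+) active with no faults")

# Sample lines copied from the logs in these notes.
LOG = [
    "Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 876 iface 172.16.0.209 to [9 of 10]",
    "Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 878 iface 172.16.0.209 to [10 of 10]",
    "Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Marking seqid 878 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.",
    "Jul 1 18:36:11 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [1 of 1000]",
    "Jul 1 18:36:13 debian-hab corosync[1424]: [TOTEM ] ring 0 active with no faults",
]


def _entry(counters, iface):
    """Fetch or create the summary record for one interface."""
    return counters.setdefault(iface, {"count": 0, "threshold": None, "faulty": False})


def replay(lines):
    """Return (per-interface counters, set of ring ids reported as 'no faults')."""
    counters = {}
    cleared = set()
    for line in lines:
        m = COUNTER.search(line)
        if m:
            entry = _entry(counters, m.group(1))
            entry["count"] = int(m.group(2))
            entry["threshold"] = int(m.group(3))
            continue
        m = FAULTY.search(line)
        if m:
            _entry(counters, m.group(1))["faulty"] = True
            continue
        m = NO_FAULTS.search(line)
        if m:
            # These messages name the ring, not the interface address.
            cleared.add(int(m.group(1)))
    return counters, cleared


if __name__ == "__main__":
    counters, cleared = replay(LOG)
    for iface, info in counters.items():
        print(iface, info)
    print("rings reported with no faults:", sorted(cleared))
```

Because the "no faults" message carries only the ring number, mapping it back to an interface address would need the `corosync-cfgtool -s` output as well.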