Changes between Version 8 and Version 9 of TipAndDoc/HA


Timestamp:
Jul 1, 2011 6:45:24 PM (13 years ago)
Author:
mitty
Comment:

--

  • TipAndDoc/HA

    v8 v9  
    3535 * [http://www.srchack.org/article.php?story=20110211234742375 Corosync(Slackware 13.1) - @SRCHACK.ORG(えす・あーる・しー・はっく)] 
    3636 * [http://library.linode.com/linux-ha/highly-available-file-database-server-ubuntu-10.04 Build a Highly Available NFS/MySQL/PostgreSQL Server on Ubuntu 10.04 LTS (Lucid) – Linode Library] 
     37 
     38 == corosync == 
     39 * [http://web.archiveorange.com/archive/v/yYk4BF4JNlUlPQSLhaMo corosync ring marked FAULTY - administrative intervention required - Open SA Forum AIS Services mailing list - ArchiveOrange] 
     40 
     41 === interface FAULTY === 
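     42 * In a redundant configuration with multiple NICs, when one interface goes down its problem counter starts incrementing (the threshold is 10 by default). Once the counter reaches the threshold, the interface is marked FAULTY and is no longer used. 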
     43{{{ 
     44Jul  1 18:10:00 debian-hab corosync[1377]:   [TOTEM ] Incrementing problem counter for seqid 850 iface 172.16.0.209 to [1 of 10] 
     45Jul  1 18:10:00 debian-hab corosync[1377]:   [TOTEM ] Incrementing problem counter for seqid 852 iface 172.16.0.209 to [2 of 10] 
     46 
     47(snip) 
     48 
     49Jul  1 18:10:11 debian-hab corosync[1377]:   [TOTEM ] Incrementing problem counter for seqid 876 iface 172.16.0.209 to [9 of 10] 
     50Jul  1 18:10:11 debian-hab corosync[1377]:   [TOTEM ] Incrementing problem counter for seqid 878 iface 172.16.0.209 to [10 of 10] 
     51Jul  1 18:10:11 debian-hab corosync[1377]:   [TOTEM ] Marking seqid 878 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required. 
     52}}} 
     53 * Even if the link comes back after the interface has been marked FAULTY, the ring has to be re-enabled manually with corosync-cfgtool -r. 
     54  * mitty@debian-hab:~$ sudo corosync-cfgtool -s 
     55{{{ 
     56Printing ring status. 
     57Local node ID 1358997696 
     58RING ID 0 
     59        id      = 192.168.0.209 
     60        status  = ring 0 active with no faults 
     61RING ID 1 
     62        id      = 172.16.0.209 
     63        status  = Marking seqid 24 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required. 
     64}}} 
     65  * mitty@debian-hab:~$ sudo corosync-cfgtool -r 
     66{{{ 
     67Re-enabling all failed rings. 
     68}}} 
     69  * mitty@debian-hab:~$ sudo corosync-cfgtool -s 
     70{{{ 
     71Printing ring status. 
     72Local node ID 1358997696 
     73RING ID 0 
     74        id      = 192.168.0.209 
     75        status  = ring 0 active with no faults 
     76RING ID 1 
     77        id      = 172.16.0.209 
     78        status  = ring 1 active with no faults 
     79}}} 
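
The status check above could be automated. The following is a minimal sketch, assuming the output of `corosync-cfgtool -s` has the same layout as in the transcripts above; the function names `parse_ring_status` and `faulty_rings` are hypothetical helpers, not part of corosync.

```python
import re

def parse_ring_status(text):
    """Parse `corosync-cfgtool -s` style output into a list of ring dicts."""
    rings = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"RING ID (\d+)$", line)
        if m:
            # A new ring section starts here.
            current = {"ring": int(m.group(1)), "id": None, "status": None}
            rings.append(current)
            continue
        m = re.match(r"(id|status)\s*=\s*(.*)$", line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return rings

def faulty_rings(rings):
    """Return the ring numbers whose status line mentions FAULTY."""
    return [r["ring"] for r in rings if r["status"] and "FAULTY" in r["status"]]

# Sample text copied from the transcript above (log typo kept verbatim).
SAMPLE = """\
Printing ring status.
Local node ID 1358997696
RING ID 0
        id      = 192.168.0.209
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.209
        status  = Marking seqid 24 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
"""

if __name__ == "__main__":
    print(faulty_rings(parse_ring_status(SAMPLE)))
```

On a live node the text would presumably come from running `corosync-cfgtool -s` with root privileges; whether to call `corosync-cfgtool -r` automatically afterwards is a policy decision left to the administrator.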
     80 
     81 * The problem counter threshold can be changed with rrp_problem_count_threshold. 
     82  * /etc/corosync/corosync.conf 
     83{{{ 
     84totem { 
     85 
     86(snip) 
     87 
     88        rrp_problem_count_threshold: 1000 
     89}}} 
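
To confirm which threshold a node is actually configured with, the file can be scanned for the option. This is a rough sketch, assuming the simple `key: value` layout shown above; `read_rrp_threshold` is a hypothetical helper, and the `version` and `rrp_mode` lines in the sample are illustrative assumptions standing in for the snipped part of the real file.

```python
import re

def read_rrp_threshold(conf_text):
    """Return rrp_problem_count_threshold as an int, or None if it is not set."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore '#' comments
        m = re.match(r"rrp_problem_count_threshold\s*:\s*(\d+)$", line)
        if m:
            return int(m.group(1))
    return None

# Illustrative config text; only the threshold line comes from the notes above.
SAMPLE_CONF = """\
totem {
        version: 2
        rrp_mode: passive
        rrp_problem_count_threshold: 1000
}
"""

if __name__ == "__main__":
    print(read_rrp_threshold(SAMPLE_CONF))
```

A result of `None` means the option is absent, in which case the default threshold noted earlier (10) would apply.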
     90 
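     91 * With two interfaces configured, if both go down, the problem counter stops incrementing, decrements back down, and the ring then (for some reason) returns to "no faults". 
     92  * ring 1 goes down first, then ring 0 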
     93{{{ 
     94Jul  1 18:30:21 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8312 iface 172.16.0.209 to [1 of 1000] 
     95Jul  1 18:30:22 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8314 iface 172.16.0.209 to [2 of 1000] 
     96Jul  1 18:30:22 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8316 iface 172.16.0.209 to [3 of 1000] 
     97Jul  1 18:30:23 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [2 of 1000] 
     98Jul  1 18:30:23 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8318 iface 172.16.0.209 to [3 of 1000] 
     99Jul  1 18:30:24 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8320 iface 172.16.0.209 to [4 of 1000] 
     100Jul  1 18:30:25 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [3 of 1000] 
     101Jul  1 18:30:25 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8322 iface 172.16.0.209 to [4 of 1000] 
     102Jul  1 18:30:26 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8324 iface 172.16.0.209 to [5 of 1000] 
     103Jul  1 18:30:27 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8326 iface 172.16.0.209 to [6 of 1000] 
     104Jul  1 18:30:27 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [5 of 1000] 
     105Jul  1 18:30:27 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 8328 iface 172.16.0.209 to [6 of 1000] 
     106Jul  1 18:30:29 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [5 of 1000] 
     107Jul  1 18:30:31 debian-hab corosync[1424]:   [TOTEM ] A processor failed, forming new configuration.                                      <---- the second link also goes down here; the peer node becomes UNCLEAN (offline) 
     108Jul  1 18:30:31 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [4 of 1000] 
     109Jul  1 18:30:33 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [3 of 1000] 
     110 
     111(snip) 
     112 
     113Jul  1 18:30:35 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [2 of 1000] 
     114Jul  1 18:30:37 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [1 of 1000] 
     115Jul  1 18:30:39 debian-hab corosync[1424]:   [TOTEM ] ring 1 active with no faults 
     116}}} 
     117  1. When either link comes back, the problem counter for the link that is still down starts incrementing again. 
     118   * ring 1 comes back 
     119{{{ 
     120 
     121Jul  1 18:34:56 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 2 iface 192.168.0.209 to [1 of 1000] 
     122Jul  1 18:34:57 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 4 iface 192.168.0.209 to [2 of 1000] 
     123Jul  1 18:34:58 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 6 iface 192.168.0.209 to [3 of 1000] 
     124 
     125(snip) 
     126 
     127Jul  1 18:35:02 debian-hab pengine: [1435]: info: determine_online_status: Node debian-haa is online 
     128Jul  1 18:35:02 debian-hab pengine: [1435]: info: determine_online_status: Node debian-hab is online 
     129 
     130 
     131(snip) 
     132 
     133Jul  1 18:35:09 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 42 iface 192.168.0.209 to [15 of 1000] 
     134Jul  1 18:35:09 debian-hab corosync[1424]:   [MAIN  ] Completed service synchronization, ready to provide service. 
     135Jul  1 18:35:09 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 44 iface 192.168.0.209 to [16 of 1000] 
     136}}} 
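     137  * When ring 0 also comes back, everything returns to normal, just as when a single link goes down and recovers before its problem counter exceeds the threshold. 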
     138{{{ 
     139 
     140Jul  1 18:35:19 debian-hab corosync[1424]:   [TOTEM ] Incrementing problem counter for seqid 76 iface 192.168.0.209 to [27 of 1000] 
     141Jul  1 18:35:21 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [26 of 1000] 
     142 
     143(snip) 
     144 
     145Jul  1 18:36:09 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [2 of 1000] 
     146Jul  1 18:36:11 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [1 of 1000] 
     147Jul  1 18:36:13 debian-hab corosync[1424]:   [TOTEM ] ring 0 active with no faults 
     148}}}
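
The counter behaviour described in this section can be read out of syslog mechanically. The following is a minimal sketch, assuming the `[TOTEM ]` message formats shown in the excerpts above; the `summarize` function and the field names in its result are hypothetical, not part of corosync.

```python
import re
from collections import defaultdict

# Matches both "Incrementing ... for seqid N iface X" and "Decrementing ... for iface X".
COUNTER_RE = re.compile(
    r"\[TOTEM \] (Incrementing|Decrementing) problem counter for "
    r"(?:seqid \d+ )?iface (\S+) to \[(\d+) of (\d+)\]"
)
FAULTY_RE = re.compile(
    r"\[TOTEM \] Marking seqid \d+ ringid (\d+) interface (\S+) FAULTY"
)

def summarize(log_text):
    """Return per-interface peak counter, last counter, threshold and FAULTY flag."""
    stats = defaultdict(
        lambda: {"peak": 0, "last": 0, "threshold": None, "faulty": False}
    )
    for line in log_text.splitlines():
        m = COUNTER_RE.search(line)
        if m:
            _, iface, value, threshold = m.groups()
            entry = stats[iface]
            entry["last"] = int(value)
            entry["peak"] = max(entry["peak"], int(value))
            entry["threshold"] = int(threshold)
            continue
        m = FAULTY_RE.search(line)
        if m:
            stats[m.group(2)]["faulty"] = True
    return dict(stats)

# Lines copied from the log excerpts above (log typo kept verbatim).
SAMPLE_LOG = """\
Jul  1 18:10:11 debian-hab corosync[1377]:   [TOTEM ] Incrementing problem counter for seqid 876 iface 172.16.0.209 to [9 of 10]
Jul  1 18:10:11 debian-hab corosync[1377]:   [TOTEM ] Incrementing problem counter for seqid 878 iface 172.16.0.209 to [10 of 10]
Jul  1 18:10:11 debian-hab corosync[1377]:   [TOTEM ] Marking seqid 878 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
Jul  1 18:36:09 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [2 of 1000]
Jul  1 18:36:11 debian-hab corosync[1424]:   [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [1 of 1000]
"""

if __name__ == "__main__":
    for iface, entry in summarize(SAMPLE_LOG).items():
        print(iface, entry)
```

Comparing `peak` with `threshold` per interface gives a quick indication of how close a ring came to being marked FAULTY during an outage.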