Version 11 (modified by mitty, 13 years ago)
High Availability
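- An easy HA cluster with DRBD+Heartbeat, VA Linux Systems Japan
- Software RAID and DRBD work log - In Search of a Better Environment
- heartbeat and DRBD work log + webmin - In Search of a Better Environment
- I want to know which commands are available in HEARTBEAT. (Linux-ha-jp) - Linux-HA Japan - SourceForge.JP
- RE: Diary entry for the 27th - Pirorin - Rakuten Blog - Building an HA cluster with HeartBeat and DRBD
- Linux-HA Japan
- DRBD and Heartbeat (1): Improving service availability | CMS Blog | Mitsue-Links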
tutorial
- HighlyAvailableNFS - Community Ubuntu Documentation
- HighlyAvailableiSCSITarget - Community Ubuntu Documentation
DRBD
- Distributed Replicated Block Device
- DRBD
- Installing and Configuring DRBD on Ubuntu 10.04 « Johnson's Blog
- Cluster environment on Ubuntu 10.04 (Heartbeat+DRBD) | Saratoga IT Diary
- lost and found ( for me ? ): Ubuntu 10.04 TLS : DRBD
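Add the first line to sources.list so that DRBD can be installed with apt
- It is unclear whether this is necessary.
- Judging from http://ppa.launchpad.net/ubuntu-ha/lucid-cluster/ubuntu/dists/lucid/main/binary-i386/Packages and similar files, it looks like nothing needs to be added from the PPA except the packages marked "Source: redhat-cluster".
- @IT: DRBD+iSCSI, a dream collaboration (Part 1) (1/3)
- DRBD split-brain drill - Work Diary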
Pacemaker
- How to install Pacemaker: CentOS 5 edition « Linux-HA Japan
- Building a server with Pacemaker and DRBD (video demo) « Linux-HA Japan
- Corosync (Slackware 13.1) - @SRCHACK.ORG (S-R-C-Hack)
- Build a Highly Available NFS/MySQL/PostgreSQL Server on Ubuntu 10.04 LTS (Lucid) – Linode Library
corosync
- [Openais] [PATCH] Implementation of automatic redundant ring recovery
  - The "interface FAULTY" problem described below may eventually be resolved once this patch is merged.
interface FAULTY
- In a redundant configuration with multiple NICs, when one interface goes down, its problem counter starts counting up toward the threshold (10 by default). Once the counter reaches the threshold, the interface is marked FAULTY and is no longer used.
    Jul 1 18:10:00 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 850 iface 172.16.0.209 to [1 of 10]
    Jul 1 18:10:00 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 852 iface 172.16.0.209 to [2 of 10]
    (snip)
    Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 876 iface 172.16.0.209 to [9 of 10]
    Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Incrementing problem counter for seqid 878 iface 172.16.0.209 to [10 of 10]
    Jul 1 18:10:11 debian-hab corosync[1377]: [TOTEM ] Marking seqid 878 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
- Even if the link comes back after the interface has been marked FAULTY, the ring must be re-enabled manually with corosync-cfgtool -r.
  - mitty@debian-hab:~$ sudo corosync-cfgtool -s
      Printing ring status.
      Local node ID 1358997696
      RING ID 0
          id = 192.168.0.209
          status = ring 0 active with no faults
      RING ID 1
          id = 172.16.0.209
          status = Marking seqid 24 ringid 1 interface 172.16.0.209 FAULTY - adminisrtative intervention required.
  - mitty@debian-hab:~$ sudo corosync-cfgtool -r
      Re-enabling all failed rings.
  - mitty@debian-hab:~$ sudo corosync-cfgtool -s
      Printing ring status.
      Local node ID 1358997696
      RING ID 0
          id = 192.168.0.209
          status = ring 0 active with no faults
      RING ID 1
          id = 172.16.0.209
          status = ring 1 active with no faults
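The manual check-and-re-enable steps above could be scripted roughly as follows. This is only a minimal sketch and not part of the original notes: the helper name `has_faulty_ring` is hypothetical, detection simply looks for the string FAULTY in the `corosync-cfgtool -s` output shown above, and the script is assumed to run as root (for example from cron).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the ring status text on
# stdin contains a ring that corosync has marked FAULTY.
has_faulty_ring() {
    grep -q 'FAULTY'
}

# Do nothing on hosts where corosync-cfgtool is not installed.
if command -v corosync-cfgtool >/dev/null 2>&1; then
    if corosync-cfgtool -s | has_faulty_ring; then
        # -r re-enables all failed rings, the same command used by hand above.
        corosync-cfgtool -r
    fi
fi
```

Re-enabling a ring whose link is still down just lets the problem counter climb to the threshold again, so a script like this is probably best run at a modest interval rather than in a tight loop.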
- The problem counter threshold can be changed with rrp_problem_count_threshold.
  - /etc/corosync/corosync.conf
      totem {
          (snip)
          rrp_problem_count_threshold: 1000
- With two interfaces configured, if both go down, the problem counter stops incrementing and then (for some reason) counts back down until the ring is reported as "no faults" again.
  - ring 1 goes down, then ring 0
      Jul 1 18:30:21 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8312 iface 172.16.0.209 to [1 of 1000]
      Jul 1 18:30:22 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8314 iface 172.16.0.209 to [2 of 1000]
      Jul 1 18:30:22 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8316 iface 172.16.0.209 to [3 of 1000]
      Jul 1 18:30:23 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [2 of 1000]
      Jul 1 18:30:23 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8318 iface 172.16.0.209 to [3 of 1000]
      Jul 1 18:30:24 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8320 iface 172.16.0.209 to [4 of 1000]
      Jul 1 18:30:25 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [3 of 1000]
      Jul 1 18:30:25 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8322 iface 172.16.0.209 to [4 of 1000]
      Jul 1 18:30:26 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8324 iface 172.16.0.209 to [5 of 1000]
      Jul 1 18:30:27 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8326 iface 172.16.0.209 to [6 of 1000]
      Jul 1 18:30:27 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [5 of 1000]
      Jul 1 18:30:27 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 8328 iface 172.16.0.209 to [6 of 1000]
      Jul 1 18:30:29 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [5 of 1000]
      Jul 1 18:30:31 debian-hab corosync[1424]: [TOTEM ] A processor failed, forming new configuration. <---- the second link also goes down here; the node becomes UNCLEAN (offline)
      Jul 1 18:30:31 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [4 of 1000]
      Jul 1 18:30:33 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [3 of 1000]
      (snip)
      Jul 1 18:30:35 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [2 of 1000]
      Jul 1 18:30:37 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 172.16.0.209 to [1 of 1000]
      Jul 1 18:30:39 debian-hab corosync[1424]: [TOTEM ] ring 1 active with no faults
- When either link comes back, the problem counter for the link that is still down starts incrementing again.
  - ring 1 comes back
      Jul 1 18:34:56 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 2 iface 192.168.0.209 to [1 of 1000]
      Jul 1 18:34:57 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 4 iface 192.168.0.209 to [2 of 1000]
      Jul 1 18:34:58 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 6 iface 192.168.0.209 to [3 of 1000]
      (snip)
      Jul 1 18:35:02 debian-hab pengine: [1435]: info: determine_online_status: Node debian-haa is online
      Jul 1 18:35:02 debian-hab pengine: [1435]: info: determine_online_status: Node debian-hab is online
      (snip)
      Jul 1 18:35:09 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 42 iface 192.168.0.209 to [15 of 1000]
      Jul 1 18:35:09 debian-hab corosync[1424]: [MAIN ] Completed service synchronization, ready to provide service.
      Jul 1 18:35:09 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 44 iface 192.168.0.209 to [16 of 1000]
  - When ring 0 also comes back, the ring returns to normal, just as when a single link goes down and recovers before the problem counter exceeds the threshold.
      Jul 1 18:35:19 debian-hab corosync[1424]: [TOTEM ] Incrementing problem counter for seqid 76 iface 192.168.0.209 to [27 of 1000]
      Jul 1 18:35:21 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [26 of 1000]
      (snip)
      Jul 1 18:36:09 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [2 of 1000]
      Jul 1 18:36:11 debian-hab corosync[1424]: [TOTEM ] Decrementing problem counter for iface 192.168.0.209 to [1 of 1000]
      Jul 1 18:36:13 debian-hab corosync[1424]: [TOTEM ] ring 0 active with no faults
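To follow how the problem counter moves in logs like the ones above, the TOTEM messages can be condensed per interface. The following is a rough sketch, not something from the original notes: the function name `summarize_problem_counters` is hypothetical, and it assumes the log lines have exactly the "iface ADDR to [N of MAX]" and "interface ADDR FAULTY" wording seen in the excerpts on this page.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog lines on stdin and prints, for each
# interface, "<addr> <highest counter seen> <threshold> <FAULTY|ok>".
summarize_problem_counters() {
    awk '
        /problem counter for/ {
            # Find "iface <addr> to [<n> of <max>]" by scanning the fields.
            for (i = 1; i <= NF; i++) {
                if ($i == "iface") {
                    addr = $(i + 1)
                    n = $(i + 3); sub(/^\[/, "", n)
                    max = $(i + 5); sub(/\]$/, "", max)
                    if (n + 0 > peak[addr] + 0) peak[addr] = n
                    limit[addr] = max
                }
            }
        }
        / FAULTY / {
            # "Marking seqid ... interface <addr> FAULTY - ..."
            for (i = 1; i <= NF; i++)
                if ($i == "interface") faulty[$(i + 1)] = 1
        }
        END {
            for (addr in peak)
                print addr, peak[addr], limit[addr], ((addr in faulty) ? "FAULTY" : "ok")
        }
    '
}
```

A possible invocation, assuming corosync logs to the default Debian syslog file, is `summarize_problem_counters < /var/log/syslog`. The order of the output lines is not defined when several interfaces appear.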