Corosync

Corosync is the communication layer of modern open-source clusters. It was created out of a desire for a simpler, more focused communication layer after OpenAIS was deemed too heavyweight and too complex for its actual use in open-source clusters. It also replaces the now-deprecated heartbeat cluster communication program.

How it Works

Below is an adaptation of a linux-cluster mailing list post made on October 16th, 2013.

Corosync uses the Totem protocol for heartbeat-like monitoring of the other nodes' health. A token is passed around to each node in turn; the node does some work (like acknowledging old messages and sending new ones), then passes the token on to the next node. This cycle runs continuously. Should a node not pass its token on within a short time-out period, the token is declared lost, an error counter is incremented, and a new token is sent. If too many tokens are lost in a row, the node is declared lost/dead.
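
For illustration, the token time-out and the "too many lost tokens" threshold described above correspond to settings in the totem section of corosync.conf. The snippet below is only a sketch using common corosync 2.x defaults, not a tuning recommendation:

  totem {
      version: 2
      cluster_name: example-cluster
      # Time-out, in milliseconds, after which a token is declared lost.
      token: 1000
      # Number of token retransmits attempted before the node is declared
      # lost and a new cluster configuration is formed.
      token_retransmits_before_loss_const: 4
  }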

Once the node is declared lost, the remaining nodes reform a new cluster. If enough nodes are left to form quorum, then the new cluster will continue to provide services. In two-node clusters, quorum is disabled so each node can work on its own.
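
In corosync v2+, this two-node exception is expressed in the quorum section of corosync.conf. As a minimal sketch:

  quorum {
      provider: corosync_votequorum
      # Let the cluster keep running when only one of the two nodes is up.
      # Note that setting two_node implicitly enables wait_for_all.
      two_node: 1
  }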

Corosync itself only cares about cluster membership, message passing and quorum (as of corosync v2+). What happens after the cluster reforms is up to the cluster resource manager. There are two main cluster resource managers: rgmanager and pacemaker. How either reacts to a change in the cluster's membership is, at a high level, the same, so we will use pacemaker as the example resource manager.
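
The membership and quorum state that corosync tracks can be inspected at run time with its standard tools, for example (commands only, output omitted):

  # Show quorum state, vote counts and the current membership list.
  corosync-quorumtool -s

  # Show the status of this node's totem rings/links.
  corosync-cfgtool -s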

How Resource Managers React to Membership Changes

When pacemaker is told that membership has changed because a node died, it looks to see what services might have been lost. Once it knows what was lost, it looks at the rules it's been given and decides what to do.

Generally, the first thing it does is "stonith" (aka "fence") the lost node. This is a process where the lost node is either powered off, called power fencing, or cut off from the network/storage, called fabric fencing. In either case, the idea is to make sure that the lost node is in a known state. If this step is skipped, the node could recover later and try to provide cluster services, not realizing that it had been removed from the cluster. This could cause problems ranging from confused switches to corrupted data.
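
As a hedged sketch of power fencing under pacemaker, an IPMI-based fence device could be defined with pcs roughly as follows. The node name, address and credentials are placeholders, and parameter names vary between fence-agents versions (older releases use ipaddr/login/passwd):

  # Define an IPMI power-fence device that can power off node1.
  pcs stonith create fence_node1 fence_ipmilan \
      pcmk_host_list="node1" ip="10.20.0.1" \
      username="admin" password="secret" lanplus=1

  # Ensure fencing is enabled cluster-wide.
  pcs property set stonith-enabled=true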

In two-node clusters, there is also a chance of a "split-brain". Because quorum has to be disabled, it is possible for both nodes to think the other node is dead and for both to try to provide the same cluster services. With stonith in place, after the nodes lose contact with one another (because of a network failure, for example), neither node will offer services until one of them has stonith'ed the other. The faster node will win and the slower node will be shut down (or isolated). The survivor can then run services safely without risking a split-brain.
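
To bias that fence race so one node reliably wins, a delay is commonly added to the device that fences the preferred survivor. A hedged example using pacemaker's pcmk_delay_base attribute on the device from the fencing sketch above:

  # Delay any attempt to fence node1 by 15 seconds, so that in a
  # simultaneous fence race node1 fences node2 first and survives.
  pcs stonith update fence_node1 pcmk_delay_base=15s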

Once the dead node has been stonith'ed, pacemaker then decides what to do with the lost services. Generally, this means "restart, on a surviving node, the services that had been running on the dead node". The details of this, though, are decided by you when you configure the resources in pacemaker.
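
As a hedged sketch of such a configuration (the resource names, IP address and path are placeholders), a floating IP address and a web server could be defined so that pacemaker keeps them together and restarts them on a surviving node after a failure:

  # A floating IP address managed by the cluster.
  pcs resource create cluster_ip ocf:heartbeat:IPaddr2 \
      ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

  # A web server that must run on the same node as the IP, started after it.
  pcs resource create webserver ocf:heartbeat:apache \
      configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
  pcs constraint colocation add webserver with cluster_ip INFINITY
  pcs constraint order cluster_ip then webserver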

This is a pretty high-level view and simplifies a few things, but hopefully it helps clarify the mechanics of corosync.

Resources

More information can be found below:

