{{na_header}}
 
* The Cluster Admin's Mantra:
 
** '''The only thing you don't know is what you don't know'''.
 
Just because one node loses communication with another node, it '''cannot''' be assumed that the silent node is dead!
 
= What is it? =
 
"Fencing" is the act of isolating a malfunctioning node. The goal is to prevent a '''split-brain''' condition where two nodes think the other member is dead and continue to use a shared resource. When this happens, file system corruption is almost guaranteed. If you are lucky enough to not lose the shared file system, you will be faced with the task of determining what data got written to which node, merging that data and/or overwriting the node you trust the least. This 'best case' is still pretty lousy.
 
Fencing, that is, preventing a node from altering shared disks, can be accomplished in a couple of ways:
 
* Power
** Power fencing is where a device is used to cut the power to a malfunctioning node. This is probably the most common type.
* Blocking
** Blocking is often implemented at the network level. This type of fencing leaves the node alone, but disconnects it from the storage network. Often this is done by a switch which prevents traffic coming from the fenced node.
 
With power fencing, the term used is "STONITH", literally, '''S'''hoot '''T'''he '''O'''ther '''N'''ode '''I'''n '''T'''he '''H'''ead. Picture it like an Old West duel. If one node is dead, the other node is going to win the duel by default and the dead node will just be shot again. When both nodes are alive, however, the faster node will win and will "kill" the slower node before it has a chance to fire. Once this duel is over, the surviving node can then go back to accessing the shared resource, confident that it is the only one working on it.
 
== Misconception ==
 
It is a '''very''' common mistake to ignore fencing when first starting to learn about clustering. Often people think "It's just for production systems; I don't need to worry about it yet because I don't care what happens to my test cluster."
 
'''''Wrong!'''''
 
The most practical reason: the cluster software will block all I/O transactions when it can't guarantee that a fence operation succeeded. The result is that your cluster will essentially "lock up". Likewise, [[cman]] and related daemons will fail if they can't find a fence agent to use.
 
Second, testing your cluster will involve inducing errors. Without proper fencing, there is a high probability that the shared file system will be corrupted. That would force you to start over, making your learning take a lot longer than it needs to.
 
== Implementation ==
 
In Red Hat's cluster software, the fence device (or devices) is configured in the main <span class="code">/etc/cluster/cluster.conf</span> cluster configuration file. This configuration is then acted on by the <span class="code">fenced</span> daemon. When the cluster determines that a node needs to be fenced, the <span class="code">fenced</span> daemon will consult the <span class="code">cluster.conf</span> file for information on how to access the fence device.
 
Given this <span class="code">cluster.conf</span> snippet:
<source lang="xml">
<cluster name="an_san" config_version="1">
<clusternodes>
<clusternode name="an_san02.alteeve.com" nodeid="2">
<fence>
<method name="node_assassin">
<device name="ariel" port="02" action="off"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="ariel" agent="fence_na" quiet="true" ipaddr="ariel.alteeve.com" login="ariel" passwd="gr0tt0">
</fencedevice>
</fencedevices>
</cluster>
</source>
 
If the cluster manager determines that the node <span class="code">an_san02.alteeve.com</span> needs to be fenced, it looks at that node's first (and only, in this case) <span class="code"><fence></span> method and reads its <span class="code"><device></span> entry's <span class="code">name</span>, which is <span class="code">ariel</span> in this case. It then looks in the <span class="code"><fencedevices></span> section for the <span class="code"><fencedevice></span> with the matching <span class="code">name</span>. From there, it gets the information needed to find and access the fence device. Once it connects to the fence device, it passes along the options set in <span class="code">an_san02.alteeve.com</span>'s <span class="code"><device></span> entry.
 
So in this example, <span class="code">fenced</span> looks up the details of the <span class="code">ariel</span> Node Assassin fence device. It calls the <span class="code">fence_na</span> program, known as a fence agent, and passes the following arguments:
* <span class="code">ipaddr=ariel.alteeve.com</span>
* <span class="code">login=ariel</span>
* <span class="code">passwd=gr0tt0</span>
* <span class="code">quiet=true/span>
* <span class="code">port=2</span>
* <span class="code">action=off</span>
 
How the fence agent acts on these arguments varies depending on the fence device itself. In general terms, the '<span class="code">fence_na</span>' fence agent will create a connection to the device at the IP address (or resolvable name, as in this case) specified in the <span class="code">ipaddr</span> argument. Once connected, it will authenticate using the <span class="code">login</span> and <span class="code">passwd</span> arguments. Once authenticated, it tells the device what <span class="code">port</span> to act on, which could be a power jack, a power or reset button, a network switch port and so on. Finally, it tells the device what <span class="code">action</span> to take.
 
Once the device completes the request, it returns a success or failure message. If the first attempt fails, <span class="code">fenced</span> will try the next <span class="code"><method></span>, if a second exists, and will keep trying fence devices in the order they are listed in the <span class="code">cluster.conf</span> file until it runs out of devices. If it fails to fence the node, most daemons will "block", that is, lock up and stop responding until the issue is resolved. The logic for this is that a locked-up cluster is better than a corrupted one.
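
To illustrate that ordering, a node can list more than one <span class="code"><method></span>. The snippet below is a hypothetical sketch, not part of the cluster built in this paper: it adds a backup method using a switched PDU and the <span class="code">fence_apc</span> agent, with the device name, address and credentials invented for the example.

<source lang="xml">
<clusternode name="an_san02.alteeve.com" nodeid="2">
	<fence>
		<!-- Tried first. -->
		<method name="node_assassin">
			<device name="ariel" port="02" action="off"/>
		</method>
		<!-- Only tried if the method above fails. -->
		<method name="backup_pdu">
			<device name="pdu1" port="2" action="off"/>
		</method>
	</fence>
</clusternode>
<!-- ... -->
<fencedevices>
	<fencedevice name="ariel" agent="fence_na" quiet="true" ipaddr="ariel.alteeve.com" login="ariel" passwd="gr0tt0" />
	<fencedevice name="pdu1" agent="fence_apc" ipaddr="pdu1.alteeve.com" login="apc" passwd="secret" />
</fencedevices>
</source>

With a configuration like this, if the Node Assassin call failed for any reason, <span class="code">fenced</span> would try the <span class="code">backup_pdu</span> method before giving up and blocking.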
 
If any of the fence devices succeeds, though, the cluster will know that it is safe to proceed and will reconfigure itself without the defective node.
 
== Fence Devices ==
 
Many major [[OEM]]s have their own remote management devices that can serve as fence devices. Examples are [http://dell.ca Dell]'s 'DRAC' (Dell Remote Access Controller), [http://hp.ca HP]'s 'iLO' (Integrated Lights-Out), [http://ibm.ca IBM]'s 'RSA' (Remote Supervisor Adapter), [http://sun.ca Sun]'s 'SSP' (System Service Processor) and so on. Smaller manufacturers implement remote management via [[IPMI]], the Intelligent Platform Management Interface.
 
With the above devices, fencing is implemented via a built-in or integrated device inside the server. These devices are usually accessible even when the host server is powered off or hard-locked. Via these devices, the host server can be powered off, reset and powered on remotely, regardless of the state of the host server itself.
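
As a rough sketch of how such a device would be configured, the snippet below uses the stock <span class="code">fence_ipmilan</span> agent to talk to a node's IPMI interface. The host name, login and password are made up for the example, and the exact attributes accepted should be checked against the agent's man page.

<source lang="xml">
<clusternode name="an_san01.alteeve.com" nodeid="1">
	<fence>
		<method name="ipmi">
			<device name="an_san01_ipmi" action="off"/>
		</method>
	</fence>
</clusternode>
<!-- ... -->
<fencedevices>
	<!-- The ipaddr points at the node's management interface, not the node itself. -->
	<fencedevice name="an_san01_ipmi" agent="fence_ipmilan" ipaddr="an_san01-ipmi.alteeve.com" login="admin" passwd="secret" />
</fencedevices>
</source>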
 
Block fencing is possible when the device connecting a node to shared resources, like a fibre channel SAN switch, provides a method of logically "unplugging" a defective node from the shared resource while leaving the node itself alone.
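
A hypothetical example of that approach, assuming a Brocade fibre channel switch and the <span class="code">fence_brocade</span> agent (the switch name, credentials and port number are all invented): here <span class="code">port</span> names the switch port the node is cabled to, and fencing disables that port rather than powering the node off.

<source lang="xml">
<clusternode name="an_san02.alteeve.com" nodeid="2">
	<fence>
		<method name="san_switch">
			<!-- "port" is the switch port feeding this node's HBA. -->
			<device name="brocade01" port="4"/>
		</method>
	</fence>
</clusternode>
<!-- ... -->
<fencedevices>
	<fencedevice name="brocade01" agent="fence_brocade" ipaddr="brocade01.alteeve.com" login="admin" passwd="secret" />
</fencedevices>
</source>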
 
== Node Assassin ==
 
A cheap alternative is the [[Node Assassin]], an open-hardware, open-source fence device. It was built to allow the use of commodity system boards that lack the remote management support found on more expensive, server-class hardware.
 
'''Full Disclosure''': Node Assassin was created by me, with much help from others, for this paper.
 
{{na_footer}}
