Node Assassin :: Cluster Fence

The Cluster Admin's Mantra:

- The only thing you don't know is what you don't know.

Just because one node loses communication with another node, it cannot be assumed that the silent node is dead!

What is it?

"Fencing" is the act of isolating a malfunctioning node. The goal is to prevent a split-brain condition where two nodes think the other member is dead and continue to use a shared resource. When this happens, file system corruption is almost guaranteed. If you are lucky enough to not lose the shared file system, you will be faced with the task of determining what data got written to which node, merging that data and/or overwriting the node you trust the least. This 'best case' is still pretty lousy.

Fencing, that is, preventing a node from altering shared disks, can be accomplished in a couple of ways:

Power
- Power fencing is where a device is used to cut the power to a malfunctioning node. This is probably the most common type.
Blocking
- Blocking is often implemented at the network level. This type of fencing leaves the node alone, but disconnects it from the storage network. Often this is done by a switch which prevents traffic coming from the fenced node.

With power fencing, the term used is "STONITH", literally, Shoot The Other Node In The Head. Picture it like an old west duel. If one node is dead, the other node is going to win the duel by default and the dead node will just be shot again. When both nodes are alive, however, the faster node will win and will "kill" the slower node before it has a chance to fire. Once this duel is over, the surviving node can go back to accessing the shared resource, confident that it is the only one working on it.

Misconception

It is a very common mistake to ignore fencing when first starting to learn about clustering. Often people think, "It's just for production systems. I don't need to worry about it yet because I don't care what happens to my test cluster."

Wrong!

First, and most practically, the cluster software will block all I/O transactions when it can't guarantee that a fence operation succeeded. The result is that your cluster will essentially "lock up". Likewise, cman and related daemons will fail if they can't find a fence agent to use.

Secondly, testing your cluster will involve inducing errors. Without proper fencing, there is a high probability that your shared file system will be corrupted. That would force you to start over, making your learning take a lot longer than it needs to.

Implementation

In Red Hat's cluster software, the fence devices are configured in the main cluster configuration file, /etc/cluster/cluster.conf. This configuration is then acted on by the fenced daemon. When the cluster determines that a node needs to be fenced, the fenced daemon will consult the cluster.conf file for information on how to access the fence device.

Given this cluster.conf snippet:

<cluster name="an_san" config_version="1">
	<clusternodes>
		<clusternode name="an_san02.alteeve.com" nodeid="2">
			<fence>
				<method name="node_assassin">
					<device name="ariel" port="02" action="off"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice name="ariel" agent="fence_na" quiet="true" ipaddr="ariel.alteeve.com" login="ariel" passwd="gr0tt0">
		</fencedevice>
	</fencedevices>
</cluster>

If the cluster manager determines that the node an_san02.alteeve.com needs to be fenced, it looks at the first (and only, in this case) <method> in that node's <fence> section and reads the name of its <device> entry, which is ariel. It then looks in the <fencedevices> section for the <fencedevice> with the matching name. From there, it gets the information needed to find and access the fence device. Once it connects to the fence device, it passes along the options set in an_san02.alteeve.com's <device> entry.

So in this example, fenced looks up the details on the ariel Node Assassin fence device. It calls the fence_na program, called a fence agent, and passes the following arguments:

ipaddr=ariel.alteeve.com
login=ariel
passwd=gr0tt0
quiet=true
port=02
action=off
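
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the name-matching logic, not how fenced is implemented. The function name fence_calls is made up for this example, the <fencedevice> entry is written in self-closing form, and the real daemon may pass additional arguments beyond the ones shown.

```python
import xml.etree.ElementTree as ET

# The cluster.conf snippet from above, embedded so the sketch is self-contained.
CLUSTER_CONF = """\
<cluster name="an_san" config_version="1">
  <clusternodes>
    <clusternode name="an_san02.alteeve.com" nodeid="2">
      <fence>
        <method name="node_assassin">
          <device name="ariel" port="02" action="off"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ariel" agent="fence_na" quiet="true" ipaddr="ariel.alteeve.com" login="ariel" passwd="gr0tt0"/>
  </fencedevices>
</cluster>
"""

def fence_calls(conf_xml, node_name):
    """Return one (agent, args) pair per <device> in the node's first <method>."""
    root = ET.fromstring(conf_xml)
    # Index the <fencedevice> entries by name for the matching step.
    devices = {fd.get("name"): fd.attrib for fd in root.iter("fencedevice")}
    node = next(n for n in root.iter("clusternode") if n.get("name") == node_name)
    method = node.find("fence/method")  # first <method> under <fence>
    calls = []
    for dev in method.findall("device"):
        shared = dict(devices[dev.get("name")])  # copy of the device-wide settings
        agent = shared.pop("agent")              # which fence agent to run
        shared.pop("name")                       # only used for the lookup
        per_node = {k: v for k, v in dev.attrib.items() if k != "name"}
        calls.append((agent, {**shared, **per_node}))
    return calls

agent, args = fence_calls(CLUSTER_CONF, "an_san02.alteeve.com")[0]
print(agent)
for key, value in args.items():
    print(f"{key}={value}")
```

The device-wide settings (ipaddr, login, passwd, quiet) come from the <fencedevice> entry, while the node-specific ones (port, action) come from the node's <device> entry. That split is what lets several nodes share one fence device.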

How the fence agent acts on these arguments varies depending on the fence device itself. In general terms, the 'fence_na' fence agent will create a connection to the device at the IP address (or resolvable name, as in this case) specified in the ipaddr argument. Once connected, it will authenticate using the login and passwd arguments. Once authenticated, it tells the device what port to act on, which could be a power jack, a power or reset button, a network switch port and so on. Finally, it tells the device what action to take.
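
As a rough sketch of those steps, the Python below walks through connect, authenticate, select port and act. It assumes the agent receives its arguments as key=value lines on standard input. FakeDevice is a hypothetical in-memory stand-in invented for this example. It does not speak the Node Assassin protocol, and read_args and run_agent are illustrative names, not part of fence_na.

```python
import io

def read_args(stream):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    args = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        args[key.strip()] = value.strip()
    return args

class FakeDevice:
    """Hypothetical stand-in for a fence device; every port starts powered on."""
    def __init__(self, login, passwd, ports):
        self._login, self._passwd = login, passwd
        self.power = {port: "on" for port in ports}
        self.authenticated = False

    def authenticate(self, login, passwd):
        self.authenticated = (login, passwd) == (self._login, self._passwd)
        return self.authenticated

    def set_power(self, port, action):
        # Refuse to act for unauthenticated callers or unknown ports.
        if not self.authenticated or port not in self.power:
            return False
        self.power[port] = action
        return True

def run_agent(args, connect):
    """Connect, authenticate, act on the port. Return 0 on success, 1 on failure."""
    device = connect(args["ipaddr"])
    if not device.authenticate(args["login"], args["passwd"]):
        return 1
    return 0 if device.set_power(args["port"], args["action"]) else 1

device = FakeDevice("ariel", "gr0tt0", ports=["01", "02"])
stdin = io.StringIO(
    "ipaddr=ariel.alteeve.com\nlogin=ariel\npasswd=gr0tt0\nport=02\naction=off\n"
)
exit_code = run_agent(read_args(stdin), connect=lambda addr: device)
print(exit_code, device.power)
```

The important design point is the exit status. The caller only needs to know whether the fence succeeded, so every failure path (bad credentials, unknown port) collapses into a single non-zero result.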

Once the device completes the action, it returns a success or failure message. If the first attempt fails, fenced will try the next <method> in the node's <fence> section, if one exists. It will keep trying methods in the order they are found in the cluster.conf file until it runs out. If every attempt to fence the node fails, most daemons will "block", that is, lock up and stop responding until the issue is resolved. The logic is that a locked-up cluster is better than a corrupted one.
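
The try-in-order behaviour can be sketched as a simple loop. This is a minimal illustration in Python, assuming each fence attempt can be modelled as a callable that returns True on success. The method names and the fence_node and unreachable functions are hypothetical and exist only for this example.

```python
def fence_node(methods):
    """Try each (name, attempt) pair in order.

    Return the name of the first method that succeeds, or None if all fail.
    """
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception:
            pass  # treat an agent error the same as a failed fence
    return None

def unreachable():
    # Simulates a fence device that cannot be contacted.
    raise ConnectionError("fence device did not answer")

methods = [("ipmi", unreachable), ("node_assassin", lambda: True)]
winner = fence_node(methods)
print(winner or "all methods failed: keep blocking")
```

A None result corresponds to the "block" case. The cluster has no proof that the node is isolated, so it must not release the shared resource.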

If any of the fence devices succeeds, though, the cluster will know that it is safe to proceed and will reconfigure itself without the defective node.

Fence Devices

Many major OEMs have their own remote management devices that can serve as fence devices. Examples are Dell's 'DRAC' (Dell Remote Access Controller), HP's 'iLO' (Integrated Lights-Out), IBM's 'RSA' (Remote Supervisor Adapter), Sun's 'SSP' (System Service Processor) and so on. Smaller manufacturers implement remote management via IPMI, the Intelligent Platform Management Interface.

In the above devices, fencing is implemented via a built-in or integrated device inside the server. These devices are usually accessible even when the host server is powered off or hard locked. Via these devices, the host server can be powered off, reset and powered on remotely, regardless of the state of the host server.

Block fencing is possible when the device connecting a node to shared resources, like a fiber-channel SAN switch, provides a method of logically "unplugging" a defective node from the shared resource, leaving the node itself alone.

Node Assassin

A cheap alternative is the Node Assassin, an open-hardware, open-source fence device. It was built to allow the use of commodity system boards that lack the remote management support found on more expensive, server-class hardware.

Full Disclosure: Node Assassin was created by me, with much help from others, for this paper.

`Input, advice, complaints and meanderings all welcome!`
`Digimer`	`digimer@alteeve.ca`	`https://alteeve.ca/w`	`legal stuff:`
`All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions. © 1997-2013`
Naming credits go to Christopher Olah!
In memory of Kettle, Tonia, Josh, Leah and Harvey. In special memory of Hannah, Jack and Riley.
