Node Assassin Fence Agent v1.1.4
This is the fenced fence agent for Node Assassin.
The Node Assassin fence agent v1.1.4 is split into three files:
- Source: fence_na
  - This is the core fence agent, installed in /sbin/.
- Source: fence_na.lib
  - This is the fence agent's function library, installed in /etc/na/.
- Source: fence_na.conf
  - This is the common Node Assassin configuration file, installed in /etc/na/.
The code is split into three files because a fourth executable, planned for later, will program the Node Assassin devices. That program will read the common configuration file and reuse some of the library's functions.
Be sure to review and edit /etc/na/fence_na.conf! It is heavily documented and explains what each option is and how it needs to be set for your Node Assassin(s).
The Cluster 'cluster.conf' File
Here is an example of the cluster.conf entries you need in order to use Node Assassin.
<cluster name="an_cluster" config_version="1">
    <clusternodes>
        <clusternode name="an_node01.alteeve.com" nodeid="1">
            <fence>
                <method name="node_assassin">
                    <device name="motoko" port="01" action="off"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="an_node02.alteeve.com" nodeid="2">
            <fence>
                <method name="node_assassin">
                    <device name="motoko" port="02" action="off"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_na" ipaddr="motoko.alteeve.com" name="motoko" passwd="secret"></fencedevice>
    </fencedevices>
</cluster>
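In the example, each <device> inside a clusternode's <fence> method refers to a <fencedevice> by its name attribute (here "motoko"). The Python sketch below joins the two. The result is roughly the set of name=value pairs that would be handed to fence_na for each node, and any <device> that points at a missing <fencedevice> is flagged.

The sketch makes two assumptions:
- Per-node <device> attributes (port, action) are layered over the shared <fencedevice> attributes.
- The function name fence_args is hypothetical; it is a sanity check of the example, not part of Node Assassin.

```python
import xml.etree.ElementTree as ET

CLUSTER_CONF = """\
<cluster name="an_cluster" config_version="1">
  <clusternodes>
    <clusternode name="an_node01.alteeve.com" nodeid="1">
      <fence><method name="node_assassin">
        <device name="motoko" port="01" action="off"/>
      </method></fence>
    </clusternode>
    <clusternode name="an_node02.alteeve.com" nodeid="2">
      <fence><method name="node_assassin">
        <device name="motoko" port="02" action="off"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_na" ipaddr="motoko.alteeve.com" name="motoko" passwd="secret"/>
  </fencedevices>
</cluster>
"""


def fence_args(conf_text):
    """Return {node_name: [argument dict per <device>]} for a cluster.conf string."""
    root = ET.fromstring(conf_text)
    # Shared device settings, keyed by the fencedevice's name attribute.
    devices = {fd.get("name"): dict(fd.attrib) for fd in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        calls = []
        for dev in node.iter("device"):
            shared = devices.get(dev.get("name"))
            if shared is None:
                raise ValueError(
                    f"{node.get('name')}: no fencedevice named {dev.get('name')!r}")
            args = dict(shared)
            args.update(dev.attrib)  # assumed: per-node port/action layered on top
            calls.append(args)
        result[node.get("name")] = calls
    return result
```

For example, `fence_args(CLUSTER_CONF)["an_node01.alteeve.com"][0]` contains agent, ipaddr, name, passwd, port and action for the first node.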
XML Validation Support
Note: This applies to RHEL 5.x / CentOS 5.x only.
Before you can validate your cluster.conf file, the valid options and syntax need to be added to cluster.ng. The Node Assassin installer will do this for you.
Support is added by inserting two sections into the cluster.ng file. Here is the diff between the stock CentOS 5.4 cluster.ng and the edited version.
diff ../cluster.ng.original /usr/share/system-config-cluster/misc/cluster.ng
46a47,62
> <!-- Node Assassin -->
> <group>
>   <attribute name="ipaddr"/>
>   <optional>
>     <attribute name="login"/>
>   </optional>
>   <optional>
>     <attribute name="passwd"/>
>   </optional>
>   <optional>
>     <attribute name="passwd_script"/>
>   </optional>
>   <optional>
>     <attribute name="quiet"/>
>   </optional>
> </group>
1036a1053,1057
> <!-- Node Assassin -->
> <group>
>   <attribute name="port"/>
>   <attribute name="action"/>
> </group>
cluster.ng Unified diff
This is the same diff as above, in unified format.
diff -u ../cluster.ng.original /usr/share/system-config-cluster/misc/cluster.ng
--- ../cluster.ng.original 2010-04-16 13:08:48.000000000 -0400
+++ /usr/share/system-config-cluster/misc/cluster.ng 2010-04-16 18:25:53.000000000 -0400
@@ -44,6 +44,22 @@
 <attribute name="agent"/>
 <optional>
   <choice>
+    <!-- Node Assassin -->
+    <group>
+      <attribute name="ipaddr"/>
+      <optional>
+        <attribute name="login"/>
+      </optional>
+      <optional>
+        <attribute name="passwd"/>
+      </optional>
+      <optional>
+        <attribute name="passwd_script"/>
+      </optional>
+      <optional>
+        <attribute name="quiet"/>
+      </optional>
+    </group>
     <!-- RPS10 -->
     <group>
       <attribute name="device" />
@@ -1034,6 +1050,11 @@
   <data type="IDREF"/>
 </attribute>
 <choice>
+  <!-- Node Assassin -->
+  <group>
+    <attribute name="port"/>
+    <attribute name="action"/>
+  </group>
   <!-- DRAC -->
   <group>
     <optional>
Step by step of a Fence Action
When fenced is asked to fence a node, it will:
- Call /sbin/fence_na, as named by the fencedevice's 'agent' attribute.
- Pass the following arguments to the fence agent on STDIN, one name=value pair per line:
agent=fence_na             # From 'fencedevices' -> 'agent'
name=motoko                # From 'fencedevices' -> 'name'
ipaddr=motoko.alteeve.com  # From 'fencedevices' -> 'ipaddr'
passwd=secret              # From 'fencedevices' -> 'passwd'
port=01                    # From 'clusternode' -> 'an_node01.alteeve.com'
                           # -> 'port'
action=off                 # From 'clusternode' -> 'an_node01.alteeve.com'
                           # -> 'action'. This must be 'on', 'off',
                           # 'reboot', 'status' or 'monitor'. See below
                           # for how these terms are interpreted by this
                           # agent. In most cases, you will want to use
                           # 'off'.
                           # NOTE: If 'option' is passed, its value will
                           # be stored in 'action'. That is, 'action' and
                           # 'option' are synonymous but 'option' is
                           # deprecated.
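Reading this input only requires splitting each line at the first '='. The following Python sketch shows one way to do that and applies the 'option' to 'action' rule from the note above. It skips blank lines and full-line '#' comments, as in a hand-written test file. Everything after the first '=' is treated as the value, so the explanatory '# From ...' annotations in the listing above are assumed not to be part of the real input. The function name parse_fence_args is hypothetical, and fence_na's own parser may behave differently.

```python
def parse_fence_args(lines):
    """Parse fenced-style 'name=value' lines into a dict.

    Blank lines and lines starting with '#' are skipped. A deprecated
    'option' key is stored under 'action'.
    """
    args = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value pair; ignore it
        name = name.strip()
        if name == "option":
            name = "action"  # 'option' is a deprecated synonym for 'action'
        args[name] = value.strip()
    return args
```

An agent would typically call this as `parse_fence_args(sys.stdin)`, since a file object yields one line at a time.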
Node Assassin's Implementation of Actions
Internally, Node Assassin can set nodes into one of four states:
- 0: Remove fence and boot.
- 1: Fence with front panel lock-out.
- 2: Close the power switch for one second.
- 3: Fence *without* front panel lock-out.
The list below shows how Node Assassin interprets the various fence agent actions.
- off: This sets the node to state 1 (fenced). Internally, Node Assassin presses the reset switch for one second to disable the node immediately, releases it for another second, and then presses and holds the power switch. After five seconds, it checks the node's power feed. If the node is still on, it waits another 25 seconds and checks again; if the node is still on at that point, an error is generated. If the node turns off, the fence is declared a success.
- on: This sets the node to state 0 (not fenced). Both the power and reset switches are opened, Node Assassin pauses for one second, and then the power switch is closed for one second to boot the node (that is, the node is set to state 2).
- reboot: This calls off and then on. As per the FenceAgentAPI, the fence agent returns success (exit 0) even if only the off stage succeeded.
- status: This checks the requested node's power feed and reports its status. If the node is on, the agent exits with code 0. If the node is off (or disconnected), it exits with code 1. If an error occurred while calling the Node Assassin, it exits with code 2.
- monitor: Because Node Assassin is a multi-port fence device, this simply calls 'list'.
- list: This returns the ports on your Node Assassin as CSV, one node per line in the format 'node,alias', where the alias is read from the Node Assassin configuration file.
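The action-to-state mapping and the status exit codes described above fit in a small lookup table. This Python sketch is illustrative only:
- The State member names, states_for(), status_exit_code() and reboot_exit_code() are hypothetical.
- The failure code 1 for reboot is an assumption; the FenceAgentAPI only requires a non-zero code.
- 'list' and 'monitor' are left out because they report information rather than change a node's state.

```python
from enum import IntEnum


class State(IntEnum):
    """Internal Node Assassin states (member names are hypothetical)."""
    RELEASE_AND_BOOT = 0  # remove fence and boot
    FENCE_LOCKED = 1      # fence with front panel lock-out
    PRESS_POWER = 2       # close the power switch for one second
    FENCE_UNLOCKED = 3    # fence without front panel lock-out


STATUS_ON, STATUS_OFF, STATUS_ERROR = 0, 1, 2

_ACTION_STATES = {
    "off": [State.FENCE_LOCKED],
    "on": [State.RELEASE_AND_BOOT],
    "reboot": [State.FENCE_LOCKED, State.RELEASE_AND_BOOT],  # off, then on
}


def states_for(action):
    """Return the sequence of states an API action translates into."""
    try:
        return _ACTION_STATES[action]
    except KeyError:
        raise ValueError(f"action {action!r} does not set a state") from None


def status_exit_code(power_led_on):
    """power_led_on is True, False, or None if the Node Assassin could not be reached."""
    if power_led_on is None:
        return STATUS_ERROR
    return STATUS_ON if power_led_on else STATUS_OFF


def reboot_exit_code(off_ok):
    """Only the 'off' stage decides the result; a failed 'on' still returns 0."""
    return 0 if off_ok else 1  # 1 is an assumed failure code
```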
Node Assassin Specific Actions
The Node Assassin fence agent has an extended set of actions that are outside the FenceAgentAPI. These were added to make certain admin tasks easier, like being able to boot or shut down the entire cluster with one command.
These actions are meant to be used at the command line. Actions whose names end in _all do not need the -p # argument and ignore it if it is set.
Normally, the on action will release the fence and then boot the node. This action allows you to simply release the fence on a node without booting it.
This will release all fences held against all nodes in one pass. Like with release, the nodes will not be booted after the fence is released.
This will fence all nodes at one time. It's primarily for testing purposes.
This will check the power feeds of all nodes, and any that are not running will be booted. Specifically, they will be set to state 2: their power buttons will be pressed for one second.
This is identical to boot_all, except that nodes found to be on will be set to state 2 to initiate power down via ACPI.
This is essentially the same as fence_all, except that each node is set to state 3 instead. The difference is that the node's front panel is not locked out after the fence completes.
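The *_all actions walk every port rather than the one given with -p. The Python sketch below models three of them:
- boot_all.
- The ACPI power-down variant of boot_all.
- The state-3 variant of fence_all.

The callbacks power_led_on and set_state stand in for whatever actually talks to the Node Assassin. press_power_where and the lock_panel parameter are hypothetical stand-ins for the two variants. This is a sketch of the described behaviour, not fence_na's code.

```python
from typing import Callable, Iterable, List

FENCE_LOCKED = 1    # state 1: fence with front panel lock-out
PRESS_POWER = 2     # state 2: close the power switch for one second
FENCE_UNLOCKED = 3  # state 3: fence without front panel lock-out


def press_power_where(ports: Iterable[int],
                      power_led_on: Callable[[int], bool],
                      set_state: Callable[[int, int], None],
                      when_on: bool) -> List[int]:
    """Set state 2 on every port whose power LED equals `when_on`; return those ports."""
    pressed = []
    for port in ports:
        if power_led_on(port) == when_on:
            set_state(port, PRESS_POWER)
            pressed.append(port)
    return pressed


def boot_all(ports, power_led_on, set_state):
    # Nodes that are off get a one-second power button press, booting them.
    return press_power_where(ports, power_led_on, set_state, when_on=False)


def fence_all(ports, set_state, lock_panel=True):
    # lock_panel=False models the variant that uses state 3 instead of state 1.
    state = FENCE_LOCKED if lock_panel else FENCE_UNLOCKED
    for port in ports:
        set_state(port, state)
```

The power-down variant is `press_power_where(..., when_on=True)`: the same button press, sent only to nodes that are running, so they begin an ACPI shutdown.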
Command Line Arguments
The FenceAgentAPI does not dictate any command line arguments. The options below follow the precedent set by existing fence agents for other devices.
Where an argument is said to "map" to a variable, that variable is the corresponding Node Assassin attribute in the cluster.conf file.
Maps the value to 'ipaddr'.
Prints the help message and exits.
Maps the value to 'name'.
Maps the value to 'port'.
Maps the value to 'action'.
Maps the value to 'passwd'.
Maps the value to 'passwd_script'.
NOTE: This is not used by Node Assassin (yet) and is simply ignored.
Sets quiet mode. Only errors will be printed. Logging proceeds as normal.
Prints the 'fence_na' version and the version of any attached Node Assassin(s) and exits.
All verification of actions is done by checking the state of the node's "Power LED" feed. For this reason, it is critical that you connect this feed.
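Because success is judged only by the Power LED feed, the timing described for the off action reduces to a short polling routine: check after five seconds, then once more after another 25. In this Python sketch, power_led_on and sleep are injected so the logic can be exercised without hardware. wait_for_power_off is a hypothetical name, not a function in fence_na.

```python
import time
from typing import Callable


def wait_for_power_off(power_led_on: Callable[[], bool],
                       sleep: Callable[[float], None] = time.sleep,
                       first_check: float = 5.0,
                       second_check: float = 25.0) -> bool:
    """Return True if the power LED reads off at either check, False otherwise."""
    sleep(first_check)
    if not power_led_on():
        return True   # node went down within the first window
    sleep(second_check)
    return not power_led_on()  # still on here means the fence failed
```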
The power and reset buttons are polarized. That is, you *MUST* connect the positive terminals from your mainboard's power and reset switches to the positive wires going to the Node Assassin.
If you connect the power or reset buttons backwards, the circuit will be closed (that is, you will have pressed the button). This is by design!
ALWAYS TEST YOUR Node Assassin!
Specifically, after connecting a new node, be sure to manually send the on -> off -> on actions to make sure that the node is properly set up. If there are any problems with your Node Assassin or with the cables connected to your node, the fence agent will fail and generate errors.
To test Node Assassin's fence_na agent in a manner similar to how the fenced daemon calls it, copy the following into a file (e.g. args.txt):
# Test file used as input for the NA fence agent.
ipaddr=motoko.alteeve.com
port=1
login=motoko
passwd=secret
action=on
Then pipe it into the fence agent:
cat args.txt | fence_na
You can replace the port=1 and action=on with another port number or a different action if you wish.
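The recommended on -> off -> on check can also be scripted. This Python sketch builds the same name=value input as args.txt and feeds it to the agent's STDIN with subprocess.run, the equivalent of the cat pipe above. It rests on a few assumptions:
- The agent is installed at /sbin/fence_na.
- Exit code 0 means success.
- build_args, run_fence_na and on_off_on are hypothetical helper names.

```python
import subprocess


def build_args(ipaddr, port, action, login=None, passwd=None):
    """Build fenced-style input text: one name=value pair per line."""
    lines = ["# Test input for the Node Assassin fence agent.",
             f"ipaddr={ipaddr}", f"port={port}"]
    if login is not None:
        lines.append(f"login={login}")
    if passwd is not None:
        lines.append(f"passwd={passwd}")
    lines.append(f"action={action}")
    return "\n".join(lines) + "\n"


def run_fence_na(args_text, agent="/sbin/fence_na"):
    """Feed args_text to the agent's STDIN and return its exit code."""
    result = subprocess.run([agent], input=args_text, text=True)
    return result.returncode


def on_off_on(ipaddr, port, **creds):
    """Run on, off, on in turn; return (action, exit_code) of the first failure, else None."""
    for action in ("on", "off", "on"):
        code = run_fence_na(build_args(ipaddr, port, action, **creds))
        if code != 0:
            return action, code
    return None
```

For example, `on_off_on("motoko.alteeve.com", 1, login="motoko", passwd="secret")` returns None when all three actions succeed.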
|Input, advice, complaints and meanderings all welcome!|
|All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions. © 1997-2013|
|Naming credits go to Christopher Olah!|
|In memory of Kettle, Tonia, Josh, Leah and Harvey. In special memory of Hannah, Jack and Riley.|