Node Assassin Fence Agent v1.1.4


This is the fence agent that the fenced daemon calls to fence nodes through a Node Assassin.

Files

The Node Assassin fence agent v1.1.4 is split up into three files:

  • Source: fence_na - Download
    • This is the core fence agent that exists in /sbin/.
  • Source: fence_na.lib - Download
    • This is the fence agent's function library that exists in /etc/na/.
  • Source: fence_na.conf - Download
    • This is the common Node Assassin configuration file that exists in /etc/na/.

The reason for the three files is that, later, there will be a fourth executable that will program the Node Assassin devices. When this program is created, it will consult the common configuration file and will use some of the functions in the library.

Configuration File

Be sure to review and edit /etc/na/fence_na.conf! It is heavily documented and explains what each option is and how it needs to be set for your Node Assassin(s).

The Cluster 'cluster.conf' File

Here is an example of the cluster related entries you will need to use in order to properly use the Node Assassin.

<cluster name="an_cluster" config_version="1">
        <clusternodes>
                <clusternode name="an_node01.alteeve.com" nodeid="1">
                        <fence>
                                <method name="node_assassin">
                                        <device name="motoko" port="01" action="off"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an_node02.alteeve.com" nodeid="2">
                        <fence>
                                <method name="node_assassin">
                                        <device name="motoko" port="02" action="off"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice name="motoko" agent="fence_na" ipaddr="motoko.alteeve.com" passwd="secret"></fencedevice>
        </fencedevices>
</cluster>
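
As a rough illustration of how these entries relate, the Python sketch below resolves the fence arguments for one node by joining its <device> entry to the <fencedevice> of the same name. The helper name fence_args_for is hypothetical, and the embedded configuration is a trimmed copy in which the fencedevice carries a single name attribute, motoko, matching the name the device entry refers to. This is not how fenced itself is implemented.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example cluster.conf (one node, one fence device).
CLUSTER_CONF = """
<cluster name="an_cluster" config_version="1">
  <clusternodes>
    <clusternode name="an_node01.alteeve.com" nodeid="1">
      <fence>
        <method name="node_assassin">
          <device name="motoko" port="01" action="off"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="motoko" agent="fence_na" ipaddr="motoko.alteeve.com" passwd="secret"/>
  </fencedevices>
</cluster>
"""


def fence_args_for(conf_xml, node_name):
    """Hypothetical helper: merge a node's <device> attributes with those of
    the <fencedevice> it refers to by name."""
    root = ET.fromstring(conf_xml)
    devices = {fd.get("name"): dict(fd.attrib) for fd in root.iter("fencedevice")}
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        device = node.find("./fence/method/device")
        args = dict(devices[device.get("name")])  # agent, name, ipaddr, passwd
        args.update(device.attrib)                # adds port and action
        return args
    raise KeyError(node_name)


args = fence_args_for(CLUSTER_CONF, "an_node01.alteeve.com")
```

Here args ends up holding agent, name, ipaddr and passwd from the fence device, plus port and action from the node's own device entry.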

XML Validation Support

Before you can validate your cluster.conf file, the valid options and syntax need to be added to cluster.ng. The Node Assassin installer will do this for you.

Support is added by inserting two sections into the cluster.ng file. Here is the diff of the cluster.ng provided in CentOS 5.4 before and after the edit.

cluster.ng diff

diff ../cluster.ng.original /usr/share/system-config-cluster/misc/cluster.ng
46a47,62
>        <!-- Node Assassin -->
>        <group>
>         <attribute name="ipaddr"/>
>         <optional>
>         <attribute name="login"/>
>         </optional>
>         <optional>
>         <attribute name="passwd"/>
>         </optional>
>         <optional>
>         <attribute name="passwd_script"/>
>         </optional>
>         <optional>
>          <attribute name="quiet"/>
>         </optional>
>        </group>
1036a1053,1057
>         <!-- Node Assassin -->
>         <group>
>          <attribute name="port"/>
>          <attribute name="action"/>
>         </group>

cluster.ng Unified diff

This is the same as above, but in unified diff format.

diff -u ../cluster.ng.original /usr/share/system-config-cluster/misc/cluster.ng
--- ../cluster.ng.original	2010-04-16 13:08:48.000000000 -0400
+++ /usr/share/system-config-cluster/misc/cluster.ng	2010-04-16 18:25:53.000000000 -0400
@@ -44,6 +44,22 @@
      <attribute name="agent"/>
      <optional>
       <choice>
+       <!-- Node Assassin -->
+       <group>
+        <attribute name="ipaddr"/>
+        <optional>
+        <attribute name="login"/>
+        </optional>
+        <optional>
+        <attribute name="passwd"/>
+        </optional>
+        <optional>
+        <attribute name="passwd_script"/>
+        </optional>
+        <optional>
+         <attribute name="quiet"/>
+        </optional>
+       </group>
        <!-- RPS10 -->
        <group>
         <attribute name="device" />
@@ -1034,6 +1050,11 @@
         <data type="IDREF"/>
        </attribute>
        <choice>
+        <!-- Node Assassin -->
+        <group>
+         <attribute name="port"/>
+         <attribute name="action"/>
+        </group>
         <!-- DRAC -->
         <group>
          <optional>

Step by step of a Fence Action

When fenced is asked to fence a node, it will:

  1. Call /sbin/fence_na because of the fencedevices -> agent value.
  2. Pass the following arguments to the fence agent, one pair per line:
    agent=fence_na              # From 'fencedevices' -> 'agent'
    name=motoko                 # From 'fencedevices' -> 'name'
    ipaddr=motoko.alteeve.com   # From 'fencedevices' -> 'ipaddr'
    passwd=secret               # From 'fencedevices' -> 'passwd'
    port=01                     # From 'clusternode' -> 'an_node01.alteeve.com'
                                # -> 'port'
    action=off                  # From 'clusternode' -> 'an_node01.alteeve.com'
                                # -> 'action'. This must be 'on', 'off',
                                # 'reboot', 'status' or 'monitor'. See below
                                # for how these terms are interpreted by this
                                # agent. In most cases, you will want to use
                                # 'off'.
                                # NOTE: If 'option' is passed, its value will
                                # be stored in 'action'. That is, 'action' and
                                # 'option' are synonymous but 'option' is
                                # deprecated.
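
To make the argument format concrete, here is a minimal Python sketch of how an agent might read these 'name=value' pairs and fold the deprecated 'option' into 'action'. The function name read_fence_args is hypothetical, and skipping blank lines and lines starting with '#' is an assumption of this sketch. It is not the code used in fence_na.

```python
import io


def read_fence_args(stream):
    """Parse 'name=value' lines, one pair per line, into a dict."""
    args = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are skipped
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name=value pair; ignored in this sketch
        args[name.strip()] = value.strip()
    # 'option' is a deprecated synonym; its value is stored in 'action'.
    if "option" in args:
        args["action"] = args.pop("option")
    return args


example = io.StringIO(
    "agent=fence_na\n"
    "name=motoko\n"
    "ipaddr=motoko.alteeve.com\n"
    "passwd=secret\n"
    "port=01\n"
    "option=off\n"
)
parsed = read_fence_args(example)
```

In a real agent the stream would be standard input rather than an in-memory string.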

Node Assassin's implementation of 'action's

Internally, Node Assassin can set nodes into one of four states:

  • 0: Remove fence and boot.
  • 1: Fence with front panel lock-out.
  • 2: Close the power switch for one second.
  • 3: Fence *without* front panel lock-out.

The list below shows how Node Assassin interprets the various fence agent actions.

off

This sets the node to state 1; Fenced. Internally, it will hit the reset switch for one second to immediately disable the node. Then it will release the reset switch for another second before pressing and holding the power switch. After five seconds, Node Assassin will check the node's power feed. If it is still on, it will wait another 25 seconds and check again. If the node is still on, an error will be generated. If the node turns off successfully, the fence is declared a success.
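
The verification timing described here can be sketched in Python as follows. The callable power_is_on is a hypothetical stand-in for reading the node's power feed, and the sleep function is injectable only so the sketch can be exercised without waiting. The real check runs on the Node Assassin itself.

```python
import time


def verify_off(power_is_on, sleep=time.sleep):
    """Return True if the node is seen to be off, False if it stays on.

    power_is_on: hypothetical callable, True while the power feed is live.
    sleep:       replaceable delay function, defaults to time.sleep.
    """
    sleep(5)                  # first check after five seconds
    if not power_is_on():
        return True
    sleep(25)                 # still on: wait another 25 seconds
    return not power_is_on()  # second and final check


# Simulated node that is still on at the first check and off at the second.
readings = iter([True, False])
waits = []
result = verify_off(lambda: next(readings), sleep=waits.append)
```

A False return corresponds to the case where an error is generated.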

on

This sets the node to state 0; Not fenced. Both the power and reset switches are opened, the Node Assassin will pause for one second and then the power switch will be closed for one second to boot the node (that is, the node is set to state 2).

reboot

This essentially just calls an off and then an on. As per the FenceAgentAPI, the fence agent will return a success (exit 0) even if only the off stage succeeded.
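
A minimal Python sketch of that rule, assuming two hypothetical callables, fence_off and fence_on, that each return True on success. The non-zero exit code used for a failed off stage is an assumption of this sketch.

```python
def reboot(fence_off, fence_on):
    """Return the agent's exit code for a 'reboot' request."""
    if not fence_off():
        return 1   # assumption: any non-zero code signals a failed fence
    fence_on()     # the result of the 'on' stage does not change the exit code
    return 0


# The 'off' stage succeeds but the 'on' stage fails: still a success.
code = reboot(lambda: True, lambda: False)
```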

status

This checks the power feed for the requested node and reports its status. If the node is on, the agent will exit with code 0. If the node is off (or disconnected), it will exit with code 1. If an error occurred calling the Node Assassin, this will exit with code 2.
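
The three exit codes can be summarised in a short Python sketch. The callable read_power_feed is hypothetical, and representing a failed call to the Node Assassin as an OSError is an assumption made for illustration.

```python
STATUS_ON, STATUS_OFF, STATUS_ERROR = 0, 1, 2


def status_exit_code(read_power_feed):
    """Map a power feed reading onto the exit codes described above."""
    try:
        return STATUS_ON if read_power_feed() else STATUS_OFF
    except OSError:
        # Assumption: a failure to reach the Node Assassin raises OSError.
        return STATUS_ERROR


def unreachable():
    raise OSError("could not contact the Node Assassin")


codes = [
    status_exit_code(lambda: True),
    status_exit_code(lambda: False),
    status_exit_code(unreachable),
]
```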

monitor

Being a multi-port fence device, this simply calls 'list'.

list

This returns a CSV of the ports on your Node Assassin. Each node will be on a new line in the format 'node,alias' where the alias is read from the Node Assassin configuration file.
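
As an illustration of the output shape only, this Python sketch formats a port-to-alias mapping one entry per line. The aliases shown and the zero-padded port numbers are made up for the example. The real values come from the Node Assassin configuration file.

```python
def list_ports(aliases):
    """Return one 'node,alias' line per port, ordered by port."""
    return "\n".join(f"{port},{alias}" for port, alias in sorted(aliases.items()))


# Hypothetical aliases, as they might be defined in the configuration file.
output = list_ports({"02": "an_node02", "01": "an_node01"})
```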

Node Assassin Specific Actions

The Node Assassin fence agent has an extended set of actions that are outside the FenceAgentAPI. These were added to make certain admin tasks easier, like being able to boot or shut down the entire cluster with one command.

These actions are meant to be used at the command line. Actions with the name *_all do not need the -n # argument and will ignore it if set.

release

Normally, the on action will release the fence and then boot the node. This action allows you to simply release the fence on a node without booting it.

release_all

This will release all fences held against all nodes in one pass. Like with release, the nodes will not be booted after the fence is released.

fence_all

This will fence all nodes at one time. It's primarily for testing purposes.

boot_all

This will check the power feeds of all nodes, and any that are not running will be booted. Specifically, they will be set to state 2; their power buttons will be pressed for one second.

shutdown_all

This is the inverse of boot_all: nodes found to be on will be set to state 2, pressing their power buttons for one second to initiate power down via ACPI.

forcedown_all

This is essentially the same as fence_all except that each node is set to state 3 instead. The difference is that the node's front panel LEDs are not locked out after the fence completes.
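
The *_all actions that name a state can be summarised with a small Python sketch that decides which state each port would be set to, given the current power feed readings. The function name plan_all and the power dictionary are hypothetical. release and release_all are left out because the text does not tie them to a specific state.

```python
def plan_all(action, power):
    """Return {port: state} for a *_all action.

    power maps each port to True if its power feed is currently on.
    """
    if action == "fence_all":
        return {port: 1 for port in power}                      # fence, with lock-out
    if action == "forcedown_all":
        return {port: 3 for port in power}                      # fence, no lock-out
    if action == "boot_all":
        return {port: 2 for port, on in power.items() if not on}  # press power on 'off' nodes
    if action == "shutdown_all":
        return {port: 2 for port, on in power.items() if on}      # press power on 'on' nodes
    raise ValueError(f"unsupported action: {action}")


power = {"01": True, "02": False}
boot_plan = plan_all("boot_all", power)
shutdown_plan = plan_all("shutdown_all", power)
```

boot_all and shutdown_all both use state 2, and differ only in which nodes they select.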

Command Line Arguments

Any command line arguments used by this fence agent are not dictated by the Fence Agent API. The following command line options are used to match the precedent set by existing fence agents for other devices.

Where it says that a command line argument "maps" to a given variable, it is referencing the cluster.conf file's arguments for Node Assassin.

-a <ip>

Maps the value to 'ipaddr'.

-h

Prints the help message and then exits.

-l <name>

Maps the value to 'name'.

-n <num>

Maps the value to 'port'.

-o <string>

Maps the value to 'action'.

-p <string>

Maps the value to 'passwd'.

-S <path>

Maps the value to 'passwd_script'.

NOTE: This is not used by Node Assassin (yet) and is simply ignored.

-q

Sets quiet mode. Only errors will be printed. Logging proceeds as normal.

-V

Prints the 'fence_na' version and the version of any attached Node Assassin(s) and exits.
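
For illustration, the mapping above can be expressed with Python's standard getopt module. This is only a sketch of the option-to-variable mapping, not the agent's own parser. The 'help' and 'version' keys are hypothetical names chosen for this example.

```python
import getopt

# Options that take a value, and the cluster.conf variable each maps to.
OPTION_TO_NAME = {
    "-a": "ipaddr",
    "-l": "name",
    "-n": "port",
    "-o": "action",
    "-p": "passwd",
    "-S": "passwd_script",
}


def parse_cli(argv):
    """Translate command line switches into the agent's variable names."""
    opts, _rest = getopt.getopt(argv, "a:hl:n:o:p:S:qV")
    args = {}
    for flag, value in opts:
        if flag in OPTION_TO_NAME:
            args[OPTION_TO_NAME[flag]] = value
        elif flag == "-q":
            args["quiet"] = True
        elif flag == "-h":
            args["help"] = True      # hypothetical key
        elif flag == "-V":
            args["version"] = True   # hypothetical key
    return args


cli = parse_cli(["-a", "motoko.alteeve.com", "-n", "01", "-o", "off", "-q"])
```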

Notes

All verification of actions is done by checking the state of the node's "Power LED". For this reason, it is critical that you connect this feed.

The power and reset buttons are polarized. That is, you *MUST* connect the positive terminals from your mainboard's power and reset switches to the positive wires going to the Node Assassin.

IMPORTANT!

If you connect the power or reset buttons backwards, the circuit will be closed (that is, you will have pressed the button). This is by design!

ALWAYS TEST YOUR Node Assassin!

Specifically, after connecting a new node, be sure to manually send the on -> off -> on actions to make sure that the node is properly set up. This sequence will boot, fence and reboot the node and will require all functions to be working properly to succeed. If any step fails, the fence agent will report an error.

Agent Testing

To test the agent in a manner similar to how fenced calls it, copy the following into a file (e.g. args.txt):

# Test file used as input for the NA fence agent.
ipaddr=motoko.alteeve.com
port=1
login=motoko
passwd=secret
action=on

And cat it into the fence agent via a pipe:

clear; cat args.txt | ./fence_na

 

Input, advice, complaints and meanderings all welcome!
Digimer digimer@alteeve.ca https://alteeve.ca/w legal stuff:  
All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions. © 1997-2013