Fence na: Difference between revisions

Line 8: Line 8:
# Node Assassin - Fence Agent
# Node Assassin - Fence Agent
# Digimer; digimer@alteeve.com
# Digimer; digimer@alteeve.com
# Mar. 08, 2010.
# Apr. 06, 2010
# Version: 0.1.005
# Version: 1.1.4
#
#
# Bugs;
# Bugs;
Line 18: Line 18:


Changes:
Changes:
v1.1.4
- Changed the version number to follow the Node Assassin version number. From
  this point on, matching version numbers will be compatible. This version
  breaks compatibility with older versions of Node Assassin devices.
-
v0.1.006
- Added an initial 'release_all' to the 'boot_all' action to make sure nodes
  that were fenced have their reset ports released.


v0.1.005
v0.1.005
Line 77: Line 87:


- Node Assassin's implementation of options.
- Node Assassin's implementation of options.
Version 0.1.005 side:
   - 'off' This sets the node to state '0' on the reset port followed by
     state '3' to the power port. State 0 is maintained to prevent
     a reboot.
   - 'on' This sets the node to state '1' on the reset port followed by
     state '2' on the power port to boot the node.
   - 'reboot' This sets the node to state '2' on the reset port to quickly
     kill the node, then switches to state '3' on the power port,
     checks the return value (later, will check the probe pin),
     sets state '1' on the reset port, pauses 1 second, and then
     sets state '2' on the power port to boot the node.
   - 'status' This calls '00:0' and returns the state of the port. Later,
     this will return the value from the voltage sensing pin.
Version 1.1.4 side:
   - 'off' This sets the node to state 1; Fenced.
   - 'on' This sets the node to state '0'; Unfenced. It then pauses one
     second and then sets state 2 to boot the node.
   - 'reboot' This sets the node to state 1; Fenced. It then waits for one
     second, sets state 0 to unfence the node, waits a further
     second and sets the node to state 2 to boot the node.
   - 'status' This calls '00:0' and returns the state of the node. This
     includes the state of the power and reset button plus the power
     feed status to indicate if the node is on or not.
   - 'monitor' being a multi-port fence device, this should call 'list'.
   - 'monitor' being a multi-port fence device, this should call 'list'.
   MADI: Confirm that this is what is meant in "Issues" here:
   MADI: Confirm that this is what is meant in "Issues" here:
Line 164: Line 171:


# Set STDOUT and $log to hot.
# Set STDOUT and $log to hot.
if (1)
{
{
select $log;
select $log;
Line 182: Line 190:
# This makes sure the node ID is zero-padded or '00'.
# This makes sure the node ID is zero-padded or '00'.
$conf->{na}{port}=$conf->{na}{port} ? $conf->{na}{port}=sprintf("%02d", $conf->{na}{port}) : "00";
$conf->{na}{port}=$conf->{na}{port} ? $conf->{na}{port}=sprintf("%02d", $conf->{na}{port}) : "00";
# record($conf, $log, __LINE__."; na::port: [$conf->{na}{port}]\n");


# Find the TCP port from the config file.
# Find the TCP port from the config file.
Line 189: Line 198:
{
{
$conf->{'system'}{na_id}=$i;
$conf->{'system'}{na_id}=$i;
# record($conf, $log, __LINE__."; system::na_id: [$conf->{'system'}{na_id}]\n");
$conf->{na}{tcp_port}=$conf->{na}{$i}{tcp_port};
$conf->{na}{tcp_port}=$conf->{na}{$i}{tcp_port};
# record($conf, $log, __LINE__."; na::tcp_port: [$conf->{na}{tcp_port}]\n");
$conf->{na}{na_name}=$conf->{na}{$i}{na_name} ? $conf->{na}{$i}{na_name} : "Node Assassin #$i";
$conf->{na}{na_name}=$conf->{na}{$i}{na_name} ? $conf->{na}{$i}{na_name} : "Node Assassin #$i";
# record($conf, $log, __LINE__."; na::na_name: [$conf->{na}{na_name}]\n");
$conf->{na}{max_nodes}=$conf->{na}{$i}{max_nodes};
$conf->{na}{max_nodes}=$conf->{na}{$i}{max_nodes};
# record($conf, $log, __LINE__."; na::max_nodes: [$conf->{na}{max_nodes}]\n");
foreach my $node (1..$conf->{na}{max_nodes})
{
$conf->{na}{power_pins}.=sprintf("%02d", (($node*2)-1)).",";
$conf->{na}{reset_pins}.=sprintf("%02d", (($node*2)-1)).",";
}
$conf->{na}{power_pins}=~s/,$//;
$conf->{na}{reset_pins}=~s/,$//;
last;
}
}
}
}


die "Exiting on errors.\n" if $bad;
die "Exiting on errors.\n" if $bad;
# record($conf, $log, "Node Assassin: [$conf->{na}{ipaddr}].\n");
record($conf, $log, "Node Assassin: . [$conf->{na}{ipaddr}].\n");
# record($conf, $log, "TCP Port: .... [$conf->{na}{tcp_port}].\n");
record($conf, $log, "TCP Port: ...... [$conf->{na}{tcp_port}].\n");
# record($conf, $log, "Node: ........ [$conf->{na}{port}].\n");
record($conf, $log, "Node: .......... [$conf->{na}{port}].\n");
# record($conf, $log, "Login: ....... [$conf->{na}{login}].\n");
record($conf, $log, "Login: ......... [$conf->{na}{login}].\n");
# record($conf, $log, "Password: .... [$conf->{na}{passwd}].\n");
record($conf, $log, "Password: ...... [$conf->{na}{passwd}].\n");
# record($conf, $log, "Action: ...... [$conf->{na}{action}].\n");
record($conf, $log, "Action: ........ [$conf->{na}{action}].\n");
# record($conf, $log, "Version: ..... [$conf->{'system'}{version}].\n");
record($conf, $log, "Version Request: [$conf->{'system'}{version}].\n");
# record($conf, $log, "Done reading args.\n");
record($conf, $log, "Done reading args.\n");


# If I've been asked to show the version information, do so and then exit.
# If I've been asked to show the version information, do so and then exit.
Line 240: Line 244:
# If I've been asked to show the info on the given node assassin, do so and
# If I've been asked to show the info on the given node assassin, do so and
# then exit.
# then exit.
# record($conf, $log, "List State: .. [$conf->{'system'}{list_state}].\n");
# record($conf, $log, "List State: .... [$conf->{'system'}{list_state}].\n");
if ($conf->{'system'}{list_state})
if ($conf->{'system'}{list_state})
{
{
record($conf, $log, "Calling the 'show_state' function.\n");
show_state($conf, $log);
show_state($conf, $log);
do_exit($conf, $log, 0);
do_exit($conf, $log, 0);
Line 249: Line 254:
# When asked to 'monitor' or 'list', do this... whatever 'this' is. All I know
# When asked to 'monitor' or 'list', do this... whatever 'this' is. All I know
# is that it should not generate output.
# is that it should not generate output.
# record($conf, $log, "Action: ...... [$conf->{na}{action}].\n");
# record($conf, $log, "Action: ........ [$conf->{na}{action}].\n");
if (($conf->{na}{action} eq "monitor") or ($conf->{na}{action} eq "list"))
if (($conf->{na}{action} eq "monitor") or ($conf->{na}{action} eq "list"))
{
{
record($conf, $log, "Calling the 'show_list' function.\n");
show_list($conf, $log, "list");
show_list($conf, $log, "list");
do_exit($conf, $log, 0);
do_exit($conf, $log, 0);

Revision as of 05:30, 7 April 2010

 Node Assassin :: Fence na

This is the core fence agent that exists in /sbin/.

#!/usr/bin/perl
#
# Node Assassin - Fence Agent
# Digimer; digimer@alteeve.com
# Apr. 06, 2010
# Version: 1.1.4
#
# Bugs;
# - None known, many expected
# 

=pod

Changes:

v1.1.4
 - Changed the version number to follow the Node Assassin version number. From
   this point on, matching version numbers will be compatible. This version
   breaks compatibility with older versions of Node Assassin devices.
 - 

v0.1.006
 - Added an initial 'release_all' to the 'boot_all' action to make sure nodes
   that were fenced have their reset ports released.

v0.1.005
 - Expanded the 'version' call to return complete details on each attached and
   configured Node Assassins.
 - Implemented the 'release', 'release_all', 'fence_all' and 'boot_all'
   actions.
 - General cleanup and updating of the docs.

v0.1.004
 - Fixed the command line argument bug.
 - Updated the 'help' message to be more accurate.

Given the following:
<cluster name="an_san" config_version="1">
	<clusternodes>
		<clusternode name="an_san01.alteeve.com" nodeid="1">
			<fence>
				<method name="node_assassin">
					<device name="ariel" port="01" action="off"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="an_san02.alteeve.com" nodeid="2">
			<fence>
				<method name="node_assassin">
					<device name="ariel" port="02" action="off"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice name="ariel" agent="fence_na" ipaddr="ariel.alteeve.com" passwd="gr0tt0"></fencedevice>
	</fencedevices>
</cluster>

Questions:
- Is there a correlation between 'clusternode -> name', 'device -> name' and 
 'fencedevice -> name'? Which is used when sending 'name' to the fence agent?
 'fencedevice'?
  

When 'fenced' decides to fence "an_san01.alteeve.com", it will:
- call '/sbin/fence_na' because of the 'fencedevices -> agent' value.
- It will pass the following arguments to the fence agent, one pair per line:
    agent=fence_na		# From 'fencedevices -> agent'
    name=ariel			# From 'fencedevices -> name'
    ipaddr=ariel.alteeve.com	# From 'fencedevices -> ipaddr'
    passwd=gr0tt0		# From 'fencedevices -> passwd'
    port=01			# From 'clusternode "an_san01.alteeve.com" -> port'
    action=off			# From 'clusternode "an_san01.alteeve.com" -> action'
    				# This must be 'on', 'off', 'reboot', 'status'
    				# or 'monitor'. See below for how these terms
    				# are interpreted by this agent.
    				# NOTE: If 'option' is passed, its value will
    				# be stored in 'action'. That is, 'action' and
    				# 'option' are synonymous.
    				# 
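
As a rough illustration of the 'key=value' pairs described above, the sketch
below parses such lines from a file handle and folds 'option' into 'action'.
It is not the agent's own 'read_stdin' function, which lives in the library
and is not shown here. The name 'parse_fence_args', the sample input and the
skipping of lines that start with '#' are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: reads 'key=value' lines from the given file handle and
# returns a hash reference. 'option' is stored as 'action'.
sub parse_fence_args
{
	my ($fh)=@_;
	my %args;
	while (my $line=<$fh>)
	{
		chomp($line);
		next if $line=~/^\s*#/;		# Assumption: comment lines are skipped.
		next if $line!~/=/;		# Ignore lines that are not pairs.
		my ($key, $value)=split(/=/, $line, 2);
		$key=~s/^\s+|\s+$//g;
		$value=~s/^\s+|\s+$//g;
		$key="action" if $key eq "option";
		$args{$key}=$value;
	}
	return \%args;
}

# Sample input only; a real call would read from STDIN instead.
my $input="agent=fence_na\nname=ariel\nipaddr=ariel.alteeve.com\npasswd=gr0tt0\nport=01\noption=off\n";
open(my $fh, "<", \$input) or die "Failed to open the in-memory handle: $!\n";
my $args=parse_fence_args($fh);
close($fh);
print "$_=$args->{$_}\n" foreach sort keys %{$args};
```

Splitting with a limit of two keeps any later '=' characters inside the value,
which matters for passwords.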

- Node Assassin's implementation of options.
  - 'off'	This sets the node to state 1; Fenced.
  - 'on'	This sets the node to state '0'; Unfenced. It then pauses one
  		second and then sets state 2 to boot the node.
  - 'reboot'	This sets the node to state 1; Fenced. It then waits for one
  		second, sets state 0 to unfence the node, waits a further
  		second and sets the node to state 2 to boot the node.
  - 'status'	This calls '00:0' and returns the state of the node. This
  		includes the state of the power and reset button plus the power
  		feed status to indicate if the node is on or not.
  - 'monitor'	being a multi-port fence device, this should call 'list'.
  		MADI: Confirm that this is what is meant in "Issues" here:
  		http://sources.redhat.com/cluster/wiki/FenceAgentAPI
  - 'list'	No info on this
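
The state sequences listed above can be sketched as a small lookup table. This
is only an illustration of the description, not the agent's 'process_action'
function. The name 'plan_for_action' and the use of the word "pause" to mark a
one second wait are assumptions, and nothing here talks to a Node Assassin.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ordered steps for an action as an array
# reference, or undef when the action does not set a state.
sub plan_for_action
{
	my ($action)=@_;
	my %plan=(
		off	=>	[1],				# Fence.
		on	=>	[0, "pause", 2],		# Unfence, wait, boot.
		reboot	=>	[1, "pause", 0, "pause", 2],	# Fence, wait, unfence, wait, boot.
	);
	return undef if not defined $action;
	return undef if not exists $plan{$action};
	return [@{$plan{$action}}];
}

foreach my $action ("off", "on", "reboot")
{
	print "$action: ", join(" -> ", @{plan_for_action($action)}), "\n";
}
```

Actions such as 'status', 'monitor' and 'list' are left out of the table on
purpose, as they query the device rather than set a state.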

Command Line Arguments:
- Any command line arguments used by this fence agent are not dictated by the
  Fence Agent API. By convention only, the following command line options are
  used:
  -a <ip>	# Maps the value to 'ipaddr'.
  -h		# Prints the help message and then exits.
  -l <name>	# Maps the value to 'name'.
  -n <num>	# Maps the value to 'port'.
  -o <string>	# Maps the value to 'action'.
  -p <string>	# Maps the value to 'passwd'.
  -S <path>	# Maps the value to 'passwd_script'. This is not used by Node
  		# Assassin yet and is simply ignored.
  -q		# Sets quiet mode. Only errors will be printed. Logging
  		# proceeds as normal.
  -V		# Prints the 'fence_na' version and the version of any attached
  		# Node Assassin(s) and exits.
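
A minimal sketch of how the switches above could be mapped onto the same keys
used for STDIN, using the core Getopt::Std module. This is not the agent's
'read_cla' function. The name 'read_switches' and the 'help', 'quiet' and
'version' keys in the returned hash are assumptions made for the example.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parses the given switches and returns a hash reference,
# or undef if an unknown switch was passed.
sub read_switches
{
	my (@argv)=@_;
	local @ARGV=@argv;	# getopts() always works on @ARGV.
	my %opt;
	getopts("a:hl:n:o:p:S:qV", \%opt) or return undef;

	# Switches that take a value, and the key each one maps to.
	my %map=(
		a	=>	"ipaddr",
		l	=>	"name",
		n	=>	"port",
		o	=>	"action",
		p	=>	"passwd",
		S	=>	"passwd_script",
	);
	my %conf;
	foreach my $switch (keys %map)
	{
		$conf{$map{$switch}}=$opt{$switch} if defined $opt{$switch};
	}
	$conf{help}=1    if $opt{h};
	$conf{quiet}=1   if $opt{q};
	$conf{version}=1 if $opt{V};
	return \%conf;
}

my $switches=read_switches("-a", "ariel.alteeve.com", "-n", "01", "-o", "off", "-q");
print "$_: [$switches->{$_}]\n" foreach sort keys %{$switches};
```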

Note:
- For now, I will return '0' if the command succeeded. Later, I will add a
  detection line that checks whether there is voltage from the node's PSU.
=cut

# Play safe!
use strict;
use warnings;
# Load my library.
require '/etc/na/fence_na.lib';
# This is how I talk.
use IO::Handle;
use Net::Telnet;

# This will be read in from a config file later.
my $conf={
	'system'	=>	{
		max_valid_state	=>	3,
		conf_file	=>	"/etc/na/fence_na.conf",
		quiet		=>	"",
		version		=>	0,
		list_state	=>	"",
		list		=>	"",
		monitor		=>	"",
		na_id		=>	0,
		got_cla		=>	0,	# This is set if command line arguments are read.
	},
	na		=>	{
		ipaddr		=>	"",
		tcp_port	=>	"",
		login		=>	"",
		passwd		=>	"",
		port		=>	"",
		passwd_script	=>	"",
		action		=>	"",
		agent		=>	"",	# This is only used by 'fenced'
		na_name		=>	"",	# This is used for the 'list' function.
		handle		=>	"",
		max_node	=>	0,
		set_state	=>	[],	# This anon array will store the states to set based on the action passed for the proper ports.
	}
};
# This method can't pass in the '$log' handle, obviously, as it does not yet
# exist.
read_conf($conf);

# Log file for output.
my $log=IO::Handle->new();
open ($log, ">", $conf->{'system'}{'log'}) || die "Failed to open: [$conf->{'system'}{'log'}] for writing; Error: $!\n";

# Set STDOUT and $log to hot.
if (1)
{
	select $log;
	$|=1;
	select STDOUT;
	$|=1;
}

# If this gets set in the next two functions, the agent will exit.
my $bad=0;

# Read in arguments from the command line.
($bad)=read_cla($conf, $log, $bad);

# Now read in arguments from STDIN, which is how 'fenced' passes arguments.
($bad)=read_stdin($conf, $log, $bad);

# This makes sure the node ID is zero-padded or '00'.
$conf->{na}{port}=$conf->{na}{port} ? sprintf("%02d", $conf->{na}{port}) : "00";
# record($conf, $log, __LINE__."; na::port: [$conf->{na}{port}]\n");

# Find the TCP port from the config file.
foreach my $i (1..$conf->{'system'}{na_num})
{
	if ((lc($conf->{na}{$i}{ipaddr}) eq lc($conf->{na}{ipaddr})))
	{
		$conf->{'system'}{na_id}=$i;
# 		record($conf, $log, __LINE__."; system::na_id: [$conf->{'system'}{na_id}]\n");
		$conf->{na}{tcp_port}=$conf->{na}{$i}{tcp_port};
# 		record($conf, $log, __LINE__."; na::tcp_port: [$conf->{na}{tcp_port}]\n");
		$conf->{na}{na_name}=$conf->{na}{$i}{na_name} ? $conf->{na}{$i}{na_name} : "Node Assassin #$i";
# 		record($conf, $log, __LINE__."; na::na_name: [$conf->{na}{na_name}]\n");
		$conf->{na}{max_nodes}=$conf->{na}{$i}{max_nodes};
# 		record($conf, $log, __LINE__."; na::max_nodes: [$conf->{na}{max_nodes}]\n");
	}
}

die "Exiting on errors.\n" if $bad;
record($conf, $log, "Node Assassin: . [$conf->{na}{ipaddr}].\n");
record($conf, $log, "TCP Port: ...... [$conf->{na}{tcp_port}].\n");
record($conf, $log, "Node: .......... [$conf->{na}{port}].\n");
record($conf, $log, "Login: ......... [$conf->{na}{login}].\n");
record($conf, $log, "Password: ...... [$conf->{na}{passwd}].\n");
record($conf, $log, "Action: ........ [$conf->{na}{action}].\n");
record($conf, $log, "Version Request: [$conf->{'system'}{version}].\n");
record($conf, $log, "Done reading args.\n");

# If I've been asked to show the version information, do so and then exit.
# record($conf, $log, "Version: ..... [$conf->{'system'}{version}].\n");
if ($conf->{'system'}{version})
{
	version($conf, $log);
	do_exit($conf, $log, 0);
}

# Connect to the Node Assassin.
connect_to_na($conf, $log);

# Validate credentials.
# NOTE: Checking before the telnet fails on the exit. Also, this will be moved
# into the Node Assassin soon anyway.
if (($conf->{na}{login} ne $conf->{'system'}{username}) or ($conf->{na}{passwd} ne $conf->{'system'}{password}))
{
	record($conf, $log, "Username and/or password failed.\n");
	do_exit($conf, $log, 8);
}

###############################################################################
# What do?                                                                    #
###############################################################################

# If I've been asked to show the info on the given node assassin, do so and
# then exit.
# record($conf, $log, "List State: .... [$conf->{'system'}{list_state}].\n");
if ($conf->{'system'}{list_state})
{
	record($conf, $log, "Calling the 'show_state' function.\n");
	show_state($conf, $log);
	do_exit($conf, $log, 0);
}

# When asked to 'monitor' or 'list', do this... whatever 'this' is. All I know
# is that it should not generate output.
# record($conf, $log, "Action: ........ [$conf->{na}{action}].\n");
if (($conf->{na}{action} eq "monitor") or ($conf->{na}{action} eq "list"))
{
	record($conf, $log, "Calling the 'show_list' function.\n");
	show_list($conf, $log, "list");
	do_exit($conf, $log, 0);
}

# If I made it this far, I am setting a state. Sort out what state from the
# values in my conf->{na} hash.
# record($conf, $log, "Setting node: [$conf->{na}{port}] to action: [$conf->{na}{action}] using the Node Assassin: [$conf->{na}{ipaddr}] using the login: [$conf->{na}{login}/$conf->{na}{passwd}]\n");

# Convert the action into Node Assassin protocol arguments.
process_action($conf, $log);

# Now execute the action plan.
my $exit_code=do_actions($conf, $log);
# record($conf, $log, "All calls complete, exiting.\n");

# Cleanup and exit.
do_exit($conf, $log, $exit_code);
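
The zero-padding of the node ID and the lookup of the matching Node Assassin
in the script above can be tried on their own with the sketch below. The names
'pad_port' and 'find_na' are hypothetical, and the sample device values (the
TCP port and node count in particular) are made up for the example rather than
taken from a real 'fence_na.conf'.

```perl
use strict;
use warnings;

# Zero-pads the node ID to two digits. An empty or zero value becomes '00'.
sub pad_port
{
	my ($port)=@_;
	return $port ? sprintf("%02d", $port) : "00";
}

# Finds the configured device whose address matches, ignoring case. Returns a
# hash reference with its details, or undef if nothing matches.
sub find_na
{
	my ($devices, $ipaddr)=@_;
	foreach my $i (sort {$a <=> $b} keys %{$devices})
	{
		next if lc($devices->{$i}{ipaddr}) ne lc($ipaddr);
		return {
			na_id		=>	$i,
			tcp_port	=>	$devices->{$i}{tcp_port},
			na_name		=>	$devices->{$i}{na_name} ? $devices->{$i}{na_name} : "Node Assassin #$i",
			max_nodes	=>	$devices->{$i}{max_nodes},
		};
	}
	return undef;
}

# Sample data only.
my $devices={
	1	=>	{ipaddr => "ariel.alteeve.com", tcp_port => "238", na_name => "", max_nodes => 4},
};
my $found=find_na($devices, "ARIEL.alteeve.com");
print "Node: [", pad_port("1"), "], device: [$found->{na_name}], TCP port: [$found->{tcp_port}]\n";
```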

 

Input, advice, complaints and meanderings all welcome!
Digimer digimer@alteeve.ca https://alteeve.ca/w legal stuff:  
All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions. © 1997-2013
Naming credits go to Christopher Olah!
In memory of Kettle, Tonia, Josh, Leah and Harvey. In special memory of Hannah, Jack and Riley.