Node Assassin - Original


-=] Paradise by the node assassin light [=-

The AN!Cluster unnamed assassin.

Node Assassin is an open-source, open-hardware project to create a network-attached cluster fence device.

Of course, you must proceed at your own risk. :)

Current Status

Feb. 28, 2010: Released NAOS ver. 1.0.3. Now working on the fence agent.

First Version

The first version of Node Assassin is operational!

It still needs to be tied into Red Hat's 'ricci' and 'luci' programs to function as a fence device though. This fence device resides on the cluster's private network channel (or some other common intranet).

Software

This is the initial release of the fence control software, the Node Assassin Operating System (NAOS).

Source Code and notes:

Version Control

All software related to this product is hosted on GitHub.

NAOS v1.x Protocol

It works by listening for a connection on TCP port 238 on IP 192.168.1.66 (both port and IP are configurable). Once connected, it uses a very simple protocol.
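The exchange can be sketched as a small helper that writes one message and collects the reply. This is only a sketch under stated assumptions: the name na_query is hypothetical, messages are assumed to be newline-terminated, and multi-line replies are assumed to end with the 'End Message.' line seen in the example output on this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Send one protocol message and return the reply lines.
# $in and $out are the read and write handles; with a real socket they would
# be the same handle. The sub name is hypothetical.
sub na_query
{
	my ($in, $out, $msg)=@_;

	# Assumption: the device accepts newline-terminated messages.
	print $out "$msg\n";

	my @reply;
	while (my $line=<$in>)
	{
		$line=~s/\r?\n$//;
		# Assumption: multi-line replies end with this line.
		last if $line eq "End Message.";
		push @reply, $line;
	}
	return @reply;
}

# Usage against a real device (default IP and port from this page):
# use IO::Socket::INET;
# my $sock=IO::Socket::INET->new(
# 	PeerAddr	=>	"192.168.1.66",
# 	PeerPort	=>	238,
# 	Proto		=>	"tcp",
# ) or die "Failed to connect: $!\n";
# print "$_\n" foreach na_query($sock, $sock, "00:0");
```

Single-line replies to node commands do not carry the 'End Message.' line in the examples here, so this helper would only suit queries.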

Queries

Any message starting with 00 is a query. That is, in 00:x, the integer x selects the type of query sent to the Node Assassin.

Get the state of all nodes:

 00:0

This will generate a message like:

Node states: 
- Max Node: 08
- Node 01: Running
- Node 02: Running
- Node 03: Running
- Node 04: Running
- Node 05: Running
- Node 06: Running
- Node 07: Running
- Node 08: Running
End Message.
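A caller will usually want this reply as data rather than text. The following is a minimal sketch, assuming the line format shown above; parse_states is a hypothetical name, and the 'Fenced!' line in the sample input is an assumption about how a fenced node appears.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the lines of a '00:0' reply into a hash of node number => state.
sub parse_states
{
	my (@lines)=@_;
	my %state;
	foreach my $line (@lines)
	{
		# Matches lines such as '- Node 02: Running'. The '- Max Node: 08'
		# line does not match and is skipped.
		if ($line=~/^- Node (\d+): (.+?)\s*$/)
		{
			$state{$1}=$2;
		}
	}
	return %state;
}

my %state=parse_states(
	"Node states: ",
	"- Max Node: 08",
	"- Node 01: Running",
	"- Node 02: Fenced!",
	"End Message.",
);
foreach my $node (sort keys %state)
{
	print "Node $node is $state{$node}\n";
}
```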

Get information on the node assassin device:

 00:1

This will generate a message like:

Node info: 
- Node Name: ..... Ariel
- NAOS Version: .. v1.0.3
- Serial Number: . NA0001
- Build Date: .... 2010-02-26
- MAC address: ... 00:09:30:ff:f0:8a
- IP address: .... 192.168.1.66
- Subnet Mask: ... 255.255.255.0
- Default Gateway: 192.168.1.1
End Message.
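The dotted padding in this reply makes it slightly awkward to read by program. As a sketch only, assuming the layout shown above, a hypothetical parse_info helper could strip the padding and return a hash keyed by label:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the lines of a '00:1' reply into a hash of label => value.
sub parse_info
{
	my (@lines)=@_;
	my %info;
	foreach my $line (@lines)
	{
		# '- <label>:', then optional dot and space padding, then the value.
		if ($line=~/^- ([^:]+):[\s.]*(\S.*?)\s*$/)
		{
			$info{$1}=$2;
		}
	}
	return %info;
}

my %info=parse_info(
	"Node info: ",
	"- Node Name: ..... Ariel",
	"- NAOS Version: .. v1.0.3",
	"- MAC address: ... 00:09:30:ff:f0:8a",
	"- Default Gateway: 192.168.1.1",
	"End Message.",
);
print "$info{'Node Name'} runs NAOS $info{'NAOS Version'}\n";
```

Because the padding characters are stripped greedily, a value that itself begins with a dot would lose that dot; none of the fields shown here do.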

In the node state output (00:0), any node that is currently fenced will read 'Fenced!' instead of 'Running'.

Commands

Any string starting with a non-zero integer, like 02:x, is interpreted as a command to execute on that node. The integer x selects the command.

Valid commands are:

  • xx:0 - Fence the node and keep it fenced.
  • xx:1 - Release the fence and allow the node to boot.
  • xx:2 - (Re)boot a node. This will close the switch for 1 second. This is useful when connecting a port to a node's power switch to send a graceful power-off via ACPI or to boot a powered-down server.
  • xx:3 - Force a power-off on a node. This will close the switch for 10 seconds, forcing a locked up node to power down. This is useless when connected to a reset switch and is meant to be used when a port is connected to a power switch.
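The command strings can be built with a small helper. This is a sketch, not part of NAOS: build_command and the %command table are hypothetical names, and the accepted node range of 1 to 99 is an assumption based on the two-digit node field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Command numbers from the list above, with a short description of each.
my %command=(
	0	=>	"fence and hold",
	1	=>	"release fence",
	2	=>	"(re)boot, switch closed for 1 second",
	3	=>	"force power-off, switch closed for 10 seconds",
);

# Build a 'xx:y' command string from a node number and a command number.
sub build_command
{
	my ($node, $cmd)=@_;

	# Node 00 is reserved for queries, so it is rejected here.
	if ($node!~/^\d+$/ or $node<1 or $node>99)
	{
		die "Node must be 1 to 99, got: [$node]\n";
	}
	if (not exists $command{$cmd})
	{
		die "Unknown command: [$cmd]\n";
	}
	return sprintf("%02d:%d", $node, $cmd);
}

print build_command(2, 0), "\n";
```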

Examples:

Fence node '02' (Supports nodes from 01 to 08):

 02:0

This will generate a message like:

 Node 02:0: Now Fenced!

Release '02':

 02:1

This will generate a message like:

 Node 02:1: Fence released!

Fencing or releasing other nodes is as simple as replacing '02' above with the node number. The design allows for expansion up to 99 ports. It should be trivial to go beyond that, but it's a sufficiently high number for now.

Once a node is fenced, the caller can disconnect; the fenced state is held until either another call releases the fence or the Node Assassin is reset.

Fence Agent

Feb. 27, 2010: With the na_v1.1.3 prototype complete, I will now be turning my attention to this fence agent. Progress should be noticeable from here on in.

This is (will be...) the cman fence agent for Node Assassin.

#!/usr/bin/perl
#
# Node Assassin - Fence Agent
# Digimer; digimer@alteeve.com
# Mar. 01, 2010.
# Version: 0.1.002
#
# Bugs;
# - None known, many expected
# 

# Play safe!
use strict;
use warnings;
use IO::Handle;
use Net::Telnet;

my $conf={
	nodes		=>	1,
	max_valid_state	=>	3,
	node		=>	{
			1	=>	{
				name	=>	"Ariel",
				ip	=>	"192.168.1.66",
				port	=>	"238",
				handle	=>	"",
		},
	},
};

# Log file for output.
my $log_file="/tmp/fence_na.log";
my $log=IO::Handle->new();
open($log, ">", $log_file) || die "Failed to open: [$log_file] for writing; Error: $!\n";

# Let's see what we were asked to do.
record($log, "Got args:\n");
my $set_state=[];
my $bad=0;
my $set_next;
foreach my $arg (@ARGV)
{
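	# $set_next is undefined until a '-X' argument is seen, so avoid interpolating undef.
	record($log, "[$arg], set_next: [".(defined $set_next ? $set_next : "")."]\n");
	if (defined $set_next)
	{
		# The previous argument was '-X'; this argument is its node list.
		$set_state->[$set_next]=$arg;
		record($log, "set_next: [$set_next], list: [$arg] ($set_state->[$set_next])\n");
		# Clear the flag so the next argument is parsed on its own.
		$set_next=undef;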
		next;
	}
	if ($arg =~/^--/)
	{
		# Picking up args using --set_state_#=<list>
		my ($i, $list)=($arg=~/--set_state_(\d)=(.*)/);
		if ((not defined $i) or ($i =~ /\D/) or (not $list))
		{
			record($log, "Argument: [$arg] is not valid!\n");
			record($log, "Double-dashed arguments are expected to be '--set_state_X=<list>' where 'X' is\n");
			record($log, "the state to set and '<list>' is the node id or a comma-separated list of node\n");
			record($log, "ids to work on.\n");
			$bad=1;
			# $i and $list are unusable here, so skip the assignment below.
			next;
		}
		$set_state->[$i]=$list;
# 		record($log, "i: [$i], list: [$list] ($set_state->[$i])\n");
	}
	elsif ($arg=~/^-/)
	{
		$arg=~s/^-//;
		$set_next=$arg;
		if ($set_next !~ /^\d+$/)
		{
			record($log, "Argument: [$arg] ($set_next) is not valid!\n");
			record($log, "Single-dashed arguments are expected to be an integer.\n");
			$bad=1;
			# Don't consume the following argument as the list for an invalid state.
			$set_next=undef;
		}
	}
	else
	{
		# Bad arg.
		record($log, "Argument: [$arg] is not valid!\n");
		record($log, "Arguments must be '--set_state_X=<list>' or '-X <list>' where 'X' is the state\n");
		record($log, "to set and '<list>' is a node id or a comma-separated list of nodes to set the\n");
		record($log, "state for.\n");
		$bad=1;
	}
}
die "Exiting on errors.\n" if $bad;
record($log, "Done.\n");

# Connect to the Node Assassin.
$conf->{node}{'1'}{handle}=Net::Telnet->new(
	Timeout	=>	10,
	Errmode	=>	'die',
	Port	=>	$conf->{node}{'1'}{port},
	Prompt	=>	'/EOM$/',
);
# print "Handle: [$conf->{node}{'1'}{handle}]\n";
$conf->{node}{'1'}{handle}->open($conf->{node}{'1'}{ip});

# Query states and Node Assassin info.
record($log, "Checking Node Assassin info:\n");
my @info_out=$conf->{node}{'1'}{handle}->cmd("00:1");
my $node_name="";
foreach my $line (@info_out)
{
	record($log, $line);
	$node_name=$1 if $line=~/- Node Name: ..... (.*)/;
}
record($log, "Node name: [$node_name]\n");
record($log, "Done.\n");

record($log, "Checking states:\n");
my @state_out=$conf->{node}{'1'}{handle}->cmd("00:0");
foreach my $line (@state_out)
{
	record($log, $line);
}
record($log, "Done.\n");

for (my $i=0; $i<=$conf->{max_valid_state}; $i++)
{
	record($log, "Checking if there are IDs to set to state: [$i] - ");
	if (defined $set_state->[$i])
	{
		record($log, "There are.\n");
		my $list=$set_state->[$i];
		if ($list =~ /,/)
		{
			# process multiple node IDs.
			foreach my $id (split/,/, $list)
			{
				$id=sprintf("%02d", $id);
				record($log, "Setting node: [$id] to state: [$i]\n");
				my @set_state=$conf->{node}{'1'}{handle}->cmd("$id:$i");
				foreach my $line (@set_state)
				{
					record($log, $line);
				}
				record($log, "Done.\n");
			}
		}
		else
		{
			# process a single node ID.
			$list=sprintf("%02d", $list);
			record($log, "Setting node: [$list] to state: [$i]\n");
			my @set_state=$conf->{node}{'1'}{handle}->cmd("$list:$i");
			foreach my $line (@set_state)
			{
				record($log, $line);
			}
			record($log, "Done.\n");
		}
	}
	else
	{
		record($log, "none.\n");
	}
}


# Cleanup and exit.
$conf->{node}{'1'}{handle}->close;
$log->close();
exit(0);


sub record
{
	my ($log, $msg)=@_;
	
	print $log $msg;
	print $msg;
	
	return(0);
}
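
The two argument forms the agent accepts, '--set_state_X=<list>' and '-X <list>', can also be parsed into a single structure before anything is sent to the device. The following is a sketch of that idea rather than a replacement for the agent above; parse_args is a hypothetical name, and unlike the agent it dies on the first bad argument instead of logging all of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse '--set_state_X=<list>' and '-X <list>' into state => [node ids],
# with each node id zero-padded to two digits.
sub parse_args
{
	my (@args)=@_;
	my %set;
	while (@args)
	{
		my $arg=shift @args;
		my ($state, $list);
		if ($arg=~/^--set_state_(\d+)=(.+)$/)
		{
			($state, $list)=($1, $2);
		}
		elsif ($arg=~/^-(\d+)$/ and @args)
		{
			# The node list is the next argument.
			($state, $list)=($1, shift @args);
		}
		else
		{
			die "Argument: [$arg] is not valid!\n";
		}

		my @ids=split(/,/, $list);
		if (!@ids or grep { $_!~/^\d+$/ } @ids)
		{
			die "Node list: [$list] is not valid!\n";
		}
		push @{$set{$state}}, map { sprintf("%02d", $_) } @ids;
	}
	return %set;
}

my %set=parse_args("--set_state_0=2,5", "-1", "3");
foreach my $state (sort keys %set)
{
	print "State $state: @{$set{$state}}\n";
}
```

This sketch does not check the state against max_valid_state; a caller would still need to reject states above 3.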

Hardware

Feb. 20, 2010: Current version:

This is the final revision of the Node Assassin hardware for its first release.

v1.1.3

Availability of parts forced a slight change from v1.1.2 to this version:

  • Based on the Arduino Duemilanove using the Ethernet Shield for communication.
  • Arduino's digital pins #2-9 are set to output through 330Ω resistors to 2x LTV-846 opto-isolators, which open and close a given node's reset switch. Supports up to 8 nodes at the moment.
  • 74HC540E 8-Input Inverter for the status LEDs.
  • 1N4148 diodes prevent improper connection of the reset connectors by providing a closed circuit when the reset switch's +5vcc is connected to the ground pin.
  • 8x green and 8x red low-power LEDs are paired up per circuit to show the status of each port.
  • 16x 330Ω resistors. Each Open/Fenced status LED pair's grounds are tied together and connected to ground via a resistor, and each Arduino pin going through the opto-isolators is tied to ground through a resistor.
  • An Arduino Proto Shield v.4 is used to assemble the components onto an Arduino-compatible shield. Given the size restrictions, wire-wrap wire is used to provide connections. It's messy, but it works.

Feb. 27, 2010: The initial proto-board is done!

It's alive! Here ports #02 and #05 are fenced.
Same picture, different angle.
The horrible looking, but functioning, underside!

I still need to do up proper schematics. I will do that shortly.

Per circuit design:

This is the schematic for a given circuit in this design. Credit goes to Mark Loit.

Old Versions


 

Input, advice, complaints and meanderings all welcome!
Digimer digimer@alteeve.ca https://alteeve.ca/w legal stuff:  
All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions. © 1997-2013
Naming credits go to Christopher Olah!
In memory of Kettle, Tonia, Josh, Leah and Harvey. In special memory of Hannah, Jack and Riley.