Anvil! Tutorial 3


Warning: This tutorial is incomplete, flawed and generally sucks at this time. Do not follow this and expect anything to work. In large part, it's a dumping ground for notes and little else. This warning will be removed when the tutorial is completed.
Warning: This tutorial is built on a guess of what Red Hat's Enterprise Linux 7 will offer, based on what the author sees happening in Fedora upstream. Red Hat never confirms what a future release will contain until it is actually released. As such, this tutorial may turn out to be inappropriate for the final release of RHEL 7. In such a case, the warning above will remain in place until the tutorial is updated to reflect the final release.

This is the third AN!Cluster tutorial built on Red Hat's Enterprise Linux 7. It improves on the RHEL 5, RHCS stable 2 and RHEL 6, RHCS stable 3 tutorials.

As with the previous tutorials, the end goal of this tutorial is a 2-node cluster providing a platform for high-availability virtual servers. Its design attempts to remove all single points of failure from the system. Power and networking are made fully redundant in this version, and the node failures that would lead to service interruption are minimized. This tutorial also covers the AN!Utilities: AN!Cluster Dashboard, AN!Cluster Monitor and AN!Safe Cluster Shutdown.

As in the previous tutorials, KVM will be the hypervisor used for facilitating virtual machines. The old cman and rgmanager tools are dropped in favour of pacemaker for resource management.

Before We Begin

This tutorial does not require prior cluster experience, but it does expect familiarity with Linux and a low-intermediate understanding of networking. Where possible, steps are explained in detail and rationale is provided for why certain decisions are made.

For those with cluster experience;

Please be careful not to skip too much. There are some major and some subtle changes from previous tutorials.

OS Setup

Warning: I used Fedora 18 at this point; obviously things will change, possibly a lot, once RHEL 7 is released.

Install

Not all of these are required, but most are used at one point or another in this tutorial.

yum install bridge-utils corosync net-tools network ntp pacemaker pcs rsync syslinux wget fence-agents-all resource-agents
Warning: On Fedora 19, there is a bug in pcsd where a dependency is missing. This manifests when you try to start the pcsd daemon and get the following error;
Jun 15 12:39:44 pcmk1 systemd: Starting PCS GUI...
Jun 15 12:39:44 pcmk1 systemd: Started PCS GUI.
Jun 15 12:39:45 pcmk1 pcsd: Starting pcsd: /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require': cannot load such file -- rpam_ext (LoadError)
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/gemhome/gems/rpam-ruby19-1.2.1/lib/rpam.rb:1:in `<top (required)>'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:110:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:110:in `rescue in require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:35:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/auth.rb:4:in `<top (required)>'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/pcsd.rb:11:in `<top (required)>'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:45:in `require'
Jun 15 12:39:45 pcmk1 pcsd: from /usr/lib/pcsd/ssl.rb:47:in `<main>'
Jun 15 12:39:45 pcmk1 pcsd: [FAILED]
Jun 15 12:39:45 pcmk1 systemd: pcsd.service: main process exited, code=exited, status=1/FAILURE
Jun 15 12:39:45 pcmk1 systemd: Unit pcsd.service entered failed state.

You will need to run the following until this bug is fixed;

yum -y install make gcc ruby-devel pam-devel
gem install rpam
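
With the gem installed, pcsd should start cleanly. You can confirm now if you like (we will properly enable the daemon in the pcs section later);

systemctl start pcsd.service
systemctl status pcsd.service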

Optional stuff:

yum install gpm man vim screen mlocate syslinux

If you want to use your mouse at the node's terminal, run the following;

systemctl enable gpm.service
systemctl start gpm.service

Make the Network Configuration Static

We don't want NetworkManager in our cluster as it tries to dynamically manage the network and we need our network to be static.

yum remove NetworkManager
Note: This assumes that systemd will be used in RHEL7. This may not be the case come release day.

Now to ensure the static network service starts on boot.

systemctl enable network.service

Setting the Hostname

Hostname handling in Fedora 18 is very different from EL6.

Note: The '--pretty' line currently doesn't work as there is a bug (rhbz#895299) with single-quotes.
Note: The '--static' option is currently needed to prevent the '.' from being removed. See this bug (rhbz#896756).

Use a format that works for you. For the tutorial, node names are based on the following;

  • A two-letter prefix identifying the company/user (an, for "Alteeve's Niche!")
  • A sequential cluster ID number in the form of cXX (c01 for "Cluster 01", c02 for "Cluster 02", etc)
  • A sequential node ID number in the form of nYY

In my case, this is my third cluster and I use the company prefix an, so my two nodes will be;

  • an-c03n01 - node 1
  • an-c03n02 - node 2

Folks who've read my earlier tutorials will note that this is a departure in naming. I find this method spans and scales much better. Further, it is simply required in order to use the AN!Cluster Dashboard.

hostnamectl set-hostname an-c03n01.alteeve.ca --static
hostnamectl set-hostname --pretty "Alteeve's Niche! - Cluster 01, Node 01"

If you want the new host name to take effect immediately, you can use the traditional hostname command:

hostname an-c03n01.alteeve.ca

Alternatively

If you have trouble with those commands, you can directly edit the files that contain the host names.

The host name is stored in /etc/hostname:

echo an-c03n01.alteeve.ca > /etc/hostname 
cat /etc/hostname
an-c03n01.alteeve.ca

The "pretty" host name is stored in /etc/machine-info as the unquoted value for the PRETTY_HOSTNAME value.

vim /etc/machine-info
PRETTY_HOSTNAME=Alteeves Niche! - Cluster 01, Node 01

If you can't get the hostname command to work for some reason, you can reboot to have the system read the new values.

Optional - Video Problems

On my servers, Fedora 18 doesn't detect or use the video card properly. To resolve this, I need to add nomodeset to the kernel line when installing and again after the install is complete.

Once installed

Edit /etc/default/grub and append nomodeset to the end of the GRUB_CMDLINE_LINUX variable.

vim /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_CMDLINE_LINUX="nomodeset rd.md=0 rd.lvm=0 rd.dm=0 $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) rd.luks=0 vconsole.keymap=us nomodeset"
GRUB_DISABLE_RECOVERY="true"
GRUB_THEME="/boot/grub2/themes/system/theme.txt"

Save that, and then rewrite the grub2 configuration file.

grub2-mkconfig -o /boot/grub2/grub.cfg

Next time you reboot, you should get a stock 80x25 character display. It's not much, but it will work on esoteric video cards or weird monitors.
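
After rebooting, you can confirm that the argument actually made it onto the running kernel's command line;

cat /proc/cmdline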

What Security?

This section will be re-added at the end. For now;

setenforce 0
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
systemctl disable firewalld.service
systemctl stop firewalld.service
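
As a quick sanity check, getenforce should now report Permissive and firewalld should show as inactive;

getenforce
systemctl status firewalld.service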

Network

We want static, named network devices. The procedure for renaming the interfaces to ifn1, ifn2, sn1, sn2, bcn1 and bcn2 is covered separately; once those names are in place, use these configuration files.

Build the bridge;

vim /etc/sysconfig/network-scripts/ifcfg-ifn-vbr1
# Internet-Facing Network - Bridge
DEVICE="ifn-vbr1"
TYPE="Bridge"
BOOTPROTO="none"
IPADDR="10.255.10.1"
NETMASK="255.255.0.0"
GATEWAY="10.255.255.254"
DNS1="8.8.8.8"
DNS2="8.8.4.4"
DEFROUTE="yes"

Now build the bonds;

vim /etc/sysconfig/network-scripts/ifcfg-ifn-bond1
# Internet-Facing Network - Bond
DEVICE="ifn-bond1"
BRIDGE="ifn-vbr1"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=ifn1"
vim /etc/sysconfig/network-scripts/ifcfg-sn-bond1
# Storage Network - Bond
DEVICE="sn-bond1"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=sn1"
IPADDR="10.10.10.1"
NETMASK="255.255.0.0"
vim /etc/sysconfig/network-scripts/ifcfg-bcn-bond1
# Back-Channel Network - Bond
DEVICE="bcn-bond1"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=bcn1"
IPADDR="10.20.10.1"
NETMASK="255.255.0.0"

Now tell the interfaces to be slaves to their bonds;

Internet-Facing Network;

vim /etc/sysconfig/network-scripts/ifcfg-ifn1
# Internet-Facing Network - Link 1
DEVICE="ifn1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="ifn-bond1"
vim /etc/sysconfig/network-scripts/ifcfg-ifn2
# Internet-Facing Network - Link 2
DEVICE="ifn2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="ifn-bond1"

Storage Network;

vim /etc/sysconfig/network-scripts/ifcfg-sn1
# Storage Network - Link 1
DEVICE="sn1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="sn-bond1"
vim /etc/sysconfig/network-scripts/ifcfg-sn2
# Storage Network - Link 2
DEVICE="sn2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="sn-bond1"

Back-Channel Network

vim /etc/sysconfig/network-scripts/ifcfg-bcn1
# Back-Channel Network - Link 1
DEVICE="bcn1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="bcn-bond1"
vim /etc/sysconfig/network-scripts/ifcfg-bcn2
# Back-Channel Network - Link 2
DEVICE="bcn2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="bcn-bond1"

Now restart the network, confirm that the bonds and bridge are up and you are ready to proceed.
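
For example, assuming the bond and bridge names used above;

systemctl restart network.service
cat /proc/net/bonding/ifn-bond1
cat /proc/net/bonding/sn-bond1
cat /proc/net/bonding/bcn-bond1
brctl show ifn-vbr1
ip addr show ifn-vbr1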

Setup The hosts File

You can use DNS if you prefer. For now, let's use /etc/hosts for node name resolution.

vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# AN!Cluster 03, Node 01
10.255.10.1     an-c03n01.ifn
10.10.10.1      an-c03n01.sn
10.20.10.1      an-c03n01.bcn an-c03n01 an-c03n01.alteeve.ca
10.20.31.1      an-c03n01.ipmi

# AN!Cluster 03, Node 02
10.255.10.2     an-c03n02.ifn
10.10.10.2      an-c03n02.sn
10.20.10.2      an-c03n02.bcn an-c03n02 an-c03n02.alteeve.ca
10.20.31.2      an-c03n02.ipmi

# Foundation Pack
10.20.2.7       an-p03 an-p03.alteeve.ca
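
You can confirm that the new names resolve as expected with getent; for example;

getent hosts an-c03n01 an-c03n01.sn an-c03n01.ifn an-c03n01.ipmi
getent hosts an-c03n02 an-c03n02.sn an-c03n02.ifn an-c03n02.ipmi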

Setup SSH

Same as before.
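
In rough strokes, and only as a sketch of what the earlier tutorials cover in detail; generate a passphrase-less RSA key pair on each node, gather both public keys into ~/.ssh/authorized_keys, then copy the finished file to both nodes.

ssh-keygen -t rsa -N "" -b 4096 -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Repeat the two commands above on an-c03n02, append its public key to this
# file as well, then push the completed file over to the peer;
rsync -av ~/.ssh/authorized_keys root@an-c03n02:/root/.ssh/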

Populating And Pushing ~/.ssh/known_hosts

Same as before.

ssh root@an-c03n01.alteeve.ca
The authenticity of host 'an-c03n01.alteeve.ca (10.20.30.1)' can't be established.
RSA key fingerprint is 7b:dd:0d:aa:c5:f5:9e:a6:b6:4d:40:69:d6:80:4d:09.
Are you sure you want to continue connecting (yes/no)?

Type yes

Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'an-c03n01.alteeve.ca,10.20.30.1' (RSA) to the list of known hosts.
Last login: Thu Feb 14 15:18:33 2013 from 10.20.5.100

You will now be logged into the an-c03n01 node, which in this case is the same machine on a new session in the same terminal.

[root@an-c03n01 ~]#

You can logout by typing exit.

exit
logout
Connection to an-c03n01.alteeve.ca closed.

Now we have to repeat the steps for all the other variations on the names of the hosts. This is annoying and tedious, sorry.

ssh root@an-c03n01
ssh root@an-c03n01.bcn
ssh root@an-c03n01.sn
ssh root@an-c03n01.ifn
ssh root@an-c03n02.alteeve.ca
ssh root@an-c03n02
ssh root@an-c03n02.bcn
ssh root@an-c03n02.sn
ssh root@an-c03n02.ifn

Your ~/.ssh/known_hosts file will now be populated with both nodes' ssh fingerprints. Copy it over to the second node to save all that typing a second time.

rsync -av ~/.ssh/known_hosts root@an-c03n02:/root/.ssh/

Keeping Time in Sync

It's not as critical as it used to be to keep the clocks on the nodes in sync, but it's still a good idea.

systemctl start ntpd.service
systemctl enable ntpd.service
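
Give ntpd a minute or so to reach its servers, then confirm that it is syncing;

ntpq -p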

Configuring IPMI

These are Fedora 19 specifics, based on the IPMI tutorial.

yum -y install ipmitool OpenIPMI
systemctl start ipmi.service
systemctl enable ipmi.service
ln -s '/usr/lib/systemd/system/ipmi.service' '/etc/systemd/system/multi-user.target.wants/ipmi.service'

Our servers use lan channel 2, yours might be 1 or something else. Experiment.
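
If you aren't sure which channel your BMC uses, you can simply probe the first few; invalid channels just return an error.

for i in 1 2 3 4; do echo "== Channel $i"; ipmitool lan print $i; done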

ipmitool lan print 2
Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD 
Auth Type Enable        : Callback : NONE MD5 PASSWORD 
                        : User     : NONE MD5 PASSWORD 
                        : Operator : NONE MD5 PASSWORD 
                        : Admin    : NONE MD5 PASSWORD 
                        : OEM      : NONE MD5 PASSWORD 
IP Address Source       : BIOS Assigned Address
IP Address              : 10.20.51.1
Subnet Mask             : 255.255.0.0
MAC Address             : 00:19:99:9a:d8:e8
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP      : 10.20.255.254
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

I need to set the IPs to 10.20.31.1/16 and 10.20.31.2/16 for nodes 1 and 2, respectively. I also want to set the password to secret for the admin user.

Node 01 IP;

ipmitool lan set 2 ipsrc static
ipmitool lan set 2 ipaddr 10.20.31.1
ipmitool lan set 2 netmask 255.255.0.0
ipmitool lan set 2 defgw ipaddr 10.20.255.254
ipmitool lan print 2
Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD 
Auth Type Enable        : Callback : NONE MD5 PASSWORD 
                        : User     : NONE MD5 PASSWORD 
                        : Operator : NONE MD5 PASSWORD 
                        : Admin    : NONE MD5 PASSWORD 
                        : OEM      : NONE MD5 PASSWORD 
IP Address Source       : Static Address
IP Address              : 10.20.31.1
Subnet Mask             : 255.255.0.0
MAC Address             : 00:19:99:9a:d8:e8
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP      : 10.20.255.254
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

Node 02 IP;

ipmitool lan set 2 ipsrc static
ipmitool lan set 2 ipaddr 10.20.31.2
ipmitool lan set 2 netmask 255.255.0.0
ipmitool lan set 2 defgw ipaddr 10.20.255.254
ipmitool lan print 2
Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD 
Auth Type Enable        : Callback : NONE MD5 PASSWORD 
                        : User     : NONE MD5 PASSWORD 
                        : Operator : NONE MD5 PASSWORD 
                        : Admin    : NONE MD5 PASSWORD 
                        : OEM      : NONE MD5 PASSWORD 
IP Address Source       : Static Address
IP Address              : 10.20.31.2
Subnet Mask             : 255.255.0.0
MAC Address             : 00:19:99:9a:b1:78
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP      : 10.20.255.254
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

Set the password.

ipmitool user list 2
ID  Name	     Callin  Link Auth	IPMI Msg   Channel Priv Limit
1                    true    true       true       Unknown (0x00)
2   admin            true    true       true       OEM
Get User Access command failed (channel 2, user 3): Unknown (0x32)

(ignore the error, it's harmless... *BOOM*)

We want to set admin's password, so we do:

Note: The 2 below is the ID number, not the LAN channel.
ipmitool user set password 2 secret
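
You can verify the new credentials over the network from the peer node. This assumes the an-c03n01.ipmi entry from /etc/hosts above; depending on your BMC, you may need -I lanplus (as shown) or the plain lan interface.

ipmitool -I lanplus -H an-c03n01.ipmi -U admin -P secret chassis power status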

Done!

Configuring the Cluster

Now we're getting down to business!

For this section, we will be working on an-c03n01 and using ssh to perform tasks on an-c03n02.

Note: TODO: explain what this is and how it works.

Enable the pcs Daemon

Note: Most of this section comes more or less verbatim from the main Clusters from Scratch tutorial.

We will use pcs, the Pacemaker Configuration System, to configure our cluster.

systemctl start pcsd.service
systemctl enable pcsd.service
ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'

Now we need to set a password for the hacluster user. This is the account used by pcs on one node to talk to the pcs daemon on the other node. For this tutorial, we will use the password secret. You will want to use a stronger password, of course.

echo secret | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
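
Remember that pcs needs this account and password on both nodes. Assuming the passwordless ssh setup from earlier, you can set it on the peer without logging in separately;

ssh root@an-c03n02 "echo secret | passwd --stdin hacluster"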

Initializing the Cluster

One of the biggest reasons we're using the pcs tool, over something like crm, is that it has been written to simplify the setup of clusters on Red Hat style operating systems. It will configure corosync automatically.

First, authenticate against the cluster nodes.

pcs cluster auth an-c03n01 an-c03n02

This will ask you for the user name and password. The default user name is hacluster and we set the password to secret.

Username: hacluster
Password: 
an-c03n01: Authorized
an-c03n02: Authorized

Do this on one node only:

Now to initialize the cluster's communication and membership layer.

pcs cluster setup an-cluster-03 an-c03n01.alteeve.ca an-c03n02.alteeve.ca
an-c03n01: Succeeded
an-c03n02: Succeeded

This will create the corosync configuration file /etc/corosync/corosync.conf;

cat /etc/corosync/corosync.conf
totem {
version: 2
secauth: off
cluster_name: an-cluster-03
transport: udpu
}

nodelist {
  node {
        ring0_addr: an-c03n01.alteeve.ca
        nodeid: 1
       }
  node {
        ring0_addr: an-c03n02.alteeve.ca
        nodeid: 2
       }
}

quorum {
provider: corosync_votequorum
}

logging {
to_syslog: yes
}

Start the Cluster For the First Time

This starts the cluster communication and membership layer for the first time.

On one node only;

pcs cluster start --all
an-c03n01.alteeve.ca: Starting Cluster...
an-c03n02.alteeve.ca: Starting Cluster...

After a few moments, you should be able to check the status;

pcs status
Cluster name: an-cluster-03
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon Jun 24 23:28:29 2013
Last change: Mon Jun 24 23:28:10 2013 via crmd on an-c03n01.alteeve.ca
Current DC: NONE
2 Nodes configured, unknown expected votes
0 Resources configured.


Node an-c03n01.alteeve.ca (1): UNCLEAN (offline)
Node an-c03n02.alteeve.ca (2): UNCLEAN (offline)

Full list of resources:

The other node should show almost identical output.

The main thing to note here is the warning about stonith (fencing) not being configured. We will fix this very shortly, but for just this moment, we will disable it, along with quorum.

pcs property set stonith-enabled=false
pcs status
Cluster name: an-cluster-03
Last updated: Tue Jun 25 00:12:21 2013
Last change: Tue Jun 25 01:42:04 2013 via cibadmin on an-c03n01.alteeve.ca
Stack: corosync
Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-3.1670.377aefd.git.el7-377aefd
2 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]

Full list of resources:

Disabling Quorum

Note: Show the math.

With quorum enabled, a two-node cluster will lose quorum the moment either node fails, leaving the surviving node unable to keep services running. So we have to disable quorum.
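
The arithmetic, for the two-node case;

expected votes             = 2
quorum                     = floor(2 / 2) + 1 = 2
votes left after a failure = 1

With only one vote remaining against a required two, the survivor is inquorate and, under the default policy, would stop managing resources.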

By default, pacemaker uses quorum. You don't see this initially though;

pcs property
Cluster Properties:
 dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
 cluster-infrastructure: corosync

To disable it, we set no-quorum-policy=ignore.

pcs property set no-quorum-policy=ignore
pcs property
Cluster Properties:
 dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
 cluster-infrastructure: corosync
 no-quorum-policy: ignore

Enabling and Configuring Fencing

We will use IPMI and PDU based fence devices for redundancy.

You can see the list of available fence agents with the command below. You will need to find the ones for your hardware fence devices.

pcs stonith list
fence_alom - Fence agent for Sun ALOM
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC over SNMP
fence_baytech - I/O Fencing agent for Baytech RPC switches in combination with a Cyclades Terminal Server
fence_bladecenter - Fence agent for IBM BladeCenter
fence_brocade - Fence agent for Brocade over telnet
fence_bullpap - I/O Fencing agent for Bull FAME architecture controlled by a PAP management console.
fence_cisco_mds - Fence agent for Cisco MDS
fence_cisco_ucs - Fence agent for Cisco UCS
fence_cpint - I/O Fencing agent for GFS on s390 and zSeries VM clusters
fence_drac - fencing agent for Dell Remote Access Card
fence_drac5 - Fence agent for Dell DRAC CMC/5
fence_eaton_snmp - Fence agent for Eaton over SNMP
fence_egenera - I/O Fencing agent for the Egenera BladeFrame
fence_eps - Fence agent for ePowerSwitch
fence_hpblade - Fence agent for HP BladeSystem
fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
fence_idrac - Fence agent for IPMI over LAN
fence_ifmib - Fence agent for IF MIB
fence_ilo - Fence agent for HP iLO
fence_ilo2 - Fence agent for HP iLO
fence_ilo3 - Fence agent for IPMI over LAN
fence_ilo_mp - Fence agent for HP iLO MP
fence_imm - Fence agent for IPMI over LAN
fence_intelmodular - Fence agent for Intel Modular
fence_ipdu - Fence agent for iPDU over SNMP
fence_ipmilan - Fence agent for IPMI over LAN
fence_kdump - Fence agent for use with kdump
fence_ldom - Fence agent for Sun LDOM
fence_lpar - Fence agent for IBM LPAR
fence_mcdata - I/O Fencing agent for McData FC switches
fence_rackswitch - fence_rackswitch - I/O Fencing agent for RackSaver RackSwitch
fence_rhevm - Fence agent for RHEV-M REST API
fence_rsa - Fence agent for IBM RSA
fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
fence_sanbox2 - Fence agent for QLogic SANBox2 FC switches
fence_scsi - fence agent for SCSI-3 persistent reservations
fence_virsh - Fence agent for virsh
fence_vixel - I/O Fencing agent for Vixel FC switches
fence_vmware - Fence agent for VMWare
fence_vmware_soap - Fence agent for VMWare over SOAP API
fence_wti - Fence agent for WTI
fence_xcat - I/O Fencing agent for xcat environments
fence_xenapi - XenAPI based fencing for the Citrix XenServer virtual machines.
fence_zvm - I/O Fencing agent for GFS on s390 and zSeries VM clusters

We will use fence_ipmilan and fence_apc_snmp.

Configuring IPMI Fencing

Every fence agent has a possibly unique subset of options that can be used. You can see a brief description of these options with the pcs stonith describe fence_X command. Let's look at the options available for fence_ipmilan.

pcs stonith describe fence_ipmilan
Stonith options for: fence_ipmilan
  auth: IPMI Lan Auth type (md5, password, or none)
  ipaddr: IPMI Lan IP to talk to
  passwd: Password (if required) to control power on IPMI device
  passwd_script: Script to retrieve password (if required)
  lanplus: Use Lanplus
  login: Username/Login (if required) to control power on IPMI device
  action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata
  timeout: Timeout (sec) for IPMI operation
  cipher: Ciphersuite to use (same as ipmitool -C parameter)
  method: Method to fence (onoff or cycle)
  power_wait: Wait X seconds after on/off operation
  delay: Wait X seconds before fencing is started
  privlvl: Privilege level on IPMI device
  verbose: Verbose mode

One of the nice things about pcs is that it allows us to create a test file in which to prepare all of our changes. Then, when we're happy with the changes, we can merge them into the running cluster. So let's make a copy called stonith_cfg;

pcs cluster cib stonith_cfg

Now add fencing.

#   temp file                     unique name    fence agent   target node                          device addr           options
pcs -f stonith_cfg stonith create ipmi-an-c03n01 fence_ipmilan pcmk_host_list="an-c03n01 an-c03n02" ipaddr=an-c03n01.ipmi pcmk_reboot_action=reboot login=admin passwd=secret delay=15 op monitor interval=60s
pcs -f stonith_cfg stonith create ipmi-an-c03n02 fence_ipmilan pcmk_host_list="an-c03n01 an-c03n02" ipaddr=an-c03n02.ipmi pcmk_reboot_action=reboot login=admin passwd=secret op monitor interval=60s

Note that ipmi-an-c03n01 has delay=15 set but ipmi-an-c03n02 does not. If the network connection breaks between the two nodes, they will both try to fence each other at the same time. If acpid is running, the slower node will not die right away. It will continue to run for up to four more seconds, ample time for it to also initiate a fence against the faster node. The end result is that both nodes get fenced. The fifteen-second delay protects against this by causing an-c03n02 to pause for 15 seconds before initiating a fence against an-c03n01. If both nodes are alive, an-c03n02 will power off before the 15 seconds pass, so it will never fence an-c03n01. However, if an-c03n01 really is dead, fencing will proceed as normal once the 15 seconds have elapsed.

We can check the new configuration now;

pcs -f stonith_cfg stonith
 ipmi-an-c03n01	(stonith:fence_ipmilan):	Stopped 
 ipmi-an-c03n02	(stonith:fence_ipmilan):	Stopped

Before we proceed, we need to tell pacemaker to use fencing;

pcs -f stonith_cfg property set stonith-enabled=true
pcs -f stonith_cfg property
Cluster Properties:
 dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
 cluster-infrastructure: corosync
 no-quorum-policy: ignore
 stonith-enabled: true

Excellent. Now we'll merge these changes into the active cluster configuration.

pcs cluster push cib stonith_cfg
CIB updated
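
You can confirm that the fence devices are now part of the live configuration and starting up;

pcs stonith
pcs status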

Notes

These are raw notes I keep for myself while working on this tutorial. This section will be removed before the tutorial is complete.

Complex Fencing Topology

crm configure primitive fence_n01_ipmi stonith:fence_ipmilan params ipaddr="an-c03n01.ipmi" pcmk_reboot_action="reboot" login="admin" passwd="secret" pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n02_ipmi stonith:fence_ipmilan params ipaddr="an-c03n02.ipmi" pcmk_reboot_action="reboot" login="admin" passwd="secret" pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n01_psu1_off stonith:fence_apc_snmp params ipaddr="an-p01" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n01_psu2_off stonith:fence_apc_snmp params ipaddr="an-p02" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n01_psu1_on stonith:fence_apc_snmp params ipaddr="an-p01" pcmk_reboot_action="on" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n01_psu2_on stonith:fence_apc_snmp params ipaddr="an-p02" pcmk_reboot_action="on" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
crm configure primitive fence_n02_psu1_off stonith:fence_apc_snmp params ipaddr="an-p01" pcmk_reboot_action="off" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n02_psu2_off stonith:fence_apc_snmp params ipaddr="an-p02" pcmk_reboot_action="off" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n02_psu1_on stonith:fence_apc_snmp params ipaddr="an-p01" pcmk_reboot_action="on" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
crm configure primitive fence_n02_psu2_on stonith:fence_apc_snmp params ipaddr="an-p02" pcmk_reboot_action="on" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
crm configure location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
crm configure location loc_fence_n02_ipmi fence_n02_ipmi -inf: an-c03n02.alteeve.ca
crm configure location loc_fence_n01_psu1_off fence_n01_psu1_off -inf: an-c03n01.alteeve.ca
crm configure location loc_fence_n01_psu2_off fence_n01_psu2_off -inf: an-c03n01.alteeve.ca
crm configure location loc_fence_n01_psu1_on fence_n01_psu1_on -inf: an-c03n01.alteeve.ca
crm configure location loc_fence_n01_psu2_on fence_n01_psu2_on -inf: an-c03n01.alteeve.ca
crm configure location loc_fence_n02_psu1_off fence_n02_psu1_off -inf: an-c03n02.alteeve.ca
crm configure location loc_fence_n02_psu2_off fence_n02_psu2_off -inf: an-c03n02.alteeve.ca
crm configure location loc_fence_n02_psu1_on fence_n02_psu1_on -inf: an-c03n02.alteeve.ca
crm configure location loc_fence_n02_psu2_on fence_n02_psu2_on -inf: an-c03n02.alteeve.ca
crm configure fencing_topology an-c03n01.alteeve.ca: fence_n01_ipmi fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on an-c03n02.alteeve.ca: fence_n02_ipmi fence_n02_psu1_off,fence_n02_psu2_off,fence_n02_psu1_on,fence_n02_psu2_on
crm configure property stonith-enabled="true"
crm configure show
node $id="1" an-c03n01.alteeve.ca
node $id="2" an-c03n02.alteeve.ca
primitive fence_n01_ipmi stonith:fence_ipmilan \
        params ipaddr="an-c03n01.ipmi" pcmk_reboot_action="reboot" login="admin" passwd="secret" pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu1_off stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu1_on stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="on" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu2_off stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu2_on stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="on" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n02_ipmi stonith:fence_ipmilan \
        params ipaddr="an-c03n02.ipmi" pcmk_reboot_action="reboot" login="admin" passwd="secret" pcmk_host_list="an-c03n02.alteeve.ca" \
        meta target-role="Started"
primitive fence_n02_psu1_off stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="off" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
primitive fence_n02_psu1_on stonith:fence_apc_snmp \
        params ipaddr="an-p01" pcmk_reboot_action="on" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
primitive fence_n02_psu2_off stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="off" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
primitive fence_n02_psu2_on stonith:fence_apc_snmp \
        params ipaddr="an-p02" pcmk_reboot_action="on" port="2" pcmk_host_list="an-c03n02.alteeve.ca"
location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu1_off fence_n01_psu1_off -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu1_on fence_n01_psu1_on -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu2_off fence_n01_psu2_off -inf: an-c03n01.alteeve.ca
location loc_fence_n01_psu2_on fence_n01_psu2_on -inf: an-c03n01.alteeve.ca
location loc_fence_n02_ipmi fence_n02_ipmi -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu1_off fence_n02_psu1_off -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu1_on fence_n02_psu1_on -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu2_off fence_n02_psu2_off -inf: an-c03n02.alteeve.ca
location loc_fence_n02_psu2_on fence_n02_psu2_on -inf: an-c03n02.alteeve.ca
fencing_topology \
        an-c03n01.alteeve.ca: fence_n01_ipmi fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on \
        an-c03n02.alteeve.ca: fence_n02_ipmi fence_n02_psu1_off,fence_n02_psu2_off,fence_n02_psu1_on,fence_n02_psu2_on
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-3.1733.a903e62.git.el7-a903e62" \
        cluster-infrastructure="corosync" \
        no-quorum-policy="ignore" \
        stonith-enabled="true"
cat /var/lib/pacemaker/cib/cib.xml
<cib epoch="38" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Jun 27 12:20:19 2013" update-origin="an-c03n01.alteeve.ca" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-3.1733.a903e62.git.el7-a903e62"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="an-c03n01.alteeve.ca"/>
      <node id="2" uname="an-c03n02.alteeve.ca"/>
    </nodes>
    <resources>
      <primitive class="stonith" id="fence_n01_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_n01_ipmi-instance_attributes">
          <nvpair id="fence_n01_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-c03n01.ipmi"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="reboot"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-login" name="login" value="admin"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_n02_ipmi-instance_attributes">
          <nvpair id="fence_n02_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-c03n02.ipmi"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="reboot"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-login" name="login" value="admin"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
        <meta_attributes id="fence_n02_ipmi-meta_attributes">
          <nvpair id="fence_n02_ipmi-meta_attributes-target-role" name="target-role" value="Started"/>
        </meta_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu1_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu1_off-instance_attributes">
          <nvpair id="fence_n01_psu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
          <nvpair id="fence_n01_psu1_off-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n01_psu1_off-instance_attributes-port" name="port" value="1"/>
          <nvpair id="fence_n01_psu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu1_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu1_on-instance_attributes">
          <nvpair id="fence_n01_psu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
          <nvpair id="fence_n01_psu1_on-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n01_psu1_on-instance_attributes-port" name="port" value="1"/>
          <nvpair id="fence_n01_psu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu2_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu2_off-instance_attributes">
          <nvpair id="fence_n01_psu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
          <nvpair id="fence_n01_psu2_off-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n01_psu2_off-instance_attributes-port" name="port" value="1"/>
          <nvpair id="fence_n01_psu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n01_psu2_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_psu2_on-instance_attributes">
          <nvpair id="fence_n01_psu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
          <nvpair id="fence_n01_psu2_on-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n01_psu2_on-instance_attributes-port" name="port" value="1"/>
          <nvpair id="fence_n01_psu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu1_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu1_off-instance_attributes">
          <nvpair id="fence_n02_psu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
          <nvpair id="fence_n02_psu1_off-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n02_psu1_off-instance_attributes-port" name="port" value="2"/>
          <nvpair id="fence_n02_psu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu1_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu1_on-instance_attributes">
          <nvpair id="fence_n02_psu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
          <nvpair id="fence_n02_psu1_on-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n02_psu1_on-instance_attributes-port" name="port" value="2"/>
          <nvpair id="fence_n02_psu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu2_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu2_off-instance_attributes">
          <nvpair id="fence_n02_psu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
          <nvpair id="fence_n02_psu2_off-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="off"/>
          <nvpair id="fence_n02_psu2_off-instance_attributes-port" name="port" value="2"/>
          <nvpair id="fence_n02_psu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
      <primitive class="stonith" id="fence_n02_psu2_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_psu2_on-instance_attributes">
          <nvpair id="fence_n02_psu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
          <nvpair id="fence_n02_psu2_on-instance_attributes-pcmk_reboot_action" name="pcmk_reboot_action" value="on"/>
          <nvpair id="fence_n02_psu2_on-instance_attributes-port" name="port" value="2"/>
          <nvpair id="fence_n02_psu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints>
      <rsc_location id="loc_fence_n01_ipmi" node="an-c03n01.alteeve.ca" rsc="fence_n01_ipmi" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_ipmi" node="an-c03n02.alteeve.ca" rsc="fence_n02_ipmi" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu1_off" node="an-c03n01.alteeve.ca" rsc="fence_n01_psu1_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu1_on" node="an-c03n01.alteeve.ca" rsc="fence_n01_psu1_on" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu2_off" node="an-c03n01.alteeve.ca" rsc="fence_n01_psu2_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n01_psu2_on" node="an-c03n01.alteeve.ca" rsc="fence_n01_psu2_on" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu1_off" node="an-c03n02.alteeve.ca" rsc="fence_n02_psu1_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu1_on" node="an-c03n02.alteeve.ca" rsc="fence_n02_psu1_on" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu2_off" node="an-c03n02.alteeve.ca" rsc="fence_n02_psu2_off" score="-INFINITY"/>
      <rsc_location id="loc_fence_n02_psu2_on" node="an-c03n02.alteeve.ca" rsc="fence_n02_psu2_on" score="-INFINITY"/>
    </constraints>
    <fencing-topology>
      <fencing-level devices="fence_n01_ipmi" id="fencing" index="1" target="an-c03n01.alteeve.ca"/>
      <fencing-level devices="fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on" id="fencing-3" index="2" target="an-c03n01.alteeve.ca"/>
      <fencing-level devices="fence_n02_ipmi" id="fencing-1" index="1" target="an-c03n02.alteeve.ca"/>
      <fencing-level devices="fence_n02_psu1_off,fence_n02_psu2_off,fence_n02_psu1_on,fence_n02_psu2_on" id="fencing-4" index="2" target="an-c03n02.alteeve.ca"/>
    </fencing-topology>
  </configuration>
</cib>

DRBD

We will use DRBD 8.4.

yum -y install drbd drbd-pacemaker drbd-bash-completion


Thanks

This list will certainly grow as this tutorial progresses;

 

Any questions, feedback, advice, complaints or meanderings are welcome.