Latest revision as of 16:48, 19 November 2016

AN!Wiki :: How To :: Anvil! Tutorial 3

Warning: This tutorial is incomplete, flawed and generally sucks at this time. Do not follow this and expect anything to work. In large part, it's a dumping ground for notes and little else. This warning will be removed when the tutorial is completed.

This is the third Anvil! tutorial built on Red Hat's Enterprise Linux 7. It marks the third generation of the Anvil! High-Availability Platform.

As with the previous tutorials, the end goal of this tutorial is an Anvil! platform for high-availability virtual servers. It's design attempts to remove all single points of failure from the system. Power and networking are made fully redundant in this version, along with minimizing the node failures which would lead to service interruption. This tutorial also covers the Striker dashboard and ScanCore monitoring and self-healing tools.

As it the previous tutorial, KVM will be the hypervisor used for facilitating virtual machines. The old cman and rgmanager tools are replaced in favour of pacemaker for resource management.

Before We Begin

This tutorial does not require prior Anvil! experience (or any clustering experience), but it does expect a certain familiarity with Linux and a low-intermediate understanding of networking. Where possible, steps are explained in detail and rationale is provided for why certain decisions are made.

For those with Anvil! experience;

Please be careful not to skip too much. There are some major and some subtle changes from previous tutorials.

OS Setup

This tutorial assumes a minimal install of either RHEL or CentOS version 7.

Post OS Install

Note: With RHEL7, biosdevname tries to give network devices predictable names. It's very likely that your initial device names will differ from those in this tutorial.

If you are running RHEL

Before you can download any packages, you will need to register your nodes with Red Hat's subscription manager;

an-a04n01

subscription-manager register --username $username --password $password --auto-attach
subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
subscription-manager repos --enable=rhel-7-server-optional-rpms

The system has been registered with ID: 9c578d87-bd80-4637-9f41-6076efb9e20e

Installed Product Current Status:
Product Name: Red Hat Enterprise Linux Server
Status:       Subscribed

an-a04n02

subscription-manager register --username $username --password $password --auto-attach
subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
subscription-manager repos --enable=rhel-7-server-optional-rpms

The system has been registered with ID: a55c83e5-e4ec-4fcf-b7b7-b9455b3e07cf

Installed Product Current Status:
Product Name: Red Hat Enterprise Linux Server
Status:       Subscribed

Adding LINBIT Repos

If you purchased full LINBIT support, you can add their repos in order to get DRBD 9 and associated tools.

First, download their registration tool.

an-a04n01

cd /root
wget https://my.linbit.com/linbit-manage-node.py

--2016-11-19 10:22:21--  https://my.linbit.com/linbit-manage-node.py
Resolving my.linbit.com (my.linbit.com)... 212.69.166.235
Connecting to my.linbit.com (my.linbit.com)|212.69.166.235|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26797 (26K) [application/x-python-script]
Saving to: ‘linbit-manage-node.py’

100%[========================================================================================>] 26,797      --.-K/s   in 0.1s    

2016-11-19 10:22:21 (175 KB/s) - ‘linbit-manage-node.py’ saved [26797/26797]

an-a04n02

cd /root
wget https://my.linbit.com/linbit-manage-node.py

--2016-11-19 10:26:52--  https://my.linbit.com/linbit-manage-node.py
Resolving my.linbit.com (my.linbit.com)... 212.69.166.235
Connecting to my.linbit.com (my.linbit.com)|212.69.166.235|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26797 (26K) [application/x-python-script]
Saving to: ‘linbit-manage-node.py’

100%[========================================================================================>] 26,797      --.-K/s   in 0.1s    

2016-11-19 10:26:53 (182 KB/s) - ‘linbit-manage-node.py’ saved [26797/26797]

Make it executable.

an-a04n01

chmod 755 linbit-manage-node.py
ls -lah linbit-manage-node.py

-rwxr-xr-x. 1 root root 27K Oct 11 05:54 linbit-manage-node.py

an-a04n02

chmod 755 linbit-manage-node.py

-rwxr-xr-x. 1 root root 27K Oct 11 05:54 linbit-manage-node.py

Note: If you get the error: 'ERR: Could not detect MAC addresses of your node', then the version of 'linbit-manage-node.py' does not yet recognise bridges or slaved interfaces in bonds. For now, you can download a modified version from Alteeve instead.

Now run the tool interactively.

an-a04n01

/root/linbit-manage-node.py

linbit-manage-node.py (Version: 1.11)
Checking if version is up to date
[OK] Your version is up to date
Username:

an-a04n02

/root/linbit-manage-node.py

linbit-manage-node.py (Version: 1.11)
Checking if version is up to date
[OK] Your version is up to date
Username:

Enter the user name and password given to you by LINBIT when you registered with them.

an-a04n01

Username: xxxxxx
Credential (will not be echoed):

[OK] Login successful
The following contracts are available:
Will this node form a cluster with...

1) Contract: silver 2017-01-07 (ID: xxxx)

--> Please enter a number in range and press return:

an-a04n02

Username: xxxxxx
Credential (will not be echoed):

[OK] Login successful
The following contracts are available:
Will this node form a cluster with...

1) Contract: silver 2017-01-07 (ID: xxxx)

--> Please enter a number in range and press return:

If you have multiple contracts, select the number to the left of the contract identification. Otherwise, select '1'.

an-a04n01	--> Please enter a number in range and press return: 1 Writing registration data:
an-a04n02	--> Please enter a number in range and press return: 1 Writing registration data:

Confirm that you want to write out the license file. Once you accept, you will be presented with a menu of which repositories you want to use from LINBIT. We're only going to enable the 'drbd-9.0' repo and leave the pacemaker repos disabled as we'll pull them from Red Hat.

an-a04n01

--> Write to file (/var/lib/drbd-support/registration.json)? [y/N]

  Here are the repositories you can enable:

    1) pacemaker-1.1.15(Disabled)
    2) pacemaker-1.1.12(Disabled)
    3) pacemaker-1.1(Disabled)
    4) drbd-9.0(Disabled)
    5) drbd-8.4(Disabled)

  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.

  Enable/Disable: 4

  Here are the repositories you can enable:

    1) pacemaker-1.1.15(Disabled)
    2) pacemaker-1.1.12(Disabled)
    3) pacemaker-1.1(Disabled)
    4) drbd-9.0(Enabled)
    5) drbd-8.4(Disabled)

  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.

  Enable/Disable: 0

an-a04n02

--> Write to file (/var/lib/drbd-support/registration.json)? [y/N]

  Here are the repositories you can enable:

    1) pacemaker-1.1.15(Disabled)
    2) pacemaker-1.1.12(Disabled)
    3) pacemaker-1.1(Disabled)
    4) drbd-9.0(Disabled)
    5) drbd-8.4(Disabled)

  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.

  Enable/Disable: 4

  Here are the repositories you can enable:

    1) pacemaker-1.1.15(Disabled)
    2) pacemaker-1.1.12(Disabled)
    3) pacemaker-1.1(Disabled)
    4) drbd-9.0(Enabled)
    5) drbd-8.4(Disabled)

  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.

  Enable/Disable: 0

Warning: The repository will include a node-specific hash string in the 'baseurl'. Keep this private!

Once you select '0' to exit that menu, a summary of the repo will be displayed and you will be asked if you want to save it or not.

an-a04n01

Writing repository config:
Content:
[drbd-8.4]
name=LINBIT Packages for drbd-8.4 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-8.4/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[pacemaker-1.1]
name=LINBIT Packages for pacemaker-1.1 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[pacemaker-1.1.15]
name=LINBIT Packages for pacemaker-1.1.15 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.15/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[pacemaker-1.1.12]
name=LINBIT Packages for pacemaker-1.1.12 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.12/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[drbd-9.0]
name=LINBIT Packages for drbd-9.0 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-9.0/$basearch
enabled=1
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

--> Write to file (/etc/yum.repos.d/linbit.repo)? [y/N] y

an-a04n02

Writing repository config:
Content:
[drbd-8.4]
name=LINBIT Packages for drbd-8.4 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-8.4/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[pacemaker-1.1]
name=LINBIT Packages for pacemaker-1.1 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[pacemaker-1.1.15]
name=LINBIT Packages for pacemaker-1.1.15 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.15/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[pacemaker-1.1.12]
name=LINBIT Packages for pacemaker-1.1.12 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.12/$basearch
enabled=0
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

[drbd-9.0]
name=LINBIT Packages for drbd-9.0 - $basearch
baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-9.0/$basearch
enabled=1
gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
gpgcheck=1

--> Write to file (/etc/yum.repos.d/linbit.repo)? [y/N] y

When you accept, it will download the yum plugins and then ask you if you want to save their PGP key.

an-a04n01

[OK] Repository configuration written
Downloading LINBIT yum plugin
Downloading LINBIT yum plugin config
Final Notes:

--> Add linbit signing key to keyring now? [y/N] y

Now update your package information and install
LINBIT's kernel module and/or user space utilities
[OK] Congratulations! Your node was successfully configured.

an-a04n02

[OK] Repository configuration written
Downloading LINBIT yum plugin
Downloading LINBIT yum plugin config
Final Notes:

--> Add linbit signing key to keyring now? [y/N] y

Now update your package information and install
LINBIT's kernel module and/or user space utilities
[OK] Congratulations! Your node was successfully configured.

Done!

Install

Not all of these are required, but most are used at one point or another in this tutorial.

Note: The fence-agents-virsh package is not available in RHEL 7 beta. Further, it's only needed if you're building your Anvil! using VMs.

an-a04n01

yum install rsync pacemaker bridge-utils ntp corosync pcs wget gpm man vim screen mlocate syslinux bzip2 \
            openssh-clients fence-agents-all fence-agents-virsh policycoreutils-python drbd drbd-bash-completion \
            drbd-pacemaker drbd-udev drbd-utils drbdmanage

an-a04n02

<same>

Making ssh faster when the net is down

By default, the nodes will try to resolve the host name of an incoming ssh connection. When the internet connection is down, DNS lookups have to time out, which can make login times quite slow. When something goes wrong, seconds count and waiting for up to a minute for an SSH password prompt can be maddening.

For this reason, we will make two changes to /etc/ssh/sshd_config that disable this login delay.

Please be aware that this can reduce security. If this is a concern, skip this step.

an-a04n01

sed -i.anvil 's/#GSSAPIAuthentication no/GSSAPIAuthentication no/' /etc/ssh/sshd_config
sed -i 's/GSSAPIAuthentication yes/#GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sed -i 's/#UseDNS yes/UseDNS no/' /etc/ssh/sshd_config
systemctl restart sshd.service
diff -u /etc/ssh/sshd_config.anvil /etc/ssh/sshd_config

--- /etc/ssh/sshd_config.anvil	2014-06-09 21:15:52.000000000 -0400
+++ /etc/ssh/sshd_config	2014-07-27 08:41:03.296760761 -0400
@@ -89,8 +89,8 @@
 #KerberosUseKuserok yes
 
 # GSSAPI options
-#GSSAPIAuthentication no
-GSSAPIAuthentication yes
+GSSAPIAuthentication no
+#GSSAPIAuthentication yes
 #GSSAPICleanupCredentials yes
 GSSAPICleanupCredentials yes
 #GSSAPIStrictAcceptorCheck yes
@@ -127,7 +127,7 @@
 #ClientAliveInterval 0
 #ClientAliveCountMax 3
 #ShowPatchLevel no
-#UseDNS yes
+UseDNS no
 #PidFile /var/run/sshd.pid
 #MaxStartups 10:30:100
 #PermitTunnel no

an-a04n02

same

same

Subsequent logins when the net is down should be quick.

Configuring the network

If you want to make any other changes, like configuring the interface to have a static IP, do so now. Once you're done editing;

nmcli connection reload
systemctl restart NetworkManager.service
ip addr show

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:a7:9d:17 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.201/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fea7:9d17/64 scope link 
       valid_lft forever preferred_lft forever

The interface should now start on boot properly.

Setting the Hostname

Fedora 19 is very different from EL6.

Note: The '--pretty' line currently doesn't work as there is a bug (rhbz#895299) with single-quotes.

Note: The '--static' option is currently needed to prevent the '.' from being removed. See this bug (rhbz#896756).

Use a format that works for you. For the tutorial, node names are based on the following;

A two-letter prefix identifying the company/user (an, for "Alteeve's Niche!")
A sequential Anvil! ID number in the form of aXX (a01 for "Anvil! 01", a02 for Anvil! 02, etc)
A sequential node ID number in the form of nYY

In our case, this is my third Anvil! and we use the company prefix an, so these two nodes will be;

an-a04n01 - node 1
an-a04n02 - node 2

hostnamectl set-hostname an-a04n01.alteeve.ca --static
hostnamectl set-hostname --pretty "Alteeve's Niche! - Anvil! 03, Node 01"

If you want the new host name to take effect immediately, you can use the traditional hostname command:

hostname an-a04n01.alteeve.ca

The "pretty" host name is stored in /etc/machine-info as the unquoted value for the PRETTY_HOSTNAME value.

vim /etc/machine-info

PRETTY_HOSTNAME=Alteeves Niche! - Anvil! 03, Node 01

If you can't get the hostname command to work for some reason, you can reboot to have the system read the new values.

Network

Note: (Note for myself) - Consider using 'primary_reselect=1.

We want static, named network devices. Follow this;

Changing Ethernet Device Names in EL7 and Fedora 15+

Then, use these configuration files;

Build the bridge;

vim /etc/sysconfig/network-scripts/ifcfg-ifn_bridge1

# Internet-Facing Network - Bridge
DEVICE="ifn_bridge1"
TYPE="Bridge"
BOOTPROTO="none"
IPADDR="10.255.40.1"
NETMASK="255.255.0.0"
GATEWAY="10.255.255.254"
DNS1="8.8.8.8"
DNS2="8.8.4.4"
DEFROUTE="yes"

Now build the bonds;

vim /etc/sysconfig/network-scripts/ifcfg-ifn_bond1

# Internet-Facing Network - Bond
DEVICE="ifn_bond1"
BRIDGE="ifn_bridge1"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 primary=ifn_link1 updelay=120000 downdelay=0 fail_over_mac=none miimon=100 primary_reselect=better resend_igmp=5"

vim /etc/sysconfig/network-scripts/ifcfg-sn_bond1

# Storage Network - Bond
DEVICE="sn_bond1"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 primary=sn_link1 updelay=120000 downdelay=0 fail_over_mac=none miimon=100 primary_reselect=better resend_igmp=5"
IPADDR="10.10.40.1"
NETMASK="255.255.0.0"

vim /etc/sysconfig/network-scripts/ifcfg-bcn_bond1

# Back-Channel Network - Bond
DEVICE="bcn_bond1"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 primary=bcn_link1 updelay=120000 downdelay=0 fail_over_mac=none miimon=100 primary_reselect=better resend_igmp=5"
IPADDR="10.20.40.1"
NETMASK="255.255.0.0"

Now tell the interfaces to be slaves to their bonds;

Internet-Facing Network;

vim /etc/sysconfig/network-scripts/ifcfg-ifn_link1

# Internet-Facing Network - Link 1
DEVICE="ifn_link1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="ifn_bond1"

vim /etc/sysconfig/network-scripts/ifcfg-ifn_link2

# Internet-Facing Network - Link 2
DEVICE="ifn_link2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="ifn_bond1"

Storage Network;

vim /etc/sysconfig/network-scripts/ifcfg-sn_link1

# Storage Network - Link 1
DEVICE="sn_link1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="sn_bond1"

vim /etc/sysconfig/network-scripts/ifcfg-sn_link2

# Storage Network - Link 2
DEVICE="sn_link2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="sn_bond1"

Back-Channel Network

vim /etc/sysconfig/network-scripts/ifcfg-bcn_link1

# Back-Channel Network - Link 1
DEVICE="bcn_link1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="bcn_bond1"

vim /etc/sysconfig/network-scripts/ifcfg-bcn_link2

# Back-Channel Network - Link 2
DEVICE="bcn_link2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="bcn_bond1"

Now restart the network, confirm that the bonds and bridge are up and you are ready to proceed.

Setup The hosts File

You can use DNS if you prefer. For now, lets use /etc/hosts for node name resolution.

vim /etc/hosts

127.0.0.1	localhost localhost.localdomain localhost4 localhost4.localdomain4
::1		localhost localhost.localdomain localhost6 localhost6.localdomain6

# Anvil! 03, Node 01
10.255.40.1	an-a04n01.ifn
10.10.40.1	an-a04n01.sn
10.20.40.1	an-a04n01.bcn an-a04n01 an-a04n01.alteeve.ca
10.20.41.1	an-a04n01.ipmi

# Anvil! 03, Node 02
10.255.40.2	an-a04n02.ifn
10.10.40.2	an-a04n02.sn
10.20.40.2	an-a04n02.bcn an-a04n02 an-a04n02.alteeve.ca
10.20.41.2	an-a04n02.ipmi

# Foundation Pack
### Foundation Pack
# Network Switches
10.20.1.1	an-switch01 an-switch01.alteeve.ca
10.20.1.2	an-switch02 an-switch02.alteeve.ca	# Only accessible when out of the stack
 
# Switched PDUs
10.20.2.1	an-pdu01 an-pdu01.alteeve.ca
10.20.2.2	an-pdu02 an-pdu02.alteeve.ca
 
# Network-monitored UPSes
10.20.3.1	an-ups01 an-ups01.alteeve.ca
10.20.3.2	an-ups02 an-ups02.alteeve.ca
 
### Monitor Packs
10.20.4.1	an-striker01 an-striker01.alteeve.ca
10.255.4.1	an-striker01.ifn
10.20.4.2	an-striker02 an-striker02.alteeve.ca
10.255.4.2	an-striker02.ifn

Setup SSH

Same as before.

Populating And Pushing ~/.ssh/known_hosts

an-a04n01

ssh-keygen -t rsa -N "" -b 8191 -f ~/.ssh/id_rsa

Generating public/private rsa key pair.

Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
be:17:cc:23:8e:b1:b4:76:a1:e4:2a:91:cb:cd:d8:3a root@an-a04n01.alteeve.ca
The key's randomart image is:
+--[ RSA 8191]----+
|                 |
|                 |
|                 |
|                 |
|   .    So       |
|  o   +.o =      |
| . B + B.o o     |
|  E + B o..      |
|  .+.o ...       |
+-----------------+

an-a04n01

ssh-keygen -t rsa -N "" -b 8191 -f ~/.ssh/id_rsa

Generating public/private rsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
71:b1:9d:31:9f:7a:c9:10:74:e0:4c:69:53:8f:e4:70 root@an-a04n02.alteeve.ca
The key's randomart image is:
+--[ RSA 8191]----+
|          ..O+E  |
|           B+% + |
|        . o.*.= .|
|         o   + . |
|        S   . +  |
|             .   |
|                 |
|                 |
|                 |
+-----------------+

Setup autorized_keys:

an-a04n01

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh root@an-a04n02 "cat /root/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys 
rsync -av ~/.ssh/authorized_keys root@an-a04n02:/root/.ssh/
ssh-keyscan an-a04n01.alteeve.ca >> ~/.ssh/known_hosts
ssh-keyscan an-a04n01 >> ~/.ssh/known_hosts
ssh-keyscan an-a04n01.bcn >> ~/.ssh/known_hosts
ssh-keyscan an-a04n01.sn >> ~/.ssh/known_hosts
ssh-keyscan an-a04n01.ifn >> ~/.ssh/known_hosts
ssh-keyscan an-a04n02.alteeve.ca >> ~/.ssh/known_hosts
ssh-keyscan an-a04n02 >> ~/.ssh/known_hosts
ssh-keyscan an-a04n02.bcn >> ~/.ssh/known_hosts
ssh-keyscan an-a04n02.sn >> ~/.ssh/known_hosts
ssh-keyscan an-a04n02.ifn >> ~/.ssh/known_hosts
rsync -av ~/.ssh/known_hosts root@an-a04n02:/root/.ssh/
rsync -av /etc/hosts root@an-a04n02:/etc/

an-a04n01

Keeping Time in Sync

It's not as critical as it used to be to keep the clocks on the nodes in sync, but it's still a good idea.

ln -sf /usr/share/zoneinfo/America/Toronto /etc/localtime
systemctl start ntpd.service
systemctl enable ntpd.service

Configuring IPMI

F19 specifics based on the IPMI tutorial.

yum -y install ipmitools OpenIPMI
systemctl start ipmi.service
systemctl enable ipmi.service

ln -s '/usr/lib/systemd/system/ipmi.service' '/etc/systemd/system/multi-user.target.wants/ipmi.service'

Our servers use lan channel 2, yours might be 1 or something else. Experiment.

ipmitool lan print 2

Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD 
Auth Type Enable        : Callback : NONE MD5 PASSWORD 
                        : User     : NONE MD5 PASSWORD 
                        : Operator : NONE MD5 PASSWORD 
                        : Admin    : NONE MD5 PASSWORD 
                        : OEM      : NONE MD5 PASSWORD 
IP Address Source       : BIOS Assigned Address
IP Address              : 10.20.41.1
Subnet Mask             : 255.255.0.0
MAC Address             : 00:19:99:9a:d8:e8
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP      : 10.20.255.254
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

I need to set the IPs to 10.20.41.1/16 and 10.20.41.2/16 for nodes 1 and 2, respectively. I also want to set the password to secret for the admin user.

Node 01 IP;

ipmitool lan set 2 ipsrc static
ipmitool lan set 2 ipaddr 10.20.41.1
ipmitool lan set 2 netmask 255.255.0.0
ipmitool lan set 2 defgw ipaddr 10.20.255.254
ipmitool lan print 2

Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD 
Auth Type Enable        : Callback : NONE MD5 PASSWORD 
                        : User     : NONE MD5 PASSWORD 
                        : Operator : NONE MD5 PASSWORD 
                        : Admin    : NONE MD5 PASSWORD 
                        : OEM      : NONE MD5 PASSWORD 
IP Address Source       : Static Address
IP Address              : 10.20.41.1
Subnet Mask             : 255.255.0.0
MAC Address             : 00:19:99:9a:d8:e8
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP      : 10.20.255.254
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

Node 01 IP;

ipmitool lan set 2 ipsrc static
ipmitool lan set 2 ipaddr 10.20.41.2
ipmitool lan set 2 netmask 255.255.0.0
ipmitool lan set 2 defgw ipaddr 10.20.255.254
ipmitool lan print 2

Set in Progress         : Set Complete
Auth Type Support       : NONE MD5 PASSWORD 
Auth Type Enable        : Callback : NONE MD5 PASSWORD 
                        : User     : NONE MD5 PASSWORD 
                        : Operator : NONE MD5 PASSWORD 
                        : Admin    : NONE MD5 PASSWORD 
                        : OEM      : NONE MD5 PASSWORD 
IP Address Source       : Static Address
IP Address              : 10.20.41.2
Subnet Mask             : 255.255.0.0
MAC Address             : 00:19:99:9a:b1:78
SNMP Community String   : public
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
Default Gateway IP      : 10.20.255.254
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM

Set the password.

ipmitool user list 2

ID  Name	     Callin  Link Auth	IPMI Msg   Channel Priv Limit
1                    true    true       true       Unknown (0x00)
2   admin            true    true       true       OEM
Get User Access command failed (channel 2, user 3): Unknown (0x32)

(ignore the error, it's harmless... *BOOM*)

We want to set admin's password, so we do:

Note: The 2 below is the ID number, not the LAN channel.

ipmitool user set password 2 secret

Done!

Configuring the Anvil!

Now we're getting down to business!

For this section, we will be working on an-a04n01 and using ssh to perform tasks on an-a04n02.

Note: TODO: explain what this is and how it works.

Enable the pcs Daemon

Note: Most of this section comes more or less verbatim from the main Clusters from Scratch tutorial.

We will use pcs, the Pacemaker Configuration System, to configure our Anvil!.

Note that pcsd uses TCP port 2224.

systemctl start pcsd.service
systemctl enable pcsd.service

ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'

Now we need to set a password for the hacluster user. This is the account used by pcs on one node to talk to the pcs daemon on the other node. For this tutorial, we will use the password secret. You will want to use a stronger password, of course.

echo "super secret password" | passwd --stdin hacluster

Changing password for user hacluster.
passwd: all authentication tokens updated successfully.

Open up the firewall:

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

Initializing the Cluster

One of the biggest reasons we're using the pcs tool, over something like crm, is that it has been written to simplify the setup of clusters on Red Hat style operating systems. It will configure corosync automatically.

First, we need to know what hostname we will need to use for pcs.

Node 01:

hostname

an-a04n01.alteeve.ca

Node 02:

hostname

an-a04n02.alteeve.ca

Next, authenticate against the cluster nodes.

Both nodes:

pcs cluster auth an-a04n01.alteeve.ca an-a04n02.alteeve.ca -u hacluster

This will ask you for the user name and password. The default user name is hacluster and we set the password to secret.

Password: 
an-a04n01.alteeve.ca: 6e9f7e98-dfb7-4305-b8e0-d84bf4f93ce3
an-a04n01.alteeve.ca: Authorized
an-a04n02.alteeve.ca: ffee6a85-ddac-4d03-9b97-f136d532b478
an-a04n02.alteeve.ca: Authorized

Do this on one node only:

Now to initialize the cluster's communication and membership layer.

pcs cluster setup --name an-anvil-03 an-a04n01.alteeve.ca an-a04n02.alteeve.ca

an-a04n01.alteeve.ca: Succeeded
an-a04n02.alteeve.ca: Succeeded

This will create the corosync configuration file /etc/corosync/corosync.conf;

cat /etc/corosync/corosync.conf

totem {
version: 2
secauth: off
cluster_name: an-anvil-03
transport: udpu
}

nodelist {
  node {
        ring0_addr: an-a04n01.alteeve.ca
        nodeid: 1
       }
  node {
        ring0_addr: an-a04n02.alteeve.ca
        nodeid: 2
       }
}

quorum {
provider: corosync_votequorum
two_node: 1
}

logging {
to_syslog: yes
}

Start the Cluster For the First Time

This starts the cluster communication and membership layer for the first time.

On one node only;

pcs cluster start --all

an-a04n01.alteeve.ca: Starting Cluster...
an-a04n02.alteeve.ca: Starting Cluster...

After a few moments, you should be able to check the status;

pcs status

Cluster name: an-anvil-04
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon Jun 24 23:28:29 2013
Last change: Mon Jun 24 23:28:10 2013 via crmd on an-a04n01.alteeve.ca
Current DC: NONE
2 Nodes configured, unknown expected votes
0 Resources configured.


Node an-a04n01.alteeve.ca (1): UNCLEAN (offline)
Node an-a04n02.alteeve.ca (2): UNCLEAN (offline)

Full list of resources:

The other node should show almost the identical output.

Disabling Quorum

Note: Show the math.

With quorum enabled, a two node cluster will lose quorum once either node fails. So we have to disable quorum.

By default, pacemaker uses quorum. You don't see this initially though;

pcs property

Cluster Properties:
 dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
 cluster-infrastructure: corosync

To disable it, we set no-quorum-policy=ignore.

pcs property set no-quorum-policy=ignore
pcs property

Cluster Properties:
 dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
 cluster-infrastructure: corosync
 no-quorum-policy: ignore

Enabling and Configuring Fencing

We will use IPMI and PDU based fence devices for redundancy.

You can see the list of available fence agents here. You will need to find the one for your hardware fence devices.

pcs stonith list

fence_alom - Fence agent for Sun ALOM
fence_apc - Fence agent for APC over telnet/ssh
fence_apc_snmp - Fence agent for APC over SNMP
fence_baytech - I/O Fencing agent for Baytech RPC switches in combination with a Cyclades Terminal
                Server
fence_bladecenter - Fence agent for IBM BladeCenter
fence_brocade - Fence agent for Brocade over telnet
fence_bullpap - I/O Fencing agent for Bull FAME architecture controlled by a PAP management console.
fence_cisco_mds - Fence agent for Cisco MDS
fence_cisco_ucs - Fence agent for Cisco UCS
fence_cpint - I/O Fencing agent for GFS on s390 and zSeries VM clusters
fence_drac - fencing agent for Dell Remote Access Card
fence_drac5 - Fence agent for Dell DRAC CMC/5
fence_eaton_snmp - Fence agent for Eaton over SNMP
fence_egenera - I/O Fencing agent for the Egenera BladeFrame
fence_eps - Fence agent for ePowerSwitch
fence_hpblade - Fence agent for HP BladeSystem
fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
fence_idrac - Fence agent for IPMI over LAN
fence_ifmib - Fence agent for IF MIB
fence_ilo - Fence agent for HP iLO
fence_ilo2 - Fence agent for HP iLO
fence_ilo3 - Fence agent for IPMI over LAN
fence_ilo_mp - Fence agent for HP iLO MP
fence_imm - Fence agent for IPMI over LAN
fence_intelmodular - Fence agent for Intel Modular
fence_ipdu - Fence agent for iPDU over SNMP
fence_ipmilan - Fence agent for IPMI over LAN
fence_kdump - Fence agent for use with kdump
fence_ldom - Fence agent for Sun LDOM
fence_lpar - Fence agent for IBM LPAR
fence_mcdata - I/O Fencing agent for McData FC switches
fence_rackswitch - fence_rackswitch - I/O Fencing agent for RackSaver RackSwitch
fence_rhevm - Fence agent for RHEV-M REST API
fence_rsa - Fence agent for IBM RSA
fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
fence_sanbox2 - Fence agent for QLogic SANBox2 FC switches
fence_scsi - fence agent for SCSI-3 persistent reservations
fence_virsh - Fence agent for virsh
fence_vixel - I/O Fencing agent for Vixel FC switches
fence_vmware - Fence agent for VMWare
fence_vmware_soap - Fence agent for VMWare over SOAP API
fence_wti - Fence agent for WTI
fence_xcat - I/O Fencing agent for xcat environments
fence_xenapi - XenAPI based fencing for the Citrix XenServer virtual machines.
fence_zvm - I/O Fencing agent for GFS on s390 and zSeries VM clusters

We will use fence_ipmilan and fence_apc_snmp.

Configuring IPMI Fencing

Every fence agent has a possibly unique subset of options that can be used. You can see a brief description of these options with the pcs stonith describe fence_X command. Let's look at the options available for fence_ipmilan.

pcs stonith describe fence_ipmilan

Stonith options for: fence_ipmilan
  auth: IPMI Lan Auth type (md5, password, or none)
  ipaddr: IPMI Lan IP to talk to
  passwd: Password (if required) to control power on IPMI device
  passwd_script: Script to retrieve password (if required)
  lanplus: Use Lanplus
  login: Username/Login (if required) to control power on IPMI device
  action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata
  timeout: Timeout (sec) for IPMI operation
  cipher: Ciphersuite to use (same as ipmitool -C parameter)
  method: Method to fence (onoff or cycle)
  power_wait: Wait X seconds after on/off operation
  delay: Wait X seconds before fencing is started
  privlvl: Privilege level on IPMI device
  verbose: Verbose mode

One of the nice things about pcs is that it allows us to create a test file to prepare all our changes in. Then, when we're happy with the changes, merge them into the running cluster. So let's make a copy called stonith_cfg

pcs cluster cib stonith_cfg

Now add IPMI fencing.

#                  unique name    fence agent   target node                           device addr             options
pcs stonith create fence_n01_ipmi fence_ipmilan pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-a04n01.ipmi" action="reboot" login="admin" passwd="secret" delay=15 op monitor interval=60s
pcs stonith create fence_n02_ipmi fence_ipmilan pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-a04n02.ipmi" action="reboot" login="admin" passwd="secret" op monitor interval=60s

Note that fence_n01_ipmi has a delay=15 set but fence_n02_ipmi does not. If the network connection breaks between the two nodes, they will both try to fence each other at the same time. If acpid is running, the slower node will not die right away. It will continue to run for up to four more seconds, ample time for it to also initiate a fence against the faster node. The end result is that both nodes get fenced. The ten-second delay protects against this by causing an-a04n02 to pause for 10 seconds before initiating a fence against an-a04n01. If both nodes are alive, an-a04n02 will power off before the 10 seconds pass, so it will never fence an-a04n01. However, if an-a04n01 really is dead, after the ten seconds have elapsed, fencing will proceed as normal.

Note: At the time of writing, pcmk_reboot_action is needed to override pacemaker's global fence action and pcmk_reboot_action is not recognized by pcs. Both of these issues will be resolved shortly; Pacemaker will honour action="..." in v1.1.10 and pcs will recognize pcmk_* special attributes "real soon now". Until then, the --force switch is needed.

Next, add the PDU fencing. This requires distinct "off" and "on" actions for each outlet on each PDU. With two nodes, each with two PSUs, this translates to eight commands. The "off" commands will be monitored to alert us if the PDU fails for some reason. There is no reason to monitor the "on" actions (it would be redundant). Note also that we don't bother using a "delay". The IPMI fence method will go first, before the PDU actions, so the PDU is already delayed.

# Node 1 - off
pcs stonith create fence_n01_pdu1_off fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu01" action="off" port="1" op monitor interval="60s"
pcs stonith create fence_n01_pdu2_off fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu02" action="off" port="1" power_wait="5" op monitor interval="60s"

# Node 1 - on
pcs stonith create fence_n01_pdu1_on fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu01" action="on" port="1"
pcs stonith create fence_n01_pdu2_on fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu02" action="on" port="1"

# Node 2 - off
pcs stonith create fence_n02_pdu1_off fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu01" action="off" port="2" op monitor interval="60s"
pcs stonith create fence_n02_pdu2_off fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu02" action="off" port="2" power_wait="5" op monitor interval="60s"

# Node 2 - on
pcs stonith create fence_n02_pdu1_on fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu01" action="on" port="2"
pcs stonith create fence_n02_pdu2_on fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu02" action="on" port="2"

We can check the new configuration now;

pcs status

Cluster name: an-anvil-04
Last updated: Tue Jul  2 16:41:55 2013
Last change: Tue Jul  2 16:41:44 2013 via cibadmin on an-a04n01.alteeve.ca
Stack: corosync
Current DC: an-a04n01.alteeve.ca (1) - partition with quorum
Version: 1.1.9-3.fc19-781a388
2 Nodes configured, unknown expected votes
10 Resources configured.


Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

Full list of resources:

 fence_n01_ipmi	(stonith:fence_ipmilan):	Started an-a04n01.alteeve.ca 
 fence_n02_ipmi	(stonith:fence_ipmilan):	Started an-a04n02.alteeve.ca 
 fence_n01_pdu1_off	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca 
 fence_n01_pdu2_off	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca 
 fence_n02_pdu1_off	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca 
 fence_n02_pdu2_off	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca 
 fence_n01_pdu1_on	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca 
 fence_n01_pdu2_on	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca 
 fence_n02_pdu1_on	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca 
 fence_n02_pdu2_on	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca

Before we proceed, we need to tell pacemaker to use fencing;

pcs property set stonith-enabled=true
pcs property

Cluster Properties:
Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.9-3.fc19-781a388
 no-quorum-policy: ignore
 stonith-enabled: true

Excellent!

Configuring Fence Levels

The goal of fence levels is to tell pacemaker that there are "fence methods" to try and to impose an order on those methods. Each method composes one or more fence primitives and, when 2 or more primitives are tied together, that all primitives must succeed for the overall method to succeed.

So in our case; the order we want is;

IPMI -> PDUs

The reason is that when IPMI fencing succeeds, we can be very certain the node is truly fenced. When PDU fencing succeeds, it only confirms that the power outlets were cycled. If someone moved a node's power cables to another outlet, we'll get a false positive. On that topic, tie-down the node's PSU cables to the PDU's cable tray when possible, clearly label the power cables and wrap the fingers of anyone who might move them around.

The PDU fencing needs to be implemented using four steps;

PDU 1, outlet X -> off
PDU 2, outlet X -> off
- The power_wait="5" setting for the fence_n0X_pdu2_off primitives will cause a 5 second delay here, giving ample time to ensure the nodes lose power
PDU 1, outlet X -> on
PDU 2, outlet X -> on

This is to ensure that both outlets are off at the same time, ensuring that the node loses power. This works because fencing_topology acts serially.

Putting all this together, we issue this command;

pcs stonith level add 1 an-a04n01.alteeve.ca fence_n01_ipmi
pcs stonith level add 1 an-a04n02.alteeve.ca fence_n02_ipmi

The 1 tells pacemaker that this is our highest priority fence method. We can see that this was set using pcs;

pcs stonith level

 Node: an-a04n01.alteeve.ca
  Level 1 - fence_n01_ipmi
 Node: an-a04n02.alteeve.ca
  Level 1 - fence_n02_ipmi

Now we'll tell pacemaker to use the PDUs as the second fence method. Here we tie together the two off calls and the two on calls into a single method.

pcs stonith level add 2 an-a04n01.alteeve.ca fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on
pcs stonith level add 2 an-a04n02.alteeve.ca fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on

Check again and we'll see that the new methods were added.

pcs stonith level

 Node: an-a04n01.alteeve.ca
  Level 1 - fence_n01_ipmi
  Level 2 - fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on
 Node: an-a04n02.alteeve.ca
  Level 1 - fence_n02_ipmi
  Level 2 - fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on

For those of us who are XML fans, this is what the cib looks like now:

cat /var/lib/pacemaker/cib/cib.xml

<cib epoch="18" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Jul 18 13:15:53 2013" update-origin="an-a04n01.alteeve.ca" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.9-dde1c52"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="an-a04n01.alteeve.ca"/>
      <node id="2" uname="an-a04n02.alteeve.ca"/>
    </nodes>
    <resources>
      <primitive class="stonith" id="fence_n01_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_n01_ipmi-instance_attributes">
          <nvpair id="fence_n01_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-a04n01.ipmi"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-action" name="action" value="reboot"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-login" name="login" value="admin"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
          <nvpair id="fence_n01_ipmi-instance_attributes-delay" name="delay" value="15"/>
        </instance_attributes>
        <operations>
          <op id="fence_n01_ipmi-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n02_ipmi" type="fence_ipmilan">
        <instance_attributes id="fence_n02_ipmi-instance_attributes">
          <nvpair id="fence_n02_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-a04n02.ipmi"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-action" name="action" value="reboot"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-login" name="login" value="admin"/>
          <nvpair id="fence_n02_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
        </instance_attributes>
        <operations>
          <op id="fence_n02_ipmi-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n01_pdu1_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_pdu1_off-instance_attributes">
          <nvpair id="fence_n01_pdu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
          <nvpair id="fence_n01_pdu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
          <nvpair id="fence_n01_pdu1_off-instance_attributes-action" name="action" value="off"/>
          <nvpair id="fence_n01_pdu1_off-instance_attributes-port" name="port" value="1"/>
        </instance_attributes>
        <operations>
          <op id="fence_n01_pdu1_off-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n01_pdu2_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_pdu2_off-instance_attributes">
          <nvpair id="fence_n01_pdu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
          <nvpair id="fence_n01_pdu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
          <nvpair id="fence_n01_pdu2_off-instance_attributes-action" name="action" value="off"/>
          <nvpair id="fence_n01_pdu2_off-instance_attributes-port" name="port" value="1"/>
          <nvpair id="fence_n01_pdu2_off-instance_attributes-power_wait" name="power_wait" value="5"/>
        </instance_attributes>
        <operations>
          <op id="fence_n01_pdu2_off-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n01_pdu1_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_pdu1_on-instance_attributes">
          <nvpair id="fence_n01_pdu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
          <nvpair id="fence_n01_pdu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
          <nvpair id="fence_n01_pdu1_on-instance_attributes-action" name="action" value="on"/>
          <nvpair id="fence_n01_pdu1_on-instance_attributes-port" name="port" value="1"/>
        </instance_attributes>
        <operations>
          <op id="fence_n01_pdu1_on-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n01_pdu2_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n01_pdu2_on-instance_attributes">
          <nvpair id="fence_n01_pdu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
          <nvpair id="fence_n01_pdu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
          <nvpair id="fence_n01_pdu2_on-instance_attributes-action" name="action" value="on"/>
          <nvpair id="fence_n01_pdu2_on-instance_attributes-port" name="port" value="1"/>
        </instance_attributes>
        <operations>
          <op id="fence_n01_pdu2_on-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n02_pdu1_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_pdu1_off-instance_attributes">
          <nvpair id="fence_n02_pdu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
          <nvpair id="fence_n02_pdu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
          <nvpair id="fence_n02_pdu1_off-instance_attributes-action" name="action" value="off"/>
          <nvpair id="fence_n02_pdu1_off-instance_attributes-port" name="port" value="2"/>
        </instance_attributes>
        <operations>
          <op id="fence_n02_pdu1_off-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n02_pdu2_off" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_pdu2_off-instance_attributes">
          <nvpair id="fence_n02_pdu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
          <nvpair id="fence_n02_pdu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
          <nvpair id="fence_n02_pdu2_off-instance_attributes-action" name="action" value="off"/>
          <nvpair id="fence_n02_pdu2_off-instance_attributes-port" name="port" value="2"/>
          <nvpair id="fence_n02_pdu2_off-instance_attributes-power_wait" name="power_wait" value="5"/>
        </instance_attributes>
        <operations>
          <op id="fence_n02_pdu2_off-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n02_pdu1_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_pdu1_on-instance_attributes">
          <nvpair id="fence_n02_pdu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
          <nvpair id="fence_n02_pdu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
          <nvpair id="fence_n02_pdu1_on-instance_attributes-action" name="action" value="on"/>
          <nvpair id="fence_n02_pdu1_on-instance_attributes-port" name="port" value="2"/>
        </instance_attributes>
        <operations>
          <op id="fence_n02_pdu1_on-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="stonith" id="fence_n02_pdu2_on" type="fence_apc_snmp">
        <instance_attributes id="fence_n02_pdu2_on-instance_attributes">
          <nvpair id="fence_n02_pdu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
          <nvpair id="fence_n02_pdu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
          <nvpair id="fence_n02_pdu2_on-instance_attributes-action" name="action" value="on"/>
          <nvpair id="fence_n02_pdu2_on-instance_attributes-port" name="port" value="2"/>
        </instance_attributes>
        <operations>
          <op id="fence_n02_pdu2_on-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
    <fencing-topology>
      <fencing-level devices="fence_n01_ipmi" id="fl-an-a04n01.alteeve.ca-1" index="1" target="an-a04n01.alteeve.ca"/>
      <fencing-level devices="fence_n02_ipmi" id="fl-an-a04n02.alteeve.ca-1" index="1" target="an-a04n02.alteeve.ca"/>
      <fencing-level devices="fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on" id="fl-an-a04n01.alteeve.ca-2" index="2" target="an-a04n01.alteeve.ca"/>
      <fencing-level devices="fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on" id="fl-an-a04n02.alteeve.ca-2" index="2" target="an-a04n02.alteeve.ca"/>
    </fencing-topology>
  </configuration>
</cib>

Fencing using fence_virsh

Note: To write this section, I used two virtual machines called pcmk1 and pcmk2.

If you are trying to learn fencing using KVM or Xen virtual machines, you can use the fence_virsh. You can also use fence_virtd, which is actually recommended by many, but I have found it to be rather unreliable.

To use fence_virsh, first install it.

yum -y install fence-agents-virsh

<lots of yum output>

Now test it from the command line. To do this, we need to know a few things;

The VM host is at IP 192.168.122.1
The username and password (-l and -p respectively) are the credentials used to log into VM host over SSH.
- If you don't want your password to be shown, create a little shell script that simply prints your password and then use -S /path/to/script instead of -p "secret".
The name of the target VM, as shown by virsh list --all on the host, is the node (-n) value. For me, the nodes are called an-a04n01 and an-a04n02.

Create the Password Script

In my case, the host is called 'lemass', so I want to create a password script called '/root/lemass.pw'. The name of the script is entirely up to you.

an-a04n01

vim /root/lemass.pw

echo "my secret password"

chmod 755 /root/lemass.pw
/root/lemass.pw

my secret password

rsync -av /root/lemass.pw root@an-a04n02:/root/

sending incremental file list
lemass.pw

sent 102 bytes  received 31 bytes  266.00 bytes/sec
total size is 25  speedup is 0.19

an-a04n02

/root/lemass.pw

my secret password

Done.

Test fence_virsh Status from the Command Line

an-a04n01	fence_virsh -a 192.168.122.1 -l root -S /root/lemass.pw -n an-a04n02 -o status Status: ON
an-a04n02	fence_virsh -a 192.168.122.1 -l root -S /root/lemass.pw -n an-a04n01 -o status Status: ON

Excellent! Now to configure it in pacemaker;

an-a04n01

pcs stonith create fence_n01_virsh fence_virsh pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="192.168.122.1" action="reboot" login="root" passwd_script="/root/lemass.pw" port="an-a04n01" delay=15 op monitor interval=60s
pcs stonith create fence_n02_virsh fence_virsh pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="192.168.122.1" action="reboot" login="root" passwd_script="/root/lemass.pw" port="an-a04n02" op monitor interval=60s
pcs cluster status

Cluster Status:
 Last updated: Sun Jan 26 15:45:31 2014
 Last change: Sun Jan 26 15:06:14 2014 via crmd on an-a04n01.alteeve.ca
 Stack: corosync
 Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
 Version: 1.1.10-19.el7-368c726
 2 Nodes configured
 2 Resources configured

PCSD Status:
an-a04n01.alteeve.ca: 
  an-a04n01.alteeve.ca: Online
an-a04n02.alteeve.ca: 
  an-a04n02.alteeve.ca: Online

Test Fencing

ToDo: Kill each node with echo c > /proc/sysrq-trigger and make sure the other node fences it.

Shared Storage

DRBD

We will use DRBD 8.4.

Install DRBD 8.4.4 from AN!

Warning: this doesn't work.

ToDo: Make a proper repo

an-a04n01

rpm -Uvh https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-8.4.4-4.el7.x86_64.rpm \
         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm \
         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm \
         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-udev-8.4.4-4.el7.x86_64.rpm \
         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-utils-8.4.4-4.el7.x86_64.rpm \
         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm \
         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-xen-8.4.4-4.el7.x86_64.rpm

an-a04n02

Install DRBD 8.4.4 From Source

At this time, no EPEL repo exists for RHEL7, and the Fedora RPMs don't work, so we will install DRBD 8.4.4 from source.

Install dependencies:

yum -y install gcc flex rpm-build wget kernel-devel
wget -c http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
tar -xvzf drbd-8.4.4.tar.gz 
cd drbd-8.4.4
./configure \
  --prefix=/usr \
  --localstatedir=/var \
  --sysconfdir=/etc \
  --with-km \
  --with-udev \
  --with-pacemaker \
  --with-bashcompletion \
  --with-utils \
  --without-xen \
  --without-rgmanager \
  --without-heartbeat
make
make install

Don't let DRBD start on boot (pacemaker will handle it for us).

systemctl disable drbd.service

drbd.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig drbd off

Done.

Optional; Make RPMs

Warning: I've not been able to get the RPMs genreated here to install yet. I'd recommend skipping this, unless you want to help sort out the problems. :)

After ./configure above, you can make RPMs instead of installing directly.

Dependencies:

yum install rpmdevtools redhat-rpm-config kernel-devel

<install text>

Setup RPM dev tree:

cd ~
rpmdev-setuptree
ls -lah ~/rpmbuild/
wget -c http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
tar -xvzf drbd-8.4.4.tar.gz
cd drbd-8.4.4
./configure \
  --prefix=/usr \
  --localstatedir=/var \
  --sysconfdir=/etc \
  --with-km \
  --with-udev \
  --with-pacemaker \
  --with-bashcompletion \
  --with-utils \
  --without-xen \
  --without-heartbeat

total 4.0K
drwxr-xr-x. 7 root root   67 Dec 23 20:06 .
dr-xr-x---. 6 root root 4.0K Dec 23 20:06 ..
drwxr-xr-x. 2 root root    6 Dec 23 20:06 BUILD
drwxr-xr-x. 2 root root    6 Dec 23 20:06 RPMS
drwxr-xr-x. 2 root root    6 Dec 23 20:06 SOURCES
drwxr-xr-x. 2 root root    6 Dec 23 20:06 SPECS
drwxr-xr-x. 2 root root    6 Dec 23 20:06 SRPMS

Userland tools:

make rpm

checking for presence of 8\.4\.4 in various changelog files
<snip>
+ exit 0
You have now:
/root/rpmbuild/RPMS/x86_64/drbd-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.4-4.el7.x86_64.rpm

Kernel module:

make kmp-rpm

checking for presence of 8\.4\.4 in various changelog files
<snip>
+ exit 0
You have now:
/root/rpmbuild/RPMS/x86_64/drbd-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.4-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/kmod-drbd-8.4.4_3.10.0_54.0.1-4.el7.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/drbd-kernel-debuginfo-8.4.4-4.el7.x86_64.rpm

Configure DRBD

Configure global-common.conf;

vim /etc/drbd.d/global_common.conf

# These are options to set for the DRBD daemon sets the default values for
# resources.
global {
	# This tells DRBD that you allow it to report this installation to 
	# LINBIT for statistical purposes. If you have privacy concerns, set
	# this to 'no'. The default is 'ask' which will prompt you each time
	# DRBD is updated. Set to 'yes' to allow it without being prompted.
	usage-count no;

	# minor-count dialog-refresh disable-ip-verification
}

common {
	handlers {
		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
		# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
		
		# Hook into Pacemaker's fencing.
		fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
	}

	startup {
		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
	}

	options {
		# cpu-mask on-no-data-accessible
	}

	disk {
		# size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
		# disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
                fencing resource-and-stonith;
	}

	net {
		# protocol timeout max-epoch-size max-buffers unplug-watermark
		# connect-int ping-int sndbuf-size rcvbuf-size ko-count
		# allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
		# after-sb-1pri after-sb-2pri always-asbp rr-conflict
		# ping-timeout data-integrity-alg tcp-cork on-congestion
		# congestion-fill congestion-extents csums-alg verify-alg
		# use-rle

		# Protocol "C" tells DRBD not to tell the operating system that
		# the write is complete until the data has reach persistent
		# storage on both nodes. This is the slowest option, but it is
		# also the only one that guarantees consistency between the
		# nodes. It is also required for dual-primary, which we will 
		# be using.
		protocol C;

		# Tell DRBD to allow dual-primary. This is needed to enable 
		# live-migration of our servers.
		allow-two-primaries yes;

		# This tells DRBD what to do in the case of a split-brain when
		# neither node was primary, when one node was primary and when
		# both nodes are primary. In our case, we'll be running
		# dual-primary, so we can not safely recover automatically. The
		# only safe option is for the nodes to disconnect from one
		# another and let a human decide which node to invalidate. Of 
		after-sb-0pri discard-zero-changes;
		after-sb-1pri discard-secondary;
		after-sb-2pri disconnect;
	}
}

And now configure the first resource;

vim /etc/drbd.d/r0.res

# This is the first DRBD resource. If will store the shared file systems and
# the servers designed to run on node 01.
resource r0 {
	# These options here are common to both nodes. If for some reason you
	# need to set unique values per node, you can move these to the
	# 'on <name> { ... }' section.
	
	# This sets the device name of this DRBD resouce.
	device /dev/drbd0;

	# This tells DRBD what the backing device is for this resource.
	disk /dev/sda5;

	# This controls the location of the metadata. When "internal" is used,
	# as we use here, a little space at the end of the backing devices is
	# set aside (roughly 32 MB per 1 TB of raw storage). External metadata
	# can be used to put the metadata on another partition when converting
	# existing file systems to be DRBD backed, when there is no extra space
	# available for the metadata.
	meta-disk internal;

	# NOTE: this is not required or even recommended with pacemaker. remove
	# 	this options as soon as pacemaker is setup.
	startup {
		# This tells DRBD to promote both nodes to 'primary' when this
		# resource starts. However, we will let pacemaker control this
		# so we comment it out, which tells DRBD to leave both nodes
		# as secondary when drbd starts.
		#become-primary-on both;
	}

	# NOTE: Later, make it an option in the dashboard to trigger a manual
	# 	verify and/or schedule periodic automatic runs
	net {
		# TODO: Test performance differences between sha1 and md5
		# This tells DRBD how to do a block-by-block verification of
		# the data stored on the backing devices. Any verification
		# failures will result in the effected block being marked
		# out-of-sync.
		verify-alg md5;

		# TODO: Test the performance hit of this being enabled.
		# This tells DRBD to generate a checksum for each transmitted
		# packet. If the data received data doesn't generate the same
		# sum, a retransmit request is generated. This protects against
		# otherwise-undetected errors in transmission, like 
		# bit-flipping. See:
		# http://www.drbd.org/users-guide/s-integrity-check.html
		data-integrity-alg md5;
	}

	# WARNING: Confirm that these are safe when the controller's BBU is
	#          depleted/failed and the controller enters write-through 
	#          mode.
	disk {
		# TODO: Test the real-world performance differences gained with
		#       these options.
		# This tells DRBD not to bypass the write-back caching on the
		# RAID controller. Normally, DRBD forces the data to be flushed
		# to disk, rather than allowing the write-back cachine to 
		# handle it. Normally this is dangerous, but with BBU-backed
		# caching, it is safe. The first option disables disk flushing
		# and the second disabled metadata flushes.
		disk-flushes no;
		md-flushes no;
	}

	# This sets up the resource on node 01. The name used below must be the
	# named returned by "uname -n".
	on an-a04n01.alteeve.ca {
		# This is the address and port to use for DRBD traffic on this
		# node. Multiple resources can use the same IP but the ports
		# must differ. By convention, the first resource uses 7788, the
		# second uses 7789 and so on, incrementing by one for each
		# additional resource. 
		address 10.10.40.1:7788;
	}
	on an-a04n02.alteeve.ca {
		address 10.10.40.2:7788;
	}
}

Disable drbd from starting on boot.

systemctl disable drbd.service

drbd.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig drbd off

Load the config;

modprobe drbd

Now check the config;

drbdadm dump

  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

you are the 69th user to install this version
/etc/drbd.d/r0.res:3: in resource r0:
become-primary-on is set to both, but allow-two-primaries is not set.

Ignore that error. It has been reported and does not effect operation.

Create the metadisk;

drbdadm create-md r0

Writing meta data...
initializing activity log
NOT initializing bitmap
New drbd meta data block successfully created.
success

Start the DRBD resource on both nodes;

drbdadm up r0

Once /proc/drbd shows both nodes connected, force one to primary and it will sync over the second.

drbdadm primary --force r0

You should see the resource syncing now. Push both nodes to primary;

drbdadm primary r0

DLM, Clustered LVM and GFS2

an-a04n01

sed -i.anvil 's^filter = \[ "a/\.\*/" \]^filter = \[ "a|/dev/drbd*|", "r/.*/" \]^' /etc/lvm/lvm.conf
sed -i 's/locking_type = 1$/locking_type = 3/' /etc/lvm/lvm.conf
sed -i 's/fallback_to_local_locking = 1$/fallback_to_local_locking = 0/' /etc/lvm/lvm.conf 
sed -i 's/use_lvmetad = 1$/use_lvmetad = 0/' /etc/lvm/lvm.conf

--- /etc/lvm/lvm.conf.anvil	2013-11-27 03:28:08.000000000 -0500
+++ /etc/lvm/lvm.conf	2014-01-26 18:57:41.026928464 -0500
@@ -84,7 +84,7 @@
     # lvmetad is used" comment that is attached to global/use_lvmetad setting.
 
     # By default we accept every block device:
-    filter = [ "a/.*/" ]
+    filter = [ "a|/dev/drbd*|", "r/.*/" ]
 
     # Exclude the cdrom drive
     # filter = [ "r|/dev/cdrom|" ]
@@ -451,7 +451,7 @@
     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
     # is set at the same time, LVM always issues a warning message about this
     # and then it automatically disables lvmetad use.
-    locking_type = 1
+    locking_type = 3
 
     # Set to 0 to fail when a lock request cannot be satisfied immediately.
     wait_for_locks = 1
@@ -467,7 +467,7 @@
     # to 1 an attempt will be made to use local file-based locking (type 1).
     # If this succeeds, only commands against local volume groups will proceed.
     # Volume Groups marked as clustered will be ignored.
-    fallback_to_local_locking = 1
+    fallback_to_local_locking = 0
 
     # Local non-LV directory that holds file-based locks while commands are
     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
@@ -594,7 +594,7 @@
     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
     # is set at the same time, LVM always issues a warning message about this
     # and then it automatically disables lvmetad use.
-    use_lvmetad = 1
+    use_lvmetad = 0
 
     # Full path of the utility called to check that a thin metadata device
     # is in a state that allows it to be used.

rsync -av /etc/lvm/lvm.conf* root@an-a04n02:/etc/lvm/

sending incremental file list
lvm.conf
lvm.conf.anvil

sent 48536 bytes  received 440 bytes  97952.00 bytes/sec
total size is 90673  speedup is 1.85

an-a04n02

diff -u /etc/lvm/lvm.conf.anvil /etc/lvm/lvm.conf

--- /etc/lvm/lvm.conf.anvil	2013-11-27 03:28:08.000000000 -0500
+++ /etc/lvm/lvm.conf	2014-01-26 18:57:41.000000000 -0500
@@ -84,7 +84,7 @@
     # lvmetad is used" comment that is attached to global/use_lvmetad setting.
 
     # By default we accept every block device:
-    filter = [ "a/.*/" ]
+    filter = [ "a|/dev/drbd*|", "r/.*/" ]
 
     # Exclude the cdrom drive
     # filter = [ "r|/dev/cdrom|" ]
@@ -451,7 +451,7 @@
     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
     # is set at the same time, LVM always issues a warning message about this
     # and then it automatically disables lvmetad use.
-    locking_type = 1
+    locking_type = 3
 
     # Set to 0 to fail when a lock request cannot be satisfied immediately.
     wait_for_locks = 1
@@ -467,7 +467,7 @@
     # to 1 an attempt will be made to use local file-based locking (type 1).
     # If this succeeds, only commands against local volume groups will proceed.
     # Volume Groups marked as clustered will be ignored.
-    fallback_to_local_locking = 1
+    fallback_to_local_locking = 0
 
     # Local non-LV directory that holds file-based locks while commands are
     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
@@ -594,7 +594,7 @@
     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
     # is set at the same time, LVM always issues a warning message about this
     # and then it automatically disables lvmetad use.
-    use_lvmetad = 1
+    use_lvmetad = 0
 
     # Full path of the utility called to check that a thin metadata device
     # is in a state that allows it to be used.

Disable lvmetad as it's not cluster-aware.

an-a04n01

systemctl disable lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket
systemctl stop lvm2-lvmetad.service

rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'

an-a04n02

systemctl disable lvm2-lvmetad.service
systemctl disable lvm2-lvmetad.socket
systemctl stop lvm2-lvmetad.service

rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'

Note: This will be moved to pacemaker shortly. We're enabling it here just long enough to configure pacemaker.

Start DLM and clvmd;

an-a04n01	systemctl start dlm.service systemctl start clvmd.service
an-a04n02	systemctl start dlm.service systemctl start clvmd.service

Create the PV, VG and the /shared LV;

an-a04n01

pvcreate /dev/drbd0

  Physical volume "/dev/drbd0" successfully created

vgcreate an-a04n01_vg0 /dev/drbd0

  /proc/devices: No entry for device-mapper found
  Clustered volume group "an-a04n01_vg0" successfully created

lvcreate -L 10G -n shared an-a04n01_vg0

  Logical volume "shared" created

an-a04n02

pvscan

  PV /dev/drbd0   VG an-a04n01_vg0   lvm2 [20.00 GiB / 20.00 GiB free]
  Total: 1 [20.00 GiB] / in use: 1 [20.00 GiB] / in no VG: 0 [0   ]

vgscan

  Reading all physical volumes.  This may take a while...
  Found volume group "an-a04n01_vg0" using metadata type lvm2

lvscan

  ACTIVE            '/dev/an-a04n01_vg0/shared' [10.00 GiB] inherit

Format the /dev/an-a04n01_vg0/shared;

an-a04n01

mkfs.gfs2 -j 2 -p lock_dlm -t an-anvil-04:shared /dev/an-a04n01_vg0/shared

/dev/an-a04n01_vg0/shared is a symbolic link to /dev/dm-0
This will destroy any data on /dev/dm-0

Are you sure you want to proceed? [y/n]y

Device:                    /dev/an-a04n01_vg0/shared
Block size:                4096
Device size:               10.00 GB (2621440 blocks)
Filesystem size:           10.00 GB (2621438 blocks)
Journals:                  2
Resource groups:           40
Locking protocol:          "lock_dlm"
Lock table:                "an-anvil-04:shared"
UUID:                      20bafdb0-1f86-f424-405b-9bf608c0c486

mkdir /shared
mount /dev/an-a04n01_vg0/shared /shared
df -h

Filesystem                         Size  Used Avail Use% Mounted on
/dev/vda3                           18G  5.6G   12G  32% /
devtmpfs                           932M     0  932M   0% /dev
tmpfs                              937M   61M  877M   7% /dev/shm
tmpfs                              937M  1.9M  935M   1% /run
tmpfs                              937M     0  937M   0% /sys/fs/cgroup
/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
/dev/vda1                          484M   83M  401M  18% /boot
/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared

an-a04n02

Filesystem                         Size  Used Avail Use% Mounted on
/dev/vda3                           18G  5.6G   12G  32% /
devtmpfs                           932M     0  932M   0% /dev
tmpfs                              937M   76M  862M   9% /dev/shm
tmpfs                              937M  2.0M  935M   1% /run
tmpfs                              937M     0  937M   0% /sys/fs/cgroup
/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
/dev/vda1                          484M   83M  401M  18% /boot
/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared

Shut down gfs2, clvmd and drbd now.

an-a04n01	umount /shared/ systemctl stop clvmd.service drbdadm down r0
an-a04n02	umount /shared/ systemctl stop clvmd.service drbdadm down r0

Done.

Add Storage to Pacemaker

Configure Dual-Primary DRBD

Setup DRBD as a dual-primary resource.

an-a04n01

pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f drbd_cfg resource master drbd_r0_Clone drbd_r0 master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs cluster cib-push drbd_cfg

CIB updated

Give it a couple minutes to promote both nodes to Master on both nodes. Initially, it will appear as Master on one node only.

Once updated, you should see this:

an-a04n01

pcs status

Cluster name: an-anvil-04
Last updated: Sun Jan 26 20:26:33 2014
Last change: Sun Jan 26 20:23:23 2014 via cibadmin on an-a04n01.alteeve.ca
Stack: corosync
Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured


Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh	(stonith:fence_virsh):	Started an-a04n01.alteeve.ca 
 fence_n02_virsh	(stonith:fence_virsh):	Started an-a04n02.alteeve.ca 
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

PCSD Status:
an-a04n01.alteeve.ca: 
  an-a04n01.alteeve.ca: Online
an-a04n02.alteeve.ca: 
  an-a04n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

an-a04n02

pcs status

Cluster name: an-anvil-04
Last updated: Sun Jan 26 20:26:58 2014
Last change: Sun Jan 26 20:23:23 2014 via cibadmin on an-a04n01.alteeve.ca
Stack: corosync
Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
4 Resources configured


Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh	(stonith:fence_virsh):	Started an-a04n01.alteeve.ca 
 fence_n02_virsh	(stonith:fence_virsh):	Started an-a04n02.alteeve.ca 
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

PCSD Status:
an-a04n01.alteeve.ca: 
  an-a04n01.alteeve.ca: Online
an-a04n02.alteeve.ca: 
  an-a04n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Configure DLM

an-a04n01

pcs cluster cib dlm_cfg
pcs -f dlm_cfg resource create dlm ocf:pacemaker:controld op monitor interval=60s
pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
pcs cluster cib-push dlm_cfg

CIB updated

an-a04n02

pcs status

Cluster name: an-anvil-04
Last updated: Sun Jan 26 20:34:36 2014
Last change: Sun Jan 26 20:33:31 2014 via cibadmin on an-a04n01.alteeve.ca
Stack: corosync
Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
6 Resources configured


Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh	(stonith:fence_virsh):	Started an-a04n01.alteeve.ca 
 fence_n02_virsh	(stonith:fence_virsh):	Started an-a04n02.alteeve.ca 
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
 Clone Set: dlm-clone [dlm]
     Started: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

PCSD Status:
an-a04n01.alteeve.ca: 
  an-a04n01.alteeve.ca: Online
an-a04n02.alteeve.ca: 
  an-a04n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Configure Cluster LVM

an-a04n01

pcs cluster cib clvmd_cfg
pcs -f clvmd_cfg resource create clvmd lsb:clvmd params daemon_timeout=30s op monitor interval=60s
pcs -f clvmd_cfg resource clone clvmd clone-max=2 clone-node-max=1
pcs -f clvmd_cfg constraint colocation add dlm-clone clvmd-clone INFINITY
pcs -f clvmd_cfg constraint order start dlm then start clvmd-clone
pcs cluster cib-push clvmd_cfg

CIB updated

an-a04n02

pcs status

Cluster name: an-anvil-04
Last updated: Mon Jan 27 19:00:33 2014
Last change: Mon Jan 27 19:00:19 2014 via crm_resource on an-a04n01.alteeve.ca
Stack: corosync
Current DC: an-a04n01.alteeve.ca (1) - partition with quorum
Version: 1.1.10-19.el7-368c726
2 Nodes configured
8 Resources configured


Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

Full list of resources:

 fence_n01_virsh        (stonith:fence_virsh):  Started an-a04n01.alteeve.ca
 fence_n02_virsh        (stonith:fence_virsh):  Started an-a04n02.alteeve.ca
 Master/Slave Set: drbd_r0_Clone [drbd_r0]
     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
 Clone Set: dlm-clone [dlm]
     Started: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]

PCSD Status:
an-a04n01.alteeve.ca:
  an-a04n01.alteeve.ca: Online
an-a04n02.alteeve.ca:
  an-a04n02.alteeve.ca: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Configure the /shared GFS2 Partition

an-a04n01

pcs cluster cib fs_cfg
pcs -f fs_cfg resource create sharedFS Filesystem device="/dev/an-a04n01_vg0/shared" directory="/shared" fstype="gfs2"
pcs -f fs_cfg resource clone sharedFS
pcs cluster cib-push fs_cfg

CIB updated

df -h

Filesystem                         Size  Used Avail Use% Mounted on
/dev/vda3                           18G  5.6G   12G  32% /
devtmpfs                           932M     0  932M   0% /dev
tmpfs                              937M   61M  877M   7% /dev/shm
tmpfs                              937M  2.2M  935M   1% /run
tmpfs                              937M     0  937M   0% /sys/fs/cgroup
/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
/dev/vda1                          484M   83M  401M  18% /boot
/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared

an-a04n02

df -h

Filesystem                         Size  Used Avail Use% Mounted on
/dev/vda3                           18G  5.6G   12G  32% /
devtmpfs                           932M     0  932M   0% /dev
tmpfs                              937M   76M  862M   9% /dev/shm
tmpfs                              937M  2.6M  935M   1% /run
tmpfs                              937M     0  937M   0% /sys/fs/cgroup
/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
/dev/vda1                          484M   83M  401M  18% /boot
/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared

Configuring Constraints

an-a04n01

pcs cluster cib cst_cfg
pcs -f cst_cfg constraint order start dlm then promote drbd_r0_Clone
pcs -f cst_cfg constraint order promote drbd_r0_Clone then start clvmd-clone
pcs -f cst_cfg constraint order promote clvmd-clone then start sharedFS-clone
pcs cluster cib-push cst_cfg

CIB updated

pcs constraint show

Location Constraints:
Ordering Constraints:
  start dlm then promote drbd_r0_Clone
  promote drbd_r0_Clone then start clvmd-clone
  start clvmd-clone then start sharedFS-clone
Colocation Constraints:

an-a04n02

pcs constraint show

Location Constraints:
Ordering Constraints:
  start dlm then promote drbd_r0_Clone
  promote drbd_r0_Clone then start clvmd-clone
  start clvmd-clone then start sharedFS-clone
Colocation Constraints:

Odds and Sods

This is a section for random notes. The stuff here will be integrated into the finished tutorial or removed.

Determine multicast Address

Useful if you need to ensure that your switch has persistent multicast addresses set.

corosync-cmapctl | grep mcastaddr

totem.interface.0.mcastaddr (str) = 239.192.122.199

an-a04n01
an-a04n02

Notes

Pacemaker Logging
Editing cib.xml offline is possible with: CIB_file=/path/to/real/cib.xml cibadmin .... and sync to other nodes when done.

Thanks

This list will certainly grow as this tutorial progresses;

Olivier Allart, RCHE for doing a lot of the heavy lifting on the fencing_topology configuration.

Any questions, feedback, advice, complaints or meanderings are welcome.
`Alteeve's Niche!`	`Enterprise Support: Alteeve Support`	`Community Support`
© Alteeve's Niche! Inc. 1997-2024		Anvil! "Intelligent Availability®" Platform
`legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.`

@@ Line 3: / Line 3: @@
 {{warning|1=This tutorial is incomplete, flawed and generally sucks at this time. Do not follow this and expect anything to work. In large part, it's a dumping ground for notes and little else. This warning will be removed when the tutorial is completed.}}
-{{warning|1=This tutorial is built on [[Red Hat]]'s Enterprise Linux 7 beta. [[Red Hat]] never confirms what a future release will contain until it is actually released, so there is a real chance that what is in the beta will '''not''' be in the final release.}}
+This is the third '''Anvil!''' tutorial built on [[Red Hat]]'s Enterprise Linux 7. It marks the third generation of the [[Anvil!]] High-Availability Platform.
-This is the third '''AN!Cluster''' tutorial built on [[Red Hat]]'s Enterprise Linux 7. It improves on the [[Red Hat Cluster Service 2 Tutorial|RHEL 5, RHCS stable 2]] and [[2-Node Red Hat KVM Cluster Tutorial|RHEL 6, RHCS stable3]] tutorials.
+As with the previous tutorials, the end goal of this tutorial is an ''Anvil!'' platform for high-availability virtual servers. It's design attempts to remove all single points of failure from the system. Power and networking are made fully redundant in this version, along with minimizing the node failures which would lead to service interruption. This tutorial also covers the [[Striker]] dashboard and [[ScanCore]] monitoring and self-healing tools.
-As with the previous tutorials, the end goal of this tutorial is a 2-node cluster providing a platform for high-availability virtual servers. It's design attempts to remove all single points of failure from the system. Power and networking are made fully redundant in this version, along with minimizing the node failures which would lead to service interruption. This tutorial also covers the [[AN!Utilities]]; [[AN!Cluster Dashboard]], [[AN!Cluster Monitor]] and [[AN!Safe Cluster Shutdown]].
 As it the previous tutorial, [[KVM]] will be the hypervisor used for facilitating virtual machines. The old <span class="code">[[cman]]</span> and <span class="code">[[rgmanager]]</span> tools are replaced in favour of <span class="code">[[pacemaker]]</span> for resource management.
@@ Line 13: / Line 11: @@
 = Before We Begin =
-This tutorial '''does not''' require prior cluster experience, but it does expect familiarity with Linux and a low-intermediate understanding of networking. Where possible, steps are explained in detail and rationale is provided for why certain decisions are made.
+This tutorial '''does not''' require prior Anvil! experience (or any clustering experience), but it does expect a certain familiarity with Linux and a low-intermediate understanding of networking. Where possible, steps are explained in detail and rationale is provided for why certain decisions are made.
-'''For those with cluster experience''';
+'''For those with Anvil! experience''';
 Please be careful not to skip too much. There are some major and some subtle changes from previous tutorials.
@@ Line 21: / Line 19: @@
 = OS Setup =
-{{warning|1=We are using the [[RHEL]] 7 beta OS.}}
+This tutorial assumes a <span class="code">minimal</span> install of either [[RHEL]] or [[CentOS]] version 7.
 == Post OS Install ==
@@ Line 27: / Line 25: @@
 {{note|1=With RHEL7, <span class="code">[[biosdevname]]</span> tries to give network devices predictable names. It's very likely that your initial device names will differ from those in this tutorial.}}
-=== Making ssh faster when the net is down ===
+=== If you are running RHEL ===
-By default, the nodes will try to resolve the host name of an incoming ssh connection. When the internet connection is down, DNS lookups have to time out, which can make login times quite slow. When something goes wrong, seconds count and waiting for up to a minute for an SSH password prompt can be maddening.
+Before you can download any packages, you will need to register your nodes with Red Hat's subscription manager;
-For this reason, we will make two changes to <span class="code">/etc/ssh/sshd_config</span> that disable this login delay.
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+subscription-manager register --username $username --password $password --auto-attach
+subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
+subscription-manager repos --enable=rhel-7-server-optional-rpms
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+The system has been registered with ID: 9c578d87-bd80-4637-9f41-6076efb9e20e
-Please be aware that this can reduce security. If this is a concern, skip this step.
+Installed Product Current Status:
+Product Name: Red Hat Enterprise Linux Server
+Status:       Subscribed
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+subscription-manager register --username $username --password $password --auto-attach
+subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
+subscription-manager repos --enable=rhel-7-server-optional-rpms
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+The system has been registered with ID: a55c83e5-e4ec-4fcf-b7b7-b9455b3e07cf
-<syntaxhighlight lang="bash">
+Installed Product Current Status:
-sed -i.anvil 's/#GSSAPIAuthentication no/GSSAPIAuthentication no/' /etc/ssh/sshd_config
+Product Name: Red Hat Enterprise Linux Server
-sed -i 's/GSSAPIAuthentication yes/#GSSAPIAuthentication yes/' /etc/ssh/sshd_config
+Status:       Subscribed
-sed -i 's/#UseDNS yes/UseDNS no/' /etc/ssh/sshd_config
-systemctl restart sshd.service
-diff -u /etc/ssh/sshd_config.anvil /etc/ssh/sshd_config
 </syntaxhighlight>
-<syntaxhighlight lang="diff">
+|}
---- /etc/ssh/sshd_config.anvil	2013-11-08 09:17:23.000000000 -0500
-+++ /etc/ssh/sshd_config	2014-04-03 00:01:40.980951975 -0400
+== Adding LINBIT Repos ==
-@@ -89,8 +89,8 @@
- #KerberosUseKuserok yes
+If you purchased full [https://my.linbit.com LINBIT support], you can add their repos in order to get DRBD 9 and associated tools.
- # GSSAPI options
+First, download their registration tool.
--#GSSAPIAuthentication no
--GSSAPIAuthentication yes
+{|class="wikitable"
-+GSSAPIAuthentication no
+!<span class="code">an-a04n01</span>
-+#GSSAPIAuthentication yes
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
- #GSSAPICleanupCredentials yes
+cd /root
- GSSAPICleanupCredentials yes
+wget https://my.linbit.com/linbit-manage-node.py
- #GSSAPIStrictAcceptorCheck yes
-@@ -127,7 +127,7 @@
- #ClientAliveInterval 0
- #ClientAliveCountMax 3
- #ShowPatchLevel no
--#UseDNS yes
-+UseDNS no
- #PidFile /var/run/sshd.pid
- #MaxStartups 10:30:100
- #PermitTunnel no
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+--2016-11-19 10:22:21--  https://my.linbit.com/linbit-manage-node.py
+Resolving my.linbit.com (my.linbit.com)... 212.69.166.235
+Connecting to my.linbit.com (my.linbit.com)|212.69.166.235|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 26797 (26K) [application/x-python-script]
+Saving to: ‘linbit-manage-node.py’
-Subsequent logins when the net is down should be quick.
+%[========================================================================================>] 26,797      --.-K/s   in 0.1s
-=== Configuring the network ===
+-11-19 10:22:21 (175 KB/s) - ‘linbit-manage-node.py’ saved [26797/26797]
+</syntaxhighlight>
-<span class="code"></span>
+|-
-<syntaxhighlight lang="bash">
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+cd /root
+wget https://my.linbit.com/linbit-manage-node.py
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+--2016-11-19 10:26:52--  https://my.linbit.com/linbit-manage-node.py
+Resolving my.linbit.com (my.linbit.com)... 212.69.166.235
+Connecting to my.linbit.com (my.linbit.com)|212.69.166.235|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 26797 (26K) [application/x-python-script]
+Saving to: ‘linbit-manage-node.py’
+%[========================================================================================>] 26,797      --.-K/s   in 0.1s
-Enable the <span class="code">eth0</span> interface on boot.
+-11-19 10:26:53 (182 KB/s) - ‘linbit-manage-node.py’ saved [26797/26797]
-<syntaxhighlight lang="bash">
-sed -i.bak 's/ONBOOT=.*/ONBOOT="yes"/' /etc/sysconfig/network-scripts/ifcfg-eth0
-diff -U0 /etc/sysconfig/network-scripts/ifcfg-eth0.bak /etc/sysconfig/network-scripts/ifcfg-eth0
-</syntaxhighlight>
-<syntaxhighlight lang="diff">
---- /etc/sysconfig/network-scripts/ifcfg-eth0.bak	2014-01-23 16:15:45.008085032 -0500
-+++ /etc/sysconfig/network-scripts/ifcfg-eth0	2014-01-23 16:15:25.573009623 -0500
-@@ -11 +11 @@
--ONBOOT=no
-+ONBOOT="yes"
 </syntaxhighlight>
+|}
-If you want to make any other changes, like configuring the interface to have a static IP, do so now. Once you're done editing;
+Make it executable.
-<syntaxhighlight lang="bash">
+{|class="wikitable"
-nmcli connection reload
+!<span class="code">an-a04n01</span>
-systemctl restart NetworkManager.service
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-ip addr show
+chmod 755 linbit-manage-node.py
+ls -lah linbit-manage-node.py
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+-rwxr-xr-x. 1 root root 27K Oct 11 05:54 linbit-manage-node.py
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+chmod 755 linbit-manage-node.py
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
+-rwxr-xr-x. 1 root root 27K Oct 11 05:54 linbit-manage-node.py
-    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
-    inet 127.0.0.1/8 scope host lo
-       valid_lft forever preferred_lft forever
-    inet6 ::1/128 scope host
-       valid_lft forever preferred_lft forever
-: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
-    link/ether 52:54:00:a7:9d:17 brd ff:ff:ff:ff:ff:ff
-    inet 192.168.122.201/24 scope global eth0
-       valid_lft forever preferred_lft forever
-    inet6 fe80::5054:ff:fea7:9d17/64 scope link
-       valid_lft forever preferred_lft forever
 </syntaxhighlight>
+|}
-The interface should now start on boot properly.
+{{note|1=If you get the error: '<span class="code">ERR: Could not detect MAC addresses of your node</span>', then the version of '<span class="code">linbit-manage-node.py</span>' does not yet recognise bridges or slaved interfaces in bonds. For now, you can download a [https://alteeve.ca/files/linbit-manage-node_anvil.py modified version from Alteeve] instead.}}
-=== Enabling the Beta Repos ===
+Now run the tool interactively.
-While using the RHEL 7 beta, the public beta repos are disabled by default. This enables them.
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
-<syntaxhighlight lang="bash">
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-sed -i.anvil 's/enabled=0/enabled=1/' /etc/yum.repos.d/rhel-beta.repo
+/root/linbit-manage-node.py
-diff -u /etc/yum.repos.d/rhel-beta.repo.anvil /etc/yum.repos.d/rhel-beta.repo
 </syntaxhighlight>
-<syntaxhighlight lang="diff">
+<syntaxhighlight lang="text">
---- /etc/yum.repos.d/rhel-beta.repo.anvil	2013-11-27 03:46:13.000000000 -0500
+linbit-manage-node.py (Version: 1.11)
-+++ /etc/yum.repos.d/rhel-beta.repo	2014-04-02 19:23:29.177918811 -0400
+Checking if version is up to date
-@@ -2,7 +2,7 @@
+[OK] Your version is up to date
- name=Red Hat Enterprise Linux 7 Beta - $basearch
+Username:
- #baseurl=ftp://ftp.redhat.com/pub/redhat/rhel/beta/7/$basearch/os/
- mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=rhel-7-beta&arch=$basearch
--enabled=0
-+enabled=1
- gpgcheck=1
- gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-beta
-@@ -10,7 +10,7 @@
- name=Red Hat Enterprise Linux 7 Beta - $basearch - Source
- #baseurl=ftp://ftp.redhat.com/pub/redhat/rhel/beta/7/source/
- mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=rhel-source-7-beta&arch=$basearch
--enabled=0
-+enabled=1
- gpgcheck=1
- gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-beta
-@@ -18,6 +18,6 @@
- name=Red Hat Enterprise Linux 7 Beta - $basearch - Debuginfo
- #baseurl=ftp://ftp.redhat.com/pub/redhat/rhel/beta/7/$basearch/debug/
- mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=rhel-debug-7-beta&arch=$basearch
--enabled=0
-+enabled=1
- gpgcheck=1
- gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-beta
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+|-
-yum clean all
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+/root/linbit-manage-node.py
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Loaded plugins: product-id, subscription-manager
+linbit-manage-node.py (Version: 1.11)
-This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
+Checking if version is up to date
-Cleaning repos: rhel-beta rhel-beta-debuginfo rhel-beta-source
+[OK] Your version is up to date
-Cleaning up everything
+Username:
 </syntaxhighlight>
+|}
-Done.
+Enter the user name and password given to you by LINBIT when you registered with them.
-== Install ==
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+Username: xxxxxx
+Credential (will not be echoed):
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+[OK] Login successful
+The following contracts are available:
+Will this node form a cluster with...
-Not all of these are required, but most are used at one point or another in this tutorial.
+) Contract: silver 2017-01-07 (ID: xxxx)
-{{note|1=The <span class="code">fence-agents-virsh</span> package is not available in RHEL 7 beta. Further, it's only needed if you're building your cluster using VMs.}}
+--> Please enter a number in range and press return:
+</syntaxhighlight>
-<syntaxhighlight lang="bash">
+|-
-yum install bridge-utils corosync net-tools ntp pacemaker pcs rsync syslinux \
+!<span class="code">an-a04n02</span>
-            wget fence-agents-all fence-agents-virsh gpm man vim screen mlocate \
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-            syslinux dlm dlm-lib lvm2-cluster gfs2-utils
+Username: xxxxxx
+Credential (will not be echoed):
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+[OK] Login successful
+The following contracts are available:
+Will this node form a cluster with...
-If you want to use your mouse at the node's terminal, run the following;
+) Contract: silver 2017-01-07 (ID: xxxx)
-<syntaxhighlight lang="bash">
+--> Please enter a number in range and press return:
-systemctl enable gpm.service
-systemctl start gpm.service
 </syntaxhighlight>
+|}
-Disable dlm and clvmd from starting on boot.
+If you have multiple contracts, select the number to the left of the contract identification. Otherwise, select '<span class="code">1</span>'.
-<syntaxhighlight lang="bash">
+{|class="wikitable"
-systemctl disable clvmd.service
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+--> Please enter a number in range and press return: 1
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-clvmd.service is not a native service, redirecting to /sbin/chkconfig.
+Writing registration data:
-Executing /sbin/chkconfig clvmd off
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+|-
-systemctl disable dlm.service
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+--> Please enter a number in range and press return: 1
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-rm '/etc/systemd/system/multi-user.target.wants/dlm.service'
+Writing registration data:
 </syntaxhighlight>
+|}
-== Setting the Hostname ==
+Confirm that you want to write out the license file. Once you accept, you will be presented with a menu of which repositories you want to use from LINBIT. We're only going to enable the '<span class="field">drbd-9.0</span>' repo and leave the pacemaker repos disabled as we'll pull them from Red Hat.
-Fedora 19 is '''very''' different from [[EL6]].
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+--> Write to file (/var/lib/drbd-support/registration.json)? [y/N]
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+  Here are the repositories you can enable:
-{{note|1=The '<span class="code">--pretty</span>' line currently doesn't work as there is [https://bugzilla.redhat.com/show_bug.cgi?id=895299 a bug (rhbz#895299)] with single-quotes.}}
+) pacemaker-1.1.15(Disabled)
-{{note|1=The '<span class="code">--static</span>' option is currently needed to prevent the '<span class="code">.</span>' from being removed. See [https://bugzilla.redhat.com/show_bug.cgi?id=896756 this bug (rhbz#896756)].}}
+) pacemaker-1.1.12(Disabled)
+) pacemaker-1.1(Disabled)
+) drbd-9.0(Disabled)
+) drbd-8.4(Disabled)
-Use a format that works for you. For the tutorial, node names are based on the following;
+  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.
-* A two-letter prefix identifying the company/user (<span class="code">an</span>, for "Alteeve's Niche!")
+</syntaxhighlight>
-* A sequential cluster ID number in the form of <span class="code">cXX</span> (<span class="code">c01</span> for "Cluster 01", <span class="code">c02</span> for Cluster 02, etc)
+<syntaxhighlight lang="text">
-* A sequential node ID number in the form of <span class="code">nYY</span>
+  Enable/Disable: 4
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+  Here are the repositories you can enable:
-In my case, this is my third cluster and I use the company prefix <span class="code">an</span>, so my two nodes will be;
+) pacemaker-1.1.15(Disabled)
-* <span class="code">an-c03n01</span> - node 1
+) pacemaker-1.1.12(Disabled)
-* <span class="code">an-c03n02</span> - node 2
+) pacemaker-1.1(Disabled)
+) drbd-9.0(Enabled)
+) drbd-8.4(Disabled)
-Folks who've read my earlier tutorials will note that this is a departure in naming. I find this method spans and scales much better. Further, it the simply required in order to use the [[AN!CDB|AN! Cluster Dashboard]].
+  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.
+</syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-hostnamectl set-hostname an-c03n01.alteeve.ca --static
+  Enable/Disable: 0
-hostnamectl set-hostname --pretty "Alteeve's Niche! - Cluster 03, Node 01"
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+--> Write to file (/var/lib/drbd-support/registration.json)? [y/N]
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+  Here are the repositories you can enable:
-If you want the new host name to take effect immediately, you can use the traditional <span class="code">hostname</span> command:
+) pacemaker-1.1.15(Disabled)
+) pacemaker-1.1.12(Disabled)
+) pacemaker-1.1(Disabled)
+) drbd-9.0(Disabled)
+) drbd-8.4(Disabled)
-<syntaxhighlight lang="bash">
+  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.
-hostname an-c03n01.alteeve.ca
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+  Enable/Disable: 4
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+  Here are the repositories you can enable:
-The "pretty" host name is stored in <span class="code">/etc/machine-info</span> as the unquoted value for the <span class="code">PRETTY_HOSTNAME</span> value.
+) pacemaker-1.1.15(Disabled)
+) pacemaker-1.1.12(Disabled)
+) pacemaker-1.1(Disabled)
+) drbd-9.0(Enabled)
+) drbd-8.4(Disabled)
-<syntaxhighlight lang="bash">
+  Enter the number of the repository you wish to enable/disable. Hit 0 when you are done.
-vim /etc/machine-info
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-PRETTY_HOSTNAME=Alteeves Niche! - Cluster 01, Node 01
+  Enable/Disable: 0
 </syntaxhighlight>
+|}
-If you can't get the <span class="code">hostname</span> command to work for some reason, you can reboot to have the system read the new values.
+{{warning|1=The repository will include a node-specific hash string in the '<span class="code">baseurl</span>'. Keep this private!}}
-== Optional - Video Problems ==
+Once you select '<span class="code">0</span>' to exit that menu, a summary of the repo will be displayed and you will be asked if you want to save it or not.
-On some servers, the OS doesn't detect or use the video card properly. To resolve this, you need to add <span class="code">nomodeset</span> to the kernel line when installing and again after the install is complete.
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+Writing repository config:
+Content:
+[drbd-8.4]
+name=LINBIT Packages for drbd-8.4 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-8.4/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
+[pacemaker-1.1]
+name=LINBIT Packages for pacemaker-1.1 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-Once installed
+[pacemaker-1.1.15]
+name=LINBIT Packages for pacemaker-1.1.15 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.15/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-Edit the <span class="code">/etc/default/grub</span> and append <span class="code">nomodeset</span> to the end of the <span class="code">GRUB_CMDLINE_LINUX</span> variable.
+[pacemaker-1.1.12]
+name=LINBIT Packages for pacemaker-1.1.12 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.12/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-<syntaxhighlight lang="bash">
+[drbd-9.0]
-vim /etc/default/grub
+name=LINBIT Packages for drbd-9.0 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-9.0/$basearch
+enabled=1
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-GRUB_TIMEOUT=5
+--> Write to file (/etc/yum.repos.d/linbit.repo)? [y/N] y
-GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
-GRUB_DEFAULT=saved
-GRUB_CMDLINE_LINUX="nomodeset rd.md=0 rd.lvm=0 rd.dm=0 $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) rd.luks=0 vconsole.keymap=us nomodeset"
-GRUB_DISABLE_RECOVERY="true"
-GRUB_THEME="/boot/grub2/themes/system/theme.txt"
 </syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+Writing repository config:
+Content:
+[drbd-8.4]
+name=LINBIT Packages for drbd-8.4 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-8.4/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-Save that. and then rewrite the [[grub2]] configuration file.
+[pacemaker-1.1]
+name=LINBIT Packages for pacemaker-1.1 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-<syntaxhighlight lang="bash">
+[pacemaker-1.1.15]
-grub2-mkconfig -o /boot/grub2/grub.cfg
+name=LINBIT Packages for pacemaker-1.1.15 - $basearch
-</syntaxhighlight>
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.15/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-Next time you reboot, you should get a stock 80x25 character display. It's not much, but it will work on esoteric video cards or weird monitors.
+[pacemaker-1.1.12]
+name=LINBIT Packages for pacemaker-1.1.12 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/pacemaker-1.1.12/$basearch
+enabled=0
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
-== What Security? ==
+[drbd-9.0]
+name=LINBIT Packages for drbd-9.0 - $basearch
+baseurl=http://packages.linbit.com/xxxxxx/yum/rhel7/drbd-9.0/$basearch
+enabled=1
+gpgkey=https://packages.linbit.com/package-signing-pubkey.asc
+gpgcheck=1
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+--> Write to file (/etc/yum.repos.d/linbit.repo)? [y/N] y
+</syntaxhighlight>
+|}
-{{note|1=The final version of this tutorial '''will''' use the firewall and selinux. It's disabled to simplify debugging during the development stage of the tutorial only.}}
+When you accept, it will download the yum plugins and then ask you if you want to save their PGP key.
-This section will be re-added at the end. For now;
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
-<syntaxhighlight lang="bash">
+|style="white-space: nowrap;"|<syntaxhighlight lang="text">
-setenforce 0
+[OK] Repository configuration written
-sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
+Downloading LINBIT yum plugin
-systemctl disable firewalld.service
+Downloading LINBIT yum plugin config
-systemctl stop firewalld.service
+Final Notes:
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+--> Add linbit signing key to keyring now? [y/N] y
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Now update your package information and install
+LINBIT's kernel module and/or user space utilities
+[OK] Congratulations! Your node was successfully configured.
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="text">
+[OK] Repository configuration written
+Downloading LINBIT yum plugin
+Downloading LINBIT yum plugin config
+Final Notes:
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+--> Add linbit signing key to keyring now? [y/N] y
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Now update your package information and install
+LINBIT's kernel module and/or user space utilities
+[OK] Congratulations! Your node was successfully configured.
 </syntaxhighlight>
+|}
-== Network ==
+Done!
-We want static, named network devices. Follow this;
+== Install ==
-* [[Changing Ethernet Device Names in EL7 and Fedora 15+]]
+Not all of these are required, but most are used at one point or another in this tutorial.
-Then, use these configuration files;
-Build the bridge;
+{{note|1=The <span class="code">fence-agents-virsh</span> package is not available in RHEL 7 beta. Further, it's only needed if you're building your Anvil! using VMs.}}
-<syntaxhighlight lang="bash">
+{|class="wikitable"
-vim /etc/sysconfig/network-scripts/ifcfg-ifn-vbr1
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+yum install rsync pacemaker bridge-utils ntp corosync pcs wget gpm man vim screen mlocate syslinux bzip2 \
+            openssh-clients fence-agents-all fence-agents-virsh policycoreutils-python drbd drbd-bash-completion \
+            drbd-pacemaker drbd-udev drbd-utils drbdmanage
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+|-
-# Internet-Facing Network - Bridge
+!<span class="code">an-a04n02</span>
-DEVICE="ifn-vbr1"
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-TYPE="Bridge"
+<same>
-BOOTPROTO="none"
-IPADDR="10.255.10.1"
-NETMASK="255.255.0.0"
-GATEWAY="10.255.255.254"
-DNS1="8.8.8.8"
-DNS2="8.8.4.4"
-DEFROUTE="yes"
 </syntaxhighlight>
+|}
-Now build the bonds;
-<syntaxhighlight lang="bash">
+== Making ssh faster when the net is down ==
-vim /etc/sysconfig/network-scripts/ifcfg-ifn-bond1
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-# Internet-Facing Network - Bond
-DEVICE="ifn-bond1"
-BRIDGE="ifn-vbr1"
-BOOTPROTO="none"
-NM_CONTROLLED="no"
-ONBOOT="yes"
-BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=ifn1"
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
+By default, the nodes will try to resolve the host name of an incoming ssh connection. When the internet connection is down, DNS lookups have to time out, which can make login times quite slow. When something goes wrong, seconds count and waiting for up to a minute for an SSH password prompt can be maddening.
-vim /etc/sysconfig/network-scripts/ifcfg-sn-bond1
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-# Storage Network - Bond
-DEVICE="sn-bond1"
-BOOTPROTO="none"
-NM_CONTROLLED="no"
-ONBOOT="yes"
-BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=sn1"
-IPADDR="10.10.10.1"
-NETMASK="255.255.0.0"
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
+For this reason, we will make two changes to <span class="code">/etc/ssh/sshd_config</span> that disable this login delay.
-vim /etc/sysconfig/network-scripts/ifcfg-bcn-bond1
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-# Back-Channel Network - Bond
-DEVICE="bcn-bond1"
-BOOTPROTO="none"
-NM_CONTROLLED="no"
-ONBOOT="yes"
-BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=bcn1"
-IPADDR="10.20.10.1"
-NETMASK="255.255.0.0"
-</syntaxhighlight>
-Now tell the interfaces to be slaves to their bonds;
+Please be aware that this can reduce security. If this is a concern, skip this step.
-Internet-Facing Network;
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
-<syntaxhighlight lang="bash">
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-vim /etc/sysconfig/network-scripts/ifcfg-ifn1
+sed -i.anvil 's/#GSSAPIAuthentication no/GSSAPIAuthentication no/' /etc/ssh/sshd_config
+sed -i 's/GSSAPIAuthentication yes/#GSSAPIAuthentication yes/' /etc/ssh/sshd_config
+sed -i 's/#UseDNS yes/UseDNS no/' /etc/ssh/sshd_config
+systemctl restart sshd.service
+diff -u /etc/ssh/sshd_config.anvil /etc/ssh/sshd_config
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="diff">
-# Internet-Facing Network - Link 1
+--- /etc/ssh/sshd_config.anvil	2014-06-09 21:15:52.000000000 -0400
-DEVICE="ifn1"
++++ /etc/ssh/sshd_config	2014-07-27 08:41:03.296760761 -0400
-NM_CONTROLLED="no"
+@@ -89,8 +89,8 @@
-BOOTPROTO="none"
+ #KerberosUseKuserok yes
-ONBOOT="yes"
-SLAVE="yes"
+ # GSSAPI options
-MASTER="ifn-bond1"
+-#GSSAPIAuthentication no
+-GSSAPIAuthentication yes
++GSSAPIAuthentication no
++#GSSAPIAuthentication yes
+ #GSSAPICleanupCredentials yes
+ GSSAPICleanupCredentials yes
+ #GSSAPIStrictAcceptorCheck yes
+@@ -127,7 +127,7 @@
+ #ClientAliveInterval 0
+ #ClientAliveCountMax 3
+ #ShowPatchLevel no
+-#UseDNS yes
++UseDNS no
+ #PidFile /var/run/sshd.pid
+ #MaxStartups 10:30:100
+ #PermitTunnel no
 </syntaxhighlight>
+|-
-<syntaxhighlight lang="bash">
+!<span class="code">an-a04n02</span>
-vim /etc/sysconfig/network-scripts/ifcfg-ifn2
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+same
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-# Back-Channel Network - Link 2
+same
-DEVICE="ifn2"
-NM_CONTROLLED="no"
-BOOTPROTO="none"
-ONBOOT="yes"
-SLAVE="yes"
-MASTER="ifn-bond1"
 </syntaxhighlight>
+|}
+Subsequent logins when the net is down should be quick.
-Storage Network;
+=== Configuring the network ===
-<syntaxhighlight lang="bash">
+If you want to make any other changes, like configuring the interface to have a static IP, do so now. Once you're done editing;
-vim /etc/sysconfig/network-scripts/ifcfg-sn1
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-# Storage Network - Link 1
-DEVICE="sn1"
-NM_CONTROLLED="no"
-BOOTPROTO="none"
-ONBOOT="yes"
-SLAVE="yes"
-MASTER="sn-bond1"
-</syntaxhighlight>
 <syntaxhighlight lang="bash">
-vim /etc/sysconfig/network-scripts/ifcfg-sn2
+nmcli connection reload
+systemctl restart NetworkManager.service
+ip addr show
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-# Storage Network - Link 1
+: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
-DEVICE="sn2"
+    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
-NM_CONTROLLED="no"
+    inet 127.0.0.1/8 scope host lo
-BOOTPROTO="none"
+       valid_lft forever preferred_lft forever
-ONBOOT="yes"
+    inet6 ::1/128 scope host
-SLAVE="yes"
+       valid_lft forever preferred_lft forever
-MASTER="sn-bond1"
+: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
+    link/ether 52:54:00:a7:9d:17 brd ff:ff:ff:ff:ff:ff
+    inet 192.168.122.201/24 scope global eth0
+       valid_lft forever preferred_lft forever
+    inet6 fe80::5054:ff:fea7:9d17/64 scope link
+       valid_lft forever preferred_lft forever
 </syntaxhighlight>
-Back-Channel Network
+The interface should now start on boot properly.
+== Setting the Hostname ==
+Fedora 19 is '''very''' different from [[EL6]].
+{{note|1=The '<span class="code">--pretty</span>' line currently doesn't work as there is [https://bugzilla.redhat.com/show_bug.cgi?id=895299 a bug (rhbz#895299)] with single-quotes.}}
+{{note|1=The '<span class="code">--static</span>' option is currently needed to prevent the '<span class="code">.</span>' from being removed. See [https://bugzilla.redhat.com/show_bug.cgi?id=896756 this bug (rhbz#896756)].}}
+Use a format that works for you. For the tutorial, node names are based on the following;
+* A two-letter prefix identifying the company/user (<span class="code">an</span>, for "Alteeve's Niche!")
+* A sequential Anvil! ID number in the form of <span class="code">aXX</span> (<span class="code">a01</span> for "Anvil! 01", <span class="code">a02</span> for Anvil! 02, etc)
+* A sequential node ID number in the form of <span class="code">nYY</span>
+In our case, this is my third Anvil! and we use the company prefix <span class="code">an</span>, so these two nodes will be;
+* <span class="code">an-a04n01</span> - node 1
+* <span class="code">an-a04n02</span> - node 2
 <syntaxhighlight lang="bash">
-vim /etc/sysconfig/network-scripts/ifcfg-bcn1
+hostnamectl set-hostname an-a04n01.alteeve.ca --static
+hostnamectl set-hostname --pretty "Alteeve's Niche! - Anvil! 03, Node 01"
 </syntaxhighlight>
+If you want the new host name to take effect immediately, you can use the traditional <span class="code">hostname</span> command:
 <syntaxhighlight lang="bash">
-# Back-Channel Network - Link 1
+hostname an-a04n01.alteeve.ca
-DEVICE="bcn1"
+</syntaxhighlight>
-NM_CONTROLLED="no"
-BOOTPROTO="none"
+The "pretty" host name is stored in <span class="code">/etc/machine-info</span> as the unquoted value for the <span class="code">PRETTY_HOSTNAME</span> value.
-ONBOOT="yes"
-SLAVE="yes"
-MASTER="bcn-bond1"
-</syntaxhighlight>
 <syntaxhighlight lang="bash">
-vim /etc/sysconfig/network-scripts/ifcfg-bcn2
+vim /etc/machine-info
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-# Storage Network - Link 1
+PRETTY_HOSTNAME=Alteeves Niche! - Anvil! 03, Node 01
-DEVICE="bcn2"
-NM_CONTROLLED="no"
-BOOTPROTO="none"
-ONBOOT="yes"
-SLAVE="yes"
-MASTER="bcn-bond1"
 </syntaxhighlight>
-Now restart the network, confirm that the bonds and bridge are up and you are ready to proceed.
+If you can't get the <span class="code">hostname</span> command to work for some reason, you can reboot to have the system read the new values.
-== Setup The hosts File ==
+== Network ==
-You can use [[DNS]] if you prefer. For now, lets use <span class="code">/etc/hosts</span> for node name resolution.
+{{note|1=(Note for myself) - Consider using '<span class="code">[https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/sec-Specific_Kernel_Module_Capabilities.html#sec-Using_Channel_Bonding primary_reselect=1]</span>.}}
-<syntaxhighlight lang="bash">
+We want static, named network devices. Follow this;
-vim /etc/hosts
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
-::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
-# AN!Cluster 01, Node 01
+* [[Changing Ethernet Device Names in EL7 and Fedora 15+]]
-.255.30.1     an-c03n01.ifn
-.10.30.1      an-c03n01.sn
-.20.30.1      an-c03n01.bcn an-c03n01 an-c03n01.alteeve.ca
-.20.31.1      an-c03n01.ipmi
-# AN!Cluster 01, Node 02
+Then, use these configuration files;
-.255.30.2     an-c03n02.ifn
-.10.30.2      an-c03n02.sn
-.20.30.2      an-c03n02.bcn an-c03n02 an-c03n02.alteeve.ca
-.20.31.2      an-c03n02.ipmi
-# Foundation Pack
+Build the bridge;
-.20.2.7       an-p03 an-p03.alteeve.ca
-</syntaxhighlight>
-== Setup SSH ==
+<syntaxhighlight lang="bash">
+vim /etc/sysconfig/network-scripts/ifcfg-ifn_bridge1
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+# Internet-Facing Network - Bridge
+DEVICE="ifn_bridge1"
+TYPE="Bridge"
+BOOTPROTO="none"
+IPADDR="10.255.40.1"
+NETMASK="255.255.0.0"
+GATEWAY="10.255.255.254"
+DNS1="8.8.8.8"
+DNS2="8.8.4.4"
+DEFROUTE="yes"
+</syntaxhighlight>
-Same as [[AN!Cluster_Tutorial_2#Setting_up_SSH|before]].
+Now build the bonds;
-== Populating And Pushing ~/.ssh/known_hosts ==
+<syntaxhighlight lang="bash">
+vim /etc/sysconfig/network-scripts/ifcfg-ifn_bond1
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+# Internet-Facing Network - Bond
+DEVICE="ifn_bond1"
+BRIDGE="ifn_bridge1"
+BOOTPROTO="none"
+NM_CONTROLLED="no"
+ONBOOT="yes"
+BONDING_OPTS="mode=1 primary=ifn_link1 updelay=120000 downdelay=0 fail_over_mac=none miimon=100 primary_reselect=better resend_igmp=5"
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
-{|class="wikitable"
+vim /etc/sysconfig/network-scripts/ifcfg-sn_bond1
-!<span class="code">an-c03n01</span>
+</syntaxhighlight>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+<syntaxhighlight lang="bash">
-ssh-keygen -t rsa -N "" -b 8191 -f ~/.ssh/id_rsa
+# Storage Network - Bond
+DEVICE="sn_bond1"
+BOOTPROTO="none"
+NM_CONTROLLED="no"
+ONBOOT="yes"
+BONDING_OPTS="mode=1 primary=sn_link1 updelay=120000 downdelay=0 fail_over_mac=none miimon=100 primary_reselect=better resend_igmp=5"
+IPADDR="10.10.40.1"
+NETMASK="255.255.0.0"
 </syntaxhighlight>
-<syntaxhighlight lang="text">
-Generating public/private rsa key pair.
-Your identification has been saved in /root/.ssh/id_rsa.
+<syntaxhighlight lang="bash">
-Your public key has been saved in /root/.ssh/id_rsa.pub.
+vim /etc/sysconfig/network-scripts/ifcfg-bcn_bond1
-The key fingerprint is:
-be:17:cc:23:8e:b1:b4:76:a1:e4:2a:91:cb:cd:d8:3a root@an-c03n01.alteeve.ca
-The key's randomart image is:
-+--[ RSA 8191]----+
-|                 |
-|                 |
-|                 |
-|                 |
-|   .    So       |
-|  o   +.o =      |
-| . B + B.o o     |
-|  E + B o..      |
-|  .+.o ...       |
-+-----------------+
 </syntaxhighlight>
-|-
+<syntaxhighlight lang="bash">
-!<span class="code">an-c03n01</span>
+# Back-Channel Network - Bond
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+DEVICE="bcn_bond1"
-ssh-keygen -t rsa -N "" -b 8191 -f ~/.ssh/id_rsa
+BOOTPROTO="none"
-</syntaxhighlight>
+NM_CONTROLLED="no"
-<syntaxhighlight lang="text">
+ONBOOT="yes"
-Generating public/private rsa key pair.
+BONDING_OPTS="mode=1 primary=bcn_link1 updelay=120000 downdelay=0 fail_over_mac=none miimon=100 primary_reselect=better resend_igmp=5"
-Created directory '/root/.ssh'.
+IPADDR="10.20.40.1"
-Your identification has been saved in /root/.ssh/id_rsa.
+NETMASK="255.255.0.0"
-Your public key has been saved in /root/.ssh/id_rsa.pub.
-The key fingerprint is:
-:b1:9d:31:9f:7a:c9:10:74:e0:4c:69:53:8f:e4:70 root@an-c03n02.alteeve.ca
-The key's randomart image is:
-+--[ RSA 8191]----+
-|          ..O+E  |
-|           B+% + |
-|        . o.*.= .|
-|         o   + . |
-|        S   . +  |
-|             .   |
-|                 |
-|                 |
-|                 |
-+-----------------+
 </syntaxhighlight>
-|}
-Setup autorized_keys:
+Now tell the interfaces to be slaves to their bonds;
+Internet-Facing Network;
-{|class="wikitable"
+<syntaxhighlight lang="bash">
-!<span class="code">an-c03n01</span>
+vim /etc/sysconfig/network-scripts/ifcfg-ifn_link1
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
-ssh root@an-c03n02 "cat /root/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys
-rsync -av ~/.ssh/authorized_keys root@an-c03n02:/root/.ssh/
-ssh root@an-c03n01
-ssh root@an-c03n01.alteeve.ca
-ssh root@an-c03n02
-ssh root@an-c03n02.alteeve.ca
-rsync -av ~/.ssh/known_hosts root@an-c03n02:/root/.ssh/
-rsync -av /etc/hosts root@an-c03n02:/etc/
 </syntaxhighlight>
-|-
+<syntaxhighlight lang="bash">
-!<span class="code">an-c03n01</span>
+# Internet-Facing Network - Link 1
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+DEVICE="ifn_link1"
-ssh root@an-c03n01
+NM_CONTROLLED="no"
+BOOTPROTO="none"
+ONBOOT="yes"
+SLAVE="yes"
+MASTER="ifn_bond1"
 </syntaxhighlight>
-|}
-== Keeping Time in Sync ==
-It's not as critical as it used to be to keep the clocks on the nodes in sync, but it's still a good idea.
 <syntaxhighlight lang="bash">
-ln -sf /usr/share/zoneinfo/America/Toronto /etc/localtime
+vim /etc/sysconfig/network-scripts/ifcfg-ifn_link2
-systemctl start ntpd.service
-systemctl enable ntpd.service
 </syntaxhighlight>
-== Configuring IPMI ==
-F19 specifics based on the [[IPMI]] tutorial.
 <syntaxhighlight lang="bash">
-yum -y install ipmitools OpenIPMI
+# Internet-Facing Network - Link 2
-systemctl start ipmi.service
+DEVICE="ifn_link2"
-systemctl enable ipmi.service
+NM_CONTROLLED="no"
-</syntaxhighlight>
+BOOTPROTO="none"
-<syntaxhighlight lang="text">
+ONBOOT="yes"
-ln -s '/usr/lib/systemd/system/ipmi.service' '/etc/systemd/system/multi-user.target.wants/ipmi.service'
+SLAVE="yes"
+MASTER="ifn_bond1"
 </syntaxhighlight>
-Our servers use lan channel 2, yours might be 1 or something else. Experiment.
+Storage Network;
 <syntaxhighlight lang="bash">
-ipmitool lan print 2
+vim /etc/sysconfig/network-scripts/ifcfg-sn_link1
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+<syntaxhighlight lang="bash">
-Set in Progress         : Set Complete
+# Storage Network - Link 1
-Auth Type Support       : NONE MD5 PASSWORD
+DEVICE="sn_link1"
-Auth Type Enable        : Callback : NONE MD5 PASSWORD
+NM_CONTROLLED="no"
-                        : User     : NONE MD5 PASSWORD
+BOOTPROTO="none"
-                        : Operator : NONE MD5 PASSWORD
+ONBOOT="yes"
-                        : Admin    : NONE MD5 PASSWORD
+SLAVE="yes"
-                        : OEM      : NONE MD5 PASSWORD
+MASTER="sn_bond1"
-IP Address Source       : BIOS Assigned Address
-IP Address              : 10.20.51.1
-Subnet Mask             : 255.255.0.0
-MAC Address             : 00:19:99:9a:d8:e8
-SNMP Community String   : public
-IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
-Default Gateway IP      : 10.20.255.254
-.1q VLAN ID          : Disabled
-.1q VLAN Priority    : 0
-RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
-Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
-                        :     X=Cipher Suite Unused
-                        :     c=CALLBACK
-                        :     u=USER
-                        :     o=OPERATOR
-                        :     a=ADMIN
-                        :     O=OEM
 </syntaxhighlight>
-I need to set the IPs to <span class="code">10.20.31.1/16</span> and <span class="code">10.20.31.2/16</span> for nodes 1 and 2, respectively. I also want to set the password to <span class="code">secret</span> for the <span class="code">admin</span> user.
+<syntaxhighlight lang="bash">
+vim /etc/sysconfig/network-scripts/ifcfg-sn_link2
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+# Storage Network - Link 2
+DEVICE="sn_link2"
+NM_CONTROLLED="no"
+BOOTPROTO="none"
+ONBOOT="yes"
+SLAVE="yes"
+MASTER="sn_bond1"
+</syntaxhighlight>
-'''Node 01''' IP;
+Back-Channel Network
 <syntaxhighlight lang="bash">
-ipmitool lan set 2 ipsrc static
+vim /etc/sysconfig/network-scripts/ifcfg-bcn_link1
-ipmitool lan set 2 ipaddr 10.20.31.
+</syntaxhighlight>
-ipmitool lan set 2 netmask 255.255.0.0
+<syntaxhighlight lang="bash">
-ipmitool lan set 2 defgw ipaddr 10.20.255.254
+# Back-Channel Network - Link 1
-ipmitool lan print 2
+DEVICE="bcn_link1"
+NM_CONTROLLED="no"
+BOOTPROTO="none"
+ONBOOT="yes"
+SLAVE="yes"
+MASTER="bcn_bond1"
 </syntaxhighlight>
-<syntaxhighlight lang="text">
-Set in Progress         : Set Complete
+<syntaxhighlight lang="bash">
-Auth Type Support       : NONE MD5 PASSWORD
+vim /etc/sysconfig/network-scripts/ifcfg-bcn_link2
-Auth Type Enable        : Callback : NONE MD5 PASSWORD
+</syntaxhighlight>
-                        : User     : NONE MD5 PASSWORD
+<syntaxhighlight lang="bash">
-                        : Operator : NONE MD5 PASSWORD
+# Back-Channel Network - Link 2
-                        : Admin    : NONE MD5 PASSWORD
+DEVICE="bcn_link2"
-                        : OEM      : NONE MD5 PASSWORD
+NM_CONTROLLED="no"
-IP Address Source       : Static Address
+BOOTPROTO="none"
-IP Address              : 10.20.31.1
+ONBOOT="yes"
-Subnet Mask             : 255.255.0.0
+SLAVE="yes"
-MAC Address             : 00:19:99:9a:d8:e8
+MASTER="bcn_bond1"
-SNMP Community String   : public
-IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
-Default Gateway IP      : 10.20.255.254
-.1q VLAN ID          : Disabled
-.1q VLAN Priority    : 0
-RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
-Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
-                        :     X=Cipher Suite Unused
-                        :     c=CALLBACK
-                        :     u=USER
-                        :     o=OPERATOR
-                        :     a=ADMIN
-                        :     O=OEM
 </syntaxhighlight>
-'''Node 01''' IP;
+Now restart the network, confirm that the bonds and bridge are up and you are ready to proceed.
+== Setup The hosts File ==
+You can use [[DNS]] if you prefer. For now, lets use <span class="code">/etc/hosts</span> for node name resolution.
 <syntaxhighlight lang="bash">
-ipmitool lan set 2 ipsrc static
+vim /etc/hosts
-ipmitool lan set 2 ipaddr 10.20.31.2
-ipmitool lan set 2 netmask 255.255.0.0
-ipmitool lan set 2 defgw ipaddr 10.20.255.254
-ipmitool lan print 2
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Set in Progress         : Set Complete
+.0.0.1	localhost localhost.localdomain localhost4 localhost4.localdomain4
-Auth Type Support       : NONE MD5 PASSWORD
+::1		localhost localhost.localdomain localhost6 localhost6.localdomain6
-Auth Type Enable        : Callback : NONE MD5 PASSWORD
-                        : User     : NONE MD5 PASSWORD
+# Anvil! 03, Node 01
-                        : Operator : NONE MD5 PASSWORD
+.255.40.1	an-a04n01.ifn
-                        : Admin    : NONE MD5 PASSWORD
+.10.40.1	an-a04n01.sn
-                        : OEM      : NONE MD5 PASSWORD
+.20.40.1	an-a04n01.bcn an-a04n01 an-a04n01.alteeve.ca
-IP Address Source       : Static Address
+.20.41.1	an-a04n01.ipmi
-IP Address              : 10.20.31.2
-Subnet Mask             : 255.255.0.0
-MAC Address             : 00:19:99:9a:b1:78
-SNMP Community String   : public
-IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
-Default Gateway IP      : 10.20.255.254
-.1q VLAN ID          : Disabled
-.1q VLAN Priority    : 0
-RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
-Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
-                        :     X=Cipher Suite Unused
-                        :     c=CALLBACK
-                        :     u=USER
-                        :     o=OPERATOR
-                        :     a=ADMIN
-                        :     O=OEM
-</syntaxhighlight>
-Set the password.
+# Anvil! 03, Node 02
+.255.40.2	an-a04n02.ifn
+.10.40.2	an-a04n02.sn
+.20.40.2	an-a04n02.bcn an-a04n02 an-a04n02.alteeve.ca
+.20.41.2	an-a04n02.ipmi
-<syntaxhighlight lang="bash">
+# Foundation Pack
-ipmitool user list 2
+### Foundation Pack
-</syntaxhighlight>
+# Network Switches
-<syntaxhighlight lang="text">
+.20.1.1	an-switch01 an-switch01.alteeve.ca
-ID  Name	     Callin  Link Auth	IPMI Msg   Channel Priv Limit
+.20.1.2	an-switch02 an-switch02.alteeve.ca	# Only accessible when out of the stack
-                   true    true       true       Unknown (0x00)
-  admin            true    true       true       OEM
+# Switched PDUs
-Get User Access command failed (channel 2, user 3): Unknown (0x32)
+.20.2.1	an-pdu01 an-pdu01.alteeve.ca
+.20.2.2	an-pdu02 an-pdu02.alteeve.ca
+# Network-monitored UPSes
+.20.3.1	an-ups01 an-ups01.alteeve.ca
+.20.3.2	an-ups02 an-ups02.alteeve.ca
+### Monitor Packs
+.20.4.1	an-striker01 an-striker01.alteeve.ca
+.255.4.1	an-striker01.ifn
+.20.4.2	an-striker02 an-striker02.alteeve.ca
+.255.4.2	an-striker02.ifn
 </syntaxhighlight>
-(ignore the error, it's harmless... *BOOM*)
+== Setup SSH ==
+Same as [[AN!Cluster_Tutorial_2#Setting_up_SSH|before]].
-We want to set <span class="code">admin</span>'s password, so we do:
+== Populating And Pushing ~/.ssh/known_hosts ==
-{{note|1=The <span class="code">2</span> below is the ID number, not the LAN channel.}}
-<syntaxhighlight lang="bash">
+{|class="wikitable"
-ipmitool user set password 2 secret
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+ssh-keygen -t rsa -N "" -b 8191 -f ~/.ssh/id_rsa
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+Generating public/private rsa key pair.
-Done!
+Your identification has been saved in /root/.ssh/id_rsa.
+Your public key has been saved in /root/.ssh/id_rsa.pub.
-= Configuring the Cluster =
+The key fingerprint is:
+be:17:cc:23:8e:b1:b4:76:a1:e4:2a:91:cb:cd:d8:3a root@an-a04n01.alteeve.ca
-Now we're getting down to business!
+The key's randomart image is:
++--[ RSA 8191]----+
-For this section, we will be working on <span class="code">an-c03n01</span> and using [[ssh]] to perform tasks on <span class="code">an-c03n02</span>.
+|                 |
+|                 |
-{{note|1=TODO: explain what this is and how it works.}}
+|                 |
+|                 |
-== Enable the pcs Daemon ==
+|   .    So       |
+|  o   +.o =      |
-{{note|1=Most of this section comes more or less verbatim from the main [http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html Clusters from Scratch] tutorial.}}
+| . B + B.o o     |
+|  E + B o..      |
-We will use [[pcs]], the Pacemaker Configuration System, to configure our cluster.
+|  .+.o ...       |
++-----------------+
-<syntaxhighlight lang="bash">
+</syntaxhighlight>
-systemctl start pcsd.service
+|-
-systemctl enable pcsd.service
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+ssh-keygen -t rsa -N "" -b 8191 -f ~/.ssh/id_rsa
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'
+Generating public/private rsa key pair.
-</syntaxhighlight>
+Created directory '/root/.ssh'.
+Your identification has been saved in /root/.ssh/id_rsa.
+Your public key has been saved in /root/.ssh/id_rsa.pub.
+The key fingerprint is:
+:b1:9d:31:9f:7a:c9:10:74:e0:4c:69:53:8f:e4:70 root@an-a04n02.alteeve.ca
+The key's randomart image is:
++--[ RSA 8191]----+
+|          ..O+E  |
+|           B+% + |
+|        . o.*.= .|
+|         o   + . |
+|        S   . +  |
+|             .   |
+|                 |
+|                 |
+|                 |
++-----------------+
+</syntaxhighlight>
+|}
-Now we need to set a password for the <span class="code">hacluster</span> user. This is the account used by <span class="code">pcs</span> on one node to talk to the <span class="code">pcs</span> [[daemon]] on the other node. For this tutorial, we will use the password <span class="code">secret</span>. You will want to use [https://xkcd.com/936/ a stronger password], of course.
+Setup autorized_keys:
-<syntaxhighlight lang="bash">
+{|class="wikitable"
-echo secret | passwd --stdin hacluster
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
+ssh root@an-a04n02 "cat /root/.ssh/id_rsa.pub" >> ~/.ssh/authorized_keys
+rsync -av ~/.ssh/authorized_keys root@an-a04n02:/root/.ssh/
+ssh-keyscan an-a04n01.alteeve.ca >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n01 >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n01.bcn >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n01.sn >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n01.ifn >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n02.alteeve.ca >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n02 >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n02.bcn >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n02.sn >> ~/.ssh/known_hosts
+ssh-keyscan an-a04n02.ifn >> ~/.ssh/known_hosts
+rsync -av ~/.ssh/known_hosts root@an-a04n02:/root/.ssh/
+rsync -av /etc/hosts root@an-a04n02:/etc/
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+|-
-Changing password for user hacluster.
+!<span class="code">an-a04n01</span>
-passwd: all authentication tokens updated successfully.
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
 </syntaxhighlight>
+|}
-== Initializing the Cluster ==
+== Keeping Time in Sync ==
-One of the biggest reasons we're using the [[pcs]] tool, over something like [[crm]], is that it has been written to simplify the setup of clusters on [[Red Hat]] style operating systems. It will configure [[corosync]] automatically.
+It's not as critical as it used to be to keep the clocks on the nodes in sync, but it's still a good idea.
-First, we need to know what <span class="code">hostname</span> we will need to use for <span class="code">[[pcs]]</span>.
-'''Node 01''':
 <syntaxhighlight lang="bash">
-hostname
+ln -sf /usr/share/zoneinfo/America/Toronto /etc/localtime
-</syntaxhighlight>
+systemctl start ntpd.service
-<syntaxhighlight lang="bash">
+systemctl enable ntpd.service
-an-c03n01.alteeve.ca
 </syntaxhighlight>
-'''Node 02''':
+== Configuring IPMI ==
+F19 specifics based on the [[IPMI]] tutorial.
 <syntaxhighlight lang="bash">
-hostname
+yum -y install ipmitools OpenIPMI
+systemctl start ipmi.service
+systemctl enable ipmi.service
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-an-c03n02.alteeve.ca
+ln -s '/usr/lib/systemd/system/ipmi.service' '/etc/systemd/system/multi-user.target.wants/ipmi.service'
 </syntaxhighlight>
-Next, authenticate against the cluster nodes.
+Our servers use lan channel 2, yours might be 1 or something else. Experiment.
-'''Both nodes''':
 <syntaxhighlight lang="bash">
-pcs cluster auth an-c03n01.alteeve.ca an-c03n02.alteeve.ca -u hacluster
+ipmitool lan print 2
 </syntaxhighlight>
+<syntaxhighlight lang="text">
-This will ask you for the user name and password. The default user name is <span class="code">hacluster</span> and we set the password to <span class="code">secret</span>.
+Set in Progress         : Set Complete
+Auth Type Support       : NONE MD5 PASSWORD
-<syntaxhighlight lang="text">
+Auth Type Enable        : Callback : NONE MD5 PASSWORD
-Password:
+                        : User     : NONE MD5 PASSWORD
-an-c03n01.alteeve.ca: 6e9f7e98-dfb7-4305-b8e0-d84bf4f93ce3
+                        : Operator : NONE MD5 PASSWORD
-an-c03n01.alteeve.ca: Authorized
+                        : Admin    : NONE MD5 PASSWORD
-an-c03n02.alteeve.ca: ffee6a85-ddac-4d03-9b97-f136d532b478
+                        : OEM      : NONE MD5 PASSWORD
-an-c03n02.alteeve.ca: Authorized
+IP Address Source       : BIOS Assigned Address
+IP Address              : 10.20.41.1
+Subnet Mask             : 255.255.0.0
+MAC Address             : 00:19:99:9a:d8:e8
+SNMP Community String   : public
+IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
+Default Gateway IP      : 10.20.255.254
+.1q VLAN ID          : Disabled
+.1q VLAN Priority    : 0
+RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
+Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
+                        :     X=Cipher Suite Unused
+                        :     c=CALLBACK
+                        :     u=USER
+                        :     o=OPERATOR
+                        :     a=ADMIN
+                        :     O=OEM
 </syntaxhighlight>
-'''Do this on one node only''':
+I need to set the IPs to <span class="code">10.20.41.1/16</span> and <span class="code">10.20.41.2/16</span> for nodes 1 and 2, respectively. I also want to set the password to <span class="code">secret</span> for the <span class="code">admin</span> user.
-Now to initialize the cluster's communication and membership layer.
+'''Node 01''' IP;
 <syntaxhighlight lang="bash">
-pcs cluster setup --name an-cluster-03 an-c03n01.alteeve.ca an-c03n02.alteeve.ca
+ipmitool lan set 2 ipsrc static
+ipmitool lan set 2 ipaddr 10.20.41.1
+ipmitool lan set 2 netmask 255.255.0.0
+ipmitool lan set 2 defgw ipaddr 10.20.255.254
+ipmitool lan print 2
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-an-c03n01.alteeve.ca: Succeeded
+Set in Progress         : Set Complete
-an-c03n02.alteeve.ca: Succeeded
+Auth Type Support       : NONE MD5 PASSWORD
+Auth Type Enable        : Callback : NONE MD5 PASSWORD
+                        : User     : NONE MD5 PASSWORD
+                        : Operator : NONE MD5 PASSWORD
+                        : Admin    : NONE MD5 PASSWORD
+                        : OEM      : NONE MD5 PASSWORD
+IP Address Source       : Static Address
+IP Address              : 10.20.41.1
+Subnet Mask             : 255.255.0.0
+MAC Address             : 00:19:99:9a:d8:e8
+SNMP Community String   : public
+IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
+Default Gateway IP      : 10.20.255.254
+.1q VLAN ID          : Disabled
+.1q VLAN Priority    : 0
+RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
+Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
+                        :     X=Cipher Suite Unused
+                        :     c=CALLBACK
+                        :     u=USER
+                        :     o=OPERATOR
+                        :     a=ADMIN
+                        :     O=OEM
 </syntaxhighlight>
-This will create the corosync configuration file <span class="code">/etc/corosync/corosync.conf</span>;
+'''Node 01''' IP;
 <syntaxhighlight lang="bash">
-cat /etc/corosync/corosync.conf
+ipmitool lan set 2 ipsrc static
+ipmitool lan set 2 ipaddr 10.20.41.2
+ipmitool lan set 2 netmask 255.255.0.0
+ipmitool lan set 2 defgw ipaddr 10.20.255.254
+ipmitool lan print 2
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-totem {
+Set in Progress         : Set Complete
-version: 2
+Auth Type Support       : NONE MD5 PASSWORD
-secauth: off
+Auth Type Enable        : Callback : NONE MD5 PASSWORD
-cluster_name: an-cluster-03
+                        : User     : NONE MD5 PASSWORD
-transport: udpu
+                        : Operator : NONE MD5 PASSWORD
-}
+                        : Admin    : NONE MD5 PASSWORD
+                        : OEM      : NONE MD5 PASSWORD
+IP Address Source       : Static Address
+IP Address              : 10.20.41.2
+Subnet Mask             : 255.255.0.0
+MAC Address             : 00:19:99:9a:b1:78
+SNMP Community String   : public
+IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
+Default Gateway IP      : 10.20.255.254
+.1q VLAN ID          : Disabled
+.1q VLAN Priority    : 0
+RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,17
+Cipher Suite Priv Max   : OOOOOOOOXXXXXXX
+                        :     X=Cipher Suite Unused
+                        :     c=CALLBACK
+                        :     u=USER
+                        :     o=OPERATOR
+                        :     a=ADMIN
+                        :     O=OEM
+</syntaxhighlight>
-nodelist {
+Set the password.
-  node {
-        ring0_addr: an-c03n01.alteeve.ca
-        nodeid: 1
-       }
-  node {
-        ring0_addr: an-c03n02.alteeve.ca
-        nodeid: 2
-       }
-}
-quorum {
+<syntaxhighlight lang="bash">
-provider: corosync_votequorum
+ipmitool user list 2
-two_node: 1
+</syntaxhighlight>
-}
+<syntaxhighlight lang="text">
+ID  Name	     Callin  Link Auth	IPMI Msg   Channel Priv Limit
-logging {
+                    true    true       true       Unknown (0x00)
-to_syslog: yes
+   admin            true    true       true       OEM
-}
+Get User Access command failed (channel 2, user 3): Unknown (0x32)
 </syntaxhighlight>
-== Start the Cluster For the First Time ==
+(ignore the error, it's harmless... *BOOM*)
-This starts the cluster communication and membership layer for the first time.
+We want to set <span class="code">admin</span>'s password, so we do:
-'''On one node only''';
+{{note|1=The <span class="code">2</span> below is the ID number, not the LAN channel.}}
 <syntaxhighlight lang="bash">
-pcs cluster start --all
+ipmitool user set password 2 secret
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-an-c03n01.alteeve.ca: Starting Cluster...
-an-c03n02.alteeve.ca: Starting Cluster...
 </syntaxhighlight>
-After a few moments, you should be able to check the status;
+Done!
-<syntaxhighlight lang="bash">
+= Configuring the Anvil! =
-pcs status
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-Cluster name: an-cluster-03
-WARNING: no stonith devices and stonith-enabled is not false
-Last updated: Mon Jun 24 23:28:29 2013
-Last change: Mon Jun 24 23:28:10 2013 via crmd on an-c03n01.alteeve.ca
-Current DC: NONE
-Nodes configured, unknown expected votes
-Resources configured.
+Now we're getting down to business!
-Node an-c03n01.alteeve.ca (1): UNCLEAN (offline)
+For this section, we will be working on <span class="code">an-a04n01</span> and using [[ssh]] to perform tasks on <span class="code">an-a04n02</span>.
-Node an-c03n02.alteeve.ca (2): UNCLEAN (offline)
-Full list of resources:
+{{note|1=TODO: explain what this is and how it works.}}
-</syntaxhighlight>
-The other node should show almost the identical output.
+== Enable the pcs Daemon ==
-== Disabling Quorum ==
+{{note|1=Most of this section comes more or less verbatim from the main [http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html Clusters from Scratch] tutorial.}}
-{{note|1=Show the math.}}
+We will use [[pcs]], the Pacemaker Configuration System, to configure our Anvil!.
-With quorum enabled, a two node cluster will lose quorum once either node fails. So we have to disable quorum.
+Note that pcsd uses TCP port 2224.
-By default, pacemaker uses quorum. You don't see this initially though;
 <syntaxhighlight lang="bash">
-pcs property
+systemctl start pcsd.service
+systemctl enable pcsd.service
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster Properties:
+ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'
- dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
+</syntaxhighlight>
- cluster-infrastructure: corosync
+Now we need to set a password for the <span class="code">hacluster</span> user. This is the account used by <span class="code">pcs</span> on one node to talk to the <span class="code">pcs</span> [[daemon]] on the other node. For this tutorial, we will use the password <span class="code">secret</span>. You will want to use [https://xkcd.com/936/ a stronger password], of course.
+<syntaxhighlight lang="bash">
+echo "super secret password" | passwd --stdin hacluster
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Changing password for user hacluster.
+passwd: all authentication tokens updated successfully.
 </syntaxhighlight>
-To disable it, we set <span class="code">no-quorum-policy=ignore</span>.
+Open up the firewall:
 <syntaxhighlight lang="bash">
-pcs property set no-quorum-policy=ignore
+firewall-cmd --permanent --add-service=high-availability
-pcs property
+firewall-cmd --reload
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster Properties:
- dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
- cluster-infrastructure: corosync
- no-quorum-policy: ignore
 </syntaxhighlight>
-== Enabling and Configuring Fencing ==
+== Initializing the Cluster ==
+One of the biggest reasons we're using the [[pcs]] tool, over something like [[crm]], is that it has been written to simplify the setup of clusters on [[Red Hat]] style operating systems. It will configure [[corosync]] automatically.
-We will use IPMI and PDU based fence devices for redundancy.
+First, we need to know what <span class="code">hostname</span> we will need to use for <span class="code">[[pcs]]</span>.
-You can see the list of available fence agents here. You will need to find the one for your hardware fence devices.
+'''Node 01''':
 <syntaxhighlight lang="bash">
-pcs stonith list
+hostname
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+<syntaxhighlight lang="bash">
-fence_alom - Fence agent for Sun ALOM
+an-a04n01.alteeve.ca
-fence_apc - Fence agent for APC over telnet/ssh
+</syntaxhighlight>
-fence_apc_snmp - Fence agent for APC over SNMP
-fence_baytech - I/O Fencing agent for Baytech RPC switches in combination with a Cyclades Terminal
+'''Node 02''':
-                Server
-fence_bladecenter - Fence agent for IBM BladeCenter
+<syntaxhighlight lang="bash">
-fence_brocade - Fence agent for Brocade over telnet
+hostname
-fence_bullpap - I/O Fencing agent for Bull FAME architecture controlled by a PAP management console.
+</syntaxhighlight>
-fence_cisco_mds - Fence agent for Cisco MDS
+<syntaxhighlight lang="bash">
-fence_cisco_ucs - Fence agent for Cisco UCS
+an-a04n02.alteeve.ca
-fence_cpint - I/O Fencing agent for GFS on s390 and zSeries VM clusters
+</syntaxhighlight>
-fence_drac - fencing agent for Dell Remote Access Card
-fence_drac5 - Fence agent for Dell DRAC CMC/5
+Next, authenticate against the cluster nodes.
-fence_eaton_snmp - Fence agent for Eaton over SNMP
-fence_egenera - I/O Fencing agent for the Egenera BladeFrame
+'''Both nodes''':
-fence_eps - Fence agent for ePowerSwitch
-fence_hpblade - Fence agent for HP BladeSystem
+<syntaxhighlight lang="bash">
-fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
+pcs cluster auth an-a04n01.alteeve.ca an-a04n02.alteeve.ca -u hacluster
-fence_idrac - Fence agent for IPMI over LAN
+</syntaxhighlight>
-fence_ifmib - Fence agent for IF MIB
-fence_ilo - Fence agent for HP iLO
+This will ask you for the user name and password. The default user name is <span class="code">hacluster</span> and we set the password to <span class="code">secret</span>.
-fence_ilo2 - Fence agent for HP iLO
-fence_ilo3 - Fence agent for IPMI over LAN
+<syntaxhighlight lang="text">
-fence_ilo_mp - Fence agent for HP iLO MP
+Password:
-fence_imm - Fence agent for IPMI over LAN
+an-a04n01.alteeve.ca: 6e9f7e98-dfb7-4305-b8e0-d84bf4f93ce3
-fence_intelmodular - Fence agent for Intel Modular
+an-a04n01.alteeve.ca: Authorized
-fence_ipdu - Fence agent for iPDU over SNMP
+an-a04n02.alteeve.ca: ffee6a85-ddac-4d03-9b97-f136d532b478
-fence_ipmilan - Fence agent for IPMI over LAN
+an-a04n02.alteeve.ca: Authorized
-fence_kdump - Fence agent for use with kdump
+</syntaxhighlight>
-fence_ldom - Fence agent for Sun LDOM
-fence_lpar - Fence agent for IBM LPAR
+'''Do this on one node only''':
-fence_mcdata - I/O Fencing agent for McData FC switches
-fence_rackswitch - fence_rackswitch - I/O Fencing agent for RackSaver RackSwitch
+Now to initialize the cluster's communication and membership layer.
-fence_rhevm - Fence agent for RHEV-M REST API
-fence_rsa - Fence agent for IBM RSA
+<syntaxhighlight lang="bash">
-fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
+pcs cluster setup --name an-anvil-03 an-a04n01.alteeve.ca an-a04n02.alteeve.ca
-fence_sanbox2 - Fence agent for QLogic SANBox2 FC switches
+</syntaxhighlight>
-fence_scsi - fence agent for SCSI-3 persistent reservations
+<syntaxhighlight lang="text">
-fence_virsh - Fence agent for virsh
+an-a04n01.alteeve.ca: Succeeded
-fence_vixel - I/O Fencing agent for Vixel FC switches
+an-a04n02.alteeve.ca: Succeeded
-fence_vmware - Fence agent for VMWare
+</syntaxhighlight>
-fence_vmware_soap - Fence agent for VMWare over SOAP API
-fence_wti - Fence agent for WTI
+This will create the corosync configuration file <span class="code">/etc/corosync/corosync.conf</span>;
-fence_xcat - I/O Fencing agent for xcat environments
-fence_xenapi - XenAPI based fencing for the Citrix XenServer virtual machines.
-fence_zvm - I/O Fencing agent for GFS on s390 and zSeries VM clusters
-</syntaxhighlight>
-We will use <span class="code">fence_ipmilan</span> and <span class="code">fence_apc_snmp</span>.
-=== Configuring IPMI Fencing ===
-Every fence agent has a possibly unique subset of options that can be used. You can see a brief description of these options with the <span class="code">pcs stonith describe fence_X</span> command. Let's look at the options available for <span class="code">fence_ipmilan</span>.
 <syntaxhighlight lang="bash">
-pcs stonith describe fence_ipmilan
+cat /etc/corosync/corosync.conf
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Stonith options for: fence_ipmilan
+totem {
-  auth: IPMI Lan Auth type (md5, password, or none)
+version: 2
-  ipaddr: IPMI Lan IP to talk to
+secauth: off
-  passwd: Password (if required) to control power on IPMI device
+cluster_name: an-anvil-03
-  passwd_script: Script to retrieve password (if required)
+transport: udpu
-  lanplus: Use Lanplus
+}
-  login: Username/Login (if required) to control power on IPMI device
-  action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata
-  timeout: Timeout (sec) for IPMI operation
-  cipher: Ciphersuite to use (same as ipmitool -C parameter)
-  method: Method to fence (onoff or cycle)
-  power_wait: Wait X seconds after on/off operation
-  delay: Wait X seconds before fencing is started
-  privlvl: Privilege level on IPMI device
-  verbose: Verbose mode
-</syntaxhighlight>
-One of the nice things about pcs is that it allows us to create a test file to prepare all our changes in. Then, when we're happy with the changes, merge them into the running cluster. So let's make a copy called <span class="code">stonith_cfg</span>
+nodelist {
+  node {
+        ring0_addr: an-a04n01.alteeve.ca
+        nodeid: 1
+       }
+  node {
+        ring0_addr: an-a04n02.alteeve.ca
+        nodeid: 2
+       }
+}
-<syntaxhighlight lang="bash">
+quorum {
-pcs cluster cib stonith_cfg
+provider: corosync_votequorum
-</syntaxhighlight>
+two_node: 1
+}
-Now add [[IPMI]] fencing.
+logging {
+to_syslog: yes
-<syntaxhighlight lang="bash">
+}
-#                  unique name    fence agent   target node                           device addr             options
-pcs stonith create fence_n01_ipmi fence_ipmilan pcmk_host_list="an-c03n01.alteeve.ca" ipaddr="an-c03n01.ipmi" action="reboot" login="admin" passwd="secret" delay=15 op monitor interval=60s
-pcs stonith create fence_n02_ipmi fence_ipmilan pcmk_host_list="an-c03n02.alteeve.ca" ipaddr="an-c03n02.ipmi" action="reboot" login="admin" passwd="secret" op monitor interval=60s
 </syntaxhighlight>
-Note that <span class="code">fence_n01_ipmi</span> has a <span class="code">delay=15</span> set but <span class="code">fence_n02_ipmi</span> does not. If the network connection breaks between the two nodes, they will both try to fence each other at the same time. If <span class="code">acpid</span> is running, the slower node will not die right away. It will continue to run for up to four more seconds, ample time for it to also initiate a fence against the faster node. The end result is that both nodes get fenced. The ten-second delay protects against this by causing <span class="code">an-c03n02</span> to pause for <span class="code">10</span> seconds before initiating a fence against <span class="code">an-c03n01</span>. If both nodes are alive, <span class="code">an-c03n02</span> will power off before the 10 seconds pass, so it will never fence <span class="code">an-c03n01</span>. However, if <span class="code">an-c03n01</span> really is dead, after the ten seconds have elapsed, fencing will proceed as normal.
+== Start the Cluster For the First Time ==
-{{note|1=At the time of writing, <span class="code">pcmk_reboot_action</span> is needed to override pacemaker's global fence action and <span class="code">pcmk_reboot_action</span> is not recognized by pcs. Both of these issues will be resolved shortly; Pacemaker will honour <span class="code">action="..."</span> in v1.1.10 and pcs will recognize <span class="code">pcmk_*</span> special attributes "real soon now". Until then, the <span class="code">--force</span> switch is needed.}}
+This starts the cluster communication and membership layer for the first time.
-Next, add the [[PDU]] fencing. This requires distinct "off" and "on" actions for each outlet on each PDU. With two nodes, each with two [[PSU]]s, this translates to eight commands. The "off" commands will be monitored to alert us if the PDU fails for some reason. There is no reason to monitor the "on" actions (it would be redundant). Note also that we don't bother using a "delay". The IPMI fence method will go first, before the PDU actions, so the PDU is already delayed.
+'''On one node only''';
 <syntaxhighlight lang="bash">
-# Node 1 - off
+pcs cluster start --all
-pcs stonith create fence_n01_pdu1_off fence_apc_snmp pcmk_host_list="an-c03n01.alteeve.ca" ipaddr="an-p01" action="off" port="1" op monitor interval="60s"
+</syntaxhighlight>
-pcs stonith create fence_n01_pdu2_off fence_apc_snmp pcmk_host_list="an-c03n01.alteeve.ca" ipaddr="an-p02" action="off" port="1" power_wait="5" op monitor interval="60s"
+<syntaxhighlight lang="text">
+an-a04n01.alteeve.ca: Starting Cluster...
-# Node 1 - on
+an-a04n02.alteeve.ca: Starting Cluster...
-pcs stonith create fence_n01_pdu1_on fence_apc_snmp pcmk_host_list="an-c03n01.alteeve.ca" ipaddr="an-p01" action="on" port="1"
-pcs stonith create fence_n01_pdu2_on fence_apc_snmp pcmk_host_list="an-c03n01.alteeve.ca" ipaddr="an-p02" action="on" port="1"
-# Node 2 - off
-pcs stonith create fence_n02_pdu1_off fence_apc_snmp pcmk_host_list="an-c03n02.alteeve.ca" ipaddr="an-p01" action="off" port="2" op monitor interval="60s"
-pcs stonith create fence_n02_pdu2_off fence_apc_snmp pcmk_host_list="an-c03n02.alteeve.ca" ipaddr="an-p02" action="off" port="2" power_wait="5" op monitor interval="60s"
-# Node 2 - on
-pcs stonith create fence_n02_pdu1_on fence_apc_snmp pcmk_host_list="an-c03n02.alteeve.ca" ipaddr="an-p01" action="on" port="2"
-pcs stonith create fence_n02_pdu2_on fence_apc_snmp pcmk_host_list="an-c03n02.alteeve.ca" ipaddr="an-p02" action="on" port="2"
 </syntaxhighlight>
-We can check the new configuration now;
+After a few moments, you should be able to check the status;
 <syntaxhighlight lang="bash">
@@ Line 1,014: / Line 1,137: @@
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster name: an-cluster-03
+Cluster name: an-anvil-04
-Last updated: Tue Jul  2 16:41:55 2013
+WARNING: no stonith devices and stonith-enabled is not false
-Last change: Tue Jul  2 16:41:44 2013 via cibadmin on an-c03n01.alteeve.ca
+Last updated: Mon Jun 24 23:28:29 2013
-Stack: corosync
+Last change: Mon Jun 24 23:28:10 2013 via crmd on an-a04n01.alteeve.ca
-Current DC: an-c03n01.alteeve.ca (1) - partition with quorum
+Current DC: NONE
-Version: 1.1.9-3.fc19-781a388
 Nodes configured, unknown expected votes
 Resources configured.
-Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
+Node an-a04n01.alteeve.ca (1): UNCLEAN (offline)
+Node an-a04n02.alteeve.ca (2): UNCLEAN (offline)
 Full list of resources:
+</syntaxhighlight>
+The other node should show almost the identical output.
+== Disabling Quorum ==
+{{note|1=Show the math.}}
+With quorum enabled, a two node cluster will lose quorum once either node fails. So we have to disable quorum.
+By default, pacemaker uses quorum. You don't see this initially though;
- fence_n01_ipmi	(stonith:fence_ipmilan):	Started an-c03n01.alteeve.ca
+<syntaxhighlight lang="bash">
- fence_n02_ipmi	(stonith:fence_ipmilan):	Started an-c03n02.alteeve.ca
+pcs property
- fence_n01_pdu1_off	(stonith:fence_apc_snmp):	Started an-c03n01.alteeve.ca
+</syntaxhighlight>
-  fence_n01_pdu2_off	(stonith:fence_apc_snmp):	Started an-c03n02.alteeve.ca
+<syntaxhighlight lang="text">
- fence_n02_pdu1_off	(stonith:fence_apc_snmp):	Started an-c03n01.alteeve.ca
+Cluster Properties:
- fence_n02_pdu2_off	(stonith:fence_apc_snmp):	Started an-c03n02.alteeve.ca
+  dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
- fence_n01_pdu1_on	(stonith:fence_apc_snmp):	Started an-c03n01.alteeve.ca
+  cluster-infrastructure: corosync
- fence_n01_pdu2_on	(stonith:fence_apc_snmp):	Started an-c03n02.alteeve.ca
-  fence_n02_pdu1_on	(stonith:fence_apc_snmp):	Started an-c03n01.alteeve.ca
- fence_n02_pdu2_on	(stonith:fence_apc_snmp):	Started an-c03n02.alteeve.ca
 </syntaxhighlight>
-Before we proceed, we need to tell pacemaker to use fencing;
+To disable it, we set <span class="code">no-quorum-policy=ignore</span>.
 <syntaxhighlight lang="bash">
-pcs property set stonith-enabled=true
+pcs property set no-quorum-policy=ignore
 pcs property
 </syntaxhighlight>
 <syntaxhighlight lang="text">
 Cluster Properties:
-Cluster Properties:
+ dc-version: 1.1.9-0.1318.a7966fb.git.fc18-a7966fb
   cluster-infrastructure: corosync
- dc-version: 1.1.9-3.fc19-781a388
   no-quorum-policy: ignore
- stonith-enabled: true
 </syntaxhighlight>
-Excellent!
+== Enabling and Configuring Fencing ==
-== Configuring Fence Levels ==
+We will use IPMI and PDU based fence devices for redundancy.
-The goal of fence levels is to tell pacemaker that there are "fence methods" to try and to impose an order on those methods. Each method composes one or more fence primitives and, when 2 or more primitives are tied together, that all primitives must succeed for the overall method to succeed.
+You can see the list of available fence agents here. You will need to find the one for your hardware fence devices.
-So in our case; the order we want is;
-* IPMI -> PDUs
-The reason is that when IPMI fencing succeeds, we can be very certain the node is truly fenced. When PDU fencing succeeds, it only confirms that the power outlets were cycled. If someone moved a node's power cables to another outlet, we'll get a false positive. On that topic, tie-down the node's PSU cables to the PDU's cable tray when possible, clearly label the power cables and wrap the fingers of anyone who might move them around.
-The PDU fencing needs to be implemented using four steps;
-* PDU 1, outlet X -> off
-* PDU 2, outlet X -> off
-** The <span class="code">power_wait="5"</span> setting for the <span class="code">fence_n0X_pdu2_off</span> primitives will cause a 5 second delay here, giving ample time to ensure the nodes lose power
-* PDU 1, outlet X -> on
-* PDU 2, outlet X -> on
-This is to ensure that both outlets are off at the same time, ensuring that the node loses power. This works because <span class="code">fencing_topology</span> acts serially.
-Putting all this together, we issue this command;
 <syntaxhighlight lang="bash">
-pcs stonith level add 1 an-c03n01.alteeve.ca fence_n01_ipmi
+pcs stonith list
-pcs stonith level add 1 an-c03n02.alteeve.ca fence_n02_ipmi
 </syntaxhighlight>
+<syntaxhighlight lang="text">
-The <span class="code">1</span> tells pacemaker that this is our highest priority fence method. We can see that this was set using pcs;
+fence_alom - Fence agent for Sun ALOM
+fence_apc - Fence agent for APC over telnet/ssh
-<syntaxhighlight lang="bash">
+fence_apc_snmp - Fence agent for APC over SNMP
-pcs stonith level
+fence_baytech - I/O Fencing agent for Baytech RPC switches in combination with a Cyclades Terminal
-</syntaxhighlight>
+                Server
-<syntaxhighlight lang="bash">
+fence_bladecenter - Fence agent for IBM BladeCenter
- Node: an-c03n01.alteeve.ca
+fence_brocade - Fence agent for Brocade over telnet
-  Level 1 - fence_n01_ipmi
+fence_bullpap - I/O Fencing agent for Bull FAME architecture controlled by a PAP management console.
- Node: an-c03n02.alteeve.ca
+fence_cisco_mds - Fence agent for Cisco MDS
-  Level 1 - fence_n02_ipmi
+fence_cisco_ucs - Fence agent for Cisco UCS
-</syntaxhighlight>
+fence_cpint - I/O Fencing agent for GFS on s390 and zSeries VM clusters
+fence_drac - fencing agent for Dell Remote Access Card
-Now we'll tell pacemaker to use the PDUs as the second fence method. Here we tie together the two <span class="code">off</span> calls and the two <span class="code">on</span> calls into a single method.
+fence_drac5 - Fence agent for Dell DRAC CMC/5
+fence_eaton_snmp - Fence agent for Eaton over SNMP
-<syntaxhighlight lang="bash">
+fence_egenera - I/O Fencing agent for the Egenera BladeFrame
-pcs stonith level add 2 an-c03n01.alteeve.ca fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on
+fence_eps - Fence agent for ePowerSwitch
-pcs stonith level add 2 an-c03n02.alteeve.ca fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on
+fence_hpblade - Fence agent for HP BladeSystem
-</syntaxhighlight>
+fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
+fence_idrac - Fence agent for IPMI over LAN
-Check again and we'll see that the new methods were added.
+fence_ifmib - Fence agent for IF MIB
+fence_ilo - Fence agent for HP iLO
-<syntaxhighlight lang="bash">
+fence_ilo2 - Fence agent for HP iLO
-pcs stonith level
+fence_ilo3 - Fence agent for IPMI over LAN
+fence_ilo_mp - Fence agent for HP iLO MP
+fence_imm - Fence agent for IPMI over LAN
+fence_intelmodular - Fence agent for Intel Modular
+fence_ipdu - Fence agent for iPDU over SNMP
+fence_ipmilan - Fence agent for IPMI over LAN
+fence_kdump - Fence agent for use with kdump
+fence_ldom - Fence agent for Sun LDOM
+fence_lpar - Fence agent for IBM LPAR
+fence_mcdata - I/O Fencing agent for McData FC switches
+fence_rackswitch - fence_rackswitch - I/O Fencing agent for RackSaver RackSwitch
+fence_rhevm - Fence agent for RHEV-M REST API
+fence_rsa - Fence agent for IBM RSA
+fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
+fence_sanbox2 - Fence agent for QLogic SANBox2 FC switches
+fence_scsi - fence agent for SCSI-3 persistent reservations
+fence_virsh - Fence agent for virsh
+fence_vixel - I/O Fencing agent for Vixel FC switches
+fence_vmware - Fence agent for VMWare
+fence_vmware_soap - Fence agent for VMWare over SOAP API
+fence_wti - Fence agent for WTI
+fence_xcat - I/O Fencing agent for xcat environments
+fence_xenapi - XenAPI based fencing for the Citrix XenServer virtual machines.
+fence_zvm - I/O Fencing agent for GFS on s390 and zSeries VM clusters
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
- Node: an-c03n01.alteeve.ca
+We will use <span class="code">fence_ipmilan</span> and <span class="code">fence_apc_snmp</span>.
-   Level 1 - fence_n01_ipmi
-   Level 2 - fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on
+=== Configuring IPMI Fencing ===
- Node: an-c03n02.alteeve.ca
-   Level 1 - fence_n02_ipmi
+Every fence agent has a possibly unique subset of options that can be used. You can see a brief description of these options with the <span class="code">pcs stonith describe fence_X</span> command. Let's look at the options available for <span class="code">fence_ipmilan</span>.
-   Level 2 - fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on
+<syntaxhighlight lang="bash">
+pcs stonith describe fence_ipmilan
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Stonith options for: fence_ipmilan
+  auth: IPMI Lan Auth type (md5, password, or none)
+  ipaddr: IPMI Lan IP to talk to
+  passwd: Password (if required) to control power on IPMI device
+  passwd_script: Script to retrieve password (if required)
+  lanplus: Use Lanplus
+   login: Username/Login (if required) to control power on IPMI device
+   action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata
+  timeout: Timeout (sec) for IPMI operation
+  cipher: Ciphersuite to use (same as ipmitool -C parameter)
+  method: Method to fence (onoff or cycle)
+  power_wait: Wait X seconds after on/off operation
+  delay: Wait X seconds before fencing is started
+   privlvl: Privilege level on IPMI device
+   verbose: Verbose mode
 </syntaxhighlight>
-For those of us who are [[XML]] fans, this is what the [[cib]] looks like now:
+One of the nice things about pcs is that it allows us to create a test file to prepare all our changes in. Then, when we're happy with the changes, merge them into the running cluster. So let's make a copy called <span class="code">stonith_cfg</span>
 <syntaxhighlight lang="bash">
-cat /var/lib/pacemaker/cib/cib.xml
+pcs cluster cib stonith_cfg
 </syntaxhighlight>
-<syntaxhighlight lang="xml">
-<cib epoch="18" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Jul 18 13:15:53 2013" update-origin="an-c03n01.alteeve.ca" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="1">
+Now add [[IPMI]] fencing.
-  <configuration>
-    <crm_config>
+<syntaxhighlight lang="bash">
-      <cluster_property_set id="cib-bootstrap-options">
+#                  unique name    fence agent   target node                           device addr             options
-        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.9-dde1c52"/>
+pcs stonith create fence_n01_ipmi fence_ipmilan pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-a04n01.ipmi" action="reboot" login="admin" passwd="secret" delay=15 op monitor interval=60s
-        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
+pcs stonith create fence_n02_ipmi fence_ipmilan pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-a04n02.ipmi" action="reboot" login="admin" passwd="secret" op monitor interval=60s
-        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
+</syntaxhighlight>
-      </cluster_property_set>
-    </crm_config>
+Note that <span class="code">fence_n01_ipmi</span> has a <span class="code">delay=15</span> set but <span class="code">fence_n02_ipmi</span> does not. If the network connection breaks between the two nodes, they will both try to fence each other at the same time. If <span class="code">acpid</span> is running, the slower node will not die right away. It will continue to run for up to four more seconds, ample time for it to also initiate a fence against the faster node. The end result is that both nodes get fenced. The ten-second delay protects against this by causing <span class="code">an-a04n02</span> to pause for <span class="code">10</span> seconds before initiating a fence against <span class="code">an-a04n01</span>. If both nodes are alive, <span class="code">an-a04n02</span> will power off before the 10 seconds pass, so it will never fence <span class="code">an-a04n01</span>. However, if <span class="code">an-a04n01</span> really is dead, after the ten seconds have elapsed, fencing will proceed as normal.
-    <nodes>
-      <node id="1" uname="an-c03n01.alteeve.ca"/>
+{{note|1=At the time of writing, <span class="code">pcmk_reboot_action</span> is needed to override pacemaker's global fence action and <span class="code">pcmk_reboot_action</span> is not recognized by pcs. Both of these issues will be resolved shortly; Pacemaker will honour <span class="code">action="..."</span> in v1.1.10 and pcs will recognize <span class="code">pcmk_*</span> special attributes "real soon now". Until then, the <span class="code">--force</span> switch is needed.}}
-      <node id="2" uname="an-c03n02.alteeve.ca"/>
-    </nodes>
+Next, add the [[PDU]] fencing. This requires distinct "off" and "on" actions for each outlet on each PDU. With two nodes, each with two [[PSU]]s, this translates to eight commands. The "off" commands will be monitored to alert us if the PDU fails for some reason. There is no reason to monitor the "on" actions (it would be redundant). Note also that we don't bother using a "delay". The IPMI fence method will go first, before the PDU actions, so the PDU is already delayed.
-    <resources>
-      <primitive class="stonith" id="fence_n01_ipmi" type="fence_ipmilan">
+<syntaxhighlight lang="bash">
-        <instance_attributes id="fence_n01_ipmi-instance_attributes">
+# Node 1 - off
-          <nvpair id="fence_n01_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
+pcs stonith create fence_n01_pdu1_off fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu01" action="off" port="1" op monitor interval="60s"
-          <nvpair id="fence_n01_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-c03n01.ipmi"/>
+pcs stonith create fence_n01_pdu2_off fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu02" action="off" port="1" power_wait="5" op monitor interval="60s"
-          <nvpair id="fence_n01_ipmi-instance_attributes-action" name="action" value="reboot"/>
-          <nvpair id="fence_n01_ipmi-instance_attributes-login" name="login" value="admin"/>
+# Node 1 - on
-          <nvpair id="fence_n01_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
+pcs stonith create fence_n01_pdu1_on fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu01" action="on" port="1"
-          <nvpair id="fence_n01_ipmi-instance_attributes-delay" name="delay" value="15"/>
+pcs stonith create fence_n01_pdu2_on fence_apc_snmp pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="an-pdu02" action="on" port="1"
-        </instance_attributes>
-        <operations>
+# Node 2 - off
-          <op id="fence_n01_ipmi-monitor-interval-60s" interval="60s" name="monitor"/>
+pcs stonith create fence_n02_pdu1_off fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu01" action="off" port="2" op monitor interval="60s"
-        </operations>
+pcs stonith create fence_n02_pdu2_off fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu02" action="off" port="2" power_wait="5" op monitor interval="60s"
-      </primitive>
-      <primitive class="stonith" id="fence_n02_ipmi" type="fence_ipmilan">
+# Node 2 - on
-        <instance_attributes id="fence_n02_ipmi-instance_attributes">
+pcs stonith create fence_n02_pdu1_on fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu01" action="on" port="2"
-          <nvpair id="fence_n02_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
+pcs stonith create fence_n02_pdu2_on fence_apc_snmp pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="an-pdu02" action="on" port="2"
-          <nvpair id="fence_n02_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-c03n02.ipmi"/>
+</syntaxhighlight>
-          <nvpair id="fence_n02_ipmi-instance_attributes-action" name="action" value="reboot"/>
-          <nvpair id="fence_n02_ipmi-instance_attributes-login" name="login" value="admin"/>
+We can check the new configuration now;
-          <nvpair id="fence_n02_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
-        </instance_attributes>
+<syntaxhighlight lang="bash">
-        <operations>
+pcs status
-          <op id="fence_n02_ipmi-monitor-interval-60s" interval="60s" name="monitor"/>
+</syntaxhighlight>
-        </operations>
+<syntaxhighlight lang="text">
-      </primitive>
+Cluster name: an-anvil-04
-      <primitive class="stonith" id="fence_n01_pdu1_off" type="fence_apc_snmp">
+Last updated: Tue Jul  2 16:41:55 2013
-        <instance_attributes id="fence_n01_pdu1_off-instance_attributes">
+Last change: Tue Jul  2 16:41:44 2013 via cibadmin on an-a04n01.alteeve.ca
-          <nvpair id="fence_n01_pdu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
+Stack: corosync
-          <nvpair id="fence_n01_pdu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
+Current DC: an-a04n01.alteeve.ca (1) - partition with quorum
-          <nvpair id="fence_n01_pdu1_off-instance_attributes-action" name="action" value="off"/>
+Version: 1.1.9-3.fc19-781a388
-          <nvpair id="fence_n01_pdu1_off-instance_attributes-port" name="port" value="1"/>
+Nodes configured, unknown expected votes
-        </instance_attributes>
+Resources configured.
-        <operations>
-          <op id="fence_n01_pdu1_off-monitor-interval-60s" interval="60s" name="monitor"/>
-        </operations>
+Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
-      </primitive>
-      <primitive class="stonith" id="fence_n01_pdu2_off" type="fence_apc_snmp">
+Full list of resources:
-        <instance_attributes id="fence_n01_pdu2_off-instance_attributes">
-          <nvpair id="fence_n01_pdu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
+ fence_n01_ipmi	(stonith:fence_ipmilan):	Started an-a04n01.alteeve.ca
-          <nvpair id="fence_n01_pdu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
+ fence_n02_ipmi	(stonith:fence_ipmilan):	Started an-a04n02.alteeve.ca
-          <nvpair id="fence_n01_pdu2_off-instance_attributes-action" name="action" value="off"/>
+ fence_n01_pdu1_off	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca
-          <nvpair id="fence_n01_pdu2_off-instance_attributes-port" name="port" value="1"/>
+ fence_n01_pdu2_off	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca
-          <nvpair id="fence_n01_pdu2_off-instance_attributes-power_wait" name="power_wait" value="5"/>
+ fence_n02_pdu1_off	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca
-        </instance_attributes>
+ fence_n02_pdu2_off	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca
-        <operations>
+ fence_n01_pdu1_on	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca
-          <op id="fence_n01_pdu2_off-monitor-interval-60s" interval="60s" name="monitor"/>
+ fence_n01_pdu2_on	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca
-        </operations>
+ fence_n02_pdu1_on	(stonith:fence_apc_snmp):	Started an-a04n01.alteeve.ca
-      </primitive>
+ fence_n02_pdu2_on	(stonith:fence_apc_snmp):	Started an-a04n02.alteeve.ca
-      <primitive class="stonith" id="fence_n01_pdu1_on" type="fence_apc_snmp">
+</syntaxhighlight>
-        <instance_attributes id="fence_n01_pdu1_on-instance_attributes">
-          <nvpair id="fence_n01_pdu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
+Before we proceed, we need to tell pacemaker to use fencing;
-          <nvpair id="fence_n01_pdu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
-          <nvpair id="fence_n01_pdu1_on-instance_attributes-action" name="action" value="on"/>
+<syntaxhighlight lang="bash">
-          <nvpair id="fence_n01_pdu1_on-instance_attributes-port" name="port" value="1"/>
+pcs property set stonith-enabled=true
-        </instance_attributes>
+pcs property
-        <operations>
+</syntaxhighlight>
-          <op id="fence_n01_pdu1_on-monitor-interval-60s" interval="60s" name="monitor"/>
+<syntaxhighlight lang="text">
-        </operations>
+Cluster Properties:
-      </primitive>
+Cluster Properties:
-      <primitive class="stonith" id="fence_n01_pdu2_on" type="fence_apc_snmp">
+ cluster-infrastructure: corosync
-        <instance_attributes id="fence_n01_pdu2_on-instance_attributes">
+ dc-version: 1.1.9-3.fc19-781a388
-          <nvpair id="fence_n01_pdu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n01.alteeve.ca"/>
+ no-quorum-policy: ignore
-          <nvpair id="fence_n01_pdu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
+ stonith-enabled: true
-          <nvpair id="fence_n01_pdu2_on-instance_attributes-action" name="action" value="on"/>
+</syntaxhighlight>
-          <nvpair id="fence_n01_pdu2_on-instance_attributes-port" name="port" value="1"/>
-        </instance_attributes>
+Excellent!
-        <operations>
-          <op id="fence_n01_pdu2_on-monitor-interval-60s" interval="60s" name="monitor"/>
+== Configuring Fence Levels ==
-        </operations>
-      </primitive>
+The goal of fence levels is to tell pacemaker that there are "fence methods" to try and to impose an order on those methods. Each method composes one or more fence primitives and, when 2 or more primitives are tied together, that all primitives must succeed for the overall method to succeed.
-      <primitive class="stonith" id="fence_n02_pdu1_off" type="fence_apc_snmp">
-        <instance_attributes id="fence_n02_pdu1_off-instance_attributes">
+So in our case; the order we want is;
-          <nvpair id="fence_n02_pdu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
-          <nvpair id="fence_n02_pdu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
+* IPMI -> PDUs
-          <nvpair id="fence_n02_pdu1_off-instance_attributes-action" name="action" value="off"/>
-          <nvpair id="fence_n02_pdu1_off-instance_attributes-port" name="port" value="2"/>
+The reason is that when IPMI fencing succeeds, we can be very certain the node is truly fenced. When PDU fencing succeeds, it only confirms that the power outlets were cycled. If someone moved a node's power cables to another outlet, we'll get a false positive. On that topic, tie-down the node's PSU cables to the PDU's cable tray when possible, clearly label the power cables and wrap the fingers of anyone who might move them around.
-        </instance_attributes>
-        <operations>
+The PDU fencing needs to be implemented using four steps;
-          <op id="fence_n02_pdu1_off-monitor-interval-60s" interval="60s" name="monitor"/>
-        </operations>
+* PDU 1, outlet X -> off
-      </primitive>
+* PDU 2, outlet X -> off
-      <primitive class="stonith" id="fence_n02_pdu2_off" type="fence_apc_snmp">
+** The <span class="code">power_wait="5"</span> setting for the <span class="code">fence_n0X_pdu2_off</span> primitives will cause a 5 second delay here, giving ample time to ensure the nodes lose power
-        <instance_attributes id="fence_n02_pdu2_off-instance_attributes">
+* PDU 1, outlet X -> on
-          <nvpair id="fence_n02_pdu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
+* PDU 2, outlet X -> on
-          <nvpair id="fence_n02_pdu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
-          <nvpair id="fence_n02_pdu2_off-instance_attributes-action" name="action" value="off"/>
+This is to ensure that both outlets are off at the same time, ensuring that the node loses power. This works because <span class="code">fencing_topology</span> acts serially.
-          <nvpair id="fence_n02_pdu2_off-instance_attributes-port" name="port" value="2"/>
-          <nvpair id="fence_n02_pdu2_off-instance_attributes-power_wait" name="power_wait" value="5"/>
+Putting all this together, we issue this command;
-        </instance_attributes>
-        <operations>
+<syntaxhighlight lang="bash">
-          <op id="fence_n02_pdu2_off-monitor-interval-60s" interval="60s" name="monitor"/>
+pcs stonith level add 1 an-a04n01.alteeve.ca fence_n01_ipmi
-        </operations>
+pcs stonith level add 1 an-a04n02.alteeve.ca fence_n02_ipmi
-      </primitive>
+</syntaxhighlight>
-      <primitive class="stonith" id="fence_n02_pdu1_on" type="fence_apc_snmp">
-        <instance_attributes id="fence_n02_pdu1_on-instance_attributes">
+The <span class="code">1</span> tells pacemaker that this is our highest priority fence method. We can see that this was set using pcs;
-          <nvpair id="fence_n02_pdu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
-          <nvpair id="fence_n02_pdu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-p01"/>
+<syntaxhighlight lang="bash">
-          <nvpair id="fence_n02_pdu1_on-instance_attributes-action" name="action" value="on"/>
+pcs stonith level
-          <nvpair id="fence_n02_pdu1_on-instance_attributes-port" name="port" value="2"/>
+</syntaxhighlight>
-        </instance_attributes>
+<syntaxhighlight lang="bash">
-        <operations>
+ Node: an-a04n01.alteeve.ca
-          <op id="fence_n02_pdu1_on-monitor-interval-60s" interval="60s" name="monitor"/>
+  Level 1 - fence_n01_ipmi
-        </operations>
+ Node: an-a04n02.alteeve.ca
-      </primitive>
+  Level 1 - fence_n02_ipmi
-      <primitive class="stonith" id="fence_n02_pdu2_on" type="fence_apc_snmp">
+</syntaxhighlight>
-        <instance_attributes id="fence_n02_pdu2_on-instance_attributes">
-          <nvpair id="fence_n02_pdu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-c03n02.alteeve.ca"/>
+Now we'll tell pacemaker to use the PDUs as the second fence method. Here we tie together the two <span class="code">off</span> calls and the two <span class="code">on</span> calls into a single method.
-          <nvpair id="fence_n02_pdu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-p02"/>
-          <nvpair id="fence_n02_pdu2_on-instance_attributes-action" name="action" value="on"/>
+<syntaxhighlight lang="bash">
-          <nvpair id="fence_n02_pdu2_on-instance_attributes-port" name="port" value="2"/>
+pcs stonith level add 2 an-a04n01.alteeve.ca fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on
-        </instance_attributes>
+pcs stonith level add 2 an-a04n02.alteeve.ca fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on
-        <operations>
-          <op id="fence_n02_pdu2_on-monitor-interval-60s" interval="60s" name="monitor"/>
-        </operations>
-      </primitive>
-    </resources>
-    <constraints/>
-    <fencing-topology>
-      <fencing-level devices="fence_n01_ipmi" id="fl-an-c03n01.alteeve.ca-1" index="1" target="an-c03n01.alteeve.ca"/>
-      <fencing-level devices="fence_n02_ipmi" id="fl-an-c03n02.alteeve.ca-1" index="1" target="an-c03n02.alteeve.ca"/>
-      <fencing-level devices="fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on" id="fl-an-c03n01.alteeve.ca-2" index="2" target="an-c03n01.alteeve.ca"/>
-      <fencing-level devices="fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on" id="fl-an-c03n02.alteeve.ca-2" index="2" target="an-c03n02.alteeve.ca"/>
-    </fencing-topology>
-  </configuration>
-</cib>
 </syntaxhighlight>
-== Fencing using fence_virsh ==
+Check again and we'll see that the new methods were added.
-{{note|1=To write this section, I used two virtual machines called <span class="code">pcmk1</span> and <span class="code">pcmk2</span>.}}
-If you are trying to learn fencing using KVM or Xen virtual machines, you can use the <span class="code">fence_virsh</span>. You can also use <span class="code">[[Fencing KVM Virtual Servers|fence_virtd]]</span>, which is actually recommended by many, but I have found it to be rather unreliable.
-To use <span class="code">fence_virsh</span>, first install it.
 <syntaxhighlight lang="bash">
-yum -y install fence-agents-virsh
+pcs stonith level
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+<syntaxhighlight lang="bash">
-<lots of yum output>
+ Node: an-a04n01.alteeve.ca
+  Level 1 - fence_n01_ipmi
+  Level 2 - fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on
+ Node: an-a04n02.alteeve.ca
+  Level 1 - fence_n02_ipmi
+  Level 2 - fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on
 </syntaxhighlight>
-Now test it from the command line. To do this, we need to know a few things;
+For those of us who are [[XML]] fans, this is what the [[cib]] looks like now:
-* The VM host is at IP <span class="code">192.168.122.1</span>
-* The username and password (<span class="code">-l</span> and <span class="code">-p</span> respectively) are the credentials used to log into VM host over [[SSH]].
-** If you don't want your password to be shown, create a little shell script that simply prints your password and then use <span class="code">-S /path/to/script</span> instead of <span class="code">-p "secret"</span>.
-* The name of the target VM, as shown by <span class="code">virsh list --all</span> on the host, is the node (<span class="code">-n</span>) value. For me, the nodes are called <span class="code">an-c03n01</span> and <span class="code">an-c03n02</span>.
-=== Create the Password Script ===
-In my case, the host is called '<span class="code">lemass</span>', so I want to create a password script called '<span class="code">/root/lemass.pw</span>'. The name of the script is entirely up to you.
-{|class="wikitable"
-!<span class="code">an-c03n01</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-vim /root/lemass.pw
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-echo "my secret password"
-</syntaxhighlight>
 <syntaxhighlight lang="bash">
-chmod 755 /root/lemass.pw
+cat /var/lib/pacemaker/cib/cib.xml
-/root/lemass.pw
+</syntaxhighlight>
+<syntaxhighlight lang="xml">
+<cib epoch="18" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Thu Jul 18 13:15:53 2013" update-origin="an-a04n01.alteeve.ca" update-client="cibadmin" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="1">
+  <configuration>
+    <crm_config>
+      <cluster_property_set id="cib-bootstrap-options">
+        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.9-dde1c52"/>
+        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
+        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
+      </cluster_property_set>
+    </crm_config>
+    <nodes>
+      <node id="1" uname="an-a04n01.alteeve.ca"/>
+      <node id="2" uname="an-a04n02.alteeve.ca"/>
+    </nodes>
+    <resources>
+      <primitive class="stonith" id="fence_n01_ipmi" type="fence_ipmilan">
+        <instance_attributes id="fence_n01_ipmi-instance_attributes">
+          <nvpair id="fence_n01_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
+          <nvpair id="fence_n01_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-a04n01.ipmi"/>
+          <nvpair id="fence_n01_ipmi-instance_attributes-action" name="action" value="reboot"/>
+          <nvpair id="fence_n01_ipmi-instance_attributes-login" name="login" value="admin"/>
+          <nvpair id="fence_n01_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
+          <nvpair id="fence_n01_ipmi-instance_attributes-delay" name="delay" value="15"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n01_ipmi-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n02_ipmi" type="fence_ipmilan">
+        <instance_attributes id="fence_n02_ipmi-instance_attributes">
+          <nvpair id="fence_n02_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
+          <nvpair id="fence_n02_ipmi-instance_attributes-ipaddr" name="ipaddr" value="an-a04n02.ipmi"/>
+          <nvpair id="fence_n02_ipmi-instance_attributes-action" name="action" value="reboot"/>
+          <nvpair id="fence_n02_ipmi-instance_attributes-login" name="login" value="admin"/>
+          <nvpair id="fence_n02_ipmi-instance_attributes-passwd" name="passwd" value="secret"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n02_ipmi-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n01_pdu1_off" type="fence_apc_snmp">
+        <instance_attributes id="fence_n01_pdu1_off-instance_attributes">
+          <nvpair id="fence_n01_pdu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
+          <nvpair id="fence_n01_pdu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
+          <nvpair id="fence_n01_pdu1_off-instance_attributes-action" name="action" value="off"/>
+          <nvpair id="fence_n01_pdu1_off-instance_attributes-port" name="port" value="1"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n01_pdu1_off-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n01_pdu2_off" type="fence_apc_snmp">
+        <instance_attributes id="fence_n01_pdu2_off-instance_attributes">
+          <nvpair id="fence_n01_pdu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
+          <nvpair id="fence_n01_pdu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
+          <nvpair id="fence_n01_pdu2_off-instance_attributes-action" name="action" value="off"/>
+          <nvpair id="fence_n01_pdu2_off-instance_attributes-port" name="port" value="1"/>
+          <nvpair id="fence_n01_pdu2_off-instance_attributes-power_wait" name="power_wait" value="5"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n01_pdu2_off-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n01_pdu1_on" type="fence_apc_snmp">
+        <instance_attributes id="fence_n01_pdu1_on-instance_attributes">
+          <nvpair id="fence_n01_pdu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
+          <nvpair id="fence_n01_pdu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
+          <nvpair id="fence_n01_pdu1_on-instance_attributes-action" name="action" value="on"/>
+          <nvpair id="fence_n01_pdu1_on-instance_attributes-port" name="port" value="1"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n01_pdu1_on-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n01_pdu2_on" type="fence_apc_snmp">
+        <instance_attributes id="fence_n01_pdu2_on-instance_attributes">
+          <nvpair id="fence_n01_pdu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n01.alteeve.ca"/>
+          <nvpair id="fence_n01_pdu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
+          <nvpair id="fence_n01_pdu2_on-instance_attributes-action" name="action" value="on"/>
+          <nvpair id="fence_n01_pdu2_on-instance_attributes-port" name="port" value="1"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n01_pdu2_on-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n02_pdu1_off" type="fence_apc_snmp">
+        <instance_attributes id="fence_n02_pdu1_off-instance_attributes">
+          <nvpair id="fence_n02_pdu1_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
+          <nvpair id="fence_n02_pdu1_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
+          <nvpair id="fence_n02_pdu1_off-instance_attributes-action" name="action" value="off"/>
+          <nvpair id="fence_n02_pdu1_off-instance_attributes-port" name="port" value="2"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n02_pdu1_off-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n02_pdu2_off" type="fence_apc_snmp">
+        <instance_attributes id="fence_n02_pdu2_off-instance_attributes">
+          <nvpair id="fence_n02_pdu2_off-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
+          <nvpair id="fence_n02_pdu2_off-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
+          <nvpair id="fence_n02_pdu2_off-instance_attributes-action" name="action" value="off"/>
+          <nvpair id="fence_n02_pdu2_off-instance_attributes-port" name="port" value="2"/>
+          <nvpair id="fence_n02_pdu2_off-instance_attributes-power_wait" name="power_wait" value="5"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n02_pdu2_off-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n02_pdu1_on" type="fence_apc_snmp">
+        <instance_attributes id="fence_n02_pdu1_on-instance_attributes">
+          <nvpair id="fence_n02_pdu1_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
+          <nvpair id="fence_n02_pdu1_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu01"/>
+          <nvpair id="fence_n02_pdu1_on-instance_attributes-action" name="action" value="on"/>
+          <nvpair id="fence_n02_pdu1_on-instance_attributes-port" name="port" value="2"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n02_pdu1_on-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+      <primitive class="stonith" id="fence_n02_pdu2_on" type="fence_apc_snmp">
+        <instance_attributes id="fence_n02_pdu2_on-instance_attributes">
+          <nvpair id="fence_n02_pdu2_on-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="an-a04n02.alteeve.ca"/>
+          <nvpair id="fence_n02_pdu2_on-instance_attributes-ipaddr" name="ipaddr" value="an-pdu02"/>
+          <nvpair id="fence_n02_pdu2_on-instance_attributes-action" name="action" value="on"/>
+          <nvpair id="fence_n02_pdu2_on-instance_attributes-port" name="port" value="2"/>
+        </instance_attributes>
+        <operations>
+          <op id="fence_n02_pdu2_on-monitor-interval-60s" interval="60s" name="monitor"/>
+        </operations>
+      </primitive>
+    </resources>
+    <constraints/>
+    <fencing-topology>
+      <fencing-level devices="fence_n01_ipmi" id="fl-an-a04n01.alteeve.ca-1" index="1" target="an-a04n01.alteeve.ca"/>
+      <fencing-level devices="fence_n02_ipmi" id="fl-an-a04n02.alteeve.ca-1" index="1" target="an-a04n02.alteeve.ca"/>
+      <fencing-level devices="fence_n01_pdu1_off,fence_n01_pdu2_off,fence_n01_pdu1_on,fence_n01_pdu2_on" id="fl-an-a04n01.alteeve.ca-2" index="2" target="an-a04n01.alteeve.ca"/>
+      <fencing-level devices="fence_n02_pdu1_off,fence_n02_pdu2_off,fence_n02_pdu1_on,fence_n02_pdu2_on" id="fl-an-a04n02.alteeve.ca-2" index="2" target="an-a04n02.alteeve.ca"/>
+    </fencing-topology>
+  </configuration>
+</cib>
 </syntaxhighlight>
-<syntaxhighlight lang="text">
-my secret password
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-rsync -av /root/lemass.pw root@an-c03n02:/root/
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-sending incremental file list
-lemass.pw
-sent 102 bytes  received 31 bytes  266.00 bytes/sec
-total size is 25  speedup is 0.19
-</syntaxhighlight>
-|-
-!<span class="code">an-c03n02</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-/root/lemass.pw
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-my secret password
-</syntaxhighlight>
-|}
-Done.
-=== Test fence_virsh Status from the Command Line ===
-{|class="wikitable"
-!<span class="code">an-c03n01</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-fence_virsh -a 192.168.122.1 -l root -S /root/lemass.pw -n an-c03n02 -o status
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-Status: ON
-</syntaxhighlight>
-|-
-!<span class="code">an-c03n02</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-fence_virsh -a 192.168.122.1 -l root -S /root/lemass.pw -n an-c03n01 -o status
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-Status: ON
-</syntaxhighlight>
-|}
-Excellent! Now to configure it in pacemaker;
-{|class="wikitable"
-!<span class="code">an-c03n01</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs stonith create fence_n01_virsh fence_virsh pcmk_host_list="an-c03n01.alteeve.ca" ipaddr="192.168.122.1" action="reboot" login="root" passwd_script="/root/lemass.pw" port="an-c03n01" delay=15 op monitor interval=60s
-pcs stonith create fence_n02_virsh fence_virsh pcmk_host_list="an-c03n02.alteeve.ca" ipaddr="192.168.122.1" action="reboot" login="root" passwd_script="/root/lemass.pw" port="an-c03n02" op monitor interval=60s
-pcs cluster status
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-Cluster Status:
- Last updated: Sun Jan 26 15:45:31 2014
- Last change: Sun Jan 26 15:06:14 2014 via crmd on an-c03n01.alteeve.ca
- Stack: corosync
- Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
- Version: 1.1.10-19.el7-368c726
-Nodes configured
-Resources configured
-PCSD Status:
-an-c03n01.alteeve.ca:
-  an-c03n01.alteeve.ca: Online
-an-c03n02.alteeve.ca:
-  an-c03n02.alteeve.ca: Online
-</syntaxhighlight>
-|}
-=== Test Fencing ===
-ToDo: Kill each node with <span class="code">echo c > /proc/sysrq-trigger</span> and make sure the other node fences it.
-= Shared Storage =
-== DRBD ==
-We will use DRBD 8.4.
-=== Install DRBD 8.4.4 from AN! ===
-{{warning|1=this doesn't work.}}
-ToDo: Make a proper repo
-{|class="wikitable"
-!<span class="code">an-c03n01</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-rpm -Uvh https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-8.4.4-4.el7.x86_64.rpm \
-         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm \
-         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm \
-         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-udev-8.4.4-4.el7.x86_64.rpm \
-         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-utils-8.4.4-4.el7.x86_64.rpm \
-         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm \
-         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-xen-8.4.4-4.el7.x86_64.rpm
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-</syntaxhighlight>
-|-
-!<span class="code">an-c03n02</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-</syntaxhighlight>
-|}
+== Fencing using fence_virsh ==
-=== Install DRBD 8.4.4 From Source ===
+{{note|1=To write this section, I used two virtual machines called <span class="code">pcmk1</span> and <span class="code">pcmk2</span>.}}
-At this time, no EPEL repo exists for RHEL7, and the Fedora RPMs don't work, so we will install DRBD 8.4.4 from source.
+If you are trying to learn fencing using KVM or Xen virtual machines, you can use the <span class="code">fence_virsh</span>. You can also use <span class="code">[[Fencing KVM Virtual Servers|fence_virtd]]</span>, which is actually recommended by many, but I have found it to be rather unreliable.
-Install dependencies:
+To use <span class="code">fence_virsh</span>, first install it.
 <syntaxhighlight lang="bash">
-yum -y install gcc flex rpm-build wget kernel-devel
+yum -y install fence-agents-virsh
-wget -c http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
+</syntaxhighlight>
-tar -xvzf drbd-8.4.4.tar.gz
+<syntaxhighlight lang="text">
-cd drbd-8.4.4
+<lots of yum output>
-./configure \
-  --prefix=/usr \
-  --localstatedir=/var \
-  --sysconfdir=/etc \
-  --with-km \
-  --with-udev \
-  --with-pacemaker \
-  --with-bashcompletion \
-  --with-utils \
-  --without-xen \
-  --without-rgmanager \
-  --without-heartbeat
-make
-make install
 </syntaxhighlight>
-Don't let DRBD start on boot (pacemaker will handle it for us).
+Now test it from the command line. To do this, we need to know a few things;
+* The VM host is at IP <span class="code">192.168.122.1</span>
+* The username and password (<span class="code">-l</span> and <span class="code">-p</span> respectively) are the credentials used to log into VM host over [[SSH]].
+** If you don't want your password to be shown, create a little shell script that simply prints your password and then use <span class="code">-S /path/to/script</span> instead of <span class="code">-p "secret"</span>.
+* The name of the target VM, as shown by <span class="code">virsh list --all</span> on the host, is the node (<span class="code">-n</span>) value. For me, the nodes are called <span class="code">an-a04n01</span> and <span class="code">an-a04n02</span>.
+=== Create the Password Script ===
+In my case, the host is called '<span class="code">lemass</span>', so I want to create a password script called '<span class="code">/root/lemass.pw</span>'. The name of the script is entirely up to you.
-<syntaxhighlight lang="bash">
+{|class="wikitable"
-systemctl disable drbd.service
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+vim /root/lemass.pw
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-drbd.service is not a native service, redirecting to /sbin/chkconfig.
+echo "my secret password"
-Executing /sbin/chkconfig drbd off
 </syntaxhighlight>
-Done.
-=== Optional; Make RPMs ===
-{{warning|1=I've not been able to get the RPMs genreated here to install yet. I'd recommend skipping this, unless you want to help sort out the problems. :) }}
-After <span class="code">./configure</span> above, you can make RPMs instead of installing directly.
-Dependencies:
 <syntaxhighlight lang="bash">
-yum install rpmdevtools redhat-rpm-config kernel-devel
+chmod 755 /root/lemass.pw
+/root/lemass.pw
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-<install text>
+my secret password
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+rsync -av /root/lemass.pw root@an-a04n02:/root/
 </syntaxhighlight>
+<syntaxhighlight lang="bash">
+sending incremental file list
+lemass.pw
-Setup RPM dev tree:
+sent 102 bytes  received 31 bytes  266.00 bytes/sec
+total size is 25  speedup is 0.19
-<syntaxhighlight lang="bash">
+</syntaxhighlight>
-cd ~
+|-
-rpmdev-setuptree
+!<span class="code">an-a04n02</span>
-ls -lah ~/rpmbuild/
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-wget -c http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
+/root/lemass.pw
-tar -xvzf drbd-8.4.4.tar.gz
-cd drbd-8.4.4
-./configure \
-  --prefix=/usr \
-  --localstatedir=/var \
-  --sysconfdir=/etc \
-  --with-km \
-  --with-udev \
-  --with-pacemaker \
-  --with-bashcompletion \
-  --with-utils \
-  --without-xen \
-  --without-heartbeat
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-total 4.0K
+my secret password
-drwxr-xr-x. 7 root root   67 Dec 23 20:06 .
-dr-xr-x---. 6 root root 4.0K Dec 23 20:06 ..
-drwxr-xr-x. 2 root root    6 Dec 23 20:06 BUILD
-drwxr-xr-x. 2 root root    6 Dec 23 20:06 RPMS
-drwxr-xr-x. 2 root root    6 Dec 23 20:06 SOURCES
-drwxr-xr-x. 2 root root    6 Dec 23 20:06 SPECS
-drwxr-xr-x. 2 root root    6 Dec 23 20:06 SRPMS
 </syntaxhighlight>
+|}
-Userland tools:
+Done.
-<syntaxhighlight lang="bash">
+=== Test fence_virsh Status from the Command Line ===
-make rpm
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+fence_virsh -a 192.168.122.1 -l root -S /root/lemass.pw -n an-a04n02 -o status
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-checking for presence of 8\.4\.4 in various changelog files
+Status: ON
-<snip>
-+ exit 0
-You have now:
-/root/rpmbuild/RPMS/x86_64/drbd-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.4-4.el7.x86_64.rpm
 </syntaxhighlight>
+|-
-Kernel module:
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-<syntaxhighlight lang="bash">
+fence_virsh -a 192.168.122.1 -l root -S /root/lemass.pw -n an-a04n01 -o status
-make kmp-rpm
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-checking for presence of 8\.4\.4 in various changelog files
+Status: ON
-<snip>
+</syntaxhighlight>
-+ exit 0
+|}
-You have now:
-/root/rpmbuild/RPMS/x86_64/drbd-8.4.4-4.el7.x86_64.rpm
+Excellent! Now to configure it in pacemaker;
-/root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.4-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.4-4.el7.x86_64.rpm
+{|class="wikitable"
-/root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.4-4.el7.x86_64.rpm
+!<span class="code">an-a04n01</span>
-/root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-/root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm
+pcs stonith create fence_n01_virsh fence_virsh pcmk_host_list="an-a04n01.alteeve.ca" ipaddr="192.168.122.1" action="reboot" login="root" passwd_script="/root/lemass.pw" port="an-a04n01" delay=15 op monitor interval=60s
-/root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm
+pcs stonith create fence_n02_virsh fence_virsh pcmk_host_list="an-a04n02.alteeve.ca" ipaddr="192.168.122.1" action="reboot" login="root" passwd_script="/root/lemass.pw" port="an-a04n02" op monitor interval=60s
-/root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.4-4.el7.x86_64.rpm
+pcs cluster status
-/root/rpmbuild/RPMS/x86_64/kmod-drbd-8.4.4_3.10.0_54.0.1-4.el7.x86_64.rpm
-/root/rpmbuild/RPMS/x86_64/drbd-kernel-debuginfo-8.4.4-4.el7.x86_64.rpm
 </syntaxhighlight>
+<syntaxhighlight lang="text">
+Cluster Status:
+ Last updated: Sun Jan 26 15:45:31 2014
+ Last change: Sun Jan 26 15:06:14 2014 via crmd on an-a04n01.alteeve.ca
+ Stack: corosync
+ Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
+ Version: 1.1.10-19.el7-368c726
+Nodes configured
+Resources configured
-=== Configure DRBD ===
+PCSD Status:
+an-a04n01.alteeve.ca:
+  an-a04n01.alteeve.ca: Online
+an-a04n02.alteeve.ca:
+  an-a04n02.alteeve.ca: Online
+</syntaxhighlight>
+|}
-Configure <span class="code">global-common.conf</span>;
+=== Test Fencing ===
-<syntaxhighlight lang="bash">
+ToDo: Kill each node with <span class="code">echo c > /proc/sysrq-trigger</span> and make sure the other node fences it.
-vim /etc/drbd.d/global_common.conf
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-# These are options to set for the DRBD daemon sets the default values for
-# resources.
-global {
-	# This tells DRBD that you allow it to report this installation to
-	# LINBIT for statistical purposes. If you have privacy concerns, set
-	# this to 'no'. The default is 'ask' which will prompt you each time
-	# DRBD is updated. Set to 'yes' to allow it without being prompted.
-	usage-count no;
-	# minor-count dialog-refresh disable-ip-verification
+= Shared Storage =
-}
-common {
+== DRBD ==
-	handlers {
-		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
-		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
-		local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
-		# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
-		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
-		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
-		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
-		# Hook into Pacemaker's fencing.
-		fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
-	}
-	startup {
+We will use DRBD 8.4.
-		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
-	}
-	options {
+=== Install DRBD 8.4.4 from AN! ===
-		# cpu-mask on-no-data-accessible
-	}
-	disk {
+{{warning|1=this doesn't work.}}
-		# size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
-		# disk-drain md-flushes resync-rate resync-after al-extents
-                # c-plan-ahead c-delay-target c-fill-target c-max-rate
-                # c-min-rate disk-timeout
-                fencing resource-and-stonith;
-	}
-	net {
+ToDo: Make a proper repo
-		# protocol timeout max-epoch-size max-buffers unplug-watermark
-		# connect-int ping-int sndbuf-size rcvbuf-size ko-count
-		# allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
-		# after-sb-1pri after-sb-2pri always-asbp rr-conflict
-		# ping-timeout data-integrity-alg tcp-cork on-congestion
-		# congestion-fill congestion-extents csums-alg verify-alg
-		# use-rle
-		# Protocol "C" tells DRBD not to tell the operating system that
+{|class="wikitable"
-		# the write is complete until the data has reach persistent
+!<span class="code">an-a04n01</span>
-		# storage on both nodes. This is the slowest option, but it is
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-		# also the only one that guarantees consistency between the
+rpm -Uvh https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-8.4.4-4.el7.x86_64.rpm \
-		# nodes. It is also required for dual-primary, which we will
+         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm \
-		# be using.
+         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm \
-		protocol C;
+         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-udev-8.4.4-4.el7.x86_64.rpm \
+         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-utils-8.4.4-4.el7.x86_64.rpm \
+         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm \
+         https://alteeve.ca/files/AN-Cluster_Tutorial_3/drbd84/drbd-xen-8.4.4-4.el7.x86_64.rpm
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+</syntaxhighlight>
+|}
+=== Install DRBD 8.4.4 From Source ===
-		# Tell DRBD to allow dual-primary. This is needed to enable
+At this time, no EPEL repo exists for RHEL7, and the Fedora RPMs don't work, so we will install DRBD 8.4.4 from source.
-		# live-migration of our servers.
-		allow-two-primaries yes;
-		# This tells DRBD what to do in the case of a split-brain when
+Install dependencies:
-		# neither node was primary, when one node was primary and when
-		# both nodes are primary. In our case, we'll be running
+<syntaxhighlight lang="bash">
-		# dual-primary, so we can not safely recover automatically. The
+yum -y install gcc flex rpm-build wget kernel-devel
-		# only safe option is for the nodes to disconnect from one
+wget -c http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
-		# another and let a human decide which node to invalidate. Of
+tar -xvzf drbd-8.4.4.tar.gz
-		after-sb-0pri discard-zero-changes;
+cd drbd-8.4.4
-		after-sb-1pri discard-secondary;
+./configure \
-		after-sb-2pri disconnect;
+  --prefix=/usr \
-	}
+  --localstatedir=/var \
-}
+  --sysconfdir=/etc \
+  --with-km \
+  --with-udev \
+  --with-pacemaker \
+  --with-bashcompletion \
+  --with-utils \
+  --without-xen \
+  --without-rgmanager \
+  --without-heartbeat
+make
+make install
 </syntaxhighlight>
-And now configure the first resource;
+Don't let DRBD start on boot (pacemaker will handle it for us).
 <syntaxhighlight lang="bash">
-vim /etc/drbd.d/r0.res
+systemctl disable drbd.service
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+<syntaxhighlight lang="text">
-# This is the first DRBD resource. If will store the shared file systems and
+drbd.service is not a native service, redirecting to /sbin/chkconfig.
-# the servers designed to run on node 01.
+Executing /sbin/chkconfig drbd off
-resource r0 {
+</syntaxhighlight>
-	# These options here are common to both nodes. If for some reason you
-	# need to set unique values per node, you can move these to the
+Done.
-	# 'on <name> { ... }' section.
+=== Optional; Make RPMs ===
-	# This sets the device name of this DRBD resouce.
-	device /dev/drbd0;
-	# This tells DRBD what the backing device is for this resource.
+{{warning|1=I've not been able to get the RPMs genreated here to install yet. I'd recommend skipping this, unless you want to help sort out the problems. :) }}
-	disk /dev/sda5;
-	# This controls the location of the metadata. When "internal" is used,
+After <span class="code">./configure</span> above, you can make RPMs instead of installing directly.
-	# as we use here, a little space at the end of the backing devices is
-	# set aside (roughly 32 MB per 1 TB of raw storage). External metadata
-	# can be used to put the metadata on another partition when converting
-	# existing file systems to be DRBD backed, when there is no extra space
-	# available for the metadata.
-	meta-disk internal;
-	# NOTE: this is not required or even recommended with pacemaker. remove
+Dependencies:
-	# 	this options as soon as pacemaker is setup.
-	startup {
+<syntaxhighlight lang="bash">
-		# This tells DRBD to promote both nodes to 'primary' when this
+yum install rpmdevtools redhat-rpm-config kernel-devel
-		# resource starts. However, we will let pacemaker control this
+</syntaxhighlight>
-		# so we comment it out, which tells DRBD to leave both nodes
+<syntaxhighlight lang="text">
-		# as secondary when drbd starts.
+<install text>
-		#become-primary-on both;
+</syntaxhighlight>
-	}
-	# NOTE: Later, make it an option in the dashboard to trigger a manual
+Setup RPM dev tree:
-	# 	verify and/or schedule periodic automatic runs
-	net {
-		# TODO: Test performance differences between sha1 and md5
-		# This tells DRBD how to do a block-by-block verification of
-		# the data stored on the backing devices. Any verification
-		# failures will result in the effected block being marked
-		# out-of-sync.
-		verify-alg md5;
-		# TODO: Test the performance hit of this being enabled.
+<syntaxhighlight lang="bash">
-		# This tells DRBD to generate a checksum for each transmitted
+cd ~
-		# packet. If the data received data doesn't generate the same
+rpmdev-setuptree
-		# sum, a retransmit request is generated. This protects against
+ls -lah ~/rpmbuild/
-		# otherwise-undetected errors in transmission, like
+wget -c http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz
-		# bit-flipping. See:
+tar -xvzf drbd-8.4.4.tar.gz
-		# http://www.drbd.org/users-guide/s-integrity-check.html
+cd drbd-8.4.4
-		data-integrity-alg md5;
+./configure \
-	}
+  --prefix=/usr \
+  --localstatedir=/var \
+  --sysconfdir=/etc \
+  --with-km \
+  --with-udev \
+  --with-pacemaker \
+  --with-bashcompletion \
+  --with-utils \
+  --without-xen \
+  --without-heartbeat
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+total 4.0K
+drwxr-xr-x. 7 root root   67 Dec 23 20:06 .
+dr-xr-x---. 6 root root 4.0K Dec 23 20:06 ..
+drwxr-xr-x. 2 root root    6 Dec 23 20:06 BUILD
+drwxr-xr-x. 2 root root    6 Dec 23 20:06 RPMS
+drwxr-xr-x. 2 root root    6 Dec 23 20:06 SOURCES
+drwxr-xr-x. 2 root root    6 Dec 23 20:06 SPECS
+drwxr-xr-x. 2 root root    6 Dec 23 20:06 SRPMS
+</syntaxhighlight>
-	# WARNING: Confirm that these are safe when the controller's BBU is
+Userland tools:
-	#          depleted/failed and the controller enters write-through
-	#          mode.
-	disk {
-		# TODO: Test the real-world performance differences gained with
-		#       these options.
-		# This tells DRBD not to bypass the write-back caching on the
-		# RAID controller. Normally, DRBD forces the data to be flushed
-		# to disk, rather than allowing the write-back cachine to
-		# handle it. Normally this is dangerous, but with BBU-backed
-		# caching, it is safe. The first option disables disk flushing
-		# and the second disabled metadata flushes.
-		disk-flushes no;
-		md-flushes no;
-	}
-	# This sets up the resource on node 01. The name used below must be the
+<syntaxhighlight lang="bash">
-	# named returned by "uname -n".
+make rpm
-	on an-c03n01.alteeve.ca {
-		# This is the address and port to use for DRBD traffic on this
-		# node. Multiple resources can use the same IP but the ports
-		# must differ. By convention, the first resource uses 7788, the
-		# second uses 7789 and so on, incrementing by one for each
-		# additional resource.
-		address 10.10.30.1:7788;
-	}
-	on an-c03n02.alteeve.ca {
-		address 10.10.30.2:7788;
-	}
-}
 </syntaxhighlight>
+<syntaxhighlight lang="text">
-Disable <span class="code">drbd</span> from starting on boot.
+checking for presence of 8\.4\.4 in various changelog files
+<snip>
-<syntaxhighlight lang="bash">
++ exit 0
-systemctl disable drbd.service
+You have now:
-</syntaxhighlight>
+/root/rpmbuild/RPMS/x86_64/drbd-8.4.4-4.el7.x86_64.rpm
-<syntaxhighlight lang="text">
+/root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.4-4.el7.x86_64.rpm
-drbd.service is not a native service, redirecting to /sbin/chkconfig.
+/root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.4-4.el7.x86_64.rpm
-Executing /sbin/chkconfig drbd off
+/root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.4-4.el7.x86_64.rpm
 </syntaxhighlight>
-Load the config;
+Kernel module:
 <syntaxhighlight lang="bash">
-modprobe drbd
+make kmp-rpm
-</syntaxhighlight>
-Now check the config;
-<syntaxhighlight lang="bash">
-drbdadm dump
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-  --==  Thank you for participating in the global usage survey  ==--
+checking for presence of 8\.4\.4 in various changelog files
-The server's response is:
+<snip>
++ exit 0
-you are the 69th user to install this version
+You have now:
-/etc/drbd.d/r0.res:3: in resource r0:
+/root/rpmbuild/RPMS/x86_64/drbd-8.4.4-4.el7.x86_64.rpm
-become-primary-on is set to both, but allow-two-primaries is not set.
+/root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.4-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/kmod-drbd-8.4.4_3.10.0_54.0.1-4.el7.x86_64.rpm
+/root/rpmbuild/RPMS/x86_64/drbd-kernel-debuginfo-8.4.4-4.el7.x86_64.rpm
 </syntaxhighlight>
-Ignore that error. It has been reported and does not effect operation.
+=== Configure DRBD ===
-Create the metadisk;
+Configure <span class="code">global-common.conf</span>;
 <syntaxhighlight lang="bash">
-drbdadm create-md r0
+vim /etc/drbd.d/global_common.conf
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-Writing meta data...
-initializing activity log
-NOT initializing bitmap
-New drbd meta data block successfully created.
-success
 </syntaxhighlight>
+<syntaxhighlight lang="bash">
+# These are options to set for the DRBD daemon sets the default values for
+# resources.
+global {
+	# This tells DRBD that you allow it to report this installation to
+	# LINBIT for statistical purposes. If you have privacy concerns, set
+	# this to 'no'. The default is 'ask' which will prompt you each time
+	# DRBD is updated. Set to 'yes' to allow it without being prompted.
+	usage-count no;
-Start the DRBD resource on both nodes;
+	# minor-count dialog-refresh disable-ip-verification
+}
-<syntaxhighlight lang="bash">
+common {
-drbdadm up r0
+	handlers {
-</syntaxhighlight>
+		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
+		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
+		local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
+		# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
+		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
+		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
+		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
+		# Hook into Pacemaker's fencing.
+		fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
+	}
-Once <span class="code">/proc/drbd</span> shows both nodes connected, force one to primary and it will sync over the second.
+	startup {
+		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
+	}
-<syntaxhighlight lang="bash">
+	options {
-drbdadm primary --force r0
+		# cpu-mask on-no-data-accessible
-</syntaxhighlight>
+	}
-You should see the resource syncing now. Push both nodes to primary;
+	disk {
+		# size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
+		# disk-drain md-flushes resync-rate resync-after al-extents
+                # c-plan-ahead c-delay-target c-fill-target c-max-rate
+                # c-min-rate disk-timeout
+                fencing resource-and-stonith;
+	}
-<syntaxhighlight lang="bash">
+	net {
-drbdadm primary r0
+		# protocol timeout max-epoch-size max-buffers unplug-watermark
-</syntaxhighlight>
+		# connect-int ping-int sndbuf-size rcvbuf-size ko-count
+		# allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
+		# after-sb-1pri after-sb-2pri always-asbp rr-conflict
+		# ping-timeout data-integrity-alg tcp-cork on-congestion
+		# congestion-fill congestion-extents csums-alg verify-alg
+		# use-rle
-== DLM, Clustered LVM and GFS2 ==
+		# Protocol "C" tells DRBD not to tell the operating system that
+		# the write is complete until the data has reach persistent
-{|class="wikitable"
+		# storage on both nodes. This is the slowest option, but it is
-!<span class="code">an-c03n01</span>
+		# also the only one that guarantees consistency between the
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+		# nodes. It is also required for dual-primary, which we will
-sed -i.anvil 's^filter = \[ "a/\.\*/" \]^filter = \[ "a|/dev/drbd*|", "r/.*/" \]^' /etc/lvm/lvm.conf
+		# be using.
-sed -i 's/locking_type = 1$/locking_type = 3/' /etc/lvm/lvm.conf
+		protocol C;
-sed -i 's/fallback_to_local_locking = 1$/fallback_to_local_locking = 0/' /etc/lvm/lvm.conf
-sed -i 's/use_lvmetad = 1$/use_lvmetad = 0/' /etc/lvm/lvm.conf
+		# Tell DRBD to allow dual-primary. This is needed to enable
+		# live-migration of our servers.
+		allow-two-primaries yes;
+		# This tells DRBD what to do in the case of a split-brain when
+		# neither node was primary, when one node was primary and when
+		# both nodes are primary. In our case, we'll be running
+		# dual-primary, so we can not safely recover automatically. The
+		# only safe option is for the nodes to disconnect from one
+		# another and let a human decide which node to invalidate. Of
+		after-sb-0pri discard-zero-changes;
+		after-sb-1pri discard-secondary;
+		after-sb-2pri disconnect;
+	}
+}
 </syntaxhighlight>
-<syntaxhighlight lang="diff">
---- /etc/lvm/lvm.conf.anvil	2013-11-27 03:28:08.000000000 -0500
-+++ /etc/lvm/lvm.conf	2014-01-26 18:57:41.026928464 -0500
-@@ -84,7 +84,7 @@
-     # lvmetad is used" comment that is attached to global/use_lvmetad setting.
-     # By default we accept every block device:
--    filter = [ "a/.*/" ]
-+    filter = [ "a|/dev/drbd*|", "r/.*/" ]
-     # Exclude the cdrom drive
-     # filter = [ "r|/dev/cdrom|" ]
-@@ -451,7 +451,7 @@
-     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
-     # is set at the same time, LVM always issues a warning message about this
-     # and then it automatically disables lvmetad use.
--    locking_type = 1
-+    locking_type = 3
-     # Set to 0 to fail when a lock request cannot be satisfied immediately.
-     wait_for_locks = 1
-@@ -467,7 +467,7 @@
-     # to 1 an attempt will be made to use local file-based locking (type 1).
-     # If this succeeds, only commands against local volume groups will proceed.
-     # Volume Groups marked as clustered will be ignored.
--    fallback_to_local_locking = 1
-+    fallback_to_local_locking = 0
-     # Local non-LV directory that holds file-based locks while commands are
-     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
-@@ -594,7 +594,7 @@
-     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
-     # is set at the same time, LVM always issues a warning message about this
-     # and then it automatically disables lvmetad use.
--    use_lvmetad = 1
-+    use_lvmetad = 0
-     # Full path of the utility called to check that a thin metadata device
-     # is in a state that allows it to be used.
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-rsync -av /etc/lvm/lvm.conf* root@an-c03n02:/etc/lvm/
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-sending incremental file list
-lvm.conf
-lvm.conf.anvil
-sent 48536 bytes  received 440 bytes  97952.00 bytes/sec
+And now configure the first resource;
-total size is 90673  speedup is 1.85
-</syntaxhighlight>
+<syntaxhighlight lang="bash">
-|-
+vim /etc/drbd.d/r0.res
-!<span class="code">an-c03n02</span>
+</syntaxhighlight>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+<syntaxhighlight lang="bash">
-diff -u /etc/lvm/lvm.conf.anvil /etc/lvm/lvm.conf
+# This is the first DRBD resource. If will store the shared file systems and
-</syntaxhighlight>
+# the servers designed to run on node 01.
-<syntaxhighlight lang="diff">
+resource r0 {
---- /etc/lvm/lvm.conf.anvil	2013-11-27 03:28:08.000000000 -0500
+	# These options here are common to both nodes. If for some reason you
-+++ /etc/lvm/lvm.conf	2014-01-26 18:57:41.000000000 -0500
+	# need to set unique values per node, you can move these to the
-@@ -84,7 +84,7 @@
+	# 'on <name> { ... }' section.
-     # lvmetad is used" comment that is attached to global/use_lvmetad setting.
+	# This sets the device name of this DRBD resouce.
-     # By default we accept every block device:
+	device /dev/drbd0;
--    filter = [ "a/.*/" ]
-+    filter = [ "a|/dev/drbd*|", "r/.*/" ]
+	# This tells DRBD what the backing device is for this resource.
+	disk /dev/sda5;
-     # Exclude the cdrom drive
-     # filter = [ "r|/dev/cdrom|" ]
+	# This controls the location of the metadata. When "internal" is used,
-@@ -451,7 +451,7 @@
+	# as we use here, a little space at the end of the backing devices is
-     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
+	# set aside (roughly 32 MB per 1 TB of raw storage). External metadata
-     # is set at the same time, LVM always issues a warning message about this
+	# can be used to put the metadata on another partition when converting
-     # and then it automatically disables lvmetad use.
+	# existing file systems to be DRBD backed, when there is no extra space
--    locking_type = 1
+	# available for the metadata.
-+    locking_type = 3
+	meta-disk internal;
-     # Set to 0 to fail when a lock request cannot be satisfied immediately.
+	# NOTE: this is not required or even recommended with pacemaker. remove
-     wait_for_locks = 1
+	# 	this options as soon as pacemaker is setup.
-@@ -467,7 +467,7 @@
+	startup {
-     # to 1 an attempt will be made to use local file-based locking (type 1).
+		# This tells DRBD to promote both nodes to 'primary' when this
-     # If this succeeds, only commands against local volume groups will proceed.
+		# resource starts. However, we will let pacemaker control this
-     # Volume Groups marked as clustered will be ignored.
+		# so we comment it out, which tells DRBD to leave both nodes
--    fallback_to_local_locking = 1
+		# as secondary when drbd starts.
-+    fallback_to_local_locking = 0
+		#become-primary-on both;
+	}
-     # Local non-LV directory that holds file-based locks while commands are
-     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
+	# NOTE: Later, make it an option in the dashboard to trigger a manual
-@@ -594,7 +594,7 @@
+	# 	verify and/or schedule periodic automatic runs
-     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
+	net {
-     # is set at the same time, LVM always issues a warning message about this
+		# TODO: Test performance differences between sha1 and md5
-     # and then it automatically disables lvmetad use.
+		# This tells DRBD how to do a block-by-block verification of
--    use_lvmetad = 1
+		# the data stored on the backing devices. Any verification
-+    use_lvmetad = 0
+		# failures will result in the effected block being marked
+		# out-of-sync.
-     # Full path of the utility called to check that a thin metadata device
+		verify-alg md5;
-     # is in a state that allows it to be used.
-</syntaxhighlight>
+		# TODO: Test the performance hit of this being enabled.
-|}
+		# This tells DRBD to generate a checksum for each transmitted
+		# packet. If the data received data doesn't generate the same
+		# sum, a retransmit request is generated. This protects against
+		# otherwise-undetected errors in transmission, like
+		# bit-flipping. See:
+		# http://www.drbd.org/users-guide/s-integrity-check.html
+		data-integrity-alg md5;
+	}
-Disable <span class="code">lvmetad</span> as it's not cluster-aware.
+	# WARNING: Confirm that these are safe when the controller's BBU is
+	#          depleted/failed and the controller enters write-through
-{|class="wikitable"
+	#          mode.
-!<span class="code">an-c03n01</span>
+	disk {
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+		# TODO: Test the real-world performance differences gained with
-systemctl disable lvm2-lvmetad.service
+		#       these options.
-systemctl disable lvm2-lvmetad.socket
+		# This tells DRBD not to bypass the write-back caching on the
-systemctl stop lvm2-lvmetad.service
+		# RAID controller. Normally, DRBD forces the data to be flushed
+		# to disk, rather than allowing the write-back cachine to
+		# handle it. Normally this is dangerous, but with BBU-backed
+		# caching, it is safe. The first option disables disk flushing
+		# and the second disabled metadata flushes.
+		disk-flushes no;
+		md-flushes no;
+	}
+	# This sets up the resource on node 01. The name used below must be the
+	# named returned by "uname -n".
+	on an-a04n01.alteeve.ca {
+		# This is the address and port to use for DRBD traffic on this
+		# node. Multiple resources can use the same IP but the ports
+		# must differ. By convention, the first resource uses 7788, the
+		# second uses 7789 and so on, incrementing by one for each
+		# additional resource.
+		address 10.10.40.1:7788;
+	}
+	on an-a04n02.alteeve.ca {
+		address 10.10.40.2:7788;
+	}
+}
+</syntaxhighlight>
+Disable <span class="code">drbd</span> from starting on boot.
+<syntaxhighlight lang="bash">
+systemctl disable drbd.service
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'
+drbd.service is not a native service, redirecting to /sbin/chkconfig.
+Executing /sbin/chkconfig drbd off
+</syntaxhighlight>
+Load the config;
+<syntaxhighlight lang="bash">
+modprobe drbd
 </syntaxhighlight>
-|-
-!<span class="code">an-c03n02</span>
+Now check the config;
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-systemctl disable lvm2-lvmetad.service
+<syntaxhighlight lang="bash">
-systemctl disable lvm2-lvmetad.socket
+drbdadm dump
-systemctl stop lvm2-lvmetad.service
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'
+  --==  Thank you for participating in the global usage survey  ==--
+The server's response is:
+you are the 69th user to install this version
+/etc/drbd.d/r0.res:3: in resource r0:
+become-primary-on is set to both, but allow-two-primaries is not set.
 </syntaxhighlight>
-|}
-{{note|1=This will be moved to pacemaker shortly. We're enabling it here just long enough to configure pacemaker.}}
+Ignore that error. It has been reported and does not effect operation.
-Start DLM and clvmd;
+Create the metadisk;
-{|class="wikitable"
+<syntaxhighlight lang="bash">
-!<span class="code">an-c03n01</span>
+drbdadm create-md r0
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-systemctl start dlm.service
-systemctl start clvmd.service
 </syntaxhighlight>
-|-
+<syntaxhighlight lang="text">
-!<span class="code">an-c03n02</span>
+Writing meta data...
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+initializing activity log
-systemctl start dlm.service
+NOT initializing bitmap
-systemctl start clvmd.service
+New drbd meta data block successfully created.
+success
 </syntaxhighlight>
-|}
-Create the [[PV]], [[VG]] and the <span class="code">/shared</span> [[LV]];
+Start the DRBD resource on both nodes;
-{|class="wikitable"
+<syntaxhighlight lang="bash">
-!<span class="code">an-c03n01</span>
+drbdadm up r0
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pvcreate /dev/drbd0
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-  Physical volume "/dev/drbd0" successfully created
 </syntaxhighlight>
+Once <span class="code">/proc/drbd</span> shows both nodes connected, force one to primary and it will sync over the second.
 <syntaxhighlight lang="bash">
-vgcreate an-c03n01_vg0 /dev/drbd0
+drbdadm primary --force r0
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-  /proc/devices: No entry for device-mapper found
-  Clustered volume group "an-c03n01_vg0" successfully created
 </syntaxhighlight>
+You should see the resource syncing now. Push both nodes to primary;
 <syntaxhighlight lang="bash">
-lvcreate -L 10G -n shared an-c03n01_vg0
+drbdadm primary r0
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-  Logical volume "shared" created
-</syntaxhighlight>
-|-
-!<span class="code">an-c03n02</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pvscan
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-  PV /dev/drbd0   VG an-c03n01_vg0   lvm2 [20.00 GiB / 20.00 GiB free]
-  Total: 1 [20.00 GiB] / in use: 1 [20.00 GiB] / in no VG: 0 [0   ]
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
-vgscan
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-  Reading all physical volumes.  This may take a while...
-  Found volume group "an-c03n01_vg0" using metadata type lvm2
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-lvscan
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-  ACTIVE            '/dev/an-c03n01_vg0/shared' [10.00 GiB] inherit
-</syntaxhighlight>
-|}
-Format the <span class="code">/dev/an-c03n01_vg0/shared</span>;
+== DLM, Clustered LVM and GFS2 ==
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-mkfs.gfs2 -j 2 -p lock_dlm -t an-cluster-03:shared /dev/an-c03n01_vg0/shared
+sed -i.anvil 's^filter = \[ "a/\.\*/" \]^filter = \[ "a|/dev/drbd*|", "r/.*/" \]^' /etc/lvm/lvm.conf
-</syntaxhighlight>
+sed -i 's/locking_type = 1$/locking_type = 3/' /etc/lvm/lvm.conf
-<syntaxhighlight lang="text">
+sed -i 's/fallback_to_local_locking = 1$/fallback_to_local_locking = 0/' /etc/lvm/lvm.conf
-/dev/an-c03n01_vg0/shared is a symbolic link to /dev/dm-0
+sed -i 's/use_lvmetad = 1$/use_lvmetad = 0/' /etc/lvm/lvm.conf
-This will destroy any data on /dev/dm-0
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+<syntaxhighlight lang="diff">
-Are you sure you want to proceed? [y/n]y
+--- /etc/lvm/lvm.conf.anvil	2013-11-27 03:28:08.000000000 -0500
-</syntaxhighlight>
++++ /etc/lvm/lvm.conf	2014-01-26 18:57:41.026928464 -0500
-<syntaxhighlight lang="text">
+@@ -84,7 +84,7 @@
-Device:                    /dev/an-c03n01_vg0/shared
+     # lvmetad is used" comment that is attached to global/use_lvmetad setting.
-Block size:                4096
-Device size:               10.00 GB (2621440 blocks)
+     # By default we accept every block device:
-Filesystem size:           10.00 GB (2621438 blocks)
+-    filter = [ "a/.*/" ]
-Journals:                  2
++    filter = [ "a|/dev/drbd*|", "r/.*/" ]
-Resource groups:           40
-Locking protocol:          "lock_dlm"
+     # Exclude the cdrom drive
-Lock table:                "an-cluster-03:shared"
+     # filter = [ "r|/dev/cdrom|" ]
-UUID:                      20bafdb0-1f86-f424-405b-9bf608c0c486
+@@ -451,7 +451,7 @@
-</syntaxhighlight>
+     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
-<syntaxhighlight lang="bash">
+     # is set at the same time, LVM always issues a warning message about this
-mkdir /shared
+     # and then it automatically disables lvmetad use.
-mount /dev/an-c03n01_vg0/shared /shared
+-    locking_type = 1
-df -h
++    locking_type = 3
+     # Set to 0 to fail when a lock request cannot be satisfied immediately.
+     wait_for_locks = 1
+@@ -467,7 +467,7 @@
+     # to 1 an attempt will be made to use local file-based locking (type 1).
+     # If this succeeds, only commands against local volume groups will proceed.
+     # Volume Groups marked as clustered will be ignored.
+-    fallback_to_local_locking = 1
++    fallback_to_local_locking = 0
+     # Local non-LV directory that holds file-based locks while commands are
+     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
+@@ -594,7 +594,7 @@
+     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
+     # is set at the same time, LVM always issues a warning message about this
+     # and then it automatically disables lvmetad use.
+-    use_lvmetad = 1
++    use_lvmetad = 0
+     # Full path of the utility called to check that a thin metadata device
+     # is in a state that allows it to be used.
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+rsync -av /etc/lvm/lvm.conf* root@an-a04n02:/etc/lvm/
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Filesystem                         Size  Used Avail Use% Mounted on
+sending incremental file list
-/dev/vda3                           18G  5.6G   12G  32% /
+lvm.conf
-devtmpfs                           932M     0  932M   0% /dev
+lvm.conf.anvil
-tmpfs                              937M   61M  877M   7% /dev/shm
-tmpfs                              937M  1.9M  935M   1% /run
+sent 48536 bytes  received 440 bytes  97952.00 bytes/sec
-tmpfs                              937M     0  937M   0% /sys/fs/cgroup
+total size is 90673  speedup is 1.85
-/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
-/dev/vda1                          484M   83M  401M  18% /boot
-/dev/mapper/an--c03n01_vg0-shared   10G  259M  9.8G   3% /shared
 </syntaxhighlight>
 |-
-!<span class="code">an-c03n02</span>
+!<span class="code">an-a04n02</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+diff -u /etc/lvm/lvm.conf.anvil /etc/lvm/lvm.conf
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+<syntaxhighlight lang="diff">
-Filesystem                         Size  Used Avail Use% Mounted on
+--- /etc/lvm/lvm.conf.anvil	2013-11-27 03:28:08.000000000 -0500
-/dev/vda3                           18G  5.6G   12G  32% /
++++ /etc/lvm/lvm.conf	2014-01-26 18:57:41.000000000 -0500
-devtmpfs                           932M     0  932M   0% /dev
+@@ -84,7 +84,7 @@
-tmpfs                              937M   76M  862M   9% /dev/shm
+     # lvmetad is used" comment that is attached to global/use_lvmetad setting.
-tmpfs                              937M  2.0M  935M   1% /run
-tmpfs                              937M     0  937M   0% /sys/fs/cgroup
+     # By default we accept every block device:
-/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
+-    filter = [ "a/.*/" ]
-/dev/vda1                          484M   83M  401M  18% /boot
++    filter = [ "a|/dev/drbd*|", "r/.*/" ]
-/dev/mapper/an--c03n01_vg0-shared   10G  259M  9.8G   3% /shared
+     # Exclude the cdrom drive
+     # filter = [ "r|/dev/cdrom|" ]
+@@ -451,7 +451,7 @@
+     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
+     # is set at the same time, LVM always issues a warning message about this
+     # and then it automatically disables lvmetad use.
+-    locking_type = 1
++    locking_type = 3
+     # Set to 0 to fail when a lock request cannot be satisfied immediately.
+     wait_for_locks = 1
+@@ -467,7 +467,7 @@
+     # to 1 an attempt will be made to use local file-based locking (type 1).
+     # If this succeeds, only commands against local volume groups will proceed.
+     # Volume Groups marked as clustered will be ignored.
+-    fallback_to_local_locking = 1
++    fallback_to_local_locking = 0
+     # Local non-LV directory that holds file-based locks while commands are
+     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
+@@ -594,7 +594,7 @@
+     # supported in clustered environment. If use_lvmetad=1 and locking_type=3
+     # is set at the same time, LVM always issues a warning message about this
+     # and then it automatically disables lvmetad use.
+-    use_lvmetad = 1
++    use_lvmetad = 0
+     # Full path of the utility called to check that a thin metadata device
+     # is in a state that allows it to be used.
 </syntaxhighlight>
 |}
-Shut down <span class="code">gfs2</span>, <span class="code">clvmd</span> and <span class="code">drbd</span> now.
+Disable <span class="code">lvmetad</span> as it's not cluster-aware.
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-umount /shared/
+systemctl disable lvm2-lvmetad.service
-systemctl stop clvmd.service
+systemctl disable lvm2-lvmetad.socket
-drbdadm down r0
+systemctl stop lvm2-lvmetad.service
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'
 </syntaxhighlight>
 |-
-!<span class="code">an-c03n02</span>
+!<span class="code">an-a04n02</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-umount /shared/
+systemctl disable lvm2-lvmetad.service
-systemctl stop clvmd.service
+systemctl disable lvm2-lvmetad.socket
-drbdadm down r0
+systemctl stop lvm2-lvmetad.service
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+rm '/etc/systemd/system/sockets.target.wants/lvm2-lvmetad.socket'
 </syntaxhighlight>
 |}
-Done.
+{{note|1=This will be moved to pacemaker shortly. We're enabling it here just long enough to configure pacemaker.}}
-= Add Storage to Pacemaker =
+Start DLM and clvmd;
-== Configure Dual-Primary DRBD ==
-Setup DRBD as a dual-primary resource.
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs cluster cib drbd_cfg
+systemctl start dlm.service
-pcs -f drbd_cfg resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
+systemctl start clvmd.service
-pcs -f drbd_cfg resource master drbd_r0_Clone drbd_r0 master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
-pcs cluster cib-push drbd_cfg
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+|-
-CIB updated
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+systemctl start dlm.service
+systemctl start clvmd.service
 </syntaxhighlight>
 |}
-Give it a couple minutes to promote both nodes to <span class="code">Master</span> on both nodes. Initially, it will appear as <span class="code">Master</span> on one node only.
+Create the [[PV]], [[VG]] and the <span class="code">/shared</span> [[LV]];
-Once updated, you should see this:
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs status
+pvcreate /dev/drbd0
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+  Physical volume "/dev/drbd0" successfully created
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+vgcreate an-a04n01_vg0 /dev/drbd0
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster name: an-cluster-03
+  /proc/devices: No entry for device-mapper found
-Last updated: Sun Jan 26 20:26:33 2014
+  Clustered volume group "an-a04n01_vg0" successfully created
-Last change: Sun Jan 26 20:23:23 2014 via cibadmin on an-c03n01.alteeve.ca
+</syntaxhighlight>
-Stack: corosync
+<syntaxhighlight lang="bash">
-Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
+lvcreate -L 10G -n shared an-a04n01_vg0
-Version: 1.1.10-19.el7-368c726
+</syntaxhighlight>
-Nodes configured
+<syntaxhighlight lang="text">
-Resources configured
+   Logical volume "shared" created
-Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-Full list of resources:
- fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
- fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
- Master/Slave Set: drbd_r0_Clone [drbd_r0]
-     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-PCSD Status:
-an-c03n01.alteeve.ca:
-  an-c03n01.alteeve.ca: Online
-an-c03n02.alteeve.ca:
-  an-c03n02.alteeve.ca: Online
-Daemon Status:
-  corosync: active/disabled
-  pacemaker: active/disabled
-   pcsd: active/enabled
 </syntaxhighlight>
 |-
-!<span class="code">an-c03n02</span>
+!<span class="code">an-a04n02</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs status
+pvscan
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+  PV /dev/drbd0   VG an-a04n01_vg0   lvm2 [20.00 GiB / 20.00 GiB free]
+  Total: 1 [20.00 GiB] / in use: 1 [20.00 GiB] / in no VG: 0 [0   ]
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+vgscan
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster name: an-cluster-03
+  Reading all physical volumes.  This may take a while...
-Last updated: Sun Jan 26 20:26:58 2014
+  Found volume group "an-a04n01_vg0" using metadata type lvm2
-Last change: Sun Jan 26 20:23:23 2014 via cibadmin on an-c03n01.alteeve.ca
+</syntaxhighlight>
-Stack: corosync
+<syntaxhighlight lang="bash">
-Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
+lvscan
-Version: 1.1.10-19.el7-368c726
+</syntaxhighlight>
-Nodes configured
+<syntaxhighlight lang="text">
-Resources configured
+  ACTIVE            '/dev/an-a04n01_vg0/shared' [10.00 GiB] inherit
-Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-Full list of resources:
- fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
- fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
- Master/Slave Set: drbd_r0_Clone [drbd_r0]
-     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-PCSD Status:
-an-c03n01.alteeve.ca:
-  an-c03n01.alteeve.ca: Online
-an-c03n02.alteeve.ca:
-  an-c03n02.alteeve.ca: Online
-Daemon Status:
-  corosync: active/disabled
-  pacemaker: active/disabled
-  pcsd: active/enabled
 </syntaxhighlight>
 |}
-== Configure DLM ==
+Format the <span class="code">/dev/an-a04n01_vg0/shared</span>;
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs cluster cib dlm_cfg
+mkfs.gfs2 -j 2 -p lock_dlm -t an-anvil-04:shared /dev/an-a04n01_vg0/shared
-pcs -f dlm_cfg resource create dlm ocf:pacemaker:controld op monitor interval=60s
-pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
-pcs cluster cib-push dlm_cfg
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-CIB updated
+/dev/an-a04n01_vg0/shared is a symbolic link to /dev/dm-0
+This will destroy any data on /dev/dm-0
 </syntaxhighlight>
-|-
+<syntaxhighlight lang="text">
-!<span class="code">an-c03n02</span>
+Are you sure you want to proceed? [y/n]y
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs status
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster name: an-cluster-03
+Device:                    /dev/an-a04n01_vg0/shared
-Last updated: Sun Jan 26 20:34:36 2014
+Block size:                4096
-Last change: Sun Jan 26 20:33:31 2014 via cibadmin on an-c03n01.alteeve.ca
+Device size:               10.00 GB (2621440 blocks)
-Stack: corosync
+Filesystem size:           10.00 GB (2621438 blocks)
-Current DC: an-c03n02.alteeve.ca (2) - partition with quorum
+Journals:                  2
-Version: 1.1.10-19.el7-368c726
+Resource groups:           40
-Nodes configured
+Locking protocol:          "lock_dlm"
-Resources configured
+Lock table:                "an-anvil-04:shared"
+UUID:                      20bafdb0-1f86-f424-405b-9bf608c0c486
+</syntaxhighlight>
-Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
+<syntaxhighlight lang="bash">
+mkdir /shared
-Full list of resources:
+mount /dev/an-a04n01_vg0/shared /shared
+df -h
- fence_n01_virsh	(stonith:fence_virsh):	Started an-c03n01.alteeve.ca
- fence_n02_virsh	(stonith:fence_virsh):	Started an-c03n02.alteeve.ca
- Master/Slave Set: drbd_r0_Clone [drbd_r0]
-     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
- Clone Set: dlm-clone [dlm]
-     Started: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-PCSD Status:
-an-c03n01.alteeve.ca:
-  an-c03n01.alteeve.ca: Online
-an-c03n02.alteeve.ca:
-  an-c03n02.alteeve.ca: Online
-Daemon Status:
-  corosync: active/disabled
-  pacemaker: active/disabled
-  pcsd: active/enabled
 </syntaxhighlight>
-|}
+<syntaxhighlight lang="text">
+Filesystem                         Size  Used Avail Use% Mounted on
-== Configure Cluster LVM ==
+/dev/vda3                           18G  5.6G   12G  32% /
+devtmpfs                           932M     0  932M   0% /dev
-{|class="wikitable"
+tmpfs                              937M   61M  877M   7% /dev/shm
-!<span class="code">an-c03n01</span>
+tmpfs                              937M  1.9M  935M   1% /run
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+tmpfs                              937M     0  937M   0% /sys/fs/cgroup
-pcs cluster cib clvmd_cfg
+/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
-pcs -f clvmd_cfg resource create clvmd lsb:clvmd params daemon_timeout=30s op monitor interval=60s
+/dev/vda1                          484M   83M  401M  18% /boot
-pcs -f clvmd_cfg resource clone clvmd clone-max=2 clone-node-max=1
+/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared
-pcs -f clvmd_cfg constraint colocation add dlm-clone clvmd-clone INFINITY
-pcs -f clvmd_cfg constraint order start dlm then start clvmd-clone
-pcs cluster cib-push clvmd_cfg</syntaxhighlight>
-<syntaxhighlight lang="text">
-CIB updated
 </syntaxhighlight>
 |-
-!<span class="code">an-c03n02</span>
+!<span class="code">an-a04n02</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs status
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Cluster name: an-cluster-03
+Filesystem                         Size  Used Avail Use% Mounted on
-Last updated: Mon Jan 27 19:00:33 2014
+/dev/vda3                           18G  5.6G   12G  32% /
-Last change: Mon Jan 27 19:00:19 2014 via crm_resource on an-c03n01.alteeve.ca
+devtmpfs                           932M     0  932M   0% /dev
-Stack: corosync
+tmpfs                              937M   76M  862M   9% /dev/shm
-Current DC: an-c03n01.alteeve.ca (1) - partition with quorum
+tmpfs                              937M  2.0M  935M   1% /run
-Version: 1.1.10-19.el7-368c726
+tmpfs                              937M     0  937M   0% /sys/fs/cgroup
-Nodes configured
+/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
-Resources configured
+/dev/vda1                          484M   83M  401M  18% /boot
+/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared
-Online: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-Full list of resources:
-  fence_n01_virsh        (stonith:fence_virsh):  Started an-c03n01.alteeve.ca
-  fence_n02_virsh        (stonith:fence_virsh):  Started an-c03n02.alteeve.ca
-  Master/Slave Set: drbd_r0_Clone [drbd_r0]
-     Masters: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-  Clone Set: dlm-clone [dlm]
-     Started: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-  Clone Set: clvmd-clone [clvmd]
-     Started: [ an-c03n01.alteeve.ca an-c03n02.alteeve.ca ]
-PCSD Status:
-an-c03n01.alteeve.ca:
-  an-c03n01.alteeve.ca: Online
-an-c03n02.alteeve.ca:
-   an-c03n02.alteeve.ca: Online
-Daemon Status:
-  corosync: active/disabled
-  pacemaker: active/disabled
-   pcsd: active/enabled
 </syntaxhighlight>
 |}
-== Configure the /shared GFS2 Partition ==
+Shut down <span class="code">gfs2</span>, <span class="code">clvmd</span> and <span class="code">drbd</span> now.
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs cluster cib fs_cfg
+umount /shared/
-pcs -f fs_cfg resource create sharedFS Filesystem device="/dev/an-c03n01_vg0/shared" directory="/shared" fstype="gfs2"
+systemctl stop clvmd.service
-pcs -f fs_cfg resource clone sharedFS
+drbdadm down r0
-pcs cluster cib-push fs_cfg
 </syntaxhighlight>
-<syntaxhighlight lang="text">
+|-
-CIB updated
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+umount /shared/
+systemctl stop clvmd.service
+drbdadm down r0
 </syntaxhighlight>
-<syntaxhighlight lang="bash">
+|}
-df -h
+Done.
+= Add Storage to Pacemaker =
+== Configure Dual-Primary DRBD ==
+Setup DRBD as a dual-primary resource.
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs cluster cib drbd_cfg
+pcs -f drbd_cfg resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
+pcs -f drbd_cfg resource master drbd_r0_Clone drbd_r0 master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
+pcs cluster cib-push drbd_cfg
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Filesystem                         Size  Used Avail Use% Mounted on
+CIB updated
-/dev/vda3                           18G  5.6G   12G  32% /
-devtmpfs                           932M     0  932M   0% /dev
-tmpfs                              937M   61M  877M   7% /dev/shm
-tmpfs                              937M  2.2M  935M   1% /run
-tmpfs                              937M     0  937M   0% /sys/fs/cgroup
-/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
-/dev/vda1                          484M   83M  401M  18% /boot
-/dev/mapper/an--c03n01_vg0-shared   10G  259M  9.8G   3% /shared
 </syntaxhighlight>
-|-
+|}
-!<span class="code">an-c03n02</span>
-|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+Give it a couple minutes to promote both nodes to <span class="code">Master</span> on both nodes. Initially, it will appear as <span class="code">Master</span> on one node only.
-df -h
-</syntaxhighlight>
+Once updated, you should see this:
-<syntaxhighlight lang="text">
-Filesystem                         Size  Used Avail Use% Mounted on
+{|class="wikitable"
-/dev/vda3                           18G  5.6G   12G  32% /
+!<span class="code">an-a04n01</span>
-devtmpfs                           932M     0  932M   0% /dev
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-tmpfs                              937M   76M  862M   9% /dev/shm
+pcs status
-tmpfs                              937M  2.6M  935M   1% /run
+</syntaxhighlight>
-tmpfs                              937M     0  937M   0% /sys/fs/cgroup
+<syntaxhighlight lang="text">
-/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
+Cluster name: an-anvil-04
-/dev/vda1                          484M   83M  401M  18% /boot
+Last updated: Sun Jan 26 20:26:33 2014
-/dev/mapper/an--c03n01_vg0-shared   10G  259M  9.8G   3% /shared
+Last change: Sun Jan 26 20:23:23 2014 via cibadmin on an-a04n01.alteeve.ca
+Stack: corosync
+Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
+Version: 1.1.10-19.el7-368c726
+Nodes configured
+Resources configured
+Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+Full list of resources:
+ fence_n01_virsh	(stonith:fence_virsh):	Started an-a04n01.alteeve.ca
+ fence_n02_virsh	(stonith:fence_virsh):	Started an-a04n02.alteeve.ca
+ Master/Slave Set: drbd_r0_Clone [drbd_r0]
+     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+PCSD Status:
+an-a04n01.alteeve.ca:
+  an-a04n01.alteeve.ca: Online
+an-a04n02.alteeve.ca:
+  an-a04n02.alteeve.ca: Online
+Daemon Status:
+  corosync: active/disabled
+  pacemaker: active/disabled
+  pcsd: active/enabled
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs status
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Cluster name: an-anvil-04
+Last updated: Sun Jan 26 20:26:58 2014
+Last change: Sun Jan 26 20:23:23 2014 via cibadmin on an-a04n01.alteeve.ca
+Stack: corosync
+Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
+Version: 1.1.10-19.el7-368c726
+Nodes configured
+Resources configured
+Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+Full list of resources:
+ fence_n01_virsh	(stonith:fence_virsh):	Started an-a04n01.alteeve.ca
+ fence_n02_virsh	(stonith:fence_virsh):	Started an-a04n02.alteeve.ca
+ Master/Slave Set: drbd_r0_Clone [drbd_r0]
+     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+PCSD Status:
+an-a04n01.alteeve.ca:
+  an-a04n01.alteeve.ca: Online
+an-a04n02.alteeve.ca:
+  an-a04n02.alteeve.ca: Online
+Daemon Status:
+  corosync: active/disabled
+  pacemaker: active/disabled
+  pcsd: active/enabled
+</syntaxhighlight>
+|}
+== Configure DLM ==
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs cluster cib dlm_cfg
+pcs -f dlm_cfg resource create dlm ocf:pacemaker:controld op monitor interval=60s
+pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
+pcs cluster cib-push dlm_cfg
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+CIB updated
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs status
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Cluster name: an-anvil-04
+Last updated: Sun Jan 26 20:34:36 2014
+Last change: Sun Jan 26 20:33:31 2014 via cibadmin on an-a04n01.alteeve.ca
+Stack: corosync
+Current DC: an-a04n02.alteeve.ca (2) - partition with quorum
+Version: 1.1.10-19.el7-368c726
+Nodes configured
+Resources configured
+Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+Full list of resources:
+ fence_n01_virsh	(stonith:fence_virsh):	Started an-a04n01.alteeve.ca
+ fence_n02_virsh	(stonith:fence_virsh):	Started an-a04n02.alteeve.ca
+ Master/Slave Set: drbd_r0_Clone [drbd_r0]
+     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+ Clone Set: dlm-clone [dlm]
+     Started: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+PCSD Status:
+an-a04n01.alteeve.ca:
+  an-a04n01.alteeve.ca: Online
+an-a04n02.alteeve.ca:
+  an-a04n02.alteeve.ca: Online
+Daemon Status:
+  corosync: active/disabled
+  pacemaker: active/disabled
+  pcsd: active/enabled
+</syntaxhighlight>
+|}
+== Configure Cluster LVM ==
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs cluster cib clvmd_cfg
+pcs -f clvmd_cfg resource create clvmd lsb:clvmd params daemon_timeout=30s op monitor interval=60s
+pcs -f clvmd_cfg resource clone clvmd clone-max=2 clone-node-max=1
+pcs -f clvmd_cfg constraint colocation add dlm-clone clvmd-clone INFINITY
+pcs -f clvmd_cfg constraint order start dlm then start clvmd-clone
+pcs cluster cib-push clvmd_cfg</syntaxhighlight>
+<syntaxhighlight lang="text">
+CIB updated
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs status
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Cluster name: an-anvil-04
+Last updated: Mon Jan 27 19:00:33 2014
+Last change: Mon Jan 27 19:00:19 2014 via crm_resource on an-a04n01.alteeve.ca
+Stack: corosync
+Current DC: an-a04n01.alteeve.ca (1) - partition with quorum
+Version: 1.1.10-19.el7-368c726
+Nodes configured
+Resources configured
+Online: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+Full list of resources:
+ fence_n01_virsh        (stonith:fence_virsh):  Started an-a04n01.alteeve.ca
+ fence_n02_virsh        (stonith:fence_virsh):  Started an-a04n02.alteeve.ca
+ Master/Slave Set: drbd_r0_Clone [drbd_r0]
+     Masters: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+ Clone Set: dlm-clone [dlm]
+     Started: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+ Clone Set: clvmd-clone [clvmd]
+     Started: [ an-a04n01.alteeve.ca an-a04n02.alteeve.ca ]
+PCSD Status:
+an-a04n01.alteeve.ca:
+  an-a04n01.alteeve.ca: Online
+an-a04n02.alteeve.ca:
+  an-a04n02.alteeve.ca: Online
+Daemon Status:
+  corosync: active/disabled
+  pacemaker: active/disabled
+  pcsd: active/enabled
+</syntaxhighlight>
+|}
+== Configure the /shared GFS2 Partition ==
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs cluster cib fs_cfg
+pcs -f fs_cfg resource create sharedFS Filesystem device="/dev/an-a04n01_vg0/shared" directory="/shared" fstype="gfs2"
+pcs -f fs_cfg resource clone sharedFS
+pcs cluster cib-push fs_cfg
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+CIB updated
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+df -h
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Filesystem                         Size  Used Avail Use% Mounted on
+/dev/vda3                           18G  5.6G   12G  32% /
+devtmpfs                           932M     0  932M   0% /dev
+tmpfs                              937M   61M  877M   7% /dev/shm
+tmpfs                              937M  2.2M  935M   1% /run
+tmpfs                              937M     0  937M   0% /sys/fs/cgroup
+/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
+/dev/vda1                          484M   83M  401M  18% /boot
+/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+df -h
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Filesystem                         Size  Used Avail Use% Mounted on
+/dev/vda3                           18G  5.6G   12G  32% /
+devtmpfs                           932M     0  932M   0% /dev
+tmpfs                              937M   76M  862M   9% /dev/shm
+tmpfs                              937M  2.6M  935M   1% /run
+tmpfs                              937M     0  937M   0% /sys/fs/cgroup
+/dev/loop0                         4.4G  4.4G     0 100% /mnt/dvd
+/dev/vda1                          484M   83M  401M  18% /boot
+/dev/mapper/an--a03n01_vg0-shared   10G  259M  9.8G   3% /shared
+</syntaxhighlight>
+|}
+== Configuring Constraints ==
+{|class="wikitable"
+!<span class="code">an-a04n01</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs cluster cib cst_cfg
+pcs -f cst_cfg constraint order start dlm then promote drbd_r0_Clone
+pcs -f cst_cfg constraint order promote drbd_r0_Clone then start clvmd-clone
+pcs -f cst_cfg constraint order promote clvmd-clone then start sharedFS-clone
+pcs cluster cib-push cst_cfg
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+CIB updated
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+pcs constraint show
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Location Constraints:
+Ordering Constraints:
+  start dlm then promote drbd_r0_Clone
+  promote drbd_r0_Clone then start clvmd-clone
+  start clvmd-clone then start sharedFS-clone
+Colocation Constraints:
+</syntaxhighlight>
+|-
+!<span class="code">an-a04n02</span>
+|style="white-space: nowrap;"|<syntaxhighlight lang="bash">
+pcs constraint show
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+Location Constraints:
+Ordering Constraints:
+  start dlm then promote drbd_r0_Clone
+  promote drbd_r0_Clone then start clvmd-clone
+  start clvmd-clone then start sharedFS-clone
+Colocation Constraints:
+</syntaxhighlight>
+|}
+= Odds and Sods =
+This is a section for random notes. The stuff here will be integrated into the finished tutorial or removed.
+== Determine multicast Address ==
+Useful if you need to ensure that your switch has persistent multicast addresses set.
+<syntaxhighlight lang="bash">
+corosync-cmapctl | grep mcastaddr
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+totem.interface.0.mcastaddr (str) = 239.192.122.199
+</syntaxhighlight>
+<syntaxhighlight lang="bash">
+</syntaxhighlight>
+<syntaxhighlight lang="text">
+</syntaxhighlight>
+<syntaxhighlight lang="diff">
 </syntaxhighlight>
-|}
-== Configuring Constraints ==
 {|class="wikitable"
-!<span class="code">an-c03n01</span>
+!<span class="code">an-a04n01</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs cluster cib cst_cfg
-pcs -f cst_cfg constraint order start dlm then promote drbd_r0_Clone
-pcs -f cst_cfg constraint order promote drbd_r0_Clone then start clvmd-clone
-pcs -f cst_cfg constraint order promote clvmd-clone then start sharedFS-clone
-pcs cluster cib-push cst_cfg
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-CIB updated
-</syntaxhighlight>
-<syntaxhighlight lang="bash">
-pcs constraint show
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-Location Constraints:
-Ordering Constraints:
-  start dlm then promote drbd_r0_Clone
-  promote drbd_r0_Clone then start clvmd-clone
-  start clvmd-clone then start sharedFS-clone
-Colocation Constraints:
 </syntaxhighlight>
 |-
-!<span class="code">an-c03n02</span>
+!<span class="code">an-a04n02</span>
 |style="white-space: nowrap;"|<syntaxhighlight lang="bash">
-pcs constraint show
 </syntaxhighlight>
 <syntaxhighlight lang="text">
-Location Constraints:
-Ordering Constraints:
-  start dlm then promote drbd_r0_Clone
-  promote drbd_r0_Clone then start clvmd-clone
-  start clvmd-clone then start sharedFS-clone
-Colocation Constraints:
 </syntaxhighlight>
 |}
-= Odds and Sods =
+<span class="highlight_warning"></span>
+<span class="field"></span>
-This is a section for random notes. The stuff here will be integrated into the finished tutorial or removed.
+<span class="button"></span>
-== Determine multicast Address ==
-Useful if you need to ensure that your switch has persistent multicast addresses set.
-<syntaxhighlight lang="bash">
-corosync-cmapctl | grep mcastaddr
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-totem.interface.0.mcastaddr (str) = 239.192.122.199
-</syntaxhighlight>
 <span class="code"></span>
-<syntaxhighlight lang="bash">
-</syntaxhighlight>
-<syntaxhighlight lang="text">
-</syntaxhighlight>
-<syntaxhighlight lang="diff">
-</syntaxhighlight>
 = Notes =
 * [http://blog.clusterlabs.org/blog/2013/pacemaker-logging/ Pacemaker Logging]
+* Editing cib.xml offline is possible with: <span class="code">CIB_file=/path/to/real/cib.xml cibadmin ....</span> and sync to other nodes when done.
 = Thanks =

Anvil! Tutorial 3: Difference between revisions

Latest revision as of 16:48, 19 November 2016

Before We Begin

OS Setup

Post OS Install

If you are running RHEL

Adding LINBIT Repos

Install

Making ssh faster when the net is down

Configuring the network

Setting the Hostname

Network

Setup The hosts File

Setup SSH

Populating And Pushing ~/.ssh/known_hosts

Keeping Time in Sync

Configuring IPMI

Configuring the Anvil!

Enable the pcs Daemon

Initializing the Cluster

Start the Cluster For the First Time

Disabling Quorum

Enabling and Configuring Fencing

Configuring IPMI Fencing

Configuring Fence Levels

Fencing using fence_virsh

Create the Password Script

Test fence_virsh Status from the Command Line

Test Fencing

Shared Storage

DRBD

Install DRBD 8.4.4 from AN!

Install DRBD 8.4.4 From Source

Optional; Make RPMs

Configure DRBD

DLM, Clustered LVM and GFS2

Add Storage to Pacemaker

Configure Dual-Primary DRBD

Configure DLM

Configure Cluster LVM

Configure the /shared GFS2 Partition

Configuring Constraints

Odds and Sods

Determine multicast Address

Notes

Thanks

Navigation menu