2-Node Red Hat KVM Cluster Tutorial - Quick guide


Warning: This document is old, abandoned and very out of date. DON'T USE ANYTHING HERE! Consider it only as historical note taking.

This is a "cookbook" version of the complete 2-Node Red Hat KVM Cluster Tutorial tutorial. It is designed for walking through all the steps needed to build a cluster, without any explanation at all. This is only useful to people who've already read the full tutorial and want something of a "cluster build checklist".

OS Install

This section is based on a minimal install of RHEL or CentOS 6.4, x86_64. Please follow your favourite installation guide to complete the initial minimal install before proceeding.

The nodes used in this cookbook are installed with the following settings (an illustrative partitioning sketch follows the list);

  • Disk Partitioning;
    • Nodes use hardware RAID which presents the virtual drive as /dev/sda. You can use mdadm "software" RAID if you wish, simply replace the /dev/sdaX with the /dev/mdY devices you create.
    • 500 MiB for /boot
    • 4096 MiB for <swap>
    • 40960 MiB for /
    • The remainder of the free space will be used for the DRBD resources later in the tutorial.
  • Networking;
    • Six network interfaces plus an additional dedicated IPMI interface are used. For the OS install, setting an interface to DHCP is sufficient. Detailed configuration will be done later.
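For reference only, a minimal kickstart-style sketch of the partition layout above might look like the following. The device name and sizes are taken from the list, not from the original tutorial, so treat them as assumptions and adjust to your hardware.

# Illustrative kickstart partitioning only; adjust device names and sizes.
zerombr
clearpart --all --drives=sda
part /boot --fstype=ext4 --size=500   --ondisk=sda
part swap  --size=4096                --ondisk=sda
part /     --fstype=ext4 --size=40960 --ondisk=sda
# Leave the rest of the disk unpartitioned; it will be used for DRBD later.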

Setup SSH

ssh-keygen -t rsa -N "" -b 4095 -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub
vim ~/.ssh/authorized_keys

Install Apps

# Apps
yum -y update
yum -y install cman corosync rgmanager ricci gfs2-utils ntp libvirt lvm2-cluster qemu-kvm qemu-kvm-tools virt-install virt-viewer syslinux wget gpm rsync acpid ccs \
               freeipmi freeipmi-bmc-watchdog freeipmi-ipmidetectd OpenIPMI OpenIPMI-libs OpenIPMI-perl OpenIPMI-tools cpan perl-YAML-Tiny perl-Net-SSLeay gcc make \
               perl-CGI fence-agents syslinux openssl-devel vim man bridge-utils dmidecode openssh-clients perl rsync screen telnet
yum -y groupinstall development

# DRBD
rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://elrepo.org/elrepo-release-6-4.el6.elrepo.noarch.rpm
yum -y install drbd83-utils kmod-drbd83

# Remove
yum -y remove NetworkManager

# Automate the install of the perl modules.
export PERL_MM_USE_DEFAULT=1
perl -MCPAN -e 'install("YAML")'
perl -MCPAN -e 'install Moose::Role'
perl -MCPAN -e 'install Throwable::Error'
perl -MCPAN -e 'install Email::Sender::Transport::SMTP::TLS'
#

Set ricci's Password

passwd ricci

Restore Detailed Boot Screen

Note: This can take a minute or three to finish, be patient.
plymouth-set-default-theme details -R

Enable and Disable Daemons

chkconfig acpid off
chkconfig iptables off
chkconfig ip6tables off
chkconfig network on
chkconfig ntpd on
chkconfig ricci on
chkconfig modclusterd on
chkconfig drbd off
chkconfig clvmd off
chkconfig gfs2 off
chkconfig cman off
chkconfig rgmanager off
chkconfig ipmi on

chkconfig --list acpid
chkconfig --list iptables
chkconfig --list ip6tables
chkconfig --list network
chkconfig --list ntpd
chkconfig --list ricci
chkconfig --list modclusterd
chkconfig --list drbd
chkconfig --list clvmd
chkconfig --list gfs2
chkconfig --list cman
chkconfig --list rgmanager
chkconfig --list ipmi
#

Start Daemons

/etc/init.d/iptables stop
/etc/init.d/ip6tables stop
/etc/init.d/ntpd start
/etc/init.d/ricci start
/etc/init.d/modclusterd start
/etc/init.d/ipmi start
#

Configure Networking

Note: This assumes you've already renamed your ifcfg-ethX files.
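If the interfaces still need renaming, the MAC-to-ethX mapping on EL6 lives in the udev persistent net rules. A minimal sketch, using one of the example MAC addresses from further down this guide (substitute your own), is:

vim /etc/udev/rules.d/70-persistent-net.rules
# One line per interface; the ifcfg-ethX file names must match the NAME= values.
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:81:c7:ec:49", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"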

Destroy the libvirtd bridge if needed.

If libvirtd is not yet running:

cat /dev/null >/etc/libvirt/qemu/networks/default.xml

Otherwise, if libvirtd has started:

virsh net-destroy default
virsh net-autostart default --disable
virsh net-undefine default
/etc/init.d/iptables stop

Backup the existing network files and create the bond and bridge files.

mkdir /root/backups/
rsync -av /etc/sysconfig/network-scripts/ifcfg-eth* /root/backups/
touch /etc/sysconfig/network-scripts/ifcfg-bond{0..2}
touch /etc/sysconfig/network-scripts/ifcfg-vbr2
Warning: Be sure you use your MAC addresses in the HWADDR="..." lines below.

Bridge:

vim /etc/sysconfig/network-scripts/ifcfg-vbr2
# Internet-Facing Network - Bridge
DEVICE="vbr2"
TYPE="Bridge"
BOOTPROTO="static"
IPADDR="10.255.0.1"
NETMASK="255.255.0.0"
GATEWAY="10.255.255.254"
DNS1="8.8.8.8"
DNS2="8.8.4.4"
DEFROUTE="yes"

Bonds:

vim /etc/sysconfig/network-scripts/ifcfg-bond0
# Back-Channel Network - Bond
DEVICE="bond0"
BOOTPROTO="static"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=eth0"
IPADDR="10.20.0.1"
NETMASK="255.255.0.0"
vim /etc/sysconfig/network-scripts/ifcfg-bond1
# Storage Network - Bond
DEVICE="bond1"
BOOTPROTO="static"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=eth1"
IPADDR="10.10.0.1"
NETMASK="255.255.0.0"
vim /etc/sysconfig/network-scripts/ifcfg-bond2
# Internet-Facing Network - Bond
DEVICE="bond2"
BRIDGE="vbr2"
BOOTPROTO="none"
NM_CONTROLLED="no"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=eth2"

Ethernet devices:

vim /etc/sysconfig/network-scripts/ifcfg-eth0
# Back-Channel Network - Link 1
HWADDR="00:E0:81:C7:EC:49"
DEVICE="eth0"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
MASTER="bond0"
SLAVE="yes"
vim /etc/sysconfig/network-scripts/ifcfg-eth1
# Storage Network - Link 1
HWADDR="00:E0:81:C7:EC:48"
DEVICE="eth1"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
MASTER="bond1"
SLAVE="yes"
vim /etc/sysconfig/network-scripts/ifcfg-eth2
# Internet-Facing Network - Link 1
HWADDR="00:E0:81:C7:EC:47"
DEVICE="eth2"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
MASTER="bond2"
SLAVE="yes"
vim /etc/sysconfig/network-scripts/ifcfg-eth3
# Back-Channel Network - Link 2
HWADDR="00:1B:21:9D:59:FC"
DEVICE="eth3"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
MASTER="bond0"
SLAVE="yes"
vim /etc/sysconfig/network-scripts/ifcfg-eth4
# Storage Network - Link 2
HWADDR="00:1B:21:BF:70:02"
DEVICE="eth4"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
MASTER="bond1"
SLAVE="yes"
vim /etc/sysconfig/network-scripts/ifcfg-eth5
# Internet-Facing Network - Link 2
HWADDR="00:1B:21:BF:6F:FE"
DEVICE="eth5"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO="none"
MASTER="bond2"
SLAVE="yes"

Populate the hosts file:

vim /etc/hosts
# an-node01
10.20.0.1	an-node01 an-node01.bcn an-node01.alteeve.ca
10.20.1.1	an-node01.ipmi
10.10.0.1	an-node01.sn
10.255.0.1	an-node01.ifn
 
# an-node02
10.20.0.2	an-node02 an-node02.bcn an-node02.alteeve.ca
10.20.1.2	an-node02.ipmi
10.10.0.2	an-node02.sn
10.255.0.2	an-node02.ifn
 
# Fence devices
10.20.2.1       pdu1 pdu1.alteeve.ca
10.20.2.2       pdu2 pdu2.alteeve.ca
10.20.2.3       switch1 switch1.alteeve.ca

Restart networking:

/etc/init.d/network restart
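Optionally, confirm that each bond came up with the expected active slave and that the bridge took its IP. These paths and tools are standard on EL6, though this check is not part of the original quick guide.

cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1
cat /proc/net/bonding/bond2
ifconfig vbr2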

SSH Configuration

Note: The files are populated here on node 1 and then sync'ed over to node 2.
ssh-keygen -t rsa -N "" -b 4095 -f ~/.ssh/id_rsa

Add the ~/.ssh/id_rsa.pub from both nodes to:

vim ~/.ssh/authorized_keys
rsync -av ~/.ssh/authorized_keys root@an-node02:/root/.ssh/

SSH into both nodes, using all of the host names above, to populate ~/.ssh/known_hosts (a scripted sketch of this step follows below).

# After ssh'ing into all host names:
rsync -av ~/.ssh/known_hosts root@an-node02:/root/.ssh/
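If you would rather script the ssh'ing step, a minimal loop (run before the rsync above) might look like this. The host names are assumed from the /etc/hosts file above; the .ipmi names are skipped since they are BMCs, not SSH hosts.

# Accept each host key non-interactively to populate ~/.ssh/known_hosts.
for host in an-node01 an-node01.bcn an-node01.sn an-node01.ifn \
            an-node02 an-node02.bcn an-node02.sn an-node02.ifn
do
    ssh -o StrictHostKeyChecking=no root@$host "hostname"
done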

Cluster Communications

Note: This assumes a pair of nodes with IPMI and redundant PSUs split across two switched PDUs.

Build the cluster communication section of /etc/cluster/cluster.conf.

vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="an-cluster-01" config_version="1">
	<cman expected_votes="1" two_node="1" />
	<clusternodes>
		<clusternode name="an-c01n01.alteeve.ca" nodeid="1">
			<fence>
				<method name="ipmi">
					<device name="ipmi_n01" action="reboot" delay="15" />
				</method>
				<method name="pdu">
					<device name="pdu1" port="1" action="reboot" delay="15" />
					<device name="pdu2" port="1" action="reboot" delay="15" />
				</method>
			</fence>
		</clusternode>
		<clusternode name="an-c01n02.alteeve.ca" nodeid="2">
			<fence>
				<method name="ipmi">
					<device name="ipmi_n02" action="reboot" />
				</method>
				<method name="pdu">
					<device name="pdu1" port="2" action="reboot" />
					<device name="pdu2" port="2" action="reboot" />
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice name="ipmi_n01" agent="fence_ipmilan" ipaddr="an-c01n01.ipmi" login="root" passwd="secret" />
		<fencedevice name="ipmi_n02" agent="fence_ipmilan" ipaddr="an-c01n02.ipmi" login="root" passwd="secret" />
		<fencedevice agent="fence_apc_snmp" ipaddr="an-p01.alteeve.ca" name="pdu1" />
		<fencedevice agent="fence_apc_snmp" ipaddr="an-p02.alteeve.ca" name="pdu2" />
	</fencedevices>
	<fence_daemon post_join_delay="30" />
	<totem rrp_mode="none" secauth="off"/>
</cluster>

Verify;

ccs_config_validate

Push to the other node;

rsync -av /etc/cluster/cluster.conf root@an-c01n02:/etc/cluster/

Start the cluster;

/etc/init.d/cman start
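Once cman is running on both nodes, a quick membership check can be done with the tools installed earlier. This is an optional sanity check, not part of the original quick guide.

cman_tool nodes
clustat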

Setting Up DRBD

Partitioning The Drives

Note: This assumes a hardware RAID array at /dev/sda.

Create the partitions. Two logical partitions are shown here inside a single extended partition, with 'a' falling roughly half way between the start 'x' and the end 'y' of the free space.

parted -a optimal /dev/sda
print free
mkpart extended xGB yGB
mkpart logical xGB aGB
mkpart logical aGB yGB
align-check opt 5
align-check opt 6
quit
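As a purely illustrative example only: on a 500 GB array where 'print free' reported free space from 42.4GB to 500GB, the mkpart lines entered at the same parted prompt might be:

mkpart extended 42.4GB 500GB
mkpart logical 42.4GB 271GB
mkpart logical 271GB 500GB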

Now reboot to ensure the kernel sees the new partition table.

reboot

Install the DRBD Fence Handler

Use one of these. The next section assumes you use rhcs_fence, so be sure to adjust the fence-peer handler if you use obliterate-peer.sh instead.

wget -c https://raw.github.com/digimer/rhcs_fence/master/rhcs_fence -O /sbin/rhcs_fence
chmod 755 /sbin/rhcs_fence
ls -lah /sbin/rhcs_fence

Alternatively;

wget -c https://alteeve.ca/files/an-cluster/sbin/obliterate-peer.sh -O /sbin/obliterate-peer.sh
chmod a+x /sbin/obliterate-peer.sh
ls -lah /sbin/obliterate-peer.sh
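If you go with obliterate-peer.sh, the only change needed in the next section is the fence-peer handler in /etc/drbd.d/global_common.conf; instead of the rhcs_fence line shown in the diff below, it would read something like:

fence-peer		"/sbin/obliterate-peer.sh";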

Configure DRBD

mv /etc/drbd.d/global_common.conf /etc/drbd.d/global_common.conf.orig
wget -c https://alteeve.ca/files/global_common.conf -O /etc/drbd.d/global_common.conf
diff -u /etc/drbd.d/global_common.conf.orig /etc/drbd.d/global_common.conf
--- /etc/drbd.d/global_common.conf.orig	2012-12-20 20:17:43.000000000 -0500
+++ /etc/drbd.d/global_common.conf	2012-12-11 14:05:45.000000000 -0500
@@ -7,37 +7,42 @@
 	protocol C;
 
 	handlers {
-		# These are EXAMPLE handlers only.
-		# They may have severe implications,
-		# like hard resetting the node under certain circumstances.
-		# Be careful when chosing your poison.
-
-		# pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
-		# pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
-		# local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
+		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
+		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
+		local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
 		# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
 		# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
 		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
 		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
+		fence-peer		"/sbin/rhcs_fence";
 	}
 
 	startup {
 		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
+		become-primary-on	both;
+		wfc-timeout		300;
+		degr-wfc-timeout	120;
 	}
 
 	disk {
 		# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
 		# no-disk-drain no-md-flushes max-bio-bvecs
+		fencing			resource-and-stonith;
 	}
 
 	net {
 		# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
 		# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
 		# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
+		allow-two-primaries;
+		after-sb-0pri		discard-zero-changes;
+		after-sb-1pri		discard-secondary;
+		after-sb-2pri		disconnect;
 	}
 
 	syncer {
 		# rate after al-extents use-rle cpu-mask verify-alg csums-alg
+		rate			40M;
 	}
 }

Add the resources;

wget -c https://alteeve.ca/files/r0.res -O /etc/drbd.d/r0.res
wget -c https://alteeve.ca/files/r1.res -O /etc/drbd.d/r1.res

The resource files almost certainly need to be altered so that the 'on ...' host names, IP addresses and block device paths match your servers.
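As a sketch only, a resource file for r0 might look like the following. The host names, IP addresses, port and partition here are assumptions based on the storage network and partition layout used earlier in this guide; the 'on' names must match 'uname -n' on each node.

vim /etc/drbd.d/r0.res
resource r0 {
	device		/dev/drbd0;
	meta-disk	internal;
	on an-c01n01.alteeve.ca {
		address		10.10.0.1:7788;
		disk		/dev/sda5;
	}
	on an-c01n02.alteeve.ca {
		address		10.10.0.2:7788;
		disk		/dev/sda5;
	}
}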

Now, on both nodes, check that the configuration is sane;

drbdadm dump

If there are errors, fix them. If it dumps a (condensed) version of your configuration, you are ready to go.

Starting DRBD

On both nodes:

modprobe drbd
drbdadm create-md r{0,1}
drbdadm attach r{0,1}
drbdadm connect r{0,1}

On one node:

drbdadm -- --clear-bitmap new-current-uuid r{0,1}

On both nodes:

drbdadm primary r{0,1}
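Both resources should now report Connected and Primary/Primary with UpToDate/UpToDate disks. An easy way to confirm, on either node:

cat /proc/drbd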

Clustered LVM

We're using two PVs here, one per DRBD resource, with one clustered VG on top of each.

Setup Clustered LVM

On both nodes;

mv /etc/lvm/lvm.conf /etc/lvm/lvm.conf.orig
wget -c https://alteeve.ca/files/lvm.conf -O /etc/lvm/lvm.conf
/etc/init.d/clvmd start
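The downloaded lvm.conf should differ from the stock file mainly in its locking and device filter settings. Compare it against the backup to confirm what changed; the exact lines are an assumption based on the full tutorial, so verify the diff on your own nodes.

diff -u /etc/lvm/lvm.conf.orig /etc/lvm/lvm.conf
# Expected key changes (verify in the diff output):
#   filter = [ "a|/dev/drbd*|", "r/.*/" ]
#   locking_type = 3
#   fallback_to_local_locking = 0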

Creating PVs, VGs and the first LV

On one node;

pvcreate /dev/drbd{0,1}
vgcreate -c y an-c05n01_vg0 /dev/drbd0
vgcreate -c y an-c05n02_vg0 /dev/drbd1
lvcreate -L 40G -n shared an-c05n01_vg0
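Then, on both nodes, confirm that clvmd has made the new volume groups and logical volume visible:

pvs
vgs
lvs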

Setup GFS2

On one node:

Note: Remember to change an-cluster-05 to your actual cluster name; it must match the name="..." value set in cluster.conf (an-cluster-01 in the example above)!
mkfs.gfs2 -p lock_dlm -j 2 -t an-cluster-05:shared /dev/an-c05n01_vg0/shared

On both nodes:

mkdir /shared
mount /dev/an-c05n01_vg0/shared /shared/
echo `gfs2_tool sb /dev/an-c05n01_vg0/shared uuid | awk '/uuid =/ { print $4; }' | sed -e "s/\(.*\)/UUID=\L\1\E \/shared\t\tgfs2\tdefaults,noatime,nodiratime\t0 0/"` >> /etc/fstab
/etc/init.d/gfs2 status
Configured GFS2 mountpoints: 
/shared
Active GFS2 mountpoints: 
/shared

Adding Storage To The Cluster

First, stop the storage if it's still running from the previous step;

/etc/init.d/gfs2 stop && /etc/init.d/clvmd stop && /etc/init.d/drbd stop

Add the cluster section to cluster.conf.

<?xml version="1.0"?>
<cluster name="an-cluster-01" config_version="2">
	<cman expected_votes="1" two_node="1" />
	<clusternodes>
		<clusternode name="an-c01n01.alteeve.ca" nodeid="1">
			<fence>
				<method name="ipmi">
					<device name="ipmi_n01" action="reboot" />
				</method>
				<method name="pdu">
					<device name="pdu1" port="1" action="reboot" />
					<device name="pdu2" port="1" action="reboot" />
				</method>
			</fence>
		</clusternode>
		<clusternode name="an-c01n02.alteeve.ca" nodeid="2">
			<fence>
				<method name="ipmi">
					<device name="ipmi_n02" action="reboot" />
				</method>
				<method name="pdu">
					<device name="pdu1" port="2" action="reboot" />
					<device name="pdu2" port="2" action="reboot" />
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<fencedevices>
		<fencedevice name="ipmi_n01" agent="fence_ipmilan" ipaddr="an-c01n01.ipmi" login="root" passwd="secret" />
		<fencedevice name="ipmi_n02" agent="fence_ipmilan" ipaddr="an-c01n02.ipmi" login="root" passwd="secret" />
		<fencedevice agent="fence_apc_snmp" ipaddr="an-p01.alteeve.ca" name="pdu1" />
		<fencedevice agent="fence_apc_snmp" ipaddr="an-p02.alteeve.ca" name="pdu2" />
	</fencedevices>
	<fence_daemon post_join_delay="30" />
	<totem rrp_mode="none" secauth="off"/>
	<rm log_level="5">
		<resources>
			<script file="/etc/init.d/drbd" name="drbd"/>
			<script file="/etc/init.d/clvmd" name="clvmd"/>
			<script file="/etc/init.d/gfs2" name="gfs2"/>
		</resources>
		<failoverdomains>
			<failoverdomain name="only_n01" nofailback="1" ordered="0" restricted="1">
				<failoverdomainnode name="an-c01n01.alteeve.ca"/>
			</failoverdomain>
			<failoverdomain name="only_n02" nofailback="1" ordered="0" restricted="1">
				<failoverdomainnode name="an-c01n02.alteeve.ca"/>
			</failoverdomain>
			<failoverdomain name="primary_n01" nofailback="1" ordered="1" restricted="1">
				<failoverdomainnode name="an-c01n01.alteeve.ca" priority="1"/>
				<failoverdomainnode name="an-c01n02.alteeve.ca" priority="2"/>
			</failoverdomain>
			<failoverdomain name="primary_n02" nofailback="1" ordered="1" restricted="1">
				<failoverdomainnode name="an-c01n01.alteeve.ca" priority="2"/>
				<failoverdomainnode name="an-c01n02.alteeve.ca" priority="1"/>
			</failoverdomain>
		</failoverdomains>
		<service name="storage_n01" autostart="1" domain="only_n01" exclusive="0" recovery="restart">
			<script ref="drbd">
				<script ref="clvmd">
					<script ref="gfs2"/>
				</script>
			</script>
		</service>
		<service name="storage_n02" autostart="1" domain="only_n02" exclusive="0" recovery="restart">
			<script ref="drbd">
				<script ref="clvmd">
					<script ref="gfs2"/>
				</script>
			</script>
		</service>
	</rm>
</cluster>
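As with the first version of the file, validate it and push it to the other node before proceeding (these are the same commands used earlier; note the bumped config_version).

ccs_config_validate
rsync -av /etc/cluster/cluster.conf root@an-c01n02:/etc/cluster/
# If cman is already running, 'cman_tool version -r' should load the new
# config_version on the running cluster (it relies on ricci being started).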

Now start rgmanager and confirm that drbd, clvmd and gfs2 started.

/etc/init.d/rgmanager start
/etc/init.d/drbd status && /etc/init.d/clvmd status && /etc/init.d/gfs2 status
drbd driver loaded OK; device status:
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by dag@Build64R6, 2012-09-04 12:06:10
m:res  cs         ro               ds                 p  mounted  fstype
0:r0   Connected  Primary/Primary  UpToDate/UpToDate  C
1:r1   Connected  Primary/Primary  UpToDate/UpToDate  C
clvmd (pid  18682) is running...
Clustered Volume Groups: n02_vg0 n01_vg0
Active clustered Logical Volumes: shared
Configured GFS2 mountpoints: 
/shared
Active GFS2 mountpoints: 
/shared

Provisioning VMs

Note: From here on out, everything is done on one node only.

Create the shared directories;

mkdir /shared/{definitions,provision,archive,files}



 
