Red Hat Cluster Service 2 Tutorial - Archive

Overview

This paper has one goal:

  • Creating a 2-node, high-availability cluster hosting Xen virtual machines, built on RHCS "stable 2" and using DRBD for synchronized storage.

Technologies We Will Use

  • Enterprise Linux 5; specifically we will be using CentOS v5.5.
  • Red Hat Cluster Services "Stable" version 2. This describes the following core components:
    • OpenAIS; Provides cluster communications using the totem protocol.
    • Cluster Manager (cman); Handles the starting, stopping and overall management of the cluster.
    • Resource Manager (rgmanager); Manages cluster resources and services. Handles service recovery during failures.
    • Cluster Logical Volume Manager (clvm); Cluster-aware (disk) volume manager. Backs GFS2 filesystems and Xen virtual machines.
    • Global File Systems version 2 (gfs2); Cluster-aware, concurrently mountable file system.
  • Distributed Replicated Block Device (DRBD); Keeps shared data synchronized across cluster nodes.
  • Xen; Hypervisor that controls and supports virtual machines.

A Note on Patience

There is nothing inherently hard about clustering. However, there are many components that you need to understand before you can begin. The result is that clustering has an inherently steep learning curve.

You must have patience. Lots of it.

Many technologies can be learned by creating a very simple base and then building on it. The classic "Hello, World!" script created when first learning a programming language is an example of this. Unfortunately, there is no real analog to this in clustering. Even the most basic cluster requires several pieces be in place and working together. If you try to rush by ignoring pieces you think are not important, you will almost certainly waste time. A good example is setting aside fencing, thinking that your test cluster's data isn't important. The cluster software has no concept of "test". It treats everything as critical all the time and will shut down if anything goes wrong.

Take your time, work through these steps, and you will have the foundation cluster sooner than you realize. Clustering is fun because it is a challenge.

Prerequisites

It is assumed that you are familiar with Linux systems administration, specifically Red Hat Enterprise Linux and its derivatives. You will need somewhat advanced networking experience as well. You should be comfortable working in a terminal (directly or over ssh). Familiarity with XML will help, but is not strictly required, as its use here is fairly self-evident.

If you feel a little out of depth at times, don't hesitate to set this tutorial aside. Branch over to the components you feel the need to study more, then return and continue on. Finally, and perhaps most importantly, you must have patience! If you have a manager asking you to "go live" with a cluster in a month, tell him or her that it simply won't happen. If you rush, you will skip important points and you will fail. Patience is vastly more important than any pre-existing skill.

Focus and Goal

There is a different cluster for every problem. Generally speaking though, there are two main problems that clusters try to resolve: Performance and High Availability. Performance clusters are generally tailored to the application requiring the performance increase. There are some general tools for performance clustering, like Red Hat's LVS (Linux Virtual Server) for load-balancing common applications like the Apache web server.

This tutorial will focus on High Availability clustering, often shortened to simply HA, which is not to be confused with the Linux-HA "heartbeat" cluster suite, which we will not be using here. The cluster will provide shared file systems and high availability for Xen-based virtual servers. The goal will be to have the virtual servers live-migrate during planned node outages and automatically restart on a surviving node when the original host node fails.

A very brief overview:

High Availability clusters like ours have two main parts: cluster management and resource management.

The cluster itself is responsible for maintaining the cluster nodes in a group. This group is part of a "Closed Process Group", or CPG. When a node fails, the cluster manager must detect the failure, reliably eject the node from the cluster and re-form the CPG. Each time the cluster changes, or "re-forms", the resource manager is called. The resource manager checks to see how the cluster changed, consults its configuration and determines what to do, if anything.

The details of all this will be discussed a little later on. For now, it is sufficient to keep these two major roles in mind and to understand that they are somewhat independent entities.

Platform

This tutorial was written using CentOS version 5.5, x86_64. No attempt was made to test on i686 or other EL5 derivatives. That said, there is no reason to believe that this tutorial will not apply to any variant. As much as possible, the language will be distro-agnostic. For reasons of memory constraints, it is advised that you use an x86_64 (64-bit) platform if at all possible.

Do note that in EL5.4 and above, significant changes were made to how RHCS is supported. It is strongly advised that you use version 5.4 or newer while working with this tutorial.

Base Setup

Before we can look at the cluster, we must first build two cluster nodes and then install the operating system.

Hardware Requirements

The bare minimum requirements are:

  • All hardware must be supported by EL5. It is strongly recommended that you check compatibility before making any purchases.
  • A dual-core CPU with hardware virtualization support.
  • Three network cards; at least one should be gigabit or faster.
  • One hard drive.
  • 2 GiB of RAM.
  • A fence device. This can be an IPMI-enabled server, a Node Assassin, a switched PDU or similar.

This tutorial was written using the following hardware:

This is not an endorsement of the above hardware. I put a heavy emphasis on minimizing power consumption and bought what was within my budget. This hardware was never meant to be put into production, but instead was chosen to serve the purpose of my own study and for creating this tutorial. What you ultimately choose to use, provided it meets the minimum requirements, is entirely up to you and your requirements.

Note: I use three physical NICs, but you can get away with two by merging the storage and back-channel networks, which we will discuss shortly. If you are really in a pinch, you could create three aliases on one interface and isolate them using VLANs. If you go this route, please ensure that your VLANs are configured and working before beginning this tutorial. Pay close attention to multicast traffic.
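
If you do go the VLAN route, a tagged sub-interface on EL5 is just another ifcfg file. The following is a minimal sketch only; the VLAN ID and parent device are examples, and the address simply mirrors the Storage Network plan described later:

vim /etc/sysconfig/network-scripts/ifcfg-eth0.2
# Example only: VLAN ID 2 on eth0, carrying the storage traffic
DEVICE=eth0.2
VLAN=yes
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.2.71
NETMASK=255.255.255.0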

Pre-Assembly

Before you assemble your nodes, take a moment to record the MAC addresses of each network interface and note where each interface is physically installed. This will help you later when configuring the networks. I generally create a simple text file with the MAC addresses, the interface name I intend to assign to each and where each card is physically located.

-=] an-node01
48:5B:39:3C:53:15   # eth0 - onboard interface
00:1B:21:72:96:E8   # eth1 - right-most PCIe interface
00:1B:21:72:9B:56   # eth2 - left-most PCI interface

-=] an-node02
48:5B:39:3C:53:14   # eth0 - onboard interface
00:1B:21:72:9B:5A   # eth1 - right-most PCIe interface
00:1B:21:72:96:EA   # eth2 - left-most PCI interface

OS Install

Later steps will include packages to install, so the initial OS install can be minimal. I like to change the default run-level to 3, remove rhgb quiet from the grub menu, disable the firewall and disable SELinux. In a production cluster, you will want to use firewalling and SELinux, but until you finish studying, leave them off to keep things simple.

  • Note: Before EL5.4, you could not use SELinux. It is now possible to use it, and it is recommended that you do so in any production cluster.
  • Note: Ports and protocols to open in a firewall will be discussed later in the networking section.
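
For reference, here is a rough sketch of how those post-install tweaks can be made from the command line. The paths are the EL5 defaults; adjust them if your install differs (disabling the firewall itself is covered in the networking section below):

# Boot to run-level 3 instead of 5.
sed -i 's/^id:5:initdefault:/id:3:initdefault:/' /etc/inittab
# Show boot messages by dropping 'rhgb quiet' from the kernel lines.
sed -i 's/ rhgb quiet//' /boot/grub/grub.conf
# Disable SELinux for future boots, then switch it off now (setenforce will
# complain harmlessly if SELinux is already disabled).
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0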

I like to minimize and automate my installs as much as possible. To that end, I run a little PXE server on my network and use a kickstart script to automate the install. Here is a simple one for use on a single-drive node:
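
As an illustrative sketch only, a minimal EL5 kickstart for a single-drive node might look like the following. Every value here (install URL, root password, disk layout, timezone and package group) is a placeholder to adapt to your own environment; this is not the script used to build the nodes in this tutorial:

# Illustrative minimal EL5 kickstart; adapt all values before use.
install
text
url --url http://192.168.1.254/centos5/x86_64/
lang en_US.UTF-8
keyboard us
network --device eth0 --bootproto dhcp
rootpw changeme
firewall --disabled
selinux --disabled
timezone --utc Etc/UTC
bootloader --location=mbr
clearpart --all --initlabel --drives=sda
part /boot --fstype ext3 --size=200 --ondisk=sda
part swap --size=2048 --ondisk=sda
part / --fstype ext3 --size=1 --grow --ondisk=sda
reboot

%packages
@core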

If you decide to manually install EL5 on your nodes, please try to keep the installation as small as possible. The fewer packages installed, the fewer sources of problems and vectors for attack.

Post Install OS Changes

This section discusses changes that I recommend, but that are not required.

Network Planning

The most important recommended change is to get your nodes into a consistent networking configuration. This will prove very handy when trying to keep track of your networks and where they're physically connected. It becomes exponentially more helpful as your cluster grows.

The first step is to understand the three networks we will be creating. Once you understand their role, you will need to decide which interface on the nodes will be used for each network.

Cluster Networks

The three networks are:

Network                   Acronym   Use
Back-Channel Network      BCN       Private cluster communications, virtual machine migrations, fence devices
Storage Network           SN        Used exclusively for storage communications. Possible to use as totem's redundant ring.
Internet-Facing Network   IFN       Internet-polluted network. No cluster or storage communication or devices.

Things To Consider

When planning which interfaces to connect to each network, consider the following, in order of importance:

  • If your nodes have IPMI and it shares a physical RJ-45 connector with one of the network interfaces, that interface must be on the Back-Channel Network. The reasoning is that having your fence device accessible on the Internet-Facing Network poses a major security risk, and having the IPMI interface on the Storage Network can cause problems if a fence is fired while the network is saturated with storage traffic.
  • The lowest-latency network interface should be used for the Back-Channel Network. The cluster is maintained by multicast messaging between the nodes using something called the totem protocol. Any delay in the delivery of these messages risks causing the failure and ejection of affected nodes when no actual failure existed. This will be discussed in greater detail later.
  • The network with the most raw bandwidth should be used for the Storage Network. All disk writes must be sent across the network and committed to the remote nodes before the write is declared complete. This causes the network to become the disk I/O bottleneck. Using a network with jumbo frames and high raw throughput will help minimize this bottleneck.
  • During the live migration of virtual machines, the VM's RAM is copied to the other node using the BCN. For this reason, the second-fastest network should be used for back-channel communication. However, these copies can saturate the network, so care must be taken to ensure that cluster communications get higher priority. This can be done using a managed switch. If you cannot ensure priority for totem multicast, then be sure to configure Xen later to use the storage network for migrations, as sketched just after this list.
  • The remaining, slowest interface should be used for the IFN.
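
For example, the relevant directives in /etc/xen/xend-config.sxp might look like the sketch below, which binds the migration (relocation) listener to the node's Storage Network address from the plan that follows. This is only an illustration; Xen configuration is handled in detail later:

(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '192.168.2.71')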

Planning the Networks

This paper will use the following setup. Feel free to alter the interface-to-network mapping and the IP subnets used to best suit your needs. For reasons completely my own, I like to start the final octet of my cluster IPs at 71 for node 1 and increment up from there. This is entirely arbitrary, so please use whatever makes sense to you. The remainder of this tutorial will follow the convention below:

Network   Interface   Subnet
IFN       eth0        192.168.1.0/24
SN        eth1        192.168.2.0/24
BCN       eth2        192.168.3.0/24

This translates to the following per-node configuration:

Network   Interface   an-node01 IP    an-node01 Host Name(s)                            an-node02 IP    an-node02 Host Name(s)
IFN       eth0        192.168.1.71    an-node01.ifn                                     192.168.1.72    an-node02.ifn
SN        eth1        192.168.2.71    an-node01.sn                                      192.168.2.72    an-node02.sn
BCN       eth2        192.168.3.71    an-node01, an-node01.alteeve.com, an-node01.bcn   192.168.3.72    an-node02, an-node02.alteeve.com, an-node02.bcn

Network Configuration

Now that we've planned the network, it is time to implement it.

Disable Firewalling

To "keep things simple", we will disable all firewalling on the cluster nodes. This is not recommended in production environments, obviously, so below will be a table of ports and protocols to open when you do get into production. Until then, we will simply use chkconfig to disable iptables and ip6tables.

Note: Cluster 2 does not support IPv6, so you can skip or ignore it if you wish. I like to disable it just to be certain that it can't cause issues though.

chkconfig iptables off
chkconfig ip6tables off

Now confirm that they are off by having iptables and ip6tables list their rules.

iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ip6tables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

When you do prepare to go into production, these are the protocols and ports you need to open between cluster nodes. Remember to allow multicast communications as well!

Port                   Protocol   Component
5404, 5405             UDP        cman
8084                   TCP        luci
11111                  TCP        ricci
14567                  TCP        gnbd
16851                  TCP        modclusterd
21064                  TCP        dlm
50006, 50008, 50009    TCP        ccsd
50007                  UDP        ccsd
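
When you do build that production firewall, rules along the following lines would cover the table above. This is only a sketch; it assumes that all cluster and storage traffic is confined to the BCN subnet used in this tutorial, so adjust the source networks to your setup and make sure these rules land before any final REJECT or DROP rule:

# Cluster traffic from the BCN (192.168.3.0/24 in this tutorial).
iptables -A INPUT -s 192.168.3.0/24 -p udp -m multiport --dports 5404,5405,50007 -j ACCEPT
iptables -A INPUT -s 192.168.3.0/24 -p tcp -m multiport --dports 8084,11111,14567,16851,21064,50006,50008,50009 -j ACCEPT
# Totem uses multicast, so allow multicast traffic from the BCN as well.
iptables -A INPUT -s 192.168.3.0/24 -m pkttype --pkt-type multicast -j ACCEPT
# Save the rules so they survive a reboot.
service iptables save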

Disable NetworkManager, Enable network

NetworkManager is an excellent daemon in environments where a system connects to a variety of networks. It handles changing the networking configuration whenever it senses a change in the network state, such as when a cable is unplugged or a wireless network comes or goes. As useful as this is on laptops and workstations, it can be detrimental in a cluster.

To prevent the networking from changing once we've got it set up, we want to replace the NetworkManager daemon with the network initialization script. The network script will start and stop networking, but otherwise it leaves the configuration alone. This is ideal on servers, and doubly so in clusters given their sensitivity to transient network issues.

Start by removing NetworkManager:

yum remove NetworkManager NetworkManager-glib NetworkManager-gnome NetworkManager-devel NetworkManager-glib-devel
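
If you want to confirm the removal, rpm can be queried; a package that is gone (or was never installed) is reported like this:

rpm -q NetworkManager
package NetworkManager is not installed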

Now you want to ensure that network starts with the system.

chkconfig network on
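
A quick check with chkconfig should now show the network script enabled in run-levels 2 through 5 (the exact spacing of the output may differ):

chkconfig --list network
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off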

Setup /etc/hosts

The /etc/hosts file, by default, resolves the hostname to the lo (127.0.0.1) interface. The cluster, however, uses this name to determine which interface to use for the totem protocol (and thus all cluster communications). To this end, we will remove the hostname from 127.0.0.1 and instead assign it to the IP of our BCN-connected interface. At the same time, we will add entries for all networks on each node in the cluster, plus entries for the fence devices. Once done, the edited /etc/hosts file should be suitable for copying to all nodes in the cluster.

vim /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1	localhost.localdomain localhost
::1		localhost6.localdomain6 localhost6

192.168.1.71	an-node01.ifn
192.168.2.71	an-node01.sn
192.168.3.71	an-node01 an-node01.bcn an-node01.alteeve.com

192.168.1.72	an-node02.ifn
192.168.2.72	an-node02.sn
192.168.3.72	an-node02 an-node02.bcn an-node02.alteeve.com

192.168.3.61	batou.alteeve.com	# Node Assassin
192.168.3.62	motoko.alteeve.com	# Switched PDU
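
As a quick sanity check, getent can be used to confirm that the hostname now resolves to the BCN address rather than 127.0.0.1; on an-node01 the output should look roughly like this:

getent hosts an-node01
192.168.3.71    an-node01 an-node01.bcn an-node01.alteeve.com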

Mapping Interfaces to ethX Names

Chances are good that the assignment of ethX interface names to your physical network cards is not ideal. There is no strict technical reason to change the mapping, but it will make your life a lot easier if all nodes use the same ethX names for the same subnets.

The actual process of changing the mapping is a little involved. For this reason, there is a dedicated mini-tutorial which you can find below. Please jump to it and then return once your mapping is as you like it.

Set IP Addresses

The last step in setting up the network interfaces is to manually assign the IP addresses and define the subnets for the interfaces. This involves directly editing the /etc/sysconfig/network-scripts/ifcfg-ethX files. There is a large set of options that can be set in these configuration files, but most are outside the scope of this tutorial. To get a better understanding of the available options, please see:

  • Red Hat's Interface Configuration Guide: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/s1-networkscripts-interfaces.html

Here are my three configuration files, which you can use as guides. Please do not copy these over your files! Doing so will cause your interfaces to fail outright, as every interface's MAC address is unique. Adapt these to suit your needs.

vim /etc/sysconfig/network-scripts/ifcfg-eth0
# Internet-Facing Network
HWADDR=48:5B:39:3C:53:15
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.71
NETMASK=255.255.255.0
GATEWAY=192.168.1.254

vim /etc/sysconfig/network-scripts/ifcfg-eth1
# Storage Network
HWADDR=00:1B:21:72:96:E8
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.2.71
NETMASK=255.255.255.0

vim /etc/sysconfig/network-scripts/ifcfg-eth2
# Back Channel Network
HWADDR=00:1B:21:72:9B:56
DEVICE=eth2
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.3.71
NETMASK=255.255.255.0

You will also need to set up the /etc/resolv.conf file for DNS resolution. You can learn more about this file's purpose by reading its man page; man resolv.conf. The main thing is to set valid DNS server IP addresses in the nameserver entries. Here is mine, for reference:

vim /etc/resolv.conf
search alteeve.com
nameserver 192.139.81.117
nameserver 192.139.81.1

Finally, restart the network service and your interfaces should be set up properly.

/etc/init.d/network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down interface eth1:                              [  OK  ]
Shutting down interface eth2:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface eth1:                                [  OK  ]
Bringing up interface eth2:                                [  OK  ]

You can verify your configuration using the ifconfig tool.

ifconfig
eth0      Link encap:Ethernet  HWaddr 48:5B:39:3C:53:15  
          inet addr:192.168.1.71  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::92e6:baff:fe71:82ea/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1727 errors:0 dropped:0 overruns:0 frame:0
          TX packets:655 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:208916 (204.0 KiB)  TX bytes:133171 (130.0 KiB)
          Interrupt:252 Base address:0x2000 

eth1      Link encap:Ethernet  HWaddr 00:1B:21:72:96:E8  
          inet addr:192.168.2.71  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::221:91ff:fe19:9653/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:998 errors:0 dropped:0 overruns:0 frame:0
          TX packets:47 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:97702 (95.4 KiB)  TX bytes:6959 (6.7 KiB)
          Interrupt:16 

eth2      Link encap:Ethernet  HWaddr 00:1B:21:72:9B:56  
          inet addr:192.168.3.71  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::20e:cff:fe59:46e4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5241 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4439 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1714026 (1.6 MiB)  TX bytes:1624392 (1.5 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:6449 (6.2 KiB)  TX bytes:6449 (6.2 KiB)

Setting up SSH

 
