Red Hat Cluster Service 2 Tutorial - Archive

From Alteeve Wiki

Overview

This paper has one goal:

  • Creating a 2-node, high-availability cluster hosting Xen virtual machines using RHCS "stable 2", with DRBD for synchronized storage.

Technologies We Will Use

  • Enterprise Linux 5, specifically CentOS v5.5.
  • Red Hat Cluster Services "Stable" version 2. This describes the following core components:
    • OpenAIS: provides cluster communications using the totem protocol.
    • Cluster Manager (cman): handles starting, stopping and managing the cluster.
    • Resource Manager (rgmanager): manages cluster resources and services, and handles service recovery during failures.
    • Cluster Logical Volume Manager (clvm): cluster-aware (disk) volume manager. Backs GFS2 file systems and Xen virtual machines.
    • Global File System version 2 (gfs2): cluster-aware, concurrently mountable file system.
  • Distributed Replicated Block Device (DRBD): keeps shared data synchronized across cluster nodes.
  • Xen: hypervisor that controls and supports virtual machines.

A Note on Patience

There is nothing inherently hard about clustering. However, there are many components that you need to understand before you can begin, so clustering has a steep learning curve.

You must have patience. Lots of it.

Many technologies can be learned by creating a very simple base and then building on it. The classic "Hello, World!" script created when first learning a programming language is an example of this. Unfortunately, there is no real analog to this in clustering. Even the most basic cluster requires several pieces to be in place and working together. If you try to rush by ignoring pieces you think are not important, you will almost certainly waste time. A good example is setting aside fencing, thinking that your test cluster's data isn't important. The cluster software has no concept of "test". It treats everything as critical all the time and will shut down if anything goes wrong.

Take your time, work through these steps, and you will have the foundation cluster sooner than you realize. Clustering is fun because it is a challenge.

Prerequisites

It is assumed that you are familiar with Linux systems administration, specifically Red Hat Enterprise Linux and its derivatives. You will need to have somewhat advanced networking experience as well. You should be comfortable working in a terminal (directly or over ssh). Familiarity with XML will help, but is not strictly required, as its use here is pretty self-evident.

If you feel a little out of your depth at times, don't hesitate to set this tutorial aside. Branch over to the components you feel the need to study more, then return and continue on. Finally, and perhaps most importantly, you must have patience! If you have a manager asking you to "go live" with a cluster in a month, tell him or her that it simply won't happen. If you rush, you will skip important points and you will fail. Patience is vastly more important than any pre-existing skill.

Focus and Goal

There is a different cluster for every problem. Generally speaking, though, there are two main problems that clusters try to solve: performance and high availability. Performance clusters are generally tailored to the application requiring the performance increase. There are some general tools for performance clustering, like Red Hat's LVS (Linux Virtual Server) for load-balancing common applications like the Apache web server.

This tutorial will focus on High Availability clustering, often shortened to simply HA, which is not to be confused with the Linux-HA "heartbeat" cluster suite that we will not be using here. The cluster will provide shared file systems and high availability for Xen-based virtual servers. The goal will be to have the virtual servers live-migrate during planned node outages and automatically restart on a surviving node when the original host node fails.

A very brief overview:

High Availability clusters like ours have two main parts: cluster management and resource management.

The cluster itself is responsible for keeping the cluster nodes together in a group. This group is known as a "Closed Process Group", or CPG. When a node fails, the cluster manager must detect the failure, reliably eject the node from the cluster and re-form the CPG. Each time the cluster changes, or "re-forms", the resource manager is called. The resource manager checks to see how the cluster changed, consults its configuration and determines what to do, if anything.

All of this will be discussed in more detail later on. For now, it's sufficient to keep these two major roles in mind and understand that they are somewhat independent entities.

Platform

This tutorial was written using CentOS version 5.5, x86_64. No attempt was made to test on i686 or other EL5 derivatives. That said, there is no reason to believe that this tutorial will not apply to any variant. As much as possible, the language will be distro-agnostic. Because of memory constraints, it is advised that you use an x86_64 (64-bit) platform if at all possible.

Note that significant changes were made to how RHCS is supported as of EL5.4. It is strongly advised that you use version 5.4 or newer while working with this tutorial.

Base Setup

Before we can look at the cluster, we must first build two cluster nodes and then install the operating system.

Hardware Requirements

The bare minimum requirements are:

  • All hardware must be supported by EL5. It is strongly recommended that you check compatibility before making any purchases.
  • A dual-core CPU with hardware virtualization support.
  • Three network cards, at least one of which should be gigabit or faster.
  • One hard drive.
  • 2 GiB of RAM.
  • A fence device. This can be an IPMI-enabled server, a Node Assassin, a switched PDU or similar.

This tutorial was written using the following hardware:

This is not an endorsement of the above hardware. I put a heavy emphasis on minimizing power consumption and bought what was within my budget. This hardware was never meant to be put into production, but instead was chosen to serve the purpose of my own study and for creating this tutorial. What you ultimately choose to use, provided it meets the minimum requirements, is entirely up to you and your requirements.

Note: I use three physical NICs, but you can get away with two by merging the storage and back-channel networks, which we will discuss shortly. If you are really in a pinch, you could create three aliases on one interface and isolate them using VLANs. If you go this route, please ensure that your VLANs are configured and working before beginning this tutorial. Pay close attention to multicast traffic.

Pre-Assembly

Before you assemble your nodes, take a moment to record the MAC address of each network interface and note where each interface is physically installed. This will help you later when configuring the networks. I generally create a simple text file listing each MAC address, the interface name I intend to assign to it and where it is physically located.

-=] an-node01
48:5B:39:3C:53:15   # eth0 - onboard interface
00:1B:21:72:96:E8   # eth1 - right-most PCIe interface
00:1B:21:72:9B:56   # eth2 - left-most PCI interface

-=] an-node02
48:5B:39:3C:53:14   # eth0 - onboard interface
00:1B:21:72:9B:5A   # eth1 - right-most PCIe interface
00:1B:21:72:96:EA   # eth2 - left-most PCI interface
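Rather than reading MAC addresses off each card, you can have the kernel print them and then add the physical location by hand. The sketch below assumes the kernel exposes each NIC as /sys/class/net/<iface>/address (the 2.6.18 EL5 kernel does); the function name list_macs is hypothetical. It prints lines in the same format as the notes above.

```shell
#!/bin/bash
# list_macs (hypothetical helper): print "MAC   # iface" for every NIC so the
# output can seed the per-node notes file. Assumes sysfs exposes
# /sys/class/net/<iface>/address; another directory may be passed as $1.
list_macs() {
    local root="${1:-/sys/class/net}"
    local dev name mac
    for dev in "$root"/*; do
        [ -r "$dev/address" ] || continue        # skip entries without a MAC
        name=$(basename "$dev")
        [ "$name" = "lo" ] && continue           # loopback has no physical port
        mac=$(tr 'a-f' 'A-F' < "$dev/address")   # upper-case, as in the notes
        printf '%s   # %s\n' "$mac" "$name"
    done
}
```

Run list_macs on each node before changing anything, then append where each card sits in the chassis.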

OS Install

Later steps will include packages to install, so the initial OS install can be minimal. I like to change the default run-level to 3, remove rhgb quiet from the grub menu, disable the firewall and disable SELinux. In a production cluster, you will want to use firewalling and SELinux, but until you finish studying, leave them off to keep things simple.

  • Note: Before EL5.4, you could not use SELinux. It is now possible to use it, and it is recommended that you do so in any production cluster.
  • Note: Ports and protocols to open in a firewall will be discussed later in the networking section.

I like to minimize and automate my installs as much as possible. To that end, I run a little PXE server on my network and use a kickstart script to automate the install. Here is a simple one for use on a single-drive node:

If you decide to manually install EL5 on your nodes, please try to keep the installation as small as possible. The fewer packages installed, the fewer sources of problems and vectors for attack.

Post Install OS Changes

This section discusses changes that I recommend, but that are not required.

Network Planning

The most important recommended change is to give your nodes a consistent networking configuration. This will prove very handy when trying to keep track of your networks and where they're physically connected, and it becomes even more helpful as your cluster grows.

The first step is to understand the three networks we will be creating. Once you understand their role, you will need to decide which interface on the nodes will be used for each network.

Cluster Networks

The three networks are:

Network                  Acronym  Use
Back-Channel Network     BCN      Private cluster communications, virtual machine migrations, fence devices.
Storage Network          SN       Used exclusively for storage communications. Possible to use as totem's redundant ring.
Internet-Facing Network  IFN      Internet-polluted network. No cluster or storage communication or devices.

Things To Consider

When planning which interfaces to connect to each network, consider the following, in order of importance:

  • If your nodes have IPMI and an interface sharing a physical RJ-45 connector, this must be on the Back-Channel Network. The reasoning is that having your fence device accessible on the Internet-Facing Network poses a major security risk. Having the IPMI interface on the Storage Network can cause problems if a fence is fired and the network is saturated with storage traffic.
  • The lowest-latency network interface should be used as the Back-Channel Network. The cluster is maintained by multicast messaging between the nodes using something called the totem protocol. Any delay in the delivery of these messages can cause the cluster to declare a failure and eject the affected nodes when no actual failure existed. This will be discussed in greater detail later.
  • The network with the most raw bandwidth should be used for the Storage Network. All disk writes must be sent across the network and committed on the remote node before the write is declared complete. This makes the network the disk I/O bottleneck. Using a network with jumbo frames and high raw throughput will help minimize this bottleneck.
  • During the live migration of virtual machines, the VM's RAM is copied to the other node using the BCN. For this reason, the second-fastest network should be used for back-channel communication. However, these copies can saturate the network, so care must be taken to ensure that cluster communications get higher priority. This can be done using a managed switch. If you cannot ensure priority for totem multicast, then be sure to configure Xen later to use the Storage Network for migrations.
  • The remaining, slowest interface should be used for the IFN.

Planning the Networks

This paper will use the following setup. Feel free to alter the interface-to-network mapping and the IP subnets to best suit your needs. For reasons completely my own, I like to start the final octet of my cluster IPs at 71 for node 1 and increment from there. This is entirely arbitrary, so please use whatever makes sense to you. The remainder of this tutorial will follow the convention below:

Network  Interface  Subnet
IFN      eth0       192.168.1.0/24
SN       eth1       192.168.2.0/24
BCN      eth2       192.168.3.0/24

This translates to the following per-node configuration:

Node       Network  Interface  IP Address     Host Name(s)
an-node01  IFN      eth0       192.168.1.71   an-node01.ifn
an-node01  SN       eth1       192.168.2.71   an-node01.sn
an-node01  BCN      eth2       192.168.3.71   an-node01, an-node01.alteeve.com, an-node01.bcn
an-node02  IFN      eth0       192.168.1.72   an-node02.ifn
an-node02  SN       eth1       192.168.2.72   an-node02.sn
an-node02  BCN      eth2       192.168.3.72   an-node02, an-node02.alteeve.com, an-node02.bcn
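Because the addressing follows a fixed pattern (the third octet selects the network, the final octet is 70 plus the node number), the per-node host entries can be generated rather than typed. This is a sketch of that convention only: the function name gen_hosts is hypothetical, and the alteeve.com domain and 192.168.x.0/24 subnets are this tutorial's values, so substitute your own.

```shell
#!/bin/bash
# gen_hosts (hypothetical helper): emit hosts-file lines for nodes 1..N using
# this tutorial's convention: IFN = 192.168.1.x, SN = 192.168.2.x,
# BCN = 192.168.3.x, where x = 70 + node number. The short name and FQDN go
# on the BCN line, since the cluster uses the host name to pick its interface.
gen_hosts() {
    local count="$1" domain="${2:-alteeve.com}"
    local n name octet
    for n in $(seq 1 "$count"); do
        name=$(printf 'an-node%02d' "$n")
        octet=$((70 + n))
        printf '192.168.1.%d\t%s.ifn\n' "$octet" "$name"
        printf '192.168.2.%d\t%s.sn\n' "$octet" "$name"
        printf '192.168.3.%d\t%s %s.bcn %s.%s\n' "$octet" "$name" "$name" "$name" "$domain"
        echo    # blank line between nodes
    done
}
```

For example, gen_hosts 2 prints the six node lines used in /etc/hosts below; fence device entries still have to be added by hand.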

Network Configuration

Now that we've planned the network, it is time to implement it.

Disable Firewalling

To "keep things simple", we will disable all firewalling on the cluster nodes. This is obviously not recommended in production environments, so a table of the ports and protocols to open when you do go into production is given below. Until then, we will use chkconfig to keep iptables and ip6tables from starting at boot, and stop the copies that are running now.

Note: Cluster 2 does not support IPv6, so you can skip ip6tables if you wish. I like to disable it anyway, just to be certain that it can't cause issues.

chkconfig iptables off
chkconfig ip6tables off
/etc/init.d/iptables stop
/etc/init.d/ip6tables stop

Now confirm that they are off by having iptables and ip6tables list their rules.

iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ip6tables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

When you do prepare to go into production, these are the protocols and ports you need to open between cluster nodes. Remember to allow multicast communications as well!

Port                 Protocol  Component
5404, 5405           UDP       cman
8084                 TCP       luci
11111                TCP       ricci
14567                TCP       gnbd
16851                TCP       modclusterd
21064                TCP       dlm
50006, 50008, 50009  TCP       ccsd
50007                UDP       ccsd
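When you do turn the firewall back on, the node ports in the table above translate into iptables rules along the lines of the sketch below. It only prints the commands so you can review them first. The function name is hypothetical, restricting sources to your BCN and SN subnets is an assumption based on this tutorial's network plan, and luci is left out since it only matters on the machine running the luci web interface.

```shell
#!/bin/bash
# print_cluster_rules (hypothetical helper): print, but do not run, iptables
# commands that open the RHCS "stable 2" node ports to each source subnet
# given as an argument. -I puts each rule at the top of INPUT so it is
# matched before any REJECT rule already in the chain.
print_cluster_rules() {
    local net
    for net in "$@"; do
        # cman (openais/totem)
        echo "iptables -I INPUT -s $net -p udp -m multiport --dports 5404,5405 -j ACCEPT"
        # ccsd (UDP)
        echo "iptables -I INPUT -s $net -p udp --dport 50007 -j ACCEPT"
        # ricci, gnbd, modclusterd, dlm and ccsd (TCP)
        echo "iptables -I INPUT -s $net -p tcp -m multiport --dports 11111,14567,16851,21064,50006,50008,50009 -j ACCEPT"
    done
    # totem uses multicast; accept the whole 224.0.0.0/4 range
    echo "iptables -I INPUT -d 224.0.0.0/4 -j ACCEPT"
}
```

Once the output of print_cluster_rules 192.168.3.0/24 192.168.2.0/24 looks right, pipe it to sh and run service iptables save so the rules survive a reboot. Accepting all multicast is deliberately broad; narrow it to your cluster's multicast address if you know it.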

Disable NetworkManager, Enable network

NetworkManager is an excellent daemon in environments where a system connects to a variety of networks. It changes the networking configuration whenever it senses a change in the network state, like when a cable is unplugged or a wireless network comes or goes. As useful as this is on laptops and workstations, it can be detrimental in a cluster.

To prevent the networking from changing once we've got it set up, we want to replace the NetworkManager daemon with the network initialization script. The network script will start and stop networking, but otherwise it will leave the configuration alone. This is ideal in servers, and doubly so in clusters given their sensitivity to transient network issues.

Start by removing NetworkManager:

yum remove NetworkManager NetworkManager-glib NetworkManager-gnome NetworkManager-devel NetworkManager-glib-devel

Now you want to ensure that network starts with the system.

chkconfig network on

Setup /etc/hosts

By default, the /etc/hosts file resolves the hostname to the lo (127.0.0.1) interface. The cluster, however, uses this name to decide which interface to use for the totem protocol (and thus all cluster communications). To this end, we will remove the hostname from 127.0.0.1 and instead put it on the IP of our BCN-connected interface. At the same time, we will add entries for all networks for each node in the cluster, plus entries for the fence devices. Once done, the edited /etc/hosts file should be suitable for copying to all nodes in the cluster.

vim /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1	localhost.localdomain localhost
::1		localhost6.localdomain6 localhost6

192.168.1.71	an-node01.ifn
192.168.2.71	an-node01.sn
192.168.3.71	an-node01 an-node01.bcn an-node01.alteeve.com

192.168.1.72	an-node02.ifn
192.168.2.72	an-node02.sn
192.168.3.72	an-node02 an-node02.bcn an-node02.alteeve.com

192.168.3.61	batou.alteeve.com	# Node Assassin
192.168.3.62	motoko.alteeve.com	# Switched PDU
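Since the cluster picks its totem interface from the host name, it is worth checking that each node's own name now maps to its BCN address rather than 127.0.0.1. getent hosts $(uname -n) asks the system resolver; the hypothetical helpers below do the same lookup directly against a hosts-format file, which makes it easy to check a copy before installing it on each node.

```shell
#!/bin/bash
# hosts_lookup (hypothetical helper): print the first address in a
# hosts-format file whose name list contains $1. Comments are ignored.
hosts_lookup() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }                                  # drop comments
        NF >= 2 { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}

# check_cluster_name (hypothetical helper): fail if the name is missing
# or still resolves to a loopback address.
check_cluster_name() {
    local ip
    ip=$(hosts_lookup "$1" "${2:-/etc/hosts}")
    case "$ip" in
        "")    echo "$1: not found"; return 1 ;;
        127.*) echo "$1: still on loopback ($ip)"; return 1 ;;
        *)     echo "$1: $ip" ;;
    esac
}
```

On an-node01, check_cluster_name "$(uname -n)" should report 192.168.3.71.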

Mapping Interfaces to ethX Names

Chances are good that the assignment of ethX interface names to your physical network cards is not ideal. There is no strict technical reason to change the mapping, but it will make your life a lot easier if all nodes use the same ethX names for the same subnets.

The actual process of changing the mapping is a little involved. For this reason, there is a dedicated mini-tutorial which you can find below. Please jump to it and then return once your mapping is as you like it.

Set IP Addresses

The last step in setting up the network interfaces is to manually assign the IP addresses and define the subnets for the interfaces. This involves directly editing the /etc/sysconfig/network-scripts/ifcfg-ethX files. There is a large set of options that can be set in these configuration files, but most are outside the scope of this tutorial. To get a better understanding of the available options, please see:

Here are my three configuration files, which you can use as guides. Please do not copy these over your files! Doing so will cause your interfaces to fail outright, as every interface's MAC address is unique. Adapt these to suit your needs.

vim /etc/sysconfig/network-scripts/ifcfg-eth0
# Internet-Facing Network
HWADDR=48:5B:39:3C:53:15
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.71
NETMASK=255.255.255.0
GATEWAY=192.168.1.254
vim /etc/sysconfig/network-scripts/ifcfg-eth1
# Storage Network
HWADDR=00:1B:21:72:96:E8
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.2.71
NETMASK=255.255.255.0
vim /etc/sysconfig/network-scripts/ifcfg-eth2
# Back Channel Network
HWADDR=00:1B:21:72:9B:56
DEVICE=eth2
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.3.71
NETMASK=255.255.255.0
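Because an ifcfg file with the wrong HWADDR will fail, it can be worth comparing each file against the hardware before restarting the network. This sketch assumes the same /sys/class/net/<iface>/address layout used earlier; the helper name is hypothetical. It reads DEVICE and HWADDR from the file and compares them case-insensitively.

```shell
#!/bin/bash
# check_hwaddr (hypothetical helper): compare the HWADDR in one ifcfg-ethX
# file with the MAC the kernel reports for that file's DEVICE.
# Returns non-zero if the interface is missing or the MACs differ.
check_hwaddr() {
    local cfg="$1" root="${2:-/sys/class/net}"
    local dev want have
    # Values may be written with or without double quotes; strip them.
    dev=$(sed -n 's/^DEVICE=//p' "$cfg" | tr -d '"')
    want=$(sed -n 's/^HWADDR=//p' "$cfg" | tr -d '"' | tr 'A-F' 'a-f')
    if [ -z "$dev" ] || [ ! -r "$root/$dev/address" ]; then
        echo "$cfg: interface '$dev' not found"
        return 1
    fi
    have=$(tr 'A-F' 'a-f' < "$root/$dev/address")
    if [ "$want" = "$have" ]; then
        echo "$cfg: OK ($dev is $have)"
    else
        echo "$cfg: HWADDR $want does not match $dev ($have)"
        return 1
    fi
}
```

To check every interface at once: for f in /etc/sysconfig/network-scripts/ifcfg-eth*; do check_hwaddr "$f"; done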

You will also need to set up the /etc/resolv.conf file for DNS resolution. You can learn more about this file's purpose by reading its man page: man resolv.conf. The main thing is to set valid DNS server IP addresses on the nameserver lines. Here is mine, for reference:

vim /etc/resolv.conf
search alteeve.com
nameserver 192.139.81.117
nameserver 192.139.81.1

Finally, restart network and your interfaces should be set up properly.

/etc/init.d/network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down interface eth1:                              [  OK  ]
Shutting down interface eth2:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface eth1:                                [  OK  ]
Bringing up interface eth2:                                [  OK  ]

You can verify your configuration using the ifconfig tool.

ifconfig
eth0      Link encap:Ethernet  HWaddr 48:5B:39:3C:53:15  
          inet addr:192.168.1.71  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::92e6:baff:fe71:82ea/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1727 errors:0 dropped:0 overruns:0 frame:0
          TX packets:655 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:208916 (204.0 KiB)  TX bytes:133171 (130.0 KiB)
          Interrupt:252 Base address:0x2000 

eth1      Link encap:Ethernet  HWaddr 00:1B:21:72:96:E8  
          inet addr:192.168.2.71  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::221:91ff:fe19:9653/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:998 errors:0 dropped:0 overruns:0 frame:0
          TX packets:47 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:97702 (95.4 KiB)  TX bytes:6959 (6.7 KiB)
          Interrupt:16 

eth2      Link encap:Ethernet  HWaddr 00:1B:21:72:9B:56  
          inet addr:192.168.3.71  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::20e:cff:fe59:46e4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5241 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4439 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1714026 (1.6 MiB)  TX bytes:1624392 (1.5 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:6449 (6.2 KiB)  TX bytes:6449 (6.2 KiB)

Setting up SSH

Setting up SSH shared keys will allow your nodes to pass files between one another and execute commands remotely without needing to enter a password. This will be needed later when we want to enable applications like libvirtd and virt-manager.

SSH is, on its own, a very big topic. If you are not familiar with SSH, please take some time to learn about it before proceeding. A great first step is the Wikipedia entry on SSH, as well as the SSH man page: man ssh.

Keeping SSH connections straight in your head can be a bit confusing. When you connect to a remote machine, you start the connection on your machine as the user you are logged in as. This is the source user. When you call the remote machine, you tell it which user you want to log in as. This is the remote user.

You will need to create an SSH key for each source user on each node, and then you will need to copy the newly generated public key to each remote machine's user directory that you want to connect to. In this example, we want to connect to either node, from either node, as the root user. So we will create a key for each node's root user and then copy the generated public key to the other node's root user's directory.

For each user, on each machine you want to connect from, run:

# The '2047' is just to screw with brute-force tools a bit. :)
ssh-keygen -t rsa -N "" -b 2047 -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a1:65:a9:50:bb:15:ae:b1:6e:06:12:4a:29:d1:68:f3 root@an-node01.alteeve.com

This will create two files: the private key called ~/.ssh/id_rsa and the public key called ~/.ssh/id_rsa.pub. The private key must never be group or world readable! That is, it should be set to mode 0600.
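ssh-keygen normally creates the private key with a safe mode, but keys copied around by hand often lose it. A minimal check, assuming GNU stat's -c '%a' format (part of coreutils on EL5); the helper name and its --fix option are hypothetical.

```shell
#!/bin/bash
# check_key_mode (hypothetical helper): report a private key that is group
# or world accessible; with --fix as the second argument, chmod it to 0600.
check_key_mode() {
    local key="${1:-$HOME/.ssh/id_rsa}" mode
    mode=$(stat -c '%a' "$key") || return 1
    case "$mode" in
        *00) return 0 ;;    # group and other digits are both 0: fine
    esac
    if [ "$2" = "--fix" ]; then
        chmod 0600 "$key" && echo "$key: mode $mode changed to 600"
        return
    fi
    echo "$key: mode $mode is too open; run chmod 0600 $key"
    return 1
}
```

Run check_key_mode ~/.ssh/id_rsa on each node after generating or copying keys.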

The two files should look like:

Private key:

cat ~/.ssh/id_rsa
-----BEGIN RSA PRIVATE KEY-----
MIIEnwIBAAKCAQBTNg6FZyDKm4GAm7c+F2enpLWy+t8ZZjm4Z3Q7EhX09ukqk/Qm
MqprtI9OsiRVjce+wGx4nZ8+Z0NHduCVuwAxG0XG7FpKkUJC3Qb8KhyeIpKEcfYA
tsDUFnWddVF8Tsz6dDOhb61tAke77d9E01NfyHp88QBxjJ7w+ZgB2eLPBFm6j1t+
K50JHwdcFfxrZFywKnAQIdH0NCs8VaW91fQZBupg4OGOMpSBnVzoaz2ybI9bQtbZ
4GwhCghzKx7Qjz20WiqhfPMfFqAZJwn0WXfjALoioMDWavTbx+J2HM8KJ8/YkSSK
dDEgZCItg0Q2fC35TDX+aJGu3xNfoaAe3lL1AgEjAoIBABVlq/Zq+c2y9Wo2q3Zd
yjJsLrj+rmWd8ZXRdajKIuc4LVQXaqq8kjjz6lYQjQAOg9H291I3KPLKGJ1ZFS3R
AAygnOoCQxp9H6rLHw2kbcJDZ4Eknlf0eroxqTceKuVzWUe3ev2gX8uS3z70BjZE
+C6SoydxK//w9aut5UJN+H5f42p95IsUIs0oy3/3KGPHYrC2Zgc2TIhe25huie/O
psKhHATBzf+M7tHLGia3q682JqxXru8zhtPOpEAmU4XDtNdL+Bjv+/Q2HMRstJXe
2PU3IpVBkirEIE5HlyOV1T802KRsSBelxPV5Y6y5TRq+cEwn0G2le1GiFBjd0xQd
0csCgYEA2BWkxSXhqmeb8dzcZnnuBZbpebuPYeMtWK/MMLxvJ50UCUfVZmA+yUUX
K9fAUvkMLd7V8/MP7GrdmYq2XiLv6IZPUwyS8yboovwWMb+72vb5QSnN6LAfpUEk
NRd5JkWgqRstGaUzxeCRfwfIHuAHikP2KeiLM4TfBkXzhm+VWjECgYBilQEBHvuk
LlY2/1v43zYQMSZNHBSbxc7R5mnOXNFgapzJeFKvaJbVKRsEQTX5uqo83jRXC7LI
t14pC23tpW1dBTi9bNLzQnf/BL9vQx6KFfgrXwy8KqXuajfv1ECH6ytqdttkUGZt
TE/monjAmR5EVElvwMubCPuGDk9zC7iQBQKBgG8hEukMKunsJFCANtWdyt5NnKUB
X66vWSZLyBkQc635Av11Zm8qLusq2Ld2RacDvR7noTuhkykhBEBV92Oc8Gj0ndLw
hhamS8GI9Xirv7JwYu5QA377ff03cbTngCJPsbYN+e/uj6eYEE/1X5rZnXpO1l6y
G7QYcrLE46Q5YsCrAoGAL+H5LG4idFEFTem+9Tk3hDUhO2VpGHYFXqMdctygNiUn
lQ6Oj7Z1JbThPJSz0RGF4wzXl/5eJvn6iPbsQDpoUcC1KM51FxGn/4X2lSCZzgqr
vUtslejUQJn96YRZ254cZulF/YYjHyUQ3byhDRcr9U2CwUBi5OcbFTomlvcQgHcC
gYEAtIpaEWt+Akz9GDJpKM7Ojpk8wTtlz2a+S5fx3WH/IVURoAzZiXzvonVIclrH
5RXFiwfoXlMzIulZcrBJZfTgRO9A2v9rE/ZRm6qaDrGe9RcYfCtxGGyptMKLdbwP
UW1emRl5celU9ZEZRBpIVTES5ZVWqD2RkkkNNJbPf5F/x+w=
-----END RSA PRIVATE KEY-----

Public key:

cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQBTNg6FZyDKm4GAm7c+F2enpLWy+t8ZZjm4Z3Q7EhX09ukqk/QmMqprtI9OsiRVjce+wGx4nZ8+Z0NHduCVuwAxG0XG7FpKkUJC3Qb8KhyeIpKEcfYAtsDUFnWddVF8Tsz6dDOhb61tAke77d9E01NfyHp88QBxjJ7w+ZgB2eLPBFm6j1t+K50JHwdcFfxrZFywKnAQIdH0NCs8VaW91fQZBupg4OGOMpSBnVzoaz2ybI9bQtbZ4GwhCghzKx7Qjz20WiqhfPMfFqAZJwn0WXfjALoioMDWavTbx+J2HM8KJ8/YkSSKdDEgZCItg0Q2fC35TDX+aJGu3xNfoaAe3lL1 root@an-node01.alteeve.com

Copy the public key and then ssh normally into the remote machine as the root user. Create a file called ~/.ssh/authorized_keys and paste in the key.

From an-node01, type:

ssh root@an-node02
The authenticity of host 'an-node02 (192.168.3.72)' can't be established.
RSA key fingerprint is 55:58:c3:32:e4:e6:5e:32:c1:db:5c:f1:36:e2:da:4b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'an-node02,192.168.3.72' (RSA) to the list of known hosts.
Last login: Fri Mar 11 20:45:58 2011 from 192.168.1.202

You will now be logged into an-node02 as the root user. Create the ~/.ssh/authorized_keys file and paste into it the public key from an-node01. If the remote user hasn't used ssh yet, their ~/.ssh directory will not exist, so create it first with mkdir -m 0700 ~/.ssh.

cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQBTNg6FZyDKm4GAm7c+F2enpLWy+t8ZZjm4Z3Q7EhX09ukqk/QmMqprtI9OsiRVjce+wGx4nZ8+Z0NHduCVuwAxG0XG7FpKkUJC3Qb8KhyeIpKEcfYAtsDUFnWddVF8Tsz6dDOhb61tAke77d9E01NfyHp88QBxjJ7w+ZgB2eLPBFm6j1t+K50JHwdcFfxrZFywKnAQIdH0NCs8VaW91fQZBupg4OGOMpSBnVzoaz2ybI9bQtbZ4GwhCghzKx7Qjz20WiqhfPMfFqAZJwn0WXfjALoioMDWavTbx+J2HM8KJ8/YkSSKdDEgZCItg0Q2fC35TDX+aJGu3xNfoaAe3lL1 root@an-node01.alteeve.com
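Pasting by hand works, but it is easy to add a key twice or leave the file with loose permissions. The hypothetical helper below appends a key only if an identical line isn't already present, and sets the usual modes (0700 on ~/.ssh, 0600 on authorized_keys). If your OpenSSH package includes ssh-copy-id, it does much the same job.

```shell
#!/bin/bash
# add_authorized_key (hypothetical helper): append one public key line to
# authorized_keys unless an identical line is already there.
add_authorized_key() {
    local key="$1" dir="${2:-$HOME/.ssh}"
    mkdir -p "$dir" && chmod 0700 "$dir" || return 1
    touch "$dir/authorized_keys" && chmod 0600 "$dir/authorized_keys" || return 1
    # -x: whole line, -F: fixed string (keys contain '+' and '/')
    grep -qxF -- "$key" "$dir/authorized_keys" ||
        printf '%s\n' "$key" >> "$dir/authorized_keys"
}
```

From an-node01, one way to run it remotely is ssh root@an-node02 "$(declare -f add_authorized_key); add_authorized_key '$(cat ~/.ssh/id_rsa.pub)'", which asks for the password one last time.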

Now log out and then log back into the remote machine. This time, the connection should succeed without having entered a password!

Various applications will connect to the other node using different methods and networks. Each connection, when first established, will prompt you to confirm that you trust the remote host's key, as we saw above. Many programs can't handle this prompt and will simply fail to connect. To get around this, I ssh into both nodes using all of their hostnames. This populates a file called ~/.ssh/known_hosts. Once you have done this on one node, you can simply copy known_hosts to the other nodes' and users' ~/.ssh/ directories.

I simply paste this into a terminal, answering yes and then immediately exiting from the ssh session. This is a bit tedious, I admit. Take the time to check the fingerprints as they are displayed to you. It is a bad habit to blindly type yes.

Alter this to suit your host names.

ssh root@an-node01 && \
ssh root@an-node01.alteeve.com && \
ssh root@an-node01.bcn && \
ssh root@an-node01.sn && \
ssh root@an-node01.ifn && \
ssh root@an-node02 && \
ssh root@an-node02.alteeve.com && \
ssh root@an-node02.bcn && \
ssh root@an-node02.sn && \
ssh root@an-node02.ifn
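The chain above opens a shell on each host that you then have to exit. A variation, sketched below, runs the harmless command true on each host instead, so ssh still shows you the fingerprint prompt but returns on its own. The name list mirrors this tutorial's naming (short name, .alteeve.com, .bcn, .sn, .ifn); both function names are hypothetical.

```shell
#!/bin/bash
# node_hostnames (hypothetical helper): every name each node is known by in
# /etc/hosts, following this tutorial's naming convention.
node_hostnames() {
    local node suffix
    for node in "$@"; do
        for suffix in "" .alteeve.com .bcn .sn .ifn; do
            echo "${node}${suffix}"
        done
    done
}

# seed_known_hosts (hypothetical helper): connect once to each name so its
# host key is recorded in ~/.ssh/known_hosts. Check each fingerprint
# before answering "yes".
seed_known_hosts() {
    local h
    for h in $(node_hostnames "$@"); do
        ssh "root@$h" true || echo "failed: $h"
    done
}
```

Usage: seed_known_hosts an-node01 an-node02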

Altering Boot Up

Note: These are optional steps.

There are two changes I like to make on my nodes. These are not required, but I find they help keep things as simple as possible, particularly in the earlier learning and testing stages.

Changing the Default Run-Level

If you choose not to implement this change, please change any references to /etc/rc3.d to /etc/rc5.d later in this tutorial.

I prefer to minimize the running daemons and apps on my nodes for two reasons: performance and security. One of the simplest ways to minimize the number of running programs is to change the run-level to 3 by editing /etc/inittab. This tells the node not to start the graphical interface when it boots, and instead to simply boot to a text console login.

This change is actually quite simple. Simply edit /etc/inittab and change the line id:5:initdefault: to id:3:initdefault:.

vim /etc/inittab
# Default runlevel. The runlevels used by RHS are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
# 
id:3:initdefault:
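If you would rather not open an editor on every node, the same change can be made with sed. This is a sketch: it keeps a .bak copy and only touches a line that currently reads exactly id:5:initdefault:. The helper name is hypothetical.

```shell
#!/bin/bash
# set_runlevel3 (hypothetical helper): switch the default run-level in an
# inittab-format file from 5 to 3, keeping a backup as <file>.bak.
set_runlevel3() {
    local file="${1:-/etc/inittab}"
    sed -i.bak 's/^id:5:initdefault:$/id:3:initdefault:/' "$file" || return 1
    # Succeed only if the file now sets run-level 3 (also true if it already did).
    grep -q '^id:3:initdefault:$' "$file"
}
```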

If you are still in a graphical environment and want to disable the GUI without rebooting, you can run init 3. Conversely, if you want to start the GUI for a certain task, you can do so by running init 5.

Making Boot Messages Visible

Another optional step, in-line with the change above, is to disable the rhgb (Red Hat Graphical Boot) and quiet kernel arguments. These options provide the clean boot screen you normally see with EL5, but they also hide a lot of boot messages that we may find helpful.

To make this change, edit the grub boot-loader menu and remove the rhgb quiet arguments from the kernel /vmlinuz... line. These arguments are usually the last ones on the line. If you leave this until later, you may see two or more kernel entries; delete these arguments wherever they are found.

vim /boot/grub/menu.lst

Change:

title CentOS (2.6.18-194.32.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.32.1.el5 ro root=LABEL=/ rhgb quiet
        initrd /initrd-2.6.18-194.32.1.el5.img

To:

title CentOS (2.6.18-194.32.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.32.1.el5 ro root=LABEL=/
        initrd /initrd-2.6.18-194.32.1.el5.img
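The same edit can be scripted, which handles the multiple-entry case mentioned above. The sketch below removes rhgb and quiet from every kernel line. It also edits module lines, on the assumption that once kernel-xen is installed, the dom0 kernel's arguments appear on a module /vmlinuz... line beneath kernel /xen.gz...; drop that part if your menu differs. It relies on GNU sed, keeps a .bak copy, and the helper name is hypothetical.

```shell
#!/bin/bash
# strip_rhgb_quiet (hypothetical helper): remove the "rhgb" and "quiet"
# arguments from kernel and module lines of a grub (legacy) menu.lst,
# keeping a backup as <file>.bak. All other lines are left untouched.
strip_rhgb_quiet() {
    local file="${1:-/boot/grub/menu.lst}"
    # Match each argument only as a whole word, preceded by whitespace and
    # followed by whitespace or end of line; keep what followed it.
    sed -i.bak -e '/^[[:space:]]*\(kernel\|module\)[[:space:]]/{
s/[[:space:]]rhgb\([[:space:]]\|$\)/\1/g
s/[[:space:]]quiet\([[:space:]]\|$\)/\1/g
}' "$file"
}
```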

There is nothing more to do now. Future boots will show the plain-text boot messages.

Setting Up Xen

It may seem premature to discuss Xen before the cluster itself. We need to look at it now, before the cluster, because Xen makes some fairly significant changes to the networking. Given how changes to networking can affect the cluster, we will want to get these changes out of the way first.

We're not going to provision any virtual machines until the cluster is built.

A Brief Overview

Xen is a hypervisor that converts the installed operating system into a virtual machine running on a small Xen kernel. This same small kernel also runs all of the virtual machines you will add later. In this way, you will always be working in a virtual machine once you switch to booting a Xen kernel. In Xen terminology, virtual machines are known as domains.

The "host" operating system is known as dom0 (domain 0) and has a special view of the hardware, plus it contains the configuration and control of Xen itself. All other Xen virtual machines are known as domU (domain U). This is a collective term that represents the transient ID number assigned to all virtual machines. For example, when you boot the first virtual machine, it is known as dom1. The next will be dom2, then dom3 and so on. Note that if a domU shuts down, its ID is not reused. So when it restarts, it will use the next free ID (i.e., dom4 in this example, despite it having been, say, dom1 initially).

This makes Xen somewhat unique in the virtualization world. Most others do not touch or alter the "host" OS, instead running the guest VMs fully within the context of the host operating system.

Install Xen

In EL5, all the software we need to run Xen VMs and the Xen hypervisor is available from the stock repositories. These packages are admittedly older, but they are supported, so they will be used in this tutorial. If you wish to use newer versions, by all means do so, but that falls outside the scope of this paper. I would suggest that you work with the stock versions until you are comfortable with Xen and the Cluster suite before trying to upgrade to the newer versions.

Some of the packages listed below will be used later. To install the packages, run:

yum install kernel-xen kmod-drbd83-xen xen xen-libs

Understanding Networking in Xen

Xen uses a fairly complex networking system. This is, perhaps, its strongest point. The trade-off is that it can be a little tricky to wrap your head around. To help you become familiar with it, there is a short tutorial dedicated to this topic. Please read it over before proceeding if you are not familiar with Xen's networking.

Taking the time to read and understand the mini-paper below will save you a lot of heartache in the following stages.

Making Network Interfaces Available To Xen Clients

As discussed above, Xen makes some significant changes to the dom0 network, which happens to be where the cluster will operate. These changes include taking interfaces down, renaming them and attaching them to bridges. As we will discuss later, this behaviour can trigger cluster failures. This is the main reason for dealing with Xen now: once the changes are in place, the network is stable and safe to run the cluster on.

A Brief Overview

By default, Xen only makes eth0 available to the virtual machines. We will want to add eth2 as well, as we will use the Back Channel Network for inter-VM communication. We do not want to add the Storage Network (eth1) to Xen, though! Doing so puts the DRBD link at risk: should xend be shut down, it would take the interface down with it, which could trigger a split-brain in DRBD.

What Xen does, in brief, is rename the "real" eth0 to peth0. Then it creates a virtual "clone" of the network interface called eth0. Next, Xen creates a bridge called xenbr0. Finally, both the real peth0 and the new virtual eth0 are connected to the xenbr0 bridge. The virtual eth0 connects through its bridge-side port, vif0.0, which you will see in the ifconfig output later.

The reasoning behind all this is to separate the traffic going to and from dom0 from any traffic going to the various domUs. Think of the bridge as a network switch, peth0 as the uplink cable to the outside world and the virtual eth0 as dom0's "port" on the switch. We want the same to be done to the interface on the Back Channel Network, too. The Storage Network will never be exposed to the domU machines. Combined with the risk to the underlying storage, that means there is no reason to put eth1 under Xen's control.
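To make the description above more concrete, the sketch below only prints the main steps that network-bridge takes for interface ethN; it does not run them. It is simplified and hypothetical, and it assumes the vethN/vif0.N loopback pair that Xen provides in dom0. The real script does considerably more, such as moving the IP configuration and swapping MAC addresses. That is why the ifconfig output later shows peth0 with FE:FF:FF:FF:FF:FF and eth0 with the real MAC. Do not type these commands by hand on a cluster node.

```shell
#!/bin/sh
# Hypothetical, simplified sketch: PRINT (do not run) the main steps Xen's
# network-bridge performs for interface ethN. The real script also moves IP
# settings and MAC addresses and handles many edge cases.
print_bridge_steps() {
    n=$1
    cat <<EOF
brctl addbr xenbr$n                  # the "switch"
ip link set eth$n down
ip link set eth$n name peth$n        # real NIC becomes the uplink cable
ip link set veth$n name eth$n        # dom0's virtual NIC takes the old name
brctl addif xenbr$n peth$n           # plug the uplink into the switch
brctl addif xenbr$n vif0.$n          # plug dom0's port into the switch
EOF
}
```

For example, `print_bridge_steps 2` shows the equivalent steps for the Back Channel Network interface.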

Disable the 'qemu' Bridge

By default, libvirt sets up a bridge called virbr0 that gives virtual machines NAT access to the outside network through the host. Our system will not need this, so we will remove it. This bridge is defined in the /etc/libvirt/qemu/networks/default.xml file, so to remove it, simply empty the file.

cat /dev/null >/etc/libvirt/qemu/networks/default.xml

The next time you reboot, that bridge will be gone.
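After the reboot, you can confirm that the bridge is gone with ifconfig or brctl show. The sketch below is a scriptable alternative. It assumes that network devices are listed under /sys/class/net. The function name virbr0_gone is hypothetical, and so is the NET_DIR override, which exists only so the check can be tested against a fake directory.

```shell
#!/bin/sh
# Hypothetical helper: confirm libvirt's virbr0 bridge no longer exists.
# NET_DIR defaults to the kernel's sysfs list of network devices and can be
# pointed at another directory for testing.
virbr0_gone() {
    net_dir=${NET_DIR:-/sys/class/net}
    if [ -e "$net_dir/virbr0" ]; then
        echo "virbr0 still exists"
        return 1
    fi
    echo "virbr0 is gone"
}
```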

Create /etc/xen/scripts/an-network-script

We will create a script that Xen will use to bring up the "xenified" network interfaces.

Please note:

  1. You don't need to use the name 'an-network-script'. I suggest this name mainly to keep in line with the rest of the 'AN!x' naming used on this wiki.
  2. If you install convirt (not discussed further here), it will create its own bridge script called convirt-xen-multibridge. Other tools may do something similar.

First, touch the file and then chmod it to be executable.

touch /etc/xen/scripts/an-network-script
chmod 755 /etc/xen/scripts/an-network-script

Now edit it to contain the following:

vim /etc/xen/scripts/an-network-script
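#!/bin/sh
# xend calls this script with an action (such as start or stop); pass it
# through to Xen's stock network-bridge script once per interface to bridge.
# eth1 (the Storage Network) is deliberately left out.
dir=$(dirname "$0")
"$dir/network-bridge" "$@" vifnum=0 netdev=eth0 bridge=xenbr0
"$dir/network-bridge" "$@" vifnum=2 netdev=eth2 bridge=xenbr2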

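Before restarting xend, you may want to confirm that the script calls network-bridge with the arguments you expect, without touching the real network. The sketch below uses a hypothetical helper named dry_run_network_script. It copies the script into a temporary directory next to a stub network-bridge that only echoes its arguments, then runs it with start. It assumes the temporary directory allows executing files.

```shell
#!/bin/sh
# Hypothetical helper: dry-run a Xen network script. The script is copied into
# a temporary directory next to a stub "network-bridge" that only prints the
# arguments it receives, so no real interfaces are touched.
dry_run_network_script() {
    src=$1
    action=${2:-start}
    tmp=$(mktemp -d) || return 1
    cp "$src" "$tmp/script" || { rm -rf "$tmp"; return 1; }
    # The stub prints "network-bridge" followed by every argument it was given.
    printf '#!/bin/sh\necho "network-bridge $*"\n' > "$tmp/network-bridge"
    chmod 755 "$tmp/network-bridge"
    # The script locates network-bridge via $(dirname "$0"), which is $tmp here.
    sh "$tmp/script" "$action"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

Running `dry_run_network_script /etc/xen/scripts/an-network-script` should print one network-bridge line per bridged interface (eth0 and eth2), and none for eth1.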
Now tell Xen to use that script by editing the /etc/xen/xend-config.sxp file and changing the network-script argument to point to the new script (this is line 91 in the default xend-config.sxp file). Comment out the original line and add the new one:

vim /etc/xen/xend-config.sxp
#(network-script network-bridge)
(network-script an-network-script)
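To double-check the edit, you can print the value of every uncommented (network-script ...) line. After the change above it should print only an-network-script. The sketch below does this with sed. The function name is hypothetical, and it assumes each setting sits on its own line, as in the stock file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of each uncommented (network-script ...)
# line in a xend config file (defaults to /etc/xen/xend-config.sxp).
# Lines starting with '#' are ignored because they cannot match '^[[:space:]]*('.
active_network_script() {
    sed -n 's/^[[:space:]]*(network-script[[:space:]]*\([^)]*\)).*/\1/p' \
        "${1:-/etc/xen/xend-config.sxp}"
}
```

If it prints more than one line, both the old and new settings are active and the old one still needs to be commented out.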

Finally, check that it works by (re)starting xend:

/etc/init.d/xend restart
restart xend:                                              [  OK  ]

Now we'll use ifconfig to see the new network configuration (with a dash of creative grep to save screen space):

ifconfig |grep "Link encap" -A 1
eth0      Link encap:Ethernet  HWaddr 48:5B:39:3C:53:15
          inet addr:192.168.1.71  Bcast:192.168.1.255  Mask:255.255.255.0
--
eth1      Link encap:Ethernet  HWaddr 00:1B:21:72:96:E8
          inet addr:192.168.2.71  Bcast:192.168.2.255  Mask:255.255.255.0
--
eth2      Link encap:Ethernet  HWaddr 00:1B:21:72:9B:56
          inet addr:192.168.3.71  Bcast:192.168.3.255  Mask:255.255.255.0
--
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
--
peth0     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
--
peth2     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
--
vif0.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
--
vif0.2    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
--
xenbr0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
--
xenbr2    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
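Reading the ifconfig output by eye works, but the same check can be scripted. The sketch below confirms two things: that each xenbrN has both pethN and vif0.N plugged in, and that eth1 was left alone. It assumes the kernel exposes bridges under /sys/class/net/<bridge>/bridge and bridge ports under /sys/class/net/<bridge>/brif/, as 2.6 kernels do. The function names are hypothetical, and so is the NET_DIR override, which exists only for testing. brctl show displays the same bridge membership in human-readable form.

```shell
#!/bin/sh
# Hypothetical helpers: verify the bridges built by an-network-script using
# sysfs. NET_DIR defaults to /sys/class/net and can be overridden for testing.

# Check that xenbrN is a bridge with both pethN and vif0.N attached as ports.
check_xen_bridge() {
    n=$1
    net_dir=${NET_DIR:-/sys/class/net}
    br="$net_dir/xenbr$n"
    [ -d "$br/bridge" ] || { echo "xenbr$n: not a bridge"; return 1; }
    for port in "peth$n" "vif0.$n"; do
        [ -e "$br/brif/$port" ] || { echo "xenbr$n: $port not attached"; return 1; }
    done
    echo "xenbr$n: ok"
}

# The Storage Network (eth1) must stay out of Xen's hands.
check_storage_untouched() {
    net_dir=${NET_DIR:-/sys/class/net}
    if [ -e "$net_dir/peth1" ] || [ -e "$net_dir/xenbr1" ]; then
        echo "eth1 has been bridged by Xen"
        return 1
    fi
    echo "eth1 untouched"
}
```

On a node set up as above, `check_xen_bridge 0 && check_xen_bridge 2 && check_storage_untouched` should print an ok line for each check.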


 

Any questions, feedback, advice, complaints or meanderings are welcome.
© Alteeve's Niche! Inc. 1997-2024   Anvil! "Intelligent Availability®" Platform
legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.