Abandoned - Two Node Fedora 13 Cluster

Overview

This paper has one goal:

  1. How to assemble the simplest cluster possible, a 2-Node Cluster, which you can then expand on for your own needs.

With this completed, you can then jump into "step 2" papers that will show various uses of a two node cluster:

  1. How to create a "floating" virtual machine that can move between the two nodes in the event of a node failure, maximizing up time.
  2. How to create simple resources that can move between nodes. Examples will be a simple PostgreSQL database, DHCP, DNS and web servers.

Prerequisites

It is expected that you are already comfortable with the Linux command line, specifically bash, and that you are familiar with general administrative tasks in Red Hat based distributions, specifically Fedora. You will also need to be comfortable using editors like vim, nano or similar. This paper uses vim in examples; simply substitute your favourite editor in its place.

You are also expected to be comfortable with networking concepts. You will be expected to understand TCP/IP, multicast, broadcast, subnets and netmasks, routing and other relatively basic networking concepts. Please take the time to become familiar with these concepts before proceeding.

This said, where feasible, as much detail as is possible will be provided. For example, all configuration file locations will be shown and functioning sample files will be provided.

Platform

This paper will implement the Red Hat Cluster Suite using the Fedora 13 distribution. This paper uses the x86_64 repositories; however, if you are on an i386 (32-bit) system, you should be able to follow along fine. Simply replace x86_64 with i386 or i686 in package names.

You can either download the stock Fedora 13 DVD ISO, or you can try out the alpha AN!Cluster Install DVD (4.3GB ISO). If you use the latter, please test it out on a development or test cluster. If you have any problems with the AN!Cluster variant of the Fedora distro, please contact me and let me know what your trouble was.

Why Fedora 13?

Generally speaking, I much prefer to use a server-oriented Linux distribution like CentOS, Debian or similar. However, there have been many recent changes in the Linux-clustering world that have made all of the currently available server-class distributions obsolete. With luck, this will change once Red Hat Enterprise Linux and CentOS version 6 are released.

Until then, Fedora 13 provides the most up-to-date binary releases of the new implementation of the clustering stack. For this reason, it is the best choice for clustering if you want to be current. To mitigate some of the issues introduced by using a workstation distribution, many packages will be stripped out of the default install.

Focus

Clusters can serve to solve three problems: reliability, performance and scalability.

The focus of the cluster described in this paper is primarily reliability. Second to this, scalability will be the priority, leaving performance to be addressed only when it does not impact the first two criteria. This is not to say that performance is not a valid priority; it simply isn't the priority of this paper.

Goal

At the end of this paper, you should have a fully functioning two-node cluster capable of hosting "floating" resources. That is, resources that exist on one node and can be easily moved to the other node with minimal effort and down time. This should leave you with a solid foundation for adding more virtual servers, up to the limit of your cluster's resources.

This paper should also serve to show how to build the foundation of any other cluster configuration. Its core focus is introducing the main issues that come with clustering, and it hopes to serve as a foundation for any cluster configuration outside the scope of this paper.

Begin

Let's begin!

Hardware

We will need two physical servers each with the following hardware:

  • One or more multi-core CPUs with Virtualization support.
  • Three network cards; At least one should be gigabit or faster.
  • One or more hard drives.
  • You will need some form of a fence device. This can be an IPMI-enabled server, a Node Assassin, a fenceable PDU or similar.

This paper uses the following hardware:

  • ASUS M4A78L-M
  • AMD Athlon II x2 250
  • 2GB Kingston DDR2 KVR800D2N6K2/4G (4GB kit split between the two nodes)
  • 1x Intel 82540 PCI NIC
  • 1x D-Link DGE-560T
  • Node Assassin

This is not an endorsement of the above hardware. I bought what was within my budget that would serve the purpose of creating this document. What you purchase shouldn't matter, so long as the minimum requirements are met.

Pre-Assembly

With multiple NICs, it is quite likely that the mapping of physical devices to logical ethX devices may not be ideal. This is a particular issue if you decide to network boot your install media. In that case, if the wrong NIC is chosen for eth0, you will be presented with a list of MAC addresses to attempt setup with.

Before you assemble your servers, record their network cards' MAC addresses. I like to keep simple text files like these:

cat an-node01.mac
90:E6:BA:71:82:EA	eth0	# Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
00:21:91:19:96:53	eth1	# D-Link System Inc DGE-560T PCI Express Gigabit Ethernet Adapter
00:0E:0C:59:46:E4	eth2	# Intel Corporation 82540EM Gigabit Ethernet Controller
cat an-node02.mac
90:E6:BA:71:82:D8	eth0	# Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
00:21:91:19:96:5A	eth1	# D-Link System Inc DGE-560T PCI Express Gigabit Ethernet Adapter
00:0E:0C:59:45:78	eth2	# Intel Corporation 82540EM Gigabit Ethernet Controller

Feel free to record the information in any way that suits you the best.
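
If the machines are already booted into any Linux environment (a live CD is fine), the MAC addresses can be read straight from the interfaces. For example, assuming the classic net-tools are installed, the following prints each device with its hardware address:

ifconfig -a | grep HWaddr

On systems without net-tools, ip link show will report the same information.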

OS Install

Start with a stock Fedora 13 install. This How-To uses Fedora 13 x86_64, however it should be fairly easy to adapt to other recent Fedora versions. This document is also written with an eye to being easily ported to CentOS/RHEL version 6 once it is released. It will not adapt well to CentOS/RHEL version 5, though; much of the cluster stack has changed dramatically since that release.

These are sample kickstart scripts used by this paper. Be sure to set your own password string and network settings.

Warning! These kickstart scripts will erase your hard drive! Adapt them, don't blindly use them.

Generic cluster node kickstart scripts.
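
If you adapt or write your own kickstart script, the lines you will most likely need to change look something like the fragment below. This is only an illustrative sketch; the password hash, addresses and host name are placeholders that must be replaced with your own values:

# Set your own root password (a hash can be generated with 'openssl passwd -1').
rootpw --iscrypted $1$replaceme$withyourownhash
# Static network settings for the Internet-Facing NIC on an-node01.
network --device eth0 --bootproto static --ip 192.168.1.71 --netmask 255.255.255.0 --gateway 192.168.1.1 --nameserver 192.168.1.1 --hostname an-node01.alteeve.com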

AN!Cluster Install

If you are feeling brave, below is a link to a custom install DVD that contains the kickstart scripts to setup nodes and an an-cluster directory with all the configuration files.

  • Download the custom AN!Cluster v0.2.001 Install DVD. (4.5GiB iso). (Currently disabled - Reworking for F13)

Post OS Install

Once the OS is installed, we need to do a couple of things:

  1. Setup networking.
  2. Change the default run-level.

Setup Networking

We need to remove NetworkManager, enable network, configure the ifcfg-eth* files and then start the network daemon.

Network Layout

This setup expects you to have three physical network cards connected to three independent networks. To have a common vernacular, let's use this table to describe them:

Network Description       Short Name   Device Name   Suggested Subnet   NIC Properties
Internet-Facing Network   IFN          eth0          192.168.1.0/24     Remaining NIC should be used here. If using a PXE server, this should be a bootable NIC.
Storage Network           SN           eth1          10.0.0.0/24        Fastest NIC should be used here.
Back-Channel Network      BCN          eth2          10.0.1.0/24        NICs with IPMI piggy-back must be used here. Second-fastest NIC should be used here.

Take note of these concerns when planning which NIC to use on each subnet. These issues are presented in the order in which they must be addressed:

  1. If your nodes have IPMI piggy-backing on a normal NIC, that NIC must be used on the BCN subnet. Having your fence device accessible on a subnet that can be remotely accessed can pose a major security risk.
  2. The fastest NIC should be used for your SN subnet. Be sure to know which NICs support the largest jumbo frames when considering this.
  3. If you still have two NICs to choose from, use the fastest remaining NIC for your BCN subnet. This will minimize the time it takes to perform tasks like hot-migration of live virtual machines.
  4. The final NIC should be used for the IFN subnet.

Node IP Addresses

Obviously, the IP addresses you give to your nodes should be ones that suit you best. In this example, the following IP addresses are used:

            Internet-Facing Network (IFN)   Storage Network (SN)   Back-Channel Network (BCN)
an-node01   192.168.1.71                    10.0.0.71              10.0.1.71
an-node02   192.168.1.72                    10.0.0.72              10.0.1.72

Remove NetworkManager

Some cluster software will not start with NetworkManager installed. This is because NetworkManager is designed to be a highly-adaptive network system that can accommodate frequent changes in the network. To simplify these network transitions for end-users, it does a lot of reconfiguration of the network behind the scenes.

For workstations, this is wonderful. For clustering, this can be disastrous. Transient network outages are already a risk to a cluster's stability!

So first up, make sure that NetworkManager is completely removed from your system. If you used the kickstart scripts, then it was not installed. Otherwise, run:

yum remove NetworkManager NetworkManager-gnome NetworkManager-openconnect NetworkManager-openvpn NetworkManager-pptp NetworkManager-vpnc cnetworkmanager knetworkmanager knetworkmanager-openvpn knetworkmanager-pptp knetworkmanager-vpnc libproxy-networkmanager yum-NetworkManager-dispatcher
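
When that completes, a quick check that nothing NetworkManager-related remains installed should return no output:

rpm -qa | grep -i networkmanager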

Setup 'network'

Before proceeding with the network configuration, check to see whether your network cards are aligned to the proper ethX device names. If they need to be adjusted, please follow this How-To before proceeding.

There are a few ways to configure the network in Fedora:

  • system-config-network (graphical)
  • system-config-network-tui (ncurses)
  • Directly editing the /etc/sysconfig/network-scripts/ifcfg-eth* files, as sketched below. (See: here for a full list of options)
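
If you edit the ifcfg-eth* files directly, each interface needs at least its device name, MAC address and static IP details. As a rough sketch only, using the MAC address recorded earlier for an-node01's Back-Channel NIC, the file might look like this; substitute your own HWADDR and IPADDR values:

vim /etc/sysconfig/network-scripts/ifcfg-eth2
# Back-Channel Network (BCN) interface on an-node01.
DEVICE=eth2
HWADDR=00:0E:0C:59:46:E4
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.0.1.71
NETMASK=255.255.255.0

Once all three ifcfg-eth* files are in place, make sure the network service is enabled at boot and then start it:

chkconfig network on
/etc/init.d/network start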

Do not proceed until your node's networking is fully configured.

Update the Hosts File

Some applications expect to be able to call nodes by their name. To accommodate this, and to ensure that inter-node communication takes place on the back-channel subnet, we remove any existing hostname entries and then add the following to the /etc/hosts file:

Note: Any pre-existing entries matching the name returned by uname -n must be removed from /etc/hosts. There is a good chance there will be an entry that resolves to 127.0.0.1 which would cause problems later.

vim /etc/hosts
# Back-channel IPs to name mapping.
10.0.1.71	an-node01 an-node01.alteeve.com
10.0.1.72	an-node02 an-node02.alteeve.com

Obviously, adapt the names and IPs to match your nodes and subnets. The only critical thing is to make sure that the name returned by uname -n is resolvable to the back-channel subnet. I like to add a short-form name for convenience.

The updated /etc/hosts file should look like this:

cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# Back-channel IPs to name mapping.
10.0.1.71	an-node01 an-node01.alteeve.com
10.0.1.72	an-node02 an-node02.alteeve.com

To test this, ping both nodes by name and make sure the ping packets are sent on the 10.0.1.0/24 subnet:

ping -c 5 an-node01
PING an-node01 (10.0.1.71) 56(84) bytes of data.
64 bytes from an-node01 (10.0.1.71): icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from an-node01 (10.0.1.71): icmp_seq=2 ttl=64 time=0.055 ms
64 bytes from an-node01 (10.0.1.71): icmp_seq=3 ttl=64 time=0.055 ms
64 bytes from an-node01 (10.0.1.71): icmp_seq=4 ttl=64 time=0.055 ms
64 bytes from an-node01 (10.0.1.71): icmp_seq=5 ttl=64 time=0.055 ms
ping -c 5 an-node02
PING an-node02 (10.0.1.72) 56(84) bytes of data.
64 bytes from an-node02 (10.0.1.72): icmp_seq=1 ttl=64 time=0.221 ms
64 bytes from an-node02 (10.0.1.72): icmp_seq=2 ttl=64 time=0.188 ms
64 bytes from an-node02 (10.0.1.72): icmp_seq=3 ttl=64 time=0.217 ms
64 bytes from an-node02 (10.0.1.72): icmp_seq=4 ttl=64 time=0.192 ms
64 bytes from an-node02 (10.0.1.72): icmp_seq=5 ttl=64 time=0.163 ms

Disable Firewalls

Be sure to flush the netfilter tables and prevent iptables and ip6tables from starting on the nodes.

There will be enough potential sources of problems as it is. Disabling the firewalls at this stage will minimize the chance of an errant iptables rule messing up our configuration. If, before launch, you wish to implement a firewall, feel free to do so, but be sure to thoroughly test your cluster to ensure no problems were introduced.

chkconfig --level 2345 iptables off
/etc/init.d/iptables stop
chkconfig --level 2345 ip6tables off
/etc/init.d/ip6tables stop
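
If you want to double-check, the following should show empty rule sets and both services turned off in run levels 2 through 5:

iptables -L -n
chkconfig --list iptables
chkconfig --list ip6tables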

Change the Default Run-Level

This is an optional step that only improves performance; it is not required.

If you don't plan to work on your nodes directly, it makes sense to switch the default run level from 5 to 3. This prevents the window manager, like Gnome or KDE, from starting at boot, which frees up a fair bit of memory and system resources and reduces the possible attack vectors.

To do this, edit /etc/inittab, change the id:5:initdefault: line to id:3:initdefault: and then switch to run level 3:

vim /etc/inittab
id:3:initdefault:
init 3
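
If you prefer a one-liner, the same edit to /etc/inittab can be made with sed before switching run levels:

sed -i 's/id:5:initdefault:/id:3:initdefault:/' /etc/inittab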

Initial Cluster Setup

Before we get into specifics, let's take a minute to talk about the major components used in our cluster.

Core Programs

These are the core programs, possibly new to you, that we will use to build our cluster.

Corosync

Corosync is a relatively new replacement for the original OpenAIS cluster manager. Its goal is to provide a substantially simpler and more flexible set of APIs to facilitate clustering in Linux. It manages which nodes are in the cluster, triggers error messages when something fails, manages cluster locking and so on. Most other clustered applications rely on corosync to know when something has happened or to announce when a cluster-related action has taken place.

yum install corosync corosynclib libibverbs libmlx4 librdmacm

Please note that the corosync_overview man page is considered out of date at the time of this writing.
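
Corosync reads its configuration from /etc/corosync/corosync.conf. The full configuration will depend on your setup, but as a rough sketch only, a minimal file bound to the Back-Channel subnet and using the stock example multicast address and port might look something like this:

vim /etc/corosync/corosync.conf
# Accept the older 'whitetank' (OpenAIS 0.80) configuration format.
compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                # Bind to the Back-Channel Network (BCN) subnet.
                bindnetaddr: 10.0.1.0
                # Example multicast address and port; adapt to your network.
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        to_syslog: yes
}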

OpenAIS

OpenAIS is now an extension to Corosync that adds an open-source implementation of the Service Availability (SA) Forum's 'Application Interface Specification' (AIS). It is an API and policy designed to be used by applications concerned with maintaining services during faults. AIS implements the 'Availability Management Framework' (AMF) which, in turn, provides for application fail over, cluster management (CLM), Checkpointing (CKPT), Eventing (EVT), Messaging (MSG), and Distributed Locking (DLOCK).

In short: applications can use OpenAIS to be cluster-aware. Its libraries are used by some applications, including Pacemaker. In our case, we will only be using its libraries indirectly; we will not be using it directly.

Please note that the openais_overview man page is considered out of date at the time of this writing.

Pacemaker

Pacemaker is the cluster resource manager. It can be used to trigger scripts to control applications that are not cluster-aware. In this way, these non-cluster-aware applications can be made more highly available.
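
As a small taste of what Pacemaker's resource management looks like, once the cluster is running a simple resource such as a floating IP address could be defined with the crm shell. This is only an illustrative sketch; the resource name and the 192.168.1.75 address are made-up examples on the IFN subnet, and it assumes the usual heartbeat/cluster resource agents are installed:

crm configure primitive FloatingIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.75" cidr_netmask="24" \
        op monitor interval="30s"

Pacemaker would then start the address on one node and, if that node fails, move it to the other.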


 
