2x5 Scalable Cluster Tutorial
Warning: This tutorial is not even close to complete or accurate. It will be updated later, but so long as this warning is here, consider it defective and unusable. The only up-to-date clustering tutorial is: Red Hat Cluster Service 2 Tutorial.
The Design
All nodes have IPs as follows:
 * eth0 == Internet Facing Network == 192.168.1.x
 * eth1 == Storage Network         == 192.168.2.x
 * eth2 == Back Channel Network    == 192.168.3.x
   * Where 'x' = the node ID (ie: an-node01 -> x=1)
 * If a node has an IPMI (or similar) interface piggy-backed on a network
   interface, it will be shared with eth2. If it has a dedicated interface, it
   will be connected to the BCN.
 * Node management interfaces will be on 192.168.3.(x+100)
 * All subnets are /24 (255.255.255.0)
 Storage nodes use 2x SATA drives ('sda' and 'sdb') plus 2x SSD drives ('sdc' 
 and 'sdd').
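The addressing plan above is mechanical enough to script. The following is a minimal sketch, not part of the original tutorial; `node_id` and `node_ips` are hypothetical helper names, and the cap of 154 on the node ID is an assumption made so that x+100 stays a usable host address in a /24.

```shell
#!/bin/sh
# Sketch only: derive a node's addresses from its host name, following the
# addressing plan above. node_id and node_ips are hypothetical helper names.

# Print the numeric node ID for a host name such as "an-node01".
node_id() (
    short=${1%%.*}            # drop any domain part
    n=${short##*[!0-9]}       # keep only the trailing digits
    # Strip leading zeros so "01" becomes "1".
    while [ "${n#0}" != "$n" ] && [ -n "${n#0}" ]; do
        n=${n#0}
    done
    # Assumption: x + 100 must remain a valid host address, so cap x at 154.
    if [ -z "$n" ] || [ "$n" -lt 1 ] || [ "$n" -gt 154 ]; then
        echo "cannot derive a node ID from '$1'" >&2
        exit 1
    fi
    echo "$n"
)

# Print one "interface address" pair per line for the given host name.
node_ips() (
    x=$(node_id "$1") || exit 1
    echo "eth0 192.168.1.$x"           # Internet Facing Network
    echo "eth1 192.168.2.$x"           # Storage Network
    echo "eth2 192.168.3.$x"           # Back Channel Network
    echo "ipmi 192.168.3.$((x + 100))" # node management interface
)

node_ips an-node01
```

For an-node01 this lists 192.168.1.1, 192.168.2.1 and 192.168.3.1 for the three interfaces and 192.168.3.101 for the management interface.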
 Logical map:
  ___________________________________________                     ___________________________________________ 
 | [ an-node01 ]                       ______|                   |______                       [ an-node02 ] |
 |  ______    _____    _______        | eth0 =------\     /------= eth0 |        _______    _____    ______  |
 | [_sda1_]--[_md0_]--[_/boot_]       |_____||      |     |      ||_____|       [_/boot_]--[_md0_]--[_sda1_] |
 | [_sdb1_]                                  |      |     |      |                                  [_sdb1_] |
 |  ______    _____    ______          ______|      |     |      |______          ______    _____    ______  |
 | [_sda2_]--[_md1_]--[_swap_]   /----| eth1 =----\ |     | /----= eth1 |----\   [_swap_]--[_md1_]--[_sda2_] |
 | [_sdb2_]                      | /--|_____||    | |     | |    ||_____|--\ |                      [_sdb2_] |
 |  ______    _____    ___       | |         |    | |     | |    |         | |       ___    _____    ______  |
 | [_sda3_]--[_md2_]--[_/_]      | |   ______|    | |     | |    |______   | |      [_/_]--[_md2_]--[_sda3_] |
 | [_sdb3_]                      | |  | eth2 =--\ | |     | | /--= eth2 |  | |                      [_sdb3_] |
 |  ______    _____    _______   | |  |_____||  | | |     | | |  ||_____|  | |   _______    _____    ______  |
 | [_sda5_]--[_md3_]--[_drbd0_]--/ |         |  | | |     | | |  |         | \--[_drbd0_]--[_md3_]--[_sda5_] |
 | [_sdb5_]                        |         |  | | |     | | |  |         |                        [_sdb5_] |
 |  ______    _____    _______     |         |  | | |     | | |  |         |     _______    _____    ______  |
 | [_sdc1_]--[_md4_]--[_drbd1_]----/         |  | | |     | | |  |         \----[_drbd1_]--[_md4_]--[_sdc1_] |
 | [_sdd1_]                                  |  | | |     | | |  |                                  [_sdd1_] |
 |___________________________________________|  | | |     | | |  |___________________________________________|
                                                | | |     | | |
                        /---------------------------/     | | |
                        |                       | |       | | |
                        | /-------------------------------/ | |
                        | |                     | |         | \-----------------------\
                        | |                     | |         |                         |
                        | |                     \-----------------------------------\ |
                        | |                       |         |                       | |
                        | |   ____________________|_________|____________________   | |
                        | |  [ iqn.2011-08.com.alteeve:an-clusterA.target01.hdd  ]  | |
                        | |  [ iqn.2011-08.com.alteeve:an-clusterA.target02.sdd  ]  | |
   _________________    | |  [    drbd0 == hdd == vg01           Floating IP     ]  | |      ___________________
  [ Internet Facing ]   | |  [____drbd1_==_sdd_==_vg02__________192.168.2.100____]  | | /---[ Internal Managed  ]
  [_____Routers_____]   | |                             | |                         | | |   [  Private Network  ]
                  |     | \-----------\                 | |                   /-----/ | |   [_and_fence_devices_]
                  |     \-----------\ |                 | |                   | /-----/ |
                  \---------------\ | |                 | |                   | | /-----/
                                 _|_|_|___________     _|_|_____________     _|_|_|___________
 [ Storage Cluster ]            [ Internet Facing ]   [ Storage Network ]   [  Back-Channel   ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[_____Network_____]~~~[_________________]~~~[_____Network_____]~~~~~~~~~~~~~~~
 [    VM Cluster   ]              | | | | |             | | | | |             | | | | |  
                                  | | | | |             | | | | |             | | | | |
  __________________________      | | | | |             | | | | |             | | | | |
 | [ an-node03 ]      ______|     | | | | |             | | | | |             | | | | |
 |                   | eth0 =-----/ | | | |             | | | | |             | | | | |
 |                   |_____||       | | | |             | | | | |             | | | | |
 |                          |       | | | |             | | | | |             | | | | |
 |                    ______|       | | | |             | | | | |             | | | | |
 |                   | eth1 =---------------------------/ | | | |             | | | | |
 |                   |_____||       | | | |               | | | |             | | | | |
 |                          |       | | | |               | | | |             | | | | |
 |                    ______|       | | | |               | | | |             | | | | |
 |                   | eth2 =-------------------------------------------------/ | | | |
 |                   |_____||       | | | |               | | | |               | | | |
 |__________________________|       | | | |               | | | |               | | | |
                                    | | | |               | | | |               | | | |
  __________________________        | | | |               | | | |               | | | |
 | [ an-node04 ]      ______|       | | | |               | | | |               | | | |
 |                   | eth0 =-------/ | | |               | | | |               | | | |
 |                   |_____||         | | |               | | | |               | | | |
 |                          |         | | |               | | | |               | | | |
 |                    ______|         | | |               | | | |               | | | |
 |                   | eth1 =-----------------------------/ | | |               | | | |
 |                   |_____||         | | |                 | | |               | | | |
 |                          |         | | |                 | | |               | | | |
 |                    ______|         | | |                 | | |               | | | |
 |                   | eth2 =---------------------------------------------------/ | | |
 |                   |_____||         | | |                 | | |                 | | |
 |__________________________|         | | |                 | | |                 | | |
                                      | | |                 | | |                 | | |
  __________________________          | | |                 | | |                 | | |
 | [ an-node05 ]      ______|         | | |                 | | |                 | | |
 |                   | eth0 =---------/ | |                 | | |                 | | |
 |                   |_____||           | |                 | | |                 | | |
 |                          |           | |                 | | |                 | | |
 |                    ______|           | |                 | | |                 | | |
 |                   | eth1 =-------------------------------/ | |                 | | |
 |                   |_____||           | |                   | |                 | | |
 |                          |           | |                   | |                 | | |
 |                    ______|           | |                   | |                 | | |
 |                   | eth2 =-----------------------------------------------------/ | |
 |                   |_____||           | |                   | |                   | |
 |__________________________|           | |                   | |                   | |
                                        | |                   | |                   | |
  __________________________            | |                   | |                   | |
 | [ an-node06 ]      ______|           | |                   | |                   | |
 |                   | eth0 =-----------/ |                   | |                   | |
 |                   |_____||             |                   | |                   | |
 |                          |             |                   | |                   | |
 |                    ______|             |                   | |                   | |
 |                   | eth1 =---------------------------------/ |                   | |
 |                   |_____||             |                     |                   | |
 |                          |             |                     |                   | |
 |                    ______|             |                     |                   | |
 |                   | eth2 =-------------------------------------------------------/ |
 |                   |_____||             |                     |                     |
 |__________________________|             |                     |                     |
                                          |                     |                     |
  __________________________              |                     |                     |
 | [ an-node07 ]      ______|             |                     |                     |
 |                   | eth0 =-------------/                     |                     |
 |                   |_____||                                   |                     |
 |                          |                                   |                     |
 |                    ______|                                   |                     |
 |                   | eth1 =-----------------------------------/                     |
 |                   |_____||                                                         |
 |                          |                                                         |
 |                    ______|                                                         |
 |                   | eth2 =---------------------------------------------------------/
 |                   |_____||
 |__________________________|
Install The Cluster Software
If you are using Red Hat Enterprise Linux, you will need to add the RHEL Server Optional (v. 6 64-bit x86_64) channel for each node in your cluster. You can do this in RHN by going to your subscription management page, clicking on each server, clicking on "Alter Channel Subscriptions", enabling the RHEL Server Optional (v. 6 64-bit x86_64) channel, and then clicking on "Change Subscription".
The actual installation is simple; just use yum to install cman and the other packages listed below.
yum install cman fence-agents rgmanager resource-agents lvm2-cluster gfs2-utils python-virtinst libvirt qemu-kvm-tools qemu-kvm virt-manager virt-viewer
Initial Config
Everything uses ricci, which itself needs to have a password set. I set this to match root.
Both:
passwd ricci
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
With these decisions and the information gathered, here is what our first /etc/cluster/cluster.conf file will look like.
touch /etc/cluster/cluster.conf
vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="an-clusterA" config_version="1">
        <cman two_node="1" expected_votes="1" />
        <totem secauth="off" rrp_mode="none" />
        <clusternodes>
                <clusternode name="an-node01.alteeve.com" nodeid="1">
                        <fence>
                                <method name="PDU">
                                        <device name="pdu2" action="reboot" port="1" />
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node02.alteeve.com" nodeid="2">
                        <fence>
                                <method name="PDU">
                                        <device name="pdu2" action="reboot" port="2" />
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice name="pdu2" agent="fence_apc" ipaddr="192.168.1.6" login="apc" passwd="secret" />
        </fencedevices>
        <rm>
                <resources>
                        <ip address="192.168.2.100" monitor_link="on"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="an1_primary" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node01.alteeve.com" priority="1"/>
                                <failoverdomainnode name="an-node02.alteeve.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <service autostart="1" name="san_ip1" domain="an1_primary">
                        <ip ref="192.168.2.100"/>
                </service>
        </rm>
</cluster>
Save the file, then validate it. If it fails, address the errors and try again.
ip addr list | grep <ip>
rg_test test /etc/cluster/cluster.conf
ccs_config_validate
Configuration validates
Push it to the other node:
rsync -av /etc/cluster/cluster.conf root@an-node02:/etc/cluster/
sending incremental file list
cluster.conf
sent 781 bytes  received 31 bytes  541.33 bytes/sec
total size is 701  speedup is 0.86
Start:
DO NOT PROCEED UNTIL YOUR cluster.conf FILE VALIDATES!
Unless you have it perfect, your cluster will fail.
Once it validates, proceed.
Starting The Cluster For The First Time
By default, if you start one node only and you've enabled the <cman two_node="1" expected_votes="1"/> option as we have done, the lone server will effectively gain quorum. It will try to connect to the cluster, but there won't be a cluster to connect to, so it will fence the other node after a timeout period. This timeout is 6 seconds by default.
For now, we will leave the default as it is. If you're interested in changing it though, the argument you are looking for is post_join_delay.
This behaviour means that we'll want to start both nodes well within six seconds of one another, lest the slower one be needlessly fenced.
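One way to keep the two starts close together is to launch them in parallel from a single shell. This is a rough sketch only; `start_cman_together` and the `REMOTE` variable are hypothetical names, and it assumes root can reach each node over SSH without a password prompt.

```shell
#!/bin/sh
# Sketch only: start cman on every listed node at roughly the same time.
# REMOTE and start_cman_together are hypothetical names; REMOTE is the command
# used to reach a node and defaults to ssh.
REMOTE=${REMOTE:-ssh}

start_cman_together() (
    pids=""
    for node in "$@"; do
        # Launch each start in the background so the nodes start in parallel.
        $REMOTE "root@$node" "service cman start" </dev/null &
        pids="$pids $!"
    done
    # Wait for every background start and remember whether any of them failed.
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    [ "$rc" -eq 0 ]
)

# Usage (not run here): start_cman_together an-node01 an-node02
```

Setting `REMOTE=echo` before calling the function prints the commands instead of running them, which is a cheap way to check the node list first.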
Left off here
Note to help minimize dual-fences:
- You could add FENCED_OPTS="-f 5" to /etc/sysconfig/cman on *one* node, so that node waits five seconds before fencing (iLO fence devices may need this).
DRBD Config
Install from source:
Both:
# Obliterate peer - fence via cman
wget -c https://alteeve.com/files/an-cluster/sbin/obliterate-peer.sh -O /sbin/obliterate-peer.sh
chmod a+x /sbin/obliterate-peer.sh
ls -lah /sbin/obliterate-peer.sh
# Download, compile and install DRBD
wget -c http://oss.linbit.com/drbd/8.3/drbd-8.3.11.tar.gz
tar -xvzf drbd-8.3.11.tar.gz
cd drbd-8.3.11
./configure \
   --prefix=/usr \
   --localstatedir=/var \
   --sysconfdir=/etc \
   --with-utils \
   --with-km \
   --with-udev \
   --with-pacemaker \
   --with-rgmanager \
   --with-bashcompletion
make
make install
Configure
an-node01:
# Configure DRBD's global values.
cp /etc/drbd.d/global_common.conf /etc/drbd.d/global_common.conf.orig
vim /etc/drbd.d/global_common.conf
diff -u /etc/drbd.d/global_common.conf.orig /etc/drbd.d/global_common.conf
--- /etc/drbd.d/global_common.conf.orig	2011-08-01 21:58:46.000000000 -0400
+++ /etc/drbd.d/global_common.conf	2011-08-01 23:18:27.000000000 -0400
@@ -15,24 +15,35 @@
 		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
 		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
 		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
+		fence-peer		"/sbin/obliterate-peer.sh";
 	}
 
 	startup {
 		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
+		become-primary-on	both;
+		wfc-timeout		300;
+		degr-wfc-timeout	120;
 	}
 
 	disk {
 		# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
 		# no-disk-drain no-md-flushes max-bio-bvecs
+		fencing			resource-and-stonith;
 	}
 
 	net {
 		# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
 		# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
 		# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
+		allow-two-primaries;
+		after-sb-0pri		discard-zero-changes;
+		after-sb-1pri		discard-secondary;
+		after-sb-2pri		disconnect;
 	}
 
 	syncer {
 		# rate after al-extents use-rle cpu-mask verify-alg csums-alg
+		# This should be no more than 30% of the maximum sustainable write speed.
+		rate			20M;
 	}
 }
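The comment in the syncer section caps the resync rate at 30% of the maximum sustainable write speed. As a worked example, the sketch below does that arithmetic; `syncer_rate` is a hypothetical helper, and the 67 MB/s input is an assumed measurement chosen only because it yields the 20M used above.

```shell
#!/bin/sh
# Sketch only: cap the DRBD resync rate at 30% of a measured write speed.
# syncer_rate is a hypothetical helper; its argument is in MB/s.
syncer_rate() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: syncer_rate <MB/s>" >&2; return 1 ;;
    esac
    # Integer arithmetic rounds down, which keeps the result under the cap.
    echo "$(( $1 * 30 / 100 ))M"
}

# Assumed measurement of 67 MB/s sustained writes.
syncer_rate 67
```

Measure the write speed on the slower of the two backing devices and the storage network, since the lower figure is the real limit.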
vim /etc/drbd.d/r0.res
resource r0 {
        device          /dev/drbd0;
        meta-disk       internal;
        on an-node01.alteeve.com {
                address         192.168.2.71:7789;
                disk            /dev/sda5;
        }
        on an-node02.alteeve.com {
                address         192.168.2.72:7789;
                disk            /dev/sda5;
        }
}
cp /etc/drbd.d/r0.res /etc/drbd.d/r1.res 
vim /etc/drbd.d/r1.res
resource r1 {
        device          /dev/drbd1;
        meta-disk       internal;
        on an-node01.alteeve.com {
                address         192.168.2.71:7790;
                disk            /dev/sdb1;
        }
        on an-node02.alteeve.com {
                address         192.168.2.72:7790;
                disk            /dev/sdb1;
        }
}
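The two resource files differ only in resource name, device minor, port and backing disk. If more resources are added later, a small generator can cut down on copy-and-edit mistakes. This is a sketch under that assumption; `drbd_resource` is a hypothetical helper, and the host names and addresses are copied from r0.res above.

```shell
#!/bin/sh
# Sketch only: print a resource stanza in the same layout as r0.res.
# drbd_resource is a hypothetical helper; host names and addresses are the
# ones used in r0.res above.
drbd_resource() (
    if [ $# -ne 4 ]; then
        echo "usage: drbd_resource <name> <minor> <port> <disk>" >&2
        exit 1
    fi
    name=$1 minor=$2 port=$3 disk=$4
    cat <<EOF
resource $name {
        device          /dev/drbd$minor;
        meta-disk       internal;
        on an-node01.alteeve.com {
                address         192.168.2.71:$port;
                disk            $disk;
        }
        on an-node02.alteeve.com {
                address         192.168.2.72:$port;
                disk            $disk;
        }
}
EOF
)

# Print the stanza for r1; redirect to /etc/drbd.d/r1.res once it looks right.
drbd_resource r1 1 7790 /dev/sdb1
```

Each resource needs its own port and minor number, so it is worth checking the output with `drbdadm dump` as in the next step.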
Validate:
drbdadm dump
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:
you are the 369th user to install this version
# /usr/etc/drbd.conf
common {
    protocol               C;
    net {
        allow-two-primaries;
        after-sb-0pri    discard-zero-changes;
        after-sb-1pri    discard-secondary;
        after-sb-2pri    disconnect;
    }
    disk {
        fencing          resource-and-stonith;
    }
    syncer {
        rate             20M;
    }
    startup {
        wfc-timeout      300;
        degr-wfc-timeout 120;
        become-primary-on both;
    }
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error   "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer       /sbin/obliterate-peer.sh;
    }
}
# resource r0 on an-node01.alteeve.com: not ignored, not stacked
resource r0 {
    on an-node01.alteeve.com {
        device           /dev/drbd0 minor 0;
        disk             /dev/sda5;
        address          ipv4 192.168.2.71:7789;
        meta-disk        internal;
    }
    on an-node02.alteeve.com {
        device           /dev/drbd0 minor 0;
        disk             /dev/sda5;
        address          ipv4 192.168.2.72:7789;
        meta-disk        internal;
    }
}
# resource r1 on an-node01.alteeve.com: not ignored, not stacked
resource r1 {
    on an-node01.alteeve.com {
        device           /dev/drbd1 minor 1;
        disk             /dev/sdb1;
        address          ipv4 192.168.2.71:7790;
        meta-disk        internal;
    }
    on an-node02.alteeve.com {
        device           /dev/drbd1 minor 1;
        disk             /dev/sdb1;
        address          ipv4 192.168.2.72:7790;
        meta-disk        internal;
    }
}
rsync -av /etc/drbd.d root@an-node02:/etc/
drbd.d/
drbd.d/global_common.conf
drbd.d/global_common.conf.orig
drbd.d/r0.res
drbd.d/r1.res
sent 3523 bytes  received 110 bytes  7266.00 bytes/sec
total size is 3926  speedup is 1.08
Initialize and First start
Both:
Create the meta-data.
drbdadm create-md r{0,1}
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Attach, connect and confirm (after both have attached and connected):
drbdadm attach r{0,1}
drbdadm connect r{0,1}
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:441969960
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:29309628
There is no data, so force both devices to be instantly UpToDate:
drbdadm -- --clear-bitmap new-current-uuid r{0,1}
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
Set both to primary and run a final check.
drbdadm primary r{0,1}
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
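Reading /proc/drbd by eye works for two resources, but the same check can be scripted. The sketch below is an illustration only; `drbd_all_ready` is a hypothetical helper that succeeds when every resource line shows Connected, Primary/Primary and UpToDate/UpToDate, matching the final output above.

```shell
#!/bin/sh
# Sketch only: succeed when every DRBD resource is connected, primary on both
# nodes and up to date. drbd_all_ready is a hypothetical helper; it reads the
# file named in $1, or /proc/drbd when no argument is given.
drbd_all_ready() {
    awk '
        # Resource status lines look like " 0: cs:Connected ro:... ds:...".
        /^ *[0-9]+: cs:/ {
            total++
            if ($0 ~ /cs:Connected/ &&
                $0 ~ /ro:Primary\/Primary/ &&
                $0 ~ /ds:UpToDate\/UpToDate/) {
                ok++
            } else {
                print "not ready:" $0
            }
        }
        # Fail if no resources were seen or if any of them was not ready.
        END { exit !(total > 0 && ok + 0 == total) }
    ' "${1:-/proc/drbd}"
}

# Usage (not run here): drbd_all_ready && echo "DRBD is ready on this node"
```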
Update the cluster
vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="17" name="an-clusterA">
        <cman expected_votes="1" two_node="1"/>
        <totem rrp_mode="none" secauth="off"/>
        <clusternodes>
                <clusternode name="an-node01.alteeve.com" nodeid="1">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node02.alteeve.com" nodeid="2">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.6" login="apc" name="pdu2" passwd="secret"/>
        </fencedevices>
        <fence_daemon post_join_delay="30"/>
        <rm>
                <resources>
                        <ip address="192.168.2.100" monitor_link="on"/>
                        <script file="/etc/init.d/drbd" name="drbd"/>
                        <script file="/etc/init.d/clvmd" name="clvmd"/>
                        <script file="/etc/init.d/tgtd" name="tgtd"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="an1_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node01.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an2_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node02.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an1_primary" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="an-node01.alteeve.com" priority="1"/>
                                <failoverdomainnode name="an-node02.alteeve.com" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <service autostart="0" domain="an1_only" exclusive="0" max_restarts="0" name="an1_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd"/>
                        </script>
                </service>
                <service autostart="0" domain="an2_only" exclusive="0" max_restarts="0" name="an2_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd"/>
                        </script>
                </service>
        </rm>
</cluster>
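Note that config_version in the file above is higher than in the first draft; cman expects this number to increase each time a changed cluster.conf is pushed out. The sketch below shows one way to bump it by one; `bump_config_version` is a hypothetical helper, and it is best tried on a copy of the file first.

```shell
#!/bin/sh
# Sketch only: increase config_version in a cluster.conf by one and print the
# new value. bump_config_version is a hypothetical helper.
bump_config_version() (
    conf=$1
    # Pull the current version out of the opening <cluster ...> tag.
    current=$(sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$current" ]; then
        echo "no config_version found in $conf" >&2
        exit 1
    fi
    next=$((current + 1))
    # Rewrite through a temporary file, then copy the content back so the
    # original file keeps its ownership and permissions.
    sed "s/\(<cluster[^>]*config_version=\"\)$current\"/\1$next\"/" "$conf" > "$conf.new" &&
        cat "$conf.new" > "$conf" &&
        rm -f "$conf.new" &&
        echo "$next"
)

# Usage (not run here): bump_config_version /etc/cluster/cluster.conf
```

After bumping the version, validate the file again before pushing it to the other node.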
rg_test test /etc/cluster/cluster.conf
Running in test mode.
Loading resource rule from /usr/share/cluster/oralistener.sh
Loading resource rule from /usr/share/cluster/apache.sh
Loading resource rule from /usr/share/cluster/SAPDatabase
Loading resource rule from /usr/share/cluster/postgres-8.sh
Loading resource rule from /usr/share/cluster/lvm.sh
Loading resource rule from /usr/share/cluster/mysql.sh
Loading resource rule from /usr/share/cluster/lvm_by_vg.sh
Loading resource rule from /usr/share/cluster/service.sh
Loading resource rule from /usr/share/cluster/samba.sh
Loading resource rule from /usr/share/cluster/SAPInstance
Loading resource rule from /usr/share/cluster/checkquorum
Loading resource rule from /usr/share/cluster/ocf-shellfuncs
Loading resource rule from /usr/share/cluster/svclib_nfslock
Loading resource rule from /usr/share/cluster/script.sh
Loading resource rule from /usr/share/cluster/clusterfs.sh
Loading resource rule from /usr/share/cluster/fs.sh
Loading resource rule from /usr/share/cluster/oracledb.sh
Loading resource rule from /usr/share/cluster/nfsserver.sh
Loading resource rule from /usr/share/cluster/netfs.sh
Loading resource rule from /usr/share/cluster/orainstance.sh
Loading resource rule from /usr/share/cluster/vm.sh
Loading resource rule from /usr/share/cluster/lvm_by_lv.sh
Loading resource rule from /usr/share/cluster/tomcat-6.sh
Loading resource rule from /usr/share/cluster/ip.sh
Loading resource rule from /usr/share/cluster/nfsexport.sh
Loading resource rule from /usr/share/cluster/openldap.sh
Loading resource rule from /usr/share/cluster/ASEHAagent.sh
Loading resource rule from /usr/share/cluster/nfsclient.sh
Loading resource rule from /usr/share/cluster/named.sh
Loaded 24 resource rules
=== Resources List ===
Resource type: ip
Instances: 1/1
Agent: ip.sh
Attributes:
  address = 192.168.2.100 [ primary unique ]
  monitor_link = on
  nfslock [ inherit("service%nfslock") ]
Resource type: script
Agent: script.sh
Attributes:
  name = drbd [ primary unique ]
  file = /etc/init.d/drbd [ unique required ]
  service_name [ inherit("service%name") ]
Resource type: script
Agent: script.sh
Attributes:
  name = clvmd [ primary unique ]
  file = /etc/init.d/clvmd [ unique required ]
  service_name [ inherit("service%name") ]
Resource type: script
Agent: script.sh
Attributes:
  name = tgtd [ primary unique ]
  file = /etc/init.d/tgtd [ unique required ]
  service_name [ inherit("service%name") ]
Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = an1_storage [ primary unique required ]
  domain = an1_only [ reconfig ]
  autostart = 0 [ reconfig ]
  exclusive = 0 [ reconfig ]
  nfslock = 0
  nfs_client_cache = 0
  recovery = restart [ reconfig ]
  depend_mode = hard
  max_restarts = 0
  restart_expire_time = 0
  priority = 0
Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = an2_storage [ primary unique required ]
  domain = an2_only [ reconfig ]
  autostart = 0 [ reconfig ]
  exclusive = 0 [ reconfig ]
  nfslock = 0
  nfs_client_cache = 0
  recovery = restart [ reconfig ]
  depend_mode = hard
  max_restarts = 0
  restart_expire_time = 0
  priority = 0
Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = san_ip [ primary unique required ]
  domain = an1_primary [ reconfig ]
  autostart = 0 [ reconfig ]
  exclusive = 0 [ reconfig ]
  nfslock = 0
  nfs_client_cache = 0
  recovery = relocate [ reconfig ]
  depend_mode = hard
  max_restarts = 0
  restart_expire_time = 0
  priority = 0
=== Resource Tree ===
service (S0) {
  name = "an1_storage";
  domain = "an1_only";
  autostart = "0";
  exclusive = "0";
  nfslock = "0";
  nfs_client_cache = "0";
  recovery = "restart";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  priority = "0";
  script (S0) {
    name = "drbd";
    file = "/etc/init.d/drbd";
    service_name = "an1_storage";
    script (S0) {
      name = "clvmd";
      file = "/etc/init.d/clvmd";
      service_name = "an1_storage";
    }
  }
}
service (S0) {
  name = "an2_storage";
  domain = "an2_only";
  autostart = "0";
  exclusive = "0";
  nfslock = "0";
  nfs_client_cache = "0";
  recovery = "restart";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  priority = "0";
  script (S0) {
    name = "drbd";
    file = "/etc/init.d/drbd";
    service_name = "an2_storage";
    script (S0) {
      name = "clvmd";
      file = "/etc/init.d/clvmd";
      service_name = "an2_storage";
    }
  }
}
service (S0) {
  name = "san_ip";
  domain = "an1_primary";
  autostart = "0";
  exclusive = "0";
  nfslock = "0";
  nfs_client_cache = "0";
  recovery = "relocate";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  priority = "0";
  ip (S0) {
    address = "192.168.2.100";
    monitor_link = "on";
    nfslock = "0";
  }
}
=== Failover Domains ===
Failover domain: an1_only
Flags: Restricted No Failback
  Node an-node01.alteeve.com (id 1, priority 0)
Failover domain: an2_only
Flags: Restricted No Failback
  Node an-node02.alteeve.com (id 2, priority 0)
Failover domain: an1_primary
Flags: Ordered No Failback
  Node an-node01.alteeve.com (id 1, priority 1)
  Node an-node02.alteeve.com (id 2, priority 2)
=== Event Triggers ===
Event Priority Level 100:
  Name: Default
    (Any event)
    File: /usr/share/cluster/default_event_script.sl
cman_tool version -r
You have not authenticated to the ricci daemon on an-node01.alteeve.com
Password:
an-node01:
clusvcadm -e service:an1_storage
service:an1_storage is now running on an-node01.alteeve.com
an-node02:
clusvcadm -e service:an2_storage
service:an2_storage is now running on an-node02.alteeve.com
Either:
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:924 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:916 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
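The check above can be scripted rather than read by eye. This is a minimal sketch, not part of the original procedure; the function name drbd_all_healthy is made up for illustration. It only inspects text in the /proc/drbd format shown above.

```shell
# drbd_all_healthy [FILE]
# Hypothetical helper: succeeds only if every resource line in FILE
# (default: /proc/drbd) reports cs:Connected, ro:Primary/Primary and
# ds:UpToDate/UpToDate. Fails if no resource lines are found at all.
drbd_all_healthy() {
    awk '
        /^ *[0-9]+: cs:/ {
            seen++
            if ($0 !~ /cs:Connected / ||
                $0 !~ /ro:Primary\/Primary / ||
                $0 !~ /ds:UpToDate\/UpToDate /) {
                bad++
            }
        }
        END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
    ' "${1:-/proc/drbd}"
}
```

Run it as "drbd_all_healthy && echo OK" on either node before moving on to clustered LVM.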
Configure Clustered LVM
an-node01:
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.orig
vim /etc/lvm/lvm.conf
diff -u /etc/lvm/lvm.conf.orig /etc/lvm/lvm.conf
--- /etc/lvm/lvm.conf.orig	2011-08-02 21:59:01.000000000 -0400
+++ /etc/lvm/lvm.conf	2011-08-02 22:00:17.000000000 -0400
@@ -50,7 +50,8 @@
 
 
     # By default we accept every block device:
-    filter = [ "a/.*/" ]
+    #filter = [ "a/.*/" ]
+    filter = [ "a|/dev/drbd*|", "r/.*/" ]
 
     # Exclude the cdrom drive
     # filter = [ "r|/dev/cdrom|" ]
@@ -308,7 +309,8 @@
     # Type 3 uses built-in clustered locking.
     # Type 4 uses read-only locking which forbids any operations that might 
     # change metadata.
-    locking_type = 1
+    #locking_type = 1
+    locking_type = 3
 
     # Set to 0 to fail when a lock request cannot be satisfied immediately.
     wait_for_locks = 1
@@ -324,7 +326,8 @@
     # to 1 an attempt will be made to use local file-based locking (type 1).
     # If this succeeds, only commands against local volume groups will proceed.
     # Volume Groups marked as clustered will be ignored.
-    fallback_to_local_locking = 1
+    #fallback_to_local_locking = 1
+    fallback_to_local_locking = 0
 
     # Local non-LV directory that holds file-based locks while commands are
     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
rsync -av /etc/lvm/lvm.conf root@an-node02:/etc/lvm/
sending incremental file list
lvm.conf
sent 2412 bytes  received 247 bytes  5318.00 bytes/sec
total size is 24668  speedup is 9.28
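After copying lvm.conf around, it is worth confirming that the two locking changes actually landed on each node. A rough sketch follows, assuming the settings sit on their own uncommented lines as in the diff above; the function name lvm_conf_is_clustered is hypothetical.

```shell
# lvm_conf_is_clustered [FILE]
# Hypothetical helper: succeeds when FILE (default: /etc/lvm/lvm.conf) has an
# uncommented "locking_type = 3" and "fallback_to_local_locking = 0".
lvm_conf_is_clustered() {
    grep -Eq '^[[:space:]]*locking_type[[:space:]]*=[[:space:]]*3([[:space:]]|$)' \
        "${1:-/etc/lvm/lvm.conf}" &&
    grep -Eq '^[[:space:]]*fallback_to_local_locking[[:space:]]*=[[:space:]]*0([[:space:]]|$)' \
        "${1:-/etc/lvm/lvm.conf}"
}
```

Commented-out lines such as "#locking_type = 1" are ignored because the pattern only allows whitespace before the setting name.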
Create the LVM PVs, VGs and LVs.
an-node01:
pvcreate /dev/drbd{0,1}
  Physical volume "/dev/drbd0" successfully created
  Physical volume "/dev/drbd1" successfully created
an-node02:
pvscan
  PV /dev/drbd0                      lvm2 [421.50 GiB]
  PV /dev/drbd1                      lvm2 [27.95 GiB]
  Total: 2 [449.45 GiB] / in use: 0 [0   ] / in no VG: 2 [449.45 GiB]
an-node01:
vgcreate -c y hdd_vg0 /dev/drbd0 && vgcreate -c y ssd_vg0 /dev/drbd1
  Clustered volume group "hdd_vg0" successfully created
  Clustered volume group "ssd_vg0" successfully created
an-node02:
vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "ssd_vg0" using metadata type lvm2
  Found volume group "hdd_vg0" using metadata type lvm2
an-node01:
lvcreate -l 100%FREE -n lun0 /dev/hdd_vg0 && lvcreate -l 100%FREE -n lun1 /dev/ssd_vg0
  Logical volume "lun0" created
  Logical volume "lun1" created
an-node02:
lvscan
  ACTIVE            '/dev/ssd_vg0/lun1' [27.95 GiB] inherit
  ACTIVE            '/dev/hdd_vg0/lun0' [421.49 GiB] inherit
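To double-check that the volume groups really were created with the clustered flag, the sixth character of LVM's vg_attr field can be inspected; it is "c" for a clustered VG. The filter below is only a sketch and the function name non_clustered_vgs is made up. It expects "name attr" pairs on standard input.

```shell
# non_clustered_vgs
# Hypothetical filter: reads "vg_name vg_attr" pairs on stdin and prints the
# name of every volume group whose sixth attribute character is not "c".
non_clustered_vgs() {
    awk 'NF >= 2 && substr($2, 6, 1) != "c" { print $1 }'
}
```

Feed it with "vgs --noheadings -o vg_name,vg_attr | non_clustered_vgs". No output means every VG is clustered.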
iSCSI notes
IET vs tgt pros and cons needed.
The default iSCSI port is 3260.
 * initiator: This is the client.
 * target: This is the server side.
 * sid: Session ID; found with iscsiadm -m session -P 1. The SID and sysfs path are not persistent; they depend partly on start order.
 * IQN: iSCSI Qualified Name; a string that uniquely identifies targets and initiators.
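An IQN has the general shape iqn.YYYY-MM.reversed.domain:identifier, where the date is a month in which the naming authority owned the domain. The following is a loose format check only, written as a sketch; the function name is_valid_iqn is hypothetical and the pattern is deliberately lenient about what follows the colon.

```shell
# is_valid_iqn NAME
# Hypothetical helper: succeeds when NAME looks like
# iqn.YYYY-MM.reversed.domain[:identifier]. This is a format check only.
is_valid_iqn() {
    printf '%s\n' "$1" |
        grep -Eq '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:.+)?$'
}
```

For example, "is_valid_iqn iqn.2011-08.com.alteeve:an-clusterA.target01" succeeds, while a name with month 13 is rejected.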
Both:
yum install iscsi-initiator-utils scsi-target-utils
an-node01:
cp /etc/tgt/targets.conf /etc/tgt/targets.conf.orig
vim /etc/tgt/targets.conf
diff -u /etc/tgt/targets.conf.orig /etc/tgt/targets.conf
--- /etc/tgt/targets.conf.orig	2011-07-31 12:38:35.000000000 -0400
+++ /etc/tgt/targets.conf	2011-08-02 22:19:06.000000000 -0400
@@ -251,3 +251,9 @@
 #        vendor_id VENDOR1
 #    </direct-store>
 #</target>
+
+<target iqn.2011-08.com.alteeve:an-clusterA.target01>
+	direct-store /dev/drbd0
+	direct-store /dev/drbd1
+	vendor_id Alteeve
+</target>
rsync -av /etc/tgt/targets.conf root@an-node02:/etc/tgt/
sending incremental file list
targets.conf
sent 909 bytes  received 97 bytes  670.67 bytes/sec
total size is 7093  speedup is 7.05
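If more targets are added later, the stanza can be generated instead of typed by hand. This sketch simply mirrors the layout of the block added to targets.conf above; the function name tgt_target_stanza is made up, and the result should be reviewed before appending it to the real file.

```shell
# tgt_target_stanza IQN VENDOR DEVICE...
# Hypothetical helper: prints a <target> block in the same layout as the one
# added to /etc/tgt/targets.conf above, with one direct-store line per device.
# The body runs in a subshell so its variables do not leak into the caller.
tgt_target_stanza() (
    iqn="$1"
    vendor="$2"
    shift 2
    printf '<target %s>\n' "$iqn"
    for dev in "$@"; do
        printf '\tdirect-store %s\n' "$dev"
    done
    printf '\tvendor_id %s\n' "$vendor"
    printf '</target>\n'
)
```

Calling it with the IQN, "Alteeve", /dev/drbd0 and /dev/drbd1 should reproduce the stanza from the diff.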
Update the cluster
               <service autostart="0" domain="an1_only" exclusive="0" max_restarts="0" name="an1_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd">
                                        <script ref="tgtd"/>
                                </script>
                        </script>
                </service>
                <service autostart="0" domain="an2_only" exclusive="0" max_restarts="0" name="an2_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd">
                                        <script ref="tgtd"/>
                                </script>
                        </script>
                </service>
Connect to the SAN from a VM node
an-node03+:
iscsiadm -m discovery -t sendtargets -p 192.168.2.100
192.168.2.100:3260,1 iqn.2011-08.com.alteeve:an-clusterA.target01
iscsiadm --mode node --portal 192.168.2.100 --target iqn.2011-08.com.alteeve:an-clusterA.target01 --login
Logging in to [iface: default, target: iqn.2011-08.com.alteeve:an-clusterA.target01, portal: 192.168.2.100,3260]
Login to [iface: default, target: iqn.2011-08.com.alteeve:an-clusterA.target01, portal: 192.168.2.100,3260] successful.
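Discovery prints one "portal,tpgt IQN" line per target, so the portal and IQN needed for the login command can be pulled out with a small filter. This is a sketch; the function name discovered_targets is hypothetical and it assumes the two-column output format shown above.

```shell
# discovered_targets
# Hypothetical filter: reads sendtargets discovery output on stdin and prints
# "portal iqn" pairs, dropping the ",N" portal group tag after the port.
discovered_targets() {
    awk 'NF >= 2 { split($1, portal, ","); print portal[1], $2 }'
}
```

Piping "iscsiadm -m discovery -t sendtargets -p 192.168.2.100" through it gives lines that can be read in a while loop and passed to the --portal and --target options of the login command.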
fdisk -l
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00062f4a
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          33      262144   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              33        5255    41943040   83  Linux
/dev/sda3            5255        5777     4194304   82  Linux swap / Solaris
Disk /dev/sdb: 452.6 GB, 452573790208 bytes
255 heads, 63 sectors/track, 55022 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdc: 30.0 GB, 30010245120 bytes
64 heads, 32 sectors/track, 28620 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdc doesn't contain a valid partition table
Setup the VM Cluster
Install RPMs.
yum -y install lvm2-cluster cman fence-agents
Configure lvm.conf.
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.orig
vim /etc/lvm/lvm.conf
diff -u /etc/lvm/lvm.conf.orig /etc/lvm/lvm.conf
--- /etc/lvm/lvm.conf.orig	2011-08-02 21:59:01.000000000 -0400
+++ /etc/lvm/lvm.conf	2011-08-03 00:35:45.000000000 -0400
@@ -308,7 +308,8 @@
     # Type 3 uses built-in clustered locking.
     # Type 4 uses read-only locking which forbids any operations that might 
     # change metadata.
-    locking_type = 1
+    #locking_type = 1
+    locking_type = 3
 
     # Set to 0 to fail when a lock request cannot be satisfied immediately.
     wait_for_locks = 1
@@ -324,7 +325,8 @@
     # to 1 an attempt will be made to use local file-based locking (type 1).
     # If this succeeds, only commands against local volume groups will proceed.
     # Volume Groups marked as clustered will be ignored.
-    fallback_to_local_locking = 1
+    #fallback_to_local_locking = 1
+    fallback_to_local_locking = 0
 
     # Local non-LV directory that holds file-based locks while commands are
     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
rsync -av /etc/lvm/lvm.conf root@an-node04:/etc/lvm/
sending incremental file list
lvm.conf
sent 873 bytes  received 247 bytes  2240.00 bytes/sec
total size is 24625  speedup is 21.99
rsync -av /etc/lvm/lvm.conf root@an-node05:/etc/lvm/
sending incremental file list
lvm.conf
sent 873 bytes  received 247 bytes  2240.00 bytes/sec
total size is 24625  speedup is 21.99
Configure the cluster.
vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="an-clusterB">
        <totem rrp_mode="none" secauth="off"/>
        <clusternodes>
                <clusternode name="an-node03.alteeve.com" nodeid="1">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="3"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node04.alteeve.com" nodeid="2">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="4"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node05.alteeve.com" nodeid="3">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="5"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.6" login="apc" name="pdu2" passwd="secret"/>
        </fencedevices>
        <fence_daemon post_join_delay="30"/>
        <rm>
                <resources>
                        <script file="/etc/init.d/iscsi" name="iscsi" />
                        <script file="/etc/init.d/clvmd" name="clvmd" />
                </resources>
                <failoverdomains>
                        <failoverdomain name="an3_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node03.alteeve.com" />
                        </failoverdomain>
                        <failoverdomain name="an4_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node04.alteeve.com" />
                        </failoverdomain>
                        <failoverdomain name="an5_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node05.alteeve.com" />
                        </failoverdomain>
                </failoverdomains>
                <service autostart="1" domain="an3_only" exclusive="0" max_restarts="0" name="an3_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd"/>
                        </script>
                </service>
                <service autostart="1" domain="an4_only" exclusive="0" max_restarts="0" name="an4_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd"/>
                        </script>
                </service>
                <service autostart="1" domain="an5_only" exclusive="0" max_restarts="0" name="an5_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd"/>
                        </script>
                </service>
        </rm>   
</cluster>
ccs_config_validate
Configuration validates
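Each time cluster.conf is edited, config_version has to go up before the new file is pushed out with cman_tool version -r. A small sketch for working out the next value follows; the function name next_config_version is made up, and it assumes the attribute sits on the opening cluster tag as in the file above.

```shell
# next_config_version [FILE]
# Hypothetical helper: prints the current config_version from FILE
# (default: /etc/cluster/cluster.conf) plus one. Fails if none is found.
# The body runs in a subshell so "exit" only ends the function.
next_config_version() (
    conf="${1:-/etc/cluster/cluster.conf}"
    current=$(sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    [ -n "$current" ] || exit 1
    echo $((current + 1))
)
```

The function only reports the number; editing the file and running ccs_config_validate is still done by hand.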
Make sure iscsi and clvmd do not start on boot, stop both, then make sure they start and stop cleanly.
chkconfig clvmd off; chkconfig iscsi off; /etc/init.d/iscsi stop && /etc/init.d/clvmd stop
Stopping iscsi:                                            [  OK  ]
/etc/init.d/clvmd start && /etc/init.d/iscsi start && /etc/init.d/iscsi stop && /etc/init.d/clvmd stop
Starting clvmd: 
Activating VG(s):   No volume groups found
                                                           [  OK  ]
Starting iscsi:                                            [  OK  ]
Stopping iscsi:                                            [  OK  ]
Signaling clvmd to exit                                    [  OK  ]
clvmd terminated                                           [  OK  ]
Use the cluster to stop (in case it autostarted before now) and then start the services.
# Disable (stop)
clusvcadm -d service:an3_storage
clusvcadm -d service:an4_storage
clusvcadm -d service:an5_storage
# Enable (start)
clusvcadm -e service:an3_storage -m an-node03.alteeve.com
clusvcadm -e service:an4_storage -m an-node04.alteeve.com
clusvcadm -e service:an5_storage -m an-node05.alteeve.com
# Check
clustat
Cluster Status for an-clusterB @ Wed Aug  3 00:25:10 2011
Member Status: Quorate
 Member Name                             ID   Status
 ------ ----                             ---- ------
 an-node03.alteeve.com                       1 Online, Local, rgmanager
 an-node04.alteeve.com                       2 Online, rgmanager
 an-node05.alteeve.com                       3 Online, rgmanager
 Service Name                   Owner (Last)                   State         
 ------- ----                   ----- ------                   -----         
 service:an3_storage            an-node03.alteeve.com          started       
 service:an4_storage            an-node04.alteeve.com          started       
 service:an5_storage            an-node05.alteeve.com          started
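With more nodes the clustat table gets long, so it can help to print only the services that are not in the started state. The filter below is a sketch that reads clustat output in the layout shown above; the function name services_not_started is hypothetical.

```shell
# services_not_started
# Hypothetical filter: reads clustat output on stdin and prints the name of
# every service whose State column (the last field) is not "started".
services_not_started() {
    awk '$1 ~ /^service:/ && $NF != "started" { print $1 }'
}
```

Use it as "clustat | services_not_started". Empty output means all services are running.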
Flush iSCSI's Cache
If you remove an IQN (or change the name of one), the /etc/init.d/iscsi script will return errors.
I am sure there is a more elegant way, but to flush the cache and re-scan:
/etc/init.d/iscsi stop && rm -rf /var/lib/iscsi/nodes/* && iscsiadm -m discovery -t sendtargets -p 192.168.2.100
Setup the VM Cluster's Clustered LVM
Partition the SAN disks
an-node03:
fdisk -l
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00062f4a
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          33      262144   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              33        5255    41943040   83  Linux
/dev/sda3            5255        5777     4194304   82  Linux swap / Solaris
Disk /dev/sdc: 30.0 GB, 30010245120 bytes
64 heads, 32 sectors/track, 28620 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdb: 452.6 GB, 452573790208 bytes
255 heads, 63 sectors/track, 55022 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdb doesn't contain a valid partition table
Create partitions.
fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x403f1fb8.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').
Command (m for help): c
DOS Compatibility flag is not set
Command (m for help): u
Changing display/entry units to sectors
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-55022, default 1): 1
Last cylinder, +cylinders or +size{K,M,G} (1-55022, default 55022): 
Using default value 55022
Command (m for help): p
Disk /dev/sdb: 452.6 GB, 452573790208 bytes
255 heads, 63 sectors/track, 55022 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x403f1fb8
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       55022   441964183+  83  Linux
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): p
Disk /dev/sdb: 452.6 GB, 452573790208 bytes
255 heads, 63 sectors/track, 55022 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x403f1fb8
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       55022   441964183+  8e  Linux LVM
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
fdisk /dev/sdc
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xba7503eb.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').
Command (m for help): c
DOS Compatibility flag is not set
Command (m for help): u
Changing display/entry units to sectors
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (2048-58613759, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-58613759, default 58613759): 
Using default value 58613759
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 8e
Changed system type of partition 1 to 8e (Linux LVM)
Command (m for help): p
Disk /dev/sdc: 30.0 GB, 30010245120 bytes
64 heads, 32 sectors/track, 28620 cylinders, total 58613760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xba7503eb
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048    58613759    29305856   8e  Linux LVM
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
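As a sanity check on the sector-based output, fdisk's Blocks column is in 1 KiB units, so it can be recomputed from the start and end sectors. The helper below is a sketch with a made-up name, partition_kib; it assumes the end sector is inclusive, as fdisk reports it.

```shell
# partition_kib FIRST_SECTOR LAST_SECTOR [SECTOR_SIZE]
# Hypothetical helper: prints the partition size in 1 KiB blocks. The last
# sector is inclusive and the sector size defaults to 512 bytes.
partition_kib() {
    echo $(( ($2 - $1 + 1) * ${3:-512} / 1024 ))
}
```

For /dev/sdc1, "partition_kib 2048 58613759" gives 29305856, which matches the Blocks value printed above.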
fdisk -l
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00062f4a
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          33      262144   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              33        5255    41943040   83  Linux
/dev/sda3            5255        5777     4194304   82  Linux swap / Solaris
Disk /dev/sdc: 30.0 GB, 30010245120 bytes
64 heads, 32 sectors/track, 28620 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xba7503eb
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               2       28620    29305856   8e  Linux LVM
Disk /dev/sdb: 452.6 GB, 452573790208 bytes
255 heads, 63 sectors/track, 55022 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x403f1fb8
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       55022   441964183+  8e  Linux LVM
Setup LVM devices
Create PV.
an-node03:
pvcreate /dev/sd{b,c}1
  Physical volume "/dev/sdb1" successfully created
  Physical volume "/dev/sdc1" successfully created
an-node04 and an-node05:
pvscan
  PV /dev/sdb1                      lvm2 [421.49 GiB]
  PV /dev/sdc1                      lvm2 [27.95 GiB]
  Total: 2 [449.44 GiB] / in use: 0 [0   ] / in no VG: 2 [449.44 GiB]
Create the VGs.
an-node03:
vgcreate -c y san_vg01 /dev/sdb1
  Clustered volume group "san_vg01" successfully created
vgcreate -c y san_vg02 /dev/sdc1
  Clustered volume group "san_vg02" successfully created
an-node04 and an-node05:
vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "san_vg02" using metadata type lvm2
  Found volume group "san_vg01" using metadata type lvm2
Create the first VM's LVs.
an-node03:
lvcreate -L 10G -n shared01 /dev/san_vg01
  Logical volume "shared01" created
lvcreate -L 50G -n vm0001_hdd1 /dev/san_vg01
  Logical volume "vm0001_hdd1" created
lvcreate -L 10G -n vm0001_ssd1 /dev/san_vg02
  Logical volume "vm0001_ssd1" created
an-node04 and an-node05:
lvscan
  ACTIVE            '/dev/san_vg01/shared01' [10.00 GiB] inherit
  ACTIVE            '/dev/san_vg02/vm0001_ssd1' [10.00 GiB] inherit
  ACTIVE            '/dev/san_vg01/vm0001_hdd1' [50.00 GiB] inherit
an-node03:
mkfs.gfs2 -p lock_dlm -j 5 -t an-clusterB:shared01 /dev/san_vg01/shared01
This will destroy any data on /dev/san_vg01/shared01.
It appears to contain: symbolic link to `../dm-2'
Are you sure you want to proceed? [y/n] y
Device:                    /dev/san_vg01/shared01
Blocksize:                 4096
Device Size                10.00 GB (2621440 blocks)
Filesystem Size:           10.00 GB (2621438 blocks)
Journals:                  5
Resource Groups:           40
Locking Protocol:          "lock_dlm"
Lock Table:                "an-clusterB:shared01"
UUID:                      6C0D7D1D-A1D3-ED79-705D-28EE3D674E75
Add it to /etc/fstab (needed for the gfs2 init script to find and mount):
an-node03 - an-node07:
echo `gfs2_edit -p sb /dev/san_vg01/shared01 | grep sb_uuid | sed -e "s/.*sb_uuid  *\(.*\)/UUID=\L\1\E \/shared01\t\tgfs2\trw,suid,dev,exec,nouser,async\t0 0/"` >> /etc/fstab 
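The one-liner above leans on GNU sed's \L to lower-case the UUID. If that is hard to read, the same fstab line can be built in two plain steps once the UUID is known. This is a sketch; the function name gfs2_fstab_line is made up, and the mount options are copied from the command above.

```shell
# gfs2_fstab_line UUID MOUNTPOINT
# Hypothetical helper: lower-cases the hex digits of UUID and prints an
# /etc/fstab entry using the same gfs2 mount options as the one-liner above.
gfs2_fstab_line() (
    uuid_lower=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'UUID=%s %s gfs2 rw,suid,dev,exec,nouser,async 0 0\n' "$uuid_lower" "$2"
)
```

Pass it the UUID reported by mkfs.gfs2 and the mount point, check the output, then append it to /etc/fstab on each node.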
cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Fri Jul  8 22:01:41 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=2c1f4cb1-959f-4675-b9c7-5d753c303dd1 /                       ext3    defaults        1 1
UUID=9a0224dc-15b4-439e-8d7c-5f9dbcd05e3f /boot                   ext3    defaults        1 2
UUID=4f2a83e8-1769-40d8-ba2a-e1f535306848 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
UUID=6c0d7d1d-a1d3-ed79-705d-28ee3d674e75 /shared01 gfs2 rw,suid,dev,exec,nouser,async 0 0
Make the mount point and mount it.
mkdir /shared01
/etc/init.d/gfs2 start
Mounting GFS2 filesystem (/shared01):                      [  OK  ]
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              40G  3.3G   35G   9% /
tmpfs                 1.8G   32M  1.8G   2% /dev/shm
/dev/sda1             248M   85M  151M  36% /boot
/dev/mapper/san_vg01-shared01
                       10G  647M  9.4G   7% /shared01
Stop GFS2 on all five nodes and update the cluster.conf config.
/etc/init.d/gfs2 stop
Unmounting GFS2 filesystem (/shared01):                    [  OK  ]
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              40G  3.3G   35G   9% /
tmpfs                 1.8G   32M  1.8G   2% /dev/shm
/dev/sda1             248M   85M  151M  36% /boot
/dev/mapper/san_vg01-shared01
                       10G  647M  9.4G   7% /shared01
an-node03:
<?xml version="1.0"?>
<cluster config_version="9" name="an-clusterB">
        <totem rrp_mode="none" secauth="off"/>
        <clusternodes>
                <clusternode name="an-node03.alteeve.com" nodeid="3">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="3"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node04.alteeve.com" nodeid="4">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="4"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node05.alteeve.com" nodeid="5">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="5"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node06.alteeve.com" nodeid="6">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="6"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node07.alteeve.com" nodeid="7">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="7"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.6" login="apc" name="pdu2" passwd="secret"/>
        </fencedevices>
        <fence_daemon post_join_delay="30"/>
        <rm>
                <resources>
                        <script file="/etc/init.d/iscsi" name="iscsi"/>
                        <script file="/etc/init.d/clvmd" name="clvmd"/>
                        <script file="/etc/init.d/gfs2" name="gfs2"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="an3_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node03.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an4_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node04.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an5_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node05.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an6_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node06.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an7_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node07.alteeve.com"/>
                        </failoverdomain>
                </failoverdomains>
                <service autostart="1" domain="an3_only" exclusive="0" max_restarts="0" name="an3_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd">
                                        <script ref="gfs2"/>
                                </script>
                        </script>
                </service>
                <service autostart="1" domain="an4_only" exclusive="0" max_restarts="0" name="an4_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd">
                                        <script ref="gfs2"/>
                                </script>
                        </script>
                </service>
                <service autostart="1" domain="an5_only" exclusive="0" max_restarts="0" name="an5_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd">
                                        <script ref="gfs2"/>
                                </script>
                        </script>
                </service>
                <service autostart="1" domain="an6_only" exclusive="0" max_restarts="0" name="an6_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd">
                                        <script ref="gfs2"/>
                                </script>
                        </script>
                </service>
                <service autostart="1" domain="an7_only" exclusive="0" max_restarts="0" name="an7_storage" recovery="restart">
                        <script ref="iscsi">
                                <script ref="clvmd">
                                        <script ref="gfs2"/>
                                </script>
                        </script>
                </service>
        </rm>
</cluster>
cman_tool version -r
Check that rgmanager picked up the updated config and remounted the GFS2 partition.
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              40G  3.3G   35G   9% /
tmpfs                 1.8G   32M  1.8G   2% /dev/shm
/dev/sda1             248M   85M  151M  36% /boot
/dev/mapper/san_vg01-shared01
                       10G  647M  9.4G   7% /shared01
Configure KVM
Host network and VM hypervisor config.
Configure Bridges
On an-node03 through an-node07:
vim /etc/sysconfig/network-scripts/ifcfg-{eth,vbr}0
ifcfg-eth0:
# Internet facing
HWADDR="bc:ae:c5:44:8a:de"
DEVICE="eth0"
BRIDGE="vbr0"
BOOTPROTO="static"
IPV6INIT="yes"
NM_CONTROLLED="no"
ONBOOT="yes"
Note that you can use whatever bridge names make sense to you. However, the file name for the bridge configuration must sort after the ifcfg-ethX file. If the bridge file is read before the ethernet interface, it will fail to come up. Also, the bridge name as defined in the file does not need to match the one used in the actual file name. Personally, I like vbrX for "vm bridge".
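The ordering rule is easy to test before picking a bridge name. The sketch below compares two file names in plain byte order (the C locale); that the network init script reads the files in this order is an assumption here, and the function name sorts_after is made up.

```shell
# sorts_after NAME OTHER
# Hypothetical helper: succeeds when NAME sorts strictly after OTHER in the
# C locale, i.e. by plain byte order.
sorts_after() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | tail -n 1)" = "$1" ]
}
```

"sorts_after ifcfg-vbr0 ifcfg-eth0" succeeds, while a bridge file called ifcfg-br0 would sort before ifcfg-eth0 and fail the check.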
ifcfg-vbr0:
# Bridge - IFN
DEVICE="vbr0"
TYPE="Bridge"
IPADDR=192.168.1.73
NETMASK=255.255.255.0
GATEWAY=192.168.1.254
DNS1=192.139.81.117
DNS2=192.139.81.1
If you do not want the Back-Channel Network to be accessible to the virtual machines, there is no need to set up this second bridge.
vim /etc/sysconfig/network-scripts/ifcfg-{eth,vbr}2
ifcfg-eth2:
# Back-channel
HWADDR="00:1B:21:72:9B:56"
DEVICE="eth2"
BRIDGE="vbr2"
BOOTPROTO="static"
IPV6INIT="yes"
NM_CONTROLLED="no"
ONBOOT="yes"
ifcfg-vbr2:
# Bridge - BCN
DEVICE="vbr2"
TYPE="Bridge"
IPADDR=192.168.3.73
NETMASK=255.255.255.0
Leave the cluster, lest we be fenced.
/etc/init.d/rgmanager stop && /etc/init.d/cman stop
Restart networking and then check that the new bridges are up and that the proper ethernet devices are slaved to them.
/etc/init.d/network restart
Shutting down interface eth0:                              [  OK  ]
Shutting down interface eth1:                              [  OK  ]
Shutting down interface eth2:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]
Bringing up interface eth1:                                [  OK  ]
Bringing up interface eth2:                                [  OK  ]
Bringing up interface vbr0:                                [  OK  ]
Bringing up interface vbr2:                                [  OK  ]
brctl show
bridge name	bridge id		STP enabled	interfaces
vbr0		8000.bcaec5448ade	no		eth0
vbr2		8000.001b21729b56	no		eth2
ifconfig
eth0      Link encap:Ethernet  HWaddr BC:AE:C5:44:8A:DE  
          inet6 addr: fe80::beae:c5ff:fe44:8ade/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:4439 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2752 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:508352 (496.4 KiB)  TX bytes:494345 (482.7 KiB)
          Interrupt:31 Base address:0x8000 
eth1      Link encap:Ethernet  HWaddr 00:1B:21:72:96:E8  
          inet addr:192.168.2.73  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe72:96e8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:617100 errors:0 dropped:0 overruns:0 frame:0
          TX packets:847718 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:772489353 (736.7 MiB)  TX bytes:740536232 (706.2 MiB)
          Interrupt:18 Memory:fe9e0000-fea00000 
eth2      Link encap:Ethernet  HWaddr 00:1B:21:72:9B:56  
          inet6 addr: fe80::21b:21ff:fe72:9b56/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:86586 errors:0 dropped:0 overruns:0 frame:0
          TX packets:80934 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:11366700 (10.8 MiB)  TX bytes:10091579 (9.6 MiB)
          Interrupt:17 Memory:feae0000-feb00000 
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:32 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:11507 (11.2 KiB)  TX bytes:11507 (11.2 KiB)
vbr0      Link encap:Ethernet  HWaddr BC:AE:C5:44:8A:DE  
          inet addr:192.168.1.73  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::beae:c5ff:fe44:8ade/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:165 errors:0 dropped:0 overruns:0 frame:0
          TX packets:89 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:25875 (25.2 KiB)  TX bytes:17081 (16.6 KiB)
vbr2      Link encap:Ethernet  HWaddr 00:1B:21:72:9B:56  
          inet addr:192.168.3.73  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe72:9b56/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:74 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:19021 (18.5 KiB)  TX bytes:4137 (4.0 KiB)
Rejoin the cluster.
/etc/init.d/cman start && /etc/init.d/rgmanager start
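The earlier "brctl show" check can also be scripted, which helps when the same check has to be done on several nodes. This is a rough sketch, not a definitive tool: the helper name bridge_has_iface is hypothetical, and it only inspects the first row for each bridge, so a bridge with several slaved interfaces (which "brctl show" prints on continuation rows) is only partly covered. The sample text is the output shown above, with the tabs replaced by spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads `brctl show` output on stdin and succeeds only
# if the named bridge lists the named interface on its own row.
# $1 = bridge name, $2 = expected slaved interface
bridge_has_iface() {
    awk -v br="$1" -v ifc="$2" '
        # Skip the header row; match the bridge name and the last column.
        NR > 1 && $1 == br && $NF == ifc { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Sample taken from the `brctl show` output in this tutorial.
sample='bridge name     bridge id               STP enabled     interfaces
vbr0            8000.bcaec5448ade       no              eth0
vbr2            8000.001b21729b56       no              eth2'

printf '%s\n' "$sample" | bridge_has_iface vbr0 eth0 && echo "vbr0 has eth0"
printf '%s\n' "$sample" | bridge_has_iface vbr2 eth2 && echo "vbr2 has eth2"
```

On a live node the sample would be replaced by the real command, for example: brctl show | bridge_has_iface vbr0 eth0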
Repeat these configurations, altering for MAC and IP addresses as appropriate, for the other four VM cluster nodes.
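When repeating the steps on the remaining nodes, the bridge file body can be generated rather than retyped. This is a minimal sketch under stated assumptions: the function name render_bridge_cfg is hypothetical, and it covers only the fields used in ifcfg-vbr2 above. The GATEWAY and DNS lines of ifcfg-vbr0, and the HWADDR in the matching ifcfg-ethX file, still need to be set by hand for each node.

```shell
#!/bin/sh
# Hypothetical helper: print the body of a bridge ifcfg file.
# $1 = bridge device (e.g. vbr2)
# $2 = IPv4 address for this node
# $3 = label used in the leading comment (e.g. BCN)
# The /24 netmask matches the subnets used throughout this tutorial.
render_bridge_cfg() {
    cat <<EOF
# Bridge - $3
DEVICE="$1"
TYPE="Bridge"
IPADDR=$2
NETMASK=255.255.255.0
EOF
}

# Prints the same fields as the ifcfg-vbr2 example above.
render_bridge_cfg vbr2 192.168.3.73 BCN
```

The output is written to standard output so that it can be reviewed before being redirected into /etc/sysconfig/network-scripts/ on the target node.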
Benchmarks
GFS2 partition on an-node07's /shared01 partition. Test #1, no optimization:
bonnie++ -d /shared01/ -s 8g -u root:root
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
an-node07.alteev 8G   388  95 22203   6 14875   8  2978  95 48406  10 107.3   5
Latency               312ms   44400ms   31355ms   41505us     540ms   11926ms
Version  1.96       ------Sequential Create------ --------Random Create--------
an-node07.alteeve.c -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1144  18 +++++ +++  8643  56   939  19 +++++ +++  8262  55
Latency               291ms     586us    2085us    3511ms      51us    3669us
1.96,1.96,an-node07.alteeve.com,1,1312497509,8G,,388,95,22203,6,14875,8,2978,95,48406,10,107.3,5,16,,,,,1144,18,+++++,+++,8643,56,939,19,+++++,+++,8262,55,312ms,44400ms,31355ms,41505us,540ms,11926ms,291ms,586us,2085us,3511ms,51us,3669us
Provision vm0001
TODO.
Stuff
| © 2025 Alteeve. Intelligent Availability® is a registered trademark of Alteeve's Niche! Inc. 1997-2025 | |||
| legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions. | |||
