 AN!Wiki :: How To :: 2x5 Scalable Cluster Tutorial

Warning: This tutorial is not even close to complete or accurate. It will be updated later, but so long as this warning is here, consider it defective and unusable. The only up-to-date clustering tutorial is: Red Hat Cluster Service 2 Tutorial.

The Design

All nodes have IPs as follows:
 * eth0 == Internet Facing Network == 192.168.1.x
 * eth1 == Storage Network         == 192.168.2.x
 * eth2 == Back Channel Network    == 192.168.3.x
   * Where 'x' = the node ID (i.e.: an-node01 -> x=1)

 * If a node has an IPMI (or similar) interface piggy-backed on a network
   interface, it will be shared with eth2. If it has a dedicated interface, it
   will be connected to the BCN.
 * Node management interfaces will be on 192.168.3.(x+100)
 * All subnets are /24 (255.255.255.0)

 Storage nodes use 2x SATA drives ('sda' and 'sdb') plus 2x SSD drives ('sdc'
 and 'sdd').
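
 For reference, a minimal sketch of the /etc/hosts entries this scheme implies
 for the first node. The '.ifn', '.sn', '.bcn' and '.ipmi' suffixes are purely
 illustrative; they are not names used later in this tutorial.

 192.168.1.1    an-node01.ifn    # eth0, Internet Facing Network
 192.168.2.1    an-node01.sn     # eth1, Storage Network
 192.168.3.1    an-node01.bcn    # eth2, Back Channel Network
 192.168.3.101  an-node01.ipmi   # management interface (x+100)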

 Logical map:
  ___________________________________________                     ___________________________________________ 
 | [ an-node01 ]                       ______|                   |______                       [ an-node02 ] |
 |  ______    _____    _______        | eth0 =------\     /------= eth0 |        _______    _____    ______  |
 | [_sda1_]--[_md0_]--[_/boot_]       |_____||      |     |      ||_____|       [_/boot_]--[_md0_]--[_sda1_] |
 | [_sdb1_]                                  |      |     |      |                                  [_sdb1_] |
 |  ______    _____    ______          ______|      |     |      |______          ______    _____    ______  |
 | [_sda2_]--[_md1_]--[_swap_]   /----| eth1 =----\ |     | /----= eth1 |----\   [_swap_]--[_md1_]--[_sda2_] |
 | [_sdb2_]                      | /--|_____||    | |     | |    ||_____|--\ |                      [_sdb2_] |
 |  ______    _____    ___       | |         |    | |     | |    |         | |       ___    _____    ______  |
 | [_sda3_]--[_md2_]--[_/_]      | |   ______|    | |     | |    |______   | |      [_/_]--[_md2_]--[_sda3_] |
 | [_sdb3_]                      | |  | eth2 =--\ | |     | | /--= eth2 |  | |                      [_sdb3_] |
 |  ______    _____    _______   | |  |_____||  | | |     | | |  ||_____|  | |   _______    _____    ______  |
 | [_sda5_]--[_md3_]--[_drbd0_]--/ |         |  | | |     | | |  |         | \--[_drbd0_]--[_md3_]--[_sda5_] |
 | [_sdb5_]                        |         |  | | |     | | |  |         |                        [_sdb5_] |
 |  ______    _____    _______     |         |  | | |     | | |  |         |     _______    _____    ______  |
 | [_sdc1_]--[_md4_]--[_drbd1_]----/         |  | | |     | | |  |         \----[_drbd1_]--[_md4_]--[_sdc1_] |
 | [_sdd1_]                                  |  | | |     | | |  |                                  [_sdd1_] |
 |___________________________________________|  | | |     | | |  |___________________________________________|
                                                | | |     | | |
                        /---------------------------/     | | |
                        |                       | |       | | |
                        | /-------------------------------/ | |
                        | |                     | |         | \-----------------------\
                        | |                     | |         |                         |
                        | |                     \-----------------------------------\ |
                        | |                       |         |                       | |
                        | |   ____________________|_________|____________________   | |
                        | |  [ iqn.2011-08.com.alteeve:an-clusterA.target01.hdd  ]  | |
                        | |  [ iqn.2011-08.com.alteeve:an-clusterA.target02.sdd  ]  | |
   _________________    | |  [    drbd0 == hdd == vg01           Floating IP     ]  | |      ___________________
  [ Internet Facing ]   | |  [____drbd1_==_sdd_==_vg02__________192.168.2.100____]  | | /---[ Internal Managed  ]
  [_____Routers_____]   | |                             | |                         | | |   [  Private Network  ]
                  |     | \-----------\                 | |                   /-----/ | |   [_and_fence_devices_]
                  |     \-----------\ |                 | |                   | /-----/ |
                  \---------------\ | |                 | |                   | | /-----/
                                 _|_|_|___________     _|_|_____________     _|_|_|___________
 [ Storage Cluster ]            [ Internet Facing ]   [ Storage Network ]   [  Back-Channel   ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[_____Network_____]~~~[_________________]~~~[_____Network_____]~~~~~~~~~~~~~~~
 [    VM Cluster   ]              | | | | |             | | | | |             | | | | |  
                                  | | | | |             | | | | |             | | | | |
  __________________________      | | | | |             | | | | |             | | | | |
 | [ an-node03 ]      ______|     | | | | |             | | | | |             | | | | |
 |                   | eth0 =-----/ | | | |             | | | | |             | | | | |
 |                   |_____||       | | | |             | | | | |             | | | | |
 |                          |       | | | |             | | | | |             | | | | |
 |                    ______|       | | | |             | | | | |             | | | | |
 |                   | eth1 =---------------------------/ | | | |             | | | | |
 |                   |_____||       | | | |               | | | |             | | | | |
 |                          |       | | | |               | | | |             | | | | |
 |                    ______|       | | | |               | | | |             | | | | |
 |                   | eth2 =-------------------------------------------------/ | | | |
 |                   |_____||       | | | |               | | | |               | | | |
 |__________________________|       | | | |               | | | |               | | | |
                                    | | | |               | | | |               | | | |
  __________________________        | | | |               | | | |               | | | |
 | [ an-node04 ]      ______|       | | | |               | | | |               | | | |
 |                   | eth0 =-------/ | | |               | | | |               | | | |
 |                   |_____||         | | |               | | | |               | | | |
 |                          |         | | |               | | | |               | | | |
 |                    ______|         | | |               | | | |               | | | |
 |                   | eth1 =-----------------------------/ | | |               | | | |
 |                   |_____||         | | |                 | | |               | | | |
 |                          |         | | |                 | | |               | | | |
 |                    ______|         | | |                 | | |               | | | |
 |                   | eth2 =---------------------------------------------------/ | | |
 |                   |_____||         | | |                 | | |                 | | |
 |__________________________|         | | |                 | | |                 | | |
                                      | | |                 | | |                 | | |
  __________________________          | | |                 | | |                 | | |
 | [ an-node05 ]      ______|         | | |                 | | |                 | | |
 |                   | eth0 =---------/ | |                 | | |                 | | |
 |                   |_____||           | |                 | | |                 | | |
 |                          |           | |                 | | |                 | | |
 |                    ______|           | |                 | | |                 | | |
 |                   | eth1 =-------------------------------/ | |                 | | |
 |                   |_____||           | |                   | |                 | | |
 |                          |           | |                   | |                 | | |
 |                    ______|           | |                   | |                 | | |
 |                   | eth2 =-----------------------------------------------------/ | |
 |                   |_____||           | |                   | |                   | |
 |__________________________|           | |                   | |                   | |
                                        | |                   | |                   | |
  __________________________            | |                   | |                   | |
 | [ an-node06 ]      ______|           | |                   | |                   | |
 |                   | eth0 =-----------/ |                   | |                   | |
 |                   |_____||             |                   | |                   | |
 |                          |             |                   | |                   | |
 |                    ______|             |                   | |                   | |
 |                   | eth1 =---------------------------------/ |                   | |
 |                   |_____||             |                     |                   | |
 |                          |             |                     |                   | |
 |                    ______|             |                     |                   | |
 |                   | eth2 =-------------------------------------------------------/ |
 |                   |_____||             |                     |                     |
 |__________________________|             |                     |                     |
                                          |                     |                     |
  __________________________              |                     |                     |
 | [ an-node07 ]      ______|             |                     |                     |
 |                   | eth0 =-------------/                     |                     |
 |                   |_____||                                   |                     |
 |                          |                                   |                     |
 |                    ______|                                   |                     |
 |                   | eth1 =-----------------------------------/                     |
 |                   |_____||                                                         |
 |                          |                                                         |
 |                    ______|                                                         |
 |                   | eth2 =---------------------------------------------------------/
 |                   |_____||
 |__________________________|

Install The Cluster Software

If you are using Red Hat Enterprise Linux, you will need to add the RHEL Server Optional (v. 6 64-bit x86_64) channel for each node in your cluster. You can do this in RHN by going to your subscription management page, clicking on each server, clicking on "Alter Channel Subscriptions", enabling the RHEL Server Optional (v. 6 64-bit x86_64) channel, and then clicking on "Change Subscription".
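
If you prefer the command line, something similar can be done from each node with rhn-channel. This is only a sketch; the channel label below is an assumption, so confirm the exact label for your subscription before running it.

# Add the optional channel for this system (channel label is an assumed example).
rhn-channel --add --channel rhel-x86_64-server-optional-6 --user <rhn-user> --password <rhn-pass>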

The actual installation is simple: just use yum to install cman and the fence agents.

yum install cman fence-agents

Initial Config

Everything uses ricci, which itself needs to have a password set. I set this to match root.

Both:

passwd ricci
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.

With these decisions and the information gathered, here is what our first /etc/cluster/cluster.conf file will look like.

touch /etc/cluster/cluster.conf
vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="an-clusterA" config_version="1">
        <cman two_node="1" expected_votes="1" />
        <totem secauth="off" rrp_mode="none" />
        <clusternodes>
                <clusternode name="an-node01.alteeve.com" nodeid="1">
                        <fence>
                                <method name="PDU">
                                        <device name="pdu2" action="reboot" port="1" />
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node02.alteeve.com" nodeid="2">
                        <fence>
                                <method name="PDU">
                                        <device name="pdu2" action="reboot" port="2" />
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice name="pdu2" agent="fence_apc" ipaddr="192.168.1.6" login="apc" passwd="secret" />
        </fencedevices>
        <rm>
                <resources>
                        <ip address="192.168.2.100" monitor_link="on"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="an1_primary" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node01.alteeve.com" priority="1"/>
                                <failoverdomainnode name="an-node02.alteeve.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <service autostart="1" name="san_ip1" domain="an1_primary">
                        <ip ref="192.168.2.100"/>
                </service>
        </rm>
</cluster>

Save the file, then validate it. If it fails, address the errors and try again.

ip addr list | grep <ip>
rg_test test /etc/cluster/cluster.conf
ccs_config_validate
Configuration validates

Push it to the other node:

rsync -av /etc/cluster/cluster.conf root@an-node02:/etc/cluster/
sending incremental file list
cluster.conf

sent 781 bytes  received 31 bytes  541.33 bytes/sec
total size is 701  speedup is 0.86

Start:


DO NOT PROCEED UNTIL YOUR cluster.conf FILE VALIDATES!

Unless you have it perfect, your cluster will fail.

Once it validates, proceed.

Starting The Cluster For The First Time

By default, if you start one node only and you've enabled the <cman two_node="1" expected_votes="1"/> option as we have done, the lone server will effectively gain quorum. It will try to connect to the cluster, but there won't be a cluster to connect to, so it will fence the other node after a timeout period. This timeout is 6 seconds by default.

For now, we will leave the default as it is. If you're interested in changing it though, the argument you are looking for is post_join_delay.

This behaviour means that we'll want to start both nodes well within six seconds of one another, lest the slower one get needlessly fenced.
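
Should you decide to raise it later, the change is a single attribute on the fence_daemon element in cluster.conf. A later revision of the configuration in this tutorial sets it to 30 seconds, which looks like this:

<fence_daemon post_join_delay="30"/>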

Left off here

Note to help minimize dual-fences:

  • you could add FENCED_OPTS="-f 5" to /etc/sysconfig/cman on *one* node (ilo fence devices may need this)
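
As a sketch, the relevant line in /etc/sysconfig/cman on that one node would simply be:

# Give the peer a 5 second head start in a fence race (set on one node only).
FENCED_OPTS="-f 5"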

DRBD Config

Install from source:

Both:

# Obliterate peer - fence via cman
wget -c https://alteeve.com/files/an-cluster/sbin/obliterate-peer.sh -O /sbin/obliterate-peer.sh
chmod a+x /sbin/obliterate-peer.sh
ls -lah /sbin/obliterate-peer.sh

# Download, compile and install DRBD
wget -c http://oss.linbit.com/drbd/8.3/drbd-8.3.11.tar.gz
tar -xvzf drbd-8.3.11.tar.gz
cd drbd-8.3.11
./configure \
   --prefix=/usr \
   --localstatedir=/var \
   --sysconfdir=/etc \
   --with-utils \
   --with-km \
   --with-udev \
   --with-pacemaker \
   --with-rgmanager \
   --with-bashcompletion
make
make install

Configure

an-node01:

# Configure DRBD's global value.
cp /etc/drbd.d/global_common.conf /etc/drbd.d/global_common.conf.orig
vim /etc/drbd.d/global_common.conf
diff -u /etc/drbd.d/global_common.conf.orig /etc/drbd.d/global_common.conf
--- /etc/drbd.d/global_common.conf.orig	2011-08-01 21:58:46.000000000 -0400
+++ /etc/drbd.d/global_common.conf	2011-08-01 23:18:27.000000000 -0400
@@ -15,24 +15,35 @@
 		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
 		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
 		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
+		fence-peer		"/sbin/obliterate-peer.sh";
 	}
 
 	startup {
 		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
+		become-primary-on	both;
+		wfc-timeout		300;
+		degr-wfc-timeout	120;
 	}
 
 	disk {
 		# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
 		# no-disk-drain no-md-flushes max-bio-bvecs
+		fencing			resource-and-stonith;
 	}
 
 	net {
 		# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
 		# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
 		# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
+		allow-two-primaries;
+		after-sb-0pri		discard-zero-changes;
+		after-sb-1pri		discard-secondary;
+		after-sb-2pri		disconnect;
 	}
 
 	syncer {
 		# rate after al-extents use-rle cpu-mask verify-alg csums-alg
+		# This should be no more than 30% of the maximum sustainable write speed.
+		rate			20M;
 	}
 }
vim /etc/drbd.d/r0.res
resource r0 {
        device          /dev/drbd0;
        meta-disk       internal;
        on an-node01.alteeve.com {
                address         192.168.2.71:7789;
                disk            /dev/sda5;
        }
        on an-node02.alteeve.com {
                address         192.168.2.72:7789;
                disk            /dev/sda5;
        }
}
cp /etc/drbd.d/r0.res /etc/drbd.d/r1.res 
vim /etc/drbd.d/r1.res
resource r1 {
        device          /dev/drbd1;
        meta-disk       internal;
        on an-node01.alteeve.com {
                address         192.168.2.71:7790;
                disk            /dev/sdb1;
        }
        on an-node02.alteeve.com {
                address         192.168.2.72:7790;
                disk            /dev/sdb1;
        }
}
Note: If you have multiple DRBD resources on one (set of) backing disks, consider adding syncer { after <minor-1>; }. For example, tell /dev/drbd1 to wait for /dev/drbd0 by adding syncer { after 0; }. This prevents simultaneous resyncs, which could seriously hurt performance; a resource will pause its resync until the resource it waits on has finished syncing.
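
For example, a hypothetical variant of the r1.res file above that tells /dev/drbd1 to wait for /dev/drbd0 would add the syncer block like so. This is a sketch only; it is not part of the configuration actually used in this tutorial.

resource r1 {
        device          /dev/drbd1;
        meta-disk       internal;
        syncer {
                # Pause this resource's resync until resource 0 (/dev/drbd0) has finished.
                after           0;
        }
        on an-node01.alteeve.com {
                address         192.168.2.71:7790;
                disk            /dev/sdb1;
        }
        on an-node02.alteeve.com {
                address         192.168.2.72:7790;
                disk            /dev/sdb1;
        }
}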

Validate:

drbdadm dump
  --==  Thank you for participating in the global usage survey  ==--
The server's response is:

you are the 369th user to install this version
# /usr/etc/drbd.conf
common {
    protocol               C;
    net {
        allow-two-primaries;
        after-sb-0pri    discard-zero-changes;
        after-sb-1pri    discard-secondary;
        after-sb-2pri    disconnect;
    }
    disk {
        fencing          resource-and-stonith;
    }
    syncer {
        rate             20M;
    }
    startup {
        wfc-timeout      300;
        degr-wfc-timeout 120;
        become-primary-on both;
    }
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error   "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer       /sbin/obliterate-peer.sh;
    }
}

# resource r0 on an-node01.alteeve.com: not ignored, not stacked
resource r0 {
    on an-node01.alteeve.com {
        device           /dev/drbd0 minor 0;
        disk             /dev/sda5;
        address          ipv4 192.168.2.71:7789;
        meta-disk        internal;
    }
    on an-node02.alteeve.com {
        device           /dev/drbd0 minor 0;
        disk             /dev/sda5;
        address          ipv4 192.168.2.72:7789;
        meta-disk        internal;
    }
}

# resource r1 on an-node01.alteeve.com: not ignored, not stacked
resource r1 {
    on an-node01.alteeve.com {
        device           /dev/drbd1 minor 1;
        disk             /dev/sdb1;
        address          ipv4 192.168.2.71:7790;
        meta-disk        internal;
    }
    on an-node02.alteeve.com {
        device           /dev/drbd1 minor 1;
        disk             /dev/sdb1;
        address          ipv4 192.168.2.72:7790;
        meta-disk        internal;
    }
}
rsync -av /etc/drbd.d root@an-node02:/etc/
drbd.d/
drbd.d/global_common.conf
drbd.d/global_common.conf.orig
drbd.d/r0.res
drbd.d/r1.res

sent 3523 bytes  received 110 bytes  7266.00 bytes/sec
total size is 3926  speedup is 1.08

Initialize and First start

Both:

Create the meta-data.

drbdadm create-md r{0,1}
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Attach, connect and confirm (after both have attached and connected):

drbdadm attach r{0,1}
drbdadm connect r{0,1}
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:441969960
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:29309628

There is no data, so force both devices to be instantly UpToDate:

drbdadm -- --clear-bitmap new-current-uuid r{0,1}
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Set both to primary and run a final check.

drbdadm primary r{0,1}
cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:672 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Update the cluster

vim /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="17" name="an-clusterA">
        <cman expected_votes="1" two_node="1"/>
        <totem rrp_mode="none" secauth="off"/>
        <clusternodes>
                <clusternode name="an-node01.alteeve.com" nodeid="1">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="an-node02.alteeve.com" nodeid="2">
                        <fence>
                                <method name="apc_pdu">
                                        <device action="reboot" name="pdu2" port="2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.6" login="apc" name="pdu2" passwd="secret"/>
        </fencedevices>
        <fence_daemon post_join_delay="30"/>
        <rm>
                <resources>
                        <ip address="192.168.2.100" monitor_link="on"/>
                        <script file="/etc/init.d/drbd" name="drbd"/>
                        <script file="/etc/init.d/clvmd" name="clvmd"/>
                        <script file="/etc/init.d/tgtd" name="tgtd"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="an1_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node01.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an2_only" nofailback="1" ordered="0" restricted="1">
                                <failoverdomainnode name="an-node02.alteeve.com"/>
                        </failoverdomain>
                        <failoverdomain name="an1_primary" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="an-node01.alteeve.com" priority="1"/>
                                <failoverdomainnode name="an-node02.alteeve.com" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <service autostart="0" domain="an1_only" exclusive="0" max_restarts="0" name="an1_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd"/>
                        </script>
                </service>
                <service autostart="0" domain="an2_only" exclusive="0" max_restarts="0" name="an2_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd"/>
                        </script>
                </service>
        </rm>
</cluster>
rg_test test /etc/cluster/cluster.conf
Running in test mode.
Loading resource rule from /usr/share/cluster/oralistener.sh
Loading resource rule from /usr/share/cluster/apache.sh
Loading resource rule from /usr/share/cluster/SAPDatabase
Loading resource rule from /usr/share/cluster/postgres-8.sh
Loading resource rule from /usr/share/cluster/lvm.sh
Loading resource rule from /usr/share/cluster/mysql.sh
Loading resource rule from /usr/share/cluster/lvm_by_vg.sh
Loading resource rule from /usr/share/cluster/service.sh
Loading resource rule from /usr/share/cluster/samba.sh
Loading resource rule from /usr/share/cluster/SAPInstance
Loading resource rule from /usr/share/cluster/checkquorum
Loading resource rule from /usr/share/cluster/ocf-shellfuncs
Loading resource rule from /usr/share/cluster/svclib_nfslock
Loading resource rule from /usr/share/cluster/script.sh
Loading resource rule from /usr/share/cluster/clusterfs.sh
Loading resource rule from /usr/share/cluster/fs.sh
Loading resource rule from /usr/share/cluster/oracledb.sh
Loading resource rule from /usr/share/cluster/nfsserver.sh
Loading resource rule from /usr/share/cluster/netfs.sh
Loading resource rule from /usr/share/cluster/orainstance.sh
Loading resource rule from /usr/share/cluster/vm.sh
Loading resource rule from /usr/share/cluster/lvm_by_lv.sh
Loading resource rule from /usr/share/cluster/tomcat-6.sh
Loading resource rule from /usr/share/cluster/ip.sh
Loading resource rule from /usr/share/cluster/nfsexport.sh
Loading resource rule from /usr/share/cluster/openldap.sh
Loading resource rule from /usr/share/cluster/ASEHAagent.sh
Loading resource rule from /usr/share/cluster/nfsclient.sh
Loading resource rule from /usr/share/cluster/named.sh
Loaded 24 resource rules
=== Resources List ===
Resource type: ip
Instances: 1/1
Agent: ip.sh
Attributes:
  address = 192.168.2.100 [ primary unique ]
  monitor_link = on
  nfslock [ inherit("service%nfslock") ]

Resource type: script
Agent: script.sh
Attributes:
  name = drbd [ primary unique ]
  file = /etc/init.d/drbd [ unique required ]
  service_name [ inherit("service%name") ]

Resource type: script
Agent: script.sh
Attributes:
  name = clvmd [ primary unique ]
  file = /etc/init.d/clvmd [ unique required ]
  service_name [ inherit("service%name") ]

Resource type: script
Agent: script.sh
Attributes:
  name = tgtd [ primary unique ]
  file = /etc/init.d/tgtd [ unique required ]
  service_name [ inherit("service%name") ]

Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = an1_storage [ primary unique required ]
  domain = an1_only [ reconfig ]
  autostart = 0 [ reconfig ]
  exclusive = 0 [ reconfig ]
  nfslock = 0
  nfs_client_cache = 0
  recovery = restart [ reconfig ]
  depend_mode = hard
  max_restarts = 0
  restart_expire_time = 0
  priority = 0

Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = an2_storage [ primary unique required ]
  domain = an2_only [ reconfig ]
  autostart = 0 [ reconfig ]
  exclusive = 0 [ reconfig ]
  nfslock = 0
  nfs_client_cache = 0
  recovery = restart [ reconfig ]
  depend_mode = hard
  max_restarts = 0
  restart_expire_time = 0
  priority = 0

Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = san_ip [ primary unique required ]
  domain = an1_primary [ reconfig ]
  autostart = 0 [ reconfig ]
  exclusive = 0 [ reconfig ]
  nfslock = 0
  nfs_client_cache = 0
  recovery = relocate [ reconfig ]
  depend_mode = hard
  max_restarts = 0
  restart_expire_time = 0
  priority = 0

=== Resource Tree ===
service (S0) {
  name = "an1_storage";
  domain = "an1_only";
  autostart = "0";
  exclusive = "0";
  nfslock = "0";
  nfs_client_cache = "0";
  recovery = "restart";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  priority = "0";
  script (S0) {
    name = "drbd";
    file = "/etc/init.d/drbd";
    service_name = "an1_storage";
    script (S0) {
      name = "clvmd";
      file = "/etc/init.d/clvmd";
      service_name = "an1_storage";
    }
  }
}
service (S0) {
  name = "an2_storage";
  domain = "an2_only";
  autostart = "0";
  exclusive = "0";
  nfslock = "0";
  nfs_client_cache = "0";
  recovery = "restart";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  priority = "0";
  script (S0) {
    name = "drbd";
    file = "/etc/init.d/drbd";
    service_name = "an2_storage";
    script (S0) {
      name = "clvmd";
      file = "/etc/init.d/clvmd";
      service_name = "an2_storage";
    }
  }
}
service (S0) {
  name = "san_ip";
  domain = "an1_primary";
  autostart = "0";
  exclusive = "0";
  nfslock = "0";
  nfs_client_cache = "0";
  recovery = "relocate";
  depend_mode = "hard";
  max_restarts = "0";
  restart_expire_time = "0";
  priority = "0";
  ip (S0) {
    address = "192.168.2.100";
    monitor_link = "on";
    nfslock = "0";
  }
}
=== Failover Domains ===
Failover domain: an1_only
Flags: Restricted No Failback
  Node an-node01.alteeve.com (id 1, priority 0)
Failover domain: an2_only
Flags: Restricted No Failback
  Node an-node02.alteeve.com (id 2, priority 0)
Failover domain: an1_primary
Flags: Ordered No Failback
  Node an-node01.alteeve.com (id 1, priority 1)
  Node an-node02.alteeve.com (id 2, priority 2)
=== Event Triggers ===
Event Priority Level 100:
  Name: Default
    (Any event)
    File: /usr/share/cluster/default_event_script.sl
cman_tool version -r
You have not authenticated to the ricci daemon on an-node01.alteeve.com
Password:

an-node01:

clusvcadm -e service:an1_storage
service:an1_storage is now running on an-node01.alteeve.com

an-node02:

clusvcadm -e service:an2_storage
service:an2_storage is now running on an-node02.alteeve.com

Either

cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@an-node01.alteeve.com, 2011-08-01 22:04:32
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:924 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:916 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Configure Clustered LVM

an-node01:

cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.orig
vim /etc/lvm/lvm.conf
diff -u /etc/lvm/lvm.conf.orig /etc/lvm/lvm.conf
--- /etc/lvm/lvm.conf.orig	2011-08-02 21:59:01.000000000 -0400
+++ /etc/lvm/lvm.conf	2011-08-02 22:00:17.000000000 -0400
@@ -50,7 +50,8 @@
 
 
     # By default we accept every block device:
-    filter = [ "a/.*/" ]
+    #filter = [ "a/.*/" ]
+    filter = [ "a|/dev/drbd*|", "r/.*/" ]
 
     # Exclude the cdrom drive
     # filter = [ "r|/dev/cdrom|" ]
@@ -308,7 +309,8 @@
     # Type 3 uses built-in clustered locking.
     # Type 4 uses read-only locking which forbids any operations that might 
     # change metadata.
-    locking_type = 1
+    #locking_type = 1
+    locking_type = 3
 
     # Set to 0 to fail when a lock request cannot be satisfied immediately.
     wait_for_locks = 1
@@ -324,7 +326,8 @@
     # to 1 an attempt will be made to use local file-based locking (type 1).
     # If this succeeds, only commands against local volume groups will proceed.
     # Volume Groups marked as clustered will be ignored.
-    fallback_to_local_locking = 1
+    #fallback_to_local_locking = 1
+    fallback_to_local_locking = 0
 
     # Local non-LV directory that holds file-based locks while commands are
     # in progress.  A directory like /tmp that may get wiped on reboot is OK.
rsync -av /etc/lvm/lvm.conf root@an-node02:/etc/lvm/
sending incremental file list
lvm.conf

sent 2412 bytes  received 247 bytes  5318.00 bytes/sec
total size is 24668  speedup is 9.28

Create the LVM PVs, VGs and LVs.

an-node01:

pvcreate /dev/drbd{0,1}
  Physical volume "/dev/drbd0" successfully created
  Physical volume "/dev/drbd1" successfully created

an-node02:

pvscan
  PV /dev/drbd0                      lvm2 [421.50 GiB]
  PV /dev/drbd1                      lvm2 [27.95 GiB]
  Total: 2 [449.45 GiB] / in use: 0 [0   ] / in no VG: 2 [449.45 GiB]

an-node01:

vgcreate -c y hdd_vg0 /dev/drbd0 && vgcreate -c y ssd_vg0 /dev/drbd1
  Clustered volume group "hdd_vg0" successfully created
  Clustered volume group "ssd_vg0" successfully created

an-node02:

vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "ssd_vg0" using metadata type lvm2
  Found volume group "hdd_vg0" using metadata type lvm2

an-node01:

lvcreate -l 100%FREE -n lun0 /dev/hdd_vg0 && lvcreate -l 100%FREE -n lun1 /dev/ssd_vg0
  Logical volume "lun0" created
  Logical volume "lun1" created

an-node02:

lvscan
  ACTIVE            '/dev/ssd_vg0/lun1' [27.95 GiB] inherit
  ACTIVE            '/dev/hdd_vg0/lun0' [421.49 GiB] inherit

iSCSI notes

IET vs tgt pros and cons needed.

The default iSCSI port is 3260.

 * initiator: This is the client.
 * target: This is the server side.
 * sid: Session ID; found with iscsiadm -m session -P 1. The SID and sysfs path are not persistent; they are partially start-order based.
 * iqn: iSCSI Qualified Name; a string that uniquely identifies targets and initiators.
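
For reference, the IQN used later in this tutorial breaks down as follows; the annotations are explanatory only and are not part of the name.

iqn.2011-08.com.alteeve:an-clusterA.target01
 |    |        |             |
 |    |        |             '-- unique name chosen by the administrator
 |    |        '-- reversed domain name of the naming authority
 |    '-- year and month in which the naming authority registered that domain
 '-- literal "iqn" prefix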

Both:

yum install iscsi-initiator-utils scsi-target-utils

an-node01:

cp /etc/tgt/targets.conf /etc/tgt/targets.conf.orig
vim /etc/tgt/targets.conf
diff -u /etc/tgt/targets.conf.orig /etc/tgt/targets.conf
--- /etc/tgt/targets.conf.orig	2011-07-31 12:38:35.000000000 -0400
+++ /etc/tgt/targets.conf	2011-08-02 22:19:06.000000000 -0400
@@ -251,3 +251,9 @@
 #        vendor_id VENDOR1
 #    </direct-store>
 #</target>
+
+<target iqn.2011-08.com.alteeve:an-clusterA.target01>
+	direct-store /dev/drbd0
+	direct-store /dev/drbd1
+	vendor_id Alteeve
+</target>
rsync -av /etc/tgt/targets.conf root@an-node02:/etc/tgt/
sending incremental file list
targets.conf

sent 909 bytes  received 97 bytes  670.67 bytes/sec
total size is 7093  speedup is 7.05

Update the cluster

Edit /etc/cluster/cluster.conf again (remembering to increment config_version), adding the tgtd script beneath clvmd in both storage services, then push the change out with cman_tool version -r:

                <service autostart="0" domain="an1_only" exclusive="0" max_restarts="0" name="an1_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd">
                                        <script ref="tgtd"/>
                                </script>
                        </script>
                </service>
                <service autostart="0" domain="an2_only" exclusive="0" max_restarts="0" name="an2_storage" recovery="restart" restart_expire_time="0">
                        <script ref="drbd">
                                <script ref="clvmd">
                                        <script ref="tgtd"/>
                                </script>
                        </script>
                </service>
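
Once this change has been pushed out and the an1_storage / an2_storage services are running again (so that tgtd has started), the exported targets can be checked from either node. This is a sketch; the exact output will vary, but it should list iqn.2011-08.com.alteeve:an-clusterA.target01 with a LUN for each direct-store device.

tgt-admin --show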

 
