Watchdog Recovery

Note: This tutorial is written using Fedora 16.

The new fence_sanlock fence agent provides a new fencing option to users who may not have full out-of-band management or switched PDUs. It aims to provide a critical function in clusters to users who otherwise would have no (affordable) options.

Warning: This technology is TechPreview! There is no support for this fence method yet. Feedback and bug reports are much appreciated.

About Fencing

Traditionally in clustering, all nodes must be in a known state. In practice, this meant that when a node stopped responding, the rest of the cluster could not safely proceed until the silent node was put into a known state.

The action of putting a node into a known state is called "fencing". Typically, this is done by one of the other nodes in the cluster either isolating or forcibly powering off the lost node.

  • With isolation, the lost node would not itself be touched, but its network link(s) would be disabled. This would ensure that even if the node recovered, it would no longer have access to the cluster or its shared storage. This form of fencing is called "fabric fencing".
  • The far more common form of fencing is to forcibly power off the lost node. This is done by using an external device, like a server's out-of-band management card (IPMI, iLO, etc) or by using a network-connected power bar, called a PDU.

In either case, the purpose of fencing is to ensure that the lost node will not be able to access clustered resources, like shared storage, or provide clustered services in an uncoordinated manner. Skipping this crucial step could cause data loss, so it is critical to always use fencing in clusters.

Watchdog Timers

Many motherboards have "watchdog" timers built in. These timers will cause the host machine to reboot if the system appears to freeze for a period of time. The new fence_sanlock agent combines these with SAN storage to provide an alternative fence method.

Where "fabric fencing" can be thought of as a form of ostracism and "power fencing" can be thought of as a form of murder, watchdog fencing can be thought of as a form of suicide. Morbid, but accurate.

Important Note On Timing

Watchdog timers work by running a constant countdown. The host has to periodically and reliably reset this timer. If the watchdog timer is allowed to expire, the host machine will be reset. This timeout is often measured in minutes.
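To make this concrete, the sketch below shows roughly what a watchdog daemon (such as wdmd) does on the host's behalf. It is only an illustration; it assumes a watchdog driver is loaded and exposing /dev/watchdog, and that the driver was not built with "nowayout" (in which case the timer can never be disarmed). Do not run it casually, because closing the device without the magic character will reboot the machine.

# Opening /dev/watchdog arms the timer; each write "pets" it and restarts the countdown.
# If the petting stops before the timeout expires, the hardware resets the host.
exec 3> /dev/watchdog            # arm the watchdog
while true; do
    echo -n . >&3                # pet the timer
    sleep 10                     # must be comfortably shorter than the watchdog timeout
done
# To disarm cleanly, a daemon writes the magic character 'V' before closing the device:
#   echo -n V >&3 ; exec 3>&-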

Traditional fencing methods communicate with external devices that can report success as soon as the target node has been fenced. This process is usually measured in a small number of seconds.

When a cluster loses contact with a node, it blocks by design. It is not safe to proceed until all nodes are in a known state, so the users of the cluster services will notice a period of interruption until the lost node recovers or is fenced.

Putting this together, this timing difference means that any watchdog-based fencing will be much slower than traditional fencing. Your users will most likely experience an outage of several minutes while fence_sanlock works. For this reason, fence_sanlock should be used only when traditional fence methods are unavailable.

In short: watchdog fencing is not a replacement for traditional fencing. It is only a replacement for no fencing at all.

How fence_sanlock Works

ToDo.

Setup

Requirements

You will need;

  • A hardware watchdog timer.
  • External shared storage.

From the hardware end of things, you have to have a hardware watchdog timer. Many workstation and server mainboards offer this as a built-in feature which can be enabled in the BIOS. If your system does not have this, add-in and external watchdog timers can be used and are relatively inexpensive. To further save costs, some open-hardware watchdog timer designs are available for those handy with a soldering iron.

Note: Software watchdog timers exist but they are not supported in production. They rely on the host functioning to at least some degree, which is a fatal design flaw. A simple test of issuing echo c > /proc/sysrq-trigger will demonstrate the flaw in using software watchdog timers.
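Before going further, you can check whether your system already exposes a hardware watchdog. The module name below (iTCO_wdt) is only a common example; the correct driver depends on your chipset.

ls -l /dev/watchdog                # the device node appears once a watchdog driver is loaded
lsmod | grep -iE 'wdt|watchdog'    # look for a loaded watchdog driver
# If nothing is loaded, load the driver for your chipset, for example:
modprobe iTCO_wdt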

You need to install;

  • cman ver. 3.1.99 +
  • wdmd ver. 2.6 + (available from the sanlock ver. 2.6 + package)
  • fence_sanlock ver. 2.6 +
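Once the packages below are installed, the versions can be checked against this list (the package names match the Fedora RPMs used later in this tutorial):

rpm -q cman sanlock fence-sanlock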

Options

There are currently two mechanisms to trigger a node recovery via a watchdog device:

  1. fence_sanlock: Preferred method, but always requires shared storage.
  2. checkquorum.wdmd: Only requires shared storage for 2-node clusters.

Installation

To install fence_sanlock;

Note: Update this to use yum once the RPMs are available in the main repos.
yum install openais modcluster
Resolving Dependencies
--> Running transaction check
---> Package modcluster.x86_64 0:0.18.7-3.fc16 will be installed
--> Processing Dependency: libfence.so.4()(64bit) for package: modcluster-0.18.7-3.fc16.x86_64
--> Processing Dependency: libcman.so.3()(64bit) for package: modcluster-0.18.7-3.fc16.x86_64
---> Package openais.x86_64 0:1.1.4-2.fc15 will be installed
--> Processing Dependency: openaislib = 1.1.4-2.fc15 for package: openais-1.1.4-2.fc15.x86_64
--> Processing Dependency: corosync >= 1.0.0-1 for package: openais-1.1.4-2.fc15.x86_64
--> Running transaction check
---> Package clusterlib.x86_64 0:3.1.92-1.fc16 will be installed
--> Processing Dependency: libconfdb.so.4(COROSYNC_CONFDB_1.0)(64bit) for package: clusterlib-3.1.92-1.fc16.x86_64
--> Processing Dependency: libconfdb.so.4()(64bit) for package: clusterlib-3.1.92-1.fc16.x86_64
---> Package corosync.x86_64 0:1.4.3-1.fc16 will be installed
---> Package openaislib.x86_64 0:1.1.4-2.fc15 will be installed
--> Running transaction check
---> Package corosynclib.x86_64 0:1.4.3-1.fc16 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================
 Package                  Arch                Version                    Repository            Size
====================================================================================================
Installing:
 modcluster               x86_64              0.18.7-3.fc16              updates              189 k
 openais                  x86_64              1.1.4-2.fc15               fedora               190 k
Installing for dependencies:
 clusterlib               x86_64              3.1.92-1.fc16              updates               72 k
 corosync                 x86_64              1.4.3-1.fc16               updates              170 k
 corosynclib              x86_64              1.4.3-1.fc16               updates              149 k
 openaislib               x86_64              1.1.4-2.fc15               fedora                88 k

Transaction Summary
====================================================================================================
Install  2 Packages (+4 Dependent packages)

Total download size: 858 k
Installed size: 1.9 M
Is this ok [y/N]: y
Downloading Packages:
(1/6): clusterlib-3.1.92-1.fc16.x86_64.rpm                                   |  72 kB     00:00     
(2/6): corosync-1.4.3-1.fc16.x86_64.rpm                                      | 170 kB     00:00     
(3/6): corosynclib-1.4.3-1.fc16.x86_64.rpm                                   | 149 kB     00:00     
(4/6): modcluster-0.18.7-3.fc16.x86_64.rpm                                   | 189 kB     00:00     
(5/6): openais-1.1.4-2.fc15.x86_64.rpm                                       | 190 kB     00:00     
(6/6): openaislib-1.1.4-2.fc15.x86_64.rpm                                    |  88 kB     00:00     
----------------------------------------------------------------------------------------------------
Total                                                               803 kB/s | 858 kB     00:01     
Running Transaction Check
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : corosynclib-1.4.3-1.fc16.x86_64                                                  1/6 
  Installing : corosync-1.4.3-1.fc16.x86_64                                                     2/6 
  Installing : openaislib-1.1.4-2.fc15.x86_64                                                   3/6 
  Installing : openais-1.1.4-2.fc15.x86_64                                                      4/6 
  Installing : clusterlib-3.1.92-1.fc16.x86_64                                                  5/6 
  Installing : modcluster-0.18.7-3.fc16.x86_64                                                  6/6 
  Verifying  : openais-1.1.4-2.fc15.x86_64                                                      1/6 
  Verifying  : openaislib-1.1.4-2.fc15.x86_64                                                   2/6 
  Verifying  : clusterlib-3.1.92-1.fc16.x86_64                                                  3/6 
  Verifying  : modcluster-0.18.7-3.fc16.x86_64                                                  4/6 
  Verifying  : corosync-1.4.3-1.fc16.x86_64                                                     5/6 
  Verifying  : corosynclib-1.4.3-1.fc16.x86_64                                                  6/6 

Installed:
  modcluster.x86_64 0:0.18.7-3.fc16                  openais.x86_64 0:1.1.4-2.fc15                 

Dependency Installed:
  clusterlib.x86_64 0:3.1.92-1.fc16                 corosync.x86_64 0:1.4.3-1.fc16                  
  corosynclib.x86_64 0:1.4.3-1.fc16                 openaislib.x86_64 0:1.1.4-2.fc15                

Complete!
rpm -Uvh http://fabbione.fedorapeople.org/watchdog_fencing/cman-3.1.99-1.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/corosync-1.4.4-2.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/fence-sanlock-2.6-1.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/sanlock-2.6-1.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/sanlock-lib-2.6-1.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/corosync-1.4.4-2.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/corosynclib-1.4.4-2.fc16.x86_64.rpm \
         http://fabbione.fedorapeople.org/watchdog_fencing/clusterlib-3.1.99-1.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/cman-3.1.99-1.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/corosync-1.4.4-2.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/fence-sanlock-2.6-1.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/sanlock-2.6-1.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/sanlock-lib-2.6-1.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/corosync-1.4.4-2.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/corosynclib-1.4.4-2.fc16.x86_64.rpm
Retrieving http://fabbione.fedorapeople.org/watchdog_fencing/clusterlib-3.1.99-1.fc16.x86_64.rpm
warning: package corosync-1.4.4-2.fc16.x86_64 was already added, skipping corosync-1.4.4-2.fc16.x86_64
Preparing...                ########################################### [100%]
   1:corosynclib            ########################################### [ 14%]
   2:corosync               ########################################### [ 29%]
   3:sanlock-lib            ########################################### [ 43%]
   4:sanlock                ########################################### [ 57%]
   5:clusterlib             ########################################### [ 71%]
   6:cman                   ########################################### [ 86%]
   7:fence-sanlock          ########################################### [100%]

Configuring fence_sanlock

We need to disable the wdmd and sanlock daemons and then enable the fence_sanlockd daemon.

systemctl disable wdmd.service
systemctl status wdmd.service
wdmd.service
	  Loaded: loaded (/lib/systemd/system/wdmd.service; disabled)
	  Active: inactive (dead)
	  CGroup: name=systemd:/system/wdmd.service
systemctl disable sanlock.service
systemctl status sanlock.service
sanlock.service
	  Loaded: loaded (/lib/systemd/system/sanlock.service; disabled)
	  Active: inactive (dead)
	  CGroup: name=systemd:/system/sanlock.service
systemctl enable fence_sanlockd.service
ln -s '/lib/systemd/system/fence_sanlockd.service' '/etc/systemd/system/multi-user.target.wants/fence_sanlockd.service'
systemctl status fence_sanlockd.service
fence_sanlockd.service
	  Loaded: loaded (/lib/systemd/system/fence_sanlockd.service; enabled)
	  Active: inactive (dead)
	  CGroup: name=systemd:/system/fence_sanlockd.service
Note: If you are using a pre-systemd OS, use chkconfig instead of systemctl, as shown below.
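For example, on a chkconfig-based system the equivalent of the steps above would look something like this (the init script names are assumed to match the systemd units used here):

chkconfig wdmd off
chkconfig sanlock off
chkconfig fence_sanlockd on
chkconfig --list fence_sanlockd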

Now stop cman and then start fence_sanlockd. If you're not running the cluster yet, then stopping cman is not needed.

systemctl stop cman.service
systemctl status cman.service
cman.service - LSB: Starts and stops cman
	  Loaded: loaded (/etc/rc.d/init.d/cman)
	  Active: inactive (dead)
	  CGroup: name=systemd:/system/cman.service
systemctl start fence_sanlockd.service
systemctl status fence_sanlockd.service
fence_sanlockd.service
	  Loaded: loaded (/lib/systemd/system/fence_sanlockd.service; enabled)
	  Active: active (running) since Thu, 11 Oct 2012 16:53:12 -0400; 3s ago
	 Process: 2000 ExecStart=/lib/systemd/systemd-fence_sanlockd start (code=exited, status=0/SUCCESS)
	Main PID: 2067 (fence_sanlockd)
	  CGroup: name=systemd:/system/fence_sanlockd.service
		  └ 2067 fence_sanlockd -w

Now we can start the cman service again.

systemctl start cman.service
systemctl status cman.service
cman.service - LSB: Starts and stops cman
	  Loaded: loaded (/etc/rc.d/init.d/cman)
	  Active: active (running) since Thu, 11 Oct 2012 16:58:10 -0400; 7s ago
	 Process: 2072 ExecStart=/etc/rc.d/init.d/cman start (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/cman.service
		  ├ 2133 corosync -f
		  ├ 2191 fenced
		  └ 2203 dlm_controld

Example Configuration

ToDo.

Note: fence_sanlock fencing and unfencing operations can take up to several minutes to complete. This is normal and expected behaviour. Other than this, fencing works like any other fence device implementation; from a user perspective there are no operational differences.
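As a starting point, here is a minimal sketch adapted from the RHEL 6.4 notes further down this page. The node names, host_id values and lease device path are placeholders for your own cluster; fence_sanlock needs both a fence and an unfence section per node, and host_id must be between 1 and 128.

<cluster>
  <clusternodes>
    <clusternode name="node01" nodeid="1">
      <fence>
        <method name="1">
          <device name="wd" host_id="1"/>
        </method>
      </fence>
      <unfence>
        <device name="wd" host_id="1" action="on"/>
      </unfence>
    </clusternode>
    <!-- node02 follows the same pattern with host_id="2" -->
  </clusternodes>
  <fencedevices>
    <fencedevice name="wd" agent="fence_sanlock" device="/dev/fence/leases"/>
  </fencedevices>
</cluster>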

Configuring checkquorum.wdmd

Setting up checkquorum.wdmd:

yum install wdmd
chkconfig wdmd on
cp /usr/share/cluster/checkquorum.wdmd /etc/wdmd.d/
chown root:root /etc/wdmd.d/checkquorum.wdmd
chmod u+x /etc/wdmd.d/checkquorum.wdmd

Then edit /etc/sysconfig/wdmd and add -S1 to the wdmd startup options.

(need to review this once we merge branch and ship default wdmd.sysconfig)

service wdmd stop
service wdmd start
service cman start

checkquorum.wdmd does not require any fencing configuration in cluster.conf to be operational. It can be configured via /etc/sysconfig/checkquorum:

# Amount of time in seconds to wait after quorum is lost before taking the action
waittime=60

# Action to take if quorum is missing for longer than waittime
# autodetect|hardreboot|crashdump|watchdog
action=autodetect

The action is taken immediately if corosync crashes or exits abnormally. The action is delayed by "waittime" seconds if the node had quorum and then lost it (since it might still be possible to rejoin the cluster and regain quorum).

  • action hardreboot will trigger a hard reboot of the kernel.
  • action crashdump will trigger a kdump action in the kernel.
  • action watchdog will return an error to wdmd, which will allow the watchdog to reboot the machine.
  • action autodetect will attempt a crash dump if kdump is running; if kdump is not running, it will return an error to wdmd.

If the node is running kdump, it is encouraged to use fence_kdump to allow the failed node to notify the other cluster nodes that a reboot has taken place, speeding up cluster recovery times. Use of fence_kdump is optional and can be considered an optimization. The following limitations apply with or without the use of fence_kdump:

  • checkquorum.wdmd does not work with cman two_node=1. It does work in a 2-node cluster when used in combination with qdiskd master_win mode, but given the shared storage requirement, it is safer to use fence_sanlock.
  • When using checkquorum.wdmd, fencing is considered complete when the failed node rejoins the cluster in a clean state (after a reboot). If the cluster heartbeat network between the failed node and the rest of the cluster does NOT work, the cluster will wait indefinitely.
  • If a node suffers permanent hardware damage or power loss, the cluster will not be able to recover automatically.
  • In the unusual case that all cluster nodes are disconnected from each other (permanent network loss), all nodes will assume that they need to reboot as a consequence of quorum loss.
  • In the current implementation it is not supported to use both fence_sanlock and checkquorum.wdmd at the same time.


Here are my own notes; they mostly overlap with what Fabio has written above.

1. Watchdog-based fencing with sanlock and shared storage
  • fence_sanlock configured in cluster.conf
  • wdmd/watchdog used to reset nodes when they fail or are fenced
  • wdmd uses the watchdog device via /dev/watchdog
  • sanlock leases on shared storage verify nodes are reset; fencing is completed by acquiring the lease of the victim
  • Requires shared storage to all nodes
  • Limitations: slow to start and fence (each can take up to 5 min); does not work in two node clusters
  • Setup: create and initialize shared storage, enable fence_sanlockd
  • How it works: see fence_sanlock(8)

2. Watchdog-assisted recovery without fencing
  • No fencing configured in cluster.conf
  • Fencing completed by a failed node rejoining the cluster
  • wdmd/watchdog script used to reset nodes in recovery situations
  • wdmd uses the watchdog device via /dev/watchdog
  • Loss of quorum results in a cluster reset
  • Limitations: manual intervention required if nodes lose power or experience persistent loss of the cluster network; does not work in two node clusters
  • Setup: enable the checkquorum script, enable wdmd
  • How it works: wdmd/checkquorum/watchdog resets the node; fencing is completed when the node cleanly rejoins the cluster after reset

3. Watchdog-assisted recovery with limited kexec/kdump fencing
  • fence_kdump configured in cluster.conf
  • Fencing completed by a failed node rejoining the cluster or entering the kexec environment
  • wdmd/watchdog script used to enter kexec in recovery situations
  • wdmd uses the watchdog device via /dev/watchdog
  • Limitations: manual intervention required if nodes lose power or experience persistent loss of the cluster network
  • Setup: enable the checkquorum script, enable wdmd
  • How it works: wdmd/checkquorum cause the node to enter the kexec environment; fence_kdump waits for an ack from kdump running in the victim, or fencing is completed when the node cleanly rejoins the cluster after reset

Those same points in sentence form. Leaving out 3 since I'm not sure if we want to use that. There are two methods for making use of a watchdog device in a cluster. They are available as Technical Preview in RHEL 6.4.

NOTES about #3: I think it's difficult to define when we might need fence_kdump or not. It all depends on recovery time, reboot time, how long the node should take to rejoin the cluster to provide services, etc., and I am not entirely sure how to define it. There isn't really a "best practice" yet on which to base proper recommendations. How about we leave it out of the equation for the 6.4 TP and see what feedback we receive from customers?

1. Watchdog-based fencing with sanlock and shared storage


This method uses the watchdog device for fencing using the fence_sanlock agent. The wdmd daemon and watchdog device are used to reset nodes when they fail or are fenced. sanlock leases on shared storage are used to verify that nodes are reset. Fencing completes successfully when fence_sanlock acquires the lease of the victim node.

Shared storage is required among all nodes. Cluster startup, specifically the unfencing step, can take up to 5 minutes. fence_sanlock can take up to 5 minutes to fence another node. fence_sanlock can be used in two node clusters. See the fence_sanlock man page for a complete description of how it works.

Instructions:

a) Install the sanlock and sanlock-fence-agent packages. sanlock-fence-agent includes fence_sanlock and fence_sanlockd. sanlock includes sanlock and wdmd.

b) Load the watchdog module. A watchdog kernel module should be loaded for the system's hardware watchdog. If no hardware watchdog is available, or no module is loaded, the wdmd init script will load the softdog module, which emulates a hardware watchdog. Set the module to be loaded at system startup by adding it to ?

c) Create shared storage. 1G of shared storage must be available for sanlock leases. This is typically an lvm lv (non-clustered), but could also be another block device. e.g.

vgcreate fence /dev/sdb2
lvcreate -n leases -L 1G fence

d) Initialize shared storage. After creating the shared device, it must be initialized:

fence_sanlock -o sanlock_init -p /dev/fence/leases

e) Edit cluster.conf. fence_sanlock requires both fence and unfence sections. The host_id for each node must be between 1 and 128.

<cluster>
  <clusternodes>
    <clusternode name="node01" nodeid="1">
      <fence>
        <method name="1">
          <device name="wd" host_id="1"/>
        </method>
      </fence>
      <unfence>
        <device name="wd" host_id="1" action="on"/>
      </unfence>
    </clusternode>
    <clusternode name="node02" nodeid="2">
      <fence>
        <method name="1">
          <device name="wd" host_id="2"/>
        </method>
      </fence>
      <unfence>
        <device name="wd" host_id="2" action="on"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="wd" agent="fence_sanlock" device="/dev/fence/leases"/>
  </fencedevices>
</cluster>

f) Enable fence_sanlockd. Turn on the fence_sanlockd init script and make sure that the sanlock and wdmd init scripts are off (fence_sanlockd starts them itself).

chkconfig fence_sanlockd on
chkconfig sanlock off
chkconfig wdmd off

g) Start services.

service fence_sanlockd start
service cman start

2. Watchdog-assisted recovery without fencing


This method uses the watchdog device to reset nodes that have hung, where corosync has terminated uncleanly, or where quorum is lost. The wdmd daemon runs the checkquorum script once every 10 seconds to test for these cluster conditions in which the node should be reset. The wdmd daemon and watchdog device reset a node when checkquorum returns a failed exit code for 5-6 consecutive tests.

With this method, no fencing is configured for nodes in cluster.conf. Fencing completes successfully only when a failed node cleanly rejoins the cluster after being reset (or the fencing is manually overridden).

When a cluster loses quorum due to other nodes leaving or failing, the remaining inquorate nodes will eventually be reset by this method, even if there is nothing wrong apart from the lack of quorum. This is a very different behavior from normal cluster configurations, where nodes in an inquorate cluster will simply wait for quorum to be regained.

Manual intervention is required to override fencing if a node loses power that is not restored, since the node will not restart and rejoin the cluster in this case. Similarly, manual intervention is required to reset a node and then override fencing if a node loses its cluster network connection and it is not restored. This method should not be used for two node clusters.

Instructions:

a) Load the watchdog module. A watchdog kernel module should be loaded for the system's hardware watchdog. If no hardware watchdog is available, or no module is loaded, the wdmd init script will load the softdog module, which emulates a hardware watchdog. Set the module to be loaded at system startup by adding it to ?

b) Enable the checkquorum script.

cp /usr/share/cluster/checkquorum.wdmd /etc/wdmd.d/

c) Enable wdmd script support.

echo WDMDOPTS=\"-G sanlock -S 1\" > /etc/sysconfig/wdmd

d) Enable wdmd.

chkconfig wdmd on

e) Start services.

service wdmd start
service cman start

Setting Up tgtd As A SAN

This is not related to fence_sanlock per se, but these are the notes I used to create a SAN to test it. Note that this is hosted on a machine outside of the cluster with the IP address 10.255.0.222 and is exporting a storage device found at /dev/mmcblk0. A proper SAN device should be used in production, of course.

On the SAN server:

yum install scsi-target-utils
vim /etc/tgt/conf.d/sanlock.conf
<target iqn.2012-10.ca.alteeve:an-cluster-01.sanlock01>
	direct-store /dev/mmcblk0
	vendor_id Alteeve
</target>
systemctl start tgtd.service
systemctl status tgtd.service
tgtd.service - tgtd iSCSI target daemon
	  Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled)
	  Active: active (running) since Thu, 11 Oct 2012 17:22:20 -0400; 10min ago
	 Process: 13552 ExecStop=/usr/sbin/tgtadm --op delete --mode system (code=exited, status=0/SUCCESS)
	 Process: 13548 ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null (code=exited, status=0/SUCCESS)
	 Process: 13546 ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
	 Process: 13622 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v ready (code=exited, status=0/SUCCESS)
	 Process: 13589 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG (code=exited, status=0/SUCCESS)
	 Process: 13587 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
	Main PID: 13586 (tgtd)
	  CGroup: name=systemd:/system/tgtd.service
		  └ 13586 /usr/sbin/tgtd -f

On the cluster nodes:

Make sure they can see the SAN;

iscsiadm -m discovery -t sendtargets -p 10.255.0.222
10.255.0.222:3260,1 iqn.2012-10.ca.alteeve:an-cluster-01.sanlock01

Then connect to the LUN;

iscsiadm --mode node --portal 10.255.0.222 --target iqn.2012-10.ca.alteeve:an-cluster-01.sanlock01 --login
Logging in to [iface: default, target: iqn.2012-10.ca.alteeve:an-cluster-01.sanlock01, portal: 10.255.0.222,3260] (multiple)
Login to [iface: default, target: iqn.2012-10.ca.alteeve:an-cluster-01.sanlock01, portal: 10.255.0.222,3260] successful.

Looks good. Confirm with fdisk;

fdisk -l
<snip>
Disk /dev/sdb: 125 MB, 125960192 bytes
4 heads, 61 sectors/track, 1008 cylinders, total 246016 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

Done!
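As a reminder from the notes above, the exported device would then be initialized for sanlock leases on one node before being used by fence_sanlock. The path below is just an example, and note that the notes call for 1 GB of lease space, so a larger device than this 125 MB test disk would be needed in practice:

fence_sanlock -o sanlock_init -p /dev/sdb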

 

Any questions, feedback, advice, complaints or meanderings are welcome.