AN!Cluster Recovery

From AN!Wiki


Note: This tutorial is designed for users and customers who are running AN!Clusters based on the 2-Node Red Hat KVM Cluster Tutorial.

This tutorial provides basic, to-the-point steps to guide you through recovery of an AN!Cluster after a fault. It is not designed as a primary education on this topic. For a proper understanding of the steps below, please take the time to study the main tutorial.

Node Was Uncleanly Rebooted, VM(s) Offline

If a node was rebooted using the reboot command, the VMs may have crashed in a way that leaves them in the failed state. Once a VM enters the failed state, it will not recover without human intervention. This section describes how to check for and recover from this scenario.

Note: This section assumes you have a dedicated "cluster monitor" machine. If you don't, use a machine running Linux with virt-manager installed.

Recovery Steps When Both Nodes Are Running

Note: In this case, the service should be enabled from the known-good node. For this reason, the -m <target node> option is intentionally left off the enable step. This causes the VM to restart on the known-good node, assuming you run the command on the node that was not rebooted.

This recovery procedure assumes that the rebooted node was restored.

  1. Log into your cluster monitor machine (or a workstation with virt-manager installed).
    1. Open up four terminals as described here.
    2. On the top two terminals, ssh into the first node in the cluster.
    3. On the bottom two terminals, ssh into the second node in the cluster.
    4. On the top and bottom right terminals, run clear; tail -f -n 0 /var/log/messages.
  2. Run clustat on the top and bottom left windows.
    1. Examine the state of each VM service (vm:vmXXXX-desc).
    2. For each VM whose state is failed, you need to disable and then enable it. The disable step is critical as that is how you tell the cluster that you have examined the problem and determined that the service is safe to recover.
    3. Run: clusvcadm -d vm:vmXXXX-desc, then re-run clustat. Ensure the VM is no longer running; after clusvcadm -d, rgmanager normally reports the state as disabled.
    4. Run: clusvcadm -e vm:vmXXXX-desc, then re-run clustat. Ensure the new state is started.
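
The disable-then-enable sequence in step 2 can be sketched as a small shell helper. This is only an illustration and not part of the original procedure. It assumes that each service line printed by clustat begins with the service name and ends with the state. The function names failed_vms and recover_vm and the DRY_RUN variable are hypothetical. By default the helper only prints the commands so that you can review them before running anything.

```shell
#!/bin/bash
# Sketch only; not part of the original tutorial.
# Assumption: each service line in clustat output starts with the service
# name (vm:...) and ends with the service state.

# Read clustat-style output on stdin and print the names of the VM
# services whose state is "failed".
failed_vms() {
    awk '$1 ~ /^vm:/ && $NF == "failed" { print $1 }'
}

# Disable, then enable, one service. No "-m <target node>" is passed, so
# run this on the known-good node. With DRY_RUN=1 (the default here) the
# commands are printed instead of executed.
recover_vm() {
    local svc="$1" run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"
    $run clusvcadm -d "$svc" || return 1
    $run clusvcadm -e "$svc" || return 1
}

# Possible usage on the known-good node (prints the commands only):
#   clustat | failed_vms | while read -r svc; do recover_vm "$svc"; done
# Re-run clustat afterwards and confirm each VM is in the started state.
```

Setting DRY_RUN=0 would execute the commands instead of printing them. Checking clustat by hand after each step, as the procedure describes, remains the safer approach because the disable step is your confirmation that the service is safe to recover.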


Any questions, feedback, advice, complaints or meanderings are welcome.
Us: Alteeve's Niche! Support: Mailing List IRC: #clusterlabs on Freenode   © Alteeve's Niche! Inc. 1997-2019
legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.