RHCS 2 Overview
Note: This is a dumping ground for me to use while reading in more detail about RHCS v2 clustering.
Four major types of clusters:
- Storage (GFS/GFS2)
- High availability (aka "Failover Cluster", High Availability Systems Management)
- Load balancing (Linux Virtual Server)
- High performance (aka "Computational Cluster" or "Grid Computing")
Implemented clusters can combine the above types.
The major components of RHCS are:
- Cluster infrastructure - Provides fundamental functions for nodes to work together as a cluster: configuration-file management, membership management, lock management, and fencing.
- High-availability Service Management - Provides failover of services from one cluster node to another in case a node becomes inoperative.
- Cluster administration tools - Configuration and management tools for setting up, configuring, and managing a Red Hat cluster. The tools are for use with the Cluster Infrastructure components, the High-availability and Service Management components, and storage.
- Linux Virtual Server (LVS) - Routing software that provides IP load balancing. LVS runs on a pair of redundant servers that distribute client requests evenly to the real servers behind them.
Optional components that are outside the scope of RHCS are:
- GFS2 - Global File System v2 provides a cluster file system for use with Red Hat Cluster Suite. GFS/GFS2 allows multiple nodes to share storage at a block level as if the storage were connected locally to each cluster node.
- Cluster Logical Volume Manager (CLVM) - Provides volume management of cluster storage.
- Global Network Block Device (GNBD) - An ancillary component of GFS/GFS2 that exports block-level storage to Ethernet. This is an economical way to make block-level storage available to GFS2.
- A cluster service can run on only one cluster node at a time to maintain data integrity.
- Failover domains are not required for operation. A failover domain is a subset of cluster nodes that are eligible to run a particular cluster service.
- GFS/GFS2 supports up to 16 nodes. It works, but is not supported on a single node (not counting failed nodes leaving just one node alive). Partitions can be, in theory, up to 8 EB in size, but are only supported up to 25 TB. GFS partitions must be created on a CLVM-backed linear or mirrored logical volume.
- A change needs to be made to /etc/lvm/lvm.conf: specifically, locking_type must be set to 3. This can be done by calling lvmconf --enable-cluster.
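As a sketch of what that change looks like, the following works on a scratch copy of the file so it is safe to run anywhere; on a real node you would run `lvmconf --enable-cluster` against /etc/lvm/lvm.conf itself (the /tmp path below is just for illustration):

```shell
# Build a minimal lvm.conf fragment to demonstrate the edit.
cat > /tmp/lvm.conf.demo <<'EOF'
global {
    # 1 = local, file-based locking; 3 = built-in clustered locking (CLVM)
    locking_type = 1
}
EOF

# Switch to clustered locking, as `lvmconf --enable-cluster` would do.
sed -i 's/locking_type = 1/locking_type = 3/' /tmp/lvm.conf.demo

# Verify the change took effect.
grep 'locking_type' /tmp/lvm.conf.demo
```

On a cluster node you would then restart the clvmd service so the new locking type takes effect.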
- GNBD is designed to be a GFS-specific implementation of NBD (Network Block Device). It's useful when more robust storage, like Fibre Channel, is not needed or not affordable.
Command Line Tools
- ccs_tool: Cluster Configuration System Tool
- a program for making online updates to the cluster configuration file. It provides the capability to create and modify cluster infrastructure components (for example, creating a cluster, adding and removing a node). For more information about this tool, refer to the ccs_tool(8) man page.
- cman_tool: Cluster Management Tool
- a program that manages the CMAN cluster manager. It provides the capability to join a cluster, leave a cluster, kill a node, or change the expected quorum votes of a node in a cluster. For more information about this tool, refer to the cman_tool(8) man page.
- fence_tool: a program used to join or leave the default fence domain. Specifically, it starts the fence daemon (fenced) to join the domain and kills fenced to leave the domain. For more information about this tool, refer to the fence_tool(8) man page.
- clustat: Cluster Status Utility
- The clustat command displays the status of the cluster. It shows membership information, quorum view, and the state of all configured user services. For more information about this tool, refer to the clustat(8) man page.
- clusvcadm: Cluster User Service Administration Utility
- The clusvcadm command allows you to enable, disable, relocate, and restart high-availability services in a cluster. For more information about this tool, refer to the clusvcadm(8) man page.
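As a hedged sketch, typical day-to-day use of clustat and clusvcadm looks like the following; the service name "webserver" and node name "an-node02" are hypothetical, and these commands only work on a node of a running cluster:

```shell
clustat                              # show membership, quorum view and service state
clusvcadm -r webserver -m an-node02  # relocate the service to node an-node02
clusvcadm -d webserver               # disable (stop) the service
clusvcadm -e webserver               # enable (start) the service again
```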
- CCS: Cluster Configuration System
- CLVM: Clustered Logical Volume Manager
- CMAN: Cluster MANager
- DLM: Distributed Lock Manager
- GFS2: Global File System (version 2)
- GNBD: Global Network Block Device
This section discusses particular considerations to bear in mind when using various cluster-aware services.
All nodes in a cluster must be aware of GFS2 (and GFS) partitions, regardless of whether a given node ever intends to use them. Ensure that you consider this when creating the GFS2 filesystem: create journals for the total number of nodes in your cluster, not simply for the nodes you plan to mount the GFS2 partition on.
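For example, a sketch of formatting a GFS2 filesystem for a four-node cluster, even if only two nodes will mount it today; the cluster name, filesystem name and device path are assumptions, and this must only ever be run against an empty clustered LV:

```shell
# -p lock_dlm          use the DLM lock manager (required in a cluster)
# -t cluster:fsname    lock table; "an-cluster" and "shared01" are hypothetical
# -j 4                 four journals, one per node in the cluster
mkfs.gfs2 -p lock_dlm -t an-cluster:shared01 -j 4 /dev/an-vg01/shared01
```

Sizing journals for the whole cluster up front means the remaining nodes can mount the filesystem later without reformatting (additional journals can also be added afterwards with gfs2_jadd).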
- Fencing is required for any cluster with shared storage, period. No, fence_manual is not supported.
- Bonding is recommended to protect against single-link failures.
- If using on-board out-of-band management, like IPMI, then you must install ACPI and have it working to ensure an immediate and complete shutdown. Further, you must disable ACPI Soft-Off. Leaving it enabled could cause slow recovery or, possibly, a failed fence call which would, in turn, hang the cluster. You have a few options for disabling ACPI Soft-Off. In order of preference, they are:
  - Disable the acpid daemon (chkconfig acpid off), so that pressing the power button powers the node off immediately.
  - Disable ACPI Soft-Off in the BIOS, setting the power button to instant-off, if your BIOS supports it.
  - As a last resort, boot with acpi=off appended to the kernel command line.
- IPv6 is not supported on EL5 clusters.
- Any managed switches used in the cluster must support IGMP and have it enabled for the cluster nodes.
Ports To Open
If you run a firewall on a node, the ports and protocols in the table below must be opened for access by all other cluster nodes and, if used, by remote management machines.
|Port(s)||Protocol||Component|
|50006, 50008, 50009||TCP||ccsd|
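A sketch of opening these ports with EL5-era iptables; the 192.168.1.0/24 source subnet is an assumption and should be replaced with your cluster's back-channel network:

```shell
# Allow the ccsd TCP ports from the other cluster nodes.
iptables -I INPUT -s 192.168.1.0/24 -p tcp -m multiport \
         --dports 50006,50008,50009 -j ACCEPT

# Persist the rule across reboots (EL5 initscripts style).
service iptables save
```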
The luci Cluster Manager
Modern versions of luci do not use luci_admin anymore. If luci_admin was not included on your client, as is the case with recent Fedora releases, try logging in with a system account. The new luci uses PAM for authentication.
The rgmanager daemon is used to control and manage high-availability services. It has three core concepts, listed in the following sections.
These are not required in a cluster. When used, they provide a subset of nodes (from one node to all nodes) that a given service or resource can or should run on.
There are five types of failover domains.
An unrestricted failover domain is one where the defined subset of nodes are preferred, when available, but a service within the domain can migrate to a node outside the domain if necessary.
A restricted failover domain is one where services are only allowed to run on the defined nodes. If none of the defined nodes are available, the service will stop and can not be started until a node in the domain rejoins the cluster.
In an unordered failover domain, no priority is given to any specific node in the domain. Thus, if a service fails on a node, the cluster will choose any one of the other nodes in the domain at random to restart the service on.
In an ordered failover domain, each node has a defined priority. When a service fails, the available node with the highest priority is chosen to start the service. The valid range is 1 to 100, with 1 being the highest priority.
When using an ordered failover domain, failback may be used to indicate whether the service should automatically return to the higher-priority node when that node returns to the cluster. The risk is that, if a node repeatedly fails, reboots and rejoins the cluster, a failback-enabled service may migrate very frequently, causing performance or availability issues.
- A given failover domain can be a combination of unrestricted/restricted and unordered/ordered.
- Changing a failover domain configuration has no effect on currently running services.
- A cluster does not inherently require a failover domain.
- A failover domain may consist of only one node.
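The behaviours above map onto attributes in /etc/cluster/cluster.conf. A minimal sketch of an ordered, unrestricted domain with failback disabled; the domain and node names are hypothetical:

```xml
<!-- ordered="1": nodes are tried by priority (lower number = preferred).
     restricted="0": the service may run outside the domain if needed.
     nofailback="1": do not migrate back when a preferred node rejoins. -->
<failoverdomains>
  <failoverdomain name="web_domain" ordered="1" restricted="0" nofailback="1">
    <failoverdomainnode name="an-node01" priority="1"/>
    <failoverdomainnode name="an-node02" priority="2"/>
  </failoverdomain>
</failoverdomains>
```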
Resources are items that can be used by any cluster node. These can be IP addresses, filesystem mounts, user-created scripts and so on. Resources can be public or private.
A private resource can only be used with one service.
A public resource can be used with multiple services.
A service consists of one or more resources in a cohesive group, assigned to a failover domain and configured with a recovery policy. A service can only run on one node at a time.
Services are represented as a resource tree, known as an HA service or, alternatively, a resource tree or resource group (all are the same thing). The resources in a service's resource tree can be a parent, a child or a sibling of the other resources in the group. At the root of these resources is a special resource called the service resource.
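A sketch of such a resource tree in cluster.conf; all names, addresses and paths here are hypothetical:

```xml
<!-- The <service> element is the root "service resource"; the <ip>,
     <fs> and <script> resources beneath it are siblings in the tree. -->
<rm>
  <resources>
    <!-- a "public" resource, declared once and reusable by reference -->
    <ip address="192.168.1.100" monitor_link="1"/>
  </resources>
  <service autostart="1" domain="web_domain" name="webserver" recovery="relocate">
    <ip ref="192.168.1.100"/>
    <fs name="web_data" device="/dev/an-vg01/web" mountpoint="/var/www" fstype="ext3"/>
    <script name="httpd" file="/etc/init.d/httpd"/>
  </service>
</rm>
```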
- Do not use qdisk on DRBD resources. It fails silently and badly.
Any questions, feedback, advice, complaints or meanderings are welcome.
Us: Alteeve's Niche! | Support: Mailing List | IRC: #clusterlabs on Freenode | © Alteeve's Niche! Inc. 1997-2019
Legal stuff: All info is provided "As-Is". Do not use anything here unless you are willing and able to take responsibility for your own actions.