Two-Node Fedora 14 cluster.conf

Revision as of 21:53, 24 July 2010

This is the /etc/cluster/cluster.conf file used in the Two Node Fedora 13 Cluster.

Note: This is written for a generic node and should be safe to use and adapt for any 2-node cluster.

<?xml version="1.0"?>
<!--
The cluster's name is "an-cluster" and, as this is the first version of this
file, it is set to version "1". Each time this file changes in any way, the
version number will have to be incremented by 1.
-->
<cluster name="an-cluster" config_version="1">
	<!-- The Cluster Manager -->
	<!-- 
	This is a special cman argument to enable cluster services to run
	without quorum. Being a two-node cluster, quorum with a failed node is
	quite impossible. :)
	If we had no need for this, we'd just put in the self-closing element:
	<cman/>
	-->
	<cman two_node="1" expected_votes="1">
	</cman>

	<!-- DLM; The Distributed Lock Manager -->
	<!-- Options must be combined in one <dlm...> statement. -->
	<!-- 
	This tells DLM to automatically determine whether to use TCP or
	SCTP depending on the 'rrp_mode'. You can force one protocol by setting
	this to 'tcp' or 'sctp'. If 'rrp_mode' is 'none', then 'tcp' is used.
	The default is 'detect'.
	-->
	<!-- <dlm protocol="detect" /> -->

	<!--
	This specifies how many 100ths of a second (centiseconds) to wait
	before dlm emits a warning via netlink. This value is used for deadlock
	detection and only applies to lockspaces created with the
	DLM_LSFL_TIMEWARN flag. The default is 5 seconds ('500').
	-->
	<!-- <dlm timewarn="500" /> -->

	<!--
	Setting this to '1' will enable DLM debug messages. The default is '0'
	(disabled).
	Q: Do these messages go to /var/log/messages ?
	-->
	<!-- <dlm log_debug="0" /> -->

	<!-- DLM daemon options -->
	<!--
	This controls fencing recovery dependency. The default is enabled, '1'.
	Set this to '0' to disable fencing dependency.
	Q. Does this allow cman to start when no fence device is configured?
	-->
	<!-- <dlm enable_fencing="1" /> -->

	<!--
	This controls quorum recovery dependency. The default is enabled, '1'.
	Set this to '0' to disable quorum dependency.
	Q. Does this mean that a non-quorum partition will attempt to continue
	   functioning?
	-->
	<!-- <dlm enable_quorum="0" /> -->

	<!--
	This controls the deadlock detection code. The default is '0',
	disabled. Set this to '1' to enable deadlock detection.
	Q. Is this primarily a debugging tool?
	-->
	<!-- <dlm enable_deadlk="0" /> -->

	<!--
	This controls the posix lock code for clustered file systems. This is
	required by cluster-aware filesystems like GFS2, OCFS2 and similar. In
	some cases though, like Oracle RAC, plock is implemented internally and
	thus needs to be disabled in the cluster. Also, plock can be expensive
	in terms of latency and bandwidth. Disabling this may help improve
	performance but should only be done if you are sure you do not need
	posix locking in your cluster. The default is '1', enabled. To disable
	it, set this to '0'.
	
	Unlike 'flock' (file lock), which locks an entire file, plock allows
	for locking parts of a file. When a plock is set, the filesystem must
	know the start and length of the lock. In clustering, this information
	is sent between the nodes via cpg (the closed process group), which is
	a small process layer on top of the totem protocol in corosync.
	Messages are of the form 'take lock (pid, inode, start, length)'.
	Delivery of these messages is kept in the same order on all nodes
	(total order), which is a property of 'virtual synchrony'. For example,
	if you have three nodes, A, B and C, and each node sends two messages,
	cpg ensures that the messages all arrive in the same order on all
	nodes. The messages may arrive as 'c1,a1,a2,b1,b2,c2'; the actual
	order doesn't matter, so long as it is the same everywhere.

	For more information on posix locks, see the 'fcntl' man page and read
	the sections on 'F_SETLK' and 'F_GETLK'.

	For more information on cpg, install the corosync development libraries
	(corosynclib-devel) and then read the 'cpg_overview' man page.
	-->
	<!-- <dlm enable_plock="1" /> -->

	<!--
	This controls the rate of plock operations per second. The default is
	'0', which is "unlimited". Set a positive whole integer to impose a
	limit. This may be needed if excessive plock messages are causing
	network load issues.
	-->
	<!-- <dlm plock_rate_limit="0"/> -->

	<!--
	This controls the plock ownership function. When enabled, performance
	gains may be seen where a given node repeatedly issues the same lock.
	By default, this is set to '1', enabled. This can affect backward
	compatibility with older versions of dlm. To disable it, set this to
	'0'.
	Q. Is this right? This should be explained better.
	-->
	<!-- <dlm plock_ownership="1" /> -->

	<!--
	This is the number of milliseconds to wait before dropping the cache
	of lock information. The default is 10 seconds (10000). The higher
	this value, the better the performance, but the more memory will be
	used.
	NOTE: This value is ignored when 'plock_ownership' is disabled.
	Q. Is this right?
	-->
	<!-- <dlm drop_resources_time="10000" /> -->

	<!--
	This is the number of cached items to attempt to drop each
	'drop_resources_time' milliseconds. The higher this number, the more
	aggressively the cache is trimmed, using less memory at a potential
	cost to performance.
	NOTE: This value is ignored when 'plock_ownership' is disabled.
	Q. Is this right?
	-->
	<!-- <dlm drop_resources_count="10" /> -->

	<!--
	This is the number of milliseconds that a cached item is allowed to go
	unused before it is set to be dropped. The default is 10 seconds
	(10000). The higher this value, the better the performance, but the
	more memory will be used.
	NOTE: This value is ignored when 'plock_ownership' is disabled.
	Q. Is this right?
	-->
	<!-- <dlm drop_resources_age="10000" /> -->

	<!-- All default DLM options listed below. -->
	<dlm protocol="detect" timewarn="500" log_debug="0" enable_fencing="1"
	     enable_quorum="0" enable_deadlk="0" enable_plock="1"
	     plock_rate_limit="0" plock_ownership="1" 
	     drop_resources_time="10000" drop_resources_count="10" 
	     drop_resources_age="10000" />
	
	<!-- GFS Control daemon -->
	<!--
	There are several <gfs_controld...> arguments that are still supported,
	but they have been deprecated in favour of the <dlm_controld...>
	arguments. To see a full list, please read the 'gfs_controld(8)' man
	page.

	The one remaining argument that is still current is 'enable_withdraw'.
	When set to '1', the default, GFS will respond to a withdrawal. To
	disable the response, set this to '0'.
	Q. What does the response actually do?
	-->
	<gfs_controld enable_withdraw="1"/>
	
	<!-- Cluster Nodes -->
	<clusternodes>
		<!-- AN!Cluster Node 1 -->
		<!-- 
		The clusternode 'name' value must match the name returned by
		`uname -n`. The network interface with the IP address mapped to
		this name will be the network used by the totem ring. The totem
		ring is used for cluster communication and reconfiguration, so
		all nodes must use network interfaces on the same network for
		the cluster to form. For the same reason, this name must not
		resolve to the localhost IP address (127.0.0.1/::1).
		
		Optional <clusternode ...> arguments:
		- weight="#"; This sets the DLM lock directory weight. This is
		              a DLM kernel option. A commented sketch follows.
		  Q. This needs better explaining.
		-->
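		<!--
		A minimal sketch of the optional 'weight' argument; the
		value '1' below is illustrative only, not a tested
		recommendation:
		<clusternode name="an-node01.alteeve.com" nodeid="1"
				weight="1">
		</clusternode>
		-->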
		<clusternode name="an-node01.alteeve.com" nodeid="1">
			<!-- 
			By default, an initial totem ring will be created on
			the interface that maps to the name above. Under
			Corosync, this would have been "ring 0". 
			
			To set up a second totem ring, the 'name' must be
			resolvable to an IP address on the network card you
			want your second ring on. Further, all other nodes
			must be set up to use the same network as their second
			ring as well.
			NOTE: Currently broken, do not use until this warning
			NOTE: has been removed.
			-->
			<!--
			<altname name="an-node01-sn" port="6899" 
							mcast="239.94.1.1" />
			-->
			<!-- Fence Devices attached to this node. -->
			<fence>
				<!-- 
				The entries here reference devices defined
				below in the <fencedevices/> section. The
				options passed control how the device is
				called. When multiple devices are listed, they
				are tried in the order that they are listed
				here (a commented sketch follows the first
				method below).
 
				The 'name' argument must match a 'name'
				argument in the '<fencedevice>' section below.
 
				The details must define how 'fenced' will fence
				*this* node.
 
				The 'method' name does not appear to be passed
				to the fence agent; it seems to be useful only
				to the human reader.
 
				All options here are passed as 'var=val' to the
				fence agent, one per line.
 
				Note that 'action' was formerly known as
				'option'. In the 'fence_na' agent, 'option'
				will be converted to 'action' if used.
				--> 
				<method name="node_assassin">
					<device name="batou" port="01"
							 action="reboot"/>
				</method>
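				<!--
				A minimal sketch, assuming this node were also
				wired to the 'motoko' device defined below (the
				method name and port value here are
				hypothetical). fenced would try this method
				only if the first one failed:
				<method name="backup">
					<device name="motoko" port="01"
							 action="reboot"/>
				</method>
				-->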
			</fence>
		</clusternode>
 
		<!-- AN!Cluster Node 2 -->
		<clusternode name="an-node02.alteeve.com" nodeid="2">
			<altname name="an-node02-sn" port="6899"
							 mcast="239.94.1.1" />
			<fence>
				<method name="node_assassin">
					<device name="batou" port="02"
							 action="reboot"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<!--
	The fence device is mandatory and it defines how the cluster will
	handle nodes that have dropped out of communication. In our case,
	we will use the Node Assassin fence device.
	-->
	<fencedevices>
		<!--
		This names the device, the agent (script) that controls it,
		where to find it and how to access it.
		-->
		<fencedevice name="batou" agent="fence_na" 
			ipaddr="batou.alteeve.com"  login="user" 
			passwd="secret" quiet="1"></fencedevice>
		<fencedevice name="motoko" agent="fence_na" 
			ipaddr="motoko.alteeve.com" login="user" 
			passwd="secret" quiet="1"></fencedevice>
		<!--
		If you have two or more fence devices, you can add the extra
		one(s) below. The cluster will attempt to fence a bad node
		using these devices in the order that they appear.
		-->
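		<!--
		A minimal sketch only; the device name and address below are
		hypothetical:
		<fencedevice name="kusanagi" agent="fence_na"
			ipaddr="kusanagi.alteeve.com" login="user"
			passwd="secret" quiet="1"></fencedevice>
		-->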
	</fencedevices>
 
	<!-- When the cluster starts, any nodes not yet in the cluster may be
	fenced. By default, there is a 6 second buffer, but this isn't very
	much time. The following argument increases the time window during
	which other nodes can join before being fenced. I like to give up to
	one minute, but the Red Hat man page suggests 20 seconds. Please do
	your own testing to determine what time is needed for your environment.
	-->
	<fence_daemon post_join_delay="60">
	</fence_daemon>
</cluster>

 
