Linux Clustering
Set up the two (or more) nodes. Give them hostnames and IP addresses.
Ensure that the firewall is open for:
* Corosync: UDP 5404, UDP 5405
* pcs/crmsh: TCP 5560
* ICMP
* Multicast (where?)
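On RHEL/CentOS 7 (which the yum/systemctl commands below assume), a hedged firewalld sketch: either the predefined high-availability service or explicit port rules should cover this.
firewall-cmd --permanent --add-service=high-availability
# or open the ports listed above explicitly:
firewall-cmd --permanent --add-port=5404-5405/udp
firewall-cmd --permanent --add-port=5560/tcp
firewall-cmd --reload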
Install the cluster software. Heartbeat is deprecated and has been replaced by Corosync.
yum -y install corosync pacemaker pcs
Enable and start the pcs daemon:
systemctl enable pcsd.service
systemctl start pcsd.service
When the pcs package is installed, a 'hacluster' user is created. This system user's password must be set.
passwd hacluster
- or, non-interactively: echo redhat1 | passwd --stdin hacluster
Authenticate as the hacluster user using pcs:
pcs cluster auth $hostA $hostB
Username: hacluster
Password: <What you just set previously>
Generate and synchronize corosync configs:
pcs cluster setup --name cloudcluster hostA hostB
e.g.:
pcs cluster setup --name cloudcluster ha hb
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
ha: Succeeded
hb: Succeeded
You can view the generated config at /etc/corosync/corosync.conf. Apart from the list of individual nodes, it contains nothing cluster-specific. The 'token' value is the amount of time to wait before a token is considered lost (default 1 second). 'token_retransmits_before_loss_const' is the number of times a token is retransmitted after a loss (default 4) before the node is declared failed and a new membership is formed. You may want to change these two values from their defaults to lower the failover time at the risk of more false positives, or vice versa. 'secauth' causes corosync to authenticate with the shared secret in /etc/corosync/authkey. If you are using redundant networking, 'rrp_mode' should not be none.
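For illustration only, a tuned totem stanza might look like the sketch below; the values are examples showing where these knobs live, not what pcs generates and not recommendations.
totem {
    version: 2
    cluster_name: cloudcluster
    transport: udpu
    # authenticate with the shared key in /etc/corosync/authkey
    secauth: on
    # milliseconds to wait before declaring the token lost (default 1000)
    token: 3000
    # retransmit attempts before declaring a loss (default 4)
    token_retransmits_before_loss_const: 10
    # only relevant when redundant rings are configured
    rrp_mode: passive
}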
On one of the nodes, you can start the cluster on all the configured nodes using --all:
pcs cluster start --all
Or you can run on each individual node:
pcs cluster start
See the corosync runtime configuration using corosync-cmapctl:
corosync-cmapctl | grep members (if you want to see only the members)
Check the status of the corosync ring:
corosync-cfgtool -s
Check the status of members:
pcs status corosync
Start pacemaker on all nodes:
systemctl start pacemaker
See the pacemaker status:
pcs status
Check pacemaker with the crm_mon command:
Last updated: Wed Mar 18 17:57:19 2015
Last change: Wed Mar 18 17:03:52 2015 via crmd on ha
Stack: corosync
Current DC: ha (1) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured
Online: [ ha hb ]
Verify the CRM configuration:
crm_verify -L -V
To show the configuration on the current node:
cibadmin --query --local
(The pcs -f <file> option runs pcs commands against a saved CIB file instead of the live cluster.)
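A hedged sketch of the batch-editing workflow the -f option enables; the file name is arbitrary and nothing takes effect until the file is pushed back:
pcs cluster cib my_cib.xml                              # save the live CIB to a file
pcs -f my_cib.xml property set stonith-enabled=false    # queue a change against the file
pcs cluster cib-push my_cib.xml                         # push the queued changes to the live cluster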
STONITH (Shoot The Other Node In The Head aka. fencing)
"Just because a node is unresponsive doesn’t mean it has stopped accessing your data. The only way to be 100% sure that your data is safe, is to use STONITH to ensure that the node is truly offline before allowing the data to be accessed from another node."
If you're testing, you can disable STONITH:
pcs property set stonith-enabled=false
TODO: Figure out how to get STONITH working for Dell servers...?
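One possible (untested) approach for servers whose BMC speaks IPMI, such as Dell iDRAC, is the fence_ipmilan agent. The node name, iDRAC address, and credentials below are placeholders, and each node would normally get its own fence resource pointing at the other node's BMC:
pcs stonith create fence-ha fence_ipmilan pcmk_host_list="ha" ipaddr="<idrac-ip-of-ha>" login="<ipmi-user>" passwd="<ipmi-password>" lanplus="1" op monitor interval=60s
Once fencing actually works, re-enable it:
pcs property set stonith-enabled=true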
Adding Resources
Adding IP Address
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=172.20.20.200 cidr_netmask=32 nic=eth0:0 op monitor interval=30s
The resource contains:
* ocf: the standard - see all possibilities using `pcs resource standards`
* heartbeat: the standard-specific provider - `pcs resource providers`
* IPaddr2: the resource agent - `pcs resource agents ocf:heartbeat`
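Before creating a resource, you can check which parameters a given agent accepts (a quick sketch using the agent from the example above):
pcs resource describe ocf:heartbeat:IPaddr2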
Resources can be listed with the `pcs status` or `pcs resource` command.
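To sanity-check failover of the ClusterIP resource, a hedged sketch assuming the node names ha and hb from the earlier example:
pcs status resources            # ClusterIP should be Started on one node
pcs cluster standby ha          # push resources off the node currently holding the IP
pcs status resources            # ClusterIP should now be Started on hb
pcs cluster unstandby ha        # return the node to service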
If the IP doesn't show up as active, then you probably still have STONITH enabled without any fence devices configured.
STONITH:
http://clusterlabs.org/doc/crm_fencing.html
Devices: UPS, PDU, IPMI
See Also
- https://skcave.wordpress.com/2014/11/04/creating-high-availability-cluster-with-centos-7/
- http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/_configure_the_cluster_software.html