Linux Clustering

From Leo's Notes
Last edited on 1 September 2019, at 06:22.

Set up the two (or more) nodes. Give each one a hostname and an IP address.

Ensure that the firewall is opened for:

* Corosync: UDP 5404, UDP 5405
* pcs/crmsh: TCP 5560 (the pcsd daemon itself listens on TCP 2224 by default)
* ICMP
* Multicast (where?)
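
On a firewalld-based system such as CentOS/RHEL 7, the Corosync ports above might be opened roughly as follows. This is a sketch, assuming firewalld is the firewall in use and that its predefined high-availability service is installed. The run and open_cluster_ports helpers and the DRY_RUN variable are made up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: open the cluster ports with firewalld (assumed to be the firewall in use).
# DRY_RUN is an illustrative variable, not part of firewalld; set DRY_RUN=0 to apply.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

open_cluster_ports() {
    # Corosync traffic, as listed above
    run firewall-cmd --permanent --add-port=5404/udp
    run firewall-cmd --permanent --add-port=5405/udp
    # firewalld's predefined service for the HA stack (pcsd, pacemaker, corosync)
    run firewall-cmd --permanent --add-service=high-availability
    # Make the permanent rules active
    run firewall-cmd --reload
}

open_cluster_ports
```

Run it once as-is to review the commands, then again as root with DRY_RUN=0 on each node.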

Install the cluster software. Heartbeat is deprecated and replaced by corosync.

yum -y install corosync pacemaker pcs


Enable and start the pcs daemon:

systemctl enable pcsd.service
systemctl start pcsd.service

When the pcs package is installed, a 'hacluster' user is created. This system user's password must be set.

passwd hacluster
# echo redhat1 | passwd --stdin hacluster

Authenticate as the hacluster user using pcs:

pcs cluster auth $hostA $hostB
Username: hacluster
Password: <What you just set previously>

Generate and synchronize corosync configs:

pcs cluster setup --name cloudcluster hostA hostB

For example:

pcs cluster setup --name cloudcluster ha hb
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
ha: Succeeded
hb: Succeeded

You can view the generated config at /etc/corosync/corosync.conf. Apart from the individual node entries, it contains nothing specific to your cluster. The 'token' value is the time in milliseconds before a token is considered lost. 'token_retransmits_before_loss_const' is the number of token retransmits attempted after a loss before the cluster is considered dead. You may want to change these two values from their defaults (1 second, 4 retransmits): lower values shorten the failover time at the risk of more false positives, and higher values do the opposite. 'secauth' makes corosync authenticate using the shared secret in /etc/corosync/authkey. If using redundant networking, 'rrp_mode' should not be none.
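
To check which token settings a config actually carries, a small helper can pull them out of the totem section. The following is only a sketch: totem_value is a hypothetical function, not a corosync tool, and the sample file is made up for illustration (a generated config will contain more, and settings left at their defaults may not appear at all). It does not handle blocks nested inside totem.

```shell
#!/bin/sh
# Sketch: print one setting from the totem { } section of a corosync.conf-style file.
# totem_value is a hypothetical helper; it does not handle nested blocks.
totem_value() {
    # $1 = key (e.g. token), $2 = path to the config file
    awk -v key="$1" '
        /^[[:space:]]*totem[[:space:]]*\{/ { in_totem = 1; next }
        in_totem && /\}/                   { in_totem = 0 }
        in_totem {
            line = $0
            sub(/^[[:space:]]*/, "", line)
            n = index(line, ":")
            # Compare the whole key so "token" does not match "token_retransmits..."
            if (n > 0 && substr(line, 1, n - 1) == key) {
                val = substr(line, n + 1)
                gsub(/[[:space:]]/, "", val)
                print val
            }
        }
    ' "$2"
}

# Illustrative sample only, using the default values mentioned above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
totem {
    version: 2
    secauth: off
    cluster_name: cloudcluster
    token: 1000
    token_retransmits_before_loss_const: 4
}
EOF

echo "token (ms): $(totem_value token "$sample")"
echo "retransmits: $(totem_value token_retransmits_before_loss_const "$sample")"
rm -f "$sample"
```

On a cluster node, the second argument would be /etc/corosync/corosync.conf.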

On one of the nodes, you can start the cluster on all the configured nodes using --all:

pcs cluster start --all

Or you can run on each individual node:

pcs cluster start

See the corosync runtime configuration using:

corosync-cmapctl

If you want to see only the members:

corosync-cmapctl | grep members

Check the status of the corosync ring:

corosync-cfgtool -s

Check the status of members:

pcs status corosync

Start pacemaker on all nodes, if it is not already running ('pcs cluster start' normally starts both corosync and pacemaker):

systemctl start pacemaker

See the pacemaker status:

pcs status

Check pacemaker with the crm_mon command:

Last updated: Wed Mar 18 17:57:19 2015
Last change: Wed Mar 18 17:03:52 2015 via crmd on ha
Stack: corosync
Current DC: ha (1) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured


Online: [ ha hb ]
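
For a scripted health check, the Online: line in output like the above can be parsed. This is a minimal sketch, assuming the exact "Online: [ a b ]" format shown. The online_nodes function is a hypothetical helper, and the sample line is copied from the output above.

```shell
#!/bin/sh
# Sketch: list the nodes reported online, given crm_mon / pcs status text on stdin.
# online_nodes is a hypothetical helper; it assumes the "Online: [ a b ]" format.
online_nodes() {
    # Print only the node names between the brackets
    sed -n 's/^Online: \[ *\(.*[^ ]\) *\]$/\1/p'
}

# Sample line from the output above; on a cluster node the input could
# come from `crm_mon -1` (one-shot mode) instead.
nodes=$(echo "Online: [ ha hb ]" | online_nodes)

# Count the names by splitting on whitespace
set -- $nodes
echo "$# node(s) online: $nodes"
```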

Verify the crm configuration:

crm_verify -L -V

To show the configuration on the current node:

cibadmin --query --local
pcs -f 

STONITH (Shoot The Other Node In The Head aka. fencing)

"Just because a node is unresponsive doesn’t mean it has stopped accessing your data. The only way to be 100% sure that your data is safe, is to use STONITH to ensure that the node is truly offline before allowing the data to be accessed from another node."

If you're testing, you can disable STONITH:

pcs property set stonith-enabled=false

TODO: Figure out how to get STONITH working for dell servers...?


Adding Resources

Adding IP Address

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=172.20.20.200 cidr_netmask=32 nic=eth0 op monitor interval=30s

The resource contains:

* ocf: the standard the resource agent conforms to - see all possibilities using `pcs resource standards`
* heartbeat: the provider, which is specific to the standard - `pcs resource providers`
* IPaddr2: the resource agent - `pcs resource agents ocf:heartbeat`
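
The three-part identifier can be taken apart mechanically, which makes the structure explicit. A small sketch, in which split_agent is a hypothetical helper and only the three-part standard:provider:agent form used above is handled:

```shell
#!/bin/sh
# Sketch: split a resource agent identifier into standard, provider and agent.
# split_agent is a hypothetical helper used only for illustration.
split_agent() {
    old_ifs=$IFS
    IFS=:
    # Word-split the identifier on ":" into $1, $2, $3
    set -- $1
    IFS=$old_ifs
    echo "standard=$1 provider=$2 agent=$3"
}

split_agent ocf:heartbeat:IPaddr2
```

The parameters an agent accepts can be listed with `pcs resource describe ocf:heartbeat:IPaddr2`.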

Resources can be listed with the `pcs status` or `pcs resource` command.

If the IP doesn't show up as active, STONITH is probably still enabled with no fence device configured. Pacemaker will not start resources in that state.



STONITH:

http://clusterlabs.org/doc/crm_fencing.html

Devices:
 UPS, PDU, IPMI



See Also