Pacemaker is a cluster resource manager used in Red Hat/CentOS 6.5+/7. This is a short post on considerations for building a server cluster using Pacemaker. This is not a tutorial.
A cluster is a democracy for computers. The primary purpose of a cluster member (node) is voting on the availability and viability of other members. A member does not necessarily have to do anything other than vote, and not all nodes in a cluster need to do the same thing.
A node is quorate when it can communicate with more than half of the cluster's total membership, counting itself. Any node that is inquorate should fence itself off from the cluster, and the cluster should attempt to fence any node or resource that it decides is unavailable.
There is no useful quorum without at least three nodes, and there should be an odd number of them to avoid tied votes.
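The arithmetic behind this is simple majority voting; a quick sketch:

```shell
# A partition is quorate when it holds a strict majority of votes:
# floor(N/2) + 1 for a cluster of N nodes.
majority() {
    echo $(( $1 / 2 + 1 ))
}

majority 3   # -> 2: the cluster survives one failed node
majority 4   # -> 3: still survives only one failure
majority 5   # -> 3: survives two failures
```

Adding a fourth node raises the majority threshold without raising the number of tolerable failures, which is why odd counts are preferred.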
Fencing and STONITH
Fencing removes "bad" nodes or resources from the cluster. "Shoot the other node in the head" (STONITH) is the most extreme form of fencing. STONITH will power off or reboot a node.
There is little point ensuring service continuity if the underlying data is toast.
Do I Need STONITH?
Almost always, the answer is Yes!
STONITH devices range from power switches with an IP address (not to be confused with network switches) to the iDRAC interfaces built into Dell servers. STONITH may mean budgeting for extra hardware and rack units.
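As a sketch, registering an IPMI-style fence device with pcs might look like the following. All names, addresses, and credentials are placeholders, and option names vary between fence-agents versions:

```shell
# Register an IPMI fence device for node1 (all values are placeholders).
pcs stonith create fence-node1 fence_ipmilan \
    pcmk_host_list="node1" ipaddr="10.0.0.101" \
    login="admin" passwd="secret" lanplus="1" \
    op monitor interval=60s

# A node should never be responsible for fencing itself.
pcs constraint location fence-node1 avoids node1
```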
If your fence device is not supported you may need to write your own fence agent or source it from elsewhere.
Fence agents match the glob `/usr/sbin/fence_*` on CentOS 6.x.
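A custom fence agent is essentially an executable that reads name=value options from stdin and performs the requested action. A minimal, hypothetical skeleton, written as a shell function for demonstration (the device name and echoed output are invented; a real agent must also implement the metadata action and actually drive the hardware):

```shell
# Skeleton fence agent for an imaginary power device. Pacemaker passes
# options on stdin, one name=value pair per line.
fence_acmepdu() {
    action="status"
    while IFS= read -r line; do
        case "$line" in
            action=*) action="${line#action=}" ;;
            port=*)   port="${line#port=}" ;;
        esac
    done
    case "$action" in
        on|off|reboot)
            # A real agent would talk to the power device here (SNMP, ssh, ...).
            echo "device call: $action port $port"
            ;;
        status)
            # A real agent would query the device for the port's power state.
            echo "status: on"
            ;;
        *)
            return 1
            ;;
    esac
}

# How the cluster would invoke it:
printf 'action=off\nport=3\n' | fence_acmepdu   # -> device call: off port 3
```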
A resource can be practically anything. Typically resources are services like Apache HTTP Server or "virtual" IP addresses (VIPs) that move between nodes during failover. Resource agents are scripts that start, stop and monitor the state of resources.
Pacemaker supports three basic categories of resource:
- Primitives (the default) - run on a single node at a time
- Clones - run on multiple nodes at a time
- Multi-state - a specialization of clones for things like master/slave
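For illustration, the three categories map onto pcs roughly as follows. The resource names and parameter values are made up, and agent options may differ by version:

```shell
# Primitive: a virtual IP that runs on one node at a time.
pcs resource create vip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

# Clone: a connectivity check that runs on every node.
pcs resource create ping ocf:pacemaker:ping \
    host_list=192.168.1.1 --clone

# Multi-state: a DRBD volume with one master and one slave.
pcs resource create data ocf:linbit:drbd \
    drbd_resource=r0 --master
```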
It is important to understand how resource agents work when building complex clusters.
Consider an asymmetric 4-node cluster with two master/slave pairs that use the same service for different purposes:

- `foo` - the init.d script, started with `service foo start`
- `Bar` - a resource defining a master/slave pair that relies on `foo`
  - Has constraints to prefer nodes 1 & 2
  - Has constraints to avoid nodes 3 & 4
- `Baz` - another resource defining a master/slave pair that relies on `foo`
  - Has constraints to prefer nodes 3 & 4
  - Has constraints to avoid nodes 1 & 2
If all the resource agent does to ensure that `Bar` is stopped on nodes 3 & 4 is call `service foo status`, it will find `foo` running there (because `Baz` uses it too) and mistakenly stop a service that `Baz` depends on.
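One way out, sketched below, is to have the agent's monitor check something instance-specific, such as a per-instance pidfile, rather than the shared init script. The function name and pidfile convention here are hypothetical:

```shell
# OCF exit codes used by monitor actions.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Monitor Bar via its own pidfile (e.g. /var/run/foo-bar.pid) instead of
# "service foo status", so Baz's copy of foo is never mistaken for Bar's.
bar_monitor() {
    pidfile="$1"
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_NOT_RUNNING"
}
```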
For complex clusters you may need to write your own resource agents.
Resource agents are located under `/usr/lib/ocf/resource.d/` on CentOS 6.x.