Heartbeat configuration

We will now configure Heartbeat for automatic fail-over. I have written [a tutorial](http://blog.thaigrid.or.th/2009/02/heartbeat-for-ha-tutorial-for-dummy.html) that covers most of the configuration that needs to be done. Here are the settings specific to this setup:

  • Add only the 2 FE nodes as nodes in Heartbeat. Compute nodes are not counted.
  • Remove the IP aliasing that we set up in the previous step.
  • Only IP fail-over is needed in the resource configuration. You may add NAS fail-over if you need it; consult the Linux-HA manual for how to configure it.
  • My cluster_property_set:
<cluster_property_set id="cib-bootstrap-options">
<attributes>
 <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="false"/>
 <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="freeze"/>
</attributes>
</cluster_property_set>
  • My resource configuration:
<resources>
<group id="FE_group">
<primitive id="FE_LOCAL_IP" class="ocf" type="IPaddr" provider="heartbeat">
  <instance_attributes id="07a4de76-1ca8-47e7-9565-edb9acd8101d">
    <attributes>
      <nvpair name="ip" value="172.16.0.1" id="dd7cf63e-450a-4f69-ae12-4a13c35d573a"/>
      <nvpair name="nic" value="eth0" id="a65589d0-4e8f-40ef-a158-5725515d6714"/>
      <nvpair name="cidr_netmask" value="255.255.0.0" id="a4a92850-4080-44bb-826c-59c8ff7d81a1"/>
      <nvpair id="6b7e9089-a853-43d9-846b-418929125911" name="local_start_script" value="/usr/local/sbin/restart_named"/>
      <nvpair id="3f84554c-5044-46b7-92c7-9ab6fee8061f" name="local_stop_script" value="/usr/local/sbin/restart_named"/>
    </attributes>
  </instance_attributes>
</primitive>
<primitive id="FE_PUBLIC_IP" class="ocf" type="IPaddr" provider="heartbeat">
  <instance_attributes id="398fa300-f96b-433f-865e-a12a60d76820">
    <attributes>
      <nvpair name="ip" value="203.123.123.123" id="85aebb90-837b-458f-963a-f8827f3b09fe"/>
      <nvpair name="nic" value="eth1" id="a3513e93-08c5-42f2-964a-7e551e24b6dd"/>
      <nvpair name="cidr_netmask" value="255.255.255.0" id="8af8d9f2-ea65-4ad4-b828-199f1fdcb54f"/>
    </attributes>
  </instance_attributes>
</primitive>
<instance_attributes id="FE_group">
  <attributes>
    <nvpair id="FE_group-target_role" name="target_role" value="started"/>
  </attributes>
</instance_attributes>
</group>
</resources>
  • Notice something? The local_start_script and local_stop_script parameters are needed specifically for ROCKS, because the ROCKS frontend is also the DNS server for the cluster. By default, BIND binds to every network interface on the server so that it can build DNS query response packets correctly. This causes a problem when the IP fails over, so we need these directives to make Heartbeat restart named once fail-over is finished.

  • Constraint configuration

<constraints>
<rsc_location id="rsc_location_fe_1" rsc="FE_group">
<rule id="prefered_location_fe_1" score="100">
 <expression attribute="#uname" id="prefered_location_fe_1_expr" operation="eq" value="fe1.public"/>
</rule>
</rsc_location>
<rsc_location id="rsc_location_fe_2" rsc="FE_group">
<rule id="prefered_location_fe_2" score="50">
 <expression attribute="#uname" id="prefered_location_fe_2_expr" operation="eq" value="fe2.public"/>
</rule>
</rsc_location>
</constraints>
  • Make sure that each value matches the output of “uname -n” on that node.
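
A quick way to check the node name is a small shell helper like the one below. This is only a sketch: the function name check_node_name is made up for illustration, and the name you pass in should be whatever you put in the value attribute of your own rsc_location expression.

```shell
#!/bin/sh
# check_node_name is a hypothetical helper, not part of Heartbeat.
# It compares the name used in a constraint with what this node reports.
check_node_name() {
    expected="$1"            # value from the rsc_location expression
    actual="$(uname -n)"     # name Heartbeat will see for this node
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual"
    else
        echo "MISMATCH: constraint says $expected, node reports $actual"
        return 1
    fi
}
```

For example, running check_node_name fe1.public on the first frontend should print an OK line if the constraint above applies to that node.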

  • One surprise: although “local_start_script” and “local_stop_script” are documented in the IPaddr resource of Heartbeat, I found that they are never actually used! I had to modify the IPaddr script to make this configuration work. Please download this script and replace /usr/lib/ocf/resource.d/heartbeat/IPaddr with it.
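
The contents of /usr/local/sbin/restart_named are not shown above, so here is a minimal sketch of what such a script could look like. It assumes a Red Hat style system where named is managed through /sbin/service; the RESTART_CMD variable is a hypothetical override added only so the function can be tried without touching the real service.

```shell
#!/bin/sh
# Sketch of a restart helper for named. This is an assumption about what
# the script might contain, not the script actually used on the cluster.
restart_named() {
    # RESTART_CMD is a hypothetical override for dry runs; by default
    # named is restarted through the init system (assumed /sbin/service).
    ${RESTART_CMD:-/sbin/service named restart}
}
```

Saved as a script, the file would end with a call to restart_named. Since the same script is used for both local_start_script and local_stop_script, named is restarted whenever the IP is added to or removed from a node.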

Testing

  • If you have followed the previous Heartbeat tutorial, everything should now work. Try shutting down your primary FE; the other FE should take over. Compute nodes should still be able to communicate with the external network.
  • Make sure that you have isolated /home and /share onto external NAS storage. You can then continue to SSH to your cluster's public IP even if one FE is down.
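
To see which FE currently holds the public IP during the test, something like the following sketch may help. The helper name holds_ip is made up for illustration, and it assumes the iproute2 "ip" command is installed; it only looks for the address in "ip -o addr show" style output read from standard input.

```shell
#!/bin/sh
# holds_ip is a hypothetical helper: it succeeds if the given IPv4 address
# appears in "ip -o addr show" style output supplied on standard input.
holds_ip() {
    # -F matches the address literally, so dots are not treated as wildcards;
    # the trailing "/" keeps 1.2.3.4 from matching 1.2.3.40.
    grep -qF "inet $1/"
}
```

On each FE, run: ip -o addr show | holds_ip 203.123.123.123 && echo "this node is active". Only the FE that currently runs FE_group should print the message, and after you shut down the primary FE it should be the other one.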