15.7 Diagnosing a Poison Pill Condition

A node is given a poison pill if it stops sending OES Cluster Services heartbeat packets, including when the node hangs or reboots.

To evaluate the poison pill condition, examine the /var/log/messages log on the node that rebooted (the node that was given the poison pill). Check the messages logged right before the restart. Normally, you can spot the process that caused the server to hang or reboot.
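
Reviewing the messages before a restart can be partly automated. The following Python sketch is one possible approach, not part of OES Cluster Services: it scans a log and collects the lines logged just before each boot. It assumes that a restart appears in the log as the kernel boot banner containing the text "Linux version"; adjust the marker if your syslog configuration records boots differently. The function names and the default of 50 context lines are illustrative choices.

```python
from collections import deque
from typing import Iterable, List

# Assumption: the kernel boot banner marks each restart in the log.
BOOT_MARKER = "Linux version"


def lines_before_restarts(lines: Iterable[str], context: int = 50,
                          marker: str = BOOT_MARKER) -> List[List[str]]:
    """Return, for each restart found, up to `context` lines logged before it."""
    recent = deque(maxlen=context)  # rolling window of the latest lines
    windows = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            # Snapshot what was logged right before this boot banner.
            windows.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return windows


def print_restart_context(path: str = "/var/log/messages",
                          context: int = 50) -> None:
    """Print the lines that precede each restart recorded in the log at `path`."""
    with open(path, errors="replace") as log:
        windows = lines_before_restarts(log, context)
    for number, window in enumerate(windows, start=1):
        print(f"--- restart {number}: {len(window)} lines before boot ---")
        print("\n".join(window))
```

Calling print_restart_context() as a user who can read /var/log/messages prints one block per restart. If the log has been rotated since the reboot, pass the path of the rotated file instead.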

You can run the cluster stats display command on the surviving master node to show when the OES Cluster Services heartbeat packets stopped coming from the rebooted node. To confirm this, you can also run a LAN trace to capture the packets exchanged among the nodes.

Other events that might cause a pause in sending heartbeat packets include the following:

  • Antivirus software scanning the /admin file system

  • Network traffic control (including the firewall)

  • check_oesnss_vol.pl running under Nagios

  • Packages that have been recently updated through the YaST Online Update or patch channels