8.8 Configuring NCS to Monitor the eDirectory Daemon (ndsd)

OES Cluster Services requires eDirectory to be running when you manage the cluster and resources. At other times, NCS uses a file cache of information retrieved from eDirectory to cover temporary eDirectory or LDAP outages. As a result, normal cluster operations are less dependent on eDirectory than are some clustered services, which require the eDirectory daemon (ndsd) to be running in order to function properly. An eDirectory outage might not affect NCS itself, but it can leave the services provided by cluster resources non-functional.

NCS can monitor the status of the eDirectory daemon (ndsd) at the NCS level. Monitoring is disabled by default. It is configured independently on each node, and it runs whenever NCS is running on that node. The settings persist across system restarts, patches, and upgrades.

IMPORTANT: If you enable NDSD monitoring at the NCS level, you should remove or comment out the eDirectory status checks in individual monitor scripts.

8.8.1 Understanding eDirectory Monitoring

OES Cluster Services can monitor the eDirectory daemon on nodes where eDirectory is installed. You might want to monitor the status of ndsd if you cluster services that depend on the eDirectory daemon in order to function properly, such as services that handle logins for CIFS, AFP, or NCP users. These service resources might run on only a subset of the nodes in the cluster or on all of the nodes. Each node’s monitoring is configured separately.

Each monitoring iteration consists of an interval that NCS waits before it sends a status request to ndsd, followed by a timeout period during which NCS listens for a response. You must specify the interval and timeout periods in seconds. The minimum interval is 16 seconds. The minimum timeout is 8 seconds. The following actions occur during the iteration:

  1. The iteration begins.

  2. NCS waits for the specified interval.

  3. The interval ends, and NCS sends a status query to ndsd.

  4. The timeout period begins, and NCS listens for a response.

  5. One of three outcomes occurs:

    • If the ndsd status returns within the timeout period and it is good, the iteration ends and monitoring continues.

    • If the ndsd status returns within the timeout period and it is not good, the specified remedy action is taken.

    • If the timeout period elapses without a response from ndsd, the specified remedy action is taken.

  6. The monitoring continues with a new iteration.
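The decision made in step 5 can be modeled with a short shell sketch. This is an illustration of the logic described above, not NCS code: the function name classify_iteration and the status labels "good" and "bad" are hypothetical, and treating a response that arrives exactly at the timeout as being within the timeout period is an assumption.

```shell
#!/bin/sh
# Illustrative model of the outcome of one monitoring iteration.
# classify_iteration STATUS ELAPSED TIMEOUT
#   STATUS  - "good" or "bad" (hypothetical labels for the ndsd reply)
#   ELAPSED - seconds that ndsd took to answer the status query
#   TIMEOUT - configured timeout period in seconds
# Prints "continue" when monitoring simply proceeds to the next
# iteration, or "remedy" when the configured remedy action would run.
classify_iteration() (
    status=$1; elapsed=$2; timeout=$3
    if [ "$elapsed" -gt "$timeout" ]; then
        echo "remedy"      # no response within the timeout period
    elif [ "$status" = "good" ]; then
        echo "continue"    # good status returned in time
    else
        echo "remedy"      # status returned in time, but it is not good
    fi
)

classify_iteration good 3 20    # prints: continue
classify_iteration bad 3 20     # prints: remedy
classify_iteration good 25 20   # prints: remedy
```

In every case a new iteration follows; the outcome determines only whether the remedy action is taken first.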

If the ndsd status is bad or no status is returned within the timeout period, NCS can take one of three configurable remedy actions: an ndsd restart, a graceful node restart, or a hard node restart.

  • restart-ndsd: Restarting the eDirectory daemon (ndsd) is the least intrusive action. However, it has the least chance of solving the problem. Even if the restart succeeds, the affected services might not be able to take advantage of the success.

  • reboot-node: A graceful node restart attempts to synchronize the file systems and terminate all running processes. This option might not resolve a service’s problem if a process is stuck, and it can delay a resource failing over to another node.

  • hard-reboot: A hard node restart has a better chance of forcing resources to fail over promptly to other nodes. However, it does not synchronize the file systems, and clients can lose data that has not been committed to disk.

You use the /opt/novell/ncs/install/ncs_install.py script to enable, disable, or modify monitoring of the eDirectory daemon for a node. Issue the command from a console prompt as the root user.

/opt/novell/ncs/install/ncs_install.py 
  -m <display|disable|restart-ndsd|reboot-node|hard-reboot|help>
  [-i <interval_value>] 
  [-t <timeout_value>]

For information about the command syntax and options, see Section A.8.2, Monitoring NDSD.
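Before you run the command, it can help to check the values against the minimums stated above. The following is a minimal sketch of such a check. The helper names is_uint and build_ndsd_monitor_cmd are hypothetical, and the sketch only prints the command line; it does not run ncs_install.py. Whether the script itself rejects values below the minimums is not covered here.

```shell
#!/bin/sh
# Hypothetical pre-flight check: validate the values, then print the
# command line. Nothing is executed against the cluster.
NCS_INSTALL=/opt/novell/ncs/install/ncs_install.py
MIN_INTERVAL=16   # minimum interval in seconds, as stated in this section
MIN_TIMEOUT=8     # minimum timeout in seconds, as stated in this section

# Succeeds only if the argument is a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# build_ndsd_monitor_cmd ACTION INTERVAL TIMEOUT
build_ndsd_monitor_cmd() (
    action=$1; interval=$2; timeout=$3
    case $action in
        restart-ndsd|reboot-node|hard-reboot) ;;
        *) echo "unknown remedy action: $action" >&2; exit 1 ;;
    esac
    if ! is_uint "$interval" || ! is_uint "$timeout"; then
        echo "interval and timeout must be whole numbers of seconds" >&2
        exit 1
    fi
    if [ "$interval" -lt "$MIN_INTERVAL" ]; then
        echo "interval must be at least $MIN_INTERVAL seconds" >&2
        exit 1
    fi
    if [ "$timeout" -lt "$MIN_TIMEOUT" ]; then
        echo "timeout must be at least $MIN_TIMEOUT seconds" >&2
        exit 1
    fi
    echo "$NCS_INSTALL -m $action -i $interval -t $timeout"
)

build_ndsd_monitor_cmd reboot-node 60 15
```

The function body runs in a subshell, so a failed check returns a non-zero status to the caller without ending the calling shell.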

8.8.2 Displaying the Current Settings for NDSD Monitoring

  1. Open a terminal console, then enter as the root user:

    /opt/novell/ncs/install/ncs_install.py -m display

Example 1: Display the current settings for ndsd monitoring when monitoring is enabled.

# /opt/novell/ncs/install/ncs_install.py -m display
NDSD monitoring is enabled.
interval = 90 seconds, timeout = 20 seconds and remedy action is rebooting node hard.

Example 2: Display the current settings for ndsd monitoring when monitoring is disabled.

# /opt/novell/ncs/install/ncs_install.py -m display
NDSD monitoring is not enabled.
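If you want to use the displayed settings in a script, the output can be reduced to a single summary line. The sketch below assumes that the output is worded exactly as in the examples in this section; other releases might word it differently. The function name parse_ndsd_display and the summary format are hypothetical.

```shell
#!/bin/sh
# parse_ndsd_display: reads the text printed by
# "ncs_install.py -m display" on standard input and prints a one-line
# summary. The patterns mirror the sample output in this section only.
parse_ndsd_display() {
    sed -n \
        -e 's/^[[:space:]]*[Nn][Dd][Ss][Dd] monitoring is not enabled.*/enabled=no/p' \
        -e 's/^[[:space:]]*interval = \([0-9][0-9]*\) seconds, timeout = \([0-9][0-9]*\) seconds.*/enabled=yes interval=\1 timeout=\2/p'
}

# Feed the sample output from Example 1 to the parser.
printf '%s\n' \
    'NDSD monitoring is enabled.' \
    'interval = 90 seconds, timeout = 20 seconds and remedy action is rebooting node hard.' \
    | parse_ndsd_display
# prints: enabled=yes interval=90 timeout=20
```

On a cluster node, you would pipe the real command output into the function instead of the sample text.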

8.8.3 Configuring or Modifying NDSD Monitoring

  1. Open a terminal console, then enter as the root user:

    /opt/novell/ncs/install/ncs_install.py -m <hard-reboot|reboot-node|restart-ndsd> -i <interval_value> -t <timeout_value>

Example 1: Enable ndsd monitoring with an interval of 60 seconds, timeout of 15 seconds, and remedy action of rebooting gracefully.

# /opt/novell/ncs/install/ncs_install.py -m reboot-node -i 60 -t 15

   NDSD monitoring is enabled.
   interval = 60 seconds, timeout = 15 seconds and remedy action is rebooting node.

Example 2: Change the remedy action to restarting ndsd.

# /opt/novell/ncs/install/ncs_install.py -m restart-ndsd -i 60 -t 15

   NDSD monitoring is enabled.
   interval = 60 seconds, timeout = 15 seconds and remedy action is restarting ndsd.
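Because monitoring is configured separately on each node, you might want to apply the same settings to several nodes. The following dry-run sketch prints one command per node instead of running it. The node names node1, node2, and node3 and the function name print_node_cmds are hypothetical, and root SSH access to each node is an assumption about your environment.

```shell
#!/bin/sh
# Dry-run sketch: print the command that would configure ndsd
# monitoring on each node. Nothing is executed on the nodes.
NCS_INSTALL=/opt/novell/ncs/install/ncs_install.py

# print_node_cmds ACTION INTERVAL TIMEOUT NODE...
print_node_cmds() (
    action=$1; interval=$2; timeout=$3
    shift 3                       # the remaining arguments are node names
    for node in "$@"; do
        echo "ssh root@$node $NCS_INSTALL -m $action -i $interval -t $timeout"
    done
)

print_node_cmds reboot-node 60 15 node1 node2 node3
```

Review the printed commands, then run them individually if they match your intent.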

8.8.4 Disabling NDSD Monitoring

Monitoring of the eDirectory daemon is disabled by default. If it is enabled on a node, you can use the following procedure to disable it.

  1. Open a terminal console, then enter as the root user:

    /opt/novell/ncs/install/ncs_install.py -m disable

Example: Disable ndsd monitoring.

# /opt/novell/ncs/install/ncs_install.py -m disable
ndsd monitoring is not enabled (3).