

System Monitoring and Recovery

Overview

Event monitors (seemonitor.exe processes) constantly monitor a running Enterprise Server for .NET system. If a SEP pool process (w3wp.exe or seesep.exe) dies for any reason (e.g. because of a StackOverflowException or because a user chooses "Stop Debugging"), the event monitor running on the same machine detects this and instigates the recovery of any orphaned running tasks from that process. All resources associated with an orphaned task (e.g. ENQ locks, initiators, or printers) are released.

Event monitors may also be configured to monitor the health of SEP pools running on other machines in an Enterprise Server for .NET scale-out. When configured in this way, an event monitor will check whether a SEP pool is still active as part of its periodic system health check, and if it deems that the SEP pool is no longer alive, it will instigate the recovery of any orphaned running tasks associated with it.

An event monitor running on the same machine as a listener (seelistener.exe process) will also periodically check that the EZ listener associated with a region is still responsive. If it is not, the region's EZ listener is reset to allow subsequent EZ socket operations to be routed to a different listener process.

The Registered Event Monitors and Registered SEP Pools views in the Enterprise Server for .NET Administration UI allow you to manually monitor the health of event monitors and SEP pools in the system respectively, and to take action if an event monitor or SEP pool is deemed to be no longer active.

Local SEP Pool Recovery

An event monitor automatically detects when a SEP pool process running on the same machine has died and instigates the recovery of any orphaned tasks. Messages indicating that a SEP pool has died and that recovery has taken place are written to the console of each affected region:
SEEEM0058W SEP pool process (Host: MY-MACHINE; PID:3868) terminated with orphaned tasks
SEEEP0173I SEP pool process (Host: MY-MACHINE; PID:3868) recovery started
SEEEP0174I SEP pool process (Host: MY-MACHINE; PID:3868) recovery completed
Corresponding Windows event log entries will also be created.
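The console messages above follow a regular shape, so a script that watches a captured region console could pick out pools whose recovery started but never completed. The following is a minimal sketch only: the regular expression is inferred from the three sample messages shown here, not from a published message format, and the function names are hypothetical.

```python
import re

# Pattern inferred from the three sample messages above (an assumption,
# not a documented message grammar).
MESSAGE_RE = re.compile(
    r"^(?P<id>SEE[A-Z]{2}\d{4}[A-Z])\s+SEP pool process "
    r"\(Host:\s*(?P<host>[^;]+);\s*PID:\s*(?P<pid>\d+)\)\s+(?P<event>.+)$"
)


def parse_line(line):
    """Return the message fields as a dict, or None if the line does not match."""
    m = MESSAGE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "id": m.group("id"),
        "host": m.group("host").strip(),
        "pid": int(m.group("pid")),
        "event": m.group("event").strip(),
    }


def unrecovered(lines):
    """Return the (host, pid) pairs reported as dead or recovering
    for which no 'recovery completed' message has been seen."""
    pending = set()
    for line in lines:
        msg = parse_line(line)
        if msg is None:
            continue
        pool = (msg["host"], msg["pid"])
        if msg["event"] == "recovery completed":
            pending.discard(pool)
        else:
            pending.add(pool)
    return pending
```

Feeding the three sample lines to `unrecovered` yields an empty set, while feeding only the first two leaves the pool on MY-MACHINE with PID 3868 pending.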

Remote SEP Pool Recovery

SEP pool processes periodically emit a heartbeat to indicate that they are still alive. When a SEP pool process stops emitting heartbeats, it can be assumed that the machine on which it is running has died, has been powered down, or has lost network connectivity. Any tasks that were running on a SEP pool that is no longer heartbeating will have been orphaned (e.g. a job previously running in the SEP pool would still appear Active in the Enterprise Server for .NET Administration UI's Spool view and would still hold resource locks, but no application would actually be running).

The Registered SEP Pools view in the Enterprise Server for .NET Administration UI can be used to manually detect when a SEP pool is no longer active. This view has a Status column that indicates whether the state of a SEP pool is:
  • Ok - the SEP pool is responsive and running normally
  • In doubt - no heartbeat has been issued by the SEP pool for over 10 seconds. The H/B (s) column indicates the number of seconds since the last heartbeat
  • Unknown - the SEP pool is running on a machine with a product version installed earlier than 2.3 update 2 hot-fix 4, and so does not support heartbeating. The SEP pool may be running normally, but the system cannot determine its state.
By selecting a SEP pool in the list and clicking on the Recover button, any orphaned tasks associated with the SEP pool will be recovered.
Note: The system will reject any recovery attempt of a SEP pool that is still actively heartbeating.
Note: Be very careful when attempting to recover a SEP pool in the Unknown state as it may still be active.
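The status rules and the recovery restriction described above can be summarised in a short sketch. This is an illustrative model, not the product's implementation: the `SepPool` type and both function names are hypothetical, a missing heartbeat value is used here to represent a pool that does not support heartbeating, and "still actively heartbeating" is assumed to correspond to the Ok status.

```python
from dataclasses import dataclass
from typing import Optional

IN_DOUBT_AFTER_SECONDS = 10  # "In doubt" threshold stated in the list above


@dataclass
class SepPool:
    host: str
    pid: int
    # Seconds since the last heartbeat (the H/B (s) column).
    # None models a pool whose product version does not support heartbeating.
    seconds_since_heartbeat: Optional[float]


def status(pool: SepPool) -> str:
    """Map a pool's heartbeat age to the Status column value."""
    if pool.seconds_since_heartbeat is None:
        return "Unknown"
    if pool.seconds_since_heartbeat > IN_DOUBT_AFTER_SECONDS:
        return "In doubt"
    return "Ok"


def may_recover(pool: SepPool) -> bool:
    """Recovery is rejected only for a pool that is still heartbeating.

    An Unknown pool is allowed through, which is why the note above
    urges caution: it may still be active.
    """
    return status(pool) != "Ok"
```

For example, a pool whose last heartbeat was 3 seconds ago is Ok and cannot be recovered, whereas one that has been silent for 45 seconds is In doubt and can be.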
Event monitors can be configured to automatically recover SEP pools that are no longer heartbeating as part of their regular system health check cycle. The following settings are provided for inclusion in the MicroFocus.SEE.Monitor > general section of the seemonitor.exe.config file:
  • EM.HealthCheck.Interval - set the interval in seconds at which the event monitor will perform system health checking (default 20 seconds)
  • EM.SepPool.Recovery - set value to "true" or "yes" to enable automatic SEP pool recovery checking (default "false")
  • EM.SepPool.Recovery.HeartbeatElapsed - set value to the minimum time in seconds since a SEP pool's last heartbeat after which the SEP pool is considered to be dead and needs recovering (default 10 seconds)
For example:
<MicroFocus.SEE.Monitor>
    <general>
      ...
      <add key="EM.HealthCheck.Interval" value="20" />
      <add key="EM.SepPool.Recovery" value="true" />
      <add key="EM.SepPool.Recovery.HeartbeatElapsed" value="10" />
      ...
    </general>
</MicroFocus.SEE.Monitor>
When an event monitor automatically recovers a SEP pool, messages similar to those for a local SEP pool recovery are written to the region console and the Windows event log.
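When tuning these settings it can help to estimate how long a dead SEP pool might go unrecovered. The sketch below reads the three keys from a config fragment like the one above and computes a rough window, under a simplified model that is an assumption rather than documented behaviour: the pool dies immediately after a heartbeat, health checks run exactly every EM.HealthCheck.Interval seconds, and recovery starts at the first check at which the heartbeat age has reached EM.SepPool.Recovery.HeartbeatElapsed. The function names are hypothetical, and treating the "true"/"yes" values as case-insensitive is also an assumption.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the settings list above.
DEFAULTS = {
    "EM.HealthCheck.Interval": "20",
    "EM.SepPool.Recovery": "false",
    "EM.SepPool.Recovery.HeartbeatElapsed": "10",
}

SAMPLE = """
<MicroFocus.SEE.Monitor>
    <general>
      <add key="EM.HealthCheck.Interval" value="20" />
      <add key="EM.SepPool.Recovery" value="true" />
      <add key="EM.SepPool.Recovery.HeartbeatElapsed" value="10" />
    </general>
</MicroFocus.SEE.Monitor>
"""


def read_settings(xml_text):
    """Return the three monitor settings, falling back to the defaults."""
    root = ET.fromstring(xml_text.strip())
    settings = dict(DEFAULTS)
    for add in root.findall("./general/add"):
        key = add.get("key")
        if key in settings:
            settings[key] = add.get("value")
    return settings


def recovery_enabled(settings):
    # Case-insensitive comparison is an assumption of this sketch.
    return settings["EM.SepPool.Recovery"].strip().lower() in ("true", "yes")


def recovery_window_seconds(settings):
    """Return (earliest, latest) seconds after the last heartbeat at which
    automatic recovery would start, under the simplified model above."""
    interval = int(settings["EM.HealthCheck.Interval"])
    elapsed = int(settings["EM.SepPool.Recovery.HeartbeatElapsed"])
    return (elapsed, elapsed + interval)
```

With the values shown in the example configuration, this model gives a window of 10 to 30 seconds after the last heartbeat. Lowering EM.HealthCheck.Interval narrows the window at the cost of more frequent health checks.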