System Monitoring and Recovery

Overview

Event monitors (seemonitor.exe processes) continuously monitor a running Enterprise Server for .NET system. If a SEP pool process (w3wp.exe or seesep.exe) dies for any reason (e.g. because of a StackOverflowException, or because a user selects "Stop Debugging"), the event monitor running on the same machine detects this and initiates recovery of any orphaned running tasks from that process. All resources associated with an orphaned task (e.g. ENQ locks, initiators, or printers) are released. See Local SEP Pool Recovery below.

Event monitors can also be configured to monitor the health of SEP pools running on other machines in an Enterprise Server for .NET scale-out. When configured in this way, an event monitor checks whether each SEP pool is still active as part of its periodic system health check. If it determines that a SEP pool is no longer alive, it initiates recovery of any orphaned running tasks associated with that pool. See Remote SEP Pool Recovery below.

In rare circumstances a request can be dispatched but never associated with a SEP for execution (e.g. because of a communication failure between the dispatcher and SEP pool processes). Each time this happens, an available SEP is effectively consumed. Eventually all SEPs are assigned and the region can process no further requests. The event monitor's system health check detects such requests and re-dispatches them to an available SEP. See Dispatched Request Recovery below.

An event monitor running on the same machine as a listener (seelistener.exe process) also periodically checks that the EZ listener associated with a region is still responsive. If it is not, the region's EZ listener is reset so that subsequent EZ socket operations can be routed to a different listener process.

The Registered Event Monitors and Registered SEP Pools views in the Enterprise Server for .NET Administration UI let you manually monitor the health of event monitors and SEP pools respectively, and take action if an event monitor or SEP pool appears to be no longer active.

Local SEP Pool Recovery

An event monitor automatically detects when a SEP pool process running on the same machine has died and initiates recovery of any orphaned tasks. Messages indicating that the SEP pool has died and that recovery has taken place are written to the console of each affected region, for example:
SEEEM0058W SEP pool process (Host: MY-MACHINE; PID:3868) terminated with orphaned tasks
SEEEP0173I SEP pool process (Host: MY-MACHINE; PID:3868) recovery started
SEEEP0174I SEP pool process (Host: MY-MACHINE; PID:3868) recovery completed
Corresponding Windows event log entries will also be created.
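If you want a script to watch for these messages, the following is a minimal Python sketch. It assumes the region console output is available as lines of text containing the message format shown above, possibly preceded by a prefix such as a timestamp. The function name recovery_state and the stage labels are hypothetical and are not part of the product.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches the message format shown above, e.g.
# "SEEEM0058W SEP pool process (Host: MY-MACHINE; PID:3868) terminated with orphaned tasks".
# search() is used so that any prefix (such as a timestamp) before the message ID is tolerated.
MESSAGE = re.compile(
    r"(?P<id>SEE[A-Z]{2}\d{4}[A-Z])\s+SEP pool process "
    r"\(Host:\s*(?P<host>[^;]+);\s*PID:\s*(?P<pid>\d+)\)"
)

# Hypothetical mapping of the three recovery-related message IDs to a stage label.
STAGES = {
    "SEEEM0058W": "terminated",
    "SEEEP0173I": "recovery started",
    "SEEEP0174I": "recovery completed",
}


def recovery_state(lines: Iterable[str]) -> Dict[Tuple[str, int], str]:
    """Return the latest recovery stage seen for each (host, PID) pair."""
    state: Dict[Tuple[str, int], str] = {}
    for line in lines:
        m = MESSAGE.search(line)
        if m and m.group("id") in STAGES:
            key = (m.group("host").strip(), int(m.group("pid")))
            state[key] = STAGES[m.group("id")]  # later messages overwrite earlier ones
    return state
```

A SEP pool whose latest stage is still "terminated" or "recovery started" after a reasonable interval may be worth investigating manually.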

Remote SEP Pool Recovery

SEP pool processes periodically emit a heartbeat to indicate that they are still alive. When a SEP pool process stops emitting heartbeats, it can be assumed that the machine on which it is running has failed, been powered down, or lost network connectivity. Any tasks that were running in a SEP pool that is no longer heartbeating are orphaned. For example, a job that was running in the SEP pool still appears as Active in the Spool view of the Enterprise Server for .NET Administration UI and continues to hold resource locks, even though no application is actually running.

The Registered SEP Pools view in the Enterprise Server for .NET Administration UI can be used to detect manually when a SEP pool is no longer active. Its Status column shows one of the following states for each SEP pool:
  • Ok - the SEP pool is responsive and running normally.
  • In doubt - the SEP pool has not issued a heartbeat for over 10 seconds. The H/B (s) column shows the number of seconds since the last heartbeat.
  • Unknown - the SEP pool is running on a machine with a product version earlier than 2.3 update 2 hot-fix 4 installed, and so does not support heartbeating. The SEP pool may be running normally, but the system cannot determine its state.
To recover any orphaned tasks associated with a SEP pool, select the SEP pool in the list and click Recover.
Note: The system rejects any attempt to recover a SEP pool that is still actively heartbeating.
Note: Be very careful when recovering a SEP pool in the Unknown state, as it may still be active.
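The Status rules and the two notes above can be summarized in a short Python sketch. This is only an illustration of the documented behavior, not the product's implementation. The names PoolStatus, classify, recover_allowed, and operator_confirmed are hypothetical. Requiring explicit confirmation for Unknown pools models the caution in the second note; it does not describe any product check.

```python
from enum import Enum
from typing import Optional


class PoolStatus(Enum):
    OK = "Ok"
    IN_DOUBT = "In doubt"
    UNKNOWN = "Unknown"


# Threshold taken from the Status column description.
IN_DOUBT_AFTER_SECONDS = 10


def classify(seconds_since_heartbeat: Optional[float]) -> PoolStatus:
    """Map an H/B (s) value to a Status; None models a pool that cannot heartbeat."""
    if seconds_since_heartbeat is None:
        return PoolStatus.UNKNOWN
    if seconds_since_heartbeat > IN_DOUBT_AFTER_SECONDS:
        return PoolStatus.IN_DOUBT
    return PoolStatus.OK


def recover_allowed(status: PoolStatus, operator_confirmed: bool = False) -> bool:
    """Hypothetical guard mirroring the two notes on manual recovery."""
    if status is PoolStatus.OK:
        # A pool that is still heartbeating must not be recovered.
        return False
    if status is PoolStatus.UNKNOWN:
        # The pool may still be active, so require a deliberate decision.
        return operator_confirmed
    return True
```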
Event monitors can be configured to recover SEP pools that are no longer heartbeating automatically, as part of their regular system health check cycle. The following settings can be added to the MicroFocus.SEE.Monitor > general section of the seemonitor.exe.config file:
  • EM.HealthCheck.Interval - the interval, in seconds, at which the event monitor performs system health checking (default 20 seconds)
  • EM.SepPool.Recovery - set to "true" or "yes" to enable automatic SEP pool recovery (default "false")
  • EM.SepPool.Recovery.HeartbeatElapsed - the minimum time, in seconds, since a SEP pool's last heartbeat after which the SEP pool is considered dead and is recovered (default 10 seconds)
For example:
<MicroFocus.SEE.Monitor>
    <general>
      ...
      <add key="EM.HealthCheck.Interval" value="20" />
      <add key="EM.SepPool.Recovery" value="true" />
      <add key="EM.SepPool.Recovery.HeartbeatElapsed" value="10" />
      ...
    </general>
</MicroFocus.SEE.Monitor>
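To check what a given seemonitor.exe.config resolves to, the following Python sketch reads the EM.* keys from the MicroFocus.SEE.Monitor > general section and applies the documented defaults. It assumes the file is well-formed XML (the "..." lines in the example stand for other entries and must not appear literally). MonitorSettings and load_monitor_settings are hypothetical names. The event monitor's own parsing rules may differ, for example in whether "Yes" or "TRUE" is accepted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class MonitorSettings:
    # Defaults as documented for each key.
    health_check_interval: int = 20   # EM.HealthCheck.Interval (seconds)
    sep_pool_recovery: bool = False   # EM.SepPool.Recovery
    heartbeat_elapsed: int = 10       # EM.SepPool.Recovery.HeartbeatElapsed (seconds)
    active_request_elapsed: int = 10  # EM.ActiveRequest.Elapsed (seconds)


def load_monitor_settings(xml_text: str) -> MonitorSettings:
    """Read EM.* keys from every MicroFocus.SEE.Monitor > general section."""
    root = ET.fromstring(xml_text)
    values = {}
    # iter() also visits the root element itself, so this works whether or not
    # the section is wrapped in an outer <configuration> element.
    for section in root.iter("MicroFocus.SEE.Monitor"):
        general = section.find("general")
        if general is None:
            continue
        for add in general.findall("add"):
            values[add.get("key")] = add.get("value", "")

    settings = MonitorSettings()
    if "EM.HealthCheck.Interval" in values:
        settings.health_check_interval = int(values["EM.HealthCheck.Interval"])
    if "EM.SepPool.Recovery" in values:
        # "true" or "yes" enables recovery; case-insensitive matching is an assumption.
        flag = values["EM.SepPool.Recovery"].strip().lower()
        settings.sep_pool_recovery = flag in ("true", "yes")
    if "EM.SepPool.Recovery.HeartbeatElapsed" in values:
        settings.heartbeat_elapsed = int(values["EM.SepPool.Recovery.HeartbeatElapsed"])
    if "EM.ActiveRequest.Elapsed" in values:
        settings.active_request_elapsed = int(values["EM.ActiveRequest.Elapsed"])
    return settings
```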
When an event monitor recovers a SEP pool automatically, it writes messages similar to those for local SEP pool recovery to the region console and the Windows event log.
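The selection step of an automatic health check cycle can be sketched as follows, assuming the event monitor knows the seconds elapsed since each SEP pool's last heartbeat. SepPool and pools_to_recover are hypothetical names. Treating the threshold as inclusive, and skipping pools that cannot heartbeat (the Unknown state), are assumptions of this sketch rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SepPool:
    host: str
    pid: int
    # None models a pool that does not support heartbeating (Unknown state).
    seconds_since_heartbeat: Optional[float]


def pools_to_recover(pools: List[SepPool], recovery_enabled: bool,
                     heartbeat_elapsed: float = 10) -> List[SepPool]:
    """Return the pools one health check cycle would treat as dead."""
    if not recovery_enabled:
        # EM.SepPool.Recovery defaults to false: nothing is recovered automatically.
        return []
    return [
        p for p in pools
        # Assumption: pools without heartbeat data are never recovered automatically.
        if p.seconds_since_heartbeat is not None
        and p.seconds_since_heartbeat >= heartbeat_elapsed
    ]
```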

Dispatched Request Recovery

Event monitors periodically check for dispatched requests that have not been associated with a SEP for execution. By default, if at least 10 seconds have elapsed since such a request was marked active, the event monitor re-dispatches it to an available SEP. To use a different elapsed time before re-dispatching, add the following setting to the MicroFocus.SEE.Monitor > general section of the seemonitor.exe.config file:
  • EM.ActiveRequest.Elapsed - the minimum time, in seconds, before an active request that has not been dispatched to a SEP is re-dispatched (default 10 seconds)
For example:
<MicroFocus.SEE.Monitor>
    <general>
      ...
      <add key="EM.ActiveRequest.Elapsed" value="15" />
      ...
    </general>
</MicroFocus.SEE.Monitor>
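The following Python sketch illustrates the selection rule described above: an active request with no associated SEP becomes eligible for re-dispatch once EM.ActiveRequest.Elapsed seconds have passed since it was marked active. It is not the event monitor's implementation. ActiveRequest, requests_to_redispatch, and the field names are hypothetical, and times are plain seconds for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActiveRequest:
    request_id: str
    marked_active_at: float       # time (seconds) at which the request was marked active
    sep_id: Optional[str] = None  # SEP the request is associated with, if any


def requests_to_redispatch(requests: List[ActiveRequest], now: float,
                           active_request_elapsed: float = 10) -> List[ActiveRequest]:
    """Select active requests that still have no SEP after the elapsed time."""
    return [
        r for r in requests
        if r.sep_id is None and now - r.marked_active_at >= active_request_elapsed
    ]
```

Requests that already have a SEP are never selected, so a request that is simply slow to execute is not re-dispatched.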