Machine Failures

Machine failures raise two major concerns: the failure of a client machine and the failure of the server machine.

When a client application is terminated with a Control + C or kill command (other than a kill -9), the AcuServer software detects the termination and closes all files held open for that client process. However, other terminal software and hardware failures may not be detected.

AcuServer offers an optional mechanism for detecting connections that terminate unexpectedly. The mechanism works by establishing regular, periodic communication from the UNIX/Linux or Windows client to AcuServer. The acuserve process keeps a record of each participating connection's periodic communication and regularly analyzes these records for evidence of a dead connection. If a connection fails to send its periodic communication for two consecutive periods, acuserve concludes that the connection is dead, closes any open files associated with the connection, and disconnects the socket. The detection mechanism is enabled with two configuration variables, one on the server and one on the client.
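To make the bookkeeping described above more concrete, here is a minimal Python sketch of a monitoring table and the periodic analysis. It is a model, not AcuServer's implementation: the names Connection, record_ping, and sweep are hypothetical, and the rule "dead once strictly more than two ping intervals have passed since the last message" is an assumed reading of "fails to send its periodic communication for two consecutive periods".

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """One row of a hypothetical monitoring table kept by the server."""
    conn_id: int
    ping_interval: int   # seconds; reported by the client when it connects
    last_seen: int       # time (seconds) of the most recent "I'm alive" message
    open_files: list = field(default_factory=list)


def record_ping(table, conn_id, now):
    """Update a connection's record when an "I'm alive" message arrives."""
    table[conn_id].last_seen = now


def sweep(table, now):
    """Remove every connection that has missed two consecutive ping periods.

    Returns the ids of the connections judged dead, in sorted order.
    """
    dead = sorted(
        cid for cid, conn in table.items()
        if now - conn.last_seen > 2 * conn.ping_interval
    )
    for cid in dead:
        conn = table.pop(cid)
        # Stand-in for closing the connection's files, releasing its
        # locks, and closing its socket.
        conn.open_files.clear()
    return dead


table = {
    1: Connection(1, 60, 0, ["customer.dat"]),
    2: Connection(2, 60, 250, ["orders.dat"]),
}
print(sweep(table, 300))  # [1]: 300 s of silence exceeds 2 * 60 s
```

Connection 2 survives the sweep because only 50 seconds have passed since its last message.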

On the server, the detection mechanism is enabled with the DEAD_CLIENT_TIMEOUT configuration variable. When DEAD_CLIENT_TIMEOUT is set to -1, its default value, the detection mechanism is disabled. When DEAD_CLIENT_TIMEOUT is set to any other value, the detection mechanism is enabled and the variable's value is interpreted as the interval, in seconds, at which acuserve will analyze its records to detect dead connections.

On the client, the AGS_PING_TIME configuration variable enables or disables the detection mechanism. When AGS_PING_TIME is set to -1, the periodic communications mechanism is disabled and the connection cannot be automatically disconnected. When AGS_PING_TIME is set to any other value, the periodic communications mechanism is enabled and the connection is included in acuserve's monitoring table. The value of AGS_PING_TIME specifies the interval, in seconds, at which the client sends an "I'm alive" message (a no-op instruction) to the server. The client communicates this value to acuserve when the connection is established, so that acuserve can determine, whenever it checks its monitoring records, whether a connection has missed two or more consecutive communication periods.
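The two variables can be modeled roughly as follows. This is a Python sketch under stated assumptions, not the ACUCOBOL-GT runtime: Pinger, connect, and sweep_interval are hypothetical names, the "hello" dictionary stands in for whatever the client actually reports at connect time, and send is any callable that transmits the no-op message. The only behavior taken from the text is that -1 disables each side and that other values are intervals in seconds.

```python
import threading

DISABLED = -1  # the documented "off" value for both variables


class Pinger(threading.Thread):
    """Hypothetical client-side thread that sends "I'm alive" messages."""

    def __init__(self, interval, send):
        super().__init__(daemon=True)
        self.interval = interval
        self.send = send  # callable that transmits the no-op message
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() returns False when the timeout expires and True
        # once stop() has been called, so this sends once per interval.
        while not self._stop_event.wait(self.interval):
            self.send()

    def stop(self):
        self._stop_event.set()


def connect(ags_ping_time, send):
    """Return (hello, pinger) for a new connection.

    hello models the interval reported to the server at connect time;
    pinger is None when AGS_PING_TIME is -1 (connection not monitored).
    """
    if ags_ping_time == DISABLED:
        return {"ping_interval": None}, None
    pinger = Pinger(ags_ping_time, send)
    pinger.start()
    return {"ping_interval": ags_ping_time}, pinger


def sweep_interval(dead_client_timeout):
    """Seconds between server sweeps, or None if detection is disabled."""
    if dead_client_timeout == DISABLED:
        return None
    return dead_client_timeout
```

Because the messages come from a separate thread in this model, anything that keeps that thread from being scheduled also stops the messages, and the server would then presumably see the connection as dead.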

The following example illustrates the system's behavior. If DEAD_CLIENT_TIMEOUT is set to 300 (300 seconds = 5 minutes), the dead connection detection mechanism is enabled on the server and acuserve analyzes its communication records every five minutes to look for dead connections. If AGS_PING_TIME is set to 60 on the client, then when a connection is established, the server is told to expect an "I'm alive" message every 60 seconds, and the connection proceeds to send those messages. If the connection fails to send the message for two or more consecutive periods, the next time acuserve checks its records (at most five minutes later), it detects that the connection is dead and automatically closes all files opened by that connection, releases all locks held by that connection, and closes the associated socket.
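The timing in this example can be worked through with a small calculation. The sketch below assumes that sweeps run at exact multiples of DEAD_CLIENT_TIMEOUT measured from server start, and that a connection counts as dead once strictly more than two ping intervals have passed since its last message. Both are simplifying assumptions; the actual sweep schedule is not specified here.

```python
def detection_time(last_ping, ping_interval, check_interval):
    """First sweep time at which a silent connection is judged dead.

    Assumes sweeps at check_interval, 2*check_interval, ... seconds and a
    "dead" test of: now - last_ping > 2 * ping_interval.
    """
    deadline = last_ping + 2 * ping_interval
    # Smallest multiple of check_interval strictly greater than deadline.
    return (deadline // check_interval + 1) * check_interval


# DEAD_CLIENT_TIMEOUT = 300 and AGS_PING_TIME = 60, as in the example.
for last_ping in (60, 180, 310):
    found = detection_time(last_ping, 60, 300)
    print(last_ping, found, found - last_ping)
# 60 300 240
# 180 600 420
# 310 600 290
```

Under these assumptions, the delay between a client's last message and the closing of its files is at most 2 * AGS_PING_TIME + DEAD_CLIENT_TIMEOUT, which is 420 seconds with the values above.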

Note: The AGS_PING_TIME mechanism depends on regular system calls, which may slow some systems (notably some versions of Solaris) and affect the runtime's performance. Be sure to test for performance before using this configuration variable in a production environment.

Also note that AGS_PING_TIME starts an internal runtime thread to send its periodic communication to the server. This means that any activity that suspends thread switching will prevent this mechanism from working.

With clients that do not participate in the dead client detection mechanism, if a client system crashes while using AcuServer, the server keeps the client's files open until they are closed with the unlock function, or until the acuserve daemon is stopped and restarted (stopping AcuServer closes all open files). See Closing Stranded Files for a description of the use of the unlock function.

Should the server go down, all clients actively using AcuServer will get access errors when attempting to communicate with the server. Client applications must disconnect or shut down and wait until the server is available again. All files that were open on the server at the time of the crash are left in an unknown state and may be corrupt. See C$DISCONNECT in Appendix I: Library Routines in the ACUCOBOL-GT Appendices for more information.

Note: If acuserve starts automatically when the server boots, halt it immediately after the server comes back up from a crash. Before restarting AcuServer, check all files that might have been affected by the crash and, if necessary, rebuild them. After all files have been verified, the acuserve daemon can be started.