Failover System

The failover system shifts monitors from one execution server to another and deactivates a server that has failed, for example because of hardware damage. The system does not, however, shift monitors or deactivate servers when the network at the location is slow or experiencing problems. To determine whether a detected failure is caused by a specific execution server or by that server's local network, at least two execution servers must run at each location within the same local area network. If only one server runs on a network, network outages cannot be distinguished from server hardware outages, and automatic server deactivation on failure therefore cannot be enabled.
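The reasoning behind the two-server requirement can be illustrated with a minimal sketch. It is not the product's implementation: the function name `diagnose`, the `Diagnosis` values, and the rule that a reachable peer implies a healthy local network are assumptions made for illustration only.

```python
from enum import Enum
from typing import Mapping


class Diagnosis(Enum):
    HEALTHY = "healthy"
    SERVER_FAILURE = "server failure"
    NETWORK_PROBLEM = "network problem"
    INDETERMINATE = "indeterminate"


def diagnose(reachable: Mapping[str, bool], server: str) -> Diagnosis:
    """Classify an outage for one server at a location (hypothetical logic).

    `reachable` maps each execution server at the same location to whether
    it currently responds.
    """
    if reachable[server]:
        return Diagnosis.HEALTHY
    # Reachability of the other servers on the same local area network.
    peers = [ok for name, ok in reachable.items() if name != server]
    if not peers:
        # A single server per location: a network outage and a server
        # outage look identical, so no safe decision is possible.
        return Diagnosis.INDETERMINATE
    if any(peers):
        # Assumption: a responding peer means the network itself works.
        return Diagnosis.SERVER_FAILURE
    return Diagnosis.NETWORK_PROBLEM
```

With two servers at a location, `diagnose({"exec1": False, "exec2": True}, "exec1")` yields `Diagnosis.SERVER_FAILURE`, whereas a location with only `exec1` yields `Diagnosis.INDETERMINATE`, which is why automatic deactivation cannot be enabled in that case.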

The Responsiveness timeout [s] setting of the execution server defines how quickly the failover system reacts to a failure.

The failover phases are as follows:

  1. After two-thirds of the responsiveness timeout has elapsed, the administrator is warned by email that the execution server is unavailable.
  2. If the server is still inaccessible after the full timeout has expired, failover analysis is initiated.
  3. The system determines whether the functioning servers can accept additional load. If they can, monitors are shifted to other servers that provide the required resources, for example client/server or Silk Test support. The failed server is then set to Inactive mode and is no longer used by monitors. An email to the administrator, stating that the execution server is in the Inaccessible state, indicates that failover is complete.
  4. After step 3 is complete, the system attempts to connect to the failed execution server every 30 seconds to add it back to the location. When a connection attempt succeeds, the state of the server is set to Active and monitors are again deployed through load balancing.
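
The timing of the first two phases can be sketched as follows. This is only an illustration under stated assumptions: the function name `failover_phase` and the returned labels are hypothetical, and the sketch assumes the warning is issued exactly when two-thirds of the timeout has elapsed.

```python
def failover_phase(unreachable_s: int, timeout_s: int) -> str:
    """Return the phase reached after a server has been unreachable
    for `unreachable_s` seconds, given the responsiveness timeout."""
    if unreachable_s >= timeout_s:
        # Full timeout expired: failover analysis starts.
        return "failover analysis"
    # Integer comparison of unreachable_s >= 2/3 * timeout_s,
    # written without division to avoid rounding errors.
    if 3 * unreachable_s >= 2 * timeout_s:
        return "warning email"
    return "waiting"
```

For example, with a responsiveness timeout of 90 seconds, the warning email would be sent after 60 seconds of unavailability and failover analysis would start after 90 seconds.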