Master or Worker Nodes Down
This section describes the expected behavior if a master node or one or more worker nodes goes down.
- Kubernetes worker nodes will continue running while the master is down, but if a worker node reboots, it will not be able to reach the master node and will fail to rejoin the cluster.
- All services running on the master node will become unavailable.
- Transformation Hub Web Service running on the master node becomes unavailable.
- The services (such as the Routing Stream Process) and integrations (such as ArcMC management) that depend on the Web Service will fail.
- Any other Transformation Hub services (Transform Stream Process, Schema Registry, Kafka Manager) that were running on the master will be rescheduled by Kubernetes on other worker nodes, depending on the system resources available there.
- If the master node was labeled for Kafka and/or ZooKeeper deployment, then those instances will fail, but the cluster will continue to work with the remaining instances on worker nodes.
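To see whether the master was the only node labeled for Kafka or ZooKeeper (and therefore whether those instances have anywhere to fail over), the node labels can be inspected from any node with kubectl access. This is a sketch; the label keys `kafka` and `zookeeper` are assumptions based on the label names used later in this section, so substitute the exact keys used in your deployment.

```shell
# List all nodes with the values of their kafka and zookeeper labels
# (-L adds a column per label key; unlabeled nodes show an empty cell).
kubectl get nodes -L kafka,zookeeper
```

If only the failed master carries these labels, the Kafka/ZooKeeper instances cannot be rescheduled until another worker node is labeled.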
- The NFS server, which runs on the master node, will become unavailable.
- Kafka and ZooKeeper do not depend on NFS storage; they use local storage on the Kubernetes worker nodes, and will remain available for event processing with some limitations.
- The beta feature Connector in Transformation Hub (CTB) will be affected, since it depends on NFS storage, which is configured on the master server.
- The DNS service (kube-dns), which runs on the master server, will become unavailable.
- Worker nodes will lose the ability to resolve host names, except for names that were already resolved, which may remain cached for some period.
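Cluster DNS health can be checked from inside the cluster by resolving a well-known service name from a throwaway pod. This is a sketch, not part of the product documentation; the busybox image is an assumption, and `kubernetes.default` is used only because it exists in every cluster.

```shell
# Run a one-off pod that attempts a DNS lookup via kube-dns, then is
# deleted (--rm). If kube-dns is down, nslookup fails or times out.
kubectl run dns-check --rm -it --restart=Never --image=busybox -- \
  nslookup kubernetes.default
```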
- Any Transformation Hub service instances that were running on the failed worker node and are not tied to a specific node (such as Transform Stream Process, Routing Stream Process, Schema Registry, or Kafka Manager) will be rescheduled by Kubernetes on other worker nodes, depending on the system resources available there.
- Transformation Hub service instances that are labeled for Kafka and ZooKeeper will be automatically rescheduled by Kubernetes on other worker nodes, provided that additional worker nodes are already labeled for ZooKeeper and Kafka and have sufficient system resources.
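Whether rescheduling actually happened can be confirmed by checking which node each instance is now running on. A minimal sketch; the pod-name patterns are assumptions, so adjust the filter to match the names in your deployment.

```shell
# Show all pods with the node each one is scheduled on (-o wide adds a
# NODE column), filtered to Kafka and ZooKeeper instances.
kubectl get pods -o wide --all-namespaces | grep -Ei 'kafka|zookeeper'
```

Pods stuck in Pending indicate that no eligible (labeled, sufficiently resourced) worker node was available.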
- Likewise, c2av processing may cease if the worker node containing the eb-c2av-processor goes down and system resources prevent Kubernetes from automatically rescheduling it on another worker node.
- If automatic rescheduling of service instances does not occur for ZooKeeper, Kafka, or the eb-c2av-processor (that is, the node is not recoverable), run the following command from the master node to delete all service instances from the failed node and force Kubernetes to move the services to other nodes:
# kubectl delete node <Failed_Node_IP>
- Note that at least one other worker node must carry the zookeeper and kafka labels for the service instances to be migrated from the failed node.
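The recovery steps above can be sketched as a single sequence run from the master node. This is a hedged outline, not the product's documented procedure: `<Failed_Node_IP>` comes from the command above, while `<Replacement_Node_IP>` and the `kafka=yes` / `zookeeper=yes` label values are placeholders and assumptions for illustration; use the label keys and values your deployment expects.

```shell
# 1. Confirm the failed node is reported NotReady.
kubectl get nodes

# 2. Delete the failed node so Kubernetes stops waiting for it and
#    reschedules its service instances elsewhere.
kubectl delete node <Failed_Node_IP>

# 3. If the failed node carried the zookeeper/kafka labels, label another
#    worker node so those instances have an eligible destination.
#    (Label values are assumptions; match your deployment's convention.)
kubectl label node <Replacement_Node_IP> kafka=yes zookeeper=yes

# 4. Verify that the instances have been rescheduled and are Running.
kubectl get pods -o wide --all-namespaces | grep -Ei 'kafka|zookeeper'
```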