Restarting Nodes in the Cluster

To restart worker nodes in the AWS cluster, see the procedure at the end of this section.

If you need to restart or shut down any node in the cluster, you must first stop the Kubernetes and database services running on that node. This allows the Kubernetes pods to start after the node restarts and prevents database corruption.

To shut down a single worker node:

You must follow the steps below to bring down a single worker node in a cluster of one master and multiple worker nodes (a consolidated example follows the steps):

  1. Execute the following command to list the fusion pods running on worker nodes 1 and 2:

    kubectl get pods -o wide -n arcsight-installer-fzj5b | grep fusion
  2. Select worker node 1 and execute the following command:

    kubectl drain <worker_node_1> --force --ignore-daemonsets --delete-local-data
  3. Wait until the pods come up on worker node 2.

  4. Execute the following command on worker node 1 to shut it down:

    shutdown -h now
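
The steps above can be combined into a single script run from the master node. The following is a minimal sketch, not a supported script: it assumes the node is named worker-node-1, that the namespace is arcsight-installer-fzj5b as in step 1, and that the master node has passwordless root SSH access to the worker.

    #!/bin/bash
    # Minimal sketch: drain one worker node and shut it down.
    # Assumptions: node name "worker-node-1", namespace "arcsight-installer-fzj5b",
    # and root SSH access from the master node to the worker.
    NODE=worker-node-1
    NS=arcsight-installer-fzj5b

    # Step 1: list the fusion pods and the worker nodes they run on
    kubectl get pods -o wide -n "$NS" | grep fusion

    # Step 2: drain the node so that its pods are rescheduled on the other worker
    kubectl drain "$NODE" --force --ignore-daemonsets --delete-local-data

    # Step 3: re-run the listing until the fusion pods are Running on the other worker
    kubectl get pods -o wide -n "$NS" | grep fusion

    # Step 4: shut the drained node down
    ssh root@"$NODE" 'shutdown -h now'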

To restart the single worker node and rejoin it to the cluster:

To restart worker node 1 and rejoin it to the cluster, power the node on and then execute the following command:

kubectl uncordon <worker_node_1>
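
Optionally, you can verify that the node has rejoined the cluster and that pods are again being scheduled on it. The commands below are an example check; arcsight-installer-fzj5b is the same example namespace used earlier in this section.

    kubectl get nodes
    kubectl get pods -o wide -n arcsight-installer-fzj5b | grep fusion

In the output of kubectl get nodes, the uncordoned node should report Ready without the SchedulingDisabled condition.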

To shut down the entire Kubernetes cluster:

You must follow the steps below to safely bring down all master and worker nodes (a scripted example follows the steps):

  1. Execute the following command as the root user on all worker nodes:

    shutdown -h now
  2. Execute the following commands on the master node after all worker nodes are shut down:

    cd <K8S_HOME>/bin/
    ./kube-stop.sh
    ./kubelet-umount-action.sh
    reboot
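
If the cluster has many worker nodes, the shutdown sequence above can be scripted from the master node. The following is an illustrative sketch only: the worker hostnames are hypothetical, and it assumes passwordless root SSH access to each worker and that K8S_HOME is set on the master (for example, /opt/arcsight/kubernetes).

    #!/bin/bash
    # Minimal sketch: shut down all worker nodes, then stop Kubernetes on the master.
    WORKERS="worker1.example.com worker2.example.com"   # hypothetical hostnames

    for w in $WORKERS; do
        ssh root@"$w" 'shutdown -h now'
    done

    # On the master node, after all workers are down:
    cd "$K8S_HOME/bin"
    ./kube-stop.sh
    ./kubelet-umount-action.sh
    reboot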

To restart all master and worker nodes:

You must follow the steps below to restart all master and worker nodes (a consolidated example follows the steps):

  1. After powering on the worker nodes (simultaneously or individually), execute the following command for each worker node:

    kubectl uncordon <worker_node_1>
  2. After the master node restarts, execute the following commands on it to check the status of the Kubernetes services:

    cd <K8S_HOME>/bin/
    ./kube-status.sh
  3. (Conditional) If any of the services is not running, execute the following command to start it:

    ./kube-start.sh
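
The restart sequence above, collected in one place as a hedged example (the worker node names are hypothetical, and <K8S_HOME> is the same placeholder used in the steps):

    # Rejoin each worker node so that pods can be scheduled on it again
    kubectl uncordon worker1.example.com
    kubectl uncordon worker2.example.com

    # On the master node, check the Kubernetes services and start any that are not running
    cd <K8S_HOME>/bin/
    ./kube-status.sh
    ./kube-start.sh    # only if kube-status.sh reports services that are not running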

To restart nodes manually:

  1. If the node contains CDF, perform the following steps on the node (applies to both master and worker nodes; a consolidated example follows this procedure):

    1. Log in to the node you need to restart as the root user.

    2. Change to the following directory:

      cd <K8S_HOME>/bin/

      For example: /opt/arcsight/kubernetes/bin

    3. Execute the following command to stop the Kubernetes services:

      ./kube-stop.sh

    4. Execute the following command to unmount Kubernetes volumes:

      ./kubelet-umount-action.sh

  2. If the node contains the database, see Rebooting Database Cluster.

  3. Restart the node:

    reboot

  4. If the restart fails, perform a hard reboot of the node.

  5. After the node restarts, do the following if the node contains the database:

    1. Log in to the node as a database administrator.

    2. Execute the following command to start the database services:

      /opt/vertica/bin/admintools -t start_db -p <database_password> -d fusiondb --force

  6. After the node restarts, do the following if the node contains CDF:

    1. Log in to the node as root.

    2. Change to the following directory:

      cd <K8S_HOME>/bin/

      For example: /opt/arcsight/kubernetes/bin

    3. Check whether all Kubernetes services are running:

      ./kube-status.sh

    4. (Conditional) If any of the services is not running, start the service:

      ./kube-start.sh
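
For a node that contains only CDF, the manual restart sequence above reduces to the commands below, shown as one consolidated example. It assumes <K8S_HOME> is /opt/arcsight/kubernetes, as in the example above.

    # Before the reboot, as the root user on the node:
    cd /opt/arcsight/kubernetes/bin    # that is, <K8S_HOME>/bin
    ./kube-stop.sh
    ./kubelet-umount-action.sh
    reboot

    # After the node is back up, as the root user on the node:
    cd /opt/arcsight/kubernetes/bin
    ./kube-status.sh
    ./kube-start.sh    # only if kube-status.sh reports services that are not running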

To restart worker nodes in the AWS cluster:

  1. Log in to the bastion host.

  2. Using the Find Services search tool, locate and browse to the EC2 Dashboard.

  3. Do one of the following to stop and start the worker nodes (an AWS CLI sketch is provided after this procedure):

    • By using the Instances option:

      1. In the left navigation pane, under Instances, click Instances. A list of the node instances is displayed.

      2. Select the worker node instance or instances, click Instance state, then select Stop instance from the drop-down list.

      3. Click the Refresh icon until a new set of worker node instance IDs is created.

    • By using the Auto Scaling option:

      1. In the left navigation pane, under Auto Scaling, click Auto Scaling Groups. A list of the created Auto Scaling groups is displayed.

      2. Select the required Auto Scaling group, then click Edit.

      3. Update the values of Desired capacity, Minimum capacity, and Maximum capacity to 0.

      4. Click Update.

      5. In the left navigation pane, under Instances, click Instances. A list of the node instances is displayed.

      6. Ensure that the worker nodes corresponding to the Auto Scaling group that you edited are in the Terminated state.

      7. Repeat steps 1 and 2.

      8. Update the values of Desired capacity, Minimum capacity, and Maximum capacity to their earlier values.

      9. Click Update.

      10. In the left navigation pane, under Instances, click Instances. A list of the node instances is displayed.

      11. Click the Refresh icon until a new set of worker node instance IDs is created.

  4. Label the new worker nodes. For more information, see Labeling Cloud (AWS) Worker Nodes.

  5. Update the target groups for port 5443 (see Adding Targets to the Target Group for Port 5443) and for ports 443 and 32081 (see Adding Targets to the Target Group for Port 443 or 32081) with the new worker node instance IDs.

  6. Log in to the CDF Management Portal (see Opening the Management Portal) with the following credentials:

    User name: admin

    Password: <the password you provided during CDF installation>

  7. Click , then click Reconfigure.

  8. Depending on the capabilities for which you have assigned labels to the worker nodes, click the relevant tabs and reconfigure the properties.

    If the node you have restarted is an HDFS namenode, then ensure that you update the HDFS NameNode field in the Intelligence tab with the hostname or IP address of the worker node labeled as intelligence-namenode:yes.
  9. Click Save.

  10. See Configuring HDFS Services to Use Keytabs to update the core-site.xml file with the new value of the HDFS NameNode.

  11. See Checking Cluster Status.
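
The console steps for stopping and starting the worker nodes (step 3) can also be performed with the AWS CLI. The following is a hedged sketch of the Auto Scaling variant only; the Auto Scaling group name and capacity values are placeholders, and the remaining steps (labeling, target groups, and reconfiguration) still apply as described above.

    # Scale the worker Auto Scaling group to 0 so that its instances are terminated
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name <worker-asg-name> \
        --min-size 0 --max-size 0 --desired-capacity 0

    # Wait until the old instances are in the Terminated state, then restore the earlier values
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name <worker-asg-name> \
        --min-size 3 --max-size 3 --desired-capacity 3    # example values

    # List the new worker node instance IDs for the target group updates
    aws ec2 describe-instances \
        --filters "Name=tag:aws:autoscaling:groupName,Values=<worker-asg-name>" \
                  "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].InstanceId" --output text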