Troubleshooting Your Cluster
The following troubleshooting tips may be helpful in resolving issues with the CDF cluster.
| Issue | Description |
|---|---|
| Installation of master nodes fails | During installation, master node installation can fail with the error: Ensure that your |
| Installation times out | During installation, the process may time out with the error: Ensure your |
| During sudo installation, worker node fails to install | During the Add Node phase, one or more of the worker nodes may fail to install, with the log showing the following error message: `[ERROR] : GET Url: https://itom-vault.core:8200/v1/***/PRIVATE_KEY_CONTENT_{hostname}_{sudo user}, ResponseStatusCode: 404` You can take the following steps to rectify the issue: |
| Cluster list empty in Kafka Manager | If the cluster list is empty in the Kafka Manager UI, delete the existing Kafka Manager pod, wait for the replacement Kafka Manager pod to reach the Running state, and then try the UI again. |
| Worker nodes out of disk space and pods evicted | If the worker nodes run out of disk space, causing the pods on the node to go into Evicted status, try one of the following steps: `{install dir}/kubernetes/bin/kube-restart.sh` For information on adjusting the eviction threshold, see "Updating the CDF Hard Eviction Policy." |
| Kafka fails to start up; fails to acquire lock or corrupted index file found | Many scenarios can cause Kafka to fail to start up and report either Workaround: To resolve this on the problematic Kafka node: 1. Go to the directory: 2. Find the file 3. Search for all index files: 4. Delete all the corrupted index files. 5. Restart the affected Kafka pod. |
| Slow network or slow VM response during upgrade causes delay or failure of web services operations | During the upgrade to Transformation Hub 3.3, an intermittent issue with web service pod startup has been observed that correlates with a slow network, slow VM response, or both. The pod startup is blocked or delayed, leading to issues such as failing to create new topics or failing to register the new schema version. One error seen in the web service log file is: |
| ArcSight Database rejects new sessions because the maximum sessions limit is reached | You might observe the following error in the logs: Workaround: Do one of the following: |
| ArcSight Database fails to restart | If the database fails to start, you can run a set of commands to recover the last known good set of data and restart the database. For example, the database might not restart after an unexpected shutdown. Please consult your database administrator for the commands to run. |
| Multiple node failures | Here are some considerations when handling node failures on 3 or more worker nodes. |
| Second upgrade fails or some resources are not upgraded | In some cases, a second upgrade may fail completely or fail to upgrade some resources. If this occurs, run the following command: Wait until the suite-upgrade-pod-arcsight-installer is deleted, then begin the second upgrade again. |
| CDF deployment fails on servers running VMware vMotion | Installation of CDF may fail on virtual machines running the VMware product vMotion. If this occurs, disable vMotion on all CDF virtual machines and then run the CDF installation again. |
| After adding or reducing stream processor instances, Kafka Manager fails to show accurate consumer information for topics | After adding or reducing the number of stream processor instances, Kafka Manager may fail to show correct consumer information for some topics. To get the most current consumer information, restart the Kafka Manager pod with the command: Then reconnect to the Kafka Manager UI. |
| Kafka Manager not displaying members of the consumer group | When a new member is added to a consumer group, Kafka Manager must be restarted in order to display the new members. This applies to Logger, ESM, Vertica Scheduler, SOAR, and Intelligence. |
| New partition source topics not correctly displayed in Kafka Manager | Changes to the partition source topics in Kafka Manager may take up to 5 minutes to refresh and display correctly. |
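
Several of the Kafka Manager rows above come down to the same action: find the Kafka Manager pod, delete it so that Kubernetes recreates it, and wait for the replacement to reach the Running state. The following is a minimal sketch of the "find the pod" step, not the product's documented command. The helper name `pods_matching` is hypothetical, and the name fragment `kafka-manager` is an assumption that you should check against your own `kubectl get pods` output. The helper only prints `kubectl delete pod` commands for review; it does not run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CDF or Transformation Hub).
# Reads the output of
#   kubectl get pods --all-namespaces --no-headers
# on stdin, where the columns are: NAMESPACE NAME READY STATUS RESTARTS AGE.
# Prints one delete command per pod whose NAME matches the given pattern,
# so the commands can be reviewed before anything is deleted.
pods_matching() {
  awk -v pat="$1" '$2 ~ pat { printf "kubectl delete pod %s -n %s\n", $2, $1 }'
}
```

One possible use is `kubectl get pods --all-namespaces --no-headers | pods_matching kafka-manager`. After running the printed command, re-check `kubectl get pods` until the replacement pod reports Running, and only then reconnect to the Kafka Manager UI. Printing the command rather than executing it is a deliberate choice here, because the pattern may match more pods than intended.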