Troubleshoot Ansible Playbooks for Confluent Platform
Complete the following steps if Ansible Playbooks for Confluent Platform (Confluent Ansible) fails:
Review the error log output from Ansible itself.
The output shows the type of failure that occurred and might indicate a misconfiguration in your inventory file. For example, if you set a file path variable to an invalid path, the log reports “Could not find or access” followed by the file path. Correct the variable and rerun the installation.
Review your inventory file.
Validate that all variables are set correctly and that the inventory file uses correct YAML indentation. For examples, see hosts_example.yml and the sample_inventories directory.
Review component log and property files.
If a component health check fails, the playbook fetches the component's log and property files to the error_files/ directory on the Ansible control node. These files can indicate which properties are misconfigured. The location of error_files/ depends on where you downloaded Confluent Ansible from:
- If you downloaded Confluent Ansible from Ansible Galaxy or Ansible Automation Hub, the error_files/ directory is under ~/.ansible/collections/ansible_collections/confluent/platform/.
- If you downloaded Confluent Ansible from GitHub, the error_files/ directory is under the root of cp-ansible.
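Searching the fetched files for error-level lines is often a quick way to spot the misconfigured property. The following is a minimal sketch, assuming a bash shell and grep on the control node; scan_error_files is a hypothetical helper, and the ERROR, FATAL, and Exception patterns are assumptions about how the component logs mark failures.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Confluent Ansible): print lines that look
# like failures from the files fetched into error_files/.
# $1: path to error_files/ (defaults to ./error_files, which matches a GitHub
#     clone when run from the root of cp-ansible).
scan_error_files() {
  local dir="${1:-./error_files}"
  if [ ! -d "$dir" ]; then
    echo "scan_error_files: no such directory: $dir" >&2
    return 1
  fi
  # -r: recurse into the per-component subdirectories
  # -n: prefix each match with its line number
  # -I: skip binary files
  # -E: extended regex; the patterns are an assumption about typical log levels
  grep -rnIE 'ERROR|FATAL|Exception' "$dir"
}
```

For a Galaxy or Automation Hub installation, run scan_error_files ~/.ansible/collections/ansible_collections/confluent/platform/error_files. Like grep, the helper returns a nonzero status when nothing matches.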
If the log files do not provide a clear reason for the failure, use one of the following methods to generate more information:
Rerun the playbook with the --diff option and redirect the output to a file. This option shows the differences in the files and templates that the playbook changes. With this option, sensitive information, such as passwords, certificates, and keys, is not printed in the output. For more information about the flag, see Ansible Playbook Options.
Rerun the playbook with the -vvv option to enable debug logging, and redirect the output to a file:
ansible-playbook -vvv -i hosts.yml confluent.platform.all > failure.txt
When debug logging is enabled, nothing in the output can be suppressed, including sensitive information such as passwords, certificates, and keys. Avoid using debug mode in production environments. For details, see Logging Ansible Output.
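Both reruns follow the same pattern, so a small wrapper can keep the captured output consistent. This is a minimal sketch, assuming bash, an inventory named hosts.yml in the current directory, and the confluent.platform.all playbook used above; capture_run is a hypothetical helper. Unlike the plain > redirection, it also captures stderr, where Ansible writes errors and warnings.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run confluent.platform.all with extra flags and save
# both stdout and stderr to one file for later review or a support ticket.
# $1: output file; the remaining arguments are passed to ansible-playbook.
# The playbook's exit status is returned unchanged.
capture_run() {
  local out="$1"
  shift
  ansible-playbook -i hosts.yml confluent.platform.all "$@" > "$out" 2>&1
}

# Examples:
#   capture_run diff.txt --diff     # diff output; secrets are not printed
#   capture_run failure.txt -vvv    # debug output; may contain secrets
```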
Open a support ticket with Confluent Support and provide the following in a compressed archive file:
- Your inventory file
- The log files generated with the -vvv or --diff option
- The error_files/ directory and its contents
- The output of the following Git commands, saved as a text file. Run the commands from the root of cp-ansible to show any changes made to the cp-ansible source code:
git status
git diff
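Assembling these items by hand is easy to get wrong. The following sketch collects them into one .tar.gz archive, assuming bash, tar, and git on the control node and a GitHub clone of cp-ansible. make_support_bundle, its argument order, and the git-changes.txt file name are hypothetical; Confluent Support does not prescribe this layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Confluent Ansible): bundle the items listed
# above into one compressed archive for a support ticket.
# $1: root of the cp-ansible clone
# $2: inventory file
# $3: output archive path, for example support-bundle.tar.gz
# $4...: captured log files, for example failure.txt or diff.txt
make_support_bundle() {
  if [ "$#" -lt 3 ]; then
    echo "usage: make_support_bundle <cp-ansible-dir> <inventory> <archive> [log...]" >&2
    return 1
  fi
  local repo="$1" inventory="$2" archive="$3"
  shift 3
  local staging f rc=0
  staging=$(mktemp -d) || return 1

  # Inventory file
  if ! cp "$inventory" "$staging/"; then
    rm -rf "$staging"
    return 1
  fi
  # Output captured with -vvv or --diff; missing files are skipped with a warning.
  for f in "$@"; do
    if [ -f "$f" ]; then
      cp "$f" "$staging/"
    else
      echo "skipping missing file: $f" >&2
    fi
  done
  # error_files/ exists only after a failed health check.
  if [ -d "$repo/error_files" ]; then
    cp -R "$repo/error_files" "$staging/"
  fi
  # Local changes to the cp-ansible source code (git status and git diff).
  { git -C "$repo" status; git -C "$repo" diff; } > "$staging/git-changes.txt" 2>&1

  # -C makes member paths relative to the staging directory.
  tar -czf "$archive" -C "$staging" . || rc=$?
  rm -rf "$staging"
  return "$rc"
}
```

For example, with the clone at ~/cp-ansible (an assumed path): make_support_bundle ~/cp-ansible hosts.yml support-bundle.tar.gz failure.txt diff.txt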
Generate logs using the fetch_logs playbook
When troubleshooting, you might need to collect the service logs and configuration files of all Confluent components. Instead of connecting to each host over SSH to fetch the logs and files, you can run the fetch_logs playbook to gather all service logs and configuration files into a single directory on the control node.
The playbook stores the gathered files in a separate zip file for each component. The zip files are located in:
- ~/.ansible/collections/ansible_collections/confluent/platform/playbooks/troubleshooting if you downloaded Confluent Ansible from Ansible Galaxy or Ansible Automation Hub
- <the root of cp-ansible>/playbooks/troubleshooting if you downloaded Confluent Ansible from GitHub
To gather logs and config files of all components:
ansible-playbook -i hosts.yml confluent.platform.fetch_logs
To gather the service logs and configuration files of a specific component, use the --tags flag with the component name. For example, to get the logs and configuration files for Kafka:
ansible-playbook -i hosts.yml confluent.platform.fetch_logs --tags 'kafka_broker'
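Ansible's --tags flag also accepts a comma-separated list, so several components can be collected in one run, and --list-tags prints the tags a playbook defines without running any tasks. This is a minimal sketch, assuming bash and an inventory named hosts.yml; fetch_component_logs is a hypothetical helper, and tag names other than kafka_broker (such as zookeeper below) are assumptions to check against the --list-tags output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch logs and config files for one or more components.
# Each argument is a tag name, for example kafka_broker.
fetch_component_logs() {
  if [ "$#" -eq 0 ]; then
    echo "usage: fetch_component_logs <tag> [<tag>...]" >&2
    return 1
  fi
  local tags
  # Join the arguments with commas, the separator --tags expects.
  tags=$(IFS=,; printf '%s' "$*")
  ansible-playbook -i hosts.yml confluent.platform.fetch_logs --tags "$tags"
}

# List the tags the playbook defines, without running any tasks:
#   ansible-playbook -i hosts.yml confluent.platform.fetch_logs --list-tags
# Fetch two components in one run (zookeeper is an assumed tag name):
#   fetch_component_logs kafka_broker zookeeper
```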
Troubleshoot known issues
Issue: A “Clusters not found” error is returned after Kafka brokers restart
If the keys are updated on Kafka brokers or on any other Confluent Platform component, communication between the component services and the brokers breaks.
Solution: Regenerate the certificates whenever you update the keys, whether during an update or a redeployment of the cluster.
To regenerate the certificates along with the keys, set the following in your inventory file:
regenerate_ca: true
To update only the certificates when the keys have already been generated:
regenerate_ca: true
regenerate_keystore_and_truststore: false
Issue: Need to enable both mTLS and SASL authentication modes for ZooKeeper
Confluent Ansible does not support enabling both mTLS and SASL authentication modes for ZooKeeper out of the box.
Workaround: Configure both authentication modes with overrides, using the following properties in your inventory file:
zookeeper_sasl_protocol: scram
ssl_enabled: true  # Generates the needed x509 provider: authProvider.x509=org.apache.zookeeper.server.auth.X509AuthenticationProvider
zookeeper_client_authentication_type: digest  # Generates the needed SASL provider: authProvider.sasl=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
zookeeper_custom_properties:
  ssl.clientAuth: need  # Forces mTLS for clients connecting to ZooKeeper
# The following overrides are needed for the brokers to connect to ZooKeeper using mTLS
kafka_broker_custom_properties:
  zookeeper.ssl.keystore.location: /var/ssl/private/kafka_broker.keystore.jks
  zookeeper.ssl.keystore.password: confluentkeystorestorepass
Issue: Python version mismatch
If you use different Python versions across nodes, for example, if the Ansible control node has Python 2.7 installed while the managed nodes use Python 3 as the default, you might get the following error:
The Python 2 bindings for rpm are needed for this module. If you require
Python 3 support use the `dnf` Ansible module instead. The Python 2 yum
module is needed for this module. If you require Python 3 support use the
`dnf` Ansible module instead.
Solution: Install the same recommended Python version on all of your control nodes and managed nodes.
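To see whether the versions actually differ, compare the Python that Ansible runs on the control node with the interpreter version each managed node reports. This is a minimal sketch, assuming bash and an inventory named hosts.yml; check_python_versions is a hypothetical helper, while ansible --version and the ansible.builtin.setup module with a filter argument are standard Ansible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the control node's Python (as used by Ansible)
# and the Python version fact reported by every managed node.
# $1: inventory file (defaults to hosts.yml)
check_python_versions() {
  local inventory="${1:-hosts.yml}"
  echo "== control node =="
  # ansible --version includes a "python version = ..." line.
  ansible --version | grep 'python version'
  echo "== managed nodes =="
  # The setup module gathers facts; the filter limits the output to one fact.
  ansible all -i "$inventory" -m ansible.builtin.setup -a 'filter=ansible_python_version'
}
```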
Issue: Missing Ansible POSIX collection
If the required Ansible POSIX collection is missing in your environment, you will get an error similar to:
ERROR! couldn't resolve module/action 'sysctl'
Solution: Install the Ansible POSIX collection.
ansible-galaxy collection install ansible.posix
Issue: Incorrect Ansible hash behavior
If the default Ansible hash behavior is not set to merge, you will get an error similar to:
TASK [confluent.platform.common : Confirm Hash Merging Enabled]
fatal: [ip-10-0-2-212.us-west-2.compute.internal]: FAILED! => {
"assertion": "lookup('config', 'DEFAULT_HASH_BEHAVIOUR') == 'merge'",
"changed": false,
"evaluated_to": false,
"msg": "Hash Merging must be enabled in ansible.cfg"
}
Solution: Set the Ansible hash behavior to merge:
export ANSIBLE_HASH_BEHAVIOUR=merge
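The export applies only to the current shell session; the persistent equivalent is hash_behaviour = merge in the [defaults] section of ansible.cfg, which is where the assertion message points. To confirm which value Ansible actually resolves, ansible-config dump lists every setting with its source. This is a minimal sketch, assuming bash and that ansible-config dump prints lines of the form NAME(source) = value; hash_merge_enabled is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if Ansible resolves DEFAULT_HASH_BEHAVIOUR
# to merge, whether it comes from ANSIBLE_HASH_BEHAVIOUR or from ansible.cfg.
hash_merge_enabled() {
  # The last field of the DEFAULT_HASH_BEHAVIOUR line is the resolved value.
  ansible-config dump |
    awk '/^DEFAULT_HASH_BEHAVIOUR/ { found = ($NF == "merge") } END { exit !found }'
}

# Usage:
#   export ANSIBLE_HASH_BEHAVIOUR=merge
#   hash_merge_enabled && echo "hash merging enabled"
```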
Issue: Missing Ansible community general collection
If the required Ansible community general collection is missing in your environment, you will get an error similar to:
TASK [confluent.platform.common : Custom Java Install]
ERROR! couldn't resolve module/action 'alternatives'.
Solution: Install the Ansible community general collection.
ansible-galaxy collection install community.general
Issue: Missing Ansible community crypto collection
If the required Ansible community crypto collection is missing in your environment, you will get an error similar to:
TASK [confluent.platform.ssl : Create Keystore and Truststore with Self Signed Certs]
ERROR! couldn't resolve module/action 'community.crypto.certificate_complete_chain'
Solution: Install the Ansible community crypto collection.
ansible-galaxy collection install community.crypto
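The three collections from the preceding issues can be checked together before a run. This is a minimal sketch, assuming bash and awk, and that ansible-galaxy collection list prints one collection per line with its fully qualified name in the first column; missing_collections is a hypothetical helper. ansible-galaxy collection install accepts several collection names at once.

```shell
#!/usr/bin/env bash
# Collections named in the issues above.
REQUIRED_COLLECTIONS=(ansible.posix community.general community.crypto)

# Hypothetical helper: read "ansible-galaxy collection list" output on stdin
# and print each required collection that does not appear in it.
missing_collections() {
  local listing c
  listing=$(cat)
  for c in "${REQUIRED_COLLECTIONS[@]}"; do
    # Exact match on the first column, so header and path lines never match.
    awk -v c="$c" '$1 == c { found = 1 } END { exit !found }' <<<"$listing" || echo "$c"
  done
}

# Usage:
#   missing=$(ansible-galaxy collection list 2>/dev/null | missing_collections)
#   # $missing is left unquoted so each name becomes a separate argument.
#   [ -z "$missing" ] || ansible-galaxy collection install $missing
```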
Issue: Corrupted master key
When your master key is corrupted, you get an error message similar to the following:
TASK [confluent.platform.common : Encrypt Properties] **************************
task path: /root/.ansible/collections/ansible_collections/confluent/platform/roles/common/tasks/secrets_protection.yml
Error! failed to unwrap the data key: invalid master key or corrupted data key
Solution: Let Confluent Ansible recreate the master key.
Remove the variables regenerate_masterkey, secrets_protection_masterkey, and secrets_protection_security_file from your inventory file.
Run the following command:
ansible-playbook -i <inventory.yml> confluent.platform.all --skip-tags package
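Before running that command, you can confirm that none of the three variables from the first step is still set. This is a minimal sketch, assuming bash, grep, and that each variable appears as a "name:" key on its own line of a YAML inventory; find_masterkey_vars is a hypothetical helper, and variables set in group_vars or host_vars files are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, with line numbers, any of the variables that
# must be removed before Confluent Ansible can recreate the master key.
# $1: inventory file (defaults to hosts.yml).
# Returns nonzero when none of the variables is found, like grep.
find_masterkey_vars() {
  local inventory="${1:-hosts.yml}"
  grep -nE '^[[:space:]]*(regenerate_masterkey|secrets_protection_masterkey|secrets_protection_security_file)[[:space:]]*:' "$inventory"
}
```

If find_masterkey_vars hosts.yml prints nothing, the inventory is ready for the rerun.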