Update a Running Confluent Platform Configuration Using Ansible Playbooks

You can use Ansible Playbooks for Confluent Platform to update the configuration of Confluent Platform components by rerunning the provisioning playbook with an updated inventory file.

There are two deployment strategies: rolling and parallel. Parallel is the default mode for redeployment on running clusters.

Parallel deployment

In a parallel deployment, the deployment steps happen across all nodes in a component at once. This method saves time, but it restarts every node of the component simultaneously, which disrupts the service for the duration of the restart.

Because rolling deployments restart one node at a time and do not cause a service disruption, they are generally the safer option, but they do not work for every use case. Major authentication and encryption changes do not work in a rolling redeployment. Taking authentication as an example, the first node is restarted with an updated authentication mechanism that is incompatible with the rest of the cluster, which still uses the old mechanism.

The following reconfiguration use cases are best handled with a parallel redeployment:

Major authentication changes
Updating certificates signed by a new CA or intermediate CA (for a zero-downtime alternative, see Rotate certificates signed by a new CA without downtime)
Enabling RBAC
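
As a rough illustration of the first use case, the inventory excerpt below sketches an authentication change combined with an explicit parallel strategy. The variable name sasl_protocol and the value scram are assumptions about your cp-ansible version, and the change shown is hypothetical. Check the variable reference for your release before using it.

```yaml
# hosts.yml (excerpt). Illustrative values only.
all:
  vars:
    # Hypothetical change: move the cluster to a different SASL mechanism.
    # The variable name and value are assumptions, not taken from this page.
    sasl_protocol: scram

    # Reconfigure all nodes of each component together so that no node is
    # left running an authentication mechanism the others reject.
    deployment_strategy: parallel
```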

Rolling deployment

In a rolling deployment, one node at a time is reconfigured, redeployed, and validated with health checks before the playbook moves on to the next node. If the deployment fails on a node, the playbook stops, and all remaining nodes stay untouched and keep the old configuration.

The following reconfigurations are best handled with a rolling deployment:

Simple property updates such as the Kafka property log.retention.hours
Java argument updates
Environment variable updates
Updating certificates that are signed by the same CA or intermediate CA
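
The first three kinds of change in the list above might look like the following inventory excerpt. This is a sketch only: the variable names kafka_broker_custom_properties, kafka_broker_custom_java_args, and kafka_broker_service_environment_overrides are assumed from common cp-ansible usage and do not appear on this page, and all values are illustrative.

```yaml
# hosts.yml (excerpt). Variable names are assumptions; values are examples.
all:
  vars:
    # Simple property update: change a Kafka broker property.
    kafka_broker_custom_properties:
      log.retention.hours: 72

    # Java argument update: extra JVM flags for the broker process.
    kafka_broker_custom_java_args: "-Djava.net.preferIPv4Stack=true"

    # Environment variable update: override a service environment variable.
    kafka_broker_service_environment_overrides:
      KAFKA_HEAP_OPTS: "-Xms4g -Xmx4g"
```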

To enable rolling deployment mode, set the following variable:

deployment_strategy: rolling

Or, to use rolling deployment mode for specific components only, set the component-level variable. For example, for Kafka brokers:

kafka_broker_deployment_strategy: rolling
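
For context, these variables would typically sit in the vars section of the inventory file. The excerpt below is a minimal sketch that assumes a YAML inventory named hosts.yml and hypothetical host names. It also assumes that the component-level variable takes precedence over the global one for that component.

```yaml
# hosts.yml (excerpt). Host names are hypothetical.
all:
  vars:
    # Global strategy for every component.
    deployment_strategy: parallel
    # Component-level setting: only Kafka brokers are redeployed
    # one node at a time (assumed to override the global value).
    kafka_broker_deployment_strategy: rolling

kafka_broker:
  hosts:
    kafka1.example.com:
    kafka2.example.com:
    kafka3.example.com:
```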

Additional variables and tags for reconfiguration

To have cp-ansible pause after each node passes its health check, set the following variable to true. The Ansible output stops and waits for user input before proceeding. This is useful if you want to do additional manual verification on each node.

pause_rolling_deployment: true

To specify a component to pause on, set <component>_pause_rolling_deployment to true. For example:

kafka_broker_pause_rolling_deployment: true
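
Because the pause happens between nodes, it is naturally paired with a rolling strategy. A minimal sketch of the two settings side by side in the inventory file, assuming both are placed under all.vars:

```yaml
# hosts.yml (excerpt). Placement under all.vars is an assumption.
all:
  vars:
    # Redeploy one node at a time.
    deployment_strategy: rolling
    # Wait for user input after each Kafka broker passes its health check;
    # other components proceed without pausing.
    kafka_broker_pause_rolling_deployment: true
```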

Package installation tasks are tagged with the Ansible tag package. It is highly recommended to skip those tasks during reconfiguration to ensure that no package upgrade happens. Skipping them also saves time. Add the following argument to your Ansible command:

--skip-tags package

Update the Confluent Platform configuration

Before proceeding with any update, create a backup of your existing inventory file in a version control system, such as Git. This allows you to roll back to a previous configuration in case the reconfiguration fails.
Update your inventory file to reflect desired property changes on the cluster.

Run the provisioning playbook.

For a rolling deployment, run:

ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --extra-vars deployment_strategy=rolling

The --extra-vars argument overrides variables in your inventory file.

For a parallel deployment, run:

ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package

Failure handling

The following options are supported when a configuration update fails.

Note that many Confluent Platform components (especially Kafka) can handle single node outages.

To roll back a node after a deployment fails on it, revert your inventory file and redeploy on the node:

# Revert your inventory file and run the following command.

ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --limit <broken-node>

Try a new configuration on the broken node. Update your inventory file once again and redeploy on the node:

# Update your inventory file and run the following command.

ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --limit <broken-node>

# Now deploy against all nodes.

ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package

Enter the following command if you need a parallel restart for the change to work (for example, when enabling RBAC):
```
ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --extra-vars deployment_strategy=parallel
```

Rotate certificates signed by a new CA without downtime

This approach is for custom-provided keystores and truststores, whether newly provided (see Use custom keystores and truststores for TLS) or already existing on the hosts (see Use custom keystores and truststores already existing on hosts).

Add the new CA to the truststores. Update each truststore so that it trusts both the old and the new CA, then run a rolling deployment. Also update the truststores on all clients so that they trust both the old and the new CA. After this step, every node and client trusts certificates signed by either CA.
```
ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --extra-vars deployment_strategy=rolling
```
Replace the keystores with the new certificates. Update each keystore so that it contains only the new certificate signed by the new CA, then run a rolling deployment. Because all nodes and clients already trust the new CA from the previous step, each node can be restarted with the new certificate without breaking communication with nodes that have not yet been restarted.
```
ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --extra-vars deployment_strategy=rolling
```
Note
Provide a keystore that contains only the new certificate rather than keeping both the old and new certificates and selecting one with ssl_keystore_alias. For details, see Use custom keystores and truststores already existing on hosts.
Remove the old CA from the truststores. Update each truststore to remove the old CA, leaving only the new CA, then run a rolling deployment. After this step, remove the old CA from the truststores on all clients.
```
ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --extra-vars deployment_strategy=rolling
```
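
The inventory changes behind the three steps above could be sketched as follows. The excerpt assumes the newly provided keystore and truststore variant and the variable names ssl_provided_keystore_and_truststore, ssl_truststore_filepath, and ssl_keystore_filepath, none of which appear on this page. All file paths are hypothetical, and passwords and other TLS settings are omitted because they stay as already configured.

```yaml
# hosts.yml (excerpt). Variable names are assumptions; paths are hypothetical.
all:
  vars:
    ssl_enabled: true
    ssl_provided_keystore_and_truststore: true

    # Step 1: point to a truststore that contains both the old and the new CA.
    # Step 3: point to a truststore that contains only the new CA.
    ssl_truststore_filepath: "/path/to/truststore-old-and-new-ca.jks"

    # Step 2: point to a keystore that contains only the certificate signed
    # by the new CA. Until then, keep the existing keystore path.
    ssl_keystore_filepath: "/path/to/{{ inventory_hostname }}-keystore-old-ca.jks"
```

Changing only one of the two paths per step keeps each rolling deployment compatible with nodes that have not yet been restarted.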

Roll back a failed rotation

In a rolling deployment, if a deployment fails on a node, the playbook stops and the remaining nodes keep their existing configuration. Because each step in the preceding sequence is backward compatible with the previous step, you can roll back a partially applied step by reverting your inventory file to the previous step and redeploying the affected node:

# Revert your inventory file to the previous step and run the following command.

ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  --limit <broken-node>

For more failure-handling options, see Failure handling.

Next step

Troubleshoot Ansible Playbooks for Confluent Platform.