Update a Running Confluent Platform Configuration Using Ansible Playbooks

You can use Ansible Playbooks for Confluent Platform to update the configuration of Confluent Platform components by rerunning the provisioning playbook with an updated inventory file.

There are two deployment strategies: rolling and parallel. Parallel is the default mode for redeployment on running clusters.

Parallel deployment

In a parallel deployment, the deployment steps run on all nodes of a component at the same time. This method saves time, but every node of the service restarts simultaneously.

Rolling deployments are less disruptive and do not interrupt service, so they are generally the safer option. However, they do not work for every use case. Major authentication and encryption changes cannot be applied in a rolling redeployment. Taking authentication as an example, the first node restarts with the updated authentication mechanism, which is invalid against the rest of the cluster.

The following reconfiguration use cases are best handled with a parallel redeployment:

  • Major authentication changes
  • Updating certificates signed by a new CA or intermediate CA
  • Enabling RBAC
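
As an illustration of the first use case, a minimal inventory sketch for switching the SASL mechanism might look like the following. The variable name sasl_protocol and its values are assumptions about your cp-ansible version and do not come from this page; check the variables reference for your release before relying on them.

```yaml
all:
  vars:
    # Assumed variable: the SASL mechanism used across the cluster.
    # In this hypothetical inventory it was previously set to "plain".
    sasl_protocol: scram

    # Restart every node of each component at once so that no node is
    # left running a mechanism the rest of the cluster rejects.
    deployment_strategy: parallel
```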

Rolling deployment

In a rolling deployment, one node at a time is reconfigured, redeployed, and validated with health checks before the playbook moves on to the next node. If the deployment fails on a node, the playbook stops, and all remaining nodes stay untouched and keep the old configuration.

The following reconfigurations are best handled with a rolling deployment:

  • Simple property updates such as the Kafka property log.retention.hours
  • Java argument updates
  • Environment variable updates
  • Updating certificates that are signed by the same CA or intermediate CA
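
A sketch of how the first three kinds of change might appear in an inventory file is shown below. The variable names kafka_broker_custom_properties, kafka_broker_custom_java_args, and kafka_broker_service_environment_overrides are assumptions about your cp-ansible version, and the values are illustrative only.

```yaml
all:
  vars:
    deployment_strategy: rolling

    # Assumed variable: extra entries for the broker properties file.
    kafka_broker_custom_properties:
      log.retention.hours: 72

    # Assumed variable: extra JVM arguments for the broker process.
    kafka_broker_custom_java_args: "-Djava.net.preferIPv4Stack=true"

    # Assumed variable: environment variables for the broker service.
    kafka_broker_service_environment_overrides:
      KAFKA_HEAP_OPTS: "-Xms4g -Xmx4g"
```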

To enable rolling deployment mode, set the following variable:

deployment_strategy: rolling

Or, to select specific components to use the rolling deployment mode, set the following variables:

zookeeper_deployment_strategy: rolling
kafka_broker_deployment_strategy: rolling
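
In a YAML inventory, these variables would typically sit under all.vars. The sketch below uses hypothetical host names and assumes that a component-level variable takes precedence over the global deployment_strategy for that component; verify this behavior against your cp-ansible version.

```yaml
all:
  vars:
    # Global strategy for every component.
    deployment_strategy: parallel
    # Assumed to override the global value for Kafka brokers only.
    kafka_broker_deployment_strategy: rolling

zookeeper:
  hosts:
    zk1.example.com:      # hypothetical host names
    zk2.example.com:
    zk3.example.com:

kafka_broker:
  hosts:
    kafka1.example.com:
    kafka2.example.com:
    kafka3.example.com:
```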

Additional variables and tags for reconfiguration

To have cp-ansible pause after each node passes its health check, set the following variable to true. The playbook run then stops after each node and waits for user input before proceeding. This can be useful if you want to perform additional manual verification on each node.

pause_rolling_deployment: true

To specify a component to pause on, set <component>_pause_rolling_deployment to true. For example:

zookeeper_pause_rolling_deployment: true
kafka_broker_pause_rolling_deployment: true
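
As the variable names suggest, pausing presumably applies only when nodes are deployed one at a time, so a pause variable would normally be combined with the rolling strategy. A minimal sketch, assuming both variables are placed under all.vars in the inventory:

```yaml
all:
  vars:
    deployment_strategy: rolling
    # Pause only between Kafka broker nodes; other components are
    # expected to proceed without waiting for input.
    kafka_broker_pause_rolling_deployment: true
```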

The package installation tasks are tagged with the Ansible tag package. It is highly recommended to skip these tasks during reconfiguration so that no package upgrade happens. Skipping them also saves time. Add the following argument to your Ansible command:

--skip-tags package

Update the Confluent Platform configuration

  1. Before proceeding with any update, create a backup of your existing inventory file in a version control system, such as Git. This allows you to roll back to a previous configuration in case the reconfiguration fails.

  2. Update your inventory file to reflect the desired property changes on the cluster.

  3. Run the provisioning playbook.

    • For a rolling deployment, run:

      ansible-playbook -i hosts.yml confluent.platform.all \
        --skip-tags package \
        --extra-vars deployment_strategy=rolling
      

      The --extra-vars argument overrides variables in your inventory file.

    • For a parallel deployment, run:

      ansible-playbook -i hosts.yml confluent.platform.all \
        --skip-tags package
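
If several overrides are needed for one run, they can be collected in a small YAML file instead of being listed on the command line. Ansible reads a file passed to --extra-vars when the file name is prefixed with @, for example --extra-vars "@reconfigure-vars.yml". The file name here is hypothetical, and the sketch only uses variables described on this page.

```yaml
# reconfigure-vars.yml (hypothetical file name)
# Values here override the same variables in the inventory file.
deployment_strategy: rolling
pause_rolling_deployment: true
```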
      

Failure handling

The following options are supported when a configuration update fails.

Note that many Confluent Platform components (especially Kafka) can handle single node outages.

  • After a deployment fails on a node, to roll back the node, revert your inventory file and redeploy on the node:

    # Revert your inventory file and run the following command.
    
    ansible-playbook -i hosts.yml confluent.platform.all \
      --skip-tags package \
      --limit <broken-node>
    
  • Try a new configuration on the broken node. Update your inventory file once again and redeploy on the node:

    # Update your inventory file and run the following command.
    
    ansible-playbook -i hosts.yml confluent.platform.all \
      --skip-tags package \
      --limit <broken-node>
    
    # Now deploy against all nodes.
    
    ansible-playbook -i hosts.yml confluent.platform.all \
      --skip-tags package
    
  • Run the following command if the change requires a parallel restart to work (for example, when enabling RBAC):

    ansible-playbook -i hosts.yml confluent.platform.all \
      --skip-tags package \
      -e deployment_strategy=parallel