Migrate Confluent Platform from ZooKeeper to KRaft using Ansible Playbooks
Ansible Playbooks for Confluent Platform (Confluent Ansible) supports migration from a ZooKeeper-based Confluent Platform deployment to a KRaft-based deployment.
To safely migrate your hosts and to achieve zero downtime, Confluent Ansible performs rolling upgrades host by host, shutting down the component, upgrading packages, restarting the service, and validating service health before moving on to the next one.
Requirements and considerations
Confluent Ansible supports migration only within the same Confluent Platform version, and that version must be 7.6 or later.
Note that migrating Confluent Platform 7.6.0 clusters is not recommended for production environments.
You need to upgrade Confluent Platform first before running the migration.
You cannot upgrade the Confluent Platform version and migrate from ZooKeeper to KRaft at the same time.
You can migrate from ZooKeeper to KRaft in isolated mode (the controller has process.roles=controller and the broker has process.roles=broker).
You cannot migrate to combined mode, where the KRaft controller and the broker run in the same process (process.roles=controller,broker).
You can migrate colocated clusters where ZooKeeper and brokers are on the same node.
You can migrate a ZooKeeper cluster to KRaft running on the same node.
Beware of port collisions if colocating components on the same host. If ZooKeeper and KRaft Controller are colocated, use the variables kafka_controller_jolokia_port and kafka_controller_jmxexporter_port to define different ports for ZooKeeper and KRaft. For example, kafka_controller_jolokia_port: 7777 and kafka_controller_jmxexporter_port: 8081.
ACL is migrated from ZooKeeper to KRaft.
Confluent Ansible supports one-to-many or many-to-one mapping of ZooKeeper to KRaft controllers where the number of ZooKeeper nodes differs from the number of controller nodes.
You cannot enable ZooKeeper migration when multiple log directories (JBOD) are in use by the brokers.
Confluent Ansible supports migration with the same cluster configurations.
Using different security protocols on the ZooKeeper cluster and the KRaft cluster is not recommended during migration.
You cannot change the number of KRaft controller nodes or change the KRaft node IP, hostname, or ports of the controllers after migration.
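As a pre-flight aid for the port-collision caveat above, the following is a minimal sketch of a shell helper that reports any port number assigned more than once on a colocated host. The function name find_duplicate_ports and the port 2181 in the usage comment are illustrative assumptions and are not part of Confluent Ansible; 7777 and 8081 are the example values mentioned above.

```shell
# Hypothetical helper (not part of Confluent Ansible): print every port
# number that appears more than once in the argument list.
find_duplicate_ports() {
  # One port per line, numeric sort so duplicates become adjacent,
  # then keep only the values that repeat.
  printf '%s\n' "$@" | sort -n | uniq -d
}

# Example: pass every port you plan to use on one host.
# find_duplicate_ports 2181 7777 8081 7777
```

An empty result means that no port in the list is used twice. The helper only compares the numbers you pass in; it does not inspect which ports are actually open on the host.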
Migrate to KRaft
To migrate a ZooKeeper-based Confluent Platform to KRaft:
Enable the migration flag in the same inventory file you used for ZooKeeper cluster setup:
all:
  vars:
    kraft_migration: true
Add the kafka_controller host to your inventory file. For example:
kafka_controller:
  hosts:
    ip1.us-west-2.compute.amazonaws.com:
zookeeper:
  hosts:
    ip2.us-west-2.compute.amazonaws.com:
kafka_broker:
  hosts:
    ip3.us-west-2.compute.amazonaws.com:
Run the migration playbook. You have two options:
Migrate in two steps with validation in between
This is the recommended approach because rollback is possible only while the cluster is still in Dual Write mode. This workflow lets you pause and confirm that the migration is complete before you finalize the cluster in KRaft mode.
Step 1. Migrate to the Dual Write mode:
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml \
  --tags migrate_to_dual_write
Step 2. Validate that all data has been migrated without any loss.
Step 3. Complete the migration to the KRaft mode:
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml \
  --tags migrate_to_kraft
Migrate in one step
If you want to migrate in one step without pausing at the Dual Write mode, run the following command without tags.
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml
Once the cluster is running in KRaft mode, you can stop ZooKeeper, provided that it is not also managing other Kafka clusters.
Remove the ZooKeeper section and the migration flag (set in the first step above) from your inventory file.
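The two-step workflow above can be sketched as a small wrapper that stops after the Dual Write phase and finalizes only on explicit confirmation. This is an illustrative sketch, not a Confluent-provided script: the function names and the ANSIBLE_PLAYBOOK_BIN override (useful for a dry run) are hypothetical, while the playbook name and tags are the ones shown above.

```shell
# Sketch of the two-step migration. Function names and the
# ANSIBLE_PLAYBOOK_BIN override are hypothetical conveniences.
PLAYBOOK="confluent.platform.ZKtoKraftMigration.yml"

# Run the migration playbook for a single tag.
run_phase() {
  local inventory="$1"
  local tag="$2"
  "${ANSIBLE_PLAYBOOK_BIN:-ansible-playbook}" -i "$inventory" "$PLAYBOOK" --tags "$tag"
}

# Migrate to Dual Write, wait for the operator, then finalize.
migrate_two_step() {
  local inventory="$1"
  local answer
  run_phase "$inventory" migrate_to_dual_write || return 1
  echo "Dual Write phase finished. Validate the data, then type 'yes' to finalize:" >&2
  read -r answer
  if [ "$answer" != "yes" ]; then
    # Stopping here keeps the rollback options available.
    echo "Not finalizing; the cluster stays in Dual Write mode." >&2
    return 2
  fi
  run_phase "$inventory" migrate_to_kraft
}

# Example (hypothetical inventory file name):
# migrate_two_step hosts.yml
```

Setting ANSIBLE_PLAYBOOK_BIN=echo prints the arguments instead of running the playbook, which is a simple way to review the sequence before touching a cluster. The validation itself is left to you; the wrapper only provides the pause.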
Roll back to ZooKeeper
If the migration runs into problems, you can roll back to ZooKeeper mode.
Important
You can roll back only before the cluster is finalized into KRaft mode. After you take the controllers out of migration mode and restart them in KRaft mode (the FINALIZED state), rollback is no longer possible.
Use the following tags depending on how far you want to roll back:
To revert the brokers from KRaft to HYBRID_DUAL_WRITE mode while the KRaft controllers stay active, use the rollback_to_hybrid tag. Use this tag when the cluster is in the PURE_DUAL_WRITE state and you want a partial rollback to the hybrid mode:
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml \
  --tags rollback_to_hybrid
To fully revert the cluster to pure ZooKeeper mode, use the rollback_to_premigration tag:
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml \
  --tags rollback_to_premigration
Roll back manually
If you prefer to roll back to ZooKeeper manually, complete the following steps while the cluster is still in a dual-write state:
For each KRaft broker:
Take each KRaft broker down.
Remove the __cluster_metadata directory on the broker.
Restart the broker in ZooKeeper mode.
Perform a clean shutdown of the KRaft controller quorum.
A clean shutdown of the KRaft quorum is important because there may be uncommitted metadata waiting to be written to ZooKeeper. A forceful shutdown could cause some metadata to be lost.
Using the ZooKeeper shell, delete the /controller node so that a ZooKeeper-based broker can become the next controller.
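Two of the manual steps above lend themselves to small shell helpers, sketched here under stated assumptions. The function names, the ZK_SHELL_BIN override, and the paths and host in the usage comments are hypothetical. The sketch assumes that the metadata directory sits directly under a broker log directory and is named exactly as written above, so check the actual directory name on disk before deleting anything. Stopping and restarting the broker and shutting down the controller quorum are left out because they depend on how your services are managed.

```shell
# Hypothetical helper: remove the KRaft metadata directory from one
# broker log directory. Run it only while the broker is stopped.
remove_cluster_metadata() {
  local log_dir="$1"
  # Refuse to run without a directory argument.
  [ -n "$log_dir" ] || return 1
  local target="$log_dir/__cluster_metadata"
  if [ -d "$target" ]; then
    rm -rf -- "$target"
  fi
}

# Hypothetical helper: delete the /controller znode through the
# ZooKeeper shell so that a ZooKeeper-based broker can be elected.
delete_controller_znode() {
  local zk_connect="$1"
  "${ZK_SHELL_BIN:-zookeeper-shell}" "$zk_connect" delete /controller
}

# Example (placeholder path and host):
# remove_cluster_metadata /var/lib/kafka/data
# delete_controller_znode zk1.example.com:2181
```

Keeping the destructive step in its own function makes it easier to try against a scratch directory first. Run delete_controller_znode only after the KRaft controller quorum has shut down cleanly, as described above.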
Troubleshoot migration issues
This section describes a few of the potential issues you might encounter while migrating ZooKeeper to KRaft and presents the steps to troubleshoot the issues.
- Migration failed with the error:
{"attempts": 10, "cache_control": "no-cache", "changed": false, "content_type": "text/plain; charset=utf-8", "cookies": {}, "cookies_string": "", "date": "Fri, 19 Jan 2024 11:12:14 GMT", "elapsed": 0, "expires": "Fri, 19 Jan 2024 10:12:14 GMT", "json": {"request": {"mbean": "kafka.controller:name=ZkMigrationState,type=KafkaController", "type": "read"}, "status": 200, "timestamp": 1705662734, "value": {"Value": 2}}, "msg": "OK (unknown bytes)", "pragma": "no-cache", "redirected": false, "status": 200, "transfer_encoding": "chunked", "url": "https://localhost:7770/jolokia/read/kafka.controller:type=KafkaController,name=ZkMigrationState"}
Solution: Increase the metadata_migration_retries value. Because of the size of the cluster, the migration might be taking longer than expected.
- Migration failed with the error:
{"msg": "The conditional check '( jolokia_output.content | from_json ).value.Value == 1' failed. The error was: Expecting value: line 1 column 1 (char 0)"}
One of the following can cause the error:
The KRaft controller has failed. For details of the failure, see server.log of the KRaft controller.
Jolokia is disabled in the KRaft controller.
Solution: Enable Jolokia in the KRaft controller if needed, or review and address the issue reported in server.log.
- Migration failed with the error:
{"msg": "The conditional check '( jolokia_output.content | from_json ).value.Value == 1' failed. The error was: error while evaluating conditional (( jolokia_output.content | from_json ).value.Value == 1): 'dict object' has no attribute 'value'"}
Confluent Ansible playbooks are using an older version of Confluent Platform with confluent_package_version set to 7.5 or earlier.
Solution: Use Confluent Platform 7.6 or later.
- Migration failed with an authorization error on an RBAC cluster.
When migrating an RBAC cluster, the principal for the controller should be a super user on the broker, and the broker’s principal should be a super user on the KRaft controller.
Solution: Use super users on the KRaft controller as the principals for the ZooKeeper broker.
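When diagnosing the Jolokia-related errors above, it can help to query the controller by hand and look at the value the playbook's conditional compares against 1. The following is a minimal sketch: the function names are hypothetical, the sed-based parsing assumes the single-line JSON shape shown in the first error, and the curl usage in the comment reuses the URL from that error. The -k flag, which skips TLS certificate verification, is an assumption that suits only test setups.

```shell
# Hypothetical helper: read a Jolokia JSON response on stdin and print
# the numeric ZkMigrationState value found under "value": {"Value": N}.
zk_migration_value() {
  local value
  value=$(sed -n 's/.*"Value"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$value" ]; then
    # An empty body or a response without "Value" lands here, matching
    # the "Expecting value" and "no attribute 'value'" failures above.
    echo "no ZkMigrationState value in response" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Hypothetical helper: succeed only when the value equals 1, the value
# that the conditional in the error messages above checks for.
migration_state_reached() {
  [ "$(zk_migration_value)" = "1" ]
}

# Example, using the URL from the first error:
# curl -sk "https://localhost:7770/jolokia/read/kafka.controller:type=KafkaController,name=ZkMigrationState" | zk_migration_value
```

If the helper keeps printing a value other than 1, the migration may simply need more time, which is the case that raising metadata_migration_retries addresses. If it reports that no value was found, check whether Jolokia is enabled and whether the controller is running.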