Migrate Confluent Platform from ZooKeeper to KRaft using Ansible Playbooks
Ansible Playbooks for Confluent Platform (Confluent Ansible) supports migration from a ZooKeeper-based Confluent Platform deployment to a KRaft-based deployment.
To safely migrate your hosts and to achieve zero downtime, Confluent Ansible performs rolling upgrades host by host, shutting down the component, upgrading packages, restarting the service, and validating service health before moving on to the next one.
Requirements and considerations
Confluent Ansible supports migration only within the same Confluent Platform version, and that version must be 7.6 or later.
Note that migrating Confluent Platform 7.6.0 clusters is not recommended for production environments.
You need to upgrade Confluent Platform first before running the migration.
You cannot upgrade the Confluent Platform version and migrate ZooKeeper to KRaft at the same time.
You can migrate from ZooKeeper to KRaft in isolated mode, where the controller has process.roles=controller and the broker has process.roles=broker.
You cannot migrate to combined mode, where the KRaft controller and the broker run in the same process (process.roles=controller,broker).
You can migrate colocated clusters where ZooKeeper and brokers are on the same node.
You can migrate a ZooKeeper cluster to KRaft controllers that run on the same nodes.
Beware of port collisions when colocating components on the same host. If ZooKeeper and the KRaft controller are colocated, use the variables kafka_controller_jolokia_port and kafka_controller_jmxexporter_port to give the KRaft controller ports that differ from the ZooKeeper ports. For example, kafka_controller_jolokia_port: 7777 and kafka_controller_jmxexporter_port: 8081.
ACLs are migrated from ZooKeeper to KRaft.
Confluent Ansible supports one-to-many or many-to-one mapping of ZooKeeper to KRaft controllers where the number of ZooKeeper nodes differs from the number of controller nodes.
You cannot enable ZooKeeper migration when multiple log directories (JBOD) are in use by the brokers.
Confluent Ansible supports migration with the same cluster configurations.
- Different security protocols on the ZooKeeper cluster and the KRaft cluster are not recommended in migration.
- You cannot change the number of KRaft controller nodes or change the KRaft node IP, hostname, or ports of the controllers after migration.
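The port-collision consideration above can be checked before running the playbook. The following is a minimal sketch, not part of Confluent Ansible: it takes the port variables that apply to one host and reports any port claimed more than once. The ZooKeeper-side variable names and all port values in the sample are assumptions for illustration; only kafka_controller_jolokia_port and kafka_controller_jmxexporter_port come from this page.

```python
from collections import defaultdict


def find_port_collisions(host_ports):
    """Return {port: [variable names]} for every port used more than once."""
    by_port = defaultdict(list)
    for name, port in host_ports.items():
        by_port[int(port)].append(name)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


# Hypothetical settings for one host that runs both ZooKeeper and a KRaft controller.
colocated = {
    "zookeeper_jolokia_port": 7770,          # assumed variable name and value
    "zookeeper_jmxexporter_port": 8079,      # assumed variable name and value
    "kafka_controller_jolokia_port": 7770,   # collides with the ZooKeeper port above
    "kafka_controller_jmxexporter_port": 8081,
}

print(find_port_collisions(colocated))
```

Changing kafka_controller_jolokia_port to a free port such as 7777 makes the function return an empty dictionary.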
Migrate to KRaft
To migrate a ZooKeeper-based Confluent Platform to KRaft:
Enable the migration flag in the same inventory file you used for ZooKeeper cluster setup:
all:
  vars:
    kraft_migration: true
Add the kafka_controller host to your inventory file. For example:
kafka_controller:
  hosts:
    ip1.us-west-2.compute.amazonaws.com:
zookeeper:
  hosts:
    ip2.us-west-2.compute.amazonaws.com:
kafka_broker:
  hosts:
    ip3.us-west-2.compute.amazonaws.com:
Run the migration playbook. You have two options:
Migrate in two steps with validation in between
This is the recommended approach because rollback is possible only while the cluster is in Dual Write mode. This workflow lets you pause and confirm that the migration to Dual Write mode is complete before you finalize the move to KRaft mode.
Step 1. Migrate to the Dual Write mode:
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml \
  --tags migrate_to_dual_write
Step 2. Validate that all data has been migrated without any loss.
Step 3. Complete the migration to the KRaft mode:
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml \
  --tags migrate_to_kraft
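Step 2 does not prescribe a validation method. One signal that can be inspected is the ZkMigrationState MBean that appears in the Jolokia output in the troubleshooting section of this page. The sketch below reads that MBean and translates the numeric value into a name. The mapping of numbers to names is an assumption based on the ZkMigrationState enum in Apache Kafka, and the base URL passed to fetch_migration_state is hypothetical. The state alone does not prove that all data was migrated, so treat it as one check among several.

```python
import json
import urllib.request

# Assumed to mirror Apache Kafka's ZkMigrationState enum values.
ZK_MIGRATION_STATES = {
    0: "NONE",
    1: "MIGRATION",       # dual-write state that the playbook waits for
    2: "PRE_MIGRATION",
    3: "POST_MIGRATION",
    4: "ZK",
}

MBEAN_PATH = "/jolokia/read/kafka.controller:type=KafkaController,name=ZkMigrationState"


def migration_state(jolokia_body):
    """Translate a Jolokia read response (JSON text) into a state name."""
    code = json.loads(jolokia_body)["value"]["Value"]
    return ZK_MIGRATION_STATES.get(code, f"UNKNOWN({code})")


def fetch_migration_state(base_url):
    """Query a controller's Jolokia endpoint, for example a hypothetical http://localhost:7770."""
    with urllib.request.urlopen(base_url.rstrip("/") + MBEAN_PATH, timeout=10) as resp:
        return migration_state(resp.read().decode("utf-8"))


# Offline example that uses the value shown in the troubleshooting output.
sample = '{"status": 200, "value": {"Value": 2}}'
print(migration_state(sample))
```

For an HTTPS endpoint with a private certificate authority, urlopen also needs an SSL context that trusts that authority. That part is omitted here.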
Migrate in one step
If you want to migrate in one step without pausing at the Dual Write mode, run the following command without tags.
ansible-playbook -i <inventory-file> confluent.platform.ZKtoKraftMigration.yml
Once the cluster is running in KRaft mode, you can stop ZooKeeper, provided that it is not also serving other Kafka clusters.
Remove the ZooKeeper section and the migration flag (set in the first step above) from your inventory file.
Roll back to ZooKeeper
If the migration fails, you can roll back to the ZooKeeper cluster at any point in the migration process prior to taking the KRaft controllers out of the migration mode. Up to that point, the controller makes dual writes to KRaft and ZooKeeper. Since the data in ZooKeeper is still consistent with that of the KRaft metadata log, it is still possible to revert to ZooKeeper.
Once you take the controller out of the migration mode and restart in KRaft mode, you can no longer roll back to ZooKeeper mode.
To roll back to ZooKeeper:
For each KRaft broker:
- Take the broker down.
- Remove the __cluster_metadata directory on the broker.
- Restart the broker in ZooKeeper mode.
Perform a clean shutdown of the KRaft controller quorum.
A clean shutdown of the KRaft quorum is important because there may be uncommitted metadata waiting to be written to ZooKeeper. A forceful shutdown could cause some metadata to be lost.
Using the ZooKeeper shell, delete the /controller and /controller_epoch nodes so that a ZooKeeper-based broker can become the next controller.
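The last rollback step can be scripted. The sketch below builds one ZooKeeper shell invocation per znode and, by default, only prints the commands. It assumes that the shell is available on the PATH as zookeeper-shell, that it accepts a command after the connection string, and that ZooKeeper listens on localhost:2181. None of these details are stated on this page, so adjust them to your installation.

```python
import subprocess

ZNODES_TO_DELETE = ("/controller", "/controller_epoch")


def rollback_znode_commands(zk_connect, shell="zookeeper-shell"):
    """Build one ZooKeeper shell invocation per znode that must be removed."""
    return [[shell, zk_connect, "delete", znode] for znode in ZNODES_TO_DELETE]


def run_rollback_cleanup(zk_connect, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in rollback_znode_commands(zk_connect):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Dry run against an assumed local ZooKeeper address: only prints the commands.
run_rollback_cleanup("localhost:2181")
```

Run this only after the brokers are back in ZooKeeper mode and the KRaft controller quorum has been shut down cleanly, as described above.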
Troubleshoot migration issues
This section describes a few of the potential issues you might encounter while migrating ZooKeeper to KRaft and presents the steps to troubleshoot the issues.
- Migration failed with the error:
{"attempts": 10, "cache_control": "no-cache", "changed": false, "content_type": "text/plain; charset=utf-8", "cookies": {}, "cookies_string": "", "date": "Fri, 19 Jan 2024 11:12:14 GMT", "elapsed": 0, "expires": "Fri, 19 Jan 2024 10:12:14 GMT", "json": {"request": {"mbean": "kafka.controller:name=ZkMigrationState,type=KafkaController", "type": "read"}, "status": 200, "timestamp": 1705662734, "value": {"Value": 2}}, "msg": "OK (unknown bytes)", "pragma": "no-cache", "redirected": false, "status": 200, "transfer_encoding": "chunked", "url": "https://localhost:7770/jolokia/read/kafka.controller:type=KafkaController,name=ZkMigrationState"}
Solution: Increase the metadata_migration_retries value. Due to the size of the cluster, the migration might be taking more time than expected.
- Migration failed with the error:
{"msg": "The conditional check '( jolokia_output.content | from_json ).value.Value == 1' failed. The error was: Expecting value: line 1 column 1 (char 0)"}
One of the following can cause the error:
- The KRaft controller has failed. For details of the failure, see server.log of the KRaft controller.
- Jolokia is disabled in the KRaft controller.
Solution: Enable Jolokia in the KRaft controller if needed, or review and address the issue reported in server.log.
- Migration failed with the error:
{"msg": "The conditional check '( jolokia_output.content | from_json ).value.Value == 1' failed. The error was: error while evaluating conditional (( jolokia_output.content | from_json ).value.Value == 1): 'dict object' has no attribute 'value'"}
The Confluent Ansible playbooks are using an older version of Confluent Platform, with confluent_package_version set to 7.5 or earlier.
Solution: Use Confluent Platform 7.6 or later.
- Migration failed with an authorization error on an RBAC cluster.
When migrating an RBAC cluster, the principal for the controller should be a super user on the broker, and the broker’s principal should be a super user on the KRaft controller.
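Solution: Configure the principals so that the principal of the ZooKeeper-based brokers is a super user on the KRaft controller and the principal of the controller is a super user on the brokers.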
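The first three failures above all come from the same Jolokia read of ZkMigrationState, so the raw response body is usually enough to tell them apart. The following sketch maps a response body to the likely cause described in this section. It is an illustration, not playbook code. It assumes that a state value of 1 is what the playbook waits for (as the conditional in the error messages suggests), and it does not cover the RBAC authorization case.

```python
import json


def diagnose(jolokia_body):
    """Map a raw Jolokia response body to the likely cause listed in this section."""
    try:
        payload = json.loads(jolokia_body)
    except json.JSONDecodeError:
        # Matches the "Expecting value: line 1 column 1" failure.
        return "no JSON returned: check the KRaft controller server.log and that Jolokia is enabled"
    value = payload.get("value") if isinstance(payload, dict) else None
    if not isinstance(value, dict) or "Value" not in value:
        # Matches the "'dict object' has no attribute 'value'" failure.
        return "response has no state value: check that confluent_package_version is 7.6 or later"
    if value["Value"] == 1:
        return "controller reports the state the playbook waits for"
    return (
        f"state is {value['Value']}, not 1: migration may still be running, "
        "consider increasing metadata_migration_retries"
    )


print(diagnose(""))
print(diagnose('{"error": "attribute not found", "status": 404}'))
print(diagnose('{"status": 200, "value": {"Value": 2}}'))
```

The sample bodies are made up for the demonstration. In practice, pass the content that the failed task printed or the body returned by the Jolokia URL shown in the first error.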