Migrate Confluent Platform from ZooKeeper to KRaft using Confluent for Kubernetes¶
Starting in the 2.8 release, Confluent for Kubernetes (CFK) supports migration from a ZooKeeper-based Confluent Platform deployment to a KRaft-based deployment.
Requirements and considerations¶
Use one of the following CFK versions:
- 2.8.4 or higher in the 2.8.x branch
- 2.9.2 or higher
CFK only supports migration within the same Confluent Platform version, and that version must be 7.6 or later.
Migrating Confluent Platform 7.6.0 clusters is not recommended for production environments. Use Confluent Platform 7.6.1 or later version with CFK 2.8.1 or later for production environment migration.
The migration job, the KRaftMigrationJob custom resource, does not support Helm-based Confluent Platform deployments. It only supports migrating CFK-based Confluent Platform deployments.
If you need to upgrade Confluent Platform, do so before running the migration. You cannot upgrade the Confluent Platform version and migrate from ZooKeeper to KRaft at the same time.
To prevent unwanted modification of ZooKeeper, Kafka, and KRaft resources during and after migration, the migration job puts a lock on those resources.
This lock is only enforceable for CFK deployments with webhooks enabled.
If you are migrating a deployment that does not have webhooks enabled, make sure no other actor, for example, a continuous integration and continuous delivery (CI/CD) tool such as a GitOps workflow or FluxCD, updates or deletes the ZooKeeper, Kafka, and KRaft resources while migration is in progress.
You need to manually remove the lock at the end of migration after validating a successful migration. For details, see the following migration steps.
You can migrate from ZooKeeper to KRaft in the isolated mode, with separate KRaft controller nodes and broker nodes. You cannot migrate to the combined mode where KRaft controllers and brokers run in the same process (process.roles=controller,broker).
ACLs are migrated from ZooKeeper to KRaft.
Automatic migration from ZooKeeper to KRaft is not supported in multi-region cluster deployments.
CFK does not support rolling back to the ZooKeeper mode once migration to the KRaft mode is triggered.
CFK supports ZooKeeper migration of brokers that use multiple log directories (JBOD). JBOD is in Early Access in Kafka 3.7, which is part of the Confluent Platform 7.7 release.
CFK does not automatically delete the ZooKeeper cluster after migration.
You can delete the ZooKeeper cluster once you verify that it is not used by any other Kafka cluster.
Migrate ZooKeeper to KRaft¶
To migrate a ZooKeeper-based Confluent Platform deployment to a KRaft-based deployment, use the KRaftMigrationJob custom resource (CR) and follow the steps below.
Note
When you use CFK to migrate from ZooKeeper to KRaft, CFK automatically fetches the Kafka cluster ID. You do not need to manually perform the step described in the manual Confluent Platform migration process.
For an example tutorial of the migration process, see the CFK example GitHub repo.
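If you want to cross-check the cluster ID that CFK fetches, you can read it from ZooKeeper yourself. This optional sketch assumes the cluster ID is stored at the standard /cluster/id znode under the <kafka-cr-name>-<kafka-cr-namespace> chroot path used elsewhere in this guide:
zookeeper-shell <zkhost:zkport> get <kafka-cr-name>-<kafka-cr-namespace>/cluster/id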
Set the Kafka inter-broker protocol version if needed.
By default, CFK uses the inter-broker protocol (IBP) version of 3.6 in the ZooKeeper to KRaft migration workflow.
When migrating a Confluent Platform version higher than 7.6, set the corresponding IBP version in the Kafka CR. To get the IBP version, refer to the table in the Confluent Platform upgrade guide. For example, to migrate a Confluent Platform deployment of version 7.7 from ZooKeeper to KRaft, set the IBP version to the corresponding 3.7 as below:
kind: Kafka
metadata:
  annotations:
    "platform.confluent.io/kraft-migration-ibp-version": "3.7"
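Alternatively, you can set the annotation on an existing Kafka CR with kubectl annotate. This is a sketch that assumes a Kafka CR named kafka in the confluent namespace:
kubectl annotate kafka kafka -n confluent \
  platform.confluent.io/kraft-migration-ibp-version="3.7" --overwrite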
Create a KRaftController CR with the following annotation and apply the KRaftController CR.
This migration annotation instructs CFK to wait for the KRaftMigrationJob to modify the KRaftController CR with migration configurations and only then create the KRaftController pods.
kind: KRaftController
metadata:
  annotations:
    platform.confluent.io/kraft-migration-hold-krc-creation: "true"
An example KRaftController CR:
kind: KRaftController
metadata:
  name: kraftcontroller
  namespace: confluent
  annotations:
    platform.confluent.io/kraft-migration-hold-krc-creation: "true"
spec:
  dataVolumeCapacity: 10G
  image:
    application: docker.io/confluentinc/cp-server:7.6.0
    init: confluentinc/confluent-init-container:2.8.0
  replicas: 3
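For example, assuming the CR above is saved as kraftcontroller.yaml, apply it and confirm that the resource exists. Because of the hold annotation, CFK does not create the KRaftController pods until the migration job releases the hold:
kubectl apply -f kraftcontroller.yaml
kubectl get kraftcontroller -n confluent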
Create a KRaftMigrationJob CR and apply the CR to kick off the migration process.
kind: KRaftMigrationJob
spec:
  dependencies:
    zookeeper:
      # provide ZooKeeper clusterRef
    kRaftController:
      # provide KRaftController clusterRef
    kafka:
      # provide Kafka clusterRef
For example:
kind: KRaftMigrationJob
metadata:
  name: kraftmigrationjob
  namespace: confluent
spec:
  dependencies:
    kafka:
      name: kafka
      namespace: confluent
    zookeeper:
      name: zookeeper
      namespace: confluent
    kRaftController:
      name: kraftcontroller
      namespace: confluent
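For example, assuming the CR above is saved as kraftmigrationjob.yaml:
kubectl apply -f kraftmigrationjob.yaml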
Monitor the migration progress.
kubectl get kraftmigrationjob <migration job name> -n <namespace> -oyaml -w
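If you prefer a summary of the job's conditions and events over the full YAML, kubectl describe works as well:
kubectl describe kraftmigrationjob <migration job name> -n <namespace>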
Migration time depends on the following factors:
- Time to take a single Kafka roll, multiplied by 3
- Time to migrate metadata from ZooKeeper to the KRaftController
- Time to take the KRaftController roll
At this point, the migration process is in the Dual Write phase, where metadata is written to both ZooKeeper and KRaft.
Check the logs and validate that all data has been migrated without any loss. You can use the following command:
kubectl logs -f -c=kraftcontroller --selector app=kraftcontroller -n <namespace> | grep "Completed migration of metadata from ZooKeeper to KRaft"
To track the logs, run this command as soon as the migration starts. Note that Kubernetes does not keep the logs of restarted pods.
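Because the logs of restarted pods are lost, you may want to persist the output locally while following it, for example by piping through tee (the file name here is arbitrary):
kubectl logs -f -c=kraftcontroller --selector app=kraftcontroller -n <namespace> | tee kraft-migration.log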
If validation passes and you wish to complete the migration process, apply the platform.confluent.io/kraft-migration-trigger-finalize-to-kraft annotation to the migration job CR:
kubectl annotate kraftmigrationjob <migration job name> \
  platform.confluent.io/kraft-migration-trigger-finalize-to-kraft=true \
  --namespace <namespace>
Alternatively, if validation fails, or if you want to go back to ZooKeeper, you can roll back.
Rollback is only possible while the cluster is still in the Dual Write mode. Once you take the controllers out of the migration mode and restart them in KRaft mode, you can no longer roll back to ZooKeeper mode.
For the steps, see Roll back to ZooKeeper.
Remove the lock that the migration job placed on the resources.
The migration job locks the Kafka, ZooKeeper, and KRaft resources when the migration workflow initializes.
When migration is completed, the status message prompts you to:
Remove the lock using the platform.confluent.io/kraft-migration-release-cr-lock annotation:
kubectl annotate kraftmigrationjob <migration job name> \
  platform.confluent.io/kraft-migration-release-cr-lock=true \
  --namespace <namespace>
The migration job modifies the YAML representation of the Kafka and KRaftController CRs. Download the modified Kafka and KRaftController CRs to update your CI/CD as necessary. For example:
kubectl get kafka <Kafka CR name> -n <namespace> -oyaml > updated_kafka.yaml
kubectl get kraftcontroller <KRaftController CR name> -n <namespace> -oyaml > updated_kraftcontroller.yaml
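To review exactly what the migration job changed, you can diff the downloaded CRs against the manifests tracked in your CI/CD repository. The repository paths below are hypothetical, and the live objects include server-populated fields, so expect some extra noise in the output:
diff path/to/repo/kafka.yaml updated_kafka.yaml
diff path/to/repo/kraftcontroller.yaml updated_kraftcontroller.yaml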
After migration completes, and after verifying that the ZooKeeper cluster is not used by any other Kafka cluster, remove the ZooKeeper cluster:
kubectl delete -f <ZooKeeper CR>
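One rough way to check for remaining references, assuming any other Kafka CR would name the ZooKeeper cluster under its dependencies, is to grep the live CRs:
kubectl get kafka --all-namespaces -oyaml | grep -A 2 'zookeeper:'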
Roll back to ZooKeeper¶
If the migration fails, you can roll back to the ZooKeeper cluster at any point in the migration process before taking the KRaft controllers out of the migration mode. Up to that point, the controller makes dual writes to KRaft and ZooKeeper, so the data in ZooKeeper stays consistent with the KRaft metadata log and it is still possible to revert to ZooKeeper.
To roll back to ZooKeeper:
Apply the platform.confluent.io/kraft-migration-trigger-rollback-to-zk annotation to the migration job:
kubectl annotate kraftmigrationjob <migration job name> \
  platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
  --namespace <namespace>
Remove the controller, controller_epoch, and migration nodes.
When you get the status message to delete the controller nodes from ZooKeeper, run the following commands:
zookeeper-shell <zkhost:zkport> deleteall <kafka-cr-name>-<kafka-cr-namespace>/controller
zookeeper-shell <zkhost:zkport> deleteall <kafka-cr-name>-<kafka-cr-namespace>/controller_epoch
zookeeper-shell <zkhost:zkport> deleteall <kafka-cr-name>-<kafka-cr-namespace>/migration
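To verify that the znodes were removed, you can list the chroot path, for example:
zookeeper-shell <zkhost:zkport> ls <kafka-cr-name>-<kafka-cr-namespace>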
Trigger the node removal process.
kubectl annotate kraftmigrationjob <migration job name> \
  platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
  --overwrite \
  --namespace <namespace>