Roll Back to ZooKeeper
Roll back a ZooKeeper to KRaft migration to return the cluster to ZooKeeper. Rollback is supported during the SETUP, MIGRATE, and DUAL-WRITE phases.
Warning
After you apply the finalize annotation in Step 5.3 of the migration procedure, the cluster enters the MoveToKRaftControllerOnly phase and rollback is no longer possible.
For the main migration procedure, see ZooKeeper to KRaft Migration.
Step 1: Trigger rollback
If you trigger rollback before the KRaftController has started, znode removal is not needed and rollback completes automatically, so you can skip to Step 4. This applies to both paths below.
Trigger a rollback of the migration to ZooKeeper:
kubectl confluent cluster kraft-migration rollback --name <migration-job-name> -n <namespace>
Monitor progress with the status command and proceed to znode removal when prompted. For command details, see kubectl confluent cluster kraft-migration rollback.
The command prompts for confirmation before proceeding:
WARNING: This will rollback the cluster from KRaft to ZooKeeper.
Current phase: DUAL-WRITE
Proceed? [y/N]: y
Triggering rollback...
Annotation applied: platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true
✓ Rollback initiated successfully!
Alternatively, apply the rollback annotation directly with kubectl:
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
-n <namespace>
Expected:
kraftmigrationjob.platform.confluent.io/<migration-job-name> annotated
Step 2: Remove znodes from ZooKeeper
Remove the controller and migration znodes from ZooKeeper.
The zk-node-removal command interactively removes both znodes and applies the continue annotation in a single workflow. After running this command, skip to Step 4.
kubectl confluent cluster kraft-migration zk-node-removal --name <migration-job-name> -n <namespace>
If ZooKeeper has TLS enabled:
kubectl confluent cluster kraft-migration zk-node-removal --name <migration-job-name> -n <namespace> \
--zk-tls-config-file <path-to-zookeeper-client-properties>
The plugin guides you through the znode cleanup process, requiring confirmation before each step:
Deletes the controller znode from ZooKeeper.
Deletes the migration znode from ZooKeeper.
Applies the continue annotation to resume rollback.
The plugin automatically derives the ZooKeeper connection details from the KRaftMigrationJob status. For command details, see kubectl confluent cluster kraft-migration zk-node-removal.
The command prompts for confirmation before each step:
=== ZooKeeper Znode Cleanup for KRaft Migration Rollback ===
Migration Job: kraftmigrationjob (namespace: confluent)
Current Phase: RollbackToZk / SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk
Step 1/3: Delete controller znode from ZooKeeper
Path to delete: /kafka-confluent/controller
Execute this command? [y/N]: y
✓ Controller znode deleted
Step 2/3: Delete migration znode from ZooKeeper
Path to delete: /kafka-confluent/migration
Execute this command? [y/N]: y
✓ Migration znode deleted
Step 3/3: Apply continue annotation to resume rollback
Annotation: platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true
Apply this annotation? [y/N]: y
✓ Annotation applied
✓ ZooKeeper znode cleanup complete!
Rollback will now continue automatically.
To remove the znodes manually instead, first wait for the migration job to reach the correct phase. Re-run the following command every few seconds until you see one of the expected subphases:
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
-o jsonpath='{.status.subPhase}{"\n"}'
Expected:
SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk when rolling back from the MIGRATE or DUAL-WRITE phase.
SubPhaseRollbackToZkFromSetupWaitForManualNodeRemovalFromZk when rolling back from the SETUP phase.
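The manual re-run can be wrapped in a small polling loop. The following is a minimal sketch, not part of the product tooling: the function names, the default of 60 attempts, and the 5-second interval are illustrative assumptions. Only the kubectl command and the two subphase names come from this procedure.

```shell
# Return 0 if the given subphase is one of the two wait-for-znode-removal subphases.
is_wait_subphase() {
  case "$1" in
    SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk) return 0 ;;
    SubPhaseRollbackToZkFromSetupWaitForManualNodeRemovalFromZk) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the migration job until it waits for manual znode removal.
# Arguments: <migration-job-name> <namespace> [max-attempts] [sleep-seconds]
# The defaults (60 attempts, 5 seconds) are assumptions; tune them for your cluster.
wait_for_znode_removal() {
  job="$1"; ns="$2"; attempts="${3:-60}"; interval="${4:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    subphase="$(kubectl get kraftmigrationjob "$job" -n "$ns" \
      -o jsonpath='{.status.subPhase}')"
    if is_wait_subphase "$subphase"; then
      echo "Ready for znode removal: $subphase"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Timed out; last subphase: ${subphase:-<empty>}" >&2
  return 1
}
```

For example, wait_for_znode_removal <migration-job-name> <namespace> returns once the job reports either subphase, and exits non-zero if the attempts run out first.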
Exec into one of the ZooKeeper pods to run zookeeper-shell. Everything the tool needs is already in the pod, for example, the binary, the JAAS configuration, and the TLS truststore (if TLS is used).
kubectl exec -it <zookeeper-pod-name> -n <namespace> -- bash
Inside the pod, pass the ZooKeeper connection string to zookeeper-shell as its first argument. Use localhost:2181 for plaintext or the secured port (typically 2182) for TLS. In the znode paths, replace <kafka-cr-name> and <kafka-cr-namespace> with the Kafka CR name and the namespace it runs in.
Remove the controller znode
Plaintext (default port 2181):
zookeeper-shell localhost:2181 \
deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller
TLS (default secured port 2182):
zookeeper-shell localhost:2182 \
-zk-tls-config-file <path-to-zookeeper-client-properties> \
deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller
For example, if your Kafka CR is named kafka in the confluent namespace, the znode path is /kafka-confluent/controller.
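The path pattern above is easy to mistype, so it can help to build it from the CR name and namespace. This is a small sketch; znode_path is a hypothetical helper name, not a CFK command.

```shell
# Build a znode path of the form /<kafka-cr-name>-<kafka-cr-namespace>/<znode>.
# Arguments: <kafka-cr-name> <kafka-cr-namespace> <znode>
znode_path() {
  printf '/%s-%s/%s\n' "$1" "$2" "$3"
}
```

Running znode_path kafka confluent controller prints /kafka-confluent/controller, and znode_path kafka confluent migration prints /kafka-confluent/migration.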
Note
The controller znode reappears immediately after deletion. This is expected. The znode is ephemeral, and Kafka recreates it for ZooKeeper-mode leader election. The deleteall cleared the migration-time controller state, which is the purpose of the step.
Remove the migration znode
Use the same zookeeper-shell connection as the controller znode removal. For the earlier example, the path is /kafka-confluent/migration.
Plaintext (default port 2181):
zookeeper-shell localhost:2181 \
deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration
TLS (default secured port 2182):
zookeeper-shell localhost:2182 \
-zk-tls-config-file <path-to-zookeeper-client-properties> \
deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration
After removing both znodes, proceed to Step 3 to apply the continue annotation.
Troubleshoot: For NoAuthException or Failed to delete some node(s) errors, see Roll back errors.
Step 3: Apply the continue annotation
After removing the znodes, apply the continue annotation to resume the rollback.
If you ran the zk-node-removal command in Step 2, skip this step: the command already applied the continue annotation. Continue to Step 4.
If you removed the znodes manually, apply the annotation with kubectl. Unlike the other migration annotations, this annotation does not use the kraft-migration-* prefix. The annotation is scoped specifically to the post-znode-removal handoff, so the operator only honors it after the migration job enters a wait-for-manual-node-removal subphase.
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
--overwrite \
-n <namespace>
Expected:
kraftmigrationjob.platform.confluent.io/<migration-job-name> annotated
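Because the operator honors the annotation only in a wait-for-manual-node-removal subphase, a script can check the subphase before annotating. The sketch below assumes that both wait subphases end in WaitForManualNodeRemovalFromZk, as the two names listed in Step 2 do. apply_continue_annotation is a hypothetical wrapper name.

```shell
# Apply the continue annotation only if the job is waiting for manual znode removal.
# Arguments: <migration-job-name> <namespace>
apply_continue_annotation() {
  job="$1"; ns="$2"
  subphase="$(kubectl get kraftmigrationjob "$job" -n "$ns" \
    -o jsonpath='{.status.subPhase}')"
  case "$subphase" in
    *WaitForManualNodeRemovalFromZk)
      # Assumption: this suffix match covers both wait subphases from Step 2.
      kubectl annotate kraftmigrationjob "$job" \
        platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
        --overwrite -n "$ns"
      ;;
    *)
      echo "Not applying annotation: subphase is '${subphase}'" >&2
      return 1
      ;;
  esac
}
```

The guard does not verify that the znodes were actually removed. It only avoids annotating a job that is not yet in a subphase where the annotation takes effect.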
Step 4: Verify rollback
After the migration job reports completion, confirm that rollback succeeded. Check the job status, the Kafka CR’s metadata-store dependency, the data created during DUAL-WRITE, and the Kafka and operator logs.
Step 4.1: Verify rollback completed
Confirm the migration job reports COMPLETE, which indicates the rollback finished.
kubectl get kraftmigrationjob <migration-job-name> -n <namespace>
Expected:
NAME STATUS AGE
kraftmigrationjob COMPLETE 31m
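In a script, the same check can be reduced to an exit code. This sketch reads the STATUS value from the second column of the table output shown above. That column position is an assumption taken from the example output, and rollback_complete is a hypothetical helper name.

```shell
# Succeed only if the migration job's STATUS column reads COMPLETE.
# Arguments: <migration-job-name> <namespace>
rollback_complete() {
  # --no-headers drops the NAME/STATUS/AGE header row; awk picks the STATUS column.
  status="$(kubectl get kraftmigrationjob "$1" -n "$2" --no-headers | awk '{print $2}')"
  echo "status: ${status:-<unknown>}"
  [ "$status" = "COMPLETE" ]
}
```

For example, rollback_complete <migration-job-name> <namespace> && echo "rollback finished".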
Step 4.2: Verify Kafka is using ZooKeeper
Confirm the Kafka CR lists ZooKeeper as its metadata-store dependency.
kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'
Expected: The output lists the zookeeper dependency. For example:
{"zookeeper":{"discovery":{"name":"<zookeeper-name>","namespace":"<namespace>"}}}
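To automate this check, a script can look for the zookeeper key in the returned dependencies. The sketch assumes kubectl prints the field as JSON, as in the example above. kafka_uses_zookeeper is a hypothetical helper name.

```shell
# Succeed only if the Kafka CR's dependencies include a "zookeeper" entry.
# Arguments: <kafka-name> <namespace>
kafka_uses_zookeeper() {
  deps="$(kubectl get kafka "$1" -n "$2" -o jsonpath='{.spec.dependencies}')"
  case "$deps" in
    *'"zookeeper":'*)
      return 0
      ;;
    *)
      echo "zookeeper dependency not found in: ${deps:-<empty>}" >&2
      return 1
      ;;
  esac
}
```

A plain substring match is enough for a quick check. For stricter validation, parse the JSON with a tool of your choice.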
Step 4.3: Verify data preservation
Confirm that topics and consumer groups created during DUAL-WRITE are still present after rollback.
Port 9071 is the CFK default for internal Kafka listeners. If your Kafka CR exposes Kafka on a different port, substitute it.
# List all topics
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-topics --list \
--bootstrap-server <kafka-bootstrap-service>:9071 \
--command-config /path/to/client.properties
# Check consumer groups
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-consumer-groups --list \
--bootstrap-server <kafka-bootstrap-service>:9071 \
--command-config /path/to/client.properties
Expected: All topics and consumer groups that existed before and during DUAL-WRITE are still present. Output varies by deployment and includes your application topics alongside internal topics. For example:
# topics
<your-application-topics>
__consumer_offsets
_confluent-command
_schemas
...
# consumer groups
<your-consumer-groups>
...
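Comparing lists by eye is error-prone on clusters with many topics. One possible approach, assuming you saved the kafka-topics --list output to a file before triggering rollback, is to diff the two lists. The file names in the usage note and the missing_topics helper are illustrative assumptions.

```shell
# Print entries that appear in the "before" list but not in the "after" list.
# Arguments: <before-file> <after-file>, one topic (or consumer group) per line.
missing_topics() {
  before_sorted="$(mktemp)"
  after_sorted="$(mktemp)"
  # comm requires sorted input; sort -u also removes duplicate lines.
  sort -u "$1" > "$before_sorted"
  sort -u "$2" > "$after_sorted"
  # -23 suppresses lines unique to the second file and lines common to both.
  comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}
```

For example, missing_topics topics-before.txt topics-after.txt prints nothing when every topic survived the rollback. The same function works for consumer group lists.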
Tip
If Kafka pods are not ready or a rolling restart appears stuck after rollback, check spec.configOverrides.server on the Kafka CR. Fix any incorrect values and the roll resumes automatically.
Step 4.4: Check for errors
Scan the Kafka and operator logs for errors after rollback.
kubectl logs <kafka-pod-name> -n <namespace> --tail=50 | grep -E "ERROR|FATAL"
kubectl logs deployment/confluent-operator -n <operator-namespace> --tail=50 | grep -i error
Expected: No ERROR or FATAL messages.
# no output
Step 5: Clean up after rollback
After rollback completes, release the CR lock and remove the migration resources you no longer need.
Step 5.1: Release CR lock
Release the migration locks on the Kafka, KRaftController, and ZooKeeper CRs so you can modify them again:
kubectl confluent cluster kraft-migration release-lock --name <migration-job-name> -n <namespace>
For command details, see kubectl confluent cluster kraft-migration release-lock.
The command prompts for confirmation before proceeding:
This will release the CR lock on Kafka, KRaftController, and ZooKeeper resources.
Current phase: COMPLETE
Proceed? [y/N]: y
✓ CR lock release triggered successfully!
Annotation applied: platform.confluent.io/kraft-migration-release-cr-lock=true
Alternatively, apply the release lock annotation directly with kubectl:
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/kraft-migration-release-cr-lock=true \
--overwrite -n <namespace>
Verify locks are removed:
kubectl get kafka <kafka-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
Expected: No output, which confirms the locks are released.
# (no output)
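Note that grep exits with a non-zero status when it finds no match, so in a script the success case looks like a failure. A minimal sketch that inverts the result follows; cr_lock_released is a hypothetical helper name.

```shell
# Succeed only if the resource's YAML no longer mentions the migration CR lock.
# Arguments: <kind> <name> <namespace>
# Caveat: if kubectl itself fails (for example, the resource does not exist),
# grep also finds nothing, so confirm the resource exists before relying on this.
cr_lock_released() {
  if kubectl get "$1" "$2" -n "$3" -o yaml | grep -q kraft-migration-cr-lock; then
    echo "lock still present on $1/$2" >&2
    return 1
  fi
  echo "lock released on $1/$2"
}
```

For example, run cr_lock_released kafka <kafka-name> <namespace> and cr_lock_released kraftcontroller <kraftcontroller-name> <namespace>.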
Step 5.2: Delete migration job
Delete the KRaftMigrationJob CR once rollback is complete.
kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>
Expected:
kraftmigrationjob.platform.confluent.io "<migration-job-name>" deleted
Step 5.3: Delete KRaftController (optional)
Deleting the KRaftController removes the controller pods and frees their resources, but it discards the configuration you authored. Choose based on whether you plan to retry the migration:
| Option | When to use |
|---|---|
| Delete now | Rollback is final and you do not plan to retry, or you want to start from a clean state. |
| Keep the CR | You plan to retry migration soon. You can reuse the same |
To delete:
kubectl delete kraftcontroller <kraftcontroller-name> -n <namespace>
Expected:
kraftcontroller.platform.confluent.io "<kraftcontroller-name>" deleted