Roll Back to ZooKeeper

Roll back a ZooKeeper to KRaft migration to return the cluster to ZooKeeper. Rollback is supported during the SETUP, MIGRATE, and DUAL-WRITE phases.

Warning

After you apply the finalize annotation (Step 5.3 of the main migration procedure), the cluster enters the MoveToKRaftControllerOnly phase, and rollback is no longer possible.

For the main migration procedure, see ZooKeeper to KRaft Migration.

Step 1: Trigger rollback

If you trigger rollback before KRaftController is started, znode removal is not needed and rollback completes automatically. After triggering rollback, skip to Step 4. This applies to both the plugin and kubectl paths below.

Using the CFK kubectl plugin, trigger a rollback of the migration to ZooKeeper:

kubectl confluent cluster kraft-migration rollback --name <migration-job-name> -n <namespace>

Monitor progress with the status command and proceed to znode removal when prompted. For command details, see kubectl confluent cluster kraft-migration rollback.

The command prompts for confirmation before proceeding:

WARNING: This will rollback the cluster from KRaft to ZooKeeper.
Current phase: DUAL-WRITE
Proceed? [y/N]: y
Triggering rollback...
  Annotation applied: platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true

✓ Rollback initiated successfully!

If you use kubectl instead of the plugin, apply the rollback annotation directly:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
  -n <namespace>

Expected:

kraftmigrationjob.platform.confluent.io/<migration-job-name> annotated
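For scripted rollbacks, the kubectl path can be wrapped in a small function that applies the annotation and then reads it back, so the script fails fast if the annotation did not land. This is a sketch under stated assumptions: the function name trigger_rollback is hypothetical, and reading the annotation back only confirms that it is set, not that the operator has started the rollback.

```shell
# Hypothetical helper (not part of CFK): apply the rollback annotation,
# then read it back to confirm it is set.
# Usage: trigger_rollback <migration-job-name> <namespace>
trigger_rollback() {
  local job="$1" ns="$2" value
  local annotation="platform.confluent.io/kraft-migration-trigger-rollback-to-zk"

  # Same annotate command as the kubectl path above.
  kubectl annotate kraftmigrationjob "$job" "${annotation}=true" \
    --overwrite -n "$ns" || return 1

  # Dots inside the annotation key must be escaped in JSONPath.
  value=$(kubectl get kraftmigrationjob "$job" -n "$ns" \
    -o jsonpath='{.metadata.annotations.platform\.confluent\.io/kraft-migration-trigger-rollback-to-zk}')

  if [ "$value" = "true" ]; then
    echo "Rollback annotation set on $job"
    return 0
  fi
  echo "Rollback annotation not found on $job" >&2
  return 1
}
```

For example, trigger_rollback kraftmigrationjob confluent. Progress still has to be monitored separately, because the annotation only requests the rollback.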

Step 2: Remove znodes from ZooKeeper

Remove the controller and migration znodes from ZooKeeper.

The zk-node-removal command interactively removes both znodes and applies the continue annotation in a single workflow. After running this command, skip to Step 4.

kubectl confluent cluster kraft-migration zk-node-removal --name <migration-job-name> -n <namespace>

If ZooKeeper has TLS enabled:

kubectl confluent cluster kraft-migration zk-node-removal --name <migration-job-name> -n <namespace> \
  --zk-tls-config-file <path-to-zookeeper-client-properties>

The plugin guides you through the znode cleanup in three steps:

  1. Deletes the controller znode from ZooKeeper.

  2. Deletes the migration znode from ZooKeeper.

  3. Applies the continue annotation to resume rollback.

The plugin automatically derives the ZooKeeper connection details from the KRaftMigrationJob status. For command details, see kubectl confluent cluster kraft-migration zk-node-removal.

The command prompts for confirmation before each step:

=== ZooKeeper Znode Cleanup for KRaft Migration Rollback ===
Migration Job: kraftmigrationjob (namespace: confluent)
Current Phase: RollbackToZk / SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk

Step 1/3: Delete controller znode from ZooKeeper
  Path to delete: /kafka-confluent/controller
Execute this command? [y/N]: y  ✓ Controller znode deleted

Step 2/3: Delete migration znode from ZooKeeper
  Path to delete: /kafka-confluent/migration
Execute this command? [y/N]: y  ✓ Migration znode deleted

Step 3/3: Apply continue annotation to resume rollback
  Annotation: platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true
Apply this annotation? [y/N]: y  ✓ Annotation applied

✓ ZooKeeper znode cleanup complete!
Rollback will now continue automatically.

If you use kubectl instead of the plugin, first wait for the migration job to reach a wait-for-manual-node-removal subphase. Re-run the following command every few seconds until the output shows one of the expected subphases:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
  -o jsonpath='{.status.subPhase}{"\n"}'

Expected:

  • SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk when rolling back from the MIGRATE or DUAL-WRITE phase.

  • SubPhaseRollbackToZkFromSetupWaitForManualNodeRemovalFromZk when rolling back from the SETUP phase.
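Instead of re-running the command by hand, a short polling loop can wait for either subphase. This is a sketch, not a plugin feature: the function name wait_for_znode_removal_subphase, the 5-second polling interval, and the 600-second default timeout are hypothetical choices.

```shell
# Hypothetical helper: poll the migration job until it reaches one of the
# two wait-for-manual-node-removal subphases, or give up after a timeout.
# Usage: wait_for_znode_removal_subphase <migration-job-name> <namespace> [timeout-seconds]
wait_for_znode_removal_subphase() {
  local job="$1" ns="$2" timeout="${3:-600}" interval=5 elapsed=0 sub
  while [ "$elapsed" -lt "$timeout" ]; do
    sub=$(kubectl get kraftmigrationjob "$job" -n "$ns" \
      -o jsonpath='{.status.subPhase}')
    case "$sub" in
      SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk|SubPhaseRollbackToZkFromSetupWaitForManualNodeRemovalFromZk)
        # Print the subphase that was reached and stop polling.
        echo "$sub"
        return 0
        ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for znode-removal subphase (last: ${sub:-none})" >&2
  return 1
}
```

The function prints the subphase it reached, which tells you whether the rollback started from SETUP or from MIGRATE/DUAL-WRITE.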

Exec into one of the ZooKeeper pods to run zookeeper-shell. The pod already contains everything the tool needs, such as the zookeeper-shell binary, the JAAS configuration, and the TLS truststore (if TLS is enabled).

kubectl exec -it <zookeeper-pod-name> -n <namespace> -- bash

Inside the pod, pass the ZooKeeper connection string as the first argument to zookeeper-shell: localhost:2181 for plaintext, or the secured port (typically 2182) for TLS. In the znode paths, replace <kafka-cr-name> and <kafka-cr-namespace> with the Kafka CR name and the namespace it runs in.

Remove the controller znode

Plaintext (default port 2181):

zookeeper-shell localhost:2181 \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller

TLS (default secured port 2182):

zookeeper-shell localhost:2182 \
  -zk-tls-config-file <path-to-zookeeper-client-properties> \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller

For example, if your Kafka CR is named kafka in the confluent namespace, the znode path is /kafka-confluent/controller.
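The path convention in this example can be captured in a small helper so that the controller and migration paths are always built the same way. This sketch assumes the path follows the /<kafka-cr-name>-<kafka-cr-namespace>/<znode> pattern shown above; the function name znode_path is hypothetical.

```shell
# Hypothetical helper: build a znode path from the Kafka CR name,
# its namespace, and the znode name (controller or migration),
# following the /<kafka-cr-name>-<kafka-cr-namespace>/<znode> pattern.
znode_path() {
  printf '/%s-%s/%s\n' "$1" "$2" "$3"
}

# Example: prints /kafka-confluent/controller, then /kafka-confluent/migration
znode_path kafka confluent controller
znode_path kafka confluent migration
```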

Note

The controller znode reappears immediately after deletion. This is expected: the znode is ephemeral, and Kafka recreates it for ZooKeeper-mode controller election. The deleteall command clears the controller state left over from the migration, which is the purpose of this step.

Remove the migration znode

Use the same zookeeper-shell connection as for the controller znode. With the earlier example, the path is /kafka-confluent/migration.

Plaintext (default port 2181):

zookeeper-shell localhost:2181 \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration

TLS (default secured port 2182):

zookeeper-shell localhost:2182 \
  -zk-tls-config-file <path-to-zookeeper-client-properties> \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration
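Because both deletions use the same connection and path prefix, they can be run back to back from one function inside the ZooKeeper pod. This is a sketch: the function name remove_migration_znodes is hypothetical, and it assumes zookeeper-shell exits non-zero when a deletion fails. Check the printed output for errors such as NoAuthException in any case.

```shell
# Hypothetical helper, run inside the ZooKeeper pod: delete the controller
# and migration znodes for one Kafka CR with zookeeper-shell.
# Usage: remove_migration_znodes <connect-string> <kafka-cr-name> <kafka-cr-namespace> [zk-tls-config-file]
remove_migration_znodes() {
  local connect="$1" cr="$2" crns="$3" tls_config="$4" znode
  for znode in controller migration; do
    echo "Deleting /${cr}-${crns}/${znode}"
    if [ -n "$tls_config" ]; then
      # TLS: same flag order as the TLS commands above.
      zookeeper-shell "$connect" -zk-tls-config-file "$tls_config" \
        deleteall "/${cr}-${crns}/${znode}" || return 1
    else
      # Plaintext connection.
      zookeeper-shell "$connect" deleteall "/${cr}-${crns}/${znode}" || return 1
    fi
  done
}
```

For example, remove_migration_znodes localhost:2181 kafka confluent for plaintext, or remove_migration_znodes localhost:2182 kafka confluent <path-to-zookeeper-client-properties> for TLS.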

After removing both znodes, proceed to Step 3 to apply the continue annotation.

Troubleshoot: If zookeeper-shell reports NoAuthException or "Failed to delete some node(s)", see Roll back errors.

Step 3: Apply the continue annotation

After removing the znodes, apply the continue annotation to resume the rollback.

If you used the plugin's zk-node-removal command in Step 2, skip this step: that command already applied the continue annotation. Continue to Step 4.

If you removed the znodes with zookeeper-shell, apply the continue annotation to resume rollback. Unlike the other migration annotations, this annotation does not use the kraft-migration-* prefix. It is scoped specifically to the post-znode-removal handoff, so the operator honors it only after the migration job enters a wait-for-manual-node-removal subphase.

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
  --overwrite \
  -n <namespace>

Expected:

kraftmigrationjob.platform.confluent.io/<migration-job-name> annotated
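Because the operator honors the continue annotation only in a wait-for-manual-node-removal subphase, a script can check the subphase before applying it. This is a sketch: the function name continue_after_znode_removal is hypothetical, and the suffix match *WaitForManualNodeRemovalFromZk is an assumption that covers the two subphases listed in Step 2.

```shell
# Hypothetical helper: apply the continue annotation only when the job is
# in a wait-for-manual-node-removal subphase, the only point at which the
# operator honors it.
# Usage: continue_after_znode_removal <migration-job-name> <namespace>
continue_after_znode_removal() {
  local job="$1" ns="$2" sub
  sub=$(kubectl get kraftmigrationjob "$job" -n "$ns" \
    -o jsonpath='{.status.subPhase}')
  case "$sub" in
    *WaitForManualNodeRemovalFromZk) ;;  # expected subphase: continue
    *)
      echo "Not applying: subphase is '${sub}'" >&2
      return 1
      ;;
  esac
  kubectl annotate kraftmigrationjob "$job" \
    platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
    --overwrite -n "$ns"
}
```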

Step 4: Verify rollback

After the migration job reports completion, confirm that the rollback succeeded: check the job status and the Kafka CR’s metadata-store dependency, verify that data created during DUAL-WRITE is still present, and scan the Kafka and operator logs for errors.

Step 4.1: Verify rollback completed

Confirm the migration job reports COMPLETE, which indicates the rollback finished.

kubectl get kraftmigrationjob <migration-job-name> -n <namespace>

Expected:

NAME                STATUS     AGE
kraftmigrationjob   COMPLETE   31m
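To wait for completion without polling by hand, a loop can read the STATUS column from the same command. This sketch assumes STATUS is the second column, as in the sample output above; the function name wait_for_rollback_complete, the 10-second interval, and the 1800-second default timeout are hypothetical.

```shell
# Hypothetical helper: wait until the STATUS column of
# "kubectl get kraftmigrationjob" reads COMPLETE.
# Usage: wait_for_rollback_complete <migration-job-name> <namespace> [timeout-seconds]
wait_for_rollback_complete() {
  local job="$1" ns="$2" timeout="${3:-1800}" interval=10 elapsed=0 status
  while [ "$elapsed" -lt "$timeout" ]; do
    # --no-headers drops the NAME/STATUS/AGE header; field 2 is STATUS.
    status=$(kubectl get kraftmigrationjob "$job" -n "$ns" --no-headers \
      | awk '{print $2}')
    if [ "$status" = "COMPLETE" ]; then
      echo "Rollback complete for $job"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out; last STATUS: ${status:-unknown}" >&2
  return 1
}
```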

Step 4.2: Verify Kafka is using ZooKeeper

Confirm the Kafka CR lists ZooKeeper as its metadata-store dependency.

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'

Expected: The output lists the zookeeper dependency. For example:

{"zookeeper":{"discovery":{"name":"<zookeeper-name>","namespace":"<namespace>"}}}
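For a pass/fail check in a script, the JSONPath can target the ZooKeeper discovery name directly. This sketch assumes the dependency uses the discovery form shown in the example output; if your Kafka CR references ZooKeeper differently, adjust the JSONPath. The function name kafka_uses_zookeeper is hypothetical.

```shell
# Hypothetical helper: succeed if the Kafka CR names a ZooKeeper
# discovery target under spec.dependencies.zookeeper.
# Usage: kafka_uses_zookeeper <kafka-name> <namespace>
kafka_uses_zookeeper() {
  local name
  name=$(kubectl get kafka "$1" -n "$2" \
    -o jsonpath='{.spec.dependencies.zookeeper.discovery.name}')
  if [ -n "$name" ]; then
    echo "Kafka $1 depends on ZooKeeper $name"
    return 0
  fi
  echo "Kafka $1 has no zookeeper dependency" >&2
  return 1
}
```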

Step 4.3: Verify data preservation

Confirm that topics and consumer groups created during DUAL-WRITE are still present after rollback.

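Port 9071 is the CFK default for internal Kafka listeners. If your Kafka CR exposes Kafka on a different port, substitute it. The --command-config path refers to a client properties file inside the Kafka pod; omit the flag if the listener requires neither authentication nor TLS.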

# List all topics
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-topics --list \
  --bootstrap-server <kafka-bootstrap-service>:9071 \
  --command-config /path/to/client.properties

# Check consumer groups
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-consumer-groups --list \
  --bootstrap-server <kafka-bootstrap-service>:9071 \
  --command-config /path/to/client.properties

Expected: All topics and consumer groups that existed before and during DUAL-WRITE are still present. Output varies by deployment and includes your application topics alongside internal topics. For example:

# topics
<your-application-topics>
__consumer_offsets
_confluent-command
_schemas
...

# consumer groups
<your-consumer-groups>
...
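Eyeballing long topic lists is error-prone. If you saved the output of the same kafka-topics --list command to a file before triggering rollback, you can diff it against the current list. This is a sketch: the function name missing_topics and the files it reads are hypothetical, and it only works if you captured the "before" list yourself.

```shell
# Hypothetical helper: print topics listed in a "before" file that are
# missing from an "after" file (one topic per line in each file).
# Usage: missing_topics <before-file> <after-file>
missing_topics() {
  local before_sorted after_sorted
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  # comm requires both inputs sorted with the same collation.
  sort -u "$1" > "$before_sorted"
  sort -u "$2" > "$after_sorted"
  # comm -23 keeps lines that appear only in the first file.
  comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}
```

Empty output means every topic from the earlier list is still present. The same function works for consumer group lists.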

Tip

If Kafka pods are not ready or a rolling restart appears stuck after rollback, check spec.configOverrides.server on the Kafka CR. After you fix any incorrect values, the rolling restart resumes automatically.

Step 4.4: Check for errors

Scan the Kafka and operator logs for errors after rollback.

kubectl logs <kafka-pod-name> -n <namespace> --tail=50 | grep -E "ERROR|FATAL"
kubectl logs deployment/confluent-operator -n <operator-namespace> --tail=50 | grep -i error

Expected: No ERROR or FATAL messages.

# no output
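The two log checks can be combined into one function that reports match counts and returns non-zero when either log contains errors, which is convenient in a post-rollback script. This is a sketch: the function name check_rollback_logs is hypothetical, and it uses the same --tail=50 window and grep patterns as the commands above.

```shell
# Hypothetical helper: count ERROR/FATAL lines in the last 50 Kafka log
# lines and "error" lines (case-insensitive) in the last 50 operator log
# lines; return non-zero if either count is above zero.
# Usage: check_rollback_logs <kafka-pod-name> <namespace> <operator-namespace>
check_rollback_logs() {
  local kafka_errors operator_errors
  # grep -c prints 0 when nothing matches.
  kafka_errors=$(kubectl logs "$1" -n "$2" --tail=50 | grep -cE "ERROR|FATAL")
  operator_errors=$(kubectl logs deployment/confluent-operator -n "$3" --tail=50 \
    | grep -ci error)
  echo "Kafka ERROR/FATAL lines: $kafka_errors; operator error lines: $operator_errors"
  [ "$kafka_errors" -eq 0 ] && [ "$operator_errors" -eq 0 ]
}
```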

Step 5: Clean up after rollback

After rollback completes, release the CR lock and remove the migration resources you no longer need.

Step 5.1: Release CR lock

Release the migration lock so you can modify the Kafka, KRaftController, and ZooKeeper CRs again.

Using the CFK kubectl plugin, release the migration locks on the Kafka, KRaftController, and ZooKeeper CRs:

kubectl confluent cluster kraft-migration release-lock --name <migration-job-name> -n <namespace>

For command details, see kubectl confluent cluster kraft-migration release-lock.

The command prompts for confirmation before proceeding:

This will release the CR lock on Kafka, KRaftController, and ZooKeeper resources.
Current phase: COMPLETE
Proceed? [y/N]: y
✓ CR lock release triggered successfully!
  Annotation applied: platform.confluent.io/kraft-migration-release-cr-lock=true

If you use kubectl instead of the plugin, apply the release lock annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-release-cr-lock=true \
  --overwrite -n <namespace>

Verify locks are removed:

kubectl get kafka <kafka-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock

Expected: No output, which confirms the locks are released.

# (no output)
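The per-resource checks can be looped over every locked CR, including the ZooKeeper CR, which the plugin path also unlocks. This is a sketch: the function name check_cr_locks_released is hypothetical, and it reuses the same kraft-migration-cr-lock grep as the commands above.

```shell
# Hypothetical helper: report any of the given resources (kind/name)
# that still carry a kraft-migration-cr-lock annotation.
# Usage: check_cr_locks_released <namespace> <kind/name>...
check_cr_locks_released() {
  local ns="$1" res locked=0
  shift
  for res in "$@"; do
    if kubectl get "$res" -n "$ns" -o yaml | grep -q kraft-migration-cr-lock; then
      echo "Still locked: $res" >&2
      locked=1
    fi
  done
  # Succeeds (and prints a summary) only if no resource is still locked.
  [ "$locked" -eq 0 ] && echo "All locks released"
}
```

For example, check_cr_locks_released <namespace> kafka/<kafka-name> kraftcontroller/<kraftcontroller-name> zookeeper/<zookeeper-name>.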

Step 5.2: Delete migration job

Delete the KRaftMigrationJob CR once rollback is complete.

kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>

Expected:

kraftmigrationjob.platform.confluent.io "<migration-job-name>" deleted

Step 5.3: Delete KRaftController (optional)

Deleting the KRaftController removes the controller pods and frees their resources, but it discards the configuration you authored. Choose based on whether you plan to retry the migration:

  • Delete now: Use when rollback is final and you do not plan to retry, or when you want to start from a clean state.

  • Keep the CR: Use when you plan to retry migration soon. You can reuse the same KRaftController CR by reapplying the platform.confluent.io/kraft-migration-hold-krc-creation annotation and creating a new KRaftMigrationJob.

To delete:

kubectl delete kraftcontroller <kraftcontroller-name> -n <namespace>

Expected:

kraftcontroller.platform.confluent.io "<kraftcontroller-name>" deleted