ZooKeeper to KRaft Migration Phases and Sub-phases

This topic is the concept reference for the ZooKeeper to KRaft migration. Use it to interpret the phase and sub-phase names that CFK reports, and to look up key concepts, such as dual-write mode and the point of no return, that the migration and rollback procedures reference.

Overview

The ZooKeeper to KRaft migration moves a Confluent Platform cluster from ZooKeeper-managed metadata to a KRaft-managed controller quorum without losing data. It runs as a state machine: the cluster passes through SETUP (preparation and validation), MIGRATE (initial metadata copy), and DUAL-WRITE (metadata mirrored to both stores). From DUAL-WRITE, you either finalize, which moves the migration to MoveToKRaftControllerOnly, or roll back, which moves it to RollbackToZk. A successful finalization ends in COMPLETE.
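The phase flow can be pictured as a small transition table. The following Python sketch is a hypothetical model built only from the phases named on this page. It includes the rollback edges from SETUP, MIGRATE, and DUAL-WRITE described under Point of no return. It is not CFK code. It leaves out FAILURE because this page does not say which phases lead to it.

```python
# Hypothetical model of the migration phase flow; not CFK source code.
# Phase names match the ones CFK reports. FAILURE is omitted because this
# page does not say which phases lead to it.
TRANSITIONS: dict[str, frozenset[str]] = {
    "SETUP": frozenset({"MIGRATE", "RollbackToZk"}),
    "MIGRATE": frozenset({"DUAL-WRITE", "RollbackToZk"}),
    "DUAL-WRITE": frozenset({"MoveToKRaftControllerOnly", "RollbackToZk"}),
    "MoveToKRaftControllerOnly": frozenset({"COMPLETE"}),
    "RollbackToZk": frozenset(),  # end of the rollback path in this sketch
    "COMPLETE": frozenset(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if this sketch allows a move from current to target."""
    return target in TRANSITIONS.get(current, frozenset())


def rollback_possible(phase: str) -> bool:
    """Rollback is reachable only from SETUP, MIGRATE, and DUAL-WRITE."""
    return can_transition(phase, "RollbackToZk")


def forward_path(start: str = "SETUP") -> list[str]:
    """Follow the finalization (non-rollback) path until no moves remain."""
    path = [start]
    while True:
        next_phases = sorted(TRANSITIONS[path[-1]] - {"RollbackToZk"})
        if not next_phases:
            return path
        path.append(next_phases[0])
```

In this model, forward_path() lists SETUP, MIGRATE, DUAL-WRITE, MoveToKRaftControllerOnly, and COMPLETE in order. rollback_possible("MoveToKRaftControllerOnly") is False, which matches the point of no return.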


ZooKeeper to KRaft migration phase flow diagram

Key concepts

These concepts inform how each phase behaves. The procedure and rollback pages link back to this section for definitions.

Dual-write mode

In the DUAL-WRITE phase, the cluster writes metadata changes to both ZooKeeper and the KRaftController quorum. The cluster remains fully serviceable for client traffic during this phase. DUAL-WRITE is a stable waiting state: the migration job stays here indefinitely, until you trigger finalization or rollback.
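To make dual-write mode concrete, the following Python sketch models it as a toy. Every metadata change is applied to two in-memory dictionaries that stand in for ZooKeeper and the KRaftController quorum. The class and method names are hypothetical. This is not how Kafka implements dual-write internally.

```python
from dataclasses import dataclass, field


@dataclass
class DualWriteModel:
    """Toy model of dual-write mode (hypothetical names, illustration only).

    The two dicts stand in for ZooKeeper and the KRaftController quorum.
    """

    zookeeper: dict[str, str] = field(default_factory=dict)
    kraft: dict[str, str] = field(default_factory=dict)

    def apply(self, key: str, value: str) -> None:
        # Mirror each metadata change so both stores hold the same value.
        self.kraft[key] = value
        self.zookeeper[key] = value

    def in_sync(self) -> bool:
        """Return True if both stores contain identical metadata."""
        return self.zookeeper == self.kraft
```

In this toy, in_sync() stays True after any sequence of apply() calls. This mirrors the idea that, during DUAL-WRITE, each metadata change reaches both stores.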

Point of no return

Rollback is only possible during SETUP, MIGRATE, and DUAL-WRITE. When you apply the platform.confluent.io/kraft-migration-trigger-finalize-to-kraft annotation, the migration job moves out of DUAL-WRITE and into MoveToKRaftControllerOnly. Once in MoveToKRaftControllerOnly, ZooKeeper metadata is no longer authoritative and the cluster cannot return to ZooKeeper.

FINALIZE is not a phase

FINALIZE is not a phase that the migration job reports. It is the action you trigger from DUAL-WRITE by applying the platform.confluent.io/kraft-migration-trigger-finalize-to-kraft annotation, which moves the migration into the MoveToKRaftControllerOnly phase. COMPLETE indicates that MoveToKRaftControllerOnly has finished successfully.

Data preservation

All topics, consumer groups, and partition data committed to Kafka during the migration are preserved across both finalization and rollback. Kafka keeps its log segments and partition data on disk regardless of which path you take from DUAL-WRITE.

CR locks

During migration, CFK locks the Kafka, ZooKeeper, and KRaftController CRs by adding the platform.confluent.io/kraft-migration-cr-lock=true annotation. Only the CFK operator service account can modify locked CRs; all other UPDATE and DELETE requests are denied. For the enforcement mechanisms (ValidatingAdmissionPolicy versus admission webhook), see CR lock enforcement.
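The lock rule can be written as a simple admission decision. The following Python sketch is hypothetical. The function name and its parameters are assumptions. So is the choice to admit operations other than UPDATE and DELETE. Real enforcement happens in Kubernetes admission control, not in Python.

```python
# Hypothetical sketch of the CR lock rule; names are illustrative only.
LOCK_ANNOTATION = "platform.confluent.io/kraft-migration-cr-lock"
GUARDED_OPERATIONS = {"UPDATE", "DELETE"}


def is_request_allowed(
    annotations: dict[str, str],
    operation: str,
    username: str,
    operator_username: str,
) -> bool:
    """Return True if a request against a CR should be admitted."""
    locked = annotations.get(LOCK_ANNOTATION) == "true"
    if not locked or operation not in GUARDED_OPERATIONS:
        # Unlocked CRs, and (in this sketch) other operations, pass through.
        return True
    # Only the CFK operator service account may modify a locked CR.
    return username == operator_username
```

Kubernetes identifies service accounts by usernames of the form system:serviceaccount:<namespace>:<name>. Any specific operator username you pass in, such as system:serviceaccount:confluent:confluent-for-kubernetes, is a hypothetical example.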

Migration phase: SETUP

SETUP prepares the cluster for migration. CFK validates configuration, derives the inter-broker protocol (IBP) version, locks the Kafka, ZooKeeper, and KRaftController CRs, and starts the KRaftController quorum in migration mode.

What to watch for: several SETUP sub-phases finish in under a second, so a single kubectl get might miss them. Use kubectl get kraftmigrationjob -w to see each sub-phase as it progresses.

When the migration job is in the SETUP phase, the log reports its status as Migration current status SETUP/<sub-phase>.
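If you script around the migration job, you might want to extract the phase and sub-phase from this log line. The following Python sketch assumes the line contains the exact text Migration current status followed by <phase>/<sub-phase>, as documented. It uses a search rather than a full-line match, so prefixes such as timestamps are tolerated. The function name is hypothetical. The same format applies to the MIGRATE and MoveToKRaftControllerOnly status lines described on this page.

```python
import re
from typing import Optional

# Assumes log lines contain "Migration current status <PHASE>/<SubPhase>".
STATUS_RE = re.compile(
    r"Migration current status (?P<phase>[A-Za-z-]+)/(?P<sub_phase>\w+)"
)


def parse_status(line: str) -> Optional[tuple[str, str]]:
    """Return (phase, sub_phase) found in a log line, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return match.group("phase"), match.group("sub_phase")
```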

CFK performs the following tasks at each sub-phase during the SETUP phase:

  1. SubPhaseSetupPreChecks: Validates configOverrides.server for blocklisted configuration keys that conflict with CFK-managed migration configurations.

  2. SubPhaseSetupDeriveIBPVersion: Derives the inter-broker protocol (IBP) version for migration. CFK automatically determines the IBP from the image tag for standard Confluent images. For custom images, CFK reads the platform.confluent.io/kraft-migration-ibp-version annotation from the Kafka CR. The migration job stores the derived IBP version in its status for use in subsequent phases.

  3. SubPhaseSetupAddMigrationAnnotation: Adds the platform.confluent.io/kraft-migration-cr-lock annotation to the ZooKeeper, Kafka, and KRaftController CRs.

  4. SubPhaseSetupCheckHealthyKafka: Verifies that Kafka is running and healthy.

  5. SubPhaseSetupCheckKafkaVersion: Verifies that the Kafka version is 7.6.0 or later.

  6. SubPhaseSetupEnsureKRaftControllerExists: Ensures that the KRaftController CR exists and is in the hold state. The platform.confluent.io/kraft-migration-hold-krc-creation annotation prevents KRaftController from starting until CFK removes the annotation later, in SubPhaseSetupMutateKRaftController.

  7. SubPhaseSetupTriggerIBPUpgrade: Upgrades inter.broker.protocol.version using the annotation-specified value for custom images or the operator-determined value for standard Confluent Platform images.

  8. SubPhaseSetupEnsureIBPUpgradeComplete: Waits for the Kafka roll, triggered by the inter.broker.protocol.version upgrade in the previous sub-phase, to complete.

  9. SubPhaseSetupFetchKafkaClusterId: Fetches the Kafka cluster ID from the Jolokia endpoint of the Kafka cluster.

  10. SubPhaseSetupFetchKafkaZookeeperEndpoint: Fetches the ZooKeeper endpoint from the Kafka CR status.

  11. SubPhaseSetupMutateKRaftController:

    • Removes the hold annotation, platform.confluent.io/kraft-migration-hold-krc-creation.

    • Supplies the Kafka cluster ID fetched in SubPhaseSetupFetchKafkaClusterId, so that the KRaftController quorum is bootstrapped with the same cluster ID as Kafka.

    • Adds the migration configuration.

  12. SubPhaseSetupKRaftControllerHealthy: Waits for KRaftController to become healthy.

  13. SubPhaseSetupCheckKraftControllerVersion: Verifies that the KRaftController version is 7.6 or later.

  14. SubPhaseSetupComplete: Marks the completion of the SETUP phase.
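Several SETUP sub-phases finish quickly, so it can help to translate a sub-phase name into a position in the sequence. The following Python sketch hard-codes the SETUP sub-phase order listed above. The helper name is hypothetical, and later CFK releases might add or rename sub-phases.

```python
# SETUP sub-phases in the order documented on this page. The list is copied
# from the documentation; CFK releases might add or rename sub-phases.
SETUP_SUB_PHASES = [
    "SubPhaseSetupPreChecks",
    "SubPhaseSetupDeriveIBPVersion",
    "SubPhaseSetupAddMigrationAnnotation",
    "SubPhaseSetupCheckHealthyKafka",
    "SubPhaseSetupCheckKafkaVersion",
    "SubPhaseSetupEnsureKRaftControllerExists",
    "SubPhaseSetupTriggerIBPUpgrade",
    "SubPhaseSetupEnsureIBPUpgradeComplete",
    "SubPhaseSetupFetchKafkaClusterId",
    "SubPhaseSetupFetchKafkaZookeeperEndpoint",
    "SubPhaseSetupMutateKRaftController",
    "SubPhaseSetupKRaftControllerHealthy",
    "SubPhaseSetupCheckKraftControllerVersion",
    "SubPhaseSetupComplete",
]


def setup_progress(sub_phase: str) -> str:
    """Describe where a SETUP sub-phase sits, for example 'step 4 of 14'."""
    if sub_phase not in SETUP_SUB_PHASES:
        raise ValueError(f"unknown SETUP sub-phase: {sub_phase}")
    step = SETUP_SUB_PHASES.index(sub_phase) + 1
    return f"step {step} of {len(SETUP_SUB_PHASES)}"
```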

Migration phase: MIGRATE

MIGRATE triggers Kafka to begin streaming existing metadata to the KRaftController quorum. The phase ends when CFK confirms the cluster has entered dual-write mode.

When the migration job is in the MIGRATE phase, the log reports its status as Migration current status MIGRATE/<sub-phase>.

CFK performs the following tasks at each sub-phase during the MIGRATE phase:

  1. SubPhaseMigrateTriggerKafkaMigration: Adds the following configurations to Kafka:

    • Migration configurations

    • A KRaftController reference

  2. SubPhaseMigrateEnsureKafkaRollComplete: Waits for the Kafka roll to complete.

  3. SubPhaseMigrateMonitorMigrationProgress: Monitors migration progress and checks whether the cluster has reached dual-write mode.

  4. SubPhaseMigrateDualWrite: Marks the completion of the MIGRATE phase.

Migration phase: DUAL-WRITE

DUAL-WRITE has no sub-phases. The cluster writes metadata to both ZooKeeper and the KRaftController quorum and remains fully serviceable for client traffic. The migration job waits here indefinitely for your input. For the data-handling guarantee, see Dual-write mode and Data preservation.

To exit DUAL-WRITE, apply one of the following annotations to the KRaftMigrationJob CR:

  • platform.confluent.io/kraft-migration-trigger-finalize-to-kraft moves the migration to MoveToKRaftControllerOnly. This is irreversible. For details, see Point of no return.

  • platform.confluent.io/kraft-migration-trigger-rollback-to-zk moves the migration to RollbackToZk.
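To make the choice explicit in a script, you could map each exit action to its annotation and to the phase it leads to. The following Python sketch only builds a kubectl annotate argument list; it does not run anything. Several details are assumptions: the annotation value true, the helper name, and the rule of refusing to build a command from a phase where this page says the action does not apply. Confirm the exact command in the migration and rollback procedures before you use it.

```python
from typing import NamedTuple


class ExitAction(NamedTuple):
    annotation: str
    target_phase: str
    allowed_from: frozenset[str]  # phases this sketch accepts the action in


ACTIONS = {
    "finalize": ExitAction(
        "platform.confluent.io/kraft-migration-trigger-finalize-to-kraft",
        "MoveToKRaftControllerOnly",
        frozenset({"DUAL-WRITE"}),
    ),
    "rollback": ExitAction(
        "platform.confluent.io/kraft-migration-trigger-rollback-to-zk",
        "RollbackToZk",
        frozenset({"SETUP", "MIGRATE", "DUAL-WRITE"}),
    ),
}


def build_annotate_argv(
    action: str, job: str, namespace: str, current_phase: str
) -> list[str]:
    """Return a kubectl annotate argv; the '=true' value is an assumption."""
    spec = ACTIONS[action]
    if current_phase not in spec.allowed_from:
        raise ValueError(f"cannot {action} from phase {current_phase}")
    return [
        "kubectl", "annotate", "kraftmigrationjob", job,
        f"{spec.annotation}=true", "-n", namespace,
    ]
```

When you are ready to apply the annotation, you could pass the returned list to subprocess.run(argv, check=True). The guard keeps a script from issuing the irreversible finalize annotation before the job reports DUAL-WRITE.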

Migration phase: MoveToKRaftControllerOnly

After you trigger finalization, CFK removes the ZooKeeper dependency from Kafka and reconfigures the KRaftController quorum to operate without migration metadata.

What to watch for: this phase can persist for several minutes, and for 30 minutes or longer in some clusters, before transitioning to COMPLETE. CFK does not emit intermediate status updates during this phase, so a kubectl get -w watch might appear idle. This is expected. Keep the watch running, or recheck the status manually with kubectl get kraftmigrationjob <migration-job-name> -n <namespace>.
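A scripted watch should not treat silence as failure during this phase. The following Python sketch polls a caller-supplied function until the job reports COMPLETE or FAILURE, or until a deadline passes. Several details are assumptions: the fetch_phase callable, the 45-minute default timeout, and the 30-second poll interval. In practice, fetch_phase might wrap kubectl get kraftmigrationjob <migration-job-name> -n <namespace>. The sleep and clock parameters let you test the loop without waiting.

```python
import time
from typing import Callable

TERMINAL_PHASES = {"COMPLETE", "FAILURE"}


def wait_for_terminal_phase(
    fetch_phase: Callable[[], str],
    timeout_s: float = 45 * 60,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the phase is terminal; an unchanged phase is not an error."""
    deadline = clock() + timeout_s
    while True:
        phase = fetch_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if clock() >= deadline:
            raise TimeoutError(f"still in {phase} after {timeout_s} seconds")
        sleep(interval_s)
```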

When the migration job is in the MoveToKRaftControllerOnly phase, the log reports its status as Migration current status MoveToKRaftControllerOnly/<sub-phase>.

The migration job performs the following tasks at each sub-phase during the MoveToKRaftControllerOnly phase:

  1. SubPhaseMoveToKRaftControllerTriggerZkRemoval: Removes the ZooKeeper dependency from Kafka.

  2. SubPhaseMoveToKRaftControllerEnsureKafkaRollIsComplete: Waits for the Kafka roll to finish.

  3. SubPhaseMoveToKRaftControllerTriggerKRaftControllerMigrationModeRemoval: Triggers removal of the migration configuration from the KRaftController quorum.

  4. SubPhaseMoveToKRaftControllerEnsureKRaftControllerMigrationModeRemovalComplete: Waits for the KRaftController roll to complete.

  5. SubPhaseMoveToKRaftControllerComplete: Marks the completion of the migration.
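This phase can look idle for a long time, so it can help to know which component is rolling when the log shows a given sub-phase. The following Python sketch maps each MoveToKRaftControllerOnly sub-phase listed above to the component its description names. The mapping is read off this page, and the helper is hypothetical.

```python
# Component acted on in each MoveToKRaftControllerOnly sub-phase, based on
# the descriptions on this page. "none" marks the final bookkeeping step.
MOVE_TO_KRAFT_STEPS = [
    ("SubPhaseMoveToKRaftControllerTriggerZkRemoval", "Kafka"),
    ("SubPhaseMoveToKRaftControllerEnsureKafkaRollIsComplete", "Kafka"),
    ("SubPhaseMoveToKRaftControllerTriggerKRaftControllerMigrationModeRemoval",
     "KRaftController"),
    ("SubPhaseMoveToKRaftControllerEnsureKRaftControllerMigrationModeRemovalComplete",
     "KRaftController"),
    ("SubPhaseMoveToKRaftControllerComplete", "none"),
]


def component_for(sub_phase: str) -> str:
    """Return which component's pods to watch during the given sub-phase."""
    for name, component in MOVE_TO_KRAFT_STEPS:
        if name == sub_phase:
            return component
    raise KeyError(sub_phase)
```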

Migration phase: RollbackToZk

RollbackToZk returns the cluster to ZooKeeper as the metadata store. Topics, consumer groups, and partition data are preserved (see Data preservation).

What to watch for: the rollback sub-phases include a manual step. After CFK stops Kafka from writing to the KRaftController, you must remove the controller and migration znodes from ZooKeeper before CFK can complete the rollback. For the procedure, see Roll Back to ZooKeeper.

To roll back to ZooKeeper, the migration job performs the following tasks at each sub-phase during the RollbackToZk phase:

  1. SubPhaseRollbackToZkMakeProcessRoleEmpty: Removes the process.roles property from Kafka.

  2. SubPhaseRollbackToZkEnsureKafkaRollIsComplete: Waits for the Kafka roll to complete.

  3. SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk: Waits for you to manually remove the controller znode from ZooKeeper. This sub-phase applies when rollback is triggered from the MIGRATE or DUAL-WRITE phase.

    After you remove the controller and migration znodes and trigger the node removal process, as described in steps 2 and 3 of Roll Back to ZooKeeper, the migration moves to the next applicable sub-phase.

  4. SubPhaseRollbackToZkFromSetupWaitForManualNodeRemovalFromZk: Waits for you to manually remove the controller znode from ZooKeeper. This sub-phase applies when rollback is triggered from the SETUP phase.

    After you remove the controller and migration znodes and trigger the node removal process, as described in steps 2 and 3 of Roll Back to ZooKeeper, the migration moves to the next applicable sub-phase.

  5. SubPhaseRollbackToZkRemoveKRaftControllerDepsInKafka: Removes the KRaftController dependency from Kafka.

  6. SubPhaseRollbackToZkEnsureKafkaRollIsComplete2: Waits for the Kafka roll to complete.

  7. SubPhaseRollbackToZkAddClusterMetadataCleanUpAnnotationInKafka: Adds the platform.confluent.io/format-cluster-metadata-in-kafka annotation, which directs the Kafka init container to remove the cluster metadata directory, /mnt/data/data0/logs/__cluster_metadata.

  8. SubPhaseRollbackToZkEnsureKafkaRollIsComplete3: Waits for the Kafka roll to complete.

  9. SubPhaseRollbackToZkRemoveClusterMetadataCleanUpAnnotationInKafka: Removes the platform.confluent.io/format-cluster-metadata-in-kafka annotation.

  10. SubPhaseRollbackToZkEnsureKafkaRollIsComplete4: Waits for the Kafka roll to complete.

  11. SubPhaseRollbackToZkComplete: Marks the completion of the rollback to ZooKeeper.
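Which manual-wait sub-phase applies depends on the phase you roll back from. The following Python sketch builds the expected rollback sequence for a given origin phase from the list above. It assumes that all the other sub-phases run in every case. That assumption is not confirmed by this page, and the helper name is hypothetical.

```python
# Rollback sub-phases as listed on this page. Only one of the two
# manual-wait sub-phases applies, depending on where rollback started.
COMMON_BEFORE = [
    "SubPhaseRollbackToZkMakeProcessRoleEmpty",
    "SubPhaseRollbackToZkEnsureKafkaRollIsComplete",
]
WAIT_FROM_MIGRATE_OR_DUAL_WRITE = "SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk"
WAIT_FROM_SETUP = "SubPhaseRollbackToZkFromSetupWaitForManualNodeRemovalFromZk"
COMMON_AFTER = [
    "SubPhaseRollbackToZkRemoveKRaftControllerDepsInKafka",
    "SubPhaseRollbackToZkEnsureKafkaRollIsComplete2",
    "SubPhaseRollbackToZkAddClusterMetadataCleanUpAnnotationInKafka",
    "SubPhaseRollbackToZkEnsureKafkaRollIsComplete3",
    "SubPhaseRollbackToZkRemoveClusterMetadataCleanUpAnnotationInKafka",
    "SubPhaseRollbackToZkEnsureKafkaRollIsComplete4",
    "SubPhaseRollbackToZkComplete",
]


def rollback_sub_phases(origin_phase: str) -> list[str]:
    """Return the expected rollback sequence for the phase rollback began in."""
    if origin_phase == "SETUP":
        wait = WAIT_FROM_SETUP
    elif origin_phase in ("MIGRATE", "DUAL-WRITE"):
        wait = WAIT_FROM_MIGRATE_OR_DUAL_WRITE
    else:
        raise ValueError(f"rollback is not possible from {origin_phase}")
    return COMMON_BEFORE + [wait] + COMMON_AFTER
```

The manual-wait entry in the returned list marks where the job pauses. It stays there until you remove the controller and migration znodes.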

Migration phase: COMPLETE

The ZooKeeper to KRaft migration ends in one of the following phases:

  • COMPLETE

    The final phase. It has no sub-phases.

  • FAILURE

    The migration enters this phase when it encounters a non-recoverable error.