Migrate a Multi-Region Cluster from ZooKeeper to KRaft
A multi-region cluster (MRC) is a Kafka deployment that spans more than one Kubernetes cluster or region, with one KRaftController quorum per region. This topic provides the complete sequence for migrating an MRC from ZooKeeper to KRaft, one region at a time, by using Confluent for Kubernetes. The per-region CRs and commands are identical to a single-cluster migration, so each step links to the matching single-cluster step for the detailed configuration. For the single-cluster procedure, see Migrate a Single Cluster from ZooKeeper to KRaft.
Important
Migrate, finalize, and roll back the regions strictly one at a time. This sequential procedure is the required approach for all multi-region clusters. Wait for each region to finish its broker rolls before you start the next region. Do not trigger more than one region at the same time.
Each region’s operator rolls brokers independently, with no coordination across regions. If you trigger more than one region at the same time, brokers that hold replicas of the same partition can restart together. This can make topics unavailable, even if the replication factor is high enough to survive one region restarting on its own.
You can migrate regions in parallel only if your cluster meets the redundancy conditions in Migrate regions in parallel (optional). If you are not sure, use this sequential procedure.
The following table lists the status that each region must reach before you move to the next region:
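| Stage | Wait for each region to reach |
|---|---|
| Migration | MIGRATE phase with the subphase SubPhaseMigrateMonitorMigrationProgress |
| Finalization | COMPLETE phase |
| Rollback | SubPhaseRollbackToZkComplete |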
Before you start, ensure you have met all the prerequisites in KRaft Migration Prerequisites.
Step 1: Prepare each region
In every region, complete the single-cluster preparation steps. These steps do not roll brokers, so you can complete them in all regions before you start the migration.
Configure the IBP version. For details, see Step 1.
Create and apply the KRaftController CR. Confirm the KRaftController reaches the HOLD state in each region. For details, see Step 2.
Set zookeeper.connect manually on the KRaftController CR and add the kraft-migration-bypass-prechecks annotation to the KRaftMigrationJob CR. This is required for MRC migrations. For details, see Step 3 and Multi-region cluster considerations.
Step 2: Migrate each region
Migrate the regions one at a time.
Apply the KRaftMigrationJob CR in the first region. For details, see Step 3.
Monitor the migration job until it reaches the MIGRATE phase with the subphase SubPhaseMigrateMonitorMigrationProgress. At this point, all broker rolls for this region are complete. For details, see Step 4.
Repeat the previous steps for each remaining region. Wait for the same status before you move to the next region.
When the final region completes its broker rolls, the KRaftController quorums across all regions detect that all voters have registered and the cluster transitions to the DUAL-WRITE phase.
Note
DUAL-WRITE is a cluster-wide state, not a per-region state. The last region to finish its broker rolls determines when the cluster reaches DUAL-WRITE.
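The one-region-at-a-time gate in Step 2 can be sketched as a small loop. This is a minimal illustration, not part of Confluent for Kubernetes: `trigger` and `get_status` are hypothetical callables that you would supply (for example, one that applies the KRaftMigrationJob CR and one that reads its phase and subphase), and the polling interval and timeout are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Status each region must reach before the next region starts (from this topic).
MIGRATION_GATE = ("MIGRATE", "SubPhaseMigrateMonitorMigrationProgress")


def migrate_sequentially(
    regions: Iterable[str],
    trigger: Callable[[str], None],                # hypothetical: starts one region
    get_status: Callable[[str], Tuple[str, str]],  # hypothetical: (phase, subphase)
    gate: Tuple[str, str] = MIGRATION_GATE,
    poll_seconds: float = 30.0,                    # assumed polling interval
    timeout_seconds: float = 3600.0,               # assumed per-region timeout
) -> List[str]:
    """Trigger one region at a time; start the next only after the gate is reached."""
    done: List[str] = []
    for region in regions:
        trigger(region)
        deadline = time.monotonic() + timeout_seconds
        # Block here so that no two regions roll brokers at the same time.
        while get_status(region) != gate:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{region} did not reach {gate}")
            time.sleep(poll_seconds)
        done.append(region)
    return done
```

The same loop shape applies to finalization and rollback if you pass a different `gate`, such as the COMPLETE phase or SubPhaseRollbackToZkComplete, in whatever form your status helper reports them.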
Step 3: Verify the cluster reaches DUAL-WRITE
Confirm that every region reports the DUAL-WRITE phase before you finalize. For details, see Step 5.1.
Then validate cluster health across all regions. For details, see Step 5.2.
Note
If validation fails, roll back instead of finalizing. For details, see Step 5. Rollback is supported during the SETUP, MIGRATE, and DUAL-WRITE phases.
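The decision in Step 3 can be expressed as a short check. This is a sketch under assumptions: the region names are placeholders, and the phase strings are written as they appear in this topic, which may differ from the exact values your status output uses.

```python
from typing import Dict

# Phases in which this topic says rollback is supported.
ROLLBACK_PHASES = {"SETUP", "MIGRATE", "DUAL-WRITE"}


def can_roll_back(phase: str) -> bool:
    """True if rollback is supported in the given phase."""
    return phase in ROLLBACK_PHASES


def next_action(region_phases: Dict[str, str], validation_ok: bool) -> str:
    """Decide what to do once migration has been triggered in every region."""
    # Every region must report DUAL-WRITE before finalization is considered.
    if not region_phases or any(p != "DUAL-WRITE" for p in region_phases.values()):
        return "wait"
    # All regions are in DUAL-WRITE: finalize only if validation passed.
    return "finalize" if validation_ok else "roll back"
```

For example, `next_action({"east": "DUAL-WRITE", "west": "MIGRATE"}, True)` returns `"wait"` because one region has not yet reported DUAL-WRITE.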
Step 4: Finalize each region
Finalize the regions one at a time.
Warning
Finalization is irreversible per region. After ZooKeeper is removed from a region’s brokers, that region cannot roll back to ZooKeeper mode.
Trigger finalization in the first region. This removes the ZooKeeper dependency from the Kafka brokers and the migration configuration from the KRaftController, each of which requires a roll. For details, see Step 5.3.
Wait for the region to reach the COMPLETE phase.
Repeat the previous steps for each remaining region. Wait for the COMPLETE phase before you move to the next region.
After all regions report COMPLETE, complete the post-migration tasks. For details, see Step 6.
Step 5: Roll back each region (if needed)
If you need to return to ZooKeeper, roll back the regions one at a time. Rollback involves multiple Kafka broker rolls per region, plus a manual step to delete ZooKeeper znodes. For the full rollback procedure, see Roll Back to ZooKeeper.
Trigger rollback in the first region. Monitor its status until it reaches SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk, which indicates that the first broker roll is complete and the job is waiting for manual intervention. For details, see Step 1.
Delete the /controller and /migration znodes from ZooKeeper as directed by the status condition, then apply the continue annotation to resume the rollback. For details, see Step 2 and Step 3. Wait for the region to reach SubPhaseRollbackToZkComplete.
Repeat the previous steps for each remaining region. Wait for SubPhaseRollbackToZkComplete before you move to the next region.
Migrate regions in parallel (optional)
If minimizing migration time is a priority and your cluster can handle multiple brokers restarting at the same time across regions, you can trigger all regions in parallel. Before you do, ensure that both of the following conditions are met:
- KRaft quorum availability
The KRaftController quorum must be large enough to keep a majority even when one controller per region restarts at the same time. A two-region deployment needs at least six KRaft controllers (three per region). A three-region deployment needs at least seven controllers. This keeps the quorum majority intact even with one controller down in every region at the same time.
- Topic availability
The replication factor for all topics must be greater than the number of regions. Set min.insync.replicas low enough that a partition keeps enough in-sync replicas to accept writes when one replica in each region is offline during its broker restart. For example, in a three-region deployment, RF >= 4 with min.insync.replicas <= RF - 3 ensures that one broker restarting per region does not cause topic unavailability.
If either condition is not met, use the sequential, one-region-at-a-time approach described in this topic.
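The two conditions for parallel migration reduce to simple arithmetic, which the following sketch makes explicit. The function names are illustrative, and the worst case assumed is the one described in this topic: one controller and one replica of a partition offline in every region at the same time. The sketch checks only that arithmetic and does not replace the minimum controller counts listed in the conditions.

```python
def quorum_survives(total_controllers: int, regions: int) -> bool:
    """True if a majority of controllers remains with one down in every region."""
    remaining = total_controllers - regions
    majority = total_controllers // 2 + 1
    return remaining >= majority


def topic_survives(replication_factor: int, min_insync_replicas: int, regions: int) -> bool:
    """True if RF exceeds the region count and writes survive one replica down per region."""
    return (
        replication_factor > regions
        and replication_factor - regions >= min_insync_replicas
    )


# Two regions with three controllers each: 4 of 6 remain, and the majority is 4.
assert quorum_survives(total_controllers=6, regions=2)
# Three regions with seven controllers: 4 of 7 remain, and the majority is 4.
assert quorum_survives(total_controllers=7, regions=3)
# Three regions with six controllers: only 3 of 6 remain, short of the majority of 4.
assert not quorum_survives(total_controllers=6, regions=3)

# Three regions, RF = 4, min.insync.replicas = 1 (RF - 3): writes continue.
assert topic_survives(replication_factor=4, min_insync_replicas=1, regions=3)
# Same topic with min.insync.replicas = 2: one in-sync replica is not enough.
assert not topic_survives(replication_factor=4, min_insync_replicas=2, regions=3)
```

If either function returns False for your deployment, the sequential procedure is the one to use.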