ZooKeeper to KRaft Migration
This topic describes how to migrate your Confluent Platform deployment from ZooKeeper to KRaft by using Confluent for Kubernetes. The migration supports Confluent Platform version 7.6 and later.
Tip
Review the complete end-to-end examples on GitHub: CFK Examples for KRaft Migration.
The repository includes examples for non-secured clusters, RBAC-enabled clusters, and multi-region clusters (MRC), with complete YAML files, commands, expected outputs, and troubleshooting guidance.
Before you begin
Ensure you meet the following requirements to start the migration process:
Confluent Platform 7.6 or later
# Check CP version
kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.status.version}'
Expected: 7.6.0 or later.
CFK 2.8.4 or later, 2.9.2 or later, 2.10.x, 2.11.x, or 3.0.x or later
# Check CFK version
helm list -n confluent
Expected: confluent-operator version 2.8.4 or later, 2.9.2 or later, 2.10.x, 2.11.x, or 3.0.x or later.
For CFK 3.0 or later, add the platform.confluent.io/use-log4j1: "true" annotation to KRaftController during migration. Add this annotation when creating the KRaftController CR in Step 2.2.
Verify Kubernetes webhooks are enabled:
kubectl get validatingwebhookconfigurations | grep confluent
Expected: Webhook configurations such as confluent-operator-validating-webhook-configuration.
Important
If you are migrating a deployment that does not have the webhooks enabled, make sure no other actor, for example, continuous integration and continuous delivery (CI/CD) tools such as GitOps or FluxCD, is updating or deleting ZooKeeper, Kafka, and KRaft resources while migration is in progress.
Step 1: Derive Kafka IBP version
CFK automatically derives the inter-broker protocol (IBP) version from standard Confluent images. For example, confluentinc/cp-server:7.6.0 uses IBP 3.6.
For custom images, manually specify the IBP version using an annotation, because an incorrect IBP version causes the migration to fail. Do not set the IBP in configOverrides; the migration process manages it automatically.
Confluent Platform version | IBP version |
|---|---|
7.9.x | 3.9 |
7.8.x | 3.8 |
7.7.x | 3.7 |
7.6.x | 3.6 |
7.5.x | 3.5 |
7.4.x | 3.4 |
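For scripted environments, the table above can be expressed as a small helper. This is a sketch only; the function name is not part of CFK or the migration tooling, and it covers only the versions listed in the table:

```shell
# ibp_for_cp_version: maps a Confluent Platform version (e.g. 7.6.0) to the
# IBP version from the table above. Sketch only -- not part of CFK itself.
ibp_for_cp_version() {
  case "$1" in
    7.9.*) echo "3.9" ;;
    7.8.*) echo "3.8" ;;
    7.7.*) echo "3.7" ;;
    7.6.*) echo "3.6" ;;
    7.5.*) echo "3.5" ;;
    7.4.*) echo "3.4" ;;
    *) echo "unsupported CP version: $1" >&2; return 1 ;;
  esac
}

# Example: annotate using the derived value (placeholders as in Step 1.2):
# kubectl annotate kafka <kafka-name> \
#   platform.confluent.io/kraft-migration-ibp-version="$(ibp_for_cp_version 7.6.0)" \
#   -n <namespace>
```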
Step 1.1: Check your Kafka image type
kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.image.application}'
For standard Confluent images, skip to Step 2.
For custom images, continue to the next step.
Step 1.2: Apply IBP annotation for custom images
Apply the IBP annotation matching your Confluent Platform version from the table above:
kubectl annotate kafka <kafka-name> \
platform.confluent.io/kraft-migration-ibp-version="<your-ibp-version>" \
-n <namespace>
The annotation is used by the migration job in Step 3.
Step 2: Create KRaftController CR
For MRC deployments, create and apply the KRaftController CR in each region. The migration does not automatically copy configurations from Kafka to KRaftController. You must explicitly configure KRaftController to match your existing Kafka setup.
Step 2.1: Export current Kafka CR for reference
Use your existing Kafka CR as a template. Most configurations are identical, such as TLS, authentication, authorization, RBAC, and custom JVM settings.
kubectl get kafka <kafka-name> -n <namespace> -o yaml > current-kafka-config.yaml
Step 2.2: Create KRaftController CR with required annotations
Create the kraftcontroller.yaml file.
apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
name: kraftcontroller
namespace: <namespace>
annotations:
platform.confluent.io/kraft-migration-hold-krc-creation: "true"
platform.confluent.io/use-log4j1: "true" # Required for CFK 3.0 or later
spec:
replicas: 3
image:
application: confluentinc/cp-server:<cp-version> # Match your Kafka version
init: confluentinc/confluent-init-container:<init-container-version>
dataVolumeCapacity: 10Gi
Required annotations
kraft-migration-hold-krc-creation: "true": Delays pod creation until the migration job modifies the CR.
use-log4j1: "true": Required for CFK 3.0 or later because the migration process requires Log4j 1 for compatibility with Confluent Platform 7.9.x. Remove this annotation after the migration completes. For details, see Step 6.3.
Step 2.3: Add security and configuration settings from your Kafka CR
Review your Kafka CR and add the following configurations to your KRaftController CR as needed.
RBAC configuration (if enabled on Kafka):
You can reuse an existing Kafka super user secret for secretRef: kraftcontroller-credential, or create a new secret. Ensure the principal is listed under spec.authorization.superUsers in both Kafka and KRaftController CRs.
spec:
authorization:
type: rbac
superUsers:
- User:kafka
- User:kraftcontroller
dependencies:
mdsKafkaCluster:
authentication:
type: plain
jaasConfig:
secretRef: kraftcontroller-credential # Create new or reuse existing super user credential
bootstrapEndpoint: kafka.confluent.svc.cluster.local:9071
tls:
enabled: true
Password encoder for cluster linking (if enabled on Kafka):
Extract the password encoder values. Note that Secret data values are base64-encoded; decode them with base64 -d before copying them into configOverrides:
kubectl get secret password-encoder-secret -n <namespace> -o yaml
Add to KRaftController:
spec:
configOverrides:
server:
- password.encoder.secret=<value-from-secret>
- password.encoder.old.secret=<old-value-if-present>
TLS and authentication (if configured on Kafka):
spec:
tls:
secretRef: tls-group1 # Same as Kafka
listeners:
controller:
authentication:
type: plain # Match Kafka's authentication type
jaasConfig:
secretRef: credential
tls:
enabled: true
Custom JVM settings (if configured on Kafka):
spec:
configOverrides:
jvm:
- -Xms4g
- -Xmx4g
Step 2.4: Apply KRaftController CR
kubectl apply -f kraftcontroller.yaml
Step 2.5: Verify KRaftController CR and pod state
Verify that the KRaftController status is HOLD:
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace>
Expected: STATUS: HOLD and REPLICAS: 0
Verify no pods are created:
kubectl get pods -n <namespace> -l app=kraftcontroller
Expected: No resources found in <namespace> namespace.
Troubleshoot: If the STATUS is not HOLD or pods are created, verify both the annotations (kraft-migration-hold-krc-creation and use-log4j1) are set to "true" in the CR YAML.
Step 3: Start migration
For MRC deployments, deploy the migration job in each region. The migration job locks ZooKeeper, Kafka, and KRaft CRs to prevent modifications. The lock requires CFK deployments with webhooks enabled.
The KRaftMigrationJob orchestrates the migration process, places locks on resources, manages phased migration, monitors progress, handles errors, and enables safe rollback during DUAL-WRITE.
Step 3.1: Create the KRaftMigrationJob CR
Create the kraftmigrationjob.yaml file.
apiVersion: platform.confluent.io/v1beta1
kind: KRaftMigrationJob
metadata:
name: <migration-job-name>
namespace: <namespace>
spec:
dependencies:
kafka:
name: <kafka-name>
namespace: <namespace>
zookeeper:
name: <zookeeper-name>
namespace: <namespace>
kRaftController:
name: <kraftcontroller-name>
namespace: <namespace>
Step 3.2: Apply KRaftMigrationJob CR
kubectl apply -f kraftmigrationjob.yaml
Step 3.3: Verify that migration has started
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
-o jsonpath='Phase: {.status.phase} | SubPhase: {.status.subPhase}{"\n"}'
Expected: Phase: SETUP | SubPhase: SubPhaseSetup...
Troubleshoot: If the migration job fails to start, verify the CR syntax and ensure all dependencies (Kafka, ZooKeeper, KRaftController) exist with exact name matches.
Step 4: Monitor migration
The migration progresses through the following phases: SETUP > MIGRATE > DUAL-WRITE > FINALIZE > COMPLETE.
Step 4.1: Monitor migration progress
Watch phase and subphase progression:
Terminal 1: Monitor the migration job status:
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -w
Expected: Real-time progression through phases with STATUS column showing current phase.
Terminal 2: Watch pods:
kubectl get pods -n <namespace> -w
Terminal 3: Stream operator logs:
kubectl logs -f deployment/confluent-operator -n confluent | grep -i migration
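Instead of watching interactively, you can poll the phase from a script. The following is a minimal sketch, not part of the migration tooling; it assumes only a POSIX shell, and the status command shown in the usage comment uses the same jsonpath field as the checks above:

```shell
# wait_for_phase: runs a status command repeatedly until it prints the
# desired phase, or fails after the given number of attempts.
# The status command is passed as a single string argument (sketch only).
wait_for_phase() {
  want="$1"; attempts="$2"; cmd="$3"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(eval "$cmd")"
    if [ "$phase" = "$want" ]; then
      echo "reached phase $want"
      return 0
    fi
    sleep "${WAIT_INTERVAL:-15}"
    i=$((i + 1))
  done
  echo "timed out waiting for phase $want (last saw: $phase)" >&2
  return 1
}

# Usage against the migration job (placeholders as elsewhere in this topic):
# wait_for_phase DUAL-WRITE 120 \
#   "kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o jsonpath='{.status.phase}'"
```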
Step 4.2: Check for errors if migration stalls
If a subphase takes longer than expected, check:
# Check migration job status
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o yaml | grep -A 20 status
# Check operator for errors
kubectl logs deployment/confluent-operator -n <namespace> --tail=100 | grep -i error
# Check pod status
kubectl get pods -n <namespace>
Expected: Current phase/subphase shown, no errors, all pods Running or in normal rolling restarts.
Step 5: Validate and finalize migration
The migration reaches the DUAL-WRITE phase where the cluster writes metadata to both ZooKeeper and KRaft. Validate the migration before finalizing, or check if rollback is needed.
Step 5.1: Verify DUAL-WRITE mode
Check the migration job status:
kubectl get kraftmigrationjob <migration-job-name> -n <namespace>
Expected: STATUS: DUAL-WRITE
Tip
DUAL-WRITE is a stable waiting state. The migration stays in DUAL-WRITE indefinitely until you manually trigger finalization.
Step 5.2: Validate cluster health before finalizing
Run these validation checks to ensure the system is healthy.
Verify all Kafka and KRaftController pods are running:
kubectl get pods -n <namespace> -l app=kafka
kubectl get pods -n <namespace> -l app=kraftcontroller
Expected: All pods in Running state.
Verify Kafka and KRaftController status:
kubectl get kafka <kafka-name> -n <namespace>
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace>
Expected: Both show STATUS: RUNNING.
Check for errors in logs:
kubectl logs deployment/confluent-operator -n <namespace> --since=24h | grep -i error
kubectl logs <kafka-pod-name> -n <namespace> --since=24h | grep -E "ERROR|FATAL"
kubectl logs <kraftcontroller-pod-name> -n <namespace> --since=24h | grep -E "ERROR|FATAL"
Expected: No ERROR or FATAL messages.
RBAC validation (if enabled)
Verify MDS endpoint is responding:
kubectl exec <kafka-pod-name> -n <namespace> -- curl -k -s -o /dev/null \
  -w "%{http_code}" https://<kafka-service>:8090/security/1.0/authenticate
Expected: 400, 401, or 200 (not 503 or a timeout).
Verify ACLs are accessible:
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-acls --list \
  --bootstrap-server <kafka-bootstrap-server>:9071 \
  --command-config /path/to/client.properties
Expected: ACL list displayed without errors.
Step 5.3: Roll back or finalize
If critical functionality is broken, performance is worse than expected, or you discover a KRaft incompatibility, you can roll back to ZooKeeper by following the steps in Rollback to ZooKeeper.
Warning
Finalizing the migration removes ZooKeeper dependency from Kafka, removes migration configuration from KRaftController, and transitions the cluster irreversibly to KRaft mode. You cannot roll back after this point.
When you have completed the validations above and are ready to proceed, apply the finalize annotation:
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/kraft-migration-trigger-finalize-to-kraft=true \
-n <namespace>
Step 5.4: Verify migration completion
Monitor the migration job until it reaches the COMPLETE phase:
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -w
Expected: The migration job progresses to COMPLETE phase.
For detailed phase descriptions, see ZooKeeper to KRaft Migration Phases and Sub-phases.
Step 6: Complete post-migration tasks
After the migration completes successfully, perform these tasks in order.
Step 6.1: Release migration locks
Manually release the migration locks applied in Step 3.
Apply the release lock annotation:
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/kraft-migration-release-cr-lock=true \
-n <namespace>
Verify locks are removed:
kubectl get kafka <kafka-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
Expected: No output, which means the locks were released successfully.
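Several checks in this topic expect empty output. If you script these verifications, a small helper makes the intent explicit. This is a sketch, not part of CFK; the usage comment shows the lock check from this step:

```shell
# assert_no_output: runs the given command and fails if it prints anything
# on stdout -- matching the "Expected: No output" checks in this topic.
assert_no_output() {
  out="$("$@")"
  if [ -n "$out" ]; then
    echo "unexpected output: $out" >&2
    return 1
  fi
  echo "clean"
}

# Usage, checking that the migration lock annotation is gone:
# assert_no_output sh -c \
#   'kubectl get kafka <kafka-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock'
```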
Important
Until you release the locks, you cannot modify Kafka or KRaftController configurations, scale resources, or apply upgrades.
Step 6.2: Download updated CRs (optional)
Download the updated CRs for backup or GitOps repository updates:
# Kafka CR
kubectl get kafka <kafka-name> -n <namespace> -o yaml > kafka-kraft-mode.yaml
# KRaftController CR
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml > kraftcontroller.yaml
# Optional: ZooKeeper backup before deletion
kubectl get zookeeper <zookeeper-name> -n <namespace> -o yaml > zookeeper-backup.yaml
Step 6.3: Remove Log4j1 annotation from KRaftController (CFK 3.0 or later)
If using CFK 3.0 or later, remove the platform.confluent.io/use-log4j1 annotation:
kubectl annotate kraftcontroller <kraftcontroller-name> \
platform.confluent.io/use-log4j1- \
-n <namespace>
Verify the annotation is removed:
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> \
-o jsonpath='{.metadata.annotations.platform\.confluent\.io/use-log4j1}'
Expected: Empty output.
Note
This triggers a KRaftController pod roll to apply Log4j 2 configuration, which is normal and safe after the migration completes. Skip this step if you are using CFK 2.x versions.
Step 6.4: Validate KRaft-only operation
Before deleting ZooKeeper, validate Kafka operates correctly in KRaft-only mode.
Verify Kafka has no ZooKeeper dependency:
kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'
Expected: Output shows only kRaftController dependencies with no zookeeper reference.
Verify Kafka is running without ZooKeeper errors:
kubectl get kafka <kafka-name> -n <namespace>
kubectl logs <kafka-pod-name> -n <namespace> --since=24h | grep -iE "zookeeper.*(error|failed|disconnect|timeout|expired)"
Expected: STATUS shows RUNNING, and no ZooKeeper connection errors in logs.
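The log check above can be wrapped as a reusable filter if you run it repeatedly before deleting ZooKeeper. A sketch using the same grep pattern; it reads log text on stdin:

```shell
# scan_zk_errors: reads broker log lines on stdin and fails if any
# ZooKeeper connection problems appear (same pattern as the grep above).
scan_zk_errors() {
  if grep -qiE "zookeeper.*(error|failed|disconnect|timeout|expired)"; then
    echo "ZooKeeper-related errors found in logs" >&2
    return 1
  fi
  echo "no ZooKeeper errors"
}

# Usage (pod name placeholder as above):
# kubectl logs <kafka-pod-name> -n <namespace> --since=24h | scan_zk_errors
```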
Step 6.5: Delete the ZooKeeper cluster
Warning
Delete ZooKeeper only after confirming:
Kafka has been stable in KRaft-only mode.
All validation tests pass.
No other Kafka clusters use this ZooKeeper.
You have backups of ZooKeeper data if needed.
Verify that no other Kafka clusters depend on this ZooKeeper:
kubectl get kafka --all-namespaces -o yaml | grep -A 5 "zookeeper:"
Expected: No output.
Delete ZooKeeper cluster:
kubectl delete zookeeper <zookeeper-name> -n <namespace>
Watch ZooKeeper pods terminate:
kubectl get pods -n <namespace> -l app=zookeeper -w
Verify Kafka remains operational:
kubectl get kafka <kafka-name> -n <namespace>
Expected: STATUS shows RUNNING.
Clean up ZooKeeper Persistent Volume Claims:
kubectl get pvc -n <namespace> | grep zookeeper
kubectl delete pvc <pvc-name> -n <namespace>
Warning
This action deletes ZooKeeper data permanently. Only delete PVCs if you no longer need the data.
Step 6.6: Clean up migration resources
This step is optional. If needed, save the final status for your records:
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o yaml > kmj-final-status.yaml
Delete the migration job:
kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>
Check for and delete migration-specific ConfigMaps or Secrets if they exist:
kubectl get configmaps -n <namespace> | grep migration
kubectl get secrets -n <namespace> | grep migration
Rollback to ZooKeeper
Rollback is only supported during DUAL-WRITE phase. After applying the finalize annotation or completing migration, rollback is not possible.
Step 1: Trigger rollback
Apply the rollback annotation:
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
-n <namespace>
Step 2: Remove nodes from ZooKeeper
In production environments with secured ZooKeeper, run these commands from inside a ZooKeeper pod.
Step 2.1: Wait for migration job to reach the correct phase
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
-o jsonpath='{.status.subPhase}'
Expected: SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk
Note
Add -zk-tls-config-file <path-to-zookeeper-client-properties> to the zookeeper-shell command only when TLS is enabled.
Step 2.2: Remove controller node
zookeeper-shell <zkhost:zkport> \
deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller
Step 2.3: Remove migration node
zookeeper-shell <zkhost:zkport> \
deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration
Troubleshoot: For NoAuthException or Failed to delete some node(s) errors, see Troubleshoot ZooKeeper to KRaft Migration Issues.
Step 3: Continue rollback process
Apply the continue annotation:
kubectl annotate kraftmigrationjob <migration-job-name> \
platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
--overwrite \
-n <namespace>
Step 4: Verify rollback completion
Step 4.1: Verify migration job status
kubectl get kraftmigrationjob <migration-job-name> -n <namespace>
Expected: STATUS: COMPLETE
Step 4.2: Verify Kafka is using ZooKeeper
kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'
Expected: Output shows zookeeper dependency.
Step 4.3: Verify data preservation
# List all topics
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-topics --list \
--bootstrap-server kafka:9071 \
--command-config /path/to/client.properties
# Check consumer groups
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-consumer-groups --list \
--bootstrap-server kafka:9071 \
--command-config /path/to/client.properties
Expected: All topics and consumer groups created during DUAL-WRITE are present.
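To make the preservation check mechanical, capture the topic and consumer group lists before triggering rollback and compare them afterwards. A sketch assuming POSIX sort and comm; the file names in the usage comment are placeholders you create yourself:

```shell
# compare_lists: reports entries present in the "before" file but missing
# from the "after" file (newline-delimited lists, e.g. topic names).
compare_lists() {
  sort "$1" > "$1.sorted"
  sort "$2" > "$2.sorted"
  missing="$(comm -23 "$1.sorted" "$2.sorted")"
  rm -f "$1.sorted" "$2.sorted"
  if [ -n "$missing" ]; then
    printf 'missing after rollback:\n%s\n' "$missing" >&2
    return 1
  fi
  echo "all entries preserved"
}

# Usage: capture topics before rollback and after, then compare:
# kubectl exec <kafka-pod-name> -n <namespace> -- kafka-topics --list \
#   --bootstrap-server kafka:9071 \
#   --command-config /path/to/client.properties > topics.after
# compare_lists topics.before topics.after
```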
Step 4.4: Check for errors
kubectl logs <kafka-pod-name> -n <namespace> --tail=50 | grep -E "ERROR|FATAL"
kubectl logs deployment/confluent-operator -n <namespace> --tail=50 | grep -i error
Expected: No ERROR or FATAL messages.
Step 5: Clean up after rollback
Step 5.1: Delete migration job
kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>
Step 5.2: Delete KRaftController (optional)
Warning
Delete the KRaftController only if you do not plan to retry the migration. If you plan to attempt the migration again, keep it.
kubectl delete kraftcontroller <kraftcontroller-name> -n <namespace>