ZooKeeper to KRaft Migration

This topic describes how to migrate your Confluent Platform deployment from ZooKeeper to KRaft by using Confluent for Kubernetes. The migration supports Confluent Platform version 7.6 and later.

Tip

Review the complete end-to-end examples in the GitHub repository: CFK Examples for KRaft Migration.

The repository includes examples for non-secured clusters, RBAC-enabled clusters, and multi-region clusters (MRC), with complete YAML files, commands, expected outputs, and troubleshooting guidance.

Before you begin

Ensure you meet the following requirements to start the migration process:

  • Confluent Platform 7.6 or later

    # Check CP version
    kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.status.version}'
    

    Expected: 7.6.0 or later.

  • CFK 2.8.4 or later, 2.9.2 or later, 2.10.x, 2.11.x, or 3.0.x or later

    # Check CFK version
    helm list -n confluent
    

    Expected: confluent-operator version 2.8.4 or later, 2.9.2 or later, 2.10.x, 2.11.x, or 3.0.x or later

  • For CFK 3.0 or later, add the platform.confluent.io/use-log4j1: "true" annotation to KRaftController during migration. Add this annotation when creating the KRaftController CR in Step 2.2.

  • Verify Kubernetes webhooks are enabled:

    kubectl get validatingwebhookconfigurations | grep confluent
    

    Expected: Webhook configurations like confluent-operator-validating-webhook-configuration.

Important

If you are migrating a deployment that does not have webhooks enabled, make sure that no other actor, for example, continuous integration and continuous delivery (CI/CD) tooling such as GitOps controllers like FluxCD, is updating or deleting the ZooKeeper, Kafka, and KRaft resources while the migration is in progress.

Step 1: Derive Kafka IBP version

CFK automatically derives the inter-broker protocol (IBP) version from standard Confluent images. For example, confluentinc/cp-server:7.6.0 uses IBP 3.6.

For custom images, manually specify the IBP version using an annotation, because an incorrect IBP version causes the migration to fail. Do not set the IBP in configOverrides; the migration process manages this setting automatically.

Confluent Platform version | IBP version
---------------------------|------------
7.9.x                      | 3.9
7.8.x                      | 3.8
7.7.x                      | 3.7
7.6.x                      | 3.6
7.5.x                      | 3.5
7.4.x                      | 3.4
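For the releases in the table, the IBP minor version tracks the Confluent Platform minor version. The following sketch, which assumes a custom image tag ending in a standard <major>.<minor>.<patch> version and is valid only for the 7.4.x through 7.9.x range shown above, derives the annotation value:

```shell
# Derive the IBP version from a CP image tag (hypothetical example tag below).
# Valid only for CP 7.4.x through 7.9.x, where IBP minor == CP minor.
image="confluentinc/cp-server:7.6.0"
cp_version="${image##*:}"                       # strip repo/name -> 7.6.0
minor="${cp_version#*.}"; minor="${minor%%.*}"  # extract minor version -> 6
ibp="3.${minor}"
echo "$ibp"                                     # -> 3.6
```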

Step 1.1: Check your Kafka image type

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.image.application}'
  • For standard Confluent images, skip to Step 2.

  • For custom images, continue to the next step.

Step 1.2: Apply IBP annotation for custom images

Apply the IBP annotation matching your Confluent Platform version from the table above:

kubectl annotate kafka <kafka-name> \
  platform.confluent.io/kraft-migration-ibp-version="<your-ibp-version>" \
  -n <namespace>

The annotation is used by the migration job in Step 3.

Step 2: Create KRaftController CR

For MRC deployments, create and apply the KRaftController CR in each region. The migration does not automatically copy configurations from Kafka to KRaftController. You must explicitly configure KRaftController to match your existing Kafka setup.

Step 2.1: Export current Kafka CR for reference

Use your existing Kafka CR as a template. Most configurations are identical, such as TLS, authentication, authorization, RBAC, and custom JVM settings.

kubectl get kafka <kafka-name> -n <namespace> -o yaml > current-kafka-config.yaml

Step 2.2: Create KRaftController CR with required annotations

Create the kraftcontroller.yaml file.

apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
  name: kraftcontroller
  namespace: <namespace>
  annotations:
    platform.confluent.io/kraft-migration-hold-krc-creation: "true"
    platform.confluent.io/use-log4j1: "true"  # Required for CFK 3.0 or later
spec:
  replicas: 3
  image:
    application: confluentinc/cp-server:<cp-version>  # Match your Kafka version
    init: confluentinc/confluent-init-container:<init-container-version>
  dataVolumeCapacity: 10Gi

Required annotations

  • kraft-migration-hold-krc-creation: "true": Delays pod creation until the migration job modifies the CR.

  • use-log4j1: "true": Required for CFK 3.0 or later because the migration process requires Log4j 1 for compatibility with Confluent Platform 7.9.x. Remove this annotation after the migration completes. For details, see Step 6.3.

Step 2.3: Add security and configuration settings from your Kafka CR

Review your Kafka CR and add the following configurations to your KRaftController CR as needed.

RBAC configuration (if enabled on Kafka):

You can reuse an existing Kafka super user secret for secretRef: kraftcontroller-credential, or create a new secret. Ensure the principal is listed under spec.authorization.superUsers in both Kafka and KRaftController CRs.

spec:
  authorization:
    type: rbac
    superUsers:
      - User:kafka
      - User:kraftcontroller
  dependencies:
    mdsKafkaCluster:
      authentication:
        type: plain
        jaasConfig:
          secretRef: kraftcontroller-credential  # Create new or reuse existing super user credential
      bootstrapEndpoint: kafka.confluent.svc.cluster.local:9071
      tls:
        enabled: true
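If you create a new secret instead of reusing an existing one, the secret referenced by jaasConfig.secretRef is expected to contain a plain.txt key with username and password entries. A hypothetical example of the key's contents (values are placeholders, not from this deployment):

```
# plain.txt
username=kraftcontroller
password=<kraftcontroller-password>
```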

Password encoder for cluster linking (if enabled on Kafka):

Extract the password encoder values:

kubectl get secret password-encoder-secret -n <namespace> -o yaml

Add to KRaftController:

spec:
  configOverrides:
    server:
      - password.encoder.secret=<value-from-secret>
      - password.encoder.old.secret=<old-value-if-present>

TLS and authentication (if configured on Kafka):

spec:
  tls:
    secretRef: tls-group1  # Same as Kafka
  listeners:
    controller:
      authentication:
        type: plain  # Match Kafka's authentication type
        jaasConfig:
          secretRef: credential
      tls:
        enabled: true

Custom JVM settings (if configured on Kafka):

spec:
  configOverrides:
    jvm:
      - -Xms4g
      - -Xmx4g

Step 2.4: Apply KRaftController CR

kubectl apply -f kraftcontroller.yaml

Step 2.5: Verify KRaftController CR and pod state

Verify that the KRaftController status is HOLD:

kubectl get kraftcontroller <kraftcontroller-name> -n <namespace>

Expected: STATUS: HOLD and REPLICAS: 0

Verify no pods are created:

kubectl get pods -n <namespace> -l app=kraftcontroller

Expected: No resources found in <namespace> namespace.

Troubleshoot: If the STATUS is not HOLD or pods are created, verify both the annotations (kraft-migration-hold-krc-creation and use-log4j1) are set to "true" in the CR YAML.

Step 3: Start migration

For MRC deployments, deploy the migration job in each region. The migration job locks ZooKeeper, Kafka, and KRaft CRs to prevent modifications. The lock requires CFK deployments with webhooks enabled.

The KRaftMigrationJob orchestrates the migration process, places locks on resources, manages phased migration, monitors progress, handles errors, and enables safe rollback during DUAL-WRITE.

Step 3.1: Create the KRaftMigrationJob CR

Create the kraftmigrationjob.yaml file.

apiVersion: platform.confluent.io/v1beta1
kind: KRaftMigrationJob
metadata:
  name: <migration-job-name>
  namespace: <namespace>
spec:
  dependencies:
    kafka:
      name: <kafka-name>
      namespace: <namespace>
    zookeeper:
      name: <zookeeper-name>
      namespace: <namespace>
    kRaftController:
      name: <kraftcontroller-name>
      namespace: <namespace>

Step 3.2: Apply KRaftMigrationJob CR

kubectl apply -f kraftmigrationjob.yaml

Step 3.3: Verify that migration has started

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
  -o jsonpath='Phase: {.status.phase} | SubPhase: {.status.subPhase}{"\n"}'

Expected: Phase: SETUP | SubPhase: SubPhaseSetup...

Troubleshoot: If the migration job fails to start, verify the CR syntax and ensure that all dependencies (Kafka, ZooKeeper, KRaftController) exist with exactly matching names.

Step 4: Monitor migration

The migration progresses through the following phases: SETUP > MIGRATE > DUAL-WRITE > FINALIZE > COMPLETE.

Step 4.1: Monitor migration progress

Watch phase and subphase progression:

  • Terminal 1: Monitor the migration job status:

    kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -w
    

    Expected: Real-time progression through phases with STATUS column showing current phase.

  • Terminal 2: Watch pods with kubectl get pods -n <namespace> -w

  • Terminal 3: Stream operator logs with kubectl logs -f deployment/confluent-operator -n confluent | grep -i migration

Step 4.2: Check for errors if migration stalls

If a subphase takes longer than expected, check:

# Check migration job status
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o yaml | grep -A 20 status

# Check operator for errors
kubectl logs deployment/confluent-operator -n <namespace> --tail=100 | grep -i error

# Check pod status
kubectl get pods -n <namespace>

Expected: Current phase/subphase shown, no errors, all pods Running or in normal rolling restarts.

Step 5: Validate and finalize migration

The migration reaches the DUAL-WRITE phase, where the cluster writes metadata to both ZooKeeper and KRaft. Validate the migration before finalizing, or determine whether a rollback is needed.

Step 5.1: Verify DUAL-WRITE mode

Check the migration job status:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace>

Expected: STATUS: DUAL-WRITE

Tip

DUAL-WRITE is a stable waiting state. The migration stays in DUAL-WRITE indefinitely until you manually trigger finalization.

Step 5.2: Validate cluster health before finalizing

Run these validation checks to ensure the system is healthy.

  1. Verify all Kafka and KRaftController pods are running:

    kubectl get pods -n <namespace> -l app=kafka
    kubectl get pods -n <namespace> -l app=kraftcontroller
    

    Expected: All pods in Running state.

  2. Verify Kafka and KRaftController status:

    kubectl get kafka <kafka-name> -n <namespace>
    kubectl get kraftcontroller <kraftcontroller-name> -n <namespace>
    

    Expected: Both show STATUS: RUNNING.

  3. Check for errors in logs:

    kubectl logs deployment/confluent-operator -n <namespace> --since=24h | grep -i error
    kubectl logs <kafka-pod-name> -n <namespace> --since=24h | grep -E "ERROR|FATAL"
    kubectl logs <kraftcontroller-pod-name> -n <namespace> --since=24h | grep -E "ERROR|FATAL"
    

    Expected: No ERROR or FATAL messages.

RBAC validation (if enabled)

  1. Verify MDS endpoint is responding:

    kubectl exec <kafka-pod-name> -n <namespace> -- curl -k -s -o /dev/null \
      -w "%{http_code}" https://<kafka-service>:8090/security/1.0/authenticate
    

    Expected: 400, 401, or 200 (not 503 or timeout).

  2. Verify ACLs are accessible:

    kubectl exec <kafka-pod-name> -n <namespace> -- kafka-acls --list \
      --bootstrap-server <kafka-bootstrap-server>:9071 \
      --command-config /path/to/client.properties
    

    Expected: ACL list displayed without errors.
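The expected HTTP codes from the MDS endpoint check can be interpreted with a small guard like the following sketch. The code value is a hypothetical stand-in for the output of the curl command above:

```shell
# Interpret the HTTP status from the MDS endpoint check.
# 200/400/401 mean MDS answered; 503 or a timeout means it is unhealthy.
code="401"   # hypothetical; normally the output of the curl command above
case "$code" in
  200|400|401) verdict="MDS reachable" ;;
  *)           verdict="MDS unhealthy (HTTP $code)" ;;
esac
echo "$verdict"
```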

Step 5.3: Roll back or finalize

If critical functionality is broken, performance is worse than expected, or you discover a KRaft incompatibility, you can roll back to ZooKeeper. For the procedure, see Rollback to ZooKeeper.

Warning

Finalizing the migration removes ZooKeeper dependency from Kafka, removes migration configuration from KRaftController, and transitions the cluster irreversibly to KRaft mode. You cannot roll back after this point.

When you have completed the validation steps above and are ready to finalize, apply the finalize annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-trigger-finalize-to-kraft=true \
  -n <namespace>

Step 5.4: Verify migration completion

Monitor the migration job until it reaches the COMPLETE phase:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -w

Expected: The migration job progresses to COMPLETE phase.

For detailed phase descriptions, see ZooKeeper to KRaft Migration Phases and Sub-phases.

Step 6: Complete post-migration tasks

After the migration completes successfully, perform these tasks in order.

Step 6.1: Release migration locks

Manually release the migration locks applied in Step 3.

Apply the release lock annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-release-cr-lock=true \
  -n <namespace>

Verify locks are removed:

kubectl get kafka <kafka-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock

Expected: No output, which confirms that the locks were released successfully.

Important

If you do not release the locks, you cannot modify Kafka or KRaftController configurations, scale resources, or apply upgrades.

Step 6.2: Download updated CRs (optional)

Download the updated CRs for backup or GitOps repository updates:

# Kafka CR
kubectl get kafka <kafka-name> -n <namespace> -o yaml > kafka-kraft-mode.yaml

# KRaftController CR
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml > kraftcontroller.yaml

# Optional: ZooKeeper backup before deletion
kubectl get zookeeper <zookeeper-name> -n <namespace> -o yaml > zookeeper-backup.yaml

Step 6.3: Remove Log4j1 annotation from KRaftController (CFK 3.0 or later)

If using CFK 3.0 or later, remove the platform.confluent.io/use-log4j1 annotation:

kubectl annotate kraftcontroller <kraftcontroller-name> \
  platform.confluent.io/use-log4j1- \
  -n <namespace>

Verify the annotation is removed:

kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations.platform\.confluent\.io/use-log4j1}'

Expected: Empty output.

Note

This triggers a KRaftController pod roll to apply Log4j 2 configuration, which is normal and safe after the migration completes. Skip this step if you are using CFK 2.x versions.

Step 6.4: Validate KRaft-only operation

Before deleting ZooKeeper, validate Kafka operates correctly in KRaft-only mode.

Verify Kafka has no ZooKeeper dependency:

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'

Expected: Output shows only kRaftController dependencies with no zookeeper reference.

Verify Kafka is running without ZooKeeper errors:

kubectl get kafka <kafka-name> -n <namespace>
kubectl logs <kafka-pod-name> -n <namespace> --since=24h | grep -iE "zookeeper.*(error|failed|disconnect|timeout|expired)"

Expected: STATUS shows RUNNING, and no ZooKeeper connection errors in logs.
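The dependency check above can be turned into a fail-fast guard before deleting ZooKeeper. The deps value below is a hypothetical stand-in for the jsonpath output of the kubectl command in this step:

```shell
# Guard: refuse to proceed if the Kafka CR still references ZooKeeper.
# deps would normally come from:
#   kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'
deps='{"kRaftController":{"name":"kraftcontroller"}}'   # hypothetical output
case "$deps" in
  *zookeeper*) verdict="ZooKeeper dependency still present; do not delete ZooKeeper" ;;
  *)           verdict="KRaft-only: safe to proceed" ;;
esac
echo "$verdict"
```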

Step 6.5: Delete the ZooKeeper cluster

Warning

Delete ZooKeeper only after confirming:

  • Kafka has been stable in KRaft-only mode.

  • All validation tests pass.

  • No other Kafka clusters use this ZooKeeper.

  • You have backups of ZooKeeper data if needed.

Verify that no other Kafka clusters depend on this ZooKeeper:

kubectl get kafka --all-namespaces -o yaml | grep -A 5 "zookeeper:"

Expected: No output.

Delete ZooKeeper cluster:

kubectl delete zookeeper <zookeeper-name> -n <namespace>

Watch ZooKeeper pods terminate:

kubectl get pods -n <namespace> -l app=zookeeper -w

Verify Kafka remains operational:

kubectl get kafka <kafka-name> -n <namespace>

Expected: STATUS shows RUNNING.

Clean up ZooKeeper Persistent Volume Claims:

kubectl get pvc -n <namespace> | grep zookeeper
kubectl delete pvc <pvc-name> -n <namespace>

Warning

This action deletes ZooKeeper data permanently. Only delete PVCs if you no longer need the data.

Step 6.6: Clean up migration resources

This step is optional. If needed, save the final migration job status for your records:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o yaml > kmj-final-status.yaml

Delete the migration job:

kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>

Check for and delete migration-specific ConfigMaps or Secrets if they exist:

kubectl get configmaps -n <namespace> | grep migration
kubectl get secrets -n <namespace> | grep migration

Rollback to ZooKeeper

Rollback is only supported during DUAL-WRITE phase. After applying the finalize annotation or completing migration, rollback is not possible.
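Because rollback is valid only during DUAL-WRITE, it is worth checking the current phase before applying the rollback annotation. A sketch of such a guard, where the phase value is a hypothetical stand-in for the jsonpath output of the migration job's status.phase:

```shell
# Guard: only trigger rollback when the job is in the DUAL-WRITE phase.
# phase would normally come from:
#   kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
#     -o jsonpath='{.status.phase}'
phase="DUAL-WRITE"   # hypothetical value for illustration
if [ "$phase" = "DUAL-WRITE" ]; then
  verdict="rollback allowed"
else
  verdict="rollback not allowed in phase: $phase"
fi
echo "$verdict"
```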

Step 1: Trigger rollback

Apply the rollback annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
  -n <namespace>

Step 2: Remove nodes from ZooKeeper

In production environments with secured ZooKeeper, run these commands from inside a ZooKeeper pod.

Step 2.1: Wait for migration job to reach the correct phase

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
  -o jsonpath='{.status.subPhase}'

Expected: SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk

Note

Add -zk-tls-config-file <path-to-zookeeper-client-properties> to the zookeeper-shell command only when TLS is enabled.

Step 2.2: Remove controller node

zookeeper-shell <zkhost:zkport> \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller

Step 2.3: Remove migration node

zookeeper-shell <zkhost:zkport> \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration

Troubleshoot: For NoAuthException or Failed to delete some node(s) errors, see Troubleshoot ZooKeeper to KRaft Migration Issues.

Step 3: Continue rollback process

Apply the continue annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
  --overwrite \
  -n <namespace>

Step 4: Verify rollback completion

Step 4.1: Verify migration job status

kubectl get kraftmigrationjob <migration-job-name> -n <namespace>

Expected: STATUS: COMPLETE

Step 4.2: Verify Kafka is using ZooKeeper

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'

Expected: Output shows zookeeper dependency.

Step 4.3: Verify data preservation

# List all topics
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-topics --list \
  --bootstrap-server kafka:9071 \
  --command-config /path/to/client.properties

# Check consumer groups
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-consumer-groups --list \
  --bootstrap-server kafka:9071 \
  --command-config /path/to/client.properties

Expected: All topics and consumer groups created during DUAL-WRITE are present.
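If you captured the topic list before triggering rollback, you can diff the before and after lists mechanically. This sketch uses inline placeholder data; in practice, generate the two files with the kafka-topics command above, once before rollback and once after:

```shell
# Compare topic lists captured before and after rollback.
# Placeholder data; real lists come from the kafka-topics --list command.
printf 'orders\npayments\n' > topics-before.txt
printf 'orders\npayments\n' > topics-after.txt
if diff -q topics-before.txt topics-after.txt >/dev/null; then
  verdict="all topics preserved"
else
  verdict="topic lists differ"
fi
echo "$verdict"
```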

Step 4.4: Check for errors

kubectl logs <kafka-pod-name> -n <namespace> --tail=50 | grep -E "ERROR|FATAL"
kubectl logs deployment/confluent-operator -n <namespace> --tail=50 | grep -i error

Expected: No ERROR or FATAL messages.

Step 5: Clean up after rollback

Step 5.1: Delete migration job

kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>

Step 5.2: Delete KRaftController (optional)

Warning

Delete the KRaftController only if you do not plan to retry the migration. If you plan to attempt the migration again, keep it.

kubectl delete kraftcontroller <kraftcontroller-name> -n <namespace>