ZooKeeper to KRaft Migration

This topic describes how to migrate your Confluent Platform deployment from ZooKeeper to KRaft by using Confluent for Kubernetes. The migration supports Confluent Platform version 7.6 and later.

Tip

Review the complete end-to-end examples in the GitHub repository: CFK Examples for KRaft Migration.

The repository includes examples for non-secured clusters, RBAC-enabled clusters, and multi-region clusters (MRC), with complete YAML files, commands, expected outputs, and troubleshooting guidance.

Before you begin

Ensure you meet the following requirements to start the migration process:

  • Confluent Platform 7.6 or later

    # Check CP version
    kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.status.version}'
    

    Expected: 7.6.0 or later.

  • CFK 2.8.4 or later, 2.9.2 or later, 2.10.x, 2.11.x, or 3.0.x or later

    # Check CFK version
    helm list -n confluent
    

    Expected: confluent-operator version 2.8.4 or later, 2.9.2 or later, 2.10.x, 2.11.x, or 3.0.x or later

  • For CFK 3.0 or later, add the platform.confluent.io/use-log4j1: "true" annotation to KRaftController during migration. Add this annotation when creating the KRaftController CR in Step 2.2.

  • Verify Kubernetes webhooks are enabled:

    kubectl get validatingwebhookconfigurations | grep confluent
    

    Expected: Webhook configurations like confluent-operator-validating-webhook-configuration.

Important

If you are migrating a deployment that does not have webhooks enabled, make sure that no other actor, for example, continuous integration and continuous delivery (CI/CD) tooling such as GitOps controllers like FluxCD, is updating or deleting the ZooKeeper, Kafka, and KRaft resources while the migration is in progress.

Step 1: Derive Kafka IBP version

CFK automatically derives the inter-broker protocol (IBP) version from standard Confluent images. For example, confluentinc/cp-server:7.6.0 uses IBP 3.6.

For custom images, manually specify the IBP version using an annotation, because an incorrect IBP version causes the migration to fail. Do not set the IBP in configOverrides; the migration process manages this setting automatically.

Confluent Platform version | IBP version
---------------------------|------------
7.9.x                      | 3.9
7.8.x                      | 3.8
7.7.x                      | 3.7
7.6.x                      | 3.6
7.5.x                      | 3.5
7.4.x                      | 3.4
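For the releases in the table, the IBP minor version tracks the Confluent Platform minor version. The following sketch, which assumes a custom image tag ending in a standard <major>.<minor>.<patch> version and is valid only for the 7.4.x through 7.9.x range shown above, derives the annotation value:

```shell
# Derive the IBP version from a CP image tag (hypothetical example tag below).
# Valid only for CP 7.4.x through 7.9.x, where IBP minor == CP minor.
image="confluentinc/cp-server:7.6.0"
cp_version="${image##*:}"                       # strip repo/name -> 7.6.0
minor="${cp_version#*.}"; minor="${minor%%.*}"  # extract minor version -> 6
ibp="3.${minor}"
echo "$ibp"                                     # -> 3.6
```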

Step 1.1: Check your Kafka image type

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.image.application}'
  • For standard Confluent images, skip to Step 2.

  • For custom images, continue to the next step.

Step 1.2: Apply IBP annotation for custom images

Apply the IBP annotation matching your Confluent Platform version from the table above:

kubectl annotate kafka <kafka-name> \
  platform.confluent.io/kraft-migration-ibp-version="<your-ibp-version>" \
  -n <namespace>

The annotation is used by the migration job in Step 3.

Step 2: Create KRaftController CR

For MRC deployments, create and apply the KRaftController CR in each region. The migration does not automatically copy configurations from Kafka to KRaftController. You must explicitly configure KRaftController to match your existing Kafka setup.

Step 2.1: Export current Kafka CR for reference

Use your existing Kafka CR as a template. Most configurations are identical, such as TLS, authentication, authorization, RBAC, and custom JVM settings.

kubectl get kafka <kafka-name> -n <namespace> -o yaml > current-kafka-config.yaml

Step 2.2: Create KRaftController CR with required annotations

Create the kraftcontroller.yaml file.

apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
  name: kraftcontroller
  namespace: <namespace>
  annotations:
    platform.confluent.io/kraft-migration-hold-krc-creation: "true"
    platform.confluent.io/use-log4j1: "true"  # Required for CFK 3.0 or later
spec:
  replicas: 3
  image:
    application: confluentinc/cp-server:<cp-version>  # Match your Kafka version
    init: confluentinc/confluent-init-container:<init-container-version>
  dataVolumeCapacity: 10Gi

Required annotations

  • kraft-migration-hold-krc-creation: "true": Delays pod creation until the migration job modifies the CR.

  • use-log4j1: "true": Required for CFK 3.0 or later because the migration process requires Log4j 1 for compatibility with Confluent Platform 7.9.x. Remove this annotation after the migration completes. For details, see Step 6.3.

Step 2.3: Add security and configuration settings from your Kafka CR

Review your Kafka CR and add the following configurations to your KRaftController CR as needed.

RBAC configuration (if enabled on Kafka):

You can reuse an existing Kafka super user secret for secretRef: kraftcontroller-credential, or create a new secret. Ensure the principal is listed under spec.authorization.superUsers in both Kafka and KRaftController CRs.

spec:
  authorization:
    type: rbac
    superUsers:
      - User:kafka
      - User:kraftcontroller
  dependencies:
    mdsKafkaCluster:
      authentication:
        type: plain
        jaasConfig:
          secretRef: kraftcontroller-credential  # Create new or reuse existing super user credential
      bootstrapEndpoint: kafka.confluent.svc.cluster.local:9071
      tls:
        enabled: true
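If you create a new secret instead of reusing an existing one, the secret referenced by jaasConfig.secretRef is expected to contain a plain.txt key with username and password entries. A hypothetical example of the key's contents (values are placeholders, not from this deployment):

```
# plain.txt
username=kraftcontroller
password=<kraftcontroller-password>
```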

Password encoder for cluster linking (if enabled on Kafka):

Extract the password encoder values:

kubectl get secret password-encoder-secret -n <namespace> -o yaml

Add to KRaftController:

spec:
  configOverrides:
    server:
      - password.encoder.secret=<value-from-secret>
      - password.encoder.old.secret=<old-value-if-present>

TLS and authentication (if configured on Kafka):

spec:
  tls:
    secretRef: tls-group1  # Same as Kafka
  listeners:
    controller:
      authentication:
        type: plain  # Match Kafka's authentication type
        jaasConfig:
          secretRef: credential
      tls:
        enabled: true

Custom JVM settings (if configured on Kafka):

spec:
  configOverrides:
    jvm:
      - -Xms4g
      - -Xmx4g

Step 2.4: Apply KRaftController CR

kubectl apply -f kraftcontroller.yaml

Step 2.5: Verify KRaftController CR and pod state

Verify that the KRaftController status is HOLD:

kubectl get kraftcontroller <kraftcontroller-name> -n <namespace>

Expected: STATUS: HOLD and REPLICAS: 0

Verify no pods are created:

kubectl get pods -n <namespace> -l app=kraftcontroller

Expected: No resources found in <namespace> namespace.

Troubleshoot: If the STATUS is not HOLD or pods are created, verify both the annotations (kraft-migration-hold-krc-creation and use-log4j1) are set to "true" in the CR YAML.

Step 3: Start migration

For MRC deployments, deploy the migration job in each region. The migration job locks ZooKeeper, Kafka, and KRaft CRs to prevent modifications. The lock requires CFK deployments with webhooks enabled.

The KRaftMigrationJob orchestrates the migration process, places locks on resources, manages phased migration, monitors progress, handles errors, and enables safe rollback during DUAL-WRITE.

Step 3.1: Create the KRaftMigrationJob CR

Create the kraftmigrationjob.yaml file.

apiVersion: platform.confluent.io/v1beta1
kind: KRaftMigrationJob
metadata:
  name: <migration-job-name>
  namespace: <namespace>
spec:
  dependencies:
    kafka:
      name: <kafka-name>
      namespace: <namespace>
    zookeeper:
      name: <zookeeper-name>
      namespace: <namespace>
    kRaftController:
      name: <kraftcontroller-name>
      namespace: <namespace>

Step 3.2: Apply KRaftMigrationJob CR

kubectl apply -f kraftmigrationjob.yaml

Step 3.3: Verify that migration has started

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
  -o jsonpath='Phase: {.status.phase} | SubPhase: {.status.subPhase}{"\n"}'

Expected: Phase: SETUP | SubPhase: SubPhaseSetup...

Troubleshoot: If the migration job fails to start, verify the CR syntax and ensure that all dependencies (Kafka, ZooKeeper, KRaftController) exist with exactly matching names.

Step 4: Monitor migration

The migration progresses through the following phases: SETUP > MIGRATE > DUAL-WRITE > FINALIZE > COMPLETE.

Step 4.1: Monitor migration progress

Watch phase and subphase progression:

  • Terminal 1: Monitor the migration job status:

    kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -w
    

    Expected: Real-time progression through phases with STATUS column showing current phase.

  • Terminal 2: Watch pods with kubectl get pods -n <namespace> -w

  • Terminal 3: Stream operator logs with kubectl logs -f deployment/confluent-operator -n confluent | grep -i migration

Step 4.2: Check for errors if migration stalls

If a subphase takes longer than expected, check:

# Check migration job status
kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o yaml | grep -A 20 status

# Check operator for errors
kubectl logs deployment/confluent-operator -n <namespace> --tail=100 | grep -i error

# Check pod status
kubectl get pods -n <namespace>

Expected: Current phase/subphase shown, no errors, all pods Running or in normal rolling restarts.

Step 5: Validate and finalize migration

The migration reaches the DUAL-WRITE phase, where the cluster writes metadata to both ZooKeeper and KRaft. Validate the migration before finalizing, or determine whether a rollback is needed.

Step 5.1: Verify DUAL-WRITE mode

Check the migration job status:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace>

Expected: STATUS: DUAL-WRITE

Tip

DUAL-WRITE is a stable waiting state. The migration stays in DUAL-WRITE indefinitely until you manually trigger finalization.

Step 5.2: Validate cluster health before finalizing

Run these validation checks to ensure the system is healthy.

  1. Verify all Kafka and KRaftController pods are running:

    kubectl get pods -n <namespace> -l app=kafka
    kubectl get pods -n <namespace> -l app=kraftcontroller
    

    Expected: All pods in Running state.

  2. Verify Kafka and KRaftController status:

    kubectl get kafka <kafka-name> -n <namespace>
    kubectl get kraftcontroller <kraftcontroller-name> -n <namespace>
    

    Expected: Both show STATUS: RUNNING.

  3. Check for errors in logs:

    kubectl logs deployment/confluent-operator -n <namespace> --since=24h | grep -i error
    kubectl logs <kafka-pod-name> -n <namespace> --since=24h | grep -E "ERROR|FATAL"
    kubectl logs <kraftcontroller-pod-name> -n <namespace> --since=24h | grep -E "ERROR|FATAL"
    

    Expected: No ERROR or FATAL messages.

RBAC validation (if enabled)

  1. Verify MDS endpoint is responding:

    kubectl exec <kafka-pod-name> -n <namespace> -- curl -k -s -o /dev/null \
      -w "%{http_code}" https://<kafka-service>:8090/security/1.0/authenticate
    

    Expected: 400, 401, or 200 (not 503 or timeout).

  2. Verify ACLs are accessible:

    kubectl exec <kafka-pod-name> -n <namespace> -- kafka-acls --list \
      --bootstrap-server <kafka-bootstrap-server>:9071 \
      --command-config /path/to/client.properties
    

    Expected: ACL list displayed without errors.
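The expected HTTP codes from the MDS endpoint check can be interpreted with a small guard like the following sketch. The code value is a hypothetical stand-in for the output of the curl command above:

```shell
# Interpret the HTTP status from the MDS endpoint check.
# 200/400/401 mean MDS answered; 503 or a timeout means it is unhealthy.
code="401"   # hypothetical; normally the output of the curl command above
case "$code" in
  200|400|401) verdict="MDS reachable" ;;
  *)           verdict="MDS unhealthy (HTTP $code)" ;;
esac
echo "$verdict"
```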

Step 5.3: Roll back or finalize

If critical functionality is broken, performance is worse than expected, or you discover a KRaft incompatibility, you can roll back to ZooKeeper. For the procedure, see Rollback to ZooKeeper.

Warning

Finalizing the migration removes ZooKeeper dependency from Kafka, removes migration configuration from KRaftController, and transitions the cluster irreversibly to KRaft mode. You cannot roll back after this point.

When you have completed the validation steps above and are ready to finalize, apply the finalize annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-trigger-finalize-to-kraft=true \
  -n <namespace>

Step 5.4: Verify migration completion

Monitor the migration job until it reaches the COMPLETE phase:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -w

Expected: The migration job progresses to COMPLETE phase.

For detailed phase descriptions, see ZooKeeper to KRaft Migration Phases and Sub-phases.

Step 6: Complete post-migration tasks

After the migration completes successfully, perform these tasks in order.

Step 6.1: Release migration locks

Manually release the migration locks applied in Step 3.

Apply the release lock annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-release-cr-lock=true \
  -n <namespace>

Verify locks are removed:

kubectl get kafka <kafka-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml | grep kraft-migration-cr-lock

Expected: No output, which confirms that the locks were released successfully.

Important

If you do not release the locks, you cannot modify Kafka or KRaftController configurations, scale resources, or apply upgrades.

Step 6.2: Download updated CRs (optional)

Download the updated CRs for backup or GitOps repository updates:

# Kafka CR
kubectl get kafka <kafka-name> -n <namespace> -o yaml > kafka-kraft-mode.yaml

# KRaftController CR
kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> -o yaml > kraftcontroller.yaml

# Optional: ZooKeeper backup before deletion
kubectl get zookeeper <zookeeper-name> -n <namespace> -o yaml > zookeeper-backup.yaml

Step 6.3: Remove Log4j1 annotation from KRaftController (CFK 3.0 or later)

If using CFK 3.0 or later, remove the platform.confluent.io/use-log4j1 annotation:

kubectl annotate kraftcontroller <kraftcontroller-name> \
  platform.confluent.io/use-log4j1- \
  -n <namespace>

Verify the annotation is removed:

kubectl get kraftcontroller <kraftcontroller-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations.platform\.confluent\.io/use-log4j1}'

Expected: Empty output.

Note

This triggers a KRaftController pod roll to apply Log4j 2 configuration, which is normal and safe after the migration completes. Skip this step if you are using CFK 2.x versions.

Step 6.4: Validate KRaft-only operation

Before deleting ZooKeeper, validate Kafka operates correctly in KRaft-only mode.

Verify Kafka has no ZooKeeper dependency:

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'

Expected: Output shows only kRaftController dependencies with no zookeeper reference.

Verify Kafka is running without ZooKeeper errors:

kubectl get kafka <kafka-name> -n <namespace>
kubectl logs <kafka-pod-name> -n <namespace> --since=24h | grep -iE "zookeeper.*(error|failed|disconnect|timeout|expired)"

Expected: STATUS shows RUNNING, and no ZooKeeper connection errors in logs.
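The dependency check above can be turned into a fail-fast guard before deleting ZooKeeper. The deps value below is a hypothetical stand-in for the jsonpath output of the kubectl command in this step:

```shell
# Guard: refuse to proceed if the Kafka CR still references ZooKeeper.
# deps would normally come from:
#   kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'
deps='{"kRaftController":{"name":"kraftcontroller"}}'   # hypothetical output
case "$deps" in
  *zookeeper*) verdict="ZooKeeper dependency still present; do not delete ZooKeeper" ;;
  *)           verdict="KRaft-only: safe to proceed" ;;
esac
echo "$verdict"
```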

Step 6.5: Delete the ZooKeeper cluster

Warning

Delete ZooKeeper only after confirming:

  • Kafka has been stable in KRaft-only mode.

  • All validation tests pass.

  • No other Kafka clusters use this ZooKeeper.

  • You have backups of ZooKeeper data if needed.

Verify that no other Kafka clusters depend on this ZooKeeper:

kubectl get kafka --all-namespaces -o yaml | grep -A 5 "zookeeper:"

Expected: No output.

Delete ZooKeeper cluster:

kubectl delete zookeeper <zookeeper-name> -n <namespace>

Watch ZooKeeper pods terminate:

kubectl get pods -n <namespace> -l app=zookeeper -w

Verify Kafka remains operational:

kubectl get kafka <kafka-name> -n <namespace>

Expected: STATUS shows RUNNING.

Clean up ZooKeeper Persistent Volume Claims:

kubectl get pvc -n <namespace> | grep zookeeper
kubectl delete pvc <pvc-name> -n <namespace>

Warning

This action deletes ZooKeeper data permanently. Only delete PVCs if you no longer need the data.

Step 6.6: Clean up migration resources

This step is optional. If needed, save the final migration job status for your records:

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> -o yaml > kmj-final-status.yaml

Delete the migration job:

kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>

Check for and delete migration-specific ConfigMaps or Secrets if they exist:

kubectl get configmaps -n <namespace> | grep migration
kubectl get secrets -n <namespace> | grep migration

Rollback to ZooKeeper

Rollback is only supported during DUAL-WRITE phase. After applying the finalize annotation or completing migration, rollback is not possible.
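Because rollback is valid only during DUAL-WRITE, it is worth checking the current phase before applying the rollback annotation. A sketch of such a guard, where the phase value is a hypothetical stand-in for the jsonpath output of the migration job's status.phase:

```shell
# Guard: only trigger rollback when the job is in the DUAL-WRITE phase.
# phase would normally come from:
#   kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
#     -o jsonpath='{.status.phase}'
phase="DUAL-WRITE"   # hypothetical value for illustration
if [ "$phase" = "DUAL-WRITE" ]; then
  verdict="rollback allowed"
else
  verdict="rollback not allowed in phase: $phase"
fi
echo "$verdict"
```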

Step 1: Trigger rollback

Apply the rollback annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
  -n <namespace>

Step 2: Remove nodes from ZooKeeper

In production environments with secured ZooKeeper, run these commands from inside a ZooKeeper pod.

Step 2.1: Wait for migration job to reach the correct phase

kubectl get kraftmigrationjob <migration-job-name> -n <namespace> \
  -o jsonpath='{.status.subPhase}'

Expected: SubPhaseRollbackToZkWaitForManualNodeRemovalFromZk

Note

Add -zk-tls-config-file <path-to-zookeeper-client-properties> to the zookeeper-shell command only when TLS is enabled.

Step 2.2: Remove controller node

zookeeper-shell <zkhost:zkport> \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/controller

Step 2.3: Remove migration node

zookeeper-shell <zkhost:zkport> \
  deleteall /<kafka-cr-name>-<kafka-cr-namespace>/migration

Troubleshoot: For NoAuthException or Failed to delete some node(s) errors, see Troubleshoot ZooKeeper to KRaft Migration Issues.

Step 3: Continue rollback process

Apply the continue annotation:

kubectl annotate kraftmigrationjob <migration-job-name> \
  platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
  --overwrite \
  -n <namespace>

Step 4: Verify rollback completion

Step 4.1: Verify migration job status

kubectl get kraftmigrationjob <migration-job-name> -n <namespace>

Expected: STATUS: COMPLETE

Step 4.2: Verify Kafka is using ZooKeeper

kubectl get kafka <kafka-name> -n <namespace> -o jsonpath='{.spec.dependencies}'

Expected: Output shows zookeeper dependency.

Step 4.3: Verify data preservation

# List all topics
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-topics --list \
  --bootstrap-server kafka:9071 \
  --command-config /path/to/client.properties

# Check consumer groups
kubectl exec <kafka-pod-name> -n <namespace> -- kafka-consumer-groups --list \
  --bootstrap-server kafka:9071 \
  --command-config /path/to/client.properties

Expected: All topics and consumer groups created during DUAL-WRITE are present.
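If you captured the topic list before triggering rollback, you can diff the before and after lists mechanically. This sketch uses inline placeholder data; in practice, generate the two files with the kafka-topics command above, once before rollback and once after:

```shell
# Compare topic lists captured before and after rollback.
# Placeholder data; real lists come from the kafka-topics --list command.
printf 'orders\npayments\n' > topics-before.txt
printf 'orders\npayments\n' > topics-after.txt
if diff -q topics-before.txt topics-after.txt >/dev/null; then
  verdict="all topics preserved"
else
  verdict="topic lists differ"
fi
echo "$verdict"
```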

Step 4.4: Check for errors

kubectl logs <kafka-pod-name> -n <namespace> --tail=50 | grep -E "ERROR|FATAL"
kubectl logs deployment/confluent-operator -n <namespace> --tail=50 | grep -i error

Expected: No ERROR or FATAL messages.

Step 5: Clean up after rollback

Step 5.1: Delete migration job

kubectl delete kraftmigrationjob <migration-job-name> -n <namespace>

Step 5.2: Delete KRaftController (optional)

Warning

Delete the KRaftController only if you do not plan to retry the migration. If you plan to attempt the migration again, keep it.

kubectl delete kraftcontroller <kraftcontroller-name> -n <namespace>