Migrate Confluent Platform from ZooKeeper to KRaft using Confluent for Kubernetes

Starting in the 2.8 release, Confluent for Kubernetes (CFK) supports migration from a ZooKeeper-based Confluent Platform deployment to a KRaft-based deployment.

Requirements and considerations

  • Use CFK version 2.8 or higher.

  • CFK supports migration only within the same Confluent Platform version, and that version must be 7.6 or later.

    Migrating Confluent Platform 7.6.0 clusters is not recommended for production environments. For production migrations, use Confluent Platform 7.6.1 or later with CFK 2.8.1 or later.

  • Upgrade Confluent Platform before you run the migration.

    You cannot upgrade the Confluent Platform version and migrate from ZooKeeper to KRaft at the same time.

  • To prevent unwanted modification of ZooKeeper, Kafka, and KRaft during and after migration, the migration job places a lock on the ZooKeeper, Kafka, and KRaft resources.

    This lock is enforceable only for CFK deployments with webhooks enabled.

    If you are migrating a deployment that does not have webhooks enabled, make sure no other actor updates or deletes the ZooKeeper, Kafka, and KRaft resources while the migration is in progress.

    You need to manually remove the lock at the end of the migration, after validating that the migration succeeded. For details, see the migration steps that follow.

  • You can migrate from ZooKeeper to KRaft in the isolated mode, with separate KRaft controller nodes and broker nodes.

    You cannot migrate to the combined mode, where the KRaft controller and the broker run in the same process (process.roles=controller,broker).

  • ACLs are migrated from ZooKeeper to KRaft.

  • Migration from ZooKeeper to KRaft is not supported in multi-region cluster deployments.

  • CFK does not support rolling back to ZooKeeper mode once the finalization to KRaft mode is triggered. Rollback is possible only while the cluster is still in the Dual Write phase.

  • You cannot enable ZooKeeper migration when multiple log directories (JBOD) are in use by the brokers.

  • CFK does not automatically delete the ZooKeeper cluster after migration.

    You can delete the ZooKeeper cluster after you verify that it is not used by any other Kafka cluster.
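
As a pre-flight aid for the version requirement above, the following shell sketch checks whether the tag of an image reference meets a minimum Confluent Platform version. The function name cp_version_ok is hypothetical, the default minimum of 7.6.1 reflects the production recommendation above, and the comparison assumes a sort that supports -V (for example, GNU coreutils) and an image reference whose tag follows the last colon.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CFK): succeeds when the tag of an image
# reference is at least the given minimum version (default: 7.6.1).
cp_version_ok() {
  local image="$1" minimum="${2:-7.6.1}"
  local tag="${image##*:}"   # text after the last colon, for example 7.6.1
  # sort -V orders version strings; the check passes when the minimum
  # sorts first (or ties with the tag).
  [ "$(printf '%s\n%s\n' "$minimum" "$tag" | sort -V | head -n 1)" = "$minimum" ]
}

# Example: report whether an image is recent enough for a production migration.
if cp_version_ok "docker.io/confluentinc/cp-server:7.6.1"; then
  echo "version requirement met"
else
  echo "upgrade Confluent Platform before migrating"
fi
```

The image reference would typically come from the CR itself, for example from the spec.image.application field. Images referenced by digest or pulled from a registry address that contains a port but no tag are not handled by this sketch.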

Migrate ZooKeeper to KRaft

To migrate a ZooKeeper-based Confluent Platform to a KRaft-based deployment, use the KRaftMigrationJob custom resource (CR) and follow the steps below.

Note

When you use CFK to migrate from ZooKeeper to KRaft, CFK automatically fetches the Kafka cluster ID. You do not need to manually perform the corresponding step described in the manual Confluent Platform migration process.

For a tutorial that walks through the migration process, see the CFK example GitHub repo.

  1. To help with debugging, you can enable TRACE-level logging for metadata migration.

    In the Kafka CR, set the log level of the metadata migration logger to TRACE and apply the CR. For more information about the Kafka logs during migration, see Migrate from ZooKeeper to KRaft on Confluent Platform.

    apiVersion: platform.confluent.io/v1beta1
    kind: Kafka
    spec:
      configOverrides:
        log4j:
          - log4j.logger.org.apache.kafka.metadata.migration=TRACE
    
  2. Create a KRaftController CR with the following annotation and apply the CR.

    This annotation instructs CFK to wait until the KRaftMigrationJob has modified the KRaftController CR with the migration configurations, and only then create the KRaftController pods.

    kind: KRaftController
    metadata:
      annotations:
        platform.confluent.io/kraft-migration-hold-krc-creation: "true"
    

    An example KRaftController CR:

    apiVersion: platform.confluent.io/v1beta1
    kind: KRaftController
    metadata:
      name: kraftcontroller
      namespace: confluent
      annotations:
        platform.confluent.io/kraft-migration-hold-krc-creation: "true"
    spec:
      dataVolumeCapacity: 10G
      image:
        application: docker.io/confluentinc/cp-server:7.6.0
        init: confluentinc/confluent-init-container:2.8.0
      replicas: 3
    
  3. Create a KRaftMigrationJob CR and apply the CR to kick off the migration process.

    kind: KRaftMigrationJob
    spec:
      dependencies:
        zookeeper:         # provide ZK clusterRef
        kRaftController:   # provide KRaftController clusterRef
        kafka:             # provide Kafka clusterRef
    

    For example:

    apiVersion: platform.confluent.io/v1beta1
    kind: KRaftMigrationJob
    metadata:
      name: kraftmigrationjob
      namespace: confluent
    spec:
      dependencies:
        kafka:
          name: kafka
          namespace: confluent
        zookeeper:
          name: zookeeper
          namespace: confluent
        kRaftController:
          name: kraftcontroller
          namespace: confluent
    
  4. Monitor the migration progress.

    kubectl get kraftmigrationjob <migration job name> -n <namespace> -oyaml -w
    

    Migration time depends on the following factors:

    • The time for a single Kafka roll, multiplied by 3
    • The time to migrate the metadata from ZooKeeper to the KRaftController
    • The time for the KRaftController roll

  5. Validate the migration while the process is in the Dual Write phase.

    Check the logs and validate that all data has been migrated without any loss. You can use the following command:

    kubectl logs -f -c=kraftcontroller --selector app=kraftcontroller -n <namespace> | grep "Completed migration of metadata from ZooKeeper to KRaft"
    

    To capture the logs, run this command as soon as the migration starts, because Kubernetes does not keep the logs of restarted pods.

  6. If validation passes and you wish to complete the migration process, apply the platform.confluent.io/kraft-migration-trigger-finalize-to-kraft annotation to the migration job CR.

    kubectl annotate kraftmigrationjob <migration job name> \
      platform.confluent.io/kraft-migration-trigger-finalize-to-kraft=true \
      --namespace <namespace>
    
  7. Alternatively, if validation fails, or if you want to go back to ZooKeeper, you can roll back to ZooKeeper.

    Rollback is possible only while the cluster is in the Dual Write mode, that is, up to this point. Once you take the controller out of the migration mode and restart it in KRaft mode, you can no longer roll back to ZooKeeper mode.

    For the steps, see Roll back to ZooKeeper.

  8. Remove the lock that the migration job placed on the resources.

    The migration job locks the Kafka, ZooKeeper, and KRaft resources when the migration workflow is initialized.

    When the migration is completed, the status message prompts you to remove the lock using the platform.confluent.io/kraft-migration-release-cr-lock annotation.

    kubectl annotate kraftmigrationjob <migration job name> \
      platform.confluent.io/kraft-migration-release-cr-lock=true \
      --namespace <namespace>
    
  9. The migration job modifies the YAML representation of the Kafka and KRaft CRs. Download the modified Kafka and KRaft CRs and update your CI/CD as necessary.

    For example:

    kubectl get kafka <Kafka CR name> -n <namespace> -oyaml > updated_kafka.yaml
    
    kubectl get kraftcontroller <KRaftController CR name> -n <namespace> -oyaml > updated_kraftcontroller.yaml
    
  10. After the migration completes and you have verified that the ZooKeeper cluster is not used by any other Kafka cluster, remove the ZooKeeper cluster.

    kubectl delete -f <Zookeeper CR>
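
The monitoring and validation in steps 4 and 5 can be supported with two small shell helpers, sketched below. Both function names are hypothetical. The time estimate assumes the phases listed in step 4 run one after another, so that their durations simply add up, and the timings in the example are made up. The log check looks only for the completion message quoted in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough migration-time estimate in seconds, assuming the
# three Kafka rolls, the metadata migration, and the KRaftController roll
# happen sequentially.
estimate_migration_seconds() {
  local kafka_roll="$1" metadata_copy="$2" controller_roll="$3"
  echo $(( 3 * kafka_roll + metadata_copy + controller_roll ))
}

# Hypothetical helper: reads KRaftController logs on stdin and succeeds only
# if the completion message from step 5 appears.
metadata_migration_completed() {
  grep -q "Completed migration of metadata from ZooKeeper to KRaft"
}

# Example with made-up timings: 10 minutes per Kafka roll, 2 minutes for the
# metadata migration, 5 minutes for the KRaftController roll.
estimate_migration_seconds 600 120 300   # prints 2220
```

To use the log check non-interactively, pipe kubectl logs into it without -f, for example: kubectl logs -c=kraftcontroller --selector app=kraftcontroller --tail=-1 -n <namespace> | metadata_migration_completed. The --tail=-1 flag matters because kubectl logs returns only the last 10 lines per pod by default when a selector is used.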
    

Roll back to ZooKeeper

If the migration fails, you can roll back to the ZooKeeper cluster at any point in the migration process before the KRaft controllers are taken out of the migration mode. Up to that point, the controller makes dual writes to KRaft and ZooKeeper. Because the data in ZooKeeper is still consistent with the KRaft metadata log, it is still possible to revert to ZooKeeper.

To roll back to ZooKeeper:

  1. Apply the platform.confluent.io/kraft-migration-trigger-rollback-to-zk annotation to the migration job.

    kubectl annotate kraftmigrationjob <migration job name> \
      platform.confluent.io/kraft-migration-trigger-rollback-to-zk=true --overwrite \
      --namespace <namespace>
    
  2. Remove the controller and controller_epoch nodes from ZooKeeper.

    When the status message prompts you to delete the controller nodes from ZooKeeper, run the following commands:

    zookeeper-shell <zkhost:zkport> deleteall <kafka-cr-name>-<kafka-cr-namespace>/controller
    
    zookeeper-shell <zkhost:zkport> deleteall <kafka-cr-name>-<kafka-cr-namespace>/controller_epoch
    
  3. Signal the migration job to continue now that the nodes are removed.

    kubectl annotate kraftmigrationjob <migration job name> \
      platform.confluent.io/continue-kraft-migration-post-zk-node-removal=true \
      --overwrite \
      --namespace <namespace>
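
The annotations used in the migration and rollback steps follow one pattern, which the following shell sketch collects in a single place. The function name migration_annotate_cmd and the action names are hypothetical. The function only prints the command so that you can review it before running it. Adding --overwrite to every command is an assumption made so that a command can be re-run, whereas the steps above use it only for the rollback annotations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints, but does not run, the kubectl annotate command
# for one migration action. The annotation keys are the ones used in the
# migration and rollback steps above.
migration_annotate_cmd() {
  local action="$1" job="$2" namespace="$3"
  local prefix="platform.confluent.io" key
  case "$action" in
    finalize)     key="kraft-migration-trigger-finalize-to-kraft" ;;
    release-lock) key="kraft-migration-release-cr-lock" ;;
    rollback)     key="kraft-migration-trigger-rollback-to-zk" ;;
    continue)     key="continue-kraft-migration-post-zk-node-removal" ;;
    *) echo "unknown action: $action" >&2; return 1 ;;
  esac
  # --overwrite is an assumption here; see the note above the block.
  echo "kubectl annotate kraftmigrationjob $job $prefix/$key=true --overwrite --namespace $namespace"
}

# Example: print the rollback command for the example job used earlier.
migration_annotate_cmd rollback kraftmigrationjob confluent
```

Printing instead of executing keeps the irreversible actions, finalize in particular, behind a manual review step.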