Configure Dynamic KRaft Quorum for Confluent Platform Using Confluent for Kubernetes

Use dynamic KRaft quorum to add and remove controller nodes from the metadata quorum without recreating the entire cluster. This feature is essential for scaling controller capacity, recovering from failures, and supporting single-region deployments with safer operational procedures.

Dynamic quorum is available in Confluent Platform 7.9 and later and provides flexible quorum management for production environments. It is supported only for single-region cluster deployments; multi-region cluster deployments are not supported.

Greenfield and brownfield support

  • CFK 3.2: Dynamic KRaft quorum support is available only for greenfield deployments.

  • CFK 3.3 and later (planned): A future release is planned to support brownfield scenarios, including migration of existing static KRaft or ZooKeeper-based deployments to dynamic quorum.

Prerequisites and requirements

Review the following prerequisites and requirements before configuring dynamic KRaft quorum.

Version requirements

  • KRaft (Static Quorum): Confluent Platform 7.4 or later. The original KRaft implementation; continues to be supported.

  • KRaft (Dynamic Quorum): Confluent Platform 7.9 or later. Required for dynamic quorum features. Confluent Platform 7.9.5 or later is strongly recommended, because earlier 7.9.x patch versions have known issues with ZooKeeper to KRaft migration.

  • Auto-Join Quorum: Confluent Platform 8.2 or later. Simplifies observer promotion by performing it automatically. Recommended if you want automatic observer promotion using the auto-join feature.

  • KRaft Dynamic Quorum (CFK): CFK 3.2 or later. Requires Confluent Platform 7.9 or later. For ZooKeeper to KRaft migration with dynamic quorum, use Confluent Platform 7.9.5 or later only.

  • KRaft Migration Job: CFK 3.2 or later. Supports ZooKeeper to KRaft migration with dynamic quorum.

Infrastructure requirements

  • CFK 3.2 or later installed.

  • Confluent Platform 7.9.5 or later Docker images are required.

Static and dynamic quorums

KRaft supports two modes for configuring the controller quorum. Understanding the difference is crucial for selecting the right approach for your deployment.

Static quorum

In the original KRaft implementation, controller membership is fixed at cluster creation:

  • The set of voters is defined by the static controller.quorum.voters property.

  • All controllers must be specified upfront as voters.

  • Controllers cannot be added or removed without offline cluster reconfiguration.

  • All nodes must know about each other at bootstrap time.

Static quorum is optimized for basic single-region deployments where the controller set rarely changes.

Dynamic quorum

Dynamic quorum introduces a flexible membership model where controllers can join and leave the quorum online:

Configuration differences:

  • Controllers use controller.quorum.bootstrap.servers to discover the current controller set, instead of a fixed controller.quorum.voters list.

  • New controllers join as read-only observers and can later be promoted to voters.

  • Controllers can be added or removed online using kafka-metadata-quorum commands.

  • The quorum maintains its own voter set in the metadata log.
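To make the difference concrete, the two modes differ in a single controller property. The following server.properties fragments are illustrative sketches; host names and ports are placeholders:

```properties
# Static quorum: the full voter set is fixed at bootstrap, as id@host:port entries.
controller.quorum.voters=100@controller-0:9074,101@controller-1:9074,102@controller-2:9074

# Dynamic quorum: only bootstrap endpoints are listed; the actual voter set
# is maintained in the metadata log and can change online.
controller.quorum.bootstrap.servers=controller-0:9074,controller-1:9074,controller-2:9074
```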

The following table summarizes the key differences:

Aspect                  Static Quorum                     Dynamic Quorum
Configuration property  controller.quorum.voters          controller.quorum.bootstrap.servers
Membership changes      Requires offline reconfiguration  Online add or remove via CLI
Initial setup           All voters at startup             Single bootstrap voter + observers
Flexibility             Rigid                             Highly flexible

Quorum roles

In Dynamic KRaft, controllers can have different roles in the quorum.

Voter

A controller that participates in the Raft consensus and can vote for leader election. Voters are the only controllers that count toward quorum majority. Voters actively participate in metadata replication and leader election.

Observer

A controller that replicates metadata but does not vote. Observers follow the metadata log but are not part of the quorum calculation. Observers can be promoted to voters online without downtime. This role is useful for:

  • Gradually adding new controllers to the quorum.

  • Preparing standby controllers for failover scenarios.

Bootstrap controller

The first controller that creates the quorum. The bootstrap controller formats its storage with --standalone mode and becomes the initial single-voter quorum. Other controllers join this bootstrap controller as observers.
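The distinction shows up at storage-format time. The following is a sketch based on the upstream kafka-storage tool; verify the flags against your Confluent Platform version before relying on them:

```
# Bootstrap controller only: format storage as the initial single-voter quorum
kafka-storage format --cluster-id <cluster-id> --standalone \
  --config /path/to/kafka.properties

# All other controllers: format without declaring initial controllers;
# they start as observers and join via the bootstrap controller
kafka-storage format --cluster-id <cluster-id> --no-initial-controllers \
  --config /path/to/kafka.properties
```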

Limitation

CFK currently provisions static KRaft quorums by default. To use dynamic quorum, you must explicitly enable it in the KRaftController custom resource (CR) spec.

Deploy a single-region cluster with dynamic quorum

This section describes how to deploy a new KRaft cluster in a single region with dynamic quorum enabled. This is the recommended approach for greenfield deployments.

Step 1: Create bootstrap coordination ConfigMap

Dynamic quorum requires a ConfigMap to coordinate which pod becomes the bootstrap controller. This ConfigMap ensures that only one controller formats storage with --standalone mode.

Create the following ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kraftcontroller-dynamic-quorum  --- [1]
  namespace: confluent                   --- [2]
data:
  bootstrap-status: '{"bootstrap_formatted": false}' --- [3]
  • [1] The name of the ConfigMap. This name is referenced by the KRaft controller pods to coordinate bootstrap formatting.

  • [2] The namespace where the KRaft controllers are deployed must match the namespace of this ConfigMap.

  • [3] Required. The initial bootstrap status. Must be exactly as shown. The bootstrap controller will update this value from false to true after formatting storage.

Apply the ConfigMap:

kubectl apply -f kraftcontroller-bootstrap-configmap.yaml

The ConfigMap is required in the Kubernetes cluster that hosts the bootstrap pod.

Important

  • After the cluster is running and bootstrapped, do not update this ConfigMap to mark bootstrap_formatted as false.

  • The absence of this ConfigMap is treated the same as bootstrap_formatted: true.
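The coordination logic can be pictured with a small sketch. This is a hypothetical helper, not CFK's actual init code: the decision is to format with --standalone only while bootstrap_formatted is still false, and to treat a missing ConfigMap the same as true:

```shell
#!/bin/sh
# Hypothetical sketch of the bootstrap coordination decision. In a real pod the
# status string would come from something like:
#   kubectl get configmap kraftcontroller-dynamic-quorum -n confluent \
#     -o jsonpath='{.data.bootstrap-status}'
should_bootstrap() {
  status="$1"   # e.g. '{"bootstrap_formatted": false}', or empty if absent
  case "$status" in
    *'"bootstrap_formatted": false'*) echo "format-standalone" ;;
    *) echo "join-as-observer" ;;   # formatted already, or ConfigMap absent
  esac
}

should_bootstrap '{"bootstrap_formatted": false}'   # prints: format-standalone
should_bootstrap '{"bootstrap_formatted": true}'    # prints: join-as-observer
should_bootstrap ''                                 # prints: join-as-observer
```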

Step 2: Create RBAC resources

The bootstrap controller needs permissions to update the ConfigMap to signal that bootstrap formatting is complete.

Create the following RBAC resources:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kraftcontroller-sa       --- [1]
  namespace: confluent            --- [2]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kraftcontroller-bootstrap-role
  namespace: confluent
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kraftcontroller-dynamic-quorum"]  --- [3]
  verbs: ["get", "update", "patch"]                  --- [4]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kraftcontroller-bootstrap-rolebinding
  namespace: confluent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kraftcontroller-bootstrap-role
subjects:
- kind: ServiceAccount
  name: kraftcontroller-sa
  namespace: confluent
  • [1] The service account used by KRaft controller pods.

  • [2] The namespace where the KRaft controllers are deployed.

  • [3] The ConfigMap name created in the previous step. This ensures the service account can only modify this specific ConfigMap.

  • [4] The permissions required to read and update the ConfigMap status.

Apply the RBAC resources:

kubectl apply -f kraftcontroller-rbac.yaml

Step 3: Deploy KRaft controllers

Create and configure a KRaftController CR with dynamic quorum enabled.

apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
  name: kraftcontroller                           --- [1]
  namespace: confluent                            --- [2]
  annotations:
    platform.confluent.io/broker-id-offset: "100" --- [3]
spec:
  replicas: 3                                     --- [4]
  dataVolumeCapacity: 10Gi                        --- [5]

  image:
    application: confluentinc/cp-server:7.9.5     --- [6]
    init: confluentinc/confluent-init-container:3.2.0

  dynamicQuorumConfig:                            --- [7]
    enabled: true                                 --- [8]
    bootstrapPod: 0                               --- [9]

  podTemplate:
    serviceAccountName: kraftcontroller-sa        --- [10]
  • [1] The name of the KRaft controller.

  • [2] The namespace of the KRaft controller.

  • [3] The starting node ID for controllers. Controllers will use IDs starting from 100 (100, 101, 102, and higher). This helps avoid ID conflicts with broker node IDs.

  • [4] The number of controller replicas.

  • [5] The storage capacity for each controller. Size according to your expected metadata volume and retention requirements.

  • [6] Required. Use Confluent Platform 7.9.5 or later for dynamic quorum. Earlier 7.9.x versions have known issues.

  • [7] Required. The dynamic quorum configuration section.

  • [8] Required. Set to true to enable dynamic quorum. When enabled, CFK configures controllers to use controller.quorum.bootstrap.servers instead of controller.quorum.voters.

  • [9] Required. The ordinal of the pod that serves as the bootstrap controller. Typically, this is 0, which corresponds to the first pod (kraftcontroller-0). Only this pod formats its storage with --standalone mode. Other pods join as observers.

  • [10] Required. The service account created in Step 2 with permissions to update the bootstrap ConfigMap.

Apply the KRaftController CR:

kubectl apply -f kraftcontroller.yaml

Note

The bootstrap controller (kraftcontroller-0) will format its storage and create a single-voter quorum. Additional controllers (kraftcontroller-1 and kraftcontroller-2) will join as observers and wait to be promoted.

Step 4: Verify quorum formation

After the controllers are running, verify that the bootstrap controller is the only voter and other controllers are observers.

Check that all pods are running:

kubectl get pods -n confluent -l app=kraftcontroller

Check the quorum status:

kubectl exec kraftcontroller-0 -n confluent -- \
  kafka-metadata-quorum --bootstrap-controller localhost:9074 \
  describe --replication

Expected output:

NodeId  LogEndOffset  Lag   LastFetchTimestamp  LastCaughtUpTimestamp  Status
100     <offset>      0     <timestamp>         <timestamp>            Leader
101     <offset>      0     <timestamp>         <timestamp>            Observer
102     <offset>      0     <timestamp>         <timestamp>            Observer

This confirms:

  • Node 100 (kraftcontroller-0) is the leader and only voter.

  • Nodes 101 and 102 (kraftcontroller-1 and kraftcontroller-2) are observers.

  • All observers show Lag: 0, which means they are caught up and ready for promotion. If observers show significant lag, wait for them to catch up before promoting in the next step.
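As an illustrative sketch, the Lag check can be scripted. The helper below parses describe --replication output and reports whether every node is caught up; the sample rows mirror the expected output above, and in practice you would pipe the real command output through it:

```shell
#!/bin/sh
# Sketch: report "caught-up" only when every row of
# `kafka-metadata-quorum describe --replication` output shows Lag == 0.
# Hypothetical real usage:
#   kubectl exec kraftcontroller-0 -n confluent -- \
#     kafka-metadata-quorum --bootstrap-controller localhost:9074 \
#     describe --replication | all_caught_up
all_caught_up() {
  # Column 3 is Lag; skip the header row (NR == 1).
  awk 'NR > 1 && $3 != 0 { behind = 1 } END { print (behind ? "lagging" : "caught-up") }'
}

printf '%s\n' \
  'NodeId  LogEndOffset  Lag  Status' \
  '100     1234          0    Leader' \
  '101     1234          0    Observer' \
  '102     1234          5    Observer' | all_caught_up   # prints: lagging
```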

Step 5: Promote observers to voters

To create a full 3-controller quorum, promote the observers to voters. On Confluent Platform 8.2 or later with auto-join enabled, observers automatically promote themselves after they are caught up with the leader. You can skip manual promotion and proceed to verification.

For Confluent Platform 7.9.x or 8.x without auto-join, manually promote observers:

# Promote kraftcontroller-1
kubectl exec kraftcontroller-1 -n confluent -- \
  kafka-metadata-quorum \
  --bootstrap-controller kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local:9074 \
  --command-config <properties-file> \
  add-controller

# Promote kraftcontroller-2
kubectl exec kraftcontroller-2 -n confluent -- \
  kafka-metadata-quorum \
  --bootstrap-controller kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local:9074 \
  --command-config <properties-file> \
  add-controller

You must ensure the following when promoting observers:

  • Always connect --bootstrap-controller to an existing voter, not another observer.

  • Wait for each promotion to complete before promoting the next controller.

  • The add-controller command must be run from the controller being promoted. This is required because that pod's properties file (kafka.properties) contains the node.id and log.dirs for that specific controller, and log.dirs points to the metadata file containing the directory.id for that pod. Running the command from a different pod results in an incorrect node ID or missing directory information.

Step 6: Verify all controllers are voters

Verify that all controllers are now voters with zero lag:

kubectl exec kraftcontroller-0 -n confluent -- \
  kafka-metadata-quorum --bootstrap-controller localhost:9074 \
  describe --replication

Expected output:

NodeId  LogEndOffset  Lag   LastFetchTimestamp  LastCaughtUpTimestamp  Status
100     <offset>      0     <timestamp>         <timestamp>            Leader
101     <offset>      0     <timestamp>         <timestamp>            Follower
102     <offset>      0     <timestamp>         <timestamp>            Follower

All controllers should show:

  • Lag of 0, which means they are fully caught up.

  • Recent fetch and caught-up timestamps.

  • Status as Leader or Follower, indicating all are voters.

Step 7: Deploy Kafka brokers

After the KRaft controller quorum is healthy, deploy Kafka brokers.

For KRaft-enabled Kafka with dynamic quorum, add a kRaftController cluster reference in the dependencies section:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  replicas: 3
  dataVolumeCapacity: 100Gi

  image:
    application: confluentinc/cp-server:7.9.5
    init: confluentinc/confluent-init-container:3.2.0

  dependencies:
    kRaftController:
      clusterRef:
        name: kraftcontroller  --- [1]
        namespace: confluent   --- [2]
  • [1] The name of the KRaftController CR.

  • [2] The namespace of the KRaftController CR.

When dynamicQuorumConfig.enabled: true is set on the controllers, CFK automatically configures brokers to use controller.quorum.bootstrap.servers to discover the current quorum membership dynamically.

Apply the Kafka CR:

kubectl apply -f kafka.yaml

Verify the brokers are running:

kubectl get pods -n confluent -l app=kafka

Your single-region KRaft cluster with dynamic quorum is now deployed and operational.

Verify dynamic quorum deployment

After deploying or migrating to dynamic quorum, perform the following verification steps.

Check quorum status

Verify the quorum has a leader and all expected controllers:

kubectl exec kraftcontroller-0 -n confluent -- \
  kafka-metadata-quorum --bootstrap-controller localhost:9074 \
  describe --status

Expected output:

ClusterId:        <cluster-id>
LeaderId:         <node-id>
LeaderEpoch:      <epoch>
HighWatermark:    <offset>
MaxFollowerLag:   0
MaxFollowerLagTimeMs: 0
CurrentVoters:    [100,101,102]
CurrentObservers: []

Verify:

  • LeaderId is one of your controller node IDs.

  • CurrentVoters includes all expected controllers.

  • CurrentObservers is empty, meaning all controllers have been promoted.

  • MaxFollowerLag is 0 or very small.
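This check can also be scripted. The sketch below scans describe --status output (the sample lines mirror the expected output above) and reports whether any observers remain:

```shell
#!/bin/sh
# Sketch: inspect `kafka-metadata-quorum describe --status` output and report
# whether all controllers have been promoted (CurrentObservers is empty).
# Hypothetical real usage:
#   kubectl exec kraftcontroller-0 -n confluent -- \
#     kafka-metadata-quorum --bootstrap-controller localhost:9074 \
#     describe --status | observers_remaining
observers_remaining() {
  if grep -q '^CurrentObservers:[[:space:]]*\[\]' ; then
    echo "all-promoted"
  else
    echo "observers-remain"
  fi
}

printf '%s\n' \
  'CurrentVoters:    [100,101,102]' \
  'CurrentObservers: []' | observers_remaining   # prints: all-promoted
```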

Check replication status

Verify all controllers are replicating metadata with zero lag:

kubectl exec kraftcontroller-0 -n confluent -- \
  kafka-metadata-quorum --bootstrap-controller localhost:9074 \
  describe --replication

Expected output:

NodeId  LogEndOffset  Lag   LastFetchTimestamp  LastCaughtUpTimestamp  Status
100     1234          0     <timestamp>         <timestamp>            Leader
101     1234          0     <timestamp>         <timestamp>            Follower
102     1234          0     <timestamp>         <timestamp>            Follower

Verify:

  • All controllers have the same LogEndOffset or are within a few offsets.

  • All controllers have Lag of 0.

  • LastFetchTimestamp and LastCaughtUpTimestamp are recent.

Check kraft.version feature flag

Verify that dynamic quorum is enabled by checking the kraft.version feature:

kubectl exec kraftcontroller-0 -n confluent -- \
  kafka-features --bootstrap-controller localhost:9074 describe

Look for the kraft.version feature in the output:

Feature: kraft.version  SupportedMinVersion: 0  SupportedMaxVersion: 1  FinalizedVersionLevel: 1  Epoch: 1

The kraft.version FinalizedVersionLevel determines the quorum mode:

  • Version 0 = Static quorum (controller.quorum.voters)

  • Version 1 = Dynamic quorum (controller.quorum.bootstrap.servers)

If FinalizedVersionLevel: 1, dynamic quorum is active.

Check controller configuration

Verify that controllers are using controller.quorum.bootstrap.servers:

kubectl exec kraftcontroller-0 -n confluent -- \
  grep 'controller.quorum.bootstrap.servers' /opt/confluentinc/etc/kafka/kafka.properties

Expected output:

controller.quorum.bootstrap.servers=kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local:9074,...

This confirms dynamic quorum configuration.

Troubleshooting

This section describes common issues when deploying and operating dynamic KRaft quorum.

Issue: Bootstrap pod stuck in init container

Symptom: kraftcontroller-0 is in Init:0/1 state indefinitely. The pod does not progress to the running state.

Cause: The bootstrap controller cannot update the ConfigMap due to insufficient RBAC permissions.

Solution: Verify RBAC permissions are correctly configured:

# Check if the service account has permission to update the ConfigMap
kubectl auth can-i update configmaps \
  --as=system:serviceaccount:confluent:kraftcontroller-sa \
  -n confluent

# Should output: yes

# Verify Role and RoleBinding exist
kubectl get role kraftcontroller-bootstrap-role -n confluent
kubectl get rolebinding kraftcontroller-bootstrap-rolebinding -n confluent

If permissions are missing, re-apply the RBAC resources from Step 2 in Deploy a single-region cluster with dynamic quorum.

Check the init container logs for more details:

kubectl logs kraftcontroller-0 -n confluent -c config-init-container

Issue: Observers not auto-promoting (Confluent Platform 8.2 or later)

Symptom: Controllers remain as observers indefinitely, even though auto-join should be enabled in Confluent Platform 8.2 or later.

Cause: The auto-join feature is not enabled. Check the properties file to confirm that controller.quorum.auto.join.enable is defined and set to true. If it is not, add it via configOverrides.

Solution: Check the replication status to see if observers are caught up:

kubectl exec kraftcontroller-0 -n confluent -- \
  kafka-metadata-quorum --bootstrap-controller localhost:9074 \
  describe --replication

Check the Lag column for observers:

  • If Lag > 0, observers are still replicating metadata. Wait for replication to catch up (Lag = 0) before expecting auto-promotion.

  • If Lag = 0 and observers still do not promote, auto-join may not be enabled. Manually promote observers using the add-controller command as shown in Deploy a single-region cluster with dynamic quorum.

Note

CFK does not currently expose a configuration toggle to enable or disable auto-join. Instead, CFK attempts to detect the Confluent Platform version from the image and automatically enables auto-join for 8.2 or later. Version detection can fail if you use custom Docker images or images without standard version tags.

Issue: Observer promotion fails during migration

Symptom: The add-controller command fails with an error message about version compatibility or feature flags.

Cause: The inter-broker protocol (IBP) version is not set to 3.9, which is required for dynamic quorum (kraft.version=1).

Solution: Verify the IBP version is set correctly by checking the properties file:

kubectl exec kraftcontroller-0 -n confluent -- \
  grep 'inter.broker.protocol.version' /opt/confluentinc/etc/kafka/kafka.properties

Expected output:

inter.broker.protocol.version=3.9

If the IBP version is missing or shows a different value, add or update the annotation:

kubectl annotate kafka kafka \
  platform.confluent.io/kraft-migration-ibp-version=3.9 \
  --overwrite \
  -n confluent

After updating the annotation, restart the migration or retry observer promotion.

Known issues

Be aware of the following known issues when using dynamic KRaft quorum.

Confluent Platform version issues

Issue: Confluent Platform versions 7.9.0 through 7.9.4 have known issues with kraft.version conversion and dynamic quorum stability.

Impact: Dynamic quorum deployments or migrations may fail or behave unpredictably.

Solution: Always use Confluent Platform 7.9.5 or later, or Confluent Platform 8.0 or later, for dynamic quorum deployments. Do not use versions 7.9.0 through 7.9.4 in production.

LoadBalancer configuration issue

Issue: When using advertisedListenersEnabled: true with LoadBalancer on KRaft controllers within a single cluster/namespace, controllers cannot connect to themselves using their external addresses.

Impact: Controller communication fails, quorum cannot form.

Root cause: This is not a hairpin NAT issue. The networking layer works correctly, but KRaft internal logic fails when using advertised external addresses for same-cluster communication.

Workaround:

  • Do not set advertisedListenersEnabled: true on KRaft controllers for single-region deployments.

Auto-join version detection

Issue: CFK attempts to detect the Confluent Platform version from the Docker image to automatically enable the auto-join feature for Confluent Platform 8.2 or later. Version detection can fail if you use custom Docker images without standard version tags.

Impact: Auto-join may not be enabled even when using Confluent Platform 8.2 or later, requiring manual observer promotion.

Workaround: Use configOverrides to set controller.quorum.auto.join.enable=true.
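As a sketch, such an override could be added to the KRaftController CR as follows, using CFK's configOverrides convention; verify the exact behavior against your CFK version:

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
  name: kraftcontroller
  namespace: confluent
spec:
  configOverrides:
    server:
      - controller.quorum.auto.join.enable=true
```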