Configure Dynamic KRaft Quorum for Confluent Platform Using Confluent for Kubernetes
Use dynamic KRaft quorum to add and remove controller nodes from the metadata quorum without recreating the entire cluster. This feature is essential for scaling controller capacity, recovering from failures, and supporting single-region deployments with safer operational procedures.
Dynamic quorum is available in Confluent Platform 7.9 and later, and provides flexible quorum management for production environments. This feature is supported only for single-region cluster deployments; multi-region cluster deployments are not supported.
Greenfield and brownfield support
CFK 3.2: Dynamic KRaft quorum support is available only for greenfield deployments.
CFK 3.3 and later: Support for brownfield scenarios, including migration of existing static KRaft or ZooKeeper-based deployments to dynamic quorum, is planned for a future release.
Prerequisites and requirements
Review the following prerequisites and requirements before configuring dynamic KRaft quorum.
Version requirements
| Feature | Version | Notes |
|---|---|---|
| KRaft (Static Quorum) | Confluent Platform 7.4 or later | Original KRaft implementation; continues to be supported. |
| KRaft (Dynamic Quorum) | Confluent Platform 7.9 or later | Required for dynamic quorum features. Confluent Platform 7.9.5 or later is strongly recommended; earlier 7.9.x patch versions have known issues with ZooKeeper to KRaft migration. |
| Auto-Join Quorum | Confluent Platform 8.2 or later | Promotes observers to voters automatically. Recommended if you want automatic observer promotion. |
| KRaft Dynamic Quorum (CFK) | CFK 3.2 or later | Requires Confluent Platform 7.9 or later. For ZooKeeper to KRaft migration with dynamic quorum, use Confluent Platform 7.9.5 or later only. |
| KRaft Migration Job | CFK 3.2 or later | Supports ZooKeeper to KRaft migration with dynamic quorum. |
Infrastructure requirements
CFK 3.2 or later installed.
Confluent Platform 7.9.5 or later Docker images required.
Static and dynamic quorums
KRaft supports two modes for configuring the controller quorum. Understanding the difference is crucial for selecting the right approach for your deployment.
Static quorum
In the original KRaft implementation, controller membership is fixed at cluster creation:
- The set of voters is defined by the static `controller.quorum.voters` property.
- All controllers must be specified upfront as voters.
- Controllers cannot be added or removed without offline cluster reconfiguration.
- All nodes must know about each other at bootstrap time.
Static quorum is optimized for basic single-region deployments where the controller set rarely changes.
Dynamic quorum
Dynamic quorum introduces a flexible membership model where controllers can join and leave the quorum online:
Configuration differences:
- Controllers use `controller.quorum.bootstrap.servers` to discover the current controller set, instead of a fixed `controller.quorum.voters` list.
- New controllers join as observers, which are read-only, and can be promoted to voters.
- Controllers can be added or removed online using `kafka-metadata-quorum` commands.
- The quorum maintains its own voter set in the metadata log.
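The distinction maps directly onto the controller's server properties. The following fragment is an illustrative sketch only; the host names and port are placeholder assumptions, not values generated by CFK:

```properties
# Static quorum: every voter is enumerated up front as id@host:port,
# and the list cannot change without offline reconfiguration.
controller.quorum.voters=100@controller-0:9074,101@controller-1:9074,102@controller-2:9074

# Dynamic quorum: only bootstrap endpoints are listed. The actual voter
# set lives in the metadata log and can be changed online.
controller.quorum.bootstrap.servers=controller-0:9074,controller-1:9074
```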
The following table summarizes the key differences:
| Aspect | Static Quorum | Dynamic Quorum |
|---|---|---|
| Configuration property | `controller.quorum.voters` | `controller.quorum.bootstrap.servers` |
| Membership changes | Requires offline reconfiguration | Online add or remove via CLI |
| Initial setup | All voters at startup | Single bootstrap voter + observers |
| Flexibility | Rigid | Highly flexible |
Quorum roles
In Dynamic KRaft, controllers can have different roles in the quorum.
- Voter
A controller that participates in the Raft consensus and can vote for leader election. Voters are the only controllers that count toward quorum majority. Voters actively participate in metadata replication and leader election.
- Observer
A controller that replicates metadata but does not vote. Observers follow the metadata log but are not part of the quorum calculation. Observers can be promoted to voters online without downtime. This role is useful for:
Gradually adding new controllers to the quorum.
Preparing standby controllers for failover scenarios.
- Bootstrap controller
The first controller that creates the quorum. The bootstrap controller formats its storage with `--standalone` mode and becomes the initial single-voter quorum. Other controllers join this bootstrap controller as observers.
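On a plain Apache Kafka installation, these roles correspond to how each node formats its storage. CFK performs the formatting automatically; the following `kafka-storage` invocations are shown for illustration only, with placeholder cluster ID and paths:

```shell
# Bootstrap controller: format as the initial single-voter quorum.
kafka-storage format --cluster-id <cluster-id> --standalone \
  --config /path/to/controller.properties

# Joining controllers: format without declaring initial voters, then
# start and follow the quorum as observers until promoted.
kafka-storage format --cluster-id <cluster-id> --no-initial-controllers \
  --config /path/to/controller.properties
```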
Limitation
CFK currently provisions static KRaft quorums by default. Dynamic quorum must be explicitly enabled through the `dynamicQuorumConfig` section of the KRaftController CR spec.
Deploy a single-region cluster with dynamic quorum
This section describes how to deploy a new KRaft cluster in a single region with dynamic quorum enabled. This is the recommended approach for greenfield deployments.
Step 1: Create bootstrap coordination ConfigMap
Dynamic quorum requires a ConfigMap to coordinate which pod becomes the bootstrap controller. This ConfigMap ensures that only one controller formats storage with --standalone mode.
Create the following ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: kraftcontroller-dynamic-quorum --- [1]
namespace: confluent --- [2]
data:
bootstrap-status: '{"bootstrap_formatted": false}' --- [3]
[1] The name of the ConfigMap. This name is referenced by the KRaft controller pods to coordinate bootstrap formatting.
[2] The namespace where the KRaft controllers are deployed must match the namespace of this ConfigMap.
[3] Required. The initial bootstrap status. Must be exactly as shown. The bootstrap controller will update this value from false to true after formatting storage.
Apply the ConfigMap:
kubectl apply -f kraftcontroller-bootstrap-configmap.yaml
The ConfigMap is required in the cluster that hosts the bootstrap pod.
Important
- After the cluster is running and bootstrapped, do not update this ConfigMap to set `bootstrap_formatted` back to `false`.
- The absence of this ConfigMap is treated the same as `bootstrap_formatted: true`.
Step 2: Create RBAC resources
The bootstrap controller needs permissions to update the ConfigMap to signal that bootstrap formatting is complete.
Create the following RBAC resources:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kraftcontroller-sa --- [1]
namespace: confluent --- [2]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: kraftcontroller-bootstrap-role
namespace: confluent
rules:
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["kraftcontroller-dynamic-quorum"] --- [3]
verbs: ["get", "update", "patch"] --- [4]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kraftcontroller-bootstrap-rolebinding
namespace: confluent
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kraftcontroller-bootstrap-role
subjects:
- kind: ServiceAccount
name: kraftcontroller-sa
namespace: confluent
[1] The service account used by KRaft controller pods.
[2] The namespace where the KRaft controllers are deployed.
[3] The ConfigMap name created in the previous step. This ensures the service account can only modify this specific ConfigMap.
[4] The permissions required to read and update the ConfigMap status.
Apply the RBAC resources:
kubectl apply -f kraftcontroller-rbac.yaml
Step 3: Deploy KRaft controllers
Create and configure a KRaftController CR with dynamic quorum enabled.
apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
name: kraftcontroller --- [1]
namespace: confluent --- [2]
annotations:
platform.confluent.io/broker-id-offset: "100" --- [3]
spec:
replicas: 3 --- [4]
dataVolumeCapacity: 10Gi --- [5]
image:
application: confluentinc/cp-server:7.9.5 --- [6]
init: confluentinc/confluent-init-container:3.2.0
dynamicQuorumConfig: --- [7]
enabled: true --- [8]
bootstrapPod: 0 --- [9]
podTemplate:
serviceAccountName: kraftcontroller-sa --- [10]
[1] The name of the KRaft controller.
[2] The namespace of the KRaft controller.
[3] The starting node ID for controllers. Controllers will use IDs starting from 100 (100, 101, 102, and higher). This helps avoid ID conflicts with broker node IDs.
[4] The number of controller replicas.
[5] The storage capacity for each controller. Size according to your expected metadata volume and retention requirements.
[6] Required. Use Confluent Platform 7.9.5 or later for dynamic quorum. Earlier 7.9.x versions have known issues.
[7] Required. The dynamic quorum configuration section.
[8] Required. Set to `true` to enable dynamic quorum. When enabled, CFK configures controllers to use `controller.quorum.bootstrap.servers` instead of `controller.quorum.voters`.
[9] Required. The ordinal of the pod that acts as the bootstrap controller. Typically, this is pod 0. Only this pod formats its storage with `--standalone` mode; the other pods join as observers.
[10] Required. The service account created in Step 2 with permissions to update the bootstrap ConfigMap.
Apply the KRaftController CR:
kubectl apply -f kraftcontroller.yaml
Note
The bootstrap controller (kraftcontroller-0) will format its storage and create a single-voter quorum. Additional controllers (kraftcontroller-1 and kraftcontroller-2) will join as observers and wait to be promoted.
Step 4: Verify quorum formation
After the controllers are running, verify that the bootstrap controller is the only voter and other controllers are observers.
Check that all pods are running:
kubectl get pods -n confluent -l app=kraftcontroller
Check the quorum status:
kubectl exec kraftcontroller-0 -n confluent -- \
kafka-metadata-quorum --bootstrap-controller localhost:9074 \
describe --replication
Expected output:
NodeId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
100 <offset> 0 <timestamp> <timestamp> Leader
101 <offset> 0 <timestamp> <timestamp> Observer
102 <offset> 0 <timestamp> <timestamp> Observer
This confirms:
- Node 100 (`kraftcontroller-0`) is the leader and only voter.
- Nodes 101 and 102 (`kraftcontroller-1` and `kraftcontroller-2`) are observers.
- All observers show `Lag: 0`, which means they are caught up and ready for promotion. If observers show significant lag, wait for them to catch up before promoting in the next step.
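If you want to gate promotion in a script, the lag check can be automated by filtering the `describe --replication` output. A minimal sketch, with sample output inlined for illustration (in a real cluster you would pipe in the `kubectl exec` output instead):

```shell
# Sample `kafka-metadata-quorum describe --replication` output; replace with
# the real command output in practice.
describe_output='NodeId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
100 5000 0 1700000000 1700000000 Leader
101 5000 0 1700000000 1700000000 Observer
102 4990 10 1700000000 1699999990 Observer'

# Collect the node IDs of observers whose Lag column is non-zero.
lagging=$(printf '%s\n' "$describe_output" \
  | awk '$6 == "Observer" && $3 != 0 { print $1 }')

if [ -n "$lagging" ]; then
  # Safe to promote only after this list clears.
  echo "observers still lagging: $lagging"
else
  echo "all observers caught up"
fi
```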
Step 5: Promote observers to voters
To create a full 3-controller quorum, promote the observers to voters. On Confluent Platform 8.2 or later with auto-join enabled, observers automatically promote themselves after they are caught up with the leader. You can skip manual promotion and proceed to verification.
For Confluent Platform 7.9.x or 8.x without auto-join, manually promote observers:
# Promote kraftcontroller-1
kubectl exec kraftcontroller-1 -n confluent -- \
kafka-metadata-quorum \
--bootstrap-controller kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local:9074 \
--command-config <properties-file> \
add-controller
# Promote kraftcontroller-2
kubectl exec kraftcontroller-2 -n confluent -- \
kafka-metadata-quorum \
--bootstrap-controller kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local:9074 \
--command-config <properties-file> \
add-controller
You must ensure the following when promoting observers:
- Always point `--bootstrap-controller` at an existing voter, not another observer.
- Wait for each promotion to complete before promoting the next controller.
- Run the `add-controller` command from the controller being promoted. The properties file (`kafka.properties`) contains the `node.id` and `log.dirs` for that specific controller pod, and `log.dirs` points to the file containing the `directory.id` for that pod. Running the command from a different pod results in an incorrect node ID or missing directory information.
Step 6: Verify all controllers are voters
Verify that all controllers are now voters with zero lag:
kubectl exec kraftcontroller-0 -n confluent -- \
kafka-metadata-quorum --bootstrap-controller localhost:9074 \
describe --replication
Expected output:
NodeId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
100 <offset> 0 <timestamp> <timestamp> Leader
101 <offset> 0 <timestamp> <timestamp> Follower
102 <offset> 0 <timestamp> <timestamp> Follower
All controllers should show:
Lag of 0, which means they are fully caught up.
Recent fetch and caught-up timestamps.
Status as Leader or Follower, indicating all are voters.
Step 7: Deploy Kafka brokers
After the KRaft controller quorum is healthy, deploy Kafka brokers.
For KRaft-enabled Kafka with dynamic quorum, add a kRaftController cluster reference in the dependencies section:
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
name: kafka
namespace: confluent
spec:
replicas: 3
dataVolumeCapacity: 100Gi
image:
application: confluentinc/cp-server:7.9.5
init: confluentinc/confluent-init-container:3.2.0
dependencies:
kRaftController:
clusterRef:
name: kraftcontroller --- [1]
namespace: confluent --- [2]
[1] The name of the KRaftController CR.
[2] The namespace of the KRaftController CR.
When dynamicQuorumConfig.enabled: true is set on the controllers, CFK automatically configures brokers to use controller.quorum.bootstrap.servers to discover the current quorum membership dynamically.
Apply the Kafka CR:
kubectl apply -f kafka.yaml
Verify the brokers are running:
kubectl get pods -n confluent -l app=kafka
Your single-region KRaft cluster with dynamic quorum is now deployed and operational.
Verify dynamic quorum deployment
After deploying or migrating to dynamic quorum, perform the following verification steps.
Check quorum status
Verify the quorum has a leader and all expected controllers:
kubectl exec kraftcontroller-0 -n confluent -- \
kafka-metadata-quorum --bootstrap-controller localhost:9074 \
describe --status
Expected output:
ClusterId: <cluster-id>
LeaderId: <node-id>
LeaderEpoch: <epoch>
HighWatermark: <offset>
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 0
CurrentVoters: [100,101,102]
CurrentObservers: []
Verify:
- `LeaderId` is one of your controller node IDs.
- `CurrentVoters` includes all expected controllers.
- `CurrentObservers` is empty, meaning all controllers have been promoted.
- `MaxFollowerLag` is 0 or very small.
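This status check can also be scripted. A minimal sketch with sample output inlined for illustration (pipe in the real `kubectl exec` output in practice):

```shell
# Sample `kafka-metadata-quorum describe --status` output; replace with the
# real command output in practice.
status_output='ClusterId: abc123
LeaderId: 100
LeaderEpoch: 5
HighWatermark: 5000
MaxFollowerLag: 0
MaxFollowerLagTimeMs: 0
CurrentVoters: [100,101,102]
CurrentObservers: []'

# Extract the CurrentObservers value; "[]" means every controller is a voter.
observers=$(printf '%s\n' "$status_output" \
  | awk -F': ' '$1 == "CurrentObservers" { print $2 }')

if [ "$observers" = "[]" ]; then
  echo "quorum complete: no observers remain"
else
  echo "controllers still pending promotion: $observers"
fi
```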
Check replication status
Verify all controllers are replicating metadata with zero lag:
kubectl exec kraftcontroller-0 -n confluent -- \
kafka-metadata-quorum --bootstrap-controller localhost:9074 \
describe --replication
Expected output:
NodeId LogEndOffset Lag LastFetchTimestamp LastCaughtUpTimestamp Status
100 1234 0 <timestamp> <timestamp> Leader
101 1234 0 <timestamp> <timestamp> Follower
102 1234 0 <timestamp> <timestamp> Follower
Verify:
- All controllers have the same `LogEndOffset`, or are within a few offsets.
- All controllers have a `Lag` of 0.
- `LastFetchTimestamp` and `LastCaughtUpTimestamp` are recent.
Check kraft.version feature flag
Verify that dynamic quorum is enabled by checking the kraft.version feature:
kubectl exec kraftcontroller-0 -n confluent -- \
kafka-features --bootstrap-controller localhost:9074 describe
Look for the kraft.version feature in the output:
Feature: kraft.version SupportedMinVersion: 0 SupportedMaxVersion: 1 FinalizedVersionLevel: 1 Epoch: 1
The kraft.version FinalizedVersionLevel determines the quorum mode:
- Version 0 = static quorum (`controller.quorum.voters`)
- Version 1 = dynamic quorum (`controller.quorum.bootstrap.servers`)
If FinalizedVersionLevel: 1, dynamic quorum is active.
Check controller configuration
Verify that controllers are using controller.quorum.bootstrap.servers:
kubectl exec kraftcontroller-0 -n confluent -- \
grep 'controller.quorum.bootstrap.servers' /opt/confluentinc/etc/kafka/kafka.properties
Expected output:
controller.quorum.bootstrap.servers=kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local:9074,...
This confirms dynamic quorum configuration.
Troubleshooting
This section describes common issues when deploying and operating dynamic KRaft quorum.
Issue: Bootstrap pod stuck in init container
Symptom: kraftcontroller-0 is in Init:0/1 state indefinitely. The pod does not progress to the running state.
Cause: The bootstrap controller cannot update the ConfigMap due to insufficient RBAC permissions.
Solution: Verify RBAC permissions are correctly configured:
# Check if the service account has permission to update the ConfigMap
kubectl auth can-i update configmaps \
--as=system:serviceaccount:confluent:kraftcontroller-sa \
-n confluent
# Should output: yes
# Verify Role and RoleBinding exist
kubectl get role kraftcontroller-bootstrap-role -n confluent
kubectl get rolebinding kraftcontroller-bootstrap-rolebinding -n confluent
If permissions are missing, re-apply the RBAC resources from Step 2 in Deploy a single-region cluster with dynamic quorum.
Check the init container logs for more details:
kubectl logs kraftcontroller-0 -n confluent -c config-init-container
Issue: Observers not auto-promoting (Confluent Platform 8.2 or later)
Symptom: Controllers remain as observers indefinitely, even though auto-join should be enabled in Confluent Platform 8.2 or later.
Cause: The auto-join feature is not enabled. Check the properties file to confirm that controller.quorum.auto.join.enable is defined and set to true. If it is not, add it through configOverrides.
Solution: Check the replication status to see if observers are caught up:
kubectl exec kraftcontroller-0 -n confluent -- \
kafka-metadata-quorum --bootstrap-controller localhost:9074 \
describe --replication
Check the Lag column for observers:
- If `Lag > 0`, observers are still replicating metadata. Wait for replication to catch up (`Lag = 0`) before expecting auto-promotion.
- If `Lag = 0` and observers still do not promote, auto-join may not be enabled. Manually promote observers using the `add-controller` command as shown in Deploy a single-region cluster with dynamic quorum.
Note
CFK does not currently expose a configuration toggle to enable or disable auto-join. CFK attempts to detect the Confluent Platform version from the image to automatically enable auto-join for 8.2 or later. Version detection can fail if you use custom Docker images or images without standard version tags.
Issue: Observer promotion fails during migration
Symptom: The add-controller command fails with an error message about version compatibility or feature flags.
Cause: The inter-broker protocol (IBP) version is not set to 3.9, which is required for dynamic quorum (kraft.version=1).
Solution: Verify the IBP version is set correctly by checking the properties file:
kubectl exec kraftcontroller-0 -n confluent -- \
grep 'inter.broker.protocol.version' /opt/confluentinc/etc/kafka/kafka.properties
Expected output:
inter.broker.protocol.version=3.9
If the IBP version is missing or shows a different value, add or update the annotation:
kubectl annotate kafka kafka \
platform.confluent.io/kraft-migration-ibp-version=3.9 \
--overwrite \
-n confluent
After updating the annotation, restart the migration or retry observer promotion.
Known issues
Be aware of the following known issues when using dynamic KRaft quorum.
Confluent Platform version issues
Issue: Confluent Platform versions 7.9.0 through 7.9.4 have known issues with kraft.version conversion and dynamic quorum stability.
Impact: Dynamic quorum deployments or migrations may fail or behave unpredictably.
Solution: Always use Confluent Platform 7.9.5 or later, or any 8.x release, for dynamic quorum deployments. Do not use versions 7.9.0 through 7.9.4 in production.
LoadBalancer configuration issue
Issue: When using advertisedListenersEnabled: true with LoadBalancer on KRaft controllers within a single cluster/namespace, controllers cannot connect to themselves using their external addresses.
Impact: Controller communication fails, quorum cannot form.
Root cause: This is not a hairpin NAT issue. The networking layer works correctly, but KRaft internal logic fails when using advertised external addresses for same-cluster communication.
Workaround:
Do not set `advertisedListenersEnabled: true` on KRaft controllers for single-region deployments.
Auto-join version detection
Issue: CFK attempts to detect the Confluent Platform version from the Docker image to automatically enable the auto-join feature for Confluent Platform 8.2 or later. Version detection can fail if you use custom Docker images without standard version tags.
Impact: Auto-join may not be enabled even when using Confluent Platform 8.2 or later, requiring manual observer promotion.
Workaround: Explicitly enable auto-join by setting controller.quorum.auto.join.enable=true through config overrides.
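For example, the override could be applied on the KRaftController CR through its config override section. This fragment is a sketch, assuming the standard CFK `configOverrides.server` mechanism:

```yaml
# KRaftController CR fragment: explicitly enable auto-join when CFK
# version detection cannot determine the Confluent Platform version.
spec:
  configOverrides:
    server:
      - controller.quorum.auto.join.enable=true
```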