KRaft Overview

Previously, running Confluent Platform always meant running ZooKeeper for metadata storage as well. Starting with Confluent Platform version 7.4, KRaft (pronounced craft) mode is generally available and is the default option for metadata management for new Apache Kafka® clusters.

KRaft mode provides a quorum controller service that uses an event-based variant of the Raft consensus protocol running in Kafka. This simplifies Kafka’s architecture because metadata is handled by the quorum controller service, rather than splitting the responsibility between ZooKeeper and Kafka.

The following image provides a simple illustration of the difference between running with ZooKeeper and KRaft for managing metadata.


To learn more about KRaft mode and ZooKeeper-less Kafka, see A First Glimpse of a Kafka without Zookeeper.

Why move to KRaft?

Moving to KRaft for metadata storage simplifies Kafka. ZooKeeper is a separate system, with its own configuration file syntax, management tools, and deployment patterns, which makes deploying Kafka more complicated for system administrators. Specifically:

  • Kafka performs better in KRaft mode than with ZooKeeper, and KRaft mode supports millions of partitions.
  • Metadata failover is near-instantaneous with KRaft.
  • KRaft enables Kafka to have a single security system.

The controller quorum

The controller nodes form a Raft quorum that manages the metadata log. This log contains information about each change to the cluster metadata. Everything that is currently stored in ZooKeeper, such as topics, partitions, ISRs, configurations, and so on, is stored in this log.

Using the Raft consensus protocol, the controller nodes elect one of their own as leader, without relying on any external system. The leader of the metadata log is called the active controller. The active controller handles all RPCs made from the brokers. The follower controllers replicate the data that is written to the active controller, and serve as hot standbys if the active controller fails. Because the controllers all track the latest state, controller failover does not require a lengthy reloading period in which all of the state data is transferred to the new controller.

Just like ZooKeeper, Raft requires a majority of nodes to be running. For example, a three-node controller cluster can survive one failure. A five-node controller cluster can survive two failures, and so on.
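
The majority rule above comes down to simple arithmetic. The following Python sketch is illustrative only; the function names are hypothetical and are not part of Kafka or Confluent Platform.

```python
def majority(voters: int) -> int:
    """Smallest number of controllers that forms a majority of the quorum."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of controllers that can fail while a majority keeps running."""
    return voters - majority(voters)


if __name__ == "__main__":
    # Print quorum size, required majority, and tolerated failures.
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Under this rule a four-node quorum still tolerates only one failure, the same as a three-node quorum, which is why odd controller counts are the usual choice.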

Periodically, the controllers write a snapshot of the metadata to disk. This is conceptually similar to compaction, but the state is read from memory rather than by re-reading the log from disk.
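
To make the relationship between the metadata log and a snapshot concrete, here is a toy Python model. It is a sketch under stated assumptions, not how Kafka implements snapshots: the record shape `(op, key, value)`, the JSON snapshot format, and all function names are hypothetical. It shows only that a snapshot taken from in-memory state, plus the log records written after it, yields the same state as replaying the whole log.

```python
import json
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: (operation, key, value).
Record = Tuple[str, str, str]


def apply_record(state: Dict[str, str], record: Record) -> None:
    """Apply one metadata change to the in-memory state."""
    op, key, value = record
    if op == "set":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(records: List[Record], base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build state by applying records in order on top of an optional base."""
    state = dict(base or {})
    for record in records:
        apply_record(state, record)
    return state


def take_snapshot(state: Dict[str, str], last_offset: int) -> str:
    """Serialize the in-memory state; the log itself is not re-read."""
    return json.dumps({"last_offset": last_offset, "state": state}, sort_keys=True)


def restore(snapshot: str, log: List[Record]) -> Dict[str, str]:
    """Load a snapshot, then replay only the records written after it."""
    data = json.loads(snapshot)
    tail = log[data["last_offset"] + 1:]
    return replay(tail, data["state"])


log: List[Record] = [
    ("set", "topic:orders", "partitions=3"),
    ("set", "topic:payments", "partitions=6"),
    ("set", "topic:orders", "partitions=12"),
    ("delete", "topic:payments", ""),
]

# Snapshot covers offsets 0 through 2; offset 3 is replayed from the log.
snapshot = take_snapshot(replay(log[:3]), last_offset=2)
assert restore(snapshot, log) == replay(log)
```

In this model the snapshot holds only the latest value for each key, which is the sense in which it resembles compaction.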

Configuring Confluent Platform with KRaft

For details on how to configure Confluent Platform with KRaft, see Configure KRaft in Production.

Client configurations are not impacted by Confluent Platform moving to KRaft to manage metadata.

Limitations and known issues

  • Currently, migration from ZooKeeper to KRaft is not supported for Confluent Platform. You should choose how metadata is managed when you create a Kafka cluster.
  • Combined mode, where a broker is also a controller, is not currently supported for production workloads. There are key security and feature gaps between combined mode and isolated mode in Confluent Platform.
  • The Confluent Platform Multi-Region Clusters feature currently requires ZooKeeper, and so does not currently work with KRaft.
  • JBOD (just a bunch of disks) is not supported in KRaft mode, meaning you can configure only one directory for the log.dirs configuration.
  • Cluster Linking for Confluent Platform between a source cluster running Confluent Platform 7.0.x or earlier and a destination cluster running in KRaft mode is not supported. Link creation may succeed, but the connection will ultimately fail. To work around this issue, make sure the source cluster is running Confluent Platform version 7.1.0 or later. If you have links from a Confluent Platform source cluster to a Confluent Cloud destination cluster, you must upgrade your source clusters to Confluent Platform 7.1.0 or later to avoid this issue.
  • Authentication using delegation tokens or SASL SCRAM (Salted Challenge Response Authentication Mechanism) is currently not supported.
  • There is currently no support for quorum reconfiguration, meaning you cannot add more KRaft controllers, or remove existing ones.
  • You cannot currently use Topic ACL Authorizer for Schema Registry with Confluent Platform in KRaft mode. As an alternative, you can use Schema Registry ACL Authorizer or Configuring Role-Based Access Control for Schema Registry.
  • Currently, Confluent Control Center and Health+ report KRaft controllers as brokers. As a result, alerts may not function as expected.