Choose and Change the Partition Count in Kafka
Apache Kafka® is an open-source distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale.
A topic partition is the unit of parallelism in Apache Kafka®.
For both producers and brokers, writes to different partitions can be done in parallel. This lets expensive operations such as compression use more of the available hardware resources.
For consumers, you can have up to one active consumer instance per partition within a consumer group; any additional consumers in the group sit idle. As a general rule, the more partitions a topic has, the higher the throughput it can achieve.
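The consumer limit described above can be illustrated with a minimal sketch. It spreads partitions over the consumers of one group in round-robin order, which is a simplified stand-in for Kafka's real partition assignors and not their actual implementation. The function name `assign_partitions` and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partition ids over consumers in round-robin order.

    Simplified illustration only; Kafka's own assignors are more involved.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical group: 4 consumers reading a topic with 3 partitions.
assignment = assign_partitions(3, ["c1", "c2", "c3", "c4"])
idle = [c for c, parts in assignment.items() if not parts]
print(assignment)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
print(idle)        # ['c4']
```

With more consumers than partitions, the extra consumer receives nothing to read, which is why adding consumers beyond the partition count does not add throughput.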
Partition formula
To determine the minimum number of partitions, you can use the following formula:
max(t/p, t/c) partitions, where:
- t = target throughput
- p = measured throughput on a single partition for production
- c = measured throughput on a single partition for consumption
Note that the per-partition throughput a producer can achieve depends on configurations such as the batch size, compression codec, type of acknowledgment, replication factor, and more.
Consumer throughput is often application dependent since it corresponds to how fast the consumer logic can process each message.
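As a worked example of the formula, the following sketch computes max(t/p, t/c) and rounds up to a whole number of partitions. The function name `min_partitions` and the throughput figures are hypothetical; t, p, and c must all be expressed in the same unit (here MB/s), and p and c should come from your own measurements.

```python
import math


def min_partitions(target: float, per_partition_produce: float,
                   per_partition_consume: float) -> int:
    """Return max(t/p, t/c), rounded up to a whole number of partitions."""
    if min(target, per_partition_produce, per_partition_consume) <= 0:
        raise ValueError("all throughput values must be positive")
    needed_for_producers = math.ceil(target / per_partition_produce)
    needed_for_consumers = math.ceil(target / per_partition_consume)
    return max(needed_for_producers, needed_for_consumers)


# Hypothetical figures in MB/s: t = 100, p = 10, c = 20.
print(min_partitions(target=100, per_partition_produce=10, per_partition_consume=20))  # 10
```

In this example the producer side is the bottleneck: 100/10 = 10 partitions are needed for production, while 100/20 = 5 would be enough for consumption.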
Partitions and message keys
You can increase the number of partitions over time, but be careful if the messages that are produced contain keys. When a producer publishes a keyed message, the message is deterministically mapped to a partition based on the hash of the key. This guarantees that messages with the same key are always routed to the same partition, which can be important for some applications. Remember that messages within a partition are always delivered to the consumer in order. If the number of partitions changes, the key-to-partition mapping changes too: new messages for a key can land in a different partition than earlier messages for that key, so per-key ordering may no longer hold. To avoid this situation, you can purposefully over-partition, meaning you create partitions based on the expected future growth of your application instead of the current need. [1]
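The effect of a partition count change on keyed messages can be sketched as follows. This is an illustration under stated assumptions: it maps a key to a partition as hash(key) modulo the partition count, and it uses `zlib.crc32` as a stand-in hash so that the example stays self-contained. It does not reproduce the hash function used by Kafka's default partitioner, so the partition numbers it prints will differ from a real cluster. The key names and the helper `partition_for` are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int, hash_fn=zlib.crc32) -> int:
    """Map a key to a partition as hash(key) mod partition count.

    zlib.crc32 is a stand-in hash for illustration, not Kafka's own.
    """
    return hash_fn(key) % num_partitions


# Hypothetical keys for a topic that grows from 6 to 8 partitions.
keys = [f"customer-{i}".encode() for i in range(100)]
before = {k: partition_for(k, 6) for k in keys}
after = {k: partition_for(k, 8) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} keys map to a different partition "
      "after growing from 6 to 8 partitions")
```

Any key in `moved` would have its older messages in one partition and its newer messages in another, which is the situation that over-partitioning up front is meant to avoid.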
Change the number of partitions
You can increase the number of partitions for an existing topic on a per-topic basis. Use the kafka-topics.sh CLI tool with a command similar to the following:
bin/kafka-topics.sh --bootstrap-server <broker:port> --topic <topic-name> --alter --partitions <number>
For more information on how to use the kafka-topics tool, see kafka-topics.sh and Change the retention value for a topic.
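If you script this step, a small guard can help, because Kafka only allows the partition count of a topic to be increased, not reduced. The following sketch builds the same command line as shown above and rejects a new count that is not larger than the current one. The helper `alter_partitions_command`, the broker address, and the topic name are hypothetical, and the current partition count is assumed to be known to the caller (for example from the output of `--describe`). The sketch only prints the command; it does not run it.

```python
from typing import List


def alter_partitions_command(bootstrap_server: str, topic: str,
                             current: int, new: int,
                             tool: str = "bin/kafka-topics.sh") -> List[str]:
    """Build the kafka-topics.sh argument list for increasing partitions."""
    if new <= current:
        # The partition count of an existing topic can only grow.
        raise ValueError(
            f"new partition count ({new}) must be greater than current ({current})"
        )
    return [
        tool,
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--alter",
        "--partitions", str(new),
    ]


# Hypothetical broker address and topic name.
cmd = alter_partitions_command("localhost:9092", "orders", current=6, new=12)
print(" ".join(cmd))
```

To execute the command from Python, the returned list could be passed to `subprocess.run(cmd, check=True)`.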
Tip
A ZooKeeper-based cluster can safely have up to 200,000 partitions. In KRaft mode, which has been production-ready since Apache Kafka 3.3, a Kafka cluster can potentially support up to 2 million partitions.