Kafka Producer Design
The producer writes directly to an Apache Kafka® broker. The sections in this topic describe how producers are designed to enable load balancing and asynchronous send operations.
Producer load balancing
The client producer controls which partition it publishes messages to. The producer also sends data directly to the broker that is the leader for a partition, without any intervening routing tier. To make this possible, every Kafka node provides metadata that identifies which brokers are alive and which brokers are the leaders for the partitions of a topic. The producer uses this metadata to route each request to the correct broker.
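The routing step can be pictured with a small sketch. It assumes a hypothetical client-side cache (`ClusterMetadata`, `PartitionInfo`) holding the live brokers and the leader of each partition, and groups outgoing records by leader so that each group can go straight to its broker. The class names, broker addresses, and the `group_by_leader` helper are illustrative only; they are not part of any Kafka client API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    topic: str
    partition: int
    leader: str  # address of the broker that leads this partition (hypothetical)


class ClusterMetadata:
    """Hypothetical client-side cache of the metadata a Kafka node returns."""

    def __init__(self, live_brokers, partitions):
        self.live_brokers = set(live_brokers)
        self._leaders = {(p.topic, p.partition): p.leader for p in partitions}

    def leader_for(self, topic, partition):
        leader = self._leaders.get((topic, partition))
        if leader is None or leader not in self.live_brokers:
            # A real client would refresh its metadata and retry at this point.
            raise LookupError(f"no live leader for {topic}-{partition}")
        return leader


def group_by_leader(metadata, records):
    """Group (topic, partition, value) records by the broker leading their partition.

    Each group is sent directly to that broker; there is no routing tier in between.
    """
    requests = defaultdict(list)
    for topic, partition, value in records:
        requests[metadata.leader_for(topic, partition)].append((topic, partition, value))
    return dict(requests)


metadata = ClusterMetadata(
    live_brokers=["broker-1:9092", "broker-2:9092"],
    partitions=[
        PartitionInfo("orders", 0, "broker-1:9092"),
        PartitionInfo("orders", 1, "broker-2:9092"),
    ],
)
print(group_by_leader(metadata, [("orders", 0, "a"), ("orders", 1, "b"), ("orders", 0, "c")]))
```

Records for partition 0 end up in the request for `broker-1:9092` and the record for partition 1 in the request for `broker-2:9092`; a partition whose leader is not alive raises an error instead of being sent through some other broker.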
Load balancing can be random, or you can apply a semantic partitioning function. You can specify a key to partition by, and Kafka hashes that key to select a partition. You can also override the partitioning function if needed.
For example, if the specified key is a user ID, then all data for a given user is sent to the same partition. This in turn enables consumers to make locality assumptions about their consumption.
This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers.
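The following is a minimal sketch of the three partitioning options described above: an explicitly chosen partition, a key hash modulo the partition count, and a random choice when there is no key. It assumes `zlib.crc32` as a stand-in hash. Kafka's Java client hashes keys with murmur2, so the partition numbers here will not match a real cluster. The names `hash_partitioner` and `choose_partition` are hypothetical, and the `partitioner` argument stands in for overriding the partitioning function.

```python
import random
import zlib
from typing import Callable, Optional

# A partitioner maps (key, number of partitions) to a partition index.
Partitioner = Callable[[Optional[bytes], int], int]


def hash_partitioner(key: Optional[bytes], num_partitions: int) -> int:
    """Key-hash partitioning, with a random partition for records that have no key."""
    if key is None:
        return random.randrange(num_partitions)
    # crc32 is a stand-in hash (stable across processes); Kafka's Java client uses murmur2.
    return zlib.crc32(key) % num_partitions


def choose_partition(
    key: Optional[bytes],
    num_partitions: int,
    partition: Optional[int] = None,
    partitioner: Partitioner = hash_partitioner,
) -> int:
    """Pick a partition for one record.

    An explicitly requested partition wins; otherwise the (possibly overridden)
    partitioner decides.
    """
    if partition is not None:
        if not 0 <= partition < num_partitions:
            raise ValueError(f"partition {partition} out of range")
        return partition
    return partitioner(key, num_partitions)


for user in [b"user-42", b"user-7", b"user-42"]:
    print(user, choose_partition(user, num_partitions=6))
```

Both `b"user-42"` records land on the same partition, which is what lets a consumer of that partition see all of one user's data. Passing a different `partitioner` (for example `lambda key, n: 0`) shows how the default hashing could be replaced.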
Batching improves efficiency. To enable batching, the Kafka producer accumulates data in memory and sends larger batches in a single request. Batching can be configured:
- By batch size (example: 64 KB)
- By wait time (example: 10 ms)
Batching lets more bytes accumulate before a send, which results in fewer, larger I/O operations on the servers. It also provides a mechanism to trade a small amount of additional latency for better throughput.
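The two triggers above can be illustrated with a simplified accumulator that sends a batch as soon as it reaches a byte threshold, or once its oldest record has waited long enough, whichever comes first. In the Java client, the corresponding producer settings are `batch.size` (bytes) and `linger.ms`. The `BatchAccumulator` class below is a hypothetical sketch, not the client's implementation: it keeps a single buffer, whereas the real producer keeps one batch per partition. It takes an injectable clock so the timing can be simulated. The defaults mirror the examples in the list (64 KB and 10 ms).

```python
import time
from typing import Callable, List, Optional


class BatchAccumulator:
    """Hypothetical buffer that flushes on batch size or on wait time.

    A batch is sent when its buffered bytes reach batch_size_bytes, or when
    linger_seconds have passed since its first record was appended.
    """

    def __init__(
        self,
        send: Callable[[List[bytes]], None],
        batch_size_bytes: int = 64 * 1024,  # 64 KB, as in the example above
        linger_seconds: float = 0.010,      # 10 ms, as in the example above
        clock: Callable[[], float] = time.monotonic,
    ):
        self._send = send
        self._batch_size = batch_size_bytes
        self._linger = linger_seconds
        self._clock = clock
        self._records: List[bytes] = []
        self._bytes = 0
        self._first_append: Optional[float] = None

    def append(self, record: bytes) -> None:
        if not self._records:
            self._first_append = self._clock()  # start the wait-time clock
        self._records.append(record)
        self._bytes += len(record)
        if self._bytes >= self._batch_size:
            self.flush()  # size trigger

    def poll(self) -> None:
        """Called periodically by a sender loop; applies the wait-time trigger."""
        if self._records and self._clock() - self._first_append >= self._linger:
            self.flush()

    def flush(self) -> None:
        if self._records:
            self._send(self._records)  # one request carries the whole batch
            self._records = []
            self._bytes = 0
            self._first_append = None


# Simulated clock so the example is deterministic.
sent = []
now = [0.0]
acc = BatchAccumulator(sent.append, batch_size_bytes=10, linger_seconds=0.010, clock=lambda: now[0])
acc.append(b"hello")  # 5 bytes: buffered
acc.append(b"world")  # 10 bytes: size threshold reached, batch sent
acc.append(b"late")   # buffered, waiting since t = 0.0
now[0] = 0.005
acc.poll()            # only 5 ms waited: still buffered
now[0] = 0.015
acc.poll()            # 15 ms waited: batch sent
print(sent)
```

The first batch leaves because it filled up; the second leaves because its record waited past the linger time. A larger threshold or a longer linger gives bigger batches at the cost of extra latency, which is the throughput trade-off described above.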