Kafka Producer Design

A Apache Kafka® producer works by writing directly to a Kafka broker. The sections in this topic describe how producers are designed to enable load balancing and asynchronous send operations.

Producer load balancing

The client producer controls which partition it publishes messages to. Also, the producer sends data directly to the broker that is the partition leader without any intervening routing tier. To help a producer do this, all Kafka nodes provide metadata that specifies the brokers that are alive, and the brokers that are leaders for the partitions of a topic. This enables a producer to appropriately route its requests.

Load balancing can be random, or you can apply a semantic partitioning function. You can specify a key to partition by, and Kafka uses that key to hash to a partition.

There is also an option to override the partition function if needed. For example, if the specified key is user id then all data for a given user would be sent to the same partition. This in turn will enables consumers to make locality assumptions about their consumption. This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers.

Batching

Batching enables efficiency, and to enable batching the Kafka producer tries to accumulate data in memory, and sends larger batches in a single request. The batching can be configured:

  • By batch size (example: 64 kb). Use the batch.size property to set this.
  • By wait time (example: 10 ms). Use the linger.ms property to set this.

This batching enable the accumulation of more bytes to send, and a few larger I/O operations on the servers. It also provides a mechanism to trade a small amount of additional latency for better throughput.

Additional reading

Note

This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.