Optimizing for Latency
Apache Kafka® provides very low end-to-end latency for large volumes of data, meaning the
amount of time between a record being produced to Kafka and fetched by the consumer is short.
If you’re using a dedicated cluster, adding additional CKUs can reduce latency. Other relevant
variables that affect end-to-end latency include the implementation of client apps, partitioning
and keying strategy, produce and consume patterns, network latency and QoS, and more. Clusters in
Confluent Cloud are already configured for low latency and there are many ways to
configure your client applications to reduce latency.
This Confluent blog post is
a comprehensive guide to measuring and improving latency for your streaming applications, and
details more of the tradeoffs between latency, throughput, durability, and availability.
Key Concepts and Configurations
Many of the Kafka configuration parameters discussed on the
Optimizing for Throughput page have default settings that optimize for
latency. Thus, you generally don’t need to adjust those configuration
parameters. This page reviews the key parameters and explains how they affect latency.
The values for some of the configuration parameters in this page depend on
other factors, such as the average message size and number of partitions.
These can differ greatly from environment to environment.
Some configuration parameters have a range of values, so benchmarking
helps to validate the configuration for your application and environment.
Number of Partitions
The Confluent guidelines
show you how to choose the number of partitions. Since a partition is a unit of
parallelism in Kafka, an increased number of partitions may increase throughput.
There is a trade-off, however: an increased number of partitions can also increase
latency. It may take longer to replicate the many partitions shared between each
pair of brokers, and consequently longer for messages to be considered
committed. No message can be consumed until it is committed, so this can
ultimately increase end-to-end latency.
Producer Batching
Producers automatically batch messages, that is, they collect messages to
send together. The less time spent waiting for those batches to fill, the lower
the latency when producing data to Confluent Cloud. By default, the producer
is tuned for low latency and the configuration parameter
linger.ms is set to
0: the producer sends data as soon as it has data to send. This does not mean
batching is disabled; messages are always sent in batches, but sometimes
a batch may have only one message (unless messages are passed to the producer
faster than the producer can send them).
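As a sketch, the low-latency default and a throughput-leaning alternative look like this in a producer properties file (the 10 ms value is only an illustration):

```properties
# Low-latency default: send a batch as soon as data is available
linger.ms=0
# Trading latency for throughput: wait up to 10 ms for batches to fill
#linger.ms=10
```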
Compression
Consider whether you need to enable compression. Enabling compression typically
requires more CPU cycles but reduces network bandwidth usage. Disabling
compression, on the other hand, spares the CPU cycles but increases network
bandwidth usage. Depending on the compression performance, you may consider
leaving compression disabled with
compression.type=none to spare the CPU
cycles, although a good compression codec may potentially reduce latency as well.
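In a producer properties file, this choice might be sketched as follows (lz4 is shown only as an example of a fast codec; benchmark in your environment):

```properties
# Spare CPU cycles by leaving compression disabled (the default)
compression.type=none
# A fast codec such as lz4 may reduce latency on bandwidth-constrained networks
#compression.type=lz4
```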
Producer acks
You can tune the number of acknowledgments the producer requires the leader
broker in the Confluent Cloud cluster to have received before it considers a
request complete. This acknowledgment to the producer differs from when a message
is considered committed.
The sooner the leader broker responds, the sooner the producer can continue
sending the next batch of messages, which reduces producer latency. You can set
the number of required acknowledgments with the producer acks configuration
parameter. By default, acks=1, which means the leader broker responds to the
producer before all replicas have received the message. Depending on your
application requirements, you can set
acks=0 so that the producer won’t wait
for a response for a producer request from the broker, but then messages can
potentially get lost without the producer knowing.
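The acknowledgment settings discussed above, as a producer properties sketch:

```properties
# Default: the leader responds without waiting for all replicas
acks=1
# Lowest producer latency: do not wait for any broker response
# (messages can be lost without the producer knowing)
#acks=0
# Highest durability, higher latency: wait for all in-sync replicas
#acks=all
```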
Consumer Fetching
Similar to batching on the producer side, you can tune consumers for
lower latency by adjusting how much data a consumer gets from each fetch from
the leader broker in Confluent Cloud. By default, the consumer configuration parameter
fetch.min.bytes is set to
1, which means that fetch requests are
answered as soon as a single byte of data is available, or when the fetch request
times out waiting for data to arrive, as set by the configuration parameter
fetch.max.wait.ms. Looking at these two configuration parameters together
lets you reason through the trade-off between the size of a fetch request
(fetch.min.bytes) and the age of a fetch request (fetch.max.wait.ms).
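As a consumer properties sketch showing the low-latency defaults:

```properties
# Respond to fetch requests as soon as any data is available
fetch.min.bytes=1
# Upper bound on how long the broker waits before answering a fetch
fetch.max.wait.ms=500
```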
Using Local Tables
If you have a Kafka event streaming application or use
ksqlDB, there are some performance enhancements you can
make within the application. For cases where you must perform table lookups at a
large scale and with a low processing latency, you can use local stream
processing. A popular method is to use Kafka Connect to make remote databases
locally available to Kafka. You can then leverage the Kafka Streams API or ksqlDB to
perform fast and efficient local joins of such tables and streams,
rather than requiring the application to make a query to a remote database over
the network for each record. You can track the latest state of each table in a
local state store, which reduces the processing latency and the load of remote
databases when doing such joins.
Topology Optimization
Kafka Streams applications are built on processor topologies, graphs of stream
processor nodes that act on partitioned data for parallel processing.
Depending on the application, there may be conservative but unnecessary data
shuffling based on repartition topics, which would not result in any correctness
issues but can introduce performance penalties. To avoid performance penalties,
you may enable topology optimizations for
your event streaming applications by setting the configuration parameter
topology.optimization. Enabling topology optimizations may reduce the amount
of reshuffled streams that are stored and piped through repartition topics.
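For a Kafka Streams application, this parameter is set in the application’s configuration; the value all enables all available optimizations:

```properties
# Enable all available topology optimizations
topology.optimization=all
```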
Summary of Configurations for Optimizing Latency
- linger.ms=0 (default 0)
- compression.type=none (default none, meaning no compression)
- acks=1 (default 1)
- fetch.min.bytes=1 (default 1)
- Streams applications have embedded producers and consumers, so also
  check those configuration recommendations