Optimizing for Latency¶
Apache Kafka® provides very low end-to-end latency for large volumes of data: the time between a record being produced to Kafka and that record being fetched by a consumer is short.
If you’re using a dedicated cluster, adding CKUs can reduce latency. Other variables that affect end-to-end latency include the implementation of client applications, partitioning and keying strategy, produce and consume patterns, network latency and QoS, and more. Clusters in Confluent Cloud are already configured for low latency, and there are many ways to configure your client applications to reduce latency further.
This Confluent blog post is a comprehensive guide to measuring and improving latency for your streaming applications, and this whitepaper details more of the tradeoffs between latency, throughput, durability, and availability.
Key Concepts and Configurations¶
Many of the Kafka configuration parameters discussed on the Optimizing for Throughput page have default settings that optimize for latency. Thus, you generally don’t need to adjust those configuration parameters. This page includes a review of the key parameters to understand how they work.
The values for some of the configuration parameters in this page depend on other factors, such as the average message size and number of partitions. These can differ greatly from environment to environment.
Some configuration parameters have a range of values, so benchmarking helps to validate the configuration for your application and environment.
Number of Partitions¶
The Confluent guidelines show you how to choose the number of partitions. Since a partition is a unit of parallelism in Kafka, an increased number of partitions may increase throughput.
There is a trade-off for an increased number of partitions, and that’s increased latency. It may take longer to replicate the many partitions shared between each pair of brokers, and consequently longer for messages to be considered committed. No message can be consumed until it is committed, so this can ultimately increase end-to-end latency.
Batching Messages¶
Producers automatically batch messages, which means they collect messages to send together. The less time spent waiting for batches to fill, the lower the latency of producing data to Confluent Cloud. By default, the producer is tuned for low latency and the configuration parameter linger.ms is set to 0: the producer sends data as soon as it has data to send. In this case, batching is not disabled (messages are always sent in batches), but a batch may contain only one message, unless messages are passed to the producer faster than the producer can send them.
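As an illustrative sketch, the default low-latency producer settings look like the following (the batch.size value shown is the client default and is included only for context):

```properties
# Send as soon as there is data to send (default)
linger.ms=0
# Upper bound on batch size in bytes; with linger.ms=0 this is rarely reached
batch.size=16384
```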
Compression¶
Consider whether you need to enable compression. Enabling compression typically requires more CPU cycles but reduces network bandwidth usage. Disabling compression, on the other hand, spares the CPU cycles but increases network bandwidth usage. Depending on the compression performance, you may consider leaving compression disabled with compression.type=none to spare the CPU cycles, although a good compression codec may potentially reduce latency as well.
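For example, the default producer setting leaves compression off; a fast codec such as lz4 is shown only as one possible alternative:

```properties
# No compression (default): spares CPU cycles at the cost of network bandwidth
compression.type=none
# Alternative: a fast codec may reduce latency in bandwidth-bound scenarios
# compression.type=lz4
```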
Producer acks¶
You can tune the number of acknowledgments the producer requires the leader broker in the Confluent Cloud cluster to have received before it considers a request complete. This acknowledgment to the producer differs from when a message is considered committed. The sooner the leader broker responds, the sooner the producer can send the next batch of messages, which reduces producer latency. You can set the number of required acknowledgments with the producer acks configuration parameter. By default, acks=1, which means the leader broker responds to the producer before all replicas have received the message. Depending on your application requirements, you can set acks=0 so that the producer does not wait for a response to a producer request from the broker, but then messages can potentially be lost without the producer knowing.
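A sketch of the acks settings discussed above:

```properties
# Leader broker acknowledges without waiting for all replicas (default)
acks=1
# Lowest producer latency, but messages can be lost without the producer knowing:
# acks=0
```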
Consumer Fetching¶
Similar to the batching concept on the producers, you can tune consumers for lower latency by adjusting how much data a consumer gets from each fetch from the leader broker in Confluent Cloud. By default, the consumer configuration parameter fetch.min.bytes is set to 1, which means that fetch requests are answered as soon as a single byte of data is available, or when the fetch request times out waiting for data to arrive, as governed by the configuration parameter fetch.max.wait.ms. Looking at these two configuration parameters together lets you reason through the minimum size of a fetch request (fetch.min.bytes) and the maximum age of a fetch request (fetch.max.wait.ms).
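These two consumer parameters can be sketched together as follows (the fetch.max.wait.ms value shown is the client default):

```properties
# Answer fetch requests as soon as a single byte of data is available (default)
fetch.min.bytes=1
# Maximum time to wait for fetch.min.bytes to accumulate before responding
fetch.max.wait.ms=500
```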
Using Local Tables¶
If you have a Kafka event streaming application or use ksqlDB, there are some performance enhancements you can make within the application. For cases where you must perform table lookups at a large scale and with a low processing latency, you can use local stream processing. A popular method is to use Kafka Connect to make remote databases available local to Kafka. Then you can leverage the Kafka Streams API or ksqlDB to perform fast and efficient local joins of such tables and streams, rather than requiring the application to make a query to a remote database over the network for each record. You can track the latest state of each table in a local state store, which reduces the processing latency and the load of remote databases when doing such joins.
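As an illustrative sketch, a local stream-table join in ksqlDB might look like the following; the orders and customers topics and their columns are hypothetical:

```sql
-- Table tracking the latest state per customer, backed by a local state store
CREATE TABLE customers (id VARCHAR PRIMARY KEY, name VARCHAR)
  WITH (KAFKA_TOPIC='customers', VALUE_FORMAT='JSON');

-- Stream of orders keyed by customer id
CREATE STREAM orders (customer_id VARCHAR KEY, amount DOUBLE)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

-- Local join: no per-record query to a remote database over the network
CREATE STREAM enriched_orders AS
  SELECT o.customer_id, c.name, o.amount
  FROM orders o
  JOIN customers c ON o.customer_id = c.id;
```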
Topology Optimization¶
Kafka Streams applications are founded on processor topologies, a graph of stream processor nodes that can act on partitioned data for parallel processing. Depending on the application, there may be conservative but unnecessary data shuffling based on repartition topics, which would not result in any correctness issues but can introduce performance penalties. To avoid these penalties, you can enable topology optimizations for your event streaming applications by setting the configuration parameter topology.optimization. Enabling topology optimizations may reduce the amount of reshuffled streams that are stored and piped through repartition topics.
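A sketch of enabling topology optimization in a Kafka Streams application’s configuration:

```properties
# Enable all topology optimizations (equivalent to StreamsConfig.OPTIMIZE)
topology.optimization=all
```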
Summary of Configurations for Optimizing Latency¶
Producer:
- linger.ms=0 (default)
- compression.type=none (default, meaning no compression)
- acks=1 (default)
Consumer:
- fetch.min.bytes=1 (default)
Streams:
- topology.optimization=all
- Streams applications have embedded producers and consumers, so also check those configuration recommendations