Optimize Confluent Cloud Clients for Latency

Apache Kafka® provides very low end-to-end latency for large volumes of data, meaning the amount of time between a record being produced to Kafka and that record being fetched by the consumer is short.

If you’re using a dedicated cluster, adding CKUs can reduce latency. Other variables that affect end-to-end latency include the implementation of client applications, the partitioning and keying strategy, produce and consume patterns, network latency and QoS, and more. Clusters in Confluent Cloud are already configured for low latency, and there are many ways to configure your client applications to reduce latency.

Configure Kafka to Minimize Latency (blog post) provides a comprehensive guide to measuring and improving latency for your streaming applications, and Optimizing Your Apache Kafka Deployment (whitepaper) details more of the tradeoffs between latency, throughput, durability, and availability.

Key Concepts and Configurations

Many of the Kafka configuration parameters discussed on the Optimize Confluent Cloud Clients for Throughput page have default settings that already optimize for latency, so you typically don’t need to adjust them. This page reviews the key parameters so you understand how they work.

Some configuration parameters on this page have a range of reasonable values. How you set them depends on your requirements and other factors, such as the average message size, the number of partitions, and other differences in the environment, so benchmarking helps to validate the configuration for your application and environment.

Number of Partitions

The Confluent guidelines show you how to choose the number of partitions. Since a partition is a unit of parallelism in Kafka, an increased number of partitions may increase throughput.

There is a trade-off for an increased number of partitions: increased latency. With more partitions shared between each pair of brokers, it takes longer to replicate them all, and consequently longer for a message to be considered committed. No message can be consumed until it is committed, so this can ultimately increase end-to-end latency.

Batching Messages

Producers automatically batch messages, which means they collect messages to send together. The less time the producer spends waiting for a batch to fill, the lower the latency when producing data to Confluent Cloud. By default, the producer is tuned for low latency: the configuration parameter linger.ms is set to 0, so the producer sends data as soon as it has data to send. This does not mean batching is disabled; messages are always sent in batches, but sometimes a batch contains only one message (unless messages are passed to the producer faster than the producer can send them).
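As a minimal sketch with the Java producer client (the bootstrap server, topic name, and serializers below are placeholders; your Confluent Cloud security settings would also be added here), the relevant setting is linger.ms:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class LowLatencyProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder endpoint; substitute your Confluent Cloud bootstrap server and security settings.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap-server>");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // linger.ms=0 (the default): send a batch as soon as there is data to send,
            // rather than waiting for the batch to fill.
            props.put(ProducerConfig.LINGER_MS_CONFIG, "0");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            }
        }
    }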

Compression

Consider whether you need to enable compression. Enabling compression typically requires more CPU cycles, but reduces network bandwidth usage. Disabling compression, on the other hand, spares the CPU cycles but increases network bandwidth usage. Depending on the compression performance, you may consider leaving compression disabled with compression.type=none to spare the CPU cycles, although a good compression codec may potentially reduce latency as well.
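Continuing the producer sketch above, compression is controlled by a single producer setting; which choice is faster end to end depends on your message sizes and network, so benchmark before committing to one:

    // No compression (the default): spends no CPU cycles on compression,
    // at the cost of larger requests on the network.
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");

    // Alternatively, a fast codec such as lz4 trades some CPU for smaller requests,
    // which on a constrained network may reduce latency as well.
    // props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");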

Producer acks

You can tune the number of acknowledgments the producer requires the leader broker in the Confluent Cloud cluster to have received from its replicas before the producer considers a request complete.

Note

This acknowledgment to the producer differs from when a message is considered committed.

The sooner the leader broker responds, the sooner the producer can continue sending the next batch of messages, which reduces producer latency. You set the number of required acknowledgments with the producer acks configuration parameter. With acks=1, the leader broker responds to the producer before all replicas have received the message, which reduces latency; note that as of Kafka 3.0 the default is acks=all, so you must set acks=1 explicitly to get this behavior. Depending on your application requirements, you can even set acks=0 so that the producer does not wait for a response to a produce request at all, but then messages can potentially get lost without the producer knowing.
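In the same producer sketch, the acknowledgment behavior is set with acks; the values shown below are the latency-oriented choices discussed above, not the current default:

    // acks=1: the leader broker responds once it has written the record,
    // without waiting for the other replicas, which lowers produce latency.
    props.put(ProducerConfig.ACKS_CONFIG, "1");

    // acks=0: the producer does not wait for any response; lowest latency,
    // but messages can be lost without the producer knowing.
    // props.put(ProducerConfig.ACKS_CONFIG, "0");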

Consumer Fetching

Similar to the batching concept on the producer side, you can tune consumers for lower latency by adjusting how much data a consumer gets in each fetch from the leader broker in Confluent Cloud. By default, the consumer configuration parameter fetch.min.bytes is set to 1, which means that a fetch request is answered as soon as a single byte of data is available, or when the fetch request times out waiting for data to arrive, as governed by the configuration parameter fetch.max.wait.ms. Considering these two configuration parameters together lets you reason about the minimum size of a fetch response (fetch.min.bytes) and the maximum age of a fetch request (fetch.max.wait.ms).
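A minimal consumer sketch (the endpoint, group id, and topic are placeholders, and your Confluent Cloud security settings would also be added) showing how the two fetch parameters are set together:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class LowLatencyConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder endpoint; substitute your Confluent Cloud bootstrap server and security settings.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap-server>");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "low-latency-consumer");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            // fetch.min.bytes=1 (the default): the broker answers a fetch as soon as
            // any data is available instead of waiting to accumulate more.
            props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
            // fetch.max.wait.ms caps how long the broker waits when fetch.min.bytes
            // has not yet been reached (default 500 ms).
            props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(r -> System.out.println(r.value()));
            }
        }
    }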

Using Local Tables

If you have a Kafka Streams application or use ksqlDB, there are some performance enhancements you can make within the application. For cases where you must perform table lookups at large scale with low processing latency, you can use local stream processing. A popular pattern is to use Kafka Connect to make remote databases available locally to Kafka. Then you can leverage the Kafka Streams API or ksqlDB to perform fast and efficient local joins of such tables and streams, rather than requiring the application to query a remote database over the network for each record. You can track the latest state of each table in a local state store, which reduces processing latency as well as the load on the remote databases when doing such joins.
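As a sketch of the topology-building portion only (the topic names, serdes, and join logic below are hypothetical), a Kafka Streams application can materialize a connector-populated topic as a local table and join a stream against it, instead of querying a remote database per record:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    StreamsBuilder builder = new StreamsBuilder();

    // Hypothetical "customers" topic kept up to date by a Kafka Connect source connector;
    // Kafka Streams materializes it in a local state store.
    KTable<String, String> customers =
        builder.table("customers", Consumed.with(Serdes.String(), Serdes.String()));

    KStream<String, String> orders =
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

    // Enrich each order from the local state store rather than a remote database.
    orders.join(customers, (order, customer) -> order + " -> " + customer)
          .to("enriched-orders", Produced.with(Serdes.String(), Serdes.String()));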

Topology Optimization

Kafka Streams applications are founded on processor topologies, a graph of stream processor nodes that can act on partitioned data for parallel processing. Depending on the application, there may be conservative but unnecessary data shuffling based on repartition topics, which would not result in any correctness issues but can introduce performance penalties. To avoid performance penalties, you may enable topology optimizations for your event streaming applications by setting the configuration parameter topology.optimization. Enabling topology optimizations may reduce the amount of reshuffled streams that are stored and piped through repartition topics.
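Reusing the StreamsBuilder from the sketch above, enabling the optimization is a single configuration setting; note that the same properties must also be passed to StreamsBuilder#build() so that the optimized topology is generated (the application id and endpoint below are placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.Topology;

    Properties streamsProps = new Properties();
    streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "latency-sensitive-app");
    streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap-server>");

    // Enable topology optimization (the default is NO_OPTIMIZATION).
    streamsProps.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);

    // Pass the properties to build() so the optimized topology is generated.
    Topology topology = builder.build(streamsProps);
    KafkaStreams streams = new KafkaStreams(topology, streamsProps);
    streams.start();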

Summary of Configurations for Optimizing Latency

Producer

  • linger.ms=0 (default 0)
  • compression.type=none (default none, meaning no compression)
  • acks=1 (default: all; prior to Kafka 3.0, the default was 1)

Consumer

  • fetch.min.bytes=1 (default 1)

Streams

  • StreamsConfig.TOPOLOGY_OPTIMIZATION: StreamsConfig.OPTIMIZE (default StreamsConfig.NO_OPTIMIZATION)
  • Streams applications have embedded producers and consumers, so also check those configuration recommendations