Understand and Monitor Consumer Lag

Consumer lag refers to the delay between the production and consumption of messages in Apache Kafka®, which can have a significant impact on the overall performance of your system. Specifically, consumer lag is the number of offsets between the latest message produced to a partition and the last message consumed by a consumer, that is, the number of messages waiting to be consumed.
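
The definition above reduces to a subtraction per partition. The following is a minimal sketch, not a Kafka API: the function name partition_lag and the sample offsets are hypothetical, and treating a partition with no committed offset as unconsumed from offset 0 is an assumption made only for illustration.

```python
from typing import Dict


def partition_lag(log_end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus committed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption of this sketch: a partition without a committed offset
        # is counted as unconsumed starting from offset 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end offset never yields negative lag.
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1500, 1: 1100}

lag = partition_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```

Summing the per-partition values gives the total number of messages a consumer group still has to read for the topic.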

To reduce Kafka consumer lag, it is important to ensure that the Kafka cluster and consumers are properly configured and that the system is optimized for performance. This includes monitoring the system for performance issues, tuning configuration settings, and upgrading hardware as needed.

Causes of consumer lag

The causes of consumer lag fall into two main areas: configuration and performance.

The following configuration issues can cause consumer lag:

  • Consumer group configuration: A misconfigured consumer group can distribute messages unevenly across consumers, which causes some consumers to fall behind. To learn more about consumer configuration, see Kafka Consumer.
  • Topic configuration: Configuring a topic with too few partitions or too low a replication factor can lead to consumer lag.
  • Consumer configuration: Misconfigured consumer properties such as fetch.max.bytes, fetch.min.bytes, fetch.max.wait.ms, and max.poll.records can lead to consumer lag.
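
To illustrate why too few partitions limits a consumer group, the following sketch deals partitions to consumers in turn. It is a simplified illustration, not one of Kafka's actual partition assignors; the function name spread_partitions and the consumer names are hypothetical. It relies only on the fact that, within a consumer group, each partition is read by a single consumer.

```python
from typing import Dict, List


def spread_partitions(num_partitions: int,
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions to consumers in turn; each partition gets one owner."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for partition in range(num_partitions):
        # Simplified round-robin, standing in for a real assignor.
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical group of three consumers reading a two-partition topic.
assignment = spread_partitions(2, ["consumer-a", "consumer-b", "consumer-c"])
idle = [name for name, parts in assignment.items() if not parts]
print(assignment, idle)
```

With two partitions and three consumers, one consumer receives nothing, so adding consumers beyond the partition count does not reduce lag.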

The following performance issues can cause consumer lag:

  • Network latency: High network latency between the Kafka cluster and the consumer can cause consumer lag.
  • Message size: Large message sizes can cause consumer lag, especially if the consumer is not configured to handle large messages.
  • Slow consumers: Consumers that take a long time to process each message can fall behind producers, causing consumer lag.
  • High message throughput: High message throughput can overwhelm consumers, causing consumer lag.
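
The last two items come down to simple arithmetic: lag shrinks only while consumers process messages faster than producers write them. The following sketch estimates how long an existing backlog takes to clear. The function name seconds_to_drain and all rates are hypothetical, and the rates are assumed to stay constant, which real workloads rarely do.

```python
from typing import Optional


def seconds_to_drain(current_lag: int,
                     produce_rate: float,
                     consume_rate: float) -> Optional[float]:
    """Estimate seconds until lag reaches zero.

    Rates are in messages per second. Returns None when the consumer
    is not faster than the producer, so the backlog never clears.
    """
    if current_lag <= 0:
        return 0.0
    net_rate = consume_rate - produce_rate
    if net_rate <= 0:
        return None
    return current_lag / net_rate


# Hypothetical backlog of 60,000 messages.
catching_up = seconds_to_drain(60_000, produce_rate=1_000, consume_rate=1_500)
falling_behind = seconds_to_drain(60_000, produce_rate=1_000, consume_rate=800)
print(catching_up, falling_behind)
```

A None result indicates that tuning or scaling is needed before the lag can recover at all.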

Monitor consumer lag

Monitoring consumer lag is essential to help ensure the smooth functioning of your Kafka cluster. Consumer lag is a combination of offset lag and consumer latency. You can monitor it using Confluent Control Center and, starting in Confluent Platform 7.5, using JMX metrics.

To monitor offset lag with JMX metrics, do the following:

  1. Enable consumer lag monitoring on brokers by enabling the consumer lag emitter with the following properties:

    • Set confluent.consumer.lag.emitter.enabled to true. This property enables consumer lag monitoring, and is false by default.
    • Set confluent.consumer.lag.emitter.interval.ms to the desired interval in milliseconds. The default for this property is 60000 milliseconds, which equals 1 minute.

    Set these properties in the broker properties file. For a cluster running in ZooKeeper mode, the properties file is located under $CONFLUENT_HOME/etc/kafka/. For a cluster running in KRaft mode, the file is located under $CONFLUENT_HOME/etc/kafka/kraft/.

  2. Monitor the consumer lag metric:

    Once Confluent Platform is running, you can monitor the consumer-lag-offset MBean, which provides the difference between the last offset stored by the broker and the last committed offset for a given consumer group, client, topic, and partition. Note that the emitter reports consumer lag in offsets only; it does not account for latency, meaning the time since the last record was fetched.

    To learn more about JMX monitoring and Kafka, see Monitoring Kafka with JMX.
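
As a quick check on step 1, the following sketch reads broker properties text and reports whether the consumer lag emitter is enabled and at what interval. The two property names and their defaults (false and 60000) come from the steps above; the helper names and the sample excerpt are hypothetical, and the parser handles only simple key=value lines rather than the full Java properties syntax.

```python
from typing import Dict, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def lag_emitter_settings(props: Dict[str, str]) -> Tuple[bool, int]:
    """Return (enabled, interval_ms), applying the documented defaults."""
    enabled = props.get(
        "confluent.consumer.lag.emitter.enabled", "false").lower() == "true"
    interval_ms = int(props.get(
        "confluent.consumer.lag.emitter.interval.ms", "60000"))
    return enabled, interval_ms


# Hypothetical excerpt from a broker properties file.
sample = """
# Consumer lag emitter
confluent.consumer.lag.emitter.enabled=true
confluent.consumer.lag.emitter.interval.ms=30000
"""

enabled, interval_ms = lag_emitter_settings(parse_properties(sample))
print(enabled, interval_ms)
```

In practice the text would be read from the broker properties file for your mode (ZooKeeper or KRaft) instead of an inline string.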

You can additionally monitor consumer latency by using Confluent Control Center, if you are running Control Center in normal mode. For more information on how to use Control Center for consumer latency monitoring, see Control Center Alerts Overview and Create a consumer group trigger for consumer lag.