Monitoring Kafka with JMX
Apache Kafka® brokers and clients report many internal metrics. JMX is the default reporter, though you can add any pluggable reporter. See additional topics about monitoring in the Related content section.
Tip
Confluent offers some alternatives to using JMX monitoring.
Health+: Consider monitoring and managing your environment with Confluent Health+. Ensure the health of your clusters and minimize business disruption with intelligent alerts, monitoring, and proactive support based on best practices created by the inventors of Kafka.
Confluent Control Center: You can deploy Control Center for out-of-the-box Kafka cluster monitoring so you don’t have to build your own monitoring system. Control Center (Legacy) makes it easy to manage the entire Confluent Platform deployment. Control Center (Legacy) is a web-based application that allows you to manage your cluster and to alert on triggers. Additionally, Control Center (Legacy) measures how long messages take to be delivered and helps determine the source of any issues in your cluster.
Server metrics
Broker metrics
There are many metrics reported at the broker level that can be monitored and used to troubleshoot issues with your cluster. At minimum, you should monitor and set alerts on ActiveControllerCount, OfflinePartitionsCount, and UncleanLeaderElectionsPerSec.
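A minimal sketch of those three alert rules, assuming you have already scraped the latest value of each metric from every broker (for example, through a JMX exporter) into a dictionary keyed by broker ID. For the UncleanLeaderElectionsPerSec meter, use a rate attribute such as OneMinuteRate. The function name and input shape are illustrative, not part of Kafka.

```python
from collections.abc import Mapping


def broker_alerts(per_broker: Mapping[int, Mapping[str, float]]) -> list[str]:
    """Evaluate the minimum broker alerts.

    per_broker maps a broker ID to the latest values scraped for
    ActiveControllerCount, OfflinePartitionsCount, and
    UncleanLeaderElectionsPerSec. Missing metrics count as 0.
    """
    alerts = []
    # Summed across brokers, exactly one should report being the active controller.
    controllers = sum(m.get("ActiveControllerCount", 0) for m in per_broker.values())
    if controllers != 1:
        alerts.append(f"ActiveControllerCount sum is {controllers}, expected 1")
    for broker_id in sorted(per_broker):
        metrics = per_broker[broker_id]
        # Offline partitions have no leader, so they can't be read or written.
        if metrics.get("OfflinePartitionsCount", 0) > 0:
            alerts.append(f"broker {broker_id}: OfflinePartitionsCount > 0")
        # Any unclean election means an out-of-sync replica became leader.
        if metrics.get("UncleanLeaderElectionsPerSec", 0) > 0:
            alerts.append(f"broker {broker_id}: UncleanLeaderElectionsPerSec > 0")
    return alerts


healthy = {1: {"ActiveControllerCount": 1}, 2: {"ActiveControllerCount": 0}}
print(broker_alerts(healthy))  # []
```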
kafka.server:type=KafkaServer,name=linux-disk-read-bytes
The total number of bytes read by the broker process, including reads from all disks. The total doesn’t include reads from page cache. Available only on Linux-based systems.
kafka.server:type=KafkaServer,name=linux-disk-write-bytes
The total number of bytes written by the broker process, including writes to all disks. Available only on Linux-based systems.
kafka.server:type=ReplicaManager,name=PartitionsWithLateTransactionsCount
Number of partitions that have open transactions with durations exceeding the transaction.max.timeout.ms property value set on the broker.
kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount
Number of partitions whose in-sync replicas count is less than minIsr.
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
Number of under-replicated partitions (|ISR| < |current replicas|). Replicas that are added as part of a reassignment do not count toward this value. Alert if value is greater than 0.
kafka.server:type=ReplicaManager,name=ReassigningPartitions
Number of reassigning partitions.
kafka.cluster:type=Partition,topic={topic},name=UnderMinIsr,partition={partition}
Number of partitions whose in-sync replicas count is less than minIsr. These partitions will be unavailable to producers that use acks=all.
kafka.cluster:type=Partition,topic={topic},name=InSyncReplicasCount,partition={partition}
A gauge metric that indicates the in-sync replica count per topic partition leader.
kafka.cluster:type=Partition,topic={topic},name=AtMinIsr,partition={partition}
Number of partitions whose in-sync replicas count is equal to the minIsr value.
kafka.cluster:type=Partition,topic={topic},name=ReplicasCount,partition={partition}
A gauge metric that indicates the replica count per topic partition leader.
kafka.cluster:type=Partition,topic={topic},name=UnderReplicated,partition={partition}
Number of partitions that are under replicated per the minIsr value.
kafka.controller:type=KafkaController,name=OfflinePartitionsCount
Number of partitions that don’t have an active leader and are hence not writable or readable. Alert if value is greater than 0.
kafka.controller:type=KafkaController,name=ActiveControllerCount
Number of active controllers in the cluster. Alert if the aggregated sum across all brokers in the cluster is anything other than 1 because there should be exactly one controller per cluster.
kafka.controller:type=KafkaController,name=GlobalPartitionCount
Number of partitions across all topics in the cluster.
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
Byte-in rate from clients.
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
Byte-out rate to clients.
kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec
Byte-in rate from other brokers.
kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec
Byte-out rate to other brokers.
kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower},version=([0-9]+)
The request rate. The version property refers to the API version of the request type. To get the total count for a specific request type, make sure that JMX monitoring tools aggregate across different versions.
kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec
Produce request rate.
kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec
Fetch request rate.
kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
Produce request rate for requests that failed.
kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec
Fetch request rate for requests that failed.
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec
Incoming byte rate of reassignment traffic.
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec
Outgoing byte rate of reassignment traffic.
kafka.server:type=BrokerTopicMetrics,name={Produce|Fetch}MessageConversionsPerSec,topic=([-.\w]+)
Message format conversion rate, for Produce or Fetch requests, per topic. Omitting topic={…} yields the all-topic rate.
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
Leader election rate and latency.
kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
Unclean leader election rate.
kafka.server:type=ReplicaManager,name=PartitionCount
Number of partitions on this broker. This should be roughly even across all brokers.
kafka.server:type=ReplicaManager,name=LeaderCount
Number of leaders on this broker. This should be roughly even across all brokers. If not, set auto.leader.rebalance.enable to true on all brokers in the cluster.
kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
Average fraction of time the request handler threads are idle. Values are between 0 (all resources are used) and 1 (all resources are available).
kafka.server:type=Produce,name=DelayQueueSize
Number of producer clients currently being throttled. The value can be any number greater than or equal to 0.
Important
To monitor applied quotas and throttled clients, use the kafka.server:type={Produce|Fetch},user=([-.\w]+),client-id=([-.\w]+) and kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+) metrics.
kafka.server:type={Produce|Fetch},user=([-.\w]+),client-id=([-.\w]+)
Bandwidth quota. This metric has the following attributes:
throttle-time: the amount of time in ms the client was throttled. Ideally 0.
byte-rate: the data produce/consume rate of the client in bytes/sec.
For (user, client-id) quotas, specify both user and client-id. If a per-client-id quota is applied to the client, do not specify user. If a per-user quota is applied, do not specify client-id.
kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+)
Request quota. This metric has the following attributes:
throttle-time: the amount of time in ms the client was throttled. Ideally 0.
request-time: the percentage of time spent in broker network and I/O threads to process requests from the client group.
For (user, client-id) quotas, specify both user and client-id. If a per-client-id quota is applied to the client, do not specify user. If a per-user quota is applied, do not specify client-id.
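The key-property rules for quota metrics can be captured in a small helper that builds the MBean name to query for a client. This is a sketch: quota_mbean_name and the example user and client IDs are hypothetical, and the helper only formats the name; reading the throttle-time, byte-rate, or request-time attributes still happens through your JMX client.

```python
from typing import Optional

# Bandwidth quotas use type=Produce or type=Fetch; request quotas use type=Request.
QUOTA_TYPES = ("Produce", "Fetch", "Request")


def quota_mbean_name(quota_type: str, user: Optional[str] = None,
                     client_id: Optional[str] = None) -> str:
    """Build the broker MBean name for a client's quota metrics.

    Pass both user and client_id for (user, client-id) quotas, only
    client_id for per-client-id quotas, and only user for per-user quotas.
    """
    if quota_type not in QUOTA_TYPES:
        raise ValueError(f"unknown quota type: {quota_type}")
    if user is None and client_id is None:
        raise ValueError("specify user, client_id, or both")
    parts = [f"kafka.server:type={quota_type}"]
    if user is not None:
        parts.append(f"user={user}")
    if client_id is not None:
        parts.append(f"client-id={client_id}")
    return ",".join(parts)


print(quota_mbean_name("Fetch", client_id="reporting-app"))
# kafka.server:type=Fetch,client-id=reporting-app
```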
kafka.server:clientSoftwareName=(client-software-name),clientSoftwareVersion=(client-software-version),listener=(listener),networkProcessor=(processor-index),type=(type)
Name and version of the client software connected to the broker. For example, the Kafka 2.4 Java client produces the following MBean on the broker:
kafka.server:clientSoftwareName=apache-kafka-java,clientSoftwareVersion=2.4.0,listener=PLAINTEXT,networkProcessor=1,type=socket-server-metrics
kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent
Average fraction of time the network processor threads are idle. Values are between 0 (all resources are used) and 1 (all resources are available).
kafka.network:type=RequestChannel,name=RequestQueueSize
Size of the request queue. A congested request queue will not be able to process incoming or outgoing requests.
kafka.network:type=RequestChannel,name=ResponseQueueSize
Size of the response queue. The response queue is unbounded. A congested response queue can result in delayed response times and memory pressure on the broker.
kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-count
Number of currently open connections to the broker.
kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-creation-rate
Number of new connections established per second.
kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request={Produce|Fetch}
Time in milliseconds spent on message format conversions.
kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}
Total time in milliseconds to serve the specified request.
kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time the request waits in the request queue.
kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time the request is processed at the leader.
kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time the request waits for the follower. This is non-zero for produce requests when acks=all.
kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time the request waits in the response queue.
kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower}
Time to send the response.
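Because TotalTimeMs spans the queueing, processing, and sending phases measured by the other RequestMetrics, comparing the phase values shows where a slow request type spends its time. The following sketch assumes you have read the Mean attribute of each phase metric for one request type (means add up, percentiles do not) and it ignores any throttle time; the function names are illustrative.

```python
# Phase metrics under kafka.network:type=RequestMetrics for one request type.
PHASES = (
    "RequestQueueTimeMs",
    "LocalTimeMs",
    "RemoteTimeMs",
    "ResponseQueueTimeMs",
    "ResponseSendTimeMs",
)


def phase_shares(mean_ms: dict[str, float]) -> dict[str, float]:
    """Share of the summed mean time spent in each phase."""
    total = sum(mean_ms.get(phase, 0.0) for phase in PHASES)
    if total <= 0:
        raise ValueError("no time recorded for any phase")
    return {phase: mean_ms.get(phase, 0.0) / total for phase in PHASES}


def dominant_phase(mean_ms: dict[str, float]) -> str:
    """Name of the phase with the largest mean time."""
    shares = phase_shares(mean_ms)
    return max(shares, key=shares.get)


produce = {"RequestQueueTimeMs": 1.0, "LocalTimeMs": 2.0, "RemoteTimeMs": 6.0,
           "ResponseQueueTimeMs": 0.5, "ResponseSendTimeMs": 0.5}
print(dominant_phase(produce))  # RemoteTimeMs
```

With acks=all, a Produce request dominated by RemoteTimeMs points at followers that are slow to replicate, while a large RequestQueueTimeMs suggests the request handler threads are saturated.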
The following are some additional metrics you can optionally observe on a Kafka broker:
kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize
Number of requests waiting in the producer purgatory. This should be non-zero when acks=all is used on the producer.
kafka.server:type=DelayedOperationPurgatory,delayedOperation=Fetch,name=PurgatorySize
Number of requests waiting in the fetch purgatory. This is high if consumers use a large value for fetch.wait.max.ms.
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
Lag in number of messages per follower replica. This helps you determine whether the replica is slow or has stopped replicating from the leader, and whether the associated brokers need to be removed from the In-Sync Replicas list.
Important
This metric is internal to the cluster and does not represent the Kafka client application’s consumer lag.
kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
Log flush rate and time.
kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0.
kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
When a broker is brought up after a failure, it starts catching up by reading from the leader. Once it is caught up, it gets added back to the ISR.
The following log segment metrics are only available on Confluent Server:
kafka.log:type=SegmentStats,name=SegmentAppendTimeMs
The time in milliseconds to append a record to the log segment.
kafka.log:type=SegmentStats,name=OffsetIndexAppendTimeMs
The time in milliseconds to append an entry to the log segment offset index. The offset index maps from logical offsets to physical file positions.
kafka.log:type=SegmentStats,name=TimestampIndexAppendTimeMs
The time in milliseconds to append an entry to the log segment timestamp index.
Controller metrics
For metrics to monitor Kafka when running in KRaft mode, see Monitor KRaft.
ZooKeeper metrics
ZooKeeper state transition counts are exposed as metrics, which can help you spot problems with your cluster, such as issues with broker connections to ZooKeeper. Each metric shows the per-second rate of transitions into one of the possible states. The following counters are exposed, one for each possible ZooKeeper client state.
kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec
The ZooKeeper client is currently disconnected from the ensemble. The client lost its previous connection to a server and it is currently trying to reconnect. The session is not necessarily expired. Note that this metric tells you if the broker is disconnecting, but not if ZooKeeper is down. If you are checking system health, ZooKeeperExpiresPerSec is a better metric to help you determine this.
kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec
The ZooKeeper session has expired. Note that when a session expires it could result in leader changes or possibly a new controller. You should monitor the number of these events across a Kafka cluster and if the overall number is high, you can do the following:
Check the health of your network
Check for garbage collection issues and tune it accordingly
If necessary, increase the session timeout by setting the value of zookeeper.session.timeout.ms.
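A sketch of the cluster-wide check on session expirations, assuming you collect a rate attribute (such as OneMinuteRate) of ZooKeeperExpiresPerSec from each broker. What counts as "high" depends on your environment, so the threshold is a parameter you choose rather than a recommended value. The returned broker list tells you where to start checking network and garbage collection.

```python
def zk_expiry_report(expires_per_sec: dict[int, float],
                     threshold: float) -> tuple[bool, list[int]]:
    """Check cluster-wide ZooKeeper session expirations.

    expires_per_sec maps broker ID -> ZooKeeperExpiresPerSec rate.
    Returns (alert, brokers_with_expirations), busiest broker first.
    """
    total = sum(expires_per_sec.values())
    offenders = sorted(
        (broker for broker, rate in expires_per_sec.items() if rate > 0),
        key=lambda broker: expires_per_sec[broker],
        reverse=True,
    )
    return total > threshold, offenders


print(zk_expiry_report({1: 0.0, 2: 0.02, 3: 0.05}, threshold=0.05))
# (True, [3, 2])
```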
The following are additional ZooKeeper metrics you can optionally observe on a Kafka broker.
kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec
ZooKeeper client is connected to the ensemble and ready to execute operations.
kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec
An attempt to connect to the ensemble failed because the client has not provided correct credentials.
kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec
The server the client is connected to is currently LOOKING, which means that it is neither FOLLOWING nor LEADING. Consequently, the client can only read the ZooKeeper state, but not make any changes (create, delete, or set the data of znodes).
kafka.server:type=SessionExpireListener,name=ZooKeeperSaslAuthenticationsPerSec
Client has successfully authenticated.
kafka.server:type=SessionExpireListener,name=SessionState
Connection status of broker’s ZooKeeper session. Expected value is CONNECTED.
Producer metrics
Starting with Kafka version 3.1.1, the producer exposes the following metrics.
MBean: kafka.producer:type=producer-metrics,client-id=([-.\w]+)
waiting-threads
The number of user threads blocked waiting for buffer memory to enqueue their records.
buffer-total-bytes
The maximum amount of buffer memory the client can use whether or not it is available.
buffer-available-bytes
The total amount of buffer memory that is not being used, either unallocated or in the free list.
bufferpool-wait-time
The fraction of time an appender waits for space allocation.
bufferpool-wait-time-ns-total
The total time in nanoseconds an appender waits for space allocation.
flush-time-ns-total
The total time in nanoseconds the producer spent in Producer.flush.
txn-init-time-ns-total
The total time in nanoseconds that the producer spent initializing transactions for exactly-once semantics.
txn-begin-time-ns-total
The total time in nanoseconds the producer spent in beginTransaction for exactly-once semantics.
txn-send-offsets-time-ns-total
The total time in nanoseconds the producer spent sending offsets to transactions for exactly-once semantics.
txn-commit-time-ns-total
The total time in nanoseconds the producer spent committing transactions for exactly-once semantics.
txn-abort-time-ns-total
The total time in nanoseconds the producer spent aborting transactions for exactly-once semantics.
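The buffer metrics combine into a quick read on producer back pressure: when buffer-available-bytes approaches zero, send() calls block (up to max.block.ms) and waiting-threads rises. The cumulative *-ns-total counters are easier to interpret as a fraction of wall-clock time between two samples. Both helpers below are illustrative sketches, not part of the Kafka client API, and the sample numbers are made up.

```python
def buffer_utilization(buffer_total_bytes: float, buffer_available_bytes: float) -> float:
    """Fraction of producer buffer memory currently in use (0.0 to 1.0)."""
    if buffer_total_bytes <= 0:
        raise ValueError("buffer-total-bytes must be positive")
    return (buffer_total_bytes - buffer_available_bytes) / buffer_total_bytes


def ns_total_fraction(ns_total_before: float, ns_total_after: float,
                      interval_seconds: float) -> float:
    """Convert two samples of a cumulative *-ns-total metric into the
    fraction of the sampling interval spent in that activity."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (ns_total_after - ns_total_before) / (interval_seconds * 1e9)


# 24 MiB of a 32 MiB buffer in use, and bufferpool-wait-time-ns-total
# growing by 0.5 s over a 10 s scrape interval.
print(buffer_utilization(32 * 1024 * 1024, 8 * 1024 * 1024))  # 0.75
print(ns_total_fraction(1_000_000_000, 1_500_000_000, 10.0))   # 0.05
```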
Global request metrics
Starting with Kafka 0.8.2, the producer exposes the following metrics:
MBean: kafka.producer:type=producer-metrics,client-id=([-.\w]+)
batch-size-avg
The average number of bytes sent per partition, per request.
batch-size-max
The maximum number of bytes sent per partition, per request.
batch-split-rate
The average number of batch splits per second.
batch-split-total
The total number of batch splits.
compression-rate-avg
The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size.
incoming-byte-rate
The average number of incoming bytes received per second from all servers.
metadata-age
The age, in seconds, of the current producer metadata being used.
outgoing-byte-rate
The average number of bytes sent per second to all servers.
produce-throttle-time-avg
The average time in milliseconds a request was throttled by a broker.
produce-throttle-time-max
The maximum time in milliseconds a request was throttled by a broker.
record-error-rate
The average per-second number of record sends that resulted in errors.
record-error-total
The total number of record sends that resulted in errors.
record-queue-time-avg
The average time in milliseconds record batches spent in the send buffer.
record-queue-time-max
The maximum time in milliseconds record batches spent in the send buffer.
record-retry-rate
The average per-second number of retried record sends.
record-retry-total
The total number of retried record sends.
record-send-rate
The average number of records sent per second.
record-send-total
The total number of records sent.
record-size-avg
The average record size.
record-size-max
The maximum record size.
records-per-request-avg
The average number of records per request.
request-rate
The average number of requests sent per second.
requests-in-flight
The current number of in-flight requests awaiting a response.
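Two derived values are often easier to read than the raw metrics: how full batches are relative to the producer's batch.size setting, and the space saved by compression. This sketch assumes the default batch.size of 16384 bytes unless you pass your configured value; a fill ratio well below 1 usually means batches are sent before they fill, for example because linger.ms is low. The function names are illustrative.

```python
DEFAULT_BATCH_SIZE = 16384  # producer batch.size default, in bytes


def batch_fill_ratio(batch_size_avg: float, batch_size: int = DEFAULT_BATCH_SIZE) -> float:
    """Average batch size as a fraction of the configured batch.size."""
    return batch_size_avg / batch_size


def compression_savings(compression_rate_avg: float) -> float:
    """Fraction of bytes saved by compression.

    compression-rate-avg is compressed size / uncompressed size, so a value
    of 0.4 means batches shrink to 40% of their original size.
    """
    return 1.0 - compression_rate_avg


print(batch_fill_ratio(4096))               # 0.25
print(round(compression_savings(0.4), 2))   # 0.6
```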
Global connection metrics
MBean: kafka.producer:type=producer-metrics,client-id=([-.\w]+)
bufferpool-wait-time-ns-total
The total time in nanoseconds a producer waits for space allocation.
connection-close-rate
Connections closed per second in the window.
connection-count
The current number of active connections.
connection-creation-rate
New connections established per second in the window.
io-ratio
The fraction of time the I/O thread spent doing I/O.
io-time-ns-avg
The average length of time for I/O per select call in nanoseconds.
io-time-ns-total
The total time the I/O thread spent doing I/O in nanoseconds.
io-wait-ratio
The fraction of time the I/O thread spent waiting.
io-wait-time-ns-avg
The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.
io-wait-time-ns-total
The total length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.
select-rate
Number of times the I/O layer checked for new I/O to perform per second.
Per-broker metrics
MBean: kafka.producer:type=producer-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
Besides the producer global request metrics, the following metrics are also available per broker:
incoming-byte-rate
The average number of bytes received per second from the broker.
outgoing-byte-rate
The average number of bytes sent per second to the broker.
request-size-avg
The average size of all requests in the window for a broker.
request-size-max
The maximum size of any request sent in the window for a broker.
request-rate
The average number of requests sent per second to the broker.
response-rate
The average number of responses received per second from the broker.
Per-topic metrics
MBean: kafka.producer:type=producer-topic-metrics,client-id="{client-id}",topic="{topic}"
Besides the producer global request metrics, the following metrics are also available per topic:
byte-rate
The average number of bytes sent per second for a topic.
byte-total
The total number of bytes sent for a topic.
compression-rate
The average compression rate of record batches for a topic, defined as the average ratio of the compressed batch size divided by the uncompressed size.
record-error-rate
The average per-second number of record sends that resulted in errors for a topic.
record-error-total
The total number of record sends that resulted in errors for a topic.
record-retry-rate
The average per-second number of retried record sends for a topic.
record-retry-total
The total number of retried record sends for a topic.
record-send-rate
The average number of records sent per second for a topic.
record-send-total
The total number of records sent for a topic.
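Per-topic MBean names like this one quote their values, while broker MBeans such as RequestsPerSec carry extra key properties like version that you often want to sum away. A minimal parser that splits an ObjectName string into its domain and key properties, plus a helper that totals values by one property, might look like the following. It is a sketch: it strips surrounding quotes but does not unescape characters inside quoted values; in Java, javax.management.ObjectName already does this parsing. The sample rates are made up.

```python
import re

# key=value pairs; a value is either a quoted string or runs to the next comma.
_PROPERTY = re.compile(r'([^,=]+)=("[^"]*"|[^,]*)')


def parse_object_name(name: str) -> tuple[str, dict[str, str]]:
    """Split a JMX ObjectName string into (domain, key properties)."""
    domain, _, properties = name.partition(":")
    parsed = {}
    for key, value in _PROPERTY.findall(properties):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop surrounding quotes only
        parsed[key] = value
    return domain, parsed


def total_by(samples: dict[str, float], key: str) -> dict[str, float]:
    """Sum metric values that share the same value for one key property."""
    totals: dict[str, float] = {}
    for name, value in samples.items():
        group = parse_object_name(name)[1][key]
        totals[group] = totals.get(group, 0.0) + value
    return totals


rates = {
    "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=8": 120.0,
    "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce,version=9": 30.0,
}
print(total_by(rates, "request"))  # {'Produce': 150.0}
```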
Audit metrics
confluent-audit-metrics:name=audit-log-rate-per-minute
The number of audit logs created per minute. This metric is useful in cases where you need to know the number of audit logs created.
confluent-audit-metrics:name=audit-log-fallback-rate-per-minute
The rate of audit log fallbacks per minute. This metric is useful in cases where you need to know the fallback rate of your audit logs.
RBAC and LDAP health metrics
confluent.metadata:type=LdapGroupManager,name=failure-start-seconds-ago
The number of seconds since the last failed attempt to process metadata from the LDAP server. This is reset to zero on the next successful metadata refresh. This metric is available on brokers in the metadata cluster if LDAP group-based authorization is enabled. Alert if value is greater than zero.
confluent.metadata:type=KafkaAuthStore,name=writer-failure-start-seconds-ago
The number of seconds since the last failure in the writer that updates authentication or authorization metadata on topics in the metadata cluster. This is reset to zero after the next successful metadata update. This metric is available on brokers in the metadata cluster. Alert if value is greater than zero.
confluent.metadata:type=KafkaAuthStore,name=reader-failure-start-seconds-ago
The number of seconds since the last failure in the consumer that processes authentication or authorization metadata from the topics in the metadata cluster. This is reset to zero after the next successful metadata refresh. This metric is available on all brokers configured to use RBAC. Alert if value is greater than zero.
confluent.metadata:type=KafkaAuthStore,name=remote-failure-start-seconds-ago
The number of seconds since the last failure in the metadata service, for example, due to LDAP refresh failures for a long duration. This is reset to zero when notification of successful refresh from the metadata service is processed. This metric is available on all brokers configured to use RBAC. Alert if value is greater than zero.
confluent.metadata:type=KafkaAuthStore,name=active-writer-count
Number of active writers in the metadata cluster. Alert if the sum is any number other than one because there should be exactly one writer in the metadata cluster.
confluent.metadata:type=KafkaAuthStore,name=metadata-status,topic=([-.\w]+),partition=([0-9]+)
Current status of metadata on each metadata topic partition. Value may be UNKNOWN, INITIALIZING, INITIALIZED, or FAILED.
confluent.metadata:type=KafkaAuthStore,name=record-send-rate,topic=([-.\w]+),partition=([0-9]+)
The average number of records sent per second to the metadata topic partitions.
confluent.metadata:type=KafkaAuthStore,name=record-error-rate,topic=([-.\w]+),partition=([0-9]+)
The average number of failed record send attempts per second to the metadata topic partitions.
confluent-auth-store-metrics:name=rbac-role-bindings-count
The number of role bindings defined. This metric is useful in cases where you need to know the exact number of role bindings that exist.
confluent-auth-store-metrics:name=rbac-access-rules-count
The number of RBAC access rules defined. This metric is useful in cases where you need to know the exact number of RBAC access rules that exist. Access rules allow or deny access to specific resources within a specific scope, unlike role bindings, which assign an RBAC role for a specific resource to a specific principal.
confluent-auth-store-metrics:name=acl-access-rules-count
The number of ACL access rules defined. This metric is useful in cases where you need to know the exact number of ACLs that exist.
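The alert conditions in this section can be evaluated together. This sketch assumes you have flattened each broker's values into a dictionary keyed by metric name (the names come from the list above; the input shape and function name are illustrative). Include every broker: a broker that does not report a gauge simply omits it, and missing values count as 0.

```python
# Failure gauges from the RBAC and LDAP health metrics; any value > 0 is an alert.
FAILURE_GAUGES = (
    "failure-start-seconds-ago",          # LdapGroupManager
    "writer-failure-start-seconds-ago",   # KafkaAuthStore
    "reader-failure-start-seconds-ago",   # KafkaAuthStore
    "remote-failure-start-seconds-ago",   # KafkaAuthStore
)


def rbac_alerts(per_broker: dict[int, dict[str, float]]) -> list[str]:
    """Return alert messages for RBAC and LDAP health across brokers."""
    alerts = []
    # Exactly one writer should be active in the metadata cluster.
    writers = sum(m.get("active-writer-count", 0) for m in per_broker.values())
    if writers != 1:
        alerts.append(f"active-writer-count sum is {writers}, expected 1")
    for broker_id in sorted(per_broker):
        metrics = per_broker[broker_id]
        for gauge in FAILURE_GAUGES:
            if metrics.get(gauge, 0) > 0:
                alerts.append(f"broker {broker_id}: {gauge}={metrics[gauge]}")
    return alerts


print(rbac_alerts({1: {"active-writer-count": 1}, 2: {"active-writer-count": 0}}))  # []
```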
Consumer metrics
Starting with Kafka 3.1.1, the consumer exposes the following metrics:
MBean: kafka.consumer:type=consumer-metrics,client-id=([-.\w]+)
committed-time-ns-total
The cumulative sum of time in nanoseconds elapsed during calls to Consumer.committed.
commit-sync-time-ns-total
The cumulative sum of time in nanoseconds elapsed during calls to Consumer.commitSync.
Starting with Kafka 2.4.0, the consumer exposes the following metrics:
MBean: kafka.consumer:type=consumer-metrics,client-id=([-.\w]+)
time-between-poll-avg
The average delay between invocations of poll().
time-between-poll-max
The max delay between invocations of poll().
last-poll-seconds-ago
The number of seconds since the last poll() invocation.
poll-idle-ratio-avg
The average fraction of time the consumer’s poll() is idle as opposed to waiting for the user code to process records.
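One way to act on these metrics is to compare time-between-poll-max with the consumer's max.poll.interval.ms setting (300000 ms by default): if the gap between poll() calls exceeds that interval, the consumer is considered failed and its partitions are reassigned. The warning fraction and idle-ratio floor below are hypothetical thresholds, not Kafka defaults, and the function name is illustrative.

```python
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000  # consumer max.poll.interval.ms default


def poll_warnings(time_between_poll_max_ms: float, poll_idle_ratio_avg: float,
                  max_poll_interval_ms: float = DEFAULT_MAX_POLL_INTERVAL_MS,
                  warn_fraction: float = 0.8, min_idle_ratio: float = 0.1) -> list[str]:
    """Warn when the poll loop risks exceeding max.poll.interval.ms.

    warn_fraction and min_idle_ratio are hypothetical thresholds.
    """
    warnings = []
    # Exceeding max.poll.interval.ms gets the consumer removed from its group.
    if time_between_poll_max_ms >= warn_fraction * max_poll_interval_ms:
        warnings.append("time-between-poll-max is close to max.poll.interval.ms")
    # A low idle ratio means most time goes to user code processing records.
    if poll_idle_ratio_avg < min_idle_ratio:
        warnings.append("poll-idle-ratio-avg is low: record processing dominates")
    return warnings


print(poll_warnings(250_000, 0.5))
# ['time-between-poll-max is close to max.poll.interval.ms']
```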
Fetch metrics
Starting with Kafka 0.9.0.0, the consumer exposes the following metrics:
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}"
bytes-consumed-rate
The average number of bytes consumed per second.
bytes-consumed-total
The total number of bytes consumed.
fetch-latency-avg
The average time taken for a fetch request.
fetch-latency-max
The max time taken for a fetch request.
fetch-rate
The number of fetch requests per second.
fetch-size-avg
The average number of bytes fetched per request.
fetch-size-max
The maximum number of bytes fetched per request.
fetch-throttle-time-avg
The average throttle time in milliseconds. When quotas are enabled, the broker may delay fetch requests in order to throttle a consumer which has exceeded its limit. This metric indicates how much throttling time has been added to fetch requests on average.
fetch-throttle-time-max
The maximum throttle time in milliseconds.
fetch-total
The total number of fetch requests.
records-consumed-rate
The average number of records consumed per second.
records-consumed-total
The total number of records consumed.
records-lag-max
The maximum lag in terms of number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.
records-lead-min
The minimum lead in terms of number of records for any partition in this window.
records-per-request-avg
The average number of records in each request.
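Since a rising records-lag-max is the signal to watch, a least-squares slope over recent samples is a simple way to tell a growing backlog from a short spike. This sketch assumes you sample records-lag-max periodically as (seconds, lag) pairs; the function names and the zero-slope default are illustrative.

```python
def lag_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, records_lag_max) samples,
    in records per second. Positive means the lag is growing."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n
    covariance = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples need distinct timestamps")
    return covariance / variance


def is_falling_behind(samples: list[tuple[float, float]], min_slope: float = 0.0) -> bool:
    """True if lag grows faster than min_slope records per second."""
    return lag_slope(samples) > min_slope


print(lag_slope([(0, 100), (60, 160), (120, 220)]))  # 1.0
```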
Topic-level fetch metrics
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}"
bytes-consumed-rate
The average number of bytes consumed per second for a specific topic.
bytes-consumed-total
The total number of bytes consumed for a specific topic.
fetch-size-avg
The average number of bytes fetched per request for a specific topic.
fetch-size-max
The maximum number of bytes fetched per request for a specific topic.
records-consumed-rate
The average number of records consumed per second for a specific topic.
records-consumed-total
The total number of records consumed for a specific topic.
records-per-request-avg
The average number of records in each request for a specific topic.
Partition-level fetch metrics
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"
preferred-read-replica
The current read replica for the partition, or -1 if reading from leader.
records-lag
The latest lag of the partition.
records-lag-avg
The average lag of the partition.
records-lag-max
The max lag of the partition.
records-lead
The latest lead of the partition.
records-lead-avg
The average lead of the partition.
records-lead-min
The min lead of the partition.
Consumer group metrics
MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
assigned-partitions
The number of partitions currently assigned to this consumer.
commit-latency-avg
The average time taken for a commit request.
commit-latency-max
The max time taken for a commit request.
commit-rate
The number of commit calls per second.
heartbeat-rate
The average number of heartbeats per second. After a rebalance, the consumer sends heartbeats to the coordinator to keep itself active in the group. You can control this using the heartbeat.interval.ms setting for the consumer. You may see a lower rate than configured if the processing loop is taking more time to handle message batches. Usually this is OK as long as you see no increase in the join rate.
heartbeat-response-time-max
The max time taken to receive a response to a heartbeat request.
join-rate
The number of group joins per second. Group joining is the first phase of the rebalance protocol. A large value indicates that the consumer group is unstable and will likely be coupled with increased lag.
join-time-avg
The average time taken for a group rejoin. This value can get as high as the configured session timeout for the consumer, but should usually be lower.
join-time-max
The max time taken for a group rejoin. This value should not get much higher than the configured session timeout for the consumer.
last-heartbeat-seconds-ago
The number of seconds since the last coordinator heartbeat.
sync-rate
The number of group syncs per second. Group synchronization is the second and last phase of the rebalance protocol. Similar to join-rate, a large value indicates group instability.
sync-time-avg
The average time taken for a group sync.
sync-time-max
The max time taken for a group sync.
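The join, sync, and join-time guidance above can be folded into one check. This sketch assumes you pass the consumer's configured session.timeout.ms; max_rate and the returned messages are hypothetical choices. With the default max_rate of 0, any join or sync activity in the current window is flagged, which you may want to relax during planned deployments.

```python
def group_issues(join_rate: float, sync_rate: float, join_time_max_ms: float,
                 session_timeout_ms: float, max_rate: float = 0.0) -> list[str]:
    """Flag consumer group instability from consumer-coordinator-metrics."""
    issues = []
    # Joins and syncs are the two phases of a rebalance; a stable group
    # should show little of either.
    if join_rate > max_rate or sync_rate > max_rate:
        issues.append("rebalancing: join-rate or sync-rate above threshold")
    # join-time-max should not get much higher than session.timeout.ms.
    if join_time_max_ms > session_timeout_ms:
        issues.append("join-time-max exceeds session.timeout.ms")
    return issues


print(group_issues(0.0, 0.0, join_time_max_ms=3_000, session_timeout_ms=45_000))  # []
```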
Global request metrics
MBean: kafka.consumer:type=consumer-metrics,client-id=([-.\w]+)
incoming-byte-rate
The average number of incoming bytes received per second from all servers.
outgoing-byte-rate
The average number of outgoing bytes sent per second to all servers.
request-latency-avg
The average request latency in ms.
request-latency-max
The maximum request latency in ms.
request-rate
The average number of requests sent per second.
response-rate
The average number of responses received per second.
Global connection metrics
MBean: kafka.consumer:type=consumer-metrics,client-id=([-.\w]+)
connection-count
The current number of active connections.
connection-creation-rate
New connections established per second in the window.
connection-close-rate
Connections closed per second in the window.
io-ratio
The fraction of time the I/O thread spent doing I/O.
io-time-ns-avg
The average length of time for I/O per select call in nanoseconds.
io-wait-ratio
The fraction of time the I/O thread spent waiting.
io-wait-time-ns-avg
The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.
select-rate
Number of times the I/O layer checked for new I/O to perform per second.
Per-broker metrics
MBean: kafka.consumer:type=consumer-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
Besides the global request metrics, the following metrics are also available per broker:
incoming-byte-rate
The average number of bytes received per second from the broker.
outgoing-byte-rate
The average number of bytes sent per second to the broker.
request-size-avg
The average size of all requests in the window for a broker.
request-size-max
The maximum size of any request sent in the window for a broker.
request-rate
The average number of requests sent per second to the broker.
response-rate
The average number of responses received per second from the broker.
Old consumer metrics
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)
Number of messages by which the consumer lags behind the producer.
kafka.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=([-.\w]+)
The minimum rate at which the consumer sends fetch requests to the broker. If a consumer is dead, this value drops to roughly 0.
kafka.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=([-.\w]+)
The throughput in messages consumed per second.
kafka.consumer:type=ConsumerTopicMetrics,name=BytesPerSec,clientId=([-.\w]+)
The throughput in bytes consumed per second.
The following metrics are available only on the high-level consumer:
kafka.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=([-.\w]+)
The rate at which this consumer commits offsets to Kafka. This is only relevant if offsets.storage=kafka.
kafka.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=([-.\w]+)
The rate at which this consumer commits offsets to ZooKeeper. This is only relevant if offsets.storage=zookeeper. Monitor this value if your ZooKeeper cluster is underperforming due to high write load.
kafka.consumer:type=ZookeeperConsumerConnector,name=RebalanceRateAndTime,clientId=([-.\w]+)
The rate and latency of the rebalance operation on this consumer.
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=([-.\w]+),groupId=([-.\w]+)
The number of partitions owned by this consumer.