Broker and Controller Metrics
This topic describes JMX metrics for Kafka brokers, KRaft mode, and controllers. These metrics are useful for monitoring the health and performance of your Kafka cluster.
For information about how to configure JMX, see Configure JMX for Monitoring.
Broker metrics
There are many metrics reported at the broker and controller level that can be monitored and used to troubleshoot issues with your cluster. At minimum, you should monitor and set alerts on ActiveControllerCount, OfflinePartitionsCount, and UncleanLeaderElectionsPerSec.
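The minimum alerting recommendation above can be sketched as a small rule check. This is only an illustration: `NodeSample` and `evaluate_alerts` are hypothetical names, and the sketch assumes the values have already been collected from each node by some JMX client, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One polling result from a single node (hypothetical structure)."""
    node_id: int
    active_controller_count: int
    offline_partitions_count: int
    unclean_leader_elections_per_sec: float


def evaluate_alerts(samples):
    """Return alert messages for the three baseline metrics."""
    alerts = []
    # Summed across the cluster, exactly one active controller is expected.
    active = sum(s.active_controller_count for s in samples)
    if active != 1:
        alerts.append(f"ActiveControllerCount sum is {active}, expected 1")
    for s in samples:
        # Any offline partition or unclean election should raise an alert.
        if s.offline_partitions_count > 0:
            alerts.append(
                f"node {s.node_id}: OfflinePartitionsCount="
                f"{s.offline_partitions_count}"
            )
        if s.unclean_leader_elections_per_sec > 0:
            alerts.append(
                f"node {s.node_id}: UncleanLeaderElectionsPerSec="
                f"{s.unclean_leader_elections_per_sec}"
            )
    return alerts
```

An empty list means none of the three baseline conditions fired for that polling round.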
AtMinIsr
- MBean:
kafka.cluster:type=Partition,topic={topic},name=AtMinIsr,partition={partition} The number of partitions whose in-sync replica count is equal to the minIsr value.
Bandwidth quota
- MBean:
kafka.server:type={Produce|Fetch},user={userName},client-id={clientId} Use the attributes of this metric to measure the bandwidth quota. This metric has the following attributes:
- throttle-time: the amount of time in milliseconds the client was throttled. Ideally = 0.
- byte-rate: the data produce/consume rate of the client in bytes/sec.
- For (user, client-id) quotas, specify both user and client-id.
- If a per-client-id quota is applied to the client, do not specify user.
- If a per-user quota is applied, do not specify client-id.
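The rules for which keys to specify can be sketched as a small name builder. `quota_mbean` is a hypothetical helper, not part of any Kafka tooling, and it simply follows the rules as stated above by leaving out the key that should not be specified. Verify the exact name your broker registers with a JMX browser before relying on it.

```python
def quota_mbean(quota_type, user=None, client_id=None):
    """Build a bandwidth quota MBean name for a Produce or Fetch quota."""
    if quota_type not in ("Produce", "Fetch"):
        raise ValueError("quota_type must be 'Produce' or 'Fetch'")
    if user is None and client_id is None:
        raise ValueError("specify user, client_id, or both")
    parts = [f"kafka.server:type={quota_type}"]
    # Include only the keys that match the kind of quota applied.
    if user is not None:
        parts.append(f"user={user}")
    if client_id is not None:
        parts.append(f"client-id={client_id}")
    return ",".join(parts)
```

For example, `quota_mbean("Fetch", client_id="app-1")` covers the per-client-id case, while passing both `user` and `client_id` covers the (user, client-id) case.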
BytesInPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={topicName} The incoming byte rate from clients, per topic. Omitting ‘topic={…}’ will yield the all-topic rate.
BytesOutPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={topicName} The outgoing byte rate to clients per topic. Omitting ‘topic={…}’ will yield the all-topic rate.
BytesRejectedPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={topicName} The rejected byte rate per topic, due to the record batch size being greater than the max.message.bytes configuration. Omitting ‘topic={…}’ will yield the all-topic rate.
clientSoftwareName/clientSoftwareVersion
kafka.server:clientSoftwareName=(name),clientSoftwareVersion=(version),listener=(listener),networkProcessor=(processor-index),type=(type) The name and version of client software in the brokers. For example, the Kafka 2.4 Java client produces the following MBean on the broker:
kafka.server:clientSoftwareName=apache-kafka-java,clientSoftwareVersion=2.4.0,listener=PLAINTEXT,networkProcessor=1,type=socket-server-metrics
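To read the client name and version out of an MBean name like the one above, the name can be split into its domain and key properties. The following is a minimal sketch with a hypothetical helper, `parse_mbean_name`. It assumes simple unquoted values and does not handle quoted values that contain commas or equals signs.

```python
def parse_mbean_name(name):
    """Split 'domain:key=value,...' into (domain, dict of key properties)."""
    domain, _, props = name.partition(":")
    properties = {}
    for pair in props.split(","):
        # Each key property has the form key=value.
        key, _, value = pair.partition("=")
        properties[key] = value
    return domain, properties


example = (
    "kafka.server:clientSoftwareName=apache-kafka-java,"
    "clientSoftwareVersion=2.4.0,listener=PLAINTEXT,"
    "networkProcessor=1,type=socket-server-metrics"
)
domain, props = parse_mbean_name(example)
print(domain, props["clientSoftwareName"], props["clientSoftwareVersion"])
```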
connection-count
kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-count The number of currently open connections to the broker.
connection-creation-rate
kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-creation-rate The number of new connections established per second.
consumer-lag-offsets
kafka.server:type=tenant-metrics,member={mbrId},topic={tpcName},consumer-group={gpName},partition={Id},client-id={cliId},group-protocol={grpProtocol} Attribute: consumer-lag-offsets This metric is the difference between the last offset stored by the broker and the last committed offset for a specific consumer group name, client ID, member ID, partition ID, topic name, and group protocol. The group protocol specifies the rebalance protocol used by the consumer group, currently either classic or consumer. For more information about the rebalance protocols, see Consumer Rebalance Protocols. This metric provides the consumer lag in offsets only and does not report latency. In addition, it is not reported for groups that are not alive or are empty.
To enable this metric, you must set the following server properties.
confluent.consumer.lag.emitter.enabled=true # default is false
confluent.consumer.lag.emitter.interval.ms=60000 # default is 60000
For more information about this metric, see Monitor Consumer Lag in Confluent Platform.
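The lag definition used by consumer-lag-offsets (last offset stored by the broker minus last committed offset) can be sketched as plain arithmetic. The function names are hypothetical, and clamping negative results to zero is an assumption of this sketch rather than documented broker behavior.

```python
def consumer_lag_offsets(last_stored_offset, last_committed_offset):
    """Lag in offsets for one partition of one consumer group."""
    # Clamping at zero is an assumption made for this illustration.
    return max(last_stored_offset - last_committed_offset, 0)


def total_group_lag(partitions):
    """Sum lag over partitions.

    `partitions` maps a partition id to a (stored, committed) offset pair.
    """
    return sum(
        consumer_lag_offsets(stored, committed)
        for stored, committed in partitions.values()
    )
```

The result is a count of offsets, not a duration, which matches the note above that this metric does not report latency.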
ConsumerLag
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic={topicName},partition=([0-9]+) The lag in number of messages per follower replica. This is useful to know if the replica is slow or has stopped replicating from the leader and if the associated brokers need to be removed from the In-Sync Replicas list.
CurrentControllerId
- MBean:
kafka.server:type=MetadataLoader,name=CurrentControllerId Outputs the ID of the current controller, or -1 if none is known. Reports the current controller ID on broker and controller nodes.
DelayQueueSize
- MBean:
kafka.server:type=Produce,name=DelayQueueSize The number of producer clients currently being throttled. The value can be any number greater than or equal to 0.
Important
For monitoring quota applications and throttled clients, use the Bandwidth quota and Request quota metrics.
ElectionRateAndTimeMs
- MBean:
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs The broker leader election rate and latency in milliseconds. This is non-zero when there are broker failures.
FailedFetchRequestsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec The fetch request rate for requests that failed.
FailedProduceRequestsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec The produce request rate for requests that failed.
InSyncReplicasCount
- MBean:
kafka.cluster:type=Partition,topic={topic},name=InSyncReplicasCount,partition={partition} A gauge metric that indicates the in-sync replica count per topic partition leader.
InvalidMagicNumberRecordsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=InvalidMagicNumberRecordsPerSec The message validation failure rate due to an invalid magic number. This should be 0.
InvalidMessageCrcRecordsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=InvalidMessageCrcRecordsPerSec The message validation failure rate due to an incorrect CRC checksum.
InvalidOffsetOrSequenceRecordsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=InvalidOffsetOrSequenceRecordsPerSec The message validation failure rate due to non-continuous offset or sequence number in batch. Normally this should be 0.
IsrExpandsPerSec
- MBean:
kafka.server:type=ReplicaManager,name=IsrExpandsPerSec Measures the expansion of in-sync replicas per second. When a broker is brought up after a failure, it starts catching up by reading from the leader. Once it is caught up, it gets added back to the ISR.
IsrShrinksPerSec
- MBean:
kafka.server:type=ReplicaManager,name=IsrShrinksPerSec Measures the reduction of in-sync replicas per second. If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0.
LeaderCount
- MBean:
kafka.server:type=ReplicaManager,name=LeaderCount The number of leaders on this broker. This should be mostly even across all brokers. If not, set auto.leader.rebalance.enable to true on all brokers in the cluster.
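One way to quantify whether leaders are "mostly even" is to compare the spread of per-broker LeaderCount values to their mean. The following is a minimal sketch. `leader_imbalance` is a hypothetical helper, and what ratio counts as uneven is left to the operator.

```python
def leader_imbalance(leader_counts):
    """Return (max - min) / mean of per-broker LeaderCount values.

    `leader_counts` maps broker id to its LeaderCount reading.
    0.0 means leadership is spread perfectly evenly.
    """
    counts = list(leader_counts.values())
    mean = sum(counts) / len(counts)
    return 0.0 if mean == 0 else (max(counts) - min(counts)) / mean
```

The same calculation could be applied to PartitionCount, which is also expected to be mostly even across brokers.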
linux-disk-read-bytes
- MBean:
kafka.server:type=KafkaServer,name=linux-disk-read-bytes The total number of bytes read by the broker process, including reads from all disks. The total doesn’t include reads from page cache. Available only on Linux-based systems.
linux-disk-write-bytes
- MBean:
kafka.server:type=KafkaServer,name=linux-disk-write-bytes The total number of bytes written by the broker process, including writes from all disks. Available only on Linux-based systems.
MessageConversionsPerSec
kafka.server:type=BrokerTopicMetrics,name={Produce|Fetch}MessageConversionsPerSec,topic=([-.\w]+) The message format conversion rate, for Produce or Fetch requests, per topic. Omitting ‘topic={…}’ will yield the all-topic rate.
MessagesInPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={topicName} The incoming message rate per topic. Omitting ‘topic={…}’ will yield the all-topic rate.
NoKeyCompactedTopicRecordsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=NoKeyCompactedTopicRecordsPerSec The message validation failure rate due to no key specified for compacted topic. This should be 0.
PartitionCount
- MBean:
kafka.server:type=ReplicaManager,name=PartitionCount The number of partitions on this broker. This should be mostly even across all brokers.
PartitionsWithLateTransactionsCount
- MBean:
kafka.server:type=ReplicaManager,name=PartitionsWithLateTransactionsCount The number of partitions that have open transactions with durations exceeding the transaction.max.timeout.ms property value set on the broker.
PurgatorySize (fetch)
- MBean:
kafka.server:type=DelayedOperationPurgatory,delayedOperation=Fetch,name=PurgatorySize The number of requests waiting in the fetch purgatory. This is high if consumers use a large value for fetch.max.wait.ms.
PurgatorySize (produce)
- MBean:
kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize The number of requests waiting in the producer purgatory. This should be non-zero when acks=all is used on the producer.
ReassigningPartitions
- MBean:
kafka.server:type=ReplicaManager,name=ReassigningPartitions The number of reassigning partitions.
ReassignmentBytesInPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec The incoming byte rate of reassignment traffic.
ReplicasCount
- MBean:
kafka.cluster:type=Partition,topic={topic},name=ReplicasCount,partition={partition} A gauge metric that indicates the replica count per topic partition leader.
ReplicationBytesInPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec,topic={topicName} The incoming byte rate from other brokers per topic. Omitting ‘topic={…}’ will yield the all-topic rate.
ReplicationBytesOutPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec The byte-out rate to other brokers.
Request quota
- MBean:
kafka.server:type=Request,user={userName},client-id={clientId} Use the attributes of this metric to measure request quota. This metric has the following attributes:
- throttle-time: the amount of time in milliseconds the client was throttled. Ideally = 0.
- request-time: the percentage of time spent in broker network and I/O threads to process requests from client group.
- For (user, client-id) quotas, specify both user and client-id.
- If a per-client-id quota is applied to the client, do not specify user.
- If a per-user quota is applied, do not specify client-id.
RequestHandlerAvgIdlePercent
- MBean:
kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent The average fraction of time the request handler threads are idle. Values are between 0, meaning all resources are used, and 1, meaning all resources are available.
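Because the value is an idle fraction, utilization is its complement. The sketch below shows the conversion and a simple saturation check. Both function names and the `min_idle` default of 0.2 are hypothetical choices for illustration, not recommended values from this document.

```python
def request_handler_utilization(avg_idle_fraction):
    """Convert RequestHandlerAvgIdlePercent (0..1) into a busy fraction."""
    if not 0.0 <= avg_idle_fraction <= 1.0:
        raise ValueError("idle fraction must be between 0 and 1")
    return 1.0 - avg_idle_fraction


def is_saturated(avg_idle_fraction, min_idle=0.2):
    """Flag the handler pool when idle time drops below `min_idle`.

    The default threshold is an assumption made for this sketch.
    """
    return avg_idle_fraction < min_idle
```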
TotalFetchRequestsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec The fetch request rate per second.
TotalProduceRequestsPerSec
- MBean:
kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec The produce request rate per second.
UncleanLeaderElectionsPerSec
- MBean:
kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec The unclean broker leader election rate. Should be 0.
UnderMinIsr
- MBean:
kafka.cluster:type=Partition,topic={topic},name=UnderMinIsr,partition={partition} The number of partitions whose in-sync replicas count is less than minIsr. These partitions will be unavailable to producers who use acks=all.
UnderMinIsrPartitionCount
- MBean:
kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount The number of partitions whose in-sync replicas count is less than minIsr.
UnderReplicated
- MBean:
kafka.cluster:type=Partition,topic={topic},name=UnderReplicated,partition={partition} The number of partitions that are under-replicated, meaning the number of in-sync replicas is less than the replica count.
UnderReplicatedPartitions
- MBean:
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions The number of under-replicated partitions (| ISR | < | current replicas |). Replicas that are added as part of a reassignment will not count toward this value. Alert if the value is greater than 0.
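The conditions behind AtMinIsr, UnderMinIsr, and UnderReplicated differ only in what the in-sync replica count is compared against. The following sketch makes the comparisons explicit for a single partition. `partition_isr_state` is a hypothetical helper that takes the three counts as plain integers.

```python
def partition_isr_state(isr_count, replica_count, min_isr):
    """Return the per-partition flags described by the ISR metrics."""
    return {
        # In-sync replica count equals the minIsr value.
        "AtMinIsr": isr_count == min_isr,
        # In-sync replica count is below minIsr; acks=all producers fail.
        "UnderMinIsr": isr_count < min_isr,
        # Fewer in-sync replicas than assigned replicas.
        "UnderReplicated": isr_count < replica_count,
    }
```

With three replicas and a minIsr of 2, a partition with two in-sync replicas is both at its minimum ISR and under-replicated, but still writable with acks=all. Losing one more replica makes it UnderMinIsr.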
KRaft broker metrics
These metrics are only produced for a broker when running in KRaft mode.
last-applied-record-lag-ms
- MBean:
kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the broker.
last-applied-record-offset
- MBean:
kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset The offset of the last record from the cluster metadata partition that was applied by the broker.
last-applied-record-timestamp
- MBean:
kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp The timestamp of the last record from the cluster metadata partition that was applied by the broker.
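The relationship between last-applied-record-timestamp and last-applied-record-lag-ms can be sketched as a subtraction from the current time. The sketch assumes the timestamp is expressed in milliseconds since the Unix epoch, which this document does not state explicitly. The function name mirrors the metric but is otherwise hypothetical.

```python
import time


def last_applied_record_lag_ms(last_applied_timestamp_ms, now_ms=None):
    """Difference between now and the last applied record's timestamp."""
    if now_ms is None:
        # Assumes epoch milliseconds, matching the timestamp's unit.
        now_ms = int(time.time() * 1000)
    return now_ms - last_applied_timestamp_ms
```

Passing `now_ms` explicitly makes the calculation reproducible when replaying recorded samples.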
metadata-apply-error-count
- MBean:
kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.
metadata-load-error-count
- MBean:
kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.
KRaft Quorum metrics
The following metrics allow monitoring of the KRaft quorum and metadata log. These metrics are reported on both controllers and brokers in a KRaft cluster.
CurrentMetadataVersion
- MBean:
kafka.server:type=MetadataLoader,name=CurrentMetadataVersion Outputs the feature level of the current effective metadata version.
HandleLoadSnapshotCount
- MBean:
kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount The total number of times we have loaded a KRaft snapshot since the process was started.
The following are attributes of the kafka.server:type=raft-metrics MBean:
append-records-rate
- MBean:
kafka.server:type=raft-metrics Attribute: append-records-rate The average number of records appended per second by the leader of the raft quorum.
commit-latency-avg
- MBean:
kafka.server:type=raft-metrics Attribute: commit-latency-avg The average time in milliseconds to commit an entry in the raft log.
commit-latency-max
- MBean:
kafka.server:type=raft-metrics Attribute: commit-latency-max The maximum time in milliseconds to commit an entry in the raft log.
current-epoch
- MBean:
kafka.server:type=raft-metrics Attribute: current-epoch The current quorum epoch.
current-leader
- MBean:
kafka.server:type=raft-metrics Attribute: current-leader The current quorum leader’s id; -1 indicates unknown.
current-state
- MBean:
kafka.server:type=raft-metrics Attribute: current-state The current state of this member; possible values are leader, candidate, voted, follower, unattached, observer.
current-vote
- MBean:
kafka.server:type=raft-metrics Attribute: current-vote The current voted leader’s id; -1 indicates not voted for anyone.
election-latency-avg
- MBean:
kafka.server:type=raft-metrics Attribute: election-latency-avg The average time in milliseconds spent on electing a new leader.
election-latency-max
- MBean:
kafka.server:type=raft-metrics Attribute: election-latency-max The maximum time in milliseconds spent on electing a new leader.
fetch-records-rate
- MBean:
kafka.server:type=raft-metrics Attribute: fetch-records-rate The average number of records fetched from the leader of the raft quorum.
high-watermark
- MBean:
kafka.server:type=raft-metrics Attribute: high-watermark The high watermark maintained on this member; -1 if it is unknown.
log-end-epoch
- MBean:
kafka.server:type=raft-metrics Attribute: log-end-epoch The current raft log end epoch.
log-end-offset
- MBean:
kafka.server:type=raft-metrics Attribute: log-end-offset The current raft log end offset.
number-unknown-voter-connections
- MBean:
kafka.server:type=raft-metrics Attribute: number-unknown-voter-connections The number of unknown voters whose connection information is not cached. The value of this metric is always 0.
poll-idle-ratio-avg
- MBean:
kafka.server:type=raft-metrics Attribute: poll-idle-ratio-avg The average fraction of time the client’s poll() is idle as opposed to waiting for the user code to process records.
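Since the raft-metrics attributes are reported on every node, comparing them across nodes gives a quick view of quorum agreement. The following is a minimal sketch, assuming the attribute values have already been read from each node into a dictionary. `quorum_issues` and the input shape are hypothetical, and the checks are illustrative rather than an exhaustive health test.

```python
def quorum_issues(nodes):
    """Compare raft-metrics attributes across nodes.

    `nodes` maps a node id to a dict holding the current-leader,
    current-epoch, and current-state attribute values for that node.
    """
    issues = []
    leaders = {m["current-leader"] for m in nodes.values()}
    epochs = {m["current-epoch"] for m in nodes.values()}
    # -1 means the node does not know the current leader.
    if -1 in leaders:
        issues.append("at least one node reports an unknown leader (-1)")
    if len(leaders) > 1:
        issues.append(f"nodes disagree on current-leader: {sorted(leaders)}")
    if len(epochs) > 1:
        issues.append(f"nodes disagree on current-epoch: {sorted(epochs)}")
    in_leader_state = [
        node_id for node_id, m in nodes.items()
        if m["current-state"] == "leader"
    ]
    if len(in_leader_state) != 1:
        issues.append(
            f"expected one node in leader state, found {len(in_leader_state)}"
        )
    return issues
```

Brief disagreement during an election is expected, so a check like this is more useful when it fails repeatedly over several polling intervals.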
LatestSnapshotGeneratedAgeMs
- MBean:
kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs The interval in milliseconds since the latest snapshot that the node has generated. If no snapshot has been generated yet, this is the approximate time delta since the process was started.
LatestSnapshotGeneratedBytes
- MBean:
kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes The total size in bytes of the latest snapshot that the node has generated. If a snapshot has not been generated yet, this is the size of the latest snapshot that was loaded. If no snapshots have been generated or loaded, this is 0.
Controller metrics
The following metrics are exposed by a controller. For more about monitoring KRaft, see Monitor KRaft.
ActiveBrokerCount
- MBean:
kafka.controller:type=KafkaController,name=ActiveBrokerCount The number of active brokers as observed by this controller.
ActiveControllerCount
- MBean:
kafka.controller:type=KafkaController,name=ActiveControllerCount The number of active controllers in the cluster. Valid values are ‘0’ or ‘1’. Alert if the aggregated sum across all brokers in the cluster is anything other than 1 because there should be exactly one controller per cluster.
EventQueueOperationsStartedCount
- MBean:
kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount For KRaft mode, the total number of controller event queue operations that were started. This includes deferred operations.
EventQueueOperationsTimedOutCount
- MBean:
kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount For KRaft mode, the total number of controller event queue operations that timed out before they could be performed.
EventQueueProcessingTimeMs
- MBean:
kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs A histogram of the time in milliseconds that requests spent being processed in the controller event queue.
EventQueueSize
- MBean:
kafka.controller:type=ControllerEventManager,name=EventQueueSize Size of the controller’s event queue.
EventQueueTimeMs
- MBean:
kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs Time that an event (except the Idle event) waits, in milliseconds, in the controller event queue before being processed.
FencedBrokerCount
- MBean:
kafka.controller:type=KafkaController,name=FencedBrokerCount In KRaft mode, the number of fenced but registered brokers as observed by this controller.
GlobalPartitionCount
- MBean:
kafka.controller:type=KafkaController,name=GlobalPartitionCount The number of partitions across all topics in the cluster.
GlobalTopicCount
- MBean:
kafka.controller:type=KafkaController,name=GlobalTopicCount The number of global topics as observed by this Controller.
LastAppliedRecordLagMs
- MBean:
kafka.controller:type=KafkaController,name=LastAppliedRecordLagMs The difference, in milliseconds, between now and the timestamp of the last record from the cluster metadata partition that was applied by the controller. For active controllers the value of this lag is always zero.
LastAppliedRecordOffset
- MBean:
kafka.controller:type=KafkaController,name=LastAppliedRecordOffset The offset of the last record from the cluster metadata partition that was applied by the Controller.
LastAppliedRecordTimestamp
- MBean:
kafka.controller:type=KafkaController,name=LastAppliedRecordTimestamp The timestamp of the last record from the cluster metadata partition that was applied by the controller.
LastCommittedRecordOffset
- MBean:
kafka.controller:type=KafkaController,name=LastCommittedRecordOffset The offset of the last record committed to this Controller.
MetadataErrorCount
- MBean:
kafka.controller:type=KafkaController,name=MetadataErrorCount The number of times this controller node has encountered an error during metadata log processing.
NewActiveControllersCount
- MBean:
kafka.controller:type=KafkaController,name=NewActiveControllersCount For KRaft mode, counts the number of times this node has seen a new controller elected. A transition to the “no leader” state is not counted here. If the same controller as before becomes active, that still counts.
OfflinePartitionsCount
- MBean:
kafka.controller:type=KafkaController,name=OfflinePartitionsCount,partition={partition} The number of partitions that don’t have an active leader and are therefore not writable or readable. Alert if value is greater than 0.
PreferredReplicaImbalanceCount
- MBean:
kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount The count of topic partitions for which the leader is not the preferred leader.
ReplicasIneligibleToDeleteCount
- MBean:
kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount The number of ineligible pending replica deletes.
ReplicasToDeleteCount
- MBean:
kafka.controller:type=KafkaController,name=ReplicasToDeleteCount Pending replica deletes.
TimedOutBrokerHeartbeatCount
- MBean:
kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount For KRaft mode, the number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric.
TopicsIneligibleToDeleteCount
- MBean:
kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount Ineligible pending topic deletes.
TopicsToDeleteCount
- MBean:
kafka.controller:type=KafkaController,name=TopicsToDeleteCount Pending topic deletes.