Broker and Controller Metrics

This topic describes JMX metrics for Kafka brokers, KRaft mode, and controllers. These metrics are useful for monitoring the health and performance of your Kafka cluster.

For information about how to configure JMX, see Configure JMX for Monitoring.

Broker metrics

There are many metrics reported at the broker and controller level that can be monitored and used to troubleshoot issues with your cluster. At minimum, you should monitor and set alerts on ActiveControllerCount, OfflinePartitionsCount, and UncleanLeaderElectionsPerSec.
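As a rough illustration of that minimum alert set, the sketch below checks the three metrics against the thresholds given in their entries on this page. The function name, its parameters, and the message strings are hypothetical, and it assumes the values have already been collected from JMX by some other means.

```python
def minimum_alerts(active_controller_counts, offline_partitions_count,
                   unclean_elections_per_sec):
    """Return alert messages for the three minimum metrics.

    active_controller_counts: ActiveControllerCount values, one per node.
    offline_partitions_count: OfflinePartitionsCount value.
    unclean_elections_per_sec: UncleanLeaderElectionsPerSec rate.
    """
    alerts = []
    # The sum across the cluster should be exactly 1.
    total_active = sum(active_controller_counts)
    if total_active != 1:
        alerts.append("ActiveControllerCount sum is %d, expected 1" % total_active)
    # Any offline partition is neither readable nor writable.
    if offline_partitions_count > 0:
        alerts.append("OfflinePartitionsCount is %d, expected 0" % offline_partitions_count)
    # The unclean leader election rate should be 0.
    if unclean_elections_per_sec > 0:
        alerts.append("UncleanLeaderElectionsPerSec is above 0")
    return alerts
```

For example, `minimum_alerts([0, 1, 0], 0, 0.0)` returns an empty list for a healthy cluster.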

AtMinIsr

MBean: kafka.cluster:type=Partition,topic={topic},name=AtMinIsr,partition={partition}

The number of partitions whose in-sync replicas count is equal to the minIsr value.

Bandwidth quota

MBean: kafka.server:type={Produce|Fetch},user={userName},client-id={clientId}

Use the attributes of this metric to measure the bandwidth quota. This metric has the following attributes:

  • throttle-time: the amount of time in milliseconds the client was throttled. Ideally, this is 0.

  • byte-rate: the data produce/consume rate of the client in bytes/sec.

    • For (user, client-id) quotas, specify both user and client-id.

    • If a per-client-id quota is applied to the client, do not specify user.

    • If a per-user quota is applied, do not specify client-id.
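The rules above for including user and client-id can be sketched as a small helper that assembles the MBean name. This is only an illustration: the function name is hypothetical, it performs no JMX quoting or escaping of special characters, and it accepts Request as a type on the assumption that the Request quota MBean described later on this page follows the same pattern.

```python
def quota_mbean(quota_type, user=None, client_id=None):
    """Build a quota MBean name from the tags that apply to the quota.

    Pass both user and client_id for (user, client-id) quotas,
    only client_id for per-client-id quotas, only user for per-user quotas.
    """
    if quota_type not in ("Produce", "Fetch", "Request"):
        raise ValueError("unexpected quota type: %r" % quota_type)
    if user is None and client_id is None:
        raise ValueError("specify user, client_id, or both")
    parts = ["type=" + quota_type]
    if user is not None:
        parts.append("user=" + user)
    if client_id is not None:
        parts.append("client-id=" + client_id)
    return "kafka.server:" + ",".join(parts)
```

With hypothetical names, `quota_mbean("Fetch", client_id="app-1")` yields `kafka.server:type=Fetch,client-id=app-1`.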

BytesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={topicName}

The incoming byte rate from clients, per topic. Omitting ‘topic={…}’ will yield the all-topic rate.
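To make the effect of omitting the topic tag concrete, here is a minimal sketch that builds either the per-topic or the all-topic MBean name. The helper name is hypothetical, and the same pattern is assumed to apply to the other BrokerTopicMetrics entries on this page that accept a topic tag.

```python
def broker_topic_mbean(name, topic=None):
    """Return the BrokerTopicMetrics MBean name for a metric.

    With topic=None the topic tag is omitted, which selects the all-topic rate.
    """
    base = "kafka.server:type=BrokerTopicMetrics,name=" + name
    if topic is None:
        return base
    return base + ",topic=" + topic
```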

BytesOutPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={topicName}

The outgoing byte rate to clients per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

BytesRejectedPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={topicName}

The rejected byte rate per topic, due to the record batch size being greater than max.message.bytes configuration. Omitting ‘topic={…}’ will yield the all-topic rate.

clientSoftwareName/clientSoftwareVersion

MBean: kafka.server:clientSoftwareName=(name),clientSoftwareVersion=(version),listener=(listener),networkProcessor=(processor-index),type=(type)

The name and version of client software in the brokers. For example, the Kafka 2.4 Java client produces the following MBean on the broker:

kafka.server:clientSoftwareName=apache-kafka-java,clientSoftwareVersion=2.4.0,listener=PLAINTEXT,networkProcessor=1,type=socket-server-metrics

connection-count

MBean: kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-count

The number of currently open connections to the broker.

connection-creation-rate

MBean: kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-creation-rate

The number of new connections established per second.

consumer-lag-offsets

MBean: kafka.server:type=tenant-metrics,member={mbrId},topic={tpcName},consumer-group={gpName},partition={Id},client-id={cliId},group-protocol={grpProtocol}
Attribute: consumer-lag-offsets

This metric is the difference between the last offset stored by the broker and the last committed offset for a specific consumer group name, client ID, member ID, partition ID, topic name, and group protocol. The group protocol specifies the rebalance protocol used by the consumer group, currently either classic or consumer. For more information about the rebalance protocols, see Consumer Rebalance Protocols. This metric provides the consumer lag in offsets only and does not report latency. In addition, it is not reported for any groups that are not alive or are empty.

To enable this metric, you must set the following server properties.

# Default is false.
confluent.consumer.lag.emitter.enabled=true
# Default is 60000.
confluent.consumer.lag.emitter.interval.ms=60000

For more information about this metric, see Monitor Consumer Lag in Confluent Platform.
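As a small worked example of the lag definition in this entry, the sketch below subtracts the last committed offset from the last offset stored by the broker for each partition. The function name and the offset values are made up for illustration; the broker computes the real metric itself.

```python
def consumer_lag_offsets(last_stored_offset, last_committed_offset):
    """Lag in offsets for one partition of one consumer group."""
    return last_stored_offset - last_committed_offset

# Hypothetical offsets for three partitions: (last stored, last committed).
partitions = {0: (1500, 1420), 1: (980, 980), 2: (2210, 2000)}

lag_by_partition = {
    partition: consumer_lag_offsets(stored, committed)
    for partition, (stored, committed) in partitions.items()
}
# Summing per-partition lag gives one number for the group on this topic.
total_lag = sum(lag_by_partition.values())
```

Here the per-partition lags are 80, 0, and 210 offsets, for a total of 290. As the entry notes, this says nothing about latency.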

ConsumerLag

MBean: kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic={topicName},partition=([0-9]+)

The lag in number of messages per follower replica. This is useful to know if the replica is slow or has stopped replicating from the leader and if the associated brokers need to be removed from the In-Sync Replicas list.

CurrentControllerId

MBean: kafka.server:type=MetadataLoader,name=CurrentControllerId

Outputs the ID of the current controller, or -1 if none is known. Reports the current controller ID on broker and controller nodes.

DelayQueueSize

MBean: kafka.server:type=Produce,name=DelayQueueSize

The number of producer clients currently being throttled. The value can be any number greater than or equal to 0.

Important

For monitoring quota applications and throttled clients, use the Bandwidth quota and Request quota metrics.

LeaderElectionRateAndTimeMs

MBean: kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs

The broker leader election rate and latency in milliseconds. This is non-zero when there are broker failures.

FailedFetchRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec

The fetch request rate for requests that failed.

FailedProduceRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec

The produce request rate for requests that failed.

InSyncReplicasCount

MBean: kafka.cluster:type=Partition,topic={topic},name=InSyncReplicasCount,partition={partition}

A gauge metric that indicates the in-sync replica count per topic partition leader.

InvalidMagicNumberRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=InvalidMagicNumberRecordsPerSec

The message validation failure rate due to an invalid magic number. This should be 0.

InvalidMessageCrcRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=InvalidMessageCrcRecordsPerSec

The message validation failure rate due to an incorrect CRC checksum.

InvalidOffsetOrSequenceRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=InvalidOffsetOrSequenceRecordsPerSec

The message validation failure rate due to non-continuous offset or sequence number in batch. Normally this should be 0.

IsrExpandsPerSec

MBean: kafka.server:type=ReplicaManager,name=IsrExpandsPerSec

Measures the expansion of in-sync replicas per second. When a broker is brought up after a failure, it starts catching up by reading from the leader. Once it is caught up, it gets added back to the ISR.

IsrShrinksPerSec

MBean: kafka.server:type=ReplicaManager,name=IsrShrinksPerSec

Measures the reduction of in-sync replicas per second. If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0.

LeaderCount

MBean: kafka.server:type=ReplicaManager,name=LeaderCount

The number of leaders on this broker. This should be mostly even across all brokers. If not, set auto.leader.rebalance.enable to true on all brokers in the cluster.
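One way to judge whether leadership is "mostly even" is to compare the spread of LeaderCount values with their mean. The sketch below is an assumption about how you might do that, not a Kafka feature; the function name and the idea of a relative spread are hypothetical, and the point at which the spread becomes a problem is left to you.

```python
def leader_skew(leader_counts):
    """Return (max - min) / mean of LeaderCount across brokers.

    leader_counts maps a broker ID to its LeaderCount value.
    0.0 means leadership is perfectly even.
    """
    counts = list(leader_counts.values())
    if not counts or sum(counts) == 0:
        return 0.0
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean
```

For example, `leader_skew({1: 150, 2: 100, 3: 50})` returns 1.0, which suggests a leader rebalance is worth considering.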

linux-disk-read-bytes

MBean: kafka.server:type=KafkaServer,name=linux-disk-read-bytes

The total number of bytes read by the broker process, including reads from all disks. The total doesn’t include reads from page cache. Available only on Linux-based systems.

linux-disk-write-bytes

MBean: kafka.server:type=KafkaServer,name=linux-disk-write-bytes

The total number of bytes written by the broker process, including writes from all disks. Available only on Linux-based systems.

MessageConversionsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name={Produce|Fetch}MessageConversionsPerSec,topic=([-.\w]+)

The message format conversion rate, for Produce or Fetch requests, per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

MessagesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={topicName}

The incoming message rate per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

NoKeyCompactedTopicRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=NoKeyCompactedTopicRecordsPerSec

The message validation failure rate due to no key specified for compacted topic. This should be 0.

PartitionCount

MBean: kafka.server:type=ReplicaManager,name=PartitionCount

The number of partitions on this broker. This should be mostly even across all brokers.

PartitionsWithLateTransactionsCount

MBean: kafka.server:type=ReplicaManager,name=PartitionsWithLateTransactionsCount

The number of partitions that have open transactions with durations exceeding the transaction.max.timeout.ms property value set on the broker.

PurgatorySize (fetch)

MBean: kafka.server:type=DelayedOperationPurgatory,delayedOperation=Fetch,name=PurgatorySize

The number of requests waiting in the fetch purgatory. This is high if consumers use a large value for fetch.wait.max.ms.

PurgatorySize (produce)

MBean: kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize

The number of requests waiting in the producer purgatory. This should be non-zero when acks=all is used on the producer.

ReassigningPartitions

MBean: kafka.server:type=ReplicaManager,name=ReassigningPartitions

The number of reassigning partitions.

ReassignmentBytesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec

The incoming byte rate of reassignment traffic.

ReplicasCount

MBean: kafka.cluster:type=Partition,topic={topic},name=ReplicasCount,partition={partition}

A gauge metric that indicates the replica count per topic partition leader.

ReplicationBytesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec,topic={topicName}

The incoming byte rate from other brokers per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

ReplicationBytesOutPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec

The outgoing byte rate to other brokers.

Request quota

MBean: kafka.server:type=Request,user={userName},client-id={clientId}

Use the attributes of this metric to measure request quota. This metric has the following attributes:

  • throttle-time: the amount of time in milliseconds the client was throttled. Ideally, this is 0.

  • request-time: the percentage of time spent in broker network and I/O threads to process requests from client group.

    • For (user, client-id) quotas, specify both user and client-id.

    • If a per-client-id quota is applied to the client, do not specify user.

    • If a per-user quota is applied, do not specify client-id.

RequestHandlerAvgIdlePercent

MBean: kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent

The average fraction of time the request handler threads are idle. Values range from 0 (all resources are in use) to 1 (all resources are available).

TotalFetchRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec

The fetch request rate per second.

TotalProduceRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec

The produce request rate per second.

UncleanLeaderElectionsPerSec

MBean: kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec

The unclean broker leader election rate. Should be 0.

UnderMinIsr

MBean: kafka.cluster:type=Partition,topic={topic},name=UnderMinIsr,partition={partition}

The number of partitions whose in-sync replicas count is less than minIsr. These partitions will be unavailable to producers who use acks=all.
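The relationship between the in-sync replica count and minIsr that the AtMinIsr and UnderMinIsr entries describe can be sketched as a simple comparison. The function name and the returned labels are hypothetical.

```python
def isr_state(isr_count, min_isr):
    """Classify a partition by comparing its ISR count with minIsr."""
    if isr_count < min_isr:
        # Producers that use acks=all cannot write to this partition.
        return "under"
    if isr_count == min_isr:
        # Losing one more in-sync replica would put it under minIsr.
        return "at"
    return "above"
```

For a partition with minIsr set to 2, an ISR count of 1 is "under", 2 is "at", and 3 is "above".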

UnderMinIsrPartitionCount

MBean: kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount

The number of partitions whose in-sync replicas count is less than minIsr.

UnderReplicated

MBean: kafka.cluster:type=Partition,topic={topic},name=UnderReplicated,partition={partition}

The number of partitions that are under-replicated, meaning the number of in-sync replicas is less than the replica count.

UnderReplicatedPartitions

MBean: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions

The number of under-replicated partitions (| ISR | < | current replicas |). Replicas that are added as part of a reassignment will not count toward this value. Alert if the value is greater than 0.
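The condition above can be read as the following sketch, which is one interpretation of the description and not the broker's actual code. The dictionary keys ("replicas", "adding", "isr") and the function name are hypothetical.

```python
def count_under_replicated(partitions):
    """Count partitions whose ISR is smaller than the current replica set.

    Each partition is a dict with:
      "replicas": all assigned replica broker IDs
      "adding":   replicas being added by a reassignment (optional)
      "isr":      in-sync replica broker IDs
    """
    count = 0
    for p in partitions:
        # Replicas still being added by a reassignment are not counted.
        current = set(p["replicas"]) - set(p.get("adding", ()))
        if len(p["isr"]) < len(current):
            count += 1
    return count

# Hypothetical cluster state: only the second partition is under-replicated.
example = [
    {"replicas": [1, 2, 3], "isr": [1, 2, 3]},
    {"replicas": [1, 2, 3], "isr": [1, 2]},
    {"replicas": [1, 2, 3, 4], "adding": [4], "isr": [1, 2, 3]},
]
under_replicated = count_under_replicated(example)
```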

KRaft broker metrics

These metrics are only produced for a broker when running in KRaft mode.

last-applied-record-lag-ms

MBean: kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms

The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the broker.
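In arithmetic terms, the lag is the current time minus the timestamp of the last applied record. The sketch below assumes both values are in milliseconds since the Unix epoch, which this page does not state explicitly; the function and parameter names are hypothetical.

```python
import time

def last_applied_record_lag_ms(last_applied_timestamp_ms, now_ms=None):
    """Return now minus the last applied record timestamp, in milliseconds."""
    if now_ms is None:
        # Assumption: timestamps are epoch milliseconds.
        now_ms = int(time.time() * 1000)
    return now_ms - last_applied_timestamp_ms
```

For example, a record stamped 250 ms before the current time gives a lag of 250.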

last-applied-record-offset

MBean: kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset

The offset of the last record from the cluster metadata partition that was applied by the broker.

last-applied-record-timestamp

MBean: kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp

The timestamp of the last record from the cluster metadata partition that was applied by the broker.

metadata-apply-error-count

MBean: kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count

The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.

metadata-load-error-count

MBean: kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count

The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.

KRaft Quorum metrics

These metrics allow monitoring of the KRaft quorum and metadata log. They are reported on both controllers and brokers in a KRaft cluster.

CurrentMetadataVersion

MBean: kafka.server:type=MetadataLoader,name=CurrentMetadataVersion

Outputs the feature level of the current effective metadata version.

HandleLoadSnapshotCount

MBean: kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount

The total number of times a KRaft snapshot has been loaded since the process was started.

The following are attributes of the kafka.server:type=raft-metrics MBean:

append-records-rate

MBean: kafka.server:type=raft-metrics
Attribute: append-records-rate

The average number of records appended per second by the leader of the raft quorum.

commit-latency-avg

MBean: kafka.server:type=raft-metrics
Attribute: commit-latency-avg

The average time in milliseconds to commit an entry in the raft log.

commit-latency-max

MBean: kafka.server:type=raft-metrics
Attribute: commit-latency-max

The maximum time in milliseconds to commit an entry in the raft log.

current-epoch

MBean: kafka.server:type=raft-metrics
Attribute: current-epoch

The current quorum epoch.

current-leader

MBean: kafka.server:type=raft-metrics
Attribute: current-leader

The current quorum leader’s id; -1 indicates unknown.

current-state

MBean: kafka.server:type=raft-metrics
Attribute: current-state

The current state of this member; possible values are leader, candidate, voted, follower, unattached, observer.

current-vote

MBean: kafka.server:type=raft-metrics
Attribute: current-vote

The current voted leader’s id; -1 indicates not voted for anyone.

election-latency-avg

MBean: kafka.server:type=raft-metrics
Attribute: election-latency-avg

The average time in milliseconds spent on electing a new leader.

election-latency-max

MBean: kafka.server:type=raft-metrics
Attribute: election-latency-max

The maximum time in milliseconds spent on electing a new leader.

fetch-records-rate

MBean: kafka.server:type=raft-metrics
Attribute: fetch-records-rate

The average number of records fetched from the leader of the raft quorum.

high-watermark

MBean: kafka.server:type=raft-metrics
Attribute: high-watermark

The high watermark maintained on this member; -1 if it is unknown.

log-end-epoch

MBean: kafka.server:type=raft-metrics
Attribute: log-end-epoch

The current raft log end epoch.

log-end-offset

MBean: kafka.server:type=raft-metrics
Attribute: log-end-offset

The current raft log end offset.

number-unknown-voter-connections

MBean: kafka.server:type=raft-metrics
Attribute: number-unknown-voter-connections

The number of unknown voters whose connection information is not cached. The value of this metric is always 0.

poll-idle-ratio-avg

MBean: kafka.server:type=raft-metrics
Attribute: poll-idle-ratio-avg

The average fraction of time the client’s poll() is idle as opposed to waiting for the user code to process records.

LatestSnapshotGeneratedAgeMs

MBean: kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs

The time in milliseconds since the node generated its latest snapshot. If no snapshot has been generated yet, this is the approximate time delta since the process was started.

LatestSnapshotGeneratedBytes

MBean: kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes

The total size in bytes of the latest snapshot that the node has generated. If a snapshot has not been generated yet, this is the size of the latest snapshot that was loaded. If no snapshots have been generated or loaded, this is 0.

Controller metrics

The following metrics are exposed by a controller. For more about monitoring KRaft, see Monitor KRaft.

ActiveBrokerCount

MBean: kafka.controller:type=KafkaController,name=ActiveBrokerCount

The number of active brokers as observed by this controller.

ActiveControllerCount

MBean: kafka.controller:type=KafkaController,name=ActiveControllerCount

The number of active controllers in the cluster. Valid values are ‘0’ or ‘1’. Alert if the aggregated sum across all brokers in the cluster is anything other than 1 because there should be exactly one controller per cluster.

EventQueueOperationsStartedCount

MBean: kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount

For KRaft mode, the total number of controller event queue operations that were started. This includes deferred operations.

EventQueueOperationsTimedOutCount

MBean: kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount

For KRaft mode, the total number of controller event queue operations that timed out before they could be performed.

EventQueueProcessingTimeMs

MBean: kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs

A histogram of the time in milliseconds that requests spent being processed in the controller event queue.

EventQueueSize

MBean: kafka.controller:type=ControllerEventManager,name=EventQueueSize

The size of the controller’s event queue.

EventQueueTimeMs

MBean: kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs

Time that an event (except the Idle event) waits, in milliseconds, in the controller event queue before being processed.

FencedBrokerCount

MBean: kafka.controller:type=KafkaController,name=FencedBrokerCount

In KRaft mode, the number of fenced but registered brokers as observed by this controller.

GlobalPartitionCount

MBean: kafka.controller:type=KafkaController,name=GlobalPartitionCount

The number of partitions across all topics in the cluster.

GlobalTopicCount

MBean: kafka.controller:type=KafkaController,name=GlobalTopicCount

The number of global topics as observed by this Controller.

LastAppliedRecordLagMs

MBean: kafka.controller:type=KafkaController,name=LastAppliedRecordLagMs

The difference, in milliseconds, between now and the timestamp of the last record from the cluster metadata partition that was applied by the controller. For active controllers the value of this lag is always zero.

LastAppliedRecordOffset

MBean: kafka.controller:type=KafkaController,name=LastAppliedRecordOffset

The offset of the last record from the cluster metadata partition that was applied by the Controller.

LastAppliedRecordTimestamp

MBean: kafka.controller:type=KafkaController,name=LastAppliedRecordTimestamp

The timestamp of the last record from the cluster metadata partition that was applied by the controller.

LastCommittedRecordOffset

MBean: kafka.controller:type=KafkaController,name=LastCommittedRecordOffset

The offset of the last record committed to this Controller.

MetadataErrorCount

MBean: kafka.controller:type=KafkaController,name=MetadataErrorCount

The number of times this controller node has encountered an error during metadata log processing.

NewActiveControllersCount

MBean: kafka.controller:type=KafkaController,name=NewActiveControllersCount

For KRaft mode, counts the number of times this node has seen a new controller elected. A transition to the “no leader” state is not counted here. If the same controller as before becomes active, that still counts.

OfflinePartitionsCount

MBean: kafka.controller:type=KafkaController,name=OfflinePartitionsCount

The number of partitions that don’t have an active leader and are therefore not writable or readable. Alert if the value is greater than 0.

PreferredReplicaImbalanceCount

MBean: kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount

The count of topic partitions for which the leader is not the preferred leader.

ReplicasIneligibleToDeleteCount

MBean: kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount

The number of ineligible pending replica deletes.

ReplicasToDeleteCount

MBean: kafka.controller:type=KafkaController,name=ReplicasToDeleteCount

The number of pending replica deletes.

TimedOutBrokerHeartbeatCount

MBean: kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount

For KRaft mode, the number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric.

TopicsIneligibleToDeleteCount

MBean: kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount

The number of ineligible pending topic deletes.

TopicsToDeleteCount

MBean: kafka.controller:type=KafkaController,name=TopicsToDeleteCount

The number of pending topic deletes.