Monitoring Kafka with JMX in Confluent Platform

Java Management Extensions (JMX) and Managed Beans (MBeans) are technologies for monitoring and managing Java applications, and they are enabled by default for Kafka and provide metrics for its components; brokers, controllers, producers, and consumers. The next sections describe how to configure JMX and verify that you have configured it correctly. Note that features that are not enabled in your deployment will not generate MBeans.

You can search for a metric by name.

You can also browse metrics by category:

Broker metrics
KRaft broker metrics
KRaft Quorum metrics
Controller metrics
Log metrics
Network metrics
ZooKeeper metrics
Producer metrics
Consumer metrics
Consumer group metrics
Audit metrics
Authorizer metrics
RBAC and LDAP metrics

Tip

Confluent offers some alternatives to using JMX monitoring.

Health+: Consider monitoring and managing your environment with Monitor Confluent Platform with Health+. Ensure the health of your clusters and minimize business disruption with intelligent alerts, monitoring, and proactive support based on best practices created by the inventors of Kafka.
Confluent Control Center: You can deploy Control Center for out-of-the-box Kafka cluster monitoring so you don’t have to build your own monitoring system. Control Center (Legacy) makes it easy to manage the entire Confluent Platform deployment. Control Center (Legacy) is a web-based application that allows you to manage your cluster and to alert on triggers. Additionally, Control Center (Legacy) measures how long messages take to be delivered, and determines the source of any issues in your cluster.

Configure JMX

You can use the following JVM environment variables to configure JMX monitoring when you start Kafka or use Java system properties to enable remote JMX programmatically.

JMX_PORT

The port that JMX listens on for incoming connections.

JMX_HOSTNAME

The hostname associated with locally created remote objects.

KAFKA_JMX_OPTS

JMX options. Use this variable to override the default JMX options such as whether authentication is enabled. The default options are set as follows: -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

server.hostname Specifies the host a JMX client connects to. The default is localhost (127.0.0.1)
jmxremote=true Enables the JMX remote agent and enables the connector to listen through a specific port.
jmxremote.authenticate=false Indicates that authentication is off by default.
jmxremote.ssl=false- Indicates that TLS/SSL is off by default.

For example, to start Kafka, and specify a JMX port, you can use the following command:

JMX_PORT=2020 ./bin/kafka-server-start.sh ./etc/kafka/kraft/server.properties

For more on configuring JMX, see Monitoring and Management using JMX technology. For additional topics about monitoring, see the Related content section.

View MBeans with JConsole

To see JMX monitoring in action, you can start JConsole, a utility provided with Java.

To start JConsole, use the jconsole command, and connect to the Kafka process.

After JConsole starts, if Kafka is running locally, you will see it listed as a Local Process, and you can select it. Or if you are running Kafka remotely, such as in a Docker container, select Remote Process, enter the hostname and port you specified in your JMX configuration, and click Connect.

Enter a valid username and password, or if you have not configured authentication, you may be prompted to make an Insecure connection.

Once JConsole is running, you can select the MBeans tab and expand the folders to see the JMX events and attributes for those events.

It is important to note that you will only see MBeans for features that you have enabled on your nodes. If you do not see a MBean that you expected for a feature or component, check that the feature or component that generates it is successfully enabled, and that you are viewing the correct process.

You can also view JMX metrics with console tools such Jmxterm or export JMX metrics to other monitoring tools such as Prometheus. For a demo on how to use some popular monitoring tools with Confluent products, see JMX Monitoring Stacks on GitHub.

Configure security

Password authentication and TLS/SSL are disabled for JMX by default in Kafka, however in a production environment, you should enable authentication and TLS/SSL to prevent unauthorized users from accessing your brokers.

You override the default JMX settings to enable authentication and SSL.

To learn about how to secure JMX, follow the TLS/SSL and authentication sections in Monitoring and Management Using JMX Technology.

Configure other Confluent Platform components

Use the following environment variables to override the default JMX options such as authentication settings for other Confluent Platform components.

Component	Environment Variable
Confluent Control Center (Legacy)	CONTROL_CENTER_JMX_OPTS
REST Proxy	KAFKAREST_JMX_OPTS
ksqlDB	KSQL_JMX_OPTS
Rebalancer	REBALANCER_JMX_OPTS
Schema Registry	SCHEMA_REGISTRY_JMX_OPTS

Search for a metric

Broker metrics

There are many metrics reported at the broker and controller level that can be monitored and used to troubleshoot issues with your cluster. At minimum, you should monitor and set alerts on ActiveControllerCount, OfflinePartitionsCount, and UncleanLeaderElectionsPerSec.

clientSoftwareName/clientSoftwareVersion

MBean: kafka.server:clientSoftwareName=(name),clientSoftwareVersion=(version),listener=(listener),networkProcessor=(processor-index),type=(type): The name and version of client software in the brokers. For example, the Kafka 2.4 Java client produces the following MBean on the broker:

kafka.server:clientSoftwareName=apache-kafka-java,clientSoftwareVersion=2.4.0,listener=PLAINTEXT,networkProcessor=1,type=socket-server-metrics

RequestRateAndQueueTimeMs

MBean: kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=([0-9]+): The rate (requests per millisecond) at which the controller takes requests from the queue of the given broker, and the time it takes for a request to stay in this queue before it is taken from the queue. ZooKeeper mode.

ElectionRateAndTimeMs

MBean: kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs: The broker leader election rate and latency in milliseconds. This is non-zero when there are broker failures.

UncleanLeaderElectionsPerSec

MBean: kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec: The unclean broker leader election rate. Should be 0.

MessageConversionsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name={Produce|Fetch}MessageConversionsPerSec,topic=([-.\w]+): The message format conversion rate, for Produce or Fetch requests, per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

BytesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={topicName}: The incoming byte rate from clients, per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

BytesOutPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={topicName}: The outgoing byte rate to clients per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

BytesRejectedPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={topicName}: The rejected byte rate per topic, due to the record batch size being greater than max.message.bytes configuration. Omitting ‘topic={…}’ will yield the all-topic rate.

FailedFetchRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec: The fetch request rate for requests that failed.

FailedProduceRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec: The produce request rate for requests that failed.

InvalidMagicNumberRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=InvalidMagicNumberRecordsPerSec: The message validation failure rate due to an invalid magic number. This should be 0.

InvalidMessageCrcRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=InvalidMessageCrcRecordsPerSec: The message validation failure rate due to incorrect Crc checksum

InvalidOffsetOrSequenceRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=InvalidOffsetOrSequenceRecordsPerSec: The message validation failure rate due to non-continuous offset or sequence number in batch. Normally this should be 0.

MessagesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={topicName}: The incoming message rate per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

NoKeyCompactedTopicRecordsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=NoKeyCompactedTopicRecordsPerSec: The message validation failure rate due to no key specified for compacted topic. This should be 0.

ReassignmentBytesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec: The incoming byte rate of reassignment traffic.

ReplicationBytesInPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec,topic={topicName}: The incoming byte rate from other brokers per topic. Omitting ‘topic={…}’ will yield the all-topic rate.

ReplicationBytesOutPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec: The byte-out rate to other brokers.

TotalFetchRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec: The fetch request rate per second.

TotalProduceRequestsPerSec

MBean: kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec: The produce request rate per second.

consumer-lag-offsets

MBean: kafka.server:type=tenant-metrics,member={memberId},topic={topicName},consumer-group={groupName},partition={Id},client-id={clientId}

consumer-lag-offsets is an attribute of the MBean listed.

This metric is the difference between the last offset stored by the broker and the last committed offset for a specific consumer group name, client ID, member ID, partition ID, and topic name. This metric provides the consumer lag in offsets only and does not report latency. In addition, it is not reported for any groups that are not alive or are empty.

To enable this metric, you must set the following server properties.

confluent.consumer.lag.emitter.enabled=true # default is false
confluent.consumer.lag.emitter.interval.ms=60000 # default is 60000

For more information about this metric, see Monitor Consumer Lag in Confluent Platform.

PurgatorySize (produce)

MBean: kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize: The number of requests waiting in the producer purgatory. This should be non-zero when acks=all is used on the producer.

PurgatorySize (fetch)

MBean: kafka.server:type=DelayedOperationPurgatory,delayedOperation=Fetch,name=PurgatorySize: The number of requests waiting in the fetch purgatory. This is high if consumers use a large value for fetch.wait.max.ms

ConsumerLag

MBean: kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic={topicName},partition=([0-9]+): The lag in number of messages per follower replica. This is useful to know if the replica is slow or has stopped replicating from the leader and if the associated brokers need to be removed from the In-Sync Replicas list.

linux-disk-read-bytes

MBean: kafka.server:type=KafkaServer,name=linux-disk-read-bytes: The total number of bytes read by the broker process, including reads from all disks. The total doesn’t include reads from page cache. Available only on Linux-based systems.

linux-disk-write-bytes

MBean: kafka.server:type=KafkaServer,name=linux-disk-write-bytes: The total number of bytes written by the broker process, including writes from all disks. Available only on Linux-based systems.

PartitionsWithLateTransactionCount

MBean: kafka.server:type=ReplicaManager,name=PartitionsWithLateTransactionsCount: The number of partitions that have open transactions with durations exceeding the transaction.max.timeout.ms property value set on the broker.

UnderMinIsrPartitionCount

MBean: kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount: The number of partitions whose in-sync replicas count is less than minIsr.

UnderReplicatedPartitions

MBean: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions: The number of under-replicated partitions (| ISR | < | current replicas |). Replicas that are added as part of a reassignment will not count toward this value. Alert if the value is greater than 0.

ReassigningPartitions

MBean: kafka.server:type=ReplicaManager,name=ReassigningPartitions: The number of reassigning partitions.

UnderMinIsr

MBean: kafka.cluster:type=Partition,topic={topic},name=UnderMinIsr,partition={partition}: The number of partitions whose in-sync replicas count is less than minIsr. These partitions will be unavailable to producers who use acks=all.

InSyncReplicasCount

MBean: kafka.cluster:type=Partition,topic={topic},name=InSyncReplicasCount,partition={partition}: A gauge metric that indicates the in-sync replica count per topic partition leader.

AtMinIsr

MBean: kafka.cluster:type=Partition,topic={topic}name=AtMinIsr,partition={partition}: The number of partitions whose in-sync replicas count is equal to the minIsr value.

ReplicasCount

MBean: kafka.cluster:type=Partition,topic={topic}:name=ReplicasCount,partition={partition}: A gauge metric that indicates the replica count per topic partition leader.

UnderReplicated

MBean: kafka.cluster:type=Partition,topic={topic}::name=UnderReplicated,partition={partition}: The number of partitions that are under replicated meaning the number of in-sync replicas is less than the replica count.

PartitionCount

MBean: kafka.server:type=ReplicaManager,name=PartitionCount: The number of partitions on this broker. This should be mostly even across all brokers.

LeaderCount

MBean: kafka.server:type=ReplicaManager,name=LeaderCount: The number of leaders on this broker. This should be mostly even across all brokers. If not, set auto.leader.rebalance.enable to true on all brokers in the cluster.

RequestHandlerAvgIdlePercent

MBean: kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent: The average fraction of time the request handler threads are idle. Values are between 0 meaning all resources are used and 1 meaning all resources are available.

IsrShrinksPerSec

MBean: kafka.server:type=ReplicaManager,name=IsrShrinksPerSec: Measures the reduction of in-sync replicas per second. If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0.

IsrExpandsPerSec

MBean: kafka.server:type=ReplicaManager,name=IsrExpandsPerSec: Measures the expansion of in-sync replicas per second. When a broker is brought up after a failure, it starts catching up by reading from the leader. Once it is caught up, it gets added back to the ISR.

DelayQueueSize

MBean: kafka.server:type=Produce,name=DelayQueueSize: The number of producer clients currently being throttled. The value can be any number greater than or equal to 0.
Important
For monitoring quota applications and throttled clients, use the Bandwidth quota, and Request quota metrics.

Bandwidth quota

MBean: kafka.server:type={Produce|Fetch},user={userName},client-id={clientId}

Use the attributes of this metric to measure the bandwidth quota. This metric has the following attributes:

throttle-time: the amount of time in milliseconds the client was throttled. Ideally = 0.
byte-rate: the data produce/consume rate of the client in bytes/sec.
- For (user, client-id) quotas, specify both user and client-id.
- If a per-client-id quota is applied to the client, do not specify user.
- If a per-user quota is applied, do not specify client-id.

Request quota

MBean: kafka.server:type=Request,user={userName},client-id={clientId}

Use the attributes of this metric to measure request quota. This metric has the following attributes:

throttle-time: the amount of time in milliseconds the client was throttled. Ideally = 0.
request-time: the percentage of time spent in broker network and I/O threads to process requests from client group.
- For (user, client-id) quotas, specify both user and client-id.
- If a per-client-id quota is applied to the client, do not specify user.
- If a per-user quota is applied, do not specify client-id.

connection-count

MBean: kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-count: The number of currently open connections to the broker.

connection-creation-rate

MBean: kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-creation-rate: The number of new connections established per second.

KRaft broker metrics

These metrics are only produced for a broker when running in KRaft mode.

last-applied-record-offset

MBean: kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset: The offset of the last record from the cluster metadata partition that was applied by the broker.

last-applied-record-timestamp

MBean: kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp: The timestamp of the last record from the cluster metadata partition that was applied by the broker.

last-applied-record-lag-ms

MBean: kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms: The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the broker.

metadata-load-error-count

MBean: kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count: The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.

metadata-apply-error-count

MBean: kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count: The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.

KRaft Quorum metrics

The set of metrics that allow monitoring of the KRaft quorum and metadata log.

These metrics are reported on both controllers and brokers in a KRaft cluster.

CurrentMetadataVersion

MBean: kafka.server:type=MetadataLoader,name=CurrentMetadataVersion: Outputs the feature level of the current effective metadata version.

HandleLoadSnapshotCount

MBean: kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount: The total number of times we have loaded a KRaft snapshot since the process was started.

current-state

MBean: kafka.server:type=raft-metrics,name=current-state: The current state of this member; possible values are leader, candidate, voted, follower, unattached, observer.

current-leader

MBean: kafka.server:type=raft-metrics,name=current-leader: The current quorum leader’s id; -1 indicates unknown.

current-vote

MBean: kafka.server:type=raft-metrics,name=current-vote: The current voted leader’s id; -1 indicates not voted for anyone.

current-epoch

MBean: kafka.server:type=raft-metrics,name=current-epoch: The current quorum epoch.

high-watermark

MBean: kafka.server:type=raft-metrics,name=high-watermark: The high watermark maintained on this member; -1 if it is unknown.

log-end-offset

MBean: kafka.server:type=raft-metrics,name=log-end-offset: The current raft log end offset.

number-unknown-voter-connections

MBean: kafka.server:type=raft-metrics,name=number-unknown-voter-connections: The number of unknown voters whose connection information is not cached. This value of this metric is always 0.

commit-latency-avg

MBean: kafka.server:type=raft-metrics,name=commit-latency-avg: The average time in milliseconds to commit an entry in the raft log.

commit-latency-max

MBean: kafka.server:type=raft-metrics,name=commit-latency-max: The maximum time in milliseconds to commit an entry in the raft log.

election-latency-avg

MBean: kafka.server:type=raft-metrics,name=election-latency-avg: The average time in milliseconds spent on electing a new leader.

election-latency-max

MBean: kafka.server:type=raft-metrics,name=election-latency-max: The maximum time in milliseconds spent on electing a new leader.

fetch-records-rate

MBean: kafka.server:type=raft-metrics,name=fetch-records-rate: The average number of records fetched from the leader of the raft quorum.

append-records-rate

MBean: kafka.server:type=raft-metrics,name=append-records-rate: The average number of records appended per sec by the leader of the raft quorum.

poll-idle-ratio-avg

MBean: kafka.server:type=raft-metrics,name=poll-idle-ratio-avg: The average fraction of time the client’s poll() is idle as opposed to waiting for the user code to process records.

LatestSnapshotGeneratedBytes

MBean: kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes: The total size in bytes of the latest snapshot that the node has generated. If a snapshot has not been generated yet, this is the size of the latest snapshot that was loaded. If no snapshots have been generated or loaded, this is 0.

LatestSnapshotGeneratedAgeMs

MBean: kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs: The interval in milliseconds since the latest snapshot that the node has generated. If no snapshot has been generated yet, this is the approximate time delta since the process was started.

Controller metrics

The following metrics are exposed by a controller, which can be a broker controller in ZooKeeper mode or a KRaft controller in KRaft mode. For more about monitoring KRaft, see Monitor KRaft.

EventQueueProcessingTimeMs

MBean: kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs: A Histogram of the time in milliseconds that requests spent being processed in the Controller Event Queue.

EventQueueSize

MBean: kafka.controller:type=ControllerEventManager,name=EventQueueSize: Size of the controller’s event queue.

EventQueueTimeMs

MBean: kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs: Time that an event (except the Idle event) waits, in milliseconds, in the controller event queue before being processed.

ActiveBrokerCount

MBean: kafka.controller:type=KafkaController,name=ActiveBrokerCount: The number of active brokers as observed by this controller. This metric is produced for both ZooKeeper and KRaft mode.

ActiveControllerCount

MBean: kafka.controller:type=KafkaController,name=ActiveControllerCount: For KRaft mode, the number of active controllers in the cluster. Valid values are ‘0’ or ‘1’. For ZooKeeper, indicates whether the broker is the controller broker. Alert if the aggregated sum across all brokers in the cluster is anything other than 1 because there should be exactly one controller per cluster.

EventQueueOperationsStartedCount

MBean: kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount: For KRaft mode, the total number of controller event queue operations that were started. This includes deferred operations.

EventQueueOperationsTimedOutCount

MBean: kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount: For KRaft mode, the total number of controller event queue operations that timed out before they could be performed.

FencedBrokerCount

MBean: kafka.controller:type=KafkaController,name=FencedBrokerCount: In KRaft mode, the number of fenced, but registered brokers as observed by this controller. This metric is always 0 in ZooKeeper mode.

GlobalPartitionCount

MBean: kafka.controller:type=KafkaController,name=GlobalPartitionCount: The number of partitions across all topics in the cluster.

GlobalTopicCount

MBean: kafka.controller:type=KafkaController,name=GlobalTopicCount: The number of global partitions as observed by this Controller.

LastAppliedRecordLagMs

MBean: kafka.controller:type=KafkaController,name=LastAppliedRecordLagMs: The difference, in milliseconds, between now and the timestamp of the last record from the cluster metadata partition that was applied by the controller. For active controllers the value of this lag is always zero.

LastAppliedRecordOffset

MBean: kafka.controller:type=KafkaController,name=LastAppliedRecordOffset: The offset of the last record from the cluster metadata partition that was applied by the Controller.

LastAppliedRecordTimestamp

MBean: kafka.controller:type=KafkaController,name=LastAppliedRecordTimestamp: The timestamp of the last record from the cluster metadata partition that was applied by the controller.

LastCommittedRecordOffset

MBean: kafka.controller:type=KafkaController,name=LastCommittedRecordOffset: The offset of the last record committed to this Controller.

MetadataErrorCount

MBean: kafka.controller:type=KafkaController,name=MetadataErrorCount: The number of times this controller node has encountered an error during metadata log processing.

NewActiveControllersCount

MBean: kafka.controller:type=KafkaController,name=NewActiveControllersCount: For KRaft mode, counts the number of times this node has seen a new controller elected. A transition to the “no leader” state is not counted here. If the same controller as before becomes active, that still counts.

OfflinePartitionsCount

MBean: kafka.controller:type=KafkaController,name=OfflinePartitionsCount,partition={partition}: The number of partitions that don’t have an active leader and are therefore not writable or readable. Alert if value is greater than 0.

PreferredReplicaImbalanceCount

MBean: kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount: The count of topic partitions for which the leader is not the preferred leader.

ReplicasIneligibleToDeleteCount

MBean: kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount: The number of ineligible pending replica deletes.

ReplicasToDeleteCount

MBean: kafka.controller:type=KafkaController,name=ReplicasToDeleteCount: Pending replica deletes.

TimedOutBrokerHeartbeatCount

MBean: kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount: For KRaft mode, the number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric.

TopicsIneligibleToDeleteCount

MBean: kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount: Ineligible pending topic deletes.

TopicsToDeleteCount

MBean: kafka.controller:type=KafkaController,name=TopicsToDeleteCount: Pending topic deletes.

ZkWriteDeltaTimeMs

MBean: kafka.controller:type=KafkaController,name=ZkWriteDeltaTimeMs: For KRaft mode, the number of milliseconds the KRaft controller took writing a delta into ZooKeeper.

ZkWriteSnapshotTimeMs

MBean: kafka.controller:type=KafkaController,name=ZkWriteSnapshotTimeMs: For KRaft mode, the number of milliseconds the KRaft controller took reconciling a snapshot into ZooKeeper.

ZkWriteBehindLag

MBean: kafka.controller:type=KafkaController,name=ZkWriteBehindLag: For KRaft mode, the amount of lag in records that ZooKeeper is behind relative to the highest committed record in the metadata log. This metric will only be reported by the active KRaft controller.

Log metrics

The following section contains metrics for logs, log cleaners, and log cleaner managers. A log directory is a directory on disk that contains one or more Kafka log segments. When a Kafka broker starts up, it registers all of the log directories that it finds on its local disk with the log manager. The log manager is responsible for managing the creation, deletion, and cleaning of log segments across all registered log directories.

LogEndOffset

MBean: kafka.log:type=Log,name=LogEndOffset: The offset of the last message in a partition. Use with LogStartOffset to calculate the current message count for a topic.

LogStartOffset

MBean: kafka.log:type=Log,name=LogStartOffset: The offset of the first message in a partition. Use with LogEndOffset to calculate the current message count for a topic.

NumLogSegments

MBean: kafka.log:type=Log:name=NumLogSegments: The number of log segments that currently exist for a given partition.

Size

MBean: kafka.log:type=Log,name=Size: A metric for the total size in bytes of all log segments that belong to a given partition.

DeadThreadCount

MBean: kafka.log:type=LogCleaner,name=DeadThreadCount: Provides the number of dead threads for the process.

cleaner-recopy-percent

MBean: kafka.log:type=LogCleaner,name=cleaner-recopy-percent: A metric to track the recopy rate of each thread’s last cleaning

max-buffer-utilization-percent

MBean: kafka.log:type=LogCleaner,name=max-buffer-utilization-percent: A metric to track the maximum utilization of any thread’s buffer in the last cleaning.

max-clean-time-secs

MBean: kafka.log:type=LogCleaner,name=max-clean-time-secs: A metric to track the maximum cleaning time for the last cleaning from each thread.

max-compaction-delay-secs

MBean: kafka.log:type=LogCleaner,name=max-compaction-delay-secs: A metric that provides the maximum delay, in seconds, that the log cleaner will wait before compacting a log segment.

compacted-partition-bytes

MBean: kafka.log:type=LogCleanerManager,name=compacted-partition-bytes: A gauge metric that provides compacted data in each partition.

compacted-partition-local-bytes

MBean: kafka.log:type=LogCleanerManager,name=compacted-partition-local-bytes: A gauge metric that provides local compacted data in each partition. Confluent Server only.

compacted-partition-tiered-bytes

MBean: kafka.log:type=LogCleanerManager,name=compacted-partition-tiered-bytes: A gauge metric that provides compacted data in each partition when using tiered storage. Confluent Server only.

max-dirty-percent

MBean: kafka.log:type=LogCleanerManager,name=max-dirty-percent: A gauge metric that provides the maximum percentage of the log segment that can contain dirty messages before the log cleaner will begin cleaning it.

time-since-last-run-ms

MBean: kafka.log:type=LogCleanerManager,name=time-since-last-run-ms: A gauge metric that provides the time, in milliseconds, since the log cleaner last ran.

OfflineLogDirectoryCount

MBean: kafka.log:type=LogManager,name=OfflineLogDirectoryCount: A metric that describes the number of log directories that are registered with the log manager, but are currently offline.

LogFlushRateAndTimeMs

MBean: kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs: Log flush rate and time in milliseconds.

uncleanable-bytes

MBean: kafka.log:logDirectory="{/log-directory}":type=LogCleanerManager:name=uncleanable-bytes: A metric that provides information about the total number of bytes in log segments for a specified log directory that cannot be cleaned by the log cleaner due to retention policy or other configuration settings.

uncleanable-partitions-count

MBean: kafka.log:logDirectory="{/log-directory}":type=LogCleanerManager:name=uncleanable-partitions-count: A gauge metric that tracks the number of partitions marked as uncleanable for each log directory.

LogDirectoryOffline

MBean: kafka.log:logDirectory="{/log-directory}":type=LogManager,name=LogDirectoryOffline: A metric that provides the number of log directories that are currently offline and not available for use by the Kafka broker.

SegmentAppendTimeMs

MBean: kafka.log:type=SegmentStats,name=SegmentAppendTimeMs: The time in milliseconds to append a record to the log segment. Available on Confluent Server only.

OffsetIndexAppendTimeMs

MBean: kafka.log:type=SegmentStats,name=OffsetIndexAppendTimeMs: The time in milliseconds to append an entry to the log segment offset index. The offset index maps from logical offsets to physical file positions. Available on Confluent Server only.

TimestampIndexAppendTimeMs

MBean: kafka.log:type=SegmentStats,name=TimestampIndexAppendTimeMs: The time in milliseconds to append an entry to the log segment timestamp index. Available on Confluent Server only.

Network metrics

Kafka brokers can generate a lot of network traffic because they collect and distribute data for processing. You can use the following metrics to help evaluate whether your Kafka deployment is communicating efficiently.

RequestQueueSize

MBean: kafka.network:type=RequestChannel,name=RequestQueueSize: Size of the request queue. A congested request queue will not be able to process incoming or outgoing requests.

ResponseQueueSize

MBean: kafka.network:type=RequestChannel,name=ResponseQueueSize: Size of the response queue. The response queue is unbounded. A congested response queue can result in delayed response times and memory pressure on the broker.

ErrorsPerSec

MBean: kafka.network:type=RequestMetrics,name=ErrorsPerSec,request={requestType},error={error-code}: Indicates the number of errors in responses counted per-request-type, per-error-code. If a response contains multiple errors, all are counted. error=NONE indicates successful responses.

LocalTimeMs

MBean: kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower}: Time in milliseconds the request is processed at the leader.

MessageConversionsTimeMs

MBean: kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request={Produce|Fetch}: Time in milliseconds spent on message format conversions.

RemoteTimeMs

MBean: kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}: Time in milliseconds the request waits for the follower. This is non-zero for produce requests when ack=-1 or acks=all.

RequestBytes

MBean: kafka.network:type=RequestMetrics,name=RequestBytes,request={requestType}: Request size in bytes.

RequestQueueTimeMs

MBean: kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}: Time in milliseconds that the request waits in the request queue.

RequestsPerSec

MBean: kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower},version=([0-9]+): The request rate. version refers to the API version of the request type. To get the total count for a specific request type, make sure that JMX monitoring tools aggregate across different versions.

ResponseSendTimeMs

MBean: kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower}: Time to send the response.

TemporaryMemoryBytes

MBean: kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request={Produce|Fetch}: Temporary memory, in bytes, used for message format conversions and decompression.

TotalTimeMs

MBean: kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}: The total time in milliseconds to serve the specified request. - Produce: requests from producers to send data - FetchConsumer: requests from consumers to get data - FetchFollower: requests from follower brokers that are the followers of a partition to get new data

ExpiredConnectionsKilledCount

MBean: kafka.network:type=SocketServer,name=ExpiredConnectionsKilledCount: The total number of connections disconnected, across all processors, due to a client not re-authenticating and then using the connection beyond its expiration time for anything other than re-authentication. Ideally 0 when re-authentication is enabled, implying there are no longer any older, pre-2.2.0 clients connecting to this broker

NetworkProcessorAvgIdlePercent

MBean: kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent: The average fraction of time the network processor threads are idle. Values are between 0 (all resources are used) and 1 (all resources are available).Ideally this is less than 0.3.

ZooKeeper metrics

Important

As of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. For more information, see KRaft Overview for Confluent Platform.

ZooKeeper state transition counts are exposed as metrics, which can help to spot problems with your cluster. For example, such as broker connections to ZooKeeper. The metrics show the transitions rate per second for each one of the possible states. Here is the list of the counters we expose, one for each possible ZooKeeper client state.

ZooKeeperDisconnectsPerSec

MBean: kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec: A meter that provides the number of recent ZooKeeper client disconnects. Note that this metric does not tell you the number of disconnected clients or if ZooKeeper is down. The number does not change when a connection that was reported as disconnected is reestablished. In addition, this count returns to zero after about 90 minutes, even if the client never successfully reconnects to ZooKeeper. If you are checking system health, ZooKeeperExpiresPerSec is a better metric to help you determine this.

ZooKeeperExpiresPerSec

MBean: kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec

The ZooKeeper session has expired. Note that when a session expires it could result in leader changes or possibly a new controller. You should monitor the number of these events across a Kafka cluster and if the overall number is high, you can do the following:

Check the health of your network
Check for garbage collection issues and tune it accordingly
If necessary, increase the session time out by setting the value of zookeeper.session.timeout.ms.

Following are additional ZooKeeper metrics you can optionally observe on a Kafka broker.

ZooKeeperSyncConnectsPerSec

MBean: kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec: Reports the number of ZooKeeper client connections per second to execute operations.

ZooKeeperAuthFailuresPerSec

MBean: kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec: Reports the number of attempts to connect to the ensemble that failed because the client has not provided correct credentials.

ZooKeeperReadOnlyConnectsPerSec

MBean: kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec: The server the client is connected to is currently LOOKING, which means that it is neither FOLLOWING nor LEADING. Consequently, the client can only read the ZooKeeper state, but not make any changes (create, delete, or set the data of znodes).

ZooKeeperSaslAuthenticationsPerSec

MBean: kafka.server:type=SessionExpireListener,name=ZooKeeperSaslAuthenticationsPerSec: Indicates the number of times the client has successfully authenticated per second.

SessionState

MBean: kafka.server:type=SessionExpireListener,name=SessionState: The connection status of broker’s ZooKeeper session. Expected value is CONNECTED.

Producer metrics

The following metrics are available on a producer instance. These metrics are attributes on the associated MBean.

waiting-threads

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The number of user threads blocked waiting for buffer memory to enqueue their records. Kafka 3.1.1 and later.

buffer-total-bytes

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The maximum amount of buffer memory the client can use whether or not it is available. Kafka 3.1.1 and later.

buffer-available-bytes

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total amount of buffer memory that is not being used, either unallocated or in the free list. Kafka 3.1.1 and later.

bufferpool-wait-time

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The fraction of time an appender waits for space allocation. Kafka 3.1.1 and later.

bufferpool-wait-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds an appender waits for space allocation in nanoseconds. Kafka 3.1.1 and later.

flush-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds the producer spent in Producer.flush. Kafka 3.1.1 and later.

txn-init-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds that the producer spent initializing transactions for exactly-once semantics. Kafka 3.1.1 and later.

txn-begin-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds the producer spent in beginTransaction for exactly-once semantics. Kafka 3.1.1 and later.

txn-send-offsets-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time the producer spent sending offsets to transactions in nanoseconds for exactly-once semantics. Kafka 3.1.1 and later.

txn-commit-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds the producer spent committing transactions for exactly-once semantics.

txn-abort-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds the producer spent aborting transactions for exactly-once semantics. Kafka 3.1.1 and later.

batch-size-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of bytes sent per partition per-request. Global request metric.

batch-size-max

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The maximum number of bytes sent per partition per-request. Global request metric.

batch-split-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of batch splits per second. Global request metric.

batch-split-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total number of batch splits. Global request metric.

compression-rate-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size. Global request metric.

incoming-byte-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of incoming bytes received per second from all servers. Global request metric.

metadata-age

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The age, in seconds, of the current producer metadata being used. Global request metric.

outgoing-byte-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of bytes sent per second to all servers. Global request metric.

produce-throttle-time-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average time in milliseconds a request was throttled by a broker. Global request metric.

produce-throttle-time-max

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The maximum time in milliseconds a request was throttled by a broker. Global request metric.

record-error-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average per-second number of record sends that resulted in errors. Global request metric.

record-error-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total number of record sends that resulted in errors. Global request metric.

record-queue-time-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average time in milliseconds record batches spent in the send buffer. Global request metric.

record-queue-time-max

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The maximum time in milliseconds record batches spent in the send buffer. Global request metric.

record-retry-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average per-second number of retried record sends. Global request metric.

record-retry-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total number of retried record sends. Global request metric.

record-send-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of records sent per second. Global request metric.

record-send-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total number of records sent. Global request metric.

record-size-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average record size. Global request metric.

record-size-max

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The maximum record size. Global request metric.

records-per-request-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of records per request. Global request metric.

request-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average number of requests sent per second. Global request metric.

requests-in-flight

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The current number of in-flight requests awaiting a response. Global request metric.

bufferpool-wait-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time in nanoseconds a producer waits for space allocation. Global connection metric.

connection-close-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The connections closed per second in the window. Global connection metric.

connection-count

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The current number of active connections. Global connection metric.

connection-creation-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The number of new connections established per second in the window. Global connection metric.

io-ratio

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The fraction of time the I/O thread spent doing I/O. Global connection metric.

io-time-ns-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average length of time for I/O per select call in nanoseconds. Global connection metric.

io-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total time the I/O thread spent doing I/O in nanoseconds. Global connection metric.

io-wait-ratio

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The fraction of time the I/O thread spent waiting. Global connection metric.

io-wait-time-ns-avg

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. Global connection metric.

io-wait-time-ns-total

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The total length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. Global connection metric.

select-rate

MBean: kafka.producer:type=producer-metrics,client-id={clientId}: The number of times the I/O layer checked for new I/O to perform per second. Global connection metric.

Besides the producer global request metrics, the following metrics are also available per broker:

incoming-byte-rate

MBean: kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of bytes received per second from the broker.

outgoing-byte-rate

MBean: kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of bytes sent per second to the broker.

request-size-avg

MBean: kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+): The average size of all requests in the window for a broker.

request-size-max

MBean: kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+): The maximum size of any request sent in the window for a broker.

request-rate

MBean: kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of requests sent per second to the broker.

response-rate

MBean: kafka.producer:type=producer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of responses received per second from the broker.

In addition to the producer global request metrics, the following metric attributes are available per topic:

byte-rate

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The average number of bytes sent per second for a topic.

byte-total

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The total number of bytes sent for a topic.

compression-rate

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The average compression rate of record batches for a topic. This is computed by averaging the compression ratios of the record batches. The compression ratio is computed by dividing the compressed batch size by the uncompressed size.

record-error-rate

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The average per-second number of record sends that resulted in errors for a topic.

record-error-total

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The total number of record sends that resulted in errors for a topic.

record-retry-rate

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The average per-second number of retried record sends for a topic.

record-retry-total

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The total number of retried record sends for a topic.

record-send-rate

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The average number of records sent per second for a topic.

record-send-total

MBean: kafka.producer:type=producer-topic-metrics,client-id="{clientId}",topic="{topic}: The total number of records sent for a topic.

Consumer metrics

A consumer instance exposes the following metrics. These metrics are attributes on the listed MBean.

committed-time-ns-total

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The cumulative sum of time in nanoseconds elapsed during calls to Consumer.committed. Kafka 3.1.1 and later.

commit-sync-time-ns-total

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The cumulative sum of time in nanoseconds elapsed during calls to Consumer.commitSync. Kafka 3.1.1 and later.

time-between-poll-avg

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average delay between invocations of poll(). Kafka 2.4.0 and later.

time-between-poll-max

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The maximum delay between invocations of poll(). Kafka 2.4.0 and later.

last-poll-seconds-ago

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The number of seconds since the last poll() invocation. Kafka 2.4.0 and later.

poll-idle-ratio-avg

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average fraction of time the consumer’s poll() is idle as opposed to waiting for the user code to process records. Kafka 2.4.0 and later.

bytes-consumed-rate

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The average number of bytes consumed per second.

bytes-consumed-total

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The total number of bytes consumed.

fetch-latency-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The average time taken for a fetch request.

fetch-latency-max

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The maximum time taken for a fetch request.

fetch-rate

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The number of fetch requests per second.

fetch-size-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The average number of bytes fetched per request.

fetch-size-max

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The maximum number of bytes fetched per request.

fetch-throttle-time-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The average throttle time in milliseconds. When quotas are enabled, the broker may delay fetch requests in order to throttle a consumer which has exceeded its limit. This metric indicates how throttling time has been added to fetch requests on average.

fetch-throttle-time-max

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The maximum throttle time in milliseconds.

fetch-total

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The total number of fetch requests.

records-consumed-rate

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The average number of records consumed per second.

records-consumed-total

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The total number of records consumed.

records-lag-max

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The maximum lag in terms of number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.

records-lead-min

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The minimum lead in terms of number of records for any partition in this window.

records-per-request-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}": The average number of records in each request.

bytes-consumed-rate

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The average number of bytes consumed per second for a specific topic.

bytes-consumed-total

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The total number of bytes consumed for a specific topic.

fetch-size-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The average number of bytes fetched per request for a specific topic.

fetch-size-max

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The maximum number of bytes fetched per request for a specific topic.

records-consumed-rate

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The average number of records consumed per second for a specific topic.

records-consumed-total

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The total number of records consumed for a specific topic.

records-per-request-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientId}",topic="{topic}": The average number of records in each request for a specific topic.

preferred-read-replica

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The current read replica for the partition, or -1 if reading from leader.

records-lag

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The latest lag of the specified partition.

records-lag-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The average lag of the specified partition.

records-lag-max

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The maximum lag of the specified partition.

records-lead

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The latest lead of the specified partition.

records-lead-avg

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The average lead of the specified partition.

records-lead-min

MBean: kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{clientId}: The min lead of the specified partition.

incoming-byte-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average number of incoming bytes received per second from all servers. Global request metric.

outgoing-byte-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average number of outgoing bytes sent per second to all servers. Global request metric.

request-latency-avg

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average request latency in milliseconds. Global request metric.

request-latency-max

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The maximum request latency in milliseconds. Global request metric.

request-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average number of requests sent per second. Global request metric.

response-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average number of responses received per second. Global request metric.

connection-count

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The current number of active connections. Consumer global connection metric.

connection-creation-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The number of new connections established per second in the window. Consumer global connection metric.

connection-close-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The number of connections closed per second in the window. Consumer global connection metric.

io-ratio

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The fraction of time the I/O thread spent doing I/O. Consumer global connection metric.

io-time-ns-avg

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average length of time for I/O per select call in nanoseconds. Consumer global connection metric.

io-wait-ratio

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The fraction of time the I/O thread spent waiting. Consumer global connection metric.

io-wait-time-ns-avg

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. Consumer global connection metric.

select-rate

MBean: kafka.consumer:type=consumer-metrics,client-id={clientId}: The number of times the I/O layer checked for new I/O to perform per second. Consumer global connection metric.

incoming-byte-rate

MBean: kafka.consumer:type=consumer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of bytes received per second from the specified broker.

outgoing-byte-rate

MBean: kafka.consumer:type=consumer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of bytes sent per second to the specified broker.

request-size-avg

MBean: kafka.consumer:type=consumer-node-metrics,client-id={clientId},node-id=([0-9]+): The average size of all requests in the window for a specified broker.

request-size-max

MBean: kafka.consumer:type=consumer-node-metrics,client-id={clientId},node-id=([0-9]+): The maximum size of any request sent in the window for a specified broker.

request-rate

MBean: kafka.consumer:type=consumer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of requests sent per second to the specified broker.

response-rate

MBean: kafka.consumer:type=consumer-node-metrics,client-id={clientId},node-id=([0-9]+): The average number of responses received per second from the specified broker.

Consumer group metrics

A consumer group exposes the following metrics. These metrics are attributes on the listed MBean.

assigned-partitions

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of partitions currently assigned to this consumer.

commit-latency-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken for a commit request.

commit-latency-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken for a commit request.

commit-rate

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of commit calls per second.

commit-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total number of commit calls.

failed-rebalance-rate-per-hour

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of failed group rebalance event per hour.

failed-rebalance-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total number of failed group rebalances.

heartbeat-rate

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average number of heartbeats per second. After a rebalance, the consumer sends heartbeats to the coordinator to keep itself active in the group. You can control this using the heartbeat.interval.ms setting for the consumer. You may see a lower rate than configured if the processing loop is taking more time to handle message batches. Usually this is OK as long as you see no increase in the join rate.

heartbeat-response-time-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken to receive a response to a heartbeat request.

heartbeat-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total number of heartbeats.

join-rate

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of group joins per second. Group joining is the first phase of the rebalance protocol. A large value indicates that the consumer group is unstable and will likely be coupled with increased lag.

join-time-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken for a group rejoin. This value can get as high as the configured session timeout for the consumer, but should usually be lower.

join-time-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken for a group rejoin. This value should not get much higher than the configured session timeout for the consumer.

join-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total number of group joins.

last-heartbeat-seconds-ago

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of seconds since the last controller heartbeat.

last-rebalance-seconds-ago

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of seconds since the last rebalance event.

partitions-assigned-latency-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken by the on-partitions-assigned rebalance listener callback.

partitions-assigned-latency-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken by the on-partitions-assigned rebalance listener callback.

partitions-lost-latency-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken by the on-partitions-lost rebalance listener callback.

partitions-lost-latency-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken by the on-partitions-lost rebalance listener callback.

partitions-revoked-latency-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken by the on-partitions-revoked rebalance listener callback.

partitions-revoked-latency-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken by the on-partitions-revoked rebalance listener callback.

rebalance-latency-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken for a group rebalance.

rebalance-latency-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken for a group rebalance.

rebalance-latency-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total time taken for group rebalances so far.

rebalance-rate-per-hour

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of group rebalance participated per hour.

rebalance-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total number of group rebalances participated.

sync-rate

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The number of group syncs per second. Group synchronization is the second and last phase of the rebalance protocol. Similar to join-rate, a large value indicates group instability.

sync-time-avg

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The average time taken for a group sync.

sync-time-max

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The maximum time taken for a group sync.

sync-total

MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id={clientId}: The total number of group syncs.

Audit metrics

These audit metrics are specific Confluent Enterprise. For information about how audit logging works, see Audit Log Concepts in Confluent Platform.

audit-log-rate-per-minute

MBean: confluent-audit-metrics:name=audit-log-rate-per-minute: The number of audit log entries created per minute. This metric is useful in cases where you need to know the number of audit logs created.

audit-log-fallback-rate-per-minute

MBean: confluent-audit-metrics:name=audit-log-fallback-rate-per-minute: The rate of audit log fallback entries per minute. If the audit logging mechanism tries to write to the Kafka topic and doesn’t succeed for any reason, it writes the JSON audit log message to log4j instead. This metric is useful in cases where you need to know the fallback rate of your audit logs.

authentication-audit-log-rate

MBean: confluent-audit-metrics:name=authentication-audit-log-rate: The number authentication audit log entries created per second.

authentication-audit-log-failure-rate

MBean: confluent-audit-metrics:name=authentication-audit-log-rate: The number of authentication failure entries per second.

authorization-audit-log-rate

MBean: confluent-audit-metrics:name=authentication-audit-log-failure-rate: The number of authorization audit log entries created per second.

authorization-audit-log-failure-rate

MBean: confluent-audit-metrics:name=authorization-audit-log-failure-rate: The number of authorization audit log failure entries per second.

kafka-request-event-audit-log-rate

MBean: confluent-audit-metrics:name=kafka-request-event-audit-log-rate: The number of Kafka request event audit log entries per second.

kafka-request-event-audit-log-failure-rate

MBean: confluent-audit-metrics:name=kafka-request-event-audit-log-failure-rate: The number of Kafka request event audit log failure entries per second.

Authorizer metrics

The following metrics are exposed by Confluent Server. For more about Confluent Server Authorizer, see Configure Confluent Server Authorizer in Confluent Platform.

authorizer-authorization-latency-p90

MBean: confluent-authorizer-metrics:name=authorizer-authorization-latency-p90: The 90th percentile for time spent in milliseconds of requests going through the authorization process.

authorizer-authorization-latency-p99

MBean: confluent-authorizer-metrics:name=authorizer-authorization-latency-p99: The 99th percentile for time spent in milliseconds of requests going through the authorization process.

authorization-request-rate-per-minute

MBean: confluent-authorizer-metrics:name=authorization-request-rate-per-minute: The number of authorization requests per minute. This metric is useful in cases where you need to know the exact number of authorization requests per minute.

authorization-allowed-rate-per-minute

MBean: confluent-authorizer-metrics:name=authorization-allowed-rate-per-minute: The number of authorizations allowed per minute. This metric is useful in cases where you need to know the rate of authorizations allowed per minute.

authorization-denied-rate-per-minute

MBean: confluent-authorizer-metrics:name=authorization-denied-rate-per-minute: The number of authorizations denied per minute. This metric is useful in cases where you need to know the rate of authorizations denied per minute.

RBAC and LDAP metrics

The following metrics are exposed when you use RBAC and LDAP.

failure-start-seconds-ago

MBean: confluent.metadata:type=LdapGroupManager,name=failure-start-seconds-ago: The number of seconds since the last failed attempt to process metadata from the LDAP server. This is reset to zero on the next successful metadata refresh. This metric is available on brokers in the metadata cluster if LDAP group-based authorization is enabled. Alert if value is greater than zero.

writer-failure-start-seconds-ago

MBean: confluent.metadata:type=KafkaAuthStore,name=writer-failure-start-seconds-ago: The number of seconds since the last failure in the writer that updates authentication or authorization metadata on topics in the metadata cluster. This is reset to zero after the next successful metadata update. This metric is available on brokers in the metadata cluster. Alert if value is greater than zero.

reader-failure-start-seconds-ago

MBean: confluent.metadata:type=KafkaAuthStore,name=reader-failure-start-seconds-ago: The number of seconds since the last failure in the consumer that processes authentication or authorization metadata from the topics in the metadata cluster. This is reset to zero after the next successful metadata refresh. This metric is available on all brokers configured to use RBAC. Alert if value is greater than zero.

remote-failure-start-seconds-ago

MBean: confluent.metadata:type=KafkaAuthStore,name=remote-failure-start-seconds-ago: The number of seconds since the last failure in the metadata service, for example, due to LDAP refresh failures for a long duration. This is reset to zero when notification of successful refresh from the metadata service is processed. This metric is available on all brokers configured to use RBAC. Alert if value is greater than zero.

active-writer-count

MBean: confluent.metadata:type=KafkaAuthStore,name=active-writer-count: The number of active writers in the metadata cluster. Alert if the sum is any number other than one because there should be exactly one writer in the metadata cluster.

metadata-status

MBean: confluent.metadata:type=KafkaAuthStore,name=metadata-status,topic=([-.\w]+),partition=([0-9]+): The current status of metadata on each metadata topic partition. Value may be UNKNOWN, INITIALIZING, INITIALIZED or FAILED.

record-send-rate

MBean: confluent.metadata:type=KafkaAuthStore,name=record-send-rate,topic=([-.\w]+),partition=([0-9]+): The average number of records sent per second to the metadata topic partitions.

record-error-rate

MBean: confluent.metadata:type=KafkaAuthStore,name=record-error-rate,topic=([-.\w]+),partition=([0-9]+): The average number of record send attempts per second to the metadata topic partitions that failed.

rbac-role-bindings

MBean: kafka.server:type=confluent-auth-store-metrics:name=rbac-role-bindings-count: The number of role bindings defined. This metric is useful in cases where you need to know the exact number of role bindings that exist.

rbac-access-rules-count

MBean: kafka.server:type=confluent-auth-store-metrics:name=rbac-access-rules-count: The number of RBAC access rules defined. This metric is useful in cases where you need to know the exact number of RBAC access rules that exist. Access rules allow or deny access to specific resources within a specific scope, unlike role bindings, which assign an RBAC role for a specific resource to a specific principal.

acl-access-rules-count

MBean: kafka.server:type=confluent-auth-store-metrics:name=acl-access-rules-count: The number of ACL access rules defined. This metric is useful in cases where you need to know the exact number of ACLs that exist.