USM Agent: Metrics and Metadata Reference
The following tables describe the metrics and metadata collected by the USM Agent from your Confluent Platform deployment and sent to Confluent Cloud to power USM capabilities in the Confluent Cloud UI.
Data is organized into two sections, each broken down by source component:
Metadata collected: metadata records (topic configurations and connector configurations)
Metrics collected: metrics (counters, gauges, and deltas)
Metadata collected
The USM Agent collects topic metadata from Kafka (KRaft-based clusters) and connector metadata from Connect clusters. Metadata is emitted when resources are created, updated, or deleted.
Kafka topic metadata
The following table lists the topic metadata fields collected from Kafka.
| Name | Description | Example |
|---|---|---|
| | Confluent Platform version | |
| | Event subject | |
| | Epoch value | |
| | Partition key | |
| | Whether the broker role is enabled | |
| | Kafka node ID | |
| | Java version of the emitting process | |
| | Data schema when present; | — |
| | Whether the controller role is enabled | |
| | Kafka cluster ID | |
| | Kafka commit ID | |
| | Data content type | |
| | Event timestamp (ISO 8601) | |
| | Host name of the emitting node | |
| | Route | |
| | Broker or controller node ID | |
| | Unique topic identifier | |
| | Topic name | |
| | Number of partitions | |
| | Replication factor | |
| | Retention period in milliseconds (-1 if unset) | |
| | Retention limit in bytes (-1 if unset) | |
| | Cleanup policy | |
| | Compression type | |
| | Minimum in-sync replicas | |
| | Maximum message size in bytes | |
| | Segment size in bytes | |
| | Segment roll time in milliseconds | |
| | Timestamp type | |
| | Delay before deleting segment files | |
| | Number of messages before flush | |
| | Time before flush in milliseconds | |
| | Index interval in bytes | |
| | Maximum compaction lag in milliseconds | |
| | Minimum cleanable dirty ratio | |
| | Segment index size in bytes | |
| | Delete retention in milliseconds | |
| | Topic creation time (seconds since epoch) | |
| | Topic creation time (nanoseconds) | |
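The retention fields use -1 as an "unset" sentinel, and the topic creation time arrives as two separate numbers. The following Python sketch shows one way a consumer of these records might normalize both. The function and parameter names are illustrative only and are not the agent's field names. The sketch also assumes that the nanoseconds field holds the sub-second remainder of the creation time rather than a full epoch value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def limit_or_none(value: int) -> Optional[int]:
    """Map the -1 'unset' sentinel used by the retention fields to None."""
    return None if value == -1 else value


def creation_time(seconds: int, nanos: int = 0) -> datetime:
    """Combine epoch seconds with a nanosecond remainder into a UTC datetime.

    Assumption: `nanos` is the sub-second part (0 to 999,999,999).
    Python datetimes carry microsecond precision, so finer detail is dropped.
    """
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)
```

Returning `None` for unset limits keeps the sentinel from being mistaken for a real (negative) value in later arithmetic.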
Connect connector metadata
Connector metadata is emitted from Connect when connectors or their configurations change. The following table lists the connector metadata fields collected from Connect.
| Name | Description | Example |
|---|---|---|
| | Connector name | |
| | Connect cluster group ID | |
| | Kafka cluster ID the Connect cluster uses | |
| | Connector class | |
| | Topics used by the connector | |
| | Connector configuration entries (key-value pairs) | |
| | Maximum number of tasks | |
| | Value converter class | |
Metrics collected
The USM Agent collects metrics from Kafka and Connect. Metrics are sampled periodically (typically every 60 seconds) and sent to Confluent Cloud.
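Several of the metrics in this section are deltas: each sample reports the change since the previous sample. When comparing values across dashboards or sampling intervals, it can help to convert a delta into a per-second rate. The sketch below assumes the typical 60-second interval as its default. `delta_to_rate` is a hypothetical helper, not part of the USM Agent.

```python
def delta_to_rate(delta: float, interval_seconds: float = 60.0) -> float:
    """Convert a per-sample delta into an average per-second rate.

    Assumption: the sample covers exactly `interval_seconds`; if the
    actual interval differs, pass the real value instead of the default.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return delta / interval_seconds
```

For example, a delta of 6,000,000 bytes received over one 60-second sample corresponds to `delta_to_rate(6_000_000)`, or 100,000 bytes per second.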
Kafka cluster metrics
The following table lists the metrics collected from Kafka clusters.
| Metric name | Description | Additional applicable tags |
|---|---|---|
| | The current number of bytes retained by the cluster, summed across all partitions. Sampled every 60 seconds. | |
| | The offset of the first message in a partition. Sampled every 60 seconds. | |
| | The offset of the last message in a partition. Sampled every 60 seconds. | |
| | The number of bytes received from the network since the previous sample. Sampled every 60 seconds. | |
| | The number of bytes sent over the network since the previous sample. Sampled every 60 seconds. | |
| | The number of records received since the previous sample. Sampled every 60 seconds. | |
| | The number of records sent since the previous sample, including unsuccessful sends. Sampled every 60 seconds. | |
| | The number of active controllers in the cluster. Alert if the aggregated sum across all brokers is anything other than 1, as there must be exactly one controller per cluster. | |
| | The number of partitions without an active leader. These partitions are neither writable nor readable. Alert if the value is greater than 0. | |
| | The rate of unclean leader elections. | |
| | The number of under-replicated partitions. In a healthy cluster, the number of in-sync replicas (ISRs) equals the total number of replicas. Under-replicated partitions occur when a broker is down or cannot replicate fast enough from the leader. | |
| | The number of partitions where the in-sync replica (ISR) count is lower than the configured minimum. | |
| | The average fraction of time that network processor threads are idle. Values range from 0 (all resources in use) to 1 (all resources available). | |
| | The average fraction of time that request handler threads are idle. | |
| | The number of leaders on the broker. | |
| | The number of partitions on the broker. | |
| | The count of active connections. | |
| | The total capacity of the storage volume, in bytes. | |
| | The amount of free space on the storage volume, in bytes. | |
| | The number of failed produce requests since the previous sample. Sampled every 60 seconds. | |
| | The number of failed fetch requests since the previous sample. Sampled every 60 seconds. | |
| | The total duration of inbound requests within the sampling window. To calculate the average time per request, divide this value by | |
| | The lag between a consumer group member’s committed offset and the partition’s high watermark. | |
| | The lag between a member’s committed offset and the partition’s high watermark. | |
| | The maximum pending rebalance time, in milliseconds, among all members of a consumer group. | |
| | The number of bytes received by the client since the previous sample. Sampled every 60 seconds. | |
| | The number of bytes sent by the client since the previous sample. Sampled every 60 seconds. | |
| | The current count of active nodes for the Confluent Platform resource. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | Only |
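The alerting guidance and derived values in the table above can be expressed as a few small checks. The following Python sketch is illustrative only: the function and parameter names are hypothetical stand-ins for the corresponding metric values, and the inputs are assumed to have already been fetched from wherever you query these metrics.

```python
from typing import Dict, List


def cluster_alerts(
    active_controllers_by_broker: Dict[int, int],
    offline_partitions: int,
) -> List[str]:
    """Apply the two alert rules stated in the table above."""
    alerts: List[str] = []
    # Sum across all brokers: anything other than exactly 1 is a problem.
    total = sum(active_controllers_by_broker.values())
    if total != 1:
        alerts.append(f"active controller count is {total}, expected 1")
    # Partitions without a leader are neither writable nor readable.
    if offline_partitions > 0:
        alerts.append(f"{offline_partitions} partition(s) have no active leader")
    return alerts


def disk_used_fraction(total_bytes: int, free_bytes: int) -> float:
    """Derive used-space fraction from the total and free volume metrics."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (total_bytes - free_bytes) / total_bytes


def consumer_lag(high_watermark: int, committed_offset: int) -> int:
    """Lag between the committed offset and the partition high watermark.

    Clamped at 0 as a design choice of this sketch, so that samples taken
    at slightly different times never yield a negative lag.
    """
    return max(high_watermark - committed_offset, 0)
```

As a usage example, `cluster_alerts({1: 1, 2: 0, 3: 0}, 0)` returns an empty list for a healthy three-broker cluster, while two brokers both reporting an active controller would produce an alert.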
Connect cluster metrics
The following table lists the metrics collected from Connect clusters.
| Metric name | Description | Additional applicable tags |
|---|---|---|
| | The total number of tasks from all workers for a connector. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | |
| | The number of failed tasks from all workers for a connector. Ideally, this value is 0. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | |
| | The number of records sent from transformations and written to Kafka by the source connector since the previous sample. Sampled every 60 seconds. | |
| | The number of records received by the sink connector since the previous sample. Sampled every 60 seconds. | |
| | The number of bytes sent from transformations and written to Kafka by the source connector since the previous sample. Sampled every 60 seconds. | |
| | The number of bytes received by the sink connector since the previous sample. Sampled every 60 seconds. | |
| | The number of dead letter queue records written to Kafka by the sink connector since the previous sample. Sampled every 60 seconds. | — |
| | The average size of all client requests to the Connect cluster within a 60-second window. | — |
| | The current count of active nodes for the Confluent Platform resource. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | Only |
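The total and failed task counts above can be combined into a single health indicator per connector. The sketch below is one possible approach, not something the USM Agent computes itself. `failed_task_ratio` and its parameters are hypothetical names for the two task metrics.

```python
def failed_task_ratio(total_tasks: int, failed_tasks: int) -> float:
    """Fraction of a connector's tasks that are failed (0.0 is ideal).

    A connector with no tasks is reported as 0.0 here rather than
    raising, which is a design choice of this sketch.
    """
    if total_tasks <= 0:
        return 0.0
    return failed_tasks / total_tasks
```

For example, a connector with 4 tasks, 1 of them failed, gives `failed_task_ratio(4, 1)`, which is 0.25.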