USM Agent: Metrics and Metadata Reference

The following tables describe the metrics and metadata collected by the USM Agent from your Confluent Platform deployment and sent to Confluent Cloud to power USM capabilities in the Confluent Cloud UI.

Data is organized by type (metadata first, then metrics) and, within each type, by source component (Kafka, then Connect).

Metadata collected

The USM Agent collects topic metadata from Kafka (KRaft-based clusters) and connector metadata from Connect clusters. Metadata is emitted when resources are created, updated, or deleted.

Kafka topic metadata

The following table lists the topic metadata fields collected from Kafka.

| Name | Description | Example |
| --- | --- | --- |
| kafkaversion | Confluent Platform version | 8.1.0-0-ce |
| subject | Event subject | topic |
| epoch | Epoch value | 1 |
| partitionkey | Partition key | default |
| kafkaprocessrolesbroker | Whether the broker role is enabled | false |
| kafkakafkanodeid | Kafka node ID | 9990 |
| javaversion | Java version of the emitting process | 21.0.8 |
| dataschema | Data schema when present; Optional.empty when absent | |
| kafkaprocessrolescontroller | Whether the controller role is enabled | true |
| kafkaclusterid | Kafka cluster ID | 9IyY5G0YQVm1ZzTpnmy9NQ |
| kafkacommitid | Kafka commit ID | f0ff52f85454ec17 |
| datacontenttype | Data content type | application/protobuf |
| time | Event timestamp (ISO 8601) | 2025-08-05 08:03:58 +0000 UTC |
| hosthostname | Host name of the emitting node | kraftcontroller-0 |
| route | Route | catalog-events |
| kafkabrokerid | Broker or controller node ID | 9990 |
| topic_id | Unique topic identifier | bSWrTCDdQWe0QXsCZc7_Dg |
| topic_name | Topic name | test-topic-1 |
| partitions_count | Number of partitions | 2 |
| replication_factor | Replication factor | 3 |
| retention_ms | Retention period in milliseconds (-1 if unset) | 604800000 |
| retention_bytes | Retention limit in bytes (-1 if unset) | -1 |
| cleanup_policy | Cleanup policy | DELETE or COMPACT |
| compression_type | Compression type | PRODUCER or GZIP |
| min_insync_replicas | Minimum in-sync replicas | 1 |
| max_message_bytes | Maximum message size in bytes | 1048588 |
| segment_bytes | Segment size in bytes | 1073741824 |
| segment_ms | Segment roll time in milliseconds | 604800000 |
| message_timestamp_type | Timestamp type | CreateTime |
| file_delete_delay_ms | Delay before deleting segment files, in milliseconds | 60000 |
| flush_messages | Number of messages before flush | 9223372036854775807 |
| flush_ms | Time before flush in milliseconds | 9223372036854775807 |
| index_interval_bytes | Index interval in bytes | 4096 |
| max_compaction_lag_ms | Maximum compaction lag in milliseconds | 9223372036854775807 |
| min_cleanable_dirty_ratio | Minimum cleanable dirty ratio | 0.5 |
| segment_index_bytes | Segment index size in bytes | 10485760 |
| delete_retention_ms | Delete retention in milliseconds | 86400000 |
| create_time_seconds | Topic creation time (seconds since epoch) | 1754381038 |
| create_time_nanos | Topic creation time (nanoseconds) | 911000000 |
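The creation time is split across two fields, create_time_seconds and create_time_nanos. As a minimal sketch of how a consumer of this metadata might combine them, the Python below builds a single UTC timestamp from the example values in the table. It assumes create_time_nanos holds the sub-second remainder rather than a full nanosecond timestamp, which the example values suggest but the table does not state; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def topic_create_time(create_time_seconds: int, create_time_nanos: int) -> datetime:
    """Combine the two creation-time fields into one UTC datetime.

    Assumption: create_time_nanos is the sub-second part (0 to 999,999,999).
    """
    whole_seconds = datetime.fromtimestamp(create_time_seconds, tz=timezone.utc)
    # datetime has microsecond resolution, so nanoseconds are truncated.
    return whole_seconds + timedelta(microseconds=create_time_nanos // 1000)


# Example values taken from the table above.
created = topic_create_time(1754381038, 911000000)
print(created.isoformat())  # 2025-08-05T08:03:58.911000+00:00
```

The result agrees to the second with the example value of the time field in the same table.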

Connect connector metadata

Connector metadata is emitted from Connect when connectors or their configurations change. The following table lists the connector metadata fields collected from Connect.

| Name | Description | Example |
| --- | --- | --- |
| connector_name | Connector name | test-create-source-connector |
| cluster_group_id | Connect cluster group ID | setu-bb5.connect |
| metadata_kafka_cluster_id | Kafka cluster ID the Connect cluster uses | 9IyY5G0YQVm1ZzTpnmy9NQ |
| class | Connector class | io.confluent.kafka.connect.datagen.DatagenConnector |
| topics | Topics used by the connector | datagen-topic |
| configs | Connector configuration entries (key-value pairs) | configs: {key: 'connector.class', value: 'io.confluent.kafka.connect.datagen.DatagenConnector'}, configs: {key: 'kafka.topic', value: 'datagen-topic'}, configs: {key: 'name', value: 'test-create-source-connector'}, configs: {key: 'quickstart', value: 'orders'}, configs: {key: 'tasks.max', value: '1'}, configs: {key: 'value.converter', value: 'org.apache.kafka.connect.json.JsonConverter'} |
| tasks_max | Maximum number of tasks | 1 |
| value_converter | Value converter class | org.apache.kafka.connect.json.JsonConverter |
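The configs field carries the connector configuration as repeated key-value entries. To illustrate how those entries relate to fields such as class, tasks_max, and value_converter, the sketch below flattens them into a dictionary. Representing the entries as a list of Python dicts is an assumption made for illustration; the document does not specify how the entries are encoded, and the helper name is hypothetical.

```python
def configs_to_dict(entries):
    """Flatten [{'key': ..., 'value': ...}, ...] into a plain dict.

    If a key appears more than once, the last entry wins.
    """
    return {entry["key"]: entry["value"] for entry in entries}


# Entries copied from the example value of the configs field.
configs = [
    {"key": "connector.class",
     "value": "io.confluent.kafka.connect.datagen.DatagenConnector"},
    {"key": "kafka.topic", "value": "datagen-topic"},
    {"key": "name", "value": "test-create-source-connector"},
    {"key": "quickstart", "value": "orders"},
    {"key": "tasks.max", "value": "1"},
    {"key": "value.converter",
     "value": "org.apache.kafka.connect.json.JsonConverter"},
]

flat = configs_to_dict(configs)
print(flat["connector.class"])   # matches the class field
print(int(flat["tasks.max"]))    # matches the tasks_max field
print(flat["value.converter"])   # matches the value_converter field
```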

Metrics collected

The USM Agent collects metrics from Kafka and Connect. Metrics are sampled periodically (typically every 60 seconds) and sent to Confluent Cloud.
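Many of the metrics in this reference are reported as a count "since the previous sample". As a rough sketch, such a value can be turned into a per-second rate by dividing by the sampling interval. The 60-second default below reflects the typical interval mentioned above and is an assumption for any given deployment; the function name and input value are illustrative only.

```python
def per_second_rate(delta: float, interval_seconds: float = 60.0) -> float:
    """Convert a per-sample delta into an average per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return delta / interval_seconds


# Illustrative input: 6,000,000 bytes counted over one 60-second sample.
print(per_second_rate(6_000_000))  # 100000.0 bytes per second
```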

Metric tags

Metrics are tagged so they can be associated with the correct resource. Some common tags are added in Confluent Cloud, as noted in their descriptions. The tables below list the main tags that can appear on metric records.

Common tags for Kafka clusters

The following table lists the common tags for Kafka cluster metrics. These tags apply to all metrics except the node count metric.

| Tag name | Description |
| --- | --- |
| resource.kafka.id | The ID of the Kafka cluster. This tag is added in Confluent Cloud. |
| resource.kafka.provided.id | The provided ID of the Kafka cluster. |
| resource.kafka.broker.id | The ID of a Kafka broker. |
| resource.kafka.version | The version of the Kafka cluster. |
| resource.kafka.roles.broker | Indicates whether the Kafka broker role is enabled. |
| resource.kafka.roles.controller | Indicates whether the Kafka controller role is enabled. |
| resource.environment.id | The ID of the Confluent Cloud environment. This tag is added in Confluent Cloud. |
| resource.organization.id | The ID of the Confluent Cloud organization. This tag is added in Confluent Cloud. |

Node count metric tags

The following tags apply only to the io.confluent.usm/node_count metric for Kafka clusters.

| Tag name | Description |
| --- | --- |
| resource.id | The ID of the cluster or resource. This tag is added in Confluent Cloud. |
| resource.provided.id | The provided ID of the cluster. |
| resource.kafka.roles.broker | Indicates whether the Kafka broker role is enabled. |
| resource.name | The name of the resource. |
| resource.type | The type of the resource. |
| resource.environment.id | The ID of the Confluent Cloud environment. This tag is added in Confluent Cloud. |
| resource.organization.id | The ID of the Confluent Cloud organization. This tag is added in Confluent Cloud. |
| metric.node.id | The ID of the node. |

Common tags for Connect clusters

The following table lists the common tags for Connect cluster metrics. These tags apply to all metrics except the node count metric.

| Tag name | Description |
| --- | --- |
| resource.connector.id | The ID of the Connect cluster. This tag is added in Confluent Cloud. |
| resource.connector.name | The name of the connector. |
| resource.connector.kafka.provided.id | The provided ID of the Kafka cluster associated with the connector. |
| resource.connector.cluster.id | The ID of the Connect cluster running the connector. |
| resource.connector.version | The version of the Connect cluster running the connector. |
| resource.environment.id | The ID of the Confluent Cloud environment. This tag is added in Confluent Cloud. |
| resource.organization.id | The ID of the Confluent Cloud organization. This tag is added in Confluent Cloud. |

Node count metric tags

The following tags apply only to the io.confluent.usm/node_count metric for Connect clusters.

| Tag name | Description |
| --- | --- |
| resource.id | The ID of the cluster or resource. This tag is added in Confluent Cloud. |
| resource.provided.id | The provided ID of the cluster. |
| resource.kafka.roles.broker | Indicates whether the Kafka broker role is enabled. |
| resource.name | The name of the resource. |
| resource.type | The type of the resource. |
| resource.environment.id | The ID of the Confluent Cloud environment. This tag is added in Confluent Cloud. |
| resource.organization.id | The ID of the Confluent Cloud organization. This tag is added in Confluent Cloud. |
| metric.node.id | The ID of the node. |

Specific tags (used only when applicable)

The following table lists the specific tags that can appear on metric records.

| Tag name | Description |
| --- | --- |
| metric.host.hostname | The name of the host. |
| metric.topic | The name of the Kafka topic. |
| metric.partition | The partition number of the Kafka topic. |
| metric.volume | The volume or disk label. |
| metric.type | The Kafka protocol request type. |
| metric.consumer_group_id | The ID of the consumer group. |
| metric.consumer_group_member_id | The ID of the member in a consumer group. |
| metric.client_id | The client ID of the producer or consumer. |
| metric.group_protocol | The group protocol used by the consumer group member. |
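The metric tables later in this reference list additional tags in an uppercase shorthand (for example TOPIC, PARTITION, HOSTNAME). The sketch below shows one plausible way to expand that shorthand into the full tag names listed above. The mapping is inferred from the tag names and descriptions and is not stated by the source; in particular, pairing REQUEST_TYPE with metric.type is an assumption based on that tag's description.

```python
# Assumed mapping from the shorthand used in the metric tables to the
# specific tag names in the table above. Inferred, not confirmed.
TAG_SHORTHAND = {
    "HOSTNAME": "metric.host.hostname",
    "TOPIC": "metric.topic",
    "PARTITION": "metric.partition",
    "VOLUME": "metric.volume",
    "REQUEST_TYPE": "metric.type",
    "CONSUMER_GROUP_ID": "metric.consumer_group_id",
    "CONSUMER_GROUP_MEMBER_ID": "metric.consumer_group_member_id",
    "CLIENT_ID": "metric.client_id",
    "GROUP_PROTOCOL": "metric.group_protocol",
}


def expand_tags(shorthand: str) -> list:
    """Turn 'TOPIC, PARTITION, HOSTNAME' into a list of full tag names."""
    return [TAG_SHORTHAND[name.strip()] for name in shorthand.split(",")]


print(expand_tags("TOPIC, PARTITION, HOSTNAME"))
```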

Kafka cluster metrics

The following table lists the metrics collected from Kafka clusters.

| Metric name | Description | Additional applicable tags |
| --- | --- | --- |
| io.confluent.kafka.server/retained_bytes | The current number of bytes retained by the cluster, summed across all partitions. Sampled every 60 seconds. | TOPIC, PARTITION, HOSTNAME |
| io.confluent.kafka.server/log_start_offset | The offset of the first message in a partition. Sampled every 60 seconds. | TOPIC, PARTITION, HOSTNAME |
| io.confluent.kafka.server/log_end_offset | The offset of the last message in a partition. Sampled every 60 seconds. | TOPIC, PARTITION, HOSTNAME |
| io.confluent.kafka.server/received_bytes | The number of bytes received from the network since the previous sample. Sampled every 60 seconds. | TOPIC, HOSTNAME |
| io.confluent.kafka.server/sent_bytes | The number of bytes sent over the network since the previous sample. Sampled every 60 seconds. | TOPIC, HOSTNAME |
| io.confluent.kafka.server/received_records | The number of records received since the previous sample. Sampled every 60 seconds. | TOPIC, HOSTNAME |
| io.confluent.kafka.server/sent_records | The number of records sent since the previous sample, including unsuccessful sends. Sampled every 60 seconds. | TOPIC, CLIENT_ID, HOSTNAME |
| io.confluent.kafka.server/active_controller_count | The number of active controllers in the cluster. Alert if the aggregated sum across all brokers is anything other than 1, as there must be exactly one controller per cluster. | HOSTNAME |
| io.confluent.kafka.server/offline_partitions_count | The number of partitions without an active leader. These partitions are neither writable nor readable. Alert if the value is greater than 0. | HOSTNAME |
| io.confluent.kafka.server/unclean_leader_elections | The rate of unclean leader elections. | HOSTNAME |
| io.confluent.kafka.server/under_replicated_partitions | The number of under-replicated partitions. In a healthy cluster, the number of in-sync replicas (ISRs) equals the total number of replicas. Under-replicated partitions occur when a broker is down or cannot replicate fast enough from the leader. | HOSTNAME |
| io.confluent.kafka.server/under_min_isr_partition_count | The number of partitions where the in-sync replica (ISR) count is lower than the configured minimum. | HOSTNAME |
| io.confluent.kafka.server/network_processor_avg_idle_percent | The average fraction of time that network processor threads are idle. Values range from 0 (all resources in use) to 1 (all resources available). | HOSTNAME |
| io.confluent.kafka.server/request_handler_avg_idle_percent | The average fraction of time that request handler threads are idle. | HOSTNAME |
| io.confluent.kafka.server/leader_count | The number of leaders on the broker. | HOSTNAME |
| io.confluent.kafka.server/partition_count | The number of partitions on the broker. | HOSTNAME |
| io.confluent.kafka.server/active_connection_count | The count of active connections. | HOSTNAME |
| system/volume/capacity_bytes | The total capacity of the storage volume, in bytes. | VOLUME, HOSTNAME |
| system/volume/free_bytes | The amount of free space on the storage volume, in bytes. | VOLUME, HOSTNAME |
| io.confluent.kafka.server/failed_produce_requests | The number of failed produce requests since the previous sample. Sampled every 60 seconds. | TOPIC, HOSTNAME |
| io.confluent.kafka.server/failed_fetch_requests | The number of failed fetch requests since the previous sample. Sampled every 60 seconds. | TOPIC, HOSTNAME |
| io.confluent.kafka.server/request_duration_milliseconds | The total duration of inbound requests within the sampling window. To calculate the average time per request, divide this value by request_count. | REQUEST_TYPE, HOSTNAME |
| io.confluent.kafka.server/request_count | The number of inbound requests within the sampling window. | REQUEST_TYPE, HOSTNAME |
| io.confluent.kafka.server/consumer_lag_offsets | The lag between a consumer group member’s committed offset and the partition’s high watermark. | CONSUMER_GROUP_ID, CONSUMER_GROUP_MEMBER_ID, PARTITION, CLIENT_ID, TOPIC, GROUP_PROTOCOL, HOSTNAME |
| io.confluent.kafka.server/max_pending_rebalance_time_milliseconds | The maximum pending rebalance time, in milliseconds, among all members of a consumer group. | CONSUMER_GROUP_ID, GROUP_PROTOCOL |
| io.confluent.kafka.server/bytes_in | The number of bytes received by the client since the previous sample. Sampled every 60 seconds. | TOPIC, CLIENT_ID, HOSTNAME |
| io.confluent.kafka.server/bytes_out | The number of bytes sent by the client since the previous sample. Sampled every 60 seconds. | TOPIC, CLIENT_ID, HOSTNAME |
| io.confluent.usm/node_count | The current count of active nodes for the Confluent Platform resource. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | Only Node Count Metric Tags |
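Several descriptions in the table above imply simple calculations: the average time per request is request_duration_milliseconds divided by request_count, and two metrics come with explicit alert conditions. The following sketch shows these calculations under stated assumptions. The function names and all input values are made up for illustration, and the functions operate on plain numbers rather than on any real USM or Confluent Cloud API.

```python
from typing import List, Optional


def average_request_ms(total_duration_ms: float, request_count: int) -> Optional[float]:
    """Average time per request within one sampling window.

    Returns None when no requests were counted, to avoid dividing by zero.
    """
    if request_count == 0:
        return None
    return total_duration_ms / request_count


def cluster_alerts(active_controller_counts: List[int],
                   offline_partitions_counts: List[int]) -> List[str]:
    """Apply the two alert conditions stated in the metric descriptions.

    Each list holds one value per reporting node (hypothetical input shape).
    """
    alerts = []
    # The sum across all nodes must be exactly 1.
    if sum(active_controller_counts) != 1:
        alerts.append("active_controller_count sum is not 1")
    # Any value greater than 0 means partitions without an active leader.
    if any(count > 0 for count in offline_partitions_counts):
        alerts.append("offline_partitions_count is greater than 0")
    return alerts


# Illustrative values only.
print(average_request_ms(1500.0, 300))            # 5.0
print(cluster_alerts([0, 1, 0], [0, 0, 0]))       # []
print(cluster_alerts([1, 1, 0], [0, 2, 0]))       # both alerts
```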

Connect cluster metrics

The following table lists the metrics collected from Connect clusters.

| Metric name | Description | Additional applicable tags |
| --- | --- | --- |
| io.confluent.kafka.connect/task_count | The total number of tasks from all workers for a connector. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | HOSTNAME |
| io.confluent.kafka.connect/failed_task_count | The number of failed tasks from all workers for a connector. Ideally, this value is 0. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | HOSTNAME |
| io.confluent.kafka.connect/sent_records | The number of records sent from transformations and written to Kafka by the source connector since the previous sample. Sampled every 60 seconds. | HOSTNAME |
| io.confluent.kafka.connect/received_records | The number of records received by the sink connector since the previous sample. Sampled every 60 seconds. | HOSTNAME |
| io.confluent.kafka.connect/sent_bytes | The number of bytes sent from transformations and written to Kafka by the source connector since the previous sample. Sampled every 60 seconds. | HOSTNAME |
| io.confluent.kafka.connect/received_bytes | The number of bytes received by the sink connector since the previous sample. Sampled every 60 seconds. | HOSTNAME |
| io.confluent.kafka.connect/dead_letter_queue_records | The number of dead letter queue records written to Kafka by the sink connector since the previous sample. Sampled every 60 seconds. | |
| io.confluent.kafka.connect/request_size_avg | The average size of all client requests to the Connect cluster within a 60-second window. | |
| io.confluent.usm/node_count | The current count of active nodes for the Confluent Platform resource. Sampled every 60 seconds. The implicit time aggregation for this metric is MAX. | Only Node Count Metric Tags |
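Several metrics above state that their implicit time aggregation is MAX. As a minimal sketch of what that means when 60-second samples are rolled up into a coarser window, the code below keeps the largest value seen in each window. The one-hour window, the function name, and the sample values are assumptions chosen for illustration; how Confluent Cloud actually buckets samples is not described here.

```python
def aggregate_max(samples, window_seconds=3600):
    """Roll (epoch_seconds, value) samples up into windows using MAX.

    Returns a dict mapping each window's start time to the largest
    value observed in that window, ordered by window start.
    """
    buckets = {}
    for timestamp, value in samples:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start] = max(value, buckets.get(window_start, value))
    return dict(sorted(buckets.items()))


# Illustrative node_count samples: one node is briefly missing at t=120.
samples = [(0, 3), (60, 3), (120, 2), (180, 3), (3600, 3)]
print(aggregate_max(samples))  # {0: 3, 3600: 3}
```

With MAX aggregation, a brief dip in a single sample does not lower the value reported for the window, which suits counts such as node_count and task_count.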