USM Agent: sizing, high availability, and monitoring
This page describes key architectural and operational considerations for the Unified Stream Manager (USM) Agent. The configuration for sizing, high availability, and monitoring depends on your deployment method. Find the section below that matches your environment: Confluent for Kubernetes or Confluent Ansible.
For information about networking configuration including IPv6 and dual-stack support, see USM Agent networking configuration.
Confluent for Kubernetes deployments
When you deploy with Confluent for Kubernetes (CFK), CFK automates most of the configuration for sizing and high availability.
Sizing and scaling
CFK automatically manages resource allocation for the USM Agent, so you don’t need to configure it manually. CFK sets the following default resources for the usm-agent container:
Requests: 100m CPU, 128Mi memory
Limits: 300m CPU, 256Mi memory
To scale out, increase the replicas for the USM Agent in your custom resource. For information about overriding these defaults, see Specify CPU and memory requests.
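For illustration, a scale-out with an explicit resource override might look like the following sketch. The `UsmAgent` kind and field paths here are assumptions based on common CFK custom resource conventions; check the USM Agent CRD reference for the exact names before applying.

```yaml
# Hypothetical custom resource sketch; the kind name and field paths
# follow common CFK CRD conventions and may differ in your version.
apiVersion: platform.confluent.io/v1beta1
kind: UsmAgent
metadata:
  name: usm-agent
  namespace: <namespace>
spec:
  replicas: 2          # scale out for redundancy
  podTemplate:
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 300m      # raise to give a busy agent more headroom
        memory: 256Mi
```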
High availability
To achieve high availability for the USM Agent, use standard CFK patterns by increasing the replica count in your custom resource. For a redundant setup, a minimum of two replicas is recommended.
To expose the USM Agent externally, follow the standard CFK procedures for configuring load balancers.
Monitoring
You can monitor USM Agent logs for troubleshooting and scrape Prometheus metrics for performance analysis.
Logs
The USM Agent provides three types of logs, which are accessed differently in a Kubernetes environment:
Application and access logs: These are available by default in the standard Kubernetes Pod logs. The agent sends application logs to stderr and access logs to stdout. This lets you configure a logging agent, such as Fluentd, Logstash, or Filebeat, to capture and manage these streams separately.
Traffic logs: You can extract these logs by configuring the logcollector component. These logs are located in the /var/log/confluent/usm-agent/tap/ directory.
Metrics with Prometheus
The USM Agent exposes Prometheus-compatible metrics for monitoring the agent’s own performance, health, and traffic patterns. These are Envoy proxy metrics that cover HTTP request and response statistics, upstream cluster health, connection pools, and system resource utilization.
For information about the metrics and metadata that the USM Agent collects from your Kafka and Connect clusters and sends to Confluent Cloud, see USM Agent: Metrics and Metadata Reference.
Metrics endpoint
By default, the monitoring listener binds to port 9910 and exposes metrics at the /stats/prometheus endpoint. The security configuration for this endpoint, including the protocol (http or https) and any authentication requirements, mirrors the configuration of the main dataplane listener.
Prometheus configuration for Kubernetes
To scrape metrics from multiple USM Agent instances in Kubernetes, use the Prometheus Operator with a PodMonitor resource.
Recommended: Using PodMonitor
If you have the Prometheus Operator installed, create a PodMonitor to automatically discover and scrape all USM Agent pods:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: usm-agent-monitor
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app: usm-agent
  podMetricsEndpoints:
  - port: admin
    path: /stats/prometheus
    interval: 60s
    scheme: http
    scrapeTimeout: 30s
Replace <namespace> with your USM Agent namespace.
Alternative: Using a Service
Confluent for Kubernetes exposes the external port for the USM Agent but does not expose the monitoring port by default. For a single USM Agent instance, complete the following steps:
Create a Kubernetes Service to expose port 9910:

apiVersion: v1
kind: Service
metadata:
  name: usm-agent-metrics
  namespace: <namespace>
spec:
  ports:
  - name: prometheus
    port: 9910
    protocol: TCP
    targetPort: 9910
  selector:
    app: usm-agent

Replace <namespace> with your USM Agent namespace.

Configure Prometheus to scrape the agent directly:

scrape_configs:
- job_name: 'usm-agent'
  scrape_interval: 15s
  metrics_path: /stats/prometheus
  # Uncomment and configure if USM Agent uses mTLS authentication
  # scheme: https
  # tls_config:
  #   cert_file: '/path/to/client.pem'
  #   key_file: '/path/to/client.key'
  #   ca_file: '/path/to/cacerts.pem'
  # Uncomment and configure if USM Agent uses basic authentication
  # basic_auth:
  #   username: <username>
  #   password: <password>
  static_configs:
  - targets: ['<usm-agent-service>.<namespace>.svc.cluster.local:9910']

Replace <usm-agent-service> with your USM Agent service name and <namespace> with your namespace.
For a complete list of available metrics and their descriptions, see the Envoy statistics documentation.
Visualization with Grafana
To visualize USM Agent metrics in Grafana, import the pre-built Envoy proxy monitoring dashboard into your Grafana instance.
Ansible Playbooks for Confluent Platform deployments
While Ansible Playbooks for Confluent Platform automates the deployment and configuration of the USM Agent, you must manually manage the underlying infrastructure for your Linux servers.
Sizing and scaling
Properly sizing the USM Agent is critical for performance and stability.
Vertical sizing: CPU and memory
For a server, such as a virtual machine or bare-metal host, that runs a USM Agent instance, a minimum configuration of 2 vCPU cores and 2 GB of RAM is recommended.
This baseline provides a stable environment with sufficient resources for both the agent process and the underlying operating system. After deployment, monitor the server’s CPU and memory utilization and adjust these resources to meet the specific demands of your workload.
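If the agent runs as a systemd service, as is typical for Ansible-managed Confluent Platform hosts, one way to enforce resource ceilings is a systemd drop-in override. The unit name below is an assumption; substitute the service name used in your deployment.

```ini
# /etc/systemd/system/confluent-usm-agent.service.d/resources.conf
# Hypothetical unit name; substitute the service name from your
# deployment. Apply with:
#   systemctl daemon-reload && systemctl restart <unit>
[Service]
CPUQuota=200%        # cap at the equivalent of 2 vCPU cores
MemoryMax=2G         # hard memory ceiling for the agent process
```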
Operating system requirements
For non-containerized deployments, you must use Red Hat Enterprise Linux (RHEL) 9 or later. RHEL 8 does not support native USM Agent installation. For RHEL 8 VMs, you can deploy the USM Agent as a container using Podman.
Horizontal sizing: adding instances
Add multiple agents primarily to achieve high availability. This ensures service continuity if one agent fails, as traffic can be redirected to a healthy instance.
If high availability is not a requirement, running one larger, vertically scaled agent is more resource-efficient than running multiple smaller agents.
High availability
For critical environments, use one of the following methods to make the USM Agent highly available on Linux servers.
Use a load balancer
This approach involves placing an HTTP-based load balancer in front of two or more USM Agent instances to distribute traffic and manage failover automatically. The load balancer runs continuous health checks on each agent. If it detects a failure, it automatically routes traffic away from the unhealthy instance to a healthy one.
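As a sketch, an HAProxy front end for two agent instances might look like the following. The host names, ports, and check settings are placeholders; the health check here probes the monitoring listener on port 9910, on the assumption that it is reachable over plain HTTP in your deployment.

```
# Hypothetical HAProxy configuration; host names, ports, and the
# health-check endpoint are placeholders for your environment.
frontend usm_agent_front
    bind *:443
    mode http
    default_backend usm_agent_back

backend usm_agent_back
    mode http
    balance roundrobin
    # Probe the monitoring listener; route around failing instances
    option httpchk GET /stats/prometheus
    server agent1 usm-agent-1.example.com:<dataplane-port> check port 9910
    server agent2 usm-agent-2.example.com:<dataplane-port> check port 9910
```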
Use a virtual IP
This method provides network-level failover without a dedicated load balancer, typically in an active-passive configuration. It requires clustering software to manage a shared IP address.
Two servers, a primary and a standby, both run the USM Agent. A single, floating Virtual IP (VIP) is assigned to the primary server.
All clients are configured to connect to this single VIP, not to the individual server IPs.
The clustering software constantly monitors the health of the primary agent. If it fails, the software automatically reassigns the VIP from the failed server to the standby server.
The standby server is instantly promoted to primary and begins to handle all incoming traffic.
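The steps above can be sketched with keepalived, one common choice of clustering software for VIP failover. The interface name, router ID, VIP, and health-check command below are placeholders; the standby server uses state BACKUP and a lower priority, and the check assumes the monitoring listener answers plain HTTP on port 9910.

```
# Hypothetical keepalived configuration for the primary server.
vrrp_script check_usm_agent {
    # Consider the agent healthy if the metrics endpoint responds
    script "/usr/bin/curl -fsS http://localhost:9910/stats/prometheus -o /dev/null"
    interval 5
    fall 2
}

vrrp_instance USM_AGENT_VIP {
    state MASTER          # standby server uses BACKUP
    interface eth0
    virtual_router_id 51
    priority 100          # standby server uses a lower value
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24     # the floating VIP that clients connect to
    }
    track_script {
        check_usm_agent
    }
}
```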
Monitoring
You can monitor agent logs for troubleshooting and scrape Prometheus metrics for performance analysis.
Logs
The agent writes three types of logs to files on disk:
Application logs: Contain general runtime information and errors. These logs are located at /var/log/confluent/usm-agent/usm-agent_application.log.
Access logs: Provide a detailed record of every request handled by the agent. These logs are located at /var/log/confluent/usm-agent/usm-agent_access.log.
Traffic logs: Contain detailed, structured records of the traffic that is being processed. These logs are located in the /var/log/confluent/usm-agent/tap/ directory.
Metrics with Prometheus
The USM Agent exposes Prometheus-compatible metrics for monitoring its performance, health, and traffic patterns. These are Envoy proxy metrics that cover HTTP request and response statistics, upstream cluster health, connection pools, and system resource utilization.
For information about the metrics and metadata that the USM Agent collects from your Kafka and Connect clusters and sends to Confluent Cloud, see USM Agent: Metrics and Metadata Reference.
Metrics endpoint
By default, the monitoring listener binds to port 9910 and exposes metrics at the /stats/prometheus endpoint. The security configuration for this endpoint, including the protocol (http or https) and any authentication requirements, mirrors the configuration of the main dataplane listener.
To access metrics from a local agent instance:
curl http://localhost:9910/stats/prometheus
If the dataplane listener uses TLS or basic authentication, apply the same credentials to the monitoring endpoint.
Prometheus configuration
Configure your Prometheus server to scrape the agent’s metrics endpoint. Add the following to your prometheus.yml file:
scrape_configs:
- job_name: 'usm-agent'
  scrape_interval: 15s
  metrics_path: /stats/prometheus
  # Uncomment and configure if USM Agent uses mTLS authentication
  # scheme: https
  # tls_config:
  #   cert_file: '/path/to/client.pem'
  #   key_file: '/path/to/client.key'
  #   ca_file: '/path/to/cacerts.pem'
  # Uncomment and configure if USM Agent uses basic authentication
  # basic_auth:
  #   username: <username>
  #   password: <password>
  static_configs:
  - targets: ['<agent-host>:9910']
Replace <agent-host> with the hostname or IP address of your USM Agent instance.
For a complete list of available metrics and their descriptions, see the Envoy statistics documentation.
Visualization with Grafana
To visualize USM Agent metrics in Grafana, import the pre-built Envoy proxy monitoring dashboard into your Grafana instance.