USM agent: sizing, high availability, and monitoring¶
This page describes key architectural and operational considerations for the Unified Stream Manager (USM) agent. The configuration for sizing, high availability, and monitoring depends on your deployment method. Find the section below that matches your environment: Confluent for Kubernetes or Ansible Playbooks for Confluent Platform.
Confluent for Kubernetes deployments¶
When you deploy with Confluent for Kubernetes (CFK), CFK automates most of the sizing and high availability configuration.
Sizing and scaling¶
CFK automatically manages resource allocation for the USM agent, so you don’t need to configure it manually.
CFK sets the following default resources for the usm-agent container:
- Requests: 100m CPU, 128Mi Memory
- Limits: 300m CPU, 256Mi Memory
To scale out, increase the replicas for the USM agent in your custom resource. For information about overriding these defaults, see Specify CPU and memory requests.
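The following trimmed custom resource is a minimal sketch of what such an override might look like. The kind, apiVersion, and field layout shown here (a USMAgent resource with replicas and a podTemplate.resources block) are assumptions modeled on other CFK component custom resources, so check the CFK reference for the exact schema.

```yaml
# Illustrative sketch only: the kind and field names are assumptions modeled
# on other CFK component custom resources, not a documented USM agent schema.
apiVersion: platform.confluent.io/v1beta1
kind: USMAgent            # hypothetical kind for the USM agent
metadata:
  name: usm-agent
  namespace: confluent
spec:
  replicas: 2             # scale out by increasing this value
  podTemplate:
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 300m
        memory: 256Mi
```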
High availability¶
To achieve high availability for the USM agent, use standard CFK patterns by increasing the replica count in your custom resource. For a redundant setup, a minimum of two replicas is recommended.
To expose the agent externally, follow the standard CFK procedures for configuring load balancers.
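If your environment calls for a plain Kubernetes Service in addition to CFK's own external-access configuration, a sketch such as the following is one generic way to expose a port. The selector labels and the dataplane port shown here are assumptions, not values documented for the USM agent, so substitute the values from your deployment.

```yaml
# Generic Kubernetes Service sketch; selector label and port are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: usm-agent-external
  namespace: confluent
spec:
  type: LoadBalancer
  selector:
    app: usm-agent        # assumed Pod label
  ports:
    - name: dataplane
      port: 8080          # placeholder: use your agent's dataplane listener port
      targetPort: 8080
```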
Monitoring¶
You can monitor agent logs for troubleshooting and scrape Prometheus metrics for performance analysis.
Logs¶
The agent provides three types of logs, which are accessed differently in a Kubernetes environment:
- Application and access logs: These are available by default in the standard Kubernetes Pod logs. The agent sends application logs to stderr and access logs to stdout. This lets you configure a logging agent, such as Fluentd, Logstash, or Filebeat, to capture and manage these streams separately. A Filebeat sketch follows this list.
- Traffic logs: You can extract these logs by configuring the logcollector component. These logs are located in the /var/log/confluent/usm-agent/tap/ directory.
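As one example of separating the two streams, the following Filebeat sketch uses the container input's stream option to collect access logs (stdout) and application logs (stderr) as separate inputs. The log paths and field names reflect common Filebeat usage on Kubernetes nodes, not a configuration documented for the USM agent, so adjust them to your cluster's log layout.

```yaml
# Filebeat sketch (assumption: node log files under /var/log/containers).
# Each input keeps only one of the two streams.
filebeat.inputs:
  - type: container
    stream: stderr                  # application logs
    paths:
      - /var/log/containers/*usm-agent*.log
    fields:
      log_type: application
  - type: container
    stream: stdout                  # access logs
    paths:
      - /var/log/containers/*usm-agent*.log
    fields:
      log_type: access
```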
Metrics with Prometheus¶
To monitor performance and health, the USM agent exposes metrics for Prometheus scraping through a dedicated monitoring listener. By default, this listener binds to port 9910 and makes metrics available at the /stats/prometheus endpoint.
Configure Prometheus to scrape these metrics based on your Kubernetes-native deployment model.
- For a single USM agent instance, create a Kubernetes Service that exposes the agent's port, as shown in the first sketch after this list.
- For multiple USM agent instances, use the Prometheus Operator to create a PodMonitor resource to manage scraping automatically, as shown in the second sketch after this list.
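The following sketches illustrate both approaches. The port and path come from the defaults described above; the namespace and the Pod labels used by the selectors are assumptions, so replace them with the labels your deployment actually applies.

```yaml
# Sketch 1: a Service for a single agent instance, exposing the monitoring port.
apiVersion: v1
kind: Service
metadata:
  name: usm-agent-metrics
  namespace: confluent
spec:
  selector:
    app: usm-agent          # assumed Pod label
  ports:
    - name: metrics
      port: 9910
      targetPort: 9910
```

```yaml
# Sketch 2: a Prometheus Operator PodMonitor for multiple agent instances.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: usm-agent
  namespace: confluent
spec:
  selector:
    matchLabels:
      app: usm-agent        # assumed Pod label
  podMetricsEndpoints:
    - port: metrics         # name of the container port that exposes 9910
      path: /stats/prometheus
```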
Ansible Playbooks for Confluent Platform deployments¶
While Ansible Playbooks for Confluent Platform automate the deployment and configuration of the USM agent, you must manually manage the underlying infrastructure for your Linux servers.
Sizing and scaling¶
Properly sizing the USM agent is critical for performance and stability.
Vertical sizing: CPU and memory¶
For a server, such as a virtual machine or bare-metal host, that runs a USM agent instance, a minimum configuration of 2 vCPU cores and 2 GB of RAM is recommended.
This baseline provides a stable environment with sufficient resources for both the agent process and the underlying operating system. After deployment, monitor the server's CPU and memory utilization and adjust these resources to meet the specific demands of your workload.
Horizontal sizing: adding instances¶
- Add multiple agents primarily to achieve high availability. This ensures service continuity if one agent fails, as traffic can be redirected to a healthy instance.
- If high availability is not a requirement, running one larger, vertically scaled agent is more resource-efficient than running multiple smaller agents.
High availability¶
For critical environments, use one of the following methods to make the USM agent highly available on Linux servers.
Use a load balancer¶
This approach involves placing an HTTP-based load balancer in front of two or more USM agent instances to distribute traffic and manage failover automatically. The load balancer runs continuous health checks on each agent. If it detects a failure, it automatically routes traffic away from the unhealthy instance to a healthy one.
Use a virtual IP¶
This method provides network-level failover without a dedicated load balancer, typically in an active-passive configuration. It requires clustering software to manage a shared IP address; an automation sketch follows the steps below.
- Two servers, a primary and a standby, both run the USM agent. A single, floating Virtual IP (VIP) is assigned to the primary server.
- All clients are configured to connect to this single VIP, not to the individual server IPs.
- The clustering software constantly monitors the health of the primary agent. If it fails, the software automatically reassigns the VIP from the failed server to the standby server.
- The standby server is instantly promoted to primary and begins to handle all incoming traffic.
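A common way to implement this pattern on Linux is VRRP-based clustering software such as keepalived. The following Ansible play shows the general shape of automating that setup. It is a sketch only: keepalived, the inventory group, the template file, and the handler are assumptions, not part of Ansible Playbooks for Confluent Platform, and the actual VRRP settings (interface, priority, VIP) live in the template you provide.

```yaml
# Illustrative Ansible play (not part of Ansible Playbooks for Confluent Platform):
# installs keepalived on the primary and standby hosts and deploys a VRRP
# configuration template that defines the shared VIP.
- name: Configure an active-passive VIP for USM agents
  hosts: usm_agents            # assumed inventory group containing both servers
  become: true
  tasks:
    - name: Install keepalived
      ansible.builtin.package:
        name: keepalived
        state: present

    - name: Deploy the keepalived configuration
      ansible.builtin.template:
        src: keepalived.conf.j2       # your template: vrrp_instance, priority, VIP
        dest: /etc/keepalived/keepalived.conf
      notify: Restart keepalived

  handlers:
    - name: Restart keepalived
      ansible.builtin.service:
        name: keepalived
        state: restarted
        enabled: true
```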
Monitoring¶
You can monitor agent logs for troubleshooting and scrape Prometheus metrics for performance analysis.
Logs¶
The agent writes three types of logs to files on disk:
- Application logs: Contain general runtime information and errors. These logs are located at /var/log/confluent/usm-agent/usm-agent_application.log.
- Access logs: Provide a detailed record of every request handled by the agent. These logs are located at /var/log/confluent/usm-agent/usm-agent_access.log.
- Traffic logs: Contain detailed, structured records of the traffic that is being processed. These logs are located in the /var/log/confluent/usm-agent/tap/ directory.
Metrics with Prometheus¶
The agent exposes Prometheus metrics on port 9910 at the /stats/prometheus endpoint. You must configure your Prometheus server to scrape this target. The dataplane listener is the USM agent listener that receives metrics and metadata events from Kafka and Connect. The security configuration for the metrics endpoint, including the protocol (http or https) and any authentication, is the same as the configuration of the main dataplane listener.
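A minimal scrape configuration might look like the following sketch. The hostnames are placeholders, and the scheme and tls_config settings only apply if your dataplane listener requires HTTPS or authentication.

```yaml
# Prometheus scrape sketch; replace the placeholder hostnames with your servers.
scrape_configs:
  - job_name: usm-agent
    metrics_path: /stats/prometheus
    # Uncomment and adjust if the dataplane listener uses HTTPS or authentication:
    # scheme: https
    # tls_config:
    #   ca_file: /etc/prometheus/usm-agent-ca.pem
    static_configs:
      - targets:
          - usm-agent-1.example.com:9910
          - usm-agent-2.example.com:9910
```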