The Metrics API provides endpoints for programmatic discovery of available resources and their metrics. This resource and metric metadata is represented by descriptor objects.
The discovery endpoints can be used to avoid hardcoding metric and resource names into client scripts.
Discover available resources
A resource represents the entity against which metrics are collected, for example, a Kafka cluster, a Kafka Connector, a ksqlDB application, etc.
Get a description of the available resources by sending a GET request to the descriptors/resources endpoint of the API:
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources' --auth '<API_KEY>:<SECRET>'
This returns a JSON document describing the available resources to query and their labels.
Discover available metrics
Get a description of the available metrics by sending a GET request to the descriptors/metrics endpoint of the API:
http 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/metrics?resource_type=kafka' --auth '<API_KEY>:<SECRET>'
The resource_type query parameter is required and specifies the type of resource for which to list metrics. The valid resource types can be determined using the descriptors/resources endpoint.
This returns a JSON document describing the available metrics to query and their labels.
A human-readable list of the current metrics is available in the API Reference.
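As a rough illustration of calling the discovery endpoints from a client script, the following Python sketch builds the same two requests as the http commands above using only the standard library. The helper names (descriptor_url, basic_auth_header, fetch_descriptors) are illustrative and not part of any Confluent client. It assumes the API key and secret are sent as HTTP Basic credentials, which is what the --auth option does by default.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud"


def descriptor_url(kind, resource_type=None):
    """Build the URL for the descriptors/resources or descriptors/metrics endpoint."""
    if kind not in ("resources", "metrics"):
        raise ValueError("kind must be 'resources' or 'metrics'")
    url = f"{BASE_URL}/descriptors/{kind}"
    if kind == "metrics":
        # resource_type is required when listing metrics.
        if not resource_type:
            raise ValueError("resource_type is required when listing metrics")
        url += "?" + urllib.parse.urlencode({"resource_type": resource_type})
    return url


def basic_auth_header(api_key, secret):
    """Encode the key and secret as an HTTP Basic Authorization header value (assumed auth scheme)."""
    token = base64.b64encode(f"{api_key}:{secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_descriptors(kind, api_key, secret, resource_type=None):
    """Send the GET request and return the parsed JSON document."""
    request = urllib.request.Request(
        descriptor_url(kind, resource_type),
        headers={"Authorization": basic_auth_header(api_key, secret)},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

For example, fetch_descriptors("metrics", "<API_KEY>", "<SECRET>", resource_type="kafka") would issue the second request shown above.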
Can the Metrics API be used to reconcile my bill?
No, the Metrics API is intended to provide information for the purposes of monitoring, troubleshooting,
and capacity planning. It is not intended as an audit system for reconciling bills as the metrics do not
include request overhead for the Kafka protocol at this time. For more details, see the billing documentation.
Why am I seeing empty data sets for topics that exist on queries other than for
If there are only values of 0.0 in the time range queried, then the API will return an empty set.
When there is non-zero data within the time range, time slices with values of 0.0 are returned.
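If a client needs a dense series even when the API returns an empty set, one option is to fill the gaps on the client side. The sketch below assumes each returned data point is a dictionary with "timestamp" and "value" keys and that the client already knows which time slices it expects. Both are assumptions made for illustration, and fill_missing_with_zero is a hypothetical helper.

```python
def fill_missing_with_zero(points, expected_timestamps):
    """Return one point per expected timestamp, using 0.0 where the API returned nothing."""
    # Assumed point shape: {"timestamp": <str>, "value": <float>}.
    by_timestamp = {point["timestamp"]: point["value"] for point in points}
    return [
        {"timestamp": ts, "value": by_timestamp.get(ts, 0.0)}
        for ts in expected_timestamps
    ]


expected = ["2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"]

# An empty result for a topic that is known to exist is read as all zeros.
print(fill_missing_with_zero([], expected))
```

This is only safe for topics you know exist; an empty set by itself does not prove that.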
Why didn’t retained_bytes decrease after I changed the retention policy for my topic?
The value of retained_bytes is calculated as the maximum over the interval for each data point returned.
If data has been deleted during the current interval, you will not see the effect until the next time range window begins.
For example, if you produced 4GB of data per day over the last 30 days and queried for retained_bytes over the last 3 days with a 1 day interval, the query would return values of 112GB, 116GB, 120GB as a time series. If you then deleted all data in the topic and stopped producing data, the query would return the same values until the next day. When queried at the start of the next day, the same query would return 116GB, 120GB, 0GB.
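The arithmetic in this example can be reproduced with a small simulation. The sketch below is not how the service computes retained_bytes internally. It only models "maximum over the interval" with made-up observations: four per day, in 1 GB steps, with the deletion happening late on day 30.

```python
def max_per_interval(samples):
    """Return {interval: maximum value} for a list of (interval, value) samples."""
    result = {}
    for interval, value in samples:
        result[interval] = max(value, result.get(interval, value))
    return result


# 4 GB produced per day for 30 days, observed four times a day (hypothetical sampling).
samples = [(day, 4 * (day - 1) + step) for day in range(1, 31) for step in range(1, 5)]
before = max_per_interval(samples)
print([before[d] for d in (28, 29, 30)])  # [112, 116, 120]

# All data is deleted late on day 30 and nothing is produced afterwards.
samples += [(30, 0), (31, 0)]
after = max_per_interval(samples)
print([after[d] for d in (28, 29, 30)])  # [112, 116, 120]: day 30 still reports its maximum
print([after[d] for d in (29, 30, 31)])  # [116, 120, 0]
```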
What are the supported granularity levels?
Data is stored at a granularity of one minute. However, the allowed granularity for a query is restricted by the size of the query’s interval.
Please see the API Reference for the currently supported granularity levels and query restrictions.
Why don’t I see consumer lag in the Metrics API?
In Kafka, consumer lag is not tracked as a metric on the server side. This is
because it is a cluster-level construct, and today Kafka’s metrics are derived
from instrumentation at a lower level of abstraction. Consumer lag may be added to
the Metrics API at a later date. At this time, there are
multiple other ways to monitor consumer lag, including the client metrics, UI, CLI, and Admin API.
These methods are all available when using Confluent Cloud.
What is the retention time of metrics in the Metrics API?
Metrics are retained for seven days.
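A client that builds query time ranges dynamically may want to keep them inside the retention window. The sketch below is one way to do that, assuming the seven days are counted back from the current time. The clamp_to_retention helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention period stated above


def clamp_to_retention(start, end, now):
    """Move start forward so the range does not reach past the retention window."""
    earliest = now - RETENTION
    if end <= earliest:
        raise ValueError("requested range is entirely older than the retention window")
    return max(start, earliest), end


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
start, end = clamp_to_retention(datetime(2024, 1, 1, tzinfo=timezone.utc), now, now)
print(start.isoformat(), end.isoformat())
```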
How do I know if a given metric is in preview or generally available (GA)?
We are always looking to add new metrics, but when we add a new metric, we
need to take some time to stabilize how we expose it, to ensure that it’s suitable
for most use cases. Each metric’s lifecycle stage (preview, generally available, etc.)
is included in the response from the /descriptors/metrics endpoint.
While a metric is in preview, we may
make breaking changes to its labels without an API version change, as we iterate
to provide the best possible experience.
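To check lifecycle stages from a script, you could group the metric descriptors by stage. The sketch below assumes the response is a JSON object with a "data" list whose entries carry "name" and "lifecycle_stage" fields. Those field names, the stage values, and the sample document are assumptions for illustration, so confirm them against an actual response.

```python
def metrics_by_lifecycle_stage(descriptor_document):
    """Group metric names by lifecycle stage (field names are assumed, see above)."""
    grouped = {}
    for descriptor in descriptor_document.get("data", []):
        stage = descriptor.get("lifecycle_stage", "UNKNOWN")
        grouped.setdefault(stage, []).append(descriptor["name"])
    return grouped


# Hypothetical document shaped like the assumed response.
sample_document = {
    "data": [
        {"name": "example/metric_a", "lifecycle_stage": "GENERAL_AVAILABILITY"},
        {"name": "example/metric_b", "lifecycle_stage": "PREVIEW"},
        {"name": "example/metric_c"},
    ]
}
print(metrics_by_lifecycle_stage(sample_document))
```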
What should I do if a query to Metrics API returns a timeout response (HTTP error code 504)?
If queries exceed the timeout (the maximum query time is 60 seconds), consider one or more of the following approaches:
- Reduce the time interval.
- Reduce the granularity of data returned.
- Break up the query on the client side to return fewer data points. For example, you can query for specific topics instead of all topics at once.
These approaches are especially important when querying for partition-level data over days-long intervals.
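Reducing the time interval can be done on the client by splitting one long range into several shorter queries. The sketch below produces ISO 8601 "start/end" strings for each chunk. Whether your query payload accepts intervals in exactly this form is an assumption to verify against the API Reference, and split_interval is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def split_interval(start, end, chunk):
    """Split [start, end) into consecutive "start/end" strings no longer than chunk."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    intervals = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + chunk, end)
        intervals.append(f"{cursor.isoformat()}/{chunk_end.isoformat()}")
        cursor = chunk_end
    return intervals


# A three-day range issued as three one-day queries.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for interval in split_interval(start, start + timedelta(days=3), timedelta(days=1)):
    print(interval)
```

Each string can then be sent in its own request, and the results concatenated on the client.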
Why are my Confluent Cloud metrics displaying only 1hr/6hrs/24hrs worth of data?
This is a known limitation that occurs in some clusters with a partition count
of more than 2,000. We are working on the issue, but there is no fix at this time.
What should I do if a query returns a 5xx response code?
We recommend retrying these types of responses. Usually, this is an indication of a transient server-side issue. You should
design your client implementations for querying the Metrics API to be resilient to this type of response for minutes-long periods.
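One common way to build in this resilience is to retry with exponential backoff and jitter. The sketch below is a generic pattern rather than a documented Confluent recommendation. The send callable, the attempt count, and the delay values are illustrative assumptions, chosen so that the waits add up to roughly two minutes.

```python
import random
import time


def call_with_retries(send, max_attempts=8, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or the attempts run out.

    send is any callable that returns a (status_code, body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, plus jitter to avoid retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    # Still failing after the final attempt: hand the last response back to the caller.
    return status, body
```

Passing the request as a callable keeps the retry logic independent of the HTTP library, and the injectable sleep makes it easy to test without waiting.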