Request rate limits

Confluent Cloud limits the number of client requests allowed per second. Client requests include, but are not limited to, requests from a producer to send a batch, requests from a consumer to commit an offset, and requests from a consumer to fetch messages. If the request rate limit is hit, requests may be refused and clients may be throttled to keep the cluster stable. When a client is throttled, Confluent Cloud delays the client’s requests; the delay is reported in milliseconds by the produce-throttle-time-avg metric for producers and by the fetch-throttle-time-avg metric for consumers.

Confluent Cloud offers different cluster types, each with its own usage limits. This demo assumes you are running on a “basic” or “standard” cluster; both have a request limit of 1500 per second.

  1. Open Grafana and log in with the username admin and the password password.

  2. Navigate to the Confluent Cloud dashboard.

  3. Check the Requests (rate) panel. If this panel is yellow, you have used 80% of your allowed requests; if it’s red, you have used 90%. See the Grafana documentation for more information about configuring thresholds.

    Confluent Cloud Panel

  4. Scroll further down the dashboard to the Request rate stacked column chart, which breaks down where the requests are going.

    Confluent Cloud Request Breakdown

  5. Reduce requests by adjusting producer batching configurations (linger.ms) and consumer batching configurations (fetch.max.wait.ms), and by shutting down unnecessary clients. Raising either setting lets a client wait longer and send fewer, larger requests.
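
To get a feel for how these settings relate to the 1500 requests per second limit, the following is a minimal back-of-the-envelope sketch in Python. It is not part of the demo and does not talk to Confluent Cloud. The function names, the "ok" status, and the one-request-per-wait-interval model are assumptions made for illustration; real clients also send requests when a batch fills up or when data is already available, so treat the numbers as rough estimates only.

```python
REQUEST_LIMIT_PER_SEC = 1500  # basic/standard cluster limit quoted in this section


def panel_status(requests_per_sec, limit=REQUEST_LIMIT_PER_SEC):
    """Mirror the dashboard thresholds: yellow at 80% of the limit, red at 90%.

    The "ok" label for anything below 80% is an assumption; the section
    only names the yellow and red states.
    """
    usage = requests_per_sec / limit
    if usage >= 0.9:
        return "red"
    if usage >= 0.8:
        return "yellow"
    return "ok"


def estimated_timer_driven_rate(wait_ms, clients, brokers_per_client=1):
    """Rough requests/second for lightly loaded clients (hypothetical model).

    Assumes each client sends about one request per broker every wait_ms,
    where wait_ms stands in for linger.ms (producers whose batches are not
    filling up) or fetch.max.wait.ms (consumers with nothing to fetch).
    """
    if wait_ms <= 0:
        raise ValueError("wait_ms must be positive")
    return clients * brokers_per_client * 1000 / wait_ms


if __name__ == "__main__":
    consumers = estimated_timer_driven_rate(wait_ms=500, clients=100)
    for linger_ms in (5, 100):
        producers = estimated_timer_driven_rate(wait_ms=linger_ms, clients=5)
        total = producers + consumers
        print(f"linger.ms={linger_ms}: ~{total:.0f} req/s -> {panel_status(total)}")
```

Under these assumptions, five lightly loaded producers with linger.ms=5 plus one hundred idle consumers with fetch.max.wait.ms=500 land at roughly 1200 requests per second, which is the yellow threshold, while raising linger.ms to 100 brings the estimate down to about 250.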