Kafka Quotas

An Apache Kafka® cluster can enforce quotas on requests to control the broker resources used by clients. Kafka brokers can enforce two types of client quotas for each group of clients sharing a quota:

  1. Network bandwidth quotas define byte-rate thresholds (version 0.9 and later)
  2. Request rate quotas define CPU utilization thresholds as a percentage of network and I/O threads (version 0.11 and later)

Why are quotas necessary?

Producers and consumers can produce and consume very high volumes of data and generate requests at a very high rate. This can monopolize broker resources, saturate the network, and cause a denial of service (DoS) for other clients and for the brokers themselves. Quotas protect against these issues, particularly in large multi-tenant clusters, where a small set of badly behaved clients can degrade the experience of well-behaved ones. When Kafka is run as a service, quotas also make it possible to enforce API limits according to an agreed-upon contract.

Client groups

By default, quotas are defined in terms of a user and a client ID, where the user acts as an opaque principal name and the client ID acts as a generic group identifier assigned by the client application. User principals identify Kafka clients and represent an authenticated user in a secure cluster. In a cluster that supports unauthenticated clients, the user principal is a grouping of unauthenticated users chosen by the broker using a configurable PrincipalBuilder. The tuple (user, client-id) defines a secure logical group of clients that share both a user principal and a client ID.

Quotas are applied to (user, client-id) groups. For a given connection, the most specific quota matching the connection is applied. All connections in a quota group share the quota configured for that group. For example, if a produce quota of 10 MBps is applied to user "test-user" and client ID "test-client", this quota is shared across all producer instances with user "test-user" and client ID "test-client".

Quota configuration

You can override the default quotas at any (user, client-id) level that needs a higher or lower quota, similar to per-topic log configuration overrides. User quota overrides are written to ZooKeeper under /config/users, and client-id quota overrides are written under /config/clients. All brokers read these overrides, and they take effect immediately, so quotas can be changed without a rolling restart of the cluster.

The order of precedence for quota configuration, from most specific to least specific, is as follows:

  1. /config/users/<user>/clients/<client-id>
  2. /config/users/<user>/clients/<default>
  3. /config/users/<user>
  4. /config/users/<default>/clients/<client-id>
  5. /config/users/<default>/clients/<default>
  6. /config/users/<default>
  7. /config/clients/<client-id>
  8. /config/clients/<default>
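
The precedence list can be read as a first-match lookup: build the eight candidate paths for a connection and return the first one that has a quota configured. The following is a minimal Python sketch under that reading. A plain dictionary stands in for the ZooKeeper config tree, the function names are hypothetical, and real brokers also sanitize principal names before using them in ZooKeeper paths, which this sketch skips.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT = "<default>"  # placeholder entity name, as written in the list above


def quota_lookup_order(user: str, client_id: str) -> List[str]:
    """Candidate config paths for a connection, most specific first."""
    return [
        f"/config/users/{user}/clients/{client_id}",
        f"/config/users/{user}/clients/{DEFAULT}",
        f"/config/users/{user}",
        f"/config/users/{DEFAULT}/clients/{client_id}",
        f"/config/users/{DEFAULT}/clients/{DEFAULT}",
        f"/config/users/{DEFAULT}",
        f"/config/clients/{client_id}",
        f"/config/clients/{DEFAULT}",
    ]


def resolve_quota(
    configured: Dict[str, int], user: str, client_id: str
) -> Optional[Tuple[str, int]]:
    """Return (path, quota) for the first configured path in precedence order.

    `configured` is a hypothetical stand-in for the ZooKeeper config tree.
    Returns None when no level has a quota configured.
    """
    for path in quota_lookup_order(user, client_id):
        if path in configured:
            return path, configured[path]
    return None


# A user-level default (level 6) outranks a client-id override (level 7).
overrides = {
    "/config/users/<default>": 1_048_576,
    "/config/clients/test-client": 2_097_152,
}
print(resolve_quota(overrides, "test-user", "test-client"))
# prints ('/config/users/<default>', 1048576)
```

Because a user-level default sits above every client-only entry in the list, an override at /config/clients/<client-id> takes effect only when no user-level quota applies to the connection.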

Confluent Tip

For the steps to set and review quotas for Confluent Platform, see Client Quotas.

To read about client quotas for Confluent Cloud, see Client Quotas.

Network bandwidth quotas

Network bandwidth quotas are defined as the byte-rate threshold for each group of clients sharing a quota. By default, each unique client group receives a fixed quota in bytes per second (Bps), as configured by the cluster. This quota is defined on a per-broker basis: each group of clients can produce or fetch a maximum of X Bps per broker before its clients are throttled.
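
To illustrate what "per broker" means, the sketch below gives each broker its own byte counters keyed by the (user, client-id) group. It is a simplified, hypothetical Python model: it uses a single one-second bucket instead of the multi-window measurement described later in this page, and the class name and the 10 MiB/s quota are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

Group = Tuple[str, str]  # (user, client-id)


class BrokerByteRateTracker:
    """Hypothetical per-broker tracker of bytes per client group in the current second.

    Each broker keeps its own counters and never consults other brokers,
    so a group's quota applies independently on every broker.
    """

    def __init__(self, quota_bps: int) -> None:
        self.quota_bps = quota_bps
        self.bytes_this_second: Dict[Group, int] = defaultdict(int)

    def record(self, group: Group, num_bytes: int) -> None:
        self.bytes_this_second[group] += num_bytes

    def over_quota(self, group: Group) -> bool:
        return self.bytes_this_second.get(group, 0) > self.quota_bps


MIB = 1024 * 1024
group = ("test-user", "test-client")
brokers = [BrokerByteRateTracker(quota_bps=10 * MIB) for _ in range(3)]

# 8 MiB to each of 3 brokers: 24 MiB cluster-wide, yet no broker sees a violation.
for broker in brokers:
    broker.record(group, 8 * MIB)
print([b.over_quota(group) for b in brokers])  # [False, False, False]

# Another 4 MiB to broker 0 pushes that broker's count to 12 MiB > 10 MiB.
brokers[0].record(group, 4 * MIB)
print(brokers[0].over_quota(group))  # True
```

Since no broker consults the others, a group that spreads its traffic evenly across N brokers can reach up to N times the per-broker quota cluster-wide.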

Request rate quotas

Request rate quotas are defined as the percentage of time a client group can use the request handler I/O threads and network threads of each broker within a quota window.

A quota of n% represents n% of one thread, so the quota is out of a total capacity of ((num.io.threads + num.network.threads) * 100)%.

Each group of clients may use up to n% in total across all I/O and network threads in a quota window before being throttled. Because the number of I/O and network threads is typically based on the number of cores available on the broker host, request rate quotas effectively represent the total percentage of CPU that each group of clients sharing the quota may use.
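
The capacity formula and the meaning of n% can be checked with simple arithmetic. This Python sketch assumes num.io.threads=8 and num.network.threads=3, the Apache Kafka broker defaults; the function names are hypothetical.

```python
def request_capacity_pct(num_io_threads: int, num_network_threads: int) -> int:
    """Total request-rate capacity of one broker, in percent of one thread."""
    return (num_io_threads + num_network_threads) * 100


def thread_time_budget_ms(quota_pct: float, window_ms: float) -> float:
    """Thread time a client group may use per window: n% of one thread."""
    return quota_pct / 100 * window_ms


# Assumes num.io.threads=8 and num.network.threads=3.
capacity = request_capacity_pct(8, 3)
print(capacity)  # 1100

# A 200% quota equals two threads' worth of time: 2000 ms per 1-second window,
# or 200/1100 (about 18%) of this broker's request-handling capacity.
print(thread_time_budget_ms(200, 1000))  # 2000.0
print(round(200 / capacity, 3))  # 0.182
```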


Quota enforcement

By default, each unique client group receives a fixed quota, configured at the cluster level. This quota is defined on a per-broker basis, meaning that each client group can use this quota on each broker before it is throttled. Per-broker quotas are preferred over a fixed cluster-wide bandwidth per client because a cluster-wide quota would require the brokers to share client quota usage with one another, which can be challenging to get right.

When a broker detects a quota violation, it first computes the delay needed to bring the violating client under its quota. It then immediately returns a response with that delay; for a fetch request, the response contains no data. Until the delay is over, the broker mutes the client's channel and stops processing requests from that client.
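
One way to compute the delay is to ask how long the group must be paused so that the work it has already done, spread over a longer interval, falls back to the quota. If the observed rate O over a measurement window of length W exceeds the quota T, choose the delay X so that O * W / (W + X) = T, which gives X = (O - T) * W / T. The Python function below sketches that derivation; it is an illustration under these assumptions, not Kafka's source code, and `throttle_delay_ms` is a hypothetical name.

```python
def throttle_delay_ms(observed_rate: float, quota: float, window_ms: float) -> float:
    """Delay X (ms) such that observed_rate * window_ms / (window_ms + X) == quota.

    Rates can be in any consistent unit (bytes/s, thread-time %, ...).
    Returns 0 when the group is within its quota.
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    if observed_rate <= quota:
        return 0.0
    return (observed_rate - quota) / quota * window_ms


# A group measured at 15 MB/s against a 10 MB/s quota over a 30 s window
# must be delayed 15 s: 15 * 30 / (30 + 15) == 10.
print(throttle_delay_ms(15, 10, 30_000))  # 15000.0
print(throttle_delay_ms(8, 10, 30_000))   # 0.0
```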

Upon receiving a response with a non-zero delay, the Kafka client also stops sending requests to that broker for the duration of the delay. Requests from a throttled client are therefore blocked from both sides. Even with older client implementations that do not respect the delay in the broker's response, the back pressure the broker applies by muting the client's socket channel still throttles badly behaved clients: any further requests they send on the throttled channel receive responses only after the delay is over.
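
The client side of this behavior amounts to remembering when the current delay ends and holding requests until then. The following Python class is a hypothetical sketch of that behavior for a single broker connection; its names are illustrative and not part of any Kafka client API.

```python
class ThrottleAwareConnection:
    """Hypothetical client-side view of one broker connection.

    After a response reports a non-zero throttle delay, the client holds
    further requests to that broker until the delay has elapsed.
    """

    def __init__(self) -> None:
        self.blocked_until_ms = 0

    def on_response(self, throttle_time_ms: int, now_ms: int) -> None:
        if throttle_time_ms > 0:
            # Never shorten an existing block if responses arrive out of order.
            self.blocked_until_ms = max(self.blocked_until_ms, now_ms + throttle_time_ms)

    def can_send(self, now_ms: int) -> bool:
        return now_ms >= self.blocked_until_ms


conn = ThrottleAwareConnection()
conn.on_response(throttle_time_ms=500, now_ms=1_000)
print(conn.can_send(1_200))  # False: still inside the 500 ms delay
print(conn.can_send(1_500))  # True: the delay is over
```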

Byte rate and thread utilization are measured over multiple small windows, such as 30 windows of 1 second each, so that quota violations are detected and corrected quickly. Large measurement windows, such as 10 windows of 30 seconds each, lead to large bursts of traffic followed by long delays, which degrades the user experience.
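
The effect of small windows can be seen in a sampled rate meter, where each window's total stops counting as soon as it falls outside the most recent num_windows windows. This Python sketch is hypothetical: it assumes timestamps never go backwards and divides by the full span of all windows, which is one of several reasonable ways to turn samples into a rate.

```python
from collections import deque
from typing import Deque, Tuple


class WindowedRate:
    """Hypothetical sampled rate meter over num_windows windows of window_ms each.

    Assumes now_ms never goes backwards. The rate is the sum of retained
    samples divided by the full span num_windows * window_ms.
    """

    def __init__(self, num_windows: int = 30, window_ms: int = 1000) -> None:
        self.num_windows = num_windows
        self.window_ms = window_ms
        # (window index, total recorded in that window), oldest first
        self.samples: Deque[Tuple[int, float]] = deque()

    def _expire(self, now_ms: int) -> None:
        oldest_kept = now_ms // self.window_ms - self.num_windows + 1
        while self.samples and self.samples[0][0] < oldest_kept:
            self.samples.popleft()

    def record(self, value: float, now_ms: int) -> None:
        self._expire(now_ms)
        index = now_ms // self.window_ms
        if self.samples and self.samples[-1][0] == index:
            _, total = self.samples[-1]
            self.samples[-1] = (index, total + value)
        else:
            self.samples.append((index, value))

    def rate_per_sec(self, now_ms: int) -> float:
        self._expire(now_ms)
        total = sum(value for _, value in self.samples)
        return total / (self.num_windows * self.window_ms / 1000)


meter = WindowedRate(num_windows=30, window_ms=1000)
for second in range(30):
    meter.record(1_000_000, now_ms=second * 1000)  # 1 MB in each 1-second window
print(meter.rate_per_sec(29_000))  # 1000000.0
# 15 seconds later, the 15 oldest windows have aged out, halving the measured rate.
print(meter.rate_per_sec(44_000))  # 500000.0
```

With 30 one-second windows, old traffic leaves the measurement one second at a time, so the measured rate falls smoothly; with 10 windows of 30 seconds, the same traffic would drop out in 30-second steps.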


This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.