Multi-tenancy and Client Quotas on Confluent Cloud

This topic describes multi-tenancy, principals, and Client Quotas as a tool to support multi-tenant workloads on Dedicated Clusters. For information about service quotas, which define limits for Confluent Cloud resources, see Service Quotas for Confluent Cloud.

Why multi-tenancy?

As organizations integrate Apache Kafka® into more of their applications, Confluent Cloud becomes a central nervous system for the business: each of the organization's clusters can contain large amounts of data that distinct business units and teams consume, enrich, and generate business value from.

Running multiple distinct applications on a single cluster is known as multi-tenancy. Each tenant on the cluster consumes some portion of the cluster’s resources.

Your business may choose to support multi-tenancy on a single Dedicated Cluster for the following reasons:

  • Lower cost: Most use cases don’t need the full capacity of a Kafka cluster. You can minimize fixed costs by sharing a single cluster across workloads, and spread the costs across teams, even when those workloads are kept separate.
  • Simpler operations: Using one cluster instead of several means a narrower scope of access controls and credentials to manage. Separate clusters may have different networking configurations, API keys, schema registries, roles, and ACLs.
  • Greater reuse: Teams can most easily reuse existing data when they are already using the cluster; granting them access is a simple access control change. Reusing topics and events created by other teams lets teams deliver value more quickly.

Supporting a multi-tenant workload imposes additional requirements. For example, you might need granular insights into the behaviors of each tenant, and details about the cluster as a whole. Running a multi-tenant cluster raises the following questions:

  • What resources are each tenant consuming?
  • What is the level of performance each tenant is obtaining?
  • Are some tenants consuming too many resources, while others are not getting enough?

Application identity and principals

To manage many tenants on a single cluster, and answer the questions posed previously, each tenant must have a unique identity. This identity is called a principal.

Assigning each tenant a unique principal provides the foundation for granular monitoring and management capabilities in Confluent Cloud. Although there isn’t a blanket recommendation for how to assign principals to applications, you can create principals that enable you to map unique identities to tenants with service accounts or identity pools.

Service accounts

Each service account represents an application principal programmatically accessing Confluent Cloud. Confluent recommends using one service account per producer application (or consumer group) for maximal granularity and control.

Each service account can have one or more API keys. The service account corresponds to the long-lived principal identity, while the API keys are credentials that you can and should rotate periodically. Client Quotas cannot be applied to a specific API key, and specific API keys are not labeled in the Metrics API.

Identity pools

Each identity pool represents either a single application or a group of applications. In general, each identity pool is seen as a unique principal inside of Confluent Cloud.

The Metrics API supports metric visibility at the identity pool level, and Client Quotas are applied at the identity pool level as well. If multiple applications are using the same identity pool, they will share a quota and their application metrics will be aggregated.
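
The aggregation behavior described above can be sketched as follows. This is an illustrative Python example with hypothetical sample data (the application names and throughput values are invented), showing why two applications that authenticate through the same identity pool become indistinguishable in principal-level metrics:

```python
from collections import defaultdict

# Hypothetical raw throughput samples, labeled the way principal-level
# metrics are labeled: applications sharing an identity pool all report
# under that pool's single principal_id.
samples = [
    {"app": "orders-service",    "principal_id": "pool-abcde", "request_bytes": 400},
    {"app": "inventory-service", "principal_id": "pool-abcde", "request_bytes": 600},
    {"app": "billing-service",   "principal_id": "sa-12345",   "request_bytes": 250},
]

# Aggregate by principal_id, as a metrics query grouped by principal would.
by_principal = defaultdict(int)
for s in samples:
    by_principal[s["principal_id"]] += s["request_bytes"]

# The two pool-abcde applications collapse into one series:
print(dict(by_principal))  # {'pool-abcde': 1000, 'sa-12345': 250}
```

If you need per-application visibility or per-application quotas, give each application its own service account or identity pool instead of sharing one pool.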

Monitoring metrics by principal

The Confluent Cloud Metrics API provides actionable operational metrics about your Confluent Cloud deployment. The Metrics API supports labels, which can be used in queries to filter or group results. For multi-tenancy, the principal_id label enables metric filtering by a specific application. By monitoring the performance characteristics of specific applications, you can derive granular insights about cluster utilization. For example, you can learn which applications are responsible for driving high levels of throughput consumption or requests. You can also correlate changes in application metrics with fluctuations in overall cluster utilization as measured by the cluster load metric.

The following metrics are labeled with principal_id:

  • io.confluent.kafka.server/request_bytes
  • io.confluent.kafka.server/response_bytes
  • io.confluent.kafka.server/active_connection_count
  • io.confluent.kafka.server/request_count
  • io.confluent.kafka.server/successful_authentication_count

For an example of querying the metrics API by principal_id, see Query Metrics by Specific Principal in the Metrics API doc.
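
As a sketch of what such a query looks like, the following Python snippet builds a Metrics API query body that filters request_bytes down to a single principal. The field names (resource.kafka.id, metric.principal_id) follow Metrics API conventions, but treat the exact payload shape, the cluster ID lkc-12345, and the principal sa-12345 as placeholder assumptions and consult the Metrics API reference for the authoritative format:

```python
import json

def build_principal_query(cluster_id: str, principal_id: str) -> dict:
    """Build a Metrics API query body scoped to one cluster and one principal.

    Illustrative sketch only; verify field names against the Metrics API docs.
    """
    return {
        "aggregations": [{"metric": "io.confluent.kafka.server/request_bytes"}],
        "filter": {
            "op": "AND",
            "filters": [
                # Scope the query to one cluster...
                {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
                # ...and to one application principal.
                {"field": "metric.principal_id", "op": "EQ", "value": principal_id},
            ],
        },
        "granularity": "PT1H",
        "group_by": ["metric.principal_id"],
        "intervals": ["2024-01-01T00:00:00Z/2024-01-02T00:00:00Z"],
    }

payload = build_principal_query("lkc-12345", "sa-12345")
print(json.dumps(payload, indent=2))
```

You would POST a body like this to the Metrics API query endpoint, authenticated with a Confluent Cloud API key.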

For a full list of metrics and supported labels, see the Metrics API.

See Monitor Confluent Cloud Clients for more information on monitoring clients.

Control application usage with Client Quotas

Confluent Cloud Client Quotas are a cloud-native implementation of Kafka Client Quotas. Confluent Cloud Client Quotas enable you to apply throughput limits to specific principals. Client Quotas in Confluent Cloud differ slightly from Quotas in Apache Kafka:

| Quota parameter   | Confluent Cloud Client Quotas      | Apache Kafka Quotas             |
| Apply to          | Service accounts or identity pools | User or client ID               |
| Managed by        | Calling the Confluent Cloud API    | Interacting with Kafka directly |
| Level enforced at | Cluster level                      | Broker level                    |

Confluent Cloud Client Quotas:

  • Are defined at the cluster level. Confluent Cloud automatically distributes slices of the quota to the correct brokers for the target principal.
  • Restrict ingress and egress throughput.
  • Can apply to one or more principals.
  • Grant each assigned principal the full amount of the quota; the quota is not shared among the principals it is assigned to. For example, if a 10 MBps ingress quota is applied to principals 1 and 2, principal 1 can produce at most 10 MBps regardless of what principal 2 does.
  • Define a throughput maximum, but do not guarantee a throughput floor. Applications are rate-limited through the Kafka throttling mechanism: Kafka asks the client to wait before sending more data and mutes the channel, which appears as latency to the client application.

Each cluster can be assigned one default quota, which applies to every principal connecting to the cluster unless a more specific quota is applied. Each principal can be referenced by at most one quota per cluster.

  • Example 1: cluster default quota is 10 MBps. A specific 5 MBps quota for Principal 1 is applied. Principal 1 will be able to produce at most 5 MBps.
  • Example 2: cluster default quota is 10 MBps. A specific 50 MBps quota for Principal 1 is applied. Principal 1 will be able to produce at most 50 MBps.

Client Quotas can be used to support multi-tenancy by rate limiting distinct applications as necessary. Common approaches for managing cluster utilization include:

  • Creating a default quota that ensures all applications are held to a specified throughput value, unless the application is of high-priority and requires more throughput
  • Creating a quota for nightly ETL jobs that consume too much throughput and negatively impact the performance of other, more latency-sensitive applications
  • Creating quotas that represent different tiers of service, with certain applications assigned higher throughput levels than others.

Get started with Client Quotas

This section shows how to create, list, and delete Confluent Cloud Client Quotas using the Confluent CLI and the Confluent Cloud Console. For the full command reference, see confluent kafka quota. To make similar calls with a REST API, see the Client Quotas Reference.


Prerequisites:

  • The Confluent CLI installed, and access to a Confluent Cloud administrator account.
  1. Sign in to your Confluent Cloud account, and create a Service Account by running the following command:

    confluent iam service-account create client-quota-SA --description "Client Quotas demo SA"
  2. Create a quota, specifying a quota name, the cluster where the quota is applied, values for ingress and egress, and the principal ID. Ingress and egress values are specified in bytes; use values of at least 1 MB (1,000,000 bytes). Consider the following examples.

    For example, creating a quota with a service account principal:

    confluent kafka quota create --name test-quota --cluster lkc-12345 \
                                --ingress 1000000 --egress 1000000 --description "Test Quota" \
                                --principals sa-12345

    For example, creating a quota with an identity pool principal:

    confluent kafka quota create --name test-quota --cluster lkc-12345 \
                                --ingress 1000000 --egress 1000000 --description "Test Quota" \
                                --principals pool-abcde

    For example, creating a quota with identity pool and service account principals:

    confluent kafka quota create --name test-quota --cluster lkc-12345 \
                                --ingress 1000000 --egress 1000000 --description "Test Quota" \
                                --principals pool-abcde sa-12345

    The result should look similar to the following:

    | ID           | cq-abcde    |
    | Display Name | test-quota  |
    | Description  | Test Quota  |
    | Ingress      | 1000000 B/s |
    | Egress       | 1000000 B/s |
    | Cluster      | lkc-12345   |
    | Principals   | sa-12345    |
  3. You can modify your quota later. For example, to add or remove principals, use the update command:

    confluent kafka quota update cq-abcde --add-principals sa-12345
    confluent kafka quota update cq-abcde --remove-principals sa-12345
  4. Finally, you can retrieve a list of quota IDs with the list command, and use delete to delete the sample quota you created:

    confluent kafka quota list --cluster lkc-12345
    confluent kafka quota delete cq-abcde --cluster lkc-12345