Multi-Tenancy and Client Quotas on Confluent Cloud
Multi-tenancy, principals, and client quotas support multi-tenant workloads on Enterprise, Freight, and Dedicated clusters. For step-by-step procedures to create and apply client quotas, see Get Started with Client Quotas in Confluent Cloud. For information about service quotas, which define limits for Confluent Cloud resources, see Service Quotas for Confluent Cloud.
Why multi-tenancy?
Multi-tenancy is the practice of running multiple distinct applications on a single cluster, where each tenant consumes some of the cluster’s resources. It lets your business share a single Apache Kafka® cluster across multiple applications and teams, which lowers cost, simplifies operations, and increases data reuse.
Your business can choose to support multi-tenancy on a single Enterprise, Freight, or Dedicated cluster for the following reasons:
Lower cost: Share a single cluster across workloads to reduce fixed costs. Most use cases don’t need the full capacity of a single Kafka cluster, and sharing one cluster spreads the costs across teams, even when you keep those workloads separate.
Simpler operations: Use one cluster instead of multiple clusters to reduce the number of access controls and credentials you manage. Separate clusters can have different networking configurations, API keys, schema registries, roles, and access control lists (ACLs).
Greater reuse: Reuse existing data more easily when teams already use the cluster. Granting access is a simple access control change, and reusing topics and events created by other teams lets teams deliver value more quickly.
Supporting a multi-tenant workload introduces new requirements. You need detailed insight into each tenant’s behavior, as well as details about the cluster as a whole. The rest of this topic explains how to answer questions such as:
What resources is each tenant consuming?
What level of performance is each tenant obtaining?
Are some tenants consuming too many resources, while others are not getting enough?
Identify applications with principals
A principal is the unique identity assigned to each tenant on a Confluent Cloud cluster. Principals are the foundation for detailed monitoring and quota management in Confluent Cloud.
To manage many tenants on a single cluster, and answer the questions from the preceding section, each tenant must have a unique identity. Although there is no single recommended way to assign principals to applications, you can map unique identities to tenants with service accounts or identity pools. Choose service accounts for programmatic applications that need their own credentials. Choose identity pools to group one or more applications that authenticate with an external identity provider.
Service accounts as principals
Each service account represents an application principal programmatically accessing Confluent Cloud. Confluent recommends using one service account per producer application or consumer group, so you can monitor and control each one individually.
Each service account can have one or more API keys. The service account corresponds to the long-lived principal identity, while the API keys are credentials that you can and should rotate periodically. API keys have the following limitations:
You cannot apply client quotas to a specific API key.
The Metrics API does not label specific API keys.
Identity pools as principals
Each identity pool represents either a single application or a group of applications. In general, Confluent Cloud treats each identity pool as a unique principal.
The Metrics API supports metric visibility at the identity pool level, and Confluent Cloud applies client quotas at the identity pool level as well. If more than one application uses the same identity pool, they share a quota and their application metrics are aggregated.
Monitor metrics by principal
The Confluent Cloud Metrics API provides operational metrics for your Confluent Cloud deployment, including throughput, connection counts, and authentication counts. The Metrics API supports labels, which you can use in queries to filter or group results. For multi-tenancy, the principal_id label enables metric filtering by a specific application.
By monitoring the performance of specific applications, you can understand cluster usage in detail. For example, you can:
Learn which applications drive high levels of throughput consumption or requests.
Correlate changes in application metrics with fluctuations in cluster usage, as measured by the cluster load metric.
Confluent Cloud labels the following metrics with principal_id:
io.confluent.kafka.server/request_bytes
io.confluent.kafka.server/response_bytes
io.confluent.kafka.server/active_connection_count
io.confluent.kafka.server/request_count
io.confluent.kafka.server/successful_authentication_count
For a Metrics API example, see Query Metrics by Specific Principal. For a related tutorial, see Track Usage by Team on Dedicated Clusters in Confluent Cloud. For a full list of metrics and supported labels, see the Metrics API.
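As a rough illustration of filtering by principal, the following Python sketch builds a query body for one metric, restricted to one cluster and one principal. The body shape and the field names `resource.kafka.id` and `metric.principal_id` are assumptions based on the Metrics API query format, and the cluster and service account IDs are placeholders. Check the Metrics API reference before relying on them. The sketch only builds and prints the body; it does not send a request.

```python
import json


def build_principal_query(cluster_id, principal_id, metric, interval, granularity="PT1H"):
    """Build a query body for one metric, filtered to one cluster and one principal.

    The field names below are assumptions; verify them in the Metrics API reference.
    """
    return {
        "aggregations": [{"metric": metric}],
        "filter": {
            "op": "AND",
            "filters": [
                # Restrict results to a single Kafka cluster (placeholder ID).
                {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
                # Restrict results to a single tenant, identified by its principal.
                {"field": "metric.principal_id", "op": "EQ", "value": principal_id},
            ],
        },
        "granularity": granularity,
        "intervals": [interval],
    }


query = build_principal_query(
    cluster_id="lkc-example",      # placeholder cluster ID
    principal_id="sa-example",     # placeholder service account ID
    metric="io.confluent.kafka.server/request_bytes",
    interval="2024-01-01T00:00:00Z/2024-01-02T00:00:00Z",
)
print(json.dumps(query, indent=2))
```

Swapping the metric name for `io.confluent.kafka.server/response_bytes` would give the egress view for the same principal.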
How Confluent Cloud client quotas work
Confluent Cloud client quotas are throughput limits that you apply to specific principals (service accounts or identity pools) at the cluster level. They are a cloud-native implementation of Kafka client quotas.
How Confluent Cloud client quotas differ from Kafka quotas
Client quotas in Confluent Cloud differ slightly from quotas in Kafka. The following table compares the two.
| Quota parameter | Confluent Cloud client quotas | Kafka quotas |
|---|---|---|
| Scope | Service accounts or identity pools | User or client ID |
| Managed by | Calling the Confluent Cloud API | API interacting with Kafka directly |
| Enforcement level | Cluster level | Broker level |
Client quota behavior
Confluent Cloud client quotas have the following characteristics:
Apply at the cluster level. Confluent Cloud automatically distributes slices of the quota to the correct brokers for the target principal.
Restrict ingress and egress throughput.
Can apply to one or more principals.
Define a throughput maximum, but do not guarantee a throughput floor.
Each principal assigned to a quota receives the full amount of the quota; the quota is not shared among the principals to which it is assigned. For example, if you apply a 10 MBps (megabytes per second) ingress quota to principals 1 and 2, principal 1 can produce at most 10 MBps, no matter what principal 2 does.
Kafka enforces a quota through client-side throttling: it asks the client to wait before sending more data and temporarily blocks the channel, which appears as latency to the client application.
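The point that a quota is not shared among its principals can be modeled in a few lines. This is a simplified sketch, not how Confluent Cloud enforces quotas: the function name `capped_throughput` and the demand figures are hypothetical, and real throttling happens over time rather than as an instantaneous cap.

```python
def capped_throughput(demand_mbps, quota_mbps):
    """Cap each principal's throughput at the quota, independently of the others."""
    return {principal: min(demand, quota_mbps) for principal, demand in demand_mbps.items()}


# Hypothetical demand: principal 1 tries to produce 25 MBps, principal 2 only 4 MBps.
demand = {"principal-1": 25.0, "principal-2": 4.0}
result = capped_throughput(demand, quota_mbps=10.0)
print(result)  # {'principal-1': 10.0, 'principal-2': 4.0}
```

Principal 1 is held to 10 MBps regardless of how little principal 2 uses, and the combined throughput of both principals can exceed 10 MBps because each one has its own full allowance.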
Default quotas and specific quotas
You can assign one default quota to each cluster, which applies to all principals connecting to the cluster, unless you apply a more specific quota. Each principal can be referenced by at most one quota per cluster. The following table shows how a specific quota for a principal overrides the cluster default quota.
| Example | Default quota | Specific quota for Principal 1 | Result for Principal 1 |
|---|---|---|---|
| Example 1 | 10 MBps | 5 MBps | Produces at most 5 MBps |
| Example 2 | 10 MBps | 50 MBps | Produces at most 50 MBps |
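The override rule in the table can be expressed as a small lookup. This is a minimal sketch of the rule as described here, with a hypothetical function name and principal names. It is not a Confluent Cloud API.

```python
from typing import Dict, Optional


def effective_quota(
    principal: str,
    default_mbps: Optional[float],
    specific_mbps: Dict[str, float],
) -> Optional[float]:
    """Return the quota that applies to a principal, or None if no quota applies.

    A specific quota always wins over the cluster default, whether it is
    lower or higher than the default.
    """
    if principal in specific_mbps:
        return specific_mbps[principal]
    return default_mbps


# Example 1: the specific quota is lower than the default.
print(effective_quota("principal-1", 10, {"principal-1": 5}))   # 5
# Example 2: the specific quota is higher than the default.
print(effective_quota("principal-1", 10, {"principal-1": 50}))  # 50
# A principal without a specific quota falls back to the default.
print(effective_quota("principal-2", 10, {"principal-1": 50}))  # 10
```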
Common approaches for managing cluster usage
You can use client quotas to support multi-tenancy by rate-limiting distinct applications as necessary. Common approaches include:
Set a baseline for all applications: Create a default quota that holds all applications to a specified throughput value, unless an application is high-priority and requires more throughput.
Contain batch workloads: Create a quota for nightly extract, transform, load (ETL) jobs. Without a quota, these jobs can consume too much throughput and reduce performance for latency-sensitive applications.
Create tiers of service: Create quotas that represent different tiers of service, and assign higher throughput levels to applications in higher tiers.
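One way to plan tiers of service before creating any quotas is to group principals by tier and produce one quota definition per tier. The sketch below is illustrative only: the tier names, throughput values, principal IDs, and the dictionary layout are all hypothetical and do not correspond to a Confluent Cloud API. Because each principal maps to exactly one tier, the plan respects the rule that a principal can be referenced by at most one quota per cluster.

```python
from collections import defaultdict

# Hypothetical tiers and their ingress limits in MBps.
TIER_INGRESS_MBPS = {"gold": 50, "silver": 20}


def plan_quotas(principal_tiers):
    """Group principals by tier and return one quota definition per tier.

    principal_tiers maps each principal to a single tier, so no principal
    can end up in more than one quota.
    """
    groups = defaultdict(list)
    for principal, tier in principal_tiers.items():
        if tier not in TIER_INGRESS_MBPS:
            raise ValueError(f"unknown tier {tier!r} for principal {principal!r}")
        groups[tier].append(principal)
    return [
        {
            "name": f"{tier}-tier",
            "ingress_mbps": TIER_INGRESS_MBPS[tier],
            "principals": sorted(principals),
        }
        for tier, principals in sorted(groups.items())
    ]


# Hypothetical assignments; principals left out would fall under the default quota.
plan = plan_quotas({"sa-orders": "gold", "sa-billing": "gold", "sa-reports": "silver"})
for quota in plan:
    print(quota)
```

Principals that are not assigned to a tier are left out of the plan on purpose, so a cluster default quota can serve as the baseline for them.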
For step-by-step procedures to create and apply client quotas, see Get Started with Client Quotas in Confluent Cloud.