Multi-Tenancy and Client Quotas on Confluent Cloud

This topic describes multi-tenancy, principals, and Client Quotas as a tool to support multi-tenant workloads on Enterprise, Freight, and Dedicated Clusters. For information about service quotas, which define limits for Confluent Cloud resources, see Service Quotas for Confluent Cloud.

Why multi-tenancy?

As organizations integrate Apache Kafka® into more of their applications, Confluent Cloud becomes a central nervous system for the business. Each of the organization’s clusters can hold large amounts of data that different business units and teams consume, enrich, and generate business value from.

Running multiple distinct applications on a single cluster is known as multi-tenancy. Each tenant on the cluster consumes some portion of the cluster’s resources.

Your business may choose to support multi-tenancy on a single Enterprise, Freight, or Dedicated Cluster for the following reasons:

Lower cost: Most use cases don’t need the full capacity of a Kafka cluster. You can minimize fixed costs by sharing a single cluster across workloads, and spread the costs across teams, even when those workloads are kept separate.
Simpler operations: Using one cluster instead of several means a narrower scope of access controls and credentials to manage. Separate clusters may have different networking configurations, API keys, schema registries, roles, and ACLs.
Greater reuse: Teams can most easily reuse existing data when they’re already using the cluster, because granting them access is a simple access control change. Reusing topics and events created by other teams lets teams deliver value more quickly.

Supporting a multi-tenant workload imposes additional requirements. For example, you might need granular insights into the behaviors of each tenant, and details about the cluster as a whole. Running a multi-tenant cluster raises the following questions:

What resources is each tenant consuming?
What level of performance is each tenant obtaining?
Are some tenants consuming too many resources, while others are not getting enough?

Application identity and principals

To manage many tenants on a single cluster, and answer the questions posed previously, each tenant must have a unique identity. This identity is called a principal.

Assigning each tenant a unique principal provides the foundation for granular monitoring and management capabilities in Confluent Cloud. Although there isn’t a blanket recommendation for how to assign principals to applications, you can create principals that enable you to map unique identities to tenants with service accounts or identity pools.

Service accounts

Each service account represents an application principal programmatically accessing Confluent Cloud. Confluent recommends using one service account per producer application (or consumer group) for maximal granularity and control.

Each service account can have one or more API keys. The service account corresponds to the long-lived principal identity, while the API keys are credentials that you can and should rotate periodically. Client Quotas cannot be applied to a specific API key, and specific API keys are not labeled in the Metrics API.

Identity pools

Each identity pool represents either a single application or a group of applications. In general, each identity pool is seen as a unique principal inside of Confluent Cloud.

The Metrics API supports metric visibility at the identity pool level, and Client Quotas are applied at the identity pool level as well. If multiple applications are using the same identity pool, they will share a quota and their application metrics will be aggregated.

Monitoring metrics by principal

The Confluent Cloud Metrics API provides actionable operational metrics about your Confluent Cloud deployment. The Metrics API supports labels, which can be used in queries to filter or group results. For multi-tenancy, the principal_id label enables metric filtering by a specific application. By monitoring the performance characteristics of specific applications, you can derive granular insights about cluster utilization. For example, you can learn which applications are responsible for driving high levels of throughput consumption or requests. You can also correlate changes in application metrics with fluctuations in overall cluster utilization as measured by the cluster load metric.

The following metrics are labeled with principal_id:

io.confluent.kafka.server/request_bytes
io.confluent.kafka.server/response_bytes
io.confluent.kafka.server/active_connection_count
io.confluent.kafka.server/request_count
io.confluent.kafka.server/successful_authentication_count

For an example of querying the Metrics API by principal_id, see Query Metrics by Specific Principal and Track Usage by Team on Dedicated Clusters in Confluent Cloud.
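As a rough illustration of grouping a metric by principal_id, the following Python sketch builds a query body and sums the returned data points per principal. The endpoint URL, the request fields (aggregations, filter, group_by, intervals), the resource.kafka.id filter field, and the response shape are assumptions based on the general form of Metrics API queries, so verify them against the Metrics API reference. The cluster ID, principal IDs, and sample response are hypothetical.

```python
from collections import defaultdict

# Assumed query endpoint; confirm against the Metrics API reference.
METRICS_QUERY_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query"


def build_principal_query(cluster_id, metric, interval, granularity="PT1H"):
    """Build a query body that groups one metric by principal for one cluster."""
    return {
        "aggregations": [{"metric": metric}],
        # Assumed filter field name for the Kafka cluster ID.
        "filter": {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
        "granularity": granularity,
        "group_by": ["metric.principal_id"],
        "intervals": [interval],
    }


def total_by_principal(response):
    """Sum the data points in a grouped response, keyed by principal ID."""
    totals = defaultdict(float)
    for point in response.get("data", []):
        totals[point["metric.principal_id"]] += point["value"]
    return dict(totals)


query = build_principal_query(
    cluster_id="lkc-example",  # hypothetical cluster ID
    metric="io.confluent.kafka.server/request_bytes",
    interval="2024-01-01T00:00:00Z/2024-01-02T00:00:00Z",
)

# Hand-written sample in the assumed response shape, not real data.
sample_response = {
    "data": [
        {"timestamp": "2024-01-01T00:00:00Z", "metric.principal_id": "sa-aaa111", "value": 100.0},
        {"timestamp": "2024-01-01T01:00:00Z", "metric.principal_id": "sa-aaa111", "value": 50.0},
        {"timestamp": "2024-01-01T00:00:00Z", "metric.principal_id": "sa-bbb222", "value": 25.0},
    ]
}

print(total_by_principal(sample_response))
```

Sending the query requires an authenticated POST of the body to the endpoint. That step is omitted here so the sketch runs without credentials.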

For a full list of metrics and supported labels, see the Metrics API.

See Monitor Confluent Cloud Clients for more information on monitoring clients.

Control application usage with Client Quotas

Confluent Cloud Client Quotas are a cloud-native implementation of Kafka Client Quotas. Confluent Cloud Client Quotas enable you to apply throughput limits to specific principals. Client Quotas in Confluent Cloud differ slightly from Quotas in Apache Kafka:

Quota parameter	Cloud Client Quotas	Apache Kafka Quotas
Apply to	Service Accounts or identity pools	User or Client ID
Managed by	Calling the Confluent Cloud API	Interacting with Kafka directly
Level enforced at	Cluster level	Broker level

Confluent Cloud Client Quotas:

Are defined on the cluster level. Confluent Cloud automatically distributes slices of the quota to the correct brokers for the target principal.
Restrict ingress and egress throughput.
Can apply to one or more principals.
Each principal assigned to a quota receives the full amount of the quota, meaning the quota is not shared by the principals it is assigned to. For example, if a 10 MBps ingress quota is applied to principals 1 and 2, principal 1 will be able to produce at most 10 MBps, no matter what principal 2 does.
Define a throughput maximum, but do not guarantee a throughput floor. Applications are rate-limited through the use of the Kafka throttling mechanism. Kafka asks the client to wait before sending more data and mutes the channel, which appears as latency to the client application.
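To give a feel for how throttling turns into client-side latency, the following Python sketch estimates a throttle delay as the fraction by which the observed rate exceeds the quota, multiplied by the measurement window. This mirrors the calculation commonly described for Apache Kafka brokers. Whether Confluent Cloud uses the same calculation, and what window it uses, are assumptions. The function name and the 10-second window are hypothetical.

```python
def throttle_delay_seconds(observed_mbps, quota_mbps, window_seconds):
    """Estimate how long a client is asked to wait after exceeding its quota.

    Assumed formula: (observed - quota) / quota * window.
    Returns 0.0 when the client is at or below its quota.
    """
    if quota_mbps <= 0:
        raise ValueError("quota_mbps must be positive")
    if observed_mbps <= quota_mbps:
        return 0.0
    return (observed_mbps - quota_mbps) / quota_mbps * window_seconds


# A client producing at 15 MBps against a 10 MBps quota, measured over a
# hypothetical 10-second window.
over_quota_delay = throttle_delay_seconds(15, 10, 10)

# A client producing at 8 MBps against the same quota is not throttled.
under_quota_delay = throttle_delay_seconds(8, 10, 10)

print(over_quota_delay, under_quota_delay)
```

Under this model, the further a client runs above its quota, the longer each wait becomes, which the application observes as increased produce or fetch latency rather than as an error.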

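Each cluster can be assigned one default quota, which applies to all principals connecting to the cluster unless a more specific quota is applied. A more specific quota replaces the default for that principal, whether it is lower or higher than the default. Each principal can be referenced by at most one quota per cluster.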

Example 1: The cluster default quota is 10 MBps, and a specific 5 MBps quota is applied to Principal 1. Principal 1 can produce at most 5 MBps.
Example 2: The cluster default quota is 10 MBps, and a specific 50 MBps quota is applied to Principal 1. Principal 1 can produce at most 50 MBps.
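The resolution rules described above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not how Confluent Cloud implements it. The function names and principal IDs are hypothetical.

```python
def build_principal_limits(quotas):
    """Expand quotas into a per-principal limit map.

    quotas is a list of (limit_mbps, [principal_ids]) pairs. Each principal
    receives the full limit of its quota, and a principal may be referenced
    by at most one quota.
    """
    limits = {}
    for limit_mbps, principal_ids in quotas:
        for principal_id in principal_ids:
            if principal_id in limits:
                raise ValueError(f"{principal_id} is already referenced by a quota")
            limits[principal_id] = limit_mbps  # full amount, not a shared slice
    return limits


def effective_quota_mbps(principal_id, default_mbps, limits):
    """Return the specific quota for a principal if one exists, else the default."""
    return limits.get(principal_id, default_mbps)


# Example 1: default 10 MBps, specific 5 MBps quota for principal-1.
example_1 = effective_quota_mbps("principal-1", 10, build_principal_limits([(5, ["principal-1"])]))

# Example 2: default 10 MBps, specific 50 MBps quota for principal-1.
example_2 = effective_quota_mbps("principal-1", 10, build_principal_limits([(50, ["principal-1"])]))

# One 10 MBps quota assigned to two principals: each gets the full 10 MBps.
shared = build_principal_limits([(10, ["principal-1", "principal-2"])])

# A principal without a specific quota falls back to the cluster default.
fallback = effective_quota_mbps("principal-3", 10, shared)

print(example_1, example_2, shared, fallback)
```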

Client Quotas can be used to support multi-tenancy by rate limiting distinct applications as necessary. Common approaches for managing cluster utilization include:

Creating a default quota that holds all applications to a specified throughput value, unless an application is high priority and requires more throughput.
Creating a quota for nightly ETL jobs that consume too much throughput and negatively impact the performance of other, more latency-sensitive applications.
Creating quotas that represent different tiers of service, with certain applications assigned higher throughput levels than others.
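As one possible way to express the tiered approach, the following Python sketch builds one Client Quota request body per service tier. The tier names, throughput values, and resource IDs are hypothetical. The request fields (spec, display_name, throughput, ingress_byte_rate, egress_byte_rate, cluster, environment, principals) and the endpoint in the comment are assumptions about the Confluent Cloud API, so check them against the API reference before use.

```python
import json

# Assumed endpoint for creating Client Quotas (not used by this sketch):
#   POST https://api.confluent.cloud/kafka-quotas/v1/client-quotas

# Hypothetical tiers: (ingress, egress) limits in bytes per second.
TIERS = {
    "standard": (5_000_000, 10_000_000),
    "premium": (50_000_000, 100_000_000),
}


def client_quota_body(tier, principal_ids, cluster_id, environment_id):
    """Build an assumed Client Quota request body for one service tier."""
    ingress, egress = TIERS[tier]
    return {
        "spec": {
            "display_name": f"{tier}-tier",
            "description": f"Throughput limits for {tier} tier applications",
            # Byte rates are assumed to be passed as strings.
            "throughput": {
                "ingress_byte_rate": str(ingress),
                "egress_byte_rate": str(egress),
            },
            "cluster": {"id": cluster_id},
            "environment": {"id": environment_id},
            "principals": [{"id": principal_id} for principal_id in principal_ids],
        }
    }


# Hypothetical assignment of service accounts to tiers.
assignments = {
    "standard": ["sa-aaa111", "sa-bbb222"],
    "premium": ["sa-ccc333"],
}

bodies = [
    client_quota_body(tier, principal_ids, "lkc-example", "env-example")
    for tier, principal_ids in assignments.items()
]

print(json.dumps(bodies, indent=2))
```

Because each principal assigned to a quota receives the full amount, both standard-tier service accounts in this sketch would each be limited to the standard rates independently.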