Track Usage by Team on Dedicated Clusters in Confluent Cloud¶

When you support multiple tenants on a Dedicated cluster, you may need to track usage by application to provide showbacks to internal customers for their consumption.

This document describes a model for tracking usage and implementing showbacks of Confluent Cloud Dedicated Cluster costs based on Service Accounts.

Calculate monthly costs by team¶

To track usage by team, you assign each unique team/application its own service account. Then you use the Metrics API and filter results using the principal_id label to separate usage by service account. You track and sum this usage on a monthly basis, and use it to create a derived showback of costs for each service account.

Why use principals instead of topics¶

Topics are shared resources just like clusters. The goal of showbacks is to divide the cost of shared resources across the teams responsible for creating the costs. The ideal way to do this is with principals. Each principal should represent an application programmatically accessing Confluent Cloud. By breaking down billing metrics by principal, you can accurately assign costs like throughput to each application responsible for the costs.

Prerequisites¶

In order to accurately track usage by principal ID, you:

Must have similar retention times across all of the topics
Can map service accounts to teams
Want to implement showbacks to teams by throughput usage
Must have sufficient access to a Confluent Cloud cluster to make queries and view billing information

Query for usage¶

Run a query each day for the previous 24-hour interval, and make sure the entire window is in the past.
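As a minimal sketch of how a scheduled job might build that daily window, the following Python helper returns an interval string in the same shape as the one used in the example requests in this document. The function name `previous_day_interval` is hypothetical, and the sketch assumes that you align each window to UTC midnight.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def previous_day_interval(today: Optional[date] = None) -> str:
    """Return an interval string covering the full UTC day before `today`."""
    if today is None:
        # Assumption: daily windows are aligned to UTC midnight.
        today = datetime.now(timezone.utc).date()
    start = today - timedelta(days=1)
    # Start of the previous day plus a one-day period (P1D).
    return f"{start.isoformat()}T00:00:00-00:00/P1D"


print(previous_day_interval(date(2022, 9, 12)))
# prints 2022-09-11T00:00:00-00:00/P1D
```

Because the window always starts one full day before the current UTC date, it ends no later than the moment the job runs.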

Steps to determine usage by principal:

Get the request bytes for a cluster daily by making a POST call to the Metrics API, grouped by principal ID. Store the data by month in a reliable location. For more details on this call, see the Metrics API Reference.

Your request might look like the following:

curl --location --request POST 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' \
--header 'Authorization: Basic <BASE-64-encoded-cloud-api-key-and-password>' \
--header 'Content-Type: application/json' \
--data-raw '{
"aggregations": [
  {
      "metric": "io.confluent.kafka.server/request_bytes"
  }
],
"filter": {
  "field": "resource.kafka.id",
  "op": "EQ",
  "value": "lkc-momo2"
},
"granularity": "P1D",
"group_by": [
  "metric.principal_id"
],
"intervals": [
    "2022-09-11T00:00:00-00:00/P1D"
],
"limit": 1000
}'

Your response will resemble the following. Store the returned value for each service account (this example shows one) on a daily basis. The value is returned in scientific notation (here, 4.69304195E8 is 469,304,195 bytes), so you will probably want to convert it to decimal format before calculating a showback amount.

{"data":[{"timestamp":"2022-09-11T00:00:00Z","value":4.69304195E8,"metric.principal_id":"sa-abcj5m"}]}

Get the response bytes for a cluster daily by making a POST call to the Metrics API. Store the data by month in a reliable location. For more details on this call, see the Metrics API Reference.

Your request to get all data for a cluster, grouped by principal ID, might look like the following:

curl --location --request POST 'https://api.telemetry.confluent.cloud/v2/metrics/cloud/query' \
--header 'Authorization: Basic <BASE-64-encoded-cloud-api-key-and-password>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "aggregations": [
     {
         "metric": "io.confluent.kafka.server/response_bytes"
     }
 ],
 "filter": {
     "field": "resource.kafka.id",
     "op": "EQ",
     "value": "lkc-abc12"
 },
 "granularity": "P1D",
 "group_by": [
     "metric.principal_id"
 ],
 "intervals": [
     "2022-09-11T00:00:00-00:00/P1D"
 ],
 "limit": 1000
}'

The response will resemble the following. Store the returned value for each service account (this example shows one) on a daily basis. The value is returned in scientific notation (here, 2.44310325E8 is 244,310,325 bytes), so you will probably want to convert it to decimal format before calculating a showback amount.

{"data":[{"timestamp":"2022-09-11T00:00:00Z","value":2.44310325E8,"metric.principal_id":"sa-abcj5m"}]}
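The two requests differ only in the metric name, so a script can build both bodies from one function and convert the returned values to whole bytes while parsing. The following Python sketch illustrates this. The helper names `build_query` and `bytes_by_principal` are hypothetical, and the sketch only builds the request body and parses a response like the ones shown above; it does not send the request.

```python
import json
from decimal import Decimal


def build_query(cluster_id: str, metric: str, interval: str) -> dict:
    """Build a request body with the same fields as the curl examples above."""
    return {
        "aggregations": [{"metric": metric}],
        "filter": {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
        "granularity": "P1D",
        "group_by": ["metric.principal_id"],
        "intervals": [interval],
        "limit": 1000,
    }


def bytes_by_principal(response_text: str) -> dict:
    """Map each principal ID in a response to its value as a whole number of bytes."""
    # Parsing floats as Decimal converts scientific notation without losing precision.
    payload = json.loads(response_text, parse_float=Decimal)
    return {row["metric.principal_id"]: int(row["value"]) for row in payload["data"]}


sample = (
    '{"data":[{"timestamp":"2022-09-11T00:00:00Z",'
    '"value":4.69304195E8,"metric.principal_id":"sa-abcj5m"}]}'
)
print(bytes_by_principal(sample))
# prints {'sa-abcj5m': 469304195}
```

You could serialize the result of `build_query` with `json.dumps` and send it as the body of the POST call, then store each day's parsed result keyed by date.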

At the end of each month, sum the daily request and response bytes, both per principal_id and across all principals:
- Sum the daily request bytes values for each principal_id for the month. Call each sum total_request_bytes<principal_id>.
- Sum the request bytes for all principals for the month. Call this value total_request_bytes.
- Sum the daily response bytes values for each principal_id for the month. Call each sum total_response_bytes<principal_id>.
- Sum the response bytes for all principals for the month. Call this value total_response_bytes.
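The monthly sums above can be sketched in Python as follows. This assumes that each day's result is stored as a dictionary that maps a principal ID to a byte count. The function name `monthly_totals` and the sample values are hypothetical.

```python
from collections import defaultdict


def monthly_totals(daily_results):
    """Sum daily byte counts per principal and across all principals."""
    per_principal = defaultdict(int)
    for day in daily_results:
        for principal_id, value in day.items():
            per_principal[principal_id] += value
    # The overall total is the sum of the per-principal totals.
    return dict(per_principal), sum(per_principal.values())


# Hypothetical daily request bytes for two service accounts over three days.
daily_request_bytes = [
    {"sa-abcj5m": 400, "sa-xyz123": 100},
    {"sa-abcj5m": 500},
    {"sa-abcj5m": 100, "sa-xyz123": 900},
]
per_principal, total = monthly_totals(daily_request_bytes)
print(per_principal, total)
# prints {'sa-abcj5m': 1000, 'sa-xyz123': 1000} 2000
```

Run the same function once on the stored request bytes and once on the stored response bytes to get all four values.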

Get the monthly bill for a cluster¶

The next step in determining showbacks is to get your monthly Confluent Cloud bill for the cluster.

Use the Confluent Cloud Console to get your bill for a month. Billing & payment can be found on the Administration menu.
On the Billing & payment page, use the drop-downs to select a month and an environment. Charges for that environment will be separated by cluster.
There may be several categories billed for the cluster. Add up these costs for the cluster you want to calculate showbacks for. For this example, this value is called total_billing.
- Multiply total_billing for the cluster by 0.33. This is your request_bytes_cost.
- Multiply total_billing for the cluster by 0.67. This is your response_bytes_cost. The roughly 2:1 ratio between the response bytes weight and the request bytes weight in this model reflects the higher cost of consumption in Confluent Cloud multi-AZ clusters.
  
  Your calculations might look like the following:
  
  Month = August 2022
  - total_billing = $92,935
  - request_bytes_cost = $92,935 * 0.33 = $30,668.55
  - response_bytes_cost = $92,935 * 0.67 = $62,266.45
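The same split can be reproduced in Python. This sketch uses `Decimal` so that the currency arithmetic stays exact. The function name `split_bill` is hypothetical, and the weights are the 0.33 and 0.67 values from this model.

```python
from decimal import Decimal

REQUEST_WEIGHT = Decimal("0.33")   # share of the bill assigned to request bytes
RESPONSE_WEIGHT = Decimal("0.67")  # share of the bill assigned to response bytes


def split_bill(total_billing: Decimal):
    """Return (request_bytes_cost, response_bytes_cost) for a cluster's monthly bill."""
    return total_billing * REQUEST_WEIGHT, total_billing * RESPONSE_WEIGHT


request_bytes_cost, response_bytes_cost = split_bill(Decimal("92935"))
print(request_bytes_cost, response_bytes_cost)
# prints 30668.55 62266.45
```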

Calculate the monthly showbacks¶

Finally, use the monthly byte usage and Confluent Cloud billing data to calculate monthly showbacks.

Calculate the showback for request and response bytes by principal ID.
- total_request_cost_<principal_id> = total_billing * 0.33 * total_request_bytes<principal_id> / total_request_bytes
- total_response_cost_<principal_id> = total_billing * 0.67 * total_response_bytes<principal_id> / total_response_bytes
Add total_request_cost_<principal_id> and total_response_cost_<principal_id> from the previous step for each principal ID to calculate the total monthly showback for an internal customer or team.
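A minimal Python sketch of both steps follows. It assumes that you already have the monthly byte totals per principal as dictionaries. The function name `showbacks`, the rounding to cents, and the sample numbers are illustrative assumptions, not part of the model.

```python
from decimal import Decimal, ROUND_HALF_UP

REQUEST_WEIGHT = Decimal("0.33")
RESPONSE_WEIGHT = Decimal("0.67")


def showbacks(total_billing, request_bytes, response_bytes):
    """Return the total monthly showback per principal, rounded to cents."""
    total_request_bytes = sum(request_bytes.values())
    total_response_bytes = sum(response_bytes.values())

    def cost(weight, used, total):
        # A principal's cost is its share of the bytes times the weighted bill.
        if total == 0:
            return Decimal("0")
        return total_billing * weight * used / total

    result = {}
    for principal_id in sorted(set(request_bytes) | set(response_bytes)):
        request_cost = cost(REQUEST_WEIGHT, request_bytes.get(principal_id, 0), total_request_bytes)
        response_cost = cost(RESPONSE_WEIGHT, response_bytes.get(principal_id, 0), total_response_bytes)
        result[principal_id] = (request_cost + response_cost).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
    return result


# Hypothetical month: a $1,000 bill shared by two service accounts.
monthly = showbacks(
    Decimal("1000"),
    {"sa-a": 300, "sa-b": 100},
    {"sa-a": 100, "sa-b": 300},
)
print(monthly)
# prints {'sa-a': Decimal('415.00'), 'sa-b': Decimal('585.00')}
```

In this example the two showbacks add up to the full bill. With real data, rounding each amount to cents can leave a difference of a cent or two.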

Note

For clusters approaching one of the CKU limits, you can use the principal_id label on the following metrics to determine whether a particular team has outsized usage that should be considered in showbacks. This can also help you avoid scaling up the Dedicated cluster unnecessarily. To help determine whether you are approaching the maximum load on a cluster, see Monitor cluster load.

io.confluent.kafka.server/active_connection_count
io.confluent.kafka.server/request_count
io.confluent.kafka.server/successful_authentication_count
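One way to spot outsized usage is to compare each principal's share of a metric against a threshold. The Python sketch below assumes that you have already queried one of these metrics grouped by metric.principal_id and stored the values as a dictionary. The function name `outsized_principals` and the 50% threshold are hypothetical, so choose a threshold that fits your own teams.

```python
def outsized_principals(values, threshold=0.5):
    """Return the share of each principal whose share of the total exceeds `threshold`."""
    total = sum(values.values())
    if total == 0:
        return {}
    return {
        principal_id: value / total
        for principal_id, value in values.items()
        if value / total > threshold
    }


# Hypothetical request_count values for two service accounts.
print(outsized_principals({"sa-a": 900, "sa-b": 100}))
# prints {'sa-a': 0.9}
```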