Monitor Metrics for Cluster Linking on Confluent Cloud

To monitor Cluster Linking on Confluent Cloud, use the Confluent Cloud Metrics API. As shown below, Cluster Linking exposes metrics through the API that report the number of cluster links on a cluster, the number of mirror topics on a cluster, mirroring throughput, and mirroring lag.

Metrics

Number of mirror topics on a cluster

io.confluent.kafka.server/cluster_link_mirror_topic_count
The count of mirror topics on the cluster. You can filter or group by the name of the cluster link, or by the state of the mirror topic.

Labels

Label                      Description
link_name                  Name of the cluster link.
link_mirror_topic_state    The state the mirror topic is in.

Possible states for a mirror topic are as follows:

Mirror Topic State      Description
Mirror                  Actively mirroring data. Corresponds to the ACTIVE state in the REST API. Known issue: also includes topics that are in the SOURCE_UNAVAILABLE state in the REST API.
PausedMirror            A user has paused this mirror topic, and it is not mirroring data. Corresponds to the PAUSED state in the REST API.
PendingStoppedMirror    A user has called promote on the mirror topic, and the promotion is in progress. Corresponds to the PENDING_STOPPED state in the REST API.
StoppedMirror           A promote or failover command has completed, and this topic has changed from a mirror topic to a regular topic. Corresponds to the STOPPED state in the REST API.
FailedMirror            The mirror topic has permanently failed, and will no longer mirror data. Corresponds to the FAILED state in the REST API.

Example

Get the count of active mirror topics over the past hour, grouped by cluster link name.

{
  "aggregations": [
    {
      "metric": "io.confluent.kafka.server/cluster_link_mirror_topic_count"
    }
  ],
  "filter": {
    "op": "AND",
    "filters": [
      {
        "field": "resource.kafka.id",
        "op": "EQ",
        "value": "lkc-52p82"
      },
      {
        "field": "metric.link_mirror_topic_state",
        "op": "EQ",
        "value": "Mirror"
      }
    ]
  },
  "granularity": "PT1M",
  "group_by": [
    "metric.link_name"
  ],
  "intervals": [
    "now-1h/now"
  ],
  "limit": 25
}
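
A query body like the one above can also be assembled and sent from a script. The following is a minimal Python sketch, assuming the Metrics API v2 query endpoint at https://api.telemetry.confluent.cloud/v2/metrics/cloud/query and HTTP Basic authentication with a Cloud API key. The function names build_mirror_count_query and build_request are hypothetical helpers for illustration, not part of any Confluent client library.

```python
import base64
import json
import urllib.request

# Assumed endpoint for Metrics API v2 queries against the "cloud" dataset.
METRICS_QUERY_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query"


def build_mirror_count_query(cluster_id, state="Mirror", interval="now-1h/now"):
    """Return the query body for mirror topic counts, grouped by link name."""
    return {
        "aggregations": [
            {"metric": "io.confluent.kafka.server/cluster_link_mirror_topic_count"}
        ],
        "filter": {
            "op": "AND",
            "filters": [
                {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
                {"field": "metric.link_mirror_topic_state", "op": "EQ", "value": state},
            ],
        },
        "granularity": "PT1M",
        "group_by": ["metric.link_name"],
        "intervals": [interval],
        "limit": 25,
    }


def build_request(query, api_key, api_secret):
    """Wrap a query body in a POST request (hypothetical helper; nothing is sent)."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        METRICS_QUERY_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result of build_request to urllib.request.urlopen would send the query. Building the request separately from sending it keeps the payload easy to inspect before any credentials leave the machine.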

Mirror topic state transition

io.confluent.kafka.server/cluster_link_mirror_transition_in_error
Reports mirror topic state transition errors. For example, it captures errors that a mirror topic encounters during promotion, that is, while its state is PENDING_STOPPED and it is transitioning to STOPPED.

Labels

Label                      Description
mode                       Either source or destination.
link_name                  Name of the cluster link, based on customer input.
link_mirror_topic_state    The state the mirror topic is in.

Example

{
 "aggregations": [
   {
     "metric": "io.confluent.kafka.server/cluster_link_mirror_transition_in_error"
   }
 ],
 "filter": {
   "op": "AND",
   "filters": [
     {
       "field": "resource.kafka.id",
       "op": "EQ",
       "value": "lkc-71gba"
     }
   ]
 },
 "granularity": "PT1H",
 "group_by": [
   "metric.link_mirror_topic_reason",
   "metric.link_mirror_topic_state",
   "metric.link_name"
 ],
 "intervals": [
   "2024-01-07T00:00:00.000Z/2024-01-07T00:02:00.000Z"
 ]
}
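
To act on this metric, a script might scan the query response for groups with a nonzero value. The sketch below assumes the response is a JSON object with a data array whose entries carry timestamp, value, and one key per group_by label, and that a nonzero value indicates transition errors. The sample payload and link names are made up for illustration and are not real API output.

```python
def links_with_transition_errors(response):
    """Return {(link_name, state): total} for groups with a nonzero error total."""
    totals = {}
    for point in response.get("data", []):
        key = (
            point.get("metric.link_name"),
            point.get("metric.link_mirror_topic_state"),
        )
        totals[key] = totals.get(key, 0) + point.get("value", 0)
    # Keep only the groups that reported at least one error.
    return {key: total for key, total in totals.items() if total > 0}


# Hypothetical response, shaped like a grouped query result (assumption).
sample = {
    "data": [
        {
            "timestamp": "2024-01-07T00:00:00Z",
            "value": 2.0,
            "metric.link_name": "link-a",
            "metric.link_mirror_topic_state": "PendingStoppedMirror",
        },
        {
            "timestamp": "2024-01-07T00:00:00Z",
            "value": 0.0,
            "metric.link_name": "link-b",
            "metric.link_mirror_topic_state": "Mirror",
        },
    ]
}

print(links_with_transition_errors(sample))
```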

Mirroring throughput

Source

io.confluent.kafka.server/cluster_link_source_response_bytes
Rate of mirroring throughput, in bytes per second, sent by the source. For up to 30 links per cluster, the full link name is reported in the tag for this metric. Beyond this limit, the cluster link name is reported simply as _confluent. To raise this limit for specific “aggregation” use cases (currently to as many as 100-200 links), contact Confluent Support. To learn more, see limits on Cluster types and networking.

Labels

None.

Destination

io.confluent.kafka.server/cluster_link_destination_response_bytes
Rate of mirroring throughput, in bytes per second, received by the destination. You can filter or group by cluster link name. For up to 30 links per cluster, the full link name is reported in the tag for this metric. Beyond this limit, the cluster link name is reported simply as _confluent. To raise this limit for specific “aggregation” use cases (currently to as many as 100-200 links), contact Confluent Support. To learn more, see limits on Cluster types and networking.

Labels

Label        Description
link_name    Name of the cluster link.

Example

Get mirroring throughput on a destination cluster for the past hour, grouped by cluster link name.

{
  "aggregations": [
    {
      "metric": "io.confluent.kafka.server/cluster_link_destination_response_bytes"
    }
  ],
  "filter": {
    "field": "resource.kafka.id",
    "op": "EQ",
    "value": "lkc-XXXXX"
  },
  "granularity": "PT1M",
  "group_by": [
    "metric.link_name"
  ],
  "intervals": [
    "now-1h/now"
  ],
  "limit": 25
}

Mirror Topics

io.confluent.kafka.server/cluster_link_mirror_topic_bytes
The number of bytes sent over each mirror topic on a destination cluster.

Labels

Label        Description
link_name    Name of the cluster link.
topic        Name of the mirror topic.

Example

Get the total number of bytes sent each day over the last week on a cluster link called from-west, grouped by mirror topic name.

{
  "aggregations": [
      {
          "metric": "io.confluent.kafka.server/cluster_link_mirror_topic_bytes"
      }
  ],
  "filter": {
      "op": "AND",
      "filters": [
          {
              "field": "resource.kafka.id",
              "op": "EQ",
              "value": "lkc-odq3o"
          },
          {
              "field": "metric.link_name",
              "op": "EQ",
              "value": "from-west"
          }
      ]
  },
  "granularity": "P1D",
  "group_by": [
      "metric.topic"
  ],
  "intervals": [
      "now-7d/now"
  ],
  "limit": 25
}
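
A daily, per-topic result like this is often summed into a weekly total per mirror topic. The following Python sketch assumes the response contains a data array whose entries hold a value and a metric.topic key (one entry per topic per day). The topic names and byte counts are invented for illustration.

```python
from collections import defaultdict


def bytes_per_topic(response):
    """Sum the daily byte counts in a grouped response into one total per topic."""
    totals = defaultdict(float)
    for point in response.get("data", []):
        totals[point["metric.topic"]] += point["value"]
    return dict(totals)


# Hypothetical two-day excerpt of a response grouped by metric.topic.
sample = {
    "data": [
        {"timestamp": "2024-01-01T00:00:00Z", "metric.topic": "orders", "value": 1000.0},
        {"timestamp": "2024-01-02T00:00:00Z", "metric.topic": "orders", "value": 2500.0},
        {"timestamp": "2024-01-01T00:00:00Z", "metric.topic": "payments", "value": 400.0},
    ]
}

weekly = bytes_per_topic(sample)
for topic, total in sorted(weekly.items()):
    print(f"{topic}: {total:.0f} bytes")
```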

Mirroring lag

io.confluent.kafka.server/cluster_link_mirror_topic_offset_lag

The mirroring lag indicates how far the destination is behind the source in terms of processing events. It is measured as the maximum number of messages by which any partition of a mirror topic lags.

For example, given a mirror topic with three partitions: one partition lags 4 messages behind the source topic, another lags 24 messages behind, and the third lags 92 messages behind, the mirror topic’s lag is reported as 92.

Each mirror topic’s lag is measured once per minute. If your query’s granularity is coarser than one minute (PT1M), then the API returns the maximum of the per-minute lag measurements in each time range.

If your query does not group by topic, then it will return the maximum lag over all of the mirror topics that match the filter clause. For example, if your query filters on a specific link_name, then it will return the maximum lag among all of that link’s mirror topics.
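
The three rollups described above (across partitions, across minutes, and across topics) are all maximums. The Python sketch below models them locally with made-up numbers. It only illustrates the arithmetic and does not call the API; the function names, topic names, and values other than the 4, 24, 92 example are hypothetical.

```python
def topic_lag(partition_lags):
    """Lag of one mirror topic: the maximum lag over its partitions."""
    return max(partition_lags)


def rollup_over_minutes(per_minute_topic_lags):
    """Lag for a coarser time range: the maximum of the per-minute measurements."""
    return max(per_minute_topic_lags)


def lag_without_topic_grouping(lag_by_topic):
    """Lag when not grouping by topic: the maximum over all matching topics."""
    return max(lag_by_topic.values())


# The three-partition example from the text: lags of 4, 24, and 92 messages.
orders_lag = topic_lag([4, 24, 92])

# Hypothetical per-minute measurements for one topic, rolled up to one value.
bucket_lag = rollup_over_minutes([orders_lag, 30, 7])

# Hypothetical topics on one link; names and values are made up.
link_lag = lag_without_topic_grouping({"orders": orders_lag, "payments": 15})

print(orders_lag, bucket_lag, link_lag)
```

In this example all three values are 92, because the single worst partition dominates every level of the rollup.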

Labels

Label        Description
link_name    Name of the cluster link.
topic        Name of the mirror topic.

Example

Get the maximum mirroring lag for each mirror topic on a destination cluster.

{
    "aggregations": [
        {
            "metric": "io.confluent.kafka.server/cluster_link_mirror_topic_offset_lag"
        }
    ],
    "filter": {
        "field": "resource.kafka.id",
        "op": "EQ",
        "value": "lkc-odq3o"
    },
    "granularity": "PT1M",
    "group_by": [
        "metric.topic"
    ],
    "intervals": [
        "2021-08-14T07:00:00Z/2021-08-14T08:00:00Z"
    ],
    "limit": 25
}