Looking for Confluent Platform Cluster Linking docs? This page describes Cluster Linking on Confluent Cloud. If you are looking for Confluent Platform documentation, check out Cluster Linking on Confluent Platform.

Cluster Linking on Confluent Cloud

What is Cluster Linking?

Cluster Linking on Confluent Cloud is a fully-managed service for moving data from one Confluent cluster to another. Programmatically, it creates perfect copies of your topics and keeps data in sync across clusters. Cluster Linking is a powerful geo-replication technology for:

  • Multi-cloud and global architectures powered by real-time data in motion
  • Data sharing between different teams and lines of business
  • High Availability (HA)/Disaster Recovery (DR) during a regional cloud provider outage
  • Data and workload migration from an Apache Kafka® cluster to Confluent Cloud

Cluster Linking is fully-managed in Confluent Cloud, so you don’t need to manage or tune data flows. Its usage-based pricing puts multi-cloud and multi-region costs into your control. Cluster Linking reduces operational burden and cloud egress fees, while improving the performance and reliability of your cloud data pipelines.

How it Works

Cluster Linking allows one Confluent cluster to mirror data directly from another. You can establish a cluster link between a source cluster and a destination cluster in a different region, cloud, line of business, or organization. You choose which topics to replicate from the source cluster to the destination. You can even mirror consumer offsets and ACLs, making it straightforward to move Kafka consumers from one cluster to another.

[Image: Cluster Linking overview]

In one command or API call, you can create a cluster link from one cluster to another. A cluster link acts as a persistent bridge between the two clusters.

ccloud kafka link create tokyo-sydney \
  --source-bootstrap-server pkc-867530.ap-northeast-1.aws.confluent.cloud:9092 \
  --source-cluster-id lkc-42492 \
  --api-key AP1K3Y \
  --api-secret ********

To mirror data across the cluster link, you create mirror topics on your destination cluster.

ccloud kafka mirror create clickstream.tokyo \
   --link tokyo-sydney

[Image: Creating a mirror topic across the cluster link]

Mirror topics are a special kind of topic: they are read-only copies of their source topic. Any messages produced to the source topic are mirrored to the mirror topic “byte-for-byte,” meaning that the same messages go to the same partition and same offset on the mirror topic. Mirror topics can be consumed just like any other topic.
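
Because a mirror topic behaves like any other topic on the destination cluster, you can read from it with your usual tooling. A minimal sketch using the Confluent Cloud CLI, assuming your CLI version provides the ccloud kafka topic consume subcommand and its --from-beginning flag (the topic name comes from the earlier example):

# Consume the mirror topic on the destination cluster, just like a regular topic.
ccloud kafka topic consume clickstream.tokyo --from-beginning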

[Image: Cluster link and mirror topics example]

Cluster links and mirror topics are the building blocks you can use to create scalable, consistent architectures across regions, clouds, teams, and organizations.

Cluster Linking replicates essential metadata.

  • Cluster Linking applies the best practice of syncing topic configurations between the source and mirror topics. (Certain configurations are synced, others are not.)
  • You can enable consumer offset sync, which syncs consumer offsets from source topics to their mirror topics (mirror topics only), and you can filter the sync to specific consumer groups if desired.
  • You can enable ACL sync, which will sync all ACLs on the cluster (not just for mirror topics). You can filter based on the topic name or the principal name, as needed.

These features are covered in the various Tutorials.
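
As an illustration, these syncs are enabled through settings on the cluster link itself, typically supplied in a configuration file when the link is created. The following sketch assumes the property names consumer.offset.sync.enable, consumer.offset.group.filters, and acl.sync.enable; see the Tutorials for the exact settings.

# Sketch of cluster link sync settings (property names are assumptions; values are illustrative).
# Sync committed consumer offsets from source topics to their mirror topics.
consumer.offset.sync.enable=true
# Optionally filter which consumer groups are synced (this filter includes all groups).
consumer.offset.group.filters={"groupFilters":[{"name":"*","patternType":"LITERAL","filterType":"INCLUDE"}]}
# Sync ACLs from the source cluster to the destination cluster.
acl.sync.enable=true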

Use Cases

Confluent provides multi-cloud, multi-region, and hybrid capabilities in Confluent Cloud. Some of these are demonstrated in the Tutorials.

  • Global Data Sharing - Share data for selected topics across different regions, clouds, environments, or teams.
  • Data Migration - Migrate data and workloads from one cluster to another.
  • Disaster Recovery and High Availability - Create a disaster recovery cluster, and fail over to it during an outage.

Cluster Linking mirroring throughput (the bandwidth used to read data or write data to your cluster) is counted against your Dedicated cluster CKUs and limits.

Supported Cluster Types

A cluster link sends data from a “source cluster” to a “destination cluster”. The supported cluster types are listed below. (Unsupported cluster types and other limits are described in Limitations.)

Source cluster options:
  • Dedicated Confluent Cloud cluster with Internet networking
  • Apache Kafka® 2.4+ or Confluent Platform 5.4+

Destination cluster options:
  • Dedicated Confluent Cloud cluster with Internet networking, capable of reaching the source cluster brokers

How to Check the Cluster Type

To check a Confluent Cloud cluster’s type and endpoint type:

  1. Log on to Confluent Cloud.

  2. Select an environment.

  3. Select a cluster.

    The cluster type is shown on the summary card for the cluster.

    [Image: Dedicated cluster summary card showing the cluster type]

    Alternatively, click into the cluster, and select Cluster settings from the left menu. The cluster type is shown on the summary card for “Cluster type”.

    [Image: Cluster settings page showing the cluster type]

    From within Cluster settings for a dedicated cluster, click the Networking tab to view the endpoint type. Only Dedicated clusters have the Networking tab; Basic and Standard clusters always have Internet networking. Networking is defined when you first create the Dedicated cluster.

    [Image: Networking tab for a Dedicated cluster]

Pricing

Confluent Cloud clusters that use Cluster Linking are charged based on the number of cluster links and the volume of mirroring throughput to or from the cluster.

For a detailed breakdown of how Cluster Linking is billed, including guidelines for using metrics to track your costs, see Cluster Linking in Confluent Cloud Billing.

More general pricing information for Confluent Cloud is available on the Confluent Cloud pricing page.

About Preview Features

Cluster Linking on Confluent Cloud is now generally available. However, the following metrics are being introduced in preview mode to gain early feedback from developers. They can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. Comments, questions, and suggestions related to preview features are encouraged and can be submitted to clusterlinking@confluent.io.

  • io.confluent.kafka.server/cluster_link_mirror_topic_count
  • io.confluent.kafka.server/cluster_link_mirror_topic_bytes
  • io.confluent.kafka.server/cluster_link_mirror_topic_offset_lag

First Look

Just getting started with Cluster Linking? Here are a few suggestions for next steps.

Tutorials

To get started, try one or more of the tutorials, each of which maps to a use case.

Mirror Topics

Read-only mirror topics that reflect the data in their original (source) topics are the building blocks of Cluster Linking. For a deep dive on this specialized type of topic and how it works, see Mirror Topics.

Commands and Prerequisites

You create and manage cluster links on the destination cluster with the ccloud kafka link command, which pulls data from the source cluster. The following prerequisite steps are needed to run the tutorials.

To try out Cluster Linking on Confluent Cloud:

  1. Install the Confluent Cloud CLI (ccloud) if you do not already have it.

    • As a shortcut alternative to installing from the web, you can install the Confluent Cloud CLI with two commands in your terminal window. (Replace ~/.local/bin in both commands with a different directory, if you wish.)

      curl -L --http1.1 https://cnfl.io/ccloud-cli | sh -s -- -b ~/.local/bin
      
      export PATH=~/.local/bin:$PATH;
      
    • To learn more about Confluent Cloud in general, see Quick Start for Apache Kafka using Confluent Cloud.

  2. Log on to Confluent Cloud.

  3. Update your Confluent Cloud CLI with ccloud update to be sure you have an up-to-date version of the Cluster Linking commands.

    The ccloud kafka link command has the following subcommands.

    Command     Description
    create      Create a new cluster link.
    delete      Delete a previously created cluster link.
    describe    Describe an existing cluster link.
    list        List existing cluster links.
    update      Update a property for an existing cluster link.

    The ccloud kafka mirror command has the following subcommands.

    Command     Description
    describe    Describe a mirror topic.
    failover    Fail over the mirror topics.
    list        List all mirror topics in the cluster or under the given cluster link.
    pause       Pause the mirror topics.
    promote     Promote the mirror topics.
    resume      Resume the mirror topics.
  4. Follow the tutorials to try out Cluster Linking. The commands are demonstrated in the tutorials.

Pro Tips for the CLI

A list of Confluent Cloud CLI commands is available here. Following are some generic strategies for saving time on command line workflows.

Save command output to a text file

To keep track of information, save the output of the Confluent Cloud commands to a text file. If you do so, be sure to safeguard API keys and secrets afterwards by deleting the file or moving the credentials to safer storage. To redirect command output to a file, you can use either of these methods and manually add in headings for organization:

  • To redirect output to a file, use Linux syntax such as <command> > notes.txt to run the first command and create the notes file, and then <command> >> notes.txt to append further output.
  • To send output to a file and also view it on-screen (recommended), use <command> | tee notes.txt to run the first command and create the file. Thereafter, use the tee command with the -a flag to append; for example, <command> | tee -a notes.txt.
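
For example, to capture the list of cluster links on your destination cluster while also seeing it on-screen (notes.txt is just an illustrative filename):

# List existing cluster links and append the output to a running notes file.
ccloud kafka link list | tee -a notes.txt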

Use configuration files to store data you will use in commands

Create configuration files to store API keys and secrets, detailed configurations on cluster links, or security credentials for clusters external to Confluent Cloud. Examples of this are provided in (Usually Optional) Use a config File in the topic data sharing tutorial and in Create the cluster link for the disaster recovery tutorial.
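
As a sketch, a link.config file that stores credentials for a source cluster might hold standard Kafka client security settings such as the following; the property names are standard Kafka client configurations and the values are placeholders, not a definitive template:

# link.config - connection credentials for the source cluster (placeholders only)
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<api-key>" password="<api-secret>";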

Use environment variables to store resource information

You can streamline your command line workflows by saving permissions and cluster data in shell environment variables. Save API keys and secrets, resources such as IDs for environments, clusters, or service accounts, and bootstrap servers, then use the variables in Confluent Cloud commands.

For example, create variables for an environment and clusters:

export CLINK_ENV=env-200py
export USA_EAST=lkc-qxxw7
export USA_WEST=lkc-1xx66

Then use these in commands:

$ ccloud environment use $CLINK_ENV
Now using "env-200py" as the default (active) environment.

$ ccloud kafka cluster use $USA_EAST
Set Kafka cluster "lkc-qxxw7" as the active cluster for environment "env-200py".

Put it all together in commands

Assuming you’ve created environment variables for your clusters, API keys, and secrets, and have cluster link configuration details in a file called link.config, here is an example of creating a cluster link named “east-west-link” using variables and your configuration file.

ccloud kafka link create east-west-link \
  --cluster $DESTINATION_ID  \
  --source-cluster-id $ORIG_ID \
  --source-bootstrap-server $ORIG_BOOT  \
  --config-file link.config

Limitations

This section details support and known limitations in terms of cluster types, cluster management, and performance.

Cluster Types and Networking

Currently supported cluster types are described in Supported Cluster Types.

Cluster Linking is not supported for Confluent Cloud clusters that have the Transit Gateway, VPC Peering, PrivateLink, or VNet Peering networking types. If you wish to use Cluster Linking with a privately networked Confluent Cloud cluster, contact your Confluent account team or email clusterlinking@confluent.io to find out more.

Cluster Linking does not currently support aggregating data from more than five different source clusters into one destination cluster.

ACL Syncing

A key feature of Cluster Linking is the capability to sync ACLs between clusters. This is useful when moving clients between clusters for a migration or failover. However, in Confluent Cloud, ACLs on a cluster can only be created for service accounts that are in the same Confluent Cloud organization as the cluster itself. Therefore, in practice, ACL sync is only useful between two Confluent Cloud clusters that are in the same Confluent Cloud organization.

ACL sync is not useful between two Confluent Cloud clusters in different organizations, between Confluent Platform and Confluent Cloud, or between Apache Kafka® and Confluent Cloud.

Management Limitations

  • Cluster links must be created and managed on the destination cluster.
  • Cluster links can only be created with destination clusters that are Dedicated Confluent Cloud clusters with Internet networking.
  • Mirror topics count against a cluster’s topic limits, partition limits, and/or storage limits, just like other topics.
  • There is no limit to the number of topics or partitions a cluster link can have, up to the destination cluster’s maximum number of topics and partitions.
  • A cluster can have at most five cluster links targeting it as the destination; that is, no more than five cluster links replicating data to it. If you require more than five cluster links on one cluster, contact Confluent Support.
  • By definition, a mirror topic can only have one cluster link and one source topic replicating data to it. Conversely, a single topic can be the source topic for an unlimited number of mirror topics.
  • The frequency of the sync processes for consumer group offset sync, ACL sync, and topic configuration sync is user-configurable, but each sync can run at most once per second (the setting is in milliseconds, so no lower than 1000 ms). You can configure these syncs to occur less frequently, as shown in the sketch below.
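
As an example of slowing these sync intervals down on a cluster link, the following sketch assumes the property names consumer.offset.sync.ms, acl.sync.ms, and topic.config.sync.ms (values are in milliseconds and cannot be set below 1000):

# Run each sync every 5 seconds instead of the 1-second minimum (property names are assumptions).
consumer.offset.sync.ms=5000
acl.sync.ms=5000
topic.config.sync.ms=5000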

Performance Limits

Throughput

For Cluster Linking, throughput indicates bytes-per-second of data replication. The following performance factors and limitations apply.

  • Cluster Linking throughput (bytes-per-second of data replication) counts towards the destination cluster’s produce limits (also known as “ingress” or “write” limits). However, production from Kafka clients is prioritized over Cluster Linking writes; therefore, these are exposed as separate metrics in the Metrics API: Kafka client writes are received_bytes and Cluster Linking writes are cluster_link_destination_response_bytes.
  • Cluster Linking consumes from the source cluster much like Kafka consumers do, so its throughput (bytes-per-second of data replication) is treated the same as consumer throughput. Cluster Linking contributes to any quotas and hard or soft limits on your source cluster. Kafka client reads and Cluster Linking reads are therefore included in the same metric in the Metrics API: sent_bytes.
  • Cluster Linking can max out the throughput of your CKUs. The physical distance between clusters is a factor in Cluster Linking performance. Confluent monitors cluster links and optimizes their performance. Unlike Replicator and Kafka MirrorMaker 2, Cluster Linking does not have its own scaling unit (such as tasks); you do not need to scale your cluster links up or down to increase performance.

Connections

Cluster Linking connections count towards any connection limits on your clusters.

Request Rate

Cluster Linking contributes requests that count towards your source cluster’s request rate limits.

Frequently Asked Questions

Known Issues

Considerations for deleting source topics

Do not delete a source topic for an active mirror topic, as it can cause issues with Cluster Linking. Instead, follow these steps as a best practice:

  1. Use the promote or failover commands to stop or delete any active mirror topics that read from the source topic you want to delete.
  2. Then, you can safely delete the source topic.

To learn more, see Source Topic Deletion in Mirror Topics.
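
For example, before deleting the source topic clickstream.tokyo from the earlier examples, you could stop its mirror topic on the destination cluster. This sketch assumes that promote, like mirror create, takes the mirror topic name and a --link flag; check ccloud kafka mirror promote --help for the exact syntax:

# Stop mirroring so the destination topic becomes a regular, writable topic,
# after which the source topic can be deleted safely.
ccloud kafka mirror promote clickstream.tokyo --link tokyo-sydney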

Limited access to promoted or failed-over topics while the source cluster is unavailable

A known issue affects topics that were mirror topics but were stopped with the promote or failover command.

If the cluster link that originally created the mirror topic cannot reach its source cluster, its mirror topics will enter the state SOURCE_UNAVAILABLE. (This is normal behavior.)

When this happens, topics created by this link and then stopped will change their status from STOPPED to SOURCE_UNAVAILABLE, even though they are no longer mirror topics.

  • This incorrect status is reflected in the output of read commands from the CLI, the REST API, and the Metrics API.
  • In the Confluent Cloud Console, these topics will show up as mirror topics, even though they are regular topics. This prevents users from editing these topic configurations in the console.
  • The topics still function as normal topics. Produce and consume will not be affected. Schemas can be changed in Schema Registry. Topic configurations can be edited through the Confluent Cloud CLI and REST API, and the topics can be deleted.

Workarounds: To get the system out of this state, you can either delete the cluster link, or wait for the source cluster to become available again.
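
For example, if the stopped topics were created by the tokyo-sydney link from the earlier examples, deleting that link clears the incorrect status (this assumes link delete takes the link name as its argument, as link create does):

# Remove the cluster link whose source cluster is unreachable; its stopped topics
# then report their status as regular topics.
ccloud kafka link delete tokyo-sydney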

Unknown lag for paused or source unavailable mirror topics

A mirror topic with status SOURCE_UNAVAILABLE or PAUSED will expose a lag (Max Per Partition Mirror Lag) of 0 through the CLI and API. In reality, the lag is unknown, and may not be 0.

Suggested Resources