Important

Cluster Linking is currently available on Confluent Cloud in an Early Access Program to a limited set of adopters. This feature can be used by those in the Early Access program for evaluation and non-production testing purposes, and to provide feedback to Confluent.

Your feedback is important for building out this feature. (For example, what metrics would you like to see exposed on the cluster link?) If you would like to sign up to participate in the Early Access or send feedback, email clusterlinking@confluent.io.

Cluster Linking (Early Access)

Confluent is incrementally adding multi-cloud, multi-region, and hybrid capabilities to Confluent Cloud, starting with this limited access introduction of Cluster Linking.

These capabilities will support a variety of use cases and architectures, including:

  • Disaster recovery
  • High availability
  • Migration

What is Cluster Linking?

Cluster Linking allows you to directly connect clusters together and mirror topics from one cluster to another. You can think of a “cluster link” as a bridge that connects one cluster to another. Topics can share data across the bridge. Data moves from a topic on the source cluster to a “mirrored” topic on the destination cluster by means of the cluster link.

Mirrored topics are created on the destination based on an original topic on the source and a specified cluster link to use to share data. Consumers on the destination cluster can read from local, read-only, mirrored topics to get data produced to the source cluster.

../_images/cloud-cluster-linking.png

If the original topic on the source cluster is removed for any reason, you can stop mirroring that topic and convert it to a read/write topic on the destination.

Cluster Linking supports various use cases for multi-cluster, multi-region and hybrid cloud deployments.

Use Cases

Cluster Linking provides the following capabilities and solutions for your Confluent Cloud Kafka clusters and connected external clusters:

  • Topic data sharing - Share data for selected topics across Confluent Cloud clusters in different environments, continents, and cloud providers.
  • Cluster migration - Migrate clusters to Confluent Cloud more efficiently by using a cluster link.
  • Disaster recovery - Create a disaster recovery cluster in another cloud or environment so you can quickly fail over during an unplanned outage and protect your SLOs, RPOs, and RTOs.
  • Hybrid cloud architecture - Deploy an ongoing data funnel for a few topics from an on-premises environment to Confluent Cloud. Cluster Linking provides a network-partition-tolerant architecture that supports this well (momentarily losing the network connection does not materially affect the data on either cluster), whereas attempting this with stretch clusters requires a highly reliable and robust network.

What’s Supported

A cluster link sends data from a “source cluster” to a “destination cluster”.

  • The destination cluster must be Confluent Cloud or Confluent Platform 6.0+.
  • The source cluster can be Confluent Cloud, Apache Kafka® 2.4+, or Confluent Platform 5.4+.
  • The source cluster must be reachable over the public internet. (For non-public network connections, such as a Private Link cluster or VPC peering, you may need to set up additional networking to provide access to the source.)

First Look

Clusters enrolled in the Early Access program can use the ccloud kafka link command to create a link from the destination to the source cluster.

Tip

  • If you are not signed up for Early Access but would like to participate, email clusterlinking@confluent.io.
  • If you are signed up for Early Access, but find that Cluster Linking is not enabled for you, please file a Support ticket for us to enable your cluster. Provide the pkc for your Destination cluster in the ticket. To learn how to get this, read step 5 of Identify Source and Destination, API keys, and Endpoints.

To try out Cluster Linking on Confluent Cloud:

  1. Make sure you are signed up for the Early Access Program.

  2. Log on to Confluent Cloud.

  3. Update your Confluent Cloud CLI to the latest version by using the command ccloud update.

  4. Verify that Cluster Linking is enabled by typing the ccloud kafka link command with no flags, or by appending the --help flag.

    ccloud kafka link --help
    

    Your output should resemble:

    $ ccloud kafka link --help
    Manages inter-cluster links.
    
    Usage:
      ccloud kafka link [command]
    
    Available Commands:
      create      Create a new cluster link.
      delete      Delete a previously created cluster link.
      describe    Describes a previously created cluster link.
      list        List previously created cluster links.
      update      Updates a property for a previously created cluster link.
    
    Global Flags:
      -h, --help            Show help for this command.
      -v, --verbose count   Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace).
    
    Use "ccloud kafka link [command] --help" for more information about a command.
    
    Did you know you can use the `ccloud feedback` command to send the team feedback?
    Let us know if the CLI is meeting your needs, or what we can do to improve it.
    
  5. Follow the steps in the Tutorial below to try out the feature.

Tutorial

For this tutorial, you will:

  • Create two clusters: one will serve as the source and the other as the destination. The destination cluster must be a Dedicated cluster.
  • Set up a cluster link.
  • Create a topic mirror based on a topic on the source cluster.
  • Produce data to the original source topic.
  • Consume data on the mirror topic (destination) over the link.
  • Stop mirroring the destination topic, which will change it from read-only to read/write.

Let’s get started!

Set up two clusters

If you already have two Confluent Cloud clusters set up, one of which is a Dedicated cluster to use as the destination, you can skip to the next task.

Otherwise, set up your clusters as follows.

Tip

If you need more guidance than given below, see Create a Cluster in Confluent Cloud and Step 1: Create a Kafka cluster in Confluent Cloud in the Getting Started guide.

  1. Log on to the Confluent Cloud web UI.

  2. Create two clusters in the same environment, as described in Create a Cluster in Confluent Cloud.

    At least one of these must be a Dedicated cluster, which will serve as the destination cluster.

    For example, you could create a Basic cluster called US-EAST to use as the source, and a Dedicated cluster called US-WEST to use as the destination. (A CLI alternative is sketched in the Tip after these steps.)

  3. When you have completed these steps, you should have two clusters, similar to the following.

    ../_images/clink-source-dest.png
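Tip

If you prefer the CLI, you can also create the clusters with ccloud kafka cluster create. The flag names and values below are assumptions based on common ccloud usage, so verify them with ccloud kafka cluster create --help before running:

    # Hypothetical flags; verify with `ccloud kafka cluster create --help`.
    # Create a Basic cluster to act as the source.
    ccloud kafka cluster create US-EAST --cloud gcp --region us-east1 --type basic

    # Create a Dedicated cluster to act as the destination.
    ccloud kafka cluster create US-WEST --cloud gcp --region us-west1 --type dedicated --cku 1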

Populate the Source Cluster

Create a topic on the source cluster.

For example, create a topic called tasting-menu on US-EAST (the Basic cluster that will act as the source).

  • To add a topic from the Web UI, navigate to the Topics page on the source cluster (US-EAST > Topics), click Add a topic, fill in the topic name, and click Create with defaults.

  • To add a topic from the Confluent Cloud CLI, log in to the CLI (ccloud login), select the environment and cluster you want to use, and enter the command ccloud kafka topic create <topic>. For example:

    ccloud kafka topic create tasting-menu
    

    More detail about working with the Confluent Cloud CLI is provided in the next tasks; if you don’t yet know how to select an environment or cluster on the CLI, the sketch below shows the basic navigation commands.
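    The following navigation commands are used throughout this tutorial. The ccloud kafka cluster use subcommand here is assumed from standard CLI usage; verify it with --help:

    ccloud login                             # authenticate to Confluent Cloud
    ccloud environment list                  # find the environment that holds your clusters
    ccloud environment use <environment-ID>  # select that environment
    ccloud kafka cluster list                # find your source and destination cluster IDs
    ccloud kafka cluster use <cluster-ID>    # set the active cluster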

Set up a config file to authenticate to the Source Cluster

You will need a configuration file to authenticate to the source cluster. This file must have a .config extension. Use your favorite text editor to add the file to your working directory.

  1. Specify details of the source cluster in a file called source.config.

    Copy this starter text into source.config and replace <src-bootstrap-url>, <src-api-key>, and <src-api-secret> with the values for your source cluster.

    bootstrap.servers=<src-bootstrap-url>
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<src-api-key>' password='<src-api-secret>';
    

    For example, given the Endpoint from the previous section, the first line would be bootstrap.servers=pkc-4yyd6.us-east1.gcp.confluent.cloud:9092.

    Important

    • The last entry must be all on one line, from sasl.jaas.config all the way to password='<src-api-secret>';. Do not add returns, as this will cause the configs to break.
    • The configuration options are case-sensitive. Be sure to use upper and lower case as shown in the example.
    • Use punctuation marks such as single quotes and semicolon exactly as shown.
  2. (Optional) Sync Consumer Group Offsets and ACLs

    The source.config file can also take these parameters, which you can use to sync consumer group offsets and ACLs. Optionally configure any of these additional parameters, and save the file. (A filled-in example follows this list.)

    • consumer.offset.sync.enable
      Whether or not to sync consumer offsets from the source to the destination.
      • Type: boolean
      • Default: false
    • consumer.offset.group.filters
      JSON representation of a regex pattern-matching scheme to specify the consumer groups whose offsets you want to mirror from the source to the destination. Make sure you do not have the same consumer group running on both the source and the destination, because the consumer offsets will overwrite one another.
      • Type: string
      • Default: “”
    • acl.sync.enable
      Whether or not to sync ACLs from the source to the destination.
      • Type: boolean
      • Default: false
    • acl.filters.json
      JSON representation of a regex pattern-matching scheme to select the ACLs that should be synced.
      • Type: string
      • Default: “”
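For reference, a filled-in source.config with consumer offset syncing enabled might look like the following. The bootstrap URL is the example value from above, the key and secret remain placeholders, and the group filter JSON mirrors the format used in the consumer group migration example later in this document:

bootstrap.servers=pkc-4yyd6.us-east1.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<src-api-key>' password='<src-api-secret>';
consumer.offset.sync.enable=true
consumer.offset.group.filters={"groupFilters": [{"name": "*","patternType": "LITERAL","filterType": "INCLUDE"}]}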

Specify the API key to use for the Destination Cluster

Tell the CLI to use your destination API key for the destination cluster:

ccloud api-key use <dst-api-key> --resource <dst-cluster-id>

You will get a verification that the API key is set as the active key for the given cluster ID.

Note

This is a one-time action that will persist forever. This API key will be used whenever you perform one-time actions on your destination cluster. It will not be stored on the cluster link. If you create a cluster link with this API key, then it will continue to run even if you later disable this API key.
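Create the cluster link

With the source config file and destination API key in place, create the link on the destination cluster using the create subcommand shown in the --help output earlier. The flag names below are assumptions, so confirm them with ccloud kafka link create --help:

# Hypothetical flag names; verify with `ccloud kafka link create --help`.
# The link is created on the destination cluster and uses source.config
# to authenticate to the source cluster.
ccloud kafka link create usa-east-west --cluster <dst-cluster-id> \
  --source-cluster-id <src-cluster-id> --config-file source.config

You can confirm the link exists with ccloud kafka link list --cluster <dst-cluster-id>.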

Mirror a topic

Now that you have a cluster link, you can mirror topics across it, from source to destination.

  1. List the topics on the source cluster.

    ccloud kafka topic list --cluster <src-cluster-id>
    

    For example:

    $ ccloud kafka topic list --cluster lkc-7k6kj
          Name
    +--------------+
      stocks
      tasting-menu
      transactions
    
  2. Create a mirrored topic.

    Choose a source topic to mirror and use your cluster link to mirror it.

    Tip

    If you don’t already have a topic in mind, create one on the source cluster now with ccloud kafka topic create <topic-name> --cluster <src-cluster-id>. If you’ve been following along, use tasting-menu.

    You create mirrored topics on the destination cluster just as you would create a normal topic, but with a few extra parameters:

    ccloud kafka topic create <topic-name> --link <link-name> --mirror-topic <source-topic-name> --cluster <dst-cluster-id>
    

    For example:

    $ ccloud kafka topic create tasting-menu --link usa-east-west --mirror-topic tasting-menu --cluster lkc-161v5
    Created topic "tasting-menu".
    

    Note

    • The mirror topic name (on the Destination) must be the same as the Source topic name. Topic renaming is not yet supported.
    • Make sure that you use the Destination cluster ID in the command to create the mirror topic.
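To confirm that the mirror topic was created, list the topics on the destination cluster (the same list command as before, pointed at the destination):

ccloud kafka topic list --cluster <dst-cluster-id>

The mirrored topic should appear in the output alongside any other topics on the destination.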

Test the topic mirror by sending data

With the cluster link available and a mirrored topic configured on the Destination, you can test mirroring and linking end-to-end.

  1. Open two new command windows for a producer and consumer.

    In each of them, log on to Confluent Cloud, and make sure you are using the environment that contains both your Source and Destination clusters.

    As before, use the commands ccloud environment list, ccloud environment use <environment-ID>, and ccloud kafka cluster list to navigate and verify where you are.

  2. In one of the windows, start a producer to produce to your source topic.

    ccloud kafka topic produce <topic-name> --cluster <src-cluster-id>
    
  3. In the other window, start a consumer to read from your mirrored topic.

    ccloud kafka topic consume <topic-name> --cluster <dst-cluster-id>
    
  4. Type entries to produce in the first terminal on your source and watch the messages appear in your second terminal on the mirrored topic on the destination.

    ../_images/clink-produce-consume.png

    You can even open another command window and start a consumer for the source cluster to verify that you are producing directly to the source topic. Both the source and mirrored topic consumers will match, showing the same data consumed.

    Tip

    The consumer command example shown above reads data from a topic in real time. To consume from the beginning: ccloud kafka topic consume --from-beginning <topic> --cluster <cluster-id>

Stop the topic mirror

There may come a point when you want to stop mirroring your topic. For example, if you complete a cluster migration, or need to fail over to your destination cluster in a disaster event, you may need to stop mirroring topics on the destination.

You can stop mirroring on a per-topic basis. The destination’s mirrored topic will stop receiving new data from the source, and become a standard, writable topic into which your producers can send data. No topics or data will be deleted, and this will not affect the source cluster.

To stop mirroring a specific mirror topic on the destination cluster, use the following command:

ccloud kafka topic mirror stop <mirrored-topic-name> --cluster <dst-cluster-id>

To stop mirroring the topic tasting-menu using the destination cluster ID from the examples:

$ ccloud kafka topic mirror stop tasting-menu --cluster lkc-161v5
Stopped mirroring for topic "tasting-menu".

Note

This command may be renamed in future CLI releases.

What happens when you stop mirroring a topic

The topic mirror stop command immediately stops mirroring new data from the source to the destination for the specified topic. If consumer.offset.sync.enable is on, consumer offset mirroring is also stopped. (See Set up a config file to authenticate to the Source Cluster.)

If there is any lag between the source cluster and the destination cluster (either data or consumer offsets) when you run stop, that lag will never be mirrored to the destination cluster. The lag will remain only on the source cluster. This action is not reversible.

How to restart mirroring for a topic

To restart mirroring for that topic, you will need to delete the destination topic and then recreate it as a mirror.
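For example, to re-establish the mirror for tasting-menu over the existing link (assuming your CLI version provides ccloud kafka topic delete; note that deleting the destination topic also deletes the data it holds):

# Delete the stopped (now read/write) topic on the destination.
ccloud kafka topic delete tasting-menu --cluster <dst-cluster-id>

# Recreate it as a mirror over the existing cluster link.
ccloud kafka topic create tasting-menu --link usa-east-west --mirror-topic tasting-menu --cluster <dst-cluster-id>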

Migration Best Practices

If you are migrating data from source to destination, and you want to make sure no lagged data is lost, you may want to stop producers first and make sure any lag is mirrored before stopping the topic mirror:

  1. Stop your producers on your source cluster.

  2. Wait for any lag to be mirrored.

    Tip

    Look at the end offsets (high watermarks) for both the source and mirrored topics and make sure they match. (A hedged sketch for checking end offsets follows these steps.)

  3. Run the topic mirror stop command.
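This tutorial shows no dedicated ccloud command for comparing end offsets; one hedged approach is the GetOffsetShell tool that ships with Apache Kafka, assuming your Kafka distribution is recent enough for GetOffsetShell to accept a client config for SASL authentication (source.config is the file from the earlier step; you would need an equivalent file for the destination):

# --time -1 requests the latest offset (high watermark) per partition.
kafka-run-class kafka.tools.GetOffsetShell --broker-list <src-bootstrap-url> \
  --topic tasting-menu --time -1 --command-config source.config

Run the same command against the destination bootstrap URL with a destination config file, and confirm that the per-partition offsets match before stopping the mirror.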

Failover Considerations

If you’re failing over from source to destination because of a disaster event, please note these considerations.

Order of actions and promoting the Destination as post-failover active cluster

You should first stop mirrored topics, and then move all of your producers and consumers over to the destination cluster. The destination cluster should become your new active cluster, at least for the duration of the disaster and the recovery. If it works for your use case, we suggest making the Destination cluster your new, permanent active cluster.

Recover lagged data

There may be lagged data that did not make it to the destination before the disaster occurred. When you move your consumers, if any had not already read that data on the source, they will not read that data on the destination. If and when your source cluster recovers from the disaster, that lagged data will still be there, and you are free to consume or handle it as fits your use case.

For example, if the Source was up to offset 105, but the Destination was only up to offset 100, then the source data from offsets 101-105 will not be present on the Destination. The Destination will get new, fresh data from the producers that will go into its offsets 101-105. When the disaster resolves, the Source will still have its data from offsets 101-105 available to consume manually.

Lagged consumer offsets may result in duplicate reads

There may be lagged consumer offsets that did not make it to the destination before the disaster occurred. If this is the case, then when you move your consumers to the destination, they may read duplicate data.

For example, if at the time that you stop your mirroring:

  • Consumer A had read up to offset 100 on the Source
  • Cluster Linking had mirrored the data through offset 100 to the Destination
  • Cluster Linking had last mirrored consumer offsets that showed Consumer A was only at offset 95

Then when you move Consumer A to the Destination, it may read offsets 96-100 again, resulting in duplicate reads.

Stopping a mirrored topic clamps consumer offsets

The stop command “clamps” consumer offsets.

This means that, when you run topic mirror stop, if:

  • Consumer A was on source offset 105 – and that was successfully mirrored to the Destination, and
  • the data on the Destination was lagging and was only up to offset 100 (so it did not have offsets 101-105)

then when you call stop, Consumer A’s offset on the Destination will be “clamped” down to offset 100, since that is the highest available offset on the Destination.

Note that this will cause Consumer A to “re-consume” offsets 101-105. If your producers send new, fresh data to the Destination, then Consumer A will not read duplicate data. (However, if you had custom-coded your producers to re-send offsets 101-105 with the same data, then your consumers could read the same data twice. This is a rare case, and is likely not how you have designed your system.)

Use consumer.offset.sync.ms

Keep in mind that you can configure consumer.offset.sync.ms to suit your needs (default is 30 seconds). A more frequent sync might give you a better failover point for your consumer offsets, at the cost of bandwidth and throughput during normal operation.
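For example, to tighten the interval to 10 seconds on an existing link, you could use the update subcommand from the --help output with the same --config flag used in the consumer group example below (exact property handling is an assumption; verify the result with ccloud kafka link describe):

ccloud kafka link update usa-east-west --cluster <dst-cluster-id> \
  --config consumer.offset.sync.ms=10000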

Migrate a consumer group

To migrate a consumer group called <consumer-group-name> from one cluster to another, stop the consumers and update the cluster link to stop mirroring the consumer offsets:

ccloud kafka link update <link-name> --cluster <dst-cluster-id> --config \
consumer.offset.group.filters="{\"groupFilters\": \
[{\"name\": \"*\",\"patternType\": \"LITERAL\",\"filterType\": \"INCLUDE\"},\
{\"name\":\"<consumer-group-name>\",\"patternType\":\"LITERAL\",\"filterType\":\"EXCLUDE\"}]}"

Then, point your consumers at the destination, and they will restart at the offsets where they left off.

Migrate a producer

To migrate a producer:

  1. Stop the producer.

  2. Make the destination topic writable:

    $ ccloud kafka topic mirror stop <mirrored-topic-name> --cluster <dst-cluster-id>
    
  3. Point your producer at the destination cluster.

Suggested Resources