Confluent is incrementally adding multi-cloud, multi-region, and hybrid capabilities to Confluent Cloud,
starting with this limited access introduction of Cluster Linking.
What is Cluster Linking?
Cluster Linking allows you to directly connect clusters together and mirror
topics from one cluster to another. You can think of a “cluster link” as a
bridge that connects one cluster to another. Topics can share data across the
bridge. Data moves from a topic on the source cluster to a “mirrored” topic on
the destination cluster by means of the cluster link.
Mirrored topics are created on the destination based on an original topic on the
source and a specified cluster link to use to share data. Consumers on the
destination cluster can read from local, read-only, mirrored topics to get data
produced to the source cluster.
If the original topic on the source cluster becomes unavailable for any reason, you can
stop mirroring that topic and convert the mirror into a read/write topic on the destination.
Cluster Linking supports various use cases for multi-cluster, multi-region and
hybrid cloud deployments.
First Look
Clusters enrolled in the Early Access program can use the ccloud kafka link
command to create a link from the destination to the source cluster.
Tip
- If you are not signed up for Early Access but would like to participate, email clusterlinking@confluent.io.
- If you are signed up for Early Access but find that Cluster Linking is not enabled for you, file a Support ticket so we can enable your cluster. Provide the pkc (the cluster endpoint ID) for your destination cluster in the ticket. To learn how to get this, read step 5 of Identify Source and Destination, API keys, and Endpoints.
To try out Cluster Linking on Confluent Cloud:
Make sure you are signed up for the Early Access program.
Log on to Confluent Cloud.
Update your Confluent Cloud CLI to the latest version by using the command ccloud update.
Verify that Cluster Linking is enabled by typing the ccloud kafka link command with no flags, or append the --help flag.
Your output should resemble:
$ ccloud kafka link --help
Manages inter-cluster links.
Usage:
ccloud kafka link [command]
Available Commands:
create Create a new cluster link.
delete Delete a previously created cluster link.
describe Describes a previously created cluster link.
list List previously created cluster links.
update Updates a property for a previously created cluster link.
Global Flags:
-h, --help Show help for this command.
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace).
Use "ccloud kafka link [command] --help" for more information about a command.
Did you know you can use the `ccloud feedback` command to send the team feedback?
Let us know if the CLI is meeting your needs, or what we can do to improve it.
Follow the steps in the Tutorial below to try out the feature.
Tutorial
For this tutorial, you will:
- Create two clusters, one of which will serve as the source and the other as the destination cluster. The destination cluster must be a Dedicated cluster.
- Set up a cluster link.
- Create a topic mirror based on a topic on the source cluster.
- Produce data to the original source topic.
- Consume data on the mirror topic (destination) over the link.
- Stop mirroring the destination topic, which will change it from read-only to read/write.
Let’s get started!
Set up two clusters
If you already have two Confluent Cloud clusters set up, one of which is a
Dedicated cluster to use as the destination, you can
skip to the next task.
Otherwise, set up your clusters as follows.
Log on to the Confluent Cloud web UI.
Create two clusters in the same environment, as described in Create a Cluster in Confluent Cloud.
At least one of these must be a Dedicated cluster, which will serve as the destination cluster.
For example, you could create a Basic cluster called US-EAST to use as the source, and a Dedicated cluster called US-WEST to use as the destination.
When you have completed these steps, you should have two clusters, similar to the following.
Populate the Source Cluster
Create a topic on the source cluster.
For example, create a topic called tasting-menu on US-EAST (the Basic cluster that will act as the source).
To add a topic from the Web UI, navigate to the Topics page on the source cluster (US-EAST > Topics), click Add a topic, fill in the topic name, and click Create with defaults.
To add a topic from the Confluent Cloud CLI, log in to the CLI (ccloud login), select the environment and cluster you want to use, and enter the command ccloud kafka topic create <topic>. For example:
ccloud kafka topic create tasting-menu
More detail about working with the Confluent Cloud CLI is provided in the next tasks; if you don't yet know how to select an environment or cluster on the CLI, this is explained below.
Identify Source and Destination, API keys, and Endpoints
Log on to the Confluent Cloud CLI.
View environments, and select the one you want to use by ID.
An asterisk indicates the currently selected environment in the list. You can select a different environment as follows.
ccloud environment use <environment-ID>
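For example, to list your environments and then switch to one (the environment ID shown here is a placeholder; use an ID from your own list):
ccloud environment list
ccloud environment use env-12345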
View your clusters.
ccloud kafka cluster list
Your output should resemble:
$ ccloud kafka cluster list
Id | Name | Type | Provider | Region | Availability | Status
+-------------+---------+-----------+----------+----------+--------------+--------+
* lkc-161v5 | US-WEST | DEDICATED | gcp | us-west1 | single-zone | UP
lkc-7k6kj | US-EAST | BASIC | gcp | us-east1 | single-zone | UP
Decide which cluster to use as the Destination cluster and which to use as the Source cluster,
and note down the cluster IDs, as you will need them later. For example, using the IDs shown above:
My Destination Cluster ID: lkc-161v5 (DEDICATED)
My Source Cluster ID: lkc-7k6kj (BASIC)
In the examples for this tutorial, <dst-cluster-id> and <src-cluster-id> indicate your destination and source cluster IDs, respectively.
- Data will be mirrored from a topic on the source cluster to a topic on the destination cluster.
- The destination cluster must be a Dedicated cluster.
- The source cluster must be one of:
- BASIC
- BASIC_LEGACY
- STANDARD
- DEDICATED-with-Public-Internet-Networking
Describe the source cluster.
ccloud kafka cluster describe <src-cluster-id>
Your output will resemble:
$ ccloud kafka cluster describe lkc-7k6kj
+--------------+--------------------------------------------------------+
| Id | lkc-7k6kj |
| Name | US-EAST |
| Type | BASIC |
| Ingress | 100 |
| Egress | 100 |
| Storage | 5000 |
| Provider | gcp |
| Availability | single-zone |
| Region | us-east1 |
| Status | UP |
| Endpoint | SASL_SSL://pkc-4yyd6.us-east1.gcp.confluent.cloud:9092 |
| ApiEndpoint | https://pkac-ew1dj.us-east1.gcp.confluent.cloud |
+--------------+--------------------------------------------------------+
Note the source cluster Endpoint (without the SASL_SSL:// prefix), as you will need this later on.
For example, the Endpoint from the above cluster description is: pkc-4yyd6.us-east1.gcp.confluent.cloud:9092
You will need two API key and secret pairs: one for the destination cluster and one for the source cluster.
You can use API keys you've already created (even ones associated with a service account), or create new ones now from the CLI with this command, using the cluster ID as the value for the --resource flag.
ccloud api-key create --resource <cluster-id>
Your output for each will resemble:
$ ccloud api-key create --resource lkc-7k6kj
It may take a couple of minutes for the API key to be ready.
Save the API key and secret. The secret is not retrievable later.
+---------+------------------------------------------------------------------+
| API Key | keykeykeykeykeyk |
| Secret | +secretsecretsecretsecretsecretsecretsecretsecretsecretsecretsec |
+---------+------------------------------------------------------------------+
This tutorial refers to the destination's set as <dst-api-key> and <dst-api-secret>, and the source's set as <src-api-key> and <src-api-secret>.
Keep these in a safe place; you cannot retrieve your secrets from the CLI later, and you will need both sets of API keys and secrets later on.
Tip
- The source’s set of API key and secret will be stored on the cluster link and used to fetch data. If you ever revoke that API key’s permissions, your link will stop working. In that case, you would have to edit the cluster link and give it a different API key and secret, as sketched below.
- If you want to restrict the ACLs that your API keys have access to, you can find the exact set of required ACLs in the Confluent Platform documentation, under Cluster Linking Security, at Authorization (ACLs).
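For instance, if you ever need to swap in a new credential, a rough sketch would be to update the link's stored JAAS configuration using the link update command shown in the CLI help above. This is an assumption for illustration only; the exact updatable property names are not confirmed by this guide, so check the Cluster Linking documentation before relying on it:
ccloud kafka link update <link-name> --cluster <dst-cluster-id> --config sasl.jaas.config="org.apache.kafka.common.security.plain.PlainLoginModule required username='<new-src-api-key>' password='<new-src-api-secret>';"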
Set up a config file to authenticate to the Source Cluster
You will need a configuration file to authenticate into the source cluster. This file must have a .config extension. Use your favorite text editor to add the file to your working directory.
Specify details of the source cluster in a file called source.config.
Copy this starter text into source.config and replace <src-bootstrap-url>, <src-api-key>, and <src-api-secret> with the values for your source cluster.
bootstrap.servers=<src-bootstrap-url>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<src-api-key>' password='<src-api-secret>';
For example, given the Endpoint in the previous section, the bootstrap.servers line would be bootstrap.servers=pkc-4yyd6.us-east1.gcp.confluent.cloud:9092.
Important
- The last entry must be all on one line, from sasl.jaas.config all the way to password='<src-api-secret>';. Do not add line breaks, as this will cause the configs to break.
- The configuration options are case-sensitive. Be sure to use upper and lower case as shown in the example.
- Use punctuation marks such as single quotes and semicolons exactly as shown.
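Putting it together with the example Endpoint and the placeholder API key and secret shown earlier, a completed source.config would look like this (the key and secret here are placeholders, not real credentials):
bootstrap.servers=pkc-4yyd6.us-east1.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='keykeykeykeykeyk' password='+secretsecretsecretsecretsecretsecretsecretsecretsecretsecretsec';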
(Optional) Sync Consumer Group Offsets and ACLs
The source.config file can also take these parameters, which you can use to sync consumer group offsets and ACLs. Optionally configure any of these additional parameters, and save the file.
consumer.offset.sync.enable
- Whether or not to sync consumer offsets from the source to the destination.
- Type: boolean
- Default: false
consumer.offset.group.filters
- JSON representation of a regex pattern-matching scheme to specify the consumer groups whose offsets you want to mirror from the source to the destination (see the sketch after this list). Make sure you do not have the same consumer group running on the source and the destination, because the consumer offsets will overwrite one another. Examples are here.
acl.sync.enable
- Whether or not to sync ACLs from the source to the destination. Examples are here.
- Type: boolean
- Default: false
acl.filters.json
- JSON representation of a regex pattern-matching scheme to select the ACLs that should be synced.
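For example, to mirror offsets for all consumer groups, you could add lines like these to source.config. The groupFilters JSON shape is taken from the consumer group migration example at the end of this page; treat the exact property formatting as an assumption and adjust it to your needs:
consumer.offset.sync.enable=true
consumer.offset.group.filters={"groupFilters": [{"name": "*","patternType": "LITERAL","filterType": "INCLUDE"}]}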
Specify the API key to use for the Destination Cluster
Tell the CLI to use your destination API key for the destination cluster:
ccloud api-key use <dst-api-key> --resource <dst-cluster-id>
You will get a verification that the API key is set as the active key for the given cluster ID.
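For example, using the destination cluster ID from this tutorial:
ccloud api-key use <dst-api-key> --resource lkc-161v5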
Note
This is a one-time action that will persist forever. This API key will be used
whenever you perform one-time actions on your destination cluster. It will not
be stored on the cluster link. If you create a cluster link with this API key,
then it will continue to run even if you later disable this API key.
Create a Cluster Link
Create a link from the destination cluster to the source cluster. (The link itself will reside on the destination cluster.)
ccloud kafka link create <link-name> --cluster <dst-cluster-id> --source_cluster <src-bootstrap-url> --config-file <source.config>
Replace <link-name> with whatever you would like to name your link. You will use this name whenever you perform actions on the link.
For example, the following creates a cluster link called usa-east-west on the DEDICATED cluster (lkc-161v5) used for our destination.
Note that you specify the cluster ID for the destination (where the link will be created), and the bootstrap URL and config file for the source cluster.
ccloud kafka link create usa-east-west --cluster lkc-161v5 --source_cluster pkc-4yyd6.us-east1.gcp.confluent.cloud:9092 --config-file source.config
List existing cluster links on the destination cluster.
ccloud kafka link list --cluster <dst-cluster-id>
$ ccloud kafka link list --cluster lkc-161v5
LinkName
+------------------+
usa-east-west
If you have multiple cluster links on the given cluster, all links will be listed.
Describe a given cluster link to get full details.
ccloud kafka link describe <link-name> --cluster <dst-cluster-id>
For example, here is the output describing the usa-east-west link:
$ ccloud kafka link describe usa-east-west --cluster lkc-161v5
Key | Value
+----------------------------------------+---------------------------------------------+
connections.max.idle.ms | 600000
ssl.endpoint.identification.algorithm | https
num.cluster.link.fetchers | 1
sasl.mechanism | PLAIN
replica.socket.timeout.ms | 30000
socket.connection.setup.timeout.ms | 10000
consumer.offset.sync.enable | false
acl.sync.enable | false
consumer.offset.group.filters |
acl.filters |
request.timeout.ms | 30000
replica.fetch.wait.max.ms | 500
cluster.link.retry.timeout.ms | 300000
ssl.protocol | TLSv1.3
ssl.cipher.suites |
ssl.enabled.protocols | TLSv1.2,TLSv1.3
security.protocol | SASL_SSL
replica.fetch.max.bytes | 1048576
consumer.offset.sync.ms | 30000
topic.config.sync.ms | 5000
acl.sync.ms | 5000
replica.fetch.response.max.bytes | 10485760
metadata.max.age.ms | 300000
replica.socket.receive.buffer.bytes | 65536
bootstrap.servers | pkc-4yyd6.us-east1.gcp.confluent.cloud:9092
retry.backoff.ms | 100
sasl.jaas.config |
replica.fetch.backoff.ms | 1000
socket.connection.setup.timeout.max.ms | 127000
replica.fetch.min.bytes | 1
client.dns.lookup | use_all_dns_ips
Mirror a topic
Now that you have a cluster link, you can mirror topics across it, from source to destination.
List the topics on the source cluster.
ccloud kafka topic list --cluster <src-cluster-id>
For example:
$ ccloud kafka topic list --cluster lkc-7k6kj
Name
+--------------+
stocks
tasting-menu
transactions
Create a mirrored topic.
Choose a source topic to mirror and use your cluster link to mirror it.
Tip
If you don’t already have a topic in mind, create one on the source cluster now with ccloud kafka topic create <topic-name> --cluster <src-cluster-id>. If you’ve been following along, use tasting-menu.
You create mirrored topics on the destination cluster just as you would create a normal topic, but with a few extra parameters:
ccloud kafka topic create <topic-name> --link <link-name> --mirror-topic <source-topic-name> --cluster <dst-cluster-id>
For example:
$ ccloud kafka topic create tasting-menu --link usa-east-west --mirror-topic tasting-menu --cluster lkc-161v5
Created topic "tasting-menu".
Note
- The mirror topic name (on the Destination) must be the same as the Source topic name. Topic renaming is not yet supported.
- Make sure that you use the Destination cluster ID in the command to create the mirror topic.
Test the topic mirror by sending data
With the cluster link available and a mirrored topic configured on the Destination, you can test mirroring and linking end-to-end.
Open two new command windows for a producer and consumer.
In each of them, log on to Confluent Cloud, and make sure you are using the environment that contains both your Source and Destination clusters.
As before, use the commands ccloud environment list, ccloud environment use <environment-ID>, and ccloud kafka cluster list to navigate and verify where you are.
In one of the windows, start a producer to produce to your source topic.
ccloud kafka topic produce <topic-name> --cluster <src-cluster-id>
In the other window, start a consumer to read from your mirrored topic.
ccloud kafka topic consume <topic-name> --cluster <dst-cluster-id>
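For example, with the topic and cluster IDs used throughout this tutorial, the two commands would be:
ccloud kafka topic produce tasting-menu --cluster lkc-7k6kj
ccloud kafka topic consume tasting-menu --cluster lkc-161v5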
Type entries to produce in the first terminal on your source and watch the messages appear in your second terminal on the mirrored topic on the destination.
You can even open another command window and start a consumer for the source cluster to verify that you are producing directly to the source topic.
Both the source and mirrored topic consumers will match, showing the same data consumed.
Tip
The consumer command example shown above reads data from a topic in real time. To consume from the beginning: ccloud kafka topic consume --from-beginning <topic> --cluster <cluster-id>
Stop the topic mirror
There may come a point when you want to stop mirroring your topic. For example,
if you complete a cluster migration, or need to failover to your destination
cluster in a disaster event, you may need to stop mirroring topics on the
destination.
You can stop mirroring on a per-topic basis. The destination’s mirrored topic
will stop receiving new data from the source, and become a standard, writable
topic into which your producers can send data. No topics or data will be
deleted, and this will not affect the source cluster.
To stop mirroring a specific mirror topic on the destination cluster, use the
following command:
ccloud kafka topic mirror stop <mirrored-topic-name> --cluster <dst-cluster-id>
To stop mirroring the topic tasting-menu using the destination cluster ID from the examples:
$ ccloud kafka topic mirror stop tasting-menu --cluster lkc-161v5
Stopped mirroring for topic "tasting-menu".
Note
This command may be renamed in future CLI releases.
What happens when you stop mirroring a topic
The topic mirror stop command immediately stops new data mirroring from the source to the destination for the specified topic. If consumer.offset.sync.enable is on, consumer offset mirroring is also stopped. (See Set up a config file to authenticate to the Source Cluster.)
If there is any lag between the source cluster and the destination cluster
(either data or consumer offsets) when you run stop, that lag will never be
mirrored to the destination cluster. The lag will remain only on the source
cluster. This action is not reversible.
How to restart mirroring for a topic
To restart mirroring for that topic, you will need to delete the destination
topic, and then recreate the destination topic as a mirror.
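For example, to restart mirroring for tasting-menu over the link from this tutorial, the sequence would look roughly like this (a sketch; ccloud kafka topic delete is a standard CLI command, but confirm it is available on your cluster before relying on it):
ccloud kafka topic delete tasting-menu --cluster lkc-161v5
ccloud kafka topic create tasting-menu --link usa-east-west --mirror-topic tasting-menu --cluster lkc-161v5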
Migration Best Practices
If you are migrating data from source to destination, and you want to make sure
no lagged data is lost, you may want to stop producers first and make sure any
lag is mirrored before stopping the topic mirror:
Stop your producers on your source cluster.
Wait for any lag to be mirrored.
Tip
Look at the end offsets for both the source and mirrored topic (high watermark) and make sure they are both at the same offset.
Run the topic mirror stop command.
Failover Considerations
If you’re failing over from source to destination because of a disaster event,
please note these considerations.
Order of actions and promoting the Destination as post-failover active cluster
You should first stop mirrored topics, and then move all of your producers and
consumers over to the destination cluster. The destination cluster should become
your new active cluster, at least for the duration of the disaster and the
recovery. If it works for your use case, we suggest making the Destination
cluster your new, permanent active cluster.
Recover lagged data
There may be lagged data that did not make it to the destination before the
disaster occurred. When you move your consumers, if any had not already read
that data on the source, then they will not read that data on the destination.
If and when the disaster resolves and your source cluster recovers, that lagged data will still be there, so you are free to consume it or handle it as fits your use case.
For example, if the Source was up to offset 105, but the Destination was only up
to offset 100, then the source data from offsets 101-105 will not be present on
the Destination. The Destination will get new, fresh data from the producers
that will go into its offsets 101-105. When the disaster resolves, the Source
will still have its data from offsets 101-105 available to consume manually.
Lagged consumer offsets may result in duplicate reads
There may be lagged consumer offsets that did not make it to the destination
before the disaster occurred. If this is the case, then when you move your
consumers to the destination, they may read duplicate data.
For example, if at the time that you stop your mirroring:
- Consumer A had read up to offset 100 on the Source
- Cluster Linking had mirrored the data through offset 100 to the Destination
- Cluster Linking had last mirrored consumer offsets that showed Consumer A was only at offset 95
Then when you move Consumer A to the Destination, it may read offsets 96-100 again, resulting in duplicate reads.
Stopping a mirrored topic clamps consumer offsets
The stop command “clamps” consumer offsets.
This means that, when you run topic mirror stop, if:
- Consumer A was on source offset 105 – and that was successfully mirrored to the Destination, and
- the data on the Destination was lagging and was only up to offset 100 (so it did not have offsets 101-105)
then when you call stop, Consumer A’s offset on the Destination will be “clamped” down to offset 100, since that is the highest available offset on the Destination.
Note that this will cause Consumer A to “re-consume” offsets 101-105. If your
producers send new, fresh data to the Destination, then Consumer A will not read
duplicate data. (However, if you had custom-coded your producers to re-send
offsets 101-105 with the same data, then your consumers could read the same data
twice. This is a rare case, and is likely not how you have designed your
system.)
Use consumer.offset.sync.ms
Keep in mind that you can configure consumer.offset.sync.ms to suit your needs (the default is 30 seconds). A more frequent sync might give you a better failover point for your consumer offsets, at the cost of bandwidth and throughput during normal operation.
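For example, a sketch of tightening the sync interval to 10 seconds on the tutorial's link, using the link update command from the CLI help above (whether this particular property can be updated in place is an assumption here):
ccloud kafka link update usa-east-west --cluster lkc-161v5 --config consumer.offset.sync.ms=10000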
Migrate a consumer group
To migrate a consumer group called <consumer-group-name> from one cluster to another, stop the consumers and update the cluster link to stop mirroring the consumer offsets:
ccloud kafka link update <link-name> --cluster <dst-cluster-id> --config \
consumer.offset.group.filters="{\"groupFilters\": \
[{\"name\": \"*\",\"patternType\": \"LITERAL\",\"filterType\": \"INCLUDE\"},\
{\"name\":\"<consumer-group-name>\",\"patternType\":\"LITERAL\",\"filterType\":\"EXCLUDE\"}]}"
Then, point your consumers at the destination, and they will restart at the offsets where they left off.