When migrating data from an old cluster to a new cluster, Cluster Linking
makes an identical copy of your topics on the new cluster, so it’s easy to make
the move with low downtime and no data loss.
Cluster Linking can:
- Automatically create matching “mirror” topics with the same configurations, so you don’t have to recreate your topics by hand
- Sync all historical data and new data from the existing topics to the new mirror topics
- Sync your consumer offsets, so your consumers can pick up exactly where they left off, without missing any messages or consuming any duplicates
- Move consumers from the old cluster to the new cluster independently
- Move producers from the old cluster to the new cluster topic-by-topic
Standard Migration with Cluster Linking
The sections below describe the general steps to migrate data from one cluster to another using Cluster Linking.
Step 1: Create a cluster link across two clusters
Create a cluster link with the following configurations:
Enable auto-create mirror topics.
This configures Cluster Linking to automatically create mirror topics on your new cluster for any existing topics on your old cluster.
You can filter by specific prefixes, if needed.
Alternatively, you can create individual mirror topics by CLI or REST API after you’ve created the cluster link.
Enable consumer offset sync (see consumer offset configs in Configuring Cluster Link Behavior).
This syncs your consumer offsets from your old cluster to your new cluster. You can filter by specific consumer group
names or prefixes, if needed.
By default, this sync happens every 30 seconds. You can set it as low as 1 second, to minimize consumer downtime when switching from the old cluster to the new cluster.
Consumer offsets are part of the data that your cluster link mirrors, so syncing them more frequently comes at the cost of higher data throughput.
You can monitor your total data throughput in Confluent Cloud using the metrics described in Metrics and Monitoring.
If migrating between two Confluent Cloud clusters, or two Confluent Platform / Apache Kafka® clusters with the same security system, enable ACL sync.
You can filter by specific resources, principals, and so on, if needed.
This is not helpful when migrating to Confluent Cloud from a different platform because Confluent Cloud uses its own authentication system.
When using Schema Linking: To use a mirror topic that has a schema with Confluent Cloud Connect, ksqlDB, broker-side schema validation,
or the topic viewer, make sure that Schema Linking
puts the schema in the default context of the Confluent Cloud Schema Registry. To learn more, see
How Schemas work with Mirror Topics.
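As a rough illustration of the Step 1 settings, the sketch below assembles a properties-style configuration for a cluster link. Apart from consumer.offset.sync.ms, the property names and the JSON filter layout are assumptions that are not spelled out on this page, so confirm them against Configuring Cluster Link Behavior before use. The helper build_link_config and the orders. prefix are hypothetical. ACL sync is left out of the sketch.

```python
import json


def build_link_config(topic_prefix=None, group_prefix=None, offset_sync_ms=30_000):
    """Return cluster link settings as properties-file text (illustrative only)."""

    def _include_rule(name, prefixed):
        # One INCLUDE rule. "*" with LITERAL is assumed to mean "match everything".
        return {
            "name": name,
            "patternType": "PREFIXED" if prefixed else "LITERAL",
            "filterType": "INCLUDE",
        }

    topic_rule = _include_rule(topic_prefix or "*", bool(topic_prefix))
    group_rule = _include_rule(group_prefix or "*", bool(group_prefix))

    # Property names below are assumptions, except consumer.offset.sync.ms.
    props = {
        "auto.create.mirror.topics.enable": "true",
        "auto.create.mirror.topics.filters": json.dumps({"topicFilters": [topic_rule]}),
        "consumer.offset.sync.enable": "true",
        "consumer.offset.sync.ms": str(offset_sync_ms),
        "consumer.offset.group.filters": json.dumps({"groupFilters": [group_rule]}),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Mirror only topics starting with "orders." and sync offsets every second.
print(build_link_config(topic_prefix="orders.", offset_sync_ms=1000))
```

Lowering offset_sync_ms shortens the wait before consumers can move, at the cost of extra throughput on the link, as described above.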
Step 2: Wait for mirroring lag to approach zero (0)
When mirroring lag is almost zero (0), this means that the existing data in your topics has been mirrored to your new cluster.
This allows you to switch your consumers and producers with minimal downtime.
If you want certain topics to be ready before others, you can prioritize those
topics by pausing mirroring on the other topics. That way, more throughput will
be allocated to the topics you prioritize.
If your cluster link is having trouble keeping up with the incoming data and is
not able to get mirroring lag near 0, you may need to prioritize certain topics
by pausing mirroring on the other topics.
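The prioritization described above can be sketched as a small decision helper. It assumes you have already collected the highest per-partition mirroring lag for each topic from your monitoring. The function name, the threshold, and the sample numbers are all hypothetical, and the actual pause would still go through the CLI or REST API.

```python
def topics_to_pause(max_lag_by_topic, priority_topics, lag_threshold=1000):
    """Return non-priority topics to pause while any priority topic still lags.

    max_lag_by_topic: {topic: highest per-partition mirroring lag, in messages}
    """
    lagging = [
        topic for topic in priority_topics
        if max_lag_by_topic.get(topic, 0) > lag_threshold
    ]
    if not lagging:
        return []  # Priority topics have caught up, so nothing needs pausing.
    return sorted(t for t in max_lag_by_topic if t not in priority_topics)


lags = {"orders": 250_000, "clicks": 40_000, "audit": 12}
print(topics_to_pause(lags, priority_topics={"orders"}))  # ['audit', 'clicks']
```

Once the prioritized topics are near zero lag, you would resume mirroring on the paused topics so they can catch up as well.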
(Optional) Step 3: Move consumer groups from the old cluster to the new cluster
You can move each consumer group independently, if you wish. Because consumer
offsets are synced, consumers will pick up from the same spot where they left
off. To move a consumer group, follow these steps:
Stop the consumer group on the old cluster.
Wait for at least consumer.offset.sync.ms (default is 30 seconds) to ensure that its latest offsets have been synced.
Exclude that consumer group’s name from the cluster link’s consumer group filter.
Verify that the offsets the consumer group has reached have been mirrored to the mirror topic.
You can do this by checking that consumer lag > mirroring lag.
Before you start the consumer on the new cluster, make sure the mirror topic has caught up to the consumer’s offsets.
If the consumer is ahead of the mirroring, its offsets are reset to the latest offsets in the mirror topic, and it consumes duplicates.
For example, if the consumer is at offset 100 for a partition but you start it when the mirror topic is only at offset 90,
the consumer starts from the end of the mirror topic and re-consumes messages 90-100.
Restart the consumer group on the new cluster.
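The verification step above comes down to a per-partition comparison, sketched here. It assumes you have already fetched the group’s committed offsets and the mirror topic’s end offsets (for example, from your client or CLI tooling). The function name and the sample offsets are hypothetical.

```python
def partitions_ahead_of_mirror(committed_offsets, mirror_end_offsets):
    """Return {partition: gap} for partitions where the group is past the mirror's end."""
    ahead = {}
    for partition, committed in committed_offsets.items():
        mirrored = mirror_end_offsets.get(partition, 0)
        if committed > mirrored:
            ahead[partition] = committed - mirrored
    return ahead


committed = {0: 100, 1: 80}   # where the consumer group stopped on the old cluster
mirror_end = {0: 90, 1: 85}   # how far the mirror topic has been filled
print(partitions_ahead_of_mirror(committed, mirror_end))  # {0: 10}
```

An empty result means it is safe to restart the group on the new cluster. A non-empty result, like partition 0 in the example, means you should wait for mirroring to catch up first.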
Step 4: Stop producers and consumers
Stop all producers and any remaining consumers. This gives the cluster link a chance to “catch up,” without new messages coming in.
Step 6: Restart producers and consumers
Wait for the promotion to complete and the mirror topics to enter the
STOPPED state. You can find the mirror topic state through the REST API or CLI by describing an
individual mirror topic or all mirror topics on a cluster link. In the Confluent Cloud Console,
a STOPPED mirror topic appears as a regular topic and is no
longer displayed as a mirror topic.
Once mirror topics are in the STOPPED state, you can restart producers and consumers against them on the new cluster.
Producing to a mirror topic that is still in the PENDING_STOPPED state can cause messages to fail;
consuming from a mirror topic that is still in the PENDING_STOPPED state can cause the consumer to consume duplicate messages.
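One way to avoid restarting too early is to poll the mirror topic states until every topic reports STOPPED. The sketch below assumes a caller-supplied fetch_states function that wraps whatever REST API or CLI call you use to describe mirror topics. That function, the timeout values, and the orders topic are hypothetical.

```python
import time


def wait_until_stopped(fetch_states, topics, timeout_s=300.0, poll_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until every topic in `topics` is STOPPED, or raise TimeoutError.

    fetch_states: callable returning {topic_name: state_string}.
    """
    deadline = clock() + timeout_s
    while True:
        states = fetch_states()
        pending = sorted(t for t in topics if states.get(t) != "STOPPED")
        if not pending:
            return True
        if clock() >= deadline:
            raise TimeoutError(f"mirror topics not yet STOPPED: {pending}")
        sleep(poll_s)


# Simulated responses: the topic is still promoting on the first poll.
_responses = iter([{"orders": "PENDING_STOPPED"}, {"orders": "STOPPED"}])
print(wait_until_stopped(lambda: next(_responses), ["orders"], sleep=lambda s: None))
```

Only after this returns would you restart producers and consumers on the new cluster.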
You have now moved your topics, producers, and consumers to a new cluster.
Alternate Migration Strategies
If you cannot move all of the producers for a given topic (or set of topics) at the same time,
you can consider two alternate approaches. Both involve more hands-on work than
the standard migration approach with Cluster Linking.
Repartitioning and renaming topics in a migration
Cluster Linking preserves the same number of partitions on any topics it
mirrors. It also keeps the topic name the same, though you can optionally add a
prefix before the name.
Here are the two options for migrations that need partition changes or name changes
for topics being migrated:
- (Recommended Approach) Use Cluster Linking to mirror the topics byte-for-byte to the new cluster. Then, use Confluent Cloud ksqlDB
to repartition each of the topics into your desired number of partitions. The advantage of this approach
is that both of these tools are fully managed and API-driven. Keep these points in mind:
- If you have many topics, you may want to do this in batches.
- If you want to change the topic name after the migration, ksqlDB can do that.
- If you want to keep the topic name the same after the migration, then have the
cluster link add a prefix to the topic names, and have ksqlDB create a repartitioned
topic using the original name.
- (Alternate Approach) Deploy Confluent Replicator for the migration. Replicator can inherently change the
number of partitions and/or the names of the topics that you want to replicate. Keep these points in mind:
- Replicator is not a fully managed SaaS service. It is software that you must deploy, manage, and monitor across
multiple nodes and VMs that you own, or on a Kubernetes cluster using Confluent for Kubernetes. This is a large investment
that takes more effort and Kafka expertise than setting up Cluster Linking.
- A Replicator license to use temporarily during migrations is available to all Confluent customers with a commit to Confluent Cloud.
In either case, you will need to develop a custom strategy for moving consumer groups
from the old cluster to the new cluster. None of these technologies — Cluster Linking, ksqlDB, or Replicator —
can correctly translate consumer offsets for you, since the partitions are changing and
thus the offsets are not consistent.
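To see why offsets cannot be carried over when the partition count changes, consider the toy model below. It places the same sequence of keyed messages into 6 and then 12 partitions and records the (partition, offset) each message receives. The partitioner used here (key modulo partition count) is a deliberate simplification and not Kafka’s actual hashing. The keys and partition counts are hypothetical.

```python
from collections import defaultdict


def assign_offsets(keys, num_partitions):
    """Append one message per key, in order; return {message_index: (partition, offset)}."""
    next_offset = defaultdict(int)
    placement = {}
    for index, key in enumerate(keys):
        partition = key % num_partitions  # Stand-in partitioner, NOT Kafka's real hash.
        placement[index] = (partition, next_offset[partition])
        next_offset[partition] += 1
    return placement


keys = [1, 7, 1, 13, 7]
old = assign_offsets(keys, 6)    # every key lands in partition 1
new = assign_offsets(keys, 12)   # key 7 moves to partition 7

for index in range(len(keys)):
    print(index, old[index], new[index])
```

In this model, a consumer that committed offset 3 on partition 1 of the old topic has read the first three messages, but on the repartitioned topic those messages sit at offsets 0 and 1 of partition 1 and offset 0 of partition 7. No single offset on the new topic corresponds to the old position.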
Bidirectional with Cluster Linking
Mirror topics are read-only. Therefore, if you migrate some producers but not
others, you cannot have those producers writing to the mirror topic on the new
cluster. You’ll need a different topic to write these events to, and you’ll need
to sync those events back to your old cluster for any consumers that haven’t
moved yet. You’ll also need to make some changes to your consumers to make sure
they get the events produced to both clusters.
For a given topic, you can set up three new topics:
- A new, regular topic on your new cluster with the same name. This topic will receive new events produced to your new cluster.
- A mirror topic on your new cluster, which mirrors historical data and any new events produced to the old cluster.
You’ll give this topic a prefix, so it doesn’t clash with the writable topic.
Prefixing is available in Confluent Cloud as of early Q2 2022.
- A mirror topic on your old cluster, which mirrors the writable topic from the new cluster.
This brings new events back to your old cluster for straggling consumers.
There are several changes you need to make to your consumers to make this work:
- Your consumers need to consume from a regex pattern, instead of a topic name, that captures both topics.
For example, if your topic is named clicks, the consumers could subscribe to a pattern
that matches both clicks and the prefixed mirror topic.
- When moving a consumer group, it will need to manually set its offsets on the new cluster for the writable topic.
Because this is moving “upstream,” the cluster link does not sync its consumer offsets from the mirror topic on the old cluster.
- Because your consumers are consuming from two different topics, you cannot rely on the partitions for ordering.
A message with a given key will be produced to one partition on the old cluster and a different partition on the new cluster.
Two messages with the same key may be read in different order by different consumers. So, your consumers must use something else
to determine message order, such as the timestamp.
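The consumer changes above can be illustrated with a short sketch. It assumes a mirror topic prefix of old. (a hypothetical value; use whatever prefix your cluster link applies) and events that carry a timestamp field. It shows only the pattern matching and the timestamp-based ordering, not a real consumer, and your Kafka client’s regex dialect may differ from Python’s.

```python
import re

MIRROR_PREFIX = "old."  # Hypothetical prefix added by the cluster link.

# Matches the writable topic "clicks" and the prefixed mirror topic "old.clicks".
TOPIC_PATTERN = re.compile(rf"^({re.escape(MIRROR_PREFIX)})?clicks$")


def merge_by_timestamp(*streams):
    """Combine events from several topics and order them by timestamp."""
    events = [event for stream in streams for event in stream]
    return sorted(events, key=lambda event: event["timestamp"])


for name in ["clicks", "old.clicks", "clicks-archive"]:
    print(name, bool(TOPIC_PATTERN.match(name)))

from_mirror = [{"timestamp": 1, "value": "a"}, {"timestamp": 4, "value": "d"}]
from_writable = [{"timestamp": 2, "value": "b"}, {"timestamp": 3, "value": "c"}]
print([e["value"] for e in merge_by_timestamp(from_mirror, from_writable)])
```

Ordering by timestamp is only as reliable as the clocks that set those timestamps, so treat it as a starting point and not a guarantee.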
Bidirectional with Replicator
Confluent Replicator is available as a free license to Confluent Cloud commit customers. It is a
piece of software that can be run in VMs or in a Kubernetes cluster. It syncs
messages between topics in two different clusters.
You can set up two deployments of Replicator to achieve bi-directional replication.
Replicator ensures that no cyclical loops are created; that is, that the same
message doesn’t get replicated back to the original cluster where it was produced.
However, the ordering between these two topics will not be the same. That means
it is impossible for a consumer to move from the `old` cluster to the `new` cluster
and pick up at the same spot where it left off. The consumer must choose to either:
- Rewind to an earlier offset in order to ensure that no messages are missed. However, this will cause the consumer to consume duplicates of some messages. Or,
- Start consuming at the end of the topic, which will cause it to miss the most recently produced messages.