What is Cluster Linking?
Cluster Linking enables you to directly connect clusters and mirror
topics from one cluster to another.
Cluster Linking makes it easy to build multi-datacenter, multi-region,
and hybrid cloud deployments. It is secure, performant, tolerant of
network latency, and built into Confluent Server and Confluent Cloud.
Unlike Replicator and MirrorMaker 2, Cluster Linking
does not require running Connect to move messages from one cluster to another,
and it creates identical “mirror topics” with globally consistent offsets. We call
this “byte-for-byte” replication. Messages on the source topics are mirrored
precisely on the destination cluster, at the same partitions and offsets.
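To make the idea of offset preservation concrete, the following is a minimal in-memory sketch, not Kafka or Cluster Linking code. The `Record` type and the `mirror` and `reproduce` functions are hypothetical names introduced only for illustration. It contrasts a mirror copy, which keeps each record's partition and offset, with a copy made by re-producing records, where the destination assigns fresh offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    partition: int
    offset: int
    value: bytes


def mirror(source):
    """Toy mirror: every record keeps its original partition and offset."""
    return [Record(r.partition, r.offset, r.value) for r in source]


def reproduce(source):
    """Toy re-produce: the destination assigns new offsets, starting at 0 per partition."""
    next_offset = {}
    copied = []
    for r in source:
        offset = next_offset.get(r.partition, 0)
        copied.append(Record(r.partition, offset, r.value))
        next_offset[r.partition] = offset + 1
    return copied


# Assumed example data: partition 0 starts at offset 100 because older
# records have already been removed by retention on the source.
source = [Record(0, 100, b"a"), Record(0, 101, b"b"), Record(1, 7, b"c")]

mirrored = mirror(source)
reproduced = reproduce(source)

print([r.offset for r in mirrored])    # offsets match the source
print([r.offset for r in reproduced])  # offsets differ from the source
```

In the re-produced copy, a consumer's committed offset from the source no longer points at the same record, which is why that style of replication needs offset translation.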
Capabilities and Comparisons
Cluster Linking replicates topics from one Kafka or Confluent cluster to another,
providing the following capabilities:
- Global Replication: Unify data and applications from regions and continents around the world.
- Hybrid cloud: Create a secure, scalable, and seamless bridge-to-cloud by linking an
on-premise Confluent Platform cluster in a private cloud to a Confluent Cloud cluster in a public cloud.
- HA/DR: Build a multi-region high availability and disaster recovery (“HA/DR”) strategy
that achieves low recovery times (RTOs) and minimal data loss (RPOs) by
replicating topic data and metadata to another cluster.
- Cluster migration: Migrate from an older cluster to one in a newer environment, region, or cloud.
- Aggregation: Combine data from many smaller clusters into one aggregate cluster.
- Data sharing: Exchange data between different teams, lines-of-business, and organizations.
Compared to other Kafka replication options, Cluster Linking offers these advantages:
- Built into Confluent Server and Confluent Cloud, so it does not depend on additional components, connectors,
virtual machines, or custom processes.
- Creates exact mirrors of topics, including offsets, to enable migration, failover, and reasoning
about your system without offset translation or custom tooling.
- Can be dynamically updated via REST APIs, CLIs, and Kubernetes CRDs.
- For compressed messages, byte-for-byte replication achieves faster throughput by avoiding
Use Cases and Architectures
The following use cases can be achieved by the configurations and architectures shown.
Use Case: Easily create a persistent and seamless bridge from on-premise environments
to cloud environments. A cluster link between a Confluent Platform cluster in your datacenter and a Confluent Cloud
cluster in a public cloud acts as a single secure, scalable hybrid data bridge that can be used
by hundreds of topics, applications, and data systems. Cluster Linking can tolerate the high latency
and unpredictable networking availability that you might have between on-premise infrastructure
and the cloud, and recovers from reconnections automatically. Cluster Linking can replicate data
bidirectionally between your datacenter and the cloud without any firewall holes or special IP filters
because your datacenter always makes an outbound connection. Cluster Linking creates a byte-for-byte,
globally consistent copy of your data that preserves offsets, making it easy to migrate on-premise applications
to the cloud. Cluster Linking is built into Confluent Platform and does not require any additional components to manage.
Tutorial: Hybrid Cloud and Bridge-to-Cloud
Use Case: Create a Disaster Recovery (“DR”) cluster that is available to fail over to should
your primary cluster experience an outage or disaster. Cluster Linking keeps your DR cluster
in sync with data, metadata, topic structure, topic configurations, and consumer offsets so
that you can achieve low recovery point objectives (“RPOs”) and recovery time objectives (“RTOs”),
often measured in minutes. Cluster Linking for DR does not require an expensive network, complicated
management, or extra software components. And because Cluster Linking preserves offsets and syncs consumer offsets,
consumer applications in any language can fail over and pick up near the point where they left off,
achieving low downtime without custom code or interceptors.
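As a rough illustration of what “near the point where they left off” can mean, here is a small arithmetic sketch. It is a simplified model under stated assumptions, not the product's failover logic: the function `failover_resume` and its parameters are hypothetical, and the model assumes that a consumer resumes on the DR cluster from the last offset that was synced there, and never past the end of the mirror topic.

```python
def failover_resume(committed_on_primary, last_synced_offset, mirror_end_offset):
    """Estimate where a consumer resumes on the DR cluster after failover.

    Returns (resume_at, reprocessed), where `reprocessed` is how many
    records the consumer had already handled on the primary but reads again.
    """
    # Assumption: the consumer cannot resume past the end of the mirror topic.
    resume_at = min(last_synced_offset, mirror_end_offset)
    # Because offsets are preserved, the difference is a simple subtraction.
    reprocessed = max(0, committed_on_primary - resume_at)
    return resume_at, reprocessed


# Assumed numbers: the consumer had committed offset 1050 on the primary,
# the last offset sync carried 1000, and the mirror topic ends at 1040.
resume_at, reprocessed = failover_resume(1050, 1000, 1040)
print(resume_at, reprocessed)
```

Under these assumptions the consumer rereads a bounded number of records rather than restarting from the beginning or skipping ahead, so the application should tolerate some duplicate processing after failover.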
Use Case: Stream data between the continents and regions where your business operates. Unify
data from every region to create a global real-time event mesh. Aggregate data from different
regions to drive the real-time applications and analytics that power your business. By making geo-local reads
of real-time data possible, this can act like a content delivery network (CDN) for your Kafka events
throughout the public cloud, private cloud, and at the edge.
Use Case: Share data between different teams, lines of business, or organizations
in a pattern that provides high isolation between teams and efficient operational management.
Cluster Linking keeps an in-sync mirror copy of relevant data on the consuming team’s
cluster. This isolation empowers the consuming team to scale up hundreds of consumer
applications, stream processing apps, and data sinks without impacting the producing
team’s cluster: for the producing team, it’s the same load as one additional consumer.
The producing team simply issues a security credential with access to the topics that
the consuming team is allowed to read. Then the consuming team can create a cluster link,
which they control, monitor, and manage.
Tutorial: Using Cluster Linking for Topic Data Sharing
Use Case: Seamlessly move from an on-premises Kafka or Confluent Platform cluster to a Confluent Cloud cluster,
or from older infrastructure to new infrastructure, with low downtime and no data loss.
Cluster Linking’s native offset preservation and consumer offset syncing allow every consumer
application to switch from the old cluster to the new one when it’s ready. Topics can be
migrated over one by one, or in a batch. Cluster Linking handles topic creation,
configuration, and syncing.
Tutorial: Data Migration with Cluster Linking
Customer Success Story: In SAS Powers Instant, Real-Time Omnichannel Marketing at Massive Scale with Confluent’s Hybrid Capabilities,
the subtopic “A much easier migration thanks to Cluster Linking” describes how SAS used Cluster Linking to migrate to Confluent for Kubernetes and other cloud-native solutions.
Scaling Cluster Linking
Because Cluster Linking fetches data from source topics, the first scaling
unit to inspect is the number of partitions in the source topics. Having enough
partitions lets Cluster Linking mirror data in parallel. Having too few
partitions can make Cluster Linking bottleneck on partitions that are more heavily used.
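One way to check for the kind of hot partition described above is to compare the busiest partition's write rate with the average across the topic. The sketch below is a hypothetical helper, not part of any Confluent tool; it assumes you have already collected per-partition byte rates from your own metrics, and the sample numbers are made up.

```python
def partition_skew(bytes_in_per_partition):
    """Return (hottest_partition, ratio of its rate to the mean rate).

    `bytes_in_per_partition` maps partition id -> observed bytes/second.
    """
    rates = dict(bytes_in_per_partition)
    if not rates:
        raise ValueError("no partition rates provided")
    mean_rate = sum(rates.values()) / len(rates)
    hottest = max(rates, key=rates.get)
    ratio = rates[hottest] / mean_rate if mean_rate else 0.0
    return hottest, ratio


# Assumed sample: partition 0 receives most of the traffic.
rates = {0: 9_000_000, 1: 500_000, 2: 500_000}
hottest, ratio = partition_skew(rates)
print(hottest, round(ratio, 2))
```

A ratio well above 1 suggests that mirroring is likely to be limited by a single partition, so adding partitions or improving key distribution may help more than tuning the link itself.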
In a Confluent Platform or Apache Kafka® cluster, you can scale Cluster Linking throughput as follows:
- On the cluster link configurations, change the number of fetcher threads or change the fetch size to get better batching.
- Improve the cluster’s maximum throughput by scaling the brokers vertically or horizontally.
- Use the options listed under Cluster Link Replication Configurations
to tune cluster link performance, which helps scale cluster link throughput.
In Confluent Cloud, Cluster Linking scales with the ingress and egress quotas of
your cluster. Cluster Linking is able to use all remaining bandwidth in a
cluster’s throughput quota: 150 MB/s per CKU egress on a Confluent Cloud source
cluster or 50 MB/s per CKU ingress on a Confluent Cloud destination cluster,
whichever is hit first. Therefore, to scale Cluster Linking throughput, simply adjust
the number of CKUs on either the source, the destination, or both.
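The “whichever is hit first” rule can be worked through as a small calculation. The sketch below uses the per-CKU figures quoted above (150 MB/s egress on the source, 50 MB/s ingress on the destination). The function name and the parameters for bandwidth already used by other clients are assumptions made for illustration, and the sketch assumes both clusters are in Confluent Cloud.

```python
EGRESS_MBPS_PER_CKU = 150   # source-side figure quoted in the text
INGRESS_MBPS_PER_CKU = 50   # destination-side figure quoted in the text


def link_throughput_ceiling(source_ckus, destination_ckus,
                            source_egress_in_use=0.0,
                            destination_ingress_in_use=0.0):
    """Estimate the maximum MB/s a cluster link could use.

    The link gets whatever quota remains on each side; the smaller
    of the two remainders is the ceiling.
    """
    source_remaining = max(
        0.0, source_ckus * EGRESS_MBPS_PER_CKU - source_egress_in_use)
    destination_remaining = max(
        0.0, destination_ckus * INGRESS_MBPS_PER_CKU - destination_ingress_in_use)
    return min(source_remaining, destination_remaining)


# Assumed sizing: a 2 CKU source and a 4 CKU destination, both otherwise idle.
idle_ceiling = link_throughput_ceiling(2, 4)
# Same clusters, with producers already writing 40 MB/s to the destination.
busy_ceiling = link_throughput_ceiling(2, 4, destination_ingress_in_use=40)
print(idle_ceiling, busy_ceiling)
```

With these numbers the destination ingress quota is the limiting side in both cases, so adding CKUs to the source alone would not raise the ceiling.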
On the destination cluster, Cluster Linking writes take lower priority
than Kafka clients producing to that cluster; Cluster Linking will be throttled first.
Confluent proactively monitors all cluster links in Confluent Cloud and will
perform tuning when necessary. If you find that your cluster link is not hitting
these limits even after a full day of sustained traffic, contact Confluent Support.
See also the recommended guidelines for Confluent Cloud.