Cluster Linking for Confluent Platform

Cluster Linking on Confluent Platform enables you to directly connect clusters and mirror topics with byte-for-byte replication, supporting hybrid cloud, disaster recovery, global replication, data sharing, and cluster migration use cases.

What is Cluster Linking?

Cluster Linking enables you to directly connect clusters and mirror topics from one cluster to another with byte-for-byte replication. Mirror topics maintain globally consistent offsets and identical content across partitions. Cluster Linking is secure, high-performance, tolerant of network latency, and built into Confluent Server and Confluent Cloud, making it easy to build multi-datacenter, multi-region, and hybrid cloud deployments.

Unlike Replicator and MirrorMaker 2, Cluster Linking does not require running Connect to move messages between clusters.

Capabilities and comparisons

Cluster Linking replicates topics from one Kafka or Confluent cluster to another, providing the following capabilities:

Global replication: Unify data and applications from regions and continents around the world.
Hybrid cloud: Create a secure, scalable, and seamless bridge-to-cloud by linking an on-premises Confluent Platform cluster in a private cloud to a Confluent Cloud cluster in a public cloud.
HA/DR: Build a multi-region high availability and disaster recovery (“HA/DR”) strategy that achieves low recovery times (RTOs) and minimal data loss (RPOs) by replicating topic data and metadata to another cluster.
Cluster migration: Migrate from an older cluster to one in a newer environment, region, or cloud.
Aggregation: Combine data from many smaller clusters into one aggregate cluster.
Data sharing: Exchange data between different teams, lines-of-business, and organizations.

Cluster Linking offers these advantages over other Kafka replication options:

Built into Confluent Server and Confluent Cloud, so it does not depend on additional components, connectors, virtual machines, or custom processes.
Creates exact mirrors of topics, including offsets, to enable migration, failover, and reasoning about your system without offset translation or custom tooling.
Can be dynamically updated via REST APIs, CLIs, and Kubernetes CRDs.
For compressed messages, byte-for-byte replication achieves higher throughput by avoiding decompression and recompression.

What’s supported

KRaft and ZooKeeper

Use KRaft mode for new deployments. ZooKeeper is no longer available for new deployments as of Confluent Platform 8.0. To learn more about running Kafka in KRaft mode, see KRaft Overview for Confluent Platform and the KRaft steps in the Quick Start for Confluent Platform. To learn about migrating from older versions, see Migrate from ZooKeeper to KRaft on Confluent Platform.
For this migration, note that password.encoder.secret is not required in KRaft mode but is required when migrating from ZooKeeper to KRaft. Use of this parameter for Cluster Linking, when needed for older versions on ZooKeeper, is shown in Tutorial: Link Confluent Platform and Confluent Cloud Clusters. To learn more about how this is handled in Confluent Platform 8.0 and later, see Update password configurations dynamically.
This documentation provides examples for KRaft mode only. Earlier versions of this documentation provide examples for both KRaft and ZooKeeper.
Some examples in the various tutorials show a combined mode configuration, where for each cluster the broker and controller run on the same server. Combined mode is not intended for production use but is shown here to simplify the tutorial. If you want to run controllers and brokers on separate servers, use KRaft in isolated mode. To learn more, see KRaft Overview for Confluent Platform and KRaft Configuration for Confluent Platform.

Supported platform and tools compatibilities

Cluster Linking is included with Confluent Server at no licensing cost beyond the Confluent Platform Enterprise license subscription. Use an inter-broker protocol (IBP) of 3.0 or higher on both the source and destination clusters.

Requires Confluent Server on Confluent Platform 7.8.x or later on the destination cluster.
Works with all clients. For a guide to upgrading, see Steps for upgrading to 8.3.x.
Built-in custom resource in Confluent for Kubernetes.
Compatible with Ansible. To learn more, see Using Cluster Linking with Ansible.
Provides support for authentication and authorization, as described in Manage Security for Cluster Linking on Confluent Platform.
The source cluster can be Kafka or Confluent Server or Confluent Cloud; the destination cluster must be Confluent Server, which is bundled with Confluent Enterprise.
Bidirectional links between two clusters are supported but must be established as two separate links, not a single link. The Hybrid tutorial gives an example of creating a bidirectional link between an on-premises Confluent Platform cluster and a Confluent Cloud cluster.
In addition to self-managed deployments on Confluent Platform, Cluster Linking is also available as a managed service on Confluent Cloud and in Hybrid cloud.

Source	Destination
Confluent Platform 7.8.x or later [1]	Confluent Platform 7.8.0 or later
Confluent Cloud	Confluent Platform 7.8.0 or later
Kafka 3.8.x or later [1]	Confluent Platform 7.8.0 or later
Confluent Platform 7.8.x or later [1]	Confluent Cloud [2]
Confluent Cloud	Confluent Cloud [2]
Kafka 3.8.x or later [1]	Confluent Cloud [2]
Confluent Platform 7.8.0 or later (source-initiated link)	Confluent Platform 7.8.0 or later
Confluent Platform 7.8.0 or later (source-initiated link)	Confluent Cloud

Footnotes

Use cases and architectures

The following use cases can be achieved by the configurations and architectures shown.

Hybrid cloud

Use Case: Easily create a persistent and seamless bridge from on-premises environments to cloud environments. A cluster link between a Confluent Platform cluster in your datacenter and a Confluent Cloud cluster in a public cloud acts as a single secure, scalable hybrid data bridge that can be used by hundreds of topics, applications, and data systems. Cluster Linking can tolerate the high latency and unpredictable network availability that you might have between on-premises infrastructure and the cloud, and it reconnects and recovers automatically. Cluster Linking can replicate data bidirectionally between your datacenter and the cloud without any firewall holes or special IP filters because your datacenter always makes an outbound connection. Cluster Linking creates a byte-for-byte, globally consistent copy of your data that preserves offsets, making it easy to migrate on-premises applications to the cloud. Cluster Linking is built into Confluent Platform and does not require extra components to manage.

[Figure: Hybrid cloud architecture with Cluster Linking]

Tutorial: Link Confluent Platform and Confluent Cloud Clusters

Disaster recovery

Use Case: Create a Disaster Recovery (“DR”) cluster that is available to fail over to should your primary cluster experience an outage or disaster. Cluster Linking keeps your DR cluster in sync with data, metadata, topic structure, topic configurations, and consumer offsets to achieve low recovery point objectives (“RPOs”) and recovery time objectives (“RTOs”), often measured in minutes. Cluster Linking for DR does not require an expensive network, complicated management, or extra software components. Because Cluster Linking preserves offsets and syncs consumer offsets, consumer applications in any language can fail over and resume near the point where they left off, achieving low downtime without custom code or interceptors.

[Figure: Disaster recovery architecture with Cluster Linking]

Global replication

Use Case: Stream data between the continents and regions where your business operates. Unify data from every region to create a global real-time event mesh. Aggregate data from different regions to drive the real-time applications and analytics that power your business. By making geo-local reads of real-time data possible, this can act like a content delivery network (CDN) for your Kafka events throughout the public cloud, private cloud, and at the edge.

[Figure: Global replication architecture with Cluster Linking]

Cluster migration

Use Case: Seamlessly move from an on-premises Kafka or Confluent Platform cluster to a Confluent Cloud cluster, or from older infrastructure to new infrastructure, with low downtime and no data loss. Cluster Linking’s native offset preservation and consumer offset syncing allow every consumer application to switch from the old cluster to the new one when it’s ready. Topics can be migrated one by one or in a batch. Cluster Linking handles topic creation, configuration, and syncing.

[Figure: Cluster migration architecture with Cluster Linking]

Tutorial: Migrate Data with Cluster Linking on Confluent Platform

Customer Success Story: In SAS Powers Instant, Real-Time Omnichannel Marketing at Massive Scale with Confluent’s Hybrid Capabilities, the subtopic “A much easier migration thanks to Cluster Linking” describes how SAS used Cluster Linking to migrate to Confluent for Kubernetes and other cloud-native solutions.

Scaling Cluster Linking

Because Cluster Linking fetches data from source topics, start by examining the number of partitions in the source topics. Having enough partitions lets Cluster Linking mirror data in parallel. Having too few partitions can make Cluster Linking bottleneck on partitions that are more heavily used.

In a Confluent Platform or Apache Kafka® cluster, you can scale Cluster Linking throughput as follows:

On the cluster link configurations, change the number of fetcher threads or change the fetch size to get better batching.
Improve the cluster’s maximum throughput by scaling the brokers vertically or horizontally.
Use the options listed under Cluster Link Replication Configurations to tune cluster link performance, which helps scale cluster link throughput.

In Confluent Cloud, Cluster Linking scales with the ingress and egress quotas of your cluster. Cluster Linking is able to use all remaining bandwidth in a cluster’s throughput quota: 150 MB/s per CKU egress on a Confluent Cloud source cluster or 50 MB/s per CKU ingress on a Confluent Cloud destination cluster, whichever is hit first. To scale Cluster Linking throughput, adjust the number of CKUs on either the source, the destination, or both.
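
As a rough illustration of the quota arithmetic above, the following Python sketch estimates the upper bound on cluster link throughput between two Confluent Cloud clusters from their CKU counts. The function name and parameters are hypothetical and are not part of any Confluent API. The per-CKU figures are the ones quoted above. Actual throughput also depends on how much of each quota other clients are already using.

```python
# Per-CKU quota figures quoted in the text above.
EGRESS_MBPS_PER_CKU = 150   # egress on a Confluent Cloud source cluster
INGRESS_MBPS_PER_CKU = 50   # ingress on a Confluent Cloud destination cluster


def max_link_throughput_mbps(source_ckus: int, destination_ckus: int) -> int:
    """Upper bound in MB/s: whichever quota is hit first (hypothetical helper)."""
    if source_ckus < 1 or destination_ckus < 1:
        raise ValueError("CKU counts must be positive")
    source_limit = source_ckus * EGRESS_MBPS_PER_CKU
    destination_limit = destination_ckus * INGRESS_MBPS_PER_CKU
    return min(source_limit, destination_limit)


# A 2-CKU source (300 MB/s egress) feeding a 4-CKU destination (200 MB/s ingress).
print(max_link_throughput_mbps(source_ckus=2, destination_ckus=4))  # 200
```

In this example the destination ingress quota is the limiting side, so adding CKUs to the source alone would not raise the bound.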

Note

On the destination cluster, Cluster Linking writes take lower priority than Kafka clients producing to that cluster; Cluster Linking will be throttled first.

Confluent proactively monitors all cluster links in Confluent Cloud and will perform tuning when necessary. If you find that your cluster link is not hitting these limits even after a full day of sustained traffic, contact Confluent Support.

To learn more, see recommended guidelines for Confluent Cloud.

Known limitations and best practices

Mirror topics

Confluent Control Center displays mirror topics as regular topics when not connected to REST Proxy API v3 for Confluent Platform, which can show features that are not available on mirror topics, such as producing messages or editing configurations. Connect the Confluent Platform cluster and Control Center to the v3 Confluent REST API for correct mirror topic display. To learn how to configure these clusters for the v3 REST API, see Required Configurations for Control Center.
Cluster Linking doesn’t support mirroring topics that contain records produced using the Kafka transactions feature.
Consumer group offsets that are deleted on the destination cluster, especially those that are auto-deleted, can persist instead of being removed as expected. To prevent extended retention of inactive consumer group offsets, increase offsets.retention.minutes on the destination cluster by at least twice the value of offsets.retention.check.interval.ms (note the different units: minutes versus milliseconds). This ensures that offsets are deleted on the source before they are deleted on the destination, which prevents re-replication of offsets that are deleted on the source.
Cluster Linking fails when encountering messages in the v0 or v1 format from the earliest versions of Kafka, transitioning the mirror topic to a FAILED state and stopping replication. Cluster Linking can replicate messages in the v2 format (introduced in Apache Kafka® version 0.11) and later. To replicate a topic that contains messages in the v0 or v1 format, either begin replication after the last message in the v0 or v1 format using the cluster link configuration mirror.start.offset.spec, or use Confluent Replicator to replicate topics and messages.
The reverse commands (reverse-and-start and reverse-and-pause) do not support prefixed cluster links. If a cluster link is configured with a cluster.link.prefix, you cannot use reverse APIs to swap mirroring directions during disaster recovery or failover scenarios.
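
The consumer group offset retention guidance in the list above mixes minutes and milliseconds, so a small worked calculation may help. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that the value being increased is whatever retention you would otherwise use (the baseline), raised by twice the check interval and rounded up to whole minutes.

```python
import math


def destination_offsets_retention_minutes(
    baseline_retention_minutes: int,
    check_interval_ms: int,
) -> int:
    """Baseline plus twice the check interval, rounded up to whole minutes."""
    extra_minutes = math.ceil(2 * check_interval_ms / 60_000)
    return baseline_retention_minutes + extra_minutes


# Apache Kafka defaults: offsets.retention.minutes=10080 (7 days) and
# offsets.retention.check.interval.ms=600000 (10 minutes).
print(destination_offsets_retention_minutes(10_080, 600_000))  # 10100
```

With the default values, the destination cluster would use offsets.retention.minutes=10100 or higher.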

Security and management

A key feature of Cluster Linking is the capability to sync ACLs between clusters. This is useful when moving clients between clusters for a migration or failover. For limitations, see Limitations on prefixing.
Do not use unauthenticated listeners with Confluent Platform. Cluster Linking can access the listeners, increasing the security risk. As a best practice, always configure authentication on listeners. To learn more, see Enable Security for a KRaft-Based Cluster in Confluent Platform, Authentication in Confluent Platform, and the broker listener configuration examples for protocols such as SASL and Use TLS Authentication in Confluent Platform. See also Manage Security for Cluster Linking on Confluent Platform.
TLS/SSL key stores, trust stores, and Kerberos keytab files must be stored at the same location on each broker in a given cluster. If not, cluster links may fail. Alternatively, you can configure a PEM certificate in-line on the cluster link configuration.
Cluster Linking does not support the use of a proxy for authentication to the cluster. For supported security configurations, see Manage Security for Cluster Linking on Confluent Platform.
Prerequisites are provided per tutorial or use case because these differ depending on the context. Tutorials are provided on topic data sharing and Tutorial: Link Confluent Platform and Confluent Cloud Clusters. Additional requirements for secure setups are provided in Manage Security for Cluster Linking on Confluent Platform.

Networking and performance

Firewalls that allow the cluster link connection from source cluster brokers to destination cluster brokers must allow the TCP connection to persist for Cluster Linking to work.

Configuration, monitoring, and troubleshooting

When deleting a cluster link, first check that all mirror topics are in the STOPPED state. If any are in the PENDING_STOPPED state, deleting a cluster link can cause irrecoverable errors on those mirror topics due to a temporary limitation.
Cluster link configurations for TLS/SSL key stores, trust stores, and Kerberos keytab files should not be stored in /tmp because /tmp files can get deleted, leaving links and mirrors in a bad state on some brokers.
REST API calls to list and get source-initiated cluster links return destination cluster IDs under the parameter destination_cluster_id (or destination_cluster with Confluent CLI version 4). Previous releases returned these values under source_cluster_id.
If you encounter the error Unknown topic config name: message.timestamp.difference.max.ms when creating a link or during consumer offset syncing, remove message.timestamp.difference.max.ms from the link configuration topic.config.sync.include. This issue is fixed in Confluent Platform 8.0.1 and later, but users of version 8.0.0 may still encounter it. To learn more, see Manage Mirror Topics for Cluster Linking on Confluent Platform.
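
The pre-deletion check described in the list above (all mirror topics must be STOPPED before a cluster link is deleted) can be expressed as a simple guard. The following Python sketch is a minimal illustration: the function name and the input mapping of topic names to states are hypothetical, and it assumes you have already collected the mirror topic states with whatever tooling you use to list mirrors.

```python
from typing import Dict, List


def mirrors_blocking_link_deletion(mirror_states: Dict[str, str]) -> List[str]:
    """Return names of mirror topics that are not yet in the STOPPED state."""
    return sorted(
        topic for topic, state in mirror_states.items() if state != "STOPPED"
    )


# Hypothetical snapshot of mirror topic states on one cluster link.
states = {"orders": "STOPPED", "payments": "PENDING_STOPPED", "clicks": "STOPPED"}
blocking = mirrors_blocking_link_deletion(states)
if blocking:
    print("Do not delete the link yet; waiting on:", ", ".join(blocking))
else:
    print("All mirror topics are STOPPED.")
```

Here the check reports payments, because it is still in the PENDING_STOPPED state.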

Kafka protocol limits for consumer group configuration strings

Individual configuration strings and metadata arrays have a maximum size of 32,767 characters.

Impact on consumer group synchronization

Large migrations with many consumer groups can exceed this limit if you define a long list of groups in a single filter such as consumer.offset.group.filters.

Solution

Break large migrations into multiple cluster links using:

Narrower filter patterns
Wildcard regular expression (regex) patterns
Multiple smaller cluster links instead of one large link
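
One way to apply the last option is to split a long list of consumer group names into batches, each small enough to fit in one filter string, and then assign each batch to its own cluster link. The following Python sketch is illustrative only. The helper names are hypothetical, and the JSON shape (a groupFilters array with name, patternType, and filterType fields) is an assumption about the consumer.offset.group.filters format that you should verify against the configuration reference for your version.

```python
import json
from typing import List

MAX_CONFIG_CHARS = 32_767  # limit stated in the text above


def build_filter(groups: List[str]) -> str:
    """Serialize an assumed consumer.offset.group.filters value for literal names."""
    return json.dumps(
        {
            "groupFilters": [
                {"name": g, "patternType": "LITERAL", "filterType": "INCLUDE"}
                for g in groups
            ]
        },
        separators=(",", ":"),
    )


def split_groups(groups: List[str], limit: int = MAX_CONFIG_CHARS) -> List[List[str]]:
    """Greedily pack group names so each serialized filter fits within limit."""
    batches: List[List[str]] = []
    current: List[str] = []
    for group in groups:
        if len(build_filter([group])) > limit:
            raise ValueError(f"group name too long for a single filter: {group!r}")
        if current and len(build_filter(current + [group])) > limit:
            batches.append(current)  # current batch is full; start a new one
            current = []
        current.append(group)
    if current:
        batches.append(current)
    return batches


# Hypothetical group names; each batch would go on its own cluster link.
groups = [f"consumer-group-{i:05d}" for i in range(2000)]
batches = split_groups(groups)
for number, batch in enumerate(batches, start=1):
    print(f"link {number}: {len(batch)} groups, {len(build_filter(batch))} characters")
```

Where group names share a common prefix, a single prefixed or wildcard pattern is usually shorter than any list of literal names, so batching like this is mainly useful when names have no common structure.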