What is Cluster Linking?
Cluster Linking allows you to directly connect clusters together and mirror
topics from one cluster to another without the need for Connect.
Cluster Linking makes it much easier to build multi-datacenter, multi-cluster,
and hybrid cloud deployments.
Unlike Replicator and MirrorMaker2, Cluster Linking
does not require running Connect to move messages from one cluster to another,
and it preserves offsets from the source cluster on the destination. We call
this “byte-for-byte” replication: whatever is on the source is mirrored
precisely on the destination cluster.
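To make the offset claim concrete, here is a toy Python model. It is not Confluent code: a log is represented as a plain list of (offset, value) pairs, and the function names are hypothetical. It contrasts mirroring records at their source offsets with re-producing them through a client, where the destination assigns its own offsets.

```python
# Toy model only: a "log" is a list of (offset, value) pairs.
# mirror_log and reproduce_log are hypothetical names for illustration.

def mirror_log(source_log):
    """Copy records as-is, keeping each record's source offset."""
    return [(offset, value) for offset, value in source_log]


def reproduce_log(source_log, next_offset=0):
    """Re-produce each value; the destination assigns fresh offsets."""
    return [(next_offset + i, value) for i, (_, value) in enumerate(source_log)]


# A source log whose earliest records have aged out, so offsets start at 5.
source = [(5, "a"), (6, "b"), (7, "c")]

mirrored = mirror_log(source)
reproduced = reproduce_log(source)

print(mirrored)    # offsets 5, 6, 7 match the source
print(reproduced)  # offsets restart at 0, so a committed offset of 6 no longer points at "b"
```

In this sketch, a consumer that committed offset 6 on the source can resume at offset 6 on the mirrored log without any translation, which is the practical benefit of preserving offsets.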
Cluster Linking and Built-in Multi-Region Replication can be combined to create a highly available,
durable, and distributed global eventing fabric. Use Built-in Multi-Region Replication when automatic client
failover (low RTO) or RPO=0 is required on some topics. Cluster Linking should
be used when network quality is questionable, data centers are very far apart,
or RTO goals can tolerate client reconfiguration.
The destination cluster must be running Confluent Server, and the source cluster can be either Confluent Server or Apache Kafka® 2.4+.
Cluster Linking is being introduced in Confluent Platform 6.0.0 as a preview feature.
Use Cases and Architectures
The following use cases can be achieved by the configurations and architectures shown. These deployments are demonstrated in Cluster Linking Demo (Docker)
and Tutorial: Using Cluster Linking for Topic Data Sharing.
Topic Data Sharing
Use Case: Share the data in a handful of topics across two Kafka clusters.
- source cluster
- destination cluster
For topic sharing, data moves from the source to the destination cluster by means of a cluster link.
Mirror topics are associated with a cluster link. Consumers on the destination cluster can read from local,
read-only, mirrored topics to read messages produced on the source cluster. If an original topic on the source cluster
is removed for any reason, you can stop mirroring that topic, and convert it to a read/write topic on the destination.
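The lifecycle described above (read-only while mirroring, read/write once mirroring stops) can be sketched as a small state model. This is only an illustration: the MirrorTopic class and its method names are hypothetical and do not correspond to any Confluent API.

```python
class MirrorTopic:
    """Toy model of a mirror topic on the destination cluster (hypothetical class)."""

    def __init__(self, name):
        self.name = name
        self.mirroring = True  # read-only while the cluster link feeds it
        self.records = []

    def replicate(self, record):
        # Only the cluster link appends while mirroring is active.
        if not self.mirroring:
            raise RuntimeError("mirroring has been stopped")
        self.records.append(record)

    def produce(self, record):
        # Local producers are rejected until mirroring stops.
        if self.mirroring:
            raise PermissionError(f"{self.name} is a read-only mirror topic")
        self.records.append(record)

    def stop_mirroring(self):
        # Converts the topic to a regular read/write topic.
        self.mirroring = False


topic = MirrorTopic("orders")
topic.replicate("from-source")

try:
    topic.produce("local-write")
    rejected = False
except PermissionError:
    rejected = True

topic.stop_mirroring()
topic.produce("local-write")  # accepted now that the topic is read/write
print(rejected, topic.records)
```

Records mirrored before the conversion remain in the topic, so consumers on the destination keep reading from the same log after it becomes writable.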
Cluster Migration
Use Case: Move from an on-premises Kafka cluster to a Confluent Cloud Kafka cluster,
or from an older version to a newer version. The native offset preservation you get
by leveraging Confluent Server on the brokers makes this much easier to do with Cluster Linking
than with other Connect-based methods.
Hybrid Cloud Architectures
Use Case: Deploy an ongoing data funnel for a few topics from
an on-premises environment to Confluent Cloud. Cluster Linking provides a
network-partition-tolerant architecture that supports this well (momentarily
losing a network connection does not materially affect the data on either
cluster), whereas attempting this with stretch clusters requires a highly reliable
and robust network.
Understanding Listeners in Cluster Linking
For a forward connection, the target server knows which listener the connection
came in on and associates the listener with that connection. When a metadata request
arrives on that connection, the server returns metadata corresponding to the listener.
For example, in Confluent Cloud, when a client on the external listener asks for
the leader of topicA, it always gets the external endpoint of the leader and never
the internal one, because the system knows the listener name from the connection.
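The listener-aware metadata lookup described above can be modeled in a few lines of Python. This is a simplified sketch, not broker code: the broker names, listener names (INTERNAL, EXTERNAL), and endpoints are all hypothetical.

```python
# Toy model of listener-aware metadata. Broker names, listener names,
# and endpoints below are hypothetical.
ADVERTISED = {
    "broker-1": {"INTERNAL": "b1.internal:9092", "EXTERNAL": "b1.example.com:9093"},
    "broker-2": {"INTERNAL": "b2.internal:9092", "EXTERNAL": "b2.example.com:9093"},
}
LEADERS = {"topicA": "broker-2"}


def leader_endpoint(topic, connection_listener):
    """Return the leader's endpoint for the listener the request arrived on."""
    leader = LEADERS[topic]
    return ADVERTISED[leader][connection_listener]


print(leader_endpoint("topicA", "EXTERNAL"))  # b2.example.com:9093
print(leader_endpoint("topicA", "INTERNAL"))  # b2.internal:9092
```

The key input is the listener tied to the connection, which is why a reverse connection needs an explicit listener association before the server can answer metadata requests on it.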
For reverse connections, the target server (that is, the source cluster) establishes the
connection. When the connection is reversed, this target server needs to know
which listener to associate with the reverse connection; that is, which endpoint
it should return to the destination in response to its leader requests.
By default, the reverse connection is associated with the listener on the source cluster
that was used to create the link. In most cases this is sufficient because typically a single external listener is used.
On Confluent Cloud, this default is used and you cannot override it.
On self-managed Confluent Platform, you have the option to override the default listener/connection association.
This provides the flexibility to create the source link on an internal listener but associate the
external listener with the reverse connection.
The local.listener.name setting refers to the listener name on the source cluster.
By default, this is the listener that was used to create the source link.
If you want to use a different listener, you must explicitly configure it. If
Confluent Cloud is the source, the external listener (the default) is used and cannot be overridden.
For the destination, the listener is determined by bootstrap.servers and cannot be overridden.
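The selection rules for the source side can be summarized in a short Python sketch. The function and parameter names are hypothetical and only model the behavior described in this section; they are not part of any Confluent API.

```python
def reverse_connection_listener(link_created_on, local_listener_name=None,
                                source_is_confluent_cloud=False):
    """Pick the source-cluster listener to associate with a reverse connection.

    Hypothetical helper: it only models the rules described in this section.
    """
    if source_is_confluent_cloud and local_listener_name is not None:
        # On Confluent Cloud the default cannot be overridden.
        raise ValueError("local.listener.name cannot be overridden on Confluent Cloud")
    # An explicit override wins; otherwise use the listener the link was created on.
    return local_listener_name or link_created_on


print(reverse_connection_listener("INTERNAL"))              # INTERNAL
print(reverse_connection_listener("INTERNAL", "EXTERNAL"))  # EXTERNAL
print(reverse_connection_listener("EXTERNAL", source_is_confluent_cloud=True))  # EXTERNAL
```

The second call corresponds to the self-managed case mentioned earlier: the source link is created on an internal listener, while the external listener is associated with the reverse connection.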