Multi-Datacenter Replication

Confluent Platform can be deployed across clusters and in multiple datacenters. Multi-datacenter deployments enable use cases such as:

  • Active-active geo-localized deployments: Allow users to access a nearby datacenter to optimize their architecture for low latency and high performance.
  • Active-passive disaster recovery (DR) deployments: In the event of a partial or complete datacenter disaster, allow applications to fail over to Confluent Platform in a different datacenter.
  • Centralized analytics: Aggregate data from multiple Apache Kafka® clusters into one location for organization-wide analytics.
  • Cloud migration: Use Kafka to synchronize data between on-prem applications and cloud deployments.

Replication of events in Kafka topics from one cluster to another is the foundation of Confluent's multi-datacenter architecture.

Replication can be done with Confluent Replicator or with the open source MirrorMaker.

Replicator

Replicator allows you to easily and reliably replicate topics from one Kafka cluster to another. In addition to copying the messages, Replicator creates topics as needed, preserving the topic configuration in the source cluster. This includes the number of partitions, the replication factor, and any configuration overrides specified for individual topics.

Architecture

The diagram below shows the Replicator architecture. Replicator uses the Kafka Connect APIs and Workers to provide high availability, load-balancing and centralized management.

Figure: Replicator Architecture

Example Deployment

In a typical multi-datacenter deployment, data from two geographically distributed Kafka clusters located in separate datacenters is aggregated in a separate cluster located in another datacenter. The origin of the copied data is referred to as the "source" cluster while the target of the copied data is referred to as the "destination."

Each source cluster requires a separate instance of Replicator. For convenience, you can run all of them in the same Connect cluster, located in the aggregate datacenter.
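As a rough sketch of this setup, the Python snippet below builds one Kafka Connect REST API payload per source cluster, with every instance pointing at the same aggregate cluster. The host names, connector names, and topic name are hypothetical, and only a subset of settings is shown; check the Replicator configuration reference for the settings your version requires.

```python
import json

# Connector class for Replicator; verify against your installed version.
REPLICATOR_CLASS = "io.confluent.connect.replicator.ReplicatorSourceConnector"

# Hypothetical bootstrap address of the aggregate (destination) cluster.
AGGREGATE = "kafka.aggregate.example.com:9092"


def replicator_payload(name, source_bootstrap, dest_bootstrap, topics):
    """Build a Connect REST API payload for one Replicator instance."""
    return {
        "name": name,
        "config": {
            "connector.class": REPLICATOR_CLASS,
            "src.kafka.bootstrap.servers": source_bootstrap,
            "dest.kafka.bootstrap.servers": dest_bootstrap,
            "topic.whitelist": ",".join(topics),
        },
    }


# One Replicator instance per source cluster, same destination for both.
payloads = [
    replicator_payload("replicator-dc1", "kafka.dc1.example.com:9092", AGGREGATE, ["orders"]),
    replicator_payload("replicator-dc2", "kafka.dc2.example.com:9092", AGGREGATE, ["orders"]),
]

for payload in payloads:
    print(json.dumps(payload, indent=2))
```

Each payload could then be submitted to the Connect cluster running in the aggregate datacenter.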

Figure: Replication to an Aggregate Cluster

Guidelines for Getting Started

Follow these guidelines to configure a multi-datacenter deployment using Replicator:

  1. Use the Replicator quick start to set up replication between two Kafka clusters.
  2. Learn how to install and configure Replicator and other Confluent Platform components in multi-datacenter environments.
  3. Before running Replicator in production, make sure you read the monitoring and tuning guide.
  4. Review the Confluent Replicator example in the Confluent Platform demo. The demo shows users how to deploy a Kafka streaming ETL using KSQL for stream processing and Confluent Control Center for monitoring, along with Replicator to replicate data.
  5. For a practical guide to designing and configuring multiple Kafka clusters to be resilient in case of a disaster scenario, see the Disaster Recovery white paper. This white paper provides a plan for failover, failback, and ultimately successful recovery.

Topic Renaming

By default, Replicator is configured to use the same topic name in both the source and destination clusters. This works well if you are replicating from only a single cluster. When copying data from multiple clusters to a single destination (i.e., the aggregate use case), you should use a separate topic for each source cluster in case there are any configuration differences between the topics in the source clusters.

It is possible to use the same Kafka cluster as the source and destination as long as you ensure that the replicated topic name is different. This is not a recommended pattern since generally you should prefer Kafka's built-in replication within the same cluster, but it may be useful in some cases (e.g. testing).
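To illustrate per-source topic names in the aggregate use case, the sketch below expands a rename format that contains a ${topic} placeholder. It assumes the connector's topic.rename.format setting uses that placeholder style; the format strings and topic name are hypothetical, and the helper function only mimics the expansion in Python rather than calling Replicator.

```python
from string import Template


def destination_topic(rename_format, source_topic):
    """Expand a rename format containing the ${topic} placeholder."""
    return Template(rename_format).substitute(topic=source_topic)


# Hypothetical rename formats, one per source cluster.
RENAME_FORMATS = {
    "dc1": "${topic}.dc1",
    "dc2": "${topic}.dc2",
}

for source, rename_format in RENAME_FORMATS.items():
    # Prints "dc1 -> orders.dc1" and then "dc2 -> orders.dc2".
    print(source, "->", destination_topic(rename_format, "orders"))
```

With distinct formats, the same source topic name maps to a different destination topic for each source cluster, so configuration differences between sources do not collide.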

Periodic Metadata Updates

Replicator periodically checks topics in the source cluster to determine whether there are new topics to replicate and whether any topic configuration has changed (e.g., an increase in the number of partitions). The frequency of this check is controlled by the metadata.max.age.ms setting in the connector configuration. The default is 2 minutes, which is intended to provide reasonable responsiveness to configuration changes while ensuring that the connector does not add unnecessary load to the source cluster. You can lower this setting to detect changes more quickly, but doing so is rarely worthwhile when topic creation and reconfiguration are infrequent, as is typically the case.
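The kind of comparison such a periodic check performs can be sketched as a diff between two metadata snapshots. This is an illustration only, not Replicator's actual implementation; the function name, the snapshot shape (topic name mapped to partition count), and the sample topics are assumptions.

```python
# The 2-minute default described above, expressed in milliseconds.
METADATA_MAX_AGE_MS = 2 * 60 * 1000


def diff_topic_metadata(previous, current):
    """Compare two {topic: partition_count} snapshots of the source cluster.

    Returns (new_topics, grown), where grown maps a topic name to its
    (old, new) partition counts.
    """
    new_topics = sorted(t for t in current if t not in previous)
    grown = {
        t: (previous[t], current[t])
        for t in current
        if t in previous and current[t] > previous[t]
    }
    return new_topics, grown


# Hypothetical snapshots taken one metadata refresh apart.
previous = {"orders": 6, "payments": 3}
current = {"orders": 8, "payments": 3, "refunds": 3}

new_topics, grown = diff_topic_metadata(previous, current)
print(new_topics)  # ['refunds']
print(grown)       # {'orders': (6, 8)}
```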

Security

Replicator supports communication with secure Kafka over SSL for both the source and destination clusters. The two clusters can use different SSL configurations, and the Replicator connections to the source and destination Kafka clusters are configured separately.

You can configure ZooKeeper by passing the name of its JAAS file as a JVM parameter when starting:

export KAFKA_OPTS="-Djava.security.auth.login.config=etc/kafka/zookeeper_jaas.conf"
bin/zookeeper-server-start etc/kafka/zookeeper.properties

Important

The source and destination ZooKeeper must be secured with the same credentials.

To configure security on the source cluster, see the Replicator connector configuration options. To configure security on the destination cluster, see the general security configuration for Connect workers.
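As a minimal sketch of source-side security settings, the snippet below prefixes standard Kafka client SSL settings with src.kafka. so they apply to the source connection in the connector configuration. It assumes the connector accepts client settings under that prefix; the truststore path and password placeholder are hypothetical.

```python
def with_prefix(prefix, client_settings):
    """Return a copy of the settings with every key prefixed."""
    return {prefix + key: value for key, value in client_settings.items()}


# Standard Kafka client SSL settings; the values are placeholders.
source_ssl = {
    "security.protocol": "SSL",
    "ssl.truststore.location": "/etc/kafka/secrets/source.truststore.jks",
    "ssl.truststore.password": "<truststore-password>",
}

# Settings to merge into the Replicator connector configuration.
connector_security = with_prefix("src.kafka.", source_ssl)

for key in sorted(connector_security):
    print(key)
```

The destination side is left out here because, as noted above, it is configured through the Connect worker.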

Requirements

At a high level, Replicator works like a consumer group, with the partitions of the replicated topics from the source cluster divided among the connector's tasks. Replicator periodically polls the source cluster for changes to the configuration of replicated topics and the number of partitions, and updates the destination cluster accordingly by creating topics or updating configuration. For this to work correctly, the following is required:

  • The Replicator principal must have permission to create and modify topics in the destination cluster. This requires write access to the corresponding ZooKeeper.
  • The default topic configurations in the source and destination clusters must match. In general, aside from any broker-specific settings (such as broker.id), you should use the same broker configuration in both clusters.
  • The destination Kafka cluster must have capacity similar to that of the source cluster. In particular, because Replicator preserves the replication factor of topics in the source cluster, the destination cluster must have at least as many brokers as the maximum replication factor used. If it does not, topic creation fails until the destination cluster has the capacity to support the same replication factor. In that case, the connector retries topic creation automatically, so replication begins as soon as the destination cluster has enough brokers.
  • The dest.kafka.bootstrap.servers destination connection setting in the Replicator properties file must point to a single destination cluster, even when using multiple source clusters. For example, the figure shown at the start of this section shows two source clusters in different datacenters targeting a single aggregate destination cluster. The aggregate destination cluster must have capacity similar to the combined capacity of all associated source clusters.
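The broker-count requirement above can be checked ahead of time with a few lines of Python. This is a sketch with hypothetical topic names and counts; it only encodes the rule that a topic cannot be created with a replication factor larger than the number of brokers in the destination cluster.

```python
def blocked_topics(source_replication_factors, destination_broker_count):
    """List source topics whose replication factor exceeds the broker count."""
    return sorted(
        topic
        for topic, factor in source_replication_factors.items()
        if factor > destination_broker_count
    )


def missing_brokers(source_replication_factors, destination_broker_count):
    """Number of brokers the destination lacks for the largest replication factor."""
    required = max(source_replication_factors.values(), default=0)
    return max(0, required - destination_broker_count)


# Hypothetical replication factors in the source cluster.
source_topics = {"orders": 3, "audit": 5}

print(blocked_topics(source_topics, 3))   # ['audit']
print(missing_brokers(source_topics, 3))  # 2
```

In this example, replication of the audit topic would start only after two more brokers join the destination cluster.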

Replicator Connector

Replicator is implemented as a Kafka connector. For basic information on the connector and additional use cases beyond multi-datacenter, see Confluent Replicator in Supported Connectors.

Important

This connector is bundled natively with Confluent Platform. If you have Confluent Platform installed and running, there are no additional steps required to install.

If you are running Confluent Platform with only Confluent Community components, you can install the connector using the Confluent Hub client (recommended) or you can manually download the ZIP file.

MirrorMaker

MirrorMaker is a stand-alone tool for copying data between two Kafka clusters. To learn more, see Kafka MirrorMaker.

Confluent Replicator is a more complete solution that handles topic configuration and data, and integrates with Kafka Connect and Confluent Control Center to improve availability, scalability and ease of use. To learn more, see comparing MirrorMaker to Confluent Replicator.