Tutorial: Move Active-Passive Data Center to Multi-Region in Confluent Platform

Confluent Platform provides a variety of technologies and architectures to create and manage clusters that span multiple regions. This tutorial shows you how to move a traditional active-passive data center architecture that spans two regions to a multi-region stretched cluster. The stretched cluster can use follower fetching, observers, and replica placement, as described in Configure Multi-Region Clusters in Confluent Platform.

What the tutorial covers

This tutorial shows you how to move an existing active-passive cluster setup that uses Replicator for syncing across regions to a multi-region stretched cluster. Replicator is not needed for the stretched cluster because it is effectively a single cluster operating across multiple regions.

This tutorial provides a broad outline of best practices and a sequence of steps for pausing clients and replication, reconfiguring the clusters, and bringing the new cluster online. It does not include detailed guidance on commands and configurations. For that level of detail, refer to Configure Multi-Region Clusters in Confluent Platform and Tutorial: Multi-Region Clusters on Confluent Platform.

You’ll begin by pausing all consumers and producers and stopping replication, and then you'll free up the resources of the passive cluster. Next, you’ll extend your original active cluster by pulling in those freed-up resources. In the process, you’ll reconfigure the cluster as a multi-region cluster with the features and capabilities described in Configure Multi-Region Clusters in Confluent Platform.

A successful transition to a multi-region cluster requires more than a working new deployment. You also need options to recover if something fails unexpectedly. For this reason, most steps include a ROLLBACK option that describes how to return to the previous state. Perform a ROLLBACK only if something goes wrong; otherwise, skip it.

Understanding the “Before” and “After” architectures

Your starting point is assumed to be a 2-region active-passive architecture that uses Replicator to sync data across the different clusters.

[Diagram: active-passive architecture before the move to Multi-Region Clusters]

After you’ve transitioned the data centers, the end state is a multi-region, rack-aware cluster. It uses a stretched 2.5-region active-passive architecture like the one shown and demonstrated in Tutorial: Multi-Region Clusters on Confluent Platform. The new setup does not require Replicator to sync data.

[Diagram: topic replicas in the multi-region cluster]

The steps below refer to these data centers (DCs):

West (original, passive DC)
East (original, active DC)
Central (new tiebreaker KRaft controller DC, added as part of the new multi-region configuration)

Prerequisites

These instructions assume you have:

Two Confluent Platform clusters with an active-passive setup
Confluent Enterprise version 7.5.0 or later (in order to use KRaft)
One extra node dedicated to be the “light” KRaft controller node
Network latency of 50 ms or less between all instances

The tutorial refers to $CONFLUENT_HOME, which represents your Confluent Platform install directory (configuration files are under $CONFLUENT_HOME/etc/kafka). For example, to set this environment variable and add the Confluent Platform binaries to your PATH:

export CONFLUENT_HOME=$HOME/confluent-8.1.1
PATH=$CONFLUENT_HOME/bin:$PATH

Step 1. Pause all consumers and producers

When all consumers and producers are paused, there should be no data flow in the cluster.

Components that can produce or consume data include:

ksqlDB Queries
Connectors
Application Producers
Application Consumers
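Connectors can be paused through the Kafka Connect REST API (`PUT /connectors/<name>/pause`, and `PUT /connectors/<name>/resume` to undo). The sketch below assumes a Connect worker reachable at `CONNECT_URL` (default `http://localhost:8083`, a placeholder), and it takes connector names as arguments. With `DRY_RUN=1` it prints the `curl` commands instead of running them. Pause ksqlDB queries and applications with their own tooling.

```shell
#!/usr/bin/env bash
# Sketch: pause (or resume) Kafka Connect connectors through the Connect REST API.
# CONNECT_URL is an assumption; point it at one of your Connect workers.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# set_connector_state <pause|resume> <connector-name>...
set_connector_state() {
  local action=$1 name
  shift
  case "$action" in
    pause|resume) ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
  for name in "$@"; do
    # The pause/resume call is asynchronous; confirm with GET /connectors/<name>/status.
    run curl -sS -f -X PUT "$CONNECT_URL/connectors/$name/$action"
  done
}
```

For example, `set_connector_state pause jdbc-sink s3-sink` pauses two (hypothetical) connectors. `set_connector_state resume jdbc-sink s3-sink` undoes it.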

Step 2. Pause replication

Ensure Replicator consumer lag is 0.
This ensures that the clusters are fully in sync. That is not possible while producers and consumers are online, because data keeps flowing during the 8 hour green zone.
Stop replication.
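The lag check and the stop can be sketched as follows. It assumes Replicator runs as a Connect connector whose consumer group on the source (East) cluster has the same name as the connector. Both names are hypothetical; use your own group and connector names. The LAG column of `kafka-consumer-groups --describe` is summed. If the command itself fails, the function reports an error rather than a lag of 0.

```shell
#!/usr/bin/env bash
# Sketch: confirm Replicator has caught up, then pause it.
# Assumption (hypothetical): the Replicator connector and its consumer group
# on the source (East) cluster share the same name.

# sum_lag: read `kafka-consumer-groups --describe` output on stdin and print the
# total of the LAG column. The column is found by its header; "-" values are skipped.
sum_lag() {
  awk '
    { for (i = 1; i <= NF; i++) if ($i == "LAG") { col = i; next } }
    col && $col ~ /^[0-9]+$/ { sum += $col }
    END { print sum + 0 }
  '
}

# replicator_lag <bootstrap-server> <group>: fail if the describe command fails,
# so an unreachable cluster is not mistaken for zero lag.
replicator_lag() {
  local out
  out=$(kafka-consumer-groups --bootstrap-server "$1" --describe --group "$2") || return 1
  printf '%s\n' "$out" | sum_lag
}

# stop_replication_if_synced <bootstrap-server> <replicator-name> <connect-url>
stop_replication_if_synced() {
  local lag
  lag=$(replicator_lag "$1" "$2") || return 1
  if [ "$lag" -ne 0 ]; then
    echo "Replicator lag is $lag; wait and check again" >&2
    return 1
  fi
  # ROLLBACK: PUT "$3/connectors/$2/resume" resumes the connector.
  curl -sS -f -X PUT "$3/connectors/$2/pause"
}
```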

Tip

Replicator is no longer needed in a Multi-Region Clusters setup.

ROLLBACK option: Resume Replicator.

Step 3. Install the Confluent Platform on the KRaft controller node (Central DC)

Install the Confluent Platform package and prepare the controller properties file so that you can start the controller in a later step.
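For the tiebreaker, a minimal controller-only properties file might look like the one this sketch writes. The node ID, host name, data directory, and voter list are placeholders. The PLAINTEXT listener is for illustration only. Copy the security and other settings from your existing controllers.

```shell
#!/usr/bin/env bash
# Sketch: write a controller-only properties file for the Central (tiebreaker) node.
# All argument values are placeholders; PLAINTEXT is used only for illustration.

# write_controller_props <file> <node-id> <listen-host> <voters> <data-dir>
write_controller_props() {
  cat > "$1" <<EOF
process.roles=controller
node.id=$2
listeners=CONTROLLER://$3:9073
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT
controller.quorum.voters=$4
log.dirs=$5
EOF
}
```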

ROLLBACK option: Delete the Confluent Platform package and the property files from the tiebreaker node.

Step 4. Gracefully stop DC West components

Stop everything in the DC West environment (your original passive cluster) in the following order. DO NOT DELETE them.

Broker
KRaft controller

ROLLBACK option: Restart all of the above components.

Step 5. Make a backup of your data directory and log4j for existing brokers and KRaft controllers on DC West

Create backup copies of both the data directories and the log4j logs:

Copy the directories listed in log.dirs to another location.
Copy the log4j logs to another location.
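This sketch archives each directory listed in `log.dirs`, plus a log4j log directory that you pass in, into timestamped tarballs. It assumes plain `key=value` lines in the properties file. The log4j location and the backup root are placeholders, because the log4j location depends on how the process was started. Archive names encode the full path (with `/` replaced by `_`), so two directories with the same base name do not overwrite each other.

```shell
#!/usr/bin/env bash
# Sketch: archive log.dirs and the log4j directory before anything is deleted.

# prop_value <file> <key>: print the value of key=value (last occurrence wins).
prop_value() {
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$1"
}

# backup_dirs <backup-root> <dir>...: one timestamped .tar.gz per directory.
backup_dirs() {
  local root=$1 stamp d
  stamp=$(date +%Y%m%d%H%M%S)
  shift
  mkdir -p "$root" || return 1
  for d in "$@"; do
    tar -czf "$root/${d//\//_}-$stamp.tar.gz" -C "$(dirname "$d")" "$(basename "$d")" || return 1
  done
}

# backup_node <properties-file> <log4j-dir> <backup-root>
backup_node() {
  local dirs
  # log.dirs may hold several comma-separated directories.
  IFS=, read -r -a dirs <<< "$(prop_value "$1" log.dirs)"
  backup_dirs "$3" "${dirs[@]}" "$2"
}
```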

ROLLBACK option: No rollback actions are necessary. You can delete the copied files.

Step 6. Delete the data and log4j log folders of the DC West brokers and controllers

Since you will start the controllers and brokers anew, delete the data on them:

Delete data on controllers
Delete data on the brokers
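Because this step is destructive, the sketch below refuses to empty a directory unless a matching archive exists. It assumes a hypothetical naming convention: `<backup-root>/<path with / replaced by _>-<timestamp>.tar.gz`. Adjust the check to match however you made your backups. Each directory is emptied, including `meta.properties`, but the directory itself is kept.

```shell
#!/usr/bin/env bash
# Sketch: empty DC West data and log4j directories, guarded by a backup check.
# The archive naming convention is a hypothetical assumption.

# wipe_dirs <backup-root> <dir>...
wipe_dirs() {
  local root=$1 d
  shift
  for d in "$@"; do
    case "$d" in
      ""|/) echo "refusing to wipe '$d'" >&2; return 1 ;;
    esac
    if ! compgen -G "$root/${d//\//_}-*.tar.gz" >/dev/null; then
      echo "no backup found for $d; not deleting" >&2
      return 1
    fi
    # Delete everything inside the directory but keep the directory itself.
    find "$d" -mindepth 1 -delete
  done
}
```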

Tip

You do not delete this data on the EAST DC, because that is the cluster you are expanding.

ROLLBACK option: Copy the files you backed up in step 5 back to their original locations.

Step 7. Create backups of properties files for all DC West and DC East components

Make backup copies of the properties files for the following components:

West controllers
West brokers
East controllers
East brokers
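A minimal sketch for this step copies each properties file to a timestamped `.bak` file next to it and preserves its mode. Run it on every West and East host. The file paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: keep a timestamped copy next to each properties file before editing it.

# backup_props <file>...: copy each file to <file>.bak-<timestamp> and print the copy's path.
backup_props() {
  local stamp f
  stamp=$(date +%Y%m%d%H%M%S)
  for f in "$@"; do
    cp -p "$f" "$f.bak-$stamp" || return 1
    echo "$f.bak-$stamp"
  done
}
# Example (placeholder paths):
#   backup_props "$CONFLUENT_HOME/etc/kafka/kraft/controller.properties" \
#                "$CONFLUENT_HOME/etc/kafka/kraft/broker.properties"
```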

ROLLBACK option: No rollback actions are necessary. You can delete the copied files.

Step 8. Change all properties files in both environments to Multi-Region Clusters configuration

Change the properties files for these components:

Brokers (add two properties and change one)

Make the following changes to your broker properties files (for example, in $CONFLUENT_HOME/etc/kafka/kraft/broker.properties).

replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
broker.rack=<region>
# Must now include the new controllers
controller.quorum.voters=<full-controller-voter-list>

KRaft controllers

Make sure all the controllers have the quorum configured. (For example, update $CONFLUENT_HOME/etc/kafka/kraft/controller.properties.)

controller.quorum.voters=<your-id-on-east-1>@<your-east-1>:9073,<your-id-on-east-2>@<your-east-2>:9073,<your-id-on-gtdc>@<your-gtdc>:9073,<your-id-on-west-1>@<your-west-1>:9073,<your-id-on-west-2>@<your-west-2>:9073
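Editing many files by hand is error-prone, so the broker changes can be applied with a small helper. It replaces the first occurrence of a key, drops any duplicates, or appends the key if it is missing. The helper assumes plain `key=value` lines with no spaces around `=` and no backslashes in values. The rack name and voter list are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: apply the Multi-Region Clusters broker changes in place.

# set_prop <file> <key> <value>
set_prop() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  if awk -F= -v k="$key" -v v="$value" '
    $1 == k { if (!done) print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # writing through cat keeps the original file mode
  fi
  rm -f "$tmp"
}

# configure_broker <broker.properties> <rack> <voters>
configure_broker() {
  set_prop "$1" replica.selector.class org.apache.kafka.common.replica.RackAwareReplicaSelector
  set_prop "$1" broker.rack "$2"
  set_prop "$1" controller.quorum.voters "$3"
}
```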

ROLLBACK option: Copy the files you backed up in step 7 back to their original locations.

Step 9. Shut down one controller on EAST DC

You need to move one KRaft controller from the EAST DC, so shut down one controller on EAST and move it to the new Central DC (Tiebreaker).
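Before choosing which East controller to shut down, you can check which controller currently leads the quorum. Stopping a follower rather than the leader may avoid an extra leader election. The helper below parses the `LeaderId:` line printed by `kafka-metadata-quorum ... describe --status`. The bootstrap address in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: print the current quorum leader's node ID from `describe --status` output.

# quorum_leader: read the status output on stdin and print the LeaderId value.
quorum_leader() {
  awk -F: '$1 == "LeaderId" { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Usage (requires a running cluster; the address is a placeholder):
#   kafka-metadata-quorum --bootstrap-server east-broker-1:9092 describe --status | quorum_leader
```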

ROLLBACK option: Restart this East controller with the previous properties.

Step 10. Start one controller on CENTRAL DC (tiebreaker)

To keep the cluster running properly, you will now start one controller on the new Central DC with the Multi-Region Clusters configuration, then perform a rolling restart of the two controller instances on EAST DC.

Important

At this stage, each of the three controllers must restart with all three controllers listed in its controller.properties file, as shown:

controller.quorum.voters=<your-id-on-east-1>@<your-east-1>:9073,<your-id-on-east-2>@<your-east-2>:9073,<your-id-on-gtdc>@<your-gtdc>:9073

A controller cannot join the quorum unless it knows about the other controllers, so the controller.quorum.voters value must be identical in every controller properties file.
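To catch mismatches before restarting, you can compare the `controller.quorum.voters` value across copies of all controller properties files. This sketch treats a different order of entries as a mismatch, which keeps the files strictly identical. The file paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: check that all controller properties files list the same voters.

# same_voters <file>...: exit 0 only if every file has the same non-empty value.
same_voters() {
  local f v first=""
  for f in "$@"; do
    v=$(awk -F= '$1 == "controller.quorum.voters" { sub(/^[^=]*=/, ""); val = $0 } END { print val }' "$f")
    if [ -z "$v" ]; then
      echo "$f: controller.quorum.voters is missing" >&2
      return 1
    fi
    if [ -z "$first" ]; then
      first=$v
    elif [ "$v" != "$first" ]; then
      echo "$f: voters differ: $v" >&2
      return 1
    fi
  done
}
```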

ROLLBACK option: Gracefully shut down this controller.

Step 11. Start two controllers on WEST DC

Start two controller instances on WEST servers 1-2. Then perform a rolling restart of the three existing controller instances with updated configurations that list all five controllers.

Each controller.properties file should contain this configuration:

controller.quorum.voters=<your-id-on-east-1>@<your-east-1>:9073,<your-id-on-east-2>@<your-east-2>:9073,<your-id-on-gtdc>@<your-gtdc>:9073,<your-id-on-west-1>@<your-west-1>:9073,<your-id-on-west-2>@<your-west-2>:9073

ROLLBACK option: Gracefully shut down these two controllers.

Step 12. Start four brokers on WEST DC

Start four brokers on WEST servers 1-4.

Be sure to keep all previous working properties, and change only the following configurations:

replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
broker.rack=<region>
# Must now include the new controllers
controller.quorum.voters=<full-controller-voter-list>
# Must be unique because you are adding new brokers
node.id=<unique-id>
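In KRaft, brokers and controllers share one node-ID space. A quick check across copies of all broker and controller properties files can catch a reused `node.id` before any new broker starts. The file paths are placeholders. Because their data directories were emptied, the new West brokers typically also need `kafka-storage format -t <cluster-id> -c <broker.properties>` with the existing cluster ID before their first start.

```shell
#!/usr/bin/env bash
# Sketch: report node.id values that appear in more than one properties file.

# duplicate_node_ids <file>...: print each repeated node.id, one per line.
duplicate_node_ids() {
  local f
  for f in "$@"; do
    awk -F= '$1 == "node.id" { print $2 }' "$f"
  done | sort | uniq -d
}
```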

ROLLBACK option: Gracefully shut down these brokers.

Step 13. Perform a rolling restart of the brokers on EAST DC

Now update these properties and restart the existing EAST DC brokers:

replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
broker.rack=<region>
# Must now include the new controllers
controller.quorum.voters=<full-controller-voter-list>

Do not change node.id for these brokers; they keep their existing IDs.
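A rolling restart can be sketched as a loop that restarts one broker at a time. Before moving on, it waits until `kafka-topics --describe --under-replicated-partitions` prints nothing. Host names, `BOOTSTRAP`, and `BROKER_PROPS` are placeholders. The sketch assumes each broker was started with `kafka-server-start` and that file's absolute path, so it can stop only the broker process even if a controller runs on the same host. `BOOTSTRAP` should list several brokers so the check still works while one is down.

```shell
#!/usr/bin/env bash
# Sketch: rolling restart of East brokers, one at a time.
EAST_BROKERS=(east-1 east-2 east-3 east-4)
BOOTSTRAP="${BOOTSTRAP:-east-1:9092,east-2:9092}"
BROKER_PROPS="${BROKER_PROPS:-/opt/confluent/etc/kafka/kraft/broker.properties}"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# no_urp: succeed only if the command works AND reports no under-replicated partitions.
no_urp() {
  local out
  out=$(kafka-topics --bootstrap-server "$BOOTSTRAP" --describe --under-replicated-partitions) || return 1
  [ -z "$out" ]
}

rolling_restart() {
  local h pattern="[k]afka[.]Kafka.*$BROKER_PROPS"
  for h in "${EAST_BROKERS[@]}"; do
    run ssh "$h" "pkill -TERM -f '$pattern'"
    if [ "${DRY_RUN:-0}" != 1 ]; then
      while ssh "$h" "pgrep -f '$pattern'" >/dev/null; do sleep 5; done
    fi
    run ssh "$h" kafka-server-start -daemon "$BROKER_PROPS"
    if [ "${DRY_RUN:-0}" != 1 ]; then
      until no_urp; do sleep 10; done
    fi
  done
}
```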

ROLLBACK option: Change the properties back to their original configurations, and perform a rolling restart.

Step 16. Check all components are working properly

Check the controller logs to make sure everything is running properly.
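A quick scan for ERROR or FATAL entries is one way to check the logs. This sketch assumes Kafka's default log4j layout, where each line starts with `[timestamp] LEVEL`. The log paths depend on your installation. For the quorum itself, `kafka-metadata-quorum --bootstrap-server <broker:9092> describe --replication` lists every voter with its lag behind the leader.

```shell
#!/usr/bin/env bash
# Sketch: print ERROR/FATAL lines from controller logs, prefixed with the file name.
# Exits non-zero (grep's behavior) when nothing is found.

# log_problems <log-file>...
log_problems() {
  grep -HE '\] (ERROR|FATAL) ' "$@"
}
```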

Step 17. Alter topic configurations so they are utilizing Multi-Region Clusters capabilities

To alter the topic configurations to use the multi-region capabilities, refer to the examples in Replica placement in the multi-region clusters tutorial.
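A placement file keeps synchronous replicas in one region and observers in the other. The sketch below writes one and applies it to an existing topic. It assumes your Confluent Platform version's `kafka-configs` accepts `--replica-placement` (check `kafka-configs --help`). The rack names must match your `broker.rack` values. The racks, counts, and topic name are placeholders. Changing placement does not move existing replicas by itself; the rebalance step handles that.

```shell
#!/usr/bin/env bash
# Sketch: write a replica-placement file and apply it to an existing topic.

# write_placement <file> <replica-rack> <observer-rack>
write_placement() {
  cat > "$1" <<EOF
{
  "version": 1,
  "replicas": [{"count": 2, "constraints": {"rack": "$2"}}],
  "observers": [{"count": 2, "constraints": {"rack": "$3"}}]
}
EOF
}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# apply_placement <bootstrap-server> <topic> <placement-file>
apply_placement() {
  run kafka-configs --bootstrap-server "$1" --entity-type topics --entity-name "$2" \
    --alter --replica-placement "$3"
}
```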

Step 18. Rebalance the cluster

Either use the Confluent rebalancer tool to rebalance the cluster based on the replica placement from the previous step, or enable Self-Balancing Clusters as described in Manage Self-Balancing Kafka Clusters in Confluent Platform (Confluent Platform 6.0.0 and later).
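If you choose Self-Balancing Clusters, it can be switched on for all brokers through a cluster-wide dynamic broker config. This sketch assumes `confluent.balancer.enable` can be set dynamically in your version. The static alternative is to set `confluent.balancer.enable=true` in each broker's properties file and restart. The bootstrap address is a placeholder. `DRY_RUN=1` prints the command.

```shell
#!/usr/bin/env bash
# Sketch: enable Self-Balancing Clusters as a cluster-wide dynamic broker config.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# enable_sbc <bootstrap-server>
enable_sbc() {
  run kafka-configs --bootstrap-server "$1" --entity-type brokers --entity-default \
    --alter --add-config confluent.balancer.enable=true
}
```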

Step 19. Update consumer applications

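Finally, gradually update your consumer applications to set one additional property, client.rack, to the rack (region) where each consumer runs. With RackAwareReplicaSelector configured on the brokers, consumers can then fetch from a replica in their own region.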
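`RackAwareReplicaSelector` serves a consumer from a replica whose `broker.rack` equals the consumer's `client.rack`, and otherwise falls back to the leader. A mistyped rack therefore silently disables follower fetching. This sketch sets `client.rack` in a consumer properties file and checks that some broker actually uses that rack. File paths and rack names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: set client.rack for a consumer and verify it matches a broker.rack.

# known_rack <rack> <broker.properties>...: exit 0 if any broker uses that rack.
known_rack() {
  local rack=$1 f
  shift
  for f in "$@"; do
    if awk -F= -v r="$rack" '$1 == "broker.rack" && $2 == r { found = 1 } END { exit !found }' "$f"; then
      return 0
    fi
  done
  return 1
}

# set_client_rack <consumer.properties> <rack>: replace or append client.rack.
set_client_rack() {
  local tmp
  tmp=$(mktemp) || return 1
  grep -v '^client.rack=' "$1" > "$tmp"
  echo "client.rack=$2" >> "$tmp"
  cat "$tmp" > "$1"
  rm -f "$tmp"
}
```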