Multi-datacenter setup with Kafka-based primary election
The image above shows two datacenters, DC A and DC B. Either could be
on-premises, in Confluent Cloud, or part of a bridge-to-cloud solution. Each of the two datacenters has its own
Apache Kafka® cluster, ZooKeeper cluster, and Schema Registry.
The Schema Registry nodes in both datacenters link to the primary Kafka cluster in DC A, and
the secondary datacenter (DC B) forwards Schema Registry writes to the primary (DC A).
Note that Schema Registry nodes and hostnames must be addressable and routable
across the two sites to support this configuration.
Schema Registry instances in DC B have
leader.eligibility set to false, meaning that
none of them can be elected leader during steady-state operation while both datacenters are up.
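As a sketch, a Schema Registry properties file for an instance in DC B might look like the following (hostnames and ports are illustrative assumptions):

```properties
# Schema Registry in DC B (secondary): point at the primary Kafka cluster in DC A
kafkastore.bootstrap.servers=PLAINTEXT://kafka-a1.dca.example.com:9092,PLAINTEXT://kafka-a2.dca.example.com:9092
# Never allow this instance to be elected leader while DC A is up
leader.eligibility=false
```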
To protect against complete loss of DC A, Kafka cluster A (the source) is
replicated to Kafka cluster B (the target). This is achieved by running the
Replicator local to the target cluster (DC B).
In this active-passive setup, Replicator runs in one direction, copying Kafka data
and configurations from the active DC A to the passive DC B. The Schema Registry
instances in both data centers point to the internal
_schemas topic in DC A.
For the purposes of disaster recovery, you must replicate the
internal _schemas topic itself. If DC A goes down,
the system fails over to DC B, so DC B needs its own copy of the
topic.
Keep in mind, this failover scenario does not require the same overall configuration
needed to migrate schemas. So, do not set
schema.subject.translator.class, as you would
for a schema migration.
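A minimal Replicator configuration for this disaster-recovery copy of the schemas topic might look like the sketch below; the cluster addresses are illustrative assumptions, and schema.subject.translator.class is deliberately absent:

```properties
# Replicator runs local to the target cluster (DC B)
src.kafka.bootstrap.servers=kafka-a1.dca.example.com:9092
dest.kafka.bootstrap.servers=kafka-b1.dcb.example.com:9092
# Replicate the Schema Registry internal topic along with data topics
topic.whitelist=_schemas
# Do NOT set schema.subject.translator.class here; that is only for schema migration
```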
Producers write data to just the active cluster. Depending on the overall
design, consumers can read data from the active cluster only, leaving the
passive cluster for disaster recovery, or from both clusters to optimize reads
on a geo-local cache.
In the event of a partial or complete disaster in one datacenter, applications
can failover to the secondary datacenter.
kafkastore.bootstrap.servers: This should point to the primary Kafka cluster (DC A in this example).
schema.registry.group.id: Use this setting to override the
group.id for the Kafka consumer group used when Kafka-based leader election is enabled. Without this configuration,
group.id defaults to “schema-registry”. If you want to run more than one Schema Registry cluster against a single Kafka cluster, make this setting unique for each cluster.
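For example, to run two Schema Registry clusters against the same Kafka cluster, you could give each its own group via the override property, which is schema.registry.group.id (the group names below are assumptions):

```properties
# Config file for the first Schema Registry cluster
schema.registry.group.id=schema-registry-dc-a

# Config file for the second Schema Registry cluster
schema.registry.group.id=schema-registry-dc-b
```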
leader.eligibility: A Schema Registry server with
leader.eligibility set to false is guaranteed to remain a follower during any leader election. Schema Registry instances in a “secondary” datacenter should have this set to false, and Schema Registry instances local to the shared primary Kafka cluster should have this set to true.
Hostnames must be reachable and must resolve across datacenters to support forwarding of new schemas from DC B to DC A.
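Forwarding relies on each instance advertising a hostname that peers in the other datacenter can resolve; one way to make that explicit is the host.name setting (the hostname below is an illustrative assumption):

```properties
# Advertise a hostname that is routable from the other datacenter,
# so that forwarded write requests can reach this instance
host.name=sr-b-1.dcb.example.com
```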
If you have Schema Registry running in multiple datacenters and you lose your “primary” datacenter, what do you do? First, note that the remaining Schema Registry instances in the “secondary” datacenter can continue to serve any request that does not result in a write to Kafka. This includes GET requests on existing IDs and POST requests on schemas already in the registry. They will, however, be unable to register new schemas.
- If possible, revive the “primary” datacenter by starting Kafka and Schema Registry as before.
- If you must promote the “secondary” datacenter (DC B) to “primary”, reconfigure
kafkastore.bootstrap.servers in DC B to point to its local Kafka cluster, and update the Schema Registry config files to set
leader.eligibility to true.
- Restart your Schema Registry instances with these new configs in a rolling fashion.
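The promotion steps above can be sketched as an updated properties file for the DC B instances (addresses are illustrative assumptions):

```properties
# DC B Schema Registry after promotion to primary:
# point at the local (DC B) Kafka cluster instead of DC A
kafkastore.bootstrap.servers=PLAINTEXT://kafka-b1.dcb.example.com:9092
# Allow these instances to win leader election
leader.eligibility=true
```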