Important
You are viewing documentation for an older version of Confluent Platform. For the latest, click here.
Running Schema Registry in Production¶
This topic describes the key considerations before going to production with your cluster. However, it is not an exhaustive guide to running your Schema Registry in production.
Tip
Starting with Confluent Platform 5.2.0, best practice is to run the same versions of Schema Registry on all nodes in a cluster. Running different versions of Schema Registry in the same cluster with Confluent Platform 5.2.0 or newer will cause runtime errors that prevent the creation of new schema versions.
Hardware¶
If you’ve been following the normal development path, you’ve probably been playing with Schema Registry on your laptop or on a small cluster of machines laying around. But when it comes time to deploying Schema Registry to production, there are a few recommendations that you should consider. Nothing is a hard-and-fast rule.
Memory¶
Schema Registry uses Kafka as a commit log to store all registered schemas durably, and maintains a few in-memory indices to make schema lookups faster. A conservative upper bound on the number of unique schemas registered in a large data-oriented company like LinkedIn is around 10,000. Assuming roughly 1000 bytes heap overhead per schema on average, heap size of 1GB would be more than sufficient.
CPUs¶
CPU usage in Schema Registry is light. The most computationally intensive task is checking compatibility of two schemas, an infrequent operation which occurs primarily when new schemas versions are registered under a subject.
If you need to choose between faster CPUs or more cores, choose more cores. The extra concurrency that multiple cores offers will far outweigh a slightly faster clock speed.
Disks¶
Schema Registry does not have any disk resident data. It currently uses Kafka as a commit log to store all schemas durably and holds in-memory indices of all schemas. Therefore, the only disk usage comes from storing the log4j logs.
Network¶
A fast and reliable network is obviously important to performance in a distributed system. Low latency helps ensure that nodes can communicate easily, while high bandwidth helps shard movement and recovery. Modern data-center networking (1 GbE, 10 GbE) is sufficient for the vast majority of clusters.
Avoid clusters that span multiple data centers, even if the data centers are colocated in close proximity. Definitely avoid clusters that span large geographic distances.
Larger latencies tend to exacerbate problems in distributed systems and make debugging and resolution more difficult.
Often, people might assume the pipe between multiple data centers is robust or low latency. But this is usually not true and network failures might happen at some point. Please refer to our recommended Multi-Datacenter Setup.
JVM¶
We recommend running the latest version of JDK 1.8 with the G1 collector (older freely available versions have disclosed security vulnerabilities).
If you are still on JDK 1.7 (which is also supported) and you are planning to use G1 (the current default), make sure you’re on u51. We tried out u21 in testing, but we had a number of problems with the GC implementation in that version.
Our recommended GC tuning looks like this:
-Xms1g -Xmx1g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
-XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M \
-XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
Important Configuration Options¶
The following configurations should be changed for production environments. These options depend on your cluster layout.
For multi-cluster deployments, configure Schema Registry to use Kafka-based primary election as described below. (See Schema Registry Configuration Options for full descriptions of these properties.)
Kafka based primary election¶
Kafka based primary election is available in Confluent Platform 4.0 and later. This is the recommended method for leader election. To configure Schema Registry to use Kafka for primary election, configure the kafkastore.bootstrap.servers
setting.
kafkastore.bootstrap.servers¶
The Kafka cluster containing the bootstrap servers specified in kafkastore.bootstrap.servers
is used to coordinate Schema Registry instances (leader election), and store schema data.
When Kafka security is enabled, kafkastore.bootstrap.servers
is also used to specify security protocols that Schema Registry uses to connect to Kafka.
listeners¶
Schema Registry identities are stored in ZooKeeper and are made up of a hostname and port. If multiple listeners are configured, the first listener’s port is used for its identity.
host.name¶
The host name advertised in ZooKeeper. Make sure to set this if running Schema Registry with multiple nodes.
Note
Configure min.insync.replicas
on the Kafka server for the internal _schemas
topic that stores all registered
schemas to be higher than 1. For example, if the kafkastore.topic.replication.factor
is 3, then set
min.insync.replicas
on the Kafka server for the kafkastore.topic
to 2. This ensures that the
register schema write is considered durable if it gets committed on at least 2 replicas out of 3. Furthermore, it
is best to set unclean.leader.election.enable
to false so that a replica outside of the isr is never elected
leader (potentially resulting in data loss).
The full set of configuration options are documented in Schema Registry Configuration Options.
Don’t Modify These Storage Settings¶
Schema Registry stores all schemas in a Kafka topic defined by kafkastore.topic
. Since this Kafka topic acts as the commit log for Schema Registry database and is the source of truth, writes to this store need to be durable. Schema Registry ships with very good defaults for all settings that affect the durability of writes to the Kafka based commit log. Finally, kafkastore.topic
must be a compacted topic to avoid data loss. Whenever in doubt, leave these settings alone. If you must create the topic manually, this is an example of proper configuration:
# kafkastore.topic=_schemas
bin/kafka-topics --create --bootstrap-server localhost:9092 --topic _schemas --replication-factor 3 --partitions 1 --config cleanup.policy=compact
kafkastore.topic¶
The durable single partition topic that acts as the durable log for the data. This topic must be compacted to avoid losing data due to retention policy.
- Type: string
- Default: “_schemas”
- Importance: high
kafkastore.topic.replication.factor¶
The desired replication factor of the schema topic. The actual replication factor will be the smaller of this value and the number of live Kafka brokers.
- Type: int
- Default: 3
- Importance: high
kafkastore.init.timeout.ms¶
The timeout for initialization of the Kafka store, including creation of the Kafka topic that stores schema data.
- Type: int
- Default: 60000
- Importance: medium
Migration from ZooKeeper primary election to Kafka primary election¶
Important
ZooKeeper leader election is deprecated. Kafka leader election should be used instead.
If you choose to migrate from ZooKeeper based to Kafka based primary election, make the following configuration changes in all Schema Registry nodes:
- Remove
kafkastore.connection.url
. - Remove
schema.registry.zk.namespace
if it is configured. - Configure
kafkastore.bootstrap.servers
. - Configure
schema.registry.group.id
if you originally hadschema.registry.zk.namespace
for multiple Schema Registry clusters.
If both kafkastore.connection.url
and kafkastore.bootstrap.servers
are configured, Kafka will be used for leader election.
(Previous to Confluent Platform version 5.5.0, if both were configured, ZooKeeper was used for leader election.)
Downtime for Writes¶
You can migrate from ZooKeeper based primary election to Kafka based primary election by following below outlined steps. These steps would lead to Schema Registry not being available for writes for a brief amount of time.
- Make above outlined config changes on that node and also ensure
master.eligibility
is set to false in all the nodes - Do a rolling bounce of all the nodes.
- Configure
master.eligibility
to true on the nodes that can be primary eligible and bounce them
Complete Downtime¶
If you want to keep things simple, you can take a temporary downtime for Schema Registry and do the migration. To do so, simply shutdown all the nodes and start them again with the new configs.
Backup and Restore¶
As discussed in Kafka Backend, all schemas, subject/version and
ID metadata, and compatibility settings are appended as messages to a special
Kafka topic <kafkastore.topic>
(default _schemas
). This topic is a common
source of truth for schema IDs, and you should back it up. In case of some
unexpected event that makes the topic inaccessible, you can restore this schemas
topic from the backup, enabling consumers to continue to read Kafka messages that
were sent in the Avro format.
As a best practice, we recommend backing up the <kafkastore.topic>
. You have
three different options for doing so, as described below.
Backups using Replicator¶
If you already have a multi-datacenter Kafka deployment, you can back up the
<kafkastore.topic>
(_schemas
) to another Kafka cluster using Confluent Replicator.
Backups using a sink connector¶
Alternatively, you can use a sink-to-storage connector, like the Kafka Connect Amazon S3
sink connector, to back up your data to a separate storage (for
example, AWS S3) and continuously keep it updated in storage. Note that the
<kafkastore.topic>
(_schemas
) must be copied over as raw data
(bytes). There are a couple of ways to do this using the Connect S3 sink
connector:
- To store Kafka records and the
_schemas
topic in S3, use theByteArrayConverter
to back up the_schemas
topic to S3 as raw data. You can then use thetopics
ortopics.regex
configuration property to list other Kafka topics to back up to S3. (See Connect Sink Configurations for descriptions oftopics
andtopics.regex
.) - If you want to store Kafka records in JSON, AVRO, or Parquet, you can use one S3 sink connector
to back up a list of Kafka topics to S3. You then create a second S3 sink connector to copy the
_schemas
topic to S3 usingBytes
format.
Backups using command line tools¶
In lieu of either of the above options, you can use Kafka command line tools to
periodically save the contents of the topic to a file. For the following
examples, we assume that <kafkastore.topic>
has its default value
_schemas
.
To back up the topic using command line tools, use the
kafka-console-consumer
to capture messages from the schemas topic to a file
called schemas.log
. Save this file off the Kafka cluster.
bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic _schemas --from-beginning --property print.key=true --timeout-ms 1000 1> schemas.log
To restore the topic, use the kafka-console-producer
to write the contents
of file schemas.log
to a new schemas topic. This examples uses a new schemas
topic name _schemas_restore
. If you use a new topic name or use the old one
(_schemas
), make sure to set <kafkastore.topic>
accordingly.
bin/kafka-console-producer --broker-list localhost:9092 --topic _schemas_restore --property parse.key=true < schemas.log