.. _schema-registry-prod:

Deploy |sr| in Production on |cp|
=================================

This topic describes the key considerations before going to production with your cluster. However, it is not an exhaustive guide to running your |sr| in production.

========
Hardware
========

If you have been following the normal development path, you have probably been playing with |sr| on your laptop or on a small cluster of machines lying around. When it comes time to deploy |sr| to production, there are a few recommendations that you should consider. Nothing is a hard-and-fast rule.

======
Memory
======

|sr| uses |ak| as a commit log to store all registered schemas durably, and maintains a few in-memory indices to make schema lookups faster. A conservative upper bound on the number of unique schemas registered in a large data-oriented company like LinkedIn is around 10,000. Assuming roughly 1,000 bytes of heap overhead per schema on average, that is only about 10 MB of index data, so a heap size of 1 GB is more than sufficient.

====
CPUs
====

CPU usage in |sr| is light. The most computationally intensive task is checking the compatibility of two schemas, an infrequent operation that occurs primarily when new schema versions are registered under a subject.

If you need to choose between faster CPUs or more cores, choose more cores. The extra concurrency that multiple cores offer will far outweigh a slightly faster clock speed.

=====
Disks
=====

|sr| does not have any disk-resident data. It currently uses |ak| as a commit log to store all schemas durably and holds in-memory indices of all schemas. Therefore, the only disk usage comes from storing the log4j logs.

=======
Network
=======

A fast and reliable network is obviously important to performance in a distributed system. Low latency helps ensure that nodes can communicate easily, while high bandwidth helps data movement and recovery. Modern data-center networking (1 GbE, 10 GbE) is sufficient for the vast majority of clusters.
Avoid clusters that span multiple data centers, even if the data centers are colocated in close proximity. Definitely avoid clusters that span large geographic distances. Larger latencies tend to exacerbate problems in distributed systems and make debugging and resolution more difficult. People often assume that the pipe between multiple data centers is robust and low latency, but this is usually not true, and network failures will happen at some point. See the recommended :ref:`schemaregistry_mirroring`.

===
JVM
===

We recommend running the latest version of JDK 1.8 with the G1 collector (older freely available versions have disclosed security vulnerabilities). If you are still on JDK 1.7 (which is also supported) and you are planning to use G1 (the current default), make sure you are on u51. We tried out u21 in testing, but we had a number of problems with the GC implementation in that version.

Our recommended GC tuning looks like this:

.. sourcecode:: bash

   -Xms1g -Xmx1g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
   -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M \
   -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80

.. _schema-reg-config:

===============================
Important Configuration Options
===============================

The following configurations should be changed for production environments. These options depend on your cluster layout. For multi-cluster deployments, configure |sr| to use |ak|-based primary election as described below. (See :ref:`schemaregistry_config` for full descriptions of these properties.)

|ak| based primary election
---------------------------

|ak| based primary election is available in |cp| 4.0 and later. This is the recommended method for leader election. To configure |sr| to use |ak| for primary election, configure the ``kafkastore.bootstrap.servers`` setting.

----------------------------
kafkastore.bootstrap.servers
----------------------------

.. line numbers for kafka.bootstrap.servers include to control heading levels here, and has anchor

.. include:: ../includes/shared-config.rst
   :start-line: 27
   :end-line: 34

---------
listeners
---------

.. sr-listeners include, requires line number reference because heading has an anchor associated with it, and for heading levels here

.. include:: ../includes/shared-config.rst
   :start-line: 41
   :end-line: 47

---------
host.name
---------

.. line numbers for host.name include to control heading levels here

.. include:: ../includes/shared-config.rst
   :start-line: 128
   :end-line: 136

.. note:: Configure ``min.insync.replicas`` on the |ak| server for the internal ``_schemas`` topic that stores all registered schemas to be higher than 1. For example, if the ``kafkastore.topic.replication.factor`` is 3, then set ``min.insync.replicas`` on the |ak| server for the ``kafkastore.topic`` to 2. This ensures that a register schema write is considered durable only if it is committed on at least 2 replicas out of 3. Furthermore, it is best to set ``unclean.leader.election.enable`` to ``false`` so that a replica outside of the ISR is never elected leader (potentially resulting in data loss).

The full set of configuration options is documented in :ref:`schemaregistry_config`.

|zk| based primary election
---------------------------

.. important:: |zk| leader election and use of :ref:`kafkastore-connection-url` for |zk| leader election were removed in |cp| 7.0.0. |ak| leader election should be used instead. For details, see :ref:`schemaregistry_zk_migration`.

To migrate from |zk| based to |ak| based primary election, see the :ref:`schemaregistry_zk_migration` details.

===================================
Don't Modify These Storage Settings
===================================

|sr| stores all schemas in a |ak| topic defined by ``kafkastore.topic``. Since this |ak| topic acts as the commit log for the |sr| database and is the source of truth, writes to this store need to be durable.
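Because this topic is the source of truth, it is worth verifying its settings, and applying the ``min.insync.replicas`` recommendation from the note above, before going to production. The following is a minimal sketch, assuming a broker reachable at ``localhost:9092`` and the default topic name ``_schemas``:

.. sourcecode:: bash

   # Inspect the current configuration of the schemas topic
   bin/kafka-topics --describe --bootstrap-server localhost:9092 --topic _schemas

   # Require at least 2 in-sync replicas and disable unclean leader election
   bin/kafka-configs --bootstrap-server localhost:9092 --alter \
     --entity-type topics --entity-name _schemas \
     --add-config min.insync.replicas=2,unclean.leader.election.enable=false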
|sr| ships with very good defaults for all settings that affect the durability of writes to the |ak| based commit log. Finally, ``kafkastore.topic`` must be a compacted topic to avoid data loss. When in doubt, leave these settings alone. If you must create the topic manually, this is an example of a proper configuration:

.. sourcecode:: bash

   # kafkastore.topic=_schemas
   bin/kafka-topics --create --bootstrap-server localhost:9092 --topic _schemas \
     --replication-factor 3 --partitions 1 --config cleanup.policy=compact

.. kafkastore.topic include

.. include:: ../includes/shared-config.rst
   :start-after: sr-kafkastore-topic-start
   :end-before: sr-kafkastore-topic-end

.. kafkastore.topic.replication.factor include

.. include:: ../includes/shared-config.rst
   :start-after: sr-kafkastore-topic-replication-factor-start
   :end-before: sr-kafkastore-topic-replication-factor-end

.. kafkastore.timeout.ms include

.. include:: ../includes/shared-config.rst
   :start-after: sr-kafkastore-timeout-start
   :end-before: sr-kafkastore-timeout-end

.. _schemaregistry_zk_migration:

=============================================================
Migration from |zk| primary election to |ak| primary election
=============================================================

.. important:: |zk| leader election and use of :ref:`kafkastore-connection-url` for |zk| leader election were removed in |cp| 7.0. |ak| leader election should be used instead. This section explains how to reconfigure for |ak| leader election.

To migrate from |zk| based to |ak| based primary election, make the following configuration changes on all |sr| nodes:

- Remove ``kafkastore.connection.url``.
- Remove ``schema.registry.zk.namespace`` if it is configured.
- Configure ``kafkastore.bootstrap.servers``.
- Configure ``schema.registry.group.id`` if you originally had ``schema.registry.zk.namespace`` for multiple |sr| clusters.
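As a hedged illustration (host names and the group ID are placeholders, not values from this document), the changes above might look like this in each node's ``schema-registry.properties``:

.. sourcecode:: bash

   # Before (ZooKeeper based election) -- remove these lines:
   # kafkastore.connection.url=zk-host:2181
   # schema.registry.zk.namespace=schema_registry

   # After (Kafka based election):
   kafkastore.bootstrap.servers=PLAINTEXT://kafka-host-1:9092,PLAINTEXT://kafka-host-2:9092
   schema.registry.group.id=schema-registry-cluster-1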
If both ``kafkastore.connection.url`` and ``kafkastore.bootstrap.servers`` are configured, |ak| will be used for leader election.

===================
Downtime for Writes
===================

You can migrate from |zk| based primary election to |ak| based primary election by following the steps outlined below. These steps make |sr| unavailable for writes for a brief period of time.

- Make the configuration changes outlined above on every node, and ensure ``leader.eligibility`` is set to ``false`` on all of them.
- Do a rolling bounce of all the nodes.
- Set ``leader.eligibility`` to ``true`` on the nodes that should be eligible to become primary, and bounce them.

=================
Complete Downtime
=================

If you want to keep things simple, you can take a temporary downtime for |sr| and do the migration. To do so, simply shut down all the nodes and start them again with the new configs.

==================
Backup and Restore
==================

As discussed in :ref:`schemaregistry_design`, all schemas, subject/version and ID metadata, and compatibility settings are appended as messages to a special |ak| topic defined by ``kafkastore.topic`` (default ``_schemas``). This topic is a common source of truth for schema IDs, and you should back it up. In case some unexpected event makes the topic inaccessible, you can restore it from the backup, enabling consumers to continue to read |ak| messages that were sent in the Avro format.

As a best practice, we recommend backing up the ``kafkastore.topic``. You have three different options for doing so, as described below.

Backups using |crep|
--------------------

If you already have a multi-datacenter |ak| deployment, you can back up the ``kafkastore.topic`` (``_schemas``) to another |ak| cluster using :ref:`Confluent Replicator `.
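As a rough sketch of the |crep| approach (property names can vary by version, so treat this as illustrative and consult the |crep| documentation for the authoritative configuration), a connector that copies only the schemas topic to a backup cluster might look like:

.. sourcecode:: bash

   # replicator-schemas.properties (illustrative values only)
   name=schemas-topic-backup
   connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
   # Copy only the schemas topic
   topic.whitelist=_schemas
   src.kafka.bootstrap.servers=source-kafka:9092
   dest.kafka.bootstrap.servers=backup-kafka:9092
   # Schemas must be copied as raw bytes
   key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
   value.converter=org.apache.kafka.connect.converters.ByteArrayConverter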
Backups using a sink connector
------------------------------

Alternatively, you can use a sink-to-storage connector, like the `Kafka Connect Amazon S3 sink connector `__, to back up your data to separate storage (for example, AWS S3) and continuously keep it updated in storage. Note that the ``kafkastore.topic`` (``_schemas``) must be copied over as raw data (bytes). There are a couple of ways to do this using the |kconnect| S3 sink connector:

- To store |ak| records and the ``_schemas`` topic in S3, use the ``ByteArrayConverter`` to back up the ``_schemas`` topic to S3 as raw data. You can then use the ``topics`` or ``topics.regex`` configuration property to list other |ak| topics to back up to S3. (See :ref:`cp-config-sink-connect` for descriptions of ``topics`` and ``topics.regex``.)
- If you want to store |ak| records in JSON, AVRO, or Parquet, you can use one S3 sink connector to back up a list of |ak| topics to S3. You then create a second S3 sink connector to copy the ``_schemas`` topic to S3 using ``Bytes`` format.

Backups using command line tools
--------------------------------

In lieu of either of the above options, you can use |ak| command line tools to periodically save the contents of the topic to a file. For the following examples, we assume that ``kafkastore.topic`` has its default value ``_schemas``.

To back up the topic using command line tools, use the ``kafka-console-consumer`` to capture messages from the schemas topic to a file called ``schemas.log``. Save this file off the |ak| cluster.

.. sourcecode:: bash

   bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic _schemas \
     --from-beginning --property print.key=true --timeout-ms 1000 1> schemas.log

To restore the topic, use the ``kafka-console-producer`` to write the contents of the file ``schemas.log`` to a new schemas topic. This example uses a new schemas topic named ``_schemas_restore``. Whether you use a new topic name or the old one (``_schemas``), make sure to set ``kafkastore.topic`` accordingly.

.. sourcecode:: bash

   bin/kafka-console-producer --broker-list localhost:9092 --topic _schemas_restore \
     --property parse.key=true < schemas.log
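If you restored to a new topic name as in the example above, remember to point |sr| at it. A minimal sketch of the relevant line in ``schema-registry.properties``:

.. sourcecode:: bash

   # Only needed when restoring to a topic other than the original _schemas
   kafkastore.topic=_schemas_restore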