.. _schema_evolution_and_compatibility:

Schema Evolution and Compatibility
==================================

Schema Evolution
----------------

An important aspect of data management is schema evolution. After the initial schema is defined, applications may need to evolve it over time. When this happens, it's critical for the downstream consumers to be able to handle data encoded with both the old and the new schema seamlessly. This is an area that tends to be overlooked in practice until you run into your first production issues. Without thinking through data management and schema evolution carefully, people often pay a much higher cost later on.

When using Avro or other schema formats, one of the most important things is to manage the schemas and consider how these schemas should evolve. :ref:`Confluent Schema Registry ` is built for exactly that purpose.

Schema compatibility checking is implemented in |sr| by versioning every single schema. The compatibility type determines how |sr| compares the new schema with previous versions of a schema, for a given subject. When a schema is first created for a subject, it gets a unique id and it gets a version number, i.e., version 1. When the schema is updated (if it passes compatibility checks), it gets a new unique id and it gets an incremented version number, i.e., version 2.

.. _sr_compatibility_types:

Compatibility Types
-------------------

Summary
^^^^^^^

The following table presents a summary of the types of schema changes allowed for the different compatibility types, for a given subject. The |sr-long| default compatibility type is ``BACKWARD``. All the compatibility types are described in more detail in the sections below. See also, :ref:`configuration options` on connectors to |sr| that provide further control over compatibility requirements.


+--------------------------+-----------------------------+----------------------------------+-------------------+
| Compatibility Type       | Changes allowed             | Check against which schemas      | Upgrade first     |
+==========================+=============================+==================================+===================+
| ``BACKWARD``             | - Delete fields             | Last version                     | Consumers         |
|                          | - Add optional fields       |                                  |                   |
+--------------------------+-----------------------------+----------------------------------+-------------------+
| ``BACKWARD_TRANSITIVE``  | - Delete fields             | All previous versions            | Consumers         |
|                          | - Add optional fields       |                                  |                   |
+--------------------------+-----------------------------+----------------------------------+-------------------+
| ``FORWARD``              | - Add fields                | Last version                     | Producers         |
|                          | - Delete optional fields    |                                  |                   |
+--------------------------+-----------------------------+----------------------------------+-------------------+
| ``FORWARD_TRANSITIVE``   | - Add fields                | All previous versions            | Producers         |
|                          | - Delete optional fields    |                                  |                   |
+--------------------------+-----------------------------+----------------------------------+-------------------+
| ``FULL``                 | - Add optional fields       | Last version                     | Any order         |
|                          | - Delete optional fields    |                                  |                   |
+--------------------------+-----------------------------+----------------------------------+-------------------+
| ``FULL_TRANSITIVE``      | - Add optional fields       | All previous versions            | Any order         |
|                          | - Delete optional fields    |                                  |                   |
+--------------------------+-----------------------------+----------------------------------+-------------------+
| ``NONE``                 | - All changes are accepted  | Compatibility checking disabled  | Depends           |
+--------------------------+-----------------------------+----------------------------------+-------------------+

.. _avro-backward_compatibility:

Backward Compatibility
^^^^^^^^^^^^^^^^^^^^^^

``BACKWARD`` compatibility means that consumers using the new schema can read data produced with the last schema. For example, if there are three schemas for a subject that change in order `X-2`, `X-1`, and `X` then ``BACKWARD`` compatibility ensures that consumers using the new schema `X` can process data written by producers using schema `X` or `X-1`, but not necessarily `X-2`. If the consumer using the new schema needs to be able to process data written by all registered schemas, not just the last two schemas, then use ``BACKWARD_TRANSITIVE`` instead of ``BACKWARD``. For example, if there are three schemas for a subject that change in order `X-2`, `X-1`, and `X` then ``BACKWARD_TRANSITIVE`` compatibility ensures that consumers using the new schema `X` can process data written by producers using schema `X`, `X-1`, or `X-2`.

* ``BACKWARD``: consumer using schema `X` can process data produced with schema `X` or `X-1`
* ``BACKWARD_TRANSITIVE``: consumer using schema `X` can process data produced with schema `X`, `X-1`, or `X-2`

.. note::

   The |sr-long| default compatibility type is ``BACKWARD``, not ``BACKWARD_TRANSITIVE``.

An example of a backward compatible change is the removal of a field. A consumer that was developed to process events without this field can still process events that were written with the old schema and contain the field; the consumer simply ignores that field.

Consider the case where all of the data in |ak| is also loaded into HDFS, and you want to run SQL queries (for example, using Apache Hive) over all the data. Here, it is important that the same SQL queries continue to work even as the data is undergoing changes over time. To support this kind of use case, you can evolve the schemas in a backward compatible way. All :ref:`supported schema formats ` have rules as to what changes are allowed in the new schema for it to be backward compatible.
For example, here are the `Avro rules for compatibility `__. If all schemas are evolved in a backward compatible way, we can always use the latest schema to query all the data uniformly.

For example, an application can evolve the :ref:`user schema from the previous section ` to the following by adding a new field ``favorite_color``:

.. sourcecode:: json

   {"namespace": "example.avro",
    "type": "record",
    "name": "user",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": "int"},
        {"name": "favorite_color", "type": "string", "default": "green"}
    ]
   }

Note that the new field ``favorite_color`` has the default value "green". This allows data encoded with the old schema to be read with the new one. The default value specified in the new schema will be used for the missing field when deserializing the data encoded with the old schema. Had the default value been omitted in the new field, the new schema would not be backward compatible with the old one since it's not clear what value should be assigned to the new field, which is missing in the old data.

.. note::

   **Avro implementation details:** Take a look at `ResolvingDecoder `__ in the Apache Avro project to understand how, for data that was encoded with an older schema, Avro decodes that data with a newer, backward-compatible schema.

.. _avro-forward_compatibility:

Forward Compatibility
^^^^^^^^^^^^^^^^^^^^^

``FORWARD`` compatibility means that data produced with a new schema can be read by consumers using the last schema, even though they may not be able to use the full capabilities of the new schema. For example, if there are three schemas for a subject that change in order `X-2`, `X-1`, and `X` then ``FORWARD`` compatibility ensures that data written by producers using the new schema `X` can be processed by consumers using schema `X` or `X-1`, but not necessarily `X-2`.
If data produced with a new schema needs to be read by consumers using all registered schemas, not just the last two schemas, then use ``FORWARD_TRANSITIVE`` instead of ``FORWARD``. For example, if there are three schemas for a subject that change in order `X-2`, `X-1`, and `X` then ``FORWARD_TRANSITIVE`` compatibility ensures that data written by producers using the new schema `X` can be processed by consumers using schema `X`, `X-1`, or `X-2`.

* ``FORWARD``: data produced using schema `X` can be read by consumers with schema `X` or `X-1`
* ``FORWARD_TRANSITIVE``: data produced using schema `X` can be read by consumers with schema `X`, `X-1`, or `X-2`

An example of a forward compatible schema modification is adding a new field. In most data formats, consumers that were written to process events without the new field will be able to continue doing so even when they receive new events that contain the new field.

Consider a use case where a consumer has application logic tied to a particular version of the schema. When the schema evolves, the application logic may not be updated immediately. Therefore, you need to be able to project data with newer schemas onto the (older) schema that the application understands. To support this use case, you can evolve the schemas in a forward compatible way: data encoded with the new schema can be read with the old schema. For example, the new user schema shown in the previous section on :ref:`backward compatibility ` is also forward compatible with the old one. When projecting data written with the new schema to the old one, the new field is simply dropped. Had the new schema dropped the original field ``favorite_number`` (number, not color), it would not be forward compatible with the original user schema since consumers wouldn't know how to fill in the value for ``favorite_number`` for the new data because the original schema did not specify a default value for that field.

.. _avro-full_compatibility:

Full Compatibility
^^^^^^^^^^^^^^^^^^

``FULL`` compatibility means schemas are both backward **and** forward compatible. Schemas evolve in a fully compatible way: old data can be read with the new schema, and new data can also be read with the last schema. For example, if there are three schemas for a subject that change in order `X-2`, `X-1`, and `X` then ``FULL`` compatibility ensures that consumers using the new schema `X` can process data written by producers using schema `X` or `X-1`, but not necessarily `X-2`, and that data written by producers using the new schema `X` can be processed by consumers using schema `X` or `X-1`, but not necessarily `X-2`. If the new schema needs to be forward and backward compatible with all registered schemas, not just the last two schemas, then use ``FULL_TRANSITIVE`` instead of ``FULL``. For example, if there are three schemas for a subject that change in order `X-2`, `X-1`, and `X` then ``FULL_TRANSITIVE`` compatibility ensures that consumers using the new schema `X` can process data written by producers using schema `X`, `X-1`, or `X-2`, and that data written by producers using the new schema `X` can be processed by consumers using schema `X`, `X-1`, or `X-2`.

* ``FULL``: backward and forward compatible between schemas `X` and `X-1`
* ``FULL_TRANSITIVE``: backward and forward compatible between schemas `X`, `X-1`, and `X-2`

In some data formats, such as JSON, there are no fully compatible changes. Every modification is either only forward or only backward compatible. But in other data formats, like Avro, you can define fields with default values. In that case adding or removing a field with a default value is a fully compatible change.

.. _avro-none_compatibility:

No Compatibility Checking
^^^^^^^^^^^^^^^^^^^^^^^^^

``NONE`` compatibility type means schema compatibility checks are disabled.

Sometimes an incompatible change is unavoidable, for example, changing a field type from ``Number`` to ``String``.
In this case, you will either need to upgrade all producers and consumers to the new schema version at the same time or, more likely, create a brand-new topic and start migrating applications to use the new topic and new schema, avoiding the need to handle two incompatible versions in the same topic.

Transitive Property
-------------------

.. include:: includes/transitive.rst

Order of Upgrading Clients
--------------------------

The configured compatibility type has an implication on the order for upgrading client applications, i.e., the producers using schemas to write events to Kafka and the consumers using schemas to read events from Kafka. Depending on the compatibility type:

* ``BACKWARD`` or ``BACKWARD_TRANSITIVE``: there is no assurance that consumers using older schemas can read data produced using the new schema. Therefore, upgrade all consumers before you start producing new events.
* ``FORWARD`` or ``FORWARD_TRANSITIVE``: there is no assurance that consumers using the new schema can read data produced using older schemas. Therefore, first upgrade all producers to use the new schema and make sure the data already produced using the older schemas is not available to consumers, then upgrade the consumers.
* ``FULL`` or ``FULL_TRANSITIVE``: there are assurances that consumers using older schemas can read data produced using the new schema and that consumers using the new schema can read data produced using older schemas. Therefore, you can upgrade the producers and consumers independently.
* ``NONE``: compatibility checks are disabled. Therefore, you need to be cautious about when to upgrade clients.

Examples
--------

Each of the sections above has an example of the compatibility type. An additional reference for Avro is `Avro compatibility test suite `__, which presents multiple test cases with two schemas and the respective result of the compatibility test between them.

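The rules summarized in the compatibility table can be sketched in a few lines of Python. This is a deliberately simplified model, not the algorithm |sr| or Avro actually runs: it assumes flat record schemas, treats any type change as incompatible (real Avro permits some type promotions), and the helper names ``can_read`` and ``is_compatible`` are hypothetical. The ``V1`` schema is assumed to be the original user schema with only ``name`` and ``favorite_number``.

```python
def _fields(schema):
    """Map field name -> field definition for a flat record schema."""
    return {field["name"]: field for field in schema["fields"]}


def can_read(reader, writer):
    """True if data written with `writer` can be decoded with `reader`.

    Simplified rule: every reader field must either exist in the writer
    with the same type, or carry a default value in the reader schema.
    Writer fields unknown to the reader are simply ignored.
    """
    writer_fields = _fields(writer)
    for name, reader_field in _fields(reader).items():
        if name in writer_fields:
            if writer_fields[name]["type"] != reader_field["type"]:
                return False  # assumption: no type promotion in this sketch
        elif "default" not in reader_field:
            return False
    return True


def is_compatible(new, versions, level):
    """Check `new` against registered `versions` (oldest first)."""
    if level == "NONE" or not versions:
        return True
    base, _, suffix = level.partition("_")
    # Non-transitive levels compare against the last version only.
    targets = versions if suffix == "TRANSITIVE" else versions[-1:]
    checks = {
        "BACKWARD": lambda old: can_read(new, old),
        "FORWARD": lambda old: can_read(old, new),
        "FULL": lambda old: can_read(new, old) and can_read(old, new),
    }
    return all(checks[base](old) for old in targets)


def record(*fields):
    """Build a flat record schema named `user` from field definitions."""
    return {"type": "record", "name": "user", "fields": list(fields)}


NAME = {"name": "name", "type": "string"}
NUMBER = {"name": "favorite_number", "type": "int"}
COLOR = {"name": "favorite_color", "type": "string", "default": "green"}
COLOR_NO_DEFAULT = {"name": "favorite_color", "type": "string"}

V1 = record(NAME, NUMBER)          # assumed original user schema
V2 = record(NAME, NUMBER, COLOR)   # adds an optional field

if __name__ == "__main__":
    print(is_compatible(V2, [V1], "FULL"))                    # True
    print(is_compatible(record(NAME, COLOR), [V1], "FORWARD"))  # False
```

Adding ``favorite_color`` with a default passes ``FULL``, while dropping ``favorite_number`` fails ``FORWARD`` because the old reader has no default for it. The same helper also shows why the transitive variants matter: a third version that removes the default from ``favorite_color`` passes ``BACKWARD`` against ``V2`` but fails ``BACKWARD_TRANSITIVE`` against ``V1``.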
Using Compatibility Types
-------------------------

Compatibility rules and references for all supported schema types are described in :ref:`sr-serdes-schemas-compatibility-checks` in :ref:`serializer_and_formatter`.

For details on how to use |sr| to store Avro schemas and enforce certain compatibility rules during schema evolution, see the :ref:`schemaregistry_api`. Here are some tips to get you started.

To check the currently configured compatibility type, view the configured setting:

#. :ref:`Using the Schema Registry REST API `.

To set the compatibility level, you can configure it in the following ways:

#. :ref:`In your client application `.
#. :ref:`Using the Schema Registry REST API `.
#. Using the |c3-short| Edit Schema feature. See :ref:`topicschema`.

To validate the compatibility of a given schema, you can test it in one of two ways:

#. :ref:`Using the Schema Registry Maven Plugin `.
#. :ref:`Using the Schema Registry REST API `.

Refer to the |sr-long| Tutorial which has an example of :ref:`checking schema compatibility `.
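The three REST API tasks listed above (view, set, and test compatibility) can be sketched with the Python standard library. This is a minimal illustration, not an official client: the base URL ``http://localhost:8081`` and the subject name ``users-value`` are assumptions, and the helper function names are hypothetical. The functions only build the requests, so you can inspect them before sending anything to a running |sr|.

```python
import json
import urllib.parse
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def get_config_request(base_url, subject=None):
    """GET the global compatibility setting, or a subject-level override."""
    path = "/config"
    if subject is not None:
        path += "/" + urllib.parse.quote(subject, safe="")
    return urllib.request.Request(base_url + path, method="GET")


def set_compatibility_request(base_url, subject, level):
    """PUT a compatibility type (for example ``FULL``) for one subject."""
    body = json.dumps({"compatibility": level}).encode("utf-8")
    url = base_url + "/config/" + urllib.parse.quote(subject, safe="")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": CONTENT_TYPE}, method="PUT"
    )


def check_compatibility_request(base_url, subject, schema, version="latest"):
    """POST a candidate schema to test it against a registered version."""
    # The registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    url = "{}/compatibility/subjects/{}/versions/{}".format(
        base_url, urllib.parse.quote(subject, safe=""), version
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": CONTENT_TYPE}, method="POST"
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    base = "http://localhost:8081"   # assumed local Schema Registry
    candidate = {
        "type": "record",
        "name": "user",
        "namespace": "example.avro",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "favorite_number", "type": "int"},
            {"name": "favorite_color", "type": "string", "default": "green"},
        ],
    }
    request = check_compatibility_request(base, "users-value", candidate)
    print(request.get_method(), request.full_url)
```

With a registry running, ``send(request)`` on the compatibility check should return a JSON object containing an ``is_compatible`` flag; the other two helpers can be sent the same way.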