.. _schemaregistry_intro:

|sr| Overview
===============

Schema Registry provides a centralized repository for managing and validating schemas for topic message data, and for :term:`serialization ` and :term:`deserialization ` of the data over the network. :term:`Producers ` and :term:`consumers ` of |ak| topics can use schemas to ensure data consistency and compatibility as schemas evolve. |sr| is a key component for :cloud:`data governance|stream-governance/index.html#stream-governance-on-ccloud`, helping to ensure data quality, adherence to standards, visibility into data lineage, audit capabilities, collaboration across teams, efficient application development protocols, and system performance.

Quick starts
------------

- The |sr| tutorials provide full walkthroughs on how to enable client applications to read and write Avro data, check schema version compatibility, and use the UIs to manage schemas.
- Sign up for :ccloud-cta:`Confluent Cloud|` and use the :cloud:`Confluent Cloud Schema Registry Tutorial|sr/schema_registry_ccloud_tutorial.html` to get started.
- Download :cp-download:`Confluent Platform|` and use the :platform:`Confluent Platform Schema Registry Tutorial|schema-registry/schema_registry_tutorial.html` to get started.
- For a quick hands-on introduction, jump to the `Schema Registry module of the free Apache Kafka 101 `__ course to learn why you would need a |sr|, what it is, and how to get started. Also see the free `Schema Registry 101 `__ course to learn about the schema formats and how to build, register, manage, and evolve schemas.
- On |ccloud|, try out the interactive tutorials embedded in the |ccloud-console-short|. `Take this link to sign up or sign in to Confluent Cloud `__, and try out the guided workflows directly in |ccloud|.
- To learn about :ref:`schema formats `, create schemas, and use producers and consumers to send messages to topics, see :ref:`sr-test-drive-avro`, :ref:`sr-test-drive-protobuf`, and :ref:`sr-test-drive-json-schema`.

About |sr|
----------

|sr| is a key component of :cloud:`Stream Governance|stream-governance/index.html`, available in the Essentials and Advanced :cloud:`Stream Governance packages|stream-governance/packages.html`, and a premium feature of self-managed |cp|. It provides a centralized repository for managing and validating schemas used in data processing and serialization (into and out of binary format). As part of an event-based system, Confluent :term:`brokers ` use |sr| to intelligently transfer |ak| topic message data and events between producers and consumers.

.. figure:: ../images/schema-registry-ecosystem.jpg
   :align: center

|sr| provides several benefits, including data validation, compatibility checking, versioning, and evolution. It also simplifies the development and maintenance of data pipelines and reduces the risk of data compatibility issues, data corruption, and data loss.

|sr| enables you to define schemas for your data formats and versions, and register them with the registry. Once registered, a schema can be shared and reused across different systems and applications. When a producer sends data to a message broker, an identifier for the data's schema travels with the message, and |sr| ensures that the schema is valid and compatible with the expected schema for the topic.
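
One way to picture how a message can refer to its registered schema is the framing commonly described for Confluent's Avro serializer: a zero "magic" byte, a 4-byte big-endian schema ID, and then the serialized payload. The following is a minimal sketch under that assumption. The function names ``frame_message`` and ``unframe_message`` and the sample schema ID are hypothetical, and the sketch does not contact a registry or perform any Avro encoding.

```python
import struct
from typing import Tuple

# Assumption: framing is 1 magic byte (0) + 4-byte big-endian schema ID + payload.
MAGIC_BYTE = 0
HEADER_SIZE = 5


def frame_message(schema_id: int, payload: bytes) -> bytes:
    """Prefix already-serialized bytes with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe_message(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is too short to contain a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[HEADER_SIZE:]


if __name__ == "__main__":
    framed = frame_message(42, b"serialized-record")  # 42 is a made-up schema ID
    print(unframe_message(framed))
```

A consumer that receives such a message would use the extracted ID to look up the full schema in the registry, so the schema definition itself never has to travel with each message.
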
Data ecosystem
--------------

|sr| provides the following services:

- Allows producers and consumers to communicate over a well-defined `data contract` in the form of a schema
- Controls schema evolution with clear and explicit compatibility rules
- Optimizes the payload over the wire by passing a schema ID instead of an entire schema definition

At its core, |sr| has two main parts:

- A REST service for validating, storing, and retrieving :term:`Avro`, :term:`JSON Schema`, and :term:`Protobuf` schemas
- Serializers and deserializers that plug into |ak-tm| clients to handle schema storage and retrieval for |ak| messages across the three formats

|sr| seamlessly integrates with the rest of the Confluent ecosystem:

- |ak| is integrated with |sr| through :platform:`Schema Validation on Confluent Platform|schema-registry/schema-validation.html` and :cloud:`Schema Validation on Confluent Cloud|sr/broker-side-schema-validation.html`.
- |kconnect| is integrated with |sr| with :platform:`converters|schema-registry/connect.html`.
- |ksqldb|, |crest-long|, and the |confluent-cli| are integrated with |sr| through serialization formats.
- Both the |ccloud-console| and |c3-short| for |cp| are integrated with |sr| through the :cloud:`message browsers|client-apps/topics/messages.html` on those UIs.

Understanding schemas
---------------------

A schema defines the structure of message data. It defines allowed data types, their format, and relationships. A schema acts as a blueprint for data, describing the structure of data records, the data types of individual fields, the relationships between fields, and any constraints or rules that apply to the data.

Schemas are used in various data processing systems, including databases, message brokers, and distributed event and data processing frameworks. They help ensure that data is consistent, accurate, and can be efficiently processed and analyzed by different systems and applications.
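
To make the idea of a schema as a blueprint more concrete, the sketch below defines a small Avro-style record schema and checks records against it. The ``Payment`` schema, its fields, and the ``conforms`` helper are hypothetical and exist only for illustration. The check covers a handful of primitive types and is far simpler than what a real Avro library does.

```python
import json

# Hypothetical Avro record schema for a "Payment" event (illustrative only).
PAYMENT_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Payment",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
""")

# Toy mapping from a few Avro primitive type names to Python types.
PYTHON_TYPES = {"string": str, "double": float, "int": int, "boolean": bool}


def conforms(record: dict, schema: dict) -> bool:
    """Return True if the record has exactly the schema's fields with matching types."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    if set(record) != set(fields):
        return False
    return all(
        isinstance(record[name], PYTHON_TYPES[avro_type])
        for name, avro_type in fields.items()
    )


if __name__ == "__main__":
    print(conforms({"id": "p-1", "amount": 9.99}, PAYMENT_SCHEMA))
    print(conforms({"id": "p-1", "amount": "9.99"}, PAYMENT_SCHEMA))
```

The second record fails the check because ``amount`` is a string rather than a double, which is the kind of mismatch a schema is meant to surface before the data reaches a consumer.
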
Schemas facilitate data sharing and interoperability between different systems and organizations.

Common data problems solved by |sr|
-----------------------------------

|sr| solves the following common problems of working with large-scale data systems:

- Data inconsistency: A registry ensures that all system data adheres to agreed-upon schemas. This reduces the risk of data inconsistency and increases data quality.
- Incompatible data formats: With multiple data producers and consumers, different applications may use different data formats. |sr| solves this problem by providing centralized schema management and validation, to ensure that all message data is compatible.
- Schema evolution: Schemas often change over time, which can cause compatibility issues between different versions of the schema. |sr| supports schema versioning, ensuring that different versions of the schema can be used simultaneously without causing compatibility issues.
- Schema validation: |sr| checks that data produced to a topic is using a valid schema ID in |sr|. This ensures that data conforms to a standard format, reducing the risk of data loss or corruption.
- `Data governance`: |sr| provides a central location to manage and version data schemas. This simplifies governance by enabling easy tracking of schema changes, maintaining schema evolution history, and ensuring data compliance with regulatory requirements.

|sr| features and benefits
--------------------------

|sr| helps improve reliability, flexibility, and scalability of systems and applications by providing a standard way to manage and validate schemas used by producers and consumers.

|sr| provides the following advantages with regard to data management:

- Centralized schema management and storage, which makes it easier to track and maintain different versions of schemas used by various producers and consumers.
- Schema validation, which means |sr| validates the structure and compatibility of schemas.
  This ensures that topic message data conforms to a standard format and is error-free, reducing the risk of data loss or corruption.
- Compatibility checking of schemas between producers and consumers to ensure that message data can be consumed by different applications and systems without resulting in errors or data loss due to message formatting.
- Versioning of schemas, which allows for updates to schemas without breaking compatibility with existing data. This provides a smooth transition to new versions of a schema with continued support for legacy data, and reduces the need for expensive and time-consuming data migration.
- Simplified development, by providing a standard way to define and manage schemas. This reduces the amount of custom code needed to manage schema changes and makes it easier to onboard new developers to a project.

Compatibility and schema evolution
----------------------------------

|ak-tm| producers write data to |ak| topics and |ak| consumers read data from |ak| topics. There is an implicit "contract" that producers write data with a schema that can be read by consumers, even as producers and consumers evolve their schemas. |sr| helps ensure that this contract is met with compatibility checks.

It is useful to think about schemas as APIs. Applications depend on APIs and expect that any changes made to APIs remain compatible, so that the applications can still run. Similarly, streaming applications depend on schemas and expect that any changes made to schemas remain compatible, so that they can still run. Schema evolution requires compatibility checks to ensure that the producer-consumer contract is not broken. This is where |sr| helps: it provides centralized schema management and compatibility checks as schemas evolve.
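
The following sketch illustrates the kind of rule a backward compatibility check enforces: a consumer using the new schema should still be able to read data written with the old one. It is a deliberately simplified stand-in and not the algorithm |sr| uses. The function name ``is_backward_compatible`` and the sample schemas are hypothetical. The check only looks at flat record fields, treats any type change as incompatible, and ignores Avro features such as type promotion, unions, and aliases.

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Simplified check: can a reader using new_schema read data written with old_schema?"""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old_field = old_fields.get(field["name"])
        if old_field is None:
            # A field added in the new schema needs a default so that
            # old records, which lack the field, can still be read.
            if "default" not in field:
                return False
        elif old_field["type"] != field["type"]:
            # Simplification: any type change is treated as incompatible.
            return False
    # Fields removed in the new schema are simply ignored by the reader.
    return True


# Hypothetical schema versions for illustration.
V1 = {"type": "record", "name": "Payment", "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
]}
V2_WITH_DEFAULT = {"type": "record", "name": "Payment", "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"},
]}
V2_NO_DEFAULT = {"type": "record", "name": "Payment", "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string"},
]}

if __name__ == "__main__":
    print(is_backward_compatible(V1, V2_WITH_DEFAULT))
    print(is_backward_compatible(V1, V2_NO_DEFAULT))
```

Adding ``currency`` with a default passes the check, while adding it without a default fails, because a reader on the new schema would have no value to use for records written before the field existed.
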
To learn more about how |sr| manages compatibility, see the following topic in either the |ccloud| or |cp| documentation:

- |ccloud| documentation: :cloud:`Schema Evolution and Compatibility for Schema Registry|sr/fundamentals/schema-evolution.html`
- |cp| documentation: :platform:`Schema Evolution and Compatibility for Schema Registry|schema-registry/fundamentals/schema-evolution.html`

Data serialization
------------------

A schema is typically used in data serialization, which is the process of converting data structures or objects into a format that can be transmitted across a network or stored in a file. In this context, a schema defines the format of the `serialized` data and is used to validate the data as it is being `deserialized` by another system or application.

Confluent |sr| supports `Avro `__, `JSON Schema `__, and `Protobuf `__ serializers and deserializers (`serdes`). When you write producers and consumers using these supported formats, they handle the details of the wire format for you, so you don't have to worry about how messages are mapped to bytes.

Always start with a |sr|
------------------------

If you start without |sr| and retrofit it later, you increase the workload, because the custom code you used as a base must then be pulled out or redone to some extent. Starting from the beginning of a project with |sr| is a best practice for several reasons, some of which are already explained in previous sections:

- Prevents data inconsistency and increases data quality.
- Simplifies data governance.
- Reduces development time. Starting with |sr| eliminates the need for custom code to manage and validate schemas. This also cuts costs and increases development productivity.
- Facilitates collaboration by providing a standard interface for sharing schemas across different teams and applications. This helps to avoid potential data integration issues.
- Improves system performance.
  |sr| validates and optimizes schemas for efficient data exchange, which can improve system performance and reduce processing time.

|sr-ccloud|
-----------

|sr| is built into |ccloud|. You must opt for either the Essentials or Advanced package to use it.

- For information about |sr| Essentials and Advanced packages, see :cloud:`Packages, Features, and Limits|stream-governance/index.html`.
- To learn about working with schemas on |ccloud|, see :cloud:`Manage Schemas in Confluent Cloud|sr/schemas-manage.html`.
- To learn about Stream Governance, see :cloud:`Stream Governance on Confluent Cloud|stream-governance/index.html`.

.. _sr_license_info:

|cp| license
------------

|sr| is licensed under the `Confluent Community License `__.

A |cpe| license is required for the :platform:`Schema Registry Security Plugin|confluent-security-plugins/schema-registry/introduction.html#confluentsecurityplugins-schema-registry-security-plugin` and for broker-side :platform:`Schema Validation on Confluent Server|schema-registry/schema-validation.html`. You can use the plugin and |sv| under a 30-day trial period without a license key, and thereafter under an :platform:`Enterprise (Subscription) License|installation/license.html#enterprise-subscription-license` as part of |cp|.

To learn more about the security plugin, see :platform:`License for Schema Registry Security Plugin|schema-registry/installation/config.html#license-for-sr-security-plugin` and :platform:`Install and Configure the Schema Registry Security Plugin|confluent-security-plugins/schema-registry/install.html`.
Related content
---------------

- :cloud:`Confluent Cloud Schema Registry Tutorial|sr/schema_registry_ccloud_tutorial.html`
- :platform:`Confluent Platform Schema Registry Tutorial|schema-registry/schema_registry_tutorial.html`
- `Schema Registry 101: Key Concepts of Schema Registry `__
- `Schema Registry module of the free Apache Kafka 101 `__
- `How to Keep Bad Data Out of Apache Kafka with Stream Quality `__
- `Schemas, Contracts, and Compatibility `_
- `17 Ways to Mess Up Self-Managed Schema Registry `_
- `Yes, Virginia, You Really Do Need a Schema Registry `_
- `How I Learned to Stop Worrying and Love the Schema `_
- `Formats, Serializers, and Deserializers `__