Schema Linking on Confluent Cloud

Schema Registry supports Schema Linking. The quick start below guides you step-by-step with hands-on examples of how to create and use exporters to implement schema linking on your clusters. Following the quick start are details about how to work with schema contexts and exporters. Note that contexts are also useful outside of Schema Linking to organize schemas into purpose-specific groups and create virtual “sub-registries”.

What is Schema Linking?

Schema Linking keeps schemas in sync across two Schema Registry clusters. Schema Linking can be used in conjunction with Cluster Linking to keep both schemas and topic data in sync across two Schema Registry and Kafka clusters.

../_images/schema-linking.png

Schema Registry introduces two new concepts to support Schema Linking:

  • Schema contexts - A schema context represents an independent scope in Schema Registry, and can be used to create any number of separate “sub-registries” within one Schema Registry cluster. Each schema context is an independent grouping of schema IDs and subject names, allowing the same schema ID in different contexts to represent completely different schemas. Any schema ID or subject name without an explicit context lives in the default context, denoted by a single dot (.). An explicit context starts with a dot and can contain any parts separated by additional dots, such as .mycontext.subcontext. Context names operate similarly to absolute Unix paths, but with dots instead of forward slashes (the default context is like the root Unix path). However, there is no relationship between two contexts that share a prefix.
  • Schema exporters - A schema exporter is a component that resides in Schema Registry for exporting schemas from one Schema Registry cluster to another. The lifecycle of a schema exporter is managed through APIs, which are used to create, pause, resume, and destroy a schema exporter. A schema exporter is like a “mini-connector” that can perform change data capture for schemas.

The Quick Start below shows you how to get started using schema exporters and contexts for Schema Linking.

For in-depth descriptions of these concepts, see Schema contexts and Schema Exporters.

Quick start

This Quick Start has been recently updated to match the latest Confluent Cloud Console, show CLI commands for creating API keys for Schema Registry, and demonstrate how to use service accounts with Schema Linking.

If you’d like to jump in and try out Schema Linking now, follow the steps below. At the end of the Quick Start, you’ll find deep dives on contexts, exporters, command options, and APIs, which may make more sense after you’ve experimented with some hands-on examples.

Get the latest version of the Confluent CLI

Set up two clusters in source and destination environments

For this Quick Start, you set up a source and destination Kafka cluster, each in its own environment. A Schema Registry instance (“cluster”) is automatically created when you add a new environment (one Schema Registry per environment). This gives you two separate registries for testing Schema Linking by sharing schema subjects between them.

  1. Log on to Confluent Cloud with your user account, and create two new environments: one named “SOURCE” and the other named “DESTINATION”.

    Navigate to Environments (top right menu), click Add cloud environment, and follow the steps.

    You must choose a Stream Governance package when you create an environment. For Schema Linking, you can choose Essentials or Advanced; either will work.

  2. Add a Kafka cluster to each environment (for example, “my-source-cluster” in the SOURCE environment and “my-destination-cluster” in the DESTINATION environment). These can be any cluster type.

  3. Take note of the Schema Registry cluster IDs and API endpoints in each environment, using either the Confluent Cloud Console or the Confluent CLI, as you will need these in the next steps:

    • On the Confluent Cloud Console, navigate to an environment and select the cluster. The Schema Registry cluster ID and API endpoint are shown on the right panel of the environment level display under Stream Governance API.

    • On the Confluent CLI, log on, navigate to each environment (with confluent environment list and confluent environment use <environment-id>), and use confluent schema-registry cluster describe to get the details for the Schema Registry cluster in that environment.

      Here are example commands and output for the source and destination Schema Registry descriptions:

      my-laptop:~ me$ confluent env use env-src
      Using environment "env-src".
      
      my-laptop:~ me$ confluent schema-registry cluster describe
      +-------------------------+--------------------------------------------------+
      | Name                    | Always On Stream Governance                      |
      |                         | Package                                          |
      | Cluster                 | lsrc-x6612x                                      |
      | Endpoint URL            | https://psrc-22y2ny.us-west2.gcp.confluent.cloud |
      | Used Schemas            | 0                                                |
      | Available Schemas       | 100                                              |
      | Free Schemas Limit      | 100                                              |
      | Global Compatibility    | BACKWARD                                         |
      | Mode                    | READWRITE                                        |
      | Service Provider        | GCP                                              |
      | Service Provider Region | us-west2                                         |
      | Package                 | ESSENTIALS                                       |
      +-------------------------+--------------------------------------------------+
      
      my-laptop:~ me$ confluent env use env-dest
      Using environment "env-dest".
      
      my-laptop:~ me$ confluent schema-registry cluster describe
      +-------------------------+--------------------------------------------------+
      | Name                    | Always On Stream Governance                      |
      |                         | Package                                          |
      | Cluster                 | lsrc-jzzx6w                                      |
      | Endpoint URL            | https://psrc-22y2ny.us-west2.gcp.confluent.cloud |
      | Used Schemas            | 0                                                |
      | Available Schemas       | 100                                              |
      | Free Schemas Limit      | 100                                              |
      | Global Compatibility    | BACKWARD                                         |
      | Mode                    | READWRITE                                        |
      | Service Provider        | GCP                                              |
      | Service Provider Region | us-west2                                         |
      | Package                 | ESSENTIALS                                       |
      +-------------------------+--------------------------------------------------+
      

Configure credentials on source and destination

In the next steps, you will set up permissions to allow access to the source and destination registries in support of Schema Linking.

Configure credentials on the source

Follow the steps in this section to create an API key and secret to authenticate your user account to the source Schema Registry.

  1. Navigate to the SOURCE environment and create an API key associated with the Schema Registry on the SOURCE. For example:

    confluent api-key create --resource <cluster-ID>
    

    Save the SOURCE API key and secret in a secure location. The secret cannot be retrieved later.

  2. Review source access information.

    By now you should have a triplet of Schema Registry access details for the source cluster:

    • Schema Registry URL (the API endpoint)
    • API key
    • API secret

Configure credentials on the destination

Important

  • These are the credentials the exporter will use to access the destination.
  • For Schema Linking exporters in production, always use API keys tied to service accounts, not users. For now, you must use the CLI to create API keys tied to service accounts, as the Confluent Cloud Console does not currently support service accounts for Schema Linking. To learn more, see Create a resource API key and Best Practices for Using API Keys in Confluent Cloud.

In this section, you will:

  • Create a service account on the destination.
  • Create an API key and secret to authenticate the service account to the Schema Registry on the destination.
  • Give the service account permissions (RBAC roles) on Schema Registry subjects on the destination. These are the permissions needed to copy from SOURCE to DESTINATION.
  1. Navigate to the DESTINATION environment and run the command confluent iam service-account create. For example:

    confluent iam service-account create schema-link-destination --description "Service account for Schema Registry on the Destination cluster"
    

    Your output should resemble:

    +-------------+--------------------------------+
    | ID          | sa-123abc                      |
    | Name        | schema-link-destination        |
    | Description | Service account for Schema     |
    |             | Registry on the Destination    |
    |             | cluster                        |
    +-------------+--------------------------------+
    
  2. Create an API key and secret, and associate it with the service account you just created to authenticate to the destination Schema Registry.

    You will need the resource ID for the DESTINATION Schema Registry, which you can find on the Confluent Cloud UI in the destination environment on the right panel under “Stream Governance API”, or on the Confluent CLI with the command confluent schema-registry cluster describe.

    For example:

    confluent api-key create --service-account sa-123abc --resource <cluster-ID> --description "Destination Schema Registry API Key"
    

    Save the DESTINATION API key and secret in a secure location. The secret cannot be retrieved later.

  3. Review destination access information.

    By now you should have the following Schema Registry access details for the destination Schema Registry:

    • Schema Registry URL (the API endpoint)
    • Schema Registry cluster ID (resource ID)
    • Service account ID
    • API key
    • API secret
  4. Assign the RBAC role “ResourceOwner” on the Schema Registry as follows to give the service account permissions on all schema subjects.

    confluent iam rbac role-binding create --principal User:sa-123abc --role ResourceOwner --environment <env-dest> --schema-registry-cluster <cluster-ID> --resource "Subject:*"
    

    Your output should resemble:

    +---------------+----------------+
    | Principal     | User:sa-123abc |
    | Role          | ResourceOwner  |
    | Resource Type | Subject        |
    | Name          | *              |
    | Pattern Type  | LITERAL        |
    +---------------+----------------+
    

Create credentials for the exporter

Your schema exporter will copy schemas in the source environment and export linked copies to the destination, so it needs credentials to access the destination.

Create config.txt which you will use to create exporters, and fill in the URL and credentials the exporter needs to access the DESTINATION cluster. This will allow the exporter to use the service account and associated RBAC roles on the destination, which you set up in the steps under Configure credentials on the destination.

schema.registry.url=<destination sr url>
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=<destination api key>:<destination api secret>
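If you would rather generate config.txt from a script than edit it by hand, the following is a minimal sketch. It assumes only the three property names shown above; the function name and the example values are hypothetical placeholders, not part of any Confluent tooling.

```python
def render_exporter_config(sr_url: str, api_key: str, api_secret: str) -> str:
    """Return the text of a config.txt file for a schema exporter.

    The three property names come from the example above; the argument
    values are whatever you noted for the DESTINATION Schema Registry.
    """
    lines = [
        f"schema.registry.url={sr_url}",
        "basic.auth.credentials.source=USER_INFO",
        f"basic.auth.user.info={api_key}:{api_secret}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only; substitute your own.
example_config = render_exporter_config(
    "https://sr.example.invalid", "EXAMPLEKEY", "EXAMPLESECRET"
)
```

Writing example_config to a file and passing that path to --config should be equivalent to creating the file manually. Because the file contains the API secret, treat it like any other credential.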

You can find the Destination Schema Registry URL (endpoint) either on the Cloud Console right panel of the DESTINATION environment under “Stream Governance API” or on the Confluent CLI by running the following command:

confluent schema-registry cluster describe --environment <environment ID>

With the CLI command, your output should resemble the following:

+-------------------------+--------------------------------------------------+
| Name                    | Always On Stream Governance                      |
|                         | Package                                          |
| Cluster                 | lsrc-jzzx6w                                      |
| Endpoint URL            | https://psrc-22y2ny.us-west2.gcp.confluent.cloud |
| Used Schemas            | 0                                                |
| Available Schemas       | 100                                              |
| Free Schemas Limit      | 100                                              |
| Global Compatibility    | BACKWARD                                         |
| Mode                    | READWRITE                                        |
| Service Provider        | GCP                                              |
| Service Provider Region | us-west2                                         |
| Package                 | ESSENTIALS                                       |
+-------------------------+--------------------------------------------------+

Create schemas on the source

Create at least three schemas in the source environment, at least one of which has a qualified subject name.

To create each schema from the Cloud Console, follow these steps:

  1. From the Schema Registry tab in the SOURCE environment, click Schemas on the right panel, then click Add schema.

  2. Fill in the subject name.

    • To create a schema with an unqualified subject name, simply provide a name such as coffee or donuts.

      ../_images/schema-link-unqualified-subject.png
    • To create a schema with a qualified subject name in a specified context, use the syntax: :.<context-name>:<subject-name>. For example: :.snowcones:sales or :.burgers:locations.

      ../_images/schema-link-qualified-subject.png

    On the Cloud Console, enter the following to create four subject names, two in the default context, and two qualified in custom contexts. Note that the qualified context names are prefixed by a dot (“.”):

    Schema Context    Subject Name
    --------------    ------------
    default           coffee
    default           donuts
    .snowcones        sales
    .burgers          locations
  3. Use the following example for your Avro content for each of the schemas.

    Tip

    For this Quick Start, you are not really working with the schemas themselves, but rather learning how to organize schemas under subject names and contexts. Example content for a schema is suggested below to mimic a real-world scenario, but you can also simply take the default schema to save time, or use schemas you already have. As long as the schemas are properly formatted, they will work for these examples. If you want to change something in the content of each schema as another visual cue that the subject name matches the schema, you could add fields for coffee, donuts, sales, and locations, respectively.

    Example Schema

    {
        "type": "record",
        "namespace": "com.mycorp.mynamespace",
        "name": "sampleRecord",
        "fields": [
            {
                "name": "item",
                "type": "string"
            },
            {
                "name": "location",
                "type": "string"
            },
            {
                "name": "cost",
                "type": "double"
            },
            {
                "name": "code",
                "type": "int"
            }
        ]
    }
    
  4. Click Create.

  5. When you have created your subjects, review the list on the Schema Registry tab on the SOURCE, which should look similar to the following.

    ../_images/schema-link-list-of-subjects.png

Create the exporter on the source

  1. For these Quick Start examples, you’ll want to create the exporters on the source, so make sure your current environment is SOURCE.

    You can use confluent environment list and confluent environment use <environment-id> to navigate.

  2. Create a new exporter using the confluent schema-registry exporter create command.

    For this demo, you want the exporter to copy all schemas, including those in specific contexts (other than the default), so include the --subjects flag with the context wildcard to denote subjects under all contexts: --subjects ":*:"

    confluent schema-registry exporter create <exporter-name> --subjects ":*:" --config <path-to-file>/config.txt
    

    For example, this command creates an exporter called “my-first-exporter” that will export all schemas, including those in specific contexts as well as those in the default context. (The config.txt for this example lives in the user’s home directory.):

    confluent schema-registry exporter create my-first-exporter --subjects ":*:" --config ~/config.txt
    

    You should get output verifying that the exporter was created: Created schema exporter "my-first-exporter".

You can list exporters with confluent schema-registry exporter list and check the status of an exporter with confluent schema-registry exporter get-status <exporter-name>.

More options for exporters

The exporter you just created is relatively basic, in that it just exports everything. As you’ll see in the next section, this is an efficient way to get an understanding of how you might organize, export, and navigate schemas with qualified and unqualified subject names.

Keep in mind that you can create exporters that export only specific subjects and contexts using this syntax:

confluent schema-registry exporter create <exporterName> --subjects <subjectName1>,<subjectName2> \
--context-type CUSTOM --context-name <contextName> \
--config ~/config.txt
  • subjects are listed as a comma-separated string list, such as “sales,coffee,donuts”.
  • subjects, context-type, and context-name are all optional. context-name is specified if context-type is CUSTOM.
  • subjects defaults to *, and context-type defaults to AUTO.

Alternatively, if you take all the defaults and do not specify --subjects when you create an exporter, you will get an exporter that exports schemas in all contexts/subjects, including the default context:

confluent schema-registry exporter create my-first-exporter --config ~/config.txt

If you want to export the default context only, specify --subjects as :.:*. With this type of exporter, schemas on the source that have qualified subject names (in a non-default context) will not be exported to the destination.

Another optional parameter you can use with confluent schema-registry exporter create and confluent schema-registry exporter update is --subject-format. This specifies a format for the subject name in the destination cluster, and may contain ${subject} as a placeholder which will be replaced with the default subject name. For example, dc_${subject} for the subject orders will map to the destination subject name dc_orders.
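The renaming rule described above amounts to a simple placeholder substitution. The sketch below illustrates it in Python; the function name is hypothetical, and the code only models the documented ${subject} replacement, not how the exporter treats context prefixes or other edge cases.

```python
def apply_subject_format(subject_format: str, subject: str) -> str:
    """Replace the ${subject} placeholder with the source subject name."""
    return subject_format.replace("${subject}", subject)


# The example from the text: dc_${subject} applied to the subject "orders".
renamed = apply_subject_format("dc_${subject}", "orders")

# The format ${subject} on its own leaves the name unchanged.
unchanged = apply_subject_format("${subject}", "coffee")
```

Here renamed is dc_orders, which matches the mapping described above.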

You can create and run multiple exporters at once, so feel free to circle back at the end of the Quick Start to create and test more exporters with different parameters.

See Configuration options for full details on schema exporter parameters.

Verify the exporter is running and view information about it

Still in the SOURCE environment, run the following commands.

  1. List available exporters.

    confluent schema-registry exporter list
    

    Your exporter will show in the list.

  2. Describe the exporter.

    confluent schema-registry exporter describe <exporterName>
    

    For example, fill in my-first-exporter for <exporterName>:

    confluent schema-registry exporter describe my-first-exporter
    

    Your output should resemble:

    confluent schema-registry exporter describe my-first-exporter
    +----------------+------------------------------------------------------------------------+
    | Name           | my-first-exporter                                                      |
    | Subjects       | *                                                                      |
    | Subject Format | ${subject}                                                             |
    | Context Type   | AUTO                                                                   |
    | Context        | .lsrc-x6612x                                                           |
    | Config         | basic.auth.credentials.source="USER_INFO"                              |
    |                | basic.auth.user.info="[hidden]"                                        |
    |                | schema.registry.url="https://psrc-22y2ny.us-west2.gcp.confluent.cloud" |
    +----------------+------------------------------------------------------------------------+
    
  3. Get configurations for the exporter.

    confluent schema-registry exporter get-config my-first-exporter
    
  4. Get the status of the exporter.

    confluent schema-registry exporter get-status my-first-exporter
    

    Your output should resemble:

    confluent schema-registry exporter get-status my-first-exporter
    +-------------+-------------------+
    | Name        | my-first-exporter |
    | State       | RUNNING           |
    | Offset      | 218567            |
    | Timestamp   | 1711598116280     |
    | Error Trace |                   |
    +-------------+-------------------+
    

    Tip

    If you get an error at this point and the exporter is in a PAUSED state, verify that you have assigned all the needed RBAC roles to the service account as described in Configure credentials on the destination.

  5. Finally, as a check, get a list of schemas on the source.

    Use the prefix wildcard to list all schemas:

    confluent schema-registry subject list --prefix ":*:"
    

    With the wildcard, this is effectively the same as the command confluent schema-registry subject list. The command returns the list of subjects you’ve created on the source, for example:

            Subject
    -----------------------
      :.burgers:locations
      :.snowcones:sales
      coffee
      donuts
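If you script the checks in this section, you may want to read the exporter state from the get-status output rather than by eye. The following sketch parses a two-column table like the ones shown above into a dictionary. The function name is hypothetical, and the parser assumes values contain no "|" characters. Row labels can differ between CLI versions (for example, State in one sample in this document and Exporter State in another), so check the labels your CLI prints.

```python
def parse_cli_table(text: str) -> dict[str, str]:
    """Parse a two-column ASCII table, as printed by the CLI, into a dict."""
    result: dict[str, str] = {}
    last_key = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("|"):
            continue  # skip borders, echoed commands, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a key/value row
        key, value = cells
        if key:
            result[key] = value
            last_key = key
        elif last_key is not None and value:
            # A row with an empty label continues the previous (wrapped) value.
            result[last_key] = f"{result[last_key]} {value}".strip()
    return result


# Sample copied from the get-status output in step 4.
STATUS_OUTPUT = """
+-------------+-------------------+
| Name        | my-first-exporter |
| State       | RUNNING           |
| Offset      | 218567            |
| Timestamp   | 1711598116280     |
| Error Trace |                   |
+-------------+-------------------+
"""

status = parse_cli_table(STATUS_OUTPUT)
is_running = status.get("State") == "RUNNING"
```

If your CLI version offers a structured output format, that is likely a more robust option than parsing the table.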
    

Check that the schemas were exported

Now that you have verified that the exporter is running, and you know which schemas you created on the source, check to see that your schemas were exported to the destination.

  1. Switch to the DESTINATION.

    Use confluent environment list and confluent environment use <environment-id> to navigate.

  2. Run the following command to view all schemas.

    confluent schema-registry subject list --prefix ":*:"
    

    Your output list of schemas on the DESTINATION should match those on the SOURCE, with each subject qualified by a context named after the source Schema Registry cluster ID (lsrc-x6612x in this example).

                  Subject
    ----------------------------------
      :.lsrc-x6612x.burgers:locations
      :.lsrc-x6612x.snowcones:sales
      :.lsrc-x6612x:coffee
      :.lsrc-x6612x:donuts
    
  3. List only schemas in particular contexts.

    Once you have a list of all subjects on the destination with the prefixes (as in the above example), you can pass a context prefix to see a narrowed list of subjects in a particular context.

    • For example, to list schemas in the burgers context, where lsrc-x6612x is the source Schema Registry cluster ID:

      confluent schema-registry subject list --prefix ":.lsrc-x6612x.burgers:"
      

      The output will be:

                   Subject
      ----------------------------------
        :.lsrc-x6612x.burgers:locations
      
    • To list schemas in the snowcones context:

      confluent schema-registry subject list --prefix ":.lsrc-x6612x.snowcones:"
      

      The output will be:

                   Subject
      --------------------------------
        :.lsrc-x6612x.snowcones:sales
      

Tip

  • If you used the optional parameter --subject-format when you created the exporter on the source, check to see that the exported subjects on the destination map to the subject rename format you specified.

  • You can also use curl commands to call the Schema Registry REST APIs directly. Here is an example:

    curl -u <destination api key>:<destination api secret> '<destination sr url>/subjects?subjectPrefix=:.context1:foo'
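For callers that build the same request in code, the sketch below assembles the URL and the HTTP Basic authorization header that the curl example sends. It only constructs the request; sending it is left to whichever HTTP client you use. The function name, host, and credentials are hypothetical placeholders. Note that this version percent-encodes the colons in the prefix, which the curl example passes unencoded; the two should be equivalent once the server decodes the query string.

```python
import base64
from urllib.parse import urlencode


def subjects_request(sr_url: str, api_key: str, api_secret: str, subject_prefix: str):
    """Return (url, headers) for GET /subjects filtered by subjectPrefix."""
    query = urlencode({"subjectPrefix": subject_prefix})
    url = f"{sr_url.rstrip('/')}/subjects?{query}"
    # curl -u key:secret sends the pair as an HTTP Basic credential.
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    return url, headers


# Hypothetical endpoint and credentials for illustration only.
url, headers = subjects_request(
    "https://sr.example.invalid", "EXAMPLEKEY", "EXAMPLESECRET", ":.context1:foo"
)
```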
    

Pause the exporter and make changes

  1. Pause the exporter.

    Switch back to the SOURCE, and run the following command to pause the exporter.

    confluent schema-registry exporter pause <exporterName>
    

    You should get output verifying that the command was successful. For example: Paused schema exporter "my-first-exporter".

    Check the status, just to be sure.

    confluent schema-registry exporter get-status <exporterName>
    

    Your output should resemble:

    confluent schema-registry exporter get-status my-first-exporter
    +--------------------+-------------------+
    | Name               | my-first-exporter |
    | Exporter State     | PAUSED            |
    | Exporter Offset    |          10011386 |
    | Exporter Timestamp |     1631107710822 |
    | Error Trace        |                   |
    +--------------------+-------------------+
    
  2. Reset the schema exporter offset back to schema.id=1.

    • A reset will restart the incremental iteration through the source schemas.

      confluent schema-registry exporter reset <exporterName>
      
    • After the reset, you can verify by getting the status of the exporter.

    confluent schema-registry exporter get-status <exporterName>
    

    The status will show that the offset is reset. For example:

    confluent schema-registry exporter get-status my-first-exporter
    +--------------------+-------------------+
    | Name               | my-first-exporter |
    | Exporter State     | PAUSED            |
    | Exporter Offset    |                -1 |
    | Exporter Timestamp |                 0 |
    | Error Trace        |                   |
    +--------------------+-------------------+
    
  3. Update exporter configurations or information.

    You can choose to update any of subjects, context-type, context-name, or config. For example:

    confluent schema-registry exporter update <exporterName> --context-name <newContextName>
    
  4. Resume the schema exporter.

    confluent schema-registry exporter resume <exporterName>
    

Delete the exporter

When you are ready to wrap up your testing, pause and then delete the exporter(s) as follows.

  1. Pause the exporter.

    confluent schema-registry exporter pause <exporterName>
    
  2. Delete the exporter.

    confluent schema-registry exporter delete <exporterName>
    

This concludes the Quick Start. The next sections are a deep dive into the Schema Linking concepts and tools you just tried out.

Schema contexts

What is a schema context?

A schema context, or simply context, is essentially a grouping of subject names and schema IDs. A single Schema Registry cluster can host any number of contexts. Each context can be thought of as a separate “sub-registry”. A context can also be copied to another Schema Registry cluster, using a schema exporter.

How contexts work

Following are a few key aspects of contexts and how they help to organize schemas.

Schemas and schema IDs are scoped by context

Subject names and schema IDs are scoped by context so that two contexts in the same Schema Registry cluster can each have a schema with the same ID, such as 123, or a subject with the same name, such as mytopic-value, without any problem.

To put this another way, subject names and schema IDs are unique per context. You can have schema ID 123 in context .mycontext and schema ID 123 in context .yourcontext and these can be different from one another.
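As a mental model only, you can picture the registry as a lookup keyed by both context and schema ID. The toy class below is a hypothetical illustration of that scoping rule and says nothing about how Schema Registry actually stores schemas.

```python
class ContextScopedRegistry:
    """Toy model: schema IDs are unique per context, not per cluster."""

    def __init__(self):
        self._schemas = {}  # maps (context, schema_id) to schema text

    def register(self, context: str, schema_id: int, schema: str) -> None:
        self._schemas[(context, schema_id)] = schema

    def lookup(self, context: str, schema_id: int) -> str:
        return self._schemas[(context, schema_id)]


registry = ContextScopedRegistry()
# The same ID, 123, holds different schemas in two contexts.
registry.register(".mycontext", 123, '{"type": "string"}')
registry.register(".yourcontext", 123, '{"type": "int"}')
```

Looking up ID 123 returns a different schema depending on which context is given, which is the behavior described above.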

Default context

Any schema ID or subject name without an explicit context lives in the default context, which is represented as a single dot (.). An explicit context starts with a dot and can contain any parts separated by additional dots, such as .mycontext.subcontext. You can think of context names as similar to absolute Unix paths, but with dots instead of forward slashes (in this analogy, the default schema context is like the root Unix path). However, there is no relationship between two contexts that share a prefix.

Qualified subjects

A subject name can be qualified with a context, in which case it is called a qualified subject. When a context qualifies a subject, the context must be surrounded by colons. An example is :.mycontext:mysubject. A subject name that is unqualified is assumed to be in the default context, so that mysubject is the same as :.:mysubject (the dot representing the default context).
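The naming rule above can be sketched as a pair of helper functions. The names qualify and split_subject are hypothetical, and the parsing is a simplified reading of the :.context:subject syntax described here, not the registry's own parser.

```python
DEFAULT_CONTEXT = "."


def qualify(context: str, subject: str) -> str:
    """Build a qualified subject name: the context surrounded by colons."""
    return f":{context}:{subject}"


def split_subject(name: str) -> tuple[str, str]:
    """Split a subject name into (context, subject).

    Unqualified names are assumed to be in the default context.
    """
    if name.startswith(":"):
        context, separator, subject = name[1:].partition(":")
        if separator and context.startswith("."):
            return context, subject
    return DEFAULT_CONTEXT, name
```

With these helpers, split_subject("mysubject") and split_subject(":.:mysubject") both return the default context, reflecting that the two names refer to the same subject.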

There are two ways to pass a context to the REST APIs.

Using a qualified subject

A qualified subject can be passed anywhere that a subject name is expected. Most REST APIs take a subject name, such as POST /subjects/{subject}/versions.

There are a few REST APIs that don’t take a subject name as part of the URL path:

  • /schemas/ids/{id}
  • /schemas/ids/{id}/subjects
  • /schemas/ids/{id}/versions

The three APIs above can now take a query parameter named “subject” (written as ?subject), so you can pass a qualified subject name, such as /schemas/ids/{id}?subject=:.mycontext:mysubject, and the given context is then used to look up the schema ID.

Using a base context path

As mentioned, all APIs that specify an unqualified subject operate in the default context. Besides passing a qualified subject wherever a subject name is expected, a second way to pass the context is by using a base context path. A base context path takes the form /contexts/{context} and can be prepended to any existing Schema Registry path. Therefore, to look up a schema ID in a specific context, you could also use the URL /contexts/.mycontext/schemas/ids/{id}.

A base context path can also be used to operate with the default context. In this case, the base context path takes the form “/contexts/:.:/”; for example, /contexts/:.:/schemas/ids/{id}. A single dot cannot be used because it is omitted by some URL parsers.
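To compare the two approaches, the sketch below builds the lookup URL for a schema ID both ways. The function names and the host are hypothetical. The query parameter value is percent-encoded, while the path form is left as written in this section.

```python
from urllib.parse import quote


def url_with_subject_param(sr_url: str, schema_id: int, qualified_subject: str) -> str:
    """Way 1: pass a qualified subject in the ?subject query parameter."""
    subject = quote(qualified_subject, safe="")
    return f"{sr_url.rstrip('/')}/schemas/ids/{schema_id}?subject={subject}"


def url_with_base_context_path(sr_url: str, schema_id: int, context: str = ".") -> str:
    """Way 2: prepend /contexts/{context} to the normal path."""
    # The default context is written as :.: because a bare dot may be
    # dropped by URL parsers.
    segment = ":.:" if context == "." else context
    return f"{sr_url.rstrip('/')}/contexts/{segment}/schemas/ids/{schema_id}"


# Hypothetical endpoint for illustration only.
BASE = "https://sr.example.invalid"
by_param = url_with_subject_param(BASE, 123, ":.mycontext:mysubject")
by_path = url_with_base_context_path(BASE, 123, ".mycontext")
default_path = url_with_base_context_path(BASE, 123)
```

Both by_param and by_path are meant to resolve schema ID 123 in the .mycontext context; default_path targets the default context explicitly.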

Multi-Context APIs

All the examples so far operate in a single context. There are three APIs that return results for multiple contexts.

  • /contexts
  • /subjects
  • /schemas?subjectPrefix=:*:

The first two APIs, /contexts and /subjects, return a list of all contexts and subjects, respectively. The other API, /schemas, normally only operates in the default context. This API can be used to query all contexts by passing a subjectPrefix with the value :*:, called the context wildcard. The context wildcard matches all contexts.
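Because /subjects returns names from every context in one flat list, a client often needs to regroup them. The sketch below shows one way to do that, assuming the response has already been decoded into a list of subject name strings; the function name is hypothetical.

```python
from collections import defaultdict


def group_by_context(subjects):
    """Group subject names by context; unqualified names go under '.'."""
    groups = defaultdict(list)
    for name in subjects:
        context, subject = ".", name
        if name.startswith(":."):
            head, separator, tail = name[1:].partition(":")
            if separator:
                context, subject = head, tail
        groups[context].append(subject)
    return dict(groups)


# Subject names taken from the Quick Start source listing.
grouped = group_by_context(
    [":.burgers:locations", ":.snowcones:sales", "coffee", "donuts"]
)
```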

Specifying a context name for clients

When using a client to talk to Schema Registry, you may want the client to use a particular context. An example of this scenario is when migrating a client from communicating with one Schema Registry to another. You can achieve this by using a base context path, as defined above. To do this, simply change the Schema Registry URL used by the client from https://<host1> to https://<host2>/contexts/.mycontext.

Note that by using a base context path in the Schema Registry URL, the client will use the same schema context for every Schema Registry request. However, an advanced scenario might involve a client using different contexts for different topics. To achieve this, you can specify a context name strategy to the serializer or deserializer:

  • context.name.strategy=com.acme.MyContextNameStrategy

The context name strategy is a class that must implement the following interface:

/**
 * A {@link ContextNameStrategy} is used by a serializer or deserializer to determine
 * the context name used with the schema registry.
 */
public interface ContextNameStrategy extends Configurable {

  /**
   * For a given topic, returns the context name to use.
   *
   * @param topic The Kafka topic name.
   * @return The context name to use
   */
  String contextName(String topic);
}

Again, the use of a context name strategy should not be common. Specifying the base context path in the Schema Registry URL should serve most needs.

Schema Exporters

What is a Schema Exporter?

Previously, Confluent Replicator was the primary means of migrating schemas from one Schema Registry cluster to another, as long as the source Schema Registry cluster was on-premises. To support schema migration using this method, the destination Schema Registry is placed in IMPORT mode, either globally or for a specific subject.

The new schema exporter functionality replaces and extends the schema migration functionality of Replicator. Schema exporters reside within a Schema Registry cluster, and can be used to replicate schemas between two Schema Registry clusters in Confluent Cloud.

Schema Linking

You use schema exporters to accomplish Schema Linking, using contexts and/or qualified subject names to sync schemas across registries. Schema contexts provide the conceptual basis and namespace framework, while the exporter does the heavy lifting of the linking.

Schemas export from the source default context to a new context on the destination

By default, a schema exporter exports schemas from the default context in the source Schema Registry to a new context in the destination Schema Registry. The destination context (or a subject within the destination context) is placed in IMPORT mode. This allows the destination Schema Registry to use its default context as usual, without affecting any clients of its default context.

The new context created by default in the destination Schema Registry will have the form .lsrc-xxxxxx, taken from the logical name of the source.

Schema Registry clusters can export schemas to each other

Two Schema Registry clusters can each have a schema exporter that exports schemas from the default context to the other Schema Registry. In this setup, each side can read from or write to the default context, and each side can read from (but not write to) the exported context. This allows you to match the setup of Cluster Linking, where you might have a source topic and a read-only mirror topic on each side.

An exporter can copy schemas across contexts in the same Schema Registry

In addition, a schema exporter can copy schemas from one context to another within the same Schema Registry cluster. For example, you might create a “.staging” context, and then later copy the schemas from the “.staging” context to the default context when production-ready. When copying schemas to and from the same Schema Registry cluster, use the special URL local:///.

Customizing schema exports

There are various ways to customize which contexts are exported from the source Schema Registry, and which contexts are used in the destination Schema Registry. The full list of configuration properties is shown below.

How many exporters are allowed per Schema Registry?

The limit on the number of exporters allowed at any one time per Schema Registry is 10.

Configuration options

A schema exporter has these main configuration properties:

name
A unique name for the exporter.
subjects

This can take several forms:

  • A list of subject names and/or contexts, for example: [ "subject1", "subject2", ".mycontext1", ".mycontext2" ]
  • A singleton list containing a subject name prefix that ends in a wildcard, such as ["mytopic*"]
  • A singleton list containing a lone wildcard, ["*"], that indicates all subjects in the default context. This is the default.
  • A singleton list containing the context wildcard, [":*:"], that indicates all contexts.
subject-format
This is an optional parameter you can use to specify a format for the subject name in the destination cluster. You can specify ${subject} as a placeholder, which will be replaced with the default subject name. For example, dc_${subject} for the subject orders will map to the destination subject name dc_orders.
context-type

One of:

  • AUTO - Prepends the source context with an automatically generated context, which is .lsrc-xxxxxx for Confluent Cloud. This is the default.
  • CUSTOM - Prepends the source context with a custom context name, specified in context-name below.
  • NONE - Copies the source context as-is, without prepending anything. This is useful to make an exact copy of the source Schema Registry in the destination.
  • DEFAULT - Replaces the source context with the default context. This is useful for copying schemas to the default context in the destination. (Note: DEFAULT is available on Confluent Cloud as of July 2023, and on Confluent Platform starting with version 7.4.2.)
context-name
A context name to be used with the CUSTOM context-type above.
config

A set of configurations for creating a client to talk to the destination Schema Registry, which can be passed in a config file (for example, --config-file ~/<my-config>.txt). Typically, this includes:

  • schema.registry.url - The URL of the destination Schema Registry. This can also be local:/// to allow for more efficient copying if the source and destination are the same.
  • basic.auth.credentials.source - Typically “USER_INFO”
  • basic.auth.user.info - Typically of the form <api-key>:<api-secret>
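The properties above can be pictured as a single request body for creating an exporter. The Python sketch below is illustrative, not the definitive API shape: the JSON field names (contextType, context, subjectRenameFormat) are assumptions to check against the Exporters API reference, and the function names, host, and credentials are hypothetical placeholders.

```python
from string import Template

VALID_CONTEXT_TYPES = {"AUTO", "CUSTOM", "NONE", "DEFAULT"}


def preview_subject(subject_format: str, subject: str) -> str:
    """Show how a subject-format such as 'dc_${subject}' renames a subject."""
    return Template(subject_format).substitute(subject=subject)


def exporter_request_body(name, destination_url, api_key, api_secret,
                          subjects=("*",), context_type="AUTO",
                          context_name=None, subject_format=None) -> dict:
    """Assemble the exporter properties into one dictionary."""
    if context_type not in VALID_CONTEXT_TYPES:
        raise ValueError(f"unknown context type: {context_type}")
    if context_type == "CUSTOM" and not context_name:
        raise ValueError("CUSTOM requires a context name")
    body = {
        "name": name,
        "subjects": list(subjects),      # ["*"] means all subjects in the default context
        "contextType": context_type,     # assumed JSON field name
        "config": {
            "schema.registry.url": destination_url,
            "basic.auth.credentials.source": "USER_INFO",
            "basic.auth.user.info": f"{api_key}:{api_secret}",
        },
    }
    if context_name:
        body["context"] = context_name                  # assumed JSON field name
    if subject_format:
        body["subjectRenameFormat"] = subject_format    # assumed JSON field name
    return body
```

With the subject-format dc_${subject}, preview_subject maps the subject orders to dc_orders, matching the example given for subject-format.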

System topics and security configurations

The following configurations for system topics are available:

  • exporter.config.topic - Stores configurations for the exporters. The default name for this topic is _exporter_configs, and its default/required configuration is: numPartitions=1, replicationFactor=3, and cleanup.policy=compact.
  • exporter.state.topic - Stores the status of the exporters. The default name for this topic is _exporter_states, and its default/required configuration is: numPartitions=1, replicationFactor=3, and cleanup.policy=compact.

If you are using role-based access control (RBAC), the ResourceOwner role is required on the exporter.config.topic and exporter.state.topic topics, as well as on the _schemas internal topic. See also Use Role-Based Access Control (RBAC) in Confluent Cloud and Configuring Role-Based Access Control for Schema Registry on Confluent Platform.

If you are configuring Schema Registry on Confluent Platform using the Schema Registry Security Plugin, you must activate both the exporter and the Schema Registry security plugin by specifying both extension classes in the $CONFLUENT_HOME/etc/schema-registry/schema-registry.properties file:

resource.extension.class=io.confluent.kafka.schemaregistry.security.SchemaRegistrySecurityResourceExtension,io.confluent.schema.exporter.SchemaExporterResourceExtension

The configuration for the exporter resource extension class in the schema-registry.properties is described in Set up source and destination environments in Schema Linking on Confluent Platform.

Lifecycle and states

Schema Registry stores schemas in a Kafka topic. A schema exporter uses the topic offset to determine its progress.

When a schema exporter is created, it begins in the STARTING state. While in this state, it finds and exports all applicable schemas already written to the topic. After exporting previously registered schemas, the exporter then enters the RUNNING state, during which it will be notified of any new schemas, which it can export if applicable. As schemas are exported, the exporter will save its progress by recording the latest topic offset.

If you want to make changes to the schema exporter, you must first “pause” it, which causes it to enter the PAUSED state. The exporter can then be resumed after the proper changes are made. Upon resumption, the exporter will find and export any applicable schemas since the last offset that it recorded.

While an exporter is paused, it can also be “reset”, which will cause it to clear its saved offset and re-export all applicable schemas when it resumes. To accomplish this, the exporter starts off again in STARTING state after a reset, and follows the same lifecycle.

The states of a schema exporter at various stages in its lifecycle are summarized below.

State | Description
STARTING | The exporter finds and exports all applicable previously registered schemas for the topic. This is the starting state, or the state after a reset.
RUNNING | The exporter is notified of new schemas, exports them if applicable, and tracks progress by recording the last topic offset.
PAUSED | An exporter can be paused; for example, to make configuration changes. When it resumes, the exporter finds and exports schemas since the last recorded offset.
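The lifecycle can be modeled as a small state machine. The Python sketch below is one reading of the description, not the Schema Registry implementation: the class and method names are hypothetical, and it assumes that an exporter can be paused from either STARTING or RUNNING, and that after a reset the exporter stays PAUSED until it is resumed, at which point it re-enters STARTING.

```python
from enum import Enum


class ExporterState(Enum):
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"


class ExporterLifecycle:
    """Tracks the state and saved topic offset of a hypothetical exporter."""

    def __init__(self):
        self.state = ExporterState.STARTING
        self.offset = None  # None means no saved progress

    def initial_export_done(self, offset: int) -> None:
        # All previously registered schemas have been exported.
        self._require(ExporterState.STARTING)
        self.offset = offset
        self.state = ExporterState.RUNNING

    def schema_exported(self, offset: int) -> None:
        # Progress is saved by recording the latest topic offset.
        self._require(ExporterState.RUNNING)
        self.offset = offset

    def pause(self) -> None:
        # Assumption: pausing is allowed from STARTING or RUNNING.
        if self.state is ExporterState.PAUSED:
            raise RuntimeError("exporter is already paused")
        self.state = ExporterState.PAUSED

    def reset(self) -> None:
        # A reset is only possible while paused and clears the saved offset.
        self._require(ExporterState.PAUSED)
        self.offset = None

    def resume(self) -> None:
        self._require(ExporterState.PAUSED)
        # Without a saved offset, everything is exported again from STARTING.
        self.state = (ExporterState.STARTING if self.offset is None
                      else ExporterState.RUNNING)

    def _require(self, expected: ExporterState) -> None:
        if self.state is not expected:
            raise RuntimeError(f"expected {expected.value}, got {self.state.value}")
```

Keeping the offset next to the state makes the difference between a plain resume (continue from the saved offset) and a resume after a reset (start over) explicit.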

REST APIs

Schema Registry supports the following REST APIs, as fully detailed in Exporters in the Schema Registry API documentation:

Task | API
Gets a list of exporters for a tenant | GET /exporters
Creates a new exporter | POST /exporters
Gets info about an exporter | GET /exporters/{name}
Gets the config for an exporter | GET /exporters/{name}/config
Gets the status of an exporter | GET /exporters/{name}/status
Updates the information for an exporter | PUT /exporters/{name}/config
Pauses an exporter | PUT /exporters/{name}/pause
Resumes an exporter | PUT /exporters/{name}/resume
Resets an exporter, clears offsets | PUT /exporters/{name}/reset
Deletes an exporter | DELETE /exporters/{name}
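As a rough illustration of how the endpoints listed above fit together, the following Python sketch maps each task to its HTTP method and path. The action names and the helper function are hypothetical; the methods and paths are taken from the list above.

```python
from urllib.parse import quote

# Action name (hypothetical label) -> (HTTP method, path template).
EXPORTER_ROUTES = {
    "list": ("GET", "/exporters"),
    "create": ("POST", "/exporters"),
    "info": ("GET", "/exporters/{name}"),
    "config": ("GET", "/exporters/{name}/config"),
    "status": ("GET", "/exporters/{name}/status"),
    "update": ("PUT", "/exporters/{name}/config"),
    "pause": ("PUT", "/exporters/{name}/pause"),
    "resume": ("PUT", "/exporters/{name}/resume"),
    "reset": ("PUT", "/exporters/{name}/reset"),
    "delete": ("DELETE", "/exporters/{name}"),
}


def exporter_request(action: str, name: str = None) -> tuple:
    """Return the (method, path) pair for an exporter task."""
    method, template = EXPORTER_ROUTES[action]
    if "{name}" not in template:
        return method, template
    if not name:
        raise ValueError(f"action '{action}' requires an exporter name")
    # Percent-encode the name so it is safe to use as a single path segment.
    return method, template.format(name=quote(name, safe=""))
```

Following the lifecycle rules, a configuration change would use the pause, update, and resume routes in that order.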

Deployment strategies and Schema Linking

Schema Linking can replicate schemas between Schema Registry clusters as follows:

A schema link sends data from a “source cluster” to a “destination cluster”. The supported cluster types are shown in the table below.

Source Schema Registry Cluster Options | Destination Schema Registry Cluster Options
Confluent Cloud with internet networking | Confluent Cloud with internet networking
Confluent Cloud with internet networking | Confluent Platform 7.0+ with an IP address accessible over the public internet
Confluent Platform 7.0+ | Confluent Platform 7.0+
Confluent Platform 7.0+ | Confluent Cloud with internet networking

Schema Linking can also be used in both directions between two clusters, allowing each side to continue to receive both reads and writes for schemas.

With regard to Confluent Cloud and Confluent Platform solutions, you would use Schema Linking with Cluster Linking to mirror from one instance to the other. Any use of Confluent Platform in these setups requires Confluent Platform 7.0 or later.

To learn more about Cluster Linking and mirror topics, see Cluster Linking for Confluent Platform and Geo-replication with Cluster Linking on Confluent Cloud.

Access control (RBAC) for Schema Linking

Role-Based Access Control (RBAC) enables administrators to set up and manage user access to Schema Linking. This allows multiple users to collaborate with different access levels to various resources.

The following table shows how RBAC roles map to Schema Linking resources. For details on how to manage RBAC for these resources, see List the role bindings for a principal.

Role Scope All Schema Linking resources
OrganizationAdmin Organization
EnvironmentAdmin Environment
CloudClusterAdmin Cluster  
Operator Organization, Environment, Cluster  
MetricsViewer Organization, Environment, Cluster  
ResourceOwner Schema Subject ✔ Only if (subject = *) on source Schema Registry and (:.schema_context:*) on destination Schema Registry
DeveloperManage Schema Subject  
DeveloperRead Schema Subject  
DeveloperWrite Schema Subject  
DataDiscovery Environment  
DataSteward Environment

Table Legend:

  • ✔ = Yes
  • Blank space = No

Permissions on the destination Schema Registry

Keep in mind the following regarding the DESTINATION Schema Registry.

  • If you have schema exporters running, removing permissions for one or more subjects for an account will not prevent that user account from accessing these subjects in the DESTINATION if the DESTINATION Schema Registry is different from the source. Therefore, as a precaution you should also remove permissions for these subjects for the account in the DESTINATION Schema Registry.
  • The schema exporter will stop running if permissions are removed from the DESTINATION Schema Registry for the account that created the schema exporter.

Permissions on schema contexts

If you want to grant permissions to specific schema contexts, you can do so using the Prefix rule and grant permissions with prefix as :.schema_context:*. The permission can be applied to any of the RBAC roles that use scoping.
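The prefix format can be sketched in Python as follows. Both helpers are hypothetical illustrations: the first formats the prefix for a context, and the second assumes that a qualified subject has the form :.context:subject and that the trailing * acts as a simple "starts with" wildcard.

```python
def context_prefix(context: str) -> str:
    """Format the RBAC prefix that covers every subject in a context."""
    if not context.startswith("."):
        raise ValueError("context names start with a dot, for example '.my_context'")
    return f":{context}:*"


def is_covered(prefix_rule: str, qualified_subject: str) -> bool:
    """Check whether a qualified subject falls under a prefix rule."""
    # The trailing '*' is the wildcard; everything before it must match literally.
    return qualified_subject.startswith(prefix_rule.rstrip("*"))
```

Because the literal part of the prefix ends with a colon, a rule for .my_context does not also cover a context whose name merely begins with the same characters, such as .my_context2.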

For example, to apply READ permission (DeveloperRead) for the context .my_context:

  1. Use the Confluent CLI to create a resource-specific service account, which will provide the specific permissions for the context.

  2. On Confluent Cloud Console, select Access and Accounts under the dropdown menu on the sidebar.

    ../_images/schema-link-cloud-ui-rbac.png
  3. Select the service account and add the access as shown below.

    Applying the prefix rule, as shown, will enable the DeveloperRead role for the service account only for context .my_context.

    ../_images/schema-link-contexts-prefix-rule.png