Frequently Asked Questions for Schema Registry

Q&As

What are the minimum required RBAC roles for Schema Registry on Confluent Cloud?

  • The OrganizationAdmin, EnvironmentAdmin, and DataSteward roles have full access to Schema Registry operations.
  • Schema Registry also supports resource-level role-based access control (RBAC). You can grant a resource-level role, such as ResourceOwner, access to schema subjects within Schema Registry.

To learn more, see Access control (RBAC) for Confluent Cloud Schema Registry.

What RBAC roles are needed to change the Confluent Cloud Schema Registry mode (IMPORT, READONLY, and so on)?

  • Mode can be set at the Schema Registry level or at the subject level.
  • The OrganizationAdmin, EnvironmentAdmin, and DataSteward roles can set the mode at the Schema Registry level.
  • For individual subjects, setting the mode follows the same RBAC rules as setting the compatibility mode.

To learn more, see Access control (RBAC) for Confluent Cloud Schema Registry.

What RBAC roles are available for Stream Catalog?

To learn about RBAC and Stream Catalog on Confluent Cloud, see Access control (RBAC) for Stream Catalog.

How does RBAC work with Schema Registry on Confluent Platform?

To learn about RBAC and Confluent Platform, see Configuring Role-Based Access Control for Schema Registry.

How do you find and delete unused schemas?

To learn about managing storage, deleting schemas, and schema limits on Confluent Cloud, see the following sections:

How do you find schema IDs?

There are several ways to get schema IDs, including:

  • View schema IDs on the Confluent Cloud Console, or on Confluent Control Center (Legacy) in Confluent Platform.
  • Use the Confluent CLI: run confluent schema-registry schema describe, the output of which includes the ID of the specified schema.
  • Use the local Kafka scripts to print schema IDs with the consumer.
  • Use API calls to show schema IDs (see the sketch below).
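
To illustrate the API option, the following is a minimal sketch that looks up a schema ID through the Schema Registry REST API. The registry URL and the subject name orders-value are placeholders; on Confluent Cloud, authenticate with a Schema Registry API key and secret.

  import requests

  # Fetch the latest version registered under a subject; the response
  # includes the globally unique schema ID in the "id" field.
  # "http://localhost:8081" and "orders-value" are placeholder values.
  resp = requests.get(
      "http://localhost:8081/subjects/orders-value/versions/latest",
      # For Confluent Cloud, authenticate with a Schema Registry API key:
      # auth=("<SR_API_KEY>", "<SR_API_SECRET>"),
  )
  resp.raise_for_status()
  print(resp.json()["id"])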

Are there limits on the number of schemas you can maintain?

Confluent Cloud Schema Registry imposes limits on the number of schema versions supported in the registry, depending on the cluster type. When these limits are reached, you can identify unused schemas and free up storage space by deleting them. To learn more, see Delete Schemas and Manage Storage Space on Confluent Cloud.

There are no limits on schemas on self-managed Confluent Platform. To learn more about managing schemas on Confluent Platform, including soft and hard deletes, and schema versioning, see Schema Deletion Guidelines in the Confluent Platform documentation.

How do you delete schemas?

To learn about deleting schemas on Confluent Cloud, see Delete Schemas and Manage Storage Space on Confluent Cloud.

To learn how to delete schemas on Confluent Platform, see Schema Deletion Guidelines in the Confluent Platform documentation.

Can you recover deleted schemas?

You can recover soft-deleted schemas on both Confluent Cloud and Confluent Platform, as described in:

If you still have the schema definition for a hard-deleted schema that you want to recover, you can recover the schema using subject-level schema migration as a workaround. To learn how to do this, see Migrate an Individual Schema to an Already Populated Schema Registry (subject-level migration).
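
In outline, the workaround re-registers the saved schema definition under its original ID using IMPORT mode, roughly as in the following sketch. The registry URL, subject, schema file, ID, and version are placeholder values; follow the linked migration guide for the authoritative steps.

  import requests

  BASE = "http://localhost:8081"   # placeholder registry URL
  SUBJECT = "orders-value"         # placeholder subject name

  # 1. Put the subject into IMPORT mode so an explicit schema ID and version
  #    are accepted on registration (the subject must have no active versions,
  #    which is the case after a hard delete).
  requests.put(f"{BASE}/mode/{SUBJECT}", json={"mode": "IMPORT"}).raise_for_status()

  # 2. Re-register the saved schema definition under its original ID and
  #    version (42 and 1 are placeholder values here).
  with open("orders-value.avsc") as f:
      schema_str = f.read()
  requests.post(
      f"{BASE}/subjects/{SUBJECT}/versions",
      json={"schema": schema_str, "schemaType": "AVRO", "id": 42, "version": 1},
  ).raise_for_status()

  # 3. Return the subject to normal operation.
  requests.put(f"{BASE}/mode/{SUBJECT}", json={"mode": "READWRITE"}).raise_for_status()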

What are schema contexts and when should you use them?

A schema context is an ad hoc grouping of subject names and schema IDs. You can use a context name strategy to help organize your schemas, grouping logically related schemas together by name into what can be thought of as a sub-registry.

Schema IDs and subject names without explicit contexts are maintained in the default context. Subject names and IDs are unique per context, so you could have an unqualified subject :.:my-football-teams in the default context (the . represents the default context) and a qualified subject :.my-cool-teams:my-football-teams in the context :.my-cool-teams:, and they function as independent, unique subjects. The qualified and unqualified subjects could even have the same schema IDs and still be unique, by virtue of being in different contexts.
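
As a small illustration, both subjects can be addressed through the REST API using their qualified names; the registry URL and the example subjects below carry over from the paragraph above and are placeholders:

  import requests

  BASE = "http://localhost:8081"  # placeholder registry URL

  # The unqualified subject lives in the default context...
  default_ctx = requests.get(f"{BASE}/subjects/my-football-teams/versions/latest")

  # ...while the qualified subject lives in the :.my-cool-teams: context and is
  # looked up with its context-qualified name.
  custom_ctx = requests.get(
      f"{BASE}/subjects/:.my-cool-teams:my-football-teams/versions/latest"
  )

  # The two lookups can return different schemas, or even the same schema ID,
  # because subject names and IDs are scoped per context.
  print(default_ctx.json(), custom_ctx.json())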

There are a few use cases for contexts beyond simple organization, and more concepts and strategies for using them. You can leverage multi-context APIs and set up a context name strategy for Schema Registry clients to use.

Schema contexts are useful for Schema Linking, where they are used in concert with exporters, but you can also use them outside of Schema Linking if so desired. To learn more about schema contexts and how they work, see:

What is the advantage of using qualified schemas over schemas under the default context?

Schema Linking preserves schema IDs; therefore, if you export schemas to another cluster, you can copy them into non-default contexts to avoid ID collisions with schemas under existing contexts. Contexts also provide a way to separate different environments for schemas. For example, you could develop with schemas in a “developer” context and promote them to a “production” context when development is done.

To learn more, see:

Which clients can consume against the schema context?

All clients (Java, .NET, Spring Boot, and so on) can specify an explicit context as part of the Schema Registry URL; for example, http://mysr:8081/contexts/mycontext. Currently, only the Java client also passes the subject name when it looks up an ID. With the subject name, Schema Registry can find the correct context for the ID if it is not in the default context. This may be supported by the .NET and Python clients in future releases.
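
For example, a minimal sketch with the confluent-kafka Python client; the host, context name, and subject are placeholder values:

  from confluent_kafka.schema_registry import SchemaRegistryClient

  # Point the client at an explicit context by appending /contexts/<name>
  # to the Schema Registry URL (placeholder host and context shown here).
  client = SchemaRegistryClient({"url": "http://mysr:8081/contexts/mycontext"})

  # Lookups now resolve against subjects in "mycontext".
  latest = client.get_latest_version("my-football-teams-value")
  print(latest.schema_id)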

Does Schema Linking support mTLS?

Source and destination schema registries provide support for mTLS. Does Schema Linking also provide this support? If so, how do you provide certificates to connect with the source Schema Registry?

On Confluent Platform 7.1 and later, Schema Registry clients can accept certificates for mTLS authentication in PEM format.

Can the schema exporter use any set of valid certificates to authenticate with source and destination schema registries, or only default certificates?

Yes, any certificates can be passed.

How will Schema Linking be maintained across Confluent Platform version updates?

Any future changes to Schema Linking will be done in a backward-compatible manner.

How do you implement bi-directional Schema Linking?

Schema Linking is implemented in “push” mode; therefore, to achieve bi-directional Schema Linking, each side must run a schema exporter that pushes to the other side.
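
As a rough sketch of one direction, a schema exporter is created on the source registry and pointed at the destination; a mirror-image exporter created on the destination completes the bi-directional setup. The exporter name, context, destination URL, and credentials below are placeholders, and the request fields should be confirmed against the Schema Linking documentation.

  import requests

  SOURCE = "http://source-sr:8081"  # placeholder source registry

  # Create an exporter on the source registry that pushes matching subjects
  # to the destination registry.
  exporter = {
      "name": "to-destination",          # placeholder exporter name
      "subjects": ["*"],                 # export all subjects
      "contextType": "CUSTOM",
      "context": "from-source",          # placeholder destination context
      "config": {
          "schema.registry.url": "http://destination-sr:8081",
          # Destination credentials, if required (placeholders):
          # "basic.auth.credentials.source": "USER_INFO",
          # "basic.auth.user.info": "<key>:<secret>",
      },
  }
  requests.post(f"{SOURCE}/exporters", json=exporter).raise_for_status()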

What is the best way to set up a custom-managed application on Confluent Cloud Schema Registry with a proxy?

When custom-managed applications are configured to use Confluent Cloud Schema Registry through a proxy, configure the Schema Registry endpoint with a fully qualified domain name (FQDN), and resolve the FQDN to the proxy if necessary.

What are the requirements for using Data Contracts with Schema Registry?

Data Contracts require specific versions and packages:

General requirements:

  • Schema rules are only available on Confluent Enterprise and on Confluent Cloud with the Stream Governance “Advanced” package.
  • Schema rules are only supported in version 7.4 or above.

Confluent Cloud requirements:

  • Enable Schema Registry with the Advanced Stream Governance package.
  • See Choose a Stream Governance package and enable Schema Registry for Confluent Cloud.

Confluent Platform requirements:

  • Schema rules are only available on Confluent Enterprise (not on the Community edition).
  • Enable schema rules by adding the appropriate property to the Schema Registry configuration before starting (see the sketch below).
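
As a sketch of that last step only: enabling schema rules on Confluent Platform amounts to registering a rule-set resource extension in the Schema Registry properties file before startup. The exact property value below is an assumption to confirm against the linked Data Contracts documentation for your release.

  # schema-registry.properties (assumed class name; verify in the Data Contracts docs)
  resource.extension.class=io.confluent.kafka.schemaregistry.rulehandler.RuleSetResourceExtension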

To learn more, see Data Contracts for Schema Registry on Confluent Platform.

How do schema rules work with schema evolution and compatibility?

Schema rules are part of Data Contracts and work alongside schema evolution and compatibility checking. They allow you to:

  • Define domain validation rules for your schemas
  • Ensure data quality through field-level validation
  • Maintain data consistency across different systems
  • Support schema migration with validation rules

Rules are enforced during schema registration and can be configured to validate data at serialization or deserialization time.
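
For a concrete flavor, the following sketch registers a schema together with a single CEL condition rule through the REST API. The registry URL, subject, field, and rule expression are placeholders, and the exact rule fields should be confirmed against the Data Contracts documentation.

  import json
  import requests

  BASE = "http://localhost:8081"  # placeholder registry URL

  schema = {
      "type": "record",
      "name": "Order",
      "fields": [{"name": "amount", "type": "double"}],
  }

  # Register the schema with a domain rule that validates data on write.
  payload = {
      "schemaType": "AVRO",
      "schema": json.dumps(schema),
      "ruleSet": {
          "domainRules": [
              {
                  "name": "amountIsPositive",    # placeholder rule name
                  "kind": "CONDITION",
                  "type": "CEL",
                  "mode": "WRITE",
                  "expr": "message.amount > 0",  # placeholder CEL expression
              }
          ]
      },
  }
  requests.post(f"{BASE}/subjects/orders-value/versions", json=payload).raise_for_status()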

To learn more about schema evolution with rules, see Schema Evolution and Compatibility for Schema Registry on Confluent Platform.

What are the hardware requirements for Schema Registry in production?

Memory:

  • Schema Registry maintains in-memory indices for faster schema lookups.
  • A conservative upper bound for large companies is around 10,000 unique schemas.
  • With roughly 1000 bytes of heap overhead per schema, a 1 GB heap size is more than sufficient.

CPUs:

  • CPU usage in Schema Registry is light.
  • The most intensive task is schema compatibility checking, which is an infrequent operation.
  • Choose more cores over faster CPUs for better concurrency.

Disks:

  • Schema Registry has no disk-resident data; it uses Kafka as a commit log for durable storage.
  • The only disk usage is for log4j logs.

Network:

  • A fast and reliable network is important (1 GbE or 10 GbE is sufficient).
  • Avoid clusters that span multiple data centers or large geographic distances.
  • Low latency helps with node communication.

To learn more, see Deploy Schema Registry in Production.

How much memory does Schema Registry typically use?

Schema Registry uses Kafka as a commit log to store schemas durably and maintains in-memory indices for fast lookups. Memory usage is typically very light:

  • Large organizations might have around 10,000 unique schemas
  • Each schema requires roughly 1000 bytes of heap overhead
  • Therefore, 1GB of heap memory is more than sufficient for most deployments
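
As a rough worked estimate: 10,000 schemas × 1,000 bytes ≈ 10 MB of schema index data, which fits comfortably in a 1 GB heap even after accounting for JVM and per-request overhead.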

The in-memory indices are used to make schema lookups faster, while the actual durable storage happens through Kafka.

How do you configure OAuth authentication for Schema Registry?

Schema Registry supports OAuth authentication for secure access. OAuth configuration involves:

  • Setting up OAuth providers and client credentials
  • Configuring Schema Registry to validate OAuth tokens
  • Setting up client applications to authenticate using OAuth

For detailed configuration steps, see Configure OAuth for Schema Registry on Confluent Cloud and Configure OAuth for Schema Registry.
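
While the linked guides cover the Confluent-specific properties, the overall flow on the client side can be sketched as follows: obtain a token from the identity provider, then present it as a bearer token on Schema Registry requests. The token endpoint, client credentials, scope, and registry URL below are placeholder values.

  import requests

  # 1. Obtain an access token from the OAuth identity provider using the
  #    client-credentials grant (all values below are placeholders).
  token_resp = requests.post(
      "https://idp.example.com/oauth2/token",
      data={
          "grant_type": "client_credentials",
          "client_id": "<CLIENT_ID>",
          "client_secret": "<CLIENT_SECRET>",
          "scope": "schema-registry",
      },
  )
  token_resp.raise_for_status()
  token = token_resp.json()["access_token"]

  # 2. Call Schema Registry with the bearer token.
  subjects = requests.get(
      "https://my-schema-registry.example.com/subjects",
      headers={"Authorization": f"Bearer {token}"},
  )
  print(subjects.json())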

What is passwordless authentication for Schema Registry?

Passwordless authentication allows clients to authenticate to Schema Registry without traditional username/password credentials. This typically involves:

  • Certificate-based authentication
  • Token-based authentication
  • Integration with identity providers

To learn more about passwordless authentication options, see Passwordless authentication for Schema Registry.

When should I use GraphQL API vs REST API for Stream Catalog?

Both APIs are available for Stream Catalog, but they serve different purposes:

GraphQL API:

  • Preferred for search operations.
  • Can search across entity relationships.
  • Provides declarative data fetching.
  • Offers a more natural way to explore the catalog graph.
  • Currently does not support searching by business metadata attributes.

REST API:

  • Supports full CRUD operations (create, read, update, delete).
  • Required for managing business metadata attributes.
  • Good for simple operations and for integration with existing REST-based systems.

Recommendation: Use GraphQL for search and exploration, REST for entity management and business metadata operations.

To learn more, see Stream Catalog GraphQL API Usage and Stream Catalog REST API Usage.

What are the different Stream Governance packages available?

Stream Governance packages determine which features are available in your Confluent Cloud environment:

  • Essentials package: Basic Stream Governance capabilities, such as Stream Catalog, Stream Lineage, and Data Portal.
  • Advanced package: Full Stream Governance features, including Data Contracts, schema rules, and advanced Stream Catalog features.

Different features are available depending on your package level. Data Contracts and schema rules require the Advanced package.

To learn more about packages and features, see Stream Governance Packages.

How does the Data Portal access request workflow work?

The Data Portal provides a self-service interface for discovering and accessing Kafka topics:

For data users:

  • Search for and discover existing topics using metadata.
  • Request access to topics through an approval workflow.
  • View and use data once access is granted.

For data owners and admins:

  • Receive and manage access requests.
  • Approve or deny requests based on business rules.
  • Maintain topic metadata to improve discoverability.

Note: The topic access request workflow is not available for topics on Basic clusters.

To learn more, see Data Portal on Confluent Cloud.

How does Stream Lineage help track data?

Stream Lineage provides visibility into data flow and transformations across your streaming applications:

  • Track data origins: See where your data comes from
  • Understand transformations: View how data changes as it moves through your pipeline
  • Impact analysis: Understand downstream effects of schema or data changes
  • Compliance and governance: Maintain audit trails for data usage

Stream Lineage integrates with Stream Catalog and Data Portal to provide a comprehensive view of your data ecosystem.

To learn more, see Track Data with Stream Lineage.