Integrate Schemas from Kafka Connect in Confluent Platform

Kafka Connect and Schema Registry integrate to capture schema information from connectors. Kafka Connect converters convert data between the internal data types used by Kafka Connect and data represented as Avro, Protobuf, or JSON Schema. The AvroConverter, ProtobufConverter, and JsonSchemaConverter automatically register schemas generated by source connectors. Sink connectors receive schema information in addition to the data for each message, so they know the structure of the data and can provide additional capabilities such as maintaining a database table structure or creating a search index. Each converter also changes schema data into the internal data types used by Kafka Connect.

For additional information about converters and how they work, see Configuring Key and Value Converters.

Example Converter Properties

To use Kafka Connect with Schema Registry, you must specify the key.converter or value.converter properties in the connector or in the Connect worker configuration. The converters also need the Schema Registry URL, which you specify by adding the converter prefix (key.converter. or value.converter.) to schema.registry.url, as shown in the following property examples.

Avro

Example Avro converter properties are shown below:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081

The following additional configuration properties can be used with the Avro converter (io.confluent.connect.avro.AvroConverter). These Avro-specific properties are added to the worker or connector configuration where the Avro converter properties are located. Note that when added to the worker or connector configuration, these properties require the key.converter. and value.converter. prefix. For example:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
key.converter.enhanced.avro.schema.support=true
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
value.converter.enhanced.avro.schema.support=true
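
The prefixing rule above can be sketched in Python as a small helper that takes unprefixed converter settings and emits the key.converter. and value.converter. forms. The function name prefixed_converter_properties is hypothetical and is not part of Kafka Connect; the sketch only illustrates how the property names are composed.

```python
def prefixed_converter_properties(converter_class, settings, roles=("key", "value")):
    """Compose '<role>.converter' property names (hypothetical helper)."""
    props = {}
    for role in roles:
        prefix = f"{role}.converter"
        # The converter class itself is set on the bare prefix.
        props[prefix] = converter_class
        # Every converter-specific setting gets the same prefix.
        for name, value in settings.items():
            props[f"{prefix}.{name}"] = value
    return props


props = prefixed_converter_properties(
    "io.confluent.connect.avro.AvroConverter",
    {
        "schema.registry.url": "http://localhost:8081",
        "enhanced.avro.schema.support": "true",
    },
)
for name, value in props.items():
    print(f"{name}={value}")
```

Passing roles=("value",) would produce only the value.converter. properties, which matches the case where keys use a different converter.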

When using Avro with basic authentication, you add the following properties:

key.converter.basic.auth.credentials.source=USER_INFO
key.converter.basic.auth.user.info={username}:{password}
value.converter.basic.auth.credentials.source=USER_INFO
value.converter.basic.auth.user.info={username}:{password}

When using Avro in a secure environment, you add key.converter.schema.registry.ssl. and value.converter.schema.registry.ssl. properties. An example of these additional properties is shown below:

key.converter.schema.registry.ssl.truststore.location=<truststore-location>
key.converter.schema.registry.ssl.truststore.password=<truststore-password>
key.converter.schema.registry.ssl.keystore.location=<keystore-location>
key.converter.schema.registry.ssl.keystore.password=<keystore-password>
key.converter.schema.registry.ssl.key.password=<key-password>

value.converter.schema.registry.ssl.truststore.location=<truststore-location>
value.converter.schema.registry.ssl.truststore.password=<truststore-password>
value.converter.schema.registry.ssl.keystore.location=<keystore-location>
value.converter.schema.registry.ssl.keystore.password=<keystore-password>
value.converter.schema.registry.ssl.key.password=<key-password>

The following are definitions of the Avro-specific configuration properties. For additional Connect Schema Registry configuration options, see Configuration Options.

scrub.invalid.names

Sanitize field and record names that don’t conform to Avro and Protobuf naming rules. Valid names start with a letter or underscore and contain only letters, digits, and underscores. When enabled, the converter transforms invalid names as follows:

If the name starts with a digit, the converter adds x to the start. For example, 48x48 becomes x48x48.
If the name contains other invalid characters, the converter URL-encodes the name and replaces each remaining non-[A-Za-z0-9_] character with _. For example, field name becomes field_name.

Enable this property when a source emits field names that don’t follow Avro or Protobuf rules, such as fields from systems with flexible naming conventions. Without scrubbing, such names cause connector failures or schema registration errors. Scrubbing renames fields, which can affect downstream consumers that rely on the original names.

Type: boolean
Default: false
Importance: medium
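
The two renaming rules described for scrub.invalid.names can be sketched in Python as follows. This is an illustration of the documented behavior, not the converter's actual code. The function name scrub_name is hypothetical, and the sketch assumes that Python's quote_plus is a close enough stand-in for the URL encoding the converter applies.

```python
import re
from urllib.parse import quote_plus


def scrub_name(name):
    """Approximate the documented scrubbing rules (hypothetical helper)."""
    # Rule 1: a name must not start with a digit, so prepend "x".
    if re.match(r"[0-9]", name):
        name = "x" + name
    # Rule 2: if invalid characters remain, URL-encode the name, then
    # replace every character outside [A-Za-z0-9_] with "_".
    if re.search(r"[^A-Za-z0-9_]", name):
        name = re.sub(r"[^A-Za-z0-9_]", "_", quote_plus(name))
    return name


for original in ("48x48", "field name", "valid_name"):
    print(original, "->", scrub_name(original))
```

Because distinct source names can map to the same scrubbed name (for example, names that differ only in which invalid character they contain), it is worth checking for collisions before enabling the property on an existing pipeline.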

schemas.cache.config

The size of the schema cache used in the Avro converter.

Type: int
Default: 1000
Importance: low

enhanced.avro.schema.support

Enable enhanced Avro schema support in the Avro converter. When set to true, this property preserves Avro schema package information and enums when converting from an Avro schema to a Connect schema. This information is added back when converting from a Connect schema to an Avro schema.

Type: boolean
Default: false
Importance: low

connect.meta.data

Allow the Connect converter to add its metadata to the output schema.

Type: boolean
Default: true
Importance: low

The connect.meta.data property preserves the following Connect schema metadata when converting from a Connect schema to an Avro schema, and adds it back when converting from an Avro schema to a Connect schema:

doc
version
parameters
default value
name
type

Protobuf

Protobuf example converter properties are shown below:

key.converter=io.confluent.connect.protobuf.ProtobufConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.protobuf.ProtobufConverter
value.converter.schema.registry.url=http://localhost:8081

The following are definitions of the Protobuf-specific configuration properties. For additional Connect Schema Registry configuration options, see Configuration Options.

enhanced.protobuf.schema.support

Enable enhanced Protobuf schema support in the Protobuf converter. When set to true, this property preserves Protobuf schema package information when converting from a Protobuf schema to a Connect schema. This information is added back when converting from a Connect schema to a Protobuf schema.

Type: boolean
Default: false
Importance: medium

generate.index.for.unions

Whether to generate an index suffix for unions. By default, oneOf messages have their field names suffixed with an index (for example _0), which results in a column name of value_0.thing. To configure oneOf message field names without this suffix, set generate.index.for.unions to false. To learn more about oneOfs in Protobuf, see Multiple event types in the same topic.

Type: boolean
Default: true
Importance: medium

scrub.invalid.names

Sanitize field and record names that don’t conform to Avro and Protobuf naming rules. Valid names start with a letter or underscore and contain only letters, digits, and underscores. When enabled, the converter transforms invalid names as follows:

If the name starts with a digit, the converter adds x to the start. For example, 48x48 becomes x48x48.
If the name contains other invalid characters, the converter URL-encodes the name and replaces each remaining non-[A-Za-z0-9_] character with _. For example, field name becomes field_name.

Enable this property when a source emits field names that don’t follow Avro or Protobuf rules, such as fields from systems with flexible naming conventions. Without scrubbing, such names cause connector failures or schema registration errors. Scrubbing renames fields, which can affect downstream consumers that rely on the original names.

Type: boolean
Default: false
Importance: medium

int.for.enums

Whether to represent enums as integers. To represent enums as integers, set int.for.enums to true.

Type: boolean
Default: false
Importance: medium

optional.for.nullables

Whether nullable fields should be specified with an optional label.

Type: boolean
Default: false
Importance: medium

generate.struct.for.nulls

Whether to generate a struct variable for null values.

Type: boolean
Default: false
Importance: medium

wrapper.for.nullables

Whether nullable fields should use primitive wrapper messages.

Type: boolean
Default: false
Importance: medium

wrapper.for.raw.primitives

Whether a wrapper message should be interpreted as a raw primitive at the root level.

Type: boolean
Default: true
Importance: medium

schemas.cache.config

The size of the schema cache used in the Protobuf converter.

Type: int
Default: 1000
Importance: low

JSON Schema

JSON Schema example converter properties are shown below:

key.converter=io.confluent.connect.json.JsonSchemaConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://localhost:8081

The following are definitions of the JSON Schema-specific configuration properties. For additional Connect Schema Registry configuration options, see Configuration Options.

Important

Starting with Confluent Platform patch release versions 7.4.15, 7.5.14, 7.6.11, 7.7.9, 7.8.8, 7.9.7, 8.0.5, 8.1.3, and 8.2.1, the JSON Schema converter supports the json.type.allowed.packages property, which is set to * by default in the Kafka Connect runtime configuration. The default allows any class to be loaded and is not recommended. To disallow message deserialization with javaType, set the property to an empty string (""); this is the most secure option. To allow specific packages for deserialization, set the property to a comma-separated list of package prefixes. Classes whose fully qualified name starts with one of the listed prefixes are allowed, and all other classes are rejected. The following example sets the property to allow classes from specific packages.

value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.json.type.allowed.packages=com.mycompany.models,com.mycompany.events

key.converter=io.confluent.connect.json.JsonSchemaConverter
key.converter.json.type.allowed.packages=com.mycompany.keys
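
The allow-list rule for json.type.allowed.packages can be sketched in Python as a plain prefix check. The function name is_class_allowed and the class names in the usage lines are hypothetical; the sketch follows the description above (wildcard, empty string, comma-separated prefixes) and is not the converter's actual implementation.

```python
def is_class_allowed(class_name, allowed_packages):
    """Return True if class_name passes the allow-list (hypothetical helper)."""
    setting = allowed_packages.strip()
    # "*" allows any class to be loaded.
    if setting == "*":
        return True
    # An empty setting yields no prefixes, so every class is rejected.
    prefixes = [p.strip() for p in setting.split(",") if p.strip()]
    # A class is allowed when its fully qualified name starts with a prefix.
    return any(class_name.startswith(prefix) for prefix in prefixes)


allowed = "com.mycompany.models,com.mycompany.events"
print(is_class_allowed("com.mycompany.models.Order", allowed))
print(is_class_allowed("java.lang.Runtime", allowed))
print(is_class_allowed("com.mycompany.models.Order", ""))
```

Under this reading, a prefix such as com.mycompany.models also matches a sibling package whose name merely begins with the same characters, so ending each prefix with a trailing dot may be a safer choice if the converter accepts it.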

object.additional.properties

Whether to allow additional properties for object schemas.

Type: boolean
Default: true
Importance: medium

use.optional.for.nonrequired

Whether to set non-required properties to be optional.

Type: boolean
Default: false
Importance: medium

decimal.format

Controls the format in which this converter serializes decimals. The value is case-insensitive and can be either BASE64 (default) or NUMERIC.
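
The difference between the two decimal.format values can be sketched in Python. The sketch assumes that the BASE64 form carries the unscaled integer of the decimal as big-endian two's-complement bytes, which is how Connect's Decimal logical type represents values, and that NUMERIC writes the plain decimal number. The function name encode_decimal is hypothetical.

```python
import base64
from decimal import Decimal


def encode_decimal(value, scale, decimal_format="BASE64"):
    """Render a decimal in the given format (hypothetical helper)."""
    fmt = decimal_format.upper()  # the setting is case-insensitive
    if fmt == "NUMERIC":
        # Plain decimal text, as it would appear as a JSON number.
        return format(value, "f")
    if fmt != "BASE64":
        raise ValueError(f"unsupported decimal.format: {decimal_format}")
    # Assumption: the payload is the unscaled integer (value * 10**scale).
    unscaled = int(value.scaleb(scale))
    # Minimal two's-complement length, including the sign bit.
    magnitude = unscaled if unscaled >= 0 else ~unscaled
    length = magnitude.bit_length() // 8 + 1
    raw = unscaled.to_bytes(length, "big", signed=True)
    return base64.b64encode(raw).decode("ascii")


print(encode_decimal(Decimal("12.34"), 2))
print(encode_decimal(Decimal("12.34"), 2, "numeric"))
```

NUMERIC output is readable by consumers that do not know the Connect schema, while BASE64 requires the consumer to know the scale in order to reconstruct the value.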

replace.null.with.default

Whether to replace fields that have a default value, and are null, with the default value. When set to true, the default value is used, otherwise null is used.

Type: boolean
Default: true
Importance: medium

Note

JSON Schema supports an empty schema that has no type defined. For example:

"f2": {}

This is not allowed with JsonSchemaConverter in Connect, because Connect requires a strongly-typed structure for its internal record schema. The same restriction applies when an empty schema appears as a oneOf branch. For example:

"holiday": {
   "oneOf": [
     {
       "title": "Not included",
       "type": "null"
     },
     {}
   ]
 }

Using Independent Key and Value Converters

The key and value converters can be used independently from each other. For example, you may want to use a StringConverter for keys and a converter used with Schema Registry for values. An example of independent key and value properties is shown below:

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081

Converter Property Location and Inheritance

Confluent Platform first looks for converter configuration properties in the connector configuration. If none are found there, properties in the Connect worker configuration are used. You have the following three options for how to set these properties. Each one affects how the properties are inherited among the worker and connectors.

Specify all converter properties (with Schema Registry URL prefixes) in each connector configuration.
Specify all converter properties only in the worker configuration. In this case, all connectors inherit the worker converter properties.
Specify all converter properties in the worker configuration, and repeat all converter properties in a connector configuration when that connector needs to override the worker settings.

Important

If converter values and the associated Schema Registry URL are defined in both the worker and the connector, the settings in the connector override those in the worker.
If you specify a converter in a connector or worker (as an override or as the only setting), you must always include both the converter and the Schema Registry URL; otherwise, the connector or worker fails.
If you specify a converter in a connector that is not defined in the worker, you must supply all converter properties (key converter, value converter, and Schema Registry host:port) in the connector configuration.

Example Scenario

The following are the worker configuration properties used in this example scenario:

group.id=connect-cluster

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://host-1:port
value.converter=org.apache.kafka.connect.storage.StringConverter

Using the previous worker properties, start three connectors with the following configuration properties:

connector-1 configuration:

name=connector-1
<no converter configuration properties used>

connector-2 configuration:

name=connector-2
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://host-2:port

connector-3 configuration:

name=connector-3
key.converter=io.confluent.connect.avro.AvroConverter

The results of the deployment are:

connector-1 uses the worker configuration properties, with the Avro converter (io.confluent.connect.avro.AvroConverter) and the Schema Registry host-1:port.
connector-2 uses the Avro converter (io.confluent.connect.avro.AvroConverter) and the Schema Registry host-2:port.
connector-3 fails because it attempts to use the connector configuration, but does not find the Schema Registry URL configuration property. The Schema Registry URL configuration property is required for Avro, Protobuf, and JSON Schema.
All connectors use the value.converter worker property org.apache.kafka.connect.storage.StringConverter.
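
The lookup order in this scenario can be sketched in Python. The function resolve_converter is hypothetical and only models the rules described in this section: a connector that names a converter supplies all of that converter's properties itself, a connector that does not falls back to the worker, and the Schema Registry converters require a URL.

```python
SCHEMA_REGISTRY_CONVERTERS = {
    "io.confluent.connect.avro.AvroConverter",
    "io.confluent.connect.protobuf.ProtobufConverter",
    "io.confluent.connect.json.JsonSchemaConverter",
}


def resolve_converter(role, worker, connector):
    """Return the effective '<role>.converter*' properties (hypothetical helper)."""
    prefix = f"{role}.converter"
    # The connector wins if it names a converter; otherwise use the worker.
    source = connector if prefix in connector else worker
    effective = {
        key: value
        for key, value in source.items()
        if key == prefix or key.startswith(prefix + ".")
    }
    # Schema Registry converters cannot start without a URL.
    if (effective.get(prefix) in SCHEMA_REGISTRY_CONVERTERS
            and f"{prefix}.schema.registry.url" not in effective):
        raise ValueError(f"{prefix} requires {prefix}.schema.registry.url")
    return effective


worker = {
    "group.id": "connect-cluster",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://host-1:port",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
}
connectors = {
    "connector-1": {"name": "connector-1"},
    "connector-2": {
        "name": "connector-2",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://host-2:port",
    },
    "connector-3": {
        "name": "connector-3",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
    },
}

for name, config in connectors.items():
    try:
        print(name, resolve_converter("key", worker, config))
    except ValueError as error:
        print(name, "fails:", error)
```

In this model, connector-3 fails even though the worker defines a Schema Registry URL, because naming key.converter in the connector stops the lookup from falling back to the worker for the remaining key.converter. properties.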

Tip

For a deep dive into converters, see Converters and Serialization Explained.

NULL Values Replaced with Default Values

The configuration property ignore.default.for.nullables allows you to use a NULL value for a nullable (optional) column that has a default value configured for it. When set to true, this property ensures that the corresponding record in Kafka is NULL, instead of showing the default column value. The property defaults to false. A configuration snippet using this configuration property is shown below:

value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
value.converter.enhanced.avro.schema.support=true
value.converter.ignore.default.for.nullables=true
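
The effect of ignore.default.for.nullables on a single nullable field can be sketched in Python. The function name resolve_nullable_value is hypothetical, and the sketch models only the behavior described above, not the converter's code.

```python
def resolve_nullable_value(value, default, ignore_default_for_nullables=False):
    """Pick the value written for a nullable field (hypothetical helper)."""
    # With the property at its default (false), a NULL falls back to the
    # configured default value of the column.
    if value is None and not ignore_default_for_nullables:
        return default
    # With the property set to true, the NULL is kept as-is.
    return value


print(resolve_nullable_value(None, "unknown"))
print(resolve_nullable_value(None, "unknown", ignore_default_for_nullables=True))
print(resolve_nullable_value("active", "unknown"))
```

Keeping the NULL matters when downstream consumers need to distinguish a missing value from a value that happens to equal the default.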

Union/Oneof Types in Kafka Connect

Avro, Protobuf, and JSON Schema each have support for union/oneof types, where the value of the union/oneof type can be one of several defined variants. In Connect, the union/oneof types are mapped to a Connect schema as follows:

A union/oneof type of a null and a subtype is flattened to an optional Connect schema representing the subtype.
A union of more than two subtypes is mapped to a Connect struct where each union/oneof member is an optional field of the Connect schema representing the subtype.

In addition, the converters for Avro, Protobuf, and JSON Schema support a configuration parameter called generalized.sum.type.support. If set to true, the converters will use a standard naming convention when converting unions/oneofs to Connect schemas. This allows cross-format compatibility, such as when transforming through a pipeline of the form Avro -> Connect -> Protobuf. By default, generalized.sum.type.support is set to false.

Configuration Options

schema.registry.url

Comma-separated list of URLs for Schema Registry instances that can be used to register or look up schemas.

Type: list
Default: “”
Importance: high

auto.register.schemas

Specify if the Serializer should attempt to register the Schema with Schema Registry.

Type: boolean
Default: true
Importance: medium

use.latest.version

Only applies when auto.register.schemas is set to false. If auto.register.schemas is set to false and use.latest.version is set to true, then instead of deriving a schema for the object passed to the client for serialization, Schema Registry will use the latest version of the schema in the subject for serialization. The property use.latest.version can be set on producers or consumers to serialize or deserialize messages per the latest version.

Type: boolean
Default: false
Importance: medium

Note

To learn more, see how to use schema references to combine multiple event types in the same topic with Avro, JSON Schema, or Protobuf.

latest.compatibility.strict

Only applies when use.latest.version is set to true.

If latest.compatibility.strict is true (the default), then when using use.latest.version=true during serialization, a check is performed to verify that the latest subject version is backward compatible with the schema of the object being serialized. If the check fails, then an error results. If the check succeeds, then serialization is performed.

If latest.compatibility.strict is false, then the latest subject version is used for serialization, without any compatibility check. Serialization may fail in this case. Relaxing the compatibility requirement (by setting latest.compatibility.strict to false) may be useful, for example, when implementing Kafka Connect converters and schema references.

Type: boolean
Default: true
Importance: medium

Note

To learn more about this setting, see Schema Evolution and Compatibility for Schema Registry on Confluent Platform.
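
The interaction between auto.register.schemas, use.latest.version, and latest.compatibility.strict can be sketched in Python as a decision function. The function select_schema and the is_backward_compatible callback are hypothetical placeholders. A real compatibility check is performed by the serializer and Schema Registry, not by user code.

```python
def select_schema(derived, latest, *, auto_register=True,
                  use_latest_version=False, latest_compatibility_strict=True,
                  is_backward_compatible=lambda latest, derived: True):
    """Choose the schema used for serialization (hypothetical helper)."""
    # use.latest.version only applies when auto.register.schemas is false.
    if auto_register or not use_latest_version:
        return derived
    # Strict mode: the latest subject version must be backward compatible
    # with the schema of the object being serialized.
    if latest_compatibility_strict and not is_backward_compatible(latest, derived):
        raise ValueError("latest subject version is not backward compatible")
    # Non-strict mode skips the check; serialization itself may still fail.
    return latest


print(select_schema("derived-schema", "latest-schema"))
print(select_schema("derived-schema", "latest-schema",
                    auto_register=False, use_latest_version=True))
```

The strings stand in for schema objects, and the default callback always reports compatibility, so the second call returns the latest version.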

max.schemas.per.subject

Maximum number of schemas to create or cache locally.

Type: int
Default: 1000
Importance: low

key.subject.name.strategy

Determines how to construct the subject name under which the key schema is registered with Schema Registry. For additional information, see Schema Registry Subject name strategy.

Any implementation of io.confluent.kafka.serializers.subject.strategy.SubjectNameStrategy can be specified. By default, <topic>-key is used as the subject. Specifying an implementation of io.confluent.kafka.serializers.subject.SubjectNameStrategy is deprecated as of 4.1.3 and might degrade performance if used.

Type: class
Default: class io.confluent.kafka.serializers.subject.TopicNameStrategy
Importance: medium

value.subject.name.strategy

Determines how to construct the subject name under which the value schema is registered with Schema Registry. For additional information, see Schema Registry Subject name strategy.

Any implementation of io.confluent.kafka.serializers.subject.strategy.SubjectNameStrategy can be specified. By default, <topic>-value is used as the subject. Specifying an implementation of io.confluent.kafka.serializers.subject.SubjectNameStrategy is deprecated as of 4.1.3 and might degrade performance if used.

Type: class
Default: class io.confluent.kafka.serializers.subject.TopicNameStrategy
Importance: medium
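
The default subject names described for key.subject.name.strategy and value.subject.name.strategy can be sketched in Python. The function default_subject_name is a hypothetical stand-in for the naming rule only; the actual strategies are Java classes.

```python
def default_subject_name(topic, is_key):
    """Subject name under the default topic-based naming (hypothetical helper)."""
    # Keys register under <topic>-key, values under <topic>-value.
    return f"{topic}-{'key' if is_key else 'value'}"


print(default_subject_name("orders", is_key=True))
print(default_subject_name("orders", is_key=False))
```

With this naming, every schema written to a topic shares one key subject and one value subject, so compatibility is checked per topic.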

basic.auth.credentials.source

Specify how to pick the credentials for the Basic authentication header. The supported values are URL, USER_INFO and SASL_INHERIT.

Type: string
Default: “URL”
Importance: medium

basic.auth.user.info

Specify the user info for the Basic authentication in the form of {username}:{password}. schema.registry.basic.auth.user.info is a deprecated alias for this configuration.

Type: password
Default: “”
Importance: medium
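
With basic.auth.credentials.source set to USER_INFO, the {username}:{password} value is what a standard HTTP Basic authentication header encodes. The following Python sketch shows that encoding. The function name basic_auth_header and the credentials are hypothetical, and how the Schema Registry client attaches the header internally is not shown here.

```python
import base64


def basic_auth_header(user_info):
    """Build an HTTP Basic Authorization value from 'username:password'."""
    if ":" not in user_info:
        raise ValueError("user info must have the form {username}:{password}")
    # HTTP Basic authentication is the Base64 encoding of "username:password".
    token = base64.b64encode(user_info.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Hypothetical credentials, for illustration only.
print(basic_auth_header("alice:secret"))
```

Base64 is an encoding, not encryption, so these credentials are only protected in transit when the Schema Registry URL uses HTTPS.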

The following Schema Registry dedicated properties, configurable on the client, are available on Confluent Platform version 5.4.0 (and later). To learn more, see the information on configuring clients in Additional configurations for HTTPS.

schema.registry.ssl.truststore.location

The location of the trust store file. For example, schema.registry.ssl.truststore.location=/etc/kafka/secrets/kafka.client.truststore.jks.

Type: string
Default: “”
Importance: medium

schema.registry.ssl.truststore.password

The password for the trust store file. If a password is not set, access to the truststore is still available but integrity checking is disabled.

Type: password
Default: “”
Importance: medium

schema.registry.ssl.keystore.location

The location of the key store file. This is optional for the client and can be used for two-way authentication for the client. For example, schema.registry.ssl.keystore.location=/etc/kafka/secrets/kafka.schemaregistry.keystore.jks.

Type: string
Default: “”
Importance: medium

schema.registry.ssl.keystore.password

The store password for the key store file. This is optional for the client and only needed if schema.registry.ssl.keystore.location is configured.

Type: password
Default: “”
Importance: medium

schema.registry.ssl.key.password

The password of the private key in the key store file. This is optional for the client.

Type: password
Default: “”
Importance: medium