Important
You are viewing documentation for an older version of Confluent Platform. For the latest, click here.
JSON Schema Serializer and Deserializer¶
This document describes how to use JSON Schema with the Apache Kafka® Java client and console tools.
Both the JSON Schema serializer and deserializer can be configured to fail if
the payload is not valid for the given schema. This is set by specifying
json.fail.invalid.schema=true
. By default, this property is set to false
.
Tip
The examples below use the default address and port for the Kafka bootstrap server (localhost:9092
) and Schema Registry (localhost:8081
).
JSON Schema Serializer¶
Plug the KafkaJsonSchemaSerializer
into KafkaProducer
to send messages of JSON Schema type to Kafka.
Assuming you have a Java class that is decorated with Jackson annotations, such as the following:
public static class User {
@JsonProperty
public String firstName;
@JsonProperty
public String lastName;
@JsonProperty
public short age;
public User() {}
public User(String firstName, String lastName, short age) {
this(firstName, lastName, age, null);
}
}
You can serialize User objects as follows:
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
"org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
"io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer");
props.put("schema.registry.url", "http://127.0.0.1:8081");
Producer<String, User> producer = new KafkaProducer<String, User>(props);
String topic = "testjsonschema";
String key = "testkey";
User user = new User("John", "Doe", 33);
ProducerRecord<String, User> record
= new ProducerRecord<String, User>(topic, key, user);
producer.send(record).get();
producer.close();
The following additional configurations are available for JSON Schemas derived from Java objects:
json.schema.spec.version
Indicates the specification version to use for JSON schemas derived from objects. Valid values are one of the following strings:draft_4
,draft_6
,draft_7
, ordraft_2019_09
. The default isdraft_7
.json.oneof.for.nullables
Indicates whether JSON schemas derived from objects will useoneOf
for nullable fields. The default boolean value istrue
.json.write.dates.iso8601
Allows dates to be written as ISO-8601 strings. The default boolean value isfalse
.
JSON Schema Deserializer¶
Plug KafkaJsonSchemaDeserializer
into KafkaConsumer
to receive messages
of any JSON Schema type from Kafka.
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "io.confluent.kafka.serializers.json.KafkaJsonSchemaDeserializer");
props.put("schema.registry.url", "http://localhost:8081");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(KafkaJsonDeserializerConfig.JSON_VALUE_TYPE, User.class.getName());
String topic = "testjsonschema";
final Consumer<String, User> consumer = new KafkaConsumer<String, User>(props);
consumer.subscribe(Arrays.asList(topic));
try {
while (true) {
ConsumerRecords<String, User> records = consumer.poll(100);
for (ConsumerRecord<String, User> record : records) {
System.out.printf("offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value());
}
}
} finally {
consumer.close();
}
Similar to how the Avro deserializer can return an instance of a specific Avro
record type or a GenericRecord
, the JSON Schema deserializer can return an
instance of a specific Java class, or an instance of JsonNode
.
If the JSON Schema deserializer cannot determine a specific type, then a generic type is returned.
One way to return a specific type is to use an explicit property.
For the Json Schema deserializer, you can configure the property
KafkaJsonSchemaDeseriaizerConfig.JSON_VALUE_TYPE
or
KafkaJsonSchemaDeserializerConfig.JSON_KEY_TYPE
.
In order to allow the JSON Schema deserializer to work with topics with heterogeneous types,
you must provide additional information to the schema. Configure the deserializer with a value
for type.property
that indicates the name of a top-level property on the JSON schema that
specifies the fully-qualified Java type. For example, if type.property=javaType
, the JSON schema
could specify "javaType":"org.acme.MyRecord"
at the top level.
When deserializing a JSON payload, the KafkaJsonSchemaDeserializer
can behave in three ways:
If given a
<json.key.type>
or<json.value.type>
, the deserializer uses the specified type to perform deserialization.The previous configuration won’t work for
RecordNameStrategy
, where more than one type of JSON message might exist in a topic. To handle this case, the deserializer can be configured with<type.property>
with a value that indicates the name of a top-level property on the JSON Schema that specifies the fully-qualified Java type to be used for deserialization. For example, if<type.property>=javaType
, it is expected that the JSON schema will have an additional top-level property namedjavaType
that specifies the fully-qualified Java type. For example, when using the mbknor-jackson-jsonSchema utility to generate a JSON Schema from a Java POJO, one can use the annotation@SchemaInject
to specify the javaType:// Generate javaType property @JsonSchemaInject(strings = {@JsonSchemaString(path="javaType", value="com.acme.User")}) public static class User { @JsonProperty public String firstName; @JsonProperty public String lastName; @JsonProperty @Min(0) public short age; @JsonProperty public Optional<String> nickName; public User() {} ... }
Finally, if no type is provided or no type can be determined, the deserializer returns an instance of a Jackson
JsonNode
.
Here is a summary of specific and generic return types for each schema format.
Avro | Protobuf | JSON Schema | |
---|---|---|---|
Specific type | Generated class that extends org.apache.avro.SpecificRecord | Generated class that extends com.google.protobuf.Message | Java class (that is compatible with Jackson serialization) |
Generic type | org.apache.avro.GenericRecord | com.google.protobuf.DynamicMessage | com.fasterxml.jackson.databind.JsonNode |
Test Drive JSON Schema¶
To get started with JSON Schema, you can use the command line producer and consumer for JSON Schema.
Note
- Prerequisites to run these examples are generally the same as those described for the Schema Registry Tutorial with the exception of Maven, which is not needed here. Also, Confluent Platform version 5.5.0 or later is required here.
- The following examples use the the default Schema Registry URL value (
localhost:8081
). The examples show how to configure this inline by supplying the URL as an argument to the--property
flag in the command line arguments of the producer and consumer (--property schema.registry.url=<address of your schema registry>
). Alternatively, you could set this property in$CONFLUENT_HOME/etc/kafka/server.properties
, and not have to include it in the producer and consumer commands. For example:confluent.schema.registry.url=http://localhost:8081
These examples make use of the kafka-json-schema-console-producer
and kafka-json-schema-console-consumer
, which are located in $CONFLUENT_HOME/bin
.
The command line producer and consumer are useful for understanding how the built-in JSON Schema support works on Confluent Platform.
When you incorporate the serializer and deserializer into the code for your own producers and consumers, messages and associated schemas are processed the same way as they are on the console producers and consumers.
The suggested consumer commands include a flag to read --from-beginning
to
be sure you capture the messages even if you don’t run the consumer immediately
after running the producer. If you leave off the --from-beginning
flag, the
consumer will read only the last message produced during its current session.
Start Confluent Platform using the following command:
confluent local start
Tip
- Alternatively, you can simply run
confluent local schema-registry
which also startskafka
andzookeeper
as dependencies. This demo does not directly reference the other services, such as Connect and Control Center. That said, you may want to run the full stack anyway to further explore, for example, how the topics and messages display on Control Center. To learn more aboutconfluent local
, see Quick Start for Apache Kafka using Confluent Platform (Local) and confluent local in the Confluent CLI command reference. - The
confluent local
commands run in the background so you can re-use this command window. Separate sessions are required for the producer and consumer.
- Alternatively, you can simply run
Verify registered schema types.
Starting with Confluent Platform 5.5.0, Schema Registry now supports arbitrary schema types. You should verify which schema types are currently registered with Schema Registry.
To do so, type the following command (assuming you use the default URL and port for Schema Registry,
localhost:8081
):curl http://localhost:8081/schemas/types
The response will be one or more of the following. If additional schema format plugins are installed, these will also be available.
["JSON", "PROTOBUF", "AVRO"]
Alternatively, use the curl
--silent
flag, and pipe the command through jq (curl --silent http://localhost:8081/schemas/types | jq
) to get nicely formatted output:"JSON", "PROTOBUF", "AVRO"
Use the producer to send JSON Schema records in JSON as the message value.
The new topic,
t1-j
, will be created as a part of this producer command if it does not already exist.kafka-json-schema-console-producer --broker-list localhost:9092 --property schema.registry.url=http://localhost:8081 --topic t1-j \ --property value.schema='{"type":"object","properties":{"f1":{"type":"string"}}}'
Tip
The current JSON Schema specific producer does not show a
>
prompt, just a blank line at which to type producer messages.Type the following command in the shell, and press return.
{"f1": "value1-j"}
Keep this session of the producer running.
Use the command line JSON Schema consumer to read the value just produced to this topic.
kafka-json-schema-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic t1-j --property schema.registry.url=http://localhost:8081
You should see following in the console.
{"f1": "value1-j"}
Keep this session of the consumer running.
Use the producer to send another record as the message value, which includes a new property not explicitly declared in the schema.
JSON Schema has an open content model, which allows any number of additional properties to appear in a JSON document without being specified in the JSON schema. This is achieved with
additionalProperties
set totrue
, which is the default. If you do not explicitly disableadditionalProperties
(by setting it tofalse
), undeclared properties are allowed in records. These next few steps demonstrate this unique aspect of JSON Schema.Return to the producer session that is already running and send the following message, which includes a new property
"f2"
that is not declared in the schema with which we started this producer.{"f2": "value2-j"}
Use the command line JSON Schema consumer to read the value just produced.
Return to the consumer that is already running, and reading from topic
t1-j
. You should see following new message in the console.{"f2": "value2-j"}
The message with the new property (
f2
) is successfully produced and read. If you try this with the other schema formats (Avro, Protobuf), it will fail at the producer command because those specifications require that all properties be explicitly declared in the schemas.Keep this consumer running.
Start a producer and pass a JSON Schema with
additionalProperties
explicitly set tofalse
.Return to the producer command window, and stop the producer with Ctl+C.
Type the following in the shell, and press return. This is the same producer and topic (
t1-j
) used in the previous steps. The schema is almost the same as the previous one, but in this exampleadditionalProperties
is explicitly set to false, as a part of the schema.kafka-json-schema-console-producer --broker-list localhost:9092 --property schema.registry.url=http://localhost:8081 --topic t1-j \ --property value.schema='{"type":"object","properties":{"f1":{"type":"string"}}, "additionalProperties": false}'
Attempt to use the producer to send another record as the message value, which includes a new property not explicitly declared in the schema.
Enter the following at the prompt for the producer that you just started, and press return.
{"f2": "value3-j-this-will-break"}
With
additionalProperties
set tofalse
, this will fail upon attempt to send and crash the producer, with the following error message.org.apache.kafka.common.errors.SerializationException: Error registering JSON schema: {"type":"object","properties":{"f1":{"type":"string"}},"additionalProperties":false} Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409 ...
The consumer will continue running, but no new messages will be displayed.
This is the same behavior you would see by default if using Avro or Protobuf in this scenario.
Tip
If you want to incorporate this behavior into JSON Schema producer code for your applications, include
"additionalProperties": false
into the schemas. Examples of this are shown in the discussion about properties in Understanding JSON Schema.Rerun the producer in default mode as before and send a follow-on message with an undeclared property.
In the producer command window, stop the producer with Ctl+C.
Run the original producer command. There is no need to explicitly declare
additionalProperties
as true (although you could), as this is the default.kafka-json-schema-console-producer --broker-list localhost:9092 --property schema.registry.url=http://localhost:8081 --topic t1-j \ --property value.schema='{"type":"object","properties":{"f1":{"type":"string"}}}'
Use the producer to send another record as the message value, which includes a new property not explicitly declared in the schema.
{"f2": "value3-j-this-will-work-again"}
Return to the consumer session to read the new message.
The consumer should still be running and reading from topic
t1-j
. You will see following new message in the console.{"f2": "value3-j-this-will-work-again"}
More specifically, if you followed all steps in order and started the consumer with the
--from-beginning
flag as mentioned earlier, the consumer shows a history of all messages sent:{"f1":"value1-j"} {"f2":"value2-j"} {"f2":"value3-j-this-will-work-again"}
In another shell, use this curl command to query the schema that was registered with Schema Registry.
curl http://localhost:8081/subjects/t1-j-value/versions/1/schema
You should see the following output on the console:
{"type":"object","properties":{"f1":{"type":"string"}}}
For more readable output, optionally pipe that same command through jq (with
curl
download messages suppressed):curl --silent -X GET http://localhost:8081/subjects/t1-j-value/versions/1/schema | jq
Here is the expected output:
"type": "object", "properties": { "f1": { "type": "string" }
View the schema in more detail by running this command.
curl --silent -X GET http://localhost:8081/subjects/t1-j-value/versions/latest | jq
Here is the expected output of the above command:
"subject": "t1-j-value", "version": 1, "id": 1, "schemaType": "JSON", "schema": "{\"type\":\"object\",\"properties\":{\"f1\":{\"type\":\"string\"}}}"
Use Confluent Control Center to examine schemas and messages.
Messages that were successfully produced also show on Control Center (http://localhost:9021/) in Topics > <topicName> > Messages. You may have to select a partition or jump to a timestamp to see messages sent earlier. (For timestamp, type in a number, which will default to partition
1/Partition: 0
, and press return. To get the message view shown here, select the cards icon on the upper right.)Schemas you create are available on the Schemas tab for the selected topic.
Run shutdown and cleanup tasks.
- You can stop the consumer and producer with Ctl-C in their respective command windows.
- To stop Confluent Platform, type
confluent local stop
. - If you would like to clear out existing data (topics, schemas, and messages) before starting again with another test, type
confluent local destroy
.
Schema References in JSON Schemas¶
Confluent Platform provides full support for the notion of schema references, the ability of a schema to refer to other schemas.
Tip
Schema references are also supported in Confluent Cloud on Avro, Protobuf, and JSON Schema formats. On the Confluent Cloud CLI, you can use the --refs <file>
flag on ccloud schema-registry schema create to reference another schema.
For JSON Schema, the referenced schema is called by using the $ref
keyword, followed by a URL or address for the schema you want to refer to:
{ "$ref": "<URL path to referenced schema>" }
For examples of schema references, see Structuring a complex schema on the JSON Schema website, the example given below in Multiple Event Types in the Same Topic, and the associated blog post that goes into further detail on this.
Multiple Event Types in the Same Topic¶
In addition to providing a way for one schema to call other schemas, schema references can be used to efficiently combine multiple event types in the same topic and still maintain subject-topic constraints.
In JSON Schema, this is accomplished as follows:
Use the default subject naming strategy,
TopicNameStrategy
, which uses the topic name to determine the subject to be used for schema lookups, and helps to enforce subject-topic constraints.Use the JSON Schema construct
oneOf
to define a list of schema references, for example:{ "oneOf": [ { "$ref": "Customer.schema.json" }, { "$ref": "Product.schema.json" }, { "$ref": "Order.schema.json } ] }
When the schema is registered, send an array of reference versions. For example:
[ { "name": "Customer.schema.json", "subject": "customer", "version": 1 }, { "name": "Product.schema.json", "subject": "product", "version": 1 }, { "name": "Order.schema.json", "subject": "order", "version": 1 } ]
Configure the JSON Schema serializer to use your
oneOf
for serialization, and not the event type, by configuring the following properties in your producer application:auto.register.schemas=false use.latest.version=true
For example:
props.put(ProducerConfig.AUTO_REGISTER_SCHEMAS, "false"); props.put(ProducerConfig.USE_LATEST_VERSION, "true");
Tip
- Setting
auto.register.schemas
to false disables auto-registration of the event type, so that it does not override theoneOf
as the latest schema in the subject. Settinguse.latest.version
to true causes the JSON Schema serializer to look up the latest schema version in the subject (which will be theoneOf
) and use that for serialization. Otherwise, if set to false, the serializer will look for the event type in the subject and fail to find it. - See also, Auto Schema Registration in the Schema Registry Tutorial and Schema Registry Configuration Options for Kafka Connect.
- Setting
JSON Schema Compatibility Rules¶
The JSON Schema compatibility rules are loosely based on similar rules for Avro, however, the rules for backward compatibility are more complex.
Primitive Type Compatibility¶
JSON Schema has a more limited set of types than does Avro. However, the following rule from Avro also applies to JSON Schema:
- A writer’s schema of
integer
may be promoted to the reader’s schema ofnumber
.
There are a number of changes that can be made to a JSON primitive type schema that make the schema less restrictive, and thus allow a client with the new schema to read a JSON document written with the old schema. Here are some examples:
- For string types, the writer’s schema may have a
minLength
value that is greater than aminLength
value in the reader’s schema or not present in the reader’s schema; or amaxLength
value that is less than amaxLength
value in the reader’s schema or not present in the reader’s schema. - For string types, the writer’s schema may have a
pattern
value that is not present in the reader’s schema. - For number types, the writer’s schema may have a
minimum
value that is greater than aminimum
value in the reader’s schema or not present; or amaximum
value that is less than amaximum
value in the reader’s schema or not present in the reader’s schema. - For integer types, the writer’s schema may have a
multipleOf
value that is a multiple of themultipleOf
value in the reader’s schema; or that is not present in the reader’s schema.
Object Compatibility¶
For object schemas, JSON Schema supports open content models, closed content models and partially open content models.
- An open content model allows any number of additional properties to appear in a JSON document without being specified in the JSON schema.
This is achieved by specifying
additionalProperties
as true, which is the default. - A closed content model will cause an error to be signaled if a property appears in the JSON document that is not specified in the JSON schema.
This is achieved by specifying
additionalProperties
as false. - A partially open content model allows additional properties to appear in a JSON document without being named in the JSON schema, but the
additional properties are restricted to be of a particular type and/or have a particular name. This is achieved by specifying a schema for
additionalProperties
, or a value forpatternProperties
that maps regular expressions to schemas.
For example, a reader’s schema can add an additional property, say myProperty
, to those of the writer’s schema, but it can only be done in a
backward compatible manner if the writer’s schema has a closed content model. This is because if the writer’s schema has an open content model,
then the writer may have produced JSON documents with myProperty using a different type than the type expected for myProperty
in the reader’s schema.
With the notion of content models, you can adapt the Avro rules as follows:
- The ordering of fields may be different: fields are matched by name.
- Schemas for fields with the same name in both records are resolved recursively.
- If the writer’s schema contains a field with a name not present in the reader’s schema, then the reader’s schema must have an open content model or a partially open content model that captures the missing field.
- If the reader’s schema has a required field that contains a default value, and the writer’s schema has a closed content model and either does not have a field with the same name, or has an optional field with the same name, then the reader should use the default value from its field.
- If the reader’s schema has a required field with no default value, and the writer’s schema either does not have a field with the same name, or has an optional field with the same name, an error is signaled.
- If the reader’s schema has an optional field, and the writer’s schema has a closed content model and does not have a field with the same name, then the reader should ignore the field.
Here are some additional compatibility rules that are specific to JSON Schema:
- The writer’s schema may have a minProperties value that is greater than the minProperties value in the reader’s schema or that is not present in the reader’s schema; or a maxProperties value that is less than the maxProperties value in the reader’s schema or that is not present in the reader’s schema.
- The writer’s schema may have a required value that is a superset of the required value in the reader’s schema or that is not present in the reader’s schema.
- The writer’s schema may have a dependencies value that is a superset of the dependencies value in the reader’s schema or that is not present in the reader’s schema.
- The writer’s schema may have an additionalProperties value of false, whereas it can be true or a schema in the reader’s schema.
Enum Compatibility¶
The Avro rule for enums is directly applicable to JSON Schema.
- If the writer’s symbol is not present in the reader’s
enum
, then an error is signaled.
Array Compatibility¶
JSON Schema supports two types of validation for arrays: list validation, where the array elements are all of the same type, and tuple validation, where the array elements may have different types. case.
- This resolution algorithm is applied recursively to the reader’s and writer’s array item schemas.
Here are some additional compatibility rules that are specific to JSON Schema:
- The writer’s schema may have a minItems value that is greater than the
minItems
value in the reader’s schema or that is not present in the reader’s schema; or amaxItems
value that is less than themaxItems
value in the reader’s schema or that is not present in the reader’s schema. - The writer’s schema may have a
uniqueItems
value of true, whereas it can be false or not present in the reader’s schema.
Union Compatibility¶
Unions are implemented with the oneOf
keyword in JSON Schema. The rules from Avro can be adapted as follows.
- If the reader’s and writer’s schemas are both unions, then the writer’s schema must be a subset of the reader’s schema.
- If the reader’s schema is a union, but the writer’s is not, then the first schema in the reader’s union that matches the writer’s schema is recursively resolved against it. If none match, an error is signaled.
Suggested Reading¶
- Examples of Kafka client producers and consumers, with and without Avro, are documented at Code Examples.
- Schema Registry API Reference
- JSON Schema specification
- JSON Schema learning resources