Kafka Connect FromXML SMT Usage Reference for Confluent Cloud
The FromXML single message transform (SMT) reads XML data stored as bytes or a string and converts it to a strongly typed Kafka Connect
structure. For example, it allows data to be converted from XML and stored as Avro in a topic.
Note
The FromXML SMT is supported on the HTTP V2 Source, HTTP V2 Sink, and IBM MQ Source connectors.
To apply the FromXML SMT, add the following to your connector configuration:
{
"transforms" : "fromXml",
"transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
"transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd"
}
Examples
The example below shows how to use the FromXML SMT.
Before:
{ "topic" : "test", "kafkaPartition" : 1, "valueSchema" : { "type" : "STRING", "isOptional" : false }, "value" : "<?xml version=\"1.0\"?>\n<x:books xmlns:x=\"urn:books\">\n <book id=\"bk001\">\n <author>Writer</author>\n <title>The First Book</title>\n <genre>Fiction</genre>\n <price>44.95</price>\n <pub_date>2000-10-01</pub_date>\n <review>An amazing story of nothing.</review>\n </book>\n\n <book id=\"bk002\">\n <author>Poet</author>\n <title>The Poet's First Poem</title>\n <genre>Poem</genre>\n <price>24.95</price>\n <pub_date>2000-10-01</pub_date>\n <review>Least poetic poems.</review>\n </book>\n</x:books>", "timestampType" : "NO_TIMESTAMP_TYPE", "offset" : 1574310211719, "headers" : [ ] }
Adding the SMT to your connector configuration:
To apply the FromXML SMT, add the following to your connector configuration:
{
"transforms" : "fromXml",
"transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
"transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd"
}
After:
After the FromXML SMT applies, the value transforms as follows:
{ "topic" : "test", "kafkaPartition" : 1, "valueSchema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BooksForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "book" : { "type" : "ARRAY", "isOptional" : true, "valueSchema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "author" : { "type" : "STRING", "isOptional" : false }, "title" : { "type" : "STRING", "isOptional" : false }, "genre" : { "type" : "STRING", "isOptional" : false }, "price" : { "type" : "FLOAT32", "isOptional" : true }, "pub_date" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "review" : { "type" : "STRING", "isOptional" : false }, "id" : { "type" : "STRING", "isOptional" : true } } } } } }, "value" : { "schema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BooksForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "book" : { "type" : "ARRAY", "isOptional" : true, "valueSchema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "author" : { "type" : "STRING", "isOptional" : false }, "title" : { "type" : "STRING", "isOptional" : false }, "genre" : { "type" : "STRING", "isOptional" : false }, "price" : { "type" : "FLOAT32", "isOptional" : true }, "pub_date" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "review" : { "type" : "STRING", "isOptional" : false }, "id" : { "type" : "STRING", "isOptional" : true } } } } } }, "fieldValues" : [ { "name" : "book", "schema" : { "type" : "ARRAY", "isOptional" : true, "valueSchema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "author"
: { "type" : "STRING", "isOptional" : false }, "title" : { "type" : "STRING", "isOptional" : false }, "genre" : { "type" : "STRING", "isOptional" : false }, "price" : { "type" : "FLOAT32", "isOptional" : true }, "pub_date" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "review" : { "type" : "STRING", "isOptional" : false }, "id" : { "type" : "STRING", "isOptional" : true } } } }, "storage" : [ { "schema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "author" : { "type" : "STRING", "isOptional" : false }, "title" : { "type" : "STRING", "isOptional" : false }, "genre" : { "type" : "STRING", "isOptional" : false }, "price" : { "type" : "FLOAT32", "isOptional" : true }, "pub_date" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "review" : { "type" : "STRING", "isOptional" : false }, "id" : { "type" : "STRING", "isOptional" : true } } }, "fieldValues" : [ { "name" : "author", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "Writer" }, { "name" : "title", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "The First Book" }, { "name" : "genre", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "Fiction" }, { "name" : "price", "schema" : { "type" : "FLOAT32", "isOptional" : true }, "storage" : 44.95 }, { "name" : "pub_date", "schema" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "storage" : 970358400000 }, { "name" : "review", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "An amazing story of nothing." 
}, { "name" : "id", "schema" : { "type" : "STRING", "isOptional" : true }, "storage" : "bk001" } ] }, { "schema" : { "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm", "type" : "STRUCT", "isOptional" : true, "fieldSchemas" : { "author" : { "type" : "STRING", "isOptional" : false }, "title" : { "type" : "STRING", "isOptional" : false }, "genre" : { "type" : "STRING", "isOptional" : false }, "price" : { "type" : "FLOAT32", "isOptional" : true }, "pub_date" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "review" : { "type" : "STRING", "isOptional" : false }, "id" : { "type" : "STRING", "isOptional" : true } } }, "fieldValues" : [ { "name" : "author", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "Poet" }, { "name" : "title", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "The Poet's First Poem" }, { "name" : "genre", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "Poem" }, { "name" : "price", "schema" : { "type" : "FLOAT32", "isOptional" : true }, "storage" : 24.95 }, { "name" : "pub_date", "schema" : { "name" : "org.apache.kafka.connect.data.Date", "type" : "INT32", "version" : 1, "isOptional" : false }, "storage" : 970358400000 }, { "name" : "review", "schema" : { "type" : "STRING", "isOptional" : false }, "storage" : "Least poetic poems." }, { "name" : "id", "schema" : { "type" : "STRING", "isOptional" : true }, "storage" : "bk002" } ] } ] } ] }, "timestampType" : "NO_TIMESTAMP_TYPE", "offset" : 1574310211719, "headers" : [ ] }
Properties
| Name | Description | Type | Default | Valid Values | Importance |
|---|---|---|---|---|---|
| schema.path | A list of URLs that specify the location of the XML schemas the connector must load. Both HTTP and HTTPS paths are supported. | LIST | | | HIGH |
| package | The Java package that xjc uses for the generated source code. This name is applied to the resulting schema. | STRING | com.github.jcustenborder.kafka.connect.transform.xml.model | | HIGH |
| xjc.options.automatic.name.conflict.resolution.enabled | Boolean value that tells xjc whether to automatically create unique names for classes or properties when your XML schema definition causes a naming conflict. If enabled (true), xjc resolves the conflict without stopping. If disabled (false), code generation fails, requiring you to manually fix the conflict in the schema definition. | BOOLEAN | | [true, false] | LOW |
| xjc.options.strict.check.enabled | Specifies whether xjc performs strict validation of the XML schema definition. | BOOLEAN | true | [true, false] | LOW |
| xjc.options.verbose.enabled | If set to true, enables detailed logging of the xjc compilation process, providing verbose output in the connector logs. If set to false, only standard logging is provided. | BOOLEAN | | [true, false] | LOW |
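As an illustration of how these properties fit into a connector configuration, the following sketch sets each optional property explicitly. It assumes the properties take the same transforms.fromXml. prefix used in the configuration earlier on this page. The schema URL is a hypothetical placeholder, and the values shown are illustrative choices rather than recommendations.

```json
{
  "transforms" : "fromXml",
  "transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
  "transforms.fromXml.schema.path" : "https://example.com/schemas/books.xsd",
  "transforms.fromXml.package" : "com.github.jcustenborder.kafka.connect.transform.xml.model",
  "transforms.fromXml.xjc.options.automatic.name.conflict.resolution.enabled" : "true",
  "transforms.fromXml.xjc.options.strict.check.enabled" : "true",
  "transforms.fromXml.xjc.options.verbose.enabled" : "false"
}
```

Only schema.path has no default listed in the table, so a configuration would typically need to set at least that property.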
Predicates
Transformations can be configured with predicates so that the transformation is applied only to records which satisfy a condition. You can use predicates in a transformation chain and, when combined with the Kafka Connect Filter (Kafka) SMT Usage Reference for Confluent Cloud, predicates can conditionally filter out specific records. For details and examples, see Predicates.
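A minimal sketch of attaching a predicate to the FromXML SMT, assuming the standard Apache Kafka TopicNameMatches predicate is available to your connector. The predicate alias isXmlTopic and the topic pattern xml-.* are hypothetical names chosen for illustration.

```json
{
  "transforms" : "fromXml",
  "transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
  "transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd",
  "transforms.fromXml.predicate" : "isXmlTopic",
  "predicates" : "isXmlTopic",
  "predicates.isXmlTopic.type" : "org.apache.kafka.connect.transforms.predicates.TopicNameMatches",
  "predicates.isXmlTopic.pattern" : "xml-.*"
}
```

With a configuration along these lines, the transform would apply only to records whose topic name matches the pattern; records from other topics would pass through without XML conversion.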