Kafka Connect FromXML SMT Usage Reference for Confluent Cloud

The FromXML single message transform (SMT) reads XML data stored as bytes or a string and converts it to a strongly typed Kafka Connect structure. For example, it lets you convert XML data and store it as Avro in a topic.

Note

The FromXML SMT is supported on the HTTP V2 Source, HTTP V2 Sink, and IBM MQ Source connectors.

To apply the FromXML SMT, add the following to your connector configuration:

{
  "transforms" : "fromXml",
  "transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
  "transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd"
}

Examples

The following example shows how to use the FromXML SMT.

  • Before:

    {
      "topic" : "test",
      "kafkaPartition" : 1,
      "valueSchema" : {
          "type" : "STRING",
          "isOptional" : false
      },
      "value" : "<?xml version=\"1.0\"?>\n<x:books xmlns:x=\"urn:books\">\n    <book id=\"bk001\">\n        <author>Writer</author>\n        <title>The First Book</title>\n        <genre>Fiction</genre>\n        <price>44.95</price>\n        <pub_date>2000-10-01</pub_date>\n        <review>An amazing story of nothing.</review>\n    </book>\n\n    <book id=\"bk002\">\n        <author>Poet</author>\n        <title>The Poet's First Poem</title>\n        <genre>Poem</genre>\n        <price>24.95</price>\n        <pub_date>2000-10-01</pub_date>\n        <review>Least poetic poems.</review>\n    </book>\n</x:books>",
      "timestampType" : "NO_TIMESTAMP_TYPE",
      "offset" : 1574310211719,
      "headers" : [ ]
    }
    
  • Adding the SMT to your connector configuration:

    To apply the FromXML SMT, add the following to your connector configuration:

    {
      "transforms" : "fromXml",
      "transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
      "transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd"
    }
    
  • After:

    After the FromXML SMT is applied, the record looks as follows:

    {
      "topic" : "test",
      "kafkaPartition" : 1,
      "valueSchema" : {
        "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BooksForm",
        "type" : "STRUCT",
        "isOptional" : true,
        "fieldSchemas" : {
          "book" : {
            "type" : "ARRAY",
            "isOptional" : true,
            "valueSchema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            }
          }
        }
      },
      "value" : {
        "schema" : {
          "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BooksForm",
          "type" : "STRUCT",
          "isOptional" : true,
          "fieldSchemas" : {
            "book" : {
              "type" : "ARRAY",
              "isOptional" : true,
              "valueSchema" : {
                "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
                "type" : "STRUCT",
                "isOptional" : true,
                "fieldSchemas" : {
                  "author" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "title" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "genre" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "price" : {
                    "type" : "FLOAT32",
                    "isOptional" : true
                  },
                  "pub_date" : {
                    "name" : "org.apache.kafka.connect.data.Date",
                    "type" : "INT32",
                    "version" : 1,
                    "isOptional" : false
                  },
                  "review" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "id" : {
                    "type" : "STRING",
                    "isOptional" : true
                  }
                }
              }
            }
          }
        },
        "fieldValues" : [ {
          "name" : "book",
          "schema" : {
            "type" : "ARRAY",
            "isOptional" : true,
            "valueSchema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            }
          },
          "storage" : [ {
            "schema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            },
            "fieldValues" : [ {
              "name" : "author",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Writer"
            }, {
              "name" : "title",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "The First Book"
            }, {
              "name" : "genre",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Fiction"
            }, {
              "name" : "price",
              "schema" : {
                "type" : "FLOAT32",
                "isOptional" : true
              },
              "storage" : 44.95
            }, {
              "name" : "pub_date",
              "schema" : {
                "name" : "org.apache.kafka.connect.data.Date",
                "type" : "INT32",
                "version" : 1,
                "isOptional" : false
              },
              "storage" : 970358400000
            }, {
              "name" : "review",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "An amazing story of nothing."
            }, {
              "name" : "id",
              "schema" : {
                "type" : "STRING",
                "isOptional" : true
              },
              "storage" : "bk001"
            } ]
          }, {
            "schema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            },
            "fieldValues" : [ {
              "name" : "author",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Poet"
            }, {
              "name" : "title",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "The Poet's First Poem"
            }, {
              "name" : "genre",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Poem"
            }, {
              "name" : "price",
              "schema" : {
                "type" : "FLOAT32",
                "isOptional" : true
              },
              "storage" : 24.95
            }, {
              "name" : "pub_date",
              "schema" : {
                "name" : "org.apache.kafka.connect.data.Date",
                "type" : "INT32",
                "version" : 1,
                "isOptional" : false
              },
              "storage" : 970358400000
            }, {
              "name" : "review",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Least poetic poems."
            }, {
              "name" : "id",
              "schema" : {
                "type" : "STRING",
                "isOptional" : true
              },
              "storage" : "bk002"
            } ]
          } ]
        } ]
      },
      "timestampType" : "NO_TIMESTAMP_TYPE",
      "offset" : 1574310211719,
      "headers" : [ ]
    }
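
To make the Before and After records easier to compare, the following is a minimal Python sketch that turns the sample XML into typed values. It is only an illustration, not how the SMT works internally: the SMT derives its types from the XSD through xjc, while this sketch hard-codes the field types seen in the sample output. The names `parse_books` and `date_to_epoch_millis` are hypothetical helpers introduced here. The sketch also shows that the `pub_date` storage value 970358400000 in the sample output equals 2000-10-01T00:00:00Z in epoch milliseconds.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Same XML payload as the "Before" record value.
XML_VALUE = """<?xml version="1.0"?>
<x:books xmlns:x="urn:books">
    <book id="bk001">
        <author>Writer</author>
        <title>The First Book</title>
        <genre>Fiction</genre>
        <price>44.95</price>
        <pub_date>2000-10-01</pub_date>
        <review>An amazing story of nothing.</review>
    </book>
    <book id="bk002">
        <author>Poet</author>
        <title>The Poet's First Poem</title>
        <genre>Poem</genre>
        <price>24.95</price>
        <pub_date>2000-10-01</pub_date>
        <review>Least poetic poems.</review>
    </book>
</x:books>"""


def date_to_epoch_millis(text):
    """Hypothetical helper: convert YYYY-MM-DD to epoch milliseconds at midnight UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def parse_books(xml_text):
    """Hypothetical helper: hand-written stand-in for the XSD-driven conversion."""
    root = ET.fromstring(xml_text)
    books = []
    # The <book> elements are not namespace-qualified, only the root is.
    for book in root.findall("book"):
        books.append({
            "author": book.findtext("author"),
            "title": book.findtext("title"),
            "genre": book.findtext("genre"),
            "price": float(book.findtext("price")),
            "pub_date": date_to_epoch_millis(book.findtext("pub_date")),
            "review": book.findtext("review"),
            "id": book.get("id"),
        })
    return {"book": books}


record_value = parse_books(XML_VALUE)
for entry in record_value["book"]:
    print(entry["id"], entry["title"], entry["price"], entry["pub_date"])
```

Each dictionary corresponds to one element of the `book` array in the After record, with `price` as a number and `pub_date` as an integer instead of text.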
    

Properties

Name: schema.path
Description: A list of URLs that specify the location of the XML schemas the connector must load. Both HTTP and HTTPS paths are supported.
Type: LIST
Importance: HIGH

Name: package
Description: The Java package that xjc uses for the generated source code. This name is applied to the resulting schema.
Type: STRING
Default: com.github.jcustenborder.kafka.connect.transform.xml.model
Importance: HIGH

Name: xjc.options.automatic.name.conflict.resolution.enabled
Description: Boolean value that tells xjc whether to automatically create unique names for classes or properties when your XML schema definition causes a naming conflict. If enabled (true), xjc resolves the conflict without stopping. If disabled (false), code generation fails, and you must manually fix the conflict in the schema definition.
Type: BOOLEAN
Valid Values: [true, false]
Importance: LOW

Name: xjc.options.strict.check.enabled
Description: Specifies whether xjc performs strict validation of the XML schema definition.
Type: BOOLEAN
Default: true
Valid Values: [true, false]
Importance: LOW

Name: xjc.options.verbose.enabled
Description: If set to true, enables detailed logging of the xjc compilation process, providing verbose output in the connector logs. If set to false, only standard logging is provided.
Type: BOOLEAN
Valid Values: [true, false]
Importance: LOW
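
As a rough illustration of how these properties fit into a connector configuration, the Python sketch below assembles the `transforms.fromXml.*` keys and prints them as JSON. The function `build_from_xml_config` and the `https://example.com/...` schema URL are hypothetical and exist only for this example. The sketch assumes that a LIST property is written as a comma-separated string and that boolean values are passed as the strings "true" or "false", which is the usual Kafka Connect convention.

```python
import json

PREFIX = "transforms.fromXml."


def build_from_xml_config(schema_urls, package=None, strict_check=None, verbose=None):
    """Hypothetical helper: build FromXML SMT properties, skipping unset options."""
    config = {
        "transforms": "fromXml",
        PREFIX + "type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
        # Assumption: LIST values are given as one comma-separated string.
        PREFIX + "schema.path": ",".join(schema_urls),
    }
    optional = {
        "package": package,
        "xjc.options.strict.check.enabled": strict_check,
        "xjc.options.verbose.enabled": verbose,
    }
    for name, value in optional.items():
        if value is None:
            continue  # leave the property out so its default applies
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true", False -> "false"
        config[PREFIX + name] = value
    return config


# Hypothetical schema location, used only for illustration.
config = build_from_xml_config(
    ["https://example.com/schemas/books.xsd"],
    verbose=True,
)
print(json.dumps(config, indent=2))
```

Properties that are not passed in are omitted from the output, so the connector falls back to whatever default applies.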

Predicates

Transformations can be configured with predicates so that the transformation is applied only to records that satisfy a condition. You can use predicates in a transformation chain and, when combined with the Filter (Kafka) SMT, predicates can conditionally filter out specific records. For details and examples, see Predicates.
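
The sketch below shows one way a predicate could be attached to the FromXML SMT, assuming the predicate properties follow the standard Apache Kafka Connect layout (`predicates`, `predicates.<alias>.type`, and `transforms.<alias>.predicate`). The alias `isXmlTopic`, the topic pattern `xml-.*`, and the schema URL are hypothetical. The `predicate_matches` helper only approximates the topic-name check with Python's `re.fullmatch`; the real predicate uses Java regular expressions, which can differ in details.

```python
import json
import re

config = {
    "transforms": "fromXml",
    "transforms.fromXml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
    # Hypothetical schema location, used only for illustration.
    "transforms.fromXml.schema.path": "https://example.com/schemas/books.xsd",
    # Apply the transform only when the named predicate is true.
    "transforms.fromXml.predicate": "isXmlTopic",
    "predicates": "isXmlTopic",
    "predicates.isXmlTopic.type": "org.apache.kafka.connect.transforms.predicates.TopicNameMatches",
    "predicates.isXmlTopic.pattern": "xml-.*",
}


def predicate_matches(topic):
    """Approximate the topic-name check: the whole topic name must match the pattern."""
    pattern = config["predicates.isXmlTopic.pattern"]
    return re.fullmatch(pattern, topic) is not None


print(json.dumps(config, indent=2))
for topic in ("xml-books", "json-books"):
    print(topic, predicate_matches(topic))
```

With this configuration, records on topics whose names match the pattern pass through FromXML, while records on other topics skip the transform and continue unchanged.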