Kafka Connect FromXML SMT Usage Reference for Confluent Cloud

The FromXML single message transform (SMT) reads XML data stored as bytes or as a string and converts it to a strongly typed Kafka Connect structure. For example, it allows data to be converted from XML and stored as Avro in a topic.

Note

The FromXML SMT is supported on the HTTP V2 Source, HTTP V2 Sink, and IBM MQ Source connectors.

To apply the FromXML SMT, add the following to your connector configuration:

{
  "transforms" : "fromXml",
  "transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
  "transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd"
}

Examples

The following example shows how to use the FromXML SMT.

  • Before:

    {
      "topic" : "test",
      "kafkaPartition" : 1,
      "valueSchema" : {
          "type" : "STRING",
          "isOptional" : false
      },
      "value" : "<?xml version=\"1.0\"?>\n<x:books xmlns:x=\"urn:books\">\n    <book id=\"bk001\">\n        <author>Writer</author>\n        <title>The First Book</title>\n        <genre>Fiction</genre>\n        <price>44.95</price>\n        <pub_date>2000-10-01</pub_date>\n        <review>An amazing story of nothing.</review>\n    </book>\n\n    <book id=\"bk002\">\n        <author>Poet</author>\n        <title>The Poet's First Poem</title>\n        <genre>Poem</genre>\n        <price>24.95</price>\n        <pub_date>2000-10-01</pub_date>\n        <review>Least poetic poems.</review>\n    </book>\n</x:books>",
      "timestampType" : "NO_TIMESTAMP_TYPE",
      "offset" : 1574310211719,
      "headers" : [ ]
    }
    
  • Adding the SMT to your connector configuration:

    To apply the FromXML SMT, add the following to your connector configuration:

    {
      "transforms" : "fromXml",
      "transforms.fromXml.type" : "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
      "transforms.fromXml.schema.path" : "file:src/test/resources/com/github/jcustenborder/kafka/connect/transform/xml/books.xsd"
    }
    
  • After:

    After the FromXML SMT applies, the value transforms as follows:

    {
      "topic" : "test",
      "kafkaPartition" : 1,
      "valueSchema" : {
        "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BooksForm",
        "type" : "STRUCT",
        "isOptional" : true,
        "fieldSchemas" : {
          "book" : {
            "type" : "ARRAY",
            "isOptional" : true,
            "valueSchema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            }
          }
        }
      },
      "value" : {
        "schema" : {
          "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BooksForm",
          "type" : "STRUCT",
          "isOptional" : true,
          "fieldSchemas" : {
            "book" : {
              "type" : "ARRAY",
              "isOptional" : true,
              "valueSchema" : {
                "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
                "type" : "STRUCT",
                "isOptional" : true,
                "fieldSchemas" : {
                  "author" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "title" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "genre" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "price" : {
                    "type" : "FLOAT32",
                    "isOptional" : true
                  },
                  "pub_date" : {
                    "name" : "org.apache.kafka.connect.data.Date",
                    "type" : "INT32",
                    "version" : 1,
                    "isOptional" : false
                  },
                  "review" : {
                    "type" : "STRING",
                    "isOptional" : false
                  },
                  "id" : {
                    "type" : "STRING",
                    "isOptional" : true
                  }
                }
              }
            }
          }
        },
        "fieldValues" : [ {
          "name" : "book",
          "schema" : {
            "type" : "ARRAY",
            "isOptional" : true,
            "valueSchema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            }
          },
          "storage" : [ {
            "schema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            },
            "fieldValues" : [ {
              "name" : "author",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Writer"
            }, {
              "name" : "title",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "The First Book"
            }, {
              "name" : "genre",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Fiction"
            }, {
              "name" : "price",
              "schema" : {
                "type" : "FLOAT32",
                "isOptional" : true
              },
              "storage" : 44.95
            }, {
              "name" : "pub_date",
              "schema" : {
                "name" : "org.apache.kafka.connect.data.Date",
                "type" : "INT32",
                "version" : 1,
                "isOptional" : false
              },
              "storage" : 970358400000
            }, {
              "name" : "review",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "An amazing story of nothing."
            }, {
              "name" : "id",
              "schema" : {
                "type" : "STRING",
                "isOptional" : true
              },
              "storage" : "bk001"
            } ]
          }, {
            "schema" : {
              "name" : "com.github.jcustenborder.kafka.connect.transform.xml.model.BookForm",
              "type" : "STRUCT",
              "isOptional" : true,
              "fieldSchemas" : {
                "author" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "title" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "genre" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "price" : {
                  "type" : "FLOAT32",
                  "isOptional" : true
                },
                "pub_date" : {
                  "name" : "org.apache.kafka.connect.data.Date",
                  "type" : "INT32",
                  "version" : 1,
                  "isOptional" : false
                },
                "review" : {
                  "type" : "STRING",
                  "isOptional" : false
                },
                "id" : {
                  "type" : "STRING",
                  "isOptional" : true
                }
              }
            },
            "fieldValues" : [ {
              "name" : "author",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Poet"
            }, {
              "name" : "title",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "The Poet's First Poem"
            }, {
              "name" : "genre",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Poem"
            }, {
              "name" : "price",
              "schema" : {
                "type" : "FLOAT32",
                "isOptional" : true
              },
              "storage" : 24.95
            }, {
              "name" : "pub_date",
              "schema" : {
                "name" : "org.apache.kafka.connect.data.Date",
                "type" : "INT32",
                "version" : 1,
                "isOptional" : false
              },
              "storage" : 970358400000
            }, {
              "name" : "review",
              "schema" : {
                "type" : "STRING",
                "isOptional" : false
              },
              "storage" : "Least poetic poems."
            }, {
              "name" : "id",
              "schema" : {
                "type" : "STRING",
                "isOptional" : true
              },
              "storage" : "bk002"
            } ]
          } ]
        } ]
      },
      "timestampType" : "NO_TIMESTAMP_TYPE",
      "offset" : 1574310211719,
      "headers" : [ ]
    }
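One detail in the output above is easy to misread: pub_date is declared with the Connect Date logical type (INT32), yet its printed storage value is 970358400000. The following Python sketch assumes that this number is the epoch-millisecond rendering of the date held in memory, and relies on the fact that a Connect Date is serialized as the number of days since the Unix epoch. The function names are illustrative and are not part of the SMT.

```python
from datetime import datetime, timezone

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def millis_to_utc_date(epoch_millis):
    """Interpret a value as milliseconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc).date()


def millis_to_epoch_days(epoch_millis):
    """Convert epoch milliseconds to whole days since the Unix epoch."""
    return epoch_millis // MILLIS_PER_DAY


storage = 970358400000  # "storage" value shown for pub_date in the example
print(millis_to_utc_date(storage).isoformat())  # 2000-10-01
print(millis_to_epoch_days(storage))            # 11231
```

Under these assumptions the value matches the <pub_date>2000-10-01</pub_date> element in the input, and 11231 is the INT32 day count that a Date field would carry once serialized.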
    

Properties

Name | Description | Type | Default | Valid Values | Importance
schema.path | A list of URLs that specify the location of the XML schemas the connector must load. Both HTTP and HTTPS paths are supported. | LIST | | | HIGH
package | The Java package that xjc uses for the generated source code. This name is applied to the resulting schema. | STRING | com.github.jcustenborder.kafka.connect.transform.xml.model | | HIGH
xjc.options.automatic.name.conflict.resolution.enabled | Specifies whether xjc automatically creates unique names for classes or properties when your XML schema definition causes a naming conflict. If enabled (true), xjc resolves the conflict without stopping. If disabled (false), code generation fails and you must fix the conflict manually in the schema definition. | BOOLEAN | | [true, false] | LOW
xjc.options.strict.check.enabled | Specifies whether xjc performs strict validation of the XML schema definition. | BOOLEAN | true | [true, false] | LOW
xjc.options.verbose.enabled | If set to true, enables detailed logging of the xjc compilation process, providing verbose output in the connector logs. If set to false, only standard logging is provided. | BOOLEAN | | [true, false] | LOW
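As a rough illustration of how these properties combine with the transforms.fromXml. prefix used earlier, the Python sketch below assembles a connector configuration fragment and prints it as JSON. The helper name, the example.com schema URL, and the com.example.books.model package are hypothetical placeholders. The only convention assumed beyond the table is that LIST-typed Connect properties are written as a single comma-separated string.

```python
import json

FROM_XML_TYPE = "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value"


def from_xml_transform_config(schema_paths, package=None, verbose=None, alias="fromXml"):
    """Build the SMT portion of a connector config (hypothetical helper)."""
    prefix = f"transforms.{alias}."
    config = {
        "transforms": alias,
        prefix + "type": FROM_XML_TYPE,
        # LIST-typed properties are passed as one comma-separated string.
        prefix + "schema.path": ",".join(schema_paths),
    }
    if package is not None:
        config[prefix + "package"] = package
    if verbose is not None:
        # BOOLEAN properties are written as the strings "true" or "false".
        config[prefix + "xjc.options.verbose.enabled"] = str(bool(verbose)).lower()
    return config


config = from_xml_transform_config(
    ["https://example.com/schemas/books.xsd"],  # hypothetical schema URL
    package="com.example.books.model",          # hypothetical package name
    verbose=True,
)
print(json.dumps(config, indent=2))
```

Properties left unset are omitted from the fragment, so the defaults listed in the table apply.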

Predicates

Transformations can be configured with predicates so that the transformation is applied only to records that satisfy a condition. You can use predicates in a transformation chain and, when combined with the Filter (Kafka) SMT, predicates can conditionally filter out specific records. For details and examples, see Predicates.
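To make the idea concrete, the Python sketch below attaches a topic-name predicate to the FromXML transform so that it would apply only to matching topics. It uses the TopicNameMatches predicate class and the predicates.* and transforms.<alias>.predicate keys from Apache Kafka Connect. The helper name, the predicate alias isXmlTopic, and the pattern xml-.* are hypothetical, and whether a given Confluent Cloud connector accepts this configuration should be confirmed against the Predicates page.

```python
import json

TOPIC_NAME_MATCHES = "org.apache.kafka.connect.transforms.predicates.TopicNameMatches"


def with_topic_predicate(transform_config, alias, predicate_name, topic_pattern, negate=False):
    """Return a copy of the config with a topic-name predicate on one transform."""
    config = dict(transform_config)
    config["predicates"] = predicate_name
    config[f"predicates.{predicate_name}.type"] = TOPIC_NAME_MATCHES
    config[f"predicates.{predicate_name}.pattern"] = topic_pattern
    # Bind the predicate to the transform; negate inverts the condition.
    config[f"transforms.{alias}.predicate"] = predicate_name
    if negate:
        config[f"transforms.{alias}.negate"] = "true"
    return config


base = {
    "transforms": "fromXml",
    "transforms.fromXml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
}
# "isXmlTopic" and "xml-.*" are hypothetical names chosen for illustration.
guarded = with_topic_predicate(base, "fromXml", "isXmlTopic", "xml-.*")
print(json.dumps(guarded, indent=2))
```

Passing negate=True would instead apply the transform to every record whose topic does not match the pattern.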