Kudu Connector (Source and Sink) for Confluent Platform

You can use the Kafka Connect Kudu source connector to import data from Kudu, a columnar relational database, into Apache Kafka® topics by using the Impala JDBC driver. You can use the Kudu sink connector to export data from Kafka topics to Kudu, also through the Impala JDBC driver.

Install the Kudu Connector

You can install this connector by using the Confluent Hub client (recommended) or you can manually download the ZIP file.

If you are running a multi-node Connect cluster, the Kudu connector and Impala JDBC driver JARs must be installed on every Connect worker in the cluster. See below for details.

Install the connector using Confluent Hub

Prerequisite
The Confluent Hub Client must be installed. It is installed by default with Confluent Enterprise.

Navigate to your Confluent Platform installation directory and run the following command to install the latest connector version. The connector must be installed on every machine where Connect runs.

confluent-hub install confluentinc/kafka-connect-kudu:latest

You can install a specific version by replacing latest with a version number. For example:

confluent-hub install confluentinc/kafka-connect-kudu:1.0.0-preview

Install the connector manually

Download and extract the ZIP file for your connector and then follow the manual connector installation instructions.

License

You can use this connector for a 30-day trial period without a license key.

After 30 days, this connector is available under a Confluent enterprise license. Confluent issues enterprise license keys to subscribers, along with providing enterprise-level support for Confluent Platform and your connectors. If you are a subscriber, please contact Confluent Support at support@confluent.io for more information.

See Confluent Platform license for license properties and License topic configuration for information about the license topic. License requirements are the same for both the sink and source connector.

Install the Impala JDBC Driver

The Kudu source and sink connectors use the Java Database Connectivity (JDBC) API. For this to work, the connectors must query Kudu through Impala, which requires the Impala JDBC driver to be installed.

The basic steps of installation are:

  1. Download the Impala JDBC Connector and unzip it to get the JAR files.
  2. Place these JAR files into the share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib directory in your Confluent Platform installation on each of the Connect worker nodes.
  3. Restart all of the Connect worker nodes.
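Step 2 above can be sketched as a small script to run on each Connect worker. This is a minimal sketch, not an official installer: the function name install_driver_jars and both of its arguments are hypothetical, and it assumes the driver archive has already been downloaded and extracted. Only the relative lib path comes from the steps above.

```python
import shutil
from pathlib import Path

# Relative plugin path taken from step 2; it is resolved against the
# Confluent Platform installation directory supplied by the caller.
PLUGIN_LIB = Path("share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib")


def install_driver_jars(extracted_dir, confluent_home):
    """Copy every JAR found under extracted_dir into the Kudu connector lib directory.

    Both arguments are assumptions of this sketch: extracted_dir is wherever
    the Impala JDBC archive was unzipped, confluent_home is the Confluent
    Platform installation directory on this worker.
    """
    target = Path(confluent_home) / PLUGIN_LIB
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in sorted(Path(extracted_dir).rglob("*.jar")):
        shutil.copy2(jar, target / jar.name)
        copied.append(jar.name)
    return copied
```

The function returns the names of the copied files, so an empty list is a quick signal that the extracted directory contained no JAR files. The Connect workers still need to be restarted afterwards, as described in step 3.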

General Guidelines

The following are additional guidelines to consider:

  • Use the most recent version of the Impala JDBC driver available.
  • Use the correct JAR file for the Java version used to run Connect workers. If you install the Impala JDBC driver JAR file built for the wrong version of Java, starting any Kudu source or sink connector will likely fail with UnsupportedClassVersionError. If this happens, remove the Impala JDBC driver JAR file you installed and repeat the driver installation process with the correct JAR file.
  • The share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib directory mentioned above is for Confluent Platform. If you are using a different installation, find the location where the Confluent Kudu source and sink connector JAR files are located, and place the Impala JDBC driver JAR file(s) in the same directory.
  • If the Impala JDBC driver is not installed correctly, the Kudu source or sink connector will fail on startup. Typically, the system throws the error No suitable driver found. If this happens, install the Impala JDBC driver again.
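To check a driver JAR against the Java version before restarting the workers, you can inspect the class-file headers inside it. The sketch below is illustrative only; the helper names are hypothetical. It relies on the standard class-file layout (a 4-byte 0xCAFEBABE magic number followed by 2-byte minor and major versions) and on the convention that major version 52 corresponds to Java 8, 55 to Java 11, and so on. Entries under META-INF/versions/ and module-info.class are skipped, on the assumption that they may legitimately target a newer release than the rest of the JAR.

```python
import struct
import zipfile


def max_class_major_version(jar_path):
    """Return the highest class-file major version in the JAR, or None if no classes are found."""
    highest = None
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if not name.endswith(".class"):
                continue
            # Skip multi-release and module descriptor entries (see lead-in).
            if name.startswith("META-INF/versions/") or name.endswith("module-info.class"):
                continue
            header = jar.read(name)[:8]
            if len(header) < 8:
                continue
            # Big-endian: u4 magic, u2 minor_version, u2 major_version.
            magic, _minor, major = struct.unpack(">IHH", header)
            if magic != 0xCAFEBABE:
                continue
            highest = major if highest is None else max(highest, major)
    return highest


def required_java_release(major_version):
    """Translate a class-file major version into the minimum Java release (52 -> 8)."""
    return major_version - 44
```

If required_java_release reports a release newer than the Java version running the Connect workers, that JAR is a likely cause of UnsupportedClassVersionError.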

Limitations

  • Kudu does not support DATE and TIME types. Connect Date, Time, and Timestamp types are all mapped to the Impala TIMESTAMP type, which corresponds to the Kudu unixtime_micros type.
  • Impala does not support the BINARY type, so the connectors do not accept binary data either.
  • Complex data types like Array, Map and Struct are not supported.
  • For the Decimal type, both Impala and Kudu allow a maximum precision of 38, and the connectors observe the same cap.
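The limitations above can be expressed as a pre-flight check on a record schema. The following is a hypothetical sketch, not the connector's own validation logic: the review_fields function, the type labels, and the shape of the input mapping are all assumptions made for illustration. Only the rules themselves (temporal types map to TIMESTAMP, binary and complex types are rejected, Decimal precision is capped at 38) come from the list above.

```python
MAX_DECIMAL_PRECISION = 38

# Type labels are illustrative stand-ins for Connect schema types.
TEMPORAL_TYPES = {"Date", "Time", "Timestamp"}
UNSUPPORTED_TYPES = {"BYTES", "ARRAY", "MAP", "STRUCT"}


def review_fields(fields):
    """Return a note for each field that is affected by the listed limitations.

    fields maps a field name to a (type_label, decimal_precision) pair,
    where decimal_precision is None for non-decimal fields.
    """
    notes = {}
    for name, (type_label, precision) in fields.items():
        if type_label in TEMPORAL_TYPES:
            notes[name] = "mapped to Impala TIMESTAMP"
        elif type_label in UNSUPPORTED_TYPES:
            notes[name] = "unsupported type"
        elif type_label == "Decimal" and precision is not None and precision > MAX_DECIMAL_PRECISION:
            notes[name] = "precision exceeds 38"
    return notes
```

Fields that do not appear in the returned mapping are unaffected by the limitations listed here.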