Kudu Connector (Source and Sink) for Confluent Platform¶
You can use the Kafka Connect Kudu source connector to import data from Kudu, a columnar relational database, into Apache Kafka® topics using the Impala JDBC driver. You can use the Kudu sink connector to export data from Kafka topics to Kudu, also through the Impala JDBC driver.
Install the Kudu Connector¶
You can install this connector by using the Confluent Hub client (recommended) or you can manually download the ZIP file.
If you are running a multi-node Connect cluster, the Kudu connector and Impala JDBC driver JARs must be installed on every Connect worker in the cluster. See below for details.
Install the connector using Confluent Hub¶
- Prerequisite: the Confluent Hub Client must be installed. It is installed by default with Confluent Enterprise.
Navigate to your Confluent Platform installation directory and run the following command to install the latest connector version. The connector must be installed on every machine where Connect will run.
confluent-hub install confluentinc/kafka-connect-kudu:latest
You can install a specific version by replacing latest with a version number. For example:
confluent-hub install confluentinc/kafka-connect-kudu:1.0.0-preview
You can use this connector for a 30-day trial period without a license key.
After 30 days, this connector is available under a Confluent enterprise license. Confluent issues enterprise license keys to subscribers, along with providing enterprise-level support for Confluent Platform and your connectors. If you are a subscriber, please contact Confluent Support at firstname.lastname@example.org for more information.
See Confluent Platform license for license properties and License topic configuration for information about the license topic. License requirements are the same for both the sink and source connectors.
Installing Impala JDBC Driver¶
The Kudu source and sink connectors use the Java Database Connectivity (JDBC) API. For this to work, the connectors must use Impala to query the Kudu database, and the Impala JDBC driver must be installed.
The basic steps of installation are:
- Download the Impala JDBC Connector and unzip it to get the JAR files.
- Place these JAR files into the share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib directory in your Confluent Platform installation on each of the Connect worker nodes.
- Restart all of the Connect worker nodes.
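The copy step above can be sketched as a small shell helper to run on each worker. This is only an illustration: the function name install_driver_jars and the paths in the usage comment are hypothetical and not part of the connector or the Confluent tooling.

```shell
#!/bin/sh
# Hypothetical helper: copy every JAR from an unpacked Impala JDBC driver
# directory into the Kudu connector's lib directory on this worker.
install_driver_jars() {
  src_dir=$1   # directory holding the unzipped driver JAR files
  lib_dir=$2   # the connector's lib directory

  if [ ! -d "$lib_dir" ]; then
    echo "missing lib directory: $lib_dir" >&2
    return 1
  fi

  found=0
  for jar in "$src_dir"/*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    cp "$jar" "$lib_dir"/ || return 1
    found=1
  done

  if [ "$found" -ne 1 ]; then
    echo "no JAR files found in $src_dir" >&2
    return 1
  fi
}

# Example call (both paths are placeholders for your own installation):
# install_driver_jars /tmp/impala-jdbc \
#   /path/to/confluent/share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib
```

The helper fails when the source directory contains no JAR files, so an empty or mistyped download path is caught before the workers are restarted.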
The following are additional guidelines to consider:
- Use the most recent version of the Impala JDBC driver available.
- Use the correct JAR file for the Java version used to run the Connect workers. If you install the Impala JDBC driver JAR file for the wrong version of Java, starting any Kudu source or sink connector will likely fail with UnsupportedClassVersionError. If this happens, remove the Impala JDBC driver JAR file you installed and repeat the driver installation process with the correct JAR file.
- The share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib directory mentioned above is for Confluent Platform. If you are using a different installation, find the directory that contains the Confluent Kudu source and sink connector JAR files, and place the Impala JDBC driver JAR files in that same directory.
- If the Impala JDBC driver is not installed correctly, the Kudu source or sink connector will fail on startup. Typically, the system throws the error No suitable driver found. If this happens, install the Impala JDBC driver again.
- Kudu does not support Timestamp types all will be mapped to Impala TIMESTAMP type, which corresponds to Kudu
- Impala does not support the BINARY type, so the connectors do not accept binary data either.
- Complex data types like Struct are not supported.
- For the Decimal type, both Impala and Kudu allow a precision of at most 38, and the connectors observe this cap.
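To make the precision cap concrete, the sketch below counts the digits a plain decimal literal needs and compares that count with 38. It is a pre-check you might run on sample values, not connector behavior: the function names decimal_precision and fits_decimal_cap are hypothetical, and the input is assumed to be a simple literal such as -123.45 with no exponent.

```shell
#!/bin/sh
# Hypothetical helper: print the precision (total digit count) needed to
# hold a plain decimal literal as written. The input is not validated.
decimal_precision() {
  value=${1#[-+]}              # drop an optional leading sign
  int_part=${value%%.*}        # digits before the decimal point
  case $value in
    *.*) frac_part=${value#*.} ;;   # digits after the decimal point
    *)   frac_part= ;;
  esac
  # Leading zeros in the integer part do not count toward precision.
  while [ "${int_part#0}" != "$int_part" ]; do
    int_part=${int_part#0}
  done
  echo $(( ${#int_part} + ${#frac_part} ))
}

# Succeeds when the literal fits within the 38-digit cap.
fits_decimal_cap() {
  [ "$(decimal_precision "$1")" -le 38 ]
}

# Example:
# fits_decimal_cap 123.45 && echo "within the cap"
```

Trailing zeros after the decimal point are counted, because they contribute to the scale of the value as written.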