Amazon Kinesis Source Connector for Confluent Platform

Note

If you are using Confluent Cloud, see https://docs.confluent.io/cloud/current/connectors/cc-kinesis-source.html for the cloud Quick Start.

The Kafka Connect Kinesis Source Connector is used to pull data from Amazon Kinesis and persist the data to an Apache Kafka® topic.

Install the Kinesis Connector

You can install this connector by using the Confluent Hub client (recommended) or you can manually download the ZIP file.

Install the connector using Confluent Hub

Prerequisite
Confluent Hub Client must be installed. This is installed by default with Confluent Enterprise.

Navigate to your Confluent Platform installation directory and run the following command to install the latest connector version. The connector must be installed on every machine where Connect will run.

confluent-hub install confluentinc/kafka-connect-kinesis:latest

You can install a specific version by replacing latest with a version number. For example:

confluent-hub install confluentinc/kafka-connect-kinesis:1.1.4

Install the connector manually

Download and extract the ZIP file for your connector and then follow the manual connector installation instructions.

License

You can use this connector for a 30-day trial period without a license key.

After 30 days, this connector is available under a Confluent enterprise license. Confluent issues enterprise license keys to subscribers, along with providing enterprise-level support for Confluent Platform and your connectors. If you are a subscriber, please contact Confluent Support at support@confluent.io for more information.

See Confluent Platform license for license properties and License topic configuration for information about the license topic.

Usage Notes

The default credentials provider is DefaultAWSCredentialsProviderChain. For more information, see the AWS documentation.
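
If you want to make the provider explicit, you can set it in the connector configuration. A minimal sketch, spelling out the default rather than relying on it implicitly:

kinesis.credentials.provider.class=com.amazonaws.auth.DefaultAWSCredentialsProviderChain

See the AWS Credentials section below for the full list of credential sources this chain checks.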

Examples

Streaming ETL Demo

To evaluate the Kafka Connect Kinesis source connector, AWS S3 sink connector, Azure Blob sink connector, and GCP GCS sink connector in an end-to-end streaming deployment, refer to the Cloud ETL demo on GitHub. This demo also allows you to evaluate the real-time data processing capabilities of Confluent KSQL.

Property-based example

This configuration is typically used with standalone workers.

 name=KinesisSourceConnector1
 connector.class=io.confluent.connect.kinesis.KinesisSourceConnector
 tasks.max=1
 aws.access.key.id=< Optional Configuration >
 aws.secret.key.id=< Optional Configuration >
 kafka.topic=< Required Configuration >
 kinesis.stream=< Required Configuration >
 kinesis.region=< Optional Configuration - defaults to US_EAST_1 >
 confluent.topic.bootstrap.servers=localhost:9092
 confluent.topic.replication.factor=1
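
To run the connector with these properties, pass a standalone worker configuration file along with this connector file to the Connect standalone script. A minimal sketch, assuming the properties above are saved as kinesis-source.properties and that you use the sample worker configuration shipped with Confluent Platform:

./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties kinesis-source.properties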

REST-based example

This configuration is typically used with distributed workers. Write the following JSON to connector.json, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. See the Kafka Connect REST API documentation for more information.

Connect distributed REST-based example:

 {
   "name" : "KinesisSourceConnector1",
   "config" : {
     "connector.class" : "io.confluent.connect.kinesis.KinesisSourceConnector",
     "tasks.max" : "1",
     "aws.access.key.id" : "< Optional Configuration >",
     "aws.secret.key.id" : "< Optional Configuration >",
     "kafka.topic" : "< Required Configuration >",
     "kinesis.stream" : "< Required Configuration >"
   }
 }

Use curl to post the configuration to one of the Kafka Connect workers. Change http://localhost:8083/ to the endpoint of one of your Kafka Connect workers.

Create a new connector:

curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors

Update an existing connector. Note that the /config endpoint expects only the contents of the config object, not the full connector.json wrapper:

curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/KinesisSourceConnector1/config
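
You can confirm that the connector and its task are running by querying the status endpoint; a quick check, assuming the same worker endpoint:

curl -s http://localhost:8083/connectors/KinesisSourceConnector1/status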

Quick Start

The Kinesis connector imports data from Kinesis streams and writes it to a Kafka topic. Before you begin, create a Kinesis stream and have a user profile with read access to it.

Preliminary Setup

Navigate to your Confluent Platform installation directory and run this command to install the latest connector version.

confluent-hub install confluentinc/kafka-connect-kinesis:latest

You can install a specific version by replacing latest with a version number. For example:

confluent-hub install confluentinc/kafka-connect-kinesis:1.1.1-preview

Adding a new connector plugin requires restarting Connect. Use the Confluent CLI to restart Connect.

Tip

The command syntax for the Confluent CLI development commands changed in 5.3.0. These commands have been moved to confluent local. For example, the syntax for confluent start is now confluent local start. For more information, see confluent local.

$ confluent local stop connect && confluent local start connect
Using CONFLUENT_CURRENT: /Users/username/Sandbox/confluent-snapshots/var/confluent.NuZHxXfq
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]

Check if the Kinesis plugin has been installed correctly and picked up by the plugin loader:

$ curl -sS localhost:8083/connector-plugins | jq .[].class | grep kinesis
"io.confluent.connect.kinesis.KinesisSourceConnector"

Kinesis Setup

You can use the AWS Management Console to set up your Kinesis stream, or you can complete the following steps using the AWS CLI:

  1. Make sure you have an AWS account.

  2. Set up AWS Credentials.

  3. Create a Kinesis Stream.

    aws kinesis create-stream --stream-name my_kinesis_stream --shard-count 1
    
  4. Insert Records into your stream.

    aws kinesis put-record --stream-name my_kinesis_stream --partition-key 123 --data test-message-1
    

The example shows that a record containing partition key 123 and data “test-message-1” is inserted into my_kinesis_stream.
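
If the put-record command fails because the stream is still being created, you can wait for it to become ACTIVE first. An optional check using standard AWS CLI commands:

aws kinesis wait stream-exists --stream-name my_kinesis_stream
aws kinesis describe-stream --stream-name my_kinesis_stream --query 'StreamDescription.StreamStatus'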

Source Connector Configuration

Start the services using the Confluent CLI:

confluent local start

Create a configuration file named kinesis-source-config.json with the following contents.

{
  "name": "kinesis-source",
  "config": {
    "connector.class": "io.confluent.connect.kinesis.KinesisSourceConnector",
    "tasks.max": "1",
    "kafka.topic": "kinesis_topic",
    "kinesis.region": "US_WEST_1",
    "kinesis.stream": "my_kinesis_stream",
    "confluent.license": "",
    "name": "kinesis-source",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1"
  }
}

The important configuration parameters used here are:

  • kinesis.stream: The Kinesis stream to subscribe to.

  • kafka.topic: The Kafka topic in which the messages received from Kinesis are produced.

  • tasks.max: The maximum number of tasks that should be created for this connector. Each Kinesis shard is allocated to a single task. If the number of shards specified exceeds the number of tasks, the connector throws an exception and fails.

  • kinesis.region: The region where the stream exists. Defaults to US_EAST_1 if not specified.

  • You may pass your AWS credentials to the Kinesis connector through your source connector configuration. To pass AWS credentials in the source configuration, set the aws.access.key.id and aws.secret.key.id parameters:

    "aws.access.key.id": "<your-access-key>"
    "aws.secret.key.id": "<your-secret-key>"
    

Run this command to start the Kinesis source connector.

Caution

You must include a double dash (--) between the connector name and your flag.

confluent local load kinesis-source -- -d kinesis-source-config.json

To check that the connector started successfully, view the Connect worker’s log by running:

confluent local log connect
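
Alternatively, query the Connect REST API directly for the connector status; a quick check, assuming the default worker port and that jq is installed:

curl -s localhost:8083/connectors/kinesis-source/status | jq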

Start a Kafka consumer in a separate terminal session to view the data exported by the connector into the Kafka topic:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic kinesis_topic --from-beginning

Finally, stop the Confluent services using the command:

confluent local stop

Remove unused resources

Delete your stream and clean up resources to avoid incurring any unintended charges.

aws kinesis delete-stream --stream-name my_kinesis_stream
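
If you also want to remove the local Confluent Platform data generated during this quick start, you can destroy the local environment. Use this with care; it deletes all data and logs under CONFLUENT_CURRENT:

confluent local destroy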

AWS Credentials

By default, the Kinesis connector looks for AWS credentials in the following locations, in the following order:

  1. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables accessible to the Connect worker processes where the connector will be deployed. These variables are recognized by the AWS CLI and all AWS SDKs (except for the AWS SDK for .NET). Use export to set these variables:

    export AWS_ACCESS_KEY_ID=<your_access_key_id>
    export AWS_SECRET_ACCESS_KEY=<your_secret_access_key>
    

    The AWS_ACCESS_KEY and AWS_SECRET_KEY can be used instead, but are not recognized by the AWS CLI.

  2. The aws.accessKeyId and aws.secretKey Java system properties on the Connect worker processes where the connector will be deployed. However, these properties are only recognized by the AWS SDK for Java and are not recommended. (An example of setting them on the worker JVM follows this list.)

  3. The ~/.aws/credentials file located in the home directory of the operating system user that runs the Connect worker processes. These credentials are recognized by most AWS SDKs and the AWS CLI. Use the following AWS CLI command to create the credentials file:

    aws configure
    

    You can also manually create the credentials file using a text editor. The file should contain lines in the following format:

    [default]
    aws_access_key_id = <your_access_key_id>
    aws_secret_access_key = <your_secret_access_key>
    

    Note

    When creating the credentials file, make sure that the user creating the credentials file is the same user that runs the Connect worker processes and that the credentials file is in this user’s home directory. Otherwise, the Kinesis connector will not be able to find the credentials.

    See AWS Credentials File Format for additional details.
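
For option 2, the Java system properties must be set on the Connect worker JVM itself. A minimal sketch, assuming you start the worker from the command line and rely on the KAFKA_OPTS environment variable, which the Kafka startup scripts pass to the JVM:

export KAFKA_OPTS="-Daws.accessKeyId=<your_access_key_id> -Daws.secretKey=<your_secret_access_key>"
./bin/connect-distributed ./etc/kafka/connect-distributed.properties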

Choose one of the options above to define the AWS credentials that the Kinesis connectors use, verify that the credentials implementation is set correctly, and then restart all of the Connect worker processes.

Note

Confluent recommends using either environment variables or a credentials file because these are the most straightforward options, and they can be checked using the AWS CLI tool before running the connector.
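
For example, you can confirm which credentials the AWS CLI resolves before starting the connector:

aws sts get-caller-identity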

All Kinesis connectors run in a single Connect worker cluster and use the same credentials. This is sufficient for many use cases. If you want more control, refer to the following section to learn about controlling and customizing how the Kinesis connector gets AWS credentials.

Credentials Providers

A credentials provider is a Java class that implements the com.amazonaws.auth.AWSCredentialsProvider interface in the AWS Java library and returns AWS credentials from the environment. By default, the Kinesis connector configuration property kinesis.credentials.provider.class uses the com.amazonaws.auth.DefaultAWSCredentialsProviderChain class. This class and interface implementation chains together five other credential provider classes.

The com.amazonaws.auth.DefaultAWSCredentialsProviderChain implementation looks for credentials in the following order:

  1. Environment variables using the com.amazonaws.auth.EnvironmentVariableCredentialsProvider class implementation. This implementation uses environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY are also supported by this implementation; however, these two variables are only recognized by the AWS SDK for Java and are not recommended.

  2. Java system properties using the com.amazonaws.auth.SystemPropertiesCredentialsProvider class implementation. This implementation uses Java system properties aws.accessKeyId and aws.secretKey.

  3. Credentials file using the com.amazonaws.auth.profile.ProfileCredentialsProvider class implementation. This implementation uses a credentials file located in the path ~/.aws/credentials. This credentials provider can be used by most AWS SDKs and the AWS CLI. Use the following AWS CLI command to create the credentials file:

    aws configure
    

    You can also manually create the credentials file using a text editor. The file should contain lines in the following format:

    [default]
    aws_access_key_id = <your_access_key_id>
    aws_secret_access_key = <your_secret_access_key>
    

    Note

    When creating the credentials file, make sure that the user creating the credentials file is the same user that runs the Connect worker processes and that the credentials file is in this user’s home directory. Otherwise, the Kinesis connector will not be able to find the credentials.

    See AWS Credentials File Format for additional details.

Using Other Implementations

You can use a different credentials provider. To do this, set the kinesis.credentials.provider.class property to the name of any class that implements the com.amazonaws.auth.AWSCredentialsProvider interface.

Important

If you are using a different credentials provider, do not include the aws.access.key.id and aws.secret.key.id parameters in the connector configuration file. If these parameters are included, they will override the custom credentials provider class.

Complete the following steps to use a different credentials provider. A minimal Java sketch follows these steps.

  1. Find or create a Java credentials provider class that implements the com.amazonaws.auth.AWSCredentialsProvider interface.

  2. Put the class file in a JAR file.

  3. Place the JAR file in the share/java/kafka-connect-kinesis directory on all Connect workers.

  4. Restart the Connect workers.

  5. Change the Kinesis connector properties file to use your custom credentials by adding the entry kinesis.credentials.provider.class=<className>.

    Important

    You must use the fully qualified class name in the <className> entry.
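
The following is a minimal sketch of the provider class from step 1, assuming the AWS SDK for Java v1 interface referenced above; the package, class name, and environment variable names are hypothetical:

package com.example; // hypothetical package

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;

// Hypothetical provider that reads keys from custom environment variables.
public class MyEnvCredentialsProvider implements AWSCredentialsProvider {

    @Override
    public AWSCredentials getCredentials() {
        // Resolve the keys on each call; BasicAWSCredentials is a simple holder.
        return new BasicAWSCredentials(
            System.getenv("MY_AWS_ACCESS_KEY_ID"),
            System.getenv("MY_AWS_SECRET_ACCESS_KEY"));
    }

    @Override
    public void refresh() {
        // Nothing is cached in this example, so there is nothing to refresh.
    }
}

With the class packaged in a JAR and placed on the workers as described above, the corresponding property entry uses the fully qualified class name, for example kinesis.credentials.provider.class=com.example.MyEnvCredentialsProvider.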

Additional documentation