Important
You are viewing documentation for an older version of Confluent Platform.
Kafka Connect AWS CloudWatch Logs Source Connector
The AWS CloudWatch Logs source connector imports data from AWS CloudWatch Logs and writes it to Kafka topics. The connector sources from a single log group and writes to one topic per log stream. A topic format configuration is available to customize the topic name of each log stream. If you need other topic customizations, such as writing multiple log streams to the same topic, you can use Single Message Transforms (SMTs).
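As a rough illustration of the one-topic-per-log-stream layout, the Python sketch below renders a topic name for each stream. It assumes the <log-group>.<log-stream> naming seen in the quick-start consumer command (my-log-group.my-log-stream); the topic_format argument and its ${log_group} and ${log_stream} placeholders are hypothetical stand-ins for the connector's topic format setting, not its documented syntax.

```python
from string import Template


def topic_for_stream(log_group: str, log_stream: str,
                     topic_format: str = "${log_group}.${log_stream}") -> str:
    """Render a Kafka topic name for one CloudWatch log stream.

    Hypothetical: the placeholder names and the default format are assumptions
    modeled on the quick-start topic name (my-log-group.my-log-stream).
    """
    return Template(topic_format).substitute(log_group=log_group,
                                             log_stream=log_stream)
```

For example, topic_for_stream("my-log-group", "my-log-stream") yields my-log-group.my-log-stream, while a format without placeholders, such as all-logs, maps every stream to the same topic.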
The connector can start with one task that imports data from all log streams and can scale up to one task per log stream, which raises throughput to the maximum that Amazon supports (100,000 logs per second or 10 MB per second).
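To picture scaling from one task up to one task per log stream, the sketch below splits a list of log streams into at most tasks_max contiguous, near-equal groups, following the balancing rule of Kafka Connect's ConnectorUtils.groupPartitions helper. This is an assumption about how streams could be assigned to tasks, not the connector's documented behavior.

```python
def group_streams(streams, tasks_max):
    """Split log streams into at most tasks_max contiguous, near-equal groups.

    The first (len(streams) % n) groups receive one extra stream. The group
    count is capped at len(streams), so no task is left without a stream.
    """
    if tasks_max < 1:
        raise ValueError("tasks_max must be >= 1")
    n = min(tasks_max, len(streams))
    if n == 0:
        return []
    per_group, leftover = divmod(len(streams), n)
    groups, start = [], 0
    for i in range(n):
        size = per_group + (1 if i < leftover else 0)
        groups.append(streams[start:start + size])
        start += size
    return groups
```

With five streams and tasks_max=2, the first task gets three streams and the second gets two; with tasks_max at or above the stream count, each task handles exactly one stream.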
Prerequisites
The following are required to run the Kafka Connect AWS CloudWatch Logs Connector:
- Kafka Broker: Confluent Platform 3.3.0 or above
- Connect: Confluent Platform 4.1.0 or above
- Java 1.8
- AWS account
- At least one AWS CloudWatch log group and log stream in AWS CloudWatch Logs
Features
The AWS CloudWatch Logs connector offers a variety of features:
- At-Least-Once Delivery: Records imported from AWS CloudWatch Logs are delivered with at-least-once semantics. Duplicates are generally limited, however, because records are repeated only if the connector terminates unexpectedly.
- Topic Format Customizability: The connector writes to a topic per log stream by default, and you can create custom topic formats or write all records to exactly one topic.
- Log Stream Selection: You can specify the log streams to import logs from; by default, all log streams in the log group are used.
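The log stream selection rule (an explicit list, otherwise every stream in the group) can be sketched as follows. The sketch assumes the aws.cloudwatch.log.streams value is a comma-separated string; available_streams is a hypothetical input standing in for the stream names a real implementation would list from CloudWatch Logs, and rejecting unknown stream names is a choice of this sketch.

```python
def select_streams(configured, available_streams):
    """Resolve which log streams to import.

    configured: raw aws.cloudwatch.log.streams value (assumed comma-separated);
                an empty or missing value means "every stream in the log group".
    available_streams: stream names found in the log group (hypothetical input).
    """
    requested = [s.strip() for s in (configured or "").split(",") if s.strip()]
    if not requested:
        return list(available_streams)
    missing = [s for s in requested if s not in available_streams]
    if missing:
        raise ValueError(f"log streams not found in log group: {missing}")
    return requested
```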
Install the AWS CloudWatch Logs Connector
You can install this connector by using the Confluent Hub client (recommended) or you can manually download the ZIP file.
Install the connector using Confluent Hub
- Prerequisite
- Confluent Hub Client must be installed. This is installed by default with Confluent Enterprise.
Navigate to your Confluent Platform installation directory and run the following command to install the latest (latest) connector version.
The connector must be installed on every machine where Connect will be run.
confluent-hub install confluentinc/kafka-connect-aws-cloudwatch:latest
You can install a specific version by replacing latest with a version number. For example:
confluent-hub install confluentinc/kafka-connect-aws-cloudwatch:1.0.0-preview
Install Connector Manually
Download and extract the ZIP file for your connector and then follow the manual connector installation instructions.
License
You can use this connector for a 30-day trial period without a license key.
After 30 days, this connector is available under a Confluent enterprise license. Confluent issues enterprise license keys to subscribers, along with providing enterprise-level support for Confluent Platform and your connectors. If you are a subscriber, please contact Confluent Support at support@confluent.io for more information.
See Confluent Platform license for license properties and License topic configuration for information about the license topic.
AWS CloudWatch Logs Source Connector Quick Start
Preliminary Setup
To add a new connector plugin, you must restart Connect. Use the following Confluent CLI command to restart Connect:
confluent stop connect && confluent start connect
Your output should resemble:
Using CONFLUENT_CURRENT: /Users/username/Sandbox/confluent-snapshots/var/confluent.NuZHxXfq
Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]
Check if the AWS CloudWatch Logs plugin has been installed correctly and picked up by the plugin loader:
curl -sS localhost:8083/connector-plugins | jq '.[].class' | grep cloudwatch
Your output should resemble:
"io.confluent.connect.aws.cloudwatch.AwsCloudWatchSourceConnector"
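If jq is not available, the same check can be scripted. The Python sketch below, using only the standard library, fetches the worker's /connector-plugins list and tests whether the CloudWatch connector class is present. It assumes the worker's REST API listens on localhost:8083, as in the curl command above; the helper names are hypothetical.

```python
import json
import urllib.request

CLOUDWATCH_CLASS = "io.confluent.connect.aws.cloudwatch.AwsCloudWatchSourceConnector"


def has_plugin(plugins, class_name=CLOUDWATCH_CLASS):
    """Return True if class_name appears in a /connector-plugins response."""
    return any(p.get("class") == class_name for p in plugins)


def fetch_plugins(base_url="http://localhost:8083"):
    """GET the list of installed connector plugins from a Connect worker."""
    with urllib.request.urlopen(f"{base_url}/connector-plugins", timeout=10) as resp:
        return json.load(resp)
```

Calling has_plugin(fetch_plugins()) returns True once the plugin loader has picked up the connector.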
AWS Credentials
By default, the AWS CloudWatch Logs connector looks for AWS credentials in the following locations and in the following order:
1. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables accessible to the Connect worker processes where the connector will be deployed. These variables are recognized by the AWS CLI and all AWS SDKs (except for the AWS SDK for .NET). Use export to set these variables:
export AWS_ACCESS_KEY_ID=<your_access_key_id>
export AWS_SECRET_ACCESS_KEY=<your_secret_access_key>
The AWS_ACCESS_KEY and AWS_SECRET_KEY variables can be used instead, but they are not recognized by the AWS CLI.
2. The aws.accessKeyId and aws.secretKey Java system properties on the Connect worker processes where the connector will be deployed. However, these properties are recognized only by the AWS SDK for Java and are not recommended.
3. The ~/.aws/credentials file located in the home directory of the operating system user that runs the Connect worker processes. These credentials are recognized by most AWS SDKs and the AWS CLI. Use the following AWS CLI command to create the credentials file:
aws configure
You can also manually create the credentials file using a text editor. The file should contain lines in the following format:
[default]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>
Note
When creating the credentials file, make sure that the user creating the credentials file is the same user that runs the Connect worker processes and that the credentials file is in this user’s home directory. Otherwise, the AWS CloudWatch Logs connector will not be able to find the credentials.
See AWS Credentials File Format for additional details.
Choose one of the above to define the AWS credentials that the AWS CloudWatch Logs connector uses, verify that the credentials are set correctly, and then restart all of the Connect worker processes.
Note
Confluent recommends using either Environment variables or a Credentials file because these are the most straightforward, and they can be checked using the AWS CLI tool before running the connector.
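Before restarting the workers, you can check which of the locations above would supply credentials to a worker process. This Python sketch walks the same order (environment variables, then the credentials file) and reports the first source that holds both a key ID and a secret. It is a diagnostic approximation, not the AWS SDK's own lookup: it skips Java system properties, which are not visible from outside the JVM, and reads only the [default] profile.

```python
import configparser
import os
from pathlib import Path


def find_credentials_source(env=None, credentials_file=None):
    """Return a label for the first location holding a complete key pair, or None.

    Order: AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY, then AWS_ACCESS_KEY/AWS_SECRET_KEY,
    then the [default] profile of ~/.aws/credentials. Java system properties are
    not checked.
    """
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "env:AWS_ACCESS_KEY_ID"
    if env.get("AWS_ACCESS_KEY") and env.get("AWS_SECRET_KEY"):
        return "env:AWS_ACCESS_KEY"
    path = Path(credentials_file) if credentials_file else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    if parser.read(path) and parser.has_section("default"):
        section = parser["default"]
        if section.get("aws_access_key_id") and section.get("aws_secret_access_key"):
            return f"file:{path}"
    return None
```

Run it as the operating system user that runs the Connect workers, so that the home directory it inspects is the one the connector will use.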
Credentials Providers
A credentials provider is a Java class that implements the com.amazonaws.auth.AWSCredentialsProvider interface in the AWS Java library and returns AWS credentials from the environment. By default, the AWS CloudWatch Logs connector configuration property aws.credentials.provider.class uses the com.amazonaws.auth.DefaultAWSCredentialsProviderChain class. This class and interface implementation chains together five other credential provider classes.
The com.amazonaws.auth.DefaultAWSCredentialsProviderChain implementation looks for credentials in the following order:
1. Environment variables using the com.amazonaws.auth.EnvironmentVariableCredentialsProvider class implementation. This implementation uses the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY are also supported by this implementation; however, these two variables are recognized only by the AWS SDK for Java and are not recommended.
2. Java system properties using the com.amazonaws.auth.SystemPropertiesCredentialsProvider class implementation. This implementation uses the Java system properties aws.accessKeyId and aws.secretKey.
3. Credentials file using the com.amazonaws.auth.profile.ProfileCredentialsProvider class implementation. This implementation uses a credentials file located in the path ~/.aws/credentials. This credentials provider can be used by most AWS SDKs and the AWS CLI. Use the following AWS CLI command to create the credentials file:
aws configure
You can also manually create the credentials file using a text editor. The file should contain lines in the following format:
[default]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>
Note
When creating the credentials file, make sure that the user creating the credentials file is the same user that runs the Connect worker processes and that the credentials file is in this user’s home directory. Otherwise, the AWS CloudWatch Logs connector will not be able to find the credentials.
See AWS Credentials File Format for additional details.
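As an alternative to aws configure or a text editor, the credentials file format above can be produced with Python's configparser. This sketch writes one profile, keeps any profiles already in the file, and restricts the file to its owner; run it as the same user that runs the Connect worker processes. The write_credentials name and the 0o600 permission choice belong to this sketch, not to the connector.

```python
import configparser
import os
from pathlib import Path


def write_credentials(access_key_id, secret_access_key, path=None, profile="default"):
    """Write (or update) one profile in an AWS credentials file and return its path."""
    path = Path(path) if path else Path.home() / ".aws" / "credentials"
    path.parent.mkdir(parents=True, exist_ok=True)
    parser = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    parser.read(path)  # keep profiles that already exist in the file
    parser[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    with open(path, "w") as f:
        parser.write(f)  # emits "key = value" lines under [profile]
    os.chmod(path, 0o600)  # owner read/write only (POSIX)
    return path
```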
Using Other Implementations
You can use a different credentials provider. To do this, set the aws.credentials.provider.class property to the name of any class that implements the com.amazonaws.auth.AWSCredentialsProvider interface.
Important
If you are using a different credentials provider, do not include aws.access.key.id and aws.secret.access.key in the connector configuration file. If these parameters are included, they override the custom credentials provider class.
Complete the following steps to use a different credentials provider:
1. Find or create a Java credentials provider class that implements the com.amazonaws.auth.AWSCredentialsProvider interface.
2. Put the class file in a JAR file.
3. Place the JAR file in the share/java/kafka-connect-aws-cloudwatch directory on all Connect workers.
4. Restart the Connect workers.
5. Change the AWS CloudWatch Logs connector properties file to use your custom credentials. Add the provider class entry aws.credentials.provider.class=<className> to the AWS CloudWatch Logs connector properties file.
Important
You must use the fully qualified class name in the <className> entry.
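The two rules above (use a fully qualified class name, and do not set static keys alongside a custom provider) can be checked before a properties file is deployed. This Python sketch parses simple key=value lines and reports problems. It treats aws.access.key.id and aws.secret.access.key, the names used in this page's configuration examples, as the static keys; the parser ignores Java properties features such as line continuations and escapes, and com.example.MyCredentialsProvider is a hypothetical class name.

```python
import re

# Dot-separated Java identifiers with at least one package segment.
FQCN = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)+$")
STATIC_KEYS = ("aws.access.key.id", "aws.secret.access.key")


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_custom_provider(props):
    """Return a list of problems with a custom credentials provider setup."""
    provider = props.get("aws.credentials.provider.class")
    if provider is None:
        return []  # default provider chain: nothing to check
    problems = []
    if not FQCN.match(provider):
        problems.append(f"not a fully qualified class name: {provider!r}")
    for key in STATIC_KEYS:
        if props.get(key):
            problems.append(f"{key} is set and would override {provider}")
    return problems
```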
AWS CloudWatch Logs Setup
You can use the AWS Management Console to set up your AWS CloudWatch log group and log stream, or you can complete the following steps:
Make sure you have an AWS account.
Set up AWS Credentials.
Create a log group in AWS CloudWatch Logs.
aws logs create-log-group --log-group-name my-log-group
Create a log stream in AWS CloudWatch Logs.
aws logs create-log-stream --log-group-name my-log-group --log-stream-name my-log-stream
Insert records into your log stream. If this is the first time inserting logs into a new log stream, no sequence token is needed. However, after the first put, a sequence token is returned, and you need this token as a parameter for the next put.
aws logs put-log-events --log-group-name my-log-group --log-stream-name my-log-stream --log-events timestamp=<time>,message=some-string
The example puts a log event with the specified timestamp and message into the specified log stream and log group. The <time> value is the event time in milliseconds since January 1, 1970 00:00:00 UTC.
Enter the following command to get a sequence token:
aws logs describe-log-streams --log-group-name my-log-group
The output displays the sequence token:
{
"logStreams": [
{
"logStreamName": "my-log-stream",
"creationTime": 1569709821347,
"lastIngestionTime": 1569709984113,
"uploadSequenceToken": "49595785783592846449895609848346364951147972276040781330",
"storedBytes": 0
}
]
}
The example below shows how you can use the sequence token to generate logs for your stream.
aws logs put-log-events --log-group-name my-log-group --log-stream-name my-log-stream --log-events timestamp=<time>,message=bananas --sequence-token 49595785783592846449895609848346364951147972276040781330
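The sequence-token round trip above can be scripted. This Python sketch extracts uploadSequenceToken for one stream from describe-log-streams JSON output (like the sample shown) and builds the argument list for the next put-log-events call, with the timestamp in milliseconds since the Unix epoch. The helper names are hypothetical, and actually running the command (for example with subprocess.run) is left out.

```python
import json
import time


def sequence_token(describe_output, stream_name):
    """Return uploadSequenceToken for stream_name, or None if the stream has none yet."""
    for stream in json.loads(describe_output).get("logStreams", []):
        if stream.get("logStreamName") == stream_name:
            return stream.get("uploadSequenceToken")
    raise KeyError(f"log stream not found: {stream_name}")


def put_log_events_args(group, stream, message, token=None, timestamp_ms=None):
    """Build an 'aws logs put-log-events' argument list for a single event."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # epoch milliseconds
    # Shorthand syntax: messages with commas or special characters may need quoting or JSON input.
    args = ["aws", "logs", "put-log-events",
            "--log-group-name", group,
            "--log-stream-name", stream,
            "--log-events", f"timestamp={timestamp_ms},message={message}"]
    if token:  # omitted on the first put into a new stream
        args += ["--sequence-token", token]
    return args
```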
Source Connector Configuration
Start the services using the Confluent CLI:
confluent start
Create a configuration file named aws-cloudwatch-logs-source-config.json with the following contents.
{
"name": "aws-cloudwatch-logs-source",
"config": {
"connector.class": "io.confluent.connect.aws.cloudwatch.AwsCloudWatchSourceConnector",
"tasks.max": "1",
"aws.cloudwatch.logs.url": "https://logs.us-east-2.amazonaws.com",
"aws.cloudwatch.log.group": "my-log-group",
"aws.cloudwatch.log.streams": "my-log-stream",
"name": "aws-cloudwatch-logs-source",
"confluent.topic.bootstrap.servers": "localhost:9092",
"confluent.topic.replication.factor": "1"
}
}
The important configuration parameters used here are:
aws.cloudwatch.logs.url: The endpoint URL that the source connector connects to in order to pull the specified logs.
aws.cloudwatch.log.group: The AWS CloudWatch log group under which the log streams are contained.
aws.cloudwatch.log.streams: A list of AWS CloudWatch log streams from which the logs are pulled. By default, all log streams in the configured log group are used.
tasks.max: The maximum number of tasks that should be created for this connector.
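The quick-start configuration file can also be generated instead of written by hand. This Python sketch builds the same connector definition and writes it as JSON. The region parameter (which fills in the logs.<region>.amazonaws.com endpoint) and the option to copy credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables into aws.access.key.id and aws.secret.access.key are assumptions of this sketch; a file written with credentials holds them in plain text.

```python
import json
import os


def build_source_config(log_group, log_streams=None, region="us-east-2",
                        tasks_max=1, include_env_credentials=False):
    """Return the quick-start connector definition as a dict."""
    config = {
        "connector.class": "io.confluent.connect.aws.cloudwatch.AwsCloudWatchSourceConnector",
        "tasks.max": str(tasks_max),  # Connect config values are strings
        "aws.cloudwatch.logs.url": f"https://logs.{region}.amazonaws.com",
        "aws.cloudwatch.log.group": log_group,
        "name": "aws-cloudwatch-logs-source",
        "confluent.topic.bootstrap.servers": "localhost:9092",
        "confluent.topic.replication.factor": "1",
    }
    if log_streams:  # omit the key to use every log stream in the group
        config["aws.cloudwatch.log.streams"] = ",".join(log_streams)
    if include_env_credentials:
        config["aws.access.key.id"] = os.environ["AWS_ACCESS_KEY_ID"]
        config["aws.secret.access.key"] = os.environ["AWS_SECRET_ACCESS_KEY"]
    return {"name": "aws-cloudwatch-logs-source", "config": config}


def write_config(definition, path="aws-cloudwatch-logs-source-config.json"):
    """Write the definition as indented JSON for 'confluent load ... -d <path>'."""
    with open(path, "w") as f:
        json.dump(definition, f, indent=2)
```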
You can pass your AWS credentials to the AWS CloudWatch Logs connector through your source connector configuration. To do so, set the aws.access.key.id and aws.secret.access.key parameters in the source configuration:
"aws.access.key.id": "<your-access-key-id>",
"aws.secret.access.key": "<your-secret-access-key>"
Run this command to start the AWS CloudWatch Logs source connector.
confluent load aws-cloudwatch-logs-source -d aws-cloudwatch-logs-source-config.json
To check that the connector started successfully, view the Connect worker’s log by running:
confluent log connect
Start a Kafka consumer in a separate terminal session to view the data that the connector imported into the Kafka topic:
path/to/confluent/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic my-log-group.my-log-stream --from-beginning
Finally, stop the Confluent services using the command:
confluent stop
Remove unused resources
Delete your log group and clean up resources to avoid incurring any unintended charges.
aws logs delete-log-group --log-group-name my-log-group
Examples
Property-based example
name=aws-cloudwatch-logs-source-connector
connector.class=io.confluent.connect.aws.cloudwatch.AwsCloudWatchSourceConnector
tasks.max=1
aws.access.key.id=< Optional Configuration >
aws.secret.access.key=< Optional Configuration >
aws.cloudwatch.log.group=< Required Configuration >
aws.cloudwatch.log.streams=< Optional Configuration >
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
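The property-based and REST-based examples carry the same settings in different shapes. This Python sketch converts simple key=value lines into the {"name": ..., "config": {...}} body used by the REST example, drops settings still left as < ... > placeholders, and refuses to continue without aws.cloudwatch.log.group. The placeholder convention and the helper name are assumptions of this sketch, and the parser ignores Java properties continuations and escapes.

```python
def properties_to_rest_body(text):
    """Convert key=value lines into a Kafka Connect REST request body."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    # Drop settings that still hold "< ... >" placeholders.
    config = {k: v for k, v in config.items()
              if not (v.startswith("<") and v.endswith(">"))}
    if "aws.cloudwatch.log.group" not in config:
        raise ValueError("aws.cloudwatch.log.group is required")
    return {"name": config["name"], "config": config}
```

Passing the result to json.dumps(..., indent=2) produces text that can be saved as connector.json.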
REST-based example
This configuration is typically used with distributed workers.
Write the following JSON to connector.json, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. See the Kafka Connect REST API documentation for more information.
{
"name" : "aws-cloudwatch-logs-source-connector",
"config" : {
"name" : "aws-cloudwatch-logs-source-connector",
"connector.class" : "io.confluent.connect.aws.cloudwatch.AwsCloudWatchSourceConnector",
"tasks.max" : "1",
"aws.access.key.id" : "< Optional Configuration >",
"aws.secret.access.key" : "< Optional Configuration >",
"aws.cloudwatch.log.group" : "< Required Configuration >",
"aws.cloudwatch.log.streams" : "< Optional Configuration - defaults to all log streams in the log group >"
}
}
Use curl to post the configuration to one of the Kafka Connect workers. Change http://localhost:8083/ to the endpoint of one of your Kafka Connect workers.
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
To update the configuration of an existing connector, use PUT. This endpoint expects only the contents of the config object, so extract it first, for example with jq:
jq '.config' connector.json | curl -s -X PUT -H 'Content-Type: application/json' --data @- \
http://localhost:8083/connectors/aws-cloudwatch-logs-source-connector/config
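The same two calls can be made from Python with the standard library. This sketch builds a POST request that creates the connector from the full connector.json body, and a PUT request that sends only the config object to /connectors/<name>/config, which is the body shape that endpoint accepts. The worker URL default is an assumption; the requests are only built here, and urllib.request.urlopen(req) sends one.

```python
import json
import urllib.request


def create_request(body, worker="http://localhost:8083"):
    """POST /connectors with the full {"name": ..., "config": {...}} body."""
    return urllib.request.Request(
        f"{worker}/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def update_request(body, worker="http://localhost:8083"):
    """PUT /connectors/<name>/config with only the config map."""
    return urllib.request.Request(
        f"{worker}/connectors/{body['name']}/config",
        data=json.dumps(body["config"]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

If the worker rejects a request, urlopen raises urllib.error.HTTPError, whose body usually carries the Connect error message.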