InfluxDB 2 Source Connector for Confluent Cloud

The fully-managed InfluxDB 2 Source connector for Confluent Cloud allows you to import data from an InfluxDB host into an Apache Kafka® topic.

The connector loads data by periodically executing an InfluxDB query and creating an output record for each row in the result set. By default, all measurements in a database are copied, each to its own output topic. The connector monitors the database for new measurements and adapts automatically. When copying data from a measurement, the connector loads only new records.

Note

Features

The InfluxDB 2 Source connector supports the following features:

  • At least once delivery: This connector guarantees that records from the Kafka topic are delivered at least once.
  • Supports one task: The connector supports running a single task, which is initiated when in QUERY mode. Otherwise, the connector initiates tasks based on the minimum number of measurements or max-tasks configured.
  • Offset management capabilities: Supports offset management. For more information, see Manage custom offsets.

For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Connect Usage Examples section.

Limitations

Be sure to review the following information.

Manage custom offsets

You can manage the offsets for this connector. Offsets provide information on the point in the system from which the connector is accessing data. For more information, see Manage Offsets for Fully-Managed Connectors in Confluent Cloud.

To manage offsets:

To get the current offset, make a GET request that specifies the environment, Kafka cluster, and connector name.

GET /connect/v1/environments/{environment_id}/clusters/{kafka_cluster_id}/connectors/{connector_name}/offsets
Host: https://api.confluent.cloud

Response:

Successful calls return HTTP 200 with a JSON payload that describes the offset.

{
    "id": "lcc-example123",
    "name": "{connector_name}",
    "offsets": [
       {
             "partition": {
                "measurement": "my-example"
             },
             "offset": {
                "time": "2024-02-26T12:25:28.595877Z"
             }
       }
    ],
    "metadata": {
        "observed_at": "2024-03-28T17:57:48.139635200Z"
    }
}

Responses include the following information:

  • The position of latest offset.
  • The observed time of the offset in the metadata portion of the payload. The observed_at time indicates a snapshot in time for when the API retrieved the offset. A running connector is always updating its offsets. Use observed_at to get a sense for the gap between real time and the time at which the request was made. By default, offsets are observed every minute. Calling GET repeatedly will fetch more recently observed offsets.
  • Information about the connector.
  • In these examples, the curly braces around “{connector_name}” indicate a replaceable value.

JSON payload

The table below offers a description of the unique fields in the JSON payload for managing offsets of the InfluxDB 2 Source connector. For more information about InfluxDB terminology, see InfluxDB Cloud Serverless documentation.

Field Definition Required/Optional
measurement A string that describes the data stored in associated fields of the data structure part of InfluxDB. Required
time An InfluxDB data type that represents a single point in time with nanosecond precision. This is the offset. Required

Quick Start

Use this quick start to get up and running with the Confluent Cloud InfluxDB 2 Source connector. The quick start provides the basics of selecting the connector and configuring it to stream events to Apache Kafka®.

Prerequisites
  • Authorized access to a Confluent Cloud cluster on Amazon Web Services (AWS), Microsoft Azure (Azure), or Google Cloud.

  • The Confluent CLI installed and configured for the cluster. See Install the Confluent CLI.

  • Authorized access to query the InfluxDB bucket. For more information, see Query data.

    Note

    The connector requires --read-bucket permission for the bucket where it sends data. For more information, see Query data.

  • Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).

Using the Confluent Cloud Console

Step 1: Launch your Confluent Cloud cluster

See the Quick Start for Confluent Cloud for installation instructions.

Step 2: Add a connector

In the left navigation menu, click Connectors. If you already have connectors in your cluster, click + Add connector.

Step 3: Select your connector

Click the InfluxDB 2 Source connector card.

InfluxDB 2 Source Connector Card

Step 4: Enter the connector details

Note

  • Make sure you have all your prerequisites completed.
  • An asterisk ( * ) designates a required entry.

At the Add InfluxDB 2 Source Connector screen, complete the following:

In the Kafka Topic Name Prefix field, define a topic prefix your connector will use to publish to Kafka topics. The connector publishes Kafka topics using the following naming convention: <topic.prefix><tableName>.

Step 5: Check for files

Verify that data is being produced at the InfluxDB host.

For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Connect Usage Examples section.

Using the Confluent CLI

To set up and run the connector using the Confluent CLI, complete the following steps.

Note

Make sure you have all your prerequisites completed.

Step 1: List the available connectors

Enter the following command to list available connectors:

confluent connect plugin list

Step 2: List the connector configuration properties

Enter the following command to show the connector configuration properties:

confluent connect plugin describe <connector-plugin-name>

The command output shows the required and optional configuration properties.

Step 3: Create the connector configuration file

Create a JSON file that contains the connector configuration properties. The following example shows the required connector properties.

{
  "connector.class": "InfluxDB2Source",
  "name": "InfluxDB2Source_0",
  "kafka.api.key": "****************",
  "kafka.api.secret": "*********************************",
  "influxdb.url": "http://influxdb-test.com:8086",
  "influxdb.token": "***************************",
  "influxdb.org.id": "<organization-id>",
  "influxdb.bucket": "<bucket-name>",
  "topic.prefix": "<topic-prefix>",
  "output.data.format": "JSON",
  "tasks.max": "1",
}

Note the following property definitions:

  • "connector.class": Identifies the connector plugin name.
  • "name": Sets a name for your new connector.
  • "kafka.auth.mode": Identifies the connector authentication mode you want to use. There are two options: SERVICE_ACCOUNT or KAFKA_API_KEY (the default). To use an API key and secret, specify the configuration properties kafka.api.key and kafka.api.secret, as shown in the example configuration (above). To use a service account, specify the Resource ID in the property kafka.service.account.id=<service-account-resource-ID>. To list the available service account resource IDs, use the following command:

    confluent iam service-account list
    

    For example:

    confluent iam service-account list
    
       Id     | Resource ID |       Name        |    Description
    +---------+-------------+-------------------+-------------------
       123456 | sa-l1r23m   | sa-1              | Service account 1
       789101 | sa-l4d56p   | sa-2              | Service account 2
    
  • "influxdb.url": Fully-qualified InfluxDB API URL used for establishing a connection. For example, http://influxdb-test.com:8086

  • "influxdb.token": Token to authenticate with the InfluxDB host.

  • "influxdb.org.id": The InfluxDB organization ID.

    Note

    The connector requires --read-bucket permission for the bucket where it sends data. For more information, see Query data.

    For more information, see writing data to InfluxDB.

  • "influxdb.bucket": The bucket where the connector sends data.

  • "output.data.format": Supported formats are AVRO, PROTOBUF, JSON_SR (JSON Schema), or JSON (schemaless). A valid schema must be available in Schema Registry to use a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).

  • "tasks.max": Enter the number of tasks to use with the connector. The connector supports running a single task, which is initiated when in QUERY mode. Otherwise, the connector initiates tasks based on the minimum number of measurements or max-tasks configured.

Single Message Transforms: See the Single Message Transforms (SMT) documentation for details about adding SMTs using the CLI.

See Configuration Properties for all property values and descriptions.

Step 3: Load the properties file and create the connector

Enter the following command to load the configuration and start the connector:

confluent connect cluster create --config-file <file-name>.json

For example:

confluent connect cluster create --config-file influxdb2-source-config.json

Example output:

Created connector InfluxDB2Source_0 lcc-do6vzd

Step 4: Check the connector status.

Enter the following command to check the connector status:

confluent connect cluster list

Example output:

ID           |             Name        | Status  | Type   | Trace
+------------+-------------------------+---------+--------+-------+
lcc-do6vzd   | InfluxDB2Source_0       | RUNNING | Source |       |

Step 5: Check for files

Verify that data is being produced at the Kafka topic.

For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Connect Usage Examples section.

Configuration Properties

Use the following configuration properties with the fully-managed connector. For self-managed connector property definitions and other details, see the connector docs in Self-managed connectors for Confluent Platform.

How should we connect to your data?

name

Sets a name for your connector.

  • Type: string
  • Valid Values: A string at most 64 characters long
  • Importance: high

Schema Config

schema.context.name

Add a schema context name. A schema context represents an independent scope in Schema Registry. It is a separate sub-schema tied to topics in different Kafka clusters that share the same Schema Registry instance. If not used, the connector uses the default schema configured for Schema Registry in your Confluent Cloud environment.

  • Type: string
  • Default: default
  • Importance: medium

Kafka Cluster credentials

kafka.auth.mode

Kafka Authentication mode. It can be one of KAFKA_API_KEY or SERVICE_ACCOUNT. It defaults to KAFKA_API_KEY mode.

  • Type: string
  • Default: KAFKA_API_KEY
  • Valid Values: KAFKA_API_KEY, SERVICE_ACCOUNT
  • Importance: high
kafka.api.key

Kafka API Key. Required when kafka.auth.mode==KAFKA_API_KEY.

  • Type: password
  • Importance: high
kafka.service.account.id

The Service Account that will be used to generate the API keys to communicate with Kafka Cluster.

  • Type: string
  • Importance: high
kafka.api.secret

Secret associated with Kafka API key. Required when kafka.auth.mode==KAFKA_API_KEY.

  • Type: password
  • Importance: high

InfluxDB

influxdb.url

Fully qualified InfluxDB API URL used for establishing connection.

  • Type: string
  • Importance: high
influxdb.token

Token to authenticate with influx db.

  • Type: password
  • Importance: high
influxdb.org.id

Organization ID.

  • Type: string
  • Importance: high
influxdb.bucket

Bucket from which this connector will read the data from.

  • Type: string
  • Importance: medium

Read Configuration

query

If specified, this query will be executed and the resultant records will be pushed to desired Apache Kafka topic. Use this setting if there’s a need to select subset of fields or tags, perform aggregations or filter data. The query should follow the template - import "influxdata/influxdb/schema" from(bucket:$influxdb.bucket) |> range(start: $startTimestamp, stop: $endTimestamp) |> <Your custom query criteria here> |> schema.fieldsAsCols() |> limit(n: $batch.size) In case of mode=bulk, the connector will run the query as-is each time it polls. The range criteria should be filled in by the user. Flux does not allow unbounded queries as they are resource intensive. If you use mode=timestamp, the values for $startTimestamp and $endTimestamp will be filled by the connector with appropriate source offsets.Users should replace other criteria mentioned at - <Your custom query criteria here>. The connector will replace $influxdb.bucket and $batch.size with the values from the corresponding configurations.

  • Type: string
  • Importance: medium
mode

The mode in which measurements in InfluxDB has to be polled. Supported modes are : bulk performs a bulk load of the entire measurement to desired Apache Kafka topic, each time it is polled. timestamp uses the timestamp to detect newly created rows and writes them to desired Apache Kafka topic.

  • Type: string
  • Default: timestamp
  • Importance: medium
topic.mapper

Configuration to decide how to map topics Supported options are : bucket - Topic name is Topic Prefix + Bucket name. All the records go into same topic. Or measurement - Topic name is Topic Prefix + Measurement name. All the records from same measurement go into same topic.

  • Type: string
  • Default: bucket
  • Importance: medium
topic.prefix

Prefix that should be prepended to measurement names to determine the name of the Apache Kafka topic to publish data to, in case of custom query, it should be the full name of the Apache Kafka topic.

  • Type: string
  • Importance: medium
batch.size

Maximum number of points to include in a single batch when polling for new data. This setting can be used to limit the amount of data buffered internally in the connector.

  • Type: int
  • Default: 5000
  • Importance: medium
timestamp.delay.interval.ms

How long to wait after a record with certain timestamp appears before we include it in the result. You may choose to add some delay to allow transactions with earlier timestamp to complete. The first execution will fetch all available records (i.e. starting at Unix Epoch) until current time minus the delay. Every following execution will get data from the time of the last record fetched in the previous batch until current time minus the delay.

  • Type: int
  • Default: 0
  • Importance: medium
influxdb.measurement.whitelist

Comma separated list of measurements to include in copying. If specified, Measurements Excluded cannot be set. If left empty, all measurements will be included.

  • Type: string
  • Importance: medium
influxdb.measurement.blacklist

Comma separated list of measurements to exclude from copying. If specified, Measurements Included cannot be set.

  • Type: string
  • Importance: medium

Retries

retry.backoff.ms

Backoff time duration to wait before retrying

  • Type: int
  • Default: 1000 (1 second)
  • Importance: medium
max.retries

The maximum number of times to retry on errors before failing the task.

  • Type: int
  • Default: 10
  • Importance: medium

Output messages

output.data.format

Sets the output Kafka record value format. Valid entries are AVRO, JSON_SR, PROTOBUF, or JSON. Note that you need to have Confluent Cloud Schema Registry configured if using a schema-based message format like AVRO, JSON_SR, and PROTOBUF

  • Type: string
  • Default: JSON
  • Importance: high

Number of tasks for this connector

tasks.max

Maximum number of tasks for the connector.

  • Type: int
  • Valid Values: [1,…]
  • Importance: high

Next Steps

For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. This example also shows how to use Confluent CLI to manage your resources in Confluent Cloud.

../_images/topology.png