OpenSearch Sink Connector for Confluent Cloud¶
The fully-managed OpenSearch Sink connector for Confluent Cloud moves data from an Apache Kafka® topic to a specified OpenSearch index, facilitating real-time analysis of data in OpenSearch. The connector supports the Avro, JSON Schema, JSON (schemaless), and Protobuf data formats for records read from Apache Kafka® topics.
Features¶
The OpenSearch Sink connector includes the following features:
- Automatic index creation: The connector supports automatic creation of indexes depending on the OpenSearch configuration.
- Multi-indexing: The connector allows you to create and manage up to 5 indexes simultaneously.
- Input data formats: The connector supports Avro, JSON Schema, Protobuf, or JSON (schemaless) input data formats. Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON Schema, or Protobuf). For more information, see Schema Registry Enabled Environments.
- Dual-platform OpenSearch support: The connector supports both AWS OpenSearch and OSS OpenSearch.
- Topic-to-index mapping: The connector supports mapping a topic to a specific OpenSearch Index.
- Schema management: The connector supports Schema Registry, Schema Context and Reference Subject Naming Strategy.
For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Managed and Custom Connectors section.
Limitations¶
Be sure to review the following information.
- For connector limitations, see OpenSearch Sink Connector limitations.
- If you plan to use one or more Single Message Transforms (SMTs), see SMT Limitations.
- If you plan to use Confluent Cloud Schema Registry, see Schema Registry Enabled Environments.
Quick Start¶
Use this quick start to get up and running with the Confluent Cloud OpenSearch Sink connector. The quick start provides the basics of selecting the connector and configuring it to stream events to an OpenSearch deployment.
- Prerequisites
- Authorized access to a Confluent Cloud cluster on Amazon Web Services (AWS), Microsoft Azure (Azure), or Google Cloud.
- The Confluent CLI installed and configured for the cluster. For help, see Install the Confluent CLI.
- Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON Schema, or Protobuf). For more information, see Schema Registry Enabled Environments.
- For networking considerations, see Networking and DNS. To use a set of public egress IP addresses, see Public Egress IP Addresses for Confluent Cloud Connectors.
- Kafka cluster credentials. The following lists the different ways you can provide credentials.
- Enter an existing service account resource ID.
- Create a Confluent Cloud service account for the connector. Make sure to review the ACL entries required in the service account documentation. Some connectors have specific ACL requirements.
- Create a Confluent Cloud API key and secret. To create a key and secret, you can use confluent api-key create or you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector.
Using the Confluent Cloud Console¶
Step 1: Launch your Confluent Cloud cluster¶
See the Quick Start for Confluent Cloud for installation instructions.
Step 2: Add a connector¶
In the left navigation menu, click Connectors. If you already have connectors in your cluster, click + Add connector.
Step 4: Enter the connector details¶
Note the following:
- Ensure you have completed all the prerequisites.
- An asterisk ( * ) designates a required entry.
At the Add OpenSearch Sink connector screen, complete the following:
If you’ve already populated your Kafka topics, select the topics you want to connect from the Topics list.
To create a new topic, click Add a new topic. To use the default topic settings, click Create with defaults. To modify the topic settings, click Show advanced settings, update accordingly, and then click Save & Create.
- Select the way you want to provide Kafka Cluster credentials. You can choose one of the following options:
- Global Access: Allows your connector to access everything you have access to. With global access, connector access will be linked to your account. This option is not recommended for production.
- Granular access: Limits the access for your connector. You will be able to manage connector access through a service account. This option is recommended for production.
- Use an existing API key: Allows you to enter an API key and secret that you have stored, or to generate a new key and secret in the Cloud Console.
- Click Continue.
- Enter your OpenSearch connection details:
- OpenSearch instance URL: The OpenSearch instance URL. For example: http://your-opensearch-instance.com/.
- Endpoint Authentication Type: The authentication type of the endpoint.
- SSL Enabled: Set whether to connect to the endpoint using SSL.
- Click Continue.
Note
Configuration properties that are not shown in the Cloud Console use the default values. See Configuration Properties for all property values and definitions.
Select the Input Kafka record value format (data coming from the Kafka topic): AVRO, BYTES, JSON_SR, JSON (schemaless), or PROTOBUF. A valid schema must be available in Schema Registry to use a schema-based message format (for example, AVRO, JSON_SR, or PROTOBUF). For more information, see Schema Registry Enabled Environments.
Enter the number of indexes to push data to in the Indexes field. Note that this value should be less than or equal to 5.
Configure the number of indexes you set in the Indexes field. For instance, if you entered 2 in the Indexes field, you should see 2 index configuration sections with the following fields to set:
- Index: The index name. This name together with the OpenSearch Instance URL forms the complete HTTP(S) URL.
- Topic: The topic from which data will be pulled for this index.
- Behavior for null-valued records: Set how to handle records with a non-null key and a null value (that is, Kafka tombstone records) per index.
Show advanced configurations
Schema context: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the Default context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a Source connector uses only that schema context to register a schema and a Sink connector uses only that schema context to read from. For more information about setting up a schema context, see What are schema contexts and when should you use them?.
Max poll interval (ms): The maximum delay between subsequent consume requests to Kafka. This configuration property can be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 300000 milliseconds (5 minutes).
Max poll records: The maximum number of records to consume from Kafka in a single request. This configuration property can be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 500 records.
Behavior on Errors: The behavior setting for handling error responses from HTTP requests. Must be configured to one of the following: IGNORE or FAIL.
Retry Backoff Policy: The backoff policy to use for retries. Must be configured to CONSTANT_VALUE or EXPONENTIAL_WITH_JITTER.
Retry Backoff (ms): The time in milliseconds to wait following an error before the connector retries the task.
Retry HTTP Status Codes: The HTTP response status codes returned that prompt the connector to retry the request. Enter a comma-separated list of codes or ranges of codes. Ranges are specified with a start and optional end code. Range boundaries are inclusive. For example: 400- includes all codes greater than or equal to 400, and 400-500 includes codes from 400 to 500, including 500. Multiple ranges and single codes can be specified together to achieve fine-grained control over retry behavior. For example: 404,408,500- prompts the connector to retry on 404 NOT FOUND, 408 REQUEST TIMEOUT, and all 5xx error codes. Note that some status codes are always retried, such as unauthorized, timeouts, and too many requests.
Maximum Retries: The maximum number of times the connector retries a request when an error occurs, before the task fails.
For information about transforms and predicates, see the Single Message Transforms (SMT) documentation. For a list of SMTs that are not supported with this connector, see Unsupported transformations.
Click Continue.
Based on the number of topic partitions you select, you will be provided with a recommended number of tasks.
- To change the number of recommended tasks, enter the number of tasks for the connector to use in the Tasks field.
- Click Continue.
Verify the connection details.
Click Launch.
The status for the connector should go from Provisioning to Running.
Step 5: Check the results in OpenSearch¶
Verify that new records are being added to your OpenSearch deployment.
For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Managed and Custom Connectors section.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See Confluent Cloud Dead Letter Queue for details.
Using the Confluent CLI¶
Complete the following steps to set up and run the connector using the Confluent CLI. Ensure you have completed all the prerequisites.
Step 1: List the available connectors¶
Enter the following command to list available connectors:
confluent connect plugin list
Step 2: List the connector configuration properties¶
Enter the following command to show the connector configuration properties:
confluent connect plugin describe <connector-plugin-name>
The command output shows the required and optional configuration properties.
Step 3: Create the connector configuration file¶
Create a JSON file that contains the connector configuration properties. The following example shows required and optional connector properties.
{
  "connector.class": "OpenSearchSink",
  "input.data.format": "JSON",
  "kafka.auth.mode": "KAFKA_API_KEY",
  "kafka.api.key": "<my-kafka-api-key>",
  "kafka.api.secret": "<my-kafka-api-secret>",
  "name": "os_sink_connectors3ss2a",
  "instance.url": "https://your-opensearch-endpoint.example",
  "topics": "inventory,orders,users",
  "request.method": "POST",
  "tasks.max": "1",
  "indexes.num": "3",
  "auth.type": "BASIC",
  "connection.user": "username",
  "connection.password": "password",
  "index1.name": "users_index",
  "index1.topic": "users",
  "index2.name": "inventory_index",
  "index2.topic": "inventory",
  "index3.name": "orders_index",
  "index3.topic": "orders"
}
Note the following property definitions:
- "connector.class": Identifies the connector plugin name.
- "input.data.format": Sets the input Kafka record value format (data coming from the Kafka topic). Valid entries are AVRO, JSON_SR, PROTOBUF, JSON, or BYTES. You must have Confluent Cloud Schema Registry configured if using a schema-based message format (for example, Avro, JSON Schema, or Protobuf).
- "kafka.auth.mode": Identifies the connector authentication mode you want to use. There are two options: SERVICE_ACCOUNT or KAFKA_API_KEY (the default). To use an API key and secret, specify the configuration properties kafka.api.key and kafka.api.secret, as shown in the example configuration (above). To use a service account, specify the Resource ID in the property kafka.service.account.id=<service-account-resource-ID>. To list the available service account resource IDs, use the following command:
confluent iam service-account list
For example:
confluent iam service-account list
   Id     | Resource ID |       Name        |    Description
+---------+-------------+-------------------+-------------------
   123456 | sa-l1r23m   | sa-1              | Service account 1
   789101 | sa-l4d56p   | sa-2              | Service account 2
- "name": Sets a name for your new connector.
- "instance.url": The OpenSearch instance URL. The URL you enter should look like this: http://your-opensearch-instance.com/.
- "topics": Identifies the topic name or a comma-separated list of topic names.
- "request.method": Enter an HTTP API Request Method. Only POST requests are supported.
- "tasks.max": Enter the maximum number of tasks for the connector to use. More tasks might improve performance.
- "indexes.num": The number of indexes to push data to.
Single Message Transforms: For details about adding SMTs using the CLI, see the Single Message Transforms (SMT) documentation. For a list of SMTs that are not supported with this connector, see Unsupported transformations.
For all property values and definitions, see Configuration Properties.
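The configuration file from this step can also be generated rather than written by hand, which helps keep the numbered `indexN.name` and `indexN.topic` pairs consistent with `indexes.num`. The following is a minimal sketch, not part of the connector tooling: the `build_config` helper, its `extra` parameter, and the connector name are hypothetical, and only property names shown in the example above are used.

```python
import json

MAX_INDEXES = 5  # documented upper bound for indexes.num


def build_config(name, instance_url, topic_to_index, extra=None):
    """Build a connector configuration from a topic -> index mapping.

    `extra` carries any remaining properties (Kafka credentials,
    auth.type, connection.user, and so on) and is merged in last.
    """
    count = len(topic_to_index)
    if not 1 <= count <= MAX_INDEXES:
        raise ValueError(f"indexes.num must be between 1 and {MAX_INDEXES}, got {count}")
    config = {
        "connector.class": "OpenSearchSink",
        "name": name,
        "input.data.format": "JSON",
        "instance.url": instance_url,
        "topics": ",".join(topic_to_index),
        "tasks.max": "1",
        "indexes.num": str(count),
    }
    # Per-index properties are numbered from 1: index1.name, index1.topic, ...
    for number, (topic, index) in enumerate(topic_to_index.items(), start=1):
        config[f"index{number}.name"] = index
        config[f"index{number}.topic"] = topic
    config.update(extra or {})
    return config


example = build_config(
    "os_sink_connector",  # hypothetical connector name
    "https://your-opensearch-endpoint.example",
    {"users": "users_index", "inventory": "inventory_index", "orders": "orders_index"},
    extra={
        "kafka.auth.mode": "KAFKA_API_KEY",
        "kafka.api.key": "<my-kafka-api-key>",
        "kafka.api.secret": "<my-kafka-api-secret>",
    },
)
print(json.dumps(example, indent=2))
```

Writing the printed JSON to a file gives a configuration file of the same shape as the example in this step.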
Step 4: Load the configuration file and create the connector¶
Enter the following Confluent CLI command to load the configuration and start the connector:
confluent connect cluster create --config-file <file-name>.json
For example:
confluent connect cluster create --config-file opensearch-sink-config.json
Example output:
Created connector os_sink_connectors3ss2a lcc-ix4dl
Step 5: Check the connector status¶
Enter the following Confluent CLI command to check the connector status:
confluent connect cluster list
Example output:
ID | Name | Status | Type
+-----------+----------------------------+---------+------+
lcc-ix4dl | os_sink_connectors3ss2a | RUNNING | sink
Step 6: Check the results in OpenSearch¶
Verify new records are being added to the OpenSearch deployment.
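One way to verify this is to ask OpenSearch how many documents an index holds and repeat the check after producing more records. The sketch below assumes basic authentication and uses the OpenSearch `_count` endpoint; the helper names `count_url` and `document_count` are hypothetical, and the URL and index names are placeholders taken from the example configuration.

```python
import base64
import json
import urllib.request


def count_url(instance_url, index):
    """Build the _count URL for an index from the instance URL."""
    return f"{instance_url.rstrip('/')}/{index}/_count"


def document_count(instance_url, index, user, password):
    """Return the document count that OpenSearch reports for the index."""
    request = urllib.request.Request(count_url(instance_url, index))
    # Basic authentication, matching "auth.type": "BASIC" in the example config.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["count"]
```

For example, calling `document_count("https://your-opensearch-endpoint.example", "users_index", "username", "password")` before and after producing records to the `users` topic should show the count increasing.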
For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Managed and Custom Connectors section.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See Confluent Cloud Dead Letter Queue for details.
Configuration Properties¶
Use the following configuration properties with the OpenSearch Sink connector.
Which topics do you want to get data from?¶
topics
Identifies the topic name or a comma-separated list of topic names.
- Type: list
- Importance: high
Schema Config¶
schema.context.name
Add a schema context name. A schema context represents an independent scope in Schema Registry. It is a separate sub-schema tied to topics in different Kafka clusters that share the same Schema Registry instance. If not used, the connector uses the default schema configured for Schema Registry in your Confluent Cloud environment.
- Type: string
- Default: default
- Importance: medium
Input messages¶
input.data.format
Sets the input Kafka record value format. Valid entries are AVRO, JSON_SR, PROTOBUF, JSON or BYTES. Note that you need to have Confluent Cloud Schema Registry configured if using a schema-based message format like AVRO, JSON_SR, and PROTOBUF.
- Type: string
- Default: JSON
- Importance: high
value.converter.reference.subject.name.strategy
Set the subject reference name strategy for value. Valid entries are DefaultReferenceSubjectNameStrategy or QualifiedReferenceSubjectNameStrategy. Note that the subject reference name strategy can be selected only for PROTOBUF format with the default strategy being DefaultReferenceSubjectNameStrategy.
- Type: string
- Default: DefaultReferenceSubjectNameStrategy
- Importance: high
How should we connect to your data?¶
name
Sets a name for your connector.
- Type: string
- Valid Values: A string at most 64 characters long
- Importance: high
Kafka Cluster credentials¶
kafka.auth.mode
Kafka Authentication mode. It can be one of KAFKA_API_KEY or SERVICE_ACCOUNT. It defaults to KAFKA_API_KEY mode.
- Type: string
- Default: KAFKA_API_KEY
- Valid Values: KAFKA_API_KEY, SERVICE_ACCOUNT
- Importance: high
kafka.api.key
Kafka API Key. Required when kafka.auth.mode==KAFKA_API_KEY.
- Type: password
- Importance: high
kafka.service.account.id
The Service Account that will be used to generate the API keys to communicate with Kafka Cluster.
- Type: string
- Importance: high
kafka.api.secret
Secret associated with Kafka API key. Required when kafka.auth.mode==KAFKA_API_KEY.
- Type: password
- Importance: high
Consumer configuration¶
max.poll.interval.ms
The maximum delay between subsequent consume requests to Kafka. This configuration property may be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 300000 milliseconds (5 minutes).
- Type: long
- Default: 300000 (5 minutes)
- Valid Values: [60000,…,1800000]
- Importance: low
max.poll.records
The maximum number of records to consume from Kafka in a single request. This configuration property may be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 500 records.
- Type: long
- Default: 500
- Valid Values: [1,…,500]
- Importance: low
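The valid ranges for the two consumer properties above can be checked before a configuration is submitted. This is an illustrative sketch only; `check_consumer_settings` is a hypothetical helper, and the limits are copied from the Valid Values entries above.

```python
# Valid ranges as documented for the consumer configuration properties.
LIMITS = {
    "max.poll.interval.ms": (60_000, 1_800_000),
    "max.poll.records": (1, 500),
}


def check_consumer_settings(config):
    """Return a list of messages for consumer settings outside their valid range."""
    errors = []
    for key, (low, high) in LIMITS.items():
        if key not in config:
            continue  # unset properties fall back to the documented default
        value = int(config[key])
        if not low <= value <= high:
            errors.append(f"{key}={value} is outside [{low}, {high}]")
    return errors
```

An empty list means both properties are either unset or within range.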
Number of tasks for this connector¶
tasks.max
Maximum number of tasks for the connector.
- Type: int
- Valid Values: [1,…]
- Importance: high
Authentication¶
instance.url
The OpenSearch instance URL. For example: https://your-opensearch-instance.com/.
- Type: string
- Importance: high
auth.type
Authentication type of the endpoint. Valid values are NONE and BASIC.
- Type: string
- Default: BASIC
- Importance: high
connection.user
The username to be used with an endpoint requiring basic authentication.
- Type: string
- Importance: medium
connection.password
The password to be used with an endpoint requiring basic authentication.
- Type: password
- Importance: medium
opensearch.ssl.enabled
Whether or not to connect to the endpoint via SSL.
- Type: boolean
- Default: false
- Importance: medium
opensearch.ssl.keystorefile
The key store containing the server certificate.
- Type: password
- Importance: low
opensearch.ssl.keystore.password
The store password for the key store file.
- Type: password
- Importance: high
opensearch.ssl.key.password
The password for the private key in the key store file.
- Type: password
- Importance: high
opensearch.ssl.truststorefile
The trust store containing a server CA certificate.
- Type: password
- Importance: high
opensearch.ssl.truststore.password
The password for the trust store containing a server CA certificate.
- Type: password
- Importance: high
opensearch.ssl.protocol
The protocol to use for SSL connections.
- Type: string
- Default: TLSv1.3
- Importance: medium
Behavior On Error¶
behavior.on.error
The behavior for handling error responses from HTTP requests. Valid values are IGNORE and FAIL.
- Type: string
- Default: FAIL
- Importance: low
Indexes¶
indexes.num
The number of indexes to push data to. This value must be less than or equal to 5.
- Type: int
- Default: 1
- Valid Values: [1,…,5]
- Importance: high
Retry Configs¶
retry.backoff.policy
The backoff policy to use for retries: CONSTANT_VALUE or EXPONENTIAL_WITH_JITTER.
- Type: string
- Default: EXPONENTIAL_WITH_JITTER
- Importance: medium
retry.backoff.ms
The initial duration in milliseconds to wait following an error before a retry attempt is made. Subsequent backoff attempts can be a constant value or exponential with jitter (configured using the retry.backoff.policy parameter). Jitter adds randomness to the exponential backoff algorithm to prevent synchronized retries.
- Type: int
- Default: 3000 (3 seconds)
- Valid Values: [100,…]
- Importance: medium
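To make the two backoff policies concrete, the sketch below computes a wait time per retry attempt. The exact formula the connector uses is not documented here, so this assumes a common "full jitter" scheme in which the wait is drawn uniformly between 0 and an exponentially growing ceiling; `backoff_delay_ms` is a hypothetical helper.

```python
import random


def backoff_delay_ms(attempt, base_ms=3000, policy="EXPONENTIAL_WITH_JITTER", rng=random):
    """Return the wait in milliseconds before retry number `attempt` (1-based)."""
    if policy == "CONSTANT_VALUE":
        return float(base_ms)  # same wait before every retry
    if policy == "EXPONENTIAL_WITH_JITTER":
        ceiling = base_ms * 2 ** (attempt - 1)  # 3000, 6000, 12000, ...
        # Assumption: full jitter, a uniform draw between 0 and the ceiling.
        return rng.uniform(0, ceiling)
    raise ValueError(f"unknown policy: {policy}")
```

With the default `retry.backoff.ms` of 3000, the constant policy always waits 3 seconds, while under this assumed scheme the third retry waits a random time of up to 12 seconds, so that tasks failing at the same moment do not all retry together.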
retry.on.status.codes
Comma-separated list of HTTP status codes or range of codes to retry on. Ranges are specified with start and optional end code. Range boundaries are inclusive. For instance, 400- includes all codes greater than or equal to 400. 400-500 includes codes from 400 to 500, including 500. Multiple ranges and single codes can be specified together to achieve fine-grained control over retry behavior. For example, 404,408,500- will retry on 404 NOT FOUND, 408 REQUEST TIMEOUT, and all 5xx error codes. Note that some status codes will always be retried, such as unauthorized, timeouts and too many requests.
- Type: string
- Default: 400-
- Importance: medium
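The range syntax described for `retry.on.status.codes` can be illustrated with a small parser. This is a sketch of the documented syntax only, not the connector's implementation; the helper names are hypothetical, and the status codes that the connector always retries regardless of this setting are not modeled.

```python
def parse_retry_codes(spec):
    """Turn a spec such as "404,408,500-" into inclusive (start, end) ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, dash, end = part.partition("-")
        if not dash:
            ranges.append((int(start), int(start)))  # single code
        else:
            # An omitted end code means the range has no upper bound.
            ranges.append((int(start), int(end) if end else float("inf")))
    return ranges


def should_retry(status, spec):
    """Return True if the status code falls inside any configured range."""
    return any(low <= status <= high for low, high in parse_retry_codes(spec))
```

For instance, `should_retry(503, "404,408,500-")` is true because 503 falls in the open-ended range starting at 500, while `should_retry(405, "404,408,500-")` is false.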
max.retries
The maximum number of times to retry on errors before failing the task.
- Type: int
- Default: 3
- Importance: medium
Index 1 configuration¶
index1.name
The index name together with the OpenSearch Instance URL will form the complete HTTP(S) URL. This path can be templated with offset information.
- Type: string
- Importance: high
index1.topic
Topic from where data will be pulled for this Index
- Type: string
- Default: “”
- Importance: high
index1.behavior.on.null.values
How to handle records with a non-null key and a null value (that is, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.
- Type: string
- Default: IGNORE
- Importance: low
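The three options can be read as a per-record decision. The sketch below reflects an interpretation based on the option names, since their exact semantics are not spelled out here: IGNORE skips the tombstone, DELETE removes the corresponding document, and FAIL raises an error. The function `action_for_record` and its return values are hypothetical.

```python
def action_for_record(key, value, behavior="IGNORE"):
    """Return "index", "skip", or "delete" for a record; raise when behavior is FAIL."""
    if value is not None:
        return "index"  # ordinary record: write it to the index
    # From here on the record is a tombstone (non-null key, null value).
    if behavior == "IGNORE":
        return "skip"
    if behavior == "DELETE":
        return "delete"  # assumed: delete the document associated with the key
    if behavior == "FAIL":
        raise ValueError(f"tombstone record received for key {key!r}")
    raise ValueError(f"unknown behavior: {behavior}")
```

Because the property is set per index, each of the configured indexes can apply a different choice.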
index1.batch.size
The number of records to send to OpenSearch in one batch. Note that Basic and Standard Clusters may experience throughput limitations, even with a higher batch size.
- Type: int
- Default: 1
- Importance: low
Index 2 configuration¶
index2.name
The index name together with the OpenSearch Instance URL will form the complete HTTP(S) URL. This path can be templated with offset information.
- Type: string
- Importance: high
index2.topic
Topic from where data will be pulled for this Index
- Type: string
- Default: “”
- Importance: high
index2.behavior.on.null.values
How to handle records with a non-null key and a null value (that is, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.
- Type: string
- Default: IGNORE
- Importance: low
index2.batch.size
The number of records to send to OpenSearch in one batch. Note that Basic and Standard Clusters may experience throughput limitations, even with a higher batch size.
- Type: int
- Default: 1
- Importance: low
Index 3 configuration¶
index3.name
The index name together with the OpenSearch Instance URL will form the complete HTTP(S) URL. This path can be templated with offset information.
- Type: string
- Importance: high
index3.topic
Topic from where data will be pulled for this Index
- Type: string
- Default: “”
- Importance: high
index3.behavior.on.null.values
How to handle records with a non-null key and a null value (that is, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.
- Type: string
- Default: IGNORE
- Importance: low
index3.batch.size
The number of records to send to OpenSearch in one batch. Note that Basic and Standard Clusters may experience throughput limitations, even with a higher batch size.
- Type: int
- Default: 1
- Importance: low
Index 4 configuration¶
index4.name
The index name together with the OpenSearch Instance URL will form the complete HTTP(S) URL. This path can be templated with offset information.
- Type: string
- Importance: high
index4.topic
Topic from where data will be pulled for this Index
- Type: string
- Default: “”
- Importance: high
index4.behavior.on.null.values
How to handle records with a non-null key and a null value (that is, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.
- Type: string
- Default: IGNORE
- Importance: low
index4.batch.size
The number of records to send to OpenSearch in one batch. Note that Basic and Standard Clusters may experience throughput limitations, even with a higher batch size.
- Type: int
- Default: 1
- Importance: low
Index 5 configuration¶
index5.name
The index name together with the OpenSearch Instance URL will form the complete HTTP(S) URL. This path can be templated with offset information.
- Type: string
- Importance: high
index5.topic
Topic from where data will be pulled for this Index
- Type: string
- Default: “”
- Importance: high
index5.behavior.on.null.values
How to handle records with a non-null key and a null value (that is, Kafka tombstone records). Valid options are IGNORE, DELETE, and FAIL.
- Type: string
- Default: IGNORE
- Importance: low
index5.batch.size
The number of records to send to OpenSearch in one batch. Note that Basic and Standard Clusters may experience throughput limitations, even with a higher batch size.
- Type: int
- Default: 1
- Importance: low
Next Steps¶
For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. This example also shows how to use Confluent CLI to manage your resources in Confluent Cloud.