Google Cloud Functions Gen 2 Sink Connector for Confluent Cloud¶
The fully-managed Google Cloud Functions Gen 2 Sink connector for Confluent Cloud moves data from an Apache Kafka® topic to a specified Google Cloud Function. The connector supports Avro, JSON Schema, JSON (schemaless), and Protobuf data formats from Kafka topics.
Features¶
The Google Cloud Functions Gen 2 Sink connector includes the following features:
- Google Cloud Functions Gen 2 and Gen 1 support: The connector supports both Gen 2 and Gen 1 functions while delivering improved performance.
- Secure access and data exchange: The connector supports the following authentication mechanisms:
- Google Cloud Service Account
- None
- API error reporting management: You can configure the connector to notify you through email or through the Confluent Cloud user interface when an API error occurs. You can also configure the connector to ignore API errors.
- Supported data formats: The connector supports Avro, Bytes, JSON (schemaless), JSON Schema, and Protobuf data formats. Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON Schema, or Protobuf). For additional information, see Schema Registry Enabled Environments.
- Schema Registry and Schema Context support: The connector allows you to map an API to a specific schema context so that you can leverage the schema context feature in different environments.
- Custom offset support: The connector allows you to configure custom offsets using the Confluent Cloud Console to prevent data loss and data duplication.
- Configurable retry functionality: The connector allows you to customize retry settings based on your requirements.
For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Connect Usage Examples section.
Limitations¶
Be sure to review the following information.
- If you plan to use one or more Single Message Transforms (SMTs), see SMT Limitations.
- If you plan to use Confluent Cloud Schema Registry, see Schema Registry Enabled Environments.
- The connector supports invoking only a single function.
- The target Google Function should be in the same region as your Confluent Cloud cluster.
- The connector is only supported in Google Cloud clusters.
- Messages in the reporter topic can be out of order relative to the order in which the records were provided.
Quick Start¶
Use this quick start to get up and running with the Google Cloud Functions Gen 2 Sink connector on Confluent Cloud.
Prerequisites¶
- Authorized access to a Confluent Cloud cluster on Amazon Web Services (AWS), Microsoft Azure (Azure), or Google Cloud.
- The Confluent CLI installed and configured for the cluster. For help, see Install the Confluent CLI.
- Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). For more information, see Schema Registry Enabled Environments.
- At least one source Kafka topic must exist in your Confluent Cloud cluster before creating the sink connector.
Using the Confluent Cloud Console¶
Step 1: Launch your Confluent Cloud cluster¶
See the Quick Start for Confluent Cloud for installation instructions.
Step 2: Add a connector¶
In the left navigation menu, click Connectors. If you already have connectors in your cluster, click + Add connector.
Step 4: Enter the connector details¶
Note
- Ensure you have all your prerequisites completed.
- An asterisk ( * ) designates a required entry.
At the Add Google Cloud Functions Gen 2 Sink Connector screen, complete the following:
If you’ve already populated your Kafka topics, select the topics you want to connect from the Topics list.
To create a new topic, click +Add new topic.
- Select the way you want to provide Kafka Cluster credentials. You can choose one of the following options:
- My account: This setting allows your connector to globally access everything that you have access to. With a user account, the connector uses an API key and secret to access the Kafka cluster. This option is not recommended for production.
- Service account: This setting limits the access for your connector by using a service account. This option is recommended for production.
- Use an existing API key: This setting allows you to specify an API key and a secret pair. You can use an existing pair or create a new one. This method is not recommended for production environments.
- Click Continue.
- In the Authentication Type field, select the authentication type for the given function. Valid options are Google Cloud Service Account and None.
- Click Continue.
- Select the Input Kafka record value format (data coming from the Kafka topic): AVRO, BYTES, JSON, JSON_SR, or PROTOBUF. A valid schema must be available in Schema Registry to use a schema-based message format (for example, Avro, JSON Schema, or Protobuf). For more information, see Schema Registry Enabled Environments. Note that to consume STRING data, select schemaless JSON.
- Function Name: Name of the function to be invoked
- Region Name: Region of the given function to be invoked as in ‘https://<region-name>-<project-id>.cloudfunctions.net/’.
- Project ID: Project identifier for the given function to be invoked as in ‘https://<region-name>-<project-id>.cloudfunctions.net/’.
Show advanced configurations
Schema context: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the Default context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a Source connector uses only that schema context to register a schema and a Sink connector uses only that schema context to read from. For more information about setting up a schema context, see What are schema contexts and when should you use them?.
Behavior On Errors: Select the error handling behavior setting for handling error responses from HTTP requests. Valid options are IGNORE and FAIL. This defaults to FAIL.
Max poll interval (ms): The maximum delay between subsequent consume requests to Kafka. This configuration property can be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 300000 milliseconds (5 minutes).
Max poll records: The maximum number of records to consume from Kafka in a single request. This configuration property can be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 500 records.
Retry Backoff Policy: The backoff policy to use for retries. Must be configured to CONSTANT_VALUE or EXPONENTIAL_WITH_JITTER.
Retry Backoff (ms): The time in milliseconds to wait following an error before the connector retries the task.
Retry HTTP Status Codes: The HTTP response status codes returned that prompt the connector to retry the request. Enter a comma-separated list of codes or range of codes. Ranges are specified with a start and optional end code. Range boundaries are inclusive. For example: 400- includes all codes greater than or equal to 400 and 400-500 includes codes from 400 to 500, including 500. Multiple ranges and single codes can be specified together to achieve fine-grained control over retry behavior. For example: 404,408,500- prompts the connector to retry on 404 NOT FOUND, 408 REQUEST TIMEOUT, and all 5xx error codes. Note that some status codes are always retried, such as unauthorized, timeouts, and too many requests.
Maximum Retries: The maximum number of times the connector retries a request when an error occurs, before the task fails.
Behavior for null valued records: Behavior of the connector when it encounters a record with a null value. Valid options are IGNORE and FAIL. This defaults to IGNORE.
Connect timeout (ms): Timeout for the connection to the Google Cloud Functions. Default is 30000 ms.
Request timeout (ms): Timeout for the request to the Google Cloud Functions. Default is 30000 ms.
Batch max size: The number of records accumulated in a batch before the Google Cloud Functions API is invoked. Default is 1.
Batch json as array: Whether or not to use an array to bundle JSON records. Setting this to true will send records as a JSON array. Default is false.
For information about transforms and predicates, see the Single Message Transforms (SMT) documentation. For a list of SMTs that are not supported with this connector, see Unsupported transformations.
For all property values and definitions, see Configuration Properties.
- Click Continue.
Based on the number of topic partitions you select, you will be provided with a recommended number of tasks.
- To change the number of recommended tasks, enter the number of tasks for the connector to use in the Tasks field.
- Click Continue.
Verify the connection details.
Click Continue.
The status for the connector should go from Provisioning to Running.
Step 5: Check for records¶
Verify that records are being produced at the endpoint.
For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Connect Usage Examples section.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See View Connector Dead Letter Queue Errors in Confluent Cloud for details.
Using the Confluent CLI¶
To set up and run the connector using the Confluent CLI, complete the following steps. Before you begin, ensure you have met all prerequisites.
Step 1: List the available connectors¶
Enter the following command to list available connectors:
confluent connect plugin list
Step 2: List the connector configuration properties¶
Enter the following command to show the connector configuration properties:
confluent connect plugin describe <connector-plugin-name>
The command output shows the required and optional configuration properties.
Step 3: Create the connector configuration file¶
Create a JSON file that contains the connector configuration properties. The following example shows the required connector properties.
{
"topics": "topic_0",
"schema.context.name": "default",
"input.data.format": "JSON",
"connector.class": "GoogleCloudFunctionsGen2Sink",
"name": "GoogleCloudFunctionsGen2SinkConnector_0",
"kafka.auth.mode": "KAFKA_API_KEY",
"kafka.api.key": "****************",
"kafka.api.secret": "****************************************************************",
"max.poll.interval.ms": "300000",
"max.poll.records": "500",
"tasks.max": "1",
"gcf.auth.type": "Google Cloud Service Account",
"gcp.credentials.json": "*\n*************************\n",
"behavior.on.error": "FAIL",
"max.retries": "5",
"retry.backoff.policy": "EXPONENTIAL_WITH_JITTER",
"retry.backoff.ms": "3000",
"retry.on.status.codes": "401,429,500-",
"gcf.connect.timeout.ms": "30000",
"gcf.request.timeout.ms": "30000",
"behavior.on.null.values": "IGNORE",
"gcf.name": "function-1",
"gcf.region.name": "us-central1",
"gcf.project.id": "connect-2024",
"max.batch.size": "1",
"batch.json.as.array": "false"
}
Note the following property definitions:
"connector.class": Identifies the connector plugin name.
"input.data.format": Sets the input Kafka record value format (data coming from the Kafka topic). Valid entries are AVRO, JSON_SR, PROTOBUF, or JSON. You must have Confluent Cloud Schema Registry configured if using a schema-based message format (for example, Avro, JSON Schema, or Protobuf).
"kafka.auth.mode": Identifies the connector authentication mode you want to use. There are two options: SERVICE_ACCOUNT or KAFKA_API_KEY (the default). To use an API key and secret, specify the configuration properties kafka.api.key and kafka.api.secret, as shown in the example configuration (above). To use a service account, specify the Resource ID in the property kafka.service.account.id=<service-account-resource-ID>. To list the available service account resource IDs, use the following command:
confluent iam service-account list
For example:
confluent iam service-account list
   Id     | Resource ID |       Name        |    Description
+---------+-------------+-------------------+-------------------
   123456 | sa-l1r23m   | sa-1              | Service account 1
   789101 | sa-l4d56p   | sa-2              | Service account 2
"name": Sets a name for your new connector.
"topics": Identifies the topic name or a comma-separated list of topic names.
"tasks.max": Enter the maximum number of tasks for the connector to use. More tasks might improve performance.
"gcf.name": Name of the function to be invoked.
"gcf.region.name": Region of the given function to be invoked as in ‘https://<region-name>-<project-id>.cloudfunctions.net/’.
"gcf.project.id": Project ID for the given function to be invoked as in ‘https://<region-name>-<project-id>.cloudfunctions.net/’.
"gcf.auth.type": Authentication type for the given function. Currently the connector supports Google Cloud Service Account authentication and unauthorized invocation.
Single Message Transforms: For details about adding SMTs using the CLI, see the Single Message Transforms (SMT) documentation. For all property values and descriptions, see Configuration Properties.
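Before loading the file, a quick local check can catch missing properties. The following Python sketch is not part of the Confluent CLI. The `REQUIRED` set and the `check_config` helper are hypothetical names chosen for illustration, and the set of keys treated as required is an assumption drawn from the property definitions above.

```python
import json

# Assumed list of properties to check, based on the definitions above.
# The connector itself may require more or fewer properties.
REQUIRED = {
    "connector.class", "name", "topics", "input.data.format",
    "tasks.max", "gcf.name", "gcf.region.name", "gcf.project.id",
}


def check_config(config):
    """Return a list of problems found in a connector config dict."""
    missing = sorted(REQUIRED - set(config))
    problems = [f"missing property: {key}" for key in missing]

    # KAFKA_API_KEY is the documented default for kafka.auth.mode.
    mode = config.get("kafka.auth.mode", "KAFKA_API_KEY")
    if mode == "KAFKA_API_KEY":
        for key in ("kafka.api.key", "kafka.api.secret"):
            if key not in config:
                problems.append(f"{key} is required for KAFKA_API_KEY mode")
    elif mode == "SERVICE_ACCOUNT":
        if "kafka.service.account.id" not in config:
            problems.append(
                "kafka.service.account.id is required for SERVICE_ACCOUNT mode"
            )
    else:
        problems.append(f"unknown kafka.auth.mode: {mode}")
    return problems


def check_config_file(path):
    """Load a JSON config file and run the same checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_config(json.load(handle))
```

An empty list from `check_config_file("<file-name>.json")` means none of these checks failed. It does not guarantee that the connector accepts the configuration.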
Step 4: Load the properties file and create the connector¶
To load the configuration and start the connector, run the following Confluent CLI command:
confluent connect cluster create --config-file <file-name>.json
For example:
confluent connect cluster create --config-file google-cloud-functions-gen2-sink-config.json
Example output:
Created connector GoogleCloudFunctionsGen2SinkConnector_0 lcc-do6vzd
Step 5: Check the connector status¶
To check the connector status, run the following Confluent CLI command:
confluent connect cluster list
Example output:
ID | Name | Status | Type | Trace |
+------------+--------------------------------------------+---------+------+-------+
lcc-do6vzd | GoogleCloudFunctionsGen2SinkConnector_0 | RUNNING | sink | |
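If you script this check, the table output can be parsed with a few lines of Python. This is a minimal sketch that assumes the pipe-delimited layout shown in the example output and assumes connector IDs begin with `lcc-`, as in the example. The `connector_statuses` helper is a hypothetical name, not a Confluent tool.

```python
def connector_statuses(table_text):
    """Map connector name to status from pipe-delimited table text."""
    statuses = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the header row, the +---+ separator, and anything else
        # that does not start with a connector ID (assumed "lcc-" prefix).
        if len(cells) < 3 or not cells[0].startswith("lcc-"):
            continue
        statuses[cells[1]] = cells[2]
    return statuses


example = """\
ID | Name | Status | Type | Trace |
+------------+--------------------------------------------+---------+------+-------+
lcc-do6vzd | GoogleCloudFunctionsGen2SinkConnector_0 | RUNNING | sink | |
"""
print(connector_statuses(example))
```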
Step 6: Check for records¶
Verify that records are populating the endpoint.
For more information and examples to use with the Confluent Cloud API for Connect, see the Confluent Cloud API for Connect Usage Examples section.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See View Connector Dead Letter Queue Errors in Confluent Cloud for details.
Configuration Properties¶
Use the following configuration properties with the fully-managed Google Cloud Functions Gen 2 Sink connector.
Which topics do you want to get data from?¶
topics
Identifies the topic name or a comma-separated list of topic names.
- Type: list
- Importance: high
Schema Config¶
schema.context.name
Add a schema context name. A schema context represents an independent scope in Schema Registry. It is a separate sub-schema tied to topics in different Kafka clusters that share the same Schema Registry instance. If not used, the connector uses the default schema configured for Schema Registry in your Confluent Cloud environment.
- Type: string
- Default: default
- Importance: medium
Input messages¶
input.data.format
Sets the input Kafka record value format. Valid entries are AVRO, JSON_SR, PROTOBUF, JSON or BYTES. Note that you need to have Confluent Cloud Schema Registry configured if using a schema-based message format like AVRO, JSON_SR, and PROTOBUF.
- Type: string
- Default: JSON
- Importance: high
value.converter.reference.subject.name.strategy
Set the subject reference name strategy for value. Valid entries are DefaultReferenceSubjectNameStrategy or QualifiedReferenceSubjectNameStrategy. Note that the subject reference name strategy can be selected only for PROTOBUF format with the default strategy being DefaultReferenceSubjectNameStrategy.
- Type: string
- Default: DefaultReferenceSubjectNameStrategy
- Importance: high
How should we connect to your data?¶
name
Sets a name for your connector.
- Type: string
- Valid Values: A string at most 64 characters long
- Importance: high
Kafka Cluster credentials¶
kafka.auth.mode
Kafka Authentication mode. It can be one of KAFKA_API_KEY or SERVICE_ACCOUNT. It defaults to KAFKA_API_KEY mode.
- Type: string
- Default: KAFKA_API_KEY
- Valid Values: KAFKA_API_KEY, SERVICE_ACCOUNT
- Importance: high
kafka.api.key
Kafka API Key. Required when kafka.auth.mode==KAFKA_API_KEY.
- Type: password
- Importance: high
kafka.service.account.id
The Service Account that will be used to generate the API keys to communicate with Kafka Cluster.
- Type: string
- Importance: high
kafka.api.secret
Secret associated with Kafka API key. Required when kafka.auth.mode==KAFKA_API_KEY.
- Type: password
- Importance: high
Consumer configuration¶
max.poll.interval.ms
The maximum delay between subsequent consume requests to Kafka. This configuration property may be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 300000 milliseconds (5 minutes).
- Type: long
- Default: 300000 (5 minutes)
- Valid Values: [60000,…,1800000] for non-dedicated clusters and [60000,…] for dedicated clusters
- Importance: low
max.poll.records
The maximum number of records to consume from Kafka in a single request. This configuration property may be used to improve the performance of the connector, if the connector cannot send records to the sink system. Defaults to 500 records.
- Type: long
- Default: 500
- Valid Values: [1,…,500] for non-dedicated clusters and [1,…] for dedicated clusters
- Importance: low
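The valid ranges for the two consumer properties differ by cluster type. The following sketch encodes the ranges listed above as a local pre-check. The `check_consumer_settings` helper and its `dedicated` flag are hypothetical names used only for illustration.

```python
def check_consumer_settings(max_poll_interval_ms, max_poll_records, dedicated=False):
    """Return a list of range violations; an empty list means both values fit."""
    problems = []
    # Lower bounds apply to every cluster type.
    if max_poll_interval_ms < 60000:
        problems.append("max.poll.interval.ms must be at least 60000")
    if max_poll_records < 1:
        problems.append("max.poll.records must be at least 1")
    # Upper bounds apply only to non-dedicated clusters.
    if not dedicated:
        if max_poll_interval_ms > 1800000:
            problems.append("max.poll.interval.ms must be at most 1800000")
        if max_poll_records > 500:
            problems.append("max.poll.records must be at most 500")
    return problems
```

For example, `check_consumer_settings(2000000, 1000)` reports two violations for a non-dedicated cluster, while the same values pass with `dedicated=True`.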
Number of tasks for this connector¶
tasks.max
Maximum number of tasks for the connector.
- Type: int
- Valid Values: [1,…]
- Importance: high
Authentication¶
gcf.auth.type
Authentication method of the connector. Valid values are None and Google Cloud Service Account.
- Type: string
- Default: Google Cloud Service Account
- Importance: high
gcp.credentials.json
GCP service account JSON file.
- Type: password
- Importance: high
Behavior on Error¶
behavior.on.error
Error handling behavior setting for handling error responses from HTTP requests.
- Type: string
- Default: FAIL
- Importance: low
Retry configurations¶
max.retries
The maximum number of times to retry on errors before failing the task.
- Type: int
- Default: 5
- Importance: medium
retry.backoff.policy
The backoff policy to use for retries: CONSTANT_VALUE or EXPONENTIAL_WITH_JITTER.
- Type: string
- Default: EXPONENTIAL_WITH_JITTER
- Importance: medium
retry.backoff.ms
The initial duration in milliseconds to wait following an error before a retry attempt is made. Subsequent backoff attempts can be a constant value or exponential with jitter (can be configured using retry.backoff.policy parameter). Jitter adds randomness to the exponential backoff algorithm to prevent synchronized retries.
- Type: int
- Default: 3000 (3 seconds)
- Valid Values: [100,…]
- Importance: medium
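To make the two policies concrete, the sketch below computes a delay for a given retry attempt. The connector's exact formula is not documented here, so the doubling factor and the "full jitter" choice (a uniform draw between 0 and the exponential ceiling) are assumptions. The `backoff_ms` helper is a hypothetical name.

```python
import random


def backoff_ms(attempt, base_ms=3000, policy="EXPONENTIAL_WITH_JITTER", rng=random):
    """Delay in milliseconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")
    if policy == "CONSTANT_VALUE":
        # Every retry waits the same configured duration.
        return float(base_ms)
    if policy == "EXPONENTIAL_WITH_JITTER":
        # Assumed behavior: the ceiling doubles on each attempt, and the
        # actual delay is drawn uniformly below it to spread out retries.
        ceiling = base_ms * 2 ** (attempt - 1)
        return rng.uniform(0, ceiling)
    raise ValueError(f"unknown policy: {policy}")


for attempt in range(1, 6):
    print(attempt, round(backoff_ms(attempt)))
```

Because the jittered delay is random, two tasks that fail at the same moment are unlikely to retry at the same moment.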
retry.on.status.codes
Comma-separated list of HTTP status codes or range of codes to retry on. Ranges are specified with start and optional end code. Range boundaries are inclusive. For instance, 400- includes all codes greater than or equal to 400. 400-500 includes codes from 400 to 500, including 500. Multiple ranges and single codes can be specified together to achieve fine-grained control over retry behavior. For example, 404,408,500- will retry on 404 NOT FOUND, 408 REQUEST TIMEOUT, and all 5xx error codes. Note that some status codes will always be retried, such as unauthorized, timeouts and too many requests.
- Type: string
- Default: 401,429,500-
- Importance: medium
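The range syntax can be read as follows. This is a minimal sketch of one way to interpret the documented format, not the connector's own parser. The `parse_retry_codes` and `should_retry` helpers are hypothetical names, and the status codes that are always retried are not modeled.

```python
def parse_retry_codes(spec):
    """Parse a spec such as "404,408,500-" into inclusive (start, end) pairs.

    An end of None marks an open-ended range such as "500-".
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, _, end = part.partition("-")
            ranges.append((int(start), int(end) if end else None))
        else:
            code = int(part)
            ranges.append((code, code))
    return ranges


def should_retry(status, ranges):
    """True when the status code falls inside any configured range."""
    return any(
        low <= status and (high is None or status <= high)
        for low, high in ranges
    )


default_ranges = parse_retry_codes("401,429,500-")
print(should_retry(503, default_ranges))  # True: 503 falls in the open range 500-
print(should_retry(404, default_ranges))  # False: 404 is not listed in the default
```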
Connection configurations¶
gcf.connect.timeout.ms
The time in milliseconds to wait for a connection to be established.
- Type: int
- Default: 30000 (30 seconds)
- Valid Values: [1000,…,600000]
- Importance: medium
gcf.request.timeout.ms
The time in milliseconds to wait for a request response from the server.
- Type: int
- Default: 30000 (30 seconds)
- Valid Values: [1000,…,600000]
- Importance: medium
Behavior on records¶
behavior.on.null.values
How to handle records with a non-null key and a null value (that is, Kafka tombstone records). Valid options are IGNORE and FAIL.
- Type: string
- Default: IGNORE
- Importance: low
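The two options can be pictured with a small sketch. It illustrates the documented semantics only and is not the connector's code. The `filter_null_values` helper is a hypothetical name, and records are modeled as simple (key, value) pairs.

```python
def filter_null_values(records, behavior="IGNORE"):
    """Apply the null-value policy to a list of (key, value) pairs."""
    if behavior not in ("IGNORE", "FAIL"):
        raise ValueError(f"unknown behavior: {behavior}")
    kept = []
    for key, value in records:
        if value is None:
            if behavior == "FAIL":
                # FAIL stops processing at the first tombstone.
                raise ValueError(f"null value for key {key!r}")
            # IGNORE drops the tombstone and moves on.
            continue
        kept.append((key, value))
    return kept


records = [("a", {"id": 1}), ("b", None), ("c", {"id": 3})]
print(filter_null_values(records))
```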
Google Cloud Functions Configurations¶
gcf.name
Name of the function to be invoked.
- Type: string
- Importance: high
gcf.region.name
Region of the given function to be invoked as in ‘https://<region-name>-<project-id>.cloudfunctions.net/’
- Type: string
- Importance: high
gcf.project.id
Project ID for the given function to be invoked as in ‘https://<region-name>-<project-id>.cloudfunctions.net/’
- Type: string
- Importance: high
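The three properties combine into the function endpoint. The sketch below follows the URL pattern quoted above. Appending the function name as the path segment is an assumption, and the `function_url` helper is a hypothetical name.

```python
def function_url(region_name, project_id, function_name):
    """Build the endpoint from gcf.region.name, gcf.project.id, and gcf.name."""
    # The host follows the pattern quoted in the property descriptions.
    # Using the function name as the path is an assumption.
    return f"https://{region_name}-{project_id}.cloudfunctions.net/{function_name}"


print(function_url("us-central1", "connect-2024", "function-1"))
```

With the values from the example configuration, this prints `https://us-central1-connect-2024.cloudfunctions.net/function-1`.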
Batch Configurations¶
max.batch.size
The number of records accumulated in a batch before the Google Cloud Functions API is invoked.
- Type: int
- Default: 1
- Importance: high
batch.json.as.array
Whether or not to use an array to bundle JSON records. Setting this to true sends records as a JSON array.
- Type: boolean
- Default: false
- Importance: high
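The interaction between the two batch properties can be sketched as follows. This is an illustration under stated assumptions, not the connector's implementation: it assumes that without array bundling each record is sent as its own request body, which the descriptions above do not confirm. The `build_payloads` helper is a hypothetical name.

```python
import json


def build_payloads(records, max_batch_size=1, as_array=False):
    """Group record values into JSON request bodies."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be 1 or greater")
    payloads = []
    for start in range(0, len(records), max_batch_size):
        batch = records[start:start + max_batch_size]
        if as_array:
            # One body per batch, with the records bundled in a JSON array.
            payloads.append(json.dumps(batch))
        else:
            # Assumption: without bundling, each record is its own body.
            payloads.extend(json.dumps(record) for record in batch)
    return payloads


records = [{"id": 1}, {"id": 2}, {"id": 3}]
print(build_payloads(records, max_batch_size=2, as_array=True))
```

With three records, a batch size of 2, and array bundling enabled, the sketch produces two bodies: an array of two records followed by an array of one.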
Next Steps¶
For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. This example also shows how to use Confluent CLI to manage your resources in Confluent Cloud.