Snowflake Sink Connector for Confluent Cloud

Note

If you are installing the connector locally for Confluent Platform, see the Snowflake Connector for Kafka documentation.

The Kafka Connect Snowflake Sink connector for Confluent Cloud maps and persists events from Apache Kafka® topics directly to a Snowflake database, exposing the data to services for querying, enrichment, and analytics. The connector supports Avro, JSON Schema, Protobuf, or JSON (schemaless) data from Kafka topics.

Important

If you are still on Confluent Cloud Enterprise, please contact your Confluent Account Executive for more information about using this connector.

Features

The Snowflake sink connector provides the following features:

  • Database authentication: Uses private key authentication.
  • Input data formats: The connector supports Avro, JSON Schema, Protobuf, or JSON (schemaless) input data formats. Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
  • Select configuration properties: The following properties determine what metadata is included in the RECORD_METADATA column in the Snowflake database table.
    • snowflake.metadata.createtime: If this value is set to false, the CreateTime property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
    • snowflake.metadata.topic: If this value is set to false, the topic property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
    • snowflake.metadata.offset.and.partition: If this value is set to false, the Offset and Partition property values are omitted from the metadata in the RECORD_METADATA column. The default value is true.
    • snowflake.metadata.all: If this value is set to false, the metadata in the RECORD_METADATA column is completely empty. The default value is true.

Configuration properties that are not shown in the Confluent Cloud UI use the default values. For more information, see the Snowflake Sink Connector Configuration Properties.

For more information, see the Confluent Cloud connector limitations.

Target table naming guidelines

Note the following table naming guidelines and limitations:

  • Confluent Cloud and the Confluent Cloud managed Snowflake Sink connector do not allow you to configure topic:table name mapping, which is supported by the self-managed Snowflake Sink connector.

  • Snowflake itself has limitations on object (table) naming conventions. See Identifier Requirements for details.

  • Kafka is much more permissive with topic naming conventions, so a valid Kafka topic name is not necessarily a valid Snowflake table name.

    When a Kafka topic name does not conform to Snowflake’s table naming limitations (for example, my-topic-name), the connector creates the table under a safe name with an appended hash (for example, my_topic_name_021342). A conforming topic name (for example, my_topic_name) sends records to a table with the same name, my_topic_name.

  • Because the connector adjusts non-conforming names, different topics can map to the same table name. For example, the topics numbers+x and numbers-x would both map to NUMBERS_X. To avoid this duplication, the connector appends a suffix to the table name: an underscore followed by a generated hash.
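The renaming behavior described above can be illustrated with a short Python sketch. It is not the connector's implementation: the validity rule assumes Snowflake's unquoted-identifier rules (a letter or underscore first, then letters, digits, underscores, or dollar signs), and `sketch_table_name` with its CRC-32 suffix is a hypothetical stand-in for the connector's own, undocumented hash. Snowflake also stores unquoted identifiers in uppercase, which is why the example above refers to NUMBERS_X.

```python
import re
import zlib

# Assumed rule for a plain (unquoted) Snowflake identifier: a letter or
# underscore first, then letters, digits, underscores, or dollar signs.
_VALID_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def sketch_table_name(topic: str) -> str:
    """Hypothetical topic-to-table mapping; NOT the connector's actual algorithm."""
    if _VALID_IDENTIFIER.fullmatch(topic):
        # A conforming topic name is used unchanged.
        return topic
    # Replace each character that is not allowed in an identifier with "_".
    base = re.sub(r"[^A-Za-z0-9_$]", "_", topic)
    if not re.match(r"[A-Za-z_]", base):
        # Identifiers cannot start with a digit or "$", so prefix an underscore.
        base = "_" + base
    # Hash the ORIGINAL topic name, so topics that sanitize to the same base
    # (numbers+x and numbers-x both become numbers_x) still get distinct names.
    # CRC-32 is an arbitrary stand-in for the connector's own hash.
    return f"{base}_{zlib.crc32(topic.encode('utf-8'))}"


for name in ("my_topic_name", "my-topic-name", "numbers+x", "numbers-x"):
    print(f"{name} -> {sketch_table_name(name)}")
```

In this sketch, numbers+x and numbers-x both sanitize to numbers_x but receive different suffixes, because the hash is computed over the original topic name rather than the sanitized one.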

Generate a Snowflake key pair

Before the connector can sink data to Snowflake, you need to generate a key pair. Snowflake key pair authentication requires an RSA key of at least 2048 bits. You add the public key to a Snowflake user account and the private key to the connector configuration (when completing the Quick Start instructions).

Note

This procedure generates an unencrypted private key. You can generate and use an encrypted key. If you generate an encrypted key, you add the passphrase to your connector configuration in addition to the private key. For information about generating an encrypted key, see Using Key Pair Authentication in the Snowflake documentation.

Creating the key pair

Complete the following steps to generate a key pair.

  1. Generate a private key using OpenSSL.

    openssl genrsa -out snowflake_key.pem 2048
    
  2. Generate the public key from the private key.

    openssl rsa -in snowflake_key.pem -pubout -out snowflake_key.pub
    
  3. List the generated Snowflake key files.

    ls -l snowflake_key*
    
    -rw-r--r--  1  1679 Jun  8 17:04 snowflake_key.pem
    -rw-r--r--  1   451 Jun  8 17:05 snowflake_key.pub
    
  4. Show the contents of the public key file.

    cat snowflake_key.pub
    
    -----BEGIN PUBLIC KEY-----
    MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA2zIuUb62JmrUAMoME+SX
    vsz9KUCp/cC+Y+kTGfYB3jRDQ06O0UT+yUKMO/KWuc0dUxZ8s9koW5l/n+TBfxIQ
    
    ... omitted
    
    1tD+Ktd/CTXPoVEI2tgCC9Avf/6/9HU3IpV0gL8SZ8U0N5ot4Uw+CSYB3JjMagEG
    bBWZ8Qc26pFk7Fd17+ykH6rEdLeQ9OElc0ZruVwSsa4AxaZOT+rqCCP7FQPzKTtA
    JQIDAQAB
    -----END PUBLIC KEY-----
    
  5. Copy the key. You will add it to a new user in Snowflake. Copy only the part of the key between -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY-----. You can do this manually or you can use the following command:

    grep -v "BEGIN PUBLIC" snowflake_key.pub | grep -v "END PUBLIC" | tr -d '\r\n'
    

    In the following section you create a user and add the public key.
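If you prefer not to rely on grep and tr (for example, on Windows), the same extraction can be sketched in Python. The function name `pem_body_single_line` is hypothetical; it drops every line that starts with -----BEGIN or -----END and joins the remaining lines, which roughly matches what the command in step 5 does for a single public key file.

```python
def pem_body_single_line(pem_text: str) -> str:
    """Drop the -----BEGIN/-----END lines of a PEM file and join the rest into one line."""
    body = []
    for line in pem_text.splitlines():  # handles both \n and \r\n line endings
        line = line.strip()
        if line.startswith("-----BEGIN") or line.startswith("-----END"):
            continue  # plays the role of the two grep -v filters
        body.append(line)
    return "".join(body)  # plays the role of tr -d '\r\n'


# Shortened, made-up sample; read your real file with
# pathlib.Path("snowflake_key.pub").read_text() instead.
sample = "-----BEGIN PUBLIC KEY-----\nMIIBIjAN\nBgkqhkiG\n-----END PUBLIC KEY-----\n"
print(pem_body_single_line(sample))
```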

Creating a user and adding the public key

Open your Snowflake project. Complete the following steps to create a user account and add the public key to this account.

  1. Go to the Worksheets panel and switch to the SECURITYADMIN role.

    Important

    Make sure you set the SECURITYADMIN role in the Worksheets panel (shown below), not by using the user account drop-down menu. For additional information, see User Management.

    Snowflake security admin role
  2. Run the following query in Worksheets to create a user and add the public key copied earlier.

    CREATE USER admin RSA_PUBLIC_KEY='<public-key>';
    

    Make sure to add the public key as a single line in the statement. The following shows what this looks like in Snowflake Worksheets:

    Snowflake sysadmin role creation statements

    Tip

    If you did not set the role to SECURITYADMIN, or if you set the role using the user account drop-down menu, an SQL access control error is displayed.

    SQL access control error: Insufficient privileges to operate on account '<account-name>'
    

Configuring user privileges

Complete the following steps to set the correct privileges for the user you added.

For example, suppose you want to send Apache Kafka® records to a database named PRODUCTION using the schema PUBLIC. The following queries configure the necessary user privileges.

-- Use a role that can create and manage roles and privileges:
use role securityadmin;

-- Create a Snowflake role with the privileges to work with the connector:
create role kafka_connector_role;

-- Grant privileges on the database:
grant usage on database PRODUCTION to role kafka_connector_role;

-- Grant privileges on the schema:
grant usage on schema PRODUCTION.PUBLIC to role kafka_connector_role;
grant create table on schema PRODUCTION.PUBLIC to role kafka_connector_role;
grant create stage on schema PRODUCTION.PUBLIC to role kafka_connector_role;
grant create pipe on schema PRODUCTION.PUBLIC to role kafka_connector_role;

-- Grant the custom role to an existing user:
grant role kafka_connector_role to user admin;

-- Make the new role the default role:
alter user admin set default_role=kafka_connector_role;
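If your database, schema, role, or user names differ from the example, a small Python helper can render the same statements for your names. The helper `connector_grant_statements` is hypothetical; it reproduces only the statements shown above and, because it does not quote identifiers, it rejects any name that is not a plain unquoted Snowflake identifier rather than risk producing invalid SQL.

```python
from __future__ import annotations

import re

# Plain (unquoted) Snowflake identifier: a letter or underscore first, then
# letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def connector_grant_statements(database: str, schema: str, role: str, user: str) -> list[str]:
    """Render the privilege statements from the example for other names (hypothetical helper)."""
    for name in (database, schema, role, user):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain unquoted identifier: {name!r}")
    qualified_schema = f"{database}.{schema}"
    return [
        "use role securityadmin;",
        f"create role {role};",
        f"grant usage on database {database} to role {role};",
        f"grant usage on schema {qualified_schema} to role {role};",
        f"grant create table on schema {qualified_schema} to role {role};",
        f"grant create stage on schema {qualified_schema} to role {role};",
        f"grant create pipe on schema {qualified_schema} to role {role};",
        f"grant role {role} to user {user};",
        f"alter user {user} set default_role={role};",
    ]


# Print a script you can paste into a Worksheet.
print("\n".join(connector_grant_statements("PRODUCTION", "PUBLIC", "kafka_connector_role", "admin")))
```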

Extracting the private key

You add the private key to your Snowflake connector configuration. Extract the key and keep it in a safe place until you set up your connector.

  1. List the generated Snowflake key files.

    ls -l snowflake_key*
    
    -rw-r--r--  1  1679 Jun  8 17:04 snowflake_key.pem
    -rw-r--r--  1   451 Jun  8 17:05 snowflake_key.pub
    
  2. Show the contents of the private key file.

    cat snowflake_key.pem
    
    -----BEGIN RSA PRIVATE KEY-----
    MIIEpQIBAAKCAQEA2zIuUb62JmrUAMoME+SXvsz9KUCp/cC+Y+kTGfYB3jRDQ06O
    0UT+yUKMO/KWuc0dUxZ8s9koW5l/n+TBfxIQx+24C2+l9t3TxxaLdf/YCgQwKNR9
    dO9/c+SkX8NfcwUynGEo3wpmdb4hp0X9TfWKX9vG//zK2tndmMUrFY5OcGSSVJYJ
    Wv3gk04sVxhINo5knpgZoUVztxcRLm/vNvIX1tD+Ktd/CTXPoVEI2tgCC9Avf/6/
    9HU3IpV0gL8SZ8U0N5ot4Uw+CSYB3JjMagEGbBWZ8Qc26pFk7Fd17+ykH6rEdLeQ
    
    ... omitted
    
    UfrYj7+p03yVflrsB+nyuPETnRJx41b01GrwJk+75v5EIg8U71PQDWfy1qOrUk/d
    9u25iaVRzi6DFM0ppE76Lh72SKy+m0iEZIXWbV9q6vf46Oz1PrtffAzyi4pyJbe/
    ypQ53f0CgYEA7rE6Dh0tG7EnYfFYrnHLXFC2aVtnkfCMIZX/VIZPX82VGB1mV43G
    qTDQ/ax1tit6RHDBk7VU4Xn545Tgj1z6agYPvHtkhxYTq50xVBXr/xwlMnzUZ9s3
    VjGpMYQANm2seleV6/si54mT4TkUyB7jMgWdFsewtwF60quvxmiA9RU=
    -----END RSA PRIVATE KEY-----
    
  3. Copy the key. You will add it to the connector configuration. Copy only the part of the key between -----BEGIN RSA PRIVATE KEY----- and -----END RSA PRIVATE KEY-----. You can do this manually or you can use the following command:

    grep -v "BEGIN RSA PRIVATE KEY" snowflake_key.pem | grep -v "END RSA PRIVATE KEY" | tr -d '\r\n'
    
  4. Save the key to use later when you complete the Quick Start steps. Alternatively, you can run the previous step when you actually need the key for the connector configuration.
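Depending on the OpenSSL version, snowflake_key.pem may not use the -----BEGIN RSA PRIVATE KEY----- header shown above: OpenSSL 3.x writes genrsa output in PKCS#8 format (-----BEGIN PRIVATE KEY-----) by default, in which case the grep patterns in step 3 do not remove the header and footer. The following Python sketch, built around a hypothetical `extract_private_key` function, reports which label the file actually uses and returns the body as a single base64 line, failing early if the body is not valid base64. It does not handle legacy PEM headers such as Proc-Type, and which key formats the Confluent Cloud connector accepts is not covered here; check the connector configuration reference before relying on a particular format.

```python
from __future__ import annotations

import base64
import re

# A PEM block: a BEGIN line with a label, a base64 body, and an END line with the same label.
_PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----", re.DOTALL)


def extract_private_key(pem_text: str) -> tuple[str, str]:
    """Return (label, body) for the first PEM block; body is one line of base64."""
    match = _PEM_BLOCK.search(pem_text)
    if match is None:
        raise ValueError("no PEM block found")
    label = match.group(1)  # e.g. "RSA PRIVATE KEY" (PKCS#1) or "PRIVATE KEY" (PKCS#8)
    # Remove all whitespace, including \r and \n, so the key is a single line.
    body = "".join(match.group(2).split())
    # Catch copy/paste damage: raises binascii.Error (a ValueError subclass) if not base64.
    base64.b64decode(body, validate=True)
    return label, body


# Shortened, made-up sample; pass the text of snowflake_key.pem for a real key.
sample = "-----BEGIN RSA PRIVATE KEY-----\nMIIE\npQIB\n-----END RSA PRIVATE KEY-----\n"
label, body = extract_private_key(sample)
print(label, body)
```

If the reported label is ENCRYPTED PRIVATE KEY, you also need the private key passphrase described in the Quick Start.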

Quick Start

Use this quick start to get up and running with the Confluent Cloud Snowflake sink connector. The quick start provides the basics of selecting the connector and configuring it to consume data from Kafka and persist the data to a Snowflake database.

Prerequisites
  • Authorized access to a Confluent Cloud cluster on Amazon Web Services (AWS), Microsoft Azure (Azure), or Google Cloud Platform (GCP).
  • The Confluent Cloud CLI installed and configured for the cluster. See Install and Configure the Confluent Cloud CLI.
  • Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
  • A Snowflake account and key pair to use for connector authentication with the Snowflake database.
  • The user created must be granted privileges in Snowflake to modify the database and schema. For more information, see Access Control Privileges.
  • The Snowflake database and the Kafka cluster should be in the same region.
  • Kafka cluster credentials. You can use one of the following ways to get credentials:
    • Create a Confluent Cloud API key and secret. To create a key and secret, go to Kafka API keys in your cluster, or autogenerate the API key and secret directly in the UI when setting up the connector.
    • Create a Confluent Cloud service account for the connector.

Using the Confluent Cloud GUI

Step 1: Launch your Confluent Cloud cluster.

See the Quick Start for Apache Kafka using Confluent Cloud for installation instructions.

Step 2: Add a connector.

Click Connectors. If you already have connectors in your cluster, click Add connector.

Step 3: Select your connector.

Click the Snowflake Sink connector icon.

Snowflake Sink Connector Icon

Step 4: Set up the connection.

Complete the following and click Continue.

Note

  • Make sure you have all your prerequisites completed.
  • An asterisk ( * ) designates a required entry.
  1. Select one or more topics.

  2. Enter a connector name.

  3. Enter your Kafka Cluster credentials. The credentials are either the API key and secret or the service account API key and secret.

  4. Select an Input message format (data coming from the Kafka topic): AVRO, JSON_SR (JSON Schema), PROTOBUF, or JSON (schemaless). A valid schema must be available in Schema Registry to use a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).

  5. Enter the Snowflake connection details:

    • Connection URL: Enter the URL for accessing your Snowflake account. Use the format https://<account_name>.<region_id>.snowflakecomputing.com:443. The https:// and 443 port number are optional. Do not use the region ID if your account is in the AWS US West region and you are using AWS PrivateLink.
    • Connection user name: Enter the user name created earlier.
    • Private key: Enter the private key created earlier as a single line. Enter only the part of the key between -----BEGIN RSA PRIVATE KEY----- and -----END RSA PRIVATE KEY-----.
    • Database name: Enter the database name containing the table to insert rows into.

  6. Enter the Snowflake Schema name that contains the table to insert rows into.

  7. (Optional) Enter the private key passphrase. This is required if you created an encrypted key when generating the key pair.

  8. (Optional) Select whether or not to include the following metadata in the RECORD_METADATA column in the database table.

    • createtime: If this value is set to false, the CreateTime property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
    • topic: If this value is set to false, the topic property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
    • offset and partition: If this value is set to false, the Offset and Partition property values are omitted from the metadata in the RECORD_METADATA column. The default value is true.
    • all metadata: If this value is set to false, the metadata in the RECORD_METADATA column is completely empty. The default value is true.

    For details about metadata, see Schema of Topics in the Snowflake documentation.

  9. Enter the number of tasks for the connector. Refer to Confluent Cloud connector limitations for additional information.

    Note

    Configuration properties that are not listed use the default values. For default values and property definitions, see the Snowflake Sink Connector Configuration Properties.
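To catch formatting mistakes in the Connection URL from step 5 before launching, you could assemble or normalize it with a small Python sketch such as the one below. The helper names are hypothetical, and the sketch only encodes the general format stated in step 5 (with https:// and port 443 filled in when omitted); it does not model the AWS US West PrivateLink exception described there.

```python
from typing import Optional
from urllib.parse import urlsplit

_SUFFIX = ".snowflakecomputing.com"


def build_connection_url(account_name: str, region_id: Optional[str] = None) -> str:
    """Assemble https://<account_name>.<region_id>.snowflakecomputing.com:443."""
    host_labels = [account_name] + ([region_id] if region_id else [])
    return f"https://{'.'.join(host_labels)}{_SUFFIX}:443"


def normalize_connection_url(url: str) -> str:
    """Fill in the optional https:// scheme and 443 port, and sanity-check the host."""
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith(_SUFFIX):
        raise ValueError(f"unexpected Snowflake connection URL: {url!r}")
    return f"https://{host}:{parts.port or 443}"


print(build_connection_url("wm83168", "us-central1.gcp"))
print(normalize_connection_url("wm83168.us-central1.gcp.snowflakecomputing.com"))
```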

Step 5: Launch the connector.

Verify the connection details and click Launch.

Launch the connector

Step 6: Check the connector status.

The status for the connector should go from Provisioning to Running. It may take a few minutes.

Check the connector status

Step 7: Check Snowflake

After the connector is running, verify that messages are populating your Snowflake database table.

Tip

When you launch a connector, a Dead Letter Queue topic is automatically created. See Confluent Cloud Dead Letter Queue for details.

For Snowflake troubleshooting, see Troubleshooting Issues in the Snowflake documentation.

Note

  • The Snowflake Sink connector does not remove Snowflake pipes when a connector is deleted. For instructions to manually clean up Snowflake pipes, see Dropping Pipes.
  • Snowflake Snowpipe failure can prevent messages from showing up in the target table even though the Snowflake Sink connector wrote them successfully. If this happens, check the Snowflake COPY_HISTORY view, internal stage, or table stage to find the message and associated error. For more on the workflow of the Snowflake Sink connector, see Workflow for the Kafka Connector.
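Following the Note above, a Python sketch can generate a query against Snowflake's COPY_HISTORY table function for the target table, listing recent load attempts and their first error message. `copy_history_query` is hypothetical; it assumes you run the query with the connector's database and schema set as the current context (INFORMATION_SCHEMA is per database) and that the table name is a plain unquoted identifier, such as the names the connector generates.

```python
import re

# Plain (unquoted) Snowflake identifier, as assumed by this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def copy_history_query(table_name: str, hours: int = 24) -> str:
    """Build a COPY_HISTORY query for recent loads into table_name (hypothetical helper)."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"not a plain unquoted identifier: {table_name!r}")
    if not isinstance(hours, int) or hours <= 0:
        raise ValueError("hours must be a positive integer")
    return (
        "select file_name, pipe_name, status, error_count, first_error_message\n"
        "from table(information_schema.copy_history(\n"
        f"  table_name => '{table_name}',\n"
        f"  start_time => dateadd(hours, -{hours}, current_timestamp())))\n"
        "order by last_load_time desc;"
    )


print(copy_history_query("MY_TOPIC_NAME"))
```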

For additional information about this connector, see the Snowflake Connector for Kafka documentation. Note that not all connector features are provided in the Confluent Cloud connector.

See also

For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. This example also shows how to use Confluent Cloud CLI to manage your resources in Confluent Cloud.


Using the Confluent Cloud CLI

Complete the following steps to set up and run the connector using the Confluent Cloud CLI.

Note

Make sure you have all your prerequisites completed.

Step 1: List the available connectors.

Enter the following command to list available connectors:

ccloud connector-catalog list

Step 2: Show the required connector configuration properties.

Enter the following command to show the required connector properties:

ccloud connector-catalog describe <connector-catalog-name>

For example:

ccloud connector-catalog describe SnowflakeSink

Example output:

Following are the required configs:
connector.class: SnowflakeSink
name
kafka.api.key
kafka.api.secret
input.data.format
snowflake.url.name
snowflake.user.name
snowflake.private.key
snowflake.schema.name
tasks.max
topics

Step 3: Create the connector configuration file.

Create a JSON file that contains the connector configuration properties. The following example shows the required connector properties.

{
  "connector.class": "SnowflakeSink",
  "name": "<connector-name>",
  "kafka.api.key": "<my-kafka-api-key>",
  "kafka.api.secret": "<my-kafka-api-secret>",
  "topics": "<topic1>, <topic2>",
  "input.data.format": "JSON",
  "snowflake.url.name": "https://wm83168.us-central1.gcp.snowflakecomputing.com:443",
  "snowflake.user.name": "<login-username>",
  "snowflake.private.key": "<private-key>",
  "snowflake.database.name": "<database-name>",
  "snowflake.schema.name": "<schema-name>",
  "tasks.max": "1"
}

Note the following required property definitions:

  • "connector.class": Identifies the connector plugin name.
  • "name": Enter a name for your connector.
  • "topics": Enter one topic or multiple comma-separated topics.
  • "input.data.format": Sets the input message format (data coming from the Kafka topic). Valid entries are AVRO, JSON_SR, PROTOBUF, or JSON. You must have Confluent Cloud Schema Registry configured if using a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
  • "snowflake.url.name": Enter the URL for accessing your Snowflake account. Use the format https://<account_name>.<region_id>.snowflakecomputing.com:443. The https:// and 443 port number are optional. Do not use the region ID if your account is in the AWS US West region and you are using AWS PrivateLink.
  • "snowflake.user.name": Enter the user name created earlier.
  • "snowflake.private.key":
    • Enter the private key created earlier as a single line.
    • Enter only the part of the key between -----BEGIN RSA PRIVATE KEY----- and -----END RSA PRIVATE KEY-----.
  • "snowflake.database.name": Enter the database name containing the table to insert rows into.
  • "snowflake.schema.name": Enter the Snowflake Schema name that contains the table to insert rows into.
  • "tasks.max": Enter the number of tasks for the connector. Refer to Confluent Cloud connector limitations for additional information.

The following are optional properties to include in the configuration. These properties affect what metadata is included in the RECORD_METADATA column in the Snowflake database table.

  • "snowflake.metadata.createtime": If this value is set to "false", the CreateTime property value is omitted from the metadata in the RECORD_METADATA column. The default value is "true".
  • "snowflake.metadata.topic": If this value is set to "false", the topic property value is omitted from the metadata in the RECORD_METADATA column. The default value is "true".
  • "snowflake.metadata.offset.and.partition": If this value is set to "false", the Offset and Partition property values are omitted from the metadata in the RECORD_METADATA column. The default value is "true".
  • "snowflake.metadata.all": If this value is set to "false", the metadata in the RECORD_METADATA column is completely empty. The default value is "true".
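The JSON example and property definitions above can be turned into a small Python sketch that checks the required properties before you run ccloud connector create. `build_config` is hypothetical: the required-key list is copied from the JSON example above, the input formats from the "input.data.format" definition, and the metadata flags from the optional properties. Because those properties are quoted as strings ("true"/"false") above, the sketch writes them that way.

```python
from __future__ import annotations

import json

# Required properties, taken from the JSON example and definitions above.
REQUIRED = (
    "connector.class", "name", "kafka.api.key", "kafka.api.secret", "topics",
    "input.data.format", "snowflake.url.name", "snowflake.user.name",
    "snowflake.private.key", "snowflake.database.name", "snowflake.schema.name",
    "tasks.max",
)
INPUT_FORMATS = {"AVRO", "JSON_SR", "PROTOBUF", "JSON"}
METADATA_FLAGS = {"createtime", "topic", "offset.and.partition", "all"}


def build_config(props: dict[str, str], metadata: dict[str, bool] | None = None) -> dict[str, str]:
    """Validate the required properties and add optional snowflake.metadata.* flags."""
    missing = [key for key in REQUIRED if not props.get(key)]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    if props["input.data.format"] not in INPUT_FORMATS:
        raise ValueError(f"unsupported input.data.format: {props['input.data.format']}")
    private_key = props["snowflake.private.key"]
    if "-----" in private_key or "\n" in private_key or "\r" in private_key:
        raise ValueError("snowflake.private.key must be one line without BEGIN/END markers")
    config = dict(props)
    for flag, enabled in (metadata or {}).items():
        if flag not in METADATA_FLAGS:
            raise ValueError(f"unknown metadata flag: {flag}")
        # Written as the strings "true"/"false", matching the examples above.
        config[f"snowflake.metadata.{flag}"] = "true" if enabled else "false"
    return config


example = {
    "connector.class": "SnowflakeSink",
    "name": "confluent-snowflake",
    "kafka.api.key": "<my-kafka-api-key>",
    "kafka.api.secret": "<my-kafka-api-secret>",
    "topics": "<topic1>, <topic2>",
    "input.data.format": "JSON",
    "snowflake.url.name": "https://wm83168.us-central1.gcp.snowflakecomputing.com:443",
    "snowflake.user.name": "<login-username>",
    "snowflake.private.key": "<private-key>",
    "snowflake.database.name": "<database-name>",
    "snowflake.schema.name": "<schema-name>",
    "tasks.max": "1",
}
print(json.dumps(build_config(example, {"createtime": False}), indent=2))
```

To use the result, write it with pathlib.Path("snowflake-sink.json").write_text(json.dumps(config, indent=2)) and pass that file to ccloud connector create --config, as in Step 4.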

Note

Configuration properties that are not listed use the default values. For default values and property definitions, see Snowflake Sink Connector Configuration Properties.

Step 4: Load the properties file and create the connector.

Enter the following command to load the configuration and start the connector:

ccloud connector create --config <file-name>.json

For example:

ccloud connector create --config snowflake-sink.json

Example output:

Created connector confluent-snowflake lcc-ix4dl

Step 5: Check the connector status.

Enter the following command to check the connector status:

ccloud connector list

Example output:

ID          |            Name         | Status  | Type
+-----------+-------------------------+---------+------+
lcc-ix4dl   | confluent-snowflake     | RUNNING | sink
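Rather than rerunning the command by hand while the connector provisions, you could poll it from a script. The following Python sketch is hypothetical: `connector_status` parses the human-readable table layout shown in the example output above, which may change between CLI versions (prefer a machine-readable output option if your CLI version offers one), and `wait_until_running` assumes the ccloud CLI is on your PATH, logged in, and pointed at the right cluster.

```python
from __future__ import annotations

import subprocess
import time


def connector_status(list_output: str, connector_id: str) -> str | None:
    """Return the Status cell for connector_id from `ccloud connector list` table output."""
    for line in list_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) >= 3 and cells[0] == connector_id:
            return cells[2]
    return None


def wait_until_running(connector_id: str, timeout_s: float = 600, interval_s: float = 15) -> bool:
    """Poll `ccloud connector list` until the connector reports RUNNING or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["ccloud", "connector", "list"], capture_output=True, text=True, check=True
        )
        if connector_status(result.stdout, connector_id) == "RUNNING":
            return True
        time.sleep(interval_s)
    return False


example_output = """\
ID          |            Name         | Status  | Type
+-----------+-------------------------+---------+------+
lcc-ix4dl   | confluent-snowflake     | RUNNING | sink
"""
print(connector_status(example_output, "lcc-ix4dl"))
```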

Step 6: Check Snowflake

After the connector is running, verify that records are populating your Snowflake database.


Note

  • The Snowflake Sink connector does not remove Snowflake pipes when a connector is deleted. For instructions to manually clean up Snowflake pipes, see Dropping Pipes.
  • Snowflake Snowpipe failure can prevent messages from showing up in the target table even though the Snowflake Sink connector wrote them successfully. If this happens, check the Snowflake COPY_HISTORY view, internal stage, or table stage to find the message and associated error. For more on the workflow of the Snowflake Sink connector, see Workflow for the Kafka Connector.

For additional information about this connector, see the Snowflake Connector for Kafka documentation. Note that not all connector features are provided in the Confluent Cloud connector.

Troubleshooting

For Snowflake troubleshooting, see Troubleshooting Issues in the Snowflake documentation.

Tip

When you launch a connector, a Dead Letter Queue topic is automatically created. See Confluent Cloud Dead Letter Queue for details.

Suggested Reading

The following blog post provides an introduction to the Snowflake Sink connector and a scenario walkthrough.

Blog post: Announcing the Snowflake Sink Connector for Apache Kafka in Confluent Cloud
