Tableflow Quick Start Using Your Storage and AWS Glue in Confluent Cloud¶
Confluent Tableflow enables exposing Apache Kafka® topics as Apache Iceberg™ tables.
In this quick start, you perform the following steps:
- Step 1: Create a topic and publish data
- Step 2: Configure your storage bucket
- Step 3: Enable Tableflow on your topic
- Step 4: Set up access to the Iceberg REST Catalog
- Step 5: Query Iceberg tables
- Step 6: Query data with other analytics engines (optional)
Prerequisites¶
- DeveloperRead or higher access for the service account or user account. For more information, see Manage RBAC role bindings on Confluent Cloud.
Step 1: Create a topic and publish data¶
In this step, you create a stock-trades topic by using the Confluent Cloud Console.
Click Add topic, provide the topic name, and create it with default settings.
You can skip defining a contract.
Publish data to the stock-trades topic by using the Datagen Source connector with the Stock Trades data set. When you configure the Datagen connector, click Additional configuration and proceed through the provisioning workflow. When you reach the Configuration step, in the Select output record value format section, select Avro. Click Continue and keep the default settings. For more information, see Datagen Source Connector Quick Start.
Step 2: Configure your storage bucket¶
Before you materialize your Kafka topic as an Iceberg table, you must configure the storage bucket where the materialized Iceberg tables are stored.
- Create an S3 bucket in your preferred AWS account. For this guide, name the bucket tableflow-quickstart-storage.
- In your Confluent Cloud environment, navigate to the Provider Integrations tab to create a provider integration for your S3 bucket.
- Follow the steps in Provider Integration Quick Start to create a new provider integration to your S3 bucket. In the Select Confluent Resource dropdown, select Tableflow S3 bucket.
After you create the provider integration, Confluent Cloud can access your S3 bucket and write materialized data into it.
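If you prefer to script the bucket creation instead of using the AWS console, the following is a minimal sketch that assumes the boto3 library is installed and that AWS credentials are already configured. The helper names and the us-west-2 region are illustrative and not part of this guide. S3 bucket names are globally unique, so you may need a different name than tableflow-quickstart-storage.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build the keyword arguments for the S3 CreateBucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as an
    # explicit LocationConstraint; every other region requires one.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name: str, region: str) -> None:
    """Create the bucket (hypothetical helper, requires AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))


# Example call (the region is an assumption, pick your own):
# create_bucket("tableflow-quickstart-storage", "us-west-2")
```

The provider integration itself is still created in Confluent Cloud as described above; this sketch covers only the bucket.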
Step 3: Enable Tableflow on your topic¶
With the provider integration configured, you can enable Tableflow on your Kafka topic to materialize it as an Iceberg table in the storage bucket that you created in Step 2.
- Navigate to your stock-trades topic and click Enable Tableflow.
- In the Enable Tableflow dialog, click Configure custom storage.
- In the Choose where to store your Tableflow data section, click Store in your bucket.
- In the Provider integration dropdown, select the provider integration that you created in Step 2. Provide the name of the storage bucket that you created, which in this guide is tableflow-quickstart-storage.
- Click Continue to review the configuration and launch Tableflow.
Materializing a newly created topic as an Iceberg table can take a few minutes.
For low-throughput topics in which Kafka segments have not been filled, Tableflow tries optimistically to publish data every 15 minutes. This is best-effort and not guaranteed.
Step 4: Set up access to the Iceberg REST Catalog¶
Follow the steps in Integrate Tableflow with the AWS Glue Catalog to configure AWS Glue Data Catalog as a catalog integration.
After configuring the catalog integration, the stock-trades Iceberg table and a database are created automatically in the AWS Glue Data Catalog. The database name is based on your Kafka cluster ID.
It can take a few minutes to publish Tableflow Iceberg tables to the AWS Glue Data Catalog.
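Because publication is not immediate, a script that depends on the table may need to wait for it. The following is a minimal sketch that polls AWS Glue until the table appears. It assumes a boto3 Glue client is passed in; the function name, timeout, and polling interval are illustrative. The database name is a placeholder that you must replace with the one shown in your AWS Glue Data Catalog.

```python
import time


def wait_for_glue_table(glue, database: str, table: str,
                        timeout_s: float = 600.0, poll_s: float = 15.0) -> dict:
    """Poll Glue until the table exists, then return its description."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # GetTable raises EntityNotFoundException until the table is published.
            return glue.get_table(DatabaseName=database, Name=table)["Table"]
        except glue.exceptions.EntityNotFoundException:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"table {table!r} not found in database {database!r}"
                )
            time.sleep(poll_s)


# Example call (requires boto3 and AWS credentials):
# import boto3
# table = wait_for_glue_table(boto3.client("glue"), "<your-database>", "stock-trades")
```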
Step 5: Query Iceberg tables¶
You can use any AWS Glue and Iceberg-compatible compute engine to query the table. In this example, you use Amazon Athena SQL to query the table.
Follow the steps in Amazon Athena SQL to start writing queries for the stock-trades table.
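As an alternative to the Athena console, the following is a minimal sketch of running a preview query programmatically. It assumes a boto3 Athena client is passed in, and that you supply your own Glue database name and an S3 location for query results; both are placeholders here, and the helper names are illustrative. The table name is double-quoted in the query because it contains a hyphen.

```python
import time


def build_preview_query(table: str, limit: int = 10) -> str:
    """Return a SELECT statement that previews the first rows of a table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Double quotes let Athena accept identifiers such as stock-trades.
    escaped = table.replace('"', '""')
    return f'SELECT * FROM "{escaped}" LIMIT {int(limit)}'


def run_query(athena, sql: str, database: str, output_location: str,
              poll_s: float = 1.0) -> list:
    """Run a query and return the first page of results as lists of strings."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] == "SUCCEEDED":
            break
        if status["State"] in ("FAILED", "CANCELLED"):
            raise RuntimeError(status.get("StateChangeReason", status["State"]))
        time.sleep(poll_s)
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # For SELECT queries, the first row holds the column names.
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]


# Example call (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# rows = run_query(boto3.client("athena"), build_preview_query("stock-trades"),
#                  "<your-database>", "s3://<your-athena-results-bucket>/")
```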
Step 6: Query data with other analytics engines (optional)¶
Explore other integration options for using Tableflow with analytics engines: