Integrate Tableflow with the AWS Glue Catalog in Confluent Cloud
Tableflow can integrate with the AWS Glue Data Catalog as an external catalog, publishing the metadata of the Apache Iceberg™ tables that Tableflow materializes to AWS Glue. This makes the Iceberg tables accessible to any Iceberg-compatible query or compute engine that uses the AWS Glue Data Catalog.
These tables must be consumed as read-only tables.
Integrating with the AWS Glue Data Catalog ensures that external catalogs maintain up-to-date and consistent metadata, while the Tableflow catalog remains the single source of truth.
The AWS Glue Data Catalog integrates with Tableflow at the cluster level, enabling the automatic publication of all Tableflow-enabled topics as tables within Glue. As shown in the following diagram, the database and table names map directly to their corresponding cluster ID and topic name, establishing a clear relationship between these elements.
Configure AWS Glue Catalog integration
The following steps show how to enable AWS Glue Catalog integration at the cluster level.
In the navigation menu, click Tableflow and navigate to the Catalog Integration section.
Click Add Integration.
Select AWS Glue as the catalog and provide a name to identify your catalog integration.
Navigate to the Provider Integrations tab of your environment, and follow the steps in provider integration to create a new provider integration to your S3 bucket. For the Confluent Resource, select Tableflow Glue Catalog sync.
After you create the provider integration, Confluent Cloud can access your AWS Glue Catalog and publish Iceberg table metadata pointers to it.
Select the provider integration in the catalog integration wizard and proceed to the next step.
Review the configuration and launch the catalog integration.
Configure read-only access
The Iceberg tables materialized by Tableflow must be treated as read-only tables. The following steps show how to locate the published tables so that you can set the required read-only permissions on the AWS Glue resources and S3 buckets.
Open the AWS Glue Catalog console.
Find the Iceberg table that was published in the previous steps as an AWS Glue table.
- The cluster ID maps to the AWS Glue database name.
- The Kafka topic name maps to the AWS Glue table name.
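As a rough illustration of this mapping, the following Python sketch looks up a published table by cluster ID and topic name. It assumes the names map one-to-one with no further normalization, and that the table follows the Apache Iceberg Glue catalog convention of storing its metadata pointer in the `metadata_location` table parameter. The helper names are hypothetical, and `glue_client` is expected to behave like `boto3.client("glue")`.

```python
from typing import Any, Dict, Optional


def glue_identifiers(cluster_id: str, topic: str) -> Dict[str, str]:
    """Build the Glue GetTable arguments for a Tableflow-enabled topic.

    Assumption: the cluster ID is used as the database name and the topic
    name as the table name, unchanged.
    """
    return {"DatabaseName": cluster_id, "Name": topic}


def iceberg_metadata_location(
    glue_client: Any, cluster_id: str, topic: str
) -> Optional[str]:
    """Return the Iceberg metadata pointer for a published table, if present.

    `glue_client` only needs a `get_table(DatabaseName=..., Name=...)` method,
    which is the shape of the boto3 Glue client call.
    """
    response = glue_client.get_table(**glue_identifiers(cluster_id, topic))
    parameters = response["Table"].get("Parameters", {})
    # Assumption: the pointer is stored under the Iceberg Glue catalog key.
    return parameters.get("metadata_location")


# Example (requires boto3 and AWS credentials; the IDs are placeholders):
#   import boto3
#   iceberg_metadata_location(boto3.client("glue"), "lkc-abc123", "orders")
```

Passing the client in as an argument keeps the lookup read-only by construction and makes it easy to exercise without AWS access.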
You can query these tables from any analytics or compute engine that supports AWS Glue Data Catalog. For more information, see Query Data.
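For example, an engine such as Apache Spark could reach these tables through the Apache Iceberg Glue catalog. The sketch below only builds the Spark configuration and a read query as plain Python values; the property keys and class names are the standard Iceberg ones, while the catalog alias, warehouse path, cluster ID, and topic name are placeholders, and the function names are hypothetical. Treat it as one possible setup, not the documented procedure.

```python
from typing import Dict


def glue_catalog_spark_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Spark properties that register an Iceberg catalog backed by AWS Glue.

    `catalog` is an alias chosen by the reader; `warehouse` is a placeholder
    S3 path that Iceberg's Spark catalog configuration conventionally sets.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def select_query(catalog: str, cluster_id: str, topic: str, limit: int = 10) -> str:
    """Build a read-only SELECT; database = cluster ID, table = topic name."""
    # Backticks quote identifiers that contain characters such as hyphens.
    return f"SELECT * FROM {catalog}.`{cluster_id}`.`{topic}` LIMIT {limit}"


# Example with PySpark and the Iceberg runtime on the classpath:
#   builder = SparkSession.builder
#   for key, value in glue_catalog_spark_conf("glue", "s3://my-bucket/").items():
#       builder = builder.config(key, value)
#   builder.getOrCreate().sql(select_query("glue", "lkc-abc123", "orders")).show()
```

Only `SELECT` statements are issued here, in keeping with the read-only requirement.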
You must consume Tableflow Iceberg tables as read-only, so make sure that downstream analytics engines have only read access to them.
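One way to express such read-only access is an IAM policy that allows only read actions on the Glue database for the cluster and on the S3 bucket that holds the table data. The following Python sketch generates such a policy document. The action list is an assumption about what a typical query engine needs, not a policy prescribed by Confluent, and the region, account ID, cluster ID, and bucket name are placeholders.

```python
import json
from typing import Any, Dict


def read_only_policy(
    region: str, account_id: str, cluster_id: str, bucket: str
) -> Dict[str, Any]:
    """Build an IAM policy document granting read-only Glue and S3 access.

    Assumption: the Glue database is named after the cluster ID, and all
    table data and metadata live in the single bucket given here.
    """
    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueReadOnly",
                "Effect": "Allow",
                # Read actions only: nothing that creates, updates, or deletes.
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/{cluster_id}",
                    f"{glue_arn}:table/{cluster_id}/*",
                ],
            },
            {
                "Sid": "S3ReadOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",  # needed for ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # needed for GetObject
                ],
            },
        ],
    }


if __name__ == "__main__":
    policy = read_only_policy("us-east-1", "123456789012", "lkc-abc123", "my-bucket")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and narrowed further before it is attached to the role that the analytics engine assumes.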