Integrate Catalogs with Tableflow in Confluent Cloud

An Apache Iceberg™ catalog manages metadata for Iceberg tables. It provides an abstraction layer for accessing table metadata, such as the schema, partitioning, and file locations, so analytics engines can locate and query tables without depending on how that metadata is stored. Iceberg catalogs can be implemented with various backends, such as Hive, AWS Glue, or custom REST APIs, to suit different use cases.
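To make the abstraction concrete, the following minimal Python sketch shows the lookup role a catalog plays: an engine asks for a table by namespace and name and gets back its metadata, while the backend behind the interface can vary. The names `TableMetadata`, `Catalog`, `InMemoryCatalog`, and `describe` are hypothetical and are not part of any Iceberg library. A real Hive, AWS Glue, or REST catalog tracks far more metadata than shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple


@dataclass
class TableMetadata:
    """Hypothetical, simplified view of the metadata a catalog exposes."""
    schema: Dict[str, str]      # column name -> column type
    partition_by: List[str]     # columns the table is partitioned by
    location: str               # base storage location of the table's files


class Catalog(Protocol):
    """Interface an engine codes against, regardless of the catalog backend."""

    def load_table(self, namespace: str, name: str) -> TableMetadata: ...


@dataclass
class InMemoryCatalog:
    """Toy backend; a Hive, AWS Glue, or REST backend would expose the same method."""
    tables: Dict[Tuple[str, str], TableMetadata] = field(default_factory=dict)

    def register(self, namespace: str, name: str, metadata: TableMetadata) -> None:
        self.tables[(namespace, name)] = metadata

    def load_table(self, namespace: str, name: str) -> TableMetadata:
        try:
            return self.tables[(namespace, name)]
        except KeyError:
            raise LookupError(f"table {namespace}.{name} not found") from None


def describe(catalog: Catalog, namespace: str, name: str) -> str:
    # The engine depends only on the Catalog interface, not on the backend.
    meta = catalog.load_table(namespace, name)
    columns = ", ".join(f"{col} {typ}" for col, typ in meta.schema.items())
    return f"{namespace}.{name}({columns}) at {meta.location}"


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.register(
        "demo",
        "orders",
        TableMetadata({"order_id": "long", "amount": "double"}, ["order_id"], "s3://example-bucket/orders"),
    )
    print(describe(catalog, "demo", "orders"))
```

Swapping `InMemoryCatalog` for another backend would not require any change to `describe`, which is the point of the catalog abstraction.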

Tableflow offers a built-in Iceberg REST catalog service that enables you to access tables created by Tableflow. If you use an external catalog service, such as AWS Glue Data Catalog, you can keep it synchronized with Tableflow by setting up a catalog integration within Tableflow.

The following diagram shows the interactions between Tableflow and the catalog services.

Tableflow interactions with catalog services

Tableflow Iceberg REST Catalog

Tableflow features an integrated Iceberg REST catalog (IRC), so any analytics or compute engine that supports the Iceberg REST catalog API can connect to it directly.

Note

The Tableflow Iceberg REST Catalog doesn’t support credential vending for customer-managed storage buckets. When you use the Tableflow Catalog with your own storage, the analytics engine must have its own access to the storage location where Tableflow materializes the Iceberg tables.

To get the REST Catalog endpoint and the credentials needed to access it, navigate to the Tableflow section of the Confluent Cloud Console. Use this information to configure any analytics engine that supports the Iceberg REST catalog, and then query the Iceberg tables from that engine.
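As an illustration, the following Python sketch shows how a client such as PyIceberg might be pointed at the REST Catalog endpoint from the Console. It assumes that the API key and secret are passed as an OAuth2 client credential in the standard Iceberg REST `credential` form (`<key>:<secret>`); treat this, the catalog name `tableflow`, and the environment variable names as assumptions, not as the documented Tableflow configuration. Depending on your engine and setup, additional properties (for example, `warehouse` or storage credentials) may be required.

```python
from typing import Dict


def tableflow_rest_properties(endpoint: str, api_key: str, api_secret: str) -> Dict[str, str]:
    """Build Iceberg REST catalog client properties (assumed layout, see lead-in)."""
    if not endpoint.startswith("https://"):
        raise ValueError("expected an https:// REST catalog endpoint")
    return {
        "type": "rest",
        "uri": endpoint,
        # Assumption: key and secret are sent as an OAuth2 client credential
        # using the standard Iceberg "id:secret" convention.
        "credential": f"{api_key}:{api_secret}",
    }


def connect(endpoint: str, api_key: str, api_secret: str):
    # Imported lazily so the property builder works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog("tableflow", **tableflow_rest_properties(endpoint, api_key, api_secret))


if __name__ == "__main__":
    import os

    catalog = connect(
        os.environ["TABLEFLOW_REST_ENDPOINT"],  # hypothetical variable names
        os.environ["TABLEFLOW_API_KEY"],
        os.environ["TABLEFLOW_API_SECRET"],
    )
    # List what the catalog exposes; each namespace is returned as a tuple.
    for namespace in catalog.list_namespaces():
        print(namespace, catalog.list_tables(namespace))
```

If you materialize tables in your own storage, the catalog only returns metadata locations, so the engine running this code also needs its own read access to that storage, because credential vending isn't available for customer-managed buckets.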

External catalog integration

Tableflow integrates with external catalogs, such as AWS Glue Data Catalog, by synchronizing Iceberg table metadata to them. Catalog integration keeps the metadata in external catalogs up to date and consistent, while the Tableflow Catalog remains the single source of truth.

Key aspects of external catalog integration include:

  • Cluster-level integration: External catalogs are integrated at the cluster level, so all topics materialized from the cluster are published to the catalog service.
  • Metadata synchronization: Tableflow synchronizes its metadata with the external catalog that is integrated with the Kafka cluster. When a table is updated, the update is published to the external catalog immediately.
  • Read-only tables: Iceberg tables exposed through external catalog synchronization are read-only. Consumers must treat Iceberg tables created by Tableflow as read-only and must not write to them.
  • Bring Your Own Storage (BYOS) support: Tableflow supports BYOS for all catalog integrations, giving you flexibility in how you manage storage and infrastructure. External catalog integration isn’t supported with Confluent-managed storage.

Note

Tableflow supports only one catalog integration per cluster.
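The synchronization model described above can be sketched as a one-way push from the cluster's catalog to a read-only mirror. This is a conceptual Python model, not Tableflow's implementation: `ClusterCatalog`, `ExternalCatalogMirror`, and `ReadOnlyTableError` are hypothetical names, and each table is reduced to a single metadata-location string. The sketch encodes the rules from this section: the cluster catalog is the single source of truth, every update is pushed to the attached external catalog, consumers can't commit to synchronized tables, and a cluster accepts only one integration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class ReadOnlyTableError(Exception):
    """Raised when a consumer tries to modify a synchronized table."""


@dataclass
class ExternalCatalogMirror:
    """Hypothetical external catalog (for example, a Glue-like service)."""
    tables: Dict[str, str] = field(default_factory=dict)  # table -> metadata location

    def _publish(self, table: str, metadata_location: str) -> None:
        # Called only by the sync path; overwrites with the latest pointer.
        self.tables[table] = metadata_location

    def commit(self, table: str, metadata_location: str) -> None:
        # Consumers must treat synchronized tables as read-only.
        raise ReadOnlyTableError(f"{table} is managed by Tableflow and is read-only")


@dataclass
class ClusterCatalog:
    """Hypothetical stand-in for the Tableflow Catalog of one Kafka cluster."""
    tables: Dict[str, str] = field(default_factory=dict)
    integration: Optional[ExternalCatalogMirror] = None

    def attach(self, mirror: ExternalCatalogMirror) -> None:
        if self.integration is not None:
            raise ValueError("only one catalog integration per cluster")
        self.integration = mirror
        # Cluster-level integration: publish every table already materialized.
        for table, location in self.tables.items():
            mirror._publish(table, location)

    def update(self, table: str, metadata_location: str) -> None:
        # The cluster catalog is the single source of truth...
        self.tables[table] = metadata_location
        # ...and each update is pushed to the attached external catalog.
        if self.integration is not None:
            self.integration._publish(table, metadata_location)
```

Keeping the push one-directional is what lets the external catalog stay consistent: it never accepts changes that the source catalog doesn’t already hold.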

For more information about specific catalog integrations, see: