Query Iceberg Tables with Trino and Tableflow in Confluent Cloud¶
Trino is an open-source query engine that you can use to query your Tableflow tables. Trino can query your Apache Iceberg™ tables by connecting directly to the Tableflow Catalog via the Iceberg REST API.
In this example, you launch Trino in a Docker container on your local machine. For instructions about running Trino in a Docker container, see Trino in a Docker container.
In production, host your Trino server instance in the same cloud region as your Tableflow tables to decrease latency and reduce data egress fees.
Prerequisites¶
- A Kafka topic with Tableflow enabled. For more information, see the Tableflow Quick Start.
- A Confluent API Key with Tableflow permissions.
- Docker is installed and running in your development environment.
Step 1: Create a catalog configuration file¶
Create a directory named catalog.
mkdir catalog
In the catalog directory, create a file named tableflow.properties.
touch catalog/tableflow.properties
Copy the following property settings into tableflow.properties.
# tableflow.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.oauth2.credential=<api-key>:<api-secret>
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.uri=<Tableflow REST Catalog URI>
# REST Catalog URI Example: https://tableflow.{CLOUD_REGION}.aws.confluent.cloud/iceberg/catalog/organizations/{ORG_ID}/environments/{ENV_ID}
iceberg.rest-catalog.vended-credentials-enabled=true
fs.native-s3.enabled=true
s3.region=<Confluent Cluster Region>
# S3 Region Example: us-west-2
iceberg.security=read_only
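If you prefer not to paste credentials into the file by hand, the same settings can be generated from a script. The following is a minimal sketch, not part of the Confluent or Trino tooling: the function name write_tableflow_properties and its argument order are hypothetical, and the values you pass must be your own API key, API secret, REST Catalog URI, and region.

```shell
# Sketch: write tableflow.properties into a target directory.
# write_tableflow_properties is a hypothetical helper, not a Confluent or Trino command.
# Arguments: <output-dir> <api-key> <api-secret> <rest-catalog-uri> <s3-region>
write_tableflow_properties() {
  if [ "$#" -ne 5 ]; then
    echo "usage: write_tableflow_properties <dir> <api-key> <api-secret> <catalog-uri> <s3-region>" >&2
    return 1
  fi
  tf_dir=$1
  tf_key=$2
  tf_secret=$3
  tf_uri=$4
  tf_region=$5

  mkdir -p "$tf_dir" || return 1

  # The property names and fixed values are the ones listed in this step.
  cat > "$tf_dir/tableflow.properties" <<EOF
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.oauth2.credential=${tf_key}:${tf_secret}
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.uri=${tf_uri}
iceberg.rest-catalog.vended-credentials-enabled=true
fs.native-s3.enabled=true
s3.region=${tf_region}
iceberg.security=read_only
EOF
}
```

For example, calling write_tableflow_properties catalog "$API_KEY" "$API_SECRET" "$CATALOG_URI" us-west-2 would produce catalog/tableflow.properties, where the three variables are assumed to be set in your shell beforehand.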
Step 2: Start a Trino container¶
In your terminal, change into the catalog directory.
cd catalog
Run the following command to start the Trino container with these options.
- --name: Assign a name to the container.
- -d: Run the container in the background.
- -p: Publish the container’s port to the host.
- --volume: Bind mount a volume.
docker run --name trino -d -p 8080:8080 --volume $PWD:/etc/trino/catalog trinodb/trino
Run the following command to check that your Trino container is running.
docker ps
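A running container does not necessarily mean that the Trino server inside it has finished starting. As a rough sketch, you could poll the server's /v1/info endpoint on the published port until it reports that startup is complete. The helper names trino_is_ready and wait_for_trino are hypothetical, and the check assumes the response contains a "starting" field that becomes false once the server is up.

```shell
# Sketch: wait until the Trino server reports that it has finished starting.
# Both function names are hypothetical helpers for this example.

# Succeeds when the JSON on stdin contains "starting": false.
trino_is_ready() {
  grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Polls the info endpoint; arguments: [url] [max-attempts]
wait_for_trino() {
  wt_url="${1:-http://localhost:8080/v1/info}"
  wt_attempts="${2:-30}"
  wt_i=0
  while [ "$wt_i" -lt "$wt_attempts" ]; do
    # curl -f fails on HTTP errors, so an unready server simply retries.
    if curl -fsS "$wt_url" 2>/dev/null | trino_is_ready; then
      echo "Trino is ready"
      return 0
    fi
    wt_i=$((wt_i + 1))
    sleep 2
  done
  echo "Trino did not become ready after $wt_attempts attempts" >&2
  return 1
}
```

Running wait_for_trino after docker run gives the server up to about a minute to start before you open the CLI in the next step.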
Step 3: Query your Tableflow table data¶
In your terminal, run the following command to start the Trino CLI client.
docker exec -it trino trino
Your terminal prompt should now appear as trino>.
Run the following Trino SQL query to query your Tableflow table data. Use your values for these options.
- tableflow: The catalog name, which Trino takes from the name of your catalog properties file (tableflow.properties).
- <kafka-cluster-id>: Your Kafka cluster ID, which resembles lkc-xxxxxx. You can also use your Kafka cluster name.
- <table-name>: Your Kafka topic name
SELECT * FROM tableflow."<kafka-cluster-id>".<table-name>;
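If you are unsure of the schema or table names, or you want to run queries from the host without opening the interactive prompt, the Trino CLI's --execute option can help. The sketch below is illustrative only: tableflow_select and run_trino are hypothetical helpers, the container name trino matches the one used in Step 2, and the LIMIT clause is an addition to keep result sets small. Both identifiers are wrapped in double quotes because names that contain hyphens or dots must be quoted in Trino SQL.

```shell
# Sketch: build and run Tableflow queries from the host shell.
# tableflow_select and run_trino are hypothetical helpers for this example.

# Print a SELECT statement with the schema and table names double-quoted.
# Arguments: <kafka-cluster-id> <table-name> [row-limit, default 10]
tableflow_select() {
  ts_cluster=$1
  ts_table=$2
  ts_limit="${3:-10}"
  printf 'SELECT * FROM tableflow."%s"."%s" LIMIT %s\n' "$ts_cluster" "$ts_table" "$ts_limit"
}

# Run one statement through the Trino CLI inside the container named "trino".
run_trino() {
  docker exec trino trino --execute "$1"
}

# Example usage (requires the running container from Step 2):
#   run_trino 'SHOW SCHEMAS FROM tableflow'
#   run_trino 'SHOW TABLES FROM tableflow."<kafka-cluster-id>"'
#   run_trino "$(tableflow_select '<kafka-cluster-id>' '<table-name>' 10)"
```

SHOW SCHEMAS and SHOW TABLES list the names that the catalog exposes, which is a quick way to confirm the exact identifiers before running the SELECT.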