Query Iceberg Tables with Trino and Tableflow in Confluent Cloud

Trino is an open-source query engine that you can use to query your Tableflow tables. Trino can query your Apache Iceberg™ tables by connecting directly to the Tableflow Catalog via the Iceberg REST API.

In this example, you launch Trino in a Docker container on your local machine. For instructions about running Trino in a Docker container, see Trino in a Docker container.

In production, host your Trino server instance in the same cloud region as your Tableflow tables to decrease latency and reduce data egress fees.

Prerequisites

  • A Kafka topic with Tableflow enabled. For more information, see the Tableflow Quick Start.
  • A Confluent API Key with Tableflow permissions.
  • Docker is installed and running in your development environment.

Step 1: Create a catalog configuration file

  1. Create a directory named catalog.

    mkdir catalog
    
  2. In the catalog directory, create a file named tableflow.properties.

    touch catalog/tableflow.properties
    
  3. Copy the following property settings into tableflow.properties.

    # tableflow.properties
    
    connector.name=iceberg
    iceberg.catalog.type=rest
    iceberg.rest-catalog.oauth2.credential=<api-key>:<api-secret>
    iceberg.rest-catalog.security=OAUTH2
    iceberg.rest-catalog.uri=<Tableflow REST Catalog URI>
    # REST Catalog URI Example: https://tableflow.{CLOUD_REGION}.aws.confluent.cloud/iceberg/catalog/organizations/{ORG_ID}/environments/{ENV_ID}
    iceberg.rest-catalog.vended-credentials-enabled=true
    
    fs.native-s3.enabled=true
    s3.region=<Confluent Cluster Region>
    # S3 Region Example: us-west-2
    
    iceberg.security=read_only
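
If you prefer not to paste credentials by hand, the file can be generated from environment variables. The following is a minimal sketch, not a Confluent-provided tool: the function name write_tableflow_properties and the TABLEFLOW_* variable names are hypothetical and chosen only for illustration. It writes the same properties shown above and refuses to write the file if any value is missing.

```shell
# Hypothetical helper: writes the catalog file from environment variables.
# The TABLEFLOW_* names are illustrative assumptions, not Confluent-defined names.
write_tableflow_properties() {
  out_file="$1"

  # Refuse to write a half-configured file.
  if [ -z "${TABLEFLOW_API_KEY:-}" ] || [ -z "${TABLEFLOW_API_SECRET:-}" ] ||
     [ -z "${TABLEFLOW_CATALOG_URI:-}" ] || [ -z "${TABLEFLOW_S3_REGION:-}" ]; then
    echo "error: set TABLEFLOW_API_KEY, TABLEFLOW_API_SECRET, TABLEFLOW_CATALOG_URI, and TABLEFLOW_S3_REGION" >&2
    return 1
  fi

  # Create the target directory (for example, catalog/) if it does not exist.
  mkdir -p "$(dirname "$out_file")"

  # Same property settings as in the step above, with the placeholders filled in.
  cat > "$out_file" <<EOF
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.oauth2.credential=${TABLEFLOW_API_KEY}:${TABLEFLOW_API_SECRET}
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.uri=${TABLEFLOW_CATALOG_URI}
iceberg.rest-catalog.vended-credentials-enabled=true

fs.native-s3.enabled=true
s3.region=${TABLEFLOW_S3_REGION}

iceberg.security=read_only
EOF
}

# Example call, after exporting the four variables:
#   write_tableflow_properties catalog/tableflow.properties
```

Because the generated file contains your API secret, keep it out of version control.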
    

Step 2: Start a Trino container

  1. In your terminal, change into the catalog directory.

    cd catalog
    
  2. Run the following command to start the Trino container with these options.

    • --name: Assign a name to the container.
    • -d: Run the container in the background.
    • -p: Publish the container’s port to the host.
    • --volume: Bind mount the current directory, which holds tableflow.properties, to /etc/trino/catalog in the container.

    docker run --name trino -d -p 8080:8080 --volume "$PWD":/etc/trino/catalog trinodb/trino
    
  3. Run the following command to check that your Trino container is running.

    docker ps
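
A running container does not necessarily mean that Trino has finished starting. As a rough sketch, you can poll the coordinator's /v1/info endpoint on the port published above and wait until it reports "starting": false. The function names trino_is_ready and wait_for_trino are hypothetical, the sketch assumes curl is available on your host, and the retry count and delay are arbitrary defaults.

```shell
# Succeeds when the JSON read from stdin reports that startup has finished.
trino_is_ready() {
  grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Polls the info endpoint until Trino is ready or the attempts run out.
# $1: info URL (default assumes the -p 8080:8080 mapping from the step above)
# $2: maximum number of attempts, two seconds apart (arbitrary default)
wait_for_trino() {
  url="${1:-http://localhost:8080/v1/info}"
  attempts="${2:-30}"
  while [ "$attempts" -gt 0 ]; do
    if curl -fsS "$url" 2>/dev/null | trino_is_ready; then
      echo "Trino is ready"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  echo "Trino did not become ready in time" >&2
  return 1
}

# Example call:
#   wait_for_trino
```

If the wait times out, the output of docker logs trino usually shows whether the catalog configuration failed to load.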
    

Step 3: Query your Tableflow table data

  1. In your terminal, run the following command to start the Trino CLI client.

    docker exec -it trino trino

    Your terminal prompt should now appear as trino>.
    
  2. Run the following SQL query to read your Tableflow table data, substituting your own values for these names.

    • tableflow: The catalog name, which matches the name of your catalog properties file without the .properties extension.
    • <kafka-cluster-id>: Your Kafka cluster ID, which resembles lkc-xxxxxx. Alternatively, use your Kafka cluster name.
    • <table-name>: Your Kafka topic name.

    SELECT * FROM tableflow."<kafka-cluster-id>".<table-name>;
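
Topic names and cluster IDs often contain hyphens or dots, which Trino accepts only inside double-quoted identifiers. The sketch below builds a fully quoted, row-limited query that you can pass to the Trino CLI non-interactively with its --execute option. The function names quote_ident and build_query are hypothetical, and lkc-abc123 and my-topic are placeholder values rather than real resources.

```shell
# Wraps an identifier in double quotes, doubling any embedded double quotes.
quote_ident() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Builds a SELECT with a row limit.
# $1: catalog, $2: Kafka cluster ID or name, $3: table (topic) name,
# $4: row limit (defaults to 10)
build_query() {
  printf 'SELECT * FROM %s.%s.%s LIMIT %s' \
    "$(quote_ident "$1")" "$(quote_ident "$2")" "$(quote_ident "$3")" "${4:-10}"
}

# Example call (requires the running container from Step 2):
#   docker exec trino trino --execute "$(build_query tableflow lkc-abc123 my-topic)"
```

The LIMIT clause keeps exploratory queries from scanning and returning an entire table; remove it or raise the limit once you know the table's size.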