Configure Storage for Tableflow in Confluent Cloud

Apache Iceberg™ tables managed by Tableflow are stored in Confluent Managed Storage or in a custom storage provider.

For more information, see Storage with Tableflow.

Use Confluent Managed Storage

Confluent Managed Storage is the default storage option for Tableflow. No additional configuration is required: simply enable Tableflow on your topic.

To access tables stored in Confluent Managed Storage, use a query engine that supports the Iceberg REST API and vended credentials.
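
For example, a Python client such as PyIceberg can read these tables through the REST catalog. The sketch below is illustrative only: the endpoint shown is a placeholder, and the credential format and the X-Iceberg-Access-Delegation header are assumptions based on the Iceberg REST specification. Substitute the REST catalog endpoint and Tableflow API key shown for your cluster in Confluent Cloud.

    # Minimal sketch (Python + PyIceberg), assuming a Tableflow API key and
    # the REST catalog endpoint from the Confluent Cloud Console.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "tableflow",
        **{
            "type": "rest",
            # Placeholder -- use the REST catalog URI shown for your cluster.
            "uri": "https://tableflow.<region>.aws.confluent.cloud/iceberg/catalog/organizations/<org-id>/environments/<env-id>",
            # Tableflow API key and secret, passed as OAuth client credentials.
            "credential": "<tableflow-api-key>:<tableflow-api-secret>",
            # Request short-lived (vended) storage credentials for table reads.
            "header.X-Iceberg-Access-Delegation": "vended-credentials",
        },
    )

    # Tables are addressed by Kafka cluster ID (namespace) and topic name.
    table = catalog.load_table("<lkc-id>.<topic-name>")
    print(table.scan(limit=10).to_arrow())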

Cleaning up tables stored in Confluent Managed Storage

Table data for topics that have had Tableflow disabled is deleted automatically within a few days. Tableflow doesn’t have user-facing controls to immediately delete tables that are stored in Confluent Managed Storage. To have these tables deleted before automatic deletion is triggered, you must file a support request.

Bring Your Own Storage (BYOS)

The following steps show how to use Tableflow with S3 for storage.

Important

You should start with an empty bucket when you first enable Tableflow. Existing objects in the bucket may cause Tableflow to fail to start or may be lost entirely during initialization.

  1. In AWS, create a bucket in S3.

  2. Set up a Confluent Cloud provider integration for Tableflow. In the permissions policy that you attach to the integration's IAM role, specify the bucket you created previously, as in the example below. (A scripted version of steps 1 and 2 appears after this procedure.)

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads"
          ],
          "Resource": [
            "arn:aws:s3:::<bucket-name>"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:PutObjectTagging",
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload",
            "s3:ListMultipartUploadParts"
          ],
          "Resource": [
            "arn:aws:s3:::<bucket-name>/*"
          ]
        }
      ]
    }
    
  3. When you enable Tableflow, specify the provider integration and bucket name.
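
If you prefer to script steps 1 and 2, the boto3 sketch below creates the bucket and attaches the permissions policy shown above to the IAM role used by the provider integration. The region, role name, and policy name are illustrative placeholders, not values required by Tableflow.

    # Sketch (Python + boto3) of steps 1-2: create the S3 bucket and attach
    # the permissions policy to the provider integration's IAM role.
    import json
    import boto3

    BUCKET = "<bucket-name>"
    REGION = "us-west-2"                        # example region
    ROLE_NAME = "<provider-integration-role>"   # IAM role assumed by the provider integration

    # Step 1: create the (empty) bucket.
    s3 = boto3.client("s3", region_name=REGION)
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )

    # Step 2: attach the S3 permissions policy from above to the role.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucketMultipartUploads"],
                "Resource": [f"arn:aws:s3:::{BUCKET}"],
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
            },
        ],
    }
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=ROLE_NAME,
        PolicyName="tableflow-s3-access",
        PolicyDocument=json.dumps(policy),
    )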

For more information, see Tableflow Quick Start Using Your Storage.

Cleaning up Tableflow files in Amazon S3

Table data for topics that have had Tableflow disabled is deleted automatically within a few days. If you require your data to be deleted sooner, follow these steps to delete your table from Amazon S3 manually.

  1. Disable Tableflow on the topic that you’re deleting the table for.

  2. Identify the object prefix (folder) that is used for the table.

    • Tables stored in your bucket have the following prefix naming convention: s3://<bucket-name>/######/######/<UUID>/<env-id>/<lkc-id>/<version>/<UUID-of-table>/.

    • Option 1: If you’re using AWS Glue Data Catalog, find the table’s location column, which has a value that resembles: s3://<table-prefix>/metadata/<current-table-metadata>/. Note that deleting a table from the AWS Glue Data Catalog does not delete the table data itself.

    • Option 2: From a query engine that has access to the table, run the following query to find the table location.

      SELECT
        file_path
      FROM
        `[lkc ID]`.`[table name]`.files                  -- Spark
      --`[lkc ID]`.`[table name]$files`                  -- Trino
      --"AwsDataCatalog"."[lkc ID]"."[table name]$files" -- Athena
      LIMIT 10;
      
  3. Delete all objects in the bucket that match the table’s object prefix (folder). (A scripted version of steps 3 and 4 appears after this procedure.)

    [Screenshot: deleting Tableflow-created objects in the Amazon S3 console]
  4. If you’re using AWS Glue Data Catalog, delete the table entry from the catalog.
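
The boto3 sketch below scripts steps 3 and 4: it deletes every object under the table prefix identified in step 2 and then removes the table entry from AWS Glue Data Catalog. The bucket name, prefix, and Glue database and table names are placeholders.

    # Sketch (Python + boto3) of steps 3-4: remove the table's objects and,
    # if applicable, its AWS Glue Data Catalog entry.
    import boto3

    BUCKET = "<bucket-name>"
    # Object prefix (folder) identified in step 2, without the s3://<bucket-name>/ part.
    TABLE_PREFIX = "<table-prefix>/"

    # Step 3: delete all objects under the table prefix (batched automatically).
    s3 = boto3.resource("s3")
    s3.Bucket(BUCKET).objects.filter(Prefix=TABLE_PREFIX).delete()

    # Step 4 (only if you use AWS Glue Data Catalog): drop the table entry.
    glue = boto3.client("glue")
    glue.delete_table(DatabaseName="<lkc-id>", Name="<topic-name>")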