Storage with Tableflow in Confluent Cloud
Apache Iceberg™ and Delta Lake tables managed by Tableflow are stored in Confluent Managed Storage or in a custom storage provider, which is referred to as “external object storage”, “Bring Your Own Storage”, or “Bring Your Own Bucket”.
Confluent Managed Storage
Tableflow can store your Iceberg tables in Confluent Managed Storage, Confluent’s “batteries included” storage option for Tableflow. No additional configuration is required to use Confluent Managed Storage with Tableflow. Access to Confluent Managed Storage and your Tableflow-enabled Iceberg tables is controlled by your Confluent Cloud Access Controls.
Encryption with self-managed keys
For clusters with self-managed encryption keys (BYOK), Tableflow with Confluent Managed Storage automatically inherits the same encryption key. For detailed information about encryption behavior, setup workflows, and validation procedures, see Use self-managed encryption keys with Tableflow.
To access tables stored in Confluent Managed Storage, you must use a query engine that supports the Iceberg REST API and vended credentials. Apache Spark® and Trino are query engines that work with Confluent Managed Storage.
Tableflow tables that use Confluent Managed Storage are not compatible with external catalogs, like AWS Glue.
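As an illustration of the query-engine requirement above, the following minimal sketch builds the Spark properties that an Apache Iceberg REST catalog typically needs when the catalog vends storage credentials. The property keys are standard Iceberg Spark catalog options. The function name, the catalog name, the endpoint URL, and the credential value are hypothetical placeholders, not values taken from Confluent Cloud, and your deployment may need further properties.

```python
def iceberg_rest_catalog_conf(catalog_name, rest_uri, credential):
    """Return Spark properties for an Iceberg REST catalog.

    All argument values are placeholders supplied by the caller; this
    helper is illustrative and not part of any Confluent or Iceberg API.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enable Iceberg SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register the catalog and point it at a REST endpoint.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        # Credential used to authenticate to the REST catalog (assumed "key:secret" form).
        f"{prefix}.credential": credential,
        # Ask the catalog to vend short-lived storage credentials.
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }


# Usage sketch (requires pyspark and the Iceberg Spark runtime on the classpath):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("tableflow-demo")
#   for key, value in iceberg_rest_catalog_conf(
#           "tableflow", "<iceberg-rest-endpoint>", "<api-key>:<api-secret>").items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Keeping the properties in a plain dictionary makes it easy to reuse the same settings for other engines or to inspect them before starting a session.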
Bring Your Own Storage (BYOS)
The Bring Your Own Storage (BYOS) feature in Tableflow enables you to use your existing storage, such as Amazon S3 buckets or Azure Data Lake Storage Gen2, for your Tableflow tables. With BYOS, you keep control over where your data is stored while Tableflow manages the tables.
To use BYOS, Tableflow requires a provider integration configured in the environment that contains the topic you’re enabling Tableflow on. Multiple Tableflow-enabled topics can use the same storage and provider integration. Your storage must be located in the same region as your Kafka cluster.
Important
You should start with an empty bucket when you first enable Tableflow. Existing objects in the bucket may cause Tableflow to fail to start or may be lost entirely during initialization.
Do not directly modify or delete objects from this bucket. Doing so may lead to table corruption.
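The two preconditions above for Amazon S3 (same region as the Kafka cluster, and an empty bucket) can be checked before enabling Tableflow. The following is a minimal sketch, not a Confluent tool: `preflight_bucket_check` is a hypothetical helper, and it assumes an S3 client object that exposes `get_bucket_location` and `list_objects_v2` in the way the boto3 S3 client does.

```python
def preflight_bucket_check(s3_client, bucket, cluster_region):
    """Return a list of problems found; an empty list means both checks passed.

    s3_client is assumed to behave like a boto3 S3 client.
    cluster_region is the cloud region of the Kafka cluster, e.g. "us-west-2".
    """
    problems = []

    # S3 reports buckets in us-east-1 with a null LocationConstraint.
    location = s3_client.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    bucket_region = location or "us-east-1"
    if bucket_region != cluster_region:
        problems.append(
            f"bucket is in {bucket_region}, but the Kafka cluster is in {cluster_region}"
        )

    # Fetching at most one key is enough to tell whether the bucket is empty.
    listing = s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1)
    if listing.get("KeyCount", 0) > 0:
        problems.append("bucket is not empty")

    return problems


# Usage sketch (requires boto3 and AWS credentials; the bucket name is a placeholder):
#   import boto3
#   issues = preflight_bucket_check(boto3.client("s3"), "<your-bucket>", "us-west-2")
#   if issues:
#       raise SystemExit("; ".join(issues))
```

The helper only reads bucket metadata and a single listing entry, so it does not modify or delete any objects.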
Amazon S3 storage encryption
If your Amazon S3 destination storage uses self-managed encryption keys (SSE-KMS or DSSE-KMS), see Use self-managed encryption keys with Tableflow for KMS permission requirements and setup procedures.
Once you have enabled Tableflow on a topic with an Amazon S3 bucket as the storage, you can’t update that Tableflow-enabled topic to use another bucket or to use Confluent Managed Storage.
Azure storage encryption
Tableflow works with Azure Storage service-side encryption (SSE). The encryption is transparent to Tableflow and requires no additional configuration. For more information, see Azure Storage encryption for data at rest.