Monitor Tableflow in Confluent Cloud

You start Tableflow by enabling it on an Apache Kafka® topic. Once enabled, you can monitor its status and progress by using the Confluent Cloud Console. You can also set up integrations with monitoring services like Prometheus and Datadog.

Monitor Tableflow in Cloud Console

  1. Log in to the Cloud Console.

  2. Navigate to the cluster that has your Tableflow-enabled topic.

  3. Select the Topics tab, and under Show/hide columns, ensure Tableflow is selected.

    The list displays which topics have Tableflow enabled and the syncing status of each. Ensure that Tableflow is enabled on at least one topic.

  4. Select a Tableflow-enabled topic.

    The Tableflow status displays under the topic name. Important Tableflow information also appears as cards in the Tableflow section, under the Kafka stats section.

  5. To see cluster-level metrics, including storage details, click the Tableflow tab.

Tableflow status

Once Tableflow is enabled on a topic, it reports one of the following statuses.

  • Config Issue: There was a problem with the specified configurations. More details about the issue are shown in an additional error field.

  • Degraded: There is an internal issue with Tableflow. More details about the issue are shown in an additional error field.

  • Pending: Tableflow is being enabled for the first time and is checking configurations.

  • Running/Syncing: Tableflow is successfully syncing incoming Kafka data to the Apache Iceberg™ table.
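For alerting, the statuses above split into healthy and unhealthy states. The mapping below is a minimal sketch for illustration, not a Confluent API:

```python
# Statuses from the list above; Config Issue and Degraded indicate
# a problem that may need operator attention.
UNHEALTHY_STATUSES = {"Config Issue", "Degraded"}

def needs_attention(status: str) -> bool:
    """Return True when a Tableflow status should trigger an alert."""
    return status in UNHEALTHY_STATUSES

print(needs_attention("Degraded"))         # prints True
print(needs_attention("Running/Syncing"))  # prints False
```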

Metrics in Cloud Console

Tableflow emits the following metrics through the Confluent Cloud Metrics API.

Bytes Compacted

Metric name: bytes_compacted

The number of bytes compacted.

Bytes Processed

Metric name: bytes_processed

The number of bytes processed by Tableflow. This value includes the bytes of Kafka data read before materialization and bytes read as part of compaction.

Bytes Written

Metric name: bytes_added

The number of bytes written to the table by Tableflow.

Compaction Duration

Metric name: compaction_duration

The amount of time taken for compaction.

Compactions Pending

Metric name: compactions_pending

The number of pending compactions.

Confluent Managed Storage used

Metric name: storage

The amount of data managed by Confluent-owned buckets for Tableflow.

Files Compacted

Metric name: files_compacted

The total number of files that have been used for compaction by Tableflow.

Kafka Bytes Read

Metric name: bytes_read

The number of bytes read by Tableflow.

Kafka Rows Read

Metric name: rows_read

The number of records read for a Tableflow-enabled topic.

Latest Kafka Topic Offset

Metric name: kafka_topic_offset

The latest offset read from Kafka for a Tableflow-enabled topic.

Latest Table Offset

Metric name: table_offset

The latest offset of the persisted record for a Tableflow-enabled topic.
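Comparing these two offsets gives a simple measure of how far table materialization lags behind the topic. The helper below is an illustrative sketch, not part of the Metrics API:

```python
def tableflow_lag(kafka_topic_offset: int, table_offset: int) -> int:
    """Return how many records have been read from Kafka but not yet
    persisted to the Iceberg table (0 means fully caught up)."""
    return max(kafka_topic_offset - table_offset, 0)

# Topic is at offset 1,000,000; the table has persisted up to 999,500.
print(tableflow_lag(1_000_000, 999_500))  # prints 500
```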

Number of Topics

Metric name: num_topics

The total number of Tableflow-enabled topics.

Number of Rejected Rows

Metric name: rows_skipped

The total number of Kafka records that Tableflow could not process and add to the table.

Rows Written

Metric name: rows_added

The total number of rows that have been committed to the table by Tableflow.

Snapshots Generated

Metric name: snapshots_generated

The total number of Iceberg snapshots generated for the table by Tableflow.
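You can query these metrics programmatically by posting a query to the Confluent Cloud Metrics API (POST https://api.telemetry.confluent.cloud/v2/metrics/cloud/query). The sketch below only builds the request body; the fully qualified metric name and the cluster ID are assumptions for illustration, so confirm the exact Tableflow metric names against the API's metric descriptors:

```python
import json

def build_metrics_query(metric: str, cluster_id: str, interval: str) -> dict:
    """Build a Metrics API query body that aggregates a metric per topic.

    The metric name passed in is assumed; list the real Tableflow
    metric descriptors via the Metrics API before using it.
    """
    return {
        "aggregations": [{"metric": metric}],
        "filter": {
            "field": "resource.kafka.id",
            "op": "EQ",
            "value": cluster_id,
        },
        "granularity": "PT1M",
        "group_by": ["metric.topic"],
        "intervals": [interval],
    }

payload = build_metrics_query(
    "io.confluent.kafka.server/tableflow_rows_skipped",  # assumed name
    "lkc-abc123",                                        # example cluster ID
    "2024-01-01T00:00:00Z/PT1H",
)
print(json.dumps(payload, indent=2))
```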

Best practices for alerting

Use the Metrics API to monitor your Tableflow-enabled topics over time. You should monitor and configure alerts for the following conditions:

  • Per topic

    • Alert on Tableflow status changes.

    • Alert on unexpected increases in the number of rows rejected.

  • Per cluster

    • Alert on unexpected changes to storage used.
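For example, an alert on rejected rows can compare successive samples of the cumulative rows_skipped total and fire when the delta over a sampling window exceeds a threshold. This helper is an illustrative sketch, not a Confluent API:

```python
def rejected_rows_alert(prev_total: int, curr_total: int, threshold: int) -> bool:
    """Fire when the number of newly rejected rows in the sampling
    window exceeds the threshold. The totals are cumulative, so the
    delta is the per-window count."""
    return (curr_total - prev_total) > threshold

# 40 rows rejected since the last sample, threshold 25 -> alert fires.
print(rejected_rows_alert(1_000, 1_040, 25))  # prints True
```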

Error and recovery

If Tableflow encounters an issue, its status switches to Config Issue or Degraded.

Tableflow enters the Suspend state due to a recoverable failure

Tableflow enters the Suspend state when it encounters an error that is likely to require user intervention, or when materialization fails and the Failure mode is set to Suspend. Tableflow remains in the Suspend state until you manually resume it by sending a PATCH request to the API with the spec.suspend property set to false.
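As a sketch, a resume request could be shaped as follows. The URL path and identifiers are assumptions for illustration; check the Tableflow API reference for the exact route and property names:

```python
import json

# Illustrative identifiers; substitute your own cluster and topic.
cluster_id = "lkc-abc123"
topic_name = "orders"

# Assumed route shape for the Tableflow topic resource.
url = (
    "https://api.confluent.cloud/tableflow/v1/tableflow-topics/"
    f"{topic_name}?spec.kafka_cluster={cluster_id}"
)

# Setting the suspend property to false asks Tableflow to resume syncing.
body = {"spec": {"suspend": False}}

print("PATCH", url)
print(json.dumps(body))
```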

Recovery after unrecoverable failure

There are situations in which Tableflow is unable to write new data to an existing table, for example, when your table objects are corrupted or missing, or when you make a table-breaking change to the schema used by incoming messages.

First, back up your data by using a query engine to copy it to a new table, or by copying the objects directly from your storage to another location. Copy the data to a location outside the location where Tableflow manages the table, so the backup isn't deleted when Tableflow cleans up storage.
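As an illustrative sketch of the object-copy approach (the key layout and prefixes are assumptions; any storage copy tool works):

```python
def plan_backup_copies(object_keys, src_prefix, dst_prefix):
    """Return (source_key, destination_key) pairs that copy every
    object under the Tableflow-managed prefix into a backup prefix
    outside the managed location."""
    return [
        (key, dst_prefix + key[len(src_prefix):])
        for key in object_keys
        if key.startswith(src_prefix)
    ]

# Example keys under an assumed table prefix.
keys = [
    "tables/orders/data/part-00000.parquet",
    "tables/orders/metadata/v3.metadata.json",
]
for src, dst in plan_backup_copies(keys, "tables/orders/", "backup/orders/"):
    print(src, "->", dst)
```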

You can then disable Tableflow on the topic. Tableflow automatically removes the associated table files within a few days. If you need to re-enable Tableflow on the topic sooner, contact support.