Monitor Tableflow in Confluent Cloud

You start Tableflow by enabling it on an Apache Kafka® topic. Once enabled, you can monitor its status and progress in the Confluent Cloud Console. You can also set up integrations with monitoring services like Prometheus and Datadog.

Monitor Tableflow in Cloud Console

  1. Log in to the Cloud Console.

  2. Navigate to the cluster that has your Tableflow-enabled topic.

  3. Select the Topics tab, and under Show/hide columns, ensure Tableflow is selected.

    The topics that have Tableflow enabled and their syncing status are displayed. Ensure that Tableflow is enabled on at least one topic.

  4. Select a Tableflow-enabled topic.

    The Tableflow status is displayed under the topic name. Important Tableflow information also appears as cards in the Tableflow section, under the Kafka stats section.

  5. To see cluster-level metrics, including storage details, click the Tableflow tab.

Tableflow status

Once Tableflow is enabled on a topic, it emits a general status. A sketch of how to retrieve the status programmatically follows the list.

  • Config Issue: There was a problem with the specified configurations. More details about the issue are shown in an additional error field.
  • Degraded: There is an internal issue with Tableflow. More details about the issue are shown in an additional error field.
  • Pending: Tableflow is enabling for the first time and checking configurations.
  • Running/Syncing: Tableflow is successfully syncing incoming Kafka data to the Apache Iceberg™ table.
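
If you want to check this status outside of the Cloud Console, you can retrieve it from the Tableflow REST API. The following Python sketch is illustrative only: the endpoint path, query parameters, authentication scheme, and response fields are assumptions and should be confirmed against the Tableflow API reference.

    # Minimal sketch: fetch the Tableflow status of a topic from the Confluent Cloud API.
    # The endpoint path, query parameters, and response fields are assumptions; confirm
    # them against the Tableflow API reference before relying on this.
    import requests

    API_KEY = "YOUR_CLOUD_API_KEY"      # Cloud API key/secret, assumed basic-auth scheme
    API_SECRET = "YOUR_CLOUD_API_SECRET"
    ENV_ID = "env-abc123"               # hypothetical environment ID
    CLUSTER_ID = "lkc-abc123"           # hypothetical Kafka cluster ID
    TOPIC = "orders"                    # hypothetical Tableflow-enabled topic

    resp = requests.get(
        f"https://api.confluent.cloud/tableflow/v1/tableflow-topics/{TOPIC}",
        params={"environment": ENV_ID, "spec.kafka_cluster": CLUSTER_ID},
        auth=(API_KEY, API_SECRET),
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json().get("status", {}))  # expect a phase such as Pending, Running, or Degraded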

Metrics in Cloud Console

Tableflow emits the following metrics through the Confluent Cloud Metrics API. A sketch of a Metrics API query follows the list.

Bytes Compacted
Metric name: bytes_compacted. The number of bytes compacted.
Bytes Processed
Metric name: bytes_processed. The number of bytes processed by Tableflow. This value includes the bytes of Kafka data read before materialization and the bytes read as part of compaction.
Bytes Written
Metric name: bytes_added. The number of bytes written to the table by Tableflow.
Compaction Duration
Metric name: compaction_duration. The amount of time taken for compaction.
Compactions Pending
Metric name: compactions_pending. The number of pending compactions.
Confluent Managed Storage Used
Metric name: storage. The amount of data managed in Confluent-owned buckets for Tableflow.
Files Compacted
Metric name: files_compacted. The total number of files used for compaction by Tableflow.
Kafka Bytes Read
Metric name: bytes_read. The number of bytes read by Tableflow.
Kafka Rows Read
Metric name: rows_read. The number of records read for a Tableflow-enabled topic.
Latest Kafka Topic Offset
Metric name: kafka_topic_offset. The latest offset of the record read from Kafka for a Tableflow-enabled topic.
Latest Table Offset
Metric name: table_offset. The latest offset of the persisted record for a Tableflow-enabled topic.
Number of Topics
Metric name: num_topics. The total number of Tableflow-enabled topics.
Number of Rejected Rows
Metric name: rows_skipped. The total number of Kafka records that could not be processed and added to the table by Tableflow.
Rows Written
Metric name: rows_added. The total number of rows committed to the table by Tableflow.
Snapshots Generated
Metric name: snapshots_generated. The total number of Iceberg snapshots that Tableflow has generated for the table.
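
As an illustration of querying one of these metrics programmatically, the following Python sketch posts a query to the Metrics API for a single cluster, grouped by topic. The query shape follows the standard Metrics API format; the fully qualified metric identifier and the available labels for Tableflow metrics are assumptions, so look them up in the Metrics API reference.

    # Minimal sketch: query the Confluent Cloud Metrics API for a Tableflow metric.
    # The fully qualified metric name (its io.confluent... prefix) and the group_by label
    # are assumptions; check the Metrics API reference for the exact identifiers.
    import requests

    API_KEY = "YOUR_CLOUD_API_KEY"
    API_SECRET = "YOUR_CLOUD_API_SECRET"
    METRIC = "bytes_added"  # replace with the fully qualified name from the Metrics API reference

    query = {
        "aggregations": [{"metric": METRIC}],
        "filter": {"field": "resource.kafka.cluster.id", "op": "EQ", "value": "lkc-abc123"},
        "granularity": "PT1M",
        "intervals": ["2024-01-01T00:00:00Z/PT1H"],
        "group_by": ["metric.topic"],
    }

    resp = requests.post(
        "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query",
        json=query,
        auth=(API_KEY, API_SECRET),
        timeout=30,
    )
    resp.raise_for_status()
    for point in resp.json().get("data", []):
        print(point)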

Materialization lag

Materialization lag is the offset delay between the most recent message offset in Kafka and the most recent message offset that Tableflow has added to your table. Tableflow doesn’t emit this metric directly, but you can calculate it as the difference between kafka_topic_offset and table_offset. Both metrics are emitted at a per-partition level.
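
For example, once you have the latest per-partition values of both metrics (for instance from a Metrics API query grouped by partition), the lag is a simple per-partition difference:

    # Minimal sketch: compute materialization lag per partition from the two offsets.
    # kafka_offsets and table_offsets are hypothetical dicts of partition -> latest offset,
    # for example built from per-partition Metrics API results.
    kafka_offsets = {0: 120_500, 1: 98_200, 2: 101_750}
    table_offsets = {0: 120_100, 1: 98_200, 2: 99_900}

    lag = {p: kafka_offsets[p] - table_offsets.get(p, 0) for p in kafka_offsets}
    print(lag)  # {0: 400, 1: 0, 2: 1850}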

Best practices for alerting

Use the Metrics API to monitor your Tableflow-enabled topics over time. You should monitor and configure alerts for the following conditions (a sketch of a simple rejected-rows check follows the list):

  • Per partition
    • Alert on unexpected increases in materialization lag.
  • Per topic
    • Alert on Tableflow status changes.
    • Alert on unexpected increases in the number of rejected rows.
  • Per cluster
    • Alert on unexpected changes to storage used.
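
As a rough sketch of the per-topic rejected-rows alert, the following Python check compares the latest cumulative rows_skipped value against the previous observation and flags any increase. How you collect the samples and deliver the alert (Metrics API polling, Prometheus, Datadog) depends on your monitoring stack.

    # Minimal sketch: flag an unexpected increase in rejected rows for a topic.
    # previous_total and current_total are hypothetical cumulative rows_skipped samples
    # taken from two successive Metrics API queries.
    def check_rejected_rows(topic: str, previous_total: int, current_total: int) -> None:
        delta = current_total - previous_total
        if delta > 0:
            print(f"ALERT: {delta} rows rejected for topic {topic} since the last check")

    check_rejected_rows("orders", previous_total=10, current_total=42)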

Error and recovery

If Tableflow encounters an issue, its status switches to Config Issue or Degraded.

Tableflow enters the Suspend state due to a recoverable failure

Tableflow enters the Suspend state when it encounters an error that is likely to require user intervention, or when materialization fails and the Failure mode is set to Suspend. Tableflow remains in the Suspend state until it is manually resumed by sending a PATCH request to the API with the spec.suspended property set to false.
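
A sketch of that resume request is shown below. The endpoint path, query parameters, and request body mirror the description above but are assumptions; confirm them against the Tableflow API reference.

    # Minimal sketch: resume a suspended Tableflow topic with a PATCH request.
    # The endpoint path, query parameters, and body shape are assumptions based on the
    # description above; confirm them against the Tableflow API reference.
    import requests

    API_KEY = "YOUR_CLOUD_API_KEY"
    API_SECRET = "YOUR_CLOUD_API_SECRET"

    resp = requests.patch(
        "https://api.confluent.cloud/tableflow/v1/tableflow-topics/orders",
        params={"environment": "env-abc123", "spec.kafka_cluster": "lkc-abc123"},
        json={"spec": {"suspended": False}},  # False resumes; True keeps Tableflow suspended
        auth=(API_KEY, API_SECRET),
        timeout=30,
    )
    resp.raise_for_status()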

Recovery after unrecoverable failure

There are situations in which Tableflow can't write new data to an existing table, for example, when your table objects are corrupted or missing, or when you make a table-breaking change to the schema used on incoming messages.

First, back up your data by using a query engine to copy it to a new table, or by copying the objects directly from your storage to another location. Copy your data to a location outside of the storage location that Tableflow manages for the table, so the backup isn't deleted when Tableflow cleans up storage.
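
For example, with a Spark-based query engine that already has the Tableflow table registered in an Iceberg catalog, a backup copy might look like the following sketch. The catalog, namespace, and table names are placeholders, and the catalog configuration (endpoint, credentials, warehouse location) is assumed to be set up separately.

    # Minimal sketch: copy a Tableflow-managed Iceberg table to a backup table with Spark SQL.
    # Catalog, namespace, and table names are placeholders; the backup table must live in a
    # location that Tableflow does not manage, so it survives Tableflow storage cleanup.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tableflow-backup").getOrCreate()
    spark.sql(
        "CREATE TABLE backup_catalog.backups.orders_backup "
        "AS SELECT * FROM tableflow_catalog.my_cluster.orders"
    )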

After backing up your data, the best way to recover is to disable and then re-enable Tableflow. When Tableflow is enabled or re-enabled, it creates a new table and starts processing data from the beginning of the topic, not from where it previously left off. This means that if you encountered an error due to a breaking schema change, you must delete or expire all the Kafka messages up to the point of the change, so Tableflow can initialize on the new schema.