Monitor Tableflow in Confluent Cloud¶
You start Tableflow by enabling it on an Apache Kafka® topic. Once enabled, you can monitor its status and progress in the Confluent Cloud Console, and you can set up integrations with monitoring services like Prometheus and Datadog.
Monitor Tableflow in Cloud Console¶
Log in to the Cloud Console.
Navigate to the cluster that has your Tableflow-enabled topic.
Select the Topics tab, and under Show/hide columns, ensure Tableflow is selected.
The topics that have Tableflow enabled are displayed, along with their syncing status. Ensure that Tableflow is enabled on at least one topic.
Select a Tableflow-enabled topic.
The Tableflow status is displayed under the topic name. Also, important Tableflow information is shown as cards in the Tableflow section, under the Kafka stats section.
To see cluster-level metrics, including storage details, click the Tableflow tab.
Tableflow status¶
Once Tableflow is enabled on a topic, it reports one of the following general statuses.
- Config Issue: There was a problem with the specified configurations. More details about the issue are shown in an additional error field.
- Degraded: There is an internal issue with Tableflow. More details about the issue are shown in an additional error field.
- Pending: Tableflow is enabling for the first time and checking configurations.
- Running/Syncing: Tableflow is successfully syncing incoming Kafka data to the Apache Iceberg™ table.
Metrics in Cloud Console¶
Tableflow emits the following metrics, which are available through the Confluent Cloud Metrics API. A query example follows the list.
- Bytes Compacted (bytes_compacted): The number of bytes compacted.
- Bytes Processed (bytes_processed): The number of bytes processed by Tableflow, including the bytes of Kafka data read before materialization and the bytes read as part of compaction.
- Bytes Written (bytes_added): The number of bytes written to the table by Tableflow.
- Compaction Duration (compaction_duration): The amount of time taken for compaction.
- Compactions Pending (compactions_pending): The number of pending compactions.
- Confluent Managed Storage Used (storage): The amount of data managed in Confluent-owned buckets for Tableflow.
- Files Compacted (files_compacted): The total number of files that have been used for compaction by Tableflow.
- Kafka Bytes Read (bytes_read): The number of bytes read by Tableflow.
- Kafka Rows Read (rows_read): The number of records read for a Tableflow-enabled topic.
- Latest Kafka Topic Offset (kafka_topic_offset): The latest offset of the record read from Kafka for a Tableflow-enabled topic.
- Latest Table Offset (table_offset): The latest offset of the record persisted to the table for a Tableflow-enabled topic.
- Number of Topics (num_topics): The total number of Tableflow-enabled topics.
- Number of Rejected Rows (rows_skipped): The total number of Kafka records that Tableflow could not process and add to the table.
- Rows Written (rows_added): The total number of rows that have been committed to the table by Tableflow.
- Snapshots Generated (snapshots_generated): The total number of Iceberg snapshots that Tableflow has generated for the table.
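These metrics can also be queried programmatically. The following Python sketch posts a query to the Metrics API query endpoint; the fully qualified metric name, filter field, and cluster ID shown are assumptions for illustration, so confirm the exact names against the Metrics API metric descriptors for your account before relying on them.

```python
import requests

# Query the Confluent Cloud Metrics API for a Tableflow metric.
METRICS_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query"
API_KEY, API_SECRET = "CLOUD_API_KEY", "CLOUD_API_SECRET"  # Cloud API credentials

query = {
    # Assumed fully qualified metric name; look up the real name via the
    # Metrics API metric descriptors before using this in production.
    "aggregations": [{"metric": "io.confluent.kafka.server/tableflow_bytes_added"}],
    "filter": {"field": "resource.kafka.id", "op": "EQ", "value": "lkc-xyz789"},  # hypothetical cluster ID
    "granularity": "PT1M",
    "group_by": ["metric.topic"],
    "intervals": ["2024-06-01T00:00:00Z/2024-06-01T01:00:00Z"],
}

resp = requests.post(METRICS_URL, json=query, auth=(API_KEY, API_SECRET), timeout=30)
resp.raise_for_status()
for point in resp.json().get("data", []):
    print(point)  # one data point per topic per granularity interval
```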
Materialization lag¶
Materialization lag is the offset delay between the most recent message offset in Kafka and the most recent message offset that Tableflow has added to your table. Tableflow doesn't emit this metric directly, but you can calculate it as the difference between kafka_topic_offset and table_offset. Both of these metrics are emitted at a per-partition level.
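For example, given the latest per-partition values of the two offset metrics, the lag is an element-wise difference. The offsets below are made-up stand-ins for values you would fetch from the Metrics API.

```python
# Illustrative per-partition offsets, keyed by partition number; in practice
# these values come from the kafka_topic_offset and table_offset metrics.
kafka_topic_offset = {0: 10_500, 1: 9_870, 2: 11_240}
table_offset = {0: 10_100, 1: 9_870, 2: 10_990}

# Materialization lag per partition: how far the table trails the topic.
materialization_lag = {
    partition: kafka_topic_offset[partition] - table_offset[partition]
    for partition in kafka_topic_offset
}
print(materialization_lag)  # {0: 400, 1: 0, 2: 250}
```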
Best Practices for alerting¶
Use the Metrics API to monitor your Tableflow-enabled topics over time. You should monitor and configure alerts for the following conditions; an example of such checks follows the list.
- Per partition
  - Alert on large increases in materialization lag.
- Per topic
  - Alert on Tableflow status changes.
  - Alert on unexpected increases in the number of rejected rows.
- Per cluster
  - Alert on unexpected changes to storage used.
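As a minimal sketch of such checks, assuming you have already fetched the per-partition materialization lag and per-topic rejected-row counts from the Metrics API, the following Python snippet flags a partition whose lag exceeds a threshold and a topic whose rejected-row count grew since the last check. The threshold, topic names, and counts are hypothetical.

```python
# Hypothetical threshold and previously recorded counts; tune for your workload.
LAG_THRESHOLD = 100_000              # offsets a partition may trail the table
prev_rows_skipped = {"orders": 12}   # rejected-row counts from the last check

def check_alerts(lag_by_partition, rows_skipped_by_topic):
    """Return human-readable alerts for lag spikes and newly rejected rows."""
    alerts = []
    for partition, lag in lag_by_partition.items():
        if lag > LAG_THRESHOLD:
            alerts.append(f"partition {partition}: materialization lag {lag}")
    for topic, skipped in rows_skipped_by_topic.items():
        if skipped > prev_rows_skipped.get(topic, 0):
            alerts.append(f"topic {topic}: rejected rows increased to {skipped}")
    return alerts

print(check_alerts({0: 250_000, 1: 40}, {"orders": 15}))
```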
Error and recovery¶
If Tableflow encounters an issue, its status switches to Config Issue or Degraded.
Tableflow enters the Suspend state due to a recoverable failure¶
Tableflow enters the Suspend state when it encounters an error that has a high chance of requiring user intervention, or when materialization fails and the Failure mode is set to Suspend. Tableflow remains in the Suspend state until you manually resume it by sending a PATCH request to the API with the spec.suspend property set to false.
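As a rough illustration, the following Python sketch sends such a PATCH request with the requests library. The endpoint path, query parameters, and payload shape are assumptions rather than the documented API contract; confirm them against the Tableflow API reference before use.

```python
import requests

# Assumed endpoint shape for the Tableflow API; verify the exact path,
# query parameters, and payload fields in the Tableflow API reference.
BASE_URL = "https://api.confluent.cloud/tableflow/v1/tableflow-topics"
TOPIC = "orders"                      # hypothetical Tableflow-enabled topic
ENV_ID = "env-abc123"                 # hypothetical environment ID
CLUSTER_ID = "lkc-xyz789"             # hypothetical Kafka cluster ID
API_KEY, API_SECRET = "CLOUD_API_KEY", "CLOUD_API_SECRET"  # Cloud API credentials

# Resume Tableflow by clearing the suspend flag on the topic's spec.
resp = requests.patch(
    f"{BASE_URL}/{TOPIC}",
    params={"environment": ENV_ID, "spec.kafka_cluster": CLUSTER_ID},
    auth=(API_KEY, API_SECRET),
    json={"spec": {"suspend": False}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("status"))  # inspect the reported Tableflow status
```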
Recovery after unrecoverable failure¶
There are situations in which Tableflow is unable to write new data to an existing table, for example, when your table objects are corrupted or missing, or when you make a table-breaking change to the schema used on incoming messages.
First, back up your data by using a query engine to copy your data to a new table, or by copying the objects directly from your storage to another location, as in the sketch below. It's important to copy your data to a location outside the one where Tableflow manages the table, to ensure it doesn't get deleted when Tableflow cleans up storage.
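For the object-copy approach, a minimal sketch using boto3 might look like the following, assuming the table lives in an Amazon S3 bucket. The bucket names and prefix are placeholders for your own storage locations.

```python
import boto3

# Copy all objects under the table's prefix to a separate backup bucket
# that Tableflow does not manage, so the backup survives storage cleanup.
s3 = boto3.client("s3")
SOURCE_BUCKET = "tableflow-table-bucket"   # hypothetical bucket holding the table
SOURCE_PREFIX = "warehouse/orders/"        # hypothetical table prefix
BACKUP_BUCKET = "tableflow-backups"        # bucket outside Tableflow's control

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PREFIX):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=BACKUP_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
        )
```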
After backing up your data, the best way to recover is to disable and then re-enable Tableflow. When Tableflow is enabled or re-enabled, it creates a new table and starts processing data from the beginning of the topic, not from where it previously left off. This means that if you encountered an error due to a breaking schema change, you must delete or expire all the Kafka messages up to the point of the change, so Tableflow can initialize on the new schema.