Materialized Tables in Confluent Cloud for Apache Flink
A materialized table in Confluent Cloud for Apache Flink® is a persistent, declarative object that combines a table definition, a continuous query, and the ability to evolve the pipeline over time, all in a single manageable asset.
Use materialized tables for long-running streaming queries that act as incremental materialized views. For exploratory queries, snapshot (batch) processing, or interactive SQL in the workspace, use statements directly. For more information on when to use materialized tables compared with statements, refer to Schema and Statement Evolution.
Benefits of materialized tables
With materialized tables, you can evolve streaming pipelines in place without manual offset management or consumer migration. Traditionally, evolving a live streaming pipeline required stopping the running statement, creating a new table with the updated query, managing stream offsets to prevent data loss, and migrating downstream consumers to the new topic, a process that is error-prone and incompatible with modern CI/CD and GitOps practices.
You can evolve a materialized table in place by using the CREATE OR ALTER MATERIALIZED TABLE statement. Instead of managing ephemeral statements and performing manual offset carry-over, you manage a single asset and let Flink handle the migration automatically.
Compared with the traditional approach of CREATE TABLE combined with INSERT INTO and manual offset management, a materialized table provides:
A single declarative object that owns both the table definition and the continuous query.
Automated in-place migration when the query or schema changes.
Explicit control over data reprocessing through the START_MODE clause.
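As an illustrative comparison, the traditional approach and the materialized-table approach look like this. The table, topic, and column names (`orders`, `orders_by_region`, `region`, `amount`) are hypothetical, and the exact option syntax may differ from this sketch:

```sql
-- Traditional approach: two separate objects to manage.
-- Evolving the query later means stopping the INSERT INTO
-- statement and handling offsets and consumers manually.
CREATE TABLE orders_by_region (
  region STRING,
  order_count BIGINT,
  PRIMARY KEY (region) NOT ENFORCED
);

INSERT INTO orders_by_region
SELECT region, COUNT(*)
FROM orders
GROUP BY region;

-- Materialized table: one declarative object that owns both
-- the table definition and the continuous query.
CREATE MATERIALIZED TABLE orders_by_region AS
SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region;
```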
How materialized tables work
A materialized table is a persistent object backed by:
An Apache Kafka® topic that stores the query results.
A schema subject in Schema Registry that tracks the output schema.
A continuous query that populates the table.
The lifecycle of a materialized table follows this pattern:
Create: You run a CREATE MATERIALIZED TABLE statement with a SELECT query. Flink creates the backing topic, registers the schema, and starts the continuous query.
Run: The continuous query processes data from the source tables and writes results to the backing topic.
Evolve: You run a CREATE OR ALTER MATERIALIZED TABLE statement to change the query, add columns, or modify the pipeline logic. Flink stops the previous query and starts a new one with the updated definition, writing to the same output topic.
Drop: You run DROP MATERIALIZED TABLE to remove the object and clean up its resources.
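The create, evolve, and drop steps above can be sketched as follows. The table and column names are hypothetical; refer to the statement reference for exact syntax:

```sql
-- Create: Flink provisions the backing topic and schema subject,
-- then starts the continuous query.
CREATE MATERIALIZED TABLE orders_by_region AS
SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region;

-- Evolve: change the pipeline logic in place. Flink stops the
-- previous query and starts a new one writing to the same topic.
CREATE OR ALTER MATERIALIZED TABLE orders_by_region AS
SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM orders
GROUP BY region;

-- Drop: remove the object and clean up its resources.
DROP MATERIALIZED TABLE orders_by_region;
```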
You can also modify table properties, like watermarks, metadata columns, or table options, without triggering a full evolution by using ALTER MATERIALIZED TABLE.
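A property-only change might look like the following sketch. The option key shown is hypothetical, and the exact ALTER syntax is documented in the statement reference:

```sql
-- Adjust a table option without triggering a full evolution:
-- the continuous query keeps running unchanged.
ALTER MATERIALIZED TABLE orders_by_region SET (
  'some.table.option' = 'new-value'  -- hypothetical option key
);
```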
The continuous query that powers the materialized table is fully managed by Confluent Cloud. You manage the lifecycle of the materialized table itself by using create, evolve, and drop operations, and Flink handles the underlying query execution automatically.
How to evolve materialized tables
The CREATE OR ALTER MATERIALIZED TABLE statement triggers an evolution, which performs an in-place migration: the new query writes results to the same output topic as the previous version. For a step-by-step walkthrough, refer to Evolve a Streaming Pipeline.
State handling
When an evolution is triggered, all existing Flink processing state, like aggregation counts, join state, or window state, is discarded. New state is built from scratch based on the reprocessing settings specified by the START_MODE clause.
This means that for stateful queries (such as GROUP BY aggregations), the results are recalculated by reprocessing the source data, not by migrating the previous state.
Controlling reprocessing with START_MODE
The START_MODE clause controls how much historical data is reprocessed when a materialized table is created or evolved. Options range from processing all available data (FROM_BEGINNING) to processing only new data (FROM_NOW), with several options in between.
The default is RESUME_OR_FROM_BEGINNING, which attempts to resume from the previous job’s position on evolution and falls back to processing all data on initial creation.
For the full list of START_MODE values and their behavior, refer to START_MODE.
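For example, an evolution that explicitly reprocesses all historical data, and one that processes only new data, might be written like this sketch. The clause placement shown here is illustrative, and the table names are hypothetical; refer to the START_MODE reference for exact syntax:

```sql
-- Rebuild the table's contents from the earliest available data.
CREATE OR ALTER MATERIALIZED TABLE orders_by_region
START_MODE = FROM_BEGINNING
AS SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region;

-- Process only data that arrives after the evolution.
CREATE OR ALTER MATERIALIZED TABLE orders_by_region
START_MODE = FROM_NOW
AS SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region;
```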
Downstream consumer impact
Because evolutions discard the previous job’s state and rebuild from scratch, downstream consumers might observe specific behaviors depending on the changelog mode of the materialized table.
Upsert mode
If your new query filters out keys that previously existed in the output, for example, a stricter WHERE clause, no delete message is emitted for these keys. The old records remain downstream as stale “zombie” data, because the new job has no memory of what the previous job produced.
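As a hypothetical illustration of this behavior, consider tightening a filter during an evolution (table and column names are illustrative):

```sql
-- Original query: emits results for every order above 100.
CREATE MATERIALIZED TABLE big_orders AS
SELECT order_id, amount
FROM orders
WHERE amount > 100;

-- Evolved query with a stricter filter. Keys that matched only
-- the old predicate (100 < amount <= 500) receive no DELETE
-- message, so they linger downstream as stale zombie records.
CREATE OR ALTER MATERIALIZED TABLE big_orders AS
SELECT order_id, amount
FROM orders
WHERE amount > 500;
```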
Append-only mode
When reprocessing historical data, for example, using START_MODE = FROM_BEGINNING, the output topic contains duplicate records: both the original records from the previous job and the newly reprocessed records. Downstream consumers must implement deduplication logic if this is a concern.
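One common way for a downstream Flink SQL consumer to deduplicate is the standard ROW_NUMBER pattern, assuming each record carries a unique key and an event timestamp. The table and column names (`orders_output`, `order_id`, `ts`) are hypothetical:

```sql
-- Keep only the latest record per key, discarding duplicates
-- produced by reprocessing.
SELECT order_id, amount, ts
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY order_id
           ORDER BY ts DESC
         ) AS rn
  FROM orders_output
)
WHERE rn = 1;
```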
Retract mode
The new job emits INSERT messages for the reprocessed data but does not emit DELETE messages for the old result set. Downstream consumers receive duplicates for valid data and no retractions for data that should be removed.
Lifecycle operations
The continuous query that powers a materialized table goes through the same lifecycle states as a statement:
Pending: The statement has been submitted, and Flink is preparing to run it.
Running: Flink is actively running the statement.
Completed: The statement has completed all of its work.
Deleting: The statement is being deleted.
Failed: The statement has encountered an error and is no longer running.
Degraded: The statement appears unhealthy, for example, no transactions have been committed for a long time, or the statement has frequently restarted recently.
Stopping: The statement is about to be stopped.
Stopped: The statement has been stopped and is no longer running.
To stop or resume a materialized table, use the REST API or the stopped attribute of the Confluent Terraform provider. Stopping a materialized table pauses the continuous query without dropping the table or its backing Kafka topic. Resuming restarts the query from the last committed offset without discarding processing state.
Limitations
The following limitations apply to materialized tables:
State is not preserved across evolutions. All Flink processing state is discarded and rebuilt from scratch through reprocessing.
CREATE OR ALTER is not idempotent. Running the same CREATE OR ALTER command always triggers a new evolution, even if nothing changed.
No automatic change detection. The system does not automatically detect changes to upstream dependencies like schemas or UDFs. Evolution only happens when you explicitly run CREATE OR ALTER.
No statement set support. Materialized tables cannot be used within statement sets.
Only net-new tables. Existing tables cannot be converted to materialized tables. You must create a new materialized table.