External Tables in Confluent Cloud for Apache Flink

Confluent Cloud for Apache Flink® supports external tables so you can enrich fast-moving streaming data with slowly changing reference data held in an external system, such as a key-value store, full-text index, or vector database. External tables are read-only and are queried with a lookup join that runs a per-row search against the external system. This pattern is also known as stream enrichment, external table lookup, or a lateral table-valued function join.

What is an external table?

An external table is a Confluent Cloud for Apache Flink table whose data lives in an external system rather than in an Apache Kafka® topic. You define an external table the same way you define any other table: run CREATE TABLE with a connector that targets the external system, and reference a connection resource, created with CREATE CONNECTION, that holds the connection details and credentials.

External tables are read-only. You query them from your streaming pipeline but do not write to them. Each query against an external table runs as a per-row lookup, returning the matching rows from the external system as an array that you join against the input stream.

Search types and supported providers

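Confluent Cloud for Apache Flink supports three search types against external tables: key search, text search, and vector search. Each search type maps to a built-in search function: KEY_SEARCH_AGG, TEXT_SEARCH_AGG, and VECTOR_SEARCH_AGG, respectively.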

Lookup join syntax

You query an external table with a lateral join against one of the three search functions. The following example runs a key lookup against an external customers_ext table to enrich an orders stream:

SELECT o.*, lookup.*
FROM orders AS o,
     LATERAL TABLE(KEY_SEARCH_AGG(customers_ext, DESCRIPTOR(o.customer_id), id))
     AS lookup

For text and vector search, swap in the corresponding function and pass the additional <limit> argument. For full syntax and configuration options, see Search Functions. For the join semantics, see Lookup Joins.
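As a rough sketch of the vector search variant, the following query follows the shape of the key lookup example and passes a limit of 3 as the final argument. The queries stream, its embedding column, the documents_ext table, and its vector column are hypothetical names. The argument roles are assumed to mirror the key lookup example, so check Search Functions for the exact signature before relying on it.

```sql
-- Hypothetical names: queries, q.embedding, documents_ext, vector.
-- Assumes the arguments mirror the key lookup example, with <limit> last.
SELECT q.*, matches.*
FROM queries AS q,
     LATERAL TABLE(VECTOR_SEARCH_AGG(documents_ext, DESCRIPTOR(q.embedding), vector, 3))
     AS matches;
```

A small limit keeps the per-row result array short, which matters because every input row triggers its own search.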

Setting up an external table

To make an external table available in your pipeline:

  1. Use CREATE CONNECTION to create a resource that holds the endpoint and credentials for the external system.

  2. Run CREATE TABLE with the appropriate connector and a reference to the connection.

  3. Query the external table from your pipeline with a lateral join against one of the three search functions.
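The three steps above might look roughly like the following. This is a minimal sketch that assumes MongoDB as the external system. The connection name customers_conn, the column list, every option key in the WITH clauses, and the angle-bracket placeholders are illustrative assumptions, so take the exact option names from the provider-specific guide.

```sql
-- Step 1: connection resource that holds the endpoint and credentials.
-- Option keys are assumed; replace the <...> placeholders with real values.
CREATE CONNECTION customers_conn
  WITH (
    'type' = 'mongodb',
    'endpoint' = '<endpoint>',
    'username' = '<username>',
    'password' = '<password>'
  );

-- Step 2: external table that references the connection.
-- Columns and the provider-prefixed option keys are hypothetical.
CREATE TABLE customers_ext (
  id STRING,
  name STRING,
  tier STRING
) WITH (
  'connector' = 'mongodb',
  'mongodb.connection' = 'customers_conn',
  'mongodb.database' = '<database>',
  'mongodb.collection' = '<collection>'
);

-- Step 3: per-row key lookup from the orders stream,
-- using the same shape as the lookup join example.
SELECT o.*, lookup.*
FROM orders AS o,
     LATERAL TABLE(KEY_SEARCH_AGG(customers_ext, DESCRIPTOR(o.customer_id), id))
     AS lookup;
```

Keeping credentials in the connection resource means the table definition and the queries never carry secrets.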

For provider-specific configuration and end-to-end examples, see the guide for your external system.

Operational considerations

External calls in stream processing enable enrichment and AI orchestration use cases, but they introduce dependencies on systems outside of Confluent Cloud for Apache Flink. Plan for the following:

  • Determinism. Because the external system can change between lookups, results from an external table are not deterministic over time. Reprocessing the same input stream can produce different output if the external system has been updated. See Determinism in Continuous Queries for a deeper discussion.

  • Latency and throughput. Each row in the probe stream triggers a lookup against the external system. Tune the asynchronous execution, client timeout, parallelism, and retry settings on the search function to match the latency budget of your pipeline and the capacity of the external system.

  • Replay traffic. A pipeline that reprocesses historical data issues all lookups again, which can overwhelm the external system. Consider rate limiting or pre-materializing the reference data into Apache Kafka® if reprocessing volume is high.

  • Private networking. External tables can reach systems on private networks. See Private networking with Flink for the supported networking topologies.
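For the latency and throughput settings mentioned above, one possible shape is an optional configuration map passed as the last argument of the search function. The option keys below (async_enabled, client_timeout, max_parallelism, retry_count), the timeout unit, and the values are assumptions for illustration only. Confirm the supported keys and defaults in Search Functions.

```sql
-- Assumed option keys; values are illustrative, not recommendations.
SELECT o.*, lookup.*
FROM orders AS o,
     LATERAL TABLE(KEY_SEARCH_AGG(
       customers_ext,
       DESCRIPTOR(o.customer_id),
       id,
       MAP['async_enabled', true,   -- issue lookups asynchronously
           'client_timeout', 10,    -- per-request timeout (unit assumed to be seconds)
           'max_parallelism', 5,    -- cap on concurrent requests to the external system
           'retry_count', 2]        -- retries before the lookup is treated as failed
     ))
     AS lookup;
```

Lower parallelism and retry values protect a small external system at the cost of pipeline throughput, so size them against the capacity of that system rather than the speed of the stream.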