External Tables in Confluent Cloud for Apache Flink

Confluent Cloud for Apache Flink® supports external tables so you can enrich fast-moving streaming data with slowly changing reference data held in an external system, such as a key-value store, full-text index, or vector database. External tables are read-only and are queried with a lookup join that runs a per-row search against the external system. This pattern is also known as stream enrichment, external table lookup, or a lateral table-valued function join.

What is an external table?

An external table is a Confluent Cloud for Apache Flink table whose data lives in an external system rather than in an Apache Kafka® topic. You define an external table the same way you define any other table: run CREATE TABLE with a connector that targets the external system, and reference a connection resource, created with CREATE CONNECTION, that holds the connection details and credentials.

External tables are read-only. You query them from your streaming pipeline but do not write to them. Each query against an external table runs as a per-row lookup, returning the matching rows from the external system as an array that you join against the input stream.
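The per-row lookup described above can be modeled outside Flink. The following Python sketch is only an illustration under stated assumptions, not Confluent's implementation: a dictionary stands in for the external system, and the names `customers_ext`, `key_search`, `lookup_join`, and the `search_results` field are hypothetical. In this model an unmatched key yields an empty list; the behavior of the actual search functions may differ.

```python
# Illustrative model of a lookup join; not Confluent's implementation.
# `customers_ext` stands in for an external system, keyed by customer id.
customers_ext = {
    "c1": [{"id": "c1", "name": "Ada"}],
    "c2": [{"id": "c2", "name": "Grace"}],
}

def key_search(store, key):
    """Return every row in the store whose key matches, as a list (the 'array')."""
    return list(store.get(key, []))

def lookup_join(stream, store, key_field):
    """Run one lookup per input row and attach the array of matches to it."""
    for row in stream:
        yield {**row, "search_results": key_search(store, row[key_field])}

orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c9"},  # no match in the external system
]

enriched = list(lookup_join(orders, customers_ext, "customer_id"))
for row in enriched:
    print(row)
```

Each input row triggers exactly one call to the store, which is why the capacity of the external system matters as much as the throughput of the stream.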

Search types and supported providers

Confluent Cloud for Apache Flink supports three search types against external tables. Each search type maps to a built-in search function:

Key search

Run an exact key lookup against the external system. Use KEY_SEARCH_AGG to look up rows by a matching key value.

Provider	Connector value
Confluent JDBC (currently supports Postgres, MySQL, SQL Server, Oracle)	`confluent-jdbc`
Couchbase	`couchbase`
MongoDB	`mongodb`
REST (supports any REST endpoint that uses JSON format)	`rest`

For more information, see Key Search with External Databases.

Text search

Run a full-text search against the external system. Use TEXT_SEARCH_AGG to retrieve rows whose indexed text matches an input string, ranked by the external system’s relevance scoring.

Provider	Connector value
Couchbase	`couchbase`
Elasticsearch	`elastic`
MongoDB	`mongodb`

For more information, see Text Search with External Databases.

Vector search

Run a semantic similarity search against the external system using vector embeddings. Use VECTOR_SEARCH_AGG to retrieve the nearest rows for an input embedding. This function is typically used together with AI_EMBEDDING to embed the input column inline.

Provider	Connector value
Amazon S3 Vectors	`s3vectors`
Azure Cosmos DB	`cosmosdb`
Couchbase	`couchbase`
Elasticsearch	`elastic`
MongoDB	`mongodb`
Pinecone	`pinecone`

For more information, see Vector Search with External Databases.
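To make the difference between the three search types concrete, here is a small Python sketch over a toy in-memory dataset. It is a conceptual model only: the data, the function names, the term-overlap scoring, and the use of cosine similarity are assumptions chosen for illustration, and real providers use their own relevance and distance metrics.

```python
import math

# Toy in-memory "external system"; all names and data here are hypothetical.
docs = [
    {"id": "a", "text": "stream processing with flink", "vec": [1.0, 0.0]},
    {"id": "b", "text": "batch processing basics", "vec": [0.0, 1.0]},
    {"id": "c", "text": "flink stream enrichment", "vec": [0.8, 0.6]},
]

def key_search(rows, key):
    """Exact match on the key column."""
    return [r for r in rows if r["id"] == key]

def text_search(rows, query, limit):
    """Rank rows by how many query terms they contain (a naive stand-in for a
    real engine's relevance scoring) and keep the top `limit` matches."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(r["text"].split())), r) for r in rows]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits[:limit]]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def vector_search(rows, embedding, limit):
    """Return the `limit` rows whose vectors are most similar to `embedding`."""
    ranked = sorted(rows, key=lambda r: cosine(r["vec"], embedding), reverse=True)
    return ranked[:limit]

key_hits = key_search(docs, "b")
text_hits = text_search(docs, "flink stream", limit=2)
vector_hits = vector_search(docs, [1.0, 0.0], limit=2)
```

Key search returns only exact matches, while text and vector search return a ranked list capped by a limit, which is why those two functions take the extra limit argument.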

Lookup join syntax

You query an external table with a lateral join against one of the three search functions. The following example runs a key lookup against an external customers_ext table to enrich an orders stream:

SELECT o.*, lookup.*
FROM orders AS o,
     LATERAL TABLE(KEY_SEARCH_AGG(customers_ext, DESCRIPTOR(o.customer_id), id))
     AS lookup

For text and vector search, swap in the corresponding function and pass the additional <limit> argument. For full syntax and configuration options, see Search Functions. For the join semantics, see Lookup Joins.

Setting up an external table

To make an external table available in your pipeline:

1. Use CREATE CONNECTION to create a resource that holds the endpoint and credentials for the external system.
2. Run CREATE TABLE with the appropriate connector and a reference to the connection.
3. Query the external table from your pipeline with a lateral join against one of the three search functions.

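For provider-specific configuration and end-to-end examples, see Key Search with External Databases, Text Search with External Databases, and Vector Search with External Databases.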

Operational considerations

External calls in stream processing enable enrichment and AI orchestration use cases, but they introduce dependencies on systems outside of Confluent Cloud for Apache Flink. Plan for the following:

Determinism. Because the external system can change between lookups, results from an external table are not deterministic over time. Reprocessing the same input stream can produce different output if the external system has been updated. See Determinism in Continuous Queries for a deeper discussion.
Latency and throughput. Each row in the probe stream triggers a lookup against the external system. Tune the asynchronous execution, client timeout, parallelism, and retry settings on the search function to match the latency budget of your pipeline and the capacity of the external system.
Replay traffic. A pipeline that reprocesses historical data issues all lookups again, which can overwhelm the external system. Consider rate limiting or pre-materializing the reference data into Apache Kafka® if reprocessing volume is high.
Private networking. External tables can reach systems on private networks. See Private networking with Flink for the supported networking topologies.
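The determinism and replay considerations can be illustrated with a short Python model. This is a sketch under stated assumptions, not a description of how Flink executes lookups: a mutable dictionary stands in for the external system, and `live_lookup`, `enrich`, and `snapshot` are hypothetical names.

```python
# Hypothetical model: a mutable dict stands in for the external system.
customers_ext = {"c1": "bronze"}
lookup_calls = 0

def live_lookup(key):
    """Query the 'external system' at processing time and count the call."""
    global lookup_calls
    lookup_calls += 1
    return customers_ext.get(key)

def enrich(orders, lookup):
    """Attach the looked-up value to every (order_id, customer_id) pair."""
    return [(order_id, lookup(customer_id)) for order_id, customer_id in orders]

orders = [(1, "c1"), (2, "c1")]

first_run = enrich(orders, live_lookup)
customers_ext["c1"] = "gold"          # reference data changes between runs
replay = enrich(orders, live_lookup)  # same input, different output

# Pre-materialized alternative: join against a snapshot captured once, so a
# replay neither touches the external system nor sees later changes.
snapshot = {"c1": "bronze"}
calls_before = lookup_calls
replay_from_snapshot = enrich(orders, snapshot.get)
```

In this model the live replay issues every lookup again and returns the updated value, while the snapshot-based replay reproduces the first run without any calls to the external system. That is the trade-off behind pre-materializing reference data: reproducibility and lower load in exchange for staleness.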