Run a Managed AI Model with Confluent Cloud for Apache Flink

Confluent Cloud for Apache Flink® supports managed serverless inference on Confluent Cloud using open source models. Managed models are resources in Flink SQL, just like tables and functions. You can use a SQL statement to create a model resource and then reference it for inference in streaming queries. The SQL interface is available in Cloud Console and the Flink SQL shell.

Note

Managed AI models are an Early Access Program feature in Confluent Cloud.

An Early Access feature is a component of Confluent Cloud introduced to gain feedback. This feature should be used only for evaluation and non-production testing purposes or to provide feedback to Confluent, particularly as it becomes more widely available in follow-on preview editions.

Early Access Program features are intended for evaluation use in development and testing environments only, and not for production use. Early Access Program features are provided: (a) without support; (b) “AS IS”; and (c) without indemnification, warranty, or condition of any kind. No service level commitment will apply to Early Access Program features. Early Access Program features are considered to be a Proof of Concept as defined in the Confluent Cloud Terms of Service. Confluent may discontinue providing preview releases of the Early Access Program features at any time in Confluent’s sole discretion.

If you would like to participate in the Early Access Program, sign up here.

The CREATE MODEL statement registers a managed model in your Flink environment for real-time prediction and inference.
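
The statement's general shape is shown below; the placeholders are illustrative, and the exact WITH options depend on the provider and the task:

    CREATE MODEL `<model_name>`
    INPUT (<input_column> <type>)
    OUTPUT (<output_column> <type>)
    WITH (
      'provider' = '<provider>',
      'task' = '<task>'
    );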

For the fully managed AI models that run in Confluent Cloud, see confluent.model.

To create a model that runs on another cloud provider, see Run a Remote AI Model.

In this guide, you create and run a managed LLM and a managed embedding model.

Prerequisites

  • Access to Confluent Cloud.
  • Access to a Flink compute pool.
  • Sufficient permissions to create models. For more information, see RBAC for model inference.

Supported cloud regions

Managed AI models are supported in the following AWS regions:

  • us-east-1
  • us-east-2
  • us-west-2

Create a managed LLM

The following steps show how to create and query a managed LLM in Confluent Cloud.

  1. Log in to Confluent Cloud and navigate to your Flink workspace.

  2. Run the following statement to create the LLM.

    CREATE MODEL `managed_model_llm`
    INPUT (prompt STRING)
    OUTPUT (response STRING)
    WITH (
      'provider' = 'confluent',
      'task' = 'text_generation',
      'confluent.model' = 'microsoft/Phi-3.5-mini-instruct'
    );
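
    Optionally, confirm that the model was registered. A quick check, assuming the SHOW MODELS and DESCRIBE MODEL statements are available in your workspace:

    SHOW MODELS;
    DESCRIBE MODEL `managed_model_llm`;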
    
  3. Run the following statement to create a table that contains prompts for the managed LLM.

    CREATE TABLE text_stream (
      id BIGINT, prompt STRING
    );
    
  4. Run the following statement to populate the text_stream table with some example prompts.

    INSERT INTO text_stream
      VALUES
        (1, 'The mitochondria is the powerhouse of the cell'),
        (2, 'Tell me a bit about Tiananmen Square'),
        (3, 'How many rs are there in strawberry');
    
  5. Run the following statement to call the AI_COMPLETE function with the example prompt data.

    SELECT id, prompt, response
    FROM text_stream, LATERAL TABLE(AI_COMPLETE('managed_model_llm', prompt));
    

    Your output should resemble:

    id prompt                                         response
    == ============================================== ========
    1  The mitochondria is the powerhouse of the cell Your statement is correct. The mitochondria are indeed often referred to as the powerhouse of the cell...
    2  Tell me a bit about Tiananmen Square           Tiananmen Square is one of the most iconic public spaces in the world...
    3  How many rs are there in strawberry            The phrase "How many rs are there in strawberry?" is a bit confusing because...
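
    Managed models can also be invoked with the general-purpose ML_PREDICT table function. The following sketch assumes that ML_PREDICT accepts the managed model registered above; the result should match the AI_COMPLETE call:

    SELECT id, prompt, response
    FROM text_stream, LATERAL TABLE(ML_PREDICT('managed_model_llm', prompt));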
    

Create a managed embedding model

The following steps show how to create an embedding model and generate embeddings in Confluent Cloud.

  1. Run the following statement to create the embedding model.

    CREATE MODEL `managed_model_embedding`
    INPUT (text STRING)
    OUTPUT (embedding ARRAY<FLOAT>)
    WITH (
      'provider' = 'confluent',
      'task' = 'embedding',
      'confluent.model' = 'BAAI/bge-large-en-v1.5'
    );
    
  2. Run the following statement to call the AI_EMBEDDING function to generate embeddings for the example prompt data.

    SELECT id, prompt, embedding
    FROM text_stream, LATERAL TABLE(AI_EMBEDDING('managed_model_embedding', prompt));
    

    Your output should resemble:

    id prompt                                         embedding
    == ============================================== =========
    1  The mitochondria is the powerhouse of the cell -0.015426636,-0.025268555,1.1193752E-4,0.027648926,-0.0020256042, ...
    2  Tell me a bit about Tiananmen Square           0.013908386,-0.021865845,-0.022506714,0.012626648, ...
    3  How many rs are there in strawberry            0.036865234,0.016098022,-0.01600647,-0.013015747, ...
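
To store the generated embeddings for downstream use, such as vector search, you can write the same query results into a table. A minimal sketch, assuming a sink table named text_embeddings (a name chosen for illustration):

    CREATE TABLE text_embeddings (
      id BIGINT, prompt STRING, embedding ARRAY<FLOAT>
    );

    INSERT INTO text_embeddings
    SELECT id, prompt, embedding
    FROM text_stream, LATERAL TABLE(AI_EMBEDDING('managed_model_embedding', prompt));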