Improve Agent Output with Reflection Workflows
With a reflection workflow, you can iteratively refine agent output through a structured drafter-critic loop. Instead of relying on a single-pass response, you define two sub-agents, a drafter and a critic, that work together to review and improve the result until it meets a defined quality bar.
Note
Reflection workflows are available as an Open Preview feature in Confluent Cloud.
A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion.
Single-pass agents often produce shallow or incomplete results for complex analytical tasks, such as root-cause analysis, fraud investigation, or structured data extraction. When these outputs drive downstream automation, low-quality conclusions can cause costly or disruptive actions.
You can address this by creating a reflection workflow that adds an internal quality gate. You define a composite reflection agent that orchestrates two sub-agents in a bounded loop: the drafter generates output, the critic evaluates it against your defined criteria, and the loop continues until the critic approves the output or the iteration limit is reached. Only the final, approved output is emitted into the stream.
How reflection works
When you create a reflection agent, you define two sub-agents:
- Drafter
You configure this agent to generate the initial output or perform the primary task. In subsequent iterations, the drafter receives the original task, its previous draft, and the critic’s feedback as context for refinement.
- Critic
You configure this agent to review the drafter’s output against specific constraints or quality criteria. The critic responds with either an approval signal or actionable feedback describing what must be fixed.
At runtime, the reflection loop follows this execution sequence:
1. Initialization: The agent runtime identifies the agent as a REFLECTION type and loads the configurations for both sub-agents.
2. Drafting phase: The drafter receives the input prompt and generates a response.
3. Critique phase: The critic receives the drafter’s output along with the original task context. The critic evaluates the output and responds with a verdict.
4. Condition check: The agent runtime evaluates the critic’s response against the configured pass condition.
   - If the pass condition is met, the loop terminates and the drafter’s output is emitted as the final result.
   - If the pass condition is not met and MAX_ITERATIONS has not been reached, the loop continues to the next iteration.
5. Termination: The loop ends when the pass condition is met or MAX_ITERATIONS is reached. If the iteration limit is reached without approval, the last draft is returned.
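The execution sequence can be sketched as a bounded loop. The following Python sketch is illustrative only: `run_reflection`, `draft`, `critique`, and `passes` are hypothetical stand-ins for the runtime’s internal drafter call, critic call, and pass-condition check, not part of any Confluent API.

```python
def run_reflection(task, draft, critique, passes, max_iterations=10):
    """Illustrative sketch of the bounded drafter-critic loop.

    draft, critique, and passes are hypothetical stand-ins for the
    runtime's internal LLM calls and pass-condition evaluation.
    """
    output = draft(task)                           # drafting phase: initial draft
    for i in range(1, max_iterations + 1):
        verdict = critique(task, output)           # critique phase
        if passes(verdict):                        # condition check
            return output                          # approved: emit final result
        if i < max_iterations:
            output = draft(task, output, verdict)  # refine with critic feedback
    return output                                  # limit reached: return last draft
```

Note that in the worst case each iteration produces one draft and one critique, which is why the call count in the cost guidance below scales with twice the iteration limit.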
How the runtime prompts sub-agents
The agent runtime constructs specialized prompts for each sub-agent at every stage of the loop. You do not need to include response format instructions or multi-turn context handling in your sub-agent prompts.
Critic prompt construction
When the critic is invoked, the runtime constructs a composite prompt that includes:
- The original task given to the drafter
- The drafter’s output to review
- Your evaluation criteria (from the critic agent’s prompt)
- A required response format specifying the expected JSON schema
- Decision rules for when to approve or request changes
Because of this, your critic prompt should focus only on evaluation criteria: what to check, what standards to apply, and what domain-specific rules matter. The response format and approval mechanics are handled by the runtime.
The runtime instructs the critic to return a JSON object with the following structure:
{
"verdict": "APPROVED or REQUIRED_CHANGES",
"confidence": 0.95,
"issues": [
{
"field": "field_name_or_section",
"severity": "high or medium or low",
"description": "Clear description of the issue"
}
],
"suggestions": "Specific suggestions for improvement"
}
The pass condition evaluator parses this JSON to determine whether the loop should terminate. It reads the verdict field for APPROVAL mode and the confidence field for CONFIDENCE mode. For more information, refer to Pass conditions.
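As an illustration of this check, the following Python sketch parses a critic response and applies either mode. `pass_condition_met` is a hypothetical helper written for this explanation, not the actual runtime implementation.

```python
import json

def pass_condition_met(critic_response: str, mode: str, threshold: float = 0.8) -> bool:
    """Illustrative sketch of the pass-condition check described above."""
    result = json.loads(critic_response)
    if mode == "APPROVAL":
        # The verdict comparison is case-insensitive.
        return result.get("verdict", "").upper() == "APPROVED"
    if mode == "CONFIDENCE":
        # The loop terminates when confidence meets or exceeds the threshold.
        return result.get("confidence", 0.0) >= threshold
    raise ValueError(f"Unknown pass condition: {mode}")
```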
Refinement prompt construction
When the loop continues to the next iteration, the runtime constructs a refinement prompt for the drafter that includes:
- The original task
- The drafter’s previous output
- The critic’s feedback (issues and suggestions)
Your drafter prompt does not need to handle multi-turn context explicitly. The runtime injects this context automatically.
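For intuition, the injected refinement context might be assembled roughly as in the following sketch. `build_refinement_prompt` and its wording are hypothetical; the runtime’s actual template is internal and may differ.

```python
def build_refinement_prompt(task: str, previous_draft: str, critic_feedback: str) -> str:
    """Hypothetical sketch of the refinement context the runtime assembles.

    The actual template and wording are internal to the agent runtime;
    this only illustrates the three pieces of context listed above.
    """
    return (
        f"Original task:\n{task}\n\n"
        f"Your previous draft:\n{previous_draft}\n\n"
        f"Reviewer feedback to address:\n{critic_feedback}"
    )
```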
Create a reflection agent
The following steps show how you create a reflection agent by using Flink SQL syntax.
Step 1: Define the drafter agent
Define a standard agent that generates the initial output. You assign it a model and prompt, and optionally assign it tools. The drafter and critic can use different models. You can also assign tools (function-based or MCP-based) to sub-agents independently. For more information, refer to Call Tools.
The following example creates a drafter agent named claim_extractor that extracts structured JSON from claim descriptions.
CREATE AGENT claim_extractor
USING MODEL my_model
USING PROMPT 'You are a claims intake agent. Extract a JSON object
with fields: incident_type, damage_summary,
estimated_severity (low/med/high), location (lat/long), and
any missing_fields. Use the user description and metadata.';
Step 2: Define the critic agent
Define a standard agent that reviews and critiques the drafter’s output. In the critic’s prompt, define the evaluation criteria for your use case. The runtime handles the response format automatically.
The following example creates a critic agent named claim_critic that validates the drafter’s output against domain-specific quality criteria.
CREATE AGENT claim_critic
USING MODEL my_model
USING PROMPT 'You are a claims quality reviewer. Critique the
extracted JSON for missing required fields, contradictions, or
low-confidence assumptions.';
Step 3: Create the reflection agent
Combine your drafter and critic into a composite reflection agent by using the USING AGENTS clause. List the drafter first and the critic second. You can use unquoted or backtick-quoted agent names.
You do not assign a model or prompt to the reflection agent. It delegates all LLM reasoning to the sub-agents you defined.
The following example creates a reflection agent named reflective_claim_intake that runs up to three draft-critique cycles and terminates when the critic responds with APPROVED. This max_iterations value controls the reflection loop and is separate from any max_iterations you set on sub-agents that use tools.
CREATE AGENT reflective_claim_intake
USING AGENTS claim_extractor, claim_critic
WITH (
'type' = 'reflection',
'pass_condition' = 'approval',
'max_iterations' = '3'
);
Step 4: Run the reflection agent
Run your reflection agent by using the AI_RUN_AGENT function, the same way you run a standard agent. The agent runtime handles the reflection loop internally.
The following example runs reflective_claim_intake against a table named queries that contains id and prompt columns.
SELECT `prompt`, status, response
FROM `queries`,
LATERAL TABLE(
AI_RUN_AGENT('reflective_claim_intake', `prompt`, `id`)
);
Configuration
You set the following configuration options in the WITH clause of the reflection agent’s CREATE AGENT statement. These options apply to the composite reflection agent, not to the sub-agents.
| Option | Description | Default | Required |
|---|---|---|---|
| type | Agent type. Must be reflection. | STANDARD | Yes |
| pass_condition | Termination condition for the loop. | APPROVAL | No |
| max_iterations | Maximum reflection loop iterations. | 10 | No |
| confidence_value | Confidence threshold (0.0-1.0). Used when PASS_CONDITION is CONFIDENCE. | 0.8 | No |
Important
Each reflection iteration triggers additional LLM calls. With MAX_ITERATIONS set to 5, a single input event can trigger up to 10 LLM calls (five drafts and five critiques). If the drafter uses tools, it might re-invoke them on each iteration when it refines its output based on critic feedback. Consider the cost and latency implications when configuring the iteration limit.
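A quick sanity check of the worst-case arithmetic, assuming one drafter call and one critic call per iteration and no tool calls (tool-using sub-agents add further calls on top of this):

```python
def worst_case_llm_calls(max_iterations: int) -> int:
    """Worst-case LLM calls for a reflection loop that never gets approval:
    each iteration makes one drafter call and one critic call."""
    return 2 * max_iterations
```

For example, an iteration limit of 5 yields up to 10 calls per input event, matching the guidance above.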
Understanding MAX_ITERATIONS at each level
A reflection workflow has two levels of iteration, both configured with max_iterations:
- Reflection agent (outer loop)
The reflection agent’s MAX_ITERATIONS controls how many draft-critique-refine cycles the reflection loop performs. The default is 10. For example, with MAX_ITERATIONS set to 3, the drafter and critic exchange up to three rounds of drafts and critiques.
- Sub-agents (inner loop)
If a sub-agent uses tools, its max_iterations controls how many iterations the sub-agent can use to call tools within a single invocation. For example, if your drafter agent has max_iterations set to 5, it can call tools up to five times before producing its draft for that round. Sub-agents without tools complete in a single step, so max_iterations has no effect.
Pass conditions
You use a pass condition to control when the reflection loop terminates successfully. The default pass condition is APPROVAL.
APPROVAL
With the APPROVAL pass condition, the runtime parses the critic’s JSON response and checks whether the verdict field equals "APPROVED" (case-insensitive). If the verdict is "APPROVED", the loop terminates and the drafter’s output is emitted as the final result.
CREATE AGENT my_reflection_agent
USING AGENTS drafter_agent, critic_agent
WITH (
'type' = 'reflection',
'pass_condition' = 'approval',
'max_iterations' = '3'
);
CONFIDENCE
With the CONFIDENCE pass condition, the runtime reads the confidence field from the critic’s JSON response and checks whether it meets or exceeds the configured threshold. If it does, the loop terminates and the drafter’s output is emitted as the final result.
If you do not specify CONFIDENCE_VALUE, the default threshold of 0.8 is used.
CREATE AGENT my_reflection_agent
USING AGENTS drafter_agent, critic_agent
WITH (
'type' = 'reflection',
'pass_condition' = 'confidence',
'confidence_value' = '0.95',
'max_iterations' = '3'
);
Constraints and validation rules
The following rules apply when you create a reflection agent. Invalid configurations fail at agent creation time.
- You must not specify USING MODEL or USING PROMPT on the reflection agent. It delegates all LLM reasoning to its sub-agents.
- You must specify exactly two sub-agents in the USING AGENTS clause. List the drafter first and the critic second.
- If you do not specify CONFIDENCE_VALUE when using the CONFIDENCE pass condition, the default threshold of 0.8 is used.
- If a sub-agent fails during the reflection loop, the entire request fails immediately. For more information about error handling, refer to Error Handling.
When to use reflection
Reflection workflows and manual agent chaining serve different purposes.
- Reflection workflows
Use a reflection workflow when you want higher-quality output from a single analytical task. Your drafter and critic iterate on the same task until the output meets your quality bar. Typical use cases include structured data extraction, root-cause analysis, and content generation with specific formatting requirements.
- Manual agent chaining
Use manual agent chaining when you need different stages to perform fundamentally different tasks in a sequential pipeline. For example, you can create one agent that classifies a request and a separate agent that routes it to a specialist. For more information, refer to Create and Run Streaming Agents.
Example: Claims intake with reflection
This example shows how you build a complete reflection workflow for an insurance claims intake system. You configure the drafter to extract structured data from free-text claim descriptions, and the critic to validate the extracted data for completeness and accuracy.
Set up the model and input table
-- Create a connection to the model provider.
CREATE CONNECTION openai_connection
WITH (
'type' = 'openai',
'endpoint' = 'https://api.openai.com/v1/chat/completions',
'api-key' = '<your-openai-api-key>'
);
-- Create a model.
CREATE MODEL claims_model
INPUT (message STRING)
OUTPUT (response STRING)
WITH (
'provider' = 'openai',
'openai.connection' = 'openai_connection',
'openai.model_version' = 'gpt-4o'
);
-- Create an input table for submitted claims.
CREATE TABLE claims_submitted (
id INT,
customer_id INT,
description STRING,
image_url STRING,
latitude DOUBLE,
longitude DOUBLE
);
Define the sub-agents
-- Drafter: extracts structured claim fields.
CREATE AGENT claim_extractor
USING MODEL claims_model
USING PROMPT 'You are a claims intake agent. Extract a JSON object
with fields: incident_type, damage_summary,
estimated_severity (low/med/high), estimated_cost, location
(lat/long), and any missing_fields. Use the user description
and metadata.';
-- Critic: validates the extracted data.
CREATE AGENT claim_critic
USING MODEL claims_model
USING PROMPT 'You are a claims quality reviewer. Critique the
extracted JSON for missing required fields, contradictions,
or low-confidence assumptions. Verify that estimated_cost is
present and reasonable.';
Create and run the reflection agent
-- Create the reflection agent.
CREATE AGENT reflective_claim_intake
USING AGENTS claim_extractor, claim_critic
WITH (
'type' = 'reflection',
'pass_condition' = 'approval',
'max_iterations' = '3'
);
-- Create an output table for approved claims.
CREATE TABLE claims_intake_approved (
id INT,
customer_id INT,
claim_json STRING
);
-- Run the reflection agent and write approved claims.
INSERT INTO claims_intake_approved
SELECT
c.id,
c.customer_id,
r.response AS claim_json
FROM claims_submitted AS c,
LATERAL TABLE(
AI_RUN_AGENT(
'reflective_claim_intake',
CONCAT(
'description=', c.description,
'; image_url=', c.image_url,
'; lat=', CAST(c.latitude AS STRING),
'; lon=', CAST(c.longitude AS STRING)
),
CAST(c.id AS STRING)
)
) AS r(status, response);
In this example:
- The drafter extracts structured JSON from the claim description.
- The critic checks the JSON for missing fields, contradictions, and unreasonable values.
- If the critic responds with REQUIRED_CHANGES, the drafter receives the feedback and produces an improved version.
- The loop repeats up to three times. Only the final, approved JSON is written to the claims_intake_approved table.
Example responses
The following example shows what the drafter, critic, and revised drafter responses look like during a reflection loop for a claim with the input description=broken windshield; lat=37.7749; lon=-122.4194.
Iteration 1: Initial draft
The drafter produces an initial JSON extraction from the claim description.
{
"incident_type": "windshield_damage",
"damage_summary": "Front windshield has crack",
"estimated_severity": "med",
"location": {"lat": 37.7749, "lon": -122.4194},
"missing_fields": []
}
Iteration 1: Critic response
The critic identifies missing and incomplete fields and responds with REQUIRED_CHANGES.
{
"verdict": "REQUIRED_CHANGES",
"confidence": 0.4,
"issues": [
{
"field": "estimated_cost",
"severity": "high",
"description": "Required field estimated_cost is missing."
},
{
"field": "damage_summary",
"severity": "medium",
"description": "Description lacks specific details about
crack size and location on windshield."
}
],
"suggestions": "Add estimated_cost for windshield repair.
Typical range is $200-$500 depending on crack size. Specify
crack dimensions if available from the description."
}
Iteration 2: Revised draft
The drafter receives the critic’s feedback and produces an improved version that addresses the issues.
{
"incident_type": "windshield_damage",
"damage_summary": "Front windshield has crack approximately
6 inches long on the passenger side",
"estimated_severity": "med",
"estimated_cost": 350,
"location": {"lat": 37.7749, "lon": -122.4194},
"missing_fields": []
}
Iteration 2: Critic response
The critic confirms that all required fields are present and responds with APPROVED. The loop terminates and the revised draft is emitted as the final output.
{
"verdict": "APPROVED",
"confidence": 0.95,
"issues": [],
"suggestions": ""
}