Python Table API Quick Start on Confluent Cloud for Apache Flink

Confluent Cloud for Apache Flink® supports programming applications with the Table API. Confluent provides a plugin for running applications that use the Table API on Confluent Cloud.

For more information, see Table API.

For code examples, see Python Examples for Table API on Confluent Cloud.

Note

The Flink Table API is available for preview.

A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent's sole discretion.

Comments, questions, and suggestions related to the Table API are encouraged and can be submitted through the established channels.

Prerequisites

  • Access to Confluent Cloud
  • A compute pool in Confluent Cloud
  • An Apache Kafka® cluster, if you want to run examples that store data in Kafka
  • Java version 11 or later, which PyFlink requires

To run Table API and Flink SQL programs, you must generate an API key that’s specific to the Flink environment. Also, you need Confluent Cloud account details, like your organization and environment identifiers.

  • Flink API Key: Follow the steps in Generate a Flink API key. For convenience, assign your Flink key and secret to the FLINK_API_KEY and FLINK_API_SECRET environment variables.
  • Organization ID: The identifier of your organization, for example, b0b42172-4586-4a07-b787-d0bb5aacbf87. For convenience, assign your organization identifier to the ORG_ID environment variable.
  • Environment ID: The identifier of the environment where your Flink SQL statements run, for example, env-z3y2x1. For convenience, assign your environment identifier to the ENV_ID environment variable.
  • Cloud provider name: The name of the cloud provider where your cluster runs, for example, aws. To see the available providers, run the confluent flink region list command. For convenience, assign your cloud provider to the CLOUD_PROVIDER environment variable.
  • Cloud region: The name of the region where your cluster runs, for example, us-east-1. To see the available regions, run the confluent flink region list command. For convenience, assign your cloud region to the CLOUD_REGION environment variable.

For example:

export CLOUD_PROVIDER="aws"
export CLOUD_REGION="us-east-1"
export FLINK_API_KEY="<your-flink-api-key>"
export FLINK_API_SECRET="<your-flink-api-secret>"
export ORG_ID="<your-organization-id>"
export ENV_ID="<your-environment-id>"
export COMPUTE_POOL_ID="<your-compute-pool-id>"
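
The program in the next section loads these connection settings from a cloud.properties file through ConfluentSettings.from_file. The following is a minimal sketch of that file; the client.* key names mirror the settings above but are an assumption here, so check the Table API reference for the exact keys.

# Cloud region
client.cloud=aws
client.region=us-east-1

# Access and compute resources
client.flink-api-key=<your-flink-api-key>
client.flink-api-secret=<your-flink-api-secret>
client.organization-id=<your-organization-id>
client.environment-id=<your-environment-id>
client.compute-pool-id=<your-compute-pool-id>

If your version of the plugin can read the environment variables exported above directly (for example, through a ConfluentSettings factory method for global variables, if available), you can skip the properties file.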

Compile and run a Table API program

Use Poetry to create a virtual environment that contains all required dependencies and project files.

  1. Install pipx by following the instructions in the pipx documentation.

  2. Run the following command to install Poetry.

    pipx install poetry
    
  3. Copy the following code into a file named hello_table_api.py.

    from pyflink.table.confluent import ConfluentSettings, ConfluentTools
    from pyflink.table import TableEnvironment, Row
    from pyflink.table.expressions import col, row
    
    def run():
        # Set up the connection to Confluent Cloud
        settings = ConfluentSettings.from_file("./cloud.properties")
        env = TableEnvironment.create(settings)
    
        # Run your first Flink statement in Table API
        env.from_elements([row("Hello world!")]).execute().print()
    
        # Or use SQL
        env.sql_query("SELECT 'Hello world!'").execute().print()
    
        # Structure your code with Table objects - the main ingredient of Table API.
        table = env.from_path("examples.marketplace.clicks") \
            .filter(col("user_agent").like("Mozilla%")) \
            .select(col("click_id"), col("user_id"))
    
        table.print_schema()
        print(table.explain())
    
        # Use the provided tools to test on a subset of the streaming data
        actual = ConfluentTools.collect_materialized_limit(table, 50)
        expected = [Row(42, 500)]  # placeholder values for illustration
        if actual != expected:
            print("Results don't match!")
    
    if __name__ == "__main__":
        run()
    
  4. In the directory where you created hello_table_api.py, run the following command to build a virtual environment that contains all required dependencies. This step expects a pyproject.toml file in the same directory; a minimal sketch is shown after these steps.

    poetry install
    
  5. Run the following command to execute the Table API program.

    poetry run python hello_table_api.py
    
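As noted in step 4, poetry install expects a pyproject.toml file in the project directory. The following is a minimal sketch; the package name confluent-flink-table-api-python-plugin and the Python version bound are assumptions here, so confirm both against the Table API documentation before pinning a release.

[tool.poetry]
# package-mode = false requires Poetry 1.8 or later and skips installing the project itself.
package-mode = false

[tool.poetry.dependencies]
# The Python bound is an assumption; match it to the versions PyFlink supports.
python = ">=3.9,<3.12"
# The package name is assumed from the Java plugin's naming; pin a real release in practice.
confluent-flink-table-api-python-plugin = "*"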
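
When the program runs, it prints the "Hello world!" row twice (once from the Table API call and once from the SQL query), followed by the schema and execution plan of the clicks query, and finally a message if the first 50 materialized rows don't match the placeholder row.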