Packaging Python UDFs with the Java Table API for Confluent Manager for Apache Flink
This topic describes how to execute Python User-Defined Functions (UDFs) within a generic Java Table API runner for Confluent Manager for Apache Flink® (CMF). This approach requires a hybrid setup where a custom Docker image provides the necessary Python runtime while the Java application orchestrates the execution and configuration.
Note
This topic is intended for advanced users and requires manual environment management. The configuration described in this topic is valid and stable across Flink versions.
Step 1: Environment setup using Docker
To deploy a Python UDF with the Table API, you must first build a custom Docker image that contains the required Python dependencies. To learn about building a Docker image, see Build and push your first image in the Docker documentation.
When building the Docker image, ensure that you include the following components:
Include the
flink-pythonJAR is included in the Flink/libdirectory during the Docker build.Install the desired version of Python and the
apache-flink(pyflink) package within the Docker image.
Step 2: Configure the Java application
For the Table API to execute the application, you must explicitly define where Flink can find the Python runtime.
Set the Python executable by configuring the Flink environment to point to the Python virtual environment or binary created in Step 1. Following is an example of how to set the Python executable in the Java application configuration:
config.setString("python.executable", "/opt/flink/pyflink/.venv/bin/python");
Load Python files by ensuring the UDF definitions, for example
my_udfs.py, are included in the job dependencies.
Step 3: SQL implementation
Inside the SQL script or statement being processed by the Java runner, register the function. To do this, declare the UDF using the CREATE TEMPORARY FUNCTION syntax, specifying the language is Python. For example:
-- Syntax to register the UDF
CREATE TEMPORARY FUNCTION py_upper
AS 'my_udfs.my_python_udf'
LANGUAGE PYTHON;
Step 4: Invoke the Java Table API
Execute the Java Table API runner with the appropriate configuration to ensure it can locate the Python runtime and dependencies. This involves running the Java application with the custom Docker image built in Step 1, which includes the necessary Python environment and the flink-python JAR.