Frequently Asked Questions (FAQ) on Confluent Platform for Apache Flink

The following sections provide answers to some of the most common questions about Confluent Platform for Apache Flink®.

What is Confluent Platform for Apache Flink?

Confluent Platform for Apache Flink is a component of Confluent Platform that enables you to run and manage Apache Flink® applications on-premises alongside other Confluent Platform components.

What is the relationship between Confluent Platform for Apache Flink and Flink on Confluent Cloud?

Confluent Platform for Apache Flink and Confluent Cloud for Apache Flink are two separate products that have different deployment models:

  • Confluent Platform for Apache Flink is designed for on-premises deployments, providing a managed experience for running Flink applications alongside other Confluent Platform components.
  • Confluent Cloud for Apache Flink provides a cloud-native, serverless Flink service that enables simple, scalable stream processing and integrates easily with Kafka.

How do I install Confluent Platform for Apache Flink?

You can install Confluent Platform for Apache Flink using Helm. For more details, see Install Confluent Manager for Apache Flink with Helm.

What license do I need for Confluent Platform for Apache Flink?

Confluent Platform for Apache Flink is a Confluent Enterprise feature, so you need a Confluent Platform for Apache Flink license to use it.

What is the advantage of using Confluent Platform for Apache Flink over Apache Flink?

Confluent provides support and security patches for Confluent Platform for Apache Flink just like other Confluent Platform components. For more information, see Confluent Platform for Apache Flink.

What is Confluent Manager for Apache Flink (CMF)?

CMF is the central management component that enables users to securely manage a fleet of Flink applications across multiple environments. CMF sits next to other Confluent Platform components (like Confluent Server and Kafka Connect), and exposes its functionality primarily through a REST API, which is also used by the Confluent CLI and Confluent Control Center.
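
For example, here is a minimal sketch of calling that REST API with Python. The base URL assumes CMF has been port-forwarded to localhost:8080, and the exact paths and response shape should be checked against the CMF REST API reference.

    import requests

    # Assumed base URL: CMF port-forwarded to localhost:8080.
    BASE = "http://localhost:8080/cmf/api/v1"

    # List the Flink environments known to CMF.
    resp = requests.get(f"{BASE}/environments")
    resp.raise_for_status()
    print(resp.json())  # response shape: see the CMF REST API reference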

How do I get started with Confluent Platform for Apache Flink?

To get started with Confluent Platform for Apache Flink, you can either create and submit a Flink SQL statement or deploy a Flink application.

What is the purpose of a Flink environment in CMF?

Flink environments serve two main roles:

  1. Isolation: They enable logical isolation via access control (RBAC is scoped at the Environment level) and physical isolation by specifying the target Kubernetes namespace for deployment.
  2. Shared Configuration: They allow configuration options (like observability settings or checkpoint storage location) to be set at the Environment level, taking precedence over settings in individual Flink applications. This helps separate concerns between platform operators and developers.
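
As an illustration, the following sketch creates an environment that pins deployments to a Kubernetes namespace and sets a shared checkpoint location. The field names (kubernetesNamespace, flinkApplicationDefaults) and endpoint path are assumptions based on typical CMF usage; consult the REST API reference for the exact schema.

    import requests

    BASE = "http://localhost:8080/cmf/api/v1"  # assumed local port-forward

    # Hypothetical environment: deployments land in the "flink-staging"
    # namespace, and every application inherits the checkpoint location.
    environment = {
        "name": "staging",
        "kubernetesNamespace": "flink-staging",
        "flinkApplicationDefaults": {
            "spec": {
                "flinkConfiguration": {
                    "state.checkpoints.dir": "s3://example-bucket/checkpoints",
                },
            },
        },
    }

    resp = requests.post(f"{BASE}/environments", json=environment)
    resp.raise_for_status()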

What are Flink applications in CMF?

Flink applications are CMF resources consisting of a Flink job (packaged as a Java JAR file), a Flink configuration, the specification of a Flink Kubernetes cluster, and status information. Each application runs on its own Flink cluster, which isolates applications from one another.
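
As a sketch, an application resource submitted through the REST API might look like the following. The spec mirrors the Flink Kubernetes operator's deployment spec; the image tag, resource values, and example JAR path are illustrative assumptions.

    import requests

    BASE = "http://localhost:8080/cmf/api/v1"  # assumed local port-forward

    application = {
        "apiVersion": "cmf.confluent.io/v1",
        "kind": "FlinkApplication",
        "metadata": {"name": "basic-example"},
        "spec": {
            "image": "confluentinc/cp-flink:1.19.1-cp1",  # assumed tag
            "flinkVersion": "v1_19",
            "jobManager": {"resource": {"cpu": 1, "memory": "1024m"}},
            "taskManager": {"resource": {"cpu": 1, "memory": "1024m"}},
            "job": {
                # Example job shipped with Flink images; path is illustrative.
                "jarURI": "local:///opt/flink/examples/streaming/StateMachineExample.jar",
                "parallelism": 1,
                "upgradeMode": "stateless",
            },
        },
    }

    resp = requests.post(f"{BASE}/environments/staging/applications", json=application)
    resp.raise_for_status()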

What are Flink application instances?

A Flink application instance tracks the details of a deployed Flink application. A new application instance (with a unique identifier or UID) is created every time the specification for a Flink application is changed. Instances allow users to:

  • Understand the effective specification of the application after environment defaults have been applied.
  • Track changes made to the application over time.
  • Correlate Flink application activity with centralized logging systems, as the instance name is provided as an annotation on Kubernetes pods.
  • Track the status of the underlying Flink job, which is especially useful for finite streaming workloads (batch processing).
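
For example, a sketch of retrieving the instances of an application. The instances path is an assumption, so check the REST API reference for the exact endpoint.

    import requests

    BASE = "http://localhost:8080/cmf/api/v1"  # assumed local port-forward

    # Hypothetical path: list every instance (one per spec change) of an
    # application, e.g. to inspect the effective spec or the job status.
    resp = requests.get(f"{BASE}/environments/staging/applications/basic-example/instances")
    resp.raise_for_status()
    print(resp.json())  # response shape: see the CMF REST API reference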

How can I manage CMF resources?

Confluent provides several ways to manage CMF resources. You can use:

  • The CMF REST API directly.
  • The Confluent CLI.
  • Confluent Control Center.

What are Flink statements and their limitations in CMF?

Statements are CMF resources representing Flink SQL queries. Flink SQL support in CMF is currently available as an open preview feature.

There are three main types of statements:

  1. Statements reading catalog metadata, for example SHOW TABLES: Immediately executed by CMF without creating a Flink deployment, typically for interactive scenarios.
  2. Interactive SELECT statements: Executed on Flink clusters, collecting results retrievable via the Statement Results endpoint (ad-hoc data exploration).
  3. Detached INSERT INTO statements: Executed on Flink clusters to deploy data pipeline jobs in production, writing results into a table backed by a Kafka topic.
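
As an illustration, a detached INSERT INTO statement could be submitted as follows. Field names such as computePoolName and the properties keys are assumptions modeled on Confluent's statement resources, and the SQL references hypothetical tables.

    import requests

    BASE = "http://localhost:8080/cmf/api/v1"  # assumed local port-forward

    statement = {
        "apiVersion": "cmf.confluent.io/v1",
        "kind": "Statement",
        "metadata": {"name": "orders-eu-pipeline"},
        "spec": {
            # Hypothetical tables backed by Kafka topics.
            "statement": "INSERT INTO orders_eu SELECT * FROM orders WHERE region = 'EU';",
            "computePoolName": "pool-1",  # assumed field name
            "properties": {
                "sql.current-catalog": "kafka-cat",
                "sql.current-database": "my-cluster",
            },
        },
    }

    resp = requests.post(f"{BASE}/environments/staging/statements", json=statement)
    resp.raise_for_status()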

Key limitations of statements include the following:

  • No support for CREATE TABLE, ALTER TABLE, DROP TABLE, or EXPLAIN statements.
  • Compacted Kafka topics are not supported.
  • User-defined functions are not supported.
  • SELECT and INSERT INTO statements with updating results are not supported.

For more details on what is and is not supported, see Features & Support for Statements in Confluent Manager for Apache Flink.

How are data sources connected for Flink SQL Statements?

Flink SQL uses catalogs to connect to external storage systems. Confluent Manager for Apache Flink features built-in Kafka catalogs that expose Kafka topics as tables and derive their schemas from Schema Registry.

When configuring a catalog:

  • A catalog references a Schema Registry instance and one or more Kafka clusters.
  • Each Kafka cluster is represented as a DATABASE, and each topic of a cluster is a TABLE in that database.
  • Sensitive connection properties (like credentials) must be stored separately in Flink Secrets.
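
This mapping means a topic can be queried with a fully qualified name. A small sketch, using hypothetical catalog, cluster, and topic names:

    # Tables in a Kafka catalog are addressed as catalog.database.table,
    # which maps to catalog.kafka-cluster.topic. All names are hypothetical.
    sql = "SELECT * FROM `kafka-cat`.`my-cluster`.`orders` LIMIT 10;"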

What is a compute pool?

A compute pool defines the compute resources used to execute a SQL statement. Each statement must reference a compute pool, which acts as a template for the dedicated Flink cluster running the query.

  • The currently supported type is DEDICATED, meaning each statement runs on its own dedicated Flink cluster in application mode.
  • The compute pool configuration includes the Flink version, image (must use a confluentinc/cp-flink-sql image), and resource specifications for the JobManager and TaskManagers.
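
A sketch of a compute pool resource follows; the schema and field names are assumptions, and the image tag is illustrative.

    import requests

    BASE = "http://localhost:8080/cmf/api/v1"  # assumed local port-forward

    compute_pool = {
        "apiVersion": "cmf.confluent.io/v1",
        "kind": "ComputePool",
        "metadata": {"name": "pool-1"},
        "spec": {
            "type": "DEDICATED",  # each statement gets its own cluster
            "clusterSpec": {
                "flinkVersion": "v1_19",
                "image": "confluentinc/cp-flink-sql:1.19-cp1",  # assumed tag
                "jobManager": {"resource": {"cpu": 1, "memory": "1024m"}},
                "taskManager": {"resource": {"cpu": 1, "memory": "1024m"}},
            },
        },
    }

    resp = requests.post(f"{BASE}/environments/staging/compute-pools", json=compute_pool)
    resp.raise_for_status()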

Can I configure logging and metrics for Flink applications?

Yes, you can configure both logging and metrics.

You configure Flink logging through Log4j 2 configuration files, which are exposed via the application and environment APIs. For more information, see Logging with Confluent Manager for Apache Flink.

Flink metrics collection leverages Flink’s metrics system. For more information, see Collect Metrics for Confluent Manager for Apache Flink.

What are the risks associated with deploying Flink jobs via CMF?

Apache Flink is a framework for executing user code, which carries inherent risks. It is crucial to set up proper authentication and authorization (RBAC) and to limit network access. Flink clusters without security configured should never be deployed in an internet-facing environment.

What security features are offered with Confluent Platform for Apache Flink?

Confluent Manager for Apache Flink provides the following key security features: