Comparing Open Source Flink with Confluent Cloud for Apache Flink

Confluent Cloud for Apache Flink® supports many of the capabilities of Open Source Apache Flink® and provides additional features. Also, Confluent Cloud for Apache Flink has some different behaviors and limitations relative to Open Source (OSS) Flink. This topic describes the key differences between Confluent Cloud for Apache Flink and OSS Flink.

Additional features

The following list shows features provided by Confluent Cloud for Apache Flink that go beyond what OSS Flink offers.

Auto-inference of environments, clusters, topics, and schemas

In OSS Flink, you must define and configure your tables and their schemas, including authentication and authorization to Apache Kafka®. Confluent Cloud for Apache Flink maps environments, clusters, topics, and schemas automatically from Confluent Cloud to the corresponding Flink concepts of catalogs, databases, tables, and table schemas.

Autoscaling

Autopilot scales up and scales down the compute resources that SQL statements use in Confluent Cloud. The autoscaling process is based on parallelism, which is the number of parallel operations that occur when the SQL statement is running. A SQL statement performs at its best when it has the optimal resources for its required parallelism.

Default system column implementation

Confluent Cloud for Apache Flink has a default implementation for a system column named $rowtime. This column is mapped to the Kafka record timestamp, which can be either LogAppendTime or CreateTime.

Default watermark strategy

Flink requires a watermark strategy for a variety of features, such as windowing and temporal joins. Confluent Cloud for Apache Flink has a default watermark strategy applied on all tables/topics, which is based on the $rowtime system column. OSS Flink requires you to define a watermark strategy manually. For more information, see Event Time and Watermarks.

Because the default strategy is defined for general usage, there are cases that require a custom strategy, for example, when delays in record arrival of longer than 7 days occur in your streams. You can override the default strategy with a custom strategy by using the ALTER TABLE statement.

Schema Registry support for JSON_SR and Protobuf

Confluent Cloud for Apache Flink has support for Schema Registry formats AVRO, JSON_SR, and Protobuf, while OSS Flink currently supports only Schema Registry AVRO.

INFORMATION_SCHEMA support

Confluent Cloud for Apache Flink has an implementation for IMPLEMENTATION_SCHEMA, which is a system view that provides insights on catalogs, databases, tables, and schemas. This doesn’t exist in OSS Flink.

Behavioral differences

The following list shows differences in behavior between Confluent Cloud for Apache Flink and OSS Flink.

Configuration options

OSS Flink supports various optimization configuration options on different levels, like Execution Options, Optimizer Options, Table Options, and SQL Client Options. Confluent Cloud for Apache Flink supports only the necessary subset of these options.

Same of these options have different names in Confluent Cloud for Apache Flink, as shown in the following table.

Confluent Cloud for Apache Flink OSS Flink
client.results-timeout table.exec.async-lookup.timeout
client.statement-name
sql.current-catalog table.builtin-catalog-name
sql.current-database table.builtin-database-name
sql.local-time-zone table.local-time-zone
sql.state-ttl table.exec.state.ttl
sql.tables.scan.bounded.mode scan.bounded.mode
sql.tables.scan.bounded.timestamp-millis scan.bounded.timestamp-millis
sql.tables.scan.idle-timeout table.exec.source.idle-timeout
sql.tables.scan.startup.mode scan.startup.mode
sql.tables.scan.startup.timestamp-millis scan.startup.timestamp-millis

CREATE statements provision underlying resources

When you run a CREATE TABLE statement in Confluent Cloud for Apache Flink, it creates the underlying Kafka topic and a Schema Registry schema in Confluent Cloud. In OSS Flink, a CREATE TABLE statement only registers the object in the Flink catalog and doesn’t create an underlying resource.

This also means that temporary tables are not supported in Confluent Cloud for Apache Flink, while they are in OSS Flink.

One Kafka connector and only Confluent Cloud support

OSS Flink contains a Kafka connector and an Upsert-Kafka connector, which, combined with the format, defines whether the source/sink is treated as an append-stream or update stream. Confluent Cloud for Apache Flink has only one Kafka connector and determines if the source/sink is an append-stream or update stream by examining the changelog.mode connector option.

Confluent Cloud for Apache Flink only supports reading from and writing to Kafka topics that are located on Confluent Cloud. OSS Flink supports other connectors, like Kinesis, Pulsar, JDBC, etc., and also other Kafka environments, like on-premises and different cloud service providers.

Limitations

The following list shows limitations of Confluent Cloud for Apache Flink compared with OSS Flink.

Windowing functions syntax

Confluent Cloud for Apache Flink supports the TUMBLE, HOP, SESSION, and CUMULATE windowing functions only by using so-called Table-Valued Functions syntax. OSS Flink supports these windowing functions also by using the outdated Group Window Aggregations functions.

Currently unsupported statements and features

Confluent Cloud for Apache Flink does not support the following:

  • ANALYZE statements
  • CALL statements
  • CATALOG commands other than SHOW (No CREATE/DROP/ALTER)
  • DATABASE command other than SHOW (No CREATE/DROP/ALTER)
  • DELETE statements
  • DROP (DROP TABLE, DROP CATALOG, DROP DATABASE, DROP FUNCTION)
  • FUNCTION statements
  • JAR statements
  • LOAD / UNLOAD statements
  • TRUNCATE statements
  • UPDATE statements

Limited support for ALTER

Confluent Cloud for Apache Flink has limited support for ALTER TABLE compared with OSS Flink. In Confluent Cloud for Apache Flink, you can use ALTER TABLE only to change the watermark strategy, add a metadata column, or change a parameter value.