Stream Processing and SQL with Confluent Cloud for Apache Flink¶
Apache Flink® is a powerful, scalable stream processing framework for executing complex, stateful, low-latency streaming applications on large volumes of data.
Flink excels at complex, high-performance, mission-critical streaming workloads and is used by many companies for production stream processing applications.
Confluent Cloud provides a cloud-native, serverless Flink service for simple, scalable, and secure stream processing. It currently supports SQL.
Confluent Cloud for Apache Flink®️ is currently available for Preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing Preview releases of the Preview features at any time in Confluent’s sole discretion. Check out Getting Help for questions, feedback and requests.
For SQL features and limitations in the preview program, see Notable Limitations in Public Preview.
Confluent Cloud for Apache Flink is cloud-native¶
Confluent Cloud for Apache Flink®️ provides a truly cloud-native experience for Flink. This means you can fully focus on your business logic, encapsulated in SQL statements, and Confluent Cloud takes care of what’s needed to run them in a secure, resource-efficient and fault-tolerant manner. You don’t need to know about or interact with Flink clusters, state backends, checkpointing, and all of the other aspects that are usually involved when operating a production-ready Flink deployment.
- Fully Managed
- On Confluent Cloud, you don’t need to choose a runtime version of Flink. You’re always using the latest version and benefit from continuous improvements and innovations. All of your running statements automatically and transparently receive security patches and minor upgrades of the Flink runtime.
- All of your SQL statements on Confluent Cloud are monitored continuously and auto-scaled to keep up with the rate of their input topics. The resources required by a statement depend on its complexity and the throughput of topics it reads from.
Confluent Cloud for Apache Flink is complete¶
Confluent has integrated Flink deeply with Confluent Cloud to provide an enterprise-ready, complete experience that enables data discovery and processing using familiar SQL semantics.
Flink is a regional service¶
Confluent Cloud for Apache Flink is a regional service, and you can create compute pools in any of the supported regions. Compute pools represent a set of resources that scale automatically between zero and their maximum size to provide all of the power required by your statements.
While compute pools are created within an environment, you can query data in any topic in your Confluent Cloud organization, even if the data is in a different environment, as long as it’s in the same region. This enables Flink to do cross-cluster, cross-environment queries while providing low latency. Of course, access control with RBAC still determines the data that can be read or written.
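As a sketch of what such a cross-environment query might look like, the following uses Flink's fully qualified `catalog`.`database`.`table` naming, which on Confluent Cloud maps to environment, cluster, and topic. All names here are hypothetical placeholders:

```sql
-- Hypothetical environment, cluster, and topic names for illustration.
-- The three-part identifier follows the environment/cluster/topic mapping,
-- so this joins data from two different environments in the same region.
SELECT o.order_id, c.customer_name
FROM `prod-env`.`orders-cluster`.`orders` AS o
JOIN `analytics-env`.`crm-cluster`.`customers` AS c
  ON o.customer_id = c.customer_id;
```

RBAC still applies: the statement can only read topics the executing principal is authorized to access.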
Metadata mapping between Kafka cluster, topics, schemas, and Flink¶
Apache Kafka® topics and schemas are always in sync with Flink, simplifying how you can process your data. Any topic created in Kafka is visible directly as a table in Flink, and any table created in Flink is visible as a topic in Kafka. Effectively, Flink provides a SQL interface on top of Confluent Cloud.
Because Flink follows the SQL standard, its terminology differs slightly from Kafka's. The following table shows the mapping between Kafka and Flink terminology.

| Kafka | Flink | Notes |
|---|---|---|
| Environment | Catalog | Flink can query and join data across any environments/catalogs. |
| Cluster | Database | Flink can query and join data across different clusters/databases. |
| Topic + Schema | Table | Kafka topics and Flink tables are always in sync. You never need to declare tables manually for existing topics. Creating a table in Flink creates the corresponding topic and its associated schema. |
As a result, when you start using Flink, you can directly access all of the environments, clusters, and topics that you already have in Confluent Cloud, without any additional metadata creation.
Compared with Open Source Flink, the main difference is that the DDLs related to catalogs, databases, and tables act on physical objects and not only on metadata. For example, when you create a table in Flink, the corresponding topic and schema are created immediately in Confluent Cloud.
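A minimal sketch of this behavior, using a hypothetical `clicks` table (column names and types are illustrative):

```sql
-- Illustrative DDL: on Confluent Cloud, creating this table also creates
-- a `clicks` topic and registers the associated schema immediately,
-- rather than only defining metadata as in open source Flink.
CREATE TABLE clicks (
  user_id STRING,
  url STRING,
  click_time TIMESTAMP_LTZ(3)
);
```

After this statement runs, the `clicks` topic is visible in Kafka like any other topic, and the table is immediately queryable from Flink.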
Confluent Cloud for Apache Flink integrates deeply with Role-Based Access Control (RBAC), ensuring that you can easily access and process exactly the data you're authorized to see, and no other data.
Access from Flink to the data¶
- For ad-hoc queries, Confluent recommends using your user account, so the permissions of the current user are applied automatically without any additional configuration.
- For INSERT INTO queries that need to run 24/7, Confluent requires you to use a service account, so the queries are not affected by a user leaving the company or changing teams. Any query not running with a service account times out after 4 hours.
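The kind of long-running statement this applies to can be sketched as follows; table and column names are hypothetical:

```sql
-- Illustrative continuous statement: reads every record arriving in
-- `orders` and writes filtered rows into `large_orders` indefinitely.
-- Run it under a service account so it isn't subject to the 4-hour
-- timeout that applies to statements running under a user account.
INSERT INTO large_orders
SELECT order_id, customer_id, amount
FROM orders
WHERE amount > 1000;
```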
Access to Flink¶
To manage Flink access, Confluent has introduced two new roles. In both cases, the user's RBAC permissions on the underlying data still apply.
- FlinkDeveloper: basic access to Flink, enabling users to query data and manage their own statements.
- FlinkAdmin: role that enables creating and managing Flink compute pools.
Service accounts are required to run statements permanently. If you want to run a statement with a service account's permissions, an OrgAdmin must create an Assigner role binding for the user on that service account.
Confluent Cloud for Apache Flink is everywhere¶
Confluent Cloud for Apache Flink will be available on all major clouds. During Public Preview, it is available in a limited number of regions, as described in Cloud regions.