Aggregate a Stream in a Tumbling Window with Confluent Cloud for Apache Flink¶
Aggregation over windows is central to processing streaming data. Confluent Cloud for Apache Flink® supports Windowing Table-Valued Functions (Windowing TVFs) in Confluent Cloud for Apache Flink, a SQL-standard syntax for splitting an infinite stream into windows of finite size and computing aggregations within each window. This is often used to find the min/max/average within a group, finding the first or last record or calculating totals.
In this guide, you will learn how to run an Flink SQL statement that identifies the maximum and minimum orders from a continuous data stream of orders data.
This topic shows the following steps:
Prerequisites¶
- Access to Confluent Cloud.
- The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, contact your OrganizationAdmin or EnvironmentAdmin. For more information, see Grant Role-Based Access in Confluent Cloud for Apache Flink.
- A provisioned Flink compute pool.
Step 1: Inspect the example stream¶
In this step, you query the read-only orders
table in the
examples.marketplace
database to inspect the stream for fields that you
can mask.
Log in to Confluent Cloud and navigate to your Flink workspace.
In the Use catalog dropdown, select your environment.
In the Use database dropdown, select your Kafka cluster.
Run the following statement to inspect the example
orders
stream.SELECT * FROM examples.marketplace.orders;
Your output should resemble:
order_id customer_id product_id price 68362284-34df-41a3-87fb-50b79647b786 3195 1267 47.48 6e03663e-d20b-4a23-848a-aec959d794e3 3094 1412 50.92 84217b5d-7dcb-46d1-9600-675a3734a3ed 3038 1094 83.56 ...
Step 2: View aggregated results in a tumbling window¶
Run the following statement to start a windowed query on the
orders
data.SELECT window_start, window_end, MIN(price) as minimum_order_value, MAX(price) as maximum_order_value FROM TABLE(TUMBLE(TABLE examples.marketplace.orders, DESCRIPTOR($rowtime), INTERVAL '10' SECOND)) GROUP BY window_start, window_end;
Your output should resemble:
window_start window_end minimum_order_value maximum_order_value 2023-09-12 08:54:20.000 2023-09-12 08:54:30.000 10.05 99.75 2023-09-12 08:54:30.000 2023-09-12 08:54:40.000 10.22 99.88 2023-09-12 08:54:40.000 2023-09-12 08:54:50.000 10.09 150.45 ...
The Flink statement created with this query identifies the minimum and maximum order value in each 10-second window.