Compare Current and Previous Values in a Data Stream¶
Confluent Cloud for Apache Flink® provides the built-in LAG function, which enables you to access data from a previous row in the same result set without a self-join. It gives you the ability to analyze the differences between consecutive rows or to create more complex calculations based on previous row values. This is particularly useful when you need to compare the current row’s data with the previous row’s data, such as calculating the difference in sales from one day to the next.
In this topic, you create a streaming source of mock data that models player scores in a game room, and you use the LAG function to see how the players’ scores change over time.
This topic shows the following steps:

Step 1: Create a streaming data source for video gaming data

Step 2: View aggregated results
Prerequisites¶
You need the following prerequisites to use Confluent Cloud for Apache Flink.
Access to Confluent Cloud.
The organization ID, environment ID, and compute pool ID for your organization.
The OrganizationAdmin, EnvironmentAdmin, or FlinkAdmin role for creating compute pools, or the FlinkDeveloper role if you already have a compute pool. If you don’t have the appropriate role, reach out to your OrganizationAdmin or EnvironmentAdmin.
The Confluent CLI. To use the Flink SQL shell, update to the latest version of the Confluent CLI by running the following command:
confluent update --yes
If you used Homebrew to install the Confluent CLI, update the CLI by using the brew upgrade command, instead of confluent update. For more information, see Confluent CLI.
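With the CLI up to date, you can open a Flink SQL shell against your compute pool. The following is a minimal sketch; the environment and compute pool IDs are placeholders that you must replace with your own values.

```shell
# Log in to Confluent Cloud.
confluent login

# Start the Flink SQL shell against a specific compute pool and environment.
# env-123456 and lfcp-abc123 are placeholder IDs: substitute your own.
confluent flink shell --compute-pool lfcp-abc123 --environment env-123456
```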
Step 1: Create a streaming data source for video gaming data¶
The streaming data for this topic is produced by a Datagen Source connector that is configured with the Gaming Player Activity data. It produces mock data to an Apache Kafka® topic named gaming_player_activity.
Log in to the Confluent Cloud Console and select the environment where you’re using Flink SQL. If you don’t have an environment set up, you can review Flink Quick Start with Confluent Cloud Console or Flink Quick Start with the SQL Shell.
Select the cluster in which you will be using Flink SQL.
Select Connectors from the left side navigation menu.
The Connectors page opens.
Click Add connector.
The Connector Plugins page opens.
In the Search connectors box, enter “datagen”.
From the search results, select the Datagen Source connector.
At the Add Datagen Source Connector screen, complete the following:
Click Add new topic, and in the Topic name field, enter “gaming_player_activity”.
Click Create with defaults. Confluent Cloud creates the Kafka topic that the connector produces records to.
Note
When you’re in a Confluent Cloud environment that has Flink SQL, a SQL table is created automatically when you create a Kafka topic.
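Once the topic exists, you can confirm that the corresponding Flink SQL table was created. A quick sketch, run from a Cloud Console workspace or the Flink SQL shell:

```sql
-- List the tables in the current database (each is backed by a Kafka topic)
SHOW TABLES;

-- Inspect the schema that Flink associates with the topic
DESCRIBE gaming_player_activity;
```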
Select the way you want to provide Kafka Cluster credentials. You can choose one of the following options:
- Global access: Allows your connector to access everything you have access to. With global access, connector access is linked to your account. This option is not recommended for production.
- Granular access: Limits the access for your connector. You can manage connector access through a service account. This option is recommended for production.
- Use an existing API key: Allows you to enter an API key and secret that you have stored. You can also generate these in the Cloud Console.

Confluent recommends using Global access as long as you don’t have any other requirements.

Click Continue.
On the Configuration page, select AVRO for the output record value format.
Selecting AVRO configures the connector to associate a schema with the gaming_player_activity topic and register it with Schema Registry.

In the Select a template section, click Show more options, click the Gaming player activity tile, and click Continue.
For Connector sizing, leave the slider at the default of 1 task and click Continue.
In the Connector name box, select the text and replace it with “gaming_player_activity_connector”.
Click Continue to start the connector.
The status of your new connector reads Provisioning, which lasts for a few seconds. When the status of the new connector changes from Provisioning to Running, you have a producer sending an event stream to your topic in the Confluent Cloud cluster.
The default configuration of the Datagen connector sends only one message per second, and for this topic, you need a faster data rate. Click your connector, go to Settings, and click Edit for the Advanced configuration. Change the value of Max interval between messages (ms) to 10, which makes the connector send about 100 messages per second, a rate better suited for analyzing and exploring streaming data.
In your Confluent Cloud Console workspace or in the Flink SQL shell, run the following command to see the data flowing in.
SELECT * FROM gaming_player_activity;
If you add $rowtime to the SELECT statement, you can see the creation time of each data point:

SELECT $rowtime, * FROM gaming_player_activity;
Step 2: View aggregated results¶
In your Confluent Cloud Console workspace or in the Flink SQL shell, run the following statement to start a query using the LAG function, so you can compare the current and previous scores of a game player.
Remember to select your catalog and database before running the query. You can do this from the top-right controls in the Confluent Cloud workspace, or by running USE statements in the Flink SQL shell.
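For example, in the Flink SQL shell the catalog and database selection looks like the following sketch. The names here are placeholders; substitute your own environment (catalog) and Kafka cluster (database) names.

```sql
-- The catalog maps to your Confluent Cloud environment,
-- and the database maps to your Kafka cluster.
USE CATALOG my_environment;
USE my_kafka_cluster;
```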
SELECT
  $rowtime AS row_time,
  player_id,
  game_room_id,
  points,
  LAG(points, 1) OVER (PARTITION BY player_id ORDER BY $rowtime) AS previous_points_value
FROM gaming_player_activity;
Your output should resemble:
row_time                 player_id  game_room_id  points  previous_points_value
2024-01-11 15:42:00.557  1014       3409          424     88
2024-01-11 15:42:01.079  1014       2472          243     424
2024-01-11 15:42:01.391  1014       2910          343     243
2024-01-11 15:42:01.482  1014       3742          113     343
2024-01-11 15:42:01.681  1014       4226          78      113
2024-01-11 15:42:01.910  1014       1531          354     78
...
Compare the points column with the previous_points_value column. Each value in the points column appears in the next row of the previous_points_value column, thanks to the LAG function in the SELECT statement.
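Because LAG gives you the previous value directly, you can also compute the change between consecutive scores in the same query. The following is a sketch; the points_change column name is illustrative. Note that LAG returns NULL for the first row of each player, so points_change is NULL until a previous value exists.

```sql
SELECT
  $rowtime AS row_time,
  player_id,
  points,
  -- Difference between this score and the player's previous score.
  -- NULL on the first row for each player, because there is no prior row.
  points - LAG(points, 1) OVER (PARTITION BY player_id ORDER BY $rowtime)
    AS points_change
FROM gaming_player_activity;
```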