SELECT Statement

Important

Confluent Cloud for Apache Flink®️ is currently available for Preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing Preview releases of the Preview features at any time in Confluent’s sole discretion. Check out Getting Help for questions, feedback and requests.

For Flink SQL features and limitations in the preview program, see Notable Limitations in Public Preview.

Confluent Cloud for Apache Flink®️ enables querying the content of your tables by using familiar SELECT syntax.

Syntax

SELECT [DISTINCT] select_list FROM table_expression [ WHERE boolean_expression ]

Description

Select list

The select_list specification * means the query resolves all columns. But in production, using * is not recommended, because it makes queries less robust to catalog changes. Instead, use a select_list to specify a subset of available columns or make calculations using the columns. For example, if an orders table has columns named order_id, price, and tax you could write the following query:

SELECT order_id, price + tax FROM orders

Table expression

The table_expression can be any source of data, including a table, view, or VALUES clause, the joined results of multiple existing tables, or a subquery.

Assuming that an orders table is available in the catalog, the following would read all rows from .

SELECT * FROM orders;

VALUES clause

Queries can consume from inline data by using the VALUES clause. Each tuple corresponds to one row. You can provide an alias to assign a name to each column.

 SELECT order_id, price
   FROM (VALUES (1, 2.0), (2, 3.1))
   AS t (order_id, price);

Your output should resemble:

order_id price
1        2.0
2        3.1

WHERE clause

Filter rows by using the WHERE clause.

 SELECT price + tax
   FROM orders
   WHERE id = 10;

Functions

You can invoke built-in scalar functions on the columns of a single row.

SELECT PRETTY_PRINT(order_id) FROM orders;

DISTINCT

If SELECT DISTINCT is specified, all duplicate rows are removed from the result set, which means that one row is kept from each group of duplicates.

For streaming queries, the required state for computing the query result might grow infinitely. State size depends on the number of distinct rows.

SELECT DISTINCT id FROM orders;

Examples

In the Flink SQL shell or in a Cloud Console workspace, run the following commands to see examples of the SELECT statement.

  1. Create a table for web page click events.

    -- Create a table for web page click events.
    CREATE TABLE clicks (
      ip_address VARCHAR,
      url VARCHAR,
      click_ts_raw BIGINT
    );
    
  2. Populate the table with mock clickstream data.

    -- Populate the table with mock clickstream data.
    INSERT INTO clicks
    VALUES( '10.0.0.1',  'https://acme.com/index.html',     1692812175),
          ( '10.0.0.12', 'https://apache.org/index.html',   1692826575),
          ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575),
          ( '10.0.0.1',  'https://acme.com/index.html',     1692812175),
          ( '10.0.0.12', 'https://apache.org/index.html',   1692819375),
          ( '10.0.0.13', 'https://confluent.io/index.html', 1692826575);
    

    Press ENTER to return to the SQL shell. Because INSERT INTO VALUES is a point-in-time statement, it exits after it completes inserting records.

  3. View all rows in the clicks table by using a SELECT statement.

    SELECT * FROM clicks;
    

    Your output should resemble:

    ip_address url                             click_ts_raw
    10.0.0.1   https://acme.com/index.html     1692812175
    10.0.0.12  https://apache.org/index.html   1692826575
    10.0.0.13  https://confluent.io/index.html 1692826575
    10.0.0.1   https://acme.com/index.html     1692812175
    10.0.0.12  https://apache.org/index.html   1692819375
    10.0.0.13  https://confluent.io/index.html 1692826575
    
  4. View only unique rows in the clicks table by using a SELECT DISTINCT statement.

    SELECT DISTINCT * FROM clicks;
    

    Your output should resemble:

    ip_address url                             click_ts_raw
    10.0.0.1   https://acme.com/index.html     1692812175
    10.0.0.12  https://apache.org/index.html   1692826575
    10.0.0.13  https://confluent.io/index.html 1692826575
    10.0.0.12  https://apache.org/index.html   1692819375
    
  5. View only records that have the ip_address of 10.0.0.1 by using a SELECT WHERE statement.

    SELECT * FROM clicks WHERE ip_address='10.0.0.1';
    

    Your output should resemble:

    ip_address url                         click_ts_raw
    10.0.0.1   https://acme.com/index.html 1692812175
    10.0.0.1   https://acme.com/index.html 1692812175