Aggregate Functions

Confluent Cloud for Apache Flink®️ provides these built-in functions to aggregate rows in Flink SQL queries:

AVG COLLECT COUNT CUME_DIST
DENSE_RANK FIRST_VALUE LAG LAST_VALUE
LEAD LISTAGG MAX MIN
NTILE PERCENT_RANK RANK ROW_NUMBER
STDDEV_POP STDDEV_SAMP SUM VAR_POP
VAR_SAMP VARIANCE    

The aggregate functions take an expression across all the rows as the input and return a single aggregated value as the result.

AVG

Syntax
AVG([ ALL | DISTINCT ] expression)
Description

By default or with keyword ALL, returns the average (arithmetic mean) of expression over all input rows.

Use DISTINCT to return one unique instance of each value.

COLLECT

Syntax
COLLECT([ ALL | DISTINCT ] expression)
Description

By default or with the ALL keyword, returns a multiset of expression over all input rows.

NULL values are ignored.

Use DISTINCT to return one unique instance of each value.

COUNT

Syntax
COUNT([ ALL ] expression | DISTINCT expression1 [, expression2]*)
Description

By default or with ALL, returns the number of input rows for which expression isn’t NULL.

Use DISTINCT to return one unique instance of each value.

Use COUNT(*) or COUNT(1) to return the number of input rows.

CUME_DIST

Syntax
CUME_DIST()
Description
Returns the cumulative distribution of a value in a group of values. The result is the number of rows preceding or equal to the current row in the partition ordering divided by the number of rows in the window partition.

DENSE_RANK

Syntax
DENSE_RANK()
Description

Returns the rank of a value in a group of values.

The result is one plus the previously assigned rank value.

Unlike the RANK function, DENSE_RANK doesn’t produce gaps in the ranking sequence.

Related function

FIRST_VALUE

Syntax
FIRST_VALUE(expression)
Description
Returns the first value in an ordered set of values.
Related function

LAG

Syntax
LAG(expression [, offset] [, default])
Description

Returns the value of expression at the offsetth row before the current row in the window.

The default value of offset is 1, and the default value of the default argument is NULL.

Related function

LAST_VALUE

Syntax
LAST_VALUE(expression)
Description
Returns the last value in an ordered set of values.
Related function

LEAD

Syntax
LEAD(expression [, offset] [, default])
Description

Returns the value of expression at the offsetth row after the current row in the window.

The default value of offset is 1, and the default value of the default argument is NULL.

Related function

LISTAGG

Syntax
LISTAGG(expression [, separator])
Description

Concatenates the values of string expressions and places separator values between them.

The separator isn’t added at the end of string.

The default value of separator is ','.

MAX

Syntax
MAX([ ALL | DISTINCT ] expression)
Description

By default or with the ALL keyword, returns the maximum value of expression over all input rows.

Use DISTINCT to return one unique instance of each value.

MIN

Syntax
MIN([ ALL | DISTINCT ] expression )
Description

By default or with the ALL keyword, returns the minimum value of expression across all input rows.

Use DISTINCT to return one unique instance of each value.

NTILE

Syntax
NTILE(n)
Description

Divides the rows for each window partition into n buckets ranging from 1 to at most n.

If the number of rows in the window partition doesn’t divide evenly into the number of buckets, the remainder values are distributed one per bucket, starting with the first bucket.

For example, with 6 rows and 4 buckets, the bucket values would be:

1 1 2 2 3 4

PERCENT_RANK

Syntax
PERCENT_RANK()
Description

Returns the percentage ranking of a value in a group of values.

The result is the rank value minus one, divided by the number of rows in the parition minus one.

If the partition only contains one row, the PERCENT_RANK function returns 0.

RANK

Syntax
RANK()
Description

Returns the rank of a value in a group of values.

The result is one plus the number of rows preceding or equal to the current row in the partition ordering.

The values produce gaps in the sequence.

Related functions

ROW_NUMBER

Syntax
ROW_NUMBER()
Description

Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.

The ROW_NUMBER and RANK functions are similar. ROW_NUMBER numbers all rows sequentially, for example, 1, 2, 3, 4, 5. RANK provides the same numeric value for ties, for example 1, 2, 2, 4, 5.

Related functions

STDDEV_POP

Syntax
STDDEV_POP([ ALL | DISTINCT ] expression)
Description

By default or with the ALL keyword, returns the population standard deviation of expression over all input rows.

Use DISTINCT to return one unique instance of each value.

Related function

STDDEV_SAMP

Syntax
STDDEV_SAMP([ ALL | DISTINCT ] expression)
Description

By default or with the ALL keyword, returns the sample standard deviation of expression over all input rows.

Use DISTINCT to return one unique instance of each value.

Related function

SUM

Syntax
SUM([ ALL | DISTINCT ] expression)

By default or with the ALL keyword, returns the sum of expression across all input rows.

Use DISTINCT to return one unique instance of each value.

VAR_POP

Syntax
VAR_POP([ ALL | DISTINCT ] expression)
Description

By default or with the ALL keyword, returns the population variance, which is the square of the population standard deviation, of expression over all input rows.

Use DISTINCT to return one unique instance of each value.

Related function

VAR_SAMP

Syntax
VAR_SAMP([ ALL | DISTINCT ] expression)
Description

By default or with the ALL keyword, returns the sample variance, which is the square of the sample standard deviation, of expression over all input rows.

Use DISTINCT to return one unique instance of each value.

The VARIANCE function is equivalent to VAR_SAMP.

Related functions

VARIANCE

Syntax
VARIANCE([ ALL | DISTINCT ] expression)
Description
Equivalent to VAR_SAMP.

Other built-in functions

Important

Confluent Cloud for Apache Flink®️ is currently available for Preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing Preview releases of the Preview features at any time in Confluent’s sole discretion. Check out Getting Help for questions, feedback and requests.

For Flink SQL features and limitations in the preview program, see Notable Limitations in Public Preview.