Aggregate Functions¶
Confluent Cloud for Apache Flink®️ provides these built-in functions to aggregate rows in Flink SQL queries:
AVG | COLLECT | COUNT | CUME_DIST |
DENSE_RANK | FIRST_VALUE | LAG | LAST_VALUE |
LEAD | LISTAGG | MAX | MIN |
NTILE | PERCENT_RANK | RANK | ROW_NUMBER |
STDDEV_POP | STDDEV_SAMP | SUM | VAR_POP |
VAR_SAMP | VARIANCE |
The aggregate functions take an expression across all the rows as the input and return a single aggregated value as the result.
AVG¶
- Syntax
AVG([ ALL | DISTINCT ] expression)
- Description
By default or with keyword
ALL
, returns the average (arithmetic mean) ofexpression
over all input rows.Use
DISTINCT
to return one unique instance of each value.
COLLECT¶
- Syntax
COLLECT([ ALL | DISTINCT ] expression)
- Description
By default or with the
ALL
keyword, returns a multiset ofexpression
over all input rows.NULL values are ignored.
Use
DISTINCT
to return one unique instance of each value.
COUNT¶
- Syntax
COUNT([ ALL ] expression | DISTINCT expression1 [, expression2]*)
- Description
By default or with
ALL
, returns the number of input rows for which expression isn’t NULL.Use
DISTINCT
to return one unique instance of each value.Use
COUNT(*)
orCOUNT(1)
to return the number of input rows.
CUME_DIST¶
- Syntax
CUME_DIST()
- Description
- Returns the cumulative distribution of a value in a group of values. The result is the number of rows preceding or equal to the current row in the partition ordering divided by the number of rows in the window partition.
DENSE_RANK¶
FIRST_VALUE¶
- Syntax
FIRST_VALUE(expression)
- Description
- Returns the first value in an ordered set of values.
- Related function
LAG¶
- Syntax
LAG(expression [, offset] [, default])
- Description
Returns the value of expression at the offsetth row before the current row in the window.
The default value of
offset
is 1, and the default value of thedefault
argument is NULL.- Related function
LAST_VALUE¶
- Syntax
LAST_VALUE(expression)
- Description
- Returns the last value in an ordered set of values.
- Related function
LEAD¶
- Syntax
LEAD(expression [, offset] [, default])
- Description
Returns the value of expression at the offsetth row after the current row in the window.
The default value of
offset
is 1, and the default value of thedefault
argument is NULL.- Related function
LISTAGG¶
- Syntax
LISTAGG(expression [, separator])
- Description
Concatenates the values of string expressions and places separator values between them.
The separator isn’t added at the end of string.
The default value of separator is
','
.
MAX¶
- Syntax
MAX([ ALL | DISTINCT ] expression)
- Description
By default or with the
ALL
keyword, returns the maximum value ofexpression
over all input rows.Use
DISTINCT
to return one unique instance of each value.
MIN¶
- Syntax
MIN([ ALL | DISTINCT ] expression )
- Description
By default or with the
ALL
keyword, returns the minimum value ofexpression
across all input rows.Use
DISTINCT
to return one unique instance of each value.
NTILE¶
- Syntax
NTILE(n)
- Description
Divides the rows for each window partition into
n
buckets ranging from 1 to at mostn
.If the number of rows in the window partition doesn’t divide evenly into the number of buckets, the remainder values are distributed one per bucket, starting with the first bucket.
For example, with 6 rows and 4 buckets, the bucket values would be:
1 1 2 2 3 4
PERCENT_RANK¶
- Syntax
PERCENT_RANK()
- Description
Returns the percentage ranking of a value in a group of values.
The result is the rank value minus one, divided by the number of rows in the parition minus one.
If the partition only contains one row, the
PERCENT_RANK
function returns 0.
RANK¶
- Syntax
RANK()
- Description
Returns the rank of a value in a group of values.
The result is one plus the number of rows preceding or equal to the current row in the partition ordering.
The values produce gaps in the sequence.
- Related functions
ROW_NUMBER¶
- Syntax
ROW_NUMBER()
- Description
Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
The
ROW_NUMBER
andRANK
functions are similar.ROW_NUMBER
numbers all rows sequentially, for example,1, 2, 3, 4, 5
.RANK
provides the same numeric value for ties, for example1, 2, 2, 4, 5
.- Related functions
STDDEV_POP¶
- Syntax
STDDEV_POP([ ALL | DISTINCT ] expression)
- Description
By default or with the
ALL
keyword, returns the population standard deviation ofexpression
over all input rows.Use
DISTINCT
to return one unique instance of each value.- Related function
STDDEV_SAMP¶
- Syntax
STDDEV_SAMP([ ALL | DISTINCT ] expression)
- Description
By default or with the
ALL
keyword, returns the sample standard deviation ofexpression
over all input rows.Use
DISTINCT
to return one unique instance of each value.- Related function
SUM¶
- Syntax
SUM([ ALL | DISTINCT ] expression)
By default or with the
ALL
keyword, returns the sum ofexpression
across all input rows.Use
DISTINCT
to return one unique instance of each value.
VAR_POP¶
- Syntax
VAR_POP([ ALL | DISTINCT ] expression)
- Description
By default or with the
ALL
keyword, returns the population variance, which is the square of the population standard deviation, ofexpression
over all input rows.Use
DISTINCT
to return one unique instance of each value.- Related function
VAR_SAMP¶
- Syntax
VAR_SAMP([ ALL | DISTINCT ] expression)
- Description
By default or with the
ALL
keyword, returns the sample variance, which is the square of the sample standard deviation, ofexpression
over all input rows.Use
DISTINCT
to return one unique instance of each value.The
VARIANCE
function is equivalent toVAR_SAMP
.- Related functions
Other built-in functions¶
- Aggregate Functions
- Collection Functions
- Comparison functions
- Conditional Functions
- Datetime Functions
- Hash Functions
- JSON Functions
- Numeric Functions
- String Functions
Important
Confluent Cloud for Apache Flink®️ is currently available for Preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing Preview releases of the Preview features at any time in Confluent’s sole discretion. Check out Getting Help for questions, feedback and requests.
For Flink SQL features and limitations in the preview program, see Notable Limitations in Public Preview.