Aggregate Function

Last Updated：2021-04-13

The behavior of an aggregate function is to aggregate the results of multiple rows into one row.

Palo supports the following aggregate functions:

1.avg
2.count
3.approx_count_distinct
4.max
5.min
6.sum
7.group_concat
8.variance_samp
9.variance_pop
10.percentile_approx
11.topn

For aggregate functions related to BITMAP type and HLL type, please refer to BITMAP Function and HLL Function.

AVG

Description

avg(numeric val)

Function: the aggregate function returns the average value of numbers in a set. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of AVG are NULL, then the function returns NULL. When GROUP BY clause is specified in a query, then the value of each GROUP BY will be returned in a result.
Return type: double type

Example

mysql> select ss_ticket_number, avg(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | avg(`ss_sales_price`) |
+------------------+-----------------------+
|            28818 |             41.041875 |
+------------------+-----------------------+

Keywords

avg

COUNT

Description

count([distinct] col_name...)

Function: the aggregat function returns the number of rows that meet the requirements, or the number of rows not containing NULL.
- COUNT(*) count the rows containing NULL values.
- COUNT(column_name) only counts the rows not containing NULL values.
- Users can use the COUNT function and the DISTINCT operator at the same time, count(distinct col_name...) will remove the duplicate first, and then calculate the number of occurrences of the column combination.
Return type: int type

Example

mysql> select count(distinct tiny_column, short_column) from small_table;
+-----------------------------------------------+
| count(DISTINCT `tiny_column`, `short_column`) |
+-----------------------------------------------+
|                                             2 |
+-----------------------------------------------+

Keywords

count

APPROX_COUNT_DISTINCT

Description

approx_count_distinct(type col)

Function: the aggregate function returns the deduplicate value of specified column. Unlike the count(distinct) method, the function uses the HyperLogLog algorithm to return duplicate values, which has some errors but has higher efficiency than count(distinct).
Return type: int type

Example

mysql> select approx_count_distinct(ss_ticket_number) from store_sales;
+-------------------------------------------+
| approx_count_distinct(`ss_ticket_number`) |
+-------------------------------------------+
|                                     46580 |
+-------------------------------------------+

Keywords

approx_count_distinct

MAX

Description

max(type col)

Function: this aggregate function returns the max value of numbers in a set. It is opposite to the min function. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of MAX are NULL, then the function returns NULL. When GROUP BY clause is specified in a query, then the value of each GROUP BY will be returned in a result.
Return type: same type as input parameters.

Example

mysql> select ss_ticket_number, max(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | max(`ss_sales_price`) |
+------------------+-----------------------+
|            28818 |                136.42 |
+------------------+-----------------------+

Keywords

max

MIN

Description

min(type col)

Function: this aggregate function returns the min value of numbers in a set. It is opposite to the max function. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of MIN are NULL, then the function returns NULL. When GROUP BY clause is specified in a query, then the value of each GROUP BY will be returned in a result.
Return type: same type as input parameters.

Example

mysql> select ss_ticket_number, min(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | min(`ss_sales_price`) |
+------------------+-----------------------+
|            28818 |                  0.91 |
+------------------+-----------------------+

Keywords

min

SUM

Description

sum(numeric col)

Function: the aggregate function returns the sum of all the values in a set. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of MIN are NULL, then the function returns NULL. WhenGROUP BYclause is specified in a query, then the value of eachGROUP BY` will be returned in a result.
Return type: if the parameter is integer, then return BIGINT, and if the parameter is floating point, return double.

Example

mysql> select ss_ticket_number, sum(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | sum(`ss_sales_price`) |
+------------------+-----------------------+
|            28818 |            1971980.01 |
+------------------+-----------------------+

Keywords

sum

GROUP_CONCAT

Description

group_count(col[, separator])

Function: the aggregate function will return a string, which is a new string by connecting all strings in a set. If the user specifies a separator, the separator is used to connect strings of two adjacent rows.
Return type: string type
Instructions for use: by default, this function returns a string that covers all result sets. When GROUP BY clause is specified in a query, then the value of each GROUP BY will be returned in a result.

Example

mysql> select * from tbl;
+------+------+
| k1   | v1   |
+------+------+
|    1 | a    |
|    1 | b    |
|    1 | c    |
+------+------+

mysql> select k1, group_concat(v1) from tbl group by k1;
+------+--------------------+
| k1   | group_concat(`v1`) |
+------+--------------------+
|    1 | a, b, c            |
+------+--------------------+

mysql> select k1, group_concat(v1, '|') from tbl group by k1;
+------+-------------------------+
| k1   | group_concat(`v1`, '|') |
+------+-------------------------+
|    1 | a|b|c                   |
+------+-------------------------+

Keywords

group_concat

VARIANCE,VAR_SAMP,VARIANCE_SAMP

Description

variance(numeric val)
var_samp(numeric val)
variance_samp(numeric val)

Function: this kind of aggregate function returns the sampel Variance of a set of numbers. This is a mathematical attribute representing the distance between the value and the average value. It acts on numeric types.variance and var_samp are the aliases of variance_samp.
Return type: double type

Example

mysql> select ss_ticket_number, variance(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+----------------------------+
| ss_ticket_number | variance(`ss_sales_price`) |
+------------------+----------------------------+
|            28818 |             1378.408511209 |
+------------------+----------------------------+

Keywords

variance, var_samp, variance_samp

VAR_POP,VARIANCE_POP

Description

var_pop(numeric val)
variance_pop(numeric val)

Function: this kind of aggregate function returns the population Variance of a set of numbers.This is a mathematical attribute representing the distance between the value and the average value. It acts on numeric types.var_pop is the alias of variance_pop.
Return type: double type

Example

mysql> select ss_ticket_number, variance_pop(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+--------------------------------+
| ss_ticket_number | variance_pop(`ss_sales_price`) |
+------------------+--------------------------------+
|            28818 |                 1378.408511134 |
+------------------+--------------------------------+

Keywords

var_pop, variance_pop

PERCENTILE_APPROX

Description

percentile_approx(numeric val, double percentile, double compression)

Function: this kind of aggregate function adopts T-Digest algorithm, and returns the approximate value of a set of specified percentile values. The percentile should be between 0 and 1. The accuracy of the results can be controlled bycompression , and the value range is 2014 - 10000. The greater the value is, the higher the accuracy is, the greater the memory overhead and time consumption are. The value is 2048 by default.
Return type: double type

Example

mysql> select percentile_approx(query_time, 0.95), percentile_approx(query_time, 0.99) from tbl limit 10;
+---------------------------------------+---------------------------------------+
| percentile_approx(`query_time`, 0.95) | percentile_approx(`query_time`, 0.99) |
+---------------------------------------+---------------------------------------+
|                    30.994913101196289 |                    116.05957794189453 |
+---------------------------------------+---------------------------------------+

mysql> select `table`, percentile_approx(cost_time,0.99) from log_statis group by `table`;
+---------------------+---------------------------+
| table    | percentile_approx(`cost_time`, 0.99) |
+----------+--------------------------------------+
| test     |                                54.22 |
+----------+--------------------------------------+

mysql> select `table`, percentile_approx(cost_time,0.99, 4096) from log_statis group by `table`;
+---------------------+-----------------------------------+
| table    | percentile_approx(`cost_time`, 0.99, 4096.0) |
+----------+----------------------------------------------+
| test     |                                        54.21 |
+----------+----------------------------------------------+

Keywords

percentile_approx, percentile

TOPN

Description

topn(expr, int top_num[, int space_expand_rate])

Function: the topn function uses the Space-Saving algorithm to calculate the top_num frequent items in expr, and the results are frequent items and their occurrence times, which are approximate values. Thespace_expand_rate parameter is optional, which is used to set the number of counter used in Space-Saving algorithm. The greater the value ofspace_expand_rate is, the more accurate the result is. The default value is 50.
```
counter numbers = top_num * space_expand_rate
```
Return type: string of JSON format.

Example

mysql> select topn(time, 10) from tbl;
+--------------------------------------------------------------------------------------------------------------------------+
| topn(`time`, 10)                                                                                                         |
+--------------------------------------------------------------------------------------------------------------------------+
| {"0":7894391,"1":3887461,"2":3792601,"6":3344590,"5":2394986,"7":1421491,"3":1046929,"29":982826,"30":674072,"4":640616} |
+--------------------------------------------------------------------------------------------------------------------------+

mysql> select topn(time, 10, 100) from tbl;
+--------------------------------------------------------------------------------------------------------------------------+
| topn(`time`, 10, 100)                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------------+
| {"0":7894592,"1":3887551,"2":3792700,"6":3344590,"5":2394986,"7":1421492,"3":1046977,"29":982826,"30":674072,"4":640625} |
+--------------------------------------------------------------------------------------------------------------------------+

Keywords

topn

Type Conversion Function

Bit Operation Function

Data Warehouse

Aggregate Function

AVG

Description

Example

Keywords

COUNT

Description

Example

Keywords

APPROX_COUNT_DISTINCT

Description

Example

Keywords

MAX

Description

Example

Keywords

MIN

Description

Example

Keywords

SUM

Description

Example

Keywords

GROUP_CONCAT

Description

Example

Keywords

VARIANCE,VAR_SAMP,VARIANCE_SAMP

Description

Example

Keywords

VAR_POP,VARIANCE_POP

Description

Example

Keywords

PERCENTILE_APPROX

Description

Example

Keywords

TOPN

Description

Example

Keywords