Aggregate Function
The behavior of an aggregate function is to aggregate the results of multiple rows into one row.
Palo supports the following aggregate functions:
1.avg
2.count
3.approx_count_distinct
4.max
5.min
6.sum
7.group_concat
8.variance_samp
9.variance_pop
10.percentile_approx
11.topn
For aggregate functions related to BITMAP type and HLL type, please refer to BITMAP Function and HLL Function.
AVG
Description
avg(numeric val)
- Function: the aggregate function returns the average value of numbers in a set. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of AVG are NULL, then the function returns NULL. When
GROUP BY
clause is specified in a query, then the value of eachGROUP BY
will be returned in a result. - Return type: double type
Example
mysql> select ss_ticket_number, avg(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | avg(`ss_sales_price`) |
+------------------+-----------------------+
| 28818 | 41.041875 |
+------------------+-----------------------+
Keywords
avg
COUNT
Description
count([distinct] col_name...)
-
Function: the aggregat function returns the number of rows that meet the requirements, or the number of rows not containing NULL.
- COUNT(*) count the rows containing NULL values.
- COUNT(column_name) only counts the rows not containing NULL values.
- Users can use the COUNT function and the DISTINCT operator at the same time, count(distinct col_name...) will remove the duplicate first, and then calculate the number of occurrences of the column combination.
- Return type: int type
Example
mysql> select count(distinct tiny_column, short_column) from small_table;
+-----------------------------------------------+
| count(DISTINCT `tiny_column`, `short_column`) |
+-----------------------------------------------+
| 2 |
+-----------------------------------------------+
Keywords
count
APPROX_COUNT_DISTINCT
Description
approx_count_distinct(type col)
- Function: the aggregate function returns the deduplicate value of specified column. Unlike the
count(distinct)
method, the function uses the HyperLogLog algorithm to return duplicate values, which has some errors but has higher efficiency thancount(distinct)
. - Return type: int type
Example
mysql> select approx_count_distinct(ss_ticket_number) from store_sales;
+-------------------------------------------+
| approx_count_distinct(`ss_ticket_number`) |
+-------------------------------------------+
| 46580 |
+-------------------------------------------+
Keywords
approx_count_distinct
MAX
Description
max(type col)
- Function: this aggregate function returns the max value of numbers in a set. It is opposite to the min function. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of MAX are NULL, then the function returns NULL. When
GROUP BY
clause is specified in a query, then the value of eachGROUP BY
will be returned in a result. - Return type: same type as input parameters.
Example
mysql> select ss_ticket_number, max(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | max(`ss_sales_price`) |
+------------------+-----------------------+
| 28818 | 136.42 |
+------------------+-----------------------+
Keywords
max
MIN
Description
min(type col)
- Function: this aggregate function returns the min value of numbers in a set. It is opposite to the max function. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of MIN are NULL, then the function returns NULL. When
GROUP BY
clause is specified in a query, then the value of eachGROUP BY
will be returned in a result. - Return type: same type as input parameters.
Example
mysql> select ss_ticket_number, min(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | min(`ss_sales_price`) |
+------------------+-----------------------+
| 28818 | 0.91 |
+------------------+-----------------------+
Keywords
min
SUM
Description
sum(numeric col)
- Function: the aggregate function returns the sum of all the values in a set. The function has only 1 parameter, which can be a column of number type, and the value it returns is the function of numbers, or the calculation result is the expression of numbers. Rows containing NULL value will be ignored. If the table is empty or all the parameters of MIN are NULL, then the function returns NULL. When
GROUP BY
clause is specified in a query, then the value of each
GROUP BY` will be returned in a result. - Return type: if the parameter is integer, then return BIGINT, and if the parameter is floating point, return double.
Example
mysql> select ss_ticket_number, sum(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+-----------------------+
| ss_ticket_number | sum(`ss_sales_price`) |
+------------------+-----------------------+
| 28818 | 1971980.01 |
+------------------+-----------------------+
Keywords
sum
GROUP_CONCAT
Description
group_count(col[, separator])
- Function: the aggregate function will return a string, which is a new string by connecting all strings in a set. If the user specifies a separator, the separator is used to connect strings of two adjacent rows.
- Return type: string type
- Instructions for use: by default, this function returns a string that covers all result sets. When GROUP BY clause is specified in a query, then the value of each GROUP BY will be returned in a result.
Example
mysql> select * from tbl;
+------+------+
| k1 | v1 |
+------+------+
| 1 | a |
| 1 | b |
| 1 | c |
+------+------+
mysql> select k1, group_concat(v1) from tbl group by k1;
+------+--------------------+
| k1 | group_concat(`v1`) |
+------+--------------------+
| 1 | a, b, c |
+------+--------------------+
mysql> select k1, group_concat(v1, '|') from tbl group by k1;
+------+-------------------------+
| k1 | group_concat(`v1`, '|') |
+------+-------------------------+
| 1 | a|b|c |
+------+-------------------------+
Keywords
group_concat
VARIANCE,VAR_SAMP,VARIANCE_SAMP
Description
variance(numeric val)
var_samp(numeric val)
variance_samp(numeric val)
- Function: this kind of aggregate function returns the sampel Variance of a set of numbers. This is a mathematical attribute representing the distance between the value and the average value. It acts on numeric types.
variance
andvar_samp
are the aliases ofvariance_samp
. - Return type: double type
Example
mysql> select ss_ticket_number, variance(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+----------------------------+
| ss_ticket_number | variance(`ss_sales_price`) |
+------------------+----------------------------+
| 28818 | 1378.408511209 |
+------------------+----------------------------+
Keywords
variance, var_samp, variance_samp
VAR_POP,VARIANCE_POP
Description
var_pop(numeric val)
variance_pop(numeric val)
- Function: this kind of aggregate function returns the population Variance of a set of numbers.This is a mathematical attribute representing the distance between the value and the average value. It acts on numeric types.
var_pop
is the alias ofvariance_pop
. - Return type: double type
Example
mysql> select ss_ticket_number, variance_pop(ss_sales_price) from store_sales group by ss_ticket_number;
+------------------+--------------------------------+
| ss_ticket_number | variance_pop(`ss_sales_price`) |
+------------------+--------------------------------+
| 28818 | 1378.408511134 |
+------------------+--------------------------------+
Keywords
var_pop, variance_pop
PERCENTILE_APPROX
Description
percentile_approx(numeric val, double percentile, double compression)
- Function: this kind of aggregate function adopts T-Digest algorithm, and returns the approximate value of a set of specified percentile values. The
percentile
should be between 0 and 1. The accuracy of the results can be controlled bycompression
, and the value range is 2014 - 10000. The greater the value is, the higher the accuracy is, the greater the memory overhead and time consumption are. The value is 2048 by default. - Return type: double type
Example
mysql> select percentile_approx(query_time, 0.95), percentile_approx(query_time, 0.99) from tbl limit 10;
+---------------------------------------+---------------------------------------+
| percentile_approx(`query_time`, 0.95) | percentile_approx(`query_time`, 0.99) |
+---------------------------------------+---------------------------------------+
| 30.994913101196289 | 116.05957794189453 |
+---------------------------------------+---------------------------------------+
mysql> select `table`, percentile_approx(cost_time,0.99) from log_statis group by `table`;
+---------------------+---------------------------+
| table | percentile_approx(`cost_time`, 0.99) |
+----------+--------------------------------------+
| test | 54.22 |
+----------+--------------------------------------+
mysql> select `table`, percentile_approx(cost_time,0.99, 4096) from log_statis group by `table`;
+---------------------+-----------------------------------+
| table | percentile_approx(`cost_time`, 0.99, 4096.0) |
+----------+----------------------------------------------+
| test | 54.21 |
+----------+----------------------------------------------+
Keywords
percentile_approx, percentile
TOPN
Description
topn(expr, int top_num[, int space_expand_rate])
-
Function: the topn function uses the Space-Saving algorithm to calculate the top_num frequent items in expr, and the results are frequent items and their occurrence times, which are approximate values. The
space_expand_rate
parameter is optional, which is used to set the number of counter used in Space-Saving algorithm. The greater the value ofspace_expand_rate
is, the more accurate the result is. The default value is 50.counter numbers = top_num * space_expand_rate
- Return type: string of JSON format.
Example
mysql> select topn(time, 10) from tbl;
+--------------------------------------------------------------------------------------------------------------------------+
| topn(`time`, 10) |
+--------------------------------------------------------------------------------------------------------------------------+
| {"0":7894391,"1":3887461,"2":3792601,"6":3344590,"5":2394986,"7":1421491,"3":1046929,"29":982826,"30":674072,"4":640616} |
+--------------------------------------------------------------------------------------------------------------------------+
mysql> select topn(time, 10, 100) from tbl;
+--------------------------------------------------------------------------------------------------------------------------+
| topn(`time`, 10, 100) |
+--------------------------------------------------------------------------------------------------------------------------+
| {"0":7894592,"1":3887551,"2":3792700,"6":3344590,"5":2394986,"7":1421492,"3":1046977,"29":982826,"30":674072,"4":640625} |
+--------------------------------------------------------------------------------------------------------------------------+
Keywords
topn