# Aggregate functions
<a name="sql-functions-agg-spark"></a>

Aggregate functions in AWS Clean Rooms Spark SQL are used to perform calculations or operations on a group of rows and return a single value. They are essential for data analysis and summarization tasks.

AWS Clean Rooms Spark SQL supports the following aggregate functions:

**Topics**
+ [ANY\$1VALUE function](ANY_VALUE.md)
+ [APPROX COUNT\$1DISTINCT function](approx-count-distinct.md)
+ [APPROX PERCENTILE function](approx-percentile.md)
+ [AVG function](avg-function.md)
+ [BOOL\$1AND function](BOOL_AND.md)
+ [BOOL\$1OR function](BOOL_OR.md)
+ [CARDINALITY function](CARDINALITY.md)
+ [COLLECT\$1LIST function](COLLECT_LIST.md)
+ [COLLECT\$1SET function](COLLECT_SET.md)
+ [COUNT and COUNT DISTINCT functions](count-function.md)
+ [COUNT function](COUNT.md)
+ [MAX function](MAX.md)
+ [MEDIAN function](MEDIAN.md)
+ [MIN function](MIN.md)
+ [PERCENTILE function](percentile.md)
+ [SKEWNESS function](SKEWNESS.md)
+ [STDDEV\$1SAMP and STDDEV\$1POP functions](STDDEV_functions.md)
+ [SUM and SUM DISTINCT functions](sum-function.md)
+ [VAR\$1SAMP and VAR\$1POP functions](VARIANCE_functions.md)

# ANY\$1VALUE function
<a name="ANY_VALUE"></a>

The ANY\$1VALUE function returns any value from the input expression values nondeterministically. This function can return NULL if the input expression doesn't result in any rows being returned. 

## Syntax
<a name="ANY_VALUE-synopsis"></a>

```
ANY_VALUE (expression[, isIgnoreNull] )
```

## Arguments
<a name="ANY_VALUE-arguments"></a>

 *expression *   
The target column or expression on which the function operates. The *expression* is one of the following data types:

*isIgnoreNull*  
A boolean that determines if the function should return only non-null values.

## Returns
<a name="ANY_VALUE-returns"></a>

Returns the same data type as *expression*. 

## Usage notes
<a name="ANY_VALUE-usage-notes"></a>

If a statement that specifies the ANY\$1VALUE function for a column also includes a second column reference, the second column must appear in a GROUP BY clause or be included in an aggregate function. 

## Examples
<a name="ANY_VALUE-examples"></a>

The following example returns an instance of any `dateid` where the `eventname` is `Eagles`. 

```
select any_value(dateid) as dateid, eventname from event where eventname ='Eagles' group by eventname;
```

Following are the results.

```
dateid | eventname
-------+---------------
 1878  | Eagles
```

The following example returns an instance of any `dateid` where the `eventname` is `Eagles` or `Cold War Kids`. 

```
select any_value(dateid) as dateid, eventname from event where eventname in('Eagles', 'Cold War Kids') group by eventname;
```

Following are the results.

```
dateid | eventname
-------+---------------
 1922  | Cold War Kids
 1878  | Eagles
```

# APPROX COUNT\$1DISTINCT function
<a name="approx-count-distinct"></a>

APPROX COUNT\$1DISTINCT provides an efficient way to estimate the number of unique values in a column or dataset.

## Syntax
<a name="approx-count-distinct-syntax"></a>

```
approx_count_distinct(expr[, relativeSD])
```

## Arguments
<a name="approx-count-distinct-arguments"></a>

 *expr*   
The expression or column for which you want to estimate the number of unique values.  
It can be a single column, a complex expression, or a combination of columns. 

*relativeSD*  
An optional parameter that specifies the desired relative standard deviation of the estimate.  
It is a value between 0 and 1, representing the maximum acceptable relative error of the estimate. A smaller relativeSD value will result in a more accurate but slower estimation.   
If this parameter isn't provided, a default value (usually around 0.05 or 5%) is used.

## Returns
<a name="approx-count-distinct-returns"></a>

Returns the estimated cardinality by HyperLogLog\$1\$1. relativeSD defines the maximum relative standard deviation allowed.

## Example
<a name="approx-count-distinct-example"></a>

The following query estimates the number of unique values in the `col1` column, with a relative standard deviation of 1% (0.01).

```
SELECT approx_count_distinct(col1, 0.01)
```

The following query estimates that there are 3 unique values in the `col1` column (the values 1, 2, and 3).

```
SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1)
```

# APPROX PERCENTILE function
<a name="approx-percentile"></a>

APPROX PERCENTILE is used to estimate the percentile value of a given expression or column without having to sort the entire dataset. This function is useful in scenarios where you need to quickly understand the distribution of a large dataset or track percentile-based metrics, without the computational overhead of performing an exact percentile calculation. However, it's important to understand the trade-offs between speed and accuracy, and to choose the appropriate error tolerance based on the specific requirements of your use case.

## Syntax
<a name="approx-percentile-syntax"></a>

```
APPROX_PERCENTILE(expr, percentile [, accuracy])
```

## Arguments
<a name="approx-percentile-syntax.arguments"></a>

 *expr*   
The expression or column for which you want to estimate the percentile value.  
It can be a single column, a complex expression, or a combination of columns. 

*percentile*  
The percentile value you want to estimate, expressed as a value between 0 and 1.   
For example, 0.5 would correspond to the 50th percentile (median).

*accuracy*  
An optional parameter that specifies the desired accuracy of the percentile estimate. It is a value between 0 and 1, representing the maximum acceptable relative error of the estimate. A smaller `accuracy` value will result in a more precise but slower estimation. If this parameter isn't provided, a default value (usually around 0.05 or 5%) is used.

## Returns
<a name="approx-percentile-syntax.returns"></a>

Returns the approximate percentile of the numeric or ANSI interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. 

The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. 

Higher value of accuracy yields better accuracy, `1.0/accuracy` is the relative error of the approximation. 

When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

## Examples
<a name="approx-percentile-syntax-example"></a>

The following query estimates the 95th percentile of the `response_time` column, with a maximum relative error of 1% (0.01).

```
SELECT APPROX_PERCENTILE(response_time, 0.95, 0.01) AS p95_response_time
FROM my_table;
```

The following query estimates the 50th, 40th, and 10th percentile values of the `col` column in the `tab` table.

```
SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col)
```

The following query estimates the 50th percentile (median) of the values in the col column.

```
SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col)
```

# AVG function
<a name="avg-function"></a>

The AVG function returns the average (arithmetic mean) of the input expression values. The AVG function works with numeric values and ignores NULL values.

## Syntax
<a name="avg-function-syntax"></a>

```
AVG (column)
```

## Arguments
<a name="avg-function-arguments"></a>

*column*  
The target column that the function operates on. The column is one of the following data types:  
+ SMALLINT
+ INTEGER
+ BIGINT
+ DECIMAL
+ DOUBLE
+ FLOAT

## Data types
<a name="avg-function-data-types"></a>

The argument types supported by the AVG function are SMALLINT, INTEGER, BIGINT, DECIMAL, and DOUBLE.

The return types supported by the AVG function are:
+ BIGINT for any integer type argument
+ DOUBLE for a floating point argument
+ Returns the same data type as expression for any other argument type

The default precision for an AVG function result with a DECIMAL argument is 38. The scale of the result is the same as the scale of the argument. For example, an AVG of a DEC(5,2) column returns a DEC(38,2) data type.

## Example
<a name="avg-function-example"></a>

Find the average quantity sold per transaction from the SALES table.

```
select avg(qtysold) from sales;
```

# BOOL\$1AND function
<a name="BOOL_AND"></a>

The BOOL\$1AND function operates on a single Boolean or integer column or expression. This function applies similar logic to the BIT\$1AND and BIT\$1OR functions. For this function, the return type is a Boolean value (`true` or `false`).

If all values in a set are true, the BOOL\$1AND function returns `true` (`t`). If any value is false, the function returns `false` (`f`).

## Syntax
<a name="BOOL_AND-synopsis"></a>

```
BOOL_AND ( [DISTINCT | ALL] expression )
```

## Arguments
<a name="BOOL_AND-arguments"></a>

 *expression *   
The target column or expression that the function operates on. This expression must have a BOOLEAN or integer data type. The return type of the function is BOOLEAN.

DISTINCT \$1 ALL  
With the argument DISTINCT, the function eliminates all duplicate values for the specified expression before calculating the result. With the argument ALL, the function retains all duplicate values. ALL is the default. 

## Examples
<a name="bool_and_example"></a>

You can use the Boolean functions against either Boolean expressions or integer expressions. 

For example, the following query return results from the standard USERS table in the TICKIT database, which has several Boolean columns.

The BOOL\$1AND function returns `false` for all five rows. Not all users in each of those states likes sports.

```
select state, bool_and(likesports) from users 
group by state order by state limit 5;

state | bool_and
------+---------
AB    | f
AK    | f
AL    | f
AZ    | f
BC    | f
(5 rows)
```

# BOOL\$1OR function
<a name="BOOL_OR"></a>

The BOOL\$1OR function operates on a single Boolean or integer column or expression. This function applies similar logic to the BIT\$1AND and BIT\$1OR functions. For this function, the return type is a Boolean value (`true`, `false`, or `NULL`).

If a value in a set is `true`, the BOOL\$1OR function returns `true` (`t`). If a value in a set is `false`, the function returns `false` (`f`). NULL can be returned if the value is unknown.

## Syntax
<a name="BOOL_OR-synopsis"></a>

```
BOOL_OR ( [DISTINCT | ALL] expression )
```

## Arguments
<a name="BOOL_OR-arguments"></a>

 *expression *   
The target column or expression that the function operates on. This expression must have a BOOLEAN or integer data type. The return type of the function is BOOLEAN.

DISTINCT \$1 ALL  
With the argument DISTINCT, the function eliminates all duplicate values for the specified expression before calculating the result. With the argument ALL, the function retains all duplicate values. ALL is the default. 

## Examples
<a name="bool_or_example"></a>

You can use the Boolean functions with either Boolean expressions or integer expressions. For example, the following query return results from the standard USERS table in the TICKIT database, which has several Boolean columns.

The BOOL\$1OR function returns `true` for all five rows. At least one user in each of those states likes sports.

```
select state, bool_or(likesports) from users 
group by state order by state limit 5;

state | bool_or 
------+--------
AB    | t      
AK    | t      
AL    | t       
AZ    | t       
BC    | t       
(5 rows)
```

The following example returns NULL.

```
SELECT BOOL_OR(NULL = '123')
               bool_or
------                  
NULL
```

# CARDINALITY function
<a name="CARDINALITY"></a>

The CARDINALITY function returns the size of an ARRAY or MAP expression (*expr*).

This function is useful to find the size or length of an array.

## Syntax
<a name="CARDINALITY-syntax"></a>

```
cardinality(expr)
```

## Arguments
<a name="CARDINALITY-arguments"></a>

 *expr*   
An ARRAY or MAP expression.

## Returns
<a name="CARDINALITY-returns"></a>

Returns the size of an array or a map (INTEGER). 

The function returns `NULL` for null input if `sizeOfNull` is set to `false` or `enabled` is set to `true`. 

Otherwise, the function returns `-1` for null input. With the default settings, the function returns `-1` for null input.

## Example
<a name="CARDINALITY-example"></a>

The following query calculates the cardinality, or the number of elements, in the given array. The array (`'b', 'd', 'c', 'a'`) has 4 elements, so the output of this query would be `4`.

```
SELECT cardinality(array('b', 'd', 'c', 'a'));
 4
```

# COLLECT\$1LIST function
<a name="COLLECT_LIST"></a>

The COLLECT\$1LIST function collects and returns a list of non-unique elements. 

This type of function is useful when you want to collect multiple values from a set of rows into a single array or list data structure.

**Note**  
The function is non-deterministic because the order of the collected results depends on the order of the rows, which may be non-deterministic after a shuffle operation is performed.

## Syntax
<a name="COLLECT_LIST-syntax"></a>

```
collect_list(expr)
```

## Arguments
<a name="COLLECT_LIST-arguments"></a>

 *expr*   
An expression of any type.

## Returns
<a name="COLLECT_LIST-returns"></a>

Returns an ARRAY of the argument type. The order of elements in the array is non-deterministic. 

NULL values are excluded.

If DISTINCT is specified, the function collects only unique values and is a synonym for `collect_set `aggregate function.

## Example
<a name="COLLECT_LIST-example"></a>

The following query collects all the values from the col column into a list. The `VALUES` clause is used to create an inline table with three rows, where each row has a single column col with the values 1, 2, and 1 respectively. The `collect_list() `function is then used to aggregate all the values from the col column into a single array. The output of this SQL statement would be the array `[1,2,1]`, which contains all the values from the col column in the order they appeared in the input data.

```
SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]
```

# COLLECT\$1SET function
<a name="COLLECT_SET"></a>

The COLLECT\$1SET function collects and returns a set of unique elements. 

This function is useful when you want to collect all the distinct values from a set of rows into a single data structure, without including any duplicates.

**Note**  
The function is non-deterministic because the order of the collected results depends on the order of the rows, which may be non-deterministic after a shuffle operation is performed.

## Syntax
<a name="COLLECT_SET-syntax"></a>

```
collect_set(expr)
```

## Arguments
<a name="COLLECT_SET-arguments"></a>

 *expr*   
An expression of any type except MAP.

## Returns
<a name="COLLECT_SET-returns"></a>

Returns an ARRAY of the argument type. The order of elements in the array is non-deterministic. 

NULL values are excluded.

## Example
<a name="COLLECT_SET-example"></a>

The following query collects all the unique values from the col column into a set. The `VALUES` clause is used to create an inline table with three rows, where each row has a single column col with the values 1, 2, and 1 respectively. The `collect_set()` function is then used to aggregate all the unique values from the col column into a single set. The output of this SQL statement would be the set `[1,2]`, which contains the unique values from the col column. The duplicate value of 1 is only included once in the result.

```
SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2]
```

# COUNT and COUNT DISTINCT functions
<a name="count-function"></a>

The COUNT function counts the rows defined by the expression. The COUNT DISTINCT function computes the number of distinct non-NULL values in a column or expression. It eliminates all duplicate values from the specified expression before doing the count.

## Syntax
<a name="count-function-syntax"></a>

```
COUNT (DISTINCT column)
```

## Arguments
<a name="count-function-arguments"></a>

*column*  
The target column that the function operates on.

## Data types
<a name="count-function-data-types"></a>

The COUNT function and the COUNT DISTINCT function supports all argument data types.

The COUNT DISTINCT function returns BIGINT.

## Examples
<a name="count-function-examples"></a>

Count all of the users from the state of Florida.

```
select count (identifier) from users where state='FL';
```

Count all of the unique venue IDs from the EVENT table.

```
select count (distinct venueid) as venues from event;
```

# COUNT function
<a name="COUNT"></a>

 The COUNT function counts the rows defined by the expression.

The COUNT function has the following variations.
+ COUNT ( \$1 ) counts all the rows in the target table whether they include nulls or not.
+ COUNT ( *expression* ) computes the number of rows with non-NULL values in a specific column or expression.
+ COUNT ( DISTINCT *expression* ) computes the number of distinct non-NULL values in a column or expression.

## Syntax
<a name="COUNT-synopsis"></a>

```
COUNT( * | expression )
```

```
COUNT ( [ DISTINCT | ALL ] expression )
```

## Arguments
<a name="COUNT-arguments"></a>

 *expression *   
The target column or expression that the function operates on. The COUNT function supports all argument data types.

DISTINCT \$1 ALL  
With the argument DISTINCT, the function eliminates all duplicate values from the specified expression before doing the count. With the argument ALL, the function retains all duplicate values from the expression for counting. ALL is the default.

## Return type
<a name="Supported_data_types_count"></a>

The COUNT function returns BIGINT.

## Examples
<a name="COUNT-examples"></a>

Count all of the users from the state of Florida:

```
select count(*) from users where state='FL';

count
-------
510
```

Count all of the event names from the EVENT table:

```
select count(eventname) from event;

count
-------
8798
```

Count all of the event names from the EVENT table:

```
select count(all eventname) from event;

count
-------
8798
```

Count all of the unique venue IDs from the EVENT table:

```
select count(distinct venueid) as venues from event;

venues
--------
204
```

Count the number of times each seller listed batches of more than four tickets for sale. Group the results by seller ID:

```
select count(*), sellerid from listing 
where numtickets > 4
group by sellerid
order by 1 desc, 2;

count | sellerid
------+----------
12    |    6386
11    |    17304
11    |    20123
11    |    25428
...
```

# MAX function
<a name="MAX"></a>

 The MAX function returns the maximum value in a set of rows. DISTINCT or ALL might be used but do not affect the result. 

## Syntax
<a name="MAX-synopsis"></a>

```
MAX ( [ DISTINCT | ALL ] expression )
```

## Arguments
<a name="MAX-arguments"></a>

 *expression *   
The target column or expression that the function operates on. The *expression* is any numerical data type.

DISTINCT \$1 ALL   
With the argument DISTINCT, the function eliminates all duplicate values from the specified expression before calculating the maximum. With the argument ALL, the function retains all duplicate values from the expression for calculating the maximum. ALL is the default. 

## Data types
<a name="Supported_data_types_max"></a>

Returns the same data type as *expression*. 

## Examples
<a name="MAX-examples"></a>

Find the highest price paid from all sales: 

```
select max(pricepaid) from sales;

max
----------
12624.00
(1 row)
```

Find the highest price paid per ticket from all sales: 

```
select max(pricepaid/qtysold) as max_ticket_price
from sales;

max_ticket_price
-----------------
2500.00000000
(1 row)
```

# MEDIAN function
<a name="MEDIAN"></a>

## Syntax
<a name="MEDIAN-synopsis"></a>

```
MEDIAN ( median_expression )
```

## Arguments
<a name="MEDIAN-arguments"></a>

 *median\$1expression*   
The target column or expression that the function operates on.

# MIN function
<a name="MIN"></a>

 The MIN function returns the minimum value in a set of rows. DISTINCT or ALL might be used but do not affect the result.

## Syntax
<a name="MIN-synopsis"></a>

```
MIN ( [ DISTINCT | ALL ] expression )
```

## Arguments
<a name="MIN-arguments"></a>

 *expression *   
The target column or expression that the function operates on. The *expression* is any numerical data type.

DISTINCT \$1 ALL  
With the argument DISTINCT, the function eliminates all duplicate values from the specified expression before calculating the minimum. With the argument ALL, the function retains all duplicate values from the expression for calculating the minimum. ALL is the default.

## Data types
<a name="Supported_data_types_min"></a>

 Returns the same data type as *expression*. 

## Examples
<a name="MIN-examples"></a>

Find the lowest price paid from all sales:

```
select min(pricepaid) from sales;

min
-------
20.00
(1 row)
```

Find the lowest price paid per ticket from all sales:

```
select min(pricepaid/qtysold)as min_ticket_price
from sales;

min_ticket_price
------------------
20.00000000
(1 row)
```

# PERCENTILE function
<a name="percentile"></a>

The PERCENTILE function is used to calculates the exact percentile value by first sorting the values in the `col` column and then finding the value at the specified `percentage`.

The PERCENTILE function is useful when you need to calculate the exact percentile value and the computational cost is acceptable for your use case. It provides more accurate results than the APPROX\$1PERCENTILE function, but may be slower, especially for large datasets. 

In contrast, the APPROX\$1PERCENTILE function is a more efficient alternative that can provide an estimate of the percentile value with a specified error tolerance, making it more suitable for scenarios where speed is a higher priority than absolute precision.

## Syntax
<a name="percentile-syntax"></a>

```
percentile(col, percentage [, frequency])
```

## Arguments
<a name="percentile-arguments"></a>

 *col*   
The expression or column for which you want to calculate the percentile value.

*percentage*  
The percentile value you want to calculate, expressed as a value between 0 and 1.   
For example, 0.5 would correspond to the 50th percentile (median).

*frequency *  
An optional parameter that specifies the frequency or weight of each value in the `col` column. If provided, the function will calculate the percentile based on the frequency of each value.

## Returns
<a name="percentile-returns"></a>

Returns the exact percentile value of numeric or ANSI interval column col at the given percentage. 

The value of percentage must be between 0.0 and 1.0. 

The value of frequency should be positive integral

## Example
<a name="percentile-example"></a>

The following query finds the value that is greater than or equal to 30% of the values in the `col` column. Since the values are 0 and 10, the 30th percentile is 3.0, because it is the value that is greater than or equal to 30% of the data.

```
SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col);
 3.0
```

# SKEWNESS function
<a name="SKEWNESS"></a>

The SKEWNESS function returns the skewness value calculated from values of a group. 

Skewness is a statistical measure that describes the asymmetry or lack of symmetry in a dataset. It provides information about the shape of the data distribution.

This function can be useful in understanding the statistical properties of a dataset and informing further analysis or decision-making.

## Syntax
<a name="SKEWNESS-syntax"></a>

```
skewness(expr)
```

## Arguments
<a name="SKEWNESS-arguments"></a>

 *expr*   
An expression that evaluates to a numeric.

## Returns
<a name="SKEWNESS-returns"></a>

Returns DOUBLE. 

If DISTINCT is specified, the function operates only on a unique set of *expr* values. 

## Examples
<a name="SKEWNESS-examples"></a>

The following query calculates the skewness of the values in the `col` column. In this example, the `VALUES` clause is used to create an inline table with four rows, where each row has a single column `col` with the values -10, -20, 100, and 1000. The `skewness()` function is then used to calculate the skewness of the values in the `col` column. The result, 1.1135657469022011, represents the degree and direction of skewness in the data. A positive skewness value indicates that the data is skewed to the right, with the bulk of the values concentrated on the left side of the distribution. A negative skewness value indicates that the data is skewed to the left, with the bulk of the values concentrated on the right side of the distribution.

```
SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
 1.1135657469022011
```

The following query calculates the skewness of the values in the col column. Similar to the previous example, the `VALUES` clause is used to create an inline table with four rows, where each row has a single column `col` with the values -1000, -100, 10, and 20. The `skewness()` function is then used to calculate the skewness of the values in the `col` column. The result, -1.1135657469022011, represents the degree and direction of skewness in the data. In this case, the negative skewness value indicates that the data is skewed to the left, with the bulk of the values concentrated on the right side of the distribution.

```
SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col);
 -1.1135657469022011
```

# STDDEV\$1SAMP and STDDEV\$1POP functions
<a name="STDDEV_functions"></a>

 The STDDEV\$1SAMP and STDDEV\$1POP functions return the sample and population standard deviation of a set of numeric values (integer, decimal, or floating-point). The result of the STDDEV\$1SAMP function is equivalent to the square root of the sample variance of the same set of values. 

STDDEV\$1SAMP and STDDEV are synonyms for the same function. 

## Syntax
<a name="STDDEV_functions-syntax"></a>

```
STDDEV_SAMP | STDDEV ( [ DISTINCT | ALL ] expression) STDDEV_POP ( [ DISTINCT | ALL ] expression)
```

The expression must have numeric data type. Regardless of the data type of the expression, the return type of this function is a double precision number. 

**Note**  
Standard deviation is calculated using floating point arithmetic, which might result in slight imprecision.

## Usage notes
<a name="STDDEV_usage_notes"></a>

When the sample standard deviation (STDDEV or STDDEV\$1SAMP) is calculated for an expression that consists of a single value, the result of the function is NULL not 0. 

## Examples
<a name="STDDEV_functions-examples"></a>

The following query returns the average of the values in the VENUESEATS column of the VENUE table, followed by the sample standard deviation and population standard deviation of the same set of values. VENUESEATS is an INTEGER column. The scale of the result is reduced to 2 digits. 

```
select avg(venueseats),
cast(stddev_samp(venueseats) as dec(14,2)) stddevsamp,
cast(stddev_pop(venueseats) as dec(14,2)) stddevpop
from venue;

avg  | stddevsamp | stddevpop
-------+------------+-----------
17503 |   27847.76 |  27773.20
(1 row)
```

The following query returns the sample standard deviation for the COMMISSION column in the SALES table. COMMISSION is a DECIMAL column. The scale of the result is reduced to 10 digits. 

```
select cast(stddev(commission) as dec(18,10))
from sales;

stddev
----------------
130.3912659086
(1 row)
```

The following query casts the sample standard deviation for the COMMISSION column as an integer. 

```
select cast(stddev(commission) as integer)
from sales;

stddev
--------
130
(1 row)
```

The following query returns both the sample standard deviation and the square root of the sample variance for the COMMISSION column. The results of these calculations are the same. 

```
select
cast(stddev_samp(commission) as dec(18,10)) stddevsamp,
cast(sqrt(var_samp(commission)) as dec(18,10)) sqrtvarsamp
from sales;

stddevsamp   |  sqrtvarsamp
----------------+----------------
130.3912659086 | 130.3912659086
(1 row)
```

# SUM and SUM DISTINCT functions
<a name="sum-function"></a>

The SUM function returns the sum of the input column or expression values. The SUM function works with numeric values and ignores NULL values. 

The SUM DISTINCT function eliminates all duplicate values from the specified expression before calculating the sum.

## Syntax
<a name="sum-function-syntax"></a>

```
SUM (DISTINCT column )
```

## Arguments
<a name="sum-function-arguments"></a>

*column*  
The target column that the function operates on. The column is any numeric data types.

## Examples
<a name="sum-function-examples"></a>

Find the sum of all commissions paid from the SALES table.

```
select sum(commission) from sales
```

Find the sum of all distinct commissions paid from the SALES table.

```
select sum (distinct (commission)) from sales
```

# VAR\$1SAMP and VAR\$1POP functions
<a name="VARIANCE_functions"></a>

 The VAR\$1SAMP and VAR\$1POP functions return the sample and population variance of a set of numeric values (integer, decimal, or floating-point). The result of the VAR\$1SAMP function is equivalent to the squared sample standard deviation of the same set of values. 

VAR\$1SAMP and VARIANCE are synonyms for the same function. 

## Syntax
<a name="VARIANCE_functions-syntax"></a>

```
VAR_SAMP | VARIANCE ( [ DISTINCT | ALL ] expression)
VAR_POP ( [ DISTINCT | ALL ] expression)
```

The expression must have an integer, decimal, or floating-point data type. Regardless of the data type of the expression, the return type of this function is a double precision number. 

**Note**  
The results of these functions might vary across data warehouse clusters, depending on the configuration of the cluster in each case. 

## Usage notes
<a name="VARIANCE_usage_notes"></a>

When the sample variance (VARIANCE or VAR\$1SAMP) is calculated for an expression that consists of a single value, the result of the function is NULL not 0. 

## Examples
<a name="VARIANCE_functions-examples"></a>

The following query returns the rounded sample and population variance of the NUMTICKETS column in the LISTING table. 

```
select avg(numtickets),
round(var_samp(numtickets)) varsamp,
round(var_pop(numtickets)) varpop
from listing;

avg | varsamp | varpop
-----+---------+--------
10 |      54 |     54
(1 row)
```

The following query runs the same calculations but casts the results to decimal values. 

```
select avg(numtickets),
cast(var_samp(numtickets) as dec(10,4)) varsamp,
cast(var_pop(numtickets) as dec(10,4)) varpop
from listing;

avg | varsamp | varpop
-----+---------+---------
10 | 53.6291 | 53.6288
(1 row)
```