

# Aggregate Functions
<a name="sql-reference-aggregate-functions"></a>

Instead of returning a result calculated from a single row, an aggregate function returns a result calculated from aggregated data contained in a finite set of rows, or from information about a finite set of rows. An aggregate function may appear in any of the following:
+ \<selection list\> portion of a [SELECT clause](sql-reference-select-clause.md)
+ [ORDER BY clause](sql-reference-order-by-clause.md)
+ [HAVING clause](sql-reference-having-clause.md)

An aggregate function differs from [Analytic Functions](sql-reference-analytic-functions.md), which are always evaluated relative to a window that must be specified, and which therefore can't appear in a HAVING clause. Other differences are described in the table later in this topic.

Aggregate functions operate slightly differently in aggregate queries on tables than in aggregate queries on streams. If an aggregate query on tables contains a GROUP BY clause, the aggregate function returns one result per group in the set of input rows. Omitting the GROUP BY clause is equivalent to GROUP BY (), which returns a single result for the entire set of input rows.

On streams, an aggregate query must contain an explicit GROUP BY clause on a monotonic expression based on rowtime. Without one, the sole group is the whole stream, which never ends, preventing any result from being reported. Adding a GROUP BY clause based on a monotonic expression breaks the stream into finite sets of rows, contiguous in time, and each such set can then be aggregated and reported.

Whenever a row arrives that changes the value of the monotonic grouping expression, a new group is started and the previous group is considered complete. Then, the Amazon Kinesis Data Analytics application outputs the value of the aggregate functions. Note that the GROUP BY clause may also include other non-monotonic expressions, in which case more than one result per set of rows may be produced.
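As a sketch, a streaming aggregate query with a monotonic grouping expression plus an additional non-monotonic column might look like the following. (The stream name ORDERSTREAM and its columns are hypothetical, used only for illustration.)

```
-- Hypothetical stream ORDERSTREAM with columns ROWTIME, CITY, AMOUNT.
-- FLOOR(ROWTIME TO HOUR) is monotonic, so each hour's rows form a finite group.
-- Because CITY is also in the GROUP BY, each completed hour can produce
-- more than one result row: one per city seen during that hour.
SELECT STREAM
    FLOOR(ORDERSTREAM.ROWTIME TO HOUR) AS FLOOR_HOUR,
    CITY,
    SUM(AMOUNT) AS HOURLY_AMOUNT
FROM ORDERSTREAM
GROUP BY FLOOR(ORDERSTREAM.ROWTIME TO HOUR), CITY;
```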

Performing an aggregate query on streams is often referred to as streaming aggregation, as distinct from the windowed aggregation discussed in [Analytic Functions](sql-reference-analytic-functions.md) and [Windowed Aggregation on Streams](sql-reference-windowed-aggregation-stream.md). For more information about stream-to-stream joins, see [JOIN clause](sql-reference-join-clause.md).

If an input row contains a `null` in a column used as an input to an aggregate function, the aggregate function ignores the row (except for COUNT).


**Differences Between Aggregate and Analytic Functions**  

| Function Type | Outputs | Rows or Windows Used | Notes | 
| --- | --- | --- | --- | 
|  Aggregate Functions  |  One output row per group of input rows.  |  All output columns are calculated over the same window or same group of rows.  |  COUNT DISTINCT is not allowed in streaming aggregation. Statements of the following type are not allowed: SELECT COUNT(DISTINCT x) ... FROM ... GROUP BY ...  | 
|  [Analytic Functions](sql-reference-analytic-functions.md)  |  One output row for each input row.  |  Each output column may be calculated using a different window or partition.  |  COUNT DISTINCT can't be used as [Analytic Functions](sql-reference-analytic-functions.md) or in windowed aggregation.  | 

## Streaming Aggregation and Rowtime Bounds
<a name="w2aac22b7c21"></a>

Normally, an aggregate query generates a result when a row arrives that changes the value of the monotonic expression in the GROUP BY. For example, if the query is grouped by FLOOR(rowtime TO MINUTE), and the rowtime of the current row is 9:59.30, then a new row with a rowtime of 10:00.00 will trigger the result. 

Alternately, a rowtime bound can be used to advance the monotonic expression and enable the query to return a result. For example, if the query is grouped by FLOOR(rowtime TO MINUTE), and the rowtime of the current row is 9:59.30, then an incoming rowtime bound of 10:00.00 causes the query to return a result. 

## Aggregate Function List
<a name="w2aac22b7c23"></a>

Amazon Kinesis Data Analytics supports the following aggregate functions:
+ [AVG](sql-reference-avg.md)
+ [COUNT](sql-reference-count.md)
+ [COUNT\_DISTINCT\_ITEMS\_TUMBLING Function](count-distinct-items.md)
+ [EXP\_AVG](sql-reference-exp-avg.md)
+ [FIRST\_VALUE](sql-reference-first-value.md)
+ [LAST\_VALUE](sql-reference-last-value.md)
+ [MAX](sql-reference-max.md)
+ [MIN](sql-reference-min.md)
+ [SUM](sql-reference-sum.md)
+ [TOP\_K\_ITEMS\_TUMBLING Function](top-k.md)



The following SQL uses the AVG aggregate function as part of a query to find the average age of all employees:

```
SELECT
    AVG(AGE) AS AVERAGE_AGE
FROM SALES.EMPS;
```

Result:


| AVERAGE\_AGE | 
| --- | 
|  38  | 



To find the average age of employees in each department, we can add an explicit GROUP BY clause to the query:

```
SELECT
    DEPTNO,
    AVG(AGE) AS AVERAGE_AGE
FROM SALES.EMPS
GROUP BY DEPTNO;
```

Returns:


| DEPTNO | AVERAGE\_AGE | 
| --- | --- | 
|  10  |  30  | 
|  20  |  25  | 
|  30  |  40  | 
|  40  |  57  | 

## Examples of Aggregate Queries on Streams (Streaming Aggregation)
<a name="w2aac22b7c25"></a>

For this example, assume that the data in the following table is flowing through the stream called WEATHERSTREAM.


| ROWTIME | CITY | TEMP | 
| --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  Denver  |  29  | 
|  2018-11-01 01:00:00.0  |  Anchorage  |  2  | 
|  2018-11-01 06:00:00.0  |  Miami  |  65  | 
|  2018-11-01 07:00:00.0  |  Denver  |  32  | 
|  2018-11-01 09:00:00.0  |  Anchorage  |  9  | 
|  2018-11-01 13:00:00.0  |  Denver  |  50  | 
|  2018-11-01 17:00:00.0  |  Anchorage  |  10  | 
|  2018-11-01 18:00:00.0  |  Miami  |  71  | 
|  2018-11-01 19:00:00.0  |  Denver  |  43  | 
|  2018-11-02 01:00:00.0  |  Anchorage  |  4  | 
|  2018-11-02 01:00:00.0  |  Denver  |  39  | 
|  2018-11-02 07:00:00.0  |  Denver  |  46  | 
|  2018-11-02 09:00:00.0  |  Anchorage  |  3  | 
|  2018-11-02 13:00:00.0  |  Denver  |  56  | 
|  2018-11-02 17:00:00.0  |  Anchorage  |  2  | 
|  2018-11-02 19:00:00.0  |  Denver  |  50  | 
|  2018-11-03 01:00:00.0  |  Denver  |  36  | 
|  2018-11-03 01:00:00.0  |  Anchorage  |  1  | 



Suppose that you want to find the minimum and maximum temperature recorded anywhere each day, regardless of city. You can calculate these values using the aggregate functions MIN and MAX, respectively. To get this information on a per-day basis (and to provide a monotonic expression as the argument of the GROUP BY clause), use the FLOOR function to round each row's rowtime down to the nearest day: 

```
SELECT STREAM
    FLOOR(WEATHERSTREAM.ROWTIME TO DAY) AS FLOOR_DAY,
    MIN(TEMP) AS MIN_TEMP,
    MAX(TEMP) AS MAX_TEMP
FROM WEATHERSTREAM
GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO DAY);
```

The result of the aggregate query is shown in the following table.


| FLOOR\_DAY | MIN\_TEMP | MAX\_TEMP | 
| --- | --- | --- | 
|  2018-11-01 00:00:00.0  |  2  |  71  | 
|  2018-11-02 00:00:00.0  |  2  |  56  | 



There is no row for 2018-11-03, even though the example data does include temperature measurements for that day. The rows for 2018-11-03 can't be aggregated until all rows for that day are known to have arrived, which happens only when a row or a rowtime bound with a rowtime of 2018-11-04 00:00:00.0 (or later) arrives. When either arrives, the next result is as described in the following table.


| FLOOR\_DAY | MIN\_TEMP | MAX\_TEMP | 
| --- | --- | --- | 
|  2018-11-03 00:00:00.0  |  1  |  36  | 

Let's say that instead of finding the global minimum and maximum temperatures each day, we want to find the minimum, maximum, and average temperature for each city each day. To do this, we use the SUM and COUNT aggregate functions to compute the average, and add CITY to the GROUP BY clause, as shown following:

```
SELECT STREAM
    FLOOR(WEATHERSTREAM.ROWTIME TO DAY) AS FLOOR_DAY,
    CITY,
    MIN(TEMP) AS MIN_TEMP,
    MAX(TEMP) AS MAX_TEMP,
    SUM(TEMP)/COUNT(TEMP) AS AVG_TEMP
FROM WEATHERSTREAM
GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO DAY), CITY;
```

The result of the aggregate query is shown in the following table.


| FLOOR\_DAY | CITY | MIN\_TEMP | MAX\_TEMP | AVG\_TEMP | 
| --- | --- | --- | --- | --- | 
|  2018-11-01 00:00:00.0  |  Anchorage  |  2  |  10  |  7  | 
|  2018-11-01 00:00:00.0  |  Denver  |  29  |  50  |  38  | 
|  2018-11-01 00:00:00.0  |  Miami  |  65  |  71  |  68  | 
|  2018-11-02 00:00:00.0  |  Anchorage  |  2  |  4  |  3  | 
|  2018-11-02 00:00:00.0  |  Denver  |  39  |  56  |  47  | 



In this case, the arrival of rows for a new day's temperature measurements triggers the aggregation of the previous day's data, grouped by CITY, which then results in one row being produced per city included in the day's measurements.

Here again, a rowtime bound of 2018-11-04 00:00:00.0 could be used to prompt a result for 2018-11-03 before any actual measurements for 2018-11-04 arrive. The resulting rows are shown in the following table.


| FLOOR\_DAY | CITY | MIN\_TEMP | MAX\_TEMP | AVG\_TEMP | 
| --- | --- | --- | --- | --- | 
|  2018-11-03 00:00:00.0  |  Anchorage  |  1  |  1  |  1  | 
|  2018-11-03 00:00:00.0  |  Denver  |  36  |  36  |  36  | 

# Windowed Aggregation on Streams
<a name="sql-reference-windowed-aggregation-stream"></a>

To illustrate how windowed aggregation works on Amazon Kinesis data streams, assume that the data in the following table is flowing through a stream called WEATHERSTREAM.


| ROWTIME | CITY | TEMP | 
| --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  Denver  |  29  | 
|  2018-11-01 01:00:00.0  |  Anchorage  |  2  | 
|  2018-11-01 06:00:00.0  |  Miami  |  65  | 
|  2018-11-01 07:00:00.0  |  Denver  |  32  | 
|  2018-11-01 09:00:00.0  |  Anchorage  |  9  | 
|  2018-11-01 13:00:00.0  |  Denver  |  50  | 
|  2018-11-01 17:00:00.0  |  Anchorage  |  10  | 
|  2018-11-01 18:00:00.0  |  Miami  |  71  | 
|  2018-11-01 19:00:00.0  |  Denver  |  43  | 
|  2018-11-02 01:00:00.0  |  Anchorage  |  4  | 
|  2018-11-02 01:00:00.0  |  Denver  |  39  | 
|  2018-11-02 07:00:00.0  |  Denver  |  46  | 
|  2018-11-02 09:00:00.0  |  Anchorage  |  3  | 
|  2018-11-02 13:00:00.0  |  Denver  |  56  | 
|  2018-11-02 17:00:00.0  |  Anchorage  |  2  | 
|  2018-11-02 19:00:00.0  |  Denver  |  50  | 
|  2018-11-03 01:00:00.0  |  Denver  |  36  | 
|  2018-11-03 01:00:00.0  |  Anchorage  |  1  | 

Suppose that you want to find the minimum and maximum temperature recorded in the 24-hour period prior to any given reading, globally, regardless of city. To do this, you define a window of `RANGE INTERVAL '1' DAY PRECEDING`, and use it in the `OVER` clause for the `MIN` and `MAX` analytic functions:

```
 SELECT STREAM
        ROWTIME,
        MIN(TEMP) OVER W1 AS WMIN_TEMP,
        MAX(TEMP) OVER W1 AS WMAX_TEMP
 FROM WEATHERSTREAM
 WINDOW W1 AS (
    RANGE INTERVAL '1' DAY PRECEDING
);
```

## Results
<a name="sqlrf_windowed_aggregation-results"></a>


| ROWTIME | WMIN\_TEMP | WMAX\_TEMP | 
| --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  29  |  29  | 
|  2018-11-01 01:00:00.0  |  2  |  29  | 
|  2018-11-01 06:00:00.0  |  2  |  65  | 
|  2018-11-01 07:00:00.0  |  2  |  65  | 
|  2018-11-01 09:00:00.0  |  2  |  65  | 
|  2018-11-01 13:00:00.0  |  2  |  65  | 
|  2018-11-01 17:00:00.0  |  2  |  65  | 
|  2018-11-01 18:00:00.0  |  2  |  71  | 
|  2018-11-01 19:00:00.0  |  2  |  71  | 
|  2018-11-02 01:00:00.0  |  2  |  71  | 
|  2018-11-02 01:00:00.0  |  2  |  71  | 
|  2018-11-02 07:00:00.0  |  4  |  71  | 
|  2018-11-02 09:00:00.0  |  3  |  71  | 
|  2018-11-02 13:00:00.0  |  3  |  71  | 
|  2018-11-02 17:00:00.0  |  2  |  71  | 
|  2018-11-02 19:00:00.0  |  2  |  56  | 
|  2018-11-03 01:00:00.0  |  2  |  56  | 
|  2018-11-03 01:00:00.0  |  1  |  56  | 



Now, assume that you want to find the minimum, maximum, and average temperature recorded in the 24-hour period prior to any given reading, broken down by city. To do this, you add a `PARTITION BY` clause on `CITY` to the window specification, and add the `AVG` analytic function over the same window to the selection list:

```
 SELECT STREAM
        ROWTIME,
        CITY,
        MIN(TEMP) OVER W1 AS WMIN_TEMP,
        MAX(TEMP) OVER W1 AS WMAX_TEMP,
        AVG(TEMP) OVER W1 AS WAVG_TEMP
 FROM WEATHERSTREAM
 WINDOW W1 AS (
        PARTITION BY CITY
        RANGE INTERVAL '1' DAY PRECEDING
 );
```

### Results
<a name="sqlrf_windowed_aggregation_results"></a>


| ROWTIME | CITY | WMIN\_TEMP | WMAX\_TEMP | WAVG\_TEMP | 
| --- | --- | --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  Denver  |  29  |  29  |  29  | 
|  2018-11-01 01:00:00.0  |  Anchorage  |  2  |  2  |  2  | 
|  2018-11-01 06:00:00.0  |  Miami  |  65  |  65  |  65  | 
|  2018-11-01 07:00:00.0  |  Denver  |  29  |  32  |  30  | 
|  2018-11-01 09:00:00.0  |  Anchorage  |  2  |  9  |  5  | 
|  2018-11-01 13:00:00.0  |  Denver  |  29  |  50  |  37  | 
|  2018-11-01 17:00:00.0  |  Anchorage  |  2  |  10  |  7  | 
|  2018-11-01 18:00:00.0  |  Miami  |  65  |  71  |  68  | 
|  2018-11-01 19:00:00.0  |  Denver  |  29  |  50  |  38  | 
|  2018-11-02 01:00:00.0  |  Anchorage  |  2  |  10  |  6  | 
|  2018-11-02 01:00:00.0  |  Denver  |  29  |  50  |  38  | 
|  2018-11-02 07:00:00.0  |  Denver  |  32  |  50  |  42  | 
|  2018-11-02 09:00:00.0  |  Anchorage  |  3  |  10  |  6  | 
|  2018-11-02 13:00:00.0  |  Denver  |  39  |  56  |  46  | 
|  2018-11-02 17:00:00.0  |  Anchorage  |  2  |  10  |  4  | 
|  2018-11-02 19:00:00.0  |  Denver  |  39  |  56  |  46  | 
|  2018-11-03 01:00:00.0  |  Denver  |  36  |  56  |  45  | 
|  2018-11-03 01:00:00.0  |  Anchorage  |  1  |  4  |  2  | 

## Examples of Rowtime Bounds and Windowed Aggregation
<a name="sqlrf-windowed-aggregation-examples"></a>

This is an example of a windowed aggregate query:

```
 SELECT STREAM ROWTIME, ticker, amount, SUM(amount)
    OVER (
        PARTITION BY ticker
        RANGE INTERVAL '1' HOUR PRECEDING)
 AS hourlyVolume
 FROM Trades
```

Because this is a query on a stream, rows pop out of this query as soon as they go in. For example, given the following inputs:

```
Trades: IBM 10 10:00:00
Trades: ORCL 20 10:10:00
Trades.bound: 10:15:00
Trades: ORCL 15 10:25:00
Trades: IBM 30 11:05:00
Trades.bound: 11:10:00
```

In this example, the output is as follows:

```
Trades: IBM 10 10 10:00:00
Trades: ORCL 20 20 10:10:00
Trades.bound: 10:15:00
Trades: ORCL 15 35 10:25:00
Trades: IBM 30 30 11:05:00
Trades.bound: 11:10:00
```

The rows still hang around behind the scenes for an hour, and thus the second ORCL row output has a total of 35; but the original IBM trade falls outside the "hour preceding" window, and so it is excluded from the IBM sum.

## Example
<a name="sqlrf_windowed_aggregation_example"></a>

Some business problems seem to need totals over the whole history of a stream, but computing these is usually not practical. However, such problems can often be solved by looking at the last day, the last hour, or the last N records. Aggregations over such sets of records are called *windowed aggregates*.

They are easy to compute in a stream database, and can be expressed in ANSI (SQL:2008) standard SQL as follows:

```
SELECT STREAM ticker,
      avg(price) OVER lastHour AS avgPrice,
      max(price) OVER lastHour AS maxPrice
   FROM Bids
   WINDOW lastHour AS  (
      PARTITION BY ticker
      RANGE INTERVAL '1' HOUR PRECEDING)
```

## Notes
<a name="sqlrf_windowed_aggregation_notes"></a>

**Note**  
The `Interval_clause` must be one of the following types, as appropriate:   
+ An integer literal with ROWS
+ A numeric value for a RANGE over a numeric column
+ An INTERVAL for a RANGE over a date/time/timestamp
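As a hedged illustration, window specifications using each of these three forms might be sketched as follows. (The columns `ticker` and `seqno` are hypothetical, and exact support for each form should be verified against the WINDOW clause reference.)

```
-- Integer literal with ROWS: the 10 rows preceding the current row.
WINDOW lastRows AS (PARTITION BY ticker ROWS 10 PRECEDING)

-- Numeric value with RANGE over a numeric column (here, a hypothetical
-- monotonically increasing seqno column): rows within 100 of the current value.
WINDOW lastSeq AS (PARTITION BY ticker ORDER BY seqno RANGE 100 PRECEDING)

-- INTERVAL with RANGE over a timestamp (ROWTIME): the preceding hour.
WINDOW lastHour AS (PARTITION BY ticker RANGE INTERVAL '1' HOUR PRECEDING)
```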

# AVG
<a name="sql-reference-avg"></a>

Returns the average of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html). To return an exponentially weighted average of a stream of value expressions selected in a specified time window, see [EXP\_AVG](sql-reference-exp-avg.md).

When you use AVG, be aware of the following:
+ If you don't use the `OVER` clause, `AVG` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite sets of rows. Otherwise, the group is the infinite stream, the query never completes, and no rows are emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `AVG` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html). 

## Syntax
<a name="w2aac22b7c30b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c30b9b2"></a>

```
AVG(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c30b9b4"></a>

```
AVG([DISTINCT | ALL] number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c30c11"></a>

DISTINCT

Performs the aggregate function only on each unique instance of a value.

ALL

Performs the aggregate function on all values. `ALL` is the default.

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Defines the window over which the function is calculated. The window specification partitions the records in the stream by a time range interval or by a number of rows. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all columns.

## Examples
<a name="sql-reference-avg-examples"></a>

### Example Dataset
<a name="w2aac22b7c30c13b2"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Average of Values Using the GROUP BY Clause
<a name="avg_example_1"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite sets of rows. The `AVG` function is then calculated over the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="avg_example_1_step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    avg_price     DOUBLE);  
    
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        AVG(price) AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, 
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="avg_example_1_floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    avg_price     DOUBLE);  

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        AVG(price) AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, 
        FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="avg_example_1_results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing stock ticker symbols NFS, WAS, PPL, ALY with corresponding dates and average prices.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-avg-example-1.png)


### Example 2: Return the Average of Values Using the OVER Clause
<a name="w2aac22b7c30c13b6"></a>

 In this example, the `OVER` clause defines a window that partitions records by ticker symbol over the preceding one-hour time range. The `AVG` function is then calculated over the rows in that window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    avg_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        AVG(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for timestamp, ticker symbol, and average price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-avg-example-2.png)


## Usage Notes
<a name="w2aac22b7c30c15"></a>

Amazon Kinesis Analytics doesn't support `AVG` applied to interval types. This functionality is a departure from the SQL:2008 standard.

 When used as an analytic function, `AVG` returns null if the window being evaluated contains no rows, or if all rows contain null values. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). `AVG` also returns null when the partition within the window that matches the input row contains no rows, or when all of its rows are null. For more information about `PARTITION BY`, see [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md). 

 `AVG` ignores null values in the set of values or numeric expression. For example, each of the following returns the value 2: 
+ AVG(1, 2, 3) = 2
+ AVG(1, null, 2, null, 3, null) = 2
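The same behavior can be sketched in column form. Assuming a hypothetical table `T` whose column `X` holds the values 1, null, 2, null, 3, and null, `AVG` sums the three non-null values and divides by 3 (not 6):

```
-- Hypothetical table T(X) containing 1, null, 2, null, 3, null.
-- AVG ignores the nulls: (1 + 2 + 3) / 3 = 2.
SELECT AVG(X) AS AVG_X FROM T;
```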

## Related Topics
<a name="w2aac22b7c30c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [EXP\_AVG](sql-reference-exp-avg.md)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# COUNT
<a name="sql-reference-count"></a>

Returns the number of qualifying rows of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use COUNT, be aware of the following:
+ If you don't use the `OVER` clause, `COUNT` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite sets of rows. Otherwise, the group is the infinite stream, the query never completes, and no rows are emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `COUNT` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html). 

## Syntax
<a name="w2aac22b7c32b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c32b9b2"></a>

```
COUNT(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c32b9b4"></a>

```
COUNT(* | ALL number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c32c11"></a>

\*

Counts all rows.

ALL

Counts all rows. `ALL` is the default.

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Defines the window over which the function is calculated. The window specification partitions the records in the stream by a time range interval or by a number of rows. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all columns.

## Examples
<a name="sql-reference-count-examples"></a>

### Example Dataset
<a name="w2aac22b7c32c13b3"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Number of Values Using the GROUP BY Clause
<a name="w2aac22b7c32c13b5"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite sets of rows. The `COUNT` function is then calculated over the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="sql-reference-count-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    count_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        COUNT(Price) AS count_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-count-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    count_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        COUNT(Price) AS count_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-count-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing stock ticker symbols and prices for AAPL, WSB, and UHN at specific times.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-count-example-1.png)


### Example 2: Return the Number of Values Using the OVER Clause
<a name="w2aac22b7c32c13b9"></a>

 In this example, the `OVER` clause defines a window that partitions records by ticker symbol over the preceding one-hour time range. The `COUNT` function is then calculated over the rows in that window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    count_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        COUNT(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS count_price
    FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing timestamp, stock ticker symbols, and prices for four different stocks.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-count-example-2.png)


## Usage Notes
<a name="w2aac22b7c32c15"></a>

Amazon Kinesis Analytics doesn't support the `FILTER` clause of the `COUNT` function or the use of `COUNT DISTINCT` in either aggregate functions or analytic functions. For more information on aggregate and analytic functions, see [Aggregate Functions](sql-reference-aggregate-functions.md) and [Analytic Functions](sql-reference-analytic-functions.md). This functionality is a departure from the SQL:2008 standard. 

 When used as an analytic function, `COUNT` returns zero if the window being evaluated contains no rows. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). `COUNT` also returns zero when the partition within the window that matches the input row contains no rows. For more information about `PARTITION BY`, see [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md). 

 `COUNT` ignores null values in the set of values or numeric expression. For example, each of the following returns the value 3: 
+ COUNT(1, 2, 3) = 3
+ COUNT(1, null, 2, null, 3, null) = 3

## Related Topics
<a name="w2aac22b7c32c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# COUNT\_DISTINCT\_ITEMS\_TUMBLING Function
<a name="count-distinct-items"></a>

Returns a count of the number of distinct items in the specified in-application stream column over a tumbling window. The resulting count is approximate; the function uses the HyperLogLog algorithm. 

For more information, see [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog). 

When you use `COUNT_DISTINCT_ITEMS_TUMBLING`, be aware of the following:
+ When there are 10,000 or fewer items in the window, the function returns an exact count.
+ Getting an exact count of the number of distinct items can be inefficient and costly. Therefore, this function approximates the count. For example, if there are 100,000 distinct items, the algorithm might return 99,700. If cost and efficiency are not a concern, you can write your own SELECT statement to get the exact count.

  The following example demonstrates how to get an exact count of distinct rows for each ticker symbol in a 5-second tumbling window. The SELECT statement uses all of the columns (except ROWTIME) to determine uniqueness.

  ```
  CREATE OR REPLACE STREAM output_stream (ticker_symbol VARCHAR(4), unique_count BIGINT);

  CREATE OR REPLACE PUMP stream_pump AS
  INSERT INTO output_stream
  SELECT STREAM TICKER_SYMBOL, COUNT(distinct_stream.price) AS unique_count
    FROM (
      SELECT STREAM DISTINCT rowtime AS window_time,
        TICKER_SYMBOL,
        CHANGE,
        PRICE,
        STEP((SOURCE_SQL_STREAM_001.rowtime) BY INTERVAL '5' SECOND)
      FROM SOURCE_SQL_STREAM_001) AS distinct_stream
    GROUP BY TICKER_SYMBOL,
      STEP((distinct_stream.window_time) BY INTERVAL '5' SECOND);
  ```



The function operates on a [tumbling window](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html). You specify the size of the tumbling window as a parameter.
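As a rough illustration of the quantity being computed, the following Python sketch (hypothetical helper names; not part of the Kinesis API) derives the *exact* distinct count per ticker symbol for each tumbling window, that is, the value that `COUNT_DISTINCT_ITEMS_TUMBLING` approximates with HyperLogLog:

```python
from collections import defaultdict

def distinct_counts_tumbling(records, window_size):
    """Exact distinct count per (window, ticker); records are
    (rowtime_seconds, ticker, value) tuples."""
    windows = defaultdict(set)  # (window_start, ticker) -> set of values
    for rowtime, ticker, value in records:
        window_start = rowtime - rowtime % window_size  # same idea as STEP
        windows[(window_start, ticker)].add(value)
    return {key: len(values) for key, values in windows.items()}

records = [
    (0, "AMZN", 100.0), (1, "AMZN", 100.0), (2, "AMZN", 101.5),
    (3, "TBV", 30.0), (6, "AMZN", 99.0),  # rowtime 6 starts a new 5-second window
]
print(distinct_counts_tumbling(records, 5))
# {(0, 'AMZN'): 2, (0, 'TBV'): 1, (5, 'AMZN'): 1}
```

An exact count like this needs memory proportional to the number of distinct values per window; HyperLogLog trades a small error for near-constant memory.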

## Syntax
<a name="count-distinct-items-syntax"></a>

```
COUNT_DISTINCT_ITEMS_TUMBLING (
    in-application-streamPointer,
    'columnName',
    windowSize
)
```

## Parameters
<a name="count-distinct-items-parameters"></a>

The following sections describe the parameters. 

### in-application-streamPointer
<a name="count-distinct-items-inputstream"></a>

Using this parameter, you provide a pointer to an in-application stream. You can set a pointer using the `CURSOR` function. For example, the following statement sets a pointer to `InputStream`. 

```
CURSOR(SELECT STREAM * FROM InputStream)
```

### columnName
<a name="count-distinct-items-column-name"></a>

The name of the column in your in-application stream whose distinct values you want the function to count. The column name must appear in single quotation marks ('). For example, `'column1'`.

### windowSize
<a name="count-distinct-items-window-size"></a>

The size of the tumbling window in seconds. The size must be at least 1 second and must not exceed 3,600 seconds (1 hour).

## Examples
<a name="count-distinct-items-examples"></a>

### Example Dataset
<a name="w2aac22b7c34c20b3"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Approximate the number of distinct values in a column
<a name="w2aac22b7c34c20b5"></a>

The following example demonstrates how to use the `COUNT_DISTINCT_ITEMS_TUMBLING` function to approximate the number of distinct `TICKER_SYMBOL` values in the current tumbling window of the in-application stream. For more information about tumbling windows, see [Tumbling Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).

```
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    NUMBER_OF_DISTINCT_ITEMS BIGINT);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM * 
      FROM TABLE(COUNT_DISTINCT_ITEMS_TUMBLING(
          CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),  -- pointer to the data stream
            'TICKER_SYMBOL',                                     -- name of column in single quotes
            60                                                   -- tumbling window size in seconds
      )
);
```

The preceding example outputs a stream similar to the following:

![\[Table showing ROWTIME and NUMBER_OF_DISTINCT_ITEMS columns with four identical entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-count-distinct-example-1.png)


# EXP_AVG
<a name="sql-reference-exp-avg"></a>

```
EXP_AVG ( expression, <time-interval> )
```

`EXP_AVG` returns an exponentially weighted average ([exponential moving average](https://en.wikipedia.org/wiki/Moving_average)) of a stream of value expressions selected in a specified time window. `EXP_AVG` divides the specified window into intervals based on the value of <time-interval>. The values of the specified expression are weighted most heavily for the most recent intervals and exponentially less heavily for earlier intervals.

## Example
<a name="sql-reference-exp-avg-example"></a>

This example creates an exponentially weighted average of the price of each stock ticker over a 30-second window, such that the prices (for that ticker symbol) in the most recent 10-second subwindow carry double the weight of the prices in the middle 10-second subwindow and four times the weight of the prices in the oldest 10-second subwindow.

```
SELECT STREAM t.rowtime, ticker, price,
    EXP_AVG(price, INTERVAL '10' SECOND) OVER w AS avgPrice
FROM t
WINDOW w AS (PARTITION BY ticker RANGE INTERVAL '30' SECOND PRECEDING);
```

In this example, 10 seconds is the half-life of the decay function; that is, the period over which the weights applied to the prices being averaged decrease by a factor of two. In other words, a price 10 seconds older than another is given half as much weight. The half-life is specified as the time-interval in the call to `EXP_AVG`, as `INTERVAL '10' SECOND`.
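The half-life weighting can be modeled in Python as a sketch (illustrative only; `exp_avg` here is a hypothetical helper, not the SQL function):

```python
def exp_avg(samples, half_life):
    """Exponentially weighted average of (time_seconds, value) samples:
    a sample `age` seconds older than the newest carries weight
    0.5 ** (age / half_life)."""
    newest = max(t for t, _ in samples)
    weighted = [(0.5 ** ((newest - t) / half_life), v) for t, v in samples]
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight

# Three prices 10 seconds apart with a 10-second half-life get
# weights 1, 0.5, and 0.25, as in the example above.
samples = [(30, 12.0), (20, 10.0), (10, 8.0)]
print(exp_avg(samples, 10))  # (12*1 + 10*0.5 + 8*0.25) / 1.75 = 10.857...
```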

# FIRST_VALUE
<a name="sql-reference-first-value"></a>

```
FIRST_VALUE ( <value-expression> ) <null treatment> OVER <window-specification>
```

`FIRST_VALUE` returns the evaluation of the <value-expression> from the first row that qualifies for the aggregate. `FIRST_VALUE` requires the OVER clause and is considered an analytic function; see [Analytic Functions](sql-reference-analytic-functions.md). `FIRST_VALUE` supports the null treatment options defined in the following table.


| Null treatment option | Effect | 
| --- | --- | 
|  FIRST_VALUE(x) IGNORE NULLS OVER <window-specification>  |  Returns the first non-null value of x in <window-specification>  | 
|  FIRST_VALUE(x) RESPECT NULLS OVER <window-specification>  |  Returns the first value of x, including nulls, in <window-specification>  | 
|  FIRST_VALUE(x) OVER <window-specification>  |  Returns the first value of x, including nulls, in <window-specification>  | 
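The null treatment options can be sketched in Python (illustrative only; `first_value` is a hypothetical helper, not a Kinesis API):

```python
def first_value(values, ignore_nulls=False):
    """FIRST_VALUE semantics over an ordered window: RESPECT NULLS (the
    default) returns the first row's value even if it is null (None);
    IGNORE NULLS skips leading nulls."""
    for value in values:
        if value is not None or not ignore_nulls:
            return value
    return None  # the window had no rows, or only nulls under IGNORE NULLS

window = [None, None, 42, 7]
print(first_value(window))                     # None (RESPECT NULLS default)
print(first_value(window, ignore_nulls=True))  # 42
```

`LAST_VALUE` behaves symmetrically, evaluating from the last qualifying row instead of the first.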

# LAST_VALUE
<a name="sql-reference-last-value"></a>

```
LAST_VALUE ( <value-expression> )  OVER <window-specification>
```

`LAST_VALUE` returns the evaluation of the <value-expression> from the last row that qualifies for the aggregate. `LAST_VALUE` supports the null treatment options defined in the following table.


| Null treatment option | Effect | 
| --- | --- | 
|  LAST_VALUE(x) IGNORE NULLS OVER <window-specification>  |  Returns the last non-null value of x in <window-specification>  | 
|  LAST_VALUE(x) RESPECT NULLS OVER <window-specification>  |  Returns the last value of x, including nulls, in <window-specification>  | 
|  LAST_VALUE(x) OVER <window-specification>  |  Returns the last value of x, including nulls, in <window-specification>  | 

# MAX
<a name="sql-reference-max"></a>

Returns the maximum value of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use MAX, be aware of the following:
+ If you don't use the `OVER` clause, `MAX` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite sets of rows. Otherwise, the group is the infinite stream, the query never completes, and no rows are emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `MAX` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="w2aac22b7c42b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c42b9b2"></a>

```
MAX(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c42b9b4"></a>

```
MAX(number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c42c11"></a>

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Divides records in the stream according to the window specification, which defines how records are partitioned by a time range interval or a number of rows. 

GROUP BY *monotonic-expression* | *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all grouping expressions.

## Examples
<a name="sql-reference-max-examples"></a>

### Example Dataset
<a name="w2aac22b7c42c13b3"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Maximum Value Using the GROUP BY Clause
<a name="w2aac22b7c42c13b5"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite sets of rows. The `MAX` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (recommended)
<a name="sql-reference-max-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    max_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MAX(Price) AS max_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-max-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    max_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MAX(Price) AS max_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-max-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing ROWTIME, TICKER_SYMBOL, and MAX_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-max-example-1.png)
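Both groupings round `ROWTIME` down to a window boundary. Assuming epoch-seconds timestamps, the equivalence for 60-second windows can be sketched in Python (illustrative only):

```python
def step(timestamp, interval_seconds):
    """STEP(ts BY INTERVAL 'n' SECOND): round down to an interval boundary."""
    return timestamp - timestamp % interval_seconds

def floor_to_minute(timestamp):
    """FLOOR(ts TO MINUTE): round down to the start of the minute."""
    return timestamp - timestamp % 60

ts = 1487350330  # an arbitrary epoch-seconds timestamp
print(step(ts, 60) == floor_to_minute(ts))  # True for any timestamp
# STEP is recommended because it also supports windows that FLOOR
# cannot express, such as 90-second windows:
print(step(ts, 90))
```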


### Example 2: Return the Maximum Value Using the OVER Clause
<a name="w2aac22b7c42c13b7"></a>

In this example, the `OVER` clause partitions the records in the stream by ticker symbol over a range interval of 1 hour preceding. The `MAX` function is then calculated over the rows returned by the `OVER` clause.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    max_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        MAX(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS max_price
    FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock ticker symbols QAZ, QXZ, MJN, WSB with corresponding timestamps and maximum prices.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-max-example-2.png)
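The per-row behavior of the sliding window in Example 2 can be sketched in Python (an illustrative model, not the service's implementation; the linear scan is for clarity, not efficiency):

```python
def sliding_max(records, range_seconds):
    """For each arriving (rowtime, ticker, price) row, emit the maximum
    price among rows of the same ticker whose rowtime falls within the
    preceding range, including the current row."""
    seen = []
    emitted = []
    for rowtime, ticker, price in records:
        seen.append((rowtime, ticker, price))
        window = [p for t, s, p in seen
                  if s == ticker and rowtime - t <= range_seconds]
        emitted.append((ticker, max(window)))
    return emitted

records = [(0, "AMZN", 100.0), (10, "AMZN", 98.0), (20, "AMZN", 103.0)]
print(sliding_max(records, 3600))
# [('AMZN', 100.0), ('AMZN', 100.0), ('AMZN', 103.0)]
```

Note that a result is emitted for every input row, rather than once per window as in the GROUP BY examples.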


## Usage Notes
<a name="w2aac22b7c42c15"></a>

For string values, MAX is determined by which string is last in the collating sequence.

 If MAX is used as an analytic function and the window being evaluated contains no rows, MAX returns null. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). 

## Related Topics
<a name="w2aac22b7c42c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# MIN
<a name="sql-reference-min"></a>

Returns the minimum value of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use MIN, be aware of the following:
+ If you don't use the `OVER` clause, `MIN` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite sets of rows. Otherwise, the group is the infinite stream, the query never completes, and no rows are emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `MIN` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="w2aac22b7c44b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c44b9b2"></a>

```
MIN(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c44b9b4"></a>

```
MIN(number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c44c11"></a>

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Divides records in the stream according to the window specification, which defines how records are partitioned by a time range interval or a number of rows. 

GROUP BY *monotonic-expression* | *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all grouping expressions.

## Examples
<a name="sql-reference-min-examples"></a>

### Example Dataset
<a name="w2aac22b7c44c13b2"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Minimum Value Using the GROUP BY Clause
<a name="w2aac22b7c44c13b4"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite sets of rows. The `MIN` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="sql-reference-min-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    min_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MIN(Price) AS min_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-min-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    min_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MIN(Price) AS min_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-min-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing stock ticker symbols, timestamps, and minimum prices for four different stocks.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-min-example-1.png)


### Example 2: Return the Minimum Value Using the OVER Clause
<a name="w2aac22b7c44c13b6"></a>

In this example, the `OVER` clause partitions the records in the stream by ticker symbol over a range interval of 1 hour preceding. The `MIN` function is then calculated over the rows returned by the `OVER` clause.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    min_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        MIN(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS min_price
    FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock ticker symbols and minimum prices for NFS, NFLX, ASD, and DFG on 2017-02-17.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-min-example-2.png)


## Usage Notes
<a name="w2aac22b7c44c15"></a>

For string values, MIN is determined by which string is first in the collating sequence.

 If MIN is used as an analytic function and the window being evaluated contains no rows, MIN returns null. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). 

## Related Topics
<a name="w2aac22b7c44c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# SUM
<a name="sql-reference-sum"></a>

Returns the sum of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use SUM, be aware of the following:
+ If you don't use the `OVER` clause, `SUM` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite sets of rows. Otherwise, the group is the infinite stream, the query never completes, and no rows are emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `SUM` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="w2aac22b7c46b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c46b9b2"></a>

```
SUM(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c46b9b4"></a>

```
SUM([DISTINCT | ALL] number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c46c11"></a>

DISTINCT

Sums only the distinct values; duplicate values contribute to the sum only once.

ALL

Sums all values, including duplicates. `ALL` is the default.

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Divides records in the stream according to the window specification, which defines how records are partitioned by a time range interval or a number of rows. 

GROUP BY *monotonic-expression* | *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all grouping expressions.

## Examples
<a name="sql-reference-sum-examples"></a>

### Example Dataset
<a name="w2aac22b7c46c13b2"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Sum of Values Using the GROUP BY Clause
<a name="w2aac22b7c46c13b4"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite sets of rows. The `SUM` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="sql-reference-sum-example-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    sum_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        SUM(price) AS sum_price
    FROM "SOURCE_SQL_STREAM_001"
     GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-sum-example-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    sum_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        SUM(price) AS sum_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-sum-example-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing data stream with timestamp, ticker symbol, and sum price columns.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-sum-example-1.png)


## Usage Notes
<a name="w2aac22b7c46c15"></a>

Amazon Kinesis Analytics doesn't support `SUM` applied to interval types. This is a departure from the SQL:2008 standard.

`SUM` ignores null values in the set of values or numeric expression. For example, each of the following returns the value 6:
+ SUM(1, 2, 3) = 6
+ SUM(1, null, 2, null, 3, null) = 6
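The null handling, together with the `DISTINCT` option described above, can be sketched in Python (illustrative; `sql_sum` is a hypothetical helper, not a Kinesis API):

```python
def sql_sum(values, distinct=False):
    """SUM semantics: nulls (None) are ignored; DISTINCT sums each
    value once; an empty or all-null input yields null, as in SQL."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    if distinct:
        non_null = list(set(non_null))
    return sum(non_null)

print(sql_sum([1, None, 2, None, 3, None]))  # 6
print(sql_sum([2, 2, 3], distinct=True))     # 5
print(sql_sum([None, None]))                 # None
```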

## Related Topics
<a name="w2aac22b7c46c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# TOP_K_ITEMS_TUMBLING Function
<a name="top-k"></a>

Returns the most frequently occurring values in the specified in-application stream column over a tumbling window. This can be used to find trending (most popular) values in a specified column.

For example, the Getting Started exercise uses a demo stream that provides continuous stock price updates (ticker_symbol, price, change, and other columns). Suppose that you want to find the three most frequently traded stocks in each 1-minute tumbling window. You can use this function to find those ticker symbols.

When you use `TOP_K_ITEMS_TUMBLING`, be aware of the following:
+ Counting every incoming record on your streaming source is inefficient, so the function approximates the count of the most frequently occurring values. For example, when seeking the three most frequently traded stocks, the function might return three of the five most frequently traded stocks.

The function operates on a [tumbling window](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html). You specify the window size as a parameter. 

For a sample application with step-by-step instructions, see [Most Frequently Occurring Values](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/top-k-example.html).
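For comparison with the approximation, the following Python sketch computes the exact top-K most frequent values per tumbling window (illustrative only; not the service's implementation):

```python
from collections import Counter, defaultdict

def top_k_tumbling(records, k, window_size):
    """Exact top-K most frequent values per tumbling window; records
    are (rowtime_seconds, value) pairs."""
    windows = defaultdict(Counter)
    for rowtime, value in records:
        windows[rowtime - rowtime % window_size][value] += 1
    return {start: [value for value, _ in counts.most_common(k)]
            for start, counts in windows.items()}

records = [(1, "AMZN"), (2, "TBV"), (3, "AMZN"), (4, "INTC"),
           (5, "AMZN"), (61, "TBV"), (62, "TBV"), (63, "INTC")]
print(top_k_tumbling(records, 2, 60))
# {0: ['AMZN', 'TBV'], 60: ['TBV', 'INTC']}
```

Tracking exact counts requires state for every distinct value in each window, which is why the streaming function approximates instead.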

## Syntax
<a name="top-k-syntax"></a>

```
TOP_K_ITEMS_TUMBLING (
    in-application-streamPointer,
    'columnName',
    K,
    windowSize
)
```

## Parameters
<a name="top-k-parameters"></a>

The following sections describe the parameters. 

### in-application-streamPointer
<a name="top-k-inputstream"></a>

Pointer to an in-application stream. You can set a pointer using the CURSOR function. For example, the following statement sets a pointer to InputStream. 

```
CURSOR(SELECT STREAM * FROM InputStream)
```

### columnName
<a name="top-k-column-name"></a>

The name of the column in your in-application stream that you want to use to compute the top-K values.

**Note**  
The column name must appear in single quotation marks ('). For example, `'column1'`.

### K
<a name="top-k-k"></a>

Using this parameter, you specify how many of the most frequently occurring values in the specified column you want returned. The value of K must be at least 1 and cannot exceed 100,000.

### windowSize
<a name="top-k-window-size"></a>

Size of the tumbling window in seconds. The size must be greater than or equal to one second and must not exceed 3600 seconds (one hour).

## Examples
<a name="top-k-examples"></a>

### Example Dataset
<a name="w2aac22b7c48c18b3"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Most Frequently Occurring Values
<a name="w2aac22b7c48c18b5"></a>

The following example retrieves the most frequently occurring values in the sample stream created in the [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/getting-started.html) tutorial.

```
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
  "TICKER_SYMBOL" VARCHAR(4), 
  "MOST_FREQUENT_VALUES" BIGINT
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM * 
        FROM TABLE (TOP_K_ITEMS_TUMBLING(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            'TICKER_SYMBOL',         -- name of column in single quotes
            5,                       -- number of the most frequently occurring values
            60                       -- tumbling window size in seconds
            )
        );
```

The preceding example outputs a stream similar to the following.

![\[Table showing data stream with columns for timestamp, ticker symbol, and frequency values.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-topk-example-1.png)
