

# Functions
<a name="sql-reference-functions"></a>

The topics in this section describe functions supported by streaming SQL.

**Topics**
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Boolean Functions](sql-reference-boolean-functions.md)
+ [Conversion Functions](sql-reference-conversion-functions.md)
+ [Date and Time Functions](sql-reference-date-time-functions.md)
+ [Null Functions](sql-reference-null-functions.md)
+ [Numeric Functions](sql-reference-numeric-functions.md)
+ [Log Parsing Functions](sql-reference-pattern-matching-functions.md)
+ [Sorting Functions](sql-reference-sorting-functions.md)
+ [Statistical Variance and Deviation Functions](sql-reference-statistical-variance-deviation-functions.md)
+ [Streaming SQL Functions](sql-reference-streaming-sql-functions.md)
+ [String and Search Functions](sql-reference-string-and-search-functions.md)

# Aggregate Functions
<a name="sql-reference-aggregate-functions"></a>

Instead of returning a result calculated from a single row, an aggregate function returns a result calculated from aggregated data contained in a finite set of rows, or from information about a finite set of rows. An aggregate function may appear in any of the following:
+ `<selection list>` portion of a [SELECT clause](sql-reference-select-clause.md)
+ [ORDER BY clause](sql-reference-order-by-clause.md)
+ [HAVING clause](sql-reference-having-clause.md)

An aggregate function is different from [Analytic Functions](sql-reference-analytic-functions.md), which are always evaluated relative to a window that must be specified, and so they can't appear in a HAVING clause. Other differences are described in the table later in this topic.

Aggregate functions operate slightly differently in aggregate queries on tables than in aggregate queries on streams. If an aggregate query on tables contains a GROUP BY clause, the aggregate function returns one result per group in the set of input rows. If there is no explicit GROUP BY clause, the query is equivalent to GROUP BY (), and returns only one result for the entire set of input rows.

On streams, an aggregate query must contain an explicit GROUP BY clause on a monotonic expression based on rowtime. Without one, the sole group is the whole stream, which never ends, preventing any result from being reported. Adding a GROUP BY clause based on a monotonic expression breaks the stream into finite sets of rows, contiguous in time, and each such set can then be aggregated and reported.

Whenever a row arrives that changes the value of the monotonic grouping expression, a new group is started and the previous group is considered complete. Then, the Amazon Kinesis Data Analytics application outputs the value of the aggregate functions. Note that the GROUP BY clause may also include other non-monotonic expressions, in which case more than one result per set of rows may be produced.
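The mechanics described above can be sketched in Python. This is a conceptual illustration only (not how Kinesis Data Analytics is implemented): rows are buffered per group, and a group's aggregate is emitted only when the monotonic grouping value advances.

```python
def streaming_group_by(rows, key_fn, agg_fn):
    """Emit (key, aggregate) each time the monotonic key advances.

    rows   : iterable of input rows, ordered by rowtime
    key_fn : monotonic grouping expression (e.g. floor rowtime to minute)
    agg_fn : aggregate over the buffered rows of one group
    """
    current_key, buffer = None, []
    for row in rows:
        key = key_fn(row)
        if current_key is not None and key != current_key:
            # The grouping value advanced: the previous group is complete.
            yield current_key, agg_fn(buffer)
            buffer = []
        current_key = key
        buffer.append(row)
    # The final group is never emitted until the key advances again, which
    # is why a streaming GROUP BY needs a monotonic expression at all.

# Minute-level grouping over (rowtime_seconds, value) rows:
rows = [(0, 29), (30, 2), (65, 65), (70, 32), (130, 9)]
out = list(streaming_group_by(rows,
                              lambda r: r[0] // 60,
                              lambda b: min(v for _, v in b)))
# out == [(0, 2), (1, 32)]; the group for minute 2 is still open.
```

Note that the last group (minute 2) produces no output, mirroring the behavior of a streaming aggregate query waiting for the grouping expression to advance.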

Performing an aggregate query on streams is often referred to as streaming aggregation, as distinct from the windowed aggregation discussed in [Analytic Functions](sql-reference-analytic-functions.md) and [Windowed Aggregation on Streams](sql-reference-windowed-aggregation-stream.md). For more information about stream-to-stream joins, see [JOIN clause](sql-reference-join-clause.md).

If an input row contains a `null` in a column used as an input to an aggregate function, the aggregate function ignores the row (except for COUNT).


**Differences Between Aggregate and Analytic Functions**  

| Function Type | Outputs | Rows or Windows Used | Notes | 
| --- | --- | --- | --- | 
|  Aggregate Functions  |  One output row per group of input rows.  |  All output columns are calculated over the same window or same group of rows.  |  COUNT DISTINCT is not allowed in streaming aggregation. Statements of the following type are not allowed: SELECT COUNT(DISTINCT x) ... FROM ... GROUP BY ...  | 
|  [Analytic Functions](sql-reference-analytic-functions.md)  |  One output row for each input row.  |  Each output column may be calculated using a different window or partition.  |  COUNT DISTINCT can't be used as [Analytic Functions](sql-reference-analytic-functions.md) or in windowed aggregation.  | 

## Streaming Aggregation and Rowtime Bounds
<a name="w2aac22b7c21"></a>

Normally, an aggregate query generates a result when a row arrives that changes the value of the monotonic expression in the GROUP BY. For example, if the query is grouped by FLOOR(rowtime TO MINUTE), and the rowtime of the current row is 9:59.30, then a new row with a rowtime of 10:00.00 will trigger the result. 

Alternatively, a rowtime bound can be used to advance the monotonic expression and enable the query to return a result. For example, if the query is grouped by FLOOR(rowtime TO MINUTE), and the rowtime of the current row is 9:59.30, then an incoming rowtime bound of 10:00.00 causes the query to return a result. 
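A rowtime bound can be thought of as a data-free punctuation event that still advances the grouping expression. The following Python sketch is a conceptual model only (the event encoding is hypothetical, not the service's API):

```python
def streaming_min(events):
    """events: ('row', minute, value) or ('bound', minute), minutes nondecreasing.
    Returns [(minute, min_value), ...] for each completed minute group."""
    out, cur, vals = [], None, []
    for ev in events:
        minute = ev[1]
        if cur is not None and minute > cur and vals:
            # Time advanced past the open group, so the group is complete --
            # whether the advance came from a data row or a bare bound.
            out.append((cur, min(vals)))
            vals = []
        if ev[0] == 'row':
            cur = minute
            vals.append(ev[2])
    return out

# At minute 9 a row arrives; a minute-10 *bound* (no data) releases the group:
print(streaming_min([('row', 9, 30), ('bound', 10)]))  # [(9, 30)]
```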

## Aggregate Function List
<a name="w2aac22b7c23"></a>

Amazon Kinesis Data Analytics supports the following aggregate functions:
+ [AVG](sql-reference-avg.md)
+ [COUNT](sql-reference-count.md)
+ [COUNT\_DISTINCT\_ITEMS\_TUMBLING Function](count-distinct-items.md)
+ [EXP\_AVG](sql-reference-exp-avg.md)
+ [FIRST\_VALUE](sql-reference-first-value.md)
+ [LAST\_VALUE](sql-reference-last-value.md)
+ [MAX](sql-reference-max.md)
+ [MIN](sql-reference-min.md)
+ [SUM](sql-reference-sum.md)
+ [TOP\_K\_ITEMS\_TUMBLING Function](top-k.md)



The following SQL uses the AVG aggregate function as part of a query to find the average age of all employees:

```
SELECT
    AVG(AGE) AS AVERAGE_AGE
FROM SALES.EMPS;
```

Result:


| AVERAGE\_AGE | 
| --- | 
|  38  | 



To find the average age of employees in each department, we can add an explicit GROUP BY clause to the query:

```
SELECT
    DEPTNO,
    AVG(AGE) AS AVERAGE_AGE
FROM SALES.EMPS
GROUP BY DEPTNO;
```

Returns:


| DEPTNO | AVERAGE\_AGE | 
| --- | --- | 
|  10  |  30  | 
|  20  |  25  | 
|  30  |  40  | 
|  40  |  57  | 

## Examples of Aggregate Queries on Streams (Streaming Aggregation)
<a name="w2aac22b7c25"></a>

For this example, assume that the data in the following table is flowing through the stream called WEATHERSTREAM.


| ROWTIME | CITY | TEMP | 
| --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  Denver  |  29  | 
|  2018-11-01 01:00:00.0  |  Anchorage  |  2  | 
|  2018-11-01 06:00:00.0  |  Miami  |  65  | 
|  2018-11-01 07:00:00.0  |  Denver  |  32  | 
|  2018-11-01 09:00:00.0  |  Anchorage  |  9  | 
|  2018-11-01 13:00:00.0  |  Denver  |  50  | 
|  2018-11-01 17:00:00.0  |  Anchorage  |  10  | 
|  2018-11-01 18:00:00.0  |  Miami  |  71  | 
|  2018-11-01 19:00:00.0  |  Denver  |  43  | 
|  2018-11-02 01:00:00.0  |  Anchorage  |  4  | 
|  2018-11-02 01:00:00.0  |  Denver  |  39  | 
|  2018-11-02 07:00:00.0  |  Denver  |  46  | 
|  2018-11-02 09:00:00.0  |  Anchorage  |  3  | 
|  2018-11-02 13:00:00.0  |  Denver  |  56  | 
|  2018-11-02 17:00:00.0  |  Anchorage  |  2  | 
|  2018-11-02 19:00:00.0  |  Denver  |  50  | 
|  2018-11-03 01:00:00.0  |  Denver  |  36  | 
|  2018-11-03 01:00:00.0  |  Anchorage  |  1  | 



Suppose that you want to find the minimum and maximum temperature recorded anywhere (regardless of city) each day. These values can be calculated using the aggregate functions MIN and MAX. To produce this information on a per-day basis, and to provide a monotonic expression as the argument of the GROUP BY clause, we use the FLOOR function to round each row's rowtime down to the nearest day: 

```
SELECT STREAM
    FLOOR(WEATHERSTREAM.ROWTIME TO DAY) AS FLOOR_DAY,
    MIN(TEMP) AS MIN_TEMP,
    MAX(TEMP) AS MAX_TEMP
FROM WEATHERSTREAM
GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO DAY);
```

The result of the aggregate query is shown in the following table.


| FLOOR\_DAY | MIN\_TEMP | MAX\_TEMP | 
| --- | --- | --- | 
|  2018-11-01 00:00:00.0  |  2  |  71  | 
|  2018-11-02 00:00:00.0  |  2  |  56  | 



There is no row for 2018-11-03, even though the example data includes temperature measurements on that day. The rows for 2018-11-03 cannot be aggregated until all rows for that day are known to have arrived, which happens only when a row or a rowtime bound with a rowtime of 2018-11-04 00:00:00.0 (or later) arrives. When either arrives, the next result is as described in the following table.


| FLOOR\_DAY | MIN\_TEMP | MAX\_TEMP | 
| --- | --- | --- | 
|  2018-11-03 00:00:00.0  |  1  |  36  | 

Let's say that instead of finding the global minimum and maximum temperatures each day, we want to find the minimum, maximum, and average temperature for each city each day. To do this, we use the SUM and COUNT aggregate functions to compute the average, and add CITY to the GROUP BY clause, as shown following:

```
SELECT STREAM
    FLOOR(WEATHERSTREAM.ROWTIME TO DAY) AS FLOOR_DAY,
    CITY,
    MIN(TEMP) AS MIN_TEMP,
    MAX(TEMP) AS MAX_TEMP,
    SUM(TEMP)/COUNT(TEMP) AS AVG_TEMP
FROM WEATHERSTREAM
GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO DAY), CITY;
```

The result of the aggregate query is shown in the following table.


| FLOOR\_DAY | CITY | MIN\_TEMP | MAX\_TEMP | AVG\_TEMP | 
| --- | --- | --- | --- | --- | 
|  2018-11-01 00:00:00.0  |  Anchorage  |  2  |  10  |  7  | 
|  2018-11-01 00:00:00.0  |  Denver  |  29  |  50  |  38  | 
|  2018-11-01 00:00:00.0  |  Miami  |  65  |  71  |  68  | 
|  2018-11-02 00:00:00.0  |  Anchorage  |  2  |  4  |  3  | 
|  2018-11-02 00:00:00.0  |  Denver  |  39  |  56  |  47  | 



In this case, the arrival of rows for a new day's temperature measurements triggers the aggregation of the previous day's data, grouped by CITY, which then results in one row being produced per city included in the day's measurements.

Here again, a rowtime bound of 2018-11-04 00:00:00.0 could be used to prompt a result for 2018-11-03 before any actual measurements for 2018-11-04 arrive, as shown in the following table.


| FLOOR\_DAY | CITY | MIN\_TEMP | MAX\_TEMP | AVG\_TEMP | 
| --- | --- | --- | --- | --- | 
|  2018-11-03 00:00:00.0  |  Anchorage  |  1  |  1  |  1  | 
|  2018-11-03 00:00:00.0  |  Denver  |  36  |  36  |  36  | 

# Windowed Aggregation on Streams
<a name="sql-reference-windowed-aggregation-stream"></a>

To illustrate how windowed aggregation works on Amazon Kinesis data streams, assume that the data in the following table is flowing through a stream called WEATHERSTREAM.


| ROWTIME | CITY | TEMP | 
| --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  Denver  |  29  | 
|  2018-11-01 01:00:00.0  |  Anchorage  |  2  | 
|  2018-11-01 06:00:00.0  |  Miami  |  65  | 
|  2018-11-01 07:00:00.0  |  Denver  |  32  | 
|  2018-11-01 09:00:00.0  |  Anchorage  |  9  | 
|  2018-11-01 13:00:00.0  |  Denver  |  50  | 
|  2018-11-01 17:00:00.0  |  Anchorage  |  10  | 
|  2018-11-01 18:00:00.0  |  Miami  |  71  | 
|  2018-11-01 19:00:00.0  |  Denver  |  43  | 
|  2018-11-02 01:00:00.0  |  Anchorage  |  4  | 
|  2018-11-02 01:00:00.0  |  Denver  |  39  | 
|  2018-11-02 07:00:00.0  |  Denver  |  46  | 
|  2018-11-02 09:00:00.0  |  Anchorage  |  3  | 
|  2018-11-02 13:00:00.0  |  Denver  |  56  | 
|  2018-11-02 17:00:00.0  |  Anchorage  |  2  | 
|  2018-11-02 19:00:00.0  |  Denver  |  50  | 
|  2018-11-03 01:00:00.0  |  Denver  |  36  | 
|  2018-11-03 01:00:00.0  |  Anchorage  |  1  | 

Suppose that you want to find the minimum and maximum temperature recorded in the 24-hour period prior to any given reading, globally, regardless of city. To do this, you define a window of `RANGE INTERVAL '1' DAY PRECEDING`, and use it in the `OVER` clause for the `MIN` and `MAX` analytic functions:

```
 SELECT STREAM
        ROWTIME,
        MIN(TEMP) OVER W1 AS WMIN_TEMP,
        MAX(TEMP) OVER W1 AS WMAX_TEMP
 FROM WEATHERSTREAM
 WINDOW W1 AS (
    RANGE INTERVAL '1' DAY PRECEDING
);
```

## Results
<a name="sqlrf_windowed_aggregation-results"></a>


| ROWTIME | WMIN\_TEMP | WMAX\_TEMP | 
| --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  29  |  29  | 
|  2018-11-01 01:00:00.0  |  2  |  29  | 
|  2018-11-01 06:00:00.0  |  2  |  65  | 
|  2018-11-01 07:00:00.0  |  2  |  65  | 
|  2018-11-01 09:00:00.0  |  2  |  65  | 
|  2018-11-01 13:00:00.0  |  2  |  65  | 
|  2018-11-01 17:00:00.0  |  2  |  65  | 
|  2018-11-01 18:00:00.0  |  2  |  71  | 
|  2018-11-01 19:00:00.0  |  2  |  71  | 
|  2018-11-02 01:00:00.0  |  2  |  71  | 
|  2018-11-02 01:00:00.0  |  2  |  71  | 
|  2018-11-02 07:00:00.0  |  4  |  71  | 
|  2018-11-02 09:00:00.0  |  3  |  71  | 
|  2018-11-02 13:00:00.0  |  3  |  71  | 
|  2018-11-02 17:00:00.0  |  2  |  71  | 
|  2018-11-02 19:00:00.0  |  2  |  56  | 
|  2018-11-03 01:00:00.0  |  2  |  56  | 
|  2018-11-03 01:00:00.0  |  1  |  56  | 



Now, assume that you want to find the minimum, maximum, and average temperature recorded in the 24-hour period prior to any given reading, broken down by city. To do this, you add a `PARTITION BY` clause on `CITY` to the window specification, and add the `AVG` analytic function over the same window to the selection list:

```
 SELECT STREAM
        ROWTIME,
        CITY,
        MIN(TEMP) over W1 AS WMIN_TEMP,
        MAX(TEMP) over W1 AS WMAX_TEMP,
        AVG(TEMP) over W1 AS WAVG_TEMP
 FROM WEATHERSTREAM
 WINDOW W1 AS (
        PARTITION BY CITY
        RANGE INTERVAL '1' DAY PRECEDING
 );
```

### Results
<a name="sqlrf_windowed_aggregation_results"></a>


| ROWTIME | CITY | WMIN\_TEMP | WMAX\_TEMP | WAVG\_TEMP | 
| --- | --- | --- | --- | --- | 
|  2018-11-01 01:00:00.0  |  Denver  |  29  |  29  |  29  | 
|  2018-11-01 01:00:00.0  |  Anchorage  |  2  |  2  |  2  | 
|  2018-11-01 06:00:00.0  |  Miami  |  65  |  65  |  65  | 
|  2018-11-01 07:00:00.0  |  Denver  |  29  |  32  |  30  | 
|  2018-11-01 09:00:00.0  |  Anchorage  |  2  |  9  |  5  | 
|  2018-11-01 13:00:00.0  |  Denver  |  29  |  50  |  37  | 
|  2018-11-01 17:00:00.0  |  Anchorage  |  2  |  10  |  7  | 
|  2018-11-01 18:00:00.0  |  Miami  |  65  |  71  |  68  | 
|  2018-11-01 19:00:00.0  |  Denver  |  29  |  50  |  38  | 
|  2018-11-02 01:00:00.0  |  Anchorage  |  2  |  10  |  6  | 
|  2018-11-02 01:00:00.0  |  Denver  |  29  |  50  |  38  | 
|  2018-11-02 07:00:00.0  |  Denver  |  32  |  50  |  42  | 
|  2018-11-02 09:00:00.0  |  Anchorage  |  3  |  10  |  6  | 
|  2018-11-02 13:00:00.0  |  Denver  |  39  |  56  |  46  | 
|  2018-11-02 17:00:00.0  |  Anchorage  |  2  |  10  |  4  | 
|  2018-11-02 19:00:00.0  |  Denver  |  39  |  56  |  46  | 
|  2018-11-03 01:00:00.0  |  Denver  |  36  |  56  |  45  | 
|  2018-11-03 01:00:00.0  |  Anchorage  |  1  |  4  |  2  | 

## Examples of Rowtime Bounds and Windowed Aggregation
<a name="sqlrf-windowed-aggregation-examples"></a>

This is an example of a windowed aggregate query:

```
 SELECT STREAM ROWTIME, ticker, amount, SUM(amount)
    OVER (
        PARTITION BY ticker
        RANGE INTERVAL '1' HOUR PRECEDING)
 AS hourlyVolume
 FROM Trades
```

Because this is a query on a stream, rows pop out of this query as soon as they go in. For example, given the following inputs:

```
Trades: IBM 10 10:00:00
Trades: ORCL 20 10:10:00
Trades.bound: 10:15:00
Trades: ORCL 15 10:25:00
Trades: IBM 30 11:05:00
Trades.bound: 11:10:00
```

In this example, the output is as follows:

```
Trades: IBM 10 10 10:00:00
Trades: ORCL 20 20 10:10:00
Trades.bound: 10:15:00
Trades: ORCL 15 35 10:25:00
Trades: IBM 30 30 11:05:00
Trades.bound: 11:10:00
```

The rows still hang around behind the scenes for an hour, and thus the second ORCL row output has a total of 35; but the original IBM trade falls outside the "hour preceding" window, and so it is excluded from the IBM sum.
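The expiry behavior described above can be sketched in Python. This is a conceptual model with hypothetical helper names, not the engine's implementation: each ticker partition keeps the last hour of rows, and rows older than the window fall out before the sum is taken.

```python
from collections import defaultdict, deque

def hourly_volume(trades, window_secs=3600):
    """For each (ticker, amount, t) emit (ticker, amount, windowed_sum, t),
    where the sum covers that ticker's rows in [t - window_secs, t].
    A sketch of PARTITION BY ticker RANGE INTERVAL '1' HOUR PRECEDING."""
    history = defaultdict(deque)   # ticker -> deque of (t, amount)
    for ticker, amount, t in trades:
        q = history[ticker]
        q.append((t, amount))
        # Expire rows older than one hour relative to the current row.
        while q and q[0][0] < t - window_secs:
            q.popleft()
        yield ticker, amount, sum(a for _, a in q), t

# Times as seconds since midnight: 10:00:00, 10:10:00, 10:25:00, 11:05:00.
trades = [("IBM", 10, 36000), ("ORCL", 20, 36600),
          ("ORCL", 15, 37500), ("IBM", 30, 39900)]
results = list(hourly_volume(trades))
# Running sums per row: 10, 20, 35, 30 -- the 10:00 IBM trade has
# expired from the window by the time the 11:05 IBM trade arrives.
```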

## Example
<a name="sqlrf_windowed_aggregation_example"></a>

Some business problems seem to need totals over the whole history of a stream, but this is usually not practical to compute. However, such business problems are often solvable by looking at the last day, the last hour, or the last N records. Aggregations over such sets of recent records are called *windowed aggregates*.

They are easy to compute in a stream database, and can be expressed in ANSI (SQL:2008) standard SQL as follows:

```
SELECT STREAM ticker,
      avg(price) OVER lastHour AS avgPrice,
      max(price) OVER lastHour AS maxPrice
   FROM Bids
   WINDOW lastHour AS  (
      PARTITION BY ticker
      RANGE INTERVAL '1' HOUR PRECEDING)
```

## Notes
<a name="sqlrf_windowed_aggregation_notes"></a>

**Note**  
The `Interval_clause` must be one of the following types:   
+ An integer literal with `ROWS`
+ A numeric value for a `RANGE` over a numeric column
+ An `INTERVAL` for a `RANGE` over a date/time/timestamp column
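The ROWS/RANGE distinction can be sketched in Python (a conceptual illustration with hypothetical helper names, not the engine's implementation): a ROWS window keeps a fixed number of preceding rows, while a RANGE window keeps every row whose ordering-column value falls within the span.

```python
from collections import deque

def rows_preceding_sum(rows, n):
    """ROWS n PRECEDING: the current row plus at most n preceding rows."""
    window, out = deque(maxlen=n + 1), []
    for _, v in rows:
        window.append(v)
        out.append(sum(window))
    return out

def range_preceding_sum(rows, span):
    """RANGE span PRECEDING: all rows whose ordering value is within span."""
    out = []
    for i, (t, _) in enumerate(rows):
        out.append(sum(v for t2, v in rows[: i + 1] if t2 >= t - span))
    return out

rows = [(0, 1), (10, 2), (100, 4)]
# ROWS counts rows; RANGE measures distance along the ordering column:
assert rows_preceding_sum(rows, 2) == [1, 3, 7]
assert range_preceding_sum(rows, 60) == [1, 3, 4]
```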

# AVG
<a name="sql-reference-avg"></a>

Returns the average of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html). To return an exponentially weighted average of a stream of value expressions selected in a specified time window, see [EXP\_AVG](sql-reference-exp-avg.md).

When you use AVG, be aware of the following:
+ If you don't use the `OVER` clause, `AVG` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `AVG` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html) 

## Syntax
<a name="w2aac22b7c30b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c30b9b2"></a>

```
AVG(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c30b9b4"></a>

```
AVG([DISTINCT | ALL] number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c30c11"></a>

DISTINCT

Performs the aggregate function only on each unique instance of a value.

ALL

Performs the aggregate function on all values. `ALL` is the default.

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

A window specification defines how records in the stream are partitioned and windowed: by a time range interval or by a number of rows. The function is calculated over each such window. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all columns.

## Examples
<a name="sql-reference-avg-examples"></a>

### Example Dataset
<a name="w2aac22b7c30c13b2"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Average of Values Using the GROUP BY Clause
<a name="avg_example_1"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite rows. The `AVG` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="avg_example_1_step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    avg_price     DOUBLE);  
    
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        AVG(price) AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, 
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="avg_example_1_floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    avg_price     DOUBLE);  

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        AVG(price) AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, 
        FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="avg_example_1_results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing stock ticker symbols NFS, WAS, PPL, ALY with corresponding dates and average prices.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-avg-example-1.png)
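Both of the preceding queries assign each rowtime to the start of its 60-second bucket. As a conceptual sketch (treating rowtimes as epoch seconds), STEP rounds a value down to the nearest multiple of an arbitrary interval, while FLOOR is limited to whole time units, which is why STEP is the recommended form:

```python
def step(t, interval):
    """Sketch of STEP(rowtime BY INTERVAL ...): round down to a multiple
    of the interval, here with both values in seconds."""
    return t - (t % interval)

assert step(125, 60) == 120   # same bucket as FLOOR(... TO MINUTE)
assert step(125, 90) == 90    # an interval FLOOR cannot express
```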


### Example 2: Return the Average of Values Using the OVER Clause
<a name="w2aac22b7c30c13b6"></a>

In this example, the `OVER` clause defines a sliding window covering the preceding one hour of records for each ticker symbol. The `AVG` function is then calculated over the rows in each window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    avg_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        AVG(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for timestamp, ticker symbol, and average price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-avg-example-2.png)


## Usage Notes
<a name="w2aac22b7c30c15"></a>

Amazon Kinesis Analytics doesn't support `AVG` applied to interval types. This functionality is a departure from the SQL:2008 standard.

 When used as an analytic function, `AVG` returns null if the window being evaluated contains no rows, or if all rows contain null values. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). AVG also returns null for a `PARTITION BY` clause for which the partition within the window matching the input row contains no rows or all rows are null. For more information about `PARTITION BY`, see [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md). 

`AVG` ignores null values in the input set of values or numeric expression. For example, each of the following returns 2: 
+ AVG(1, 2, 3) = 2
+ AVG(1,null, 2, null, 3, null) = 2
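The null-handling rule can be sketched in Python as a conceptual illustration of the semantics described above:

```python
def avg_ignoring_nulls(values):
    """Sketch of SQL AVG semantics: NULL (None) inputs are skipped entirely,
    and an empty or all-NULL input yields NULL (None)."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None

assert avg_ignoring_nulls([1, None, 2, None, 3, None]) == 2
assert avg_ignoring_nulls([None, None]) is None   # all-NULL input yields NULL
```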

## Related Topics
<a name="w2aac22b7c30c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [EXP\_AVG](sql-reference-exp-avg.md)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# COUNT
<a name="sql-reference-count"></a>

Returns the number of qualifying rows of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use COUNT, be aware of the following:
+ If you don't use the `OVER` clause, `COUNT` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `COUNT` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html) 

## Syntax
<a name="w2aac22b7c32b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c32b9b2"></a>

```
COUNT(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c32b9b4"></a>

```
COUNT(* | ALL number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c32c11"></a>

\*

Counts all rows.

ALL

Counts all rows. `ALL` is the default.

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

A window specification defines how records in the stream are partitioned and windowed: by a time range interval or by a number of rows. The function is calculated over each such window. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that has identical values in all columns.

## Examples
<a name="sql-reference-count-examples"></a>

### Example Dataset
<a name="w2aac22b7c32c13b3"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Number of Values Using the GROUP BY Clause
<a name="w2aac22b7c32c13b5"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite rows. The `COUNT` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="sql-reference-count-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    count_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        COUNT(price) AS count_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-count-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    count_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        COUNT(price) AS count_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-count-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing stock ticker symbols and prices for AAPL, WSB, and UHN at specific times.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-count-example-1.png)


### Example 2: Return the Number of Values Using the OVER Clause
<a name="w2aac22b7c32c13b9"></a>

In this example, the `OVER` clause defines a sliding window covering the preceding one hour of records for each ticker symbol. The `COUNT` function is then calculated over the rows in each window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    count_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        COUNT(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS count_price
    FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing timestamp, stock ticker symbols, and prices for four different stocks.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-count-example-2.png)


## Usage Notes
<a name="w2aac22b7c32c15"></a>

Amazon Kinesis Analytics doesn't support the `FILTER` clause of the `COUNT` function or the use of `COUNT DISTINCT` in either aggregate functions or analytic functions. For more information on aggregate and analytic functions, see [Aggregate Functions](sql-reference-aggregate-functions.md) and [Analytic Functions](sql-reference-analytic-functions.md). This functionality is a departure from the SQL:2008 standard. 

 When used as an analytic function, `COUNT` returns zero if the window being evaluated contains no rows. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). COUNT also returns zero for a `PARTITION BY` clause for which the partition within the window matching the input row contains no rows. For more information about `PARTITION BY`, see [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md). 

`COUNT` ignores null values in the input set of values or numeric expression. For example, each of the following returns 3: 
+ COUNT(1, 2, 3) = 3
+ COUNT(1,null, 2, null, 3, null) = 3

## Related Topics
<a name="w2aac22b7c32c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# COUNT\_DISTINCT\_ITEMS\_TUMBLING Function
<a name="count-distinct-items"></a>

Returns a count of the number of distinct items in the specified in-application stream column over a tumbling window. The resulting count is approximate; the function uses the HyperLogLog algorithm. 

For more information, see [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog). 
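The full HyperLogLog estimator is beyond the scope of this reference. The following Python sketch (illustrative only, not the implementation that Kinesis Analytics uses) shows the core trick: hash each item and keep, per register, the maximum rank of the lowest set bit. Duplicate items can never change the state, so memory stays fixed no matter how many rows arrive.

```python
import hashlib

M = 64  # number of registers; the real algorithm tunes this for accuracy

def _hash(item):
    """Deterministic integer hash of an item."""
    return int(hashlib.md5(str(item).encode()).hexdigest(), 16)

def add(registers, item):
    """Fold one item into the sketch."""
    h = _hash(item)
    idx = h % M    # low bits pick a register
    rest = h // M  # remaining bits drive the rank
    rank = 1       # position of the lowest set bit; rare high ranks imply many distinct items
    while rest % 2 == 0 and rank < 64:
        rest //= 2
        rank += 1
    registers[idx] = max(registers[idx], rank)

registers = [0] * M
for item in ["AMZN", "TBV", "AMZN", "MSFT", "TBV"]:
    add(registers, item)
# Duplicates never change a register, so the sketch's state depends only on
# the set of distinct items seen -- the property that makes approximate
# distinct counting cheap over an unbounded stream.
```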

When you use `COUNT_DISTINCT_ITEMS_TUMBLING`, be aware of the following:
+ When there are 10,000 or fewer items in the window, the function returns an exact count.
+ Getting an exact count of the number of distinct items can be inefficient and costly. Therefore, this function approximates the count. For example, if there are 100,000 distinct items, the algorithm might return 99,700. If cost and efficiency are not considerations, you can write your own SELECT statement to get the exact count.

  The following example demonstrates how to get an exact count of distinct rows for each ticker symbol in a five-second tumbling window. The SELECT statement uses all of the columns (except ROWTIME) to determine uniqueness.

  ```
  CREATE OR REPLACE STREAM output_stream (ticker_symbol VARCHAR(4), unique_count BIGINT);

  CREATE OR REPLACE PUMP stream_pump AS
  INSERT INTO output_stream
  SELECT STREAM TICKER_SYMBOL, COUNT(distinct_stream.price) AS unique_count
    FROM (
      SELECT STREAM DISTINCT rowtime AS window_time,
          TICKER_SYMBOL,
          CHANGE,
          PRICE,
          STEP((SOURCE_SQL_STREAM_001.rowtime) BY INTERVAL '5' SECOND)
        FROM SOURCE_SQL_STREAM_001) AS distinct_stream
    GROUP BY TICKER_SYMBOL,
        STEP((distinct_stream.window_time) BY INTERVAL '5' SECOND);
  ```



The function operates on a [tumbling window](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html). You specify the size of the tumbling window as a parameter.

## Syntax
<a name="count-distinct-items-syntax"></a>

```
COUNT_DISTINCT_ITEMS_TUMBLING (
      in-application-streamPointer,
      'columnName',
      windowSize
   )
```

## Parameters
<a name="count-distinct-items-parameters"></a>

The following sections describe the parameters. 

### in-application-streamPointer
<a name="count-distinct-items-inputstream"></a>

Using this parameter, you provide a pointer to an in-application stream. You can set a pointer using the `CURSOR` function. For example, the following statement sets a pointer to `InputStream`. 

```
CURSOR(SELECT STREAM * FROM InputStream)
```

### columnName
<a name="count-distinct-items-column-name"></a>

Column name in your in-application stream whose distinct values you want the function to count. The column name must appear in single quotation marks ('), for example, `'column1'`.

### windowSize
<a name="count-distinct-items-window-size"></a>

Size of the tumbling window in seconds. The size must be at least 1 second and must not exceed 1 hour (3,600 seconds).

## Examples
<a name="count-distinct-items-examples"></a>

### Example Dataset
<a name="w2aac22b7c34c20b3"></a>

The following examples are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure this input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Approximate the number of distinct values in a column
<a name="w2aac22b7c34c20b5"></a>

The following example demonstrates how to use the `COUNT_DISTINCT_ITEMS_TUMBLING` function to approximate the number of distinct `TICKER_SYMBOL` values in the current tumbling window of the in-application stream. For more information about tumbling windows, see [Tumbling Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).

```
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    NUMBER_OF_DISTINCT_ITEMS BIGINT);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM * 
      FROM TABLE(COUNT_DISTINCT_ITEMS_TUMBLING(
          CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),  -- pointer to the data stream
            'TICKER_SYMBOL',                                     -- name of column in single quotes
            60                                                   -- tumbling window size in seconds
      )
);
```

The preceding example outputs a stream similar to the following:

![\[Table showing ROWTIME and NUMBER_OF_DISTINCT_ITEMS columns with four identical entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-count-distinct-example-1.png)


# EXP\_AVG
<a name="sql-reference-exp-avg"></a>

```
EXP_AVG ( expression, <time-interval> )
```

EXP\_AVG returns an exponentially weighted average ([exponential moving average](https://en.wikipedia.org/wiki/Moving_average)) of a stream of value expressions selected in a specified time window. EXP\_AVG divides the specified window into intervals based on the value of <time-interval>. The values of the specified expression are weighted most heavily for the most recent intervals and exponentially less heavily for earlier intervals.

## Example
<a name="sql-reference-exp-avg-example"></a>

This example creates an exponentially-weighted average of the price of each stock ticker over a 30-second window such that the prices (for that ticker symbol) in the most recent 10-second subwindow carry double the weight of the prices in the middle 10-second subwindow and four times the weight of the prices in the oldest 10-second subwindow.

```
select stream t.rowtime, ticker, price,
exp_avg(price, INTERVAL '10' SECOND) over w as avgPrice
from t
window w as (partition by ticker range interval '30' second preceding);
```

In this example, 10 seconds is the half-life of the decay function: the period over which the weights applied to the prices being averaged decrease by a factor of two. In other words, a price 10 seconds older than another is given half as much weight. The half-life is specified as the <time-interval> in the call to EXP\_AVG, as `INTERVAL '10' SECOND`.
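The half-life weighting can be sketched in Python (illustrative only; `exp_weighted_avg`, the sample ages, and the prices are assumptions, not part of the SQL example):

```python
def exp_weighted_avg(samples, half_life):
    """samples: list of (age_seconds, value) pairs.
    A value's weight halves for every half_life of age."""
    weights = [0.5 ** (age / half_life) for age, _ in samples]
    total = sum(w * v for w, (_, v) in zip(weights, samples))
    return total / sum(weights)

# Prices observed 0, 10, and 20 seconds ago with a 10-second half-life get
# weights 1, 0.5, and 0.25: the newest counts twice as much as the middle
# value and four times as much as the oldest.
print(exp_weighted_avg([(0, 100.0), (10, 100.0), (20, 100.0)], half_life=10))  # 100.0
```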

# FIRST\_VALUE
<a name="sql-reference-first-value"></a>

```
FIRST_VALUE ( <value-expression> ) <null-treatment> OVER <window-specification>
```

FIRST\_VALUE returns the evaluation of the <value-expression> from the first row that qualifies for the aggregate. FIRST\_VALUE requires the OVER clause and is an analytic function; see [Analytic Functions](sql-reference-analytic-functions.md). FIRST\_VALUE supports the null treatment options defined in the following table.


| Null treatment option | Effect | 
| --- | --- | 
|  FIRST\_VALUE(x) IGNORE NULLS OVER <window-specification>  |  Returns the first non-null value of x in <window-specification>  | 
|  FIRST\_VALUE(x) RESPECT NULLS OVER <window-specification>  |  Returns the first value of x, including nulls, in <window-specification>  | 
|  FIRST\_VALUE(x) OVER <window-specification>  |  Returns the first value of x, including nulls, in <window-specification>  | 
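The null treatment options can be sketched in Python (illustrative only; `first_value` here is a hypothetical helper operating on an already-evaluated window, with `None` standing in for SQL null):

```python
def first_value(window, respect_nulls=True):
    """FIRST_VALUE over an already-evaluated window of values.
    RESPECT NULLS (the default) returns the first value even if it is null;
    respect_nulls=False behaves like IGNORE NULLS."""
    for v in window:
        if respect_nulls or v is not None:
            return v
    return None  # empty window, or all nulls under IGNORE NULLS

window = [None, 7.5, 8.0]
print(first_value(window))                       # None (RESPECT NULLS)
print(first_value(window, respect_nulls=False))  # 7.5 (IGNORE NULLS)
```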

# LAST\_VALUE
<a name="sql-reference-last-value"></a>

```
LAST_VALUE ( <value-expression> )  OVER <window-specification>
```

LAST\_VALUE returns the evaluation of the <value-expression> from the last row that qualifies for the aggregate. LAST\_VALUE supports the null treatment options defined in the following table.


| Null Treatment Option | Effect | 
| --- | --- | 
|  LAST\_VALUE(x) IGNORE NULLS OVER <window-specification>  |  Returns the last non-null value of x in <window-specification>  | 
|  LAST\_VALUE(x) RESPECT NULLS OVER <window-specification>  |  Returns the last value of x, including nulls, in <window-specification>  | 
|  LAST\_VALUE(x) OVER <window-specification>  |  Returns the last value of x, including nulls, in <window-specification>  | 

# MAX
<a name="sql-reference-max"></a>

Returns the maximum value of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use MAX, be aware of the following:
+ If you don't use the `OVER` clause, `MAX` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `MAX` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="w2aac22b7c42b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c42b9b2"></a>

```
MAX(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c42b9b4"></a>

```
MAX(number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c42c11"></a>

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Defines how records in the stream are divided into windows, partitioned by a time range interval or by a number of rows. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that have identical values in the grouping expression.

## Examples
<a name="sql-reference-max-examples"></a>

### Example Dataset
<a name="w2aac22b7c42c13b3"></a>

The following examples are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure this input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Maximum Value Using the GROUP BY Clause
<a name="w2aac22b7c42c13b5"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite rows. The `MAX` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (recommended)
<a name="sql-reference-max-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    max_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MAX(Price) AS max_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-max-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    max_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MAX(Price) AS max_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-max-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing ROWTIME, TICKER_SYMBOL, and MAX_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-max-example-1.png)
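The STEP and FLOOR groupings in the preceding examples are equivalent for one-minute windows. The following Python sketch (illustrative arithmetic on epoch-second timestamps, not the engine's implementation) shows why, and why STEP is the more general of the two:

```python
def step(ts_seconds, interval_seconds):
    """STEP(ts BY INTERVAL '...'): round down to a multiple of the interval."""
    return ts_seconds - (ts_seconds % interval_seconds)

def floor_to_minute(ts_seconds):
    """FLOOR(ts TO MINUTE): drop the seconds within the minute."""
    return (ts_seconds // 60) * 60

# A 60-second STEP and FLOOR TO MINUTE put a timestamp in the same bucket:
print(step(1487352125, 60) == floor_to_minute(1487352125))  # True
# STEP also supports bucket sizes FLOOR cannot express, such as 90 seconds:
print(step(125, 90))  # 90
```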


### Example 2: Return the Maximum Value Using the OVER Clause
<a name="w2aac22b7c42c13b7"></a>

In this example, the `OVER` clause partitions the records in the stream by ticker symbol and applies a sliding window covering the preceding hour. The `MAX` function is then calculated over the rows in each window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    max_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        MAX(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS max_price
    FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock ticker symbols QAZ, QXZ, MJN, WSB with corresponding timestamps and maximum prices.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-max-example-2.png)


## Usage Notes
<a name="w2aac22b7c42c15"></a>

For string values, MAX is determined by which string is last in the collating sequence.

 If MAX is used as an analytic function and the window being evaluated contains no rows, MAX returns null. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). 

## Related Topics
<a name="w2aac22b7c42c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# MIN
<a name="sql-reference-min"></a>

Returns the minimum value of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use MIN, be aware of the following:
+ If you don't use the `OVER` clause, `MIN` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `MIN` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="w2aac22b7c44b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c44b9b2"></a>

```
MIN(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c44b9b4"></a>

```
MIN(number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c44c11"></a>

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Defines how records in the stream are divided into windows, partitioned by a time range interval or by a number of rows. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that have identical values in the grouping expression.

## Examples
<a name="sql-reference-min-examples"></a>

### Example Dataset
<a name="w2aac22b7c44c13b2"></a>

The following examples are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure this input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Minimum Value Using the GROUP BY Clause
<a name="w2aac22b7c44c13b4"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite rows. The `MIN` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="sql-reference-min-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    min_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MIN(Price) AS min_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-min-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    min_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        MIN(Price) AS min_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-min-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing stock ticker symbols, timestamps, and minimum prices for four different stocks.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-min-example-1.png)


### Example 2: Return the Minimum Value Using the OVER Clause
<a name="w2aac22b7c44c13b6"></a>

In this example, the `OVER` clause partitions the records in the stream by ticker symbol and applies a sliding window covering the preceding hour. The `MIN` function is then calculated over the rows in each window. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    min_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
        MIN(price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS min_price
    FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock ticker symbols and minimum prices for NFS, NFLX, ASD, and DFG on 2017-02-17.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-min-example-2.png)


## Usage Notes
<a name="w2aac22b7c44c15"></a>

For string values, MIN is determined by which string is first in the collating sequence.

 If MIN is used as an analytic function and the window being evaluated contains no rows, MIN returns null. For more information, see [Analytic Functions](sql-reference-analytic-functions.md). 

## Related Topics
<a name="w2aac22b7c44c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# SUM
<a name="sql-reference-sum"></a>

Returns the sum of a group of values from a windowed query. A windowed query is defined in terms of time or rows. For information about windowed queries, see [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html).

When you use SUM, be aware of the following:
+ If you don't use the `OVER` clause, `SUM` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a GROUP BY clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `SUM` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an OVER clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="w2aac22b7c46b9"></a>

### Tumbling Windowed Query
<a name="w2aac22b7c46b9b2"></a>

```
SUM(number-expression) ... GROUP BY monotonic-expression | time-based-expression
```

### Sliding Windowed Query
<a name="w2aac22b7c46b9b4"></a>

```
SUM([DISTINCT | ALL] number-expression) OVER window-specification
```

## Parameters
<a name="w2aac22b7c46c11"></a>

DISTINCT

Sums only distinct values.

ALL

Sums all values. `ALL` is the default.

*number-expression*

Specifies the value expressions evaluated for each row in the aggregation.

OVER *window-specification*

Defines how records in the stream are divided into windows, partitioned by a time range interval or by a number of rows. 

GROUP BY *monotonic-expression* \| *time-based-expression*

Groups records based on the value of the grouping expression, returning a single summary row for each group of rows that have identical values in the grouping expression.
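The DISTINCT and ALL set quantifiers, together with SUM's null handling, can be sketched in Python (illustrative only; `sum_agg` is a hypothetical helper, not part of the SQL dialect):

```python
def sum_agg(values, distinct=False):
    """SUM over non-null values; distinct=True mirrors SUM(DISTINCT ...)."""
    non_null = [v for v in values if v is not None]  # SUM ignores nulls
    if distinct:
        non_null = set(non_null)
    return sum(non_null)

prices = [10.0, 10.0, 25.0, None]
print(sum_agg(prices))                 # 45.0 -- ALL (the default) sums every row
print(sum_agg(prices, distinct=True))  # 35.0 -- DISTINCT sums each value once
```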

## Examples
<a name="sql-reference-sum-examples"></a>

### Example Dataset
<a name="w2aac22b7c46c13b2"></a>

The following examples are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure this input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Sum of Values Using the GROUP BY Clause
<a name="w2aac22b7c46c13b4"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite rows. The `SUM` function is then calculated from the rows returned by the `GROUP BY` clause.

#### Using STEP (Recommended)
<a name="sql-reference-sum-example-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    sum_price     DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        SUM(price) AS sum_price
    FROM "SOURCE_SQL_STREAM_001"
     GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="sql-reference-sum-example-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    sum_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        SUM(price) AS sum_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

#### Results
<a name="sql-reference-sum-example-results"></a>

The preceding examples output a stream similar to the following.

![\[Table showing data stream with timestamp, ticker symbol, and sum price columns.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-sum-example-1.png)


## Usage Notes
<a name="w2aac22b7c46c15"></a>

Amazon Kinesis Analytics doesn't support `SUM` applied to interval types. This functionality is a departure from the SQL:2008 standard.

`SUM` ignores null values in the set of values or numeric expression. For example, each of the following returns the value 6:
+ `SUM(1, 2, 3)` = 6
+ `SUM(1, null, 2, null, 3, null)` = 6

## Related Topics
<a name="w2aac22b7c46c17"></a>
+ [Windowed Queries](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/windowed-sql.html)
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [GROUP BY clause](sql-reference-group-by-clause.md)
+ [Analytic Functions](sql-reference-analytic-functions.md)
+ [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html)
+ [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md)

# TOP\_K\_ITEMS\_TUMBLING Function
<a name="top-k"></a>

Returns the most frequently occurring values in the specified in-application stream column over a tumbling window. This can be used to find trending (most popular) values in a specified column.

For example, the Getting Started exercise uses a demo stream that provides continuous stock price updates (`ticker_symbol`, `price`, `change`, and other columns). Suppose that you want to find the three most frequently traded stocks in each 1-minute tumbling window. You can use this function to find those ticker symbols.

When you use `TOP_K_ITEMS_TUMBLING`, be aware of the following:
+ Counting every incoming record on your streaming source exactly is not efficient, so the function approximates the most frequently occurring values. For example, when seeking the three most traded stocks, the function might return three of the five most traded stocks.

The function operates on a [tumbling window](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html). You specify the window size as a parameter. 

For a sample application with step-by-step instructions, see [Most Frequently Occurring Values](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/top-k-example.html).
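For intuition, an exact version of this computation over a single window is straightforward; the following Python sketch (illustrative only, not the function's implementation) shows what the function approximates:

```python
from collections import Counter

def top_k_items(values, k):
    """Exact top-k by frequency for one window; the SQL function approximates
    this so it can run cheaply over an unbounded stream."""
    return [item for item, _ in Counter(values).most_common(k)]

window = ["AMZN", "TBV", "AMZN", "MSFT", "TBV", "AMZN"]
print(top_k_items(window, 2))  # ['AMZN', 'TBV']
```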

## Syntax
<a name="top-k-syntax"></a>

```
TOP_K_ITEMS_TUMBLING (
      in-application-streamPointer,
      'columnName',
      K,
      windowSize
   )
```

## Parameters
<a name="top-k-parameters"></a>

The following sections describe the parameters. 

### in-application-streamPointer
<a name="top-k-inputstream"></a>

Pointer to an in-application stream. You can set a pointer using the CURSOR function. For example, the following statement sets a pointer to InputStream. 

```
CURSOR(SELECT STREAM * FROM InputStream)
```

### columnName
<a name="top-k-column-name"></a>

Column name in your in-application stream that you want to use to compute the top-K values. The column name must appear in single quotation marks ('), for example, `'column1'`.

### K
<a name="top-k-k"></a>

Using this parameter, you specify how many of the most frequently occurring values you want returned from the specified column. The value K must be greater than or equal to 1 and cannot exceed 100,000.

### windowSize
<a name="top-k-window-size"></a>

Size of the tumbling window in seconds. The size must be greater than or equal to one second and must not exceed 3600 seconds (one hour).

## Examples
<a name="top-k-examples"></a>

### Example Dataset
<a name="w2aac22b7c48c18b3"></a>

The following examples are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure this input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return the Most Frequently Occurring Values
<a name="w2aac22b7c48c18b5"></a>

The following example retrieves the most frequently occurring values in the sample stream created in the [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/getting-started.html) tutorial.

```
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
  "TICKER_SYMBOL" VARCHAR(4), 
  "MOST_FREQUENT_VALUES" BIGINT
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM * 
        FROM TABLE (TOP_K_ITEMS_TUMBLING(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            'TICKER_SYMBOL',         -- name of column in single quotes
            5,                       -- number of the most frequently occurring values
            60                       -- tumbling window size in seconds
            )
        );
```

The preceding example outputs a stream similar to the following.

![\[Table showing data stream with columns for timestamp, ticker symbol, and frequency values.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-topk-example-1.png)


# Analytic Functions
<a name="sql-reference-analytic-functions"></a>

An analytic function is one that returns a result calculated from data in (or about) a finite set of rows identified by a [SELECT clause](sql-reference-select-clause.md) or in the [ORDER BY clause](sql-reference-order-by-clause.md).

The SELECT topic explains the ORDER BY clause (with its chart) and the windowing clause (with its window-specification chart). To see where an ORDER BY clause is used in SELECT statements, see the SELECT chart in the SELECT topic of this guide. 

1. Analytic functions must specify a window. Since there are a few restrictions on window specifications, and a few differences between specifying windows for windowed aggregation and windowed join, please see [Allowed and Disallowed Window Specifications](sql-reference-allowed-disallowed-window.md) for explanations.

1. Analytic functions may only appear in the <selection list> portion of a SELECT clause or in the ORDER BY clause.

Other differences are described in the table later in this topic.

Performing queries using analytic functions is commonly referred to as windowed aggregation (discussed below), as distinct from [Aggregate Functions](sql-reference-aggregate-functions.md).

Because of the presence of the window specification, queries that use analytic functions produce results in a different manner than do aggregate queries. For each row in the input set, the window specification identifies a different set of rows on which the analytic function operates. If the window specification also includes a PARTITION BY clause, then the only rows in the window that will be considered in producing a result will be those that share the same partition as the input row.

If an input row contains a null in a column used as an input to an analytic function, the analytic function ignores the row, except for COUNT, which does count rows with null values. In cases where the window (or in the case of a PARTITION BY, a partition within the window) contains no rows, an analytic function will return null. The exception to this is COUNT, which returns zero.
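As a sketch of this behavior (the stream name `trades` and the nullable `price` column are hypothetical, for illustration only), COUNT counts every row in the window, while an analytic function such as MAX skips rows whose input column is null:

```
-- Hypothetical stream and column names; illustrates null handling only.
SELECT STREAM
    COUNT(*) OVER W AS rows_in_window,  -- counts rows even when price is null
    MAX(price) OVER W AS max_price      -- ignores rows with null price; null if no rows qualify
FROM trades
WINDOW W AS (RANGE INTERVAL '1' MINUTE PRECEDING);
```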


**Differences Between Aggregate and Analytic Functions**  

| Function Type | Outputs | Rows or Windows Used | Notes | 
| --- | --- | --- | --- | 
|  [Aggregate Functions](sql-reference-aggregate-functions.md)  |  One output row per group of input rows.  |  All output columns are calculated over the same window or same group of rows.  |  COUNT DISTINCT is not allowed in [Aggregate Functions](sql-reference-aggregate-functions.md). Statements of the following type are not allowed: SELECT COUNT(DISTINCT x) ... FROM ... GROUP BY ...  | 
|  Analytic Functions  |  One output row for each input row.  |  Each output column may be calculated using a different window or partition.  |  COUNT DISTINCT can't be used as analytic functions or in windowed aggregation.  | 

## Related Topics
<a name="sqlrf_analyticfunctions_relatedtopics"></a>
+ [Windowed Aggregation on Streams](sql-reference-windowed-aggregation-stream.md)
+ [SELECT statement](sql-reference-select.md) 
+ [SELECT clause](sql-reference-select-clause.md) 

# Boolean Functions
<a name="sql-reference-boolean-functions"></a>

The topics in this section describe the boolean functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [ANY](sql-reference-any.md)
+ [EVERY](sql-reference-every.md)

# ANY
<a name="sql-reference-any"></a>

```
ANY ( <boolean_expression> )
```



ANY returns true if the supplied boolean\_expression is true for any of the selected rows, and false if it is true for none of the selected rows.

**Example**  
The following SQL snippet returns 'true' if the price for any ticker in the stream of trades is below 1. Returns 'false' if every price in the stream is 1 or greater.

```
SELECT STREAM ANY(price < 1) FROM trades
  GROUP BY FLOOR(trades.rowtime TO HOUR);
```

# EVERY
<a name="sql-reference-every"></a>

```
EVERY ( <boolean_expression> )
```

EVERY returns true if the supplied boolean\_expression is true for all of the selected rows, and false if it is false for any of the selected rows.

**Example**  
The following SQL snippet returns 'true' if the price for every ticker in the stream of trades is below 1. Returns 'false' if any price is 1 or greater.

```
SELECT STREAM EVERY(price < 1) FROM trades
  GROUP BY FLOOR(trades.rowtime TO HOUR);
```

# Conversion Functions
<a name="sql-reference-conversion-functions"></a>

The topics in this section describe the conversion functions for Amazon Kinesis Data Analytics streaming SQL.

For functions that convert to and from Datetime and Timestamp values, see [Datetime Conversion Functions](sql-reference-datetime-conversion-functions.md).

**Topics**
+ [CAST](sql-reference-cast.md)

# CAST
<a name="sql-reference-cast"></a>

CAST lets you convert one value expression or data type to another value expression or data type.

```
CAST ( <cast-operand> AS <cast-target> )
 <cast-operand> := <value-expression>
 <cast-target>  := <data-type>
```

## Valid Conversions
<a name="w2aac22c15c11b7"></a>

Using CAST with a source operand of one of the types listed in the first column below, you can cast to any of the target types listed in the second column, without restriction. Other target types are not supported.


| Source Operand Types | Target Operand Types | 
| --- | --- | 
|  Any numeric type (NUMERIC, DECIMAL, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE)  |  VARCHAR, CHAR, or any numeric type (See Note A.)  | 
|  VARCHAR, CHAR  |  All of the above, plus DATE, TIME, TIMESTAMP, DAY-TIME INTERVAL, BOOLEAN  | 
|  DATE  |  DATE, VARCHAR, CHAR, TIMESTAMP  | 
|  TIME  |  TIME, VARCHAR, CHAR, TIMESTAMP  | 
|  TIMESTAMP  |  TIME, VARCHAR, CHAR, TIMESTAMP, DATE  | 
|  DAY-TIME INTERVAL  |  DAY-TIME INTERVAL, BIGINT, DECIMAL, CHAR, VARCHAR  | 
|  BOOLEAN  |  VARCHAR, CHAR, BOOLEAN  | 
|  BINARY, VARBINARY  |  BINARY, VARBINARY  | 
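For instance, the first row of the table permits numeric-to-character and numeric-to-numeric casts. A minimal sketch, using the `values` convention of the examples that follow (the specific lengths chosen here are illustrative):

```
-- Numeric to character, and numeric to a wider numeric type.
values(cast(10 as varchar(5)));
values(cast(10 as double));
```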

### Examples
<a name="sql-reference-cast-exmaples"></a>

#### DATE to CHAR/VARCHAR
<a name="w2aac22c15c11b7b6b2"></a>

```
 values(cast(date'2008-08-23' as varchar(10)));
+-------------+
|   EXPR$0    |
+-------------+
| 2008-08-23  |
+-------------+
1 row selected
```

Note that if an inadequate output specification is supplied, no rows are selected, because the date literal requires 10 characters:

```
values(cast(date'2008-08-23' as varchar(9)));
'EXPR$0'
No rows selected
```

In the next case, the date is blank-padded on the right (because of the semantics of the CHAR datatype):

```
+----------------------------+
|           EXPR$0           |
+----------------------------+
| 2008-08-23                 |
+----------------------------+
1 row selected
```

#### REAL to INTEGER
<a name="w2aac22c15c11b7b6b4"></a>

The REAL (or NUMERIC or DECIMAL) value is rounded by the cast:

```
+---------+
| EXPR$0  |
+---------+
| -2      |
+---------+
1 row selected
```

#### STRING to TIMESTAMP
<a name="w2aac22c15c11b7b6b6"></a>

There are two ways to convert a string to a timestamp. The first uses CAST, as shown in the next section. The other uses [Char To Timestamp(Sys)](sql-reference-char-to-timestamp.md).

## Using CAST to Convert a String to a Timestamp
<a name="w2aac22c15c11b9"></a>

The example below illustrates this method for conversion:

```
 values(cast('2007-02-19 21:23:45' AS TIMESTAMP));
'EXPR$0'
'2007-02-19 21:23:45'
1 row selected
```

If the input string lacks any one of the six fields (year, month, day, hours, minutes, seconds), or uses any delimiters different from those shown above, CAST will not return a value. (Fractional seconds are disallowed.)

If the input string is not in the appropriate format for CAST, you must use the CHAR\_TO\_TIMESTAMP method to convert the string to a timestamp.

## Using CHAR\_TO\_TIMESTAMP to Convert a String to a Timestamp
<a name="BOOTSTRINGTOTIMESTAMP"></a>

When the input string is not in the appropriate format to be CAST, you can use the CHAR\_TO\_TIMESTAMP method. It has the additional advantage that you can specify which parts of the timestamp string you wish to use in subsequent processing, and create a TIMESTAMP value containing only those. To do so, you specify a template that identifies which parts you want, such as 'yyyy-MM' to use only the year and month parts.

The input-date-time string-to-be-converted can contain all or any parts of a full timestamp, that is, values for any or all of the standard elements ('yyyy-MM-dd hh:mm:ss'). If all these elements are present in your input string, and 'yyyy-MM-dd hh:mm:ss' is the template you supply, then the input-string elements are interpreted in that order as year, month, day, hour, minute, and seconds, such as in '2009-09-16 03:15:24'. The yyyy cannot be uppercase; the hh can be uppercase to mean using a 24-hour clock. For many examples of valid specifiers, see the table and examples later in this topic. For the full range of valid specifiers, see [Class SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/index.html?java/text/SimpleDateFormat.html) on the Oracle website.

CHAR\_TO\_TIMESTAMP uses the template you specify as a parameter in the function call. The template causes the TIMESTAMP result to use only the parts of the input-date-time value that you specified in the template. Those fields in the resulting TIMESTAMP will then contain the corresponding data taken from your input-date-time string; fields not specified in your template will use default values (see below). The format of the template used by CHAR\_TO\_TIMESTAMP is defined by [Class SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/index.html?java/text/SimpleDateFormat.html), at which link all the specifiers are listed, some with examples. For more information, see [Date and Time Patterns](sql-reference-parse-timestamp-format.md).

The function-call syntax is as follows:

```
  CHAR_TO_TIMESTAMP('<format_string>','<input_date_time_string>')
```

Where <format\_string> is the template you specify for the parts of the date-time value you want, and <input\_date\_time\_string> is the original string that is being converted to a TIMESTAMP result.

Each string must be enclosed in single quotes, and each element of the <input\_date\_time\_string> must be in the range for its corresponding element in the template. Otherwise, no result is returned.

### Example 1
<a name="sql-reference-cast-example-1"></a>
Each input-string element must be valid for its corresponding template specifier:
+ The input-string element whose position corresponds with MM must be an integer from 1 to 12, because anything else does not represent a valid month.
+ The input-string element whose position corresponds with dd must be an integer from 1 to 31, because anything else does not represent a valid day.
+ However, if MM is 2, dd cannot be 30 or 31, because February never has those days.

If the month or day part is omitted from the template, the default starting value 01 is substituted for it.

For example, using '2009-09-16 03:15:24' as your input string, you can obtain a TIMESTAMP containing only the date, with zeros for the other fields such as hours, minutes, or seconds, by specifying

```
  CHAR_TO_TIMESTAMP('yyyy-MM-dd','2009-09-16 03:15:24').
```

The result would be the TIMESTAMP 2009-09-16 00:00:00.

### Example 2
<a name="sql-reference-cast-example-2"></a>
+ If the call had kept hours and minutes in the template while omitting months, days, and seconds, as in CHAR\_TO\_TIMESTAMP('yyyy-hh-mm','2009-09-16 03:15:24'), then the resulting TIMESTAMP would be 2009-01-01 03:15:00.


| Template | Input String | Output TIMESTAMP | Notes | 
| --- | --- | --- | --- | 
|  'yyyy-MM-dd hh:mm:ss'  |  '2009-09-16 03:15:24'  |  '2009-09-16 03:15:24'  |  Input string MUST use the form 'yyyy-MM-dd hh:mm:ss' or a subset or reordering thereof; using an input string like 'Wednesday, 16 September 2009 03:15:24' will NOT work, meaning that no output will result.  | 
|  'yyyy-mm'  |  '2012-02-08 07:23:19'  |  '2012-01-01 00:02:00'  | The template above specifies only year first and minutes second, so the second element in the input string ("02") is used as minutes. Default values are used for Month and Day ("01") and for hours and seconds ("00"). | 
|  'yyyy-ss-mm'  |  '2012-02-08 07:23:19'  |  '2012-01-01 00:08:02'  | The template above specifies only year, seconds, and minutes, in that order, so the second element in the input string ("02") is used as seconds and the third as minutes ("08"). Default values are used for Month and Day ("01") and for hours ("00"). | 
|  'MMM dd, yyyy'  |  'March 7, 2010'  |  '2010-03-07 00:00:00'  | MMM in the template above matches "March"; the template's 'comma space' matches the input string. If the template lacks the comma, the input string must also lack it, or there is no output; likewise, if the input string lacks the comma, so must the template. | 
|  'MMM dd,'  |  'March 7, 2010'  |  '1970-03-07 00:00:00'  | Note that the template above doesn't use a year specifier, causing the output TIMESTAMP to use the earliest year in this epoch, 1970. | 
|  'MMM dd,y'  |  'March 7, 2010'  |  '2010-03-07 00:00:00'  | Using the template above, if the input string were 'March 7, 10', the output TIMESTAMP would be '0010-03-07 00:00:00'. | 
|  'M-d'  |  '2-8'  |  '1970-02-08 00:00:00'  | Absent a yyyy specifier in the template, as above, the earliest year in this epoch (1970) is used. An input string of '2-8-2012' would give the same result; using '2012-2-8' would give no result because 2012 is not a valid month. | 
|  'MM-dd-yyyy'  |  '06-23-2012 10:11:12'  |  '2012-06-23 00:00:00'  | Dashes as delimiters (as above) are fine, if template and input both use them in the same positions. Since the template omits hours, minutes, and seconds, zeroes are used in the output TIMESTAMP. | 
|  'dd-MM-yy hh:mm:ss'  |  '23-06-11 10:11:12'  |  '2011-06-23 10:11:12'  |  You can have the specifiers in any order as long as that order matches the meaning of the input string you supply, as above. The template and input string of the next example below have the same meaning (and the same output TIMESTAMP) as this example, but they specify months before days and seconds before hours.  | 
|  'MM-dd-yy ss:hh:mm'  |  '06-23-11 12:10:11'  |  '2011-06-23 10:11:12'  | In the template used above, the order of the month and day specifiers is reversed from the example just above, and the specifier for seconds is before hours instead of after minutes; but because the input string also puts months before days and seconds before hours, the meaning (and the output TIMESTAMP) is the same as the example ABOVE. | 
|  'yy-dd-MM ss:hh:mm'  |  '06-23-11 12:10:11'  |  '2006-11-23 10:11:12'  | The template used above reverses (compared to the prior example above) the years and months specifiers, while the input string remains the same. In this case, the output TIMESTAMP uses the first element of the input string as the years, the second as the days, and the third as the months. | 
|  'dd-MM-yy hh:mm'  |  '23-06-11 10:11:12'  |  '2011-06-23 10:11:00'  | With seconds omitted in the template, as above, the output TIMESTAMP uses 00 seconds. Any number of y specifiers produces the same result; but if the input string inadvertently uses a 1 instead of 11 for the year, as in '23-06-1 10:11:12', then the output TIMESTAMP becomes '0001-06-23 10:11:00'. | 
|  'MM/dd/yy hh:mm:ss'  |  `'12/19/11 10:11:12'`  `'12/19/11 12:11:10'`  |  `'2011-12-19 10:11:12'`  `'2011-12-19 00:11:10'`  | Slashes as delimiters are fine, if template and input both use them in the same positions, as above; otherwise, no output. Using specifier hh, input times of 12:11:10 and 00:11:10 have the same meaning as a time in the morning. | 
|  'MM/dd/yy HH:mm:ss'  |  '12/19/11 12:59:59' '12/19/11 21:08:07'  |  '2011-12-19 12:59:59' '2011-12-19 21:08:07'   | The input-string values '2011-12-19 00:11:12' or '2011-12-19 12:11:12' would fail with this template because '2011' is not a month, as expected by the template string 'MM/dd/yy HH:mm:ss'. However, changing the template gives useful output:<pre><br />values(cast(CHAR_TO_TIMESTAMP('y/MM/dd HH:mm:ss', '2011/12/19 00:11:12') as varchar(19)));<br />'EXPR$0'<br />'2011-12-19 00:11:12'<br />1 row selected</pre>`'12/19/11 00:11:12'` would fail with the above template (`'y/MM/dd'`), since 19 is not a valid month; supplying `'12/11/19 00:11:12'` works. `'2011-12-19 12:11:12'` would fail as input because dashes don't match the slashes in the template; `'2011/12/19 12:11:12'` works. Note that for times after 12 noon, that is, for afternoon and evening times, the hours specifier must be HH instead of hh, and the input string must specify the afternoon or evening hour on a 24-hour clock, with hours running from 00 to 23. Using specifier HH, input times of 12:11:10 and 00:11:10 have different meanings: the first is a time in the afternoon and the second a time in the morning. Using specifier hh, the times from 12:00 through 11:59:59 are morning times. Given the specifiers hh:mm:ss, the output TIMESTAMP will include '00:09:08' in the morning for both input string '12:09:08' and input string '00:09:08'; whereas given the specifiers HH:mm:ss, the output TIMESTAMP for input string '00:09:08' in the morning will include '00:09:08', and the output TIMESTAMP for input string '12:09:08' in the afternoon will include '12:09:08'.  | 

### Further examples
<a name="sql-reference-cast-examples-further"></a>

The examples below illustrate using various templates with CHAR\_TO\_TIMESTAMP, including some common misunderstandings.

```
 values (CHAR_TO_TIMESTAMP('yyyy-hh-mm','2009-09-16 03:15:24'));
'EXPR$0'
'2009-01-01 09:16:00'
1 row selected
```

Note that the fields in the input string above were used in the order given by the specifiers in the template, as defined by the dashes-as-delimiters in both template and input string: years first, then hours, then minutes. Since the specifiers for months and days are not present in the template, their values in the input string were ignored, with 01 substituted for both values in the output TIMESTAMP. The template specified hours and minutes as the second and third input values, so 09 became the hours and 16 became the minutes. No specifier was present for seconds, so 00 was used.

The years specifier can appear alone, or it can be followed by one of the hours, minutes, or seconds specifiers, provided the delimiter that ends the years specifier matches the corresponding delimiter in the input string:

```
values (CHAR_TO_TIMESTAMP('yyyy','2009-09-16 03:15:24') );
'EXPR$0'
'2009-01-01 00:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy hh','2009-09-16 03:15:24') );
'EXPR$0'
No rows selected
```

The template above fails because it uses a space as the delimiter before "hh", rather than the dash used in the input string's date specification. The four templates below work because each uses the same delimiter to separate the years specifier from the next specifier as the one used in the input string's date specification (dash in the first case, space in the second, slash in the third, and dash in the fourth).

```
values (CHAR_TO_TIMESTAMP('yyyy-hh','2009-09-16 03:15:24') );
'EXPR$0'
'2009-01-01 09:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy hh','2009 09 16 03:15:24') );
'EXPR$0'
'2009-01-01 09:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy/hh','2009/09/16 03:15:24') );
'EXPR$0'
'2009-01-01 09:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy-mm','2009-09-16 03:15:24') );
'EXPR$0'
'2009-01-01 00:09:00'
1 row selected
```

However, if the template specifies months (MM), it cannot then specify hours, minutes, or seconds unless days are also specified:

Template specifying years and months only, thus omitting days/hours/minutes/seconds from the resulting TIMESTAMP:

```
values (CHAR_TO_TIMESTAMP('yyyy-MM','2009-09-16 03:15:24') );
'EXPR$0'
'2009-09-01 00:00:00'
1 row selected
```

The next two templates fail, lacking a 'days' specifier:

```
values (CHAR_TO_TIMESTAMP('yyyy-MM hh','2009-09-16 03:15:24') );
'EXPR$0'
No rows selected
values (CHAR_TO_TIMESTAMP('yyyy-MM hh:','2009-09-16 03:15:24') );
'EXPR$0'
No rows selected
```

The next three succeed, using a 'days' specifier:

```
 values (CHAR_TO_TIMESTAMP('yyyy-MM-dd hh','2009-09-16 03:15:24') );
'EXPR$0'
'2009-09-16 03:00:00'
1 row selected
```

The template above, 'yyyy-MM-dd hh', specifies only hours (hh), without minutes or seconds. Because hh is the 4th element of the template, its value is taken from the 4th element of the input string '2009-09-16 03:15:24', which is 03; that value is output for hours. Since neither mm nor ss is specified, the default starting values for minutes and seconds, zeroes, are used.

```
values (CHAR_TO_TIMESTAMP('yyyy-MM-dd ss','2009-09-16 03:15:24') );
'EXPR$0'
'2009-09-16 00:00:03'
1 row selected
```

The template above, 'yyyy-MM-dd ss', specifies that the 4th token/element of the input string is to be used as seconds (ss). The 4th element of the input string '2009-09-16 03:15:24' is 03, which becomes the value output for seconds as specified in the template; and since neither hh nor mm is specified in the template, their default or initial values are used, which are zeroes.

```
values (CHAR_TO_TIMESTAMP('yyyy-MM-dd mm','2009-09-16 03:15:24') );
'EXPR$0'
'2009-09-16 00:03:00'
1 row selected
```

The template above, 'yyyy-MM-dd mm', specifies that the 4th token/element of the input string is to be used as minutes (mm). The 4th element of the input string '2009-09-16 03:15:24' is 03, which becomes the value output for minutes as specified in the template; and since neither hh nor ss is specified in the template, their default or initial values are used, which are zeroes.

Further failures, lacking a 'days' specifier:

```
values (CHAR_TO_TIMESTAMP('yyyy-MM- mm','2009-09-16 03:15:24') );
'EXPR$0'
No rows selected
values (CHAR_TO_TIMESTAMP('yyyy-MM   mm','2009-09-16 03:15:24') );
'EXPR$0'
No rows selected
values (CHAR_TO_TIMESTAMP('yyyy-MM   hh','2009-09-16 03:15:24') );
'EXPR$0'
No rows selected
```

## About Delimiters and Values
<a name="w2aac22c15c11c13"></a>

Delimiters in the template must match those in the input string; values in the input string must be acceptable for the template specifiers to which they correspond.

As a general convention, a colon is used to separate hours from minutes, and minutes from seconds. Similarly, the general convention is to use a dash or slash to separate years from months and months from days. Any parallel usage seems to work, and the examples that follow illustrate this.

```
values (CHAR_TO_TIMESTAMP('MM/dd/yy hh:mm:ss','2009/09/16 03:15:24') );
'EXPR$0'
No rows selected
```

The example above fails because 2009 is not an acceptable value for months, which is the first specifier (MM) in the template.

```
values (CHAR_TO_TIMESTAMP('MM/dd/yy hh:mm:ss','09/16/11 03:15:24') );
'EXPR$0'
'2011-09-16 03:15:24'
1 row selected
```

The example above succeeds because the delimiters are parallel (slashes to slashes, colons to colons) and each value is acceptable for the corresponding specifier.

```
values (CHAR_TO_TIMESTAMP('MM/dd/yy hh/mm/ss','09/16/11 03/15/24') );
'EXPR$0'
'2011-09-16 03:15:24'
1 row selected
```

The example above succeeds because the delimiters are parallel (all slashes) and each value is acceptable for the corresponding specifier.

```
values (CHAR_TO_TIMESTAMP('MM/dd/yy hh-mm-ss','09/16/11 03-15-24') );
'EXPR$0'
'2011-09-16 03:15:24'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy|MM|dd hh|mm|ss','2009|09|16 03|15|24') );
'EXPR$0'
'2009-09-16 03:15:24'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy@MM@dd hh@mm@ss','2009@09@16 03@15@24') );
'EXPR$0'
'2009-09-16 03:15:24'
1 row selected
```

The examples above succeed because the delimiters are parallel and the values are acceptable per specifier.

In the following examples, note that omissions in the supplied string can cause the template value 'yyyy' to produce logical but unintended or unexpected results. The value given as the year in the resulting TIMESTAMP value derives directly from the first element in the supplied string.

```
 VALUES(CHAR_TO_TIMESTAMP('yyyy','09-16 03:15'));
'EXPR$0'
'0009-01-01 00:00:00'
1 row selected
VALUES(CHAR_TO_TIMESTAMP('yyyy','16 03:15'));
'EXPR$0'
'0016-01-01 00:00:00'
1 row selected
```

## TIMESTAMP to STRING
<a name="TOC6"></a>

```
 values(cast( TIMESTAMP '2007-02-19 21:25:35' AS VARCHAR(25)));
'EXPR$0'
'2007-02-19 21:25:35'
1 row selected
```

Note that CAST requires a TIMESTAMP literal to have literally the full format 'yyyy-MM-dd HH:mm:ss'. If any part of that full format is missing, the literal is rejected as illegal, as seen below:

```
 values( TIMESTAMP '2007-02-19 21:25');
Error: Illegal TIMESTAMP literal '2007-02-19 21:25':
                                  not in format 'yyyy-MM-dd HH:mm:ss' (state=,code=0)
 values( TIMESTAMP '2007-02-19 21:25:00');
'EXPR$0'
'2007-02-19 21:25:00'
1 row selected
```

Also, if an inadequate output specification is supplied, no rows are selected:

```
values(cast( TIMESTAMP '2007-02-19 21:25:35' AS VARCHAR(18)));
'EXPR$0'
No rows selected
```

(Because the timestamp literal requires 19 characters.)

These restrictions apply similarly to CASTing to TIME or DATE types. 

## STRING to TIME
<a name="TOC7"></a>

```
 values(cast(' 21:23:45.0' AS TIME));
'EXPR$0'
'21:23:45'
1 row selected
```

For more information, see Note A.

## STRING to DATE
<a name="TOC8"></a>

```
 values(cast('2007-02-19' AS DATE));
'EXPR$0'
'2007-02-19'
1 row selected
```

**Note A**  
Note that CAST for strings requires that the string operand for casting to TIME or DATE have the exact form required to represent a TIME or DATE, respectively.

As shown below, the cast fails if:
+ the string operand includes data extraneous to the targeted type, or
+ the INTERVAL operand ( 'day hours:minutes:seconds.milliseconds' ) does not include necessary data, or
+ the specified output field is too small to hold the conversion results.

```
 values(cast('2007-02-19 21:23:45.0' AS TIME));
'EXPR$0'
No rows selected
```

Fails because it includes date information not allowed as a TIME.

```
 values(cast('2007-02-19 21:23:45.0' AS DATE));
'EXPR$0'
No rows selected
```

Fails because it includes time information not allowed as a DATE.

```
 values(cast('2007-02-19 21' AS DATE));
'EXPR$0'
No rows selected
```

Fails because it includes time information not allowed as a DATE.

```
 values(cast('2009-02-28' AS DATE));
'EXPR$0'
'2009-02-28'
1 row selected
```

Succeeds because it includes a correct representation of a date string.

```
 values(CAST (cast('2007-02-19 21:23:45.0' AS TIMESTAMP) AS DATE));
'EXPR$0'
'2007-02-19'
1 row selected
```

Succeeds because it correctly converts string to TIMESTAMP before casting to DATE.



```
 values(cast('21:23' AS TIME));
'EXPR$0'
No rows selected
```

Fails because it lacks time information (seconds) required for a TIME.

(Specifying fractional seconds is allowed but not required.)



```
 values(cast('21:23:34:11' AS TIME));
'EXPR$0'
No rows selected
```

Fails because it includes incorrect representation of fractional seconds.

```
 values(cast('21:23:34.11' AS TIME));
'EXPR$0'
'21:23:34'
1 row selected
```

Succeeds because it includes correct representation of fractional seconds.

```
 values(cast('21:23:34' AS TIME));
'EXPR$0'
'21:23:34'
1 row selected
```

This example succeeds because it includes correct representation of seconds without fractions of a second.

## INTERVAL to exact numerics
<a name="TOC10"></a>

CAST for intervals requires that the INTERVAL operand have only one field in it, such as MINUTE, HOUR, or SECOND.

If the INTERVAL operand has more than one field, such as MINUTE TO SECOND, the cast fails, as shown below:

```
 values ( cast (INTERVAL '120' MINUTE(3) as decimal(4,2)));
+---------+
| EXPR$0  |
+---------+
+---------+
No rows selected

 values ( cast (INTERVAL '120' MINUTE(3) as decimal(4)));
+---------+
| EXPR$0  |
+---------+
| 120     |
+---------+
1 row selected

 values ( cast (INTERVAL '120' MINUTE(3) as decimal(3)));
+---------+
| EXPR$0  |
+---------+
| 120     |
+---------+
1 row selected

 values ( cast (INTERVAL '120' MINUTE(3) as decimal(2)));
+---------+
| EXPR$0  |
+---------+
+---------+
No rows selected

 values cast(interval '1.1' second(1,1) as decimal(2,1));
+---------+
| EXPR$0  |
+---------+
| 1.1     |
+---------+
1 row selected

 values cast(interval '1.1' second(1,1) as decimal(1,1));
+---------+
| EXPR$0  |
+---------+
+---------+
No rows selected
```

For year intervals, decimal fractions are disallowed in the interval literal itself:

```
values cast(interval '1.1' year (1,1) as decimal(2,1));
Error: org.eigenbase.sql.parser.SqlParseException: Encountered "," at line 1, column 35.
Was expecting:
    ")" ... (state=,code=0)
values cast(interval '1.1' year (1) as decimal(2,1));
Error: From line 1, column 13 to line 1, column 35:
              Illegal interval literal format '1.1' for INTERVAL YEAR(1) (state=,code=0)
values cast(interval '1.' year (1) as decimal(2,1));
Error: From line 1, column 13 to line 1, column 34:
              Illegal interval literal format '1.' for INTERVAL YEAR(1) (state=,code=0)
values cast(interval '1' year (1) as decimal(2,1));
+---------+
| EXPR$0  |
+---------+
| 1.0     |
+---------+
1 row selected
```

For additional examples, see SQL Operators: Further examples.

## Limitations
<a name="TOC11"></a>

Amazon Kinesis Data Analytics does not support directly casting numeric values to interval values. This is a departure from the SQL:2008 standard. The recommended way to convert a numeric value to an interval is to multiply the numeric value by a specific interval value. For example, to convert the integer time\_in\_millis to a day-time interval:

```
time_in_millis * INTERVAL '0 00:00:00.001' DAY TO SECOND
```

For example:

```
 values cast( 5000 * (INTERVAL '0 00:00:00.001' DAY TO SECOND) as varchar(11));
'EXPR$0'
'5000'
1 row selected
```

# Date and Time Functions
<a name="sql-reference-date-time-functions"></a>

The following built-in functions relate to dates and time. 

**Topics**
+ [Time Zones](#sql-reference-date-time-functions-time-zones)
+ [Datetime Conversion Functions](sql-reference-datetime-conversion-functions.md)
+ [Date, Timestamp, and Interval Operators](sql-reference-date-timestamp-interval.md)
+ [Date and Time Patterns](sql-reference-parse-timestamp-format.md)
+ [CURRENT\_DATE](sql-reference-current-date.md)
+ [CURRENT\_ROW\_TIMESTAMP](sql-reference-current-row-timestamp.md)
+ [CURRENT\_TIME](sql-reference-current-time.md)
+ [CURRENT\_TIMESTAMP](sql-reference-current-timestamp.md)
+ [EXTRACT](sql-reference-extract.md)
+ [LOCALTIME](sql-reference-localtime.md)
+ [LOCALTIMESTAMP](sql-reference-local-timestamp.md)
+ [TSDIFF](sql-reference-tsdiff.md)

Of these, the SQL extension CURRENT\_ROW\_TIMESTAMP is the most useful in a streaming context, because it gives you information about the times of streaming data as it emerges, not just when the query is run. This is a key difference between a streaming query and a traditional RDBMS query: streaming queries remain "open," producing more data, so the timestamp of when the query was run does not offer good information. 

LOCALTIMESTAMP, LOCALTIME, CURRENT\_DATE, and CURRENT\_TIMESTAMP all produce results that are set to values at the time the query first executes. Only CURRENT\_ROW\_TIMESTAMP generates a unique timestamp (date and time) for each row. 

A query run with LOCALTIMESTAMP (or CURRENT_TIMESTAMP or CURRENT_TIME) as one of the columns puts into all output rows the time the query is first run. If that column instead contains CURRENT_ROW_TIMESTAMP, each output row gets a newly calculated timestamp representing when that row was output.
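For example, the following sketch (using the sample stock stream from the Getting Started exercise, as in the examples later in this topic) stamps each output row with the time it was emitted:

```
SELECT STREAM CURRENT_ROW_TIMESTAMP AS row_time,
              ticker_symbol,
              price
FROM "SOURCE_SQL_STREAM_001";
```

Replacing CURRENT_ROW_TIMESTAMP with LOCALTIMESTAMP would instead give every output row the single time at which the query first ran.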

To return a part (such as the day of the month) from a datetime value, use [EXTRACT](sql-reference-extract.md).
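As a minimal sketch, the following call pulls the day of the month out of a timestamp literal; by standard EXTRACT semantics it should return 16:

```
values (EXTRACT(DAY FROM TIMESTAMP '2009-09-16 03:15:24'));
```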

## Time Zones
<a name="sql-reference-date-time-functions-time-zones"></a>

Amazon Kinesis Data Analytics runs in UTC. As a result, all time functions return time in UTC.

# Datetime Conversion Functions
<a name="sql-reference-datetime-conversion-functions"></a>

You specify date and time formats using pattern letters. Date and time pattern strings use unquoted letters from 'A' to 'Z' and from 'a' to 'z', with each letter representing a formatting element.

For more information, see [Class SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/index.html?java/text/SimpleDateFormat.html) on the Oracle website.

**Note**  
If you include other characters, they will be incorporated into the output string during formatting or compared to the input string during parsing.

The pattern letters in the following table are defined; all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved.


| Letter | Date or Time Component | Presentation | Examples | 
| --- | --- | --- | --- | 
|  y  |  Year  |  Year  |  2018; 18  | 
|  Y  |  Week year  |  Year  |  2009; 09  | 
|  M  |  Month in year  |  Month  |  July; Jul; 07  | 
|  w  |  Week in year  |  Number  |  27  | 
|  W  |  Week in month  |  Number  |  2  | 
|  D  |  Day in year  |  Number  |  321  | 
|  d  |  Day in month  |  Number  |  10  | 
|  F  |  Day of week in month  |  Number  |  2  | 
|  E  |  Day name in week  |  Text  |  Tuesday; Tue  | 
|  u  |  Day number of week (1 = Monday, ..., 7 = Sunday)  |  Number  |  1  | 
|  a  |  Am/pm marker  |  Text  |  PM  | 
|  H  |  Hour in day (0-23)  |  Number  |  0  | 
|  k  |  Hour in day (1-24)  |  Number  |  24  | 
|  K  |  Hour in am/pm (0-11)  |  Number  |  0  | 
|  h  |  Hour in am/pm (1-12)  |  Number  |  12  | 
|  m  |  Minute in hour  |  Number  |  30  | 
|  s  |  Second in minute  |  Number  |  55  | 
|  S  |  Millisecond  |  Number  |  978  | 
|  z  |  Time zone  |  General time zone  |  Pacific Standard Time; PST; GMT-08:00  | 
|  Z  |  Time zone  |  RFC 822 time zone  |  -0800  | 
|  X  |  Time zone  |  ISO 8601 time zone  |  -08; -0800; -08:00  | 



You determine the exact presentation by repeating pattern letters (for example, yyyy).

**Text**  
 If the number of repeated pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. For parsing, both forms are accepted, independent of the number of pattern letters.

**Number**  
For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.

**Year**  
If the formatter's Calendar is the Gregorian calendar, the following rules are applied.
+ For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a number.
+ For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.

For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 2018, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits, as defined by Character.isDigit(char), will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.

Otherwise, calendar system specific forms are applied. For both formatting and parsing, if the number of pattern letters is 4 or more, a calendar specific long form is used. Otherwise, a calendar specific short or abbreviated form is used.
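To see the two-digit-year rule in action with the streaming SQL function CHAR_TO_TIMESTAMP (described later in this section), compare a 'yy' template against a 'yyyy' template on the same input. Per the rules above, the first call resolves the year relative to the default century, while the second interprets '12' literally as the year 12 A.D.:

```
values cast(CHAR_TO_TIMESTAMP('MM/dd/yy','01/11/12') as varchar(19));
values cast(CHAR_TO_TIMESTAMP('MM/dd/yyyy','01/11/12') as varchar(19));
```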

# CHAR_TO_TIMESTAMP (Sys)
<a name="sql-reference-char-to-timestamp"></a>

The CHAR_TO_TIMESTAMP function is one of the most frequently used system functions, because it lets you create a timestamp from any correctly formatted input string. Using this function, you can specify which parts of the timestamp string you want to use in subsequent processing and create a TIMESTAMP value containing only those parts. To do so, you specify a template that identifies the parts of the timestamp you want. For example, to use only year and month, you would specify 'yyyy-MM'. 

The input date-time string can contain any parts of a full timestamp ('yyyy-MM-dd hh:mm:ss'). If all these elements are present in your input string, and 'yyyy-MM-dd hh:mm:ss' is the template you supply, then the input-string elements are interpreted in that order as year, month, day, hour, minute, and second, as in '2009-09-16 03:15:24'. The yyyy cannot be uppercase; the hh can be uppercase (HH) to indicate a 24-hour clock. 

For the full range of valid specifiers, see [Class SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/index.html?java/text/SimpleDateFormat.html) on the Oracle website.

CHAR_TO_TIMESTAMP uses the template you specify as a parameter in the function call. The template causes the TIMESTAMP result to use only the parts of the input date-time value that you specified in the template. Those fields in the resulting TIMESTAMP contain the corresponding data taken from your input date-time string. Fields not specified in your template use default values (see below). The format of the template used by CHAR_TO_TIMESTAMP is defined by [Class SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/index.html?java/text/SimpleDateFormat.html) on the Oracle website. For more information, see [Date and Time Patterns](sql-reference-parse-timestamp-format.md).

The function-call syntax is as follows:

```
CHAR_TO_TIMESTAMP('<format_string>','<input_date_time_string>')
```

Where <format_string> is the template you specify for the parts of <input_date_time_string> you want, and <input_date_time_string> is the original string being converted to a TIMESTAMP result.

Note that each string must be enclosed in single quotes. Each element of the <input_date_time_string> must be in the range for its corresponding element in the template; otherwise, no result is returned.

For example, the input-string-element whose position corresponds with MM must be an integer from 1 to 12, because anything else does not represent a valid month. Similarly, the input-string-element whose position corresponds with dd must be an integer from 1 to 31, because anything else does not represent a valid day. (However, if MM is 2, dd cannot be 30 or 31, because February never has such days.)

For hours, minutes, or seconds, the default starting value is zero, so when those specifiers are omitted from the template, zeroes are substituted. For months or days, the default starting value substituted for the omitted parts is 01.

For example, using '2009-09-16 03:15:24' as your input string, you can obtain a TIMESTAMP containing only the date, with zeros for the other fields such as hours, minutes, or seconds.

```
CHAR_TO_TIMESTAMP('yyyy-MM-dd','2009-09-16 03:15:24')
```

The result is the TIMESTAMP 2009-09-16 00:00:00.

If the call had kept hours and minutes in the template while omitting months, days, and seconds, as in the following call:

```
CHAR_TO_TIMESTAMP('yyyy-hh-mm','2009-09-16 03:15:24')
```

The resulting TIMESTAMP would be 2009-01-01 03:15:00.

[Template Strings to Create Specific Output Timestamps](sql-reference-template-strings-create-output-timestamps.md) shows further illustrative examples of templates and input strings used to create the indicated output TIMESTAMPs.

**Note**  
Input string MUST use the form 'yyyy-MM-dd hh:mm:ss' or a subset or reordering thereof. As a result, using an input string like 'Wednesday, 16 September 2009 03:15:24' will NOT work, meaning that no output will result.

## About Delimiters and Values
<a name="w2aac22c17c19c29c39"></a>

Delimiters in the template must match those in the input string and values in the input string must be acceptable for the template specifiers to which they correspond.

As a general convention, a colon is used to separate hours from minutes, and minutes from seconds. Similarly, the general convention is to use a dash or slash to separate years from months and months from days. 

For example, the following template has values that line up correctly with the input string.

```
values (CHAR_TO_TIMESTAMP('MM/dd/yy hh:mm:ss','09/16/11 03:15:24') );
'EXPR$0'
'2011-09-16 03:15:24'
1 row selected
```

If values in the input string are not acceptable for the template specifiers to which they correspond, the result fails, as in the following example.

```
values (CHAR_TO_TIMESTAMP('MM/dd/yy hh:mm:ss','2009/09/16 03:15:24') );
'EXPR$0'
No rows selected
```

This example returns no rows because 2009 is not an acceptable value for months, which is the first specifier (MM) in the template.

Omissions in the supplied string can cause the template value 'yyyy' to produce logical but unintended or unexpected results. The following examples each return an erroneous year, but one that derives directly from the first element in the supplied string.

```
 VALUES(CHAR_TO_TIMESTAMP('yyyy','09-16 03:15'));
'EXPR$0'
'0009-01-01 00:00:00'
1 row selected
VALUES(CHAR_TO_TIMESTAMP('yyyy','16 03:15'));
'EXPR$0'
'0016-01-01 00:00:00'
1 row selected
```

## Examples Using Templates to Create TIMESTAMPS
<a name="w2aac22c17c19c29c41"></a>

The order of the template must match the input string. That means that you cannot specify "hh" after "yyyy" and expect the method to find the hour automatically. For example, the following template specifies years first, then hours, then minutes, and returns an erroneous result.

```
 values (CHAR_TO_TIMESTAMP('yyyy-hh-mm','2009-09-16 03:15:24'));
'EXPR$0'
'2009-01-01 09:16:00'
1 row selected
```

Since the specifiers for months and days are not present in the template, their values in the input string were ignored, with 01 substituted for both values in the output TIMESTAMP. The template specified hours and minutes as the second and third input values, so 09 became the hours and 16 became the minutes. No specifier was present for seconds, so 00 was used.

The years specifier can appear alone, or it can be combined with one of the hours, minutes, or seconds specifiers, as long as the delimiter after the years specifier in the template matches the delimiter that ends the years element in the input string.

```
values (CHAR_TO_TIMESTAMP('yyyy','2009-09-16 03:15:24') );
'EXPR$0'
'2009-01-01 00:00:00'
1 row selected
```

 In contrast, the template below fails because it has a space-as-delimiter before the "hh" rather than the dash delimiter used in the input string's date specification. 

```
  values (CHAR_TO_TIMESTAMP('yyyy hh','2009-09-16 03:15:24') );
  'EXPR$0'
  No rows selected
```

The four templates below work because they use the same delimiter to separate the years specifier from the next specifier as is used in the input string's date specification (dash in the first case, space in the second, slash in the third, and dash in the fourth).

```
values (CHAR_TO_TIMESTAMP('yyyy-hh','2009-09-16 03:15:24') );
'EXPR$0'
'2009-01-01 09:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy hh','2009 09 16 03:15:24') );
'EXPR$0'
'2009-01-01 09:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy/hh','2009/09/16 03:15:24') );
'EXPR$0'
'2009-01-01 09:00:00'
1 row selected
values (CHAR_TO_TIMESTAMP('yyyy-mm','2009-09-16 03:15:24') );
'EXPR$0'
'2009-01-01 00:09:00'
1 row selected
```

However, if the template specifies months (MM), it cannot then specify hours, minutes, or seconds unless days are also specified.
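A sketch of this rule follows. The first template supplies days between the months and hours specifiers, so it lines up with the input string; the second jumps from months directly to hours without days, so it is expected to return no rows:

```
-- Months, days, then hours: valid ordering
values (CHAR_TO_TIMESTAMP('MM-dd hh','09-16 03:15:24'));
-- Months directly followed by hours, without days: expected to fail
values (CHAR_TO_TIMESTAMP('MM-hh','09-16 03:15:24'));
```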

# Template Strings to Create Specific Output Timestamps
<a name="sql-reference-template-strings-create-output-timestamps"></a>




| Template | Input String | Output TIMESTAMP | Notes | 
| --- | --- | --- | --- | 
|  'yyyy-MM-dd hh:mm:ss'  |  '2009-09-16 03:15:24'  |  '2009-09-16 03:15:24'  |    | 
|  'yyyy-mm'  |  '2011-02-08 07:23:19'  |  '2011-01-01 00:02:00'  |  The template above specifies only year first and minutes second, so the second element in the input string ("02") is used as minutes. Default values are used for Month and Day ("01") and for hours and seconds ("00").  | 
|  'MMM dd, yyyy'  |  'March 7, 2010'  |  '2010-03-07 00:00:00'  | MMM in the template above matches "March"; the template's 'comma space' matches the input string. If the template lacks the comma, so must the input string, or there is no output; If the input string lacks the comma, so must the template. | 
|  'MMM dd,'  |  'March 7, 2010'  |  '1970-03-07 00:00:00'  | Note that the template above doesn't use a year specifier, causing the output TIMESTAMP to use the earliest year in this epoch, 1970. | 
|  'MMM dd,y'  |  'March 7, 2010'  |  '2010-03-07 00:00:00'  | Using the template above, if the input string were 'March 7, 10', the output TIMESTAMP would be '0010-03-07 00:00:00'. | 
|  'M-d'  |  '2-8'  |  '1970-02-08 00:00:00'  | Absent a yyyy specifier in the template, as above, the earliest year in this epoch (1970) is used. An input string of '2-8-2011' would give the same result; using '2011-2-8' would give no result because 2011 is not a valid month. | 
|  'MM-dd-yyyy'  |  '06-23-2011 10:11:12'  |  '2011-06-23 00:00:00'  | Dashes as delimiters (as above) are fine, if template and input both use them in the same positions. Since the template omits hours, minutes, and seconds, zeroes are used in the output TIMESTAMP. | 
|  `'dd-MM-yy hh:mm:ss'`  |  `'23-06-11 10:11:12'`  |  `'2011-06-23 10:11:12'`  | You can have the specifiers in any order as long as that order matches the meaning of the input string you supply. The template and input string of the next example below have the same meaning (and the same output TIMESTAMP) as this example, but they specify months before days and seconds before hours. | 
|  `'MM-dd-yy ss:hh:mm'`  |  `'06-23-11 12:10:11'`  |  `'2011-06-23 10:11:12'`  | In the template used above, the order of the month and day specifiers is reversed from the example just above, and the specifier for seconds is before hours instead of after minutes; but because the input string also puts months before days and seconds before hours, the meaning (and the output TIMESTAMP) is the same as the example ABOVE. | 
|  `'yy-dd-MM ss:hh:mm'`  |  `'06-23-11 12:10:11'`  |  `'2006-11-23 10:11:12'`  | The template used above reverses (compared to the prior example above) the years and months specifiers, while the input string remains the same. In this case, the output TIMESTAMP uses the first element of the input string as the years, the second as the days, and the third as the months. | 
|  `'dd-MM-yy hh:mm'`  |  `'23-06-11 10:11:12'`  |  `'2011-06-23 10:11:00'`  | With seconds omitted in the template, as above, the output TIMESTAMP uses 00 seconds. Any number of y specifiers produces the same result; but if the input string inadvertently uses a 1 instead of 11 for the year, as in '23-06-1 10:11:12', then the output TIMESTAMP becomes '0001-06-23 10:11:00'. | 
|  `'MM/dd/yy hh:mm:ss'`  |  `'12/19/11 10:11:12'`  `'12/19/11 12:11:12'`  |  `'2011-12-19 10:11:12'`  `'2011-12-19 00:11:12'`  | Slashes as delimiters are fine, if template and input both use them in the same positions, as above. Using specifier hh, input times of 12:11:10 and 00:11:10 have the same meaning, as a time in the morning. | 
|  `'MM/dd/yy HH:mm:ss'`  |  `'12/19/11 12:59:59'` `'12/19/11 21:08:07'`  |  `'2011-12-19 12:59:59'` `'2011-12-19 21:08:07'`   |  The input-string values `'2011-12-19 00:11:12'` or `'2011-12-19 12:11:12'` would fail with this template because `'2011'` is not a month, as required by the template string `'MM/dd/yy HH:mm:ss'`. However, changing the template gives useful output: <pre>values(cast(CHAR_TO_TIMESTAMP('y/MM/dd HH:mm:ss', '2011/12/19 00:11:12') as<br />varchar(19)));<br />'EXPR$0'<br />'2011-12-19 00:11:12'<br />1 row selected</pre> `'12/19/11 00:11:12'` would fail with the above template (`'y/MM/dd'`), since 19 is not a valid month; supplying `'12/11/19 00:11:12'` works. `'2011-12-19 12:11:12'` would fail as input because dashes don't match the slashes in the template; `'2011/12/19 12:11:12'` works. Note that for times after 12 noon (that is, for afternoon and evening times), the hours specifier must be HH instead of hh, and the input string must specify the afternoon or evening hour in 24-hour clock time, hours running from 00 to 23. Using specifier HH, input times of 12:11:10 and 00:11:10 have different meanings, the first as a time in the afternoon and the second as a time in the morning. Using the specifier hh, the times from 12:00 through 11:59:59 are morning times: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-template-strings-create-output-timestamps.html) | 



# CHAR_TO_DATE
<a name="sql-reference-char-to-date"></a>

Converts a string to a date, according to the specified format string.

```
CHAR_TO_DATE(format,dateString);
```
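For example, a minimal sketch using the date format from earlier examples:

```
values CHAR_TO_DATE('yyyy-MM-dd','2009-09-16');
```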

# CHAR_TO_TIME
<a name="sql-reference-char-to-time"></a>

Converts a string to a time, according to the specified format string.

```
CHAR_TO_TIME(format,dateString);
```
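For example, a minimal sketch using the conventional colon-delimited time format:

```
values CHAR_TO_TIME('hh:mm:ss','03:15:24');
```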

# DATE_TO_CHAR
<a name="sql-reference-date-to-char"></a>

The DATE_TO_CHAR function converts a date to a string.

```
DATE_TO_CHAR(format,d);
```

Where d is a date that will be converted to a string.
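For example, a minimal sketch that formats a DATE literal back into a string:

```
values DATE_TO_CHAR('yyyy-MM-dd', DATE '2009-09-16');
```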

# TIME_TO_CHAR
<a name="sql-reference-time-to-char"></a>

Uses a format string to format a time. Returns the formatted time or portion of a time as a string.

```
TIME_TO_CHAR(format,time);
```
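For example, a minimal sketch that formats only the hours and minutes of a TIME literal:

```
values TIME_TO_CHAR('hh:mm', TIME '03:15:24');
```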

# TIMESTAMP_TO_CHAR
<a name="sql-reference-timestamp-to-char"></a>

Uses a format string to format a timestamp as char. Returns the timestamp as a string.

```
TIMESTAMP_TO_CHAR(format,ts);
```

Where ts is a timestamp.
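For example, a minimal sketch that formats a TIMESTAMP literal as a full date-time string:

```
values TIMESTAMP_TO_CHAR('yyyy-MM-dd hh:mm:ss', TIMESTAMP '2009-09-16 03:15:24');
```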

**Note**  
If the input is `null`, the output will be the string "`null`".

# TO_TIMESTAMP
<a name="sql-reference-to-timestamp"></a>

Converts a Unix timestamp to a SQL timestamp in 'YYYY-MM-DD HH:MM:SS' format. 

## Syntax
<a name="sql-reference-to-timestamp-syntax"></a>

```
TO_TIMESTAMP(unixEpoch)
```

## Parameters
<a name="sql-reference-to-timestamp-parameters"></a>

*unixEpoch*

A Unix timestamp in the format milliseconds since '1970-01-01 00:00:00' UTC, expressed as a BIGINT.

## Example
<a name="sql-reference-to-timestamp-examples"></a>

### Example Dataset
<a name="w2aac22c17c19c41b9b2"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

**Note**  
The sample dataset has been modified to include a Unix timestamp value (CHANGE_TIME).

To run each example, you need an Amazon Kinesis Analytics application that has the input stream for the sample stock ticker. To learn how to create an Analytics application and configure the input stream for the sample stock ticker, see [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
change_time     BIGINT,    --The UNIX timestamp value
price           REAL)
```

### Example 1: Convert a Unix Timestamp to a SQL Timestamp
<a name="w2aac22c17c19c41b9b4"></a>

In this example, the `change_time` value in the source stream is converted to a SQL TIMESTAMP value in the in-application stream.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    sector VARCHAR(64), 
    change REAL, 
    change_time TIMESTAMP, 
    price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM   TICKER_SYMBOL,
                SECTOR,
                CHANGE,
                TO_TIMESTAMP(CHANGE_TIME), 
                PRICE

FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for time, ticker symbol, sector, change, and price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-to-timestamp.png)


## Notes
<a name="sql-reference-to-timestamp-notes"></a>

TO_TIMESTAMP is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics streaming SQL extension.

# UNIX_TIMESTAMP
<a name="sql-reference-unix-timestamp"></a>

Converts a SQL timestamp to a Unix timestamp that is expressed in milliseconds since '1970-01-01 00:00:00' UTC and that is a BIGINT.

## Syntax
<a name="sql-reference-unix-timestamp-syntax"></a>

```
UNIX_TIMESTAMP(timeStampExpr)
```

## Parameters
<a name="sql-reference-unix-timestamp-parameters"></a>

*timeStampExpr*

A SQL TIMESTAMP value.

## Example
<a name="sql-reference-unix-timestamp-examples"></a>

### Example Dataset
<a name="w2aac22c17c19c43b9b2"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

**Note**  
The sample dataset has been modified to include a Timestamp value (CHANGE_TIME).

To run each example, you need an Amazon Kinesis Analytics application that has the input stream for the sample stock ticker. To learn how to create an Analytics application and configure the input stream for the sample stock ticker, see [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
change_time     TIMESTAMP,    --The timestamp value to convert
price           REAL)
```

### Example 1: Convert a Timestamp to a UNIX Timestamp
<a name="w2aac22c17c19c43b9b4"></a>

In this example, the `change_time` value in the source stream is converted to a BIGINT Unix timestamp value in the in-application stream.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        ticker_symbol VARCHAR(4), 
        sector VARCHAR(16), 
        change REAL,
        change_time BIGINT, 
        price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM   TICKER_SYMBOL,
                SECTOR,
                CHANGE,
                UNIX_TIMESTAMP(CHANGE_TIME),
                PRICE
FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for time, ticker symbol, sector, change, and price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-unix-timestamp.png)


## Notes
<a name="sql-reference-unix-timestamp-notes"></a>

UNIX_TIMESTAMP is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics streaming SQL extension.

# Date, Timestamp, and Interval Operators
<a name="sql-reference-date-timestamp-interval"></a>

The arithmetic operators +, -, *, and / are binary operators.


| Operator | Description | Notes | 
| --- | --- | --- | 
|  +  |  Addition  |  interval + interval = interval; interval + datetime = datetime; datetime + interval = datetime  | 
|  -  |  Subtraction  |  interval - interval = interval; datetime - interval = datetime; ( <datetime> - <datetime> ) <interval qualifier> = interval  | 
|  *  |  Multiplication  |  interval * numeric = interval; numeric * interval = interval  | 
|  /  |  Division  |  interval / numeric = interval  | 

## Examples
<a name="date-timestamp-interval-examples"></a>


| Example | Operation | Result | 
| --- | --- | --- | 
|  1  |  INTERVAL '1' DAY + INTERVAL '3' DAY  |  INTERVAL '4' DAY  | 
|  2  |  INTERVAL '1' DAY + INTERVAL '3 4' DAY TO HOUR  |  INTERVAL '+4 04' DAY TO HOUR  | 
|  3  |  INTERVAL '1' DAY - INTERVAL '3 4' DAY TO HOUR  |  INTERVAL '-2 04' DAY TO HOUR  | 
|  4  |  INTERVAL '1' YEAR + INTERVAL '3-4' YEAR TO MONTH  |  INTERVAL '+4-04' YEAR TO MONTH  | 
|  5  |  2 * INTERVAL '3 4' DAY TO HOUR  |  INTERVAL '6 8' DAY TO HOUR  | 
|  6  |  INTERVAL '3 4' DAY TO HOUR / 2  |  INTERVAL '1 14' DAY TO HOUR  | 

In example 3, '3 4' DAY TO HOUR means 3 days and 4 hours, so the result in that row means 24 hours minus 76 hours, which is minus 52 hours, or negative 2 days and 4 hours.

Example 4 uses TO MONTH rather than TO HOUR, so the INTERVAL specified as '3-4' means 3 years and 4 months, or 40 months.

In example 6, the "/2" applies to the INTERVAL '3 4', which is 76 hours, half of which is 38, or 1 day and 14 hours.
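You can check example 6 directly with the "values cast" idiom used elsewhere in this reference (casting to varchar so the interval result can be displayed):

```
values cast((INTERVAL '3 4' DAY TO HOUR) / 2 as varchar(11));
```

Per the table above, the displayed result represents 1 day and 14 hours.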

### Further Examples of Interval Operations
<a name="w2aac22c17c21b7c11"></a>

Streaming SQL also supports subtracting two datetimes, giving an interval. You specify what kind of interval you want for the result, as shown following:

```
(<datetime> - <datetime>) <interval qualifier>
```



The following examples show operations that can be useful in Amazon Kinesis Data Analytics applications.

**Example 1 – Time Difference (as minutes to the nearest second or as seconds)**  

```
 values cast ((time  '12:03:34' - time '11:57:23') minute to second as varchar(8));
+---------+
 EXPR$0  
+---------+
 +6:11   
+---------+
1 row selected
............... 6 minutes, 11 seconds
or
 values cast ((time  '12:03:34' - time '11:57:23') second as varchar(8));
+---------+
 EXPR$0  
+---------+
 +371    
+---------+
1 row selected
```

**Example 2 – Time Difference (as minutes only)**  

```
values cast ((time  '12:03:34' - time '11:57:23') minute as varchar(8));
+---------+
 EXPR$0  
+---------+
 +6      
+---------+
1 row selected
............... 6 minutes; seconds ignored.
values cast ((time  '12:03:23' - time '11:57:23') minute as varchar(8));
+---------+
 EXPR$0  
+---------+
 +6      
+---------+
1 row selected
............... 6 minutes
```

**Example 3 – Time-to-Timestamp Difference (as days to the nearest second) Invalid**  

```
values cast ((time '12:03:34'-timestamp '2004-04-29 11:57:23') day to second as varchar(8));
Error: From line 1, column 14 to line 1, column 79: Parameters must be of the same type
```

**Example 4 – Timestamp difference (as days to the nearest second)**  

```
values cast ((timestamp  '2004-05-01 12:03:34' - timestamp '2004-04-29 11:57:23') day to
                second as varchar(8));
+-----------+
  EXPR$0   
+-----------+
 +2 00:06  
+-----------+
1 row selected
............... 2 days, 6 minutes
............... Although "second" was specified above, the varchar(8) happens to allow
only room enough to show only the minutes, not the seconds.
The example below expands to varchar(11), showing the full result:
values cast ((timestamp  '2004-05-01 12:03:34' - timestamp '2004-04-29 11:57:23') day to
                second as varchar(11));
+--------------+
    EXPR$0    
+--------------+
 +2 00:06:11  
+--------------+
1 row selected
............... 2 days, 6 minutes, 11 seconds
```

**Example 5 – Timestamp Difference (as days to the nearest second)**  

```
values cast ((timestamp  '2004-05-01 1:03:34' - timestamp '2004-04-29 11:57:23') day to
                second as varchar(11));
+--------------+
    EXPR$0    
+--------------+
 +1 13:06:11  
+--------------+
1 row selected
............... 1 day, 13 hours, 6 minutes, 11 seconds
values cast ((timestamp  '2004-05-01 13:03:34' - timestamp '2004-04-29 11:57:23') day to
                second as varchar(11));
+--------------+
    EXPR$0    
+--------------+
 +2 01:06:11  
+--------------+
1 row selected
............... 2 days, 1 hour, 6 minutes, 11 seconds
```

**Example 6 – Timestamp Difference (as days)**  

```
values cast ((timestamp  '2004-05-01 12:03:34' - timestamp '2004-04-29 11:57:23') day
                as varchar(8));
+---------+
 EXPR$0  
+---------+
 +2      
+---------+
1 row selected
............... 2 days
```

**Example 7 – Time Difference (as days)**  

```
values cast ((date '2004-12-02 ' - date '2003-12-01 ') day  as varchar(8));
Error: Illegal DATE literal '2004-12-02 ': not in format 'yyyy-MM-dd'
.............. Both date literals end with a space;  disallowed.
values cast ((date '2004-12-02' - date '2003-12-01 ') day  as varchar(8));
Error: Illegal DATE literal '2003-12-01 ': not in format 'yyyy-MM-dd'
.............. Second date literal still ends with a space;  disallowed.
values cast ((date '2004-12-02' - date '2003-12-01') day  as varchar(8));
+---------+
 EXPR$0  
+---------+
 +367    
+---------+
1 row selected
............... 367 days
```

**Example 8 – Not Supported (Simple Difference of Dates)**  
If you don't specify "day" as the intended unit, as shown following, the subtraction is not supported.  

```
     values cast ((date '2004-12-02' - date '2003-12-01') as varchar(8));
     Error: From line 1, column 15 to line 1, column 51:
            Cannot apply '-' to arguments of type '<DATE> - <DATE>'.
     Supported form(s): '<NUMERIC> - <NUMERIC>'
                        '<DATETIME_INTERVAL> - <DATETIME_INTERVAL>'
                        '<DATETIME> - <DATETIME_INTERVAL>'
```

### Why Use "as varchar" in Conversion Examples?
<a name="w2aac22c17c21b7c13"></a>

The reason for using the "values cast (<expression> AS varchar(N))" syntax in the preceding examples is that while the SQLline client used here (with Amazon Kinesis Data Analytics running) does return an interval, JDBC does not support returning that result in a displayable form. The "values cast" syntax makes the result visible.

If you close Amazon Kinesis Data Analytics (with a !kill command), or if you don't start it before running SQLline, you can run sqllineEngine (rather than sqllineClient) from the bin subdirectory of your Amazon Kinesis Data Analytics home, which can show your results without the Amazon Kinesis Data Analytics application or JDBC.

### Rules for Specifying Intervals
<a name="TOC14"></a>

A day-time interval literal is a string that denotes a single interval value, for example INTERVAL '10' SECOND. Note that it has two parts: the value (which must always be in single quotes) and the qualifier (here, SECOND), which gives the units for the value.

The qualifier takes the following form:

```
{ DAY | HOUR | MINUTE | SECOND } [ TO { HOUR | MINUTE | SECOND } ]
```

**Note**  
YEAR TO MONTH intervals require a dash separating the values, whereas DAY TO HOUR intervals use a space to separate the values, as seen in examples 2, 3, 5, and 6 earlier in this topic.

In addition, the leading term must be of greater significance than the optional trailing term, which means you can specify only the following:

```
  DAY
  HOUR
  MINUTE
  SECOND
  DAY TO HOUR
  DAY TO MINUTE
  DAY TO SECOND
  HOUR TO MINUTE
  HOUR TO SECOND
  MINUTE TO SECOND
```

The easiest way to understand these may be to translate X TO Y as "Xs to the nearest Y". Hence, DAY TO HOUR is "days to the nearest hour".

When DAY, HOUR, or MINUTE is the leading term, you can specify a precision (for example, DAY(3) TO HOUR), indicating the number of digits the associated field in the value can have. The maximum precision is 10, and the default is 2. You can't specify precision for HOUR or MINUTE in the trailing term; they are always of precision 2. So, for example, HOUR(3) TO MINUTE is legal, but HOUR TO MINUTE(3) is not.

SECOND can also take a precision, but the way it is specified differs depending on whether it is the leading or trailing field.
+ If SECOND is the leading field, you can specify the digits before and after the decimal point. For example, SECOND(3,3) would allow you to specify up to 999.999 seconds. The default is (2,3), which is actually a deviation from the SQL:2008 spec (it should be (2,6), but we only have millisecond precision).
+ If SECOND is the trailing field, you can only specify precision for the fractional seconds, that is, the part shown after the seconds' decimal point below. For example, SECOND(3) would indicate milliseconds. The default is 3 digits after the decimal point, but as above this is a deviation from the standard of 6.

As for the value, it takes the general form of:

```
  [+-]'[+-]DD HH:MM:SS.SSS'
```

Where DD are digits indicating days, HH hours, MM minutes, and SS.SSS is seconds (adjust the number of digits appropriately if precision is explicitly specified).

Not all values have to include all fields: you can trim from the front or the back, but not from the middle. So you could write 'DD HH' or 'MM:SS.SSS', but not 'DD MM'.

However you write it, though, the value must match the qualifier, as shown following:

```
INTERVAL '25 3' DAY TO HOUR       --> legal
INTERVAL '3:45:04.0' DAY TO HOUR  --> illegal
```

As stated in the SQL spec, if the precision is not explicitly specified, it is implied to be 2. Thus:
+ INTERVAL '120' MINUTE is an illegal interval. The legal form for the desired interval is INTERVAL '120' MINUTE(3)

  and
+ INTERVAL '120' SECOND is not legal. The legal form for the desired interval is INTERVAL '120' SECOND(3).

  ```
     values INTERVAL '120' MINUTE(2);
     Error: From line 1, column 8 to line 1, column 31:
                         Interval field value 120 exceeds precision of MINUTE(2) field
     values INTERVAL '120' MINUTE(3);
     Conversion not supported
  ```

  The "Conversion not supported" message for MINUTE(3) reflects the JDBC display limitation described earlier in this topic, not an invalid interval; cast the value to varchar to display it.
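
The digit-count rule for leading-field precision can be sketched in Python (an illustration of the check, not the SQL engine):

```python
def fits_precision(value_field: str, precision: int = 2) -> bool:
    """Check whether a leading interval field fits the declared precision.

    The default leading-field precision is 2, so '120' (three digits)
    needs an explicit precision of at least 3, e.g. MINUTE(3).
    """
    return len(value_field.lstrip("+-")) <= precision

# INTERVAL '120' MINUTE uses the default precision of 2, so it is rejected:
print(fits_precision("120"))      # False
# INTERVAL '120' MINUTE(3) declares three digits, so it is accepted:
print(fits_precision("120", 3))   # True
```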

Also, if HOUR, MINUTE, or SECOND are not the leading field, they must fall in the following ranges (taken from Table 6 in topic 4.6.3 of the SQL:2008 foundation spec), as shown following:

```
  HOUR: 0-23
  MINUTE: 0-59
  SECOND: 0-59.999
```

Year-month intervals are similar, except that the qualifiers are as shown following:

```
  YEAR
  MONTH
  YEAR TO MONTH
```

Precision can be specified just as with DAY and HOUR, and the max of 10 and default of 2 is the same.

The value format for year-month is: 'YY-MM'. If MONTH is the trailing field, it must fall in the range 0-11.

The SQL:2008 grammar for interval qualifiers is as follows:

```
<interval qualifier> := <start field> TO <end field> | <single datetime field>

<start field> := <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ]

<end field> := <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ]

<single datetime field> := <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ]
   | SECOND [ <left paren> <interval leading field precision>
            [ <comma> <interval fractional seconds precision> ] <right paren> ]

<primary datetime field> := <non-second primary datetime field> | SECOND

<non-second primary datetime field> := YEAR | MONTH | DAY | HOUR | MINUTE

<interval fractional seconds precision> := <unsigned integer>

<interval leading field precision> := <unsigned integer>
```

# Date and Time Patterns
<a name="sql-reference-parse-timestamp-format"></a>

Date and time formats are specified by date and time pattern strings. In these pattern strings, unquoted letters from A to Z and from a to z represent components of a date or time value. If a letter or text string is enclosed within a pair of single quotes, that letter or text is not interpreted but used as is, as are all other characters in the pattern string. During formatting, such text is copied as is to the output string; during parsing, it is matched against the input string. "''" represents a single quote.

The following pattern letters are defined for the indicated Date or Time Component. All other characters from 'A' to 'Z' and from 'a' to 'z' are reserved. For an alphabetic ordering of the pattern letters, see [Date and Time Pattern Letters in Alphabetic Order](#PATTERNSINALFAORDR).


| Date or Time Component | Pattern Letter | Presentation as text or number | Examples | 
| --- | --- | --- | --- | 
|  Era designator  |  G  |  [Text](#sql-reference-parse-timestamp-format-text)  |  AD  | 
|  Year  |  y  |  Year  |  1996; 96  | 
|  Month in year  |  M  |  Month  |  July; Jul; 07  | 
|  Week in year  |  w  |  Number  |  27  | 
|  Week in month  |  W  |  Number  |  2  | 
|  Day in year  |  D  |  Number  |  189  | 
|  Day in month  |  d  |  Number  |  10  | 
|  Day of week in month  |  F  |  Number  |  2  | 
|  Day in week  |  E  |  [Text](#sql-reference-parse-timestamp-format-text)  |  EE=Tu; EEE=Tue; EEEE=Tuesday  | 
|  Am/pm marker  |  a  |  [Text](#sql-reference-parse-timestamp-format-text)  |  PM  | 
|  Hour in day (0-23)  |  H  |  Number  |  0  | 
|  Hour in day (1-24)  |  k  |  Number  |  24  | 
|  Hour in am/pm (0-11)  |  K  |  Number  |  0  | 
|  Hour in am/pm (1-12)  |  h  |  Number  |  12  | 
|  Minute in hour  |  m  |  Number  |  30  | 
|  Second in minute  |  s  |  Number  |  55  | 
|  Millisecond  |  S  |  Number  |  978  | 
|  Time zone  |  z  |  General  |  Pacific Standard Time; PST; GMT-08:00  | 
|  Time zone  |  Z  |  RFC  |  -0800  | 

Pattern letters are usually repeated, as their number determines the exact presentation:

## Text
<a name="sql-reference-parse-timestamp-format-text"></a>

For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. For parsing, both forms are accepted, independent of the number of pattern letters.

## Number
<a name="w2aac22c17c23c13"></a>

For formatting, the number of pattern letters is the minimum number of digits; shorter numbers are zero-padded to this width. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
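
The minimum-digits rule behaves like zero-padded minimum-width formatting, as this Python illustration shows:

```python
# Pattern letter count sets the minimum number of digits; shorter
# values are zero-padded. For a minute value of 5:
minute = 5
print(f"{minute:01d}")  # one letter  ("m")   -> "5"
print(f"{minute:02d}")  # two letters ("mm")  -> "05"
print(f"{minute:03d}")  # three letters       -> "005"
```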

## General time zone
<a name="w2aac22c17c23c15"></a>

Time zones are interpreted as text if they have names. For time zones representing a GMT offset value, the following syntax is used:

```
GMTOffsetTimeZone:
        GMT Sign Hours : Minutes
Sign: one of
        + -
Hours:
        Digit
        Digit Digit
Minutes:
        Digit Digit
Digit: one of
        0 1 2 3 4 5 6 7 8 9
```

Hours must be between 0 and 23, and Minutes must be between 00 and 59. The format is locale independent and digits must be taken from the Basic Latin block of the Unicode standard.

For parsing, RFC 822 time zones are also accepted.
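
The grammar above can be sketched as a small Python parser (an illustration only; offsets are returned in minutes from GMT):

```python
import re

# Mirrors the GMTOffsetTimeZone grammar: GMT, a sign, 1-2 hour digits,
# a colon, and 2 minute digits. Returns None when the string doesn't match.
_GMT_RE = re.compile(r"GMT([+-])(\d{1,2}):(\d{2})")

def parse_gmt_offset(tz: str):
    m = _GMT_RE.fullmatch(tz)
    if not m:
        return None
    sign, hours, minutes = m.group(1), int(m.group(2)), int(m.group(3))
    if hours > 23 or minutes > 59:   # range rules from the text above
        return None
    offset = hours * 60 + minutes
    return -offset if sign == "-" else offset

print(parse_gmt_offset("GMT-08:00"))  # -480
print(parse_gmt_offset("GMT+5:30"))   # 330
```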

## RFC 822 time zone
<a name="w2aac22c17c23c17"></a>

For formatting, the RFC 822 4-digit time zone format is used:

```
RFC822TimeZone:
        Sign TwoDigitHours Minutes
TwoDigitHours:
        Digit Digit
```

TwoDigitHours must be between 00 and 23. Other definitions are as for general time zones.

For parsing, general time zones are also accepted.

SimpleDateFormat also supports "localized date and time pattern" strings. In these strings, the pattern letters described above may be replaced with other, locale-dependent pattern letters. SimpleDateFormat does not deal with the localization of text other than the pattern letters; that's up to the client of the class.

## Examples
<a name="sqlrf-parsetimestampformat-examples"></a>

The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific time zone.


| Date and Time Pattern | Result | 
| --- | --- | 
|  "yyyy.MM.dd G 'at' HH:mm:ss z"  |  2001.07.04 AD at 12:08:56 PDT  | 
|  "EEE, MMM d, ''yy"  |  Wed, Jul 4, '01  | 
|  "h:mm a"  |  12:08 PM  | 
|  "hh 'o''clock' a, zzzz"  |  12 o'clock PM, Pacific Daylight Time  | 
|  "K:mm a, z"  |  0:08 PM, PDT  | 
|  "yyyyy.MMMMM.dd GGG hh:mm aaa"  |  02001.July.04 AD 12:08 PM  | 
|  "EEE, d MMM yyyy HH:mm:ss Z"  |  Wed, 4 Jul 2001 12:08:56 -0700  | 
|  "yyMMddHHmmssZ"  |  010704120856-0700  | 
|  "yyyy-MM-dd'T'HH:mm:ss.SSSZ"  |  2001-07-04T12:08:56.235-0700  | 

## Date and Time Pattern Letters in Alphabetic Order
<a name="PATTERNSINALFAORDR"></a>

The pattern letters shown previously in Date or Time Component order are listed below in alphabetic order for easy reference.


| Pattern Letter | Date or Time Component | Presentation as text or number | Examples | 
| --- | --- | --- | --- | 
|  a  |  Am/pm marker  |  Text  |  PM  | 
|  D  |  Day in year  |  Number  |  189  | 
|  d  |  Day in month  |  Number  |  10  | 
|  E  |  Day in week  |  Text  |  EE=Tu; EEE=Tue; EEEE=Tuesday  | 
|  F  |  Day of week in month  |  Number  |  2  | 
|  G  |  Era designator  |  Text  |  AD  | 
|  H  |  Hour in day (0-23)  |  Number  |  0  | 
|  h  |  Hour in am/pm (1-12)  |  Number  |  12  | 
|  k  |  Hour in day (1-24)  |  Number  |  24  | 
|  K  |  Hour in am/pm (0-11)  |  Number  |  0  | 
|  M  |  Month in year  |  Month  |  July; Jul; 07  | 
|  m  |  Minute in hour  |  Number  |  30  | 
|  s  |  Second in minute  |  Number  |  55  | 
|  S  |  Millisecond  |  Number  |  978  | 
|  w  |  Week in year  |  Number  |  27  | 
|  W  |  Week in month  |  Number  |  2  | 
|  y  |  Year  |  Year  |  1996; 96   | 
|  z  |  Time zone  |  General  |  Pacific Standard Time; PST; GMT-08:00   | 
|  Z  |  Time zone  |  RFC  |  -0800  | 

# CURRENT_DATE
<a name="sql-reference-current-date"></a>

Returns the current Amazon Kinesis Data Analytics system date, in the format YYYY-MM-DD, when the query executes.

For more information, see [CURRENT_TIME](sql-reference-current-time.md), [CURRENT_TIMESTAMP](sql-reference-current-timestamp.md), [LOCALTIMESTAMP](sql-reference-local-timestamp.md), [LOCALTIME](sql-reference-localtime.md), and [CURRENT_ROW_TIMESTAMP](sql-reference-current-row-timestamp.md).

## Example
<a name="sql-reference-current-date-example"></a>

```
+---------------+
| CURRENT_DATE  |
+---------------+
| 2008-08-27    |
+---------------+
```

# CURRENT_ROW_TIMESTAMP
<a name="sql-reference-current-row-timestamp"></a>

CURRENT_ROW_TIMESTAMP is an Amazon Kinesis Data Analytics extension to the SQL:2008 specification. This function returns the current timestamp as defined by the environment on which the Amazon Kinesis Data Analytics application is running. CURRENT_ROW_TIMESTAMP is always returned as UTC, not the local time zone.

CURRENT_ROW_TIMESTAMP is similar to [LOCALTIMESTAMP](sql-reference-local-timestamp.md), but returns a new timestamp for each row in a stream.

A query run with LOCALTIMESTAMP (or CURRENT_TIMESTAMP or CURRENT_TIME) as one of the columns puts into all output rows the time the query is first run.

If that column instead contains CURRENT_ROW_TIMESTAMP, each output row gets a newly calculated timestamp representing when that row was output.

**Note**  
CURRENT_ROW_TIMESTAMP is not defined in the SQL:2008 specification; it is an Amazon Kinesis Data Analytics extension.

For more information, see [CURRENT_TIME](sql-reference-current-time.md), [CURRENT_DATE](sql-reference-current-date.md), [CURRENT_TIMESTAMP](sql-reference-current-timestamp.md), [LOCALTIMESTAMP](sql-reference-local-timestamp.md), and [LOCALTIME](sql-reference-localtime.md).

# CURRENT_TIME
<a name="sql-reference-current-time"></a>

Returns the current Amazon Kinesis Data Analytics system time when the query executes. Time is in UTC, not the local time zone.

For more information, see [CURRENT_TIMESTAMP](sql-reference-current-timestamp.md), [LOCALTIMESTAMP](sql-reference-local-timestamp.md), [LOCALTIME](sql-reference-localtime.md), [CURRENT_ROW_TIMESTAMP](sql-reference-current-row-timestamp.md), and [CURRENT_DATE](sql-reference-current-date.md).

## Example
<a name="sql-reference-current-time-example"></a>

```
+---------------+
| CURRENT_TIME  |
+---------------+
| 20:52:05      |
+---------------+
```

# CURRENT_TIMESTAMP
<a name="sql-reference-current-timestamp"></a>

Returns the current database system timestamp (as defined on the environment on which Amazon Kinesis Data Analytics is running) as a datetime value. 

For more information, see [CURRENT_TIME](sql-reference-current-time.md), [CURRENT_DATE](sql-reference-current-date.md), [LOCALTIME](sql-reference-localtime.md), [LOCALTIMESTAMP](sql-reference-local-timestamp.md), and [CURRENT_ROW_TIMESTAMP](sql-reference-current-row-timestamp.md).

## Example
<a name="current-timestamp-example"></a>

```
+----------------------+
|  CURRENT_TIMESTAMP   |
+----------------------+
| 2008-08-27 20:52:05  |
+----------------------+
```

# EXTRACT
<a name="sql-reference-extract"></a>

The EXTRACT function extracts one field from a DATE, TIME, TIMESTAMP, or INTERVAL expression. It returns BIGINT for all fields other than SECOND. For SECOND, it returns DECIMAL(5,3) and includes milliseconds.

## Syntax
<a name="sql-reference-extract-syntax"></a>

```
EXTRACT(YEAR|MONTH|DAY|HOUR|MINUTE|SECOND FROM <datetime expression>|<interval expression>)
```

## Examples
<a name="sql-reference-extract-examples"></a>


| Function | Result | 
| --- | --- | 
|  <pre>EXTRACT(DAY FROM INTERVAL '2 3:4:5.678' DAY TO SECOND)</pre>  |  2  | 
|  <pre>EXTRACT(HOUR FROM INTERVAL '2 3:4:5.678' DAY TO SECOND)</pre>  |  3  | 
|  <pre>EXTRACT(MINUTE FROM INTERVAL '2 3:4:5.678' DAY TO SECOND)</pre>  |  4  | 
|  <pre>EXTRACT(SECOND FROM INTERVAL '2 3:4:5.678' DAY TO SECOND)</pre>  |  5.678  | 
|  <pre>EXTRACT(MINUTE FROM CURRENT_ROW_TIMESTAMP)<br />where CURRENT_ROW_TIMESTAMP is 2016-09-23 04:29:26.234</pre>  |  29  | 
|  <pre>EXTRACT(HOUR FROM CURRENT_ROW_TIMESTAMP)<br />where CURRENT_ROW_TIMESTAMP is 2016-09-23 04:29:26.234</pre>  |  4  | 
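
The interval rows in the table above can be checked with Python's `timedelta`, which models the same field breakdown (an illustration of the arithmetic, not the SQL engine):

```python
from datetime import timedelta

# The interval '2 3:4:5.678' DAY TO SECOND, modeled as a timedelta.
interval = timedelta(days=2, hours=3, minutes=4, seconds=5.678)

day = interval.days                        # EXTRACT(DAY ...)    -> 2
hour = interval.seconds // 3600            # EXTRACT(HOUR ...)   -> 3
minute = (interval.seconds % 3600) // 60   # EXTRACT(MINUTE ...) -> 4
# EXTRACT(SECOND ...) includes milliseconds -> 5.678
second = interval.seconds % 60 + interval.microseconds / 1e6
print(day, hour, minute, round(second, 3))
```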

### Use in Function
<a name="sql-ref-extract-use"></a>

EXTRACT can be used for conditioning data, as in the following function, which returns a 30-minute floor when [CURRENT_ROW_TIMESTAMP](sql-reference-current-row-timestamp.md) is input for p_time.

```
CREATE or replace FUNCTION FLOOR30MIN( p_time TIMESTAMP )
RETURNS  TIMESTAMP
CONTAINS SQL
RETURNS NULL ON NULL INPUT
RETURN  floor(p_time to HOUR) + (( EXTRACT (  MINUTE FROM p_time  ) / 30)* INTERVAL '30' MINUTE ) ;
```

You would implement this function using code along the following lines:

```
SELECT STREAM FLOOR30MIN(CURRENT_ROW_TIMESTAMP) AS ROWTIME, * FROM "MyStream" OVER (RANGE CURRENT ROW) AS r;
```

**Note**  
The code above assumes that you have previously created a stream called "MyStream."
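
The floor logic of FLOOR30MIN can be sketched in Python (an illustration of the arithmetic only, not the streaming SQL engine):

```python
from datetime import datetime, timedelta

# Floor to the hour, then add back 30 minutes when the minute field
# is 30 or more -- the same arithmetic as the SQL function above.
def floor30min(t: datetime) -> datetime:
    floored = t.replace(minute=0, second=0, microsecond=0)
    return floored + timedelta(minutes=(t.minute // 30) * 30)

print(floor30min(datetime(2016, 9, 23, 4, 29, 26)))  # 2016-09-23 04:00:00
print(floor30min(datetime(2016, 9, 23, 4, 45, 0)))   # 2016-09-23 04:30:00
```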

# LOCALTIME
<a name="sql-reference-localtime"></a>

Returns the current time when the query executes, as defined by the environment on which Amazon Kinesis Data Analytics is running. LOCALTIME is always returned as UTC (GMT), not the local time zone.

For more information, see [CURRENT_TIME](sql-reference-current-time.md), [CURRENT_DATE](sql-reference-current-date.md), [CURRENT_TIMESTAMP](sql-reference-current-timestamp.md), [LOCALTIMESTAMP](sql-reference-local-timestamp.md), and [CURRENT_ROW_TIMESTAMP](sql-reference-current-row-timestamp.md).

## Example
<a name="sql-reference-localtime-example"></a>

```
 VALUES localtime;
+------------+
| LOCALTIME  |
+------------+
| 01:11:15   |
+------------+
1 row selected (1.558 seconds)
```

## Limitations
<a name="sql-reference-localtime-limitations"></a>

Amazon Kinesis Data Analytics does not support the optional <time precision> parameter specified in SQL:2008. This is a departure from the SQL:2008 standard.

# LOCALTIMESTAMP
<a name="sql-reference-local-timestamp"></a>

Returns the current timestamp as defined by the environment on which the Amazon Kinesis Data Analytics application is running. Time is always returned as UTC (GMT), not the local time zone.

For more information, see [CURRENT_TIME](sql-reference-current-time.md), [CURRENT_DATE](sql-reference-current-date.md), [CURRENT_TIMESTAMP](sql-reference-current-timestamp.md), [LOCALTIME](sql-reference-localtime.md), and [CURRENT_ROW_TIMESTAMP](sql-reference-current-row-timestamp.md).

## Example
<a name="sql-reference-local-timestamp-example"></a>

```
values localtimestamp;
+--------------------------+
|      LOCALTIMESTAMP      |
+--------------------------+
| 2008-08-27 01:13:42.206  |
+--------------------------+
1 row selected (1.133 seconds)
```

## Limitations
<a name="sql-reference-local-timestamp-limitations"></a>

Amazon Kinesis Data Analytics does not support the optional <timestamp precision> parameter specified in SQL:2008. This is a departure from the SQL:2008 standard.

# TSDIFF
<a name="sql-reference-tsdiff"></a>

Returns the difference between the two timestamps in milliseconds. Returns NULL if any of the arguments is null.

## Syntax
<a name="sql-reference-tsdiff-syntax"></a>

```
TSDIFF(startTime, endTime)
```

## Parameters
<a name="sql-reference-tsdiff-parameters"></a>

*startTime*

A Unix timestamp in the format milliseconds since '1970-01-01 00:00:00' UTC, expressed as a BIGINT.

*endTime*

A Unix timestamp in the format milliseconds since '1970-01-01 00:00:00' UTC, expressed as a BIGINT.
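
A minimal Python sketch of this behavior (the subtraction direction, endTime minus startTime, is an assumption here; both arguments are milliseconds-since-epoch values, with None standing in for SQL null):

```python
# TSDIFF sketch: millisecond timestamps in, millisecond difference out;
# a null (None) argument propagates to the result.
def tsdiff(start_time, end_time):
    if start_time is None or end_time is None:
        return None
    return end_time - start_time

print(tsdiff(1_000, 4_500))  # 3500
print(tsdiff(None, 4_500))   # None
```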

# Null Functions
<a name="sql-reference-null-functions"></a>

The topics in this section describe the null functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [COALESCE](sql-reference-coalesce.md)
+ [NULLIF](sql-reference-nullif.md)

# COALESCE
<a name="sql-reference-coalesce"></a>

```
COALESCE (
      <value-expression>
      {,<value-expression>}... )
```

The COALESCE function takes a list of expressions (all of which must be of the same type) and returns the first non-null argument from the list. If all of the expressions are null, COALESCE returns null.

## Examples
<a name="sql-reference-coalesce-examples"></a>


| Expression | Result | 
| --- | --- | 
|  COALESCE('chair')  |  chair  | 
|  COALESCE('chair', null, 'sofa')  |  chair  | 
|  COALESCE(null, null, 'sofa')  |  sofa  | 
|  COALESCE(null, 2, 5)  |  2  | 
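
A Python sketch of the same first-non-null rule (None standing in for SQL null):

```python
# COALESCE returns the first non-null argument, or null when every
# argument is null.
def coalesce(*args):
    for value in args:
        if value is not None:
            return value
    return None

print(coalesce(None, None, "sofa"))  # sofa
print(coalesce(None, 2, 5))          # 2
```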

# NULLIF
<a name="sql-reference-nullif"></a>

```
NULLIF ( <value-expression>, <value-expression> )
```

Returns null if the two input arguments are equal, otherwise returns the first value. Both arguments must be of comparable type, or an exception is raised.

## Examples
<a name="sql-reference-nullif-examples"></a>


| Function | Result | 
| --- | --- | 
|  NULLIF(4,2)  |  4  | 
|  NULLIF(4,4)  |  <null>  | 
|  NULLIF('amy','fred')  |  amy  | 
|  NULLIF('amy', cast(null as varchar(3)))  |  amy  | 
|  NULLIF(cast(null as varchar(3)),'fred')  |  <null>  | 
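
A Python sketch of the same rule (None standing in for SQL null):

```python
# NULLIF returns null when the two arguments are equal, otherwise the
# first argument.
def nullif(a, b):
    return None if a == b else a

print(nullif(4, 2))           # 4
print(nullif(4, 4))           # None
print(nullif("amy", "fred"))  # amy
```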

# Numeric Functions
<a name="sql-reference-numeric-functions"></a>

The topics in this section describe the numeric functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [ABS](sql-reference-abs.md)
+ [CEIL / CEILING](sql-reference-ceil.md)
+ [EXP](sql-reference-exp.md)
+ [FLOOR](sql-reference-floor.md)
+ [LN](sql-reference-ln.md)
+ [LOG10](sql-reference-log10.md)
+ [MOD](sql-reference-mod.md)
+ [POWER](sql-reference-power.md)
+ [STEP](sql-reference-step.md)

# ABS
<a name="sql-reference-abs"></a>

Returns the absolute value of the input argument. Returns `null` if the input argument is null.

```
ABS ( <numeric-expression> | <interval-expression> )
```

## Examples
<a name="sqlrf_abs_examples"></a>


| Function | Result | 
| --- | --- | 
|  ABS(2.0)  |  2.0  | 
|  ABS(-1.0)  |  1.0  | 
|  ABS(0)  |  0  | 
|  ABS(-3 * 3)  |  9  | 
|  ABS(INTERVAL '-3 4:20' DAY TO MINUTE)  |  INTERVAL '3 4:20' DAY TO MINUTE  | 

If you use `cast as VARCHAR` in SQLline to show the output, the value is returned as `+3 04:20`.

```
 values(cast(ABS(INTERVAL '-3 4:20' DAY TO MINUTE) AS VARCHAR(8)));
  +-----------+
    EXPR$0   
  +-----------+
   +3 04:20  
  +-----------+
  1 row selected
```

# CEIL / CEILING
<a name="sql-reference-ceil"></a>

```
 CEIL | CEILING ( <number-expression> )
 CEIL | CEILING ( <datetime-expression> TO <time-unit> )
```

When called with a numeric argument, CEILING returns the smallest integer equal to or larger than the input argument.

When called with a date, time, or timestamp expression, CEILING returns the smallest value equal to or larger than the input, subject to the precision specified by the <time unit>.

Returns null if any input argument is null.

## Examples
<a name="sql-reference-ceil-examples"></a>


| Function | Result | 
| --- | --- | 
|  CEIL(2.0)  |  2  | 
|  CEIL(-1.0)  |  -1  | 
|  CEIL(5.2)  |  6  | 
|  CEILING(-3.3)  |  -3  | 
|  CEILING(-3 * 3.1)  |  -9  | 
|  CEILING(TIMESTAMP '2004-09-30 13:48:23' TO HOUR)  |  TIMESTAMP '2004-09-30 14:00:00'  | 
|  CEILING(TIMESTAMP '2004-09-30 13:48:23' TO MINUTE)  |  TIMESTAMP '2004-09-30 13:49:00'  | 
|  CEILING(TIMESTAMP '2004-09-30 13:48:23' TO DAY)  |  TIMESTAMP '2004-10-01 00:00:00.0'  | 
|  CEILING(TIMESTAMP '2004-09-30 13:48:23' TO YEAR)  |  TIMESTAMP '2005-01-01 00:00:00.0'  | 

## Notes
<a name="sqlrf_ceil_notes"></a>
+ CEIL and CEILING are synonyms for this function provided by the SQL:2008 standard.
+ CEIL(<datetime value expression> TO <time unit>) is an Amazon Kinesis Data Analytics extension.
+ For more information, see [FLOOR](sql-reference-floor.md).

# EXP
<a name="sql-reference-exp"></a>

```
EXP ( <number-expression> )
```

Returns the value of e (approximately 2.7182818284590455) raised to the power of the input argument. Returns null if the input argument is null.

## Examples
<a name="sqlrf-exp-examples"></a>


| Function | Result | 
| --- | --- | 
|  EXP(1)  |  2.7182818284590455  | 
|  EXP(0)  |  1.0  | 
|  EXP(-1)  |  0.36787944117144233  | 
|  EXP(10)  |  22026.465794806718  | 
|  EXP(2.5)  |  12.182493960703473  | 

# FLOOR
<a name="sql-reference-floor"></a>

```
FLOOR ( <number-expression> )
FLOOR ( <datetime-expression> TO <time-unit> )
```

When called with a numeric argument, FLOOR returns the largest integer equal to or smaller than the input argument.

When called with a date, time, or timestamp expression, FLOOR returns the largest value equal to or smaller than the input, subject to the precision specified by <time unit>.

FLOOR returns null if any input argument is null.

## Examples
<a name="sqlref-floor-examples"></a>


|  Function  |  Result  | 
| --- | --- | 
|  FLOOR(2.0)  |  2  | 
|  FLOOR(-1.0)  |  -1  | 
|  FLOOR(5.2)  |  5  | 
|  FLOOR(-3.3)  |  -4  | 
|  FLOOR(-3 * 3.1)  |  -10  | 
|  FLOOR(TIMESTAMP '2004-09-30 13:48:23' TO HOUR)  |  TIMESTAMP '2004-09-30 13:00:00'  | 
|  FLOOR(TIMESTAMP '2004-09-30 13:48:23' TO MINUTE)  |  TIMESTAMP '2004-09-30 13:48:00'  | 
|  FLOOR(TIMESTAMP '2004-09-30 13:48:23' TO DAY)  |  TIMESTAMP '2004-09-30 00:00:00.0'  | 
|  FLOOR(TIMESTAMP '2004-09-30 13:48:23' TO YEAR)  |  TIMESTAMP '2004-01-01 00:00:00.0'  | 

## Notes
<a name="sqlrf-floor-notes"></a>

**Note**  
FLOOR ( <datetime expression> TO <timeunit> ) is an Amazon Kinesis Data Analytics extension.  
The STEP function is similar to FLOOR but can round values down to arbitrary intervals, such as 30 seconds. For more information, see [STEP](sql-reference-step.md).

# LN
<a name="sql-reference-ln"></a>

```
LN ( <number-expression> )
```

Returns the natural log (that is, the log with respect to base e) of the input argument. If the argument is negative or 0, an exception is raised. Returns null if the input argument is null.

For more information, see [LOG10](sql-reference-log10.md) and [EXP](sql-reference-exp.md).

## Examples
<a name="sql-reference-ln-examples"></a>


| Function | Result | 
| --- | --- | 
|  LN(1)  |  0.0  | 
|  LN(10)  |  2.302585092994046  | 
|  LN(2.5)  |  0.9162907318741551  | 

# LOG10
<a name="sql-reference-log10"></a>

```
LOG10 ( <number-expression> )
```

Returns the base 10 logarithm of the input argument. If the argument is negative or 0, an exception is raised. Returns null if the input argument is null.

## Examples
<a name="sql-reference-log10-examples"></a>


|  Function  |  Result  | 
| --- | --- | 
|  LOG10(1)  |  0.0  | 
|  LOG10(100)  |  2.0  | 
|  log10(cast('23' as decimal))  |  1.3617278360175928  | 

## Notes
<a name="sql-reference-log10-notes"></a>

**Note**  
LOG10 is not a SQL:2008 standard function; it is an Amazon Kinesis Data Analytics extension to the standard.

# MOD
<a name="sql-reference-mod"></a>

```
MOD ( <dividend>, <divisor> )
 <dividend> := <integer-expression>
 <divisor>  := <integer-expression>
```

Returns the remainder when the first argument (the dividend) is divided by the second argument (the divisor). If the divisor is zero, a divide-by-zero error is raised.

## Examples
<a name="sqlrf-mod-examples"></a>


| Function | Result | 
| --- | --- | 
|  MOD(4,2)  |  0  | 
|  MOD(5,3)  |  2  | 
|  MOD(-4,3)  |  -1  | 
|  MOD(5,12)  |  5  | 
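
The sign behavior in the table (MOD(-4, 3) is -1) follows truncated division. A Python sketch (note that Python's `%` operator uses floored division and would give 2 for `-4 % 3`, so `math.fmod`, which truncates, models MOD instead):

```python
import math

# MOD follows truncated division, so the result takes the sign of the
# dividend: MOD(-4, 3) is -1.
def sql_mod(dividend: int, divisor: int) -> int:
    return int(math.fmod(dividend, divisor))

print(sql_mod(4, 2))   # 0
print(sql_mod(5, 3))   # 2
print(sql_mod(-4, 3))  # -1
print(sql_mod(5, 12))  # 5
```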

## Limitations
<a name="sqlrf-mod-limitations"></a>

The Amazon Kinesis Data Analytics MOD function only supports arguments of scale 0 (integers). This is a departure from the SQL:2008 standard, which supports any numeric argument. Other numeric arguments can be CAST to an integer, of course.

# POWER
<a name="sql-reference-power"></a>

```
 POWER ( <base>, <exponent> )
 <base> := <number-expression>
 <exponent> := <number-expression>
```

Returns the value of the first argument (the base) raised to the power of the second argument (the exponent). Returns null if either the base or the exponent is null, and raises an exception if the base is zero and the exponent is negative, or if the base is negative and the exponent is not a whole number. 

## Examples
<a name="sql-reference-power-examples"></a>


| Function | Result | 
| --- | --- | 
|  POWER(3,2)  |  9  | 
|  POWER(-2,3)  |  -8  | 
|  POWER(4,-2)  |  1/16 ..or.. 0.0625  | 
|  POWER(10.1,2.5)  |  324.19285157140644  | 

# STEP
<a name="sql-reference-step"></a>

```
STEP ( <datetime-expression> BY INTERVAL '<integer-literal>' <time-unit> )
STEP ( <integer-expression> BY <integer-literal> )
```

STEP rounds down the input value (<datetime-expression> or <integer-expression>) to the nearest multiple of the interval or <integer-literal> that follows BY.

The STEP function works on datetime data types or integer types. STEP is a scalar function that performs an operation similar to [FLOOR](sql-reference-floor.md). However, by using STEP you can specify an arbitrary time or integer interval for rounding down the first argument.

STEP returns null if any input argument is null.

## STEP with an Integer Argument
<a name="sql-reference-step-integer"></a>

When called with an integer argument, STEP returns the largest integer multiple of the <integer-literal> argument equal to or smaller than the <integer-expression> argument. For example, `STEP(23 BY 5)` returns `20`, because 20 is the greatest multiple of 5 that is less than 23.

`STEP ( <integer-expression > BY <integer-literal> )` is equivalent to the following.

```
( <integer-expression> / <integer-literal> ) * <integer-literal>
```
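
This equivalence can be sketched in Python (shown for non-negative inputs, where Python's floor division matches SQL's truncating integer division):

```python
# STEP(<integer-expression> BY <integer-literal>): integer division
# followed by multiplication, as in the equivalence above.
def step(value: int, interval: int) -> int:
    return (value // interval) * interval

print(step(23, 5))   # 20
print(step(30, 10))  # 30
```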

### Examples
<a name="sql-reference-step-integer-examples"></a>

In the following examples, the return value is the largest multiple of <integer-literal> that is equal to or less than <integer-expression>. 


|  Function  |  Result  | 
| --- | --- | 
|  STEP(23 BY 5)  |  20  | 
|  STEP(30 BY 10)  |  30  | 

## STEP with a Date Type Argument
<a name="sql-reference-step-date"></a>

When called with a date, time, or timestamp argument, STEP returns the largest value equal to or smaller than the input, subject to the interval specified by <intervalLiteral>. 

`STEP(<datetimeExpression> BY <intervalLiteral>)` is equivalent to the following.

```
(<datetimeExpression> - timestamp '1970-01-01 00:00:00')  /  <intervalLiteral> )  * <intervalLiteral> + timestamp '1970-01-01 00:00:00'
```

The time unit in <intervalLiteral> can be one of the following:
+ YEAR
+ MONTH
+ DAY
+ HOUR
+ MINUTE
+ SECOND
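
A Python sketch of the epoch-based equivalence above (an illustration of the arithmetic, not the engine):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Subtract the epoch, round the elapsed time down to a whole multiple
# of the interval, then add the epoch back -- the same steps as the
# SQL equivalence shown above.
def step_datetime(ts: datetime, interval: timedelta) -> datetime:
    elapsed = ts - EPOCH
    return EPOCH + (elapsed // interval) * interval

print(step_datetime(datetime(2004, 9, 30, 13, 48, 23),
                    timedelta(minutes=5)))  # 2004-09-30 13:45:00
print(step_datetime(datetime(2004, 9, 30, 13, 48, 23),
                    timedelta(hours=2)))    # 2004-09-30 12:00:00
```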

### Examples
<a name="sql-reference-step-date-examples"></a>

In the following examples, the return value is the latest multiple of the interval specified by <intervalLiteral> that is equal to or earlier than the datetime expression. 


|  Function  |  Result  | 
| --- | --- | 
|  STEP(CAST('2004-09-30 13:48:23' as TIMESTAMP) BY INTERVAL '10' SECOND)  |  '2004-09-30 13:48:20'  | 
|  STEP(CAST('2004-09-30 13:48:23' as TIMESTAMP) BY INTERVAL '2' HOUR)  |  '2004-09-30 12:00:00'  | 
|  STEP(CAST('2004-09-30 13:48:23' as TIMESTAMP) BY INTERVAL '5' MINUTE)  |  '2004-09-30 13:45:00'  | 
|  STEP(CAST('2004-09-27 13:48:23' as TIMESTAMP) BY INTERVAL '5' DAY)  |  '2004-09-25 00:00:00.0'  | 
|  STEP(CAST('2004-09-30 13:48:23' as TIMESTAMP) BY INTERVAL '1' YEAR)  |  '2004-01-01 00:00:00.0'  | 

## STEP in a GROUP BY clause (tumbling window)
<a name="sql-reference-step-tumbling"></a>

In this example, an aggregate query has a `GROUP BY` clause with `STEP` applied to `ROWTIME` that groups the stream into finite rows.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    sum_price     DOUBLE);
-- CREATE OR REPLACE PUMP to insert into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT STREAM 
        ticker_symbol,
        SUM(price) AS sum_price
    FROM "SOURCE_SQL_STREAM_001"
     GROUP BY ticker_symbol, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```

### Results
<a name="sql-reference-step-tumbling-example-results"></a>

The preceding example outputs a stream similar to the following.

![\[Data table showing ROWTIME, TICKER_SYMBOL, and SUM_PRICE columns with sample entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-step-example-1.png)


## STEP in an OVER clause (sliding window)
<a name="sql-reference-step-sliding"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ingest_time TIMESTAMP,
    ticker_symbol VARCHAR(4),
    ticker_symbol_count integer);

--Create pump data into output
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
-- select the ingest time used in the GROUP BY clause
SELECT STREAM STEP(source_sql_stream_001.approximate_arrival_time BY INTERVAL '10' SECOND) as ingest_time,
    ticker_symbol,
   count(*) over w1 as ticker_symbol_count
FROM source_sql_stream_001
WINDOW w1 AS (
    PARTITION BY ticker_symbol,
   -- aggregate records based upon ingest time
       STEP(source_sql_stream_001.approximate_arrival_time BY INTERVAL '10' SECOND)   
   -- use processing time as a trigger, which can be a different time window than the aggregate
   RANGE INTERVAL '10' SECOND PRECEDING);
```

### Results
<a name="sql-reference-step-sliding-example-results"></a>

The preceding example outputs a stream similar to the following.

![\[Table showing ROWTIME, INGEST_TIME, TICKER_SYMBOL, and TICKER_SYMBOL_COUNT columns with sample data.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-step-example-2.png)


## Notes
<a name="sql-reference-step-notes"></a>

STEP ( <datetime expression> BY <literal expression> ) is an Amazon Kinesis Data Analytics extension.

You can use STEP to aggregate results using tumbling windows. For more information on tumbling windows, see [Tumbling Window Concepts](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).

# Log Parsing Functions
<a name="sql-reference-pattern-matching-functions"></a>

Amazon Kinesis Data Analytics features the following functions for log parsing: 
+ [FAST\_REGEX\_LOG\_PARSER](sql-reference-fast-regex-log-parser.md) works similarly to the regex parser, but takes several "shortcuts" to ensure faster results. For example, the fast regex parser stops at the first match it finds (known as "lazy" semantics). 
+ [FIXED\_COLUMN\_LOG\_PARSE](sql-reference-fixed-column-log-parse.md) parses fixed-width fields and automatically converts them to the given SQL types. 
+ [REGEX\_LOG\_PARSE](sql-reference-regex-log-parse.md) uses the default Java regular expression parser. For more information about this parser, see [Pattern](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) in the Java Platform documentation on the Oracle website.
+ [SYS\_LOG\_PARSE](sql-reference-sys-log-parse.md) processes entries commonly found in UNIX/Linux system logs.
+ [VARIABLE\_COLUMN\_LOG\_PARSE](sql-reference-variable-column-log-parse.md) splits an input string (its first argument, <character-expression>) into fields separated by a delimiter character or delimiter string.
+ [W3C\_LOG\_PARSE](sql-reference-w3c-log-parse.md) processes entries in W3C-predefined-format logs.

# FAST\_REGEX\_LOG\_PARSER
<a name="sql-reference-fast-regex-log-parser"></a>

```
FAST_REGEX_LOG_PARSE('input_string', 'fast_regex_pattern')
```

FAST\_REGEX\_LOG\_PARSE works by first decomposing the regular expression into a series of regular expressions: one for each expression inside a group and one for each expression outside a group. Any fixed-length portion at the start of an expression is moved to the end of the previous expression. If an expression is entirely fixed length, it is merged with the previous expression. The series of expressions is then evaluated using lazy semantics with no backtracking. (In regular expression parsing parlance, "lazy" means don't parse more than you need to at each step; "greedy" means parse as much as you can at each step.)

The columns returned are COLUMN1 through COLUMNn, where n is the number of groups in the regular expression. The columns are of type VARCHAR(1024). For sample usage, see First FRLP Example and Further FRLP Examples later in this topic.

## FAST\_REGEX\_LOG\_PARSER (FRLP)
<a name="w2aac22c23b7b9"></a>

FAST\_REGEX\_LOG\_PARSER uses a lazy search; it stops at the first match. By contrast, [REGEX\_LOG\_PARSE](sql-reference-regex-log-parse.md) is greedy unless possessive quantifiers are used.

FAST\_REGEX\_LOG\_PARSE scans the supplied input string for all the characters specified by the Fast Regex pattern.
+ All characters in that input string must be accounted for by the characters and scan groups defined in the Fast Regex pattern. Scan groups define the fields (or columns) that result when a scan is successful.
+ If all characters in the input string are accounted for when the Fast Regex pattern is applied, then FRLP creates an output field (column) from each parenthetical expression in that Fast Regex pattern, in left-to-right order. The first (leftmost) parenthetical expression creates the first output field, the next (second) parenthetical expression creates the second output field, and so on through the last parenthetical expression, which creates the last output field.
+ If the input string contains any characters not accounted for (matched) when the Fast Regex pattern is applied, then FRLP returns no fields at all.
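
The all-or-nothing behavior above can be sketched as follows (illustrative input strings, using the `values` form shown in the examples later in this topic):

```
-- Every character of 'a_b' is accounted for by '(.*)_(.*)',
-- so two fields result.
select t.r."COLUMN1", t.r."COLUMN2" from
   (values (FAST_REGEX_LOG_PARSE('a_b', '(.*)_(.*)'))) t(r);

-- A pattern such as '(.*)_' leaves the trailing 'b' unmatched,
-- so no fields are returned at all.
select t.r."COLUMN1" from
   (values (FAST_REGEX_LOG_PARSE('a_b', '(.*)_'))) t(r);
```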

## Character Class Symbols for Fast Regex
<a name="w2aac22c23b7c11"></a>

Fast Regex uses a different set of character class symbols from the regular regex parser:


| Symbol or Construct | Meaning | 
| --- | --- | 
|  -  |  Character range, including endpoints  | 
|  [ charclasses ]  |  Character class  | 
|  [^ charclasses ]  |  Negated character class  | 
|  \|  |  Union  | 
|  &  |  Intersection  | 
|  ?  |  Zero or one occurrence  | 
|  \*  |  Zero or more occurrences  | 
|  \+  |  One or more occurrences  | 
|  \{n\}  |  n occurrences  | 
|  \{n,\}  |  n or more occurrences  | 
|  \{n,m\}  |  n to m occurrences, including both  | 
|  .  |  Any single character  | 
|  \#  |  The empty language  | 
|  @  |  Any string  | 
|  "<Unicode string without double-quotes>"  |  A string  | 
|  ( )  |  The empty string  | 
|  ( unionexp )  |  Precedence override  | 
|  < <identifier> >  |  Named pattern  | 
|  <n-m>  |  Numerical interval  | 
|  charexp := <Unicode character>  |  A single non-reserved character  | 
|  \\ <Unicode character>  |  A single character  | 



We support the following POSIX standard identifiers as named patterns:

```
<Digit>   -   "[0-9]"
<Upper>   -   "[A-Z]"
<Lower>   -   "[a-z]"
<ASCII>   -   "[\u0000-\u007F]"
<Alpha>   -   "<Lower>|<Upper>"
<Alnum>   -   "<Alpha>|<Digit>"
<Punct>   -   "[!\"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]"
<Blank>   -   "[ \t]"
<Space>   -   "[ \t\n\f\r\u000B]"
<Cntrl>   -   "[\u0000-\u001F\u007F]"
<XDigit>  -   "0-9a-fA-F"
<Print>   -   "<Alnum>|<Punct>"
<Graph>   -   "<Print>"
```
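
A named pattern can stand in for an explicit character class in a Fast Regex pattern. The following sketch (illustrative input; output not verified) captures a run of letters and a run of digits separated by an underscore:

```
select t.r."COLUMN1", t.r."COLUMN2" from
   (values (FAST_REGEX_LOG_PARSE('error_404', '(<Alpha>+)_(<Digit>+)'))) t(r);
-- COLUMN1 should contain 'error' and COLUMN2 should contain '404'
```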

## First FRLP Example

This first example uses the Fast Regex pattern '(.\*)\_(.\_.\*)\_.\*'.

```
select t.r."COLUMN1", t.r."COLUMN2" from
. . . . . . . . . . . . .> (values (FAST_REGEX_LOG_PARSE('Mary_had_a_little_lamb', '(.*)_(._.*)_.*'))) t(r);
+------------------------+-----------------------+
|         COLUMN1        |         COLUMN2       |
+------------------------+-----------------------+
| Mary_had               |     a_little_lamb     |
+------------------------+-----------------------+
1 row selected
```

1. The scan of the input string ('Mary\_had\_a\_little\_lamb') begins with the first group defined in the Fast Regex pattern, (.\*), which means "find any character 0 or more times."  

    '**(.\*)**\_(.\_.\*)\_.\*'

1. This group specification, defining the first column to be parsed, tells the Fast Regex Log Parser to accept input-string characters, starting from the input string's first character, until it finds the next group in the Fast Regex pattern or the next literal character or string that is not inside a group (not in parentheses). In this example, the next literal character after the first group is an underscore:  

    '(.\*)**\_**(.\_.\*)\_.\*'

1. The parser scans each character in the input string until it finds that underscore, at which point the second group in the Fast Regex pattern begins:

    '(.\*)\_**(.\_.\*)**\_.\*'

1. Group 2 thus begins with "a\_l". Next, the parser needs to determine the end of this group, using the remaining specification in the pattern:

    '(.\*)\_(.\_.\*)**\_.\***'

**Note**  
Character-strings or literals specified in the pattern but not inside a group must be found in the input string but will not be included in any output field.  
If the Fast Regex pattern had omitted the final asterisk, no results would be obtained.

## Further FRLP Examples
<a name="furtherExamples"></a>

The next example uses a "\+", which means repeat the last expression 1 or more times ("\*" means 0 or more times).

### Example A
<a name="w2aac22c23b7c13b4"></a>

In this case, the first field/column group stops at the first underscore and matches "Mary", but the second group does not match.

```
select t.r."COLUMN1", t.r."COLUMN2" from
      . . . . . . . . . . . . .> (values (FAST_REGEX_LOG_PARSE('Mary_had_a_little_lamb',
        '(.*)_+(._.*)'))) t(r); 
      +----------+----------+
      | COLUMN1  | COLUMN2  |
      +----------+----------+
      +----------+----------+
      No rows selected
```

The preceding example returns no fields because the "\+" requires at least one more underscore in a row, and the input string does not have one.

### Example B
<a name="w2aac22c23b7c13b6"></a>

In the following case, the '\+' is superfluous because of the lazy semantics:

```
select t.r."COLUMN1", t.r."COLUMN2" from
      . . . . . . . . . . . . .> (values (FAST_REGEX_LOG_PARSE('Mary____had_a_little_lamb',
        '(.*)_+(.*)'))) t(r);
      +-------------------------+-------------------------+
      |         COLUMN1         |         COLUMN2         |
      +-------------------------+-------------------------+
      | Mary                    |    had_a_little_lamb    |
      +-------------------------+-------------------------+
      1 row selected
```

The preceding example succeeds in returning two fields because, after finding the multiple underscores required by the "\_\+" specification, the group-2 specification (.\*) accepts all remaining characters in the input string. Underscores appear neither trailing "Mary" nor leading "had" because the "\_\+" specification is not enclosed in parentheses.

As mentioned in the introduction, "lazy" in regular expression parsing parlance means don't parse more than you need to at each step; "greedy" means parse as much as you can at each step.

The first case in this topic, Example A, fails because when it gets to the first underscore, the regex processor has no way of knowing, without backtracking, that it can't use the underscore to match "\_\+"; FRLP doesn't backtrack, whereas [REGEX\_LOG\_PARSE](sql-reference-regex-log-parse.md) does.

The search directly above, B, gets turned into three searches: 

```
(.*)_
_*(._
.*)
```

Notice that the second field group gets split between the second and third searches, and also that "\_\+" is considered the same as "\_\_\*" (that is, "underscore, repeated one or more times" is treated the same as "underscore, underscore, repeated zero or more times").

Case A demonstrates the main difference between REGEX\_LOG\_PARSE and FAST\_REGEX\_LOG\_PARSE: the search in Example A would work under REGEX\_LOG\_PARSE, because that function uses backtracking.
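
To illustrate, Example A can be rewritten with REGEX\_LOG\_PARSE (a sketch; the column values follow from greedy Java regex semantics with backtracking):

```
select t.r."COLUMN1", t.r."COLUMN2" from
   (values (REGEX_LOG_PARSE('Mary_had_a_little_lamb', '(.*)_+(._.*)'))) t(r);
-- Backtracking lets the match succeed: the greedy '(.*)' keeps 'Mary_had',
-- leaving 'a_little_lamb' for the second group.
```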

### Example C
<a name="w2aac22c23b7c13b8"></a>

In the following example, the plus is not superfluous, because <Alpha> (any alphabetic character) is fixed length and thus is used as a delimiter for the "\_\+" search.

```
select t.r."COLUMN1", t.r."COLUMN2" from
. . . . . . . . . . . . .> (values (FAST_REGEX_LOG_PARSE('Mary____had_a_little_lamb', '(.*)_+(<Alpha>.*)'))) t(r);
+----------------------------+----------------------------+
|          COLUMN1           |          COLUMN2           |
+----------------------------+----------------------------+
| Mary                       | had_a_little_lamb          |
+----------------------------+----------------------------+
1 row selected 

'(.*)_+(<Alpha>.*)' gets converted into three regular expressions:
'.*_'
'_*<Alpha>'
'.*$'
```

Each is matched in turn using lazy semantics. 

The columns returned will be COLUMN1 through COLUMNn, where n is the number of groups in the regular expression. The columns will be of type varchar(1024).

# FIXED\_COLUMN\_LOG\_PARSE
<a name="sql-reference-fixed-column-log-parse"></a>

Parses fixed-width fields and automatically converts them to the given SQL types. 

```
FIXED_COLUMN_LOG_PARSE ( <string value expression>, <column description string expression> )
 <column description string expression> := '<column description> [,...]'
 <column description> :=
   <identifier> TYPE <data type> [ NOT NULL ]
   START <numeric value expression> [FOR <numeric constant expression>]
```

The starting position of a column is 0. Column specifications for types DATE, TIME, and TIMESTAMP support a format parameter that lets you specify the exact time component layout. The parser uses the Java class [java.text.SimpleDateFormat](http://docs.oracle.com/javase/1.5.0/docs/api/java/text/SimpleDateFormat.html) to parse the strings for types DATE, TIME, and TIMESTAMP. The [Date and Time Patterns](sql-reference-parse-timestamp-format.md) topic gives a full description and examples of timestamp format strings. The following is an example of a column definition with a format string:

```
"name" TYPE TIMESTAMP 'dd/MMM/yyyy:HH:mm:ss'
```
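
As a complete sketch (the field names, positions, and input record are hypothetical), the following parses a fixed-width record whose first four characters are a code and whose last three characters are an integer value:

```
select t.r."code", t.r."val" from
   (values (FIXED_COLUMN_LOG_PARSE('AMZN 187',
       '"code" TYPE VARCHAR(4) START 0 FOR 4,
        "val" TYPE INTEGER START 5 FOR 3'))) t(r);
```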

**Related Topics**  
[REGEX\_LOG\_PARSE](sql-reference-regex-log-parse.md)

# REGEX\_LOG\_PARSE
<a name="sql-reference-regex-log-parse"></a>

```
REGEX_LOG_PARSE ( <character-expression>, <regex-pattern>, <columns> )
<regex-pattern> := <character-expression> [OBJECT]
<columns> := <columnname> [ <datatype> ] {, <columnname> <datatype> }*
```

Parses a character string based on Java Regular Expression patterns as defined in [java.util.regex.pattern](http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html).

Columns are based on match groups defined in the regex-pattern. Each group defines a column, and the groups are processed from left to right. If the regular expression does not match the string passed as the first parameter, NULL is returned.

The columns returned will be COLUMN1 through COLUMNn, where n is the number of groups in the regular expression. The columns will be of type varchar(1024).

## Examples
<a name="sql-reference-regex-log-parse-examples"></a>

### Example Dataset
<a name="w2aac22c23c11c11b3"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return results from two capture groups
<a name="w2aac22c23c11c11b5"></a>

The following code example searches the contents of the `sector` field for a letter `E`, returning it and the character that follows it, and then searches for a letter `R`, returning it and all characters that follow it:

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (match1 VARCHAR(1024), match2 VARCHAR(1024));
 
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM T.REC.COLUMN1, T.REC.COLUMN2
    FROM 
         (SELECT STREAM SECTOR, 
             REGEX_LOG_PARSE(SECTOR, '.*([E].).*([R].*)') AS REC 
             FROM SOURCE_SQL_STREAM_001) AS T;
```

The preceding code example produces results similar to the following:

![\[Table showing ROWTIME, MATCH1, and MATCH2 columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-regex-log-parse-example-1.png)


### Example 2: Return a stream field and results from two capture groups
<a name="w2aac22c23c11c11b7"></a>

The following code example returns the `sector` field, searches the contents of the `sector` field for a letter `E`, returning it and the character that follows it, and then searches for a letter `R`, returning it and all characters that follow it:

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (sector VARCHAR(24), match1 VARCHAR(24), match2 VARCHAR(24));

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM T.SECTOR, T.REC.COLUMN1, T.REC.COLUMN2
    FROM 
         (SELECT STREAM SECTOR,
             REGEX_LOG_PARSE(SECTOR, '.*([E].).*([R].*)') AS REC
             FROM SOURCE_SQL_STREAM_001) AS T;
```

The preceding code example produces results similar to the following:

![\[Table showing data entries for ROWTIME, SECTOR, MATCH1, and MATCH2 columns with healthcare and energy sectors.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-regex-log-parse-example-2.png)


**Related Topics**  
<a name="sqlrf_regex_log_parse_formoreinfo"></a>
For more information, see [FAST\_REGEX\_LOG\_PARSER](sql-reference-fast-regex-log-parser.md).

## Quick Regex Reference
<a name="w2aac22c23c11c15"></a>

For full details on regular expressions, see [java.util.regex.Pattern](http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html).


|  |  | 
| --- |--- |
|  [xyz] Find single character of: x, y or z [^abc] Find any single character except: x, y, or z [r-z] Find any single character between r-z [r-zR-Z] Find any single character between r-z or R-Z ^ Start of line \$1 End of line \$1A Start of string \$1z End of string . Any single character \$1s Find any whitespace character \$1S Find any non-whitespace character \$1d Find any digit \$1D Find any non-digit  |  \$1w Find any word character (letter, number, underscore) \$1W Find any non-word character \$1b Find any word boundary (...) Capture everything enclosed (x\$1y) Find x or y (also works with symbols such as \$1d or \$1s) x? Find zero or one of x (also works with symbols such as \$1d or \$1s) x\$1 Find zero or more of x (also works with symbols such as \$1d or \$1s) x\$1 Find one or more of x (also works with symbols such as \$1d or \$1s) x\$13\$1 Find exactly 3 of x (also works with symbols such as \$1d or \$1s) x\$13,\$1 Find 3 or more of x (also works with symbols such as \$1d or \$1s) x\$13,6\$1 Find between 3 and 6 of x (also works with symbols such as \$1d or \$1s)  | 

# SYS\_LOG\_PARSE
<a name="sql-reference-sys-log-parse"></a>

Parses the standard syslog format:

```
 Mon DD HH:MM:SS server message
```

SYS\_LOG\_PARSE processes entries commonly found in UNIX/Linux system logs. System log entries start with a timestamp, followed by a free-form text field. SYS\_LOG\_PARSE output consists of two columns: the first is named "COLUMN1" and is of SQL data type TIMESTAMP; the second is named "COLUMN2" and is of SQL type VARCHAR.

**Note**  
For more information about SYSLOG, see [IETF RFC3164](https://tools.ietf.org/html/rfc3164). For more information about date-time patterns and matching, see [Date and Time Patterns](sql-reference-parse-timestamp-format.md).
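
A sketch of typical usage (the source stream and its `message` column are illustrative):

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    log_time    TIMESTAMP,
    log_message VARCHAR(1024));

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
    -- COLUMN1 is the parsed TIMESTAMP; COLUMN2 is the free-form text
    SELECT STREAM t.r.COLUMN1, t.r.COLUMN2
    FROM (SELECT STREAM SYS_LOG_PARSE(message) AS r
          FROM "SOURCE_SQL_STREAM_001") AS t;
```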

# VARIABLE\_COLUMN\_LOG\_PARSE
<a name="sql-reference-variable-column-log-parse"></a>

```
 VARIABLE_COLUMN_LOG_PARSE(
  <character-expression>, <columns>, <delimiter-string>
  [ , <escape-string>, <quote-string> ] )
  <columns> := <number of columns> | <list of columns>
  <number of columns> := <numeric value expression>
  <list of columns> := '<column description>[, ...]'
  <column description> := <identifier> TYPE <data type> [ NOT NULL ]
  <delimiter string> := <character-expression>
  <escape-string> := <character-expression>
  <quote-string> := '<begin quote character> [ <end quote character> ]'
```



VARIABLE\_COLUMN\_LOG\_PARSE splits an input string (its first argument, <character-expression>) into fields separated by a delimiter character or delimiter string. Thus it handles comma-separated values or tab-separated values. It can be combined with [FIXED\_COLUMN\_LOG\_PARSE](sql-reference-fixed-column-log-parse.md) to handle input such as maillog, where some fields are fixed length and others are variable length.

**Note**  
 Parsing of binary files is not supported.

The arguments <escape-string> and <quote-string> are optional. Specifying an <escape-string> allows the value of a field to contain an embedded delimiter. As a simple example, if the <delimiter-string> specified a comma and the <escape-string> specified a backslash, then an input of "a,b" would be split into two fields "a" and "b", but an input of "a\\,b" would result in a single field "a,b".

Because Amazon Kinesis Data Analytics supports [Expressions and Literals](sql-reference-expressions.md), a tab can also be a delimiter, specified using a Unicode escape, for example u&'\0009', which is a string consisting only of a tab character.

Specifying a <quote-string> is another way to hide an embedded delimiter. The <quote-string> should be a one or two character expression: the first is used as the <begin quote character> character; the second, if present, is used as the <end quote character> character. If only one character is supplied, it is used as both to begin and to end quoted strings. When the input includes a quoted string, that is, a string enclosed in the characters specified as <quote-string>, then that string appears in one field, even if it contains a delimiter.

Note that the <begin quote character> and <end quote character> are single characters and can be different. The <begin quote character> can be used to start and end the quoted string, or the <begin quote character> can start the quoted string and the <end quote character> used to end that quoted string.

When a list of columns <list of columns> is supplied as the second parameter <columns>, the column specifications (<column description>) for types DATE, TIME, and TIMESTAMP support a format parameter allowing the user to specify exact time component layout. The parser uses the Java class [java.lang.SimpleDateFormat](https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html) to parse the strings for those types. [Date and Time Patterns](sql-reference-parse-timestamp-format.md) gives a full description of timestamp format strings, with examples. The following is an example of a column definition with a format string:

```
    "name" TYPE TIMESTAMP 'dd/MMM/yyyy:HH:mm:ss'
```

By default, the output columns are named COLUMN1, COLUMN2, COLUMN3, etc., each of SQL data type VARCHAR(1024).
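
For example, the following sketch (illustrative input) splits a comma-separated string into three columns, using a backslash as the escape string and a double quote as the quote string:

```
select t.r."COLUMN1", t.r."COLUMN2", t.r."COLUMN3" from
   (values (VARIABLE_COLUMN_LOG_PARSE('a,"b,c",d', 3, ',', '\', '"'))) t(r);
-- The quoted field keeps its embedded comma; the three fields
-- should be a, b,c, and d.
```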

# W3C\_LOG\_PARSE
<a name="sql-reference-w3c-log-parse"></a>



```
 W3C_LOG_PARSE( <character-expression>, <format-string> )
 <format-string> := '<predefined-format> | <custom-format>'
 <predefined format> :=
         COMMON
       | COMMON WITH VHOST
       | NCSA EXTENDED
       | REFERER
       | AGENT
       | IIS
 <custom-format> := [an Apache log format specifier]
```

## W3C Predefined Formats
<a name="sql-reference-w3c-log-parse-predefined"></a>

Specifying one of the following W3C-predefined-format names is shorthand for the set of format specifiers indicated, as shown in the following statement:

```
 select stream W3C_LOG_PARSE(message, 'COMMON') r  from w3ccommon t;
```


| Format Name | W3C Name | Format Specifiers | 
| --- | --- | --- | 
|  COMMON  |  Common Log Format (CLF)  |  %h %l %u %t "%r" %>s %b  | 
|  COMMON WITH VHOST  |  Common Log Format with Virtual Host  |  %v %h %l %u %t "%r" %>s %b  | 
|  NCSA EXTENDED  |  NCSA extended/combined log format  |  %h %l %u %t "%r" %>s %b "%[Referer]i" "%[User-agent]i"  | 
|  REFERER  |  Referer log format  |  %[Referer]i ---> %U  | 
|  AGENT  |  Agent (Browser) log format  |  %[User-agent]i  | 
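
Following the style of the COMMON statement above, a sketch using the NCSA EXTENDED predefined format (which adds the Referer and User-agent columns):

```
 select stream W3C_LOG_PARSE(message, 'NCSA EXTENDED') r  from w3ccommon t;
```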

## W3C Format Specifiers
<a name="sql-reference-w3c-log-parse-specfics"></a>

The format specifiers are listed in the table below. W3C\_LOG\_PARSE automatically detects these specifiers and outputs records with one column for each specifier. The column's type is automatically chosen based on the possible outputs of the specifier. For example, %B represents the number of bytes sent in processing an HTTP request, so the column type is numeric. For %b, however, zero bytes is represented by a dash ("-"), forcing the column type to be text. The note following the tables explains what the "..." and "<" or ">" markings shown in the specifier table mean.

The following table lists W3C format specifiers alphabetically by command.


| Format Specifier | Explanation | 
| --- | --- | 
|  %  |  The percent sign (Apache 2.0.44 and later)  | 
|  %...a  |  Remote IP-address  | 
|  %...A  |  Local IP-address  | 
|  %...B  |  Size of response in bytes, excluding HTTP headers.  | 
|  %...b  |  Size of response in bytes, excluding HTTP headers, in CLF format, which means that when no bytes are sent, uses a '-' rather than a 0.  | 
|  %...[Customerdata]C  |  The contents of cookie Customerdata in the request sent to the server.  | 
|  %...D  |  The time taken to serve the request, in microseconds.  | 
|  %...[CUSTOMERDATA]e  |  The contents of the environment variable CUSTOMERDATA  | 
|  %...f  |  Filename  | 
|  %...h  |  Remote host  | 
|  %...H  |  The request protocol  | 
|  %...[Customerdata]i  |  The contents of Customerdata: header line(s) in the request sent to the server.  | 
|  %...l  |  Remote logname (from identd, if supplied)  | 
|  %...m  |  The request method  | 
|  %...[Customerdata]n  |  The contents of note Customerdata from another module.  | 
|  %...[Customerdata]o  |  The contents of Customerdata: header line(s) in the reply.  | 
|  %...p  |  The canonical port of the server serving the request  | 
|  %...P  |  The process ID of the child that serviced the request.  | 
|  %...[format]P  |  The process ID or thread id of the child that serviced the request. Valid formats are pid and tid. (Apache 2.0.46 and later)  | 
|  %...q  |  The query string (prepended with a ? if a query string exists, otherwise an empty string)  | 
|  %...r  |  First line of request  | 
|  %...s  |  Status. For requests that got internally redirected, this is the status of the *original* request; %...>s is the status of the last.  | 
|  %...t  |  Time, in common log format time format (standard English format)  | 
|  %...[format]t  |  The time, in the form given by format, which should be in strftime(3) format. (potentially localized)  | 
|  %...T  |  The time taken to serve the request, in seconds.  | 
|  %...u  |  Remote user (from auth; may be bogus if return status (%s) is 401)  | 
|  %...U  |  The URL path requested, not including any query string.  | 
|  %...v  |  The canonical ServerName of the server serving the request.  | 
|  %...V  |  The server name according to the UseCanonicalName setting.  | 
|  %...X  |  Connection status when the response is completed. X = connection aborted before the response completed. \+ = connection may be kept alive after the response is sent. - = connection will be closed after the response is sent. (The %...X directive was %...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...[var]c syntax.)  | 
|  %...I  |  Bytes received, including request and headers; cannot be zero. You need to enable [mod\_logio](http://httpd.apache.org/docs/2.0/mod/mod_logio.html) to use this.  | 
|  %...O  |  Bytes sent, including headers; cannot be zero. You need to enable [mod\_logio](http://httpd.apache.org/docs/2.0/mod/mod_logio.html) to use this.  | 

**Note**  <a name="noteA"></a>
Some W3C format specifiers are shown as containing a "..." indication or a "<" or ">", which are optional controls on suppressing or redirecting the output of that specifier. The "..." can either be empty (as in "%h %u %r %s %b") or it can indicate conditions for including the item. The conditions are a list of HTTP status codes, possibly preceded by "!"; if the specified condition is not met, then the column or field returned shows "-".   
For example, as described in the [Apache documentation](http://httpd.apache.org/docs/2.0/mod/mod_log_config.html), specifying "%400,501[User-agent]i" will log the User-agent only on 400 errors and 501 errors (Bad Request, Not Implemented). Similarly, "%!200,304,302[Referer]i" will log the Referer on all requests that fail to return some sort of normal status.   
The modifiers "<" and ">" can be used to choose whether the original or final (respectively) request should be consulted when a request has been internally redirected. By default, the %s, %U, %T, %D, and %r directives look at the original request while all others look at the final request. So, for example, %>s can be used to record the final status of the request, and %<u can be used to record the original authenticated user on a request that is internally redirected to an unauthenticated resource.   
For security reasons, starting with Apache 2.0.46, non-printable and other special characters are escaped mostly by using \xhh sequences, where hh stands for the hexadecimal representation of the raw byte. Exceptions from this rule are " and \\, which are escaped by prepending a backslash, and all white space characters, which are written in their C-style notation (\n, \t, and so on). In httpd 2.0 versions prior to 2.0.46, no escaping was performed on the strings from %...r, %...i, and %...o, so great care was needed when dealing with raw log files, because clients could have inserted control characters into the log.   
Also, in httpd 2.0, the %b and %B format strings represent simply the size in bytes of the HTTP response (which will differ, for instance, if the connection is aborted, or if SSL is used). For the actual number of bytes sent over the network to the client, use the %O format provided by [mod\_logio](http://httpd.apache.org/docs/2.0/mod/mod_logio.html).

## W3C Format Specifiers by Function or Category
<a name="w2aac22c23c17c11"></a>

The categories are bytes sent, connection status, content of environment variable, filename, host, IP, notes, protocol, query string, replies, requests, and time. For the markings "...", "<", or ">", see the previous note.


| Function or Category | W3C Format Specifiers | 
| --- | --- | 
|  **Bytes sent, excluding HTTP headers**  | 
|  with a "0" when no bytes are sent  |  %...B  | 
|  with a "-" (CLF format) when no bytes are sent  |  %...b  | 
|  Bytes received, including request and headers; cannot be zero. Must enable [mod\_logio](http://httpd.apache.org/docs/2.0/mod/mod_logio.html) to use this.  |  %...I  | 
|  Bytes sent, including headers; cannot be zero. Must enable [mod\_logio](http://httpd.apache.org/docs/2.0/mod/mod_logio.html) to use this.  |  %...O  | 
|  **Connection status when response is completed**  | 
|  Connection aborted before the response completed  |  X  | 
|  Connection may be kept alive after the response is sent  |  \+  | 
|  Connection will be closed after the response is sent  |  -  | 
|   The %...X directive was %...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...[var]c syntax.   | 
|  **Environment variable CUSTOMERDATA**  | 
|  contents  |  %...[CUSTOMERDATA]e  | 
|  **Filename**  |  %...f  | 
|  **Host (remote)**  |  %...h  | 
|  **Protocol**  |  %...H  | 
|  **IP addresses**  | 
|  Remote  |  %...a  | 
|  Local  |  %...A  | 
|  **Notes**  | 
|  Contents of note Customerdata from another module  |  %...[Customerdata]n  | 
|  **Protocol (request)**  |  %...H  | 
|  **Query string**  If query exists, prepended with a ? If not, the empty string.   |  %...q  | 
|  **Replies**  | 
|  Contents of Customerdata (header lines in the reply)  |  %...[Customerdata]o  | 

The W3C format specifiers for the requests and time categories are listed in the following table.


| Function or Category | W3C Format Specifiers | 
| --- | --- | 
|  **Requests**  | 
|  Canonical port of the server serving the request  |  %...p  | 
|  Contents of cookie Customerdata in the request sent to server  |  %...[Customerdata]C  | 
|  Contents of BAR: header line(s)  |  %...[BAR]i  | 
|  First line sent:  |  %...r  | 
|  Microseconds taken to serve a request  |  %...D  | 
|  Protocol  |  %...H  | 
|  Process ID of the child that serviced the request  |  %...P  | 
|  Process ID or thread id of the child that serviced the request. Valid formats are pid and tid. (Apache 2.0.46 and later)  |  %...[format]P  | 
|  Remote logname (from identd, if supplied)  |  %...l  | 
|  Remote user: (from auth; may be bogus if return status (%s) is 401)  |  %...u  | 
|  Server (canonical ServerName) serving the request  |  %...v  | 
|  Server name by the UseCanonicalName setting  |  %...V  | 
|  Request method  |  %...m  | 
|  Return status  |  %s  | 
|  Seconds taken to serve the request  |  %...T  | 
|  Status of the *original* request that was internally redirected  |  %...s  | 
|  Status of the last request  |  %...>s  | 
|  URL path requested, not including any query string  |  %...U  | 
|  **Time**  | 
|  Common log format time format (standard English format)  |  %...t  | 
|  Time in strftime(3) format, potentially localized  |  %...[format]t  | 
|  Seconds taken to serve the request  |  %...T  | 

## W3C Examples
<a name="w2aac22c23c17c13"></a>

W3C_LOG_PARSE supports access to logs generated by W3C-compliant applications such as the Apache web server, producing output rows with one column for each specifier. The data types are derived from the log entry description specifiers listed in the [Apache mod_log_config](http://httpd.apache.org/docs/2.0/mod/mod_log_config.html?#formats) specification.

## Example 1
<a name="sql-reference-w3c-log-parse-info"></a>

The input in this example is taken from an Apache log file and is representative of the COMMON log format.

### Input
<a name="sql-reference-w3c-log-parse-info-input"></a>

```
(192.168.254.30 - John [24/May/2004:22:01:02 -0700]
                     "GET /icons/apache_pb.gif HTTP/1.1" 304 0),
(192.168.254.30 - Jane [24/May/2004:22:01:02 -0700]
                     "GET /icons/small/dir.gif HTTP/1.1" 304 0);
```

### DDL
<a name="sql-reference-w3c-log-parse-info-ddl"></a>

```
CREATE OR REPLACE PUMP weblog AS
        SELECT STREAM
            l.r.COLUMN1,
            l.r.COLUMN2,
            l.r.COLUMN3,
            l.r.COLUMN4,
            l.r.COLUMN5,
            l.r.COLUMN6,
            l.r.COLUMN7
        FROM (SELECT STREAM W3C_LOG_PARSE(message, 'COMMON')
              FROM "weblog_read") AS l(r);
```

### Output
<a name="sql-reference-w3c-log-parse-info-output"></a>

```
 192.168.254.30 -  John  [24/May/2004:22:01:02 -0700] GET /icons/apache_pb.gif HTTP/1.1  304  0
 192.168.254.30 -  Jane  [24/May/2004:22:01:02 -0700] GET /icons/small/dir.gif HTTP/1.1  304  0
```

### Details
<a name="sql-reference-w3c-log-parse-details"></a>

The specification of COMMON in the FROM clause means the Common Log Format (CLF), which uses the specifiers %h %l %u %t "%r" %>s %b.

The [W3C-predefined formats](https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-w3c-log-parse.html#sql-reference-w3c-log-parse-predefined) table shows the COMMON and other predefined specifier sets.


The table below, Specifiers used by the Common Log Format, describes the specifiers used by COMMON in the FROM clause.

## Specifiers Used by the Common Log Format
<a name="w2aac22c23c17c17"></a>


| Output Column | Format Specifier | Returns | 
| --- | --- | --- | 
|  COLUMN1  |  %h  |  The IP address of the remote host  | 
|  COLUMN2  |  %l  |  The remote logname  | 
|  COLUMN3  |  %u  |  The remote user  | 
|  COLUMN4  |  %t  |  The time  | 
|  COLUMN5  |  "%r"  |  The first line of the request  | 
|  COLUMN6  |  %>s  |  The status: For internally redirected requests, the status of the *original* request; %...>s gives the status of the last.  | 
|  COLUMN7  |  %b  |  The number of bytes sent, excluding HTTP headers  | 
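
As an intuition aid, the following Python sketch parses the example log line with a hand-written regular expression for the COMMON specifiers. The regex is an illustration only; the service's internal parser may differ.

```python
import re

# Hand-written regex for the COMMON format (%h %l %u %t "%r" %>s %b).
# Illustrative only; W3C_LOG_PARSE's internal parser may differ.
CLF = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'(?P<time>\[[^\]]+\]) "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

line = ('192.168.254.30 - John [24/May/2004:22:01:02 -0700] '
        '"GET /icons/apache_pb.gif HTTP/1.1" 304 0')

m = CLF.match(line)
# One value per specifier, in the same order as the output columns above.
columns = [m.group(g) for g in
           ('host', 'logname', 'user', 'time', 'request', 'status', 'bytes')]
print(columns)
```

Each group corresponds to one of COLUMN1 through COLUMN7 in the example output.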

## Example 2
<a name="w2aac22c23c17c19"></a>

The DDL in this example shows how to rename output columns and filter out unneeded columns.

### DDL
<a name="sql-reference-w3c-log-parse-ddl"></a>

```
    CREATE OR REPLACE VIEW "Schema1".weblogreduced AS
        SELECT STREAM CAST(s.COLUMN3 AS VARCHAR(5)) AS LOG_USER,
        CAST(s.COLUMN1 AS VARCHAR(15)) AS ADDRESS,
        CAST(s.COLUMN4 AS VARCHAR(30)) as TIME_DATES
        FROM "Schema1".weblog s;
```

### Output
<a name="sql-reference-w3c-log-parse-output"></a>

```
    +----------+-----------------+--------------------------------+
    | LOG_USER |     ADDRESS     |           TIME_DATES           |
    |          |                 |                                |
    +----------+-----------------+--------------------------------+
    | Jane     | 192.168.254.30  | [24/May/2004:22:01:02 -0700]   |
    |          |                 |                                |
    | John     | 192.168.254.30  | [24/May/2004:22:01:02 -0700]   |
    +----------+-----------------+--------------------------------+
```

## W3C Customized Formats
<a name="w2aac22c23c17c21"></a>

The same results would be created by naming the specifiers directly rather than using the "COMMON" name, as shown following: 

```
    CREATE OR REPLACE FOREIGN STREAM schema1.weblog
        SERVER logfile_server
        OPTIONS (LOG_PATH '/path/to/logfile',
                 ENCODING 'UTF-8',
                 SLEEP_INTERVAL '10000',
                 MAX_UNCHANGED_STATS '10',
                 PARSER 'W3C',
                 PARSER_FORMAT '%h %l %u %t \"%r\" %>s %b');
```

Or:

```
     CREATE FOREIGN STREAM "Schema1".weblog_read
     SERVER "logfile_server"
     OPTIONS (log_path '/path/to/logfile',
     encoding 'UTF-8',
     sleep_interval '10000',
     max_unchanged_stats '10');

     CREATE OR REPLACE VIEW "Schema1".weblog AS
        SELECT STREAM
            l.r.COLUMN1,
            l.r.COLUMN2,
            l.r.COLUMN3,
            l.r.COLUMN4,
            l.r.COLUMN5,
            l.r.COLUMN6
        FROM (SELECT STREAM W3C_LOG_PARSE(message, '%h %l %u %t \"%r\" %>s %b')
              FROM "Schema1".weblog_read) AS l(r);
```

**Note**  
If you change %t to [%t], the date column contains the following:  

```
        24/May/2004:22:01:02 -0700
```
(instead of `[24/May/2004:22:01:02 -0700]`)

# Sorting Functions
<a name="sql-reference-sorting-functions"></a>

The topics in this section describe the sorting functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [Group Rank](sql-reference-group-rank-udx.md)

# Group Rank
<a name="sql-reference-group-rank-udx"></a>

This function applies a `RANK()` function to logical groups of rows and optionally delivers the group in sorted order.

Applications of `group_rank` include the following:
+ To sort results of a streaming `GROUP BY`.
+ To determine a relationship within the results of a group.

Group Rank can do the following actions:
+ Apply rank to a specified input column.
+ Supply either sorted or unsorted output.
+ Enable the user to specify a period of inactivity for data flush.

## SQL Declarations
<a name="sql-reference-group-rank-udx-declarations"></a>

The functional attributes and DDL are described in the sections that follow.
<a name="sql-reference-group-rank-udx-attributes"></a>
**Functional Attributes for group_rank**  
This function acts as follows:
+ Gathers rows until either a rowtime change is detected or a specified idle-time limit is exceeded.
+ Accepts any streaming rowset.
+ Uses any column with a basic SQL data type of `INTEGER`, `CHAR`, `VARCHAR` as the column by which to do the ranking.
+ Orders the output rows either in the order received or in ascending or descending order of values in the selected column.

### DDL for group_rank
<a name="sql-reference-group-rank-udx-udl"></a>

```
  group_rank(c cursor, rankByColumnName VARCHAR(128),
    rankOutColumnName VARCHAR(128), sortOrder VARCHAR(10), outputOrder VARCHAR(10),
    maxIdle INTEGER, outputMax INTEGER)
  returns table(c.*, "groupRank" INTEGER)
```

The parameters to the function are listed in the following table.


| Parameter | Description | 
| --- | --- | 
| c | CURSOR to Streaming Result Set | 
|  `rankByColumnName`  |  String naming the column to use for ranking the group.  | 
|  `rankOutColumnName`  |  String naming the column to use for returning the rank. This string must match the name of the `groupRank` column in the `RETURNS` clause of the `CREATE FUNCTION` statement.  | 
|  `sortOrder`  |  Controls ordering of rows for rank assignment. Valid values are as follows: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-group-rank-udx.html)  | 
|  `outputOrder`  |  Controls ordering of output. Valid values are as follows: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sql-reference-group-rank-udx.html)  | 
|  `maxIdle`  |  Time limit in milliseconds for holding a group for ranking. When `maxIdle` expires, the current group is released to the stream. A value of zero indicates no idle timeout.  | 
|  `outputMax`  |  Maximum number of rows the function outputs in a given group. A value of 0 indicates no limit.  | 
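
The core behavior can be sketched in Python as follows. This is an illustrative model, not the UDX implementation; `maxIdle` handling and sorted output order are omitted for brevity.

```python
from itertools import groupby
from operator import itemgetter

# Sketch of group_rank semantics: buffer rows that share a rowtime,
# rank them by one column, and emit each row with a groupRank column.
def group_rank(rows, rank_by, sort_order='desc', output_max=0):
    out = []
    for _, group in groupby(rows, key=itemgetter('rowtime')):
        ordered = sorted(group, key=itemgetter(rank_by),
                         reverse=(sort_order == 'desc'))
        if output_max:
            ordered = ordered[:output_max]   # cap rows emitted per group
        for rank, row in enumerate(ordered, start=1):
            out.append({**row, 'groupRank': rank})
    return out

rows = [
    {'rowtime': 1, 'ticker': 'AMZN', 'ticker_count': 7},
    {'rowtime': 1, 'ticker': 'TBV',  'ticker_count': 3},
    {'rowtime': 1, 'ticker': 'INTC', 'ticker_count': 5},
]
ranked = group_rank(rows, 'ticker_count')
print([(r['ticker'], r['groupRank']) for r in ranked])
```

With a descending sort order, the ticker with the highest count receives rank 1.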

## Example
<a name="sql-reference-group-rank-example"></a>

### Example Dataset
<a name="sql-reference-group-rank-example-dataset"></a>

The following example is based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Data Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Data Analytics application that has the sample stock ticker input stream. To learn how to create an analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Data Analytics Developer Guide*. 

The sample stock dataset has the following schema:

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Sort the Results of a GROUP BY Clause
<a name="sql-reference-group-rank-example-1"></a>

In this example, the aggregate query has a `GROUP BY` clause on `ROWTIME` that groups the stream into finite rows. The `GROUP_RANK` function then sorts the rows returned by the `GROUP BY` clause.

```
CREATE OR REPLACE STREAM "ticker_grouped" (
    "group_time" TIMESTAMP,
    "ticker" VARCHAR(65520),
    "ticker_count" INTEGER);

CREATE OR REPLACE STREAM "destination_sql_stream" (
    "group_time" TIMESTAMP,
    "ticker" VARCHAR(65520),
    "ticker_count" INTEGER,
    "group_rank" INTEGER);
    
CREATE OR REPLACE PUMP "ticker_pump" AS 
    INSERT INTO "ticker_grouped"
    SELECT STREAM
        FLOOR(SOURCE_SQL_STREAM_001.ROWTIME TO SECOND),
        "TICKER_SYMBOL",
        COUNT(TICKER_SYMBOL)
    FROM SOURCE_SQL_STREAM_001
    GROUP BY FLOOR(SOURCE_SQL_STREAM_001.ROWTIME TO SECOND), TICKER_SYMBOL;

CREATE OR REPLACE PUMP DESTINATION_SQL_STREAM_PUMP AS 
    INSERT INTO "destination_sql_stream"
    SELECT STREAM
        "group_time",
        "ticker",
        "ticker_count",
        "groupRank"
    FROM TABLE(
        GROUP_RANK(
            CURSOR(SELECT STREAM * FROM "ticker_grouped"), 
            'ticker_count', 
            'groupRank', 
            'desc', 
            'asc', 
            5, 
            0));
```

### Results
<a name="sql-reference-group-rank-example-results"></a>

The preceding example outputs a stream similar to the following.

![\[Data table showing ROWTIME, group_time, ticker, ticker_count, and group_rank columns with sample values.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-group-rank-example-01.png)


## Operational Overview
<a name="sql-reference-group-rank-udx-ops"></a>

Rows are buffered from the input cursor for each group (that is, rows with the same rowtime). Ranking is performed either after a row with a different rowtime arrives or when the idle timeout occurs. Rows continue to be read while ranking is performed on the group of rows with the same rowtime.

The `outputMax` parameter specifies the maximum number of rows to be returned for each group after ranks are assigned.

By default, `group_rank` supports column pass-through, as the example illustrates by using `c.*` as the standard shortcut directing pass-through of all input columns in the order presented. You can instead name a subset using the notation `c.columnName`, allowing you to reorder the columns. However, using specific column names ties the UDX to a specific input set, whereas the `c.*` notation allows the UDX to handle any input set.

The `rankOutColumnName` parameter specifies the output column used to return ranks. This column name must match the column name specified in the `RETURNS` clause of the `CREATE FUNCTION` statement.

# Statistical Variance and Deviation Functions
<a name="sql-reference-statistical-variance-deviation-functions"></a>

Each of these functions takes a set of numbers, ignores nulls, and can be used as either an aggregate function or an analytic function. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md) and [Analytic Functions](sql-reference-analytic-functions.md).

The relationships among these functions are described in the following table.


| Function purpose | Function name | Formula | Comments | 
| --- | --- | --- | --- | 
|  Hotspots  |  [HOTSPOTS](sqlrf-hotspots.md) (expr)  |  Detects hotspots of frequently occurring data in the data stream.  |    | 
|  Random Cut Forest  |  [RANDOM_CUT_FOREST](sqlrf-random-cut-forest.md) (expr)  |  Detects anomalies in the data stream.  |    | 
|  Random Cut Forest with Explanation  |  [RANDOM_CUT_FOREST_WITH_EXPLANATION](sqlrf-random-cut-forest-with-explanation.md) (expr)  |  Detects anomalies in the data stream, and returns an attribution score based on how anomalous the data in each column is.  |    | 
|  Population variance  |  [VAR_POP](sql-reference-VARPOP.md)(expr)  |  (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)  |  Applied to an empty set, it returns null.  | 
|  Population standard deviation  |  [STDDEV_POP](sql-reference-STDDEVPOP.md)  |  Square root of the population variance (VAR_POP).  |  When VAR_POP returns null, STDDEV_POP returns null.  | 
|  Sample variance  |  [VAR_SAMP](sql-reference-VARSAMP.md)  |  (SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / (COUNT(expr) - 1)  |  Applied to an empty set, it returns null. Applied to an input set of one element, VAR_SAMP returns null.  | 
|  Sample standard deviation  |  [STDDEV_SAMP](sql-reference-STDDEVSAMP.md) (expr)  |  Square root of the sample variance (VAR_SAMP).  |  Applied to only one row of input data, STDDEV_SAMP returns null.  | 
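
The formulas in the table can be checked numerically. The sketch below evaluates the population and sample variance formulas on a small sample (the sample values are illustrative):

```python
import math

# Evaluate the table's formulas on a small sample (no nulls present).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
s = sum(data)                      # SUM(expr)
ss = sum(x * x for x in data)      # SUM(expr*expr)

var_pop = (ss - s * s / n) / n           # population variance
var_samp = (ss - s * s / n) / (n - 1)    # sample variance
stddev_pop = math.sqrt(var_pop)          # STDDEV_POP
stddev_samp = math.sqrt(var_samp)        # STDDEV_SAMP

print(var_pop, stddev_pop)   # 4.0 2.0 for this sample
```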

# HOTSPOTS
<a name="sqlrf-hotspots"></a>

Detects *hotspots*, or regions of activity that is significantly higher than the norm, in your data stream. A hotspot is defined as a small region of space that is relatively dense with data points. 

Using the `HOTSPOTS` function, you can use a simple SQL function to identify relatively dense regions in your data without having to explicitly build and train complicated machine learning models. You can then identify subsections of your data that need attention so that you can take immediate action.

For example, hotspots in your data might indicate a collection of overheated servers in a data center, a high concentration of vehicles indicating a traffic bottleneck, ride share rides in a certain area indicating a high-traffic event, or increased sales of products in a category that share similar features.

**Note**  
The ability of the `HOTSPOTS` function to detect frequent data points is application-dependent. To cast your business problem so that it can be solved with this function requires domain expertise. For example, you might want to determine which combination of columns in your input stream to pass to the function, and how to normalize the data if necessary. 

The algorithm accepts the `DOUBLE`, `INTEGER`, `FLOAT`, `TINYINT`, `SMALLINT`, `REAL`, and `BIGINT` data types. DECIMAL is not a supported type. Use DOUBLE instead.

**Note**  
The `HOTSPOTS` function does not return the records that make up a hotspot. You can use the `ROWTIME` column to determine which records belong to a given hotspot.



## Syntax
<a name="hotspots-syntax"></a>

```
HOTSPOTS (inputStream,    
  windowSize,
  scanRadius,
  minimumNumberOfPointsInAHotspot)
```

## Parameters
<a name="hotspots-parameters"></a>

The following sections describe the `HOTSPOTS` function parameters. 

### inputStream
<a name="hotspots-input-stream"></a>

Pointer to your input stream. You set a pointer using the `CURSOR` function. For example, the following statements set a pointer to `InputStream`:

 

```
--Select all columns from input stream 
CURSOR(SELECT STREAM * FROM InputStream) 
--Select specific columns from input stream
CURSOR(SELECT STREAM PRICE, CHANGE FROM InputStream) 
-- Normalize the column X value.
CURSOR(SELECT STREAM IntegerColumnX / 100, IntegerColumnY FROM InputStream) 
-- Combine columns before passing to the function.
CURSOR(SELECT STREAM IntegerColumnX - IntegerColumnY FROM InputStream)
```

**Note**  
Only numeric columns from the input stream will be analyzed for hotspots. The `HOTSPOTS` function ignores other columns included in the cursor.

### windowSize
<a name="hotspots-windowsize"></a>

Specifies the number of records that are considered for each timestep by the sliding window over the stream.

You can set this value between 100 and 1000 inclusive. 

By increasing the window size, you can get a better estimate of hotspot position and density (relevance), but this also increases the running time.

### scanRadius
<a name="hotspots-scan-radius"></a>

Specifies the typical distance between a hotspot point and its nearest neighbors. 

This parameter is analogous to the **ε** value in the [DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) algorithm.

Set this parameter to a value that is smaller than the typical distance between points that are not in a hotspot, but large enough so that points in a hotspot have neighbors within this distance. 

You can set this value to any double value greater than zero. The lower the value for `scanRadius`, the more similar any two records belonging to the same hotspot are. However, low values for `scanRadius` also increase running time. Lower values of `scanRadius` result in hotspots that are smaller but more numerous.

### minimumNumberOfPointsInAHotspot
<a name="hotspots-minimumpoints"></a>

Specifies the minimum number of records required to form a hotspot. 

**Note**  
This parameter should be set in consideration with [windowSize](#hotspots-windowsize). It is best to think of `minimumNumberOfPointsInAHotspot` as some fraction of `windowSize`. The exact fraction is discoverable through experimentation. 

You can set this value between 2 and your configured value for window size inclusive. Choose a value that best models the problem you are solving in light of your chosen value for window size. 
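
The interplay of `scanRadius` and `minimumNumberOfPointsInAHotspot` resembles the DBSCAN core-point test, which can be sketched as follows. This is an analogy only, not the service's implementation, and the sample points are made up:

```python
import math

# DBSCAN-style core-point test: a record sits in a hotspot if at least
# minimum_points records in the window fall within scan_radius of it.
def in_hotspot(point, window, scan_radius, minimum_points):
    neighbors = sum(1 for q in window
                    if math.dist(point, q) <= scan_radius)
    return neighbors >= minimum_points

# Four tightly packed points near the origin plus one distant point.
window = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(in_hotspot((0.05, 0.05), window, scan_radius=0.2, minimum_points=4))  # True
print(in_hotspot((5.0, 5.0), window, scan_radius=0.2, minimum_points=4))    # False
```

The distant point fails the test because too few neighbors fall within the scan radius, which mirrors why a larger `minimumNumberOfPointsInAHotspot` suppresses sparse regions.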

## Output
<a name="hotspots-output"></a>

The output of the HOTSPOTS function is a table object that has the same schema as the input, with the following additional column:

### HOTSPOTS_RESULT
<a name="hotspots-hotspots"></a>

A JSON string describing all the hotspots found around the record. The function returns all potential hotspots; you can filter out hotspots below a certain `density` threshold in your application. The field has the following nodes, with values for each of the input columns:
+ `density`: The number of records in the hotspot divided by the hotspot size. You can use this value to determine the relative relevance of the hotspot.
+ `maxValues`: The maximum values for the records in the hotspot for each data column.
+ `minValues`: The minimum values for the records in the hotspot for each data column.

Data type: VARCHAR.
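
For example, an application might parse the JSON string and keep only hotspots above a density threshold. The payload below is fabricated for illustration; check the actual field layout your application receives.

```python
import json

# Fabricated sample of a HOTSPOTS_RESULT value with two hotspots.
hotspots_result = json.dumps({
    "hotspots": [
        {"density": 80.0, "minValues": [0.1, 0.2], "maxValues": [0.3, 0.4]},
        {"density": 2.5,  "minValues": [5.0, 5.0], "maxValues": [6.0, 6.0]},
    ]
})

# Keep only hotspots whose density clears an application-chosen threshold.
dense = [h for h in json.loads(hotspots_result)["hotspots"]
         if h["density"] >= 10.0]
print(len(dense))  # 1
```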

**Note**  
The trends that machine learning functions use to determine analysis scores are infrequently reset when the Kinesis Data Analytics service performs service maintenance. You might unexpectedly see analysis scores of 0 after service maintenance occurs. We recommend you set up filters or other mechanisms to treat these values appropriately as they occur.

## Example
<a name="hotspots-examples"></a>

The following example executes the `HOTSPOTS` function on the demo stream, which contains random data without meaningful hotspots. For an example that executes the `HOTSPOTS` function on a custom data stream with meaningful data hotspots, see [Example: Detect Hotspots](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/app-hotspots-detection.html).

### Example Dataset
<a name="hotspots-examples-dataset"></a>

The following example is based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Data Analytics Developer Guide*. To run the example, you need a Kinesis Data Analytics application that has the sample stock ticker input stream. To learn how to create a Kinesis Data Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Data Analytics Developer Guide*. 

The sample stock dataset has the following schema:

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return Hotspots on the Sample Data Stream
<a name="hotspots_example_1"></a>

In this example, a destination stream is created for the output of the HOTSPOTS function. A pump is then created that runs the HOTSPOTS function on the specified values in the sample data stream.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM"(
    CHANGE REAL, 
    PRICE REAL,
    HOTSPOTS_RESULT VARCHAR(10000));
    
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT 
            "CHANGE", 
            "PRICE",
            "HOTSPOTS_RESULT" 
        FROM TABLE (
            HOTSPOTS(
                CURSOR(SELECT STREAM "CHANGE", "PRICE" FROM "SOURCE_SQL_STREAM_001"), 
                100, 
                0.013, 
                20)
            );
```

#### Results
<a name="hotspots_example_1_results"></a>

This example outputs a stream similar to the following.

![\[Results of SQL code showing rowtime, hotspot, and hotspot_results.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/hotspots-example-1.png)


# RANDOM_CUT_FOREST
<a name="sqlrf-random-cut-forest"></a>

Detects anomalies in your data stream. A record is an anomaly if it is distant from other records. To detect anomalies in individual record columns, see [RANDOM_CUT_FOREST_WITH_EXPLANATION](sqlrf-random-cut-forest-with-explanation.md).

 

**Note**  
The `RANDOM_CUT_FOREST` function's ability to detect anomalies is application-dependent. Casting your business problem so that it can be solved with this function requires domain expertise. For example, you might need to determine which combination of columns in your input stream to pass to the function, and whether to normalize the data. For more information, see [inputStream](#random-cut-forest-input-view).

A stream record can have non-numeric columns, but the function uses only numeric columns to assign an anomaly score. A record can have one or more numeric columns. The algorithm uses all of the numeric data in computing an anomaly score.  If a record has ***n*** numeric columns, the underlying algorithm assumes each record is a point in ***n***-dimensional space. A point in ***n***-dimensional space that is distant from other points receives a higher anomaly score. 

The algorithm starts developing the machine learning model using current records in the stream when you start the application. The algorithm does not use older records in the stream for machine learning, nor does it use statistics from previous executions of the application.

The algorithm accepts the `DOUBLE`, `INTEGER`, `FLOAT`, `TINYINT`, `SMALLINT`, `REAL`, and `BIGINT` data types.

 

**Note**  
DECIMAL is not a supported type. Use DOUBLE instead.

The following is an example of anomaly detection. The diagram shows three clusters and a few anomalies randomly interjected. The red squares show the records that received the highest anomaly score according to the `RANDOM_CUT_FOREST` function. The blue diamonds represent the remaining records.  Note how the highest scoring records tend to be outside the clusters. 

 

![\[Scatter plot showing three clusters of blue data points and several red outlier points.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/rcf-10.png)




For a sample application with step-by-step instructions, see [Detect Anomalies](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/app-anomaly-detection.html).

## Syntax
<a name="random-cut-forest-syntax"></a>

 

```
RANDOM_CUT_FOREST (inputStream,    
                   numberOfTrees,
                   subSampleSize,
                   timeDecay,
                   shingleSize)
```



## Parameters
<a name="random-cut-forest-parameters"></a>

The following sections describe the parameters. 

### inputStream
<a name="random-cut-forest-input-view"></a>

Pointer to your input stream. You set a pointer using the `CURSOR` function. For example, the following statements set a pointer to `InputStream`:

 

```
CURSOR(SELECT STREAM * FROM InputStream) 
CURSOR(SELECT STREAM IntegerColumnX, IntegerColumnY FROM InputStream) 
-- Perhaps normalize the column X value.
CURSOR(SELECT STREAM IntegerColumnX / 100, IntegerColumnY FROM InputStream) 
-- Combine columns before passing to the function.
CURSOR(SELECT STREAM IntegerColumnX - IntegerColumnY FROM InputStream)
```

The `CURSOR` function is the only required parameter for the `RANDOM_CUT_FOREST` function. The function assumes the following default values for the other parameters:

+ `numberOfTrees` = 100
+ `subSampleSize` = 256
+ `timeDecay` = 100,000
+ `shingleSize` = 1

When using this function, your input stream can have up to 30 numeric columns. 

### numberOfTrees
<a name="random-cut-forest-no-of-trees"></a>

Using this parameter, you specify the number of random cut trees in the forest.  

**Note**  
By default, the algorithm constructs a number of trees, each constructed using a given number of sample records (see `subSampleSize` later in this list) from the input stream. The algorithm uses each tree to assign an anomaly score. The average of all these scores is the final anomaly score.

The default value for `numberOfTrees` is 100. You can set this value between 1 and 1,000 (inclusive). By increasing the number of trees in the forest, you can get a better estimate of the anomaly score, but this also increases the running time.
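
The averaging described in the note can be expressed directly. The per-tree scores below are made-up values for illustration:

```python
# One anomaly score per random cut tree (fabricated values); the final
# anomaly score is the average across all trees in the forest.
tree_scores = [1.2, 0.9, 1.1, 1.0]
anomaly_score = sum(tree_scores) / len(tree_scores)
print(anomaly_score)
```

More trees smooth out the variance of any single tree's estimate, which is why increasing `numberOfTrees` improves the score at the cost of running time.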

### subSampleSize
<a name="random-cut-forest-subsample-size"></a>

Using this parameter, you can specify the size of the random sample that you want the algorithm to use when constructing each tree.  Each tree in the forest is constructed with a (different) random sample of records.  The algorithm uses each tree to assign an anomaly score. When the sample reaches `subSampleSize` records, records are removed randomly, with older records having a higher probability of removal than newer records. 

The default value for `subSampleSize` is 256.  You can set this value between 10 and 1,000 (inclusive).

Note that the `subSampleSize` must be less than the `timeDecay` parameter (which is set to 100,000 by default).  Increasing the sample size provides each tree a larger view of the data, but also increases the running time.

**Note**  
The algorithm returns zero for the first `subSampleSize` records while the machine learning model is trained.

### timeDecay
<a name="random-cut-forest-time-decay"></a>

The `timeDecay` parameter allows you to specify how much of the recent past to consider when computing an anomaly score. This is because data streams naturally evolve over time. For example, an eCommerce website’s revenue may continuously increase, or global temperatures may rise over time.  In such situations, we want an anomaly to be flagged relative to recent data, as opposed to data from the distant past. 

The default value is 100,000 records (or 100,000 shingles if shingling is used, as described in the following section).  You can set this value between 1 and the maximum integer (that is, 2147483647). The algorithm exponentially decays the importance of older data.

If you choose the `timeDecay` default of 100,000, the anomaly detection algorithm does the following:
+ Uses only the most recent 100,000 records in the calculations (and ignores older records). 
+ Within the most recent 100,000 records, assigns exponentially more weight to recent records and less to older records in anomaly detection calculations. 

If you don't want to use the default value, you can calculate the number of records to use in the algorithm. To do this, multiply the number of expected records per day by the number of days you want the algorithm to consider. For example, if you expect 1,000 records per day, and you want to analyze 7 days of records, set this parameter to 7,000 (1,000 * 7).

The `timeDecay` parameter determines the maximum quantity of recent records kept in the working set of the anomaly detection algorithm.  Smaller `timeDecay` values are desirable if the data is changing rapidly.  The best `timeDecay` value is application-dependent. 
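
The sizing guidance above amounts to simple arithmetic:

```python
# Size timeDecay from expected volume: records per day times the number
# of days the algorithm should consider (values from the example above).
records_per_day = 1000
days_to_consider = 7
time_decay = records_per_day * days_to_consider
print(time_decay)  # 7000
```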

### shingleSize
<a name="random-cut-forest-shingle-size"></a>

The explanation given here is for a one-dimensional stream (that is, a stream with one numeric column), but it can also be used for multi-dimensional streams.

A shingle is a consecutive sequence of the most recent records.  For example, a `shingleSize` of 10 at time *t* corresponds to a vector of the last 10 records received up to and including time *t*.  The algorithm treats this sequence as a vector over the last `shingleSize` number of records.  

If data is arriving uniformly in time, a shingle of size 10 at time *t* corresponds to the data received at time *t-9, t-8, …, t*. At time *t+1*, the shingle slides over one unit and consists of data from time *t-8, t-7, …, t, t+1*. These shingled records gathered over time correspond to a collection of 10-dimensional vectors over which the anomaly detection algorithm runs.



The intuition is that a shingle captures the shape of the recent past.   Your data may have a typical shape.  For example, if your data is collected hourly, a shingle of size 24 may capture the daily rhythm of your data.

The default `shingleSize` is one record (because shingle size is data dependent). You can set this value between 1 and 30 (inclusive).
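
The sliding vectors described above can be sketched as follows. The stream values are stand-ins for readings; the function performs shingling internally, so you don't construct shingles yourself.

```python
# Build fixed-size shingles (sliding windows) over a one-dimensional stream.
def shingles(stream, shingle_size):
    return [stream[i:i + shingle_size]
            for i in range(len(stream) - shingle_size + 1)]

stream = list(range(20))          # stand-in for 20 hourly readings
vectors = shingles(stream, 10)    # 10-dimensional vectors, one per timestep
print(len(vectors), vectors[0])   # 11 vectors; the first covers t=0..9
```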



Note the following about setting the `shingleSize`:
+ If you set the `shingleSize` too small, the algorithm is more susceptible to minor fluctuations in the data, leading to high anomaly scores for records that are not anomalous.
+ If you set the `shingleSize` too large, it may take more time to detect anomalous records because there are more records in the shingle that are not anomalous.  It may also take more time to determine that the anomaly has ended.
+ Identifying the right shingle size is application-dependent.  Experiment with different shingle sizes to ascertain the effect.

  

The following example illustrates how you can catch anomalies when you monitor the records with the highest anomaly score. In this particular example, the two highest anomaly scores also signal the beginning and end of an artificially injected anomaly.

Consider this stylized one-dimensional stream represented as a sine wave, intended to capture a circadian rhythm.  This curve illustrates the typical number of orders that an eCommerce site receives per hour, the number of users logged into a server, the number of ad clicks received per hour, etc.  A severe dip of 20 consecutive records is artificially injected in the middle of the plot.  

 

![\[Sine wave graph showing regular oscillations with an anomaly spike in the third cycle.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/rcf-20.png)




We ran the `RANDOM_CUT_FOREST` function with a shingle size of four records. The result is shown below. The red line shows the anomaly score.  Note that the beginning and the end of the anomaly received high scores.

 

![\[Graph showing sine wave with anomaly and anomaly score over time, with a spike in both lines.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/rcf-30.png)


When you use this function, we recommend that you investigate the highest scoring points as potential anomalies.  

**Note**  
The trends that machine learning functions use to determine analysis scores are infrequently reset when the Kinesis Data Analytics service performs service maintenance. You might unexpectedly see analysis scores of 0 after service maintenance occurs. We recommend you set up filters or other mechanisms to treat these values appropriately as they occur.



For more information, see the [Robust Random Cut Forest Based Anomaly Detection On Streams](http://proceedings.mlr.press/v48/guha16.pdf) white paper at the Journal of Machine Learning Research website.

# RANDOM\_CUT\_FOREST\_WITH\_EXPLANATION
<a name="sqlrf-random-cut-forest-with-explanation"></a>

Computes an anomaly score and explains it for each record in your data stream. The anomaly score for a record indicates how different it is from the trends that have recently been observed for your stream. The function also returns an attribution score for each column in a record, based on how anomalous the data in that column is. For each record, the sum of the attribution scores of all columns is equal to the anomaly score. 

You also have the option of getting information about the direction in which a given column is anomalous (whether it's high or low relative to the recently observed data trends for that column in the stream).

For example, in e-commerce applications, you might want to know when there's a change in the recently observed pattern of transactions. You also might want to know how much of the change is due to a change in the number of purchases made per hour, and how much is due to a change in the number of carts abandoned per hour—information that is represented by the attribution scores. You also might want to look at directionality to know whether you're being notified of the change due to an increase or a decrease in each of those values.

 

**Note**  
The `RANDOM_CUT_FOREST_WITH_EXPLANATION` function's ability to detect anomalies is application-dependent. Casting your business problem so that it can be solved with this function requires domain expertise. For example, you may need to determine which combination of columns in your input stream to pass to the function, and you may potentially benefit from normalizing the data. For more information, see [inputStream](#random-cut-forest-with-explanation-input-view).

A stream record can have non-numeric columns, but the function uses only numeric columns to assign an anomaly score. A record can have one or more numeric columns. The algorithm uses all the numeric data in computing an anomaly score. 

The algorithm starts developing the machine learning model using current records in the stream when you start the application. The algorithm does not use older records in the stream for machine learning, nor does it use statistics from previous executions of the application.

The algorithm accepts the `DOUBLE`, `INTEGER`, `FLOAT`, `TINYINT`, `SMALLINT`, `REAL`, and `BIGINT` data types.

 

**Note**  
`DECIMAL` is not a supported type. Use `DOUBLE` instead.

The following is a simple visual example of anomaly detection with different attribution scores in two-dimensional space. The diagram shows a cluster of blue data points and four outliers shown as red points. The red points have similar anomaly scores, but these four points are anomalous for different reasons. For points A1 and A2, most of the anomaly is attributable to their outlying y-values. In the case of A3 and A4, you can attribute most of the anomaly to their outlying x-values. Directionality is LOW for the y-value of A1, HIGH for the y-value of A2, HIGH for the x-value of A3, and LOW for the x-value of A4.

 

![\[Anomaly detection chart showing 4 points that appear outside a cluster.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/different_attributions.png)




## Syntax
<a name="random-cut-forest-with-explanation-syntax"></a>

 

```
RANDOM_CUT_FOREST_WITH_EXPLANATION (inputStream,    
                   numberOfTrees,
                   subSampleSize,
                   timeDecay,
                   shingleSize,
                   withDirectionality
)
```



## Parameters
<a name="random-cut-forest-with-explanation-parameters"></a>

The following sections describe the parameters of the `RANDOM_CUT_FOREST_WITH_EXPLANATION` function. 

### inputStream
<a name="random-cut-forest-with-explanation-input-view"></a>

Pointer to your input stream. You set a pointer using the `CURSOR` function. For example, the following statements set a pointer to `InputStream`.

 

```
CURSOR(SELECT STREAM * FROM InputStream) 
CURSOR(SELECT STREAM IntegerColumnX, IntegerColumnY FROM InputStream) 
-- Perhaps normalize the column X value.
CURSOR(SELECT STREAM IntegerColumnX / 100, IntegerColumnY FROM InputStream) 
-- Combine columns before passing to the function.
CURSOR(SELECT STREAM IntegerColumnX - IntegerColumnY FROM InputStream)
```

The `CURSOR` function is the only required parameter for the `RANDOM_CUT_FOREST_WITH_EXPLANATION` function. The function assumes the following default values for the other parameters:

`numberOfTrees` = 100

`subSampleSize` = 256

`timeDecay` = 100,000

`shingleSize` = 1

`withDirectionality` = FALSE

When you use this function, your input stream can have up to 30 numeric columns. 

### numberOfTrees
<a name="random-cut-forest-with-explanation-no-of-trees"></a>

Using this parameter, you specify the number of random cut trees in the forest.  

**Note**  
By default, the algorithm constructs a number of trees, each constructed using a given number of sample records (see `subSampleSize` later in this list) from the input stream. The algorithm uses each tree to assign an anomaly score. The average of all these scores is the final anomaly score.

The default value for `numberOfTrees` is 100. You can set this value between 1 and 1,000 (inclusive). By increasing the number of trees in the forest, you can get a better estimate of the anomaly and attribution scores, but this also increases the running time.
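The averaging described in the note above can be sketched in a few lines. This is a conceptual illustration only: `FakeTree` and its fixed scores are hypothetical stand-ins for real random cut trees, not the function's implementation.

```python
class FakeTree:
    """Illustrative stand-in for a random cut tree: returns a fixed score."""
    def __init__(self, score):
        self._score = score

    def score(self, record):
        return self._score


def forest_score(record, trees):
    """The final anomaly score is the average of the per-tree scores."""
    return sum(tree.score(record) for tree in trees) / len(trees)


trees = [FakeTree(s) for s in (1.0, 2.0, 3.0, 4.0)]
print(forest_score({"price": 10.0}, trees))  # 2.5
```

Adding more trees refines this average at the cost of more work per record, which is why increasing `numberOfTrees` improves the estimate but increases the running time.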

### subSampleSize
<a name="random-cut-forest-with-explanation-subsample-size"></a>

Using this parameter, you can specify the size of the random sample that you want the algorithm to use when constructing each tree. Each tree in the forest is constructed with a (different) random sample of records. The algorithm uses each tree to assign an anomaly score. When the sample reaches `subSampleSize` records, records are removed randomly, with older records having a higher probability of removal than newer records. 

The default value for `subSampleSize` is 256. You can set this value between 10 and 1,000 (inclusive).

The `subSampleSize` must be less than the `timeDecay` parameter (which is set to 100,000 by default). Increasing the sample size provides each tree a larger view of the data, but it also increases the running time.

**Note**  
The algorithm returns zero for the first `subSampleSize` records while the machine learning model is trained.
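The sampling behavior described above can be sketched as follows. This is an illustrative approximation only: the real algorithm's eviction weighting is not documented in this reference, so the linear age weighting below is an assumption made for the sketch.

```python
import random

def update_sample(sample, record, sub_sample_size, rng=random):
    """Append a record; once the sample exceeds sub_sample_size, evict one
    record at random, with older records (lower indices) more likely to be
    removed than newer ones."""
    sample.append(record)
    if len(sample) > sub_sample_size:
        n = len(sample)
        weights = [n - i for i in range(n)]  # oldest record gets the largest weight
        victim = rng.choices(range(n), weights=weights, k=1)[0]
        sample.pop(victim)
    return sample

sample = []
for record in range(1000):
    update_sample(sample, record, sub_sample_size=256)
print(len(sample))  # 256
```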

### timeDecay
<a name="random-cut-forest-with-explanation--time-decay"></a>

You can use the `timeDecay` parameter to specify how much of the recent past to consider when computing an anomaly score. Data streams naturally evolve over time. For example, an e-commerce website’s revenue might continuously increase, or global temperatures might rise over time. In such situations, you want an anomaly to be flagged relative to recent data, as opposed to data from the distant past. 

The default value is 100,000 records (or 100,000 shingles if shingling is used, as described in the following section). You can set this value between 1 and the maximum integer (that is, 2147483647). The algorithm exponentially decays the importance of older data.

If you choose the `timeDecay` default of 100,000, the anomaly detection algorithm does the following:
+ Uses only the most recent 100,000 records in the calculations (and ignores older records). 
+ Within the most recent 100,000 records, assigns exponentially more weight to recent records and less to older records in anomaly detection calculations. 

The `timeDecay` parameter determines the maximum quantity of recent records kept in the working set of the anomaly detection algorithm. Smaller `timeDecay` values are desirable if the data is changing rapidly. The best `timeDecay` value is application-dependent. 
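The exponential decay of older records can be sketched as below. The exact decay rate the service uses is not specified in this reference; a rate of `1/timeDecay` per record is assumed here purely for illustration.

```python
import math

def decay_weight(age_in_records, time_decay):
    """Weight assigned to a record that is age_in_records old, decaying
    exponentially with an assumed rate of 1/time_decay."""
    return math.exp(-age_in_records / time_decay)

# A brand-new record has full weight; a record timeDecay records old has
# weight 1/e; much older records contribute almost nothing.
for age in (0, 100_000, 500_000):
    print(age, decay_weight(age, 100_000))
```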

### shingleSize
<a name="random-cut-forest-with-explanation-shingle-size"></a>

The explanation given here applies to a one-dimensional stream (that is, a stream with one numeric column), but shingling can also be used for multi-dimensional streams.

A shingle is a consecutive sequence of the most recent records. For example, a `shingleSize` of 10 at time *t* corresponds to a vector of the last 10 records received up to and including time *t*. The algorithm treats this sequence as a vector over the last `shingleSize` number of records.  

If data is arriving uniformly in time, a shingle of size 10 at time *t* corresponds to the data received at times *t-9, t-8, …, t*. At time *t\+1*, the shingle slides over one unit and consists of data from times *t-8, t-7, …, t, t\+1*. These shingled records gathered over time correspond to a collection of 10-dimensional vectors over which the anomaly detection algorithm runs.
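The sliding behavior described above can be sketched in a few lines. This is only a conceptual illustration of what shingling does to a one-dimensional stream, not the function's internal implementation.

```python
def shingles(stream, shingle_size):
    """Return one vector per time step t (once t >= shingle_size - 1),
    containing the last shingle_size values up to and including t."""
    return [stream[t - shingle_size + 1 : t + 1]
            for t in range(shingle_size - 1, len(stream))]

# With shingle_size 3, the window at each step slides over by one unit.
print(shingles([10, 20, 30, 40], 3))  # [[10, 20, 30], [20, 30, 40]]
```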



The intuition is that a shingle captures the shape of the recent past. Your data might have a typical shape. For example, if your data is collected hourly, a shingle of size 24 might capture the daily rhythm of your data.

The default `shingleSize` is one record (because shingle size is data-dependent). You can set this value between 1 and 30 (inclusive).



Note the following about setting the `shingleSize`:
+ If you set the `shingleSize` too small, the algorithm is more susceptible to minor fluctuations in the data, leading to high anomaly scores for records that are not anomalous.
+ If you set the `shingleSize` too large, it might take more time to detect anomalous records because there are more records in the shingle that are not anomalous. It also might take more time to determine that the anomaly has ended.
+ Identifying the right shingle size is application-dependent. Experiment with different shingle sizes to determine the effects.

  

### withDirectionality
<a name="random-cut-forest-with-explanation-with-directionality"></a>

A Boolean parameter that defaults to `false`. When set to `true`, it tells you the direction in which each individual dimension makes a contribution to the anomaly score. It also provides the strength of the recommendation for that directionality.

## Results
<a name="random-cut-forest-with-explanation-results"></a>

The function returns an anomaly score of 0 or more and an explanation in JSON format.

The anomaly score starts out at 0 for all the records in the stream while the algorithm goes through the learning phase. You then start to see positive values for the anomaly score. Not all positive anomaly scores are significant; only the highest ones are. To get a better understanding of the results, look at the explanation.

The explanation provides the following values for each column in the record:
+ **Attribution score:** A nonnegative number that indicates how much this column has contributed to the anomaly score of the record. In other words, it indicates how different the value of this column is from what’s expected based on the recently observed trend. The sum of the attribution scores of all columns for the record is equal to the anomaly score.
+ **Strength:** A nonnegative number representing the strength of the directional recommendation. A high value for strength indicates a high confidence in the directionality that is returned by the function. During the learning phase, the strength is 0.
+ **Directionality:** This is either HIGH if the value of the column is above the recently observed trend or LOW if it’s below the trend. During the learning phase, this defaults to LOW.
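The invariant described above, with per-column attribution scores summing to the record's anomaly score, can be checked when you consume the explanation. The JSON shape and field names below are hypothetical; inspect your application's actual output for the exact format.

```python
import json

# Hypothetical explanation payload for a record with anomaly score 3.0.
explanation_json = """
{"PRICE":  {"ATTRIBUTION_SCORE": 2.3, "STRENGTH": 1.1, "DIRECTION": "HIGH"},
 "CHANGE": {"ATTRIBUTION_SCORE": 0.7, "STRENGTH": 0.4, "DIRECTION": "LOW"}}
"""
anomaly_score = 3.0

explanation = json.loads(explanation_json)
total_attribution = sum(col["ATTRIBUTION_SCORE"] for col in explanation.values())

# The attribution scores of all columns sum to the anomaly score.
assert abs(total_attribution - anomaly_score) < 1e-9
```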

**Note**  
The trends that machine learning functions use to determine analysis scores are infrequently reset when the Kinesis Data Analytics service performs service maintenance. You might unexpectedly see analysis scores of 0 after service maintenance occurs. We recommend you set up filters or other mechanisms to treat these values appropriately as they occur.

## Examples
<a name="random-cut-forest-with-explanation-examples"></a>

### Stock Ticker Data Example
<a name="random-cut-forest-with-explanation-examples-stockdata"></a>

This example is based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run the example, you need a Kinesis Data Analytics application that has the sample stock ticker input stream. To learn how to create a Kinesis Data Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema:

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

In this example, the application calculates an anomaly score for the record and an attribution score for the `PRICE` and `CHANGE` columns, which are the only numeric columns in the input stream. 

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (anomaly REAL, ANOMALY_EXPLANATION VARCHAR(20480));
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM "ANOMALY_SCORE", "ANOMALY_EXPLANATION"
    FROM TABLE (RANDOM_CUT_FOREST_WITH_EXPLANATION(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"), 100, 256, 100000, 1, true))
    WHERE ANOMALY_SCORE > 0;
```

The preceding example outputs a stream similar to the following.

![\[Screenshot showing an output stream containing anomaly scores and explanation information.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/anomaly_results.png)


### Network and CPU Utilization Example
<a name="random-cut-forest-with-explanation-examples-networkandcpu"></a>

This theoretical example shows two sets of data that follow an oscillating pattern. In the following graph, they're represented by the red and blue curves at the top. The red curve shows network utilization over time, and the blue curve shows idle CPU over time for the same computer system. The two signals, which are out of phase with each other, are regular most of the time. But they both also show occasional anomalies, which appear as irregularities in the graph. The following is an explanation of what the curves in the graph represent from the top curve to the bottom curve.
+ The top curve, which is red, represents network utilization over time. It follows a cyclical pattern and is regular most of the time, except for two anomalous periods, each representing a drop in utilization. The first anomalous period occurs between time values 500 and 1,000. The second anomalous period occurs between time values 1,500 and 2,000.
+ The second curve from the top (blue in color) is idle CPU over time. It follows a cyclical pattern and is regular most of the time, with the exception of two anomalous periods. The first anomalous period occurs between time values 1,000 and 1,500 and shows a drop in idle CPU time. The second anomalous period occurs between time values 1,500 and 2,000 and shows an increase in idle CPU time.
+ The third curve from the top shows the anomaly score. At the beginning, there's a learning phase during which the anomaly score is 0. After the learning phase, there's steady noise in the curve, but the anomalies stand out. 

  The first anomaly, which is marked in red on the black anomaly score curve, is more attributable to the network utilization data. The second anomaly, marked in blue, is more attributable to the CPU data. The red and blue markings are provided in this graph as a visual aid. They aren't produced by the `RANDOM_CUT_FOREST_WITH_EXPLANATION` function. Here's how we obtained these red and blue markings:
  + After running the function, we selected the top 20 anomaly score values. 
  + From this set of top 20 anomaly score values, we selected those values for which the network utilization had an attribution greater than or equal to 1.5 times the attribution for CPU. We colored the points in this new set with red markers in the graph.
  + We colored with blue markers the points for which the CPU attribution score was greater than or equal to 1.5 times the attribution for network utilization.
+ The second curve from the bottom is a graphical representation of directionality for the network utilization signal. We obtained this curve by running the function, multiplying the strength by -1 for LOW directionality and by \+1 for HIGH directionality, and plotting the results against time. 

  When there's a drop in the cyclical pattern of network utilization, there’s a corresponding negative spike in directionality. When network utilization shows an increase back to the regular pattern, directionality shows a positive spike corresponding to that increase. Later on, there's another negative spike, followed closely by another positive spike. Together they represent the second anomaly seen in the network utilization curve.
+ The bottom curve is a graphical representation of directionality for the CPU signal. We obtained it by multiplying the strength by -1 for LOW directionality and by \+1 for HIGH directionality, and then plotting the results against time. 

  With the first anomaly in the idle CPU curve, this directionality curve shows a negative spike followed immediately by a smaller, positive spike. The second anomaly in the idle CPU curve produces a positive spike followed by a negative spike in directionality.
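The red and blue marking procedure described in the list above can be sketched as follows. The record layout (`score`, `net_attr`, `cpu_attr`) is a hypothetical in-memory representation of the function's output, used only to show the selection logic.

```python
def classify_top_anomalies(records, top_n=20, ratio=1.5):
    """Take the top_n records by anomaly score, then mark a record as
    network-dominated if its network attribution is at least ratio times
    its CPU attribution, and CPU-dominated in the symmetric case."""
    top = sorted(records, key=lambda r: r["score"], reverse=True)[:top_n]
    network = [r for r in top if r["net_attr"] >= ratio * r["cpu_attr"]]
    cpu = [r for r in top if r["cpu_attr"] >= ratio * r["net_attr"]]
    return network, cpu

records = [
    {"score": 5.0, "net_attr": 4.0, "cpu_attr": 1.0},    # network-dominated
    {"score": 4.0, "net_attr": 1.0, "cpu_attr": 3.0},    # CPU-dominated
    {"score": 0.1, "net_attr": 0.05, "cpu_attr": 0.05},  # not a top anomaly
]
network, cpu = classify_top_anomalies(records, top_n=2)
print(len(network), len(cpu))  # 1 1
```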

![\[Stylized graph showing network utilization over time; idle CPU over time; anomaly score with explanation over time; directionality for network utilization; and directionality for idle CPU.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-rcfwa-example-3.png)


### Blood Pressure Example
<a name="random-cut-forest-with-explanation-examples-bloodpressure"></a>

For a more detailed example, with code that detects and explains anomalies in blood pressure readings, see [Example: Detecting Data Anomalies and Getting an Explanation](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/app-anomaly-detection-with-explanation.html).

# STDDEV\_POP
<a name="sql-reference-STDDEVPOP"></a>

Returns the square root of the [VAR\_POP](sql-reference-VARPOP.md) population variance for <number-expression>, evaluated for each row remaining in the group.

When you use `STDDEV_POP`, be aware of the following:
+ When the input set has no non-null data, `STDDEV_POP` returns `NULL`.
+ If you don't use the `OVER` clause, `STDDEV_POP` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a `GROUP BY` clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `STDDEV_POP` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an `OVER` clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).
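The definition above, the square root of the population variance with nulls ignored, can be expressed as a short sketch; `None` stands in for SQL `NULL` here.

```python
import math

def stddev_pop(values):
    """Population standard deviation over the non-null values; None when
    there is no non-null data, mirroring STDDEV_POP returning NULL."""
    xs = [v for v in values if v is not None]
    if not xs:
        return None
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

print(stddev_pop([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
print(stddev_pop([None, None]))              # None
```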

## Syntax
<a name="stddevpop-syntax"></a>

```
 STDDEV_POP ( [DISTINCT | ALL] number-expression )
```

## Parameters
<a name="stddevpop-parameters"></a>

### ALL
<a name="stddevpop-parameters-all"></a>

Includes duplicate values in the input set. `ALL` is the default.

### DISTINCT
<a name="stddevpop-parameters-distinct"></a>

Excludes duplicate values in the input set.

## Examples
<a name="stddevpop-examples"></a>

### Example Dataset
<a name="w2aac22c29c16c17b2"></a>

The following examples are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Determine the standard deviation of the population in a column in a tumbling window query
<a name="w2aac22c29c16c17b4"></a>

The following example demonstrates how to use the `STDDEV_POP` function to determine the standard deviation of the values in a tumbling window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

#### Using STEP (Recommended)
<a name="stddevpop-examples-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), stddev_pop_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, STDDEV_POP(price) AS stddev_pop_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP(("SOURCE_SQL_STREAM_001".ROWTIME) BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="stddevpop-examples-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), stddev_pop_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, STDDEV_POP(price) AS stddev_pop_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);
```

#### Results
<a name="stddevpop-results"></a>

The preceding examples output a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and STDDEV_POP_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-stddev-pop-1.png)


### Example 2: Determine the standard deviation of the population of the values in a column in a sliding window query
<a name="w2aac22c29c16c17b6"></a>

The following example demonstrates how to use the `STDDEV_POP` function to determine the standard deviation of the values in a sliding window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), stddev_pop_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, STDDEV_POP(price) OVER TEN_SECOND_SLIDING_WINDOW AS stddev_pop_price
FROM "SOURCE_SQL_STREAM_001"
 
WINDOW TEN_SECOND_SLIDING_WINDOW AS (
  PARTITION BY ticker_symbol
  RANGE INTERVAL '10' SECOND PRECEDING);
```

The preceding example outputs a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and STDEV_POP_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-stddev-pop-2.png)


## See Also
<a name="stddepop-seealso"></a>
+ Sample standard deviation: [STDDEV\_SAMP](sql-reference-STDDEVSAMP.md)
+ Sample variance: [VAR\_SAMP](sql-reference-VARSAMP.md)
+ Population variance: [VAR\_POP](sql-reference-VARPOP.md)

# STDDEV\_SAMP
<a name="sql-reference-STDDEVSAMP"></a>

Returns the statistical standard deviation of all values in <number-expression>, evaluated for each row remaining in the group and defined as the square root of [VAR\_SAMP](sql-reference-VARSAMP.md).

When you use `STDDEV_SAMP`, be aware of the following:
+ When the input set has no non-null data, `STDDEV_SAMP` returns `NULL`.
+ If you don't use the `OVER` clause, `STDDEV_SAMP` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a `GROUP BY` clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `STDDEV_SAMP` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an `OVER` clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).
+ `STD_DEV` is an alias of `STDDEV_SAMP`.
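The sample standard deviation differs from the population form only in its divisor (N-1 rather than N). A sketch follows, with `None` standing in for SQL `NULL`; the behavior for fewer than two non-null values is an assumption made for this sketch.

```python
import math

def stddev_samp(values):
    """Sample standard deviation (square root of the N-1 variance) over the
    non-null values. Returning None for fewer than two non-null values is
    an assumption for this sketch."""
    xs = [v for v in values if v is not None]
    if len(xs) < 2:
        return None
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

prices = [2, 4, 4, 4, 5, 5, 7, 9]
# The sample deviation is slightly larger than the population deviation of
# this data (2.0) because it divides by N-1 instead of N.
print(stddev_samp(prices))
```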

## Syntax
<a name="stddevsamp-syntax"></a>

```
 STDDEV_SAMP ( [DISTINCT | ALL] number-expression )
```

## Parameters
<a name="stddevsamp-parameters"></a>

### ALL
<a name="stddevsamp-parameters-all"></a>

Includes duplicate values in the input set. `ALL` is the default.

### DISTINCT
<a name="stddevsamp-parameters-distinct"></a>

Excludes duplicate values in the input set.

## Examples
<a name="stddevsamp-examples"></a>

### Example Dataset
<a name="w2aac22c29c18c23b2"></a>

The following examples are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the following schema.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Determine the statistical standard deviation of the values in a column in a tumbling window query
<a name="w2aac22c29c18c23b4"></a>

The following example demonstrates how to use the `STDDEV_SAMP` function to determine the standard deviation of the values in a tumbling window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), stddev_samp_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, STDDEV_SAMP(price) AS stddev_samp_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);
```

### Results
<a name="stddev-example-results"></a>

The preceding example outputs a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and STDDEV_SAMP_PRICE columns with sample data.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-stddev-samp-1.png)


### Example 2: Determine the statistical standard deviation of the values in a column in a sliding window query
<a name="w2aac22c29c18c23c10"></a>

The following example demonstrates how to use the `STDDEV_SAMP` function to determine the standard deviation of the values in a sliding window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), stddev_samp_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, STDDEV_SAMP(price) OVER TEN_SECOND_SLIDING_WINDOW AS stddev_samp_price
FROM "SOURCE_SQL_STREAM_001"
 
WINDOW TEN_SECOND_SLIDING_WINDOW AS (
  PARTITION BY ticker_symbol
  RANGE INTERVAL '10' SECOND PRECEDING);
```

The preceding example outputs a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and STDDEV_SAMP_PRICE columns with sample data.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-stddev-samp-2.png)


## See Also
<a name="stddevsamp-seealso"></a>
+ Population standard deviation: [STDDEV\_POP](sql-reference-STDDEVPOP.md)
+ Sample variance: [VAR\_SAMP](sql-reference-VARSAMP.md)
+ Population variance: [VAR\_POP](sql-reference-VARPOP.md)

# VAR\_POP
<a name="sql-reference-VARPOP"></a>

Returns the population variance of a non-null set of numbers (nulls being ignored).

VAR\_POP uses the following calculation:
+ `(SUM(expr*expr) - SUM(expr)*SUM(expr) / COUNT(expr)) / COUNT(expr)`

In other words, for a given set of N non-null values, using S1 as the sum of the values and S2 as the sum of the squares of the values, VAR\_POP returns the result (S2 - S1\*S1/N)/N.
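This sums-of-squares form can be checked directly against a mean-based calculation. A sketch, with `None` standing in for SQL `NULL`:

```python
def var_pop(values):
    """(S2 - S1*S1/N)/N over the N non-null values, where S1 is their sum
    and S2 the sum of their squares; None when there is no non-null data."""
    xs = [v for v in values if v is not None]
    if not xs:
        return None
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s2 - s1 * s1 / n) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_pop(data))  # 4.0 (its square root, 2.0, is STDDEV_POP of the same data)
```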

When you use `VAR_POP`, be aware of the following:
+ When the input set has no non-null data, or when applied to an empty set, `VAR_POP` returns `NULL`.
+ If you don't use the `OVER` clause, `VAR_POP` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a `GROUP BY` clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `VAR_POP` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an `OVER` clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="varpop-syntax"></a>

```
 VAR_POP ( [DISTINCT | ALL] number-expression )
```

## Parameters
<a name="varpop-parameters"></a>

### ALL
<a name="varpop-parameters-all"></a>

Includes duplicate values in the input set. `ALL` is the default.

### DISTINCT
<a name="varpop-parameters-distinct"></a>

Excludes duplicate values in the input set.

## Examples
<a name="varpop-examples"></a>

### Example Dataset
<a name="w2aac22c29c20c29b2"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Determine the population variance in a column in a tumbling window query
<a name="varpop_example_1"></a>

The following example demonstrates how to use the `VAR_POP` function to determine the population variance of the values in a tumbling window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

#### Using STEP (Recommended)
<a name="varpop_example_step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_pop_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_POP(price) AS var_pop_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP(("SOURCE_SQL_STREAM_001".ROWTIME) BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="varpop_example_floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_pop_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_POP(price) AS var_pop_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);
```

#### Results
<a name="varpop_example_results"></a>

The preceding examples output a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and VAR_POP_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-varpop-1.png)


### Example 2: Determine the population variance of the values in a column in a sliding window query
<a name="w2aac22c29c20c29b6"></a>

The following example demonstrates how to use the `VAR_POP` function to determine the population variance of the values in a sliding window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_pop_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_POP(price) OVER TEN_SECOND_SLIDING_WINDOW AS var_pop_price
FROM "SOURCE_SQL_STREAM_001"
 
WINDOW TEN_SECOND_SLIDING_WINDOW AS (
  PARTITION BY ticker_symbol
  RANGE INTERVAL '10' SECOND PRECEDING);
```

The preceding example outputs a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and VAR_POP_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-varpop-2.png)


## See Also
<a name="varpop-seealso"></a>
+ Population standard deviation: [STDDEV\_POP](sql-reference-STDDEVPOP.md)
+ Sample standard deviation: [STDDEV\_SAMP](sql-reference-STDDEVSAMP.md)
+ Sample variance: [VAR\_SAMP](sql-reference-VARSAMP.md)

# VAR\_SAMP
<a name="sql-reference-VARSAMP"></a>

Returns the sample variance of a non-null set of numbers (nulls being ignored).

`VAR_SAMP` uses the following calculation:
+ (SUM(expr\*expr) - SUM(expr)\*SUM(expr) / COUNT(expr)) / (COUNT(expr)-1)

In other words, for a given set of non-null values, using S1 as the sum of the values and S2 as the sum of the squares of the values, `VAR_SAMP` returns the result (S2 - S1\*S1/N)/(N-1).
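The sample-variance formula differs from the population form only in its (N - 1) divisor. A minimal Python sketch of the described calculation (an illustration, not the engine's implementation):

```python
def var_samp(values):
    """Sample variance via (S2 - S1*S1/N) / (N - 1), nulls (None) ignored.

    Returns None for fewer than two non-null values, matching the
    VAR_SAMP behavior described above.
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 2:
        return None
    s1 = sum(xs)                 # S1: sum of the values
    s2 = sum(x * x for x in xs)  # S2: sum of the squares
    return (s2 - s1 * s1 / n) / (n - 1)

print(var_samp([10.0, 12.0, None, 11.0, 13.0]))
```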

When you use `VAR_SAMP`, be aware of the following:
+ When the input set has no non-null data, or contains only a single element, `VAR_SAMP` returns `NULL`.
+ If you don't use the `OVER` clause, `VAR_SAMP` is calculated as an aggregate function. In this case, the aggregate query must contain a [GROUP BY clause](sql-reference-group-by-clause.md) on a monotonic expression based on `ROWTIME` that groups the stream into finite rows. Otherwise, the group is the infinite stream, and the query will never complete and no rows will be emitted. For more information, see [Aggregate Functions](sql-reference-aggregate-functions.md). 
+ A windowed query that uses a `GROUP BY` clause processes rows in a tumbling window. For more information, see [Tumbling Windows (Aggregations Using GROUP BY)](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html).
+ If you use the `OVER` clause, `VAR_SAMP` is calculated as an analytic function. For more information, see [Analytic Functions](sql-reference-analytic-functions.md).
+ A windowed query that uses an `OVER` clause processes rows in a sliding window. For more information, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

## Syntax
<a name="varsamp-syntax"></a>

```
 VAR_SAMP ( [DISTINCT | ALL] number-expression )
```

## Parameters
<a name="varsamp-parameters"></a>

### ALL
<a name="varsamp-parameters-all"></a>

Includes duplicate values in the input set. `ALL` is the default.

### DISTINCT
<a name="varsamp-parameters-distinct"></a>

Excludes duplicate values in the input set.

## Examples
<a name="varsamp-examples"></a>

### Example Dataset
<a name="w2aac22c29c22c29b2"></a>

The examples following are based on the sample stock dataset that is part of the [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Determine the sample variance in a column in a tumbling window query
<a name="varsamp-example-1"></a>

The following example demonstrates how to use the `VAR_SAMP` function to determine the sample variance of the values in a tumbling window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

#### Using STEP (Recommended)
<a name="varsamp-example-step"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_samp_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_SAMP(price) AS var_samp_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, STEP(("SOURCE_SQL_STREAM_001".ROWTIME) BY INTERVAL '60' SECOND);
```

#### Using FLOOR
<a name="varsamp-example-floor"></a>

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_samp_price REAL);

CREATE OR REPLACE  PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_SAMP(price) AS var_samp_price
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol, FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00') SECOND / 10 TO SECOND);
```

#### Results
<a name="varsamp-example-results"></a>

The preceding examples output a stream similar to the following:

![\[Table showing ROWTIME, TICKER_SYMBOL, and VAR_SAMP_PRICE columns with sample data entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-varsamp-1.png)


### Example 2: Determine the sample variance of the values in a column in a sliding window query
<a name="w2aac22c29c22c29b6"></a>

The following example demonstrates how to use the `VAR_SAMP` function to determine the sample variance of the values in a sliding window of the PRICE column of the example dataset. `DISTINCT` is not specified, so duplicate values are included in the calculation.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), var_samp_price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM ticker_symbol, VAR_SAMP(price) OVER TEN_SECOND_SLIDING_WINDOW AS var_samp_price
FROM "SOURCE_SQL_STREAM_001"
 
WINDOW TEN_SECOND_SLIDING_WINDOW AS (
  PARTITION BY ticker_symbol
  RANGE INTERVAL '10' SECOND PRECEDING);
```

The preceding example outputs a stream similar to the following:

![\[Table showing data for ROWTIME, TICKER_SYMBOL, and VAR_SAMP_PRICE columns with sample entries.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-varsamp-2.png)


## See Also
<a name="varsamp-seealso"></a>
+ Population standard deviation: [STDDEV\_POP](sql-reference-STDDEVPOP.md)
+ Sample standard deviation: [STDDEV\_SAMP](sql-reference-STDDEVSAMP.md)
+ Population variance: [VAR\_POP](sql-reference-VARPOP.md)

# Streaming SQL Functions
<a name="sql-reference-streaming-sql-functions"></a>

The topics in this section describe the streaming functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [LAG](sql-reference-lag.md)
+ [Monotonic Function](sql-reference-monotonic.md)
+ [NTH\_VALUE](sql-reference-nth-value.md)

# LAG
<a name="sql-reference-lag"></a>

LAG returns the evaluation of the expression (such as the name of a column) for the record that is N records before the current record in a given window. Both offset and default are evaluated with respect to the current record. If there is no such record, LAG instead returns a specified default expression. LAG returns a value of the same type as the expression. 

## Syntax
<a name="sql-reference-lag-syntax"></a>

```
LAG(expr [ , N [ , defaultExpr]]) [ IGNORE NULLS | RESPECT NULLS ] OVER [ window-definition ] 
```

## Parameters
<a name="sql-reference-lag-parameters"></a>

*expr*

An expression that is evaluated on a record.

*N*

The number of records before the current record to query. The default is `1`.

*defaultExpr*

An expression of the same type as *expr* that is returned if the record queried (*n* before the current record) falls outside the window. If not specified, *null* is returned for values that fall outside the window.

**Note**  
The *defaultExpr* expression doesn't replace actual *null* values returned from the source stream.

IGNORE NULLS

A clause that specifies that null values are not counted when determining the offset. For example, suppose that `LAG(expr, 1)` is queried, and the previous record has a null value for *expr*. Then the second record previous is queried, and so on.

RESPECT NULLS

A clause that specifies that null values are counted when determining the offset. This behavior is the default.

OVER *window-specification*

A window specification that defines how records in the stream are partitioned, whether by a time range interval or by the number of records.
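Setting windowing aside, the offset semantics can be sketched in Python over an in-order list of values. This is a hedged illustration of the LAG behavior described above (offset `n`, a default for out-of-window lookups, and IGNORE NULLS skipping nulls when counting), not Kinesis Data Analytics code:

```python
def lag(rows, n=1, default=None, ignore_nulls=False):
    """For each position, return the value n records back, or `default`
    when no such record exists; with ignore_nulls, null (None) records
    are not counted when determining the offset."""
    out = []
    for i, _current in enumerate(rows):
        candidates = rows[:i]  # records before the current one
        if ignore_nulls:
            candidates = [v for v in candidates if v is not None]
        out.append(candidates[-n] if len(candidates) >= n else default)
    return out

print(lag([5, None, 7, 8], n=1, default=0, ignore_nulls=True))
```

Note how the null record is skipped under IGNORE NULLS but counted (and returned) under the default RESPECT NULLS behavior.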

## Example
<a name="sql-reference-lag-examples"></a>

### Example Dataset
<a name="w2aac22c31b7b9b2"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the input stream for the sample stock ticker. To learn how to create an Analytics application and configure the input stream for the sample stock ticker, see [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Return Values from Previous Records in an OVER Clause
<a name="w2aac22c31b7b9b4"></a>

In this example, the OVER clause divides records in a stream partitioned by the time range interval of '1' minute preceding. The LAG function then retrieves price values from the two previous records that contain the given ticker symbol, skipping records if `price` is null.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price     DOUBLE,
    previous_price DOUBLE,
    previous_price_2 DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, 
         price,
         LAG(price, 1, 0) IGNORE NULLS OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' MINUTE PRECEDING),
         LAG(price, 2, 0) IGNORE NULLS OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' MINUTE PRECEDING)   
    FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for time, ticker symbol, price, and previous prices.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-lag.png)


## Notes
<a name="sql-reference-lag-notes"></a>

LAG is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics streaming SQL extension.

# Monotonic Function
<a name="sql-reference-monotonic"></a>

```
MONOTONIC(<expression>)
```

Streaming GROUP BY requires that at least one of the grouped expressions be monotonic and non-constant. The only column known in advance to be monotonic is ROWTIME. For more information, see [Monotonic Expressions and Operators](sql-reference-monotonic-expressions-operators.md).

The MONOTONIC function allows you to declare that a given expression is monotonic, enabling a streaming GROUP BY to use that expression as a key.

The MONOTONIC function evaluates its argument and returns the result (as the same type as its argument).

By enclosing an expression in MONOTONIC, you are asserting that values of that expression are either non-increasing or non-decreasing and never change direction. For example, if you have a stream LINEITEMS consisting of the line items of orders, and you wrote MONOTONIC(orderId), you are asserting that line items are consecutive in the stream. It would be OK if there were line items for order 1000, followed by line items for order 1001, followed by line items for order 1005. It would be illegal if there were then a line item for order 1001 (that is, the line item sequence became 1000, 1001, 1005, 1001). Similarly, a line item sequence of 987, 974, 823 would be legal, but the following line item sequences would be illegal:
+ 987, 974, 823, 973
+ 987, 974, 823, 1056
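The assertion can be checked offline with a small Python helper (an illustration only; the service itself does not validate the assertion at runtime):

```python
def is_monotonic(values):
    """True if the sequence never changes direction: it is entirely
    non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

print(is_monotonic([1000, 1001, 1005]))        # legal order-ID sequence
print(is_monotonic([1000, 1001, 1005, 1001]))  # direction change: illegal
```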

An expression declared monotonic can be decreasing as well as increasing, but it cannot have arbitrary order.

Note that the definition of MONOTONIC is precisely what is needed for GROUP BY to make progress.

If an expression declared monotonic is not monotonic (that is, if the assertion is not valid for the actual data) then Amazon Kinesis Data Analytics behavior is unspecified.

In other words, if you are certain that an expression is monotonic, you can use this MONOTONIC function to enable Amazon Kinesis Data Analytics to treat the expression as monotonic.

However, if you are mistaken and the values resulting from evaluating the expression change from ascending to descending or from descending to ascending, unexpected results may arise. Amazon Kinesis Data Analytics streaming SQL will take you at your word and operate on your assurance that the expression is monotonic. But if in fact it is not monotonic, the resulting Amazon Kinesis Data Analytics behavior cannot be determined in advance, and so results may not be as expected or desired.

# NTH\_VALUE
<a name="sql-reference-nth-value"></a>

```
NTH_VALUE(x, n) [ <from first or last> ] [ <null treatment> ] over w
```

Where:

<null treatment> := RESPECT NULLS \| IGNORE NULLS

<from first or last> := FROM FIRST \| FROM LAST

NTH\_VALUE returns the nth value of x, counting from the first or last value in the window. The default is FROM FIRST. If <null treatment> is set to IGNORE NULLS, the function skips over nulls while counting.

If there aren't enough rows in the window to reach the nth value, the function returns NULL.
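These semantics can be sketched in Python over the values in a window (a hedged illustration of the behavior described above, not the engine's implementation):

```python
def nth_value(window, n, from_last=False, ignore_nulls=False):
    """Return the nth value counting from the first (or last) row of the
    window; None when there aren't enough rows to reach the nth value.
    With ignore_nulls, nulls (None) are skipped while counting."""
    xs = [v for v in window if v is not None] if ignore_nulls else list(window)
    if from_last:
        xs = xs[::-1]  # count from the last value instead of the first
    return xs[n - 1] if n <= len(xs) else None

print(nth_value([10, None, 30, 40], 2, ignore_nulls=True))
```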

# String and Search Functions
<a name="sql-reference-string-and-search-functions"></a>

The topics in this section describe the string and search functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [CHAR\_LENGTH / CHARACTER\_LENGTH](sql-reference-char-length.md)
+ [INITCAP](sql-reference-initcap.md)
+ [LOWER](sql-reference-lower.md)
+ [OVERLAY](sql-reference-overlay.md)
+ [POSITION](sql-reference-position.md)
+ [REGEX\_REPLACE](sql-reference-regex-replace.md)
+ [SUBSTRING](sql-reference-substring.md)
+ [TRIM](sql-reference-trim.md)
+ [UPPER](sql-reference-upper.md)

# CHAR\_LENGTH / CHARACTER\_LENGTH
<a name="sql-reference-char-length"></a>

```
 CHAR_LENGTH | CHARACTER_LENGTH ( <character-expression> )
```

Returns the length in characters of the string passed as the input argument. Returns null if the input argument is null.

## Examples
<a name="sql-reference-char-length-examples"></a>


|  |  | 
| --- |--- |
|  <pre>CHAR_LENGTH('one')</pre>  |  3  | 
|  <pre>CHAR_LENGTH('')</pre>  |  0  | 
|  <pre>CHARACTER_LENGTH('fred')</pre>  |  4  | 
|  <pre>CHARACTER_LENGTH( cast (null as varchar(16) ) )</pre>  |  null  | 
|  <pre>CHARACTER_LENGTH( cast ('fred' as char(16) ) )</pre>  |  16  | 

## Limitations
<a name="sql-reference-char-length-limitations"></a>

Amazon Kinesis Data Analytics streaming SQL does not support the optional USING CHARACTERS \| OCTETS clause. This is a departure from the SQL:2008 standard.

# INITCAP
<a name="sql-reference-initcap"></a>

```
INITCAP ( <character-expression> )
```

Returns a converted version of the input string such that the first character of each space-delimited word is upper-cased, and all other characters are lower-cased.
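A Python sketch of the described behavior (an illustration, not the engine's implementation). Note that space is the only word delimiter here, per the description above; Python's built-in `str.title()` behaves differently, splitting on every non-letter.

```python
def initcap(s):
    """Upper-case the first character of each space-delimited word and
    lower-case the rest, per the INITCAP description above."""
    return ' '.join(w[:1].upper() + w[1:].lower() for w in s.split(' '))

print(initcap('Each FIRST lEtTeR is cAPITALIZED'))
```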

## Examples
<a name="sql-reference-initcap-examples"></a>


| Function | Result | 
| --- | --- | 
|  INITCAP('Each FIRST lEtTeR is cAPITALIZED')  |  Each First Letter Is Capitalized  | 

## Notes
<a name="sqlrf-initcap-notes"></a>

**Note**  
The INITCAP function is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics extension.

# LOWER
<a name="sql-reference-lower"></a>

```
LOWER ( <character-expression> )
```

Converts a string to all lower-case characters. Returns null if input argument is null, and the empty string if the input argument is an empty string.

## Examples
<a name="sql-reference-lower-examples"></a>


| Function | Result | 
| --- | --- | 
|  LOWER('abcDEFghi123')  |  abcdefghi123  | 

# OVERLAY
<a name="sql-reference-overlay"></a>

```
 OVERLAY ( <original-string>
           PLACING <replacement-string>
           FROM <start-position>
           [ FOR <string-length> ]
         )
 <original-string> := <character-expression>
 <replacement-string> := <character-expression>
 <start-position> := <integer-expression>
 <string-length> := <integer-expression>
```

The OVERLAY function is used to replace a portion of the first string argument (the original string) with the second string argument (the replacement string).

The start position indicates the character position in the original string where the replacement string should be overlaid. The optional string length parameter determines how many characters of the original string to replace (if not specified, it defaults to the length of the replacement string). If there are more characters in the replacement string than are left in the original string, the remaining characters are simply appended.

If the start position is greater than the length of the original string, the replacement string is simply appended. If the start position is less than 1, then (1 - start position) characters of the replacement string are prepended to the result, and the rest is overlaid on the original (see the examples following).

If the string length is less than zero, an exception is raised.

If any of the input arguments are null, the result is null.
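The rules above can be sketched in Python (a hedged approximation of the described semantics, using 1-based positions as in SQL; not the engine's implementation):

```python
def overlay(original, replacement, start, length=None):
    """OVERLAY(original PLACING replacement FROM start [FOR length]).

    length defaults to len(replacement); a negative length raises an
    exception; a start below 1 effectively prepends (1 - start)
    characters of the replacement, per the description above."""
    if length is None:
        length = len(replacement)
    if length < 0:
        raise ValueError("string length must not be negative")
    if start < 1:
        length = max(0, length - (1 - start))
        start = 1
    i = start - 1  # convert to 0-based index
    return original[:i] + replacement + original[i + length:]

print(overlay('12345', 'foo', 2, 2))
```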

## Examples
<a name="sqlrf-overlay-examples"></a>


| Function | Result | 
| --- | --- | 
|  OVERLAY ('12345' PLACING 'foo' FROM 1)  |  foo45  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 0)  |  foo345  | 
|  OVERLAY ('12345' PLACING 'foo' FROM -2)  |  foo12345  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 4)  |  123foo  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 17)  |  12345foo  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 2 FOR 0)  |  1foo2345  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 2 FOR 2)  |  1foo45  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 2 FOR 9)  |  1foo  | 

## Limitations
<a name="sqlrf-overlay-limitations"></a>

Amazon Kinesis Data Analytics does not support the optional USING CHARACTERS \| OCTETS clause defined in SQL:2008; USING CHARACTERS is simply assumed. Strict SQL:2008 also requires that a start position less than 1 return a null result, rather than the behavior described above. These are departures from the standard.

# POSITION
<a name="sql-reference-position"></a>

```
 POSITION ( <search-string> IN <source-string> )
 search-string := <character-expression>
 source-string := <character-expression>
```

The POSITION function searches for the first input argument (the search string) within the second input argument (the source string).

If the search string is found within the source string, POSITION returns the character position of the first instance of the search string (subsequent instances are ignored). If the search string is the empty string, POSITION returns 1.

If the search string is not found, POSITION returns 0.

If either the search string or the source string is null, POSITION returns null.
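These rules map closely onto Python's `str.find`, which returns a 0-based index (or -1 when absent); adding 1 yields the POSITION semantics described above. An illustration, not the engine's implementation:

```python
def position(search, source):
    """1-based position of the first match of search in source; 0 when
    not found; null (None) when either argument is null. The empty
    search string matches at position 1 (find returns 0)."""
    if search is None or source is None:
        return None
    return source.find(search) + 1

print(position('findme', '1234findmeXXX'))
```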

## Examples
<a name="sql-reference-position-examples"></a>


| Function | Result | 
| --- | --- | 
|  POSITION ('findme' IN '1234findmeXXX')  |  5  | 
|  POSITION ('findme' IN '1234not-hereXXX')  |  0  | 
|  POSITION ('1' IN '1234567')  |  1  | 
|  POSITION ('7' IN '1234567')  |  7  | 
|  POSITION ('' IN '1234567')  |  1  | 

## Limitations
<a name="sql-reference-position-limitations"></a>

Amazon Kinesis Data Analytics streaming SQL does not support the optional USING CHARACTERS \| OCTETS clause defined in SQL:2008; USING CHARACTERS is simply assumed. This is a departure from the standard.

# REGEX\_REPLACE
<a name="sql-reference-regex-replace"></a>

REGEX\_REPLACE replaces a substring with an alternative substring. It returns the value of the following Java expression.

```
java.lang.String.replaceAll(regex, replacement)
```

## Syntax
<a name="sql-reference-regex-replace-syntax"></a>

```
REGEX_REPLACE(original VARCHAR(65535), regex VARCHAR(65535), replacement VARCHAR(65535), startPosition int, occurrence int)

RETURNS VARCHAR(65535)
```

## Parameters
<a name="sql-reference-regex-replace-parameters"></a>

*original*

The string on which to execute the regex operation.

*regex*

The [regular expression](https://en.wikipedia.org/wiki/Regular_expression) to match. If the encoding for *regex* doesn't match the encoding for *original*, an error is written to the error stream.

*replacement*

The string to replace *regex* matches in the *original* string. If the encoding for *replacement* doesn't match the encoding for *original* or *regex*, an error is written to the error stream.

*startPosition*

The first character in the *original* string to search. If *startPosition* is less than 1, an error is written to the error stream. If *startPosition* is greater than the length of *original*, then *original* is returned.

*occurrence*

The occurrence of the string that matches the *regex* expression to replace. If *occurrence* is 0, all substrings matching *regex* are replaced. If *occurrence* is less than 0, an error is written to the error stream.
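The parameter semantics can be approximated in Python with the `re` module (an illustration only: error-stream conditions are modeled as exceptions here, and Python backreferences use `\1` where Java's `replaceAll` uses `$1`):

```python
import re

def regex_replace(original, regex, replacement, start_position, occurrence):
    """Replace the given occurrence (0 = all) of regex in original,
    searching from the 1-based start_position onward."""
    if start_position < 1 or occurrence < 0:
        raise ValueError("invalid startPosition or occurrence")
    if start_position > len(original):
        return original  # nothing to search, per the description above
    head, tail = original[:start_position - 1], original[start_position - 1:]
    if occurrence == 0:
        return head + re.sub(regex, replacement, tail)  # replace all matches
    matches = list(re.finditer(regex, tail))
    if occurrence > len(matches):
        return original
    m = matches[occurrence - 1]
    return head + tail[:m.start()] + m.expand(replacement) + tail[m.end():]

print(regex_replace('TECHNOLOGY', 'TECHNOLOGY', 'INFORMATION TECHNOLOGY', 1, 0))
```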

## Example
<a name="sql-reference-regex-replace-example"></a>

### Example Dataset
<a name="w2aac22c33c17c11b2"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

To run each example, you need an Amazon Kinesis Analytics application that has the input stream for the sample stock ticker. To learn how to create an Analytics application and configure the input stream for the sample stock ticker, see [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Replace All String Values in a Source String with a New Value
<a name="w2aac22c33c17c11b4"></a>

In this example, all character strings in the `sector` field are replaced if they match a regular expression.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        ticker_symbol VARCHAR(4), 
        SECTOR VARCHAR(24), 
        CHANGE REAL, 
        PRICE REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM   TICKER_SYMBOL,
                REGEX_REPLACE(SECTOR, 'TECHNOLOGY', 'INFORMATION TECHNOLOGY', 1, 0),
                CHANGE,
                PRICE
FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for time, ticker symbol, sector, change, and price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-regex-replace.png)


## Notes
<a name="sql-reference-regex-replace-notes"></a>

REGEX\_REPLACE is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics streaming SQL extension.

REGEX\_REPLACE returns `null` if any parameters are `null`.

# SUBSTRING
<a name="sql-reference-substring"></a>

```
 SUBSTRING ( <source-string> FROM <start-position> [ FOR <string-length> ] )
 SUBSTRING ( <source-string>, <start-position> [ , <string-length> ] )
 SUBSTRING ( <source-string> SIMILAR <pattern> ESCAPE <escape-char> )
 <source-string> := <character-expression>
 <start-position> := <integer-expression>
 <string-length> := <integer-expression>
 <regex-expression> := <character-expression>
 <pattern> := <character-expression>
 <escape-char> := <character-expression>
```

SUBSTRING extracts a portion of the source string specified in the first argument. Extraction starts at the value of *start-position* or the first expression matching the value of *regex-expression*.

If a value is specified for *string-length*, only that number of characters is returned. If there aren't that many characters left in the string, only the characters that are left are returned. If *string-length* is not specified, the string length defaults to the remaining length of the input string.

If the start position is less than 1, then the start position is interpreted as 1 and the string length is reduced by (1 - start position). See the examples following. If the start position is greater than the number of characters in the string, or the length parameter is 0, the result is an empty string.
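A Python sketch of the FROM...FOR form described above, including the start-position adjustment (an illustration only; the regular-expression and SIMILAR forms are not covered here):

```python
def substring(source, start, length=None):
    """SUBSTRING(source FROM start [FOR length]) with 1-based positions.

    A start below 1 is treated as 1 with the length reduced by
    (1 - start); an out-of-range start yields an empty string."""
    if start < 1:
        if length is not None:
            length = max(0, length - (1 - start))
        start = 1
    i = start - 1  # convert to 0-based index
    if i >= len(source):
        return ''
    return source[i:] if length is None else source[i:i + length]

print(substring('123456789', 3, 4))
```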

## Parameters
<a name="sql-reference-substring-parameters"></a>

*source-string*

The string to search for positional or regular-expression matches.

*start-position*

The first character of *source-string* to return. If *start-position* is greater than the length of *source-string*, SUBSTRING returns null.

*string-length*

The number of characters from *source-string* to return.

*regex-expression*

A pattern of characters to match and return from *source-string*. Only the first match is returned.

*pattern*

A three-part pattern of characters that consists of the following:
+ The string to be found before the returned substring
+ The returned substring
+ The string to be found after the returned substring

The parts are delimited by a double quotation mark (") and a specified escape character. For more information, see the [Similar...Escape](#sql-reference-substring-examples-similar) examples following.

## Examples
<a name="sql-reference-substring-examples"></a>

### FROM/ FOR
<a name="sql-reference-substring-examples-from-for"></a>


| Function | Result | 
| --- | --- | 
|  SUBSTRING('123456789' FROM 3 FOR 4)  |  3456  | 
|  SUBSTRING('123456789' FROM 17 FOR 4)  |  <empty string>  | 
|  SUBSTRING('123456789' FROM -1 FOR 4)  |  12  | 
|  SUBSTRING('123456789' FROM 6 FOR 0)  |  <empty string>  | 
|  SUBSTRING('123456789' FROM 8 FOR 4)  |  89  | 

### FROM Regex
<a name="sql-reference-substring-examples-from"></a>


| Function | Result | 
| --- | --- | 
| SUBSTRING('TECHNOLOGY' FROM 'L[A-Z]+') | LOGY | 
| SUBSTRING('TECHNOLOGY' FROM 'FOO') | null | 
| SUBSTRING('TECHNOLOGY' FROM 'O[A-Z]') | OL | 

### Numeric
<a name="sql-reference-substring-examples-numeric"></a>


| Function | Result | 
| --- | --- | 
|  SUBSTRING('123456789', 3, 4)  |  3456  | 
|  SUBSTRING('123456789', 7, 4)  |  789  | 
|  SUBSTRING('123456789', 10, 4)  |  null  | 

### Similar...Escape
<a name="sql-reference-substring-examples-similar"></a>


| Function | Result | 
| --- | --- | 
|  SUBSTRING('123456789' SIMILAR '23\\"456\\"78' ESCAPE '\\')  |  456  | 
|  SUBSTRING('TECHNOLOGY' SIMILAR 'TECH%"NOLO%"GY' ESCAPE '%')  |  NOLO  | 

## Notes
<a name="sql-reference-substring-notes"></a>
+ Amazon Kinesis Data Analytics streaming SQL doesn't support the optional 'USING CHARACTERS \| OCTETS' clause defined in SQL:2008. USING CHARACTERS is simply assumed.
+ The second and third forms of the SUBSTRING function listed preceding (using a regular expression, and using commas rather than FROM...FOR) are not part of the SQL:2008 standard. They are part of the streaming SQL extension to Amazon Kinesis Data Analytics.

# TRIM
<a name="sql-reference-trim"></a>

```
TRIM ( [ [ <trim-specification> ] [ <trim-character> ] FROM ] <trim-source> )
 <trim-specification> := LEADING | TRAILING | BOTH
 <trim-character> := <character-expression>
 <trim-source> := <character-expression>
```



TRIM removes instances of the specified trim-character from the beginning and/or end of the trim-source string as dictated by the trim-specification (that is, LEADING, TRAILING, or BOTH). If LEADING is specified, only repetitions of the trim character at the beginning of the source string are removed. If TRAILING is specified, only repetitions of the trim character at the end of the source string are removed. If BOTH is specified, or the trim specifier is left out entirely, then repetitions are removed from both the beginning and end of the source string.

If the trim-character is not explicitly specified, it defaults to the space character (' '). Only one trim character is allowed; specifying an empty string or a string longer than one character results in an exception.

If either input is null, null is returned.
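These rules map directly onto Python's string-stripping methods, which also remove every repetition of the given character from the chosen end(s). A hedged sketch of the behavior described above, not the engine's implementation:

```python
def trim(source, spec='BOTH', char=' '):
    """TRIM([spec] [char] FROM source): remove repetitions of a single
    trim character from the leading and/or trailing end of source."""
    if source is None or char is None:
        return None
    if len(char) != 1:
        # only one trim character is allowed, per the description above
        raise ValueError("trim character must be exactly one character")
    if spec == 'LEADING':
        return source.lstrip(char)
    if spec == 'TRAILING':
        return source.rstrip(char)
    return source.strip(char)  # BOTH is the default

print(trim('xxxTrim frontxxx', 'LEADING', 'x'))
```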

## Examples
<a name="sql-reference-trim-examples"></a>


| Function | Result | 
| --- | --- | 
|  <pre>TRIM(' Trim front and back ')</pre>  |  'Trim front and back'  | 
|  <pre>TRIM (BOTH FROM ' Trim front and back ')</pre>  |  'Trim front and back'  | 
|  <pre>TRIM (BOTH ' ' FROM ' Trim front and back ')</pre>  |  'Trim front and back'  | 
|  <pre>TRIM (LEADING 'x' FROM 'xxxTrim frontxxx')</pre>  |  'Trim frontxxx'  | 
|  <pre>TRIM (TRAILING 'x' FROM 'xxxTrimxBackxxx')</pre>  |  'xxxTrimxBack'  | 
|  <pre>TRIM (BOTH 'y' FROM 'xxxNo y to trimxxx')</pre>  |  'xxxNo y to trimxxx'  | 

# UPPER
<a name="sql-reference-upper"></a>

```
 UPPER ( <character-expression> )
```

Converts a string to all upper-case characters. Returns null if the input argument is null, and the empty string if the input argument is an empty string.

## Examples
<a name="sqlrf-upper-examples"></a>


| Function | Result | 
| --- | --- | 
|  UPPER('abcDEFghi123')  |  ABCDEFGHI123  | 