

# Standard SQL Operators
<a name="sql-reference-standard-sql-operators"></a>

The following topics discuss standard SQL operators:

**Topics**
+ [CREATE statements](sql-reference-create-statements.md)
+ [INSERT](sql-reference-insert.md)
+ [Query](sql-reference-query.md)
+ [SELECT statement](sql-reference-select.md)

# CREATE statements
<a name="sql-reference-create-statements"></a>

You can use the following CREATE statements with Amazon Kinesis Data Analytics:
+ [CREATE FUNCTION](sql-reference-create-function.md)
+ [CREATE PUMP](sql-reference-create-pump.md)
+ [CREATE STREAM](sql-reference-create-stream.md)

# CREATE STREAM
<a name="sql-reference-create-stream"></a>

The CREATE STREAM statement creates a (local) stream. The name of the stream must be distinct from the name of any other stream in the same schema. It is good practice to include a description of the stream.

Like tables, streams have columns, and you specify the data types for these in the CREATE STREAM statement. These should map to the data source for which you are creating the stream. For column-name, any valid non-reserved SQL name is usable. Column values cannot be null.
+ Specifying OR REPLACE re-creates the stream if it already exists, enabling you to change the definition of an existing object by implicitly dropping it, without first needing to use a DROP command. Using CREATE OR REPLACE on a stream that already has data in flight drops the stream and loses all of that data.
+ RENAME can be specified only if OR REPLACE has been specified.
+ For the complete list of types and values in type-specification, such as TIMESTAMP, INTEGER, or VARCHAR(2), see the topic Amazon Kinesis Data Analytics Data Types in the Amazon Kinesis Data Analytics SQL Reference Guide.
+ For option-value, any string can be used.

## Simple stream for unparsed log data
<a name="sql-reference-capture-stream-simple"></a>

```
CREATE OR REPLACE STREAM logStream (
    source  VARCHAR(20),
    message VARCHAR(3072))
DESCRIPTION 'Head of webwatcher stream processing';
```

## Stream capturing sensor data from Intelligent Travel System pipeline
<a name="sql-reference-create-stream-capturing-sensor"></a>

```
CREATE OR REPLACE STREAM "LaneData" (
    -- ROWTIME is time at which sensor data collected
    LDS_ID  INTEGER,        -- loop-detector ID
    LNAME   VARCHAR(12),
    LNUM    VARCHAR(4),
    OCC     SMALLINT,
    VOL     SMALLINT,
    SPEED   DECIMAL(4,2)
) DESCRIPTION 'Conditioned LaneData for analysis queries';
```

## Stream capturing order data from e-commerce pipeline
<a name="sql-reference-create-stream-capturing-order"></a>

```
CREATE OR REPLACE STREAM "OrderData" (
    "key_order"    BIGINT NOT NULL,
    "key_user"     BIGINT,
    "country"      SMALLINT,
    "key_product"  INTEGER,
    "quantity"     SMALLINT,
    "eur"          DECIMAL(19,5),
    "usd"          DECIMAL(19,5)
) DESCRIPTION 'conditioned order data, ready for analysis';
```

# CREATE FUNCTION
<a name="sql-reference-create-function"></a>

Amazon Kinesis Data Analytics provides a number of [functions](sql-reference-functions.md), and also allows you to extend its capabilities by means of user-defined functions (UDFs). Amazon Kinesis Data Analytics supports UDFs defined in SQL only.

User-defined functions can be invoked using either the fully qualified name or the function name alone.

Values passed to (or returned from) a user-defined function or transformation must be exactly the same data types as the corresponding parameter definitions. In other words, implicit casting is not allowed in passing parameters to (or returning values from) a user-defined function.
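For example, an INTEGER column must be cast explicitly to match a DOUBLE parameter. The following is a sketch only; the UDF `to_celsius` and the stream "SensorStream" are hypothetical names used for illustration:

```
-- to_celsius is assumed to be declared as to_celsius(DOUBLE) RETURNS DOUBLE.
-- "temp_f" is an INTEGER column, so it must be cast explicitly;
-- passing it uncast would fail because implicit casting is not
-- allowed for UDF parameters.
SELECT STREAM to_celsius(CAST("temp_f" AS DOUBLE)) AS "temp_c"
FROM "SensorStream";
```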

## User-Defined Function (UDF)
<a name="sql-reference-create-function-udf"></a>

A user-defined function can implement complex calculations, taking zero or more scalar parameters and returning a scalar result. UDFs operate like built-in functions such as FLOOR() or LOWER(). For each occurrence of a user-defined function within a SQL statement, that UDF is called once per row with scalar parameters: constants or column values in that row.

## Syntax
<a name="sql-reference-create-function-syntax"></a>

```
 CREATE FUNCTION <function_name> ( <parameter_list> )
  RETURNS <data type>
  LANGUAGE SQL
  [ SPECIFIC <specific_function_name> | [NOT] DETERMINISTIC ]
  CONTAINS SQL
  [ READS SQL DATA ]
  [ MODIFIES SQL DATA ]
  [ RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ]
  RETURN <SQL-defined function body>
```

SPECIFIC assigns a specific function name that is unique within the application. Note that the regular function name does not need to be unique (two or more functions may share the same name, as long as they are distinguishable by their parameter list). 

DETERMINISTIC / NOT DETERMINISTIC indicates whether a function will always return the same result for a given set of parameter values. This may be used by your application for query optimization.

READS SQL DATA and MODIFIES SQL DATA indicate whether the function potentially reads or modifies SQL data, respectively. If a function attempts to read data from tables or streams without READS SQL DATA being specified, or insert to a stream or modify a table without MODIFIES SQL DATA being specified, an exception will be raised.

RETURNS NULL ON NULL INPUT and CALLED ON NULL INPUT indicate whether the function is defined as returning null if any of its parameters are null. If left unspecified, the default is CALLED ON NULL INPUT.

A SQL-defined function body consists only of a single RETURN statement.

## Examples
<a name="sql-reference-create-function-examples"></a>

```
CREATE FUNCTION get_fraction( degrees DOUBLE )
    RETURNS DOUBLE
    CONTAINS SQL
    RETURN degrees - FLOOR(degrees)
;
```
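A second sketch, combining several of the optional clauses described above; the function name, specific name, and body are illustrative only:

```
CREATE FUNCTION celsius_to_fahrenheit( c DOUBLE )
    RETURNS DOUBLE
    SPECIFIC celsius_to_fahrenheit_v1  -- unique specific name
    DETERMINISTIC                      -- same input always yields same output
    CONTAINS SQL
    RETURNS NULL ON NULL INPUT         -- returns null if c is null
    RETURN c * 9 / 5 + 32
;
```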

# CREATE PUMP
<a name="sql-reference-create-pump"></a>

A pump is an Amazon Kinesis Data Analytics repository object (an extension of the SQL standard) that provides continuously running INSERT INTO stream SELECT ... FROM query functionality, enabling the results of a query to be continuously inserted into a named stream.

You need to specify a column list for both the query and the named stream (these imply a set of source-target pairs). The two column lists must match in data types, or the SQL validator will reject them. (The lists need not include all columns in the target stream; you can set up a pump for a single column.)
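As a minimal sketch (the stream and column names here are assumptions), a pump that moves a single column pairs one source column with one target column of the same data type:

```
-- The single-column lists form one source-target pair;
-- "message" must have the same data type in both streams.
CREATE OR REPLACE PUMP "MessagePump" AS
INSERT INTO "DestStream" ("message")
SELECT STREAM "message"
FROM "SourceStream";
```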

For more information, see [SELECT statement](sql-reference-select.md).

The following code creates two streams:
+ "OrderDataWithCreateTime" which will serve as the origin stream for the pump.
+ "OrderData" which will serve as the destination stream for the pump.

```
CREATE OR REPLACE STREAM "OrderDataWithCreateTime" (
"key_order" VARCHAR(20),
"key_user" VARCHAR(20),
"key_billing_country" VARCHAR(20),
"key_product" VARCHAR(20),
"quantity" VARCHAR(20),
"eur" VARCHAR(20),
"usd" VARCHAR(20))
DESCRIPTION 'Creates origin stream for pump';

CREATE OR REPLACE STREAM "OrderData" (
"key_order" VARCHAR(20),
"key_user" VARCHAR(20),
"country" VARCHAR(20),
"key_product" VARCHAR(20),
"quantity" VARCHAR(20),
"eur" INTEGER,
"usd" INTEGER)
DESCRIPTION 'Creates destination stream for pump';
```

The following code uses these two streams to create a pump. Data is selected from "OrderDataWithCreateTime" and inserted into "OrderData".

```
CREATE OR REPLACE PUMP "200-ConditionedOrdersPump" AS
INSERT INTO "OrderData" (
"key_order", "key_user", "country",
"key_product", "quantity", "eur", "usd")
-- note that this list matches that of the query
SELECT STREAM
"key_order", "key_user", "key_billing_country",
"key_product", "quantity", "eur", "usd"
-- note that this list matches that of the INSERT statement
FROM "OrderDataWithCreateTime";
```

For more detail, see the topic [In-Application Streams and Pumps](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/streams-pumps.html) in the *Amazon Managed Service for Apache Flink Developer Guide*.

## Syntax
<a name="sqlrf_create_pump_syntax"></a>

```
 CREATE [ OR REPLACE ] PUMP <qualified-pump-name> 
                       [ DESCRIPTION '<string-literal>' ] AS <streaming-insert>
```

where streaming-insert is an insert statement such as:

```
INSERT INTO <stream-name> SELECT <columns> FROM <source-stream>
```

# INSERT
<a name="sql-reference-insert"></a>

INSERT is used to insert rows into a stream. It can also be used in a pump to insert the output of one stream into another.

## Syntax
<a name="sql-reference-insert-syntax"></a>

```
 <insert statement> :=
   INSERT [ EXPEDITED ]
   INTO <table-name> [ ( <insert-column-specification> ) ]
   <query>
 <insert-column-specification> := <simple-identifier-list>
 <simple-identifier-list> :=
    <simple-identifier> [ , <simple-identifier-list> ]
```

For a discussion of VALUES, see [SELECT statement](sql-reference-select.md).

## Pump Stream Insert
<a name="sql-reference-insert-pump"></a>

INSERT may also be specified as part of a [CREATE PUMP](sql-reference-create-pump.md) statement.

```
 CREATE PUMP "HighBidsPump" AS INSERT INTO "highBids" ( "ticker", "shares", "price")
 SELECT  "ticker", "shares", "price"
 FROM SALES.bids
 WHERE "shares"*"price">100000
```

Here the results inserted into the "highBids" stream come from a query (such as a SELECT or a UNION ALL expression) that evaluates to a stream. This creates a continuously running stream insert. Rowtimes of the inserted rows are inherited from the rowtimes of the rows output by the SELECT or UNION ALL. Rows may initially be dropped if other inserters, ahead of this inserter, have inserted rows with rowtimes later than those initially prepared by this inserter, because the latter would then be out of time order. See the topic [CREATE PUMP](sql-reference-create-pump.md) in this guide.

# Query
<a name="sql-reference-query"></a>

## Syntax
<a name="sql-reference-query-syntax"></a>

```
 <query> :=
     <select>
   | <query> <set-operator> [ ALL ] <query>
   | VALUES <row-constructor> { , <row-constructor> }...
   | '(' <query> ')'
  <set-operator> :=
     EXCEPT
   | INTERSECT
   | UNION
  <row-constructor> :=
     [ ROW ] ( <expression> { , <expression> }... )
```

## select
<a name="sql-reference-query-select"></a>

The <select> element in the syntax above represents any SELECT statement, which is described in detail in the topic [SELECT statement](sql-reference-select.md) in this guide.

## Set operators (EXCEPT, INTERSECT, UNION)

Set operators combine rows produced by queries using set operations:
+ EXCEPT returns all rows that are in the first set but not in the second
+ INTERSECT returns all rows that are in both first and second sets
+ UNION returns all rows that are in either set

In all cases, the two sets must have the same number of columns, and the column types must be assignment-compatible. The column names of the resulting relation are the names of the columns of the first query.

With the ALL keyword, the operators use the semantics of a mathematical [multiset](https://en.wikipedia.org/wiki/Multiset), meaning that duplicate rows are not eliminated. For example, if a particular row occurs 5 times in the first set and 2 times in the second set, then UNION ALL emits the row 5 + 2 = 7 times.

ALL is not currently supported for EXCEPT or INTERSECT.

All operators are left-associative, and INTERSECT has higher precedence than EXCEPT or UNION, which have the same precedence. To override default precedence, you can use parentheses. For example:

```
SELECT * FROM a
UNION
SELECT * FROM b
INTERSECT
SELECT * FROM c
EXCEPT
SELECT * FROM d
EXCEPT
SELECT * FROM e
```

is equivalent to the fully-parenthesized query

```
( ( SELECT * FROM a
    UNION
    ( SELECT * FROM b
      INTERSECT
      SELECT * FROM c) )
  EXCEPT
  SELECT * FROM d )
EXCEPT
SELECT * FROM e
```

## Streaming set operators
<a name="sql-reference-query-streaming-set-operators"></a>

UNION ALL is the only set operator that can be applied to streams. Both sides of the operator must be streams; it is an error if one side is a stream and the other is a relation.

For example, the following query produces a stream of orders taken over the phone or via the web:

```
SELECT STREAM *
  FROM PhoneOrders
UNION ALL
SELECT STREAM *
  FROM WebOrders
```

Rowtime generation: The rowtime of a row emitted from a streaming UNION ALL is the same as the rowtime of the input row.

Rowtime bounds: A rowtime bound is an assertion about the future contents of a stream. It states that the next row in the stream will have a ROWTIME no earlier than the value of the bound. For example, if a rowtime bound is 2018-12-02 23:23:07, this tells the system that the next row will arrive no earlier than 2018-12-02 23:23:07. Rowtime bounds are useful in managing gaps in data flow, such as those left overnight on a stock exchange.

Amazon Kinesis Data Analytics ensures that the ROWTIME column is ascending by merging the incoming rows on the basis of the time stamp. If the first set has rows that are timestamped 10:00 and 10:30, and the second set has only reached 10:15, Kinesis Data Analytics pauses the first set and waits for the second set to reach 10:30. In this case, it would be advantageous if the producer of the second set were to send a rowtime bound.

## VALUES operator
<a name="sql-reference-query-values-operator"></a>

The VALUES operator expresses a constant relation in a query. (See also the discussion of VALUES in the topic SELECT in this guide.)

VALUES can be used as a top-level query, as follows:

```
VALUES 1 + 2 > 3;
EXPR$0
======
FALSE
VALUES
    (42, 'Fred'),
    (34, 'Wilma');
EXPR$0 EXPR$1
====== ======
    42 Fred
    34 Wilma
```

Note that the system has generated arbitrary column names for anonymous expressions. You can assign column names by putting VALUES into a subquery and using an AS clause:

```
SELECT *
FROM (
    VALUES
        (42, 'Fred'),
        (34, 'Wilma')) AS t (age, name);
AGE NAME
=== =====
 42 Fred
 34 Wilma
```

# SELECT statement
<a name="sql-reference-select"></a>

SELECT retrieves rows from streams. You can use SELECT as a top-level statement, or as part of a query involving set operations, or as part of another statement, including (for example) when passed as a query into a UDX. For examples, see the topics INSERT, IN, EXISTS, [CREATE PUMP](sql-reference-create-pump.md) in this guide.

The subclauses of the SELECT statement are described in the topics [SELECT clause](sql-reference-select-clause.md), [GROUP BY clause](sql-reference-group-by-clause.md), Streaming GROUP BY, [ORDER BY clause](sql-reference-order-by-clause.md), [HAVING clause](sql-reference-having-clause.md), [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md) and [WHERE clause](sql-reference-where-clause.md) in this guide.

## Syntax
<a name="sql-reference-select-syntax"></a>

```
 <select> :=
    SELECT [ STREAM ] [ DISTINCT | ALL ]
    <select-clause>
    FROM <from-clause>
    [ <where-clause> ]
    [ <group-by-clause> ]
    [ <having-clause> ]
    [ <window-clause> ]
    [ <order-by-clause> ]
```

## The STREAM keyword and the principle of streaming SQL
<a name="sql-reference-select-stream"></a>

The SQL query language was designed for querying stored relations, and producing finite relational results. 

The foundation of streaming SQL is the STREAM keyword, which tells the system to compute the time differential of a relation, that is, the change of the relation with respect to time. A streaming query computes that change for a single relation, or for an expression computed from several relations.

To ask for the time-differential of a relation in Amazon Kinesis Data Analytics, we use the STREAM keyword:

```
SELECT STREAM * FROM Orders
```

If we start running that query at 10:00, it will produce rows at 10:15 and 10:25. At 10:30 the query is still running, waiting for future orders:

```
ROWTIME  orderId custName   product quantity
======== ======= ========== ======= ========
10:15:00     102 Ivy Black  Rice           6
10:25:00     103 John Wu    Apples         3
```

Here, the system is saying 'At 10:15:00 I executed the query SELECT * FROM Orders and found one row in the result that was not present at 10:14:59.999'. It generates the row with a value of 10:15:00 in the ROWTIME column because that is when the row appeared. This is the core idea of a stream: a relation that keeps updating over time.

You can apply this definition to more complicated queries. For example, the stream

```
SELECT STREAM * FROM Orders WHERE quantity > 5
```

has a row at 10:15 but no row at 10:25, because the relation

```
SELECT * FROM Orders WHERE quantity > 5
```

goes from empty to one row when order 102 is placed at 10:15, but is not affected when order 103 is placed at 10:25.

We can apply the same logic to queries involving any combination of SQL operators. Queries involving JOIN, GROUP BY, subqueries, set operations UNION, INTERSECT, EXCEPT, and even qualifiers such as IN and EXISTS, are well-defined when converted to streams. Queries combining streams and stored relations are also well-defined.

# SELECT ALL and SELECT DISTINCT
<a name="sql-reference-select-all-distinct"></a>

If the ALL keyword is specified, the query does not eliminate duplicate rows. This is the default behavior if neither ALL nor DISTINCT is specified.

If the DISTINCT keyword is specified, a query eliminates rows that are duplicates according to the columns in the SELECT clause.

Note that for these purposes, the value NULL is considered equal to itself and not equal to any other value. These are the same semantics as for GROUP BY and the IS NOT DISTINCT FROM operator.
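A sketch of this NULL handling, using the IS NOT DISTINCT FROM operator directly:

```
-- Two NULLs compare as "not distinct", so duplicate elimination
-- treats NULL values as equal to each other.
-- Under standard SQL semantics this returns TRUE.
VALUES (CAST(NULL AS INTEGER) IS NOT DISTINCT FROM CAST(NULL AS INTEGER));
```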
<a name="STREAMINGettingStartedELECTDISTINCT"></a>
**Streaming SELECT DISTINCT**  
SELECT DISTINCT can be used with streaming queries as long as there is a non-constant monotonic expression in the SELECT clause. (The rationale for the non-constant monotonic expression is the same as for streaming GROUP BY.) Amazon Kinesis Data Analytics emits rows for SELECT DISTINCT as soon as they are ready.

If ROWTIME is one of the columns in the SELECT clause, it is ignored for the purposes of duplicate-elimination. Duplicates are eliminated on the basis of the other columns in the SELECT clause.

For example:

```
SELECT STREAM DISTINCT ROWTIME, prodId, FLOOR(Orders.ROWTIME TO DAY)
FROM Orders
```

displays the set of unique products that are ordered in any given day.

If you are doing "GROUP BY floor(ROWTIME TO MINUTE)" and there are two rows in a given minute -- say 22:49:10 and 22:49:15 -- then the summary of those rows comes out timestamped 22:50:00, because that is the earliest time at which the group is known to be complete.

**Note:** "GROUP BY ceil(ROWTIME TO MINUTE)" or "GROUP BY floor(ROWTIME TO MINUTE) - INTERVAL '1' DAY" would give identical behavior.

It is not the value of the grouping expression that determines row completion; it is the point at which that expression changes value.

If you want the rowtimes of the output rows to be the time they are emitted, then in the following example you would need to change from form 1 to use form 2 instead:

```
(Form 1)
   select distinct floor(s.rowtime to hour), a,b,c
   from s
(Form 2)
   select min(s.rowtime) as rowtime, floor(s.rowtime to hour), a, b, c
   from s
   group by floor(s.rowtime to hour), a, b, c
```

# SELECT clause
<a name="sql-reference-select-clause"></a>

The <select-clause> uses the following items after the STREAM keyword:

```
  <select-list> :=
    <select-item> { , <select-item> }...
 <select-item> :=
    <select-expression> [ [ AS ] <simple-identifier> ]
 <simple-identifier> :=
    <identifier> | <quoted-identifier>
 <select-expression> :=
    <identifier> . *  | *  | <expression>
```

## Expressions
<a name="w2aac20c15c17b9"></a>

Each of these expressions may be:
+ a scalar expression
+ a call to an [aggregate function](sql-reference-aggregate-functions.md), if this is an aggregating query (see [GROUP BY clause](sql-reference-group-by-clause.md))
+ a call to an [analytic function](sql-reference-analytic-functions.md), if this is not an aggregating query
+ the wildcard expression `*`, which expands to all columns of all relations in the FROM clause
+ the wildcard expression `alias.*`, which expands to all columns of the relation named alias
+ the [ROWTIME](sql-reference-rowtime.md) column
+ a [CASE expression](#sql-reference-select-clause-caseexpr)

Each expression may be assigned an alias, using the AS column-name syntax. This is the name of the column in the result set of the query. If the query is in the FROM clause of an enclosing query, this is the name used to reference the column. The number of columns specified in the AS clause of a stream reference must match the number of columns defined in the original stream.

Amazon Kinesis Data Analytics has a few simple rules to derive the alias of an expression that does not have one. The default alias of a column expression is the name of the column: for example, EMPS.DEPTNO is aliased DEPTNO by default. Other expressions are given a generated alias such as EXPR$0. You should not assume that the system generates the same alias each time.
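As a sketch (the stream and column names are assumptions), an explicit alias avoids relying on a generated name:

```
-- Without AS, the product expression would receive a generated
-- alias such as EXPR$0; AS gives it a stable, predictable name.
SELECT STREAM "ticker", "price" * "quantity" AS "total_value"
FROM "Trades";
```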

In a streaming query, aliasing a column AS ROWTIME has a special meaning: For more information, see [ROWTIME](sql-reference-rowtime.md).

**Note**  
All streams have an implicit column called ROWTIME. This column may impact your use of the syntax 'AS t(c1, c2, ...)' that is now supported by SQL:2008. Previously in a FROM clause you could only write

```
SELECT ... FROM r1 AS t1 JOIN r2 as t2
```

but t1 and t2 would have the same columns as r1 and r2. The AS syntax enables you to rename r1's columns by writing the following:

```
SELECT ... FROM r1 AS t1(a, b, c)
```

(r1 must have precisely 3 columns for this syntax to work).

If r1 is a stream, then ROWTIME is implicitly included, but it doesn't count as a column. As a result, if a stream has 3 columns without including ROWTIME, you cannot rename ROWTIME by specifying 4 columns. For example, if the stream Bids has three columns, the following code is invalid.

```
SELECT STREAM * FROM Bids (a, b, c, d)
```

It is also invalid to rename another column to ROWTIME, as in the following example:

```
SELECT STREAM * FROM Bids (ROWTIME, a, b)
```

This is invalid because it would rename another column to ROWTIME. For more information about expressions and literals, see [Expressions and Literals](sql-reference-expressions.md).

## CASE expression
<a name="sql-reference-select-clause-caseexpr"></a>

The CASE expression enables you to specify a set of discrete test expressions and a specific return-value (expression) for each such test. Each test expression is specified in a WHEN clause; each return-value expression is specified in the corresponding THEN clause. Multiple such WHEN-THEN pairs can be specified.

If you specify a comparison-test-expression before the first WHEN clause, then each expression in a WHEN clause is compared to that comparison-test-expression. The first one to match the comparison-test-expression causes the return-value from its corresponding THEN clause to be returned. If no WHEN clause expression matches the comparison-test-expression, the return-value is null unless an ELSE clause is specified, in which case the return-value in that ELSE clause is returned.

If you do not specify a comparison-test-expression before the first WHEN clause, then each expression in a WHEN clause is evaluated (left to right) and the first one to be true causes the return-value from its corresponding THEN clause to be returned. If no WHEN clause expression is true, the return-value is null unless an ELSE clause is specified, in which case the return-value in that ELSE clause is returned.
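The following sketches show both forms, using the "OrderData" stream defined earlier in this guide; the mapping values and category labels are illustrative only:

```
-- Simple CASE: each WHEN value is compared to the expression after CASE.
SELECT STREAM
    CASE "country"
        WHEN 1 THEN 'US'
        WHEN 2 THEN 'UK'
        ELSE 'other'
    END AS "country_name"
FROM "OrderData";

-- Searched CASE: each WHEN condition is evaluated left to right.
SELECT STREAM
    CASE
        WHEN "quantity" > 100 THEN 'bulk'
        WHEN "quantity" > 10  THEN 'medium'
        ELSE 'small'
    END AS "order_size"
FROM "OrderData";
```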

## VALUES
<a name="sql-reference-select-clause-values"></a>

VALUES uses expressions to calculate one or more row values, and is often used within a larger command. When creating more than one row, the VALUES clause must specify the same number of elements for every row. The resulting table-columns data-types are derived from the explicit or inferred types of the expressions appearing in that column. VALUES is allowed syntactically wherever SELECT is permitted. See also the discussion of VALUES as an operator, in the topic Query in this guide.

**Syntax**

```
VALUES ( expression [, ...] ) [, ...]
    [ ORDER BY sort_expression [ ASC | DESC | USING operator ] [, ...] ]
```

VALUES is a SQL operator, on a par with SELECT and UNION, enabling the following types of actions:
+ You can write VALUES (1), (2) to return two rows each with a single anonymous column.
+ You can write VALUES (1, 'a'), (2, 'b') to return two rows of two columns.
+ You can name the columns using AS, as in the following example:

```
SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS t(x, y)
```

The most important use of VALUES is in an INSERT statement, to insert a single row:

```
  INSERT INTO emps (empno, name, deptno, gender)
    VALUES (107, 'Jane Costa', 22, 'F');
```

However, you can also insert multiple rows:

```
   INSERT INTO Trades (ticker, price, amount)
     VALUES ('MSFT', 30.5, 1000),
            ('ORCL', 20.25, 2000);
```

When you use VALUES in the FROM clause of a SELECT statement, the entire VALUES clause must be enclosed in parentheses, consistent with the fact that it operates as a query, not a table expression. For additional examples, see [FROM clause](sql-reference-from-clause.md).

**Note**  
Using INSERT with streams engages some additional considerations as to rowtimes, pumps, and INSERT EXPEDITED. For more information, see [INSERT](sql-reference-insert.md).

# FROM clause
<a name="sql-reference-from-clause"></a>

The FROM clause is the source of rows for a query.

```
 <from-clause> :=
    FROM <table-reference> { , <table-reference> }...
 <table-reference> :=
    <table-name> [ <table-over> ] [ <correlation> ]
  | <joined-table>
 <table-name> :=  <identifier>
 <table-over> :=  OVER <window-specification>
 <window-specification> :=
 (   <window-name>
| <query_partition_clause>
| ORDER BY <order_by_clause>
| <windowing-clause>
    )
 <windowing-clause> :=
    { ROWS | RANGE }
    { BETWEEN
    { UNBOUNDED PRECEDING
      | CURRENT ROW
      | <value-expression> { PRECEDING | FOLLOWING }
    }
       AND
    { UNBOUNDED FOLLOWING
      | CURRENT ROW
      | <value-expression> { PRECEDING | FOLLOWING }
    }
    | { UNBOUNDED { PRECEDING | FOLLOWING }
      | CURRENT ROW
      | <value-expression> { PRECEDING | FOLLOWING }
    }
    }
```

For syntax charts on window-specification and windowing-clause, see [WINDOW Clause (Sliding Windows)](sql-reference-window-clause.md).

```
 <correlation> :=
    [ AS ] <correlation-name> [ '(' <column> { , <column> }... ')' ]
 <joined-table> :=
    <table-reference> CROSS JOIN <table-reference>
  | <table-reference> NATURAL <join-type> JOIN <table-reference>
  | <table-reference> <join-type> JOIN <table-reference>
       [ USING '(' <column> { , <column>}... ')'
       | ON <condition>
       ]
 <join-type> :=
    INNER
  | <outer-join-type> [ OUTER ]
 <outer-join-type> :=
    LEFT
  | RIGHT
  | FULL
```
<a name="SELRELATIONS"></a>
**Relations**  
Several types of relation can appear in a FROM clause:
+ A named relation (table or stream)
+ A subquery enclosed in parentheses
+ A join combining two relations (see the topic JOIN in this guide)
+ A transform expression

Subqueries are described in more detail in the topic Query in this guide.

Here are some examples of subqueries:

```
-- set operation as subquery
-- (finds how many departments have no employees)
SELECT COUNT(*)
FROM (
  SELECT deptno FROM Dept
  EXCEPT
  SELECT deptno FROM Emp);
-- table-constructor as a subquery,
-- combined with a regular table in a join
SELECT *
FROM Dept AS d
  JOIN (VALUES ('Fred', 10), ('Bill', 20)) AS e (name, deptno)
  ON d.deptno = e.deptno;
```

Unlike subqueries in other parts of the SELECT statement, such as in the [WHERE clause](sql-reference-where-clause.md) (see [Condition Clause](sql-reference-conditions.md)), a subquery in the FROM clause cannot contain correlating variables. For example:

```
-- Invalid query. Dept.deptno is an illegal reference to
-- a column of another table in the enclosing FROM clause.
SELECT *
FROM Dept,
  (SELECT *
   FROM Emp
   WHERE Emp.deptno = Dept.Deptno)
```

## FROM clause with multiple relations
<a name="sqlrf-from-clause-multiple-relations"></a>

If a FROM clause contains multiple, comma-separated relations, the query constructs the cartesian product of those relations; that is, it combines each row from each relation with each row from every other relation.

The comma in the FROM clause is therefore equivalent to the CROSS JOIN operator.
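For example, the following two queries are equivalent (the table names are assumed):

```
-- Comma-separated relations produce the cartesian product ...
SELECT * FROM Dept, Emp;

-- ... which is the same result as an explicit CROSS JOIN.
SELECT * FROM Dept CROSS JOIN Emp;
```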

## Correlation names
<a name="sqlrf-from-clause-correlation-names"></a>

Each relation in the FROM clause can have a correlation name assigned using AS correlation-name. This name is an alternative name by which the relation can be referenced in expressions throughout the query. (Even though the relation may be a subquery or stream, it is conventionally called a 'table alias' to distinguish it from column aliases defined in the SELECT clause.)

Without an AS clause, a named relation's name becomes its default alias. (In streaming queries, the OVER clause does not prevent this default assignment from happening.)

An alias is necessary if a query uses the same named relation more than once, or if any of the relations are subqueries or table expressions.

For example, in the following query, the named relation EMPS is used twice; once with its default alias EMPS, and once with an assigned alias MANAGERS:

```
SELECT EMPS.NAME || ' is managed by ' || MANAGERS.NAME
FROM LOCALDB.Sales.EMPS,
  LOCALDB.Sales.EMPS AS MANAGERS
WHERE MANAGERS.EMPNO = EMPS.MGRNO
```

An alias can optionally be followed by a list of columns:

```
SELECT e.empname
FROM LOCALDB.Sales.EMPS AS e(empname, empmgrno)
```

## OVER clause
<a name="sqlrf-from-clause-over"></a>

The OVER clause is only applicable for streaming joins. For more detail, see the topic [JOIN clause](sql-reference-join-clause.md) in this guide.

# JOIN clause
<a name="sql-reference-join-clause"></a>

The JOIN clause in a SELECT statement combines columns from one or more streams or reference tables.

**Topics**
+ [Stream-to-Stream Joins](#sqlrf-join-stream)
+ [Stream-to-Table Joins](#sqlrf-join-stream-to-table)

## Stream-to-Stream Joins
<a name="sqlrf-join-stream"></a>

Amazon Kinesis Data Analytics supports joining an in-application stream with another in-application stream using SQL, bringing this important traditional database functionality into the streaming context.

This section describes the types of joins that Kinesis Data Analytics supports, including time-based and row-based window joins, and the details about streaming joins.

### Join Types
<a name="sqlrf-join-stream-types"></a>

There are five types of joins:


|  Join type  |  Description  | 
| --- | --- | 
|  INNER JOIN (or just JOIN)  |  Returns all pairs of rows from the left and from the right for which the join condition evaluates to TRUE.  | 
|  LEFT OUTER JOIN (or just LEFT JOIN)  |  As INNER JOIN, but rows from the left are kept even if they don't match any rows on the right. NULL values are generated on the right.  | 
|  RIGHT OUTER JOIN (or just RIGHT JOIN)  |  As INNER JOIN, but rows from the right are kept even if they don't match any rows on the left. NULL values are generated on the left for these rows.  | 
|  FULL OUTER JOIN (or just FULL JOIN)  |  As INNER JOIN, but rows from both sides are kept even if they don't match any rows on the other side. NULL values are generated on the other side for these rows.  | 
|  CROSS JOIN  |  Returns the Cartesian product of the inputs: Every row from the left is paired with every row from the right.  | 

### Time-Based Window vs. Row-Based Window Joins
<a name="sqlrf-join-stream-time"></a>

It isn't practical to join the entire history of the left stream to the entire history of the right. Therefore, you must restrict at least one stream to a time window by using an OVER clause. The OVER clause defines a window of rows that are to be considered for joining at a given time.

The window can be time-based or row-based:
+ A time-based window uses the RANGE keyword. It defines the window as the set of rows whose ROWTIME column falls within a particular time interval of the query's current time.

  For example, the following clause specifies that the window contains all rows whose ROWTIMEs are within the hour preceding the stream's current time:

  ```
  OVER (RANGE INTERVAL '1' HOUR PRECEDING)
  ```
+ A row-based window uses the ROWS keyword. It defines the window as a given count of rows before or after the row with the current time stamp.

  For example, the following clause specifies that only the latest 10 rows be included in the window:

  ```
  OVER (ROWS 10 PRECEDING)
  ```

**Note**  
If no time window or row-based window is specified on the side of a join, then only the current row from that side participates in the join evaluation.

### Examples of Stream-to-Stream Joins
<a name="sqlrf-join-stream-examples"></a>

The following examples demonstrate how an in-application stream-to-stream join works, when the results of the join are returned, and what the row times of the join results are.

**Topics**
+ [Example Dataset](#sqlrf-join-stream-examples-dataset)
+ [Example 1: Time Window on One Side of a JOIN (INNER JOIN)](#sqlrf-join-stream-examples-1)
+ [Example 2: Time Windows on Both Sides of a JOIN (INNER JOIN)](#sqlrf-join-stream-examples-2)
+ [Example 3: Time Window on One Side of a RIGHT JOIN (RIGHT OUTER JOIN)](#sqlrf-join-stream-examples-3)
+ [Example 4: Time Windows on Both Sides of a RIGHT JOIN (RIGHT OUTER JOIN)](#sqlrf-join-stream-examples-4)
+ [Example 5: Time Window on One Side of a LEFT JOIN (LEFT OUTER JOIN)](#sqlrf-join-stream-examples-5)
+ [Example 6: Time Windows on Both Sides of a LEFT JOIN (LEFT OUTER JOIN)](#sqlrf-join-stream-examples-6)
+ [Summary](#sqlrf-join-stream-examples-summary)

#### Example Dataset
<a name="sqlrf-join-stream-examples-dataset"></a>

The examples in this section are based on the following datasets and stream definitions:

##### Sample of Orders Data
<a name="sqlrf-join-stream-examples-dataset-orders"></a>

```
{
   "orderid":"101",
   "orders":"1"
}
```

##### Sample of Shipments Data
<a name="sqlrf-join-stream-examples-dataset-shipments"></a>

```
{
   "orderid":"101",
   "shipments":"2"
}
```

##### Creating the ORDERS\_STREAM In-Application Stream
<a name="sqlrf-join-stream-examples-dataset-create-orders"></a>

```
CREATE OR REPLACE STREAM "ORDERS_STREAM" ("orderid" int, "orderrowtime" timestamp);
CREATE OR REPLACE PUMP "ORDERS_STREAM_PUMP" AS INSERT INTO "ORDERS_STREAM"
SELECT STREAM "orderid", "ROWTIME"
FROM "SOURCE_SQL_STREAM_001" WHERE "orders" = 1;
```

##### Creating the SHIPMENTS\_STREAM In-Application Stream
<a name="sqlrf-join-stream-examples-dataset-create-shipments"></a>

```
CREATE OR REPLACE STREAM "SHIPMENTS_STREAM" ("orderid" int, "shipmentrowtime" timestamp);
CREATE OR REPLACE PUMP "SHIPMENTS_STREAM_PUMP" AS INSERT INTO "SHIPMENTS_STREAM"
SELECT STREAM "orderid", "ROWTIME"
FROM "SOURCE_SQL_STREAM_001" WHERE "shipments" = 2;
```

#### Example 1: Time Window on One Side of a JOIN (INNER JOIN)
<a name="sqlrf-join-stream-examples-1"></a>

This example demonstrates a query that returns all orders with shipments that executed in the last minute.

##### Join Query
<a name="sqlrf-join-stream-examples-1-query"></a>

```
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("resultrowtime" timestamp, "orderid" int, "shipmenttime" timestamp, "ordertime" timestamp);
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS 
INSERT INTO "OUTPUT_STREAM"
  SELECT STREAM ROWTIME as "resultrowtime", s."orderid", s.rowtime as "shipmenttime", o.ROWTIME as "ordertime"
    FROM ORDERS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS o
      JOIN SHIPMENTS_STREAM AS s
        ON o."orderid" = s."orderid";
```

##### Query Results
<a name="sqlrf-join-stream-examples-1-results"></a>


| ORDERS\_STREAM | SHIPMENTS\_STREAM | OUTPUT\_STREAM | 
| --- | --- | --- | 
| ROWTIME | orderid | ROWTIME | orderid | resultrowtime | orderid | shipmenttime | ordertime | 
| 10:00:00 | 101 | 10:00:00 | 100 |  |  |  |  | 
| 10:00:20 | 102 |  |  |  |  |  |  | 
| 10:00:30 | 103 |  |  |  |  |  |  | 
| 10:00:40 | 104 |  |  |  |  |  |  | 
|  |  | 10:00:45 | 104 |  |  |  |  | 
| 10:00:45 | 100\* |  |  | 10:00:45 | 104 | 10:00:45 | 10:00:40 | 
|  |  | 10:00:50 | 105 |  |  |  |  | 

\* - Record with orderid = 100 is a late event in the Orders stream.

##### Visual Representation of the Join
<a name="sqlrf-join-stream-examples-1-graph"></a>

The following diagram represents a query that returns all orders with shipments that executed in the last minute.

![\[Diagram of the join between all orders (orders_stream) and the shipments (shipments_stream) that occurred in the last minute.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/select-join-ex1-graph.png)


##### Triggering of Results
<a name="sqlrf-join-stream-examples-1-return"></a>

The following describes the events that trigger results from the query.
+ Because no time or row window is specified on the Shipments stream, only the current row of the Shipments stream participates in the join.
+ Because the query on the Orders stream specifies a one-minute preceding window, the rows in the Orders stream with a ROWTIME in the last minute participate in the join.
+ When the record in the Shipments stream arrived at 10:00:45 for orderid 104, the JOIN result was triggered because there is a match on orderid in the Orders stream in the preceding minute.
+ The record in the Orders stream with orderid 100 arrived late, so the corresponding record in the Shipments stream was not the latest record. Because no window was specified on the Shipments stream, only the current row of the Shipments stream participates in the join. As a result, no records are returned by the JOIN statement for orderid 100. For information about including late rows in a JOIN statement, see [Example 2](#sqlrf-join-stream-examples-2).
+ Because there is no matching record in the Shipments stream for orderid 105, no results are emitted, and the record is ignored.

##### ROWTIMES of Results
<a name="sqlrf-join-stream-examples-1-rowtimes"></a>
+ The ROWTIME of the record in the output stream is the later of the ROWTIMEs of the rows that matched the join.

#### Example 2: Time Windows on Both Sides of a JOIN (INNER JOIN)
<a name="sqlrf-join-stream-examples-2"></a>

This example demonstrates a query that returns all orders that executed in the last minute, with shipments that executed in the last minute.

##### Join Query
<a name="sqlrf-join-stream-examples-2-query"></a>

```
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("resultrowtime" timestamp, "orderid" int, "shipmenttime" timestamp, "ordertime" timestamp);
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS INSERT INTO "OUTPUT_STREAM"
  SELECT STREAM ROWTIME as "resultrowtime", s."orderid", s.rowtime as "shipmenttime", o.ROWTIME as "ordertime"
    FROM ORDERS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS o
      JOIN SHIPMENTS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS s
        ON o."orderid" = s."orderid";
```

##### Query Results
<a name="sqlrf-join-stream-examples-2-results"></a>


| ORDERS\_STREAM | SHIPMENTS\_STREAM | OUTPUT\_STREAM | 
| --- | --- | --- | 
| ROWTIME | orderid | ROWTIME | orderid | resultrowtime | orderid | shipmenttime | ordertime | 
| 10:00:00 | 101 | 10:00:00 | 100 |  |  |  |  | 
| 10:00:20 | 102 |  |  |  |  |  |  | 
| 10:00:30 | 103 |  |  |  |  |  |  | 
| 10:00:40 | 104 |  |  |  |  |  |  | 
|  |  | 10:00:45 | 104 |  |  |  |  | 
| 10:00:45 | 100\* |  |  | 10:00:45 | 104 | 10:00:45 | 10:00:40 | 
|  |  |  |  | 10:00:45 | 100 | 10:00:00 | 10:00:45 | 
|  |  | 10:00:50 | 105 |  |  |  |  | 

\* - Record with orderid = 100 is a late event in the Orders stream.

##### Visual Representation of the Join
<a name="sqlrf-join-stream-examples-2-graph"></a>

The following diagram represents a query that returns all orders that executed in the last minute, with shipments that executed in the last minute.

![\[Diagram of the join between all orders occurring in last minute (orders_stream) and the shipments occurring in last minute (shipments_stream).\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/select-join-ex2-graph.png)


##### Triggering of Results
<a name="sqlrf-join-stream-examples-2-return"></a>

The following describes the events that trigger results from the query.
+ Windows are specified on both sides of the join. So all the rows in the minute preceding the current row of both the Orders stream and the Shipments stream participate in the join.
+ When the record in the Shipments stream for orderid 104 arrived, the corresponding record in the Orders stream was within the one-minute window. So a record was returned to the Output stream.
+ Even though the order event for orderid 100 arrived late in the Orders stream, the join result was returned. This is because the window in the Shipments stream includes the past minute of orders, which includes the corresponding record.
+ Having a window on both sides of the join is helpful for including late-arriving records on either side of the join; for example, if an order or shipment record is received late or out of order. 

##### ROWTIMEs of Results
<a name="sqlrf-join-stream-examples-2-rowtimes"></a>
+ The ROWTIME of the record in the output stream is the later of the ROWTIMEs of the rows that matched the join.

#### Example 3: Time Window on One Side of a RIGHT JOIN (RIGHT OUTER JOIN)
<a name="sqlrf-join-stream-examples-3"></a>

This example demonstrates a query that returns all shipments that executed in the last minute, whether or not there are corresponding orders in the last minute.

##### Join Query
<a name="sqlrf-join-stream-examples-3-query"></a>

```
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("resultrowtime" timestamp, "orderid" int, "ordertime" timestamp);
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS INSERT INTO "OUTPUT_STREAM"
  SELECT STREAM ROWTIME as "resultrowtime", s."orderid", o.ROWTIME as "ordertime"
    FROM ORDERS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS o
      RIGHT JOIN SHIPMENTS_STREAM AS s
        ON o."orderid" = s."orderid";
```

##### Query Results
<a name="sqlrf-join-stream-examples-3-results"></a>


| ORDERS\_STREAM | SHIPMENTS\_STREAM | OUTPUT\_STREAM | 
| --- | --- | --- | 
| ROWTIME | orderid | ROWTIME | orderid | resultrowtime | orderid | ordertime | 
| 10:00:00 | 101 | 10:00:00 | 100 |  |  |  | 
|  |  |  |  | 10:00:00 | 100 | null | 
| 10:00:20 | 102 |  |  |  |  |  | 
| 10:00:30 | 103 |  |  |  |  |  | 
| 10:00:40 | 104 |  |  |  |  |  | 
|  |  | 10:00:45 | 104 |  |  |  | 
| 10:00:45 | 100\* |  |  | 10:00:45 | 104 | 10:00:40 | 
|  |  | 10:00:50 | 105 |  |  |  | 
|  |  |  |  | 10:00:50 | 105 | null | 

\* - Record with orderid = 100 is a late event in the Orders stream.

##### Visual Representation of the Join
<a name="sqlrf-join-stream-examples-3-graph"></a>

The following diagram represents a query that returns all shipments that executed in the last minute, whether or not there are corresponding orders in the last minute.

![\[Diagram of a query returning all shipments (shipments_stream) occurring in the last minute, whether or not there are corresponding orders (orders_stream) in the last minute\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/select-join-ex3-graph.png)


##### Triggering of Results
<a name="sqlrf-join-stream-examples-3-return"></a>

The following describes the events that trigger results from the query.
+ When a record in the Shipments stream arrived for orderid 104, a result in the Output stream was emitted.
+ As soon as the record in the Shipments stream arrived for orderid 105, a record was emitted in the Output stream. However, there is no matching record in the Orders stream, so the OrderTime value is null.

##### ROWTIMEs of Results
<a name="sqlrf-join-stream-examples-3-rowtimes"></a>
+ The ROWTIME of the record in the output stream is the later of the ROWTIMEs of the rows that matched the join.
+ Because the right side of the join (the Shipments stream) has no window, the ROWTIME of the result with an unmatched join is the ROWTIME of the unmatched row.

#### Example 4: Time Windows on Both Sides of a RIGHT JOIN (RIGHT OUTER JOIN)
<a name="sqlrf-join-stream-examples-4"></a>

This example demonstrates a query that returns all shipments that executed in the last minute, whether or not they have corresponding orders.

##### Join Query
<a name="sqlrf-join-stream-examples-4-query"></a>

```
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("resultrowtime" timestamp, "orderid" int, "shipmenttime" timestamp, "ordertime" timestamp);
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS INSERT INTO "OUTPUT_STREAM"
  SELECT STREAM ROWTIME as "resultrowtime", s."orderid", s.ROWTIME as "shipmenttime", o.ROWTIME as "ordertime"
    FROM ORDERS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS o
      RIGHT JOIN SHIPMENTS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS s
        ON o."orderid" = s."orderid";
```

##### Query Results
<a name="sqlrf-join-stream-examples-4-results"></a>


| ORDERS\_STREAM | SHIPMENTS\_STREAM | OUTPUT\_STREAM | 
| --- | --- | --- | 
| ROWTIME | orderid | ROWTIME | orderid | resultrowtime | orderid | shipmenttime | ordertime | 
| 10:00:00 | 101 | 10:00:00 | 100 |  |  |  |  | 
| 10:00:20 | 102 |  |  |  |  |  |  | 
| 10:00:30 | 103 |  |  |  |  |  |  | 
| 10:00:40 | 104 |  |  |  |  |  |  | 
|  |  | 10:00:45 | 104 |  |  |  |  | 
| 10:00:45 | 100\* |  |  | 10:00:45 | 104 | 10:00:45 | 10:00:40 | 
|  |  |  |  | 10:00:45 | 100 | 10:00:00 | 10:00:45 | 
|  |  | 10:00:50 | 105 |  |  |  |  | 
|  |  |  |  | 10:01:50 | 105 | 10:00:50 | null | 

\* - Record with orderid = 100 is a late event in the Orders stream.

##### Visual Representation of the Join
<a name="sqlrf-join-stream-examples-4-graph"></a>

The following diagram represents a query that returns all shipments that executed in the last minute, whether or not they have corresponding orders.

![\[Diagram of a query returning all shipments (shipments_stream) in the last minute, whether or not there are corresponding orders (orders_stream).\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/select-join-ex4-graph.png)


##### Triggering of Results
<a name="sqlrf-join-stream-examples-4-return"></a>

The following describes the events that trigger results from the query.
+ When a record in the Shipments stream arrived for orderid 104, a result in the Output stream was emitted.
+ Even though the order event for orderid 100 arrived late in the Orders stream, the join result is returned. This is because the window in the Shipments stream includes the past minute of orders, which includes the corresponding record.
+ For the shipment for which the order is not found (for orderid 105), the result is not emitted to the Output stream until the end of the one-minute window on the Shipments stream.

##### ROWTIMEs of Results
<a name="sqlrf-join-stream-examples-4-rowtimes"></a>
+ The ROWTIME of the record in the output stream is the later of the ROWTIMEs of the rows that matched the join.
+ For shipment records with no matching order record, the ROWTIME of the result is the ROWTIME of the end of the window. This is because the right side of the join (from the Shipments stream) is now a one-minute window of events, and the service is waiting for the end of the window to determine whether any matching records arrive. When the window ends and no matching records are found, the result is emitted with a ROWTIME corresponding to the end of the window.

#### Example 5: Time Window on One Side of a LEFT JOIN (LEFT OUTER JOIN)
<a name="sqlrf-join-stream-examples-5"></a>

This example demonstrates a query that returns all orders that executed in the last minute, whether or not there are corresponding shipments in the last minute.

##### Join Query
<a name="sqlrf-join-stream-examples-5-query"></a>

```
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("resultrowtime" timestamp, "orderid" int, "ordertime" timestamp);
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS INSERT INTO "OUTPUT_STREAM"
  SELECT STREAM ROWTIME as "resultrowtime", o."orderid", o.ROWTIME as "ordertime"
    FROM ORDERS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS o
      LEFT JOIN SHIPMENTS_STREAM AS s
        ON o."orderid" = s."orderid";
```

##### Query Results
<a name="sqlrf-join-stream-examples-5-results"></a>


| ORDERS\_STREAM | SHIPMENTS\_STREAM | OUTPUT\_STREAM | 
| --- | --- | --- | 
| ROWTIME | orderid | ROWTIME | orderid | resultrowtime | orderid | ordertime | 
| 10:00:00 | 101 | 10:00:00 | 100 |  |  |  | 
| 10:00:20 | 102 |  |  |  |  |  | 
| 10:00:30 | 103 |  |  |  |  |  | 
| 10:00:40 | 104 |  |  |  |  |  | 
|  |  | 10:00:45 | 104 |  |  |  | 
| 10:00:45 | 100\* |  |  | 10:00:45 | 104 | 10:00:40 | 
|  |  | 10:00:50 | 105 |  |  |  | 
|  |  |  |  | 10:01:00 | 101 | 10:00:00 | 
|  |  |  |  | 10:01:20 | 102 | 10:00:20 | 
|  |  |  |  | 10:01:30 | 103 | 10:00:30 | 
|  |  |  |  | 10:01:40 | 104 | 10:00:40 | 
|  |  |  |  | 10:01:45 | 100 | 10:00:45 | 

\* - Record with orderid = 100 is a late event in the Orders stream.

##### Visual Representation of the Join
<a name="sqlrf-join-stream-examples-5-graph"></a>

The following diagram represents a query that returns all orders that executed in the last minute, whether or not there are corresponding shipments in the last minute.

![\[Diagram of a query returning all orders (orders_stream) that executed in the last minute, whether or not there are corresponding shipments (shipments_stream) in the last minute.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/select-join-ex5-graph.png)


##### Triggering of Results
<a name="sqlrf-join-stream-examples-5-return"></a>

The following describes the events that trigger results from the query.
+ When a record in the Shipments stream arrived for orderid 104, a result in the Output stream is emitted.
+ For records in the Orders stream with no corresponding record in the Shipments stream, records are not emitted to the Output stream until the end of the one-minute window. This is because the service is waiting until the end of the window for matching records.

##### ROWTIMEs of Results
<a name="sqlrf-join-stream-examples-5-rowtimes"></a>
+ The ROWTIME of the record in the output stream is the later of the ROWTIMEs of the rows that matched the join.
+ For records in the Orders stream with no corresponding record in the Shipments stream, the ROWTIMEs of the results are the ROWTIMEs of the end of the current window.

#### Example 6: Time Windows on Both Sides of a LEFT JOIN (LEFT OUTER JOIN)
<a name="sqlrf-join-stream-examples-6"></a>

This example demonstrates a query that returns all orders that executed in the last minute, whether or not they have corresponding shipments.

##### Join Query
<a name="sqlrf-join-stream-examples-6-query"></a>

```
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("resultrowtime" timestamp, "orderid" int, "shipmenttime" timestamp, "ordertime" timestamp);
CREATE OR REPLACE PUMP "OUTPUT_STREAM_PUMP" AS INSERT INTO "OUTPUT_STREAM"
  SELECT STREAM ROWTIME as "resultrowtime", o."orderid", s.ROWTIME as "shipmenttime", o.ROWTIME as "ordertime"
    FROM ORDERS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS o
      LEFT JOIN SHIPMENTS_STREAM OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS s
        ON o."orderid" = s."orderid";
```

##### Query Results
<a name="sqlrf-join-stream-examples-6-results"></a>


| ORDERS\_STREAM | SHIPMENTS\_STREAM | OUTPUT\_STREAM | 
| --- | --- | --- | 
| ROWTIME | orderid | ROWTIME | orderid | resultrowtime | orderid | shipmenttime | ordertime | 
| 10:00:00 | 101 | 10:00:00 | 100 |  |  |  |  | 
| 10:00:20 | 102 |  |  |  |  |  |  | 
| 10:00:30 | 103 |  |  |  |  |  |  | 
| 10:00:40 | 104 |  |  |  |  |  |  | 
|  |  | 10:00:45 | 104 |  |  |  |  | 
| 10:00:45 | 100\* |  |  | 10:00:45 | 104 | 10:00:45 | 10:00:40 | 
|  |  | 10:00:50 | 105 | 10:00:45 | 100 | 10:00:00 | 10:00:45 | 
|  |  |  |  | 10:01:00 | 101 | null | 10:00:00 | 
|  |  |  |  | 10:01:20 | 102 | null | 10:00:20 | 
|  |  |  |  | 10:01:30 | 103 | null | 10:00:30 | 
|  |  |  |  | 10:01:40 | 104 | null | 10:00:40 | 
|  |  |  |  | 10:01:45 | 100 | null | 10:00:45 | 

\* - Record with orderid = 100 is a late event in the Orders stream.

##### Visual Representation of the Join
<a name="sqlrf-join-stream-examples-6-graph"></a>

The following diagram represents a query that returns all orders that executed in the last minute, whether or not they have corresponding shipments.

![\[Diagram of a query returning all orders (orders_stream) that executed in the last minute, whether or not they have corresponding shipments (shipments_stream).\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/select-join-ex6-graph.png)


##### Triggering of Results
<a name="sqlrf-join-stream-examples-6-return"></a>

The following describes the events that trigger results from the query.
+ When a record in the Shipments stream arrived for orderids 104 and 100, a result in the Output stream was emitted. This occurred even though the record in the Orders stream for orderid 100 arrived late.
+ Records in the Orders stream with no corresponding record in the Shipments stream are emitted in the Output stream at the end of the one-minute window. This is because the service waits until the end of the window for corresponding records in the Shipments stream.

##### ROWTIMEs of Results
<a name="sqlrf-join-stream-examples-6-rowtimes"></a>
+ The ROWTIME of the record in the Output stream is the later of the ROWTIMEs of the rows that matched the join.
+ For records in the Orders stream with no corresponding record in the Shipments stream, the ROWTIME of the result is the ROWTIME corresponding to the end of the window.

#### Summary
<a name="sqlrf-join-stream-examples-summary"></a>
+ Kinesis Data Analytics always returns rows from joins in ascending order of ROWTIME.
+ For an inner join, the ROWTIME of an output row is the later of the ROWTIMEs of the two input rows. This is also true for an outer join for which matching input rows are found.
+ For outer joins for which a match is not found, the ROWTIME of an output row is the later of the following two times:
  + The ROWTIME of the input row for which a match was not found.
  + The later bound of the window of the other input stream at the point any possible match could have been found.

## Stream-to-Table Joins
<a name="sqlrf-join-stream-to-table"></a>

If one of the relations is a stream and the other is a finite relation, it is referred to as a stream-table join. For each row in the stream, the query looks up the row or rows in the table that match the join condition. 

For example, Orders is a stream and PriceList is a table. The effect of the join is to add price list information to the order.

For information about creating a reference data source and joining a stream to a reference table, see [Example: Add Reference Data Source](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/app-add-reference-data.html) in the *Amazon Kinesis Data Analytics Developer Guide*.
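The price-list pattern described above can be sketched as follows. This is a minimal sketch, not from this guide: it assumes a reference table named `PRICE_LIST` (with columns `prodid` and `price`) has already been added to the application as a reference data source, and that the orders stream carries a `prodid` column; all of these names are illustrative.

```
CREATE OR REPLACE STREAM "PRICED_ORDERS_STREAM" ("orderid" int, "prodid" int, "price" decimal(8,2));
CREATE OR REPLACE PUMP "PRICED_ORDERS_PUMP" AS INSERT INTO "PRICED_ORDERS_STREAM"
  SELECT STREAM o."orderid", o."prodid", p."price"
    FROM "ORDERS_STREAM" AS o
      JOIN "PRICE_LIST" AS p
        ON o."prodid" = p."prodid";
```

Each row arriving on the stream is looked up against the table; no window is needed on the table side of the join because the table is a finite relation.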

# HAVING clause
<a name="sql-reference-having-clause"></a>

The HAVING clause in a SELECT specifies a condition to apply within a group or aggregate. In other words, HAVING filters rows after the aggregation of the GROUP BY clause has been applied. Since HAVING is evaluated after GROUP BY, it can only reference expressions constructed (or derivable) from grouping keys, aggregate expressions, and constants. (These are the same rules that apply to expressions in the SELECT clause of a GROUP BY query.) A HAVING clause must come after any GROUP BY clause and before any ORDER BY clause. HAVING is like [WHERE clause](sql-reference-where-clause.md), but applies to groups. Results from a HAVING clause represent groupings or aggregations of original rows, whereas results from a WHERE clause are individual original rows.

In non-streaming applications, if there is no GROUP BY clause, GROUP BY () is assumed (though since there are no grouping expressions, expressions can consist only of constants and aggregate expressions). In streaming queries, HAVING cannot be used without a GROUP BY clause.

WHERE and HAVING can both appear in a single SELECT statement. The WHERE selects from the stream or table those individual rows that satisfy its condition (the WHERE-condition). The GROUP BY criteria apply only to the rows selected by the WHERE condition.

Such a grouping, for example "GROUP BY CustomerID", can be further qualified by a HAVING condition, which then selects aggregations of rows satisfying its condition within the specified grouping. For example, "GROUP BY ClientID HAVING SUM(ShipmentValue) > 3600" would select only those clients whose shipments that fit the WHERE criteria also had values adding up to more than 3600.
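Put together, the clauses described above might look like the following sketch. The stream and columns (`ShipmentStream`, `ClientID`, `ShipmentValue`, `region`) are hypothetical, and an hourly ROWTIME expression is included because streaming GROUP BY requires a time-based or monotonic grouping expression:

```
SELECT STREAM "ClientID", SUM("ShipmentValue") AS "totalValue"
FROM "ShipmentStream"
WHERE "region" = 'EMEA'    -- WHERE filters individual rows before grouping
GROUP BY FLOOR("ShipmentStream".ROWTIME TO HOUR), "ClientID"
HAVING SUM("ShipmentValue") > 3600;    -- HAVING then filters the groups
```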

See the [WHERE clause](sql-reference-where-clause.md) syntax chart for conditions; the same condition syntax applies to both HAVING and WHERE clauses.

The condition must be a Boolean predicate expression. The query returns only rows for which the predicate evaluates to TRUE.

The example below shows a streaming query that displays products for which there are more than \$1,000 of orders in the past hour.

```
SELECT STREAM "prodId"
FROM "Orders"
GROUP BY FLOOR("Orders".ROWTIME TO HOUR), "prodId"
HAVING SUM("quantity" * "price") > 1000;
```

# GROUP BY clause
<a name="sql-reference-group-by-clause"></a>

## Syntax Chart for the GROUP BY Clause
<a name="sqlrf-groupbyclause-syntax-chart"></a>

(To see where this clause fits, see [SELECT statement](sql-reference-select.md))

For example, GROUP BY <column name-or-expression>, where:
+ the expression can be an aggregate; and,
+ any column name used in the GROUP BY clause must also be in the SELECT statement.

Additionally, a column that is not named in or derivable from the GROUP BY clause cannot appear in the SELECT statement except within aggregations, such as SUM (allOrdersValue).

What derivable means is that a column specified in the GROUP BY clause enables access to the column you want to include in the SELECT clause. If a column is derivable, the SELECT statement can specify it even though it is not explicitly named in the GROUP BY clause.

Example: If the key to a table is in the GROUP BY clause, then any of that table's columns can appear in the select-list because, given that key, such columns are considered accessible.

The GROUP BY clause groups selected rows based on the value of the grouping expressions, returning a single summary row of information for each group of rows that have identical values in all columns.

Note that for these purposes, the value NULL is considered equal to itself and not equal to any other value. These are the same semantics as for the IS NOT DISTINCT FROM operator.

## Streaming GROUP BY
<a name="sql-reference-group-by-streaming"></a>

GROUP BY can be used in a streaming query as long as one of the grouping expressions is a non-constant monotonic or time-based expression. This requirement is necessary in order for Amazon Kinesis Data Analytics to make progress, as explained below.

A monotonic expression is one that always moves in the same direction: it either ascends-or-stays-the-same, or it descends-or-stays-the-same; it doesn't reverse direction. It does not need to be strictly ascending or strictly descending, that is, every value always above the previous one or every value always below the previous one. A constant expression falls under the definition of monotonic -- it is technically both ascending and descending -- but is clearly unsuitable for these purposes. For more information about monotonicity, see [Monotonic Expressions and Operators](sql-reference-monotonic-expressions-operators.md).

Consider the following query:

```
SELECT STREAM prodId, COUNT(*)
FROM Orders
GROUP BY prodId
```

The query is intended to compute the number of orders for each product, as a stream. However, since Orders is an infinite stream, Amazon Kinesis Data Analytics can never know that it has seen all orders for a given product, can never complete a particular row's total, and therefore can never output a row. Rather than allow a query that can never emit a row, the Amazon Kinesis Data Analytics validator rejects the query.

The syntax for streaming GROUP BY is as follows:

`GROUP BY <monotonic or time-based expression> ,`

`<column name-or-expression, ...>`



where any column name used in the GROUP BY clause must also be in the SELECT statement; the expression can be an aggregate. Additionally, a column name that does not appear in the GROUP BY clause cannot appear in the SELECT statement except within aggregations, or if, as above, access to the column can be derived from a column that you specify in the GROUP BY clause.

For example, the following query, which computes the product counts per hour, uses the monotonic expression FLOOR(Orders.ROWTIME TO HOUR) and is therefore valid:

```
SELECT STREAM FLOOR(Orders.ROWTIME TO HOUR) AS theHour, prodId, COUNT(*)
FROM Orders
GROUP BY FLOOR(Orders.ROWTIME TO HOUR), prodId
```

One of the expressions in the GROUP BY must be monotonic or time-based. For example, GROUP BY FLOOR(S.ROWTIME TO HOUR) yields one output row per hour for the previous hour's input rows. The GROUP BY can specify additional partitioning terms. For example, GROUP BY FLOOR(S.ROWTIME TO HOUR), USERID yields one output row per hour per USERID value. If you know for a fact that an expression is monotonic, you can declare it so by using the [Monotonic Function](sql-reference-monotonic.md). If the actual data are not monotonic, the resulting system behavior is indeterminate: results may not be as expected or desired.
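The hourly-partitioned form described above can be sketched as follows; the stream `S` and its `USERID` column are hypothetical:

```
SELECT STREAM FLOOR(S.ROWTIME TO HOUR) AS "theHour", USERID, COUNT(*) AS "eventCount"
FROM S
GROUP BY FLOOR(S.ROWTIME TO HOUR), USERID;
```

This emits one row per USERID value for each completed hour.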

See the topic [Monotonic Function](sql-reference-monotonic.md) in this guide for more details.

Duplicate rowtimes can occur in a stream, and as long as the ROWTIME value is the same, the GROUP BY operation will keep accumulating rows. In order to emit a row, the ROWTIME value has to change at some point.

# WHERE clause
<a name="sql-reference-where-clause"></a>

The WHERE clause extracts records that meet a specified condition. The condition can be a numeric or string comparison, or can use the BETWEEN, LIKE, or IN operators; see [Streaming SQL Operators](sql-reference-streaming-operators.md). Conditions can be combined using logical operators such as AND, OR, and NOT.
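For example, the following sketch combines a numeric comparison with BETWEEN, LIKE, and logical operators; the `Orders` columns shown (`price`, `prodName`, `quantity`) are hypothetical:

```
SELECT STREAM "orderid", "price"
FROM "Orders"
WHERE "price" BETWEEN 10 AND 100
  AND "prodName" LIKE 'A%'
  AND NOT ("quantity" = 0);
```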

The WHERE clause is like the [HAVING clause](sql-reference-having-clause.md), except that HAVING applies to groups: results from a WHERE clause are individual original rows, whereas results from a HAVING clause represent groupings or aggregations of original rows.

WHERE and HAVING can both appear in a single SELECT statement. The WHERE selects from the stream or table those individual rows that satisfy its condition (the WHERE-condition). The GROUP BY criteria apply only to the rows selected by the WHERE condition. Such a grouping, for example "GROUP BY CustomerID", can be further qualified by a HAVING-condition, which then selects aggregations of rows satisfying its condition within the specified grouping. For example, "GROUP BY ClientID HAVING SUM(ShipmentValue) > 3600" would select only those clients whose various shipments that fit the WHERE criteria also had values that added up to exceed 3600.

## Example
<a name="sql-reference-where-clause-example"></a>

To see where this clause fits into the SELECT statement, see [SELECT statement](sql-reference-select.md).

The condition must be a Boolean predicate expression. The query returns only rows for which the predicate evaluates to TRUE; if the condition evaluates to NULL, the row is not emitted.

The condition in the WHERE clause cannot contain windowed aggregation expressions, because if the where clause condition caused rows to be dropped, it would alter the contents of the window.

WHERE is also discussed in the topics [JOIN clause](sql-reference-join-clause.md) and [HAVING clause](sql-reference-having-clause.md) in this guide.

# WINDOW Clause (Sliding Windows)
<a name="sql-reference-window-clause"></a>

The `WINDOW` clause for a sliding windowed query specifies the rows over which analytic functions are computed across a group of rows in relation to the current row. These aggregate functions produce an output row aggregated by the keys in one or more columns for each input row. The `WINDOW` clause in a query specifies records in a stream partitioned by the time range interval or the number of rows, and an additional optional set of columns specified by the `PARTITION BY` clause. You can define named or inline window specifications that can be used in analytic functions and streaming `JOIN` clauses. For more information about analytic functions, see [Analytic Functions](sql-reference-analytic-functions.md).

Aggregate functions in a sliding window query are computed over the window specified in the `OVER` clause. The `OVER` clause can reference a named window specification or can be inline as part of the `SELECT` statement for a pump. The following examples show how to use the `OVER` clause to reference a named window specification and inline in the `SELECT` statement. 

## Syntax
<a name="w2aac20c15c30b7"></a>

```
[WINDOW window_name AS 
(
	{PARTITION BY partition_name 
		RANGE INTERVAL 'interval' {SECOND | MINUTE | HOUR} PRECEDING | 
		ROWS number PRECEDING
	, …}
)]
```

## OVER Clause
<a name="w2aac20c15c30b9"></a>

The examples following show you how to use the `OVER` clause to reference a named window specification.

**Example 1: OVER Referencing a Named Window Specification**

The following example shows an aggregate function that references the window specification with the name W1. In this example, the average price is calculated over the set of records specified by the `W1` window specification. To learn more about how to use the OVER clause with a window specification, see [Examples](#sql-reference-window-clause-examples), following.

```
AVG(price) OVER W1 AS avg_price 
```

**Example 2: OVER Referencing an Inline Window Specification**

 The following example shows an aggregate function that references an inline window specification. In this example, the average price is calculated over each input row with an inline window specification. To learn more about how to use the OVER clause with a window specification, see [Examples](#sql-reference-window-clause-examples), following. 

```
AVG(price) OVER (
    PARTITION BY ticker_symbol
    RANGE INTERVAL '1' HOUR PRECEDING) AS avg_price
```

For more information about aggregate functions and the OVER clause, see [Aggregate Functions](sql-reference-aggregate-functions.md).

## Parameters
<a name="w2aac20c15c30c11"></a>

*window-name*

Specifies a unique name that can be referenced from OVER clauses or subsequent window definitions. The name is used in analytic functions and streaming `JOIN` clauses. For more information about analytic functions, see [Analytic Functions](sql-reference-analytic-functions.md).

AS

Defines the named window specification for the `WINDOW` clause. 

PARTITION BY *partition-name*

Divides rows into groups that share the same values. After rows are partitioned, the window function computes over all rows that fall into the same partition as the current row.

RANGE INTERVAL *'interval'* \{SECOND \| MINUTE \| HOUR\} PRECEDING

Specifies the window boundaries as a time range interval. The window function computes over all rows that fall within the specified interval preceding the current row.

ROWS *number* PRECEDING

Specifies the window boundaries as a number of rows. The window function computes over the specified number of rows preceding, and including, the current row.

## Examples
<a name="sql-reference-window-clause-examples"></a>

### Example Dataset
<a name="w2aac20c15c30c13b3"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. To run each example, you need an Amazon Kinesis Analytics application that has the sample stock ticker input stream. To learn how to create an Analytics application and configure the sample stock ticker input stream, see [Getting Started](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. For additional samples, see [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html).

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Time-Based Sliding Window That References a Named Window Specification
<a name="w2aac20c15c30c13b5"></a>

This example defines a named window specification with a partition boundary of one minute preceding the current row. The `OVER` clause of the `SELECT` statement for the pump references the named window specification. 

```
WINDOW W1 AS (
    PARTITION BY ticker_symbol 
    RANGE INTERVAL '1' MINUTE PRECEDING);
```

To run this example, create the stock sample application and run and save the SQL code following.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4), 
    min_price     DOUBLE, 
    max_price     DOUBLE, 
    avg_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol,
        MIN(price) OVER W1 AS min_price,
        MAX(price) OVER W1 AS max_price,
        AVG(price) OVER W1 AS avg_price
    FROM "SOURCE_SQL_STREAM_001"
    WINDOW W1 AS (
        PARTITION BY ticker_symbol 
        RANGE INTERVAL '1' MINUTE PRECEDING);
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for rowtime, ticker symbol, and price information.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-analytic-functions-example-1.png)


### Example 2: Row-Based Sliding Window That References a Named Window Specification
<a name="w2aac20c15c30c13b7"></a>

This example defines two named window specifications, with partition boundaries of two rows and ten rows preceding the current row. The `OVER` clauses of the `SELECT` statement for the pump reference the named window specifications. 

```
 WINDOW
    last2rows AS (PARTITION BY ticker_symbol ROWS 2 PRECEDING),
    last10rows AS (PARTITION BY ticker_symbol ROWS 10 PRECEDING);
```

To run this example, create the stock sample application and run and save the SQL code following.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol      VARCHAR(4), 
    price              DOUBLE, 
    avg_last2rows      DOUBLE, 
    avg_Last10rows     DOUBLE);
CREATE OR REPLACE PUMP "myPump" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM ticker_symbol, 
    price, 
    AVG(price) OVER last2rows, 
    AVG(price) OVER last10rows
FROM SOURCE_SQL_STREAM_001
WINDOW
    last2rows AS (PARTITION BY ticker_symbol ROWS 2 PRECEDING),
    last10rows AS (PARTITION BY ticker_symbol ROWS 10 PRECEDING);
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock ticker symbols, prices, and average values for multiple rows.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-analytic-functions-example-2.png)


### Example 3: Time-Based Sliding Window with Inline Window Specification
<a name="w2aac20c15c30c13b9"></a>

This example defines an inline window specification with a partition boundary of one hour preceding the current row. The `OVER` clause of the `SELECT` statement for the pump uses the inline window specification.

To run this example, create the stock sample application and run and save the SQL code following.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price         DOUBLE,
    avg_price     DOUBLE);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, price, 
        AVG(Price) OVER (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' HOUR PRECEDING) AS avg_price
    FROM "SOURCE_SQL_STREAM_001";
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for timestamp, ticker symbol, price, and average price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-analytic-functions-example-3.png)


## Usage Notes
<a name="w2aac20c15c30c15"></a>

### 
<a name="w2aac20c15c30c15b2"></a>

For the WINDOW clause and endpoints, Amazon Kinesis Analytics SQL follows SQL-2008 standards for windows over a range. 

To include the endpoints of an hour, you can use the window syntax following.

```
WINDOW HOUR AS (RANGE INTERVAL '1' HOUR PRECEDING) 
```

To not include the endpoints of the previous hour, you can use the window syntax following.

```
WINDOW HOUR AS (RANGE INTERVAL '59:59.999' MINUTE TO SECOND(3) PRECEDING);
```

For more information, see [Allowed and Disallowed Window Specifications](sql-reference-allowed-disallowed-window.md).

## Related Topics
<a name="w2aac20c15c30c17"></a>
+ [Sliding Windows](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/sliding-window-concepts.html) in the Kinesis Developer Guide
+ [Aggregate Functions](sql-reference-aggregate-functions.md)
+ [SELECT statement](sql-reference-select.md)
+ [CREATE STREAM](sql-reference-create-stream.md) statement
+ [CREATE PUMP](sql-reference-create-pump.md) statement

# Allowed and Disallowed Window Specifications
<a name="sql-reference-allowed-disallowed-window"></a>

Amazon Kinesis Data Analytics supports nearly all windows that end with the current row. 

You cannot define an infinite window, a negative-sized window, or use negative integers in the window specification. Offset windows are currently unsupported.
+ Infinite windows are windows with no bounds. Typically these point into the future, which for streams is infinite. For example "ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING" is not supported, because in a streaming context such a query would not produce a result, since streams are continually expanding as new data arrives. All uses of UNBOUNDED FOLLOWING are unsupported.
+ Negative-sized windows. For example, "ROWS BETWEEN 0 PRECEDING AND 4 PRECEDING" is a window of negative size and is therefore illegal. Instead, you would use "ROWS BETWEEN 4 PRECEDING AND 0 PRECEDING" in this case.
+ Offset windows are windows that do not end with CURRENT ROW. These are not supported in the current release. For example, "ROWS BETWEEN UNBOUNDED PRECEDING AND 4 FOLLOWING" is not supported. (Window spans CURRENT ROW rather than starting or ending there.)
+ Windows defined with negative integers. For example,  "ROWS BETWEEN -4 PRECEDING AND CURRENT ROW" is invalid because negative integers are disallowed.

Also, the special case of ... 0 PRECEDING (and ... 0 FOLLOWING) cannot be used for windowed aggregation; instead, the synonym CURRENT ROW can be used. 
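For example, a sketch of a five-row aggregation window (the window name is illustrative) would use CURRENT ROW as the window's endpoint:

```
-- Instead of: ROWS BETWEEN 4 PRECEDING AND 0 PRECEDING
WINDOW lastFiveRows AS (ROWS BETWEEN 4 PRECEDING AND CURRENT ROW);
```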

For windowed aggregation, partitioned windows are allowed, but ORDER BY must not be present.

For windowed join, partitioned windows are NOT allowed, but ORDER BY can be present if it sorts by the ROWTIME column of one of the inputs.
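A minimal sketch of each case, reusing stream and column names from examples elsewhere in this guide:

```
-- Windowed aggregation: PARTITION BY is allowed, ORDER BY must not be present
SELECT STREAM ticker_symbol,
    AVG(price) OVER (
        PARTITION BY ticker_symbol
        RANGE INTERVAL '1' MINUTE PRECEDING) AS avg_price
FROM "SOURCE_SQL_STREAM_001";

-- Windowed join: no PARTITION BY in the join window
SELECT STREAM o.ROWTIME, o.orderId
FROM Orders AS o
    JOIN Shipments OVER (RANGE INTERVAL '1' HOUR PRECEDING) AS s
    ON o.orderId = s.orderId;
```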

# Window examples
<a name="sql-reference-window-examples"></a>

The following examples show a sample input data set, the definitions for several windows, and the contents of those windows at various times after 10:00, the time data starts to arrive for this example.

The windows are defined as follows:

```
SELECT STREAM
  ticker,
  sum(amount) OVER lastHour,
  count(*) OVER lastHour,
  sum(amount) OVER lastThree
FROM Trades
WINDOW
  lastHour AS (RANGE INTERVAL '1' HOUR PRECEDING),
  lastThree AS (ROWS 3 PRECEDING),
  lastZeroRows AS (ROWS CURRENT ROW),
  lastZeroSeconds AS (RANGE CURRENT ROW),
  lastTwoSameTicker AS (PARTITION BY ticker ROWS 2 PRECEDING),
  lastHourSameTicker AS (PARTITION BY ticker RANGE INTERVAL '1' HOUR PRECEDING)
```

## First Example: time-based windows versus row-based windows
<a name="sql-reference-window-examples-example-1"></a>

As shown on the right side of the figure below, the time-based lastHour window contains varying numbers of rows, because window membership is defined by time range.

![\[Time-based window examples showing ROWTIME, ticker, and amount columns with varying row counts.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/window-examples.png)


## Examples of windows containing rows
<a name="sql-reference-window-examples-example-rows"></a>

The row-based lastThree window generally contains four rows: the three preceding rows and the current row. However, for the 10:10 IBM row it contains only two rows, because no data arrived before 10:00.

A row-based window can contain several rows whose ROWTIME value is the same, though they arrive at different times (wall-clock times). The order of such a row in the row-based window depends on its arrival time; indeed, the row's arrival time can determine which window includes it.

For example, the middle lastThree window in Figure 1 shows the arrival of a YHOO trade with ROWTIME 11:15 (and the last three trades before it). However, this window excludes the next trade, for IBM, whose ROWTIME is also 11:15 but which must have arrived later than the YHOO trade. This 11:15 IBM trade is included in the 'next' window, as is the 11:15 YHOO trade, its immediate predecessor.

## Second Example: zero-width windows, row-based and time-based

Figure 2 (Examples of zero-width windows) shows row-based and time-based windows of zero width. The row-based window lastZeroRows includes just the current row, and therefore always contains precisely one row. Note that ROWS CURRENT ROW is equivalent to ROWS 0 PRECEDING.

The time-based window lastZeroSeconds contains all rows with the same timestamp, of which there may be several. Note that RANGE CURRENT ROW is equivalent to RANGE INTERVAL '0' SECOND PRECEDING.

![\[Examples of zero-width windows showing ROWTIME, ticker, and amount columns with time-based data.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/window-examples2.png)


## Third Example: Partitioning applied to row-based and time-based windows
<a name="sql-reference-window-examples-THIRDEXAMPLE"></a>

Figure 3 shows windows that are similar to those in Figure 1 but with a PARTITION BY clause. For the row-based window lastTwoSameTicker and the time-based window lastHourSameTicker, each window contains rows that meet the window criteria and have the same value of the ticker column. Note: Partitions are evaluated before windows.

![\[Table showing partitioned windows with ROWTIME, ticker, and amount columns for stock data.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/window-examples3.png)


# ORDER BY clause
<a name="sql-reference-order-by-clause"></a>

A streaming query can use ORDER BY if its leading expression is time-based and monotonic. For example, a streaming query whose leading expression is based on the ROWTIME column can use ORDER BY to do the following operations:
+ Sort the results of a streaming GROUP BY.
+ Sort a batch of rows arriving within a fixed time window of a stream.
+ Perform streaming ORDER BY on windowed-joins.

The "time-based and monotonic" requirement on the leading expression means that the query

```
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM DISTINCT ticker FROM trades ORDER BY ticker
```

will fail, but the query

```
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM DISTINCT rowtime, ticker FROM trades ORDER BY ROWTIME, ticker
```

will succeed.

**Note**  
The preceding examples use the `DISTINCT` clause to remove duplicate instances of the same ticker symbol from the result set, so that the results will be monotonic.

Streaming ORDER BY sorts rows using SQL-2008 compliant syntax for the ORDER BY clause. It can be combined with a UNION ALL statement, and can sort on expressions, such as:

```
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM x, y FROM t1
UNION ALL
SELECT STREAM a, b FROM t2 ORDER BY ROWTIME, MOD(x, 5)
```

The ORDER BY clause can specify ascending or descending sort order, and can use column ordinals, as well as ordinals specifying (referring to) the position of items in the select list.

**Note**  
The `UNION` statement in the preceding query collects records from two separate streams for ordering.
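As a sketch of descending order and ordinal references, based on the sample stock schema (the destination stream name is illustrative), the leading sort expression is still time-based and monotonic, referenced here by its select-list position:

```
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM FLOOR(ROWTIME TO MINUTE), ticker_symbol, SUM(price)
FROM "SOURCE_SQL_STREAM_001"
GROUP BY FLOOR(ROWTIME TO MINUTE), ticker_symbol
ORDER BY 1, SUM(price) DESC;   -- ordinal 1 refers to FLOOR(ROWTIME TO MINUTE)
```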
<a name="TOC32"></a>
**Streaming ORDER BY SQL Declarations**  
The streaming ORDER BY clause includes the following functional attributes:
+ Gathers rows until the monotonic expression in streaming ORDER BY clause does not change.
+ Does not require streaming GROUP BY clause in the same statement.
+ Can use any column with a basic SQL data type of TIMESTAMP, DATE, DECIMAL, INTEGER, FLOAT, CHAR, VARCHAR.
+ Does not require that columns/expressions in the ORDER BY clause be present in the SELECT list of the statement.
+ Applies all the standard SQL validation rules for ORDER BY clause.

The following query is an example of streaming ORDER BY:

```
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM state, city, SUM(amount)
FROM orders
GROUP BY FLOOR(ROWTIME TO HOUR), state, city
ORDER BY FLOOR(ROWTIME TO HOUR), state, SUM(amount)
```

# T-sorting Stream Input
<a name="sqlrf_T-sorting_Stream_Input"></a>

Amazon Kinesis Data Analytics real-time analytics use the fact that arriving data is ordered by ROWTIME. However, sometimes data arriving from multiple sources may not be time-synchronized.

While Amazon Kinesis Data Analytics can sort data from individual data sources that have been independently inserted into an Amazon Kinesis Data Analytics application's native stream, in some cases such data may already have been combined from multiple sources (for example, for efficient consumption at an earlier stage in processing). At other times, high-volume data sources could make direct insertion impossible.

In addition, an unreliable data source could block progress by forcing an Amazon Kinesis Data Analytics application to wait indefinitely, unable to proceed until all connected data sources deliver. In this case, data from this source could be unsynchronized.

You can use the ORDER BY clause to resolve these issues. Amazon Kinesis Data Analytics uses a sliding time-based window of incoming rows to reorder those rows by ROWTIME.

**Syntax**  
You specify the time-based parameter for sorting and the time-based window in which the streaming rows are to be time-sorted, using the following syntax:

```
  ORDER BY <timestamp_expr> WITHIN
      <interval_literal>
```

**Restrictions**  
The T-sort has the following restrictions:
+ The datatype of the ORDER BY expression must be timestamp.
+ The partially ordered expression `<timestamp_expr>` must be present in the select list of the query with the alias ROWTIME.
+ The leading expression of the ORDER BY clause must not contain the ROWTIME function and must not use the DESC keyword.
+ The ROWTIME column needs to be fully qualified. For example:
+ `ORDER BY FLOOR(ROWTIME TO MINUTE), ...` fails.
+ `ORDER BY FLOOR(s.ROWTIME TO MINUTE), ...` works.

If any of these requirements are not met, the statement will fail with errors.

Additional notes:
+ You cannot use incoming rowtime bounds. These are ignored by the system.
+ If `<timestamp_expr>` evaluates to NULL, the corresponding row is discarded.
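Putting the syntax and restrictions together, a t-sort might look like the following sketch. The `SORTED_SQL_STREAM` destination and the source stream's `event_time` timestamp column are assumptions for illustration only:

```
CREATE OR REPLACE PUMP "TSORT_PUMP" AS INSERT INTO "SORTED_SQL_STREAM"
SELECT STREAM s."event_time" AS ROWTIME,   -- partially ordered timestamp, aliased as ROWTIME
    s.ticker_symbol,
    s.price
FROM "SOURCE_SQL_STREAM_001" AS s
ORDER BY s."event_time" WITHIN INTERVAL '10' SECOND;   -- fully qualified, timestamp-typed
```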

# ROWTIME
<a name="sql-reference-rowtime"></a>

ROWTIME is an operator and system column that returns the time at which a particular row of a stream was created. 

It is used in four distinct ways:
+ As an operator
+ As a system column of a stream
+ As a column alias, to override the timestamp on the current row
+ As an ordinary column in a table

For more details, see the topics Timestamp, ROWTIME, and [CURRENT\_ROW\_TIMESTAMP](sql-reference-current-row-timestamp.md) in this guide.

**ROWTIME operator**  
When used in the SELECT clause of a streaming query without being qualified by a preceding alias, ROWTIME is an operator that evaluates to the timestamp of the row that is about to be generated.

Its type is always TIMESTAMP NOT NULL.

**ROWTIME system column**  
Every stream has a ROWTIME column. To reference this column from within a query, qualify it with the stream name (or alias). For example, the following join query returns three timestamp columns: the system columns of its input streams, and the timestamp of the generated row.

```
SELECT STREAM
  o.ROWTIME AS leftRowtime,
  s.ROWTIME AS rightRowtime,
  ROWTIME AS joinRowtime
FROM Orders AS o
  JOIN Shipments OVER (RANGE INTERVAL '1' HOUR FOLLOWING) AS s
  ON o.orderId = s.orderId

leftRowtime         rightRowtime        joinRowtime
=================== =================== ===================
2008-02-20 10:15:00 2008-02-20 10:30:00 2008-02-20 10:15:00
2008-02-20 10:25:00 2008-02-20 11:15:00 2008-02-20 10:25:00
2008-02-20 10:25:30 2008-02-20 11:05:00 2008-02-20 10:25:30
```



As it happens, leftRowtime is always equal to joinRowtime, because the join is specified such that the output row timestamp is always equal to the ROWTIME column from the Orders stream.

It follows that every streaming query has a ROWTIME column. However, the ROWTIME column is not returned from a top-level JDBC query unless you explicitly include it in the SELECT clause. For example:

```
CREATE STREAM Orders(
  "orderId" INTEGER NOT NULL,
  "custId" INTEGER NOT NULL);
SELECT columnName
FROM ALL_STREAMS;

columnName
==========
orderId
custId

SELECT STREAM *
FROM Orders;

orderId custId
======= ======
    100    501
    101     22
    102    699

SELECT STREAM ROWTIME, *
FROM Orders;

ROWTIME             orderId custId
=================== ======= ======
2008-02-20 10:15:00     100    501
2008-02-20 10:25:00     101     22
2008-02-20 10:25:30     102    699
```



This is mainly to ensure compatibility with JDBC: the stream Orders declares two columns, so it makes sense that "SELECT STREAM \*" should return two columns.