

For similar capabilities to Amazon Timestream for LiveAnalytics, consider Amazon Timestream for InfluxDB. It offers simplified data ingestion and single-digit millisecond query response times for real-time analytics. Learn more [here](https://docs.aws.amazon.com//timestream/latest/developerguide/timestream-for-influxdb.html).

# Built-in time series functionality
<a name="timeseries-specific-constructs"></a>

Timestream for LiveAnalytics provides built-in time series functionality that treat time series data as a first class concept.

Built-in time series functionality can be divided into two categories: views and functions.

You can read about each construct below.

**Topics**
+ [Timeseries views](timeseries-specific-constructs.views.md)
+ [Time series functions](timeseries-specific-constructs.functions.md)

# Timeseries views
<a name="timeseries-specific-constructs.views"></a>

Timestream for LiveAnalytics supports the following functions for transforming your data to the `timeseries` data type:

**Topics**
+ [CREATE\$1TIME\$1SERIES](#timeseries-specific-constructs.views.CREATE_TIME_SERIES)
+ [UNNEST](#timeseries-specific-constructs.views.UNNEST)

## CREATE\$1TIME\$1SERIES
<a name="timeseries-specific-constructs.views.CREATE_TIME_SERIES"></a>

 **CREATE\$1TIME\$1SERIES** is an aggregation function that takes all the raw measurements of a time series (time and measure values) and returns a timeseries data type. The syntax of this function is as follows: 

```
CREATE_TIME_SERIES(time, measure_value::<data_type>)
```

 where `<data_type>` is the data type of the measure value and can be one of bigint, boolean, double, or varchar. The second parameter cannot be null.

Consider the CPU utilization of EC2 instances stored in a table named **metrics** as shown below:


| Time | region | az | vpc | instance\$1id | measure\$1name | measure\$1value::double | 
| --- | --- | --- | --- | --- | --- | --- | 
|  2019-12-04 19:00:00.000000000  |  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef0  |  cpu\$1utilization  |  35.0  | 
|  2019-12-04 19:00:01.000000000  |  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef0  |  cpu\$1utilization  |  38.2  | 
|  2019-12-04 19:00:02.000000000  |  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef0  |  cpu\$1utilization  |  45.3  | 
|  2019-12-04 19:00:00.000000000  |  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef1  |  cpu\$1utilization  |  54.1  | 
|  2019-12-04 19:00:01.000000000  |  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef1  |  cpu\$1utilization  |  42.5  | 
|  2019-12-04 19:00:02.000000000  |  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef1  |  cpu\$1utilization  |  33.7  | 

Running the query:

```
SELECT region, az, vpc, instance_id, CREATE_TIME_SERIES(time, measure_value::double) as cpu_utilization FROM metrics
    WHERE measure_name=’cpu_utilization’
    GROUP BY region, az, vpc, instance_id
```

will return all series that have `cpu_utilization` as a measure value. In this case, we have two series: 


| region | az | vpc | instance\$1id | cpu\$1utilization | 
| --- | --- | --- | --- | --- | 
|  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef0  |  [\$1time: 2019-12-04 19:00:00.000000000, measure\$1value::double: 35.0\$1, \$1time: 2019-12-04 19:00:01.000000000, measure\$1value::double: 38.2\$1, \$1time: 2019-12-04 19:00:02.000000000, measure\$1value::double: 45.3\$1]  | 
|  us-east-1  |  us-east-1d  |  vpc-1a2b3c4d  |  i-1234567890abcdef1  |  [\$1time: 2019-12-04 19:00:00.000000000, measure\$1value::double: 35.1\$1, \$1time: 2019-12-04 19:00:01.000000000, measure\$1value::double: 38.5\$1, \$1time: 2019-12-04 19:00:02.000000000, measure\$1value::double: 45.7\$1]  | 

## UNNEST
<a name="timeseries-specific-constructs.views.UNNEST"></a>

 `UNNEST` is a table function that enables you to transform `timeseries` data into the flat model. The syntax is as follows: 

 `UNNEST` transforms a `timeseries` into two columns, namely, `time` and `value`. You can also use aliases with UNNEST as shown below: 

```
UNNEST(timeseries) AS <alias_name> (time_alias, value_alias)
```

where `<alias_name>` is the alias for the flat table, `time_alias` is the alias for the `time` column and `value_alias` is the alias for the `value` column.

For example, consider the scenario where some of the EC2 instances in your fleet are configured to emit metrics at a 5 second interval, others emit metrics at a 15 second interval, and you need the average metrics for all instances at a 10 second granularity for the past 6 hours. To get this data, you transform your metrics to the time series model using **CREATE\$1TIME\$1SERIES**. You can then use **INTERPOLATE\$1LINEAR** to get the missing values at 10 second granularity. Next, you transform the data back to the flat model using **UNNEST**, and then use **AVG** to get the average metrics across all instances.

```
WITH interpolated_timeseries AS (
    SELECT region, az, vpc, instance_id,
        INTERPOLATE_LINEAR(
            CREATE_TIME_SERIES(time, measure_value::double),
                SEQUENCE(ago(6h), now(), 10s)) AS interpolated_cpu_utilization
    FROM timestreamdb.metrics 
    WHERE measure_name= ‘cpu_utilization’ AND time >= ago(6h)
    GROUP BY region, az, vpc, instance_id
)
SELECT region, az, vpc, instance_id, avg(t.cpu_util)
FROM interpolated_timeseries
CROSS JOIN UNNEST(interpolated_cpu_utilization) AS t (time, cpu_util)
GROUP BY region, az, vpc, instance_id
```

 The query above demonstrates the use of **UNNEST** with an alias. Below is an example of the same query without using an alias for **UNNEST**: 

```
WITH interpolated_timeseries AS (
    SELECT region, az, vpc, instance_id,
        INTERPOLATE_LINEAR(
            CREATE_TIME_SERIES(time, measure_value::double),
                SEQUENCE(ago(6h), now(), 10s)) AS interpolated_cpu_utilization
    FROM timestreamdb.metrics 
    WHERE measure_name= ‘cpu_utilization’ AND time >= ago(6h)
    GROUP BY region, az, vpc, instance_id
)
SELECT region, az, vpc, instance_id, avg(value)
FROM interpolated_timeseries
CROSS JOIN UNNEST(interpolated_cpu_utilization)
GROUP BY region, az, vpc, instance_id
```

# Time series functions
<a name="timeseries-specific-constructs.functions"></a>

Amazon Timestream for LiveAnalytics supports timeseries functions, such as derivatives, integrals, and correlations, as well as others, to derive deeper insights from your time series data. This section provides usage information for each of these functions, as well as sample queries. Select a topic below to learn more. 

**Topics**
+ [Interpolation functions](timeseries-specific-constructs.functions.interpolation.md)
+ [Derivatives functions](timeseries-specific-constructs.functions.derivatives.md)
+ [Integral functions](timeseries-specific-constructs.functions.integrals.md)
+ [Correlation functions](timeseries-specific-constructs.functions.correlation.md)
+ [Filter and reduce functions](timeseries-specific-constructs.functions.filter-reduce.md)

# Interpolation functions
<a name="timeseries-specific-constructs.functions.interpolation"></a>

If your time series data is missing values for events at certain points in time, you can estimate the values of those missing events using interpolation. Amazon Timestream supports four variants of interpolation: linear interpolation, cubic spline interpolation, last observation carried forward (locf) interpolation, and constant interpolation. This section provides usage information for the Timestream for LiveAnalytics interpolation functions, as well as sample queries. 



## Usage information
<a name="w2aab7c59c13c13c11b7"></a>


| Function | Output data type | Description | 
| --- | --- | --- | 
|  `interpolate_linear(timeseries, array[timestamp])`  |  timeseries  |  Fills in missing data using [linear interpolation](https://wikipedia.org/wiki/Linear_interpolation).  | 
|  `interpolate_linear(timeseries, timestamp)`  |  double  |  Fills in missing data using [linear interpolation](https://wikipedia.org/wiki/Linear_interpolation).  | 
|  `interpolate_spline_cubic(timeseries, array[timestamp])`  |  timeseries  |  Fills in missing data using [cubic spline interpolation](https://wikiversity.org/wiki/Cubic_Spline_Interpolation#:~:text=Cubic%20spline%20interpolation%20is%20a,Lagrange%20polynomial%20and%20Newton%20polynomial.).  | 
|  `interpolate_spline_cubic(timeseries, timestamp)`  |  double  |  Fills in missing data using [cubic spline interpolation](https://wikiversity.org/wiki/Cubic_Spline_Interpolation#:~:text=Cubic%20spline%20interpolation%20is%20a,Lagrange%20polynomial%20and%20Newton%20polynomial.).  | 
|  `interpolate_locf(timeseries, array[timestamp])`  |  timeseries  |  Fills in missing data using the last sampled value.  | 
|  `interpolate_locf(timeseries, timestamp)`  |  double  |  Fills in missing data using the last sampled value.  | 
|  `interpolate_fill(timeseries, array[timestamp], double)`  |  timeseries  |  Fills in missing data using a constant value.  | 
|  `interpolate_fill(timeseries, timestamp, double)`  |  double  |  Fills in missing data using a constant value.  | 

## Query examples
<a name="w2aab7c59c13c13c11b9"></a>

**Example**  
Find the average CPU utilization binned at 30 second intervals for a specific EC2 host over the past 2 hours, filling in the missing values using linear interpolation:  

```
WITH binned_timeseries AS (
SELECT hostname, BIN(time, 30s) AS binned_timestamp, ROUND(AVG(measure_value::double), 2) AS avg_cpu_utilization
FROM "sampleDB".DevOps
WHERE measure_name = 'cpu_utilization'
    AND hostname = 'host-Hovjv'
    AND time > ago(2h)
GROUP BY hostname, BIN(time, 30s)
), interpolated_timeseries AS (
SELECT hostname,
    INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(binned_timestamp, avg_cpu_utilization),
            SEQUENCE(min(binned_timestamp), max(binned_timestamp), 15s)) AS interpolated_avg_cpu_utilization
FROM binned_timeseries
GROUP BY hostname
)
SELECT time, ROUND(value, 2) AS interpolated_cpu
FROM interpolated_timeseries
CROSS JOIN UNNEST(interpolated_avg_cpu_utilization)
```

**Example**  
Find the average CPU utilization binned at 30 second intervals for a specific EC2 host over the past 2 hours, filling in the missing values using interpolation based on the last observation carried forward:  

```
WITH binned_timeseries AS (
SELECT hostname, BIN(time, 30s) AS binned_timestamp, ROUND(AVG(measure_value::double), 2) AS avg_cpu_utilization
FROM "sampleDB".DevOps
WHERE measure_name = 'cpu_utilization'
    AND hostname = 'host-Hovjv'
    AND time > ago(2h)
GROUP BY hostname, BIN(time, 30s)
), interpolated_timeseries AS (
SELECT hostname,
    INTERPOLATE_LOCF(
        CREATE_TIME_SERIES(binned_timestamp, avg_cpu_utilization),
            SEQUENCE(min(binned_timestamp), max(binned_timestamp), 15s)) AS interpolated_avg_cpu_utilization
FROM binned_timeseries
GROUP BY hostname
)
SELECT time, ROUND(value, 2) AS interpolated_cpu
FROM interpolated_timeseries
CROSS JOIN UNNEST(interpolated_avg_cpu_utilization)
```

# Derivatives functions
<a name="timeseries-specific-constructs.functions.derivatives"></a>

Derivatives are used calculate the rate of change for a given metric and can be used to proactively respond to an event. For example, suppose you calculate the derivative of the CPU utilization of EC2 instances over the past 5 minutes, and you notice a significant positive derivative. This can be indicative of increased demand on your workload, so you may decide want to spin up more EC2 instances to better handle your workload. 

Amazon Timestream supports two variants of derivative functions. This section provides usage information for the Timestream for LiveAnalytics derivative functions, as well as sample queries. 



## Usage information
<a name="w2aab7c59c13c13c13b9"></a>


| Function | Output data type | Description | 
| --- | --- | --- | 
|  `derivative_linear(timeseries, interval)`  |  timeseries  |  Calculates the [derivative](https://wikipedia.org/wiki/Derivative) of each point in the `timeseries` for the specified `interval`.  | 
|  `non_negative_derivative_linear(timeseries, interval)`  |  timeseries  |  Same as `derivative_linear(timeseries, interval)`, but only returns positive values.  | 

## Query examples
<a name="w2aab7c59c13c13c13c11"></a>

**Example**  
Find the rate of change in the CPU utilization every 5 minutes over the past 1 hour:  

```
SELECT DERIVATIVE_LINEAR(CREATE_TIME_SERIES(time, measure_value::double), 5m) AS result 
FROM “sampleDB”.DevOps 
WHERE measure_name = 'cpu_utilization' 
AND hostname = 'host-Hovjv' and time > ago(1h) 
GROUP BY hostname, measure_name
```

**Example**  
Calculate the rate of increase in errors generated by one or more microservices:  

```
WITH binned_view as (
    SELECT bin(time, 5m) as binned_timestamp, ROUND(AVG(measure_value::double), 2) as value            
    FROM “sampleDB”.DevOps  
    WHERE micro_service = 'jwt'  
    AND time > ago(1h) 
    AND measure_name = 'service_error'
    GROUP BY bin(time, 5m)
)
SELECT non_negative_derivative_linear(CREATE_TIME_SERIES(binned_timestamp, value), 1m) as rateOfErrorIncrease
FROM binned_view
```

# Integral functions
<a name="timeseries-specific-constructs.functions.integrals"></a>

You can use integrals to find the area under the curve per unit of time for your time series events. As an example, suppose you're tracking the volume of requests received by your application per unit of time. In this scenario, you can use the integral function to determine the total volume of requests served per specified interval over a specific time period.

Amazon Timestream supports one variant of integral functions. This section provides usage information for the Timestream for LiveAnalytics integral function, as well as sample queries. 



## Usage information
<a name="w2aab7c59c13c13c15b9"></a>


| Function | Output data type | Description | 
| --- | --- | --- | 
|  `integral_trapezoidal(timeseries(double))` `integral_trapezoidal(timeseries(double), interval day to second)` `integral_trapezoidal(timeseries(bigint))` `integral_trapezoidal(timeseries(bigint), interval day to second)` `integral_trapezoidal(timeseries(integer), interval day to second)` `integral_trapezoidal(timeseries(integer))`  |  double  |  Approximates the [integral](https://wikipedia.org/wiki/Integral) per the specified `interval day to second` for the `timeseries` provided, using the [trapezoidal rule](https://wikipedia.org/wiki/Trapezoidal_rule). The interval day to second parameter is optional and the default is `1s`. For more information about intervals, see [Interval and duration](date-time-functions.md#date-time-functions-interval-duration).  | 

## Query examples
<a name="w2aab7c59c13c13c15c11"></a>

**Example**  
Calculate the total volume of requests served per five minutes over the past hour by a specific host:  

```
SELECT INTEGRAL_TRAPEZOIDAL(CREATE_TIME_SERIES(time, measure_value::double), 5m) AS result FROM sample.DevOps 
WHERE measure_name = 'request' 
AND hostname = 'host-Hovjv' 
AND time > ago (1h) 
GROUP BY hostname, measure_name
```

# Correlation functions
<a name="timeseries-specific-constructs.functions.correlation"></a>

Given two similar length time series, correlation functions provide a correlation coefficient, which explains how the two time series trend over time. The correlation coefficient ranges from `-1.0` to `1.0`. `-1.0` indicates that the two time series trend in opposite directions at the same rate. whereas `1.0` indicates that the two timeseries trend in the same direction at the same rate. A value of `0` indicates no correlation between the two time series. For example, if the price of oil increases, and the stock price of an oil company increases, the trend of the price increase of oil and the price increase of the oil company will have a positive correlation coefficient. A high positive correlation coefficient would indicate that the two prices trend at a similar rate. Similarly, the correlation coefficient between bond prices and bond yields is negative, indicating that these two values trends in the opposite direction over time.

Amazon Timestream supports two variants of correlation functions. This section provides usage information for the Timestream for LiveAnalytics correlation functions, as well as sample queries. 



## Usage information
<a name="w2aab7c59c13c13c19c11"></a>


| Function | Output data type | Description | 
| --- | --- | --- | 
|  `correlate_pearson(timeseries, timeseries)`  |  double  |  Calculates [Pearson's correlation coefficient](https://wikipedia.org/wiki/Pearson_correlation_coefficient) for the two `timeseries`. The timeseries must have the same timestamps.  | 
|  `correlate_spearman(timeseries, timeseries)`  |  double  |  Calculates [Spearman's correlation coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) for the two `timeseries`. The timeseries must have the same timestamps.  | 

## Query examples
<a name="w2aab7c59c13c13c19c13"></a>

**Example**  

```
WITH cte_1 AS (
    SELECT INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, measure_value::double), 
        SEQUENCE(min(time), max(time), 10m)) AS result 
    FROM sample.DevOps 
    WHERE measure_name = 'cpu_utilization' 
    AND hostname = 'host-Hovjv' AND time > ago(1h) 
    GROUP BY hostname, measure_name
), 
cte_2 AS (
    SELECT INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, measure_value::double), 
        SEQUENCE(min(time), max(time), 10m)) AS result 
    FROM sample.DevOps 
    WHERE measure_name = 'cpu_utilization' 
    AND hostname = 'host-Hovjv' AND time > ago(1h) 
    GROUP BY hostname, measure_name
) 
SELECT correlate_pearson(cte_1.result, cte_2.result) AS result 
FROM cte_1, cte_2
```

# Filter and reduce functions
<a name="timeseries-specific-constructs.functions.filter-reduce"></a>

Amazon Timestream supports functions for performing filter and reduce operations on time series data. This section provides usage information for the Timestream for LiveAnalytics filter and reduce functions, as well as sample queries. 



## Usage information
<a name="w2aab7c59c13c13c23b7"></a>


| Function | Output data type | Description | 
| --- | --- | --- | 
|  `filter(timeseries(T), function(T, Boolean))`  |  timeseries(T)  |  Constructs a time series from an the input time series, using values for which the passed `function` returns `true`.  | 
|  `reduce(timeseries(T), initialState S, inputFunction(S, T, S), outputFunction(S, R))`  |  R  |  Returns a single value, reduced from the time series. The `inputFunction` will be invoked on each element in timeseries in order. In addition to taking the current element, inputFunction takes the current state (initially `initialState`) and returns the new state. The `outputFunction` will be invoked to turn the final state into the result value. The `outputFunction` can be an identity function.  | 

## Query examples
<a name="w2aab7c59c13c13c23b9"></a>

**Example**  
Construct a time series of CPU utilization of a host and filter points with measurement greater than 70:  

```
WITH time_series_view AS (
    SELECT INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, ROUND(measure_value::double,2)), 
            SEQUENCE(ago(15m), ago(1m), 10s)) AS cpu_user
    FROM sample.DevOps
    WHERE hostname = 'host-Hovjv' and measure_name = 'cpu_utilization'
        AND time > ago(30m)
    GROUP BY hostname
)
SELECT FILTER(cpu_user, x -> x.value > 70.0) AS cpu_above_threshold
from time_series_view
```

**Example**  
Construct a time series of CPU utilization of a host and determine the sum squared of the measurements:  

```
WITH time_series_view AS (
    SELECT INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, ROUND(measure_value::double,2)), 
            SEQUENCE(ago(15m), ago(1m), 10s)) AS cpu_user
    FROM sample.DevOps
    WHERE hostname = 'host-Hovjv' and measure_name = 'cpu_utilization'
        AND time > ago(30m)
    GROUP BY hostname
)
SELECT REDUCE(cpu_user,
    DOUBLE '0.0',
    (s, x) -> x.value * x.value + s,
    s -> s)
from time_series_view
```

**Example**  
Construct a time series of CPU utilization of a host and determine the fraction of samples that are above the CPU threshold:  

```
WITH time_series_view AS (
    SELECT INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, ROUND(measure_value::double,2)), 
            SEQUENCE(ago(15m), ago(1m), 10s)) AS cpu_user
    FROM sample.DevOps
    WHERE hostname = 'host-Hovjv' and measure_name = 'cpu_utilization'
        AND time > ago(30m)
    GROUP BY hostname
)
SELECT ROUND(
    REDUCE(cpu_user, 
      -- initial state 
      CAST(ROW(0, 0) AS ROW(count_high BIGINT, count_total BIGINT)),
      -- function to count the total points and points above a certain threshold
      (s, x) -> CAST(ROW(s.count_high + IF(x.value > 70.0, 1, 0), s.count_total + 1) AS ROW(count_high BIGINT, count_total BIGINT)),
      -- output function converting the counts to fraction above threshold
      s -> IF(s.count_total = 0, NULL, CAST(s.count_high AS DOUBLE) / s.count_total)), 
    4) AS fraction_cpu_above_threshold
from time_series_view
```