

For similar capabilities to Amazon Timestream for LiveAnalytics, consider Amazon Timestream for InfluxDB. It offers simplified data ingestion and single-digit millisecond query response times for real-time analytics. Learn more [here](https://docs.aws.amazon.com//timestream/latest/developerguide/timestream-for-influxdb.html).

# Using scheduled queries in Timestream for LiveAnalytics
<a name="scheduledqueries"></a>

The scheduled query feature in Amazon Timestream for LiveAnalytics is a fully managed, serverless, and scalable solution for calculating and storing aggregates, rollups, and other forms of preprocessed data typically used for operational dashboards, business reports, ad-hoc analytics, and other applications. Scheduled queries make real-time analytics more performant and cost-effective, so you can derive additional insights from your data, and can continue to make better business decisions. 

With scheduled queries, you define the real-time analytics queries that compute aggregates, rollups, and other operations on the data—and Amazon Timestream for LiveAnalytics periodically and automatically runs these queries and reliably writes the query results into a separate table. The data is typically calculated and updated into these tables within a few minutes. 

You can then point your dashboards and reports to query the tables that contain aggregated data instead of querying the considerably larger source tables. Because the tables with aggregated data contain much less data than the source tables, they offer faster queries and cheaper data storage, and the resulting performance and cost gains can reach an order of magnitude or more. 

Additionally, tables with scheduled queries offer all of the existing functionality of a Timestream for LiveAnalytics table. For example, you can query the tables using SQL. You can visualize the data stored in the tables using Grafana. You can also ingest data into the table using Amazon Kinesis, Amazon MSK, AWS IoT Core, and Telegraf. You can configure data retention policies on these tables for automatic data lifecycle management. 

Because the data retention of the tables that contain aggregated data is fully decoupled from that of source tables, you can also choose to reduce the data retention of the source tables and keep the aggregate data for a much longer duration, at a fraction of the data storage cost. Scheduled queries make real-time analytics faster, cheaper, and therefore more accessible to many more customers, so they can monitor their applications and drive better data-driven business decisions. 

**Topics**
+ [Benefits](#scheduledqueries-benifits)
+ [Use cases](#scheduledqueries-usescases)
+ [Example](#scheduledqueries-example)
+ [Concepts](scheduledqueries-concepts.md)
+ [Schedule expressions](scheduledqueries-schedule.md)
+ [Data model mappings](scheduledqueries-mappings.md)
+ [Notification messages](scheduledqueries-notification.md)
+ [Error reports](scheduledqueries-errorreport.md)
+ [Patterns and examples](scheduledqueries-examplesandpatterns.md)

## Scheduled query benefits
<a name="scheduledqueries-benifits"></a>

The following are the benefits of scheduled queries:
+ **Operational ease** – Scheduled queries are serverless and fully managed. 
+ **Performance and cost** – Because scheduled queries precompute the aggregates, rollups, or other real-time analytics operations for your data and store the results in a table, the tables populated by scheduled queries contain much less data than their source tables. Queries that run on these tables are therefore faster and cheaper, and the smaller tables also help reduce the storage cost. You can also retain this data for a longer duration in the memory store at a fraction of the cost of retaining the source data in the memory store.
+ **Interoperability** – Tables populated by scheduled queries offer all of the existing functionality of Timestream for LiveAnalytics tables and can be used with all of the services and tools that work with Timestream for LiveAnalytics. See [Working with Other Services](https://docs.aws.amazon.com/timestream/latest/developerguide/OtherServices.html) for details.

## Scheduled query use cases
<a name="scheduledqueries-usescases"></a>

You can use scheduled queries for business reports that summarize the end-user activity from your applications, so you can train machine learning models for personalization. You can also use scheduled queries for alarms that detect anomalies, network intrusions, or fraudulent activity, so you can take immediate remedial actions. 

Additionally, you can use scheduled queries for more effective data governance. You can do this by granting source table access exclusively to the scheduled queries, and providing your developers access to only the tables populated by scheduled queries. This minimizes the impact of unintentional, long-running queries.

## Example: Using real-time analytics to detect fraudulent payments and make better business decisions
<a name="scheduledqueries-example"></a>

Consider a payment system that processes transactions sent from multiple point-of-sale terminals distributed across major metropolitan cities in the United States. You want to use Amazon Timestream for LiveAnalytics to store and analyze the transaction data, so you can detect fraudulent transactions and run real-time analytics queries. These queries can help you answer business questions such as identifying the busiest and least used point-of-sale terminals per hour, the busiest hour of the day for each city, and the city with the most transactions per hour. 

The system processes about 100K transactions per minute. Each transaction stored in Amazon Timestream for LiveAnalytics is 100 bytes. You've configured 10 queries that run every minute to detect various kinds of fraudulent payments. You've also created 25 queries that aggregate and slice/dice your data along various dimensions to help answer your business questions. Each of these queries processes the last hour's data. 

You've created a dashboard to display the data generated by these queries. The dashboard contains 25 widgets, it is refreshed every hour, and it is typically accessed by 10 users at any given time. Finally, your memory store is configured with a 2-hour data retention period and the magnetic store is configured to have a 6-month data retention period. 

In this case, you can use real-time analytics queries that recompute the data every time the dashboard is accessed and refreshed, or use derived tables for the dashboard. The query cost for dashboards based on real-time analytics queries will be about \$120.70 per month. In contrast, the cost of dashboarding queries powered by derived tables will be about \$12.27 per month (see [Amazon Timestream for LiveAnalytics pricing](https://aws.amazon.com/timestream/pricing/)). In this case, using derived tables reduces the query cost by roughly 10 times.
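The shape of this cost comparison can be sketched with back-of-the-envelope arithmetic. This is illustrative only: the per-GB price and the 100x aggregation ratio below are assumptions for the sketch, not the exact inputs behind the figures quoted above.

```
# Back-of-the-envelope dashboard query cost comparison (illustrative
# assumptions, not the exact inputs behind the quoted monthly costs).
GB = 1024 ** 3
PRICE_PER_GB_SCANNED = 0.01                  # assumed $ per GB scanned

ingest_bytes_per_hour = 100_000 * 60 * 100   # ~100K tx/min x 100 bytes each

# 25 widget queries per hourly refresh, each scanning the last hour of raw
# data, over a 30-day month.
raw_bytes_scanned = 25 * 24 * 30 * ingest_bytes_per_hour

# Derived tables hold hourly aggregates, assumed ~100x smaller than raw data.
derived_bytes_scanned = raw_bytes_scanned // 100

raw_cost = raw_bytes_scanned / GB * PRICE_PER_GB_SCANNED
derived_cost = derived_bytes_scanned / GB * PRICE_PER_GB_SCANNED
print(f"raw: ${raw_cost:.2f}/month vs derived: ${derived_cost:.2f}/month")
```

Whatever the exact pricing, the ratio between the two costs tracks the ratio between raw and aggregated data sizes, which is why derived tables yield roughly an order-of-magnitude saving here.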

# Scheduled query concepts
<a name="scheduledqueries-concepts"></a>

**Query string** - This is the query whose result you are pre-computing and storing in another Timestream for LiveAnalytics table. You can define a scheduled query using the full SQL surface area of Timestream for LiveAnalytics, which provides you the flexibility of writing queries with common table expressions, nested queries, window functions, or any kind of aggregate and scalar functions that are supported by [Timestream for LiveAnalytics query language](https://docs.aws.amazon.com/timestream/latest/developerguide/reference.html).

**Schedule expression** - Allows you to specify when your scheduled query instances are run. You can specify the expressions using a cron expression (such as run at 8 AM UTC every day) or rate expression (such as run every 10 minutes). 

**Target configuration** - Allows you to specify how to map the result of a scheduled query to the destination table where the results are stored. 

**Notification configuration** - Timestream for LiveAnalytics automatically runs instances of a scheduled query based on your schedule expression. You receive a notification for every such query run on an SNS topic that you configure when you create a scheduled query. This notification specifies whether the instance ran successfully or encountered errors. In addition, it provides information such as the bytes metered, data written to the target table, next invocation time, and so on.

The following is an example of this kind of notification message.

```
{
    "type":"AUTO_TRIGGER_SUCCESS",
    "arn":"arn:aws:timestream:us-east-1:123456789012:scheduled-query/ PT1mPerMinutePerRegionMeasureCount-9376096f7309",
    "nextInvocationEpochSecond":1637302500,
    "scheduledQueryRunSummary":
    {
        "invocationEpochSecond":1637302440,
        "triggerTimeMillis":1637302445697,
        "runStatus":"AUTO_TRIGGER_SUCCESS",
        "executionStats":
        {
            "executionTimeInMillis":21669,
            "dataWrites":36864,
            "bytesMetered":13547036820,
            "recordsIngested":1200,
            "queryResultRows":1200
        }
    }
}
```

In this notification message, `bytesMetered` is the number of bytes that the query scanned on the source table, and `dataWrites` is the number of bytes written to the target table. 

**Note**  
 If you are consuming these notifications programmatically, be aware that new fields could be added to the notification message in the future.

**Error report location** - Scheduled queries run asynchronously and store data in the target table. If an instance encounters any errors (for example, invalid data that could not be stored), the records that encountered errors are written to an error report in the error report location that you specify when you create the scheduled query. You specify the S3 bucket and prefix for the location. Timestream for LiveAnalytics appends the scheduled query name and invocation time to this prefix to help you identify the errors associated with a specific instance of a scheduled query.

**Tagging** - You can optionally specify tags that you can associate with a scheduled query. For more details, see [Tagging Timestream for LiveAnalytics Resources](https://docs.aws.amazon.com/timestream/latest/developerguide/tagging-keyspaces.html).

**Example**

In the following example, you compute a simple aggregate using a scheduled query:

```
SELECT region, bin(time, 1m) as minute, 
    SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints 
FROM raw_data.devops 
WHERE time BETWEEN @scheduled_runtime - 10m AND @scheduled_runtime + 1m 
GROUP BY bin(time, 1m), region
```

`@scheduled_runtime` parameter - In this example, notice that the query accepts a special named parameter, `@scheduled_runtime`. This is a special parameter (of type Timestamp) that the service sets when invoking a specific instance of a scheduled query, so that you can deterministically control the time range for which a specific instance of a scheduled query analyzes the data in the source table. You can use `@scheduled_runtime` in your query in any location where a Timestamp type is expected.

Consider an example where you set the schedule expression cron(0/5 * * * ? *), so that the scheduled query runs at minutes 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, and 55 of every hour. For the instance that is triggered at 2021-12-01 00:05:00, the `@scheduled_runtime` parameter is initialized to this value, so that the instance operates on data in the range 2021-11-30 23:55:00 to 2021-12-01 00:06:00.
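The window arithmetic for the query above can be sketched as a small helper (illustrative code, not a Timestream API): each instance analyzes `[@scheduled_runtime - 10m, @scheduled_runtime + 1m]`.

```
from datetime import datetime, timedelta

def analysis_window(scheduled_runtime: datetime) -> tuple:
    """Time range the example query scans for one invocation."""
    return (scheduled_runtime - timedelta(minutes=10),
            scheduled_runtime + timedelta(minutes=1))

start, end = analysis_window(datetime(2021, 12, 1, 0, 5, 0))
# start == 2021-11-30 23:55:00, end == 2021-12-01 00:06:00
```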

**Instances with overlapping time ranges** - As this example shows, two subsequent instances of a scheduled query can overlap in their time ranges. This is something you can control based on your requirements, the time predicates you specify, and the schedule expression. In this case, the overlap allows these computations to update the aggregates based on any data whose arrival was slightly delayed, up to 10 minutes in this example. The query run triggered at 2021-12-01 00:00:00 will cover the time range 2021-11-30 23:50:00 to 2021-12-01 00:01:00, and the query run triggered at 2021-12-01 00:05:00 will cover the range 2021-11-30 23:55:00 to 2021-12-01 00:06:00. 

To ensure correctness and to make sure that the aggregates stored in the target table match the aggregates computed from the source table, Timestream for LiveAnalytics ensures that the computation at 2021-12-01 00:05:00 will be performed only after the computation at 2021-12-01 00:00:00 has completed. The results of the latter computations can update any previously materialized aggregate if a newer value is generated. Internally, Timestream for LiveAnalytics uses record versions where records generated by latter instances of a scheduled query will be assigned a higher version number. Therefore, the aggregates computed by the invocation at 2021-12-01 00:05:00 can update the aggregates computed by the invocation at 2021-12-01 00:00:00, assuming newer data is available on the source table.
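The version-based conflict resolution described above can be modeled in a few lines. This is an illustrative model of the behavior, not Timestream for LiveAnalytics's internal implementation: a record replaces an existing record for the same key only when it carries a higher version.

```
def upsert(table: dict, record: dict) -> None:
    """Keep the highest-version record per (time, region) key."""
    key = (record["time"], record["region"])
    existing = table.get(key)
    if existing is None or record["version"] > existing["version"]:
        table[key] = record

table = {}
# Instance triggered at 00:00 writes the 23:55 aggregate with version 1 ...
upsert(table, {"time": "23:55", "region": "us-east-1", "sum": 10, "version": 1})
# ... and the overlapping instance at 00:05 recomputes it, including
# late-arriving data, with version 2.
upsert(table, {"time": "23:55", "region": "us-east-1", "sum": 12, "version": 2})
```

A stale write with a lower version leaves the materialized aggregate untouched, which is why sequential execution of overlapping instances preserves correctness.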

**Automatic triggers vs. manual triggers** - After a scheduled query is created, Timestream for LiveAnalytics will automatically run the instances based on the specified schedule. Such automated triggers are managed entirely by the service. 

However, there might be scenarios where you might want to manually initiate some instances of a scheduled query. Examples include if a specific instance failed in a query run, if there was late-arriving data or updates in the source table after the automated schedule run, or if you want to update the target table for time ranges that are not covered by automated query runs (for example, for time ranges before creation of a scheduled query). 

You can use the ExecuteScheduledQuery API to manually initiate a specific instance of a scheduled query by passing the InvocationTime parameter, which is a value used for the `@scheduled_runtime` parameter. The following are a few important considerations when using the ExecuteScheduledQuery API:
+ If you are triggering multiple such invocations, make sure that these invocations do not generate results in overlapping time ranges. If you cannot ensure non-overlapping time ranges, make sure that these query runs are initiated sequentially, one after the other. If you concurrently initiate multiple query runs that overlap in their time ranges, you can see trigger failures with version conflicts reported in the error reports for these query runs.
+ You can initiate invocations with any timestamp value for `@scheduled_runtime`. It is therefore your responsibility to set the values appropriately, so that the time ranges updated in the target table correspond to the ranges where data was updated in the source table.
+ The ExecuteScheduledQuery API operates asynchronously. Upon a successful call, the service sends a 200 response and proceeds to run the query. However, if multiple scheduled query runs are concurrently in progress, expect potential delays in running manually triggered instances. 
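A backfill that respects the considerations above can be sketched as follows: generate one invocation time per 5-minute window and trigger them sequentially, so no two runs overlap in their time ranges. The boto3 call is commented out because it requires live AWS access, and `scheduled_query_arn` is a hypothetical placeholder.

```
from datetime import datetime, timedelta

def invocation_times(start: datetime, end: datetime,
                     step: timedelta = timedelta(minutes=5)) -> list:
    """One invocation time per window, inclusive of both endpoints."""
    times, t = [], start
    while t <= end:
        times.append(t)
        t += step
    return times

for t in invocation_times(datetime(2021, 11, 30, 23, 0),
                          datetime(2021, 11, 30, 23, 55)):
    # Trigger each instance sequentially (requires AWS credentials):
    # boto3.client("timestream-query").execute_scheduled_query(
    #     ScheduledQueryArn=scheduled_query_arn, InvocationTime=t)
    pass
```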

# Schedule expressions for scheduled queries
<a name="scheduledqueries-schedule"></a>

You can run scheduled queries on an automated schedule by specifying cron or rate expressions. All scheduled queries use the UTC time zone, and the minimum possible precision for schedules is 1 minute. 

The two ways to specify schedule expressions are *cron* and *rate*. Cron expressions offer fine-grained schedule control, while rate expressions are simpler to express but lack that fine-grained control. 

For example, with a cron expression, you can define a scheduled query that gets triggered at a specified time on a certain day of each week or month, or a specified minute every hour only on Monday - Friday, and so on. In contrast, rate expressions initiate a scheduled query at a regular rate, such as once every minute, hour, or day, starting from the exact time when the scheduled query is created.

**Cron expression**
+ *Syntax*

  ```
  cron(fields)
  ```

  Cron expressions have six required fields, which are separated by white space.

  | Field | Values | Wildcards |
  | --- | --- | --- |
  | Minutes | 0-59 | , - * / |
  | Hours | 0-23 | , - * / |
  | Day-of-month | 1-31 | , - * ? / L W |
  | Month | 1-12 or JAN-DEC | , - * / |
  | Day-of-week | 1-7 or SUN-SAT | , - * ? L # |
  | Year | 1970-2199 | , - * / |

**Wildcard characters**
  + The `,` (comma) wildcard includes additional values. In the Month field, JAN,FEB,MAR would include January, February, and March.
  + The `-` (dash) wildcard specifies ranges. In the Day-of-month field, 1-15 would include days 1 through 15 of the specified month. 
  + The `*` (asterisk) wildcard includes all values in the field. In the Hours field, `*` would include every hour. You cannot use `*` in both the Day-of-month and Day-of-week fields. If you use it in one, you must use `?` in the other.
  + The `/` (forward slash) wildcard specifies increments. In the Minutes field, you could enter 1/10 to specify every tenth minute, starting from the first minute of the hour (for example, the 11th, 21st, and 31st minute, and so on). 
  + The `?` (question mark) wildcard specifies one or another. In the Day-of-month field you could enter `7`, and if you didn't care what day of the week the 7th was, you could enter `?` in the Day-of-week field.
  + The `L` wildcard in the Day-of-month or Day-of-week field specifies the last day of the month or week. 
  + The `W` wildcard in the Day-of-month field specifies a weekday. In the Day-of-month field, 3W specifies the weekday closest to the third day of the month. 
  + The `#` wildcard in the Day-of-week field specifies a certain instance of the specified day of the week within a month. For example, 3#2 would be the second Tuesday of the month: the 3 refers to Tuesday because it is the third day of each week, and the 2 refers to the second day of that type within the month. 
**Note**  
If you use the `#` character, you can define only one expression in the Day-of-week field. For example, "3#1,6#3" is not valid because it is interpreted as two expressions. 

**Limitations**
  + You can't specify the Day-of-month and Day-of-week fields in the same cron expression. If you specify a value (or a `*`) in one of the fields, you must use a `?` (question mark) in the other.
  + Cron expressions that lead to rates faster than 1 minute are not supported.

  **Examples**
  + cron(0 10 * * ? *) – Run at 10:00 AM (UTC) every day.
  + cron(0 18 ? * MON-FRI *) – Run at 6:00 PM (UTC) every Monday through Friday.
  + cron(0/10 * ? * MON-FRI *) – Run every 10 minutes Monday through Friday.

**Rate expressions**
+ A rate expression starts when you create the scheduled query, and then runs on its defined schedule. Rate expressions have two required fields, separated by white space. 

  *Syntax*

  ```
  rate(value unit)
  ```
  + `value`: A positive number.
  + `unit`: The unit of time. Different units are required for a value of 1 (for example, minute) and values over 1 (for example, minutes). Valid values: minute | minutes | hour | hours | day | days
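The singular/plural rule above is easy to get wrong, so a client-side sanity check can help before calling the service. This validator is a sketch of the rules stated in this section, not an official grammar:

```
import re

# rate(value unit): positive integer value; unit singular iff value == 1.
RATE_RE = re.compile(r"^rate\((\d+) (minutes?|hours?|days?)\)$")

def is_valid_rate(expr: str) -> bool:
    m = RATE_RE.match(expr)
    if not m:
        return False
    value, unit = int(m.group(1)), m.group(2)
    if value < 1:
        return False
    # Plural unit exactly when the value is greater than 1.
    return unit.endswith("s") == (value > 1)
```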

# Data model mappings for scheduled queries
<a name="scheduledqueries-mappings"></a>

Timestream for LiveAnalytics supports flexible modeling of data in its tables and this same flexibility applies to results of scheduled queries that are materialized into another Timestream for LiveAnalytics table. With scheduled queries, you can query any table, whether it has data in multi-measure records or single-measure records and write the query results using either multi-measure or single-measure records. 

You use the TargetConfiguration in the specification of a scheduled query to map the query results to the appropriate columns in the destination derived table. The following sections describe the different ways of specifying this TargetConfiguration to achieve different data models in the derived table. Specifically, you will see:
+ How to write to multi-measure records when the query result does not have a measure name and you specify the target measure name in the TargetConfiguration.
+ How you use measure name in the query result to write multi-measure records. 
+ How you can define a model to write multiple records with different multi-measure attributes.
+ How you can define a model to write to single-measure records in the derived table.
+ How you can query single-measure records and/or multi-measure records in a scheduled query and have the results materialized to either a single-measure record or a multi-measure record, which allows you to choose the flexibility of data models.

## Example: Target measure name for multi-measure records
<a name="scheduledqueries-mappings-targetmeasurename"></a>

In this example, you will see that the query is reading data from a table with multi-measure data and is writing the results into another table using multi-measure records. The scheduled query result does not have a natural measure name column. Here, you specify the measure name in the derived table using the TargetMultiMeasureName property in the TargetConfiguration.TimestreamConfiguration. 

```
{
    "Name" : "CustomMultiMeasureName",
    "QueryString" : "SELECT region, bin(time, 1h) as hour, AVG(memory_cached) as avg_mem_cached_1h, MIN(memory_free) as min_mem_free_1h, MAX(memory_used) as max_mem_used_1h, SUM(disk_io_writes) as sum_1h, AVG(disk_used) as avg_disk_used_1h, AVG(disk_free) as avg_disk_free_1h, MAX(cpu_user) as max_cpu_user_1h, MIN(cpu_idle) as min_cpu_idle_1h, MAX(cpu_system) as max_cpu_system_1h FROM raw_data.devops_multi WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 14h AND bin(@scheduled_runtime, 1h) - 2h AND measure_name = 'metrics' GROUP BY region, bin(time, 1h)",
    "ScheduleConfiguration" : {
        "ScheduleExpression" : "cron(0 0/1 * * ? *)"
    },
    "NotificationConfiguration" : {
        "SnsConfiguration" : {
            "TopicArn" : "******"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******",
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName" : "derived",
            "TableName" : "dashboard_metrics_1h_agg_1",
            "TimeColumn" : "hour",
            "DimensionMappings" : [
                {
                    "Name": "region",
                    "DimensionValueType" : "VARCHAR"
                }
            ],
            "MultiMeasureMappings" : {
                "TargetMultiMeasureName": "dashboard-metrics",
                "MultiMeasureAttributeMappings" : [
                    {
                        "SourceColumn" : "avg_mem_cached_1h",
                        "MeasureValueType" : "DOUBLE",
                        "TargetMultiMeasureAttributeName" : "avgMemCached"
                    },
                    {
                        "SourceColumn" : "min_mem_free_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "max_mem_used_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "sum_1h",
                        "MeasureValueType" : "DOUBLE",
                        "TargetMultiMeasureAttributeName" : "totalDiskWrites"
                    },
                    {
                        "SourceColumn" : "avg_disk_used_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "avg_disk_free_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "max_cpu_user_1h",
                        "MeasureValueType" : "DOUBLE",
                        "TargetMultiMeasureAttributeName" : "CpuUserP100"
                    },
                    {
                        "SourceColumn" : "min_cpu_idle_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "max_cpu_system_1h",
                        "MeasureValueType" : "DOUBLE",
                        "TargetMultiMeasureAttributeName" : "CpuSystemP100"
                    }  
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    }
}
```

The mapping in this example creates one multi-measure record with measure name dashboard-metrics and attribute names avgMemCached, min_mem_free_1h, max_mem_used_1h, totalDiskWrites, avg_disk_used_1h, avg_disk_free_1h, CpuUserP100, min_cpu_idle_1h, and CpuSystemP100. Notice the optional use of TargetMultiMeasureAttributeName to rename query output columns to different attribute names used for result materialization.

The following is the schema for the destination table once this scheduled query is materialized. As you can see from the Timestream for LiveAnalytics attribute type in the following result, the results are materialized into multi-measure records with the single measure name `dashboard-metrics`, as shown in the measure schema.


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
|  region  |  varchar  |  DIMENSION  | 
|  measure_name  |  varchar  |  MEASURE_NAME  | 
|  time  |  timestamp  |  TIMESTAMP  | 
|  CpuSystemP100  |  double  |  MULTI  | 
|  avgMemCached  |  double  |  MULTI  | 
|  min_cpu_idle_1h  |  double  |  MULTI  | 
|  avg_disk_free_1h  |  double  |  MULTI  | 
|  avg_disk_used_1h  |  double  |  MULTI  | 
|  totalDiskWrites  |  double  |  MULTI  | 
|  max_mem_used_1h  |  double  |  MULTI  | 
|  min_mem_free_1h  |  double  |  MULTI  | 
|  CpuUserP100  |  double  |  MULTI  | 

The following are the corresponding measures obtained with a SHOW MEASURES query.


| measure_name | data_type | Dimensions | 
| --- | --- | --- | 
|  dashboard-metrics  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 

## Example: Using measure name from scheduled query in multi-measure records
<a name="scheduledqueries-mappings-usingmeasurename"></a>

In this example, you will see a query reading from a table with single-measure records and materializing the results into multi-measure records. In this case, the scheduled query result has a column whose values can be used as measure names in the target table where the results of the scheduled query are materialized. You then specify the measure name for the multi-measure record in the derived table using the MeasureNameColumn property in TargetConfiguration.TimestreamConfiguration. 

```
{
    "Name" : "UsingMeasureNameFromQueryResult",
    "QueryString" : "SELECT region, bin(time, 1h) as hour, measure_name, AVG(CASE WHEN measure_name IN ('memory_cached', 'disk_used', 'disk_free') THEN measure_value::double ELSE NULL END) as avg_1h, MIN(CASE WHEN measure_name IN ('memory_free', 'cpu_idle') THEN measure_value::double ELSE NULL END) as min_1h, SUM(CASE WHEN measure_name IN ('disk_io_writes') THEN measure_value::double ELSE NULL END) as sum_1h, MAX(CASE WHEN measure_name IN ('memory_used', 'cpu_user', 'cpu_system') THEN measure_value::double ELSE NULL END) as max_1h FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 14h AND bin(@scheduled_runtime, 1h) - 2h AND measure_name IN ('memory_free', 'memory_used', 'memory_cached', 'disk_io_writes', 'disk_used', 'disk_free', 'cpu_user', 'cpu_system', 'cpu_idle') GROUP BY region, measure_name, bin(time, 1h)",
    "ScheduleConfiguration" : {
        "ScheduleExpression" : "cron(0 0/1 * * ? *)"
    },
    "NotificationConfiguration" : {
        "SnsConfiguration" : {
            "TopicArn" : "******"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******",
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName" : "derived",
            "TableName" : "dashboard_metrics_1h_agg_2",
            "TimeColumn" : "hour",
            "DimensionMappings" : [
                {
                    "Name": "region",
                    "DimensionValueType" : "VARCHAR"
                }
            ],
            "MeasureNameColumn" : "measure_name",
            "MultiMeasureMappings" : {
                "MultiMeasureAttributeMappings" : [
                    {
                        "SourceColumn" : "avg_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "min_1h",
                        "MeasureValueType" : "DOUBLE",
                        "TargetMultiMeasureAttributeName": "p0_1h"
                    },
                    {
                        "SourceColumn" : "sum_1h",
                        "MeasureValueType" : "DOUBLE"
                    },
                    {
                        "SourceColumn" : "max_1h",
                        "MeasureValueType" : "DOUBLE",
                        "TargetMultiMeasureAttributeName": "p100_1h"
                    } 
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    }
}
```

The mapping in this example will create multi-measure records with attributes avg_1h, p0_1h, sum_1h, and p100_1h, and will use the values of the measure_name column in the query result as the measure name for the multi-measure records in the destination table. Additionally, note that this example optionally uses TargetMultiMeasureAttributeName with a subset of the mappings to rename the attributes. For instance, min_1h is renamed to p0_1h and max_1h is renamed to p100_1h.

The following is the schema for the destination table once this scheduled query is materialized. As you can see from the Timestream for LiveAnalytics attribute type in the following result, the results are materialized into multi-measure records. If you look at the measure schema, nine different measure names were ingested, corresponding to the values seen in the query results.


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
|  region  |  varchar  |  DIMENSION  | 
|  measure_name  |  varchar  |  MEASURE_NAME  | 
|  time  |  timestamp  |  TIMESTAMP  | 
|  sum_1h  |  double  |  MULTI  | 
|  p100_1h  |  double  |  MULTI  | 
|  p0_1h  |  double  |  MULTI  | 
|  avg_1h  |  double  |  MULTI  | 

The following are corresponding measures obtained with a SHOW MEASURES query.


| measure_name | data_type | Dimensions | 
| --- | --- | --- | 
|  cpu_idle  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  cpu_system  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  cpu_user  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  disk_free  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  disk_io_writes  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  disk_used  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  memory_cached  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  memory_free  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 
|  memory_used  |  multi  |  [{'dimension_name': 'region', 'data_type': 'varchar'}]  | 

## Example: Mapping results to different multi-measure records with different attributes
<a name="scheduledqueries-mappings-mappintresultstodiffrentmultimeasure"></a>

The following example shows how you can map different columns in your query result into different multi-measure records with different measure names. In the following scheduled query definition, the result of the query has these columns: region, hour, avg\_mem\_cached\_1h, min\_mem\_free\_1h, max\_mem\_used\_1h, total\_disk\_io\_writes\_1h, avg\_disk\_used\_1h, avg\_disk\_free\_1h, max\_cpu\_user\_1h, max\_cpu\_system\_1h, and min\_cpu\_idle\_1h. `region` is mapped to a dimension, and `hour` is mapped to the time column. 

The MixedMeasureMappings property in TargetConfiguration.TimestreamConfiguration specifies how to map the measures to multi-measure records in the derived table. 

In this specific example, avg\_mem\_cached\_1h, min\_mem\_free\_1h, and max\_mem\_used\_1h are used in one multi-measure record with the measure name mem\_aggregates; total\_disk\_io\_writes\_1h, avg\_disk\_used\_1h, and avg\_disk\_free\_1h are used in another multi-measure record with the measure name disk\_aggregates; and finally max\_cpu\_user\_1h, max\_cpu\_system\_1h, and min\_cpu\_idle\_1h are used in a third multi-measure record with the measure name cpu\_aggregates. 

In these mappings, you can also optionally use TargetMultiMeasureAttributeName to rename a query result column to a different attribute name in the destination table. For instance, the result column avg\_mem\_cached\_1h is renamed to avgMemCached, and total\_disk\_io\_writes\_1h is renamed to totalIOWrites. 

When you define the mappings for multi-measure records, Timestream for LiveAnalytics inspects every row in the query results and automatically ignores columns with NULL values. As a result, for a mapping that groups multiple columns under one measure name, if all of the column values in that group are NULL for a given row, then no record for that measure name is ingested for that row. 

For example, in the following mapping, avg\_mem\_cached\_1h, min\_mem\_free\_1h, and max\_mem\_used\_1h are mapped to the measure name mem\_aggregates. If, for a given row of the query result, all three of these column values are NULL, Timestream for LiveAnalytics won't ingest the measure mem\_aggregates for that row. If all nine columns for a given row are NULL, then you will see a user error reported in your error report. 

```
{
    "Name" : "AggsInDifferentMultiMeasureRecords",
    "QueryString" : "SELECT region, bin(time, 1h) as hour, AVG(CASE WHEN measure_name = 'memory_cached' THEN measure_value::double ELSE NULL END) as avg_mem_cached_1h, MIN(CASE WHEN measure_name = 'memory_free' THEN measure_value::double ELSE NULL END) as min_mem_free_1h, MAX(CASE WHEN measure_name = 'memory_used' THEN measure_value::double ELSE NULL END) as max_mem_used_1h, SUM(CASE WHEN measure_name = 'disk_io_writes' THEN measure_value::double ELSE NULL END) as total_disk_io_writes_1h, AVG(CASE WHEN measure_name = 'disk_used' THEN measure_value::double ELSE NULL END) as avg_disk_used_1h, AVG(CASE WHEN measure_name = 'disk_free' THEN measure_value::double ELSE NULL END) as avg_disk_free_1h, MAX(CASE WHEN measure_name = 'cpu_user' THEN measure_value::double ELSE NULL END) as max_cpu_user_1h, MAX(CASE WHEN measure_name = 'cpu_system' THEN measure_value::double ELSE NULL END) as max_cpu_system_1h, MIN(CASE WHEN measure_name = 'cpu_idle' THEN measure_value::double ELSE NULL END) as min_cpu_idle_1h FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 14h AND bin(@scheduled_runtime, 1h) - 2h AND measure_name IN ('memory_cached', 'memory_free', 'memory_used', 'disk_io_writes', 'disk_used', 'disk_free', 'cpu_user', 'cpu_system', 'cpu_idle') GROUP BY region, bin(time, 1h)",
    "ScheduleConfiguration" : {
        "ScheduleExpression" : "cron(0 0/1 * * ? *)"
    },
    "NotificationConfiguration" : {
        "SnsConfiguration" : {
            "TopicArn" : "******"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******",
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName" : "derived",
            "TableName" : "dashboard_metrics_1h_agg_3",
            "TimeColumn" : "hour",
            "DimensionMappings" : [
                {
                    "Name": "region",
                    "DimensionValueType" : "VARCHAR"
                }
            ],
            "MixedMeasureMappings" : [
                {
                    "MeasureValueType" : "MULTI",
                    "TargetMeasureName" : "mem_aggregates",
                    "MultiMeasureAttributeMappings" : [
                        {
                            "SourceColumn" : "avg_mem_cached_1h",
                            "MeasureValueType" : "DOUBLE",
                            "TargetMultiMeasureAttributeName": "avgMemCached"
                        },
                        {
                            "SourceColumn" : "min_mem_free_1h",
                            "MeasureValueType" : "DOUBLE"
                        },
                        {
                            "SourceColumn" : "max_mem_used_1h",
                            "MeasureValueType" : "DOUBLE",
                            "TargetMultiMeasureAttributeName": "maxMemUsed"
                        }
                    ]
                },
                {
                    "MeasureValueType" : "MULTI",
                    "TargetMeasureName" : "disk_aggregates",
                    "MultiMeasureAttributeMappings" : [
                        {
                            "SourceColumn" : "total_disk_io_writes_1h",
                            "MeasureValueType" : "DOUBLE",
                            "TargetMultiMeasureAttributeName": "totalIOWrites"
                        },
                        {
                            "SourceColumn" : "avg_disk_used_1h",
                            "MeasureValueType" : "DOUBLE"
                        },
                        {
                            "SourceColumn" : "avg_disk_free_1h",
                            "MeasureValueType" : "DOUBLE"
                        }
                    ]
                },
                {
                    "MeasureValueType" : "MULTI",
                    "TargetMeasureName" : "cpu_aggregates",
                    "MultiMeasureAttributeMappings" : [
                        {
                            "SourceColumn" : "max_cpu_user_1h",
                            "MeasureValueType" : "DOUBLE"
                        },
                        {
                            "SourceColumn" : "max_cpu_system_1h",
                            "MeasureValueType" : "DOUBLE"
                        },
                        {
                            "SourceColumn" : "min_cpu_idle_1h",
                            "MeasureValueType" : "DOUBLE",
                            "TargetMultiMeasureAttributeName": "minCpuIdle"
                        }
                    ]
                }
            ]
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    }
}
```

The following is the schema for the destination table once this scheduled query is materialized.


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
|  region  |  varchar  |  DIMENSION  | 
|  measure\_name  |  varchar  |  MEASURE\_NAME  | 
|  time  |  timestamp  |  TIMESTAMP  | 
|  minCpuIdle  |  double  |  MULTI  | 
|  max\_cpu\_system\_1h  |  double  |  MULTI  | 
|  max\_cpu\_user\_1h  |  double  |  MULTI  | 
|  avgMemCached  |  double  |  MULTI  | 
|  maxMemUsed  |  double  |  MULTI  | 
|  min\_mem\_free\_1h  |  double  |  MULTI  | 
|  avg\_disk\_free\_1h  |  double  |  MULTI  | 
|  avg\_disk\_used\_1h  |  double  |  MULTI  | 
|  totalIOWrites  |  double  |  MULTI  | 

The following are the corresponding measures obtained with a SHOW MEASURES query.


| measure\_name | data\_type | Dimensions | 
| --- | --- | --- | 
|  cpu\_aggregates  |  multi  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  disk\_aggregates  |  multi  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  mem\_aggregates  |  multi  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
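
The NULL-skipping behavior described earlier can be sketched in a few lines. This is a hypothetical helper for illustration only, not a Timestream API; the grouping mirrors the MixedMeasureMappings structure above.

```python
# Sketch of how Timestream for LiveAnalytics skips NULL values when
# materializing multi-measure records (hypothetical helper, not an AWS API).

def build_multi_measure_records(row, mappings):
    """Return the multi-measure records produced for one query-result row."""
    records = []
    for group in mappings:
        # Keep only attributes whose column value is non-NULL in this row.
        attrs = {col: row[col] for col in group["columns"] if row.get(col) is not None}
        if not attrs:
            # Every column in the group is NULL: no record for this measure name.
            continue
        records.append({"measure_name": group["target"], "attributes": attrs})
    return records

mappings = [
    {"target": "mem_aggregates",
     "columns": ["avg_mem_cached_1h", "min_mem_free_1h", "max_mem_used_1h"]},
    {"target": "cpu_aggregates",
     "columns": ["max_cpu_user_1h", "max_cpu_system_1h", "min_cpu_idle_1h"]},
]

row = {"avg_mem_cached_1h": 512.0, "min_mem_free_1h": None, "max_mem_used_1h": 2048.0,
       "max_cpu_user_1h": None, "max_cpu_system_1h": None, "min_cpu_idle_1h": None}

records = build_multi_measure_records(row, mappings)
# mem_aggregates keeps two attributes; cpu_aggregates is skipped entirely.
```

For this row, only the mem\_aggregates record is produced, with the NULL attribute dropped.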

## Example: Mapping results to single-measure records with measure name from query results
<a name="scheduledqueries-mappings-mappintresultstosinglemeasurerecords"></a>

The following is an example of a scheduled query whose results are materialized into single-measure records. In this example, the query result has a measure\_name column whose values will be used as measure names in the target table. You use the MixedMeasureMappings attribute in TargetConfiguration.TimestreamConfiguration to specify the mapping of the query result columns to the scalar measures in the target table. 

In the following example definition, the query result is expected to have nine distinct measure\_name values. You list all of these measure names in the mapping and specify which column to use as the single-measure value for each measure name. For example, in this mapping, if the measure name memory\_cached is seen for a given result row, then the value in the avg\_1h column is used as the value for the measure when the data is written to the target table. You can optionally use TargetMeasureName to provide a new measure name for this value. 

```
{
    "Name" : "UsingMeasureNameColumnForSingleMeasureMapping",
    "QueryString" : "SELECT region, bin(time, 1h) as hour, measure_name, AVG(CASE WHEN measure_name IN ('memory_cached', 'disk_used', 'disk_free') THEN measure_value::double ELSE NULL END) as avg_1h, MIN(CASE WHEN measure_name IN ('memory_free', 'cpu_idle') THEN measure_value::double ELSE NULL END) as min_1h, SUM(CASE WHEN measure_name IN ('disk_io_writes') THEN measure_value::double ELSE NULL END) as sum_1h, MAX(CASE WHEN measure_name IN ('memory_used', 'cpu_user', 'cpu_system') THEN measure_value::double ELSE NULL END) as max_1h FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 14h AND bin(@scheduled_runtime, 1h) - 2h AND measure_name IN ('memory_free', 'memory_used', 'memory_cached', 'disk_io_writes', 'disk_used', 'disk_free', 'cpu_user', 'cpu_system', 'cpu_idle') GROUP BY region, bin(time, 1h), measure_name",
    "ScheduleConfiguration" : {
        "ScheduleExpression" : "cron(0 0/1 * * ? *)"
    },
    "NotificationConfiguration" : {
        "SnsConfiguration" : {
            "TopicArn" : "******"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******",
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName" : "derived",
            "TableName" : "dashboard_metrics_1h_agg_4",
            "TimeColumn" : "hour",
            "DimensionMappings" : [
                {
                    "Name": "region",
                    "DimensionValueType" : "VARCHAR"
                }
            ],
            "MeasureNameColumn" : "measure_name",
            "MixedMeasureMappings" : [
                {
                    "MeasureName" : "memory_cached",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "avg_1h",
                    "TargetMeasureName" : "AvgMemCached"
                },
                {
                    "MeasureName" : "disk_used",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "avg_1h"
                },
                {
                    "MeasureName" : "disk_free",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "avg_1h"
                },
                {
                    "MeasureName" : "memory_free",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "min_1h",
                    "TargetMeasureName" : "MinMemFree"
                },
                {
                    "MeasureName" : "cpu_idle",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "min_1h"
                },
                {
                    "MeasureName" : "disk_io_writes",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "sum_1h",
                    "TargetMeasureName" : "total-disk-io-writes"
                },
                {
                    "MeasureName" : "memory_used",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "max_1h",
                    "TargetMeasureName" : "maxMemUsed"
                },
                {
                    "MeasureName" : "cpu_user",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "max_1h"
                },
                {
                    "MeasureName" : "cpu_system",
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "max_1h"
                }
            ]
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    }
}
```

The following is the schema for the destination table once this scheduled query is materialized. As you can see from the schema, the table uses single-measure records. If you list the measure schema for the table, you will see the nine measures written based on the mapping provided in the specification.


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
|  region  |  varchar  |  DIMENSION  | 
|  measure\_name  |  varchar  |  MEASURE\_NAME  | 
|  time  |  timestamp  |  TIMESTAMP  | 
|  measure\_value::double  |  double  |  MEASURE\_VALUE  | 

The following are the corresponding measures obtained with a SHOW MEASURES query.


| measure\_name | data\_type | Dimensions | 
| --- | --- | --- | 
|  AvgMemCached  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  MinMemFree  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  cpu\_idle  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  cpu\_system  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  cpu\_user  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  disk\_free  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  disk\_used  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  maxMemUsed  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  total-disk-io-writes  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
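
The resolution logic for this kind of mapping can be sketched as follows. This is a hypothetical helper for illustration; the field names mirror the MixedMeasureMappings specification above, but it is not an AWS API.

```python
# Sketch: resolve one query-result row into a single-measure record using a
# MeasureNameColumn plus MixedMeasureMappings (hypothetical helper).

def resolve_single_measure(row, measure_name_column, mappings):
    name = row[measure_name_column]
    for m in mappings:
        if m["MeasureName"] == name:
            return {
                # TargetMeasureName optionally renames the measure in the target table.
                "measure_name": m.get("TargetMeasureName", name),
                "measure_value": row[m["SourceColumn"]],
            }
    return None  # No mapping for this measure name: the row is not ingested.

mappings = [
    {"MeasureName": "memory_cached", "SourceColumn": "avg_1h",
     "TargetMeasureName": "AvgMemCached"},
    {"MeasureName": "cpu_idle", "SourceColumn": "min_1h"},
]

rec = resolve_single_measure(
    {"measure_name": "memory_cached", "avg_1h": 123.4, "min_1h": None},
    "measure_name", mappings)
```

When no TargetMeasureName is given, the source measure name is kept as-is, matching the cpu\_idle, cpu\_system, and similar rows in the table above.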

## Example: Mapping results to single-measure records with query result columns as measure names
<a name="scheduledqueries-mappings-mappintresultstsolumnsasmeasurename"></a>

In this example, you have a query whose results do not have a measure name column. Instead, you want to use the query result column names as the measure names when mapping the output to single-measure records. Earlier there was an example where a similar result was written to a multi-measure record. In this example, you will see how to map it to single-measure records if that fits your application scenario. 

Again, you specify this mapping using the MixedMeasureMappings property in TargetConfiguration.TimestreamConfiguration. In the following example, the query result has nine measure columns in addition to the dimension and time columns. You use the result column names as measure names and the column values as the single-measure values. 

For example, for a given row in the query result, the column name avg\_mem\_cached\_1h is used as the measure name, and the value in that column is used as the measure value for the single-measure record. You can also use TargetMeasureName to use a different measure name in the target table. For instance, for values in the column sum\_1h, the mapping specifies total\_disk\_io\_writes\_1h as the measure name in the target table. If any column's value is NULL, then the corresponding measure is ignored. 

```
{
    "Name" : "SingleMeasureMappingWithoutMeasureNameColumnInQueryResult",
    "QueryString" : "SELECT region, bin(time, 1h) as hour, AVG(CASE WHEN measure_name = 'memory_cached' THEN measure_value::double ELSE NULL END) as avg_mem_cached_1h, AVG(CASE WHEN measure_name = 'disk_used' THEN measure_value::double ELSE NULL END) as avg_disk_used_1h, AVG(CASE WHEN measure_name = 'disk_free' THEN measure_value::double ELSE NULL END) as avg_disk_free_1h, MIN(CASE WHEN measure_name = 'memory_free' THEN measure_value::double ELSE NULL END) as min_mem_free_1h, MIN(CASE WHEN measure_name = 'cpu_idle' THEN measure_value::double ELSE NULL END) as min_cpu_idle_1h, SUM(CASE WHEN measure_name = 'disk_io_writes' THEN measure_value::double ELSE NULL END) as sum_1h, MAX(CASE WHEN measure_name = 'memory_used' THEN measure_value::double ELSE NULL END) as max_mem_used_1h, MAX(CASE WHEN measure_name = 'cpu_user' THEN measure_value::double ELSE NULL END) as max_cpu_user_1h, MAX(CASE WHEN measure_name = 'cpu_system' THEN measure_value::double ELSE NULL END) as max_cpu_system_1h FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 14h AND bin(@scheduled_runtime, 1h) - 2h AND measure_name IN ('memory_free', 'memory_used', 'memory_cached', 'disk_io_writes', 'disk_used', 'disk_free', 'cpu_user', 'cpu_system', 'cpu_idle') GROUP BY region, bin(time, 1h)",
    "ScheduleConfiguration" : {
        "ScheduleExpression" : "cron(0 0/1 * * ? *)"
    },
    "NotificationConfiguration" : {
        "SnsConfiguration" : {
            "TopicArn" : "******"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******",
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName" : "derived",
            "TableName" : "dashboard_metrics_1h_agg_5",
            "TimeColumn" : "hour",
            "DimensionMappings" : [
                {
                    "Name": "region",
                    "DimensionValueType" : "VARCHAR"
                }
            ],
            "MixedMeasureMappings" : [
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "avg_mem_cached_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "avg_disk_used_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "avg_disk_free_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "min_mem_free_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "min_cpu_idle_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "sum_1h",
                    "TargetMeasureName" : "total_disk_io_writes_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "max_mem_used_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "max_cpu_user_1h"
                },
                {
                    "MeasureValueType" : "DOUBLE",
                    "SourceColumn" : "max_cpu_system_1h"
                }
            ]
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    }
}
```

The following is the schema for the destination table once this scheduled query is materialized. As you can see, the target table stores records with single-measure values of type double. Similarly, the measure schema for the table shows the nine measure names. Also notice that the measure name total\_disk\_io\_writes\_1h is present, since the mapping renamed sum\_1h to total\_disk\_io\_writes\_1h.


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
|  region  |  varchar  |  DIMENSION  | 
|  measure\_name  |  varchar  |  MEASURE\_NAME  | 
|  time  |  timestamp  |  TIMESTAMP  | 
|  measure\_value::double  |  double  |  MEASURE\_VALUE  | 

The following are the corresponding measures obtained with a SHOW MEASURES query.


| measure\_name | data\_type | Dimensions | 
| --- | --- | --- | 
|  avg\_disk\_free\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  avg\_disk\_used\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  avg\_mem\_cached\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  max\_cpu\_system\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  max\_cpu\_user\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  max\_mem\_used\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  min\_cpu\_idle\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  min\_mem\_free\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 
|  total\_disk\_io\_writes\_1h  |  double  |  [{'dimension\_name': 'region', 'data\_type': 'varchar'}]  | 

# Scheduled query notification messages
<a name="scheduledqueries-notification"></a>

This section describes the messages sent by Timestream for LiveAnalytics when creating, deleting, running, or updating the state of a scheduled query. 


| Notification message name | Structure | Description | 
| --- | --- | --- | 
|  CreatingNotificationMessage  |  <pre>CreatingNotificationMessage {<br />    String arn;<br />    NotificationType type;<br />}</pre>  |  This notification message is sent before sending the response for `CreateScheduledQuery`. The scheduled query is enabled after sending this notification.  *arn* - The ARN of the scheduled query that is being created. *type* - SCHEDULED\_QUERY\_CREATING  | 
|  UpdateNotificationMessage  |  <pre> UpdateNotificationMessage {<br />    String arn;<br />    NotificationType type;<br />    QueryState state;<br />}</pre>  |  This notification message is sent when a scheduled query is updated. Timestream for LiveAnalytics can disable the scheduled query automatically when a non-recoverable error is encountered, such as: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/timestream/latest/developerguide/scheduledqueries-notification.html) *arn* - The ARN of the scheduled query that is being updated. *type* - SCHEDULED\_QUERY\_UPDATE *state* - ENABLED or DISABLED  | 
|  DeleteNotificationMessage  |  <pre>DeletionNotificationMessage {<br />    String arn;<br />    NotificationType type;<br />}</pre>  |  This notification message is sent when a scheduled query has been deleted.  *arn* - The ARN of the scheduled query that is being deleted. *type* - SCHEDULED\_QUERY\_DELETED  | 
|  SuccessNotificationMessage  |  <pre>SuccessNotificationMessage {<br />    NotificationType type;<br />    String arn;<br />    Date nextInvocationEpochSecond;<br />    ScheduledQueryRunSummary runSummary;<br />}<br /><br />ScheduledQueryRunSummary {<br />    Date invocationTime;<br />    Date triggerTime;<br />    String runStatus;<br />    ExecutionStats executionstats;<br />    ErrorReportLocation errorReportLocation;<br />    String failureReason;<br />}<br /><br /><br />ExecutionStats {<br />    Long bytesMetered;<br />    Long dataWrites;<br />    Long queryResultRows;<br />    Long recordsIngested;<br />    Long executionTimeInMillis;<br />}<br /><br /><br />ErrorReportLocation {<br />    S3ReportLocation s3ReportLocation;<br />}<br /><br /><br />S3ReportLocation {<br />    String bucketName;<br />    String objectKey;<br />}</pre>  |  This notification message is sent after the scheduled query is run and the results are successfully ingested.  *arn* - The ARN of the scheduled query that was run. *type* - AUTO\_TRIGGER\_SUCCESS or MANUAL\_TRIGGER\_SUCCESS. *nextInvocationEpochSecond* - The next time Timestream for LiveAnalytics will run the scheduled query. *runSummary* - Information about the scheduled query run.  | 
|  FailureNotificationMessage  |  <pre>FailureNotificationMessage {<br />    NotificationType type;<br />    String arn;<br />    ScheduledQueryRunSummary runSummary;<br />}<br /><br />ScheduledQueryRunSummary {<br />    Date invocationTime;<br />    Date triggerTime;<br />    String runStatus;<br />    ExecutionStats executionstats;<br />    ErrorReportLocation errorReportLocation;<br />    String failureReason;<br />}<br /><br /><br />ExecutionStats {<br />    Long bytesMetered;<br />    Long dataWrites;<br />    Long queryResultRows;<br />    Long recordsIngested;<br />    Long executionTimeInMillis;<br />}<br /><br /><br />ErrorReportLocation {<br />    S3ReportLocation s3ReportLocation;<br />}<br /><br /><br />S3ReportLocation {<br />    String bucketName;<br />    String objectKey;<br />}</pre>  |  This notification message is sent when a failure is encountered during a scheduled query run or when ingesting the query results.  *arn* - The ARN of the scheduled query that is being run. *type* - AUTO\_TRIGGER\_FAILURE or MANUAL\_TRIGGER\_FAILURE. *runSummary* - Information about the scheduled query run.  | 
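
A subscriber to the SNS topic can dispatch on the `type` field of these messages. The following is a minimal sketch; the type values and field names follow the structures above, but the dispatch helper itself is an illustrative assumption, not part of any AWS SDK.

```python
import json

# Sketch: dispatch incoming scheduled-query notifications by their type field
# (hypothetical consumer-side helper).
def classify_notification(message_body):
    msg = json.loads(message_body)
    t = msg["type"]
    if t in ("AUTO_TRIGGER_SUCCESS", "MANUAL_TRIGGER_SUCCESS"):
        return ("success", msg["runSummary"])
    if t in ("AUTO_TRIGGER_FAILURE", "MANUAL_TRIGGER_FAILURE"):
        return ("failure", msg["runSummary"])
    if t == "SCHEDULED_QUERY_UPDATE":
        return ("update", msg["state"])
    # SCHEDULED_QUERY_CREATING / SCHEDULED_QUERY_DELETED lifecycle messages.
    return ("lifecycle", msg["arn"])

body = json.dumps({"type": "SCHEDULED_QUERY_UPDATE",
                   "arn": "arn:aws:timestream:us-east-1:000000000000:scheduled-query/test",
                   "state": "DISABLED"})
kind, detail = classify_notification(body)
# An update to DISABLED is how a consumer learns a query was auto-disabled.
```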

# Scheduled query error reports
<a name="scheduledqueries-errorreport"></a>

This section describes the location, format, and reasons for error reports that Timestream for LiveAnalytics generates when errors are encountered while running scheduled queries.

**Topics**
+ [Reasons](#scheduled-queries-error-report-reasons)
+ [Location](#scheduled-queries-error-report-location)
+ [Format](#scheduled-queries-error-report-format)
+ [Error types](#scheduled-queries-error-report-error-types)
+ [Example](#scheduled-queries-error-report-example)

## Scheduled query error reports reasons
<a name="scheduled-queries-error-report-reasons"></a>

Error reports are generated for recoverable errors; they are not generated for non-recoverable errors. Timestream for LiveAnalytics can automatically disable a scheduled query when a non-recoverable error is encountered. Non-recoverable errors include:
+ `AssumeRole` failure
+ Any 4xx errors encountered when communicating with KMS when a customer-managed KMS key is specified
+ Any 4xx errors encountered when a scheduled query runs
+ Any 4xx errors encountered during ingestion of query results

For non-recoverable errors, Timestream for LiveAnalytics sends a failure notification with a non-recoverable error message. An update notification is also sent which indicates that the scheduled query is disabled.

## Scheduled query error reports location
<a name="scheduled-queries-error-report-location"></a>

A scheduled query error report location has the following naming convention:

```
s3://customer-bucket/customer-prefix/
```

Following is an example scheduled query ARN:

```
arn:aws:timestream:us-east-1:000000000000:scheduled-query/test-query-hd734tegrgfd
```

Error reports for this scheduled query are written under the following prefix:

```
s3://customer-bucket/customer-prefix/test-query-hd734tegrgfd/<InvocationTime>/<Auto or Manual>/<Actual Trigger Time>
```

*Auto* indicates scheduled query runs automatically triggered by Timestream for LiveAnalytics, and *Manual* indicates runs manually triggered by a user via the `ExecuteScheduledQuery` API action in Amazon Timestream for LiveAnalytics Query. For more information about `ExecuteScheduledQuery`, see [ExecuteScheduledQuery](https://docs.aws.amazon.com/timestream/latest/developerguide/API_query_ExecuteScheduledQuery.html).
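
The report prefix for a given run can be assembled from the scheduled query ARN and the run metadata. The following is a sketch under the naming convention above; the helper and the timestamp formatting are illustrative assumptions.

```python
# Sketch: build the S3 prefix where error reports for one run are written
# (hypothetical helper; path layout follows the convention shown above).

def error_report_prefix(bucket, prefix, scheduled_query_arn,
                        invocation_time, trigger_kind, trigger_time):
    # The last path segment of the ARN names the scheduled query.
    query_name = scheduled_query_arn.rsplit("/", 1)[-1]
    return (f"s3://{bucket}/{prefix}/{query_name}/"
            f"{invocation_time}/{trigger_kind}/{trigger_time}")

uri = error_report_prefix(
    "customer-bucket", "customer-prefix",
    "arn:aws:timestream:us-east-1:000000000000:scheduled-query/test-query-hd734tegrgfd",
    "2021-11-12T15:00:00Z", "Auto", "2021-11-12T15:00:03Z")
```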

## Scheduled query error reports format
<a name="scheduled-queries-error-report-format"></a>

The error reports have the following JSON format:

```
{
    "reportId": <String>,            // A unique string ID for all error reports belonging to a particular scheduled query run
    "errors": [ <Error>, ... ]       // One or more errors
}
```

## Scheduled query error types
<a name="scheduled-queries-error-report-error-types"></a>

The `Error` object can be one of three types: 
+ Records Ingestion Errors

  ```
  {
      "reason": <String>,              // The error message String
      "records": [ <Record>, ... ]     // One or more rejected records
  }
  ```
+ Row Parse and Validation Errors

  ```
  {
      "reason": <String>,        // The error message String
      "rawLine": <String>        // [Optional] The raw line String that was being parsed into record(s) to be ingested. This line encountered the above-mentioned parse error.
  }
  ```
+ General Errors

  ```
  {
      "reason": <String>         // The error message
  }
  ```
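
Because each type carries a distinguishing key, a report consumer can tell the three shapes apart without a schema. The following is a minimal sketch; the helper is hypothetical and not part of any AWS SDK.

```python
# Sketch: classify an Error object from an error report by the keys it carries
# (hypothetical helper; the three shapes follow the definitions above).

def classify_error(error):
    if "records" in error:
        return "records_ingestion"
    if "rawLine" in error:
        return "row_parse_validation"
    return "general"

kinds = [classify_error(e) for e in (
    {"reason": "rejected", "records": [{}]},
    {"reason": "bad line", "rawLine": "..."},
    {"reason": "oops"},
)]
```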

## Scheduled query error reports example
<a name="scheduled-queries-error-report-example"></a>

The following is an example of an error report that was produced due to ingestion errors. 

```
{
    "reportId": "C9494AABE012D1FBC162A67EA2C18255",
    "errors": [
        {
            "reason": "The record timestamp is outside the time range [2021-11-12T14:18:13.354Z, 2021-11-12T16:58:13.354Z) of the memory store.",
            "records": [
                {
                    "dimensions": [
                        {
                            "name": "dim0",
                            "value": "d0_1",
                            "dimensionValueType": null
                        },
                        {
                            "name": "dim1",
                            "value": "d1_1",
                            "dimensionValueType": null
                        }
                    ],
                    "measureName": "random_measure_value",
                    "measureValue": "3.141592653589793",
                    "measureValues": null,
                    "measureValueType": "DOUBLE",
                    "time": "1637166175635000000",
                    "timeUnit": "NANOSECONDS",
                    "version": null
                },
                {
                    "dimensions": [
                        {
                            "name": "dim0",
                            "value": "d0_2",
                            "dimensionValueType": null
                        },
                        {
                            "name": "dim1",
                            "value": "d1_2",
                            "dimensionValueType": null
                        }
                    ],
                    "measureName": "random_measure_value",
                    "measureValue": "6.283185307179586",
                    "measureValues": null,
                    "measureValueType": "DOUBLE",
                    "time": "1637166175636000000",
                    "timeUnit": "NANOSECONDS",
                    "version": null
                },
                {
                    "dimensions": [
                        {
                            "name": "dim0",
                            "value": "d0_3",
                            "dimensionValueType": null
                        },
                        {
                            "name": "dim1",
                            "value": "d1_3",
                            "dimensionValueType": null
                        }
                    ],
                    "measureName": "random_measure_value",
                    "measureValue": "9.42477796076938",
                    "measureValues": null,
                    "measureValueType": "DOUBLE",
                    "time": "1637166175637000000",
                    "timeUnit": "NANOSECONDS",
                    "version": null
                },
                {
                    "dimensions": [
                        {
                            "name": "dim0",
                            "value": "d0_4",
                            "dimensionValueType": null
                        },
                        {
                            "name": "dim1",
                            "value": "d1_4",
                            "dimensionValueType": null
                        }
                    ],
                    "measureName": "random_measure_value",
                    "measureValue": "12.566370614359172",
                    "measureValues": null,
                    "measureValueType": "DOUBLE",
                    "time": "1637166175638000000",
                    "timeUnit": "NANOSECONDS",
                    "version": null
                }
            ]
        }
    ]
}
```

# Scheduled query patterns and examples
<a name="scheduledqueries-examplesandpatterns"></a>

This section describes the usage patterns for scheduled queries as well as end-to-end examples.

**Topics**
+ [Sample schema](scheduledqueries-common-schema-example.md)
+ [Patterns](scheduledqueries-patterns.md)
+ [Examples](scheduledqueries-examples.md)

# Scheduled queries sample schema
<a name="scheduledqueries-common-schema-example"></a>

In this example, we use a sample application that mimics a DevOps scenario: monitoring metrics from a large fleet of servers. Users want to alert on anomalous resource usage, create dashboards on aggregate fleet behavior and utilization, and perform sophisticated analysis on recent and historical data to find correlations. The following diagram illustrates the setup: a set of monitored instances emits metrics to Timestream for LiveAnalytics, while a set of concurrent users issues queries for alerts, dashboards, or ad-hoc analysis, so queries and ingestion run in parallel. 

![\[Diagram showing data flow from servers to Timestream database, with users querying for analytics.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/schedquery_common_schema_example.png)


The application being monitored is modeled as a highly scaled-out service that is deployed in several regions across the globe. Each region is further subdivided into a number of scaling units called cells, which provide a level of infrastructure isolation within the region. Each cell is further subdivided into silos, which represent a level of software isolation. Each silo has five microservices that comprise one isolated instance of the service. Each microservice has several servers with different instance types and OS versions, which are deployed across three availability zones. These attributes that identify the servers emitting the metrics are modeled as [dimensions](https://docs.aws.amazon.com/timestream/latest/developerguide/concepts.html) in Timestream for LiveAnalytics. In this architecture, we have a hierarchy of dimensions (such as region, cell, silo, and microservice\_name) and other dimensions that cut across the hierarchy (such as instance\_type and availability\_zone). 

The application emits a variety of metrics (such as cpu\_user and memory\_free) and events (such as task\_completed and gc\_reclaimed). Each metric or event is associated with eight dimensions (such as region or cell) that uniquely identify the server emitting it. The 20 metrics are stored together in one multi-measure record with measure name metrics, and the 5 events are stored together in another multi-measure record with measure name events. The data model, schema, and data generation can be found in the [open-sourced data generator](https://github.com/awslabs/amazon-timestream-tools/tree/mainline/tools/python/perf-scale-workload). In addition to the schema and data distributions, the data generator provides an example of using multiple writers to ingest data in parallel, using the ingestion scaling of Timestream for LiveAnalytics to ingest millions of measurements per second. Below we show the schema (table and measure schema) and some sample data from the data set.

**Topics**
+ [Multi-measure records](#scheduledqueries-common-schema-example-mmr)
+ [Single-measure records](#scheduledqueries-common-schema-example-smr)

## Multi-measure records
<a name="scheduledqueries-common-schema-example-mmr"></a>

**Table Schema**

Below is the table schema once the data is ingested using multi-measure records. It is the output of the DESCRIBE query. Assuming the data is ingested into a database raw\_data and table devops, below is the query.

```
DESCRIBE "raw_data"."devops"
```


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
| availability\_zone | varchar | DIMENSION | 
| microservice\_name | varchar | DIMENSION | 
| instance\_name | varchar | DIMENSION | 
| process\_name | varchar | DIMENSION | 
| os\_version | varchar | DIMENSION | 
| jdk\_version | varchar | DIMENSION | 
| cell | varchar | DIMENSION | 
| region | varchar | DIMENSION | 
| silo | varchar | DIMENSION | 
| instance\_type | varchar | DIMENSION | 
| measure\_name | varchar | MEASURE\_NAME | 
| time | timestamp | TIMESTAMP | 
| memory\_free | double | MULTI | 
| cpu\_steal | double | MULTI | 
| cpu\_iowait | double | MULTI | 
| cpu\_user | double | MULTI | 
| memory\_cached | double | MULTI | 
| disk\_io\_reads | double | MULTI | 
| cpu\_hi | double | MULTI | 
| latency\_per\_read | double | MULTI | 
| network\_bytes\_out | double | MULTI | 
| cpu\_idle | double | MULTI | 
| disk\_free | double | MULTI | 
| memory\_used | double | MULTI | 
| cpu\_system | double | MULTI | 
| file\_descriptors\_in\_use | double | MULTI | 
| disk\_used | double | MULTI | 
| cpu\_nice | double | MULTI | 
| disk\_io\_writes | double | MULTI | 
| cpu\_si | double | MULTI | 
| latency\_per\_write | double | MULTI | 
| network\_bytes\_in | double | MULTI | 
| task\_end\_state | varchar | MULTI | 
| gc\_pause | double | MULTI | 
| task\_completed | bigint | MULTI | 
| gc\_reclaimed | double | MULTI | 
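The MULTI columns in the schema above come from multi-measure records. As an illustrative sketch (the dimension and measure values are hypothetical, and only five of the eight dimensions and two of the twenty metrics are shown), a single record in a WriteRecords request that yields this schema could look like the following:

```
{
    "Dimensions": [
        { "Name": "region", "Value": "us-east-2" },
        { "Name": "cell", "Value": "us-east-2-cell-1" },
        { "Name": "silo", "Value": "us-east-2-cell-1-silo-2" },
        { "Name": "availability_zone", "Value": "us-east-2-1" },
        { "Name": "microservice_name", "Value": "athena" }
    ],
    "MeasureName": "metrics",
    "MeasureValueType": "MULTI",
    "MeasureValues": [
        { "Name": "cpu_user", "Value": "62.8", "Type": "DOUBLE" },
        { "Name": "memory_used", "Value": "57.6", "Type": "DOUBLE" }
    ],
    "Time": "1636720980000",
    "TimeUnit": "MILLISECONDS"
}
```

Each such record stores all of its MeasureValues as one row, which is what produces the MULTI attribute type for those columns.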

**Measure Schema**

Below is the measure schema returned by the SHOW MEASURES query.

```
SHOW MEASURES FROM "raw_data"."devops"
```


| measure\_name | data\_type | Dimensions (all varchar) | 
| --- | --- | --- | 
| events | multi | availability\_zone, microservice\_name, instance\_name, process\_name, jdk\_version, cell, region, silo | 
| metrics | multi | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 

**Example Data**


| region | cell | silo | availability\_zone | microservice\_name | instance\_name | instance\_type | os\_version | process\_name | jdk\_version | measure\_name | time | cpu\_user | cpu\_system | cpu\_idle | cpu\_steal | cpu\_iowait | cpu\_nice | cpu\_hi | cpu\_si | memory\_used | memory\_cached | disk\_io\_reads | latency\_per\_read | disk\_io\_writes | latency\_per\_write | disk\_used | disk\_free | network\_bytes\_in | network\_bytes\_out | file\_descriptors\_in\_use | memory\_free | task\_end\_state | gc\_pause | task\_completed | gc\_reclaimed | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com | m5.8xlarge | AL2012 |  |  | metrics | 11/12/2021 12:43 | 62.8 | 0.408 | 34.2 | 0.972 | 0.0877 | 0.103 | 0.567 | 0.844 | 57.6 | 88.9 | 52.6 | 91.9 | 31.7 | 2.25 | 63.5 | 29.2 | 85.3 | 49.8 | 32.3 | 57.6 |  |  |  |  | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com | m5.8xlarge | AL2012 |  |  | metrics | 11/12/2021 12:41 | 56 | 0.923 | 39.9 | 0.799 | 0.532 | 0.655 | 0.851 | 0.317 | 90.5 | 31.9 | 56.6 | 37.1 | 25 | 93.3 | 52.2 | 33.1 | 7.14 | 53.7 | 65.9 | 20.4 |  |  |  |  | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com | m5.8xlarge | AL2012 |  |  | metrics | 11/12/2021 12:39 | 48.5 | 0.801 | 48.2 | 0.18 | 0.943 | 0.0316 | 0.844 | 0.54 | 97.4 | 41.4 | 55.1 | 32.7 | 86.2 | 33.7 | 72.7 | 61.5 | 80.8 | 5.15 | 44.3 | 8.5 |  |  |  |  | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com | m5.8xlarge | AL2012 |  |  | metrics | 11/12/2021 12:38 | 37.5 | 0.723 | 58.8 | 0.317 | 0.608 | 0.859 | 0.791 | 0.393 | 4.84 | 78.9 | 20.3 | 41.4 | 46.8 | 3.87 | 84.6 | 60.6 | 21.1 | 11.8 | 2.76 | 10 |  |  |  |  | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com | m5.8xlarge | AL2012 |  |  | metrics | 11/12/2021 12:36 | 58 | 0.786 | 38.7 | 0.219 | 0.436 | 0.829 | 0.331 | 0.734 | 51 | 36.8 | 81.8 | 50.5 | 77.9 | 17.8 | 82.3 | 64 | 7.69 | 66.5 | 56.2 | 31.3 |  |  |  |  | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com |  |  | host\_manager | JDK\_8 | events | 11/12/2021 12:43 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 75.8 | SUCCESS\_WITH\_NO\_RESULT | 85.5 | 348 | 64.8 | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com |  |  | host\_manager | JDK\_8 | events | 11/12/2021 12:41 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 7.47 | SUCCESS\_WITH\_RESULT | 22.8 | 42 | 7.45 | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com |  |  | host\_manager | JDK\_8 | events | 11/12/2021 12:39 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 64.1 | SUCCESS\_WITH\_RESULT | 6.77 | 249 | 72.3 | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com |  |  | host\_manager | JDK\_8 | events | 11/12/2021 12:38 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 23 | SUCCESS\_WITH\_RESULT | 53.3 | 138 | 99 | 
| us-east-2 | us-east-2-cell-1 | us-east-2-cell-1-silo-2 | us-east-2-1 | athena | i-zaZswmJk-athena-us-east-2-cell-1-silo-2-00000216.amazonaws.com |  |  | host\_manager | JDK\_8 | events | 11/12/2021 12:36 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 39.4 | SUCCESS\_WITH\_NO\_RESULT | 79.6 | 254 | 82.9 | 
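Dashboards and scheduled queries typically compute aggregates over this table rather than reading raw rows. As a sketch (the database, table, and column names follow the schema above; the specific aggregates and time range are illustrative), the following query computes hourly CPU and memory rollups per microservice from the multi-measure records:

```
SELECT region, cell, silo, microservice_name,
    bin(time, 1h) AS hour,
    AVG(cpu_user) AS avg_cpu_user,
    MAX(memory_used) AS max_memory_used
FROM "raw_data"."devops"
WHERE measure_name = 'metrics'
    AND time > ago(1d)
GROUP BY region, cell, silo, microservice_name, bin(time, 1h)
```

A query of this shape is a natural candidate for a scheduled query, with the results written to a smaller aggregate table.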

## Single-measure records
<a name="scheduledqueries-common-schema-example-smr"></a>

Timestream for LiveAnalytics also allows you to ingest data with one measure per time series record. Below are the schema details when the data is ingested using single-measure records.

**Table Schema**

Below is the table schema once the data is ingested using single-measure records. It is the output of the DESCRIBE query. Assuming the data is ingested into a database raw\_data and table devops\_single, below is the query.

```
DESCRIBE "raw_data"."devops_single"
```


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
| availability\_zone | varchar | DIMENSION | 
| microservice\_name | varchar | DIMENSION | 
| instance\_name | varchar | DIMENSION | 
| process\_name | varchar | DIMENSION | 
| os\_version | varchar | DIMENSION | 
| jdk\_version | varchar | DIMENSION | 
| cell | varchar | DIMENSION | 
| region | varchar | DIMENSION | 
| silo | varchar | DIMENSION | 
| instance\_type | varchar | DIMENSION | 
| measure\_name | varchar | MEASURE\_NAME | 
| time | timestamp | TIMESTAMP | 
| measure\_value::double | double | MEASURE\_VALUE | 
| measure\_value::bigint | bigint | MEASURE\_VALUE | 
| measure\_value::varchar | varchar | MEASURE\_VALUE | 
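For a given row, only the measure\_value::&lt;type&gt; column that matches the measure's data type is populated; the other measure\_value columns are null. For example (an illustrative query against the schema above; the metric and time range are arbitrary), the following reads recent cpu\_user samples:

```
SELECT instance_name, time, measure_value::double AS cpu_user
FROM "raw_data"."devops_single"
WHERE measure_name = 'cpu_user'
    AND time > ago(15m)
ORDER BY time DESC
LIMIT 10
```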

**Measure Schema**

Below is the measure schema returned by the SHOW MEASURES query.

```
SHOW MEASURES FROM "raw_data"."devops_single"
```


| measure\_name | data\_type | Dimensions (all varchar) | 
| --- | --- | --- | 
| cpu\_hi | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_idle | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_iowait | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_nice | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_si | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_steal | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_system | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| cpu\_user | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| disk\_free | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| disk\_io\_reads | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| disk\_io\_writes | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| disk\_used | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| file\_descriptors\_in\_use | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| gc\_pause | double | availability\_zone, microservice\_name, instance\_name, process\_name, jdk\_version, cell, region, silo | 
| gc\_reclaimed | double | availability\_zone, microservice\_name, instance\_name, process\_name, jdk\_version, cell, region, silo | 
| latency\_per\_read | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| latency\_per\_write | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| memory\_cached | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| memory\_free | double | availability\_zone, microservice\_name, instance\_name, process\_name, os\_version, jdk\_version, cell, region, silo, instance\_type | 
| memory\_used | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| network\_bytes\_in | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| network\_bytes\_out | double | availability\_zone, microservice\_name, instance\_name, os\_version, cell, region, silo, instance\_type | 
| task\_completed | bigint | availability\_zone, microservice\_name, instance\_name, process\_name, jdk\_version, cell, region, silo | 
| task\_end\_state | varchar | availability\_zone, microservice\_name, instance\_name, process\_name, jdk\_version, cell, region, silo | 

**Example Data**


| availability\_zone | microservice\_name | instance\_name | process\_name | os\_version | jdk\_version | cell | region | silo | instance\_type | measure\_name | time | measure\_value::double | measure\_value::bigint | measure\_value::varchar | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_hi | 34:57.2 | 0.87169 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_idle | 34:57.2 | 3.46266 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_iowait | 34:57.2 | 0.10226 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_nice | 34:57.2 | 0.63013 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_si | 34:57.2 | 0.16441 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_steal | 34:57.2 | 0.10729 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_system | 34:57.2 | 0.45709 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | cpu\_user | 34:57.2 | 94.20448 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | disk\_free | 34:57.2 | 72.51895 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | disk\_io\_reads | 34:57.2 | 81.73383 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | disk\_io\_writes | 34:57.2 | 77.11665 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | disk\_used | 34:57.2 | 89.42235 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | file\_descriptors\_in\_use | 34:57.2 | 30.08254 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com | server |  | JDK\_8 | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 |  | gc\_pause | 34:57.2 | 60.28679 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com | server |  | JDK\_8 | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 |  | gc\_reclaimed | 34:57.2 | 75.28839 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | latency\_per\_read | 34:57.2 | 8.07605 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | latency\_per\_write | 34:57.2 | 58.11223 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | memory\_cached | 34:57.2 | 87.56481 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com | server |  | JDK\_8 | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 |  | memory\_free | 34:57.2 | 18.95768 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | memory\_free | 34:57.2 | 97.20523 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | memory\_used | 34:57.2 | 12.37723 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | network\_bytes\_in | 34:57.2 | 31.02065 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com |  | AL2012 |  | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 | r5.4xlarge | network\_bytes\_out | 34:57.2 | 0.51424 |  |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com | server |  | JDK\_8 | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 |  | task\_completed | 34:57.2 |  | 69 |  | 
| eu-west-1-1 | hercules | i-zaZswmJk-hercules-eu-west-1-cell-9-silo-2-00000027.amazonaws.com | server |  | JDK\_8 | eu-west-1-cell-9 | eu-west-1 | eu-west-1-cell-9-silo-2 |  | task\_end\_state | 34:57.2 |  |  | SUCCESS\_WITH\_RESULT | 

# Scheduled query patterns
<a name="scheduledqueries-patterns"></a>

This section describes common patterns for using Amazon Timestream for LiveAnalytics scheduled queries to make your dashboards load faster and at reduced costs. The examples below use a DevOps application scenario to illustrate the key concepts, which apply to scheduled queries in general, irrespective of the application scenario.

Scheduled Queries in Timestream for LiveAnalytics allow you to express your queries using the full SQL surface area of Timestream for LiveAnalytics. Your query can include one or more source tables, perform aggregations or any other query allowed by Timestream for LiveAnalytics's SQL language, and then materialize the results of the query in another destination table in Timestream for LiveAnalytics. For ease of exposition, this section refers to this target table of a scheduled query as a *derived table*.

The following are the key points that are covered in this section.
+ Using a simple fleet-level aggregate to explain how you can define a scheduled query and understand some basic concepts. 
+ How you can combine results from the target of a scheduled query (the derived table) with the results from the source table to get the cost and performance benefits of scheduled queries. 
+ What are your trade-offs when configuring the refresh period of the scheduled queries. 
+ Using scheduled queries for some common scenarios.
  + Tracking the last data point from every instance before a specific date.
  + Distinct values for a dimension to use for populating variables in a dashboard.
+ How you handle late arriving data in the context of scheduled queries.
+ How you can use one-off manual executions to handle a variety of scenarios not directly covered by automated triggers for scheduled queries.

**Topics**
+ [Scenario](scheduledqueries-patterns-scenario.md)
+ [Simple fleet-level aggregates](scheduledqueries-patterns-simplefleet.md)
+ [Last point from each device](scheduledqueries-patterns-lastpointfromdevice.md)
+ [Unique dimension values](scheduledqueries-patterns-uniquedimvalues.md)
+ [Handling late-arriving data](scheduledqueries-patterns-latearrive.md)
+ [Back-filling historical pre-computations](scheduledqueries-patterns-backfilling.md)

# Scenario
<a name="scheduledqueries-patterns-scenario"></a>

The following examples use a DevOps monitoring scenario which is outlined in [Scheduled queries sample schema](scheduledqueries-common-schema-example.md).

The examples provide the scheduled query definition where you can plug in the appropriate configurations for where to receive execution status notifications for scheduled queries, where to receive reports for errors encountered during execution of a scheduled query, and the IAM role the scheduled query uses to perform its operations.

You can create these scheduled queries after filling in the preceding options, [creating the target (or derived) table](https://docs.aws.amazon.com/timestream/latest/developerguide/code-samples.create-table.html), and executing them through the AWS CLI. For example, assume that a scheduled query definition is stored in a file, `scheduled_query_example.json`. You can create the query using the following CLI command.

```
aws timestream-query create-scheduled-query --cli-input-json  file://scheduled_query_example.json --profile aws_profile --region us-east-1
```

In the preceding command, the profile passed using the `--profile` option must have the appropriate permissions to create scheduled queries. See [Identity-based policies for Scheduled Queries](https://docs.aws.amazon.com/timestream/latest/developerguide/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-sheduledqueries) for details on the required policies and permissions.

# Simple fleet-level aggregates
<a name="scheduledqueries-patterns-simplefleet"></a>

This first example walks you through some of the basic concepts when working with scheduled queries using a simple example computing fleet-level aggregates. Using this example, you will learn the following.
+ How to take your dashboard query that is used to obtain aggregate statistics and map it to a scheduled query.
+ How Timestream for LiveAnalytics manages the execution of the different instances of your scheduled query.
+ How you can have different instances of scheduled queries overlap in time ranges and how the correctness of data is maintained on the target table to ensure that your dashboard using the results of the scheduled query gives you results that match with the same aggregate computed on the raw data. 
+ How to set the time range and refresh cadence for your scheduled query.
+ How you can self-serve track the results of the scheduled queries to tune them so that the execution latency for the query instances are within the acceptable delays of refreshing your dashboards. 

**Topics**
+ [Aggregate from source tables](#scheduledqueries-patterns-simplefleet-aggrfromsourcetable)
+ [Scheduled query to pre-compute aggregates](#scheduledqueries-patterns-simplefleet-schedtoprecomputeaggr)
+ [Aggregate from derived table](#scheduledqueries-patterns-simplefleet-aggrfromderived)
+ [Aggregate combining source and derived tables](#scheduledqueries-patterns-simplefleet-aggrcombsourceandderived)
+ [Aggregate from frequently refreshed scheduled computation](#scheduledqueries-patterns-simplefleet-aggregatefromrequently)

## Aggregate from source tables
<a name="scheduledqueries-patterns-simplefleet-aggrfromsourcetable"></a>

In this example, you are tracking the number of metrics emitted every minute by the servers within a given region. The graph below is an example plotting this time series for the region us-east-1.

![\[Time series graph showing fluctuating number of metrics emitted by servers in us-east-1 region.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/schedquery_aggrfromsourcetable.png)


Below is an example query to compute this aggregate from the raw data. It filters the rows for the region us-east-1 and then computes the per-minute sum, counting 20 data points for each row whose measure\_name is metrics and 5 for each row whose measure\_name is events. In this example, the graph illustration shows that the number of metrics emitted varies between 1.5 million and 6 million per minute. When plotting this time series for several hours (the past 12 hours in this figure), this query over the raw data analyzes hundreds of millions of rows.

```
WITH grouped_data AS (
    SELECT region, bin(time, 1m) as minute, SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints 
    FROM "raw_data"."devops"
    WHERE time BETWEEN from_milliseconds(1636699996445) AND from_milliseconds(1636743196445)
        AND region = 'us-east-1'
    GROUP BY region, measure_name, bin(time, 1m)
)
SELECT minute, SUM(numDataPoints) AS numDataPoints
FROM grouped_data
GROUP BY minute
ORDER BY 1 desc, 2 desc
```
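For illustration, the CASE weighting in this query can be mimicked in Python over a few hypothetical rows (the rows and minute buckets below are made up; only the 20-per-metrics and 5-per-events weighting comes from the query):

```python
from collections import defaultdict

# Hypothetical rows of (measure_name, minute bucket); the 20/5 weighting
# mirrors the CASE expression in the SQL query above.
rows = [
    ("metrics", "2021-11-12T10:00"),
    ("metrics", "2021-11-12T10:00"),
    ("events",  "2021-11-12T10:00"),
    ("metrics", "2021-11-12T10:01"),
]

num_data_points = defaultdict(int)
for measure_name, minute in rows:
    num_data_points[minute] += 20 if measure_name == "metrics" else 5

print(dict(num_data_points))  # {'2021-11-12T10:00': 45, '2021-11-12T10:01': 20}
```

Each metrics row contributes 20 data points and each events row contributes 5, so the 10:00 bucket sums to 45.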

## Scheduled query to pre-compute aggregates
<a name="scheduledqueries-patterns-simplefleet-schedtoprecomputeaggr"></a>

If you would like to optimize your dashboards to load faster and lower your costs by scanning less data, you can use a scheduled query to pre-compute these aggregates. Scheduled queries in Timestream for LiveAnalytics allow you to materialize these pre-computations in another Timestream for LiveAnalytics table, which you can subsequently use for your dashboards.

The first step in creating a scheduled query is to identify the query you want to pre-compute. Note that the preceding dashboard was drawn for region us-east-1. However, a different user may want the same aggregate for a different region, say us-west-2 or eu-west-1. To avoid creating a scheduled query for each such query, you can pre-compute the aggregate for each region and materialize the per-region aggregates in another Timestream for LiveAnalytics table.

The query below provides an example of the corresponding pre-computation. It is similar to the common table expression grouped\_data used in the query on the raw data, except for two differences: 1) it omits the region predicate, so one query can pre-compute aggregates for all regions; and 2) it uses a parameterized time predicate with a special parameter @scheduled\_runtime, which is explained in detail below.

```
SELECT region, bin(time, 1m) as minute, 
    SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints 
FROM raw_data.devops 
WHERE time BETWEEN @scheduled_runtime - 10m AND @scheduled_runtime + 1m 
GROUP BY bin(time, 1m), region
```

The preceding query can be converted into a scheduled query using the following specification. The scheduled query is assigned a Name, which is a user-friendly mnemonic. It then includes the QueryString and a ScheduleConfiguration, which contains a [cron expression](https://docs.aws.amazon.com/timestream/latest/developerguide/scheduledqueries-schedule.html). It specifies the TargetConfiguration, which maps the query results to the destination table in Timestream for LiveAnalytics. Finally, it specifies a number of other configurations, such as the NotificationConfiguration, where notifications are sent for individual executions of the query; the ErrorReportConfiguration, where a report is written in case the query encounters any errors; and the ScheduledQueryExecutionRoleArn, which is the role used to perform operations for the scheduled query.

```
{
    "Name": "MultiPT5mPerMinutePerRegionMeasureCount",
    "QueryString": "SELECT region, bin(time, 1m) as minute, SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints FROM raw_data.devops WHERE time BETWEEN @scheduled_runtime - 10m AND @scheduled_runtime + 1m GROUP BY bin(time, 1m), region",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0/5 * * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "per_minute_aggs_pt5m",
            "TimeColumn": "minute",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "numDataPoints",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "numDataPoints",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```

In the example, the ScheduleExpression cron(0/5 \* \* \* ? \*) implies that the query is executed once every 5 minutes, at the 5th, 10th, 15th, ... minute of every hour of every day. The timestamp at which a specific instance of this query is triggered translates into the @scheduled\_runtime parameter used in the query. For instance, consider the instance of this scheduled query executing on 2021-12-01 00:00:00. For this instance, the @scheduled\_runtime parameter is initialized to the timestamp 2021-12-01 00:00:00 when invoking the query. Therefore, this specific instance executes at timestamp 2021-12-01 00:00:00 and computes the per-minute aggregates for the time range 2021-11-30 23:50:00 to 2021-12-01 00:01:00. Similarly, the next instance of this query is triggered at timestamp 2021-12-01 00:05:00, and in that case the query computes per-minute aggregates for the time range 2021-11-30 23:55:00 to 2021-12-01 00:06:00. Hence, the @scheduled\_runtime parameter lets a scheduled query pre-compute the aggregates for the configured time ranges relative to the invocation time of each instance.
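As a sketch, the time range each instance queries can be reproduced with simple datetime arithmetic (the helper function below is illustrative, not part of any Timestream API):

```python
from datetime import datetime, timedelta

def query_window(scheduled_runtime: datetime):
    """Range queried by one instance: @scheduled_runtime - 10m to
    @scheduled_runtime + 1m, mirroring the time predicate in the query."""
    return (scheduled_runtime - timedelta(minutes=10),
            scheduled_runtime + timedelta(minutes=1))

# The instance triggered at 2021-12-01 00:00:00 covers 23:50:00 to 00:01:00.
start, end = query_window(datetime(2021, 12, 1, 0, 0, 0))
print(start, "to", end)
```

Evaluating the same function for the 00:05:00 trigger yields the 23:55:00 to 00:06:00 range described above.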

Note that two subsequent instances of the query overlap in their time ranges. This is something you can control based on your requirements. In this case, the overlap allows these queries to update the aggregates based on any data whose arrival was slightly delayed, up to 5 minutes in this example. To ensure correctness of the materialized results, Timestream for LiveAnalytics ensures that the query at 2021-12-01 00:05:00 is performed only after the query at 2021-12-01 00:00:00 has completed, and the results of the later query can update any previously materialized aggregate if a newer value is generated. For example, if some data at timestamp 2021-11-30 23:59:00 arrived after the query for 2021-12-01 00:00:00 executed but before the query for 2021-12-01 00:05:00, then the execution at 2021-12-01 00:05:00 recomputes the aggregates for the minute 2021-11-30 23:59:00, and the previous aggregate is updated with the newly-computed value. You can rely on these semantics of scheduled queries to strike a trade-off between how quickly you update your pre-computations and how gracefully you handle data with delayed arrival. Additional considerations are discussed below on how you trade off this refresh cadence with freshness of the data, and how you update the aggregates for data that arrives even later, or when the source of the scheduled computation has updated values that require the aggregates to be recomputed. 
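A minimal sketch of these upsert semantics, with hypothetical bucket values:

```python
# Sketch of the upsert semantics of materialized results: a later
# instance recomputes overlapping minute buckets, and its newer value
# replaces the previously materialized aggregate. Values are hypothetical.
derived = {}  # (region, minute) -> aggregated data point count

def materialize(results):
    for key, value in results.items():
        derived[key] = value  # newer computation wins for overlapping buckets

# Instance triggered at 00:00 sees only partial data for minute 23:59.
materialize({("us-east-1", "23:59"): 100})
# Instance at 00:05 re-reads the overlapping window after late data arrived.
materialize({("us-east-1", "23:59"): 140, ("us-east-1", "00:04"): 90})

print(derived[("us-east-1", "23:59")])  # 140
```

The overlapping bucket ends up holding the recomputed value, which is what makes the derived table converge on the same answer as the raw data.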

Every scheduled computation has a notification configuration where Timestream for LiveAnalytics sends a notification for every execution of the scheduled computation. You can configure an SNS topic to receive notifications for each invocation. In addition to the success or failure status of a specific instance, the notification includes several statistics, such as the time the computation took to execute, the number of bytes the computation scanned, and the number of bytes the computation wrote to its destination table. You can use these statistics to further tune your query or schedule configuration, or to track the spend for your scheduled queries. One aspect worth noting is the execution time for an instance. In this example, the scheduled computation is configured to execute every 5 minutes. The execution time determines the delay with which the pre-computation becomes available, which in turn defines the lag in your dashboard when you use the pre-computed data. Furthermore, if this delay is consistently higher than the refresh interval (for example, if the execution time is more than 5 minutes for a computation configured to refresh every 5 minutes), it is important to tune your computation to run faster to avoid further lag in your dashboards.

## Aggregate from derived table
<a name="scheduledqueries-patterns-simplefleet-aggrfromderived"></a>

Now that you have set up the scheduled query, and the aggregates are pre-computed and materialized to the Timestream for LiveAnalytics table specified in the target configuration of the scheduled computation, you can use the data in that table to write SQL queries to power your dashboards. Below is an equivalent query that uses the materialized pre-aggregates to generate the per-minute data point count for us-east-1.

```
SELECT bin(time, 1m) as minute, SUM(numDataPoints) as numDatapoints
FROM "derived"."per_minute_aggs_pt5m"
WHERE time BETWEEN from_milliseconds(1636699996445) AND from_milliseconds(1636743196445)
    AND region = 'us-east-1'
GROUP BY bin(time, 1m)
ORDER BY 1 desc
```

![\[Graph showing data points fluctuating between 0 and 6 million over time from 23:00 to 10:00.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/schedquery_aggrfromderived.png)


The previous figure plots the aggregate computed from the aggregate table. Comparing this panel with the panel computed from the raw source data, you will notice that they match up exactly, albeit with the aggregates delayed by a few minutes, controlled by the refresh interval you configured for the scheduled computation plus the time to execute it.

This query over the pre-computed data scans several orders of magnitude less data than the aggregates computed over the raw source data. Depending on the granularity of aggregations, this reduction can easily result in 100X lower cost and query latency. There is a cost to executing this scheduled computation. However, depending on how frequently these dashboards are refreshed and how many concurrent users load them, you end up significantly reducing your overall costs by using these pre-computations, on top of 10-100X faster load times for the dashboards.

## Aggregate combining source and derived tables
<a name="scheduledqueries-patterns-simplefleet-aggrcombsourceandderived"></a>

Dashboards created using the derived tables can have a lag. If your application scenario requires the dashboards to have the most recent data, then you can use the power and flexibility of Timestream for LiveAnalytics's SQL support to combine the latest data from the source table with the historical aggregates from the derived table to form a merged view. This merged view uses the union semantics of SQL and non-overlapping time ranges from the source and the derived table. The example below uses the "derived"."per\_minute\_aggs\_pt5m" derived table. Because the scheduled computation for that derived table refreshes once every 5 minutes (per the schedule expression specification), the query below uses the most recent 15 minutes of data from the source table and any data older than 15 minutes from the derived table, and then unions the results to create the merged view. The merged view gives you the best of both worlds: the economics and low latency of reading older pre-computed aggregates from the derived table, and the freshness of aggregates from the source table to power your real-time analytics use cases.

Note that this union approach has slightly higher query latency than querying only the derived table, and also scans slightly more data, because it aggregates the raw data in real time to fill in the most recent time interval. However, this merged view is still significantly faster and cheaper than aggregating on the fly from the source table, especially for dashboards rendering days or weeks of data. You can tune the time ranges in this example to suit your application's refresh needs and delay tolerance.

```
WITH aggregated_source_data AS (
    SELECT bin(time, 1m) as minute, SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDatapoints 
    FROM "raw_data"."devops"
    WHERE time BETWEEN bin(from_milliseconds(1636743196439), 1m) - 15m AND from_milliseconds(1636743196439)
        AND region = 'us-east-1'
    GROUP BY bin(time, 1m)
), aggregated_derived_data AS (
    SELECT bin(time, 1m) as minute, SUM(numDataPoints) as numDatapoints
    FROM "derived"."per_minute_aggs_pt5m"
    WHERE time BETWEEN from_milliseconds(1636699996439) AND bin(from_milliseconds(1636743196439), 1m) - 15m
        AND region = 'us-east-1'
    GROUP BY bin(time, 1m)
)
SELECT minute, numDatapoints
FROM (
    (
    SELECT *
    FROM aggregated_derived_data
    )
    UNION
    (
    SELECT *
    FROM aggregated_source_data
    )
)
ORDER BY 1 desc
```
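The boundary that splits the merged view between the two tables can be sketched as follows (the helper name is illustrative; the 15-minute window and the bin(time, 1m) flooring come from the query above):

```python
from datetime import datetime, timedelta

def split_boundary(query_time: datetime, recent_minutes: int = 15) -> datetime:
    """Boundary between derived-table data (older than this) and
    source-table data (newer), mirroring bin(time, 1m) - 15m above."""
    floored = query_time.replace(second=0, microsecond=0)  # bin(time, 1m)
    return floored - timedelta(minutes=recent_minutes)

print(split_boundary(datetime(2021, 11, 12, 19, 33, 16)))  # 2021-11-12 19:18:00
```

Using the same boundary as the upper bound of the derived-table range and the lower bound of the source-table range is what keeps the two halves of the union non-overlapping.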

Below is the dashboard panel with this unified merged view. As you can see, the dashboard looks almost identical to the view computed from the derived table, except that it has the most up-to-date aggregate at the rightmost tip.

![\[Time-series graph showing fluctuating data points over 11 hours, with peaks around 6 million.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/schedquery_aggrcombsourceandderived.png)


## Aggregate from frequently refreshed scheduled computation
<a name="scheduledqueries-patterns-simplefleet-aggregatefromrequently"></a>

Depending on how frequently your dashboards are loaded and how much latency you can tolerate, there is another approach to obtaining fresher results in your dashboard: having the scheduled computation refresh the aggregates more frequently. For instance, below is the configuration of the same scheduled computation, except that it refreshes once every minute (note the schedule expression cron(0/1 \* \* \* ? \*)). With this setup, the derived table per\_minute\_aggs\_pt1m has much more recent aggregates than in the scenario where the computation specified a refresh schedule of once every 5 minutes.

```
{
    "Name": "MultiPT1mPerMinutePerRegionMeasureCount",
    "QueryString": "SELECT region, bin(time, 1m) as minute, SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints FROM raw_data.devops WHERE time BETWEEN @scheduled_runtime - 10m AND @scheduled_runtime + 1m GROUP BY bin(time, 1m), region",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0/1 * * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "per_minute_aggs_pt1m",
            "TimeColumn": "minute",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "numDataPoints",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "numDataPoints",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```

```
SELECT bin(time, 1m) as minute, SUM(numDataPoints) as numDatapoints
FROM "derived"."per_minute_aggs_pt1m"
WHERE time BETWEEN from_milliseconds(1636699996446) AND from_milliseconds(1636743196446)
    AND region = 'us-east-1'
GROUP BY bin(time, 1m), region
ORDER BY 1 desc
```

Since the derived table has more recent aggregates, you can now directly query the derived table per\_minute\_aggs\_pt1m to get fresher aggregates, as shown by the previous query and the dashboard snapshot below.

![\[Graph showing fluctuating data points over time, with peaks reaching 6 million and valleys near 1 million.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/schedquery_aggregatefromrequently.png)


Note that refreshing the scheduled computation on a faster schedule (say every 1 minute instead of every 5 minutes) increases the maintenance costs for the scheduled computation. The notification message for each execution provides statistics on how much data was scanned and how much was written to the derived table. Similarly, if you use the merged view that unions the derived table with the source table, your query costs on the merged view and the dashboard load latency are higher than when querying only the derived table. Therefore, the approach you pick depends on how frequently your dashboards are refreshed and the maintenance costs for the scheduled queries. If you have tens of users refreshing the dashboards once every minute or so, a more frequent refresh of your derived table will likely result in lower overall costs.
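As a back-of-the-envelope illustration of this trade-off, the number of daily executions grows linearly as the refresh interval shrinks (hypothetical arithmetic, not pricing guidance):

```python
# A 1-minute refresh runs 5x as many scheduled executions per day as a
# 5-minute refresh, but lets dashboards query the derived table alone
# instead of building a merged view with the source table.
def daily_executions(refresh_interval_minutes: int) -> int:
    return 24 * 60 // refresh_interval_minutes

print(daily_executions(5), daily_executions(1))  # 288 1440
```

Whether the extra executions pay for themselves depends on how many dashboard loads each refresh serves.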

# Last point from each device
<a name="scheduledqueries-patterns-lastpointfromdevice"></a>

Your application may require you to read the last measurement emitted by a device. There are also more general use cases, such as obtaining the last measurement for a device before a given date/time, or the first measurement for a device after a given date/time. When you have millions of devices and years of data, this search can require scanning large amounts of data.

Below you will see an example of how you can use scheduled queries to optimize searching for the last point emitted by a device. You can use the same pattern to optimize first-point queries as well, if your application needs them.

**Topics**
+ [Computed from source table](#scheduledqueries-patterns-lastpointfromdevice-computedfromsrctable)
+ [Derived table to precompute at daily granularity](#scheduledqueries-patterns-lastpointfromdevice-derivedttabletoprecompute)
+ [Computed from derived table](#scheduledqueries-patterns-lastpointfromdevice-computedfromderivedtable)
+ [Combining from source and derived table](#scheduledqueries-patterns-lastpointfromdevice-combinesourceandderived)

## Computed from source table
<a name="scheduledqueries-patterns-lastpointfromdevice-computedfromsrctable"></a>

Below is an example query to find the last measurement emitted by the servers in a specific deployment (for example, servers for a given micro-service within a given region, cell, silo, and availability\_zone). In the example application, this query returns the last measurement for hundreds of servers. Also note that this query has an unbounded time predicate: it looks for any data older than a given timestamp.

**Note**  
For information about the `max` and `max_by` functions, see [Aggregate functions](aggregate-functions.md).

```
SELECT instance_name, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure
FROM "raw_data"."devops"
WHERE time < from_milliseconds(1636685271872)
    AND measure_name = 'events'
    AND region = 'us-east-1'
    AND cell = 'us-east-1-cell-10'
    AND silo = 'us-east-1-cell-10-silo-3'
    AND availability_zone = 'us-east-1-1'
    AND microservice_name = 'hercules'
GROUP BY region, cell, silo, availability_zone, microservice_name,
    instance_name, process_name, jdk_version
ORDER BY instance_name, time DESC
```
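To make the MAX(time) and MAX_BY(gc_pause, time) semantics concrete, they can be emulated in Python (the instance names, timestamps, and gc_pause values below are hypothetical):

```python
# Emulation of MAX(time) and MAX_BY(gc_pause, time) grouped by instance:
# keep, per instance, the measurement carrying the latest timestamp.
rows = [
    ("host-1", 1000, 41.2),
    ("host-1", 2000, 37.9),  # latest row for host-1
    ("host-2", 1500, 60.3),
]

last_point = {}  # instance_name -> (time, gc_pause)
for instance, t, gc_pause in rows:
    if instance not in last_point or t > last_point[instance][0]:
        last_point[instance] = (t, gc_pause)

print(last_point["host-1"])  # (2000, 37.9)
```

MAX_BY returns the measure value associated with the maximum timestamp, not the maximum measure value, which is why host-1 yields 37.9 rather than 41.2.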

## Derived table to precompute at daily granularity
<a name="scheduledqueries-patterns-lastpointfromdevice-derivedttabletoprecompute"></a>

You can convert the preceding use case into a scheduled computation. If your application requirements are such that you need to obtain these values for your entire fleet across multiple regions, cells, silos, availability zones, and microservices, you can use a single scheduled computation to pre-compute the values for the whole fleet. That is the power of Timestream for LiveAnalytics's serverless scheduled queries, which scale with your application's scaling requirements.

Below is a query to pre-compute the last point across all the servers for a given day. Note that the query only has a time predicate and not a predicate on the dimensions. The time predicate limits the query to the past day from the time when the computation is triggered based on the specified schedule expression.

```
SELECT region, cell, silo, availability_zone, microservice_name, 
    instance_name, process_name, jdk_version, 
    MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure 
FROM raw_data.devops 
WHERE time BETWEEN bin(@scheduled_runtime, 1d) - 1d AND bin(@scheduled_runtime, 1d) 
    AND measure_name = 'events' 
GROUP BY region, cell, silo, availability_zone, microservice_name, 
    instance_name, process_name, jdk_version
```

Below is a configuration for the scheduled computation using the preceding query. It executes the query at 01:00 hrs UTC every day to compute the aggregate for the past day. The schedule expression cron(0 1 \* \* ? \*) controls this behavior; the query runs an hour after the day has ended so that it picks up late-arriving data for that day.

```
{
    "Name": "PT1DPerInstanceLastpoint",
    "QueryString": "SELECT region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1d) - 1d AND bin(@scheduled_runtime, 1d) AND measure_name = 'events' GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, process_name, jdk_version",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0 1 * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "per_timeseries_lastpoint_pt1d",
            "TimeColumn": "time",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "cell",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "silo",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "availability_zone",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "microservice_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "instance_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "process_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "jdk_version",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "last_measure",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "last_measure",
                        "MeasureValueType": "DOUBLE"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```

## Computed from derived table
<a name="scheduledqueries-patterns-lastpointfromdevice-computedfromderivedtable"></a>

After you define the derived table using the preceding configuration and at least one instance of the scheduled query has materialized data into it, you can query the derived table to get the latest measurement. Below is an example query on the derived table.

```
SELECT instance_name, MAX(time) AS time, MAX_BY(last_measure, time) AS last_measure
FROM "derived"."per_timeseries_lastpoint_pt1d"
WHERE time < from_milliseconds(1636746715649)
    AND measure_name = 'last_measure'
    AND region = 'us-east-1'
    AND cell = 'us-east-1-cell-10'
    AND silo = 'us-east-1-cell-10-silo-3'
    AND availability_zone = 'us-east-1-1'
    AND microservice_name = 'hercules'
GROUP BY region, cell, silo, availability_zone, microservice_name,
    instance_name, process_name, jdk_version
ORDER BY instance_name, time DESC
```

## Combining from source and derived table
<a name="scheduledqueries-patterns-lastpointfromdevice-combinesourceandderived"></a>

As in the previous example, data in the derived table will not include the most recent writes. Therefore, you can again use a similar pattern to merge the data from the derived table for the older data with the source data for the remaining tip. Below is an example of such a query using the similar UNION approach. Because the application requirement is to find the latest measurement before a given time, and that time can be in the past, the query uses the source data for up to a day before the specified time and the derived table for anything older. As you can see from the query example below, the time predicate on the source data is bounded. That ensures efficient processing on the source table, which has a significantly higher volume of data, while the unbounded time predicate is on the derived table.

```
WITH last_point_derived AS (
    SELECT instance_name, MAX(time) AS time, MAX_BY(last_measure, time) AS last_measure
    FROM "derived"."per_timeseries_lastpoint_pt1d"
    WHERE time < from_milliseconds(1636746715649)
        AND measure_name = 'last_measure'
        AND region = 'us-east-1'
        AND cell = 'us-east-1-cell-10'
        AND silo = 'us-east-1-cell-10-silo-3'
        AND availability_zone = 'us-east-1-1'
        AND microservice_name = 'hercules'
    GROUP BY region, cell, silo, availability_zone, microservice_name,
        instance_name, process_name, jdk_version
), last_point_source AS (
    SELECT instance_name, MAX(time) AS time, MAX_BY(gc_pause, time) AS last_measure
    FROM "raw_data"."devops"
    WHERE time < from_milliseconds(1636746715649) AND time > from_milliseconds(1636746715649) - 26h
        AND measure_name = 'events'
        AND region = 'us-east-1'
        AND cell = 'us-east-1-cell-10'
        AND silo = 'us-east-1-cell-10-silo-3'
        AND availability_zone = 'us-east-1-1'
        AND microservice_name = 'hercules'
    GROUP BY region, cell, silo, availability_zone, microservice_name,
        instance_name, process_name, jdk_version
)
SELECT instance_name, MAX(time) AS time, MAX_BY(last_measure, time) AS last_measure
FROM (
    SELECT * FROM last_point_derived
    UNION
    SELECT * FROM last_point_source
)
GROUP BY instance_name
ORDER BY instance_name, time DESC
```

The previous example is just one illustration of how you can structure derived tables. If you have years of data, you can use more levels of aggregation. For instance, you can have monthly aggregates on top of daily aggregates, and hourly aggregates below the daily ones. You can then merge the most recent source data to fill in the last hour, the hourly aggregates to fill in the last day, the daily aggregates to fill in the last month, and the monthly aggregates for anything older. The number of levels you set up and the refresh schedules you choose will depend on how frequently these queries are issued and how many users issue them concurrently.
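To make the multi-level rollup idea concrete, the following sketch routes a query to the coarsest table that can serve data of a given age. The table names and age boundaries are hypothetical, chosen only to illustrate the tiering; a real deployment would use its own derived tables and retention boundaries.

```python
from datetime import timedelta

# Hypothetical table names and boundaries for illustration only.
TIERS = [
    (timedelta(hours=1), "raw_data.devops"),             # freshest data: query the source
    (timedelta(days=1), "derived.per_timeseries_pt1h"),  # hourly rollups cover the last day
    (timedelta(days=30), "derived.per_timeseries_pt1d"), # daily rollups cover the last month
]

def table_for_age(age):
    """Return the table to query for data of the given age."""
    for max_age, table in TIERS:
        if age <= max_age:
            return table
    return "derived.per_timeseries_p1m"  # monthly rollups for everything older
```

A dashboard query spanning a long range would then UNION the results from each tier, with bounded time predicates on the finer-grained tables, following the same pattern as the query above.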

# Unique dimension values
<a name="scheduledqueries-patterns-uniquedimvalues"></a>

You may have a use case with dashboards where you want to use the unique values of dimensions as variables to drill down on the metrics corresponding to a specific slice of data. The snapshot below is an example where the dashboard pre-populates the unique values of several dimensions such as region, cell, silo, microservice, and availability_zone. Here we show how you can use scheduled queries to significantly speed up computing the distinct values of these variables from the metrics you are tracking.

**Topics**
+ [On raw data](#scheduledqueries-patterns-uniquedimvalues-onraw)
+ [Pre-compute unique dimension values](#scheduledqueries-patterns-uniquedimvalues-precompute)
+ [Computing the variables from derived table](#scheduledqueries-patterns-uniquedimvalues-fromderived)

## On raw data
<a name="scheduledqueries-patterns-uniquedimvalues-onraw"></a>

You can use SELECT DISTINCT to compute the distinct values seen from your data. For instance, if you want to obtain the distinct values of region, you can use the query of this form.

```
SELECT DISTINCT region
FROM "raw_data"."devops"
WHERE time > ago(1h)
ORDER BY 1
```

You may be tracking millions of devices and billions of time series. However, in most cases, these variables of interest correspond to lower-cardinality dimensions, with a few to tens of values each. Computing DISTINCT from raw data can still require scanning large volumes of data. 

## Pre-compute unique dimension values
<a name="scheduledqueries-patterns-uniquedimvalues-precompute"></a>

You want these variables to load fast so that your dashboards are interactive. Moreover, these variables are often computed on every dashboard load, so you want them to be cost-effective as well. You can optimize finding these variables using scheduled queries and materializing them in a derived table.

First, you need to identify the dimensions for which you need to compute the DISTINCT values, as well as the columns you will use in the predicates when computing those DISTINCT values.

In this example, you can see that the dashboard is populating distinct values for the dimensions region, cell, silo, availability_zone, and microservice. So you can use the query below to pre-compute these unique values.

```
SELECT region, cell, silo, availability_zone, microservice_name, 
    min(@scheduled_runtime) AS time, COUNT(*) as numDataPoints 
FROM raw_data.devops 
WHERE time BETWEEN @scheduled_runtime - 15m AND @scheduled_runtime 
GROUP BY region, cell, silo, availability_zone, microservice_name
```

There are a few important things to note here.
+ You can use one scheduled computation to pre-compute values for many different queries. For instance, you are using the preceding query to pre-compute values for five different variables. So you don't need one for each variable. You can use this same pattern to identify shared computation across multiple panels to optimize the number of scheduled queries you need to maintain.
+ The unique values of the dimensions aren't inherently time series data. So you convert them to time series using @scheduled_runtime. By associating this data with the @scheduled_runtime parameter, you can also track which unique values appeared at a given point in time, thus creating time series data out of it.
+ In the previous example, you see a metric value being tracked. This example uses COUNT(*). You can compute other meaningful aggregates if you want to track them for your dashboards.
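The conversion from a set of unique values into time series rows can be sketched outside SQL as well. The following illustrative helper mirrors what the `min(@scheduled_runtime)` and `COUNT(*)` aggregation above does for each dimension combination in a window; the row data and field names are assumptions for the sake of the example.

```python
from collections import Counter

def unique_dimension_snapshot(rows, scheduled_runtime):
    """Collapse the rows seen in one window into one record per unique
    dimension combination, stamped with the run time (illustrative sketch)."""
    counts = Counter(rows)  # rows: iterable of dimension tuples, e.g. (region, cell)
    return [
        {"dims": dims, "time": scheduled_runtime, "numDataPoints": n}
        for dims, n in sorted(counts.items())
    ]

# Hypothetical sample: two data points from cell-1, one from cell-2.
snapshot = unique_dimension_snapshot(
    [("us-east-1", "cell-1"), ("us-east-1", "cell-1"), ("us-east-1", "cell-2")],
    1638316800,
)
```

Each run of the scheduled query produces one such snapshot, so the derived table accumulates a history of which dimension values were active in each window.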

Below is a configuration for a scheduled computation using the previous query. In this example, it is configured to refresh once every 15 minutes using the schedule expression cron(0/15 * * * ? *). 

```
{
    "Name": "PT15mHighCardPerUniqueDimensions",
    "QueryString": "SELECT region, cell, silo, availability_zone, microservice_name, min(@scheduled_runtime) AS time, COUNT(*) as numDataPoints FROM raw_data.devops WHERE time BETWEEN @scheduled_runtime - 15m AND @scheduled_runtime GROUP BY region, cell, silo, availability_zone, microservice_name",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0/15 * * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "hc_unique_dimensions_pt15m",
            "TimeColumn": "time",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "cell",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "silo",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "availability_zone",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "microservice_name",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "count_multi",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "numDataPoints",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```

## Computing the variables from derived table
<a name="scheduledqueries-patterns-uniquedimvalues-fromderived"></a>

Once the scheduled computation pre-materializes the unique values in the derived table hc_unique_dimensions_pt15m, you can use the derived table to efficiently compute the unique values of the dimensions. Below are example queries for how to compute the unique values, and how you can use other variables as predicates in these unique value queries.

**Region**

```
SELECT DISTINCT region
FROM "derived"."hc_unique_dimensions_pt15m"
WHERE time > ago(1h)
ORDER BY 1
```

**Cell**

```
SELECT DISTINCT cell
FROM "derived"."hc_unique_dimensions_pt15m"
WHERE time > ago(1h)
  AND region = '${region}'
ORDER BY 1
```

**Silo**

```
SELECT DISTINCT silo
FROM "derived"."hc_unique_dimensions_pt15m"
WHERE time > ago(1h)
   AND region = '${region}' AND cell = '${cell}'
ORDER BY 1
```

**Microservice**

```
SELECT DISTINCT microservice_name
FROM "derived"."hc_unique_dimensions_pt15m"
WHERE time > ago(1h)
   AND region = '${region}' AND cell = '${cell}'
ORDER BY 1
```

**Availability Zone**

```
SELECT DISTINCT availability_zone
FROM "derived"."hc_unique_dimensions_pt15m"
WHERE time > ago(1h)
   AND region = '${region}' AND cell = '${cell}' AND silo = '${silo}'
ORDER BY 1
```

# Handling late-arriving data
<a name="scheduledqueries-patterns-latearrive"></a>

You may have scenarios where data arrives significantly late; that is, the time when the data is ingested into Timestream for LiveAnalytics is significantly delayed compared to the timestamp associated with the ingested rows. In the previous examples, you have seen how you can use the time ranges defined by the @scheduled_runtime parameter to account for some late-arriving data. However, if you have use cases where data can be delayed by hours or days, you may need a different pattern to make sure the pre-computations in your derived table are appropriately updated to reflect such late-arriving data. For general information about late-arriving data, see [Writing data (inserts and upserts)](writes.md#writes.writing-data-inserts-upserts).

The following are two different ways to address late-arriving data.
+ If you have predictable delays in your data arrival, you can use another "catch-up" scheduled computation to update your aggregates for late-arriving data.
+ If you have unpredictable delays or occasional late-arriving data, you can use manual executions to update the derived tables.

This discussion covers scenarios for late data arrival. However, the same principles apply for data corrections, where you have modified the data in your source table and you want to update the aggregates in your derived tables.

**Topics**
+ [Scheduled catch-up queries](#scheduledqueries-patterns-latearrive-schedcatchup)
+ [Manual executions for unpredictable late arriving data](#scheduledqueries-patterns-latearrive-manual)

## Scheduled catch-up queries
<a name="scheduledqueries-patterns-latearrive-schedcatchup"></a>

### Query aggregating data that arrived in time
<a name="scheduledqueries-patterns-latearrive-schedcatchup-1"></a>

The pattern below shows an automated way to update your aggregates when you have predictable delays in your data arrival. Consider the following example from earlier of a scheduled computation on real-time data. This scheduled computation refreshes the derived table once every 30 minutes and already accounts for data that is up to an hour late.

```
{
    "Name": "MultiPT30mPerHrPerTimeseriesDPCount",
    "QueryString": "SELECT region, cell, silo, availability_zone, microservice_name, instance_type, os_version, instance_name, process_name, jdk_version, bin(time, 1h) as hour, SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND @scheduled_runtime + 1h GROUP BY region, cell, silo, availability_zone, microservice_name, instance_type, os_version, instance_name, process_name, jdk_version, bin(time, 1h)",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0/30 * * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "dp_per_timeseries_per_hr",
            "TimeColumn": "hour",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "cell",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "silo",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "availability_zone",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "microservice_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "instance_type",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "os_version",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "instance_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "process_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "jdk_version",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "numDataPoints",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "numDataPoints",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```

### Catch-up query updating the aggregates for late arriving data
<a name="scheduledqueries-patterns-latearrive-schedcatchup-2"></a>

Now consider the case where your data can be delayed by about 12 hours. Below is a variant of the same query. The difference is that it computes aggregates on data that is delayed by up to 12 hours relative to when the scheduled computation is triggered. For instance, in the query below, the targeted time range is between 14 hours and 2 hours before the query is triggered. Moreover, the schedule expression cron(0 0,12 * * ? *) triggers the computation at 00:00 UTC and 12:00 UTC every day. Therefore, when the query is triggered on 2021-12-01 00:00:00, it updates aggregates in the time range 2021-11-30 10:00:00 to 2021-11-30 22:00:00. Scheduled queries use upsert semantics similar to Timestream for LiveAnalytics writes: this catch-up query updates the aggregate values with newer values if there is late-arriving data in the window, and if new aggregates are found (for example, a new grouping appears that was not present when the original scheduled computation was triggered), the new aggregate is inserted into the derived table. Similarly, when the next instance is triggered on 2021-12-01 12:00:00, it updates aggregates in the range 2021-11-30 22:00:00 to 2021-12-01 10:00:00.

```
{
    "Name": "MultiPT12HPerHrPerTimeseriesDPCountCatchUp",
    "QueryString": "SELECT region, cell, silo, availability_zone, microservice_name, instance_type, os_version, instance_name, process_name, jdk_version, bin(time, 1h) as hour, SUM(CASE WHEN measure_name = 'metrics' THEN 20 ELSE 5 END) as numDataPoints FROM raw_data.devops WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 14h AND bin(@scheduled_runtime, 1h) - 2h GROUP BY region, cell, silo, availability_zone, microservice_name, instance_type, os_version, instance_name, process_name, jdk_version, bin(time, 1h)",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0 0,12 * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "dp_per_timeseries_per_hr",
            "TimeColumn": "hour",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "cell",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "silo",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "availability_zone",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "microservice_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "instance_type",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "os_version",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "instance_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "process_name",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "jdk_version",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "numDataPoints",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "numDataPoints",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```

The preceding example assumes that your late arrival is bounded to 12 hours and that it is acceptable to update the derived table once every 12 hours for data arriving later than the real-time window. You can adapt this pattern to update your derived table once every hour so that it reflects late-arriving data sooner. Similarly, you can adapt the time range to be longer than 12 hours (for example, a day, a week, or more) to handle predictable late-arriving data.
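The window arithmetic used by the catch-up query can also be sketched in code. This illustrative helper reproduces the `bin(@scheduled_runtime, 1h) - 14h` to `bin(@scheduled_runtime, 1h) - 2h` range from the configuration above; the parameter names are assumptions for the sketch.

```python
from datetime import datetime, timedelta, timezone

def catchup_window(scheduled_runtime, delay_hours=14, lag_hours=2):
    """Time range targeted by the catch-up query: mirrors
    bin(@scheduled_runtime, 1h) - delay to bin(@scheduled_runtime, 1h) - lag."""
    hour_bin = scheduled_runtime.replace(minute=0, second=0, microsecond=0)
    return (hour_bin - timedelta(hours=delay_hours),
            hour_bin - timedelta(hours=lag_hours))

start, end = catchup_window(datetime(2021, 12, 1, 0, 0, tzinfo=timezone.utc))
# start: 2021-11-30 10:00 UTC, end: 2021-11-30 22:00 UTC
```

Adjusting `delay_hours` and `lag_hours` corresponds to widening or narrowing the catch-up range in the query's time predicate.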

## Manual executions for unpredictable late arriving data
<a name="scheduledqueries-patterns-latearrive-manual"></a>

There can be instances where you have unpredictable late-arriving data, or you changed the source data and updated some values after the fact. In all such cases, you can manually trigger scheduled queries to update the derived table. Below is an example of how you can achieve this.

Assume that you have the use case where your computation writes to the derived table dp_per_timeseries_per_hr, and your base data in the table devops was updated in the time range 2021-11-30 23:00:00 - 2021-12-01 00:00:00. There are two different scheduled queries that can be used to update this derived table: MultiPT30mPerHrPerTimeseriesDPCount and MultiPT12HPerHrPerTimeseriesDPCountCatchUp. Each scheduled computation you create in Timestream for LiveAnalytics has a unique ARN, which you obtain when you create the computation or when you perform a list operation. You can use the ARN for the computation and a value for the @scheduled_runtime parameter taken by the query to perform this operation. 

Assume that the computation for MultiPT30mPerHrPerTimeseriesDPCount has an ARN arn_1 and you want to use this computation to update the derived table. Because the preceding scheduled computation updates aggregates from 1 hour before to 1 hour after the @scheduled_runtime value, you can cover the time range of the update (2021-11-30 23:00:00 - 2021-12-01 00:00:00) using a value of 2021-12-01 00:00:00 for the @scheduled_runtime parameter. You can use the ExecuteScheduledQuery API to pass the ARN of this computation and the time parameter value in epoch seconds (in UTC) to achieve this. Below is an example using the AWS CLI; you can follow the same pattern using any of the SDKs supported by Timestream for LiveAnalytics.

```
aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638316800 --profile profile --region us-east-1
```

In the previous example, profile is the AWS profile that has the appropriate privileges to make this API call, and 1638316800 corresponds to the epoch seconds for 2021-12-01 00:00:00 UTC. This manual trigger behaves almost like the automated trigger, as if the system had triggered this invocation at the desired time.
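To compute the epoch-seconds value for --invocation-time yourself, you can convert a UTC timestamp as follows; a minimal sketch using only the Python standard library.

```python
from datetime import datetime, timezone

# 2021-12-01 00:00:00 UTC expressed as epoch seconds for --invocation-time
invocation_time = int(datetime(2021, 12, 1, 0, 0, 0, tzinfo=timezone.utc).timestamp())
print(invocation_time)  # 1638316800
```

Always construct the timestamp with an explicit UTC timezone; a naive datetime would be interpreted in the local timezone and yield the wrong epoch value.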

If you had an update over a longer time period, say the base data was updated for 2021-11-30 23:00:00 - 2021-12-01 11:00:00, then you can trigger the preceding query multiple times to cover this entire time range. For instance, you could perform six different executions as follows.

```
aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638316800 --profile profile --region us-east-1

aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638324000 --profile profile --region us-east-1

aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638331200 --profile profile --region us-east-1

aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638338400 --profile profile --region us-east-1

aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638345600 --profile profile --region us-east-1

aws timestream-query execute-scheduled-query --scheduled-query-arn arn_1 --invocation-time 1638352800 --profile profile --region us-east-1
```

The previous six commands correspond to the scheduled computation being invoked at 2021-12-01 00:00:00, 2021-12-01 02:00:00, 2021-12-01 04:00:00, 2021-12-01 06:00:00, 2021-12-01 08:00:00, and 2021-12-01 10:00:00, respectively.
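The six invocation timestamps above are spaced one window (2 hours) apart. A small sketch that generates such a sequence, which you could then feed to the CLI or SDK calls:

```python
def invocation_times(start_epoch, window_hours, count):
    """Epoch-second invocation timestamps spaced one window apart."""
    step = window_hours * 3600
    return [start_epoch + i * step for i in range(count)]

times = invocation_times(1638316800, 2, 6)  # starts at 2021-12-01 00:00:00 UTC
```

Spacing the timestamps by exactly one window width is what keeps the resulting time ranges non-overlapping.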

Alternatively, you can use the computation MultiPT12HPerHrPerTimeseriesDPCountCatchUp, triggered at 2021-12-01 13:00:00, in a single execution to update the aggregates for the entire 12-hour time range. For instance, if arn_2 is the ARN for that computation, you can run the following command from the CLI.

```
aws timestream-query execute-scheduled-query --scheduled-query-arn arn_2 --invocation-time 1638363600 --profile profile --region us-east-1
```

It is worth noting that for a manual trigger, the timestamp you use for the invocation-time parameter does not need to be aligned with the automated trigger timestamps. For instance, in the previous example, you triggered the computation at time 2021-12-01 13:00:00 even though the automated schedule only triggers at timestamps such as 2021-12-01 10:00:00, 2021-12-01 12:00:00, and 2021-12-02 00:00:00. Timestream for LiveAnalytics provides you with the flexibility to trigger it with appropriate values as needed for your manual operations.

Following are a few important considerations when using the ExecuteScheduledQuery API.
+ If you are triggering multiple of these invocations, make sure the invocations do not generate results in overlapping time ranges. For instance, in the previous examples there were six invocations. Each invocation covers 2 hours of time range, so the invocation timestamps were spaced two hours apart to avoid any overlap in the updates. This ensures that the data in the derived table ends up in a state that matches the aggregates from the source table. If you cannot ensure non-overlapping time ranges, then make sure the executions are triggered sequentially, one after the other. If you trigger multiple executions concurrently that overlap in their time ranges, you can see trigger failures with version conflicts in the error reports for those executions. Results generated by a scheduled query invocation are assigned a version based on when the invocation was triggered. Therefore, rows generated by newer invocations have higher versions, and a higher version record can overwrite a lower version record. For automatically triggered scheduled queries, Timestream for LiveAnalytics manages the schedules so that you don't see these issues even if subsequent invocations have overlapping time ranges.
+ As noted earlier, you can trigger the invocations with any timestamp value for @scheduled_runtime. So it is your responsibility to set the values appropriately so that the correct time ranges are updated in the derived table, corresponding to the ranges where data was updated in the source table.
+ You can also use these manual triggers for scheduled queries that are in the DISABLED state. This allows you to define special queries that do not run on an automated schedule because they are in the DISABLED state. Instead, you use manual triggers on them to manage data corrections or late-arrival use cases.
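The non-overlap requirement in the first consideration can be checked programmatically before triggering concurrent executions. A minimal sketch over (start, end) time ranges:

```python
def windows_overlap(a, b):
    """True if two half-open (start, end) time ranges overlap."""
    return a[0] < b[1] and b[0] < a[1]

def all_disjoint(windows):
    """True if no pair of (start, end) windows overlaps."""
    ordered = sorted(windows)
    return all(ordered[i][1] <= ordered[i + 1][0] for i in range(len(ordered) - 1))
```

If `all_disjoint` returns False for your planned invocation windows, fall back to running the executions sequentially, as described above.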

# Back-filling historical pre-computations
<a name="scheduledqueries-patterns-backfilling"></a>

When you create a scheduled computation, Timestream for LiveAnalytics manages the executions of the queries going forward, where the refresh is governed by the schedule expression you provide. Depending on how much historical data your source table contains, you may want to update your derived table with aggregates corresponding to the historical data. You can use the preceding manual-trigger logic to back-fill the historical aggregates.

For instance, consider the derived table per_timeseries_lastpoint_pt1d, where the scheduled computation updates once a day for the past day. If your source table has a year of data, you can use the ARN for this scheduled computation and trigger it manually for every day up to a year back so that the derived table has all the historical aggregates populated. Note that all the caveats for manual triggers apply here. Moreover, if the derived table is set up such that historical ingestion will write to its magnetic store, be aware of the [best practices](https://docs.aws.amazon.com/timestream/latest/developerguide/best-practices.html) and [limits for writes](https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html) to the magnetic store.

# Scheduled query examples
<a name="scheduledqueries-examples"></a>

This section contains examples of how you can use Timestream for LiveAnalytics's Scheduled Queries to optimize costs and dashboard load times when visualizing fleet-wide statistics to effectively monitor your fleet of devices. Scheduled Queries in Timestream for LiveAnalytics allow you to express your queries using the full SQL surface area of Timestream for LiveAnalytics. Your query can include one or more source tables, perform aggregations or any other query allowed by Timestream for LiveAnalytics's SQL language, and then store the results of the query in another destination table in Timestream for LiveAnalytics.

This section refers to the target table of a scheduled query as a *derived table*.

As an example, we will use a DevOps application where you are monitoring a large fleet of servers deployed across multiple deployment units (such as regions, cells, and silos) and multiple microservices, and you are tracking fleet-wide statistics using Timestream for LiveAnalytics. The example schema we will use is described in [Scheduled Queries Sample Schema](https://docs.aws.amazon.com/timestream/latest/developerguide/scheduledqueries-common-schema-example.html). 

The following scenarios will be described.
+ How to convert a dashboard that plots aggregated statistics from the raw data you ingest into Timestream for LiveAnalytics into a scheduled query, and then how to use your pre-computed aggregates to create a new dashboard showing aggregate statistics.
+ How to combine scheduled queries to get an aggregate view and the raw granular data, to drill down into details. This allows you to store and analyze the raw data while optimizing your common fleet-wide operations using scheduled queries.
+ How to optimize costs using scheduled queries by finding which aggregates are used in multiple dashboards and have the same scheduled query populate multiple panels in the same or multiple dashboards.

**Topics**
+ [Converting an aggregate dashboard to scheduled query](scheduledqueries-example1.md)
+ [Using scheduled queries and raw data for drill downs](scheduledqueries-example2.md)
+ [Optimizing costs by sharing scheduled query across dashboards](scheduledqueries-example3.md)
+ [Comparing a query on a base table with a query of scheduled query results](scheduledqueries-example4-clickstream.md)

# Converting an aggregate dashboard to scheduled query
<a name="scheduledqueries-example1"></a>

Assume you are computing fleet-wide statistics, such as host counts, across the five microservices and the six regions where your service is deployed. From the snapshot below, you can see there are 500K servers emitting metrics, and some of the bigger regions (e.g., us-east-1) have >200K servers.

Computing these aggregates, which involves counting distinct instance names over hundreds of gigabytes of data, can result in query latency of tens of seconds, in addition to the cost of scanning the data.

![\[Instance counts for microservices: apollo and zeus 150k, hercules 100k, athena and demeter 50k each.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex1_img1.png)


**Original dashboard query**

The aggregate shown in the dashboard panel is computed from raw data using the query below. The query uses multiple SQL constructs, such as distinct counts and multiple aggregation functions.

```
SELECT CASE WHEN microservice_name = 'apollo' THEN num_instances ELSE NULL END AS apollo,
    CASE WHEN microservice_name = 'athena' THEN num_instances ELSE NULL END AS athena,
    CASE WHEN microservice_name = 'demeter' THEN num_instances ELSE NULL END AS demeter,
    CASE WHEN microservice_name = 'hercules' THEN num_instances ELSE NULL END AS hercules,
    CASE WHEN microservice_name = 'zeus' THEN num_instances ELSE NULL END AS zeus
FROM (
    SELECT microservice_name, SUM(num_instances) AS num_instances
    FROM (
        SELECT microservice_name, COUNT(DISTINCT instance_name) as num_instances
        FROM "raw_data"."devops"
        WHERE time BETWEEN from_milliseconds(1636526171043) AND from_milliseconds(1636612571043)
            AND measure_name = 'metrics'
        GROUP BY region, cell, silo, availability_zone, microservice_name
    )
    GROUP BY microservice_name
)
```

**Converting to a scheduled query**

The previous query can be converted into a scheduled query as follows. You first compute the distinct host names within a given deployment in a region, cell, silo, availability zone, and microservice. Then you add up the hosts to compute a per-hour, per-microservice host count. By using the `@scheduled_runtime` parameter supported by scheduled queries, you can recompute the counts for the past hour each time the query is invoked. The `bin(@scheduled_runtime, 1h)` in the `WHERE` clause of the inner query ensures that even if the query is scheduled at a time in the middle of the hour, you still get the data for the full hour.

Even though the query computes hourly aggregates, as you will see in the scheduled computation configuration, it is set up to refresh every half hour so that you get updates in your derived table sooner. You can tune that based on your freshness requirements, e.g., recompute the aggregates every 15 minutes or recompute them only at hour boundaries.

```
SELECT microservice_name, hour, SUM(num_instances) AS num_instances    
FROM (
        SELECT microservice_name, bin(time, 1h) AS hour, 
            COUNT(DISTINCT instance_name) as num_instances
       FROM raw_data.devops        
       WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND @scheduled_runtime        
           AND measure_name = 'metrics'        
       GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h)    
     )    
GROUP BY microservice_name, hour
```
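To see why the `bin(@scheduled_runtime, 1h) - 1h` lower bound always covers a full hour, here is a minimal Python sketch of the window computation; the invocation times are hypothetical, and `bin_1h` only mimics the Timestream `bin(time, 1h)` function.

```python
from datetime import datetime, timedelta

def bin_1h(t: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour, like bin(time, 1h)."""
    return t.replace(minute=0, second=0, microsecond=0)

def query_window(scheduled_runtime: datetime):
    """Time range scanned by the scheduled query:
    BETWEEN bin(@scheduled_runtime, 1h) - 1h AND @scheduled_runtime."""
    start = bin_1h(scheduled_runtime) - timedelta(hours=1)
    return start, scheduled_runtime

# Even for a run in the middle of an hour, the window starts at the
# beginning of the previous hour, so that hour is covered in full.
start, end = query_window(datetime(2021, 11, 10, 9, 30))
print(start, end)  # 2021-11-10 08:00:00 2021-11-10 09:30:00
```

Any invocation between 09:00 and 10:00 produces a window starting at 08:00, which is what lets a mid-hour run still recompute the complete previous hour.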

```
{
    "Name": "MultiPT30mHostCountMicroservicePerHr",
    "QueryString": "SELECT microservice_name, hour, SUM(num_instances) AS num_instances    FROM (        SELECT microservice_name, bin(time, 1h) AS hour, COUNT(DISTINCT instance_name) as num_instances        FROM raw_data.devops        WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND @scheduled_runtime            AND measure_name = 'metrics'        GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h)    )    GROUP BY microservice_name, hour",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0/30 * * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "host_count_pt1h",
            "TimeColumn": "hour",
            "DimensionMappings": [
                {
                    "Name": "microservice_name",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "num_instances",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "num_instances",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```
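If you create scheduled queries programmatically, a definition like the one above maps directly onto the `CreateScheduledQuery` API parameters. The Python sketch below only builds the request dictionary; the ARNs and bucket name are placeholders, and the SQL string is truncated for brevity (use the full query from the definition above).

```python
# Sketch: request parameters mirroring the JSON definition above. With boto3
# installed and credentials configured, you would pass them to the client:
#   boto3.client("timestream-query").create_scheduled_query(**params)
params = {
    "Name": "MultiPT30mHostCountMicroservicePerHr",
    # Truncated here; use the full SQL string from the definition above.
    "QueryString": "SELECT microservice_name, hour, SUM(num_instances) AS num_instances FROM ...",
    "ScheduleConfiguration": {"ScheduleExpression": "cron(0/30 * * * ? *)"},
    "NotificationConfiguration": {
        # Placeholder SNS topic ARN.
        "SnsConfiguration": {"TopicArn": "arn:aws:sns:us-east-1:111122223333:scheduled-query-topic"}
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "host_count_pt1h",
            "TimeColumn": "hour",
            "DimensionMappings": [
                {"Name": "microservice_name", "DimensionValueType": "VARCHAR"}
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "num_instances",
                "MultiMeasureAttributeMappings": [
                    {"SourceColumn": "num_instances", "MeasureValueType": "BIGINT"}
                ],
            },
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration": {
            # Placeholder bucket name.
            "BucketName": "amzn-s3-demo-bucket",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3",
        }
    },
    # Placeholder IAM role ARN that the scheduled query assumes.
    "ScheduledQueryExecutionRoleArn": "arn:aws:iam::111122223333:role/ScheduledQueryRole",
}
print(sorted(params))
```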

**Using the pre-computed results in a new dashboard**

You will now see how to create your aggregate view dashboard using the derived table from the scheduled query you created. From the dashboard snapshot, you can also validate that the aggregates computed from the derived table and the base table match. Once you create the dashboards using the derived tables, you will notice significantly faster load times and lower costs compared to computing these aggregates from the raw data. Below is a snapshot of the dashboard using pre-computed data, and the query used to render this panel from the table "derived"."host_count_pt1h". Note that the structure of the query is very similar to the query used in the dashboard on raw data, except that it is using the derived table, which already computes the distinct counts that this query aggregates. 

![\[Instance count by microservice showing values for apollo, athena, demeter, hercules, and zeus.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex1_img2.png)


```
SELECT CASE WHEN microservice_name = 'apollo' THEN num_instances ELSE NULL END AS apollo,
    CASE WHEN microservice_name = 'athena' THEN num_instances ELSE NULL END AS athena,
    CASE WHEN microservice_name = 'demeter' THEN num_instances ELSE NULL END AS demeter,
    CASE WHEN microservice_name = 'hercules' THEN num_instances ELSE NULL END AS hercules,
    CASE WHEN microservice_name = 'zeus' THEN num_instances ELSE NULL END AS zeus
FROM (
    SELECT microservice_name, AVG(num_instances) AS num_instances
    FROM (
        SELECT microservice_name, bin(time, 1h), SUM(num_instances) as num_instances
        FROM "derived"."host_count_pt1h"
        WHERE time BETWEEN from_milliseconds(1636567785421) AND from_milliseconds(1636654185421)
            AND measure_name = 'num_instances'
        GROUP BY microservice_name, bin(time, 1h)
    )
    GROUP BY microservice_name
)
```

# Using scheduled queries and raw data for drill downs
<a name="scheduledqueries-example2"></a>

You can use the aggregated statistics across your fleet to identify areas that need drill downs and then use the raw data to drill down into granular data to get deeper insights.

In this example, you will see how you can use the aggregate dashboard to identify a deployment (a deployment is a given microservice within a given region, cell, silo, and availability zone) that seems to have higher CPU utilization compared to other deployments. You can then drill down to get a better understanding using the raw data. Since these drill downs might be infrequent and only access data relevant to the deployment, you can use the raw data for this analysis and do not need scheduled queries. 

**Per deployment drill down**

The dashboard below provides a drill down into more granular, server-level statistics within a given deployment. To help you drill down into the different parts of your fleet, this dashboard uses variables such as region, cell, silo, microservice, and availability_zone. It then shows some aggregate statistics for that deployment.

![\[Dashboard showing deployment statistics with filters for region, cell, silo, and other parameters.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex2_img1.png)


![\[CPU distribution graph showing consistent patterns for avg, p90, p95, and p99 values over 24 hours.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex2_img2.png)


In the query below, you can see that the values chosen in the variable drop-downs are used as predicates in the `WHERE` clause of the query, which allows you to focus only on the data for the deployment. The panel then plots the aggregated CPU metrics for instances in that deployment. You can use the raw data to perform this drill down with interactive query latency to derive deeper insights.

```
SELECT bin(time, 5m) as minute,
    ROUND(AVG(cpu_user), 2) AS avg_value,
    ROUND(APPROX_PERCENTILE(cpu_user, 0.9), 2) AS p90_value,
    ROUND(APPROX_PERCENTILE(cpu_user, 0.95), 2) AS p95_value,
    ROUND(APPROX_PERCENTILE(cpu_user, 0.99), 2) AS p99_value
FROM "raw_data"."devops"
WHERE time BETWEEN from_milliseconds(1636527099476) AND from_milliseconds(1636613499476)
    AND region = 'eu-west-1'
    AND cell = 'eu-west-1-cell-10'
    AND silo = 'eu-west-1-cell-10-silo-1'
    AND microservice_name = 'demeter'
    AND availability_zone = 'eu-west-1-3'
    AND measure_name = 'metrics'
GROUP BY bin(time, 5m)
ORDER BY 1
```

**Instance-level statistics**

This dashboard also computes a variable that lists the servers/instances with high CPU utilization, sorted in descending order of utilization. The query used to compute this variable is shown below.

```
WITH microservice_cell_avg AS (
    SELECT AVG(cpu_user) AS microservice_avg_metric
    FROM "raw_data"."devops"
    WHERE $__timeFilter
        AND measure_name = 'metrics'
        AND region = '${region}'
        AND cell = '${cell}'
        AND silo = '${silo}'
        AND availability_zone = '${availability_zone}'
        AND microservice_name = '${microservice}'
), instance_avg AS (
    SELECT instance_name,
        AVG(cpu_user) AS instance_avg_metric
    FROM "raw_data"."devops"
    WHERE $__timeFilter
        AND measure_name = 'metrics'
        AND region = '${region}'
        AND cell = '${cell}'
        AND silo = '${silo}'
        AND microservice_name = '${microservice}'
        AND availability_zone = '${availability_zone}'
    GROUP BY availability_zone, instance_name
) 
SELECT i.instance_name
FROM instance_avg i CROSS JOIN microservice_cell_avg m 
WHERE i.instance_avg_metric > (1 + ${utilization_threshold}) * m.microservice_avg_metric
ORDER BY i.instance_avg_metric DESC
```
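The `${...}` tokens in the query above are Grafana template variables that are substituted with the dashboard's selected values before the query is sent to Timestream for LiveAnalytics. A minimal Python sketch of that substitution, assuming illustrative variable values rather than any Grafana API:

```python
def substitute(query: str, variables: dict) -> str:
    """Replace each ${name} token with the dashboard's selected value."""
    for name, value in variables.items():
        query = query.replace("${" + name + "}", str(value))
    return query

# Illustrative template; the variable names mirror the dashboard variables
# used in the query above.
template = ("SELECT instance_name FROM \"raw_data\".\"devops\" "
            "WHERE region = '${region}' AND microservice_name = '${microservice}' "
            "AND instance_avg_metric > (1 + ${utilization_threshold}) * microservice_avg_metric")

print(substitute(template, {
    "region": "eu-west-1",
    "microservice": "demeter",
    "utilization_threshold": 0.2,
}))
```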

In the preceding query, the variable is dynamically recalculated depending on the values chosen for the other variables. Once the variable is populated for a deployment, you can pick individual instances from the list to further visualize the metrics from that instance. You can pick the different instances from the drop down of the instance names as seen from the snapshot below.

![\[List of Amazon Web Services (AWS) resource identifiers for Demeter instances in eu-west-1 region.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex2_img3.png)


![\[Dashboard showing CPU utilization, memory usage, GC pause events, and disk I/O metrics for an AWS instance.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex2_img4.png)


The preceding panels show the statistics for the selected instance, and below are the queries used to fetch these statistics.

```
SELECT BIN(time, 30m) AS time_bin, 
    AVG(cpu_user) AS avg_cpu,
    ROUND(APPROX_PERCENTILE(cpu_user, 0.99), 2) as p99_cpu
FROM "raw_data"."devops"
WHERE time BETWEEN from_milliseconds(1636527099477) AND from_milliseconds(1636613499477)
    AND measure_name = 'metrics'
    AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' 
    AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' 
    AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
GROUP BY BIN(time, 30m)
ORDER BY time_bin desc
```

```
SELECT BIN(time, 30m) AS time_bin, 
    AVG(memory_used) AS avg_memory,
    ROUND(APPROX_PERCENTILE(memory_used, 0.99), 2) as p99_memory
FROM "raw_data"."devops"
WHERE time BETWEEN from_milliseconds(1636527099477) AND from_milliseconds(1636613499477)
    AND measure_name = 'metrics'
    AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' 
    AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' 
    AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
GROUP BY BIN(time, 30m)
ORDER BY time_bin desc
```

```
SELECT COUNT(gc_pause)
FROM "raw_data"."devops"
WHERE time BETWEEN from_milliseconds(1636527099477) AND from_milliseconds(1636613499478)
    AND measure_name = 'events'
    AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' 
    AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' 
    AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
```

```
SELECT avg(gc_pause) as avg, round(approx_percentile(gc_pause, 0.99), 2) as p99
FROM "raw_data"."devops"
WHERE time BETWEEN from_milliseconds(1636527099478) AND from_milliseconds(1636613499478)
    AND measure_name = 'events'
    AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' 
    AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' 
    AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
```

```
SELECT BIN(time, 30m) AS time_bin, 
    AVG(disk_io_reads) AS avg,
    ROUND(APPROX_PERCENTILE(disk_io_reads, 0.99), 2) as p99
FROM "raw_data"."devops"
WHERE time BETWEEN from_milliseconds(1636527099478) AND from_milliseconds(1636613499478)
    AND measure_name = 'metrics'
    AND region = 'eu-west-1' AND cell = 'eu-west-1-cell-10' AND silo = 'eu-west-1-cell-10-silo-1' 
    AND availability_zone = 'eu-west-1-3' AND microservice_name = 'demeter' 
    AND instance_name = 'i-zaZswmJk-demeter-eu-west-1-cell-10-silo-1-00000272.amazonaws.com'
GROUP BY BIN(time, 30m)
ORDER BY time_bin desc
```

# Optimizing costs by sharing scheduled query across dashboards
<a name="scheduledqueries-example3"></a>

In this example, we will see a scenario where multiple dashboard panels display variations of similar information (finding high CPU hosts and the fraction of the fleet with high CPU utilization), and how you can use the same scheduled query to pre-compute results that populate multiple panels. This reuse further optimizes your costs: instead of using a different scheduled query for each panel, you use only one. 

## Dashboard panels with raw data
<a name="scheduledqueries-example3-dashboard-raw"></a>

**CPU utilization per region per microservice**

The first panel identifies the instances whose average CPU utilization is a threshold above or below the average CPU utilization of their deployment within a region, cell, silo, availability zone, and microservice. It then ranks the regions and microservices with the highest percentage of high-utilization hosts. This helps you identify how hot the servers of a specific deployment are running, and then subsequently drill down to better understand the issues. 

The query for the panel demonstrates the flexibility of Timestream for LiveAnalytics's SQL support to perform complex analytical tasks with common table expressions, window functions, joins, and so on. 

![\[Table showing CPU utilization data for microservices across different regions.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex3_img1.png)


*Query*:

```
WITH microservice_cell_avg AS (
    SELECT region, cell, silo, availability_zone, microservice_name, AVG(cpu_user) AS microservice_avg_metric
    FROM "raw_data"."devops"
    WHERE time BETWEEN from_milliseconds(1636526593876) AND from_milliseconds(1636612993876)
        AND measure_name = 'metrics'
    GROUP BY region, cell, silo, availability_zone, microservice_name
), instance_avg AS (
    SELECT region, cell, silo, availability_zone, microservice_name, instance_name,
        AVG(cpu_user) AS instance_avg_metric
    FROM "raw_data"."devops"
    WHERE time BETWEEN from_milliseconds(1636526593876) AND from_milliseconds(1636612993876)
        AND measure_name = 'metrics'
    GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name
), instances_above_threshold AS (
  SELECT i.*,
    CASE WHEN i.instance_avg_metric > (1 + 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS high_utilization,
    CASE WHEN i.instance_avg_metric < (1 - 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS low_utilization
  FROM instance_avg i INNER JOIN microservice_cell_avg m 
    ON i.region = m.region AND i.cell = m.cell AND i.silo = m.silo AND i.availability_zone = m.availability_zone
      AND m.microservice_name = i.microservice_name
), per_deployment_high AS (
SELECT region, microservice_name, COUNT(*) AS num_hosts, SUM(high_utilization) AS high_utilization_hosts, SUM(low_utilization) AS low_utilization_hosts,
    ROUND(SUM(high_utilization) * 100.0 / COUNT(*), 0) AS percent_high_utilization_hosts,
    ROUND(SUM(low_utilization) * 100.0 / COUNT(*), 0) AS percent_low_utilization_hosts
FROM instances_above_threshold
GROUP BY region, microservice_name
), per_region_ranked AS (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY region ORDER BY percent_high_utilization_hosts DESC, high_utilization_hosts DESC) AS rank
    FROM per_deployment_high
)
SELECT *
FROM per_region_ranked
WHERE rank <= 2
ORDER BY percent_high_utilization_hosts desc, rank asc
```
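The CASE expressions above flag an instance as high (or low) utilization when its average is more than 20% above (or below) its deployment's average. A small Python sketch of the same classification, using invented averages for illustration:

```python
def classify(instance_avg: float, deployment_avg: float, threshold: float = 0.2):
    """Mirror the CASE WHEN logic above: flag instances more than
    `threshold` (20%) above or below the deployment average."""
    high = 1 if instance_avg > (1 + threshold) * deployment_avg else 0
    low = 1 if instance_avg < (1 - threshold) * deployment_avg else 0
    return high, low

# Deployment average CPU is 50%; instances at 65%, 35%, and 55%.
print([classify(v, 50.0) for v in (65.0, 35.0, 55.0)])
# [(1, 0), (0, 1), (0, 0)]
```

Summing the `high` flags per deployment, as the outer query does, yields the count and percentage of high-utilization hosts.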

**Drill down into a microservice to find hot spots**

The next dashboard allows you to drill deeper into one of the microservices to find out which specific region, cell, and silo for that microservice is running what fraction of its fleet at higher CPU utilization. For instance, in the fleet-wide dashboard you saw the microservice demeter show up in the top few ranked positions, so in this dashboard you want to drill deeper into that microservice. 

This dashboard uses a variable to pick the microservice to drill down into, and the values of the variable are populated using unique values of the dimension. Once you pick the microservice, the rest of the dashboard refreshes. 

As you see below, the first panel plots the percentage of hosts in a deployment (a region, cell, and silo for a microservice) over time, along with the corresponding query used to plot it. This plot itself identifies a specific deployment with a higher percentage of high-CPU hosts.

![\[Dropdown menu showing "microservice", "demeter", "topk", and "2" options.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex3_img2.png)


![\[Graph showing deployments with high CPU utilization over time, with multiple flat lines.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex3_img3.png)


*Query*:

```
WITH microservice_cell_avg AS (
    SELECT region, cell, silo, availability_zone, microservice_name, bin(time, 1h) as hour, AVG(cpu_user) AS microservice_avg_metric
    FROM "raw_data"."devops"
    WHERE time BETWEEN from_milliseconds(1636526898831) AND from_milliseconds(1636613298831)
        AND measure_name = 'metrics'
        AND microservice_name = 'demeter'
    GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h)
), instance_avg AS (
    SELECT region, cell, silo, availability_zone, microservice_name, instance_name, bin(time, 1h) as hour,
        AVG(cpu_user) AS instance_avg_metric
    FROM "raw_data"."devops"
    WHERE time BETWEEN from_milliseconds(1636526898831) AND from_milliseconds(1636613298831)
        AND measure_name = 'metrics'
        AND microservice_name = 'demeter'
    GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, bin(time, 1h)
), instances_above_threshold AS (
  SELECT i.*,
    CASE WHEN i.instance_avg_metric > (1 + 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS high_utilization
  FROM instance_avg i INNER JOIN microservice_cell_avg m 
    ON i.region = m.region AND i.cell = m.cell AND i.silo = m.silo AND i.availability_zone = m.availability_zone
      AND m.microservice_name = i.microservice_name AND m.hour = i.hour
), high_utilization_percent AS (
    SELECT region, cell, silo, microservice_name, hour, COUNT(*) AS num_hosts, SUM(high_utilization) AS high_utilization_hosts,
        ROUND(SUM(high_utilization) * 100.0 / COUNT(*), 0) AS percent_high_utilization_hosts
    FROM instances_above_threshold
    GROUP BY region, cell, silo, microservice_name, hour
), high_utilization_ranked AS (
    SELECT region, cell, silo, microservice_name, 
        DENSE_RANK() OVER (PARTITION BY region ORDER BY AVG(percent_high_utilization_hosts) desc, AVG(high_utilization_hosts) desc) AS rank
    FROM high_utilization_percent 
    GROUP BY region, cell, silo, microservice_name
)
SELECT hup.silo, CREATE_TIME_SERIES(hour, hup.percent_high_utilization_hosts) AS percent_high_utilization_hosts
FROM high_utilization_percent hup INNER JOIN high_utilization_ranked hur
    ON hup.region = hur.region AND hup.cell = hur.cell AND hup.silo = hur.silo AND hup.microservice_name = hur.microservice_name
WHERE rank <= 2
GROUP BY hup.region, hup.cell, hup.silo
ORDER BY hup.silo
```

## Converting into a single scheduled query enabling reuse
<a name="scheduledqueries-example3-query-reuse"></a>

It is important to note that a similar computation is done across the different panels of the two dashboards. You could define a separate scheduled query for each panel. Here you will see how you can further optimize your costs by defining one scheduled query whose results can be used to render all three panels. 

Following is the query that captures the aggregates computed and used for all the different panels. You will observe several important aspects in the definition of this scheduled query.
+ The flexibility and power of the SQL surface area supported by scheduled queries, where you can use common table expressions, joins, case statements, and so on. 
+ You can use one scheduled query to compute the statistics at a finer granularity than a specific dashboard might need, and for all values that a dashboard might use for different variables. For instance, the aggregates are computed across region, cell, silo, and microservice. Therefore, you can combine these to create region-level, or region- and microservice-level aggregates. Similarly, the same query computes the aggregates for all regions, cells, silos, and microservices, which allows you to apply filters on these columns to obtain the aggregates for a subset of the values. For instance, you can compute the aggregates for any one region, say us-east-1, or any one microservice, say demeter, or drill down into a specific deployment within a region, cell, silo, and microservice. This approach further optimizes the costs of maintaining the pre-computed aggregates. 
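To illustrate the roll-up idea, here is a hedged Python sketch: starting from per-(region, cell, silo, microservice) host counts like those the scheduled query writes, you can derive region-level or region-and-microservice-level totals without touching the raw data. The sample rows are invented for illustration.

```python
from collections import defaultdict

# Invented sample rows shaped like the derived table:
# (region, cell, silo, microservice_name, num_hosts, high_utilization_hosts)
rows = [
    ("us-east-1", "cell-1", "silo-1", "demeter", 100, 12),
    ("us-east-1", "cell-1", "silo-2", "demeter", 80, 4),
    ("us-east-1", "cell-2", "silo-1", "apollo", 120, 6),
    ("eu-west-1", "cell-1", "silo-1", "demeter", 90, 18),
]

def rollup(rows, key):
    """Sum host counts grouped by a coarser key: the same fine-grained
    data serves panels at different aggregation levels."""
    totals = defaultdict(lambda: [0, 0])
    for region, cell, silo, microservice, hosts, high in rows:
        k = key(region, cell, silo, microservice)
        totals[k][0] += hosts
        totals[k][1] += high
    return dict(totals)

by_region = rollup(rows, lambda r, c, s, m: r)
by_region_service = rollup(rows, lambda r, c, s, m: (r, m))
print(by_region)  # {'us-east-1': [300, 22], 'eu-west-1': [90, 18]}
```

The same pattern applies to filtering: because the derived table keeps all dimension values, a panel can restrict to one region or one microservice without a dedicated scheduled query.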

```
WITH microservice_cell_avg AS (
    SELECT region, cell, silo, availability_zone, microservice_name, bin(time, 1h) as hour, AVG(cpu_user) AS microservice_avg_metric    
    FROM raw_data.devops    
    WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND bin(@scheduled_runtime, 1h) + 1h
        AND measure_name = 'metrics'    
    GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h)    
), instance_avg AS (
    SELECT region, cell, silo, availability_zone, microservice_name, instance_name, bin(time, 1h) as hour,
        AVG(cpu_user) AS instance_avg_metric    
   FROM raw_data.devops    
   WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND bin(@scheduled_runtime, 1h) + 1h
       AND measure_name = 'metrics'    
   GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, bin(time, 1h)    
), instances_above_threshold AS (
    SELECT i.*,
        CASE WHEN i.instance_avg_metric > (1 + 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS high_utilization,    
        CASE WHEN i.instance_avg_metric < (1 - 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS low_utilization    
   FROM instance_avg i INNER JOIN microservice_cell_avg m
       ON i.region = m.region AND i.cell = m.cell AND i.silo = m.silo AND i.availability_zone = m.availability_zone
           AND m.microservice_name = i.microservice_name AND m.hour = i.hour    
)     
SELECT region, cell, silo, microservice_name, hour,
     COUNT(*) AS num_hosts, SUM(high_utilization) AS high_utilization_hosts, SUM(low_utilization) AS low_utilization_hosts    
FROM instances_above_threshold GROUP BY region, cell, silo, microservice_name, hour
```

The following is the scheduled query definition for the previous query. Per the schedule expression, it is configured to refresh every 30 minutes and refreshes the data up to an hour back, again using the bin(@scheduled_runtime, 1h) construct to get the full hour's events. Depending on your application's freshness requirements, you can configure it to refresh more or less frequently. By using WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND bin(@scheduled_runtime, 1h) + 1h, you can ensure that even if you refresh once every 15 minutes, you will get the full hour's data for the current hour and the previous hour. 

Later on, you will see how the three panels use these aggregates, written to the table deployment_cpu_stats_per_hr, to visualize the metrics relevant to each panel. 

```
{
    "Name": "MultiPT30mHighCpuDeploymentsPerHr",
    "QueryString": "WITH microservice_cell_avg AS (    SELECT region, cell, silo, availability_zone, microservice_name, bin(time, 1h) as hour, AVG(cpu_user) AS microservice_avg_metric    FROM raw_data.devops    WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND bin(@scheduled_runtime, 1h) + 1h    AND measure_name = 'metrics'    GROUP BY region, cell, silo, availability_zone, microservice_name, bin(time, 1h)    ), instance_avg AS (    SELECT region, cell, silo, availability_zone, microservice_name, instance_name, bin(time, 1h) as hour,    AVG(cpu_user) AS instance_avg_metric    FROM raw_data.devops    WHERE time BETWEEN bin(@scheduled_runtime, 1h) - 1h AND bin(@scheduled_runtime, 1h) + 1h    AND measure_name = 'metrics'    GROUP BY region, cell, silo, availability_zone, microservice_name, instance_name, bin(time, 1h)    ), instances_above_threshold AS (    SELECT i.*,    CASE WHEN i.instance_avg_metric > (1 + 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS high_utilization,    CASE WHEN i.instance_avg_metric < (1 - 0.2) * m.microservice_avg_metric THEN 1 ELSE 0 END AS low_utilization    FROM instance_avg i INNER JOIN microservice_cell_avg m    ON i.region = m.region AND i.cell = m.cell AND i.silo = m.silo AND i.availability_zone = m.availability_zone    AND m.microservice_name = i.microservice_name AND m.hour = i.hour    )     SELECT region, cell, silo, microservice_name, hour,         COUNT(*) AS num_hosts, SUM(high_utilization) AS high_utilization_hosts, SUM(low_utilization) AS low_utilization_hosts    FROM instances_above_threshold GROUP BY region, cell, silo, microservice_name, hour",
    "ScheduleConfiguration": {
        "ScheduleExpression": "cron(0/30 * * * ? *)"
    },
    "NotificationConfiguration": {
        "SnsConfiguration": {
            "TopicArn": "******"
        }
    },
    "TargetConfiguration": {
        "TimestreamConfiguration": {
            "DatabaseName": "derived",
            "TableName": "deployment_cpu_stats_per_hr",
            "TimeColumn": "hour",
            "DimensionMappings": [
                {
                    "Name": "region",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "cell",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "silo",
                    "DimensionValueType": "VARCHAR"
                },
                {
                    "Name": "microservice_name",
                    "DimensionValueType": "VARCHAR"
                }
            ],
            "MultiMeasureMappings": {
                "TargetMultiMeasureName": "cpu_user",
                "MultiMeasureAttributeMappings": [
                    {
                        "SourceColumn": "num_hosts",
                        "MeasureValueType": "BIGINT"
                    },
                    {
                        "SourceColumn": "high_utilization_hosts",
                        "MeasureValueType": "BIGINT"
                    },
                    {
                        "SourceColumn": "low_utilization_hosts",
                        "MeasureValueType": "BIGINT"
                    }
                ]
            }
        }
    },
    "ErrorReportConfiguration": {
        "S3Configuration" : {
            "BucketName" : "******",
            "ObjectKeyPrefix": "errors",
            "EncryptionOption": "SSE_S3"
        }
    },
    "ScheduledQueryExecutionRoleArn": "******"
}
```
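The WHERE clause in the definition above scans a two-hour window anchored at an hour boundary. A minimal Python sketch (the run times are hypothetical, and `bin_1h` only mimics the Timestream `bin(time, 1h)` function) shows that a run at any point within an hour scans the full previous and current hours:

```python
from datetime import datetime, timedelta

def bin_1h(t: datetime) -> datetime:
    """Truncate to the start of the hour, like bin(time, 1h)."""
    return t.replace(minute=0, second=0, microsecond=0)

def refresh_window(scheduled_runtime: datetime):
    """BETWEEN bin(@scheduled_runtime, 1h) - 1h AND bin(@scheduled_runtime, 1h) + 1h:
    covers the previous hour and the current (partial) hour in full."""
    anchor = bin_1h(scheduled_runtime)
    return anchor - timedelta(hours=1), anchor + timedelta(hours=1)

# Runs at :00, :15, :30, and :45 within the 09:00-10:00 hour all scan
# [08:00, 10:00], so each refresh rewrites complete aggregates for
# hours 08 and 09, regardless of when within the hour it fires.
for minute in (0, 15, 30, 45):
    print(refresh_window(datetime(2021, 11, 11, 9, minute)))
```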

## Dashboard from pre-computed results
<a name="scheduledqueries-example3-dashboard-precompute"></a>

**High CPU utilization hosts**

For the high utilization hosts, you will see how the different panels use the data from deployment_cpu_stats_per_hr to compute the different aggregates necessary for the panels. For instance, this panel provides region-level information, so it reports aggregates grouped by region and microservice, without filtering on any region or microservice. 

![\[Table showing microservice utilization stats across regions, with high and low host percentages.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex3_img4.png)


```
WITH per_deployment_hosts AS (
    SELECT region, cell, silo, microservice_name, 
        AVG(num_hosts) AS num_hosts, 
        AVG(high_utilization_hosts) AS high_utilization_hosts, 
        AVG(low_utilization_hosts) AS low_utilization_hosts
    FROM "derived"."deployment_cpu_stats_per_hr"
    WHERE time BETWEEN from_milliseconds(1636567785437) AND from_milliseconds(1636654185437)
        AND measure_name = 'cpu_user'
    GROUP BY region, cell, silo, microservice_name
), per_deployment_high AS (
    SELECT region, microservice_name,
        SUM(num_hosts) AS num_hosts, 
        ROUND(SUM(high_utilization_hosts), 0) AS high_utilization_hosts,
        ROUND(SUM(low_utilization_hosts),0) AS low_utilization_hosts,
        ROUND(SUM(high_utilization_hosts) * 100.0 / SUM(num_hosts)) AS percent_high_utilization_hosts,
        ROUND(SUM(low_utilization_hosts) * 100.0 / SUM(num_hosts)) AS percent_low_utilization_hosts
    FROM per_deployment_hosts
    GROUP BY region, microservice_name
), 
per_region_ranked AS (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY region ORDER BY percent_high_utilization_hosts DESC, high_utilization_hosts DESC) AS rank
    FROM per_deployment_high
)
SELECT *
FROM per_region_ranked
WHERE rank <= 2
ORDER BY percent_high_utilization_hosts desc, rank asc
```
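The `DENSE_RANK() OVER (PARTITION BY region ORDER BY ... DESC)` plus `WHERE rank <= 2` pattern in the query above keeps the two most-utilized microservices per region, with ties sharing a rank. The following Python sketch illustrates that windowing logic on a few hypothetical sample rows (the row values are invented for illustration, not taken from the dashboard):

```python
from itertools import groupby

# Hypothetical rows: (region, microservice, percent_high_utilization_hosts)
rows = [
    ("us-east-1", "demeter", 41.0),
    ("us-east-1", "zeus", 12.0),
    ("us-east-1", "hermes", 27.0),
    ("eu-west-1", "demeter", 35.0),
    ("eu-west-1", "athena", 8.0),
]

def top_per_region(rows, n=2):
    """Emulate DENSE_RANK() OVER (PARTITION BY region ORDER BY pct DESC)
    followed by WHERE rank <= n."""
    ranked = []
    # Sort by region, then by percentage descending, and rank within each region
    for region, group in groupby(sorted(rows, key=lambda r: (r[0], -r[2])),
                                 key=lambda r: r[0]):
        rank = 0
        prev_pct = None
        for r in group:
            if r[2] != prev_pct:  # DENSE_RANK: ties share a rank, no gaps
                rank += 1
                prev_pct = r[2]
            if rank <= n:
                ranked.append(r + (rank,))
    return ranked

result = top_per_region(rows)  # keeps at most the top 2 per region
```

In the SQL version, Timestream for LiveAnalytics evaluates the window function over the `per_deployment_high` CTE, so the dashboard only ever renders the top-2 offenders per region.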

**Drill down into a microservice to find high CPU usage deployments**

This next example again uses the `"derived"."deployment_cpu_stats_per_hr"` derived table, but now applies a filter for a specific microservice (demeter in this example, because it reported high utilization hosts in the aggregate dashboard). This panel tracks the percentage of high CPU utilization hosts over time.

![\[Graph showing consistent high CPU utilization percentages for multiple deployments over 24 hours.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/sched_query_ex3_img5.png)


```
WITH high_utilization_percent AS (
    SELECT region, cell, silo, microservice_name, bin(time, 1h) AS hour, MAX(num_hosts) AS num_hosts, 
        MAX(high_utilization_hosts) AS high_utilization_hosts,
        ROUND(MAX(high_utilization_hosts) * 100.0 / MAX(num_hosts)) AS percent_high_utilization_hosts
    FROM "derived"."deployment_cpu_stats_per_hr"
    WHERE time BETWEEN from_milliseconds(1636525800000) AND from_milliseconds(1636612200000)
        AND measure_name = 'cpu_user'
        AND microservice_name = 'demeter'
    GROUP BY region, cell, silo, microservice_name, bin(time, 1h)
), high_utilization_ranked AS (
    SELECT region, cell, silo, microservice_name, 
        DENSE_RANK() OVER (PARTITION BY region ORDER BY AVG(percent_high_utilization_hosts) desc, AVG(high_utilization_hosts) desc) AS rank
    FROM high_utilization_percent 
    GROUP BY region, cell, silo, microservice_name
)
SELECT hup.silo, CREATE_TIME_SERIES(hour, hup.percent_high_utilization_hosts) AS percent_high_utilization_hosts
FROM high_utilization_percent hup INNER JOIN high_utilization_ranked hur
    ON hup.region = hur.region AND hup.cell = hur.cell AND hup.silo = hur.silo AND hup.microservice_name = hur.microservice_name
WHERE rank <= 2
GROUP BY hup.region, hup.cell, hup.silo
ORDER BY hup.silo
```

# Comparing a query on a base table with a query of scheduled query results
<a name="scheduledqueries-example4-clickstream"></a>

In this Timestream query example, we use the following schema, example queries, and outputs to compare a query on a base table with a query on a derived table of scheduled query results. With a well-planned scheduled query, you can get a derived table with fewer rows and other characteristics that can lead to faster queries than would be possible on the original base table. 

For a video that describes this scenario, see [Improve query performance and reduce cost using scheduled queries in Amazon Timestream for LiveAnalytics](https://youtu.be/x8AgLhAydzY).

For this example, we use the following scenario:
+ **Region** – us-east-1
+ **Base table** – `"clickstream"."shopping"`
+ **Derived table** – `"clickstream"."aggregate"`

## Base table
<a name="scheduledqueries-example4-clickstream-base-table"></a>

The following describes the schema for the base table.


| Column | Type | Timestream for LiveAnalytics attribute type | 
| --- | --- | --- | 
|  channel  |  varchar  |  MULTI  | 
|  description  |  varchar  |  MULTI  | 
|  event  |  varchar  |  DIMENSION  | 
|  ip\_address  |  varchar  |  DIMENSION  | 
|  measure\_name  |  varchar  |  MEASURE\_NAME  | 
|  product  |  varchar  |  MULTI  | 
|  product\_id  |  varchar  |  MULTI  | 
|  quantity  |  double  |  MULTI  | 
|  query  |  varchar  |  MULTI  | 
|  session\_id  |  varchar  |  DIMENSION  | 
|  user\_group  |  varchar  |  DIMENSION  | 
|  user\_id  |  varchar  |  DIMENSION  | 

The following describes the measures for the base table. A *base table* refers to a Timestream for LiveAnalytics table on which a scheduled query runs.
+ **measure\_name** – `metrics`
+ **data** – multi
+ **dimensions**:

  ```
  [ ( user_group, varchar ),( user_id, varchar ),( session_id, varchar ),( ip_address, varchar ),( event, varchar ) ]
  ```

## Query on a base table
<a name="scheduledqueries-example4-clickstream-base-table-query"></a>

The following is an ad-hoc query that sums product quantities into 5-minute bins within a given time range.

```
SELECT BIN(time, 5m) as time, 
channel, 
product_id,
SUM(quantity) as product_quantity 
FROM "clickstream"."shopping" 
WHERE BIN(time, 5m) BETWEEN '2023-05-11 10:10:00.000000000' AND '2023-05-11 10:30:00.000000000'
AND channel = 'Social media'
and product_id = '431412'
GROUP BY BIN(time, 5m),channel,product_id
```

Output:

```
duration: 1.745 sec
Bytes scanned: 29.89 MB
Query Id: AEBQEANMHG7MHHBHCKJ3BSOE3QUGIDBGWCCP5I6J6YUW5CVJZ2M3JCJ27QRMM7A
Row count: 5
```

## Scheduled query
<a name="scheduledqueries-example4-clickstream-scheduled-query"></a>

The following is a scheduled query that runs every 5 minutes.

```
SELECT BIN(time, 5m) as time, channel as measure_name, product_id, product, 
SUM(quantity) as product_quantity 
FROM "clickstream"."shopping" 
WHERE time BETWEEN BIN(@scheduled_runtime, 5m) - 10m AND BIN(@scheduled_runtime, 5m) - 5m 
AND channel = 'Social media' 
GROUP BY BIN(time, 5m), channel, product_id, product
```
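The `WHERE` clause above anchors the aggregation window to `@scheduled_runtime`, the parameter that Timestream for LiveAnalytics binds to each invocation's trigger time. Deliberately targeting the bin that closed 5–10 minutes before the run gives late-arriving data time to land before it is aggregated. The following Python sketch shows that window arithmetic (the timestamps are illustrative):

```python
from datetime import datetime, timedelta

def bin_5m(t: datetime) -> datetime:
    """Emulate BIN(time, 5m): truncate to the enclosing 5-minute boundary."""
    return t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)

def window_for(scheduled_runtime: datetime):
    """Window selected by:
    time BETWEEN BIN(@scheduled_runtime, 5m) - 10m
             AND BIN(@scheduled_runtime, 5m) - 5m"""
    anchor = bin_5m(scheduled_runtime)
    return anchor - timedelta(minutes=10), anchor - timedelta(minutes=5)

# A run triggered at 10:31:07 aggregates the 10:20-10:25 bin,
# leaving roughly 5 extra minutes for late-arriving rows.
start, end = window_for(datetime(2023, 5, 11, 10, 31, 7))
```

A query like this is registered with the `CreateScheduledQuery` API, where the 5-minute cadence is expressed through the schedule configuration (for example, a `rate(5 minutes)` schedule expression).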

## Query on a derived table
<a name="scheduledqueries-example4-clickstream-derived-table"></a>

The following is an ad-hoc query on a derived table. A *derived table* refers to a Timestream table that contains the results of a scheduled query.

```
SELECT time, measure_name, product_id,product_quantity 
FROM "clickstream"."aggregate"
WHERE time BETWEEN '2023-05-11 10:10:00.000000000' AND '2023-05-11 10:30:00.000000000'
AND measure_name = 'Social media'
and product_id = '431412'
```

Output:

```
duration: 0.2960 sec
Bytes scanned: 235.00 B
QueryID: AEBQEANMHHAAQU4FFTT6CFM6UYXTL4SMLZV22MFP4KV2Z7IRVOPLOMLDD6BR33Q
Row count: 5
```

## Comparison
<a name="scheduledqueries-example4-clickstream-comparison"></a>

The following is a comparison of the results of a query on a base table with a query on a derived table. Because the derived table contains results pre-aggregated by the scheduled query, the same question is answered roughly 6 times faster while scanning about five orders of magnitude fewer bytes (235 bytes versus 29.89 MB). 

These results show the value of using scheduled queries to aggregate data for faster, cheaper queries.


|  | Query on base table | Query on derived table | 
| --- | --- | --- | 
|  Duration  |  1.745 sec  |  0.2960 sec  | 
|  Bytes scanned  |  29.89 MB  |  235 bytes  | 
|  Row count  |  5  |  5  | 
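A quick back-of-the-envelope calculation from the table above quantifies the gains (assuming 1 MB = 1024 × 1024 bytes; exact figures vary by workload and data layout):

```python
# Figures reported in the comparison table above
base_duration, derived_duration = 1.745, 0.2960         # seconds
base_bytes = 29.89 * 1024 * 1024                        # 29.89 MB scanned
derived_bytes = 235.0                                   # 235 bytes scanned

speedup = base_duration / derived_duration              # roughly 5.9x faster
scan_reduction = base_bytes / derived_bytes             # roughly 133,000x fewer bytes
```

Because Timestream for LiveAnalytics query pricing is based on bytes scanned, the scan reduction translates directly into lower query cost in addition to the latency improvement.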