For similar capabilities to Amazon Timestream for LiveAnalytics, consider Amazon Timestream for InfluxDB. It offers simplified data ingestion and single-digit millisecond query response times for real-time analytics. Learn more [here](https://docs.aws.amazon.com//timestream/latest/developerguide/timestream-for-influxdb.html). # Using query insights to optimize queries in Amazon Timestream Query insights is a performance tuning feature that helps you optimize your queries, improve their performance, and reduce costs. With query insights, you can assess the temporal, time-based, and spatial partition key-based pruning efficiency of your queries. Using query insights, you can also identify areas for improvement to enhance query performance. In addition, with query insights, you can evaluate how effectively your queries use time-based and partition key-based indexing to optimize data retrieval. To optimize query performance, it's essential to fine-tune both the temporal and spatial parameters that govern query execution. **Topics** + [ ## Benefits of query insights ](#query-insights-benefits) + [ # Optimizing data access in Amazon Timestream ](query-insights-optimize-data-access-pattern.md) + [ # Enabling query insights in Amazon Timestream ](enable-query-insights.md) + [ # Optimizing queries using query insights response ](optimize-query-using-query-insights.md) ## Benefits of query insights The following are the key benefits of using query insights: + **Identifying inefficient queries** – Query insights provides information on the time-based and attribute-based pruning of the tables accessed by the query. This information helps you identify the tables that are sub-optimally accessed. + **Optimizing your data model and partitioning** – You can use the query insights information to access and fine-tune your data model and partitioning strategy. + **Tuning queries** – Query insights highlights opportunities to use indexes more effectively. # Optimizing data access in Amazon Timestream You can optimize the data access patterns in Amazon Timestream using the Timestream partitioning scheme or data organization techniques. **Topics** + [ ## Timestream partitioning scheme ](#query-insights-optimize-data-access-partitioning-scheme) + [ ## Data organization ](#query-insights-optimize-data-access-data-org) ## Timestream partitioning scheme Amazon Timestream uses a highly scalable partitioning scheme where each Timestream table can have hundreds, thousands, or even millions of independent partitions. A highly available partition tracking and indexing service manages the partitioning, minimizing the impact of failures and making the system more resilient. ![\[Timestream partitioning scheme\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/QueryInsights/ts-partitioning-scheme.png) ## Data organization Timestream stores each data point it ingests in a single partition. As you ingest data into a Timestream table, Timestream automatically creates partitions based on the timestamps, partition key, and other context attributes in the data. In addition to partitioning the data on time (temporal partitioning), Timestream also partitions the data based on the selected partitioning key and other dimensions (spatial partitioning). This approach is designed to distribute write traffic and allow for effective pruning of data for queries. The query insights feature provides valuable insights into the pruning efficiency of the query, which includes query spatial coverage and query temporal coverage. **Topics** + [ ### QuerySpatialCoverage ](#query-insights-data-org-query-spatial-cvg) + [ ### QueryTemporalCoverage ](#query-insights-data-org-query-temporal-cvg) ### QuerySpatialCoverage The [QuerySpatialCoverage](https://docs.aws.amazon.com/timestream/latest/developerguide/API_query_QuerySpatialCoverage.html) metric provides insights into the spatial coverage of the executed query and the table with the most inefficient spatial pruning. This information can help you identify areas of improvement in the partitioning strategy to enhance spatial pruning. The value for the `QuerySpatialCoverage` metric ranges between 0 and 1. The lower the value of the metric, the more optimal the query pruning on the spatial axis. For example, a value of 0.1 indicates that the query scans 10% of the spatial axis. A value of 1 indicates that the query scans 100% of the spatial axis. **Example Using query insights to analyze a query's spatial coverage** Say that you've a Timestream database that stores weather data. Assume that the temperature is recorded every hour from weather stations located across different states in United States. Imagine that you choose `State` as the [customer-defined partitioning key](customer-defined-partition-keys.md) (CDPK) to partition the data by state. Suppose that you execute a query to retrieve the average temperature for all weather stations in California between 2 PM and 4 PM on a specific day. The following example shows the query for this scenario. ``` SELECT AVG(temperature) FROM "weather_data"."hourly_weather" WHERE time >= '2024-10-01 14:00:00' AND time < '2024-10-01 16:00:00' AND state = 'CA'; ``` Using the query insights feature, you can analyze the query's spatial coverage. Imagine that the `QuerySpatialCoverage` metric returns a value of 0.02. This means that the query only scanned 2% of the spatial axis, which is efficient. In this case, the query was able to effectively prune the spatial range, only retrieving data from California and ignoring data from other states. On the contrary, if the `QuerySpatialCoverage` metric returned a value of 0.8, it would indicate that the query scanned 80% of the spatial axis, which is less efficient. This might suggest that the partitioning strategy needs to be refined to improve spatial pruning. For example, you can select the partition key as city or region instead of a state. By analyzing the `QuerySpatialCoverage` metric, you can identify opportunities to optimize your partitioning strategy and improve the performance of your queries. The following image shows poor spatial pruning. ![\[Result provided by the QuerySpatialCoverage metric that shows poor spatial pruning.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/QueryInsights/QuerySpatialCoverageMetricResult.png) To improve spatial pruning efficiency, you can do one or both of the following: + Add `measure_name`, the default paritioning key, or use the CDPK predicates in your query. + If you've already added the attributes mentioned in the previous point, remove functions around these attributes or clauses, such as `LIKE`. ### QueryTemporalCoverage The `QueryTemporalCoverage` metric provides insights into the temporal range scanned by the executed query, including the table with the largest time range scanned. The value for the `QueryTemporalCoverage` metric is time range represented in nanoseconds. The lower the value of this metric, the more optimal the query pruning on the temporal range. For example, a query scanning last few minutes of data is more performant than a query scanning the entire time range of the table. **Example** Say that you've a Timestream database that stores IoT sensor data, with measurements taken every minute from devices located in a manufacturing plant. Assume that you've partitioned your data by `device_ID`. Suppose that you execute a query to retrieve the average sensor reading for a specific device over the last 30 minutes. The following example shows the query for this scenario. ``` SELECT AVG(sensor_reading) FROM "sensor_data"."factory_1" WHERE device_id = 'DEV_123' AND time >= NOW() - INTERVAL 30 MINUTE and time < NOW(); ``` Using the query insights feature, you can analyze the temporal range scanned by the query. Imagine the `QueryTemporalCoverage` metric returns a value of 1800000000000 nanoseconds (30 minutes). This means that the query only scanned the last 30 minutes of data, which is a relatively narrow temporal range. This is a good sign because it indicates that the query was able to effectively prune the temporal partitioning and only retrieved the requested data. On the contrary, if the `QueryTemporalCoverage` metric returned a value of 1 year in nanoseconds, it indicates that the query scanned one year of time range in the table, which is less efficient. This might suggest that the query is not optimized for temporal pruning, and you could improve it by adding time filters. The following image shows poor temporal pruning. ![\[Result provided by the QueryTemporalCoverage metric that shows poor temporal pruning.\]](http://docs.aws.amazon.com/timestream/latest/developerguide/images/QueryInsights/QueryTemporalCoverageMetricResult.png) To improve temporal pruning, we recommend that you do one or all of the following: + Add the missing time predicates in the query and make sure that the time predicates are pruning the desired time window. + Remove functions, such as `MAX()`, around the time predicates. + Add time predicates to all the sub queries. This is important if your sub queries are joining large tables or performing complex operations. # Enabling query insights in Amazon Timestream You can enable query insights for your queries with insights delivered directly through the query response. Enabling query insights doesn't require additional infrastructure or incur any additional costs. When you enable query insights, it returns query performance related metadata fields in addition to query results as part of your query response. You can use this information to tune your queries to improve query performance and reduce query cost. For information about enabling query insights, see [Run a query](console_timestream.md#console_timestream.queries.using-console). To view examples of the responses returned by enabling query insights, see [Examples for scheduled queries](https://docs.aws.amazon.com/timestream/latest/developerguide/API_query_ExecuteScheduledQuery.html#API_query_ExecuteScheduledQuery_Examples). **Note** When you enable query insights, it rate limits the query to 1 query per second (QPS). To avoid performance impacts, we strongly recommend that you enable query insights only during the evaluation phase of your queries, before deploying them to production. The insights provided in query insights are eventually consistent, which means they might change as new data is continuously ingested into the tables. # Optimizing queries using query insights response Say that you're using Amazon Timestream for LiveAnalytics to monitor energy consumption across various locations. Imagine that you've two tables in your database named `raw-metrics` and `aggregate-metrics`. The `raw-metrics` table stores detailed energy data at the device level and contains the following columns: + Timestamp + State, for example, Washington + Device ID + Energy consumption The data for this table is collected and stored at a minute-by-minute granularity. The table uses `State` as the CDPK. The `aggregate-metrics` table stores the result of a scheduled query to aggregate the energy consumption data across all devices hourly. This table contains the following columns: + Timestamp + State, for example, Washington + Total energy consumption The `aggregate-metrics` table stores this data at an hourly granularity. The table uses `State` as the CDPK. **Topics** + [ ## Querying energy consumption for the last 24 hours ](#query-energy-consumption-data) + [ ## Optimizing the query for temporal range ](#optimize-query-temporal-range) + [ ## Optimizing the query for spatial coverage ](#optimize-query-spatial-coverage) + [ ## Improved query performance ](#improved-query-performance) ## Querying energy consumption for the last 24 hours Say that you want to extract the total energy consumed in Washington over the last 24 hours. To find this data, you can leverage the strengths of both the tables: `raw-metrics` and `aggregate-metrics`. The `aggregate-metrics` table provides hourly energy consumption data for the last 23 hours, while the `raw-metrics` table offers minute-granular data for the last one hour. By querying across both tables, you can get a complete and accurate picture of energy consumption in Washington over the last 24 hours. ``` SELECT am.time, am.state, am.total_energy_consumption, rm.time, rm.state, rm.device_id, rm.energy_consumption FROM "metrics"."aggregate-metrics" am LEFT JOIN "metrics"."raw-metrics" rm ON am.state = rm.state WHERE rm.time >= ago(1h) and rm.time < now() ``` This example query is provided for illustrative purposes only and might not work as is. It's intended to demonstrate the concept, but you might need to modify it to fit your specific use case or environment. After executing this query, you might notice that the query response time is slower than expected. To identify the root cause of this performance issue, you can use the query insights feature to analyze the query's performance and optimize its execution. The following example shows the query insights response. ``` queryInsightsResponse={ QuerySpatialCoverage: { Max: { Value: 1.0, TableArn: arn:aws:timestream:us-east-1:123456789012:database/metrics/table/raw-metrics, PartitionKey: [State] } }, QueryTemporalRange: { Max: { Value:31540000000000000 //365 days, TableArn: arn:aws:timestream:us-east-1:123456789012:database/metrics/table/aggregate-metrics } }, QueryTableCount: 2, OutputRows: 83, OutputBytes: 590 ``` The query insights response provides the following information: + **Temporal range**: The query scanned an excessive 365-day temporal range for the `aggregate-metrics` table. This indicates an inefficient use of temporal filtering. + **Spatial coverage**: The query scanned the entire spatial range (100%) of the `raw-metrics` table. This suggests that the spatial filtering isn't being utilized effectively. If your query accesses more than one table, query insights provides the metrics for the table with most sub-optimal access pattern. ## Optimizing the query for temporal range Based on the query insights response, you can optimize the query for temporal range as shown in the following example. ``` SELECT am.time, am.state, am.total_energy_consumption, rm.time, rm.state, rm.device_id, rm.energy_consumption FROM "metrics"."aggregate-metrics" am LEFT JOIN "metrics"."raw-metrics" rm ON am.state = rm.state WHERE am.time >= ago(23h) and am.time < now() AND rm.time >= ago(1h) and rm.time < now() AND rm.state = 'Washington' ``` If you run the `QueryInsights` command again, it returns the following response. ``` queryInsightsResponse={ QuerySpatialCoverage: { Max: { Value: 1.0, TableArn: arn:aws:timestream:us-east-1:123456789012:database/metrics/table/aggregate-metrics, PartitionKey: [State] } }, QueryTemporalRange: { Max: { Value: 82800000000000 //23 hours, TableArn: arn:aws:timestream:us-east-1:123456789012:database/metrics/table/aggregate-metrics } }, QueryTableCount: 2, OutputRows: 83, OutputBytes: 590 ``` This response shows that the spatial coverage for the `aggregate-metrics` table is still 100%, which is inefficient. The following section shows how to optimze the query for spatial coverage. ## Optimizing the query for spatial coverage Based on the query insights response, you can optimize the query for spatial coverage as shown in the following example. ``` SELECT am.time, am.state, am.total_energy_consumption, rm.time, rm.state, rm.device_id, rm.energy_consumption FROM "metrics"."aggregate-metrics" am LEFT JOIN "metrics"."raw-metrics" rm ON am.state = rm.state WHERE am.time >= ago(23h) and am.time < now() AND am.state ='Washington' AND rm.time >= ago(1h) and rm.time < now() AND rm.state = 'Washington' ``` If you run the `QueryInsights` command again, it returns the following response. ``` queryInsightsResponse={ QuerySpatialCoverage: { Max: { Value: 0.02, TableArn: arn:aws:timestream:us-east-1:123456789012:database/metrics/table/aggregate-metrics, PartitionKey: [State] } }, QueryTemporalRange: { Max: { Value: 82800000000000 //23 hours, TableArn: arn:aws:timestream:us-east-1:123456789012:database/metrics/table/aggregate-metrics } }, QueryTableCount: 2, OutputRows: 83, OutputBytes: 590 ``` ## Improved query performance After optimizing the query, query insights provides the following information: + Temporal pruning for the `aggregate-metrics` table is 23 hours. This indicates that only 23 hours of the temporal range is scanned. + Spatial pruning for `aggregate-metrics` table is 0.02. This indicates that only 2% of the table's spatial range data is being scanned. The query scans a very small portion of the tables leading to fast performance and reduced resource utilization. The improved pruning efficiency indicates that the query is now optimized for performance.