Amazon Athena Timestream connector
The Amazon Athena Timestream connector enables Amazon Athena to communicate with Amazon Timestream.
Amazon Timestream is a fast, scalable, fully managed, purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day. Timestream saves you time and cost in managing the lifecycle of time series data by keeping recent data in memory and moving historical data to a cost-optimized storage tier based on user-defined policies.
If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the AWS Serverless Application Repository must have read access in Lake Formation to the AWS Glue Data Catalog.
Prerequisites
Deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more information, see Deploy a data source connector or Use the AWS Serverless Application Repository to deploy a data source connector.
Parameters
Use the Lambda environment variables in this section to configure the Timestream connector.
- spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.
- spill_prefix – (Optional) Defaults to a subfolder in the specified spill_bucket called athena-federation-spill. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours.
- spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.
- kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like a7e63k4b-8loc-40db-a2a1-4d0en2cd8331, you can specify a KMS key ID.
- disable_spill_encryption – (Optional) When set to True, disables spill encryption. Defaults to False so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption.
- glue_catalog – (Optional) Use this option to specify a cross-account AWS Glue catalog. By default, the connector attempts to get metadata from its own AWS Glue account.
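The parameters above can be assembled programmatically before they are applied to the Lambda function. The following Python sketch is illustrative only: the helper name build_connector_env and the bucket name are hypothetical, and the only behavior taken from this page is the set of variable names and the JSON encoding of spill_put_request_headers.

```python
import json


def build_connector_env(spill_bucket, spill_prefix=None, put_request_headers=None,
                        kms_key_id=None, disable_spill_encryption=False,
                        glue_catalog=None):
    """Return the connector's Lambda environment variables as a dict of strings.

    Hypothetical helper: only spill_bucket is required; optional values are
    omitted so that the connector falls back to its documented defaults.
    """
    if not spill_bucket:
        raise ValueError("spill_bucket is required")
    env = {"spill_bucket": spill_bucket}
    if spill_prefix is not None:
        env["spill_prefix"] = spill_prefix
    if put_request_headers is not None:
        # The headers must be passed as a JSON-encoded map.
        env["spill_put_request_headers"] = json.dumps(put_request_headers)
    if kms_key_id is not None:
        env["kms_key_id"] = kms_key_id
    if disable_spill_encryption:
        env["disable_spill_encryption"] = "True"
    if glue_catalog is not None:
        env["glue_catalog"] = glue_catalog
    return env


example_env = build_connector_env(
    "my-athena-spill-bucket",  # hypothetical bucket name
    spill_prefix="athena-federation-spill",
    put_request_headers={"x-amz-server-side-encryption": "AES256"},
)
```

The resulting dictionary could then be supplied as the Variables map of the function's environment, for example through the Lambda console or an SDK call of your choice.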
Setting up databases and tables in AWS Glue
You can optionally use the AWS Glue Data Catalog as a source of supplemental metadata. To enable an AWS Glue table for use with Timestream, you must have an AWS Glue database and table with names that match the Timestream database and table that you want to supply supplemental metadata for.
Note
For best performance, use only lowercase for your database names and table names. Using mixed casing causes the connector to perform a case-insensitive search that is more computationally intensive.
To configure an AWS Glue table for use with Timestream, you must set its table properties in AWS Glue.
To use an AWS Glue table for supplemental metadata
- Edit the table in the AWS Glue console to add the following table properties:
  - timestream-metadata-flag – This property indicates to the Timestream connector that the connector can use the table for supplemental metadata. You can provide any value for timestream-metadata-flag as long as the timestream-metadata-flag property is present in the list of table properties.
  - _view_template – When you use AWS Glue for supplemental metadata, you can use this table property and specify any Timestream SQL as the view. The Athena Timestream connector uses the SQL from the view together with your SQL from Athena to run your query. This is useful if you want to use a feature of Timestream SQL that is not otherwise available in Athena.
- Make sure that you use the data types appropriate for AWS Glue as listed in this document.
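The table properties described above amount to a small key-value map. The following Python sketch shows one way to assemble that map and to check the naming guidance from this section; the function names and the database and table names are hypothetical, and the flag value "true" is arbitrary because only the presence of the property matters.

```python
def supplemental_metadata_properties(view_sql=None, flag_value="true"):
    """Build the AWS Glue table properties that the connector looks for.

    Hypothetical helper. The flag value is arbitrary: the connector only
    needs the timestream-metadata-flag property to be present.
    """
    properties = {"timestream-metadata-flag": flag_value}
    if view_sql is not None:
        # Optional Timestream SQL that the connector uses as a view.
        properties["_view_template"] = view_sql
    return properties


def naming_issues(glue_database, glue_table, ts_database, ts_table):
    """Return a list of problems with the Glue names, empty if none are found."""
    issues = []
    if glue_database != ts_database or glue_table != ts_table:
        issues.append("Glue names must match the Timestream database and table")
    if glue_database != glue_database.lower() or glue_table != glue_table.lower():
        issues.append("mixed-case names trigger a slower case-insensitive search")
    return issues


props = supplemental_metadata_properties()
issues = naming_issues("my_database", "my_table", "my_database", "my_table")
```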
Data types
Currently, the Timestream connector supports only a subset of the data types available in Timestream, specifically: the scalar values varchar, double, and timestamp.
To query the timeseries data type, you must configure a view in AWS Glue table properties that uses the Timestream CREATE_TIME_SERIES function. You also need to provide a schema for the view that uses the syntax ARRAY<STRUCT<time:timestamp,measure_value::double:double>> as the type for any of your time series columns. Be sure to replace double with the appropriate scalar type for your table.
The following image shows an example of AWS Glue table properties configured to set up a view over a time series.
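As a rough textual sketch of such a configuration (not a transcription of the image), the following Python snippet builds the column type string described above and a possible _view_template value. The database, table, column, and measure names are hypothetical, and the view SQL is only an assumed example of using CREATE_TIME_SERIES with a GROUP BY.

```python
def time_series_column_type(scalar_type="double"):
    """Return the AWS Glue column type for a time series column.

    Both occurrences of the scalar type in the documented syntax are
    replaced together.
    """
    return f"ARRAY<STRUCT<time:timestamp,measure_value::{scalar_type}:{scalar_type}>>"


# Hypothetical view: one time series of CPU readings per host.
view_template = (
    "SELECT hostname, "
    "CREATE_TIME_SERIES(time, measure_value::double) AS cpu_utilization "
    'FROM "my_database"."my_table" '
    "WHERE measure_name = 'cpu_utilization' "
    "GROUP BY hostname"
)

# Table properties and the matching schema for the view's columns.
table_properties = {
    "timestream-metadata-flag": "true",
    "_view_template": view_template,
}
view_schema = {
    "hostname": "varchar",
    "cpu_utilization": time_series_column_type("double"),
}
```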
Required Permissions
For full details on the IAM policies that this connector requires, review the Policies section of the athena-timestream.yaml file. The following list summarizes the required permissions.
- Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.
- Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.
- AWS Glue Data Catalog – The Timestream connector requires read only access to the AWS Glue Data Catalog to obtain schema information.
- CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.
- Timestream Access – For running Timestream queries.
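To make the list above more concrete, the following Python sketch builds an IAM policy document with one statement per permission area. It is an assumption-laden illustration, not the policy from athena-timestream.yaml: the specific action lists, the wildcard resources, and the bucket name are choices made for this example, so treat the yaml file as the authoritative source.

```python
import json


def connector_policy_sketch(spill_bucket, spill_prefix="athena-federation-spill"):
    """Build an illustrative IAM policy document (not the official one)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Amazon S3 write access for spilling large results
                "Sid": "SpillToS3",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{spill_bucket}/{spill_prefix}/*",
            },
            {   # Lets the connector fast-fail when the Athena query has ended
                "Sid": "AthenaQueryStatus",
                "Effect": "Allow",
                "Action": ["athena:GetQueryExecution"],
                "Resource": "*",
            },
            {   # Read-only access to the AWS Glue Data Catalog
                "Sid": "GlueReadOnly",
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetDatabases",
                           "glue:GetTable", "glue:GetTables"],
                "Resource": "*",
            },
            {   # CloudWatch Logs for the Lambda function's logs
                "Sid": "Logs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream",
                           "logs:PutLogEvents"],
                "Resource": "*",
            },
            {   # Timestream metadata and query access
                "Sid": "TimestreamRead",
                "Effect": "Allow",
                "Action": ["timestream:DescribeEndpoints", "timestream:ListDatabases",
                           "timestream:ListTables", "timestream:DescribeTable",
                           "timestream:Select"],
                "Resource": "*",
            },
        ],
    }


policy = connector_policy_sketch("my-athena-spill-bucket")  # hypothetical bucket
policy_json = json.dumps(policy, indent=2)
```

In practice you would likely narrow the wildcard resources to the specific workgroup, catalog, log group, and Timestream tables that the connector uses.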
Performance
We recommend that you use the LIMIT clause to limit the data returned (not the data scanned) to less than 256 MB to ensure that interactive queries are performant.
The Athena Timestream connector performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses reduce the amount of data scanned, but if you do not provide a predicate, you should expect SELECT queries with a LIMIT clause to scan at least 16 MB of data. Selecting a subset of columns significantly speeds up query runtime and reduces data scanned.
The Timestream connector is resilient to throttling due to concurrency.
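The recommendations above (a predicate, a subset of columns, and a LIMIT clause) can be combined in a single query. The following Python sketch assembles such a query string; the helper name and the catalog, database, table, and column names are hypothetical, and the inputs are assumed to be trusted identifiers rather than user-supplied text.

```python
def build_limited_query(columns, table, predicate, limit):
    """Build a SELECT that names its columns, filters rows, and caps the result.

    Hypothetical helper: inputs are inserted as-is, so pass only trusted values.
    """
    if not columns:
        raise ValueError("select an explicit subset of columns instead of *")
    if not predicate:
        raise ValueError("a predicate is needed for pushdown to reduce data scanned")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table} WHERE {predicate} LIMIT {int(limit)}"


query = build_limited_query(
    ["hostname", "time"],
    '"timestream_catalog"."my_database"."my_table"',  # hypothetical names
    "measure_name = 'cpu_utilization'",
    100,
)
```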
Passthrough queries
The Timestream connector supports passthrough queries. Passthrough queries use a table function to push your full query down to the data source for execution.
To use passthrough queries with Timestream, you can use the following syntax:
SELECT * FROM TABLE( system.query( query => 'query string' ))
The following example query pushes down a query to a data source in Timestream. The query selects all columns in the customer table, limiting the results to 10.
SELECT * FROM TABLE( system.query( query => 'SELECT * FROM customer LIMIT 10' ))
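When the inner query is generated by code, it has to be embedded in a single-quoted string literal. The following Python sketch wraps a Timestream query in the passthrough syntax shown above; the helper name is hypothetical, and doubling embedded single quotes is assumed here as the standard SQL way to escape them inside the literal.

```python
def passthrough_query(timestream_sql):
    """Wrap a Timestream SQL string in the passthrough table function.

    Single quotes in the inner query are doubled so that they survive
    being embedded in the outer string literal (assumed escaping rule).
    """
    escaped = timestream_sql.replace("'", "''")
    return f"SELECT * FROM TABLE( system.query( query => '{escaped}' ))"


simple = passthrough_query("SELECT * FROM customer LIMIT 10")
quoted = passthrough_query("SELECT * FROM customer WHERE region = 'us-east-1'")
```

The first call reproduces the example query above; the second shows how a quoted value in the inner query is escaped.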
License information
The Amazon Athena Timestream connector project is licensed under the Apache-2.0 License.
Additional resources
For additional information about this connector, visit the corresponding site.