Amazon Athena DynamoDB connector
The Amazon Athena DynamoDB connector enables Amazon Athena to communicate with DynamoDB so that you can query your tables with SQL. Write operations like INSERT INTO are not supported.
If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the AWS Serverless Application Repository must have read access in Lake Formation to the AWS Glue Data Catalog.
Prerequisites
Deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more information, see Deploy a data source connector or Use the AWS Serverless Application Repository to deploy a data source connector.
Parameters
Use the Lambda environment variables in this section to configure the DynamoDB connector.
-
spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.
-
spill_prefix – (Optional) Defaults to a subfolder in the specified
spill_bucket
calledathena-federation-spill
. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours. -
spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3
putObject
request that is used for spilling (for example,{"x-amz-server-side-encryption" : "AES256"}
). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference. -
kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like
a7e63k4b-8loc-40db-a2a1-4d0en2cd8331
, you can specify a KMS key ID. -
disable_spill_encryption – (Optional) When set to
True
, disables spill encryption. Defaults toFalse
so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption. -
disable_glue – (Optional) If present and set to true, the connector does not attempt to retrieve supplemental metadata from AWS Glue.
-
glue_catalog – (Optional) Use this option to specify a cross-account AWS Glue catalog. By default, the connector attempts to get metadata from its own AWS Glue account.
-
disable_projection_and_casing – (Optional) Disables projection and casing. Use if you want to query DynamoDB tables that have casing in their column names and you do not want to specify a
columnMapping
property on your AWS Glue table.The
disable_projection_and_casing
parameter uses the following values to specify the behavior of casing and column mapping:-
auto – Disables projection and casing when a previously unsupported type is detected and column name mapping is not set on the table. This is the default setting.
-
always – Disables projection and casing unconditionally. This is useful when you have casing in your DynamoDB column names but do not want to specify any column name mapping.
When using the
disable_projection_and_casing
parameter, keep in mind the following points:-
Use of the parameter can result in higher bandwidth usage. Additionally, if your Lambda function is not in the same AWS Region as your data source, you will incur higher standard AWS cross-region transfer costs as a result of the higher bandwidth usage. For more information about cross-region transfer costs, see AWS Data Transfer Charges for Server and Serverless Architectures
in the AWS Partner Network Blog. -
Because a larger number of bytes is transferred and because the larger number of bytes requires a higher deserialization time, overall latency can increase.
-
Setting up databases and tables in AWS Glue
Because the connector's built-in schema inference capability is limited, you might want to use AWS Glue for metadata. To do this, you must have a database and table in AWS Glue. To enable them for use with DynamoDB, you must edit their properties.
To edit database properties in the AWS Glue console
Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/
. -
In the navigation pane, expand Data Catalog, and then choose Databases.
On the Databases page, you can edit an existing database, or choose Add database to create one.
-
In the list of databases, choose the link for the database that you want to edit.
-
Choose Edit.
-
On the Update a database page, under Database settings, for Location, add the string
dynamo-db-flag
. This keyword indicates that the database contains tables that the Athena DynamoDB connector is using for supplemental metadata and is required for AWS Glue databases other thandefault
. Thedynamo-db-flag
property is useful for filtering out databases in accounts with many databases. -
Choose Update Database.
To edit table properties in the AWS Glue console
Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/
. -
In the navigation pane, expand Data Catalog, and then choose Tables.
-
On the Tables page, in the list of tables, choose the linked name of the table that you want to edit.
-
Choose Actions, Edit table.
-
On the Edit table page, in the Table properties section, add the following table properties as required. If you use the AWS Glue DynamoDB crawler, these properties are automatically set.
-
dynamodb – String that indicates to the Athena DynamoDB connector that the table can be used for supplemental metadata. Enter the
dynamodb
string in the table properties under a field called classification (exact match).Note
The Set table properties page that is part of the table creation process in the AWS Glue console has a Data format section with a Classification field. You cannot enter or choose
dynamodb
here. Instead, after you create your table, follow the steps to edit the table and to enterclassification
anddynamodb
as a key-value pair in the Table properties section. -
sourceTable – Optional table property that defines the source table name in DynamoDB. Use this if AWS Glue table naming rules prevent you from creating a AWS Glue table with the same name as your DynamoDB table. For example, capital letters are not permitted in AWS Glue table names, but they are permitted in DynamoDB table names.
-
columnMapping – Optional table property that defines column name mappings. Use this if AWS Glue column naming rules prevent you from creating a AWS Glue table with the same column names as your DynamoDB table. For example, capital letters are not permitted in AWS Glue column names but are permitted in DynamoDB column names. The property value is expected to be in the format col1=Col1,col2=Col2. Note that column mapping applies only to top level column names and not to nested fields.
-
defaultTimeZone – Optional table property that is applied to
date
ordatetime
values that do not have an explicit time zone. Setting this value is a good practice to avoid discrepancies between the data source default time zone and the Athena session time zone. -
datetimeFormatMapping – Optional table property that specifies the
date
ordatetime
format to use when parsing data from a column of the AWS Gluedate
ortimestamp
data type. If this property is not specified, the connector attempts to inferan ISO-8601 format. If the connector cannot infer the date
ordatetime
format or parse the raw string, then the value is omitted from the result.The
datetimeFormatMapping
value should be in the formatcol1=someformat1,col2=someformat2
. Following are some example formats:yyyyMMdd'T'HHmmss ddMMyyyy'T'HH:mm:ss
If your column has
date
ordatetime
values without a time zone and you want to use the column in theWHERE
clause, set thedatetimeFormatMapping
property for the column.
-
-
If you define your columns manually, make sure that you use the appropriate data types. If you used a crawler, validate the columns and types that the crawler discovered.
-
Choose Save.
Required Permissions
For full details on the IAM policies that this
connector requires, review the Policies
section of the athena-dynamodb.yaml
-
Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.
-
Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.
-
AWS Glue Data Catalog – The DynamoDB connector requires read only access to the AWS Glue Data Catalog to obtain schema information.
-
CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.
-
DynamoDB read access – The connector uses the
DescribeTable
,ListSchemas
,ListTables
,Query
, andScan
API operations.
Performance
The Athena DynamoDB connector supports parallel scans and attempts to push down predicates
as part of its DynamoDB queries. A hash key predicate with X
distinct values
results in X
query calls to DynamoDB. All other predicate scenarios result in
Y
number of scan calls, where Y
is heuristically
determined based on the size of your table and its provisioned throughput.
However, selecting a subset of columns sometimes results in a longer query execution runtime.
LIMIT
clauses and simple predicates are pushed down and can reduce the
amount of data scanned and will lead to decreased query execution run time.
LIMIT clauses
A LIMIT N
statement reduces the data scanned by the query. With
LIMIT N
pushdown, the connector returns only N
rows to
Athena.
Predicates
A predicate is an expression in the WHERE
clause of a SQL query that
evaluates to a Boolean value and filters rows based on multiple conditions. For
enhanced functionality, and to reduce the amount of data scanned, the Athena DynamoDB
connector can combine these expressions and push them directly to DynamoDB.
The following Athena DynamoDB connector operators support predicate pushdown:
-
Boolean: AND
-
Equality: EQUAL, NOT_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, IS_NULL
Combined pushdown example
For enhanced querying capabilities, combine the pushdown types, as in the following example:
SELECT * FROM my_table WHERE col_a > 10 and col_b < 10 LIMIT 10
For an article on using predicate pushdown to improve performance in federated
queries, including DynamoDB, see Improve federated queries with predicate pushdown in Amazon Athena
Passthrough queries
The DynamoDB connector supports passthrough queries and uses PartiQL syntax. The DynamoDB GetItem API operation is not supported. For information about querying DynamoDB using PartiQL, see PartiQL select statements for DynamoDB in the Amazon DynamoDB Developer Guide.
To use passthrough queries with DynamoDB, use the following syntax:
SELECT * FROM TABLE( system.query( query => '
query_string
' ))
The following DynamoDB passthrough query example uses PartiQL to return a list of Fire TV
Stick devices that have a DateWatched
property later than 12/24/22.
SELECT * FROM TABLE( system.query( query => 'SELECT Devices FROM WatchList WHERE Devices.FireStick.DateWatched[0] > '12/24/22'' ))
Troubleshooting
Multiple filters on a sort key column
Error message: KeyConditionExpressions
must only contain one condition per key
Cause: This issue can occur in Athena engine version 3 in queries that have both a lower and upper bounded filter on a DynamoDB sort key column. Because DynamoDB does not support more than one filter condition on a sort key, an error is thrown when the connector attempts to push down a query that has both conditions applied.
Solution: Update the connector to version 2023.11.1 or later. For instructions on updating a connector, see Update a data source connector.
Costs
The costs for use of the connector depends on the underlying AWS resources that are
used. Because queries that use scans can consume a large number of read capacity units
(RCUs)
Additional resources
-
For an introduction to using the Amazon Athena DynamoDB connector, see Access, query, and join Amazon DynamoDB tables using Athena in the AWS Prescriptive Guidance Patterns guide.
-
For an article on how to use the Athena DynamoDB connector to query data in DynamoDB with SQL and visualize insights in Amazon QuickSight, see the AWS Big Data Blog post Visualize Amazon DynamoDB insights in Amazon QuickSight using the Amazon Athena DynamoDB connector and AWS Glue
. -
For an article on using the Amazon Athena DynamoDB connector with Amazon DynamoDB, Athena, and Amazon QuickSight to create a simple governance dashboard, see the AWS Big Data Blog post Query cross-account Amazon DynamoDB tables using Amazon Athena Federated Query
. For additional information about this connector, visit the corresponding site
on GitHub.com.