Amazon Athena DynamoDB connector

Focus mode

Amazon Athena DynamoDB connector - Amazon Athena

Prerequisites Limitations Parameters Setting up databases and tables in AWS Glue Required Permissions Performance Passthrough queries Troubleshooting Costs Additional resources

The Amazon Athena DynamoDB connector enables Amazon Athena to communicate with DynamoDB so that you can query your tables with SQL. Write operations like INSERT INTO are not supported.

This connector can be registered with Glue Data Catalog as a federated catalog. It supports data access controls defined in Lake Formation at the catalog, database, table, column, row, and tag levels. This connector uses Glue Connections to centralize configuration properties in Glue.

If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the AWS Serverless Application Repository must have read access in Lake Formation to the AWS Glue Data Catalog.

Prerequisites

Deploy the connector to your AWS account using the Athena console or the AWS Serverless Application Repository. For more information, see Create a data source connection or Use the AWS Serverless Application Repository to deploy a data source connector.

Limitations

If you migrate your DynamoDB connections to Glue Catalog and Lake Formation, only the lowercase table and column names will be recognized.

Parameters

Use the parameters in this section to configure the DynamoDB connector.

Glue connections (recommended)

We recommended that you configure a DynamoDB connector by using a Glue connections object. To do this, set the glue_connection environment variable of the DynamoDB connector Lambda to the name of the Glue connection to use.

Glue connections properties

Use the following command to get the schema for a Glue connection object. This schema contains all the parameters that you can use to control your connection.


aws glue describe-connection-type --connection-type DYNAMODB

Lambda environment properties

glue_connection – Specifies the name of the Glue connection associated with the federated connector.

Legacy connections

Note

Athena data source connectors created on December 3, 2024 and later use AWS Glue connections.

The parameter names and definitions listed below are for Athena data source connectors created without an associated Glue connection. Use the following parameters only when you manually deploy an earlier version of an Athena data source connector or when the glue_connection environment property is not specified.

Lambda environment properties

spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.
spill_prefix – (Optional) Defaults to a subfolder in the specified spill_bucket called athena-federation-spill. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours.
spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.
kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like a7e63k4b-8loc-40db-a2a1-4d0en2cd8331, you can specify a KMS key ID.
disable_spill_encryption – (Optional) When set to True, disables spill encryption. Defaults to False so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption.
disable_glue – (Optional) If present and set to true, the connector does not attempt to retrieve supplemental metadata from AWS Glue.
glue_catalog – (Optional) Use this option to specify a cross-account AWS Glue catalog. By default, the connector attempts to get metadata from its own AWS Glue account.
disable_projection_and_casing – (Optional) Disables projection and casing. Use if you want to query DynamoDB tables that have casing in their column names and you do not want to specify a columnMapping property on your AWS Glue table.

The disable_projection_and_casing parameter uses the following values to specify the behavior of casing and column mapping:
- auto – Disables projection and casing when a previously unsupported type is detected and column name mapping is not set on the table. This is the default setting.
- always – Disables projection and casing unconditionally. This is useful when you have casing in your DynamoDB column names but do not want to specify any column name mapping.
When using the disable_projection_and_casing parameter, keep in mind the following points:
- Use of the parameter can result in higher bandwidth usage. Additionally, if your Lambda function is not in the same AWS Region as your data source, you will incur higher standard AWS cross-region transfer costs as a result of the higher bandwidth usage. For more information about cross-region transfer costs, see AWS Data Transfer Charges for Server and Serverless Architectures in the AWS Partner Network Blog.
- Because a larger number of bytes is transferred and because the larger number of bytes requires a higher deserialization time, overall latency can increase.

anchor anchor

Glue connections properties

Use the following command to get the schema for a Glue connection object. This schema contains all the parameters that you can use to control your connection.


aws glue describe-connection-type --connection-type DYNAMODB

Lambda environment properties

glue_connection – Specifies the name of the Glue connection associated with the federated connector.

Setting up databases and tables in AWS Glue

Because the connector's built-in schema inference capability is limited, you might want to use AWS Glue for metadata. To do this, you must have a database and table in AWS Glue. To enable them for use with DynamoDB, you must edit their properties.

To edit database properties in the AWS Glue console

Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/.
In the navigation pane, expand Data Catalog, and then choose Databases.

On the Databases page, you can edit an existing database, or choose Add database to create one.
In the list of databases, choose the link for the database that you want to edit.
Choose Edit.
On the Update a database page, under Database settings, for Location, add the string dynamo-db-flag. This keyword indicates that the database contains tables that the Athena DynamoDB connector is using for supplemental metadata and is required for AWS Glue databases other than default. The dynamo-db-flag property is useful for filtering out databases in accounts with many databases.
Choose Update Database.

To edit table properties in the AWS Glue console

Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/.
In the navigation pane, expand Data Catalog, and then choose Tables.
On the Tables page, in the list of tables, choose the linked name of the table that you want to edit.
Choose Actions, Edit table.
On the Edit table page, in the Table properties section, add the following table properties as required. If you use the AWS Glue DynamoDB crawler, these properties are automatically set.
- dynamodb – String that indicates to the Athena DynamoDB connector that the table can be used for supplemental metadata. Enter the dynamodb string in the table properties under a field called classification (exact match).
  
  Note
  The Set table properties page that is part of the table creation process in the AWS Glue console has a Data format section with a Classification field. You cannot enter or choose dynamodb here. Instead, after you create your table, follow the steps to edit the table and to enter classification and dynamodb as a key-value pair in the Table properties section.
- sourceTable – Optional table property that defines the source table name in DynamoDB. Use this if AWS Glue table naming rules prevent you from creating a AWS Glue table with the same name as your DynamoDB table. For example, capital letters are not permitted in AWS Glue table names, but they are permitted in DynamoDB table names.
- columnMapping – Optional table property that defines column name mappings. Use this if AWS Glue column naming rules prevent you from creating a AWS Glue table with the same column names as your DynamoDB table. For example, capital letters are not permitted in AWS Glue column names but are permitted in DynamoDB column names. The property value is expected to be in the format col1=Col1,col2=Col2. Note that column mapping applies only to top level column names and not to nested fields.
- defaultTimeZone – Optional table property that is applied to date or datetime values that do not have an explicit time zone. Setting this value is a good practice to avoid discrepancies between the data source default time zone and the Athena session time zone.
- datetimeFormatMapping – Optional table property that specifies the date or datetime format to use when parsing data from a column of the AWS Glue date or timestamp data type. If this property is not specified, the connector attempts to infer an ISO-8601 format. If the connector cannot infer the date or datetime format or parse the raw string, then the value is omitted from the result.
  
  The datetimeFormatMapping value should be in the format col1=someformat1,col2=someformat2. Following are some example formats:
```
yyyyMMdd'T'HHmmss 
ddMMyyyy'T'HH:mm:ss
```
  If your column has date or datetime values without a time zone and you want to use the column in the WHERE clause, set the datetimeFormatMapping property for the column.
If you define your columns manually, make sure that you use the appropriate data types. If you used a crawler, validate the columns and types that the crawler discovered.
Choose Save.

Required Permissions

For full details on the IAM policies that this connector requires, review the Policies section of the athena-dynamodb.yaml file. The following list summarizes the required permissions.

Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.
Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.
AWS Glue Data Catalog – The DynamoDB connector requires read only access to the AWS Glue Data Catalog to obtain schema information.
CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.
DynamoDB read access – The connector uses the DescribeTable, ListSchemas, ListTables, Query, and Scan API operations.

Performance

The Athena DynamoDB connector supports parallel scans and attempts to push down predicates as part of its DynamoDB queries. A hash key predicate with X distinct values results in X query calls to DynamoDB. All other predicate scenarios result in Y number of scan calls, where Y is heuristically determined based on the size of your table and its provisioned throughput. However, selecting a subset of columns sometimes results in a longer query execution runtime.

LIMIT clauses and simple predicates are pushed down and can reduce the amount of data scanned and will lead to decreased query execution run time.

LIMIT clauses

A LIMIT N statement reduces the data scanned by the query. With LIMIT N pushdown, the connector returns only N rows to Athena.

Predicates

A predicate is an expression in the WHERE clause of a SQL query that evaluates to a Boolean value and filters rows based on multiple conditions. For enhanced functionality, and to reduce the amount of data scanned, the Athena DynamoDB connector can combine these expressions and push them directly to DynamoDB.

The following Athena DynamoDB connector operators support predicate pushdown:

Boolean: AND
Equality: EQUAL, NOT_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, IS_NULL

Combined pushdown example

For enhanced querying capabilities, combine the pushdown types, as in the following example:


SELECT *
FROM my_table
WHERE col_a > 10 and col_b < 10
LIMIT 10

For an article on using predicate pushdown to improve performance in federated queries, including DynamoDB, see Improve federated queries with predicate pushdown in Amazon Athena in the AWS Big Data Blog.

Passthrough queries

The DynamoDB connector supports passthrough queries and uses PartiQL syntax. The DynamoDB GetItem API operation is not supported. For information about querying DynamoDB using PartiQL, see PartiQL select statements for DynamoDB in the Amazon DynamoDB Developer Guide.

To use passthrough queries with DynamoDB, use the following syntax:


SELECT * FROM TABLE(
        system.query(
            query => 'query_string'
        ))

The following DynamoDB passthrough query example uses PartiQL to return a list of Fire TV Stick devices that have a DateWatched property later than 12/24/22.


SELECT * FROM TABLE(
        system.query(
           query => 'SELECT Devices 
                       FROM WatchList 
                       WHERE Devices.FireStick.DateWatched[0] > '12/24/22''
        ))

Troubleshooting

Multiple filters on a sort key column

Error message: KeyConditionExpressions must only contain one condition per key

Cause: This issue can occur in Athena engine version 3 in queries that have both a lower and upper bounded filter on a DynamoDB sort key column. Because DynamoDB does not support more than one filter condition on a sort key, an error is thrown when the connector attempts to push down a query that has both conditions applied.

Solution: Update the connector to version 2023.11.1 or later. For instructions on updating a connector, see Update a data source connector.

Costs

The costs for use of the connector depends on the underlying AWS resources that are used. Because queries that use scans can consume a large number of read capacity units (RCUs), consider the information for Amazon DynamoDB pricing carefully.

Additional resources

For an introduction to using the Amazon Athena DynamoDB connector, see Access, query, and join Amazon DynamoDB tables using Athena in the AWS Prescriptive Guidance Patterns guide.
For an article on how to use the Athena DynamoDB connector to query data in DynamoDB with SQL and visualize insights in Amazon QuickSight, see the AWS Big Data Blog post Visualize Amazon DynamoDB insights in Amazon QuickSight using the Amazon Athena DynamoDB connector and AWS Glue.
For an article on using the Amazon Athena DynamoDB connector with Amazon DynamoDB, Athena, and Amazon QuickSight to create a simple governance dashboard, see the AWS Big Data Blog post Query cross-account Amazon DynamoDB tables using Amazon Athena Federated Query.
For additional information about this connector, visit the corresponding site on GitHub.com.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

DocumentDB

Google BigQuery

Next topic:

Google BigQuery

Previous topic:

DocumentDB

Select your cookie preferences

Customize cookie preferences

Essential

Performance

Functional

Advertising

Unable to save cookie preferences

Amazon Athena DynamoDB connector

Prerequisites

Limitations

Parameters

Note

Setting up databases and tables in AWS Glue

To edit database properties in the AWS Glue console

To edit table properties in the AWS Glue console

Note

Required Permissions

Performance

LIMIT clauses

Predicates

Combined pushdown example

Passthrough queries

Troubleshooting

Multiple filters on a sort key column

Costs

Additional resources

Next topic:

Previous topic:

Need help?

On this page

Did this page help you?