Access, query, and join Amazon DynamoDB tables using Athena - AWS Prescriptive Guidance

Access, query, and join Amazon DynamoDB tables using Athena

Created by Moinul Al-Mamun (AWS)

Environment: Production

Technologies: Analytics; Databases; Serverless

AWS services: Amazon Athena; Amazon DynamoDB; AWS Lambda; Amazon S3

Summary

This pattern shows you how to set up a connection between Amazon Athena and Amazon DynamoDB by using the Amazon Athena DynamoDB connector. The connector uses an AWS Lambda function to query the data in DynamoDB. You don’t need to write any code to set up the connection. After the connection is established, you can quickly access and analyze DynamoDB tables by using Athena Federated Query to run SQL commands from Athena. You can also join one or more DynamoDB tables to each other or to other data sources, such as Amazon Redshift or Amazon Aurora.

Prerequisites and limitations

Prerequisites

  • An active AWS account with permissions to manage DynamoDB tables, Athena Data sources, Lambda, and AWS Identity and Access Management (IAM) roles

  • An Amazon Simple Storage Service (Amazon S3) bucket where Athena can store query results

  • An S3 bucket where the Athena DynamoDB Connector can save the data in the short term

  • An AWS Region that supports Athena engine version 2

  • IAM permissions to access Athena and the required S3 buckets

  • Amazon Athena DynamoDB Connector, installed

Limitations

There is a cost for querying DynamoDB tables. Table sizes exceeding a few gigabytes (GBs) can incur a high cost. We recommend that you consider cost before performing any full table SCAN operation. For more information, see Amazon DynamoDB pricing. To reduce costs and achieve high performance, we recommend that you always use LIMIT in your query (for example, SELECT * FROM table1 LIMIT 10). Also, before you perform a JOIN or GROUP BY query in a production environment, consider the size of your tables. If your tables are too large, consider alternative options such as migrating the table to Amazon S3.

Architecture

The following diagram shows how a user can run a SQL query on a DynamoDB table from Athena.

Workflow for connecting Athena and DynamoDB to run a SQL query.

The diagram shows the following workflow:

  1. To query a DynamoDB table, a user runs a SQL query from Athena.

  2. Athena initiates a Lambda function.

  3. The Lambda function queries the requested data in the DynamoDB table.

  4. DynamoDB returns the requested data to the Lambda function. Then, the function transfers the query results to the user through Athena.

  5. The Lambda function stores data in the S3 bucket.

Technology stack

  • Amazon Athena

  • Amazon DynamoDB

  • Amazon S3

  • AWS Lambda

Tools

  • Amazon Athena is an interactive query service that helps you analyze data directly in Amazon S3 by using standard SQL.

  • Amazon Athena DynamoDB Connector is an AWS tool that enables Athena to connect with DynamoDB and access your tables by using SQL queries.

  • Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable, and scalable performance.

  • AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.

Epics

TaskDescriptionSkills required

Create the first sample table.

  1. Sign in to the AWS Management Console and open the DynamoDB console.

  2. Choose Create table.

  3. For Table name, enter dydbtable1.

  4. For Partition key, enter PK1.

  5. For Sort key, enter SK1.

  6. In the Table settings section, choose Customize settings.

  7. In the Table class section, choose DynamoDB Standard.

  8. In the Read/write capacity settings section, for Capacity mode, choose On-demand.

  9. In the Encryption at rest section, choose Owned by Amazon DynamoDB.

  10. Choose Create table.

Developer

Insert sample data into the first table.

  1. Open the DynamoDB console.

  2. In the navigation pane, choose Table, and then choose your table in the Name column.

  3. Choose Actions, and then choose Create item.

  4. Choose JSON view.

  5. In the title bar of the Attributes editor, turn off View DynamoDB JSON.

  6. In the Attributes editor, enter the following sample data one by one:

{ "PK1": "1234", "SK1": "info", "Salary": "5000" }
{ "PK1": "1235", "SK1": "info", "Salary": "5200" }
Developer

Create the second sample table.

  1. Open the DynamoDB console.

  2. Choose Create table.

  3. For Table name, enter dydbtable2.

  4. For Partition key, enter PK2.

  5. For Sort key, enter SK2.

  6. In the Table settings section, choose Customize settings.

  7. In the Table class section, choose DynamoDB Standard.

  8. In the Read/write capacity settings section, for Capacity mode, choose On-demand.

  9. In the Encryption at rest section, choose Owned by Amazon DynamoDB.

  10. Choose Create table.

Developer

Insert sample data into the second table.

  1. Open the DynamoDB console.

  2. In the navigation pane, choose Table, and then choose your table in the Name column.

  3. Choose Actions, and then choose Create item.

  4. In the title bar of the Attributes editor, turn off View DynamoDB JSON.

  5. In the Attributes editor, enter the following sample data one by one:

{ "PK2": "1234", "SK2": "bonus", "Bonus": "500" }
{ "PK2": "1235", "SK2": "bonus", "Bonus": "1000" }
Developer
TaskDescriptionSkills required

Set up the data source connector.

Create a data source for DynamoDB, and then create a Lambda function to connect to that data source.

  1. Sign in to the AWS Management Console and open the Athena console.

  2. In the navigation pane, choose Data sources, and then choose Create data source.

  3. Choose the Amazon DynamoDB data source, and then choose Next.

  4. In the Data source details section, for Data source name, enter testDynamoDB.

  5. In the Connection details section, select a Lambda function that’s already deployed or choose Create Lambda function if you don’t have a Lambda function to use for this pattern. Note: For more information on creating a Lambda function, see Getting started with Lambda in the Lambda Developer Guide.

  6. (Optional) If you choose Create Lambda function, then you must configure the AWS CloudFormation template that’s included by the Java application before deploying that stack. The template includes ApplicationName, SpillBucket, AthenaCatalogName, and other application settings. Note: After you deploy this Java-based application, the stack creates a Lambda function that enables Athena to communicate with DynamoDB. This makes your tables accessible through SQL commands.

  7. Deploy your Lambda function.

  8. Choose Next.

Developer

Verify that the Lambda function can access the S3 spill bucket.

  1. Open the Lambda console.

  2. In the navigation pane, choose Functions, and then choose the function that you created earlier.

  3. Choose the Configuration tab.

  4. In the left pane, choose Environment variables, and then confirm that the value for the key is spill_bucket.

  5. In the left pane, choose Permissions, and then in the Execution role section, choose the attached IAM role. Note: You are directed to the IAM role that’s attached to your Lambda function in the IAM console.

  6. Confirm that you have write permission on spill_bucket bucket.

If you experience errors, see the Additional information section in this pattern for guidance.

Developer
TaskDescriptionSkills required

Query the DynamoDB tables.

  1. Sign in to the AWS Management Console and open the Athena console.

  2. In the navigation pane, choose Data sources, and then choose Create data source.

  3. In the navigation pane, choose Query editor.

  4. On the Editor tab, in the Data section, for Data source, choose your data source for Data source.

  5. For Database, choose your database.

  6. For Query 1, enter the following query: SELECT * FROM dydbtable1 t1;

  7. Choose Run, and then verify the output in the table.

  8. For Query 2, enter the following query: SELECT * FROM dydbtable2 t2;

  9. Choose Run, and then verify the output in the table.

Developer

Join the two DynamoDB tables.

DynamoDB is a NoSQL data store and doesn’t support the SQL join operation. Consequently, you must perform a join operation on two DynamoDB tables:

  1. Choose the plus icon to create another query.

  2. For Query 3, enter the following query:

SELECT pk1, salary, bonus FROM dydbtable1 t1 JOIN dydbtable2 t2 ON t1.pk1 = t2.pk2;
Developer

Related resources

Additional information

If you run a query in Athena with spill_bucket in the {bucket_name}/folder_name/ format, then you can receive the following error message:

"GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[arn:aws:lambda:us-east-1:xxxxxx:function:testdynamodb] executed in context[retrieving meta-data] with message[You do NOT own the spill bucket with the name: s3://amzn-s3-demo-bucket/athena_dynamodb_spill_data/] This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: [query-id]"

To resolve this error, update the Lambda function’s environment variable spill_bucket to {bucket_name_only}, and then update the following Lambda IAM policy for bucket write access:

{ "Action": [ "s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation", "s3:GetObjectVersion", "s3:PutObject", "s3:PutObjectAcl", "s3:GetLifecycleConfiguration", "s3:PutLifecycleConfiguration", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::spill_bucket", "arn:aws:s3:::spill_bucket/*" ], "Effect": "Allow" }

Alternatively, you can remove the Athena data source connector that you created earlier, and recreate it by using only {bucket_name} for spill_bucket.