Copy Amazon DynamoDB tables across accounts using a custom implementation
Created by Ramkumar Ramanujam (AWS)
Environment: Production | Source: Amazon DynamoDB | Target: Amazon DynamoDB |
R Type: N/A | Workload: All other workloads | Technologies: Databases |
AWS services: Amazon DynamoDB |
Summary
When working with Amazon DynamoDB on Amazon Web Services (AWS), a common use case is to copy or sync DynamoDB tables in development, testing, or staging environments with the table data that is in the production environment. As a standard practice, each environment uses a different AWS account.
DynamoDB now supports cross-account backup using AWS Backup. For information about associated storage costs when using AWS Backup, see AWS Backup pricing.
You can also use Amazon DynamoDB Streams to capture table changes in the source account and then invoke an AWS Lambda function that makes the corresponding changes in the target table in the target account. But that solution applies to use cases in which the source and target tables must always be kept in sync. It might not apply to development, testing, and staging environments where data is updated frequently.
This pattern provides steps to implement a custom solution that copies an Amazon DynamoDB table from one account to another. The pattern can be implemented in common programming languages such as C#, Java, and Python. We recommend using a language that is supported by an AWS SDK.
Prerequisites and limitations
Prerequisites
Two active AWS accounts
DynamoDB tables in both the accounts
Knowledge of AWS Identity and Access Management (IAM) roles and policies
Knowledge of how to access Amazon DynamoDB tables using any common programming language, such as C#, Java, or Python
Limitations
This pattern applies to DynamoDB tables that are around 2 GB or smaller. With additional logic to handle connection or session interruptions, throttling, failures, and retries, it can be used for larger tables.
The DynamoDB scan operation, which reads items from the source table, can fetch only up to 1 MB of data in a single call. For tables larger than about 2 GB, this limitation can increase the total time needed to perform a full table copy.
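Because each scan call returns at most 1 MB of data, reading a whole table means following the LastEvaluatedKey across pages. A minimal sketch of that pagination loop, assuming a boto3 DynamoDB client is passed in (the function name and table name are illustrative, not part of the pattern):

```python
def scan_full_table(client, table_name):
    """Read every item from a DynamoDB table, following
    LastEvaluatedKey across the 1 MB pages that each Scan
    call returns.

    client: a boto3 DynamoDB client, e.g. boto3.client('dynamodb')
    """
    items = []
    scan_kwargs = {"TableName": table_name}
    while True:
        response = client.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        # Resume the scan where the previous page ended.
        scan_kwargs["ExclusiveStartKey"] = last_key
    return items
```

For large tables, you would process each page as it arrives instead of accumulating all items in memory.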
Architecture
The following diagram shows the custom implementation between the source and target AWS accounts. IAM policies and security tokens are used with the custom implementation. Data is read from Amazon DynamoDB in the source account and written to DynamoDB in the target account.
Automation and scale
This pattern applies to DynamoDB tables that are smaller in size, around 2 GB.
To apply this pattern for larger tables, address the following issues:
During the table copy operation, two active sessions are maintained, using different security tokens. If the table copy operation takes longer than the token expiration time, you must implement logic to refresh the security tokens.
If enough read capacity units (RCUs) and write capacity units (WCUs) are not provisioned, reads or writes on the source or target table might be throttled. Be sure to catch and handle these exceptions.
Handle any other failures or exceptions and put a retry mechanism in place to retry or continue from where the copy operation failed.
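As a sketch of the throttling and retry points above: boto3 surfaces throttled reads and writes as a ClientError carrying the code ProvisionedThroughputExceededException (or ThrottlingException), so a generic wrapper with exponential backoff can catch and retry them. The helper name and backoff constants below are illustrative, not part of the pattern:

```python
import random
import time

def with_retries(operation, max_attempts=5):
    """Call a DynamoDB operation, retrying with exponential backoff
    and jitter when the request is throttled. Any other error is
    re-raised immediately."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            # boto3 ClientError exposes the error code in .response;
            # read it defensively so non-AWS errors are re-raised.
            code = getattr(error, "response", {}).get("Error", {}).get("Code", "")
            throttled = code in ("ProvisionedThroughputExceededException",
                                 "ThrottlingException")
            if not throttled or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter before the next attempt.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
```

A call such as `with_retries(lambda: client.scan(TableName=name))` then absorbs transient throttling without extra logic at each call site.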
Tools
Amazon DynamoDB – Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
The additional tools that are required differ based on the programming language that you choose for the implementation. For example, if you use C#, you need Microsoft Visual Studio and the following NuGet packages:
AWSSDK
AWSSDK.DynamoDBv2
Code
The following Python code snippet deletes and recreates a DynamoDB table using the Boto3 library.

Do not use the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY of an IAM user, because these are long-term credentials, which should be avoided for programmatic access to AWS services. For more information about temporary credentials, see the Best practices section.

The AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and TEMPORARY_SESSION_TOKEN used in the following code snippet are temporary credentials fetched from AWS Security Token Service (AWS STS).
import boto3
import sys
import json

# args = input parameters: GLOBAL_SEC_INDEXES_JSON_COLLECTION,
#        ATTRIBUTES_JSON_COLLECTION, TARGET_DYNAMODB_NAME, TARGET_REGION, ...

# Input param: GLOBAL_SEC_INDEXES_JSON_COLLECTION
# [{"IndexName":"Test-index","KeySchema":[{"AttributeName":"AppId","KeyType":"HASH"},{"AttributeName":"AppType","KeyType":"RANGE"}],"Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["PK","SK","OwnerName","AppVersion"]}}]

# Input param: ATTRIBUTES_JSON_COLLECTION
# [{"AttributeName":"PK","AttributeType":"S"},{"AttributeName":"SK","AttributeType":"S"},{"AttributeName":"AppId","AttributeType":"S"},{"AttributeName":"AppType","AttributeType":"N"}]

region = args['TARGET_REGION']
target_ddb_name = args['TARGET_DYNAMODB_NAME']
global_secondary_indexes = json.loads(args['GLOBAL_SEC_INDEXES_JSON_COLLECTION'])
attribute_definitions = json.loads(args['ATTRIBUTES_JSON_COLLECTION'])

# Drop and create target DynamoDB table
dynamodb_client = boto3.Session(
    aws_access_key_id=args['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=args['AWS_SECRET_ACCESS_KEY'],
    aws_session_token=args['TEMPORARY_SESSION_TOKEN'],
).client('dynamodb')

# Delete table
print('Deleting table: ' + target_ddb_name + ' ...')
try:
    dynamodb_client.delete_table(TableName=target_ddb_name)
    # Wait for table deletion to complete
    waiter = dynamodb_client.get_waiter('table_not_exists')
    waiter.wait(TableName=target_ddb_name)
    print('Table deleted.')
except dynamodb_client.exceptions.ResourceNotFoundException:
    print('Table already deleted / does not exist.')

print('Creating table: ' + target_ddb_name + ' ...')
table = dynamodb_client.create_table(
    TableName=target_ddb_name,
    KeySchema=[
        {'AttributeName': 'PK', 'KeyType': 'HASH'},   # Partition key
        {'AttributeName': 'SK', 'KeyType': 'RANGE'},  # Sort key
    ],
    AttributeDefinitions=attribute_definitions,
    GlobalSecondaryIndexes=global_secondary_indexes,
    BillingMode='PAY_PER_REQUEST',
)
waiter = dynamodb_client.get_waiter('table_exists')
waiter.wait(TableName=target_ddb_name)
print('Table created.')
Best practices
Temporary credentials
As a security best practice, when accessing AWS services programmatically, avoid using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY of an IAM user, because these are long-term credentials. Always try to use temporary credentials to access AWS services programmatically.
As an example, a developer hardcodes the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY of an IAM user in an application during development but fails to remove the hardcoded values before pushing the changes to a code repository. Exposed credentials can be used by unintended or malicious users, which can have serious implications (especially if the exposed credentials have admin privileges). Deactivate or delete exposed credentials immediately by using the IAM console or the AWS Command Line Interface (AWS CLI).
To get temporary credentials for programmatic access to AWS services, use AWS STS. Temporary credentials are valid only for the specified time (from 15 minutes up to 36 hours). The maximum allowed duration of temporary credentials varies depending on such factors as role settings and role chaining. For more information about AWS STS, see the documentation.
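For illustration, temporary credentials can be fetched programmatically with the STS AssumeRole API. The following sketch wraps that call in a small helper; the function name, role ARN, and session name in the usage example are hypothetical:

```python
def credentials_from_assume_role(sts_client, role_arn, session_name,
                                 duration_seconds=3600):
    """Assume an IAM role with AWS STS and return its temporary
    credentials in the keyword form that boto3.Session expects.

    sts_client: a boto3 STS client, e.g. boto3.client('sts')
    role_arn:   ARN of the role to assume in the source or target account
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,  # 900 seconds up to the role maximum
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

A client for the source table could then be built with something like `boto3.Session(**credentials_from_assume_role(sts, "arn:aws:iam::111111111111:role/source-ddb-read", "table-copy")).client('dynamodb')`, where the account ID and role name are placeholders.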
Epics
Task | Description | Skills required |
---|---|---|
Create DynamoDB tables. | Create DynamoDB tables, with indexes, in both the source and target AWS accounts. Set the capacity mode to on-demand, which allows DynamoDB to scale read/write capacity dynamically based on the workload. Alternatively, you can use provisioned capacity with 4,000 RCUs and 4,000 WCUs. | App developer, DBA, Migration engineer |
Populate the source table. | Populate the DynamoDB table in the source account with test data. Having at least 50 MB or more of test data helps you to see the peak and average RCUs consumed during table copy. You can then change the capacity provisioning as needed. | App developer, DBA, Migration engineer |
Task | Description | Skills required |
---|---|---|
Create IAM roles to access the source and target DynamoDB tables. | Create an IAM role in the source account with permissions to access (read) the DynamoDB table in the source account. Add the source account as a trusted entity for this role. Create an IAM role in the target account with permissions to access (create, read, update, delete) the DynamoDB table in the target account. Add the target account as a trusted entity for this role. | App developer, AWS DevOps |
Task | Description | Skills required |
---|---|---|
Get temporary credentials for the IAM roles. | Get temporary credentials for the IAM role created in the source account, and for the IAM role created in the target account. One way to get temporary credentials for an IAM role is to use AWS STS from the AWS CLI, using the appropriate AWS profile (corresponding to the source or target account). For more information about different ways to get temporary credentials, see the following: | App developer, Migration engineer |
Initialize the DynamoDB clients for source and target DynamoDB access. | Initialize the DynamoDB clients, which are provided by the AWS SDK, for the source and target DynamoDB tables. For more information about making requests by using IAM temporary credentials, see the AWS documentation. | App developer |
Drop and recreate the target table. | Delete and recreate the target DynamoDB table (along with indexes) in the target account, using the target account DynamoDB client. Deleting all records from a DynamoDB table is a costly operation because it consumes provisioned WCUs. Deleting and recreating the table avoids those extra costs. You can add indexes to a table after you create it, but that takes an additional 2–5 minutes. Creating the indexes during table creation, by passing the indexes collection to the create_table call, is faster. | App developer |
Perform the table copy. | Repeat the following steps until all data is copied: scan a page of items (up to 1 MB) from the source table, write those items to the target table, and continue the scan from the returned LastEvaluatedKey until no pages remain. For more information, see the reference implementation in C# (for dropping, creating, and populating tables) in the Attachments section. An example table config JavaScript Object Notation (JSON) file is also attached. | App developer |
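The table copy step described in the epic above can be sketched as a scan-and-batch-write loop. This is a simplified illustration rather than the attached C# reference implementation: it assumes a boto3 DynamoDB client for each account and omits the token-refresh and throttling logic discussed under Automation and scale:

```python
def copy_table(source_client, target_client, source_table, target_table):
    """Copy every item from the source table to the target table:
    scan one page (up to 1 MB) at a time, then write it in batches
    of 25 put requests, the BatchWriteItem limit."""
    scan_kwargs = {"TableName": source_table}
    while True:
        page = source_client.scan(**scan_kwargs)
        items = page.get("Items", [])
        for start in range(0, len(items), 25):
            batch = [{"PutRequest": {"Item": item}}
                     for item in items[start:start + 25]]
            response = target_client.batch_write_item(
                RequestItems={target_table: batch})
            # Re-submit anything DynamoDB could not process
            # (for example, throttled writes).
            unprocessed = response.get("UnprocessedItems", {})
            while unprocessed:
                response = target_client.batch_write_item(
                    RequestItems=unprocessed)
                unprocessed = response.get("UnprocessedItems", {})
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # all pages copied
        scan_kwargs["ExclusiveStartKey"] = last_key
```

A production version would also back off between re-submissions of unprocessed items and refresh the STS tokens for long-running copies.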
Related resources
Additional information
This pattern was implemented using C# to copy a DynamoDB table with 200,000 items (average item size of 5 KB and table size of 250 MB). The target DynamoDB table was set up with provisioned capacity of 4000 RCUs and 4000 WCUs.
The complete table copy operation (from source account to target account), including dropping and recreating the table, took 5 minutes. Total capacity units consumed: 30,000 RCUs and approximately 400,000 WCUs.
For more information on DynamoDB capacity modes, see Read/Write capacity mode in the AWS documentation.
Attachments
To access additional content that is associated with this document, unzip the following file: attachment.zip