Architecture overview
This section provides a reference implementation architecture diagram for the components deployed with this solution.
Architecture diagram
Deploying this solution with the default parameters creates the following components in your AWS account.

Figure: Data Transfer from Amazon S3 Glacier Vaults to Amazon S3 architecture on AWS
Note
The AWS CloudFormation resources are created from the AWS Cloud Development Kit (AWS CDK).
The high-level process flow for the solution components deployed with the AWS CloudFormation template is as follows:
- Customers invoke a transfer workflow by using a Systems Manager document (SSM document).
- The SSM document starts an AWS Step Functions Orchestrator execution.
- The Step Functions Orchestrator execution initiates a nested Step Functions Get Inventory workflow to retrieve the inventory file.
- Upon completion of the inventory retrieval, the solution invokes the Initiate Retrieval nested Step Functions workflow.
- When a job is ready, the Amazon S3 Glacier service sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic indicating job completion.
- The solution stores all job completion notifications in the Amazon Simple Queue Service (Amazon SQS) Notifications queue.
- When an archive job is ready, the Amazon SQS Notifications queue invokes the AWS Lambda Notifications Processor function. This Lambda function prepares the initial steps for archive retrieval.
- The Lambda Notifications Processor function places chunk retrieval messages in the Amazon SQS Chunks Retrieval queue for chunk processing.
- The Amazon SQS Chunks Retrieval queue invokes the Lambda Chunk Retrieval function to process each chunk.
- The Lambda Chunk Retrieval function downloads the chunk from the Amazon S3 Glacier service.
- The Lambda Chunk Retrieval function uploads a multipart upload part to the Amazon S3 service.
- After a new chunk is downloaded, the solution stores chunk metadata (etag, checksum_sha_256, tree_checksum) in Amazon DynamoDB.
- The Lambda Chunk Retrieval function verifies whether all chunks for that archive have been processed. If so, it inserts an event into the Amazon SQS Validation queue to invoke the Lambda Validate function.
- The Lambda Validate function does the following:
  - Performs an integrity check against the tree hash in the inventory.
  - Calculates a checksum and passes it into the complete multipart upload call. If that hash is wrong, Amazon S3 rejects the request.
- A DynamoDB stream invokes the Lambda Metrics Processor function to update the transfer process metrics in DynamoDB.
- The Step Functions Orchestrator execution enters an async wait, pausing until the archive retrieval workflow concludes before initiating the Step Functions Cleanup workflow.
- The DynamoDB stream invokes the Lambda Async Facilitator function, which unlocks asynchronous waits in Step Functions.
- Amazon EventBridge rules periodically initiate the Step Functions Extend Download Window and Update CloudWatch Dashboard workflows.
- Customers monitor the transfer progress by using the Amazon CloudWatch dashboard.
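The integrity check performed by the Validate function uses the SHA-256 tree hash that Amazon S3 Glacier publishes for each archive: hash each consecutive 1 MiB chunk, then combine digests pairwise until a single root digest remains. A minimal sketch of that algorithm follows; the function name and the in-memory input are illustrative, since the solution itself works chunk by chunk through SQS and Lambda.

```python
import hashlib

MIB = 1024 * 1024

def tree_hash(data: bytes) -> str:
    """Compute the SHA-256 tree hash as defined by Amazon S3 Glacier.

    Leaf digests are SHA-256 over consecutive 1 MiB chunks; each parent
    digest is SHA-256 over the concatenation of a pair of child digests.
    """
    # Hash each 1 MiB chunk (an empty payload is treated as one empty chunk).
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    # Combine digests pairwise until a single root digest remains.
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [hashlib.sha256(b"".join(p)).digest() for p in pairs]
    return level[0].hex()
```

For archives of 1 MiB or less, the tree hash equals the plain SHA-256 of the payload, which makes the function easy to spot-check.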
Translation of S3 Glacier vault archive descriptions to S3 object names
To create the key name for each of the new objects in the Amazon S3 service, this solution uses the ArchiveDescription value for each ArchiveId listed in the Amazon S3 Glacier inventory file. The following are examples.
- If the ArchiveDescription is a single string value, such as data01, the solution translates that value to an S3 object key name in the destination S3 bucket.
- If the ArchiveDescription value is blank, then the solution does the following:
  - Copies the archive.
  - Uses the ArchiveId as the S3 object key name.
  - Adds the prefix 00undefined to the S3 object key names and stores the objects in the destination S3 bucket.
- If multiple ArchiveId entries have the same value for the ArchiveDescription field (for example, duplicatefile02.txt), then the solution appends a timestamp suffix to the name of the original file. This resolves the potential issue of duplicate S3 object key names being copied over one another. The timestamp used is the CreationDate of the archive.
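Taken together, these translation rules can be sketched as a small function. Only the rules themselves come from this section; the separator after the 00undefined prefix, the timestamp format of the duplicate suffix, and the handling of a trailing Z in CreationDate are illustrative assumptions.

```python
from datetime import datetime

def to_s3_key(archive_id: str, description: str, creation_date: str,
              seen: set) -> str:
    """Sketch of the ArchiveDescription-to-S3-object-key translation rules.

    `seen` tracks descriptions already assigned, so duplicates can be
    detected and disambiguated with the archive's CreationDate.
    """
    if not description:
        # Blank description: fall back to the ArchiveId under the
        # 00undefined prefix (separator choice is an assumption).
        return f"00undefined/{archive_id}"
    if description in seen:
        # Duplicate description: append a CreationDate-based suffix
        # (suffix format is an assumption). Strip a trailing Z so
        # fromisoformat accepts the timestamp on older Pythons.
        stamp = datetime.fromisoformat(creation_date.rstrip("Z"))
        return f"{description}-{stamp.strftime('%Y%m%dT%H%M%S')}"
    seen.add(description)
    return description
```

A single-string description such as data01 passes through unchanged; only blank and duplicate descriptions are rewritten.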

Creating custom file names for S3 objects
You can provide custom S3 object key names for each ArchiveId that's copied to your S3 bucket. To do this, provide a NamingOverrideFile to the solution when you launch the transfer workflow, using the NamingOverrideFile input parameter. Use the following process.
- Create a data file in CSV format. The file must contain only two columns: GlacierArchiveID and FileName (separated by a comma). The following table is an example.

  GlacierArchiveID               FileName
  WVfrXME2KC6JIedfadJF937412-e   Mydata.txt
  yLam5H76JXYSKKIY34404D-Kwcrk   Myfolder/mydata2.txt

- Obtain a copy of your vault inventory file for the Amazon S3 Glacier service. For more information, see Downloading a Vault Inventory in Amazon S3 Glacier in the Amazon S3 Glacier Developer Guide.
- Copy all the ArchiveId values from your S3 Glacier vault inventory file. Paste them into the GlacierArchiveID column of your NamingOverride CSV file.
- In the FileName column, for each ArchiveId, enter your desired S3 object key name.

  Note
  If you provide an empty value for the FileName, the solution uses the original value for ArchiveDescription from the S3 Glacier archive.

- Upload the CSV file to any S3 bucket and create a presigned URL for the file.
- Use this presigned URL as the value of the NamingOverrideFile input parameter when launching the transfer workflow.
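The naming override file described above can also be generated programmatically. A minimal sketch follows; the helper name is illustrative, and the header row mirrors the example table (whether the solution expects a header row is an assumption).

```python
import csv
import io

def build_naming_override_csv(mapping: dict) -> str:
    """Render an ArchiveId -> S3 object key mapping as two-column CSV text.

    `mapping` maps each GlacierArchiveID to the desired FileName; an empty
    FileName leaves the solution to fall back to the ArchiveDescription.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["GlacierArchiveID", "FileName"])
    for archive_id, file_name in mapping.items():
        writer.writerow([archive_id, file_name])
    return buf.getvalue()
```

After writing the result to a file, upload it to an S3 bucket and create the presigned URL, for example with `aws s3 presign s3://<bucket>/naming-override.csv --expires-in 3600` (bucket and key names here are placeholders).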