Architecture overview
This section provides a reference implementation architecture diagram for the components deployed with this solution.
Architecture diagram
Deploying this solution with the default parameters creates the following components in your AWS account.

Figure: Data Transfer from Amazon S3 Glacier Vaults to Amazon S3 architecture on AWS
Note
The AWS CloudFormation resources are created from the AWS Cloud Development Kit (AWS CDK).
The high-level process flow for the solution components deployed with the AWS CloudFormation template is as follows:
- Customers invoke a transfer workflow by using a Systems Manager document (SSM document).
- The SSM document starts an AWS Step Functions Orchestrator execution.
- The Step Functions Orchestrator execution initiates a nested Step Functions Get Inventory workflow to retrieve the inventory file.
- Upon completion of the inventory retrieval, the solution invokes the Initiate Retrieval nested Step Functions workflow.
- When a job is ready, the Amazon S3 Glacier service sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic indicating job completion.
- The solution stores all job completion notifications in the Amazon Simple Queue Service (Amazon SQS) Notifications queue.
- When an archive job is ready, the Amazon SQS Notifications queue invokes the AWS Lambda Notifications Processor function. This Lambda function prepares the initial steps for archive retrieval.
- The Lambda Notifications Processor function places chunk retrieval messages in the Amazon SQS Chunks Retrieval queue for chunk processing.
- The Amazon SQS Chunks Retrieval queue invokes the Lambda Chunk Retrieval function to process each chunk.
- The Lambda Chunk Retrieval function downloads the chunk from the Amazon S3 Glacier service.
- The Lambda Chunk Retrieval function uploads a multipart upload part to the Amazon S3 service.
- After a new chunk is downloaded, the solution stores chunk metadata (etag, checksum_sha_256, tree_checksum) in Amazon DynamoDB.
- The Lambda Chunk Retrieval function verifies whether all chunks for that archive have been processed. If so, it inserts an event into the Amazon SQS Validation queue to invoke the Lambda Validate function.
- The Lambda Validate function does the following:
  - Performs an integrity check against the tree hash in the inventory.
  - Calculates a checksum and passes it into the complete multipart upload call. If that hash is wrong, Amazon S3 rejects the request.
- A DynamoDB stream invokes the Lambda Metrics Processor function to update the transfer process metrics in DynamoDB.
- The Step Functions Orchestrator execution enters an async wait, pausing until the archive retrieval workflow concludes before initiating the Step Functions Cleanup workflow.
- The DynamoDB stream invokes the Lambda Async Facilitator function, which unlocks asynchronous waits in Step Functions.
- Amazon EventBridge rules periodically initiate the Step Functions Extend Download Window and Update CloudWatch Dashboard workflows.
- Customers monitor the transfer progress by using the Amazon CloudWatch dashboard.
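The integrity check performed by the Validate function uses the SHA-256 tree hash that Amazon S3 Glacier publishes for each archive: hash each consecutive 1 MiB chunk, then combine digests pairwise until a single root digest remains. A minimal sketch of that algorithm follows; the function name and the in-memory input are illustrative, since the solution itself works chunk by chunk through SQS and Lambda.

```python
import hashlib

MIB = 1024 * 1024

def tree_hash(data: bytes) -> str:
    """Compute the SHA-256 tree hash as defined by Amazon S3 Glacier.

    Leaf digests are SHA-256 over consecutive 1 MiB chunks; each parent
    digest is SHA-256 over the concatenation of a pair of child digests.
    """
    # Hash each 1 MiB chunk (an empty payload is treated as one empty chunk).
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(c).digest() for c in chunks]
    # Combine digests pairwise until a single root digest remains.
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [hashlib.sha256(b"".join(p)).digest() for p in pairs]
    return level[0].hex()
```

For archives of 1 MiB or less, the tree hash equals the plain SHA-256 of the payload, which makes the function easy to spot-check.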
Translation of S3 Glacier vault archive descriptions to S3 object names
To create the key name for each of the new objects in the Amazon S3 service, this solution uses the ArchiveDescription value for each ArchiveId listed in the Amazon S3 Glacier inventory file. The following are examples.
- If the ArchiveDescription is a single string value, such as data01, the solution translates that value to an S3 object key name in the destination S3 bucket.
- If the ArchiveDescription value is blank, then the solution does the following:
  - Copies the archive.
  - Uses the ArchiveId as the S3 object key name.
  - Adds the prefix 00undefined to the S3 object key names and stores the objects in the destination S3 bucket.
- If multiple ArchiveId entries have the same value for the ArchiveDescription field (for example, duplicatefile02.txt), then the solution appends a timestamp suffix to the name of the original file. This resolves the potential issue of duplicate S3 object key names being copied over one another. The timestamp used is the CreationDate of the archive.
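Taken together, these translation rules can be sketched as a small function. Only the rules themselves come from this section; the separator after the 00undefined prefix, the timestamp format of the duplicate suffix, and the handling of a trailing Z in CreationDate are illustrative assumptions.

```python
from datetime import datetime

def to_s3_key(archive_id: str, description: str, creation_date: str,
              seen: set) -> str:
    """Sketch of the ArchiveDescription-to-S3-object-key translation rules.

    `seen` tracks descriptions already assigned, so duplicates can be
    detected and disambiguated with the archive's CreationDate.
    """
    if not description:
        # Blank description: fall back to the ArchiveId under the
        # 00undefined prefix (separator choice is an assumption).
        return f"00undefined/{archive_id}"
    if description in seen:
        # Duplicate description: append a CreationDate-based suffix
        # (suffix format is an assumption). Strip a trailing Z so
        # fromisoformat accepts the timestamp on older Pythons.
        stamp = datetime.fromisoformat(creation_date.rstrip("Z"))
        return f"{description}-{stamp.strftime('%Y%m%dT%H%M%S')}"
    seen.add(description)
    return description
```

A single-string description such as data01 passes through unchanged; only blank and duplicate descriptions are rewritten.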

Creating custom file names for S3 objects
You can provide custom S3 object key names for each ArchiveId that's copied to your S3 bucket. To do this, provide a NamingOverrideFile to the solution when you launch the transfer workflow, using the NamingOverrideFile input parameter. Use the following process.
- Create a data file in CSV format. The file must contain only two columns: GlacierArchiveID and FileName (separated by a comma). The following table is an example.

  GlacierArchiveID               FileName
  WVfrXME2KC6JIedfadJF937412-e   Mydata.txt
  yLam5H76JXYSKKIY34404D-Kwcrk   Myfolder/mydata2.txt

- Obtain a copy of your vault inventory file for the Amazon S3 Glacier service. For more information, see Downloading a Vault Inventory in Amazon S3 Glacier in the Amazon S3 Glacier Developer Guide.
- Copy all the ArchiveId values from your S3 Glacier vault inventory file. Paste them into the GlacierArchiveID column of your NamingOverride CSV file.
- In the FileName column, for each ArchiveId, enter your desired S3 object key name.

  Note
  If you provide an empty value for the FileName, the solution uses the original value for ArchiveDescription from the S3 Glacier archive.

- Upload the CSV file to any S3 bucket and create a presigned URL for the file.
- Use this presigned URL as the value of the NamingOverrideFile input parameter when launching the transfer workflow.
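The naming override file described above can also be generated programmatically. A minimal sketch follows; the helper name is illustrative, and the header row mirrors the example table (whether the solution expects a header row is an assumption).

```python
import csv
import io

def build_naming_override_csv(mapping: dict) -> str:
    """Render an ArchiveId -> S3 object key mapping as two-column CSV text.

    `mapping` maps each GlacierArchiveID to the desired FileName; an empty
    FileName leaves the solution to fall back to the ArchiveDescription.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["GlacierArchiveID", "FileName"])
    for archive_id, file_name in mapping.items():
        writer.writerow([archive_id, file_name])
    return buf.getvalue()
```

After writing the result to a file, upload it to an S3 bucket and create the presigned URL, for example with `aws s3 presign s3://<bucket>/naming-override.csv --expires-in 3600` (bucket and key names here are placeholders).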