Tutorial: Upload an object through multipart upload and verify its data integrity
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation. For more information about multipart uploads, see Uploading and copying objects using multipart upload in Amazon S3. For limits related to multipart uploads, see Amazon S3 multipart upload limits.
You can use checksums to verify that assets are not altered when they are copied. Performing a checksum consists of using an algorithm to iterate sequentially over every byte in a file. Amazon S3 offers multiple checksum options for checking the integrity of data. We recommend that you perform these integrity checks as a durability best practice and to confirm that every byte is transferred without alteration. Amazon S3 also supports the following algorithms: SHA-1, SHA-256, CRC32, and CRC32C. Amazon S3 uses one or more of these algorithms to compute an additional checksum value and store it as part of the object metadata. For more information about checksums, see Checking object integrity in Amazon S3.
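As a minimal sketch of what iterating sequentially over every byte looks like in practice, the following Python example streams a file through a hash function in fixed-size chunks. The helper name file_digest_hex and the 1 MiB chunk size are illustrative assumptions; they are not part of Amazon S3 or the AWS CLI.

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path` (hypothetical helper).

    The file is read sequentially in chunks, so large files do not have
    to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Feed the hash one chunk at a time until the file is exhausted.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, file_digest_hex("census-data.bin") would return the SHA-256 digest of a local file, and file_digest_hex("census-data.bin", "md5") its MD5 digest.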
Objective
In this tutorial, you will learn how to upload an object to Amazon S3 by using a multipart upload and an additional SHA-256 checksum through the AWS Command Line Interface (AWS CLI). You’ll also learn how to check the object’s data integrity by calculating the MD5 hash and SHA-256 checksum of the uploaded object.
Topics
- Prerequisites
- Step 1: Create a large file
- Step 2: Split the file into multiple files
- Step 3: Create the multipart upload with an additional checksum
- Step 4: Upload the parts of your multipart upload
- Step 5: List all the parts of your multipart upload
- Step 6: Complete the multipart upload
- Step 7: Confirm that the object is uploaded to your bucket
- Step 8: Verify object integrity with an MD5 checksum
- Step 9: Verify object integrity with an additional checksum
- Step 10: Clean up your resources
Prerequisites
- Before you start this tutorial, make sure that you have access to an Amazon S3 bucket that you can upload to. For more information, see Creating a bucket.
- You must have the AWS CLI installed and configured. If you don’t have the AWS CLI installed, see Install or update to the latest version of the AWS CLI in the AWS Command Line Interface User Guide.
- Alternatively, you can run AWS CLI commands from the console by using AWS CloudShell. AWS CloudShell is a browser-based, pre-authenticated shell that you can launch directly from the AWS Management Console. For more information, see What is CloudShell? and Getting started with AWS CloudShell in the AWS CloudShell User Guide.
Step 1: Create a large file
If you already have a file ready for upload, you can use the file for this tutorial. Otherwise, create a 15 MB file using the following steps. For limits related to multipart uploads, see Amazon S3 multipart upload limits.
To create a large file
Use one of the following commands to create your file, depending on which operating system you're using.
Linux or macOS
To create a 15 MB file, open your local terminal and run the following command:
dd if=/dev/urandom of=census-data.bin bs=1M count=15
This command creates a file named census-data.bin filled with random bytes, with a size of 15 MB.
Windows
To create a 15 MB file, open your local terminal and run the following command:
fsutil file createnew census-data.bin 15728640
This command creates a file named census-data.bin with a size of 15 MB (15,728,640 bytes). The file is filled with zeros rather than random bytes, which is sufficient for this tutorial.
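If you prefer a single approach that behaves the same on every operating system, the file can also be generated with a short script. The following Python sketch is one way to do that; the helper name create_random_file is hypothetical, and the default size simply mirrors the 15 MB used in this tutorial.

```python
import os


def create_random_file(path, size_bytes=15 * 1024 * 1024, chunk_size=1024 * 1024):
    """Write `size_bytes` of random data to `path` and return the file size."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            # Write at most one chunk at a time to keep memory use small.
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
    return os.path.getsize(path)
```

Calling create_random_file("census-data.bin") produces a file of 15,728,640 random bytes in the current directory.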
Step 2: Split the file into multiple files
To perform the multipart upload, you have to split your large file into smaller parts. You can then upload the smaller parts by using the multipart upload process. This step demonstrates how to split the large file created in Step 1 into smaller parts. The following example uses a 15 MB file named census-data.bin.
To split a large file into parts
Linux or macOS
To divide the large file into 5 MB parts, use the split command. Open your terminal and run the following:
split -b 5M -d census-data.bin census-part
This command splits census-data.bin into 5 MB parts named census-part**, where ** is a numeric suffix starting from 00.
Windows
To split the large file, use PowerShell. Open PowerShell and run the following script:
$inputFile = "census-data.bin"
$outputFilePrefix = "census-part"
$chunkSize = 5MB
$fs = [System.IO.File]::OpenRead($inputFile)
$buffer = New-Object byte[] $chunkSize
$fileNumber = 0
while ($fs.Position -lt $fs.Length) {
    $bytesRead = $fs.Read($buffer, 0, $chunkSize)
    $outputFile = "{0}{1:D2}" -f $outputFilePrefix, $fileNumber
    $fileStream = [System.IO.File]::Create($outputFile)
    $fileStream.Write($buffer, 0, $bytesRead)
    $fileStream.Close()
    $fileNumber++
}
$fs.Close()
This PowerShell script reads the large file in chunks of 5 MB and writes each chunk to a new file with a numeric suffix.
After running the appropriate command, you should see the parts in the directory where you executed the command. Each part will have a suffix corresponding to its part number, for example:
census-part00 census-part01 census-part02
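The same splitting logic can be sketched in a platform-independent way. The following Python example is an illustration only: the helper name split_file and its parameters are assumptions, and the two-digit suffix is chosen to match the census-part00, census-part01 naming shown above.

```python
import os


def split_file(path, prefix="census-part", chunk_size=5 * 1024 * 1024, out_dir="."):
    """Split `path` into numbered part files and return their paths in order."""
    part_paths = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Two-digit suffix (00, 01, ...), matching the naming used above.
            part_path = os.path.join(out_dir, f"{prefix}{index:02d}")
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths
```

For the 15 MB file from Step 1, split_file("census-data.bin") would write three 5 MB parts to the current directory and return their paths in upload order.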
Step 3: Create the multipart upload with an additional checksum
To begin the multipart upload process, you need to create the multipart upload request. This step involves initiating the multipart upload and specifying an additional checksum for data integrity. The following example uses the SHA-256 checksum. If you want to provide any metadata describing the object being uploaded, you must provide it in the request to initiate the multipart upload.
Note
In this step and subsequent steps, this tutorial uses SHA-256 as the additional checksum algorithm. You can optionally use a different additional checksum algorithm for these steps, such as CRC32, CRC32C, or SHA-1. If you use a different algorithm, you must use it consistently throughout the tutorial steps.
To start the multipart upload
In your terminal, use the following create-multipart-upload command to start a multipart upload for your bucket. Replace amzn-s3-demo-bucket1 with your actual bucket name. Also, replace census_data_file with your chosen file name. This file name becomes the object key when the upload completes.
aws s3api create-multipart-upload --bucket amzn-s3-demo-bucket1 --key 'census_data_file' --checksum-algorithm SHA256
If your request succeeds, you'll see JSON output like the following:
{
    "ServerSideEncryption": "AES256",
    "ChecksumAlgorithm": "SHA256",
    "Bucket": "amzn-s3-demo-bucket1",
    "Key": "census_data_file",
    "UploadId": "cNV6KCSNANFZapz1LUGPC5XwUVi1n6yUoIeSP138sNOKPeMhpKQRrbT9k0ePmgoOTCj9K83T4e2Gb5hQvNoNpCKqyb8m3.oyYgQNZD6FNJLBZluOIUyRE.qM5yhDTdhz"
}
Note
When you send a request to initiate a multipart upload, Amazon S3 returns a response with an upload ID, which is a unique identifier for your multipart upload. You must include this upload ID whenever you upload parts, list the parts, complete an upload, or stop an upload. You'll need to use the UploadId, Key, and Bucket values for later steps, so make sure to save these.
Also, if you’re using multipart upload with additional checksums, the part numbers must be consecutive. If you use nonconsecutive part numbers, the complete-multipart-upload request can result in an HTTP 500 Internal Server Error.
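One way to keep the UploadId, Key, and Bucket values at hand is to capture the command's JSON output as text and extract the three fields from it. The following Python sketch assumes you have that text available, for example by redirecting the command output to a file; the helper name read_upload_details is hypothetical.

```python
import json


def read_upload_details(response_text):
    """Extract the values that later steps need from the CLI's JSON output."""
    response = json.loads(response_text)
    # A KeyError here means the text is not a create-multipart-upload response.
    return {name: response[name] for name in ("Bucket", "Key", "UploadId")}
```

The returned dictionary can then be reused when you build the upload-part, list-parts, and complete-multipart-upload commands.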
Step 4: Upload the parts of your multipart upload
In this step, you will upload the parts of your multipart upload to your S3 bucket. Use the upload-part command to upload each part individually. This process requires specifying the upload ID, the part number, and the file to be uploaded for each part.
To upload the parts
- When uploading a part, in addition to the upload ID, you must specify a part number by using the --part-number argument. You can choose any part number between 1 and 10,000. A part number uniquely identifies a part and its position in the object you are uploading. The part numbers that you choose must form a consecutive sequence (for example, 1, 2, 3). If you upload a new part using the same part number as a previously uploaded part, the previously uploaded part is overwritten.
- Use the upload-part command to upload each part of your multipart upload. The --upload-id value is the same as the UploadId in the output of the create-multipart-upload command in Step 3. To upload the first part of your data, use the following command:
  aws s3api upload-part --bucket amzn-s3-demo-bucket1 --key 'census_data_file' --part-number 1 --body census-part00 --upload-id "cNV6KCSNANFZapz1LUGPC5XwUVi1n6yUoIeSP138sNOKPeMhpKQRrbT9k0ePmgoOTCj9K83T4e2Gb5hQvNoNpCKqyb8m3.oyYgQNZD6FNJLBZluOIUyRE.qM5yhDTdhz" --checksum-algorithm SHA256
  Upon completion of each upload-part command, you should see output like the following example:
  {
      "ServerSideEncryption": "AES256",
      "ETag": "\"e611693805e812ef37f96c9937605e69\"",
      "ChecksumSHA256": "QLl8R4i4+SaJlrl8ZIcutc5TbZtwt2NwB8lTXkd3GH0="
  }
- For subsequent parts, increment the part number accordingly:
  aws s3api upload-part --bucket amzn-s3-demo-bucket1 --key 'census_data_file' --part-number <part-number> --body <file-path> --upload-id "<your-upload-id>" --checksum-algorithm SHA256
  For example, use the following command to upload the second part:
  aws s3api upload-part --bucket amzn-s3-demo-bucket1 --key 'census_data_file' --part-number 2 --body census-part01 --upload-id "cNV6KCSNANFZapz1LUGPC5XwUVi1n6yUoIeSP138sNOKPeMhpKQRrbT9k0ePmgoOTCj9K83T4e2Gb5hQvNoNpCKqyb8m3.oyYgQNZD6FNJLBZluOIUyRE.qM5yhDTdhz" --checksum-algorithm SHA256
  Amazon S3 returns an entity tag (ETag) and additional checksums for each uploaded part as headers in the response.
- Continue using the upload-part command until you have uploaded all the parts of your object.
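Because the part numbers must be consecutive and must follow the order of the part files, it can help to generate the upload-part commands rather than type them by hand. The following Python sketch only builds the argument lists and does not call AWS itself; the helper name build_upload_part_commands is hypothetical, and the flags are the ones used in the commands above.

```python
def build_upload_part_commands(bucket, key, upload_id, part_files):
    """Return one `aws s3api upload-part` argument list per part file.

    `part_files` must already be in object order; part numbers are
    assigned consecutively, starting at 1.
    """
    commands = []
    for part_number, part_file in enumerate(part_files, start=1):
        commands.append([
            "aws", "s3api", "upload-part",
            "--bucket", bucket,
            "--key", key,
            "--part-number", str(part_number),
            "--body", part_file,
            "--upload-id", upload_id,
            "--checksum-algorithm", "SHA256",
        ])
    return commands
```

On a machine where the AWS CLI is installed and configured, each list could be passed to subprocess.run(command, check=True). Sorting the part file names (for example, sorted(glob.glob("census-part*"))) keeps census-part00 mapped to part number 1, census-part01 to part number 2, and so on.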
Step 5: List all the parts of your multipart upload
To complete the multipart upload, you will need a list of all the parts that have been uploaded for that specific multipart upload. The output from the list-parts command provides information such as bucket name, key, upload ID, part number, ETag, additional checksums, and more. It’s helpful to save this output in a file so that you can use it for the next step when completing the multipart upload process. You can create a JSON output file called parts.json by using the following method.
To create a file that lists all of the parts
- To generate a JSON file with the details of all the uploaded parts, use the following list-parts command. Replace amzn-s3-demo-bucket1 with your actual bucket name and <your-upload-id> with the upload ID that you received in Step 3. For more information about the list-parts command, see list-parts in the AWS Command Line Interface User Guide.
  aws s3api list-parts --bucket amzn-s3-demo-bucket1 --key 'census_data_file' --upload-id <your-upload-id> --query '{Parts: Parts[*].{PartNumber: PartNumber, ETag: ETag, ChecksumSHA256: ChecksumSHA256}}' --output json > parts.json
  A new file called parts.json is generated. The file contains the JSON-formatted information for all of your uploaded parts. The parts.json file includes essential information for each part of your multipart upload, such as the part numbers and their corresponding ETag values, which are necessary for completing the multipart upload process.
- Open parts.json by using any text editor or through the terminal. Here’s the example output:
  {
      "Parts": [
          {
              "PartNumber": 1,
              "ETag": "\"3c3097f89e2a2fece47ac54b243c9d97\"",
              "ChecksumSHA256": "fTPVHfyNHdv5VkR4S3EewdyioXECv7JBxN+d4FXYYTw="
          },
          {
              "PartNumber": 2,
              "ETag": "\"03c71cc160261b20ab74f6d2c476b450\"",
              "ChecksumSHA256": "VDWTa8enjOvULBAO3W2a6C+5/7ZnNjrnLApa1QVc3FE="
          },
          {
              "PartNumber": 3,
              "ETag": "\"81ae0937404429a97967dffa7eb4affb\"",
              "ChecksumSHA256": "cVVkXehUlzcwrBrXgPIM+EKQXPUvWist8mlUTCs4bg8="
          }
      ]
  }
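Before completing the upload, you might want to confirm that parts.json lists every part you expect, with consecutive part numbers and a checksum for each. The following Python sketch performs that sanity check on the file contents; the helper name check_parts is hypothetical, and the field names are the ones selected by the --query expression above.

```python
import json


def check_parts(parts_json_text, expected_count):
    """Return a list of problems found in a parts.json document (empty if none)."""
    parts = json.loads(parts_json_text).get("Parts") or []
    problems = []
    numbers = [p.get("PartNumber") for p in parts]
    # Part numbers should be exactly 1, 2, ..., expected_count, in order.
    if numbers != list(range(1, expected_count + 1)):
        problems.append(f"expected part numbers 1..{expected_count}, found {numbers}")
    for p in parts:
        for field in ("ETag", "ChecksumSHA256"):
            if not p.get(field):
                problems.append(f"part {p.get('PartNumber')} is missing {field}")
    return problems
```

For the three-part upload in this tutorial, check_parts(open("parts.json").read(), 3) should return an empty list.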
Step 6: Complete the multipart upload
After uploading all parts of your multipart upload and listing them, the final step is to complete the multipart upload. This step merges all the uploaded parts into a single object in your S3 bucket.
Note
You can calculate the object checksum before calling complete-multipart-upload and supply it by including --checksum-sha256 in your request. If the checksum that you supply doesn't match the checksum that Amazon S3 calculates, Amazon S3 fails the request. For more information, see complete-multipart-upload in the AWS Command Line Interface User Guide.
To complete the multipart upload
To finalize the multipart upload, use the complete-multipart-upload command. This command requires the parts.json file created in Step 5, your bucket name, and the upload ID. Replace amzn-s3-demo-bucket1 with your bucket name and <your-upload-id> with the upload ID that you received in Step 3.
aws s3api complete-multipart-upload --multipart-upload file://parts.json --bucket amzn-s3-demo-bucket1 --key 'census_data_file' --upload-id <your-upload-id>
Here’s the example output:
{
    "ServerSideEncryption": "AES256",
    "Location": "https://amzn-s3-demo-bucket1.s3.us-east-2.amazonaws.com/census_data_file",
    "Bucket": "amzn-s3-demo-bucket1",
    "Key": "census_data_file",
    "ETag": "\"f453c6dccca969c457efdf9b1361e291-3\"",
    "ChecksumSHA256": "aI8EoktCdotjU8Bq46DrPCxQCGuGcPIhJ51noWs6hvk=-3"
}
Note
Don't delete the individual part files yet. You will need the individual parts so that you can perform checksums on them to verify the integrity of the merged-together object.
Step 7: Confirm that the object is uploaded to your bucket
After completing the multipart upload, you can verify that the object has been successfully uploaded to your S3 bucket. To list the objects in your bucket and confirm the presence of your newly uploaded file, use the list-objects-v2 command.
To list the uploaded object
To list the objects in your bucket, use the list-objects-v2 command. Replace amzn-s3-demo-bucket1 with your actual bucket name:
aws s3api list-objects-v2 --bucket amzn-s3-demo-bucket1
This command returns a list of objects in your bucket. Look for your uploaded file (for example, census_data_file) in the list of objects.
For more information, see the Examples section for the list-objects-v2 command in the AWS Command Line Interface User Guide.
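If the bucket holds many objects, scanning the listing by eye can be tedious. As a rough sketch, the following Python helper searches the JSON text printed by list-objects-v2 for a given key. It assumes that the objects appear under a Contents array with a Key field per entry; the helper name find_object is hypothetical.

```python
import json


def find_object(listing_text, key):
    """Return the listing entry for `key`, or None if it is not present."""
    listing = json.loads(listing_text) if listing_text.strip() else {}
    # "Contents" may be absent when the listing contains no objects.
    for entry in listing.get("Contents", []):
        if entry.get("Key") == key:
            return entry
    return None
```

A non-None result for census_data_file indicates that the object is present in the bucket listing.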
Step 8: Verify object integrity with an MD5 checksum
When you upload an object, you can specify a checksum algorithm for Amazon S3 to use. By default, Amazon S3 stores the MD5 digest of bytes as the object’s ETag. For multipart uploads, the ETag is not the checksum for the entire object, but rather a composite of checksums for each individual part.
To verify object integrity by using an MD5 checksum
- To retrieve the ETag of the uploaded object, perform a head-object request:
  aws s3api head-object --bucket amzn-s3-demo-bucket1 --key census_data_file
  Here’s the example output:
  {
      "AcceptRanges": "bytes",
      "LastModified": "2024-07-26T19:04:13+00:00",
      "ContentLength": 15728640,
      "ETag": "\"f453c6dccca969c457efdf9b1361e291-3\"",
      "ContentType": "binary/octet-stream",
      "ServerSideEncryption": "AES256",
      "Metadata": {}
  }
  This ETag has "-3" appended to the end. This indicates that the object was uploaded in three parts using multipart upload.
- Next, calculate the MD5 checksum of each part using the md5sum command. Make sure that you provide the correct path to your part files:
  md5sum census-part*
  Here’s the example output:
  e611693805e812ef37f96c9937605e69 census-part00
  63d2d5da159178785bfd6b6a5c635854 census-part01
  95b87c7db852451bb38b3b44a4e6d310 census-part02
- For this step, manually combine the MD5 hashes into one string, in part order. Then, run the following command to convert the string to binary and calculate the MD5 checksum of the binary value:
  echo "e611693805e812ef37f96c9937605e6963d2d5da159178785bfd6b6a5c63585495b87c7db852451bb38b3b44a4e6d310" | xxd -r -p | md5sum
  Here’s the example output:
  f453c6dccca969c457efdf9b1361e291 -
  This hash value should match the hash portion of the ETag value (the part before "-3") that the head-object command returned at the start of this procedure, which validates the integrity of the census_data_file object.
When you instruct Amazon S3 to use additional checksums, Amazon S3 calculates the checksum value for each part and stores the values. If you want to retrieve the checksum values for individual parts of multipart uploads that are still in progress, you can use list-parts.
For more information about how checksums work with multipart upload objects, see Checking object integrity in Amazon S3.
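The manual procedure above can also be expressed as a short script. The following Python sketch mirrors those steps: it converts each part's MD5 hex digest to binary, concatenates the results, takes the MD5 of the concatenation, and appends the part count. It assumes the ETag is derived from MD5 digests as described in this step; the helper names composite_etag and md5_hex_of_files are hypothetical.

```python
import hashlib


def composite_etag(part_md5_hexes):
    """Combine per-part MD5 hex digests into a multipart-style ETag value."""
    # Equivalent to concatenating the hex strings and running `xxd -r -p`.
    combined = b"".join(bytes.fromhex(h) for h in part_md5_hexes)
    return f"{hashlib.md5(combined).hexdigest()}-{len(part_md5_hexes)}"


def md5_hex_of_files(part_paths):
    """Return the MD5 hex digest of each part file, in the order given."""
    digests = []
    for path in part_paths:
        with open(path, "rb") as f:
            digests.append(hashlib.md5(f.read()).hexdigest())
    return digests
```

For example, composite_etag(md5_hex_of_files(["census-part00", "census-part01", "census-part02"])) yields a string that you can compare with the ETag from head-object after removing the surrounding double quotation marks.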
Step 9: Verify object integrity with an additional checksum
In this step, this tutorial uses SHA-256 as an additional checksum to validate object integrity. If you’ve used a different additional checksum, use that checksum value instead.
To verify object integrity with SHA-256
- Run the following command in your terminal, including the --checksum-mode enabled argument, to display the ChecksumSHA256 value of your object:
  aws s3api head-object --bucket amzn-s3-demo-bucket1 --key census_data_file --checksum-mode enabled
  Here’s the example output:
  {
      "AcceptRanges": "bytes",
      "LastModified": "2024-07-26T19:04:13+00:00",
      "ContentLength": 15728640,
      "ChecksumSHA256": "aI8EoktCdotjU8Bq46DrPCxQCGuGcPIhJ51noWs6hvk=-3",
      "ETag": "\"f453c6dccca969c457efdf9b1361e291-3\"",
      "ContentType": "binary/octet-stream",
      "ServerSideEncryption": "AES256",
      "Metadata": {}
  }
- Use the following commands to decode the base64-encoded ChecksumSHA256 values of the individual parts and append the resulting bytes to a binary file called outfile. These values can be found in your parts.json file. Replace the example base64 strings with your actual ChecksumSHA256 values, in part-number order.
  echo "QLl8R4i4+SaJlrl8ZIcutc5TbZtwt2NwB8lTXkd3GH0=" | base64 --decode >> outfile
  echo "xCdgs1K5Bm4jWETYw/CmGYr+m6O2DcGfpckx5NVokvE=" | base64 --decode >> outfile
  echo "f5wsfsa5bB+yXuwzqG1Bst91uYneqGD3CCidpb54mAo=" | base64 --decode >> outfile
- Run the following command to calculate the SHA256 checksum of the outfile:
  sha256sum outfile
  Here’s the example output:
  688f04a24b42768b6353c06ae3a0eb3c2c50086b8670f221279d67a16b3a86f9 outfile
  In the next step, you convert this hexadecimal hash value into binary and encode it in base64. The result should match the ChecksumSHA256 value from the head-object output earlier in this procedure.
- Convert the SHA256 checksum from the previous step into binary, and then encode it in base64 to verify that it matches the ChecksumSHA256 value from the head-object output:
  echo "688f04a24b42768b6353c06ae3a0eb3c2c50086b8670f221279d67a16b3a86f9" | xxd -r -p | base64
  Here’s the example output:
  aI8EoktCdotjU8Bq46DrPCxQCGuGcPIhJ51noWs6hvk=
  This base64 output should match the ChecksumSHA256 value from the head-object command output, apart from the "-3" suffix that indicates the number of parts. If the values match, then the object is valid.
Important
- When you instruct Amazon S3 to use additional checksums, Amazon S3 calculates the checksum values for each part and stores these values.
- If you want to retrieve the checksum values for individual parts of multipart uploads that are still in progress, you can use the list-parts command.
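The SHA-256 verification in this step can likewise be sketched as a script. The following Python example decodes each part's base64 checksum, hashes the concatenated bytes with SHA-256, and encodes the result in base64 with the part count appended, which is the form shown in the head-object output. The helper names composite_sha256 and hex_to_base64 are hypothetical.

```python
import base64
import hashlib


def composite_sha256(part_checksums_b64):
    """Combine per-part base64 SHA-256 checksums into a composite checksum."""
    # Equivalent to appending each decoded checksum to `outfile`.
    combined = b"".join(base64.b64decode(c) for c in part_checksums_b64)
    digest = hashlib.sha256(combined).digest()
    return f"{base64.b64encode(digest).decode('ascii')}-{len(part_checksums_b64)}"


def hex_to_base64(hex_digest):
    """Convert a hex digest (as printed by sha256sum) to base64."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
```

With the ChecksumSHA256 values from parts.json passed in part-number order, composite_sha256 returns a value that you can compare directly with the ChecksumSHA256 field from head-object, including the "-3" suffix.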
Step 10: Clean up your resources
If you want to clean up the files created in this tutorial, use the following method. For instructions on deleting the files uploaded to your S3 bucket, see Deleting Amazon S3 objects.
Delete the local files created in this tutorial:
To remove the files that you created for your multipart upload, run the following command from your working directory:
rm census-data.bin census-part* outfile parts.json