Checking object integrity in Amazon S3
Amazon S3 uses checksum values to verify the integrity of data that you upload or download. In addition, you can request that another checksum value be calculated for any object that you store in Amazon S3. You can choose a checksum algorithm to use when uploading, copying, or batch copying your data.
When you upload your data, Amazon S3 uses the algorithm that you've chosen to compute a checksum on the server side and validates it against the value that you provided before storing the object. Amazon S3 then stores the checksum as part of the object metadata. This validation works consistently across encryption modes, object sizes, and storage classes for both single part and multipart uploads. When you copy or batch copy your data, however, Amazon S3 calculates the checksum on the source object and carries it over to the destination object.
Note
When you perform a single part or multipart upload, you can optionally include a precalculated checksum as part of your request, and use the full object checksum type. To use precalculated values with multiple objects, use the AWS CLI or AWS SDKs.
Using supported checksum algorithms
With Amazon S3, you can choose a checksum algorithm to validate your data during uploads. The specified checksum algorithm is then stored with your object and can be used to validate data integrity during downloads. You can choose one of the following Secure Hash Algorithms (SHA) or Cyclic Redundancy Check (CRC) checksum algorithms to calculate the checksum value:
- CRC-64NVME (Recommended)
- CRC-32
- CRC-32C
- SHA-1
- SHA-256
Additionally, you can provide a checksum with each request using the Content-MD5 header.
When you upload an object, you specify the algorithm that you want to use:
- When you use the AWS Management Console, choose the checksum algorithm that you want to use. You can optionally specify the checksum value of the object. When Amazon S3 receives the object, it calculates the checksum by using the algorithm that you specified. If the two checksum values don't match, Amazon S3 generates an error.
- When you use an SDK, be aware of the following:
  - Set the ChecksumAlgorithm parameter to the algorithm that you want Amazon S3 to use. If you already have a precalculated checksum, you pass the checksum value to the AWS SDK, and the SDK includes the value in the request. If you don’t pass a checksum value or don’t specify a checksum algorithm, the SDK automatically calculates a checksum value for you and includes it with the request to provide integrity protections. If the checksum value in the request doesn't match the value that Amazon S3 calculates with the specified algorithm, Amazon S3 fails the request with a BadDigest error.
  - If you’re using an upgraded AWS SDK, the SDK chooses a checksum algorithm for you. However, you can override this checksum algorithm.
  - If you don’t specify a checksum algorithm and the SDK also doesn’t calculate a checksum for you, then S3 automatically chooses the CRC-64NVME checksum algorithm.
- When you use the REST API, don't use the x-amz-sdk-checksum-algorithm parameter. Instead, use one of the algorithm-specific headers (for example, x-amz-checksum-crc32).
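As a rough illustration of what a precalculated checksum value can look like, the following Python sketch computes values suitable for an algorithm-specific header. It assumes that the header carries the Base64 encoding of the raw checksum bytes (big-endian for CRC-32). The function names are illustrative and aren't part of any AWS SDK.

```python
import base64
import hashlib
import zlib


def crc32_header_value(data: bytes) -> str:
    """Base64-encode the 4-byte big-endian CRC-32 of data (assumed header encoding)."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")


def sha256_header_value(data: bytes) -> str:
    """Base64-encode the raw 32-byte SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


body = b"hello world"
# Header name taken from the REST API guidance above; the value encoding is an assumption.
headers = {"x-amz-checksum-crc32": crc32_header_value(body)}
```

Computing the value on the client before sending the request lets Amazon S3 compare its own calculation against yours and reject the upload if the data changed in transit.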
To apply any of these checksum values to objects that are already uploaded to Amazon S3, you can copy the object and specify whether you want to use the existing checksum algorithm or a new one. If you don’t specify an algorithm, S3 uses the existing algorithm. If the source object doesn’t have a specified checksum algorithm or checksum value, Amazon S3 uses the CRC-64NVME algorithm to calculate the checksum value for the destination object. You can also specify a checksum algorithm when copying objects using S3 Batch Operations.
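The precedence described above for copy operations can be summarized in a few lines of Python. This is only a sketch of the decision order as stated in the text, not Amazon S3 code. The function name and the string labels are hypothetical.

```python
from typing import Optional

# Fallback named in the text for sources that have no checksum algorithm.
DEFAULT_ALGORITHM = "CRC-64NVME"


def destination_algorithm(requested: Optional[str], source: Optional[str]) -> str:
    """Pick the checksum algorithm for the destination object of a copy."""
    if requested:   # an algorithm specified on the copy request wins
        return requested
    if source:      # otherwise the source object's existing algorithm is reused
        return source
    return DEFAULT_ALGORITHM
```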
Important
If you use a multipart upload with composite (or part-level) checksums, the multipart upload part numbers must be consecutive. If you try to complete a multipart upload request with nonconsecutive part numbers, Amazon S3 generates an HTTP 500 Internal Server error.
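Because a nonconsecutive part list fails only when you complete the upload, it can help to check the part numbers on the client first. The following sketch treats "consecutive" as "no gaps and no duplicates" and doesn't require a particular starting number. Both points are assumptions, and the helper name is illustrative.

```python
from typing import Iterable


def part_numbers_are_consecutive(part_numbers: Iterable[int]) -> bool:
    """Return True if the part numbers form a gap-free, duplicate-free run."""
    ordered = sorted(part_numbers)
    if not ordered:
        return False
    # Every neighbor must differ by exactly 1; this rejects gaps and duplicates.
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))
```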
Full object and composite checksum types
In Amazon S3, there are two types of supported checksums:
- Full object checksums: A full object checksum is calculated based on all of the content of a multipart upload, covering all data from the first byte of the first part to the last byte of the last part.

  Note
  All PUT requests require a full object checksum type.

- Composite checksums: A composite checksum is calculated based on the individual checksums of each part in a multipart upload. Instead of computing a checksum based on all of the data content, this approach aggregates the part-level checksums (from the first part to the last) to produce a single, combined checksum for the complete object.

  Note
  When an object is uploaded as a multipart upload, the entity tag (ETag) for the object is not an MD5 digest of the entire object. Instead, Amazon S3 calculates the MD5 digest of each individual part as it is uploaded. The MD5 digests are used to determine the ETag for the final object. Amazon S3 concatenates the bytes for the MD5 digests together and then calculates the MD5 digest of these concatenated values. During the final ETag creation step, Amazon S3 adds a dash with the total number of parts to the end.
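The ETag construction described in the preceding note can be reproduced locally. The following sketch assumes that the final digest is rendered as lowercase hexadecimal, which the note doesn't state explicitly. The function name is illustrative.

```python
import hashlib
from typing import Sequence


def multipart_etag(parts: Sequence[bytes]) -> str:
    """Compute a multipart-style ETag: MD5 of the concatenated part MD5s, plus '-N'."""
    # MD5 digest (raw bytes) of each part, in upload order.
    digests = [hashlib.md5(part).digest() for part in parts]
    # MD5 of the concatenated digests, then a dash and the number of parts.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(parts)}"
```

Because the result depends on where the part boundaries fall, the same data uploaded with different part sizes produces a different ETag.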
Amazon S3 supports the following full object and composite checksum algorithm types:
- CRC-64NVME: Supports the full object algorithm type only.
- CRC-32: Supports both full object and composite algorithm types.
- CRC-32C: Supports both full object and composite algorithm types.
- SHA-1: Supports both full object and composite algorithm types.
- SHA-256: Supports both full object and composite algorithm types.
Single part uploads
Checksums of objects that are uploaded in a single part (using PutObject) are treated as full object checksums.
When you upload an object in the Amazon S3 console, you can choose the checksum
algorithm that you want S3 to use and also (optionally) provide a precomputed
value. Amazon S3 then validates this checksum before storing the object and its
checksum value. You can verify an object's data integrity when you request the
checksum value during object downloads.
Multipart uploads
When you upload the object in multiple parts using the MultipartUpload API, you can specify the checksum algorithm that you want Amazon S3 to use and the checksum type (full object or composite).
The following table indicates which checksum algorithm type is supported for each checksum algorithm in a multipart upload:
Checksum algorithm | Full object | Composite |
---|---|---|
CRC-64NVME | Yes | No |
CRC-32 | Yes | Yes |
CRC-32C | Yes | Yes |
SHA-1 | No | Yes |
SHA-256 | No | Yes |
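If your tooling needs to reject unsupported combinations before it starts a multipart upload, the table above can be encoded as a small lookup. The dictionary and function names below are hypothetical, and the string labels simply mirror the table headings.

```python
# Supported checksum types per algorithm for multipart uploads, copied from the table.
MULTIPART_SUPPORT = {
    "CRC-64NVME": {"full object"},
    "CRC-32": {"full object", "composite"},
    "CRC-32C": {"full object", "composite"},
    "SHA-1": {"composite"},
    "SHA-256": {"composite"},
}


def is_supported_for_multipart(algorithm: str, checksum_type: str) -> bool:
    """Return True if the algorithm/type pair is marked 'Yes' in the table."""
    return checksum_type in MULTIPART_SUPPORT.get(algorithm, set())
```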
Using full object checksums for multipart upload
When creating or performing a multipart upload, you can use full object checksums for validation on upload. This means that you can provide the checksum algorithm for the MultipartUpload API, simplifying your integrity validation tooling because you no longer need to track part boundaries for uploaded objects. You can provide the checksum of the whole object in the CompleteMultipartUpload request, along with the object size.
When you provide a full object checksum during a multipart upload, the AWS SDK passes the checksum to Amazon S3, and S3 validates the object integrity server-side, comparing it to the received value. Then, Amazon S3 stores the object if the values match. If the two values don’t match, S3 fails the request with a BadDigest error. The checksum of your object is also stored in object metadata that you can use later to validate an object's data integrity.
For full object checksums, you can use CRC-64NVME, CRC-32, or CRC-32C checksum algorithms in S3. Full object checksums in multipart uploads are only available for CRC-based checksums because they can linearize into a full object checksum. This linearization allows Amazon S3 to parallelize your requests for improved performance. In particular, S3 can compute the checksum of the whole object from the part-level checksums. This type of validation isn’t available for other algorithms, such as SHA and MD5. Because S3 has default integrity protections, if objects are uploaded without a checksum, S3 automatically attaches the recommended full object CRC-64NVME checksum algorithm to the object.
Note
To initiate the multipart upload, you can specify the checksum algorithm and the full object checksum type. After you specify the checksum algorithm and the full object checksum type, you can provide the full object checksum value for the multipart upload.
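The linearization property mentioned above can be demonstrated with CRC-32 and the Python standard library. The following sketch derives the CRC-32 of two concatenated parts from their individual CRC-32 values and the length of the second part, without rereading the data. It is a simple, linear-time illustration of the idea, not how Amazon S3 implements it. Production implementations typically use a logarithmic-time combine, and the function name here is illustrative.

```python
import zlib

_MASK = 0xFFFFFFFF
_ZERO_BLOCK = bytes(64 * 1024)  # reusable buffer of zero bytes


def crc32_combine(crc_a: int, crc_b: int, len_b: int) -> int:
    """Return CRC-32(A + B) given CRC-32(A), CRC-32(B), and len(B)."""
    # Advance A's CRC register through len_b zero bytes. Un-doing zlib's
    # initial XOR lets crc_a be used directly as the register content.
    state = crc_a ^ _MASK
    remaining = len_b
    while remaining > 0:
        step = min(remaining, len(_ZERO_BLOCK))
        state = zlib.crc32(_ZERO_BLOCK[:step], state)
        remaining -= step
    # Undo zlib's final XOR, then fold in B's CRC (CRC is linear over XOR).
    return (state ^ _MASK ^ crc_b) & _MASK


part_a, part_b = b"first part ", b"second part"
combined = crc32_combine(zlib.crc32(part_a), zlib.crc32(part_b), len(part_b))
```

Cryptographic hashes such as SHA-256 have no equivalent combine step, which is consistent with full object checksums for multipart uploads being limited to the CRC algorithms.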
Using part-level checksums for multipart upload
When objects are uploaded to Amazon S3, they can be uploaded as a single object or uploaded in parts with the multipart upload process. You can choose a checksum type for your multipart upload. For multipart upload part-level checksums (or composite checksums), Amazon S3 calculates the checksum for each individual part by using the specified checksum algorithm. You can use UploadPart to provide the checksum values for each part.
If the object that you try to upload in the Amazon S3 console is set to use the CRC-64NVME checksum algorithm and exceeds 16 MB, it is automatically designated as a full object checksum.
Amazon S3 then uses the stored part-level checksum values to confirm that each part is uploaded correctly. When each part’s checksum (for the whole object) is provided, S3 uses the stored checksum values of each part to calculate the full object checksum internally, comparing it with the provided checksum value. This minimizes compute costs since S3 can compute a checksum of the whole object using the checksum of the parts. For more information about multipart uploads, see Uploading and copying objects using multipart upload in Amazon S3 and Using full object checksums for multipart upload.
When the object is completely uploaded, you can use the final calculated checksum to verify the data integrity of the object.
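To make the idea of a composite checksum concrete, the following sketch builds a "checksum of checksums" with SHA-256. The exact aggregation is an assumption: it mirrors the multipart ETag scheme described earlier (hash the concatenated raw part digests, then append a dash and the part count). Confirm the details against the Amazon S3 API reference before relying on it. The function name is illustrative.

```python
import base64
import hashlib
from typing import Sequence


def composite_sha256(parts: Sequence[bytes]) -> str:
    """Sketch of a composite checksum: SHA-256 over the part-level SHA-256 digests."""
    # Part-level checksums, in part order.
    part_digests = [hashlib.sha256(part).digest() for part in parts]
    # Assumed aggregation: hash the concatenated digests and suffix the part count.
    combined = hashlib.sha256(b"".join(part_digests)).digest()
    return f"{base64.b64encode(combined).decode('ascii')}-{len(parts)}"
```

A composite value computed this way differs from a direct SHA-256 of the same bytes, so compare it only with a value that was produced with the same part boundaries.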
When uploading a part of the multipart upload, be aware of the following:
- To retrieve information about the object, including how many parts make up the entire object, you can use the GetObjectAttributes operation. With additional checksums, you can also recover information for each individual part that includes the part's checksum value.
- For completed uploads, you can get an individual part's checksum by using the GetObject or HeadObject operations and specifying a part number or byte range that aligns with a single part. If you want to retrieve the checksum values for individual parts of multipart uploads that are still in progress, you can use ListParts.
- Because of how Amazon S3 calculates the checksum for multipart objects, the checksum value for the object might change if you copy it. If you're using an SDK or the REST API and you call CopyObject, Amazon S3 copies any object up to the size limitations of the CopyObject API operation. Amazon S3 does this copy as a single action, regardless of whether the object was uploaded in a single request or as part of a multipart upload. With a copy command, the checksum of the object is a direct checksum of the full object. If the object was originally uploaded using a multipart upload, the checksum value changes even though the data doesn't.
- Objects that are larger than the size limitations of the CopyObject API operation must use multipart upload copy commands.
- When you perform some operations using the AWS Management Console, Amazon S3 uses a multipart upload if the object is greater than 16 MB in size.
Checksum operations
After uploading objects, you can get the checksum value and compare it to a precomputed or previously stored checksum value of the same algorithm type. The following examples show you which checksum operations or methods you can use to verify data integrity.
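As one way to perform that comparison on the client, the following sketch recomputes a SHA-256 checksum over a downloaded stream in fixed-size chunks and compares it with a previously stored Base64 value. It assumes that the stored value is a full object checksum encoded as the Base64 of the raw digest. The function name and the chunk size are illustrative.

```python
import base64
import hashlib
import hmac
from typing import BinaryIO


def stream_matches_sha256(stream: BinaryIO, expected_b64: str,
                          chunk_size: int = 1024 * 1024) -> bool:
    """Hash a binary stream in chunks and compare with a stored Base64 SHA-256."""
    digest = hashlib.sha256()
    # Read until the stream returns an empty bytes object (end of data).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    actual = base64.b64encode(digest.digest()).decode("ascii")
    return hmac.compare_digest(actual, expected_b64)
```

Reading in chunks keeps memory use flat for large objects. You can pass any binary file-like object, such as a file opened with `open(path, "rb")`.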
To learn more about using the console and specifying checksum algorithms to use when uploading objects, see Uploading objects and Tutorial: Checking the integrity of data in Amazon S3 with additional checksums.
The following example shows how you can use the AWS SDKs to upload a large file with multipart upload, download a large file, and validate a multipart upload file, all by using SHA-256 for file validation.
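A minimal Python sketch of the upload-then-verify flow with SHA-256 follows. It covers only the single-request case, not multipart upload. It assumes a client object that exposes put_object and head_object with the keyword arguments shown, as the boto3 S3 client does to the best of our knowledge. Check the parameter names against your SDK's reference. The function name is illustrative.

```python
import base64
import hashlib


def put_and_verify_sha256(s3, bucket: str, key: str, body: bytes) -> bool:
    """Upload body with a precalculated SHA-256 and confirm the stored checksum."""
    expected = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    # The service is expected to reject the request if its own calculation differs.
    s3.put_object(Bucket=bucket, Key=key, Body=body, ChecksumSHA256=expected)
    # Ask for the stored checksum to be included in the response.
    head = s3.head_object(Bucket=bucket, Key=key, ChecksumMode="ENABLED")
    return head.get("ChecksumSHA256") == expected
```

With boto3 installed and credentials configured, you might call it as `put_and_verify_sha256(boto3.client("s3"), "amzn-s3-demo-bucket", "example.txt", data)`, where the bucket and key names are placeholders.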
You can send REST requests to upload an object with a checksum value to verify the integrity of the data with PutObject. You can also retrieve the checksum value for objects using GetObject or HeadObject.
You can send a PUT request to upload an object of up to 5 GB in a single operation. For more information, see PutObject in the AWS CLI Command Reference. You can also use get-object and head-object to retrieve the checksum of an already-uploaded object to verify the integrity of the data.
For information, see Amazon S3 CLI FAQ in the AWS Command Line Interface User Guide.
Using Content-MD5 when uploading objects
Another way to verify the integrity of your object after uploading is to provide an MD5 digest of the object when you upload it. If you calculate the MD5 digest for your object, you can provide the digest with the PUT command by using the Content-MD5 header.
After uploading the object, Amazon S3 calculates the MD5 digest of the object and compares it to the value that you provided. The request succeeds only if the two digests match.
Supplying an MD5 digest isn't required, but you can use it to verify the integrity of the object as part of the upload process.
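The Content-MD5 header value is the Base64 encoding of the raw 16-byte MD5 digest, not the hexadecimal string. The following sketch computes it in Python. The function name is illustrative.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the Content-MD5 header value for data (Base64 of the raw MD5 digest)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
```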
Using Content-MD5 and the ETag to verify uploaded objects
The entity tag (ETag) for an object represents a specific version of that object. Keep in mind that the ETag only reflects changes to the content of an object, not changes to its metadata. If only the metadata of an object changes, the ETag remains the same.
Depending on the object, the ETag of the object might be an MD5 digest of the object data:
- If an object is created by the PutObject, PostObject, or CopyObject operation, or through the AWS Management Console, and that object is also plaintext or encrypted by server-side encryption with Amazon S3 managed keys (SSE-S3), that object has an ETag that is an MD5 digest of its object data.
- If an object is created by the PutObject, PostObject, or CopyObject operation, or through the AWS Management Console, and that object is encrypted by server-side encryption with customer-provided keys (SSE-C) or server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS), that object has an ETag that is not an MD5 digest of its object data.
- If an object is created by either the multipart upload process or the UploadPartCopy operation, the object's ETag is not an MD5 digest, regardless of the method of encryption. If an object is larger than 16 MB, the AWS Management Console uploads or copies that object as a multipart upload, and therefore the ETag isn't an MD5 digest.
For objects where the ETag is the Content-MD5 digest of the object, you can compare the ETag value of the object with a calculated or previously stored Content-MD5 digest.
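The following sketch shows one way to make that comparison. It assumes that the ETag arrives as a hexadecimal string, possibly wrapped in double quotation marks, and it treats a value with a dash as a multipart ETag that can't be compared with an MD5 digest. An SSE-C or SSE-KMS object's ETag can't be identified from its format alone, so a False result for such an object doesn't indicate corruption. The function name is illustrative.

```python
import hashlib
from typing import Optional


def etag_matches_md5(etag: str, data: bytes) -> Optional[bool]:
    """Compare an ETag with the MD5 of data; return None for multipart-style ETags."""
    value = etag.strip('"')
    if "-" in value:
        # A dash-and-part-count suffix means the ETag isn't a plain MD5 digest.
        return None
    return value.lower() == hashlib.md5(data).hexdigest()
```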
Using trailing checksums
When uploading objects to Amazon S3, you can either provide a precalculated checksum for the object or use an AWS SDK to automatically create trailing checksums on your behalf. If you decide to use a trailing checksum, Amazon S3 automatically generates the checksum by using your specified algorithm to validate the integrity of the object during an object upload.
To create a trailing checksum when using an AWS SDK, populate the ChecksumAlgorithm parameter with your preferred algorithm. The SDK uses that algorithm to calculate the checksum for your object (or object parts) and automatically appends it to the end of your upload request. This behavior saves you time because Amazon S3 performs both the verification and upload of your data in a single pass.
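The single-pass idea behind a trailing checksum can be illustrated without an SDK. In the following conceptual sketch, a wrapper updates a running CRC-32 as each chunk is handed to the sender, so the checksum is ready to append after the last chunk and the data is read only once. It doesn't reproduce the wire format that the AWS SDKs use. The class name and the Base64 big-endian encoding of the trailer value are assumptions.

```python
import base64
import zlib
from typing import Iterable, Iterator


class ChecksummingStream:
    """Yield chunks unchanged while accumulating a CRC-32 over everything yielded."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = chunks
        self._crc = 0

    def __iter__(self) -> Iterator[bytes]:
        for chunk in self._chunks:
            # Update the running checksum as the chunk passes through.
            self._crc = zlib.crc32(chunk, self._crc)
            yield chunk

    def trailer_value(self) -> str:
        """Checksum of all chunks yielded so far, for appending after the body."""
        return base64.b64encode(self._crc.to_bytes(4, "big")).decode("ascii")
```

A sender would iterate over the stream to transmit the body and then call `trailer_value()` to obtain the value to append.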
Important
If you're using S3 Object Lambda, all requests to S3 Object Lambda are signed using s3-object-lambda instead of s3. This behavior affects the signature of trailing checksum values. For more information about S3 Object Lambda, see Transforming objects with S3 Object Lambda.