Transferring specific files or objects by using a manifest
A manifest is a list of files or objects that you want AWS DataSync to transfer. For example, instead of having to transfer everything in an S3 bucket with potentially millions of objects, DataSync transfers only the objects that you list in your manifest.
Manifests are similar to filters but let you identify exactly which files or objects to transfer instead of data that matches a filter pattern.
Creating your manifest
A manifest is a comma-separated values (CSV)-formatted file that lists the files or objects in your source location that you want DataSync to transfer. If your source is an S3 bucket, you can also include which version of an object to transfer.
Guidelines
Use these guidelines to help you create a manifest that works with DataSync.
Example manifests
Use these examples to help you create a manifest that works with DataSync.
- Manifest with full file or object paths
The following example shows a manifest with full file or object paths to transfer.
    photos/picture1.png
    photos/picture2.png
    photos/picture3.png
- Manifest with only object keys
The following example shows a manifest with objects to transfer from an Amazon S3 source location. Because the location is configured with the prefix photos, only the object keys are specified.
    picture1.png
    picture2.png
    picture3.png
- Manifest with object paths and version IDs
The first two entries in the following manifest example include specific Amazon S3 object versions to transfer.
    photos/picture1.png,111111
    photos/picture2.png,121212
    photos/picture3.png
- Manifest with UTF-8 characters
The following example shows a manifest with files that include UTF-8 characters.
    documents/résumé1.pdf
    documents/résumé2.pdf
    documents/résumé3.pdf
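The example manifests above can also be generated programmatically. The following Python sketch is illustrative only: the function name build_manifest and the (path, version ID) tuple shape are assumptions made for this example, not part of DataSync. It writes one entry per line and adds the version ID column only when one is supplied.

```python
import csv
import io
from typing import Iterable, Optional, Tuple


def build_manifest(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Return manifest text with one entry per line.

    Each entry is (path_or_key, version_id). Pass None as the version ID
    to list the file or object without a specific version.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for path, version_id in entries:
        # Omit the second column entirely when no version ID is given.
        row = [path] if version_id is None else [path, version_id]
        writer.writerow(row)
    return buffer.getvalue()


manifest_text = build_manifest([
    ("photos/picture1.png", "111111"),
    ("photos/picture2.png", "121212"),
    ("photos/picture3.png", None),
])
print(manifest_text, end="")
```

When saving the result to a file, writing it with UTF-8 encoding (for example, open(path, "w", encoding="utf-8")) keeps names such as résumé1.pdf intact. How the csv module quotes unusual characters might not match every DataSync guideline, so check the output for keys that contain commas or quotation marks.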
Providing DataSync access to your manifest
You need an AWS Identity and Access Management (IAM) role that gives DataSync access to your manifest in its S3 bucket. This role must include the following permissions:
- s3:GetObject
- s3:GetObjectVersion
You can generate this role automatically in the DataSync console or create the role yourself.
Note
If your manifest is in a different AWS account, you must create this role manually.
When creating or starting a transfer task in the console, DataSync can create an IAM role for you with the s3:GetObject and s3:GetObjectVersion permissions that you need to access your manifest.
- Required permissions to automatically create the role
To automatically create the role, make sure that the role that you're using to access the DataSync console has the following permissions:
    - iam:CreateRole
    - iam:CreatePolicy
    - iam:AttachRolePolicy
You can manually create the IAM role that DataSync needs to access your manifest. The following instructions assume that you're in the same AWS account where you use DataSync and where your manifest's S3 bucket is located.
- Open the IAM console at https://console.aws.amazon.com/iam/.
- In the left navigation pane, under Access management, choose Roles, and then choose Create role.
- On the Select trusted entity page, for Trusted entity type, choose AWS service.
- For Use case, choose DataSync in the dropdown list and select DataSync. Choose Next.
- On the Add permissions page, choose Next. Give your role a name and choose Create role.
- On the Roles page, search for the role that you just created and choose its name.
- On the role's details page, choose the Permissions tab. Choose Add permissions, and then choose Create inline policy.
- Choose the JSON tab and paste the following sample policy into the policy editor:
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DataSyncAccessManifest",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/my-manifest.csv"
        }]
    }
- In the sample policy that you just pasted, replace the following values with your own:
    - Replace amzn-s3-demo-bucket with the name of the S3 bucket that's hosting your manifest.
    - Replace my-manifest.csv with the file name of your manifest.
- Choose Next. Give your policy a name and choose Create policy.
- (Recommended) To prevent the cross-service confused deputy problem, do the following:
    - On the role's details page, choose the Trust relationships tab. Choose Edit trust policy.
    - Update the trust policy by using the following example, which includes the aws:SourceArn and aws:SourceAccount global condition context keys:
        {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {
                    "Service": "datasync.amazonaws.com"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "account-id"
                    },
                    "StringLike": {
                        "aws:SourceArn": "arn:aws:datasync:region:account-id:*"
                    }
                }
            }]
        }
        - Replace each instance of account-id with the AWS account ID where you're using DataSync.
        - Replace region with the AWS Region where you're using DataSync.
    - Choose Update policy.
You've created an IAM role that allows DataSync to access your manifest. Specify this role when creating or starting your task.
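If you prefer to generate the two policy documents rather than edit them by hand, the following Python sketch builds the same JSON from a few inputs. The function names are hypothetical and the account ID, Region, bucket, and file name are placeholder values. The sketch only produces the documents. It doesn't call IAM.

```python
import json


def manifest_access_policy(bucket: str, manifest_key: str) -> dict:
    """Permissions policy that lets the role read one manifest object."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DataSyncAccessManifest",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": f"arn:aws:s3:::{bucket}/{manifest_key}",
        }],
    }


def datasync_trust_policy(account_id: str, region: str) -> dict:
    """Trust policy with the confused deputy condition keys filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "StringLike": {
                    "aws:SourceArn": f"arn:aws:datasync:{region}:{account_id}:*"
                },
            },
        }],
    }


# Placeholder values; replace them with your own.
permissions = manifest_access_policy("amzn-s3-demo-bucket", "my-manifest.csv")
trust = datasync_trust_policy("123456789012", "us-east-1")
print(json.dumps(permissions, indent=4))
print(json.dumps(trust, indent=4))
```

The printed JSON can be pasted into the policy editor and the trust policy editor in the steps above.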
If your manifest is in an S3 bucket that belongs to a different AWS account, you must manually create the IAM role that DataSync uses to access the manifest. Then, in the AWS account where your manifest is located, you need to include the role in the S3 bucket policy.
Creating the role
- Open the IAM console at https://console.aws.amazon.com/iam/.
- In the left navigation pane, under Access management, choose Roles, and then choose Create role.
- On the Select trusted entity page, for Trusted entity type, choose AWS service.
- For Use case, choose DataSync in the dropdown list and select DataSync. Choose Next.
- On the Add permissions page, choose Next. Give your role a name and choose Create role.
- On the Roles page, search for the role that you just created and choose its name.
- On the role's details page, choose the Permissions tab. Choose Add permissions, and then choose Create inline policy.
- Choose the JSON tab and paste the following sample policy into the policy editor:
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DataSyncAccessManifest",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/my-manifest.csv"
        }]
    }
- In the sample policy that you just pasted, replace the following values with your own:
    - Replace amzn-s3-demo-bucket with the name of the S3 bucket that's hosting your manifest.
    - Replace my-manifest.csv with the file name of your manifest.
- Choose Next. Give your policy a name and choose Create policy.
- (Recommended) To prevent the cross-service confused deputy problem, do the following:
    - On the role's details page, choose the Trust relationships tab. Choose Edit trust policy.
    - Update the trust policy by using the following example, which includes the aws:SourceArn and aws:SourceAccount global condition context keys:
        {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {
                    "Service": "datasync.amazonaws.com"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "account-id"
                    },
                    "StringLike": {
                        "aws:SourceArn": "arn:aws:datasync:region:account-id:*"
                    }
                }
            }]
        }
        - Replace each instance of account-id with the AWS account ID where you're using DataSync.
        - Replace region with the AWS Region where you're using DataSync.
    - Choose Update policy.
You created the IAM role that you can include in your S3 bucket policy.
Updating your S3 bucket policy with the role
After you've created the IAM role, you must add it to the S3 bucket policy in the other AWS account where your manifest is located.
- In the AWS Management Console, switch over to the account with your manifest's S3 bucket.
- Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- On the bucket's detail page, choose the Permissions tab.
- Under Bucket policy, choose Edit and do the following to modify your S3 bucket policy:
    - Update what's in the editor to include the following policy statements. Because s3:GetObject and s3:GetObjectVersion are object-level actions, the Resource element points to the manifest object rather than to the bucket itself:
        {
            "Version": "2008-10-17",
            "Statement": [
                {
                    "Sid": "DataSyncAccessManifestBucket",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::account-id:role/datasync-role"
                    },
                    "Action": [
                        "s3:GetObject",
                        "s3:GetObjectVersion"
                    ],
                    "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/my-manifest.csv"
                }
            ]
        }
    - Replace account-id with the AWS account ID for the account that you're using DataSync with.
    - Replace datasync-role with the IAM role that you just created that allows DataSync to access your manifest.
    - Replace amzn-s3-demo-bucket with the name of the S3 bucket that's hosting your manifest in the other AWS account.
    - Replace my-manifest.csv with the file name of your manifest.
- Choose Save changes.
You've created an IAM role that allows DataSync to access your manifest in the other account. Specify this role when creating or starting your task.
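Because the bucket might already have a policy, the statement usually has to be merged into what's there instead of replacing it. The following Python sketch shows one way to do that merge offline. The function names are hypothetical, the account ID and role name are placeholders, and the sketch assumes that the statement's Resource should be the manifest object (s3:GetObject and s3:GetObjectVersion are object-level actions).

```python
import copy


def manifest_bucket_statement(account_id: str, role_name: str,
                              bucket: str, manifest_key: str) -> dict:
    """Statement that lets the cross-account role read the manifest object."""
    return {
        "Sid": "DataSyncAccessManifestBucket",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
        "Action": ["s3:GetObject", "s3:GetObjectVersion"],
        "Resource": f"arn:aws:s3:::{bucket}/{manifest_key}",
    }


def with_statement(bucket_policy: dict, statement: dict) -> dict:
    """Return a copy of bucket_policy that includes statement exactly once."""
    updated = copy.deepcopy(bucket_policy)
    existing = updated.get("Statement", [])
    if isinstance(existing, dict):  # a policy may hold a single statement object
        existing = [existing]
    # Drop any earlier statement with the same Sid so reruns don't duplicate it.
    kept = [s for s in existing if s.get("Sid") != statement["Sid"]]
    updated["Statement"] = kept + [statement]
    updated.setdefault("Version", "2008-10-17")
    return updated


# Placeholder values; replace them with your own.
current_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Sid": "ExistingRule", "Effect": "Deny", "Principal": "*",
                   "Action": "s3:DeleteObject",
                   "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"}],
}
statement = manifest_bucket_statement(
    "123456789012", "datasync-role", "amzn-s3-demo-bucket", "my-manifest.csv")
new_policy = with_statement(current_policy, statement)
```

The merge leaves existing statements untouched and keeps the input policy unmodified, so the result can be reviewed before it's pasted into the bucket policy editor.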
Specifying your manifest when creating a task
You can specify the manifest that you want DataSync to use when creating a task.
- Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
- In the left navigation pane, choose Tasks, and then choose Create task.
- Configure your task's source and destination locations. For more information, see Where can I transfer my data with AWS DataSync?
- For Contents to scan, choose Specific files, objects, and folders, then select Using a manifest.
- For S3 URI, choose your manifest that's hosted on an S3 bucket. Alternatively, you can enter the URI (for example, s3://bucket/prefix/my-manifest.csv).
- For Object version, choose the version of the manifest that you want DataSync to use. By default, DataSync uses the latest version of the object.
- For Manifest access role, do one of the following:
    - Choose Autogenerate for DataSync to automatically create an IAM role with the permissions required to access your manifest in its S3 bucket.
    - Choose an existing IAM role that can access your manifest. For more information, see Providing DataSync access to your manifest.
- Configure any other task settings you need, then choose Next.
- Choose Create task.
- Copy the following create-task command. The JSON value for --manifest-config is wrapped in single quotes so that the shell passes it as one argument:
    aws datasync create-task \
      --source-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-12345678abcdefgh \
      --destination-location-arn arn:aws:datasync:us-east-1:123456789012:location/loc-abcdefgh12345678 \
      --manifest-config '{
        "Source": {
          "S3": {
            "ManifestObjectPath": "s3-object-key-of-manifest",
            "BucketAccessRoleArn": "bucket-iam-role",
            "S3BucketArn": "amzn-s3-demo-bucket-arn",
            "ManifestObjectVersionId": "manifest-version-to-use"
          }
        }
      }'
- For the --source-location-arn parameter, specify the Amazon Resource Name (ARN) of the location that you're transferring data from.
- For the --destination-location-arn parameter, specify the ARN of the location that you're transferring data to.
- For the --manifest-config parameter, do the following:
    - ManifestObjectPath – Specify the S3 object key of your manifest.
    - BucketAccessRoleArn – Specify the IAM role that allows DataSync to access your manifest in its S3 bucket. For more information, see Providing DataSync access to your manifest.
    - S3BucketArn – Specify the ARN of the S3 bucket that's hosting your manifest.
    - ManifestObjectVersionId – Specify the version of the manifest that you want DataSync to use. By default, DataSync uses the latest version of the object.
- Run the create-task command to create your task.
When you're ready, you can start your transfer task.
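The console accepts an S3 URI, while the --manifest-config parameter expects a bucket ARN and an object key. The following Python sketch shows one way to derive the second form from the first. The function name manifest_config and the role ARN are hypothetical, and the sketch assumes a bucket in the standard aws partition.

```python
import json
from typing import Optional


def manifest_config(s3_uri: str, role_arn: str,
                    version_id: Optional[str] = None) -> dict:
    """Build the --manifest-config structure from an s3://bucket/key URI."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI must include a bucket and an object key: {s3_uri!r}")
    s3 = {
        "ManifestObjectPath": key,
        "BucketAccessRoleArn": role_arn,
        "S3BucketArn": f"arn:aws:s3:::{bucket}",
    }
    # Leave the version out to use the latest version of the manifest.
    if version_id is not None:
        s3["ManifestObjectVersionId"] = version_id
    return {"Source": {"S3": s3}}


# Placeholder values; replace them with your own.
config = manifest_config(
    "s3://amzn-s3-demo-bucket/prefix/my-manifest.csv",
    "arn:aws:iam::123456789012:role/datasync-role",
)
print(json.dumps(config, indent=2))
```

The printed JSON can be supplied, inside single quotes, as the value of --manifest-config.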
Specifying your manifest when starting a task
You can specify the manifest that you want DataSync to use when executing a task.
- Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
- In the left navigation pane, choose Tasks, and then choose the task that you want to start.
- On the task overview page, choose Start, and then choose Start with overriding options.
- For Contents to scan, choose Specific files, objects, and folders, then select Using a manifest.
- For S3 URI, choose your manifest that's hosted on an S3 bucket. Alternatively, you can enter the URI (for example, s3://bucket/prefix/my-manifest.csv).
- For Object version, choose the version of the manifest that you want DataSync to use. By default, DataSync uses the latest version of the object.
- For Manifest access role, do one of the following:
    - Choose Autogenerate for DataSync to automatically create an IAM role to access your manifest in its S3 bucket.
    - Choose an existing IAM role that can access your manifest. For more information, see Providing DataSync access to your manifest.
- Choose Start to begin your transfer.
- Copy the following start-task-execution command. The JSON value for --manifest-config is wrapped in single quotes so that the shell passes it as one argument:
    aws datasync start-task-execution \
      --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-12345678abcdefgh \
      --manifest-config '{
        "Source": {
          "S3": {
            "ManifestObjectPath": "s3-object-key-of-manifest",
            "BucketAccessRoleArn": "bucket-iam-role",
            "S3BucketArn": "amzn-s3-demo-bucket-arn",
            "ManifestObjectVersionId": "manifest-version-to-use"
          }
        }
      }'
- For the --task-arn parameter, specify the Amazon Resource Name (ARN) of the task that you're starting.
- For the --manifest-config parameter, do the following:
    - ManifestObjectPath – Specify the S3 object key of your manifest.
    - BucketAccessRoleArn – Specify the IAM role that allows DataSync to access your manifest in its S3 bucket. For more information, see Providing DataSync access to your manifest.
    - S3BucketArn – Specify the ARN of the S3 bucket that's hosting your manifest.
    - ManifestObjectVersionId – Specify the version of the manifest that you want DataSync to use. By default, DataSync uses the latest version of the object.
- Run the start-task-execution command to begin your transfer.
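When the command is assembled by a script, passing the JSON as a single list element avoids shell quoting problems altogether. The following Python sketch builds the argument list for start-task-execution. The function name is hypothetical and every ARN is a placeholder. The sketch only prints the command. It doesn't run it.

```python
import json
import shlex
from typing import List


def start_task_execution_args(task_arn: str, manifest_config: dict) -> List[str]:
    """Return the AWS CLI argument list for starting a task with a manifest."""
    return [
        "aws", "datasync", "start-task-execution",
        "--task-arn", task_arn,
        # The whole JSON document travels as one argument.
        "--manifest-config", json.dumps(manifest_config),
    ]


# Placeholder values; replace them with your own.
config = {
    "Source": {
        "S3": {
            "ManifestObjectPath": "prefix/my-manifest.csv",
            "BucketAccessRoleArn": "arn:aws:iam::123456789012:role/datasync-role",
            "S3BucketArn": "arn:aws:s3:::amzn-s3-demo-bucket",
        }
    }
}
args = start_task_execution_args(
    "arn:aws:datasync:us-east-1:123456789012:task/task-12345678abcdefgh", config)

# shlex.join shows the command as it would need to be quoted in a POSIX shell.
print(shlex.join(args))
```

To run the command from the script, the list could be handed to subprocess.run(args, check=True), which bypasses the shell.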
Limitations
- You can't use a manifest together with filters.
- You can't specify only a directory or folder with the intention of transferring all of its contents. For these situations, consider using an include filter instead of a manifest.
- You can't use the Keep deleted files task option (PreserveDeletedFiles in the API) to maintain files or objects in the destination that aren't in the source. DataSync only transfers what's listed in your manifest and doesn't delete anything in the destination.
Troubleshooting
If you're transferring objects with specific version IDs from an S3 bucket, you might see an error related to HeadObject or GetObjectTagging. For example, here's an error related to GetObjectTagging:
    [WARN] Failed to read metadata for file /picture1.png (versionId:111111): S3 Get Object Tagging Failed
    [ERROR] S3 Exception: op=GetObjectTagging photos/picture1.png, code=403, type=15, exception=AccessDenied, msg=Access Denied req-hdrs: content-type=application/xml, x-amz-api-version=2006-03-01 rsp-hdrs: content-type=application/xml, date=Wed, 07 Feb 2024 20:16:14 GMT, server=AmazonS3, transfer-encoding=chunked, x-amz-id-2=IOWQ4fDEXAMPLEQM+ey7N9WgVhSnQ6JEXAMPLEZb7hSQDASK+Jd1vEXAMPLEa3Km, x-amz-request-id=79104EXAMPLEB723
If you see either of these errors, validate that the IAM role that DataSync uses to access your S3 source location has the following permissions:
- s3:GetObjectVersion
- s3:GetObjectVersionTagging
If you need to update your role with these permissions, see Creating an IAM role for DataSync to access your Amazon S3 location.
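As a quick first check, you can scan a copy of the role's policy document for the two actions. The following Python sketch is a rough aid only: the function name missing_actions is hypothetical, and the check looks only at Allow statements and their Action lists (including wildcards). It ignores Deny statements, Resource, and Condition elements, so it isn't a substitute for IAM policy evaluation.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

REQUIRED_ACTIONS = ("s3:GetObjectVersion", "s3:GetObjectVersionTagging")


def missing_actions(policy: dict,
                    required: Iterable[str] = REQUIRED_ACTIONS) -> List[str]:
    """Return the required actions that no Allow statement appears to grant."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    allowed: List[str] = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.extend(actions)
    # Action names are compared without regard to case; "*" acts as a wildcard.
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), pattern.lower()) for pattern in allowed)
    ]


# Placeholder policy for illustration.
role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:GetObjectVersion"],
        "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
    }],
}
print(missing_actions(role_policy))
```

An empty list means that both actions appear in an Allow statement. Any names that are returned are candidates to add to the role.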
Next steps
If you haven't already, start your task. Otherwise, monitor your task's activity.