Transferring specific files, objects, and folders by using filters
AWS DataSync lets you apply filters to include or exclude data from your source location in a transfer. For example, if you don't want to transfer temporary files that end with .tmp, you can create an exclude filter so that these files don't make their way to your destination location.
You can use a combination of exclude and include filters in the same transfer task. If you modify a task's filters, those changes are applied the next time you run the task.
Filtering terms, definitions, and syntax
Familiarize yourself with the concepts related to DataSync filtering:
- Filter
  The whole string that makes up a particular filter (for example, *.tmp|*.temp or /folderA|/folderB). Filters are made up of patterns delimited by using a pipe (|). You don't need a delimiter when you add patterns in the DataSync console because you add each pattern separately.
  Note
  Filters are case sensitive. For example, the filter /folderA won't match /FolderA.
- Pattern
  A pattern within a filter. For example, *.tmp is a pattern that's part of the *.tmp|*.temp filter. If your filter has multiple patterns, you delimit each pattern by using a pipe (|).
- Folders
  - All filters are relative to the source location path. For example, suppose that you specify /my_source/ as the source path when you create your source location and task, and you specify the include filter /transfer_this/. In this case, DataSync transfers only the directory /my_source/transfer_this/ and its contents.
  - To specify a folder directly under the source location, include a forward slash (/) in front of the folder name. In the preceding example, the pattern uses /transfer_this, not transfer_this.
  - DataSync interprets the following patterns the same way and matches both the folder and its content.
    /dir
    /dir/
  - When you are transferring data from or to an Amazon S3 bucket, DataSync treats the / character in the object key as the equivalent of a folder on a file system.
- Special characters
  Following are special characters for use with filtering.
  - * (wildcard): A character used to match zero or more characters. For example, /movies_folder* matches both /movies_folder and /movies_folder1.
  - | (pipe delimiter): A character used as a delimiter between patterns. It enables specifying multiple patterns, any of which can match the filter. For example, *.tmp|*.temp matches files ending with either tmp or temp.
    Note
    This delimiter isn't needed when you add patterns on the console because you add each pattern on a separate line.
  - \ (backslash): A character used for escaping special characters (*, |, \) in a file or object name.
    A double backslash (\\) is required when a backslash is part of a file name. Similarly, \\\\ represents two consecutive backslashes in a file name.
    A backslash followed by a pipe (\|) is required when a pipe is part of a file name.
    A backslash (\) followed by any other character, or at the end of a pattern, is ignored.
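The escaping rules above can be sketched as a small shell helper. This is only an illustration: the function name escape_literal is hypothetical and not part of DataSync or the AWS CLI, and it assumes sed is available. It places a backslash in front of each *, |, and \ in a literal file or object name so that the name can be used as a pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataSync or the AWS CLI):
# put a backslash in front of each *, | and \ found in a literal name.
escape_literal() {
  printf '%s' "$1" | sed -e 's/[\\|*]/\\&/g'
}

# A file name that contains a literal pipe and a literal asterisk.
escape_literal 'draft|v2*.txt'
# prints draft\|v2\*.txt
```

Pass the name in single quotation marks so that the shell doesn't interpret the special characters before the helper sees them.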
Example filters
The following examples show common filters you can use with DataSync.
Note
There are limits to how many characters you can use in a filter. For more information, see DataSync quotas.
Exclude some folders from your source location
In some cases, you might want to exclude folders in your source location so that they aren't copied to your destination location. For example, if you have temporary work-in-progress folders, you can use something like the following filter:
*/.temp
To exclude folders with similar names (such as /reports2021 and /reports2022), you can use an exclude filter like the following:
/reports*
To exclude folders at any level in the file hierarchy, you can use an exclude filter like the following.
*/folder-to-exclude-1|*/folder-to-exclude-2
To exclude folders at the top level of the source location, you can use an exclude filter like the following.
/top-level-folder-to-exclude-1|/top-level-folder-to-exclude-2
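When the list of folders to exclude is long, you might build the filter string in a script rather than typing it by hand. The following is a minimal sketch under that assumption. The function name any_level_exclude_filter is hypothetical. It prefixes each folder name with */ and joins the resulting patterns with a pipe, producing the same shape as the any-level example above.

```shell
#!/bin/sh
# Hypothetical helper: build an exclude filter that matches each given
# folder name at any level by prefixing it with */ and joining with |.
any_level_exclude_filter() {
  result=''
  for name in "$@"; do
    # Add a pipe before every pattern except the first one.
    result="${result:+$result|}*/$name"
  done
  printf '%s\n' "$result"
}

any_level_exclude_filter folder-to-exclude-1 folder-to-exclude-2
# prints */folder-to-exclude-1|*/folder-to-exclude-2
```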
Include a subset of the folders on your source location
In some cases, your source location might be a large share and you need to transfer a subset of the folders under the root. To include specific folders, start a task execution with an include filter like the following.
/folder-to-transfer/*
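As a rough sketch of what starting a task execution with this include filter could look like from the AWS CLI, the following script prints the command instead of running it. The function name print_start_command is hypothetical, and the task ARN is a placeholder in the same region and account-id style used elsewhere on this page. Replace the placeholder and run the printed command yourself only after checking it against your own task.

```shell
#!/bin/sh
# Hypothetical helper: print (rather than run) a start-task-execution
# command that applies an include filter to a single task execution.
print_start_command() {
  task_arn=$1
  include_filter=$2
  printf "aws datasync start-task-execution --task-arn '%s' --includes FilterType=SIMPLE_PATTERN,Value='%s'\n" \
    "$task_arn" "$include_filter"
}

# The ARN below is a placeholder, not a real resource.
print_start_command 'arn:aws:datasync:region:account-id:task/task-id' '/folder-to-transfer/*'
```

The filter is wrapped in single quotation marks in the printed command so that the shell passes the wildcard to the AWS CLI unchanged.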
Exclude specific file types
To exclude certain file types from the transfer, you can create a task execution with an exclude filter such as *.temp.
Transfer individual files you specify
To transfer a list of individual files, start a task execution with an include filter like the following:
"/folder/subfolder/file1.txt|/folder/subfolder/file2.txt"
Creating include filters
Include filters define the files, objects, and folders that you want DataSync to transfer. You can configure include filters when you create, edit, or start a task.
DataSync scans and transfers only files and folders that match the include filters.
For example, to include a subset of your source folders, you might specify /important_folder_1|/important_folder_2.
Note
Include filters support the wildcard (*) character only as the rightmost character in a pattern. For example, /documents*|/code* is supported, but *.txt isn't.
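The rightmost-wildcard rule in the preceding note can be checked before you submit a filter. The following is a simplified sketch, not DataSync's own validation. The function name is_supported_include_pattern is hypothetical, it checks one pattern at a time, and it doesn't account for escaped asterisks (\*).

```shell
#!/bin/sh
# Hypothetical check: succeed when * appears only as the rightmost
# character of the pattern, or not at all. Escaped \* is not handled.
is_supported_include_pattern() {
  # Strip one trailing wildcard, if present; any * left over is unsupported.
  pattern_body=${1%\*}
  case $pattern_body in
    *\**) return 1 ;;
    *)    return 0 ;;
  esac
}

for candidate in '/documents*' '/code*' '*.txt'; do
  if is_supported_include_pattern "$candidate"; then
    echo "${candidate}: supported"
  else
    echo "${candidate}: not supported"
  fi
done
```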
- Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
- In the left navigation pane, choose Tasks, and then choose Create task.
- Configure your task's source and destination locations.
  For more information, see Where can I transfer my data with AWS DataSync?
- For Contents to scan, choose Specific files, objects, and folders, then select Using filters.
- For Includes, enter your filter (for example, /important_folders to include an important directory), then choose Add pattern.
- Add other include filters as needed.
When using the AWS CLI, you must use single quotation marks (') around the filter and a | (pipe) as a delimiter if the filter has more than one pattern.
The following example specifies two include patterns, /important_folder1 and /important_folder2, when running the create-task command.
aws datasync create-task \
    --source-location-arn 'arn:aws:datasync:region:account-id:location/location-id' \
    --destination-location-arn 'arn:aws:datasync:region:account-id:location/location-id' \
    --includes FilterType=SIMPLE_PATTERN,Value='/important_folder1|/important_folder2'
Creating exclude filters
Exclude filters define the files, objects, and folders in your source location that you don't want DataSync to transfer. You can configure these filters when you create, edit, or start a task.
Data excluded by default
DataSync automatically excludes some data from being transferred:
- .snapshot – DataSync ignores any path ending with .snapshot, which typically is used for point-in-time snapshots of a storage system's files or directories.
- /.aws-datasync and /.awssync – DataSync creates these folders in your location to help facilitate your transfer.
- /.zfs – You might see this folder with Amazon FSx for OpenZFS locations.
- Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
- In the left navigation pane, choose Tasks, and then choose Create task.
- Configure your task's source and destination locations.
  For more information, see Where can I transfer my data with AWS DataSync?
- For Excludes, enter your filter (for example, */temp to exclude temporary folders), then choose Add pattern.
- Add other exclude filters as needed.
- If needed, add include filters.
When using the AWS CLI, you must use single quotation marks (') around the filter and a | (pipe) as a delimiter if the filter has more than one pattern.
The following example specifies two exclude patterns, */temp and */tmp, when running the create-task command.
aws datasync create-task \
    --source-location-arn 'arn:aws:datasync:region:account-id:location/location-id' \
    --destination-location-arn 'arn:aws:datasync:region:account-id:location/location-id' \
    --excludes FilterType=SIMPLE_PATTERN,Value='*/temp|*/tmp'