Creating your AWS DataSync task

A task describes where and how AWS DataSync transfers data. Before you create your task, make sure that you understand how DataSync transfers work and review the task quotas.

Important

If you're planning to transfer data to or from an Amazon S3 location, review how DataSync can affect your S3 request charges and the DataSync pricing page before you begin.

Creating your task

When you create a DataSync task, you specify your source and destination locations. You can also customize your task by choosing which files to transfer, how to handle metadata, when to run the task on a schedule, and more.

  1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

  2. Make sure you're in one of the AWS Regions where you plan to transfer data.

  3. In the left navigation pane, expand Data transfer, then choose Tasks, and then choose Create task.

  4. On the Configure source location page, create or choose a source location, then choose Next.

  5. On the Configure destination location page, create or choose a destination location, then choose Next.

  6. (Recommended) On the Configure settings page, give your task a name that you can remember.

  7. While still on the Configure settings page, choose your task options or use the default settings.

    You might be interested in some of the following options:

    When you're done, choose Next.

  8. Review your task configuration, then choose Create task.

You're ready to start your task.

You can also create your task by using the AWS CLI. Before you begin, make sure that you've created your DataSync source and destination locations.

  1. In your AWS CLI settings, make sure that you're using one of the AWS Regions where you plan to transfer data.

  2. Copy the following create-task command:

    aws datasync create-task \
      --source-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
      --destination-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
      --name "task-name"

  3. For --source-location-arn, specify the Amazon Resource Name (ARN) of your source location.

  4. For --destination-location-arn, specify the ARN of your destination location.

    If you're transferring across AWS Regions or accounts, make sure that the ARN includes the other Region or account ID.

  5. (Recommended) For --name, specify a name for your task that you can remember.

  6. Specify other task options as needed. You might be interested in some of the following options:

    For more options, see create-task. Here's an example create-task command that specifies several options:

    aws datasync create-task \
      --source-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
      --destination-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
      --cloud-watch-log-group-arn "arn:aws:logs:region:account-id:log-group:log-group-name" \
      --name "task-name" \
      --options VerifyMode=NONE,OverwriteMode=NEVER,Atime=BEST_EFFORT,Mtime=PRESERVE,Uid=INT_VALUE,Gid=INT_VALUE,PreserveDevices=PRESERVE,PosixPermissions=PRESERVE,PreserveDeletedFiles=PRESERVE,TaskQueueing=ENABLED,LogLevel=TRANSFER

  7. Run the create-task command.

    If the command is successful, you get a response that shows you the ARN of the task that you created. For example:

    { "TaskArn": "arn:aws:datasync:us-east-1:111222333444:task/task-08de6e6697796f026" }
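Step 4 notes that when you transfer across AWS Regions or accounts, the location ARN must include the other Region or account ID. The following is a minimal sketch of how you might check this before running create-task. The helpers `parse_arn` and `describe_transfer` are hypothetical and aren't part of DataSync or the AWS CLI. The sketch assumes the standard `arn:partition:service:region:account-id:resource` layout and splits an ARN into its fields so that you can confirm which Region and account each location belongs to.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Fields of an ARN such as arn:aws:datasync:us-east-1:111222333444:task/task-0123."""
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    # ARNs have the form arn:partition:service:region:account-id:resource.
    # maxsplit=5 keeps any ':' inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    return Arn(partition, service, region, account_id, resource)


def describe_transfer(source_arn: str, destination_arn: str) -> str:
    """Report whether a source/destination location pair crosses Regions or accounts."""
    src, dst = parse_arn(source_arn), parse_arn(destination_arn)
    notes = []
    if src.region != dst.region:
        notes.append(f"cross-Region ({src.region} -> {dst.region})")
    if src.account_id != dst.account_id:
        notes.append(f"cross-account ({src.account_id} -> {dst.account_id})")
    # An empty list means both locations share a Region and an account.
    return ", ".join(notes) or "same Region and account"
```

For example, parsing the task ARN from the response above yields the Region `us-east-1`, the account `111222333444`, and the resource `task/task-08de6e6697796f026`.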

You're ready to start your task.
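If you create tasks from scripts, it can help to keep the task options in one place. The following sketch assumes that you call the DataSync CreateTask API, for example through the AWS SDK for Python. The request field names (`SourceLocationArn`, `DestinationLocationArn`, `Name`, `CloudWatchLogGroupArn`, `Options`) follow that API, but the helper functions are hypothetical. The sketch also checks the documented pairing of `Atime` and `Mtime` (`BEST_EFFORT` goes with `PRESERVE`, and `NONE` goes with `NONE`). It then renders the options in the `Key=Value` shorthand that the example command passes to `--options`.

```python
from typing import Optional

# Option values copied from the example create-task command above.
EXAMPLE_OPTIONS = {
    "VerifyMode": "NONE",
    "OverwriteMode": "NEVER",
    "Atime": "BEST_EFFORT",
    "Mtime": "PRESERVE",
    "Uid": "INT_VALUE",
    "Gid": "INT_VALUE",
    "PreserveDevices": "PRESERVE",
    "PosixPermissions": "PRESERVE",
    "PreserveDeletedFiles": "PRESERVE",
    "TaskQueueing": "ENABLED",
    "LogLevel": "TRANSFER",
}


def check_time_options(options: dict) -> None:
    # Atime and Mtime must be set as a compatible pair. This sketch only checks
    # when both keys are present; it doesn't model DataSync's default values.
    allowed = {("BEST_EFFORT", "PRESERVE"), ("NONE", "NONE")}
    atime, mtime = options.get("Atime"), options.get("Mtime")
    if atime is not None and mtime is not None and (atime, mtime) not in allowed:
        raise ValueError(f"incompatible Atime={atime} and Mtime={mtime}")


def build_create_task_request(source_arn: str, destination_arn: str, name: str,
                              options: dict, log_group_arn: Optional[str] = None) -> dict:
    """Hypothetical helper that assembles CreateTask request parameters."""
    check_time_options(options)
    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Options": dict(options),  # copy so later edits don't leak into the request
    }
    if log_group_arn is not None:
        request["CloudWatchLogGroupArn"] = log_group_arn
    return request


def to_cli_options(options: dict) -> str:
    # Render options in the Key=Value,Key=Value shorthand used by --options.
    return ",".join(f"{key}={value}" for key, value in options.items())
```

With boto3 installed and credentials configured, you could pass the result as `boto3.client("datasync").create_task(**request)`. Calling `to_cli_options(EXAMPLE_OPTIONS)` reproduces the `--options` string from the example command.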

Creating multiple tasks for transferring large datasets

If you're transferring a large dataset, which might include millions of files or objects, we recommend creating multiple tasks that you can run in parallel. Spreading the workload across multiple tasks (and possibly agents, depending on your locations) helps reduce the time it takes DataSync to prepare and transfer your data.

Consider the following ways that you can spread out a large transfer across several DataSync tasks:

  • Create tasks that read different mount paths, prefixes, or folders in your source storage.

  • Create tasks that scan subsets of files, objects, and folders in your source storage by using a manifest or filters.

Keep in mind that running several tasks in parallel can increase I/O operations on your storage and consume more network bandwidth. For more information, see the blog post How to accelerate your data transfers with DataSync scale out architectures.
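One way to split a large transfer across tasks is to balance top-level folders by their approximate file count. The following Python sketch is illustrative only. `partition_folders` and `include_filter` are hypothetical helpers, and the sketch assumes that the counts come from your own inventory of the source storage. It uses a greedy largest-first assignment, which is a common load-balancing heuristic and not something DataSync does for you. Each group then becomes an include filter of type `SIMPLE_PATTERN`, with multiple patterns separated by `|`.

```python
import heapq


def partition_folders(folder_counts: dict, num_tasks: int) -> list:
    """Greedily assign folders to tasks so that each task gets a similar file count.

    folder_counts maps a top-level folder (for example "/projects") to an
    estimated number of files or objects. Returns one list of folders per task.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    # Min-heap of (files assigned so far, task index); ties go to the lower index.
    heap = [(0, i) for i in range(num_tasks)]
    groups = [[] for _ in range(num_tasks)]
    # Visit the largest folders first and give each one to the lightest task.
    for folder, count in sorted(folder_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        groups[i].append(folder)
        heapq.heappush(heap, (load + count, i))
    return groups


def include_filter(folders: list) -> dict:
    # Build a SIMPLE_PATTERN filter rule whose patterns are joined with "|".
    return {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(sorted(folders))}
```

Suppose four folders hold about 5, 3, 2, and 1 million files and you split them across two tasks. The sketch groups `/a` with `/d` and `/b` with `/c`, which gives loads of roughly 6 and 5 million files. You could then supply each filter as the task's include filter (for example, `--includes FilterType=SIMPLE_PATTERN,Value="/a|/d"` on create-task).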

Creating multiple tasks for segmenting transferred data

If you're transferring different sets of data to the same destination, you can create multiple tasks to help segment the data that you transfer.

For example, if you're transferring to the same S3 bucket named MyBucket, you can create a different prefix in the bucket for each task. This approach prevents file name conflicts between the datasets and lets you set different permissions for each prefix. Here's how you might set this up:

  1. Create three prefixes named task1, task2, and task3 in the destination bucket MyBucket:

    • s3://MyBucket/task1

    • s3://MyBucket/task2

    • s3://MyBucket/task3

  2. Create three DataSync tasks named task1, task2, and task3 that transfer to the corresponding prefix in MyBucket.
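The layout above can be sketched in Python. In this minimal sketch, the hypothetical `task_destinations` helper maps each task name to its own prefix in the shared bucket. The hypothetical `prefixes_overlap` helper then checks that no destination is identical to, or nested under, another one, because that separation is what keeps the datasets from colliding. In DataSync itself, you would typically express each prefix as the subdirectory of that task's S3 destination location.

```python
def task_destinations(bucket: str, task_names: list) -> dict:
    """Map each task name to its own prefix in the shared destination bucket."""
    if len(set(task_names)) != len(task_names):
        raise ValueError("task names must be unique")
    return {name: f"s3://{bucket}/{name}" for name in task_names}


def prefixes_overlap(uris: list) -> bool:
    """Return True if any destination is the same as, or nested under, another."""
    # Add a trailing slash so that "task1" isn't treated as a parent of "task10".
    normalized = sorted(uri.rstrip("/") + "/" for uri in uris)
    # After sorting, a prefix always sits directly before something nested under
    # it, so comparing neighbors is enough.
    return any(b.startswith(a) for a, b in zip(normalized, normalized[1:]))
```

For task1, task2, and task3 in MyBucket, the destinations don't overlap. A pair like `s3://MyBucket/task1` and `s3://MyBucket/task1/archive` would be flagged, because one task would write inside another task's prefix.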