

# Performing a large data migration with AWS DataSync
<a name="datasync-large-migration"></a>

Large-scale data migrations can involve transferring significant volumes of data that encompass millions of files or objects in various formats. AWS DataSync simplifies these complex transfers by managing scheduling, monitoring, encryption, and data verification.

## What is a large data migration?
<a name="datasync-large-migration-definition"></a>

A large data migration typically involves transferring terabytes or more of data spread across various sources to a new destination storage environment (in this case, AWS). These migrations require careful planning and coordination within your organization to move data successfully while minimizing business disruption.

DataSync can simplify these migrations, which are usually complex in nature. Some benefits of using DataSync for your migration include:
+ Automated management of data-transfer processes and the infrastructure required for high performance and secure data transfers.
+ End-to-end security, including encryption and data integrity validation, to help ensure that your data arrives securely, intact, and ready to use.
+ A purpose-built network protocol and a parallel, multi-threaded architecture to speed up migrations.

## Key stages of a large data migration
<a name="datasync-large-migration-stages"></a>

You can usually break down a large migration into the following stages:
+ **(Stage 1) Planning your data migration** - At this stage, you're trying to understand why you're migrating and what sort of data you're working with. Planning activities include:
  + Understanding why you want to migrate
  + Assembling a team to help you with all aspects of the migration
  + Identifying data locations, formats, and usage patterns
  + Assessing available hardware resources and network requirements (if you're migrating from an on-premises data center)
  + Running proof of concept (POC) tests with DataSync to estimate migration timelines, plan cutover windows, and get a sense of how you need to configure DataSync
+ **(Stage 2) Implementing your large data migration** - At this point, you're validating your plan and starting the migration. Implementation activities include:
  + Validating the migration plan
  + Executing phased cutovers that include monitoring and verifying that your data transfers as expected
  + Optimizing and adjusting as needed in between each cutover
  + Cleaning up unused resources once you're done

## Additional resources
<a name="review-migration-data-resources"></a>

AWS Prescriptive Guidance has the following resources that can help you plan and implement a large migration. Use this guide to understand how DataSync can work in the context of common migration processes and activities. 
+ [Large migrations to the AWS Cloud](https://aws.amazon.com/prescriptive-guidance/large-migrations/?large-migration-strategies.sort-by=item.additionalFields.sortText&large-migration-strategies.sort-order=desc&large-migration-playbooks.sort-by=item.additionalFields.sortText&large-migration-playbooks.sort-order=desc&large-migration-patterns.sort-by=item.additionalFields.sortText&large-migration-patterns.sort-order=desc)
+ [Strategy and best practices for AWS large migrations](https://docs.aws.amazon.com/prescriptive-guidance/latest/strategy-large-scale-migrations/welcome.html)
+ [Migrate shared file systems in an AWS large migration](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/migrate-shared-file-systems-in-an-aws-large-migration.html) – This resource includes an **SFS-Discovery-Workbook** that you can download and use to plan a migration at the file share level.

# Stage 1: Planning your large data migration
<a name="datasync-large-migraton-stage-1"></a>

Planning is essential when migrating a large dataset. You must understand the data you're migrating, your motivations for the migration, and how AWS DataSync can help you get your data where you want it.

**Topics**
+ [Gathering requirements for your migration](gathering-migration-requirements.md)
+ [Running a DataSync proof of concept](datasync-large-migration-poc.md)
+ [Estimating migration timelines](datasync-large-migration-timelines.md)

# Gathering requirements for your migration
<a name="gathering-migration-requirements"></a>

The first step in a large data migration requires collecting a variety of information across your organization.

This information helps you create a migration [process](https://docs.aws.amazon.com/prescriptive-guidance/latest/strategy-large-scale-migrations/process.html), which for large migrations can include multiple transfers and procedures for cutting over operations (done in [waves](https://docs.aws.amazon.com/prescriptive-guidance/latest/application-portfolio-assessment-guide/wave-planning.html)) from your source to your destination storage.

## Understanding why you want to migrate
<a name="define-migration-goals-why"></a>

Before you can start migrating to AWS, you need to clearly understand why you're migrating your data. This helps address common migration challenges such as meeting deadlines, managing resources, and coordinating across teams.

If you need help determining your motivations for the migration, answer these questions:
+ Are you freeing up on-premises storage space?
+ Are you meeting hardware support contract deadlines?
+ Is this for a data center exit?
+ What's your migration timeline?
+ Are you transferring data from other cloud storage?
+ Are you migrating partial or complete datasets?
+ Is this for data archival?
+ Do applications or users need regular access to this data?

## Figuring out logistics
<a name="define-migration-goals-logistics"></a>

Address some basic logistics about your storage environment, the migration, and your organization:

1. Get a basic understanding of your current data storage infrastructure.

1. Verify whether you need a [DataSync agent](do-i-need-datasync-agent.md). For example, you need an agent if you're transferring from on-premises storage.

1. If you need an agent, make sure that you understand the [agent requirements](agent-requirements.md):
   + An agent can run as a virtual machine (VM) on VMware ESXi, Linux Kernel-based Virtual Machine (KVM), and Microsoft Hyper-V hypervisors. You also can deploy an agent as an Amazon EC2 instance within AWS.
   + Large migrations are typically memory intensive. Make sure that your agent has enough RAM.

1. Identify key stakeholders from your leadership, networking, storage, and IT departments who need to be involved in the migration. This can include:
   + Finding a [single-threaded leader](https://docs.aws.amazon.com/prescriptive-guidance/latest/strategy-large-scale-migrations/people.html) who's dedicated to the project and its results.
   + Determining who's responsible for the ownership and classification of the data that you're migrating.
   + Identifying who manages your source storage and who eventually will manage the AWS storage service that you're migrating to.
   + Finding out who will create and manage any other processes for your data once it's in AWS.

1. Establish cross-department communication channels.

1. Create a rollback plan for contingencies.

1. Document the complete migration process, including waves, validation, and cutover procedures. Use this as your runbook for the entire migration. You will update this process as you plan and implement the migration.

## Reviewing the data you're migrating
<a name="review-migration-data"></a>

Work with your storage and application teams to analyze the characteristics of the data you're migrating. This information helps you determine a migration strategy that you can execute with DataSync.

**Contents**
+ [Determining data usage patterns](#review-migration-data-usage)
+ [Identifying data structure and layout](#review-migration-data-structure)
+ [Documenting shares and folders](#review-migration-data-document-shares)
+ [Analyzing file sizes](#review-migration-data-file-sizes)

### Determining data usage patterns
<a name="review-migration-data-usage"></a>
+ For actively used data with frequent modifications, plan for multiple waves of incremental transfers to avoid disrupting business operations.
+ For read-only data that might be considered archival, you might not need to plan for waves.
+ If you have a mix of data usage patterns, plan waves that migrate these different datasets separately. For example, you might have one wave for archive data, with the rest of the waves dedicated to migrating active data.

### Identifying data structure and layout
<a name="review-migration-data-structure"></a>
+ Determine if data is organized by time periods (year, month, day) or other patterns.
+ Use this organization structure to plan your migration waves. For example, you might migrate a year's worth of archive data during one wave.

### Documenting shares and folders
<a name="review-migration-data-document-shares"></a>
+ Create an inventory of shares and folders (including file or object counts for each).
+ Identify shares and folders with active datasets. These might require incremental transfers during the migration.
+ Review the [DataSync quotas](datasync-limits.md). This can help you plan how to partition your dataset when configuring DataSync.

### Analyzing file sizes
<a name="review-migration-data-file-sizes"></a>
+ Expect higher data throughput for transfers with larger files (MB or GB) compared to smaller files (KB).
+ If you're working with a lot of smaller files, expect more metadata operations on your storage system and lower data throughput. DataSync performs these operations when comparing and verifying your source and destination locations.

## Identifying storage requirements
<a name="determine-storage-requirements"></a>

To choose a compatible AWS storage service to migrate your data, you need to evaluate your source storage system's characteristics and performance.

This information can also help you [schedule your transfers](task-scheduling.md) to minimize impact on business operations during the migration.

**Contents**
+ [Determining source storage support](#determine-storage-requirements-protocols)
+ [Reviewing metadata preservation requirements](#determine-storage-requirements-metadata)
+ [Collecting performance metrics from source storage](#determine-storage-requirements-performance)
+ [Choosing a destination AWS storage service](#determine-storage-requirements-destination)

### Determining source storage support
<a name="determine-storage-requirements-protocols"></a>

DataSync can work with a variety of storage systems that you can access through NFS, SMB, HDFS, and S3-compatible object storage protocols.

If you're migrating from other cloud storage, verify that DataSync can work with that provider. For a list of supported source locations, see [Where can I transfer my data with AWS DataSync?](working-with-locations.md)

### Reviewing metadata preservation requirements
<a name="determine-storage-requirements-metadata"></a>

DataSync can preserve your file or object metadata during a transfer. How your metadata gets preserved depends on your transfer locations and if those locations use similar types of metadata.

DataSync in some cases needs additional permissions to preserve file metadata, such as NTFS discretionary access control lists (DACLs).

For more information, see [Understanding how DataSync handles file and object metadata](metadata-copied.md).

### Collecting performance metrics from source storage
<a name="determine-storage-requirements-performance"></a>

Measure baseline IOPS and disk throughput during average and peak workloads for your source storage. Transferring data adds I/O overhead to both your source and destination storage systems.

Compare this performance data against your storage system's specifications to determine available performance resources.

### Choosing a destination AWS storage service
<a name="determine-storage-requirements-destination"></a>

At this point, you might have an idea of what AWS storage service makes sense for your data. If not, data usage patterns and storage performance are a couple of areas to think about when deciding. For example, you might consider Amazon S3 if you have archive data and Amazon FSx or Amazon EFS for active data.

To help you decide the right object or file-based storage for your data, see [Choosing an AWS storage service](https://docs.aws.amazon.com/decision-guides/latest/storage-on-aws-how-to-choose/choosing-aws-storage-service.html).

## Determining network requirements
<a name="datasync-migration-network-requirements"></a>

To migrate your data with DataSync, you must establish network connections between your source storage, agent, and AWS. You also need to plan for enough network bandwidth and infrastructure.

Work with your network engineers and storage administrators to gather the following network requirements.

**Contents**
+ [Assessing your available network bandwidth](#datasync-migration-network-bandwidth)
+ [Considering options for connecting your network to AWS](#datasync-migration-network-connection-options)
+ [Choosing a service endpoint for agent communication](#datasync-migration-network-service-endpoint)
+ [Planning for enough network infrastructure](#datasync-migration-network-interfaces)

### Assessing your available network bandwidth
<a name="datasync-migration-network-bandwidth"></a>

Your available network bandwidth factors into your transfer speeds and overall migration time. If you're transferring from an on-premises storage system, do the following: 
+ Work with your network team to determine average and peak bandwidth utilization. 
+ Identify windows when you can transfer data and avoid disrupting daily operations. This will inform when your migration waves and cutovers happen.

You can control how much bandwidth DataSync uses. For more information, see [Setting bandwidth limits for your AWS DataSync task](configure-bandwidth.md).

Since transfers from other cloud storage typically happen over the public internet, there are usually fewer bandwidth restrictions and considerations with these transfers.

### Considering options for connecting your network to AWS
<a name="datasync-migration-network-connection-options"></a>

Consider the following options for establishing network connectivity for your DataSync transfer:
+ **Direct Connect** - Review the [architecture and routing examples](direct-connect-architecture.md) for using Direct Connect with DataSync. You can monitor Direct Connect activity using [Amazon CloudWatch](https://docs.aws.amazon.com/directconnect/latest/UserGuide/monitoring-cloudwatch.html).
+ **VPN** - [AWS Site-to-Site VPN](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html) offers up to 1.25 Gbps throughput per tunnel.
+ **Public internet** - Contact your internet service provider for network usage data.

### Choosing a service endpoint for agent communication
<a name="datasync-migration-network-service-endpoint"></a>

DataSync agents use [service endpoints](choose-service-endpoint.md) to communicate with the DataSync service. The type of endpoint you use depends on how you're connecting your network to AWS.

### Planning for enough network infrastructure
<a name="datasync-migration-network-interfaces"></a>

For every transfer task that you create, DataSync automatically generates and manages the network infrastructure for your data transfers. This infrastructure consists of *network interfaces* (also known as *elastic network interfaces*), which are logical networking components in an Amazon virtual private cloud (VPC) that represent virtual network cards. For more information, see [Elastic network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) in the *Amazon EC2 User Guide*.

Each network interface uses a single IP address in your destination VPC subnet. To make sure that you have enough network infrastructure for your migration, do the following:
+ Note the number of [network interfaces](required-network-interfaces.md) that DataSync will create for your DataSync destination location.
+ Make sure that your subnet has enough IP addresses for your DataSync tasks. For example, a task that uses an agent requires four IP addresses. If you create four tasks for your migration, that means you need 16 available IP addresses in your subnet. 
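As a quick check, you can calculate the subnet capacity that you need from the number of tasks you plan to run. This sketch assumes agent-based tasks that each require four IP addresses, as in the example above:

```python
# Estimate the IP addresses that DataSync network interfaces consume
# in your destination VPC subnet. Four interfaces per task is the
# figure for agent-based tasks used in the example above.
INTERFACES_PER_AGENT_TASK = 4

def required_ip_addresses(task_count: int) -> int:
    """Return the number of subnet IP addresses needed for the tasks."""
    return task_count * INTERFACES_PER_AGENT_TASK

print(required_ip_addresses(4))  # 4 tasks -> 16 IP addresses
```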

# Running a DataSync proof of concept
<a name="datasync-large-migration-poc"></a>

Running a proof of concept (POC) with AWS DataSync helps you validate the following aspects of your data migration planning:
+ Verify network connectivity between source and destination locations.
+ Validate your initial DataSync task configuration.
+ Measure data transfer performance.
+ Estimate migration timelines.
+ Define success criteria with the key stakeholders working on the migration.

## Getting started with your proof of concept
<a name="datasync-large-migration-poc-getting-started"></a>

1. Create your DataSync agent:

   1. [Deploy your agent](deploy-agents.md).

   1. [Choose a service endpoint](choose-service-endpoint.md) for your agent.

   1. [Activate your agent](activate-agent.md).

   1. [Verify your agent's network connections](test-agent-connections.md).

1. Select a small subset of data that represents the data that you're migrating.

   For example, if your source storage has a mix of large and small files, the subset of data you transfer in your POC should reflect that. This gives you a preliminary understanding of performance from the storage systems, your network, and DataSync.

1. Create a DataSync source location for your [on-premises](transferring-on-premises-storage.md) or [other cloud](transferring-other-cloud-storage.md) storage system.

1. Create a DataSync destination location for your [AWS storage service](transferring-aws-storage.md).

1. [Create a DataSync transfer task](create-task-how-to.md) with a [filter](filtering.md) that only transfers your data subset.

1. [Start your DataSync task](run-task.md).

1. Collect transfer performance metrics by monitoring the following:
   + Your task execution's data and file throughput. You can do this through the DataSync console or the [DescribeTaskExecution](https://docs.aws.amazon.com/datasync/latest/userguide/API_DescribeTaskExecution.html) operation. If you use `DescribeTaskExecution`, here's how you calculate these metrics:
     + **Data throughput**: Divide `BytesWritten` by `TransferDuration`
     + **File throughput**: Divide `FilesTransferred` by `TransferDuration`
   + Source and destination storage utilization. Work closely with your storage administrators to get this information.
   + Network usage.

1. Verify the transferred data at your destination location:
   + Review your CloudWatch logs for task execution errors.
   + Verify that permissions and metadata are preserved at the destination location.
   + Confirm that applications and users can access destination data as expected.
   + Address any issues that you encounter. For more information, see [Troubleshooting AWS DataSync issues](troubleshooting-datasync.md).

1. Run your task a few more times to get an idea how long it takes DataSync to prepare, transfer, and verify your data. (For more information, see [Task execution statuses](run-task.md#understand-task-execution-statuses).)

   If you run a task more than once, DataSync by default performs an incremental transfer and copies only the data that's changed from the previous task run.

   While the transfer time will likely be shorter for incremental transfers, DataSync will always prepare your transfer the same way by scanning and comparing your locations to identify what to transfer. You can use these preparation times to [estimate cutover timelines](datasync-large-migration-timelines.md#datasync-large-migration-cutover-timelines) for your migration.

1. If needed, update your migration plan based on what you learned during the POC.
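As an illustration, here's how the throughput calculations from your POC metrics might look in code. The `DescribeTaskExecution` values below are made up for the example; duration fields in the API response are reported in milliseconds:

```python
# Illustrative DescribeTaskExecution values (not from a real task).
# BytesWritten and FilesTransferred are top-level response fields;
# TransferDuration (in milliseconds) is part of the Result details.
execution = {
    "BytesWritten": 536_870_912_000,            # 500 GiB
    "FilesTransferred": 1_000_000,
    "Result": {"TransferDuration": 3_600_000},  # 1 hour, in ms
}

transfer_seconds = execution["Result"]["TransferDuration"] / 1000

# Data throughput: BytesWritten divided by TransferDuration
data_throughput_mib_s = execution["BytesWritten"] / transfer_seconds / (1024 ** 2)

# File throughput: FilesTransferred divided by TransferDuration
file_throughput = execution["FilesTransferred"] / transfer_seconds

print(f"{data_throughput_mib_s:.1f} MiB/s, {file_throughput:.1f} files/s")
```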

# Estimating migration timelines
<a name="datasync-large-migration-timelines"></a>

Using the information you've collected to this point, you can estimate how long the migration will take using AWS DataSync.

## Estimating data transfer timelines
<a name="datasync-large-migration-transfer-timelines"></a>

You can estimate how long it takes DataSync to transfer your data based on the following information you collected during migration requirements gathering and your DataSync proof of concept (POC):
+ Your [available network bandwidth](gathering-migration-requirements.md#datasync-migration-network-bandwidth)
+ Source and destination storage utilization metrics
+ Performance metrics from your [DataSync POC](datasync-large-migration-poc.md)

**To estimate a data transfer timeline**

1. Compare the data and file throughput from your POC with your available network bandwidth.

1. If your throughput is lower than your available bandwidth (such as 300 MiB/s for throughput with 10 Gbps of network bandwidth), consider partitioning your dataset into multiple tasks to maximize bandwidth usage.

   DataSync has a few options for partitioning your dataset. For more information, see [Accelerating your migration with data partitioning](datasync-large-migration-data-partitioning.md).

1. Calculate how many days a transfer takes by using the following formula, which provides a theoretical minimum transfer time:

   ```
   (DATA_SIZE * 8 bits per byte)/(CIRCUIT * NETWORK_UTILIZATION percentage * 3600 seconds per hour * AVAILABLE_HOURS) = Number of days
   ```

   When using this formula, replace the following with your own values:
   + `DATA_SIZE`: The amount of data that you're migrating (expressed in bytes).
   + `CIRCUIT`: Your available network bandwidth (expressed in bits per second).
   + `NETWORK_UTILIZATION`: What percent of your network is being used.
   + `AVAILABLE_HOURS`: The number of operational hours available in each day.

   For example, you would calculate a migration with 100 TB of data, a 1 Gbps internet connection, 80 percent network utilization, and 24 hours per day availability like this:

   `(100,000,000,000,000 bytes * 8) / (1,000,000,000 bps * 0.80 * 3600 * 24) = 11.57 days`

   In this case, the migration would take almost 12 days before accounting for real-world conditions.

1. Adjust your calculated transfer duration to account for real-world conditions:
   + Network performance fluctuations
   + Storage performance variations
   + Downtime between migration waves
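The formula above translates directly into a short script. This sketch reproduces the worked example (100 TB of data, a 1 Gbps circuit, 80 percent network utilization, and 24 available hours per day):

```python
def transfer_days(data_size_bytes: float, circuit_bps: float,
                  network_utilization: float, available_hours: float) -> float:
    """Theoretical minimum transfer time, in days.

    data_size_bytes: amount of data to migrate, in bytes
    circuit_bps: available network bandwidth, in bits per second
    network_utilization: fraction of the circuit you can use (0-1)
    available_hours: operational hours available per day
    """
    bits_to_move = data_size_bytes * 8
    bits_per_day = circuit_bps * network_utilization * 3600 * available_hours
    return bits_to_move / bits_per_day

# 100 TB, 1 Gbps, 80% utilization, 24 hours per day
days = transfer_days(100_000_000_000_000, 1_000_000_000, 0.80, 24)
print(round(days, 2))  # 11.57
```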

## Estimating cutover timelines
<a name="datasync-large-migration-cutover-timelines"></a>

If you're migrating active datasets, you likely need cutovers so that you don't disrupt business operations.

Don't underestimate how long cutovers take. With large migrations, it's not uncommon for cutover activities to take up to 30 percent of your overall migration time.

1. Evaluate if you need to perform cutovers in waves to reduce the amount of data scanned for incremental changes.

   One strategy for doing this is cutting over datasets that you partition based on shares, folders, or storage systems.

1. Review how long it generally took DataSync to prepare, transfer, and verify your data during the POC.

   Note in particular the prepare durations of your task executions. To find this information, run the [DescribeTaskExecution](https://docs.aws.amazon.com/datasync/latest/userguide/API_DescribeTaskExecution.html) operation, then check the value of [PrepareDuration](https://docs.aws.amazon.com/datasync/latest/userguide/API_TaskExecutionResultDetail.html#DataSync-Type-TaskExecutionResultDetail-PrepareDuration) for the duration time (in milliseconds).

1. Estimate how long a cutover might take by measuring the time delta across parallel tasks.

   For more information on parallel tasks, see [Accelerating your migration with data partitioning](datasync-large-migration-data-partitioning.md).

1. Use your cutover estimation to schedule your cutovers. These are essentially maintenance windows when your source data can't be modified.
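As an illustration, here's how you might convert the `PrepareDuration` value (reported in milliseconds) into something easier to schedule around. The response fragment below is made up for the example:

```python
# Illustrative DescribeTaskExecution response fragment (not real data).
# PrepareDuration is reported in milliseconds within the Result details.
response = {"Result": {"PrepareDuration": 2_700_000}}  # 45 minutes, in ms

prepare_minutes = response["Result"]["PrepareDuration"] / 1000 / 60
print(f"Prepare phase took {prepare_minutes:.0f} minutes")
```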

## Next steps
<a name="estimate-cutover-timelines-next-steps"></a>

After estimating your timelines, you're ready to start implementing your migration.

# Stage 2: Implementing your large data migration
<a name="datasync-large-migraton-stage-2"></a>

With the information you gathered during planning, you can begin using AWS DataSync to migrate to your new storage system. If you haven't already, we recommend reviewing the [AWS Prescriptive Guidance resources for large migrations](datasync-large-migration.md#review-migration-data-resources).

**Topics**
+ [Accelerating your migration with data partitioning](datasync-large-migration-data-partitioning.md)
+ [Running your DataSync transfer tasks](datasync-large-migration-running-tasks.md)
+ [Monitoring your transfers](datasync-large-migration-monitoring.md)

# Accelerating your migration with data partitioning
<a name="datasync-large-migration-data-partitioning"></a>

With a large migration, we recommend partitioning your dataset with multiple DataSync tasks. Partitioning your source data across multiple tasks (and possibly agents) lets you parallelize your transfers and reduce the migration timeline.

Partitioning also helps you stay within DataSync [quotas](datasync-limits.md) and simplifies the monitoring and debugging of your tasks. 

The following diagram shows how you might use multiple DataSync tasks and agents to transfer data from the same source storage location. In this scenario, each task focuses on a specific folder in the source location. For more information and examples on these approaches, see [How to accelerate your data transfers with AWS DataSync scale out architectures](https://aws.amazon.com/blogs/storage/how-to-accelerate-your-data-transfers-with-aws-datasync-scale-out-architectures/).

![A diagram that shows one approach with DataSync for partitioning your source data to help accelerate a large migration.](http://docs.aws.amazon.com/datasync/latest/userguide/images/datasync-partition-by-folder.png)


## Partitioning your dataset by folder or prefix
<a name="configure-task-by-folder"></a>

When creating your DataSync source location, you can specify a folder, directory, or prefix that DataSync reads from. For example, if you're migrating a file share with top-level directories, you can create multiple locations that each specify a different directory path. You can then use these locations to run multiple DataSync tasks during your migration.

## Partitioning your dataset with filters
<a name="configure-task-with-filters"></a>

You can apply [filters](filtering.md) to include or exclude data from your source location in a transfer. In the context of a large migration, filters can help you scope tasks to specific portions of your dataset.

For example, if you’re migrating archive data that’s organized by year, you can create an include filter to match for a specific year or multiple years. You also can modify the filter each time you run the task to match a different year.
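As an illustration, here's how such an include filter might be built programmatically. DataSync filters use the `SIMPLE_PATTERN` filter type, with multiple patterns separated by a `|` (pipe) character; the year-based folder layout here is hypothetical:

```python
# Build an include filter that scopes a task execution to two
# hypothetical year-based folders. Patterns are joined with "|".
years = ["2019", "2020"]
include_filter = [{
    "FilterType": "SIMPLE_PATTERN",
    "Value": "|".join(f"/{year}/*" for year in years),
}]
print(include_filter[0]["Value"])  # /2019/*|/2020/*

# You would pass this when starting the task, for example:
# datasync.start_task_execution(TaskArn=task_arn, Includes=include_filter)
```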

## Partitioning your dataset with manifests
<a name="configure-task-with-manifest"></a>

A [manifest](transferring-with-manifest.md) is a list of files or objects that you want DataSync to transfer. With a manifest, DataSync doesn't have to read everything in a source location to determine what to transfer.

You can create manifests from inventories of your source storage or through event-driven approaches (for example, see [Implementing AWS DataSync with hundreds of millions of objects](https://aws.amazon.com/blogs/storage/implementing-aws-datasync-with-hundreds-of-millions-of-objects/)). You can also use a different manifest each time you start a task, allowing you to transfer different sets of data with the same task.
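As a simple illustration, you could generate a manifest file from an inventory list. This sketch assumes a manifest that lists one relative path per line (see the manifest documentation linked above for the exact format that your locations require); the file names are hypothetical:

```python
from pathlib import Path

# Hypothetical inventory of files to transfer, with paths relative
# to the source location. The manifest lists one path per line.
inventory = ["photos/2023/img_0001.png", "photos/2023/img_0002.png"]

manifest_path = Path("manifest.csv")
manifest_path.write_text("\n".join(inventory) + "\n")

print(manifest_path.read_text(), end="")
```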

# Running your DataSync transfer tasks
<a name="datasync-large-migration-running-tasks"></a>

During each of your migration waves, your data transfer usually follows the same general process:

1. Run an initial full transfer of your data.

1. Verify the data in the destination.

1. Run incremental transfers for any data that might have changed since the initial transfer.

1. Cut over operations to your destination location.

1. Review cutover results.

## Running your tasks
<a name="datasync-large-migration-running-tasks-how-to"></a>

You likely will need to run your DataSync transfer tasks during business hours to minimize your overall migration time. It's common in these situations to run an initial full transfer followed by incremental transfers that account for changes to your source location from users and applications.

To avoid network-related issues during business hours, you can limit the amount of bandwidth that your tasks use. For more information, see [Setting bandwidth limits for your AWS DataSync task](configure-bandwidth.md).
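As an illustration, here's how you might express a bandwidth limit with the task option that DataSync uses for this (`BytesPerSecond`, where `-1` means no limit). The 100 MiB/s cap is an arbitrary example:

```python
# Cap a task execution at roughly 100 MiB/s. BytesPerSecond is
# expressed in bytes; a value of -1 removes the limit.
limit_mib_s = 100
override_options = {"BytesPerSecond": limit_mib_s * 1024 * 1024}
print(override_options["BytesPerSecond"])  # 104857600

# You would pass this when starting the task, for example:
# datasync.start_task_execution(TaskArn=task_arn, OverrideOptions=override_options)
```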

1. Run an initial full transfer:

   1. [Start your DataSync task](run-task.md) (or tasks if you’re running tasks in parallel).

   1. Monitor the progress and performance of your task executions.

   1. Verify that your data transferred the way you expect (for example, file metadata is preserved).

1. Run incremental transfers:

   1. [Schedule your tasks](task-scheduling.md) to run periodically.

   1. Monitor your task executions and fix errors if encountered.

## Performing a cutover
<a name="datasync-migration-cutting-over-how-to"></a>

After your initial and incremental transfers, you can start the process of cutting over operations to your destination location.

1. Start the scheduled maintenance window.

1. Update your source storage system to be read only for applications and users.

1. Run final incremental transfers to copy remaining deltas between your source and destination locations.

1. Conduct a thorough data validation (for example, by reviewing CloudWatch logs and [task reports](task-reports.md)).

1. Switch your applications and users to the new environment of your destination location.

1. Test application functionality and make sure that users can access data in your destination location.

1. Schedule a retrospective meeting to review the transfer with the migration teams. Ask probing questions such as the following:
   + Was the cutover successful? If not, what was the issue?
   + Did we use all available bandwidth?
   + Were the source and destination storage systems fully utilized?
   + Can we get more data throughput with additional tasks?
   + Do we need to plan for a longer maintenance window?

1. If needed, update your migration plan before starting the next wave.

# Monitoring your transfers
<a name="datasync-large-migration-monitoring"></a>

AWS DataSync provides several monitoring options to help you validate and debug your transfer.

## Monitoring your transfers with CloudWatch metrics
<a name="datasync-migration-monitoring-cloudwatch-metrics"></a>

You can create custom CloudWatch dashboards with metrics from your DataSync task executions. For more information, see [Monitoring data transfers with Amazon CloudWatch metrics](monitor-datasync.md).

## Monitoring your transfers with task reports
<a name="datasync-migration-monitoring-task-reports"></a>

If you’re transferring millions of files or objects, consider using task reports. Task reports provide detailed information about what DataSync attempts to transfer, skip, verify, and delete during a task execution. For more information, see [Monitoring your data transfers with task reports](task-reports.md).

You can also visualize your task reports by using AWS services such as AWS Glue, Amazon Athena, and Amazon QuickSight. For more information, see the [AWS Storage Blog](https://aws.amazon.com/blogs/storage/derive-insights-from-aws-datasync-task-reports-using-aws-glue-amazon-athena-and-amazon-quicksight/).

## Monitoring your transfers with CloudWatch Logs
<a name="datasync-migration-monitoring-cloudwatch-logs"></a>

At minimum, we recommend that you configure your task to log basic information and transfer errors. For more information, see [Monitoring data transfers with Amazon CloudWatch Logs](configure-logging.md).