With AWS DataSync, you can transfer data between Microsoft Azure Blob Storage (including Azure Data Lake Storage Gen2 blob storage) and the following AWS storage services:
To set up this kind of transfer, you create a location for your Azure Blob Storage. You can use this location as a transfer source or destination.
Providing DataSync access to your Azure Blob Storage
How DataSync accesses your Azure Blob Storage depends on several factors, including whether you're transferring to or from blob storage and what kind of shared access signature (SAS) token you're using. Your objects also must be in an access tier that DataSync can work with.
SAS tokens
A SAS token specifies the access permissions for your blob storage. (For more information about SAS, see the Azure Blob Storage documentation.)
You can generate SAS tokens to provide different levels of access. DataSync supports tokens with the following access levels:
- Account
- Container
The access permissions that DataSync needs depend on the scope of your token. Not having the correct permissions can cause your transfer to fail. For example, your transfer won't succeed if you're moving objects with tags to Azure Blob Storage but your SAS token doesn't have tag permissions.
SAS token permissions for account-level access
DataSync needs an account-level access token with the following permissions (depending on whether you're transferring to or from Azure Blob Storage).
- Allowed services – Blob
- Allowed resource types – Container, Object
  If you don't include these permissions, DataSync can't transfer your object metadata, including object tags.
- Allowed permissions – Read, List
- Allowed blob index permissions – Read/Write (if you want DataSync to copy object tags)
SAS token permissions for container-level access
DataSync needs a container-level access token with the following permissions (depending on whether you're transferring to or from Azure Blob Storage).
- Read
- List
- Tag (if you want DataSync to copy object tags)
Note
You can't add the tag permission when generating a SAS token in the Azure portal. To add the tag permission, instead generate the token by using the Azure Storage Explorer app, or generate a SAS token that provides account-level access.
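If you want a quick sanity check before running a task, you can inspect a container-level token's sp (signed permissions) field yourself. The following Python sketch is illustrative and isn't part of DataSync; the helper name is made up, and it assumes Azure's convention of encoding granted permissions as letters in the sp query field (r for Read, l for List, t for Tag).

```python
from urllib.parse import parse_qs

def missing_sas_permissions(sas_token, want_tags=False):
    """Return the permission letters a container-level SAS token lacks.

    Azure encodes granted permissions as letters in the 'sp' field,
    for example sp=rlt for Read, List, and Tag.
    """
    fields = parse_qs(sas_token)
    granted = set("".join(fields.get("sp", [""])))
    required = {"r", "l"}      # Read and List for any transfer from blob storage
    if want_tags:
        required.add("t")      # Tag, if DataSync should copy object tags
    return required - granted

# A read/list-only token is fine unless tags must be copied.
token = "sp=rl&st=2023-12-20T14:54:52Z&se=2023-12-20T22:54:52Z&sr=c"
print(missing_sas_permissions(token))                  # set()
print(missing_sas_permissions(token, want_tags=True))  # {'t'}
```

A check like this only catches missing letters in the token itself; Azure can still deny access for other reasons (expiry, IP restrictions, a revoked key).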
SAS expiration policies
Make sure that your SAS doesn't expire before you expect to finish your transfer. For information about configuring a SAS expiration policy, see the Azure Blob Storage documentation.
If the SAS expires during the transfer, DataSync can no longer access your Azure Blob Storage location. (You might see a Failed to open directory error.) If this happens, update your location with a new SAS token and restart your DataSync task.
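Because an expired SAS fails mid-transfer, it can be worth checking the token's se (signed expiry) field before starting a long task. This is a hypothetical helper, shown only to illustrate reading the expiry timestamp out of the token; it is not a DataSync feature.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

def sas_seconds_remaining(sas_token, now=None):
    """Seconds until the token's 'se' (signed expiry) timestamp passes."""
    expiry_text = parse_qs(sas_token)["se"][0]   # e.g. 2023-12-20T22:54:52Z
    expiry = datetime.fromisoformat(expiry_text.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (expiry - now).total_seconds()

token = "sp=rl&se=2023-12-20T22:54:52Z&sr=c"
checked_at = datetime(2023, 12, 20, 20, 54, 52, tzinfo=timezone.utc)
print(sas_seconds_remaining(token, checked_at))  # 7200.0 (two hours left)
```

If the remaining time is smaller than your expected transfer duration, generate a fresh token and update the location before starting the task.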
Access tiers
When transferring from Azure Blob Storage, DataSync can copy objects in the hot and cool tiers. For objects in the archive access tier, you must rehydrate those objects to the hot or cool tier before you can copy them.
When transferring to Azure Blob Storage, DataSync can copy objects into the hot, cool, and archive access tiers. If you're copying objects into the archive access tier, DataSync can't verify the transfer if you're trying to verify all data in the destination.
DataSync doesn't support the cold access tier. For more information about access tiers, see the Azure Blob Storage documentation.
Considerations with Azure Blob Storage transfers
When planning to transfer data to or from Azure Blob Storage with DataSync, there are some things to keep in mind.
Costs
The fees associated with moving data in or out of Azure Blob Storage can include:
- Running an Azure virtual machine (VM) (if you deploy your DataSync agent in Azure)
- Running an Amazon EC2 instance (if you deploy your DataSync agent in a VPC within AWS)
- Transferring the data by using DataSync, including request charges related to Azure Blob Storage and Amazon S3 (if S3 is one of your transfer locations)
- Transferring data in or out of Azure Blob Storage
- Storing data in an AWS storage service supported by DataSync
Blob types
How DataSync works with blob types depends on whether you're transferring to or from Azure Blob Storage. When you're moving data into blob storage, the objects or files that DataSync transfers can only be block blobs. When you're moving data out of blob storage, DataSync can transfer block, page, and append blobs.
For more information about blob types, see the Azure Blob Storage documentation.
AWS Region availability
You can create an Azure Blob Storage transfer location in any AWS Region that's supported by DataSync.
Copying object tags
The ability for DataSync to preserve object tags when transferring to or from Azure Blob Storage depends on the following factors:
- The size of an object's tags – DataSync can't transfer an object with tags that exceed 2 KB.
- Whether DataSync is configured to copy object tags – DataSync copies object tags by default.
- The namespace that your Azure storage account uses – DataSync can copy object tags if your Azure storage account uses a flat namespace, but not if your account uses a hierarchical namespace (a feature of Azure Data Lake Storage Gen2). Your DataSync task will fail if you try to copy object tags and your storage account uses a hierarchical namespace.
- Whether your SAS token authorizes tagging – The permissions that you need to copy object tags vary depending on the level of access that your token provides. Your task will fail if you try to copy object tags and your token doesn't have the right permissions for tagging. For more information, check the permission requirements for account-level access tokens or container-level access tokens.
Transferring to Amazon S3
When transferring to Amazon S3, DataSync won't transfer Azure Blob Storage objects larger than 5 TB or objects with metadata larger than 2 KB.
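If you have an inventory of your blobs, you can flag the objects that would hit these limits before starting the task. The sketch below is purely illustrative; the record layout is made up, and it assumes binary units (1 KB = 1,024 bytes) for the 5 TB and 2 KB limits, which the documentation doesn't specify.

```python
TB = 1024 ** 4
KB = 1024

def s3_transferable(obj):
    """Flag objects within the documented Azure Blob -> Amazon S3 limits:
    no objects over 5 TB, and no more than 2 KB of object metadata."""
    return obj["size_bytes"] <= 5 * TB and obj["metadata_bytes"] <= 2 * KB

# Hypothetical inventory records; in practice you would build these from
# a listing of your container.
objects = [
    {"name": "ok.bin",      "size_bytes": 4 * TB, "metadata_bytes": 512},
    {"name": "too-big.bin", "size_bytes": 6 * TB, "metadata_bytes": 512},
    {"name": "heavy-meta",  "size_bytes": 1024,   "metadata_bytes": 3 * KB},
]
print([o["name"] for o in objects if s3_transferable(o)])  # ['ok.bin']
```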
Deleting directories in a transfer destination
When transferring to Azure Blob Storage, DataSync can remove objects in your blob storage that aren't present in your transfer source. (You can configure this option by clearing the Keep deleted files setting in the DataSync console. Your SAS token must also have delete permissions.)
When you configure your transfer this way, DataSync won't delete directories in your blob storage if your Azure storage account is using a hierarchical namespace. In this case, you must manually delete the directories (for example, by using Azure Storage Explorer).
Limitations
Remember the following limitations when transferring data to or from Azure Blob Storage:
- DataSync creates some directories in a location to help facilitate your transfer. If Azure Blob Storage is a destination location and your storage account uses a hierarchical namespace, you might notice task-specific subdirectories (such as task-000011112222abcde) in the /.aws-datasync folder. DataSync typically deletes these subdirectories following a transfer. If that doesn't happen, you can delete these task-specific directories yourself as long as a task isn't running.
- DataSync doesn't support using a SAS token to access only a specific folder in your Azure Blob Storage container.
- You can't provide DataSync a user delegation SAS token for accessing your blob storage.
Creating your DataSync agent
To get started, you must create a DataSync agent that can connect to your Azure Blob Storage container. This process includes deploying and activating an agent.
Tip
Although you can deploy your agent on an Amazon EC2 instance, using a Microsoft Hyper-V agent might result in decreased network latency and more data compression.
You can deploy your DataSync agent directly in Azure with a Microsoft Hyper-V image.
Tip
Before you continue, consider using a shell script that can help you deploy your Hyper-V agent in Azure more quickly. You can get more information and download the code on GitHub. If you use the script, you can skip ahead to Getting your agent's activation key.
Prerequisites
To prepare your DataSync agent and deploy it in Azure, you must do the following:
- Enable Hyper-V on your local machine.
- Install PowerShell (including the Hyper-V Module).
- Install the Azure CLI.
- Install AzCopy.
Downloading and preparing your agent
Download an agent from the DataSync console. Before you can deploy the agent in Azure, you must convert it to a fixed-size virtual hard disk (VHD). For more information, see the Azure documentation.
To download and prepare your agent
1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
2. In the left navigation pane, choose Agents, and then choose Create agent.
3. For Hypervisor, choose Microsoft Hyper-V, and then choose Download the image.
   The agent downloads in a .zip file that contains a .vhdx file.
4. Extract the .vhdx file on your local machine.
5. Open PowerShell and do the following:
   a. Copy the following Convert-VHD cmdlet:
      Convert-VHD -Path .\local-path-to-vhdx-file\aws-datasync-2.0.1686143940.1-x86_64.xfs.gpt.vhdx `
        -DestinationPath .\local-path-to-vhdx-file\aws-datasync-2016861439401-x86_64.vhd `
        -VHDType Fixed
   b. Replace each instance of local-path-to-vhdx-file with the location of the .vhdx file on your local machine.
   c. Run the command.
      Your agent is now a fixed-size VHD (with a .vhd file format) and ready to deploy in Azure.
Deploying your agent in Azure
Deploying your DataSync agent in Azure involves:
- Creating a managed disk in Azure
- Uploading your agent to that managed disk
- Attaching the managed disk to a Linux virtual machine
To deploy your agent in Azure
1. In PowerShell, go to the directory that contains your agent's .vhd file.
2. Run the ls command and save the Length value (for example, 85899346432).
   This is the size of your agent image in bytes, which you need when creating a managed disk that can hold the image.
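The Length value is simply the size of the .vhd file in bytes, so any file-size call gives you the number to pass later as --upload-size-bytes. The following Python sketch is illustrative; stand-in.vhd is a placeholder file created only so the example runs.

```python
import os

def agent_upload_size_bytes(vhd_path):
    """Return the byte size to pass as --upload-size-bytes for az disk create."""
    return os.path.getsize(vhd_path)

# Create a small placeholder file so the example runs anywhere;
# your real .vhd is tens of gigabytes.
with open("stand-in.vhd", "wb") as f:
    f.truncate(1024)

print(agent_upload_size_bytes("stand-in.vhd"))  # 1024
```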
3. Do the following to create a managed disk:
   a. Copy the following Azure CLI command:
      az disk create -n your-managed-disk `
        -g your-resource-group `
        -l your-azure-region `
        --upload-type Upload `
        --upload-size-bytes agent-size-bytes `
        --sku standard_lrs
   b. Replace your-managed-disk with a name for your managed disk.
   c. Replace your-resource-group with the name of the Azure resource group that your storage account belongs to.
   d. Replace your-azure-region with the Azure region where your resource group is located.
   e. Replace agent-size-bytes with the size of your agent image.
   f. Run the command.
      This command creates an empty managed disk with a standard SKU where you can upload your DataSync agent.
4. To generate a shared access signature (SAS) that allows write access to the managed disk, do the following:
   a. Copy the following Azure CLI command:
      az disk grant-access -n your-managed-disk `
        -g your-resource-group `
        --access-level Write `
        --duration-in-seconds 86400
   b. Replace your-managed-disk with the name of the managed disk that you created.
   c. Replace your-resource-group with the name of the Azure resource group that your storage account belongs to.
   d. Run the command.
      In the output, take note of the SAS URI. You need this URI when uploading the agent to Azure.
   The SAS allows you to write to the disk for up to a day (86,400 seconds). This means that you have a day to upload your agent to the managed disk.
5. To upload your agent to your managed disk in Azure, do the following:
   a. Copy the following AzCopy command:
      .\azcopy copy local-path-to-vhd-file sas-uri --blob-type PageBlob
   b. Replace local-path-to-vhd-file with the location of the agent's .vhd file on your local machine.
   c. Replace sas-uri with the SAS URI that you got when you ran the az disk grant-access command.
   d. Run the command.
6. When the agent upload finishes, revoke access to your managed disk:
   a. Copy the following Azure CLI command:
      az disk revoke-access -n your-managed-disk -g your-resource-group
   b. Replace your-managed-disk with the name of the managed disk that you created.
   c. Replace your-resource-group with the name of the Azure resource group that your storage account belongs to.
   d. Run the command.
7. Do the following to attach your managed disk to a new Linux VM:
   a. Copy the following Azure CLI command:
      az vm create --resource-group your-resource-group `
        --location eastus `
        --name your-agent-vm `
        --size Standard_E4as_v4 `
        --os-type linux `
        --attach-os-disk your-managed-disk
   b. Replace your-resource-group with the name of the Azure resource group that your storage account belongs to.
   c. Replace your-agent-vm with a name for the VM that you can remember.
   d. Replace your-managed-disk with the name of the managed disk that you're attaching to the VM.
   e. Run the command.
You've deployed your agent. Before you can start configuring your data transfer, you must activate the agent.
Getting your agent's activation key
To manually get your DataSync agent's activation key, follow these steps.
Alternatively, DataSync can automatically get the activation key for you, but this approach requires some network configuration.
To get your agent's activation key
1. In the Azure portal, enable boot diagnostics for your agent's VM by choosing the Enable with custom storage account setting and specifying your Azure storage account.
   After you've enabled boot diagnostics for your agent's VM, you can access your agent's local console to get the activation key.
2. While still in the Azure portal, go to your VM and choose Serial console.
3. In the agent's local console, log in by using the following default credentials:
   - Username – admin
   - Password – password
   We recommend changing at least the agent's password at some point. In the agent's local console, enter 5 on the main menu, then use the passwd command to change the password.
4. Enter 0 to get the agent's activation key.
5. Enter the AWS Region where you're using DataSync (for example, us-east-1).
6. Choose the service endpoint that the agent will use to connect with AWS.
7. Save the value of the Activation key output.
Activating your agent
After you have the activation key, you can finish creating your DataSync agent.
To activate your agent
1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
2. In the left navigation pane, choose Agents, and then choose Create agent.
3. For Hypervisor, choose Microsoft Hyper-V.
4. For Endpoint type, choose the same type of service endpoint that you specified when you got your agent's activation key (for example, choose Public service endpoints in Region name).
5. Configure your network to work with the service endpoint type that your agent is using.
6. For Activation key, choose Manually enter your agent's activation key, and then enter the activation key that you got from the agent's local console.
7. Choose Create agent.
Your agent is ready to connect with your Azure Blob Storage. For more information, see Creating your Azure Blob Storage transfer location.
You can deploy your DataSync agent on an Amazon EC2 instance.
To create an Amazon EC2 agent
1. Choose a service endpoint that the agent uses to communicate with AWS. In this situation, we recommend using a virtual private cloud (VPC) service endpoint.
2. Configure your network to work with VPC service endpoints.
Creating your Azure Blob Storage transfer location
You can configure DataSync to use your Azure Blob Storage as a transfer source or destination.
Before you begin
Make sure that you know how DataSync accesses Azure Blob Storage and works with access tiers and blob types. You also need a DataSync agent that can connect to your Azure Blob Storage container.
1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
2. In the left navigation pane, expand Data transfer, then choose Locations and Create location.
3. For Location type, choose Microsoft Azure Blob Storage.
4. For Agents, choose the DataSync agent that can connect with your Azure Blob Storage container.
   You can choose more than one agent. For more information, see Using multiple DataSync agents.
5. For Container URL, enter the URL of the container that's involved in your transfer.
6. (Optional) For Access tier when used as a destination, choose the access tier that you want your objects or files transferred into.
7. For Folder, enter path segments if you want to limit your transfer to a virtual directory in your container (for example, /my/images).
8. For SAS token, enter the SAS token that allows DataSync to access your blob storage.
   The token is part of the SAS URI string that comes after the storage resource URI and a question mark (?). A token looks something like this:
   sp=r&st=2023-12-20T14:54:52Z&se=2023-12-20T22:54:52Z&spr=https&sv=2021-06-08&sr=c&sig=aBBKDWQvyuVcTPH9EBp%2FXTI9E%2F%2Fmq171%2BZU178wcwqU%3D
9. (Optional) Enter values for the Key and Value fields to tag the location.
   Tags help you manage, filter, and search for your AWS resources. We recommend creating at least a name tag for your location.
10. Choose Create location.
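If you start from a full SAS URI, everything after the first question mark is the token that the SAS token field (or the CLI's Token option) expects. As a hypothetical illustration (the account and container names below are made up):

```python
def split_sas_uri(sas_uri):
    """Split a full SAS URI into (resource URI, SAS token).

    The token is everything after the first '?', which is the value
    to paste into the SAS token field.
    """
    resource, _, token = sas_uri.partition("?")
    return resource, token

uri = ("https://myaccount.blob.core.windows.net/my-container"
       "?sp=r&st=2023-12-20T14:54:52Z&se=2023-12-20T22:54:52Z&sr=c&sig=abc123")
resource, token = split_sas_uri(uri)
print(resource)  # https://myaccount.blob.core.windows.net/my-container
print(token)     # sp=r&st=2023-12-20T14:54:52Z&se=2023-12-20T22:54:52Z&sr=c&sig=abc123
```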
1. Copy the following create-location-azure-blob command:
   aws datasync create-location-azure-blob \
     --container-url "https://path/to/container" \
     --authentication-type "SAS" \
     --sas-configuration '{
       "Token": "your-sas-token"
     }' \
     --agent-arns my-datasync-agent-arn \
     --subdirectory "/path/to/my/data" \
     --access-tier "access-tier-for-destination" \
     --tags [{"Key": "key1","Value": "value1"}]
2. For the --container-url parameter, specify the URL of the Azure Blob Storage container that's involved in your transfer.
3. For the --authentication-type parameter, specify SAS.
4. For the --sas-configuration parameter's Token option, specify the SAS token that allows DataSync to access your blob storage.
   The token is part of the SAS URI string that comes after the storage resource URI and a question mark (?). A token looks something like this:
   sp=r&st=2023-12-20T14:54:52Z&se=2023-12-20T22:54:52Z&spr=https&sv=2021-06-08&sr=c&sig=aBBKDWQvyuVcTPH9EBp%2FXTI9E%2F%2Fmq171%2BZU178wcwqU%3D
5. For the --agent-arns parameter, specify the Amazon Resource Name (ARN) of the DataSync agent that can connect to your container. Here's an example agent ARN:
   arn:aws:datasync:us-east-1:123456789012:agent/agent-01234567890aaabfb
   You can specify more than one agent. For more information, see Using multiple DataSync agents.
6. For the --subdirectory parameter, specify path segments if you want to limit your transfer to a virtual directory in your container (for example, /my/images).
7. (Optional) For the --access-tier parameter, specify the access tier (HOT, COOL, or ARCHIVE) that you want your objects or files transferred into.
   This parameter applies only when you're using this location as a transfer destination.
8. (Optional) For the --tags parameter, specify key-value pairs that can help you manage, filter, and search for your location. We recommend creating a name tag for your location.
9. Run the create-location-azure-blob command.
   If the command is successful, you get a response that shows you the ARN of the location that you created. For example:
   {
       "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-12345678abcdefgh"
   }
Viewing your Azure Blob Storage transfer location
You can get details about the existing DataSync transfer location for your Azure Blob Storage.
1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
2. In the left navigation pane, expand Data transfer, then choose Locations.
3. Choose your Azure Blob Storage location.
   You can see details about your location, including any DataSync transfer tasks that are using it.
1. Copy the following describe-location-azure-blob command:
   aws datasync describe-location-azure-blob \
     --location-arn "your-azure-blob-location-arn"
2. For the --location-arn parameter, specify the ARN for the Azure Blob Storage location that you created (for example, arn:aws:datasync:us-east-1:123456789012:location/loc-12345678abcdefgh).
3. Run the describe-location-azure-blob command.
   You get a response that shows you details about your location. For example:
   {
       "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-12345678abcdefgh",
       "LocationUri": "azure-blob://my-user.blob.core.windows.net/container-1",
       "AuthenticationType": "SAS",
       "Subdirectory": "/my/images",
       "AgentArns": ["arn:aws:datasync:us-east-1:123456789012:agent/agent-01234567890deadfb"]
   }
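Because the LocationUri uses an azure-blob:// scheme, you can recover the storage account host and container name from a describe response with an ordinary URL parser. A small sketch, using a response shaped like the example output as hypothetical input:

```python
import json
from urllib.parse import urlparse

# Hypothetical response text shaped like describe-location-azure-blob output.
response_text = """
{
  "LocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-12345678abcdefgh",
  "LocationUri": "azure-blob://my-user.blob.core.windows.net/container-1",
  "AuthenticationType": "SAS",
  "Subdirectory": "/my/images"
}
"""

location = json.loads(response_text)
uri = urlparse(location["LocationUri"])   # scheme is azure-blob
print(uri.hostname)                       # my-user.blob.core.windows.net
print(uri.path.lstrip("/"))               # container-1
```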
Updating your Azure Blob Storage transfer location
If needed, you can modify your location's configuration in the console or by using the AWS CLI.
1. Copy the following update-location-azure-blob command:
   aws datasync update-location-azure-blob \
     --location-arn "your-azure-blob-location-arn" \
     --authentication-type "SAS" \
     --sas-configuration '{
       "Token": "your-sas-token"
     }' \
     --agent-arns my-datasync-agent-arn \
     --subdirectory "/path/to/my/data" \
     --access-tier "access-tier-for-destination"
2. For the --location-arn parameter, specify the ARN for the Azure Blob Storage location that you're updating (for example, arn:aws:datasync:us-east-1:123456789012:location/loc-12345678abcdefgh).
3. For the --authentication-type parameter, specify SAS.
4. For the --sas-configuration parameter's Token option, specify the SAS token that allows DataSync to access your blob storage.
   The token is part of the SAS URI string that comes after the storage resource URI and a question mark (?). A token looks something like this:
   sp=r&st=2022-12-20T14:54:52Z&se=2022-12-20T22:54:52Z&spr=https&sv=2021-06-08&sr=c&sig=qCBKDWQvyuVcTPH9EBp%2FXTI9E%2F%2Fmq171%2BZU178wcwqU%3D
5. For the --agent-arns parameter, specify the Amazon Resource Name (ARN) of the DataSync agent that you want to connect to your container. Here's an example agent ARN:
   arn:aws:datasync:us-east-1:123456789012:agent/agent-01234567890aaabfb
   You can specify more than one agent. For more information, see Using multiple DataSync agents.
6. For the --subdirectory parameter, specify path segments if you want to limit your transfer to a virtual directory in your container (for example, /my/images).
7. (Optional) For the --access-tier parameter, specify the access tier (HOT, COOL, or ARCHIVE) that you want your objects to be transferred into.
   This parameter applies only when you're using this location as a transfer destination.
Next steps
After you finish creating a DataSync location for your Azure Blob Storage, you can continue setting up your transfer. Here are some next steps to consider:
- If you haven't already, create another location where you plan to transfer your data to or from your Azure Blob Storage.
- Learn how DataSync handles metadata and special files, particularly if your transfer locations don't have a similar metadata structure.
- Configure how your data gets transferred. For example, you can transfer only a subset of your data or delete files in your blob storage that aren't in your source location (as long as your SAS token has delete permissions).