Understanding how DataSync handles file and object metadata

AWS DataSync can preserve your file or object metadata during a data transfer. How your metadata gets copied depends on your transfer locations and whether those locations use similar types of metadata.

System-level metadata

In general, DataSync doesn't copy system-level metadata. For example, when transferring from an SMB file server, the permissions you configured at the file system level aren't copied to the destination storage system.

There are exceptions. When transferring between Amazon S3 and other object storage, DataSync does copy some system-defined object metadata.

Metadata copied in Amazon S3 transfers

The following tables describe what metadata DataSync can copy when a transfer involves an Amazon S3 location.

To Amazon S3

When copying from one of these locations:

  • NFS

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

To this location:

  • Amazon S3

DataSync can copy the following as Amazon S3 user metadata:

  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID and group ID

  • POSIX permissions

The file metadata stored in Amazon S3 user metadata is interoperable with NFS shares on file gateways using AWS Storage Gateway. A file gateway enables low-latency access from on-premises networks to data that was copied to Amazon S3 by DataSync. This metadata is also interoperable with FSx for Lustre.

When DataSync copies objects that contain this metadata back to an NFS server, the file metadata is restored. Restoring metadata requires granting elevated permissions to the NFS server. For more information, see Configuring AWS DataSync transfers with an NFS file server.
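
As a rough illustration of how a client might read this metadata back from an object, the sketch below decodes a user-metadata dictionary, such as the `Metadata` field returned by boto3's `head_object`. The key names (`file-owner`, `file-group`, `file-permissions`, `file-mtime`) and the value formats (an octal mode string and a nanosecond timestamp with an `ns` suffix) are assumptions made for illustration and are not stated on this page. Inspect an object written by your own task before relying on them.

```python
import stat
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical key names; verify them against objects written by your own task.
OWNER_KEY = "file-owner"
GROUP_KEY = "file-group"
PERMISSIONS_KEY = "file-permissions"
MTIME_KEY = "file-mtime"


@dataclass
class PosixMetadata:
    uid: Optional[int]
    gid: Optional[int]
    mode: Optional[int]             # permission bits only, for example 0o644
    mtime_seconds: Optional[float]  # seconds since the epoch


def parse_posix_metadata(user_metadata: Mapping[str, str]) -> PosixMetadata:
    """Decode POSIX-style fields from an S3 user-metadata dictionary."""

    def to_int(key: str, base: int = 10) -> Optional[int]:
        value = user_metadata.get(key)
        return int(value, base) if value is not None else None

    # Assumed format: octal string that may include file-type bits.
    raw_mode = to_int(PERMISSIONS_KEY, base=8)
    mode = stat.S_IMODE(raw_mode) if raw_mode is not None else None

    # Assumed format: integer nanoseconds, optionally followed by "ns".
    raw_mtime = user_metadata.get(MTIME_KEY)
    mtime = None
    if raw_mtime is not None:
        digits = raw_mtime[:-2] if raw_mtime.endswith("ns") else raw_mtime
        mtime = int(digits) / 1e9

    return PosixMetadata(to_int(OWNER_KEY), to_int(GROUP_KEY), mode, mtime)
```

Missing keys decode to `None`, which mirrors the case where an object was not written by DataSync and carries no such metadata.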

Between Amazon S3 and other object storage

When copying between these locations:

  • Object storage

  • Amazon S3

DataSync can copy:

  • User-defined object metadata

  • Object tags

  • The following system-defined object metadata:

    • Content-Disposition

    • Content-Encoding

    • Content-Language

    • Content-Type

    Note: DataSync copies system-level metadata for all objects during an initial transfer. If you configure your task to transfer only data that has changed, DataSync won't copy system metadata in subsequent transfers unless an object's content or user metadata has also been modified.

DataSync doesn't copy other object metadata, such as object access control lists (ACLs), prior object versions, or the Last-Modified key.

DataSync can copy the same metadata when copying between these locations:

  • Microsoft Azure Blob Storage

  • Amazon S3
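
To spot-check an object-to-object transfer, it can help to reduce a source object's headers to only the fields this section says DataSync copies. The sketch below does that for a dictionary shaped like the response of boto3's `head_object`, which uses the keys `ContentDisposition`, `ContentEncoding`, `ContentLanguage`, `ContentType`, and `Metadata`. The function name is hypothetical, and object tags are not part of a `head_object` response, so they would need a separate lookup.

```python
from typing import Any, Dict, Mapping

# head_object response keys for the four system-defined headers listed above.
COPIED_SYSTEM_FIELDS = (
    "ContentDisposition",
    "ContentEncoding",
    "ContentLanguage",
    "ContentType",
)


def metadata_expected_on_destination(head_response: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only user metadata and the system-defined fields that get copied."""
    expected: Dict[str, Any] = {"Metadata": dict(head_response.get("Metadata", {}))}
    for field in COPIED_SYSTEM_FIELDS:
        if field in head_response:
            expected[field] = head_response[field]
    # Fields such as LastModified are intentionally dropped: they are not copied.
    return expected
```

Comparing the result for a source object with the same reduction of the destination object gives a quick consistency check without flagging fields that are expected to differ.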

Between Amazon S3 and HDFS

When copying between these locations:

  • Hadoop Distributed File System (HDFS)

  • Amazon S3

DataSync can copy the following as Amazon S3 user metadata:

  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID and group ID

  • POSIX permissions

HDFS uses strings to store file and folder user and group ownership, rather than numeric identifiers, such as UIDs and GIDs.

Metadata copied in NFS transfers

The following table describes what metadata DataSync can copy between locations that use Network File System (NFS).

When copying between these locations:

  • NFS

  • Amazon EFS

  • Amazon FSx for Lustre

  • Amazon FSx for OpenZFS

  • Amazon FSx for NetApp ONTAP (using NFS)

DataSync can copy:

  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID (UID) and group ID (GID)

  • POSIX permissions
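
The fields in this list correspond to values that a POSIX `stat` call reports. As a minimal sketch for spot-checking a file on a mounted source or destination share, the hypothetical helper below collects them with Python's standard `os.stat`. It is an illustration only and is not part of DataSync.

```python
import os
import stat
from typing import Any, Dict


def posix_metadata(path: str) -> Dict[str, Any]:
    """Return the POSIX metadata fields listed above for one file or folder."""
    st = os.stat(path)
    return {
        "mtime": st.st_mtime,                            # modification timestamp
        "atime": st.st_atime,                            # access timestamp
        "uid": st.st_uid,                                # user ID
        "gid": st.st_gid,                                # group ID
        "permissions": oct(stat.S_IMODE(st.st_mode)),    # for example "0o644"
    }
```

Running the helper against the same relative path on both sides of a transfer shows which fields match. Because access timestamps are copied on a best-effort basis, a difference in `atime` alone is not necessarily a problem.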

Metadata copied in SMB transfers

The following table describes what metadata DataSync can copy between locations that use Server Message Block (SMB).

When copying between these locations:

  • SMB

  • Amazon FSx for Windows File Server

  • FSx for ONTAP (using SMB)

DataSync can copy:

  • File timestamps: access time, modification time, and creation time

  • File owner security identifier (SID)

  • Standard file attributes: read-only (R), archive (A), system (S), hidden (H), compressed (C), not content indexed (I), encrypted (E), temporary (T), offline (O), and sparse (P)

    DataSync attempts to copy the archive (A), compressed (C), not content indexed (I), sparse (P), and temporary (T) attributes on a best-effort basis. If these attributes aren't applied on the destination, they're ignored during task verification.

  • NTFS discretionary access control lists (DACLs), which determine whether to grant access to an object.

  • NTFS system access control lists (SACLs), which are used by administrators to log attempts to access a secured object.

    Note: SACLs are not copied if you use SMB version 1.0.

    Copying DACLs and SACLs requires granting specific permissions to the Windows user that DataSync uses to access your location using SMB. For more information, see creating a location for SMB, FSx for Windows File Server, or FSx for ONTAP (depending on the type of location in your transfer).
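
The best-effort rule for file attributes can be pictured as a comparison that ignores the A, C, I, P, and T letters. The sketch below is only an illustration of that rule under the assumption that attributes are represented as strings of the letters used above. It is not DataSync's verification logic, and the function name is hypothetical.

```python
from typing import Set

# Attributes that are copied on a best-effort basis and ignored when they differ.
BEST_EFFORT_ATTRIBUTES = frozenset("ACIPT")


def attribute_mismatches(source: str, destination: str) -> Set[str]:
    """Return attribute letters that differ, excluding best-effort attributes."""
    differing = set(source) ^ set(destination)
    return differing - BEST_EFFORT_ATTRIBUTES
```

For example, a source file with `RAH` and a destination file with `RH` would produce no mismatches, because only the archive attribute differs.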

Metadata copied in other transfer scenarios

DataSync handles metadata in the following ways when copying between these storage systems (most of which have different metadata structures).

When copying from one of these locations:

  • SMB

  • FSx for Windows File Server

  • FSx for ONTAP (using SMB)

To one of these locations:

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • Amazon S3

  • Object storage

  • Azure Blob Storage

  • NFS

DataSync can copy:

Default POSIX metadata for all files and folders on the destination file system or objects in the destination S3 bucket. This approach includes using the default POSIX user ID and group ID values.

Windows-based metadata (such as ACLs) is not preserved.

When copying from one of these locations:

  • Object storage

  • Amazon S3

  • Azure Blob Storage

To one of these locations:

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

DataSync can copy:

Default POSIX metadata on the destination files and folders. This approach includes using the default POSIX user ID and group ID values.

When copying from one of these locations:

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

To this location:

  • Azure Blob Storage

DataSync can copy the following as user-defined metadata:

  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID and group ID

  • POSIX permissions

When copying from this location:

  • HDFS

To one of these locations:

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

DataSync can copy:

  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • POSIX permissions

HDFS stores file and folder user and group ownership as strings rather than numeric identifiers (such as UIDs and GIDs). Default values for UIDs and GIDs are applied on the destination file system. For more information, see Understanding when and how DataSync applies default POSIX metadata.

When copying from one of these locations:

  • Amazon S3

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for Windows File Server

  • FSx for ONTAP

To this location:

  • HDFS

DataSync can copy:

File and folder timestamps from the source location. The file or folder owner is set based on the HDFS user or Kerberos principal you specified when creating the HDFS transfer location. The Groups Mapping configuration on the Hadoop cluster determines the group.

When copying from one of these locations:

  • Amazon S3

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • Object storage

  • NFS

  • HDFS

To one of these locations:

  • SMB

  • FSx for Windows File Server

  • FSx for ONTAP (using SMB)

Or when copying from this location:

  • Azure Blob Storage

To one of these locations:

  • FSx for Windows File Server

  • FSx for ONTAP (using SMB)

DataSync can copy:

File and folder timestamps from the source location. Ownership is set based on the Windows user that was specified in DataSync to access the Amazon FSx or SMB share. Permissions are inherited from the parent directory.

Understanding when and how DataSync applies default POSIX metadata

DataSync applies default POSIX metadata in the following situations:

  • When your transfer's source and destination locations don't have similar metadata structures

  • When metadata is missing from the source location

The following table describes how DataSync applies default POSIX metadata during these types of transfers:

Source:

  • Amazon S3 (1)

  • Object storage (1)

  • Microsoft Azure Blob Storage (1)

Destination:

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • NFS

File permissions: 0755

Folder permissions: 0755

UID: 65534

GID: 65534

Source:

  • SMB

Destination:

  • Amazon S3

  • Object storage

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • NFS

File permissions: 0644

Folder permissions: 0755

UID: 65534

GID: 65534

Source:

  • HDFS

Destination:

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • NFS

File permissions: 0644

Folder permissions: 0755

UID: 65534

GID: 65534

1 In cases where the objects don't have metadata that was previously applied by DataSync.
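
The defaults in this section can be summarized as a small lookup, which may be handy when writing post-transfer checks. The sketch below assumes the table rows group sources and destinations in the order they are listed, and it uses hypothetical names throughout. It does not model note 1, so it reports the defaults for object sources whether or not the objects carry metadata previously applied by DataSync.

```python
from typing import NamedTuple, Optional


class PosixDefaults(NamedTuple):
    file_mode: int
    folder_mode: int
    uid: int
    gid: int


# Location names as they appear in the table above.
NFS_LIKE = frozenset({
    "Amazon EFS",
    "FSx for Lustre",
    "FSx for OpenZFS",
    "FSx for ONTAP (using NFS)",
    "NFS",
})
OBJECT_SOURCES = frozenset({
    "Amazon S3",
    "Object storage",
    "Microsoft Azure Blob Storage",
})
SMB_DESTINATIONS = NFS_LIKE | {"Amazon S3", "Object storage"}


def default_posix_metadata(source: str, destination: str) -> Optional[PosixDefaults]:
    """Return the default POSIX metadata for a transfer, or None if not listed."""
    if source in OBJECT_SOURCES and destination in NFS_LIKE:
        return PosixDefaults(0o755, 0o755, 65534, 65534)
    if source == "SMB" and destination in SMB_DESTINATIONS:
        return PosixDefaults(0o644, 0o755, 65534, 65534)
    if source == "HDFS" and destination in NFS_LIKE:
        return PosixDefaults(0o644, 0o755, 65534, 65534)
    return None
```

A return value of `None` means the pair is not covered by the defaults table, for example a transfer between two NFS locations where metadata is copied directly.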