Replicating existing objects with S3 Batch Replication

By using S3 Batch Replication, you can replicate the following types of objects:

  • Objects that existed before a replication configuration was in place

  • Objects that have previously been replicated

  • Objects that have failed replication

You can replicate these objects on demand by using a Batch Operations job. S3 Batch Replication differs from live replication, which continuously and automatically replicates new objects across Amazon S3 buckets.

To get started with Batch Replication, you can:

  • Initiate Batch Replication for a new replication rule or destination – You can create a one-time Batch Replication job when you're creating the first rule in a new replication configuration or when you're adding a new destination to an existing configuration through the Amazon S3 console.

  • Initiate Batch Replication for an existing replication configuration – You can create a new Batch Replication job by using S3 Batch Operations through the Amazon S3 console, the AWS Command Line Interface (AWS CLI), AWS SDKs, or the Amazon S3 REST API.

When the Batch Replication job finishes, you receive a completion report. For more information about how to use the report to examine the job, see Tracking job status and completion reports.
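As a rough illustration of the SDK path, the following sketch assembles the arguments that you might pass to the S3 Control `CreateJob` operation (for example, through the boto3 `s3control` client). The helper name `build_replication_job_request`, the account ID, the bucket ARN, and the role name are placeholders invented for this example. The sketch only builds the request and doesn't call AWS.

```python
import uuid


def build_replication_job_request(account_id, source_bucket_arn, role_arn, priority=1):
    """Assemble keyword arguments for a CreateJob call (hypothetical helper)."""
    return {
        "AccountId": account_id,
        # S3ReplicateObject takes no parameters. The replication configuration
        # on the source bucket determines where the objects are replicated.
        "Operation": {"S3ReplicateObject": {}},
        # Ask Amazon S3 to generate the manifest from the replication configuration.
        "ManifestGenerator": {
            "S3JobManifestGenerator": {
                "ExpectedBucketOwner": account_id,
                "SourceBucket": source_bucket_arn,
                "EnableManifestOutput": False,
                "Filter": {"EligibleForReplication": True},
            }
        },
        "Report": {"Enabled": False},
        "Priority": priority,
        "RoleArn": role_arn,
        "ConfirmationRequired": False,
        "ClientRequestToken": str(uuid.uuid4()),
        "Description": "Batch Replication of existing objects (example)",
    }


request = build_replication_job_request(
    account_id="111122223333",  # placeholder account ID
    source_bucket_arn="arn:aws:s3:::amzn-s3-demo-source-bucket",  # placeholder
    role_arn="arn:aws:iam::111122223333:role/example-batch-replication-role",  # placeholder
)
# With boto3 installed and credentials configured, the call might look like:
#   boto3.client("s3control").create_job(**request)
```

Keeping the request construction separate from the API call makes it easier to review the job definition before you submit it.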

S3 Batch Replication considerations

  • Your source bucket must have an existing replication configuration. To enable replication, see Setting up live replication and Examples for configuring live replication.

  • If you have S3 Lifecycle configured for your bucket, we recommend disabling your lifecycle rules while the Batch Replication job is active. Doing so helps ensure parity between the source and destination buckets. Otherwise, these buckets could diverge, and the destination bucket won't be an exact replica of the source bucket. For example, consider the following scenario:

    • Your source bucket has multiple versions of an object and a delete marker on that object.

    • Your source and destination buckets have a lifecycle configuration to remove expired delete markers.

    In this scenario, Batch Replication might replicate the delete marker to the destination bucket before replicating the object versions. This behavior could result in your lifecycle configuration marking the delete marker as expired and the delete marker being removed from the destination bucket before the object versions are replicated.

  • The AWS Identity and Access Management (IAM) role that you specify to run the Batch Operations job must have the necessary permissions to perform the underlying Batch Replication operation. For more information about creating IAM roles, see Configuring IAM policies for Batch Replication.

  • Batch Replication requires a manifest, which can be generated by Amazon S3. The generated manifest must be stored in the same AWS Region as the source bucket. If you choose not to generate the manifest, you can supply an Amazon S3 Inventory report or CSV file that contains the objects that you want to replicate.

  • Batch Replication can't re-replicate objects whose replicas were deleted from the destination bucket by specifying the object's version ID. To re-replicate these objects, you can copy the source objects in place with a Batch Copy job. Copying those objects in place creates new versions of the objects in the source bucket and automatically initiates replication to the destination bucket. Deleting and recreating the destination bucket doesn't initiate replication.

    For more information about Batch Copy, see Examples that use Batch Operations to copy objects.

  • If you're using a replication rule on the S3 bucket, make sure to update your replication configuration by granting the IAM role that's attached to the replication rule the proper permissions to replicate objects. This IAM role must have the necessary permissions to perform replication on both the source and destination buckets.

  • If you submit multiple Batch Replication jobs for the same bucket within a short time frame, Amazon S3 will run those jobs concurrently.

  • If you submit multiple Batch Replication jobs for different buckets, be aware that Amazon S3 might not run all jobs concurrently. If you exceed the number of Batch Replication jobs that can run at one time on your account, Amazon S3 will pause the lower priority jobs to work on the higher priority ones. After the higher priority jobs have been completed, any paused jobs will become active again.

  • Batch Replication isn't supported for objects that are stored in the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes.

  • To batch replicate S3 Intelligent-Tiering objects that are stored in the Archive Access or Deep Archive Access storage tiers, you must first initiate a restore request and wait until the objects are moved to the Frequent Access tier.
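The recommendation above to disable lifecycle rules while a job is active can be sketched as follows. This minimal example assumes a lifecycle configuration shaped like the dictionary that the boto3 `get_bucket_lifecycle_configuration` call returns (a `Rules` list whose entries carry `ID` and `Status`). The helper name `disable_lifecycle_rules` and the sample rules are hypothetical.

```python
import copy


def disable_lifecycle_rules(lifecycle_config):
    """Return (disabled_config, previously_enabled_ids) without mutating the input."""
    disabled = {"Rules": copy.deepcopy(lifecycle_config.get("Rules", []))}
    previously_enabled = []
    for rule in disabled["Rules"]:
        if rule.get("Status") == "Enabled":
            # Remember which rules were active so they can be re-enabled later.
            previously_enabled.append(rule.get("ID"))
            rule["Status"] = "Disabled"
    return disabled, previously_enabled


# Sample configuration invented for this example.
current = {
    "Rules": [
        {"ID": "expire-delete-markers", "Status": "Enabled", "Filter": {"Prefix": ""},
         "Expiration": {"ExpiredObjectDeleteMarker": True}},
        {"ID": "old-rule", "Status": "Disabled", "Filter": {"Prefix": "tmp/"},
         "Expiration": {"Days": 30}},
    ]
}
paused, to_restore = disable_lifecycle_rules(current)
# The result could then be applied with, for example:
#   s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=paused)
```

Recording the IDs of the rules that were enabled lets you restore exactly those rules after the job finishes, rather than enabling every rule.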

Specifying a manifest for a Batch Replication job

A manifest is an Amazon S3 object that contains the object keys that you want Amazon S3 to act upon. To create a Batch Replication job, you must either supply a user-generated manifest or have Amazon S3 generate a manifest based on your replication configuration.

If you supply a user-generated manifest, it must be in the form of an Amazon S3 Inventory report or a CSV file. If the objects in your manifest are in a versioned bucket, you must specify the version IDs for the objects. Only the object with the version ID that's specified in the manifest will be replicated. To learn more about specifying a manifest, see Specifying a manifest.
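As an illustration of a user-generated CSV manifest for a versioned bucket, the following sketch writes one row per object version in the order bucket, key, version ID. It assumes that object keys are URL-encoded in the manifest, and the exact encoding used here (`urllib.parse.quote` with `/` left unescaped) is an assumption of this example. The helper name `build_csv_manifest`, the bucket name, and the version IDs are placeholders.

```python
import csv
import io
from urllib.parse import quote


def build_csv_manifest(entries):
    """Build manifest text from (bucket, key, version_id) tuples (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for bucket, key, version_id in entries:
        if not version_id:
            # Objects in a versioned bucket need an explicit version ID.
            raise ValueError(f"missing version ID for key {key!r}")
        # Assumption: keys are URL-encoded, with "/" kept as-is.
        writer.writerow([bucket, quote(key, safe="/"), version_id])
    return buffer.getvalue()


manifest_text = build_csv_manifest([
    ("amzn-s3-demo-source-bucket", "reports/q1 summary.csv", "EXAMPLEVERSIONID1"),
    ("amzn-s3-demo-source-bucket", "logs/app.log", "EXAMPLEVERSIONID2"),
])
```

Rejecting rows without a version ID catches an incomplete manifest before the job is created, because only the versions listed in the manifest are replicated.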

If you choose to have Amazon S3 generate a manifest file on your behalf, the manifest lists objects by using the same source bucket, prefixes, and tags as the replication configuration of the source bucket. With a generated manifest, Amazon S3 will replicate all eligible versions of your objects.

Note

If you choose to have Amazon S3 generate the manifest, the manifest must be stored in the same AWS Region as the source bucket.
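The following sketch shows one way to describe a generated manifest that is also saved to a bucket, using the `S3JobManifestGenerator` structure of the S3 Control `CreateJob` operation. The helper name `build_manifest_generator`, the ARNs, and the prefix are placeholders for this example. The code doesn't verify the Region requirement, so the manifest bucket that you name is assumed to be in the same AWS Region as the source bucket.

```python
def build_manifest_generator(account_id, source_bucket_arn, manifest_bucket_arn,
                             prefix="batch-replication-manifests"):
    """Describe a generated manifest that is saved to a bucket (hypothetical helper)."""
    return {
        "S3JobManifestGenerator": {
            "ExpectedBucketOwner": account_id,
            "SourceBucket": source_bucket_arn,
            # Save the generated manifest so that it can be inspected later.
            "EnableManifestOutput": True,
            "ManifestOutputLocation": {
                "ExpectedManifestBucketOwner": account_id,
                # Assumed to be in the same Region as the source bucket.
                "Bucket": manifest_bucket_arn,
                "ManifestPrefix": prefix,
                "ManifestFormat": "S3InventoryReport_CSV_20211130",
            },
            # Only list objects that the replication configuration covers.
            "Filter": {"EligibleForReplication": True},
        }
    }


generator = build_manifest_generator(
    account_id="111122223333",  # placeholder
    source_bucket_arn="arn:aws:s3:::amzn-s3-demo-source-bucket",  # placeholder
    manifest_bucket_arn="arn:aws:s3:::amzn-s3-demo-manifest-bucket",  # placeholder
)
```

The resulting dictionary is meant to be passed as the `ManifestGenerator` argument of `create_job`, in place of a `Manifest` argument.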

Filters for a Batch Replication job

When creating your Batch Replication job, you can optionally specify additional filters, such as the object creation date and replication status, to reduce the scope of the job.

You can filter objects to replicate based on the ObjectReplicationStatuses value, by providing one or more of the following values:

  • "NONE" – Indicates that Amazon S3 has never attempted to replicate the object before.

  • "FAILED" – Indicates that Amazon S3 has attempted, but failed, to replicate the object before.

  • "COMPLETED" – Indicates that Amazon S3 has successfully replicated the object before.

  • "REPLICA" – Indicates that this is a replica object that Amazon S3 has replicated from another source.

For more information about replication statuses, see Getting replication status information.

If you don't filter your Batch Replication job, Batch Operations will attempt to replicate all objects in your manifest that match the rules in your replication configuration, regardless of their replication status. The exceptions are certain objects that aren't replicated by default. For more information, see What isn't replicated with replication configurations?

Depending on your goal, you might set ObjectReplicationStatuses to one or more of the following values:

  • To replicate only existing objects that have never been replicated, include only "NONE".

  • To retry replicating only objects that previously failed to replicate, include only "FAILED".

  • To both replicate existing objects and retry replicating objects that previously failed to replicate, include both "NONE" and "FAILED".

  • To backfill a destination bucket with objects that have been replicated to another destination, include "COMPLETED".

  • To replicate objects that are themselves replicas (objects that Amazon S3 replicated into the source bucket from another source), include "REPLICA".
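The filter choices above can be sketched as a small helper that validates the requested statuses and builds the `Filter` portion of a generated manifest. The helper name `build_replication_filter` and the sample date are hypothetical. The keys `ObjectReplicationStatuses`, `CreatedAfter`, `CreatedBefore`, and `EligibleForReplication` follow the S3 Control job manifest generator filter.

```python
from datetime import datetime, timezone

VALID_STATUSES = ("NONE", "FAILED", "COMPLETED", "REPLICA")


def build_replication_filter(statuses=None, created_after=None, created_before=None):
    """Build a manifest generator filter for Batch Replication (hypothetical helper)."""
    job_filter = {"EligibleForReplication": True}
    if statuses:
        unknown = sorted(set(statuses) - set(VALID_STATUSES))
        if unknown:
            raise ValueError(f"unsupported replication statuses: {unknown}")
        # Keep the caller's order and drop duplicates.
        job_filter["ObjectReplicationStatuses"] = list(dict.fromkeys(statuses))
    if created_after is not None:
        job_filter["CreatedAfter"] = created_after
    if created_before is not None:
        job_filter["CreatedBefore"] = created_before
    return job_filter


# Replicate never-replicated objects and retry failures, created since an example date.
backfill_filter = build_replication_filter(
    statuses=["NONE", "FAILED"],
    created_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Omitting `statuses` leaves the status filter out entirely, which corresponds to the unfiltered behavior described earlier in this section.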

Batch Replication completion report

When you create a Batch Replication job, you can request a CSV completion report. This report shows objects, replication success or failure codes, outputs, and descriptions. For more information about job tracking and completion reports, see Completion reports.

For a list of replication failure codes and descriptions, see Amazon S3 replication failure reasons.

For information about troubleshooting Batch Replication, see Batch Replication errors.
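As a minimal sketch of requesting a completion report, the following helper builds the `Report` argument for the S3 Control `CreateJob` operation. The helper name `build_report_config`, the bucket ARN, and the prefix are placeholders chosen for this example.

```python
def build_report_config(report_bucket_arn, prefix="batch-replication-reports",
                        failed_only=False):
    """Build the Report argument for a Batch Operations job (hypothetical helper)."""
    return {
        "Enabled": True,
        "Bucket": report_bucket_arn,
        "Format": "Report_CSV_20180820",
        "Prefix": prefix,
        # Report on every task, or only on tasks that failed.
        "ReportScope": "FailedTasksOnly" if failed_only else "AllTasks",
    }


report = build_report_config(
    "arn:aws:s3:::amzn-s3-demo-report-bucket",  # placeholder
    failed_only=True,
)
```

Limiting the scope to failed tasks keeps the report small when you mainly want to look up failure codes for objects that weren't replicated.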

Getting started with Batch Replication

To learn more about how to use Batch Replication, see Tutorial: Replicating existing objects in your Amazon S3 buckets with S3 Batch Replication.