

# What is Amazon MSK Replicator?
<a name="msk-replicator"></a>

Amazon MSK Replicator is an Amazon MSK feature that enables you to reliably replicate data across Amazon MSK clusters in the same or in different AWS Regions. However, both the source and target clusters must be in the same AWS account. With MSK Replicator, you can easily build regionally resilient streaming applications for increased availability and business continuity. MSK Replicator provides automatic asynchronous replication across MSK clusters, eliminating the need to write custom code, manage infrastructure, or set up cross-region networking.

MSK Replicator automatically scales the underlying resources so that you can replicate data on-demand without having to monitor or scale capacity. MSK Replicator also replicates the necessary Kafka metadata, including topic configurations, Access Control Lists (ACLs), and consumer group offsets. If an unexpected event occurs in a Region, you can fail over to the other AWS Region and seamlessly resume processing.

MSK Replicator supports both cross-region replication (CRR) and same-region replication (SRR). In cross-region replication, the source and target MSK clusters are in different AWS Regions. In same-region replication, both the source and target MSK clusters are in the same AWS Region. You need to create source and target MSK clusters before using them with MSK Replicator.

**Note**  
MSK Replicator supports the following AWS Regions: US East (N. Virginia) (us-east-1); US East (Ohio) (us-east-2); US West (Oregon) (us-west-2); Europe (Ireland) (eu-west-1); Europe (Frankfurt) (eu-central-1); Asia Pacific (Singapore) (ap-southeast-1); Asia Pacific (Sydney) (ap-southeast-2); Europe (Stockholm) (eu-north-1); Asia Pacific (Mumbai) (ap-south-1); Europe (Paris) (eu-west-3); South America (São Paulo) (sa-east-1); Asia Pacific (Seoul) (ap-northeast-2); Europe (London) (eu-west-2); Asia Pacific (Tokyo) (ap-northeast-1); US West (N. California) (us-west-1); Canada (Central) (ca-central-1); China (Beijing) (cn-north-1); China (Ningxia) (cn-northwest-1); Asia Pacific (Hyderabad) (ap-south-2); Asia Pacific (Malaysia) (ap-southeast-5); Asia Pacific (Thailand) (ap-southeast-7); Mexico (Central) (mx-central-1); Asia Pacific (Taipei) (ap-east-2); Canada West (Calgary) (ca-west-1); Europe (Spain) (eu-south-2); Middle East (Bahrain) (me-south-1); Asia Pacific (Jakarta) (ap-southeast-3); Africa (Cape Town) (af-south-1); Middle East (UAE) (me-central-1); Asia Pacific (Hong Kong) (ap-east-1); Asia Pacific (Osaka) (ap-northeast-3); Asia Pacific (Melbourne) (ap-southeast-4); Europe (Milan) (eu-south-1); Israel (Tel Aviv) (il-central-1); Europe (Zurich) (eu-central-2); and Asia Pacific (New Zealand) (ap-southeast-6).

Here are some common uses for Amazon MSK Replicator.
+ Build multi-region streaming applications: Build highly available and fault-tolerant streaming applications for increased resiliency without setting up custom solutions.
+ Lower latency data access: Provide lower latency data access to consumers in different geographic regions.
+ Distribute data to your partners: Copy data from one Apache Kafka cluster to many Apache Kafka clusters, so that different teams/partners have their own copies of data.
+ Aggregate data for analytics: Copy data from multiple Apache Kafka clusters into one cluster for easily generating insights on aggregated real-time data.
+ Write locally, access your data globally: Set up multi-active replication to automatically propagate writes performed in one AWS Region to other Regions for providing data at lower latency and cost.

# How Amazon MSK Replicator works
<a name="msk-replicator-how-it-works"></a>

To get started with MSK Replicator, you need to create a new Replicator in your target cluster’s AWS Region. MSK Replicator automatically copies all data from the cluster in the primary AWS Region, called the *source*, to the cluster in the destination Region, called the *target*. Source and target clusters can be in the same or different AWS Regions. You will need to create the target cluster if it does not already exist.

When you create a Replicator, MSK Replicator deploys all required resources in the target cluster’s AWS Region to optimize for data replication latency. Replication latency varies based on many factors, including the network distance between the AWS Regions of your MSK clusters, the throughput capacity of your source and target clusters, and the number of partitions on your source and target clusters. MSK Replicator automatically scales the underlying resources so that you can replicate data on-demand without having to monitor or scale capacity.

## Data replication
<a name="msk-replicator-data-replication"></a>

By default, MSK Replicator copies all data asynchronously from the latest offset in the source cluster topic partitions to the target cluster. If the "Detect and copy new topics" setting is turned on, MSK Replicator automatically detects and copies new topics or topic partitions to the target cluster. However, it may take up to 30 seconds for the Replicator to detect and create the new topics or topic partitions on the target cluster. Any messages produced to the source topic before the topic has been created on the target cluster will not be replicated. Alternatively, you can [configure your Replicator during creation](msk-replicator-prepare-cluster.md) to start replication from the earliest offset in the source cluster topic partitions if you want to replicate existing messages on your topics to the target cluster.
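For example, the starting position is set in the topic replication configuration when you create the Replicator. The following fragment is a sketch of the relevant part of a `CreateReplicator` configuration; the field names follow the MSK API, but verify them against the current API reference before use:

```json
{
  "TopicReplication": {
    "TopicsToReplicate": [".*"],
    "StartingPosition": { "Type": "EARLIEST" },
    "DetectAndCopyNewTopics": true
  }
}
```

Setting `"Type": "LATEST"` (the default) replicates only messages produced after the Replicator is created.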

MSK Replicator does not store your data. Data is consumed from your source cluster, buffered in memory, and written to the target cluster. The buffer is cleared automatically when the data is either successfully written or fails after retries. All communication and data between MSK Replicator and your clusters is always encrypted in transit. All MSK Replicator API calls, such as `DescribeClusterV2`, `CreateTopic`, and `DescribeTopicDynamicConfiguration`, are captured in AWS CloudTrail. Your MSK broker logs also reflect these calls.

MSK Replicator creates topics in the target cluster with a replication factor of 3. If you need to, you can modify the replication factor directly on the target cluster.

## Metadata replication
<a name="msk-replicator-metadata-replication"></a>

MSK Replicator also supports copying metadata from the source cluster to the target cluster. The metadata includes topic configurations, Access Control Lists (ACLs), and consumer group offsets. Like data replication, metadata replication happens asynchronously. For better performance, MSK Replicator prioritizes data replication over metadata replication.

The following table is a list of Access Control Lists (ACLs) that MSK Replicator copies.


| Operation | Resource | APIs allowed | 
| --- | --- | --- | 
|  Alter  |  Topic  |  CreatePartitions  | 
|  AlterConfigs  |  Topic  |  AlterConfigs  | 
|  Create  |  Topic  |  CreateTopics, Metadata  | 
|  Delete  |  Topic  |  DeleteRecords, DeleteTopics  | 
|  Describe  |  Topic  |  ListOffsets, Metadata, OffsetFetch, OffsetForLeaderEpoch  | 
|  DescribeConfigs  |  Topic  |  DescribeConfigs  | 
|  Read  |  Topic  |  Fetch, OffsetCommit, TxnOffsetCommit  | 
|  Write (deny only)  |  Topic  |  Produce, AddPartitionsToTxn  | 

MSK Replicator copies LITERAL pattern type ACLs only for the Topic resource type. PREFIXED pattern type ACLs and ACLs for other resource types are not copied. MSK Replicator also does not delete ACLs on the target cluster. If you delete an ACL on the source cluster, you should also delete it on the target cluster at the same time. For more details on Kafka ACL resources, patterns, and operations, see [Authorization using ACLs](https://kafka.apache.org/documentation/#security_authz_cli) in the Apache Kafka documentation.
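The copy rules above (LITERAL pattern type, Topic resource type only) can be illustrated with a small sketch. The `Acl` shape below is hypothetical, for illustration only; it is not an MSK or Kafka API type:

```python
from dataclasses import dataclass

@dataclass
class Acl:
    resource_type: str   # e.g. "TOPIC", "GROUP", "CLUSTER"
    pattern_type: str    # "LITERAL" or "PREFIXED"
    principal: str
    operation: str

def acls_to_copy(acls):
    """Keep only ACLs that MSK Replicator would replicate:
    LITERAL-pattern ACLs on Topic resources."""
    return [a for a in acls
            if a.resource_type == "TOPIC" and a.pattern_type == "LITERAL"]

acls = [
    Acl("TOPIC", "LITERAL", "User:app1", "READ"),    # copied
    Acl("TOPIC", "PREFIXED", "User:app1", "READ"),   # skipped: PREFIXED pattern
    Acl("GROUP", "LITERAL", "User:app1", "READ"),    # skipped: not a Topic resource
]
print(len(acls_to_copy(acls)))  # 1
```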

MSK Replicator replicates only Kafka ACLs, which are not used by IAM access control. If your clients use IAM access control to read from or write to your MSK clusters, you need to configure the relevant IAM policies on your target cluster as well for seamless failover. This applies to both Prefixed and Identical topic name replication configurations.

As part of consumer group offset syncing, MSK Replicator optimizes for consumers on the source cluster that are reading from a position closer to the tip of the stream (the end of the topic partition). If your consumer groups are lagging on the source cluster, you may see higher lag for those consumer groups on the target than on the source. This means that after failover to the target cluster, your consumers will reprocess more duplicate messages. To reduce this lag, your consumers on the source cluster need to catch up and start consuming from the tip of the stream. As your consumers catch up, MSK Replicator automatically reduces the lag.

![\[MSK Replicator source and target clusters\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/msk-replicator-diagram.png)


## Topic name configuration
<a name="msk-replicator-topic-name-configuration"></a>

MSK Replicator has two topic name configuration modes: *Prefixed* (default) or *Identical* topic name replication.

**Prefixed topic name replication**  
By default, MSK Replicator creates new topics in the target cluster with an auto-generated prefix added to the source cluster topic name, such as `<sourceKafkaClusterAlias>.topic`. This is to distinguish the replicated topics from others in the target cluster and to avoid circular replication of data between the clusters.

For example, MSK Replicator replicates data in a topic named `topic` from the source cluster to a new topic in the target cluster called `<sourceKafkaClusterAlias>.topic`. You can find the prefix that will be added to the topic names in the target cluster under the **sourceKafkaClusterAlias** field using the `DescribeReplicator` API or the **Replicator details** page on the MSK console. The prefix in the target cluster is `<sourceKafkaClusterAlias>`.

To make sure your consumers can reliably restart processing from the standby cluster, configure your consumers to read data from the topics using the wildcard operator `.*`. For example, your consumers would need to consume using `.*topic1` in both AWS Regions. This pattern would also match a topic such as `footopic1`, so adjust the wildcard operator according to your needs.
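The effect of the wildcard subscription can be checked with a quick regex sketch; the topic and cluster-alias names below are made up:

```python
import re

# Consumers subscribe with a regex so they match both the original topic
# name and the prefixed replicated copy in the standby cluster.
pattern = re.compile(r".*topic1")

topics = ["topic1", "boringcluster123.topic1", "footopic1", "topic2"]
matches = [t for t in topics if pattern.fullmatch(t)]
print(matches)  # ['topic1', 'boringcluster123.topic1', 'footopic1']
```

Note that `footopic1` matches too. If that is unwanted, a tighter pattern such as `([^.]+\.)?topic1` matches only the bare topic name or the alias-prefixed copy.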

Use Prefixed topic name replication when you want to keep replicated data in separate topics in the target cluster, such as for active-active cluster setups.

**Identical topic name replication**  
As an alternative to the default setting, Amazon MSK Replicator allows you to create a Replicator with topic replication set to Identical topic name replication (**Keep the same topics name** in console). You can create a new Replicator in the AWS Region which has your target MSK cluster. Identically-named replicated topics let you avoid reconfiguring clients to read from replicated topics.

Identical topic name replication (**Keep the same topics name** in console) has the following advantages:
+ Allows you to retain identical topic names during the replication process, while also automatically avoiding the risk of infinite replication loops.
+ Makes setting up and operating multi-cluster streaming architectures simpler, since you can avoid reconfiguring clients to read from the replicated topics. 
+ For active-passive cluster architectures, Identical topic name replication functionality also streamlines the failover process, allowing applications to seamlessly failover to a standby cluster without requiring any topic name changes or client reconfigurations.
+ Can be used to more easily consolidate data from multiple MSK clusters into a single cluster for data aggregation or centralized analytics. This requires you to create separate Replicators for each source cluster and the same target cluster.
+ Can streamline data migration from one MSK cluster to another by replicating data to identically named topics in the target cluster.

Amazon MSK Replicator uses Kafka headers to automatically avoid data being replicated back to the topic it originated from, eliminating the risk of infinite cycles during replication. A header is a key-value pair that can be included with the key, value, and timestamp in each Kafka message. MSK Replicator embeds identifiers for source cluster and topic into the header of each record being replicated. MSK Replicator uses the header information to avoid infinite replication loops. You should verify that your clients are able to read replicated data as expected.
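A rough sketch of the loop-avoidance idea follows. The header key and check below are illustrative only; MSK Replicator's actual header layout is internal, apart from the documented `__mskmr` header prefix:

```python
# Illustrative sketch: skip records whose replication header shows they
# originated from the cluster we are about to write to.
def should_replicate(record_headers, destination_cluster_id):
    """Return False when a replication header shows the record originated
    from destination_cluster_id (copying it back would create a loop)."""
    for key, value in record_headers:
        if key.startswith("__mskmr") and value == destination_cluster_id:
            return False
    return True

headers = [("__mskmr.origin", "cluster-A")]  # hypothetical header key
print(should_replicate(headers, "cluster-A"))  # False: would loop back
print(should_replicate(headers, "cluster-B"))  # True: different cluster
```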

# Tutorial: Set up source and target clusters for Amazon MSK Replicator
<a name="msk-replicator-getting-started"></a>

This tutorial shows you how to set up a source cluster and a target cluster in the same AWS Region or in different AWS Regions. You then use those clusters to create an Amazon MSK Replicator.

# Prepare the Amazon MSK source cluster
<a name="msk-replicator-prepare-cluster"></a>

If you already have an MSK source cluster created for the MSK Replicator, make sure that it meets the requirements described in this section. Otherwise, follow these steps to create an MSK provisioned or serverless source cluster.

The processes for creating cross-region and same-region MSK Replicator source clusters are similar. Differences are called out in the following procedures.

1. Create an MSK provisioned or serverless cluster with [IAM access control turned on](create-iam-access-control-cluster-in-console.md) in the source region. Your source cluster must have a minimum of three brokers.

1. For a cross-region MSK Replicator, if the source is a provisioned cluster, configure it with multi-VPC private connectivity turned on for the IAM access control scheme. Note that the unauthenticated auth type is not supported when multi-VPC is turned on. You do not need to turn on multi-VPC private connectivity for other authentication schemes (mTLS or SASL/SCRAM). You can simultaneously use mTLS or SASL/SCRAM auth schemes for other clients connecting to your MSK cluster. You can configure multi-VPC private connectivity in the console cluster details **Network settings** or with the `UpdateConnectivity` API. See [Cluster owner turns on multi-VPC](mvpc-cluster-owner-action-turn-on.md). If your source cluster is an MSK Serverless cluster, you do not need to turn on multi-VPC private connectivity.

   For a same-region MSK Replicator, the MSK source cluster does not require multi-VPC private connectivity and the cluster can still be accessed by other clients using the unauthenticated auth type.

1. For cross-region MSK Replicators, you must attach a resource-based permissions policy to the source cluster. This allows MSK to connect to this cluster for replicating data. You can do this using the CLI or AWS Console procedures below. See also, [Amazon MSK resource-based policies](security_iam_service-with-iam.md). You do not need to perform this step for same-region MSK Replicators.

------
#### [ Console: create resource policy ]

Update the source cluster policy with the following JSON. Replace the placeholder with the ARN of your source cluster.


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "kafka.amazonaws.com"
                ]
            },
            "Action": [
                "kafka:CreateVpcConnection",
                "kafka:GetBootstrapBrokers",
                "kafka:DescribeClusterV2"
            ],
            "Resource": "arn:aws:kafka:us-east-1:123456789012:cluster/myCluster/abcd1234-5678-90ab-cdef-1234567890ab-1"
        }
    ]
}
```

Use the **Edit cluster policy** option under the **Actions** menu on the cluster details page.

![\[Edit cluster policy in console\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/edit-cluster-policy.png)


------
#### [ CLI: create resource policy ]

Note: If you use the AWS console to create a source cluster and choose the option to create a new IAM role, AWS attaches the required trust policy to the role. If you want MSK to use an existing IAM role, or if you create a role on your own, attach a trust policy that allows MSK Replicator to assume the role. For information about how to modify the trust relationship of a role, see [Modifying a Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage_modify.html).

1. Get the current version of the MSK cluster policy using this command. Replace placeholders with the actual cluster ARN.

   ```
   aws kafka get-cluster-policy --cluster-arn <Cluster ARN>
   {
       "CurrentVersion": "K1PA6795UKM GR7",
       "Policy": "..."
   }
   ```

1. Create a resource-based policy to allow MSK Replicator to access your source cluster. Use the following syntax as a template, replacing the placeholder with the actual source cluster ARN.

   ```
   aws kafka put-cluster-policy --cluster-arn "<sourceClusterARN>" --policy '{
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "kafka.amazonaws.com"
                   ]
               },
               "Action": [
                   "kafka:CreateVpcConnection",
                   "kafka:GetBootstrapBrokers",
                   "kafka:DescribeClusterV2"
               ],
               "Resource": "<sourceClusterARN>"
           }
       ]
   }'
   ```

------

# Prepare the Amazon MSK target cluster
<a name="msk-replicator-prepare-target-cluster"></a>

Create an MSK target cluster (provisioned or serverless) with IAM access control turned on. The target cluster doesn’t require multi-VPC private connectivity turned on. The target cluster can be in the same AWS Region or a different Region as the source cluster. Both the source and target clusters must be in the same AWS account. Your target cluster must have a minimum of three brokers.

# Tutorial: Create an Amazon MSK Replicator
<a name="msk-replicator-create"></a>

After you set up the source and target clusters, you can use those clusters to create an Amazon MSK Replicator. Before you create the Amazon MSK Replicator, make sure you have [IAM permissions required to create an MSK Replicator](msk-replicator-requirements.md#replicator-role-permissions-successful).

**Contents**
+ [Considerations for creating an Amazon MSK Replicator](msk-replicator-requirements.md)
  + [IAM permissions required to create an MSK Replicator](msk-replicator-requirements.md#replicator-role-permissions-successful)
  + [Supported cluster types and versions for MSK Replicator](msk-replicator-supported-clusters-versions.md)
  + [Supported MSK Serverless cluster configuration](msk-replicator-serverless-requirements.md)
    + [Cluster configuration changes](msk-replicator-serverless-requirements.md#msk-replicator-config-changes)
+ [Create replicator using the AWS console in the target cluster Region](msk-replicator-create-console.md)
  + [Choose your source cluster](msk-replicator-create-console.md#msk-replicator-create-console-choose-source)
  + [Choose your target cluster](msk-replicator-create-console.md#msk-replicator-create-console-choose-target)
  + [Configure replicator settings and permissions](msk-replicator-create-console.md#msk-replicator-create-settings)

# Considerations for creating an Amazon MSK Replicator
<a name="msk-replicator-requirements"></a>

The following sections give an overview of the prerequisites, supported configurations, and best practices for using the MSK Replicator feature. They cover the necessary permissions, cluster compatibility, and Serverless-specific requirements, as well as guidance on managing the Replicator after creation.

## IAM permissions required to create an MSK Replicator
<a name="replicator-role-permissions-successful"></a>

Here is an example of the IAM policy required to create an MSK Replicator. The action `kafka:TagResource` is only needed if tags are provided when creating the MSK Replicator. Replicator IAM policies should be attached to the IAM role that corresponds to your client. For information about creating authorization policies, see [Create authorization policies](https://docs.aws.amazon.com/msk/latest/developerguide/iam-access-control.html#create-iam-access-control-policies).

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MSKReplicatorIAMPassRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789012:role/MSKReplicationRole",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "kafka.amazonaws.com"
        }
      }
    },
    {
      "Sid": "MSKReplicatorServiceLinkedRole",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": "arn:aws:iam::123456789012:role/aws-service-role/kafka.amazonaws.com/AWSServiceRoleForKafka*"
    },
    {
      "Sid": "MSKReplicatorEC2Actions",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeVpcs",
        "ec2:CreateNetworkInterface"
      ],
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0abcd1234ef56789",
        "arn:aws:ec2:us-east-1:123456789012:security-group/sg-0123abcd4567ef89",
        "arn:aws:ec2:us-east-1:123456789012:network-interface/eni-0a1b2c3d4e5f67890",
        "arn:aws:ec2:us-east-1:123456789012:vpc/vpc-0a1b2c3d4e5f67890"
      ]
    },
    {
      "Sid": "MSKReplicatorActions",
      "Effect": "Allow",
      "Action": [
        "kafka:CreateReplicator",
        "kafka:TagResource"
      ],
      "Resource": [
        "arn:aws:kafka:us-east-1:123456789012:cluster/myCluster/abcd1234-56ef-78gh-90ij-klmnopqrstuv",
        "arn:aws:kafka:us-east-1:123456789012:replicator/myReplicator/wxyz9876-54vu-32ts-10rq-ponmlkjihgfe"
      ]
    }
  ]
}
```

------

The following is an example IAM policy to describe a replicator.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "kafka:DescribeReplicator",
                "kafka:ListTagsForResource"
            ],
            "Resource": "*"
        }
    ]
}
```

------

# Supported cluster types and versions for MSK Replicator
<a name="msk-replicator-supported-clusters-versions"></a>

These are requirements for supported instance types, Kafka versions, and network configurations.
+ MSK Replicator supports both MSK provisioned clusters and MSK Serverless clusters in any combination as source and target clusters. Other types of Kafka clusters are not supported at this time by MSK Replicator.
+ MSK Serverless clusters require IAM access control, don't support Apache Kafka ACL replication, and have limited support for topic configuration replication. See [What is MSK Serverless?](serverless.md).
+ MSK Replicator is supported only on clusters running Apache Kafka 2.7.0 or higher, regardless of whether your source and target clusters are in the same or in different AWS Regions.
+ MSK Replicator supports clusters using instance types of m5.large or larger. t3.small clusters aren't supported.
+ If you are using MSK Replicator with an MSK Provisioned cluster, you need a minimum of three brokers in both source and target clusters. You can replicate data across clusters in two Availability Zones, but you would need a minimum of four brokers in those clusters.
+ Both your source and target MSK clusters must be in the same AWS account. Replication across clusters in different accounts is not supported.
+ If the source and target MSK clusters are in different AWS Regions (cross-region), MSK Replicator requires the source cluster to have multi-VPC private connectivity turned on for its IAM Access Control method.

  Multi-VPC isn't required for other authentication methods on the source cluster for MSK replication across AWS Regions.

  Multi-VPC is also not required if you're replicating data between clusters in the same AWS Region. See [Amazon MSK multi-VPC private connectivity in a single Region](aws-access-mult-vpc.md).
+ Identical topic name replication (**Keep the same topics name** in console) requires an MSK cluster running Kafka version 2.8.1 or higher.
+ For Identical topic name replication (**Keep the same topics name** in console) configurations, to avoid the risk of cyclic replication, do not make changes to the headers that MSK Replicator creates (`__mskmr`).

# Supported MSK Serverless cluster configuration
<a name="msk-replicator-serverless-requirements"></a>
+ MSK Serverless supports replication of these topic configurations for MSK Serverless target clusters during topic creation: `cleanup.policy`, `compression.type`, `max.message.bytes`, `retention.bytes`, `retention.ms`.
+ MSK Serverless supports only these topic configurations during topic configuration sync: `compression.type`, `max.message.bytes`, `retention.bytes`, `retention.ms`.
+ Replicator uses 83 compacted partitions on target MSK Serverless clusters. Make sure that target MSK Serverless clusters have a sufficient number of compacted partitions. See [MSK Serverless quota](limits.md#serverless-quota).
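As a sketch of the configuration-sync rules above, only a subset of topic configurations is synced to a Serverless target after topic creation. The helper below is illustrative only, not an MSK API:

```python
# Topic configurations MSK Replicator syncs to an MSK Serverless target
# after topic creation (per the list above).
SERVERLESS_SYNCED_CONFIGS = {
    "compression.type", "max.message.bytes", "retention.bytes", "retention.ms",
}

def configs_to_sync(source_topic_configs):
    """Drop source topic configs that a Serverless target won't sync."""
    return {k: v for k, v in source_topic_configs.items()
            if k in SERVERLESS_SYNCED_CONFIGS}

src = {"retention.ms": "604800000", "cleanup.policy": "compact",
       "max.message.bytes": "1048588"}
print(configs_to_sync(src))
# {'retention.ms': '604800000', 'max.message.bytes': '1048588'}
```

Note that `cleanup.policy` is replicated when the topic is first created on the target, but is not synced afterwards.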

## Cluster configuration changes
<a name="msk-replicator-config-changes"></a>
+ It’s recommended that you do not turn tiered storage on or off after the MSK Replicator has been created. If your target cluster is not tiered, then MSK won’t copy the tiered storage configurations, regardless of whether your source cluster is tiered or not. If you turn on tiered storage on the target cluster after the Replicator is created, the Replicator needs to be recreated. If you want to copy data from a non-tiered to a tiered cluster, you should not copy topic configurations. See [Enabling and disabling tiered storage on an existing topic](https://docs.aws.amazon.com/msk/latest/developerguide/msk-enable-disable-topic-tiered-storage-cli.html).
+ Don’t change cluster configuration settings after MSK Replicator creation. Cluster configuration settings are validated during MSK Replicator creation. To avoid problems with the MSK Replicator, avoid the following changes after the MSK Replicator is created:
  + Changing the MSK cluster to a t3 instance type.
  + Changing service execution role permissions.
  + Turning off MSK multi-VPC private connectivity.
  + Changing the attached cluster resource-based policy.
  + Changing cluster security group rules.

# Create replicator using the AWS console in the target cluster Region
<a name="msk-replicator-create-console"></a>

The following section explains the step-by-step console workflow for creating a Replicator.

**Replicator details**

1. In the AWS Region where your target MSK cluster is located, open the Amazon MSK console at [https://console.aws.amazon.com/msk/home](https://console.aws.amazon.com/msk/home).

1. Choose **Replicators** to display the list of replicators in the account.

1. Choose **Create replicator**.

1. In the **Replicator details** pane, give the new replicator a unique name.

## Choose your source cluster
<a name="msk-replicator-create-console-choose-source"></a>

The source cluster contains the data you want to copy to a target MSK cluster.

1. In the **Source cluster** pane, choose the AWS Region where the source cluster is located.

   You can look up a cluster’s Region by going to **MSK Clusters** and checking the cluster ARN on the **Cluster details** page. The Region name is embedded in the ARN string. In the following example ARN, `ap-southeast-2` is the cluster’s Region.

   ```
   arn:aws:kafka:ap-southeast-2:123456789012:cluster/cluster-11/eec93c7f-4e8b-4baf-89fb-95de01ee639c-s1
   ```

1. Enter the ARN of your source cluster or browse to choose your source cluster.

1. Choose subnet(s) for your source cluster.

   The console displays the subnets available in the source cluster’s Region for you to select. You must select a minimum of two subnets. For a same-region MSK Replicator, the subnets that you select to access the source cluster and the subnets to access the target cluster must be in the same Availability Zone.

1. Choose security group(s) for the MSK Replicator to access your source cluster.
   + For cross-region replication (CRR), you do not need to provide security group(s) for your source cluster.
   + For same-region replication (SRR), go to the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/) and ensure that the security groups that you will provide for the Replicator have outbound rules that allow traffic to your source cluster's security groups. Also ensure that your source cluster's security groups have inbound rules that allow traffic from the Replicator security groups provided for the source.

      

**To add inbound rules to your source cluster’s security group:**

     1. In the AWS console, go to your source cluster’s details by selecting the **Cluster name**.

     1. Select the **Properties** tab, then scroll down to the **Network settings** pane to select the name of the **Security group** applied.

     1. Go to the inbound rules and select **Edit inbound rules**.

     1. Select **Add rule**.

     1. In the **Type** column for the new rule, select **Custom TCP**.

     1. In the **Port range** column, type `9098`. MSK Replicator connects to your cluster using IAM access control, which uses port 9098.

     1. In the **Source** column, type the name of the security group that you will provide during Replicator creation for the source cluster (this may be the same as the MSK source cluster's security group), and then select **Save rules**.

      

**To add outbound rules to Replicator’s security group provided for the source:**

     1. In the AWS console for Amazon EC2, go to the security group that you will provide during Replicator creation for the source.

     1. Go to the outbound rules and select **Edit outbound rules**.

     1. Select **Add rule**.

     1. In the **Type** column for the new rule, select **Custom TCP**.

     1. In the **Port range** column, type `9098`. MSK Replicator connects to your cluster using IAM access control, which uses port 9098.

     1. In the **Source** column, type the name of the MSK source cluster’s security group, and then select **Save rules**.

**Note**  
Alternatively, if you do not want to restrict traffic using your security groups, you can add inbound and outbound rules that allow all traffic.  
1. Select **Add rule**.  
2. In the **Type** column, select **All Traffic**.  
3. In the Source column, type `0.0.0.0/0`, and then select **Save rules**.

## Choose your target cluster
<a name="msk-replicator-create-console-choose-target"></a>

The target cluster is the MSK provisioned or serverless cluster to which the source data is copied.

**Note**  
MSK Replicator creates new topics in the target cluster with an auto-generated prefix added to the topic name. For instance, MSK Replicator replicates data in `topic` from the source cluster to a new topic in the target cluster called `<sourceKafkaClusterAlias>.topic`. This is to distinguish topics that contain data replicated from the source cluster from other topics in the target cluster, and to avoid data being circularly replicated between the clusters. You can find the prefix that will be added to the topic names in the target cluster under the **sourceKafkaClusterAlias** field using the `DescribeReplicator` API or the **Replicator details** page on the MSK console. The prefix in the target cluster is `<sourceKafkaClusterAlias>`.

1. In the **Target cluster** pane, choose the AWS Region where the target cluster is located.

1. Enter the ARN of your target cluster or browse to choose your target cluster.

1. Choose subnet(s) for your target cluster.

   The console displays subnets available in the target cluster’s Region for you to select. Select a minimum of two subnets.

1. Choose security group(s) for the MSK Replicator to access your target cluster.

   The security groups available in the target cluster’s Region are displayed for you to select. The chosen security group is associated with each connection. For more information about using security groups, see [Control traffic to your AWS resources using security groups](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html) in the *Amazon VPC User Guide*.
   + For both cross-region replication (CRR) and same-region replication (SRR), go to the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/) and ensure that the security groups you will provide to the Replicator have outbound rules to allow traffic to your target cluster's security groups. Also ensure that your target cluster's security groups have inbound rules that accept traffic from the Replicator security groups provided for the target.

      

**To add inbound rules to your target cluster’s security group:**

     1. In the AWS console, go to your target cluster’s details by selecting the **Cluster name**.

     1. Select the **Properties** tab, then scroll down to the Network settings pane to select the name of the **Security group** applied.

     1. Go to the inbound rules and select **Edit inbound rules**.

     1. Select **Add rule**.

     1. In the **Type** column for the new rule, select **Custom TCP**.

     1. In the **Port range** column, type `9098`. MSK Replicator connects to your cluster using IAM access control, which uses port 9098.

     1. In the **Source** column, type the name of the security group that you will provide during Replicator creation for the target cluster (this may be the same as the MSK target cluster's security group), and then select **Save rules**.

      

**To add outbound rules to Replicator’s security group provided for the target:**

     1. In the AWS console for Amazon EC2, go to the security group that you will provide during Replicator creation for the target.

     1. Go to the outbound rules and select **Edit outbound rules**.

     1. Select **Add rule**.

     1. In the **Type** column for the new rule, select **Custom TCP**.

     1. In the **Port range** column, type `9098`. MSK Replicator connects to your cluster using IAM access control, which uses port 9098.

     1. In the **Destination** column, type the name of the MSK target cluster’s security group, and then select **Save rules**.

**Note**  
Alternatively, if you do not want to restrict traffic using your security groups, you can add inbound and outbound rules allowing All Traffic.  
1. Select **Add rule**.  
2. In the **Type** column, select **All Traffic**.  
3. In the **Source** (for inbound) or **Destination** (for outbound) column, type `0.0.0.0/0`, and then select **Save rules**.

## Configure replicator settings and permissions
<a name="msk-replicator-create-settings"></a>

1. In the **Replicator settings** pane, specify the topics you want to replicate using regular expressions in the allow and deny lists. By default, all topics are replicated.
**Note**  
MSK Replicator only replicates up to 750 topics in sorted order. If you need to replicate more topics, we recommend that you create a separate Replicator. Go to the AWS console Support Center and [create a support case](https://console.aws.amazon.com/support/home#/) if you need support for more than 750 topics per Replicator. You can monitor the number of topics being replicated using the `TopicCount` metric. See [Amazon MSK Standard broker quota](limits.md#msk-provisioned-quota).
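One way to picture how allow and deny lists select topics is the sketch below. This is a local illustration only; the actual matching semantics, including the regex dialect, are defined by the MSK Replicator service:

```python
import re

def select_topics(topics, allow=None, deny=None):
    """Return the topics kept by allow/deny regex lists.

    An empty or missing allow list means "replicate all topics",
    mirroring the console default; deny rules are applied afterwards.
    """
    def matches(patterns, name):
        return any(re.fullmatch(p, name) for p in patterns)

    kept = [t for t in topics if not allow or matches(allow, t)]
    return [t for t in kept if not deny or not matches(deny, t)]
```

For example, `select_topics(["orders", "orders-dlq", "audit"], allow=[r"orders.*"], deny=[r".*-dlq"])` keeps only `orders`.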

1. By default, MSK Replicator starts replication from the *latest* (most recent) offset in the selected topics. Alternatively, you can start replication from the *earliest* (oldest) offset in the selected topics if you want to replicate existing data on your topics. Once the Replicator is created, you can’t change this setting. This setting corresponds to the [ReplicationStartingPosition](https://docs.aws.amazon.com/msk/1.0/apireference-replicator/v1-replicators-replicatorarn.html#v1-replicators-replicatorarn-model-replicationstartingposition) field in the [CreateReplicator](https://docs.aws.amazon.com/msk/1.0/apireference-replicator/v1-replicators.html#CreateReplicator) request and [DescribeReplicator](https://docs.aws.amazon.com/msk/1.0/apireference-replicator/v1-replicators-replicatorarn.html#DescribeReplicator) response APIs.

1. Choose a topic name configuration:
   + `PREFIXED` topic name replication (**Add prefix to topics name** in console): The default setting. MSK Replicator replicates “topic1” from the source cluster to a new topic in the target cluster with the name `<sourceKafkaClusterAlias>.topic1`. 
   + Identical topic name replication (**Keep the same topics name** in console): Topics from the source cluster are replicated with identical topic names in the target cluster.

   This setting corresponds to the `TopicNameConfiguration` field in the `CreateReplicator` request and `DescribeReplicator` response APIs. See [How Amazon MSK Replicator works](msk-replicator-how-it-works.md).
**Note**  
By default, MSK Replicator creates new topics in the target cluster with an auto-generated prefix added to the topic name. This prefix distinguishes topics that contain data replicated from the source cluster from other topics in the target cluster and prevents data from being circularly replicated between the clusters. Alternatively, you can create an MSK Replicator with Identical topic name replication (**Keep the same topics name** in console) so that topic names are preserved during replication. This configuration reduces the need to reconfigure client applications during setup and makes it simpler to operate multi-cluster streaming architectures.

1. By default, MSK Replicator copies all metadata including topic configurations, Access Control Lists (ACLs) and consumer group offsets for seamless failover. If you are not creating the Replicator for failover, you can optionally choose to turn off one or more of these settings available in the **Additional settings** section.
**Note**  
MSK Replicator does not replicate write ACLs since your producers should not be writing directly to the replicated topic in the target cluster. Your producers should write to the local topic in the target cluster after failover. See [Perform a planned failover to the secondary AWS Region](msk-replicator-perform-planned-failover.md) for details.

1. In the **Consumer group replication** pane, specify the consumer groups you want to replicate using regular expressions in the allow and deny lists. By default, all consumer groups are replicated.

1. In the **Compression** pane, you can optionally choose to compress the data written to the target cluster. If you’re going to use compression, we recommend that you use the same compression method as the data in your source cluster.

1. In the **Access permissions** pane do either of the following:

   1. Select **Create or update IAM role with required policies**. MSK console will automatically attach the necessary permissions and trust policy to the service execution role required to read and write to your source and target MSK clusters.   
![\[MSK console to create or update replicator IAM role\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/msk-replicator-ezCRC.png)

   1. Provide your own IAM role by selecting **Choose from IAM roles that Amazon MSK can assume**. We recommend that you attach the `AWSMSKReplicatorExecutionRole` managed IAM policy to your service execution role, instead of writing your own IAM policy.

      1. Create the IAM role that the Replicator will use to read and write to your source and target MSK clusters, with the following JSON as part of the trust policy and the `AWSMSKReplicatorExecutionRole` policy attached to the role. In the trust policy, replace the placeholder `<yourAccountID>` with your actual account ID.

------
#### [ JSON ]


        ```
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "kafka.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {
                        "StringEquals": {
                            "aws:SourceAccount": "<yourAccountID>"
                        }
                    }
                }
            ]
        }
        ```

------
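If you create the role with a script or SDK rather than by hand, the trust policy can be rendered programmatically instead of hand-edited. This sketch only builds the policy document (the service principal assumed here is `kafka.amazonaws.com`); creating the role and attaching the `AWSMSKReplicatorExecutionRole` managed policy is left to your tooling:

```python
import json

def replicator_trust_policy(account_id: str) -> str:
    """Render the MSK Replicator trust policy with the account ID filled in."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "kafka.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
        }],
    }
    return json.dumps(policy, indent=4)
```

Generating the document this way avoids the placeholder-substitution mistakes that cause the Replicator to fail role assumption.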

1. In the **Replicator tags** pane, you can optionally assign tags to the MSK Replicator resource. For more information, see [Tag an Amazon MSK cluster](msk-tagging.md). For a cross-region MSK Replicator, tags are synced to the remote Region automatically when the Replicator is created. If you change tags after the Replicator is created, the change is not automatically synced to the remote Region, so you’ll need to sync local replicator and remote replicator references manually.

1. Select **Create**.

If you want to restrict `kafka-cluster:WriteData` permission, refer to the *Create authorization policies* section of [How IAM access control for Amazon MSK works](https://docs.aws.amazon.com/msk/latest/developerguide/iam-access-control.html#how-to-use-iam-access-control). You'll need to add `kafka-cluster:WriteDataIdempotently` permission to both the source and target cluster.

It takes approximately 30 minutes for the MSK Replicator to be successfully created and transitioned to RUNNING status.

If you create a new MSK Replicator to replace one that you deleted, the new Replicator starts replication from the latest offset.

If your MSK Replicator has transitioned to a FAILED status, refer to the troubleshooting section [Troubleshooting MSK Replicator](msk-replicator-troubleshooting.md).

# Edit MSK Replicator settings
<a name="msk-replicator-edit-settings"></a>

You can’t change the source cluster, target cluster, Replicator starting position, or topic name replication configuration once the MSK Replicator has been created. You need to create a new replicator to use Identical topic name replication configuration. However, you can edit other Replicator settings, such as topics and consumer groups to replicate.

1. Sign in to the AWS Management Console, and open the Amazon MSK console at [https://console.aws.amazon.com/msk/home?region=us-east-1#/home/](https://console.aws.amazon.com/msk/home?region=us-east-1#/home/).

1. In the left navigation pane, choose **Replicators** to display the list of Replicators in the account and select the MSK Replicator you want to edit.

1. Choose the **Properties** tab.

1. In the **Replicator settings** section, choose **Edit replicator**.

1. You can edit any of the following MSK Replicator settings.
   + Specify the topics you want to replicate using regular expressions in the allow and deny lists. By default, MSK Replicator copies all metadata including topic configurations, Access Control Lists (ACLs) and consumer group offsets for seamless failover. If you are not creating the Replicator for failover, you can optionally choose to turn off one or more of these settings available in the **Additional settings** section.
**Note**  
MSK Replicator does not replicate write ACLs since your producers should not be writing directly to the replicated topic in the target cluster. Your producers should write to the local topic in the target cluster after failover. See [Perform a planned failover to the secondary AWS Region](msk-replicator-perform-planned-failover.md) for details.
   + For **Consumer group replication**, you can specify the consumer groups you want to replicate using regular expressions in the allow and deny lists. By default, all consumer groups are replicated. If allow and deny lists are empty, consumer group replication is turned off.
   + Under **Target compression type**, you can choose whether to compress the data written to the target cluster. If you’re going to use compression, we recommend that you use the same compression method as the data in your source cluster.

1. Save your changes.

   It takes approximately 30 minutes for the update to complete and for the MSK Replicator to transition back to RUNNING status. If your MSK Replicator has transitioned to a FAILED status, refer to the troubleshooting section [Troubleshoot MSK Replicator](msk-replicator-troubleshooting.md).

# Delete an MSK Replicator
<a name="msk-replicator-delete"></a>

You may need to delete an MSK Replicator if it fails to create (FAILED status). The source and target clusters assigned to an MSK Replicator can’t be changed once the MSK Replicator is created, but you can delete an existing MSK Replicator and create a new one. If you create a new MSK Replicator to replace the deleted one, the new Replicator starts replication from the latest offset.

1. In the AWS Region where your source cluster is located, sign in to the AWS Management Console, and open the Amazon MSK console at [https://console.aws.amazon.com/msk/home?region=us-east-1#/home/](https://console.aws.amazon.com/msk/home?region=us-east-1#/home/).

1. In the navigation pane, select **Replicators**.

1. From the list of MSK Replicators, select the one you want to delete and choose **Delete**.

# Monitor replication
<a name="msk-replicator-monitor"></a>

You can use [Amazon CloudWatch](https://console.aws.amazon.com/cloudwatch/) in the target cluster Region to view metrics for `ReplicationLatency`, `MessageLag`, and `ReplicatorThroughput` at a topic and aggregate level for each Amazon MSK Replicator. Metrics are visible under **ReplicatorName** in the `AWS/Kafka` namespace. You can also use the `ReplicatorFailure`, `AuthError`, and `ThrottleTime` metrics to check for issues.

The MSK console displays a subset of CloudWatch metrics for each MSK Replicator. From the console **Replicator** list, select the name of a Replicator and select the **Monitoring** tab.

## MSK Replicator metrics
<a name="msk-replicator-metrics"></a>

The following metrics describe performance and connection characteristics of the MSK Replicator.

The `AuthError` metric does not cover topic-level auth errors. To monitor your MSK Replicator’s topic-level auth errors, monitor the Replicator’s `ReplicationLatency` metric and the source cluster’s topic-level `MessagesInPerSec` metric. If a topic’s `ReplicationLatency` drops to 0 while the topic still has data being produced to it, the Replicator likely has an auth issue with the topic. Check that the Replicator’s service execution IAM role has sufficient permissions to access the topic.
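The detection heuristic above can be expressed as a small check. The metric values are assumed to come from your own CloudWatch queries; this sketch only encodes the decision logic:

```python
def likely_topic_auth_issue(replication_latency_ms: float,
                            messages_in_per_sec: float) -> bool:
    """Flag a probable topic-level auth problem: replication latency has
    dropped to 0 while the source topic is still receiving messages."""
    return replication_latency_ms == 0 and messages_in_per_sec > 0
```

You might run such a check on recent datapoints for each replicated topic and alert when it returns true for several consecutive periods.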


[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator-monitor.html)

# Use replication to increase the resiliency of a Kafka streaming application across Regions
<a name="msk-replicator-increase-resiliency"></a>

You can use MSK Replicator to set up active-active or active-passive cluster topologies to increase resiliency of your Apache Kafka application across AWS Regions. In an active-active setup, both MSK clusters are actively serving reads and writes. In an active-passive setup, only one MSK cluster at a time is actively serving streaming data, while the other cluster is on standby.

## Considerations for building multi-Region Apache Kafka applications
<a name="msk-replication-multi-region-kafka-applications"></a>

Your consumers must be able to reprocess duplicate messages without downstream impact. MSK Replicator replicates data at-least-once, which may result in duplicates in the standby cluster. When you switch over to the secondary AWS Region, your consumers may process the same data more than once. MSK Replicator prioritizes copying data over consumer offsets for better performance. After a failover, the consumer may start reading from earlier offsets, resulting in duplicate processing.
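Because replication is at-least-once, a consumer that tracks already-processed message IDs can absorb the duplicates described above. A minimal sketch, assuming each message carries a unique, stable ID (for example, a producer-assigned key):

```python
class DeduplicatingProcessor:
    """Process each message ID at most once, tolerating at-least-once delivery."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()   # in production, persist this (e.g., in a database)

    def process(self, message_id, payload):
        if message_id in self.seen_ids:
            return False        # duplicate from replication/failover; skip it
        self.handler(payload)
        self.seen_ids.add(message_id)
        return True
```

Processing the same message twice invokes the handler only once, which is the property your downstream systems need after a failover.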

Producers and consumers must also tolerate minimal data loss. Because MSK Replicator replicates data asynchronously, when the primary AWS Region starts experiencing failures, there is no guarantee that all data is replicated to the secondary Region. You can use the replication latency to estimate the maximum amount of data that was not copied to the secondary Region.

## Using active-active versus active-passive cluster topology
<a name="msk-replicator-active-versus-passive"></a>

An active-active cluster topology offers near zero recovery time and the capability for your streaming application to operate simultaneously in multiple AWS Regions. When a cluster in one Region is impaired, applications connected to the cluster in the other Region continue processing data.

Active-passive setups are suited to applications that can run in only one AWS Region at a time, or when you need more control over the data processing order. Active-passive setups require more recovery time than active-active setups, as you must start your entire active-passive setup, including your producers and consumers, in the secondary Region to resume streaming data after a failover.

# Create an active-passive Kafka cluster setup with recommended topic naming configurations
<a name="msk-replicators-active-passive-cluster-setup"></a>

For an active-passive setup, we recommend that you operate a similar setup of producers, MSK clusters, and consumers (with the same consumer group name) in two different AWS Regions. It is important that the two MSK clusters have identical read and write capacity to ensure reliable data replication. You need to create an MSK Replicator to continuously copy data from the primary to the standby cluster. You also need to configure your producers to write data into topics on a cluster in the same AWS Region.

For an active-passive setup, create a new Replicator with Identical topic name replication (**Keep the same topics name** in console) to start replicating data from your MSK cluster in the primary Region to your cluster in the secondary Region. We recommend that you operate a duplicate set of producers and consumers in the two AWS Regions, each connecting to the cluster in its own Region using that cluster’s bootstrap string. This simplifies the failover process because it doesn’t require changes to the bootstrap string. To ensure that consumers resume reading from near where they left off, consumers in the source and target clusters should have the same consumer group ID.

If you use Identical topic name replication (**Keep the same topics name** in console) for your MSK Replicator, it will replicate your topics with the same name as the corresponding source topics.

We recommend that you configure cluster level settings and permissions for your clients on the target cluster. You do not need to configure topic level settings and literal read ACLs as MSK Replicator automatically copies them if you have selected the option to copy access control lists. See [Metadata replication](msk-replicator-how-it-works.md#msk-replicator-metadata-replication).

# Failover to the secondary AWS Region
<a name="msk-replicator-when-planned-failover"></a>

We recommend that you monitor replication latency in the secondary AWS Region using Amazon CloudWatch. During a service event in the primary AWS Region, replication latency may suddenly increase. If the latency keeps increasing, use the AWS Service Health Dashboard to check for service events in the primary AWS Region. If there’s an event, you can failover to the secondary AWS Region.

# Perform a planned failover to the secondary AWS Region
<a name="msk-replicator-perform-planned-failover"></a>

You can conduct a planned failover to test the resiliency of your application against an unexpected event in the primary AWS Region that contains your source MSK cluster. A planned failover should not result in data loss.

If you’re using Identical topic name replication configuration, follow these steps:

1. Shutdown all producers and consumers connecting to your source cluster.

1. Create a new MSK Replicator to replicate data from your MSK cluster in the secondary Region to your MSK cluster in the primary Region with Identical topic name replication (**Keep the same topics name** in console). This is required to copy the data that you will be writing to the secondary region back to the primary Region so that you can failback to the primary Region after the unexpected event has ended.

1. Start producers and consumers connected to the target cluster in the secondary AWS Region.

If you’re using Prefixed topic name configuration, follow these steps to failover:

1. Shutdown all producers and consumers connecting to your source cluster.

1. Create a new MSK Replicator to replicate data from your MSK cluster in the secondary Region to your MSK cluster in the primary Region. This is required to copy the data that you will be writing to the secondary region back to the primary Region so that you can failback to the primary Region after the unexpected event has ended.

1. Start producers on target cluster in the secondary AWS Region.

1. Depending on your application’s message ordering requirements, follow the steps in one of the following tabs.

------
#### [ No message ordering ]

   If your application does not require message ordering, start consumers in the secondary AWS Region that read from both the local topics (for example, `topic`) and replicated topics (for example, `<sourceKafkaClusterAlias>.topic`) using a wildcard operator (for example, `.*topic`).

------
#### [ Message ordering ]

   If your application requires message ordering, start consumers only for the replicated topics on target cluster (for example, `<sourceKafkaClusterAlias>.topic`) but not the local topics (for example, `topic`).

------

1. Wait for all the consumers of replicated topics on the target MSK cluster to finish processing all data, so that consumer lag is 0 and the number of records processed is also 0. Then, stop consumers for the replicated topics on target cluster. At this point, all records that were replicated from the source MSK cluster to target MSK cluster have been consumed.

1. Start consumers for the local topics (for example, `topic`) on the target MSK cluster.
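Step 5 above (draining consumers of the replicated topics before switching to local topics) can be automated with a polling loop. `get_consumer_lag` is a placeholder for however you obtain lag in your environment, for example by parsing `kafka-consumer-groups.sh` output:

```python
import time

def wait_for_drain(get_consumer_lag, poll_seconds=15, timeout_seconds=900):
    """Poll until consumer lag on the replicated topics reaches 0,
    or give up after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_consumer_lag() == 0:
            return True             # all replicated records consumed
        time.sleep(poll_seconds)
    return False                    # still lagging; investigate before cutover
```

Only after this returns `True` should you stop the replicated-topic consumers and start consumers on the local topics.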

# Perform an unplanned failover to the secondary AWS Region
<a name="msk-replicator-perform-unplanned-failover"></a>

You can conduct an unplanned failover when there is a service event in the primary AWS Region which has your source MSK cluster and you want to temporarily redirect your traffic to the secondary Region which has your target MSK cluster. An unplanned failover could result in some data loss as MSK Replicator replicates data asynchronously. You can track the message lag using the metrics in [Monitor replication](msk-replicator-monitor.md).

If you’re using Identical topic name replication configuration (**Keep the same topics name** in console), follow these steps:

1. Attempt to shut down all producers and consumers connecting to the source MSK cluster in the primary Region. This operation might not succeed due to impairments in that region.

1. Start producers and consumers connecting to the target MSK cluster in the secondary AWS Region to complete the failover. As MSK Replicator also replicates metadata including read ACLs and consumer group offsets, your producers and consumers will seamlessly resume processing from near where they left off before failover.

If you’re using Prefixed topic name configuration (**Add prefix to topics name** in console), follow these steps to failover:

1. Attempt to shut down all producers and consumers connecting to the source MSK cluster in the primary Region. This operation might not succeed due to impairments in that region.

1. Start producers and consumers connecting to the target MSK cluster in the secondary AWS Region to complete the failover. As MSK Replicator also replicates metadata including read ACLs and consumer group offsets, your producers and consumers will seamlessly resume processing from near where they left off before failover.

1. Depending on your application’s message ordering requirements, follow the steps in one of the following tabs.

------
#### [ No message ordering ]

   If your application does not require message ordering, start consumers in the target AWS Region that read from both the local (for example, `topic`) and replicated topics (for example, `<sourceKafkaClusterAlias>.topic`) using a wildcard operator (for example, `.*topic`).

------
#### [ Message ordering ]

   1. Start consumers only for the replicated topics on target cluster (for example, `<sourceKafkaClusterAlias>.topic`) but not the local topics (for example, `topic`).

   1. Wait for all the consumers of replicated topics on the target MSK cluster to finish processing all data, so that offset lag is 0 and the number of records processed is also 0. Then, stop consumers for the replicated topics on target cluster. At this point, all records that were replicated from the source MSK cluster to target MSK cluster have been consumed.

   1. Start consumers for the local topics (for example, `topic`) on the target MSK cluster.

------

1. Once the service event has ended in the primary Region, create a new MSK Replicator to replicate data from your MSK cluster in the secondary Region to your MSK cluster in the primary Region with Replicator starting position set to *earliest*. This is required to copy the data that you will be writing to the secondary Region back to the primary Region so that you can failback to the primary Region after the service event has ended. If you don't set the Replicator starting position to *earliest*, any data you produced to the cluster in the secondary region during the service event in the primary region will not be copied back to the cluster in the primary region.

# Perform failback to the primary AWS Region
<a name="msk-replicator-perform-failback"></a>

You can failback to the primary AWS Region after the service event in that Region has ended.

If you’re using Identical topic name replication configuration, follow these steps:

1. Create a new MSK Replicator with your secondary cluster as source and primary cluster as target, starting position set to *earliest* and Identical topic name replication (**Keep the same topics name** in console).

   This will start the process of copying all data written to the secondary cluster after failover back to the primary region.

1. Monitor the `MessageLag` metric on the new replicator in Amazon CloudWatch until it reaches `0`, which indicates all data has been replicated from secondary to primary.

1. After all data has been replicated, stop all producers connecting to the secondary cluster and start producers connecting to the primary cluster.

1. Wait for `MaxOffsetLag` metric for your consumers connecting to secondary cluster to become `0` to ensure they have processed all the data. See [Monitor consumer lags](consumer-lag.md).

1. Once all data has been processed, stop consumers in the secondary region and start consumers connecting to the primary cluster to complete the failback.

1. Delete the Replicator you created in the first step that is replicating data from your secondary cluster to primary.

1. Verify that your existing Replicator copying data from the primary to the secondary cluster has a status of RUNNING and that its `ReplicatorThroughput` metric in Amazon CloudWatch is `0`.

   Note that when you create a new Replicator with starting position set to *earliest* for failback, it starts reading all data in your secondary cluster’s topics. Depending on your data retention settings, your topics may have data that came from your source cluster. While MSK Replicator automatically filters those messages, you will still incur data processing and transfer charges for all the data in your secondary cluster. You can track the total data processed by the Replicator using `ReplicatorBytesInPerSec`. See [MSK Replicator metrics](msk-replicator-monitor.md#msk-replicator-metrics).

If you’re using Prefixed topic name configuration, follow these steps:

You should initiate failback steps only after replication from the cluster in the secondary Region to the cluster in the primary Region has caught up and the MessageLag metric in Amazon CloudWatch is close to 0. A planned failback should not result in any data loss.

1. Shut down all producers and consumers connecting to the MSK cluster in the secondary Region.

1. For active-passive topology, delete the Replicator that is replicating data from cluster in the secondary Region to primary Region. You do not need to delete the Replicator for active-active topology.

1. Start producers connecting to the MSK cluster in the primary Region.

1. Depending on your application’s message ordering requirements, follow the steps in one of the following tabs.

------
#### [ No message ordering ]

   If your application does not require message ordering, start consumers in the primary AWS Region that read from both the local topics (for example, `topic`) and replicated topics (for example, `<sourceKafkaClusterAlias>.topic`) using a wildcard operator (for example, `.*topic`). The consumers on local topics (for example, `topic`) will resume from the last offset they consumed before the failover. If there was any unprocessed data from before the failover, it will get processed now. In the case of a planned failover, there should be no such records.

------
#### [ Message ordering ]

   1. Start consumers only for the replicated topics on primary Region (for example, `<sourceKafkaClusterAlias>.topic`) but not the local topics (for example, `topic`).

   1. Wait for all the consumers of replicated topics on the cluster in the primary Region to finish processing all data, so that offset lag is 0 and the number of records processed is also 0. Then, stop consumers for the replicated topics on cluster in the primary Region. At this point, all records that were produced in the secondary Region after failover have been consumed in the primary Region.

   1. Start consumers for the local topics (for example, `topic`) on the cluster in the primary Region.

------

1. Verify that the existing Replicator from cluster in primary to cluster in secondary Region is in RUNNING state and working as expected using the `ReplicatorThroughput` and latency metrics.

# Create an active-active setup using MSK Replicator
<a name="msk-replicator-active-active"></a>

If you want to create an active-active setup where both MSK clusters are actively serving reads and writes, we recommend that you use an MSK Replicator with Prefixed topic name replication (**Add prefix to topics name** in console). However, this will require you to reconfigure your consumers to read the replicated topics.

Follow these steps to set up active-active topology between source MSK cluster A and target MSK cluster B.

1. Create an MSK Replicator with MSK cluster A as source and MSK cluster B as target.

1. After the above MSK Replicator has been successfully created, create a Replicator with cluster B as source and cluster A as target.

1. Create two sets of producers, each writing data at the same time into the local topic (for example, “topic”) in the cluster in the same region as the producer.

1. Create two sets of consumers, each reading data using a wildcard subscription (for example, `.*topic`) from the MSK cluster in the same AWS Region as the consumer. This way, your consumers automatically read data produced locally in the Region from the local topic (for example, `topic`), as well as data replicated from the other Region in the topic with the prefix `<sourceKafkaClusterAlias>.topic`. These two sets of consumers should have different consumer group IDs so that consumer group offsets are not overwritten when MSK Replicator copies them to the other cluster.
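You can sanity-check the wildcard subscription from the step above without connecting to a cluster. The following bash sketch shows which topic names the pattern `.*topic` matches; the alias `arn-alias-1` is a made-up stand-in for the `<sourceKafkaClusterAlias>` prefix that MSK Replicator adds, not a real alias format.

```shell
#!/bin/bash
# Test which topic names a wildcard subscription pattern would match.
# "arn-alias-1" is a hypothetical cluster alias used only for illustration.
pattern='.*topic'
matched=""
for t in topic arn-alias-1.topic orders; do
  if printf '%s\n' "$t" | grep -Eq "^${pattern}$"; then
    matched="$matched $t"
  fi
done
# Both the local topic and the replicated (prefixed) topic match the pattern.
echo "subscribed to:$matched"
```

Because Kafka clients match subscription regular expressions against the full topic name, the same pattern picks up both the local and the prefixed topic while ignoring unrelated topics such as `orders`.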

If you want to avoid reconfiguring your clients, instead of Prefixed topic name replication (**Add prefix to topics name** in console), you can create the MSK Replicators using Identical topic name replication (**Keep the same topics name** in console) to create an active-active setup. However, you will pay additional data processing and data transfer charges for each Replicator. This is because each Replicator needs to process twice the usual amount of data: once for replication and again to prevent infinite loops. You can track the total amount of data processed by each Replicator using the `ReplicatorBytesInPerSec` metric. See [Monitor replication](msk-replicator-monitor.md). This metric includes the data replicated to the target cluster as well as the data filtered out by MSK Replicator to prevent data from being copied back to the same topic it originated from.
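As a rough, illustrative sizing check (assuming roughly symmetric write throughput on both clusters, which is an assumption for illustration and not a billing formula), each Replicator in an identical-topic-name setup processes about twice the local write throughput: once for the data it replicates and once for the data it filters out.

```shell
#!/bin/bash
# Back-of-the-envelope estimate; 10 MB/s is a made-up example ingress rate.
local_write_mbps=10                        # per-cluster application writes
processed_mbps=$((2 * local_write_mbps))   # replicated data + filtered data
echo "Expect roughly ${processed_mbps} MB/s of processed data per Replicator"
```

Compare this estimate against the `ReplicatorBytesInPerSec` metric when budgeting data processing charges.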

**Note**  
If you're using Identical topic name replication (**Keep the same topics name** in console) to set up active-active topology, wait at least 30 seconds after deleting a topic before re-creating a topic with the same name. This waiting period helps to prevent duplicated messages being replicated back to the source cluster. Your consumers must be able to reprocess duplicate messages without downstream impact. See [Considerations for building multi-Region Apache Kafka applications](msk-replicator-increase-resiliency.md#msk-replication-multi-region-kafka-applications).

# Migrate from one Amazon MSK cluster to another using MSK Replicator
<a name="msk-replicator-migrate-cluster"></a>

You can use Identical topic name replication for cluster migration, but your consumers must be able to handle duplicate messages without downstream impact. This is because MSK Replicator provides at-least-once replication, which can lead to duplicate messages in rare scenarios. If your consumers meet this requirement, follow these steps.

1. Create a Replicator that replicates data from your old cluster to the new cluster with Replicator's starting position set to *Earliest* and using Identical topic name replication (**Keep the same topics name** in console).

1. Configure cluster-level settings and permissions on the new cluster. You do not need to configure topic-level settings and “literal” read ACLs, as MSK Replicator automatically copies them. 

1. Monitor the `MessageLag` metric in Amazon CloudWatch until it reaches 0, which indicates that all data has been replicated.

1. After all data has been replicated, stop producers from writing data to the old cluster.

1. Reconfigure those producers to connect to the new cluster and start them.

1. Monitor the `MaxOffsetLag` metric for your consumers reading data from the old cluster until it becomes `0`, which indicates that all existing data has been processed.

1. Stop consumers that are connecting to the old cluster.

1. Reconfigure consumers to connect to the new cluster and start them.
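The cutover above hinges on waiting for lag metrics to reach 0. The following bash sketch shows the shape of such a polling loop; the canned values stand in for successive `MessageLag` readings that, in practice, you would fetch from Amazon CloudWatch (for example, with `aws cloudwatch get-metric-statistics`).

```shell
#!/bin/bash
# Poll a replication-lag stub until it reaches 0, then signal cutover.
# The positional parameters below are canned MessageLag values for illustration.
set -- 120 45 0
polls=0
while [ "$1" -ne 0 ]; do
  echo "MessageLag: $1, waiting..."   # a real loop would sleep between polls
  polls=$((polls + 1))
  shift
done
echo "MessageLag reached 0 after $polls polls: safe to stop producers on the old cluster."
```

The same loop shape applies to step 6, with `MaxOffsetLag` for your consumers in place of `MessageLag`.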

# Migrate from self-managed MirrorMaker2 to MSK Replicator
<a name="msk-replicator-migrate-mirrormaker2"></a>

To migrate from MirrorMaker (MM2) to MSK Replicator, follow these steps:

1. Stop the producer that is writing to your source Amazon MSK cluster.

1. Allow MM2 to replicate all the messages on your source cluster's topics. You can monitor the consumer lag for the MM2 consumer on your source MSK cluster to determine when all data has been replicated.

1. Create a new Replicator with the starting position set to *Latest* and the topic name configuration set to `IDENTICAL` (**Keep the same topics name** in console).

1. Once your Replicator is in the RUNNING state, you can start the producers writing to the source cluster again.

# Troubleshoot MSK Replicator
<a name="msk-replicator-troubleshooting"></a>

The following information can help you troubleshoot problems that you might have with MSK Replicator. See [Troubleshoot your Amazon MSK cluster](troubleshooting.md) for problem solving information about other Amazon MSK features. You can also post your issue to [AWS re:Post](https://repost.aws/).

## MSK Replicator state goes from CREATING to FAILED
<a name="msk-replicator-troubleshooting-failed-state"></a>

Here are some common causes for MSK Replicator creation failure.

1. Verify that the security groups you provided for the Replicator creation in the Target cluster section have outbound rules to allow traffic to your target cluster's security groups. Also verify that your target cluster's security groups have inbound rules that accept traffic from the security groups you provide for the Replicator creation in the Target cluster section. See [Choose your target cluster](msk-replicator-create-console.md#msk-replicator-create-console-choose-target).

1. If you are creating a Replicator for cross-Region replication, verify that your source cluster has multi-VPC connectivity turned on for the IAM access control authentication method. See [Amazon MSK multi-VPC private connectivity in a single Region](aws-access-mult-vpc.md). Also verify that the cluster policy is set up on the source cluster so that MSK Replicator can connect to the source cluster. See [Prepare the Amazon MSK source cluster](msk-replicator-prepare-cluster.md).

1. Verify that the IAM role that you provided during MSK Replicator creation has the permissions required to read and write to your source and target clusters. Also, verify that the IAM role has permissions to write to topics. See [Configure replicator settings and permissions](msk-replicator-create-console.md#msk-replicator-create-settings).

1. Verify that your network ACLs are not blocking the connection between the MSK Replicator and your source and target clusters.

1. It's possible that the source or target cluster was not fully available when the MSK Replicator tried to connect to it. This might be due to excessive load, or high disk or CPU usage, which causes the Replicator to be unable to connect to the brokers. Fix the issue with the brokers and retry Replicator creation.

After you have performed the validations above, create the MSK Replicator again.

## MSK Replicator appears stuck in the CREATING state
<a name="msk-replicator-troubleshooting-stuck-creating"></a>

Sometimes MSK Replicator creation can take up to 30 minutes. Wait for 30 minutes and check the state of the Replicator again.

## MSK Replicator is not replicating data or replicating only partial data
<a name="msk-replicator-troubleshooting-not-replicating"></a>

Follow these steps to troubleshoot data replication problems.

1. Verify that your Replicator is not running into any authentication errors using the `AuthError` metric provided by MSK Replicator in Amazon CloudWatch. If this metric is above 0, check whether the policy of the IAM role you provided for the Replicator is valid and that no deny permissions are set for the cluster. Based on the `clusterAlias` dimension, you can identify whether the source or target cluster is experiencing authentication errors.

1. Verify that your source and target clusters are not experiencing any issues. It is possible that the Replicator cannot connect to your source or target cluster. This might happen due to too many connections, a disk at full capacity, or high CPU usage.

1. Verify that your source and target clusters are reachable from MSK Replicator using the `KafkaClusterPingSuccessCount` metric in Amazon CloudWatch. Based on the `clusterAlias` dimension, you can identify whether the connectivity issue is with the source or target cluster. If this metric is 0 or has no datapoints, the connection is unhealthy. Check the network and the IAM role permissions that MSK Replicator uses to connect to your clusters.

1. Verify that your Replicator is not running into failures due to missing topic-level permissions using the `ReplicatorFailure` metric in Amazon CloudWatch. If this metric is above 0, check the IAM role you provided for topic-level permissions.

1. Verify that the regular expression you provided in the allow list while creating the Replicator matches the names of the topics you want to replicate. Also, verify that the topics are not being excluded from replication due to a regular expression in the deny list.

1. Note that it may take up to 30 seconds for the Replicator to detect and create the new topics or topic partitions on the target cluster. Any messages produced to the source topic before the topic has been created on the target cluster are not replicated if the Replicator's starting position is *Latest* (the default). Alternatively, you can start replication from the earliest offset in the source cluster topic partitions if you want to replicate existing messages on your topics to the target cluster. See [Configure replicator settings and permissions](msk-replicator-create-console.md#msk-replicator-create-settings).
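To reason about the allow-list and deny-list checks in the steps above without a cluster, you can evaluate candidate topic names against both patterns; a topic is replicated only if it matches the allow-list regular expression and does not match the deny-list one. The patterns below are invented examples, not your Replicator's configuration.

```shell
#!/bin/bash
# Simulate allow/deny list evaluation for a few example topic names.
allow='orders.*'      # hypothetical allow-list pattern
deny='.*-internal'    # hypothetical deny-list pattern
replicated=""
for t in orders orders-eu orders-internal payments; do
  if printf '%s\n' "$t" | grep -Eq "^${allow}$" && \
     ! printf '%s\n' "$t" | grep -Eq "^${deny}$"; then
    replicated="$replicated $t"
  fi
done
echo "replicated:$replicated"   # orders-internal and payments are excluded
```

If a topic you expect to see is excluded here, adjust the allow or deny pattern on the Replicator accordingly.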

## Message offsets in the target cluster are different than the source cluster
<a name="msk-replicator-troubleshooting-different-message-offsets"></a>

As part of replicating data, MSK Replicator consumes messages from the source cluster and produces them to the target cluster. This can lead to messages having different offsets on your source and target clusters. However, if you turned on consumer group offset syncing during Replicator creation, MSK Replicator automatically translates the offsets while copying the metadata, so that after failing over to the target cluster, your consumers can resume processing from near where they left off in the source cluster.

## MSK Replicator is not syncing consumer groups offsets or consumer group does not exist on target cluster
<a name="msk-replicator-troubleshooting-not-syncing-consumer-groups"></a>

Follow these steps to troubleshoot metadata replication problems.

1. Verify that your data replication is working as expected. If not, see [MSK Replicator is not replicating data or replicating only partial data](#msk-replicator-troubleshooting-not-replicating).

1. Verify that the regular expression you provided in the allow list while creating the Replicator matches the names of the consumer groups you want to replicate. Also, verify that the consumer groups are not being excluded from replication due to a regular expression in the deny list.

1. Verify that MSK Replicator has created the topic on the target cluster. It may take up to 30 seconds for the Replicator to detect and create the new topics or topic partitions on the target cluster. Any messages produced to the source topic before the topic has been created on the target cluster are not replicated if the Replicator's starting position is *Latest* (the default). If your consumer group on the source cluster has only consumed messages that have not been replicated by MSK Replicator, the consumer group is not replicated to the target cluster. After the topic is successfully created on the target cluster, MSK Replicator starts replicating newly written messages on the source cluster to the target. Once your consumer group starts reading these messages from the source, MSK Replicator automatically replicates the consumer group to the target cluster. Alternatively, you can start replication from the earliest offset in the source cluster topic partitions if you want to replicate existing messages on your topics to the target cluster. See [Configure replicator settings and permissions](msk-replicator-create-console.md#msk-replicator-create-settings).

**Note**  
MSK Replicator optimizes consumer group offset syncing for consumers on the source cluster that are reading from a position close to the end of the topic partition. If your consumer groups are lagging on the source cluster, you may see higher lag for those consumer groups on the target than on the source. This means that after failover to the target cluster, your consumers will reprocess more duplicate messages. To reduce this lag, your consumers on the source cluster need to catch up and start consuming from the tip of the stream (the end of the topic partition). As your consumers catch up, MSK Replicator automatically reduces the lag.

## Replication latency is high or keeps increasing
<a name="msk-replicator-troubleshooting-high-latency"></a>

Here are some common causes for high replication latency.

1. Verify that you have the right number of partitions on your source and target MSK clusters. Having too few or too many partitions can impact performance. For guidance on choosing the number of partitions, see [Best practices for using MSK Replicator](msk-replicator-best-practices.md). The following table shows the recommended minimum number of partitions for getting the throughput you want with MSK Replicator.  
**Throughput and recommended minimum number of partitions**    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/msk/latest/developerguide/msk-replicator-troubleshooting.html)

1. Verify that you have enough read and write capacity in your source and target MSK clusters to support the replication traffic. MSK Replicator acts as a consumer for your source cluster (egress) and as a producer for your target cluster (ingress). Therefore, you should provision cluster capacity to support the replication traffic in addition to other traffic on your clusters. See [Best practices for using MSK Replicator](msk-replicator-best-practices.md) for guidance on sizing your MSK clusters.

1. Replication latency might vary for MSK clusters in different source and destination AWS Region pairs, depending on how geographically far apart the clusters are from each other. For example, replication latency is typically lower when replicating between clusters in the Europe (Ireland) and Europe (London) Regions than between clusters in the Europe (Ireland) and Asia Pacific (Sydney) Regions.

1. Verify that your Replicator is not getting throttled due to overly aggressive quotas set on your source or target clusters. You can use the `ThrottleTime` metric provided by MSK Replicator in Amazon CloudWatch to see the average time in milliseconds a request was throttled by brokers on your source or target cluster. If this metric is above 0, you should adjust Kafka quotas to reduce throttling so that the Replicator can catch up. See [Managing MSK Replicator throughput using Kafka quotas](msk-replicator-best-practices.md#msk-replicator-manage-throughput-kafka-quotas) for information on managing Kafka quotas for the Replicator.

1. ReplicationLatency and MessageLag might increase when an AWS Region becomes degraded. Use the [AWS Service Health Dashboard](https://health.aws.amazon.com/health/status) to check for an MSK service event in the Region where your primary MSK cluster is located. If there's a service event, you can temporarily redirect your application reads and writes to the other Region.

## Troubleshooting MSK Replicator failures using ReplicatorFailure metric
<a name="msk-replicator-troubleshooting-ReplicatorFailure"></a>

The `ReplicatorFailure` metric helps you monitor and detect replication issues in MSK Replicator. A non-zero value of this metric typically indicates a replication failure, which might result from the following factors:
+ message size limitations
+ timestamp range violations
+ record batch size problems

If the `ReplicatorFailure` metric reports a non-zero value, follow these steps to troubleshoot the issue.

**Note**  
For more information about this metric, see [MSK Replicator metrics](msk-replicator-monitor.md#msk-replicator-metrics).

1. Configure a client that is able to connect to the target MSK cluster and has Apache Kafka CLI tools setup. For information about setting up the client and Kafka CLI tool, see [Connect to an Amazon MSK Provisioned cluster](client-access.md).

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/home?region=us-east-1#/home/](https://console.aws.amazon.com/msk/home?region=us-east-1#/home/).

   Then, do the following:

   1. Obtain the ARNs of MSK Replicator and target MSK cluster.

   1. [Obtain the broker endpoints](get-bootstrap-console.md) of the target MSK cluster. You'll use these endpoints in the following steps.

1. Run the following commands to export the MSK Replicator ARN and broker endpoints you obtained in the previous step.

   Make sure that you replace the placeholder values for <*ReplicatorARN*>, <*BootstrapServerString*>, and <*ConsumerConfigFile*> used in the following examples with their actual values.

   ```
   export TARGET_CLUSTER_SERVER_STRING=<BootstrapServerString>
   ```

   ```
   export REPLICATOR_ARN=<ReplicatorARN>
   ```

   ```
   export CONSUMER_CONFIG_FILE=<ConsumerConfigFile>
   ```

1. In your `<path-to-your-kafka-installation>/bin` directory, do the following:

   1. Save the following script and name it **query-replicator-failure-message.sh**.

      ```
      #!/bin/bash
      
      # Script: Query MSK Replicator Failure Message
      # Description: This script queries exceptions from AWS MSK Replicator status topics
      # It takes a replicator ARN and bootstrap server as input and searches for replicator exceptions
      # in the replicator's status topic, formatting and displaying them in a readable manner
      #
      # Required Arguments:
      #   --replicator-arn: The ARN of the AWS MSK Replicator
      #   --bootstrap-server: The Kafka bootstrap server to connect to
      #   --consumer.config: Consumer config properties file
      # Usage Example:
      #   ./query-replicator-failure-message.sh --replicator-arn <replicator-arn> --bootstrap-server <bootstrap-server> --consumer.config <consumer.config>
      
      print_usage() {
        echo "USAGE: $0 --replicator-arn <replicator-arn> --bootstrap-server <bootstrap-server> --consumer.config <consumer.config>"
        echo "--replicator-arn <String: MSK Replicator ARN>      REQUIRED: The ARN of AWS MSK Replicator."
        echo "--bootstrap-server <String: server to connect to>  REQUIRED: The Kafka server to connect to."
        echo "--consumer.config <String: config file>            REQUIRED: Consumer config properties file."
        exit 1
      }
      
      # Initialize variables
      replicator_arn=""
      bootstrap_server=""
      consumer_config=""
      
      # Parse arguments
      while [[ $# -gt 0 ]]; do
        case "$1" in
          --replicator-arn)
            if [ -z "$2" ]; then
              echo "Error: --replicator-arn requires an argument."
              print_usage
            fi
            replicator_arn="$2"; shift 2 ;;
          --bootstrap-server)
            if [ -z "$2" ]; then
              echo "Error: --bootstrap-server requires an argument."
              print_usage
            fi
            bootstrap_server="$2"; shift 2 ;;
          --consumer.config)
            if [ -z "$2" ]; then
              echo "Error: --consumer.config requires an argument."
              print_usage
            fi
            consumer_config="$2"; shift 2 ;;
          *) echo "Unknown option: $1"; print_usage ;;
        esac
      done
      
      # Check for required arguments
      if [ -z "$replicator_arn" ] || [ -z "$bootstrap_server" ] || [ -z "$consumer_config" ]; then
        echo "Error: --replicator-arn, --bootstrap-server, and --consumer.config are required."
        print_usage
      fi
      
      # Extract replicator name and suffix from ARN
      replicator_arn_suffix=$(echo "$replicator_arn" | awk -F'/' '{print $NF}')
      replicator_name=$(echo "$replicator_arn" | awk -F'/' '{print $(NF-1)}')
      echo "Replicator name: $replicator_name"
      
      # List topics and find the status topic
      topics=$(./kafka-topics.sh --command-config "$consumer_config" --list --bootstrap-server "$bootstrap_server")
      status_topic_name="__amazon_msk_replicator_status_${replicator_name}_${replicator_arn_suffix}"
      
      # Check if the status topic exists
      if echo "$topics" | grep -Fq "$status_topic_name"; then
        echo "Found replicator status topic: '$status_topic_name'"
        ./kafka-console-consumer.sh --bootstrap-server "$bootstrap_server" --consumer.config "$consumer_config" --topic "$status_topic_name" --from-beginning | stdbuf -oL grep "Exception" | stdbuf -oL sed -n 's/.*Exception:\(.*\) Topic: \([^,]*\), Partition: \([^\]*\).*/ReplicatorException:\1 Topic: \2, Partition: \3/p'
      else
        echo "No topic matching the pattern '$status_topic_name' found."
      fi
      ```

   1. Run this script to query the MSK Replicator failure messages.

      ```
      <path-to-your-kafka-installation>/bin/query-replicator-failure-message.sh --replicator-arn $REPLICATOR_ARN --bootstrap-server $TARGET_CLUSTER_SERVER_STRING --consumer.config $CONSUMER_CONFIG_FILE
      ```

      This script outputs all the errors with their exception messages and the affected topic-partitions. You can use this exception information to mitigate the failures as described in [Common MSK Replicator failures and their solutions](#msk-replicator-ReplicatorFailure-error-mitigation). Because the topic contains all the historical failure messages, start your investigation with the most recent message. The following is an example of a failure message.

      ```
      ReplicatorException: The request included a message larger than the max message size the server will accept. Topic: test, Partition: 1
      ```

### Common MSK Replicator failures and their solutions
<a name="msk-replicator-ReplicatorFailure-error-mitigation"></a>

The following list describes some of the MSK Replicator failures that you might experience and how to mitigate them.

**Message size larger than max.request.size**  
**Cause**  
This failure occurs when the MSK Replicator fails to replicate data because the individual message size exceeds 10 MB. By default, MSK Replicator replicates messages up to 10 MB in size.
The following is an example of this failure message type.  

```
ReplicatorException: The message is 20635370 bytes when serialized which is larger than 10485760, which is the value of the max.request.size configuration. Topic: test, Partition: 1
```
**Solution**  
Reduce the individual message sizes in your topic. If you're unable to do so, follow these instructions for [requesting a limit increase](limits.md#request-msk-quota-increase).

**Message size larger than the max message size the server will accept**  
**Cause**  
This failure occurs when the message size exceeds the target cluster’s maximum message size.
The following is an example of this failure message type.  

```
ReplicatorException: The request included a message larger than the max message size the server will accept. Topic: test, Partition: 1
```
**Solution**  
Increase the `max.message.bytes` configuration on the target cluster or corresponding target cluster topic. Set the target cluster’s `max.message.bytes` configuration to match your largest uncompressed message size. For information about doing this, see [max.message.bytes](https://kafka.apache.org/documentation/#topicconfigs_max.message.bytes).
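For example, a sketch of raising the limit on a single target topic with the Kafka CLI is shown below; the bootstrap server, client properties file, topic name, and byte value are placeholders you must replace with your own values.

```
<path-to-your-kafka-installation>/bin/kafka-configs.sh --bootstrap-server <target-cluster-bootstrap-server> \
  --command-config <client-properties-for-iam-auth> \
  --alter --entity-type topics --entity-name <topic-name> \
  --add-config max.message.bytes=10485760
```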

**Timestamp is out of range**  
**Cause**  
This failure occurs because the individual message timestamp falls outside of the target cluster’s allowed range.
The following is an example of this failure message type.  

```
ReplicatorException: Timestamp 1730137653724 of message with offset 0 is out of range. The timestamp should be within [1730137892239, 1731347492239] Topic: test, Partition: 1
```
**Solution**  
Update the target cluster’s `message.timestamp.before.max.ms` configuration to allow for messages with older timestamps. For information about doing this, see [message.timestamp.before.max.ms](https://kafka.apache.org/documentation/#topicconfigs_message.timestamp.before.max.ms).

**Record batch too large**  
**Cause**  
This failure occurs because the record batch size exceeds the segment size set for the topic on the target cluster. MSK Replicator supports a maximum batch size of 1 MB.
The following is an example of this failure message type.  

```
ReplicatorException: The request included message batch larger than the configured segment size on the server. Topic: test, Partition: 1
```
**Solution**  
The target cluster's `segment.bytes` configuration must be at least as large as the Replicator's maximum batch size (1 MB) for replication to proceed without errors. Update the target cluster's `segment.bytes` to be at least 1048576 (1 MB). For information about doing this, see [segment.bytes](https://kafka.apache.org/documentation/#topicconfigs_segment.bytes).

**Note**  
If the ReplicatorFailure metric continues to emit non-zero values after applying these solutions, repeat the troubleshooting process until the metric emits a value of zero.

# Best practices for using MSK Replicator
<a name="msk-replicator-best-practices"></a>

This section covers common best practices and implementation strategies for using Amazon MSK Replicator.

**Contents**
+ [Managing MSK Replicator throughput using Kafka quotas](#msk-replicator-manage-throughput-kafka-quotas)
+ [Setting cluster retention period](#msk-replicator-retention-period)

## Managing MSK Replicator throughput using Kafka quotas
<a name="msk-replicator-manage-throughput-kafka-quotas"></a>

Since MSK Replicator acts as a consumer for your source cluster, replication can cause other consumers to be throttled on your source cluster. The amount of throttling depends on the read capacity you have on your source cluster and the throughput of the data you're replicating. We recommend that you provision identical capacity for your source and target clusters, and account for the replication throughput when calculating how much capacity you need.

You can also set Kafka quotas for the Replicator on your source and target clusters to control how much capacity the MSK Replicator can use. A network bandwidth quota is recommended. A network bandwidth quota defines a byte-rate threshold, in bytes per second, for one or more clients sharing the quota. The quota is applied on a per-broker basis.

Follow these steps to apply a quota.

1. Retrieve the bootstrap server string for the source cluster. See [Get the bootstrap brokers for an Amazon MSK cluster](msk-get-bootstrap-brokers.md).

1. Retrieve the service execution role (SER) used by the MSK Replicator. This is the SER you used for a `CreateReplicator` request. You can also pull the SER from the DescribeReplicator response from an existing Replicator.

1. Using Kafka CLI tools, run the following command against the source cluster.

   ```
   ./kafka-configs.sh --bootstrap-server <source-cluster-bootstrap-server> --alter --add-config 'consumer_byte_rate=<quota_in_bytes_per_second>' --entity-type users --entity-name arn:aws:sts::<customer-account-id>:assumed-role/<ser-role-name>/<customer-account-id> --command-config <client-properties-for-iam-auth>
   ```

1. After executing the above command, verify that the `ReplicatorThroughput` metric does not cross the quota you have set.

Note that if you reuse a service execution role between multiple MSK Replicators, they are all subject to this quota. If you want to maintain separate quotas per Replicator, use separate service execution roles.

For more information on using MSK IAM authentication with quotas, see [Multi-tenancy Apache Kafka clusters in Amazon MSK with IAM access control and Kafka Quotas – Part 1](https://aws.amazon.com/blogs/big-data/multi-tenancy-apache-kafka-clusters-in-amazon-msk-with-iam-access-control-and-kafka-quotas-part-1/).

**Warning**  
Setting an extremely low `consumer_byte_rate` may cause your MSK Replicator to act in unexpected ways.

## Setting cluster retention period
<a name="msk-replicator-retention-period"></a>

You can set the log retention period for MSK provisioned and serverless clusters. The recommended retention period is 7 days. See [Cluster configuration changes](msk-replicator-serverless-requirements.md#msk-replicator-config-changes) or [Supported MSK Serverless cluster configuration](msk-replicator-serverless-requirements.md).