

# Best practices
<a name="msk-replicator-best-practices"></a>

This section covers common best practices and implementation strategies for using Amazon MSK Replicator.

**Topics**
+ [Sizing and partitions](msk-replicator-bp-sizing.md)
+ [Managing throughput with Kafka quotas](msk-replicator-bp-quotas.md)
+ [Retention period](msk-replicator-bp-retention.md)
+ [Cluster configuration changes after Replicator creation](msk-replicator-post-creation-config.md)
+ [Considerations for multi-Region applications](msk-replicator-bp-multi-region.md)

# Sizing and partitions
<a name="msk-replicator-bp-sizing"></a>

The number of partitions on your source and target MSK clusters directly affects replication performance. Too few or too many partitions can degrade throughput.

The following table shows the recommended minimum number of partitions for getting the throughput you want with MSK Replicator.


**Throughput and recommended minimum number of partitions**  

| Throughput (MB/s) | Minimum number of partitions required | 
| --- | --- | 
| 50 | 167 | 
| 100 | 334 | 
| 250 | 833 | 
| 500 | 1666 | 
| 1000 | 3333 | 
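As a quick sanity check, the table above can be encoded as a small lookup. This is a sketch; the function name and interface are illustrative:

```
min_partitions() {
  # Recommended minimum partition counts, taken from the table above.
  case "$1" in
    50)   echo 167 ;;
    100)  echo 334 ;;
    250)  echo 833 ;;
    500)  echo 1666 ;;
    1000) echo 3333 ;;
    *)    echo "no recommendation for $1 MB/s in the table" >&2; return 1 ;;
  esac
}

min_partitions 250   # prints 833
```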

Verify that you have enough read and write capacity in your source and target MSK clusters to support the replication traffic. MSK Replicator acts as a consumer for your source cluster (egress) and as a producer for your target cluster (ingress). Therefore, you should provision cluster capacity to support the replication traffic in addition to other traffic on your clusters.

We recommend that you provision identical capacity for your source and target clusters, and account for the replication throughput when calculating how much capacity you need.
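For example, the additional capacity can be estimated by adding the replication throughput to your existing traffic. The numbers below are illustrative assumptions, not recommendations:

```
APP_EGRESS_MBPS=80    # existing consumer read traffic on the source cluster (assumption)
APP_INGRESS_MBPS=50   # existing producer write traffic on the target cluster (assumption)
REPLICATION_MBPS=50   # throughput that the Replicator copies (assumption)

# The Replicator reads from the source cluster (egress) and writes to the
# target cluster (ingress), so add its throughput on both sides.
SOURCE_EGRESS_NEEDED=$(( APP_EGRESS_MBPS + REPLICATION_MBPS ))
TARGET_INGRESS_NEEDED=$(( APP_INGRESS_MBPS + REPLICATION_MBPS ))
echo "source egress: ${SOURCE_EGRESS_NEEDED} MB/s, target ingress: ${TARGET_INGRESS_NEEDED} MB/s"
```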

# Managing throughput with Kafka quotas
<a name="msk-replicator-bp-quotas"></a>

Because MSK Replicator acts as a consumer for your source cluster, replication can cause other consumers on the source cluster to be throttled. The amount of throttling depends on the read capacity available on your source cluster and the throughput of the data you are replicating.

You can set Kafka quotas for the Replicator on your source and target clusters to control how much capacity MSK Replicator can use. We recommend a network bandwidth quota. A network bandwidth quota sets a byte-rate threshold, in bytes per second, for one or more clients that share the quota. The quota is applied on a per-broker basis.
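Because the quota is applied per broker, one way to derive a `consumer_byte_rate` value is to divide the aggregate read throughput you want to allow the Replicator by the broker count. The numbers below are illustrative assumptions:

```
AGGREGATE_MBPS=60   # total read throughput to allow the Replicator (assumption)
BROKER_COUNT=3      # number of brokers in the source cluster (assumption)

# consumer_byte_rate is expressed in bytes per second and enforced per broker.
CONSUMER_BYTE_RATE=$(( AGGREGATE_MBPS * 1024 * 1024 / BROKER_COUNT ))
echo "$CONSUMER_BYTE_RATE"   # 20971520
```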

Follow these steps to apply a quota:

1. Retrieve the bootstrap server string for the source cluster. See [Get the bootstrap brokers for an Amazon MSK cluster](msk-get-bootstrap-brokers.md).

1. Retrieve the service execution role (SER) used by the MSK Replicator. This is the role you specified in the `CreateReplicator` request. You can also retrieve it from the `DescribeReplicator` response.

1. Using Kafka CLI tools, run the following command against the source cluster:

   ```
   ./kafka-configs.sh --bootstrap-server <source-cluster-bootstrap-server> \
     --alter \
     --add-config 'consumer_byte_rate=<quota_in_bytes_per_second>' \
     --entity-type users \
     --entity-name arn:aws:sts::<customer-account-id>:assumed-role/<ser-role-name>/<customer-account-id> \
     --command-config <client-properties-for-iam-auth>
   ```

1. After you run the command, verify that the `ReplicatorThroughput` metric stays below the quota you set.
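To confirm that the quota was applied, you can describe the entity's configuration with the same tool, using the same placeholders as in the command above:

```
./kafka-configs.sh --bootstrap-server <source-cluster-bootstrap-server> \
  --describe \
  --entity-type users \
  --entity-name arn:aws:sts::<customer-account-id>:assumed-role/<ser-role-name>/<customer-account-id> \
  --command-config <client-properties-for-iam-auth>
```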

If you reuse a service execution role between multiple MSK Replicators, they are all subject to this quota. If you want to maintain separate quotas per Replicator, use separate service execution roles.

For more information on using MSK IAM authentication with quotas, see [Multi-tenancy Apache Kafka clusters in Amazon MSK with IAM access control and Kafka Quotas – Part 1](https://aws.amazon.com/blogs/big-data/multi-tenancy-apache-kafka-clusters-in-amazon-msk-with-iam-access-control-and-kafka-quotas-part-1/).

**Warning**  
Setting an extremely low `consumer_byte_rate` may cause MSK Replicator to behave in unexpected ways.

# Retention period
<a name="msk-replicator-bp-retention"></a>

You can set the log retention period for MSK Provisioned and Serverless clusters. The recommended retention period is 7 days. See [Cluster configuration changes after Replicator creation](msk-replicator-post-creation-config.md) or [MSK Serverless cluster configuration](msk-replicator-supported-configs.md#msk-replicator-serverless-config).
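For reference, 7 days expressed in milliseconds, the unit used by Kafka's `retention.ms` setting, is:

```
# 7 days in milliseconds, for use with Kafka's retention.ms setting.
RETENTION_MS=$(( 7 * 24 * 60 * 60 * 1000 ))
echo "$RETENTION_MS"   # 604800000
```

At the topic level, such a value could be applied with `kafka-configs.sh --alter --entity-type topics --add-config retention.ms=604800000`; cluster-level retention on MSK is set through the cluster configuration.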

# Cluster configuration changes after Replicator creation
<a name="msk-replicator-post-creation-config"></a>
+ We recommend that you do not turn tiered storage on or off after creating the MSK Replicator. If your target cluster does not use tiered storage, MSK does not copy the tiered storage configurations, regardless of whether your source cluster uses it. If you turn on tiered storage on the target cluster after the Replicator is created, you must recreate the Replicator. If you want to copy data from a non-tiered to a tiered cluster, do not copy topic configurations.
+ Do not make the following cluster configuration changes after MSK Replicator creation, because they are validated only at creation time:
  + Changing the MSK cluster to the t3 instance type.
  + Changing service execution role permissions.
  + Disabling MSK multi-VPC private connectivity.
  + Changing the attached cluster resource-based policy.
  + Changing cluster security group rules.
+ For identical topic name replication configurations, do not modify the headers that MSK Replicator adds (`__mskmr`); doing so risks cyclic replication.
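One way to spot-check that the `__mskmr` headers are intact is to print record headers with the console consumer. The placeholders and flags below are a sketch, not a definitive invocation:

```
./kafka-console-consumer.sh --bootstrap-server <target-cluster-bootstrap-server> \
  --topic <replicated-topic> \
  --from-beginning --max-messages 10 \
  --property print.headers=true \
  --consumer.config <client-properties-for-iam-auth>
```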

# Considerations for multi-Region applications
<a name="msk-replicator-bp-multi-region"></a>

When building multi-Region Apache Kafka applications with MSK Replicator, keep the following in mind:
+ **Idempotent consumers:** Your consumers must be able to reprocess duplicate messages without downstream impact. MSK Replicator replicates data at least once, which can result in duplicates in the standby cluster. When you fail over to the secondary AWS Region, your consumers may process the same data more than once. Because MSK Replicator prioritizes copying data over copying consumer offsets for better performance, a consumer may start reading from earlier offsets after a failover, resulting in duplicate processing.
+ **Tolerate minimal data loss:** Producers and consumers must be able to tolerate minimal data loss. Because MSK Replicator replicates data asynchronously, there is no guarantee that all data has been replicated to the secondary Region when the primary AWS Region starts experiencing failures. You can use the replication latency to estimate the maximum amount of data that was not copied to the secondary Region.
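The idempotent-consumer requirement above can be sketched as a dedup step keyed on a stable record ID. The ID scheme and the store are illustrative; a real consumer would persist the seen-ID store durably:

```
SEEN_IDS=$(mktemp)   # stand-in for a durable store of processed record IDs

process_once() {
  id="$1"
  # Skip records already processed; replayed duplicates become no-ops.
  if grep -qx "$id" "$SEEN_IDS"; then
    return 0
  fi
  echo "$id" >> "$SEEN_IDS"
  echo "processed $id"
}

process_once order-17   # prints "processed order-17"
process_once order-17   # duplicate after failover: prints nothing
```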