

# Storage management for Standard brokers
<a name="msk-storage-management"></a>

Amazon MSK provides features to help you with storage management on your MSK clusters.

**Note**  
With [Express brokers](msk-broker-types-express.md), you don't need to provision or manage any storage resources used for your data. This simplifies cluster management and eliminates a common cause of operational issues with Apache Kafka clusters. You also spend less because you don't have to provision idle storage capacity; you pay only for what you use.

**Standard broker type**  
With [Standard brokers](msk-broker-types-standard.md), you can choose from a variety of storage options and capabilities.

For information about managing throughput, see [Provision storage throughput for Standard brokers in an Amazon MSK cluster](msk-provision-throughput.md).

**Topics**
+ [Tiered storage for Standard brokers](msk-tiered-storage.md)
+ [Scale up Amazon MSK Standard broker storage](msk-update-storage.md)
+ [Manage storage throughput for Standard brokers in an Amazon MSK cluster](msk-provision-throughput-management.md)

# Tiered storage for Standard brokers
<a name="msk-tiered-storage"></a>

Tiered storage is a low-cost storage tier for Amazon MSK that scales to virtually unlimited storage, making it cost-effective to build streaming data applications.

You can create an Amazon MSK cluster configured with tiered storage that balances performance and cost. Amazon MSK stores streaming data in a performance-optimized primary storage tier until it reaches the Apache Kafka topic retention limits. Then, Amazon MSK automatically moves data into the new low-cost storage tier.

When your application starts reading data from the tiered storage, you can expect an increase in read latency for the first few bytes. As you start reading the remaining data sequentially from the low-cost tier, you can expect latencies that are similar to the primary storage tier. You don't need to provision any storage for the low-cost tiered storage or manage the infrastructure. You can store any amount of data and pay only for what you use. This feature is compatible with the APIs introduced in [KIP-405: Kafka Tiered Storage](https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage).

For information about sizing, monitoring, and optimizing your MSK tiered storage cluster, see [Best practices for running production workloads using Amazon MSK tiered storage](https://aws.amazon.com/blogs/big-data/best-practices-for-running-production-workloads-using-amazon-msk-tiered-storage/).

Here are some of the features of tiered storage:
+ You can scale to virtually unlimited storage. You don't have to guess how to scale your Apache Kafka infrastructure.
+ You can retain data longer in your Apache Kafka topics, or increase your topic storage, without the need to increase the number of brokers.
+ It provides a longer safety buffer to handle unexpected delays in processing.
+ You can reprocess old data in its exact production order with your existing stream processing code and Kafka APIs.
+ Partitions rebalance faster because data on secondary storage doesn't require replication across broker disks.
+ Data between brokers and the tiered storage moves within the VPC and doesn't travel through the internet.
+ A client machine can use the same process to connect to new clusters with tiered storage enabled as it does to connect to a cluster without tiered storage enabled. See [Create a client machine](https://docs.aws.amazon.com/msk/latest/developerguide/create-client-machine.html).

## Tiered storage requirements for Amazon MSK clusters
<a name="msk-tiered-storage-requirements"></a>
+ You must use Apache Kafka client version 3.0.0 or higher to create a new topic with tiered storage enabled. To enable tiered storage on an existing topic, you aren't restricted to a certain Apache Kafka client version. See [Step 4: Create a topic in the Amazon MSK cluster](create-topic.md).
+ An Amazon MSK cluster with tiered storage enabled must use Apache Kafka version 3.6.0 or higher, or version 2.8.2.tiered.

## Tiered storage constraints and limitations for Amazon MSK clusters
<a name="msk-tiered-storage-constraints"></a>

Tiered storage has the following constraints and limitations:
+ Make sure clients are not configured with `read_committed` when reading from the remote tier in Amazon MSK, unless the application actively uses the transactions feature.
+ Tiered storage isn't available in AWS GovCloud (US) regions.
+ Tiered storage applies only to provisioned mode clusters.
+ Tiered storage doesn’t support broker size t3.small.
+ The minimum retention period in low-cost storage is 3 days. There is no minimum retention period for primary storage.
+ Tiered storage doesn’t support Multiple Log directories on a broker (JBOD related features).
+ Tiered storage doesn't support compacted topics. Make sure that all topics that have tiered storage turned on have their `cleanup.policy` configured to `delete` only.
+ A tiered storage cluster doesn't support altering the `log.cleanup.policy` of a topic after it's created.
+ Tiered storage can be disabled for individual topics but not for the entire cluster. Once disabled, tiered storage cannot be re-enabled for a topic.
+ If you use Amazon MSK version 2.8.2.tiered, you can migrate only to another tiered storage-supported Apache Kafka version. If you don't want to continue using a tiered storage-supported version, create a new MSK cluster and migrate your data to it.
+ The kafka-log-dirs tool can't report tiered storage data size. The tool only reports the size of the log segments in primary storage.

For information about default settings and constraints you must be mindful of when you configure tiered storage at the topic level, see [Guidelines for Amazon MSK tiered storage topic-level configuration](msk-guidelines-tiered-storage-topic-level-config.md).

# How log segments are copied to tiered storage for an Amazon MSK topic
<a name="msk-tiered-storage-retention-rules"></a>

When you enable tiered storage for a new or existing topic, Apache Kafka copies closed log segments from primary storage to tiered storage.
+ Apache Kafka only copies closed log segments. It copies all messages within the log segment to tiered storage.
+ Active segments are not eligible for tiering. The log segment size (segment.bytes) or the segment roll time (segment.ms) controls the rate of segment closure, and the rate Apache Kafka then copies them to tiered storage.

Retention settings for a topic with tiered storage enabled are different from settings for a topic without tiered storage enabled. The following rules control the retention of messages in topics with tiered storage enabled:
+ You define retention in Apache Kafka with two settings: log.retention.ms (time) and log.retention.bytes (size). These settings determine the total duration and size of the data that Apache Kafka retains in the cluster. Whether or not you enable tiered storage mode, you set these configurations at the cluster level. You can override the settings at the topic level with topic configurations.
+ When you enable tiered storage, you can additionally specify how long the primary high-performance storage tier stores data. For example, if a topic has overall retention (log.retention.ms) setting of 7 days and local retention (local.retention.ms) of 12 hours, then the cluster primary storage retains data for only the first 12 hours. The low-cost storage tier retains the data for the full 7 days.
+ The usual retention settings apply to the full log. This includes its tiered and primary parts.
+ The local.retention.ms or local.retention.bytes settings control the retention of messages in primary storage. Apache Kafka copies closed log segments to tiered storage as soon as they close (based on segment.bytes or segment.ms), independent of local retention settings. After segments are copied to tiered storage, they remain in primary storage until the local.retention.ms or local.retention.bytes thresholds are reached. At that point, the data is deleted from primary storage but remains available in tiered storage. This allows you to keep recent data on high-performance primary storage for fast access while older data is served from the low-cost tiered storage.
+ After Apache Kafka copies a log segment to tiered storage, it removes the segment's messages from the cluster when the retention.ms or retention.bytes thresholds are reached.
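The interplay of the two retention settings can be sketched in a few lines, using the 7-day/12-hour example values above. The constant and function names here are illustrative only, not an Apache Kafka API:

```python
# Illustrative sketch of the tiered storage retention rules above.
RETENTION_MS = 7 * 24 * 60 * 60 * 1000    # retention.ms: 7 days overall
LOCAL_RETENTION_MS = 12 * 60 * 60 * 1000  # local.retention.ms: 12 hours in primary

def tiers_holding(age_ms):
    """Return the tiers that still retain a message of the given age,
    assuming its segment is closed and already copied to tiered storage."""
    tiers = []
    if age_ms <= LOCAL_RETENTION_MS:
        tiers.append("primary")   # recent data stays on high-performance storage
    if age_ms <= RETENTION_MS:
        tiers.append("tiered")    # the full log is retained in the low-cost tier
    return tiers

print(tiers_holding(1 * 60 * 60 * 1000))        # 1 hour old: both tiers
print(tiers_holding(2 * 24 * 60 * 60 * 1000))   # 2 days old: tiered only
print(tiers_holding(10 * 24 * 60 * 60 * 1000))  # 10 days old: expired everywhere
```

A message that is 2 days old, for example, has already been deleted from primary storage but is still served from tiered storage until the overall 7-day retention expires.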

## Example Amazon MSK tiered storage scenario
<a name="msk-tiered-storage-retention-scenario"></a>

This scenario illustrates how an existing topic that has messages in primary storage behaves when tiered storage is enabled. You enable tiered storage on this topic by setting remote.storage.enable to `true`. In this example, retention.ms is set to 5 days and local.retention.ms is set to 2 days. The following is the sequence of events when a segment expires.

**Time T0 - Before you enable tiered storage.**  
Before you enable tiered storage for this topic, there are two log segments. One of the segments is active for an existing topic partition 0.

![\[Time T0 - Before you enable tiered storage.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-1.png)


**Time T1 (< 2 days) - Tiered storage enabled. Segment 0 copied to tiered storage.**  
After you enable tiered storage for this topic, Apache Kafka copies closed log segment 0 to tiered storage as soon as it closes. The segment closes based on segment.bytes or segment.ms settings, not based on retention settings. Apache Kafka retains a copy in primary storage as well. The active segment 1 is not yet eligible to copy to tiered storage because it is still active and hasn't closed. In this timeline, Amazon MSK doesn't yet apply any of the retention settings (local.retention.ms/bytes, retention.ms/bytes) to the messages in segment 0 and segment 1.

![\[Time T1 (< 2 days) - Tiered storage enabled. Segment 0 copied to tiered storage.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-2.png)


**Time T2 - Local retention in effect.**  
After 2 days, the local retention threshold is reached for segment 0. The setting of local.retention.ms as 2 days determines this. Segment 0 is now deleted from primary storage, but it remains available in tiered storage. Note that segment 0 was already copied to tiered storage at Time T1 when it closed, not at Time T2 when local retention expired. Active segment 1 is neither eligible for deletion nor eligible to copy to tiered storage yet because it is still active.

![\[Time T2 - Local retention in effect.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-3.png)


**Time T3 - Overall retention in effect.**  
After 5 days, retention settings take effect, and Apache Kafka clears log segment 0 and its associated messages from tiered storage. Segment 1 is neither eligible for expiration nor eligible to copy to tiered storage yet because it is still active.

![\[Time T3 - Overall retention in effect.\]](http://docs.aws.amazon.com/msk/latest/developerguide/images/tiered-storage-segments-4.png)


# Create an Amazon MSK cluster with tiered storage using the AWS Management Console
<a name="msk-create-cluster-tiered-storage-console"></a>

This process describes how to create a tiered storage Amazon MSK cluster using the AWS Management Console.

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).

1. Choose **Create cluster**.

1. Choose **Custom create** for tiered storage.

1. Specify a name for the cluster.

1. For **Cluster type**, select **Provisioned**.

1. Choose an Apache Kafka version that supports tiered storage (3.6.0 or higher, or 2.8.2.tiered) for Amazon MSK to use to create the cluster.

1. Specify a size of broker other than **kafka.t3.small**.

1. Specify the number of Availability Zones that brokers are distributed across.

1. Select the number of brokers that you want Amazon MSK to create in each Availability Zone. The minimum is one broker per Availability Zone, and the maximum is 30 brokers per cluster.

1. Under **Storage options**, select **Tiered storage and EBS storage** to enable tiered storage mode.

1. Follow the remaining steps in the cluster creation wizard. When complete, **Tiered storage and EBS storage** appears as the cluster storage mode in the **Review and create** view.

1. Select **Create cluster**.

# Create an Amazon MSK cluster with tiered storage with the AWS CLI
<a name="msk-create-cluster-tiered-storage-cli"></a>

To enable tiered storage on a cluster, create the cluster with the correct Apache Kafka version and attribute for tiered storage. Follow the code example below. Also, complete the steps in the next section to [Create a Kafka topic with tiered storage enabled with the AWS CLI](#msk-create-topic-tiered-storage-cli).

See [create-cluster](https://docs.aws.amazon.com//cli/latest/reference/kafka/create-cluster.html) for a complete list of supported attributes for cluster creation.

```
aws kafka create-cluster \
    --cluster-name "MessagingCluster" \
    --broker-node-group-info file://brokernodegroupinfo.json \
    --number-of-broker-nodes 3 \
    --kafka-version "3.6.0" \
    --storage-mode "TIERED"
```

## Create a Kafka topic with tiered storage enabled with the AWS CLI
<a name="msk-create-topic-tiered-storage-cli"></a>

To complete the process that you started when you created a cluster with tiered storage enabled, also create a topic with tiered storage enabled, using the attributes in the following code example. The attributes specific to tiered storage are the following:
+ `local.retention.ms` (for example, 10 minutes) for time-based retention settings, or `local.retention.bytes` for log segment size limits.
+ `remote.storage.enable` set to `true` to enable tiered storage.

The following configuration uses local.retention.ms, but you can replace this attribute with local.retention.bytes. This attribute controls how long (or for how many bytes) Apache Kafka retains data in primary storage after copying it to tiered storage. See [Topic-level configuration](https://docs.aws.amazon.com//msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration) for more details on supported configuration attributes.

**Note**  
You must use Apache Kafka client version 3.0.0 or higher; `kafka-topics.sh` supports the `remote.storage.enable` setting only in those client versions. To enable tiered storage on an existing topic with an earlier version of the Apache Kafka client, see [Enabling tiered storage on an existing Amazon MSK topic](msk-enable-disable-topic-tiered-storage-cli.md#msk-enable-topic-tiered-storage-cli).

```
bin/kafka-topics.sh --create --bootstrap-server $bs --replication-factor 2 --partitions 6 --topic MSKTutorialTopic --config remote.storage.enable=true --config local.retention.ms=100000 --config retention.ms=604800000 --config segment.bytes=134217728
```

# Enable and disable tiered storage on an existing Amazon MSK topic
<a name="msk-enable-disable-topic-tiered-storage-cli"></a>

These sections cover how to enable and disable tiered storage on a topic that you've already created. To create a new cluster and topic with tiered storage enabled, see [Creating a cluster with tiered storage using the AWS Management Console](https://docs.aws.amazon.com//msk/latest/developerguide/msk-create-cluster-tiered-storage-console).

## Enabling tiered storage on an existing Amazon MSK topic
<a name="msk-enable-topic-tiered-storage-cli"></a>

To enable tiered storage on an existing topic, use the `alter` command syntax in the following example. When you enable tiered storage on an already existing topic, you aren't restricted to a certain Apache Kafka client version.

```
bin/kafka-configs.sh --bootstrap-server $bsrv --alter --entity-type topics --entity-name msk-ts-topic --add-config 'remote.storage.enable=true, local.retention.ms=604800000, retention.ms=15550000000'
```

## Disable tiered storage on an existing Amazon MSK topic
<a name="msk-disable-topic-tiered-storage-cli"></a>

To disable tiered storage on an existing topic, use the `alter` command syntax, as in the following example.

```
bin/kafka-configs.sh --bootstrap-server $bs --alter --entity-type topics --entity-name MSKTutorialTopic --add-config 'remote.log.msk.disable.policy=Delete, remote.storage.enable=false'
```

**Note**  
When you disable tiered storage, you permanently delete the topic's data in tiered storage. Apache Kafka retains primary storage data, but it still applies the primary retention rules based on `local.retention.ms`. After you disable tiered storage on a topic, you can't re-enable it. To disable tiered storage on an existing topic, you aren't restricted to a certain Apache Kafka client version.

# Enable tiered storage on an existing Amazon MSK cluster using AWS CLI
<a name="msk-enable-cluster-tiered-storage-cli"></a>

**Note**  
You can enable tiered storage only if your cluster's log.cleanup.policy is set to `delete`, as compacted topics are not supported on tiered storage. Later, you can configure an individual topic's log.cleanup.policy to `compact` if tiered storage is not enabled on that particular topic. See [Topic-level configuration](https://docs.aws.amazon.com//msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration) for more details on supported configuration attributes.

1. **Update the Apache Kafka version** – Update the cluster to a version that supports tiered storage. Cluster versions aren't simple integers. To find the current version of the cluster, use the `DescribeCluster` operation or the `describe-cluster` AWS CLI command. An example version is `KTVPDKIKX0DER`.

   ```
   aws kafka update-cluster-kafka-version --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-kafka-version 3.6.0
   ```

1. Edit cluster storage mode. The following code example shows editing the cluster storage mode to `TIERED` using the [update-storage](https://docs.aws.amazon.com/cli/latest/reference/kafka/update-storage.html) API.

   ```
   aws kafka update-storage --current-version Current-Cluster-Version --cluster-arn Cluster-arn --storage-mode TIERED
   ```

# Update tiered storage on an existing Amazon MSK cluster using the console
<a name="msk-update-tiered-storage-console"></a>

This process describes how to update a tiered storage Amazon MSK cluster using the AWS Management Console.

Make sure the current Apache Kafka version of your MSK cluster supports tiered storage (for example, 2.8.2.tiered). Refer to [updating the Apache Kafka version](https://docs.aws.amazon.com/msk/latest/developerguide/version-upgrades.html) if you need to upgrade your MSK cluster.

**Note**  
You can enable tiered storage only if your cluster's log.cleanup.policy is set to `delete`, as compacted topics are not supported on tiered storage. Later, you can configure an individual topic's log.cleanup.policy to `compact` if tiered storage is not enabled on that particular topic. See [Topic-level configuration](https://docs.aws.amazon.com//msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration) for more details on supported configuration attributes.

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).

1. Go to the cluster summary page and choose **Properties**.

1. Go to the **Storage** section and choose **Edit cluster storage mode**.

1. Choose **Tiered storage and EBS storage** and **Save changes**.

# Scale up Amazon MSK Standard broker storage
<a name="msk-update-storage"></a>

You can increase the amount of EBS storage per broker. You can't decrease the storage. 

Storage volumes remain available during this scaling-up operation.

**Important**  
When storage is scaled for an MSK cluster, the additional storage is made available right away. However, the cluster requires a cool-down period after every storage scaling event. Amazon MSK uses this cool-down period to optimize the cluster before it can be scaled again. This period can range from a minimum of 6 hours to over 24 hours, depending on the cluster's storage size and utilization and on traffic. This is applicable for both auto scaling events and manual scaling using the [UpdateBrokerStorage](https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-nodes-storage.html#UpdateBrokerStorage) operation. For information about right-sizing your storage, see [Best practices for Standard brokers](bestpractices.md). 

You can use tiered storage to scale to virtually unlimited amounts of storage for your brokers. See [Tiered storage for Standard brokers](msk-tiered-storage.md).

**Topics**
+ [Automatic scaling for Amazon MSK clusters](msk-autoexpand.md)
+ [Manual scaling for Standard brokers](manually-expand-storage.md)

# Automatic scaling for Amazon MSK clusters
<a name="msk-autoexpand"></a>

To automatically expand your cluster's storage in response to increased usage, you can configure an Application Auto-Scaling policy for Amazon MSK. In an auto-scaling policy, you set the target disk utilization and the maximum scaling capacity.

Before you use automatic scaling for Amazon MSK, you should consider the following:
+ **Important**  
A storage scaling action can occur only once every six hours.

  We recommend that you start with a right-sized storage volume for your storage demands. For guidance on right-sizing your cluster, see [Right-size your cluster: Number of Standard brokers per cluster](bestpractices.md#brokers-per-cluster).
+ Amazon MSK does not reduce cluster storage in response to reduced usage. Amazon MSK does not support decreasing the size of storage volumes. If you need to reduce the size of your cluster storage, you must migrate your existing cluster to a cluster with smaller storage. For information about migrating a cluster, see [Migrate to MSK cluster](migration.md).
+ Amazon MSK doesn't support automatic scaling in the Asia Pacific (Osaka), Africa (Cape Town), and Asia Pacific (Malaysia) Regions.
+ When you associate an auto-scaling policy with your cluster, Amazon EC2 Auto Scaling automatically creates an Amazon CloudWatch alarm for target tracking. If you delete a cluster that has an auto-scaling policy, this CloudWatch alarm persists. To delete the CloudWatch alarm, remove the auto-scaling policy from the cluster before you delete the cluster. To learn more about target tracking, see [Target tracking scaling policies for Amazon EC2 Auto Scaling](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html) in the *Amazon EC2 Auto Scaling User Guide*.

**Topics**
+ [Auto-scaling policy details for Amazon MSK](msk-autoexpand-details.md)
+ [Set up automatic scaling for your Amazon MSK cluster](msk-autoexpand-setup.md)

# Auto-scaling policy details for Amazon MSK
<a name="msk-autoexpand-details"></a>

An auto-scaling policy defines the following parameters for your cluster:
+ **Storage Utilization Target**: The storage utilization threshold that Amazon MSK uses to trigger an auto-scaling operation. You can set the utilization target between 10% and 80% of the current storage capacity. We recommend that you set the Storage Utilization Target between 50% and 60%.
+ **Maximum Storage Capacity**: The maximum scaling limit that Amazon MSK can set for your broker storage. You can set the maximum storage capacity up to 16 TiB per broker. For more information, see [Amazon MSK quota](limits.md).

When Amazon MSK detects that your `Maximum Disk Utilization` metric is equal to or greater than the `Storage Utilization Target` setting, it increases your storage capacity by an amount equal to the larger of two numbers: 10 GiB or 10% of current storage. For example, if you have 1000 GiB, that amount is 100 GiB. The service checks your storage utilization every minute. Further scaling operations continue to increase storage by an amount equal to the larger of two numbers: 10 GiB or 10% of current storage.
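The sizing rule above can be expressed as a short sketch. The function name is illustrative only, not part of any Amazon MSK API:

```python
def scaling_increment_gib(current_storage_gib):
    """Amount Amazon MSK adds on a scaling event: the larger of
    10 GiB or 10% of the current storage, per the rule above."""
    return max(10, current_storage_gib * 0.10)

print(scaling_increment_gib(1000))  # 100.0 GiB, matching the example above
print(scaling_increment_gib(50))    # small volume: the 10 GiB floor applies
```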

To determine if auto-scaling operations have occurred, use the [ListClusterOperations](https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn-operations.html#ListClusterOperations) operation.

# Set up automatic scaling for your Amazon MSK cluster
<a name="msk-autoexpand-setup"></a>

You can use the Amazon MSK console, the Amazon MSK API, or CloudFormation to implement automatic scaling for storage. CloudFormation support is available through [Application Auto Scaling](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-applicationautoscaling-scalabletarget.html).

**Note**  
You can't implement automatic scaling when you create a cluster. You must first create the cluster, and then create and enable an auto-scaling policy for it. However, you can create the policy while the Amazon MSK service is creating your cluster.

**Topics**
+ [Set up automatic scaling using the Amazon MSK AWS Management Console](msk-autoexpand-setup-console.md)
+ [Set up automatic scaling using the CLI](msk-autoexpand-setup-cli.md)
+ [Set up automatic scaling for Amazon MSK using the API](msk-autoexpand-setup-api.md)

# Set up automatic scaling using the Amazon MSK AWS Management Console
<a name="msk-autoexpand-setup-console"></a>

This process describes how to use the Amazon MSK console to implement automatic scaling for storage.

1. Sign in to the AWS Management Console, and open the Amazon MSK console at [https://console.aws.amazon.com/msk/home?region=us-east-1#/home/](https://console.aws.amazon.com/msk/home?region=us-east-1#/home/).

1. In the list of clusters, choose your cluster. This takes you to a page that lists details about the cluster.

1. In the **Auto scaling for storage** section, choose **Configure**.

1. Create and name an auto-scaling policy. Specify the storage utilization target, the maximum storage capacity, and the target metric.

1. Choose **Save changes**.

When you save and enable the new policy, the policy becomes active for the cluster. Amazon MSK then expands the cluster's storage when the storage utilization target is reached.

# Set up automatic scaling using the CLI
<a name="msk-autoexpand-setup-cli"></a>

This process describes how to use the Amazon MSK CLI to implement automatic scaling for storage.

1. Use the [RegisterScalableTarget](https://docs.aws.amazon.com/cli/latest/reference/application-autoscaling/#available-commands) command to register a storage utilization target.

1. Use the [PutScalingPolicy](https://docs.aws.amazon.com/cli/latest/reference/application-autoscaling/#available-commands) command to create an auto-expansion policy.

# Set up automatic scaling for Amazon MSK using the API
<a name="msk-autoexpand-setup-api"></a>

This process describes how to use the Amazon MSK API to implement automatic scaling for storage.

1. Use the [RegisterScalableTarget](https://docs.aws.amazon.com/autoscaling/application/APIReference/API_RegisterScalableTarget.html) API to register a storage utilization target.

1. Use the [PutScalingPolicy](https://docs.aws.amazon.com/autoscaling/application/APIReference/API_PutScalingPolicy.html) API to create an auto-expansion policy.

# Manual scaling for Standard brokers
<a name="manually-expand-storage"></a>

To increase storage, wait for the cluster to be in the `ACTIVE` state. Storage scaling has a cool-down period of at least six hours between events. Even though the operation makes additional storage available right away, the service performs optimizations on your cluster that can take up to 24 hours or more. The duration of these optimizations is proportional to your storage size.

## Scaling up broker storage using the AWS Management Console
<a name="update-storage-console"></a>

1. Open the Amazon MSK console at [https://console.aws.amazon.com/msk/](https://console.aws.amazon.com/msk/).

1. Choose the MSK cluster for which you want to update broker storage.

1. In the **Storage** section, choose **Edit**.

1. Specify the storage volume that you want. You can only increase the amount of storage; you can't decrease it.

1. Choose **Save changes**.

## Scaling up broker storage using the AWS CLI
<a name="update-storage-cli"></a>

Run the following command, replacing *ClusterArn* with the Amazon Resource Name (ARN) that you obtained when you created your cluster. If you don't have the ARN for your cluster, you can find it by listing all clusters. For more information, see [List Amazon MSK clusters](msk-list-clusters.md). 

Replace *Current-Cluster-Version* with the current version of the cluster. 

**Important**  
Cluster versions aren't simple integers. To find the current version of the cluster, use the [DescribeCluster](https://docs.aws.amazon.com/msk/1.0/apireference/clusters-clusterarn.html#DescribeCluster) operation or the [describe-cluster](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/kafka/describe-cluster.html) AWS CLI command. An example version is `KTVPDKIKX0DER`.

The *Target-Volume-in-GiB* parameter represents the amount of storage that you want each broker to have. It is only possible to update the storage for all the brokers. You can't specify individual brokers for which to update storage. The value you specify for *Target-Volume-in-GiB* must be a whole number that is greater than 100 GiB. The storage per broker after the update operation can't exceed 16384 GiB.

```
aws kafka update-broker-storage --cluster-arn ClusterArn --current-version Current-Cluster-Version --target-broker-ebs-volume-info '{"KafkaBrokerNodeId": "All", "VolumeSizeGB": Target-Volume-in-GiB}' 
```
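The constraints above on *Target-Volume-in-GiB* can be checked before calling the CLI. The following is a minimal sketch under those constraints; the helper name is hypothetical:

```python
import json

def broker_storage_payload(target_volume_gib):
    """Build the --target-broker-ebs-volume-info value, validating the
    constraints above: a whole number greater than 100 GiB, at most
    16384 GiB, applied to all brokers ("All")."""
    if not isinstance(target_volume_gib, int):
        raise ValueError("Target volume must be a whole number of GiB")
    if not 100 < target_volume_gib <= 16384:
        raise ValueError("Target volume must be > 100 GiB and <= 16384 GiB")
    return json.dumps({"KafkaBrokerNodeId": "All",
                       "VolumeSizeGB": target_volume_gib})

print(broker_storage_payload(1100))
```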

## Scaling up broker storage using the API
<a name="update-storage-api"></a>

To update a broker storage using the API, see [UpdateBrokerStorage](https://docs.aws.amazon.com//msk/1.0/apireference/clusters-clusterarn-nodes-storage.html#UpdateBrokerStorage).

# Manage storage throughput for Standard brokers in an Amazon MSK cluster
<a name="msk-provision-throughput-management"></a>

For information on how to provision throughput using the Amazon MSK console, CLI, and API, see [Provision storage throughput for Standard brokers in an Amazon MSK cluster](msk-provision-throughput.md).

**Topics**
+ [Amazon MSK broker throughput bottlenecks and maximum throughput settings](#throughput-bottlenecks)
+ [Measure storage throughput of an Amazon MSK cluster](#throughput-metrics)
+ [Configuration update values for provisioned storage in an Amazon MSK cluster](#provisioned-throughput-config)
+ [Provision storage throughput for Standard brokers in an Amazon MSK cluster](msk-provision-throughput.md)

## Amazon MSK broker throughput bottlenecks and maximum throughput settings
<a name="throughput-bottlenecks"></a>

There are multiple causes of bottlenecks in broker throughput: volume throughput, Amazon EC2 to Amazon EBS network throughput, and Amazon EC2 egress throughput. You can enable provisioned storage throughput to adjust volume throughput. However, broker throughput limitations can be caused by Amazon EC2 to Amazon EBS network throughput and Amazon EC2 egress throughput. 

Amazon EC2 egress throughput is impacted by the number of consumer groups and the number of consumers per consumer group. Also, both Amazon EC2 to Amazon EBS network throughput and Amazon EC2 egress throughput are higher for larger broker sizes.

For volume sizes of 10 GiB or larger, you can provision storage throughput of 250 MiB per second or greater. 250 MiB per second is the default. To provision storage throughput, you must choose broker size kafka.m5.4xlarge or larger (or kafka.m7g.2xlarge or larger), and you can specify maximum throughput as shown in the following table.



| Broker size | Maximum storage throughput (MiB/second) | 
| --- | --- | 
| kafka.m5.4xlarge | 593 | 
| kafka.m5.8xlarge | 850 | 
| kafka.m5.12xlarge | 1000 | 
| kafka.m5.16xlarge | 1000 | 
| kafka.m5.24xlarge | 1000 | 
| kafka.m7g.2xlarge | 312.5 | 
| kafka.m7g.4xlarge | 625 | 
| kafka.m7g.8xlarge | 1000 | 
| kafka.m7g.12xlarge | 1000 | 
| kafka.m7g.16xlarge | 1000 | 

## Measure storage throughput of an Amazon MSK cluster
<a name="throughput-metrics"></a>

You can use the `VolumeReadBytes` and `VolumeWriteBytes` metrics to measure the average storage throughput of a cluster. The sum of these two metrics gives the average storage throughput in bytes. To get the average storage throughput for a cluster, retrieve the SUM statistic for both metrics over a 1-minute period, then use the following formula.

```
Average storage throughput in MiB/s = (Sum(VolumeReadBytes) + Sum(VolumeWriteBytes)) / (60 * 1024 * 1024)
```

For information about the `VolumeReadBytes` and `VolumeWriteBytes` metrics, see [`PER_BROKER` Level monitoring](metrics-details.md#broker-metrics).
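As an illustration, the formula above can be wrapped in a small helper. This is a hypothetical sketch, not part of any Amazon MSK tooling; the function name and sample values are made up.

```python
def avg_storage_throughput_mib_s(volume_read_bytes_sum, volume_write_bytes_sum,
                                 period_seconds=60):
    """Convert 1-minute SUM datapoints of VolumeReadBytes and VolumeWriteBytes
    into average storage throughput in MiB/s."""
    return (volume_read_bytes_sum + volume_write_bytes_sum) / (period_seconds * 1024 * 1024)

# Example: 3000 MiB read and 3000 MiB written during one minute
# average out to 100 MiB/s of storage throughput.
print(avg_storage_throughput_mib_s(3_145_728_000, 3_145_728_000))
```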

## Configuration update values for provisioned storage in an Amazon MSK cluster
<a name="provisioned-throughput-config"></a>

You can update your Amazon MSK configuration either before or after you turn on provisioned throughput. However, you won't see the desired throughput until you perform both actions: update the `num.replica.fetchers` configuration parameter and turn on provisioned throughput.

In the default Amazon MSK configuration, `num.replica.fetchers` has a value of 2. To update your `num.replica.fetchers`, you can use the suggested values from the following table. These values are for guidance purposes. We recommend that you adjust these values based on your use case.



| Broker size | num.replica.fetchers | 
| --- | --- | 
| kafka.m5.4xlarge | 4 | 
| kafka.m5.8xlarge | 8 | 
| kafka.m5.12xlarge | 14 | 
| kafka.m5.16xlarge | 16 | 
| kafka.m5.24xlarge | 16 | 

Your updated configuration may take up to 24 hours to take effect, and may take longer when a source volume is not fully utilized. However, transitional volume performance at least equals the performance of the source storage volumes during the migration period. A fully utilized 1-TiB volume typically takes about six hours to migrate to an updated configuration.
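To change `num.replica.fetchers`, you set it in the server properties of a custom MSK configuration and apply that configuration to the cluster. The fragment below is a sketch for a kafka.m5.8xlarge cluster; the file name is a placeholder.

```
# configuration.properties
num.replica.fetchers=8
```

You would then register the file with `aws kafka create-configuration --name ConfigName --server-properties fileb://configuration.properties` and apply it to the cluster with `aws kafka update-cluster-configuration`.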

# Provision storage throughput for Standard brokers in an Amazon MSK cluster
<a name="msk-provision-throughput"></a>

Amazon MSK brokers persist data on storage volumes. Storage I/O is consumed when producers write to the cluster, when data is replicated between brokers, and when consumers read data that isn't in memory. The volume storage throughput is the rate at which data can be written into and read from a storage volume. Provisioned storage throughput is the ability to specify that rate for the brokers in your cluster.

You can specify the provisioned throughput rate in MiB per second for clusters whose brokers are of size `kafka.m5.4xlarge` or larger, and whose storage volume is 10 GiB or greater. You can specify provisioned throughput during cluster creation, and you can enable or disable it for a cluster that is in the `ACTIVE` state.
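For example, turning on provisioned throughput for an `ACTIVE` cluster might look like the following AWS CLI sketch, which uses the `UpdateStorage` operation. The ARN, version, and throughput value are placeholders, so verify the parameters against the current CLI reference before use.

```
aws kafka update-storage --cluster-arn ClusterArn \
    --current-version Current-Cluster-Version \
    --provisioned-throughput '{"Enabled": true, "VolumeThroughput": 300}'
```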

For information about managing throughput, see [Manage storage throughput for Standard brokers in an Amazon MSK cluster](msk-provision-throughput-management.md).

**Topics**
+ [Provision Amazon MSK cluster storage throughput using the AWS Management Console](#provisioned-throughput-console)
+ [Provision Amazon MSK cluster storage throughput using the AWS CLI](#provisioned-throughput-cli)
+ [Provision storage throughput when creating an Amazon MSK cluster using the API](#provisioned-throughput-api)

## Provision Amazon MSK cluster storage throughput using the AWS Management Console
<a name="provisioned-throughput-console"></a>

This process shows an example of how you can use the AWS Management Console to create an Amazon MSK cluster with provisioned throughput enabled.

1. Sign in to the AWS Management Console, and open the Amazon MSK console at [https://console.aws.amazon.com/msk/home?region=us-east-1#/home/](https://console.aws.amazon.com/msk/home?region=us-east-1#/home/).

1. Choose **Create cluster**.

1. Choose **Custom create**.

1. Specify a name for the cluster.

1. In the **Storage** section, choose **Enable**.

1. Choose a value for storage throughput per broker.

1. Choose a VPC, zones and subnets, and a security group.

1. Choose **Next**.

1. At the bottom of the **Security** step, choose **Next**.

1. At the bottom of the **Monitoring and tags** step, choose **Next**.

1. Review the cluster settings, then choose **Create cluster**.

## Provision Amazon MSK cluster storage throughput using the AWS CLI
<a name="provisioned-throughput-cli"></a>

This process shows an example of how you can use the AWS CLI to create a cluster with provisioned throughput enabled.

1. Copy the following JSON and paste it into a file. Replace the subnet IDs and security group ID placeholders with values from your account. Name the file `cluster-creation.json` and save it.

   ```
   {
       "Provisioned": {
           "BrokerNodeGroupInfo":{
               "InstanceType":"kafka.m5.4xlarge",
               "ClientSubnets":[
                   "Subnet-1-ID",
                   "Subnet-2-ID"
               ],
               "SecurityGroups":[
                   "Security-Group-ID"
               ],
               "StorageInfo": {
                   "EbsStorageInfo": {
                       "VolumeSize": 10,
                       "ProvisionedThroughput": {
                           "Enabled": true,
                           "VolumeThroughput": 250
                       }
                   }
               }
           },
           "EncryptionInfo": {
               "EncryptionInTransit": {
                   "InCluster": false,
                   "ClientBroker": "PLAINTEXT"
               }
           },
           "KafkaVersion":"2.8.1",
           "NumberOfBrokerNodes": 2
       },
       "ClusterName": "provisioned-throughput-example"
   }
   ```

1. Run the following AWS CLI command from the directory where you saved the JSON file in the previous step.

   ```
   aws kafka create-cluster-v2 --cli-input-json file://cluster-creation.json
   ```

## Provision storage throughput when creating an Amazon MSK cluster using the API
<a name="provisioned-throughput-api"></a>

To configure provisioned storage throughput while creating a cluster, use [CreateClusterV2](https://docs.aws.amazon.com/MSK/2.0/APIReference/v2-clusters.html#CreateClusterV2).