

# Operational best practices for Amazon OpenSearch Service
<a name="bp"></a>

This chapter provides best practices for operating Amazon OpenSearch Service domains and includes general guidelines that apply to many use cases. Each workload is unique, with unique characteristics, so no generic recommendation is exactly right for every use case. The most important best practice is to deploy, test, and tune your domains in a continuous cycle to find the optimal configuration, stability, and cost for your workload.

**Topics**
+ [Monitoring and alerting](#bp-monitoring)
+ [Shard strategy](#bp-sharding-strategy)
+ [Stability](#bp-stability)
+ [Performance](#bp-perf)
+ [Security](#bp-security)
+ [Cost optimization](#bp-cost-optimization)
+ [Recommended CloudWatch alarms for Amazon OpenSearch Service](cloudwatch-alarms.md)
+ [Sizing Amazon OpenSearch Service domains](sizing-domains.md)
+ [Petabyte scale in Amazon OpenSearch Service](petabyte-scale.md)
+ [Dedicated coordinator nodes in Amazon OpenSearch Service](Dedicated-coordinator-nodes.md)
+ [Dedicated master nodes in Amazon OpenSearch Service](managedomains-dedicatedmasternodes.md)

## Monitoring and alerting
<a name="bp-monitoring"></a>

The following best practices apply to monitoring your OpenSearch Service domains.

### Configure CloudWatch alarms
<a name="bp-monitoring-cw"></a>

OpenSearch Service emits performance metrics to Amazon CloudWatch. Regularly review your [cluster and instance metrics](managedomains-cloudwatchmetrics.md) and configure [recommended CloudWatch alarms](cloudwatch-alarms.md) based on your workload performance.

### Enable log publishing
<a name="bp-monitoring-logs"></a>

OpenSearch Service exposes OpenSearch error logs, search slow logs, indexing slow logs, and audit logs in Amazon CloudWatch Logs. Search slow logs, indexing slow logs, and error logs are useful for troubleshooting performance and stability issues. Audit logs, which are only available if you enable [fine-grained access control](fgac.md), track user activity. For more information, see [Logs](https://opensearch.org/docs/latest/monitoring-your-cluster/logs/) in the OpenSearch documentation.

Search slow logs and indexing slow logs are an important tool for understanding and troubleshooting the performance of your search and indexing operations. [Enable search and index slow log delivery](createdomain-configure-slow-logs.md#createdomain-configure-slow-logs-console) for all production domains. You must also [configure logging thresholds](createdomain-configure-slow-logs.md#createdomain-configure-slow-logs-indices)—otherwise, CloudWatch won't capture the logs.
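As a sketch, you can set the thresholds per index through the index settings API. The index name and threshold values below are illustrative—tune them to your workload:

```
PUT my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}
```

Operations that exceed a threshold are written to the corresponding slow log, which OpenSearch Service then delivers to CloudWatch Logs.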

## Shard strategy
<a name="bp-sharding-strategy"></a>

Shards distribute your workload across the data nodes in your OpenSearch Service domain. Properly configured indexes can help boost overall domain performance.

When you send data to OpenSearch Service, you send that data to an index. An index is analogous to a database table, with *documents* as the rows and *fields* as the columns. When you create the index, you tell OpenSearch how many primary shards you want to create. The primary shards are independent partitions of the full dataset. OpenSearch Service automatically distributes your data across the primary shards in an index. You can also configure *replicas* of the index. Each replica comprises a full copy of the index's primary shards.

OpenSearch Service maps the shards for each index across the data nodes in your cluster. It ensures that the primary and replica shards for the index reside on different data nodes. The first replica ensures that you have two copies of the data in the index. You should always use at least one replica. Additional replicas provide additional redundancy and read capacity.
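For example, the following request creates an index with five primary shards and one replica (the index name is a placeholder):

```
PUT my-index
{
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 1
  }
}
```

Note that the primary shard count is fixed at index creation time, while the replica count can be changed later through the index settings API.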

OpenSearch sends indexing requests to all of the data nodes that contain shards that belong to the index. It sends indexing requests first to the data nodes that contain the primary shards, and then to the data nodes that contain the replica shards. For search requests, the coordinator node routes the request to either the primary or a replica copy of each shard in the index.

For example, for an index with five primary shards and one replica, each indexing request touches 10 shards. In contrast, search requests are sent to *n* shards, where *n* is the number of primary shards. For an index with five primary shards and one replica, each search query touches five shards (primary or replica) from that index.

### Determine shard and data node counts
<a name="bp-shard-count"></a>

Use the following best practices to determine shard and data node counts for your domain.

**Shard size** – The size of data on disk is a direct result of the size of your source data, and it changes as you index more data. The source-to-index ratio can vary wildly, from 1:10 to 10:1 or more, but usually it's around 1:1.10. You can use that ratio to predict the index size on disk. You can also index some data and retrieve the actual index sizes to determine the ratio for your workload. After you have a predicted index size, set a shard count so that each shard will be between 10–30 GiB (for search workloads), or between 30–50 GiB (for logs workloads). 50 GiB should be the maximum—be sure to plan for growth.
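To measure the ratio for your own workload, you can compare your source data size against the primary store size that the `_cat/indices` API reports after indexing a sample (the index name here is a placeholder):

```
GET _cat/indices/my-index?v&h=index,pri,rep,pri.store.size
```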

**Shard count** – The distribution of shards to data nodes has a large impact on a domain’s performance. When you have indexes with multiple shards, try to make the shard count a multiple of the data node count. This helps to ensure that shards are evenly distributed across data nodes, and prevents hot nodes. For example, if you have 12 primary shards, your data node count should be 2, 3, 4, 6, or 12. However, shard count is secondary to shard size—if you have 5 GiB of data, you should still use a single shard. 

**Shards per data node** – The total number of shards that a node can hold is proportional to the node’s Java virtual machine (JVM) heap memory. Aim for 25 shards or fewer per GiB of heap memory. For example, a node with 32 GiB of heap memory should hold no more than 800 shards. Although shard distribution can vary based on your workload patterns, there's a limit of 1,000 shards per node for Elasticsearch and OpenSearch 1.1 to 2.15 and 4,000 for OpenSearch 2.17 and above. The [cat/allocation](https://opensearch.org/docs/latest/api-reference/cat/cat-allocation/) API provides a quick view of the number of shards and total shard storage across data nodes.

**Shard to CPU ratio** – When a shard is involved in an indexing or search request, it uses a vCPU to process the request. As a best practice, use an initial scale point of 1.5 vCPU per shard. If your instance type has 8 vCPUs, set your data node count so that each node has no more than five shards (5 × 1.5 = 7.5 vCPUs). Note that this is an approximation. Be sure to test your workload and scale your cluster accordingly.

For storage volume, shard size, and instance type recommendations, see the following resources:
+ [Sizing Amazon OpenSearch Service domains](sizing-domains.md)
+ [Petabyte scale in Amazon OpenSearch Service](petabyte-scale.md)

### Avoid storage skew
<a name="bp-sharding-skew"></a>

Storage skew occurs when one or more nodes within a cluster holds a higher proportion of storage for one or more indexes than the others. Indications of storage skew include uneven CPU utilization, intermittent and uneven latency, and uneven queueing across data nodes. To determine whether you have skew issues, see the following troubleshooting sections:
+ [Node shard and storage skew](handling-errors.md#handling-errors-node-skew)
+ [Index shard and storage skew](handling-errors.md#handling-errors-index-skew)

## Stability
<a name="bp-stability"></a>

The following best practices apply to maintaining a stable and healthy OpenSearch Service domain.

### Keep current with OpenSearch
<a name="bp-stability-current"></a>

**Service software updates**

OpenSearch Service regularly releases [software updates](service-software.md) that add features or otherwise improve your domains. Updates don't change the OpenSearch or Elasticsearch engine version. We recommend that you schedule a recurring time to run the [DescribeDomain](https://docs.aws.amazon.com/opensearch-service/latest/APIReference/API_DescribeDomain.html) API operation, and initiate a service software update if the `UpdateStatus` is `ELIGIBLE`. If you don't update your domain within a certain time frame (typically two weeks), OpenSearch Service automatically performs the update.

**OpenSearch version upgrades**

OpenSearch Service regularly adds support for community-maintained versions of OpenSearch. Always upgrade to the latest OpenSearch versions when they're available. 

OpenSearch Service simultaneously upgrades both OpenSearch and OpenSearch Dashboards (or Elasticsearch and Kibana if your domain is running a legacy engine). If the cluster has dedicated master nodes, upgrades complete without downtime. Otherwise, the cluster might be unresponsive for several seconds post-upgrade while it elects a master node. OpenSearch Dashboards might be unavailable during some or all of the upgrade.

There are two ways to upgrade a domain:
+ [In-place upgrade](starting-upgrades.md) – This option is easier because you keep the same cluster.
+ [Snapshot/restore upgrade](snapshot-based-migration.md) – This option is good for testing new versions on a new cluster or migrating between clusters.

Regardless of which upgrade process you use, we recommend that you maintain a domain that is solely for development and testing, and upgrade it to the new version *before* you upgrade your production domain. Choose **Development and testing** for the deployment type when you're creating the test domain. Make sure to upgrade all clients to compatible versions immediately following the domain upgrade.

### Improve snapshot performance
<a name="bp-stability-snapshots"></a>

To prevent your snapshot from getting stuck in processing, the instance type for the dedicated master node should match the shard count. For more information, see [Choosing instance types for dedicated master nodes](managedomains-dedicatedmasternodes.md#dedicatedmasternodes-instance). Additionally, each node should have no more than the recommended 25 shards per GiB of Java heap memory. For more information, see [Choosing the number of shards](bp-sharding.md).

### Enable dedicated master nodes
<a name="bp-stability-master"></a>

[Dedicated master nodes](managedomains-dedicatedmasternodes.md) improve cluster stability. A dedicated master node performs cluster management tasks, but doesn't hold index data or respond to client requests. This offloading of cluster management tasks increases the stability of your domain and makes it possible for some [configuration changes](managedomains-configuration-changes.md) to happen without downtime.

Enable and use three dedicated master nodes for optimal domain stability across three Availability Zones. Deploying with [Multi-AZ with Standby](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-multiaz.html#managedomains-za-standby) configures three dedicated master nodes for you. For instance type recommendations, see [Choosing instance types for dedicated master nodes](managedomains-dedicatedmasternodes.md#dedicatedmasternodes-instance).

### Deploy across multiple Availability Zones
<a name="bp-stability-az"></a>

To prevent data loss and minimize cluster downtime in the event of a service disruption, you can distribute nodes across two or three [Availability Zones](managedomains-multiaz.md) in the same AWS Region. Best practice is to deploy using [Multi-AZ with Standby](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-multiaz.html#managedomains-za-standby), which configures three Availability Zones, with two zones active and one acting as a standby, and with two replica shards per index. This configuration lets OpenSearch Service distribute replica shards to different Availability Zones than their corresponding primary shards. There are no cross-AZ data transfer charges for cluster communications between Availability Zones.

Availability Zones are isolated locations within each Region. With a two-AZ configuration, losing one Availability Zone means that you lose half of all domain capacity. Moving to three Availability Zones further reduces the impact of losing a single Availability Zone.

### Control ingest flow and buffering
<a name="bp-stability-ingest"></a>

We recommend that you limit the overall request count by using the [`_bulk`](https://opensearch.org/docs/latest/api-reference/document-apis/bulk/) API operation. It's more efficient to send one `_bulk` request that contains 5,000 documents than it is to send 5,000 requests that each contain a single document.
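A `_bulk` request interleaves action lines and document lines in newline-delimited JSON. The following minimal sketch indexes two documents in one request (the index name and documents are illustrative):

```
POST _bulk
{ "index": { "_index": "my-index", "_id": "1" } }
{ "title": "Document one" }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "title": "Document two" }
```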

For optimal operational stability, it's sometimes necessary to limit or even pause the upstream flow of indexing requests. Limiting the rate of index requests is an important mechanism for dealing with unexpected or occasional spikes in requests that might otherwise overwhelm the cluster. Consider building a flow control mechanism into your upstream architecture.

The following diagram shows multiple component options for a log ingest architecture. Configure the aggregation layer to allow sufficient space to buffer incoming data for sudden traffic spikes and brief domain maintenance.

![\[Log ingest architecture with producers, collectors, aggregators, and dashboards components.\]](http://docs.aws.amazon.com/opensearch-service/latest/developerguide/images/log-ingest.png)


### Create mappings for search workloads
<a name="bp-stability-mappings"></a>

For search workloads, create [mappings](https://opensearch.org/docs/latest/field-types/index/) that define how OpenSearch stores and indexes documents and their fields. Set `dynamic` to `strict` in order to prevent new fields from being added accidentally.

```
PUT my-index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": { "type" : "text" },
      "author": { "type" : "text" },
      "year": { "type" : "integer" }
    }
  }
}
```

### Use index templates
<a name="bp-stability-templates"></a>

You can use an [index template](https://opensearch.org/docs/latest/opensearch/index-templates/) as a way to tell OpenSearch how to configure an index when it's created. Configure index templates before creating indexes. Then, when you create an index, it inherits the settings and mappings from the template. You can apply more than one template to a single index, so you can specify settings in one template and mappings in another. This strategy allows one template for common settings across multiple indexes, and separate templates for more specific settings and mappings.

The following settings are helpful to configure in templates:
+ Number of primary and replica shards
+ Refresh interval (how often to refresh and make recent changes to the index available to search)
+ Dynamic mapping control
+ Explicit field mappings

The following example template contains each of these settings:

```
{
   "index_patterns":[
      "index-*"
   ],
   "order": 0,
   "settings": {
      "index": {
         "number_of_shards": 3,
         "number_of_replicas": 1,
         "refresh_interval": "60s"
      }
   },
   "mappings": {
      "dynamic": false,
      "properties": {
         "field_name1": {
            "type": "keyword"
         }
      }
   }
}
```

Even if your settings and mappings rarely change, defining them centrally in OpenSearch is simpler to manage than updating multiple upstream clients.

### Manage indexes with Index State Management
<a name="bp-stability-ism"></a>

If you're managing logs or time-series data, we recommend using [Index State Management](ism.md) (ISM). ISM lets you automate regular index lifecycle management tasks. With ISM, you can create policies that invoke index alias rollovers, take index snapshots, move indexes between storage tiers, and delete old indexes. You can even use the ISM [rollover](https://opensearch.org/docs/latest/im-plugin/ism/policies/#rollover) operation as an alternative data lifecycle management strategy to avoid shard skew.

First, set up an ISM policy. For example, see [Sample policies](ism.md#ism-example). Then, attach the policy to one or more indexes. If you include an [ISM template](ism.md#ism-template) field in the policy, OpenSearch Service automatically applies the policy to any index that matches the specified pattern.
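As an illustration, the following hypothetical policy rolls an alias over daily and deletes indexes after two weeks, and its `ism_template` field applies the policy automatically to new indexes that match `log-*` (the policy ID, ages, and index pattern are placeholders):

```
PUT _plugins/_ism/policies/rollover-delete
{
  "policy": {
    "description": "Daily rollover, delete after 14 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [{ "rollover": { "min_index_age": "1d" } }],
        "transitions": [{ "state_name": "delete", "conditions": { "min_index_age": "14d" } }]
      },
      {
        "name": "delete",
        "actions": [{ "delete": {} }],
        "transitions": []
      }
    ],
    "ism_template": { "index_patterns": ["log-*"] }
  }
}
```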

### Remove unused indexes
<a name="bp-stability-remove"></a>

Regularly review the indexes in your cluster and identify any that aren't in use. Take a snapshot of those indexes so that they're stored in S3, and then delete them. When you remove unused indexes, you reduce the shard count, and make it possible to have more balanced storage distribution and resource utilization across nodes. Even when they're idle, indexes consume some resources during internal index maintenance activities.

Rather than manually deleting unused indexes, you can use ISM to automatically take a snapshot and delete indexes after a certain period of time.

### Use multiple domains for high availability
<a name="bp-stability-ha"></a>

To achieve high availability beyond [99.9% uptime](https://aws.amazon.com/opensearch-service/sla/) across multiple Regions, consider using two domains. For small or slowly changing datasets, you can set up [cross-cluster replication](replication.md) to maintain an active-passive model. In this model, only the leader domain is written to, but either domain can be read from. For larger data sets and quickly changing data, configure dual delivery in your ingest pipeline so that all data is written independently to both domains in an active-active model.

Architect your upstream and downstream applications with failover in mind. Make sure to test the failover process along with other disaster recovery processes.

## Performance
<a name="bp-perf"></a>

The following best practices apply to tuning your domains for optimal performance.

### Optimize bulk request size and compression
<a name="bp-perf-bulk"></a>

Bulk sizing depends on your data, analysis, and cluster configuration, but a good starting point is 3–5 MiB per bulk request.

Send requests and receive responses from your OpenSearch domains by using [gzip compression](gzip.md) to reduce the payload size of requests and responses. You can use gzip compression with the [OpenSearch Python client](gzip.md#gzip-code), or by including the following [headers](gzip.md#gzip-headers) from the client side:
+ `'Accept-Encoding': 'gzip'`
+ `'Content-Encoding': 'gzip'`

To optimize your bulk request sizes, start with a bulk request size of 3 MiB. Then, slowly increase the request size until indexing performance stops improving.

**Note**  
To enable gzip compression on domains running Elasticsearch version 6.x, you must set `http_compression.enabled` at the cluster level. This setting is true by default in Elasticsearch versions 7.x and all versions of OpenSearch.
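On an Elasticsearch 6.x domain, a sketch of that change through the cluster settings API looks like the following; this assumes your client has permission to update cluster settings:

```
PUT _cluster/settings
{
  "persistent": {
    "http_compression.enabled": true
  }
}
```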

### Reduce the size of bulk request responses
<a name="bp-perf-response-time"></a>

To reduce the size of OpenSearch responses, exclude unnecessary fields with the `filter_path` parameter. Make sure that you don't filter out any fields that are required to identify or retry failed requests. For more information and examples, see [Reducing response size](indexing.md#indexing-size).
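For example, the following sketch keeps only the fields needed to detect and retry failed items in a bulk response (the index name and document are placeholders):

```
POST my-index/_bulk?filter_path=took,errors,items.*.status,items.*.error
{ "index": { "_id": "1" } }
{ "title": "Document one" }
```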

### Tune refresh intervals
<a name="bp-perf-refresh"></a>

OpenSearch indexes have eventual read consistency. A refresh operation makes all the updates that are performed on an index available for search. The default refresh interval is one second, which means that OpenSearch performs a refresh every second while an index is being written to.

The less frequently that you refresh an index (higher refresh interval), the better the overall indexing performance is. The trade-off of increasing the refresh interval is that there’s a longer delay between an index update and when the new data is available for search. Set your refresh interval as high as you can tolerate to improve overall performance.

We recommend setting the `refresh_interval` parameter for all of your indexes to 30 seconds or more.
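You can change the interval dynamically through the index settings API (the index name is a placeholder):

```
PUT my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```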

### Enable Auto-Tune
<a name="bp-perf-autotune"></a>

[Auto-Tune](auto-tune.md) uses performance and usage metrics from your OpenSearch cluster to suggest changes to queue sizes, cache sizes, and Java virtual machine (JVM) settings on your nodes. These optional changes improve cluster speed and stability. You can revert to the default OpenSearch Service settings at any time. Auto-Tune is enabled by default on new domains unless you explicitly disable it.

We recommend that you enable Auto-Tune on all domains, and either set a recurring maintenance window or periodically review its recommendations. 

## Security
<a name="bp-security"></a>

The following best practices apply to securing your domains.

### Enable fine-grained access control
<a name="bp-security-fgac"></a>

[Fine-grained access control](fgac.md) lets you control who can access certain data within an OpenSearch Service domain. Compared to generalized access control, fine-grained access control gives each cluster, index, document, and field its own specified policy for access. Access criteria can be based on a number of factors, including the role of the person who is requesting access and the action that they intend to perform on the data. For example, you might give one user access to write to an index, and another user access only to read the data on the index without making any changes.

Fine-grained access control allows data with different access requirements to exist in the same storage space without running into security or compliance issues.

We recommend enabling fine-grained access control on your domains.

### Deploy domains within a VPC
<a name="bp-security-vpc"></a>

Placing your OpenSearch Service domain within a virtual private cloud (VPC) helps enable secure communication between OpenSearch Service and other services within the VPC—without the need for an internet gateway, NAT device, or VPN connection. All traffic remains securely within the AWS Cloud. Because of their logical isolation, domains that reside within a VPC have an extra layer of security compared to domains that use public endpoints.

We recommend that you [create your domains within a VPC](vpc.md).

### Apply a restrictive access policy
<a name="bp-security-iam"></a>

Even if your domain is deployed within a VPC, it's a best practice to implement security in layers. Make sure to [check the configuration](createupdatedomains.md#createdomain-configure-access-policies) of your current access policies.

Apply a restrictive [resource-based access policy](ac.md#ac-types-resource) to your domains and follow the [principle of least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege) when granting access to the configuration API and the OpenSearch API operations. As a general rule, avoid using the anonymous user principal `"Principal": {"AWS": "*" }` in your access policies. 
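The following sketch of a resource-based policy grants a single IAM role read-only HTTP access to one domain; the account ID, role name, Region, and domain name are placeholders:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/search-read-role"
      },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpHead"
      ],
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-domain/*"
    }
  ]
}
```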

There are some situations, however, where it's acceptable to use an open access policy, such as when you enable fine-grained access control. An open access policy can enable you to access the domain in cases where request signing is difficult or impossible, such as from certain clients and tools.

### Enable encryption at rest
<a name="bp-security-ear"></a>

OpenSearch Service domains offer encryption of data at rest to help prevent unauthorized access to your data. Encryption at rest uses AWS Key Management Service (AWS KMS) to store and manage your encryption keys, and the Advanced Encryption Standard algorithm with 256-bit keys (AES-256) to perform the encryption.

If your domain stores sensitive data, [enable encryption of data at rest](encryption-at-rest.md).

### Enable node-to-node encryption
<a name="bp-security-ntn"></a>

Node-to-node encryption provides an additional layer of security on top of the default security features within OpenSearch Service. It implements Transport Layer Security (TLS) for all communications between the nodes that are provisioned within OpenSearch. With node-to-node encryption, any data sent to your OpenSearch Service domain over HTTPS remains encrypted in transit while it's being distributed and replicated between nodes.

If your domain stores sensitive data, [enable node-to-node encryption](ntn.md).

### Monitor with AWS Security Hub CSPM
<a name="bp-security-hub"></a>

Monitor your usage of OpenSearch Service as it relates to security best practices by using [AWS Security Hub CSPM](https://docs.aws.amazon.com/securityhub/latest/userguide/what-is-securityhub.html). Security Hub CSPM uses security controls to evaluate resource configurations and security standards to help you comply with various compliance frameworks. For more information about using Security Hub CSPM to evaluate OpenSearch Service resources, see [Amazon OpenSearch Service controls](https://docs.aws.amazon.com/securityhub/latest/userguide/opensearch-controls.html) in the *AWS Security Hub User Guide*. 

## Cost optimization
<a name="bp-cost-optimization"></a>

The following best practices apply to optimizing and saving on your OpenSearch Service costs.

### Use the latest generation instance types
<a name="bp-cost-optimization-instances"></a>

OpenSearch Service continually adopts new Amazon EC2 [instance types](supported-instance-types.md) that deliver better performance at a lower cost. We recommend that you always use the latest generation of instances.

Avoid using T2 or `t3.small` instances for production domains because they can become unstable under sustained heavy load. `r6g.large` instances are an option for small production workloads (both as data nodes and as dedicated master nodes).

### Use the latest Amazon EBS gp3 volumes
<a name="bp-cost-optimization-gp3"></a>

OpenSearch data nodes require low-latency, high-throughput storage to provide fast indexing and querying. Amazon EBS gp3 volumes deliver higher baseline performance (IOPS and throughput) at a 9.6% lower cost than the previous-generation Amazon EBS gp2 volume type. With gp3, you can provision additional IOPS and throughput independent of volume size. These volumes are also more stable than previous-generation volumes because they don't use burst credits. The gp3 volume type also doubles the per-data-node volume size limits of the gp2 volume type. With these larger volumes, you can reduce the cost of passive data by increasing the amount of storage per data node.

### Use UltraWarm and cold storage for time-series log data
<a name="bp-cost-optimization-uw-cold"></a>

If you're using OpenSearch for log analytics, move your data to UltraWarm or cold storage to reduce costs. Use Index State Management (ISM) to migrate data between storage tiers and manage data retention.

[UltraWarm](ultrawarm.md) provides a cost-effective way to store large amounts of read-only data in OpenSearch Service. UltraWarm uses Amazon S3 for storage, which means that the data is immutable and only one copy is needed. You only pay for storage that's equivalent to the size of the primary shards in your indexes. Latencies for UltraWarm queries grow with the amount of S3 data that's needed to service the query. After the data has been cached on the nodes, queries to UltraWarm indexes perform similarly to queries to hot indexes.

[Cold storage](cold-storage.md) is also backed by S3. When you need to query cold data, you can selectively attach it to existing UltraWarm nodes. Cold data incurs the same managed storage cost as UltraWarm, but objects in cold storage don't consume UltraWarm node resources. Therefore, cold storage provides a significant amount of storage capacity without impacting UltraWarm node size or count.

UltraWarm becomes cost-effective when you have roughly 2.5 TiB of data to migrate from hot storage. Monitor your fill rate and plan to move indexes to UltraWarm before you reach that volume of data.

### Review recommendations for Reserved Instances
<a name="bp-cost-optimization-ri"></a>

Consider purchasing [Reserved Instances](ri.md) (RIs) after you have a good baseline on your performance and compute consumption. Discounts start at around 30% for no-upfront, 1-year reservations and can increase up to 50% for all-upfront, 3-year commitments.

After you observe stable operation for at least 14 days, review [Accessing reservation recommendations](https://docs.aws.amazon.com/cost-management/latest/userguide/ri-recommendations.html) in the *AWS Cost Management User Guide*. The **Amazon OpenSearch Service** heading displays specific RI purchase recommendations and projected savings.

# Recommended CloudWatch alarms for Amazon OpenSearch Service
<a name="cloudwatch-alarms"></a>

CloudWatch alarms perform an action when a CloudWatch metric exceeds a specified value for some amount of time. For example, you might want AWS to email you if your cluster health status is `red` for longer than one minute. This section includes some recommended alarms for Amazon OpenSearch Service and how to respond to them.

You can automatically deploy these alarms using CloudFormation. For a sample stack, see the related [GitHub repository](https://github.com/ev2900/OpenSearch_CloudWatch_Alarms).

**Note**  
If you deploy the CloudFormation stack, the `KMSKeyError` and `KMSKeyInaccessible` alarms will exist in an `Insufficient Data` state because these metrics only appear if a domain encounters a problem with its encryption key.

For more information about configuring alarms, see [Creating Amazon CloudWatch Alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/opensearch-service/latest/developerguide/cloudwatch-alarms.html)

**Note**  
If you just want to *view* metrics, see [Monitoring OpenSearch cluster metrics with Amazon CloudWatch](managedomains-cloudwatchmetrics.md).

## Other alarms you might consider
<a name="cw-alarms-additional"></a>

Consider configuring the following alarms depending on which OpenSearch Service features you regularly use. 

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/opensearch-service/latest/developerguide/cloudwatch-alarms.html)

# Sizing Amazon OpenSearch Service domains
<a name="sizing-domains"></a>

There's no perfect method of sizing Amazon OpenSearch Service domains. However, by starting with an understanding of your storage needs, the service, and OpenSearch itself, you can make an educated initial estimate on your hardware needs. This estimate can serve as a useful starting point for the most critical aspect of sizing domains: testing them with representative workloads and monitoring their performance.

**Topics**
+ [Calculating storage requirements](bp-storage.md)
+ [Choosing the number of shards](bp-sharding.md)
+ [Choosing instance types and testing](bp-instances.md)

# Calculating storage requirements
<a name="bp-storage"></a>

Most OpenSearch workloads fall into one of two broad categories:
+ **Long-lived index**: You write code that processes data into one or more OpenSearch indexes and then updates those indexes periodically as the source data changes. Some common examples are website, document, and ecommerce search.
+ **Rolling indexes**: Data continuously flows into a set of temporary indexes, with an indexing period and retention window (such as a set of daily indexes that is retained for two weeks). Some common examples are log analytics, time-series processing, and clickstream analytics.

For long-lived index workloads, you can examine the source data on disk and easily determine how much storage space it consumes. If the data comes from multiple sources, just add those sources together.

For rolling indexes, you can multiply the amount of data generated during a representative time period by the retention period. For example, if you generate 200 MiB of log data per hour, that's 4.7 GiB per day, which is 66 GiB of data at any given time if you have a two-week retention period.
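The rolling-index arithmetic above can be sketched as a few lines, using the numbers from the example:

```python
# Sketch of the rolling-index estimate: hourly ingest rate times the
# retention window gives the data held at any given time.
hourly_gib = 200 / 1024          # 200 MiB of log data per hour, in GiB
daily_gib = hourly_gib * 24      # ~4.7 GiB generated per day
retained_gib = daily_gib * 14    # two-week retention window -> ~66 GiB
print(round(daily_gib, 1), round(retained_gib))
```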

The size of your source data, however, is just one aspect of your storage requirements. You also have to consider the following:
+ **Number of replicas**: Each replica is a full copy of the primary shard, so the store size of an index reflects both its primary and replica shards. By default, each OpenSearch index has one replica. We recommend at least one to prevent data loss. Replicas also improve search performance, so you might want more if you have a read-heavy workload. Use `PUT /my-index/_settings` to update the `number_of_replicas` setting for your index.
+ **OpenSearch indexing overhead**: The on-disk size of an index varies. The total size of the source data plus the index is often 110% of the source, with the index up to 10% of the source data. After you index your data, you can use the `_cat/indices?v` API and `pri.store.size` value to calculate the exact overhead. `_cat/allocation?v` also provides a useful summary.
+ **Operating system reserved space**: By default, Linux reserves 5% of the file system for the `root` user for critical processes, system recovery, and to safeguard against disk fragmentation problems.
+ **OpenSearch Service overhead**: OpenSearch Service reserves 20% of the storage space of each instance (up to 20 GiB) for segment merges, logs, and other internal operations.

  Because of this 20 GiB maximum, the total amount of reserved space can vary dramatically depending on the number of instances in your domain. For example, a domain might have three `m6g.xlarge.search` instances, each with 500 GiB of storage space, for a total of 1.46 TiB. In this case, the total reserved space is only 60 GiB. Another domain might have 10 `m3.medium.search` instances, each with 100 GiB of storage space, for a total of 0.98 TiB. Here, the total reserved space is 200 GiB, even though the first domain is 50% larger.

  In the following formula, we apply a "worst-case" estimate for overhead. This estimate includes additional free space to help minimize the impact of node failures and Availability Zone outages.
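The per-instance reserved-space rule, including the 20-GiB cap from the examples above, can be sketched as:

```python
# Sketch of the OpenSearch Service reserved-space rule: 20% of each
# instance's storage, capped at 20 GiB, summed across the domain.
def reserved_gib(instance_count, storage_per_instance_gib):
    return instance_count * min(0.2 * storage_per_instance_gib, 20)

print(reserved_gib(3, 500))   # three instances with 500 GiB each -> 60 GiB
print(reserved_gib(10, 100))  # ten instances with 100 GiB each -> 200 GiB
```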

In summary, if you have 66 GiB of data at any given time and want one replica, your *minimum* storage requirement is closer to 66 * 2 * 1.1 / 0.95 / 0.8 = 191 GiB. You can generalize this calculation as follows:

 **Source data * (1 + number of replicas) * (1 + indexing overhead) / (1 - Linux reserved space) / (1 - OpenSearch Service overhead) = minimum storage requirement**

Or you can use this simplified version:

 **Source data * (1 + number of replicas) * 1.45 = minimum storage requirement**
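Both the full formula and its simplified version can be checked with a short script; the defaults mirror the percentages described above:

```python
# Sketch of the minimum-storage formula: one replica, 10% indexing
# overhead, 5% Linux reserved space, 20% OpenSearch Service overhead.
def minimum_storage_gib(source_gib, replicas=1, indexing_overhead=0.10,
                        linux_reserved=0.05, service_overhead=0.20):
    return (source_gib * (1 + replicas) * (1 + indexing_overhead)
            / (1 - linux_reserved) / (1 - service_overhead))

print(round(minimum_storage_gib(66)))   # full formula: ~191 GiB
print(round(66 * 2 * 1.45))             # simplified version: ~191 GiB
```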

Insufficient storage space is one of the most common causes of cluster instability. So you should cross-check the numbers when you [choose instance types, instance counts, and storage volumes](bp-instances.md).

Other storage considerations exist:
+ If your minimum storage requirement exceeds 1 PB, see [Petabyte scale in Amazon OpenSearch Service](petabyte-scale.md).
+ If you have rolling indexes and want to use a hot-warm architecture, see [UltraWarm storage for Amazon OpenSearch Service](ultrawarm.md).

# Choosing the number of shards
<a name="bp-sharding"></a>

After you understand your storage requirements, you can investigate your indexing strategy. By default in OpenSearch Service, each index is divided into five primary shards and one replica (total of 10 shards). This behavior differs from open source OpenSearch, which defaults to one primary and one replica shard. Because you can't easily change the number of primary shards for an existing index, you should decide about shard count *before* indexing your first document.

The overall goal of choosing a number of shards is to distribute an index evenly across all data nodes in the cluster. However, these shards shouldn't be too large or too numerous. A general guideline is to try to keep shard size between 10–30 GiB for workloads where search latency is a key performance objective, and 30–50 GiB for write-heavy workloads such as log analytics. 

Large shards can make it difficult for OpenSearch to recover from failure, but because each shard uses some amount of CPU and memory, having too many small shards can cause performance issues and out of memory errors. In other words, shards should be small enough that the underlying OpenSearch Service instance can handle them, but not so small that they place needless strain on the hardware.

For example, suppose you have 66 GiB of data. You don't expect that number to increase over time, and you want to keep your shards around 30 GiB each. Your number of shards therefore should be approximately 66 * 1.1 / 30 = 3. You can generalize this calculation as follows:

 **(Source data + room to grow) * (1 + indexing overhead) / desired shard size = approximate number of primary shards**

This equation helps compensate for data growth over time. If you expect those same 66 GiB of data to quadruple over the next year, the approximate number of shards is (66 + 198) * 1.1 / 30 = 10. Remember, though, you don't have those extra 198 GiB of data *yet*. Check to make sure that this preparation for the future doesn't create unnecessarily tiny shards that consume huge amounts of CPU and memory in the present. In this case, 66 * 1.1 / 10 shards = 7.26 GiB per shard, which will consume extra resources and is below the recommended size range. You might consider the more middle-of-the-road approach of six shards, which leaves you with 12-GiB shards today and 48-GiB shards in the future. Then again, you might prefer to start with three shards and reindex your data when the shards exceed 50 GiB.
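The shard-count arithmetic above can be sketched as follows; rounding up keeps each shard at or below the desired size:

```python
import math

# Sketch of the primary-shard formula: (source + room to grow) with
# indexing overhead, divided by the desired shard size, rounded up.
def primary_shards(source_gib, growth_gib, desired_shard_gib, overhead=0.10):
    return math.ceil((source_gib + growth_gib) * (1 + overhead) / desired_shard_gib)

print(primary_shards(66, 0, 30))     # ~3 shards today
print(primary_shards(66, 198, 30))   # ~10 shards if the data quadruples
print(round(66 * 1.1 / 10, 2))       # 7.26 GiB per shard if you choose 10 now
```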

A far less common issue involves limiting the number of shards per node. If you size your shards appropriately, you typically run out of disk space long before encountering this limit. For example, an `m6g.large.search` instance has a maximum disk size of 512 GiB. If you stay below 80% disk usage and size your shards at 20 GiB, it can accommodate approximately 20 shards. Elasticsearch 7.*x* and later, and all versions of OpenSearch up to 2.15, have a limit of *1,000* shards per node. To adjust the maximum shards per node, configure the `cluster.max_shards_per_node` setting. For OpenSearch 2.17 and later, OpenSearch Service supports 1,000 shards for every 16 GB of JVM heap memory, up to a maximum of 4,000 shards per node. For an example, see [Cluster settings](https://opensearch.org/docs/latest/opensearch/rest-api/cluster-settings/#request-body). For more information about shard count, see [Shard count quotas](limits.md#shard-count).

Sizing shards appropriately almost always keeps you below this limit, but you can also consider the number of shards for each GiB of Java heap. On a given node, have no more than 25 shards per GiB of Java heap. For example, an `m5.large.search` instance has a 4-GiB heap, so each node should have no more than 100 shards. At that shard count, each shard is roughly 5 GiB in size, which is well below our recommendation.
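The heap-based guideline works out to a one-line calculation:

```python
# Sketch of the shards-per-heap guideline: no more than 25 shards
# per GiB of Java heap on a node.
def max_shards_for_heap(heap_gib, shards_per_gib=25):
    return heap_gib * shards_per_gib

print(max_shards_for_heap(4))   # a node with a 4-GiB heap -> 100 shards
```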

# Choosing instance types and testing
<a name="bp-instances"></a>

After you calculate your storage requirements and choose the number of shards that you need, you can start to make hardware decisions. Hardware requirements vary dramatically by workload, but we can still offer some basic recommendations.

In general, [the storage limits](limits.md) for each instance type map to the amount of CPU and memory that you might need for light workloads. For example, an `m6g.large.search` instance has a maximum EBS volume size of 512 GiB, 2 vCPU cores, and 8 GiB of memory. If your cluster has many shards, performs taxing aggregations, updates documents frequently, or processes a large number of queries, those resources might be insufficient for your needs. If your cluster falls into one of these categories, try starting with a configuration closer to 2 vCPU cores and 8 GiB of memory for every 100 GiB of your storage requirement.

**Tip**  
For a summary of the hardware resources that are allocated to each instance type, see [Amazon OpenSearch Service pricing](https://aws.amazon.com/opensearch-service/pricing/).

Still, even those resources might be insufficient. Some OpenSearch users report that they need many times those resources to fulfill their requirements. To find the right hardware for your workload, you have to make an educated initial estimate, test with representative workloads, adjust, and test again.

## Step 1: Make an initial estimate
<a name="initial-estimate"></a>

To start, we recommend a minimum of three nodes to avoid potential OpenSearch issues, such as a *split brain* state (when a lapse in communication leads to a cluster having two master nodes). If you have three [dedicated master nodes](managedomains-dedicatedmasternodes.md), we still recommend a minimum of two data nodes for replication.

## Step 2: Calculate storage requirements per node
<a name="determine-storage"></a>

If you have a 184-GiB storage requirement and the recommended minimum number of three nodes, use the equation 184 / 3 = 61 GiB to find the amount of storage that each node needs. In this example, you might select three `m6g.large.search` instances, where each uses a 90-GiB EBS storage volume, so that you have a safety net and some room for growth over time. This configuration provides 6 vCPU cores and 24 GiB of memory, so it's suited to lighter workloads.

For a more substantial example, consider a 14 TiB (14,336 GiB) storage requirement and a heavy workload. In this case, you might choose to begin testing with 2 * 144 = 288 vCPU cores and 8 * 144 = 1152 GiB of memory. These numbers work out to approximately 18 `i3.4xlarge.search` instances. If you don't need the fast, local storage, you could also test 18 `r6g.4xlarge.search` instances, each using a 1-TiB EBS storage volume.

If your cluster includes hundreds of terabytes of data, see [Petabyte scale in Amazon OpenSearch Service](petabyte-scale.md).
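The two sizing examples above can be sketched as follows; the 2 vCPU and 8 GiB of memory per 100 GiB of storage is the guideline from the previous section:

```python
import math

# Light workload: divide the storage requirement evenly across nodes.
storage_gib = 184
nodes = 3
per_node_gib = storage_gib / nodes           # ~61 GiB of storage per node
print(round(per_node_gib))

# Heavy workload: 2 vCPUs and 8 GiB of memory per 100 GiB of storage.
heavy_storage_gib = 14_336                   # 14 TiB expressed in GiB
units = math.ceil(heavy_storage_gib / 100)   # ~144 blocks of 100 GiB
print(2 * units, 8 * units)                  # vCPUs and GiB of memory to test
```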

## Step 3: Perform representative testing
<a name="test-sizing"></a>

After configuring the cluster, you can [add your indexes](indexing.md) using the number of shards you calculated earlier, perform some representative client testing using a realistic dataset, and [monitor CloudWatch metrics](managedomains-cloudwatchmetrics.md) to see how the cluster handles the workload.

## Step 4: Succeed or iterate
<a name="test-iterate"></a>

If performance satisfies your needs, tests succeed, and CloudWatch metrics are normal, the cluster is ready to use. Remember to [set CloudWatch alarms](cloudwatch-alarms.md) to detect unhealthy resource usage.

If performance isn't acceptable, tests fail, or `CPUUtilization` or `JVMMemoryPressure` are high, you might need to choose a different instance type (or add instances) and continue testing. As you add instances, OpenSearch automatically rebalances the distribution of shards throughout the cluster.

Because it's easier to measure the excess capacity in an overpowered cluster than the deficit in an underpowered one, we recommend starting with a larger cluster than you think you need. Next, test and scale down to an efficient cluster that has the extra resources to ensure stable operations during periods of increased activity.

Production clusters or clusters with complex states benefit from [dedicated master nodes](managedomains-dedicatedmasternodes.md), which improve performance and cluster reliability.

# Petabyte scale in Amazon OpenSearch Service
<a name="petabyte-scale"></a>

Amazon OpenSearch Service domains offer attached storage of up to 10 PB. You can configure a domain with 1000 `OR1.16xlarge.search` instance types, each with 36 TB of storage. Because of the sheer difference in scale, recommendations for domains of this size differ from [our general recommendations](bp.md). This section discusses considerations for creating domains, costs, storage, and shard size.

While this section frequently references the `i3.16xlarge.search` instance types, you can use several other instance types to reach 10 PB of total domain storage.

**Creating domains**  
Domains of this size exceed the default limit of 80 instances per domain. To request a service limit increase of up to 1000 instances per domain, open a case at the [AWS Support Center](https://console.aws.amazon.com/support/home#/).

**Pricing**  
Before creating a domain of this size, check the [Amazon OpenSearch Service pricing](https://aws.amazon.com/opensearch-service/pricing/) page to ensure that the associated costs match your expectations. Examine [UltraWarm storage for Amazon OpenSearch Service](ultrawarm.md) to see if a hot-warm architecture fits your use case.

**Storage**  
The `i3` instance types are designed to provide fast, local non-volatile memory express (NVMe) storage. Because this local storage tends to offer performance benefits when compared to Amazon Elastic Block Store, EBS volumes are not an option when you select these instance types in OpenSearch Service. If you prefer EBS storage, use another instance type, such as `r6.12xlarge.search`.

**Shard size and count**  
A common OpenSearch guideline is not to exceed 50 GB per shard. Given the number of shards necessary to accommodate large domains and the resources available to `i3.16xlarge.search` instances, we recommend a shard size of 100 GB.  
For example, if you have 450 TB of source data and want one replica, your *minimum* storage requirement is closer to 450 TB * 2 * 1.1 / 0.95 = 1.04 PB. For an explanation of this calculation, see [Calculating storage requirements](bp-storage.md). Although 1.04 PB / 15 TB = 70 instances, you might select 90 or more `i3.16xlarge.search` instances to give yourself a storage safety net, deal with node failures, and account for some variance in the amount of data over time. Each instance adds another 20 GiB to your minimum storage requirement, but for disks of this size, those 20 GiB are almost negligible.  
Controlling the number of shards is tricky. OpenSearch users often rotate indexes on a daily basis and retain data for a week or two. In this situation, you might find it useful to distinguish between "active" and "inactive" shards. Active shards are, well, actively being written to or read from. Inactive shards might service some read requests, but are largely idle. In general, you should keep the number of active shards below a few thousand. As the number of active shards approaches 10,000, considerable performance and stability risks emerge.  
To calculate the number of primary shards, use this formula: 450,000 GB * 1.1 / 100 GB per shard = 4,950 shards. Doubling that number to account for replicas is 9,900 shards, which represents a major concern if all shards are active. But if you rotate indexes and only 1/7th or 1/14th of the shards are active on any given day (1,414 or 707 shards, respectively), the cluster might work well. As always, the most important step of sizing and configuring your domain is to perform representative client testing using a realistic dataset.
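The petabyte-scale arithmetic can be collected into one sketch:

```python
import math

# Sketch of the petabyte-scale example: 450 TB of source data, one
# replica, 10% indexing overhead, 5% Linux reserved space, 15 TB of
# storage per instance, and a 100-GB target shard size.
storage_tb = 450 * 2 * 1.1 / 0.95           # ~1,042 TB, about 1.04 PB
instances = math.ceil(storage_tb / 15)      # minimum instance count
primaries = round(450_000 * 1.1 / 100)      # primary shards at 100 GB each
total_shards = primaries * 2                # primaries plus one replica
print(instances, primaries, total_shards)
print(total_shards // 7, total_shards // 14)  # active shards if 1/7 or 1/14 rotate in
```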

# Dedicated coordinator nodes in Amazon OpenSearch Service
<a name="Dedicated-coordinator-nodes"></a>

Dedicated coordinator nodes in Amazon OpenSearch Service are specialized nodes that offload coordination tasks from data nodes. These tasks include managing search requests and hosting OpenSearch Dashboards. By separating these functions, dedicated coordinator nodes reduce the load on data nodes, which allows them to focus on data storage, indexing, and search operations. This improves overall cluster performance and resource utilization. 

Additionally, dedicated coordinator nodes help reduce the number of private IP addresses required for VPC configurations, which leads to more efficient network management. This setup can result in up to 15% improvement in indexing throughput and 20% better query performance, depending on workload characteristics.

## When to use dedicated coordinator nodes
<a name="dedicated-coordinator-nodes-uses"></a>

Dedicated coordinator nodes are most beneficial in the following scenarios.
+ **Large clusters** – In environments with a high volume of data or complex queries, offloading coordination tasks to dedicated nodes can improve cluster performance.
+ **Frequent queries** – Workloads involving frequent search queries or aggregations, especially those with complex date histograms or multiple aggregations, benefit from faster query processing.
+ **Heavy Dashboards use** – OpenSearch Dashboards can be resource-intensive. Offloading this responsibility to dedicated coordinator nodes reduces the strain on data nodes.

## Architecture and behavior
<a name="dedicated-coordinator-nodes-architecture"></a>

In an OpenSearch cluster, dedicated coordinator nodes handle two key responsibilities.
+ **Request handling** – These nodes receive incoming search requests and forward them to the appropriate data nodes, which store the relevant data. They then consolidate the results from multiple data nodes into a single global result set, which is returned to the client.
+ **Dashboards hosting** – Coordinator nodes manage OpenSearch Dashboards, which relieves data nodes from the additional burden of hosting OpenSearch Dashboards and handling related traffic.

In VPC domains, Elastic Network Interfaces (ENIs) are assigned to the dedicated coordinator nodes rather than to the data nodes. This arrangement helps reduce the number of private IP addresses required for the VPC, which improves network efficiency. Typically, dedicated coordinator nodes make up around 10% of the number of data nodes.

## Requirements and limitations
<a name="dedicated-coordinator-nodes-requirements"></a>

Dedicated coordinator nodes have the following requirements and limitations.
+ Dedicated coordinator nodes are supported in all OpenSearch versions and Elasticsearch versions 6.8 to 7.10.
+ To enable dedicated coordinator nodes, your domain must have dedicated master nodes enabled. For more information, see [Dedicated master nodes in Amazon OpenSearch Service](managedomains-dedicatedmasternodes.md).
+ Provisioning dedicated coordinator nodes can incur additional costs. However, the improved resource efficiency and enhanced performance justify the investment, particularly for large or complex clusters.

## Provisioning dedicated coordinator nodes
<a name="dedicated-coordinator-nodes-provisioning"></a>

Perform the following steps to provision dedicated coordinator nodes in an existing domain. Make sure your domain has dedicated *master* nodes enabled before you provision coordinator nodes.

### Console
<a name="dedicated-coordinator-nodes-provisioning-console"></a>

**To provision dedicated coordinator nodes in the AWS Management Console**

1. Sign in to the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/home](https://console.aws.amazon.com/aos/home).

1. Choose **Domains**, then select the domain you want to modify.

1. In the **Cluster configuration** section, choose **Edit**.

1. Choose **Enable dedicated coordinator nodes**.

1. Select the instance type and number of coordinator nodes to provision.

1. Choose **Save changes**. It might take several minutes for the domain to update.

### AWS CLI
<a name="dedicated-coordinator-nodes-provisioning-cli"></a>

To provision dedicated coordinator nodes using the AWS CLI, use the [update-domain-config](https://docs.aws.amazon.com/cli/latest/reference/opensearch/update-domain-config.html) command. The following example provisions three `r6g.large.search` coordinator nodes in a domain.

```
aws opensearch update-domain-config \
  --domain-name my-opensearch-domain \
  --cluster-config InstanceCount=3,InstanceType=r6g.large.search,DedicatedCoordinatorCount=3,ZoneAwarenessEnabled=true,DedicatedCoordinatorEnabled=true
```

This command enables dedicated coordinator nodes, sets the instance type and count for the coordinator nodes, and enables zone awareness for higher availability.

## Best practices
<a name="best-practices-dedicated-coordinator-nodes"></a>

Consider the following best practices when you use dedicated coordinator nodes.
+ Use general purpose instances for most use cases. They provide a balanced approach between cost and performance. Memory-optimized instances are ideal for workloads that require substantial memory resources, such as those that involve complex aggregations or large-scale searches.
+ A good starting point is to provision between 5% and 10% of your data nodes as dedicated coordinator nodes. For example, if your domain has 90 data nodes of a particular instance type, consider provisioning 5 to 9 coordinator nodes of the same instance type.
**Note**  
Instance type availability varies by region. When selecting instance types for coordinator nodes, verify that your chosen instance type is available in your target region. You can check instance type availability in the OpenSearch Service console when creating or modifying your domain.
+ To minimize the risk of a single point of failure, provision at least two dedicated coordinator nodes. This ensures that your cluster remains operational even if one node fails.
+ If you're using cross-Region search, provision dedicated coordinator nodes in the destination domains. Source domains typically don't handle coordination tasks.
+ For indexing-heavy environments, consider CPU-optimized instances that match the instance size of your data nodes for optimal performance.
+ For memory-intensive workloads, use a slightly larger instance type for your dedicated coordinator nodes to help manage the increased memory demands.
+ Track the `CoordinatorCPUUtilization` Amazon CloudWatch metric. If it consistently exceeds 80%, it might indicate that you need larger or additional coordinator nodes to handle the load.
+ Size your dedicated coordinator nodes to match your data nodes. For example, start with 4xlarge general purpose coordinator nodes when using 4xlarge data nodes.
+ Use multiple smaller instances instead of fewer larger instances for coordinator nodes, unless your individual requests or responses need extremely high memory (in GBs). For example, choose 12 4xlarge instances rather than 6 8xlarge general purpose instances.
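As a sketch of the 5-10% starting guideline, the following computes a coordinator-node range from the data-node count; the two-node floor reflects the single-point-of-failure recommendation above:

```python
import math

# Sketch: 5-10% of data nodes as coordinator nodes, never fewer than 2.
def coordinator_range(data_nodes):
    low = max(2, math.ceil(data_nodes * 0.05))
    high = max(low, round(data_nodes * 0.10))
    return low, high

print(coordinator_range(90))   # 90 data nodes -> 5 to 9 coordinator nodes
```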

### Node recommendations by cluster size
<a name="dedicated-coordinator-nodes-recs"></a>

Use the following guidelines as a starting point for provisioning dedicated coordinator nodes based on your cluster size. Adjust the number and type of nodes based on workload characteristics and performance metrics.


| Cluster size | Recommended coordinator nodes | Instance type | 
| --- | --- | --- | 
|  Small (up to 50 nodes)  | 3-5 nodes | General purpose | 
|  Medium (50-100 nodes)  | 5-9 nodes | Memory optimized | 
|  Large (100+ nodes)  | 10-15 nodes | Memory optimized | 

# Dedicated master nodes in Amazon OpenSearch Service
<a name="managedomains-dedicatedmasternodes"></a>

Amazon OpenSearch Service uses *dedicated master nodes* to increase cluster stability. A dedicated master node performs cluster management tasks, but does not hold data or respond to data upload requests. This offloading of cluster management tasks increases the stability of your domain. Just like all other node types, you pay an hourly rate for each dedicated master node.

Dedicated master nodes perform the following cluster management tasks:
+ Track all nodes in the cluster.
+ Track the number of indexes in the cluster.
+ Track the number of shards belonging to each index.
+ Maintain routing information for nodes in the cluster.
+ Update the cluster state after state changes, such as creating an index and adding or removing nodes in the cluster.
+ Replicate changes to the cluster state across all nodes in the cluster.
+ Monitor the health of all cluster nodes by sending *heartbeat signals*, periodic signals that monitor the availability of the data nodes in the cluster.

The following illustration shows an OpenSearch Service domain with 10 instances. Seven of the instances are data nodes and three are dedicated master nodes. Only one of the dedicated master nodes is active. The two gray dedicated master nodes wait as backup in case the active dedicated master node fails. All data upload requests are served by the seven data nodes, and all cluster management tasks are offloaded to the active dedicated master node.

![\[OpenSearch Service domain with data nodes and dedicated master nodes, illustrating cluster management.\]](http://docs.aws.amazon.com/opensearch-service/latest/developerguide/images/DedicatedMasterNodes_no-caption.png)


## Choosing the number of dedicated master nodes
<a name="dedicatedmasternodes-number"></a>

We recommend that you use Multi-AZ with Standby, which adds **three** dedicated master nodes to each production OpenSearch Service domain. If you deploy with Multi-AZ without Standby or single-AZ, we still recommend three dedicated master nodes. Never choose an even number of dedicated master nodes. Consider the following when choosing the number of dedicated master nodes:
+ One dedicated master node is explicitly prohibited by OpenSearch Service because you have no backup in the event of a failure. You receive a validation exception if you try to create a domain with only one dedicated master node.
+ If you have two dedicated master nodes, your cluster doesn't have the necessary quorum of nodes to elect a new master node in the event of a failure.

  A quorum is the number of dedicated master nodes / 2 + 1 (rounded down to the nearest whole number). In this case, 2 / 2 + 1 = 2. Because one dedicated master node has failed and only one backup exists, the cluster doesn't have a quorum and can't elect a new master.
+ Three dedicated master nodes, the recommended number, provides two backup nodes in the event of a master node failure and the necessary quorum (2) to elect a new master.
+ Four dedicated master nodes are not better than three and can cause issues if you use [multiple Availability Zones](managedomains-multiaz.md).
  + If one master node fails, you have the quorum (3) to elect a new master. If two nodes fail, you lose that quorum, just as you do with three dedicated master nodes.
  + In a three Availability Zone configuration, two AZs have one dedicated master node, and one AZ has two. If that AZ experiences a disruption, the remaining two AZs don't have the necessary quorum (3) to elect a new master.
+ Having five dedicated master nodes works as well as three and allows you to lose two nodes while maintaining a quorum. But because only one dedicated master node is active at any given time, this configuration means that you pay for four idle nodes. Many users find this level of failover protection excessive.

If a cluster has an even number of master-eligible nodes, OpenSearch and Elasticsearch versions 7.*x* and later ignore one node so that the voting configuration is always an odd number. In this case, four dedicated master nodes are essentially equivalent to three (and two to one).
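The quorum rule above can be sketched in one line; integer division performs the round-down:

```python
# Sketch of the quorum rule: dedicated master nodes / 2 + 1, rounded down.
def quorum(master_nodes):
    return master_nodes // 2 + 1

for n in (2, 3, 4, 5):
    print(n, quorum(n))   # 2->2, 3->2, 4->3, 5->3
```

Note how four nodes need the same quorum as three can tolerate losing (one), which is why an even count buys no extra protection.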

**Note**  
If your cluster doesn't have the necessary quorum to elect a new master node, write *and* read requests to the cluster both fail. This behavior differs from the OpenSearch default.

## Choosing instance types for dedicated master nodes
<a name="dedicatedmasternodes-instance"></a>

### OpenSearch Service domain and instance quotas
<a name="limits-number-per-az"></a>

Although dedicated master nodes don't process search and query requests, their size is highly correlated with the number of instances, indexes, and shards that they can manage. For production clusters, we recommend, at a minimum, the following instance types for dedicated master nodes. 

These recommendations are based on typical workloads and can vary based on your needs. Clusters with many shards or field mappings can benefit from larger instance types. For more information, see [Recommended CloudWatch alarms for Amazon OpenSearch Service](cloudwatch-alarms.md) to determine if you need to use a larger instance type.


| RAM | Max Node Support for Elasticsearch and OpenSearch Service 1.x to 2.15 | Max Shard Support for Elasticsearch and OpenSearch Service 1.x to 2.15 | Max Node Support for OpenSearch Service 2.17 and above | Max Shard Support for OpenSearch Service 2.17 and above | 
| --- | --- | --- | --- | --- | 
| 2 GB | Not applicable | Not applicable | 10 | 1K | 
| 4 GB | Not applicable | Not applicable | 10 | 5K | 
| 8 GB | 10 | 10K | 30 | 15K | 
| 16 GB | 30 | 30K | 60 | 30K | 
| 32 GB | 75 | 40K | 120 | 60K | 
| 64 GB | 125 | 75K | 240 | 120K | 
| 128 GB | 200 | 75K | 480 | 240K | 
| 256 GB | Not applicable | Not applicable | 1002 | 500K | 