# Monitoring MemoryDB with Amazon CloudWatch
<a name="monitoring-cloudwatch"></a>

You can monitor MemoryDB using CloudWatch, which collects raw data and processes it into readable, near real-time metrics. These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on how your web application or service is performing. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the [Amazon CloudWatch User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/).

The following sections list the metrics and dimensions for MemoryDB.

**Topics**
+ [Host-Level Metrics](metrics.HostLevel.md)
+ [Metrics for MemoryDB](metrics.memorydb.md)
+ [Which Metrics Should I Monitor?](metrics.whichshouldimonitor.md)
+ [Choosing Metric Statistics and Periods](metrics.ChoosingStatisticsAndPeriods.md)
+ [Monitoring CloudWatch metrics](cloudwatchmetrics.md)

# Host-Level Metrics
<a name="metrics.HostLevel"></a>

The `AWS/MemoryDB` namespace includes the following host-level metrics for individual nodes.

**See Also**
+ [Metrics for MemoryDB](metrics.memorydb.md)


| Metric | Description | Unit | 
| --- | --- | --- | 
| CPUUtilization |  The percentage of CPU utilization for the entire host. Because Valkey and Redis OSS are single-threaded, we recommend that you monitor the EngineCPUUtilization metric for nodes with 4 or more vCPUs. |  Percent  | 
| FreeableMemory  |  The amount of free memory available on the host. This number is derived from the memory in RAM and buffers that the OS reports as freeable. |  Bytes  | 
| NetworkBytesIn |  The number of bytes the host has read from the network.  |  Bytes  | 
| NetworkBytesOut | The number of bytes sent out on all network interfaces by the instance.  |  Bytes  | 
| NetworkPacketsIn | The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance.  | Count  | 
| NetworkPacketsOut | The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. | Count  | 
| NetworkBandwidthInAllowanceExceeded | The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the instance. | Count  | 
| NetworkConntrackAllowanceExceeded | The number of packets shaped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. | Count  | 
| NetworkBandwidthOutAllowanceExceeded | The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the instance. | Count  | 
| NetworkPacketsPerSecondAllowanceExceeded | The number of packets shaped because the bidirectional packets per second exceeded the maximum for the instance. | Count  | 
| NetworkMaxBytesIn | The maximum per second burst of received bytes within each minute. | Bytes | 
| NetworkMaxBytesOut  | The maximum per second burst of transmitted bytes within each minute. | Bytes | 
| NetworkMaxPacketsIn | The maximum per second burst of received packets within each minute. | Count  | 
| NetworkMaxPacketsOut | The maximum per second burst of transmitted packets within each minute. | Count  | 
| SwapUsage |  The amount of swap used on the host.  |  Bytes  | 

# Metrics for MemoryDB
<a name="metrics.memorydb"></a>

The `AWS/MemoryDB` namespace includes the following metrics.

With the exception of `ReplicationLag`, `EngineCPUUtilization`, `SuccessfulWriteRequestLatency`, and `SuccessfulReadRequestLatency`, these metrics are derived from the Valkey and Redis OSS **info** command. Each metric is calculated at the node level.

For complete documentation of the **INFO** command, see [INFO](http://valkey.io/commands/info). 

**See also:**
+ [Host-Level Metrics](metrics.HostLevel.md)

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/memorydb/latest/devguide/metrics.memorydb.html)

The following are aggregations of certain kinds of commands, derived from **info commandstats**. The commandstats section provides statistics based on the command type, including the number of calls.

For a full list of available commands, see [commands](https://valkey.io/commands). 


| Metric  | Description  | Unit  | 
| --- | --- | --- | 
| EvalBasedCmds | The total number of commands for eval-based commands. This is derived from the commandstats statistic by summing eval and evalsha. | Count | 
| GeoSpatialBasedCmds | The total number of commands for geospatial-based commands. This is derived from the commandstats statistic by summing all of the geo type of commands: geoadd, geodist, geohash, geopos, georadius, and georadiusbymember. | Count | 
| GetTypeCmds | The total number of read-only type commands. This is derived from the commandstats statistic by summing all of the read-only type commands (get, hget, scard, lrange, and so on). | Count | 
| HashBasedCmds | The total number of commands that are hash-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more hashes (hget, hkeys, hvals, hdel, and so on). | Count | 
| HyperLogLogBasedCmds | The total number of HyperLogLog-based commands. This is derived from the commandstats statistic by summing all of the pf type of commands (pfadd, pfcount, pfmerge, and so on). | Count | 
| JsonBasedCmds | The total number of commands that are JSON-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more JSON document objects. | Count | 
| KeyBasedCmds | The total number of commands that are key-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more keys across multiple data structures (del, expire, rename, and so on). | Count | 
| ListBasedCmds | The total number of commands that are list-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more lists (lindex, lrange, lpush, ltrim, and so on). | Count | 
| PubSubBasedCmds | The total number of commands for pub/sub functionality. This is derived from the commandstats statistics by summing all of the commands used for pub/sub functionality: psubscribe, publish, pubsub, punsubscribe, subscribe, and unsubscribe. | Count | 
| SearchBasedCmds | The total number of secondary index and search commands, including both read and write commands. This is derived from the commandstats statistic by summing all search commands that act upon secondary indexes. | Count | 
| SearchBasedGetCmds | The total number of secondary index and search read-only commands. This is derived from the commandstats statistic by summing all secondary index and search get commands. | Count | 
| SearchBasedSetCmds | The total number of secondary index and search write commands. This is derived from the commandstats statistic by summing all secondary index and search set commands. | Count | 
| SearchNumberOfIndexes | The total number of indexes. | Count | 
| SearchNumberOfIndexedKeys | The total number of indexed keys. | Count | 
| SearchTotalIndexSize | The memory used by all of the indexes. | Bytes | 
| SetBasedCmds | The total number of commands that are set-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more sets (scard, sdiff, sadd, sunion, and so on). | Count | 
| SetTypeCmds | The total number of write types of commands. This is derived from the commandstats statistic by summing all of the mutative types of commands that operate on data (set, hset, sadd, lpop, and so on). | Count | 
| SortedSetBasedCmds | The total number of commands that are sorted set-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more sorted sets (zcount, zrange, zrank, zadd, and so on). | Count | 
| StringBasedCmds | The total number of commands that are string-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more strings (strlen, setex, setrange, and so on). | Count | 
| StreamBasedCmds | The total number of commands that are stream-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more streams data types (xrange, xlen, xadd, xdel, and so on). | Count | 
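
To make the aggregation concrete, the following is a minimal sketch of how per-command call counts from **info commandstats** could be summed into a few of the categories in the preceding table. It is an illustration only, not the service's implementation: the `CATEGORIES` map, the function names, and the sample text are hypothetical, and the sketch covers only the categories whose member commands are listed in full above. The counters reported by **info commandstats** are cumulative, and how they are converted into per-period CloudWatch values is not covered here.

```python
import re

# Hypothetical category map for illustration. The member commands are the
# ones listed in the table above; other categories are omitted.
CATEGORIES = {
    "EvalBasedCmds": {"eval", "evalsha"},
    "GeoSpatialBasedCmds": {
        "geoadd", "geodist", "geohash", "geopos",
        "georadius", "georadiusbymember",
    },
    "PubSubBasedCmds": {
        "psubscribe", "publish", "pubsub",
        "punsubscribe", "subscribe", "unsubscribe",
    },
}

# Matches lines such as: cmdstat_get:calls=100,usec=300,usec_per_call=3.00
_CMDSTAT_LINE = re.compile(r"^cmdstat_([^:]+):calls=(\d+)")


def parse_commandstats(info_text):
    """Return a {command_name: calls} dict parsed from commandstats text."""
    calls = {}
    for line in info_text.splitlines():
        match = _CMDSTAT_LINE.match(line.strip())
        if match:
            calls[match.group(1).lower()] = int(match.group(2))
    return calls


def aggregate(calls):
    """Sum the call counts of the commands that belong to each category."""
    return {
        metric: sum(calls.get(command, 0) for command in commands)
        for metric, commands in CATEGORIES.items()
    }


# Made-up sample output, used only to exercise the functions.
SAMPLE = """\
# Commandstats
cmdstat_eval:calls=12,usec=480,usec_per_call=40.00
cmdstat_evalsha:calls=30,usec=600,usec_per_call=20.00
cmdstat_publish:calls=5,usec=50,usec_per_call=10.00
cmdstat_get:calls=100,usec=300,usec_per_call=3.00
"""

totals = aggregate(parse_commandstats(SAMPLE))
```

With the sample text, `totals["EvalBasedCmds"]` is the sum of the `eval` and `evalsha` call counts, and categories with no matching commands are reported as 0.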

# Which Metrics Should I Monitor?
<a name="metrics.whichshouldimonitor"></a>

The following CloudWatch metrics offer good insight into MemoryDB performance. In most cases, we recommend that you set CloudWatch alarms for these metrics so that you can take corrective action before performance issues occur.

**Topics**
+ [CPUUtilization](#metrics-cpu-utilization)
+ [EngineCPUUtilization](#metrics-engine-cpu-utilization)
+ [SwapUsage](#metrics-swap-usage)
+ [Evictions](#metrics-evictions)
+ [CurrConnections](#metrics-curr-connections)
+ [Memory](#metrics-memory)
+ [Network](#metrics-network)
+ [Latency](#metrics-latency)
+ [Replication](#metrics-replication)

## CPUUtilization
<a name="metrics-cpu-utilization"></a>

This is a host-level metric reported as a percentage. For more information, see [Host-Level Metrics](metrics.HostLevel.md).

For smaller node types with 2 vCPUs or fewer, use the `CPUUtilization` metric to monitor your workload.

Generally speaking, we suggest you set your threshold at 90% of your available CPU. Because Valkey and Redis OSS are single-threaded, the actual threshold value should be calculated as a fraction of the node's total capacity. For example, suppose you are using a node type that has two cores. In this case, the threshold for CPUUtilization would be 90/2, or 45%. To find the number of cores (vCPUs) your node type has, see [MemoryDB Pricing](https://aws.amazon.com/memorydb/pricing/?p=ps).

You will need to determine your own threshold, based on the number of cores in the node that you are using. If you exceed this threshold, and your main workload is from read requests, scale your cluster out by adding read replicas. If the main workload is from write requests, we recommend that you add more shards to distribute the write workload across more primary nodes.
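
The threshold arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the 90% target and the "divide by the number of cores" rule from this section. The function names and the returned strings are hypothetical.

```python
def cpu_alarm_threshold(vcpus, target_percent=90.0):
    """Return the CPUUtilization alarm threshold for a node.

    The engine is single-threaded, so the target for one core is divided
    by the number of cores (vCPUs) on the node.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return target_percent / vcpus


def scaling_hint(cpu_percent, vcpus, read_heavy):
    """Suggest a corrective action when the threshold is exceeded."""
    if cpu_percent <= cpu_alarm_threshold(vcpus):
        return "ok"
    # Read-heavy workloads scale out with replicas, write-heavy with shards.
    return "add read replicas" if read_heavy else "add shards"


two_core_threshold = cpu_alarm_threshold(2)  # 90 / 2 = 45.0
```

For a two-core node type, `cpu_alarm_threshold(2)` gives 45.0, which matches the worked example in this section.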

**Tip**  
Instead of using the Host-Level metric `CPUUtilization`, you might be able to use the metric `EngineCPUUtilization`, which reports the percentage of usage on the Valkey or Redis OSS engine core. To see if this metric is available on your nodes and for more information, see [Metrics for MemoryDB](https://docs.aws.amazon.com/memorydb/latest/devguide/metrics.memorydb.html).

## EngineCPUUtilization
<a name="metrics-engine-cpu-utilization"></a>

For larger node types with 4vCPUs or more, you may want to use the `EngineCPUUtilization` metric, which reports the percentage of usage on the Valkey or Redis OSS engine core. To see if this metric is available on your nodes and for more information, see [Metrics for MemoryDB](https://docs.aws.amazon.com/memorydb/latest/devguide/metrics.memorydb.html).

## SwapUsage
<a name="metrics-swap-usage"></a>

This is a host-level metric reported in bytes. For more information, see [Host-Level Metrics](metrics.HostLevel.md).

If the `FreeableMemory` CloudWatch metric is close to 0 (that is, below 100 MB), or if the `SwapUsage` metric is greater than the `FreeableMemory` metric, the node might be under memory pressure.
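
The memory pressure check described above can be written as a small predicate. This is a sketch under stated assumptions: the function and constant names are hypothetical, and treating "100 MB" as 100 × 1024 × 1024 bytes is an interpretation, because both metrics are reported in bytes.

```python
# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
LOW_FREEABLE_BYTES = 100 * 1024 * 1024


def under_memory_pressure(freeable_memory_bytes, swap_usage_bytes,
                          low_watermark=LOW_FREEABLE_BYTES):
    """Return True if either memory pressure condition is met.

    Condition 1: FreeableMemory is below the low watermark.
    Condition 2: SwapUsage is greater than FreeableMemory.
    """
    return (freeable_memory_bytes < low_watermark
            or swap_usage_bytes > freeable_memory_bytes)
```

You could evaluate this predicate on the most recent data point of each metric for a node and investigate the node when it returns `True`.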

## Evictions
<a name="metrics-evictions"></a>

This is an engine metric. We recommend that you determine your own alarm threshold for this metric based on your application needs.

## CurrConnections
<a name="metrics-curr-connections"></a>

This is an engine metric. We recommend that you determine your own alarm threshold for this metric based on your application needs.

An increasing number of *CurrConnections* might indicate a problem with your application; you will need to investigate the application behavior to address this issue. 

## Memory
<a name="metrics-memory"></a>

Memory is a core aspect of Valkey and of Redis OSS. Understanding the memory utilization of your cluster is necessary to avoid data loss and accommodate future growth of your dataset. Statistics about the memory utilization of a node are available in the memory section of the [INFO](https://valkey.io/commands/info) command.

## Network
<a name="metrics-network"></a>

One of the determining factors for the network bandwidth capacity of your cluster is the node type you have selected. For more information about the network capacity of your node, see [Amazon MemoryDB pricing](https://aws.amazon.com/memorydb/pricing/).

## Latency
<a name="metrics-latency"></a>

The latency metrics `SuccessfulWriteRequestLatency` and `SuccessfulReadRequestLatency` measure the total time that MemoryDB for the Valkey engine takes to respond to a request.

**Note**  
Inflated values for the `SuccessfulWriteRequestLatency` and `SuccessfulReadRequestLatency` metrics may occur when using Valkey pipelining with CLIENT REPLY enabled on the Valkey client. Valkey pipelining is a technique for improving performance by issuing multiple commands at once, without waiting for the response to each individual command. To avoid inflated values, we recommend configuring your Valkey client to pipeline commands with [CLIENT REPLY OFF](https://valkey.io/commands/client-reply/).

## Replication
<a name="metrics-replication"></a>

The volume of data being replicated is visible through the `ReplicationBytes` metric. You can monitor `MaxReplicationThroughput` against the replication capacity throughput. We recommend that you add more shards when the workload reaches the maximum replication capacity throughput.

`ReplicationDelayedWriteCommands` can also indicate whether the workload is exceeding the maximum replication capacity throughput. For more information about replication in MemoryDB, see [Understanding MemoryDB replication](https://docs.aws.amazon.com/memorydb/latest/devguide/replication.html).

# Choosing Metric Statistics and Periods
<a name="metrics.ChoosingStatisticsAndPeriods"></a>

While CloudWatch will allow you to choose any statistic and period for each metric, not all combinations will be useful. For example, the Average, Minimum, and Maximum statistics for CPUUtilization are useful, but the Sum statistic is not.

All MemoryDB samples are published for a 60-second duration for each individual node. For any 60-second period, a node metric contains only a single sample.
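
Because each node publishes one sample per 60 seconds, you can estimate how many data points a query returns before you choose a period. The following sketch assumes the CloudWatch `GetMetricStatistics` limit of 1,440 data points per call and a period that is a multiple of 60 seconds. The function names are hypothetical.

```python
from datetime import datetime

# Assumption: GetMetricStatistics returns at most 1,440 data points per call.
MAX_DATAPOINTS_PER_CALL = 1440


def datapoint_count(start, end, period_seconds):
    """Estimate the data points returned for one node metric."""
    if period_seconds < 60 or period_seconds % 60 != 0:
        raise ValueError("period must be a multiple of 60 seconds")
    if end <= start:
        raise ValueError("end must be later than start")
    span_seconds = int((end - start).total_seconds())
    # Ceiling division: a partial period still yields a data point.
    return -(-span_seconds // period_seconds)


def fits_in_one_call(start, end, period_seconds):
    """Return True if the estimate stays within the per-call limit."""
    return datapoint_count(start, end, period_seconds) <= MAX_DATAPOINTS_PER_CALL


one_day = datapoint_count(datetime(2013, 7, 5), datetime(2013, 7, 6), 60)
```

A full day at a 60-second period produces 1,440 data points, which is exactly the limit. For longer ranges, use a larger period or split the range across several calls.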

# Monitoring CloudWatch metrics
<a name="cloudwatchmetrics"></a>

MemoryDB and CloudWatch are integrated so you can gather a variety of metrics. You can monitor these metrics using CloudWatch. 

**Note**  
The following examples require the CloudWatch command line tools. For more information about CloudWatch and to download the developer tools, see the [CloudWatch product page](https://aws.amazon.com/cloudwatch). 

The following procedures show you how to use CloudWatch to gather CPU utilization statistics for a cluster over a time range that you specify.

**Note**  
The `StartTime` and `EndTime` values supplied in the following examples are for illustrative purposes. Make sure to substitute appropriate start and end time values for your nodes.

For information on MemoryDB limits, see [AWS service limits](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_memorydb) for MemoryDB.

## Monitoring CloudWatch metrics (Console)
<a name="cloudwatchmetricsclusters.viewdetails"></a>

 **To gather CPU utilization statistics for a cluster** 

1. Sign in to the AWS Management Console and open the MemoryDB console at [https://console.aws.amazon.com/memorydb/](https://console.aws.amazon.com/memorydb/).

1. Select the nodes you want to view metrics for. 
**Note**  
Selecting more than 20 nodes disables viewing metrics on the console.

   1. On the **Clusters** page of the AWS Management Console, click the name of one or more clusters.

      The detail page for the cluster appears. 

   1. Click the **Nodes** tab at the top of the window.

   1. On the **Nodes** tab of the detail window, select the nodes that you want to view metrics for.

      A list of available CloudWatch Metrics appears at the bottom of the console window. 

   1. Click on the **CPU Utilization** metric. 

      The CloudWatch console will open, displaying your selected metrics. You can use the **Statistic** and **Period** drop-down list boxes and **Time Range** tab to change the metrics being displayed. 

## Monitoring CloudWatch metrics using the CloudWatch CLI
<a name="cloudwatchmetrics.cli"></a>

 **To gather CPU utilization statistics for a cluster** 
+ Use the CloudWatch command **aws cloudwatch get-metric-statistics** with the following parameters (note that the start and end times are shown as examples only; you will need to substitute your own appropriate start and end times):

  For Linux, macOS, or Unix:

  ```
  aws cloudwatch get-metric-statistics \
      --namespace AWS/MemoryDB \
      --metric-name CPUUtilization \
      --dimensions Name=ClusterName,Value=mycluster Name=NodeId,Value=0002 \
      --statistics Average \
      --start-time 2013-07-05T00:00:00 \
      --end-time 2013-07-06T00:00:00 \
      --period 60
  ```

  For Windows:

  ```
  aws cloudwatch get-metric-statistics ^
      --namespace AWS/MemoryDB ^
      --metric-name CPUUtilization ^
      --dimensions Name=ClusterName,Value=mycluster Name=NodeId,Value=0002 ^
      --statistics Average ^
      --start-time 2013-07-05T00:00:00 ^
      --end-time 2013-07-06T00:00:00 ^
      --period 60
  ```
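
If you prefer an SDK to the CLI, the same request can be sketched with the AWS SDK for Python (Boto3). This example is an illustration under stated assumptions: it assumes that Boto3 is installed and that credentials and a Region are configured, the helper function names are hypothetical, and the dimension names `ClusterName` and `NodeId` are taken from the CLI example above, so verify them against the dimensions shown for your nodes in the CloudWatch console.

```python
from datetime import datetime, timezone


def build_request(cluster_name, node_id, start, end, period=60):
    """Build keyword arguments for the CloudWatch get_metric_statistics call."""
    return {
        "Namespace": "AWS/MemoryDB",
        "MetricName": "CPUUtilization",
        # Dimension names follow the CLI example (assumption, verify first).
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster_name},
            {"Name": "NodeId", "Value": node_id},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period,
        "Statistics": ["Average"],
    }


def fetch_average_cpu(request):
    """Send the request and return the data points sorted by time."""
    import boto3  # imported here so that build_request works without Boto3

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


# Example time range only. Substitute your own start and end times.
request = build_request(
    "mycluster",
    "0002",
    datetime(2013, 7, 5, tzinfo=timezone.utc),
    datetime(2013, 7, 6, tzinfo=timezone.utc),
)
```

Calling `fetch_average_cpu(request)` sends the request. Each returned data point is a dictionary that includes a `Timestamp` and the requested `Average` value.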

## Monitoring CloudWatch metrics using the CloudWatch API
<a name="cloudwatchmetrics.api"></a>

 **To gather CPU utilization statistics for a cluster** 
+ Call the CloudWatch API `GetMetricStatistics` with the following parameters (note that the start and end times are shown as examples only; you will need to substitute your own appropriate start and end times):
  + `Statistics.member.1=Average`
  + `Namespace=AWS/MemoryDB`
  + `StartTime=2013-07-05T00:00:00`
  + `EndTime=2013-07-06T00:00:00`
  + `Period=60`
  + `MetricName=CPUUtilization`
  + `Dimensions=ClusterName=mycluster,NodeId=0002`  
**Example**  

  ```
  http://monitoring.amazonaws.com/
      ?SignatureVersion=4
      &Action=GetMetricStatistics
      &Version=2010-08-01
      &StartTime=2013-07-05T00:00:00
      &EndTime=2013-07-06T00:00:00
      &Period=60
      &Statistics.member.1=Average
      &Dimensions.member.1.Name=ClusterName
      &Dimensions.member.1.Value=mycluster
      &Dimensions.member.2.Name=NodeId
      &Dimensions.member.2.Value=0002
      &Namespace=AWS/MemoryDB
      &MetricName=CPUUtilization
      &Timestamp=2013-07-07T17%3A48%3A21.746Z
      &AWSAccessKeyId=<AWS Access Key ID>
      &Signature=<Signature>
  ```
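
If you assemble the query string yourself, a helper such as the following can produce the URL-encoded parameters. This is a minimal sketch that builds only the unsigned part of a `GetMetricStatistics` request: it assumes the CloudWatch Query API version `2010-08-01`, the `MetricName` parameter, and the `Dimensions.member.N.Name` and `Dimensions.member.N.Value` form for dimensions. The function name is hypothetical, and request signing is intentionally left out, so the output cannot be sent as is.

```python
from urllib.parse import urlencode


def get_metric_statistics_query(cluster_name, node_id, start_time, end_time,
                                period=60):
    """Return the URL-encoded, unsigned query string for the request."""
    params = [
        ("Action", "GetMetricStatistics"),
        ("Version", "2010-08-01"),
        ("Namespace", "AWS/MemoryDB"),
        ("MetricName", "CPUUtilization"),
        # Each dimension is sent as a Name/Value pair of numbered members.
        ("Dimensions.member.1.Name", "ClusterName"),
        ("Dimensions.member.1.Value", cluster_name),
        ("Dimensions.member.2.Name", "NodeId"),
        ("Dimensions.member.2.Value", node_id),
        ("StartTime", start_time),
        ("EndTime", end_time),
        ("Period", str(period)),
        ("Statistics.member.1", "Average"),
    ]
    return urlencode(params)


# Example time range only. Substitute your own start and end times.
query = get_metric_statistics_query(
    "mycluster", "0002", "2013-07-05T00:00:00", "2013-07-06T00:00:00"
)
```

In practice, an AWS SDK or the AWS CLI handles encoding and signing for you, so a helper like this is mainly useful for understanding the shape of the request.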