

# Monitoring use with CloudWatch Metrics
<a name="CacheMetrics"></a>

ElastiCache provides metrics that enable you to monitor your clusters. You can access these metrics through CloudWatch. For more information on CloudWatch, see the [CloudWatch documentation](https://aws.amazon.com/documentation/cloudwatch/).

ElastiCache provides both host-level metrics (for example, CPU usage) and metrics that are specific to the cache engine software (for example, cache gets and cache misses). These metrics are measured and published for each cache node at 60-second intervals.

**Important**  
You should consider setting CloudWatch alarms on certain key metrics, so that you will be notified if your cluster's performance starts to degrade. For more information, see [Which Metrics Should I Monitor?](CacheMetrics.WhichShouldIMonitor.md) in this guide.

**Topics**
+ [Host-Level Metrics](CacheMetrics.HostLevel.md)
+ [Metrics for Valkey and Redis OSS](CacheMetrics.Redis.md)
+ [Metrics for Memcached](CacheMetrics.Memcached.md)
+ [Which Metrics Should I Monitor?](CacheMetrics.WhichShouldIMonitor.md)
+ [Choosing Metric Statistics and Periods](CacheMetrics.ChoosingStatisticsAndPeriods.md)
+ [Monitoring CloudWatch Cluster and Node Metrics](CloudWatchMetrics.md)

# Host-Level Metrics
<a name="CacheMetrics.HostLevel"></a>

The `AWS/ElastiCache` namespace includes the following host-level metrics for individual cache nodes. These metrics are measured and published for each cache node at 60-second intervals.

**See Also**
+ [Metrics for Valkey and Redis OSS](CacheMetrics.Redis.md)


| Metric | Description | Unit | 
| --- | --- | --- | 
| CPUUtilization |  The percentage of CPU utilization for the entire host. Because Valkey and Redis OSS are single-threaded, we recommend that you monitor the `EngineCPUUtilization` metric for nodes with 4 or more vCPUs. |  Percent  | 
| CPUCreditBalance | The number of earned CPU credits that an instance has accrued since it was launched or started. For T2 Standard, the CPUCreditBalance also includes the number of launch credits that have been accrued. Credits are accrued in the credit balance after they are earned, and removed from the credit balance when they are spent. The credit balance has a maximum limit, determined by the instance size. After the limit is reached, any new credits that are earned are discarded. For T2 Standard, launch credits do not count towards the limit. The credits in the CPUCreditBalance are available for the instance to spend to burst beyond its baseline CPU utilization. CPU credit metrics are available at a five-minute frequency only. This metric is available only for burstable performance instances.  | Credits (vCPU-minutes)  | 
| CPUCreditUsage | The number of CPU credits spent by the instance for CPU utilization. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes). CPU credit metrics are available at a five-minute frequency only. If you specify a period greater than five minutes, use the Sum statistic instead of the Average statistic. This metric is available only for burstable performance instances.  | Credits (vCPU-minutes)  | 
| FreeableMemory  |  The amount of free memory available on the host. This is derived from the RAM, buffers, and cache that the OS reports as freeable. |  Bytes  | 
| NetworkBytesIn |  The number of bytes the host has read from the network.  |  Bytes  | 
| NetworkBytesOut | The number of bytes sent out on all network interfaces by the instance.  |  Bytes  | 
| NetworkPacketsIn | The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance.  | Count  | 
| NetworkPacketsOut |  The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. | Count  | 
| NetworkBandwidthInAllowanceExceeded | The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance. | Count  | 
| NetworkConntrackAllowanceExceeded | The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. | Count  | 
| NetworkBandwidthOutAllowanceExceeded | The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance. | Count  | 
| NetworkPacketsPerSecondAllowanceExceeded | The number of packets queued or dropped because the bidirectional packets per second exceeded the maximum for the instance. | Count  | 
| NetworkMaxBytesIn | The maximum per-second burst of received bytes within each minute. | Bytes | 
| NetworkMaxBytesOut  | The maximum per-second burst of transmitted bytes within each minute. | Bytes | 
| NetworkMaxPacketsIn | The maximum per-second burst of received packets within each minute. | Count  | 
| NetworkMaxPacketsOut | The maximum per-second burst of transmitted packets within each minute. | Count  | 
| SwapUsage |  The amount of swap used on the host.  |  Bytes  | 
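
The credit arithmetic in the `CPUCreditUsage` description above can be sketched as a small calculation. This is an illustrative sketch only; the function name is hypothetical:

```python
def cpu_credits_spent(vcpus, utilization_pct, minutes):
    """One CPU credit equals one vCPU running at 100% utilization for one
    minute, so credits spent scale linearly with vCPUs, utilization, and time."""
    return vcpus * (utilization_pct / 100.0) * minutes

# The equivalent combinations named in the CPUCreditUsage description:
print(cpu_credits_spent(1, 100, 1))  # one vCPU at 100% for one minute -> 1.0
print(cpu_credits_spent(1, 50, 2))   # one vCPU at 50% for two minutes -> 1.0
print(cpu_credits_spent(2, 25, 2))   # two vCPUs at 25% for two minutes -> 1.0
```

Each combination spends one credit, which is why the unit for both credit metrics is vCPU-minutes.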

# Metrics for Valkey and Redis OSS
<a name="CacheMetrics.Redis"></a>

The `AWS/ElastiCache` namespace includes the following Valkey and Redis OSS metrics. These metrics are the same when using the Valkey engine.

With the exception of `ReplicationLag`, `EngineCPUUtilization`, `SuccessfulWriteRequestLatency`, and `SuccessfulReadRequestLatency`, these metrics are derived from the **info** command. Each metric is calculated at the cache node level.

For complete documentation of the **info** command, see [https://valkey.io/commands/info](https://valkey.io/commands/info).

**See Also**
+ [Host-Level Metrics](CacheMetrics.HostLevel.md)

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonElastiCache/latest/dg/CacheMetrics.Redis.html)

The following are aggregations of certain kinds of commands, derived from **info commandstats**. The commandstats section provides statistics based on the command type, including the number of calls, the total CPU time consumed by these commands, and the average CPU consumed per command execution. For each command type, the following line is added: `cmdstat_XXX: calls=XXX,usec=XXX,usec_per_call=XXX`.

The following latency metrics are calculated using the commandstats statistics from [INFO](https://valkey.io/commands/info), as `delta(usec)/delta(calls)`, where `delta` is the difference within one minute. Latency is defined as the CPU time taken by ElastiCache to process the command. Note that for clusters using data tiering, the time taken to fetch items from SSD is not included in these measurements.
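
The `delta(usec)/delta(calls)` computation can be sketched as follows. The snapshot values are hypothetical commandstats samples taken one minute apart, not real output:

```python
def command_latency_usec(prev, curr):
    """Average latency over the interval between two commandstats
    snapshots, computed as delta(usec) / delta(calls)."""
    delta_calls = curr["calls"] - prev["calls"]
    delta_usec = curr["usec"] - prev["usec"]
    if delta_calls == 0:
        return 0.0  # no commands of this type ran during the interval
    return delta_usec / delta_calls

# Hypothetical cmdstat_get samples, one minute apart:
t0 = {"calls": 1000, "usec": 25000}
t1 = {"calls": 1600, "usec": 40000}
print(command_latency_usec(t0, t1))  # (40000 - 25000) / (1600 - 1000) = 25.0
```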

For a full list of available commands, see [commands](https://valkey.io/commands) in the Valkey documentation. 


| Metric  | Description  | Unit  | 
| --- | --- | --- | 
| ClusterBasedCmds | The total number of commands that are cluster-based. This is derived from the commandstats statistic by summing all of the commands that act upon a cluster (cluster slots, cluster info, and so on).  | Count | 
| ClusterBasedCmdsLatency | Latency of cluster-based commands. | Microseconds | 
| EvalBasedCmds | The total number of eval-based commands. This is derived from the commandstats statistic by summing eval and evalsha. | Count | 
| EvalBasedCmdsLatency | Latency of eval-based commands. | Microseconds | 
| GeoSpatialBasedCmds | The total number of geospatial-based commands. This is derived from the commandstats statistic by summing all of the geo type commands: geoadd, geodist, geohash, geopos, georadius, and georadiusbymember. | Count | 
| GeoSpatialBasedCmdsLatency | Latency of geospatial-based commands.  | Microseconds | 
| GetTypeCmds | The total number of read-only type commands. This is derived from the commandstats statistic by summing all of the read-only type commands (get, hget, scard, lrange, and so on). | Count | 
|  GetTypeCmdsLatency |  Latency of read commands.  | Microseconds | 
| HashBasedCmds | The total number of commands that are hash-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more hashes (hget, hkeys, hvals, hdel, and so on). | Count | 
|  HashBasedCmdsLatency |  Latency of hash-based commands.  | Microseconds | 
| HyperLogLogBasedCmds | The total number of HyperLogLog-based commands. This is derived from the commandstats statistic by summing all of the pf type commands (pfadd, pfcount, pfmerge, and so on). | Count | 
|  HyperLogLogBasedCmdsLatency |  Latency of HyperLogLog-based commands.  | Microseconds | 
| JsonBasedCmds | The total number of JSON commands, including both read and write commands. This is derived from the commandstats statistic by summing all JSON commands that act upon JSON keys. | Count | 
| JsonBasedCmdsLatency | Latency of all JSON commands, including both read and write commands. | Microseconds | 
| JsonBasedGetCmds | The total number of JSON read-only commands. This is derived from the commandstats statistic by summing all JSON read commands that act upon JSON keys. | Count | 
| JsonBasedGetCmdsLatency | Latency of JSON read-only commands. | Microseconds | 
| JsonBasedSetCmds | The total number of JSON write commands. This is derived from the commandstats statistic by summing all JSON write commands that act upon JSON keys. | Count | 
| JsonBasedSetCmdsLatency | Latency of JSON write commands. | Microseconds | 
| KeyBasedCmds | The total number of commands that are key-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more keys across multiple data structures (del, expire, rename, and so on). | Count | 
|  KeyBasedCmdsLatency |  Latency of key-based commands.  | Microseconds | 
| ListBasedCmds | The total number of commands that are list-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more lists (lindex, lrange, lpush, ltrim, and so on). | Count | 
|  ListBasedCmdsLatency |  Latency of list-based commands.  | Microseconds | 
| NonKeyTypeCmds | The total number of commands that are not key-based. This is derived from the commandstats statistic by summing all of the commands that do not act upon a key (for example, acl, dbsize, or info). | Count | 
| NonKeyTypeCmdsLatency | Latency of non-key-based commands. | Microseconds | 
| PubSubBasedCmds | The total number of commands for pub/sub functionality. This is derived from the commandstats statistic by summing all of the commands used for pub/sub functionality: psubscribe, publish, pubsub, punsubscribe, ssubscribe, sunsubscribe, spublish, subscribe, and unsubscribe. | Count | 
| PubSubBasedCmdsLatency | Latency of pub/sub-based commands. | Microseconds | 
| SetBasedCmds | The total number of commands that are set-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more sets (scard, sdiff, sadd, sunion, and so on). | Count | 
|  SetBasedCmdsLatency |  Latency of set-based commands.  | Microseconds | 
| SetTypeCmds | The total number of write types of commands. This is derived from the commandstats statistic by summing all of the mutative types of commands that operate on data (set, hset, sadd, lpop, and so on). | Count | 
|  SetTypeCmdsLatency |  Latency of write commands.  | Microseconds | 
| SortedSetBasedCmds | The total number of commands that are sorted set-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more sorted sets (zcount, zrange, zrank, zadd, and so on). | Count | 
|  SortedSetBasedCmdsLatency |  Latency of sorted set-based commands.  | Microseconds | 
| StringBasedCmds | The total number of commands that are string-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more strings (strlen, setex, setrange, and so on). | Count | 
|  StringBasedCmdsLatency |  Latency of string-based commands.  | Microseconds | 
| StreamBasedCmds | The total number of commands that are stream-based. This is derived from the commandstats statistic by summing all of the commands that act upon one or more stream data types (xrange, xlen, xadd, xdel, and so on). | Count | 
|  StreamBasedCmdsLatency |  Latency of stream-based commands.  | Microseconds | 
| SearchBasedCmds | The total number of Search commands, including both read and write commands. This is derived from the commandstats statistic by summing all Search commands. | Count | 
| SearchBasedCmdsLatency | Latency of all Search commands, including both read and write commands. | Microseconds | 
| SearchBasedGetCmds | The total number of Search read-only commands. This is derived from the commandstats statistic by summing all Search read commands. | Count | 
| SearchBasedGetCmdsLatency | Latency of Search read-only commands. | Microseconds | 
| SearchBasedSetCmds | The total number of Search write commands. This is derived from the commandstats statistic by summing all Search write commands. | Count | 
| SearchBasedSetCmdsLatency | Latency of Search write commands. | Microseconds | 

# Metrics for Memcached
<a name="CacheMetrics.Memcached"></a>

The `AWS/ElastiCache` namespace includes the following Memcached metrics, which are derived from the Memcached **stats** command. Each metric is calculated at the cache node level.

**See also**
+ [Host-Level Metrics](CacheMetrics.HostLevel.md)


| Metric  | Description  | Unit  | 
| --- | --- | --- | 
| BytesReadIntoMemcached | The number of bytes that have been read from the network by the cache node. | Bytes | 
| BytesUsedForCacheItems | The number of bytes used to store cache items. | Bytes | 
| BytesWrittenOutFromMemcached | The number of bytes that have been written to the network by the cache node. | Bytes | 
| CasBadval | The number of CAS (check-and-set) requests the cache has received where the CAS value did not match the CAS value stored.  | Count | 
| CasHits | The number of CAS requests the cache has received where the requested key was found and the CAS value matched. | Count | 
| CasMisses | The number of CAS requests the cache has received where the requested key was not found.   | Count | 
| CmdFlush | The number of flush commands the cache has received. | Count | 
| CmdGet | The number of get commands the cache has received. | Count | 
| CmdSet | The number of set commands the cache has received. | Count | 
| CurrConnections | The number of connections to the cache at a point in time. ElastiCache uses two to three of these connections to monitor the cluster. In addition, Memcached creates internal connections equal to twice the number of threads used for the node type. The thread count for the various node types can be seen in the `Nodetype Specific Parameters` section of the applicable parameter group. The total connections is the sum of client connections, the connections for monitoring, and the internal connections mentioned above.  | Count | 
| CurrItems | A count of the number of items currently stored in the cache. | Count | 
| DecrHits | The number of decrement requests the cache has received where the requested key was found. | Count | 
| DecrMisses | The number of decrement requests the cache has received where the requested key was not found. | Count | 
| DeleteHits | The number of delete requests the cache has received where the requested key was found. | Count | 
| DeleteMisses | The number of delete requests the cache has received where the requested key was not found. | Count | 
| Evictions | The number of non-expired items the cache evicted to allow space for new writes. | Count | 
| GetHits | The number of get requests the cache has received where the key requested was found. | Count | 
| GetMisses | The number of get requests the cache has received where the key requested was not found. | Count | 
| IncrHits | The number of increment requests the cache has received where the key requested was found. | Count | 
| IncrMisses | The number of increment requests the cache has received where the key requested was not found. | Count | 
| Reclaimed | The number of expired items the cache evicted to allow space for new writes. | Count | 

For Memcached 1.4.14, the following additional metrics are provided.


| Metric  | Description  | Unit  | 
| --- | --- | --- | 
| BytesUsedForHash | The number of bytes currently used by hash tables. | Bytes | 
| CmdConfigGet | The cumulative number of config get requests. | Count | 
| CmdConfigSet | The cumulative number of config set requests. | Count | 
| CmdTouch | The cumulative number of touch requests. | Count | 
| CurrConfig | The current number of configurations stored. | Count | 
| EvictedUnfetched | The number of valid items evicted from the least recently used cache (LRU) which were never touched after being set. | Count | 
| ExpiredUnfetched | The number of expired items reclaimed from the LRU which were never touched after being set. | Count | 
| SlabsMoved | The total number of slab pages that have been moved. | Count | 
| TouchHits | The number of keys that have been touched and were given a new expiration time. | Count | 
| TouchMisses | The number of items that have been touched, but were not found. | Count | 

The AWS/ElastiCache namespace includes the following calculated cache-level metrics.


| Metric  | Description  | Unit  | 
| --- | --- | --- | 
| NewConnections | The number of new connections the cache has received. This is derived from the memcached `total_connections` statistic by recording the change in `total_connections` across a period of time. This will always be at least 1, due to a connection reserved for ElastiCache. | Count | 
| NewItems | The number of new items the cache has stored. This is derived from the memcached `total_items` statistic by recording the change in `total_items` across a period of time. | Count | 
| UnusedMemory | The amount of memory not used by data. This is derived from the Memcached statistics `limit_maxbytes` and `bytes` by subtracting `bytes` from `limit_maxbytes`. Because Memcached overhead uses memory in addition to that used by data, UnusedMemory should not be considered to be the amount of memory available for additional data. You may experience evictions even though you still have some unused memory. For more detailed information, see [Memcached item memory usage](https://web.archive.org/web/20190422040715/https://www.deplication.net/2016/02/memcached-item-memory-usage/).  | Bytes | 
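
The `UnusedMemory` derivation above amounts to a single subtraction; a minimal sketch with hypothetical **stats** values:

```python
def unused_memory(limit_maxbytes, bytes_used):
    """UnusedMemory = limit_maxbytes - bytes. Because Memcached overhead
    also consumes memory, this is not the space available for new data."""
    return limit_maxbytes - bytes_used

# Hypothetical node with a 1 GiB cache limit and 768 MiB of stored data:
print(unused_memory(1073741824, 805306368))  # 268435456 bytes (256 MiB) unused
```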

# Which Metrics Should I Monitor?
<a name="CacheMetrics.WhichShouldIMonitor"></a>

The following CloudWatch metrics offer good insight into ElastiCache performance. In most cases, we recommend that you set CloudWatch alarms for these metrics so that you can take corrective action before performance issues occur.

**Topics**
+ [CPUUtilization](#metrics-cpu-utilization)
+ [EngineCPUUtilization](#metrics-engine-cpu-utilization)
+ [SwapUsage (Valkey and Redis OSS)](#metrics-swap-usage)
+ [Evictions](#metrics-evictions)
+ [CurrConnections](#metrics-curr-connections)
+ [Memory (Valkey and Redis OSS)](#metrics-memory)
+ [Network](#metrics-network)
+ [Latency](#metrics-latency)
+ [Replication](#metrics-replication)
+ [Traffic Management (Valkey and Redis OSS)](#traffic-management)

## CPUUtilization
<a name="metrics-cpu-utilization"></a>

This is a host-level metric reported as a percentage. For more information, see [Host-Level Metrics](CacheMetrics.HostLevel.md).

**Valkey and Redis OSS**

For smaller node types with two vCPUs or fewer, use the `CPUUtilization` metric to monitor your workload.

Generally speaking, we suggest you set your threshold at 90% of your available CPU. Because Valkey and Redis OSS are both single-threaded, the actual threshold value should be calculated as a fraction of the node's total capacity. For example, suppose you are using a node type that has two cores. In this case, the threshold for CPUUtilization would be 90/2, or 45%. 

You will need to determine your own threshold, based on the number of cores in the cache node that you are using. If you exceed this threshold, and your main workload is from read requests, scale your cluster out by adding read replicas. If the main workload is from write requests, depending on your cluster configuration, we recommend that you:
+ **Valkey or Redis OSS (cluster mode disabled) clusters:** scale up by using a larger cache instance type.
+ **Valkey or Redis OSS (cluster mode enabled) clusters:** add more shards to distribute the write workload across more primary nodes.
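
The per-core threshold calculation described above can be sketched as follows. The 90% target is this section's suggestion, and the vCPU counts are examples:

```python
def cpu_utilization_threshold(vcpus, single_core_target_pct=90):
    """Because Valkey and Redis OSS are single-threaded, a host-level
    CPUUtilization alarm threshold is the per-core target divided by
    the number of vCPUs on the node."""
    return single_core_target_pct / vcpus

print(cpu_utilization_threshold(2))  # two-core node  -> 45.0 percent
print(cpu_utilization_threshold(4))  # four-core node -> 22.5 percent
```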

**Tip**  
Instead of using the Host-Level metric `CPUUtilization`, Valkey and Redis OSS users might be able to use the metric `EngineCPUUtilization`, which reports the percentage of usage on the Valkey or Redis OSS engine core. To see if this metric is available on your nodes and for more information, see [Metrics for Valkey and Redis OSS](CacheMetrics.Redis.md).


**Memcached**

Because Memcached is multi-threaded, this metric can be as high as 90%. If you exceed this threshold, scale your cluster up by using a larger cache node type or scale out by adding more cache nodes.

## EngineCPUUtilization
<a name="metrics-engine-cpu-utilization"></a>

For larger node types with four or more vCPUs, you may want to use the `EngineCPUUtilization` metric, which reports the percentage of usage on the Valkey or Redis OSS engine core. To see if this metric is available on your nodes and for more information, see [Metrics for Valkey and Redis OSS](CacheMetrics.Redis.md).

For more information, see the **CPUs** section at [Monitoring best practices with Amazon ElastiCache for Redis OSS using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/).

## SwapUsage (Valkey and Redis OSS)
<a name="metrics-swap-usage"></a>

This is a host-level metric reported in bytes. For more information, see [Host-Level Metrics](CacheMetrics.HostLevel.md).

A `FreeableMemory` CloudWatch metric close to 0 (that is, below 100 MB), or a `SwapUsage` metric greater than the `FreeableMemory` metric, indicates that a node is under memory pressure. If this happens, see the following topics:
+ [Ensuring you have enough memory to make a Valkey or Redis OSS snapshot](BestPractices.BGSAVE.md)
+ [Managing reserved memory for Valkey and Redis OSS](redis-memory-management.md)
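
The memory-pressure condition described above can be expressed as a small check. The 100 MB floor comes from this section's guidance; the sample values are hypothetical:

```python
def under_memory_pressure(freeable_bytes, swap_bytes,
                          floor_bytes=100 * 1024 * 1024):
    """A node is under memory pressure when FreeableMemory is close to
    zero (below ~100 MB) or SwapUsage exceeds FreeableMemory."""
    return freeable_bytes < floor_bytes or swap_bytes > freeable_bytes

print(under_memory_pressure(50 * 1024 * 1024, 0))    # True: under 100 MB freeable
print(under_memory_pressure(500 * 1024 * 1024, 0))   # False: healthy
print(under_memory_pressure(200 * 1024 * 1024, 300 * 1024 * 1024))  # True: swap > freeable
```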

## Evictions
<a name="metrics-evictions"></a>

This is a cache engine metric. We recommend that you determine your own alarm threshold for this metric based on your application needs.

If you are using Memcached and exceed your chosen threshold, scale your cluster up by using a larger node type or scale out by adding more nodes.

## CurrConnections
<a name="metrics-curr-connections"></a>

This is a cache engine metric. We recommend that you determine your own alarm threshold for this metric based on your application needs.

An increasing number of `CurrConnections` might indicate a problem with your application; you will need to investigate the application behavior to address this issue.

For more information, see the **Connections** section at [Monitoring best practices with Amazon ElastiCache for Redis OSS using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/).

## Memory (Valkey and Redis OSS)
<a name="metrics-memory"></a>

Memory is a core aspect of Valkey and Redis OSS. Understanding the memory utilization of your cluster is necessary to avoid data loss and accommodate future growth of your dataset. Statistics about the memory utilization of a node are available in the memory section of the [INFO](https://valkey.io/commands/info) command.

For more information, see the **Memory** section at [Monitoring best practices with Amazon ElastiCache for Redis OSS using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/).

## Network
<a name="metrics-network"></a>

One of the determining factors for the network bandwidth capacity of your cluster is the node type you have selected. For more information about the network capacity of your node, see [Amazon ElastiCache pricing](https://aws.amazon.com/elasticache/pricing/).

For more information, see the **Network** section at [Monitoring best practices with Amazon ElastiCache for Redis OSS using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/).

## Latency
<a name="metrics-latency"></a>

Measuring response time for an ElastiCache for Valkey instance can be approached in various ways, depending on the level of granularity required. The key stages that contribute to the overall server-side response time are command pre-processing, command execution, and command post-processing.

Command-specific latency metrics derived from the Valkey [INFO](https://valkey.io/commands/info) command, such as `GetTypeCmdsLatency` and `SetTypeCmdsLatency`, focus specifically on executing the core command logic for the Valkey command. These metrics are helpful if you want to determine the command execution time or aggregated latencies per data structure.

The latency metrics `SuccessfulWriteRequestLatency` and `SuccessfulReadRequestLatency` measure the total time that the ElastiCache for Valkey engine takes to respond to a request.

**Note**  
Inflated values for `SuccessfulWriteRequestLatency` and `SuccessfulReadRequestLatency` metrics can occur when using Valkey pipelining with CLIENT REPLY enabled on the Valkey client. Valkey pipelining is a technique for improving performance by issuing multiple commands at once, without waiting for the response to each individual command. To avoid inflated values, we recommend configuring your Valkey client to pipeline commands with [CLIENT REPLY OFF](https://valkey.io/commands/client-reply/).

For more information, see the **Latency** section at [Monitoring best practices with Amazon ElastiCache using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/).

## Replication
<a name="metrics-replication"></a>

The volume of data being replicated is visible via the `ReplicationBytes` metric. Although this metric is representative of the write load on the replication group, it doesn't provide insights into replication health. For this purpose, you can use the `ReplicationLag` metric. 

For more information, see the **Replication** section at [Monitoring best practices with Amazon ElastiCache for Redis OSS using Amazon CloudWatch](https://aws.amazon.com/blogs/database/monitoring-best-practices-with-amazon-elasticache-for-redis-using-amazon-cloudwatch/).

## Traffic Management (Valkey and Redis OSS)
<a name="traffic-management"></a>

ElastiCache automatically manages traffic against a node when more incoming commands are sent to the node than Valkey or Redis OSS can process. This is done to maintain optimal operation and stability of the engine.

When traffic is actively managed on a node, the `TrafficManagementActive` metric emits data points of 1. This indicates that the node may be underscaled for the workload. If this metric remains 1 for long periods of time, evaluate the cluster to decide whether scaling up or scaling out is necessary.

For more information, see the `TrafficManagementActive` metric on the [Metrics for Valkey and Redis OSS](CacheMetrics.Redis.md) page.

# Choosing Metric Statistics and Periods
<a name="CacheMetrics.ChoosingStatisticsAndPeriods"></a>

Although CloudWatch allows you to choose any statistic and period for each metric, not all combinations are useful. For example, the Average, Minimum, and Maximum statistics for CPUUtilization are useful, but the Sum statistic is not.

All ElastiCache samples are published at 60-second intervals for each individual cache node. For any 60-second period, a cache node metric contains only a single sample.
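
Because each node contributes one sample per 60 seconds, the number of samples a statistic aggregates follows directly from the period. A quick sketch (the function name is illustrative):

```python
def samples_per_period(period_seconds, nodes=1):
    """ElastiCache publishes one sample per cache node every 60 seconds,
    so a statistic over this period aggregates period/60 samples per node."""
    return (period_seconds // 60) * nodes

print(samples_per_period(60))       # 1 sample: Minimum, Maximum, and Average are equal
print(samples_per_period(3600))     # 60 samples per node over one hour
print(samples_per_period(3600, 3))  # 180 samples across a three-node cluster
```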

For further information on how to retrieve metrics for your cache nodes, see [Monitoring CloudWatch Cluster and Node Metrics](CloudWatchMetrics.md).

# Monitoring CloudWatch Cluster and Node Metrics
<a name="CloudWatchMetrics"></a>

ElastiCache and CloudWatch are integrated so you can gather a variety of metrics. You can monitor these metrics using CloudWatch. 

**Note**  
The following examples require the CloudWatch command line tools. For more information about CloudWatch and to download the developer tools, see the [CloudWatch product page](https://aws.amazon.com/cloudwatch).

The following procedures show you how to use CloudWatch to gather CPU utilization statistics for a cluster for the past hour.

**Note**  
The `StartTime` and `EndTime` values supplied in the examples below are for illustrative purposes. You must substitute appropriate start and end time values for your cache nodes.

For information on ElastiCache limits, see [AWS Service Limits](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_elasticache) for ElastiCache.

## Monitoring CloudWatch Cluster and Node Metrics (Console)
<a name="CloudWatchMetrics.CON"></a>

 **To gather CPU utilization statistics for a cache cluster** 

1. Sign in to the AWS Management Console and open the ElastiCache console at [https://console.aws.amazon.com/elasticache/](https://console.aws.amazon.com/elasticache/).

1. Select the cache nodes you want to view metrics for. 
**Note**  
Selecting more than 20 nodes disables viewing metrics on the console.

   1. On the **Cache Clusters** page of the AWS Management Console, click the name of one or more clusters.

      The detail page for the cluster appears. 

   1. Click the **Nodes** tab at the top of the window.

   1. On the **Nodes** tab of the detail window, select the cache nodes that you want to view metrics for.

      A list of available CloudWatch Metrics appears at the bottom of the console window. 

   1. Click on the **CPU Utilization** metric. 

      The CloudWatch console will open, displaying your selected metrics. You can use the **Statistic** and **Period** drop-down list boxes and **Time Range** tab to change the metrics being displayed. 

## Monitoring CloudWatch Cluster and Node Metrics using the CloudWatch CLI
<a name="CloudWatchMetrics.CLI"></a>

 **To gather CPU utilization statistics for a cache cluster** 
+ For Linux, macOS, or Unix:

  ```
  aws cloudwatch get-metric-statistics \
      --namespace AWS/ElastiCache \
      --metric-name CPUUtilization \
      --dimensions='[{"Name":"CacheClusterId","Value":"test"},{"Name":"CacheNodeId","Value":"0001"}]' \
      --statistics=Average \
      --start-time 2018-07-05T00:00:00 \
      --end-time 2018-07-06T00:00:00 \
      --period=3600
  ```

  For Windows:

  ```
  aws cloudwatch get-metric-statistics ^
      --namespace AWS/ElastiCache ^
      --metric-name CPUUtilization ^
      --dimensions='[{"Name":"CacheClusterId","Value":"test"},{"Name":"CacheNodeId","Value":"0001"}]' ^
      --statistics=Average ^
      --start-time 2018-07-05T00:00:00 ^
      --end-time 2018-07-06T00:00:00 ^
      --period=3600
  ```

## Monitoring CloudWatch Cluster and Node Metrics using the CloudWatch API
<a name="CloudWatchMetrics.API"></a>

 **To gather CPU utilization statistics for a cache cluster** 
+ Call the CloudWatch API `GetMetricStatistics` with the following parameters (note that the start and end times are shown as examples only; you will need to substitute your own appropriate start and end times):
  + `Statistics.member.1=Average`
  + `Namespace=AWS/ElastiCache`
  + `StartTime=2018-07-05T00:00:00`
  + `EndTime=2018-07-06T00:00:00`
  + `Period=60`
  + `MeasureName=CPUUtilization`
  + `Dimensions=CacheClusterId=mycachecluster,CacheNodeId=0002`  
**Example**  

  ```
  http://monitoring.amazonaws.com/
      ?Action=GetMetricStatistics
      &SignatureVersion=4
      &Version=2014-12-01
      &StartTime=2018-07-05T00:00:00
      &EndTime=2018-07-06T23:59:00
      &Period=3600
      &Statistics.member.1=Average
      &Dimensions.member.1="CacheClusterId=mycachecluster"
      &Dimensions.member.2="CacheNodeId=0002"
      &Namespace=AWS/ElastiCache
      &MeasureName=CPUUtilization
      &Timestamp=2018-07-07T17%3A48%3A21.746Z
      &AWSAccessKeyId=<AWS Access Key ID>
      &Signature=<Signature>
  ```