

# ElastiCache best practices and caching strategies
<a name="BestPractices"></a>

Below you can find recommended best practices for Amazon ElastiCache. Following these practices improves your cache's performance and reliability. 

**Topics**
+ [Overall best practices](WorkingWithRedis.md)
+ [Best Practices for using Read Replicas](ReadReplicas.md)
+ [Supported and restricted Valkey, Memcached, and Redis OSS commands](SupportedCommands.md)
+ [Valkey and Redis OSS configuration and limits](RedisConfiguration.md)
+ [IPv6 client examples for Valkey, Memcached, and Redis OSS](network-type-best-practices.md)
+ [Best practices for clients (Valkey and Redis OSS)](BestPractices.Clients.redis.md)
+ [Best practices for clients (Memcached)](BestPractices.Clients.memcached.md)
+ [TLS enabled dual stack ElastiCache clusters](#network-type-configuring-tls-enabled-dual-stack)
+ [Managing reserved memory for Valkey and Redis OSS](redis-memory-management.md)
+ [Best practices when working with Valkey and Redis OSS node-based clusters](BestPractices.SelfDesigned.md)
+ [Caching database query results](caching-database-query-results.md)
+ [Caching strategies for Memcached](Strategies.md)

# Overall best practices
<a name="WorkingWithRedis"></a>

Below you can find information about best practices for using the Valkey, Memcached, and Redis OSS interfaces within ElastiCache.
+ **Use cluster-mode enabled configurations** – Cluster-mode enabled allows the cache to scale horizontally to achieve higher storage and throughput than a cluster-mode disabled configuration. ElastiCache serverless is only available in a cluster-mode enabled configuration.
+ **Use long-lived connections** – Creating a new connection is expensive, and takes time and CPU resources from the cache. Reuse connections when possible (e.g. with connection pooling) to amortize this cost over many commands.
+ **Read from replicas** – If you are using ElastiCache serverless or have provisioned read replicas (node-based clusters), direct reads to replicas to achieve better scalability and/or lower latency. Reads from replicas are eventually consistent with the primary.

  In a node-based cluster, avoid directing read requests to a single read replica, because reads may be temporarily unavailable if that node fails. Either configure your client to direct read requests to at least two read replicas, or direct reads to a single replica and the primary.

  In ElastiCache serverless, reading from the replica port (6380) will direct reads to the client's local availability zone when possible, reducing retrieval latency. It will automatically fall back to the other nodes during failures.
+ **Avoid expensive commands** – Avoid running any computationally and I/O intensive operations, such as the `KEYS` and `SMEMBERS` commands. We suggest this approach because these operations increase the load on the cluster and have an impact on the performance of the cluster. Instead, use the `SCAN` and `SSCAN` commands.
+ **Follow Lua best practices** – Avoid long-running Lua scripts, and always declare the keys used in Lua scripts up front. Declaring keys up front verifies that the script is not using cross-slot commands. Ensure that all keys used in a Lua script belong to the same slot.
+ **Use sharded pub/sub** – When using Valkey or Redis OSS to support pub/sub workloads with high throughput, we recommend you use [sharded pub/sub](https://valkey.io/topics/pubsub/) (available with Valkey, and with Redis OSS 7 or later). Traditional pub/sub in cluster-mode enabled clusters broadcasts messages to all nodes in the cluster, which can result in high `EngineCPUUtilization`. Note that in ElastiCache serverless, traditional pub/sub commands internally use sharded pub/sub commands.
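The same-slot rule for Lua scripts follows from how cluster-mode caches assign keys to slots: the slot is the CRC16 of the key modulo 16384, and a non-empty `{hash tag}` restricts hashing to just the tag, so keys that share a tag always land in the same slot. The following standalone sketch illustrates this mapping; the class and method names are illustrative, not part of any ElastiCache or Valkey client API.

```java
// Illustrative sketch of the Valkey / Redis OSS key-to-slot mapping.
public class HashSlot {

    // CRC16-CCITT (XMODEM), the checksum used for cluster key slots.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // If the key contains a non-empty {hash tag}, only the tag is hashed,
    // so keys sharing a tag map to the same slot.
    static int hashSlot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(java.nio.charset.StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Same hash tag => same slot => safe to use together in one Lua script.
        System.out.println(hashSlot("{user:1000}.following"));
        System.out.println(hashSlot("{user:1000}.followers"));
    }
}
```

Because both keys hash only the `user:1000` tag, a Lua script can safely operate on both in a cluster-mode enabled cache.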

# Best Practices for using Read Replicas
<a name="ReadReplicas"></a>

Many applications, such as session stores, leaderboards, and recommendation engines, require high availability and handle significantly more read operations than write operations. These applications can often tolerate slightly stale data (eventual consistency), meaning that it's acceptable if different users momentarily see slightly different versions of the same data. For example:
+ Cached query results can often tolerate slightly stale data, especially for cache-aside patterns where the source of truth is external.
+ In a gaming leaderboard, a few seconds delay in updated scores often won't significantly impact the user experience.
+ For session stores, some slight delays in propagating session data across replicas rarely affect application functionality.
+ Recommendation engines typically use historical data analysis, so real-time consistency is less critical.

Eventual consistency means that all replica nodes will eventually return the same data once the replication process is complete, typically within milliseconds. For such use cases, implementing read replicas is an effective strategy to reduce latency when reading from your ElastiCache instance.
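The cache-aside pattern mentioned above can be sketched as follows. This is a minimal illustration of the read and invalidation paths, using a `ConcurrentHashMap` as a stand-in for an ElastiCache client; all class and method names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch. The map stands in for an ElastiCache client.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database; // source of truth

    public CacheAside(Function<String, String> database) {
        this.database = database;
    }

    // Read path: serve from the cache when possible; on a miss, load from
    // the database and populate the cache. A replica serving slightly stale
    // data only affects the cached copy; the database stays authoritative.
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = database.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write path: after updating the source of truth, invalidate the cached
    // copy so the next read repopulates it.
    public void invalidate(String key) {
        cache.remove(key);
    }
}
```

Because the source of truth is external, a momentarily stale cache entry is corrected on the next invalidation-and-reload cycle.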

Using read replicas in Amazon ElastiCache can provide significant performance benefits through:

**Enhanced Read Scalability**
+ Distributes read operations across multiple replica nodes
+ Offloads read traffic from the primary node
+ Reduces read latency by serving requests from geographically closer replicas

**Optimized Primary Node Performance**
+ Dedicates primary node resources to write operations
+ Reduces connection overhead on the primary node
+ Improves write performance and maintains better response times during peak traffic periods

## Using Read from Replica in ElastiCache Serverless
<a name="ReadReplicas.serverless"></a>

ElastiCache serverless provides two endpoints for different consistency requirements. The two endpoints use the same DNS name but different ports. To use the read-from-replica port, you must authorize access to both ports from your client application by [configuring the security groups and network access control lists of your VPC](set-up.md#elasticache-install-grant-access-VPN).

**Primary endpoint (Port 6379)**
+ Use for operations requiring strong consistency
+ Guarantees reading the most up-to-date data
+ Best for critical transactions and write operations
+ Necessary for write operations
+ Example: `test-12345.serverless.use1.cache.amazonaws.com:6379`

**Read-optimized endpoint (Port 6380)**
+ Optimized for read operations that can tolerate eventual consistency
+ When possible, ElastiCache serverless automatically routes read requests to a replica node in the client's local Availability Zone. This optimization provides lower latency by avoiding the additional network latency incurred when retrieving data from a node in a different availability zone.
+ ElastiCache serverless automatically selects available nodes in other zones if a local node is unavailable
+ Example: `test-12345.serverless.use1.cache.amazonaws.com:6380`
+ Clients like Glide and Lettuce automatically detect and route reads to the latency-optimized endpoint if you provide the read-from-replica configuration. If your client doesn't support routing configuration (for example, valkey-java and older Jedis versions), you must specify the correct port and client configuration to read from replicas.

## Connecting to read replicas in ElastiCache Serverless - Valkey and Glide
<a name="ReadReplicas.connecting-primary"></a>

The following code snippet shows how to configure read from replica for ElastiCache serverless with the Valkey GLIDE library. You don't need to specify the read-from-replica port; instead, set the routing configuration to `ReadFrom.PREFER_REPLICA`.

```java
package glide.examples;

import glide.api.GlideClusterClient;
import glide.api.logging.Logger;
import glide.api.models.configuration.GlideClusterClientConfiguration;
import glide.api.models.configuration.NodeAddress;
import glide.api.models.exceptions.ClosingException;
import glide.api.models.exceptions.ConnectionException;
import glide.api.models.exceptions.TimeoutException;
import glide.api.models.configuration.ReadFrom;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class ClusterExample {

    public static void main(String[] args) {
        // Set logger configuration
        Logger.setLoggerConfig(Logger.Level.INFO);

        GlideClusterClient client = null;

        try {
            System.out.println("Connecting to Valkey Glide...");

            // Configure the Glide Client
            GlideClusterClientConfiguration config = GlideClusterClientConfiguration.builder()
                .address(NodeAddress.builder()
                    .host("your-endpoint")
                    .port(6379)
                    .build())
                .useTLS(true)
                .readFrom(ReadFrom.PREFER_REPLICA)
                .build();

            // Create the GlideClusterClient
            client = GlideClusterClient.createClient(config).get();
            System.out.println("Connected successfully.");

            // Perform SET operation
            CompletableFuture<String> setResponse = client.set("key", "value");
            System.out.println("Set key 'key' to 'value': " + setResponse.get());

            // Perform GET operation
            CompletableFuture<String> getResponse = client.get("key");
            System.out.println("Get response for 'key': " + getResponse.get());

            // Perform PING operation
            CompletableFuture<String> pingResponse = client.ping();
            System.out.println("PING response: " + pingResponse.get());

        } catch (ClosingException | ConnectionException | TimeoutException | ExecutionException e) {
            System.err.println("An exception occurred: ");
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Close the client connection
            if (client != null) {
                try {
                    client.close();
                    System.out.println("Client connection closed.");
                } catch (ClosingException | ExecutionException e) {
                    System.err.println("Error closing client: " + e.getMessage());
                }
            }
        }
    }
}
```

# Supported and restricted Valkey, Memcached, and Redis OSS commands
<a name="SupportedCommands"></a>

## Supported Valkey and Redis OSS commands
<a name="SupportedCommandsRedis"></a>

The following Valkey and Redis OSS commands are supported by serverless caches. In addition to these commands, the [JSON commands](json-list-commands.md) are also supported.

For information on Bloom filter commands, see [Bloom filter commands](BloomFilters.md#SupportedCommandsBloom).

**Bitmap Commands**
+ `BITCOUNT`

  Counts the number of set bits (population counting) in a string.

  [Learn more](https://valkey.io/commands/bitcount/)
+ `BITFIELD`

  Performs arbitrary bitfield integer operations on strings.

  [Learn more](https://valkey.io/commands/bitfield/)
+ `BITFIELD_RO`

  Performs arbitrary read-only bitfield integer operations on strings.

  [Learn more](https://valkey.io/commands/bitfield_ro/)
+ `BITOP`

  Performs bitwise operations on multiple strings, and stores the result.

  [Learn more](https://valkey.io/commands/bitop/)
+ `BITPOS`

  Finds the first set (1) or clear (0) bit in a string.

  [Learn more](https://valkey.io/commands/bitpos/)
+ `GETBIT`

  Returns a bit value by offset.

  [Learn more](https://valkey.io/commands/getbit/)
+ `SETBIT`

  Sets or clears the bit at offset of the string value. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/setbit/)

**Cluster Management Commands**
+ `CLUSTER COUNTKEYSINSLOT`

  Returns the number of keys in a hash slot.

  [Learn more](https://valkey.io/commands/cluster-countkeysinslot/)
+ `CLUSTER GETKEYSINSLOT`

  Returns the key names in a hash slot.

  [Learn more](https://valkey.io/commands/cluster-getkeysinslot/)
+ `CLUSTER INFO`

  Returns information about the state of a node. In a serverless cache, returns state about the single virtual “shard” exposed to the client.

  [Learn more](https://valkey.io/commands/cluster-info/)
+ `CLUSTER KEYSLOT`

  Returns the hash slot for a key.

  [Learn more](https://valkey.io/commands/cluster-keyslot/)
+ `CLUSTER MYID`

  Returns the ID of a node. In a serverless cache, returns state about the single virtual “shard” exposed to the client. 

  [Learn more](https://valkey.io/commands/cluster-myid/)
+ `CLUSTER NODES`

  Returns the cluster configuration for a node. In a serverless cache, returns state about the single virtual “shard” exposed to the client. 

  [Learn more](https://valkey.io/commands/cluster-nodes/)
+ `CLUSTER REPLICAS`

  Lists the replica nodes of a master node. In a serverless cache, returns state about the single virtual “shard” exposed to the client. 

  [Learn more](https://valkey.io/commands/cluster-replicas/)
+ `CLUSTER SHARDS`

  Returns the mapping of cluster slots to shards. In a serverless cache, returns state about the single virtual “shard” exposed to the client. 

  [Learn more](https://valkey.io/commands/cluster-shards/)
+ `CLUSTER SLOTS`

  Returns the mapping of cluster slots to nodes. In a serverless cache, returns state about the single virtual “shard” exposed to the client. 

  [Learn more](https://valkey.io/commands/cluster-slots/)
+ `CLUSTER SLOT-STATS`

  Allows tracking of per slot metrics for key count, CPU utilization, network bytes in, and network bytes out. 

  [Learn more](https://valkey.io/commands/cluster-slot-stats/)
+ `READONLY`

  Enables read-only queries for a connection to a Valkey or Redis OSS Cluster replica node.

  [Learn more](https://valkey.io/commands/readonly/)
+ `READWRITE`

  Enables read-write queries for a connection to a Valkey or Redis OSS Cluster replica node.

  [Learn more](https://valkey.io/commands/readwrite/)
+ `SCRIPT SHOW`

  Returns the original source code of a script in the script cache.

  [Learn more](https://valkey.io/commands/script-show/)

**Connection Management Commands**
+ `AUTH`

  Authenticates the connection.

  [Learn more](https://valkey.io/commands/auth/)
+ `CLIENT GETNAME`

  Returns the name of the connection.

  [Learn more](https://valkey.io/commands/client-getname/)
+ `CLIENT REPLY`

  Instructs the server whether to reply to commands.

  [Learn more](https://valkey.io/commands/client-reply/)
+ `CLIENT SETNAME`

  Sets the connection name.

  [Learn more](https://valkey.io/commands/client-setname/)
+ `ECHO`

  Returns the given string.

  [Learn more](https://valkey.io/commands/echo/)
+ `HELLO`

  Handshakes with the Valkey or Redis OSS server.

  [Learn more](https://valkey.io/commands/hello/)
+ `PING`

  Returns the server's liveliness response.

  [Learn more](https://valkey.io/commands/ping/)
+ `QUIT`

  Closes the connection.

  [Learn more](https://valkey.io/commands/quit/)
+ `RESET`

  Resets the connection.

  [Learn more](https://valkey.io/commands/reset/)
+ `SELECT`

  Changes the selected database.

  [Learn more](https://valkey.io/commands/select/)

**Generic Commands**
+ `COPY`

  Copies the value of a key to a new key.

  [Learn more](https://valkey.io/commands/copy/)
+ `DEL`

  Deletes one or more keys.

  [Learn more](https://valkey.io/commands/del/)
+ `DUMP`

  Returns a serialized representation of the value stored at a key.

  [Learn more](https://valkey.io/commands/dump/)
+ `EXISTS`

  Determines whether one or more keys exist.

  [Learn more](https://valkey.io/commands/exists/)
+ `EXPIRE`

  Sets the expiration time of a key in seconds.

  [Learn more](https://valkey.io/commands/expire/)
+ `EXPIREAT`

  Sets the expiration time of a key to a Unix timestamp.

  [Learn more](https://valkey.io/commands/expireat/)
+ `EXPIRETIME`

  Returns the expiration time of a key as a Unix timestamp.

  [Learn more](https://valkey.io/commands/expiretime/)
+ `PERSIST`

  Removes the expiration time of a key.

  [Learn more](https://valkey.io/commands/persist/)
+ `PEXPIRE`

  Sets the expiration time of a key in milliseconds.

  [Learn more](https://valkey.io/commands/pexpire/)
+ `PEXPIREAT`

  Sets the expiration time of a key to a Unix milliseconds timestamp.

  [Learn more](https://valkey.io/commands/pexpireat/)
+ `PEXPIRETIME`

  Returns the expiration time of a key as a Unix milliseconds timestamp.

  [Learn more](https://valkey.io/commands/pexpiretime/)
+ `PTTL`

  Returns the expiration time in milliseconds of a key.

  [Learn more](https://valkey.io/commands/pttl/)
+ `RANDOMKEY`

  Returns a random key name from the database.

  [Learn more](https://valkey.io/commands/randomkey/)
+ `RENAME`

  Renames a key and overwrites the destination.

  [Learn more](https://valkey.io/commands/rename/)
+ `RENAMENX`

  Renames a key only when the target key name doesn't exist.

  [Learn more](https://valkey.io/commands/renamenx/)
+ `RESTORE`

  Creates a key from the serialized representation of a value.

  [Learn more](https://valkey.io/commands/restore/)
+ `SCAN`

  Iterates over the key names in the database.

  [Learn more](https://valkey.io/commands/scan/)
+ `SORT`

  Sorts the elements in a list, a set, or a sorted set, optionally storing the result.

  [Learn more](https://valkey.io/commands/sort/)
+ `SORT_RO`

  Returns the sorted elements of a list, a set, or a sorted set.

  [Learn more](https://valkey.io/commands/sort_ro/)
+ `TOUCH`

  Returns the number of existing keys out of those specified after updating the time they were last accessed.

  [Learn more](https://valkey.io/commands/touch/)
+ `TTL`

  Returns the expiration time in seconds of a key.

  [Learn more](https://valkey.io/commands/ttl/)
+ `TYPE`

  Determines the type of value stored at a key.

  [Learn more](https://valkey.io/commands/type/)
+ `UNLINK`

  Asynchronously deletes one or more keys.

  [Learn more](https://valkey.io/commands/unlink/)

**Geospatial Commands**
+ `GEOADD`

  Adds one or more members to a geospatial index. The key is created if it doesn't exist.

  [Learn more](https://valkey.io/commands/geoadd/)
+ `GEODIST`

  Returns the distance between two members of a geospatial index.

  [Learn more](https://valkey.io/commands/geodist/)
+ `GEOHASH`

  Returns members from a geospatial index as geohash strings.

  [Learn more](https://valkey.io/commands/geohash/)
+ `GEOPOS`

  Returns the longitude and latitude of members from a geospatial index.

  [Learn more](https://valkey.io/commands/geopos/)
+ `GEORADIUS`

  Queries a geospatial index for members within a distance from a coordinate, optionally stores the result.

  [Learn more](https://valkey.io/commands/georadius/)
+ `GEORADIUS_RO`

  Returns members from a geospatial index that are within a distance from a coordinate.

  [Learn more](https://valkey.io/commands/georadius_ro/)
+ `GEORADIUSBYMEMBER`

  Queries a geospatial index for members within a distance from a member, optionally stores the result.

  [Learn more](https://valkey.io/commands/georadiusbymember/)
+ `GEORADIUSBYMEMBER_RO`

  Returns members from a geospatial index that are within a distance from a member.

  [Learn more](https://valkey.io/commands/georadiusbymember_ro/)
+ `GEOSEARCH`

  Queries a geospatial index for members inside an area of a box or a circle.

  [Learn more](https://valkey.io/commands/geosearch/)
+ `GEOSEARCHSTORE`

  Queries a geospatial index for members inside an area of a box or a circle, optionally stores the result.

  [Learn more](https://valkey.io/commands/geosearchstore/)

**Hash Commands**
+ `HDEL`

  Deletes one or more fields and their values from a hash. Deletes the hash if no fields remain.

  [Learn more](https://valkey.io/commands/hdel/)
+ `HEXISTS`

  Determines whether a field exists in a hash.

  [Learn more](https://valkey.io/commands/hexists/)
+ `HGET`

  Returns the value of a field in a hash.

  [Learn more](https://valkey.io/commands/hget/)
+ `HGETALL`

  Returns all fields and values in a hash.

  [Learn more](https://valkey.io/commands/hgetall/)
+ `HINCRBY`

  Increments the integer value of a field in a hash by a number. Uses 0 as initial value if the field doesn't exist.

  [Learn more](https://valkey.io/commands/hincrby/)
+ `HINCRBYFLOAT`

  Increments the floating point value of a field by a number. Uses 0 as initial value if the field doesn't exist.

  [Learn more](https://valkey.io/commands/hincrbyfloat/)
+ `HKEYS`

  Returns all fields in a hash.

  [Learn more](https://valkey.io/commands/hkeys/)
+ `HLEN`

  Returns the number of fields in a hash.

  [Learn more](https://valkey.io/commands/hlen/)
+ `HMGET`

  Returns the values of all fields in a hash.

  [Learn more](https://valkey.io/commands/hmget/)
+ `HMSET`

  Sets the values of multiple fields.

  [Learn more](https://valkey.io/commands/hmset/)
+ `HRANDFIELD`

  Returns one or more random fields from a hash.

  [Learn more](https://valkey.io/commands/hrandfield/)
+ `HSCAN`

  Iterates over fields and values of a hash.

  [Learn more](https://valkey.io/commands/hscan/)
+ `HSET`

  Creates or modifies the value of a field in a hash.

  [Learn more](https://valkey.io/commands/hset/)
+ `HSETNX`

  Sets the value of a field in a hash only when the field doesn't exist.

  [Learn more](https://valkey.io/commands/hsetnx/)
+ `HSTRLEN`

  Returns the length of the value of a field.

  [Learn more](https://valkey.io/commands/hstrlen/)
+ `HVALS`

  Returns all values in a hash.

  [Learn more](https://valkey.io/commands/hvals/)

**HyperLogLog Commands**
+ `PFADD`

  Adds elements to a HyperLogLog key. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/pfadd/)
+ `PFCOUNT`

  Returns the approximated cardinality of the set(s) observed by the HyperLogLog key(s).

  [Learn more](https://valkey.io/commands/pfcount/)
+ `PFMERGE`

  Merges one or more HyperLogLog values into a single key.

  [Learn more](https://valkey.io/commands/pfmerge/)

**List Commands**
+ `BLMOVE`

  Pops an element from a list, pushes it to another list and returns it. Blocks until an element is available otherwise. Deletes the list if the last element was moved.

  [Learn more](https://valkey.io/commands/blmove/)
+ `BLMPOP`

  Pops the first element from one of multiple lists. Blocks until an element is available otherwise. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/blmpop/)
+ `BLPOP`

  Removes and returns the first element in a list. Blocks until an element is available otherwise. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/blpop/)
+ `BRPOP`

  Removes and returns the last element in a list. Blocks until an element is available otherwise. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/brpop/)
+ `BRPOPLPUSH`

  Pops an element from a list, pushes it to another list and returns it. Blocks until an element is available otherwise. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/brpoplpush/)
+ `LINDEX`

  Returns an element from a list by its index.

  [Learn more](https://valkey.io/commands/lindex/)
+ `LINSERT`

  Inserts an element before or after another element in a list.

  [Learn more](https://valkey.io/commands/linsert/)
+ `LLEN`

  Returns the length of a list.

  [Learn more](https://valkey.io/commands/llen/)
+ `LMOVE`

  Returns an element after popping it from one list and pushing it to another. Deletes the list if the last element was moved.

  [Learn more](https://valkey.io/commands/lmove/)
+ `LMPOP`

  Returns multiple elements from a list after removing them. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/lmpop/)
+ `LPOP`

  Returns the first element of a list after removing it. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/lpop/)
+ `LPOS`

  Returns the index of matching elements in a list.

  [Learn more](https://valkey.io/commands/lpos/)
+ `LPUSH`

  Prepends one or more elements to a list. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/lpush/)
+ `LPUSHX`

  Prepends one or more elements to a list only when the list exists.

  [Learn more](https://valkey.io/commands/lpushx/)
+ `LRANGE`

  Returns a range of elements from a list.

  [Learn more](https://valkey.io/commands/lrange/)
+ `LREM`

  Removes elements from a list. Deletes the list if the last element was removed.

  [Learn more](https://valkey.io/commands/lrem/)
+ `LSET`

  Sets the value of an element in a list by its index.

  [Learn more](https://valkey.io/commands/lset/)
+ `LTRIM`

  Removes elements from both ends of a list. Deletes the list if all elements were trimmed.

  [Learn more](https://valkey.io/commands/ltrim/)
+ `RPOP`

  Returns and removes the last element of a list. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/rpop/)
+ `RPOPLPUSH`

  Returns the last element of a list after removing and pushing it to another list. Deletes the list if the last element was popped.

  [Learn more](https://valkey.io/commands/rpoplpush/)
+ `RPUSH`

  Appends one or more elements to a list. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/rpush/)
+ `RPUSHX`

  Appends an element to a list only when the list exists.

  [Learn more](https://valkey.io/commands/rpushx/)

**Pub/Sub Commands**

**Note**  
In ElastiCache serverless, PUBSUB commands internally use sharded PUBSUB, so traditional and shard channel names will be mixed.
+ `PUBLISH`

  Posts a message to a channel.

  [Learn more](https://valkey.io/commands/publish/)
+ `PUBSUB CHANNELS`

  Returns the active channels.

  [Learn more](https://valkey.io/commands/pubsub-channels/)
+ `PUBSUB NUMSUB`

  Returns a count of subscribers to channels.

  [Learn more](https://valkey.io/commands/pubsub-numsub/)
+ `PUBSUB SHARDCHANNELS`

  Returns the active shard channels.

  [Learn more](https://valkey.io/commands/pubsub-shardchannels/)
+ `PUBSUB SHARDNUMSUB`

  Returns the count of subscribers of shard channels.

  [Learn more](https://valkey.io/commands/pubsub-shardnumsub/)
+ `SPUBLISH`

  Posts a message to a shard channel.

  [Learn more](https://valkey.io/commands/spublish/)
+ `SSUBSCRIBE`

  Listens for messages published to shard channels.

  [Learn more](https://valkey.io/commands/ssubscribe/)
+ `SUBSCRIBE`

  Listens for messages published to channels.

  [Learn more](https://valkey.io/commands/subscribe/)
+ `SUNSUBSCRIBE`

  Stops listening to messages posted to shard channels.

  [Learn more](https://valkey.io/commands/sunsubscribe/)
+ `UNSUBSCRIBE`

  Stops listening to messages posted to channels.

  [Learn more](https://valkey.io/commands/unsubscribe/)

**Scripting Commands**
+ `EVAL`

  Executes a server-side Lua script.

  [Learn more](https://valkey.io/commands/eval/)
+ `EVAL_RO`

  Executes a read-only server-side Lua script.

  [Learn more](https://valkey.io/commands/eval_ro/)
+ `EVALSHA`

  Executes a server-side Lua script by SHA1 digest.

  [Learn more](https://valkey.io/commands/evalsha/)
+ `EVALSHA_RO`

  Executes a read-only server-side Lua script by SHA1 digest.

  [Learn more](https://valkey.io/commands/evalsha_ro/)
+ `SCRIPT EXISTS`

  Determines whether server-side Lua scripts exist in the script cache.

  [Learn more](https://valkey.io/commands/script-exists/)
+ `SCRIPT FLUSH`

  Currently a no-op; the script cache is managed by the service.

  [Learn more](https://valkey.io/commands/script-flush/)
+ `SCRIPT LOAD`

  Loads a server-side Lua script to the script cache.

  [Learn more](https://valkey.io/commands/script-load/)

**Server Management Commands**

**Note**  
When using node-based ElastiCache clusters for Valkey and Redis OSS, flush commands must be sent to every primary by the client to flush all keys. ElastiCache Serverless for Valkey and Redis OSS works differently, because it abstracts away the underlying cluster topology. The result is that in ElastiCache Serverless, `FLUSHDB` and `FLUSHALL` commands will always flush all keys across the cluster. For this reason, flush commands cannot be included inside a Serverless transaction. 
+ `ACL CAT`

  Lists the ACL categories, or the commands inside a category.

  [Learn more](https://valkey.io/commands/acl-cat/)
+ `ACL GENPASS`

  Generates a pseudorandom, secure password that can be used to identify ACL users.

  [Learn more](https://valkey.io/commands/acl-genpass/)
+ `ACL GETUSER`

  Lists the ACL rules of a user.

  [Learn more](https://valkey.io/commands/acl-getuser/)
+ `ACL LIST`

  Dumps the effective rules in ACL file format.

  [Learn more](https://valkey.io/commands/acl-list/)
+ `ACL USERS`

  Lists all ACL users.

  [Learn more](https://valkey.io/commands/acl-users/)
+ `ACL WHOAMI`

  Returns the authenticated username of the current connection.

  [Learn more](https://valkey.io/commands/acl-whoami/)
+ `DBSIZE`

  Returns the number of keys in the currently selected database. This operation is not guaranteed to be atomic across all slots.

  [Learn more](https://valkey.io/commands/dbsize/)
+ `COMMAND`

  Returns detailed information about all commands.

  [Learn more](https://valkey.io/commands/command/)
+ `COMMAND COUNT`

  Returns a count of commands.

  [Learn more](https://valkey.io/commands/command-count/)
+ `COMMAND DOCS`

  Returns documentary information about one, multiple or all commands.

  [Learn more](https://valkey.io/commands/command-docs/)
+ `COMMAND GETKEYS`

  Extracts the key names from an arbitrary command.

  [Learn more](https://valkey.io/commands/command-getkeys/)
+ `COMMAND GETKEYSANDFLAGS`

  Extracts the key names and access flags for an arbitrary command.

  [Learn more](https://valkey.io/commands/command-getkeysandflags/)
+ `COMMAND INFO`

  Returns information about one, multiple or all commands.

  [Learn more](https://valkey.io/commands/command-info/)
+ `COMMAND LIST`

  Returns a list of command names.

  [Learn more](https://valkey.io/commands/command-list/)
+ `COMMANDLOG`

  A container for command log commands.

  [Learn more](https://valkey.io/commands/commandlog/)
+ `COMMANDLOG GET`

  Returns the specified command log's entries.

  [Learn more](https://valkey.io/commands/commandlog-get/)
+ `COMMANDLOG HELP`

  Shows helpful text about the different subcommands.

  [Learn more](https://valkey.io/commands/commandlog-help/)
+ `COMMANDLOG LEN`

  Returns the number of entries in the specified type of command log.

  [Learn more](https://valkey.io/commands/commandlog-len/)
+ `COMMANDLOG RESET`

  Clears all entries from the specified type of command log.

  [Learn more](https://valkey.io/commands/commandlog-reset/)
+ `FLUSHALL`

  Removes all keys from all databases. This operation is not guaranteed to be atomic across all slots. 

  [Learn more](https://valkey.io/commands/flushall/)
+ `FLUSHDB`

  Removes all keys from the current database. This operation is not guaranteed to be atomic across all slots.

  [Learn more](https://valkey.io/commands/flushdb/)
+ `INFO`

  Returns information and statistics about the server.

  [Learn more](https://valkey.io/commands/info/)
+ `LOLWUT`

  Displays computer art and the Valkey or Redis OSS version.

  [Learn more](https://valkey.io/commands/lolwut/)
+ `ROLE`

  Returns the replication role.

  [Learn more](https://valkey.io/commands/role/)
+ `TIME`

  Returns the server time.

  [Learn more](https://valkey.io/commands/time/)

**Set Commands**
+ `SADD`

  Adds one or more members to a set. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/sadd/)
+ `SCARD`

  Returns the number of members in a set.

  [Learn more](https://valkey.io/commands/scard/)
+ `SDIFF`

  Returns the difference of multiple sets.

  [Learn more](https://valkey.io/commands/sdiff/)
+ `SDIFFSTORE`

  Stores the difference of multiple sets in a key.

  [Learn more](https://valkey.io/commands/sdiffstore/)
+ `SINTER`

  Returns the intersect of multiple sets.

  [Learn more](https://valkey.io/commands/sinter/)
+ `SINTERCARD`

  Returns the number of members of the intersect of multiple sets.

  [Learn more](https://valkey.io/commands/sintercard/)
+ `SINTERSTORE`

  Stores the intersect of multiple sets in a key.

  [Learn more](https://valkey.io/commands/sinterstore/)
+ `SISMEMBER`

  Determines whether a member belongs to a set.

  [Learn more](https://valkey.io/commands/sismember/)
+ `SMEMBERS`

  Returns all members of a set.

  [Learn more](https://valkey.io/commands/smembers/)
+ `SMISMEMBER`

  Determines whether multiple members belong to a set.

  [Learn more](https://valkey.io/commands/smismember/)
+ `SMOVE`

  Moves a member from one set to another.

  [Learn more](https://valkey.io/commands/smove/)
+ `SPOP`

  Returns one or more random members from a set after removing them. Deletes the set if the last member was popped.

  [Learn more](https://valkey.io/commands/spop/)
+ `SRANDMEMBER`

  Returns one or more random members from a set.

  [Learn more](https://valkey.io/commands/srandmember/)
+ `SREM`

  Removes one or more members from a set. Deletes the set if the last member was removed.

  [Learn more](https://valkey.io/commands/srem/)
+ `SSCAN`

  Iterates over members of a set.

  [Learn more](https://valkey.io/commands/sscan/)
+ `SUNION`

  Returns the union of multiple sets.

  [Learn more](https://valkey.io/commands/sunion/)
+ `SUNIONSTORE`

  Stores the union of multiple sets in a key.

  [Learn more](https://valkey.io/commands/sunionstore/)

**Sorted Set Commands**
+ `BZMPOP`

  Removes and returns a member by score from one or more sorted sets. Blocks until a member is available otherwise. Deletes the sorted set if the last element was popped.

  [Learn more](https://valkey.io/commands/bzmpop/)
+ `BZPOPMAX`

  Removes and returns the member with the highest score from one or more sorted sets. Blocks until a member available otherwise. Deletes the sorted set if the last element was popped.

  [Learn more](https://valkey.io/commands/bzpopmax/)
+ `BZPOPMIN`

  Removes and returns the member with the lowest score from one or more sorted sets. Blocks until a member is available otherwise. Deletes the sorted set if the last element was popped.

  [Learn more](https://valkey.io/commands/bzpopmin/)
+ `ZADD`

  Adds one or more members to a sorted set, or updates their scores. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/zadd/)
+ `ZCARD`

  Returns the number of members in a sorted set.

  [Learn more](https://valkey.io/commands/zcard/)
+ `ZCOUNT`

  Returns the count of members in a sorted set that have scores within a range.

  [Learn more](https://valkey.io/commands/zcount/)
+ `ZDIFF`

  Returns the difference between multiple sorted sets.

  [Learn more](https://valkey.io/commands/zdiff/)
+ `ZDIFFSTORE`

  Stores the difference of multiple sorted sets in a key.

  [Learn more](https://valkey.io/commands/zdiffstore/)
+ `ZINCRBY`

  Increments the score of a member in a sorted set.

  [Learn more](https://valkey.io/commands/zincrby/)
+ `ZINTER`

  Returns the intersect of multiple sorted sets.

  [Learn more](https://valkey.io/commands/zinter/)
+ `ZINTERCARD`

  Returns the number of members of the intersect of multiple sorted sets.

  [Learn more](https://valkey.io/commands/zintercard/)
+ `ZINTERSTORE`

  Stores the intersect of multiple sorted sets in a key.

  [Learn more](https://valkey.io/commands/zinterstore/)
+ `ZLEXCOUNT`

  Returns the number of members in a sorted set within a lexicographical range.

  [Learn more](https://valkey.io/commands/zlexcount/)
+ `ZMPOP`

  Returns the highest- or lowest-scoring members from one or more sorted sets after removing them. Deletes the sorted set if the last member was popped.

  [Learn more](https://valkey.io/commands/zmpop/)
+ `ZMSCORE`

  Returns the score of one or more members in a sorted set.

  [Learn more](https://valkey.io/commands/zmscore/)
+ `ZPOPMAX`

  Returns the highest-scoring members from a sorted set after removing them. Deletes the sorted set if the last member was popped.

  [Learn more](https://valkey.io/commands/zpopmax/)
+ `ZPOPMIN`

  Returns the lowest-scoring members from a sorted set after removing them. Deletes the sorted set if the last member was popped.

  [Learn more](https://valkey.io/commands/zpopmin/)
+ `ZRANDMEMBER`

  Returns one or more random members from a sorted set.

  [Learn more](https://valkey.io/commands/zrandmember/)
+ `ZRANGE`

  Returns members in a sorted set within a range of indexes.

  [Learn more](https://valkey.io/commands/zrange/)
+ `ZRANGEBYLEX`

  Returns members in a sorted set within a lexicographical range.

  [Learn more](https://valkey.io/commands/zrangebylex/)
+ `ZRANGEBYSCORE`

  Returns members in a sorted set within a range of scores.

  [Learn more](https://valkey.io/commands/zrangebyscore/)
+ `ZRANGESTORE`

  Stores a range of members from a sorted set in a key.

  [Learn more](https://valkey.io/commands/zrangestore/)
+ `ZRANK`

  Returns the index of a member in a sorted set ordered by ascending scores.

  [Learn more](https://valkey.io/commands/zrank/)
+ `ZREM`

  Removes one or more members from a sorted set. Deletes the sorted set if all members were removed.

  [Learn more](https://valkey.io/commands/zrem/)
+ `ZREMRANGEBYLEX`

  Removes members in a sorted set within a lexicographical range. Deletes the sorted set if all members were removed.

  [Learn more](https://valkey.io/commands/zremrangebylex/)
+ `ZREMRANGEBYRANK`

  Removes members in a sorted set within a range of indexes. Deletes the sorted set if all members were removed.

  [Learn more](https://valkey.io/commands/zremrangebyrank/)
+ `ZREMRANGEBYSCORE`

  Removes members in a sorted set within a range of scores. Deletes the sorted set if all members were removed.

  [Learn more](https://valkey.io/commands/zremrangebyscore/)
+ `ZREVRANGE`

  Returns members in a sorted set within a range of indexes in reverse order.

  [Learn more](https://valkey.io/commands/zrevrange/)
+ `ZREVRANGEBYLEX`

  Returns members in a sorted set within a lexicographical range in reverse order.

  [Learn more](https://valkey.io/commands/zrevrangebylex/)
+ `ZREVRANGEBYSCORE`

  Returns members in a sorted set within a range of scores in reverse order.

  [Learn more](https://valkey.io/commands/zrevrangebyscore/)
+ `ZREVRANK`

  Returns the index of a member in a sorted set ordered by descending scores.

  [Learn more](https://valkey.io/commands/zrevrank/)
+ `ZSCAN`

  Iterates over members and scores of a sorted set.

  [Learn more](https://valkey.io/commands/zscan/)
+ `ZSCORE`

  Returns the score of a member in a sorted set.

  [Learn more](https://valkey.io/commands/zscore/)
+ `ZUNION`

  Returns the union of multiple sorted sets.

  [Learn more](https://valkey.io/commands/zunion/)
+ `ZUNIONSTORE`

  Stores the union of multiple sorted sets in a key.

  [Learn more](https://valkey.io/commands/zunionstore/)
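
Several of these commands are typically combined, for example when building a leaderboard. The following minimal sketch assumes a redis-py style client object; the helper names `record_score` and `leaderboard_top` are ours, not part of any library:

```python
def record_score(client, key, member, score):
    """ZADD creates the sorted set if needed and upserts the member's score."""
    client.zadd(key, {member: score})

def leaderboard_top(client, key, n=3):
    """Return the n highest-scoring members of a sorted set.

    `client` is assumed to expose redis-py style methods; desc=True issues
    ZRANGE with the REV option, so results arrive highest score first.
    """
    return client.zrange(key, 0, n - 1, desc=True, withscores=True)
```

With a connected client, `record_score(r, "scores", "alice", 42)` followed by `leaderboard_top(r, "scores")` returns the highest scorers first.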

**Stream Commands**
+ `XACK`

  Returns the number of messages that were successfully acknowledged by the consumer group member of a stream.

  [Learn more](https://valkey.io/commands/xack/)
+ `XADD`

  Appends a new message to a stream. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/xadd/)
+ `XAUTOCLAIM`

  Changes, or acquires, ownership of messages in a consumer group, as if the messages were delivered to a consumer group member.

  [Learn more](https://valkey.io/commands/xautoclaim/)
+ `XCLAIM`

  Changes, or acquires, ownership of a message in a consumer group, as if the message was delivered to a consumer group member.

  [Learn more](https://valkey.io/commands/xclaim/)
+ `XDEL`

  Returns the number of messages after removing them from a stream.

  [Learn more](https://valkey.io/commands/xdel/)
+ `XGROUP CREATE`

  Creates a consumer group. 

  [Learn more](https://valkey.io/commands/xgroup-create/)
+ `XGROUP CREATECONSUMER`

  Creates a consumer in a consumer group.

  [Learn more](https://valkey.io/commands/xgroup-createconsumer/)
+ `XGROUP DELCONSUMER`

  Deletes a consumer from a consumer group.

  [Learn more](https://valkey.io/commands/xgroup-delconsumer/)
+ `XGROUP DESTROY`

  Destroys a consumer group.

  [Learn more](https://valkey.io/commands/xgroup-destroy/)
+ `XGROUP SETID`

  Sets the last-delivered ID of a consumer group.

  [Learn more](https://valkey.io/commands/xgroup-setid/)
+ `XINFO CONSUMERS`

  Returns a list of the consumers in a consumer group.

  [Learn more](https://valkey.io/commands/xinfo-consumers/)
+ `XINFO GROUPS`

  Returns a list of the consumer groups of a stream.

  [Learn more](https://valkey.io/commands/xinfo-groups/)
+ `XINFO STREAM`

  Returns information about a stream.

  [Learn more](https://valkey.io/commands/xinfo-stream/)
+ `XLEN`

  Returns the number of messages in a stream.

  [Learn more](https://valkey.io/commands/xlen/)
+ `XPENDING`

  Returns the information and entries from a stream consumer group's pending entries list.

  [Learn more](https://valkey.io/commands/xpending/)
+ `XRANGE`

  Returns the messages from a stream within a range of IDs.

  [Learn more](https://valkey.io/commands/xrange/)
+ `XREAD`

  Returns messages from multiple streams with IDs greater than the ones requested. Blocks until a message is available otherwise.

  [Learn more](https://valkey.io/commands/xread/)
+ `XREADGROUP`

  Returns new or historical messages from a stream for a consumer in a group. Blocks until a message is available otherwise.

  [Learn more](https://valkey.io/commands/xreadgroup/)
+ `XREVRANGE`

  Returns the messages from a stream within a range of IDs in reverse order.

  [Learn more](https://valkey.io/commands/xrevrange/)
+ `XTRIM`

  Deletes messages from the beginning of a stream.

  [Learn more](https://valkey.io/commands/xtrim/)
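
A typical consumer-group loop combines `XREADGROUP` and `XACK`: read undelivered messages, process each one, then acknowledge it so it leaves the pending entries list. The following minimal sketch assumes a redis-py style client object; the helper name `process_new_messages` is ours:

```python
def process_new_messages(client, stream, group, consumer, handler, count=10):
    """Read new messages for a consumer group, then acknowledge them.

    `client` is assumed to expose redis-py style xreadgroup/xack methods.
    The special ID ">" requests messages never delivered to this group.
    """
    entries = client.xreadgroup(group, consumer, {stream: ">"}, count=count)
    for _stream_name, messages in entries:
        for message_id, fields in messages:
            handler(fields)                       # application-level processing
            client.xack(stream, group, message_id)  # remove from pending list
```

Acknowledging only after `handler` succeeds means a crashed consumer leaves its messages pending, where `XAUTOCLAIM` or `XCLAIM` can hand them to another consumer.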

**String Commands**
+ `APPEND`

  Appends a string to the value of a key. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/append/)
+ `DECR`

  Decrements the integer value of a key by one. Uses 0 as initial value if the key doesn't exist.

  [Learn more](https://valkey.io/commands/decr/)
+ `DECRBY`

  Decrements a number from the integer value of a key. Uses 0 as initial value if the key doesn't exist.

  [Learn more](https://valkey.io/commands/decrby/)
+ `GET`

  Returns the string value of a key.

  [Learn more](https://valkey.io/commands/get/)
+ `GETDEL`

  Returns the string value of a key after deleting the key.

  [Learn more](https://valkey.io/commands/getdel/)
+ `GETEX`

  Returns the string value of a key after setting its expiration time.

  [Learn more](https://valkey.io/commands/getex/)
+ `GETRANGE`

  Returns a substring of the string stored at a key.

  [Learn more](https://valkey.io/commands/getrange/)
+ `GETSET`

  Returns the previous string value of a key after setting it to a new value.

  [Learn more](https://valkey.io/commands/getset/)
+ `INCR`

  Increments the integer value of a key by one. Uses 0 as initial value if the key doesn't exist.

  [Learn more](https://valkey.io/commands/incr/)
+ `INCRBY`

  Increments the integer value of a key by a number. Uses 0 as initial value if the key doesn't exist.

  [Learn more](https://valkey.io/commands/incrby/)
+ `INCRBYFLOAT`

  Increments the floating point value of a key by a number. Uses 0 as initial value if the key doesn't exist.

  [Learn more](https://valkey.io/commands/incrbyfloat/)
+ `LCS`

  Finds the longest common substring.

  [Learn more](https://valkey.io/commands/lcs/)
+ `MGET`

  Atomically returns the string values of one or more keys.

  [Learn more](https://valkey.io/commands/mget/)
+ `MSET`

  Atomically creates or modifies the string values of one or more keys.

  [Learn more](https://valkey.io/commands/mset/)
+ `MSETNX`

  Atomically modifies the string values of one or more keys only when all keys don't exist.

  [Learn more](https://valkey.io/commands/msetnx/)
+ `PSETEX`

  Sets both string value and expiration time in milliseconds of a key. The key is created if it doesn't exist.

  [Learn more](https://valkey.io/commands/psetex/)
+ `SET`

  Sets the string value of a key, ignoring its type. The key is created if it doesn't exist.

  [Learn more](https://valkey.io/commands/set/)
+ `SETEX`

  Sets the string value and expiration time of a key. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/setex/)
+ `SETNX`

  Sets the string value of a key only when the key doesn't exist.

  [Learn more](https://valkey.io/commands/setnx/)
+ `SETRANGE`

  Overwrites a part of a string value with another by an offset. Creates the key if it doesn't exist.

  [Learn more](https://valkey.io/commands/setrange/)
+ `STRLEN`

  Returns the length of a string value.

  [Learn more](https://valkey.io/commands/strlen/)
+ `SUBSTR`

  Returns a substring from a string value.

  [Learn more](https://valkey.io/commands/substr/)

**Transaction Commands**
+ `DISCARD`

  Discards a transaction.

  [Learn more](https://valkey.io/commands/discard/)
+ `EXEC`

  Executes all commands in a transaction.

  [Learn more](https://valkey.io/commands/exec/)
+ `MULTI`

  Starts a transaction.

  [Learn more](https://valkey.io/commands/multi/)
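
These three commands are normally used together: `MULTI` opens the transaction, queued commands are sent, and `EXEC` runs them atomically (`DISCARD` abandons them). A minimal sketch, assuming a redis-py style `pipeline()` interface; the helper name `incr_with_ttl` is ours:

```python
def incr_with_ttl(client, key, ttl_seconds):
    """Atomically increment a counter and refresh its expiration.

    `client` is assumed to expose a redis-py style pipeline();
    transaction=True wraps the queued commands in MULTI/EXEC, and
    execute() sends EXEC, returning one reply per queued command.
    """
    pipe = client.pipeline(transaction=True)
    pipe.incr(key)
    pipe.expire(key, ttl_seconds)
    return pipe.execute()
```

Because both commands run inside one `EXEC`, no other client can observe the counter incremented without its TTL refreshed.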

## Restricted Valkey and Redis OSS commands
<a name="RestrictedCommandsRedis"></a>

To deliver a managed service experience, ElastiCache restricts access to certain cache engine-specific commands that require advanced privileges. For caches running Redis OSS, the following commands are unavailable:
+ `acl setuser`
+ `acl load`
+ `acl save`
+ `acl deluser`
+ `bgrewriteaof`
+ `bgsave`
+ `cluster addslots`
+ `cluster addslotsrange`
+ `cluster bumpepoch`
+ `cluster delslots`
+ `cluster delslotsrange`
+ `cluster failover`
+ `cluster flushslots`
+ `cluster forget`
+ `cluster links`
+ `cluster meet`
+ `cluster setslot`
+ `config`
+ `debug`
+ `migrate`
+ `psync`
+ `replicaof`
+ `save`
+ `slaveof`
+ `shutdown`
+ `sync`

In addition, the following commands are unavailable for serverless caches:
+ `acl log`
+ `client caching`
+ `client getredir`
+ `client id`
+ `client info`
+ `client kill`
+ `client list`
+ `client no-evict`
+ `client pause`
+ `client tracking`
+ `client trackinginfo`
+ `client unblock`
+ `client unpause`
+ `cluster count-failure-reports`
+ `commandlog`
+ `commandlog get`
+ `commandlog help`
+ `commandlog len`
+ `commandlog reset`
+ `fcall`
+ `fcall_ro`
+ `function`
+ `function delete`
+ `function dump`
+ `function flush`
+ `function help`
+ `function kill`
+ `function list`
+ `function load`
+ `function restore`
+ `function stats`
+ `keys`
+ `lastsave`
+ `latency`
+ `latency doctor`
+ `latency graph`
+ `latency help`
+ `latency histogram`
+ `latency history`
+ `latency latest`
+ `latency reset`
+ `memory`
+ `memory doctor`
+ `memory help`
+ `memory malloc-stats`
+ `memory purge`
+ `memory stats`
+ `memory usage`
+ `monitor`
+ `move`
+ `object`
+ `object encoding`
+ `object freq`
+ `object help`
+ `object idletime`
+ `object refcount`
+ `pfdebug`
+ `pfselftest`
+ `psubscribe`
+ `pubsub numpat`
+ `punsubscribe`
+ `script kill`
+ `slowlog`
+ `slowlog get`
+ `slowlog help`
+ `slowlog len`
+ `slowlog reset`
+ `swapdb`
+ `wait`

## Supported Memcached commands
<a name="SupportedCommandsMem"></a>

ElastiCache Serverless for Memcached supports all of the memcached [commands](https://github.com/memcached/memcached/wiki/Commands) in open source memcached 1.6 except for the following:
+ Client connections require TLS; as a result, the UDP protocol is not supported.
+ The binary protocol is not supported, as it is officially [deprecated](https://github.com/memcached/memcached/wiki/ReleaseNotes160) in memcached 1.6.
+ `GET/GETS` commands are limited to 16 KB to avoid a potential DoS attack on the server from fetching a large number of keys.
+ A delayed `flush_all` command is rejected with `CLIENT_ERROR`.
+ Commands that configure the engine or reveal internal information about engine state or logs are not supported, such as:
  + For the `STATS` command, only `stats` and `stats reset` are supported. Other variations return `ERROR`.
  + `lru / lru_crawler` - modifies LRU and LRU crawler settings
  + `watch` - watches memcached server logs
  + `verbosity` - configures the server log level
  + `me` - the meta debug (me) command is not supported

# Valkey and Redis OSS configuration and limits
<a name="RedisConfiguration"></a>

The Valkey and Redis OSS engines each provide a number of configuration parameters. Some are modifiable in ElastiCache, and some are not modifiable, in order to provide stable performance and reliability.

## Serverless caches
<a name="RedisConfiguration.Serverless"></a>

For serverless caches, parameter groups are not used, and the Valkey or Redis OSS configuration is not modifiable. The following Valkey or Redis OSS parameter values are in place:



|  Name  |  Details  |  Description  | 
| --- | --- | --- | 
| acl-pubsub-default | `allchannels` | Default pubsub channel permissions for ACL users on the cache. | 
| client-output-buffer-limit | `normal 0 0 0` `pubsub 32mb 8mb 60` | Normal clients have no buffer limit. PUB/SUB clients will be disconnected if they breach 32MiB backlog, or breach 8MiB backlog for 60s. | 
| client-query-buffer-limit | 1 GiB | The maximum size of a single client query buffer. Additionally, clients cannot issue a request with more than 3,999 arguments. | 
| cluster-allow-pubsubshard-when-down | yes | This allows the cache to serve pubsub traffic while the cache is partially down. | 
| cluster-allow-reads-when-down | yes | This allows the cache to serve read traffic while the cache is partially down. | 
| cluster-enabled | yes | All serverless caches are cluster mode enabled, which allows them to transparently partition their data across multiple backend shards. All slots are surfaced to clients as being owned by a single virtual node. | 
| cluster-require-full-coverage | no | When the keyspace is partially down (i.e. at least one hash slot is inaccessible), the cache will continue accepting queries for the part of the keyspace that is still covered. The entire keyspace will always be "covered" by a single virtual node in cluster slots. | 
| lua-time-limit | 5000 | The maximum execution time for a Lua script, in milliseconds, before ElastiCache takes action to stop the script. If `lua-time-limit` is exceeded, all Valkey or Redis OSS commands may return an error of the form *-BUSY*. Since this state can cause interference with many essential Valkey or Redis OSS operations, ElastiCache will first issue a *SCRIPT KILL* command. If this is unsuccessful, ElastiCache will forcibly restart Valkey or Redis OSS. | 
| maxclients | 65000 | The maximum number of clients that can be connected to the cache at one time. Connection attempts beyond this limit may or may not succeed. | 
| maxmemory-policy | volatile-lru | Items with a TTL set are evicted following least-recently-used (LRU) estimation when a cache's memory limit is reached. | 
| notify-keyspace-events | (an empty string) | Keyspace events are currently not supported on serverless caches. | 
| port | Primary port: 6379 Read port: 6380 | Serverless caches advertise two ports with the same hostname. The primary port allows writes and reads, whereas the read port allows lower-latency eventually-consistent reads using the READONLY command. | 
| proto-max-bulk-len | 512 MiB | The maximum size of a single element request. | 
| timeout | 0 | Clients are not forcibly disconnected at a specific idle time, but they may be disconnected during steady-state for load balancing purposes. | 
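
The two advertised ports can be wired up as separate connections: writes go to the primary port, and eventually-consistent reads can go to the read port. A minimal, library-agnostic sketch; the `client_factory` parameter and the helper name are our assumptions (with redis-py you would pass `redis.Redis` as the factory):

```python
PRIMARY_PORT = 6379  # reads and writes
READ_PORT = 6380     # lower-latency, eventually-consistent reads

def serverless_connections(host, client_factory):
    """Build one write connection and one read connection to a serverless cache.

    Serverless caches require TLS, so ssl=True is assumed to be a valid
    keyword argument for the factory (it is for redis.Redis).
    """
    writer = client_factory(host=host, port=PRIMARY_PORT, ssl=True)
    reader = client_factory(host=host, port=READ_PORT, ssl=True)
    return writer, reader
```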

Additionally, the following limits are in place:



|  Name  |  Details  |  Description  | 
| --- | --- | --- | 
| Size per cache | 5,000 GiB | Maximum amount of data that can be stored per serverless cache. | 
| Size per slot | 32 GiB | The maximum size of a single Valkey or Redis OSS hash slot. Clients trying to set more data than this on a single Valkey or Redis OSS slot will trigger the eviction policy on the slot, and if no keys are evictable, will receive an out of memory (OOM) error. | 
| ECPU per cache | 15,000,000 ECPU/second | ElastiCache Processing Units (ECPU) metric. The number of ECPUs consumed by your requests depends on the vCPU time taken and the amount of data transferred. | 
| ECPU per slot | 30K - 90K ECPU/second | Maximum of 30K ECPUs/second per slot or 90K ECPUs/second when using Read from Replica using READONLY connections. | 
| Arguments per Request | 3,999 | Maximum number of arguments per request. Clients sending more arguments per request will receive an error. | 
| Key name length | 4 KiB | The maximum size for a single Valkey or Redis OSS key or channel name. Clients referencing keys larger than this will receive an error. | 
| Lua script size | 4 MiB | The maximum size of a single Valkey or Redis OSS Lua script. Attempts to load a Lua script larger than this will receive an error. | 

## Node-based clusters
<a name="RedisConfiguration.SelfDesigned"></a>

For node-based clusters, see [Valkey and Redis OSS parameters](ParameterGroups.Engine.md#ParameterGroups.Redis) for the default values of configuration parameters and which are configurable. The default values are generally recommended unless you have a specific use case requiring them to be overridden.

# IPv6 client examples for Valkey, Memcached, and Redis OSS
<a name="network-type-best-practices"></a>

ElastiCache is compatible with Valkey, Memcached, and Redis OSS. This means that clients that support IPv6 connections should be able to connect to IPv6-enabled ElastiCache clusters. There are some caveats worth noting when interacting with IPv6-enabled resources.

You can view the [Best practices for Valkey and Redis clients](https://aws.amazon.com/blogs/database/best-practices-redis-clients-and-amazon-elasticache-for-redis/) blog post on the AWS Database Blog for recommendations on configuring Valkey and Redis OSS clients for ElastiCache resources.

Following are best practices for interacting with IPv6 enabled ElastiCache resources with commonly used open-source client libraries. 

## Validated clients with Valkey and Redis OSS
<a name="network-type-validated-clients-redis"></a>

ElastiCache is compatible with Valkey and open-source Redis OSS. This means that Valkey and open-source Redis OSS clients that support IPv6 connections should be able to connect to IPv6-enabled ElastiCache clusters. In addition, several of the most popular Python and Java clients have been specifically tested and validated to work with all supported network type configurations (IPv4 only, IPv6 only, and Dual Stack).

The following clients have been validated to work with all supported network type configurations for Valkey and Redis OSS:
+ [redis-py](https://github.com/redis/redis-py) – [Version: 4.1.2](https://github.com/redis/redis-py/tree/v4.1.2)
+ [Lettuce](https://lettuce.io/) – [Version: 6.1.6.RELEASE](https://github.com/lettuce-io/lettuce-core/tree/6.1.6.RELEASE)
+ [Jedis](https://github.com/redis/jedis) – [Version: 3.6.0](https://github.com/redis/jedis/tree/jedis-3.6.0)

# Best practices for clients (Valkey and Redis OSS)
<a name="BestPractices.Clients.redis"></a>

Learn best practices for common scenarios and follow along with code examples of some of the most popular open source Valkey and Redis OSS client libraries (redis-py, PHPRedis, and Lettuce), as well as best practices for interacting with ElastiCache resources with commonly used open-source Memcached client libraries.

**Topics**
+ [Large number of connections (Valkey and Redis OSS)](BestPractices.Clients.Redis.Connections.md)
+ [Cluster client discovery and exponential backoff (Valkey and Redis OSS)](BestPractices.Clients.Redis.Discovery.md)
+ [Configure a client-side timeout (Valkey and Redis OSS)](BestPractices.Clients.Redis.ClientTimeout.md)
+ [Configure a server-side idle timeout (Valkey and Redis OSS)](BestPractices.Clients.Redis.ServerTimeout.md)
+ [Lua scripts](BestPractices.Clients.Redis.LuaScripts.md)
+ [Storing large composite items (Valkey and Redis OSS)](BestPractices.Clients.Redis.LargeItems.md)
+ [Lettuce client configuration (Valkey and Redis OSS)](BestPractices.Clients-lettuce.md)
+ [Configuring a preferred protocol for dual stack clusters (Valkey and Redis OSS)](#network-type-configuring-dual-stack-redis)

# Large number of connections (Valkey and Redis OSS)
<a name="BestPractices.Clients.Redis.Connections"></a>

Serverless caches and individual ElastiCache for Redis OSS nodes support up to 65,000 concurrent client connections. However, to optimize for performance, we advise that client applications do not constantly operate at that level of connections. Valkey and Redis OSS each have a single-threaded process based on an event loop where incoming client requests are handled sequentially. That means the response time of a given client becomes longer as the number of connected clients increases.

You can take the following set of actions to avoid hitting a connection bottleneck on a Valkey or Redis OSS server:
+ Perform read operations from read replicas. This can be done by using the ElastiCache reader endpoints in cluster mode disabled or by using replicas for reads in cluster mode enabled, including a serverless cache.
+ Distribute write traffic across multiple primary nodes. You can do this in two ways. You can use a multi-sharded Valkey or Redis OSS cluster with a cluster mode capable client. You could also write to multiple primary nodes in cluster mode disabled with client-side sharding. This is done automatically in a serverless cache.
+ Use a connection pool when available in your client library.

In general, creating a TCP connection is a computationally expensive operation compared to typical Valkey or Redis OSS commands. For example, handling a SET/GET request is an order of magnitude faster when reusing an existing connection. Using a client connection pool with a finite size reduces the overhead of connection management. It also bounds the number of concurrent incoming connections from the client application.

The following code example of PHPRedis shows that a new connection is created for each new user request:

```
$redis = new Redis();
if ($redis->connect($HOST, $PORT) != TRUE) {
	//ERROR: connection failed
	return;
}
$redis->set($key, $value);
unset($redis);
$redis = NULL;
```

We benchmarked this code in a loop on an Amazon Elastic Compute Cloud (Amazon EC2) instance connected to a Graviton2 (m6g.2xlarge) ElastiCache for Redis OSS node. We placed both the client and server in the same Availability Zone. The average latency of the entire operation was 2.82 milliseconds.

When we updated the code and used persistent connections and a connection pool, the average latency of the entire operation was 0.21 milliseconds:

```
$redis = new Redis();
if ($redis->pconnect($HOST, $PORT) != TRUE) {
	// ERROR: connection failed
	return;
}
$redis->set($key, $value);
unset($redis);
$redis = NULL;
```

Required redis.ini configurations:
+ `redis.pconnect.pooling_enabled=1`
+ `redis.pconnect.connection_limit=10`

The following code is an example of a [Redis-py connection pool](https://redis.readthedocs.io/en/stable/):

```
pool = redis.BlockingConnectionPool(host=HOST, max_connections=10)
conn = redis.Redis(connection_pool=pool)
conn.set(key, value)
```

The following code is an example of a [Lettuce connection pool](https://lettuce.io/core/release/reference/#_connection_pooling):

```
RedisClient client = RedisClient.create(RedisURI.create(HOST, PORT));
GenericObjectPoolConfig<StatefulRedisConnection<String, String>> config = new GenericObjectPoolConfig<>();
config.setMaxTotal(10); // Configure max connections to 10
GenericObjectPool<StatefulRedisConnection<String, String>> pool =
	ConnectionPoolSupport.createGenericObjectPool(() -> client.connect(), config);
try (StatefulRedisConnection<String, String> connection = pool.borrowObject()) {
	RedisCommands<String, String> syncCommands = connection.sync();
	syncCommands.set(key, value);
}
```

# Cluster client discovery and exponential backoff (Valkey and Redis OSS)
<a name="BestPractices.Clients.Redis.Discovery"></a>

When connecting to an ElastiCache Valkey or Redis OSS cluster in cluster mode enabled, the corresponding client library must be cluster aware. The clients must obtain a map of hash slots to the corresponding nodes in the cluster in order to send requests to the right nodes and avoid the performance overhead of handling cluster redirections. As a result, the client must discover a complete list of slots and the mapped nodes in two different situations:
+ The client is initialized and must populate the initial slots configuration
+ A MOVED redirection is received from the server, such as in the situation of a failover when all slots served by the former primary node are taken over by the replica, or re-sharding when slots are being moved from the source primary to the target primary node

Client discovery is usually done by issuing a CLUSTER SLOTS or CLUSTER NODES command to the Valkey or Redis OSS server. We recommend the CLUSTER SLOTS method because it returns the set of slot ranges and the associated primary and replica nodes back to the client. This doesn't require additional parsing from the client and is more efficient.

Depending on the cluster topology, the size of the response for the CLUSTER SLOTS command can vary based on the cluster size. Larger clusters with more nodes produce a larger response. As a result, it's important to ensure that the number of clients doing the cluster topology discovery doesn't grow unbounded. For example, when the client application starts up or loses connection from the server and must perform cluster discovery, one common mistake is that the client application fires several reconnection and discovery requests without adding exponential backoff upon retry. This can render the Valkey or Redis OSS server unresponsive for a prolonged period of time, with the CPU utilization at 100%. The outage is prolonged if each CLUSTER SLOTS command must process a large number of nodes in the cluster bus. We have observed multiple client outages in the past due to this behavior across a number of different languages including Python (redis-py-cluster) and Java (Lettuce and Redisson).
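
To make the slot mapping concrete, the following self-contained sketch computes a key's hash slot (CRC16-XMODEM modulo 16384, per the cluster specification, including hash-tag handling) and folds a CLUSTER SLOTS style reply into a compact lookup table. The helper names are ours, not part of any client library:

```python
def crc16(data: bytes) -> int:
    """CRC16-XMODEM (polynomial 0x1021), the checksum used for cluster key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Slot for a key; only the {hash tag}, if present and non-empty, is hashed."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key) % 16384

def slot_table(cluster_slots_reply):
    """Fold a CLUSTER SLOTS reply ([start, end, primary, *replicas] entries)
    into ((start, end), (host, port)) pairs for client-side slot lookup."""
    return [((e[0], e[1]), tuple(e[2][:2])) for e in cluster_slots_reply]
```

Keys sharing a hash tag (for example `{user1000}.following` and `{user1000}.followers`) land on the same slot, which is what makes multi-key commands on them possible in cluster mode.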

In a serverless cache, many of the problems are automatically mitigated because the advertised cluster topology is static and consists of two entries: a write endpoint and a read endpoint. Cluster discovery is also automatically spread over multiple nodes when using the cache endpoint. The following recommendations are still useful, however.

To mitigate the impact caused by a sudden influx of connection and discovery requests, we recommend the following:
+ Implement a client connection pool with a finite size to bound the number of concurrent incoming connections from the client application.
+ When the client disconnects from the server due to timeout, retry with exponential backoff with jitter. This helps to avoid multiple clients overwhelming the server at the same time.
+ Use the guide at [Finding connection endpoints in ElastiCache](Endpoints.md) to find the cluster endpoint to perform cluster discovery. In doing so, you spread the discovery load across all nodes in the cluster (up to 90) instead of hitting a few hardcoded seed nodes in the cluster.

The following are some code examples for exponential backoff retry logic in redis-py, PHPRedis, and Lettuce.

**Backoff logic sample 1: redis-py**

redis-py has a built-in retry mechanism that retries one time immediately after a failure. This mechanism can be enabled through the `retry_on_timeout` argument supplied when creating a [Redis](https://redis.readthedocs.io/en/stable/examples/connection_examples.html#redis.Redis) object. Here we demonstrate a custom retry mechanism with exponential backoff and jitter. We've submitted a pull request to natively implement exponential backoff in [redis-py (#1494)](https://github.com/andymccurdy/redis-py/pull/1494). In the future, it may not be necessary to implement it manually.

```
import random
from time import sleep

from redis.exceptions import ConnectionError, TimeoutError

def run_with_backoff(function, retries=5):
    base_backoff = 0.1  # base 100 millisecond backoff
    max_backoff = 10    # sleep for a maximum of 10 seconds
    tries = 0
    while True:
        try:
            return function()
        except (ConnectionError, TimeoutError):
            if tries >= retries:
                raise
            backoff = min(max_backoff, base_backoff * (pow(2, tries) + random.random()))
            print(f"sleeping for {backoff:.2f}s")
            sleep(backoff)
            tries += 1
```

You can then use the following code to set a value:

```
client = redis.Redis(connection_pool=redis.BlockingConnectionPool(host=HOST, max_connections=10))
res = run_with_backoff(lambda: client.set("key", "value"))
print(res)
```

Depending on your workload, you might want to adjust the base backoff value; latency-sensitive workloads often use a base of a few tens of milliseconds rather than the 100 milliseconds shown above.
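
To make that trade-off concrete, the worst-case total sleep for a given base can be computed directly. The helper below is hypothetical (not part of redis-py) and ignores jitter for simplicity:

```
def worst_case_wait(base, retries, max_backoff=10):
    # total time slept if all `retries` attempts fail, following
    # the capped exponential schedule base * 2**i
    return sum(min(max_backoff, base * 2**i) for i in range(retries))

# a 100 millisecond base sleeps about 3.1 seconds across five failed retries,
# while a 20 millisecond base sleeps about 0.62 seconds
```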

**Backoff logic sample 2: PHPRedis**

PHPRedis has a built-in retry mechanism that retries a (non-configurable) maximum of 10 times, with a configurable delay between tries (and jitter from the second retry onwards). For more information, see the following [sample code](https://github.com/phpredis/phpredis/blob/b0b9dd78ef7c15af936144c1b17df1a9273d72ab/library.c#L335-L368). We've submitted a pull request to natively implement exponential backoff in [PHPRedis (#1986)](https://github.com/phpredis/phpredis/pull/1986) that has since been merged and [documented](https://github.com/phpredis/phpredis/blob/develop/README.md#retry-and-backoff). On the latest release of PHPRedis it isn't necessary to implement this manually, but we include the reference here for those on previous versions. The following code example configures the delay of the retry mechanism:

```
$timeout = 0.1; // 100 millisecond connection timeout
$retry_interval = 100; // 100 millisecond retry interval
$client = new Redis();
if($client->pconnect($HOST, $PORT, $timeout, NULL, $retry_interval) != TRUE) {
	return; // ERROR: connection failed
}
$client->set($key, $value);
```

**Backoff logic sample 3: Lettuce**

Lettuce has built-in retry mechanisms based on the exponential backoff strategies described in the post [Exponential Backoff and Jitter](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/). The following is a code excerpt showing the full jitter approach:

```
public static void main(String[] args)
{
	ClientResources resources = null;
	RedisClient client = null;
	StatefulRedisConnection<String, String> connection = null;

	try {
		resources = DefaultClientResources.builder()
				.reconnectDelay(Delay.fullJitter(
			Duration.ofMillis(100),     // minimum 100 millisecond delay
			Duration.ofSeconds(5),      // maximum 5 second delay
			100, TimeUnit.MILLISECONDS) // 100 millisecond base
		).build();

		client = RedisClient.create(resources, RedisURI.create(HOST, PORT));
		client.setOptions(ClientOptions.builder()
	.socketOptions(SocketOptions.builder().connectTimeout(Duration.ofMillis(100)).build()) // 100 millisecond connection timeout
	.timeoutOptions(TimeoutOptions.builder().fixedTimeout(Duration.ofSeconds(5)).build()) // 5 second command timeout
	.build());

		connection = client.connect();
		// run commands on the connection here
	} finally {
		if (connection != null) {
			connection.close();
		}

		if (client != null) {
			client.shutdown();
		}

		if (resources != null) {
			resources.shutdown();
		}
	}
}
```

# Configure a client-side timeout (Valkey and Redis OSS)
<a name="BestPractices.Clients.Redis.ClientTimeout"></a>

**Configuring the client-side timeout**

Configure the client-side timeout appropriately, so that the server has sufficient time to process the request and generate the response, and so that the client can fail fast if a connection to the server can't be established. Certain Valkey or Redis OSS commands can be more computationally expensive than others, such as Lua scripts or MULTI/EXEC transactions that contain multiple commands that must run atomically. In general, a higher client-side timeout is recommended so that the client doesn't time out before the response is received from the server, including in the following cases:
+ Running commands across multiple keys
+ Running MULTI/EXEC transactions or Lua scripts that consist of multiple individual Valkey or Redis OSS commands
+ Reading large values
+ Performing blocking operations such as BLPOP

In case of a blocking operation such as BLPOP, the best practice is to set the command timeout to a number lower than the socket timeout.

The following are code examples for implementing a client-side timeout in redis-py, PHPRedis, and Lettuce.

**Timeout configuration sample 1: redis-py**

The following is a code example with redis-py:

```
# connect to Redis server with a 100 millisecond timeout
# give every Redis command a 2 second timeout
client = redis.Redis(connection_pool=redis.BlockingConnectionPool(host=HOST, max_connections=10,socket_connect_timeout=0.1,socket_timeout=2))

res = client.set("key", "value") # will timeout after 2 seconds
print(res)                       # if there is a connection error

res = client.blpop("list", timeout=1) # will timeout after 1 second
                                      # less than the 2 second socket timeout
print(res)
```

**Timeout config sample 2: PHPRedis**

The following is a code example with PHPRedis:

```
// connect to Redis server with a 100 millisecond connection timeout
// give every Redis command a 2 second timeout
$timeout = 0.1;        // 100 millisecond connection timeout
$retry_interval = 100; // 100 millisecond retry interval
$read_timeout = 2;     // 2 second read timeout
$client = new Redis();
if($client->pconnect($HOST, $PORT, $timeout, NULL, $retry_interval, $read_timeout) != TRUE){
	return; // ERROR: connection failed
}

$res = $client->set("key", "value"); // will timeout after 2 seconds
print "$res\n";                      // if there is a connection error

$res = $client->blpop("list", 1); // will timeout after 1 second
print "$res\n";                   // less than the 2 second read timeout
```

**Timeout config sample 3: Lettuce**

The following is a code example with Lettuce:

```
// connect to Redis server and give every command a 2 second timeout
public static void main(String[] args)
{
	RedisClient client = null;
	StatefulRedisConnection<String, String> connection = null;
	try {
		client = RedisClient.create(RedisURI.create(HOST, PORT));
		client.setOptions(ClientOptions.builder()
	.socketOptions(SocketOptions.builder().connectTimeout(Duration.ofMillis(100)).build()) // 100 millisecond connection timeout
	.timeoutOptions(TimeoutOptions.builder().fixedTimeout(Duration.ofSeconds(2)).build()) // 2 second command timeout 
	.build());

		connection = client.connect();
		RedisCommands<String, String> commands = connection.sync();

		commands.set("key", "value"); // will timeout after 2 seconds
		commands.blpop(1, "list"); // BLPOP with 1 second timeout
	} finally {
		if (connection != null) {
			connection.close();
		}

		if (client != null){
			client.shutdown();
		}
	}
}
```

# Configure a server-side idle timeout (Valkey and Redis OSS)
<a name="BestPractices.Clients.Redis.ServerTimeout"></a>

We have observed cases where a customer's application has a high number of idle clients connected but isn't actively sending commands. In such scenarios, a high number of idle clients can exhaust all 65,000 available connections. To avoid such scenarios, configure the `timeout` setting appropriately on the server via [Valkey and Redis OSS parameters](ParameterGroups.Engine.md#ParameterGroups.Redis). This ensures that the server actively disconnects idle clients to avoid growth in the number of connections. This setting is not available on serverless caches.
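
As a conceptual sketch of what the server-side `timeout` parameter does, the following hypothetical Python model keeps only connections whose last activity falls within the idle window (an illustration only, not ElastiCache code):

```
def surviving_connections(last_active_by_conn, now, idle_timeout):
    # model of the server-side `timeout` parameter: connections idle
    # for longer than idle_timeout seconds are disconnected
    return {conn for conn, t in last_active_by_conn.items()
            if now - t <= idle_timeout}

# with timeout=300, a client idle for 400 seconds is dropped,
# while one idle for 50 seconds is kept
alive = surviving_connections({"a": 1000.0, "b": 1350.0}, now=1400.0, idle_timeout=300)
```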

# Lua scripts
<a name="BestPractices.Clients.Redis.LuaScripts"></a>

Valkey and Redis OSS support more than 200 commands, including those for running Lua scripts. However, Lua scripts have several pitfalls that can affect the memory and availability of Valkey or Redis OSS.

**Unparameterized Lua scripts**

Each Lua script is cached on the Valkey or Redis OSS server before it runs. Unparameterized Lua scripts are unique, which can lead to the Valkey or Redis OSS server storing a large number of Lua scripts and consuming more memory. To mitigate this, ensure that all Lua scripts are parameterized and regularly perform SCRIPT FLUSH to clean up cached Lua scripts if needed.

Also be aware that keys must be provided. If the script is run without any input keys, it fails. For example, this will not work:

```
serverless-test-lst4hg.serverless.use1.cache.amazonaws.com:6379> eval 'return "Hello World"' 0
(error) ERR Lua scripts without any input keys are not supported.
```

This will work:

```
serverless-test-lst4hg.serverless.use1.cache.amazonaws.com:6379> eval 'return redis.call("get", KEYS[1])' 1 mykey-2
"myvalue-2"
```

The following example shows how to use parameterized scripts. First, we have an example of an unparameterized approach that results in three different cached Lua scripts and is not recommended:

```
eval "return redis.call('set','key1','1')" 0
eval "return redis.call('set','key2','2')" 0
eval "return redis.call('set','key3','3')" 0
```

Instead, use the following pattern to create a single script that can accept passed parameters:

```
eval "return redis.call('set',KEYS[1],ARGV[1])" 1 key1 1 
eval "return redis.call('set',KEYS[1],ARGV[1])" 1 key2 2 
eval "return redis.call('set',KEYS[1],ARGV[1])" 1 key3 3
```
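
The memory effect is visible in how the script cache keys its entries: the server caches each script under the SHA1 digest of its body, so every unique unparameterized variant occupies its own cache slot while the parameterized form occupies exactly one. The following sketch uses Python's hashlib to mimic that hashing (the helper name is ours, not a redis-py API):

```
import hashlib

def script_sha(script):
    # the server caches each script under the SHA1 digest of its body
    return hashlib.sha1(script.encode()).hexdigest()

unparameterized = [f"return redis.call('set','key{i}','{i}')" for i in (1, 2, 3)]
parameterized = "return redis.call('set',KEYS[1],ARGV[1])"

# three unique unparameterized bodies -> three cached scripts
distinct = {script_sha(s) for s in unparameterized}
# the parameterized form is a single cached script, whatever keys it's run with
```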

**Long-running Lua scripts**

Lua scripts can run multiple commands atomically, so a script can take longer to complete than a regular Valkey or Redis OSS command. If the Lua script runs only read-only operations, you can stop it in the middle. However, as soon as the script performs a write operation, it becomes unkillable and must run to completion. A long-running Lua script that mutates data can leave the Valkey or Redis OSS server unresponsive for a long time. To mitigate this issue, avoid long-running Lua scripts and test scripts in a pre-production environment.

**Lua script with stealth writes**

There are a few ways a Lua script can continue to write new data into Valkey or Redis OSS even when the server is over `maxmemory`:
+ The script starts when the Valkey or Redis OSS server is below `maxmemory` and contains multiple write operations.
+ The script's first write command doesn't consume memory (such as DEL), but is followed by more write operations that do.

You can mitigate this problem by configuring an eviction policy other than `noeviction` on the Valkey or Redis OSS server. This allows the server to evict items and free up memory in between Lua scripts.

# Storing large composite items (Valkey and Redis OSS)
<a name="BestPractices.Clients.Redis.LargeItems"></a>

In some scenarios, an application might store large composite items in Valkey or Redis OSS (such as a multi-GB hash dataset). This is not a recommended practice because it often leads to performance problems. For example, a client can run an HGETALL command to retrieve an entire multi-GB hash collection, which generates significant memory pressure on the Valkey or Redis OSS server buffering the large item in the client output buffer. Also, for slot migration in cluster mode, ElastiCache doesn't migrate slots that contain items with a serialized size larger than 256 MB.

To solve the large item problems, we have the following recommendations:
+ Break up the large composite item into multiple smaller items. For example, break up a large hash collection into individual key-value fields with a key name scheme that appropriately reflects the collection, such as using a common prefix in the key name to identify the collection of items. If you must access multiple fields in the same collection atomically, you can use the MGET command to retrieve multiple key-values in the same command.
+ If you evaluated all options and still can't break up the large collection dataset, try to use commands that operate on a subset of the data in the collection instead of the entire collection. Avoid having a use case that requires you to atomically retrieve the entire multi-GB collection in the same command. One example is using HGET or HMGET commands instead of HGETALL on hash collections.
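
The first recommendation can be sketched in Python with a simple in-memory stand-in for the cache (the key scheme and helper names below are illustrative, not an ElastiCache API):

```
store = {}  # in-memory stand-in for the cache

def field_key(prefix, field):
    # a common prefix identifies the collection, e.g. "user:42:email"
    return f"{prefix}:{field}"

def split_hash(prefix, mapping):
    # store each field of a large hash as its own key-value pair
    for field, value in mapping.items():
        store[field_key(prefix, field)] = value

def mget(keys):
    # retrieve several fields in one round trip, MGET-style
    return [store.get(k) for k in keys]

split_hash("user:42", {"name": "Ana", "email": "ana@example.com", "age": "30"})
subset = mget([field_key("user:42", f) for f in ("name", "age")])
```

This retrieves only the fields the caller needs instead of buffering the entire collection, mirroring the HGET/HMGET-over-HGETALL advice above.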

# Lettuce client configuration (Valkey and Redis OSS)
<a name="BestPractices.Clients-lettuce"></a>

This section describes the recommended Java and Lettuce configuration options, and how they apply to ElastiCache clusters.

The recommendations in this section were tested with Lettuce version 6.2.2.

**Topics**
+ [Example: Lettuce config for cluster mode, TLS enabled](BestPractices.Clients-lettuce-cme.md)
+ [Example: Lettuce config for cluster mode disabled, TLS enabled](BestPractices.Clients-lettuce-cmd.md)

**Java DNS cache TTL**

The Java virtual machine (JVM) caches DNS name lookups. When the JVM resolves a hostname to an IP address, it caches the IP address for a specified period of time, known as the *time-to-live* (TTL).

The choice of TTL value is a trade-off between latency and responsiveness to change. With shorter TTLs, DNS resolvers notice updates in the cluster's DNS faster. This can make your application respond faster to replacements or other workflows that your cluster undergoes. However, if the TTL is too low, it increases the query volume, which can increase the latency of your application. While there is no correct TTL value, it's worth considering the length of time that you can afford to wait for a change to take effect when setting your TTL value.

Because ElastiCache nodes use DNS name entries that might change, we recommend that you configure your JVM with a low TTL of 5 to 10 seconds. This ensures that when a node's IP address changes, your application will be able to receive and use the resource's new IP address by requerying the DNS entry.

On some Java configurations, the JVM default TTL is set so that it will never refresh DNS entries until the JVM is restarted.

For details on how to set your JVM TTL, see [How to set the JVM TTL](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html#how-to-set-the-jvm-ttl).

**Lettuce version**

We recommend using Lettuce version 6.2.2 or later.

**Endpoints**

When you're using cluster mode enabled clusters, set the `redisUri` to the cluster configuration endpoint. The DNS lookup for this URI returns a list of all available nodes in the cluster, and is randomly resolved to one of them during the cluster initialization. For more details about how topology refresh works, see *dynamicRefreshSources* later in this topic.

**SocketOption**

Enable [KeepAlive](https://lettuce.io/core/release/api/io/lettuce/core/SocketOptions.KeepAliveOptions.html). Enabling this option reduces the need to handle failed connections during command runtime.

Ensure that you set the [Connection timeout](https://lettuce.io/core/release/api/io/lettuce/core/SocketOptions.Builder.html#connectTimeout-java.time.Duration-) based on your application requirements and workload. For more information, see the Timeouts section later in this topic.

**ClusterClientOption: Cluster Mode Enabled client options**

Enable [AutoReconnect](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterClientOptions.Builder.html#autoReconnect-boolean-) when connection is lost.

Set [CommandTimeout](https://lettuce.io/core/release/api/io/lettuce/core/RedisURI.html#getTimeout--). For more details, see the Timeouts section later in this topic.

Set [nodeFilter](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterClientOptions.Builder.html#nodeFilter-java.util.function.Predicate-) to filter out failed nodes from the topology. Lettuce saves all nodes that are found in the 'cluster nodes' output (including nodes with PFAIL/FAIL status) in the client's 'partitions' (also known as shards). During the process of creating the cluster topology, it attempts to connect to all the partition nodes. This Lettuce behavior of adding failed nodes can cause connection errors (or warnings) when nodes are getting replaced for any reason. 

For example, after a failover finishes and the cluster starts the recovery process, while the cluster topology is refreshing, the cluster bus node map briefly lists the down node as a FAIL node before removing it from the topology completely. During this period, the Lettuce client considers it a healthy node and continually connects to it. This causes a failure after retries are exhausted. 

For example:

```
final ClusterClientOptions clusterClientOptions = 
    ClusterClientOptions.builder()
    ... // other options
    .nodeFilter(it -> 
        ! (it.is(RedisClusterNode.NodeFlag.FAIL) 
        || it.is(RedisClusterNode.NodeFlag.EVENTUAL_FAIL) 
        || it.is(RedisClusterNode.NodeFlag.HANDSHAKE)
        || it.is(RedisClusterNode.NodeFlag.NOADDR)))
    .validateClusterNodeMembership(false)
    .build();
redisClusterClient.setOptions(clusterClientOptions);
```

**Note**  
Node filtering is best used with dynamicRefreshSources set to true. Otherwise, if the topology view is taken from a single problematic seed node that sees the primary node of some shard as failing, the client filters out that primary node, which results in slots not being covered. Having multiple seed nodes (when dynamicRefreshSources is true) reduces the likelihood of this issue, because at least some of the seed nodes should have an updated topology view after a failover, including the newly promoted primary.

**ClusterTopologyRefreshOptions: Options to control the cluster topology refreshing of the Cluster Mode Enabled client**

**Note**  
Cluster mode disabled clusters don't support the cluster discovery commands and aren't compatible with clients' dynamic topology discovery functionality.  
Cluster mode disabled with ElastiCache isn't compatible with Lettuce's `MasterSlaveTopologyRefresh`. Instead, for cluster mode disabled you can configure a `StaticMasterReplicaTopologyProvider` and provide the cluster read and write endpoints.  
For more information on connecting to cluster mode disabled clusters, see [Finding a Valkey or Redis OSS (Cluster Mode Disabled) Cluster's Endpoints (Console)](Endpoints.md#Endpoints.Find.Redis).  
If you want to use Lettuce's dynamic topology discovery functionality, you can create a cluster mode enabled cluster with the same shard configuration as your existing cluster. However, for cluster mode enabled clusters we recommend configuring at least three shards with at least one replica to support fast failover.

Enable [enablePeriodicRefresh](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.Builder.html#enablePeriodicRefresh-java.time.Duration-). This enables periodic cluster topology updates so that the client refreshes the cluster topology at the interval of the refreshPeriod (default: 60 seconds). When it's disabled, the client updates the cluster topology only when errors occur while it attempts to run commands against the cluster. 

With this option enabled, you can reduce the latency associated with refreshing the cluster topology, because the refresh runs as a background task. Even in a background job, the refresh can be somewhat slow for clusters with many nodes, because all nodes are queried for their views to obtain the most up-to-date cluster view. If you run a large cluster, you might want to increase the period.

Enable [enableAllAdaptiveRefreshTriggers](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.Builder.html#enableAllAdaptiveRefreshTriggers--). This enables adaptive topology refreshing that uses all [triggers](https://lettuce.io/core/6.1.6.RELEASE/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.RefreshTrigger.html): MOVED_REDIRECT, ASK_REDIRECT, PERSISTENT_RECONNECTS, UNCOVERED_SLOT, UNKNOWN_NODE. Adaptive refresh triggers initiate topology view updates based on events that happen during Valkey or Redis OSS cluster operations. Enabling this option leads to an immediate topology refresh when one of the preceding triggers occurs. Adaptive triggered refreshes are rate-limited using a timeout, because events can happen on a large scale (default timeout between updates: 30 seconds).

Enable [closeStaleConnections](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.Builder.html#closeStaleConnections-boolean-). This enables closing stale connections when refreshing the cluster topology. It only comes into effect if [ClusterTopologyRefreshOptions.isPeriodicRefreshEnabled()](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.html#isPeriodicRefreshEnabled--) is true. When it's enabled, the client can close stale connections and create new ones in the background. This reduces the need to handle failed connections during command runtime.

Enable [dynamicRefreshSources](https://lettuce.io/core/release/api/io/lettuce/core/cluster/ClusterTopologyRefreshOptions.Builder.html#dynamicRefreshSources-boolean-). We recommend enabling dynamicRefreshSources for small clusters, and disabling it for large clusters. dynamicRefreshSources enables discovering cluster nodes from the provided seed node (for example, cluster configuration endpoint). It uses all the discovered nodes as sources for refreshing the cluster topology. 

Using dynamic refresh queries all discovered nodes for the cluster topology and attempts to choose the most accurate cluster view. If it's set to false, only the initial seed nodes are used as sources for topology discovery, and the number of clients is obtained only for the initial seed nodes. When it's disabled, if the cluster configuration endpoint resolves to a failed node, trying to refresh the cluster view fails and leads to exceptions. This scenario can happen because it takes some time until a failed node's entry is removed from the cluster configuration endpoint. Therefore, the configuration endpoint can still randomly resolve to a failed node for a short period of time. 

When it's enabled, however, the client uses all of the cluster nodes received from the cluster view to query for their current views. Because failed nodes are filtered out of that view, the topology refresh succeeds. However, when dynamicRefreshSources is true, Lettuce queries all nodes to get the cluster view and then compares the results, so it can be expensive for clusters with many nodes. We suggest that you turn off this feature for clusters with many nodes. 

```
final ClusterTopologyRefreshOptions topologyOptions = 
    ClusterTopologyRefreshOptions.builder()
    .enableAllAdaptiveRefreshTriggers()
    .enablePeriodicRefresh()
    .dynamicRefreshSources(true)
    .build();
```

**ClientResources**

Configure [DnsResolver](https://lettuce.io/core/release/api/io/lettuce/core/resource/DefaultClientResources.Builder.html#dnsResolver-io.lettuce.core.resource.DnsResolver-) with [DirContextDnsResolver](https://lettuce.io/core/release/api/io/lettuce/core/resource/DirContextDnsResolver.html). The DNS resolver is based on Java's com.sun.jndi.dns.DnsContextFactory.

Configure [reconnectDelay](https://lettuce.io/core/release/api/io/lettuce/core/resource/DefaultClientResources.Builder.html#reconnectDelay-io.lettuce.core.resource.Delay-) with exponential backoff and full jitter. Lettuce has built-in retry mechanisms based on exponential backoff strategies. For details, see [Exponential Backoff and Jitter](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter) on the AWS Architecture Blog. For more information about the importance of having a retry backoff strategy, see the backoff logic sections of the [Best practices blog post](https://aws.amazon.com/blogs/database/best-practices-redis-clients-and-amazon-elasticache-for-redis/) on the AWS Database Blog.

```
ClientResources clientResources = DefaultClientResources.builder()
   .addressResolverGroup(new DirContextDnsResolver())
    .reconnectDelay(
        Delay.fullJitter(
            Duration.ofMillis(100),     // minimum 100 millisecond delay
            Duration.ofSeconds(10),      // maximum 10 second delay
            100, TimeUnit.MILLISECONDS)) // 100 millisecond base
    .build();
```

**Timeouts**

Use a lower connect timeout value than your command timeout. Lettuce uses lazy connection establishment. So if the connect timeout is higher than the command timeout, you can have a period of persistent failure after a topology refresh if Lettuce tries to connect to an unhealthy node and the command timeout is always exceeded. 

Use a dynamic command timeout for different commands. We recommend that you set the command timeout based on the command's expected duration. For example, use a longer timeout for commands that iterate over several keys, such as FLUSHDB, FLUSHALL, KEYS, SMEMBERS, or Lua scripts. Use shorter timeouts for single-key commands, such as SET, GET, and HSET.

**Note**  
Timeouts that are configured in the following example are for tests that ran SET/GET commands with keys and values up to 20 bytes long. The processing time can be longer when the commands are complex or the keys and values are larger. You should set the timeouts based on the use case of your application. 

```
private static final Duration META_COMMAND_TIMEOUT = Duration.ofMillis(1000);
private static final Duration DEFAULT_COMMAND_TIMEOUT = Duration.ofMillis(250);
// Socket connect timeout should be lower than command timeout for Lettuce
private static final Duration CONNECT_TIMEOUT = Duration.ofMillis(100);
    
SocketOptions socketOptions = SocketOptions.builder()
    .connectTimeout(CONNECT_TIMEOUT)
    .build();
 

class DynamicClusterTimeout extends TimeoutSource {
     private static final Set<ProtocolKeyword> META_COMMAND_TYPES = ImmutableSet.<ProtocolKeyword>builder()
          .add(CommandType.FLUSHDB)
          .add(CommandType.FLUSHALL)
          .add(CommandType.CLUSTER)
          .add(CommandType.INFO)
          .add(CommandType.KEYS)
          .build();

    private final Duration defaultCommandTimeout;
    private final Duration metaCommandTimeout;

    DynamicClusterTimeout(Duration defaultTimeout, Duration metaTimeout)
    {
        defaultCommandTimeout = defaultTimeout;
        metaCommandTimeout = metaTimeout;
    }

    @Override
    public long getTimeout(RedisCommand<?, ?, ?> command) {
        if (META_COMMAND_TYPES.contains(command.getType())) {
            return metaCommandTimeout.toMillis();
        }
        return defaultCommandTimeout.toMillis();
    }
}

// Use a dynamic timeout for commands, to avoid timeouts during
// cluster management and slow operations.
TimeoutOptions timeoutOptions = TimeoutOptions.builder()
    .timeoutSource(
        new DynamicClusterTimeout(DEFAULT_COMMAND_TIMEOUT, META_COMMAND_TIMEOUT))
    .build();
```

# Example: Lettuce config for cluster mode, TLS enabled
<a name="BestPractices.Clients-lettuce-cme"></a>

**Note**  
Timeouts in the following example are for tests that ran SET/GET commands with keys and values up to 20 bytes long. The processing time can be longer when the commands are complex or the keys and values are larger. You should set the timeouts based on the use case of your application. 

```
// Set DNS cache TTL
public void setJVMProperties() {
    java.security.Security.setProperty("networkaddress.cache.ttl", "10");
}

private static final Duration META_COMMAND_TIMEOUT = Duration.ofMillis(1000);
private static final Duration DEFAULT_COMMAND_TIMEOUT = Duration.ofMillis(250);
// Socket connect timeout should be lower than command timeout for Lettuce
private static final Duration CONNECT_TIMEOUT = Duration.ofMillis(100);

// Create RedisURI from the cluster configuration endpoint
String clusterConfigurationEndpoint = "<cluster-configuration-endpoint>"; // TODO: add your cluster configuration endpoint
final RedisURI redisUriCluster =
    RedisURI.Builder.redis(clusterConfigurationEndpoint)
        .withPort(6379)
        .withSsl(true)
        .build();

// Configure the client's resources                
ClientResources clientResources = DefaultClientResources.builder()
    .reconnectDelay(
        Delay.fullJitter(
            Duration.ofMillis(100),     // minimum 100 millisecond delay
            Duration.ofSeconds(10),      // maximum 10 second delay
            100, TimeUnit.MILLISECONDS)) // 100 millisecond base
    .addressResolverGroup(new DirContextDnsResolver())
    .build(); 

// Create a cluster client instance with the URI and resources
RedisClusterClient redisClusterClient = 
    RedisClusterClient.create(clientResources, redisUriCluster);

// Use a dynamic timeout for commands, to avoid timeouts during
// cluster management and slow operations.
class DynamicClusterTimeout extends TimeoutSource {
     private static final Set<ProtocolKeyword> META_COMMAND_TYPES = ImmutableSet.<ProtocolKeyword>builder()
          .add(CommandType.FLUSHDB)
          .add(CommandType.FLUSHALL)
          .add(CommandType.CLUSTER)
          .add(CommandType.INFO)
          .add(CommandType.KEYS)
          .build();

    private final Duration metaCommandTimeout;
    private final Duration defaultCommandTimeout;

    DynamicClusterTimeout(Duration defaultTimeout, Duration metaTimeout)
    {
        defaultCommandTimeout = defaultTimeout;
        metaCommandTimeout = metaTimeout;
    }

    @Override
    public long getTimeout(RedisCommand<?, ?, ?> command) {
        if (META_COMMAND_TYPES.contains(command.getType())) {
            return metaCommandTimeout.toMillis();
        }
        return defaultCommandTimeout.toMillis();
    }
}

TimeoutOptions timeoutOptions = TimeoutOptions.builder()
    .timeoutSource(new DynamicClusterTimeout(DEFAULT_COMMAND_TIMEOUT, META_COMMAND_TIMEOUT))
     .build();

// Configure the topology refreshment options
final ClusterTopologyRefreshOptions topologyOptions = 
    ClusterTopologyRefreshOptions.builder()
    .enableAllAdaptiveRefreshTriggers()
    .enablePeriodicRefresh()
    .dynamicRefreshSources(true)
    .build();

// Configure the socket options
final SocketOptions socketOptions = 
    SocketOptions.builder()
    .connectTimeout(CONNECT_TIMEOUT) 
    .keepAlive(true)
    .build();

// Configure the client's options
final ClusterClientOptions clusterClientOptions = 
    ClusterClientOptions.builder()
    .topologyRefreshOptions(topologyOptions)
    .socketOptions(socketOptions)
    .autoReconnect(true)
    .timeoutOptions(timeoutOptions) 
    .nodeFilter(it -> 
        ! (it.is(RedisClusterNode.NodeFlag.FAIL) 
        || it.is(RedisClusterNode.NodeFlag.EVENTUAL_FAIL) 
        || it.is(RedisClusterNode.NodeFlag.NOADDR))) 
    .validateClusterNodeMembership(false)
    .build();
    
redisClusterClient.setOptions(clusterClientOptions);

// Get a connection
final StatefulRedisClusterConnection<String, String> connection = 
    redisClusterClient.connect();

// Get cluster sync/async commands   
RedisAdvancedClusterCommands<String, String> sync = connection.sync();
RedisAdvancedClusterAsyncCommands<String, String> async = connection.async();
```

# Example: Lettuce config for cluster mode disabled, TLS enabled
<a name="BestPractices.Clients-lettuce-cmd"></a>

**Note**  
Timeouts in the following example are for tests that ran SET/GET commands with keys and values up to 20 bytes long. The processing time can be longer when the commands are complex or the keys and values are larger. You should set the timeouts based on the use case of your application. 

```
// Set DNS cache TTL
public void setJVMProperties() {
    java.security.Security.setProperty("networkaddress.cache.ttl", "10");
}

private static final Duration META_COMMAND_TIMEOUT = Duration.ofMillis(1000);
private static final Duration DEFAULT_COMMAND_TIMEOUT = Duration.ofMillis(250);
// Socket connect timeout should be lower than command timeout for Lettuce
private static final Duration CONNECT_TIMEOUT = Duration.ofMillis(100);

// Create RedisURI from the primary/reader endpoint
String clusterEndpoint = "<primary/reader-endpoint>"; // TODO: replace with your primary/reader endpoint
RedisURI redisUriStandalone =
    RedisURI.Builder.redis(clusterEndpoint).withPort(6379).withSsl(true).withDatabase(0).build();

ClientResources clientResources =
    DefaultClientResources.builder()
        .addressResolverGroup(new DirContextDnsResolver())
        .reconnectDelay(
            Delay.fullJitter(
                Duration.ofMillis(100), // minimum 100 millisecond delay
                Duration.ofSeconds(10), // maximum 10 second delay
                100,
                TimeUnit.MILLISECONDS)) // 100 millisecond base
        .build();

// Use a dynamic timeout for commands, to avoid timeouts during
// slow operations.
class DynamicTimeout extends TimeoutSource {
    private static final Set<ProtocolKeyword> META_COMMAND_TYPES = ImmutableSet.<ProtocolKeyword>builder()
        .add(CommandType.FLUSHDB)
        .add(CommandType.FLUSHALL)
        .add(CommandType.INFO)
        .add(CommandType.KEYS)
        .build();

    private final Duration metaCommandTimeout;
    private final Duration defaultCommandTimeout;

    DynamicTimeout(Duration defaultTimeout, Duration metaTimeout)
    {
        defaultCommandTimeout = defaultTimeout;
        metaCommandTimeout = metaTimeout;
    }

    @Override
    public long getTimeout(RedisCommand<?, ?, ?> command) {
        if (META_COMMAND_TYPES.contains(command.getType())) {
            return metaCommandTimeout.toMillis();
        }
        return defaultCommandTimeout.toMillis();
    }
}

TimeoutOptions timeoutOptions = TimeoutOptions.builder()
    .timeoutSource(new DynamicTimeout(DEFAULT_COMMAND_TIMEOUT, META_COMMAND_TIMEOUT))
    .build();
                                    
final SocketOptions socketOptions =
    SocketOptions.builder().connectTimeout(CONNECT_TIMEOUT).keepAlive(true).build();

ClientOptions clientOptions =
    ClientOptions.builder().timeoutOptions(timeoutOptions).socketOptions(socketOptions).build();

RedisClient redisClient = RedisClient.create(clientResources, redisUriStandalone);
redisClient.setOptions(clientOptions);
```

## Configuring a preferred protocol for dual stack clusters (Valkey and Redis OSS)
<a name="network-type-configuring-dual-stack-redis"></a>

For cluster mode enabled Valkey or Redis OSS clusters, you can control the protocol that clients use to connect to the nodes in the cluster with the IP Discovery parameter. The IP Discovery parameter can be set to either IPv4 or IPv6. 
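For example, you can switch the parameter on a node-based replication group with the AWS CLI's `--ip-discovery` option, which accepts `ipv4` or `ipv6`. The replication group ID below is a placeholder; substitute your own.

```
# Switch IP discovery to IPv6 on an existing replication group.
# "my-redis-repl-grp" is a placeholder ID.
aws elasticache modify-replication-group \
   --replication-group-id my-redis-repl-grp \
   --ip-discovery ipv6 \
   --apply-immediately
```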

For Valkey or Redis OSS clusters, the IP discovery parameter sets the IP protocol used in the [cluster slots](https://valkey.io/commands/cluster-slots/), [cluster shards](https://valkey.io/commands/cluster-shards/), and [cluster nodes](https://valkey.io/commands/cluster-nodes/) output. Clients use these commands to discover the cluster topology, and they use the IPs in the command output to connect to the other nodes in the cluster. 

Changing the IP Discovery will not result in any downtime for connected clients. However, the changes will take some time to propagate. To determine when the changes have completely propagated for a Valkey or Redis OSS cluster, monitor the output of `cluster slots`. Once all of the nodes returned by the `cluster slots` command report IPs with the new protocol, the changes have finished propagating. 

Example with Redis-Py:

```
import time
from ipaddress import ip_address, IPv4Address, IPv6Address

from redis.cluster import RedisCluster

cluster = RedisCluster(host="xxxx", port=6379)
target_type = IPv6Address  # Or IPv4Address if changing to IPv4

nodes = set()
while len(nodes) == 0 or not all((type(ip_address(host)) is target_type) for host in nodes):
    nodes = set()

    # This refreshes the cluster topology and will discover any node updates.
    # Under the hood it calls cluster slots
    cluster.nodes_manager.initialize()
    for node in cluster.get_nodes():
        nodes.add(node.host)
    print(nodes)

    time.sleep(1)
```

Example with Lettuce:

```
RedisClusterClient clusterClient = RedisClusterClient.create(RedisURI.create("xxxx", 6379));

Class<?> targetProtocolType = Inet6Address.class; // Or Inet4Address.class if you're switching to IPv4

boolean allMatch;
do {
    // Check for any changes in the cluster topology.
    // Under the hood this calls cluster slots
    clusterClient.refreshPartitions();
    final Set<String> nodes = new HashSet<>();

    for (RedisClusterNode node : clusterClient.getPartitions().getPartitions()) {
        nodes.add(node.getUri().getHost());
    }

    Thread.sleep(1000);

    allMatch = !nodes.isEmpty() && nodes.stream().allMatch(node -> {
        try {
            return targetProtocolType.isInstance(InetAddress.getByName(node));
        } catch (UnknownHostException ignored) {
            return false;
        }
    });
} while (!allMatch);
```

# Best practices for clients (Memcached)
<a name="BestPractices.Clients.memcached"></a>

Learn best practices for common scenarios with ElastiCache for Memcached clusters.

**Topics**
+ [Configuring your ElastiCache client for efficient load balancing (Memcached)](BestPractices.LoadBalancing.md)
+ [Validated clients with Memcached](network-type-validated-clients-memcached.md)
+ [Configuring a preferred protocol for dual stack clusters (Memcached)](network-type-configuring-dual-stack-memcached.md)

# Configuring your ElastiCache client for efficient load balancing (Memcached)
<a name="BestPractices.LoadBalancing"></a>

**Note**  
This section applies to node-based multi-node Memcached clusters.

To effectively use multiple ElastiCache Memcached nodes, you need to be able to spread your cache keys across the nodes. A simple way to load balance a cluster with *n* nodes is to calculate the hash of the object's key and mod the result by *n*: `hash(key) mod n`. The resulting value (0 through *n*–1) is the number of the node where you place the object. 

This approach is simple and works well as long as the number of nodes (*n*) is constant. However, whenever you add or remove a node from the cluster, the number of keys that need to be moved is *(n - 1) / n* (where *n* is the new number of nodes). Thus, this approach results in a large number of keys being moved, which translates to a large number of initial cache misses, especially as the number of nodes gets large. In the best case, scaling from 1 to 2 nodes results in (2 - 1)/2 (50 percent) of the keys being moved. Scaling from 9 to 10 nodes results in (10 - 1)/10 (90 percent) of the keys being moved. If you're scaling up due to a spike in traffic, you don't want to have a large number of cache misses. A large number of cache misses results in hits to the database, which is already overloaded due to the spike in traffic.

The solution to this dilemma is consistent hashing. Consistent hashing uses an algorithm such that whenever a node is added or removed from a cluster, the number of keys that must be moved is roughly *1 / n* (where *n* is the new number of nodes). Scaling from 1 to 2 nodes results in 1/2 (50 percent) of the keys being moved, the worst case. Scaling from 9 to 10 nodes results in 1/10 (10 percent) of the keys being moved.
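The modulo arithmetic above is easy to verify with a short sketch. The following plain-Python snippet uses MD5 as a stand-in hash (real clients use their own hash functions) and measures how many keys change nodes when scaling from 9 to 10 nodes under `hash(key) mod n` placement:

```
import hashlib

def node_for(key: str, n: int) -> int:
    # Stable stand-in hash; real Memcached clients use their own hash function.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key-{i}" for i in range(10_000)]

# Fraction of keys that map to a different node after scaling 9 -> 10 nodes
moved = sum(1 for k in keys if node_for(k, 9) != node_for(k, 10))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 90 percent
```

With a consistent-hashing placement, the same scaling event would move only about 1/10 of the keys.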

As the user, you control which hashing algorithm is used for multi-node clusters. We recommend that you configure your clients to use consistent hashing. Fortunately, there are many Memcached client libraries in most popular languages that implement consistent hashing. Check the documentation for the library you are using to see if it supports consistent hashing and how to implement it.

If you are working in Java, PHP, or .NET, we recommend you use one of the Amazon ElastiCache client libraries.

## Consistent Hashing Using Java
<a name="BestPractices.LoadBalancing.Java"></a>

The ElastiCache Memcached Java client is based on the open-source spymemcached Java client, which has consistent hashing capabilities built in. The library includes a KetamaConnectionFactory class that implements consistent hashing. By default, consistent hashing is turned off in spymemcached.

For more information, see the KetamaConnectionFactory documentation at [KetamaConnectionFactory](https://github.com/RTBHOUSE/spymemcached/blob/master/src/main/java/net/spy/memcached/KetamaConnectionFactory.java).
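As a sketch, enabling consistent hashing in spymemcached means constructing the client with `KetamaConnectionFactory`. This assumes the spymemcached library is on your classpath, and the endpoint below is a placeholder:

```
import java.io.IOException;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.KetamaConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class KetamaExample {
    public static void main(String[] args) throws IOException {
        // KetamaConnectionFactory enables consistent (Ketama) hashing,
        // which is off by default in spymemcached.
        MemcachedClient client = new MemcachedClient(
            new KetamaConnectionFactory(),
            AddrUtil.getAddresses("<your-cluster-endpoint>:11211")); // placeholder

        client.set("greeting", 3600, "hello");
        client.shutdown();
    }
}
```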

## Consistent hashing using PHP with Memcached
<a name="BestPractices.LoadBalancing.PHP"></a>

The ElastiCache Memcached PHP client is a wrapper around the built-in Memcached PHP library. By default, consistent hashing is turned off by the Memcached PHP library.

Use the following code to turn on consistent hashing.

```
$m = new Memcached();
$m->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
```

In addition to the preceding code, we recommend that you also turn `memcached.sess_consistent_hash` on in your php.ini file.

 For more information, see the run-time configuration documentation for Memcached PHP at [http://php.net/manual/en/memcached.configuration.php](http://php.net/manual/en/memcached.configuration.php). Note specifically the `memcached.sess_consistent_hash` parameter.

## Consistent hashing using .NET with Memcached
<a name="BestPractices.LoadBalancing.dotNET"></a>

The ElastiCache Memcached .NET client is a wrapper around Enyim Memcached. By default, consistent hashing is turned on by the Enyim Memcached client.

 For more information, see the `memcached/locator` documentation at [https://github.com/enyim/EnyimMemcached/wiki/MemcachedClient-Configuration#user-content-memcachedlocator](https://github.com/enyim/EnyimMemcached/wiki/MemcachedClient-Configuration#user-content-memcachedlocator).

# Validated clients with Memcached
<a name="network-type-validated-clients-memcached"></a>

The following clients have specifically been validated to work with all supported network type configurations for Memcached.

Validated Clients:
+ [AWS ElastiCache Cluster Client Memcached for PHP](https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-php) – [Version 3.6.2](https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-php/tree/v3.2.0)
+ [AWS ElastiCache Cluster Client Memcached for Java](https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-java) – Latest master on GitHub

# Configuring a preferred protocol for dual stack clusters (Memcached)
<a name="network-type-configuring-dual-stack-memcached"></a>

For Memcached clusters, you can control the protocol that clients use to connect to the nodes in the cluster with the IP Discovery parameter. The IP Discovery parameter can be set to either IPv4 or IPv6. 

The IP discovery parameter controls the IP protocol used in the `config get cluster` output, which in turn determines the IP protocol used by clients that support auto-discovery for ElastiCache for Memcached clusters.

Changing the IP Discovery will not result in any downtime for connected clients. However, the changes will take some time to propagate. 

Monitor the output of `getAvailableNodeEndPoints` for Java, and for PHP monitor the output of `getServerList`. Once the output of these functions reports IPs that use the updated protocol for all of the nodes in the cluster, the changes have finished propagating.

Java Example:

```
MemcachedClient client = new MemcachedClient(new InetSocketAddress("xxxx", 11211));

Class<?> targetProtocolType = Inet6Address.class; // Or Inet4Address.class if you're switching to IPv4

boolean allMatch;
do {
    final Set<String> nodes = client.getAvailableNodeEndPoints().stream()
        .map(NodeEndPoint::getIpAddress)
        .collect(Collectors.toSet());

    Thread.sleep(1000);

    allMatch = !nodes.isEmpty() && nodes.stream().allMatch(node -> {
        try {
            return targetProtocolType.isInstance(InetAddress.getByName(node));
        } catch (UnknownHostException ignored) {
            return false;
        }
    });
} while (!allMatch);
```

PHP Example:

```
$client = new Memcached();
$client->setOption(Memcached::OPT_CLIENT_MODE, Memcached::DYNAMIC_CLIENT_MODE);
$client->addServer("xxxx", 11211);

$nodes = [];
$target_ips_count = 0;
do {
    // The PHP memcached client only updates the server list if the polling
    // interval has expired and a command is sent
    $client->get('test');

    $nodes = $client->getServerList();

    sleep(1);

    // For IPv4 use FILTER_FLAG_IPV4
    $target_ips_count = count(array_filter($nodes, function($node) {
        return filter_var($node["ipaddress"], FILTER_VALIDATE_IP, FILTER_FLAG_IPV6);
    }));
} while (count($nodes) === 0 || count($nodes) !== $target_ips_count);
```

Any existing client connections that were created before the IP Discovery was updated remain connected using the old protocol. The validated clients automatically reconnect to the cluster using the new IP protocol once they detect the change in the output of the cluster discovery commands, but the exact timing and behavior depend on the client implementation.

## TLS enabled dual stack ElastiCache clusters
<a name="network-type-configuring-tls-enabled-dual-stack"></a>

When TLS is enabled for ElastiCache clusters, the cluster discovery functions (`cluster slots`, `cluster shards`, and `cluster nodes` for Valkey and Redis OSS, or `config get cluster` for Memcached) return hostnames instead of IPs. The hostnames are then used instead of IPs to connect to the ElastiCache cluster and perform a TLS handshake. This means that clients aren't affected by the IP Discovery parameter. For TLS-enabled clusters, the IP Discovery parameter has no effect on the preferred IP protocol. Instead, the IP protocol used is determined by which IP protocol the client prefers when resolving DNS hostnames.

**Java clients**

When connecting from a Java environment that supports both IPv4 and IPv6, Java by default prefers IPv4 over IPv6 for backwards compatibility. However, the IP protocol preference is configurable through JVM arguments. To prefer IPv4, pass `-Djava.net.preferIPv4Stack=true`; to prefer IPv6, pass `-Djava.net.preferIPv6Addresses=true`. Note that setting `-Djava.net.preferIPv4Stack=true` means that the JVM no longer makes any IPv6 connections at all, including connections to applications other than Valkey or Redis OSS.

**Host Level Preferences**

In general, if the client or client runtime doesn't provide configuration options for setting an IP protocol preference, the IP protocol used during DNS resolution depends on the host's configuration. By default, most hosts prefer IPv6 over IPv4, but this preference can be configured at the host level. It affects all DNS requests from that host, not just those to ElastiCache clusters.

**Linux hosts**

For Linux, an IP protocol preference can be configured by modifying the `/etc/gai.conf` file. If there is no `gai.conf` on the host, an example should be available under `/usr/share/doc/glibc-common-x.xx/gai.conf`; copy it to `/etc/gai.conf` and uncomment the default configuration. To prefer IPv4 when connecting to an ElastiCache cluster, raise the precedence for the CIDR range encompassing the cluster IPs above the precedence for default IPv6 connections, which is 40 by default. For example, assuming the cluster is located in a subnet with the CIDR 172.31.0.0/16, the configuration below causes clients to prefer IPv4 connections to that cluster.

```
label ::1/128       0
label ::/0          1
label 2002::/16     2
label ::/96         3
label ::ffff:0:0/96 4
label fec0::/10     5
label fc00::/7      6
label 2001:0::/32   7
label ::ffff:172.31.0.0/112 8
#
#    This default differs from the tables given in RFC 3484 by handling
#    (now obsolete) site-local IPv6 addresses and Unique Local Addresses.
#    The reason for this difference is that these addresses are never
#    NATed while IPv4 site-local addresses most probably are.  Given
#    the precedence of IPv6 over IPv4 (see below) on machines having only
#    site-local IPv4 and IPv6 addresses a lookup for a global address would
#    see the IPv6 be preferred.  The result is a long delay because the
#    site-local IPv6 addresses cannot be used while the IPv4 address is
#    (at least for the foreseeable future) NATed.  We also treat Teredo
#    tunnels special.
#
# precedence  <mask>   <value>
#    Add another rule to the RFC 3484 precedence table.  See section 2.1
#    and 10.3 in RFC 3484.  The default is:
#
precedence  ::1/128       50
precedence  ::/0          40
precedence  2002::/16     30
precedence ::/96          20
precedence ::ffff:0:0/96  10
precedence ::ffff:172.31.0.0/112 100
```

More details on `gai.conf` are available on the [Linux man page](https://man7.org/linux/man-pages/man5/gai.conf.5.html).

**Windows hosts**

The process for Windows hosts is similar. For Windows hosts you can run `netsh interface ipv6 set prefix CIDR_CONTAINING_CLUSTER_IPS PRECEDENCE LABEL`. This has the same effect as modifying the `gai.conf` file on Linux hosts.

This will update the preference policies to prefer IPv4 connections over IPv6 connections for the specified CIDR range. For example, assuming that the cluster is in a subnet with the 172.31.0.0/16 CIDR, executing `netsh interface ipv6 set prefix ::ffff:172.31.0.0/112 100 15` would result in the following precedence table, which causes clients to prefer IPv4 when connecting to the cluster. 

```
C:\Users\Administrator>netsh interface ipv6 show prefixpolicies
Querying active state...

Precedence Label Prefix
---------- ----- --------------------------------
100 15 ::ffff:172.31.0.0/112
20 4 ::ffff:0:0/96
50 0 ::1/128
40 1 ::/0
30 2 2002::/16
5 5 2001::/32
3 13 fc00::/7
1 11 fec0::/10
1 12 3ffe::/16
1 3 ::/96
```

# Managing reserved memory for Valkey and Redis OSS
<a name="redis-memory-management"></a>

Reserved memory is memory set aside for nondata use. When performing a backup or failover, Valkey and Redis OSS use available memory to record write operations to your cluster while the cluster's data is being written to the .rdb file. If you don't have sufficient memory available for all the writes, the process fails. Following, you can find information on options for managing reserved memory for ElastiCache for Redis OSS and how to apply those options.

**Topics**
+ [How Much Reserved Memory Do You Need?](#redis-memory-management-need)
+ [Parameters to Manage Reserved Memory](#redis-memory-management-parameters)
+ [Specifying Your Reserved Memory Management Parameter](#redis-reserved-memory-management-change)

## How Much Reserved Memory Do You Need?
<a name="redis-memory-management-need"></a>

If you are running a version of Redis OSS before 2.8.22, reserve more memory for backups and failovers than if you are running Redis OSS 2.8.22 or later. This requirement is due to the different ways that ElastiCache for Redis OSS implements the backup process. The rule of thumb is to reserve half of a node type's `maxmemory` value for Redis OSS overhead for versions before 2.8.22, and one-fourth for Redis OSS versions 2.8.22 and later. 

Due to the different ways that ElastiCache implements the backup and replication process, the rule of thumb is to reserve 25 percent of a node type's `maxmemory` value by using the `reserved-memory-percent` parameter. This is the default value and is recommended for most cases.

When burstable micro and small instance types are operating near the `maxmemory` limits, they may experience swap usage. To improve the operational reliability on these instance types during backup, replication and high traffic, we recommend increasing the value of the `reserved-memory-percent` parameter up to 30% on small instance types, and up to 50% on micro instance types.

For write-heavy workloads on ElastiCache clusters with data tiering, we recommend increasing the `reserved-memory-percent` to up to 50% of the node's available memory.
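To put numbers on these percentages, the following plain-Python sketch computes the reserved and usable memory at several `reserved-memory-percent` settings. The `maxmemory` figure is for a `cache.m3.xlarge`, used here purely as an illustration; look up your own node type's value:

```
maxmemory = 14_260_633_600  # cache.m3.xlarge maxmemory, in bytes

# reserved-memory-percent sets aside a fraction of maxmemory for nondata use
for percent in (25, 30, 50):
    reserved = maxmemory * percent // 100
    usable = maxmemory - reserved
    print(f"{percent}% reserved -> {reserved:,} bytes reserved, {usable:,} bytes for data")
```

At the default 25 percent, this leaves roughly 10.7 GB of the node's memory for data.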

For more information, see the following:
+ [Ensuring you have enough memory to make a Valkey or Redis OSS snapshot](BestPractices.BGSAVE.md)
+ [How synchronization and backup are implemented](Replication.Redis.Versions.md)
+ [Data tiering in ElastiCache](data-tiering.md)

## Parameters to Manage Reserved Memory
<a name="redis-memory-management-parameters"></a>

As of March 16, 2017, Amazon ElastiCache provides two mutually exclusive parameters for managing your Valkey or Redis OSS memory, `reserved-memory` and `reserved-memory-percent`. Neither of these parameters is part of the Valkey or Redis OSS distribution. 

Depending upon when you became an ElastiCache customer, one or the other of these parameters is the default memory management parameter. This parameter applies when you create a new Valkey or Redis OSS cluster or replication group and use a default parameter group. 
+ For customers who started before March 16, 2017 – When you create a Redis OSS cluster or replication group using the default parameter group, your memory management parameter is `reserved-memory`. In this case, zero (0) bytes of memory are reserved. 
+ For customers who started on or after March 16, 2017 – When you create a Valkey or Redis OSS cluster or replication group using the default parameter group, your memory management parameter is `reserved-memory-percent`. In this case, 25 percent of your node's `maxmemory` value is reserved for nondata purposes.

After reading about the two Valkey or Redis OSS memory management parameters, you might prefer to use the one that isn't your default or with nondefault values. If so, you can change to the other reserved memory management parameter. 

To change the value of that parameter, you can create a custom parameter group and modify it to use your preferred memory management parameter and value. You can then use the custom parameter group whenever you create a new Valkey or Redis OSS cluster or replication group. For existing clusters or replication groups, you can modify them to use your custom parameter group.

 For more information, see the following: 
+ [Specifying Your Reserved Memory Management Parameter](#redis-reserved-memory-management-change)
+ [Creating an ElastiCache parameter group](ParameterGroups.Creating.md)
+ [Modifying an ElastiCache parameter group](ParameterGroups.Modifying.md)
+ [Modifying an ElastiCache cluster](Clusters.Modify.md)
+ [Modifying a replication group](Replication.Modify.md)

### The reserved-memory Parameter
<a name="redis-memory-management-parameters-reserved-memory"></a>

Before March 16, 2017, all ElastiCache for Redis OSS reserved memory management was done using the parameter `reserved-memory`. The default value of `reserved-memory` is 0. This default reserves no memory for Valkey or Redis OSS overhead and allows Valkey or Redis OSS to consume all of a node's memory with data. 

Changing `reserved-memory` so you have sufficient memory available for backups and failovers requires you to create a custom parameter group. In this custom parameter group, you set `reserved-memory` to a value appropriate for the Valkey or Redis OSS version running on your cluster and cluster's node type. For more information, see [How Much Reserved Memory Do You Need?](#redis-memory-management-need)

The parameter `reserved-memory` is specific to ElastiCache and isn't part of the general Redis OSS distribution.

The following procedure shows how to use `reserved-memory` to manage the memory on your Valkey or Redis OSS cluster.

**To reserve memory using reserved-memory**

1. Create a custom parameter group specifying the parameter group family matching the engine version you’re running—for example, specifying the `redis2.8` parameter group family. For more information, see [Creating an ElastiCache parameter group](ParameterGroups.Creating.md).

   ```
   aws elasticache create-cache-parameter-group \
      --cache-parameter-group-name redis28-m3xl \
      --description "Redis OSS 2.8.x for m3.xlarge node type" \
      --cache-parameter-group-family redis2.8
   ```

1. Calculate how many bytes of memory to reserve for Valkey or Redis OSS overhead. You can find the value of `maxmemory` for your node type at [Redis OSS node-type specific parameters](ParameterGroups.Engine.md#ParameterGroups.Redis.NodeSpecific).

1. Modify the custom parameter group so that the parameter `reserved-memory` is the number of bytes you calculated in the previous step. The following AWS CLI example assumes you’re running a version of Redis OSS before 2.8.22 and need to reserve half of the node’s `maxmemory`. For more information, see [Modifying an ElastiCache parameter group](ParameterGroups.Modifying.md).

   ```
   aws elasticache modify-cache-parameter-group \
      --cache-parameter-group-name redis28-m3xl \
      --parameter-name-values "ParameterName=reserved-memory, ParameterValue=7130316800"
   ```

   You need a separate custom parameter group for each node type that you use, because each node type has a different `maxmemory` value. Thus, each node type needs a different value for `reserved-memory`.

1. Modify your Redis OSS cluster or replication group to use your custom parameter group.

   The following CLI example modifies the cluster `my-redis-cluster` to use the custom parameter group `redis28-m3xl` beginning immediately. For more information, see [Modifying an ElastiCache cluster](Clusters.Modify.md).

   ```
   aws elasticache modify-cache-cluster \
      --cache-cluster-id my-redis-cluster \
      --cache-parameter-group-name redis28-m3xl \
      --apply-immediately
   ```

   The following CLI example modifies the replication group `my-redis-repl-grp` to use the custom parameter group `redis28-m3xl` beginning immediately. For more information, see [Modifying a replication group](Replication.Modify.md).

   ```
   aws elasticache modify-replication-group \
      --replication-group-id my-redis-repl-grp \
      --cache-parameter-group-name redis28-m3xl \
      --apply-immediately
   ```

### The reserved-memory-percent parameter
<a name="redis-memory-management-parameters-reserved-memory-percent"></a>

On March 16, 2017, Amazon ElastiCache introduced the parameter `reserved-memory-percent` and made it available on all versions of ElastiCache for Redis OSS. The purpose of `reserved-memory-percent` is to simplify reserved memory management across all your clusters. It does so by enabling you to have a single parameter group for each parameter group family (such as `redis2.8`) to manage your clusters' reserved memory, regardless of node type. The default value for `reserved-memory-percent` is 25 (25 percent).

The parameter `reserved-memory-percent` is specific to ElastiCache and isn't part of the general Redis OSS distribution.

If your cluster is using a node type from the r6gd family and your memory usage reaches 75 percent, data-tiering will automatically be triggered. For more information, see [Data tiering in ElastiCache](data-tiering.md).

**To reserve memory using reserved-memory-percent**  
To use `reserved-memory-percent` to manage the memory on your ElastiCache for Redis OSS cluster, do one of the following:
+ If you are running Redis OSS 2.8.22 or later, assign the default parameter group to your cluster. The default 25 percent should be adequate. If not, take the steps described following to change the value.
+ If you are running a version of Redis OSS before 2.8.22, you probably need to reserve more memory than `reserved-memory-percent`'s default 25 percent. To do so, use the following procedure. 

**To change the percent value of reserved-memory-percent**

1. Create a custom parameter group specifying the parameter group family matching the engine version you’re running—for example, specifying the `redis2.8` parameter group family. A custom parameter group is necessary because you can't modify a default parameter group. For more information, see [Creating an ElastiCache parameter group](ParameterGroups.Creating.md).

   ```
   aws elasticache create-cache-parameter-group \
      --cache-parameter-group-name redis28-50 \
      --description "Redis OSS 2.8.x 50% reserved" \
      --cache-parameter-group-family redis2.8
   ```

   Because `reserved-memory-percent` reserves memory as a percent of a node’s `maxmemory`, you don't need a custom parameter group for each node type.

1. Modify the custom parameter group so that `reserved-memory-percent` is 50 (50 percent). For more information, see [Modifying an ElastiCache parameter group](ParameterGroups.Modifying.md).

   ```
   aws elasticache modify-cache-parameter-group \
      --cache-parameter-group-name redis28-50 \
      --parameter-name-values "ParameterName=reserved-memory-percent, ParameterValue=50"
   ```

1. Use this custom parameter group for any Redis OSS clusters or replication groups running a version of Redis OSS older than 2.8.22.

   The following CLI example modifies the Redis OSS cluster `my-redis-cluster` to use the custom parameter group `redis28-50` beginning immediately. For more information, see [Modifying an ElastiCache cluster](Clusters.Modify.md).

   ```
   aws elasticache modify-cache-cluster \
      --cache-cluster-id my-redis-cluster \
      --cache-parameter-group-name redis28-50 \
      --apply-immediately
   ```

   The following CLI example modifies the Redis OSS replication group `my-redis-repl-grp` to use the custom parameter group `redis28-50` beginning immediately. For more information, see [Modifying a replication group](Replication.Modify.md).

   ```
   aws elasticache modify-replication-group \
      --replication-group-id my-redis-repl-grp \
      --cache-parameter-group-name redis28-50 \
      --apply-immediately
   ```

## Specifying Your Reserved Memory Management Parameter
<a name="redis-reserved-memory-management-change"></a>

If you were a current ElastiCache customer on March 16, 2017, your default reserved memory management parameter is `reserved-memory` with zero (0) bytes of reserved memory. If you became an ElastiCache customer after March 16, 2017, your default reserved memory management parameter is `reserved-memory-percent` with 25 percent of the node's memory reserved. This is true no matter when you created your ElastiCache for Redis OSS cluster or replication group. However, you can change your reserved memory management parameter using either the AWS CLI or ElastiCache API.

The parameters `reserved-memory` and `reserved-memory-percent` are mutually exclusive. A parameter group always has one but never both. You can change which parameter a parameter group uses for reserved memory management by modifying the parameter group. The parameter group must be a custom parameter group, because you can't modify default parameter groups. For more information, see [Creating an ElastiCache parameter group](ParameterGroups.Creating.md).

**To specify reserved-memory-percent**  
To use `reserved-memory-percent` as your reserved memory management parameter, modify a custom parameter group using the `modify-cache-parameter-group` command. Use the `parameter-name-values` parameter to specify `reserved-memory-percent` and a value for it.

The following CLI example modifies the custom parameter group `redis32-cluster-on` so that it uses `reserved-memory-percent` to manage reserved memory. A value must be assigned to `ParameterValue` for the parameter group to use the `ParameterName` parameter for reserved memory management. For more information, see [Modifying an ElastiCache parameter group](ParameterGroups.Modifying.md).

```
aws elasticache modify-cache-parameter-group \
   --cache-parameter-group-name redis32-cluster-on \
   --parameter-name-values "ParameterName=reserved-memory-percent, ParameterValue=25"
```

**To specify reserved-memory**  
To use `reserved-memory` as your reserved memory management parameter, modify a custom parameter group using the `modify-cache-parameter-group` command. Use the `parameter-name-values` parameter to specify `reserved-memory` and a value for it.

The following CLI example modifies the custom parameter group `redis32-m3xl` so that it uses `reserved-memory` to manage reserved memory. A value must be assigned to `ParameterValue` for the parameter group to use the `ParameterName` parameter for reserved memory management. Because the engine version is newer than 2.8.22, we set the value to `3565158400`, which is 25 percent of a `cache.m3.xlarge`'s `maxmemory`. For more information, see [Modifying an ElastiCache parameter group](ParameterGroups.Modifying.md).

```
aws elasticache modify-cache-parameter-group \
   --cache-parameter-group-name redis32-m3xl \
   --parameter-name-values "ParameterName=reserved-memory, ParameterValue=3565158400"
```
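
The 25 percent calculation above can be checked with a few lines of Python. This is a sketch only; the `maxmemory` of 14,260,633,600 bytes is assumed from the `cache.m3.xlarge` row of the node-type specific parameters table, so substitute the value for your own node type:

```
# Compute a reserved-memory value (in bytes) as a percentage of a node's maxmemory.
def reserved_memory_bytes(maxmemory_bytes, percent):
    return int(maxmemory_bytes * percent / 100)

# Assumed maxmemory for cache.m3.xlarge, from the node-type specific parameters table.
m3_xlarge_maxmemory = 14_260_633_600

print(reserved_memory_bytes(m3_xlarge_maxmemory, 25))  # 3565158400, the value used above
```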

# Best practices when working with Valkey and Redis OSS node-based clusters
<a name="BestPractices.SelfDesigned"></a>

Multi-AZ use, having sufficient memory, cluster resizing, and minimizing downtime are all useful concepts to keep in mind when working with node-based clusters in Valkey or Redis OSS. We recommend that you review and follow these best practices.

**Topics**
+ [Minimizing downtime with Multi-AZ](multi-az.md)
+ [Ensuring you have enough memory to make a Valkey or Redis OSS snapshot](BestPractices.BGSAVE.md)
+ [Online cluster resizing](best-practices-online-resharding.md)
+ [Minimizing downtime during maintenance](BestPractices.MinimizeDowntime.md)

# Minimizing downtime with Multi-AZ
<a name="multi-az"></a>

There are a number of instances where ElastiCache Valkey or Redis OSS may need to replace a primary node; these include certain types of planned maintenance and the unlikely event of a primary node or Availability Zone failure.

This replacement results in some downtime for the cluster, but if Multi-AZ is enabled, the downtime is minimized. The role of primary node will automatically fail over to one of the read replicas. There is no need to create and provision a new primary node, because ElastiCache will handle this transparently. This failover and replica promotion ensure that you can resume writing to the new primary as soon as promotion is complete.

See [Minimizing downtime in ElastiCache by using Multi-AZ with Valkey and Redis OSS](AutoFailover.md), to learn more about Multi-AZ and minimizing downtime.

# Ensuring you have enough memory to make a Valkey or Redis OSS snapshot
<a name="BestPractices.BGSAVE"></a>

**Snapshots and synchronizations in Valkey 7.2 and later, and Redis OSS version 2.8.22 and later**  
Valkey has default support for snapshots and synchronizations. Redis OSS 2.8.22 introduces a forkless save process that allows you to allocate more of your memory to your application's use without incurring increased swap usage during synchronizations and saves. For more information, see [How synchronization and backup are implemented](Replication.Redis.Versions.md).

**Redis OSS snapshots and synchronizations before version 2.8.22**

When you work with ElastiCache for Redis OSS, Redis OSS calls a background write command in a number of cases:
+ When creating a snapshot for a backup.
+ When synchronizing replicas with the primary in a replication group.
+ When enabling the append-only file feature (AOF) for Redis OSS.
+ When promoting a replica to primary (which causes a primary/replica sync).

Whenever Redis OSS executes a background write process, you must have sufficient available memory to accommodate the process overhead. Failure to have sufficient memory available causes the process to fail. Because of this, it is important to choose a node instance type that has sufficient memory when creating your Redis OSS cluster.

## Background Write Process and Memory Usage with Valkey and Redis OSS
<a name="BestPractices.BGSAVE.Process"></a>

Whenever a background write process is called, Valkey and Redis OSS fork their process (remember, these engines are single threaded). The child process persists your data to disk in an .rdb snapshot file, while the parent process continues to service all read and write operations. To ensure that your snapshot is a point-in-time snapshot, all data updates and additions are written to an area of available memory separate from the data area.

As long as you have sufficient memory available to record all write operations while the data is being persisted to disk, you should have no insufficient memory issues. You are likely to experience insufficient memory issues if any of the following are true:
+ Your application performs many write operations, thus requiring a large amount of available memory to accept the new or updated data.
+ You have very little memory available in which to write new or updated data.
+ You have a large dataset that takes a long time to persist to disk, thus requiring a large number of write operations.

The following diagram illustrates memory use when executing a background write process.

![\[Image: Diagram of memory use during a background write.\]](http://docs.aws.amazon.com/AmazonElastiCache/latest/dg/images/ElastiCache-bgsaveMemoryUseage.png)


For information on the impact of doing a backup on performance, see [Performance impact of backups of node-based clusters](backups.md#backups-performance).

For more information on how Valkey and Redis OSS perform snapshots, see [http://valkey.io](http://valkey.io).

For more information on regions and Availability Zones, see [Choosing regions and availability zones for ElastiCache](RegionsAndAZs.md). 

## Avoiding running out of memory when executing a background write
<a name="BestPractices.BGSAVE.memoryFix"></a>

Whenever a background write process such as `BGSAVE` or `BGREWRITEAOF` is called, to keep the process from failing, you must have more memory available than will be consumed by write operations during the process. The worst-case scenario is that during the background write operation every record is updated and some new records are added to the cache. Because of this, we recommend that you set `reserved-memory-percent` to 50 (50 percent) for Redis OSS versions before 2.8.22, or 25 (25 percent) for Valkey and all Redis OSS versions 2.8.22 and later. 

The `maxmemory` value indicates the memory available to you for data and operational overhead. Because you cannot modify the `reserved-memory` parameter in the default parameter group, you must create a custom parameter group for the cluster. The default value for `reserved-memory` is 0, which allows Redis OSS to consume all of *maxmemory* with data, potentially leaving too little memory for other uses, such as a background write process. For `maxmemory` values by node instance type, see [Redis OSS node-type specific parameters](ParameterGroups.Engine.md#ParameterGroups.Redis.NodeSpecific).

You can also use the `reserved-memory` parameter to reduce the amount of memory used on the node.

For more information on Valkey and Redis-specific parameters in ElastiCache, see [Valkey and Redis OSS parameters](ParameterGroups.Engine.md#ParameterGroups.Redis).

For information on creating and modifying parameter groups, see [Creating an ElastiCache parameter group](ParameterGroups.Creating.md) and [Modifying an ElastiCache parameter group](ParameterGroups.Modifying.md).

# Online cluster resizing
<a name="best-practices-online-resharding"></a>

*Resharding* involves adding or removing shards or nodes in your cluster and redistributing key spaces. As a result, several factors affect the resharding operation, such as the load on the cluster, memory utilization, and the overall size of the data. For the best experience, we recommend that you follow overall cluster best practices for uniform workload pattern distribution. In addition, we recommend taking the following steps.

Before initiating resharding, we recommend the following:
+ **Test your application** – Test your application behavior during resharding in a staging environment if possible.
+ **Get early notification for scaling issues** – Resharding is a compute-intensive operation. Because of this, we recommend keeping CPU utilization under 80 percent on multicore instances and less than 50 percent on single core instances during resharding. Monitor ElastiCache for Redis OSS metrics and initiate resharding before your application starts observing scaling issues. Useful metrics to track are `CPUUtilization`, `NetworkBytesIn`, `NetworkBytesOut`, `CurrConnections`, `NewConnections`, `FreeableMemory`, `SwapUsage`, and `BytesUsedForCacheItems`.
+ **Ensure sufficient free memory is available before scaling in** – If you're scaling in, ensure that free memory available on the shards to be retained is at least 1.5 times the memory used on the shards you plan to remove.
+ **Initiate resharding during off-peak hours** – This practice helps to reduce the latency and throughput impact on the client during the resharding operation. It also helps to complete resharding faster as more resources can be used for slot redistribution.
+ **Review client timeout behavior** – Some clients might observe higher latency during online cluster resizing. Configuring your client library with a higher timeout can help by giving the system time to connect even under higher load conditions on server. In some cases, you might open a large number of connections to the server. In these cases, consider adding exponential backoff to reconnect logic. Doing this can help prevent a burst of new connections hitting the server at the same time.
+ **Load your Functions on every shard** – When scaling out your cluster, ElastiCache will automatically replicate the Functions loaded in one of the existing nodes (selected at random) to the new node(s). If your cluster has Valkey 7.2 and above, or Redis OSS 7.0 or above, and your application uses [Functions](https://valkey.io/topics/functions-intro/), we recommend loading all of your functions to all the shards before scaling out so that your cluster does not end up with different functions on different shards.
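
The reconnect advice above can be sketched in Python. This is a minimal illustration, not any particular client library's API; `connect` is a hypothetical callable standing in for your client's connect call:

```
import random
import time

def connect_with_backoff(connect, max_attempts=5, base_delay=0.1, max_delay=5.0):
    # Retry `connect` with exponential backoff and full jitter so that many
    # clients reconnecting at once don't hit the server in lockstep.
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Exponential backoff: base, 2x base, 4x base, ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```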

After resharding, note the following:
+ Scale-in might be partially successful if insufficient memory is available on target shards. If such a result occurs, review available memory and retry the operation, if necessary. The data on the target shards will not be deleted.
+ `FLUSHALL` and `FLUSHDB` commands are not supported inside Lua scripts during a resharding operation. Prior to Redis OSS 6, the `BRPOPLPUSH` command is not supported if it operates on the slot being migrated.

# Minimizing downtime during maintenance
<a name="BestPractices.MinimizeDowntime"></a>

Cluster mode configuration has the best availability during managed or unmanaged operations. We recommend that you use a cluster mode supported client that connects to the cluster discovery endpoint. For cluster mode disabled, we recommend that you use the primary endpoint for all write operations. 

For read activity, applications can also connect to any node in the cluster. Unlike the primary endpoint, node endpoints resolve to specific nodes. If you make a change in your cluster, such as adding or deleting a replica, you must update the node endpoints in your application. This is why, for cluster mode disabled, we recommend that you use the reader endpoint for read activity.

If AutoFailover is enabled in the cluster, the primary node might change. Therefore, the application should confirm the role of the node and update all the read endpoints. Doing this helps ensure that you aren't causing a major load on the primary. With AutoFailover disabled, the role of the node doesn't change. However, the downtime in managed or unmanaged operations is higher as compared to clusters with AutoFailover enabled.

Avoid directing read requests to a single read replica node, because its unavailability could lead to a read outage. Either fall back to reading from the primary, or ensure that you have at least two read replicas to avoid any read interruption during maintenance.

# Caching database query results
<a name="caching-database-query-results"></a>

A common pattern to reduce database query latencies is query caching. Applications implement query caching by querying the cache for results associated with a database query, returning those results if they are cached. If no cached results are found, the query is executed on the database, the results are populated in the cache, and the results are then returned to the application initiating the query. Subsequent database queries will then return cached results as long as they remain in the cache.
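
The flow above is the classic cache-aside pattern. A minimal sketch in Python (illustrative only; `cache` and `db` are hypothetical objects with `get`/`set` and `query` methods, and the cache key is derived by hashing the SQL text together with its parameters):

```
import hashlib
import json

def cached_query(cache, db, sql, params, ttl=300):
    # Derive a stable cache key from the query text and its parameters.
    # (Assumes the parameters are JSON-serializable.)
    key = "query:" + hashlib.sha256(json.dumps([sql, params]).encode()).hexdigest()

    result = cache.get(key)
    if result is not None:
        return result                  # cache hit

    result = db.query(sql, params)     # cache miss: run the query on the database
    cache.set(key, result, ttl)        # populate the cache for subsequent queries
    return result
```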

## When to use query caching
<a name="caching-database-query-results-when-to-use"></a>

Query caching with ElastiCache is most effective for the following workload types:
+ **Read-heavy applications** where the same queries are executed repeatedly with data that changes infrequently.
+ **Expensive queries** such as non-indexed lookups, aggregations, or multi-table joins where query execution times are long.
+ **High-concurrency scenarios** where offloading reads to ElastiCache reduces database CPU pressure and improves overall throughput.

Query caching is not recommended for queries where strong consistency is required, or for queries inside multi-statement transactions that require read-after-write consistency.

## Using the AWS Advanced JDBC Wrapper
<a name="caching-database-query-results-jdbc-wrapper"></a>

If your application is using a JDBC driver to query a relational database, you can implement query caching with the [Remote Query Cache Plugin](https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheRemoteQueryCachePlugin.md) in the [AWS Advanced JDBC Wrapper](https://github.com/aws/aws-advanced-jdbc-wrapper). The plugin automatically caches selected SQL query result sets in ElastiCache, returning the result set from the cache instead of the database for future queries. Caching query results can reduce database load and lower average query latencies with minimal application code changes.

## How query caching works with the plugin
<a name="caching-database-query-results-how-it-works"></a>

The Remote Query Cache Plugin makes it easy for Java applications that query PostgreSQL, MySQL, or MariaDB databases to automatically cache query results in ElastiCache. You configure the plugin with your cache endpoint information and indicate which queries in your code to cache using query hints. When the plugin detects a hinted query, it returns the query result from the cache if present (a cache hit). If the results are not in the cache (a cache miss), the plugin executes the query on the database, stores the results in the cache, and returns the results to your application so the next time the query is executed the results can be served from the cache.

## Caching hints
<a name="caching-database-query-results-hints"></a>

You control which queries to cache by setting hints on each query. You can apply query hints directly to query strings in your application code with a comment prefix:

```
/* CACHE_PARAM(ttl=300s) */ SELECT * FROM my_table WHERE id = 42
```

where `ttl` is the time-to-live in seconds. You can also set query hints in prepared statements using common frameworks like Hibernate and Spring Boot.
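
The hint format above can be recognized with a simple pattern. The following sketch is illustrative only and is not the plugin's actual parser:

```
import re

# Matches hints of the form: /* CACHE_PARAM(ttl=300s) */
HINT = re.compile(r"/\*\s*CACHE_PARAM\(ttl=(\d+)s\)\s*\*/")

def cache_ttl_seconds(sql):
    # Return the hinted TTL in seconds, or None if the query carries no hint.
    m = HINT.search(sql)
    return int(m.group(1)) if m else None
```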

## Prerequisites
<a name="caching-database-query-results-prerequisites"></a>

To use the Remote Query Cache Plugin in your application, you need an ElastiCache for Valkey or Redis OSS cache (both Serverless and node-based are supported) along with the following dependencies:
+ [AWS Advanced JDBC Wrapper](https://github.com/aws/aws-advanced-jdbc-wrapper) version 3.3.0 or later.
+ [Apache Commons Pool](https://commons.apache.org/proper/commons-pool/) version 2.11.1 or later.
+ [Valkey Glide](https://glide.valkey.io/) version 2.3.0 or later.

## Example: Caching a query with the plugin
<a name="caching-database-query-results-example"></a>

The following example shows how to enable the plugin and cache a query result for 300 seconds (5 minutes) with an ElastiCache for Valkey serverless cache:

```
import java.sql.*;
import java.util.Properties;

public class QueryCacheExample {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "myuser");
        props.setProperty("password", "mypassword");

        // Enable the remote query cache plugin
        props.setProperty("wrapperPlugins", "remoteQueryCache");

        // Point to your ElastiCache endpoint
        props.setProperty("cacheEndpointAddrRw", "my-cache.serverless.use1.cache.amazonaws.com:6379");

        Connection conn = DriverManager.getConnection(
            "jdbc:aws-wrapper:postgresql://my-database.cluster-abc123.us-east-1.rds.amazonaws.com:5432/mydb",
            props
        );

        Statement stmt = conn.createStatement();

        // The SQL comment hint tells the plugin to cache this query for 300 seconds
        ResultSet rs = stmt.executeQuery(
            "/* CACHE_PARAM(ttl=300s) */ SELECT product_name, price FROM products WHERE category = 'electronics'"
        );

        while (rs.next()) {
            System.out.println(rs.getString("product_name") + ": $" + rs.getBigDecimal("price"));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}
```

The first time this query runs, the result is returned from the database and cached in ElastiCache. For the next 300 seconds, subsequent executions of that query are served directly from the cache.

## Related resources
<a name="caching-database-query-results-related"></a>

You can find more extensive examples and detailed information about plugin configuration in the [Remote Query Cache Plugin documentation](https://github.com/aws/aws-advanced-jdbc-wrapper/blob/main/docs/using-the-jdbc-driver/using-plugins/UsingTheRemoteQueryCachePlugin.md).

# Caching strategies for Memcached
<a name="Strategies"></a>

In the following topic, you can find strategies for populating and maintaining your Memcached cache.

The strategies you implement for populating and maintaining your cache depend on what data you cache and the access patterns to that data. For example, you likely don't want to use the same strategy for both a top-10 leaderboard on a gaming site and trending news stories. In the rest of this section, we discuss common cache maintenance strategies and their advantages and disadvantages.

**Topics**
+ [Read replicas](#Strategies.ReadReplicas)
+ [Lazy loading](#Strategies.LazyLoading)
+ [Write-through](#Strategies.WriteThrough)
+ [Adding TTL](#Strategies.WithTTL)
+ [Related topics](#Strategies.SeeAlso)

## Read replicas
<a name="Strategies.ReadReplicas"></a>

You can often significantly improve performance for ElastiCache serverless caches by reading from replicas instead of the primary cache node. For more information, see [Best Practices for using Read Replicas](ReadReplicas.md).

## Lazy loading
<a name="Strategies.LazyLoading"></a>

As the name implies, *lazy loading* is a caching strategy that loads data into the cache only when necessary. It works as follows.

Amazon ElastiCache is an in-memory key-value store that sits between your application and the data store (database) that it accesses. Whenever your application requests data, it first makes the request to the ElastiCache cache. If the data exists in the cache and is current, ElastiCache returns the data to your application. If the data doesn't exist in the cache or has expired, your application requests the data from your data store. Your data store then returns the data to your application. Your application next writes the data received from the store to the cache. This way, it can be more quickly retrieved the next time it's requested.

A *cache hit* occurs when data is in the cache and isn't expired:

1. Your application requests data from the cache.

1. The cache returns the data to the application.

A *cache miss* occurs when data isn't in the cache or is expired:

1. Your application requests data from the cache.

1. The cache doesn't have the requested data, so it returns `null`.

1. Your application requests and receives the data from the database.

1. Your application updates the cache with the new data.

### Advantages and disadvantages of lazy loading
<a name="Strategies.LazyLoading.Evaluation"></a>

The advantages of lazy loading are as follows:
+ Only requested data is cached.

  Because most data is never requested, lazy loading avoids filling up the cache with data that isn't requested.
+ Node failures aren't fatal for your application.

  When a node fails and is replaced by a new, empty node, your application continues to function, though with increased latency. As requests are made to the new node, each cache miss results in a query of the database. At the same time, the data copy is added to the cache so that subsequent requests are retrieved from the cache.

The disadvantages of lazy loading are as follows:
+ There is a cache miss penalty. Each cache miss results in three trips: 

  1. Initial request for data from the cache

  1. Query of the database for the data

  1. Writing the data to the cache

   These misses can cause a noticeable delay in data getting to the application.
+ Stale data.

  If data is written to the cache only when there is a cache miss, data in the cache can become stale. This result occurs because there are no updates to the cache when data is changed in the database. To address this issue, you can use the [Write-through](#Strategies.WriteThrough) and [Adding TTL](#Strategies.WithTTL) strategies.

### Lazy loading pseudocode example
<a name="Strategies.LazyLoading.CodeExample"></a>

The following is a pseudocode example of lazy loading logic.

```
// *****************************************
// function that returns a customer's record.
// Attempts to retrieve the record from the cache.
// If it is retrieved, the record is returned to the application.
// If the record is not retrieved from the cache, it is
//    retrieved from the database, 
//    added to the cache, and 
//    returned to the application
// *****************************************
get_customer(customer_id)

    customer_record = cache.get(customer_id)
    if (customer_record == null)
    
        customer_record = db.query("SELECT * FROM Customers WHERE id = {0}", customer_id)
        cache.set(customer_id, customer_record)
    
    return customer_record
```

For this example, the application code that gets the data is the following.

```
customer_record = get_customer(12345)
```
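
For comparison, the same lazy loading logic as runnable Python. Here `cache` and `db` are hypothetical objects standing in for your cache client and database layer:

```
def get_customer(cache, db, customer_id):
    # Attempt to retrieve the record from the cache.
    customer_record = cache.get(customer_id)
    if customer_record is None:
        # Cache miss: fetch the record from the database and populate
        # the cache so the next request is served from memory.
        customer_record = db.query("SELECT * FROM Customers WHERE id = %s", (customer_id,))
        cache.set(customer_id, customer_record)
    return customer_record
```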

## Write-through
<a name="Strategies.WriteThrough"></a>

The write-through strategy adds data or updates data in the cache whenever data is written to the database.

### Advantages and disadvantages of write-through
<a name="Strategies.WriteThrough.Evaluation"></a>

The advantages of write-through are as follows:
+ Data in the cache is never stale.

  Because the data in the cache is updated every time it's written to the database, the data in the cache is always current.
+ Write penalty vs. read penalty.

  Every write involves two trips: 

  1. A write to the cache

  1. A write to the database

   These two trips add latency to the process. That said, end users are generally more tolerant of latency when updating data than when retrieving data. There is an inherent sense that updates are more work and thus take longer.

The disadvantages of write-through are as follows:
+ Missing data.

  If you spin up a new node, whether due to a node failure or scaling out, there is missing data. This data continues to be missing until it's added or updated in the database. You can minimize this by implementing [lazy loading](#Strategies.LazyLoading) with write-through.
+ Cache churn.

  Most data is never read, which is a waste of resources. By [adding a time to live (TTL) value](#Strategies.WithTTL), you can minimize wasted space.

### Write-through pseudocode example
<a name="Strategies.WriteThrough.CodeExample"></a>

The following is a pseudocode example of write-through logic.

```
// *****************************************
// function that saves a customer's record.
// *****************************************
save_customer(customer_id, values)

    customer_record = db.query("UPDATE Customers WHERE id = {0}", customer_id, values)
    cache.set(customer_id, customer_record)
    return success
```

For this example, the application code that gets the data is the following.

```
save_customer(12345,{"address":"123 Main"})
```
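
The same write-through logic as runnable Python, again with hypothetical `cache` and `db` objects; `db.update` is assumed to return the updated record:

```
def save_customer(cache, db, customer_id, values):
    # Write to the database first, then update the cache so the
    # cached copy never goes stale.
    customer_record = db.update(
        "UPDATE Customers SET address = %s WHERE id = %s",
        (values["address"], customer_id),
    )
    cache.set(customer_id, customer_record)
    return True
```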

## Adding TTL
<a name="Strategies.WithTTL"></a>

Lazy loading allows for stale data but doesn't fail with empty nodes. Write-through ensures that data is always fresh, but can fail with empty nodes and can populate the cache with superfluous data. By adding a time to live (TTL) value to each write, you can have the advantages of each strategy and largely avoid cluttering up the cache with extra data.

*Time to live (TTL)* is an integer value that specifies the number of seconds until the key expires. Valkey or Redis OSS can specify seconds or milliseconds for this value. Memcached specifies this value in seconds. When an application attempts to read an expired key, it is treated as though the key is not found. The database is queried for the key and the cache is updated. This approach doesn't guarantee that a value isn't stale. However, it keeps data from getting too stale and requires that values in the cache are occasionally refreshed from the database.
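
The expiry behavior described above, where an expired key reads as not found, can be illustrated with a minimal in-memory TTL cache. This is a sketch of the semantics only, not how ElastiCache implements expiry:

```
import time

class TTLCache:
    # Minimal in-memory cache in which an expired key reads as not found.
    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_seconds):
        # Store the value along with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]        # expired: treat as a cache miss
            return None
        return value
```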

For more information, see the [Valkey and Redis OSS commands](https://valkey.io/commands) or the [Memcached `set` commands](https://www.tutorialspoint.com/memcached/memcached_set_data.htm).

### TTL pseudocode examples
<a name="Strategies.WithTTL.CodeExample"></a>

The following is a pseudocode example of write-through logic with TTL.

```
// *****************************************
// function that saves a customer's record.
// The TTL value of 300 means that the record expires
//    300 seconds (5 minutes) after the set command 
//    and future reads will have to query the database.
// *****************************************
save_customer(customer_id, values)

    customer_record = db.query("UPDATE Customers WHERE id = {0}", customer_id, values)
    cache.set(customer_id, customer_record, 300)

    return success
```

The following is a pseudocode example of lazy loading logic with TTL.

```
// *****************************************
// function that returns a customer's record.
// Attempts to retrieve the record from the cache.
// If it is retrieved, the record is returned to the application.
// If the record is not retrieved from the cache, it is 
//    retrieved from the database, 
//    added to the cache, and 
//    returned to the application.
// The TTL value of 300 means that the record expires
//    300 seconds (5 minutes) after the set command 
//    and subsequent reads will have to query the database.
// *****************************************
get_customer(customer_id)

    customer_record = cache.get(customer_id)
    
    if (customer_record != null)
        return customer_record        // cache hit: the record exists and has not expired
    
    // do this only if the record was not in the cache, either because
    //    it was never cached or because its TTL had expired
    customer_record = db.query("SELECT * FROM Customers WHERE id = {0}", customer_id)
    cache.set(customer_id, customer_record, 300)  // update the cache with a 300 second TTL
    return customer_record                // return the newly retrieved record and exit function
```

For this example, the application code that gets the data is the following.

```
save_customer(12345,{"address":"123 Main"})
```

```
customer_record = get_customer(12345)
```

## Related topics
<a name="Strategies.SeeAlso"></a>
+ [In-Memory Data Store](elasticache-use-cases.md#elasticache-use-cases-data-store)
+ [Choosing an engine and version](SelectEngine.md)
+ [Scaling ElastiCache](Scaling.md)