Large number of connections (Valkey and Redis OSS) - Amazon ElastiCache

Large number of connections (Valkey and Redis OSS)

Serverless caches and individual ElastiCache (Redis OSS) nodes support up to 65,000 concurrent client connections. However, to optimize for performance, we advise that client applications do not constantly operate at that level of connections. Valkey and Redis OSS each have a single-threaded process based on an event loop where incoming client requests are handled sequentially. That means the response time of a given client becomes longer as the number of connected clients increases.

You can take the following set of actions to avoid hitting a connection bottleneck on a Valkey or Redis OSS server:

  • Perform read operations from read replicas. This can be done by using the ElastiCache reader endpoints in cluster mode disabled or by using replicas for reads in cluster mode enabled, including a serverless cache.

  • Distribute write traffic across multiple primary nodes. You can do this in two ways. You can use a multi-sharded Valkey or Redis OSS cluster with a cluster mode capable client. You could also write to multiple primary nodes in cluster mode disabled with client-side sharding. This is done automatically in a serverless cache.

  • Use a connection pool when available in your client library.

In general, creating a TCP connection is a computationally expensive operation compared to typical Valkey or Redis OSS commands. For example, handling a SET/GET request is an order of magnitude faster when reusing an existing connection. Using a client connection pool with a finite size reduces the overhead of connection management. It also bounds the number of concurrent incoming connections from the client application.

The following code example of PHPRedis shows that a new connection is created for each new user request:

$redis = new Redis(); if ($redis->connect($HOST, $PORT) != TRUE) { //ERROR: connection failed return; } $redis->set($key, $value); unset($redis); $redis = NULL;

We benchmarked this code in a loop on an Amazon Elastic Compute Cloud (Amazon EC2) instance connected to a Graviton2 (m6g.2xlarge) ElastiCache (Redis OSS) node. We placed both the client and server at the same Availability Zone. The average latency of the entire operation was 2.82 milliseconds.

When we updated the code and used persistent connections and a connection pool, the average latency of the entire operation was 0.21 milliseconds:

$redis = new Redis(); if ($redis->pconnect($HOST, $PORT) != TRUE) { // ERROR: connection failed return; } $redis->set($key, $value); unset($redis); $redis = NULL;

Required redis.ini configurations:

  • redis.pconnect.pooling_enabled=1

  • redis.pconnect.connection_limit=10

The following code is an example of a Redis-py connection pool:

conn = Redis(connection_pool=redis.BlockingConnectionPool(host=HOST, max_connections=10)) conn.set(key, value)

The following code is an example of a Lettuce connection pool:

RedisClient client = RedisClient.create(RedisURI.create(HOST, PORT)); GenericObjectPool<StatefulRedisConnection> pool = ConnectionPoolSupport.createGenericObjectPool(() -> client.connect(), new GenericObjectPoolConfig()); pool.setMaxTotal(10); // Configure max connections to 10 try (StatefulRedisConnection connection = pool.borrowObject()) { RedisCommands syncCommands = connection.sync(); syncCommands.set(key, value); }