Monitoring the write-through cache and logical slots for Aurora PostgreSQL logical replication - Amazon Aurora

Monitoring the write-through cache and logical slots for Aurora PostgreSQL logical replication

Monitor the logical replication write-through cache and manage logical slots to improve performance for your Aurora PostgreSQL DB cluster. Following, find more information about the write-through cache and logical slots.

Monitoring the Aurora PostgreSQL logical replication write-through cache

By default, Aurora PostgreSQL versions 14.5, 13.8, 12.12, and 11.17 and higher use a write-through cache to improve the performance for logical replication. Without the write-through cache, Aurora PostgreSQL uses the Aurora storage layer in its implementation of the native PostgreSQL logical replication process. It does so by writing WAL data to storage and then reading the data back from storage to decode it and send (replicate) to its targets (subscribers). This can result in bottlenecks during logical replication for Aurora PostgreSQL DB clusters.

The write-through cache reduces the need to use the Aurora storage layer. Instead of always writing and reading from the Aurora storage layer, Aurora PostgreSQL uses a buffer to cache the logical WAL stream so that it can be used during the replication process, rather than always pulling from disk. This buffer is the PostgreSQL native cache used by logical replication, identified in Aurora PostgreSQL DB cluster parameters as rds.logical_wal_cache. By default, this cache uses 1/32 of the Aurora PostgreSQL DB cluster's buffer cache setting (shared_buffers) but not less than 64kB nor more than the size of one WAL segment, typically 16MB.

When you use logical replication with your Aurora PostgreSQL DB cluster (for the versions that support the write-through cache), you can monitor the cache hit ratio to see how well it's working for your use case. To do so, connect to your Aurora PostgreSQL DB cluster's write instance using psql and then use the Aurora function, aurora_stat_logical_wal_cache, as shown in the following example.

SELECT * FROM aurora_stat_logical_wal_cache();

The function returns output such as the following.

name | active_pid | cache_hit | cache_miss | blks_read | hit_rate | last_reset_timestamp -----------+------------+-----------+------------+-----------+----------+-------------- test_slot1 | 79183 | 24 | 0 | 24 | 100.00% | 2022-08-05 17:39... test_slot2 | | 1 | 0 | 1 | 100.00% | 2022-08-05 17:34... (2 rows)

The last_reset_timestamp values have been shortened for readability. For more information about this function, see aurora_stat_logical_wal_cache.

Aurora PostgreSQL provides the following two functions for monitoring the write-through cache.

If you find that the automatically adjusted WAL cache size isn't sufficient for your workloads, you can change the the value of the rds.logical_wal_cache manually, by modifying the parameter in your custom DB cluster parameter group. Note that any positive value less than 32kB is treated as 32kB. For more information about the wal_buffers, see Write Ahead Log in the PostgreSQL documentation.

Managing logical slots for Aurora PostgreSQL

Streaming activity is captured in the pg_replication_origin_status view. To see the contents of this view, you can use the pg_show_replication_origin_status() function, as shown following:

SELECT * FROM pg_show_replication_origin_status();

You can get a list of your logical slots by using the following SQL query.

SELECT * FROM pg_replication_slots;

To drop a logical slot, use the pg_drop_replication_slot with the name of the slot, as shown in the following command.

SELECT pg_drop_replication_slot('test_slot');