Resolving vacuum performance issues in RDS for PostgreSQL - Amazon Relational Database Service

Resolving vacuum performance issues in RDS for PostgreSQL

This section discusses factors that often contribute to slower vacuum performance and how to address these issues.

Vacuum large indexes

The VACUUM process consists of several stages: initialization, scanning the heap, vacuuming indexes and the heap, cleaning up indexes, truncating the heap, and performing final cleanup. During scanning, pages are pruned, defragmented, and frozen. After the heap is fully scanned, the indexes are cleaned up, empty pages are returned to the operating system, and the final cleanup tasks, like vacuuming the free space map and updating statistics, are completed.

When vacuuming indexes, multiple passes may be required if the available maintenance_work_mem (or autovacuum_work_mem) is insufficient to process the index. In PostgreSQL versions 16 and earlier, a 1GB limit on memory allocation for storing dead tuple IDs often required multiple passes, especially for large indexes. PostgreSQL version 17 addresses this limitation by introducing TidStore, a dynamic memory allocation system that replaces the single-allocation array. This removes the 1GB constraint, improves memory efficiency, and decreases the likelihood of multiple scans per each index.

However, even in PostgreSQL 17, multiple passes may still be required for large indexes if the available memory is insufficient to process the entire index in one go. Typically, large indexes tend to contain more dead tuples requiring multiple passes.

The postgres_get_av_diag() function detects a slow vacuum operation by calculating the memory needed to vacuum indexes. If the available memory is insufficient to complete the index vacuuming in a single pass, it issues the following recommendation:

NOTICE: Your database is currently running aggressive vacuum to prevent wraparound and it might be slow.
NOTICE: The current setting of autovacuum_work_mem is "XXX MB" and might not be sufficient. Consider increasing the setting, and if necessary, scaling up the Amazon RDS instance class for more memory. Additionally, review the possibility of manual vacuum with exclusion of indexes using (VACUUM (INDEX_CLEANUP FALSE, VERBOSE TRUE) table_name;).
Note

The postgres_get_av_diag() function relies on pg_stat_all_tables.n_dead_tup for estimating the amount of memory required for index vacuuming.

Guidance

You can apply the following workarounds using manual VACUUM FREEZE to speed up freezing the table.

Increase the memory for vacuuming

As suggested by the postgres_get_av_diag() function, it's advisable to increase the autovacuum_work_mem parameter to address potential memory constraints at the instance level. While autovacuum_work_mem is a dynamic parameter, it's important to note that for the new memory setting to take effect, the autovacuum daemon needs to restart its workers. To accomplish this:

  1. Confirm that the new setting is in place.

  2. Terminate the processes currently running autovacuum.

This approach ensures that the adjusted memory allocation is applied to new autovacuum operations.

For more immediate results, consider manually performing a VACUUM FREEZE operation with an increased maintenance_work_mem setting within your session:

SET maintenance_work_mem TO '1GB'; VACUUM FREEZE VERBOSE table_name;

If you're using Amazon RDS and find that you need additional memory to support higher values for maintenance_work_mem or autovacuum_work_mem, consider upgrading to an instance class with more memory. This can provide the necessary resources to enhance both manual and automatic vacuum operations, leading to improved overall vacuum and database performance.

Disable INDEX_CLEANUP

Manual VACUUM in PostgreSQL version 12 and later allows skipping the index cleanup phase, while emergency autovacuum in PostgreSQL version 14 and later does this automatically based on the vacuum_failsafe_age parameter.

Warning

Skipping index cleanup can lead to index bloat and negatively impact query performance. To mitigate this, consider reindexing or vacuuming affected indexes during a maintenance window.

For additional guidance on handling large indexes, refer to the documentation on Managing autovacuum with large indexes .

Parallel index vacuuming

Starting with PostgreSQL 13, indexes can be vacuumed and cleaned in parallel by default using manual VACUUM, with one vacuum worker process assigned to each index. However, for PostgreSQL to determine if a vacuum operation qualifies for parallel execution, specific criteria must be met:

  • There must be at least two indexes.

  • The max_parallel_maintenance_workers parameter should be set to at least 2.

  • The index size must exceed the min_parallel_index_scan_size limit, which defaults to 512KB.

You can adjust the max_parallel_maintenance_workers setting based on the number of vCPUs available on your Amazon RDS instance and the number of indexes on the table to optimize vacuuming turnaround time.

For more information, see Parallel vacuuming in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL.

Too many tables or databases to vacuum

As mentioned in PostgreSQL's The Autovacuum Daemon documentation, the autovacuum daemon operates through multiple processes. This includes a persistent autovacuum launcher responsible for starting autovacuum worker processes for each database within the system. The launcher schedules these workers to initiate approximately every autovacuum_naptime seconds per database.

With 'N' databases, a new worker begins roughly every [autovacuum_naptime/N seconds]. However, the total number of concurrent workers is limited by the autovacuum_max_workers setting. If the number of databases or tables requiring vacuuming exceeds this limit, the next database or table will be processed as soon as a worker becomes available.

When many large tables or databases require vacuuming concurrently, all available autovacuum workers can become occupied for an extended duration, delaying maintenance on other tables and databases. In environments with high transaction rates, this bottleneck can quickly escalate and potentially lead to wraparound vacuum issues within your Amazon RDS instance.

When postgres_get_av_diag() detects a high number of tables or databases, it provides the following recommendation:

NOTICE: Your database is currently running aggressive vacuum to prevent wraparound and it might be slow.
NOTICE: The current setting of autovacuum_max_workers:3 might not be sufficient. Consider increasing the setting and, if necessary, consider scaling up the Amazon RDS instance class for more workers.

Guidance

Increase autovacuum_max_workers

To expedite the vacuuming, we recommend adjusting the autovacuum_max_workers parameter to allow more concurrent autovacuum workers. If performance bottlenecks persist, consider scaling up your Amazon RDS instance to a class with more vCPUs, which can further improve the parallel processing capabilities.

Aggressive vacuum (to prevent wraparound) is running

The age of the database (MaximumUsedTransactionIDs) in PostgreSQL only decreases when an aggressive vacuum (to prevent wraparound) is successfully completed. Until this vacuum finishes, the age will continue to increase depending on the transaction rate.

The postgres_get_av_diag() function generates the following NOTICE when it detects an aggressive vacuum. However, it only triggers this output after the vacuum has been active for at least two minutes.

NOTICE: Your database is currently running aggressive vacuum to prevent wraparound, monitor autovacuum performance.

For more information about aggressive vacuum, see When an aggressive vacuum is already running.

You can verify if an aggressive vacuum is in progress with the following query:

SELECT a.xact_start AS start_time, v.datname "database", a.query, a.wait_event, v.pid, v.phase, v.relid::regclass, pg_size_pretty(pg_relation_size(v.relid)) AS heap_size, ( SELECT string_agg(pg_size_pretty(pg_relation_size(i.indexrelid)) || ':' || i.indexrelid::regclass || chr(10), ', ') FROM pg_index i WHERE i.indrelid = v.relid ) AS index_sizes, trunc(v.heap_blks_scanned * 100 / NULLIF(v.heap_blks_total, 0)) AS step1_scan_pct, v.index_vacuum_count || '/' || ( SELECT count(*) FROM pg_index i WHERE i.indrelid = v.relid ) AS step2_vacuum_indexes, trunc(v.heap_blks_vacuumed * 100 / NULLIF(v.heap_blks_total, 0)) AS step3_vacuum_pct, age(CURRENT_TIMESTAMP, a.xact_start) AS total_time_spent_sofar FROM pg_stat_activity a INNER JOIN pg_stat_progress_vacuum v ON v.pid = a.pid;

You can determine if it's an aggressive vacuum (to prevent wraparound) by checking the query column in the output. The phrase "to prevent wraparound" indicates that it is an aggressive vacuum.

query | autovacuum: VACUUM public.t3 (to prevent wraparound)

For example, suppose you have a blocker at transaction age 1 billion and a table requiring an aggressive vacuum to prevent wraparound at the same transaction age. Additionally, there's another blocker at transaction age 750 million. After clearing the blocker at transaction age 1 billion, the transaction age won't immediately drop to 750 million. It will remain high until the table needing the aggressive vacuum or any transaction with an age over 750 million is completed. During this period, the transaction age of your PostgreSQL cluster will continue to rise. Once the vacuum process is completed, the transaction age will drop to 750 million but will start increasing again until further vacuuming is finished. This cycle will continue as long as these conditions persist, until the transaction age eventually drops to the level configured for your Amazon RDS instance, specified by autovacuum_freeze_max_age.