Vacuuming and analyzing tables automatically

Autovacuum is a daemon (that is, it runs in the background) that automatically vacuums (cleans up) dead tuples, reclaims storage, and gathers statistics. It checks for bloated tables in the database and clears the bloat to reuse the space. It monitors database tables and indexes and adds them to a vacuum job after they reach a specific threshold of update or delete operations.

Autovacuum manages vacuuming by automating the PostgreSQL VACUUM and ANALYZE commands. VACUUM removes bloat from tables and reclaims the space, whereas ANALYZE updates the statistics that enable the optimizer to produce efficient plans.

VACUUM also performs an important task called vacuum freezing to prevent transaction ID wraparound issues in the database. Every row that is inserted or updated receives a transaction ID from the PostgreSQL transaction control mechanism, and these IDs control the row's visibility to other concurrent transactions. The transaction ID is a 32-bit number. Two billion IDs are always kept in the visible past; the remaining IDs (about 2.2 billion) are reserved for transactions that will take place in the future and are hidden from the current transaction. PostgreSQL requires occasional cleanup and freezing of old rows to prevent transaction IDs from wrapping around, which would make old, existing rows invisible to new transactions. For more information, see Preventing Transaction ID Wraparound Failures in the PostgreSQL documentation.
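
To see how close each database is to the wraparound limit, you can check the age of its oldest unfrozen transaction ID. This is a minimal monitoring sketch; autovacuum forces a freeze vacuum well before the age approaches the two-billion limit:

-- Age (in transactions) of the oldest unfrozen transaction ID in each database
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;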

Autovacuum is recommended and enabled by default. Its parameters include the following.

autovacuum_vacuum_threshold

The minimum number of tuple update or delete operations that must occur on a table before autovacuum vacuums it. Default for Amazon RDS: 50 operations. Default for Aurora: 50 operations.

autovacuum_analyze_threshold

The minimum number of tuple inserts, updates, or deletes that must occur on a table before autovacuum analyzes it. Default for Amazon RDS: 50 operations. Default for Aurora: 50 operations.

autovacuum_vacuum_scale_factor

The fraction of a table's tuples that must be modified before autovacuum vacuums it. Default for Amazon RDS: 0.2 (20 percent of the table). Default for Aurora: 0.1 (10 percent of the table).

autovacuum_analyze_scale_factor

The fraction of a table's tuples that must be modified before autovacuum analyzes it. Default for Amazon RDS: 0.05 (5 percent of the table). Default for Aurora: 0.05 (5 percent of the table).

autovacuum_freeze_max_age

The maximum age (in transactions) that a table's oldest unfrozen transaction ID can reach before the table is vacuumed to prevent transaction ID wraparound issues. Default for Amazon RDS: 200,000,000 transactions. Default for Aurora: 200,000,000 transactions.

Autovacuum makes a list of tables to process based on specific threshold formulas, as follows.

  • Threshold for running VACUUM on a table:

    vacuum threshold = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * Total row count of table)
  • Threshold for running ANALYZE on a table:

    analyze threshold = autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor * Total row count of table)
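
As an illustration of these formulas, the following query is a simplified sketch that compares each table's dead tuples against its calculated vacuum threshold. It uses the instance-wide settings and the planner's row estimates, and it ignores any per-table storage parameter overrides:

-- Tables whose dead tuples currently exceed the calculated vacuum threshold
SELECT st.relname,
       st.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::numeric
         + current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples AS vacuum_threshold
FROM pg_stat_user_tables st
JOIN pg_class c ON c.oid = st.relid
WHERE st.n_dead_tup >
      current_setting('autovacuum_vacuum_threshold')::numeric
        + current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples;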

For small to medium-sized tables, the default values might be sufficient. However, a large table that has frequent data modifications will accumulate a higher number of dead tuples. In this case, autovacuum might process the table frequently, and because each pass over a large table takes a long time, the maintenance of other tables might get delayed or ignored until the large table finishes. To avoid this, you can tune the autovacuum parameters described in the following section.

Autovacuum memory-related parameters

autovacuum_max_workers

Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that can run at the same time. This parameter can be set only at server start. If one autovacuum worker is busy with a large table, increasing this parameter lets other workers clean up other tables in parallel.

maintenance_work_mem

Specifies the maximum amount of memory to be used by maintenance operations such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. In Amazon RDS and Aurora, memory is allocated based on the instance class by using the formula GREATEST({DBInstanceClassMemory/63963136*1024},65536). When autovacuum runs, up to autovacuum_max_workers times that calculated value can be allocated, so be careful not to set the value too high. To control this, you can set autovacuum_work_mem separately.
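
As a rough worked example (an approximation, because DBInstanceClassMemory is somewhat less than the nominal instance memory after Amazon RDS reserves memory for the operating system), an instance class with 16 GiB of memory resolves to approximately:

    GREATEST(17179869184/63963136*1024, 65536) ≈ 275,000 KB (roughly 270 MB)

Assuming three autovacuum workers, roughly 800 MB of memory could therefore be in use by autovacuum at the same time.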

autovacuum_work_mem

Specifies the maximum amount of memory to be used by each autovacuum worker process. This parameter defaults to -1, which means that the value of maintenance_work_mem is used instead.

For more information about autovacuum memory parameters, see Allocating memory for autovacuum in the Amazon RDS documentation.

Tuning autovacuum parameters

Users might need to tune autovacuum parameters depending on their update and delete operations. The following parameters can be set at the table, instance, or cluster level.

Cluster or instance level

As an example, let's look at a banking database where continuous data manipulation language (DML) operations are expected. To maintain the database's health, you should tune autovacuum parameters at the cluster level for Aurora and at the instance level for Amazon RDS, and apply the same parameter group to the reader as well so that, in the case of a failover, the same parameters apply to the new writer.
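
To confirm the values that the parameter group applied, you can run a quick check such as the following on both the writer and the reader. The output reflects whichever parameter group is attached to the instance you are connected to:

-- Autovacuum-related settings currently in effect on this instance
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;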

Table level

For example, in a database for food delivery where continuous DML operations are expected on a single table called orders, you should consider tuning the autovacuum_analyze_threshold parameter at the table level by using the following command:

ALTER TABLE <table_name> SET (autovacuum_analyze_threshold = <threshold rows>);
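
For example, to make autovacuum analyze the orders table after every 10,000 changed rows (10,000 is a hypothetical value here; choose a threshold that matches your workload), you could run:

-- Hypothetical table-level override: analyze orders after 10,000 row changes
ALTER TABLE orders SET (autovacuum_analyze_threshold = 10000);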

Using aggressive autovacuum settings at the table level

With the default autovacuum settings, the example orders table that has continuous update and delete operations accumulates dead tuples and stale statistics between autovacuum runs. This leads to bad plan generation and slow queries. Clearing the bloat and keeping the statistics up to date requires aggressive autovacuum settings at the table level.

To determine appropriate settings, track the duration of queries that run on this table and identify the percentage of DML operations after which plans change. The pg_stat_all_tables view helps you track insert, update, and delete operations.
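
For instance, the following sketch shows the cumulative insert, update, and delete counts for the orders table, along with the last time it was analyzed:

-- Cumulative DML counts and last analyze times for the orders table
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'orders';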

Let's assume that the optimizer generates bad plans whenever 5 percent of the orders table changes. In this case, you should lower the table's scale factors to 0.05 (5 percent) as follows:

ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_analyze_scale_factor = 0.05);
Tip

Select aggressive autovacuum settings carefully to avoid high consumption of resources.
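
To verify which table-level overrides are in place after you apply them, you can inspect the table's storage options, as in this minimal check:

-- Table-level autovacuum storage parameters, if any
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';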


To make sure that autovacuum works effectively, regularly monitor dead rows, disk usage, and the last time autovacuum or ANALYZE ran. The pg_stat_all_tables view provides information about each table (relname), including how many dead tuples (n_dead_tup) it contains.

Monitoring the number of dead tuples in each table, especially in frequently updated tables, helps you determine if the autovacuum processes are periodically removing the dead tuples so their disk space can be reused for better performance. You can use the following query to check the number of dead tuples and when the last autovacuum ran on the tables:

SELECT relname AS TableName,
       n_live_tup AS LiveTuples,
       n_dead_tup AS DeadTuples,
       last_autovacuum AS Autovacuum,
       last_autoanalyze AS Autoanalyze
FROM pg_stat_all_tables;

Advantages and limitations

Autovacuum provides the following advantages:

  • It removes bloat from tables automatically.

  • It prevents transaction ID wraparound.

  • It keeps database statistics up to date.

Limitations:

  • If queries use parallel processing, the number of worker processes might not be enough for autovacuum.

  • If autovacuum runs during peak hours, resource utilization might increase. You should tune parameters to handle this issue.

  • If table pages are pinned by another session, autovacuum might skip those pages.

  • Autovacuum can't access temporary tables.