Resolving identifiable vacuum blockers in RDS for PostgreSQL - Amazon Relational Database Service

Resolving identifiable vacuum blockers in RDS for PostgreSQL

Autovacuum performs aggressive vacuums and lowers the age of transaction IDs to below the threshold specified by the autovacuum_freeze_max_age parameter of your RDS instance. You can track this age using the Amazon CloudWatch metric MaximumUsedTransactionIDs.

To find the setting of autovacuum_freeze_max_age (which has a default of 200 million transaction IDs) for your Amazon RDS instance, you can use the following query:

SELECT TO_CHAR(setting::bigint, 'FM9,999,999,999') autovacuum_freeze_max_age FROM pg_settings WHERE name = 'autovacuum_freeze_max_age';

Note that postgres_get_av_diag() only checks for aggressive vacuum blockers when the age exceeds Amazon RDS’ adaptive autovacuum threshold of 500 million transaction IDs. For postgres_get_av_diag() to detect blockers, the blocker must be at least 500 million transactions old.

The postgres_get_av_diag() function identifies the following types of blockers:

Active statement

In PostgreSQL, an active statement is an SQL statement that is currently being executed by the database. This includes queries, transactions, or any operations in progress. When monitoring via pg_stat_activity, the state column indicates that the process with the corresponding PID is active.

The postgres_get_av_diag() function displays output similar to the following when it identifies a statement that is an active statement.

blocker | Active statement database | my_database blocker_identifier | SELECT pg_sleep(20000); wait_event | Timeout:PgSleep autovacuum_lagging_by | 568,600,871 suggestion | Connect to database "my_database", review carefully and you may consider terminating the process using suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. suggested_action | {"SELECT pg_terminate_backend (29621);"}

Suggested action

Following the guidance in the suggestion column, the user can connect to the database where the active statement is present and, as specified in the suggested_action column, it's advisable to carefully review the option to terminate the session. If termination is safe, you may use the pg_terminate_backend() function to terminate the session. This action can be performed by an administrator (such as the RDS master account) or a user with the required pg_terminate_backend() privilege.

Warning

A terminated session will undo (ROLLBACK) changes it made. Depending on your requirements, you may want to rerun the statement. However, it is recommended to do so only after the autovacuum process has finished its aggressive vacuum operation.

Idle in transaction

An idle in transaction statement refers to any session that has opened an explicit transaction (such as by issuing a BEGIN statement), performed some work, and is now waiting for the client to either pass more work or signal the end of the transaction by issuing a COMMIT, ROLLBACK, or END (which would result in an implicit COMMIT).

The postgres_get_av_diag() function displays output similar to the following when it identifies an idle in transaction statement as a blocker.

blocker | idle in transaction database | my_database blocker_identifier | INSERT INTO tt SELECT * FROM tt; wait_event | Client:ClientRead autovacuum_lagging_by | 1,237,201,759 suggestion | Connect to database "my_database", review carefully and you may consider terminating the process using suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. suggested_action | {"SELECT pg_terminate_backend (28438);"}

Suggested action

As indicated in the suggestion column, you can connect to the database where the idle in transaction session is present and terminate the session using the pg_terminate_backend() function. The user can be your admin (RDS master account) user or a user with the pg_terminate_backend() privilege.

Warning

A terminated session will undo (ROLLBACK) changes it made. Depending on your requirements, you may want to rerun the statement. However, it is recommended to do so only after the autovacuum process has finished its aggressive vacuum operation.

Prepared transaction

PostgreSQL allows transactions that are part of a two-phase commit strategy called prepared transactions. These are enabled by setting the max_prepared_transactions parameter to a non-zero value. Prepared transactions are designed to ensure that a transaction is durable and remains available even after database crashes, restarts, or client disconnections. Like regular transactions, they are assigned a transaction ID and can affect the autovacuum. If left in a prepared state, autovacuum cannot perform freeezing and it can lead to transaction ID wraparound.

When transactions are left prepared indefinitely without being resolved by a transaction manager, they become orphaned prepared transactions. The only way to fix this is to either commit or rollback the transaction using the COMMIT PREPARED or ROLLBACK PREPARED commands, respectively.

Note

Be aware that a backup taken during a prepared transaction will still contain that transaction after restoration. Refer to the following information about how to locate and close such transactions.

The postgres_get_av_diag() function displays the following output when it identifies a blocker that is a prepared transaction.

blocker | Prepared transaction database | my_database blocker_identifier | myptx wait_event | Not applicable autovacuum_lagging_by | 1,805,802,632 suggestion | Connect to database "my_database" and consider either COMMIT or ROLLBACK the prepared transaction using suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. suggested_action | {"COMMIT PREPARED 'myptx';",[OR],"ROLLBACK PREPARED 'myptx';"}

Suggested action

As mentioned in the suggestion column, connect to the database where the prepared transaction is located. Based on the suggested_action column, carefully review whether to perform either COMMIT or ROLLBACK, and the the appropiate the action.

To monitor prepared transactions in general, PostgreSQL offers a catalog view called pg_prepared_xacts. You can use the following query to find prepared transactions.

SELECT gid, prepared, owner, database, transaction AS oldest_xmin FROM pg_prepared_xacts ORDER BY age(transaction) DESC;

Logical replication slot

The purpose of a replication slot is to hold unconsumed changes until they are replicated to a target server. For more information, see PostgreSQL's Logical replication.

There are two types of logical replication slots.

Inactive logical replication slots

When replication is terminated, unconsumed transaction logs can't be removed, and the replication slot becomes inactive. Although an inactive logical replication slot isn't currently used by a subscriber, it remains on the server, leading to the retention of WAL files and preventing the removal of old transaction logs. This can increase disk usage and specifically block autovacuum from cleaning up internal catalog tables, as the system must preserve LSN information from being overwritten. If not addressed, this can result in catalog bloat, performance degradation, and an increased risk of wraparound vacuum, potentially causing transaction downtime.

Active but slow logical replication slots

Sometimes removal of dead tuples of catalog is delayed due to the performance degradation of logical replication. This delay in replication slows down updating the catalog_xmin and can lead to catalog bloat and wraparound vacuum.

The postgres_get_av_diag() function displays output similar to the following when it finds a logical replication slot as a blocker.

blocker | Logical replication slot database | my_database blocker_identifier | slot1 wait_event | Not applicable autovacuum_lagging_by | 1,940,103,068 suggestion | Ensure replication is active and resolve any lag for the slot if active. If inactive, consider dropping it using the command in suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. suggested_action | {"SELECT pg_drop_replication_slot('slot1') FROM pg_replication_slots WHERE active = 'f';"}

Suggested action

To resolve this problem, check the replication configuration for issues with the target schema or data that might be terminating the apply process. The most common reasons are the following:

  • Missing columns

  • Incompatible data type

  • Data mismatch

  • Missing table

If the problem is related to infrastructure issues:

If your instance is outside the AWS network or on AWS EC2, consult your administrator on how to resolve the availability or infrastructure-related issues.

Dropping the inactive slot

Warning

Caution: Before dropping a replication slot, carefully ensure that it has no ongoing replication, is inactive, and is in an unrecoverable state. Dropping a slot prematurely could disrupt replication or cause data loss.

After confirming that the replication slot is no longer needed, drop it to allow autovacuum to continue. The condition active = 'f' ensures that only an inactive slot is dropped.

SELECT pg_drop_replication_slot('slot1') WHERE active ='f'

Read replicas

When the hot_standby_feedback setting is enabled for Amazon RDS read replicas, it prevents autovacuum on the primary database from removing dead rows that might still be needed by queries running on the read replica. This affects all types of physical read replicas including those managed with or without replication slots. This behavior is necessary because queries running on the standby replica require those rows to remain available on the primary preventing query conflicts and cancellations.

Read replica with physical replication slot

Read replicas with physical replication slots significantly enhance the reliability and stability of replication in RDS for PostgreSQL. These slots ensure the primary database retains essential Write-Ahead Log files until the replica processes them, maintaining data consistency even during network disruptions.

Beginning with RDS for PostgreSQL version 14, all replicas utilize replication slots. In earlier versions, only cross-Region replicas used replication slots.

The postgres_get_av_diag() function displays output similar to the following when it finds a read replica with physical replication slot as the blocker.

blocker | Read replica with physical replication slot database | blocker_identifier | rds_us_west_2_db_xxxxxxxxxxxxxxxxxxxxx wait_event | Not applicable autovacuum_lagging_by | 554,080,689 suggestion | Run the following query on the replica "rds_us_west_2_db_xxxxxxxxxxxxxxxxxxxx" to find the long running query: | SELECT * FROM pg_catalog.pg_stat_activity WHERE backend_xmin::text::bigint = 757989377; | Review carefully and you may consdier terminating the query on read replica using suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. + | suggested_action | {"SELECT pg_terminate_backend(pid) FROM pg_catalog.pg_stat_activity WHERE backend_xmin::text::bigint = 757989377;"," + | [OR] + | ","Disable hot_standby_feedback"," + | [OR] + | ","Delete the read replica if not needed"}
Read replica with streaming replication

Amazon RDS allows setting up read replicas without a physical replication slot in older versions, up to version 13. This approach reduces overhead by allowing the primary to recycle WAL files more aggressively, which is advantageous in environments with limited disk space and can tolerate occasional ReplicaLag. However, without a slot, the standby must remain in sync to avoid missing WAL files. Amazon RDS uses archived WAL files to help the replica catch up if it falls behind, but this process requires careful monitoring and can be slow.

The postgres_get_av_diag() function displays output similar to the following when it finds a streaming read replica as the blocker.

blocker | Read replica with streaming replication slot database | Not applicable blocker_identifier | xx.x.x.xxx/xx wait_event | Not applicable autovacuum_lagging_by | 610,146,760 suggestion | Run the following query on the replica "xx.x.x.xxx" to find the long running query: + | SELECT * FROM pg_catalog.pg_stat_activity WHERE backend_xmin::text::bigint = 348319343; + | Review carefully and you may consdier terminating the query on read replica using suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. + | suggested_action | {"SELECT pg_terminate_backend(pid) FROM pg_catalog.pg_stat_activity WHERE backend_xmin::text::bigint = 348319343;"," + | [OR] + | ","Disable hot_standby_feedback"," + | [OR] + | ","Delete the read replica if not needed"}

Suggested action

As recommended in the suggested_action column, carefully review these options to unblock autovacuum.

  • Terminate the query – Following the guidance in the suggestion column, you can connect to the read replica, as specified in the suggested_action column, it's advisable to carefully review the option to terminate the session. If termination is deemed safe, you may use the pg_terminate_backend() function to terminate the session. This action can be performed by an administrator (such as the RDS master account) or a user with the required pg_terminate_backend() privilege.

    You may run the following SQL command on the read replica to terminate the query that is preventing the vacuum on the primary from cleaning up old rows. The value of backend_xmin is reported in the function’s output:

    SELECT pg_terminate_backend(pid) FROM pg_catalog.pg_stat_activity WHERE backend_xmin::text::bigint = backend_xmin;
  • Disable hot standby feedback – Consider disabling the hot_standby_feedback parameter if it's causing significant vacuum delays.

    The hot_standby_feedback parameter allows a read replica to inform the primary about its query activity, preventing the primary from vacuuming tables or rows that are in use on the standby. While this ensures query stability on the standby, it can significantly delay vacuuming on the primary. Disabling this feature allows the primary to proceed with vacuuming without waiting for the standby to catch up. However, this can lead to query cancellations or failures on the standby if it attempts to access rows that have been vacuumed by the primary.

  • Delete the read replica if not needed – If the read replica is no longer necessary, you can delete it. This will remove the associated replication overhead and allow the primary to recycle transaction logs without being held back by the replica.

Temporary tables

Temporary tables, created using the TEMPORARY keyword, reside in the temp schema, for example pg_temp_xxx, and are only accessible to the session that created them. Temporary tables are dropped when the session ends. However, these tables are invisible to PostgreSQL's autovacuum process, and must be manually vacuumed by the session that created them. Trying to vacuum the temp table from another session has no effect.

In unusual circumstances, a temporary table exists without an active session owning it. If the owning session ends unexpectedly due to a fatal crash, network issue, or similar event, the temporary table might not be cleaned up, leaving it behind as an "orphaned" table. When the PostgreSQL autovacuum process detects an orphaned temporary table, it logs the following message:

LOG: autovacuum: found orphan temp table \"%s\".\"%s\" in database \"%s\"

The postgres_get_av_diag() function displays output similar to the following when it identifies a temporary table as a blocker. For the function to correctly show the output related to temporary tables, it needs to be executed within the same database where those tables exist.

blocker | Temporary table database | my_database blocker_identifier | pg_temp_14.ttemp wait_event | Not applicable autovacuum_lagging_by | 1,805,802,632 suggestion | Connect to database "my_database". Review carefully, you may consider dropping temporary table using command in suggested_action. For more information, see Working with PostgreSQL autovacuum in the Amazon RDS User Guide. suggested_action | {"DROP TABLE ttemp;"}

Suggested action

Follow the instructions provided in the suggestion column of the output to identify and remove the temporary table that is preventing autovacuum from running. Use the following command to drop the temporary table reported by postgres_get_av_diag(). Replace the table name based on the output provided by the postgres_get_av_diag() function.

DROP TABLE my_temp_schema.my_temp_table;

The following query can be used to identify temporary tables:

SELECT oid, relname, relnamespace::regnamespace, age(relfrozenxid) FROM pg_class WHERE relpersistence = 't' ORDER BY age(relfrozenxid) DESC;