

# Tuning with wait events for Aurora PostgreSQL
<a name="AuroraPostgreSQL.Tuning"></a>

Wait events are an important tuning tool for Aurora PostgreSQL. When you can find out why sessions are waiting for resources and what they are doing, you're better able to reduce bottlenecks. You can use the information in this section to find possible causes and corrective actions. Before delving into this section, we strongly recommend that you understand basic Aurora concepts, especially the following topics:
+ [Amazon Aurora storage](Aurora.Overview.StorageReliability.md)
+ [Managing performance and scaling for Aurora DB clusters](Aurora.Managing.Performance.md) 

**Important**  
The wait events in this section are specific to Aurora PostgreSQL. Use the information in this section to tune Amazon Aurora only, not RDS for PostgreSQL.  
Some wait events in this section have no analogs in the open source versions of these database engines. Other wait events have the same names as events in open source engines, but behave differently. For example, Amazon Aurora storage works differently from open source storage, so storage-related wait events indicate different resource conditions.

**Topics**
+ [Essential concepts for Aurora PostgreSQL tuning](AuroraPostgreSQL.Tuning.concepts.md)
+ [Aurora PostgreSQL wait events](AuroraPostgreSQL.Tuning.concepts.summary.md)
+ [Client:ClientRead](apg-waits.clientread.md)
+ [Client:ClientWrite](apg-waits.clientwrite.md)
+ [CPU](apg-waits.cpu.md)
+ [IO:BufFileRead and IO:BufFileWrite](apg-waits.iobuffile.md)
+ [IO:DataFileRead](apg-waits.iodatafileread.md)
+ [IO:XactSync](apg-waits.xactsync.md)
+ [IPC:DamRecordTxAck](apg-waits.ipcdamrecordtxac.md)
+ [IPC:parallel wait events](apg-ipc-parallel.md)
+ [IPC:ProcArrayGroupUpdate](apg-rpg-ipcprocarraygroup.md)
+ [Lock:advisory](apg-waits.lockadvisory.md)
+ [Lock:extend](apg-waits.lockextend.md)
+ [Lock:Relation](apg-waits.lockrelation.md)
+ [Lock:transactionid](apg-waits.locktransactionid.md)
+ [Lock:tuple](apg-waits.locktuple.md)
+ [LWLock:buffer\_content (BufferContent)](apg-waits.lockbuffercontent.md)
+ [LWLock:buffer\_mapping](apg-waits.lwl-buffer-mapping.md)
+ [LWLock:BufferIO (IPC:BufferIO)](apg-waits.lwlockbufferio.md)
+ [LWLock:lock\_manager](apg-waits.lw-lock-manager.md)
+ [LWLock:MultiXact](apg-waits.lwlockmultixact.md)
+ [LWLock:pg\_stat\_statements](apg-rpg-lwlockpgstat.md)
+ [Timeout:PgSleep](apg-waits.timeoutpgsleep.md)

# Essential concepts for Aurora PostgreSQL tuning
<a name="AuroraPostgreSQL.Tuning.concepts"></a>

Before you tune your Aurora PostgreSQL database, make sure to learn what wait events are and why they occur. Also review the basic memory and disk architecture of Aurora PostgreSQL. For a helpful architecture diagram, see the [PostgreSQL](https://en.wikibooks.org/wiki/PostgreSQL/Architecture) wikibook.

**Topics**
+ [Aurora PostgreSQL wait events](#AuroraPostgreSQL.Tuning.concepts.waits)
+ [Aurora PostgreSQL memory](#AuroraPostgreSQL.Tuning.concepts.memory)
+ [Aurora PostgreSQL processes](#AuroraPostgreSQL.Tuning.concepts.processes)

## Aurora PostgreSQL wait events
<a name="AuroraPostgreSQL.Tuning.concepts.waits"></a>

A *wait event* indicates a resource for which a session is waiting. For example, the wait event `Client:ClientRead` occurs when Aurora PostgreSQL is waiting to receive data from the client. Typical resources that a session waits for include the following:
+ Single-threaded access to a buffer, for example, when a session is attempting to modify a buffer
+ A row that is currently locked by another session
+ A data file read
+ A log file write

For example, to satisfy a query, the session might perform a full table scan. If the data isn't already in memory, the session waits for the disk I/O to complete. When the buffers are read into memory, the session might need to wait because other sessions are accessing the same buffers. The database records the waits by using a predefined wait event. These events are grouped into categories.
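
The recorded wait events are visible in the `pg_stat_activity` view. For example, a query like the following (a minimal sketch) shows what each active session is currently waiting for:

```
SELECT pid, state, wait_event_type, wait_event, query
FROM   pg_stat_activity
WHERE  state = 'active';
```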

A wait event doesn't by itself show a performance problem. For example, if requested data isn't in memory, reading data from disk is necessary. If one session locks a row for an update, another session waits for the row to be unlocked so that it can update it. A commit requires waiting for the write to a log file to complete. Waits are integral to the normal functioning of a database. 

Large numbers of wait events typically show a performance problem. In such cases, you can use wait event data to determine where sessions are spending time. For example, if a report that typically runs in minutes now runs for hours, you can identify the wait events that contribute the most to total wait time. If you can determine the causes of the top wait events, you can sometimes make changes that improve performance. For example, if your session is waiting on a row that has been locked by another session, you can end the locking session. 
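
For example, you can identify sessions that are blocked by a lock, and the sessions blocking them, with the `pg_blocking_pids` function (available in PostgreSQL 9.6 and higher):

```
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM   pg_stat_activity
WHERE  cardinality(pg_blocking_pids(pid)) > 0;

-- If appropriate, end a blocking session by its process ID:
-- SELECT pg_terminate_backend(blocking_pid);
```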

## Aurora PostgreSQL memory
<a name="AuroraPostgreSQL.Tuning.concepts.memory"></a>

Aurora PostgreSQL memory is divided into shared memory and local memory.

**Topics**
+ [Shared memory in Aurora PostgreSQL](#AuroraPostgreSQL.Tuning.concepts.shared)
+ [Local memory in Aurora PostgreSQL](#AuroraPostgreSQL.Tuning.concepts.local)

### Shared memory in Aurora PostgreSQL
<a name="AuroraPostgreSQL.Tuning.concepts.shared"></a>

Aurora PostgreSQL allocates shared memory when the instance starts. Shared memory is divided into multiple subareas. Following, you can find a description of the most important ones.

**Topics**
+ [Shared buffers](#AuroraPostgreSQL.Tuning.concepts.buffer-pool)
+ [Write ahead log (WAL) buffers](#AuroraPostgreSQL.Tuning.concepts.WAL)

#### Shared buffers
<a name="AuroraPostgreSQL.Tuning.concepts.buffer-pool"></a>

The *shared buffer pool* is an Aurora PostgreSQL memory area that holds the pages that are being used, or were recently used, by application connections. A *page* is the in-memory version of a disk block. The shared buffer pool caches data blocks read from disk. The pool reduces the need to reread data from disk, making the database operate more efficiently.

Every table and index is stored as an array of fixed-size pages. Each page contains multiple tuples, which correspond to rows. A tuple can be stored in any page.

The shared buffer pool has finite memory. If a new request requires a page that isn't in memory, and no more memory exists, Aurora PostgreSQL evicts a less frequently used page to accommodate the request. The eviction policy is implemented by a clock sweep algorithm.
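
If the `pg_buffercache` extension is available, you can inspect which relations currently occupy the shared buffer pool. The following query is a sketch; it assumes that you have privileges to create the extension:

```
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM   pg_buffercache b
JOIN   pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP  BY c.relname
ORDER  BY buffers DESC
LIMIT  10;
```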

The `shared_buffers` parameter determines how much memory the server dedicates to caching data.
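
You can check the current setting from any session. In Aurora, you change the value through the DB cluster or DB instance parameter group rather than by editing `postgresql.conf`.

```
SHOW shared_buffers;
```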

#### Write ahead log (WAL) buffers
<a name="AuroraPostgreSQL.Tuning.concepts.WAL"></a>

A *write-ahead log (WAL) buffer* holds transaction data that Aurora PostgreSQL later writes to persistent storage. Using the WAL mechanism, Aurora PostgreSQL can do the following:
+ Recover data after a failure
+ Reduce disk I/O by avoiding frequent writes to disk

When a client changes data, Aurora PostgreSQL writes the changes to the WAL buffer. When the client issues a `COMMIT`, the WAL writer process writes transaction data to the WAL file.

The `wal_level` parameter determines how much information is written to the WAL.
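
You can check the current setting from any session:

```
SHOW wal_level;   -- typically 'replica'; 'logical' when logical replication is turned on
```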

### Local memory in Aurora PostgreSQL
<a name="AuroraPostgreSQL.Tuning.concepts.local"></a>

Every backend process allocates local memory for query processing.

**Topics**
+ [Work memory area](#AuroraPostgreSQL.Tuning.concepts.local.work_mem)
+ [Maintenance work memory area](#AuroraPostgreSQL.Tuning.concepts.local.maintenance_work_mem)
+ [Temporary buffer area](#AuroraPostgreSQL.Tuning.concepts.temp)

#### Work memory area
<a name="AuroraPostgreSQL.Tuning.concepts.local.work_mem"></a>

The *work memory area* holds temporary data for queries that perform sorts and hashes. For example, a query with an `ORDER BY` clause performs a sort. Queries use hash tables for hash joins and aggregations.

The `work_mem` parameter specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The default value is 4 MB. Multiple sessions can run simultaneously, and each session can run multiple sort and hash operations in parallel. For this reason, the total work memory in use can be many multiples of the `work_mem` setting. 
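
The following sketch shows how you might check whether a sort spills to disk. The `orders` table is hypothetical, and the `64MB` value is illustrative:

```
SHOW work_mem;                    -- default is 4MB

SET work_mem = '64MB';            -- session-level override (illustrative)

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY order_date;
-- In the plan output, "Sort Method: external merge  Disk: ..." indicates
-- that the sort spilled to temporary files; "quicksort  Memory: ..."
-- means that the sort fit within work_mem.
```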

#### Maintenance work memory area
<a name="AuroraPostgreSQL.Tuning.concepts.local.maintenance_work_mem"></a>

The *maintenance work memory area* caches data for maintenance operations. These operations include vacuuming, creating an index, and adding foreign keys.

The `maintenance_work_mem` parameter specifies the maximum amount of memory to be used by maintenance operations. The default value is 64 MB. A database session can only run one maintenance operation at a time.
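
For example, you might raise the limit for a single session before building a large index. The table, index name, and value here are illustrative:

```
SET maintenance_work_mem = '512MB';
CREATE INDEX idx_orders_date ON orders (order_date);
RESET maintenance_work_mem;
```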

#### Temporary buffer area
<a name="AuroraPostgreSQL.Tuning.concepts.temp"></a>

The *temporary buffer area* caches temporary tables for each database session.

Each session allocates temporary buffers as needed up to the limit you specify. When the session ends, the server clears the buffers.

The `temp_buffers` parameter sets the maximum number of temporary buffers used by each session. Before the first use of temporary tables within a session, you can change the `temp_buffers` value.
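
For example (the value and table are illustrative), the `SET` must run before the session touches any temporary table:

```
SET temp_buffers = '64MB';
CREATE TEMP TABLE session_scratch (id bigint, payload text);
```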

## Aurora PostgreSQL processes
<a name="AuroraPostgreSQL.Tuning.concepts.processes"></a>

Aurora PostgreSQL uses multiple processes.

**Topics**
+ [Postmaster process](#AuroraPostgreSQL.Tuning.concepts.postmaster)
+ [Backend processes](#AuroraPostgreSQL.Tuning.concepts.backend)
+ [Background processes](#AuroraPostgreSQL.Tuning.concepts.vacuum)

### Postmaster process
<a name="AuroraPostgreSQL.Tuning.concepts.postmaster"></a>

The *postmaster process* is the first process started when you start Aurora PostgreSQL. The postmaster process has the following primary responsibilities:
+ Fork and monitor background processes
+ Receive authentication requests from client processes, and authenticate them before allowing the database to service requests

### Backend processes
<a name="AuroraPostgreSQL.Tuning.concepts.backend"></a>

If the postmaster authenticates a client request, the postmaster forks a new backend process, also called a postgres process. One client process connects to exactly one backend process. The client process and the backend process communicate directly without intervention by the postmaster process.

### Background processes
<a name="AuroraPostgreSQL.Tuning.concepts.vacuum"></a>

The postmaster process forks several processes that perform different backend tasks. Some of the more important ones include the following:
+ WAL writer

  Aurora PostgreSQL writes data in the WAL (write ahead logging) buffer to the log files. The principle of write ahead logging is that the database can't write changes to the data files until after the database writes log records describing those changes to disk. The WAL mechanism reduces disk I/O, and allows Aurora PostgreSQL to use the logs to recover the database after a failure.
+ Background writer

  This process periodically writes dirty (modified) pages from the memory buffers to the data files. A page becomes dirty when a backend process modifies it in memory.
+ Autovacuum daemon

  The daemon consists of the following:
  + The autovacuum launcher
  + The autovacuum worker processes

  When autovacuum is turned on, it checks for tables that have had a large number of inserted, updated, or deleted tuples. The daemon has the following responsibilities:
  + Recover or reuse disk space occupied by updated or deleted rows
  + Update statistics used by the planner
  + Protect against loss of old data because of transaction ID wraparound

  The autovacuum feature automates the execution of `VACUUM` and `ANALYZE` commands. `VACUUM` has the following variants: standard and full. Standard vacuum runs in parallel with other database operations. `VACUUM FULL` requires an exclusive lock on the table it is working on. Thus, it can't run in parallel with operations that access the same table. `VACUUM` creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions.
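
To see which tables are accumulating dead tuples and when autovacuum last processed them, you can query the statistics views, for example:

```
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;
```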

# Aurora PostgreSQL wait events
<a name="AuroraPostgreSQL.Tuning.concepts.summary"></a>

The following table lists the wait events for Aurora PostgreSQL that most commonly indicate performance problems, and summarizes the most common causes and corrective actions. The following wait events are a subset of the list in [Amazon Aurora PostgreSQL wait events](AuroraPostgreSQL.Reference.Waitevents.md).


| Wait event | Definition | 
| --- | --- | 
|  [Client:ClientRead](apg-waits.clientread.md)  |  This event occurs when Aurora PostgreSQL is waiting to receive data from the client.  | 
|  [Client:ClientWrite](apg-waits.clientwrite.md)  |  This event occurs when Aurora PostgreSQL is waiting to write data to the client.  | 
|  [CPU](apg-waits.cpu.md)  |  This event occurs when a thread is active in CPU or is waiting for CPU.  | 
|  [IO:BufFileRead and IO:BufFileWrite](apg-waits.iobuffile.md)  |  These events occur when Aurora PostgreSQL creates temporary files.  | 
|  [IO:DataFileRead](apg-waits.iodatafileread.md)  |  This event occurs when a connection waits on a backend process to read a required page from storage because the page isn't available in shared memory.   | 
|  [IO:XactSync](apg-waits.xactsync.md)  |  This event occurs when the database is waiting for the Aurora storage subsystem to acknowledge the commit of a regular transaction, or the commit or rollback of a prepared transaction.   | 
|  [IPC:DamRecordTxAck](apg-waits.ipcdamrecordtxac.md)  |  This event occurs when Aurora PostgreSQL in a session using database activity streams generates an activity stream event, then waits for that event to become durable.  | 
|  [IPC:parallel wait events](apg-ipc-parallel.md)  |  These wait events indicate that a session is waiting for inter-process communication related to parallel query execution operations.  | 
|  [IPC:ProcArrayGroupUpdate](apg-rpg-ipcprocarraygroup.md)  |  This event occurs when a session is waiting for the group leader to update the transaction status at the end of the transaction.  | 
|  [Lock:advisory](apg-waits.lockadvisory.md)  |  This event occurs when a PostgreSQL application uses a lock to coordinate activity across multiple sessions.  | 
|  [Lock:extend](apg-waits.lockextend.md) |  This event occurs when a backend process is waiting to lock a relation to extend it while another process has a lock on that relation for the same purpose.  | 
|  [Lock:Relation](apg-waits.lockrelation.md)  |  This event occurs when a query is waiting to acquire a lock on a table or view that's currently locked by another transaction.  | 
|  [Lock:transactionid](apg-waits.locktransactionid.md)  | This event occurs when a transaction is waiting for a row-level lock. | 
|  [Lock:tuple](apg-waits.locktuple.md)  |  This event occurs when a backend process is waiting to acquire a lock on a tuple.  | 
|  [LWLock:buffer\_content (BufferContent)](apg-waits.lockbuffercontent.md)  |  This event occurs when a session is waiting to read or write a data page in memory while another session has that page locked for writing.  | 
|  [LWLock:buffer\_mapping](apg-waits.lwl-buffer-mapping.md)  |  This event occurs when a session is waiting to associate a data block with a buffer in the shared buffer pool.  | 
|  [LWLock:BufferIO (IPC:BufferIO)](apg-waits.lwlockbufferio.md)  |  This event occurs when Aurora PostgreSQL or RDS for PostgreSQL is waiting for other processes to finish their input/output (I/O) operations when concurrently trying to access a page.  | 
|  [LWLock:lock\_manager](apg-waits.lw-lock-manager.md)  | This event occurs when the Aurora PostgreSQL engine maintains the shared lock's memory area to allocate, check, and deallocate a lock when a fast path lock isn't possible. | 
| [LWLock:MultiXact](apg-waits.lwlockmultixact.md)  | This type of event occurs when Aurora PostgreSQL is keeping a session open to complete multiple transactions that involve the same row in a table. The wait event denotes which aspect of multiple transaction processing is generating the wait event, that is, LWLock:MultiXactOffsetSLRU, LWLock:MultiXactOffsetBuffer, LWLock:MultiXactMemberSLRU, or LWLock:MultiXactMemberBuffer. | 
|  [LWLock:pg\_stat\_statements](apg-rpg-lwlockpgstat.md)  |  This event occurs when the `pg_stat_statements` extension takes an exclusive lock on the hash table that tracks SQL statements.  | 
|  [Timeout:PgSleep](apg-waits.timeoutpgsleep.md)  |  This event occurs when a server process has called the `pg_sleep` function and is waiting for the sleep timeout to expire.   | 

# Client:ClientRead
<a name="apg-waits.clientread"></a>

The `Client:ClientRead` event occurs when Aurora PostgreSQL is waiting to receive data from the client.

**Topics**
+ [Supported engine versions](#apg-waits.clientread.context.supported)
+ [Context](#apg-waits.clientread.context)
+ [Likely causes of increased waits](#apg-waits.clientread.causes)
+ [Actions](#apg-waits.clientread.actions)

## Supported engine versions
<a name="apg-waits.clientread.context.supported"></a>

This wait event information is supported for Aurora PostgreSQL version 10 and higher.

## Context
<a name="apg-waits.clientread.context"></a>

An Aurora PostgreSQL DB cluster is waiting to receive data from the client. The Aurora PostgreSQL DB cluster must receive the data from the client before it can send more data to the client. The time that the cluster waits before receiving data from the client is a `Client:ClientRead` event.

## Likely causes of increased waits
<a name="apg-waits.clientread.causes"></a>

Common causes for the `Client:ClientRead` event to appear in top waits include the following: 

**Increased network latency**  
There might be increased network latency between the Aurora PostgreSQL DB cluster and the client. Higher network latency increases the time required for the DB cluster to receive data from the client.

**Increased load on the client**  
There might be CPU pressure or network saturation on the client. An increase in load on the client can delay transmission of data from the client to the Aurora PostgreSQL DB cluster.

**Excessive network round trips**  
A large number of network round trips between the Aurora PostgreSQL DB cluster and the client can delay transmission of data from the client to the Aurora PostgreSQL DB cluster.

**Large copy operation**  
During a copy operation, the data is transferred from the client's file system to the Aurora PostgreSQL DB cluster. Sending a large amount of data to the DB cluster can delay transmission of data from the client to the DB cluster.

**Idle client connection**  
A connection to an Aurora PostgreSQL DB instance is in the `idle in transaction` state, waiting for the client to send more data or issue a command. This state can lead to an increase in `Client:ClientRead` events.

**PgBouncer used for connection pooling**  
PgBouncer has a low-level network configuration setting called `pkt_buf`, which is set to 4,096 by default. If the workload is sending query packets larger than 4,096 bytes through PgBouncer, we recommend increasing the `pkt_buf` setting to 8,192. If the new setting doesn't decrease the number of `Client:ClientRead` events, we recommend increasing the `pkt_buf` setting to larger values, such as 16,384 or 32,768. If the query text is large, the larger setting can be particularly helpful.
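
In `pgbouncer.ini`, the setting looks like the following (the value shown is illustrative):

```
[pgbouncer]
; Raise the packet buffer for workloads whose query packets exceed 4,096 bytes
pkt_buf = 8192
```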

## Actions
<a name="apg-waits.clientread.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Place the clients in the same Availability Zone and VPC subnet as the cluster](#apg-waits.clientread.actions.az-vpc-subnet)
+ [Scale your client](#apg-waits.clientread.actions.scale-client)
+ [Use current generation instances](#apg-waits.clientread.actions.db-instance-class)
+ [Increase network bandwidth](#apg-waits.clientread.actions.increase-network-bandwidth)
+ [Monitor maximums for network performance](#apg-waits.clientread.actions.monitor-network-performance)
+ [Monitor for transactions in the "idle in transaction" state](#apg-waits.clientread.actions.check-idle-in-transaction)

### Place the clients in the same Availability Zone and VPC subnet as the cluster
<a name="apg-waits.clientread.actions.az-vpc-subnet"></a>

To reduce network latency and increase network throughput, place clients in the same Availability Zone and virtual private cloud (VPC) subnet as the Aurora PostgreSQL DB cluster. Make sure that the clients are as geographically close to the DB cluster as possible.

### Scale your client
<a name="apg-waits.clientread.actions.scale-client"></a>

Using Amazon CloudWatch or other host metrics, determine if your client is currently constrained by CPU or network bandwidth, or both. If the client is constrained, scale your client accordingly.

### Use current generation instances
<a name="apg-waits.clientread.actions.db-instance-class"></a>

In some cases, you might not be using a DB instance class that supports jumbo frames. If you're running your application on Amazon EC2, consider using a current generation instance for the client. Also, configure the maximum transmission unit (MTU) on the client operating system. This technique might reduce the number of network round trips and increase network throughput. For more information, see [ Jumbo frames (9001 MTU)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#jumbo_frame_instances) in the *Amazon EC2 User Guide*.

For information about DB instance classes, see [Amazon Aurora DB instance classes](Concepts.DBInstanceClass.md). To determine the DB instance class that is equivalent to an Amazon EC2 instance type, place `db.` before the Amazon EC2 instance type name. For example, the `r5.8xlarge` Amazon EC2 instance is equivalent to the `db.r5.8xlarge` DB instance class.

### Increase network bandwidth
<a name="apg-waits.clientread.actions.increase-network-bandwidth"></a>

Use `NetworkReceiveThroughput` and `NetworkTransmitThroughput` Amazon CloudWatch metrics to monitor incoming and outgoing network traffic on the DB cluster. These metrics can help you to determine if network bandwidth is sufficient for your workload. 

If your network bandwidth isn't enough, increase it. If the AWS client or your DB instance is reaching the network bandwidth limits, the only way to increase the bandwidth is to increase your DB instance size.

For more information about CloudWatch metrics, see [Amazon CloudWatch metrics for Amazon Aurora](Aurora.AuroraMonitoring.Metrics.md).

### Monitor maximums for network performance
<a name="apg-waits.clientread.actions.monitor-network-performance"></a>

If you are using Amazon EC2 clients, Amazon EC2 provides maximums for network performance metrics, including aggregate inbound and outbound network bandwidth. It also provides connection tracking to ensure that packets are returned as expected and link-local services access for services such as the Domain Name System (DNS). To monitor these maximums, use a current enhanced networking driver and monitor network performance for your client. 

For more information, see [Monitor network performance for your Amazon EC2 instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html) in the *Amazon EC2 User Guide for Linux Instances* and [Monitor network performance for your Amazon EC2 instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/monitoring-network-performance-ena.html) in the *Amazon EC2 User Guide for Windows Instances*.

### Monitor for transactions in the "idle in transaction" state
<a name="apg-waits.clientread.actions.check-idle-in-transaction"></a>

Check whether you have an increasing number of `idle in transaction` connections. To do this, monitor the `state` column in the `pg_stat_activity` table. You might be able to identify the connection source by running a query similar to the following.

```
SELECT client_addr, state, count(1)
FROM   pg_stat_activity
WHERE  state LIKE 'idle in transaction%'
GROUP  BY 1, 2
ORDER  BY 3 DESC;
```
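
If idle-in-transaction connections come from application sessions that never finish their transactions, you can also have the server end them automatically. The `idle_in_transaction_session_timeout` parameter is available in PostgreSQL 9.6 and higher; the timeout value here is illustrative:

```
-- End sessions that stay idle inside a transaction longer than the timeout.
-- Set per session as shown, or for all sessions in the DB parameter group.
SET idle_in_transaction_session_timeout = '5min';
```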

# Client:ClientWrite
<a name="apg-waits.clientwrite"></a>

The `Client:ClientWrite` event occurs when Aurora PostgreSQL is waiting to write data to the client.

**Topics**
+ [Supported engine versions](#apg-waits.clientwrite.context.supported)
+ [Context](#apg-waits.clientwrite.context)
+ [Likely causes of increased waits](#apg-waits.clientwrite.causes)
+ [Actions](#apg-waits.clientwrite.actions)

## Supported engine versions
<a name="apg-waits.clientwrite.context.supported"></a>

This wait event information is supported for Aurora PostgreSQL version 10 and higher.

## Context
<a name="apg-waits.clientwrite.context"></a>

A client process must read all of the data received from an Aurora PostgreSQL DB cluster before the cluster can send more data. The time that the cluster waits before sending more data to the client is a `Client:ClientWrite` event.

Reduced network throughput between the Aurora PostgreSQL DB cluster and the client can cause this event. CPU pressure and network saturation on the client can also cause this event. *CPU pressure* is when the CPU is fully utilized and there are tasks waiting for CPU time. *Network saturation* is when the network between the database and client is carrying more data than it can handle. 

## Likely causes of increased waits
<a name="apg-waits.clientwrite.causes"></a>

Common causes for the `Client:ClientWrite` event to appear in top waits include the following: 

**Increased network latency**  
There might be increased network latency between the Aurora PostgreSQL DB cluster and client. Higher network latency increases the time required for the client to receive the data.

**Increased load on the client**  
There might be CPU pressure or network saturation on the client. An increase in load on the client delays the reception of data from the Aurora PostgreSQL DB cluster.

**Large volume of data sent to the client**  
The Aurora PostgreSQL DB cluster might be sending a large amount of data to the client. A client might not be able to receive the data as fast as the cluster is sending it. Activities such as a copy of a large table can result in an increase in `Client:ClientWrite` events.

## Actions
<a name="apg-waits.clientwrite.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Place the clients in the same Availability Zone and VPC subnet as the cluster](#apg-waits.clientwrite.actions.az-vpc-subnet)
+ [Use current generation instances](#apg-waits.clientwrite.actions.db-instance-class)
+ [Reduce the amount of data sent to the client](#apg-waits.clientwrite.actions.reduce-data)
+ [Scale your client](#apg-waits.clientwrite.actions.scale-client)

### Place the clients in the same Availability Zone and VPC subnet as the cluster
<a name="apg-waits.clientwrite.actions.az-vpc-subnet"></a>

To reduce network latency and increase network throughput, place clients in the same Availability Zone and virtual private cloud (VPC) subnet as the Aurora PostgreSQL DB cluster.

### Use current generation instances
<a name="apg-waits.clientwrite.actions.db-instance-class"></a>

In some cases, you might not be using a DB instance class that supports jumbo frames. If you're running your application on Amazon EC2, consider using a current generation instance for the client. Also, configure the maximum transmission unit (MTU) on the client operating system. This technique might reduce the number of network round trips and increase network throughput. For more information, see [ Jumbo frames (9001 MTU)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#jumbo_frame_instances) in the *Amazon EC2 User Guide*.

For information about DB instance classes, see [Amazon Aurora DB instance classes](Concepts.DBInstanceClass.md). To determine the DB instance class that is equivalent to an Amazon EC2 instance type, place `db.` before the Amazon EC2 instance type name. For example, the `r5.8xlarge` Amazon EC2 instance is equivalent to the `db.r5.8xlarge` DB instance class.

### Reduce the amount of data sent to the client
<a name="apg-waits.clientwrite.actions.reduce-data"></a>

When possible, adjust your application to reduce the amount of data that the Aurora PostgreSQL DB cluster sends to the client. Making such adjustments relieves CPU and network contention on the client.

### Scale your client
<a name="apg-waits.clientwrite.actions.scale-client"></a>

Using Amazon CloudWatch or other host metrics, determine if your client is currently constrained by CPU or network bandwidth, or both. If the client is constrained, scale your client accordingly.

# CPU
<a name="apg-waits.cpu"></a>

This event occurs when a thread is active in CPU or is waiting for CPU.

**Topics**
+ [Supported engine versions](#apg-waits.cpu.context.supported)
+ [Context](#apg-waits.cpu.context)
+ [Likely causes of increased waits](#apg-waits.cpu.causes)
+ [Actions](#apg-waits.cpu.actions)

## Supported engine versions
<a name="apg-waits.cpu.context.supported"></a>

This wait event information is relevant for Aurora PostgreSQL version 9.6 and higher. 

## Context
<a name="apg-waits.cpu.context"></a>

The *central processing unit (CPU)* is the component of a computer that runs instructions. For example, CPU instructions perform arithmetic operations and exchange data in memory. If a query increases the number of instructions that it performs through the database engine, the time spent running the query increases. *CPU scheduling* is giving CPU time to a process. Scheduling is orchestrated by the kernel of the operating system.

**Topics**
+ [How to tell when this wait occurs](#apg-waits.cpu.when-it-occurs)
+ [DBLoadCPU metric](#apg-waits.cpu.context.dbloadcpu)
+ [os.cpuUtilization metrics](#apg-waits.cpu.context.osmetrics)
+ [Likely cause of CPU scheduling](#apg-waits.cpu.context.scheduling)

### How to tell when this wait occurs
<a name="apg-waits.cpu.when-it-occurs"></a>

This `CPU` wait event indicates that a backend process is active in CPU or is waiting for CPU. You know that it's occurring when a query shows the following information:
+ The `pg_stat_activity.state` column has the value `active`.
+ The `wait_event_type` and `wait_event` columns in `pg_stat_activity` are both NULL.

To see the backend processes that are using or waiting on CPU, run the following query.

```
SELECT * 
FROM   pg_stat_activity
WHERE  state = 'active'
AND    wait_event_type IS NULL
AND    wait_event IS NULL;
```

### DBLoadCPU metric
<a name="apg-waits.cpu.context.dbloadcpu"></a>

The Performance Insights metric for CPU is `DBLoadCPU`. The value for `DBLoadCPU` can differ from the value for the Amazon CloudWatch metric `CPUUtilization`. The latter metric is collected from the hypervisor for a database instance.

### os.cpuUtilization metrics
<a name="apg-waits.cpu.context.osmetrics"></a>

Performance Insights operating-system metrics provide detailed information about CPU utilization. For example, you can display the following metrics:
+ `os.cpuUtilization.nice.avg`
+ `os.cpuUtilization.total.avg`
+ `os.cpuUtilization.wait.avg`
+ `os.cpuUtilization.idle.avg`

Performance Insights reports the CPU usage by the database engine as `os.cpuUtilization.nice.avg`.

### Likely cause of CPU scheduling
<a name="apg-waits.cpu.context.scheduling"></a>

From an operating system perspective, the CPU is active when it isn't running the idle thread. The CPU is active while it performs a computation, but it's also active when it waits on memory I/O. This type of I/O dominates a typical database workload.

Processes are likely to wait to get scheduled on a CPU when the following conditions are met:
+ The CloudWatch `CPUUtilization` metric is near 100 percent.
+ The average load is greater than the number of vCPUs, indicating a heavy load. You can find the `loadAverageMinute` metric in the OS metrics section in Performance Insights.

## Likely causes of increased waits
<a name="apg-waits.cpu.causes"></a>

When the `CPU` wait event occurs more often than normal, possibly indicating a performance problem, typical causes include the following.

**Topics**
+ [Likely causes of sudden spikes](#apg-waits.cpu.causes.spikes)
+ [Likely causes of long-term high frequency](#apg-waits.cpu.causes.long-term)
+ [Corner cases](#apg-waits.cpu.causes.corner-cases)

### Likely causes of sudden spikes
<a name="apg-waits.cpu.causes.spikes"></a>

The most likely causes of sudden spikes are as follows:
+ Your application has opened too many simultaneous connections to the database. This scenario is known as a "connection storm."
+ Your application workload changed in any of the following ways:
  + New queries
  + An increase in the size of your dataset
  + Index maintenance or creation
  + New functions
  + New operators
  + An increase in parallel query execution
+ Your query execution plans have changed. In some cases, a change can cause an increase in buffers. For example, the query is now using a sequential scan when it previously used an index. In this case, the queries need more CPU to accomplish the same goal.

### Likely causes of long-term high frequency
<a name="apg-waits.cpu.causes.long-term"></a>

The most likely causes of events that recur over a long period are as follows:
+ Too many backend processes are running concurrently on CPU. These processes can be parallel workers.
+ Queries are performing suboptimally because they need a large number of buffers.

### Corner cases
<a name="apg-waits.cpu.causes.corner-cases"></a>

If none of the likely causes turn out to be actual causes, the following situations might be occurring:
+ The CPU is swapping processes in and out.
+ CPU context switching has increased.
+ Aurora PostgreSQL code is missing wait events.

## Actions
<a name="apg-waits.cpu.actions"></a>

If the `CPU` wait event dominates database activity, it doesn't necessarily indicate a performance problem. Respond to this event only when performance degrades.

**Topics**
+ [Investigate whether the database is causing the CPU increase](#apg-waits.cpu.actions.db-CPU)
+ [Determine whether the number of connections increased](#apg-waits.cpu.actions.connections)
+ [Respond to workload changes](#apg-waits.cpu.actions.workload)

### Investigate whether the database is causing the CPU increase
<a name="apg-waits.cpu.actions.db-CPU"></a>

Examine the `os.cpuUtilization.nice.avg` metric in Performance Insights. If this value is far less than the CPU usage, nondatabase processes are the main contributor to CPU.

### Determine whether the number of connections increased
<a name="apg-waits.cpu.actions.connections"></a>

Examine the `DatabaseConnections` metric in Amazon CloudWatch. Your action depends on whether the number increased or decreased during the period of increased CPU wait events.

#### The connections increased
<a name="apg-waits.cpu.actions.connections.increased"></a>

If the number of connections went up, compare the number of backend processes consuming CPU to the number of vCPUs. The following scenarios are possible:
+ The number of backend processes consuming CPU is less than the number of vCPUs.

  In this case, the number of connections isn't an issue. However, you might still try to reduce CPU utilization.
+ The number of backend processes consuming CPU is greater than the number of vCPUs.

  In this case, consider the following options:
  + Decrease the number of backend processes connected to your database. For example, implement a connection pooling solution such as RDS Proxy. To learn more, see [Amazon RDS Proxy for Aurora](rds-proxy.md).
  + Upgrade your instance size to get a higher number of vCPUs.
  + Redirect some read-only workloads to reader nodes, if applicable.
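To make this comparison, you can count the backend processes that are currently using or waiting on CPU and compare the result to the number of vCPUs of your instance class. The following query is a sketch based on the `pg_stat_activity` conditions described earlier.

```
SELECT count(*) AS backends_on_cpu
FROM   pg_stat_activity
WHERE  state = 'active'
AND    wait_event_type IS NULL
AND    wait_event IS NULL;
```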

#### The connections didn't increase
<a name="apg-waits.cpu.actions.connections.decreased"></a>

Examine the `blks_hit` metric in Performance Insights. Look for a correlation between an increase in `blks_hit` and CPU usage. The following scenarios are possible:
+ CPU usage and `blks_hit` are correlated.

  In this case, find the top SQL statements that are linked to the CPU usage, and look for plan changes. You can use either of the following techniques:
  + Explain the plans manually and compare them to the expected execution plan.
  + Look for an increase in block hits per second and local block hits per second. In the **Top SQL** section of the Performance Insights dashboard, choose **Preferences**.
+ CPU usage and `blks_hit` aren't correlated.

  In this case, determine whether any of the following occurs:
  + The application is rapidly connecting to and disconnecting from the database. 

    Diagnose this behavior by turning on `log_connections` and `log_disconnections`, then analyzing the PostgreSQL logs. Consider using the `pgbadger` log analyzer. For more information, see [https://github.com/darold/pgbadger](https://github.com/darold/pgbadger).
  + The OS is overloaded.

    In this case, Performance Insights shows that backend processes are consuming CPU for a longer time than usual. Look for evidence in the Performance Insights `os.cpuUtilization` metrics or the CloudWatch `CPUUtilization` metric. If the operating system is overloaded, look at Enhanced Monitoring metrics to diagnose further. Specifically, look at the process list and the percentage of CPU consumed by each process.
  + Top SQL statements are consuming too much CPU.

    Examine statements that are linked to the CPU usage to see whether they can use less CPU. Run an `EXPLAIN` command, and focus on the plan nodes that have the most impact. Consider using a PostgreSQL execution plan visualizer. To try out this tool, see [http://explain.dalibo.com/](http://explain.dalibo.com/).
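For example, you might run `EXPLAIN` with the `ANALYZE` and `BUFFERS` options against a suspect statement. The table and filter in this sketch are hypothetical.

```
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   orders
WHERE  order_date > CURRENT_DATE - INTERVAL '7 days';
```

In the output, focus on the plan nodes with the highest actual time and the largest buffer counts.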

### Respond to workload changes
<a name="apg-waits.cpu.actions.workload"></a>

If your workload has changed, look for the following types of changes:

New queries  
Check whether the new queries are expected. If so, ensure that their execution plans and the number of executions per second are expected.

An increase in the size of the data set  
Determine whether partitioning, if it's not already implemented, might help. This strategy might reduce the number of pages that a query needs to retrieve.

Index maintenance or creation  
Check whether the schedule for the maintenance is expected. A best practice is to schedule maintenance activities outside of peak activities.

New functions  
Check whether these functions perform as expected during testing. Specifically, check whether the number of executions per second is expected.

New operators  
Check whether they perform as expected during the testing.

An increase in running parallel queries  
Determine whether any of the following situations has occurred:  
+ The relations or indexes involved have suddenly grown in size so that they differ significantly from `min_parallel_table_scan_size` or `min_parallel_index_scan_size`.
+ Recent changes have been made to `parallel_setup_cost` or `parallel_tuple_cost`.
+ Recent changes have been made to `max_parallel_workers` or `max_parallel_workers_per_gather`.
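To review the current values of these parameters, you can query `pg_settings`. For example:

```
SELECT name, setting
FROM   pg_settings
WHERE  name IN ('min_parallel_table_scan_size',
                'min_parallel_index_scan_size',
                'parallel_setup_cost',
                'parallel_tuple_cost',
                'max_parallel_workers',
                'max_parallel_workers_per_gather');
```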

# IO:BufFileRead and IO:BufFileWrite
<a name="apg-waits.iobuffile"></a>

The `IO:BufFileRead` and `IO:BufFileWrite` events occur when Aurora PostgreSQL creates temporary files. When operations require more memory than the working memory parameters currently define, they write temporary data to persistent storage. This operation is sometimes called *spilling to disk*. For more information about the temporary files and their usage, see [Managing temporary files with PostgreSQL](PostgreSQL.ManagingTempFiles.md).

**Topics**
+ [Supported engine versions](#apg-waits.iobuffile.context.supported)
+ [Context](#apg-waits.iobuffile.context)
+ [Likely causes of increased waits](#apg-waits.iobuffile.causes)
+ [Actions](#apg-waits.iobuffile.actions)

## Supported engine versions
<a name="apg-waits.iobuffile.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.iobuffile.context"></a>

`IO:BufFileRead` and `IO:BufFileWrite` relate to the work memory area and maintenance work memory area. For more information about these local memory areas, see [Work memory area](AuroraPostgreSQL.Tuning.concepts.md#AuroraPostgreSQL.Tuning.concepts.local.work_mem) and [Maintenance work memory area](AuroraPostgreSQL.Tuning.concepts.md#AuroraPostgreSQL.Tuning.concepts.local.maintenance_work_mem).

The default value for `work_mem` is 4 MB. If one session performs operations in parallel, each worker handling the parallelism uses 4 MB of memory. For this reason, set `work_mem` carefully. If you increase the value too much, a database running many sessions might consume too much memory. If you set the value too low, Aurora PostgreSQL creates temporary files in local storage. The disk I/O for these temporary files can reduce performance.
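For example, you can check and override `work_mem` for the current session only, without affecting other connections. The 64 MB value below is purely illustrative; choose a value based on your workload and instance memory.

```
SHOW work_mem;          -- displays the current value (4MB by default)
SET work_mem = '64MB';  -- applies only to the current session
```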

If you observe the following sequence of events, your database might be generating temporary files:

1. Sudden and sharp decreases in free local storage

1. Fast recovery of the free space

You might also see a "chainsaw" pattern. This pattern can indicate that your database is creating small files constantly.

## Likely causes of increased waits
<a name="apg-waits.iobuffile.causes"></a>

In general, these wait events are caused by operations that consume more memory than the `work_mem` or `maintenance_work_mem` parameters allocate. To compensate, the operations write to temporary files. Common causes for the `IO:BufFileRead` and `IO:BufFileWrite` events include the following:

**Queries that need more memory than exists in the work memory area**  
Queries with the following characteristics use the work memory area:  
+ Hash joins
+ `ORDER BY` clause
+ `GROUP BY` clause
+ `DISTINCT`
+ Window functions
+ `CREATE TABLE AS SELECT`
+ Materialized view refresh

**Statements that need more memory than exists in the maintenance work memory area**  
The following statements use the maintenance work memory area:  
+ `CREATE INDEX`
+ `CLUSTER`

## Actions
<a name="apg-waits.iobuffile.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Identify the problem](#apg-waits.iobuffile.actions.problem)
+ [Examine your join queries](#apg-waits.iobuffile.actions.joins)
+ [Examine your ORDER BY and GROUP BY queries](#apg-waits.iobuffile.actions.order-by)
+ [Avoid using the DISTINCT operation](#apg-waits.iobuffile.actions.distinct)
+ [Consider using window functions instead of GROUP BY functions](#apg-waits.iobuffile.actions.window)
+ [Investigate materialized views and CTAS statements](#apg-waits.iobuffile.actions.mv-refresh)
+ [Use pg\_repack when you create indexes](#apg-waits.iobuffile.actions.pg_repack)
+ [Increase maintenance\_work\_mem when you cluster tables](#apg-waits.iobuffile.actions.cluster)
+ [Tune memory to prevent IO:BufFileRead and IO:BufFileWrite](#apg-waits.iobuffile.actions.tuning-memory)

### Identify the problem
<a name="apg-waits.iobuffile.actions.problem"></a>

You can view temporary file usage directly in Performance Insights. For more information, see [Viewing temporary file usage with Performance Insights](PostgreSQL.ManagingTempFiles.Example.md). When Performance Insights is disabled, you might notice increased `IO:BufFileRead` and `IO:BufFileWrite` operations. To troubleshoot, do the following:

1. Examine the `FreeLocalStorage` metric in Amazon CloudWatch.

1. Look for a chainsaw pattern, which is a series of jagged spikes.

A chainsaw pattern indicates a quick consumption and release of storage, often associated with temporary files. If you notice this pattern, turn on Performance Insights. When using Performance Insights, you can identify when the wait events occur and which queries are associated with them. Your solution depends on the specific query causing the events.

Or set the parameter `log_temp_files`. This parameter logs each query that generates more than the threshold amount (in KB) of temporary files. If the value is `0`, Aurora PostgreSQL logs all temporary files. If the value is `1024`, Aurora PostgreSQL logs all queries that produce temporary files larger than 1 MB. For more information about `log_temp_files`, see [Error Reporting and Logging](https://www.postgresql.org/docs/10/runtime-config-logging.html) in the PostgreSQL documentation.
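For example, the following statement, which is only a sketch, sets the threshold at the session level so that queries producing more than 1 MB of temporary files are logged.

```
SET log_temp_files = 1024;  -- threshold in KB; 0 logs all temporary files
```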

### Examine your join queries
<a name="apg-waits.iobuffile.actions.joins"></a>

Your application probably uses joins. For example, the following query joins four tables.

```
SELECT * 
       FROM order 
 INNER JOIN order_item 
       ON (order.id = order_item.order_id)
 INNER JOIN customer 
       ON (customer.id = order.customer_id)
 INNER JOIN customer_address 
       ON (customer_address.customer_id = customer.id AND 
           order.customer_address_id = customer_address.id)
 WHERE customer.id = 1234567890;
```

A possible cause of spikes in temporary file usage is a problem in the query itself. For example, a broken clause might not filter the joins properly. Consider the second inner join in the following example.

```
SELECT * 
       FROM order
 INNER JOIN order_item 
       ON (order.id = order_item.order_id)
 INNER JOIN customer 
       ON (customer.id = customer.id)
 INNER JOIN customer_address 
       ON (customer_address.customer_id = customer.id AND 
           order.customer_address_id = customer_address.id)
 WHERE customer.id = 1234567890;
```

The preceding query mistakenly joins `customer.id` to `customer.id`, generating a Cartesian product between every customer and every order. This type of accidental join generates large temporary files. Depending on the size of the tables, a Cartesian query can even fill up storage. Your application might have Cartesian joins when the following conditions are met:
+ You see large, sharp decreases in storage availability, followed by fast recovery.
+ No indexes are being created.
+ No `CREATE TABLE AS SELECT` statements are being issued.
+ No materialized views are being refreshed.

To see whether the tables are being joined using the proper keys, inspect your query and your object-relational mapping directives. Bear in mind that certain application queries aren't called all the time, and some queries are generated dynamically.

### Examine your ORDER BY and GROUP BY queries
<a name="apg-waits.iobuffile.actions.order-by"></a>

In some cases, an `ORDER BY` clause can result in excessive temporary files. Consider the following guidelines:
+ Only include columns in an `ORDER BY` clause when they need to be ordered. This guideline is especially important for queries that return thousands of rows and specify many columns in the `ORDER BY` clause.
+ Consider creating indexes to accelerate `ORDER BY` clauses when they match columns that have the same ascending or descending order. Partial indexes are preferable because they are smaller. Smaller indexes are read and traversed more quickly.
+ If you create indexes for columns that can accept null values, consider whether you want the null values stored at the end or at the beginning of the indexes.

  If possible, reduce the number of rows that need to be ordered by filtering the result set. If you use `WITH` clause statements or subqueries, remember that an inner query generates a result set and passes it to the outside query. The more rows that a query can filter out, the less ordering the query needs to do.
+ If you don't need to obtain the full result set, use the `LIMIT` clause. For example, if you only want the top five rows, a query using the `LIMIT` clause doesn't keep generating results. In this way, the query requires less memory and temporary files.

A query that uses a `GROUP BY` clause can also require temporary files. `GROUP BY` queries summarize values by using functions such as the following:
+ `COUNT`
+ `AVG`
+ `MIN`
+ `MAX`
+ `SUM`
+ `STDDEV`

To tune `GROUP BY` queries, follow the recommendations for `ORDER BY` queries.
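To confirm whether a sort or aggregate spills to disk, you can run `EXPLAIN` with the `ANALYZE` option and inspect the sort method. The following example is a sketch against a hypothetical `orders` table.

```
EXPLAIN (ANALYZE)
SELECT customer_id, count(*)
FROM   orders
GROUP  BY customer_id
ORDER  BY customer_id;
```

In the output, `Sort Method: external merge  Disk: ...` indicates that the sort spilled to temporary files, whereas `Sort Method: quicksort  Memory: ...` means the sort fit within `work_mem`.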

### Avoid using the DISTINCT operation
<a name="apg-waits.iobuffile.actions.distinct"></a>

If possible, avoid using the `DISTINCT` operation to remove duplicated rows. The more unnecessary and duplicated rows that your query returns, the more expensive the `DISTINCT` operation becomes. If possible, add filters in the `WHERE` clause even if you use the same filters for different tables. Filtering the query and joining correctly improves your performance and reduces resource use. It also prevents incorrect reports and results.

If you need to use `DISTINCT` for multiple rows of the same table, consider creating a composite index. Grouping multiple columns in an index can improve the time to evaluate distinct rows. Also, if you use Amazon Aurora PostgreSQL version 10 or higher, you can correlate statistics among multiple columns by using the `CREATE STATISTICS` command.
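The following sketch shows both techniques for hypothetical `orders` columns. `CREATE STATISTICS` requires PostgreSQL 10 or higher, and the planner uses the new statistics only after the next `ANALYZE`.

```
-- Composite index covering the columns evaluated by DISTINCT
CREATE INDEX orders_cust_status_idx ON orders (customer_id, status);

-- Extended statistics that capture the correlation between the columns
CREATE STATISTICS orders_cust_status_stats (ndistinct, dependencies)
  ON customer_id, status FROM orders;

ANALYZE orders;
```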

### Consider using window functions instead of GROUP BY functions
<a name="apg-waits.iobuffile.actions.window"></a>

Using `GROUP BY`, you change the result set, and then retrieve the aggregated result. Using window functions, you aggregate data without changing the result set. A window function uses the `OVER` clause to perform calculations across the sets defined by the query, correlating one row with another. You can use all the `GROUP BY` functions in window functions, but also use functions such as the following:
+ `RANK`
+ `ARRAY_AGG`
+ `ROW_NUMBER`
+ `LAG`
+ `LEAD`

To minimize the number of temporary files generated by a window function, remove duplications for the same result set when you need two distinct aggregations. Consider the following query.

```
SELECT sum(salary) OVER (PARTITION BY dept ORDER BY salary DESC) as sum_salary
     , avg(salary) OVER (PARTITION BY dept ORDER BY salary ASC) as avg_salary
  FROM empsalary;
```

You can rewrite the query with the `WINDOW` clause as follows.

```
SELECT sum(salary) OVER w as sum_salary
         , avg(salary) OVER w as avg_salary
    FROM empsalary
  WINDOW w AS (PARTITION BY dept ORDER BY salary DESC);
```

By default, the Aurora PostgreSQL execution planner consolidates similar nodes so that it doesn't duplicate operations. However, by using an explicit declaration for the window block, you can maintain the query more easily. You might also improve performance by preventing duplication.

### Investigate materialized views and CTAS statements
<a name="apg-waits.iobuffile.actions.mv-refresh"></a>

When a materialized view refreshes, it runs a query. This query can contain an operation such as `GROUP BY`, `ORDER BY`, or `DISTINCT`. During a refresh, you might observe large numbers of temporary files and the wait events `IO:BufFileWrite` and `IO:BufFileRead`. Similarly, when you create a table based on a `SELECT` statement, the `CREATE TABLE` statement runs a query. To reduce the temporary files needed, optimize the query.

### Use pg\_repack when you create indexes
<a name="apg-waits.iobuffile.actions.pg_repack"></a>

When you create an index, the engine orders the result set. As tables grow in size, and as values in the indexed column become more diverse, the temporary files require more space. In most cases, you can't prevent the creation of temporary files for large tables without modifying the maintenance work memory area. For more information, see [Maintenance work memory area](AuroraPostgreSQL.Tuning.concepts.md#AuroraPostgreSQL.Tuning.concepts.local.maintenance_work_mem). 

A possible workaround when recreating a large index is to use the pg\_repack tool. For more information, see [Reorganize tables in PostgreSQL databases with minimal locks](https://reorg.github.io/pg_repack/) in the pg\_repack documentation.

### Increase maintenance\_work\_mem when you cluster tables
<a name="apg-waits.iobuffile.actions.cluster"></a>

The `CLUSTER` command clusters the table specified by *table\_name* based on an existing index specified by *index\_name*. Aurora PostgreSQL physically recreates the table to match the order of a given index.

When magnetic storage was prevalent, clustering was common because storage throughput was limited. Now that SSD-based storage is common, clustering is less popular. However, if you cluster tables, you can still increase performance slightly depending on the table size, index, query, and so on. 

If you run the `CLUSTER` command and observe the wait events `IO:BufFileWrite` and `IO:BufFileRead`, tune `maintenance_work_mem`. Increase the memory size to a fairly large amount. A high value means that the engine can use more memory for the clustering operation.

### Tune memory to prevent IO:BufFileRead and IO:BufFileWrite
<a name="apg-waits.iobuffile.actions.tuning-memory"></a>

In some situations, you need to tune memory. Your goal is to balance the following requirements:
+ The `work_mem` value (see [Work memory area](AuroraPostgreSQL.Tuning.concepts.md#AuroraPostgreSQL.Tuning.concepts.local.work_mem))
+ The memory remaining after discounting the `shared_buffers` value (see [Buffer pool](AuroraMySQL.Managing.Tuning.concepts.md#AuroraMySQL.Managing.Tuning.concepts.memory.buffer-pool))
+ The maximum connections opened and in use, which is limited by `max_connections`

#### Increase the size of the work memory area
<a name="apg-waits.iobuffile.actions.tuning-memory.work-mem"></a>

In some situations, your only option is to increase the memory used by your session. If your queries are correctly written and are using the correct keys for joins, consider increasing the `work_mem` value. For more information, see [Work memory area](AuroraPostgreSQL.Tuning.concepts.md#AuroraPostgreSQL.Tuning.concepts.local.work_mem).

To find out how many temporary files a query generates, set `log_temp_files` to `0`. If you increase the `work_mem` value to the maximum value identified in the logs, you prevent the query from generating temporary files. However, `work_mem` sets the maximum per plan node for each connection or parallel worker. If the database has 5,000 connections, and each one uses 256 MiB of memory, the engine needs about 1.2 TiB of RAM. Thus, your instance might run out of memory.
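The arithmetic behind this example can be checked as follows. This is a sketch only; it assumes the worst case, in which every connection uses its full `work_mem` allowance at once.

```python
# Worst-case memory if every connection uses its full work_mem at once
connections = 5000
work_mem_mib = 256

total_mib = connections * work_mem_mib
total_tib = total_mib / (1024 * 1024)   # MiB -> TiB

print(f"{total_tib:.1f} TiB")           # about 1.2 TiB
```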

#### Reserve sufficient memory for the shared buffer pool
<a name="apg-waits.iobuffile.actions.tuning-memory.shared-pool"></a>

Your database uses memory areas such as the shared buffer pool, not just the work memory area. Consider the requirements of these additional memory areas before you increase `work_mem`. For more information about the buffer pool, see [Buffer pool](AuroraMySQL.Managing.Tuning.concepts.md#AuroraMySQL.Managing.Tuning.concepts.memory.buffer-pool).

For example, assume that your Aurora PostgreSQL instance class is db.r5.2xlarge. This class has 64 GiB of memory. By default, 75 percent of the memory is reserved for the shared buffer pool. After you subtract the amount allocated to the shared memory area, 16,384 MB remains. Don't allocate the remaining memory exclusively to the work memory area because the operating system and the engine also require memory.

The memory that you can allocate to `work_mem` depends on the instance class. If you use a larger instance class, more memory is available. However, in the preceding example, you can't use more than 16 GiB. Otherwise, your instance becomes unavailable when it runs out of memory. To recover the instance from the unavailable state, Aurora PostgreSQL automation restarts it automatically.

#### Manage the number of connections
<a name="apg-waits.iobuffile.actions.tuning-memory.connections"></a>

Suppose that your database instance has 5,000 simultaneous connections. Each connection uses at least 4 MiB of `work_mem`. The high memory consumption of the connections is likely to degrade performance. In response, you have the following options:
+ Upgrade to a larger instance class.
+ Decrease the number of simultaneous database connections by using a connection proxy or pooler.

For proxies, consider Amazon RDS Proxy, pgBouncer, or a connection pooler based on your application. This solution alleviates the CPU load. It also reduces the risk when all connections require the work memory area. When fewer database connections exist, you can increase the value of `work_mem`. In this way, you reduce the occurrence of the `IO:BufFileRead` and `IO:BufFileWrite` wait events. Also, the queries waiting for the work memory area speed up significantly.

# IO:DataFileRead
<a name="apg-waits.iodatafileread"></a>

The `IO:DataFileRead` event occurs when a connection waits on a backend process to read a required page from storage because the page isn't available in shared memory.

**Topics**
+ [Supported engine versions](#apg-waits.iodatafileread.context.supported)
+ [Context](#apg-waits.iodatafileread.context)
+ [Likely causes of increased waits](#apg-waits.iodatafileread.causes)
+ [Actions](#apg-waits.iodatafileread.actions)

## Supported engine versions
<a name="apg-waits.iodatafileread.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.iodatafileread.context"></a>

All queries and data manipulation (DML) operations access pages in the buffer pool. Statements that can induce reads include `SELECT`, `UPDATE`, and `DELETE`. For example, an `UPDATE` can read pages from tables or indexes. If the page being requested or updated isn't in the shared buffer pool, this read can lead to the `IO:DataFileRead` event.

Because the shared buffer pool is finite, it can fill up. In this case, requests for pages that aren't in memory force the database to read blocks from disk. If the `IO:DataFileRead` event occurs frequently, your shared buffer pool might be too small to accommodate your workload. This problem is acute for `SELECT` queries that read a large number of rows that don't fit in the buffer pool. For more information about the buffer pool, see [Buffer pool](AuroraMySQL.Managing.Tuning.concepts.md#AuroraMySQL.Managing.Tuning.concepts.memory.buffer-pool).

## Likely causes of increased waits
<a name="apg-waits.iodatafileread.causes"></a>

Common causes for the `IO:DataFileRead` event include the following:

**Connection spikes**  
You might find multiple connections generating the same number of `IO:DataFileRead` wait events. In this case, a spike (sudden and large increase) in `IO:DataFileRead` events can occur. 

**SELECT and DML statements performing sequential scans**  
Your application might be performing a new operation. Or an existing operation might change because of a new execution plan. In such cases, look for tables (particularly large tables) that have a greater `seq_scan` value. Find them by querying `pg_stat_user_tables`. To track queries that are generating more read operations, use the extension `pg_stat_statements`.
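For example, the following query lists the tables with the highest sequential scan counts.

```
SELECT schemaname, relname, seq_scan, seq_tup_read
FROM   pg_stat_user_tables
ORDER  BY seq_scan DESC
LIMIT  10;
```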

**CTAS and CREATE INDEX for large data sets**  
A *CTAS* is a `CREATE TABLE AS SELECT` statement. If you run a CTAS using a large data set as a source, or create an index on a large table, the `IO:DataFileRead` event can occur. When you create an index, the database might need to read the entire object using a sequential scan. A CTAS generates `IO:DataFileRead` events when pages aren't in memory.

**Multiple vacuum workers running at the same time**  
Vacuum workers can be triggered manually or automatically. We recommend adopting an aggressive vacuum strategy. However, when a table has many updated or deleted rows, the `IO:DataFileRead` waits increase. After space is reclaimed, the vacuum time spent on `IO:DataFileRead` decreases.

**Ingesting large amounts of data**  
When your application ingests large amounts of data, `ANALYZE` operations might occur more often. The `ANALYZE` process can be triggered by an autovacuum launcher or invoked manually.  
The `ANALYZE` operation reads a subset of the table. The number of pages that must be scanned is calculated by multiplying 30 by the `default_statistics_target` value. For more information, see the [PostgreSQL documentation](https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET). The `default_statistics_target` parameter accepts values between 1 and 10,000, where the default is 100.
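As a worked example, with the default `default_statistics_target` of 100, `ANALYZE` samples up to 30 × 100 = 3,000 pages per table. Assuming the default 8-KiB PostgreSQL block size, that is roughly 23 MiB of reads. This sketch only illustrates the arithmetic.

```python
# Pages sampled by ANALYZE: 30 x default_statistics_target
default_statistics_target = 100
pages = 30 * default_statistics_target

page_size_kib = 8                       # default PostgreSQL block size
sample_mib = pages * page_size_kib / 1024

print(pages, f"{sample_mib:.1f} MiB")   # 3000 pages, about 23.4 MiB
```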

**Resource starvation**  
If instance network bandwidth or CPU are consumed, the `IO:DataFileRead` event might occur more frequently.

## Actions
<a name="apg-waits.iodatafileread.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Check predicate filters for queries that generate waits](#apg-waits.iodatafileread.actions.filters)
+ [Minimize the effect of maintenance operations](#apg-waits.iodatafileread.actions.maintenance)
+ [Respond to high numbers of connections](#apg-waits.iodatafileread.actions.connections)

### Check predicate filters for queries that generate waits
<a name="apg-waits.iodatafileread.actions.filters"></a>

Assume that you identify specific queries that are generating `IO:DataFileRead` wait events. You might identify them using the following techniques:
+ Performance Insights
+ Catalog views such as the one provided by the extension `pg_stat_statements`
+ The catalog view `pg_stat_all_tables`, if it periodically shows an increased number of physical reads
+ The `pg_statio_all_tables` view, if it shows that `_read` counters are increasing

We recommend that you determine which filters are used in the predicate (`WHERE` clause) of these queries. Follow these guidelines:
+ Run the `EXPLAIN` command. In the output, identify which types of scans are used. A sequential scan doesn't necessarily indicate a problem. Queries that use sequential scans naturally produce more `IO:DataFileRead` events when compared to queries that use filters.

  Find out whether the column listed in the `WHERE` clause is indexed. If not, consider creating an index for this column. This approach avoids the sequential scans and reduces the `IO:DataFileRead` events. If a query has restrictive filters and still produces sequential scans, evaluate whether the proper indexes are being used.
+ Find out whether the query is accessing a very large table. In some cases, partitioning a table can improve performance, allowing the query to only read necessary partitions.
+ Examine the cardinality (total number of rows) from your join operations. Note how restrictive the values are that you're passing in the filters for your `WHERE` clause. If possible, tune your query to reduce the number of rows that are passed in each step of the plan.
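As a sketch of the first guideline, assume a hypothetical `orders` table filtered on `customer_id`.

```
EXPLAIN
SELECT *
FROM   orders
WHERE  customer_id = 1234;

-- If the plan shows a sequential scan despite a restrictive filter,
-- consider an index on the filter column:
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
```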

### Minimize the effect of maintenance operations
<a name="apg-waits.iodatafileread.actions.maintenance"></a>

Maintenance operations such as `VACUUM` and `ANALYZE` are important. We recommend that you don't turn them off because you find `IO:DataFileRead` wait events related to these maintenance operations. The following approaches can minimize the effect of these operations:
+ Run maintenance operations manually during off-peak hours. This technique prevents the database from reaching the threshold for automatic operations.
+ For very large tables, consider partitioning the table. This technique reduces the overhead of maintenance operations. The database only accesses the partitions that require maintenance.
+ When you ingest large amounts of data, consider disabling the autoanalyze feature.

The autovacuum feature is automatically triggered for a table when the following formula is true.

```
pg_stat_user_tables.n_dead_tup > (pg_class.reltuples x autovacuum_vacuum_scale_factor) + autovacuum_vacuum_threshold
```

The view `pg_stat_user_tables` and catalog `pg_class` have multiple rows. One row can correspond to one row in your table. This formula assumes that the `reltuples` are for a specific table. The parameters `autovacuum_vacuum_scale_factor` (0.20 by default) and `autovacuum_vacuum_threshold` (50 tuples by default) are usually set globally for the whole instance. However, you can set different values for a specific table.
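The following query is one way to apply this formula and list tables that currently exceed their autovacuum threshold. It's a sketch only: it hardcodes the default parameter values (0.20 and 50), so tables with per-table overrides need their own values instead.

```
SELECT st.schemaname,
       st.relname,
       st.n_dead_tup,
       (c.reltuples * 0.20) + 50 AS autovacuum_threshold
FROM   pg_stat_user_tables st
JOIN   pg_class c ON c.oid = st.relid
WHERE  st.n_dead_tup > (c.reltuples * 0.20) + 50;
```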

**Topics**
+ [Find tables consuming space unnecessarily](#apg-waits.iodatafileread.actions.maintenance.tables)
+ [Find indexes consuming unnecessary space](#apg-waits.iodatafileread.actions.maintenance.indexes)
+ [Find tables that are eligible to be autovacuumed](#apg-waits.iodatafileread.actions.maintenance.autovacuumed)

#### Find tables consuming space unnecessarily
<a name="apg-waits.iodatafileread.actions.maintenance.tables"></a>

To find tables consuming more space than necessary, run the following query. When this query is run by a database user role that doesn't have the `rds_superuser` role, it returns information only about the tables that the user role has permission to read. This query is supported on PostgreSQL version 12 and later. 

```
WITH report AS (
   SELECT   schemaname
           ,tblname
           ,n_dead_tup
           ,n_live_tup
           ,block_size*tblpages AS real_size
           ,(tblpages-est_tblpages)*block_size AS extra_size
           ,CASE WHEN tblpages - est_tblpages > 0
              THEN 100 * (tblpages - est_tblpages)/tblpages::float
              ELSE 0
            END AS extra_ratio, fillfactor, (tblpages-est_tblpages_ff)*block_size AS bloat_size
           ,CASE WHEN tblpages - est_tblpages_ff > 0
              THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float
              ELSE 0
            END AS bloat_ratio
           ,is_na
    FROM (
           SELECT  ceil( reltuples / ( (block_size-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages
                  ,ceil( reltuples / ( (block_size-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff
                  ,tblpages
                  ,fillfactor
                  ,block_size
                  ,tblid
                  ,schemaname
                  ,tblname
                  ,n_dead_tup
                  ,n_live_tup
                  ,heappages
                  ,toastpages
                  ,is_na
             FROM (
                    SELECT ( 4 + tpl_hdr_size + tpl_data_size + (2*ma)
                               - CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END
                               - CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END
                           ) AS tpl_size
                           ,block_size - page_hdr AS size_per_block
                           ,(heappages + toastpages) AS tblpages
                           ,heappages
                           ,toastpages
                           ,reltuples
                           ,toasttuples
                           ,block_size
                           ,page_hdr
                           ,tblid
                           ,schemaname
                           ,tblname
                           ,fillfactor
                           ,is_na
                           ,n_dead_tup
                           ,n_live_tup
                          FROM (
                                SELECT  tbl.oid                       AS tblid
                                       ,ns.nspname                    AS schemaname
                                       ,tbl.relname                   AS tblname
                                       ,tbl.reltuples                 AS reltuples
                                       ,tbl.relpages                  AS heappages
                                       ,coalesce(toast.relpages, 0)   AS toastpages
                                       ,coalesce(toast.reltuples, 0)  AS toasttuples
                                       ,psat.n_dead_tup               AS n_dead_tup
                                       ,psat.n_live_tup               AS n_live_tup
                                       ,24                            AS page_hdr
                                       ,current_setting('block_size')::numeric AS block_size
                                       ,coalesce(substring( array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor
                                       ,CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END        AS ma
                                       ,23 + CASE WHEN MAX(coalesce(null_frac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END              AS tpl_hdr_size
                                       ,sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024) )                                    AS tpl_data_size
                                       ,bool_or(att.atttypid = 'pg_catalog.name'::regtype) OR count(att.attname) <> count(s.attname)         AS is_na
                                  FROM  pg_attribute       AS att
                                  JOIN  pg_class           AS tbl    ON (att.attrelid = tbl.oid)
                                  JOIN  pg_stat_all_tables AS psat   ON (tbl.oid = psat.relid)
                                  JOIN  pg_namespace       AS ns     ON (ns.oid = tbl.relnamespace)
                             LEFT JOIN  pg_stats           AS s      ON (s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname)
                             LEFT JOIN  pg_class           AS toast  ON (tbl.reltoastrelid = toast.oid)
                                 WHERE  att.attnum > 0
                                   AND  NOT att.attisdropped
                                   AND  tbl.relkind = 'r'
                              GROUP BY  tbl.oid, ns.nspname, tbl.relname, tbl.reltuples, tbl.relpages, toastpages, toasttuples, fillfactor, block_size, ma, n_dead_tup, n_live_tup
                              ORDER BY  schemaname, tblname
                           ) AS s
                 ) AS s2
       ) AS s3
 ORDER BY bloat_size DESC
)
  SELECT * 
    FROM report 
   WHERE bloat_ratio != 0
 -- AND schemaname = 'public'
 -- AND tblname = 'pgbench_accounts'
;

```

You can check for table and index bloat in your application. For more information, see [Diagnosing table and index bloat](AuroraPostgreSQL.diag-table-ind-bloat.md).

#### Find indexes consuming unnecessary space
<a name="apg-waits.iodatafileread.actions.maintenance.indexes"></a>

To find indexes consuming unnecessary space, run the following query.

```
-- WARNING: if run with a nonsuperuser role, the query inspects
-- only indexes on tables that you have permission to read.
-- WARNING: rows with is_na = 't' are known to have bad statistics ("name" type is not supported).
-- This query is compatible with PostgreSQL 8.2 and later.

SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size,
  bs*(relpages-est_pages)::bigint AS extra_size,
  100 * (relpages-est_pages)::float / relpages AS extra_ratio,
  fillfactor, bs*(relpages-est_pages_ff) AS bloat_size,
  100 * (relpages-est_pages_ff)::float / relpages AS bloat_ratio,
  is_na
  -- , 100-(sub.pst).avg_leaf_density, est_pages, index_tuple_hdr_bm, 
  -- maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, sub.reltuples, sub.relpages 
  -- (DEBUG INFO)
FROM (
  SELECT coalesce(1 +
       ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 
       -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
    ) AS est_pages,
    coalesce(1 +
       ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
    ) AS est_pages_ff,
    bs, nspname, table_oid, tblname, idxname, relpages, fillfactor, is_na
    -- , stattuple.pgstatindex(quote_ident(nspname)||'.'||quote_ident(idxname)) AS pst, 
    -- index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples 
    -- (DEBUG INFO)
  FROM (
    SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, relam, table_oid, fillfactor,
      ( index_tuple_hdr_bm +
          maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
            WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
            ELSE index_tuple_hdr_bm%maxalign
          END
        + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
            WHEN nulldatawidth = 0 THEN 0
            WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
            ELSE nulldatawidth::integer%maxalign
          END
      )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
      -- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
    FROM (
      SELECT
        i.nspname, i.tblname, i.idxname, i.reltuples, i.relpages, i.relam, a.attrelid AS table_oid,
        current_setting('block_size')::numeric AS bs, fillfactor,
        CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
          WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
          ELSE 4
        END AS maxalign,
        /* per page header, fixed size: 20 for 7.X, 24 for others */
        24 AS pagehdr,
        /* per page btree opaque data */
        16 AS pageopqdata,
        /* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
        CASE WHEN max(coalesce(s.null_frac,0)) = 0
          THEN 2 -- IndexTupleData size
          ELSE 2 + (( 32 + 8 - 1 ) / 8) 
          -- IndexTupleData size + IndexAttributeBitMapData size ( (max number of fields per index + 8 - 1) / 8 )
        END AS index_tuple_hdr_bm,
        /* data length: subtract null values to save space, using the null fraction from stats */
        sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
        max( CASE WHEN a.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
      FROM pg_attribute AS a
        JOIN (
          SELECT nspname, tbl.relname AS tblname, idx.relname AS idxname, 
            idx.reltuples, idx.relpages, idx.relam,
            indrelid, indexrelid, indkey::smallint[] AS attnum,
            coalesce(substring(
              array_to_string(idx.reloptions, ' ')
               from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor
          FROM pg_index
            JOIN pg_class idx ON idx.oid=pg_index.indexrelid
            JOIN pg_class tbl ON tbl.oid=pg_index.indrelid
            JOIN pg_namespace ON pg_namespace.oid = idx.relnamespace
          WHERE pg_index.indisvalid AND tbl.relkind = 'r' AND idx.relpages > 0
        ) AS i ON a.attrelid = i.indexrelid
        JOIN pg_stats AS s ON s.schemaname = i.nspname
          AND ((s.tablename = i.tblname AND s.attname = pg_catalog.pg_get_indexdef(a.attrelid, a.attnum, TRUE)) 
          -- stats from tbl
          OR  (s.tablename = i.idxname AND s.attname = a.attname))
          -- stats from functional cols
        JOIN pg_type AS t ON a.atttypid = t.oid
      WHERE a.attnum > 0
      GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
    ) AS s1
  ) AS s2
    JOIN pg_am am ON s2.relam = am.oid WHERE am.amname = 'btree'
) AS sub
-- WHERE NOT is_na
ORDER BY 2,3,4;
```

#### Find tables that are eligible to be autovacuumed
<a name="apg-waits.iodatafileread.actions.maintenance.autovacuumed"></a>

To find tables that are eligible to be autovacuumed, run the following query.

```
-- This query lists tables that are due to be processed by autovacuum.
-- During normal operation, it should return few or no rows.
WITH  vbt AS (SELECT setting AS autovacuum_vacuum_threshold 
              FROM pg_settings WHERE name = 'autovacuum_vacuum_threshold')
    , vsf AS (SELECT setting AS autovacuum_vacuum_scale_factor 
              FROM pg_settings WHERE name = 'autovacuum_vacuum_scale_factor')
    , fma AS (SELECT setting AS autovacuum_freeze_max_age 
              FROM pg_settings WHERE name = 'autovacuum_freeze_max_age')
    , sto AS (SELECT opt_oid, split_part(setting, '=', 1) as param, 
                split_part(setting, '=', 2) as value 
              FROM (SELECT oid opt_oid, unnest(reloptions) setting FROM pg_class) opt)
SELECT
    '"'||ns.nspname||'"."'||c.relname||'"' as relation
    , pg_size_pretty(pg_table_size(c.oid)) as table_size
    , age(relfrozenxid) as xid_age
    , coalesce(cfma.value::float, autovacuum_freeze_max_age::float) autovacuum_freeze_max_age
    , (coalesce(cvbt.value::float, autovacuum_vacuum_threshold::float) + 
         coalesce(cvsf.value::float,autovacuum_vacuum_scale_factor::float) * c.reltuples) 
         as autovacuum_vacuum_tuples
    , n_dead_tup as dead_tuples
FROM pg_class c 
JOIN pg_namespace ns ON ns.oid = c.relnamespace
JOIN pg_stat_all_tables stat ON stat.relid = c.oid
JOIN vbt on (1=1) 
JOIN vsf ON (1=1) 
JOIN fma on (1=1)
LEFT JOIN sto cvbt ON cvbt.param = 'autovacuum_vacuum_threshold' AND c.oid = cvbt.opt_oid
LEFT JOIN sto cvsf ON cvsf.param = 'autovacuum_vacuum_scale_factor' AND c.oid = cvsf.opt_oid
LEFT JOIN sto cfma ON cfma.param = 'autovacuum_freeze_max_age' AND c.oid = cfma.opt_oid
WHERE c.relkind = 'r' 
AND nspname <> 'pg_catalog'
AND (
    age(relfrozenxid) >= coalesce(cfma.value::float, autovacuum_freeze_max_age::float)
    or
    coalesce(cvbt.value::float, autovacuum_vacuum_threshold::float) + 
      coalesce(cvsf.value::float,autovacuum_vacuum_scale_factor::float) * c.reltuples <= n_dead_tup
    -- or 1 = 1
)
ORDER BY age(relfrozenxid) DESC;
```

### Respond to high numbers of connections
<a name="apg-waits.iodatafileread.actions.connections"></a>

When you monitor Amazon CloudWatch, you might find that the `DatabaseConnections` metric spikes. This increase indicates an increased number of connections to your database. We recommend the following approaches:
+ Limit the number of connections that the application can open with each instance. If your application has an embedded connection pool feature, set a reasonable number of connections. Base the number on what the vCPUs in your instance can parallelize effectively.

  If your application doesn't use a connection pool, consider using Amazon RDS Proxy or an alternative. This approach lets your application open multiple connections with the proxy, which in turn opens a restricted number of connections with the database. With fewer connections running in parallel, your DB instance performs less context switching in the kernel. Queries progress faster, leading to fewer wait events. For more information, see [Amazon RDS Proxy for Aurora](rds-proxy.md).
+ Whenever possible, take advantage of reader nodes for Aurora PostgreSQL and read replicas for RDS for PostgreSQL. When your application runs a read-only operation, send those requests to the reader endpoint. This technique spreads application requests across all reader nodes, reducing the I/O pressure on the writer node.
+ Consider scaling up your DB instance. A higher-capacity instance class gives more memory, which gives Aurora PostgreSQL a larger shared buffer pool to hold pages. The larger size also gives the DB instance more vCPUs to handle connections. More vCPUs are particularly helpful when the operations that are generating `IO:DataFileRead` wait events are writes.

# IO:XactSync
<a name="apg-waits.xactsync"></a>

The `IO:XactSync` event occurs when the database is waiting for the Aurora storage subsystem to acknowledge the commit of a regular transaction, or the commit or rollback of a prepared transaction. A prepared transaction is part of PostgreSQL's support for two-phase commit. This event can also occur when a query is waiting for another transaction to commit, particularly when auto-commit is turned off. In such scenarios, updates might appear to be waiting on `IO:XactSync` even though they haven't been committed yet.

**Topics**
+ [Supported engine versions](#apg-waits.xactsync.context.supported)
+ [Context](#apg-waits.xactsync.context)
+ [Likely causes of increased waits](#apg-waits.xactsync.causes)
+ [Actions](#apg-waits.xactsync.actions)

## Supported engine versions
<a name="apg-waits.xactsync.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.xactsync.context"></a>

The event `IO:XactSync` indicates that the instance is spending time waiting for the Aurora storage subsystem to confirm that transaction data was processed.

## Likely causes of increased waits
<a name="apg-waits.xactsync.causes"></a>

When the `IO:XactSync` event appears more often than normal, possibly indicating a performance problem, typical causes include the following:

**Network saturation**  
Traffic between clients and the DB instance or traffic to the storage subsystem might be too heavy for the network bandwidth.

**CPU pressure**  
A heavy workload might be preventing the Aurora storage daemon from getting sufficient CPU time.

## Actions
<a name="apg-waits.xactsync.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Monitor your resources](#apg-waits.xactsync.actions.monitor)
+ [Scale up the CPU](#apg-waits.xactsync.actions.scalecpu)
+ [Increase network bandwidth](#apg-waits.xactsync.actions.scalenetwork)
+ [Reduce the number of commits](#apg-waits.xactsync.actions.commits)

### Monitor your resources
<a name="apg-waits.xactsync.actions.monitor"></a>

To determine the cause of the increased `IO:XactSync` events, check the following metrics:
+ `WriteThroughput` and `CommitThroughput` – Changes in write throughput or commit throughput can show an increase in workload.
+ `WriteLatency` and `CommitLatency` – Changes in write latency or commit latency can show that the storage subsystem is being asked to do more work.
+ `CPUUtilization` – If the instance's CPU utilization is above 90 percent, the Aurora storage daemon might not be getting sufficient time on the CPU. In this case, I/O performance degrades.

For information about these metrics, see [Instance-level metrics for Amazon Aurora](Aurora.AuroraMonitoring.Metrics.md#Aurora.AuroraMySQL.Monitoring.Metrics.instances).

### Scale up the CPU
<a name="apg-waits.xactsync.actions.scalecpu"></a>

To address CPU starvation, consider changing to an instance class with more CPU capacity. For information about CPU capacity for a DB instance class, see [Hardware specifications for DB instance classes for Aurora](Concepts.DBInstanceClass.Summary.md).

### Increase network bandwidth
<a name="apg-waits.xactsync.actions.scalenetwork"></a>

To determine whether the instance is reaching its network bandwidth limits, check for the following other wait events: 
+ `IO:DataFileRead`, `IO:BufferRead`, `IO:BufferWrite`, and `IO:XactWrite` – Queries using large amounts of I/O can generate more of these wait events.
+ `Client:ClientRead` and `Client:ClientWrite` – Queries with large amounts of client communication can generate more of these wait events.

If network bandwidth is an issue, consider changing to an instance class with more network bandwidth. For information about network performance for a DB instance class, see [Hardware specifications for DB instance classes for Aurora](Concepts.DBInstanceClass.Summary.md).

### Reduce the number of commits
<a name="apg-waits.xactsync.actions.commits"></a>

To reduce the number of commits, combine statements into transaction blocks.
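For example, in autocommit mode each of the following `INSERT` statements is its own transaction and its own durable storage acknowledgment. Wrapping them in a single transaction block reduces three commits to one; the `events` table is hypothetical:

```
BEGIN;
INSERT INTO events (payload) VALUES ('a');
INSERT INTO events (payload) VALUES ('b');
INSERT INTO events (payload) VALUES ('c');
COMMIT;  -- one IO:XactSync wait instead of three
```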

# IPC:DamRecordTxAck
<a name="apg-waits.ipcdamrecordtxac"></a>

The `IPC:DamRecordTxAck` event occurs when a session that uses database activity streams generates an activity stream event and then waits for that event to become durable. 

**Topics**
+ [Relevant engine versions](#apg-waits.ipcdamrecordtxac.context.supported)
+ [Context](#apg-waits.ipcdamrecordtxac.context)
+ [Causes](#apg-waits.ipcdamrecordtxac.causes)
+ [Actions](#apg-waits.ipcdamrecordtxac.actions)

## Relevant engine versions
<a name="apg-waits.ipcdamrecordtxac.context.supported"></a>

This wait event information is relevant for all Aurora PostgreSQL 10.7 and higher 10 versions, 11.4 and higher 11 versions, and all 12 and 13 versions.

## Context
<a name="apg-waits.ipcdamrecordtxac.context"></a>

In synchronous mode, durability of activity stream events is favored over database performance. While waiting for a durable write of the event, the session blocks other database activity, causing the `IPC:DamRecordTxAck` wait event.

## Causes
<a name="apg-waits.ipcdamrecordtxac.causes"></a>

The most common cause for the `IPC:DamRecordTxAck` event to appear in top waits is that the database activity streams feature audits all database activity: higher SQL activity generates more activity stream events that must be recorded durably.

## Actions
<a name="apg-waits.ipcdamrecordtxac.actions"></a>

We recommend different actions depending on the causes of your wait event:
+ Reduce the number of SQL statements or turn off database activity streams. Doing this reduces the number of events that require durable writes.
+ Change to asynchronous mode. Doing this helps to reduce contention on the `IPC:DamRecordTxAck` wait event.

  However, the DAS feature can't guarantee the durability of every event in asynchronous mode.

# IPC:parallel wait events
<a name="apg-ipc-parallel"></a>

The following `IPC` parallel wait events indicate that a session is waiting for inter-process communication related to parallel query execution:
+ `IPC:BgWorkerStartup` – A process is waiting for a parallel worker process to complete its startup sequence. This happens when initializing workers for parallel query execution.
+ `IPC:BgWorkerShutdown` – A process is waiting for a parallel worker process to complete its shutdown sequence. This occurs during the cleanup phase of parallel query execution.
+ `IPC:ExecuteGather` – A process is waiting to receive data from parallel worker processes during query execution. This occurs when the leader process needs to gather results from its workers.
+ `IPC:ParallelFinish` – A process is waiting for parallel workers to finish their execution and report their final results. This happens during the completion phase of parallel query execution.

**Topics**
+ [Supported engine versions](#apg-ipc-parallel-context-supported)
+ [Context](#apg-ipc-parallel-context)
+ [Likely causes of increased waits](#apg-ipc-parallel-causes)
+ [Actions](#apg-ipc-parallel-actions)

## Supported engine versions
<a name="apg-ipc-parallel-context-supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-ipc-parallel-context"></a>

Parallel query execution in PostgreSQL involves multiple processes working together to process a single query. When a query is determined to be suitable for parallelization, a leader process coordinates with one or more parallel worker processes based on the `max_parallel_workers_per_gather` parameter setting. The leader process divides the work among workers, each worker processes its portion of data independently, and results are gathered back to the leader process.

**Note**  
Each parallel worker operates as a separate process with resource requirements similar to a full user session. This means a parallel query with 4 workers can consume up to 5 times the resources (CPU, memory, I/O bandwidth) compared to a non-parallel query, as both the leader process and each worker process maintain their own resource allocations. For instance, settings like `work_mem` are applied individually to each worker, potentially multiplying the total memory usage across all processes.

The parallel query architecture consists of three main components:
+ Leader process: The main process that initiates the parallel operation, divides the workload, and coordinates with worker processes.
+ Worker processes: Background processes that execute portions of the query in parallel.
+ Gather/Gather Merge: Operations that combine results from multiple worker processes back to the leader.

During parallel execution, processes need to communicate with each other through Inter-Process Communication (IPC) mechanisms. These IPC wait events occur during different phases:
+ Worker startup: When parallel workers are being initialized
+ Data exchange: When workers are processing data and sending results to the leader
+ Worker shutdown: When parallel execution completes and workers are being terminated
+ Synchronization points: When processes need to coordinate or wait for other processes to complete their tasks

Understanding these wait events is crucial for diagnosing performance issues related to parallel query execution, especially in high-concurrency environments where multiple parallel queries may be executing simultaneously.
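You can see these components in a query plan with `EXPLAIN`. The following sketch assumes a hypothetical `big_table` that the planner chooses to scan in parallel; the exact plan depends on your data and parameter settings:

```
EXPLAIN SELECT count(*) FROM big_table;

--  Finalize Aggregate
--    ->  Gather                            -- leader collects worker results
--          Workers Planned: 2
--          ->  Partial Aggregate           -- each worker aggregates its share
--                ->  Parallel Seq Scan on big_table
```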

## Likely causes of increased waits
<a name="apg-ipc-parallel-causes"></a>

Several factors can contribute to an increase in parallel-related IPC wait events:

**High concurrency of parallel queries**  
When many parallel queries are running simultaneously, it can lead to resource contention and increased waiting times for IPC operations. This is particularly common in systems with high transaction volumes or analytical workloads.

**Suboptimal parallel query plans**  
If the query planner chooses inefficient parallel plans, it may result in unnecessary parallelization or poor work distribution among workers. This can lead to increased IPC waits, especially for `IPC:ExecuteGather` and `IPC:ParallelFinish` events. These planning issues often stem from outdated statistics and table or index bloat.

**Frequent startup and shutdown of parallel workers**  
Short-lived queries that frequently initiate and terminate parallel workers can cause an increase in `IPC:BgWorkerStartup` and `IPC:BgWorkerShutdown` events. This is often seen in OLTP workloads with many small, parallelizable queries.

**Resource constraints**  
Limited CPU, memory, or I/O capacity can cause bottlenecks in parallel execution, leading to increased wait times across all IPC events. For example, if CPU is saturated, worker processes may take longer to start up or process their portion of work.

**Complex query structures**  
Queries with multiple levels of parallelism (e.g., parallel joins followed by parallel aggregations) can lead to more complex IPC patterns and potentially increased wait times, especially for `IPC:ExecuteGather` events.

**Large result sets**  
Queries that produce large result sets may cause increased `IPC:ExecuteGather` wait times as the leader process spends more time collecting and processing results from worker processes.

Understanding these factors can help in diagnosing and addressing performance issues related to parallel query execution in Aurora PostgreSQL.

## Actions
<a name="apg-ipc-parallel-actions"></a>

When you see waits related to parallel query, it typically means that a backend process is coordinating or waiting on parallel worker processes. These waits are common during execution of parallel plans. You can investigate and mitigate the impact of these waits by monitoring parallel worker usage, reviewing the parameter settings, and tuning query execution and resource allocation.

**Topics**
+ [Analyze query plans for inefficient parallelism](#apg-ipc-parallel-analyze-plans)
+ [Monitor parallel query usage](#apg-ipc-parallel-monitor)
+ [Review and adjust parallel query settings](#apg-ipc-parallel-adjust-settings)
+ [Optimize resource allocation](#apg-ipc-parallel-optimize-resources)
+ [Investigate connection management](#apg-ipc-parallel-connection-management)
+ [Review and optimize maintenance operations](#apg-ipc-parallel-maintenance)
+ [Utilize Query Plan Management (QPM)](#apg-ipc-parallel-query-plan-management)

### Analyze query plans for inefficient parallelism
<a name="apg-ipc-parallel-analyze-plans"></a>

Parallel query execution can lead to system instability, CPU spikes, and unpredictable variance in query performance. It's crucial to analyze thoroughly whether parallelism actually improves your specific workload. Use `EXPLAIN ANALYZE` to review parallel query execution plans.

Temporarily disable parallelism at the session level to compare plan efficiency:

```
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE <your_query>;
```

Re-enable parallelism and compare:

```
RESET max_parallel_workers_per_gather;
EXPLAIN ANALYZE <your_query>;
```

If disabling parallelism yields better or more consistent results, consider disabling it for specific queries at the session level using SET commands. For a broader impact, you might want to disable parallelism at the instance level by adjusting the relevant parameters in your cluster or instance parameter group. For more information, see [Amazon Aurora PostgreSQL parameters](AuroraPostgreSQL.Reference.ParameterGroups.md).

### Monitor parallel query usage
<a name="apg-ipc-parallel-monitor"></a>

Use the following queries to gain visibility into parallel query activity and capacity:

Check active parallel worker processes:

```
SELECT
    COUNT(*)
FROM
    pg_stat_activity
WHERE
    backend_type = 'parallel worker';
```

This query shows the number of active parallel worker processes. A high value may indicate that `max_parallel_workers` is set too high, and you might want to consider reducing it.

Check concurrent parallel queries:

```
SELECT
    COUNT(DISTINCT leader_pid)
FROM
    pg_stat_activity
WHERE
    leader_pid IS NOT NULL;
```

This query returns the number of distinct leader processes that have launched parallel queries. A high number here indicates that multiple sessions are running parallel queries concurrently, which can increase demand on CPU and memory.

### Review and adjust parallel query settings
<a name="apg-ipc-parallel-adjust-settings"></a>

Review the following parameters to ensure they align with your workload:
+ `max_parallel_workers`: Total number of parallel workers across all sessions.
+ `max_parallel_workers_per_gather`: Max workers per query.

For OLAP workloads, increasing these values can improve performance. For OLTP workloads, lower values are generally preferred.

```
SHOW max_parallel_workers;
SHOW max_parallel_workers_per_gather;
```

### Optimize resource allocation
<a name="apg-ipc-parallel-optimize-resources"></a>

Monitor CPU utilization, and consider an instance class with more vCPUs if utilization is consistently high and your application benefits from parallel queries. Ensure that adequate memory is available for parallel operations.
+ Use Performance Insights metrics to determine if the system is CPU-bound.
+ Each parallel worker uses its own `work_mem`. Ensure total memory usage is within instance limits.

Parallel queries can consume substantially more resources than non-parallel queries, because each worker process is a separate process with roughly the same impact on the system as an additional user session. Take this into account when configuring settings that control resource utilization, such as `work_mem`. Limits such as `work_mem` are applied individually to each worker, so total utilization across all processes can be much higher than it would be for any single process. For more information, see the [PostgreSQL documentation](https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM).
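As a rough sketch of the arithmetic (the values shown are examples, not recommendations):

```
SHOW work_mem;                         -- suppose 64MB
SHOW max_parallel_workers_per_gather;  -- suppose 4

-- A single sort or hash node in a parallel plan can then use up to
-- (4 workers + 1 leader) x 64MB = 320MB, and a plan with several such
-- nodes can use correspondingly more.
```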

Consider increasing vCPUs or tuning memory parameters if your workload is heavily parallelized.

### Investigate connection management
<a name="apg-ipc-parallel-connection-management"></a>

If experiencing connection exhaustion, review application connection pooling strategies. Consider implementing connection pooling at the application level if not already in use.

### Review and optimize maintenance operations
<a name="apg-ipc-parallel-maintenance"></a>

Coordinate index creation and other maintenance tasks to prevent resource contention. Consider scheduling these operations during off-peak hours. Avoid scheduling heavy maintenance (e.g., parallel index builds) during periods of high user query load. These operations can consume parallel workers and impact performance for regular queries.

### Utilize Query Plan Management (QPM)
<a name="apg-ipc-parallel-query-plan-management"></a>

In Aurora PostgreSQL, the Query Plan Management (QPM) feature is designed to ensure plan adaptability and stability, regardless of database environment changes that might cause query plan regression. For more information, see [Overview of Aurora PostgreSQL query plan management](AuroraPostgreSQL.Optimize.overview.md). QPM provides some control over the optimizer. Review approved plans in QPM to ensure that they align with current parallelism settings. Update or remove outdated plans that might be forcing suboptimal parallel execution.

You can also fix plans using pg_hint_plan. For more information, see [Fixing plans using pg_hint_plan](AuroraPostgreSQL.Optimize.Maintenance.md#AuroraPostgreSQL.Optimize.Maintenance.pg_hint_plan). You can use the hint named `Parallel` to enforce parallel execution. For more information, see [Hints for parallel plans](https://github.com/ossc-db/pg_hint_plan/blob/master/docs/hint_table.md#hints-for-parallel-plans).

# IPC:ProcArrayGroupUpdate
<a name="apg-rpg-ipcprocarraygroup"></a>

The `IPC:ProcArrayGroupUpdate` event occurs when a session is waiting for the group leader to update the transaction status at the end of that operation. While PostgreSQL generally associates IPC type wait events with parallel query operations, this particular wait event is not specific to parallel queries.

**Topics**
+ [Supported engine versions](#apg-rpg-ipcprocarraygroup.supported)
+ [Context](#apg-rpg-ipcprocarraygroup.context)
+ [Likely causes of increased waits](#apg-rpg-ipcprocarraygroup.causes)
+ [Actions](#apg-rpg-ipcprocarraygroup.actions)

## Supported engine versions
<a name="apg-rpg-ipcprocarraygroup.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-rpg-ipcprocarraygroup.context"></a>

**Understanding the process array** – The process (proc) array is a shared memory structure in PostgreSQL. It holds information about all running processes, including transaction details. During transaction completion (`COMMIT` or `ROLLBACK`), the ProcArray must be updated to reflect the change and clear the transaction ID from the array. The session attempting to finish its transaction must acquire an exclusive lock on the ProcArray, which prevents other processes from obtaining shared or exclusive locks on it.

**Group update mechanism** – While performing a `COMMIT` or `ROLLBACK`, if a backend process can't obtain the ProcArrayLock in exclusive mode, it sets a special field called `procArrayGroupMember`. This adds the transaction to the list of sessions that intend to end. The backend process then sleeps, and the time it sleeps is instrumented as the `ProcArrayGroupUpdate` wait event. The first process in the ProcArray with `procArrayGroupMember` set, referred to as the leader process, acquires the ProcArrayLock in exclusive mode. It then clears the list of processes waiting for group transaction ID clearing. When this completes, the leader releases the ProcArrayLock and wakes up all processes in the list, notifying them that their transactions are completed.

## Likely causes of increased waits
<a name="apg-rpg-ipcprocarraygroup.causes"></a>

The more processes that are running, the longer a leader holds the ProcArrayLock in exclusive mode. Consequently, more write transactions end up in a group update scenario, causing a potential pile-up of processes waiting on the `ProcArrayGroupUpdate` wait event. In the Database Insights Top SQL view, you typically see that `COMMIT` is the statement with the majority of this wait event. This is expected, but it requires deeper investigation into the specific write SQL being run to determine the appropriate action to take.

## Actions
<a name="apg-rpg-ipcprocarraygroup.actions"></a>

We recommend different actions depending on the causes of your wait event. Identify `IPC:ProcArrayGroupUpdate` events by using Amazon RDS Performance Insights or by querying the PostgreSQL system view `pg_stat_activity`.
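For example, a query similar to the following lists sessions that are currently waiting on this event:

```
SELECT pid, usename, state, query
  FROM pg_stat_activity
 WHERE wait_event_type = 'IPC'
   AND wait_event = 'ProcArrayGroupUpdate';
```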

**Topics**
+ [Monitoring transaction commit and rollback operations](#apg-rpg-ipcprocarraygroup.actions.monitor)
+ [Reducing concurrency](#apg-rpg-ipcprocarraygroup.actions.concurrency)
+ [Implementing connection pooling](#apg-rpg-ipcprocarraygroup.actions.pooling)

### Monitoring transaction commit and rollback operations
<a name="apg-rpg-ipcprocarraygroup.actions.monitor"></a>

**Monitor commits and rollbacks** – An increased number of commits and rollbacks can lead to increased pressure on the ProcArray. For example, if a SQL statement begins to fail due to increased duplicate key violations, you may see an increase in rollbacks which can increase ProcArray contention and table bloat.

Amazon RDS Database Insights provides the PostgreSQL metrics `xact_commit` and `xact_rollback` to report the number of commits and rollbacks per second.
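You can also sample the underlying counters directly from the `pg_stat_database` view. The counters are cumulative, so the per-second rates are the differences between two samples:

```
-- Cumulative commit and rollback counts for the current database
SELECT datname, xact_commit, xact_rollback
  FROM pg_stat_database
 WHERE datname = current_database();
```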

### Reducing concurrency
<a name="apg-rpg-ipcprocarraygroup.actions.concurrency"></a>

**Batching transactions** – Where possible, batch operations in single transactions to reduce commit/rollback operations.
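For example, assuming a hypothetical `orders` table, a single transaction can replace many per-row commits:

```
-- One commit for the whole batch instead of one commit per row
BEGIN;
INSERT INTO orders (id, total) VALUES (1, 10.00);
INSERT INTO orders (id, total) VALUES (2, 20.00);
INSERT INTO orders (id, total) VALUES (3, 30.00);
COMMIT;
```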

**Limit concurrency** – Reduce the number of concurrently active transactions to alleviate lock contention on the ProcArray. Although this requires some testing, reducing the total number of concurrent connections can reduce contention while maintaining throughput.

### Implementing connection pooling
<a name="apg-rpg-ipcprocarraygroup.actions.pooling"></a>

**Connection pooling solutions** – Use connection pooling to manage database connections efficiently, reducing the total number of backends and therefore the workload on the ProcArray.

For more information, see [Connection pooling for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.BestPractices.connection_pooling.html).

**Reduce connection storms** – Similarly, a pattern of frequently creating and terminating connections causes additional pressure on the ProcArray. By reducing this pattern, overall contention is reduced.

# Lock:advisory
<a name="apg-waits.lockadvisory"></a>

The `Lock:advisory` event occurs when a PostgreSQL application uses a lock to coordinate activity across multiple sessions.

**Topics**
+ [Relevant engine versions](#apg-waits.lockadvisory.context.supported)
+ [Context](#apg-waits.lockadvisory.context)
+ [Causes](#apg-waits.lockadvisory.causes)
+ [Actions](#apg-waits.lockadvisory.actions)

## Relevant engine versions
<a name="apg-waits.lockadvisory.context.supported"></a>

This wait event information is relevant for Aurora PostgreSQL versions 9.6 and higher.

## Context
<a name="apg-waits.lockadvisory.context"></a>

PostgreSQL advisory locks are application-level, cooperative locks explicitly locked and unlocked by the user's application code. An application can use PostgreSQL advisory locks to coordinate activity across multiple sessions. Unlike regular, object- or row-level locks, the application has full control over the lifetime of the lock. For more information, see [Advisory Locks](https://www.postgresql.org/docs/12/explicit-locking.html#ADVISORY-LOCKS) in the PostgreSQL documentation.

Advisory locks can be released before a transaction ends or be held by a session across transactions. This isn't true for implicit, system-enforced locks, such as an access-exclusive lock on a table acquired by a `CREATE INDEX` statement.

For a description of the functions used to acquire (lock) and release (unlock) advisory locks, see [Advisory Lock Functions](https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS) in the PostgreSQL documentation.
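The following sketch shows both the session-level and transaction-level forms of these functions, using an arbitrary lock key of 42:

```
-- Session-level lock: held until explicitly released or the session ends
SELECT pg_advisory_lock(42);
-- ... coordinated work ...
SELECT pg_advisory_unlock(42);

-- Transaction-level lock: released automatically at COMMIT or ROLLBACK
BEGIN;
SELECT pg_advisory_xact_lock(42);
-- ... coordinated work ...
COMMIT;
```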

Advisory locks are implemented on top of the regular PostgreSQL locking system and are visible in the `pg_locks` system view.

## Causes
<a name="apg-waits.lockadvisory.causes"></a>

This lock type is exclusively controlled by an application explicitly using it. Advisory locks that are acquired for each row as part of a query can cause a spike in locks or a long-term buildup.

These effects happen when the query is run in a way that acquires locks on more rows than are returned by the query. The application must eventually release every lock, but if locks are acquired on rows that aren't returned, the application can't find all of the locks.

The following example is from [Advisory Locks](https://www.postgresql.org/docs/12/explicit-locking.html#ADVISORY-LOCKS) in the PostgreSQL documentation.

```
SELECT pg_advisory_lock(id) FROM foo WHERE id > 12345 LIMIT 100;
```

In this example, the `LIMIT` clause can only stop the query's output after the rows have already been internally selected and their ID values locked. This can happen suddenly when a growing data volume causes the planner to choose a different execution plan that wasn't tested during development. The buildup in this case happens because the application explicitly calls `pg_advisory_unlock` for every ID value that was returned. However, it can't find the set of locks acquired on rows that weren't returned. Because the locks are acquired at the session level, they aren't released automatically at the end of the transaction.

Another possible cause for spikes in blocked lock attempts is unintended conflicts. In these conflicts, unrelated parts of the application share the same lock ID space by mistake.

## Actions
<a name="apg-waits.lockadvisory.actions"></a>

Review application usage of advisory locks and detail where and when in the application flow each type of advisory lock is acquired and released.

Determine whether a session is acquiring too many locks or whether a long-running session isn't releasing locks early enough, leading to a slow buildup. You can correct a slow buildup of session-level locks by ending the session using `pg_terminate_backend(pid)`.

A client waiting for an advisory lock appears in `pg_stat_activity` with `wait_event_type=Lock` and `wait_event=advisory`. You can obtain specific lock values by querying the `pg_locks` system view for the same `pid`, looking for `locktype=advisory` and `granted=f`.

You can then identify the blocking session by querying `pg_locks` for the same advisory lock having `granted=t`, as shown in the following example.

```
SELECT blocked_locks.pid AS blocked_pid,
         blocking_locks.pid AS blocking_pid,
         blocked_activity.usename AS blocked_user,
         blocking_activity.usename AS blocking_user,
         now() - blocked_activity.xact_start AS blocked_transaction_duration,
         now() - blocking_activity.xact_start AS blocking_transaction_duration,
         concat(blocked_activity.wait_event_type,':',blocked_activity.wait_event) AS blocked_wait_event,
         concat(blocking_activity.wait_event_type,':',blocking_activity.wait_event) AS blocking_wait_event,
         blocked_activity.state AS blocked_state,
         blocking_activity.state AS blocking_state,
         blocked_locks.locktype AS blocked_locktype,
         blocking_locks.locktype AS blocking_locktype,
         blocked_activity.query AS blocked_statement,
         blocking_activity.query AS blocking_statement
    FROM pg_catalog.pg_locks blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks blocking_locks
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
    WHERE NOT blocked_locks.GRANTED;
```

All of the advisory lock API functions come in two forms, taking either one `bigint` argument or two `integer` arguments:
+ For the API functions with one `bigint` argument, the upper 32 bits are in `pg_locks.classid` and the lower 32 bits are in `pg_locks.objid`.
+ For the API functions with two `integer` arguments, the first argument is `pg_locks.classid` and the second argument is `pg_locks.objid`.

The `pg_locks.objsubid` value indicates which API form was used: `1` means one `bigint` argument; `2` means two `integer` arguments.
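For example, the following sketch shows how a single `bigint` key maps onto these columns:

```
-- Acquire a lock whose upper 32 bits are 5 and lower 32 bits are 10
SELECT pg_advisory_lock((5::bigint << 32) | 10);

-- For this lock: classid = 5, objid = 10, objsubid = 1 (one bigint argument)
SELECT classid, objid, objsubid
  FROM pg_locks
 WHERE locktype = 'advisory';
```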

# Lock:extend
<a name="apg-waits.lockextend"></a>

The `Lock:extend` event occurs when a backend process is waiting to lock a relation to extend it while another process has a lock on that relation for the same purpose.

**Topics**
+ [Supported engine versions](#apg-waits.lockextend.context.supported)
+ [Context](#apg-waits.lockextend.context)
+ [Likely causes of increased waits](#apg-waits.lockextend.causes)
+ [Actions](#apg-waits.lockextend.actions)

## Supported engine versions
<a name="apg-waits.lockextend.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.lockextend.context"></a>

The event `Lock:extend` indicates that a backend process is waiting to extend a relation while another backend process holds the extension lock on that relation. Because only one process at a time can extend a relation, the system generates a `Lock:extend` wait event. `INSERT`, `COPY`, and `UPDATE` operations can generate this event.

## Likely causes of increased waits
<a name="apg-waits.lockextend.causes"></a>

When the `Lock:extend` event appears more than normal, possibly indicating a performance problem, typical causes include the following:

**Surge in concurrent inserts or updates to the same table**  
There might be an increase in the number of concurrent sessions with queries that insert into or update the same table.

**Insufficient network bandwidth**  
The network bandwidth on the DB instance might be insufficient for the storage communication needs of the current workload. This can contribute to storage latency that causes an increase in `Lock:extend` events.

## Actions
<a name="apg-waits.lockextend.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Reduce concurrent inserts and updates to the same relation](#apg-waits.lockextend.actions.action1)
+ [Increase network bandwidth](#apg-waits.lockextend.actions.increase-network-bandwidth)

### Reduce concurrent inserts and updates to the same relation
<a name="apg-waits.lockextend.actions.action1"></a>

First, determine whether there's an increase in the `tup_inserted` and `tup_updated` metrics and an accompanying increase in this wait event. If so, check which relations are in high contention for insert and update operations. To determine this, query the `pg_stat_all_tables` view for the values in the `n_tup_ins` and `n_tup_upd` columns. For information about the `pg_stat_all_tables` view, see [pg_stat_all_tables](https://www.postgresql.org/docs/13/monitoring-stats.html#MONITORING-PG-STAT-ALL-TABLES-VIEW) in the PostgreSQL documentation.

To get more information about blocking and blocked queries, query `pg_stat_activity` as in the following example:

```
SELECT
    blocked.pid,
    blocked.usename,
    blocked.query,
    blocking.pid AS blocking_id,
    blocking.query AS blocking_query,
    blocking.wait_event AS blocking_wait_event,
    blocking.wait_event_type AS blocking_wait_event_type
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
where
blocked.wait_event = 'extend'
and blocked.wait_event_type = 'Lock';
 
   pid  | usename  |            query             | blocking_id |                         blocking_query                           | blocking_wait_event | blocking_wait_event_type
  ------+----------+------------------------------+-------------+------------------------------------------------------------------+---------------------+--------------------------
   7143 |  myuser  | insert into tab1 values (1); |        4600 | INSERT INTO tab1 (a) SELECT s FROM generate_series(1,1000000) s; | DataFileExtend      | IO
```

After you identify relations that contribute to increased `Lock:extend` events, use the following techniques to reduce the contention:
+ Find out whether you can use partitioning to reduce contention for the same table. Separating inserted or updated tuples into different partitions can reduce contention. For information about partitioning, see [Managing PostgreSQL partitions with the pg_partman extension](PostgreSQL_Partitions.md).
+ If the wait event is mainly due to update activity, consider reducing the relation's fillfactor value. This can reduce requests for new blocks during the update. The fillfactor is a storage parameter for a table that determines the maximum amount of space for packing a table page. It's expressed as a percentage of the total space for a page. For more information about the fillfactor parameter, see [CREATE TABLE](https://www.postgresql.org/docs/13/sql-createtable.html) in the PostgreSQL documentation. 
**Important**  
We highly recommend that you test your system if you change the fillfactor because changing this value can negatively impact performance, depending on your workload.
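For example, assuming a hypothetical table named `tab1`, you might lower the fillfactor as follows. The new value applies only to pages written afterward:

```
-- Leave 10% of each page free for future updates
ALTER TABLE tab1 SET (fillfactor = 90);

-- Rewriting the table applies the new fillfactor to existing pages,
-- but VACUUM FULL takes an ACCESS EXCLUSIVE lock; run it off-peak.
VACUUM FULL tab1;
```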

### Increase network bandwidth
<a name="apg-waits.lockextend.actions.increase-network-bandwidth"></a>

To see whether there's an increase in write latency, check the `WriteLatency` metric in CloudWatch. If there is, use the `WriteThroughput` and `ReadThroughput` Amazon CloudWatch metrics to monitor the storage related traffic on the DB cluster. These metrics can help you to determine if network bandwidth is sufficient for the storage activity of your workload.

If your network bandwidth isn't enough, increase it. If your DB instance is reaching the network bandwidth limits, the only way to increase the bandwidth is to increase your DB instance size.

For more information about CloudWatch metrics, see [Amazon CloudWatch metrics for Amazon Aurora](Aurora.AuroraMonitoring.Metrics.md). For information about network performance for each DB instance class, see [Hardware specifications for DB instance classes for Aurora](Concepts.DBInstanceClass.Summary.md).

# Lock:Relation
<a name="apg-waits.lockrelation"></a>

The `Lock:Relation` event occurs when a query is waiting to acquire a lock on a table or view (relation) that's currently locked by another transaction.

**Topics**
+ [Supported engine versions](#apg-waits.lockrelation.context.supported)
+ [Context](#apg-waits.lockrelation.context)
+ [Likely causes of increased waits](#apg-waits.lockrelation.causes)
+ [Actions](#apg-waits.lockrelation.actions)

## Supported engine versions
<a name="apg-waits.lockrelation.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.lockrelation.context"></a>

Most PostgreSQL commands implicitly use locks to control concurrent access to data in tables. You can also use these locks explicitly in your application code with the `LOCK` command. Many lock modes aren't compatible with each other, and they can block transactions when they're trying to access the same object. When this happens, Aurora PostgreSQL generates a `Lock:Relation` event. Some common examples are the following:
+ Exclusive locks such as `ACCESS EXCLUSIVE` can block all concurrent access. Data definition language (DDL) operations such as `DROP TABLE`, `TRUNCATE`, `VACUUM FULL`, and `CLUSTER` acquire `ACCESS EXCLUSIVE` locks implicitly. `ACCESS EXCLUSIVE` is also the default lock mode for `LOCK TABLE` statements that don't specify a mode explicitly.
+ Using `CREATE INDEX` (without `CONCURRENTLY`) on a table conflicts with data manipulation language (DML) statements `UPDATE`, `DELETE`, and `INSERT`, which acquire `ROW EXCLUSIVE` locks.
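As a sketch, the two statements below are alternatives for a hypothetical table `tab1`. Building with `CONCURRENTLY` avoids blocking DML, at the cost of a slower build that can't run inside a transaction block:

```
-- Blocks concurrent INSERT, UPDATE, and DELETE on tab1 while it runs
CREATE INDEX idx_tab1_b ON tab1 (b);

-- Allows concurrent DML while the index is built
CREATE INDEX CONCURRENTLY idx_tab1_b ON tab1 (b);
```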

For more information about table-level locks and conflicting lock modes, see [Explicit Locking](https://www.postgresql.org/docs/13/explicit-locking.html) in the PostgreSQL documentation.

Blocking queries and transactions typically unblock in one of the following ways:
+ Blocking query – The application can cancel the query or the user can end the process. The engine can also force the query to end because of the session's `statement_timeout` setting or the deadlock detection mechanism.
+ Blocking transaction – A transaction stops blocking when it runs a `ROLLBACK` or `COMMIT` statement. Rollbacks also happen automatically when sessions are disconnected by a client or by network issues, or are ended. Sessions can be ended when the database engine is shut down, when the system is out of memory, and so forth.

## Likely causes of increased waits
<a name="apg-waits.lockrelation.causes"></a>

When the `Lock:Relation` event occurs more frequently than normal, it can indicate a performance issue. Typical causes include the following:

**Increased concurrent sessions with conflicting table locks**  
There might be an increase in the number of concurrent sessions with queries that lock the same table with conflicting locking modes.

**Maintenance operations**  
Health maintenance operations such as `VACUUM` and `ANALYZE` can significantly increase the number of conflicting locks. `VACUUM FULL` acquires an `ACCESS EXCLUSIVE` lock, and `ANALYZE` acquires a `SHARE UPDATE EXCLUSIVE` lock. Both types of locks can cause a `Lock:Relation` wait event. Application data maintenance operations such as refreshing a materialized view can also increase blocked queries and transactions.

**Locks on reader instances**  
There might be a conflict between the relation locks held by the writer and readers. Currently, only `ACCESS EXCLUSIVE` relation locks are replicated to reader instances. However, the `ACCESS EXCLUSIVE` relation lock conflicts with any `ACCESS SHARE` relation locks held by the reader. This can cause an increase in `Lock:Relation` wait events on the reader.

## Actions
<a name="apg-waits.lockrelation.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Reduce the impact of blocking SQL statements](#apg-waits.lockrelation.actions.reduce-blocks)
+ [Minimize the effect of maintenance operations](#apg-waits.lockrelation.actions.maintenance)
+ [Check for reader locks](#apg-waits.lockrelation.actions.readerlocks)

### Reduce the impact of blocking SQL statements
<a name="apg-waits.lockrelation.actions.reduce-blocks"></a>

To reduce the impact of blocking SQL statements, modify your application code where possible. Following are two common techniques for reducing blocks:
+ Use the `NOWAIT` option – Some SQL commands, such as `SELECT` and `LOCK` statements, support this option. The `NOWAIT` directive cancels the lock-requesting query if the lock can't be acquired immediately. This technique can help prevent a blocking session from causing a pile-up of blocked sessions behind it.

  For example: Assume that transaction A is waiting on a lock held by transaction B. Now, if B requests a lock on a table that's locked by transaction C, transaction A might be blocked until transaction C completes. But if transaction B uses `NOWAIT` when it requests that lock, it can fail fast and ensure that transaction A doesn't wait indefinitely.
+ Use `SET lock_timeout` – Set a `lock_timeout` value to limit the time a SQL statement waits to acquire a lock on a relation. If the lock isn't acquired within the timeout specified, the transaction requesting the lock is canceled. Set this value at the session level.
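The following sketch shows both techniques against a hypothetical table `t1`:

```
-- Fail immediately if the row locks can't be acquired
SELECT * FROM t1 FOR UPDATE NOWAIT;

-- Or cap lock waits for this session at 100 milliseconds
SET lock_timeout = '100ms';
BEGIN;
LOCK TABLE t1 IN ACCESS EXCLUSIVE MODE;  -- canceled after 100ms if blocked
COMMIT;
```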

### Minimize the effect of maintenance operations
<a name="apg-waits.lockrelation.actions.maintenance"></a>

Maintenance operations such as `VACUUM` and `ANALYZE` are important. We recommend that you don't turn them off because you find `Lock:Relation` wait events related to these maintenance operations. The following approaches can minimize the effect of these operations:
+ Run maintenance operations manually during off-peak hours.
+ To reduce `Lock:Relation` waits caused by autovacuum tasks, perform any needed autovacuum tuning. For information about tuning autovacuum, see [Working with PostgreSQL autovacuum on Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum.html) in the *Amazon RDS User Guide*.

### Check for reader locks
<a name="apg-waits.lockrelation.actions.readerlocks"></a>

You can see how concurrent sessions on a writer and readers might be holding locks that block each other. One way to do this is by running queries that return the lock type and relation. The following table shows a sequence of queries from two such concurrent sessions, a writer session and a reader session.

The replay process waits for the duration of `max_standby_streaming_delay` before canceling the reader query. As shown in the example, the lock timeout of 100 milliseconds is well below the default `max_standby_streaming_delay` of 30 seconds. The lock times out before it becomes an issue.


| Sequence event | Session | Command or Output | 
| --- | --- | --- | 
|  Sets an environment variable called READER with the specified value and tries to connect to the DB instance with this endpoint.  |  Reader session  |  CLI command: <pre>export READER=aurorapg2.12345678910.us-west-1.rds.amazonaws.com<br /><br />psql -h $READER</pre> Output: 

```
psql (15devel, server 10.14)
Type "help" for help.
```  | 
|  Sets an environment variable called WRITER and tries to connect to the DB instance with this endpoint .  |  Writer session  |  CLI command: <pre>export WRITER=aurorapg1.12345678910.us-west-1.rds.amazonaws.com<br />psql -h $WRITER</pre> Output: 

```
psql (15devel, server 10.14) 
Type "help" for help.
```  | 
|  The writer session creates table t1 on the writer instance.  |  Writer session  |  PostgreSQL query: <pre>postgres=> CREATE TABLE t1(b integer);<br />CREATE TABLE</pre>  | 
|  If there are no conflicting queries on the writer, the ACCESS EXCLUSIVE lock is acquired on the writer immediately.  |  Writer session  |  `ACCESS EXCLUSIVE` lock enabled  | 
|  The reader session sets a lock timeout interval of 100 milliseconds.  |  Reader session  |  PostgreSQL query: <pre>postgres=> SET lock_timeout=100;<br />SET</pre>  | 
|  The reader session tries to read data from table t1 on the reader instance.  |  Reader session  |  PostgreSQL query: <pre>postgres=> SELECT * FROM t1;</pre> Sample output: 

```
b
---
(0 rows)
```  | 
|  The writer session drops t1.  |  Writer session  |  PostgreSQL query: <pre>postgres=> BEGIN;<br />BEGIN<br />postgres=> DROP TABLE t1;<br />DROP TABLE<br />postgres=></pre>  | 
|  The query times out and is canceled on the reader.  |  Reader session  |  PostgreSQL query: <pre>postgres=> SELECT * FROM t1;</pre> Sample output: 

```
ERROR:  canceling statement due to lock timeout
LINE 1: SELECT * FROM t1;
                      ^
```  | 
|  To determine the cause of the error, the reader session queries `pg_locks` and `pg_stat_activity`.  |  Reader session  |  PostgreSQL query: <pre>postgres=> SELECT locktype, relation, mode, backend_type<br />postgres=> FROM pg_locks l, pg_stat_activity t1<br />postgres=> WHERE l.pid=t1.pid AND relation = 't1'::regclass::oid;</pre>  | 
|  The result indicates that the `aurora wal replay` process is holding an `ACCESS EXCLUSIVE` lock on table t1.  |  Reader session  |  Query result: <pre> locktype | relation |        mode         |   backend_type<br />----------+----------+---------------------+-------------------<br /> relation | 68628525 | AccessExclusiveLock | aurora wal replay<br />(1 row)</pre>  | 

# Lock:transactionid
<a name="apg-waits.locktransactionid"></a>

The `Lock:transactionid` event occurs when a transaction is waiting for a row-level lock.

**Topics**
+ [Supported engine versions](#apg-waits.locktransactionid.context.supported)
+ [Context](#apg-waits.locktransactionid.context)
+ [Likely causes of increased waits](#apg-waits.locktransactionid.causes)
+ [Actions](#apg-waits.locktransactionid.actions)

## Supported engine versions
<a name="apg-waits.locktransactionid.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.locktransactionid.context"></a>

The event `Lock:transactionid` occurs when a transaction is trying to acquire a row-level lock that has already been granted to a transaction that is running at the same time. The session that shows the `Lock:transactionid` wait event is blocked because of this lock. After the blocking transaction ends in either a `COMMIT` or `ROLLBACK` statement, the blocked transaction can proceed.

The multiversion concurrency control semantics of Aurora PostgreSQL guarantee that readers don't block writers and writers don't block readers. For row-level conflicts to occur, blocking and blocked transactions must issue conflicting statements of the following types:
+ `UPDATE`
+ `SELECT … FOR UPDATE`
+ `SELECT … FOR KEY SHARE`

The statement `SELECT … FOR KEY SHARE` is a special case. The database uses the clause `FOR KEY SHARE` to optimize the performance of referential integrity. A row-level lock on a row can block `INSERT`, `UPDATE`, and `DELETE` commands on other tables that reference the row.
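The following sketch, using hypothetical `parent` and `child` tables, shows how a row lock on a referenced row can block an insert into the referencing table:

```
CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child  (id integer, parent_id integer REFERENCES parent (id));
INSERT INTO parent VALUES (1);

-- Session 1: takes a row-level lock on the parent row
BEGIN;
SELECT * FROM parent WHERE id = 1 FOR UPDATE;

-- Session 2: the foreign-key check needs a KEY SHARE lock on parent id = 1,
-- so this INSERT waits on Lock:transactionid until session 1 ends
INSERT INTO child VALUES (1, 1);
```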

## Likely causes of increased waits
<a name="apg-waits.locktransactionid.causes"></a>

When this event appears more than normal, the cause is typically `UPDATE`, `SELECT … FOR UPDATE`, or `SELECT … FOR KEY SHARE` statements combined with the following conditions.

**Topics**
+ [High concurrency](#apg-waits.locktransactionid.concurrency)
+ [Idle in transaction](#apg-waits.locktransactionid.idle)
+ [Long-running transactions](#apg-waits.locktransactionid.long-running)

### High concurrency
<a name="apg-waits.locktransactionid.concurrency"></a>

Aurora PostgreSQL uses granular row-level locking semantics. The probability of row-level conflicts increases when the following conditions are met:
+ A highly concurrent workload contends for the same rows.
+ Concurrency increases.

### Idle in transaction
<a name="apg-waits.locktransactionid.idle"></a>

Sometimes the `pg_stat_activity.state` column shows the value `idle in transaction`. This value appears for sessions that have started a transaction, but haven't yet issued a `COMMIT` or `ROLLBACK`. If the `pg_stat_activity.state` value isn't `active`, the query shown in `pg_stat_activity` is the most recent one to finish running. The blocking session isn't actively processing a query because an open transaction is holding a lock.

If an idle transaction acquired a row-level lock, it might be preventing other sessions from acquiring it. This condition leads to frequent occurrence of the wait event `Lock:transactionid`. To diagnose the issue, examine the output from `pg_stat_activity` and `pg_locks`.
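For example, a query similar to the following finds the sessions that have been idle in an open transaction the longest:

```
SELECT pid, usename, now() - state_change AS idle_duration, query
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY idle_duration DESC;
```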

### Long-running transactions
<a name="apg-waits.locktransactionid.long-running"></a>

Transactions that run for a long time get locks for a long time. These long-held locks can block other transactions from running.
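For example, a query similar to the following lists the longest-running open transactions:

```
SELECT pid, usename, now() - xact_start AS transaction_duration, state, query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY transaction_duration DESC
 LIMIT 10;
```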

## Actions
<a name="apg-waits.locktransactionid.actions"></a>

Row-locking is a conflict among `UPDATE`, `SELECT … FOR UPDATE`, or `SELECT … FOR KEY SHARE` statements. Before attempting a solution, find out when these statements are running on the same row. Use this information to choose a strategy described in the following sections.

**Topics**
+ [Respond to high concurrency](#apg-waits.locktransactionid.actions.problem)
+ [Respond to idle transactions](#apg-waits.locktransactionid.actions.find-blocker)
+ [Respond to long-running transactions](#apg-waits.locktransactionid.actions.concurrency)

### Respond to high concurrency
<a name="apg-waits.locktransactionid.actions.problem"></a>

If concurrency is the issue, try one of the following techniques:
+ Lower the concurrency in the application. For example, decrease the number of active sessions.
+ Implement a connection pool. To learn how to pool connections with RDS Proxy, see [Amazon RDS Proxy for Aurora](rds-proxy.md).
+ Design the application or data model to avoid contending `UPDATE` and `SELECT … FOR UPDATE` statements. You can also decrease the number of foreign keys accessed by `SELECT … FOR KEY SHARE` statements.

### Respond to idle transactions
<a name="apg-waits.locktransactionid.actions.find-blocker"></a>

If `pg_stat_activity.state` shows `idle in transaction`, use the following strategies:
+ Turn on autocommit wherever possible. This approach prevents transactions from blocking other transactions while waiting for a `COMMIT` or `ROLLBACK`.
+ Search for code paths that are missing `COMMIT`, `ROLLBACK`, or `END`.
+ Make sure that the exception handling logic in your application always has a path to a valid end of transaction (`COMMIT` or `ROLLBACK`).
+ Make sure that your application processes query results after ending the transaction with `COMMIT` or `ROLLBACK`.

### Respond to long-running transactions
<a name="apg-waits.locktransactionid.actions.concurrency"></a>

If long-running transactions are causing the frequent occurrence of `Lock:transactionid`, try the following strategies:
+ Keep row locks out of long-running transactions.
+ Limit the length of queries by implementing autocommit whenever possible.

# Lock:tuple
<a name="apg-waits.locktuple"></a>

The `Lock:tuple` event occurs when a backend process is waiting to acquire a lock on a tuple.

**Topics**
+ [Supported engine versions](#apg-waits.locktuple.context.supported)
+ [Context](#apg-waits.locktuple.context)
+ [Likely causes of increased waits](#apg-waits.locktuple.causes)
+ [Actions](#apg-waits.locktuple.actions)

## Supported engine versions
<a name="apg-waits.locktuple.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.locktuple.context"></a>

The event `Lock:tuple` indicates that a backend is waiting to acquire a lock on a tuple while another backend holds a conflicting lock on the same tuple. The following table illustrates a scenario in which sessions generate the `Lock:tuple` event.


|  Time  |  Session 1  |  Session 2  |  Session 3  | 
| --- | --- | --- | --- | 
|  t1  |  Starts a transaction.  |    |    | 
|  t2  |  Updates row 1.  |    |    | 
|  t3  |    |  Updates row 1. The session acquires an exclusive lock on the tuple and then waits for session 1 to release the lock by committing or rolling back.  |    | 
|  t4  |    |    |  Updates row 1. The session waits for session 2 to release the exclusive lock on the tuple.  | 

You can also simulate this wait event by using the benchmarking tool `pgbench`. Configure a high number of concurrent sessions to update the same row in a table with a custom SQL file.
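For example, the following sketch uses a custom script so that every `pgbench` client updates the same row. The table, column, and database names are hypothetical:

```
-- hot_row.sql: every client contends for the same tuple
UPDATE tab SET col = col + 1 WHERE id = 1;
```

Running this script with many clients, for example `pgbench -f hot_row.sql -c 50 -T 60 mydb`, typically produces `Lock:tuple` waits.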

To learn more about conflicting lock modes, see [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html) in the PostgreSQL documentation. To learn more about `pgbench`, see [pgbench](https://www.postgresql.org/docs/current/pgbench.html) in the PostgreSQL documentation.

## Likely causes of increased waits
<a name="apg-waits.locktuple.causes"></a>

When this event appears more than normal, possibly indicating a performance problem, typical causes include the following:
+ A high number of concurrent sessions are trying to acquire a conflicting lock for the same tuple by running `UPDATE` or `DELETE` statements.
+ Highly concurrent sessions are running a `SELECT` statement using the `FOR UPDATE` or `FOR NO KEY UPDATE` lock modes.
+ Various factors drive application or connection pools to open more sessions to execute the same operations. As new sessions are trying to modify the same rows, DB load can spike, and `Lock:tuple` can appear.

For more information, see [Row-Level Locks](https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS) in the PostgreSQL documentation.

## Actions
<a name="apg-waits.locktuple.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Investigate your application logic](#apg-waits.locktuple.actions.problem)
+ [Find the blocker session](#apg-waits.locktuple.actions.find-blocker)
+ [Reduce concurrency when it is high](#apg-waits.locktuple.actions.concurrency)
+ [Troubleshoot bottlenecks](#apg-waits.locktuple.actions.bottlenecks)

### Investigate your application logic
<a name="apg-waits.locktuple.actions.problem"></a>

Find out whether a blocker session has been in the `idle in transaction` state for a long time. If so, consider ending the blocker session as a short-term solution. You can use the `pg_terminate_backend` function. For more information about this function, see [Server Signaling Functions](https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL) in the PostgreSQL documentation.

For a long-term solution, do the following:
+ Adjust the application logic.
+ Use the `idle_in_transaction_session_timeout` parameter. This parameter ends any session with an open transaction that has been idle for longer than the specified amount of time. For more information, see [Client Connection Defaults](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) in the PostgreSQL documentation.
+ Use autocommit as much as possible. For more information, see [SET AUTOCOMMIT](https://www.postgresql.org/docs/current/ecpg-sql-set-autocommit.html) in the PostgreSQL documentation.
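For example, the following statements set the timeout. The database name and 60-second value are only illustrative; on Aurora, you can also set this parameter in a DB parameter group:

```
-- For the current session:
SET idle_in_transaction_session_timeout = '60s';

-- As a default for future connections to a specific database:
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '60s';
```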

### Find the blocker session
<a name="apg-waits.locktuple.actions.find-blocker"></a>

While the `Lock:tuple` wait event is occurring, identify the blocker and blocked session by finding out which locks depend on one another. For more information, see [Lock dependency information](https://wiki.postgresql.org/wiki/Lock_dependency_information) in the PostgreSQL wiki. To analyze past `Lock:tuple` events, use the Aurora function `aurora_stat_backend_waits`. 

The following example queries all sessions, filtering on `tuple` and ordering by `wait_time`.

```
--AURORA_STAT_BACKEND_WAITS
      SELECT a.pid, 
             a.usename, 
             a.app_name, 
             a.current_query,
             a.current_wait_type, 
             a.current_wait_event, 
             a.current_state, 
             wt.type_name AS wait_type, 
             we.event_name AS wait_event, 
             a.waits, 
             a.wait_time
        FROM (SELECT pid, 
                     usename, 
                     left(application_name,16) AS app_name,
                     coalesce(wait_event_type,'CPU') AS current_wait_type,
                     coalesce(wait_event,'CPU') AS current_wait_event, 
                     state AS current_state,
                     left(query,80) as current_query,
                     (aurora_stat_backend_waits(pid)).* 
                FROM pg_stat_activity 
               WHERE pid <> pg_backend_pid()
                 AND usename<>'rdsadmin') a
NATURAL JOIN aurora_stat_wait_type() wt 
NATURAL JOIN aurora_stat_wait_event() we
WHERE we.event_name = 'tuple'
    ORDER BY a.wait_time;

  pid  | usename | app_name |                 current_query                  | current_wait_type | current_wait_event | current_state | wait_type | wait_event | waits | wait_time
-------+---------+----------+------------------------------------------------+-------------------+--------------------+---------------+-----------+------------+-------+-----------
 32136 | sys     | psql     | /*session3*/ update tab set col=1 where col=1; | Lock              | tuple              | active        | Lock      | tuple      |     1 |   1000018
 11999 | sys     | psql     | /*session4*/ update tab set col=1 where col=1; | Lock              | tuple              | active        | Lock      | tuple      |     1 |   1000024
```

### Reduce concurrency when it is high
<a name="apg-waits.locktuple.actions.concurrency"></a>

The `Lock:tuple` event might occur constantly, especially during busy workload periods. In this situation, consider reducing the high concurrency for very busy rows. Often, just a few rows control a queue or Boolean application logic, which makes these rows very busy.

You can reduce concurrency by using different approaches based on your business requirements, application logic, and workload type. For example, you can do the following:
+ Redesign your table and data logic to reduce high concurrency.
+ Change the application logic to reduce high concurrency at the row level.
+ Leverage and redesign queries with row-level locks.
+ Use the `NOWAIT` clause with retry operations.
+ Consider using optimistic and hybrid-locking logic concurrency control.
+ Consider changing the database isolation level.
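For example, `NOWAIT` and `SKIP LOCKED` change how `SELECT … FOR UPDATE` behaves when a row is already locked. The `jobs` table here is hypothetical:

```
-- Fail immediately with an error instead of waiting for the row lock:
SELECT * FROM jobs WHERE id = 42 FOR UPDATE NOWAIT;

-- For queue-style workloads, skip rows that other sessions have locked:
SELECT * FROM jobs
 WHERE status = 'pending'
 ORDER BY id
 FOR UPDATE SKIP LOCKED
 LIMIT 1;
```

With `NOWAIT`, the application is responsible for retrying the statement, so pair it with retry logic.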

### Troubleshoot bottlenecks
<a name="apg-waits.locktuple.actions.bottlenecks"></a>

The `Lock:tuple` event can occur with bottlenecks such as CPU starvation or maximum usage of Amazon EBS bandwidth. To reduce bottlenecks, consider the following approaches:
+ Scale up your instance class type.
+ Optimize resource-intensive queries.
+ Change the application logic.
+ Archive data that is rarely accessed.

# LWLock:buffer\_content (BufferContent)
<a name="apg-waits.lockbuffercontent"></a>

The `LWLock:buffer_content` event occurs when a session is waiting to read or write a data page in memory while another session has that page locked for writing. In Aurora PostgreSQL 13 and higher, this wait event is called `BufferContent`.

**Topics**
+ [Supported engine versions](#apg-waits.lockbuffercontent.context.supported)
+ [Context](#apg-waits.lockbuffercontent.context)
+ [Likely causes of increased waits](#apg-waits.lockbuffercontent.causes)
+ [Actions](#apg-waits.lockbuffercontent.actions)

## Supported engine versions
<a name="apg-waits.lockbuffercontent.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.lockbuffercontent.context"></a>

To read or manipulate data, PostgreSQL accesses it through shared memory buffers. To read from the buffer, a process gets a lightweight lock (LWLock) on the buffer content in shared mode. To write to the buffer, it gets that lock in exclusive mode. Shared locks allow other processes to concurrently acquire shared locks on that content. Exclusive locks prevent other processes from getting any type of lock on it.

The `LWLock:buffer_content` (`BufferContent`) event indicates that multiple processes are attempting to get lightweight locks (LWLocks) on contents of a specific buffer.

## Likely causes of increased waits
<a name="apg-waits.lockbuffercontent.causes"></a>

When the `LWLock:buffer_content` (`BufferContent`) event appears more than normal, possibly indicating a performance problem, typical causes include the following:

**Increased concurrent updates to the same data**  
There might be an increase in the number of concurrent sessions with queries that update the same buffer content. This contention can be more pronounced on tables with a lot of indexes.

**Workload data is not in memory**  
When data that the active workload is processing is not in memory, these wait events can increase. This effect is because processes holding locks can keep them longer while they perform disk I/O operations.

**Excessive use of foreign key constraints**  
Foreign key constraints can increase the amount of time a process holds onto a buffer content lock. This effect is because read operations require a shared buffer content lock on the referenced key while that key is being updated.

## Actions
<a name="apg-waits.lockbuffercontent.actions"></a>

We recommend different actions depending on the causes of your wait event. You might identify `LWLock:buffer_content` (`BufferContent`) events by using Amazon RDS Performance Insights or by querying the view `pg_stat_activity`.

**Topics**
+ [Improve in-memory efficiency](#apg-waits.lockbuffercontent.actions.in-memory)
+ [Reduce usage of foreign key constraints](#apg-waits.lockbuffercontent.actions.foreignkey)
+ [Remove unused indexes](#apg-waits.lockbuffercontent.actions.indexes)
+ [Remove duplicate indexes](#apg-waits.lockbuffercontent.actions.duplicate-indexes)
+ [Drop or REINDEX invalid indexes](#apg-waits.lockbuffercontent.actions.invalid-indexes)
+ [Use partial indexes](#apg-waits.lockbuffercontent.actions.partial-indexes)
+ [Remove table and index bloat](#apg-waits.lockbuffercontent.actions.bloat)

### Improve in-memory efficiency
<a name="apg-waits.lockbuffercontent.actions.in-memory"></a>

To increase the chance that active workload data is in memory, partition tables or scale up your instance class. For information about DB instance classes, see [Amazon Aurora DB instance classes](Concepts.DBInstanceClass.md).

Monitor the `BufferCacheHitRatio` metric, which measures the percentage of requests served by the buffer cache of a DB instance in your DB cluster. This metric provides insight into the amount of data being served from memory. A high hit ratio indicates that your DB instance has sufficient memory available for your working data set, while a low ratio suggests that your queries are frequently accessing data from storage.

The cache read hit per table and cache read hit per index sections, under the Memory settings section of the [PG Collector](https://github.com/awslabs/pg-collector) report, can provide insights into table and index cache hit ratios.

### Reduce usage of foreign key constraints
<a name="apg-waits.lockbuffercontent.actions.foreignkey"></a>

Investigate workloads experiencing high numbers of `LWLock:buffer_content` (`BufferContent`) wait events for usage of foreign key constraints. Remove unnecessary foreign key constraints.

### Remove unused indexes
<a name="apg-waits.lockbuffercontent.actions.indexes"></a>

For workloads experiencing high numbers of `LWLock:buffer_content` (`BufferContent`) wait events, identify unused indexes and remove them.

The unused indexes section of the [PG Collector](https://github.com/awslabs/pg-collector) report can provide insights into the unused indexes in the database.

### Remove duplicate indexes
<a name="apg-waits.lockbuffercontent.actions.duplicate-indexes"></a>

Identify duplicate indexes and remove them.

The duplicate indexes section of the [PG Collector](https://github.com/awslabs/pg-collector) report can provide insights into the duplicate indexes in the database.
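As a starting point, the following query flags indexes on the same table with identical column lists. It's a simplified check that ignores differences in operator classes, collations, and partial-index predicates, so review the results before dropping anything:

```
SELECT indrelid::regclass              AS table_name,
       array_agg(indexrelid::regclass) AS possible_duplicates
  FROM pg_index
 GROUP BY indrelid, indkey
HAVING count(*) > 1;
```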

### Drop or REINDEX invalid indexes
<a name="apg-waits.lockbuffercontent.actions.invalid-indexes"></a>

Invalid indexes typically occur when using `CREATE INDEX CONCURRENTLY` or `REINDEX CONCURRENTLY` and the command fails or is aborted.

Invalid indexes can't be used for queries, though they will still be updated and take up disk space.
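You can list invalid indexes by querying the `pg_index` catalog:

```
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
  FROM pg_index
 WHERE NOT indisvalid;
```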

The Invalid indexes section of the [PG Collector](https://github.com/awslabs/pg-collector) report can provide insights into the invalid indexes in the database.

### Use partial indexes
<a name="apg-waits.lockbuffercontent.actions.partial-indexes"></a>

Partial indexes can be leveraged to enhance query performance and reduce index size. A partial index is an index built over a subset of a table, with the subset defined by a conditional expression. As detailed in the [partial index](https://www.postgresql.org/docs/current/indexes-partial.html) documentation, partial indexes can reduce the overhead of maintaining indexes, as PostgreSQL does not need to update the index in all cases.
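For example, if most queries touch only unshipped orders, an index restricted to those rows stays small and needs fewer updates. The table and column names here are hypothetical:

```
-- Index only the rows that queries actually filter on:
CREATE INDEX orders_unshipped_idx
    ON orders (order_id)
 WHERE shipped = false;
```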

### Remove table and index bloat
<a name="apg-waits.lockbuffercontent.actions.bloat"></a>

Excessive table and index bloat can negatively impact database performance. Bloated tables and indexes increase the active working set size, degrading in-memory efficiency. Additionally, bloat increases storage costs and slows query execution. To diagnose bloat, see [Diagnosing table and index bloat](AuroraPostgreSQL.diag-table-ind-bloat.md). Further, the Fragmentation (Bloat) section of the [PG Collector](https://github.com/awslabs/pg-collector) report can provide insights into table and index bloat.

To address table and index bloat, there are a few options:

**VACUUM FULL**  
`VACUUM FULL` creates a new copy of the table, copying over only the live tuples, and then replaces the old table with the new one while holding an `ACCESS EXCLUSIVE` lock. This prevents any reading or writing to the table, which can cause an outage. Additionally, `VACUUM FULL` will take longer if the table is large.

**pg\_repack**  
The `pg_repack` extension is helpful in situations where `VACUUM FULL` might not be suitable. It creates a new table that contains the data of the bloated table, tracks the changes from the original table, and then replaces the original table with the new one. It doesn't lock the original table for read or write operations while it's building the new table. For more information about how to use `pg_repack`, see [Removing bloat with pg\_repack](https://docs.aws.amazon.com/prescriptive-guidance/latest/postgresql-maintenance-rds-aurora/pg-repack.html) and [pg\_repack](https://reorg.github.io/pg_repack/).

**REINDEX**  
The `REINDEX` command can be leveraged to address index bloat. `REINDEX` writes a new version of the index without the dead pages or the empty or nearly empty pages, thereby reducing the space consumption of the index. For more information, see [REINDEX](https://www.postgresql.org/docs/current/sql-reindex.html) in the PostgreSQL documentation.
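For example, on PostgreSQL 12 and higher you can rebuild an index without blocking concurrent writes by using the `CONCURRENTLY` option. The index name here is hypothetical:

```
-- Rebuild a bloated index without taking an exclusive lock on the table:
REINDEX INDEX CONCURRENTLY my_bloated_idx;
```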

After removing bloat from tables and indexes, it might be necessary to increase the autovacuum frequency on those tables. Implementing aggressive autovacuum settings at the table level can help prevent future bloat from occurring. For more information, see [autovacuum](https://docs.aws.amazon.com/prescriptive-guidance/latest/postgresql-maintenance-rds-aurora/autovacuum.html) in the AWS prescriptive guidance.

# LWLock:buffer\_mapping
<a name="apg-waits.lwl-buffer-mapping"></a>

This event occurs when a session is waiting to associate a data block with a buffer in the shared buffer pool.

**Note**  
This event appears as `LWLock:buffer_mapping` in Aurora PostgreSQL version 12 and lower, and `LWLock:BufferMapping` in version 13 and higher.

**Topics**
+ [Supported engine versions](#apg-waits.lwl-buffer-mapping.context.supported)
+ [Context](#apg-waits.lwl-buffer-mapping.context)
+ [Causes](#apg-waits.lwl-buffer-mapping.causes)
+ [Actions](#apg-waits.lwl-buffer-mapping.actions)

## Supported engine versions
<a name="apg-waits.lwl-buffer-mapping.context.supported"></a>

This wait event information is relevant for Aurora PostgreSQL version 9.6 and higher.

## Context
<a name="apg-waits.lwl-buffer-mapping.context"></a>

The *shared buffer pool* is an Aurora PostgreSQL memory area that holds all pages that are or were being used by processes. When a process needs a page, it reads the page into the shared buffer pool. The `shared_buffers` parameter sets the shared buffer size and reserves a memory area to store the table and index pages. If you change this parameter, make sure to restart the database. For more information, see [Shared buffers](AuroraPostgreSQL.Tuning.concepts.md#AuroraPostgreSQL.Tuning.concepts.buffer-pool).

The `LWLock:buffer_mapping` wait event occurs in the following scenarios:
+ A process searches the buffer table for a page and acquires a shared buffer mapping lock.
+ A process loads a page into the buffer pool and acquires an exclusive buffer mapping lock.
+ A process removes a page from the pool and acquires an exclusive buffer mapping lock.

## Causes
<a name="apg-waits.lwl-buffer-mapping.causes"></a>

When this event appears more than normal, possibly indicating a performance problem, the database is paging in and out of the shared buffer pool. Typical causes include the following:
+ Large queries
+ Bloated indexes and tables
+ Full table scans
+ A shared pool size that is smaller than the working set

## Actions
<a name="apg-waits.lwl-buffer-mapping.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Monitor buffer-related metrics](#apg-waits.lwl-buffer-mapping.actions.monitor-metrics)
+ [Assess your indexing strategy](#apg-waits.lwl-buffer-mapping.actions.indexes)
+ [Reduce the number of buffers that must be allocated quickly](#apg-waits.lwl-buffer-mapping.actions.buffers)

### Monitor buffer-related metrics
<a name="apg-waits.lwl-buffer-mapping.actions.monitor-metrics"></a>

When `LWLock:buffer_mapping` waits spike, investigate the buffer hit ratio. You can use these metrics to get a better understanding of what is happening in the buffer cache. Examine the following metrics:

`BufferCacheHitRatio`  
This Amazon CloudWatch metric measures the percentage of requests that are served by the buffer cache of a DB instance in your DB cluster. You might see this metric decrease in the lead-up to the `LWLock:buffer_mapping` wait event.

`blks_hit`  
This Performance Insights counter metric indicates the number of blocks that were retrieved from the shared buffer pool. After the `LWLock:buffer_mapping` wait event appears, you might observe a spike in `blks_hit`.

`blks_read`  
This Performance Insights counter metric indicates the number of blocks that required I/O to be read into the shared buffer pool. You might observe a spike in `blks_read` in the lead-up to the `LWLock:buffer_mapping` wait event.

### Assess your indexing strategy
<a name="apg-waits.lwl-buffer-mapping.actions.indexes"></a>

To confirm that your indexing strategy is not degrading performance, check the following:

Index bloat  
Ensure that index and table bloat aren't leading to unnecessary pages being read into the shared buffer. If your tables contain unused rows, consider archiving the data and removing the rows from the tables. You can then rebuild the indexes for the resized tables.

Indexes for frequently used queries  
To determine whether you have the optimal indexes, monitor DB engine metrics in Performance Insights. The `tup_returned` metric shows the number of rows read. The `tup_fetched` metric shows the number of rows returned to the client. If `tup_returned` is significantly larger than `tup_fetched`, the data might not be properly indexed. Also, your table statistics might not be current.

### Reduce the number of buffers that must be allocated quickly
<a name="apg-waits.lwl-buffer-mapping.actions.buffers"></a>

To reduce the `LWLock:buffer_mapping` wait events, try to reduce the number of buffers that must be allocated quickly. One strategy is to perform smaller batch operations. You might be able to achieve smaller batches by partitioning your tables.

# LWLock:BufferIO (IPC:BufferIO)
<a name="apg-waits.lwlockbufferio"></a>

The `LWLock:BufferIO` event occurs when Aurora PostgreSQL or RDS for PostgreSQL is waiting for other processes to finish their input/output (I/O) operations while concurrently trying to read the same page into the shared buffer pool.

**Topics**
+ [Relevant engine versions](#apg-waits.lwlockbufferio.context.supported)
+ [Context](#apg-waits.lwlockbufferio.context)
+ [Causes](#apg-waits.lwlockbufferio.causes)
+ [Actions](#apg-waits.lwlockbufferio.actions)

## Relevant engine versions
<a name="apg-waits.lwlockbufferio.context.supported"></a>

This wait event information is relevant for all Aurora PostgreSQL versions. In Aurora PostgreSQL 12 and earlier versions, this wait event is named `LWLock:buffer_io`. In Aurora PostgreSQL 13, it's named `LWLock:BufferIO`. Beginning with Aurora PostgreSQL 14, the `BufferIO` wait event moved from the `LWLock` wait event type to the `IPC` wait event type (`IPC:BufferIO`).

## Context
<a name="apg-waits.lwlockbufferio.context"></a>

Each shared buffer has an I/O lock that is associated with the `LWLock:BufferIO` wait event. This lock is acquired each time a block (or page) must be retrieved from outside the shared buffer pool, which is defined by the `shared_buffers` parameter.

The lock handles multiple sessions that all require access to the same block. As soon as the page is read into the shared buffer pool, the `LWLock:BufferIO` lock is released.

**Note**  
The `LWLock:BufferIO` wait event precedes the [IO:DataFileRead](apg-waits.iodatafileread.md) wait event. The `IO:DataFileRead` wait event occurs while data is being read from storage.

For more information on lightweight locks, see [Locking Overview](https://github.com/postgres/postgres/blob/65dc30ced64cd17f3800ff1b73ab1d358e92efd8/src/backend/storage/lmgr/README#L20).

## Causes
<a name="apg-waits.lwlockbufferio.causes"></a>

Common causes for the `LWLock:BufferIO` event to appear in top waits include the following:
+ Multiple backends or connections trying to access the same page that's also pending an I/O operation
+ The ratio between the size of the shared buffer pool (defined by the `shared_buffers` parameter) and the number of buffers needed by the current workload
+ The size of the shared buffer pool not being well balanced with the number of pages being evicted by other operations
+ Large or bloated indexes that require the engine to read more pages than necessary into the shared buffer pool
+ Lack of indexes that forces the DB engine to read more pages from the tables than necessary
+ Sudden spikes for database connections trying to perform operations on the same page

## Actions
<a name="apg-waits.lwlockbufferio.actions"></a>

We recommend different actions depending on the causes of your wait event:
+ Observe Amazon CloudWatch metrics for correlation between sharp decreases in the `BufferCacheHitRatio` metric and `LWLock:BufferIO` wait events. This effect can mean that you have a small shared buffers setting. You might need to increase it or scale up your DB instance class. You can also split your workload across more reader nodes.
+ Verify whether you have unused indexes, then remove them.
+ Use partitioned tables (which also have partitioned indexes). Doing this helps to keep index reordering low and reduces its impact.
+ Avoid indexing columns unnecessarily.
+ Prevent sudden database connection spikes by using a connection pool.
+ Restrict the maximum number of connections to the database as a best practice.

# LWLock:lock\_manager
<a name="apg-waits.lw-lock-manager"></a>

This event occurs when the Aurora PostgreSQL engine maintains the shared lock's memory area to allocate, check, and deallocate a lock when a fast path lock isn't possible.

**Topics**
+ [Supported engine versions](#apg-waits.lw-lock-manager.context.supported)
+ [Context](#apg-waits.lw-lock-manager.context)
+ [Likely causes of increased waits](#apg-waits.lw-lock-manager.causes)
+ [Actions](#apg-waits.lw-lock-manager.actions)

## Supported engine versions
<a name="apg-waits.lw-lock-manager.context.supported"></a>

This wait event information is relevant for Aurora PostgreSQL version 9.6 and higher. 

## Context
<a name="apg-waits.lw-lock-manager.context"></a>

When you issue a SQL statement, Aurora PostgreSQL records locks to protect the structure, data, and integrity of your database during concurrent operations. The engine can achieve this goal using a fast path lock or a path lock that isn't fast. A path lock that isn't fast is more expensive and creates more overhead than a fast path lock.

### Fast path locking
<a name="apg-waits.lw-lock-manager.context.fast-path"></a>

To reduce the overhead of locks that are taken and released frequently, but that rarely conflict, backend processes can use fast path locking. The database uses this mechanism for locks that meet the following criteria:
+ They use the DEFAULT lock method.
+ They represent a lock on a database relation rather than a shared relation.
+ They are weak locks that are unlikely to conflict.
+ The engine can quickly verify that no conflicting locks can possibly exist.

The engine can't use fast path locking when either of the following conditions is true:
+ The lock doesn't meet the preceding criteria.
+ No more slots are available for the backend process.

For more information about fast path locking, see [fast path](https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/README#L70-L76) in the PostgreSQL lock manager README and [pg-locks](https://www.postgresql.org/docs/15/view-pg-locks.html) in the PostgreSQL documentation. 

### Example of a scaling problem for the lock manager
<a name="apg-waits.lw-lock-manager.context.lock-manager"></a>

In this example, a table named `purchases` stores five years of data, partitioned by day. Each partition has two indexes. The following sequence of events occurs:

1. You query many days' worth of data, which requires the database to read many partitions.

1. The database creates a lock entry for each partition. If partition indexes are part of the optimizer access path, the database creates a lock entry for them, too.

1. When the number of requested lock entries for the same backend process is higher than 16, which is the value of `FP_LOCK_SLOTS_PER_BACKEND`, the lock manager uses the non–fast path lock method.

Modern applications might have hundreds of sessions. If concurrent sessions are querying the parent without proper partition pruning, the database might create hundreds or even thousands of non–fast path locks. Typically, when this concurrency is higher than the number of vCPUs, the `LWLock:lock_manager` wait event appears.

**Note**  
The `LWLock:lock_manager` wait event isn't related to the number of partitions or indexes in a database schema. Instead, it's related to the number of non–fast path locks that the database must control.

## Likely causes of increased waits
<a name="apg-waits.lw-lock-manager.causes"></a>

When the `LWLock:lock_manager` wait event occurs more than normal, possibly indicating a performance problem, the most likely causes of sudden spikes are as follows:
+ Concurrent active sessions are running queries that don't use fast path locks. The number of these sessions also exceeds the number of vCPUs.
+ A large number of concurrent active sessions are accessing a heavily partitioned table. Each partition has multiple indexes.
+ The database is experiencing a connection storm. By default, some applications and connection pool software create more connections when the database is slow. This practice makes the problem worse. Tune your connection pool software so that connection storms don't occur.
+ A large number of sessions query a parent table without pruning partitions.
+ A data definition language (DDL), data manipulation language (DML), or a maintenance command exclusively locks either a busy relation or tuples that are frequently accessed or modified.

## Actions
<a name="apg-waits.lw-lock-manager.actions"></a>

We recommend different actions depending on the causes of your wait event.

**Topics**
+ [Use partition pruning](#apg-waits.lw-lock-manager.actions.pruning)
+ [Remove unnecessary indexes](#apg-waits.lw-lock-manager.actions.indexes)
+ [Tune your queries for fast path locking](#apg-waits.lw-lock-manager.actions.tuning)
+ [Tune for other wait events](#apg-waits.lw-lock-manager.actions.other-waits)
+ [Reduce hardware bottlenecks](#apg-waits.lw-lock-manager.actions.hw-bottlenecks)
+ [Use a connection pooler](#apg-waits.lw-lock-manager.actions.pooler)
+ [Upgrade your Aurora PostgreSQL version](#apg-waits.lw-lock-manager.actions.pg-version)

### Use partition pruning
<a name="apg-waits.lw-lock-manager.actions.pruning"></a>

*Partition pruning* is a query optimization strategy that excludes unneeded partitions from table scans, thereby improving performance. Partition pruning is turned on by default. If it is turned off, turn it on as follows.

```
SET enable_partition_pruning = on;
```

Queries can take advantage of partition pruning when their `WHERE` clause contains the column used for the partitioning. For more information, see [Partition Pruning](https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITION-PRUNING) in the PostgreSQL documentation.
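For example, with a hypothetical `sales` table partitioned by range on `sale_date`, a query that filters on the partitioning column lets the planner skip all other partitions. You can verify pruning in the query plan:

```
EXPLAIN (COSTS OFF)
SELECT * FROM sales
WHERE sale_date >= DATE '2024-01-01'
  AND sale_date <  DATE '2024-02-01';
-- The plan should scan only the partition(s) covering January 2024.
```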

### Remove unnecessary indexes
<a name="apg-waits.lw-lock-manager.actions.indexes"></a>

Your database might contain unused or rarely used indexes. If so, consider deleting them. Do either of the following:
+ Learn how to find unnecessary indexes by reading [Unused Indexes](https://wiki.postgresql.org/wiki/Index_Maintenance#Unused_Indexes) in the PostgreSQL wiki.
+ Run PG Collector. This SQL script gathers database information and presents it in a consolidated HTML report. Check the "Unused indexes" section. For more information, see [pg-collector](https://github.com/awslabs/pg-collector) in the AWS Labs GitHub repository.
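As a quick check, the following sketch lists indexes that have never been scanned, largest first. Keep in mind that these are cumulative statistics, so recently created indexes or a recent statistics reset can make an index look unused:

```
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```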

### Tune your queries for fast path locking
<a name="apg-waits.lw-lock-manager.actions.tuning"></a>

To find out whether your queries use fast path locking, query the `fastpath` column in the `pg_locks` view. If your queries aren't using fast path locking, try to reduce the number of relations per query to fewer than 16.
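For example, the following queries summarize how many locks are fast path versus non–fast path, and which sessions hold the most non–fast path locks:

```
-- Overall split of fast path vs. non-fast path locks
SELECT fastpath, count(*)
FROM pg_locks
GROUP BY fastpath;

-- Sessions holding the most non-fast path locks
SELECT pid, count(*) AS nonfastpath_locks
FROM pg_locks
WHERE NOT fastpath
GROUP BY pid
ORDER BY nonfastpath_locks DESC;
```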

### Tune for other wait events
<a name="apg-waits.lw-lock-manager.actions.other-waits"></a>

If `LWLock:lock_manager` is first or second in the list of top waits, check whether the following wait events also appear in the list:
+ `Lock:Relation`
+ `Lock:transactionid`
+ `Lock:tuple`

If the preceding events appear high in the list, consider tuning these wait events first. These events can be a driver for `LWLock:lock_manager`.

### Reduce hardware bottlenecks
<a name="apg-waits.lw-lock-manager.actions.hw-bottlenecks"></a>

You might have a hardware bottleneck, such as CPU starvation or maximum usage of your Amazon EBS bandwidth. In these cases, consider reducing the hardware bottlenecks. Consider the following actions:
+ Scale up your instance class.
+ Optimize queries that consume large amounts of CPU and memory.
+ Change your application logic.
+ Archive your data.

For more information about CPU, memory, and EBS network bandwidth, see [Amazon RDS Instance Types](https://aws.amazon.com/rds/instance-types/).

### Use a connection pooler
<a name="apg-waits.lw-lock-manager.actions.pooler"></a>

If your total number of active connections exceeds the number of vCPUs, more OS processes require CPU than your instance type can support. In this case, consider using or tuning a connection pool. For more information about the vCPUs for your instance type, see [Amazon RDS Instance Types](https://aws.amazon.com/rds/instance-types/).

For more information about connection pooling, see the following resources:
+ [Amazon RDS Proxy for Aurora](rds-proxy.md)
+ [pgbouncer](http://www.pgbouncer.org/usage.html)
+ [Connection Pools and Data Sources](https://www.postgresql.org/docs/7.4/jdbc-datasource.html) in the *PostgreSQL Documentation*

### Upgrade your Aurora PostgreSQL version
<a name="apg-waits.lw-lock-manager.actions.pg-version"></a>

If your current version of Aurora PostgreSQL is lower than 12, upgrade to version 12 or higher. PostgreSQL versions 12 and 13 have an improved partition mechanism. For more information about version 12, see [PostgreSQL 12.0 Release Notes]( https://www.postgresql.org/docs/release/12.0/). For more information about upgrading Aurora PostgreSQL, see [Database engine updates for Amazon Aurora PostgreSQL](AuroraPostgreSQL.Updates.md).

# LWLock:MultiXact
<a name="apg-waits.lwlockmultixact"></a>

The `LWLock:MultiXactMemberBuffer`, `LWLock:MultiXactOffsetBuffer`, `LWLock:MultiXactMemberSLRU`, and `LWLock:MultiXactOffsetSLRU` wait events indicate that a session is waiting to retrieve a list of transactions that modify the same row in a given table.
+ `LWLock:MultiXactMemberBuffer` – A process is waiting for I/O on a simple least-recently used (SLRU) buffer for a multixact member.
+ `LWLock:MultiXactMemberSLRU` – A process is waiting to access the simple least-recently used (SLRU) cache for a multixact member.
+ `LWLock:MultiXactOffsetBuffer` – A process is waiting for I/O on a simple least-recently used (SLRU) buffer for a multixact offset.
+ `LWLock:MultiXactOffsetSLRU` – A process is waiting to access the simple least-recently used (SLRU) cache for a multixact offset.

**Topics**
+ [Supported engine versions](#apg-waits.xactsync.context.supported)
+ [Context](#apg-waits.lwlockmultixact.context)
+ [Likely causes of increased waits](#apg-waits.lwlockmultixact.causes)
+ [Actions](#apg-waits.lwlockmultixact.actions)

## Supported engine versions
<a name="apg-waits.xactsync.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Context
<a name="apg-waits.lwlockmultixact.context"></a>

A *multixact* is a data structure that stores a list of transaction IDs (XIDs) that modify the same table row. When a single transaction references a row in a table, the transaction ID is stored directly in the row header. When multiple transactions reference the same row in a table, the list of transaction IDs is stored in the multixact data structure. The multixact wait events indicate that a session is retrieving from the data structure the list of transactions that refer to a given row in a table.

## Likely causes of increased waits
<a name="apg-waits.lwlockmultixact.causes"></a>

Three common causes of multixact use are as follows:
+ **Sub-transactions from explicit savepoints** – Explicitly creating a savepoint in your transactions spawns new transactions for the same row. For example, using `SELECT FOR UPDATE`, then `SAVEPOINT`, and then `UPDATE`. 

  Some drivers, object-relational mappers (ORMs), and abstraction layers have configuration options for automatically wrapping all operations with savepoints. This can generate many multixact wait events in some workloads. The PostgreSQL JDBC Driver's `autosave` option is an example of this. For more information, see [pgJDBC](https://jdbc.postgresql.org/) in the PostgreSQL JDBC documentation. Another example is the PostgreSQL ODBC driver and its `protocol` option. For more information, see [psqlODBC Configuration Options](https://odbc.postgresql.org/docs/config.html) in the PostgreSQL ODBC driver documentation. 
+ **Sub-transactions from PL/pgSQL EXCEPTION clauses** – Each `EXCEPTION` clause that you write in your PL/pgSQL functions or procedures creates a `SAVEPOINT` internally.
+ **Foreign keys** – Multiple transactions acquire a shared lock on the parent record.
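As an illustration of the first cause, the following sketch (using a hypothetical `accounts` table) shows how a savepoint makes a second transaction ID reference the same locked row, which forces PostgreSQL to record the list in a multixact:

```
BEGIN;
SELECT * FROM accounts WHERE id = 42 FOR UPDATE;   -- first XID locks the row
SAVEPOINT before_update;                           -- spawns a subtransaction
UPDATE accounts SET balance = balance - 10
WHERE id = 42;                                     -- subtransaction XID also references the row
-- The row is now referenced by more than one XID, so the list is stored in a multixact.
COMMIT;
```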

When a given row is referenced by multiple transactions, processing the row requires retrieving the transaction IDs from the multixact data structure. If a lookup can't find the multixact in the memory cache, the data structure must be read from the Aurora storage layer. This storage I/O means that SQL queries can take longer. Under heavy usage with a large number of multixacts, memory cache misses can start occurring. All these factors contribute to an increase in this wait event.

## Actions
<a name="apg-waits.lwlockmultixact.actions"></a>

We recommend different actions depending on the causes of your wait event. Some of these actions can help in immediate reduction of the wait events. But, others might require investigation and correction to scale your workload.

**Topics**
+ [Perform vacuum freeze on tables with this wait event](#apg-waits.lwlockmultixact.actions.vacuumfreeze)
+ [Increase autovacuum frequency on tables with this wait event](#apg-waits.lwlockmultixact.actions.autovacuum)
+ [Increase memory parameters](#apg-waits.lwlockmultixact.actions.memoryparam)
+ [Reduce long-running transactions](#apg-waits.lwlockmultixact.actions.longtransactions)
+ [Long term actions](#apg-waits.lwlockmultixact.actions.longactions)

### Perform vacuum freeze on tables with this wait event
<a name="apg-waits.lwlockmultixact.actions.vacuumfreeze"></a>

If this wait event spikes suddenly and affects your production environment, you can use any of the following temporary methods to reduce its count.
+ Use *VACUUM FREEZE* on the affected table or table partition to resolve the issue immediately. For more information, see [VACUUM](https://www.postgresql.org/docs/current/sql-vacuum.html).
+ Use `VACUUM (FREEZE, INDEX_CLEANUP FALSE)` to perform a quick vacuum by skipping the indexes. For more information, see [Vacuuming a table as quickly as possible](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum.LargeIndexes.html#Appendix.PostgreSQL.CommonDBATasks.Autovacuum.LargeIndexes.Executing).
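For example, on a hypothetical affected table named `my_table`:

```
-- Immediate relief: freeze the affected table
VACUUM FREEZE my_table;

-- Faster variant that skips index cleanup
VACUUM (FREEZE, INDEX_CLEANUP FALSE) my_table;
```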

### Increase autovacuum frequency on tables with this wait event
<a name="apg-waits.lwlockmultixact.actions.autovacuum"></a>

After scanning all tables in all databases, VACUUM eventually removes old multixacts and advances each table's oldest multixact value. For more information, see [Multixacts and Wraparound](https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND). To keep `LWLock:MultiXact` wait events to a minimum, run VACUUM as often as necessary. To do so, make sure that VACUUM in your Aurora PostgreSQL DB cluster is configured optimally.

If using VACUUM FREEZE on the affected table or table partition resolves the wait event issue, we recommend using a scheduler, such as `pg_cron`, to perform the VACUUM instead of adjusting autovacuum at the instance level. 
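For example, with the `pg_cron` extension installed, a scheduled nightly freeze of a hypothetical affected table might look like the following (job name and schedule are illustrative):

```
SELECT cron.schedule(
  'freeze-my-table',        -- job name (hypothetical)
  '0 3 * * *',              -- every day at 03:00
  'VACUUM FREEZE my_table'
);
```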

For the autovacuum to run more frequently, you can reduce the value of the storage parameter `autovacuum_multixact_freeze_max_age` on the affected table. For more information, see [autovacuum\_multixact\_freeze\_max\_age](https://www.postgresql.org/docs/current/runtime-config-autovacuum.html#GUC-AUTOVACUUM-MULTIXACT-FREEZE-MAX-AGE).
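For example, to lower the threshold on a single hypothetical table (the value shown is illustrative; the engine default is much higher):

```
ALTER TABLE my_table SET (autovacuum_multixact_freeze_max_age = 100000000);
```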

### Increase memory parameters
<a name="apg-waits.lwlockmultixact.actions.memoryparam"></a>

You can optimize memory usage for multixact caches by adjusting the following parameters. These settings control how much memory is reserved for these caches, which can help reduce multixact wait events in your workload. We recommend starting with the following values:

For Aurora PostgreSQL 17 and later:  
+ `multixact_offset_buffers` = 128
+ `multixact_member_buffers` = 256

For Aurora PostgreSQL 16 and earlier:  
+ `multixact_offsets_cache_size` = 128
+ `multixact_members_cache_size` = 256

**Note**  
In Aurora PostgreSQL 17, parameter names were changed from `multixact_offsets_cache_size` to `multixact_offset_buffers` and from `multixact_members_cache_size` to `multixact_member_buffers` to align with community PostgreSQL 17.

You can set these parameters at the cluster level so that all instances in your cluster remain consistent. We recommend that you test and adjust the values to best suit your specific workload requirements and instance class. You must reboot the writer instance for the parameter changes to take effect.

The parameters are expressed in terms of multixact cache entries. Each cache entry uses `8 KB` of memory. To calculate the total memory reserved, multiply each parameter value by `8 KB`. For example, if you set a parameter to 128, the total reserved memory would be `128 * 8 KB = 1 MB`.

### Reduce long-running transactions
<a name="apg-waits.lwlockmultixact.actions.longtransactions"></a>

Long-running transactions cause vacuum to retain multixact information until the transaction is committed or until the read-only transaction is closed. We recommend that you proactively monitor and manage long-running transactions. For more information, see [Database has long running idle in transaction connection](PostgreSQL.Tuning_proactive_insights.md#proactive-insights.idle-txn). Try to modify your application to avoid or minimize the use of long-running transactions.

### Long term actions
<a name="apg-waits.lwlockmultixact.actions.longactions"></a>

Examine your workload to discover the cause for the multixact spillover. You must fix the issue in order to scale your workload and reduce the wait event.
+ You must analyze the DDL (data definition language) used to create your tables. Make sure that the table structures and indexes are well designed.
+ When the affected tables have foreign keys, determine whether they are needed or if there is another way to enforce referential integrity.
+ When a table has large unused indexes, it can cause autovacuum to not fit your workload and might block it from running. To avoid this, check for unused indexes and remove them completely. For more information, see [Managing autovacuum with large indexes](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Autovacuum.LargeIndexes.html).
+ Reduce the use of savepoints in your transactions.

# LWLock:pg\_stat\_statements
<a name="apg-rpg-lwlockpgstat"></a>

The `LWLock:pg_stat_statements` wait event occurs when the `pg_stat_statements` extension takes an exclusive lock on the hash table that tracks SQL statements. This happens in the following scenarios:
+ When the number of tracked statements reaches the configured `pg_stat_statements.max` parameter value and there is a need to make room for more entries, the extension performs a sort on the number of calls, removes the 5% of the least-executed statements, and re-populates the hash with the remaining entries.
+ When `pg_stat_statements` performs a `garbage collection` operation to the `pgss_query_texts.stat` file on disk and rewrites the file.

**Topics**
+ [Supported engine versions](#apg-rpg-lwlockpgstat.supported)
+ [Context](#apg-rpg-lwlockpgstat.context)
+ [Likely causes of increased waits](#apg-rpg-lwlockpgstat.causes)
+ [Actions](#apg-rpg-lwlockpgstat.actions)

## Supported engine versions
<a name="apg-rpg-lwlockpgstat.supported"></a>

 This wait event information is supported for all versions of Aurora PostgreSQL. 

## Context
<a name="apg-rpg-lwlockpgstat.context"></a>

**Understanding the pg\_stat\_statements extension** – The `pg_stat_statements` extension tracks SQL statement execution statistics in a hash table. The extension tracks SQL statements up to the limit defined by the `pg_stat_statements.max` parameter. This parameter determines the maximum number of statements that can be tracked, which corresponds to the maximum number of rows in the `pg_stat_statements` view.

**Statement statistics persistence** – The extension persists statement statistics across instance restarts by:
+ Writing data to a file named `pg_stat_statements.stat`
+ Using the `pg_stat_statements.save` parameter to control persistence behavior

When `pg_stat_statements.save` is set to:
+ `on` (default): Statistics are saved at shutdown and reloaded at server start
+ `off`: Statistics are neither saved at shutdown nor reloaded at server start

**Query text storage** – The extension stores the text of tracked queries in a file named `pgss_query_texts.stat`. This file can grow to double the average size of all tracked SQL statements before garbage collection occurs. The extension requires an exclusive lock on the hash table while it performs cleanup and rewrites the `pgss_query_texts.stat` file.

**Statement deallocation process** – When the number of tracked statements reaches the `pg_stat_statements.max` limit and new statements need to be tracked, the extension:
+ Takes an exclusive lock (`LWLock:pg_stat_statements`) on the hash table.
+ Loads existing data into local memory.
+ Performs a quicksort based on the number of calls.
+ Removes the least-called statements (bottom 5%).
+ Re-populates the hash table with the remaining entries.

**Monitoring statement deallocation** – In PostgreSQL 14 and later, you can monitor statement deallocation using the `pg_stat_statements_info` view. This view includes a `dealloc` column that shows how many times statements were deallocated to make room for new ones.
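For example, the following query returns the deallocation count along with the time the statistics were last reset:

```
SELECT dealloc, stats_reset
FROM pg_stat_statements_info;
```

A `dealloc` value that climbs steadily suggests that `pg_stat_statements.max` is too small for the number of unique statements in your workload.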

If the deallocation of statements occurs frequently, it will lead to more frequent garbage collection of the `pgss_query_texts.stat` file on disk.

## Likely causes of increased waits
<a name="apg-rpg-lwlockpgstat.causes"></a>

The typical causes of increased `LWLock:pg_stat_statements` waits include:
+ An increase in the number of unique queries used by the application.
+ The `pg_stat_statements.max` parameter value being small compared to the number of unique queries being used.

## Actions
<a name="apg-rpg-lwlockpgstat.actions"></a>

We recommend different actions depending on the causes of your wait event. You might identify `LWLock:pg_stat_statements` events by using Amazon RDS Performance Insights or by querying the view `pg_stat_activity`.

Adjust the following `pg_stat_statements` parameters to control tracking behavior and reduce `LWLock:pg_stat_statements` wait events.

**Topics**
+ [Disable pg\_stat\_statements.track parameter](#apg-rpg-lwlockpgstat.actions.disabletrack)
+ [Increase pg\_stat\_statements.max parameter](#apg-rpg-lwlockpgstat.actions.increasemax)
+ [Disable pg\_stat\_statements.track\_utility parameter](#apg-rpg-lwlockpgstat.actions.disableutility)

### Disable pg\_stat\_statements.track parameter
<a name="apg-rpg-lwlockpgstat.actions.disabletrack"></a>

If the `LWLock:pg_stat_statements` wait event is adversely impacting database performance and you need a rapid solution before further analysis of the `pg_stat_statements` view to identify the root cause, you can disable the collection of statement statistics by setting the `pg_stat_statements.track` parameter to `none`.

### Increase pg\_stat\_statements.max parameter
<a name="apg-rpg-lwlockpgstat.actions.increasemax"></a>

To reduce deallocation and minimize garbage collection of the `pgss_query_texts.stat` file on disk, increase the value of the `pg_stat_statements.max` parameter. The default value is `5000`.

**Note**  
The `pg_stat_statements.max` parameter is static. You must restart your DB instance to apply any changes to this parameter. 
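To gauge whether the limit is adequate, you can compare the configured maximum to the number of statements currently tracked:

```
-- Configured maximum number of tracked statements
SHOW pg_stat_statements.max;

-- Number of statements currently tracked
SELECT count(*) AS tracked_statements
FROM pg_stat_statements;
```

If the tracked count sits at the configured maximum, deallocation is likely occurring.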

### Disable pg\_stat\_statements.track\_utility parameter
<a name="apg-rpg-lwlockpgstat.actions.disableutility"></a>

You can analyze the `pg_stat_statements` view to determine which utility commands consume the most resources.
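As a rough sketch, the following query filters out common DML prefixes to surface the most frequently called remaining statements, many of which are utility commands. This is an approximation, not an exact utility-command filter:

```
SELECT query, calls
FROM pg_stat_statements
WHERE query !~* '^\s*(select|insert|update|delete|merge)'
ORDER BY calls DESC
LIMIT 10;
```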

The `pg_stat_statements.track_utility` parameter controls whether the module tracks utility commands, which include all commands except SELECT, INSERT, UPDATE, DELETE, and MERGE. By default, this parameter is set to `on`.

For example, when your application uses many savepoint queries, which are inherently unique, it can increase statement deallocation. To address this, you can disable the `pg_stat_statements.track_utility` parameter to stop `pg_stat_statements` from tracking savepoint queries.

**Note**  
The `pg_stat_statements.track_utility` parameter is a dynamic parameter. You can change its value without restarting your database instance.

**Example of unique savepoint queries in pg\_stat\_statements**  <a name="savepoint-queries"></a>

```
                     query                       |       queryid       
-------------------------------------------------+---------------------
 SAVEPOINT JDBC_SAVEPOINT_495701                 | -7249565344517699703
 SAVEPOINT JDBC_SAVEPOINT_1320                   | -1572997038849006629
 SAVEPOINT JDBC_SAVEPOINT_26739                  |  54791337410474486
 SAVEPOINT JDBC_SAVEPOINT_1294466                |  8170064357463507593
 ROLLBACK TO SAVEPOINT JDBC_SAVEPOINT_65016      | -33608214779996400
 SAVEPOINT JDBC_SAVEPOINT_14185                  | -2175035613806809562
 SAVEPOINT JDBC_SAVEPOINT_45837                  | -6201592986750645383
 SAVEPOINT JDBC_SAVEPOINT_1324                   |  6388797791882029332
```

PostgreSQL 17 introduces several enhancements for utility command tracking:
+ Savepoint names are now displayed as constants.
+ Global Transaction IDs (GIDs) of two-phase commit commands are now displayed as constants.
+ Names of DEALLOCATE statements are shown as constants.
+ CALL parameters are now displayed as constants.

# Timeout:PgSleep
<a name="apg-waits.timeoutpgsleep"></a>

The `Timeout:PgSleep` event occurs when a server process has called the `pg_sleep` function and is waiting for the sleep timeout to expire.

**Topics**
+ [Supported engine versions](#apg-waits.timeoutpgsleep.context.supported)
+ [Likely causes of increased waits](#apg-waits.timeoutpgsleep.causes)
+ [Actions](#apg-waits.timeoutpgsleep.actions)

## Supported engine versions
<a name="apg-waits.timeoutpgsleep.context.supported"></a>

This wait event information is supported for all versions of Aurora PostgreSQL.

## Likely causes of increased waits
<a name="apg-waits.timeoutpgsleep.causes"></a>

This wait event occurs when an application, stored function, or user issues a SQL statement that calls one of the following functions:
+ `pg_sleep`
+ `pg_sleep_for`
+ `pg_sleep_until`

The preceding functions delay execution until the specified number of seconds have elapsed. For example, `SELECT pg_sleep(1)` pauses for 1 second. For more information, see [Delaying Execution](https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-DELAY) in the PostgreSQL documentation.
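The three functions differ only in how the delay is expressed, as the following examples show:

```
SELECT pg_sleep(1.5);                      -- pause for 1.5 seconds
SELECT pg_sleep_for('5 minutes');          -- pause for an interval
SELECT pg_sleep_until('tomorrow 03:00');   -- pause until a timestamp
```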

## Actions
<a name="apg-waits.timeoutpgsleep.actions"></a>

Identify the statement that was running the `pg_sleep` function. Determine whether the use of the function is appropriate.
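To identify sessions currently waiting on this event, you can query `pg_stat_activity`:

```
SELECT pid, usename, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Timeout'
  AND wait_event = 'PgSleep';
```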