LWLock:BufferIO (IPC:BufferIO) - Amazon Aurora

LWLock:BufferIO (IPC:BufferIO)

The LWLock:BufferIO event occurs when Aurora PostgreSQL or RDS for PostgreSQL is waiting for other processes to finish their input/output (I/O) operations when concurrently trying to access a page. Its purpose is for the same page to be read into the shared buffer.

Relevant engine versions

This wait event information is relevant for all Aurora PostgreSQL versions. For Aurora PostgreSQL 12 and earlier versions this wait event is named as lwlock:buffer_io whereas in Aurora PostgreSQL 13 version it is named as lwlock:bufferio. From Aurora PostgreSQL 14 version BufferIO wait event moved from LWLock to IPC wait event type (IPC:BufferIO).

Context

Each shared buffer has an I/O lock that is associated with the LWLock:BufferIO wait event, each time a block (or a page) has to be retrieved outside the shared buffer pool.

This lock is used to handle multiple sessions that all require access to the same block. This block has to be read from outside the shared buffer pool, defined by the shared_buffers parameter.

As soon as the page is read inside the shared buffer pool, the LWLock:BufferIO lock is released.

Note

The LWLock:BufferIO wait event precedes the IO:DataFileRead wait event. The IO:DataFileRead wait event occurs while data is being read from storage.

For more information on lightweight locks, see Locking Overview.

Causes

Common causes for the LWLock:BufferIO event to appear in top waits include the following:

  • Multiple backends or connections trying to access the same page that's also pending an I/O operation

  • The ratio between the size of the shared buffer pool (defined by the shared_buffers parameter) and the number of buffers needed by the current workload

  • The size of the shared buffer pool not being well balanced with the number of pages being evicted by other operations

  • Large or bloated indexes that require the engine to read more pages than necessary into the shared buffer pool

  • Lack of indexes that forces the DB engine to read more pages from the tables than necessary

  • Sudden spikes for database connections trying to perform operations on the same page

Actions

We recommend different actions depending on the causes of your wait event:

  • Observe Amazon CloudWatch metrics for correlation between sharp decreases in the BufferCacheHitRatio and LWLock:BufferIO wait events. This effect can mean that you have a small shared buffers setting. You might need to increase it or scale up your DB instance class. You can split your workload into more reader nodes.

  • Tune max_wal_size and checkpoint_timeout based on your workload peak time if you see LWLock:BufferIO coinciding with BufferCacheHitRatio metric dips. Then identify which query might be causing it.

  • Verify whether you have unused indexes, then remove them.

  • Use partitioned tables (which also have partitioned indexes). Doing this helps to keep index reordering low and reduces its impact.

  • Avoid indexing columns unnecessarily.

  • Prevent sudden database connection spikes by using a connection pool.

  • Restrict the maximum number of connections to the database as a best practice.