Optimizing binary log replication for Aurora MySQL - Amazon Aurora

Optimizing binary log replication for Aurora MySQL

Following, you can learn how to optimize binary log replication performance and troubleshoot related issues in Aurora MySQL.

Tip

This discussion presumes that you are familiar with the MySQL binary log replication mechanism and how it works. For background information, see Replication Implementation in the MySQL documentation.

Multithreaded binary log replication

With multithreaded binary log replication, a SQL thread reads events from the relay log and queues them up for SQL worker threads to apply. The SQL worker threads are managed by a coordinator thread. The binary log events are applied in parallel when possible.

Multithreaded binary log replication is supported in Aurora MySQL version 3, and in Aurora MySQL version 2.12.1 and higher.

When an Aurora MySQL DB instance is configured to use binary log replication, by default the replica instance uses single-threaded replication for Aurora MySQL versions lower than 3.04. To enable multithreaded replication, you update the replica_parallel_workers parameter to a value greater than zero in your custom parameter group.

For Aurora MySQL version 3.04 and higher, replication is multithreaded by default, with replica_parallel_workers set to 4. You can modify this parameter in your custom parameter group.

The following configuration options help you to fine-tune multithreaded replication. For usage information, see Replication and Binary Logging Options and Variables in the MySQL Reference Manual.

Optimal configuration depends on several factors. For example, performance for binary log replication is influenced by your database workload characteristics and the DB instance class the replica is running on. Thus, we recommend that you thoroughly test all changes to these configuration parameters before applying new parameter settings to a production instance:

  • binlog_group_commit_sync_delay

  • binlog_group_commit_sync_no_delay_count

  • binlog_transaction_dependency_history_size

  • binlog_transaction_dependency_tracking

  • replica_preserve_commit_order

  • replica_parallel_type

  • replica_parallel_workers

In Aurora MySQL version 3.06 and higher, you can improve performance for binary log replicas when replicating transactions for large tables with more than one secondary index. This feature introduces a thread pool to apply secondary index changes in parallel on a binlog replica. The feature is controlled by the aurora_binlog_replication_sec_index_parallel_workers DB cluster parameter, which controls the total number of parallel threads available to apply the secondary index changes. The parameter is set to 0 (disabled) by default. Enabling this feature doesn't require an instance restart. To enable this feature, stop ongoing replication, set the desired number of parallel worker threads, and then start replication again.

Optimizing binlog replication

In Aurora MySQL 2.10 and higher, Aurora automatically applies an optimization known as the binlog I/O cache to binary log replication. By caching the most recently committed binlog events, this optimization is designed to improve binlog dump thread performance while limiting the impact to foreground transactions on the binlog source instance.

Note

This memory used for this feature is independent of the MySQL binlog_cache setting.

This feature doesn't apply to Aurora DB instances that use the db.t2 and db.t3 instance classes.

You don't need to adjust any configuration parameters to turn on this optimization. In particular, if you had adjusted the configuration parameter aurora_binlog_replication_max_yield_seconds to a nonzero value in earlier Aurora MySQL versions, set it back to zero for currently available versions.

The status variables aurora_binlog_io_cache_reads and aurora_binlog_io_cache_read_requests help you to monitor how often the data is read from the binlog I/O cache.

  • aurora_binlog_io_cache_read_requests shows the number of binlog I/O read requests from the cache.

  • aurora_binlog_io_cache_reads shows the number of binlog I/O reads that retrieve information from the cache.

The following SQL query computes the percentage of binlog read requests that take advantage of the cached information. In this case, the closer the ratio is to 100, the better it is.

mysql> SELECT (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='aurora_binlog_io_cache_reads') / (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='aurora_binlog_io_cache_read_requests') * 100 as binlog_io_cache_hit_ratio; +---------------------------+ | binlog_io_cache_hit_ratio | +---------------------------+ | 99.99847949080622 | +---------------------------+

The binlog I/O cache feature also includes new metrics related to the binlog dump threads. Dump threads are the threads that are created when new binlog replicas are connected to the binlog source instance.

The dump thread metrics are printed to the database log every 60 seconds with the prefix [Dump thread metrics]. The metrics include information for each binlog replica such as Secondary_id, Secondary_uuid, binlog file name, and the position that each replica is reading. The metrics also include Bytes_behind_primary representing the distance in bytes between replication source and replica. This metric measures the lag of the replica I/O thread. That figure is different from the lag of the replica SQL applier thread, which is represented by the seconds_behind_master metric on the binlog replica. You can determine whether binlog replicas are catching up to the source or falling behind by checking whether the distance decreases or increases.