Optimizing binary log replication for Aurora MySQL
Following, you can learn how to optimize binary log replication performance and troubleshoot related issues in Aurora MySQL.
Tip
This discussion presumes that you are familiar with the MySQL binary log replication mechanism and how it works. For
background information, see Replication Implementation
Multithreaded binary log replication
With multithreaded binary log replication, a SQL thread reads events from the relay log and queues them up for SQL worker threads to apply. The SQL worker threads are managed by a coordinator thread. The binary log events are applied in parallel when possible.
Multithreaded binary log replication is supported in Aurora MySQL version 3, and in Aurora MySQL version 2.12.1 and higher.
When an Aurora MySQL DB instance is configured to use binary log replication, by
default the replica instance uses single-threaded replication for Aurora MySQL
versions lower than 3.04. To enable multithreaded replication, you update the
replica_parallel_workers
parameter to a value greater than zero
in your custom parameter group.
For Aurora MySQL version 3.04 and higher, replication is multithreaded by default, with
replica_parallel_workers
set to 4
. You can modify this parameter in your custom parameter
group.
The following configuration options help you to fine-tune multithreaded replication. For usage information, see Replication and Binary Logging Options and
Variables
Optimal configuration depends on several factors. For example, performance for binary log replication is influenced by your database workload characteristics and the DB instance class the replica is running on. Thus, we recommend that you thoroughly test all changes to these configuration parameters before applying new parameter settings to a production instance:
-
binlog_group_commit_sync_delay
-
binlog_group_commit_sync_no_delay_count
-
binlog_transaction_dependency_history_size
-
binlog_transaction_dependency_tracking
-
replica_preserve_commit_order
-
replica_parallel_type
-
replica_parallel_workers
In Aurora MySQL version 3.06 and higher, you can improve performance for binary log replicas when replicating
transactions for large tables with more than one secondary index. This feature introduces a thread pool to apply
secondary index changes in parallel on a binlog replica. The feature is controlled by the aurora_binlog_replication_sec_index_parallel_workers
DB cluster parameter, which controls the total number
of parallel threads available to apply the secondary index changes. The parameter is set to 0
(disabled) by
default. Enabling this feature doesn't require an instance restart. To enable this feature, stop ongoing replication,
set the desired number of parallel worker threads, and then start replication again.
Optimizing binlog replication
In Aurora MySQL 2.10 and higher, Aurora automatically applies an optimization known as the binlog I/O cache to binary log replication. By caching the most recently committed binlog events, this optimization is designed to improve binlog dump thread performance while limiting the impact to foreground transactions on the binlog source instance.
Note
This memory used for this feature is independent of the MySQL binlog_cache
setting.
This feature doesn't apply to Aurora DB instances that use the db.t2
and db.t3
instance classes.
You don't need to adjust any configuration parameters to turn on this optimization. In particular, if you had adjusted the
configuration parameter aurora_binlog_replication_max_yield_seconds
to a nonzero value in earlier Aurora MySQL versions, set it
back to zero for currently available versions.
The status variables aurora_binlog_io_cache_reads
and aurora_binlog_io_cache_read_requests
help you to monitor
how often the data is read from the binlog I/O cache.
-
aurora_binlog_io_cache_read_requests
shows the number of binlog I/O read requests from the cache. -
aurora_binlog_io_cache_reads
shows the number of binlog I/O reads that retrieve information from the cache.
The following SQL query computes the percentage of binlog read requests that take advantage of the cached information. In this case, the closer the ratio is to 100, the better it is.
mysql> SELECT (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='aurora_binlog_io_cache_reads') / (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME='aurora_binlog_io_cache_read_requests') * 100 as binlog_io_cache_hit_ratio; +---------------------------+ | binlog_io_cache_hit_ratio | +---------------------------+ | 99.99847949080622 | +---------------------------+
The binlog I/O cache feature also includes new metrics related to the binlog dump threads. Dump threads are the threads that are created when new binlog replicas are connected to the binlog source instance.
The dump thread metrics are printed to the database log every 60 seconds with the prefix [Dump thread
metrics]
. The metrics include information for each binlog replica such as Secondary_id
,
Secondary_uuid
, binlog file name, and the position that each replica is reading. The metrics also
include Bytes_behind_primary
representing the distance in bytes between replication source and replica.
This metric measures the lag of the replica I/O thread. That figure is different from the lag of the replica SQL applier
thread, which is represented by the seconds_behind_master
metric on the binlog replica. You can determine
whether binlog replicas are catching up to the source or falling behind by checking whether the distance decreases or
increases.