Optimizing parallel query in Aurora MySQL - Amazon Aurora

Optimizing parallel query in Aurora MySQL

To optimize your DB cluster for parallel query, consider which DB clusters would benefit from parallel query and whether to upgrade for parallel query. Then, tune your workload and create schema objects for parallel query.

Planning for a parallel query cluster

Planning for a DB cluster that has parallel query turned on requires making some choices. These include performing setup steps (either creating or restoring a full Aurora MySQL cluster) and deciding how broadly to turn on parallel query across your DB cluster.

Consider the following as part of planning:

  • If you use Aurora MySQL that's compatible with MySQL 5.7, you must choose Aurora MySQL 2.09 or higher. In this case, you always create a provisioned cluster. Then you turn on parallel query using the aurora_parallel_query parameter.

    If you have an existing Aurora MySQL cluster that's running version 2.09 or higher, you don't have to create a new cluster to use parallel query. You can associate your cluster, or specific DB instances in the cluster, with a parameter group that has the aurora_parallel_query parameter turned on. By doing so, you can reduce the time and effort to set up the relevant data to use with parallel query.

  • Plan for any large tables that you need to reorganize so that you can use parallel query when accessing them. You might need to create new versions of some large tables where parallel query is useful. For example, you might need to remove full-text search indexes. For details, see Creating schema objects to take advantage of parallel query.

Checking Aurora MySQL version compatibility for parallel query

To check which Aurora MySQL versions are compatible with parallel query clusters, use the describe-db-engine-versions AWS CLI command and check the value of the SupportsParallelQuery field. The following code example shows how to check which combinations are available for parallel query clusters in a specified AWS Region. Make sure to specify the full --query parameter string on a single line.

aws rds describe-db-engine-versions --region us-east-1 --engine aurora-mysql \ --query '*[]|[?SupportsParallelQuery == `true`].[EngineVersion]' --output text

The preceding commands produce output similar to the following. The output might vary depending on which Aurora MySQL versions are available in the specified AWS Region.

5.7.mysql_aurora.2.11.1 8.0.mysql_aurora.3.01.0 8.0.mysql_aurora.3.01.1 8.0.mysql_aurora.3.02.0 8.0.mysql_aurora.3.02.1 8.0.mysql_aurora.3.02.2 8.0.mysql_aurora.3.03.0

After you start using parallel query with a cluster, you can monitor performance and remove obstacles to parallel query usage. For those instructions, see Performance tuning for parallel query.

Upgrade considerations for parallel query

Depending on the original and destination versions when you upgrade a parallel query cluster, you might find enhancements in the types of queries that parallel query can optimize. You might also find that you don't need to specify a special engine mode parameter for parallel query. The following sections explain the considerations when you upgrade a cluster that has parallel query turned on.

Upgrading parallel query clusters to Aurora MySQL version 3

Several SQL statements, clauses, and data types have new or improved parallel query support starting in Aurora MySQL version 3. When you upgrade from a release that's earlier than version 3, check whether additional queries can benefit from parallel query optimizations. For information about these parallel query enhancements, see Column data types, Partitioned tables, and Aggregate functions, GROUP BY clauses, and HAVING clauses.

If you're upgrading a parallel query cluster from Aurora MySQL 2.08 or lower, also learn about changes in how to turn on parallel query. To do so, read Upgrading to Aurora MySQL 2.09 and higher.

In Aurora MySQL version 3, hash join optimization is turned on by default. The aurora_disable_hash_join configuration option from earlier versions isn't used.

Upgrading to Aurora MySQL 2.09 and higher

In Aurora MySQL version 2.09 and higher, parallel query works for provisioned clusters and doesn't require the parallelquery engine mode parameter. Thus, you don't need to create a new cluster or restore from an existing snapshot to use parallel query with these versions. You can use the upgrade procedures described in Upgrading the minor version or patch level of an Aurora MySQL DB cluster to upgrade your cluster to such a version. You can upgrade an older cluster regardless of whether it was a parallel query cluster or a provisioned cluster. To reduce the number of choices in the Engine version menu, you can choose Show versions that support the parallel query feature to filter the entries in that menu. Then choose Aurora MySQL 2.09 or higher.

After you upgrade an earlier parallel query cluster to Aurora MySQL 2.09 or higher, you turn on parallel query in the upgraded cluster. Parallel query is turned off by default in these versions, and the procedure for enabling it is different. The hash join optimization is also turned off by default and must be turned on separately. Thus, make sure that you turn on these settings again after the upgrade. For instructions on doing so, see Turning parallel query on and off in Aurora MySQL and Turning on hash join for parallel query clusters.

In particular, you turn on parallel query by using the configuration parameters aurora_parallel_query=ON and aurora_disable_hash_join=OFF instead of aurora_pq_supported and aurora_pq. The aurora_pq_supported and aurora_pq parameters are deprecated in the newer Aurora MySQL versions.

In the upgraded cluster, the EngineMode attribute has the value provisioned instead of parallelquery. To check whether parallel query is available for a specified engine version, now you check the value of the SupportsParallelQuery field in the output of the describe-db-engine-versions AWS CLI command. In earlier Aurora MySQL versions, you checked for the presence of parallelquery in the SupportedEngineModes list.

After you upgrade to Aurora MySQL version 2.09 or higher, you can take advantage of the following features. These features aren't available to parallel query clusters running older Aurora MySQL versions.

Performance tuning for parallel query

To manage the performance of a workload with parallel query, make sure that parallel query is used for the queries where this optimization helps the most.

To do so, you can do the following:

Creating schema objects to take advantage of parallel query

Before you create or modify tables that you plan to use for parallel query, make sure to familiarize yourself with the requirements described in Prerequisites and Limitations.

Because parallel query requires tables to use the ROW_FORMAT=Compact or ROW_FORMAT=Dynamic setting, check your Aurora configuration settings for any changes to the INNODB_FILE_FORMAT configuration option. Issue the SHOW TABLE STATUS statement to confirm the row format for all the tables in a database.

Before changing your schema to turn on parallel query to work with more tables, make sure to test. Your tests should confirm if parallel query results in a net increase in performance for those tables. Also, make sure that the schema requirements for parallel query are otherwise compatible with your goals.

For example, before switching from ROW_FORMAT=Compressed to ROW_FORMAT=Compact or ROW_FORMAT=Dynamic, test the performance of workloads for the original and new tables. Also, consider other potential effects such as increased data volume.