Splitting a shard in a DB shard group - Amazon Aurora

Splitting a shard in a DB shard group

You can split a shard in a DB shard group manually into two smaller shards. This is called a user-initiated shard split.

Aurora PostgreSQL Limitless Database can also split shards when they have very large amounts of data or very high usage. This is called a system-initiated shard split.

Prerequisites

User-initiated shard splits have the following prerequisites:

  • You must have a DB shard group.

  • The DB shard group can't be empty: it must contain at least one sharded table.

  • A user must have the rds_aurora_limitless_cluster_admin privilege. The rds_superuser has this privilege; therefore the master user also has it. The rds_superuser can grant the privilege to other users:

    /* Logged in as the master user or a user with rds_superuser privileges */ CREATE USER username; GRANT rds_aurora_limitless_cluster_admin to username;
  • You must know the subcluster (node) ID of the shard that you want to split. You can obtain the ID by using the following query:

    SELECT * FROM rds_aurora.limitless_subclusters; subcluster_id | subcluster_type ---------------+----------------- 1 | router 2 | router 3 | shard 4 | shard 5 | shard 6 | shard

To enable system-initiated shard splits, set the following DB cluster parameters in a custom DB cluster parameter group associated with your DB cluster:

Parameter Value

rds_aurora.limitless_enable_auto_scale

on

rds_aurora.limitless_auto_scale_options

Either split_shardor add_router,split_shard

rds_aurora.limitless_finalize_split_shard_mode

This parameter determines how system-initiated shard splits are finalized. The value can be one of the following:

  • user_initiated – You decide when to finalize the shard split. This is the default value.

  • immediate – Shard splits are finalized immediately.

For more information, see Finalizing shard splits.

Note

This parameter applies only to system-initiated shard splits.

For more information, see DB cluster parameter groups for Amazon Aurora DB clusters.

Splitting a shard

To split a shard in a DB shard group, use the rds_aurora.limitless_split_shard function. This function starts a shard-split job that runs asynchronously.

SELECT rds_aurora.limitless_split_shard('subcluster_id');

Wait for the return of a job ID upon successful submission of the job, for example:

SELECT rds_aurora.limitless_split_shard('3'); job_id --------------- 1691300000000 (1 row)

Tracking shard splits

You can use the job ID to track a shard-split job. To describe a particular job and get more details about it, run the following query:

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs(job_id);

For example:

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs(1691300000000); job_id | action | job_details | status | submission_time | message ---------------+-------------+-----------------------+---------+------------------------+------------------------------------------- 1691300000000 | SPLIT_SHARD | Split Shard 3 by User | SUCCESS | 2023-08-06 05:33:20+00 | Scaling job succeeded. + | | | | | New shard instance with ID 7 was created. (1 row)

The query returns an error when you pass a nonexistent job as the input.

SELECT * from rds_aurora.limitless_list_shard_scale_jobs(1691300000001); ERROR: no job found with the job ID provided

You can track the status of all shard-split jobs by using the same query without a job ID, for example:

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs(); job_id | action | job_details | status | submission_time | message ---------------+-------------+-----------------------+-------------+------------------------+-------------------------------------------------------------- 1691200000000 | SPLIT_SHARD | Split Shard 3 by User | IN_PROGRESS | 2023-08-05 01:46:40+00 | 1691300000000 | SPLIT_SHARD | Split Shard 4 by User | SUCCESS | 2023-08-06 05:33:20+00 | Scaling job succeeded. + | | | | | New shard instance with ID 7 was created. 1691400000000 | SPLIT_SHARD | Split Shard 5 by User | FAILED | 2023-08-07 09:20:00+00 | Error occurred for the add shard job 1691400000000. | | | | | Retry the command. If the issue persists, contact AWS Support. 1691500000000 | SPLIT_SHARD | Split Shard 5 by User | CANCELED | 2023-08-07 09:20:00+00 | Scaling job was cancelled. (4 rows)

The job status can be one of the following:

  • IN_PROGRESS – The shard-split job has been submitted and is in progress. You can have only one job in progress at a time.

  • PENDING – The shard-split job is waiting for you to finalize it. For more information, see Finalizing shard splits.

  • CANCELLATION_IN_PROGRESS – The shard-split job is being canceled by the user.

  • CANCELED – The shard-split job has been successfully canceled by the user or by the system.

  • SUCCESS – The shard-split job completed successfully. The message field contains the instance ID of the new shard.

  • FAILED – The shard-split job failed. The message field contains the details of the failure and any actions that can be taken as a followup to the failed job.

Finalizing shard splits

Finalizing is the last step of the shard-split process. It causes some downtime. If you start a shard-split job, then finalizing happens immediately after the job completes successfully.

Sometimes the system splits shards based on workload, when you've enabled system-initiated shard splits by using the rds_aurora.limitless_enable_auto_scale parameter.

In this case, you can choose whether finalizing happens immediately, or at a time that you choose. You use the rds_aurora.limitless_finalize_split_shard_mode DB cluster parameter to choose when it happens:

  • If you set the value to immediate, it happens immediately.

  • If you set the value to user_initiated, you have to finalize the shard-split job manually.

    An RDS event is sent to you, and the status of the shard-split job is set to PENDING.

When set to user_initiated, you use the rds_aurora.limitless_finalize_split_shard function to finalize the shard-split job:

SELECT * FROM rds_aurora.limitless_finalize_split_shard(job_id);
Note

This function applies only to shard splits that are initiated by the system, not by you.

Canceling a shard split

You can cancel a user-initiated or system-initiated shard split that's IN_PROGRESS or PENDING. You need the job ID to cancel it.

SELECT * from rds_aurora.limitless_cancel_shard_scale_jobs(job_id);

No output is returned unless there's an error. You can track the cancellation using a job-tracking query.