Splitting a shard in a DB shard group
You can manually split a shard in a DB shard group into two smaller shards. This is called a user-initiated shard split.
Aurora PostgreSQL Limitless Database can also split shards when they have very large amounts of data or very high usage. This is called a system-initiated shard split.
Topics
- Prerequisites
- Splitting a shard
- Tracking shard splits
- Finalizing shard splits
- Canceling a shard split
Prerequisites
User-initiated shard splits have the following prerequisites:
- You must have a DB shard group.

- The DB shard group can't be empty: it must contain at least one sharded table. (A verification query follows this list.)

- A user must have the rds_aurora_limitless_cluster_admin privilege. The rds_superuser has this privilege; therefore the master user also has it. The rds_superuser can grant the privilege to other users:

  /* Logged in as the master user or a user with rds_superuser privileges */
  CREATE USER username;
  GRANT rds_aurora_limitless_cluster_admin TO username;

- You must know the subcluster (node) ID of the shard that you want to split. You can obtain the ID by using the following query:

  SELECT * FROM rds_aurora.limitless_subclusters;

   subcluster_id | subcluster_type
  ---------------+-----------------
   1             | router
   2             | router
   3             | shard
   4             | shard
   5             | shard
   6             | shard
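If you need to confirm the second prerequisite, one way is to query the rds_aurora.limitless_tables view, which lists the tables in the DB shard group along with their types. This is a minimal sketch; treat the exact column set as version-dependent:

/* Lists tables in the DB shard group; at least one should be a sharded table */
SELECT * FROM rds_aurora.limitless_tables;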
To enable system-initiated shard splits, set the following DB cluster parameters in a custom DB cluster parameter group associated with your DB cluster:

Parameter | Value
---|---
rds_aurora.limitless_enable_auto_scale | on
rds_aurora.limitless_finalize_split_shard_mode | Either immediate or user_initiated. This parameter determines how system-initiated shard splits are finalized. For more information, see Finalizing shard splits. Note: This parameter applies only to system-initiated shard splits.

For more information, see DB cluster parameter groups for Amazon Aurora DB clusters.
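After the modified parameter group takes effect, you can verify the settings from a database session. A minimal sketch, assuming the parameters are visible with SHOW (as DB cluster parameters typically are):

/* Verify the shard-split parameter values for the current cluster */
SHOW rds_aurora.limitless_enable_auto_scale;
SHOW rds_aurora.limitless_finalize_split_shard_mode;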
Splitting a shard
To split a shard in a DB shard group, use the rds_aurora.limitless_split_shard function. This function starts a shard-split job that runs asynchronously.

SELECT rds_aurora.limitless_split_shard('subcluster_id');
A job ID is returned when the job is submitted successfully, for example:

SELECT rds_aurora.limitless_split_shard('3');

    job_id
---------------
 1691300000000
(1 row)
Tracking shard splits
You can use the job ID to track a shard-split job. To describe a particular job and get more details about it, run the following query:

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs(job_id);
For example:

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs(1691300000000);

    job_id     |   action    |      job_details      | status  |    submission_time     |                  message
---------------+-------------+-----------------------+---------+------------------------+--------------------------------------------
 1691300000000 | SPLIT_SHARD | Split Shard 3 by User | SUCCESS | 2023-08-06 05:33:20+00 | Scaling job succeeded.                    +
               |             |                       |         |                        | New shard instance with ID 7 was created.
(1 row)
The query returns an error when you pass a nonexistent job ID as the input.

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs(1691300000001);
ERROR:  no job found with the job ID provided
You can track the status of all shard-split jobs by using the same query without a job ID, for example:

SELECT * FROM rds_aurora.limitless_list_shard_scale_jobs();

    job_id     |   action    |      job_details      |   status      |    submission_time     |                             message
---------------+-------------+-----------------------+---------------+------------------------+-----------------------------------------------------------------
 1691200000000 | SPLIT_SHARD | Split Shard 3 by User | IN_PROGRESS   | 2023-08-05 01:46:40+00 |
 1691300000000 | SPLIT_SHARD | Split Shard 4 by User | SUCCESS       | 2023-08-06 05:33:20+00 | Scaling job succeeded.                                          +
               |             |                       |               |                        | New shard instance with ID 7 was created.
 1691400000000 | SPLIT_SHARD | Split Shard 5 by User | FAILED        | 2023-08-07 09:20:00+00 | Error occurred for the add shard job 1691400000000.             +
               |             |                       |               |                        | Retry the command. If the issue persists, contact AWS Support.
 1691500000000 | SPLIT_SHARD | Split Shard 5 by User | CANCELED      | 2023-08-07 09:20:00+00 | Scaling job was cancelled.
(4 rows)
The job status can be one of the following:

- IN_PROGRESS – The shard-split job has been submitted and is in progress. You can have only one job in progress at a time.

- PENDING – The shard-split job is waiting for you to finalize it. For more information, see Finalizing shard splits.

- CANCELLATION_IN_PROGRESS – The shard-split job is being canceled by the user.

- CANCELED – The shard-split job has been successfully canceled by the user or by the system.

- SUCCESS – The shard-split job completed successfully. The message field contains the instance ID of the new shard.

- FAILED – The shard-split job failed. The message field contains the details of the failure and any follow-up actions that you can take.
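Because rds_aurora.limitless_list_shard_scale_jobs returns a set of rows, you can filter it with an ordinary WHERE clause. The following sketch, based on the status values above, lists only jobs that are waiting to be finalized:

/* List only the shard-split jobs awaiting manual finalization */
SELECT job_id, job_details, submission_time
  FROM rds_aurora.limitless_list_shard_scale_jobs()
 WHERE status = 'PENDING';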
Finalizing shard splits
Finalizing is the last step of the shard-split process. It causes some downtime. If you start a shard-split job, then finalizing happens immediately after the job completes successfully.
Sometimes the system splits shards based on workload, when you've enabled system-initiated shard splits by using the
rds_aurora.limitless_enable_auto_scale
parameter.
In this case, you can choose whether finalizing happens immediately, or at a time that you choose. You use the
rds_aurora.limitless_finalize_split_shard_mode
DB cluster parameter to choose when it happens:
- If you set the value to immediate, finalizing happens immediately.

- If you set the value to user_initiated, you have to finalize the shard-split job manually. An RDS event is sent to you, and the status of the shard-split job is set to PENDING.
When the parameter is set to user_initiated, use the rds_aurora.limitless_finalize_split_shard function to finalize the shard-split job:

SELECT * FROM rds_aurora.limitless_finalize_split_shard(job_id);
Note
This function applies only to shard splits that are initiated by the system, not by you.
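Putting the pieces together, the following sketch shows the manual finalization workflow when rds_aurora.limitless_finalize_split_shard_mode is set to user_initiated. The job ID in the second statement is a placeholder; use the value returned by the first:

/* Step 1: Find the system-initiated shard-split job waiting to be finalized */
SELECT job_id, job_details
  FROM rds_aurora.limitless_list_shard_scale_jobs()
 WHERE status = 'PENDING';

/* Step 2: Finalize the job (placeholder job ID shown). Finalizing causes
   some downtime, so consider running this during a low-traffic window. */
SELECT * FROM rds_aurora.limitless_finalize_split_shard(1691300000000);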
Canceling a shard split
You can cancel a user-initiated or system-initiated shard split that's IN_PROGRESS or PENDING. You need the job ID to cancel it.

SELECT * FROM rds_aurora.limitless_cancel_shard_scale_jobs(job_id);
No output is returned unless there's an error. You can track the cancellation by using the rds_aurora.limitless_list_shard_scale_jobs job-tracking function, as shown in the following example.
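A hedged sketch that checks on a canceled job (the job ID is a placeholder): the status shows CANCELLATION_IN_PROGRESS while the cancellation runs, and CANCELED when it completes:

/* Check whether the cancellation has completed */
SELECT job_id, status, message
  FROM rds_aurora.limitless_list_shard_scale_jobs(1691500000000);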