Loading data from an Aurora PostgreSQL DB cluster or RDS for PostgreSQL DB instance
After you complete the resource and authentication setup, connect to the cluster endpoint and call the
rds_aurora.limitless_data_load_start
stored procedure from a limitless database, such as postgres_limitless
. The
limitless database is a database on the DB shard group into which you want to migrate data.
This function connects asynchronously in the background to the source database specified in the command, reads the data from the source, and
loads the data onto the shards. For better performance, the data is loaded using parallel threads. The function retrieves a point-in-time table
snapshot by running a SELECT
command to read the data of the table(s) provided in the command.
You can load data into sharded, reference, and standard tables.
You can load data at the database, schema, or table level in rds_aurora.limitless_data_load_start
calls.
-
Database – You can load one database at a time in each call, with no limit on the schema or table count within the database.
-
Schema – You can load a maximum of 15 schemas in each call, with no limit on the table count within each schema.
-
Table – You can load a maximum of 15 tables in each call.
Note
This feature doesn't use Amazon RDS snapshots or point-in-time isolation of the database. For consistency across tables, we recommend cloning the source database and pointing to that cloned database as the source.
The stored procedure uses the following syntax:
CALL rds_aurora.limitless_data_load_start('
source_type
', 'source_DB_cluster_or_instance_ID
', 'source_database_name
', 'streaming_mode'
, 'data_loading_IAM_role_arn
', 'source_DB_secret_arn
', 'destination_DB_secret_arn
', 'ignore_primary_key_conflict_boolean_flag
', 'is_dry_run
', (optional parameter) schemas/tables => ARRAY['name1
', 'name2
', ...]);
The input parameters are the following:
-
source_type
– The source type:aurora_postgresql
orrds_postgresql
-
source_DB_cluster_or_instance_ID
– The source Aurora PostgreSQL DB cluster identifier or RDS for PostgreSQL DB instance identifier -
source_database_name
– The source database name, such aspostgres
-
streaming_mode
– Whether to include change data capture (CDC):full_load
orfull_load_and_cdc
-
data_loading_IAM_role_arn
– The IAM role Amazon Resource Name (ARN) foraurora-data-loader
-
source_DB_secret_arn
– The source DB secret ARN -
destination_DB_secret_arn
– The destination DB secret ARN -
ignore_primary_key_conflict_boolean_flag
– Whether to continue if a primary key conflict occurs:-
If set to
true
, data loading ignores new changes for rows with a primary key conflict. -
If set to
false
, data loading overwrites the existing rows on destination tables when it encounters a primary key conflict.
-
-
is_dry_run
– Whether to test that the data loading job can connect to the source and destination databases:-
If set to
true
, tests the connections without loading data -
If set to
false
, loads the data
-
-
(optional)
schemas
ortables
– An array of schemas or tables to load. You can specify either of the following:-
A list of tables in the format
tables => ARRAY['
schema1
.table1
', 'schema1
.table2
', 'schema2
.table1
', ...] -
A list of schemas in the format
schemas => ARRAY[
'schema1
', 'schema2
', ...]
If you don't include this parameter, the entire specified source database is migrated.
-
The output parameter is the job ID with a message.
The following example shows how to use the rds_aurora.limitless_data_load_start
stored procedure to load data from an
Aurora PostgreSQL DB cluster.
CALL rds_aurora.limitless_data_load_start('aurora_postgresql', 'my-db-cluster', 'postgres', 'full_load_and_cdc', 'arn:aws:iam::123456789012:role/aurora-data-loader-8f2c66', 'arn:aws:secretsmanager:us-east-1:123456789012:secret:secret-source-8f2c66-EWrr0V', 'arn:aws:secretsmanager:us-east-1:123456789012:secret:secret-destination-8f2c66-d04fbD', 'true', 'false', tables => ARRAY['public.customer', 'public.order', 'public.orderdetails']); INFO: limitless data load job id 1688761223647 is starting.