Migrate an on-premises Oracle database to Amazon RDS for Oracle by using direct Oracle Data Pump Import over a database link

Created by Rizwan Wangde (AWS)

Environment: Production

Source: On-premises Oracle database

Target: Amazon RDS for Oracle

R Type: Replatform

Workload: Oracle

Technologies: Migration; Databases

AWS services: AWS DMS; AWS Direct Connect; Amazon RDS

Numerous patterns cover migrating on-premises Oracle databases to Amazon RDS for Oracle by using Oracle Data Pump, a native Oracle utility that is the preferred way to migrate large Oracle workloads. These patterns typically involve exporting application schemas or tables into dump files, transferring the dump files to a database directory on Amazon RDS for Oracle, and then importing the application schemas and data from the dump files.

With that approach, a migration can take longer, depending on the size of the data and the time needed to transfer the dump files to the Amazon RDS instance. In addition, the dump files reside on the Amazon RDS instance's Amazon Elastic Block Store (Amazon EBS) volume, which must be large enough for both the database and the dump files. Because Amazon RDS allocated storage cannot be decreased, the space freed when the dump files are deleted after the import cannot be reclaimed, so you continue to pay for unused space.

This pattern mitigates those issues by performing a direct import on the Amazon RDS instance by using the Oracle Data Pump API (DBMS_DATAPUMP) over a database link. The pattern initiates a simultaneous export and import pipeline between the source and target databases. This pattern doesn't require sizing an EBS volume for the dump files because no dump files are created or stored on the volume. This approach saves the monthly cost of unused disk space.

Prerequisites 

  • An active Amazon Web Services (AWS) account.

  • A virtual private cloud (VPC) configured with private subnets across at least two Availability Zones, to provide the network infrastructure for the Amazon RDS instance.

  • An Oracle database in an on-premises data center.

  • An existing Amazon RDS for Oracle instance in a single Availability Zone. Using a single Availability Zone improves write performance during the migration. A Multi-AZ deployment can be enabled 24–48 hours before cutover.

  • AWS Direct Connect (recommended for large databases).

  • Network connectivity and firewall rules on premises configured to allow an inbound connection from the Amazon RDS instance to the on-premises Oracle database.

Limitations 

  • The database size limit on Amazon RDS for Oracle is 64 TiB (as of December 2022).

Product versions

  • Source database: Oracle Database version 10g Release 1 and later.

  • Target database: For the latest list of supported versions and editions on Amazon RDS, see Amazon RDS for Oracle in the AWS documentation.

Source technology stack  

  • Self-managed Oracle database on premises or in the cloud

Target technology stack  

  • Amazon RDS for Oracle

Target architecture 

The following diagram shows the architecture for migrating from an on-premises Oracle database to Amazon RDS for Oracle in a Single-AZ environment. The arrow directions depict the data flow in the architecture; the diagram doesn't show which component initiates each connection.

Full-load migration for an on-premises Oracle database.
  1. The Amazon RDS for Oracle instance connects to the on-premises source Oracle database to perform a full-load migration over the database link.

  2. AWS DMS connects to the on-premises source Oracle database to perform ongoing replication by using change data capture (CDC).

  3. CDC changes are applied to the Amazon RDS for Oracle database.

AWS services

  • AWS Database Migration Service (AWS DMS) helps you migrate data stores into the AWS Cloud or between combinations of cloud and on-premises setups. This pattern uses CDC and the Replicate data changes only setting.

  • AWS Direct Connect links your internal network to a Direct Connect location over a standard Ethernet fiber-optic cable. With this connection, you can create virtual interfaces directly to public AWS services while bypassing internet service providers in your network path.

  • Amazon Relational Database Service (Amazon RDS) for Oracle helps you set up, operate, and scale an Oracle relational database in the AWS Cloud.

Other tools

Although AWS Direct Connect uses dedicated, private network connections between the on-premises network and AWS, consider additional options, such as Oracle native network encryption, to secure and encrypt data in transit.

Task | Description | Skills required

Set up network connectivity from the target database to the source database.

Configure the on-premises network and firewall to allow inbound connections from the target Amazon RDS instance to the on-premises source Oracle database.

Network administrator, Security engineer

Create a database user with the appropriate privileges.

Create a database user in the on-premises source Oracle database with privileges to migrate data between the source and target using Oracle Data Pump.

GRANT CONNECT TO <migration_user>;
GRANT DATAPUMP_EXP_FULL_DATABASE TO <migration_user>;
GRANT SELECT ANY TABLE TO <migration_user>;
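
If the migration user doesn't already exist on the source database, you can create it before running the grants. A minimal sketch, using the same placeholders as the grants above:

-- Hypothetical example: create the migration user before granting privileges
CREATE USER <migration_user> IDENTIFIED BY <password>;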
DBA

Prepare the on-premises source database for AWS DMS CDC migration.

(Optional) Prepare the on-premises source Oracle database for AWS DMS CDC migration after the Oracle Data Pump full load completes:

  1. Configure the additional privileges required to manage FLASHBACK during Oracle Data Pump migration.

    GRANT FLASHBACK ANY TABLE TO <migration_user>;
    GRANT FLASHBACK ARCHIVE ADMINISTER TO <migration_user>;
  2. To configure user account privileges required on a self-managed Oracle source for AWS DMS, see the AWS DMS documentation.

  3. To prepare an Oracle self-managed source database for CDC using AWS DMS, see the AWS DMS documentation.

DBA

Install and configure SQL Developer.

Install and configure SQL Developer to connect and run SQL queries on the source and target databases.

DBA, Migration engineer

Generate a script to create the tablespaces.

Use the following example SQL query to generate the script on the source database.

SELECT 'CREATE TABLESPACE ' || tablespace_name || ' DATAFILE SIZE 1G AUTOEXTEND ON MAXSIZE UNLIMITED;' FROM dba_tablespaces WHERE tablespace_name NOT IN ('SYSTEM', 'SYSAUX', 'TEMP', 'UNDOTBS1') ORDER BY 1;

The script will be applied on the target database.
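
For example, for a hypothetical tablespace named APPDATA, the generated script contains a line such as the following.

CREATE TABLESPACE APPDATA DATAFILE SIZE 1G AUTOEXTEND ON MAXSIZE UNLIMITED;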

DBA

Generate a script to create users, profiles, roles, and privileges.

To generate a script to create the database users, profiles, roles, and privileges, use the scripts from the Oracle Support document How to Extract DDL for User including Privileges and Roles Using dbms_metadata.get_ddl (Doc ID 2739952.1) (Oracle account required).

The script will be applied on the target database.
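
Those scripts are built on DBMS_METADATA calls. A minimal sketch of the kinds of queries involved, with a placeholder user name:

-- DDL for the user account itself
SELECT DBMS_METADATA.GET_DDL('USER', '<USERNAME>') FROM dual;
-- Role and system privilege grants made to the user
SELECT DBMS_METADATA.GET_GRANTED_DDL('ROLE_GRANT', '<USERNAME>') FROM dual;
SELECT DBMS_METADATA.GET_GRANTED_DDL('SYSTEM_GRANT', '<USERNAME>') FROM dual;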

DBA
Task | Description | Skills required

Create a database link to the source database and verify connectivity.

To create a database link to the on-premises source database, you can use the following example command.

CREATE DATABASE LINK link2src
CONNECT TO <migration_user_account> IDENTIFIED BY <password>
USING '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<dns or ip address of remote db>)(PORT=<listener port>))(CONNECT_DATA=(SID=<remote SID>)))';

To verify connectivity, run the following SQL command.

select * from dual@link2src;

Connectivity is successful if the response is X.

DBA

Run the scripts to prepare the target instance.

Run the previously generated scripts to prepare the target Amazon RDS for Oracle instance:

  1. Tablespaces

  2. Profiles

  3. Roles

This helps ensure that the Oracle Data Pump migration can create the schemas and their objects.

DBA, Migration engineer
Task | Description | Skills required

Migrate the required schemas.

To migrate the required schemas from the source on-premises database to the target Amazon RDS instance, use the code in the Additional information section:

  • To migrate a single schema, run Code 1 from the Additional information section.

  • To migrate multiple schemas, run Code 2 from the Additional information section.

To tune the performance of the migration, you can adjust the number of parallel processes by running the following command.

DBMS_DATAPUMP.SET_PARALLEL (handle => v_hdnl, degree => 4);
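
While the import runs, you can check its status from the Data Pump job views; a quick sketch:

-- Shows each Data Pump job and its current state (for example, EXECUTING)
SELECT owner_name, job_name, operation, job_mode, state FROM dba_datapump_jobs;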
DBA

Gather schema statistics to improve performance.

Gathering schema statistics collects Oracle query optimizer statistics for the migrated database objects. The optimizer uses this information to select the best execution plan for queries against those objects.

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '<schema_name>');
DBA
Task | Description | Skills required

Capture the SCN on the source on-premises Oracle database.

Capture the system change number (SCN) on the source on-premises Oracle database. You will use the SCN for full-load import and as the starting point for CDC replication.

To generate the current SCN on the source database, run the following SQL statement.

SELECT current_scn FROM V$DATABASE;
DBA

Perform the full-load migration of the schemas.

To migrate the required schemas (FULL LOAD) from the source on-premises database to the target Amazon RDS instance, do the following:

  • To migrate a single schema, run Code 3 from the Additional information section.

  • To migrate multiple schemas, run Code 4 from the Additional information section.

In the code, replace <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE> with the SCN that you captured from the source database.

DBMS_DATAPUMP.SET_PARAMETER (handle => v_hdnl, name => 'FLASHBACK_SCN', value => <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE>);
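
A flashback-consistent import over the database link requires the source database to retain enough undo for the duration of the export; if undo retention is too short, the job can fail with ORA-01555 (snapshot too old). A quick check, as a sketch:

-- Undo retention target, in seconds, on the source database
SELECT value AS undo_retention_seconds FROM v$parameter WHERE name = 'undo_retention';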

To tune the performance of the migration, you can adjust the number of parallel processes.

DBMS_DATAPUMP.SET_PARALLEL (handle => v_hdnl, degree => 4);
DBA

Disable the triggers under the migrated schemas.

Before you begin the AWS DMS CDC-only task, disable the TRIGGERS under the migrated schemas.
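
Oracle has no single statement that disables every trigger in a schema, so a common approach is to generate the statements and then run the output. A sketch with a placeholder schema name:

-- Generates one ALTER TRIGGER ... DISABLE statement per trigger in the schema
SELECT 'ALTER TRIGGER ' || owner || '.' || trigger_name || ' DISABLE;'
FROM dba_triggers
WHERE owner = '<SCHEMA_NAME>';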

DBA

Gather schema statistics to improve performance.

Gathering schema statistics collects Oracle query optimizer statistics for the migrated database objects. The optimizer uses this information to select the best execution plan for queries against those objects.

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '<schema_name>');
DBA

Use AWS DMS to perform an ongoing replication from the source to target.

Use AWS DMS to perform an ongoing replication from the source Oracle database to the target Amazon RDS for Oracle instance.

For more information, see Creating tasks for ongoing replication using AWS DMS and the blog post How to work with native CDC support in AWS DMS.

DBA, Migration engineer
Task | Description | Skills required

Enable Multi-AZ on the instance 48 hours before cutover.

If this is a production instance, we recommend enabling Multi-AZ deployment on the Amazon RDS instance to provide the benefits of high availability (HA) and disaster recovery (DR).

DBA, Migration engineer

Stop the AWS DMS CDC-only task (if CDC was turned on).

  1. Verify that the AWS DMS task's source latency (CDCLatencySource) and target latency (CDCLatencyTarget) metrics in Amazon CloudWatch show 0 seconds.

  2. Stop the AWS DMS CDC-only task.

DBA

Enable the triggers.

Enable the TRIGGERS that you disabled before the CDC task was created.
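
As in the disable step, you can generate the statements; a sketch with a placeholder schema name:

-- Generates one ALTER TRIGGER ... ENABLE statement per trigger in the schema
SELECT 'ALTER TRIGGER ' || owner || '.' || trigger_name || ' ENABLE;'
FROM dba_triggers
WHERE owner = '<SCHEMA_NAME>';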

DBA

Additional information

Code 1: Full-load migration only, single application schema

DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  -- To migrate one selected schema
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_EXPR', 'IN (''<schema_name>'')');
  -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');
  -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/

Code 2: Full-load migration only, multiple application schemas

DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  -- To migrate multiple schemas
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_LIST', '''<SCHEMA_1>'',''<SCHEMA_2>'',''<SCHEMA_3>''');
  -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');
  -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/

Code 3: Full-load migration before CDC-only task, single application schema

DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  -- To migrate one selected schema
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_EXPR', 'IN (''<schema_name>'')');
  -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');
  -- SCN required for the AWS DMS CDC-only task
  DBMS_DATAPUMP.SET_PARAMETER(handle => v_hdnl, name => 'FLASHBACK_SCN', value => <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE>);
  -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/

Code 4: Full-load migration before CDC-only task, multiple application schemas

DECLARE
  v_hdnl NUMBER;
BEGIN
  v_hdnl := DBMS_DATAPUMP.OPEN(
    operation   => 'IMPORT',
    job_mode    => 'SCHEMA',
    remote_link => '<DB LINK Name to Source Database>',
    job_name    => null);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => v_hdnl,
    filename  => 'import_01.log',
    directory => 'DATA_PUMP_DIR',
    filetype  => dbms_datapump.ku$_file_type_log_file);
  -- To migrate multiple schemas
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'SCHEMA_LIST', '''<SCHEMA_1>'',''<SCHEMA_2>'',''<SCHEMA_3>''');
  -- To prevent gathering statistics during the import
  DBMS_DATAPUMP.METADATA_FILTER(v_hdnl, 'EXCLUDE_PATH_EXPR', 'IN (''STATISTICS'')');
  -- SCN required for the AWS DMS CDC-only task
  DBMS_DATAPUMP.SET_PARAMETER(handle => v_hdnl, name => 'FLASHBACK_SCN', value => <CURRENT_SCN_VALUE_IN_SOURCE_DATABASE>);
  -- Number of parallel processes performing export and import
  DBMS_DATAPUMP.SET_PARALLEL(handle => v_hdnl, degree => 4);
  DBMS_DATAPUMP.START_JOB(v_hdnl);
END;
/

Scenario where a mixed migration approach can work better

In rare scenarios where the source database contains tables with millions of rows and very large LOB columns (LOBSEGMENTs), this approach slows down the migration. Oracle migrates LOBSEGMENTs over the database link one row at a time: it extracts a single row (along with the LOB column data) from the source table, inserts the row into the target table, and repeats the process until all rows are migrated. Oracle Data Pump over a database link doesn't support bulk-load or direct-path load mechanisms for LOBSEGMENTs.
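
To identify such tables up front, you can compare LOB segment sizes on the source database; a sketch, with placeholder schema names:

-- Lists LOB columns by segment size, largest first
SELECT l.owner, l.table_name, l.column_name,
       ROUND(s.bytes/1024/1024/1024, 2) AS lob_size_gb
FROM dba_lobs l
JOIN dba_segments s
  ON s.owner = l.owner
 AND s.segment_name = l.segment_name
WHERE l.owner IN ('<SCHEMA_1>', '<SCHEMA_2>')
ORDER BY s.bytes DESC;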

In this situation, we recommend the following:

  • Skip the identified tables during the Oracle Data Pump migration by adding the following metadata filter.

    DBMS_DATAPUMP.METADATA_FILTER(handle => v_hdnl, name => 'NAME_EXPR', value => 'NOT IN (''TABLE_1'',''TABLE_2'')');
  • Use an AWS DMS task (full-load migration, with CDC replication if required) to migrate the identified tables. AWS DMS extracts multiple rows from the source Oracle database and inserts them in batches into the target Amazon RDS instance, which improves performance.