Automate cross-Region failover and failback by using DR Orchestrator Framework
Created by Jitendra Kumar (AWS), Oliver Francis (AWS), and Pavithra Balasubramanian (AWS)
Code repository: aws-cross-region-dr-databases | Environment: Production | Technologies: Databases; Infrastructure; Migration; Modernization |
AWS services: Amazon Aurora; AWS CloudFormation; Amazon ElastiCache; Amazon RDS; AWS Step Functions |
Summary
This pattern describes how to use DR Orchestrator Framework to orchestrate and automate the manual, error-prone steps to perform disaster recovery across Amazon Web Services (AWS) Regions. The pattern covers the following databases:
Amazon Relational Database Service (Amazon RDS) for MySQL, Amazon RDS for PostgreSQL, or Amazon RDS for MariaDB
Amazon Aurora MySQL-Compatible Edition or Amazon Aurora PostgreSQL-Compatible Edition (using a centralized file)
Amazon ElastiCache (Redis OSS)
To demonstrate the functionality of DR Orchestrator Framework, you create two DB instances or clusters. The primary is in the AWS Region us-east-1, and the secondary is in us-west-2. To create these resources, you use the AWS CloudFormation templates in the App-Stack folder of the aws-cross-region-dr-databases repository.
Prerequisites and limitations
General prerequisites
DR Orchestrator Framework deployed in both primary and secondary AWS Regions
Two Amazon Simple Storage Service (Amazon S3) buckets
A virtual private cloud (VPC) with two subnets and an AWS security group
Engine-specific prerequisites
Amazon Aurora – At least one Aurora global database must be available in two AWS Regions. You can use us-east-1 as the primary Region and us-west-2 as the secondary Region.
Amazon ElastiCache (Redis OSS) – An ElastiCache global datastore must be available in two AWS Regions. You can use us-east-1 as the primary Region and us-west-2 as the secondary Region.
Amazon RDS limitations
DR Orchestrator Framework doesn't check the replication lag before doing a failover or failback. Replication lag must be checked manually.
This solution has been tested using a primary database instance with one read replica. If you want to use more than one read replica, test the solution thoroughly before implementing it in a production environment.
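Because the framework does not check replication lag, a pre-failover check has to be done separately. The following is a minimal sketch of one way to do it, assuming the lag is read from the Amazon CloudWatch `ReplicaLag` metric for the read replica. The function names, the 5-minute window, and the 30-second threshold are illustrative assumptions, not part of DR Orchestrator Framework.

```python
from datetime import datetime, timedelta, timezone


def max_replica_lag_seconds(cloudwatch, replica_id, window_minutes=5):
    """Return the worst ReplicaLag (in seconds) seen in the window, or None if no data.

    `cloudwatch` is a CloudWatch client; it is passed in so the helper can be
    exercised without AWS access.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        StartTime=end - timedelta(minutes=window_minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    return max(point["Maximum"] for point in datapoints)


def safe_to_fail_over(lag_seconds, threshold_seconds=30.0):
    """Treat missing data as unsafe; otherwise compare the lag with the threshold."""
    return lag_seconds is not None and lag_seconds <= threshold_seconds
```

With Boto3, the client would typically be created with `boto3.client("cloudwatch", region_name="us-west-2")`, because the replica's metrics are published in the replica's Region. The acceptable threshold depends on your recovery point objective.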
Aurora limitations
Feature availability and support vary across specific versions of each database engine and across AWS Regions. For more information on feature and Region availability for cross-Region replication, see Cross-Region read replicas.
Aurora global databases have specific configuration requirements for supported Aurora DB instance classes and the maximum number of AWS Regions. For more information, see Configuration requirements of an Amazon Aurora global database.
This solution has been tested using a primary database instance with one read replica. If you want to use more than one read replica, test the solution thoroughly before implementing it in a production environment.
ElastiCache limitations
For information about Region availability for Global Datastore and ElastiCache configuration requirements, see Prerequisites and limitations in the ElastiCache documentation.
Amazon RDS product versions
Amazon RDS supports the following engine versions:
MySQL – Amazon RDS supports DB instances running the following versions of MySQL: MySQL 8.0 and MySQL 5.7
PostgreSQL – For information about supported versions of Amazon RDS for PostgreSQL, see Available PostgreSQL database versions.
MariaDB – Amazon RDS supports DB instances running the following versions of MariaDB:
MariaDB 10.11
MariaDB 10.6
MariaDB 10.5
Aurora product versions
Amazon Aurora global database switchover requires Aurora MySQL-Compatible with MySQL 5.7 compatibility, version 2.09.1 or higher.
For more information, see Limitations of Amazon Aurora global databases.
ElastiCache (Redis OSS) product versions
Amazon ElastiCache (Redis OSS) supports the following Redis versions:
Redis 7.1 (enhanced)
Redis 7.0 (enhanced)
Redis 6.2 (enhanced)
Redis 6.0 (enhanced)
Redis 5.0.6 (enhanced)
For more information, see Supported ElastiCache (Redis OSS) versions.
Architecture
Amazon RDS architecture
The Amazon RDS architecture includes the following resources:
The primary Amazon RDS DB instance created in the primary Region (us-east-1) with read/write access for clients
An Amazon RDS read replica created in the secondary Region (us-west-2) with read-only access for clients
DR Orchestrator Framework deployed in both the primary and secondary Regions
The diagram shows the following:
Asynchronous replication between the primary instance and the secondary instance
Read/write access for clients in the primary Region
Read-only access for clients in the secondary Region
Aurora architecture
The Amazon Aurora architecture includes the following resources:
The primary Aurora DB cluster created in the primary Region (us-east-1) with an active-writer endpoint
An Aurora DB cluster created in the secondary Region (us-west-2) with an inactive-writer endpoint
DR Orchestrator Framework deployed in both the primary and secondary Regions
The diagram shows the following:
Asynchronous replication between the primary cluster and the secondary cluster
The primary DB cluster with an active-writer endpoint
The secondary DB cluster with an inactive-writer endpoint
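To confirm which Region currently holds the active writer before or after a switchover, the global cluster membership can be inspected. The sketch below assumes the Amazon RDS `DescribeGlobalClusters` API is called through a Boto3-style client; the helper names are illustrative and are not taken from the framework.

```python
def writer_cluster_arn(rds, global_cluster_id):
    """Return the ARN of the member cluster that currently holds the writer role.

    `rds` is an RDS client passed in by the caller. Returns None if no member
    is flagged as the writer.
    """
    response = rds.describe_global_clusters(GlobalClusterIdentifier=global_cluster_id)
    members = response["GlobalClusters"][0]["GlobalClusterMembers"]
    for member in members:
        if member.get("IsWriter"):
            return member["DBClusterArn"]
    return None


def region_of(arn):
    """Extract the Region field from an ARN (arn:partition:service:region:account:resource)."""
    return arn.split(":")[3]
```

In the architecture described above, `region_of(writer_cluster_arn(...))` would be expected to report us-east-1 before a switchover and us-west-2 afterward.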
ElastiCache (Redis OSS) architecture
The Amazon ElastiCache (Redis OSS) architecture includes the following resources:
An ElastiCache (Redis OSS) global datastore created with two clusters:
The primary cluster in the primary Region (us-east-1)
The secondary cluster in the secondary Region (us-west-2)
An Amazon cross-Region link with TLS 1.2 encryption between the two clusters
DR Orchestrator Framework deployed in both primary and secondary Regions
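The primary and secondary roles in the global datastore can be checked programmatically. This is a small sketch that assumes the ElastiCache `DescribeGlobalReplicationGroups` API with member information enabled; the function name is hypothetical.

```python
def datastore_roles(elasticache, global_replication_group_id):
    """Map each member Region of a global datastore to its reported role.

    `elasticache` is an ElastiCache client passed in by the caller.
    """
    response = elasticache.describe_global_replication_groups(
        GlobalReplicationGroupId=global_replication_group_id,
        ShowMemberInfo=True,  # without this flag, member details are not returned
    )
    members = response["GlobalReplicationGroups"][0].get("Members", [])
    return {member["ReplicationGroupRegion"]: member["Role"] for member in members}
```

For the two-cluster layout described above, the result should contain one entry for us-east-1 and one for us-west-2, with the roles swapped after a failover.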
Automation and scale
DR Orchestrator Framework is scalable and supports the failover or failback of more than one AWS database in parallel.
You can use the following payload code to fail over multiple AWS databases in your account. In this example, three AWS databases (two global databases such as Aurora MySQL-Compatible or Aurora PostgreSQL-Compatible, and one Amazon RDS for MySQL instance) fail over to the DR Region:
{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "PlannedFailoverAurora",
          "resourceName": "Switchover (planned failover) of Amazon Aurora global databases (MySQL)",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-mysql-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-mysql-cluster-identifier"
          }
        },
        {
          "resourceType": "PlannedFailoverAurora",
          "resourceName": "Switchover (planned failover) of Amazon Aurora global databases (PostgreSQL)",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-postgres-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-postgres-cluster-identifier"
          }
        },
        {
          "resourceType": "PromoteRDSReadReplica",
          "resourceName": "Promote RDS for MySQL Read Replica",
          "parameters": {
            "RDSInstanceIdentifier": "!Import rds-mysql-instance-identifier",
            "TargetClusterIdentifier": "!Import rds-mysql-instance-global-arn"
          }
        }
      ]
    }
  ]
}
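A payload of this shape can also be assembled and submitted from a script. The sketch below builds the `StatePayload` structure shown above and passes it to a Step Functions state machine through the `StartExecution` API. The helper names and the state machine ARN are hypothetical; use the ARN of the state machine that your deployment actually creates.

```python
import json


def aurora_switchover(name, global_export, cluster_export):
    """Build one PlannedFailoverAurora resource entry from two export names."""
    return {
        "resourceType": "PlannedFailoverAurora",
        "resourceName": name,
        "parameters": {
            "GlobalClusterIdentifier": f"!Import {global_export}",
            "DBClusterIdentifier": f"!Import {cluster_export}",
        },
    }


def build_state_payload(resources, layer=1):
    """Wrap resource entries in a single layer, mirroring the payload shown above."""
    if not resources:
        raise ValueError("at least one resource is required")
    return {"StatePayload": [{"layer": layer, "resources": list(resources)}]}


def start_failover(sfn, state_machine_arn, payload):
    """Start a state machine execution with the payload and return its execution ARN.

    `sfn` is a Step Functions client passed in by the caller.
    """
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

Grouping the resources in one layer matches the example above, where all three databases are listed together under layer 1.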
Tools
AWS services
Amazon Aurora is a fully managed relational database engine that's built for the cloud and compatible with MySQL and PostgreSQL.
Amazon ElastiCache helps you set up, manage, and scale distributed in-memory cache environments in the AWS Cloud. This pattern uses Amazon ElastiCache (Redis OSS).
AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use. In this pattern, Lambda functions are used by AWS Step Functions to perform the steps.
Amazon Relational Database Service (Amazon RDS) helps you set up, operate, and scale a relational database in the AWS Cloud. This pattern supports Amazon RDS for MySQL, Amazon RDS for PostgreSQL, and Amazon RDS for MariaDB.
AWS SDK for Python (Boto3) helps you integrate your Python application, library, or script with AWS services. In this pattern, Boto3 APIs are used to communicate with the database instances or global databases.
AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications. In this pattern, Step Functions state machines are used to orchestrate and run the cross-Region failover and failback of the database instances or global databases.
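To illustrate how a Lambda function invoked by a Step Functions state might use Boto3 against a database, here is a minimal sketch of a handler that promotes an Amazon RDS read replica. It is not the framework's own handler. The event key `RDSInstanceIdentifier` is borrowed from the payload parameters, and the sketch assumes that the value has already been resolved to a real instance identifier before the function runs.

```python
def promote_replica_handler(event, context, rds=None):
    """Hypothetical Lambda handler that promotes a read replica to a standalone instance.

    Lambda invokes handlers with (event, context); the optional `rds` argument
    exists only so that a stub client can be supplied in tests.
    """
    if rds is None:
        import boto3  # available in the Lambda Python runtime

        rds = boto3.client("rds")

    instance_id = event["RDSInstanceIdentifier"]  # assumed to be a resolved identifier
    response = rds.promote_read_replica(DBInstanceIdentifier=instance_id)
    instance = response["DBInstance"]
    # Return a small, JSON-serializable result for the next state to consume.
    return {
        "DBInstanceIdentifier": instance["DBInstanceIdentifier"],
        "Status": instance["DBInstanceStatus"],
    }
```

Promotion is asynchronous, so a real workflow would normally follow this step with a wait-and-poll state that checks the instance status until it becomes available.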
Code repository
The code for this pattern is available in the GitHub aws-cross-region-dr-databases repository.
Epics
Task | Description | Skills required |
---|---|---|
Clone the GitHub repository. | To clone the repository, run the following command:
| AWS DevOps, AWS administrator |
Package Lambda functions code in a .zip file archive. | Create the archive files for Lambda functions to include the DR Orchestrator Framework dependencies:
| AWS administrator |
Create S3 buckets. | S3 buckets are needed to store DR Orchestrator Framework along with your latest configuration. Create two S3 buckets, one in the primary Region (
Replace | AWS administrator |
Create subnets and security groups. | In both the primary Region (
| AWS administrator |
Update the DR Orchestrator parameter files. | In the
Use the following parameter values, replacing
| AWS administrator |
Upload the DR Orchestrator Framework code to the S3 bucket. | The code will be safer in an S3 bucket than in the local directory. Upload the To upload the code, do the following:
| AWS administrator |
Deploy DR Orchestrator Framework in the primary Region. | To deploy DR Orchestrator Framework in the primary Region (
| AWS administrator |
Deploy DR Orchestrator Framework in the secondary Region. | In the secondary Region (
| AWS administrator |
Verify the deployment. | If the AWS CloudFormation command runs successfully, it returns the following output:
Alternatively, you can navigate to the AWS CloudFormation console and verify the status of the | AWS administrator |
Task | Description | Skills required |
---|---|---|
Create the database subnets and security groups. | In your VPC, create two subnets and one security group for the DB instance or global database in both the primary (
| AWS administrator |
Update the parameter file for the primary DB instance or cluster. | In the Amazon RDS In the
Amazon Aurora In the
Amazon ElastiCache (Redis OSS) In the
| AWS administrator |
Deploy your DB instance or cluster in the primary Region. | To deploy your instance or cluster in the primary Region ( Amazon RDS
Amazon Aurora
Amazon ElastiCache (Redis OSS)
Verify that the AWS CloudFormation resources deployed successfully. | AWS administrator |
Update the parameter file for the secondary DB instance or cluster. | In the Amazon RDS In the
Amazon Aurora In the
Amazon ElastiCache (Redis OSS) In the
| AWS administrator |
Deploy your DB instance or cluster in the secondary Region. | Run the following commands, based on your database engine. Amazon RDS
Amazon Aurora
Amazon ElastiCache (Redis OSS)
Verify that the AWS CloudFormation resources deployed successfully. | AWS administrator |
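The verification steps in the tables above can also be scripted instead of being checked in the console. The following sketch assumes the AWS CloudFormation `DescribeStacks` API and reports any stack whose status is not a completed create or update. The stack names in the usage note are placeholders, not the names used by the repository's templates.

```python
# Statuses treated as a successful deployment in this sketch.
HEALTHY_STATUSES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def stack_status(cloudformation, stack_name):
    """Return the current status string of one CloudFormation stack."""
    response = cloudformation.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def unhealthy_stacks(cloudformation, stack_names):
    """Return {stack_name: status} for every stack that is not in a healthy status."""
    problems = {}
    for name in stack_names:
        status = stack_status(cloudformation, name)
        if status not in HEALTHY_STATUSES:
            problems[name] = status
    return problems
```

For example, calling `unhealthy_stacks(client, ["my-dr-orchestrator-stack", "my-db-stack"])` once per Region, with a client created for that Region, returns an empty dictionary when every listed stack deployed successfully.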
Related resources
Disaster recovery strategy for databases on AWS (AWS Prescriptive Guidance strategy)
Automate your DR solution for relational databases on AWS (AWS Prescriptive Guidance guide)