Backup and recovery for Amazon RDS - AWS Prescriptive Guidance

Amazon RDS includes features for automating database backups. Amazon RDS creates a storage volume snapshot of your DB instance, backing up the entire DB instance and not just individual databases. With Amazon RDS, you can establish a backup window for automated backups, create DB instance snapshots, and share and copy snapshots across Regions and accounts.

Amazon RDS provides two different options for backing up and restoring your DB instances:

  • Automated backups provide point-in-time recovery (PITR) of your DB instance. Automated backups are turned on by default when you create a new DB instance.

    Amazon RDS performs a daily backup of your data during a backup window that you define when you create the DB instance. You can configure a retention period of up to 35 days for the automated backup. Amazon RDS also uploads the transaction logs for DB instances to Amazon S3 every 5 minutes. Amazon RDS uses your daily backups along with your database transaction logs to restore your DB instance. You can restore the instance to any second during your retention period, up to the LatestRestorableTime (typically within the last five minutes).

    To find the latest restorable time for your DB instances, use the DescribeDBInstances API call, or look on the Description tab for the database on the Amazon RDS console.

    When you initiate a PITR, transaction logs are combined with the most appropriate daily backup to restore your DB instance to the requested time.

  • DB snapshots are user-initiated backups that you can use to back up your DB instance in a known state as frequently as you like, and then restore to that state at any time. You can use the Amazon RDS console or the CreateDBSnapshot API call to create DB snapshots. These snapshots are kept until you use the console or the DeleteDBSnapshot API call to explicitly delete them.
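
The point-in-time restore described above can be sketched with the AWS SDK for Python (boto3). This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names are hypothetical, and each function takes an RDS client as a parameter (for example, one created with boto3.client("rds")). The client methods used, describe_db_instances and restore_db_instance_to_point_in_time, correspond to the DescribeDBInstances and RestoreDBInstanceToPointInTime API calls.

```python
from datetime import datetime
from typing import Optional


def latest_restorable_time(rds_client, db_instance_id: str) -> Optional[datetime]:
    """Return LatestRestorableTime for one DB instance, or None if it is not reported."""
    response = rds_client.describe_db_instances(DBInstanceIdentifier=db_instance_id)
    instances = response["DBInstances"]
    if not instances:
        raise LookupError(f"DB instance not found: {db_instance_id}")
    return instances[0].get("LatestRestorableTime")


def restore_to_point_in_time(rds_client, source_id: str, target_id: str,
                             restore_time: Optional[datetime] = None):
    """Start a PITR into a NEW DB instance (hypothetical helper).

    With restore_time=None, restore to the latest restorable time.
    Otherwise, reject times later than LatestRestorableTime before calling the API.
    """
    params = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        if restore_time.tzinfo is None:
            # boto3 returns timezone-aware datetimes; naive values cannot be compared.
            raise ValueError("restore_time must be timezone-aware")
        latest = latest_restorable_time(rds_client, source_id)
        if latest is not None and restore_time > latest:
            raise ValueError(f"restore_time is later than LatestRestorableTime ({latest})")
        params["RestoreTime"] = restore_time
    return rds_client.restore_db_instance_to_point_in_time(**params)
```

The restore always targets a new instance identifier, so the source instance is left untouched while you validate the restored data.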

Both of these backup options are supported for Amazon RDS in AWS Backup, which also provides other features. Consider using AWS Backup to set up a standard backup plan for your Amazon RDS databases, and use the user-initiated instance backup options when your backup plans for a particular database are unique.

Amazon RDS prevents direct access to the underlying storage used by the DB instance. This also prevents you from directly exporting the database on an RDS DB instance to its local disk. In some cases, you can use native backup and restore functions using client utilities. For example, you can use the mysqldump command with an Amazon RDS MySQL database to export a database to your local client machine. In some cases, Amazon RDS also provides augmented options for performing a native backup and restore of a database. For example, Amazon RDS provides stored procedures to export and import RDS database backups of SQL Server databases.
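
As an illustration of the mysqldump approach mentioned above, the following Python sketch builds and runs an export command from a client machine. The function names, the default port, and the choice of options are assumptions for illustration only; the options shown (--host, --port, --user, --password, --single-transaction, --result-file, --databases) are standard mysqldump options, and mysqldump must already be installed on the client.

```python
import subprocess


def build_mysqldump_command(host, user, database, output_file, port=3306):
    """Build the argument list for exporting one database with mysqldump."""
    return [
        "mysqldump",
        f"--host={host}",        # the RDS DB instance endpoint
        f"--port={port}",
        f"--user={user}",
        "--password",            # no value given, so mysqldump prompts for it
        "--single-transaction",  # consistent export of InnoDB tables without locking them
        f"--result-file={output_file}",
        "--databases", database,
    ]


def export_database(host, user, database, output_file, port=3306):
    """Run mysqldump on the local client machine; raises CalledProcessError on failure."""
    command = build_mysqldump_command(host, user, database, output_file, port)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and prompting for the password keeps it out of the process list and shell history.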

Be sure to thoroughly test your database restore process and its impact on database clients as a part of your overall backup and restore approach.

Using DNS CNAME records to reduce client impact during a database recovery

When you restore a database by using PITR or an RDS DB instance snapshot, a new DB instance with a new endpoint is created. In this way, you can create multiple DB instances from a specific DB snapshot or point in time. There are special considerations when you restore an RDS DB instance to replace a live RDS DB instance. For example, you must determine how you will redirect your existing database clients to the new instance with minimal interruption and modification. You also must ensure continuity and consistency in the data within the database by considering the restored data time and the recovery time when the new instance begins receiving writes.

You can create a separate DNS CNAME record that points to your DB instance endpoint and have your clients use this DNS name. Then you can update the CNAME to point to the new, restored endpoint without having to update your database clients.

Set the Time to Live (TTL) for your CNAME record to an appropriate value. The TTL that you specify determines how long the record is cached with DNS resolvers before another request is made. Some DNS resolvers or applications might not honor the TTL, and they might cache the record for longer than the TTL. For Amazon Route 53, if you specify a longer value (for example, 172800 seconds, or two days), you reduce the number of calls that DNS recursive resolvers must make to Route 53 to get the latest information in this record. This reduces latency and reduces your bill for the Route 53 service. However, a longer TTL also means that clients take longer to pick up a CNAME change after a restore, so weigh these savings against how quickly clients must reach the restored instance. For more information, see How Amazon Route 53 routes traffic for your domain.
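
The CNAME update described above can be sketched with boto3 as follows. This is a hedged example, not a prescribed procedure: the helper names are hypothetical, the 60-second default TTL is an illustrative assumption rather than a recommendation, and the Route 53 client is passed in (for example, boto3.client("route53")). The sketch uses the change_resource_record_sets method with an UPSERT action, which creates the record if it does not exist and updates it otherwise.

```python
def build_cname_upsert(record_name, db_endpoint, ttl_seconds=60):
    """Build a Route 53 ChangeBatch that points record_name at db_endpoint."""
    return {
        "Comment": "Point database CNAME at the restored DB instance endpoint",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl_seconds,  # assumed value; tune for your environment
                    "ResourceRecords": [{"Value": db_endpoint}],
                },
            }
        ],
    }


def repoint_cname(route53_client, hosted_zone_id, record_name, db_endpoint, ttl_seconds=60):
    """Submit the CNAME change to the given hosted zone (hypothetical helper)."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_cname_upsert(record_name, db_endpoint, ttl_seconds),
    )
```

Clients that connect through the CNAME need no configuration change, but resolvers that cached the old record can keep returning the prior endpoint until the cached entry expires.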

Applications and client operating systems might also cache DNS information. You might have to flush these caches or restart the application or client to initiate a new DNS resolution request and retrieve the updated CNAME record.

When you initiate a database restore and shift traffic to your restored instance, verify that all your clients are writing to your restored instance instead of your prior instance. Your data architecture might support restoring your database, updating DNS to shift traffic to your restored instance, and then remediating any data that might still be written to your prior instance. If this isn’t the case, you can stop your existing instance before you update the DNS CNAME record. Then all access is through your newly restored instance. This might temporarily cause connection problems for some of your database clients, which you can handle individually. To reduce client impact, you can perform the database restore during a maintenance window.

Write your applications to handle database connection failures gracefully with retries using exponential backoff. This enables your application to recover when a database connection becomes unavailable during a restore without causing your application to unexpectedly crash.
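
A retry loop with exponential backoff, as recommended above, might look like the following Python sketch. The function name, defaults, and the use of random jitter are illustrative assumptions. The connect argument stands for any callable that opens a database connection, and retry_on should list the connection exceptions that your database driver actually raises.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, max_delay=30.0,
                         retry_on=(ConnectionError, OSError), sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    The wait before retry n is a random value between 0 and
    min(max_delay, base_delay * 2 ** (n - 1)) seconds.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out reconnects from many clients.
            sleep(random.uniform(0, delay))
```

The random component keeps many clients from reconnecting at the same moment when the restored instance becomes available. The sleep parameter exists only so that the wait can be replaced in tests.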

After you have completed your restore process, you can keep your prior instance in a stopped state. Or you can use security group rules to limit traffic to your prior instance until you are satisfied that it is no longer needed. For a gradual decommissioning approach, first limit access to the running database by using security group rules. You can eventually stop the instance when it is no longer needed. Finally, take a final snapshot of the DB instance, and then delete the instance.
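
The last decommissioning steps can be sketched with boto3 as follows. The helper names are hypothetical and the RDS client is passed in (for example, boto3.client("rds")). The sketch assumes the prior instance is in a state that allows stopping and a final snapshot. It relies on the delete_db_instance parameters SkipFinalSnapshot and FinalDBSnapshotIdentifier, which make Amazon RDS create a final DB snapshot before it deletes the instance.

```python
def stop_prior_instance(rds_client, db_instance_id):
    """Stop the prior DB instance so that no client can write to it."""
    return rds_client.stop_db_instance(DBInstanceIdentifier=db_instance_id)


def delete_with_final_snapshot(rds_client, db_instance_id, final_snapshot_id):
    """Delete the prior DB instance, keeping a final snapshot of its data."""
    return rds_client.delete_db_instance(
        DBInstanceIdentifier=db_instance_id,
        SkipFinalSnapshot=False,  # ask RDS to take a snapshot before deleting
        FinalDBSnapshotIdentifier=final_snapshot_id,
    )
```

The final snapshot is kept like any other user-initiated DB snapshot until you delete it explicitly, so the data remains recoverable after the instance is gone.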