Configure CloudWatch action based recovery on an Amazon EC2 instance

Important

This section describes how to proactively configure recovery mechanisms on an EC2 instance. These recovery mechanisms are designed to restore instance availability when AWS detects an underlying hardware or software issue that causes a system status check to fail. If you are currently experiencing problems accessing your instance, see Troubleshoot EC2 instances.

If AWS detects that an instance is unavailable due to an underlying hardware or software issue, CloudWatch action based recovery can automatically restore instance availability by moving the instance from the host with the underlying issue to a different host.

If CloudWatch action based recovery occurs, AWS sends one of the following events to your AWS Health Dashboard, depending on the outcome:

Success event: AWS_EC2_INSTANCE_AUTO_RECOVERY_SUCCESS
Failure event: AWS_EC2_INSTANCE_AUTO_RECOVERY_FAILURE
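
If you process AWS Health events in your own tooling, the two event type codes above can be mapped to an outcome. The following is a minimal sketch, not an AWS-provided API: the function name recovery_outcome and the "success"/"failure" labels are hypothetical, and only the two event type codes come from this page.

```python
from typing import Optional

# Event type codes listed on this page, mapped to illustrative outcome labels.
RECOVERY_EVENT_OUTCOMES = {
    "AWS_EC2_INSTANCE_AUTO_RECOVERY_SUCCESS": "success",
    "AWS_EC2_INSTANCE_AUTO_RECOVERY_FAILURE": "failure",
}


def recovery_outcome(event_type_code: str) -> Optional[str]:
    """Return "success" or "failure" for a recovery event, or None for any other event."""
    return RECOVERY_EVENT_OUTCOMES.get(event_type_code)
```

A caller could use the None result to skip events that are unrelated to recovery.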

You configure CloudWatch action based recovery by adding recover actions to Amazon CloudWatch alarms. CloudWatch action based recovery works with the StatusCheckFailed_System metric. It provides recovery response time granularity down to the minute, and Amazon Simple Notification Service (Amazon SNS) notifications of recovery actions and outcomes. Compared to simplified automatic recovery, these configuration options allow for faster recovery attempts and more granular control over the response to a system status check failure. For more information about available CloudWatch options, see Status checks for your instances.

However, CloudWatch action based recovery can only operate if an instance is in the running state, there are no service events listed in the AWS Health Dashboard, and there is available capacity for the instance type. In some situations, such as significant outages, capacity constraints might cause recovery attempts to fail. For more information, see Troubleshoot CloudWatch action based recovery failures.

Warning

When AWS recovers your instance due to an underlying hardware or software issue, be aware of the following consequences: data stored in volatile memory (RAM) and on instance store volumes will be lost, and the operating system’s uptime will start over from zero. To help protect against data loss, we recommend that you regularly create backups of valuable data. For more information about backup and recovery best practices for EC2 instances, see Best practices for Amazon EC2.

Automatic instance recovery mechanisms are designed for individual instances. For guidance on building a resilient system, see Build a resilient system.

Topics

Requirements for enabling CloudWatch action based recovery
Configure CloudWatch action based recovery
Troubleshoot CloudWatch action based recovery failures

Requirements for enabling CloudWatch action based recovery

CloudWatch action based recovery can be enabled on instances that meet the following criteria:

Instance types

General purpose: A1, M3, M4, M5, M5a, M5n, M5zn, M6a, M6g, M6i, M6in, M7a, M7g, M7i, M7i-flex, M8g, T1, T2, T3, T3a, T4g
Compute optimized: C3, C4, C5, C5a, C5n, C6a, C6g, C6gn, C6i, C6in, C7a, C7g, C7gn, C7i, C7i-flex, C8g
Memory optimized: R3, R4, R5, R5a, R5b, R5n, R6a, R6g, R6i, R6in, R7a, R7g, R7i, R7iz, R8g, U-3tb1, U-6tb1, U-9tb1, U-12tb1, U-18tb1, U-24tb1, U7i-6tb, U7i-8tb, U7i-12tb, U7in-16tb, U7in-24tb, U7in-32tb, U7inh-32tb, X1, X1e, X2idn, X2iedn, X2iezn, X8g
Accelerated computing: G3, G5g, Inf1, P2, P3, VT1
High-performance computing: Hpc6a, Hpc7a, Hpc7g
Metal instances: Any of the above instance types with the metal instance size.
Instances with instance store volumes added at launch: Only the following instance types are supported: M3, C3, R3, X1, X1e, X2idn, X2iedn

Tenancy

Shared
Dedicated Instance

For more information, see Amazon EC2 Dedicated Instances.

Limitations

CloudWatch action based recovery is not supported for instances with the following characteristics:

Tenancy: Dedicated Host. For Dedicated Hosts, use Dedicated Host Auto Recovery instead.
Networking: Instances using an Elastic Fabric Adapter
Auto Scaling: Instances that are part of an Auto Scaling group
Maintenance: Instances currently undergoing a scheduled maintenance event
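
The limitations above can be expressed as a simple pre-check. This is a sketch under stated assumptions: the InstanceProfile class and the recovery_blockers function are hypothetical names, and the caller is assumed to have already gathered these attributes. The tenancy strings follow the EC2 API values ("default" for shared, "dedicated", "host"). The instance type check is not included here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceProfile:
    """Hypothetical summary of the attributes that the limitations refer to."""
    tenancy: str                     # "default" (shared), "dedicated", or "host"
    uses_efa: bool                   # instance uses an Elastic Fabric Adapter
    in_auto_scaling_group: bool      # instance is part of an Auto Scaling group
    has_scheduled_maintenance: bool  # instance is undergoing scheduled maintenance


def recovery_blockers(profile: InstanceProfile) -> List[str]:
    """Return the listed limitations that apply; an empty list means none apply."""
    blockers = []
    if profile.tenancy == "host":
        blockers.append("Dedicated Host tenancy (use Dedicated Host Auto Recovery)")
    if profile.uses_efa:
        blockers.append("Uses an Elastic Fabric Adapter")
    if profile.in_auto_scaling_group:
        blockers.append("Part of an Auto Scaling group")
    if profile.has_scheduled_maintenance:
        blockers.append("Undergoing a scheduled maintenance event")
    return blockers
```

Returning the list of reasons, rather than a single boolean, makes it easier to report why an instance was excluded.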

View the instance types that support CloudWatch action based recovery

You can use the AWS Management Console or the AWS CLI to view the instance types that support CloudWatch action based recovery.
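
As a programmatic alternative, the same information is available through the EC2 DescribeInstanceTypes API, which accepts an auto-recovery-supported filter. The following is a minimal sketch assuming the boto3 SDK is installed and credentials are configured. The function names are hypothetical, and boto3 is imported inside the function so that the request-building helper can be used without it.

```python
from typing import Any, Dict, List


def supported_types_request() -> Dict[str, Any]:
    """Build DescribeInstanceTypes arguments that keep only recoverable types."""
    return {"Filters": [{"Name": "auto-recovery-supported", "Values": ["true"]}]}


def list_supported_instance_types(region: str) -> List[str]:
    """Return the sorted instance types in a Region that support recovery."""
    import boto3  # third-party SDK, imported lazily (assumed to be installed)

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instance_types")
    names: List[str] = []
    # The API returns results in pages; collect the type name from each entry.
    for page in paginator.paginate(**supported_types_request()):
        names.extend(item["InstanceType"] for item in page["InstanceTypes"])
    return sorted(names)
```

For example, list_supported_instance_types("us-east-1") would return the supported types offered in that Region.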

Configure CloudWatch action based recovery

To configure CloudWatch action based recovery for an EC2 instance, create a CloudWatch alarm that monitors the StatusCheckFailed_System metric for the specified instance. Set the alarm to trigger when the metric value is 1, indicating a failed system status check. Configure the alarm action to automatically recover the instance when triggered.

You can configure the alarm using either the Amazon EC2 console or the CloudWatch console. For the instructions, see Add recover actions to Amazon CloudWatch alarms in this user guide, or Adding recover actions to Amazon CloudWatch alarms in the Amazon CloudWatch User Guide.
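
The alarm described above can also be sketched with the boto3 SDK and the CloudWatch PutMetricAlarm API. Treat this as an illustration rather than the recommended configuration. The alarm name, the Maximum statistic, the 60-second period, and the two evaluation periods are assumptions chosen for the example, and the recover action ARN assumes the standard aws partition.

```python
from typing import Any, Dict


def build_recovery_alarm(instance_id: str, region: str,
                         evaluation_periods: int = 2) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments that recover an instance on a failed system status check."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",                 # assumption for this example
        "Period": 60,                           # seconds; assumption for this example
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,                       # 1 indicates a failed system status check
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # EC2 recover action; assumes the standard "aws" partition.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id: str, region: str) -> None:
    """Create the alarm in CloudWatch (requires boto3 and configured credentials)."""
    import boto3  # third-party SDK, imported lazily (assumed to be installed)

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recovery_alarm(instance_id, region))
```

To also receive notifications, an Amazon SNS topic ARN that you own could be appended to the AlarmActions list.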

Troubleshoot CloudWatch action based recovery failures

If CloudWatch action based recovery fails to recover your instance, consider the following issues:

AWS service events are running: CloudWatch action based recovery does not operate during service events in the AWS Health Dashboard. You might not receive recovery failure notifications for such events. For the latest service availability information, see the Service health status page.
Insufficient capacity: There is temporarily insufficient replacement hardware to migrate the instance.
Maximum daily recovery attempts reached: The instance has reached the maximum daily allowance for recovery attempts. Your instance might subsequently be retired if automatic recovery fails and a hardware degradation is determined to be the root cause of the original failed system status check.

If the instance’s system status check failure persists despite multiple recovery attempts, see Troubleshoot instances with failed status checks for additional guidance.
