Configure CloudWatch action based recovery
Important
The following information applies to configuring recovery-related capabilities on healthy instances. If you are currently encountering difficulties accessing your instance, see Troubleshoot EC2 instances.
For your workload to function properly after a successful instance recovery, your instance must boot and accept traffic without requiring manual intervention.
You can configure Amazon CloudWatch action based recovery to add recovery actions to Amazon CloudWatch alarms. CloudWatch action based recovery works with the StatusCheckFailed_System metric. It provides recovery response time granularity down to the minute, and Amazon Simple Notification Service (Amazon SNS) notifications of recovery actions and outcomes. Compared to simplified automatic recovery, these configuration options allow faster recovery attempts and more granular control over how you respond to a system status check failure. For more information about available CloudWatch options, see Status checks for your instances.
Amazon CloudWatch action based recovery doesn't operate during service events in the AWS Health Dashboard. For more information, see Troubleshooting CloudWatch action based recovery failures.
Requirements and limitations for CloudWatch action based recovery
CloudWatch action based recovery can attempt to recover an instance if it:
- Is in the running state. For more information, see Amazon EC2 instance state changes.
- Uses default (On-Demand) or dedicated instance tenancy. For more information, see Amazon EC2 billing and purchasing options.
- Is of an instance type for which Amazon EC2 has capacity available. In some situations, such as significant outages, not enough capacity might be available and some recovery attempts might fail.
- Doesn't use host instance tenancy. For Amazon EC2 Dedicated Hosts, you can use Dedicated Host Auto Recovery to automatically recover unhealthy instances.
- Doesn't use an Elastic Fabric Adapter.
- Isn't a member of an Auto Scaling group.
- Isn't currently undergoing a scheduled maintenance event.
- Uses one of the following instance types:
  - General purpose: A1 | M3 | M4 | M5 | M5a | M5n | M5zn | M6a | M6g | M6i | M6in | M7a | M7g | M7i | M7i-flex | M8g | T1 | T2 | T3 | T3a | T4g
  - Compute optimized: C3 | C4 | C5 | C5a | C5n | C6a | C6g | C6gn | C6i | C6in | C7a | C7g | C7gn | C7i | C7i-flex | C8g
  - Memory optimized: R3 | R4 | R5 | R5a | R5b | R5n | R6a | R6g | R6i | R6in | R7a | R7g | R7i | R7iz | R8g | u-3tb1 | u-6tb1 | u-9tb1 | u-12tb1 | u-18tb1 | u-24tb1 | u7i-12tb | u7in-16tb | u7in-24tb | u7in-32tb | X1 | X1e | X2iezn | X8g
  - Accelerated computing: G3 | G3s | G5g | Inf1 | P2 | P3 | VT1
  - High-performance computing: Hpc6a | Hpc7a | Hpc7g
  - Metal instances: Any of the above types with the metal instance size.
- Uses one of the following instance types if it has instance store volumes: M3 | C3 | R3 | X1 | X1e | X2idn | X2iedn
Warning
- Data on instance store volumes will be lost if the instance is stopped. For more information about stopping an instance, see Stopped instances.
- In the event of a system status check failure, instance store data and block device mapped data might be lost. For these instance types, consider enabling termination protection. For more information, see Enable termination protection.
We recommend that you regularly create backups of valuable data. For information about backup and recovery best practices for Amazon EC2, see Best practices for Amazon EC2.
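The requirements listed above can be turned into a local pre-check before you add a recover action to an alarm. The following is a minimal sketch, not an AWS API: `InstanceFacts` and `recovery_blockers` are hypothetical names, the family sets are a snapshot copied from the list on this page (they can go stale), and capacity availability is not checked because it cannot be known ahead of time. It assumes the instance store rule means that instances with instance store volumes must belong to one of the instance store families.

```python
from dataclasses import dataclass
from typing import List

# Snapshot of the supported families from the requirements list (may go stale).
SUPPORTED_FAMILIES = frozenset("""
a1 m3 m4 m5 m5a m5n m5zn m6a m6g m6i m6in m7a m7g m7i m7i-flex m8g t1 t2 t3 t3a t4g
c3 c4 c5 c5a c5n c6a c6g c6gn c6i c6in c7a c7g c7gn c7i c7i-flex c8g
r3 r4 r5 r5a r5b r5n r6a r6g r6i r6in r7a r7g r7i r7iz r8g
u-3tb1 u-6tb1 u-9tb1 u-12tb1 u-18tb1 u-24tb1 u7i-12tb u7in-16tb u7in-24tb u7in-32tb
x1 x1e x2iezn x8g
g3 g3s g5g inf1 p2 p3 vt1
hpc6a hpc7a hpc7g
""".split())

# Families listed as supported when the instance has instance store volumes.
INSTANCE_STORE_FAMILIES = frozenset({"m3", "c3", "r3", "x1", "x1e", "x2idn", "x2iedn"})


@dataclass
class InstanceFacts:
    """Hypothetical summary of an instance; field names are illustrative only."""
    instance_type: str                 # e.g. "m5.large" or "m5.metal"
    state: str                         # e.g. "running"
    tenancy: str                       # "default", "dedicated", or "host"
    has_efa: bool = False
    in_auto_scaling_group: bool = False
    scheduled_maintenance: bool = False
    has_instance_store: bool = False


def recovery_blockers(facts: InstanceFacts) -> List[str]:
    """Return the listed requirements the instance fails; an empty list means none found."""
    blockers = []
    if facts.state != "running":
        blockers.append("instance is not in the running state")
    if facts.tenancy not in ("default", "dedicated"):
        blockers.append(f"unsupported tenancy: {facts.tenancy}")
    if facts.has_efa:
        blockers.append("uses an Elastic Fabric Adapter")
    if facts.in_auto_scaling_group:
        blockers.append("is a member of an Auto Scaling group")
    if facts.scheduled_maintenance:
        blockers.append("is undergoing a scheduled maintenance event")
    # The family is the part before the size ("m7i-flex" in "m7i-flex.large");
    # metal sizes keep the family prefix, so "m5.metal" maps to "m5".
    family = facts.instance_type.lower().split(".", 1)[0]
    allowed = INSTANCE_STORE_FAMILIES if facts.has_instance_store else SUPPORTED_FAMILIES
    if family not in allowed:
        blockers.append(f"instance type family not in supported list: {family}")
    return blockers
```

For example, `recovery_blockers(InstanceFacts("m5.large", "stopped", "host"))` reports two blockers (state and tenancy). Treat the result as advisory; the instance type data that Amazon EC2 returns is the authoritative source.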
You can also use the AWS Management Console or the AWS CLI to view the instance types that support CloudWatch action based recovery.
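As a sketch of the programmatic route, the Amazon EC2 DescribeInstanceTypes operation accepts an `auto-recovery-supported` filter (in the AWS CLI: `aws ec2 describe-instance-types --filters Name=auto-recovery-supported,Values=true`), and each returned entry carries an `AutoRecoverySupported` flag. The helper names below are hypothetical; the parsing function works on plain response dictionaries so it can be exercised without AWS credentials, and boto3 is imported only inside the function that calls AWS.

```python
from typing import Iterable, List


def supported_instance_types(pages: Iterable[dict]) -> List[str]:
    """Collect instance type names whose AutoRecoverySupported flag is true
    from DescribeInstanceTypes response pages."""
    names = set()
    for page in pages:
        for info in page.get("InstanceTypes", []):
            # Re-checking the flag client-side is defensive; the server-side
            # filter should already have excluded unsupported types.
            if info.get("AutoRecoverySupported"):
                names.add(info["InstanceType"])
    return sorted(names)


def fetch_supported_instance_types(region_name: str) -> List[str]:
    """Query Amazon EC2 in one Region (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so supported_instance_types works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[{"Name": "auto-recovery-supported", "Values": ["true"]}]
    )
    return supported_instance_types(pages)
```

Instance type availability differs by Region, so run the query in the Region where your instance runs.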
Configure CloudWatch action based recovery
CloudWatch action based recovery works with the StatusCheckFailed_System metric.
CloudWatch action based recovery is configured through the CloudWatch console. To set up CloudWatch action based recovery, see Adding recover actions to CloudWatch alarms in the Amazon CloudWatch User Guide.
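The console steps create an ordinary CloudWatch alarm on StatusCheckFailed_System whose alarm action is the EC2 recover action, `arn:aws:automate:<region>:ec2:recover`. The following minimal sketch builds the equivalent CloudWatch PutMetricAlarm arguments. The helper names, the alarm name pattern, and the choice of two one-minute evaluation periods are illustrative assumptions, not required values; the optional SNS topic ARN is a placeholder you supply so that notifications are sent when the alarm fires.

```python
from typing import Optional


def recover_alarm_params(
    instance_id: str,
    region: str,
    sns_topic_arn: Optional[str] = None,
    evaluation_periods: int = 2,
) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm that attach the
    EC2 recover action to a StatusCheckFailed_System alarm (hypothetical helper)."""
    # The recover action ARN is Region-specific and must match the instance's Region.
    actions = [f"arn:aws:automate:{region}:ec2:recover"]
    if sns_topic_arn:
        # Optional SNS topic notified alongside the recover action.
        actions.append(sns_topic_arn)
    return {
        "AlarmName": f"StatusCheckFailed_System-recover-{instance_id}",  # assumed naming
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # one-minute data points
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


def create_recover_alarm(instance_id: str, region: str,
                         sns_topic_arn: Optional[str] = None) -> None:
    """Create or update the alarm (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so recover_alarm_params works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **recover_alarm_params(instance_id, region, sns_topic_arn)
    )
```

Keeping the parameter construction separate from the API call makes the alarm definition easy to review or reuse in infrastructure-as-code tooling before anything is created.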
Troubleshooting CloudWatch action based recovery failures
The following issues can cause the recovery of your instance with CloudWatch action based recovery to fail:
- CloudWatch action based recovery does not operate during service events in the AWS Health Dashboard. You might not receive recovery failure notifications for such events. For the latest service availability information, see the Service health status page.
- Temporarily insufficient capacity of replacement hardware.
- The instance has reached the maximum daily allowance for recovery attempts. If automatic recovery fails and hardware degradation is determined to be the root cause of the original system status check failure, your instance might subsequently be retired.
If the instance’s system status check failure persists despite multiple recovery attempts, see Troubleshoot instances with failed status checks for additional guidance.