

# Status checks for Amazon EC2 instances
<a name="monitoring-system-instance-status-check"></a>

With instance status monitoring, you can quickly determine whether Amazon EC2 has detected any problems that might prevent your instances from running applications. Amazon EC2 performs automated checks on every running EC2 instance to identify hardware and software issues. You can view the results of these status checks to identify specific and detectable problems. Status check data augments the information that Amazon EC2 already provides about the state of each instance (such as `pending`, `running`, and `stopping`) and the utilization metrics that Amazon CloudWatch monitors (CPU utilization, network traffic, and disk activity).

Status checks are performed every minute, returning a pass or a fail status. If all checks pass, the overall status of the instance is **OK**. If one or more checks fail, the overall status is **impaired**. Status checks are built into Amazon EC2, so they cannot be disabled or deleted.
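The rule for combining individual check results into the overall status can be sketched as follows. This is an illustration only. The function `overall_status` and the check labels are hypothetical, and the function does not call any AWS API.

```python
from __future__ import annotations


def overall_status(check_results: dict[str, bool]) -> str:
    """Return "OK" if every status check passed, otherwise "impaired".

    check_results maps a (hypothetical) check label to True for a pass
    and False for a fail.
    """
    # A single failed check is enough to make the overall status impaired.
    return "OK" if all(check_results.values()) else "impaired"


if __name__ == "__main__":
    print(overall_status({"system": True, "instance": True}))    # OK
    print(overall_status({"system": True, "instance": False}))   # impaired
```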

When a status check fails, the corresponding CloudWatch metric for status checks is incremented. For more information, see [Status check metrics](viewing_metrics_with_cloudwatch.md#status-check-metrics). You can use these metrics to create CloudWatch alarms that are triggered based on the result of the status checks. For example, you can create an alarm to warn you if status checks fail on a specific instance. For more information, see [Create CloudWatch alarms for Amazon EC2 instances that fail status checks](creating_status_check_alarms.md).

You can also create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers the instance if it becomes impaired due to an underlying issue. For more information, see [Automatic instance recovery](ec2-instance-recover.md).

**Topics**
+ [Types of status checks](#types-of-instance-status-checks)
+ [View status checks for Amazon EC2 instances](viewing_status.md)
+ [Create CloudWatch alarms for Amazon EC2 instances that fail status checks](creating_status_check_alarms.md)

## Types of status checks
<a name="types-of-instance-status-checks"></a>

There are three types of status checks.
+ [System status checks](#system-status-checks)
+ [Instance status checks](#instance-status-checks)
+ [Attached EBS status checks](#attached-ebs-status-checks)

### System status checks
<a name="system-status-checks"></a>

System status checks monitor the AWS systems on which your instance runs. These checks detect underlying problems with your instance that require AWS involvement to repair. When a system status check fails, you can choose to wait for AWS to fix the issue, or you can resolve it yourself. For instances backed by Amazon EBS, you can stop and start the instance yourself, which in most cases migrates the instance to a new host. For instances backed by instance store (supported only for Linux instances), you can terminate and replace the instance. Instance store volumes are ephemeral, so all data on them is lost when the instance is stopped or terminated.
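The self-service options described in the preceding paragraph depend on the root device type. The following Python sketch encodes that choice. The helper `suggest_system_check_remediation` is hypothetical. The labels `"ebs"` and `"instance-store"` are assumed inputs. The function does not query EC2.

```python
def suggest_system_check_remediation(root_device_type: str) -> str:
    """Suggest a self-service response to a failed system status check.

    Waiting for AWS to fix the issue is always an alternative; this helper
    only covers what you can do yourself. root_device_type is assumed to be
    "ebs" or "instance-store".
    """
    if root_device_type == "ebs":
        # Stop and start: in most cases the instance is migrated to a new host.
        return "stop and start the instance"
    if root_device_type == "instance-store":
        # Instance store volumes are ephemeral, so replacing the instance
        # loses the data on them.
        return "terminate and replace the instance"
    raise ValueError(f"unknown root device type: {root_device_type!r}")


if __name__ == "__main__":
    print(suggest_system_check_remediation("ebs"))
```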

The following are examples of problems that can cause system status checks to fail:
+ Loss of network connectivity
+ Loss of system power
+ Software issues on the physical host
+ Hardware issues on the physical host that impact network reachability

If a system status check fails, we increment the [StatusCheckFailed_System](viewing_metrics_with_cloudwatch.md#status-check-metrics) metric.

**Bare metal instances**  
If you perform a restart from the operating system on a bare metal instance, the system status check might temporarily return a fail status. When the instance becomes available, the system status check should return a pass status.

### Instance status checks
<a name="instance-status-checks"></a>

Instance status checks monitor the software and network connectivity of your individual instance. Amazon EC2 checks the health of the instance by sending an address resolution protocol (ARP) request to the network interface (NIC). These checks detect problems that require your involvement to repair. When an instance status check fails, you typically must address the problem yourself (for example, by rebooting the instance or by making instance configuration changes).

**Note**  
Recent Linux distributions that use `systemd-networkd` for network configuration might report on health checks differently from earlier distributions. During the boot process, `systemd-networkd` can bring up the network early, potentially before other startup tasks that also affect instance health have finished. As a result, status checks that depend on network availability can report a healthy status before those tasks complete.

The following are examples of problems that can cause instance status checks to fail:
+ Failed system status checks
+ Incorrect networking or startup configuration
+ Exhausted memory
+ Corrupted file system
+ Incompatible kernel

During a reboot, an instance status check reports a failure until the instance becomes available again.

If an instance status check fails, we increment the [StatusCheckFailed_Instance](viewing_metrics_with_cloudwatch.md#status-check-metrics) metric.

**Bare metal instances**  
If you perform a restart from the operating system on a bare metal instance, the instance status check might temporarily return a fail status. When the instance becomes available, the instance status check should return a pass status.

### Attached EBS status checks
<a name="attached-ebs-status-checks"></a>

Attached EBS status checks monitor whether the Amazon EBS volumes attached to an instance are reachable and able to complete I/O operations. The `StatusCheckFailed_AttachedEBS` metric is a binary value that indicates impairment if one or more of the EBS volumes attached to the instance are unable to complete I/O operations. These status checks detect underlying issues with the compute or Amazon EBS infrastructure. When the attached EBS status check fails, you can either wait for AWS to resolve the issue, or take action yourself, such as replacing the affected volumes or stopping and starting the instance.

The following are examples of issues that can cause attached EBS status checks to fail:
+ Hardware or software issues on the storage subsystems underlying the EBS volumes
+ Hardware issues on the physical host that impact reachability of the EBS volumes
+ Connectivity issues between the instance and EBS volumes

You can use the `StatusCheckFailed_AttachedEBS` metric to help improve the resilience of your workload by creating Amazon CloudWatch alarms that are triggered based on the result of the status check. For example, you could fail over to a secondary instance or Availability Zone when you detect a prolonged impact. Alternatively, you can monitor the I/O performance of each attached volume using EBS CloudWatch metrics to detect and replace the impaired volume. For more information, see [Amazon CloudWatch metrics for Amazon EBS](https://docs.aws.amazon.com/ebs/latest/userguide/using_cloudwatch_ebs.html). If your workload is not driving I/O to any EBS volumes attached to your instance, and the EBS status check indicates an impairment, you can stop and start the instance to move it to a new host. This can resolve underlying host issues that are impacting the reachability of the EBS volumes.
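The options in the preceding paragraph can be combined into a response policy. The following Python sketch shows one possible ordering. This ordering is an assumption for illustration, not an AWS recommendation. The function name, its parameters, and the volume ID are hypothetical.

```python
from __future__ import annotations


def choose_attached_ebs_response(
    ebs_check_failed: bool,
    driving_io_to_ebs: bool,
    impaired_volume_ids: list[str],
    prolonged_impact: bool,
) -> str:
    """Pick one response to the attached EBS status check (hypothetical policy)."""
    if not ebs_check_failed:
        return "no action"
    if prolonged_impact:
        # Resilience option: move the workload away from the impaired instance.
        return "fail over to a secondary instance or Availability Zone"
    if impaired_volume_ids:
        # Per-volume EBS metrics identified specific impaired volumes.
        return "replace volumes: " + ", ".join(impaired_volume_ids)
    if not driving_io_to_ebs:
        # No I/O to EBS, so stop and start to move the instance to a new host.
        return "stop and start the instance"
    # Otherwise, wait for AWS to resolve the underlying issue.
    return "wait for AWS to resolve the issue"


if __name__ == "__main__":
    print(choose_attached_ebs_response(
        ebs_check_failed=True,
        driving_io_to_ebs=True,
        impaired_volume_ids=["vol-1234567890abcdef0"],
        prolonged_impact=False,
    ))
```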

You can also configure your Amazon EC2 Auto Scaling groups to detect attached EBS status check failures, and then replace the affected instance with a new one. For more information, see [Monitor and replace Auto Scaling instances with impaired Amazon EBS volumes](https://docs.aws.amazon.com/autoscaling/ec2/userguide/monitor-and-replace-instances-with-impaired-ebs-volumes.html) in the *Amazon EC2 Auto Scaling User Guide*.

**Note**  
The attached EBS status check metric is available only for Nitro instances.

# View status checks for Amazon EC2 instances
<a name="viewing_status"></a>

If your instance has a failed status check, you typically must address the problem yourself (for example, by rebooting the instance or by making instance configuration changes). To troubleshoot system or instance status check failures yourself, see [Troubleshoot Amazon EC2 Linux instances with failed status checks](TroubleshootingInstances.md).

------
#### [ Console ]

**To view status checks**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the navigation pane, choose **Instances**.

1. On the **Instances** page, the **Status check** column lists the operational status of each instance.

1. To view the status of a specific instance, select the instance, and then choose the **Status and alarms** tab.

1. To review the CloudWatch metrics for status checks, on the **Status and alarms** tab, expand **Metrics** to see the graphs for the following metrics:
   + **Status check failed for system**
   + **Status check failed for instance**
   + **Status check failed for attached EBS**

   For more information, see [Status check metrics](viewing_metrics_with_cloudwatch.md#status-check-metrics).

------
#### [ AWS CLI ]

**To view status checks**  
Use the [describe-instance-status](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instance-status.html) command.

**Example**: Get the status of all running instances

```
aws ec2 describe-instance-status
```

**Example**: Get the status of all instances

```
aws ec2 describe-instance-status --include-all-instances
```

**Example**: Get the status of a single running instance

```
aws ec2 describe-instance-status --instance-ids i-1234567890abcdef0
```

**Example**: Get all instances with a status of `impaired`

```
aws ec2 describe-instance-status \
    --filters Name=instance-status.status,Values=impaired
```
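You can also post-process the command output. The following Python sketch assumes you saved the output with `aws ec2 describe-instance-status --include-all-instances --output json > status.json`. It also assumes that each entry in `InstanceStatuses` has an `InstanceId` and `SystemStatus` and `InstanceStatus` summaries. `AttachedEbsStatus` is treated as optional and may be absent. Each summary is assumed to carry a lowercase `Status` such as `ok` or `impaired`. The file name `status.json` and the helper `failing_checks` are hypothetical.

```python
from __future__ import annotations

import json
import sys

# Summary fields checked in each InstanceStatuses entry (assumed field names).
# AttachedEbsStatus is treated as optional and skipped when absent.
SUMMARY_FIELDS = {
    "SystemStatus": "system",
    "InstanceStatus": "instance",
    "AttachedEbsStatus": "attached EBS",
}


def failing_checks(response: dict) -> dict[str, dict[str, str]]:
    """Map each instance ID to the check summaries whose status is not ok."""
    result: dict[str, dict[str, str]] = {}
    for entry in response.get("InstanceStatuses", []):
        problems: dict[str, str] = {}
        for field, label in SUMMARY_FIELDS.items():
            status = entry.get(field, {}).get("Status")
            # Skip missing summaries, "ok", and "not-applicable" (assumed here
            # to mark instances that are not running).
            if status is not None and status not in ("ok", "not-applicable"):
                problems[label] = status
        if problems:
            result[entry["InstanceId"]] = problems
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 failing_checks.py status.json
    with open(sys.argv[1]) as f:
        for instance_id, problems in failing_checks(json.load(f)).items():
            print(instance_id, problems)
```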

------
#### [ PowerShell ]

**To view status checks**  
Use the [Get-EC2InstanceStatus](https://docs.aws.amazon.com/powershell/latest/reference/items/Get-EC2InstanceStatus.html) command.

**Example**: Get the status of all running instances

```
Get-EC2InstanceStatus
```

**Example**: Get the status of all instances

```
Get-EC2InstanceStatus -IncludeAllInstance $true
```

**Example**: Get the status of a single running instance

```
Get-EC2InstanceStatus -InstanceId i-1234567890abcdef0
```

**Example**: Get all instances with a status of `impaired`

```
Get-EC2InstanceStatus `
    -Filter @{Name="instance-status.status"; Values="impaired"}
```

------

# Create CloudWatch alarms for Amazon EC2 instances that fail status checks
<a name="creating_status_check_alarms"></a>

You can use the [status check metrics](viewing_metrics_with_cloudwatch.md#status-check-metrics) to create CloudWatch alarms to notify you when an instance has a failed status check.

Status checks and status check alarms can temporarily enter an *insufficient data* state if there are missing metric data points. Although rare, this can happen when there is an interruption in the metric reporting systems, even when an instance is healthy. We recommend that you treat this state as missing data instead of a status check failure or alarm breach. This is especially important when taking stop, terminate, reboot, or recover actions on the instance in response.
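The recommendation to treat missing data differently from a failure can be expressed as a guard before any automated action. The following Python sketch is hypothetical. The function names are illustrative, and each data point is assumed to be 1 (failed), 0 (passed), or `None` (missing).

```python
from __future__ import annotations

from typing import Optional, Sequence


def classify_window(datapoints: Sequence[Optional[int]]) -> str:
    """Classify per-minute status check values for one window.

    1 = failed, 0 = passed, None = missing data point.
    """
    present = [d for d in datapoints if d is not None]
    if not present:
        # Nothing was reported: treat as missing data, not as a failure.
        return "missing"
    # Any reported failure in the window counts as a failed check.
    return "failed" if max(present) >= 1 else "passed"


def should_take_action(datapoints: Sequence[Optional[int]]) -> bool:
    """Only a reported failure justifies stop, terminate, reboot, or recover."""
    return classify_window(datapoints) == "failed"
```

CloudWatch alarms also let you configure how missing data points are evaluated, using the `--treat-missing-data` option of `put-metric-alarm` (`breaching`, `notBreaching`, `ignore`, or `missing`).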

------
#### [ Console ]

This example configures an alarm that sends a notification when an instance fails a status check. You can optionally stop, terminate, or recover the instance.

**To create a status check alarm**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the navigation pane, choose **Instances**.

1. Select the instance, choose the **Status and alarms** tab, and choose **Actions**, **Create status check alarm**.

1. On the **Manage CloudWatch alarms** page, under **Add or edit alarm**, choose **Create an alarm**.

1. For **Alarm notification**, turn the toggle on to configure Amazon Simple Notification Service (Amazon SNS) notifications. Select an existing Amazon SNS topic or enter a name to create a new topic.

   If you add an email address to the list of recipients or create a new topic, Amazon SNS sends a confirmation email to each new address. Each recipient must choose the confirmation link in the email. Only confirmed addresses receive alert notifications.

1. For **Alarm action**, turn the toggle on to specify an action to take when the alarm is triggered. Select the action.

1. For **Alarm thresholds**, specify the metric and criteria for the alarm.

   You can leave the default settings for **Group samples by** (**Average**) and **Type of data to sample** (**Status check failed:either**), or you can change them to suit your needs.

   For **Consecutive period**, set the number of consecutive periods that must breach the threshold before the alarm is triggered and the notification is sent. For **Period**, enter the length of each evaluation period.

1. (Optional) For **Sample metric data**, choose **Add to dashboard**.

1. Choose **Create**.

If you need to change a status check alarm, you can edit it.

**To edit a status check alarm**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the navigation pane, choose **Instances**.

1. Select the instance and choose **Actions**, **Monitoring**, **Manage CloudWatch alarms**.

1. On the **Manage CloudWatch alarms** page, under **Add or edit alarm**, choose **Edit an alarm**.

1. For **Search for alarm**, choose the alarm.

1. When you are finished making changes, choose **Update**.

------
#### [ AWS CLI ]

In the following example, the alarm publishes a notification to an SNS topic when the instance fails either the instance status check or the system status check for at least two consecutive periods. The CloudWatch metric used is `StatusCheckFailed`.

**To create a status check alarm**

1. Select an existing SNS topic or create a new one. For more information, see [Accessing Amazon SNS in the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-services-sns.html) in the *AWS Command Line Interface User Guide*.

1. Use the following [list-metrics](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/list-metrics.html) command to view the available Amazon CloudWatch metrics for Amazon EC2.

   ```
   aws cloudwatch list-metrics --namespace AWS/EC2
   ```

1. Use the following [put-metric-alarm](https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-metric-alarm.html) command to create the alarm.

   ```
   aws cloudwatch put-metric-alarm \
       --alarm-name StatusCheckFailed-Alarm-for-i-1234567890abcdef0 \
       --metric-name StatusCheckFailed \
       --namespace AWS/EC2 \
       --statistic Maximum \
       --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
       --unit Count \
       --period 300 \
       --evaluation-periods 2 \
       --threshold 1 \
       --comparison-operator GreaterThanOrEqualToThreshold \
       --alarm-actions arn:aws:sns:us-west-2:111122223333:my-sns-topic
   ```

   The period is the length of time, in seconds, over which CloudWatch applies the statistic to the metric data points. This example uses 300 seconds (5 minutes). The evaluation periods value is the number of consecutive periods for which the metric value must be compared to the threshold. This example uses 2, so the alarm is triggered when the threshold is breached in two consecutive 5-minute periods. The alarm actions are the actions to perform when the alarm is triggered.
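The following Python sketch models how the example alarm evaluates the metric. It is a simplification, not how CloudWatch is implemented. It assumes one `StatusCheckFailed` data point (0 or 1) per minute and groups the data points into 5-minute periods starting from the first point. It ignores missing data and the `--datapoints-to-alarm` option. The name `alarm_fires` is hypothetical.

```python
from __future__ import annotations

from typing import Sequence

# Values mirroring the put-metric-alarm example.
PERIOD_SECONDS = 300      # --period 300
EVALUATION_PERIODS = 2    # --evaluation-periods 2
THRESHOLD = 1             # --threshold 1


def alarm_fires(per_minute: Sequence[int]) -> bool:
    """Simplified model: Maximum statistic per period,
    GreaterThanOrEqualToThreshold, and every one of the last
    EVALUATION_PERIODS periods must breach.

    per_minute holds one StatusCheckFailed value (0 or 1) per minute,
    oldest first.
    """
    points_per_period = PERIOD_SECONDS // 60  # status checks run every minute
    complete_periods = len(per_minute) // points_per_period
    # Maximum statistic: one failed minute makes the whole period breach.
    maxima = [
        max(per_minute[i * points_per_period:(i + 1) * points_per_period])
        for i in range(complete_periods)
    ]
    if len(maxima) < EVALUATION_PERIODS:
        return False
    return all(m >= THRESHOLD for m in maxima[-EVALUATION_PERIODS:])


if __name__ == "__main__":
    print(alarm_fires([0, 0, 1, 0, 0, 0, 1, 0, 0, 0]))  # True
    print(alarm_fires([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))  # False
```

Because the statistic is `Maximum`, a single failed minute is enough for a 5-minute period to breach a threshold of 1. The alarm still requires that to happen in two consecutive periods.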

------
#### [ PowerShell ]

**To create a status check alarm**  
Use the [Write-CWMetricAlarm](https://docs.aws.amazon.com/powershell/latest/reference/items/Write-CWMetricAlarm.html) cmdlet as follows to publish notifications to an SNS topic when the instance fails status checks for at least two consecutive periods.

```
Write-CWMetricAlarm `
    -AlarmName "StatusCheckFailed-Alarm-for-i-1234567890abcdef0" `
    -MetricName "StatusCheckFailed" `
    -Namespace "AWS/EC2" `
    -Statistic "Maximum" `
    -Dimension @{Name="InstanceId"; Value="i-1234567890abcdef0"} `
    -Unit "Count" `
    -Period 300 `
    -EvaluationPeriod 2 `
    -Threshold 1 `
    -ComparisonOperator "GreaterThanOrEqualToThreshold" `
    -AlarmAction "arn:aws:sns:us-west-2:111122223333:my-sns-topic"
```

The period (`-Period`) is the length of time, in seconds, over which CloudWatch applies the statistic to the metric data points. This example uses 300 seconds (5 minutes). The evaluation period (`-EvaluationPeriod`) is the number of consecutive periods for which the metric value must be compared to the threshold. This example uses 2, so the alarm is triggered when the threshold is breached in two consecutive 5-minute periods. The alarm actions are the actions to perform when the alarm is triggered.

------