Define and configure alarms in Incident Detection and Response


AWS works with you to define metrics and alarms that provide visibility into the performance of your applications and their underlying AWS infrastructure. When you define and configure alarm thresholds, we ask that alarms adhere to the following criteria:

  • Alarms only enter the "Alarm" state when there is critical impact to the monitored workload (loss of revenue or degraded customer experience that significantly reduces performance) that requires immediate operator attention.

  • Alarms must also engage your specified resolvers for the workload at the same time as, or prior to, engaging the incident management team. Incident management engineers should collaborate with your specified resolvers in the mitigation process, not serve as first-line responders who then escalate to you.

  • Alarm thresholds and durations must be set so that every time an alarm fires, an investigation is warranted. If an alarm is flapping between the "Alarm" and "OK" states, sufficient impact is occurring to warrant operator response and attention.
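
To illustrate how a threshold and a duration combine, the following is a minimal sketch in Python, not CloudWatch's actual evaluation logic. The function name `alarm_state` and its parameters are hypothetical, and the sketch ignores missing-data handling. It reports "ALARM" only when enough of the most recent datapoints breach the threshold, so a single brief spike does not fire the alarm.

```python
from typing import Sequence


def alarm_state(
    datapoints: Sequence[float],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Simplified M-out-of-N evaluation over the most recent datapoints.

    Hypothetical helper for illustration only; real CloudWatch evaluation
    also accounts for missing data and other details.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    # Look only at the most recent evaluation window.
    window = datapoints[-evaluation_periods:]
    # Count datapoints at or above the threshold (">=" comparison).
    breaching = sum(1 for value in window if value >= threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# One spike in ten datapoints does not fire the alarm.
spike = alarm_state([0] * 9 + [50], threshold=10,
                    evaluation_periods=10, datapoints_to_alarm=10)
# Ten consecutive breaching datapoints do.
sustained = alarm_state([12] * 10, threshold=10,
                        evaluation_periods=10, datapoints_to_alarm=10)
```

Requiring a sustained breach is one way to keep alarms quiet during transient noise, so that any alarm that does fire merits investigation.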

Types of alarms:

  • Alarms that portray the level of business impact and pass relevant information for simple fault detection.

  • Amazon CloudWatch canaries. For more information, see Canaries and X-Ray tracing, and X-Ray.

  • Aggregate alarming (monitoring of dependencies)
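
One possible way to express aggregate alarming is a CloudWatch composite alarm, whose rule expression combines the states of other alarms. The sketch below only builds such a rule string. The helper name `any_in_alarm_rule` is hypothetical, and using a composite alarm here is an assumption rather than something this guide prescribes.

```python
from typing import Sequence


def any_in_alarm_rule(alarm_names: Sequence[str]) -> str:
    """Build a composite alarm rule that is true when any listed alarm is in ALARM.

    Hypothetical helper; the alarm names are assumed to exist already.
    """
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


# Example with two alarm names of the kind a dependency monitor might use.
rule = any_in_alarm_rule(["httperrorcode503", "httperrorcode400"])
```

The resulting string could be supplied as the `AlarmRule` when creating a composite alarm.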

The following table provides example alarms, all using the CloudWatch monitoring system.

| Metric name / Alarm threshold | Alarm ARN or resource ID | If this alarm fires | If engaged, cut a Premium Support Case for these services |
|---|---|---|---|
| API errors / # of errors >= 10 for 10 datapoints | arn:aws:cloudwatch:us-west-2:000000000000:alarm:E2MPmimLambda-Errors | Ticket cut to database administrator (DBA) team | Lambda, API Gateway |
| ServiceUnavailable (HTTP status code 503) / # of errors >= 3 for 10 datapoints (different clients) in a 5 minute window | arn:aws:cloudwatch:us-west-2:xxxxx:alarm:httperrorcode503 | Ticket cut to Service team | Lambda, API Gateway |
| ThrottlingException (HTTP status code 400) / # of errors >= 3 for 10 datapoints (different clients) in a 5 minute window | arn:aws:cloudwatch:us-west-2:xxxxx:alarm:httperrorcode400 | Ticket cut to Service team | EC2, Amazon Aurora |

For more details, see AWS Incident Detection and Response monitoring and observability.
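
As a rough illustration, the "API errors" example (errors >= 10 for 10 datapoints) might translate into CloudWatch `PutMetricAlarm` parameters along the following lines. This is a sketch under stated assumptions: the function name `my-api-handler`, the 60-second period, the `Sum` statistic, and the missing-data setting are not given in this guide and are chosen only for illustration.

```python
from typing import Any, Dict


def lambda_error_alarm_params(alarm_name: str, function_name: str) -> Dict[str, Any]:
    """Build PutMetricAlarm parameters for a Lambda error-count alarm.

    Hypothetical helper; period, statistic, and missing-data handling
    are assumptions, not values taken from the guide.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                      # assumed: one datapoint per minute
        "EvaluationPeriods": 10,           # look at the last 10 datapoints
        "DatapointsToAlarm": 10,           # all 10 must breach
        "Threshold": 10,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # assumed
    }


params = lambda_error_alarm_params("E2MPmimLambda-Errors", "my-api-handler")
# With boto3 installed and credentials configured, these could be passed as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review the threshold and duration before any alarm is created.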

Key outputs:

  • Definition and configuration of alarms on your workloads.

  • Completion of the alarm details on the onboarding questionnaire.
