OPS10-BP07 Automate responses to events
Automating event responses is key to fast, consistent, and less error-prone operational handling. Create streamlined processes and use tools to automatically manage and respond to events, minimizing manual intervention and improving operational effectiveness.
Desired outcome:
- Reduced human errors and faster resolution times through automation.
- Consistent and reliable operational event handling.
- Enhanced operational efficiency and system reliability.
Common anti-patterns:
- Manual event handling leads to delays and errors.
- Automation is overlooked in repetitive, critical tasks.
- Repetitive, manual tasks lead to alert fatigue and missing critical issues.
Benefits of establishing this best practice:
- Accelerated event responses, reducing system downtime.
- Reliable operations with automated and consistent event handling.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Incorporate automation to create efficient operational workflows and minimize manual interventions.
Implementation steps
- Identify automation opportunities: Determine repetitive tasks for automation, such as issue remediation, ticket enrichment, capacity management, scaling, deployments, and testing.
- Identify automation prompts:
  - Assess and define specific conditions or metrics that initiate automated responses using Amazon CloudWatch alarm actions.
  - Use Amazon EventBridge to respond to events in AWS services, custom workloads, and SaaS applications.
  - Consider initiation events such as specific log entries, performance metric thresholds, or state changes in AWS resources.
- Implement event-driven automation:
  - Use AWS Systems Manager Automation runbooks to simplify maintenance, deployment, and remediation tasks.
  - Create incidents in Incident Manager, which automatically gathers details about the involved AWS resources and adds them to the incident.
  - Proactively monitor quotas using Quota Monitor for AWS.
  - Automatically adjust capacity with AWS Auto Scaling to maintain availability and performance.
  - Automate development pipelines with Amazon CodeCatalyst.
  - Smoke test or continually monitor endpoints and APIs using synthetic monitoring.
- Perform risk mitigation through automation:
  - Implement automated security responses to swiftly address risks.
  - Use AWS Systems Manager State Manager to reduce configuration drift.
Level of effort for the implementation plan: High
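The steps above (an alarm as the prompt, EventBridge as the router, a Systems Manager Automation runbook as the response) can be sketched as a small handler. This is a minimal illustration under stated assumptions, not a reference implementation: the alarm-name prefix `web-unhealthy-`, the mapping table, and the function names are hypothetical, and the choice of the AWS-owned `AWS-RestartEC2Instance` runbook is only an example. The event shape follows the CloudWatch alarm state change events that EventBridge delivers; verify the fields against your own events before relying on them. The Systems Manager client is passed in so the logic can be exercised without AWS access.

```python
from typing import Any, Optional

# Event pattern one might attach to an EventBridge rule so that only
# alarms entering the ALARM state are forwarded to the handler.
ALARM_EVENT_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}

# Hypothetical mapping from alarm-name prefix to an Automation runbook.
RUNBOOKS_BY_ALARM_PREFIX = {
    "web-unhealthy-": "AWS-RestartEC2Instance",
}


def instance_id_from_alarm(detail: dict) -> Optional[str]:
    """Return the InstanceId dimension of the first metric that has one."""
    for metric in detail.get("configuration", {}).get("metrics", []):
        dims = metric.get("metricStat", {}).get("metric", {}).get("dimensions", {})
        if "InstanceId" in dims:
            return dims["InstanceId"]
    return None


def handle_alarm_event(event: dict, ssm: Any) -> Optional[str]:
    """Start a runbook for a known alarm; return the execution ID or None.

    `ssm` is expected to behave like a boto3 Systems Manager client.
    """
    detail = event.get("detail", {})
    # Re-check the state in code in case the rule pattern is loosened later.
    if event.get("source") != "aws.cloudwatch":
        return None
    if detail.get("state", {}).get("value") != "ALARM":
        return None

    alarm_name = detail.get("alarmName", "")
    runbook = next(
        (doc for prefix, doc in RUNBOOKS_BY_ALARM_PREFIX.items()
         if alarm_name.startswith(prefix)),
        None,
    )
    instance_id = instance_id_from_alarm(detail)
    if runbook is None or instance_id is None:
        # Unknown alarm or no target: leave it for a human responder.
        return None

    response = ssm.start_automation_execution(
        DocumentName=runbook,
        Parameters={"InstanceId": [instance_id]},
    )
    return response["AutomationExecutionId"]
```

In an AWS Lambda function targeted by the rule, the client would typically be created with `boto3.client("ssm")` and passed to `handle_alarm_event(event, ssm)`. Returning `None` for unrecognized alarms keeps the automation narrow: only responses that have been explicitly mapped run unattended, and everything else still reaches an operator.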