PERF02-BP03 Collect compute-related metrics
Record and track compute-related metrics to better understand how your compute resources are performing and to improve their performance and utilization.
Common anti-patterns:
- You only use manual log file searching for metrics.
- You only use the default metrics recorded by your monitoring software.
- You only review metrics when there is an issue.
Benefits of establishing this best practice: Collecting performance-related metrics will help you align application performance with business requirements to ensure that you are meeting your workload needs. It can also help you continually improve the resource performance and utilization in your workload.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Cloud workloads can generate large volumes of data such as metrics, logs, and events. In the AWS Cloud, collecting metrics is a crucial step to improve security, cost efficiency, performance, and sustainability. AWS provides a wide range of performance-related metrics using monitoring services such as Amazon CloudWatch.
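As an illustration of going beyond default metrics, the following is a minimal sketch of publishing a custom metric to Amazon CloudWatch with the AWS SDK for Python (boto3) and its `put_metric_data` call. The helper names `build_metric_datum` and `publish`, the namespace, the metric name `ResponseTime`, and the `Service` dimension are hypothetical and chosen only for this example. It assumes boto3 is installed and AWS credentials are configured when `publish` is called without an explicit client.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, dimensions):
    """Build one entry for the MetricData list of a PutMetricData request.

    `dimensions` is a plain dict; CloudWatch expects a list of
    {"Name": ..., "Value": ...} pairs, so it is converted here.
    """
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def publish(namespace, data, client=None):
    """Send the data points to CloudWatch under the given namespace."""
    if client is None:
        # Imported lazily so the builder above works without the SDK installed.
        import boto3
        client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=data)


# Hypothetical data point: one response-time measurement for a "checkout" service.
datum = build_metric_datum(
    "ResponseTime", 182.4, "Milliseconds", {"Service": "checkout"}
)
```

Calling `publish("MyWorkload/Performance", [datum])` would send the data point; the namespace shown is an assumption, not a required value.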
Implementation steps
- Identify which performance-related metrics are relevant to your workload. You should collect metrics around resource utilization and the way your cloud workload is operating (like response time and throughput).
- Choose and set up the right logging and monitoring solution for your workload.
- Define the required filter and aggregation for the metrics based on your workload requirements.
- Configure data retention policies for your metrics to match your security and operational goals.
- If required, create alarms and notifications for your metrics to help you proactively respond to performance-related issues.
- Use automation to deploy your metric and log aggregation agents.
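The aggregation and alarm steps above can be sketched in plain Python. This is an illustrative model only, not how any particular monitoring service implements them: raw response-time samples are grouped into fixed periods, and an alarm condition is met when the average exceeds a threshold for a number of consecutive periods. The function names, the 60-second period, the 300 ms threshold, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, period_seconds=60):
    """Group (epoch_seconds, value) samples into fixed-length periods.

    Returns a list of (period_start, average, maximum) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        period_start = timestamp - timestamp % period_seconds
        buckets[period_start].append(value)
    return [
        (start, mean(values), max(values))
        for start, values in sorted(buckets.items())
    ]


def breaches(datapoints, threshold, evaluation_periods=2):
    """Return True if the most recent periods all average above the threshold."""
    recent = datapoints[-evaluation_periods:]
    return len(recent) == evaluation_periods and all(
        average > threshold for _, average, _ in recent
    )


# Hypothetical response times in milliseconds, sampled every 30 seconds.
samples = [(0, 120), (30, 140), (60, 310), (90, 330), (120, 400), (150, 420)]
points = aggregate(samples)
alarm = breaches(points, threshold=300)
```

With this sample data the last two periods average 320 ms and 410 ms, so the condition is met for a 300 ms threshold. Requiring several consecutive periods, rather than a single one, is a common way to avoid notifications for short spikes.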