Container, queue, and database metrics for Amazon MWAA
In addition to Apache Airflow metrics, you can use CloudWatch to monitor the underlying components of your Amazon Managed Workflows for Apache Airflow environments. CloudWatch collects raw data and processes it into readable, near real-time metrics. These environment metrics give you greater visibility into key performance indicators, which helps you size your environments appropriately and debug issues with your workflows. These metrics apply to all supported Apache Airflow versions on Amazon MWAA.
Amazon MWAA provides the following metrics: CPU and memory utilization for each Amazon Elastic Container Service (Amazon ECS) container and Amazon Aurora PostgreSQL instance; Amazon Simple Queue Service (Amazon SQS) metrics for the number of messages and the age of the oldest message; Amazon Relational Database Service (Amazon RDS) metrics for database connections, disk queue depth, write operations, latency, and throughput; and Amazon RDS Proxy metrics. These metrics also include the number of base workers, additional workers, schedulers, and web servers.
These statistics are kept for 15 months, so you can access historical information, gain a better perspective on why a schedule is failing, and troubleshoot underlying issues. You can also set alarms that watch for certain thresholds and send notifications or take actions when those thresholds are met. For more information, see the Amazon CloudWatch User Guide.
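As a minimal sketch of setting such an alarm programmatically, the Python function below builds the keyword arguments for the CloudWatch PutMetricAlarm API (exposed in boto3 as `put_metric_alarm`). The dimension names and values, the 80 percent threshold, the five-minute period, and the SNS topic ARN are hypothetical placeholders, not values documented for Amazon MWAA; confirm the dimensions for your environment in the CloudWatch console before relying on them.

```python
from typing import Dict


def build_alarm_params(
    alarm_name: str,
    metric_name: str,
    dimensions: Dict[str, str],
    threshold: float,
    sns_topic_arn: str,
    period_seconds: int = 300,
    evaluation_periods: int = 3,
) -> dict:
    """Return keyword arguments for CloudWatch PutMetricAlarm (a sketch)."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/MWAA",  # the namespace documented for Amazon MWAA
        "MetricName": metric_name,
        # CloudWatch expects dimensions as a list of {"Name", "Value"} pairs.
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        # The alarm fires when the threshold is met or exceeded.
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Notify an SNS topic when the alarm enters the ALARM state.
        "AlarmActions": [sns_topic_arn],
    }


if __name__ == "__main__":
    params = build_alarm_params(
        alarm_name="mwaa-scheduler-cpu-high",
        metric_name="CPUUtilization",
        # Hypothetical dimension names and values; check the console for yours.
        dimensions={"Environment": "my-environment", "Cluster": "Scheduler"},
        threshold=80.0,
        sns_topic_arn="arn:aws:sns:us-east-1:111122223333:mwaa-alerts",
    )
    print(params["Dimensions"])
    # To create the alarm (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Building the request as plain data keeps the example testable without AWS credentials; the commented lines show where the real API call would go.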
Terms
- Namespace: A namespace is a container for the CloudWatch metrics of an AWS service. For Amazon MWAA, the namespace is AWS/MWAA.
- CloudWatch metrics: A CloudWatch metric represents a time-ordered set of data points that are specific to CloudWatch.
- Dimension: A dimension is a name/value pair that is part of the identity of a metric.
- Unit: A statistic has a unit of measure. For Amazon MWAA, units include Count, Percent, Bytes, and Seconds.
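To illustrate why a dimension is part of a metric's identity, the sketch below models a metric series as the combination of namespace, metric name, and dimension pairs. It is a simplified model for explanation only, not a CloudWatch API; the dimension value "Scheduler" is hypothetical, while "AdditionalWorker" is the component name used later on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetricId:
    """Identity of one metric series in this simplified model."""
    namespace: str
    metric_name: str
    # Dimension name/value pairs, sorted so keyword order does not matter here.
    dimensions: Tuple[Tuple[str, str], ...]


def metric_id(namespace: str, metric_name: str, **dimensions: str) -> MetricId:
    """Build a MetricId from keyword-style dimension name/value pairs."""
    return MetricId(namespace, metric_name, tuple(sorted(dimensions.items())))


if __name__ == "__main__":
    additional = metric_id("AWS/MWAA", "CPUUtilization", Cluster="AdditionalWorker")
    # "Scheduler" is a hypothetical dimension value used only for illustration.
    scheduler = metric_id("AWS/MWAA", "CPUUtilization", Cluster="Scheduler")
    # Same namespace and metric name, different dimension value: two series.
    print(additional == scheduler)  # False
```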
Dimensions
This section describes the CloudWatch dimensions grouping for Amazon MWAA metrics in CloudWatch.
Dimension | Description |
---|---|
Cluster | Metrics for the minimum of three Amazon ECS containers that an Amazon MWAA environment uses to run Apache Airflow components: scheduler, worker, and web server. |
Queue | Metrics for the Amazon SQS queues that decouple the scheduler from workers. When a worker reads a message, the message is considered in flight and is not available to other workers. If the message is not deleted before the 12-hour visibility timeout expires, it becomes available for other workers to read. |
Database | Metrics for the Aurora clusters used by Amazon MWAA. These include metrics for the primary database instance and for a read replica that supports read operations. Amazon MWAA publishes database metrics for both READER and WRITER instances. |
Accessing metrics in the CloudWatch console
This section describes how to access your Amazon MWAA metrics in CloudWatch.
To view performance metrics for a dimension
- Open the Metrics page on the CloudWatch console.
- Use the AWS Region selector to select your Region.
- Choose the AWS/MWAA namespace.
- In the All metrics tab, choose a dimension. For example, Cluster.
- Choose a CloudWatch metric for a dimension. For example, NumSchedulers or CPUUtilization. Then, choose Graph all search results.
- Choose the Graphed metrics tab to view performance metrics.
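If you prefer to explore the namespace programmatically, the steps above roughly correspond to calling the CloudWatch ListMetrics API for AWS/MWAA and grouping the results by a dimension. The sketch below does the grouping on ListMetrics response pages; the boto3 call is shown only in comments, and the sample page, including the assumption that the dimension is literally named "Cluster", is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def metric_names_by_dimension(pages: Iterable[dict], dimension_name: str) -> Dict[str, Set[str]]:
    """Group metric names by the value of one dimension.

    Each page is a CloudWatch ListMetrics response: a dict with a "Metrics"
    list whose entries carry "MetricName" and "Dimensions" ({"Name", "Value"}).
    """
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for page in pages:
        for metric in page.get("Metrics", []):
            for dim in metric.get("Dimensions", []):
                if dim["Name"] == dimension_name:
                    grouped[dim["Value"]].add(metric["MetricName"])
    return dict(grouped)


if __name__ == "__main__":
    # With boto3 and credentials, the pages would come from:
    #   import boto3
    #   paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
    #   pages = paginator.paginate(Namespace="AWS/MWAA")
    # The sample page below is hypothetical.
    pages = [{
        "Metrics": [
            {"Namespace": "AWS/MWAA", "MetricName": "CPUUtilization",
             "Dimensions": [{"Name": "Cluster", "Value": "AdditionalWorker"}]},
            {"Namespace": "AWS/MWAA", "MetricName": "MemoryUtilization",
             "Dimensions": [{"Name": "Cluster", "Value": "AdditionalWorker"}]},
        ]
    }]
    print(metric_names_by_dimension(pages, "Cluster"))
```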
List of metrics
The following tables list the cluster, queue, and database service metrics for Amazon MWAA. To view descriptions for metrics directly emitted from Amazon ECS, Amazon SQS, or Amazon RDS, choose the respective documentation link.
Cluster metrics
The following metrics apply to each scheduler, base worker, additional worker, and web server. For more information and descriptions of each cluster metric, see Available metrics and dimensions in the Amazon ECS Developer Guide.
Namespace | Metric | Unit |
---|---|---|
AWS/MWAA | CPUUtilization | Percent |
AWS/MWAA | MemoryUtilization | Percent |
Evaluating the number of additional worker and web server containers
You can use the component metrics provided under the Cluster dimension to assess how many additional workers or web servers an environment is using at a given point in time. To do this, graph either the CPUUtilization or the MemoryUtilization metric and set the statistic type to Sample Count. For the AdditionalWorker component, the resulting value is the total number of RUNNING tasks. Understanding how many additional worker instances your environment uses can help you gauge how your environment scales and optimize the number of additional workers.
For more information, see Service RUNNING task count in the Amazon Elastic Container Service Developer Guide.
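The Sample Count technique described above can also be expressed as a CloudWatch GetMetricData request. The sketch below builds the query for CPUUtilization on the AdditionalWorker component with the SampleCount statistic and reads the peak value from a response. The dimension names "Environment" and "Cluster", the environment name, and the 60-second period are assumptions for illustration; confirm the actual dimensions in the console, because the request returns no data if they do not match.

```python
from typing import List


def additional_worker_count_query(environment: str, period_seconds: int = 60) -> List[dict]:
    """Return MetricDataQueries for CloudWatch GetMetricData (a sketch).

    Requests CPUUtilization for the AdditionalWorker component with the
    SampleCount statistic, which the text above equates with RUNNING tasks.
    """
    return [{
        "Id": "additional_workers",  # query Ids must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/MWAA",
                "MetricName": "CPUUtilization",
                # Assumed dimension names; confirm them in the CloudWatch console.
                "Dimensions": [
                    {"Name": "Environment", "Value": environment},
                    {"Name": "Cluster", "Value": "AdditionalWorker"},
                ],
            },
            "Period": period_seconds,
            # SampleCount counts data points rather than averaging CPU percent.
            "Stat": "SampleCount",
        },
        "ReturnData": True,
    }]


def peak_task_count(response: dict, query_id: str = "additional_workers") -> int:
    """Highest sample count in a GetMetricData response for one query Id."""
    for result in response.get("MetricDataResults", []):
        if result["Id"] == query_id:
            return int(max(result.get("Values", []), default=0))
    return 0


# Usage with boto3 (requires AWS credentials):
#   import boto3
#   from datetime import datetime, timedelta, timezone
#   end = datetime.now(timezone.utc)
#   response = boto3.client("cloudwatch").get_metric_data(
#       MetricDataQueries=additional_worker_count_query("my-environment"),
#       StartTime=end - timedelta(hours=3),
#       EndTime=end,
#   )
#   print(peak_task_count(response))
```

Taking the maximum over the window gives the largest number of additional workers observed, which is the figure most relevant when tuning the maximum worker count.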
Database metrics
The following metrics apply to each database instance associated with the Amazon MWAA environment.
Namespace | Metric | Unit |
---|---|---|
|
|
Percent |
|
|
Count |
|
|
Count |
|
|
Bytes |
|
|
Count per five minutes |
|
|
Count per second |
|
|
Seconds |
|
|
Bytes per second |
Queue metrics
For more information on units and descriptions for the following queue metrics, see Available CloudWatch metrics for Amazon SQS in the Amazon Simple Queue Service Developer Guide.
Namespace | Metric | Unit |
---|---|---|
|
|
Seconds |
|
|
Count |
|
|
Count |
Application Load Balancer metrics
Application Load Balancer metrics apply to the web servers running in your environment. Amazon MWAA uses these metrics to scale your web servers based on the amount of traffic. For more information on units and descriptions for the following load balancer metrics, see CloudWatch metrics for your Application Load Balancer in the Application Load Balancers User Guide.
Namespace | Metric | Unit |
---|---|---|
|
|
Count |