

# Container Insights
<a name="ContainerInsights"></a>

Use CloudWatch Container Insights to collect, aggregate, and summarize metrics and logs from your containerized applications and microservices. Container Insights is available for Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Red Hat OpenShift Service on AWS (ROSA), and Kubernetes platforms on Amazon EC2. Container Insights supports collecting metrics from clusters deployed on AWS Fargate for both Amazon ECS and Amazon EKS.

CloudWatch automatically collects metrics for many resources, such as CPU, memory, disk, and network. Container Insights also provides diagnostic information, such as container restart failures, to help you isolate issues and resolve them quickly. You can also set CloudWatch alarms on metrics that Container Insights collects.

Container Insights collects data as *performance log events* using [embedded metric format](CloudWatch_Embedded_Metric_Format.md). These performance log events are entries that use a structured JSON schema that enables high-cardinality data to be ingested and stored at scale. From this data, CloudWatch creates aggregated CloudWatch metrics at the cluster, node, pod, task, and service levels. The metrics that Container Insights collects are available in CloudWatch automatic dashboards, and are also viewable in the **Metrics** section of the CloudWatch console. Metrics are not visible until the container tasks have been running for some time.

When you deploy Container Insights, it automatically creates a log group for the performance log events. You don't need to create this log group yourself.

To help you manage your Container Insights costs, CloudWatch does not automatically create all possible metrics from the log data. However, you can view additional metrics and additional levels of granularity by using CloudWatch Logs Insights to analyze the raw performance log events.
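As a sketch of such an analysis, the following CloudWatch Logs Insights query aggregates average node CPU utilization by node name; run it against your cluster's performance log group (for example, `/aws/containerinsights/ClusterName/performance` for Amazon EKS, where *ClusterName* is a placeholder for your cluster name):

```
STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
| SORT avg_node_cpu_utilization DESC
```

The field names shown here come from the performance log event schema; adjust them to match the fields present in your own log events.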

With the original version of Container Insights, metrics collected and logs ingested are charged as custom metrics. With Container Insights with enhanced observability for Amazon EKS, Container Insights metrics and logs are charged per observation instead of being charged per metric stored or log ingested. For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

**[Preview]** For Amazon EKS, Container Insights with OpenTelemetry provides an additional metric mode that collects metrics using the OpenTelemetry Protocol (OTLP) and supports PromQL queries. Each metric is enriched with up to 150 labels, including OpenTelemetry semantic convention attributes and Kubernetes pod and node labels. For more information, see [Container Insights with OpenTelemetry metrics for Amazon EKS](container-insights-otel-metrics.md).

In Amazon EKS, Red Hat OpenShift on AWS, and Kubernetes, Container Insights uses a containerized version of the CloudWatch agent to discover all of the running containers in a cluster. It then collects performance data at every layer of the performance stack.

Container Insights supports encryption with AWS KMS keys for the logs and metrics that it collects. To enable this encryption, you must manually enable AWS KMS encryption for the log group that receives Container Insights data. Container Insights then encrypts this data using the provided KMS key. Only symmetric keys are supported. Do not use asymmetric KMS keys to encrypt your log groups.

For more information, see [Encrypt Log Data in CloudWatch Logs Using AWS KMS](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html).

## Supported platforms
<a name="container-insights-platforms"></a>

Container Insights is available for Amazon Elastic Container Service, Amazon Elastic Kubernetes Service, Red Hat OpenShift Service on AWS, and Kubernetes platforms on Amazon EC2 instances.
+ For Amazon ECS, Container Insights collects metrics at the cluster, task, and service levels on both Linux and Windows Server instances. Container Insights collects metrics at the instance level only on Linux instances. Network metrics are available for containers that use `bridge` network mode and `awsvpc` network mode, but are not available for containers that use `host` network mode.
+ For Amazon Elastic Kubernetes Service, and Kubernetes platforms on Amazon EC2 instances, Container Insights is supported on both Linux and Windows instances.
+ **[Preview]** Container Insights with OpenTelemetry metrics is available for Amazon EKS. For more information, see [Container Insights with OpenTelemetry metrics for Amazon EKS](container-insights-otel-metrics.md).

# Container Insights with enhanced observability for Amazon ECS
<a name="container-insights-detailed-ecs-metrics"></a>

On December 2, 2024, AWS released Container Insights with enhanced observability for Amazon ECS. This version supports enhanced observability for Amazon ECS clusters using the Amazon EC2 and Fargate launch types. After you configure Container Insights with enhanced observability on Amazon ECS, Container Insights auto-collects detailed infrastructure telemetry from the cluster level down to the container level in your environment and displays this critical performance data in curated dashboards, removing the heavy lifting of observability setup. For information about how to set up Container Insights with enhanced observability, see [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS-cluster.md).

Container Insights with enhanced observability provides all of the Container Insights metrics, plus additional task and container metrics. For more information, see [Amazon ECS Container Insights with enhanced observability metrics](Container-Insights-enhanced-observability-metrics-ECS.md).

Container Insights with enhanced observability also supports CloudWatch cross-account observability. You can use a single monitoring account to monitor and troubleshoot applications that span multiple AWS accounts within a single Region. For more information, see [CloudWatch cross-account observability](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Unified-Cross-Account.html).

# Container Insights with enhanced observability for Amazon EKS
<a name="container-insights-detailed-metrics"></a>

On November 6, 2023, a new version of Container Insights was released. This version supports enhanced observability for Amazon EKS clusters running on Amazon EC2 and can collect more detailed metrics from these clusters. After installation, it automatically collects detailed infrastructure telemetry and container logs for your Amazon EKS clusters. You can then use curated, immediately usable dashboards to drill down into application and infrastructure telemetry. 

Container Insights with enhanced observability for Amazon EKS collects granular health, performance, and status metrics up to the container level, and also control plane metrics. For more information about the additional metrics and dimensions collected, see [Amazon EKS and Kubernetes Container Insights with enhanced observability metrics](Container-Insights-metrics-enhanced-EKS.md).

If you installed Container Insights by using the CloudWatch agent on an Amazon EKS cluster on Amazon EC2 after November 6, 2023, you have Container Insights with enhanced observability for Amazon EKS. Otherwise, you can upgrade an Amazon EKS cluster to this new version by following the instructions in [Upgrading to Container Insights with enhanced observability for Amazon EKS in CloudWatch](Container-Insights-upgrade-enhanced.md).
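If you use the Amazon CloudWatch Observability EKS add-on, one way to check what a cluster is running is to describe the add-on with the AWS CLI. This is a hedged example; *my-cluster* and the Region are placeholders for your own values:

```
aws eks describe-addon --cluster-name my-cluster \
    --addon-name amazon-cloudwatch-observability \
    --query "addon.addonVersion" --region us-east-1
```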

Container Insights supports CloudWatch cross-account observability. You can use a single monitoring account to monitor and troubleshoot applications that span multiple AWS accounts within a single Region. For more information, see [CloudWatch cross-account observability](CloudWatch-Unified-Cross-Account.md).

Container Insights with enhanced observability for Amazon EKS also supports Windows worker nodes.

Container Insights with enhanced observability for Amazon EKS is not supported on Fargate.

**Note**  
You can find whether you have clusters that can be upgraded to Container Insights with enhanced observability for Amazon EKS by navigating to the Container Insights console. To do so, choose **Insights**, **Container Insights** in the navigation pane of the CloudWatch console. In the Container Insights console, a banner informs you if you have any Amazon EKS clusters that can be upgraded, and links to the upgrade page.

# Container Insights with OpenTelemetry metrics for Amazon EKS
<a name="container-insights-otel-metrics"></a>

**Preview**  
Container Insights with OpenTelemetry metrics provides visibility into the operational health of your Amazon EKS cluster infrastructure. It is available in public preview at no additional charge in US East (N. Virginia), US West (Oregon), Europe (Ireland), Asia Pacific (Singapore), and Asia Pacific (Sydney).

The Amazon CloudWatch Observability EKS add-on collects open source metrics from your Amazon EKS clusters and sends them to CloudWatch using the OpenTelemetry Protocol (OTLP) at 30-second granularity. These metrics use metric names from their original sources, including cAdvisor, Prometheus Node Exporter, NVIDIA DCGM, Kube State Metrics, and AWS Neuron Monitor. You can query these metrics using PromQL in CloudWatch Query Studio or through the Prometheus-compatible query API.

Each metric is automatically enriched with up to 150 labels, including OpenTelemetry semantic convention attributes and Kubernetes pod and node labels. PromQL handles aggregation at query time, so each metric is published once per resource rather than at multiple aggregation levels. The add-on also correlates accelerator metrics from AWS Neuron and AWS Elastic Fabric Adapter with the specific pods and containers using them, providing visibility that is not available from the metric sources alone.

To enable OTel Container Insights on an Amazon EKS cluster, install the Amazon CloudWatch Observability EKS add-on version `v6.0.1-eksbuild.1` or later through the Amazon EKS console or through infrastructure as code.
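For example, you might install the add-on with the AWS CLI as in the following sketch. The cluster name is a placeholder, and this assumes the add-on's IAM prerequisites are already in place for your cluster:

```
aws eks create-addon --cluster-name my-cluster \
    --addon-name amazon-cloudwatch-observability \
    --addon-version v6.0.1-eksbuild.1
```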

For more information about setting up OTel Container Insights, see [Setting up Container Insights](deploy-container-insights.md).

For more information about querying these metrics with PromQL, see [PromQL querying](CloudWatch-PromQL-Querying.md).

## How OTel Container Insights compares to Container Insights (enhanced)
<a name="container-insights-otel-comparison"></a>

The following table summarizes the differences between Container Insights (enhanced) and OTel Container Insights.


| Feature | Container Insights (enhanced) | OTel Container Insights | 
| --- | --- | --- | 
| Metric names | CloudWatch-format metrics (for example, `pod_cpu_utilization`) | Open-source native (for example, `container_cpu_usage_seconds_total`) | 
| Labels per metric | 3–6 predefined dimensions per metric | Up to 150 labels, including all Kubernetes pod and node labels | 
| Aggregation | Pre-aggregated at multiple levels (cluster, namespace, workload, pod) | Raw per-resource metrics; aggregate at query time with PromQL | 
| Query language | CloudWatch Metrics API | PromQL (Prometheus-compatible) | 
| Metric ingestion | CloudWatch Logs in EMF format | OTLP endpoint | 

## How metrics are labeled
<a name="container-insights-otel-labels"></a>

Each metric collected by OTel Container Insights carries labels from three sources.

Telemetry source native labels  
Labels from the original metric source (for example, cAdvisor provides labels such as `pod`, `namespace`, and `container`). These are preserved as datapoint attributes.

OpenTelemetry resource attributes  
The add-on appends resource attributes following OpenTelemetry semantic conventions for [Kubernetes](https://opentelemetry.io/docs/specs/semconv/resource/k8s/), [Host](https://opentelemetry.io/docs/specs/semconv/resource/host/), and [Cloud](https://opentelemetry.io/docs/specs/semconv/resource/cloud/), such as `k8s.pod.name`, `k8s.namespace.name`, `k8s.node.name`, `host.name`, and `cloud.region`. These attributes are consistent across all metric sources.

Kubernetes pod and node labels  
All pod labels and node labels discovered from the Kubernetes API are appended as resource attributes with the prefixes `k8s.pod.label` and `k8s.node.label`.

For more information about how to query these attributes using PromQL, see [PromQL querying](CloudWatch-PromQL-Querying.md).
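As a sketch, assuming the OpenTelemetry attribute `k8s.namespace.name` is surfaced to PromQL with dots converted to underscores (standard Prometheus label sanitization), the following query aggregates container CPU usage by namespace at query time:

```
sum by (k8s_namespace_name) (rate(container_cpu_usage_seconds_total[5m]))
```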

## Supported metrics
<a name="container-insights-otel-supported-metrics"></a>

The following table lists the metric sources and categories collected by OTel Container Insights.


| Metric source | Metric category | Prerequisites | 
| --- | --- | --- | 
| cAdvisor | CPU metrics | - | 
| cAdvisor | Memory metrics | - | 
| cAdvisor | Network metrics | - | 
| cAdvisor | Disk and filesystem metrics | - | 
| Prometheus Node Exporter | CPU metrics | - | 
| Prometheus Node Exporter | Memory metrics | - | 
| Prometheus Node Exporter | Disk metrics | - | 
| Prometheus Node Exporter | Filesystem metrics | - | 
| Prometheus Node Exporter | Network metrics | - | 
| Prometheus Node Exporter | System metrics | - | 
| Prometheus Node Exporter | VMStat metrics | - | 
| Prometheus Node Exporter | Netstat and socket metrics | - | 
| NVIDIA DCGM | GPU utilization and performance metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| NVIDIA DCGM | GPU memory metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| NVIDIA DCGM | GPU power and thermal metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| NVIDIA DCGM | GPU throttling metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| NVIDIA DCGM | GPU error and reliability metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| NVIDIA DCGM | GPU NVLink metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| NVIDIA DCGM | GPU informational metrics | [NVIDIA device plugin](https://github.com/NVIDIA/k8s-device-plugin) and [NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed. | 
| AWS Neuron Monitor | NeuronCore metrics | [Neuron driver](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html) and [Neuron device plugin](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html) must be installed. | 
| AWS Neuron Monitor | NeuronDevice metrics | [Neuron driver](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html) and [Neuron device plugin](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html) must be installed. | 
| AWS Neuron Monitor | Neuron system metrics | [Neuron driver](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html) and [Neuron device plugin](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html) must be installed. | 
| AWS Elastic Fabric Adapter | EFA metrics | [EFA device plugin](https://github.com/aws/eks-charts/tree/master/stable/aws-efa-k8s-device-plugin) must be installed. | 
| NVMe | NVMe SMART metrics | - | 
| Kube State Metrics | Pod, node, Deployment, DaemonSet, StatefulSet, ReplicaSet, Job, CronJob, Service, Namespace, PersistentVolume, PersistentVolumeClaim metrics | - | 
| Kubernetes API server | API server and etcd metrics | - | 

## CloudWatch agent container image
<a name="container-insights-download-limit"></a>

Amazon provides a CloudWatch agent container image on Amazon Elastic Container Registry. For more information, see [cloudwatch-agent](https://gallery.ecr.aws/cloudwatch-agent/cloudwatch-agent) on Amazon ECR.

# Setting up Container Insights
<a name="deploy-container-insights"></a>

The Container Insights setup process is different for Amazon ECS than for Amazon EKS and Kubernetes.
+ [Setting up Container Insights on Amazon EKS and Kubernetes](deploy-container-insights-EKS.md)
+ [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS.md)

**Topics**
+ [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS.md)
+ [Setting up Container Insights on Amazon EKS and Kubernetes](deploy-container-insights-EKS.md)
+ [Setting up Container Insights on Red Hat OpenShift on AWS (ROSA)](deploy-container-insights-RedHatOpenShift.md)

# Setting up Container Insights on Amazon ECS
<a name="deploy-container-insights-ECS"></a>

You can use one or both of the following options to enable Container Insights on Amazon ECS clusters:
+ Use the AWS Management Console or the AWS CLI to start collecting cluster-level, task-level, and service-level metrics.
+ Deploy the CloudWatch agent as a daemon service to start collecting instance-level metrics on clusters that are hosted on Amazon EC2 instances.

**Topics**
+ [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS-cluster.md)
+ [Setting up Container Insights on Amazon ECS using AWS Distro for OpenTelemetry](deploy-container-insights-ECS-adot.md)
+ [Deploying the CloudWatch agent to collect EC2 instance-level metrics on Amazon ECS](deploy-container-insights-ECS-instancelevel.md)
+ [Deploying the AWS Distro for OpenTelemetry to collect EC2 instance-level metrics on Amazon ECS clusters](deploy-container-insights-ECS-OTEL.md)
+ [Set up FireLens to send logs to CloudWatch Logs](deploy-container-insights-ECS-logs.md)

# Setting up Container Insights on Amazon ECS
<a name="deploy-container-insights-ECS-cluster"></a>

You can set up Container Insights with enhanced observability or Container Insights on new and existing Amazon ECS clusters using either the Amazon ECS console or the AWS CLI. Container Insights collects metrics at the cluster, task, and service levels. Container Insights with enhanced observability provides additional dimensions and metrics, giving you visibility down to the container level.

If you're using Amazon ECS on an Amazon EC2 instance, launch that instance using an AMI that includes Amazon ECS agent version 1.29 or later. For information about updating your agent version, see [Updating the Amazon ECS Container Agent](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-update.html).

**Note**  
If the customer managed AWS KMS key that you use for your Amazon ECS Container Insights metrics is not already configured to work with CloudWatch, you must update the key policy to allow for encrypted logs in CloudWatch Logs. You must also associate your own AWS KMS key with the log group in `/aws/ecs/containerinsights/ClusterName/performance`. For more information, see [Encrypt log data in CloudWatch Logs using AWS Key Management Service](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/encrypt-log-data-kms.html).
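After updating the key policy, you might associate your key with the log group using the AWS CLI, as in the following sketch. The key ARN is a placeholder, and *ClusterName* is your cluster's name:

```
aws logs associate-kms-key \
    --log-group-name /aws/ecs/containerinsights/ClusterName/performance \
    --kms-key-id arn:aws:kms:us-east-1:123456789012:key/key-id
```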

We recommend that you use Container Insights with enhanced observability instead of Container Insights, because it provides detailed visibility into your container environment and reduces the mean time to resolution.

## Set up Container Insights with enhanced observability
<a name="set-container-insights-ECS-cluster-enhanced"></a>

You can turn on Container Insights with enhanced observability using the Amazon ECS console or AWS CLI. 

------
#### [ AWS CLI ]

Use the following command to turn on Container Insights with enhanced observability.

Set the `containerInsights` account setting to `enhanced`.

```
aws ecs put-account-setting --name containerInsights --value enhanced
```

Example output

```
{
    "setting": {
        "name": "containerInsights",
        "value": "enhanced",
        "principalArn": "arn:aws:iam::123456789012:user/johndoe",
        "type": "user"
    }
}
```

**Note**  
By default, the `put-account-setting` command applies only to the currently authenticated user. To enable the setting account-wide for all users and roles, specify the account's root user as the principal, as in the following example.  

```
aws ecs put-account-setting --name containerInsights --value enhanced --principal-arn arn:aws:iam::accountID:root
```

After you set this account setting, all new clusters automatically use Container Insights with enhanced observability. Use the `update-cluster-settings` command to add Container Insights with enhanced observability to an existing cluster, or to upgrade clusters that currently use Container Insights to Container Insights with enhanced observability.

```
aws ecs update-cluster-settings --cluster cluster-name --settings name=containerInsights,value=enhanced
```
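To verify the setting on a cluster, you can describe it with its settings included (*cluster-name* is a placeholder). The `settings` array in the output should show `containerInsights` with the value `enhanced`:

```
aws ecs describe-clusters --clusters cluster-name --include SETTINGS
```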

------
#### [ Amazon ECS console ]

1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

1. In the navigation bar at the top, select the Region for which to view your account settings. 

1. In the navigation page, choose **Account Settings**.

1. Choose **Update**.

1. To use Container Insights with enhanced observability, choose **Container Insights with enhanced observability**.

1. Choose **Save changes**.

1. On the confirmation screen, choose **Confirm** to save the selection.

After you set this, all new clusters automatically use Container Insights with enhanced observability. You can add Container Insights with enhanced observability to existing clusters, or update clusters that currently use Container Insights to Container Insights with enhanced observability. For more information, see [Updating an Amazon ECS cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-cluster-v2.html) in the *Amazon Elastic Container Service Developer Guide*.

------

## Set up Container Insights
<a name="set-container-insights-ECS-cluster"></a>

You can turn on Container Insights using the Amazon ECS console or AWS CLI. 

------
#### [ AWS CLI ]

To use Container Insights, set the `containerInsights` account setting to `enabled`. Use the following command to turn on Container Insights.

```
aws ecs put-account-setting --name containerInsights --value enabled
```

Example output

```
{
    "setting": {
        "name": "containerInsights",
        "value": "enabled",
        "principalArn": "arn:aws:iam::123456789012:user/johndoe",
        "type": "user"
    }
}
```

When you set the `containerInsights` account setting to `enabled`, all new clusters have Container Insights enabled by default. Use the `update-cluster-settings` command to add Container Insights to an existing cluster.

```
aws ecs update-cluster-settings --cluster cluster-name --settings name=containerInsights,value=enabled
```

------
#### [ Amazon ECS console ]

1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

1. In the navigation bar at the top, select the Region for which to view your account settings. 

1. In the navigation page, choose **Account Settings**.

1. Choose **Update**.

1. To use Container Insights, choose **Container Insights**.

1. Choose **Save changes**.

1. On the confirmation screen, choose **Confirm** to save the selection.

After you set this, all new clusters automatically use Container Insights. Update existing clusters to add Container Insights. For more information, see [Updating an Amazon ECS cluster](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/update-cluster-v2.html) in the *Amazon Elastic Container Service Developer Guide*.

------

# Setting up Container Insights on Amazon ECS using AWS Distro for OpenTelemetry
<a name="deploy-container-insights-ECS-adot"></a>

Use this section if you want to use AWS Distro for OpenTelemetry to set up CloudWatch Container Insights on an Amazon ECS cluster. For more information about AWS Distro for OpenTelemetry, see [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/).

These steps assume that you already have a cluster running Amazon ECS. For more information about using AWS Distro for OpenTelemetry with Amazon ECS and setting up an Amazon ECS cluster for this purpose, see [Setting up AWS Distro for OpenTelemetry Collector in Amazon Elastic Container Service](https://aws-otel.github.io/docs/setup/ecs).

## Step 1: Create a task role
<a name="deploy-container-insights-ECS-adot-CreateTaskRole"></a>

The first step is creating a task role in the cluster that the AWS OpenTelemetry Collector will use.

**To create a task role for AWS Distro for OpenTelemetry**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Policies** and then choose **Create policy**.

1. Choose the **JSON** tab and copy in the following policy:

------
#### [ JSON ]

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "logs:PutLogEvents",
                   "logs:CreateLogGroup",
                   "logs:CreateLogStream",
                   "logs:DescribeLogStreams",
                   "logs:DescribeLogGroups",
                   "ssm:GetParameters"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

1. Choose **Review policy**.

1. For name, enter **AWSDistroOpenTelemetryPolicy**, and then choose **Create policy**.

1. In the left navigation pane, choose **Roles** and then choose **Create role**.

1. In the list of services, choose **Elastic Container Service**.

1. Lower on the page, choose **Elastic Container Service Task** and then choose **Next: Permissions**.

1. In the list of policies, search for **AWSDistroOpenTelemetryPolicy**.

1. Select the check box next to **AWSDistroOpenTelemetryPolicy**.

1. Choose **Next: Tags** and then choose **Next: Review**.

1. For **Role name** enter **AWSOpenTelemetryTaskRole** and then choose **Create role**.

## Step 2: Create a task execution role
<a name="deploy-container-insights-ECS-adot-CreateTaskExecutionRole"></a>

The next step is creating a task execution role for the AWS OpenTelemetry Collector.

**To create a task execution role for AWS Distro for OpenTelemetry**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane, choose **Roles** and then choose **Create role**.

1. In the list of services, choose **Elastic Container Service**.

1. Lower on the page, choose **Elastic Container Service Task** and then choose **Next: Permissions**.

1. In the list of policies, search for **AmazonECSTaskExecutionRolePolicy** and then select the check box next to **AmazonECSTaskExecutionRolePolicy**.

1. In the list of policies, search for **CloudWatchLogsFullAccess** and then select the check box next to **CloudWatchLogsFullAccess**.

1. In the list of policies, search for **AmazonSSMReadOnlyAccess** and then select the check box next to **AmazonSSMReadOnlyAccess**.

1. Choose **Next: Tags** and then choose **Next: Review**.

1. For **Role name** enter **AWSOpenTelemetryTaskExecutionRole** and then choose **Create role**.

## Step 3: Create a task definition
<a name="deploy-container-insights-ECS-adot-CreateTaskDefinition"></a>

The next step is creating a task definition.

**To create a task definition for AWS Distro for OpenTelemetry**

1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

1. In the navigation pane, choose **Task definitions**.

1. Choose **Create new task definition**, **Create new task definition**.

1. For **Task definition family**, specify a unique name for the task definition.

1. Configure your containers, and then choose **Next**.

1. Under **Metrics and logging**, select **Use metric collection**.

1. Choose **Next**.

1. Choose **Create**.

For more information about using the AWS OpenTelemetry collector with Amazon ECS, see [Setting up AWS Distro for OpenTelemetry Collector in Amazon Elastic Container Service](https://aws-otel.github.io/docs/setup/ecs).
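If you prefer to define the collector container directly in task definition JSON, its entry might look like the following fragment. This is a sketch: the image tag and the embedded Container Insights configuration path are assumptions based on the AWS Distro for OpenTelemetry documentation, so verify them against the link above before use.

```
{
    "name": "aws-otel-collector",
    "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
    "command": ["--config=/etc/ecs/container-insights/otel-task-metrics-config.yaml"]
}
```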

## Step 4: Run the task
<a name="deploy-container-insights-ECS-adot-CreateTaskDefinition"></a>

The final step is running the task that you've created.

**To run the task for AWS Distro for OpenTelemetry**

1. Open the console at [https://console.aws.amazon.com/ecs/v2](https://console.aws.amazon.com/ecs/v2).

1. In the left navigation pane, choose **Task Definitions** and then select the task that you just created.

1. Choose **Deploy**, **Run task**.

1. In the **Compute options** section, from **Existing cluster**, choose the cluster.

1. Choose **Create**.

Next, you can check for the new metrics in the CloudWatch console.

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the left navigation pane, choose **Metrics**.

   You should see an **ECS/ContainerInsights** namespace. Choose that namespace and you should see eight metrics.

# Deploying the CloudWatch agent to collect EC2 instance-level metrics on Amazon ECS
<a name="deploy-container-insights-ECS-instancelevel"></a>

To deploy the CloudWatch agent to collect instance-level metrics from Amazon ECS clusters that are hosted on EC2 instances, use the quick start setup with a default configuration, or install the agent manually so that you can customize it.

Both methods require that you already have at least one Amazon ECS cluster deployed with the EC2 launch type and that the CloudWatch agent container has access to the Amazon EC2 Instance Metadata Service (IMDS). For more information about IMDS, see [Instance metadata and user data](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html).

These methods also assume that you have the AWS CLI installed. Additionally, to run the commands in the following procedures, you must be logged on to an account or role that has the **IAMFullAccess** and **AmazonECS_FullAccess** policies.

**Important**  
When defining the CloudWatch Agent container in your task definition, set `essential: false`. This prevents the entire Amazon ECS service from stopping if the CloudWatch Agent container fails. Other critical application containers will continue running even if the agent is temporarily unavailable.
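In a task definition, the agent's container entry might then look like the following fragment. This is a sketch (other required fields are omitted); the image is the one published on the public ECR gallery referenced earlier in this document:

```
{
    "name": "cloudwatch-agent",
    "image": "public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest",
    "essential": false
}
```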

**Topics**
+ [Quick setup using CloudFormation](#deploy-container-insights-ECS-instancelevel-quickstart)
+ [Manual and custom setup](#deploy-container-insights-ECS-instancelevel-manual)

## Quick setup using CloudFormation
<a name="deploy-container-insights-ECS-instancelevel-quickstart"></a>

To use the quick setup, enter the following command to use CloudFormation to install the agent. Replace *cluster-name* and *cluster-region* with the name and Region of your Amazon ECS cluster.

This command creates the IAM roles **CWAgentECSTaskRole** and **CWAgentECSExecutionRole**. If these roles already exist in your account, use `ParameterKey=CreateIAMRoles,ParameterValue=False` instead of `ParameterKey=CreateIAMRoles,ParameterValue=True` when you enter the command. Otherwise, the command will fail.

```
ClusterName=cluster-name
Region=cluster-region
curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/daemon-service/cwagent-ecs-instance-metric/cloudformation-quickstart/cwagent-ecs-instance-metric-cfn.json
aws cloudformation create-stack --stack-name CWAgentECS-${ClusterName}-${Region} \
    --template-body file://cwagent-ecs-instance-metric-cfn.json \
    --parameters ParameterKey=ClusterName,ParameterValue=${ClusterName} \
                 ParameterKey=CreateIAMRoles,ParameterValue=True \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${Region}
```

**(Alternative) Using your own IAM roles**

If you want to use your own custom ECS task role and ECS task execution role instead of the **CWAgentECSTaskRole** and **CWAgentECSExecutionRole** roles, first make sure that the role to be used as the ECS task role has **CloudWatchAgentServerPolicy** attached. Also, make sure that the role to be used as the ECS task execution role has both the **CloudWatchAgentServerPolicy** and **AmazonECSTaskExecutionRolePolicy** policies attached. Then enter the following command. In the command, replace *task-role-arn* with the ARN of your custom ECS task role, and replace *execution-role-arn* with the ARN of your custom ECS task execution role.

```
ClusterName=cluster-name
Region=cluster-region
TaskRoleArn=task-role-arn
ExecutionRoleArn=execution-role-arn
curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/daemon-service/cwagent-ecs-instance-metric/cloudformation-quickstart/cwagent-ecs-instance-metric-cfn.json
aws cloudformation create-stack --stack-name CWAgentECS-${ClusterName}-${Region} \
    --template-body file://cwagent-ecs-instance-metric-cfn.json \
    --parameters ParameterKey=ClusterName,ParameterValue=${ClusterName} \
                 ParameterKey=TaskRoleArn,ParameterValue=${TaskRoleArn} \
                 ParameterKey=ExecutionRoleArn,ParameterValue=${ExecutionRoleArn} \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${Region}
```

**Troubleshooting the quick setup**

To check the status of the CloudFormation stack, enter the following command.

```
ClusterName=cluster-name
Region=cluster-region
aws cloudformation describe-stacks --stack-name CWAgentECS-$ClusterName-$Region --region $Region
```

If the value of `StackStatus` is anything other than `CREATE_COMPLETE` or `CREATE_IN_PROGRESS`, check the stack events to find the error. Enter the following command.

```
ClusterName=cluster-name
Region=cluster-region
aws cloudformation describe-stack-events --stack-name CWAgentECS-$ClusterName-$Region --region $Region
```

To check the status of the `cwagent` daemon service, enter the following command. In the output, you should see that the `runningCount` is equal to the `desiredCount` in the `deployment` section. If it isn't equal, check the `failures` section in the output.

```
ClusterName=cluster-name
Region=cluster-region
aws ecs describe-services --services cwagent-daemon-service --cluster $ClusterName --region $Region
```

You can also use the CloudWatch Logs console to check the agent log. Look for the **/ecs/ecs-cwagent-daemon-service** log group.
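If you have AWS CLI version 2 installed, you can also tail the agent log from the command line instead of using the console. Replace *cluster-region* with the Region of your cluster:

```
Region=cluster-region
aws logs tail /ecs/ecs-cwagent-daemon-service --follow --region ${Region}
```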

**Deleting the CloudFormation stack for the CloudWatch agent**

If you need to delete the CloudFormation stack, enter the following command.

```
ClusterName=cluster-name
Region=cluster-region
aws cloudformation delete-stack --stack-name CWAgentECS-${ClusterName}-${Region} --region ${Region}
```

## Manual and custom setup
<a name="deploy-container-insights-ECS-instancelevel-manual"></a>

Follow the steps in this section to manually deploy the CloudWatch agent to collect instance-level metrics from your Amazon ECS clusters that are hosted on EC2 instances.

### Necessary IAM roles and policies
<a name="deploy-container-insights-ECS-instancelevel-IAMRoles"></a>

Two IAM roles are required. You must create them if they don't already exist. For more information about these roles, see [IAM roles for Tasks](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html) and [Amazon ECS Task Execution Role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html).
+ An *ECS task role*, which is used by the CloudWatch agent to publish metrics. If this role already exists, you must make sure it has the `CloudWatchAgentServerPolicy` policy attached.
+ An *ECS task execution role*, which is used by Amazon ECS agent to launch the CloudWatch agent. If this role already exists, you must make sure it has the `AmazonECSTaskExecutionRolePolicy` and `CloudWatchAgentServerPolicy` policies attached.

If you do not already have these roles, you can use the following commands to create them and attach the necessary policies. This first command creates the ECS task role.

```
aws iam create-role --role-name CWAgentECSTaskRole \
    --assume-role-policy-document "{\"Version\": \"2012-10-17\",\"Statement\": [{\"Sid\": \"\",\"Effect\": \"Allow\",\"Principal\": {\"Service\": \"ecs-tasks.amazonaws.com\"},\"Action\": \"sts:AssumeRole\"}]}"
```

After you enter the previous command, note the value of `Arn` from the command output as "TaskRoleArn". You'll need to use it later when you create the task definition. Then enter the following command to attach the necessary policies.

```
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
    --role-name CWAgentECSTaskRole
```

This next command creates the ECS task execution role.

```
aws iam create-role --role-name CWAgentECSExecutionRole \
    --assume-role-policy-document "{\"Version\": \"2012-10-17\",\"Statement\": [{\"Sid\": \"\",\"Effect\": \"Allow\",\"Principal\": {\"Service\": \"ecs-tasks.amazonaws.com\"},\"Action\": \"sts:AssumeRole\"}]}"
```

After you enter the previous command, note the value of `Arn` from the command output as "ExecutionRoleArn". You'll need to use it later when you create the task definition. Then enter the following commands to attach the necessary policies.

```
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
    --role-name CWAgentECSExecutionRole
          
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy \
    --role-name CWAgentECSExecutionRole
```
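To verify that both roles have the expected policies attached, you can list the attached policies for each role:

```
aws iam list-attached-role-policies --role-name CWAgentECSTaskRole
aws iam list-attached-role-policies --role-name CWAgentECSExecutionRole
```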

### Create the task definition and launch the daemon service
<a name="deploy-container-insights-ECS-instancelevel-taskdefinition"></a>

Create a task definition and use it to launch the CloudWatch agent as a daemon service. To create the task definition, enter the following command. In the first lines, replace the placeholders with the actual values for your deployment. *logs-region* is the Region where CloudWatch Logs is located, and *cluster-region* is the Region where your cluster is located. *task-role-arn* is the ARN of the ECS task role that you are using, and *execution-role-arn* is the ARN of the ECS task execution role.

```
TaskRoleArn=task-role-arn
ExecutionRoleArn=execution-role-arn
AWSLogsRegion=logs-region
Region=cluster-region
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/daemon-service/cwagent-ecs-instance-metric/cwagent-ecs-instance-metric.json \
    | sed "s|{{task-role-arn}}|${TaskRoleArn}|;s|{{execution-role-arn}}|${ExecutionRoleArn}|;s|{{awslogs-region}}|${AWSLogsRegion}|" \
    | xargs -0 aws ecs register-task-definition --region ${Region} --cli-input-json
```
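If the registration fails, you can verify the substitution step on its own. The following sketch pipes a one-line sample through the same `sed` expression, using a hypothetical ARN, so that you can confirm the placeholder replacement without calling AWS:

```
TaskRoleArn="arn:aws:iam::123456789012:role/CWAgentECSTaskRole"
echo '"taskRoleArn": "{{task-role-arn}}"' \
    | sed "s|{{task-role-arn}}|${TaskRoleArn}|"
```

The output should show the ARN in place of the `{{task-role-arn}}` placeholder.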

Then run the following command to launch the daemon service. Replace *cluster-name* and *cluster-region* with the name and Region of your Amazon ECS cluster.

**Important**  
Remove all capacity provider strategies before you run this command. Otherwise, the command won't work.

```
ClusterName=cluster-name
Region=cluster-region
aws ecs create-service \
    --cluster ${ClusterName} \
    --service-name cwagent-daemon-service \
    --task-definition ecs-cwagent-daemon-service \
    --scheduling-strategy DAEMON \
    --region ${Region}
```

If you see this error message, `An error occurred (InvalidParameterException) when calling the CreateService operation: Creation of service was not idempotent`, you have already created a daemon service named `cwagent-daemon-service`. You must delete that service first, using the following command as an example.

```
ClusterName=cluster-name
Region=cluster-region
aws ecs delete-service \
    --cluster ${ClusterName} \
    --service cwagent-daemon-service \
    --region ${Region} \
    --force
```

### (Optional) Advanced configuration
<a name="deploy-container-insights-ECS-instancelevel-advanced"></a>

Optionally, you can use SSM to specify other configuration options for the CloudWatch agent in your Amazon ECS clusters that are hosted on EC2 instances. These options are as follows:
+ `metrics_collection_interval` – How often in seconds that the CloudWatch agent collects metrics. The default is 60. The range is 1–172,000.
+ `endpoint_override` – (Optional) Specifies a different endpoint to send logs to. You might want to do this if you're publishing from a cluster in a VPC and you want the logs data to go to a VPC endpoint.

  The value of `endpoint_override` must be a string that is a URL.
+ `force_flush_interval` – Specifies in seconds the maximum amount of time that logs remain in the memory buffer before being sent to the server. No matter the setting for this field, if the size of the logs in the buffer reaches 1 MB, the logs are immediately sent to the server. The default value is 5 seconds.
+ `region` – By default, the agent publishes metrics to the same Region where the Amazon ECS container instance is located. To override this, you can specify a different Region here. For example, `"region" : "us-east-1"`

The following is an example of a customized configuration:

```
{
    "agent": {
        "region": "us-east-1"
    },
    "logs": {
        "metrics_collected": {
            "ecs": {
                "metrics_collection_interval": 30
            }
        },
        "force_flush_interval": 5
    }
}
```

**To customize your CloudWatch agent configuration in your Amazon ECS containers**

1. Make sure that the **AmazonSSMReadOnlyAccess** policy is attached to your Amazon ECS Task Execution role. You can enter the following command to do so. This example assumes that your Amazon ECS Task Execution role is CWAgentECSExecutionRole. If you are using a different role, substitute that role name in the following command.

   ```
   aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonSSMReadOnlyAccess \
           --role-name CWAgentECSExecutionRole
   ```

1. Create the customized configuration file similar to the preceding example. Name this file `/tmp/ecs-cwagent-daemon-config.json`.

1. Run the following command to put this configuration into the Parameter Store. Replace *cluster-region* with the Region of your Amazon ECS cluster. To run this command, you must be logged on to a user or role that has the **AmazonSSMFullAccess** policy.

   ```
   Region=cluster-region
   aws ssm put-parameter \
       --name "ecs-cwagent-daemon-service" \
       --type "String" \
       --value "`cat /tmp/ecs-cwagent-daemon-config.json`" \
       --region $Region
   ```

1. Download the task definition file to a local file, such as `/tmp/cwagent-ecs-instance-metric.json`.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/daemon-service/cwagent-ecs-instance-metric/cwagent-ecs-instance-metric.json -o /tmp/cwagent-ecs-instance-metric.json
   ```

1. Modify the task definition file. Remove the following section:

   ```
   "environment": [
                   {
                       "name": "USE_DEFAULT_CONFIG",
                       "value": "True"
                   }
               ],
   ```

   Replace that section with the following:

   ```
   "secrets": [
                   {
                       "name": "CW_CONFIG_CONTENT",
                       "valueFrom": "ecs-cwagent-daemon-service"
                   }
               ],
   ```

1. Restart the agent as a daemon service by following these steps:

   1. Run the following command.

      ```
      TaskRoleArn=task-role-arn
      ExecutionRoleArn=execution-role-arn
      AWSLogsRegion=logs-region
      Region=cluster-region
      cat /tmp/cwagent-ecs-instance-metric.json \
          | sed "s|{{task-role-arn}}|${TaskRoleArn}|;s|{{execution-role-arn}}|${ExecutionRoleArn}|;s|{{awslogs-region}}|${AWSLogsRegion}|" \
          | xargs -0 aws ecs register-task-definition --region ${Region} --cli-input-json
      ```

   1. Run the following command to launch the daemon service. Replace *cluster-name* and *cluster-region* with the name and Region of your Amazon ECS cluster.

      ```
      ClusterName=cluster-name
      Region=cluster-region
      aws ecs create-service \
          --cluster ${ClusterName} \
          --service-name cwagent-daemon-service \
          --task-definition ecs-cwagent-daemon-service \
          --scheduling-strategy DAEMON \
          --region ${Region}
      ```

      If you see this error message, `An error occurred (InvalidParameterException) when calling the CreateService operation: Creation of service was not idempotent`, you have already created a daemon service named `cwagent-daemon-service`. You must delete that service first, using the following command as an example.

      ```
      ClusterName=cluster-name
      Region=cluster-region
      aws ecs delete-service \
          --cluster ${ClusterName} \
          --service cwagent-daemon-service \
          --region ${Region} \
          --force
      ```

# Deploying the AWS Distro for OpenTelemetry to collect EC2 instance-level metrics on Amazon ECS clusters
<a name="deploy-container-insights-ECS-OTEL"></a>

Use the steps in this section to use AWS Distro for OpenTelemetry to collect EC2 instance-level metrics on an Amazon ECS cluster. For more information about the AWS Distro for OpenTelemetry, see [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/).

These steps assume that you already have a cluster running Amazon ECS. This cluster must be deployed with the EC2 launch type. For more information about using AWS Distro for OpenTelemetry with Amazon ECS and setting up an Amazon ECS cluster for this purpose, see [Setting up AWS Distro for OpenTelemetry Collector in Amazon Elastic Container Service for ECS EC2 instance level metrics](https://aws-otel.github.io/docs/setup/ecs#3-setup-the-aws-otel-collector-for-ecs-ec2-instance-metrics).

**Topics**
+ [Quick setup using CloudFormation](#container-insights-ECS-OTEL-quicksetup)
+ [Manual and custom setup](#container-insights-ECS-OTEL-custom)

## Quick setup using CloudFormation
<a name="container-insights-ECS-OTEL-quicksetup"></a>

Download the CloudFormation template file for installing the AWS Distro for OpenTelemetry collector for Amazon ECS on EC2. Run the following curl command.

```
curl -O https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/ecs/aws-otel-ec2-instance-metrics-daemon-deployment-cfn.yaml
```

After you download the template file, replace *PATH_TO_CloudFormation_TEMPLATE* in the following command with the path where you saved the template file. Then export the following parameters and run the CloudFormation command, as shown in the following example.
+ **Cluster_Name** – The Amazon ECS cluster name.
+ **AWS_Region** – The Region where the data will be sent.
+ **PATH_TO_CloudFormation_TEMPLATE** – The path where you saved the CloudFormation template file.
+ **command** – To enable the AWS Distro for OpenTelemetry collector to collect instance-level metrics for Amazon ECS on Amazon EC2, you must specify `--config=/etc/ecs/otel-instance-metrics-config.yaml` for this parameter.

```
ClusterName=Cluster_Name
Region=AWS_Region
command=--config=/etc/ecs/otel-instance-metrics-config.yaml
aws cloudformation create-stack --stack-name AOCECS-${ClusterName}-${Region} \
--template-body file://PATH_TO_CloudFormation_TEMPLATE \
--parameters ParameterKey=ClusterName,ParameterValue=${ClusterName} \
ParameterKey=CreateIAMRoles,ParameterValue=True \
ParameterKey=command,ParameterValue=${command} \
--capabilities CAPABILITY_NAMED_IAM \
--region ${Region}
```

After running this command, use the Amazon ECS console to see if the task is running.
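You can also check from the command line that the daemon task is running. Replace the placeholders as in the previous command:

```
ClusterName=Cluster_Name
Region=AWS_Region
aws ecs list-tasks --cluster ${ClusterName} --region ${Region}
```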

### Troubleshooting the quick setup
<a name="container-insights-ECS-OTEL-quicksetup-troubleshooting"></a>

To check the status of the CloudFormation stack, enter the following command.

```
ClusterName=cluster-name
Region=cluster-region
aws cloudformation describe-stacks --stack-name AOCECS-$ClusterName-$Region --region $Region
```

If the value of `StackStatus` is anything other than `CREATE_COMPLETE` or `CREATE_IN_PROGRESS`, check the stack events to find the error. Enter the following command.

```
ClusterName=cluster-name
Region=cluster-region
aws cloudformation describe-stack-events --stack-name AOCECS-$ClusterName-$Region --region $Region
```

To check the status of the `AOCECS` daemon service, enter the following command. In the output, you should see that `runningCount` is equal to the `desiredCount` in the deployment section. If it isn't equal, check the failures section in the output.

```
ClusterName=cluster-name
Region=cluster-region
aws ecs describe-services --services AOCECS-daemon-service --cluster $ClusterName --region $Region
```

You can also use the CloudWatch Logs console to check the agent log. Look for the **/aws/ecs/containerinsights/*ClusterName*/performance** log group.

## Manual and custom setup
<a name="container-insights-ECS-OTEL-custom"></a>

Follow the steps in this section to manually deploy the AWS Distro for OpenTelemetry to collect instance-level metrics from your Amazon ECS clusters that are hosted on Amazon EC2 instances.

### Step 1: Necessary roles and policies
<a name="container-insights-ECS-OTEL-custom-iam"></a>

Two IAM roles are required. You must create them if they don't already exist. For more information about these roles, see [Create IAM policy](https://aws-otel.github.io/docs/setup/ecs/create-iam-policy) and [Create IAM role](https://aws-otel.github.io/docs/setup/ecs/create-iam-role).

### Step 2: Create the task definition
<a name="container-insights-ECS-OTEL-custom-task"></a>

Create a task definition and use it to launch the AWS Distro for OpenTelemetry as a daemon service.

To use the task definition template to create the task definition, follow the instructions in [Create ECS EC2 Task Definition for EC2 instance with AWS OTel Collector](https://aws-otel.github.io/docs/setup/ecs/task-definition-for-ecs-ec2-instance).

To use the Amazon ECS console to create the task definition, follow the instructions in [Install AWS OTel Collector by creating Task Definition through AWS console for Amazon ECS EC2 instance metrics](https://aws-otel.github.io/docs/setup/ecs/create-task-definition-instance-console).

### Step 3: Launch the daemon service
<a name="container-insights-ECS-OTEL-custom-launch"></a>

To launch the AWS Distro for OpenTelemetry as a daemon service, follow the instructions in [Run your task on the Amazon Elastic Container Service (Amazon ECS) using daemon service](https://aws-otel.github.io/docs/setup/ecs/run-daemon-service).

### (Optional) Advanced configuration
<a name="container-insights-ECS-OTEL-custom-advancdeconfig"></a>

Optionally, you can use SSM to specify other configuration options for the AWS Distro for OpenTelemetry in your Amazon ECS clusters that are hosted on Amazon EC2 instances. For more information about creating a configuration file, see [Custom OpenTelemetry Configuration](https://aws-otel.github.io/docs/setup/ecs#5-custom-opentelemetry-configuration). For more information about the options that you can use in the configuration file, see [AWS Container Insights Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/awscontainerinsightreceiver/README.md).

# Set up FireLens to send logs to CloudWatch Logs
<a name="deploy-container-insights-ECS-logs"></a>

FireLens for Amazon ECS enables you to use task definition parameters to route logs to Amazon CloudWatch Logs for log storage and analytics. FireLens works with [Fluent Bit](https://fluentbit.io/) and [Fluentd](https://www.fluentd.org/). We provide an AWS for Fluent Bit image, or you can use your own Fluent Bit or Fluentd image. Creating Amazon ECS task definitions with a FireLens configuration is supported using the AWS SDKs, AWS CLI, and AWS Management Console. For more information about CloudWatch Logs, see [What is CloudWatch Logs?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).

There are key considerations when using FireLens for Amazon ECS. For more information, see [Considerations](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_firelens.html#firelens-considerations).

To find the AWS for Fluent Bit images, see [Using the AWS for Fluent Bit image](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-using-fluentbit.html).

To create a task definition that uses a FireLens configuration, see [Creating a task definition that uses a FireLens configuration](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html).

**Example**

The following task definition example demonstrates how to specify a log configuration that forwards logs to a CloudWatch Logs log group. For more information, see [What Is Amazon CloudWatch Logs?](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) in the *Amazon CloudWatch Logs User Guide*.

In the log configuration options, specify the log group name and the Region it exists in. To have Fluent Bit create the log group on your behalf, specify `"auto_create_group":"true"`. You can also specify the task ID as the log stream prefix, which assists in filtering. For more information, see [Fluent Bit Plugin for CloudWatch Logs](https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/blob/mainline/README.md).

```
{
	"family": "firelens-example-cloudwatch",
	"taskRoleArn": "arn:aws:iam::123456789012:role/ecs_task_iam_role",
	"containerDefinitions": [
		{
			"essential": true,
			"image": "906394416424.dkr.ecr.us-west-2.amazonaws.com/aws-for-fluent-bit:latest",
			"name": "log_router",
			"firelensConfiguration": {
				"type": "fluentbit"
			},
			"logConfiguration": {
				"logDriver": "awslogs",
				"options": {
					"awslogs-group": "firelens-container",
					"awslogs-region": "us-west-2",
					"awslogs-create-group": "true",
					"awslogs-stream-prefix": "firelens"
				}
			},
			"memoryReservation": 50
		 },
		 {
			 "essential": true,
			 "image": "nginx",
			 "name": "app",
			 "logConfiguration": {
				 "logDriver":"awsfirelens",
				 "options": {
					"Name": "cloudwatch_logs",
					"region": "us-west-2",
					"log_key": "log",
					"log_group_name": "/aws/ecs/containerinsights/my-cluster/application",
					"auto_create_group": "true",
					"log_stream_name": "my-task-id"
				}
			},
			"memoryReservation": 100
		}
	]
}
```
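Assuming you save the example above to a local file such as `firelens-example-cloudwatch.json` (a hypothetical file name), you can register the task definition with the AWS CLI:

```
aws ecs register-task-definition \
    --cli-input-json file://firelens-example-cloudwatch.json \
    --region us-west-2
```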

# Setting up Container Insights on Amazon EKS and Kubernetes
<a name="deploy-container-insights-EKS"></a>

Container Insights is supported on Amazon EKS versions 1.23 and later. The quick start method of installation is supported only on versions 1.24 and later.

The overall process for setting up Container Insights on Amazon EKS or Kubernetes is as follows:

1. Verify that you have the necessary prerequisites.

1. Set up the Amazon CloudWatch Observability EKS add-on, the CloudWatch agent, or AWS Distro for OpenTelemetry on your cluster to send metrics to CloudWatch. 
**Note**  
To use Container Insights with enhanced observability for Amazon EKS, you must use the Amazon CloudWatch Observability EKS add-on or the CloudWatch agent. For more information about this version of Container Insights, see [Container Insights with enhanced observability for Amazon EKS](container-insights-detailed-metrics.md).  
To use Container Insights with Fargate, you must use AWS Distro for OpenTelemetry. Container Insights with enhanced observability for Amazon EKS is not supported on Fargate.
**Note**  
Container Insights now supports Windows worker nodes in an Amazon EKS cluster. Container Insights with enhanced observability for Amazon EKS is also supported on Windows. For information about enabling Container Insights on Windows, see [Using the CloudWatch agent with Container Insights enhanced observability enabled](Container-Insights-EKS-agent.md).

   To use Container Insights with OpenTelemetry metrics, install the Amazon CloudWatch Observability EKS add-on version `v6.0.1-eksbuild.1` or later. For more information, see [Container Insights with OpenTelemetry metrics for Amazon EKS](container-insights-otel-metrics.md).

1. Set up Fluent Bit or Fluentd to send logs to CloudWatch Logs. (This is enabled by default if you install the Amazon CloudWatch Observability EKS add-on.)

   You can perform these steps at once as part of the quick start setup if you are using the CloudWatch agent, or do them separately.

1. (Optional) Set up Amazon EKS control plane logging.

1. (Optional) Set up the CloudWatch agent as a StatsD endpoint on the cluster to send StatsD metrics to CloudWatch.

1. (Optional) Enable App Mesh Envoy Access Logs.

With the original version of Container Insights, metrics collected and logs ingested are charged as custom metrics. With Container Insights with enhanced observability for Amazon EKS, Container Insights metrics and logs are charged per observation instead of being charged per metric stored or log ingested. For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

**Topics**
+ [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md)
+ [Using the CloudWatch agent with Container Insights enhanced observability enabled](Container-Insights-EKS-agent.md)
+ [Using AWS Distro for OpenTelemetry](Container-Insights-EKS-otel.md)
+ [Send logs to CloudWatch Logs](Container-Insights-EKS-logs.md)
+ [Updating or deleting Container Insights on Amazon EKS and Kubernetes](ContainerInsights-update-delete.md)

# Verifying prerequisites for Container Insights in CloudWatch
<a name="Container-Insights-prerequisites"></a>

Before you install Container Insights on Amazon EKS or Kubernetes, verify the following. These prerequisites apply whether you are using the CloudWatch agent or AWS Distro for OpenTelemetry to set up Container Insights on Amazon EKS clusters.
+ You have a functional Amazon EKS or Kubernetes cluster with nodes attached in one of the Regions that support Container Insights for Amazon EKS and Kubernetes. For the list of supported Regions, see [Container Insights](ContainerInsights.md).
+ You have `kubectl` installed and running. For more information, see [Installing `kubectl`](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) in the *Amazon EKS User Guide*.
+ If you're using Kubernetes running on AWS instead of using Amazon EKS, the following prerequisites are also necessary:
  + Be sure that your Kubernetes cluster has enabled role-based access control (RBAC). For more information, see [Using RBAC Authorization](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) in the Kubernetes Reference. 
  + Your kubelet has enabled Webhook authorization mode. For more information, see [Kubelet authentication/authorization](https://kubernetes.io/docs/reference/access-authn-authz/kubelet-authn-authz/) in the Kubernetes Reference.

You must also grant IAM permissions to enable your Amazon EKS worker nodes to send metrics and logs to CloudWatch. There are two ways to do this:
+ Attach a policy to the IAM role of your worker nodes. This works for both Amazon EKS clusters and other Kubernetes clusters.
+ Use an IAM role for service accounts for the cluster, and attach the policy to this role. This works only for Amazon EKS clusters.

The first option grants permissions to CloudWatch for the entire node, while using an IAM role for the service account gives CloudWatch access to only the appropriate daemonset pods.

**Attaching a policy to the IAM role of your worker nodes**

Follow these steps to attach the policy to the IAM role of your worker nodes. This works for both Amazon EKS clusters and Kubernetes clusters outside of Amazon EKS. 

**To attach the necessary policy to the IAM role for your worker nodes**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. Select one of the worker node instances and choose the IAM role in the description.

1. On the IAM role page, choose **Attach policies**.

1. In the list of policies, select the check box next to **CloudWatchAgentServerPolicy**. If necessary, use the search box to find this policy.

1. Choose **Attach policies**.

If you're running a Kubernetes cluster outside Amazon EKS, you might not already have an IAM role attached to your worker nodes. If not, you must first attach an IAM role to the instance and then add the policy as explained in the previous steps. For more information on attaching a role to an instance, see [Attaching an IAM Role to an Instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/iam-roles-for-amazon-ec2.html#attach-iam-role) in the *Amazon EC2 User Guide*.

If you're running a Kubernetes cluster outside Amazon EKS and you want to collect EBS volume IDs in the metrics, you must add another policy to the IAM role attached to the instance. Add the following as an inline policy. For more information, see [Adding and Removing IAM Identity Permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) in the *IAM User Guide*.


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:DescribeVolumes"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
```

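Assuming you save the policy document above to a local file named `ebs-describe-volumes-policy.json` (a hypothetical name), you can add it as an inline policy with a command like the following. Replace *worker-node-role* with the name of the IAM role attached to your instances:

```
aws iam put-role-policy --role-name worker-node-role \
    --policy-name DescribeEBSVolumes \
    --policy-document file://ebs-describe-volumes-policy.json
```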

**Using an IAM service account role**

This method works only on Amazon EKS clusters.

**To grant permission to CloudWatch using an IAM service account role**

1. If you haven't already, enable IAM roles for service accounts on your cluster. For more information, see [Enabling IAM roles for service accounts on your cluster ](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html). 

1. If you haven't already, configure the service account to use an IAM role. For more information, see [Configuring a Kubernetes service account to assume an IAM role](https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html). 

   When you create the role, attach the **CloudWatchAgentServerPolicy** IAM policy to the role in addition to the policy that you create for the role. Also, the associated Kubernetes service account that is linked to this role should be created in the `amazon-cloudwatch` namespace, where the CloudWatch and Fluent Bit daemonsets will be deployed in the upcoming steps.

1. If you haven't already, associate the IAM role with a service account in your cluster. For more information, see [Configuring a Kubernetes service account to assume an IAM role](https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html).
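
For reference, the trust policy on such a role generally scopes `sts:AssumeRoleWithWebIdentity` to the `cloudwatch-agent` service account in the `amazon-cloudwatch` namespace. The following is an illustrative sketch; the account ID, Region, and OIDC provider ID are placeholders that must match your cluster's OIDC provider.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:amazon-cloudwatch:cloudwatch-agent"
                }
            }
        }
    ]
}
```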

# Using the CloudWatch agent with Container Insights enhanced observability enabled
<a name="Container-Insights-EKS-agent"></a>

Use the instructions in one of the following sections to set up Container Insights on an Amazon EKS cluster or Kubernetes cluster by using the CloudWatch agent. The quick start instructions are supported only on Amazon EKS versions 1.24 and later.

**Note**  
You can install Container Insights by following the instructions in any one of the following sections. You don't need to follow all three sets of instructions.

**Topics**
+ [Quick start with the Amazon CloudWatch Observability EKS add-on](Container-Insights-setup-EKS-addon.md)
+ [Quick Start setup for Container Insights on Amazon EKS and Kubernetes](Container-Insights-setup-EKS-quickstart.md)
+ [Setting up the CloudWatch agent to collect cluster metrics](Container-Insights-setup-metrics.md)

# Quick start with the Amazon CloudWatch Observability EKS add-on
<a name="Container-Insights-setup-EKS-addon"></a>

You can use the Amazon EKS add-on to install Container Insights with enhanced observability for Amazon EKS. The add-on installs the CloudWatch agent to send infrastructure metrics from the cluster, installs Fluent Bit to send container logs, and also enables CloudWatch [Application Signals](CloudWatch-Application-Monitoring-Sections.md) to send application performance telemetry.

When you use the Amazon EKS add-on version 1.5.0 or later, Container Insights is enabled on both Linux and Windows worker nodes in the cluster. Application Signals is not supported on Windows in Amazon EKS.

The Amazon EKS add-on is not supported for clusters running Kubernetes instead of Amazon EKS.

For more information about the Amazon CloudWatch Observability EKS add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md).

If you use version 3.1.0 or later of the add-on, you can use EKS Pod Identity to grant the required permissions to the add-on. EKS Pod Identity is the recommended option and provides benefits such as least privilege, credential rotation, and auditability. Additionally, using EKS Pod Identity allows you to install the EKS add-on as part of the cluster creation itself.

**To install the Amazon CloudWatch Observability EKS add-on**

1. Follow the [EKS Pod Identity association](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-association.html#pod-id-association-create/) steps to create the IAM role and set up the EKS Pod Identity agent.

1. Attach an IAM policy that grants the required permissions to your role. Replace *my-role* with the name of your IAM role from the previous step.

   ```
   aws iam attach-role-policy \
   --role-name my-role \
   --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
   ```

1. Enter the following command, using the IAM role you created in the previous step:

   ```
   aws eks create-addon \
   --addon-name amazon-cloudwatch-observability \
   --cluster-name my-cluster-name \
   --pod-identity-associations serviceAccount=cloudwatch-agent,roleArn=arn:aws:iam::111122223333:role/my-role
   ```

# Quick Start setup for Container Insights on Amazon EKS and Kubernetes
<a name="Container-Insights-setup-EKS-quickstart"></a>

**Important**  
If you are installing Container Insights on an Amazon EKS cluster, we recommend that you use the Amazon CloudWatch Observability EKS add-on for the installation, instead of using the instructions in this section. Additionally, to collect accelerated compute metrics, you must use the Amazon CloudWatch Observability EKS add-on. For more information and instructions, see [Quick start with the Amazon CloudWatch Observability EKS add-on](Container-Insights-setup-EKS-addon.md).

To complete the setup of Container Insights, you can follow the quick start instructions in this section. If you are installing in an Amazon EKS cluster and you use the instructions in this section on or after November 6, 2023, you install Container Insights with enhanced observability for Amazon EKS in the cluster.

**Important**  
Before completing the steps in this section, you must have verified the prerequisites including IAM permissions. For more information, see [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md). 

Alternatively, you can instead follow the instructions in the following two sections, [Setting up the CloudWatch agent to collect cluster metrics](Container-Insights-setup-metrics.md) and [Send logs to CloudWatch Logs](Container-Insights-EKS-logs.md). Those sections provide more configuration details on how the CloudWatch agent works with Amazon EKS and Kubernetes, but require you to perform more installation steps.

With the original version of Container Insights, metrics collected and logs ingested are charged as custom metrics. With Container Insights with enhanced observability for Amazon EKS, Container Insights metrics and logs are charged per observation instead of being charged per metric stored or log ingested. For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

**Note**  
Amazon has now launched Fluent Bit as the default log solution for Container Insights with significant performance gains. We recommend that you use Fluent Bit instead of Fluentd.

## Quick Start with the CloudWatch agent operator and Fluent Bit
<a name="Container-Insights-setup-EKS-quickstart-FluentBit"></a>

There are two configurations for Fluent Bit: an optimized version and a version that provides an experience more similar to Fluentd. The Quick Start configuration uses the optimized version. For more details about the Fluentd-compatible configuration, see [Set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs](Container-Insights-setup-logs-FluentBit.md).

The CloudWatch agent operator is an additional container that gets installed to an Amazon EKS cluster. It is modeled after the OpenTelemetry Operator for Kubernetes. The operator manages the lifecycle of Kubernetes resources in a cluster. It installs the CloudWatch Agent, DCGM Exporter (NVIDIA), and the AWS Neuron Monitor on an Amazon EKS cluster and manages them. Fluent Bit and the CloudWatch Agent for Windows are installed directly to an Amazon EKS cluster without the operator managing them. 

For a more secure and feature-rich certificate authority solution, the CloudWatch agent operator requires cert-manager, a widely-adopted solution for TLS certificate management in Kubernetes. Using cert-manager simplifies the process of obtaining, renewing, managing and using these certificates. It ensures that certificates are valid and up to date, and attempts to renew certificates at a configured time before expiry. cert-manager also facilitates issuing certificates from a variety of supported sources, including AWS Certificate Manager Private Certificate Authority.

**To deploy Container Insights using the quick start**

1. Install cert-manager if it is not already installed in the cluster. For more information, see [cert-manager Installation](https://cert-manager.io/docs/installation/).

1. Install the custom resource definitions (CRDs) by entering the following command.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-custom-resource-definitions.yaml | kubectl apply --server-side -f -
   ```

1. Install the operator by entering the following command. Replace *my-cluster-name* with the name of your Amazon EKS or Kubernetes cluster, and replace *my-cluster-region* with the name of the Region where the logs are published. We recommend that you use the same Region where your cluster is deployed to reduce the AWS outbound data transfer costs.

   ```
   ClusterName=my-cluster-name
   RegionName=my-cluster-region
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g' | kubectl apply -f -
   ```

   For example, to deploy Container Insights on the cluster named `MyCluster` and publish the logs and metrics to US West (Oregon), enter the following command.

   ```
   ClusterName='MyCluster'
   RegionName='us-west-2'
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g' | kubectl apply -f -
   ```
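
The `sed` pipeline in these commands performs simple placeholder substitution on the downloaded manifest. A minimal local illustration of the same substitution, using an inline two-line template instead of the real operator manifest:

```
# Placeholder substitution as done by the quick start pipeline, applied to a
# tiny inline template rather than the downloaded operator manifest.
ClusterName='MyCluster'
RegionName='us-west-2'
template='clusterName: {{cluster_name}}
region: {{region_name}}'
rendered=$(printf '%s\n' "$template" | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g')
printf '%s\n' "$rendered"
```

In the real commands, the same two substitutions are applied to every occurrence of the placeholders in the manifest before it is piped to `kubectl apply`.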

**Migrating from Container Insights**

If you already have Container Insights configured in an Amazon EKS cluster and you want to migrate to Container Insights with enhanced observability for Amazon EKS, see [Upgrading to Container Insights with enhanced observability for Amazon EKS in CloudWatch](Container-Insights-upgrade-enhanced.md).

**Deleting Container Insights**

If you want to remove Container Insights after using the quick start setup, enter the following commands.

```
ClusterName=my-cluster-name 
RegionName=my-cluster-region
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g' | kubectl delete -f -
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-custom-resource-definitions.yaml | kubectl delete -f -
```

# Setting up the CloudWatch agent to collect cluster metrics
<a name="Container-Insights-setup-metrics"></a>

**Important**  
If you are installing Container Insights on an Amazon EKS cluster, we recommend that you use the Amazon CloudWatch Observability EKS add-on for the installation, instead of using the instructions in this section. For more information and instructions, see [Quick start with the Amazon CloudWatch Observability EKS add-on](Container-Insights-setup-EKS-addon.md).

To set up Container Insights to collect metrics, you can follow the steps in [Quick Start setup for Container Insights on Amazon EKS and Kubernetes](Container-Insights-setup-EKS-quickstart.md) or you can follow the steps in this section. In the following steps, you set up the CloudWatch agent to be able to collect metrics from your clusters.

If you are installing in an Amazon EKS cluster and you use the instructions in this section on or after November 6, 2023, you install Container Insights with enhanced observability for Amazon EKS in the cluster.

## Step 1: Create a namespace for CloudWatch
<a name="create-namespace-metrics"></a>

Use the following step to create a Kubernetes namespace called `amazon-cloudwatch` for CloudWatch. You can skip this step if you have already created this namespace.

**To create a namespace for CloudWatch**
+ Enter the following command.

  ```
  kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
  ```

## Step 2: Create a service account in the cluster
<a name="create-service-account"></a>

Use one of the following methods to create a service account for the CloudWatch agent, if you do not already have one.
+ Use `kubectl`
+ Use a `kubeconfig` file

### Use `kubectl` for authentication
<a name="use-kubectl"></a>

**To use `kubectl` to create a service account for the CloudWatch agent**
+ Enter the following command.

  ```
  kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-serviceaccount.yaml
  ```

If you didn't follow the previous steps, but you already have a service account for the CloudWatch agent that you want to use, you must ensure that it has the following rules. Additionally, in the rest of the steps in the Container Insights installation, you must use the name of that service account instead of `cloudwatch-agent`. The CloudWatch agent requires a ClusterRole for cluster-wide access and a namespace-scoped role for ConfigMap operations in the `amazon-cloudwatch` namespace. 

**ClusterRole (cluster-scoped permissions):**

```
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["list", "watch", "get"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets", "deployments", "statefulsets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "events"]
    verbs: ["create", "get"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["list", "watch", "get"]
```

**Role (namespace-scoped permissions for amazon-cloudwatch namespace):**

```
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "update"]
```
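
If you define these rules yourself, they take effect only after they are bound to the service account. The following is a sketch of the namespace-scoped binding; the Role name `cwagent-role` and binding name are assumed names, and a matching ClusterRoleBinding is needed for the cluster-scoped rules above.

```
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cloudwatch-agent-role-binding
  namespace: amazon-cloudwatch
subjects:
  - kind: ServiceAccount
    name: cloudwatch-agent
    namespace: amazon-cloudwatch
roleRef:
  kind: Role
  name: cwagent-role
  apiGroup: rbac.authorization.k8s.io
```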

### Use `kubeconfig` for authentication
<a name="use-kubeconfig"></a>

Alternatively, you can use a `kubeconfig` file for authentication. This method allows you to bypass the need for a service account by directly specifying the `kubeconfig` path in your CloudWatch agent configuration. It also removes your dependency on the Kubernetes control plane API for authentication, streamlining your setup and potentially increasing security by managing authentication through your `kubeconfig` file.

To use this method, update your CloudWatch agent configuration file to specify the path to your `kubeconfig` file, as in the following example.

```
{
  "logs": {
    "metrics_collected": {
      "kubernetes": {
        "cluster_name": "YOUR_CLUSTER_NAME",
        "enhanced_container_insights": false,
        "accelerated_compute_metrics": false,
        "tag_service": false,
        "kube_config_path": "/path/to/your/kubeconfig" 
        "host_ip": "HOSTIP"
      }
    }
  }
}
```

To create a `kubeconfig` file, create a Certificate Signing Request (CSR) for the `admin/{create_your_own_user}` user in the `system:masters` Kubernetes group. Then sign the CSR with the Kubernetes cluster's Certificate Authority (CA) and create the `kubeconfig` file.
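
The signing workflow depends on your cluster tooling, but the resulting `kubeconfig` is plain YAML. The following is a hypothetical sketch of the file you would assemble after signing the certificate; the server URL, file paths, and names are all placeholders.

```
# Assemble a minimal kubeconfig by hand after the CSR has been signed.
# All paths, names, and the server URL below are placeholders.
cat > kubeconfig.sketch <<'EOF'
apiVersion: v1
kind: Config
clusters:
  - name: my-cluster
    cluster:
      server: https://my-cluster-endpoint
      certificate-authority: /path/to/ca.crt
users:
  - name: admin
    user:
      client-certificate: /path/to/admin.crt
      client-key: /path/to/admin.key
contexts:
  - name: admin@my-cluster
    context:
      cluster: my-cluster
      user: admin
current-context: admin@my-cluster
EOF
echo "wrote kubeconfig.sketch"
```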

## Step 3: Create a ConfigMap for the CloudWatch agent
<a name="create-configmap"></a>

Use the following steps to create a ConfigMap for the CloudWatch agent.

**To create a ConfigMap for the CloudWatch agent**

1. Download the ConfigMap YAML to your `kubectl` client host by running the following command:

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-configmap-enhanced.yaml
   ```

1. Edit the downloaded YAML file, as follows:
   + **cluster_name** – In the `kubernetes` section, replace `{{cluster_name}}` with the name of your cluster. Remove the `{{}}` characters. Alternatively, if you're using an Amazon EKS cluster, you can delete the `"cluster_name"` field and value. If you do, the CloudWatch agent detects the cluster name from the Amazon EC2 tags.

1. (Optional) Make further changes to the ConfigMap based on your monitoring requirements, as follows:
   + **metrics_collection_interval** – In the `kubernetes` section, you can specify how often the agent collects metrics. The default is 60 seconds. The default cadvisor collection interval in kubelet is 15 seconds, so don't set this value to less than 15 seconds.
   + **endpoint_override** – In the `logs` section, you can specify the CloudWatch Logs endpoint if you want to override the default endpoint. You might want to do this if you're publishing from a cluster in a VPC and you want the data to go to a VPC endpoint.
   + **force_flush_interval** – In the `logs` section, you can specify the interval for batching log events before they are published to CloudWatch Logs. The default is 5 seconds.
   + **region** – By default, the agent publishes metrics to the Region where the worker node is located. To override this, you can add a `region` field in the `agent` section: for example, `"region":"us-west-2"`.
   + **statsd** section – If you want the CloudWatch Logs agent to also run as a StatsD listener in each worker node of your cluster, you can add a `statsd` section to the `metrics` section, as in the following example. For information about other StatsD options for this section, see [Retrieve custom metrics with StatsD](CloudWatch-Agent-custom-metrics-statsd.md).

     ```
     "metrics": {
       "metrics_collected": {
         "statsd": {
           "service_address":":8125"
         }
       }
     }
     ```

     A full example of the JSON section is as follows. If you're using a `kubeconfig` file for authentication, add the `kube_config_path` parameter to specify the path to your kubeconfig file.

     ```
     {
         "agent": {
             "region": "us-east-1"
         },
         "logs": {
             "metrics_collected": {
                 "kubernetes": {
                     "cluster_name": "MyCluster",
                     "metrics_collection_interval": 60,
                     "kube_config_path": "/path/to/your/kubeconfig" //if using kubeconfig for authentication
                 }
             },
             "force_flush_interval": 5,
             "endpoint_override": "logs.us-east-1.amazonaws.com"
         },
         "metrics": {
             "metrics_collected": {
                 "statsd": {
                     "service_address": ":8125"
                 }
             }
         }
     }
     ```

1. Create the ConfigMap in the cluster by running the following command.

   ```
   kubectl apply -f cwagent-configmap-enhanced.yaml
   ```
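
If `kubectl apply` rejects the file, a quick local check of the JSON embedded in the ConfigMap often finds the problem. The following is a sketch, assuming `python3` is available on the `kubectl` client host; the configuration shown is a pared-down example.

```
# Validate an agent configuration snippet locally before it goes into the
# ConfigMap. The configuration below is a pared-down example.
cat > cwagent-config-check.json <<'EOF'
{
  "logs": {
    "metrics_collected": {
      "kubernetes": {
        "cluster_name": "MyCluster",
        "metrics_collection_interval": 60
      }
    }
  }
}
EOF
python3 -m json.tool cwagent-config-check.json > /dev/null && echo "valid JSON"
```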

## Step 4: Deploy the CloudWatch agent as a DaemonSet
<a name="deploy-agent-yaml"></a>

To finish the installation of the CloudWatch agent and begin collecting container metrics, use the following steps.

**To deploy the CloudWatch agent as a DaemonSet**

1. 
   + If you do not want to use StatsD on the cluster, enter the following command.

     ```
     kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
     ```
   + If you do want to use StatsD, follow these steps:

     1. Download the DaemonSet YAML to your `kubectl` client host by running the following command.

        ```
        curl -O  https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
        ```

     1. Uncomment the `port` section in the `cwagent-daemonset.yaml` file as in the following: 

        ```
        ports:
          - containerPort: 8125
            hostPort: 8125
            protocol: UDP
        ```

     1. Deploy the CloudWatch agent in your cluster by running the following command.

        ```
        kubectl apply -f cwagent-daemonset.yaml
        ```

     1. Deploy the CloudWatch agent on Windows nodes in your cluster by running the following command. The StatsD listener is not supported on the CloudWatch agent on Windows.

        ```
        kubectl apply -f cwagent-daemonset-windows.yaml
        ```

1. Validate that the agent is deployed by running the following command.

   ```
   kubectl get pods -n amazon-cloudwatch
   ```

When complete, the CloudWatch agent creates a log group named `/aws/containerinsights/Cluster_Name/performance` and sends the performance log events to this log group. If you also set up the agent as a StatsD listener, the agent listens for StatsD metrics on port 8125 at the IP address of the node where the application pod is scheduled.

### Troubleshooting
<a name="ContainerInsights-deploy-troubleshooting"></a>

If the agent doesn't deploy correctly, try the following:
+ Run the following command to get the list of pods.

  ```
  kubectl get pods -n amazon-cloudwatch
  ```
+ Run the following command and check the events at the bottom of the output.

  ```
  kubectl describe pod pod-name -n amazon-cloudwatch
  ```
+ Run the following command to check the logs.

  ```
  kubectl logs pod-name  -n amazon-cloudwatch
  ```

# Using AWS Distro for OpenTelemetry
<a name="Container-Insights-EKS-otel"></a>

You can set up Container Insights to collect metrics from Amazon EKS clusters by using the AWS Distro for OpenTelemetry collector. For more information about the AWS Distro for OpenTelemetry, see [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/). 

**Important**  
If you install using AWS Distro for OpenTelemetry, you install Container Insights but do not get Container Insights with enhanced observability for Amazon EKS. You will not collect the detailed metrics supported in Container Insights with enhanced observability for Amazon EKS.

How you set up Container Insights depends on whether the cluster is hosted on Amazon EC2 instances or on AWS Fargate.

## Amazon EKS clusters hosted on Amazon EC2
<a name="Container-Insights-EKS-otel-EC2"></a>

If you have not already done so, make sure that you have fulfilled the prerequisites including the necessary IAM roles. For more information, see [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md).

Amazon provides a Helm chart that you can use to set up the monitoring of Amazon Elastic Kubernetes Service on Amazon EC2. This monitoring uses the AWS Distro for OpenTelemetry (ADOT) Collector for metrics and Fluent Bit for logs. Therefore, the Helm chart is useful for customers who use Amazon EKS on Amazon EC2 and want to collect metrics and logs to send to CloudWatch Container Insights. For more information about this Helm chart, see [ADOT Helm chart for EKS on EC2 metrics and logs to Amazon CloudWatch Container Insights](https://github.com/aws-observability/aws-otel-helm-charts/tree/main/charts/adot-exporter-for-eks-on-ec2). 

Alternatively, you can also use the instructions in the rest of this section.

First, deploy the AWS Distro for OpenTelemetry collector as a DaemonSet by entering the following command. 

```
curl https://raw.githubusercontent.com/aws-observability/aws-otel-collector/main/deployment-template/eks/otel-container-insights-infra.yaml |
kubectl apply -f -
```

To confirm that the collector is running, enter the following command.

```
kubectl get pods -l name=aws-otel-eks-ci -n aws-otel-eks
```

If the output of this command includes multiple pods in the `Running` state, the collector is running and collecting metrics from the cluster. The collector creates a log group named `/aws/containerinsights/cluster-name/performance` and sends the performance log events to it.

For information about how to see your Container Insights metrics in CloudWatch, see [Viewing Container Insights metrics](Container-Insights-view-metrics.md).

AWS has also provided documentation on GitHub for this scenario. If you want to customize the metrics and logs published by Container Insights, see [https://aws-otel.github.io/docs/getting-started/container-insights/eks-infra](https://aws-otel.github.io/docs/getting-started/container-insights/eks-infra).

## Amazon EKS clusters hosted on Fargate
<a name="Container-Insights-EKS-otel-Fargate"></a>

For instructions for how to configure and deploy an ADOT Collector to collect system metrics from workloads deployed to an Amazon EKS cluster on Fargate and send them to CloudWatch Container Insights, see [Container Insights EKS Fargate](https://aws-otel.github.io/docs/getting-started/container-insights/eks-fargate) in the AWS Distro for OpenTelemetry documentation.

# Send logs to CloudWatch Logs
<a name="Container-Insights-EKS-logs"></a>

To send logs from your containers to Amazon CloudWatch Logs, you can use Fluent Bit. For more information, see [Fluent Bit](https://fluentbit.io/).

**Note**  
As of February 10, 2025, AWS has deprecated support for Fluentd as a log forwarder to CloudWatch Logs, and Fluentd is no longer supported for Container Insights. We recommend that you use Fluent Bit, which is a lightweight and resource-efficient alternative. Existing Fluentd deployments will continue to function, but you should migrate your logging pipeline to Fluent Bit to ensure continued support and optimal performance.

**Topics**
+ [Set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs](Container-Insights-setup-logs-FluentBit.md)
+ [(Optional) Set up Amazon EKS control plane logging](Container-Insights-setup-control-plane-logging.md)
+ [(Optional) Enable the Use_Kubelet feature for large clusters](ContainerInsights-use-kubelet.md)

# Set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs
<a name="Container-Insights-setup-logs-FluentBit"></a>

The following sections help you deploy Fluent Bit to send logs from containers to CloudWatch Logs.

**Topics**
+ [Setting up Fluent Bit](#Container-Insights-FluentBit-setup)
+ [Multi-line log support](#ContainerInsights-fluentbit-multiline)
+ [(Optional) Reducing the log volume from Fluent Bit](#ContainerInsights-fluentbit-volume)
+ [Troubleshooting](#Container-Insights-FluentBit-troubleshoot)
+ [Dashboard](#Container-Insights-FluentBit-dashboard)

## Setting up Fluent Bit
<a name="Container-Insights-FluentBit-setup"></a>

To set up Fluent Bit to collect logs from your containers, you can follow the steps in [Quick Start setup for Container Insights on Amazon EKS and Kubernetes](Container-Insights-setup-EKS-quickstart.md) or you can follow the steps in this section.

With either method, the IAM role that is attached to the cluster nodes must have sufficient permissions. For more information about the permissions required to run an Amazon EKS cluster, see [Amazon EKS IAM Policies, Roles, and Permissions](https://docs.aws.amazon.com/eks/latest/userguide/IAM_policies.html) in the *Amazon EKS User Guide*.

In the following steps, you set up Fluent Bit as a DaemonSet to send logs to CloudWatch Logs. When you complete these steps, Fluent Bit creates the following log groups if they don't already exist.

**Important**  
If you already have Fluentd configured in Container Insights and its DaemonSet is not running as expected (this can happen if you use the `containerd` runtime), uninstall it before installing Fluent Bit so that Fluent Bit doesn't process the Fluentd error log messages. Otherwise, uninstall Fluentd immediately after you have successfully installed Fluent Bit; this ordering ensures continuity in logging during the migration. Only one of Fluent Bit or Fluentd is needed to send logs to CloudWatch Logs.


| Log group name | Log source | 
| --- | --- | 
|  `/aws/containerinsights/Cluster_Name/application`  |  All log files in `/var/log/containers`  | 
|  `/aws/containerinsights/Cluster_Name/host`  |  Logs from `/var/log/dmesg`, `/var/log/secure`, and `/var/log/messages`  | 
|  `/aws/containerinsights/Cluster_Name/dataplane`  |  The logs in `/var/log/journal` for `kubelet.service`, `kubeproxy.service`, and `docker.service`.  | 

**To install Fluent Bit to send logs from containers to CloudWatch Logs**

1. If you don't already have a namespace called `amazon-cloudwatch`, create one by entering the following command:

   ```
   kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
   ```

1. Run the following command to create a ConfigMap named `cluster-info` with the cluster name and the Region to send logs to. Replace *cluster-name* and *cluster-region* with your cluster's name and Region.

   ```
   ClusterName=cluster-name
   RegionName=cluster-region
   FluentBitHttpPort='2020'
   FluentBitReadFromHead='Off'
   [[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off' || FluentBitReadFromTail='On'
   [[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
   kubectl create configmap fluent-bit-cluster-info \
   --from-literal=cluster.name=${ClusterName} \
   --from-literal=http.server=${FluentBitHttpServer} \
   --from-literal=http.port=${FluentBitHttpPort} \
   --from-literal=read.head=${FluentBitReadFromHead} \
   --from-literal=read.tail=${FluentBitReadFromTail} \
   --from-literal=logs.region=${RegionName} -n amazon-cloudwatch
   ```

   In this command, the `FluentBitHttpServer` for monitoring plugin metrics is on by default. To turn it off, change the third line of the command to `FluentBitHttpPort=''` (empty string).

   Also by default, Fluent Bit reads log files from the tail, and captures only new logs written after it is deployed. If you instead want Fluent Bit to collect all existing logs in the file system, set `FluentBitReadFromHead='On'`.

1. Download and deploy the Fluent Bit daemonset to the cluster by running one of the following commands.
   + If you want the Fluent Bit optimized configuration for Linux computers, run this command.

     ```
     kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml
     ```
   + If you want the Fluent Bit optimized configuration for Windows computers, run this command.

     ```
     kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit-windows.yaml
     ```
   + If you are using Linux computers and want the Fluent Bit configuration that is more similar to Fluentd, run this command.

     ```
     kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit-compatible.yaml
     ```
**Important**  
The Fluent Bit daemonset configuration by default sets the log level to INFO, which can result in higher CloudWatch Logs ingestion costs. If you want to reduce log ingestion volume and costs, you can change the log level to ERROR.  
For more information about how to reduce the log volume, see [(Optional) Reducing the log volume from Fluent Bit](#ContainerInsights-fluentbit-volume).

1. Validate the deployment by entering the following command. Each node should have one pod named **fluent-bit-\***.

   ```
   kubectl get pods -n amazon-cloudwatch
   ```
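
The two `[[ … ]]` one-liners in step 2 derive the remaining ConfigMap values from `FluentBitReadFromHead` and `FluentBitHttpPort`. Run in isolation with the default values, they resolve as shown in the following sketch.

```
# With the defaults from step 2, derive the http-server and read-from-tail
# settings the same way the installation commands do.
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off' || FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
echo "http.server=${FluentBitHttpServer} read.tail=${FluentBitReadFromTail}"
```

With the defaults, this prints `http.server=On read.tail=On`: the HTTP server is enabled because the port is non-empty, and reading from the tail is enabled because reading from the head is off.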

The above steps create the following resources in the cluster:
+ A service account named `Fluent-Bit` in the `amazon-cloudwatch` namespace. This service account is used to run the Fluent Bit DaemonSet. For more information, see [Managing Service Accounts](https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/) in the Kubernetes Reference.
+ A cluster role named `Fluent-Bit-role`. Cluster roles are cluster-scoped; this one grants `get`, `list`, and `watch` permissions on pod logs to the `Fluent-Bit` service account. For more information, see [API Overview](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#api-overview/) in the Kubernetes Reference.
+ A ConfigMap named `Fluent-Bit-config` in the `amazon-cloudwatch` namespace. This ConfigMap contains the configuration to be used by Fluent Bit. For more information, see [Configure a Pod to Use a ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) in the Kubernetes Tasks documentation.

If you want to verify your Fluent Bit setup, follow these steps.

**Verify the Fluent Bit setup**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Log groups**.

1. Make sure that you're in the Region where you deployed Fluent Bit.

1. Check the list of log groups in the Region. You should see the following:
   + `/aws/containerinsights/Cluster_Name/application`
   + `/aws/containerinsights/Cluster_Name/host`
   + `/aws/containerinsights/Cluster_Name/dataplane`

1. Navigate to one of these log groups and check the **Last Event Time** for the log streams. If it is recent relative to when you deployed Fluent Bit, the setup is verified.

   There might be a slight delay in creating the `/dataplane` log group. This is normal because these log groups are created only when Fluent Bit starts sending logs for that log group.
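   As a quick check on the names to expect, these log group names follow a fixed pattern based on the cluster name. The following shell sketch (using a hypothetical cluster name) prints the names to look for:

   ```
   CLUSTER_NAME=my-cluster   # hypothetical; substitute your own cluster name
   for suffix in application host dataplane; do
     echo "/aws/containerinsights/${CLUSTER_NAME}/${suffix}"
   done
   ```

   Compare the printed names against the log group list in the console; a mismatch usually means the `cluster.name` value in the `fluent-bit-cluster-info` ConfigMap was set incorrectly.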

## Multi-line log support
<a name="ContainerInsights-fluentbit-multiline"></a>

For information on how to use Fluent Bit with multi-line logs, see the following sections of the Fluent Bit documentation:
+ [Multiline Parsing](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/multiline-parsing)
+ [Multiline and Containers (v1.8)](https://docs.fluentbit.io/manual/pipeline/inputs/tail#multiline-and-containers-v1.8)
+ [Multiline Core (v1.8)](https://docs.fluentbit.io/manual/pipeline/inputs/tail#multiline-core-v1.8)
+ [Always use multiline in the tail input](https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#always-use-multiline-the-tail-input)

## (Optional) Reducing the log volume from Fluent Bit
<a name="ContainerInsights-fluentbit-volume"></a>

By default, we send Fluent Bit application logs and Kubernetes metadata to CloudWatch. If you want to reduce the volume of data being sent to CloudWatch, you can stop one or both of these data sources from being sent to CloudWatch. If you have followed the steps on this page to set up Fluent Bit, download the Kubernetes manifest YAML file from the kubectl `apply` command that you previously ran and modify it with your changes, which you can then re-apply to your cluster. Alternatively, if you are using the Amazon CloudWatch Observability EKS add-on or Helm chart, see [(Optional) Additional configuration](install-CloudWatch-Observability-EKS-addon.md#install-CloudWatch-Observability-EKS-addon-configuration) for information about managing the Fluent Bit configuration by using the add-on’s advanced config or the Helm chart.

To stop sending Fluent Bit application logs to CloudWatch, remove the following section from the Fluent Bit configuration file.

```
[INPUT]
        Name                tail
        Tag                 application.*
        Path                /var/log/containers/fluent-bit*
        Parser              docker
        DB                  /fluent-bit/state/flb_log.db
        Mem_Buf_Limit       5MB
        Skip_Long_Lines     On
        Refresh_Interval    10
```

To stop Kubernetes metadata from being appended to log events that are sent to CloudWatch, add the following filters to the `application-log.conf` section of the Fluent Bit configuration. Replace *<Metadata\_1>* and the similar fields with the actual metadata identifiers.

```
application-log.conf: |
    [FILTER]
        Name                nest
        Match               application.*
        Operation           lift
        Nested_under        kubernetes
        Add_prefix          Kube.

    [FILTER]
        Name                modify
        Match               application.*
        Remove              Kube.<Metadata_1>
        Remove              Kube.<Metadata_2>
        Remove              Kube.<Metadata_3>
    
    [FILTER]
        Name                nest
        Match               application.*
        Operation           nest
        Wildcard            Kube.*
        Nested_under        kubernetes
        Remove_prefix       Kube.
```
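For illustration only, if the metadata fields to drop were `pod_id`, `docker_id`, and `container_hash` (hypothetical choices; use whichever identifiers you want removed), the `modify` filter would become:

```
    [FILTER]
        Name                modify
        Match               application.*
        Remove              Kube.pod_id
        Remove              Kube.docker_id
        Remove              Kube.container_hash
```

The surrounding `nest` filters lift the keys out of the `kubernetes` object with a `Kube.` prefix so that `modify` can remove them by name, and then nest the remaining keys back under `kubernetes`.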

## Troubleshooting
<a name="Container-Insights-FluentBit-troubleshoot"></a>

If you don't see these log groups and you are looking in the correct Region, check the logs of the Fluent Bit DaemonSet pods to look for the error.

Run the following command and make sure that the status is `Running`.

```
kubectl get pods -n amazon-cloudwatch
```

If the logs have errors related to IAM permissions, check the IAM role that is attached to the cluster nodes. For more information about the permissions required to run an Amazon EKS cluster, see [Amazon EKS IAM Policies, Roles, and Permissions](https://docs.aws.amazon.com/eks/latest/userguide/IAM_policies.html) in the *Amazon EKS User Guide*.

If the pod status is `CreateContainerConfigError`, get the exact error by running the following command.

```
kubectl describe pod pod_name -n amazon-cloudwatch
```

## Dashboard
<a name="Container-Insights-FluentBit-dashboard"></a>

You can create a dashboard to monitor the metrics of each running plugin. You can see data for input and output bytes and record processing rates, as well as output errors and retry/failed rates. To view these metrics, you must install the CloudWatch agent with Prometheus metrics collection for Amazon EKS and Kubernetes clusters. For more information about how to set up the dashboard, see [Install the CloudWatch agent with Prometheus metrics collection on Amazon EKS and Kubernetes clusters](ContainerInsights-Prometheus-Setup.md).

**Note**  
Before you can set up this dashboard, you must set up Container Insights for Prometheus metrics. For more information, see [Container Insights Prometheus metrics monitoring](ContainerInsights-Prometheus.md).

**To create a dashboard for the Fluent Bit Prometheus metrics**

1. Create environment variables, replacing the values on the right in the following lines to match your deployment.

   ```
   DASHBOARD_NAME=your_cw_dashboard_name
   REGION_NAME=your_metric_region_such_as_us-west-1
   CLUSTER_NAME=your_kubernetes_cluster_name
   ```

1. Create the dashboard by running the following command.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/fluent-bit/cw_dashboard_fluent_bit.json \
   | sed "s/{{YOUR_AWS_REGION}}/${REGION_NAME}/g" \
   | sed "s/{{YOUR_CLUSTER_NAME}}/${CLUSTER_NAME}/g" \
   | xargs -0 aws cloudwatch put-dashboard --dashboard-name ${DASHBOARD_NAME} --dashboard-body
   ```
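   The pipeline above downloads the dashboard JSON template and substitutes the `{{YOUR_AWS_REGION}}` and `{{YOUR_CLUSTER_NAME}}` placeholders before passing the body to `put-dashboard`. The substitution step can be sketched in isolation, using a hypothetical template snippet and example values:

   ```
   # Stand-in for the downloaded dashboard template (hypothetical snippet).
   TEMPLATE='{"region": "{{YOUR_AWS_REGION}}", "title": "Fluent Bit - {{YOUR_CLUSTER_NAME}}"}'
   REGION_NAME=us-west-1
   CLUSTER_NAME=demo-cluster

   # The same substitution that the dashboard pipeline performs.
   echo "$TEMPLATE" \
   | sed "s/{{YOUR_AWS_REGION}}/${REGION_NAME}/g" \
   | sed "s/{{YOUR_CLUSTER_NAME}}/${CLUSTER_NAME}/g"
   ```

   If the dashboard is created but its widgets show no data, check that the substituted Region and cluster name exactly match the dimensions of the published Prometheus metrics.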

# (Optional) Set up Amazon EKS control plane logging
<a name="Container-Insights-setup-control-plane-logging"></a>

If you're using Amazon EKS, you can optionally enable Amazon EKS control plane logging, to provide audit and diagnostic logs directly from the Amazon EKS control plane to CloudWatch Logs. For more information, see [Amazon EKS Control Plane Logging](https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html).

# (Optional) Enable the Use\_Kubelet feature for large clusters
<a name="ContainerInsights-use-kubelet"></a>

By default, the Use\_Kubelet feature is disabled in the Fluent Bit Kubernetes plugin. Enabling this feature can reduce traffic to the API server and mitigate the risk of the API server becoming a bottleneck. We recommend that you enable this feature for large clusters.

To enable Use\_Kubelet, first add the `nodes` and `nodes/proxy` permissions to the ClusterRole configuration.

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-role
rules:
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
      - nodes
      - nodes/proxy
    verbs: ["get", "list", "watch"]
```

The DaemonSet configuration requires host network access for this feature. The image version for `amazon/aws-for-fluent-bit` must be 2.12.0 or later, or the Fluent Bit image version must be 1.7.2 or later.

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: fluent-bit
        image: amazon/aws-for-fluent-bit:2.19.0
        imagePullPolicy: Always
        env:
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: logs.region
            - name: CLUSTER_NAME
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: cluster.name
            - name: HTTP_SERVER
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: http.server
            - name: HTTP_PORT
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: http.port
            - name: READ_FROM_HEAD
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: read.head
            - name: READ_FROM_TAIL
              valueFrom:
                configMapKeyRef:
                  name: fluent-bit-cluster-info
                  key: read.tail
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name      
            - name: CI_VERSION
              value: "k8s/1.3.8"
        resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 500m
              memory: 100Mi
        volumeMounts:
        # Please don't change below read-only permissions
        - name: fluentbitstate
          mountPath: /var/fluent-bit/state
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: runlogjournal
          mountPath: /run/log/journal
          readOnly: true
        - name: dmesg
          mountPath: /var/log/dmesg
          readOnly: true
      terminationGracePeriodSeconds: 10
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      - name: fluentbitstate
        hostPath:
          path: /var/fluent-bit/state
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: runlogjournal
        hostPath:
          path: /run/log/journal
      - name: dmesg
        hostPath:
          path: /var/log/dmesg
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
```

The Kubernetes Plugin configuration should be similar to the following:

```
[FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Use_Kubelet         On
        Kubelet_Port        10250 
        Buffer_Size         0
```

# Updating or deleting Container Insights on Amazon EKS and Kubernetes
<a name="ContainerInsights-update-delete"></a>

Use the steps in these sections to update your CloudWatch agent container image, or to remove Container Insights from an Amazon EKS or Kubernetes cluster.

**Topics**
+ [Upgrading to Container Insights with enhanced observability for Amazon EKS in CloudWatch](Container-Insights-upgrade-enhanced.md)
+ [Updating the CloudWatch agent container image](ContainerInsights-update-image.md)
+ [Deleting the CloudWatch agent and Fluent Bit for Container Insights](ContainerInsights-delete-agent.md)

# Upgrading to Container Insights with enhanced observability for Amazon EKS in CloudWatch
<a name="Container-Insights-upgrade-enhanced"></a>

**Important**  
If you are upgrading or installing Container Insights on an Amazon EKS cluster, we recommend that you use the Amazon CloudWatch Observability EKS add-on for the installation, instead of using the instructions in this section. Additionally, to retrieve accelerated computing metrics, you must use the Amazon CloudWatch Observability EKS add-on. For more information and instructions, see [Quick start with the Amazon CloudWatch Observability EKS add-on](Container-Insights-setup-EKS-addon.md).

Container Insights with enhanced observability for Amazon EKS is the newest version of Container Insights. It collects detailed metrics from clusters running Amazon EKS and offers curated, immediately usable dashboards to drill down into application and infrastructure telemetry. For more information about this version of Container Insights, see [Container Insights with enhanced observability for Amazon EKS](container-insights-detailed-metrics.md).

If you have installed the original version of Container Insights in an Amazon EKS cluster and you want to upgrade it to the newer version with enhanced observability, follow the instructions in this section.

**Important**  
Before completing the steps in this section, you must have verified the prerequisites including cert-manager. For more information, see [Quick Start with the CloudWatch agent operator and Fluent Bit](Container-Insights-setup-EKS-quickstart.md#Container-Insights-setup-EKS-quickstart-FluentBit).

**To upgrade an Amazon EKS cluster to Container Insights with enhanced observability for Amazon EKS**

1. Install the CloudWatch agent operator by entering the following command. Replace *my-cluster-name* with the name of your Amazon EKS or Kubernetes cluster, and replace *my-cluster-region* with the name of the Region where the logs are published. We recommend that you use the same Region where your cluster is deployed to reduce the AWS outbound data transfer costs.

   ```
   ClusterName=my-cluster-name
   RegionName=my-cluster-region
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g' | kubectl apply -f -
   ```

   If you notice a failure caused by conflicting resources, you probably already have the CloudWatch agent and Fluent Bit, along with associated components such as the ServiceAccount, ClusterRole, and ClusterRoleBinding, installed on the cluster. When the CloudWatch agent operator tries to install the CloudWatch agent and its associated components and detects any change in their contents, it fails the installation or update by default to avoid overwriting the state of the resources on the cluster. We recommend that you delete any existing CloudWatch agent Container Insights setup that you previously installed on the cluster, and then install the CloudWatch agent operator.

1. (Optional) To apply an existing custom Fluent Bit configuration, you must update the ConfigMap associated with the Fluent Bit DaemonSet. The CloudWatch agent operator provides a default configuration for Fluent Bit, and you can override or modify it as needed. To apply a custom configuration, follow these steps.

   1. Open the existing configuration by entering the following command.

      ```
      kubectl edit cm fluent-bit-config -n amazon-cloudwatch
      ```

   1. Make your changes in the file, then enter `:wq` to save the file and exit edit mode.

   1. Restart Fluent Bit by entering the following command.

      ```
      kubectl rollout restart ds fluent-bit -n amazon-cloudwatch
      ```

# Updating the CloudWatch agent container image
<a name="ContainerInsights-update-image"></a>

**Important**  
If you are upgrading or installing Container Insights on an Amazon EKS cluster, we recommend that you use the Amazon CloudWatch Observability EKS add-on for the installation, instead of using the instructions in this section. Additionally, to retrieve accelerated computing metrics, you must use the Amazon CloudWatch Observability EKS add-on or the CloudWatch agent operator. For more information and instructions, see [Quick start with the Amazon CloudWatch Observability EKS add-on](Container-Insights-setup-EKS-addon.md).

If you need to update your container image to the latest version, use the steps in this section.

**To update your container image**

1. Verify whether the `amazoncloudwatchagent` Custom Resource Definition (CRD) already exists by entering the following command.

   ```
   kubectl get crds amazoncloudwatchagents.cloudwatch.aws.amazon.com -n amazon-cloudwatch
   ```

   If this command returns an error that the CRD is missing, the cluster doesn't have Container Insights with enhanced observability for Amazon EKS configured with the CloudWatch agent operator. In this case, see [Upgrading to Container Insights with enhanced observability for Amazon EKS in CloudWatch](Container-Insights-upgrade-enhanced.md).

1. Apply the latest `cwagent-version.yaml` file by entering the following command.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-version.yaml | kubectl apply -f -
   ```

# Deleting the CloudWatch agent and Fluent Bit for Container Insights
<a name="ContainerInsights-delete-agent"></a>

If you installed Container Insights by using the Amazon CloudWatch Observability EKS add-on, you can delete Container Insights and the CloudWatch agent by entering the following command:

**Note**  
The Amazon EKS add-on now supports Container Insights on Windows worker nodes. If you delete the Amazon EKS add-on, Container Insights for Windows is also deleted.

```
aws eks delete-addon --cluster-name my-cluster --addon-name amazon-cloudwatch-observability
```

Otherwise, to delete all resources related to the CloudWatch agent and Fluent Bit, enter the following commands. In these commands, *My\_Cluster\_Name* is the name of your Amazon EKS or Kubernetes cluster, and *My-Region* is the name of the Region where the logs are published.

```
ClusterName=My_Cluster_Name
RegionName=My-Region
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g' | kubectl delete -f -
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-custom-resource-definitions.yaml | kubectl delete -f -
```

# Setting up Container Insights on RedHat OpenShift on AWS (ROSA)
<a name="deploy-container-insights-RedHatOpenShift"></a>

CloudWatch Container Insights with enhanced observability supports RedHat OpenShift on AWS (ROSA) clusters. After you install the CloudWatch agent operator Helm chart, Container Insights automatically collects detailed infrastructure telemetry from the cluster level down to the container level in your environment. It then displays this performance data in curated dashboards, removing the heavy lifting of observability setup.

**Note**  
For RedHat OpenShift on AWS (ROSA), when you install the CloudWatch agent operator by using Helm charts, the CloudWatch agent is by default also enabled to receive both metrics and traces from your applications that are instrumented for Application Signals. If you want to pass in custom configuration rules, you can do so by passing in a custom agent configuration by using the Helm chart, as outlined in [(Optional) Additional configuration](install-CloudWatch-Observability-EKS-addon.md#install-CloudWatch-Observability-EKS-addon-configuration).

**To install Container Insights with enhanced observability on a RedHat OpenShift on AWS (ROSA) cluster**

1. If necessary, install Helm. For more information, see [Quickstart Guide](https://helm.sh/docs/intro/quickstart/) in the Helm documentation.

1. Install the CloudWatch agent operator by entering the following commands. Replace *my-cluster-name* with the name of your cluster, and replace *my-cluster-region* with the Region that the cluster runs in.

   ```
   helm repo add aws-observability https://aws-observability.github.io/helm-charts
   helm repo update aws-observability
   helm install --wait --create-namespace \
       --namespace amazon-cloudwatch amazon-cloudwatch-observability \
       aws-observability/amazon-cloudwatch-observability \
       --set clusterName=my-cluster-name \
       --set region=my-cluster-region \
       --set k8sMode=ROSA
   ```

1. Set up authorization for the agent operator by following the steps in Option 1, Option 2, or Option 3 in [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md).

# Viewing Container Insights metrics
<a name="Container-Insights-view-metrics"></a>

After you have Container Insights set up and it is collecting metrics, you can view those metrics in the CloudWatch console.

For Container Insights metrics to appear on your dashboard, you must complete the Container Insights setup. For more information, see [Setting up Container Insights](deploy-container-insights.md).

This procedure explains how to view the metrics that Container Insights automatically generates from the collected log data. The rest of this section explains how to further dive into your data and use CloudWatch Logs Insights to see more metrics at more levels of granularity.

**To view Container Insights metrics**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Insights**, **Container Insights**.

1. Use the drop-down boxes near the top to select the type of resource to view, as well as the specific resource.

You can set a CloudWatch alarm on any metric that Container Insights collects. For more information, see [Using Amazon CloudWatch alarms](CloudWatch_Alarms.md).

**Note**  
If you have already set up CloudWatch Application Insights to monitor your containerized applications, the Application Insights dashboard appears below the Container Insights dashboard. If you have not already enabled Application Insights, you can do so by choosing **Auto-configure Application Insights** below the performance view in the Container Insights dashboard.  
For more information about Application Insights and containerized applications, see [Enable Application Insights for Amazon ECS and Amazon EKS resource monitoring](appinsights-setting-up-console.md#appinsights-container-insights).

## Viewing the top contributors
<a name="Container-Insights-view-metrics-topn"></a>

For some of the views in Container Insights performance monitoring, you can also see the top contributors by memory or CPU, or the most recently active resources. This is available when you select any of the following dashboards in the drop-down box near the top of the page:
+ ECS Services
+ ECS Tasks
+ EKS Namespaces
+ EKS Services
+ EKS Pods

When you are viewing one of these types of resources, the bottom of the page displays a table sorted initially by CPU usage. You can change it to sort by memory usage or recent activity. To see more about one of the rows in the table, select the check box next to that row, choose **Actions**, and then choose one of the options in the menu.

## Using CloudWatch Logs Insights to view Container Insights data
<a name="Container-Insights-CloudWatch-Logs-Insights"></a>

Container Insights collects metrics by using performance log events with [embedded metric format](CloudWatch_Embedded_Metric_Format.md). The logs are stored in CloudWatch Logs. CloudWatch generates several metrics automatically from the logs, which you can view in the CloudWatch console. You can also do a deeper analysis of the collected performance data by using CloudWatch Logs Insights queries.

For more information about CloudWatch Logs Insights, see [Analyze Log Data with CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html). For more information about the log fields you can use in queries, see [Container Insights performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-logs-EKS.md).

**To use CloudWatch Logs Insights to query your container metric data**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Logs**, **Logs Insights**.

   Near the top of the screen is the query editor. When you first open CloudWatch Logs Insights, this box contains a default query that returns the 20 most recent log events.

1. In the box above the query editor, select one of the Container Insights log groups to query. For the following example queries to work, the log group name must end with **performance**.

   When you select a log group, CloudWatch Logs Insights automatically detects fields in the data in the log group and displays them in **Discovered fields** in the right pane. It also displays a bar graph of log events in this log group over time. This bar graph shows the distribution of events in the log group that match your query and time range, not only the events displayed in the table.

1. In the query editor, replace the default query with the following query and choose **Run query**.

   ```
   STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName
   | SORT avg_node_cpu_utilization DESC
   ```

   This query shows a list of nodes, sorted by average node CPU utilization.

1. To try another example, replace that query with another query and choose **Run query**. More sample queries are listed later on this page.

   ```
   STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
   | SORT avg_number_of_container_restarts DESC
   ```

   This query displays a list of your pods, sorted by average number of container restarts.

1. If you want to try another query, you can include fields from the list at the right of the screen. For more information about query syntax, see [CloudWatch Logs Insights Query Syntax](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html).

**To see lists of your resources**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Resources**.

1. The default view is a list of your resources being monitored by Container Insights, and alarms that you have set on these resources. To see a visual map of the resources, choose **Map view**.

1. From the map view, you can pause your pointer over any resource in the map to see basic metrics about that resource. You can choose any resource to see more detailed graphs about the resource.

## Use case: Seeing task-level metrics in Amazon ECS containers
<a name="Container-Insights-CloudWatch-Logs-Insights-example"></a>

The following example illustrates how to use CloudWatch Logs Insights to dive deeper into your Container Insights logs. For more examples, see the blog [Introducing Amazon CloudWatch Container Insights for Amazon ECS](https://aws.amazon.com/blogs/mt/introducing-container-insights-for-amazon-ecs/).

Container Insights does not automatically generate metrics at the task level of granularity. The following query displays task-level metrics for CPU and memory usage.

```
stats avg(CpuUtilized) as CPU, avg(MemoryUtilized) as Mem by TaskId, ContainerName
| sort Mem, CPU desc
```

## Other sample queries for Container Insights
<a name="Container-Insights-sample-queries"></a>

**List of your pods, sorted by average number of container restarts**

```
STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName
| SORT avg_number_of_container_restarts DESC
```

**Pods requested vs. pods running**

```
fields @timestamp, @message 
| sort @timestamp desc 
| filter Type="Pod" 
| stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name 
| sort pods_missing desc
```

**Count of cluster node failures**

```
stats avg(cluster_failed_node_count) as CountOfNodeFailures 
| filter Type="Cluster" 
| sort @timestamp desc
```

**Application log errors by container name**

```
stats count() as countoferrors by kubernetes.container_name 
| filter stream="stderr" 
| sort countoferrors desc
```

# Metrics collected by Container Insights
<a name="Container-Insights-metrics"></a>

Container Insights collects one set of metrics for Amazon ECS and AWS Fargate on Amazon ECS, and a different set for Amazon EKS, AWS Fargate on Amazon EKS, RedHat OpenShift on AWS (ROSA), and Kubernetes.

Metrics are not visible until the container tasks have been running for some time.

**Topics**
+ [Amazon ECS Container Insights with enhanced observability metrics](Container-Insights-enhanced-observability-metrics-ECS.md)
+ [Amazon ECS Container Insights metrics](Container-Insights-metrics-ECS.md)
+ [Amazon EKS and Kubernetes Container Insights with enhanced observability metrics](Container-Insights-metrics-enhanced-EKS.md)
+ [Amazon EKS and Kubernetes Container Insights metrics](Container-Insights-metrics-EKS.md)
+ [Container Insights performance log reference](Container-Insights-reference.md)
+ [Container Insights Prometheus metrics monitoring](ContainerInsights-Prometheus.md)
+ [Integration with Application Insights](container-insights-appinsights.md)
+ [Viewing Amazon ECS lifecycle events within Container Insights](container-insights-ECS-lifecycle-events.md)
+ [Troubleshooting Container Insights](ContainerInsights-troubleshooting.md)
+ [Building your own CloudWatch agent Docker image](ContainerInsights-build-docker-image.md)
+ [Deploying other CloudWatch agent features in your containers](ContainerInsights-other-agent-features.md)

# Amazon ECS Container Insights with enhanced observability metrics
<a name="Container-Insights-enhanced-observability-metrics-ECS"></a>

Container Insights with enhanced observability provides deeper visibility into containerized workloads by offering:
+ Higher metrics granularity at both task and container levels
+ Improved monitoring and troubleshooting capabilities
+ Integration with CloudWatch Logs for:
  + Correlating metrics anomalies with log entries
  + Performing faster root cause analysis
  + Reducing resolution time for complex container issues

**Use cases**

Container Insights with enhanced observability extends the capabilities of standard Container Insights. It enables the following use cases:
+ **Task-level troubleshooting** – Identify performance bottlenecks at the task level. Analyze task-level metrics and compare them with reserved resources to determine if tasks have sufficient processing capacity 
+ **Container-level resource optimization** – Track utilization against reservation levels to identify containers that are either resource-constrained or over-provisioned 
+ **Container health assessment** – Monitor restart counts and state transitions to detect unstable containers requiring intervention 
+ **Application performance monitoring** – Track how applications communicate with each other, monitor resource usage patterns, and optimize data storage performance
+ **Operational monitoring** – Monitor deployments, track task sets for blue or green deployments, and maintain platform health through service metrics

For more information about Amazon ECS metrics, see [Amazon ECS service utilization metrics use cases](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service_utilization-metrics-explanation.html). For information about Container Insights with enhanced observability, see [Amazon ECS Container Insights with enhanced observability metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-enhanced-observability-metrics-ECS.html).

Container Insights also shows cluster, service, and daemon-wide statistics by averaging data across all tasks. This provides a higher-level view of your service and daemon health, assisting in both environment monitoring and capacity planning.

**Note**  
Amazon ECS Managed Daemon metrics use the same `ECS/ContainerInsights` namespace and the same `ServiceName` dimension as service metrics. For daemon metrics, the `ServiceName` dimension value uses the format `daemon:daemon-name`. For example, a daemon named `my-daemon` has a `ServiceName` dimension value of `daemon:my-daemon`. All metrics in the table below that include the `ServiceName` dimension also apply to Managed Daemons.
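The daemon naming convention above can be captured in a small helper. This is an illustrative sketch, not part of any AWS SDK:

```python
def daemon_service_name(daemon_name: str) -> str:
    """Return the ServiceName dimension value for an ECS Managed Daemon."""
    return f"daemon:{daemon_name}"

def is_daemon_metric(service_name: str) -> bool:
    """True when a ServiceName dimension value refers to a Managed Daemon."""
    return service_name.startswith("daemon:")
```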

The following table lists the metrics and dimensions that Container Insights with enhanced observability collects for Amazon ECS. These metrics are in the `ECS/ContainerInsights` namespace. For more information, see [Metrics](cloudwatch_concepts.md#Metric).

If you do not see any Container Insights metrics in your console, be sure that you have completed the setup of Container Insights with enhanced observability. Metrics do not appear before Container Insights with enhanced observability has been set up completely. For more information, see [Set up Container Insights with enhanced observability](deploy-container-insights-ECS-cluster.md#set-container-insights-ECS-cluster-enhanced).

The following metrics are available for all launch types.


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `ContainerInstanceCount`  |  `ClusterName`  |  The number of EC2 instances running the Amazon ECS agent that are registered with a cluster. This metric is collected only for container instances that are running Amazon ECS tasks in the cluster. It is not collected for empty container instances that do not have any Amazon ECS tasks. Unit: Count  | 
|  `ContainerCpuUtilized`  |  `ClusterName` `ContainerName`, `TaskId`, `ServiceName`, `ClusterName` `ContainerName`, `TaskDefinitionFamily`, `ClusterName`, `TaskId` `TaskDefinitionFamily`, `ClusterName`, `ContainerName` `ServiceName`, `ClusterName`, `ContainerName`  |  The CPU units used by containers in the resource that is specified by the dimension set that you're using. Also applies to Managed Daemons. Unit: None  | 
|  `ContainerCpuReserved`  |  `ClusterName` `ContainerName`, `TaskId`, `ServiceName`, `ClusterName` `ContainerName`, `TaskDefinitionFamily`, `ClusterName`, `TaskId` `TaskDefinitionFamily`, `ClusterName`, `ContainerName` `ServiceName`, `ClusterName`, `ContainerName`  |  The CPU units reserved by containers in the resource that is specified by the dimension set that you're using. This metric is collected based on the CPU reservation defined in the task definition, for example, at the task or all containers level. If this is not specified in the task definition, then the instance CPU reservation is used. Also applies to Managed Daemons. Unit: None  | 
|  `ContainerCpuUtilization`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`   | The total percentage of CPU units being used by containers in the resource that is specified by the dimension set that you're using. Unit: Percent | 
|  `ContainerMemoryUtilized`  |  `ClusterName` `ContainerName`, `TaskId`, `ServiceName`, `ClusterName` `ContainerName`, `TaskDefinitionFamily`, `ClusterName`, `TaskId` `TaskDefinitionFamily`, `ClusterName`, `ContainerName` `ServiceName`, `ClusterName`, `ContainerName`  |  The memory being used by containers in the resource that is specified by the dimension set that you're using. Also applies to Managed Daemons. Unit: Megabytes  | 
|  `ContainerMemoryReserved`  |  `ClusterName` `ContainerName`, `TaskId`, `ServiceName`, `ClusterName` `ContainerName`, `TaskDefinitionFamily`, `ClusterName`, `TaskId` `TaskDefinitionFamily`, `ClusterName`, `ContainerName` `ServiceName`, `ClusterName`, `ContainerName`  |  The memory that is reserved by containers in the resource that is specified by the dimension set that you're using.  This metric is collected based on the memory reservation defined in the task definition, for example, at the task or all containers level. If this is not specified in the task definition, then the instance memory reservation is used. Also applies to Managed Daemons. Unit: Megabytes  | 
|  `ContainerMemoryUtilization`  |  `ClusterName` `ContainerName`, `TaskId`, `ServiceName`, `ClusterName` `ContainerName`, `TaskDefinitionFamily`, `ClusterName`, `TaskId` `TaskDefinitionFamily`, `ClusterName`, `ContainerName` `ServiceName`, `ClusterName`, `ContainerName`  | The total percentage of memory being used by containers in the resource that is specified by the dimension set that you're using. Also applies to Managed Daemons. Unit: Percent | 
|  `ContainerNetworkRxBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes received by the container that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. This metric is available only for containers in tasks using the `awsvpc` or `bridge` network modes. Also applies to Managed Daemons. Unit: Bytes/Second  | 
|  `ContainerNetworkTxBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes transmitted by the container that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. This metric is available only for containers in tasks using the `awsvpc` or `bridge` network modes. Also applies to Managed Daemons. Unit: Bytes/Second  | 
|  `ContainerStorageReadBytes`  |  `ClusterName` `ClusterName`, `ServiceName`, `ContainerName` `ClusterName`, `TaskDefinitionFamily`, `ContainerName` `ClusterName`, `ServiceName`, `TaskId`, `ContainerName` `ClusterName`, `TaskDefinitionFamily`, `TaskId`, `ContainerName`  |  The number of bytes read from storage on the container in the resource that is specified by the dimensions that you're using. This does not include read bytes for your storage devices. This metric is obtained from the Docker runtime. Also applies to Managed Daemons. Unit: Bytes  | 
|  `ContainerStorageWriteBytes`  |  `ClusterName` `ClusterName`, `ServiceName`, `ContainerName` `ClusterName`, `TaskDefinitionFamily`, `ContainerName` `ClusterName`, `ServiceName`, `TaskId`, `ContainerName` `ClusterName`, `TaskDefinitionFamily`, `TaskId`, `ContainerName`  |  The number of bytes written to storage in the container that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. Also applies to Managed Daemons. Unit: Bytes  | 
|  `CpuUtilized`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`   |  The CPU units used by tasks in the resource that is specified by the dimension set that you're using. Also applies to Managed Daemons. Unit: None  | 
|  `CpuReserved`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is collected based on the CPU reservation defined in the task definition, for example, at the task or all containers level. If this is not specified in the task definition, then the instance CPU reservation is used. Also applies to Managed Daemons. Unit: None  | 
|  `DeploymentCount`  |  `ServiceName`, `ClusterName`  |  The number of deployments in an Amazon ECS service. Unit: Count  | 
|  `DesiredTaskCount`  |  `ServiceName`, `ClusterName`  |  The desired number of tasks for an Amazon ECS service. Unit: Count  | 
|  `EBSFilesystemSize`  |  `ClusterName`, `TaskDefinitionFamily`, `VolumeName` `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName`  |  The total amount, in gigabytes (GB), of Amazon EBS filesystem storage that is allocated to the resources specified by the dimensions you're using. This metric is only available for tasks that run on Amazon ECS infrastructure running on Fargate using platform version `1.4.0` or Amazon EC2 instances using container agent version `1.79.0` or later. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `EBSFilesystemUtilized`  |  `ClusterName`, `TaskDefinitionFamily`, `VolumeName` `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName`  |  The total amount, in gigabytes (GB), of Amazon EBS filesystem storage that is being used by the resources specified by the dimensions that you're using. This metric is only available for tasks that run on Amazon ECS infrastructure running on Fargate using platform version `1.4.0` or Amazon EC2 instances using container agent version `1.79.0` or later. For tasks run on Fargate, Fargate reserves space on the disk that is only used by Fargate. There is no cost associated with the space Fargate uses, but you will see this additional storage using tools like `df`. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `TaskEBSFilesystemUtilization`  |  `TaskDefinitionFamily`, `ClusterName` `ClusterName`, `ServiceName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `TaskDefinitionFamily`, `ClusterName`, `TaskId`  |  The percentage of Amazon EBS filesystem storage that is being used by the task specified by the dimensions that you're using. This metric is only available for tasks that run on Amazon ECS infrastructure running on Fargate using platform version `1.4.0` or Amazon EC2 instances using container agent version `1.79.0` or later. Also applies to Managed Daemons. Unit: Percent  | 
|  `EphemeralStorageReserved` [1](#ci-enhanced-metrics-ecs-storage-fargate-note)  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `EphemeralStorageUtilized` [1](#ci-enhanced-metrics-ecs-storage-fargate-note)  |  `ClusterName` `ClusterName`, `TaskDefinitionFamily` `ClusterName`, `ServiceName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `MemoryUtilized`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The memory being used by tasks in the resource that is specified by the dimension set that you're using. Also applies to Managed Daemons. Unit: Megabytes  | 
|  `MemoryReserved`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using.  This metric is collected based on the memory reservation defined in the task definition, for example, at the task or all containers level. If this is not specified in the task definition, then the instance memory reservation is used. Also applies to Managed Daemons. Unit: Megabytes  | 
|  `NetworkRxBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. This metric is available only for containers in tasks using the `awsvpc` or `bridge` network modes. Also applies to Managed Daemons. Unit: Bytes/Second  | 
|  `NetworkTxBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. This metric is available only for containers in tasks using the `awsvpc` or `bridge` network modes. Also applies to Managed Daemons. Unit: Bytes/Second  | 
|  `PendingTaskCount`  |  `ServiceName`, `ClusterName`  |  The number of tasks currently in the `PENDING` state. Unit: Count  | 
|  `RunningTaskCount`  |  `ServiceName`, `ClusterName`  |  The number of tasks currently in the `RUNNING` state. Unit: Count  | 
|  `RestartCount`  |  `ClusterName` `ClusterName`, `ServiceName` `ClusterName`, `TaskDefinitionFamily` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId` `ClusterName`, `ServiceName`, `ContainerName` `ClusterName`, `ServiceName`, `TaskId`, `ContainerName` `TaskDefinitionFamily`, `ClusterName`, `ContainerName` `TaskDefinitionFamily`, `ClusterName`, `TaskId`, `ContainerName`  |  The number of times a container in an Amazon ECS task has been restarted. This metric is collected only for containers that have a restart policy enabled. Also applies to Managed Daemons. Unit: Count  | 
|  `UnHealthyContainerHealthStatus`  |  `ClusterName` `ClusterName`, `ServiceName`, `ContainerName` `ClusterName`, `TaskDefinitionFamily`, `ContainerName` `ClusterName`, `ServiceName`, `TaskId`, `ContainerName` `ClusterName`, `TaskDefinitionFamily`, `TaskId`, `ContainerName`  |  The number of unhealthy containers based on container health check status. A container is considered unhealthy when its health check returns an unhealthy status. This metric is collected only for containers that have a health check configured in the task definition. The metric value is 1 when the container health status is `UNHEALTHY`, and 0 when the health status is `HEALTHY`. Unit: Count  | 
|  `ServiceCount`  |  `ClusterName`  |  The number of services in the cluster. Unit: Count  | 
|  `StorageReadBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes read from storage on the instance in the resource that is specified by the dimensions that you're using. This does not include read bytes for your storage devices. This metric is obtained from the Docker runtime. Also applies to Managed Daemons. Unit: Bytes  | 
|  `StorageWriteBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`  |  The number of bytes written to storage in the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. Also applies to Managed Daemons. Unit: Bytes  | 
|  `TaskCount`  |  `ClusterName`  |  The number of tasks running in the cluster. Unit: Count  | 
|  `TaskCpuUtilization`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`   |  The total percentage of CPU units being used by a task.  Also applies to Managed Daemons. Unit: Percent  | 
|  `TaskEphemeralStorageUtilization`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`   |  The total percentage of ephemeral storage being used by a task.  Also applies to Managed Daemons. Unit: Percent  | 
|  `TaskMemoryUtilization`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName` `ClusterName`, `ServiceName`, `TaskId` `ClusterName`, `TaskDefinitionFamily`, `TaskId`   |  The total percentage of memory being used by a task.  Also applies to Managed Daemons. Unit: Percent  | 
|  `TaskSetCount`  |  `ServiceName`, `ClusterName`  |  The number of task sets in the service. Unit: Count  | 

**Note**  
The `EphemeralStorageReserved` and `EphemeralStorageUtilized` metrics are only available for tasks that run on Fargate Linux platform version 1.4.0 or later.  
Fargate reserves space on disk. It is only used by Fargate. You aren't billed for it. It isn't shown in these metrics. However, you can see this additional storage in other tools such as `df`.
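Metrics from the table above can be retrieved with the CloudWatch `GetMetricData` API. The sketch below only builds one `MetricDataQueries` entry for `CpuUtilized` at the service level in the `ECS/ContainerInsights` namespace; the cluster and service names are placeholders, and running the query requires boto3 and AWS credentials.

```python
def build_metric_query(metric_name, cluster, service, stat="Average", period=60):
    """Build one MetricDataQueries entry for an ECS/ContainerInsights metric."""
    return {
        "Id": metric_name.lower(),  # Id must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "ECS/ContainerInsights",
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": "ClusterName", "Value": cluster},
                    {"Name": "ServiceName", "Value": service},
                ],
            },
            "Period": period,
            "Stat": stat,
        },
    }

# Placeholder cluster and service names.
query = build_metric_query("CpuUtilized", "my-cluster", "my-service")
# With credentials configured, you would then run:
#   cw = boto3.client("cloudwatch")
#   cw.get_metric_data(MetricDataQueries=[query], StartTime=..., EndTime=...)
```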

The following metrics are available when you complete the steps in [Deploying the CloudWatch agent to collect EC2 instance-level metrics on Amazon ECS](deploy-container-insights-ECS-instancelevel.md) and use the EC2 launch type.


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `instance_cpu_limit`  |  `ClusterName`  |  The maximum number of CPU units that can be assigned to a single EC2 instance in the cluster. Unit: None  | 
|  `instance_cpu_reserved_capacity`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The percentage of CPU currently being reserved on a single EC2 instance in the cluster. Unit: Percent  | 
|  `instance_cpu_usage_total`  |  `ClusterName`  |  The number of CPU units being used on a single EC2 instance in the cluster. Unit: None  | 
|  `instance_cpu_utilization`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The total percentage of CPU units being used on a single EC2 instance in the cluster.  Unit: Percent  | 
|  `instance_filesystem_utilization`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The total percentage of file system capacity being used on a single EC2 instance in the cluster.  Unit: Percent  | 
|  `instance_memory_limit`  |  `ClusterName`  |  The maximum amount of memory, in bytes, that can be assigned to a single EC2 instance in this cluster.  Unit: Bytes  | 
|  `instance_memory_reserved_capacity`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The percentage of Memory currently being reserved on a single EC2 instance in the cluster. Unit: Percent  | 
|  `instance_memory_utilization`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The total percentage of memory being used on a single EC2 instance in the cluster.  If you're using the Java ZGC garbage collector for your application, this metric might be inaccurate.  Unit: Percent  | 
|  `instance_memory_working_set`  |  `ClusterName`  |  The amount of memory, in bytes, being used on a single EC2 instance in the cluster.  If you're using the Java ZGC garbage collector for your application, this metric might be inaccurate.  Unit: Bytes  | 
|  `instance_network_total_bytes`  |  `ClusterName`  |  The total number of bytes per second transmitted and received over the network on a single EC2 instance in the cluster. Unit: Bytes/second  | 
|  `instance_number_of_running_tasks`  |  `ClusterName`  |  The number of running tasks on a single EC2 instance in the cluster. Unit: Count  | 
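As an illustration of how the reserved-capacity percentages relate to the raw values: `instance_cpu_reserved_capacity` is, in effect, the CPU units reserved by tasks on an instance divided by `instance_cpu_limit`, expressed as a percentage. A minimal sketch of that arithmetic, with made-up numbers:

```python
def reserved_capacity_percent(reserved_units, limit_units):
    """Percentage of an instance's capacity currently reserved by tasks."""
    return 100.0 * reserved_units / limit_units

# Hypothetical instance: 2048 CPU units (2 vCPUs), 512 units reserved by tasks.
pct = reserved_capacity_percent(512, 2048)  # 25.0
```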

# Amazon ECS Container Insights metrics
<a name="Container-Insights-metrics-ECS"></a>

Container Insights provides additional network, storage, and ephemeral storage metrics beyond the standard Amazon ECS metrics. Container Insights also integrates with CloudWatch Logs, so you can correlate metric changes with log entries for easier troubleshooting. In addition, Container Insights shows cluster, service, and daemon-wide statistics by averaging data across all tasks. This provides a higher-level view of your service and daemon health, assisting in both environment monitoring and capacity planning.

**Use cases**
+ **Problem identification and troubleshooting** – Track failed deployments by analyzing task state transition patterns, enabling rapid identification of failure points. Diagnose configuration issues through comprehensive examination of task startup sequences and initialization behaviors
+ **Cluster- and service-level health assessment** – Shows average task performance across the cluster. This approach moderates outliers to deliver a more stable view of cluster and service health. Use these insights for general service monitoring where extreme values could be misleading 
+ **Service availability issues** – Detect deployment failures by monitoring running task count metrics. Correlate service event logs with performance metrics to understand infrastructure impacts. Track task restart patterns to identify unstable services or infrastructure issues
+ **Capacity planning for average load** – Determine resource requirements based on typical task behavior patterns. Consistent metrics support effective long-term planning and reduce the impact of short-lived spikes on capacity decisions
+ **Additional metrics** – Collects network, storage, and ephemeral storage metrics that are not available in vended metrics
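For the service availability use case, one common pattern is to alarm on `RunningTaskCount` dropping below the expected task count. The sketch below only builds the parameters for CloudWatch's `PutMetricAlarm` API; the alarm name format, threshold, and evaluation settings are placeholder assumptions, and creating the alarm requires boto3 and AWS credentials.

```python
def build_running_task_alarm(cluster, service, min_tasks):
    """Build kwargs for cloudwatch.put_metric_alarm() on RunningTaskCount."""
    return {
        "AlarmName": f"{cluster}-{service}-running-tasks-low",
        "Namespace": "ECS/ContainerInsights",
        "MetricName": "RunningTaskCount",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 3,       # alarm after 3 consecutive breaches
        "Threshold": min_tasks,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # missing data suggests no tasks reporting
    }

# Placeholder cluster/service names; alarm when fewer than 2 tasks are running.
alarm = build_running_task_alarm("my-cluster", "my-service", min_tasks=2)
# With credentials configured, you would then run:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```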

For more information about Amazon ECS metrics, see [Amazon ECS service utilization metrics use cases](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service_utilization-metrics-explanation.html). For information about Container Insights with enhanced observability, see [Amazon ECS Container Insights with enhanced observability metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-enhanced-observability-metrics-ECS.html).

**Note**  
Amazon ECS Managed Daemon metrics use the same `ECS/ContainerInsights` namespace and the same `ServiceName` dimension as service metrics. For daemon metrics, the `ServiceName` dimension value uses the format `daemon:daemon-name`. For example, a daemon named `my-daemon` has a `ServiceName` dimension value of `daemon:my-daemon`. All metrics in the table below that include the `ServiceName` dimension also apply to Managed Daemons.

The following table lists the metrics and dimensions that Container Insights collects for Amazon ECS. These metrics are in the `ECS/ContainerInsights` namespace. For more information, see [Metrics](cloudwatch_concepts.md#Metric).

If you do not see any Container Insights metrics in your console, be sure that you have completed the setup of Container Insights. Metrics do not appear before Container Insights has been set up completely. For more information, see [Setting up Container Insights](deploy-container-insights.md).

The following metrics are available when you complete the steps in [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS-cluster.md).


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `ContainerInstanceCount`  |  `ClusterName`  |  The number of EC2 instances running the Amazon ECS agent that are registered with a cluster. This metric is collected only for container instances that are running Amazon ECS tasks in the cluster. It is not collected for empty container instances that do not have any Amazon ECS tasks. Unit: Count  | 
|  `CpuUtilized`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The CPU units used by tasks in the resource that is specified by the dimension set that you're using. Also applies to Managed Daemons. Unit: None  | 
|  `CpuReserved`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The CPU units reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is collected based on the CPU reservation defined in the task definition, for example, at the task or all containers level. If this is not specified in the task definition, then the instance CPU reservation is used. Also applies to Managed Daemons. Unit: None  | 
|  `DeploymentCount`  |  `ServiceName`, `ClusterName`  |  The number of deployments in an Amazon ECS service. Unit: Count  | 
|  `DesiredTaskCount`  |  `ServiceName`, `ClusterName`  |  The desired number of tasks for an Amazon ECS service. Unit: Count  | 
|  `EBSFilesystemSize`  |  `VolumeName`, `TaskDefinitionFamily`, `ClusterName` `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName`  |  The total amount, in gigabytes (GB), of Amazon EBS filesystem storage that is allocated to the resources specified by the dimensions you're using. This metric is only available for tasks that run on Amazon ECS infrastructure running on Fargate using platform version `1.4.0` or Amazon EC2 instances using container agent version `1.79.0` or later. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `EBSFilesystemUtilized`  |  `VolumeName`, `TaskDefinitionFamily`, `ClusterName` `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName`  |  The total amount, in gigabytes (GB), of Amazon EBS filesystem storage that is being used by the resources specified by the dimensions that you're using. This metric is only available for tasks that run on Amazon ECS infrastructure running on Fargate using platform version `1.4.0` or Amazon EC2 instances using container agent version `1.79.0` or later. For tasks run on Fargate, Fargate reserves space on the disk that is only used by Fargate. There is no cost associated with the space Fargate uses, but you will see this additional storage using tools like `df`. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `EphemeralStorageReserved` [1](#ci-metrics-ecs-storage-fargate-note)  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The number of bytes reserved from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `EphemeralStorageUtilized` [1](#ci-metrics-ecs-storage-fargate-note)  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The number of bytes used from ephemeral storage in the resource that is specified by the dimensions that you're using. Ephemeral storage is used for the container root filesystem and any bind mount host volumes defined in the container image and task definition. The amount of ephemeral storage can’t be changed in a running task. This metric is only available for tasks that run on Fargate Linux platform version 1.4.0 or later. Also applies to Managed Daemons. Unit: Gigabytes (GB)  | 
|  `InstanceOSFilesystemUtilization`  |  `CapacityProviderName`, `ClusterName`, `ContainerInstanceId`, `EC2InstanceId` `ClusterName`  |  The percentage of total disk space that is used for OS volume.  | 
|  `InstanceDataFilesystemUtilization`  |  `CapacityProviderName`, `ClusterName`, `ContainerInstanceId`, `EC2InstanceId` `ClusterName`  |  The percentage of total disk space that is used for data volume.  | 
|  `MemoryUtilized`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The memory being used by tasks in the resource that is specified by the dimension set that you're using.  If you're using the Java ZGC garbage collector for your application, this metric might be inaccurate. Although `MemoryUtilized` and `MemoryReserved` are marked as "Megabytes", the actual units are in MiB (Mebibytes).  Also applies to Managed Daemons. Unit: Megabytes  | 
|  `MemoryReserved`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using. This metric is collected based on the memory reservation defined in the task definition, for example, at the task or all containers level. If this is not specified in the task definition, then the instance memory reservation is used. Also applies to Managed Daemons. Unit: Megabytes  Although `MemoryUtilized` and `MemoryReserved` are marked as "Megabytes", the actual units are in MiB (Mebibytes).   | 
|  `NetworkRxBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The number of bytes received by the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. This metric is available only for containers in tasks using the `awsvpc` or `bridge` network modes. Also applies to Managed Daemons. Unit: Bytes/Second  | 
|  `NetworkTxBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The number of bytes transmitted by the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. This metric is available only for containers in tasks using the `awsvpc` or `bridge` network modes. Also applies to Managed Daemons. Unit: Bytes/Second  | 
|  `PendingTaskCount`  |  `ServiceName`, `ClusterName`  |  The number of tasks currently in the `PENDING` state. Unit: Count  | 
|  `RunningTaskCount`  |  `ServiceName`, `ClusterName`  |  The number of tasks currently in the `RUNNING` state. Unit: Count  | 
|  `RestartCount`  |  `ClusterName` `ClusterName`, `ServiceName` `ClusterName`, `TaskDefinitionFamily`  |  The number of times a container in an Amazon ECS task has been restarted. This metric is collected only for containers that have a restart policy enabled. Also applies to Managed Daemons. Unit: Count  | 
|  `ServiceCount`  |  `ClusterName`  |  The number of services in the cluster. Unit: Count  | 
|  `StorageReadBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The number of bytes read from storage on the instance in the resource that is specified by the dimensions that you're using. This does not include read bytes for your storage devices. This metric is obtained from the Docker runtime. Also applies to Managed Daemons. Unit: Bytes  | 
|  `StorageWriteBytes`  |  `TaskDefinitionFamily`, `ClusterName` `ServiceName`, `ClusterName` `ClusterName`  |  The number of bytes written to storage in the resource that is specified by the dimensions that you're using. This metric is obtained from the Docker runtime. Also applies to Managed Daemons. Unit: Bytes  | 
|  `TaskCount`  |  `ClusterName`  |  The number of tasks running in the cluster. Unit: Count  | 
|  `TaskSetCount`  |  `ServiceName`, `ClusterName`  |  The number of task sets in the service. Unit: Count  | 

**Note**  
The `EphemeralStorageReserved` and `EphemeralStorageUtilized` metrics are only available for tasks that run on Fargate Linux platform version 1.4.0 or later.  
Fargate reserves space on disk that is used only by Fargate. You aren't billed for it, and it isn't shown in these metrics. However, you can see this additional storage in other tools, such as `df`.
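As the notes for `MemoryUtilized` and `MemoryReserved` state, values labeled "Megabytes" are actually mebibytes (MiB). A minimal conversion sketch, useful when comparing reported values against limits expressed in decimal megabytes (the helper names are illustrative, not part of any AWS API):

```python
MIB = 1024 ** 2  # bytes per mebibyte (the actual unit behind the "Megabytes" label)
MB = 1000 ** 2   # bytes per decimal megabyte

def reported_memory_to_bytes(reported_value):
    """Convert a MemoryUtilized/MemoryReserved value (labeled "Megabytes",
    actually MiB) to bytes."""
    return reported_value * MIB

def reported_memory_to_mb(reported_value):
    """Express the same reported value in decimal megabytes."""
    return reported_value * MIB / MB

print(reported_memory_to_bytes(512))          # 536870912
print(round(reported_memory_to_mb(512), 2))   # 536.87
```

A task reporting 512 "Megabytes" is therefore using 512 MiB, roughly 537 MB, a difference that matters when alarm thresholds are set against decimal-megabyte limits.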

The following metrics are available when you complete the steps in [Deploying the CloudWatch agent to collect EC2 instance-level metrics on Amazon ECS](deploy-container-insights-ECS-instancelevel.md).


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `instance_cpu_limit`  |  `ClusterName`  |  The maximum number of CPU units that can be assigned to a single EC2 instance in the cluster. Unit: None  | 
|  `instance_cpu_reserved_capacity`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The percentage of CPU currently being reserved on a single EC2 instance in the cluster. Unit: Percent  | 
|  `instance_cpu_usage_total`  |  `ClusterName`  |  The number of CPU units being used on a single EC2 instance in the cluster. Unit: None  | 
|  `instance_cpu_utilization`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The total percentage of CPU units being used on a single EC2 instance in the cluster. Unit: Percent  | 
|  `instance_filesystem_utilization`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The total percentage of file system capacity being used on a single EC2 instance in the cluster. Unit: Percent  | 
|  `instance_memory_limit`  |  `ClusterName`  |  The maximum amount of memory, in bytes, that can be assigned to a single EC2 instance in this cluster. Unit: Bytes  | 
|  `instance_memory_reserved_capacity`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The percentage of memory currently being reserved on a single EC2 instance in the cluster. Unit: Percent  | 
|  `instance_memory_utilization`  |  `ClusterName` `InstanceId`, `ContainerInstanceId`, `ClusterName`  |  The total percentage of memory being used on a single EC2 instance in the cluster. If you're using the Java ZGC garbage collector for your application, this metric might be inaccurate. Unit: Percent  | 
|  `instance_memory_working_set`  |  `ClusterName`  |  The amount of memory, in bytes, being used on a single EC2 instance in the cluster. If you're using the Java ZGC garbage collector for your application, this metric might be inaccurate. Unit: Bytes  | 
|  `instance_network_total_bytes`  |  `ClusterName`  |  The total number of bytes per second transmitted and received over the network on a single EC2 instance in the cluster. Unit: Bytes/second  | 
|  `instance_number_of_running_tasks`  |  `ClusterName`  |  The number of running tasks on a single EC2 instance in the cluster. Unit: Count  | 
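These instance-level metrics can also be retrieved programmatically with the CloudWatch `GetMetricData` API. The sketch below only builds the query structure; it assumes the `ECS/ContainerInsights` namespace that Container Insights uses for Amazon ECS metrics, and the `boto3` call at the end is shown for illustration only:

```python
def instance_metric_query(metric_name, cluster_name, period=300, stat="Average"):
    """Build one GetMetricData query for an instance-level Container Insights
    metric, aggregated across the cluster via the ClusterName dimension."""
    return {
        "Id": metric_name.lower(),  # query IDs must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "ECS/ContainerInsights",  # assumed Container Insights namespace
                "MetricName": metric_name,
                "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
            },
            "Period": period,
            "Stat": stat,
        },
    }

query = instance_metric_query("instance_cpu_utilization", "my-cluster")
# import boto3, datetime
# cw = boto3.client("cloudwatch")
# resp = cw.get_metric_data(MetricDataQueries=[query],
#                           StartTime=datetime.datetime(2024, 1, 1),
#                           EndTime=datetime.datetime(2024, 1, 2))
```

For per-instance granularity, you would instead pass the `ClusterName`, `InstanceId`, `ContainerInstanceId` dimension set shown in the table.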

# Amazon EKS and Kubernetes Container Insights with enhanced observability metrics
<a name="Container-Insights-metrics-enhanced-EKS"></a>

The following tables list the metrics and dimensions that Container Insights with enhanced observability collects for Amazon EKS and Kubernetes. These metrics are in the `ContainerInsights` namespace. For more information, see [Metrics](cloudwatch_concepts.md#Metric).

If you do not see any Container Insights with enhanced observability metrics in your console, be sure that you have completed the setup of Container Insights with enhanced observability. Metrics do not appear before Container Insights with enhanced observability has been set up completely. For more information, see [Setting up Container Insights](deploy-container-insights.md).

If you are using version 1.5.0 or later of the Amazon EKS add-on or version 1.300035.0 of the CloudWatch agent, most metrics listed in the following table are collected for both Linux and Windows nodes. The **Metric name** column notes which metrics are not collected for Windows.

With the earlier version of Container Insights, which delivers aggregated metrics at the cluster and service level, metrics are charged as custom metrics. With Container Insights with enhanced observability for Amazon EKS, Container Insights metrics are charged per observation instead of being charged per metric stored or log ingested. For more information about CloudWatch pricing, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/). 

**Note**  
On Windows, network metrics such as `pod_network_rx_bytes` and `pod_network_tx_bytes` are not collected for host process containers.  
On RedHat OpenShift on AWS (ROSA) clusters, diskio metrics such as `node_diskio_io_serviced_total` and `node_diskio_io_service_bytes_total` are not collected.


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `cluster_failed_node_count`  |  `ClusterName`  |  The number of failed worker nodes in the cluster. A node is considered failed if it is suffering from any *node conditions*. For more information, see [Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) in the Kubernetes documentation.  | 
|  `cluster_node_count`  |  `ClusterName`  |  The total number of worker nodes in the cluster.  | 
|  `namespace_number_of_running_pods`  |  `Namespace` `ClusterName` `ClusterName`  |  The number of pods running per namespace in the resource that is specified by the dimensions that you're using.  | 
|  `node_cpu_limit`  |  `ClusterName`  `ClusterName`, `InstanceId`, `NodeName`   |  The maximum number of CPU units that can be assigned to a single node in this cluster.  | 
|  `node_cpu_reserved_capacity`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker. Formula: `node_cpu_request / node_cpu_limit`  `node_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_cpu_usage_total`  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The number of CPU units being used on the nodes in the cluster.  | 
|  `node_cpu_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total percentage of CPU units being used on the nodes in the cluster. Formula: `node_cpu_usage_total / node_cpu_limit`  | 
|  `node_filesystem_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total percentage of file system capacity being used on nodes in the cluster. Formula: `node_filesystem_usage / node_filesystem_capacity`  `node_filesystem_usage` and `node_filesystem_capacity` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_memory_limit`  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster.  | 
|  `node_filesystem_inodes`  It is not available on Windows.  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The total number of inodes (used and unused) on a node.  | 
|  `node_filesystem_inodes_free` It is not available on Windows.  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The number of unused inodes on a node.  | 
|  `node_gpu_limit` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The total number of GPUs available on the node.  | 
|  `node_gpu_usage_total` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The number of GPUs being used by the running pods on the node.  | 
|  `node_gpu_reserved_capacity` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The number of GPUs reserved by the running pods on the node.  | 
|  `node_memory_reserved_capacity`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of memory currently being used on the nodes in the cluster. Formula: `node_memory_request / node_memory_limit`  `node_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_memory_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of memory currently being used by the node or nodes, calculated as the node memory working set divided by the node memory limit. Formula: `node_memory_working_set / node_memory_limit`.   | 
|  `node_memory_working_set`  |  `ClusterName`  `ClusterName`, `InstanceId`, `NodeName`   |  The amount of memory, in bytes, being used in the working set of the nodes in the cluster.  | 
|  `node_network_total_bytes`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total number of bytes per second transmitted and received over the network per node in a cluster. Formula: `node_network_rx_bytes + node_network_tx_bytes`  `node_network_rx_bytes` and `node_network_tx_bytes` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_number_of_running_containers`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The number of running containers per node in a cluster.  | 
|  `node_number_of_running_pods`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The number of running pods per node in a cluster.  | 
|  `node_status_allocatable_pods`   |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The number of pods that can be assigned to a node based on its allocatable resources, which is defined as the remainder of a node's capacity after accounting for system daemons reservations and hard eviction thresholds.  | 
|  `node_status_capacity_pods`  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The number of pods that can be assigned to a node based on its capacity.  | 
|  `node_status_condition_ready`   |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  Indicates whether the node status condition `Ready` is true for Amazon EC2 nodes.  | 
|  `node_status_condition_memory_pressure`   |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  Indicates whether the node status condition `MemoryPressure` is true.  | 
|  `node_status_condition_pid_pressure`   |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  Indicates whether the node status condition `PIDPressure` is true.  | 
|  `node_status_condition_disk_pressure`   |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  Indicates whether the node status condition `DiskPressure` is true.  | 
|  `node_status_condition_unknown`   |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  Indicates whether any of the node status conditions are Unknown.  | 
|  `node_interface_network_rx_dropped`  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The number of packets which were received and subsequently dropped by a network interface on the node.  | 
|  `node_interface_network_tx_dropped`  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The number of packets which were due to be transmitted but were dropped by a network interface on the node.  | 
|  `node_diskio_io_service_bytes_total`  It is not available on Windows or on ROSA clusters.  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The total number of bytes transferred by all I/O operations on the node.  | 
|  `node_diskio_io_serviced_total` It is not available on Windows or on ROSA clusters.  |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`   |  The total number of I/O operations on the node.  | 
|  `pod_cpu_reserved_capacity`  |  `PodName`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  `ClusterName`, `Namespace`, `Service`   |  The CPU capacity that is reserved per pod in a cluster. Formula: `pod_cpu_request / node_cpu_limit`  `pod_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_cpu_utilization`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`   |  The percentage of CPU units being used by pods. Formula: `pod_cpu_usage_total / node_cpu_limit`  | 
|  `pod_cpu_utilization_over_pod_limit`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`   |  The percentage of CPU units being used by pods relative to the pod limit. Formula: `pod_cpu_usage_total / pod_cpu_limit`  | 
|  `pod_memory_reserved_capacity`  |  `PodName`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  `ClusterName`, `Namespace`, `Service`   |  The percentage of memory that is reserved for pods. Formula: `pod_memory_request / node_memory_limit`  `pod_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_memory_utilization`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`   |  The percentage of memory currently being used by the pod or pods. Formula: `pod_memory_working_set / node_memory_limit`  | 
|  `pod_memory_utilization_over_pod_limit`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`   |  The percentage of memory that is being used by pods relative to the pod limit. If any containers in the pod don't have a memory limit defined, this metric doesn't appear. Formula: `pod_memory_working_set / pod_memory_limit`  | 
|  `pod_network_rx_bytes`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`   |  The number of bytes per second being received over the network by the pod. Formula: `sum(pod_interface_network_rx_bytes)`  `pod_interface_network_rx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_network_tx_bytes`  |  `PodName`, `Namespace`, `ClusterName` `Namespace,` `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`   |  The number of bytes per second being transmitted over the network by the pod. Formula: `sum(pod_interface_network_tx_bytes)`  `pod_interface_network_tx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_cpu_request`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  The CPU requests for the pod. Formula: `sum(container_cpu_request)`  `container_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_memory_request`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  The memory requests for the pod. Formula: `sum(container_memory_request)`  `container_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_cpu_limit`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  The CPU limit defined for the containers in the pod. If any containers in the pod don't have a CPU limit defined, this metric doesn't appear.  Formula: `sum(container_cpu_limit)`  `container_cpu_limit` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_memory_limit`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  The memory limit defined for the containers in the pod. If any containers in the pod don't have a memory limit defined, this metric doesn't appear.  Formula: `sum(container_memory_limit)`  `container_memory_limit` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_status_failed`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that all containers in the pod have terminated, and at least one container has terminated with a non-zero status or was terminated by the system.   | 
|  `pod_status_ready`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that all containers in the pod are ready, having reached the condition of `ContainerReady`.   | 
|  `pod_status_running`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that all containers in the pod are running.   | 
|  `pod_status_scheduled`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that the pod has been scheduled to a node.   | 
|  `pod_status_unknown`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that the status of the pod can't be obtained.   | 
|  `pod_status_pending`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that the pod has been accepted by the cluster but one or more of the containers has not become ready yet.   | 
|  `pod_status_succeeded`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Indicates that all containers in the pod have successfully terminated and will not be restarted.   | 
|  `pod_number_of_containers`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers defined in the pod specification.   | 
|  `pod_number_of_running_containers`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are currently in the `Running` state.   | 
|  `pod_container_status_terminated`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are in the `Terminated` state.   | 
|  `pod_container_status_running`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are in the `Running` state.   | 
|  `pod_container_status_waiting`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are in the `Waiting` state.   | 
|  `pod_container_status_waiting_reason_crash_loop_back_off`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are pending because of a `CrashLoopBackOff` error, where a container repeatedly fails to start.  | 
|  `pod_container_status_waiting_reason_create_container_config_error`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are pending with the reason `CreateContainerConfigError`. This is because of an error while creating the container configuration.  | 
|  `pod_container_status_waiting_reason_create_container_error`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are pending with the reason `CreateContainerError` because of an error while creating the container.  | 
|  `pod_container_status_waiting_reason_image_pull_error`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are pending because of `ErrImagePull`, `ImagePullBackOff`, or `InvalidImageName`. These situations are because of an error while pulling the container image.  | 
|  `pod_container_status_waiting_reason_start_error`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  Reports the number of containers in the pod which are pending with the reason being `StartError` because of an error while starting the container.  | 
|  `pod_container_status_terminated_reason_oom_killed`   |  `ContainerName`, `FullPodName`, `PodName`, `Namespace`, `ClusterName` `ContainerName`, `PodName`, `Namespace`, `ClusterName` `ClusterName`  |  Indicates a pod was terminated for exceeding the memory limit. This metric is only displayed when this issue occurs.  | 
|  `pod_interface_network_rx_dropped`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  The number of packets which were received and subsequently dropped by a network interface for the pod.   | 
|  `pod_interface_network_tx_dropped`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  |  The number of packets which were due to be transmitted but were dropped for the pod.   | 
| `pod_memory_working_set` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  | The memory in bytes that is currently being used by a pod. | 
| `pod_cpu_usage_total` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName`  | The number of CPU units used by a pod. | 
|  `container_cpu_utilization`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName`, `ContainerName` `PodName`, `Namespace`, `ClusterName`, `ContainerName`, `FullPodName`  |  The percentage of CPU units being used by the container. Formula: `container_cpu_usage_total / node_cpu_limit`  `container_cpu_usage_total` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `container_cpu_utilization_over_container_limit`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName`, `ContainerName` `PodName`, `Namespace`, `ClusterName`, `ContainerName`, `FullPodName`  |  The percentage of CPU units being used by the container relative to the container limit. If the container doesn't have a CPU limit defined, this metric doesn't appear. Formula: `container_cpu_usage_total / container_cpu_limit`  `container_cpu_usage_total` and `container_cpu_limit` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `container_memory_utilization`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName`, `ContainerName` `PodName`, `Namespace`, `ClusterName`, `ContainerName`, `FullPodName`  |  The percentage of memory units being used by the container. Formula: `container_memory_working_set / node_memory_limit`  `container_memory_working_set` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `container_memory_utilization_over_container_limit`    |  `ClusterName` `PodName`, `Namespace`, `ClusterName`, `ContainerName` `PodName`, `Namespace`, `ClusterName`, `ContainerName`, `FullPodName`  |  The percentage of memory units being used by the container relative to the container limit. If the container doesn't have a memory limit defined, this metric doesn't appear. Formula: `container_memory_working_set / container_memory_limit`  `container_memory_working_set` and `container_memory_limit` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `container_memory_failures_total`  It is not available on Windows.  |  `ClusterName` `PodName`, `Namespace`, `ClusterName`, `ContainerName` `PodName`, `Namespace`, `ClusterName`, `ContainerName`, `FullPodName`  |  The number of memory allocation failures experienced by the container.  | 
|  `pod_number_of_container_restarts`  |  `PodName`, `Namespace`, `ClusterName`  |  The total number of container restarts in a pod.  | 
|  `service_number_of_running_pods`  |  `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of pods running the service or services in the cluster.  | 
|  `replicas_desired`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName`  |  The number of pods desired for a workload as defined in the workload specification.  | 
|  `replicas_ready`   |  `ClusterName` `PodName`, `Namespace`, `ClusterName`  |  The number of pods for a workload that have reached the ready status.  | 
|  `status_replicas_available`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName`  |  The number of pods for a workload which are available. A pod is available when it has been ready for the `minReadySeconds` defined in the workload specification.  | 
|  `status_replicas_unavailable`  |  `ClusterName` `PodName`, `Namespace`, `ClusterName`  |  The number of pods for a workload which are unavailable. A pod is available when it has been ready for the `minReadySeconds` defined in the workload specification. Pods are unavailable if they have not met this criterion.  | 
|  `apiserver_storage_objects`  |  `ClusterName` `ClusterName`, `resource`  |  The number of objects stored in etcd at the time of the last check.  | 
|  `apiserver_storage_db_total_size_in_bytes`  |  `ClusterName` `ClusterName`, `endpoint`  |  Total size of the storage database file physically allocated in bytes. This metric is experimental and might change in future releases of Kubernetes. Unit: Bytes Meaningful statistics: Sum, Average, Minimum, Maximum  | 
|  `apiserver_request_total`  |  `ClusterName` `ClusterName`, `code`, `verb`  |  The total number of API requests to the Kubernetes API server.  | 
|  `apiserver_request_duration_seconds`  |  `ClusterName` `ClusterName`, `verb`  |  Response latency for API requests to the Kubernetes API server.  | 
|  `apiserver_admission_controller_admission_duration_seconds`  |  `ClusterName` `ClusterName`, `operation`  |  Admission controller latency in seconds. An admission controller is code which intercepts requests to the Kubernetes API server.  | 
|  `rest_client_request_duration_seconds`   |  `ClusterName` `ClusterName`, `operation`  |  Response latency experienced by clients calling the Kubernetes API server. This metric is experimental and may change in future releases of Kubernetes.  | 
|  `rest_client_requests_total`   |  `ClusterName` `ClusterName`, `code`, `method`  |  The total number of API requests to the Kubernetes API server made by clients. This metric is experimental and may change in future releases of Kubernetes.  | 
|  `etcd_request_duration_seconds`   |  `ClusterName` `ClusterName`, `operation`  |  Response latency of API calls to etcd. This metric is experimental and may change in future releases of Kubernetes.  | 
|  `apiserver_storage_size_bytes`   |  `ClusterName` `ClusterName`, `endpoint`  |  Size of the storage database file physically allocated in bytes. This metric is experimental and may change in future releases of Kubernetes.  | 
|  `apiserver_longrunning_requests`  |  `ClusterName` `ClusterName`, `resource`  |  The number of active long-running requests to the Kubernetes API server.  | 
|  `apiserver_current_inflight_requests`  |  `ClusterName` `ClusterName`, `request_kind`  |  The number of requests that are being processed by the Kubernetes API server.  | 
|  `apiserver_admission_webhook_admission_duration_seconds`  |  `ClusterName` `ClusterName`, `name`  |  Admission webhook latency in seconds. Admission webhooks are HTTP callbacks that receive admission requests and do something with them.  | 
|  `apiserver_admission_step_admission_duration_seconds`   |  `ClusterName` `ClusterName`, `operation`  |  Admission sub-step latency in seconds.  | 
|  `apiserver_requested_deprecated_apis`   |  `ClusterName` `ClusterName`, `group`  |  Number of requests to deprecated APIs on the Kubernetes API server.  | 
|  `apiserver_request_total_5xx`  |  `ClusterName` `ClusterName`, `code`, `verb`  |  Number of requests to the Kubernetes API server which were responded to with a 5XX HTTP response code.  | 
|  `apiserver_storage_list_duration_seconds`   |  `ClusterName` `ClusterName`, `resource`  |  Response latency of listing objects from etcd. This metric is experimental and may change in future releases of Kubernetes.  | 
|  `apiserver_flowcontrol_request_concurrency_limit`   |  `ClusterName` `ClusterName`, `priority_level`  |  The number of threads used by the currently executing requests in the API Priority and Fairness subsystem.  | 
|  `apiserver_flowcontrol_rejected_requests_total`   |  `ClusterName` `ClusterName`, `reason`  |  Number of requests rejected by API Priority and Fairness subsystem. This metric is experimental and may change in future releases of Kubernetes.  | 
|  `apiserver_current_inqueue_requests`   |  `ClusterName` `ClusterName`, `request_kind`  |  The number of requests queued by the Kubernetes API server. This metric is experimental and may change in future releases of Kubernetes.  | 

## NVIDIA GPU metrics
<a name="Container-Insights-metrics-EKS-GPU"></a>

Beginning with version `1.300034.0` of the CloudWatch agent, Container Insights with enhanced observability for Amazon EKS collects NVIDIA GPU metrics from EKS workloads by default. The CloudWatch agent must be installed using the CloudWatch Observability EKS add-on version `v1.3.0-eksbuild.1` or later. For more information, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md). The NVIDIA GPU metrics that are collected are listed in the table in this section. 

For Container Insights to collect NVIDIA GPU metrics, you must meet the following prerequisites:
+ You must be using Container Insights with enhanced observability for Amazon EKS, with the Amazon CloudWatch Observability EKS add-on version `v1.3.0-eksbuild.1` or later.
+ [The NVIDIA device plugin for Kubernetes](https://github.com/NVIDIA/k8s-device-plugin) must be installed in the cluster.
+ [The NVIDIA container toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) must be installed on the nodes of the cluster. For example, the Amazon EKS optimized accelerated AMIs are built with the necessary components.

You can opt out of collecting NVIDIA GPU metrics by setting the `accelerated_compute_metrics` option in the CloudWatch agent configuration file to `false`. For more information and an example opt-out configuration, see [(Optional) Additional configuration](install-CloudWatch-Observability-EKS-addon.md#install-CloudWatch-Observability-EKS-addon-configuration).
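For example, a minimal opt-out configuration might look like the following. The cluster name is a placeholder; replace it with your own.

```
{
    "logs": {
        "metrics_collected": {
            "kubernetes": {
                "cluster_name": "MyCluster",
                "enhanced_container_insights": true,
                "accelerated_compute_metrics": false
            }
        }
    }
}
```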


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `container_gpu_memory_total` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The total frame buffer size, in bytes, on the GPU(s) allocated to the container.  | 
|  `container_gpu_memory_used` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The bytes of frame buffer used on the GPU(s) allocated to the container.  | 
|  `container_gpu_memory_utilization` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The percentage of frame buffer used of the GPU(s) allocated to the container.  | 
|  `container_gpu_power_draw` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The power usage in watts of the GPU(s) allocated to the container.  | 
|  `container_gpu_temperature` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The temperature in degrees Celsius of the GPU(s) allocated to the container.  | 
|  `container_gpu_utilization` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The percentage utilization of the GPU(s) allocated to the container.  | 
|  `container_gpu_tensor_core_utilization` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The percentage utilization of the tensor cores on the GPU(s) allocated to the container.  | 
|  `node_gpu_memory_total` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The total frame buffer size, in bytes, on the GPU(s) allocated to the node.  | 
|  `node_gpu_memory_used` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The bytes of frame buffer used on the GPU(s) allocated to the node.  | 
|  `node_gpu_memory_utilization` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The percentage of frame buffer used on the GPU(s) allocated to the node.  | 
|  `node_gpu_power_draw` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The power usage in watts of the GPU(s) allocated to the node.  | 
|  `node_gpu_temperature` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The temperature in degrees Celsius of the GPU(s) allocated to the node.  | 
|  `node_gpu_utilization` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The percentage utilization of the GPU(s) allocated to the node.  | 
|  `node_gpu_tensor_core_utilization` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `GpuDevice`  |  The percentage utilization of the tensor cores on the GPU(s) allocated to the node.  | 
|  `pod_gpu_memory_total` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `GpuDevice`  |  The total frame buffer size, in bytes, on the GPU(s) allocated to the pod.  | 
|  `pod_gpu_memory_used` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `GpuDevice`  |  The bytes of frame buffer used on the GPU(s) allocated to the pod.  | 
|  `pod_gpu_memory_utilization` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `GpuDevice`  |  The percentage of frame buffer used of the GPU(s) allocated to the pod.  | 
|  `pod_gpu_power_draw` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `GpuDevice`  |  The power usage in watts of the GPU(s) allocated to the pod.  | 
|  `pod_gpu_temperature` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `GpuDevice`  |  The temperature in degrees Celsius of the GPU(s) allocated to the pod.  | 
|  `pod_gpu_utilization` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The percentage utilization of the GPU(s) allocated to the pod.  | 
|  `pod_gpu_tensor_core_utilization` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `GpuDevice`  |  The percentage utilization of the tensor cores on the GPU(s) allocated to the pod.  | 

### Detailed GPU monitoring
<a name="Container-Insights-detailed-GPU-monitoring"></a>

Beginning with version `1.300062.0` of the CloudWatch agent, Container Insights with enhanced observability for Amazon EKS supports detailed GPU monitoring with sub-minute collection intervals. This addresses monitoring gaps for short-duration machine learning inference workloads that may be completely missed by standard collection intervals. The CloudWatch agent must be installed using the CloudWatch Observability EKS add-on version `v4.7.0-eksbuild.1` or later. For more information, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md).

By default, GPU metrics are collected and ingested at 60-second intervals. With detailed monitoring enabled, the CloudWatch agent collects GPU metrics at sub-minute intervals (minimum 1 second), but metrics are still ingested to CloudWatch at 1-minute intervals. However, you can query statistical aggregations (such as minimum, maximum, and percentiles like p90) of the sub-minute datapoints within each 1-minute period, providing more accurate GPU utilization data and better resource optimization.
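As a sketch of querying those aggregations, the following builds a `GetMetricData` query for the p90 of `node_gpu_utilization` within each 1-minute period. The cluster, node, and instance values are placeholders; pass the resulting list, along with `StartTime` and `EndTime`, to the `get_metric_data` operation of the CloudWatch SDK client (for example, `boto3`).

```python
def gpu_p90_query(cluster_name, node_name, instance_id):
    """Return a MetricDataQueries list requesting the p90 of the
    sub-minute node_gpu_utilization datapoints in each 1-minute period."""
    return [
        {
            "Id": "gpu_p90",
            "MetricStat": {
                "Metric": {
                    "Namespace": "ContainerInsights",
                    "MetricName": "node_gpu_utilization",
                    "Dimensions": [
                        {"Name": "ClusterName", "Value": cluster_name},
                        {"Name": "InstanceId", "Value": instance_id},
                        {"Name": "NodeName", "Value": node_name},
                    ],
                },
                "Period": 60,   # metrics are ingested at 1-minute intervals
                "Stat": "p90",  # p90 of the sub-minute datapoints per period
            },
        }
    ]
```

Swapping `"Stat"` for `"Minimum"` or `"Maximum"` retrieves the other aggregations mentioned above.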

#### Configuration
<a name="Container-Insights-detailed-GPU-monitoring-configuration"></a>

To enable detailed GPU monitoring, update your CloudWatch agent configuration to include the `accelerated_compute_gpu_metrics_collection_interval` parameter in the `kubernetes` section, as in the following example.

```
{
    "logs": {
        "metrics_collected": {
            "kubernetes": {
                "cluster_name": "MyCluster",
                "enhanced_container_insights": true,
                "accelerated_compute_metrics": true,
                "accelerated_compute_gpu_metrics_collection_interval": 1
            }
        }
    }
}
```

The `accelerated_compute_gpu_metrics_collection_interval` parameter accepts values in seconds, with a minimum value of 1 second. Setting it to `1` enables 1-second collection intervals. If this parameter is not specified, the default 60-second interval is used.

For complete configuration instructions, see [Setting up the CloudWatch agent to collect cluster metrics](Container-Insights-setup-metrics.md).

## AWS Neuron metrics for AWS Trainium and AWS Inferentia
<a name="Container-Insights-metrics-EKS-Neuron"></a>

Beginning with version `1.300036.0` of the CloudWatch agent, Container Insights with enhanced observability for Amazon EKS collects accelerated computing metrics from AWS Trainium and AWS Inferentia accelerators by default. The CloudWatch agent must be installed using the CloudWatch Observability EKS add-on version `v1.5.0-eksbuild.1` or later. For more information about the add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md). For more information about AWS Trainium, see [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/). For more information about AWS Inferentia, see [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/).

For Container Insights to collect AWS Neuron metrics, you must meet the following prerequisites:
+ You must be using Container Insights with enhanced observability for Amazon EKS, with the Amazon CloudWatch Observability EKS add-on version `v1.5.0-eksbuild.1` or later.
+ The [Neuron driver](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html#setup-torch-neuronx-ubuntu22) must be installed on the nodes of the cluster.
+ The [Neuron device plugin](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html) must be installed on the cluster. For example, the Amazon EKS optimized accelerated AMIs are built with the necessary components.

The metrics that are collected are listed in the table in this section. The metrics are collected for AWS Trainium, AWS Inferentia, and AWS Inferentia2.

The CloudWatch agent collects these metrics from the [Neuron monitor](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html) and does the necessary Kubernetes resource correlation to deliver metrics at the pod and container levels.


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `container_neuroncore_utilization` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The NeuronCore utilization during the captured period of the NeuronCore allocated to the container. Unit: Percent  | 
|  `container_neuroncore_memory_usage_constants` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for constants during training by the NeuronCore that is allocated to the container (or weights during inference). Unit: Bytes  | 
|  `container_neuroncore_memory_usage_model_code` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the models' executable code by the NeuronCore that is allocated to the container. Unit: Bytes  | 
|  `container_neuroncore_memory_usage_model_shared_scratchpad` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the shared scratchpad of the models by the NeuronCore that is allocated to the container. This memory region is reserved for the models. Unit: Bytes  | 
|  `container_neuroncore_memory_usage_runtime_memory` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the Neuron runtime by the NeuronCore allocated to the container. Unit: Bytes  | 
|  `container_neuroncore_memory_usage_tensors` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for tensors by the NeuronCore allocated to the container. Unit: Bytes  | 
|  `container_neuroncore_memory_usage_total` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`, `NeuronCore`  |  The total amount of memory used by the NeuronCore allocated to the container. Unit: Bytes  | 
|  `container_neurondevice_hw_ecc_events_total` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NeuronDevice`  |  The number of corrected and uncorrected ECC events for the on-chip SRAM and device memory of the Neuron device on the node. Unit: Count  | 
|  `pod_neuroncore_utilization` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The NeuronCore utilization during the captured period of the NeuronCore allocated to the pod. Unit: Percent  | 
|  `pod_neuroncore_memory_usage_constants` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for constants during training by the NeuronCore that is allocated to the pod (or weights during inference). Unit: Bytes  | 
|  `pod_neuroncore_memory_usage_model_code` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the models' executable code by the NeuronCore that is allocated to the pod. Unit: Bytes  | 
|  `pod_neuroncore_memory_usage_model_shared_scratchpad` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the shared scratchpad of the models by the NeuronCore that is allocated to the pod. This memory region is reserved for the models. Unit: Bytes  | 
|  `pod_neuroncore_memory_usage_runtime_memory` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the Neuron runtime by the NeuronCore allocated to the pod. Unit: Bytes  | 
|  `pod_neuroncore_memory_usage_tensors` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for tensors by the NeuronCore allocated to the pod. Unit: Bytes  | 
|  `pod_neuroncore_memory_usage_total` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`, `NeuronCore`  |  The total amount of memory used by the NeuronCore allocated to the pod. Unit: Bytes  | 
|  `pod_neurondevice_hw_ecc_events_total` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NeuronDevice`  |  The number of corrected and uncorrected ECC events for the on-chip SRAM and device memory of the Neuron device allocated to a pod. Unit: Count  | 
|  `node_neuroncore_utilization` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The NeuronCore utilization during the captured period of the NeuronCore allocated to the node. Unit: Percent  | 
|  `node_neuroncore_memory_usage_constants` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for constants during training by the NeuronCore that is allocated to the node (or weights during inference). Unit: Bytes  | 
|  `node_neuroncore_memory_usage_model_code` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for models' executable code by the NeuronCore that is allocated to the node. Unit: Bytes  | 
|  `node_neuroncore_memory_usage_model_shared_scratchpad` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the shared scratchpad of the models by the NeuronCore that is allocated to the node. This is a memory region reserved for the models. Unit: Bytes  | 
|  `node_neuroncore_memory_usage_runtime_memory` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for the Neuron runtime by the NeuronCore that is allocated to the node. Unit: Bytes  | 
|  `node_neuroncore_memory_usage_tensors` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The amount of device memory used for tensors by the NeuronCore that is allocated to the node. Unit: Bytes  | 
|  `node_neuroncore_memory_usage_total` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceType`, `InstanceId`, `NodeName`, `NeuronDevice`, `NeuronCore`  |  The total amount of memory used by the NeuronCore that is allocated to the node. Unit: Bytes  | 
|  `node_neuron_execution_errors_total` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName`  |  The total number of execution errors on the node. This is calculated by the CloudWatch agent by aggregating the errors of the following types: `generic`, `numerical`, `transient`, `model`, `runtime`, and `hardware`. Unit: Count  | 
|  `node_neurondevice_runtime_memory_used_bytes` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName`  |  The total Neuron device memory usage in bytes on the node. Unit: Bytes  | 
| `node_neuron_execution_latency` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName`  |  In seconds, the latency for an execution on the node as measured by the Neuron runtime. Unit: Seconds  | 
| `node_neurondevice_hw_ecc_events_total` |  `ClusterName` `ClusterName`, `UltraServer` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `NodeName`, `NeuronDevice`  |  The number of corrected and uncorrected ECC events for the on-chip SRAM and device memory of the Neuron device on the node. Unit: Count  | 

## AWS Elastic Fabric Adapter (EFA) metrics
<a name="Container-Insights-metrics-EFA"></a>

Beginning with version `1.300037.0` of the CloudWatch agent, Container Insights with enhanced observability for Amazon EKS collects AWS Elastic Fabric Adapter (EFA) metrics from Amazon EKS clusters on Linux instances. The CloudWatch agent must be installed using the CloudWatch Observability EKS add-on version `v1.5.2-eksbuild.1` or later. For more information about the add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md). For more information about AWS Elastic Fabric Adapter, see [Elastic Fabric Adapter](https://aws.amazon.com/hpc/efa/).

For Container Insights to collect Elastic Fabric Adapter (EFA) metrics, you must meet the following prerequisites:
+ You must be using Container Insights with enhanced observability for Amazon EKS, with the Amazon CloudWatch Observability EKS add-on version `v1.5.2-eksbuild.1` or later.
+ The EFA device plugin must be installed on the cluster. For more information, see [aws-efa-k8s-device-plugin](https://github.com/aws/eks-charts/tree/master/stable/aws-efa-k8s-device-plugin) on GitHub.

The metrics that are collected are listed in the following table. 


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `container_efa_rx_bytes` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NetworkInterfaceId`   |  The number of bytes per second received by the EFA device(s) allocated to the container. Unit: Bytes/Second  | 
|  `container_efa_tx_bytes` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NetworkInterfaceId`   |  The number of bytes per second transmitted by the EFA device(s) allocated to the container. Unit: Bytes/Second  | 
|  `container_efa_rx_dropped` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NetworkInterfaceId`   |  The number of packets that were received and then dropped by the EFA device(s) allocated to the container. Unit: Count/Second  | 
|  `container_efa_rdma_read_bytes` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NetworkInterfaceId`   |  The number of bytes per second received using remote direct memory access read operations by the EFA device(s) allocated to the container. Unit: Bytes/Second  | 
|  `container_efa_rdma_write_bytes` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NetworkInterfaceId`   |  The number of bytes per second transmitted using remote direct memory access write operations by the EFA device(s) allocated to the container. Unit: Bytes/Second  | 
|  `container_efa_rdma_write_recv_bytes` |  `ClusterName` `ClusterName`, `Namespace`, `PodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `ContainerName`, `NetworkInterfaceId`   |  The number of bytes per second received during remote direct memory access write operations by the EFA device(s) allocated to the container. Unit: Bytes/Second  | 
|  `pod_efa_rx_bytes` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NetworkInterfaceId`  |  The number of bytes per second received by the EFA device(s) allocated to the pod. Unit: Bytes/Second  | 
|  `pod_efa_tx_bytes` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NetworkInterfaceId`  |  The number of bytes per second transmitted by the EFA device(s) allocated to the pod. Unit: Bytes/Second  | 
|  `pod_efa_rx_dropped` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NetworkInterfaceId`  |  The number of packets that were received and then dropped by the EFA device(s) allocated to the pod. Unit: Count/Second  | 
|  `pod_efa_rdma_read_bytes` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NetworkInterfaceId`  |  The number of bytes per second received using remote direct memory access read operations by the EFA device(s) allocated to the pod. Unit: Bytes/Second  | 
|  `pod_efa_rdma_write_bytes` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NetworkInterfaceId`  |  The number of bytes per second transmitted using remote direct memory access write operations by the EFA device(s) allocated to the pod. Unit: Bytes/Second  | 
|  `pod_efa_rdma_write_recv_bytes` |  `ClusterName` `ClusterName`, `Namespace` `ClusterName`, `Namespace`, `Service` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName` `ClusterName`, `Namespace`, `PodName`, `FullPodName`, `NetworkInterfaceId`  |  The number of bytes per second received during remote direct memory access write operations by the EFA device(s) allocated to the pod. Unit: Bytes/Second  | 
|  `node_efa_rx_bytes` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `NetworkInterfaceId`  |  The number of bytes per second received by the EFA device(s) allocated to the node. Unit: Bytes/Second  | 
|  `node_efa_tx_bytes` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `NetworkInterfaceId`  |  The number of bytes per second transmitted by the EFA device(s) allocated to the node. Unit: Bytes/Second  | 
|  `node_efa_rx_dropped` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `NetworkInterfaceId`  |  The number of packets that were received and then dropped by the EFA device(s) allocated to the node. Unit: Count/Second  | 
|  `node_efa_rdma_read_bytes` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `NetworkInterfaceId`  |  The number of bytes per second received using remote direct memory access read operations by the EFA device(s) allocated to the node. Unit: Bytes/Second  | 
|  `node_efa_rdma_write_bytes` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `NetworkInterfaceId`  |  The number of bytes per second transmitted using remote direct memory access write operations by the EFA device(s) allocated to the node. Unit: Bytes/Second  | 
|  `node_efa_rdma_write_recv_bytes` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName` `ClusterName`, `InstanceId`, `InstanceType`, `NodeName`, `NetworkInterfaceId`  |  The number of bytes per second received during remote direct memory access write operations by the EFA device(s) allocated to the node. Unit: Bytes/Second  | 

## Amazon SageMaker AI HyperPod metrics
<a name="Container-Insights-metrics-Sagemaker-HyperPod"></a>

Beginning with version `v2.0.1-eksbuild.1` of the CloudWatch Observability EKS add-on, Container Insights with enhanced observability for Amazon EKS automatically collects Amazon SageMaker AI HyperPod metrics from Amazon EKS clusters. For more information about the add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md). For more information about Amazon SageMaker AI HyperPod, see [Amazon SageMaker AI HyperPod](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks.html).

The metrics that are collected are listed in the following table. 


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `hyperpod_node_health_status_unschedulable` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  Indicates if a node is labeled as `Unschedulable` by Amazon SageMaker AI HyperPod. This means that the node is running deep health checks and is not available for running workloads. Unit: Count  | 
|  `hyperpod_node_health_status_schedulable` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  Indicates if a node is labeled as `Schedulable` by Amazon SageMaker AI HyperPod. This means that the node has passed basic health checks or deep health checks and is available for running workloads. Unit: Count  | 
|  `hyperpod_node_health_status_unschedulable_pending_replacement` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  Indicates if a node is labeled as `UnschedulablePendingReplacement` by HyperPod. This means that the node has failed deep health checks or health monitoring agent checks and requires a replacement. If automatic node recovery is enabled, the node will be automatically replaced by Amazon SageMaker AI HyperPod. Unit: Count  | 
|  `hyperpod_node_health_status_unschedulable_pending_reboot` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  Indicates if a node is labeled as `UnschedulablePendingReboot` by Amazon SageMaker AI HyperPod. This means that the node is running deep health checks and requires a reboot. If automatic node recovery is enabled, the node will be automatically rebooted by Amazon SageMaker AI HyperPod. Unit: Count  | 
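
Because each health-status metric indicates whether the node currently carries the corresponding label, you can alarm on a nonzero `Maximum`. The following sketch builds `PutMetricAlarm` parameters with boto3; the alarm name, period, and threshold are illustrative choices, not values from this guide, and the resource names are placeholders.

```python
import json


def hyperpod_replacement_alarm_params(cluster_name, instance_id, node_name):
    """Build PutMetricAlarm parameters that alert when a HyperPod node is
    labeled UnschedulablePendingReplacement. Alarm name, period, and
    threshold are illustrative choices, not values from this guide."""
    return {
        "AlarmName": f"hyperpod-pending-replacement-{node_name}",
        "Namespace": "ContainerInsights",
        "MetricName": "hyperpod_node_health_status_unschedulable_pending_replacement",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster_name},
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "NodeName", "Value": node_name},
        ],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params):
    """Submit the alarm; requires AWS credentials, so it is not invoked here."""
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


print(json.dumps(hyperpod_replacement_alarm_params(
    "my-cluster", "i-0123456789abcdef0", "my-node"), indent=2))
```

To create the alarm, pass the parameters to `create_alarm`, which calls the CloudWatch `PutMetricAlarm` API.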

## Amazon EBS NVMe driver metrics
<a name="Container-Insights-metrics-EBS"></a>

Beginning with version `1.300056.0` of the CloudWatch agent, Container Insights with enhanced observability for Amazon EKS automatically collects Amazon EBS NVMe driver metrics from Amazon EKS clusters on Linux instances. The CloudWatch agent must be installed using the CloudWatch Observability Amazon EKS add-on version `4.1.0` or later. For more information about the add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md). For more information about Amazon EBS, see [Amazon EBS detailed performance statistics](https://docs.aws.amazon.com/ebs/latest/userguide/nvme-detailed-performance-stats.html).

For Container Insights to collect Amazon EBS NVMe driver metrics, you must meet the following prerequisites:
+ You must be using Container Insights with enhanced observability for Amazon EKS, with the CloudWatch Observability Amazon EKS add-on version `4.1.0` or later.
+ The Amazon EBS CSI driver `1.42.0` add-on or Helm chart must be installed on the cluster with metrics enabled.
  + To enable the metrics when you are using the Amazon EBS CSI driver add-on, use the following option when you create or update the add-on: `--configuration-values '{ "node": { "enableMetrics": true } }'`
  + To enable the metrics when you are using the Helm chart, use the following option when you install or upgrade the chart: `--set node.enableMetrics=true`
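
The same add-on configuration can also be applied programmatically. The following is a minimal sketch using the EKS `UpdateAddon` API through boto3; the cluster name is a placeholder, and the API call itself requires AWS credentials.

```python
import json


def ebs_csi_metrics_config():
    """Return the JSON document passed as --configuration-values to enable
    EBS CSI driver metrics (the same option shown in the prerequisites)."""
    return json.dumps({"node": {"enableMetrics": True}})


def enable_metrics(cluster_name):
    """Apply the configuration with the EKS UpdateAddon API; requires AWS
    credentials, so it is not invoked here. The cluster name is a placeholder."""
    import boto3

    boto3.client("eks").update_addon(
        clusterName=cluster_name,
        addonName="aws-ebs-csi-driver",
        configurationValues=ebs_csi_metrics_config(),
    )


print(ebs_csi_metrics_config())
```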

The metrics that are collected are listed in the following table. 


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `node_diskio_ebs_total_read_ops` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total number of completed read operations. | 
|  `node_diskio_ebs_total_write_ops` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total number of completed write operations. | 
|  `node_diskio_ebs_total_read_bytes` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total number of read bytes transferred. | 
|  `node_diskio_ebs_total_write_bytes` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total number of write bytes transferred. | 
|  `node_diskio_ebs_total_read_time` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total time spent, in microseconds, by all completed read operations. | 
|  `node_diskio_ebs_total_write_time` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total time spent, in microseconds, by all completed write operations. | 
|  `node_diskio_ebs_volume_performance_exceeded_iops` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total time, in microseconds, that IOPS demand exceeded the volume's provisioned IOPS performance. | 
|  `node_diskio_ebs_volume_performance_exceeded_tp` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total time, in microseconds, that throughput demand exceeded the volume's provisioned throughput performance. | 
|  `node_diskio_ebs_ec2_instance_performance_exceeded_iops` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total time, in microseconds, that the EBS volume exceeded the attached Amazon EC2 instance's maximum IOPS performance. | 
|  `node_diskio_ebs_ec2_instance_performance_exceeded_tp` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The total time, in microseconds, that the EBS volume exceeded the attached Amazon EC2 instance's maximum throughput performance. | 
|  `node_diskio_ebs_volume_queue_length` |  `ClusterName` `ClusterName`, `NodeName`, `InstanceId` `ClusterName`, `NodeName`, `InstanceId`, `VolumeId`  | The number of read and write operations waiting to be completed. | 

# Amazon EKS and Kubernetes Container Insights metrics
<a name="Container-Insights-metrics-EKS"></a>

The following tables list the metrics and dimensions that Container Insights collects for Amazon EKS and Kubernetes. These metrics are in the `ContainerInsights` namespace. For more information, see [Metrics](cloudwatch_concepts.md#Metric).

If you do not see any Container Insights metrics in your console, be sure that you have completed the setup of Container Insights. Metrics do not appear before Container Insights has been set up completely. For more information, see [Setting up Container Insights](deploy-container-insights.md).


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `cluster_failed_node_count`  |  `ClusterName`  |  The number of failed worker nodes in the cluster. A node is considered failed if it is suffering from any *node conditions*. For more information, see [Conditions](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) in the Kubernetes documentation.  | 
|  `cluster_node_count`  |  `ClusterName`  |  The total number of worker nodes in the cluster.  | 
|  `namespace_number_of_running_pods`  |  `Namespace`, `ClusterName` `ClusterName`  |  The number of pods running per namespace in the resource that is specified by the dimensions that you're using.  | 
|  `node_cpu_limit`  |  `ClusterName`   |  The maximum number of CPU units that can be assigned to a single node in this cluster.  | 
|  `node_cpu_reserved_capacity`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker. Formula: `node_cpu_request / node_cpu_limit`  `node_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_cpu_usage_total`  |  `ClusterName`  |  The number of CPU units being used on the nodes in the cluster.  | 
|  `node_cpu_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total percentage of CPU units being used on the nodes in the cluster. Formula: `node_cpu_usage_total / node_cpu_limit`  | 
|  `node_gpu_limit` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The total number of GPU(s) available on the node.  | 
|  `node_gpu_usage_total` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The number of GPU(s) being used by the running pods on the node.  | 
|  `node_gpu_reserved_capacity` |  `ClusterName` `ClusterName`, `InstanceId`, `NodeName`  |  The percentage of GPU currently being reserved on the node. The formula is, `node_gpu_request / node_gpu_limit`.  `node_gpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).   | 
|  `node_filesystem_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total percentage of file system capacity being used on nodes in the cluster. Formula: `node_filesystem_usage / node_filesystem_capacity`  `node_filesystem_usage` and `node_filesystem_capacity` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_memory_limit`  |  `ClusterName`  |  The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster.  | 
|  `node_memory_reserved_capacity`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of memory currently being used on the nodes in the cluster. Formula: `node_memory_request / node_memory_limit`  `node_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_memory_utilization`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The percentage of memory currently being used by the node or nodes. It is the percentage of node memory usage divided by the node memory limitation. Formula: `node_memory_working_set / node_memory_limit`.   | 
|  `node_memory_working_set`  |  `ClusterName`   |  The amount of memory, in bytes, being used in the working set of the nodes in the cluster.  | 
|  `node_network_total_bytes`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The total number of bytes per second transmitted and received over the network per node in a cluster. Formula: `node_network_rx_bytes + node_network_tx_bytes`  `node_network_rx_bytes` and `node_network_tx_bytes` are not reported directly as metrics, but are fields in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `node_number_of_running_containers`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The number of running containers per node in a cluster.  | 
|  `node_number_of_running_pods`  |  `NodeName`, `ClusterName`, `InstanceId` `ClusterName`  |  The number of running pods per node in a cluster.  | 
|  `pod_cpu_reserved_capacity`  |  `PodName`, `Namespace`, `ClusterName` `ClusterName`  |  The CPU capacity that is reserved per pod in a cluster. Formula: `pod_cpu_request / node_cpu_limit`  `pod_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_cpu_utilization`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of CPU units being used by pods. Formula: `pod_cpu_usage_total / node_cpu_limit`  | 
|  `pod_cpu_utilization_over_pod_limit`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of CPU units being used by pods relative to the pod limit. Formula: `pod_cpu_usage_total / pod_cpu_limit`  | 
|  `pod_gpu_request` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The GPU requests for the pod. This value must always be equal to `pod_gpu_limit`.  | 
|  `pod_gpu_limit` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The maximum number of GPU(s) that can be assigned to the pod in a node.  | 
|  `pod_gpu_usage_total` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The number of GPU(s) being allocated on the pod.  | 
|  `pod_gpu_reserved_capacity` |  `ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`  |  The percentage of GPU currently being reserved for the pod. Formula: `pod_gpu_request / node_gpu_limit`  | 
|  `pod_memory_reserved_capacity`  |  `PodName`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of memory that is reserved for pods. Formula: `pod_memory_request / node_memory_limit`  `pod_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_memory_utilization`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of memory currently being used by the pod or pods. Formula: `pod_memory_working_set / node_memory_limit`  | 
|  `pod_memory_utilization_over_pod_limit`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The percentage of memory that is being used by pods relative to the pod limit. If any containers in the pod don't have a memory limit defined, this metric doesn't appear. Formula: `pod_memory_working_set / pod_memory_limit`  | 
|  `pod_network_rx_bytes`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of bytes per second being received over the network by the pod. Formula: `sum(pod_interface_network_rx_bytes)`  `pod_interface_network_rx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_network_tx_bytes`  |  `PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of bytes per second being transmitted over the network by the pod. Formula: `sum(pod_interface_network_tx_bytes)`  `pod_interface_network_tx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md).    | 
|  `pod_number_of_container_restarts`  |  `PodName`, `Namespace`, `ClusterName`  |  The total number of container restarts in a pod.  | 
|  `service_number_of_running_pods`  |  `Service`, `Namespace`, `ClusterName` `ClusterName`  |  The number of pods running the service or services in the cluster.  | 
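
Several of the percentage metrics above are computed by dividing a request or usage field from the performance log events by the matching limit. The following is a minimal sketch of that arithmetic, using the field values from the sample `Type: Node` log event later in this reference (CPU values are in millicores).

```python
def reserved_capacity_percent(request, limit):
    """Reserved-capacity formula, e.g. node_cpu_reserved_capacity =
    node_cpu_request / node_cpu_limit, expressed as a percentage."""
    return 100.0 * request / limit


def utilization_percent(usage, limit):
    """Utilization formula, e.g. node_cpu_utilization =
    node_cpu_usage_total / node_cpu_limit, expressed as a percentage."""
    return 100.0 * usage / limit


# node_cpu_request = 1130 and node_cpu_limit = 4000 (a 4-vCPU node)
print(reserved_capacity_percent(1130, 4000))  # 28.25
# node_cpu_usage_total = 136.478..., node_cpu_limit = 4000
print(utilization_percent(136.47852169244098, 4000))  # ~3.41
```

The same shape applies to the memory and GPU reserved-capacity and utilization metrics; only the request, usage, and limit fields change.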

## Kueue metrics
<a name="Container-Insights-metrics-Kueue"></a>

Beginning with version `v2.4.0-eksbuild.1` of the CloudWatch Observability EKS add-on, Container Insights for Amazon EKS supports collecting Kueue metrics from Amazon EKS clusters. For more information about the add-on, see [Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart](install-CloudWatch-Observability-EKS-addon.md).

For information about enabling these metrics, see [Enable Kueue metrics](install-CloudWatch-Observability-EKS-addon.md#enable-Kueue-metrics).

The Kueue metrics that are collected are listed in the following table. These metrics are published into the `ContainerInsights/Prometheus` namespace in CloudWatch. Some of these metrics use the following dimensions:
+ `ClusterQueue` is the name of the ClusterQueue
+ The possible values of `Status` are `active` and `inadmissible`
+ The possible values of `Reason` are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped`, and `InactiveWorkload`
+ `Flavor` is the referenced flavor.
+ `Resource` refers to cluster compute resources, such as `cpu`, `memory`, and `gpu`. 


| Metric name | Dimensions | Description | 
| --- | --- | --- | 
|  `kueue_pending_workloads` |  `ClusterName`, `ClusterQueue`, `Status` `ClusterName`, `ClusterQueue` `ClusterName`, `Status` `ClusterName`  |  The number of pending workloads.  | 
|  `kueue_evicted_workloads_total` |  `ClusterName`, `ClusterQueue`, `Reason` `ClusterName`, `ClusterQueue` `ClusterName`, `Reason` `ClusterName`  |  The total number of evicted workloads.  | 
|  `kueue_admitted_active_workloads` |  `ClusterName`, `ClusterQueue` `ClusterName`  |  The number of admitted workloads that are active (unsuspended and not finished).  | 
|  `kueue_cluster_queue_resource_usage` |  `ClusterName`, `ClusterQueue`, `Resource`, `Flavor` `ClusterName`, `ClusterQueue`, `Resource` `ClusterName`, `ClusterQueue`, `Flavor` `ClusterName`, `ClusterQueue` `ClusterName`  |  Reports the total resource usage of the ClusterQueue.  | 
|  `kueue_cluster_queue_nominal_quota` |  `ClusterName`, `ClusterQueue`, `Resource`, `Flavor` `ClusterName`, `ClusterQueue`, `Resource` `ClusterName`, `ClusterQueue`, `Flavor` `ClusterName`, `ClusterQueue` `ClusterName`  |  Reports the resource quota of the ClusterQueue.  | 

# Container Insights performance log reference
<a name="Container-Insights-reference"></a>

This section includes reference information about how Container Insights uses performance log events to collect metrics. When you deploy Container Insights, it automatically creates a log group for the performance log events. You don't need to create this log group yourself.

**Topics**
+ [Container Insights performance log events for Amazon ECS](Container-Insights-reference-performance-logs-ECS.md)
+ [Container Insights performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-logs-EKS.md)
+ [Relevant fields in performance log events for Amazon EKS and Kubernetes](Container-Insights-reference-performance-entries-EKS.md)

# Container Insights performance log events for Amazon ECS
<a name="Container-Insights-reference-performance-logs-ECS"></a>

The following are examples of the performance log events that Container Insights collects from Amazon ECS.

These logs are in CloudWatch Logs, in a log group named `/aws/ecs/containerinsights/CLUSTER_NAME/performance`. Within that log group, each container instance has a log stream named `AgentTelemetry-CONTAINER_INSTANCE_ID`.

You can query these logs using queries such as `{ $.Type = "Container" }` to view all container log events. 
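The same filter pattern can be used programmatically. The following is a minimal sketch using the CloudWatch Logs `FilterLogEvents` API through boto3; the cluster name is a placeholder, and the API call itself requires AWS credentials.

```python
def container_events_params(cluster_name):
    """Build FilterLogEvents parameters that select Type: Container events
    from the ECS Container Insights performance log group."""
    return {
        "logGroupName": f"/aws/ecs/containerinsights/{cluster_name}/performance",
        "filterPattern": '{ $.Type = "Container" }',
    }


def print_container_events(cluster_name):
    """Fetch one page of matching events; requires AWS credentials, so it is
    not invoked here."""
    import boto3

    response = boto3.client("logs").filter_log_events(
        **container_events_params(cluster_name))
    for event in response["events"]:
        print(event["message"])


print(container_events_params("MyCluster"))
```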

**Type: Container**

```
{
	"Version":"0",
	"Type":"Container",
	"ContainerName":"sleep",
	"TaskId":"7ac4dfba69214411b4783a3b8189c9ba",
	"TaskDefinitionFamily":"sleep360",
	"TaskDefinitionRevision":"1",
	"ContainerInstanceId":"0d7650e6dec34c1a9200f72098071e8f",
	"EC2InstanceId":"i-0c470579dbcdbd2f3",
	"ClusterName":"MyCluster",
	"Image":"busybox",
	"ContainerKnownStatus":"RUNNING",
	"Timestamp":1623963900000,
	"CpuUtilized":0.0,
	"CpuReserved":10.0,
	"MemoryUtilized":0,
	"MemoryReserved":10,
	"StorageReadBytes":0,
	"StorageWriteBytes":0,
	"NetworkRxBytes":0,
	"NetworkRxDropped":0,
	"NetworkRxErrors":0,
	"NetworkRxPackets":14,
	"NetworkTxBytes":0,
	"NetworkTxDropped":0,
	"NetworkTxErrors":0,
	"NetworkTxPackets":0
}
```

**Type: Task**

Even though the unit for `StorageReadBytes` and `StorageWriteBytes` is Bytes/Second, the values represent the cumulative number of bytes read from and written to storage, respectively. 
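
To turn these cumulative values into an actual rate, you can take the difference between two consecutive samples and divide by the elapsed time. The following is a minimal sketch with hypothetical sample values; `Timestamp` in the log events is in epoch milliseconds.

```python
def bytes_per_second(earlier, later):
    """Convert two cumulative StorageReadBytes samples into an average rate.
    Timestamps in the performance log events are epoch milliseconds."""
    elapsed_seconds = (later["Timestamp"] - earlier["Timestamp"]) / 1000.0
    return (later["StorageReadBytes"] - earlier["StorageReadBytes"]) / elapsed_seconds


# Two hypothetical samples taken 60 seconds apart:
a = {"Timestamp": 1623963900000, "StorageReadBytes": 1048576}
b = {"Timestamp": 1623963960000, "StorageReadBytes": 7340032}
print(bytes_per_second(a, b))  # (7340032 - 1048576) / 60 = 104857.6
```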

```
{
    "Version": "0",
    "Type": "Task",
    "TaskId": "7ac4dfba69214411b4783a3b8189c9ba",
    "TaskDefinitionFamily": "sleep360",
    "TaskDefinitionRevision": "1",
    "ContainerInstanceId": "0d7650e6dec34c1a9200f72098071e8f",
    "EC2InstanceId": "i-0c470579dbcdbd2f3",
    "ClusterName": "MyCluster",
    "AccountID": "637146863587",
    "Region": "us-west-2",
    "AvailabilityZone": "us-west-2b",
    "KnownStatus": "RUNNING",
    "LaunchType": "EC2",
    "PullStartedAt": 1623963608201,
    "PullStoppedAt": 1623963610065,
    "CreatedAt": 1623963607094,
    "StartedAt": 1623963610382,
    "Timestamp": 1623963900000,
    "CpuUtilized": 0.0,
    "CpuReserved": 10.0,
    "MemoryUtilized": 0,
    "MemoryReserved": 10,
    "StorageReadBytes": 0,
    "StorageWriteBytes": 0,
    "NetworkRxBytes": 0,
    "NetworkRxDropped": 0,
    "NetworkRxErrors": 0,
    "NetworkRxPackets": 14,
    "NetworkTxBytes": 0,
    "NetworkTxDropped": 0,
    "NetworkTxErrors": 0,
    "NetworkTxPackets": 0,
    "EBSFilesystemUtilized": 10,
    "EBSFilesystemSize": 20,
    "CloudWatchMetrics": [
        {
            "Namespace": "ECS/ContainerInsights",
            "Metrics": [
                {
                    "Name": "CpuUtilized",
                    "Unit": "None"
                },
                {
                    "Name": "CpuReserved",
                    "Unit": "None"
                },
                {
                    "Name": "MemoryUtilized",
                    "Unit": "Megabytes"
                },
                {
                    "Name": "MemoryReserved",
                    "Unit": "Megabytes"
                },
                {
                    "Name": "StorageReadBytes",
                    "Unit": "Bytes/Second"
                },
                {
                    "Name": "StorageWriteBytes",
                    "Unit": "Bytes/Second"
                },
                {
                    "Name": "NetworkRxBytes",
                    "Unit": "Bytes/Second"
                },
                {
                    "Name": "NetworkTxBytes",
                    "Unit": "Bytes/Second"
                },
                {
                    "Name": "EBSFilesystemSize",
                    "Unit": "Gigabytes"
                },
                {
                    "Name": "EBSFilesystemUtilized",
                    "Unit": "Gigabytes"
                }
            ],
            "Dimensions": [
                ["ClusterName"],
                [
                    "ClusterName",
                    "TaskDefinitionFamily"
                ]
            ]
        }
    ]
}
```

**Type: Service**

```
{   
    "Version": "0",
    "Type": "Service",
    "ServiceName": "myCIService",
    "ClusterName": "myCICluster",
    "Timestamp": 1561586460000,
    "DesiredTaskCount": 2,
    "RunningTaskCount": 2,
    "PendingTaskCount": 0,
    "DeploymentCount": 1,
    "TaskSetCount": 0,
    "CloudWatchMetrics": [
        {
            "Namespace": "ECS/ContainerInsights",
            "Metrics": [
                {
                    "Name": "DesiredTaskCount",
                    "Unit": "Count"
                },
                {
                    "Name": "RunningTaskCount",
                    "Unit": "Count"
                },
                {
                    "Name": "PendingTaskCount",
                    "Unit": "Count"
                },
                {
                    "Name": "DeploymentCount",
                    "Unit": "Count"
                },
                {
                    "Name": "TaskSetCount",
                    "Unit": "Count"
                }
            ],
            "Dimensions": [
                [
                    "ServiceName",
                    "ClusterName"
                ]
            ]
        }
    ]
}
```

**Type: Volume**

```
{
    "Version": "0",
    "Type": "Volume",
    "TaskDefinitionFamily": "myCITaskDef",
    "TaskId": "7ac4dfba69214411b4783a3b8189c9ba",
    "ClusterName": "myCICluster",
    "ServiceName": "myCIService",
    "VolumeId": "vol-1233436545ff708cb",
    "InstanceId": "i-0c470579dbcdbd2f3",
    "LaunchType": "EC2",
    "VolumeName": "MyVolumeName",
    "EBSFilesystemUtilized": 10,
    "EBSFilesystemSize": 20,
    "CloudWatchMetrics": [
        {
            "Namespace": "ECS/ContainerInsights",
            "Metrics": [
                {
                    "Name": "EBSFilesystemSize",
                    "Unit": "Gigabytes"
                },
                {
                    "Name": "EBSFilesystemUtilized",
                    "Unit": "Gigabytes"
                }
            ],
            "Dimensions": [
                ["ClusterName"],
                [
                    "VolumeName",
                    "TaskDefinitionFamily",
                    "ClusterName"
                ],
                [
                    "ServiceName",
                    "ClusterName"
                ]
            ]
        }
    ]
}
```

**Type: Cluster**

```
{
    "Version": "0",
    "Type": "Cluster",
    "ClusterName": "myCICluster",
    "Timestamp": 1561587300000,
    "TaskCount": 5,
    "ContainerInstanceCount": 5,
    "ServiceCount": 2,
    "CloudWatchMetrics": [
        {
            "Namespace": "ECS/ContainerInsights",
            "Metrics": [
                {
                    "Name": "TaskCount",
                    "Unit": "Count"
                },
                {
                    "Name": "ContainerInstanceCount",
                    "Unit": "Count"
                },
                {
                    "Name": "ServiceCount",
                    "Unit": "Count"
                }
            ],
            "Dimensions": [
                [
                    "ClusterName"
                ]
            ]
        }
    ]
}
```

# Container Insights performance log events for Amazon EKS and Kubernetes
<a name="Container-Insights-reference-performance-logs-EKS"></a>

The following are examples of the performance log events that Container Insights collects from Amazon EKS and Kubernetes clusters.
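
These log events can also be analyzed with CloudWatch Logs Insights. The following is a minimal sketch that builds `StartQuery` parameters for finding pods with container restarts; the cluster name is a placeholder, the query string is an illustrative example rather than one taken from this guide, and running the query requires AWS credentials.

```python
def restart_query_params(cluster_name, start_time, end_time):
    """Build StartQuery parameters for a Logs Insights query that lists pods
    with container restarts. Field names follow the Pod-type performance log
    events; start_time and end_time are epoch seconds."""
    return {
        "logGroupName": f"/aws/containerinsights/{cluster_name}/performance",
        "startTime": start_time,
        "endTime": end_time,
        "queryString": (
            "fields kubernetes.pod_name as PodName, pod_number_of_container_restarts "
            '| filter Type = "Pod" and pod_number_of_container_restarts > 0 '
            "| sort pod_number_of_container_restarts desc"
        ),
    }


def run_query(params):
    """Start the query and poll for results; requires AWS credentials, so it
    is not invoked here."""
    import time

    import boto3

    logs = boto3.client("logs")
    query_id = logs.start_query(**params)["queryId"]
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] not in ("Scheduled", "Running"):
            return result["results"]
        time.sleep(1)


print(restart_query_params("myCICluster", 0, 60)["logGroupName"])
```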

**Type: Node**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "CloudWatchMetrics": [
    {
      "Metrics": [
        {
          "Unit": "Percent",
          "Name": "node_cpu_utilization"
        },
        {
          "Unit": "Percent",
          "Name": "node_memory_utilization"
        },
        {
          "Unit": "Bytes/Second",
          "Name": "node_network_total_bytes"
        },
        {
          "Unit": "Percent",
          "Name": "node_cpu_reserved_capacity"
        },
        {
          "Unit": "Percent",
          "Name": "node_memory_reserved_capacity"
        },
        {
          "Unit": "Count",
          "Name": "node_number_of_running_pods"
        },
        {
          "Unit": "Count",
          "Name": "node_number_of_running_containers"
        }
      ],
      "Dimensions": [
        [
          "NodeName",
          "InstanceId",
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    },
    {
      "Metrics": [
        {
          "Unit": "Percent",
          "Name": "node_cpu_utilization"
        },
        {
          "Unit": "Percent",
          "Name": "node_memory_utilization"
        },
        {
          "Unit": "Bytes/Second",
          "Name": "node_network_total_bytes"
        },
        {
          "Unit": "Percent",
          "Name": "node_cpu_reserved_capacity"
        },
        {
          "Unit": "Percent",
          "Name": "node_memory_reserved_capacity"
        },
        {
          "Unit": "Count",
          "Name": "node_number_of_running_pods"
        },
        {
          "Unit": "Count",
          "Name": "node_number_of_running_containers"
        },
        {
          "Name": "node_cpu_usage_total"
        },
        {
          "Name": "node_cpu_limit"
        },
        {
          "Unit": "Bytes",
          "Name": "node_memory_working_set"
        },
        {
          "Unit": "Bytes",
          "Name": "node_memory_limit"
        }
      ],
      "Dimensions": [
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    }
  ],
  "ClusterName": "myCICluster",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "Sources": [
    "cadvisor",
    "/proc",
    "pod",
    "calculated"
  ],
  "Timestamp": "1567096682364",
  "Type": "Node",
  "Version": "0",
  "kubernetes": {
    "host": "ip-192-168-75-26.us-west-2.compute.internal"
  },
  "node_cpu_limit": 4000,
  "node_cpu_request": 1130,
  "node_cpu_reserved_capacity": 28.249999999999996,
  "node_cpu_usage_system": 33.794636630852764,
  "node_cpu_usage_total": 136.47852169244098,
  "node_cpu_usage_user": 71.67075111567326,
  "node_cpu_utilization": 3.4119630423110245,
  "node_memory_cache": 3103297536,
  "node_memory_failcnt": 0,
  "node_memory_hierarchical_pgfault": 0,
  "node_memory_hierarchical_pgmajfault": 0,
  "node_memory_limit": 16624865280,
  "node_memory_mapped_file": 406646784,
  "node_memory_max_usage": 4230746112,
  "node_memory_pgfault": 0,
  "node_memory_pgmajfault": 0,
  "node_memory_request": 1115684864,
  "node_memory_reserved_capacity": 6.7109407818311055,
  "node_memory_rss": 798146560,
  "node_memory_swap": 0,
  "node_memory_usage": 3901444096,
  "node_memory_utilization": 6.601302600149552,
  "node_memory_working_set": 1097457664,
  "node_network_rx_bytes": 35918.392817386324,
  "node_network_rx_dropped": 0,
  "node_network_rx_errors": 0,
  "node_network_rx_packets": 157.67565245448117,
  "node_network_total_bytes": 68264.20276554905,
  "node_network_tx_bytes": 32345.80994816272,
  "node_network_tx_dropped": 0,
  "node_network_tx_errors": 0,
  "node_network_tx_packets": 154.21455923431654,
  "node_number_of_running_containers": 16,
  "node_number_of_running_pods": 13
}
```

**Type: NodeFS**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "CloudWatchMetrics": [
    {
      "Metrics": [
        {
          "Unit": "Percent",
          "Name": "node_filesystem_utilization"
        }
      ],
      "Dimensions": [
        [
          "NodeName",
          "InstanceId",
          "ClusterName"
        ],
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    }
  ],
  "ClusterName": "myCICluster",
  "EBSVolumeId": "aws://us-west-2b/vol-0a53108976d4a2fda",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "Sources": [
    "cadvisor",
    "calculated"
  ],
  "Timestamp": "1567097939726",
  "Type": "NodeFS",
  "Version": "0",
  "device": "/dev/nvme0n1p1",
  "fstype": "vfs",
  "kubernetes": {
    "host": "ip-192-168-75-26.us-west-2.compute.internal"
  },
  "node_filesystem_available": 17298395136,
  "node_filesystem_capacity": 21462233088,
  "node_filesystem_inodes": 10484720,
  "node_filesystem_inodes_free": 10367158,
  "node_filesystem_usage": 4163837952,
  "node_filesystem_utilization": 19.400767547940255
}
```

**Type: NodeDiskIO**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "ClusterName": "myCICluster",
  "EBSVolumeId": "aws://us-west-2b/vol-0a53108976d4a2fda",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "Sources": [
    "cadvisor"
  ],
  "Timestamp": "1567096928131",
  "Type": "NodeDiskIO",
  "Version": "0",
  "device": "/dev/nvme0n1",
  "kubernetes": {
    "host": "ip-192-168-75-26.us-west-2.compute.internal"
  },
  "node_diskio_io_service_bytes_async": 9750.505814277016,
  "node_diskio_io_service_bytes_read": 0,
  "node_diskio_io_service_bytes_sync": 230.6174506688036,
  "node_diskio_io_service_bytes_total": 9981.123264945818,
  "node_diskio_io_service_bytes_write": 9981.123264945818,
  "node_diskio_io_serviced_async": 1.153087253344018,
  "node_diskio_io_serviced_read": 0,
  "node_diskio_io_serviced_sync": 0.03603397666700056,
  "node_diskio_io_serviced_total": 1.1891212300110185,
  "node_diskio_io_serviced_write": 1.1891212300110185
}
```

**Type: NodeNet**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "ClusterName": "myCICluster",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "Sources": [
    "cadvisor",
    "calculated"
  ],
  "Timestamp": "1567096928131",
  "Type": "NodeNet",
  "Version": "0",
  "interface": "eni972f6bfa9a0",
  "kubernetes": {
    "host": "ip-192-168-75-26.us-west-2.compute.internal"
  },
  "node_interface_network_rx_bytes": 3163.008420864309,
  "node_interface_network_rx_dropped": 0,
  "node_interface_network_rx_errors": 0,
  "node_interface_network_rx_packets": 16.575629266820258,
  "node_interface_network_total_bytes": 3518.3935157426017,
  "node_interface_network_tx_bytes": 355.385094878293,
  "node_interface_network_tx_dropped": 0,
  "node_interface_network_tx_errors": 0,
  "node_interface_network_tx_packets": 3.9997714100370625
}
```

**Type: Pod**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "CloudWatchMetrics": [
    {
      "Metrics": [
        {
          "Unit": "Percent",
          "Name": "pod_cpu_utilization"
        },
        {
          "Unit": "Percent",
          "Name": "pod_memory_utilization"
        },
        {
          "Unit": "Bytes/Second",
          "Name": "pod_network_rx_bytes"
        },
        {
          "Unit": "Bytes/Second",
          "Name": "pod_network_tx_bytes"
        },
        {
          "Unit": "Percent",
          "Name": "pod_cpu_utilization_over_pod_limit"
        },
        {
          "Unit": "Percent",
          "Name": "pod_memory_utilization_over_pod_limit"
        }
      ],
      "Dimensions": [
        [
          "PodName",
          "Namespace",
          "ClusterName"
        ],
        [
          "Service",
          "Namespace",
          "ClusterName"
        ],
        [
          "Namespace",
          "ClusterName"
        ],
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    },
    {
      "Metrics": [
        {
          "Unit": "Percent",
          "Name": "pod_cpu_reserved_capacity"
        },
        {
          "Unit": "Percent",
          "Name": "pod_memory_reserved_capacity"
        }
      ],
      "Dimensions": [
        [
          "PodName",
          "Namespace",
          "ClusterName"
        ],
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    },
    {
      "Metrics": [
        {
          "Unit": "Count",
          "Name": "pod_number_of_container_restarts"
        }
      ],
      "Dimensions": [
        [
          "PodName",
          "Namespace",
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    }
  ],
  "ClusterName": "myCICluster",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "Namespace": "amazon-cloudwatch",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "PodName": "cloudwatch-agent-statsd",
  "Service": "cloudwatch-agent-statsd",
  "Sources": [
    "cadvisor",
    "pod",
    "calculated"
  ],
  "Timestamp": "1567097351092",
  "Type": "Pod",
  "Version": "0",
  "kubernetes": {
    "host": "ip-192-168-75-26.us-west-2.compute.internal",
    "labels": {
      "app": "cloudwatch-agent-statsd",
      "pod-template-hash": "df44f855f"
    },
    "namespace_name": "amazon-cloudwatch",
    "pod_id": "2f4ff5ac-c813-11e9-a31d-06e9dde32928",
    "pod_name": "cloudwatch-agent-statsd-df44f855f-ts4q2",
    "pod_owners": [
      {
        "owner_kind": "Deployment",
        "owner_name": "cloudwatch-agent-statsd"
      }
    ],
    "service_name": "cloudwatch-agent-statsd"
  },
  "pod_cpu_limit": 200,
  "pod_cpu_request": 200,
  "pod_cpu_reserved_capacity": 5,
  "pod_cpu_usage_system": 1.4504841104992765,
  "pod_cpu_usage_total": 5.817016867430125,
  "pod_cpu_usage_user": 1.1281543081661038,
  "pod_cpu_utilization": 0.14542542168575312,
  "pod_cpu_utilization_over_pod_limit": 2.9085084337150624,
  "pod_memory_cache": 8192,
  "pod_memory_failcnt": 0,
  "pod_memory_hierarchical_pgfault": 0,
  "pod_memory_hierarchical_pgmajfault": 0,
  "pod_memory_limit": 104857600,
  "pod_memory_mapped_file": 0,
  "pod_memory_max_usage": 25268224,
  "pod_memory_pgfault": 0,
  "pod_memory_pgmajfault": 0,
  "pod_memory_request": 104857600,
  "pod_memory_reserved_capacity": 0.6307275170893897,
  "pod_memory_rss": 22777856,
  "pod_memory_swap": 0,
  "pod_memory_usage": 25141248,
  "pod_memory_utilization": 0.10988455961791709,
  "pod_memory_utilization_over_pod_limit": 17.421875,
  "pod_memory_working_set": 18268160,
  "pod_network_rx_bytes": 9880.697124714186,
  "pod_network_rx_dropped": 0,
  "pod_network_rx_errors": 0,
  "pod_network_rx_packets": 107.80005532263283,
  "pod_network_total_bytes": 10158.829201483635,
  "pod_network_tx_bytes": 278.13207676944796,
  "pod_network_tx_dropped": 0,
  "pod_network_tx_errors": 0,
  "pod_network_tx_packets": 1.146027574644318,
  "pod_number_of_container_restarts": 0,
  "pod_number_of_containers": 1,
  "pod_number_of_running_containers": 1,
  "pod_status": "Running"
}
```

**Type: PodNet**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "ClusterName": "myCICluster",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "Namespace": "amazon-cloudwatch",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "PodName": "cloudwatch-agent-statsd",
  "Service": "cloudwatch-agent-statsd",
  "Sources": [
    "cadvisor",
    "calculated"
  ],
  "Timestamp": "1567097351092",
  "Type": "PodNet",
  "Version": "0",
  "interface": "eth0",
  "kubernetes": {
    "host": "ip-192-168-75-26.us-west-2.compute.internal",
    "labels": {
      "app": "cloudwatch-agent-statsd",
      "pod-template-hash": "df44f855f"
    },
    "namespace_name": "amazon-cloudwatch",
    "pod_id": "2f4ff5ac-c813-11e9-a31d-06e9dde32928",
    "pod_name": "cloudwatch-agent-statsd-df44f855f-ts4q2",
    "pod_owners": [
      {
        "owner_kind": "Deployment",
        "owner_name": "cloudwatch-agent-statsd"
      }
    ],
    "service_name": "cloudwatch-agent-statsd"
  },
  "pod_interface_network_rx_bytes": 9880.697124714186,
  "pod_interface_network_rx_dropped": 0,
  "pod_interface_network_rx_errors": 0,
  "pod_interface_network_rx_packets": 107.80005532263283,
  "pod_interface_network_total_bytes": 10158.829201483635,
  "pod_interface_network_tx_bytes": 278.13207676944796,
  "pod_interface_network_tx_dropped": 0,
  "pod_interface_network_tx_errors": 0,
  "pod_interface_network_tx_packets": 1.146027574644318
}
```

**Type: Container**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-sample",
  "ClusterName": "myCICluster",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "Namespace": "amazon-cloudwatch",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "PodName": "cloudwatch-agent-statsd",
  "Service": "cloudwatch-agent-statsd",
  "Sources": [
    "cadvisor",
    "pod",
    "calculated"
  ],
  "Timestamp": "1567097399912",
  "Type": "Container",
  "Version": "0",
  "container_cpu_limit": 200,
  "container_cpu_request": 200,
  "container_cpu_usage_system": 1.87958283771964,
  "container_cpu_usage_total": 6.159993652997942,
  "container_cpu_usage_user": 1.6707403001952357,
  "container_cpu_utilization": 0.15399984132494854,
  "container_memory_cache": 8192,
  "container_memory_failcnt": 0,
  "container_memory_hierarchical_pgfault": 0,
  "container_memory_hierarchical_pgmajfault": 0,
  "container_memory_limit": 104857600,
  "container_memory_mapped_file": 0,
  "container_memory_max_usage": 24580096,
  "container_memory_pgfault": 0,
  "container_memory_pgmajfault": 0,
  "container_memory_request": 104857600,
  "container_memory_rss": 22736896,
  "container_memory_swap": 0,
  "container_memory_usage": 24453120,
  "container_memory_utilization": 0.10574541028701798,
  "container_memory_working_set": 17580032,
  "container_status": "Running",
  "kubernetes": {
    "container_name": "cloudwatch-agent",
    "docker": {
      "container_id": "8967b6b37da239dfad197c9fdea3e5dfd35a8a759ec86e2e4c3f7b401e232706"
    },
    "host": "ip-192-168-75-26.us-west-2.compute.internal",
    "labels": {
      "app": "cloudwatch-agent-statsd",
      "pod-template-hash": "df44f855f"
    },
    "namespace_name": "amazon-cloudwatch",
    "pod_id": "2f4ff5ac-c813-11e9-a31d-06e9dde32928",
    "pod_name": "cloudwatch-agent-statsd-df44f855f-ts4q2",
    "pod_owners": [
      {
        "owner_kind": "Deployment",
        "owner_name": "cloudwatch-agent-statsd"
      }
    ],
    "service_name": "cloudwatch-agent-statsd"
  },
  "number_of_container_restarts": 0
}
```

**Type: ContainerFS**

```
{
  "AutoScalingGroupName": "eksctl-myCICluster-nodegroup-standard-workers-NodeGroup-1174PV2WHZAYU",
  "ClusterName": "myCICluster",
  "EBSVolumeId": "aws://us-west-2b/vol-0a53108976d4a2fda",
  "InstanceId": "i-1234567890123456",
  "InstanceType": "t3.xlarge",
  "Namespace": "amazon-cloudwatch",
  "NodeName": "ip-192-0-2-0.us-west-2.compute.internal",
  "PodName": "cloudwatch-agent-statsd",
  "Service": "cloudwatch-agent-statsd",
  "Sources": [
    "cadvisor",
    "calculated"
  ],
  "Timestamp": "1567097399912",
  "Type": "ContainerFS",
  "Version": "0",
  "device": "/dev/nvme0n1p1",
  "fstype": "vfs",
  "kubernetes": {
    "container_name": "cloudwatch-agent",
    "docker": {
      "container_id": "8967b6b37da239dfad197c9fdea3e5dfd35a8a759ec86e2e4c3f7b401e232706"
    },
    "host": "ip-192-168-75-26.us-west-2.compute.internal",
    "labels": {
      "app": "cloudwatch-agent-statsd",
      "pod-template-hash": "df44f855f"
    },
    "namespace_name": "amazon-cloudwatch",
    "pod_id": "2f4ff5ac-c813-11e9-a31d-06e9dde32928",
    "pod_name": "cloudwatch-agent-statsd-df44f855f-ts4q2",
    "pod_owners": [
      {
        "owner_kind": "Deployment",
        "owner_name": "cloudwatch-agent-statsd"
      }
    ],
    "service_name": "cloudwatch-agent-statsd"
  }
}
```

**Type: Cluster**

```
{
  "CloudWatchMetrics": [
    {
      "Metrics": [
        {
          "Unit": "Count",
          "Name": "cluster_node_count"
        },
        {
          "Unit": "Count",
          "Name": "cluster_failed_node_count"
        }
      ],
      "Dimensions": [
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    }
  ],
  "ClusterName": "myCICluster",
  "Sources": [
    "apiserver"
  ],
  "Timestamp": "1567097534160",
  "Type": "Cluster",
  "Version": "0",
  "cluster_failed_node_count": 0,
  "cluster_node_count": 3
}
```

**Type: ClusterService**

```
{
  "CloudWatchMetrics": [
    {
      "Metrics": [
        {
          "Unit": "Count",
          "Name": "service_number_of_running_pods"
        }
      ],
      "Dimensions": [
        [
          "Service",
          "Namespace",
          "ClusterName"
        ],
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    }
  ],
  "ClusterName": "myCICluster",
  "Namespace": "amazon-cloudwatch",
  "Service": "cloudwatch-agent-statsd",
  "Sources": [
    "apiserver"
  ],
  "Timestamp": "1567097534160",
  "Type": "ClusterService",
  "Version": "0",
  "kubernetes": {
    "namespace_name": "amazon-cloudwatch",
    "service_name": "cloudwatch-agent-statsd"
  },
  "service_number_of_running_pods": 1
}
```

**Type: ClusterNamespace**

```
{
  "CloudWatchMetrics": [
    {
      "Metrics": [
        {
          "Unit": "Count",
          "Name": "namespace_number_of_running_pods"
        }
      ],
      "Dimensions": [
        [
          "Namespace",
          "ClusterName"
        ],
        [
          "ClusterName"
        ]
      ],
      "Namespace": "ContainerInsights"
    }
  ],
  "ClusterName": "myCICluster",
  "Namespace": "amazon-cloudwatch",
  "Sources": [
    "apiserver"
  ],
  "Timestamp": "1567097594160",
  "Type": "ClusterNamespace",
  "Version": "0",
  "kubernetes": {
    "namespace_name": "amazon-cloudwatch"
  },
  "namespace_number_of_running_pods": 7
}
```

# Relevant fields in performance log events for Amazon EKS and Kubernetes
<a name="Container-Insights-reference-performance-entries-EKS"></a>

For Amazon EKS and Kubernetes, the containerized CloudWatch agent emits data as performance log events. This enables CloudWatch to ingest and store high-cardinality data. CloudWatch uses the data in the performance log events to create aggregated CloudWatch metrics at the cluster, node, and pod levels without losing granular detail.

The following table lists the fields in these performance log events that are relevant to the collection of Container Insights metric data. You can use CloudWatch Logs Insights to query for any of these fields to collect data or investigate issues. For more information, see [Analyze Log Data With CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html).
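
As an illustration, the following Python sketch (with hypothetical event values, not taken from a real cluster) mimics what a Logs Insights query such as `filter Type = "Node" | stats avg(node_cpu_utilization) by NodeName` computes over Node-type performance log events:

```python
import json

# Hypothetical performance log events, abridged to the fields used here.
raw_events = [
    '{"Type": "Node", "NodeName": "node-1", "node_cpu_utilization": 3.5}',
    '{"Type": "Node", "NodeName": "node-1", "node_cpu_utilization": 4.5}',
    '{"Type": "Pod", "PodName": "web", "pod_cpu_utilization": 0.9}',
]

# Equivalent of: filter Type = "Node" | stats avg(node_cpu_utilization) by NodeName
samples = {}
for raw in raw_events:
    event = json.loads(raw)
    if event.get("Type") != "Node":
        continue  # only Node records carry node_cpu_utilization
    samples.setdefault(event["NodeName"], []).append(event["node_cpu_utilization"])

averages = {name: sum(values) / len(values) for name, values in samples.items()}
print(averages)  # {'node-1': 4.0}
```

Running the real query in CloudWatch Logs Insights operates on the same JSON fields, so any field in the table below can be aggregated the same way.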


| Type | Log field | Source | Formula or notes | 
| --- | --- | --- | --- | 
|  Pod |  `pod_cpu_utilization`  |  Calculated  |  Formula: `pod_cpu_usage_total / node_cpu_limit`  | 
|  Pod |  `pod_cpu_usage_total`  |  cadvisor  |  `pod_cpu_usage_total` is reported in millicores.  | 
|  Pod |  `pod_cpu_limit`  |  Calculated  |  Formula: `sum(container_cpu_limit)`. The sum includes already-completed pods. If any containers in the pod don't have a CPU limit defined, this field doesn't appear in the log event. This includes [init containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources).  | 
|  Pod |  `pod_cpu_request`  |  Calculated  |  Formula: `sum(container_cpu_request)`. `container_cpu_request` isn't guaranteed to be set; only containers that set it are included in the sum.  | 
|  Pod |  `pod_cpu_utilization_over_pod_limit`  |  Calculated  |  Formula: `pod_cpu_usage_total / pod_cpu_limit`  | 
|  Pod |  `pod_cpu_reserved_capacity`  |  Calculated  |  Formula: `pod_cpu_request / node_cpu_limit`  | 
|  Pod |  `pod_memory_utilization`  |  Calculated  |  Formula: `pod_memory_working_set / node_memory_limit`. This is the percentage of the node's memory limit used by the pod.  | 
|  Pod |  `pod_memory_working_set`  |  cadvisor  |   | 
|  Pod |  `pod_memory_limit`  |  Calculated  |  Formula: `sum(container_memory_limit)`. If any containers in the pod don't have a memory limit defined, this field doesn't appear in the log event. This includes [init containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources).  | 
|  Pod |  `pod_memory_request`  |  Calculated  |  Formula: `sum(container_memory_request)`. `container_memory_request` isn't guaranteed to be set; only containers that set it are included in the sum.  | 
|  Pod |  `pod_memory_utilization_over_pod_limit`  |  Calculated  |  Formula: `pod_memory_working_set / pod_memory_limit`. If any containers in the pod don't have a memory limit defined, this field doesn't appear in the log event. This includes [init containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources).  | 
|  Pod |  `pod_memory_reserved_capacity`  |  Calculated  |  Formula: `pod_memory_request / node_memory_limit`  | 
|  Pod |  `pod_network_tx_bytes`  |  Calculated  |  Formula: `sum(pod_interface_network_tx_bytes)`. This data is available for all the network interfaces per pod. The CloudWatch agent calculates the total and adds metric extraction rules.  | 
|  Pod |  `pod_network_rx_bytes`  |  Calculated  |  Formula: `sum(pod_interface_network_rx_bytes)`  | 
|  Pod |  `pod_network_total_bytes`  |  Calculated  |  Formula: `pod_network_rx_bytes + pod_network_tx_bytes`  | 
|  PodNet |  `pod_interface_network_rx_bytes`  |  cadvisor  | Network bytes received per second on one pod network interface.  | 
|  PodNet |  `pod_interface_network_tx_bytes`  |  cadvisor  | Network bytes transmitted per second on one pod network interface. | 
|  Container |  `container_cpu_usage_total`  |  cadvisor  |   | 
|  Container |  `container_cpu_limit`  |  cadvisor  |  Not guaranteed to be set. It's not emitted if it's not set. | 
|  Container |  `container_cpu_request`  |  cadvisor  |  Not guaranteed to be set. It's not emitted if it's not set. | 
|  Container |  `container_memory_working_set`  |  cadvisor  |   | 
|  Container |  `container_memory_limit`  |  pod  |  Not guaranteed to be set. It's not emitted if it's not set. | 
|  Container |  `container_memory_request`  |  pod  |  Not guaranteed to be set. It's not emitted if it's not set. | 
|  Node |  `node_cpu_utilization`  |  Calculated  |  Formula: `node_cpu_usage_total / node_cpu_limit`  | 
|  Node |  `node_cpu_usage_total`  |  cadvisor  |   | 
|  Node |  `node_cpu_limit`  |  /proc  |   | 
|  Node |  `node_cpu_request`  |  Calculated  | Formula: `sum(pod_cpu_request)`. For CronJobs, `node_cpu_request` also includes requests from completed pods, which can lead to a high value for `node_cpu_reserved_capacity`.  | 
|  Node |  `node_cpu_reserved_capacity`  |  Calculated  | Formula: `node_cpu_request / node_cpu_limit`  | 
|  Node |  `node_memory_utilization`  |  Calculated  | Formula: `node_memory_working_set / node_memory_limit`  | 
|  Node |  `node_memory_working_set`  |  cadvisor  |   | 
|  Node |  `node_memory_limit`  |  /proc  |   | 
|  Node |  `node_memory_request`  |  Calculated  |  Formula: `sum(pod_memory_request)`  | 
|  Node |  `node_memory_reserved_capacity`  |  Calculated  | Formula: `node_memory_request / node_memory_limit`  | 
|  Node |  `node_network_rx_bytes`  |  Calculated  | Formula: `sum(node_interface_network_rx_bytes)`  | 
|  Node |  `node_network_tx_bytes`  |  Calculated  | Formula: `sum(node_interface_network_tx_bytes)`  | 
|  Node |  `node_network_total_bytes`  |  Calculated  | Formula: `node_network_rx_bytes + node_network_tx_bytes`  | 
|  Node |  `node_number_of_running_pods`  |  Pod List  |   | 
|  Node |  `node_number_of_running_containers`  |  Pod List  |   | 
|  NodeNet |  `node_interface_network_rx_bytes`  |  cadvisor  |  Network bytes received per second on one worker node network interface.  | 
|  NodeNet |  `node_interface_network_tx_bytes`  |  cadvisor  |  Network bytes transmitted per second on one worker node network interface.  | 
|  NodeFS |  `node_filesystem_capacity`  |  cadvisor  |   | 
|  NodeFS |  `node_filesystem_usage`  |  cadvisor  |   | 
|  NodeFS |  `node_filesystem_utilization`  |  Calculated  |  Formula: `node_filesystem_usage / node_filesystem_capacity`. This data is available per device name.  | 
|  Cluster |  `cluster_failed_node_count`  |  API Server  |   | 
|  Cluster |  `cluster_node_count`  |  API Server  |   | 
|  Service |  `service_number_of_running_pods`  |  API Server  |   | 
|  Namespace |  `namespace_number_of_running_pods`  |  API Server  |   | 
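
To make the pod-level memory formulas concrete, here is a small Python sketch (an illustration, not the agent's implementation) of how the sums behave when a container is missing a limit. The first printed value reproduces `pod_memory_utilization_over_pod_limit` = 17.421875 from the sample Pod log event shown earlier:

```python
def pod_memory_limit(container_limits):
    """Sum of container memory limits in bytes.

    Mirrors the documented behavior: if any container in the pod
    (including init containers) has no memory limit, the field is
    omitted from the log event, modeled here as None.
    """
    if any(limit is None for limit in container_limits):
        return None
    return sum(container_limits)


def pod_memory_utilization_over_pod_limit(working_set, container_limits):
    limit = pod_memory_limit(container_limits)
    if limit is None:
        return None  # metric can't be computed without a pod limit
    return 100 * working_set / limit


# Values from the sample Pod log event (one container, 100 MiB limit):
print(pod_memory_utilization_over_pod_limit(18268160, [104857600]))  # 17.421875
print(pod_memory_limit([104857600, None]))  # None: one container has no limit
```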

## Metrics calculation examples
<a name="Container-Insights-calculation-examples"></a>

This section includes examples that show how some of the values in the preceding table are calculated.

Suppose that you have a cluster in the following state.

```
Node1
   node_cpu_limit = 4
   node_cpu_usage_total = 3
   
   Pod1
     pod_cpu_usage_total = 2
     
     Container1
        container_cpu_limit = 1
        container_cpu_request = 1
        container_cpu_usage_total = 0.8
        
     Container2
        container_cpu_limit = null
        container_cpu_request = null
        container_cpu_usage_total = 1.2
        
   Pod2
     pod_cpu_usage_total = 0.4
     
     Container3
        container_cpu_limit = 1
        container_cpu_request = 0.5
        container_cpu_usage_total = 0.4
        
Node2
   node_cpu_limit = 8
   node_cpu_usage_total = 1.5
   
   Pod3
     pod_cpu_usage_total = 1
     
     Container4
        container_cpu_limit = 2
        container_cpu_request = 2
        container_cpu_usage_total = 1
```

The following table shows how pod CPU metrics are calculated using this data.


| Metric | Formula | Pod1 | Pod2 | Pod3 | 
| --- | --- | --- | --- | --- | 
|  `pod_cpu_utilization` |  `pod_cpu_usage_total / node_cpu_limit`  |  2 / 4 = 50%  |  0.4 / 4 = 10%  |  1 / 8 = 12.5%  | 
|  `pod_cpu_utilization_over_pod_limit` |  `pod_cpu_usage_total / sum(container_cpu_limit)`  |  N/A because CPU limit for `Container2` isn't defined  |  0.4 / 1 = 40%  |  1 / 2 = 50%  | 
|  `pod_cpu_reserved_capacity` |  `sum(container_cpu_request) / node_cpu_limit`  |  (1 + 0) / 4 = 25%  |  0.5 / 4 = 12.5%  |  2 / 8 = 25%  | 

The following table shows how node CPU metrics are calculated using this data.


| Metric | Formula | Node1 | Node2 | 
| --- | --- | --- | --- | 
|  `node_cpu_utilization` |  `node_cpu_usage_total / node_cpu_limit`  |  3 / 4 = 75%  |  1.5 / 8 = 18.75%  | 
|  `node_cpu_reserved_capacity` |  `sum(pod_cpu_request) / node_cpu_limit`  |  1.5 / 4 = 37.5%  |  2 / 8 = 25%  | 
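
The same arithmetic can be checked in a few lines of Python (an illustrative sketch over the example cluster state above, not agent code); CPU quantities are in cores and results are percentages:

```python
# Cluster state from the example above; None marks an undefined limit or request.
nodes = {
    "Node1": {"cpu_limit": 4, "cpu_usage_total": 3},
    "Node2": {"cpu_limit": 8, "cpu_usage_total": 1.5},
}
pods = {
    "Pod1": {"node": "Node1", "cpu_usage_total": 2,   "limits": [1, None], "requests": [1, None]},
    "Pod2": {"node": "Node1", "cpu_usage_total": 0.4, "limits": [1],       "requests": [0.5]},
    "Pod3": {"node": "Node2", "cpu_usage_total": 1,   "limits": [2],       "requests": [2]},
}

def pod_cpu_utilization(pod):
    return 100 * pod["cpu_usage_total"] / nodes[pod["node"]]["cpu_limit"]

def pod_cpu_utilization_over_pod_limit(pod):
    if any(limit is None for limit in pod["limits"]):
        return None  # N/A: a container is missing a CPU limit
    return 100 * pod["cpu_usage_total"] / sum(pod["limits"])

def pod_cpu_reserved_capacity(pod):
    requested = sum(r for r in pod["requests"] if r is not None)
    return 100 * requested / nodes[pod["node"]]["cpu_limit"]

def node_cpu_utilization(name):
    return 100 * nodes[name]["cpu_usage_total"] / nodes[name]["cpu_limit"]

def node_cpu_reserved_capacity(name):
    requested = sum(
        r
        for pod in pods.values() if pod["node"] == name
        for r in pod["requests"] if r is not None
    )
    return 100 * requested / nodes[name]["cpu_limit"]

print(pod_cpu_utilization(pods["Pod1"]))                 # 50.0
print(pod_cpu_utilization_over_pod_limit(pods["Pod1"]))  # None (no limit on Container2)
print(node_cpu_reserved_capacity("Node1"))               # 37.5
```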

# Container Insights Prometheus metrics monitoring
<a name="ContainerInsights-Prometheus"></a>

CloudWatch Container Insights monitoring for Prometheus automates the discovery of Prometheus metrics from containerized systems and workloads. Prometheus is an open-source systems monitoring and alerting toolkit. For more information, see [What is Prometheus?](https://prometheus.io/docs/introduction/overview/) in the Prometheus documentation.

Discovering Prometheus metrics is supported for [Amazon Elastic Container Service](https://aws.amazon.com/ecs/), [Amazon Elastic Kubernetes Service](https://aws.amazon.com/eks/), and [Kubernetes](https://aws.amazon.com/kubernetes/) clusters running on Amazon EC2 instances. The Prometheus counter, gauge, and summary metric types are collected.

For Amazon ECS and Amazon EKS clusters, both the EC2 and Fargate launch types are supported. Container Insights automatically collects metrics from several workloads, and you can configure it to collect metrics from any workload.

You can adopt Prometheus as an open-source and open-standard method to ingest custom metrics in CloudWatch. The CloudWatch agent with Prometheus support discovers and collects Prometheus metrics to monitor, troubleshoot, and alarm on application performance degradation and failures faster. This also reduces the number of monitoring tools required to improve observability.

Container Insights Prometheus support involves pay-per-use of metrics and logs, including collecting, storing, and analyzing. For more information, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

**Pre-built dashboards for some workloads**

The Container Insights Prometheus solution includes pre-built dashboards for the popular workloads that are listed in this section. For sample configurations for these workloads, see [(Optional) Set up sample containerized Amazon ECS workloads for Prometheus metric testing](ContainerInsights-Prometheus-Sample-Workloads-ECS.md) and [(Optional) Set up sample containerized Amazon EKS workloads for Prometheus metric testing](ContainerInsights-Prometheus-Sample-Workloads.md).

You can also configure Container Insights to collect Prometheus metrics from other containerized services and applications by editing the agent configuration file.

Workloads with pre-built dashboards for Amazon EKS clusters and Kubernetes clusters running on Amazon EC2 instances:
+ AWS App Mesh
+ NGINX
+ Memcached
+ Java/JMX
+ HAProxy

Workloads with pre-built dashboards for Amazon ECS clusters:
+ AWS App Mesh
+ Java/JMX
+ NGINX
+ NGINX Plus

# Set up and configure Prometheus metrics collection on Amazon ECS clusters
<a name="ContainerInsights-Prometheus-Setup-ECS"></a>

To collect Prometheus metrics from Amazon ECS clusters, you can use the CloudWatch agent as a collector or use the AWS Distro for OpenTelemetry collector. For information about using the AWS Distro for OpenTelemetry collector, see [https://aws-otel.github.io/docs/getting-started/container-insights/ecs-prometheus](https://aws-otel.github.io/docs/getting-started/container-insights/ecs-prometheus).

The following sections explain how to use the CloudWatch agent as the collector to retrieve Prometheus metrics. You install the CloudWatch agent with Prometheus monitoring on clusters running Amazon ECS, and you can optionally configure the agent to scrape additional targets. These sections also provide optional tutorials for setting up sample workloads to use for testing with Prometheus monitoring. 

Container Insights on Amazon ECS supports the following launch type and network mode combinations for Prometheus metrics:


| Amazon ECS launch type | Network modes supported | 
| --- | --- | 
|  EC2 (Linux)  |  bridge, host, and awsvpc  | 
|  Fargate  |  awsvpc  | 

**VPC security group requirements**

The ingress rules of the security groups for the Prometheus workloads must open the Prometheus ports to the CloudWatch agent so that the agent can scrape the Prometheus metrics over the private IP.

The egress rules of the security group for the CloudWatch agent must allow the agent to connect to the Prometheus workloads' ports over the private IP.

**Topics**
+ [Install the CloudWatch agent with Prometheus metrics collection on Amazon ECS clusters](ContainerInsights-Prometheus-install-ECS.md)
+ [Scraping additional Prometheus sources and importing those metrics](ContainerInsights-Prometheus-Setup-configure-ECS.md)
+ [(Optional) Set up sample containerized Amazon ECS workloads for Prometheus metric testing](ContainerInsights-Prometheus-Sample-Workloads-ECS.md)

# Install the CloudWatch agent with Prometheus metrics collection on Amazon ECS clusters
<a name="ContainerInsights-Prometheus-install-ECS"></a>

This section explains how to set up the CloudWatch agent with Prometheus monitoring in a cluster running Amazon ECS. After you do this, the agent automatically scrapes and imports metrics for the following workloads running in that cluster.
+ AWS App Mesh
+ Java/JMX

You can also configure the agent to scrape and import metrics from additional Prometheus workloads and sources.

## Set up IAM roles
<a name="ContainerInsights-Prometheus-Setup-ECS-IAM"></a>

You need two IAM roles for the CloudWatch agent task definition. If you specify **CreateIAMRoles=True** in the CloudFormation stack to have Container Insights create these roles for you, the roles will be created with the correct permissions. If you want to create them yourself or use existing roles, the following roles and permissions are required.
+ **CloudWatch agent ECS task role**— The CloudWatch agent container uses this role. It must include the **CloudWatchAgentServerPolicy** policy and a customer-managed policy that contains the following read-only permissions:
  + `ec2:DescribeInstances`
  + `ecs:ListTasks`
  + `ecs:ListServices`
  + `ecs:DescribeContainerInstances`
  + `ecs:DescribeServices`
  + `ecs:DescribeTasks`
  + `ecs:DescribeTaskDefinition`
+ **CloudWatch agent ECS task execution role**— This is the role that Amazon ECS requires to launch and execute your containers. Ensure that your task execution role has the **AmazonSSMReadOnlyAccess**, **AmazonECSTaskExecutionRolePolicy**, and **CloudWatchAgentServerPolicy** policies attached. If you want to store more sensitive data for Amazon ECS to use, see [Specifying sensitive data](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html).
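
If you create the task role's customer-managed policy yourself, its policy document could look like the following sketch, built in Python for consistency with the other examples. The action list comes from the permissions above; `"Resource": "*"` is an assumption, and you can scope it down as your environment requires.

```python
import json

# Customer-managed policy document with the read-only actions listed above.
# "Resource": "*" is an assumption; restrict it if your environment allows.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ecs:ListTasks",
                "ecs:ListServices",
                "ecs:DescribeContainerInstances",
                "ecs:DescribeServices",
                "ecs:DescribeTasks",
                "ecs:DescribeTaskDefinition",
            ],
            "Resource": "*",
        }
    ],
}

# Emit the JSON you would paste into the IAM console or pass to the CLI.
print(json.dumps(policy, indent=2))
```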

## Install the CloudWatch agent with Prometheus monitoring by using CloudFormation
<a name="ContainerInsights-Prometheus-Setup-ECS-CFN"></a>

You use AWS CloudFormation to install the CloudWatch agent with Prometheus monitoring for Amazon ECS clusters. The following list shows the parameters you will use in the CloudFormation template.
+ **ECSClusterName**— Specifies the target Amazon ECS cluster.
+ **CreateIAMRoles**— Specify **True** to create new roles for the Amazon ECS task role and Amazon ECS task execution role. Specify **False** to reuse existing roles.
+ **TaskRoleName**— If you specified **True** for **CreateIAMRoles**, this specifies the name to use for the new Amazon ECS task role. If you specified **False** for **CreateIAMRoles**, this specifies the existing role to use as the Amazon ECS task role. 
+ **ExecutionRoleName**— If you specified **True** for **CreateIAMRoles**, this specifies the name to use for the new Amazon ECS task execution role. If you specified **False** for **CreateIAMRoles**, this specifies the existing role to use as the Amazon ECS task execution role. 
+ **ECSNetworkMode**— If you are using the EC2 launch type, specify the network mode here. It must be either **bridge** or **host**.
+ **ECSLaunchType**— Specify either **fargate** or **EC2**.
+ **SecurityGroupID**— If the **ECSNetworkMode** is **awsvpc**, specify the security group ID here.
+ **SubnetID**— If the **ECSNetworkMode** is **awsvpc**, specify the subnet ID here.

### Command samples
<a name="ContainerInsights-Prometheus-Setup-ECS-CFNcommands"></a>

This section includes sample CloudFormation commands to install Container Insights with Prometheus monitoring in various scenarios.

**Create CloudFormation stack for an Amazon ECS cluster in bridge network mode**

```
export AWS_PROFILE=your_aws_config_profile_eg_default
export AWS_DEFAULT_REGION=your_aws_region_eg_ap-southeast-1
export ECS_CLUSTER_NAME=your_ec2_ecs_cluster_name
export ECS_NETWORK_MODE=bridge
export CREATE_IAM_ROLES=True
export ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
export ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name

curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-bridge-host.yaml

aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
    --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
    --parameters ParameterKey=ECSClusterName,ParameterValue=${ECS_CLUSTER_NAME} \
                 ParameterKey=CreateIAMRoles,ParameterValue=${CREATE_IAM_ROLES} \
                 ParameterKey=ECSNetworkMode,ParameterValue=${ECS_NETWORK_MODE} \
                 ParameterKey=TaskRoleName,ParameterValue=${ECS_TASK_ROLE_NAME} \
                 ParameterKey=ExecutionRoleName,ParameterValue=${ECS_EXECUTION_ROLE_NAME} \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${AWS_DEFAULT_REGION} \
    --profile ${AWS_PROFILE}
```

**Create CloudFormation stack for an Amazon ECS cluster in host network mode**

```
export AWS_PROFILE=your_aws_config_profile_eg_default
export AWS_DEFAULT_REGION=your_aws_region_eg_ap-southeast-1
export ECS_CLUSTER_NAME=your_ec2_ecs_cluster_name
export ECS_NETWORK_MODE=host
export CREATE_IAM_ROLES=True
export ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
export ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name


curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-bridge-host.yaml

aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
    --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
    --parameters ParameterKey=ECSClusterName,ParameterValue=${ECS_CLUSTER_NAME} \
                 ParameterKey=CreateIAMRoles,ParameterValue=${CREATE_IAM_ROLES} \
                 ParameterKey=ECSNetworkMode,ParameterValue=${ECS_NETWORK_MODE} \
                 ParameterKey=TaskRoleName,ParameterValue=${ECS_TASK_ROLE_NAME} \
                 ParameterKey=ExecutionRoleName,ParameterValue=${ECS_EXECUTION_ROLE_NAME} \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${AWS_DEFAULT_REGION} \
    --profile ${AWS_PROFILE}
```

**Create CloudFormation stack for an Amazon ECS cluster in awsvpc network mode**

```
export AWS_PROFILE=your_aws_config_profile_eg_default
export AWS_DEFAULT_REGION=your_aws_region_eg_ap-southeast-1
export ECS_CLUSTER_NAME=your_ec2_ecs_cluster_name
export ECS_LAUNCH_TYPE=EC2
export CREATE_IAM_ROLES=True
export ECS_CLUSTER_SECURITY_GROUP=your_security_group_eg_sg-xxxxxxxxxx
export ECS_CLUSTER_SUBNET=your_subnet_eg_subnet-xxxxxxxxxx
export ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
export ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name

curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-awsvpc.yaml

aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-${ECS_LAUNCH_TYPE}-awsvpc \
    --template-body file://cwagent-ecs-prometheus-metric-for-awsvpc.yaml \
    --parameters ParameterKey=ECSClusterName,ParameterValue=${ECS_CLUSTER_NAME} \
                 ParameterKey=CreateIAMRoles,ParameterValue=${CREATE_IAM_ROLES} \
                 ParameterKey=ECSLaunchType,ParameterValue=${ECS_LAUNCH_TYPE} \
                 ParameterKey=SecurityGroupID,ParameterValue=${ECS_CLUSTER_SECURITY_GROUP} \
                 ParameterKey=SubnetID,ParameterValue=${ECS_CLUSTER_SUBNET} \
                 ParameterKey=TaskRoleName,ParameterValue=${ECS_TASK_ROLE_NAME} \
                 ParameterKey=ExecutionRoleName,ParameterValue=${ECS_EXECUTION_ROLE_NAME} \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${AWS_DEFAULT_REGION} \
    --profile ${AWS_PROFILE}
```

**Create CloudFormation stack for a Fargate cluster in awsvpc network mode**

```
export AWS_PROFILE=your_aws_config_profile_eg_default
export AWS_DEFAULT_REGION=your_aws_region_eg_ap-southeast-1
export ECS_CLUSTER_NAME=your_ec2_ecs_cluster_name
export ECS_LAUNCH_TYPE=FARGATE
export CREATE_IAM_ROLES=True
export ECS_CLUSTER_SECURITY_GROUP=your_security_group_eg_sg-xxxxxxxxxx
export ECS_CLUSTER_SUBNET=your_subnet_eg_subnet-xxxxxxxxxx
export ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
export ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name            

curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-awsvpc.yaml

aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-${ECS_LAUNCH_TYPE}-awsvpc \
    --template-body file://cwagent-ecs-prometheus-metric-for-awsvpc.yaml \
    --parameters ParameterKey=ECSClusterName,ParameterValue=${ECS_CLUSTER_NAME} \
                 ParameterKey=CreateIAMRoles,ParameterValue=${CREATE_IAM_ROLES} \
                 ParameterKey=ECSLaunchType,ParameterValue=${ECS_LAUNCH_TYPE} \
                 ParameterKey=SecurityGroupID,ParameterValue=${ECS_CLUSTER_SECURITY_GROUP} \
                 ParameterKey=SubnetID,ParameterValue=${ECS_CLUSTER_SUBNET} \
                 ParameterKey=TaskRoleName,ParameterValue=${ECS_TASK_ROLE_NAME} \
                 ParameterKey=ExecutionRoleName,ParameterValue=${ECS_EXECUTION_ROLE_NAME} \
    --capabilities CAPABILITY_NAMED_IAM \
    --region ${AWS_DEFAULT_REGION} \
    --profile ${AWS_PROFILE}
```

### AWS resources created by the CloudFormation stack
<a name="ContainerInsights-Prometheus-Setup-ECS-resources"></a>

The following table lists the AWS resources that are created when you use CloudFormation to set up Container Insights with Prometheus monitoring on an Amazon ECS cluster.


| Resource type | Resource name | Comments | 
| --- | --- | --- | 
|  AWS::SSM::Parameter  |  AmazonCloudWatch-CWAgentConfig-${ECS_CLUSTER_NAME}-${ECS_LAUNCH_TYPE}-${ECS_NETWORK_MODE}  |  This is the CloudWatch agent configuration, with the default App Mesh and Java/JMX embedded metric format definition.  | 
|  AWS::SSM::Parameter  |  AmazonCloudWatch-PrometheusConfigName-${ECS_CLUSTER_NAME}-${ECS_LAUNCH_TYPE}-${ECS_NETWORK_MODE}  |  This is the Prometheus scraping configuration.  | 
|  AWS::IAM::Role  |  ${ECS_TASK_ROLE_NAME}  |  The Amazon ECS task role. This is created only if you specified **True** for `CREATE_IAM_ROLES`.  | 
|  AWS::IAM::Role  |  ${ECS_EXECUTION_ROLE_NAME}  |  The Amazon ECS task execution role. This is created only if you specified **True** for `CREATE_IAM_ROLES`.  | 
|  AWS::ECS::TaskDefinition  |  cwagent-prometheus-${ECS_CLUSTER_NAME}-${ECS_LAUNCH_TYPE}-${ECS_NETWORK_MODE}  |   | 
|  AWS::ECS::Service  |  cwagent-prometheus-replica-service-${ECS_LAUNCH_TYPE}-${ECS_NETWORK_MODE}  |   | 

### Deleting the CloudFormation stack for the CloudWatch agent with Prometheus monitoring
<a name="ContainerInsights-Prometheus-ECS-delete"></a>

To delete the CloudWatch agent from an Amazon ECS cluster, enter these commands.

```
export AWS_PROFILE=your_aws_config_profile_eg_default
export AWS_DEFAULT_REGION=your_aws_region_eg_ap-southeast-1
export CLOUDFORMATION_STACK_NAME=your_cloudformation_stack_name

aws cloudformation delete-stack \
--stack-name ${CLOUDFORMATION_STACK_NAME} \
--region ${AWS_DEFAULT_REGION} \
--profile ${AWS_PROFILE}
```

# Scraping additional Prometheus sources and importing those metrics
<a name="ContainerInsights-Prometheus-Setup-configure-ECS"></a>

The CloudWatch agent with Prometheus monitoring needs two configurations to scrape the Prometheus metrics. One is the standard Prometheus scrape configuration, as documented in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) in the Prometheus documentation. The other is the CloudWatch agent configuration.

For Amazon ECS clusters, the configurations are stored in the Parameter Store of AWS Systems Manager and passed to the agent as secrets in the Amazon ECS task definition:
+ The secret `PROMETHEUS_CONFIG_CONTENT` is for the Prometheus scrape configuration.
+ The secret `CW_CONFIG_CONTENT` is for the CloudWatch agent configuration. 

To scrape additional Prometheus metrics sources and import those metrics to CloudWatch, you modify both the Prometheus scrape configuration and the CloudWatch agent configuration, and then re-deploy the agent with the updated configuration.
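
As a sketch of how this integration works, the container definition for the CloudWatch agent pulls the two configurations from Parameter Store roughly as follows. The Region, account ID, and parameter names here are placeholders; the CloudFormation stack generates the real names.

```json
"secrets": [
  {
    "name": "PROMETHEUS_CONFIG_CONTENT",
    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/AmazonCloudWatch-PrometheusConfigName-MyCluster-EC2-bridge"
  },
  {
    "name": "CW_CONFIG_CONTENT",
    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/AmazonCloudWatch-CWAgentConfig-MyCluster-EC2-bridge"
  }
]
```

In practice, updating the parameter values in Parameter Store and then forcing a new deployment of the agent service is what re-deploying with the updated configuration amounts to.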

**VPC security group requirements**

The ingress rules of the security groups for the Prometheus workloads must allow the CloudWatch agent to reach the Prometheus ports, so that the agent can scrape the metrics over the private IP.

The egress rules of the security group for the CloudWatch agent must allow the agent to connect to the Prometheus workloads' ports over the private IP.

## Prometheus scrape configuration
<a name="ContainerInsights-Prometheus-Setup-config-global"></a>

The CloudWatch agent supports the standard Prometheus scrape configurations, as documented in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) in the Prometheus documentation. You can edit this section to update the configurations that are already in this file, and to add additional Prometheus scraping targets. By default, the sample configuration file contains the following global configuration lines:

```
global:
  scrape_interval: 1m
  scrape_timeout: 10s
```
+ **scrape_interval**— Defines how frequently to scrape targets.
+ **scrape_timeout**— Defines how long to wait before a scrape request times out.

You can also define different values for these settings at the job level, to override the global configurations.
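
For example, the following hypothetical fragment keeps the global defaults but scrapes one job more aggressively. The job name is a placeholder.

```yaml
global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: fast-scrape-example   # hypothetical job name
    scrape_interval: 15s            # overrides the global 1m for this job only
    scrape_timeout: 5s
    file_sd_configs:
      - files: [ "/tmp/cwagent_ecs_auto_sd.yaml" ]
```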

### Prometheus scraping jobs
<a name="ContainerInsights-Prometheus-Setup-config-scrape"></a>

The CloudWatch agent YAML files already have some default scraping jobs configured. For example, in the YAML files for Amazon ECS such as `cwagent-ecs-prometheus-metric-for-bridge-host.yaml`, the default scraping jobs are configured in the `ecs_service_discovery` section.

```
"ecs_service_discovery": {
                  "sd_frequency": "1m",
                  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
                  "docker_label": {
                  },
                  "task_definition_list": [
                    {
                      "sd_job_name": "ecs-appmesh-colors",
                      "sd_metrics_ports": "9901",
                      "sd_task_definition_arn_pattern": ".*:task-definition\/.*-ColorTeller-(white):[0-9]+",
                      "sd_metrics_path": "/stats/prometheus"
                    },
                    {
                      "sd_job_name": "ecs-appmesh-gateway",
                      "sd_metrics_ports": "9901",
                      "sd_task_definition_arn_pattern": ".*:task-definition/.*-ColorGateway:[0-9]+",
                      "sd_metrics_path": "/stats/prometheus"
                    }
                  ]
                }
```

Each of these default targets is scraped, and the metrics are sent to CloudWatch in log events using embedded metric format. For more information, see [Embedding metrics within logs](CloudWatch_Embedded_Metric_Format.md).

Log events from Amazon ECS clusters are stored in the **/aws/ecs/containerinsights/*cluster_name*/prometheus** log group.

Each scraping job is contained in a different log stream in this log group.

To add a new scraping target, you add a new entry in the `task_definition_list` section under the `ecs_service_discovery` section of the YAML file, and restart the agent. For an example of this process, see [Tutorial for adding a new Prometheus scrape target: Prometheus API Server metrics](ContainerInsights-Prometheus-Setup-configure.md#ContainerInsights-Prometheus-Setup-new-exporters).
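
For example, to scrape a hypothetical workload whose task definition family contains `my-web-app` and whose container exposes Prometheus metrics on port 8080, you could append an entry like the following to `task_definition_list`. The job name, port, and ARN pattern here are placeholders.

```json
{
  "sd_job_name": "my-web-app",
  "sd_metrics_ports": "8080",
  "sd_task_definition_arn_pattern": ".*:task-definition/.*my-web-app.*:[0-9]+",
  "sd_metrics_path": "/metrics"
}
```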

## CloudWatch agent configuration for Prometheus
<a name="ContainerInsights-Prometheus-Setup-cw-agent-config"></a>

The CloudWatch agent configuration file has a `prometheus` section under `metrics_collected` for the Prometheus scraping configuration. It includes the following configuration options:
+ **cluster_name**— Specifies the cluster name to be added as a label in the log event. This field is optional. If you omit it, the agent detects the Amazon ECS cluster name.
+ **log_group_name**— Specifies the log group name for the scraped Prometheus metrics. This field is optional. If you omit it, CloudWatch uses **/aws/ecs/containerinsights/*cluster_name*/prometheus** for logs from Amazon ECS clusters.
+ **prometheus_config_path**— Specifies the Prometheus scrape configuration file path. If the value of this field starts with `env:`, the contents of the Prometheus scrape configuration file are retrieved from the container's environment variable. Do not change this field.
+ **ecs_service_discovery**— The section that specifies the configuration of the Amazon ECS Prometheus target auto-discovery functions. Three modes are supported for discovering the Prometheus targets: discovery based on the container's docker label, discovery based on the Amazon ECS task definition ARN regular expression, and discovery based on the Amazon ECS service name regular expression. You can use the modes together, and the CloudWatch agent de-duplicates the discovered targets based on `{private_ip}:{port}/{metrics_path}`.

  The `ecs_service_discovery` section can contain the following fields:
  + `sd_frequency` is the frequency to discover the Prometheus exporters. Specify a number and a unit suffix. For example, `1m` for once per minute or `30s` for once per 30 seconds. Valid unit suffixes are `ns`, `us`, `ms`, `s`, `m`, and `h`.

    This field is optional. The default is 60 seconds (1 minute).
  + `sd_target_cluster` is the target Amazon ECS cluster name for auto-discovery. This field is optional. The default is the name of the Amazon ECS cluster where the CloudWatch agent is installed. 
  + `sd_cluster_region` is the target Amazon ECS cluster's Region. This field is optional. The default is the Region of the Amazon ECS cluster where the CloudWatch agent is installed.
  + `sd_result_file` is the path of the YAML file for the Prometheus target results. The Prometheus scrape configuration will refer to this file.
  + `docker_label` is an optional section that you can use to specify the configuration for docker label-based service discovery. If you omit this section, docker label-based discovery is not used. This section can contain the following fields:
    + `sd_port_label` is the container's docker label name that specifies the container port for Prometheus metrics. The default value is `ECS_PROMETHEUS_EXPORTER_PORT`. If the container does not have this docker label, the CloudWatch agent will skip it.
    + `sd_metrics_path_label` is the container's docker label name that specifies the Prometheus metrics path. The default value is `ECS_PROMETHEUS_METRICS_PATH`. If the container does not have this docker label, the agent assumes the default path `/metrics`.
    + `sd_job_name_label` is the container's docker label name that specifies the Prometheus scrape job name. The default value is `job`. If the container does not have this docker label, the CloudWatch agent uses the job name in the Prometheus scrape configuration.
  + `task_definition_list` is an optional section that you can use to specify the configuration of task definition-based service discovery. If you omit this section, task definition-based discovery is not used. This section can contain the following fields:
    + `sd_task_definition_arn_pattern` is the pattern to use to specify the Amazon ECS task definitions to discover. This is a regular expression.
    + `sd_metrics_ports` lists the `containerPort` values for the Prometheus metrics. Separate multiple values with semicolons.
    + `sd_container_name_pattern` specifies the Amazon ECS task container names. This is a regular expression.
    + `sd_metrics_path` specifies the Prometheus metrics path. If you omit this field, the agent assumes the default path `/metrics`.
    + `sd_job_name` specifies the Prometheus scrape job name. If you omit this field, the CloudWatch agent uses the job name in the Prometheus scrape configuration.
  + `service_name_list_for_tasks` is an optional section that you can use to specify the configuration of service name-based discovery. If you omit this section, service name-based discovery is not used. This section can contain the following fields:
    + `sd_service_name_pattern` is the pattern to use to specify the Amazon ECS service where tasks are to be discovered. This is a regular expression.
    + `sd_metrics_ports` lists the `containerPort` values for the Prometheus metrics. Separate multiple values with semicolons.
    + `sd_container_name_pattern` specifies the Amazon ECS task container names. This is a regular expression.
    + `sd_metrics_path` specifies the Prometheus metrics path. If you omit this field, the agent assumes the default path `/metrics`.
    + `sd_job_name` specifies the Prometheus scrape job name. If you omit this field, the CloudWatch agent uses the job name in the Prometheus scrape configuration. 
+ **metric_declaration**— Sections that specify the arrays of logs with embedded metric format to be generated. There is a `metric_declaration` section for each Prometheus source that the CloudWatch agent imports from by default. These sections each include the following fields:
  + `label_matcher` is a regular expression that checks the value of the labels listed in `source_labels`. The metrics that match are enabled for inclusion in the embedded metric format sent to CloudWatch. 

    If you have multiple labels specified in `source_labels`, we recommend that you do not use `^` or `$` characters in the regular expression for `label_matcher`.
  + `source_labels` specifies the value of the labels that are checked by the `label_matcher` line.
  + `label_separator` specifies the separator to be used in the `label_matcher` line if multiple `source_labels` are specified. The default is `;`. You can see this default used in the `label_matcher` line in the following example.
  + `metric_selectors` is a regular expression that specifies the metrics to be collected and sent to CloudWatch.
  + `dimensions` is the list of labels to be used as CloudWatch dimensions for each selected metric.

See the following `metric_declaration` example.

```
"metric_declaration": [
  {
     "source_labels":[ "Service", "Namespace"],
     "label_matcher":"(.*node-exporter.*|.*kube-dns.*);kube-system$",
     "dimensions":[
        ["Service", "Namespace"]
     ],
     "metric_selectors":[
        "^coredns_dns_request_type_count_total$"
     ]
  }
]
```

This example configures an embedded metric format section to be sent as a log event if the following conditions are met:
+ The value of `Service` contains either `node-exporter` or `kube-dns`.
+ The value of `Namespace` is `kube-system`.
+ The Prometheus metric `coredns_dns_request_type_count_total` contains both `Service` and `Namespace` labels.
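
A quick way to see how the matching works: the values of the labels in `source_labels` are joined with `label_separator` (default `;`), and the joined string is tested against the `label_matcher` regular expression. The following sketch reproduces that check with `grep`, using the label values from the conditions above:

```shell
# The values of source_labels (Service, Namespace) are joined with the
# default separator ";" and matched against the label_matcher regex.
service="kube-dns"
namespace="kube-system"
joined="${service};${namespace}"

if echo "$joined" | grep -Eq '(.*node-exporter.*|.*kube-dns.*);kube-system$'; then
  echo "match: metric is included in the embedded metric format"
else
  echo "no match: metric is dropped"
fi
```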

The log event that is sent includes the `CloudWatchMetrics` section shown in the following example:

```
{
   "CloudWatchMetrics":[
      {
         "Metrics":[
            {
               "Name":"coredns_dns_request_type_count_total"
            }
         ],
         "Dimensions":[
            [
               "Namespace",
               "Service"
            ]
         ],
         "Namespace":"ContainerInsights/Prometheus"
      }
   ],
   "Namespace":"kube-system",
   "Service":"kube-dns",
   "coredns_dns_request_type_count_total":2562,
   "eks_amazonaws_com_component":"kube-dns",
   "instance":"192.168.61.254:9153",
   "job":"kubernetes-service-endpoints",
   ...
}
```

# Detailed guide for autodiscovery on Amazon ECS clusters
<a name="ContainerInsights-Prometheus-Setup-autodiscovery-ecs"></a>

Prometheus provides dozens of dynamic service-discovery mechanisms, as described in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config). However, there is no built-in service discovery for Amazon ECS. The CloudWatch agent adds this mechanism.

When Amazon ECS Prometheus service discovery is enabled, the CloudWatch agent periodically makes the following API calls to the Amazon ECS and Amazon EC2 endpoints to retrieve the metadata of the running ECS tasks in the target ECS cluster.

```
EC2:DescribeInstances
ECS:ListTasks
ECS:ListServices
ECS:DescribeContainerInstances
ECS:DescribeServices
ECS:DescribeTasks
ECS:DescribeTaskDefinition
```
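
The IAM role that the CloudWatch agent task uses must therefore allow these calls. A minimal policy statement would look roughly like the following sketch; scoping `Resource` down is recommended where the APIs support it:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ecs:ListTasks",
        "ecs:ListServices",
        "ecs:DescribeContainerInstances",
        "ecs:DescribeServices",
        "ecs:DescribeTasks",
        "ecs:DescribeTaskDefinition"
      ],
      "Resource": "*"
    }
  ]
}
```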

The metadata is used by the CloudWatch agent to scan the Prometheus targets within the ECS cluster. The CloudWatch agent supports three service discovery modes:
+ Container docker label-based service discovery
+ ECS task definition ARN regular expression-based service discovery
+ ECS service name regular expression-based service discovery

All three modes can be used together. The CloudWatch agent de-duplicates the discovered targets based on `{private_ip}:{port}/{metrics_path}`.
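
As a sketch with made-up addresses, de-duplication keeps exactly one entry per `{private_ip}:{port}/{metrics_path}` key, even when two discovery modes find the same container:

```shell
# Hypothetical targets discovered by two different modes; the second
# occurrence of 10.6.1.95:32785/metrics is a duplicate and is dropped.
printf '%s\n' \
  "10.6.1.95:32785/metrics" \
  "10.6.3.193:9404/metrics" \
  "10.6.1.95:32785/metrics" | sort -u
```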

All discovered targets are written into a result file specified by the `sd_result_file` configuration field within the CloudWatch agent container. The following is a sample result file: 

```
- targets:
  - 10.6.1.95:32785
  labels:
    __metrics_path__: /metrics
    ECS_PROMETHEUS_EXPORTER_PORT: "9406"
    ECS_PROMETHEUS_JOB_NAME: demo-jar-ec2-bridge-dynamic
    ECS_PROMETHEUS_METRICS_PATH: /metrics
    InstanceType: t3.medium
    LaunchType: EC2
    SubnetId: subnet-123456789012
    TaskDefinitionFamily: demo-jar-ec2-bridge-dynamic-port
    TaskGroup: family:demo-jar-ec2-bridge-dynamic-port
    TaskRevision: "7"
    VpcId: vpc-01234567890
    container_name: demo-jar-ec2-bridge-dynamic-port
    job: demo-jar-ec2-bridge-dynamic
- targets:
  - 10.6.3.193:9404
  labels:
    __metrics_path__: /metrics
    ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_B: "9404"
    ECS_PROMETHEUS_JOB_NAME: demo-tomcat-ec2-bridge-mapped-port
    ECS_PROMETHEUS_METRICS_PATH: /metrics
    InstanceType: t3.medium
    LaunchType: EC2
    SubnetId: subnet-123456789012
    TaskDefinitionFamily: demo-tomcat-ec2-bridge-mapped-port
    TaskGroup: family:demo-jar-tomcat-bridge-mapped-port
    TaskRevision: "12"
    VpcId: vpc-01234567890
    container_name: demo-tomcat-ec2-bridge-mapped-port
    job: demo-tomcat-ec2-bridge-mapped-port
```

You can directly integrate this result file with Prometheus file-based service discovery. For more information, see [file_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config) in the Prometheus documentation.

Suppose the result file is written to `/tmp/cwagent_ecs_auto_sd.yaml`. The following Prometheus scrape configuration consumes it.

```
global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: cwagent-ecs-file-sd-config
    sample_limit: 10000
    file_sd_configs:
      - files: [ "/tmp/cwagent_ecs_auto_sd.yaml" ]
```

The CloudWatch agent also adds the following additional labels for the discovered targets.
+ `container_name`
+ `TaskDefinitionFamily`
+ `TaskRevision`
+ `TaskGroup`
+ `StartedBy`
+ `LaunchType`
+ `job`
+ `__metrics_path__`
+ Docker labels

When the cluster has the EC2 launch type, the following three labels are added.
+ `InstanceType`
+ `VpcId`
+ `SubnetId`

**Note**  
Docker labels that don't match the regular expression `[a-zA-Z_][a-zA-Z0-9_]*` are filtered out. This matches the Prometheus conventions as listed in `label_name` in [Configuration file](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#labelname) in the Prometheus documentation.
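
A quick way to check whether a Docker label name survives this filter is to test it against the same expression. The label names below are examples:

```shell
# Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*; any Docker
# label whose name fails this check is filtered out of the target's labels.
for label in ECS_PROMETHEUS_EXPORTER_PORT com.example.team my_label_1; do
  if echo "$label" | grep -Eq '^[a-zA-Z_][a-zA-Z0-9_]*$'; then
    echo "$label: kept"
  else
    echo "$label: filtered out"
  fi
done
```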

## ECS service discovery configuration examples
<a name="ContainerInsights-Prometheus-Setup-autodiscovery-ecs-examples"></a>

This section includes examples that demonstrate ECS service discovery.

**Example 1**

```
"ecs_service_discovery": {
  "sd_frequency": "1m",
  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
  "docker_label": {
  }
}
```

This example enables docker label-based service discovery. The CloudWatch agent will query the ECS tasks’ metadata once per minute and write the discovered targets into the `/tmp/cwagent_ecs_auto_sd.yaml` file within the CloudWatch agent container.

The default value of `sd_port_label` in the `docker_label` section is `ECS_PROMETHEUS_EXPORTER_PORT`. If any running container in the ECS tasks has a `ECS_PROMETHEUS_EXPORTER_PORT` docker label, the CloudWatch agent uses its value as `container port` to scan all exposed ports of the container. If there is a match, the mapped host port plus the private IP of the container are used to construct the Prometheus exporter target in the following format: `private_ip:host_port`. 

The default value of `sd_metrics_path_label` in the `docker_label` section is `ECS_PROMETHEUS_METRICS_PATH`. If the container has this docker label, its value will be used as the `__metrics_path__` . If the container does not have this label, the default value `/metrics` is used.

The default value of `sd_job_name_label` in the `docker_label` section is `job`. If the container has this docker label, its value will be appended as one of the labels for the target to replace the default job name specified in the Prometheus configuration. The value of this docker label is used as the log stream name in the CloudWatch Logs log group. 

**Example 2**

```
"ecs_service_discovery": {
  "sd_frequency": "15s",
  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
  "docker_label": {
    "sd_port_label": "ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A",
    "sd_job_name_label": "ECS_PROMETHEUS_JOB_NAME"  
  }
}
```

This example enables docker label-based service discovery. The CloudWatch agent will query the ECS tasks’ metadata every 15 seconds and write the discovered targets into the `/tmp/cwagent_ecs_auto_sd.yaml` file within the CloudWatch agent container. Containers with a docker label of `ECS_PROMETHEUS_EXPORTER_PORT_SUBSET_A` will be scanned. The value of the docker label `ECS_PROMETHEUS_JOB_NAME` is used as the job name.

**Example 3**

```
"ecs_service_discovery": {
  "sd_frequency": "5m",
  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
  "task_definition_list": [
    {
      "sd_job_name": "java-prometheus",
      "sd_metrics_path": "/metrics",
      "sd_metrics_ports": "9404; 9406",
      "sd_task_definition_arn_pattern": ".*:task-definition/.*javajmx.*:[0-9]+"
    },
    {
      "sd_job_name": "envoy-prometheus",
      "sd_metrics_path": "/stats/prometheus",
      "sd_container_name_pattern": "^envoy$", 
      "sd_metrics_ports": "9901",
      "sd_task_definition_arn_pattern": ".*:task-definition/.*appmesh.*:23"
    }
  ]
}
```

This example enables ECS task definition ARN regular expression-based service discovery. The CloudWatch agent will query the ECS tasks’ metadata every five minutes and write the discovered targets into the `/tmp/cwagent_ecs_auto_sd.yaml` file within the CloudWatch agent container.

Two task definition ARN regular expression sections are defined:
+ For the first section, the ECS tasks with `javajmx` in their ECS task definition ARN are filtered for the container port scan. If the containers within these ECS tasks expose the container port on 9404 or 9406, the mapped host port along with the private IP of the container are used to create the Prometheus exporter targets. The value of `sd_metrics_path` sets `__metrics_path__` to `/metrics`, so the CloudWatch agent scrapes the Prometheus metrics from `private_ip:host_port/metrics` and sends them to the `java-prometheus` log stream in CloudWatch Logs in the log group `/aws/ecs/containerinsights/cluster_name/prometheus`.
+ For the second section, the ECS tasks with `appmesh` in their ECS task definition ARN and with a task definition revision of `23` are filtered for the container port scan. For containers with a name of `envoy` that expose the container port on 9901, the mapped host port along with the private IP of the container are used to create the Prometheus exporter targets. The value of `sd_metrics_path` sets `__metrics_path__` to `/stats/prometheus`, so the CloudWatch agent scrapes the Prometheus metrics from `private_ip:host_port/stats/prometheus` and sends them to the `envoy-prometheus` log stream in CloudWatch Logs in the log group `/aws/ecs/containerinsights/cluster_name/prometheus`.
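
You can sanity-check an ARN pattern such as the first one above against a sample task definition ARN before deploying. The account ID and family name here are made up:

```shell
# A made-up task definition ARN for a Java/JMX workload.
arn="arn:aws:ecs:us-east-1:123456789012:task-definition/demo-javajmx-bridge:7"

# Same pattern as sd_task_definition_arn_pattern in the first section above.
if echo "$arn" | grep -Eq '.*:task-definition/.*javajmx.*:[0-9]+'; then
  echo "ARN matches: tasks from this definition are scanned"
fi
```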

**Example 4**

```
"ecs_service_discovery": {
  "sd_frequency": "5m",
  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
  "service_name_list_for_tasks": [
    {
      "sd_job_name": "nginx-prometheus",
      "sd_metrics_path": "/metrics",
      "sd_metrics_ports": "9113",
      "sd_service_name_pattern": "^nginx-.*"
    },
    {
      "sd_job_name": "haproxy-prometheus",
      "sd_metrics_path": "/stats/metrics",
      "sd_container_name_pattern": "^haproxy$",
      "sd_metrics_ports": "8404",
      "sd_service_name_pattern": ".*haproxy-service.*"
    }
  ]
}
```

This example enables ECS service name regular expression-based service discovery. The CloudWatch agent will query the ECS services’ metadata every five minutes and write the discovered targets into the `/tmp/cwagent_ecs_auto_sd.yaml` file within the CloudWatch agent container.

Two service name regular expression sections are defined:
+ For the first section, the ECS tasks that are associated with ECS services whose names match the regular expression `^nginx-.*` are filtered for the container port scan. If the containers within these ECS tasks expose the container port on 9113, the mapped host port along with the private IP of the container are used to create the Prometheus exporter targets. The value of `sd_metrics_path` sets `__metrics_path__` to `/metrics`, so the CloudWatch agent scrapes the Prometheus metrics from `private_ip:host_port/metrics` and sends them to the `nginx-prometheus` log stream in CloudWatch Logs in the log group `/aws/ecs/containerinsights/cluster_name/prometheus`.
+ For the second section, the ECS tasks that are associated with ECS services whose names match the regular expression `.*haproxy-service.*` are filtered for the container port scan. For containers with a name of `haproxy` that expose the container port on 8404, the mapped host port along with the private IP of the container are used to create the Prometheus exporter targets. The value of `sd_metrics_path` sets `__metrics_path__` to `/stats/metrics`, so the CloudWatch agent scrapes the Prometheus metrics from `private_ip:host_port/stats/metrics` and sends them to the `haproxy-prometheus` log stream in CloudWatch Logs in the log group `/aws/ecs/containerinsights/cluster_name/prometheus`.

**Example 5**

```
"ecs_service_discovery": {
  "sd_frequency": "1m30s",
  "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
  "docker_label": {
    "sd_port_label": "MY_PROMETHEUS_EXPORTER_PORT_LABEL",
    "sd_metrics_path_label": "MY_PROMETHEUS_METRICS_PATH_LABEL",
    "sd_job_name_label": "MY_PROMETHEUS_METRICS_NAME_LABEL"  
  },
  "task_definition_list": [
    {
      "sd_metrics_ports": "9150",
      "sd_task_definition_arn_pattern": "*memcached.*"
    }
  ]
}
```

This example enables both ECS service discovery modes. The CloudWatch agent will query the ECS tasks’ metadata every 90 seconds and write the discovered targets into the `/tmp/cwagent_ecs_auto_sd.yaml` file within the CloudWatch agent container. 

For the docker-based service discovery configuration:
+ The ECS tasks with the docker label `MY_PROMETHEUS_EXPORTER_PORT_LABEL` are filtered for the Prometheus port scan. The target Prometheus container port is specified by the value of the label `MY_PROMETHEUS_EXPORTER_PORT_LABEL`. 
+ The value of the docker label `MY_PROMETHEUS_METRICS_PATH_LABEL` is used for `__metrics_path__`. If the container does not have this docker label, the default value `/metrics` is used. 
+ The value of the docker label `MY_PROMETHEUS_METRICS_NAME_LABEL` is used as the job label. If the container does not have this docker label, the job name defined in the Prometheus configuration is used.

For the ECS task definition ARN regular expression-based service discovery configuration:
+ The ECS tasks with `memcached` in the ECS task definition ARN are filtered for the container port scan. The target Prometheus container port is 9150, as defined by `sd_metrics_ports`. The default metrics path `/metrics` is used, and the job name defined in the Prometheus configuration is used.
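The ARN pattern is an ordinary regular expression evaluated against the full task definition ARN. A quick sketch with a made-up ARN shows how a pattern such as `.*memcached.*` matches:

```shell
# Hypothetical task definition ARN, for illustration only
ARN="arn:aws:ecs:us-west-2:123456789012:task-definition/memcached-sample:1"

# The same kind of regular-expression match that the agent applies
# to sd_task_definition_arn_pattern
if echo "$ARN" | grep -Eq '.*memcached.*'; then
  echo "task definition selected for port scan"
fi
```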

# (Optional) Set up sample containerized Amazon ECS workloads for Prometheus metric testing
<a name="ContainerInsights-Prometheus-Sample-Workloads-ECS"></a>

To test the Prometheus metric support in CloudWatch Container Insights, you can set up one or more of the following containerized workloads. The CloudWatch agent with Prometheus support automatically collects metrics from each of these workloads. To see the metrics that are collected by default, see [Prometheus metrics collected by the CloudWatch agent](ContainerInsights-Prometheus-metrics.md).

**Topics**
+ [Sample App Mesh workload for Amazon ECS clusters](ContainerInsights-Prometheus-Sample-Workloads-ECS-appmesh.md)
+ [Sample Java/JMX workload for Amazon ECS clusters](ContainerInsights-Prometheus-Sample-Workloads-ECS-javajmx.md)
+ [Sample NGINX workload for Amazon ECS clusters](ContainerInsights-Prometheus-Setup-nginx-ecs.md)
+ [Sample NGINX Plus workload for Amazon ECS clusters](ContainerInsights-Prometheus-Setup-nginx-plus-ecs.md)
+ [Tutorial for adding a new Prometheus scrape target: Memcached on Amazon ECS](ContainerInsights-Prometheus-Setup-memcached-ecs.md)
+ [Tutorial for scraping Redis OSS Prometheus metrics on Amazon ECS Fargate](ContainerInsights-Prometheus-Setup-redis-ecs.md)

# Sample App Mesh workload for Amazon ECS clusters
<a name="ContainerInsights-Prometheus-Sample-Workloads-ECS-appmesh"></a>

To collect metrics from a sample Prometheus workload for Amazon ECS, you must be running Container Insights in the cluster. For information about installing Container Insights, see [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS.md).

First, follow this [walkthrough](https://github.com/aws/aws-app-mesh-examples/tree/main/examples/apps/colorapp#app-mesh-walkthrough-deploy-the-color-app-on-ecs) to deploy the sample color app on your Amazon ECS cluster. After you finish, you will have App Mesh Prometheus metrics exposed on port 9901.

Next, follow these steps to install the CloudWatch agent with Prometheus monitoring on the same Amazon ECS cluster where you installed the color app. The steps in this section install the CloudWatch agent in bridge network mode. 

The environment variables `ENVIRONMENT_NAME`, `AWS_PROFILE`, and `AWS_DEFAULT_REGION` that you set in the walkthrough will also be used in the following steps.

**To install the CloudWatch agent with Prometheus monitoring for testing**

1. Download the CloudFormation template by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-bridge-host.yaml
   ```

1. Set the network mode by entering the following commands.

   ```
   export ECS_CLUSTER_NAME=${ENVIRONMENT_NAME}
   export ECS_NETWORK_MODE=bridge
   ```

1. Create the CloudFormation stack by entering the following commands.

   ```
   aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=${ECS_CLUSTER_NAME} \
                    ParameterKey=CreateIAMRoles,ParameterValue=True \
                    ParameterKey=ECSNetworkMode,ParameterValue=${ECS_NETWORK_MODE} \
                    ParameterKey=TaskRoleName,ParameterValue=CWAgent-Prometheus-TaskRole-${ECS_CLUSTER_NAME} \
                    ParameterKey=ExecutionRoleName,ParameterValue=CWAgent-Prometheus-ExecutionRole-${ECS_CLUSTER_NAME} \
       --capabilities CAPABILITY_NAMED_IAM \
       --region ${AWS_DEFAULT_REGION} \
       --profile ${AWS_PROFILE}
   ```

1. (Optional) When the CloudFormation stack is created, you see a `CREATE_COMPLETE` message. If you want to check the status before you see that message, enter the following command.

   ```
   aws cloudformation describe-stacks \
   --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
   --query 'Stacks[0].StackStatus' \
   --region ${AWS_DEFAULT_REGION} \
   --profile ${AWS_PROFILE}
   ```

**Troubleshooting**

The steps in the walkthrough use jq to parse the output of the AWS CLI. For more information about installing jq, see [jq](https://stedolan.github.io/jq/). Use the following command to set the default output format of your AWS CLI to JSON so that jq can parse it correctly.

```
$ aws configure
```

When the response gets to `Default output format`, enter **json**.
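As an illustration of why the output format matters, the following pipes a sample response shaped like `describe-stacks` output through jq. The stack name and status here are placeholder values; with any output format other than JSON, jq would fail to parse the response.

```shell
# Sample JSON shaped like a describe-stacks response (placeholder values)
RESPONSE='{"Stacks":[{"StackName":"demo","StackStatus":"CREATE_COMPLETE"}]}'

# jq pulls out the stack status, the same kind of parse that the
# walkthrough scripts perform on real AWS CLI output
echo "$RESPONSE" | jq -r '.Stacks[0].StackStatus'
# prints CREATE_COMPLETE
```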

## Uninstall the CloudWatch agent with Prometheus monitoring
<a name="ContainerInsights-Prometheus-Sample-Workloads-ECS-appmesh-uninstall"></a>

When you are finished testing, enter the following command to uninstall the CloudWatch agent by deleting the CloudFormation stack.

```
aws cloudformation delete-stack \
--stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
--region ${AWS_DEFAULT_REGION} \
--profile ${AWS_PROFILE}
```

# Sample Java/JMX workload for Amazon ECS clusters
<a name="ContainerInsights-Prometheus-Sample-Workloads-ECS-javajmx"></a>

JMX Exporter is an official Prometheus exporter that can scrape and expose JMX mBeans as Prometheus metrics. For more information, see [prometheus/jmx\_exporter](https://github.com/prometheus/jmx_exporter).

The CloudWatch agent with Prometheus support scrapes the Java/JMX Prometheus metrics based on the service discovery configuration in the Amazon ECS cluster. You can configure the JMX Exporter to expose the metrics on a different port or metrics path. If you do change the port or path, update the default `ecs_service_discovery` section in the CloudWatch agent configuration.

To collect metrics from a sample Prometheus workload for Amazon ECS, you must be running Container Insights in the cluster. For information about installing Container Insights, see [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS.md).

**To install the Java/JMX sample workload for Amazon ECS clusters**

1. Follow the steps in these sections to create your Docker images.
   + [Example: Java Jar Application Docker image with Prometheus metrics](ContainerInsights-Prometheus-Sample-Workloads-javajmx.md#ContainerInsights-Prometheus-Sample-Workloads-javajmx-jar)
   + [Example: Apache Tomcat Docker image with Prometheus metrics](ContainerInsights-Prometheus-Sample-Workloads-javajmx.md#ContainerInsights-Prometheus-Sample-Workloads-javajmx-tomcat)

1. Specify the following two docker labels in the Amazon ECS task definition file. You can then run the task definition as an Amazon ECS service or Amazon ECS task in the cluster.
   + Set `ECS_PROMETHEUS_EXPORTER_PORT` to point to the containerPort where the Prometheus metrics are exposed.
   + Set `Java_EMF_Metrics` to `true`. The CloudWatch agent uses this flag to generate the embedded metric format in the log event.

   The following is an example:

   ```
   {
     "family": "workload-java-ec2-bridge",
     "taskRoleArn": "{{task-role-arn}}",
     "executionRoleArn": "{{execution-role-arn}}",
     "networkMode": "bridge",
     "containerDefinitions": [
       {
         "name": "tomcat-prometheus-workload-java-ec2-bridge-dynamic-port",
         "image": "your_docker_image_tag_for_tomcat_with_prometheus_metrics",
         "portMappings": [
           {
             "hostPort": 0,
             "protocol": "tcp",
             "containerPort": 9404
           }
         ],
         "dockerLabels": {
           "ECS_PROMETHEUS_EXPORTER_PORT": "9404",
           "Java_EMF_Metrics": "true"
         }
       }
     ],
     "requiresCompatibilities": [
       "EC2"
     ],
     "cpu": "256",
     "memory": "512"
   }
   ```

The default setting of the CloudWatch agent in the CloudFormation template enables both docker label-based service discovery and task definition ARN-based service discovery. To view these default settings, see line 65 of the [ CloudWatch agent YAML configuration file](https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-bridge-host.yaml#L65). The containers with the `ECS_PROMETHEUS_EXPORTER_PORT` label will be auto-discovered based on the specified container port for Prometheus scraping. 

The default setting of the CloudWatch agent also has the `metric_declaration` setting for Java/JMX at line 112 of the same file. All docker labels of the target containers will be added as additional labels in the Prometheus metrics and sent to CloudWatch Logs. For the Java/JMX containers with the docker label `Java_EMF_Metrics="true"`, the embedded metric format will be generated.
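For reference, a `metric_declaration` entry for Java/JMX in that file follows the same shape as the other examples in this section. The following is a sketch only; the metric selectors shown are illustrative, and the exact list in the template may differ:

```
{
  "source_labels": ["Java_EMF_Metrics"],
  "label_matcher": "^true$",
  "dimensions": [["ClusterName","TaskDefinitionFamily"]],
  "metric_selectors": [
    "^jvm_memory_bytes_used$",
    "^jvm_threads_current$"
  ]
}
```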

# Sample NGINX workload for Amazon ECS clusters
<a name="ContainerInsights-Prometheus-Setup-nginx-ecs"></a>

The NGINX Prometheus exporter can scrape and expose NGINX data as Prometheus metrics. This example uses the exporter in tandem with the NGINX reverse proxy service for Amazon ECS.

For more information about the NGINX Prometheus exporter, see [nginx-prometheus-exporter](https://github.com/nginxinc/nginx-prometheus-exporter) on GitHub. For more information about the NGINX reverse proxy, see [ecs-nginx-reverse-proxy](https://github.com/awslabs/ecs-nginx-reverse-proxy) on GitHub.

The CloudWatch agent with Prometheus support scrapes the NGINX Prometheus metrics based on the service discovery configuration in the Amazon ECS cluster. You can configure the NGINX Prometheus Exporter to expose the metrics on a different port or path. If you change the port or path, update the `ecs_service_discovery` section in the CloudWatch agent configuration file.

## Install the NGINX reverse proxy sample workload for Amazon ECS clusters
<a name="ContainerInsights-Prometheus-nginx-ecs-setup"></a>

Follow these steps to install the NGINX reverse proxy sample workload.

### Create the Docker images
<a name="ContainerInsights-Prometheus-nginx-ecs-setup-docker"></a>

**To create the Docker images for the NGINX reverse proxy sample workload**

1. Download the following folder from the NGINX reverse proxy repo: [https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/](https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/).

1. Find the `app` directory and build an image from that directory:

   ```
   docker build -t web-server-app ./path-to-app-directory
   ```

1. Build a custom image for NGINX. First, create a directory with the following two files:
   + A sample Dockerfile:

     ```
     FROM nginx
     COPY nginx.conf /etc/nginx/nginx.conf
     ```
   + An `nginx.conf` file, modified from [ https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/](https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/):

     ```
     events {
       worker_connections 768;
     }
     
     http {
       # Nginx will handle gzip compression of responses from the app server
       gzip on;
       gzip_proxied any;
       gzip_types text/plain application/json;
       gzip_min_length 1000;
     
       server{
         listen 8080;
         location /stub_status {
             stub_status   on;
         }
       }
     
       server {
         listen 80;
     
         # Nginx will reject anything not matching /api
         location /api {
           # Reject requests with unsupported HTTP method
           if ($request_method !~ ^(GET|POST|HEAD|OPTIONS|PUT|DELETE)$) {
             return 405;
           }
     
           # Only requests matching the whitelist expectations will
           # get sent to the application server
           proxy_pass http://app:3000;
           proxy_http_version 1.1;
           proxy_set_header Upgrade $http_upgrade;
           proxy_set_header Connection 'upgrade';
           proxy_set_header Host $host;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_cache_bypass $http_upgrade;
         }
       }
     }
     ```
**Note**  
`stub_status` must be enabled on the same port that `nginx-prometheus-exporter` is configured to scrape metrics from. In our example task definition, `nginx-prometheus-exporter` is configured to scrape metrics from port 8080.

1. Build an image from files in your new directory:

   ```
   docker build -t nginx-reverse-proxy ./path-to-your-directory
   ```

1. Upload your new images to an image repository for later use.
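The `stub_status` page that the exporter scrapes is a small plain-text report. A sketch of extracting a value from a sample of that output follows; the counter values are made up, and the active-connections figure is the one the exporter surfaces as a Prometheus gauge (typically `nginx_connections_active`).

```shell
# Sample stub_status response body; the counter values are made up
STATUS='Active connections: 2
server accepts handled requests
 16 16 31
Reading: 0 Writing: 1 Waiting: 1'

# Extract the active-connections figure from the report
echo "$STATUS" | awk '/Active connections/ {print $3}'
# prints 2
```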

### Create the task definition to run NGINX and the web server app in Amazon ECS
<a name="ContainerInsights-Prometheus-nginx-ecs-setup-task"></a>

Next, you set up the task definition.

This task definition enables the collection and export of NGINX Prometheus metrics. The NGINX container tracks input from the app and exposes that data on port 8080, as set in `nginx.conf`. The NGINX Prometheus exporter container scrapes these metrics and publishes them on port 9113, for use in CloudWatch.

**To set up the task definition for the NGINX sample Amazon ECS workload**

1. Create a task definition JSON file with the following content. Replace *your-customized-nginx-image* with the image URI for your customized NGINX image, and replace *your-web-server-app-image* with the image URI for your web server app image.

   ```
   {
     "containerDefinitions": [
       {
         "name": "nginx",
         "image": "your-customized-nginx-image",
         "memory": 256,
         "cpu": 256,
         "essential": true,
         "portMappings": [
           {
             "containerPort": 80,
             "protocol": "tcp"
           }
         ],
         "links": [
           "app"
         ]
       },
       {
         "name": "app",
         "image": "your-web-server-app-image",
         "memory": 256,
         "cpu": 256,
         "essential": true
       },
       {
         "name": "nginx-prometheus-exporter",
         "image": "docker.io/nginx/nginx-prometheus-exporter:0.8.0",
         "memory": 256,
         "cpu": 256,
         "essential": true,
         "command": [
           "-nginx.scrape-uri",
           "http://nginx:8080/stub_status"
       ],
       "links":[
         "nginx"
       ],
         "portMappings":[
           {
             "containerPort": 9113,
             "protocol": "tcp"
           }
         ]
       }
     ],
     "networkMode": "bridge",
     "placementConstraints": [],
     "family": "nginx-sample-stack"
   }
   ```

1. Register the task definition by entering the following command.

   ```
   aws ecs register-task-definition --cli-input-json file://path-to-your-task-definition-json
   ```

1. Create a service to run the task by entering the following command:

   Be sure not to change the service name. We will be running a CloudWatch agent service using a configuration that searches for tasks using the name patterns of the services that started them. For example, for the CloudWatch agent to find the task launched by this command, you can specify the value of `sd_service_name_pattern` to be `^nginx-service$`. The next section provides more details.

   ```
   aws ecs create-service \
    --cluster your-cluster-name \
    --service-name nginx-service \
    --task-definition nginx-sample-stack:1 \
    --desired-count 1
   ```
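The `^` and `$` anchors make the pattern match the service name `nginx-service` exactly and nothing else. A quick illustration with `grep -E`, using a made-up second service name:

```shell
# The anchored pattern from sd_service_name_pattern
PATTERN='^nginx-service$'

# Matches the exact service name
echo "nginx-service" | grep -Eq "$PATTERN" && echo "matched"

# Does not match similarly named services (grep exits nonzero)
echo "nginx-service-blue" | grep -Eq "$PATTERN" || echo "not matched"
```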

### Configure the CloudWatch agent to scrape NGINX Prometheus metrics
<a name="ContainerInsights-Prometheus-nginx-ecs-setup-agent"></a>

The final step is to configure the CloudWatch agent to scrape the NGINX metrics. In this example, the CloudWatch agent discovers the task through the service name pattern and through port 9113, where the exporter exposes the Prometheus metrics for NGINX. With the task discovered and the metrics available, the CloudWatch agent begins posting the collected metrics to the log stream **nginx-prometheus-exporter**.

**To configure the CloudWatch agent to scrape the NGINX metrics**

1. Download the latest version of the necessary YAML file by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-bridge-host.yaml
   ```

1. Open the file with a text editor, and find the full CloudWatch agent configuration in the `value` key in the `resource:CWAgentConfigSSMParameter` section. Then, in the `ecs_service_discovery` section, add the following `service_name_list_for_tasks` section.

   ```
   "service_name_list_for_tasks": [
     {
       "sd_job_name": "nginx-prometheus-exporter",
       "sd_metrics_path": "/metrics",
       "sd_metrics_ports": "9113",
       "sd_service_name_pattern": "^nginx-service$"
      }
   ],
   ```

1. In the same file, add the following section in the `metric_declaration` section to allow NGINX metrics. Be sure to follow the existing indentation pattern.

   ```
   {
     "source_labels": ["job"],
     "label_matcher": ".*nginx.*",
     "dimensions": [["ClusterName", "TaskDefinitionFamily", "ServiceName"]],
     "metric_selectors": [
       "^nginx_.*$"
     ]
   },
   ```

1. If you don't already have the CloudWatch agent deployed in this cluster, skip to step 8.

   If you already have the CloudWatch agent deployed in the Amazon ECS cluster by using AWS CloudFormation, you can create a change set by entering the following commands:

   ```
   ECS_CLUSTER_NAME=your_cluster_name
   AWS_REGION=your_aws_region
   ECS_NETWORK_MODE=bridge
   CREATE_IAM_ROLES=True
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-change-set --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION \
       --change-set-name nginx-scraping-support
   ```

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

1. Review the newly created change set **nginx-scraping-support**. You should see one change applied to the **CWAgentConfigSSMParameter** resource. Run the change set and restart the CloudWatch agent task by entering the following command:

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 0 \
   --service cwagent-prometheus-replica-service-EC2-$ECS_NETWORK_MODE \
   --region $AWS_REGION
   ```

1. Wait about 10 seconds, and then enter the following command.

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 1 \
   --service cwagent-prometheus-replica-service-EC2-$ECS_NETWORK_MODE \
   --region $AWS_REGION
   ```

1. If you are installing the CloudWatch agent with Prometheus metric collection on the cluster for the first time, enter the following commands.

   ```
   ECS_CLUSTER_NAME=your_cluster_name
   AWS_REGION=your_aws_region
   ECS_NETWORK_MODE=bridge
   CREATE_IAM_ROLES=True
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION
   ```

## Viewing your NGINX metrics and logs
<a name="ContainerInsights-Prometheus-Setup-nginx-view"></a>

You can now view the NGINX metrics being collected.

**To view the metrics for your sample NGINX workload**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the Region where your cluster is running, choose **Metrics** in the left navigation pane. Find the **ContainerInsights/Prometheus** namespace to see the metrics.

1. To see the CloudWatch Logs events, choose **Log groups** in the navigation pane. The events are in the log group **/aws/containerinsights/*your\_cluster\_name*/prometheus**, in the log stream *nginx-prometheus-exporter*.
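Because the performance log events are structured, you can also query them directly with CloudWatch Logs Insights against that log group. The following is a sketch of such a query; the metric name assumes the NGINX exporter's default `nginx_connections_active` gauge, and the `ServiceName` field assumes the dimensions configured earlier.

```
fields @timestamp, nginx_connections_active, ServiceName
| filter ServiceName = "nginx-service"
| sort @timestamp desc
| limit 20
```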

# Sample NGINX Plus workload for Amazon ECS clusters
<a name="ContainerInsights-Prometheus-Setup-nginx-plus-ecs"></a>

NGINX Plus is the commercial version of NGINX. You must have a license to use it. For more information, see [NGINX Plus](https://www.nginx.com/products/nginx/).

The NGINX Prometheus exporter can scrape and expose NGINX data as Prometheus metrics. This example uses the exporter in tandem with the NGINX Plus reverse proxy service for Amazon ECS.

For more information about the NGINX Prometheus exporter, see [nginx-prometheus-exporter](https://github.com/nginxinc/nginx-prometheus-exporter) on GitHub. For more information about the NGINX reverse proxy, see [ecs-nginx-reverse-proxy](https://github.com/awslabs/ecs-nginx-reverse-proxy) on GitHub.

The CloudWatch agent with Prometheus support scrapes the NGINX Plus Prometheus metrics based on the service discovery configuration in the Amazon ECS cluster. You can configure the NGINX Prometheus Exporter to expose the metrics on a different port or path. If you change the port or path, update the `ecs_service_discovery` section in the CloudWatch agent configuration file.

## Install the NGINX Plus reverse proxy sample workload for Amazon ECS clusters
<a name="ContainerInsights-Prometheus-nginx-plus-ecs-setup"></a>

Follow these steps to install the NGINX Plus reverse proxy sample workload.

### Create the Docker images
<a name="ContainerInsights-Prometheus-nginx-plus-ecs-setup-docker"></a>

**To create the Docker images for the NGINX Plus reverse proxy sample workload**

1. Download the following folder from the NGINX reverse proxy repo: [https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/](https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/).

1. Find the `app` directory and build an image from that directory:

   ```
   docker build -t web-server-app ./path-to-app-directory
   ```

1. Build a custom image for NGINX Plus. Before you can build the image, you must obtain the key named `nginx-repo.key` and the SSL certificate `nginx-repo.crt` for your licensed NGINX Plus. Create a directory and store your `nginx-repo.key` and `nginx-repo.crt` files in it.

   In the directory that you just created, create the following two files:
   + A sample Dockerfile with the following content. This Dockerfile is adapted from a sample file provided at [https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-docker/#docker\_plus\_image](https://docs.nginx.com/nginx/admin-guide/installing-nginx/installing-nginx-docker/#docker_plus_image). The important change that we make is loading a separate file, called `nginx.conf`, which you create in the next step.

     ```
     FROM debian:buster-slim
     
      LABEL maintainer="NGINX Docker Maintainers <docker-maint@nginx.com>"
     
     # Define NGINX versions for NGINX Plus and NGINX Plus modules
     # Uncomment this block and the versioned nginxPackages block in the main RUN
     # instruction to install a specific release
     # ENV NGINX_VERSION 21
     # ENV NJS_VERSION 0.3.9
     # ENV PKG_RELEASE 1~buster
     
      # Download certificate and key from the customer portal (https://cs.nginx.com)
     # and copy to the build context
     COPY nginx-repo.crt /etc/ssl/nginx/
     COPY nginx-repo.key /etc/ssl/nginx/
     # COPY nginx.conf /etc/ssl/nginx/nginx.conf
     
     RUN set -x \
     # Create nginx user/group first, to be consistent throughout Docker variants
     && addgroup --system --gid 101 nginx \
     && adduser --system --disabled-login --ingroup nginx --no-create-home --home /nonexistent --gecos "nginx user" --shell /bin/false --uid 101 nginx \
     && apt-get update \
     && apt-get install --no-install-recommends --no-install-suggests -y ca-certificates gnupg1 \
     && \
     NGINX_GPGKEY=573BFD6B3D8FBC641079A6ABABF5BD827BD9BF62; \
     found=''; \
     for server in \
      ha.pool.sks-keyservers.net \
      hkp://keyserver.ubuntu.com:80 \
      hkp://p80.pool.sks-keyservers.net:80 \
      pgp.mit.edu \
     ; do \
     echo "Fetching GPG key $NGINX_GPGKEY from $server"; \
     apt-key adv --keyserver "$server" --keyserver-options timeout=10 --recv-keys "$NGINX_GPGKEY" && found=yes && break; \
     done; \
     test -z "$found" && echo >&2 "error: failed to fetch GPG key $NGINX_GPGKEY" && exit 1; \
     apt-get remove --purge --auto-remove -y gnupg1 && rm -rf /var/lib/apt/lists/* \
     # Install the latest release of NGINX Plus and/or NGINX Plus modules
     # Uncomment individual modules if necessary
     # Use versioned packages over defaults to specify a release
     && nginxPackages=" \
     nginx-plus \
     # nginx-plus=${NGINX_VERSION}-${PKG_RELEASE} \
     # nginx-plus-module-xslt \
     # nginx-plus-module-xslt=${NGINX_VERSION}-${PKG_RELEASE} \
     # nginx-plus-module-geoip \
     # nginx-plus-module-geoip=${NGINX_VERSION}-${PKG_RELEASE} \
     # nginx-plus-module-image-filter \
     # nginx-plus-module-image-filter=${NGINX_VERSION}-${PKG_RELEASE} \
     # nginx-plus-module-perl \
     # nginx-plus-module-perl=${NGINX_VERSION}-${PKG_RELEASE} \
     # nginx-plus-module-njs \
     # nginx-plus-module-njs=${NGINX_VERSION}+${NJS_VERSION}-${PKG_RELEASE} \
     " \
     && echo "Acquire::https::plus-pkgs.nginx.com::Verify-Peer \"true\";" >> /etc/apt/apt.conf.d/90nginx \
     && echo "Acquire::https::plus-pkgs.nginx.com::Verify-Host \"true\";" >> /etc/apt/apt.conf.d/90nginx \
     && echo "Acquire::https::plus-pkgs.nginx.com::SslCert \"/etc/ssl/nginx/nginx-repo.crt\";" >> /etc/apt/apt.conf.d/90nginx \
     && echo "Acquire::https::plus-pkgs.nginx.com::SslKey \"/etc/ssl/nginx/nginx-repo.key\";" >> /etc/apt/apt.conf.d/90nginx \
     && printf "deb https://plus-pkgs.nginx.com/debian buster nginx-plus\n" > /etc/apt/sources.list.d/nginx-plus.list \
     && apt-get update \
     && apt-get install --no-install-recommends --no-install-suggests -y \
     $nginxPackages \
     gettext-base \
     curl \
     && apt-get remove --purge --auto-remove -y && rm -rf /var/lib/apt/lists/* /etc/apt/sources.list.d/nginx-plus.list \
     && rm -rf /etc/apt/apt.conf.d/90nginx /etc/ssl/nginx
     
     # Forward request logs to Docker log collector
     RUN ln -sf /dev/stdout /var/log/nginx/access.log \
     && ln -sf /dev/stderr /var/log/nginx/error.log
     
     COPY nginx.conf /etc/nginx/nginx.conf
     
     EXPOSE 80
     
     STOPSIGNAL SIGTERM
     
     CMD ["nginx", "-g", "daemon off;"]
     ```
   + An `nginx.conf` file, modified from [ https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/nginx](https://github.com/awslabs/ecs-nginx-reverse-proxy/tree/master/reverse-proxy/nginx).

     ```
     events {
       worker_connections 768;
     }
     
     http {
       # Nginx will handle gzip compression of responses from the app server
       gzip on;
       gzip_proxied any;
       gzip_types text/plain application/json;
       gzip_min_length 1000;
     
       upstream backend {
         zone name 10m;
         server app:3000    weight=2;
         server app2:3000    weight=1;
       }
     
       server{
         listen 8080;
         location /api {
           api write=on;
         }
       }
     
       match server_ok {
         status 100-599;
       }
     
       server {
         listen 80;
         status_zone zone;
         # Nginx will reject anything not matching /api
         location /api {
           # Reject requests with unsupported HTTP method
           if ($request_method !~ ^(GET|POST|HEAD|OPTIONS|PUT|DELETE)$) {
             return 405;
           }
     
           # Only requests matching the whitelist expectations will
           # get sent to the application server
           proxy_pass http://backend;
           health_check uri=/lorem-ipsum match=server_ok;
           proxy_http_version 1.1;
           proxy_set_header Upgrade $http_upgrade;
           proxy_set_header Connection 'upgrade';
           proxy_set_header Host $host;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_cache_bypass $http_upgrade;
         }
       }
     }
     ```

1. Build an image from files in your new directory:

   ```
   docker build -t nginx-plus-reverse-proxy ./path-to-your-directory
   ```

1. Upload your new images to an image repository for later use.

### Create the task definition to run NGINX Plus and the web server app in Amazon ECS
<a name="ContainerInsights-Prometheus-nginx-plus-ecs-setup-task"></a>

Next, you set up the task definition.

This task definition enables the collection and export of NGINX Plus Prometheus metrics. The NGINX Plus container tracks input from the app containers and exposes that data on port 8080, as set in `nginx.conf`. The NGINX Prometheus exporter container scrapes these metrics and publishes them on port 9113, for use in CloudWatch.

**To set up the task definition for the NGINX sample Amazon ECS workload**

1. Create a task definition JSON file with the following content. Replace *your-customized-nginx-plus-image* with the image URI for your customized NGINX Plus image, and replace *your-web-server-app-image* with the image URI for your web server app image.

   ```
   {
     "containerDefinitions": [
       {
         "name": "nginx",
         "image": "your-customized-nginx-plus-image",
         "memory": 256,
         "cpu": 256,
         "essential": true,
         "portMappings": [
           {
             "containerPort": 80,
             "protocol": "tcp"
           }
         ],
         "links": [
           "app",
           "app2"
         ]
       },
       {
         "name": "app",
         "image": "your-web-server-app-image",
         "memory": 256,
         "cpu": 128,
         "essential": true
       },
       {
         "name": "app2",
         "image": "your-web-server-app-image",
         "memory": 256,
         "cpu": 128,
         "essential": true
       },
       {
         "name": "nginx-prometheus-exporter",
         "image": "docker.io/nginx/nginx-prometheus-exporter:0.8.0",
         "memory": 256,
         "cpu": 256,
         "essential": true,
         "command": [
           "-nginx.plus",
           "-nginx.scrape-uri",
            "http://nginx:8080/api"
       ],
       "links":[
         "nginx"
       ],
         "portMappings":[
           {
             "containerPort": 9113,
             "protocol": "tcp"
           }
         ]
       }
     ],
     "networkMode": "bridge",
     "placementConstraints": [],
     "family": "nginx-plus-sample-stack"
   }
   ```

1. Register the task definition:

   ```
   aws ecs register-task-definition --cli-input-json file://path-to-your-task-definition-json
   ```

1. Create a service to run the task by entering the following command:

   ```
   aws ecs create-service \
    --cluster your-cluster-name \
    --service-name nginx-plus-service \
    --task-definition nginx-plus-sample-stack:1 \
    --desired-count 1
   ```

   Be sure not to change the service name. The CloudWatch agent service that you create later uses a configuration that searches for tasks based on the name patterns of the services that started them. For example, for the CloudWatch agent to find the task launched by this command, you can specify the value of `sd_service_name_pattern` to be `^nginx-plus-service$`. The next section provides more details.
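The `sd_service_name_pattern` value is an anchored regular expression, so it matches the service name exactly. The following sketch uses `grep -E` as a stand-in for the agent's matching; the renamed service in the second command is hypothetical:

```shell
# The agent treats sd_service_name_pattern as a regular expression.
# The anchored pattern matches the service name created above exactly:
echo "nginx-plus-service" | grep -E '^nginx-plus-service$'

# A differently named service would not match, so its task would not be discovered.
# grep -c prints the match count (0 here); `|| true` ignores the non-zero exit status.
echo "nginx-plus-service-v2" | grep -cE '^nginx-plus-service$' || true
```

If you rename the service, update the pattern in the agent configuration to match.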

### Configure the CloudWatch agent to scrape NGINX Plus Prometheus metrics
<a name="ContainerInsights-Prometheus-nginx-plus-ecs-setup-agent"></a>

The final step is to configure the CloudWatch agent to scrape the NGINX metrics. In this example, the CloudWatch agent discovers the task through the service name pattern and port 9113, where the exporter exposes the Prometheus metrics for NGINX. With the task discovered and the metrics available, the CloudWatch agent begins posting the collected metrics to the log stream **nginx-prometheus-exporter**.

**To configure the CloudWatch agent to scrape the NGINX metrics**

1. Download the latest version of the necessary YAML file by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-bridge-host.yaml
   ```

1. Open the file with a text editor, and find the full CloudWatch agent configuration in the `value` key in the `resource:CWAgentConfigSSMParameter` section. Then, in the `ecs_service_discovery` section, add the following `service_name_list_for_tasks` section.

   ```
   "service_name_list_for_tasks": [
     {
       "sd_job_name": "nginx-plus-prometheus-exporter",
       "sd_metrics_path": "/metrics",
       "sd_metrics_ports": "9113",
       "sd_service_name_pattern": "^nginx-plus.*"
     }
   ],
   ```

1. In the same file, add the following section in the `metric_declaration` section to allow NGINX Plus metrics. Be sure to follow the existing indentation pattern.

   ```
   {
     "source_labels": ["job"],
     "label_matcher": "^nginx-plus.*",
     "dimensions": [["ClusterName", "TaskDefinitionFamily", "ServiceName"]],
     "metric_selectors": [
       "^nginxplus_connections_accepted$",
       "^nginxplus_connections_active$",
       "^nginxplus_connections_dropped$",
       "^nginxplus_connections_idle$",
       "^nginxplus_http_requests_total$",
       "^nginxplus_ssl_handshakes$",
       "^nginxplus_ssl_handshakes_failed$",
       "^nginxplus_up$",
       "^nginxplus_upstream_server_health_checks_fails$"
     ]
   },
   {
     "source_labels": ["job"],
     "label_matcher": "^nginx-plus.*",
     "dimensions": [["ClusterName", "TaskDefinitionFamily", "ServiceName", "upstream"]],
     "metric_selectors": [
       "^nginxplus_upstream_server_response_time$"
     ]
   },
   {
     "source_labels": ["job"],
     "label_matcher": "^nginx-plus.*",
     "dimensions": [["ClusterName", "TaskDefinitionFamily", "ServiceName", "code"]],
     "metric_selectors": [
       "^nginxplus_upstream_server_responses$",
       "^nginxplus_server_zone_responses$"
     ]
   },
   ```

1. If you don't already have the CloudWatch agent deployed in this cluster, skip to step 8.

   If you already have the CloudWatch agent deployed in the Amazon ECS cluster by using AWS CloudFormation, you can create a change set by entering the following commands:

   ```
   ECS_CLUSTER_NAME=your_cluster_name
   AWS_REGION=your_aws_region
   ECS_NETWORK_MODE=bridge
   CREATE_IAM_ROLES=True
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-change-set --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION \
       --change-set-name nginx-plus-scraping-support
   ```

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

1. Review the newly created change set **nginx-plus-scraping-support**. You should see one change applied to the **CWAgentConfigSSMParameter** resource. Execute the change set and restart the CloudWatch agent task by entering the following command:

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 0 \
   --service cwagent-prometheus-replica-service-EC2-$ECS_NETWORK_MODE \
   --region $AWS_REGION
   ```

1. Wait about 10 seconds, and then enter the following command.

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 1 \
   --service cwagent-prometheus-replica-service-EC2-$ECS_NETWORK_MODE \
   --region $AWS_REGION
   ```

1. If you are installing the CloudWatch agent with Prometheus metric collection on the cluster for the first time, enter the following commands.

   ```
   ECS_CLUSTER_NAME=your_cluster_name
   AWS_REGION=your_aws_region
   ECS_NETWORK_MODE=bridge
   CREATE_IAM_ROLES=True
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION
   ```
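In the `metric_declaration` entries added earlier, `label_matcher` is evaluated against the label named in `source_labels`. For these declarations that is the `job` label, which the agent sets from the `sd_job_name` you configured. A quick sketch, using `grep -E` as a stand-in for the agent's regular-expression matching:

```shell
# The job label carries the sd_job_name value ("nginx-plus-prometheus-exporter"),
# so the label_matcher "^nginx-plus.*" admits series scraped by that job:
echo "nginx-plus-prometheus-exporter" | grep -E '^nginx-plus.*'
```

If you change `sd_job_name`, adjust the `label_matcher` values to match, or the agent drops the scraped series before they reach CloudWatch.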

## Viewing your NGINX Plus metrics and logs
<a name="ContainerInsights-Prometheus-Setup-nginx-plus-view"></a>

You can now view the NGINX Plus metrics being collected.

**To view the metrics for your sample NGINX workload**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the Region where your cluster is running, choose **Metrics** in the left navigation pane. Find the **ContainerInsights/Prometheus** namespace to see the metrics.

1. To see the CloudWatch Logs events, choose **Log groups** in the navigation pane. The events are in the log group **/aws/containerinsights/*your-cluster-name*/prometheus**, in the log stream *nginx-plus-prometheus-exporter*.
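The Prometheus log group name follows a fixed naming convention based on the cluster name, so you can construct it in a script. A minimal sketch (the cluster name below is a placeholder):

```shell
# Container Insights derives the Prometheus performance log group from the cluster name.
ECS_CLUSTER_NAME=your-cluster-name   # replace with your actual cluster name
echo "/aws/containerinsights/${ECS_CLUSTER_NAME}/prometheus"
```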

# Tutorial for adding a new Prometheus scrape target: Memcached on Amazon ECS
<a name="ContainerInsights-Prometheus-Setup-memcached-ecs"></a>

This tutorial provides a hands-on introduction to scraping the Prometheus metrics of a sample Memcached application on an Amazon ECS cluster with the EC2 launch type. The CloudWatch agent automatically discovers the Memcached Prometheus exporter target through ECS task definition-based service discovery.

Memcached is a general-purpose distributed memory caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times that an external data source (such as a database or API) must be read. For more information, see [What is Memcached?](https://www.memcached.org/)

The [memcached_exporter](https://github.com/prometheus/memcached_exporter) (Apache License 2.0) is one of the official Prometheus exporters. By default, the memcached_exporter listens on 0.0.0.0:9150 and serves metrics at `/metrics`.

The Docker images in the following two Docker Hub repositories are used in this tutorial:
+ [Memcached](https://hub.docker.com/_/memcached?tab=description)
+ [prom/memcached-exporter](https://hub.docker.com/r/prom/memcached-exporter/)

**Prerequisite**

To collect metrics from a sample Prometheus workload for Amazon ECS, you must be running Container Insights in the cluster. For information about installing Container Insights, see [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS.md).

**Topics**
+ [Set the Amazon ECS EC2 cluster environment variables](#ContainerInsights-Prometheus-Setup-memcached-ecs-environment)
+ [Install the sample Memcached workload](#ContainerInsights-Prometheus-Setup-memcached-ecs-install-workload)
+ [Configure the CloudWatch agent to scrape Memcached Prometheus metrics](#ContainerInsights-Prometheus-Setup-memcached-ecs-agent)
+ [Viewing your Memcached metrics](#ContainerInsights-Prometheus-ECS-memcached-view)

## Set the Amazon ECS EC2 cluster environment variables
<a name="ContainerInsights-Prometheus-Setup-memcached-ecs-environment"></a>

**To set the Amazon ECS EC2 cluster environment variables**

1. Install the Amazon ECS CLI if you haven't already done so. For more information, see [Installing the Amazon ECS CLI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_CLI_installation.html).

1. Set the new Amazon ECS cluster name and Region. For example:

   ```
   ECS_CLUSTER_NAME=ecs-ec2-memcached-tutorial
   AWS_REGION=ca-central-1
   ```

1. (Optional) If you don't already have an Amazon ECS cluster with the EC2 launch type where you want to install the sample Memcached workload and CloudWatch agent, you can create one by entering the following command.

   ```
   ecs-cli up --capability-iam --size 1 \
   --instance-type t3.medium \
   --cluster $ECS_CLUSTER_NAME \
   --region $AWS_REGION
   ```

   The expected result of this command is as follows:

   ```
   WARN[0000] You will not be able to SSH into your EC2 instances without a key pair. 
   INFO[0000] Using recommended Amazon Linux 2 AMI with ECS Agent 1.44.4 and Docker version 19.03.6-ce 
   INFO[0001] Created cluster                               cluster=ecs-ec2-memcached-tutorial region=ca-central-1
   INFO[0002] Waiting for your cluster resources to be created... 
   INFO[0002] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
   INFO[0063] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
   INFO[0124] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
   VPC created: vpc-xxxxxxxxxxxxxxxxx
   Security Group created: sg-xxxxxxxxxxxxxxxxx
   Subnet created: subnet-xxxxxxxxxxxxxxxxx
   Subnet created: subnet-xxxxxxxxxxxxxxxxx
   Cluster creation succeeded.
   ```

## Install the sample Memcached workload
<a name="ContainerInsights-Prometheus-Setup-memcached-ecs-install-workload"></a>

**To install the sample Memcached workload, which exposes the Prometheus metrics**

1. Download the Memcached CloudFormation template by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/sample_traffic/memcached/memcached-traffic-sample.yaml
   ```

1. Set the IAM role names to be created for Memcached by entering the following commands.

   ```
   MEMCACHED_ECS_TASK_ROLE_NAME=memcached-prometheus-demo-ecs-task-role-name
   MEMCACHED_ECS_EXECUTION_ROLE_NAME=memcached-prometheus-demo-ecs-execution-role-name
   ```

1. Install the sample Memcached workload by entering the following command. This sample installs the workload in `host` network mode.

   ```
   MEMCACHED_ECS_NETWORK_MODE=host
   
   aws cloudformation create-stack --stack-name Memcached-Prometheus-Demo-ECS-$ECS_CLUSTER_NAME-EC2-$MEMCACHED_ECS_NETWORK_MODE \
       --template-body file://memcached-traffic-sample.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=ECSNetworkMode,ParameterValue=$MEMCACHED_ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$MEMCACHED_ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$MEMCACHED_ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION
   ```

The CloudFormation stack creates four resources:
+ One ECS task role
+ One ECS task execution role
+ One Memcached task definition
+ One Memcached service

In the Memcached task definition, two containers are defined:
+ The primary container runs a simple Memcached application and opens port 11211 for access.
+ The other container runs the Memcached exporter process to expose the Prometheus metrics on port 9150. This is the container to be discovered and scraped by the CloudWatch agent.

## Configure the CloudWatch agent to scrape Memcached Prometheus metrics
<a name="ContainerInsights-Prometheus-Setup-memcached-ecs-agent"></a>

**To configure the CloudWatch agent to scrape Memcached Prometheus metrics**

1. Download the latest version of `cwagent-ecs-prometheus-metric-for-awsvpc.yaml` by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-awsvpc.yaml
   ```

1. Open the file with a text editor, and find the full CloudWatch agent configuration behind the `value` key in the `resource:CWAgentConfigSSMParameter` section.

   Then, in the `ecs_service_discovery` section, add the following configuration into the `task_definition_list` section.

   ```
   {
       "sd_job_name": "ecs-memcached",
       "sd_metrics_ports": "9150",
       "sd_task_definition_arn_pattern": ".*:task-definition/memcached-prometheus-demo.*:[0-9]+"
   },
   ```

   For the `metric_declaration` section, the default setting does not allow any Memcached metrics. Add the following section to allow Memcached metrics. Be sure to follow the existing indentation pattern.

   ```
   {
     "source_labels": ["container_name"],
     "label_matcher": "memcached-exporter-.*",
     "dimensions": [["ClusterName", "TaskDefinitionFamily"]],
     "metric_selectors": [
       "^memcached_current_(bytes|items|connections)$",
       "^memcached_items_(reclaimed|evicted)_total$",
       "^memcached_(written|read)_bytes_total$",
       "^memcached_limit_bytes$",
       "^memcached_commands_total$"
     ]
   },
   {
     "source_labels": ["container_name"],
     "label_matcher": "memcached-exporter-.*",
     "dimensions": [["ClusterName", "TaskDefinitionFamily","status","command"], ["ClusterName", "TaskDefinitionFamily","command"]],
     "metric_selectors": [
       "^memcached_commands_total$"
     ]
   },
   ```

1. If you already have the CloudWatch agent deployed in the Amazon ECS cluster by CloudFormation, you can create a change set by entering the following commands.

   ```
   ECS_NETWORK_MODE=bridge
   CREATE_IAM_ROLES=True
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-change-set --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION \
       --change-set-name memcached-scraping-support
   ```

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

1. Review the newly created change set `memcached-scraping-support`. You should see one change applied to the `CWAgentConfigSSMParameter` resource. Execute the change set and restart the CloudWatch agent task by entering the following command.

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 0 \
   --service cwagent-prometheus-replica-service-EC2-$ECS_NETWORK_MODE \
   --region $AWS_REGION
   ```

1. Wait about 10 seconds, and then enter the following command.

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 1 \
   --service cwagent-prometheus-replica-service-EC2-$ECS_NETWORK_MODE \
   --region $AWS_REGION
   ```

1. If you are installing the CloudWatch agent with Prometheus metric collection on the cluster for the first time, enter the following commands.

   ```
   ECS_NETWORK_MODE=bridge
   CREATE_IAM_ROLES=True
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
       --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_REGION
   ```
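Two parts of the configuration added above are regular expressions: `sd_task_definition_arn_pattern` is matched against task definition ARNs, and each entry in `metric_selectors` is matched against metric names. The following sketch illustrates both with `grep -E` as a stand-in for the agent's matching; the ARN below is hypothetical (made-up account ID and family suffix):

```shell
# sd_task_definition_arn_pattern is matched against the full task definition ARN:
ARN="arn:aws:ecs:ca-central-1:123456789012:task-definition/memcached-prometheus-demo-sample:3"
echo "$ARN" | grep -E '.*:task-definition/memcached-prometheus-demo.*:[0-9]+'

# metric_selectors are regexes too; one alternation admits several metric names:
for m in memcached_items_reclaimed_total memcached_items_evicted_total; do
  echo "$m" | grep -E '^memcached_items_(reclaimed|evicted)_total$'
done
```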

## Viewing your Memcached metrics
<a name="ContainerInsights-Prometheus-ECS-memcached-view"></a>

This tutorial sends the following metrics to the **ECS/ContainerInsights/Prometheus** namespace in CloudWatch. You can use the CloudWatch console to see the metrics in that namespace.


| Metric name | Dimensions | 
| --- | --- | 
|  `memcached_current_items` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_current_connections` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_limit_bytes` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_current_bytes` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_written_bytes_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_read_bytes_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_items_evicted_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_items_reclaimed_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `memcached_commands_total` |  `ClusterName`, `TaskDefinitionFamily`<br>`ClusterName`, `TaskDefinitionFamily`, `command`<br>`ClusterName`, `TaskDefinitionFamily`, `status`, `command`  | 

**Note**  
The value of the **command** dimension can be: `delete`, `get`, `cas`, `set`, `decr`, `touch`, `incr`, or `flush`.  
The value of the **status** dimension can be `hit`, `miss`, or `badval`. 

You can also create a CloudWatch dashboard for your Memcached Prometheus metrics.

**To create a dashboard for Memcached Prometheus metrics**

1. Create environment variables, replacing the values below to match your deployment.

   ```
   DASHBOARD_NAME=your_memcached_cw_dashboard_name
   ECS_TASK_DEF_FAMILY=memcached-prometheus-demo-$ECS_CLUSTER_NAME-EC2-$MEMCACHED_ECS_NETWORK_MODE
   ```

1. Enter the following command to create the dashboard.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/sample_cloudwatch_dashboards/memcached/cw_dashboard_memcached.json \
   | sed "s/{{YOUR_AWS_REGION}}/$AWS_REGION/g" \
   | sed "s/{{YOUR_CLUSTER_NAME}}/$ECS_CLUSTER_NAME/g" \
   | sed "s/{{YOUR_TASK_DEF_FAMILY}}/$ECS_TASK_DEF_FAMILY/g" \
   | xargs -0 aws cloudwatch put-dashboard --dashboard-name ${DASHBOARD_NAME} --region $AWS_REGION --dashboard-body
   ```
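The pipeline above downloads a dashboard template and substitutes the `{{...}}` placeholders before passing the body to `put-dashboard`. The substitution step itself is plain `sed`, as this standalone sketch shows (sample input and values are illustrative):

```shell
# Each sed invocation replaces one {{PLACEHOLDER}} throughout the JSON body:
echo '{"region":"{{YOUR_AWS_REGION}}","cluster":"{{YOUR_CLUSTER_NAME}}"}' \
  | sed "s/{{YOUR_AWS_REGION}}/ca-central-1/g" \
  | sed "s/{{YOUR_CLUSTER_NAME}}/ecs-ec2-memcached-tutorial/g"
```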

# Tutorial for scraping Redis OSS Prometheus metrics on Amazon ECS Fargate
<a name="ContainerInsights-Prometheus-Setup-redis-ecs"></a>

This tutorial provides a hands-on introduction to scraping the Prometheus metrics of a sample Redis OSS application in an Amazon ECS Fargate cluster. The CloudWatch agent with Prometheus metric support auto-discovers the Redis OSS Prometheus exporter target based on the container's Docker labels.

Redis OSS is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. For more information, see [redis](https://redis.io/).

The [redis_exporter](https://github.com/oliver006/redis_exporter) (MIT licensed) is used to expose the Redis OSS Prometheus metrics on the specified port (default: 0.0.0.0:9121).

The Docker images in the following two Docker Hub repositories are used in this tutorial:
+ [redis](https://hub.docker.com/_/redis?tab=description)
+ [redis_exporter](https://hub.docker.com/r/oliver006/redis_exporter)

**Prerequisite**

To collect metrics from a sample Prometheus workload for Amazon ECS, you must be running Container Insights in the cluster. For information about installing Container Insights, see [Setting up Container Insights on Amazon ECS](deploy-container-insights-ECS.md).

**Topics**
+ [Set the Amazon ECS Fargate cluster environment variable](#ContainerInsights-Prometheus-Setup-redis-ecs-variable)
+ [Set the network environment variables for the Amazon ECS Fargate cluster](#ContainerInsights-Prometheus-Setup-redis-ecs-variable2)
+ [Install the sample Redis OSS workload](#ContainerInsights-Prometheus-Setup-redis-ecs-install-workload)
+ [Configure the CloudWatch agent to scrape Redis OSS Prometheus metrics](#ContainerInsights-Prometheus-Setup-redis-ecs-agent)
+ [Viewing your Redis OSS metrics](#ContainerInsights-Prometheus-Setup-redis-view)

## Set the Amazon ECS Fargate cluster environment variable
<a name="ContainerInsights-Prometheus-Setup-redis-ecs-variable"></a>

**To set the Amazon ECS Fargate cluster environment variable**

1. Install the Amazon ECS CLI if you haven't already done so. For more information, see [Installing the Amazon ECS CLI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_CLI_installation.html).

1. Set the new Amazon ECS cluster name and Region. For example:

   ```
   ECS_CLUSTER_NAME=ecs-fargate-redis-tutorial
   AWS_DEFAULT_REGION=ca-central-1
   ```

1. (Optional) If you don't already have an Amazon ECS Fargate cluster where you want to install the sample Redis OSS workload and CloudWatch agent, you can create one by entering the following command.

   ```
   ecs-cli up --capability-iam \
   --cluster $ECS_CLUSTER_NAME \
   --launch-type FARGATE \
   --region $AWS_DEFAULT_REGION
   ```

   The expected result of this command is as follows:

   ```
   INFO[0000] Created cluster   cluster=ecs-fargate-redis-tutorial region=ca-central-1
   INFO[0001] Waiting for your cluster resources to be created...
   INFO[0001] Cloudformation stack status   stackStatus=CREATE_IN_PROGRESS
   VPC created: vpc-xxxxxxxxxxxxxxxxx
   Subnet created: subnet-xxxxxxxxxxxxxxxxx
   Subnet created: subnet-xxxxxxxxxxxxxxxxx
   Cluster creation succeeded.
   ```

## Set the network environment variables for the Amazon ECS Fargate cluster
<a name="ContainerInsights-Prometheus-Setup-redis-ecs-variable2"></a>

**To set the network environment variables for the Amazon ECS Fargate cluster**

1. Set the VPC and subnet IDs of the Amazon ECS cluster. If you created a new cluster in the previous procedure, you'll see these values in the output of the final command. Otherwise, use the IDs of the existing cluster that you are going to use with Redis OSS.

   ```
   ECS_CLUSTER_VPC=vpc-xxxxxxxxxxxxxxxxx
   ECS_CLUSTER_SUBNET_1=subnet-xxxxxxxxxxxxxxxxx
   ECS_CLUSTER_SUBNET_2=subnet-xxxxxxxxxxxxxxxxx
   ```

1. In this tutorial, we install the Redis OSS application and the CloudWatch agent in the default security group of the Amazon ECS cluster's VPC. The default security group allows all network connections within the same security group, so the CloudWatch agent can scrape the Prometheus metrics exposed by the Redis OSS containers. In a real production environment, you might want to create dedicated security groups for the Redis OSS application and the CloudWatch agent, and set customized permissions for them.

   Enter the following command to get the default security group ID.

   ```
   aws ec2 describe-security-groups \
   --filters Name=vpc-id,Values=$ECS_CLUSTER_VPC  \
   --region $AWS_DEFAULT_REGION
   ```

   Then set the Fargate cluster default security group variable by entering the following command, replacing *my-default-security-group* with the value you found from the previous command.

   ```
   ECS_CLUSTER_SECURITY_GROUP=my-default-security-group
   ```

## Install the sample Redis OSS workload
<a name="ContainerInsights-Prometheus-Setup-redis-ecs-install-workload"></a>

**To install the sample Redis OSS workload, which exposes the Prometheus metrics**

1. Download the Redis OSS CloudFormation template by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/sample_traffic/redis/redis-traffic-sample.yaml
   ```

1. Set the IAM role names to be created for Redis OSS by entering the following commands.

   ```
   REDIS_ECS_TASK_ROLE_NAME=redis-prometheus-demo-ecs-task-role-name
   REDIS_ECS_EXECUTION_ROLE_NAME=redis-prometheus-demo-ecs-execution-role-name
   ```

1. Install the sample Redis OSS workload by entering the following command.

   ```
   aws cloudformation create-stack --stack-name Redis-Prometheus-Demo-ECS-$ECS_CLUSTER_NAME-fargate-awsvpc \
       --template-body file://redis-traffic-sample.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=SecurityGroupID,ParameterValue=$ECS_CLUSTER_SECURITY_GROUP \
                    ParameterKey=SubnetID,ParameterValue=$ECS_CLUSTER_SUBNET_1 \
                    ParameterKey=TaskRoleName,ParameterValue=$REDIS_ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$REDIS_ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region $AWS_DEFAULT_REGION
   ```

The CloudFormation stack creates four resources:
+ One ECS task role
+ One ECS task execution role
+ One Redis OSS task definition
+ One Redis OSS service

In the Redis OSS task definition, two containers are defined:
+ The primary container runs a simple Redis OSS application and opens port 6379 for access.
+ The other container runs the Redis OSS exporter process to expose the Prometheus metrics on port 9121. This is the container to be discovered and scraped by the CloudWatch agent. The following Docker label is defined so that the CloudWatch agent can discover the container based on it.

  ```
  ECS_PROMETHEUS_EXPORTER_PORT: 9121
  ```
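In the ECS task definition JSON, a Docker label like this is declared in the exporter container's `dockerLabels` map. The following minimal sketch shows where it sits; the container name and image tag here are illustrative, not the exact values from the CloudFormation template:

```
"containerDefinitions": [
  {
    "name": "redis-exporter-demo",
    "image": "oliver006/redis_exporter:latest",
    "dockerLabels": {
      "ECS_PROMETHEUS_EXPORTER_PORT": "9121"
    }
  }
]
```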

## Configure the CloudWatch agent to scrape Redis OSS Prometheus metrics
<a name="ContainerInsights-Prometheus-Setup-redis-ecs-agent"></a>

**To configure the CloudWatch agent to scrape Redis OSS Prometheus metrics**

1. Download the latest version of `cwagent-ecs-prometheus-metric-for-awsvpc.yaml` by entering the following command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/ecs-task-definition-templates/deployment-mode/replica-service/cwagent-prometheus/cloudformation-quickstart/cwagent-ecs-prometheus-metric-for-awsvpc.yaml
   ```

1. Open the file with a text editor, and find the full CloudWatch agent configuration behind the `value` key in the `resource:CWAgentConfigSSMParameter` section.

   Then, in the `ecs_service_discovery` section shown here, the `docker_label`-based service discovery is enabled with the default settings, which rely on the `ECS_PROMETHEUS_EXPORTER_PORT` Docker label that we defined in the Redis OSS task definition. So you don't need to make any changes in this section:

   ```
   "ecs_service_discovery": {
     "sd_frequency": "1m",
     "sd_result_file": "/tmp/cwagent_ecs_auto_sd.yaml",
     "docker_label": {
     },
     ...
   ```

   For the `metric_declaration` section, the default setting does not allow any Redis OSS metrics. Add the following section to allow Redis OSS metrics. Be sure to follow the existing indentation pattern.

   ```
   {
     "source_labels": ["container_name"],
     "label_matcher": "^redis-exporter-.*$",
     "dimensions": [["ClusterName","TaskDefinitionFamily"]],
     "metric_selectors": [
       "^redis_net_(in|out)put_bytes_total$",
       "^redis_(expired|evicted)_keys_total$",
       "^redis_keyspace_(hits|misses)_total$",
       "^redis_memory_used_bytes$",
       "^redis_connected_clients$"
     ]
   },
   {
     "source_labels": ["container_name"],
     "label_matcher": "^redis-exporter-.*$",
     "dimensions": [["ClusterName","TaskDefinitionFamily","cmd"]],
     "metric_selectors": [
       "^redis_commands_total$"
     ]
   },
   {
     "source_labels": ["container_name"],
     "label_matcher": "^redis-exporter-.*$",
     "dimensions": [["ClusterName","TaskDefinitionFamily","db"]],
     "metric_selectors": [
       "^redis_db_keys$"
     ]
   },
   ```

1. If you already have the CloudWatch agent deployed in the Amazon ECS cluster by CloudFormation, you can create a change set by entering the following commands.

   ```
   ECS_LAUNCH_TYPE=FARGATE
   CREATE_IAM_ROLES=True
   ECS_CLUSTER_SUBNET=$ECS_CLUSTER_SUBNET_1
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-change-set --stack-name CWAgent-Prometheus-ECS-$ECS_CLUSTER_NAME-$ECS_LAUNCH_TYPE-awsvpc \
       --template-body file://cwagent-ecs-prometheus-metric-for-awsvpc.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSLaunchType,ParameterValue=$ECS_LAUNCH_TYPE \
                    ParameterKey=SecurityGroupID,ParameterValue=$ECS_CLUSTER_SECURITY_GROUP \
                    ParameterKey=SubnetID,ParameterValue=$ECS_CLUSTER_SUBNET \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region ${AWS_DEFAULT_REGION} \
       --change-set-name redis-scraping-support
   ```

1. Open the CloudFormation console at [https://console.aws.amazon.com/cloudformation](https://console.aws.amazon.com/cloudformation/).

1. Review the newly created change set `redis-scraping-support`. You should see one change applied to the `CWAgentConfigSSMParameter` resource. Execute the change set, and then restart the CloudWatch agent task by entering the following command, which scales the agent service down to zero tasks.

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 0 \
   --service cwagent-prometheus-replica-service-$ECS_LAUNCH_TYPE-awsvpc \
   --region ${AWS_DEFAULT_REGION}
   ```

1. Wait about 10 seconds, and then enter the following command to scale the agent service back up to one task.

   ```
   aws ecs update-service --cluster $ECS_CLUSTER_NAME \
   --desired-count 1 \
   --service cwagent-prometheus-replica-service-$ECS_LAUNCH_TYPE-awsvpc \
   --region ${AWS_DEFAULT_REGION}
   ```

1. If you are installing the CloudWatch agent with Prometheus metric collection on the cluster for the first time, enter the following commands:

   ```
   ECS_LAUNCH_TYPE=FARGATE
   CREATE_IAM_ROLES=True
   ECS_CLUSTER_SUBNET=$ECS_CLUSTER_SUBNET_1
   ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
   ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
   
   aws cloudformation create-stack --stack-name CWAgent-Prometheus-ECS-$ECS_CLUSTER_NAME-$ECS_LAUNCH_TYPE-awsvpc \
       --template-body file://cwagent-ecs-prometheus-metric-for-awsvpc.yaml \
       --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                    ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                    ParameterKey=ECSLaunchType,ParameterValue=$ECS_LAUNCH_TYPE \
                    ParameterKey=SecurityGroupID,ParameterValue=$ECS_CLUSTER_SECURITY_GROUP \
                    ParameterKey=SubnetID,ParameterValue=$ECS_CLUSTER_SUBNET \
                    ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                    ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
       --capabilities CAPABILITY_NAMED_IAM \
       --region ${AWS_DEFAULT_REGION}
   ```

## Viewing your Redis OSS metrics
<a name="ContainerInsights-Prometheus-Setup-redis-view"></a>

This tutorial sends the following metrics to the **ECS/ContainerInsights/Prometheus** namespace in CloudWatch. You can use the CloudWatch console to see the metrics in that namespace.


| Metric Name | Dimensions | 
| --- | --- | 
|  `redis_net_input_bytes_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_net_output_bytes_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_expired_keys_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_evicted_keys_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_keyspace_hits_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_keyspace_misses_total` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_memory_used_bytes` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_connected_clients` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `redis_commands_total` |  `ClusterName`, `TaskDefinitionFamily`, `cmd`  | 
|  `redis_db_keys` |  `ClusterName`, `TaskDefinitionFamily`, `db`  | 

**Note**  
The value of the **cmd** dimension can be: `append`, `client`, `command`, `config`, `dbsize`, `flushall`, `get`, `incr`, `info`, `latency`, or `slowlog`.  
The value of the **db** dimension can be `db0` to `db15`. 
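
These metrics reach CloudWatch because their names and the exporter container name match the `metric_declaration` blocks that you added to the agent configuration. As a quick illustration, the following Python sketch (not the agent's actual implementation) applies the same `label_matcher` and `metric_selectors` regular expressions:

```python
import re

# Regexes copied from the metric_declaration blocks above.
label_matcher = re.compile(r"^redis-exporter-.*$")
metric_selectors = [re.compile(p) for p in (
    r"^redis_net_(in|out)put_bytes_total$",
    r"^redis_(expired|evicted)_keys_total$",
    r"^redis_keyspace_(hits|misses)_total$",
    r"^redis_memory_used_bytes$",
    r"^redis_connected_clients$",
)]

def is_selected(container_name, metric_name):
    """Return True if this declaration would allowlist the metric."""
    return (label_matcher.match(container_name) is not None
            and any(s.match(metric_name) for s in metric_selectors))

print(is_selected("redis-exporter-demo", "redis_memory_used_bytes"))  # True
print(is_selected("redis-exporter-demo", "redis_uptime_in_seconds"))  # False: not a selected metric
print(is_selected("redis-server", "redis_memory_used_bytes"))         # False: container name does not match
```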

You can also create a CloudWatch dashboard for your Redis OSS Prometheus metrics.

**To create a dashboard for Redis OSS Prometheus metrics**

1. Create environment variables, replacing the values below to match your deployment.

   ```
   DASHBOARD_NAME=your_cw_dashboard_name
   ECS_TASK_DEF_FAMILY=redis-prometheus-demo-$ECS_CLUSTER_NAME-fargate-awsvpc
   ```

1. Enter the following command to create the dashboard.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/redis/cw_dashboard_redis.json \
   | sed "s/{{YOUR_AWS_REGION}}/${AWS_DEFAULT_REGION}/g" \
   | sed "s/{{YOUR_CLUSTER_NAME}}/${ECS_CLUSTER_NAME}/g" \
   | sed "s/{{YOUR_NAMESPACE}}/${ECS_TASK_DEF_FAMILY}/g" \
   | xargs -0 aws cloudwatch put-dashboard --dashboard-name ${DASHBOARD_NAME} --dashboard-body
   ```

# Set up and configure Prometheus metrics collection on Amazon EKS and Kubernetes clusters
<a name="ContainerInsights-Prometheus-install-EKS"></a>

To collect Prometheus metrics from clusters running Amazon EKS or Kubernetes, you can use the CloudWatch agent as a collector or use the AWS Distro for OpenTelemetry collector. For information about using the AWS Distro for OpenTelemetry collector, see [https://aws-otel.github.io/docs/getting-started/container-insights/eks-prometheus](https://aws-otel.github.io/docs/getting-started/container-insights/eks-prometheus).

The following sections explain how to collect Prometheus metrics using the CloudWatch agent. They explain how to install the CloudWatch agent with Prometheus monitoring on clusters running Amazon EKS or Kubernetes, and how to configure the agent to scrape additional targets. They also provide optional tutorials for setting up sample workloads to use for testing with Prometheus monitoring.

**Topics**
+ [Install the CloudWatch agent with Prometheus metrics collection on Amazon EKS and Kubernetes clusters](ContainerInsights-Prometheus-Setup.md)

# Install the CloudWatch agent with Prometheus metrics collection on Amazon EKS and Kubernetes clusters
<a name="ContainerInsights-Prometheus-Setup"></a>

This section explains how to set up the CloudWatch agent with Prometheus monitoring in a cluster running Amazon EKS or Kubernetes. After you do this, the agent automatically scrapes and imports metrics for the following workloads running in that cluster.
+ AWS App Mesh
+ NGINX
+ Memcached
+ Java/JMX
+ HAProxy
+ Fluent Bit

You can also configure the agent to scrape and import additional Prometheus workloads and sources.

Before following these steps to install the CloudWatch agent for Prometheus metric collection, you must have a cluster running on Amazon EKS or a Kubernetes cluster running on an Amazon EC2 instance.

**VPC security group requirements**

The ingress rules of the security groups for the Prometheus workloads must open the Prometheus ports to the CloudWatch agent so that it can scrape the Prometheus metrics by private IP.

The egress rules of the security group for the CloudWatch agent must allow the CloudWatch agent to connect to the Prometheus workloads' port by private IP.

**Topics**
+ [Install the CloudWatch agent with Prometheus metrics collection on Amazon EKS and Kubernetes clusters](#ContainerInsights-Prometheus-Setup-roles)
+ [Scraping additional Prometheus sources and importing those metrics](ContainerInsights-Prometheus-Setup-configure.md)
+ [(Optional) Set up sample containerized Amazon EKS workloads for Prometheus metric testing](ContainerInsights-Prometheus-Sample-Workloads.md)

## Install the CloudWatch agent with Prometheus metrics collection on Amazon EKS and Kubernetes clusters
<a name="ContainerInsights-Prometheus-Setup-roles"></a>


**Topics**
+ [Setting up IAM roles](#ContainerInsights-Prometheus-Setup-roles)
+ [Installing the CloudWatch agent to collect Prometheus metrics](#ContainerInsights-Prometheus-Setup-install-agent)

### Setting up IAM roles
<a name="ContainerInsights-Prometheus-Setup-roles"></a>

The first step is to set up the necessary IAM role in the cluster. There are two methods:
+ Set up an IAM role for a service account, also known as a *service role*. This method works for both the EC2 launch type and the Fargate launch type.
+ Add an IAM policy to the IAM role used for the cluster. This works only for the EC2 launch type.

**Set up a service role (EC2 launch type and Fargate launch type)**

To set up a service role, enter the following command. Replace *MyCluster* with the name of the cluster.

```
eksctl create iamserviceaccount \
  --name cwagent-prometheus \
  --namespace amazon-cloudwatch \
  --cluster MyCluster \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --approve \
  --override-existing-serviceaccounts
```

**Add a policy to the node group's IAM role (EC2 launch type only)**

**To set up the IAM policy in a node group for Prometheus support**

1. Open the Amazon EC2 console at [https://console.aws.amazon.com/ec2/](https://console.aws.amazon.com/ec2/).

1. In the navigation pane, choose **Instances**.

1. You need to find the prefix of the IAM role name for the cluster. To do this, select the check box next to the name of an instance that is in the cluster, and choose **Actions**, **Security**, **Modify IAM Role**. Then copy the prefix of the IAM role, such as `eksctl-dev303-workshop-nodegroup`.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**.

1. Use the search box to find the prefix that you copied earlier in this procedure, and choose that role.

1. Choose **Attach policies**.

1. Use the search box to find **CloudWatchAgentServerPolicy**. Select the check box next to **CloudWatchAgentServerPolicy**, and choose **Attach policy**.

### Installing the CloudWatch agent to collect Prometheus metrics
<a name="ContainerInsights-Prometheus-Setup-install-agent"></a>

You must install the CloudWatch agent in the cluster to collect the metrics. How to install the agent differs for Amazon EKS clusters and Kubernetes clusters.

**Delete previous versions of the CloudWatch agent with Prometheus support**

If you have already installed a version of the CloudWatch agent with Prometheus support in your cluster, you must delete that version by entering the following command. This is necessary only for previous versions of the agent with Prometheus support. You do not need to delete the CloudWatch agent that enables Container Insights without Prometheus support.

```
kubectl delete deployment cwagent-prometheus -n amazon-cloudwatch
```

#### Installing the CloudWatch agent on Amazon EKS clusters with the EC2 launch type
<a name="ContainerInsights-Prometheus-Setup-install-agent-EKS"></a>

To install the CloudWatch agent with Prometheus support on an Amazon EKS cluster, follow these steps.

**To install the CloudWatch agent with Prometheus support on an Amazon EKS cluster**

1. Enter the following command to check whether the `amazon-cloudwatch` namespace has already been created:

   ```
   kubectl get namespace
   ```

1. If `amazon-cloudwatch` is not displayed in the results, create it by entering the following command:

   ```
   kubectl create namespace amazon-cloudwatch
   ```

1. To deploy the agent with the default configuration and have it send data to the AWS Region that it is installed in, enter the following command:

   ```
   kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks.yaml
   ```

   To have the agent send data to a different Region instead, follow these steps:

   1. Download the YAML file for the agent by entering the following command:

      ```
      curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks.yaml
      ```

   1. Open the file with a text editor, and search for the `cwagentconfig.json` block of the file.

   1. Add a `region` entry to the `agent` section, specifying the Region that you want:

      ```
      cwagentconfig.json: |
          {
            "agent": {
              "region": "us-east-2"
            },
            "logs": { ...
      ```

   1. Save the file and deploy the agent using your updated file.

      ```
      kubectl apply -f prometheus-eks.yaml
      ```

#### Installing the CloudWatch agent on Amazon EKS clusters with the Fargate launch type
<a name="ContainerInsights-Prometheus-Setup-install-agent-EKS-fargate"></a>

To install the CloudWatch agent with Prometheus support on an Amazon EKS cluster with the Fargate launch type, follow these steps.

**To install the CloudWatch agent with Prometheus support on an Amazon EKS cluster with the Fargate launch type**

1. Enter the following command to create a Fargate profile for the CloudWatch agent so that it can run inside the cluster. Replace *MyCluster* with the name of the cluster.

   ```
   eksctl create fargateprofile --cluster MyCluster \
   --name amazon-cloudwatch \
   --namespace amazon-cloudwatch
   ```

1. To install the CloudWatch agent, enter the following command. Replace *MyCluster* with the name of the cluster. This name is used in the log group name that stores the log events collected by the agent, and is also used as a dimension for the metrics collected by the agent.

   Replace *region* with the name of the Region where you want the metrics to be sent. For example, `us-west-1`. 

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks-fargate.yaml | 
   sed "s/{{cluster_name}}/MyCluster/;s/{{region_name}}/region/" | 
   kubectl apply -f -
   ```

#### Installing the CloudWatch agent on a Kubernetes cluster
<a name="ContainerInsights-Prometheus-Setup-install-agent-Kubernetes"></a>

To install the CloudWatch agent with Prometheus support on a cluster running Kubernetes, enter the following command:

```
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-k8s.yaml | 
sed "s/{{cluster_name}}/MyCluster/;s/{{region_name}}/region/" | 
kubectl apply -f -
```

Replace *MyCluster* with the name of the cluster. This name is used in the log group name that stores the log events collected by the agent, and is also used as a dimension for the metrics collected by the agent.

Replace *region* with the name of the AWS Region where you want the metrics to be sent. For example, **us-west-1**.

#### Verify that the agent is running
<a name="ContainerInsights-Prometheus-Setup-install-agent-verify"></a>

On both Amazon EKS and Kubernetes clusters, you can enter the following command to confirm that the agent is running.

```
kubectl get pod -l "app=cwagent-prometheus" -n amazon-cloudwatch
```

If the results include a single CloudWatch agent pod in the `Running` state, the agent is running and collecting Prometheus metrics. By default, the CloudWatch agent collects metrics for App Mesh, NGINX, Memcached, Java/JMX, and HAProxy every minute. For more information about those metrics, see [Prometheus metrics collected by the CloudWatch agent](ContainerInsights-Prometheus-metrics.md). For instructions on how to see your Prometheus metrics in CloudWatch, see [Viewing your Prometheus metrics](ContainerInsights-Prometheus-viewmetrics.md).

You can also configure the CloudWatch agent to collect metrics from other Prometheus exporters. For more information, see [Scraping additional Prometheus sources and importing those metrics](ContainerInsights-Prometheus-Setup-configure.md).

# Scraping additional Prometheus sources and importing those metrics
<a name="ContainerInsights-Prometheus-Setup-configure"></a>

The CloudWatch agent with Prometheus monitoring needs two configurations to scrape the Prometheus metrics. One is for the standard Prometheus configurations as documented in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) in the Prometheus documentation. The other is for the CloudWatch agent configuration.

For Amazon EKS clusters, the configurations are defined in `prometheus-eks.yaml` (for the EC2 launch type) or `prometheus-eks-fargate.yaml` (for the Fargate launch type) as two config maps:
+ The `name: prometheus-config` section contains the settings for Prometheus scraping.
+ The `name: prometheus-cwagentconfig` section contains the configuration for the CloudWatch agent. You can use this section to configure how the Prometheus metrics are collected by CloudWatch. For example, you specify which metrics are to be imported into CloudWatch, and define their dimensions. 

For Kubernetes clusters running on Amazon EC2 instances, the configurations are defined in the `prometheus-k8s.yaml` YAML file as two config maps:
+ The `name: prometheus-config` section contains the settings for Prometheus scraping.
+ The `name: prometheus-cwagentconfig` section contains the configuration for the CloudWatch agent. 

To scrape additional Prometheus metrics sources and import those metrics to CloudWatch, you modify both the Prometheus scrape configuration and the CloudWatch agent configuration, and then re-deploy the agent with the updated configuration.

**VPC security group requirements**

The ingress rules of the security groups for the Prometheus workloads must open the Prometheus ports to the CloudWatch agent so that it can scrape the Prometheus metrics by private IP.

The egress rules of the security group for the CloudWatch agent must allow the CloudWatch agent to connect to the Prometheus workloads' port by private IP.

## Prometheus scrape configuration
<a name="ContainerInsights-Prometheus-Setup-config-global"></a>

The CloudWatch agent supports the standard Prometheus scrape configurations as documented in [scrape_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) in the Prometheus documentation. You can edit this section to update the configurations that are already in this file, and to add additional Prometheus scraping targets. By default, the sample configuration file contains the following global configuration lines:

```
global:
  scrape_interval: 1m
  scrape_timeout: 10s
```
+ **scrape_interval**— Defines how frequently to scrape targets.
+ **scrape_timeout**— Defines how long to wait before a scrape request times out.

You can also define different values for these settings at the job level, to override the global configurations.
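
For example, a hypothetical job-level override for one slow exporter might look like the following (the job name and target are placeholders, not part of the default file):

```
scrape_configs:
  - job_name: 'my-slow-exporter'
    scrape_interval: 5m
    scrape_timeout: 30s
    static_configs:
      - targets: ['localhost:9100']
```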

### Prometheus scraping jobs
<a name="ContainerInsights-Prometheus-Setup-config-scrape"></a>

The CloudWatch agent YAML files already have some default scraping jobs configured. For example, in `prometheus-eks.yaml`, the default scraping jobs are configured in the `job_name` lines in the `scrape_configs` section. In this file, the following default `kubernetes-pod-jmx` section scrapes JMX exporter metrics.

```
    - job_name: 'kubernetes-pod-jmx'
      sample_limit: 10000
      metrics_path: /metrics
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__address__]
        action: keep
        regex: '.*:9404$'
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: Namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: container_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_controller_name
        target_label: pod_controller_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_controller_kind
        target_label: pod_controller_kind
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_phase
        target_label: pod_phase
```
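
To see what the `keep` and `labelmap` relabel actions in this job do, the following Python sketch (illustrative only, not the agent's code) applies the same regular expressions to a sample discovered target:

```python
import re

# Toy illustration of two relabel actions from the job above: "keep" drops
# any target whose __address__ doesn't end in :9404, and "labelmap" copies
# Kubernetes pod labels into plain label names.
target = {
    "__address__": "10.0.1.17:9404",
    "__meta_kubernetes_pod_label_app": "jmx-demo",  # hypothetical pod label
}

# action: keep, regex '.*:9404$' on __address__ (Prometheus fully anchors regexes)
kept = re.fullmatch(r".*:9404", target["__address__"]) is not None

# action: labelmap, regex __meta_kubernetes_pod_label_(.+)
labels = {}
for name, value in target.items():
    m = re.fullmatch(r"__meta_kubernetes_pod_label_(.+)", name)
    if m:
        labels[m.group(1)] = value

print(kept)    # True
print(labels)  # {'app': 'jmx-demo'}
```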

Each of these default targets is scraped, and the metrics are sent to CloudWatch in log events using embedded metric format. For more information, see [Embedding metrics within logs](CloudWatch_Embedded_Metric_Format.md).

Log events from Amazon EKS and Kubernetes clusters are stored in the **/aws/containerinsights/*cluster_name*/prometheus** log group in CloudWatch Logs. Log events from Amazon ECS clusters are stored in the **/aws/ecs/containerinsights/*cluster_name*/prometheus** log group.

Each scraping job is contained in a different log stream in this log group. For example, the Prometheus scraping job `kubernetes-pod-appmesh-envoy` is defined for App Mesh. All App Mesh Prometheus metrics from Amazon EKS and Kubernetes clusters are sent to the log stream named **kubernetes-pod-appmesh-envoy** in the **/aws/containerinsights/*cluster_name*/prometheus** log group.

To add a new scraping target, you add a new `job_name` section to the `scrape_configs` section of the YAML file, and restart the agent. For an example of this process, see [Tutorial for adding a new Prometheus scrape target: Prometheus API Server metrics](#ContainerInsights-Prometheus-Setup-new-exporters).

## CloudWatch agent configuration for Prometheus
<a name="ContainerInsights-Prometheus-Setup-cw-agent-config2"></a>

The CloudWatch agent configuration file has a `prometheus` section under `metrics_collected` for the Prometheus scraping configuration. It includes the following configuration options:
+ **cluster_name**— specifies the cluster name to be added as a label in the log event. This field is optional. If you omit it, the agent can detect the Amazon EKS or Kubernetes cluster name.
+ **log_group_name**— specifies the log group name for the scraped Prometheus metrics. This field is optional. If you omit it, CloudWatch uses **/aws/containerinsights/*cluster_name*/prometheus** for logs from Amazon EKS and Kubernetes clusters.
+ **prometheus_config_path**— specifies the Prometheus scrape configuration file path. If the value of this field starts with `env:`, the contents of the Prometheus scrape configuration file are retrieved from the container's environment variable. Do not change this field.
+ **ecs_service_discovery**— is the section that specifies the configuration for Amazon ECS Prometheus service discovery. For more information, see [Detailed guide for autodiscovery on Amazon ECS clusters](ContainerInsights-Prometheus-Setup-autodiscovery-ecs.md).

  The `ecs_service_discovery` section can contain the following fields:
  + `sd_frequency` is the frequency to discover the Prometheus exporters. Specify a number and a unit suffix. For example, `1m` for once per minute or `30s` for once per 30 seconds. Valid unit suffixes are `ns`, `us`, `ms`, `s`, `m`, and `h`.

    This field is optional. The default is 60 seconds (1 minute).
  + `sd_target_cluster` is the target Amazon ECS cluster name for auto-discovery. This field is optional. The default is the name of the Amazon ECS cluster where the CloudWatch agent is installed. 
  + `sd_cluster_region` is the target Amazon ECS cluster's Region. This field is optional. The default is the Region of the Amazon ECS cluster where the CloudWatch agent is installed.
  + `sd_result_file` is the path of the YAML file for the Prometheus target results. The Prometheus scrape configuration will refer to this file.
  + `docker_label` is an optional section that you can use to specify the configuration for docker label-based service discovery. If you omit this section, docker label-based discovery is not used. This section can contain the following fields:
    + `sd_port_label` is the container's docker label name that specifies the container port for Prometheus metrics. The default value is `ECS_PROMETHEUS_EXPORTER_PORT`. If the container does not have this docker label, the CloudWatch agent will skip it.
    + `sd_metrics_path_label` is the container's docker label name that specifies the Prometheus metrics path. The default value is `ECS_PROMETHEUS_METRICS_PATH`. If the container does not have this docker label, the agent assumes the default path `/metrics`.
    + `sd_job_name_label` is the container's docker label name that specifies the Prometheus scrape job name. The default value is `job`. If the container does not have this docker label, the CloudWatch agent uses the job name in the Prometheus scrape configuration.
  + `task_definition_list` is an optional section that you can use to specify the configuration of task definition-based service discovery. If you omit this section, task definition-based discovery is not used. This section can contain the following fields:
    + `sd_task_definition_arn_pattern` is the pattern to use to specify the Amazon ECS task definitions to discover. This is a regular expression.
    + `sd_metrics_ports` lists the containerPort for the Prometheus metrics. Separate the containerPorts with semicolons.
    + `sd_container_name_pattern` specifies the Amazon ECS task container names. This is a regular expression.
    + `sd_metrics_path` specifies the Prometheus metric path. If you omit this, the agent assumes the default path `/metrics`.
    + `sd_job_name` specifies the Prometheus scrape job name. If you omit this field, the CloudWatch agent uses the job name in the Prometheus scrape configuration.
+ **metric_declaration**— are the sections that specify the array of logs with embedded metric format to be generated. There are `metric_declaration` sections for each Prometheus source that the CloudWatch agent imports from by default. These sections each include the following fields:
  + `label_matcher` is a regular expression that checks the value of the labels listed in `source_labels`. The metrics that match are enabled for inclusion in the embedded metric format sent to CloudWatch. 

    If you have multiple labels specified in `source_labels`, we recommend that you do not use `^` or `$` characters in the regular expression for `label_matcher`.
  + `source_labels` specifies the value of the labels that are checked by the `label_matcher` line.
  + `label_separator` specifies the separator to be used in the `label_matcher` line if multiple `source_labels` are specified. The default is `;`. You can see this default used in the `label_matcher` line in the following example.
  + `metric_selectors` is an array of regular expressions that specify the metrics to be collected and sent to CloudWatch.
  + `dimensions` is the list of labels to be used as CloudWatch dimensions for each selected metric.

See the following `metric_declaration` example.

```
"metric_declaration": [
  {
     "source_labels":[ "Service", "Namespace"],
     "label_matcher":"(.*node-exporter.*|.*kube-dns.*);kube-system",
     "dimensions":[
        ["Service", "Namespace"]
     ],
     "metric_selectors":[
        "^coredns_dns_request_type_count_total$"
     ]
  }
]
```

This example configures an embedded metric format section to be sent as a log event if the following conditions are met:
+ The value of `Service` contains either `node-exporter` or `kube-dns`.
+ The value of `Namespace` is `kube-system`.
+ The Prometheus metric `coredns_dns_request_type_count_total` contains both `Service` and `Namespace` labels.
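
The matching can be sketched as follows: the agent joins the values of the `source_labels` with the separator (default `;`) and tests the joined string against `label_matcher`. A minimal Python sketch of that check (illustrative, not the agent's implementation):

```python
import re

# Values copied from the metric_declaration example above.
source_labels = ["Service", "Namespace"]
label_matcher = re.compile(r"(.*node-exporter.*|.*kube-dns.*);kube-system")

def declaration_matches(labels, separator=";"):
    """Join the source label values and test them against label_matcher."""
    joined = separator.join(labels[name] for name in source_labels)
    return label_matcher.match(joined) is not None

print(declaration_matches({"Service": "kube-dns", "Namespace": "kube-system"}))  # True
print(declaration_matches({"Service": "kube-dns", "Namespace": "default"}))      # False
```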

The log event that is sent includes the embedded `CloudWatchMetrics` section shown here:

```
{
   "CloudWatchMetrics":[
      {
         "Metrics":[
            {
               "Name":"coredns_dns_request_type_count_total"
            }
         ],
         "Dimensions":[
            [
               "Namespace",
               "Service"
            ]
         ],
         "Namespace":"ContainerInsights/Prometheus"
      }
   ],
   "Namespace":"kube-system",
   "Service":"kube-dns",
   "coredns_dns_request_type_count_total":2562,
   "eks_amazonaws_com_component":"kube-dns",
   "instance":"192.168.61.254:9153",
   "job":"kubernetes-service-endpoints",
   ...
}
```
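
To illustrate how this structure is interpreted, the following Python sketch (illustrative, not CloudWatch's implementation) reads the `CloudWatchMetrics` directive from a trimmed version of the event above and extracts the metric value and its dimensions:

```python
import json

# Trimmed version of the embedded metric format log event shown above.
event = json.loads("""
{
   "CloudWatchMetrics": [{
      "Metrics": [{"Name": "coredns_dns_request_type_count_total"}],
      "Dimensions": [["Namespace", "Service"]],
      "Namespace": "ContainerInsights/Prometheus"
   }],
   "Namespace": "kube-system",
   "Service": "kube-dns",
   "coredns_dns_request_type_count_total": 2562
}
""")

directive = event["CloudWatchMetrics"][0]
for metric in directive["Metrics"]:
    value = event[metric["Name"]]                             # metric value from the event body
    dims = {d: event[d] for d in directive["Dimensions"][0]}  # dimension values from the event body
    print(directive["Namespace"], metric["Name"], value, dims)
```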

## Tutorial for adding a new Prometheus scrape target: Prometheus API Server metrics
<a name="ContainerInsights-Prometheus-Setup-new-exporters"></a>

The Kubernetes API Server exposes Prometheus metrics on endpoints by default. The official example of the Kubernetes API Server scraping configuration is available on [GitHub](https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml).

The following tutorial shows how to perform these steps to begin importing Kubernetes API Server metrics into CloudWatch:
+ Adding the Prometheus scraping configuration for Kubernetes API Server to the CloudWatch agent YAML file.
+ Configuring the embedded metric format metrics definitions in the CloudWatch agent YAML file.
+ (Optional) Creating a CloudWatch dashboard for the Kubernetes API Server metrics.

**Note**  
The Kubernetes API Server exposes gauge, counter, histogram, and summary metrics. In this release of Prometheus metrics support, CloudWatch imports only the metrics with gauge, counter, and summary types.

**To start collecting Kubernetes API Server Prometheus metrics in CloudWatch**

1. Download the latest version of the `prometheus-eks.yaml`, `prometheus-eks-fargate.yaml`, or `prometheus-k8s.yaml` file by entering one of the following commands.

   For an Amazon EKS cluster with the EC2 launch type, enter the following command:

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks.yaml
   ```

   For an Amazon EKS cluster with the Fargate launch type, enter the following command:

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks-fargate.yaml
   ```

   For a Kubernetes cluster running on an Amazon EC2 instance, enter the following command:

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-k8s.yaml
   ```

1. Open the file with a text editor, find the `prometheus-config` section, and add the following section inside it. Then save the changes:

   ```
       # Scrape config for API servers
       - job_name: 'kubernetes-apiservers'
         kubernetes_sd_configs:
           - role: endpoints
             namespaces:
               names:
                 - default
         scheme: https
         tls_config:
           ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
           insecure_skip_verify: true
         bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
         relabel_configs:
         - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
           action: keep
           regex: kubernetes;https
         - action: replace
           source_labels:
           - __meta_kubernetes_namespace
           target_label: Namespace
         - action: replace
           source_labels:
           - __meta_kubernetes_service_name
           target_label: Service
   ```

1. While you still have the YAML file open in the text editor, find the `cwagentconfig.json` section. Add the following subsection and save the changes. This section puts the API server metrics onto the CloudWatch agent allow list. Three types of API Server metrics are added to the allow list:
   + etcd object counts
   + API Server registration controller metrics
   + API Server request metrics

   ```
   {"source_labels": ["job", "resource"],
     "label_matcher": "^kubernetes-apiservers;(services|daemonsets.apps|deployments.apps|configmaps|endpoints|secrets|serviceaccounts|replicasets.apps)",
     "dimensions": [["ClusterName","Service","resource"]],
     "metric_selectors": [
     "^etcd_object_counts$"
     ]
   },
   {"source_labels": ["job", "name"],
      "label_matcher": "^kubernetes-apiservers;APIServiceRegistrationController$",
      "dimensions": [["ClusterName","Service","name"]],
      "metric_selectors": [
      "^workqueue_depth$",
      "^workqueue_adds_total$",
      "^workqueue_retries_total$"
     ]
   },
   {"source_labels": ["job","code"],
     "label_matcher": "^kubernetes-apiservers;2[0-9]{2}$",
     "dimensions": [["ClusterName","Service","code"]],
     "metric_selectors": [
      "^apiserver_request_total$"
     ]
   },
   {"source_labels": ["job"],
     "label_matcher": "^kubernetes-apiservers",
     "dimensions": [["ClusterName","Service"]],
     "metric_selectors": [
     "^apiserver_request_total$"
     ]
   },
   ```
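   The agent joins the values listed in `source_labels` with a semicolon before testing the result against `label_matcher`, which is why the matchers above contain `;`. As a quick local sketch (sample values are illustrative), you can see how the request-metrics matcher keeps only 2xx response codes:

   ```shell
   # Joined "job;code" values as the agent would form them; only the 2xx
   # sample survives the label_matcher regex used above.
   printf 'kubernetes-apiservers;200\nkubernetes-apiservers;404\n' \
     | grep -E '^kubernetes-apiservers;2[0-9]{2}$'
   ```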

1. If you already have the CloudWatch agent with Prometheus support deployed in the cluster, you must delete it by entering the following command:

   ```
   kubectl delete deployment cwagent-prometheus -n amazon-cloudwatch
   ```

1. Deploy the CloudWatch agent with your updated configuration by entering one of the following commands. For an Amazon EKS cluster with the EC2 launch type, enter:

   ```
   kubectl apply -f prometheus-eks.yaml
   ```

   For an Amazon EKS cluster with the Fargate launch type, enter the following command. Replace *MyCluster* and *region* with values to match your deployment.

   ```
   cat prometheus-eks-fargate.yaml \
   | sed "s/{{cluster_name}}/MyCluster/;s/{{region_name}}/region/" \
   | kubectl apply -f -
   ```

   For a Kubernetes cluster, enter the following command. Replace *MyCluster* and *region* with values to match your deployment.

   ```
   cat prometheus-k8s.yaml \
   | sed "s/{{cluster_name}}/MyCluster/;s/{{region_name}}/region/" \
   | kubectl apply -f -
   ```

Once you have done this, you should see a new log stream named **kubernetes-apiservers** in the **/aws/containerinsights/*cluster_name*/prometheus** log group. This log stream should include log events with an embedded metric format definition like the following:

```
{
   "CloudWatchMetrics":[
      {
         "Metrics":[
            {
               "Name":"apiserver_request_total"
            }
         ],
         "Dimensions":[
            [
               "ClusterName",
               "Service"
            ]
         ],
         "Namespace":"ContainerInsights/Prometheus"
      }
   ],
   "ClusterName":"my-cluster-name",
   "Namespace":"default",
   "Service":"kubernetes",
   "Timestamp":"1592267020339",
   "Version":"0",
   "apiserver_request_count":0,
   "apiserver_request_total":0,
   "code":"0",
   "component":"apiserver",
   "contentType":"application/json",
   "instance":"192.0.2.0:443",
   "job":"kubernetes-apiservers",
   "prom_metric_type":"counter",
   "resource":"pods",
   "scope":"namespace",
   "verb":"WATCH",
   "version":"v1"
}
```
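Because these performance log events are structured JSON, you can also aggregate them directly with CloudWatch Logs Insights in the **/aws/containerinsights/*cluster_name*/prometheus** log group. For example, a query such as the following (a sketch using field names from the event above) sums API Server requests by response code:

```
filter job = "kubernetes-apiservers"
| stats sum(apiserver_request_total) by code
```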

You can view your metrics in the CloudWatch console in the **ContainerInsights/Prometheus** namespace. You can also optionally create a CloudWatch dashboard for your Prometheus Kubernetes API Server metrics.

### (Optional) Creating a dashboard for Kubernetes API Server metrics
<a name="ContainerInsights-Prometheus-Setup-KPI-dashboard"></a>

To see Kubernetes API Server metrics in your dashboard, you must have first completed the steps in the previous sections to start collecting these metrics in CloudWatch.

**To create a dashboard for Kubernetes API Server metrics**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Make sure you have the correct AWS Region selected.

1. In the navigation pane, choose **Dashboards**.

1. Choose **Create Dashboard**. Enter a name for the new dashboard, and choose **Create dashboard**.

1. In **Add to this dashboard**, choose **Cancel**.

1. Choose **Actions**, **View/edit source**.

1. Download the following JSON file: [Kubernetes API Dashboard source](https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/kubernetes_api_server/cw_dashboard_kubernetes_api_server.json).

1. Open the JSON file that you downloaded with a text editor, and make the following changes:
   + Replace all the `{{YOUR_CLUSTER_NAME}}` strings with the exact name of your cluster. Be sure not to add whitespace before or after the text.
   + Replace all the `{{YOUR_AWS_REGION}}` strings with the Region where the metrics are collected, such as `us-west-2`. Be sure not to add whitespace before or after the text.
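   If you prefer to script these replacements, a single `sed` command can make both substitutions. The sketch below runs on a one-line stand-in for the downloaded JSON so it is self-contained; in practice, point `sed` at `cw_dashboard_kubernetes_api_server.json`.

   ```shell
   CLUSTER_NAME=MyCluster   # replace with your exact cluster name
   AWS_REGION=us-west-2     # replace with your Region
   # Stand-in for the downloaded cw_dashboard_kubernetes_api_server.json:
   printf '{"cluster":"{{YOUR_CLUSTER_NAME}}","region":"{{YOUR_AWS_REGION}}"}\n' > dashboard.json
   sed -e "s/{{YOUR_CLUSTER_NAME}}/$CLUSTER_NAME/g" \
       -e "s/{{YOUR_AWS_REGION}}/$AWS_REGION/g" dashboard.json > dashboard_ready.json
   cat dashboard_ready.json
   ```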

1. Copy the entire JSON blob and paste it into the text box in the CloudWatch console, replacing what is already in the box.

1. Choose **Update**, **Save dashboard**.

# (Optional) Set up sample containerized Amazon EKS workloads for Prometheus metric testing
<a name="ContainerInsights-Prometheus-Sample-Workloads"></a>

To test the Prometheus metric support in CloudWatch Container Insights, you can set up one or more of the following containerized workloads. The CloudWatch agent with Prometheus support automatically collects metrics from each of these workloads. To see the metrics that are collected by default, see [Prometheus metrics collected by the CloudWatch agent](ContainerInsights-Prometheus-metrics.md).

Before you can install any of these workloads, you must install Helm 3.x. For example, if you use Homebrew, enter the following command:

```
brew install helm
```

For more information, see [Helm](https://helm.sh).

**Topics**
+ [Set up AWS App Mesh sample workload for Amazon EKS and Kubernetes](ContainerInsights-Prometheus-Sample-Workloads-appmesh.md)
+ [Set up NGINX with sample traffic on Amazon EKS and Kubernetes](ContainerInsights-Prometheus-Sample-Workloads-nginx.md)
+ [Set up memcached with a metric exporter on Amazon EKS and Kubernetes](ContainerInsights-Prometheus-Sample-Workloads-memcached.md)
+ [Set up Java/JMX sample workload on Amazon EKS and Kubernetes](ContainerInsights-Prometheus-Sample-Workloads-javajmx.md)
+ [Set up HAProxy with a metric exporter on Amazon EKS and Kubernetes](ContainerInsights-Prometheus-Sample-Workloads-haproxy.md)
+ [Tutorial for adding a new Prometheus scrape target: Redis OSS on Amazon EKS and Kubernetes clusters](ContainerInsights-Prometheus-Setup-redis-eks.md)

# Set up AWS App Mesh sample workload for Amazon EKS and Kubernetes
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh"></a>

Prometheus support in CloudWatch Container Insights supports AWS App Mesh. The following sections explain how to set up App Mesh.

**Topics**
+ [Set up AWS App Mesh sample workload on an Amazon EKS cluster with the EC2 launch type or a Kubernetes cluster](ContainerInsights-Prometheus-Sample-Workloads-appmesh-EKS.md)
+ [Set up AWS App Mesh sample workload on an Amazon EKS cluster with the Fargate launch type](ContainerInsights-Prometheus-Sample-Workloads-appmesh-Fargate.md)

# Set up AWS App Mesh sample workload on an Amazon EKS cluster with the EC2 launch type or a Kubernetes cluster
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-EKS"></a>

Use these instructions if you are setting up App Mesh on a cluster running Amazon EKS with the EC2 launch type, or a Kubernetes cluster.

## Configure IAM permissions
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-iam"></a>

You must add the **AWSAppMeshFullAccess** policy to the IAM role for your Amazon EKS or Kubernetes node group. On Amazon EKS, this node group name looks similar to `eksctl-integ-test-eks-prometheus-NodeInstanceRole-ABCDEFHIJKL`. On Kubernetes, it might look similar to `nodes.integ-test-kops-prometheus.k8s.local`.
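If you manage permissions from the command line, you can attach the policy with the AWS CLI. This is a sketch, not a required step: the role name below is the hypothetical example from above, and the `if` guard keeps the snippet a no-op on machines where the AWS CLI is not installed.

```shell
# Hypothetical node instance role name; substitute your node group's role.
NODE_ROLE=eksctl-integ-test-eks-prometheus-NodeInstanceRole-ABCDEFHIJKL
POLICY_ARN=arn:aws:iam::aws:policy/AWSAppMeshFullAccess
# Attach only when the AWS CLI is available and configured.
if command -v aws >/dev/null 2>&1; then
  aws iam attach-role-policy --role-name "$NODE_ROLE" --policy-arn "$POLICY_ARN"
fi
```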

## Install App Mesh
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-install"></a>

To install the App Mesh Kubernetes controller, follow the instructions in [App Mesh Controller](https://github.com/aws/eks-charts/tree/master/stable/appmesh-controller#app-mesh-controller).

## Install a sample application
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-application"></a>

[aws-app-mesh-examples](https://github.com/aws/aws-app-mesh-examples) contains several Kubernetes App Mesh walkthroughs. For this tutorial, you install a sample color application that shows how http routes can use headers for matching incoming requests.

**To use a sample App Mesh application to test Container Insights**

1. Install the application using these instructions: [https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-k8s-http-headers](https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-k8s-http-headers). 

1. Launch a curler pod to generate traffic:

   ```
   kubectl -n default run -it curler --image=tutum/curl /bin/bash
   ```

1. Curl different endpoints by changing HTTP headers. Run the curl command multiple times, as shown:

   ```
   curl -H "color_header: blue" front.howto-k8s-http-headers.svc.cluster.local:8080/; echo;
   
   curl -H "color_header: red" front.howto-k8s-http-headers.svc.cluster.local:8080/; echo;
   
   curl -H "color_header: yellow" front.howto-k8s-http-headers.svc.cluster.local:8080/; echo;
   ```

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the AWS Region where your cluster is running, choose **Metrics** in the navigation pane. The metrics are in the **ContainerInsights/Prometheus** namespace.

1. To see the CloudWatch Logs events, choose **Log groups** in the navigation pane. The events are in the log group `/aws/containerinsights/your_cluster_name/prometheus` in the log stream `kubernetes-pod-appmesh-envoy`.

## Deleting the App Mesh test environment
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-delete"></a>

When you have finished using App Mesh and the sample application, use the following commands to delete the unnecessary resources. Delete the sample application by entering the following command:

```
cd aws-app-mesh-examples/walkthroughs/howto-k8s-http-headers/
kubectl delete -f _output/manifest.yaml
```

Delete the App Mesh controller by entering the following command:

```
helm delete appmesh-controller -n appmesh-system
```

# Set up AWS App Mesh sample workload on an Amazon EKS cluster with the Fargate launch type
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-Fargate"></a>

Use these instructions if you are setting up App Mesh on a cluster running Amazon EKS with the Fargate launch type.

## Configure IAM permissions
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh--fargate-iam"></a>

To set up IAM permissions, enter the following command. Replace *MyCluster* with the name of your cluster.

```
eksctl create iamserviceaccount --cluster MyCluster \
 --namespace howto-k8s-fargate \
 --name appmesh-pod \
 --attach-policy-arn arn:aws:iam::aws:policy/AWSAppMeshEnvoyAccess \
 --attach-policy-arn arn:aws:iam::aws:policy/AWSCloudMapDiscoverInstanceAccess \
 --attach-policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess \
 --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess \
 --attach-policy-arn arn:aws:iam::aws:policy/AWSAppMeshFullAccess \
 --attach-policy-arn arn:aws:iam::aws:policy/AWSCloudMapFullAccess \
 --override-existing-serviceaccounts \
 --approve
```

## Install App Mesh
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-fargate-install"></a>

To install the App Mesh Kubernetes controller, follow the instructions in [App Mesh Controller](https://github.com/aws/eks-charts/tree/master/stable/appmesh-controller#app-mesh-controller). Be sure to follow the instructions for Amazon EKS with the Fargate launch type.

## Install a sample application
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-fargate-application"></a>

[aws-app-mesh-examples](https://github.com/aws/aws-app-mesh-examples) contains several Kubernetes App Mesh walkthroughs. For this tutorial, you install a sample color application that works for Amazon EKS clusters with the Fargate launch type.

**To use a sample App Mesh application to test Container Insights**

1. Install the application using these instructions: [https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-k8s-fargate](https://github.com/aws/aws-app-mesh-examples/tree/main/walkthroughs/howto-k8s-fargate). 

   Those instructions assume that you are creating a new cluster with the correct Fargate profile. If you want to use an Amazon EKS cluster that you've already set up, you can use the following commands to set up that cluster for this demonstration. Replace *MyCluster* with the name of your cluster.

   ```
   eksctl create iamserviceaccount --cluster MyCluster \
    --namespace howto-k8s-fargate \
    --name appmesh-pod \
    --attach-policy-arn arn:aws:iam::aws:policy/AWSAppMeshEnvoyAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AWSCloudMapDiscoverInstanceAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AWSAppMeshFullAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AWSCloudMapFullAccess \
    --override-existing-serviceaccounts \
    --approve
   ```

   ```
   eksctl create fargateprofile --cluster MyCluster \
   --namespace howto-k8s-fargate --name howto-k8s-fargate
   ```

1. Port forward the front application deployment:

   ```
   kubectl -n howto-k8s-fargate port-forward deployment/front 8080:8080
   ```

1. Curl the front app:

   ```
   while true; do  curl -s http://localhost:8080/color; sleep 0.1; echo ; done
   ```

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the AWS Region where your cluster is running, choose **Metrics** in the navigation pane. The metrics are in the **ContainerInsights/Prometheus** namespace.

1. To see the CloudWatch Logs events, choose **Log groups** in the navigation pane. The events are in the log group `/aws/containerinsights/your_cluster_name/prometheus` in the log stream `kubernetes-pod-appmesh-envoy`.

## Deleting the App Mesh test environment
<a name="ContainerInsights-Prometheus-Sample-Workloads-appmesh-fargate-delete"></a>

When you have finished using App Mesh and the sample application, use the following commands to delete the unnecessary resources. Delete the sample application by entering the following command:

```
cd aws-app-mesh-examples/walkthroughs/howto-k8s-fargate/
kubectl delete -f _output/manifest.yaml
```

Delete the App Mesh controller by entering the following command:

```
helm delete appmesh-controller -n appmesh-system
```

# Set up NGINX with sample traffic on Amazon EKS and Kubernetes
<a name="ContainerInsights-Prometheus-Sample-Workloads-nginx"></a>

NGINX is a web server that can also be used as a load balancer and reverse proxy. For more information about how Kubernetes uses NGINX for ingress, see [kubernetes/ingress-nginx](https://github.com/kubernetes/ingress-nginx).

**To install Ingress-NGINX with a sample traffic service to test Container Insights Prometheus support**

1. Enter the following command to add the Helm ingress-nginx repo:

   ```
   helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
   ```

1. Enter the following commands:

   ```
   kubectl create namespace nginx-ingress-sample
   
   helm install my-nginx ingress-nginx/ingress-nginx \
   --namespace nginx-ingress-sample \
   --set controller.metrics.enabled=true \
   --set-string controller.metrics.service.annotations."prometheus\.io/port"="10254" \
   --set-string controller.metrics.service.annotations."prometheus\.io/scrape"="true"
   ```

1. Check whether the services started correctly by entering the following command:

   ```
   kubectl get service -n nginx-ingress-sample
   ```

   The output of this command should display several columns, including an `EXTERNAL-IP` column.

1. Set an `EXTERNAL_IP` shell variable to the value shown in the `EXTERNAL-IP` column for the NGINX ingress controller.

   ```
   EXTERNAL_IP=your-nginx-controller-external-ip
   ```
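   If you want to script this step, you can parse the value out of the `kubectl` output. The sketch below runs on a stand-in for that output so it is self-contained; the controller's service name follows from the `my-nginx` Helm release above, but confirm the exact name in your own cluster.

   ```shell
   # Stand-in for `kubectl get service -n nginx-ingress-sample` output:
   printf '%s\n' \
     'NAME                                TYPE           CLUSTER-IP      EXTERNAL-IP                          PORT(S)' \
     'my-nginx-ingress-nginx-controller   LoadBalancer   10.100.200.30   abc123.us-west-2.elb.amazonaws.com   80:31573/TCP' \
     > services.txt
   # Pick the EXTERNAL-IP column from the controller's row:
   EXTERNAL_IP=$(awk '/^my-nginx-ingress-nginx-controller /{print $4}' services.txt)
   echo "$EXTERNAL_IP"
   ```

   With a live cluster, a command such as `kubectl get service my-nginx-ingress-nginx-controller -n nginx-ingress-sample -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'` (hypothetical service name) returns the same value directly.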

1. Start some sample NGINX traffic by entering the following command. 

   ```
   SAMPLE_TRAFFIC_NAMESPACE=nginx-sample-traffic
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_traffic/nginx-traffic/nginx-traffic-sample.yaml | 
   sed "s/{{external_ip}}/$EXTERNAL_IP/g" | 
   sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" | 
   kubectl apply -f -
   ```

1. Enter the following command to confirm that all three pods are in the `Running` status.

   ```
   kubectl get pod -n $SAMPLE_TRAFFIC_NAMESPACE
   ```

   If they are running, you should soon see metrics in the **ContainerInsights/Prometheus** namespace.

**To uninstall NGINX and the sample traffic application**

1. Delete the sample traffic service by entering the following command:

   ```
   kubectl delete namespace $SAMPLE_TRAFFIC_NAMESPACE
   ```

1. Delete the NGINX ingress controller by uninstalling the Helm release, and then delete its namespace:

   ```
   helm uninstall my-nginx --namespace nginx-ingress-sample
   kubectl delete namespace nginx-ingress-sample
   ```

# Set up memcached with a metric exporter on Amazon EKS and Kubernetes
<a name="ContainerInsights-Prometheus-Sample-Workloads-memcached"></a>

memcached is an open-source memory object caching system. For more information, see [What is Memcached?](https://www.memcached.org).

If you are running memcached on a cluster with the Fargate launch type, you need to set up a Fargate profile before doing the steps in this procedure. To set up the profile, enter the following command. Replace *MyCluster* with the name of your cluster.

```
eksctl create fargateprofile --cluster MyCluster \
--namespace memcached-sample --name memcached-sample
```

**To install memcached with a metric exporter to test Container Insights Prometheus support**

1. Enter the following command to add the repo:

   ```
   helm repo add bitnami https://charts.bitnami.com/bitnami
   ```

1. Enter the following command to create a new namespace:

   ```
   kubectl create namespace memcached-sample
   ```

1. Enter the following command to install memcached:

   ```
   helm install my-memcached bitnami/memcached --namespace memcached-sample \
   --set metrics.enabled=true \
   --set-string serviceAnnotations.prometheus\\.io/port="9150" \
   --set-string serviceAnnotations.prometheus\\.io/scrape="true"
   ```

1. Enter the following command to confirm the annotation of the running service:

   ```
   kubectl describe service my-memcached-metrics -n memcached-sample
   ```

   You should see the following two annotations:

   ```
   Annotations:   prometheus.io/port: 9150
                  prometheus.io/scrape: true
   ```

**To uninstall memcached**
+ Enter the following commands:

  ```
  helm uninstall my-memcached --namespace memcached-sample
  kubectl delete namespace memcached-sample
  ```

# Set up Java/JMX sample workload on Amazon EKS and Kubernetes
<a name="ContainerInsights-Prometheus-Sample-Workloads-javajmx"></a>

JMX Exporter is an official Prometheus exporter that can scrape and expose JMX mBeans as Prometheus metrics. For more information, see [prometheus/jmx_exporter](https://github.com/prometheus/jmx_exporter).

Container Insights can collect predefined Prometheus metrics from Java Virtual Machine (JVM), Java, and Tomcat (Catalina) using the JMX Exporter.

## Default Prometheus scrape configuration
<a name="ContainerInsights-Prometheus-Sample-Workloads-javajmx-default"></a>

By default, the CloudWatch agent with Prometheus support scrapes the Java/JMX Prometheus metrics from `http://CLUSTER_IP:9404/metrics` on each pod in an Amazon EKS or Kubernetes cluster. This is done by `role: pod` discovery of the Prometheus `kubernetes_sd_config`. Port 9404 is the default port allocated to the JMX Exporter by Prometheus. For more information about `role: pod` discovery, see [pod](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod). You can configure the JMX Exporter to expose the metrics on a different port or metrics path. If you change the port or path, update the default `kubernetes-jmx-pod` scrape configuration in the CloudWatch agent config map. Run the following command to get the current CloudWatch agent Prometheus configuration:

```
kubectl describe cm prometheus-config -n amazon-cloudwatch
```

The fields to change are the `/metrics` and `regex: '.*:9404$'` fields, as highlighted in the following example.

```
job_name: 'kubernetes-jmx-pod'
sample_limit: 10000
metrics_path: /metrics
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__address__]
  action: keep
  regex: '.*:9404$'
- action: replace
  regex: (.+)
  source_labels:
```
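If you change the exporter's port or path, those two fields must change together. For example, with a hypothetical exporter listening on port 9999 and serving metrics at `/custom/metrics`, the job would look like this:

```
job_name: 'kubernetes-jmx-pod'
sample_limit: 10000
metrics_path: /custom/metrics
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__address__]
  action: keep
  regex: '.*:9999$'
```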

## Other Prometheus scrape configuration
<a name="ContainerInsights-Prometheus-Sample-Workloads-javajmx-other"></a>

If you expose your application running on a set of pods with Java/JMX Prometheus exporters through a Kubernetes service, you can also switch to `role: service` discovery or `role: endpoints` discovery of the Prometheus `kubernetes_sd_config`. For more information about these discovery methods, see [service](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#service), [endpoints](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints), and [kubernetes_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config).

These two service discovery modes provide more meta labels, which can be useful for building your CloudWatch metric dimensions. For example, you can relabel `__meta_kubernetes_service_name` to `Service` and include it in your metrics' dimensions. For more information about customizing your CloudWatch metrics and their dimensions, see [CloudWatch agent configuration for Prometheus](ContainerInsights-Prometheus-Setup-configure-ECS.md#ContainerInsights-Prometheus-Setup-cw-agent-config).

## Docker image with JMX Exporter
<a name="ContainerInsights-Prometheus-Sample-Workloads-javajmx-docker"></a>

Next, build a Docker image. The following sections provide two example Dockerfiles.

If you are running the JMX Exporter on a cluster with the Fargate launch type, you also need to set up a Fargate profile before doing the steps in this procedure. To set up the profile, enter the following command. Replace *MyCluster* with the name of your cluster.

```
eksctl create fargateprofile --cluster MyCluster \
--namespace $JAR_SAMPLE_TRAFFIC_NAMESPACE \
--name $JAR_SAMPLE_TRAFFIC_NAMESPACE
```

When you have built the image, load it into Amazon EKS or Kubernetes, and then run the following command to verify that the Prometheus metrics are exposed by the JMX Exporter on port 9404. Replace *$JAR_SAMPLE_TRAFFIC_POD* with the name of the running pod, and replace *$JAR_SAMPLE_TRAFFIC_NAMESPACE* with your application namespace.

```
kubectl exec $JAR_SAMPLE_TRAFFIC_POD -n $JAR_SAMPLE_TRAFFIC_NAMESPACE -- curl http://localhost:9404
```

## Example: Apache Tomcat Docker image with Prometheus metrics
<a name="ContainerInsights-Prometheus-Sample-Workloads-javajmx-tomcat"></a>

Apache Tomcat server exposes JMX mBeans by default. You can integrate JMX Exporter with Tomcat to expose JMX mBeans as Prometheus metrics. The following example Dockerfile shows the steps to build a testing image: 

```
# From Tomcat 9.0 JDK8 OpenJDK 
FROM tomcat:9.0-jdk8-openjdk 

RUN mkdir -p /opt/jmx_exporter

COPY ./jmx_prometheus_javaagent-0.12.0.jar /opt/jmx_exporter
COPY ./config.yaml /opt/jmx_exporter
COPY ./setenv.sh /usr/local/tomcat/bin 
COPY ./your_web_application.war /usr/local/tomcat/webapps/

RUN chmod  o+x /usr/local/tomcat/bin/setenv.sh

ENTRYPOINT ["catalina.sh", "run"]
```

The following list explains the four `COPY` lines in this Dockerfile.
+ Download the latest JMX Exporter jar file from [prometheus/jmx_exporter](https://github.com/prometheus/jmx_exporter).
+ `config.yaml` is the JMX Exporter configuration file. For more information, see [Configuration](https://github.com/prometheus/jmx_exporter#Configuration).

  Here is a sample configuration file for Java and Tomcat:

  ```
  lowercaseOutputName: true
  lowercaseOutputLabelNames: true
  
  rules:
  - pattern: 'java.lang<type=OperatingSystem><>(FreePhysicalMemorySize|TotalPhysicalMemorySize|FreeSwapSpaceSize|TotalSwapSpaceSize|SystemCpuLoad|ProcessCpuLoad|OpenFileDescriptorCount|AvailableProcessors)'
    name: java_lang_OperatingSystem_$1
    type: GAUGE
  
  - pattern: 'java.lang<type=Threading><>(TotalStartedThreadCount|ThreadCount)'
    name: java_lang_threading_$1
    type: GAUGE
  
  - pattern: 'Catalina<type=GlobalRequestProcessor, name=\"(\w+-\w+)-(\d+)\"><>(\w+)'
    name: catalina_globalrequestprocessor_$3_total
    labels:
      port: "$2"
      protocol: "$1"
    help: Catalina global $3
    type: COUNTER
  
  - pattern: 'Catalina<j2eeType=Servlet, WebModule=//([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), name=([-a-zA-Z0-9+/$%~_-|!.]*), J2EEApplication=none, J2EEServer=none><>(requestCount|maxTime|processingTime|errorCount)'
    name: catalina_servlet_$3_total
    labels:
      module: "$1"
      servlet: "$2"
    help: Catalina servlet $3 total
    type: COUNTER
  
  - pattern: 'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>(currentThreadCount|currentThreadsBusy|keepAliveCount|pollerThreadCount|connectionCount)'
    name: catalina_threadpool_$3
    labels:
      port: "$2"
      protocol: "$1"
    help: Catalina threadpool $3
    type: GAUGE
  
  - pattern: 'Catalina<type=Manager, host=([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), context=([-a-zA-Z0-9+/$%~_-|!.]*)><>(processingTime|sessionCounter|rejectedSessions|expiredSessions)'
    name: catalina_session_$3_total
    labels:
      context: "$2"
      host: "$1"
    help: Catalina session $3 total
    type: COUNTER
  
  - pattern: ".*"
  ```
+ `setenv.sh` is a Tomcat startup script that starts the JMX Exporter along with Tomcat and exposes Prometheus metrics on localhost port 9404. It also provides the JMX Exporter with the path to the `config.yaml` file.

  ```
  $ cat setenv.sh 
  export JAVA_OPTS="-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent-0.12.0.jar=9404:/opt/jmx_exporter/config.yaml $JAVA_OPTS"
  ```
+ `your_web_application.war` is the web application `.war` file to be loaded by Tomcat.

Build a Docker image with this configuration and upload it to an image repository.

## Example: Java Jar Application Docker image with Prometheus metrics
<a name="ContainerInsights-Prometheus-Sample-Workloads-javajmx-jar"></a>

The following example Dockerfile shows the steps to build a testing image: 

```
# Alpine Linux with OpenJDK JRE
FROM openjdk:8-jre-alpine

RUN mkdir -p /opt/jmx_exporter

COPY ./jmx_prometheus_javaagent-0.12.0.jar /opt/jmx_exporter
COPY ./SampleJavaApplication-1.0-SNAPSHOT.jar /opt/jmx_exporter
COPY ./start_exporter_example.sh /opt/jmx_exporter
COPY ./config.yaml /opt/jmx_exporter

RUN chmod -R o+x /opt/jmx_exporter
RUN apk add curl

ENTRYPOINT exec /opt/jmx_exporter/start_exporter_example.sh
```

The following list explains the four `COPY` lines in this Dockerfile.
+ Download the latest JMX Exporter jar file from [prometheus/jmx_exporter](https://github.com/prometheus/jmx_exporter).
+ `config.yaml` is the JMX Exporter configuration file. For more information, see [Configuration](https://github.com/prometheus/jmx_exporter#Configuration).

  Here is a sample configuration file for Java and Tomcat:

  ```
  lowercaseOutputName: true
  lowercaseOutputLabelNames: true
  
  rules:
  - pattern: 'java.lang<type=OperatingSystem><>(FreePhysicalMemorySize|TotalPhysicalMemorySize|FreeSwapSpaceSize|TotalSwapSpaceSize|SystemCpuLoad|ProcessCpuLoad|OpenFileDescriptorCount|AvailableProcessors)'
    name: java_lang_OperatingSystem_$1
    type: GAUGE
  
  - pattern: 'java.lang<type=Threading><>(TotalStartedThreadCount|ThreadCount)'
    name: java_lang_threading_$1
    type: GAUGE
  
  - pattern: 'Catalina<type=GlobalRequestProcessor, name=\"(\w+-\w+)-(\d+)\"><>(\w+)'
    name: catalina_globalrequestprocessor_$3_total
    labels:
      port: "$2"
      protocol: "$1"
    help: Catalina global $3
    type: COUNTER
  
  - pattern: 'Catalina<j2eeType=Servlet, WebModule=//([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), name=([-a-zA-Z0-9+/$%~_-|!.]*), J2EEApplication=none, J2EEServer=none><>(requestCount|maxTime|processingTime|errorCount)'
    name: catalina_servlet_$3_total
    labels:
      module: "$1"
      servlet: "$2"
    help: Catalina servlet $3 total
    type: COUNTER
  
  - pattern: 'Catalina<type=ThreadPool, name="(\w+-\w+)-(\d+)"><>(currentThreadCount|currentThreadsBusy|keepAliveCount|pollerThreadCount|connectionCount)'
    name: catalina_threadpool_$3
    labels:
      port: "$2"
      protocol: "$1"
    help: Catalina threadpool $3
    type: GAUGE
  
  - pattern: 'Catalina<type=Manager, host=([-a-zA-Z0-9+&@#/%?=~_|!:.,;]*[-a-zA-Z0-9+&@#/%=~_|]), context=([-a-zA-Z0-9+/$%~_-|!.]*)><>(processingTime|sessionCounter|rejectedSessions|expiredSessions)'
    name: catalina_session_$3_total
    labels:
      context: "$2"
      host: "$1"
    help: Catalina session $3 total
    type: COUNTER
  
  - pattern: ".*"
  ```
+ `start_exporter_example.sh` is the script to start the JAR application with the Prometheus metrics exported. It also provides the JMX Exporter with the `config.yaml` file path.

  ```
  $ cat start_exporter_example.sh 
  java -javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent-0.12.0.jar=9404:/opt/jmx_exporter/config.yaml -cp  /opt/jmx_exporter/SampleJavaApplication-1.0-SNAPSHOT.jar com.gubupt.sample.app.App
  ```
+ `SampleJavaApplication-1.0-SNAPSHOT.jar` is the sample Java application jar file. Replace it with the Java application that you want to monitor.

Build a Docker image with this configuration and upload it to an image repository.

# Set up HAProxy with a metric exporter on Amazon EKS and Kubernetes
<a name="ContainerInsights-Prometheus-Sample-Workloads-haproxy"></a>

HAProxy is an open-source proxy application. For more information, see [HAProxy](https://www.haproxy.org).

If you are running HAProxy on a cluster with the Fargate launch type, you need to set up a Fargate profile before doing the steps in this procedure. To set up the profile, enter the following command. Replace *MyCluster* with the name of your cluster.

```
eksctl create fargateprofile --cluster MyCluster \
--namespace haproxy-ingress-sample --name haproxy-ingress-sample
```

**To install HAProxy with a metric exporter to test Container Insights Prometheus support**

1. Enter the following command to add the haproxy-ingress Helm repository:

   ```
   helm repo add haproxy-ingress https://haproxy-ingress.github.io/charts
   ```

1. Enter the following command to create a new namespace:

   ```
   kubectl create namespace haproxy-ingress-sample
   ```

1. Enter the following commands to install HAProxy:

   ```
   helm install haproxy haproxy-ingress/haproxy-ingress \
   --namespace haproxy-ingress-sample \
   --set defaultBackend.enabled=true \
   --set controller.stats.enabled=true \
   --set controller.metrics.enabled=true \
   --set-string controller.metrics.service.annotations."prometheus\.io/port"="9101" \
   --set-string controller.metrics.service.annotations."prometheus\.io/scrape"="true"
   ```

1. Enter the following command to confirm the annotation of the service:

   ```
   kubectl describe service haproxy-haproxy-ingress-metrics -n haproxy-ingress-sample
   ```

   You should see the following annotations.

   ```
   Annotations:   prometheus.io/port: 9101
                  prometheus.io/scrape: true
   ```

**To uninstall HAProxy**
+ Enter the following commands:

  ```
  helm uninstall haproxy --namespace haproxy-ingress-sample
  kubectl delete namespace haproxy-ingress-sample
  ```

# Tutorial for adding a new Prometheus scrape target: Redis OSS on Amazon EKS and Kubernetes clusters
<a name="ContainerInsights-Prometheus-Setup-redis-eks"></a>

This tutorial provides a hands-on introduction to scraping the Prometheus metrics of a sample Redis OSS application on Amazon EKS and Kubernetes. Redis OSS is an open source (BSD licensed), in-memory data structure store used as a database, cache, and message broker. For more information, see [Redis](https://redis.io/).

redis_exporter (MIT licensed) is used to expose the Redis OSS Prometheus metrics at the specified address and port (default: 0.0.0.0:9121). For more information, see [redis_exporter](https://github.com/oliver006/redis_exporter).

The Docker images in the following two Docker Hub repositories are used in this tutorial: 
+ [redis](https://hub.docker.com/_/redis?tab=description)
+ [redis_exporter](https://hub.docker.com/r/oliver006/redis_exporter)

**To install a sample Redis OSS workload which exposes Prometheus metrics**

1. Set the namespace for the sample Redis OSS workload.

   ```
   REDIS_NAMESPACE=redis-sample
   ```

1. If you are running Redis OSS on a cluster with the Fargate launch type, you need to set up a Fargate profile. To set up the profile, enter the following command. Replace *MyCluster* with the name of your cluster.

   ```
   eksctl create fargateprofile --cluster MyCluster \
   --namespace $REDIS_NAMESPACE --name $REDIS_NAMESPACE
   ```

1. Enter the following command to install the sample Redis OSS workload.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_traffic/redis/redis-traffic-sample.yaml \
   | sed "s/{{namespace}}/$REDIS_NAMESPACE/g" \
   | kubectl apply -f -
   ```

1. The installation includes a service named `my-redis-metrics`, which exposes the Redis OSS Prometheus metrics on port 9121. Enter the following command to get the details of the service:

   ```
   kubectl describe service/my-redis-metrics  -n $REDIS_NAMESPACE
   ```

   In the `Annotations` section of the results, you'll see two annotations that match the Prometheus scrape configuration of the CloudWatch agent, so that it can auto-discover the workload:

   ```
   prometheus.io/port: 9121
   prometheus.io/scrape: true
   ```

   The related Prometheus scrape configuration can be found in the `- job_name: kubernetes-service-endpoints` section of `prometheus-eks.yaml` or `prometheus-k8s.yaml`.
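The two annotations shown above are what the agent's service discovery keys on. The following sketch illustrates how annotation-based discovery decides whether and where to scrape; it is not the CloudWatch agent's actual code, and the `prometheus.io/path` annotation is a common convention assumed here.

```python
# Simplified illustration of annotation-based service discovery.
# Not the CloudWatch agent's actual implementation; it only shows how the
# prometheus.io/scrape and prometheus.io/port annotations are interpreted.

def scrape_target(ip, annotations):
    """Return the scrape URL for a discovered service, or None if it opted out."""
    if annotations.get("prometheus.io/scrape", "false").lower() != "true":
        return None
    port = annotations.get("prometheus.io/port", "80")
    path = annotations.get("prometheus.io/path", "/metrics")  # assumed convention
    return f"http://{ip}:{port}{path}"

# The my-redis-metrics service from this tutorial:
print(scrape_target("10.0.0.12", {"prometheus.io/scrape": "true",
                                  "prometheus.io/port": "9121"}))
# A service without the annotations is skipped:
print(scrape_target("10.0.0.9", {}))
```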

**To start collecting Redis OSS Prometheus metrics in CloudWatch**

1. Download the latest version of the `prometheus-eks.yaml`, `prometheus-eks-fargate.yaml`, or `prometheus-k8s.yaml` file by entering one of the following commands. For an Amazon EKS cluster with the EC2 launch type, enter this command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks.yaml
   ```

   For an Amazon EKS cluster with the Fargate launch type, enter this command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks-fargate.yaml
   ```

   For a Kubernetes cluster running on an Amazon EC2 instance, enter this command.

   ```
   curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-k8s.yaml
   ```

1. Open the file with a text editor, and find the `cwagentconfig.json` section. Add the following subsection and save the changes. Be sure that the indentation follows the existing pattern.

   ```
   {
     "source_labels": ["pod_name"],
     "label_matcher": "^redis-instance$",
     "dimensions": [["Namespace","ClusterName"]],
     "metric_selectors": [
       "^redis_net_(in|out)put_bytes_total$",
       "^redis_(expired|evicted)_keys_total$",
       "^redis_keyspace_(hits|misses)_total$",
       "^redis_memory_used_bytes$",
       "^redis_connected_clients$"
     ]
   },
   {
     "source_labels": ["pod_name"],
     "label_matcher": "^redis-instance$",
     "dimensions": [["Namespace","ClusterName","cmd"]],
     "metric_selectors": [
       "^redis_commands_total$"
     ]
   },
   {
     "source_labels": ["pod_name"],
     "label_matcher": "^redis-instance$",
     "dimensions": [["Namespace","ClusterName","db"]],
     "metric_selectors": [
       "^redis_db_keys$"
     ]
   },
   ```

   The section you added puts the Redis OSS metrics onto the CloudWatch agent allow list. For a list of these metrics, see the following section.

1. If you already have the CloudWatch agent with Prometheus support deployed in this cluster, you must delete it by entering the following command.

   ```
   kubectl delete deployment cwagent-prometheus -n amazon-cloudwatch
   ```

1. Deploy the CloudWatch agent with your updated configuration by entering one of the following commands. Replace *MyCluster* and *region* to match your settings.

   For an Amazon EKS cluster with the EC2 launch type, enter this command.

   ```
   kubectl apply -f prometheus-eks.yaml
   ```

   For an Amazon EKS cluster with the Fargate launch type, enter this command.

   ```
   cat prometheus-eks-fargate.yaml \
   | sed "s/{{cluster_name}}/MyCluster/;s/{{region_name}}/region/" \
   | kubectl apply -f -
   ```

   For a Kubernetes cluster, enter this command.

   ```
   cat prometheus-k8s.yaml \
   | sed "s/{{cluster_name}}/MyCluster/;s/{{region_name}}/region/" \
   | kubectl apply -f -
   ```
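The `metric_selectors` entries added to `cwagentconfig.json` earlier are anchored regular expressions matched against scraped Prometheus metric names; only matching metrics are put on the agent's allow list. You can check which names a selector admits with a short script (a sketch using Python's `re` module):

```python
import re

# The metric_selectors from the configuration section added earlier.
selectors = [
    r"^redis_net_(in|out)put_bytes_total$",
    r"^redis_(expired|evicted)_keys_total$",
    r"^redis_keyspace_(hits|misses)_total$",
    r"^redis_memory_used_bytes$",
    r"^redis_connected_clients$",
]

def allowed(metric_name):
    """Return True if any selector admits the metric name."""
    return any(re.match(pattern, metric_name) for pattern in selectors)

print(allowed("redis_net_input_bytes_total"))  # True
print(allowed("redis_uptime_in_seconds"))      # False: not on the allow list
```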

## Viewing your Redis OSS Prometheus metrics
<a name="ContainerInsights-Prometheus-Setup-redis-eks-view"></a>

This tutorial sends the following metrics to the **ContainerInsights/Prometheus** namespace in CloudWatch. You can use the CloudWatch console to see the metrics in that namespace.


| Metric name | Dimensions | 
| --- | --- | 
|  `redis_net_input_bytes_total` |  ClusterName, `Namespace`  | 
|  `redis_net_output_bytes_total` |  ClusterName, `Namespace`  | 
|  `redis_expired_keys_total` |  ClusterName, `Namespace`  | 
|  `redis_evicted_keys_total` |  ClusterName, `Namespace`  | 
|  `redis_keyspace_hits_total` |  ClusterName, `Namespace`  | 
|  `redis_keyspace_misses_total` |  ClusterName, `Namespace`  | 
|  `redis_memory_used_bytes` |  ClusterName, `Namespace`  | 
|  `redis_connected_clients` |  ClusterName, `Namespace`  | 
|  `redis_commands_total` |  ClusterName, `Namespace`, cmd  | 
|  `redis_db_keys` |  ClusterName, `Namespace`, db  | 

**Note**  
The value of the **cmd** dimension can be: `append`, `client`, `command`, `config`, `dbsize`, `flushall`, `get`, `incr`, `info`, `latency`, or `slowlog`.  
The value of the **db** dimension can be `db0` to `db15`. 

You can also create a CloudWatch dashboard for your Redis OSS Prometheus metrics.

**To create a dashboard for Redis OSS Prometheus metrics**

1. Create environment variables, replacing the values below to match your deployment.

   ```
   DASHBOARD_NAME=your_cw_dashboard_name
   REGION_NAME=your_metric_region_such_as_us-east-1
   CLUSTER_NAME=your_k8s_cluster_name_here
   NAMESPACE=your_redis_service_namespace_here
   ```

1. Enter the following command to create the dashboard.

   ```
   curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/redis/cw_dashboard_redis.json \
   | sed "s/{{YOUR_AWS_REGION}}/${REGION_NAME}/g" \
   | sed "s/{{YOUR_CLUSTER_NAME}}/${CLUSTER_NAME}/g" \
   | sed "s/{{YOUR_NAMESPACE}}/${NAMESPACE}/g" \
   | xargs -0 aws cloudwatch put-dashboard --dashboard-name ${DASHBOARD_NAME} --region ${REGION_NAME} --dashboard-body
   ```

# Prometheus metric type conversion by the CloudWatch Agent
<a name="ContainerInsights-Prometheus-metrics-conversion"></a>

The Prometheus client libraries offer four core metric types: 
+ Counter
+ Gauge
+ Summary
+ Histogram

The CloudWatch agent supports the counter, gauge, and summary metric types.

The CloudWatch agent drops Prometheus metrics that use the unsupported histogram metric type. For more information, see [Logging dropped Prometheus metrics](ContainerInsights-Prometheus-troubleshooting-EKS.md#ContainerInsights-Prometheus-troubleshooting-droppedmetrics).

**Gauge metrics**

A Prometheus gauge metric represents a single numerical value that can arbitrarily go up and down. The CloudWatch agent scrapes gauge metrics and sends their values out directly.

**Counter metrics**

A Prometheus counter metric is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero. The CloudWatch agent calculates the delta from the previous scrape and sends that delta as the metric value in the log event. As a result, the agent produces its first log event for a counter on the second scrape and continues with subsequent scrapes, if any.
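The delta calculation can be sketched as follows. This is illustrative Python, not the agent's actual code; the reset handling assumes that any decrease means the counter restarted from zero.

```python
# Sketch of the counter-to-delta conversion described above.
def counter_deltas(scraped_values):
    """Yield the per-scrape increase of a cumulative counter value."""
    previous = None
    for value in scraped_values:
        if previous is not None:
            # A drop means the counter was reset to zero, so the new value
            # itself is the increase since the reset (assumption in this sketch).
            yield value if value < previous else value - previous
        previous = value
    # Note: nothing is yielded for the first scrape, matching the behavior
    # described above.

# Four scrapes of a counter, with a reset before the last one:
print(list(counter_deltas([100, 130, 180, 20])))  # [30, 50, 20]
```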

**Summary metrics**

A Prometheus summary metric is a complex metric type which is represented by multiple data points. It provides a total count of observations and a sum of all observed values. It calculates configurable quantiles over a sliding time window.

The sum and count of a summary metric are cumulative, but the quantiles are not. The following example shows the variance of quantiles.

```
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.123e-06
go_gc_duration_seconds{quantile="0.25"} 9.204e-06
go_gc_duration_seconds{quantile="0.5"} 1.1065e-05
go_gc_duration_seconds{quantile="0.75"} 2.8731e-05
go_gc_duration_seconds{quantile="1"} 0.003841496
go_gc_duration_seconds_sum 0.37630427
go_gc_duration_seconds_count 9774
```

The CloudWatch agent handles the sum and count of a summary metric in the same way as it handles counter metrics, as described in the previous section. The CloudWatch agent preserves the quantile values as they are originally reported.
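Combining the two rules, the conversion of one summary metric between two scrapes can be sketched as follows. This is illustrative Python, not the agent's implementation; the numbers are shortened from the sample output above.

```python
# Sketch of summary conversion: _sum and _count are handled like counters
# (delta from the previous scrape), quantiles are passed through unchanged.
def convert_summary(previous, current):
    """Return the values emitted for one summary metric (both scrapes required)."""
    emitted = {}
    for series, value in current.items():
        if series.endswith(("_sum", "_count")):
            emitted[series] = value - previous[series]  # cumulative: emit the delta
        else:
            emitted[series] = value                     # quantile: emit as reported
    return emitted

prev = {"go_gc_duration_seconds_sum": 0.370, "go_gc_duration_seconds_count": 9700,
        'go_gc_duration_seconds{quantile="0.5"}': 1.0000e-05}
curr = {"go_gc_duration_seconds_sum": 0.376, "go_gc_duration_seconds_count": 9774,
        'go_gc_duration_seconds{quantile="0.5"}': 1.1065e-05}
emitted = convert_summary(prev, curr)
print(emitted["go_gc_duration_seconds_count"])  # 74
```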

# Prometheus metrics collected by the CloudWatch agent
<a name="ContainerInsights-Prometheus-metrics"></a>

The CloudWatch agent with Prometheus support automatically collects metrics from several services and workloads. The metrics that are collected by default are listed in the following sections. You can also configure the agent to collect more metrics from these services, and to collect Prometheus metrics from other applications and services. For more information about collecting additional metrics, see [CloudWatch agent configuration for Prometheus](ContainerInsights-Prometheus-Setup-configure-ECS.md#ContainerInsights-Prometheus-Setup-cw-agent-config).

Prometheus metrics collected from Amazon EKS and Kubernetes clusters are in the **ContainerInsights/Prometheus** namespace. Prometheus metrics collected from Amazon ECS clusters are in the **ECS/ContainerInsights/Prometheus** namespace. 

**Topics**
+ [Prometheus metrics for App Mesh](#ContainerInsights-Prometheus-metrics-appmesh)
+ [Prometheus metrics for NGINX](#ContainerInsights-Prometheus-metrics-nginx)
+ [Prometheus metrics for Memcached](#ContainerInsights-Prometheus-metrics-memcached)
+ [Prometheus metrics for Java/JMX](#ContainerInsights-Prometheus-metrics-jmx)
+ [Prometheus metrics for HAProxy](#ContainerInsights-Prometheus-metrics-haproxy)

## Prometheus metrics for App Mesh
<a name="ContainerInsights-Prometheus-metrics-appmesh"></a>

The following metrics are automatically collected from App Mesh.

**Prometheus metrics for App Mesh on Amazon EKS and Kubernetes clusters**


| Metric name | Dimensions | 
| --- | --- | 
|  `envoy_http_downstream_rq_total` |  ClusterName, `Namespace`  | 
|  `envoy_http_downstream_rq_xx` |  ClusterName, `Namespace`; ClusterName, `Namespace`, envoy_http_conn_manager_prefix, envoy_response_code_class  | 
|  `envoy_cluster_upstream_cx_rx_bytes_total` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_cx_tx_bytes_total` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_membership_healthy` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_membership_total` |  ClusterName, `Namespace`  | 
|  `envoy_server_memory_heap_size` |  ClusterName, `Namespace`  | 
|  `envoy_server_memory_allocated` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_cx_connect_timeout` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_pending_failure_eject` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_pending_overflow` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_timeout` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_try_per_timeout` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_rx_reset` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_cx_destroy_local_with_active_rq` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_cx_destroy_remote_active_rq` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_maintenance_mode` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_flow_control_paused_reading_total` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_flow_control_resumed_reading_total` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_flow_control_backed_up_total` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_flow_control_drained_total` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_retry` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_retry_success` |  ClusterName, `Namespace`  | 
|  `envoy_cluster_upstream_rq_retry_overflow` |  ClusterName, `Namespace`  | 
|  `envoy_server_live` |  ClusterName, `Namespace`  | 
|  `envoy_server_uptime` |  ClusterName, `Namespace`  | 

**Prometheus metrics for App Mesh on Amazon ECS clusters**


| Metric name | Dimensions | 
| --- | --- | 
|  `envoy_http_downstream_rq_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_http_downstream_rq_xx` |  ClusterName, `TaskDefinitionFamily` | 
|  `envoy_cluster_upstream_cx_rx_bytes_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_cx_tx_bytes_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_membership_healthy` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_membership_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_server_memory_heap_size` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_server_memory_allocated` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_cx_connect_timeout` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_pending_failure_eject` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_pending_overflow` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_timeout` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_try_per_timeout` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_rx_reset` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_cx_destroy_local_with_active_rq` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_cx_destroy_remote_active_rq` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_maintenance_mode` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_flow_control_paused_reading_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_flow_control_resumed_reading_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_flow_control_backed_up_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_flow_control_drained_total` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_retry` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_retry_success` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_cluster_upstream_rq_retry_overflow` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_server_live` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_server_uptime` |  ClusterName, `TaskDefinitionFamily`  | 
|  `envoy_http_downstream_rq_xx` |  ClusterName, TaskDefinitionFamily, envoy_http_conn_manager_prefix, envoy_response_code_class; ClusterName, TaskDefinitionFamily, envoy_response_code_class | 

**Note**  
`TaskDefinitionFamily` is the task definition family of the Amazon ECS task.  
The value of `envoy_http_conn_manager_prefix` can be `ingress`, `egress`, or `admin`.  
The value of `envoy_response_code_class` can be `1` (stands for `1xx`), `2` (stands for `2xx`), `3` (stands for `3xx`), `4` (stands for `4xx`), or `5` (stands for `5xx`). 

## Prometheus metrics for NGINX
<a name="ContainerInsights-Prometheus-metrics-nginx"></a>

The following metrics are automatically collected from NGINX on Amazon EKS and Kubernetes clusters.


| Metric name | Dimensions | 
| --- | --- | 
|  `nginx_ingress_controller_nginx_process_cpu_seconds_total` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_success` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_requests` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_nginx_process_connections` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_nginx_process_connections_total` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_nginx_process_resident_memory_bytes` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_config_last_reload_successful` |  ClusterName, `Namespace`, Service  | 
|  `nginx_ingress_controller_requests` |  ClusterName, `Namespace`, Service, status  | 

## Prometheus metrics for Memcached
<a name="ContainerInsights-Prometheus-metrics-memcached"></a>

The following metrics are automatically collected from Memcached on Amazon EKS and Kubernetes clusters.


| Metric name | Dimensions | 
| --- | --- | 
|  `memcached_current_items` |  ClusterName, `Namespace`, Service  | 
|  `memcached_current_connections` |  ClusterName, `Namespace`, Service  | 
|  `memcached_limit_bytes` |  ClusterName, `Namespace`, Service  | 
|  `memcached_current_bytes` |  ClusterName, `Namespace`, Service  | 
|  `memcached_written_bytes_total` |  ClusterName, `Namespace`, Service  | 
|  `memcached_read_bytes_total` |  ClusterName, `Namespace`, Service  | 
|  `memcached_items_evicted_total` |  ClusterName, `Namespace`, Service  | 
|  `memcached_items_reclaimed_total` |  ClusterName, `Namespace`, Service  | 
|  `memcached_commands_total` |  ClusterName, `Namespace`, Service; ClusterName, `Namespace`, Service, command; ClusterName, `Namespace`, Service, status, command  | 

## Prometheus metrics for Java/JMX
<a name="ContainerInsights-Prometheus-metrics-jmx"></a>

**Metrics collected on Amazon EKS and Kubernetes clusters**

On Amazon EKS and Kubernetes clusters, Container Insights can collect the following predefined Prometheus metrics from the Java Virtual Machine (JVM), Java, and Tomcat (Catalina) using the JMX Exporter. For more information, see [prometheus/jmx_exporter](https://github.com/prometheus/jmx_exporter) on GitHub.

**Java/JMX on Amazon EKS and Kubernetes clusters**


| Metric name | Dimensions | 
| --- | --- | 
|  `jvm_classes_loaded` |  `ClusterName`, `Namespace`  | 
|  `jvm_threads_current` |  `ClusterName`, `Namespace`  | 
|  `jvm_threads_daemon` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_totalswapspacesize` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_systemcpuload` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_processcpuload` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_freeswapspacesize` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_totalphysicalmemorysize` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_freephysicalmemorysize` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_openfiledescriptorcount` |  `ClusterName`, `Namespace`  | 
|  `java_lang_operatingsystem_availableprocessors` |  `ClusterName`, `Namespace`  | 
|  `jvm_memory_bytes_used` |  `ClusterName`, `Namespace`, area  | 
|  `jvm_memory_pool_bytes_used` |  `ClusterName`, `Namespace`, pool  | 

**Note**  
The values of the `area` dimension can be `heap` or `nonheap`.  
The values of the `pool` dimension can be `Tenured Gen`, `Compressed Class Space`, `Survivor Space`, `Eden Space`, `Code Cache`, or `Metaspace`.

**Tomcat/JMX on Amazon EKS and Kubernetes clusters**

In addition to the Java/JMX metrics in the previous table, the following metrics are also collected for the Tomcat workload.


| Metric name | Dimensions | 
| --- | --- | 
|  `catalina_manager_activesessions` |  `ClusterName`, `Namespace`  | 
|  `catalina_manager_rejectedsessions` |  `ClusterName`, `Namespace`  | 
|  `catalina_globalrequestprocessor_bytesreceived` |  `ClusterName`, `Namespace`  | 
|  `catalina_globalrequestprocessor_bytessent` |  `ClusterName`, `Namespace`  | 
|  `catalina_globalrequestprocessor_requestcount` |  `ClusterName`, `Namespace`  | 
|  `catalina_globalrequestprocessor_errorcount` |  `ClusterName`, `Namespace`  | 
|  `catalina_globalrequestprocessor_processingtime` |  `ClusterName`, `Namespace`  | 

**Java/JMX on Amazon ECS clusters**


| Metric name | Dimensions | 
| --- | --- | 
|  `jvm_classes_loaded` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `jvm_threads_current` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `jvm_threads_daemon` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_totalswapspacesize` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_systemcpuload` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_processcpuload` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_freeswapspacesize` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_totalphysicalmemorysize` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_freephysicalmemorysize` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_openfiledescriptorcount` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `java_lang_operatingsystem_availableprocessors` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `jvm_memory_bytes_used` |  `ClusterName`, `TaskDefinitionFamily`, area  | 
|  `jvm_memory_pool_bytes_used` |  `ClusterName`, `TaskDefinitionFamily`, pool  | 

**Note**  
The values of the `area` dimension can be `heap` or `nonheap`.  
The values of the `pool` dimension can be `Tenured Gen`, `Compressed Class Space`, `Survivor Space`, `Eden Space`, `Code Cache`, or `Metaspace`.

**Tomcat/JMX on Amazon ECS clusters**

In addition to the Java/JMX metrics in the previous table, the following metrics are also collected for the Tomcat workload on Amazon ECS clusters.


| Metric name | Dimensions | 
| --- | --- | 
|  `catalina_manager_activesessions` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `catalina_manager_rejectedsessions` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `catalina_globalrequestprocessor_bytesreceived` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `catalina_globalrequestprocessor_bytessent` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `catalina_globalrequestprocessor_requestcount` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `catalina_globalrequestprocessor_errorcount` |  `ClusterName`, `TaskDefinitionFamily`  | 
|  `catalina_globalrequestprocessor_processingtime` |  `ClusterName`, `TaskDefinitionFamily`  | 

## Prometheus metrics for HAProxy
<a name="ContainerInsights-Prometheus-metrics-haproxy"></a>

The following metrics are automatically collected from HAProxy on Amazon EKS and Kubernetes clusters.

The metrics collected depend on which version of HAProxy Ingress you are using. For more information about HAProxy Ingress and its versions, see [haproxy-ingress](https://artifacthub.io/packages/helm/haproxy-ingress/haproxy-ingress).


| Metric name | Dimensions | Availability | 
| --- | --- | --- | 
|  `haproxy_backend_bytes_in_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_backend_bytes_out_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_backend_connection_errors_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_backend_connections_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_backend_current_sessions` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_backend_http_responses_total` |  `ClusterName`, `Namespace`, Service, code, backend  | All versions of HAProxy Ingress | 
|  `haproxy_backend_status` |  `ClusterName`, `Namespace`, Service  |  Only in versions 0.10 or later of HAProxy Ingress  | 
|  `haproxy_backend_up` |  `ClusterName`, `Namespace`, Service  |  Only in versions of HAProxy Ingress earlier than 0.10  | 
|  `haproxy_frontend_bytes_in_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_bytes_out_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_connections_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_current_sessions` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_http_requests_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_http_responses_total` |  `ClusterName`, `Namespace`, Service, code, frontend  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_request_errors_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 
|  `haproxy_frontend_requests_denied_total` |  `ClusterName`, `Namespace`, Service  | All versions of HAProxy Ingress | 

**Note**  
The values of the `code` dimension can be `1xx`, `2xx`, `3xx`, `4xx`, `5xx`, or `other`.  
The values of the `backend` dimension can be:  
`http-default-backend`, `http-shared-backend`, or `httpsback-shared-backend` for HAProxy Ingress version 0.0.27 or earlier.
`_default_backend` for HAProxy Ingress versions later than 0.0.27.
The values of the `frontend` dimension can be:  
`httpfront-default-backend`, `httpfront-shared-frontend`, or `httpfronts` for HAProxy Ingress version 0.0.27 or earlier.
`_front_http` or `_front_https` for HAProxy Ingress versions later than 0.0.27.

# Viewing your Prometheus metrics
<a name="ContainerInsights-Prometheus-viewmetrics"></a>

You can monitor and alarm on all your Prometheus metrics, including the curated pre-aggregated metrics from App Mesh, NGINX, Java/JMX, Memcached, and HAProxy, and any other manually configured Prometheus exporter that you may have added. For more information about collecting metrics from other Prometheus exporters, see [Tutorial for adding a new Prometheus scrape target: Prometheus API Server metrics](ContainerInsights-Prometheus-Setup-configure.md#ContainerInsights-Prometheus-Setup-new-exporters).

In the CloudWatch console, Container Insights provides the following pre-built reports: 
+ For Amazon EKS and Kubernetes clusters, there are pre-built reports for App Mesh, NGINX, HAProxy, Memcached, and Java/JMX.
+ For Amazon ECS clusters, there are pre-built reports for App Mesh and Java/JMX.

Container Insights also provides custom dashboards for each of the workloads that it collects curated metrics from. You can download these dashboards from GitHub.

**To see all your Prometheus metrics**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Metrics**.

1. In the list of namespaces, choose **ContainerInsights/Prometheus** or **ECS/ContainerInsights/Prometheus**.

1. Choose a set of dimensions, and then select the check box next to each metric that you want to see.

**To see pre-built reports on your Prometheus metrics**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Performance Monitoring**.

1. In the drop-down box near the top of the page, choose any of the Prometheus options.

   In the other drop-down box, choose a cluster to view.

We have also provided custom dashboards for NGINX, App Mesh, Memcached, HAProxy, and Java/JMX.

**To use a custom dashboard that Amazon has provided**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Dashboards**.

1. Choose **Create Dashboard**. Enter a name for the new dashboard, and choose **Create dashboard**.

1. In **Add to this dashboard**, choose **Cancel**.

1. Choose **Actions**, **View/edit source**.

1. Download one of the following JSON files:
   + [NGINX custom dashboard source on GitHub](https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/nginx-ingress/cw_dashboard_nginx_ingress_controller.json)
   + [App Mesh custom dashboard source on GitHub](https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/appmesh/cw_dashboard_awsappmesh.json)
   + [Memcached custom dashboard source on GitHub](https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/memcached/cw_dashboard_memcached.json)
   + [HAProxy-Ingress custom dashboard source on GitHub](https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/haproxy-ingress/cw_dashboard_haproxy_ingress.json)
   + [Java/JMX custom dashboard source on GitHub](https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_cloudwatch_dashboards/javajmx/cw_dashboard_javajmx.json)

1. Open the JSON file that you downloaded in a text editor, and make the following changes:
   + Replace all the `{{YOUR_CLUSTER_NAME}}` strings with the exact name of your cluster. Make sure not to add whitespace before or after the text.
   + Replace all the `{{YOUR_REGION}}` strings with the AWS Region where your cluster is running. For example, **us-west-1**. Make sure not to add whitespace before or after the text.
   + Replace all the `{{YOUR_NAMESPACE}}` strings with the exact namespace of your workload.
   + Replace all the `{{YOUR_SERVICE_NAME}}` strings with the exact service name of your workload. For example, **haproxy-haproxy-ingress-controller-metrics**.

1. Copy the entire JSON blob and paste it into the text box in the CloudWatch console, replacing what is already in the box.

1. Choose **Update**, **Save dashboard**.
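The placeholder edits above can also be scripted. The following is a minimal sketch using `sed`; the variable values are hypothetical, and the example file name matches the HAProxy-Ingress download.

```shell
# Hypothetical values -- replace these with your own cluster, Region,
# namespace, and service name.
CLUSTER_NAME=my-cluster
AWS_REGION=us-west-2
NAMESPACE=default
SERVICE_NAME=haproxy-haproxy-ingress-controller-metrics

# Substitute every placeholder in a downloaded dashboard file and
# print the result to stdout.
fill_dashboard_placeholders() {
    sed -e "s/{{YOUR_CLUSTER_NAME}}/${CLUSTER_NAME}/g" \
        -e "s/{{YOUR_REGION}}/${AWS_REGION}/g" \
        -e "s/{{YOUR_NAMESPACE}}/${NAMESPACE}/g" \
        -e "s/{{YOUR_SERVICE_NAME}}/${SERVICE_NAME}/g" "$1"
}
```

For example, `fill_dashboard_placeholders cw_dashboard_haproxy_ingress.json > dashboard.json` produces a file whose contents you can paste into the console text box.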

# Prometheus metrics troubleshooting
<a name="ContainerInsights-Prometheus-troubleshooting"></a>

This section provides help for troubleshooting your Prometheus metrics setup. 

**Topics**
+ [Prometheus metrics troubleshooting on Amazon ECS](ContainerInsights-Prometheus-troubleshooting-ECS.md)
+ [Prometheus metrics troubleshooting on Amazon EKS and Kubernetes clusters](ContainerInsights-Prometheus-troubleshooting-EKS.md)

# Prometheus metrics troubleshooting on Amazon ECS
<a name="ContainerInsights-Prometheus-troubleshooting-ECS"></a>

This section provides help for troubleshooting your Prometheus metrics setup on Amazon ECS clusters. 

## I don't see Prometheus metrics sent to CloudWatch Logs
<a name="ContainerInsights-Prometheus-troubleshooting-ECS-nometrics"></a>

The Prometheus metrics should be ingested as log events in the log group **/aws/ecs/containerinsights/cluster-name/Prometheus**. If the log group is not created or the Prometheus metrics are not sent to the log group, first check whether the Prometheus targets have been successfully discovered by the CloudWatch agent, and then check the security group and permission settings of the CloudWatch agent. The following steps guide you through the debugging.

**Step 1: Enable the CloudWatch agent debugging mode**

First, change the CloudWatch agent to debug mode by adding the `"agent": {"debug": true}` lines shown in the following example to your CloudFormation template file, `cwagent-ecs-prometheus-metric-for-bridge-host.yaml` or `cwagent-ecs-prometheus-metric-for-awsvpc.yaml`. Then save the file.

```
cwagentconfig.json: |
    {
      "agent": {
        "debug": true
      },
      "logs": {
        "metrics_collected": {
```

Create a new CloudFormation changeset against the existing stack. Set other parameters in the changeset to the same values as in your existing CloudFormation stack. The following example is for a CloudWatch agent installed in an Amazon ECS cluster using the EC2 launch type and the bridge network mode.

```
ECS_NETWORK_MODE=bridge
CREATE_IAM_ROLES=True
ECS_TASK_ROLE_NAME=your_selected_ecs_task_role_name
ECS_EXECUTION_ROLE_NAME=your_selected_ecs_execution_role_name
NEW_CHANGESET_NAME=your_selected_changeset_name

aws cloudformation create-change-set --stack-name CWAgent-Prometheus-ECS-${ECS_CLUSTER_NAME}-EC2-${ECS_NETWORK_MODE} \
    --template-body file://cwagent-ecs-prometheus-metric-for-bridge-host.yaml \
    --parameters ParameterKey=ECSClusterName,ParameterValue=$ECS_CLUSTER_NAME \
                 ParameterKey=CreateIAMRoles,ParameterValue=$CREATE_IAM_ROLES \
                 ParameterKey=ECSNetworkMode,ParameterValue=$ECS_NETWORK_MODE \
                 ParameterKey=TaskRoleName,ParameterValue=$ECS_TASK_ROLE_NAME \
                 ParameterKey=ExecutionRoleName,ParameterValue=$ECS_EXECUTION_ROLE_NAME \
    --capabilities CAPABILITY_NAMED_IAM \
    --region $AWS_REGION \
    --change-set-name $NEW_CHANGESET_NAME
```

Go to the CloudFormation console to review the new changeset, `$NEW_CHANGESET_NAME`. There should be one change applied to the **CWAgentConfigSSMParameter** resource. Execute the changeset and restart the CloudWatch agent task by entering the following commands.

```
aws ecs update-service --cluster $ECS_CLUSTER_NAME \
--desired-count 0 \
--service your_service_name_here \
--region $AWS_REGION
```

Wait about 10 seconds and then enter the following command.

```
aws ecs update-service --cluster $ECS_CLUSTER_NAME \
--desired-count 1 \
--service your_service_name_here \
--region $AWS_REGION
```
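Rather than waiting a fixed 10 seconds between the two scale operations, you can wait for the service to stabilize with `aws ecs wait services-stable`. The following is a sketch; the service name placeholder is the same as in the commands above.

```shell
# Restart the CloudWatch agent ECS service by scaling it to 0,
# waiting for the old task to drain, and scaling it back to 1.
restart_cwagent_service() {
    local cluster=$1 service=$2
    aws ecs update-service --cluster "$cluster" --service "$service" \
        --desired-count 0 --region "$AWS_REGION"
    aws ecs wait services-stable --cluster "$cluster" \
        --services "$service" --region "$AWS_REGION"
    aws ecs update-service --cluster "$cluster" --service "$service" \
        --desired-count 1 --region "$AWS_REGION"
}
```

For example, `restart_cwagent_service "$ECS_CLUSTER_NAME" your_service_name_here`.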

**Step 2: Check the ECS service discovery logs**

The ECS task definition of the CloudWatch agent enables logging by default, as shown in the following section. The logs are sent to CloudWatch Logs in the log group **/ecs/ecs-cwagent-prometheus**.

```
LogConfiguration:
  LogDriver: awslogs
  Options:
    awslogs-create-group: 'True'
    awslogs-group: "/ecs/ecs-cwagent-prometheus"
    awslogs-region: !Ref AWS::Region
    awslogs-stream-prefix: !Sub 'ecs-${ECSLaunchType}-awsvpc'
```

Filter the logs by the string `ECS_SD_Stats` to get the metrics related to the ECS service discovery, as shown in the following example.

```
2020-09-01T01:53:14Z D! ECS_SD_Stats: AWSCLI_DescribeContainerInstances: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: AWSCLI_DescribeInstancesRequest: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: AWSCLI_DescribeTaskDefinition: 2
2020-09-01T01:53:14Z D! ECS_SD_Stats: AWSCLI_DescribeTasks: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: AWSCLI_ListTasks: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: Exporter_DiscoveredTargetCount: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: LRUCache_Get_EC2MetaData: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: LRUCache_Get_TaskDefinition: 2
2020-09-01T01:53:14Z D! ECS_SD_Stats: LRUCache_Size_ContainerInstance: 1
2020-09-01T01:53:14Z D! ECS_SD_Stats: LRUCache_Size_TaskDefinition: 2
2020-09-01T01:53:14Z D! ECS_SD_Stats: Latency: 43.399783ms
```

The meaning of each metric for a particular ECS service discovery cycle is as follows:
+ **AWSCLI\_DescribeContainerInstances** – the number of `ECS::DescribeContainerInstances` API calls made.
+ **AWSCLI\_DescribeInstancesRequest** – the number of `EC2::DescribeInstancesRequest` API calls made.
+ **AWSCLI\_DescribeTaskDefinition** – the number of `ECS::DescribeTaskDefinition` API calls made.
+ **AWSCLI\_DescribeTasks** – the number of `ECS::DescribeTasks` API calls made.
+ **AWSCLI\_ListTasks** – the number of `ECS::ListTasks` API calls made.
+ **Exporter\_DiscoveredTargetCount** – the number of Prometheus targets that were discovered and successfully exported into the target result file within the container.
+ **LRUCache\_Get\_EC2MetaData** – the number of times that container instance metadata was retrieved from the cache.
+ **LRUCache\_Get\_TaskDefinition** – the number of times that ECS task definition metadata was retrieved from the cache.
+ **LRUCache\_Size\_ContainerInstance** – the number of unique container instances whose metadata is cached in memory.
+ **LRUCache\_Size\_TaskDefinition** – the number of unique ECS task definitions cached in memory.
+ **Latency** – how long the service discovery cycle takes.
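When you are scanning a long agent log, a small filter can pull out a single counter from these debug lines. The following is a sketch that assumes the log format shown in the example above:

```shell
# Print the value of one ECS_SD_Stats counter from CloudWatch agent
# debug log lines read on stdin.
extract_sd_stat() {
    awk -F': ' -v stat="$1" '$0 ~ ("ECS_SD_Stats: " stat ":") {print $NF}'
}
```

For example, `extract_sd_stat Exporter_DiscoveredTargetCount < agent.log` prints the discovered-target count for each service discovery cycle.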

Check the value of `Exporter_DiscoveredTargetCount` to see whether the discovered Prometheus targets match your expectations. If not, the possible reasons are as follows:
+ The configuration of ECS service discovery might not match your application's settings. For Docker label-based service discovery, your target containers might not have the Docker labels that the CloudWatch agent is configured to discover. For ECS task definition ARN regular expression-based service discovery, the regular expression setting in the CloudWatch agent might not match your application's task definition.
+ The CloudWatch agent's ECS task role might not have permission to retrieve the metadata of ECS tasks. Check that the CloudWatch agent has been granted the following read-only permissions:
  + `ec2:DescribeInstances`
  + `ecs:ListTasks`
  + `ecs:DescribeContainerInstances`
  + `ecs:DescribeTasks`
  + `ecs:DescribeTaskDefinition`
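A minimal inline policy granting these read-only permissions might look like the following sketch. The file path and the broad `"Resource": "*"` are illustrative only; the sample CloudFormation template creates appropriately scoped roles for you when `CreateIAMRoles` is `True`.

```shell
# Write an illustrative read-only policy document for ECS service
# discovery to a file. The path and file name are arbitrary.
cat > /tmp/cwagent-ecs-discovery-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ecs:ListTasks",
        "ecs:DescribeContainerInstances",
        "ecs:DescribeTasks",
        "ecs:DescribeTaskDefinition"
      ],
      "Resource": "*"
    }
  ]
}
EOF
```

You could then attach it to the agent's task role with `aws iam put-role-policy --role-name your_ecs_task_role_name --policy-name cwagent-ecs-discovery --policy-document file:///tmp/cwagent-ecs-discovery-policy.json`.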

**Step 3: Check the network connection and the ECS task role policy**

If there are still no log events sent to the target CloudWatch Logs log group even though the value of `Exporter_DiscoveredTargetCount` indicates that there are discovered Prometheus targets, this could be caused by one of the following:
+ The CloudWatch agent might not be able to connect to the Prometheus target ports. Check the security group settings behind the CloudWatch agent. The security group should allow the CloudWatch agent's private IP address to connect to the Prometheus exporter ports.
+ The CloudWatch agent's ECS task role might not have the **CloudWatchAgentServerPolicy** managed policy. The CloudWatch agent's ECS task role needs this policy to be able to send the Prometheus metrics as log events. If you used the sample CloudFormation template to create the IAM roles automatically, both the ECS task role and the ECS execution role are granted the least privilege needed to perform the Prometheus monitoring.

# Prometheus metrics troubleshooting on Amazon EKS and Kubernetes clusters
<a name="ContainerInsights-Prometheus-troubleshooting-EKS"></a>

This section provides help for troubleshooting your Prometheus metrics setup on Amazon EKS and Kubernetes clusters. 

## General troubleshooting steps on Amazon EKS
<a name="ContainerInsights-Prometheus-troubleshooting-general"></a>

To confirm that the CloudWatch agent is running, enter the following command.

```
kubectl get pod -n amazon-cloudwatch
```

The output should include a row with `cwagent-prometheus-id` in the `NAME` column and `Running` in the `STATUS` column.

To display details about the running pod, enter the following command. Replace *pod-name* with the complete name of your pod, which starts with `cwagent-prometheus`.

```
kubectl describe pod pod-name -n amazon-cloudwatch
```
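Because the running pod's name carries a generated suffix, you can resolve it with the `app=cwagent-prometheus` label (the same label that the log-retrieval commands later in this section use). The following is a sketch:

```shell
# Look up the full cwagent-prometheus pod name by label,
# then describe that pod.
describe_cwagent_prometheus_pod() {
    local pod
    pod=$(kubectl get pods -n amazon-cloudwatch -l app=cwagent-prometheus \
        -o jsonpath='{.items[0].metadata.name}')
    kubectl describe pod "$pod" -n amazon-cloudwatch
}
```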

If you have CloudWatch Container Insights installed, you can use CloudWatch Logs Insights to query the logs from the CloudWatch agent collecting the Prometheus metrics.

**To query the application logs**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **CloudWatch Logs Insights**.

1. Select the log group for the application logs, **/aws/containerinsights/*cluster-name*/application**.

1. Replace the search query expression with the following query, and choose **Run query**.

   ```
   fields ispresent(kubernetes.pod_name) as haskubernetes_pod_name, stream, kubernetes.pod_name, log | 
   filter haskubernetes_pod_name and kubernetes.pod_name like /cwagent-prometheus/
   ```

You can also confirm that Prometheus metrics and metadata are being ingested as CloudWatch Logs events.

**To confirm that Prometheus data is being ingested**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **CloudWatch Logs Insights**.

1. Select the log group **/aws/containerinsights/*cluster-name*/prometheus**.

1. Replace the search query expression with the following query, and choose **Run query**.

   ```
   fields @timestamp, @message | sort @timestamp desc | limit 20
   ```

## Logging dropped Prometheus metrics
<a name="ContainerInsights-Prometheus-troubleshooting-droppedmetrics"></a>

This release does not collect Prometheus metrics of the histogram type. You can use the CloudWatch agent to check whether any Prometheus metrics are being dropped because they are histogram metrics. You can also log a list of the first 500 Prometheus metrics that are dropped and not sent to CloudWatch because they are histogram metrics.

To see whether any metrics are being dropped, enter the following command:

```
kubectl logs -l "app=cwagent-prometheus" -n amazon-cloudwatch --tail=-1
```

If any metrics are being dropped, you will see the following lines in the `/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log` file.

```
I! Drop Prometheus metrics with unsupported types. Only Gauge, Counter and Summary are supported.
I! Please enable CWAgent debug mode to view the first 500 dropped metrics
```

If you see those lines and want to know what metrics are being dropped, use the following steps.

**To log a list of dropped Prometheus metrics**

1. Change the CloudWatch agent to debug mode by adding the following lines to your `prometheus-eks.yaml` or `prometheus-k8s.yaml` file, and save the file.

   ```
   {
         "agent": {
           "debug": true
         },
   ```

   This section of the file should then look like this:

   ```
   cwagentconfig.json: |
       {
         "agent": {
           "debug": true
         },
         "logs": {
           "metrics_collected": {
   ```

1. Reinstall the CloudWatch agent to enable debug mode by entering the following commands:

   ```
   kubectl delete deployment cwagent-prometheus -n amazon-cloudwatch
   kubectl apply -f prometheus-eks.yaml  # or prometheus-k8s.yaml, whichever file you edited
   ```

   The dropped metrics are logged in the CloudWatch agent pod.

1. To retrieve the logs from the CloudWatch agent pod, enter the following command:

   ```
   kubectl logs -l "app=cwagent-prometheus" -n amazon-cloudwatch --tail=-1
   ```

   Or, if you have Container Insights Fluentd logging installed, the logs are also saved in the CloudWatch Logs log group **/aws/containerinsights/*cluster-name*/application**.

   To query these logs, you can follow the steps for querying the application logs in [General troubleshooting steps on Amazon EKS](#ContainerInsights-Prometheus-troubleshooting-general).

## Where are the Prometheus metrics ingested as CloudWatch Logs log events?
<a name="ContainerInsights-Prometheus-troubleshooting-metrics_ingested"></a>

The CloudWatch agent creates a log stream for each Prometheus scrape job configuration. For example, in the `prometheus-eks.yaml` and `prometheus-k8s.yaml` files, the line `job_name: 'kubernetes-pod-appmesh-envoy'` scrapes App Mesh metrics. The Prometheus target is defined as `kubernetes-pod-appmesh-envoy`, so all App Mesh Prometheus metrics are ingested as CloudWatch Logs events in the log stream **kubernetes-pod-appmesh-envoy** under the log group **/aws/containerinsights/*cluster-name*/prometheus**.
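To inspect one of these streams from the command line, you could fetch its most recent events with `aws logs get-log-events`. The following is a sketch; the cluster name is hypothetical:

```shell
CLUSTER_NAME=my-cluster   # hypothetical -- use your own cluster name
LOG_GROUP="/aws/containerinsights/${CLUSTER_NAME}/prometheus"

# Fetch the most recent events from one scrape job's log stream.
tail_prometheus_stream() {
    aws logs get-log-events \
        --log-group-name "$LOG_GROUP" \
        --log-stream-name "$1" \
        --limit 5
}
```

For example, `tail_prometheus_stream kubernetes-pod-appmesh-envoy` shows the latest App Mesh performance log events.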

## I don't see Amazon EKS or Kubernetes Prometheus metrics in CloudWatch metrics
<a name="ContainerInsights-Prometheus-troubleshooting-no-metrics"></a>

First, make sure that the Prometheus metrics are ingested as log events in the log group **/aws/containerinsights/*cluster-name*/prometheus**. Use the information in [Where are the Prometheus metrics ingested as CloudWatch Logs log events?](#ContainerInsights-Prometheus-troubleshooting-metrics_ingested) to help you check the target log stream. If the log stream is not created or there are no new log events in the log stream, check the following:
+ Check that the Prometheus metrics exporter endpoints are set up correctly.
+ Check that the Prometheus scrape configurations in the `config map: cwagent-prometheus` section of the CloudWatch agent YAML file are correct. The configuration should be the same as it would be in a Prometheus configuration file. For more information, see [<scrape_config>](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) in the Prometheus documentation.

If the Prometheus metrics are ingested as log events correctly, check that embedded metric format settings such as the following are added to the log events to generate the CloudWatch metrics.

```
"CloudWatchMetrics":[
   {
      "Metrics":[
         {
            "Name":"envoy_http_downstream_cx_destroy_remote_active_rq"
         }
      ],
      "Dimensions":[
         [
            "ClusterName",
            "Namespace"
         ]
      ],
      "Namespace":"ContainerInsights/Prometheus"
   }
],
```

For more information about embedded metric format, see [Specification: Embedded metric format](CloudWatch_Embedded_Metric_Format_Specification.md).

If there is no embedded metric format in the log events, check that the `metric_declaration` section is configured correctly in the `config map: prometheus-cwagentconfig` section of the CloudWatch agent installation YAML file. For more information, see [Tutorial for adding a new Prometheus scrape target: Prometheus API Server metrics](ContainerInsights-Prometheus-Setup-configure.md#ContainerInsights-Prometheus-Setup-new-exporters).

# Integration with Application Insights
<a name="container-insights-appinsights"></a>

Amazon CloudWatch Application Insights helps you monitor your applications by identifying and setting up key metrics, logs, and alarms across your application resources and technology stack. For more information, see [Detect common application problems with CloudWatch Application Insights](cloudwatch-application-insights.md).

You can enable Application Insights to gather additional data from your containerized applications and microservices. If you haven't done this already, you can enable it by choosing **Auto-configure Application Insights** below the performance view in the Container Insights dashboard.

If you have already set up CloudWatch Application Insights to monitor your containerized applications, the Application Insights dashboard appears below the Container Insights dashboard.

For more information about Application Insights and containerized applications, see [Enable Application Insights for Amazon ECS and Amazon EKS resource monitoring](appinsights-setting-up-console.md#appinsights-container-insights).

# Viewing Amazon ECS lifecycle events within Container Insights
<a name="container-insights-ECS-lifecycle-events"></a>

You can view Amazon ECS lifecycle events within the Container Insights console. This helps you correlate your container metrics, logs, and events in a single view, giving you more complete operational visibility.

The events include container instance state change events, task state change events, and service action events. They are automatically sent by Amazon ECS to Amazon EventBridge and are also collected in CloudWatch in event log format. For more information about these events, see [Amazon ECS events](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html).

Standard Container Insights pricing applies for Amazon ECS Lifecycle events. For more information, see [Amazon CloudWatch Pricing](https://aws.amazon.com/cloudwatch/pricing/).

To configure the table of lifecycle events and create rules for a cluster, you must have the `events:PutRule`, `events:PutTargets`, and `logs:CreateLogGroup` permissions. You must also make sure that there is a resource policy that enables EventBridge to create the log stream and send logs to CloudWatch Logs. If this resource policy doesn't exist, you can create it by using a policy document similar to the following:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Principal": {
        "Service": ["events.amazonaws.com", "delivery.logs.amazonaws.com"]
      },
      "Resource": "arn:aws:logs:us-east-1:111122223333:log-group:/aws/events/ecs/containerinsights/*:*",
      "Condition": {
        "StringEquals": {
        "aws:SourceAccount": "111122223333"
        },
        "ArnLike": {
        "aws:SourceArn": "arn:aws:events:us-east-1:111122223333:rule/eventsToLog*"
        }
      },
      "Sid": "TrustEventBridgeToStoreECSLifecycleLogEvents"
    }
  ]
}
```

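To attach the policy, save the document above to a file and pass it to `aws logs put-resource-policy`. The following is a sketch; the policy name and the file name are arbitrary choices:

```shell
# Attach the resource policy that lets EventBridge deliver ECS
# lifecycle events to CloudWatch Logs. Assumes the JSON above was
# saved as policy.json in the current directory.
put_ecs_lifecycle_policy() {
    aws logs put-resource-policy \
        --policy-name TrustEventBridgeToStoreECSLifecycleLogEvents \
        --policy-document file://policy.json \
        --region us-east-1
}
```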

You can use the following command to check whether you already have this policy, and to confirm that attaching it worked correctly.

```
aws logs describe-resource-policies --region region --output json
```

To view the table of lifecycle events, you must have the `events:DescribeRule`, `events:ListTargetsByRule`, and `logs:DescribeLogGroups` permissions.

**To view Amazon ECS lifecycle events in the CloudWatch Container Insights console**

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. Choose **Insights**, **Container Insights**.

1. Choose **View performance dashboards**. 

1. In the drop-down box, choose either **ECS Clusters**, **ECS Services**, or **ECS Tasks**.

1. If you chose **ECS Services** or **ECS Tasks** in the previous step, choose the **Lifecycle events** tab.

1. At the bottom of the page, if you see **Configure lifecycle events**, choose it to create EventBridge rules for your cluster.

   The events are displayed below the Container Insights panes and above the Application Insights section. To run extra analytics and create additional visualizations on these events, choose **View in Logs Insights** in the Lifecycle Events table.

# Troubleshooting Container Insights
<a name="ContainerInsights-troubleshooting"></a>

The following sections can help if you're having issues with Container Insights.

## Failed deployment on Amazon EKS or Kubernetes
<a name="ContainerInsights-setup-EKS-troubleshooting-general"></a>

If the agent doesn't deploy correctly on a Kubernetes cluster, try the following:
+ Run the following command to get the list of pods.

  ```
  kubectl get pods -n amazon-cloudwatch
  ```
+ Run the following command and check the events at the bottom of the output.

  ```
  kubectl describe pod pod-name -n amazon-cloudwatch
  ```
+ Run the following command to check the logs.

  ```
  kubectl logs pod-name -n amazon-cloudwatch
  ```

## Unauthorized panic: Cannot retrieve cadvisor data from kubelet
<a name="ContainerInsights-setup-EKS-troubleshooting-permissions"></a>

If your deployment fails with the error `Unauthorized panic: Cannot retrieve cadvisor data from kubelet`, your kubelet might not have Webhook authorization mode enabled. This mode is required for Container Insights. For more information, see [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md).

## Deploying Container Insights on a deleted and re-created cluster on Amazon ECS
<a name="ContainerInsights-troubleshooting-recreate"></a>

If you delete an existing Amazon ECS cluster that does not have Container Insights enabled, and then re-create it with the same name, you can't enable Container Insights on the new cluster at the time that you re-create it. You can enable it after re-creating the cluster by entering the following command:

```
aws ecs update-cluster-settings --cluster myCICluster --settings name=containerInsights,value=enabled
```
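You can then confirm that the setting took effect by describing the cluster with its settings included. A sketch using the same example cluster name:

```shell
# Print the settings list for the cluster; containerInsights should
# show the value "enabled".
check_container_insights() {
    aws ecs describe-clusters --clusters myCICluster \
        --include SETTINGS --query 'clusters[0].settings'
}
```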

## Invalid endpoint error
<a name="ContainerInsights-setup-invalid-endpoint"></a>

If you see an error message similar to the following, check to make sure that you replaced all the placeholders such as *cluster-name* and *region-name* in the commands that you are using with the correct information for your deployment.

```
"log": "2020-04-02T08:36:16Z E! cloudwatchlogs: code: InvalidEndpointURL, message: invalid endpoint uri, original error: &url.Error{Op:\"parse\", URL:\"https://logs.{{region_name}}.amazonaws.com/\", Err:\"{\"}, &awserr.baseError{code:\"InvalidEndpointURL\", message:\"invalid endpoint uri\", errs:[]error{(*url.Error)(0xc0008723c0)}}\n",
```

## Metrics don't appear in the console
<a name="ContainerInsights-setup-EKS-troubleshooting-nometrics"></a>

If you don't see any Container Insights metrics in the AWS Management Console, be sure that you have completed the setup of Container Insights. Metrics don't appear before Container Insights has been set up completely. For more information, see [Setting up Container Insights](deploy-container-insights.md).

## Pod metrics missing on Amazon EKS or Kubernetes after upgrading cluster
<a name="ContainerInsights-troubleshooting-podmetrics-missing"></a>

This section might be useful if all or some pod metrics are missing after you deploy the CloudWatch agent as a DaemonSet on a new or upgraded cluster, or if you see an error log with the message `W! No pod metric collected`.

These errors can be caused by changes in the container runtime, such as a switch to containerd or to the Docker systemd cgroup driver. You can usually solve this by updating your deployment manifest so that the containerd socket from the host is mounted into the container, as in the following example:

```
# For full example see https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  template:
    spec:
      containers:
        - name: cloudwatch-agent
# ...
          # Don't change the mountPath
          volumeMounts:
# ...
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock # NEW mount
              mountPath: /run/containerd/containerd.sock
              readOnly: true
# ...
      volumes:
# ...
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock # NEW volume
          hostPath:
            path: /run/containerd/containerd.sock
```

## No pod metrics when using Bottlerocket for Amazon EKS
<a name="ContainerInsights-troubleshooting-bottlerocket"></a>

Bottlerocket is a Linux-based open source operating system that is purpose-built by AWS for running containers. 

Bottlerocket uses a different `containerd` socket path on the host, so you need to point the `containerdsock` volume at that location. If you don't, you see an error in the logs that includes `W! No pod metric collected`. See the following example.

```
volumes:
  # ... 
    - name: containerdsock
      hostPath:
        # path: /run/containerd/containerd.sock
        # bottlerocket does not mount containerd sock at normal place
        # https://github.com/bottlerocket-os/bottlerocket/commit/91810c85b83ff4c3660b496e243ef8b55df0973b
        path: /run/dockershim.sock
```

## No container filesystem metrics when using the containerd runtime for Amazon EKS or Kubernetes
<a name="ContainerInsights-troubleshooting-containerd"></a>

This is a known issue and is being worked on by community contributors. For more information, see [Disk usage metric for containerd](https://github.com/google/cadvisor/issues/2785) and [container file system metrics is not supported by cadvisor for containerd](https://github.com/aws/amazon-cloudwatch-agent/issues/192) on GitHub.

## Unexpected log volume increase from CloudWatch agent when collecting Prometheus metrics
<a name="ContainerInsights-troubleshooting-log-volume-increase"></a>

This was a regression introduced in version 1.247347.6b250880 of the CloudWatch agent, and it has been fixed in more recent versions of the agent. Its impact was limited to scenarios where customers collected the logs of the CloudWatch agent itself while also using Prometheus. For more information, see [[prometheus] agent is printing all the scraped metrics in log](https://github.com/aws/amazon-cloudwatch-agent/issues/209) on GitHub.

## Latest Docker image mentioned in the release notes not found on Docker Hub
<a name="ContainerInsights-troubleshooting-docker-image"></a>

We update the release notes and the tag on GitHub before we start the actual release internally. It usually takes 1-2 weeks for the latest Docker image to appear on registries after we bump the version number on GitHub. There is no nightly release of the CloudWatch agent container image. You can build the image directly from source at the following location: [https://github.com/aws/amazon-cloudwatch-agent/tree/main/amazon-cloudwatch-container-insights/cloudwatch-agent-dockerfile](https://github.com/aws/amazon-cloudwatch-agent/tree/main/amazon-cloudwatch-container-insights/cloudwatch-agent-dockerfile)

## CrashLoopBackoff error on the CloudWatch agent
<a name="ContainerInsights-troubleshooting-crashloopbackoff"></a>

If you see a `CrashLoopBackOff` error for the CloudWatch agent, make sure that your IAM permissions are set correctly. For more information, see [Verifying prerequisites for Container Insights in CloudWatch](Container-Insights-prerequisites.md).

## CloudWatch agent or Fluentd pod stuck in pending
<a name="ContainerInsights-troubleshooting-pending"></a>

If you have a CloudWatch agent or Fluentd pod stuck in `Pending` or with a `FailedScheduling` error, determine if your nodes have enough compute resources based on the number of cores and amount of RAM required by the agents. Enter the following command to describe the pod:

```
kubectl describe pod cloudwatch-agent-85ppg -n amazon-cloudwatch
```
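To judge whether a node has spare capacity for the agent, compare what is already requested on each node against its allocatable resources. The following is a sketch:

```shell
# Show the "Allocated resources" section for every node, which lists
# CPU and memory requests against the node's allocatable capacity.
show_node_capacity() {
    kubectl describe nodes | grep -A 7 'Allocated resources'
}
```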

# Building your own CloudWatch agent Docker image
<a name="ContainerInsights-build-docker-image"></a>

You can build your own CloudWatch agent Docker image by referring to the Dockerfile located at [https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/latest/cloudwatch-agent-dockerfile/Dockerfile](https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/latest/cloudwatch-agent-dockerfile/Dockerfile).

The Dockerfile supports building multi-architecture images directly using `docker buildx`.
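For example, a multi-architecture build and push might look like the following sketch; the repository name is a placeholder:

```shell
# Build the CloudWatch agent image for two architectures with buildx
# and push the result. Replace your-registry with your repository.
build_cwagent_image() {
    docker buildx build \
        --platform linux/amd64,linux/arm64 \
        -t your-registry/cloudwatch-agent:latest \
        --push .
}
```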

# Deploying other CloudWatch agent features in your containers
<a name="ContainerInsights-other-agent-features"></a>

You can deploy additional monitoring features in your containers using the CloudWatch agent. These features include the following:
+ **Embedded Metric Format**— For more information, see [Embedding metrics within logs](CloudWatch_Embedded_Metric_Format.md).
+ **StatsD**— For more information, see [Retrieve custom metrics with StatsD](CloudWatch-Agent-custom-metrics-statsd.md).

Instructions and necessary files are located on GitHub at the following locations:
+ For Amazon ECS containers, see [Example Amazon ECS task definitions based on deployment modes](https://github.com/aws-samples/amazon-cloudwatch-container-insights/tree/latest/ecs-task-definition-templates/deployment-mode).
+ For Amazon EKS and Kubernetes containers, see [Example Kubernetes YAML files based on deployment modes](https://github.com/aws-samples/amazon-cloudwatch-container-insights/tree/latest/k8s-deployment-manifest-templates/deployment-mode).