Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart
You can use either the Amazon CloudWatch Observability EKS add-on or the Amazon CloudWatch Observability Helm chart to install the CloudWatch agent and the Fluent Bit agent on an Amazon EKS cluster. You can also use the Helm chart to install the CloudWatch agent and the Fluent Bit agent on a Kubernetes cluster that is not hosted on Amazon EKS.
Using either method on an Amazon EKS cluster enables both Container Insights with enhanced observability for Amazon EKS and CloudWatch Application Signals by default. Both features help you to collect infrastructure metrics, application performance telemetry, and container logs from the cluster.
With Container Insights with enhanced observability for Amazon EKS, Container Insights metrics are charged per observation instead of being charged per metric stored or log ingested. For Application Signals, billing is based on inbound requests to your applications, outbound requests from your applications, and each configured service level objective (SLO). Each inbound request received generates one application signal, and each outbound request made generates one application signal. Every SLO creates two application signals per measurement period. For more information about CloudWatch pricing, see Amazon CloudWatch Pricing.
Both methods enable Container Insights on both Linux and Windows worker nodes in the Amazon EKS cluster. To enable Container Insights on Windows, you must use version 1.5.0 or later of the Amazon EKS add-on or the Helm chart. Currently, Application Signals is not supported on Windows in Amazon EKS clusters.
The Amazon CloudWatch Observability EKS add-on is supported on Amazon EKS clusters running with Kubernetes version 1.23 or later.
When you install the add-on or the Helm chart, you must also grant IAM permissions to enable the CloudWatch agent to send metrics, logs, and traces to CloudWatch. There are two ways to do this:
Attach a policy to the IAM role of your worker nodes. This option grants permissions to worker nodes to send telemetry to CloudWatch.
Use an IAM role for service accounts for the agent pods, and attach the policy to this role. This works only for Amazon EKS clusters. This option gives CloudWatch access only to the appropriate agent pods.
Option 1: Install with IAM permissions on worker nodes
To use this method, first attach the CloudWatchAgentServerPolicy IAM policy to your worker nodes by entering the following command. In this command, replace my-worker-node-role with the IAM role used by your Kubernetes worker nodes.
aws iam attach-role-policy \
--role-name my-worker-node-role \
--policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
Then install the Amazon CloudWatch Observability EKS add-on. To install the add-on, you can use the AWS CLI, the console, AWS CloudFormation, or Terraform.
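The two steps of Option 1 can be scripted. The following is a minimal Python sketch, assuming the AWS CLI is installed and configured. The function name option1_commands is hypothetical, and the sketch only builds and prints the two commands described above. It does not run them.

```python
import shlex

POLICY_ARN = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
ADDON_NAME = "amazon-cloudwatch-observability"


def option1_commands(worker_node_role: str, cluster_name: str) -> list[list[str]]:
    """Return the Option 1 AWS CLI commands in the order they should run."""
    attach_policy = [
        "aws", "iam", "attach-role-policy",
        "--role-name", worker_node_role,
        "--policy-arn", POLICY_ARN,
    ]
    create_addon = [
        "aws", "eks", "create-addon",
        "--addon-name", ADDON_NAME,
        "--cluster-name", cluster_name,
    ]
    return [attach_policy, create_addon]


# Placeholder names from the text; replace them with your own values.
for command in option1_commands("my-worker-node-role", "my-cluster-name"):
    print(shlex.join(command))
```

To execute the commands instead of printing them, each argument list can be passed to subprocess.run(command, check=True).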
Option 2: Install using IAM service account role (applies only to using the add-on)
This method is valid only if you are using the Amazon CloudWatch Observability EKS add-on. Before using this method, verify the following prerequisites:
- You have a functional Amazon EKS cluster with nodes attached in one of the AWS Regions that support Container Insights. For the list of supported Regions, see Container Insights.
- You have kubectl installed and configured for the cluster. For more information, see Installing kubectl in the Amazon EKS User Guide.
- You have eksctl installed. For more information, see Installing or updating eksctl in the Amazon EKS User Guide.
To install the Amazon CloudWatch Observability EKS add-on using the IAM service account role
Enter the following command to create an OpenID Connect (OIDC) provider, if the cluster doesn't have one already. For more information, see Configuring a Kubernetes service account to assume an IAM role in the Amazon EKS User Guide.
eksctl utils associate-iam-oidc-provider --cluster my-cluster-name --approve
Enter the following command to create the IAM role with the CloudWatchAgentServerPolicy policy attached, and configure the agent service account to assume that role using OIDC. Replace my-cluster-name with the name of your cluster, and replace my-service-account-role with the name of the role that you want to associate the service account with. If the role doesn't already exist, eksctl creates it for you.
eksctl create iamserviceaccount \
--name cloudwatch-agent \
--namespace amazon-cloudwatch --cluster my-cluster-name \
--role-name my-service-account-role \
--attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
--role-only \
--approve
Install the add-on by entering the following command. Replace my-cluster-name with the name of your cluster, replace 111122223333 with your account ID, and replace my-service-account-role with the IAM role created in the previous step.
aws eks create-addon --addon-name amazon-cloudwatch-observability --cluster-name my-cluster-name --service-account-role-arn arn:aws:iam::111122223333:role/my-service-account-role
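The role ARN passed to --service-account-role-arn is assembled from the account ID and the role name. The following Python sketch shows that assembly with basic input checks. The helper name service_account_role_arn is hypothetical, and the sketch assumes the standard aws partition shown in the command above.

```python
import re


def service_account_role_arn(account_id: str, role_name: str) -> str:
    """Build the IAM role ARN used with --service-account-role-arn."""
    # AWS account IDs are 12-digit numbers, for example 111122223333.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be exactly 12 digits")
    if not role_name:
        raise ValueError("role_name must not be empty")
    # Assumption: the standard "aws" partition, as in the example command.
    return f"arn:aws:iam::{account_id}:role/{role_name}"


print(service_account_role_arn("111122223333", "my-service-account-role"))
```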
(Optional) Additional configuration
Opt out of collecting container logs
By default, the add-on uses Fluent Bit to collect container logs from all pods and then sends the logs to CloudWatch Logs. For information about which logs are collected, see Setting up Fluent Bit.
Note
Neither the add-on nor the Helm chart manages existing Fluentd or Fluent Bit resources in a cluster. You can delete the existing Fluentd or Fluent Bit resources before installing the add-on or Helm chart. Alternatively, to keep your existing setup and prevent the add-on or the Helm chart from also installing Fluent Bit, you can disable it by following the instructions in this section.
To opt out of the collection of container logs if you are using the Amazon CloudWatch Observability EKS add-on, pass the following option when you create or update the add-on:
--configuration-values '{ "containerLogs": { "enabled": false } }'
To opt out of the collection of container logs if you are using the Helm chart, pass the following option when you install or upgrade the Helm chart:
--set containerLogs.enabled=false
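If you generate these options from a script, building the JSON with a serializer avoids quoting mistakes. The following Python sketch produces both option values shown above. The function name container_logs_options is hypothetical.

```python
import json


def container_logs_options(enabled: bool) -> dict[str, str]:
    """Return the option values for the add-on and for the Helm chart."""
    addon_json = json.dumps({"containerLogs": {"enabled": enabled}})
    # Helm's --set expects lowercase true/false.
    helm_set = f"containerLogs.enabled={str(enabled).lower()}"
    return {"--configuration-values": addon_json, "--set": helm_set}


for flag, value in container_logs_options(False).items():
    print(flag, value)
```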
Use a custom Fluent Bit configuration
Starting with version 1.7.0 of the Amazon CloudWatch Observability EKS add-on, you can modify the Fluent Bit configuration when you create or update the add-on or Helm chart. You supply the custom Fluent Bit configuration in the containerLogs root level section of the advanced configuration of the add-on or the value overrides in the Helm chart. Within this section, you supply the custom Fluent Bit configuration in the config section (for Linux) or configWindows section (for Windows). The config is further broken down into the following sub-sections:
- service – This section represents the SERVICE config to define the global behavior of the Fluent Bit engine.
- customParsers – This section represents any global PARSERs that you want to include that are capable of taking unstructured log entries and giving them a structure to make it easier for processing and further filtering.
- extraFiles – This section can be used to provide additional Fluent Bit conf files to be included. By default, the following 3 conf files are included:
  - application-log.conf – A conf file for sending application logs from your cluster to the log group /aws/containerinsights/my-cluster-name/application in CloudWatch Logs.
  - dataplane-log.conf – A conf file for sending logs corresponding to your cluster’s data plane components including the CRI logs, kubelet logs, kube-proxy logs and Amazon VPC CNI logs to the log group /aws/containerinsights/my-cluster-name/dataplane in CloudWatch Logs.
  - host-log.conf – A conf file for sending logs from /var/log/dmesg, /var/log/messages, and /var/log/secure on Linux, and System winlogs on Windows, to the log group /aws/containerinsights/my-cluster-name/host in CloudWatch Logs.
Note
Provide the full configuration for each of these individual sections even if you are modifying only one field within a sub-section. We recommend that you use the default configuration provided below as a baseline and then modify it accordingly so that you don't disable functionality that is enabled by default. You can use the following YAML configuration when modifying the advanced config for the Amazon EKS add-on or when you supply value overrides for the Helm chart.
To find the config section for your cluster, see aws-observability/helm-charts/charts/amazon-cloudwatch-observability/values.yaml and locate the config section (for Linux) and configWindows section (for Windows) within the fluentBit section under containerLogs.
As an example, the default Fluent Bit configuration for version 1.7.0 can be found here.
We recommend that you provide the config as YAML when you supply it using the Amazon EKS add-on’s advanced config or when you supply it as value overrides for your Helm installation. Be sure that the YAML conforms to the following structure.
containerLogs:
  fluentBit:
    config:
      service: |
        ...
      customParsers: |
        ...
      extraFiles:
        application-log.conf: |
          ...
        dataplane-log.conf: |
          ...
        host-log.conf: |
          ...
The following example config changes the global setting for the flush interval to be 45 seconds. Even though the only modification is to the Flush field, you must still provide the full SERVICE definition for the service sub-section. Because this example didn't specify overrides for the other sub-sections, the defaults are used for them.
containerLogs:
  fluentBit:
    config:
      service: |
        [SERVICE]
          Flush                     45
          Grace                     30
          Log_Level                 error
          Daemon                    off
          Parsers_File              parsers.conf
          storage.path              /var/fluent-bit/state/flb-storage/
          storage.sync              normal
          storage.checksum          off
          storage.backlog.mem_limit 5M
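Because the whole SERVICE definition must be supplied even for a one-field change, it can help to start from a complete baseline and edit a single key programmatically. The following Python sketch does that. The helper names set_service_key and service_override are hypothetical, and the baseline text is copied from the example above as a stand-in. Take the real baseline from the values.yaml of your chart version.

```python
import re

# SERVICE block copied from the example in this section (assumption: your
# chart version may ship different defaults; check its values.yaml).
BASELINE_SERVICE = """\
[SERVICE]
  Flush                     45
  Grace                     30
  Log_Level                 error
  Daemon                    off
  Parsers_File              parsers.conf
  storage.path              /var/fluent-bit/state/flb-storage/
  storage.sync              normal
  storage.checksum          off
  storage.backlog.mem_limit 5M
"""


def set_service_key(section: str, key: str, value: str) -> str:
    """Return the full section text with exactly one key changed."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]+)\S.*$", re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + value, section)
    if count != 1:
        raise KeyError(f"expected exactly one {key!r} entry, found {count}")
    return updated


def service_override(section: str) -> dict:
    """Wrap a complete SERVICE section in the override structure."""
    return {"containerLogs": {"fluentBit": {"config": {"service": section}}}}


override = service_override(set_service_key(BASELINE_SERVICE, "Flush", "10"))
print(override["containerLogs"]["fluentBit"]["config"]["service"])
```

Every other line of the section is left untouched, so the override still carries the complete definition.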
The following example configuration includes an extra Fluent Bit conf file. In this example, we are adding a custom my-service.conf under extraFiles, and it will be included in addition to the three default extraFiles.
containerLogs:
  fluentBit:
    config:
      extraFiles:
        my-service.conf: |
          [INPUT]
            Name              tail
            Tag               myservice.*
            Path              /var/log/containers/*myservice*.log
            DB                /var/fluent-bit/state/flb_myservice.db
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Ignore_Older      1d
            Refresh_Interval  10

          [OUTPUT]
            Name                cloudwatch_logs
            Match               myservice.*
            region              ${AWS_REGION}
            log_group_name      /aws/containerinsights/${CLUSTER_NAME}/myservice
            log_stream_prefix   ${HOST_NAME}-
            auto_create_group   true
The next example removes an existing conf file entirely from extraFiles. This excludes the application-log.conf entirely by overriding it with an empty string. Simply omitting application-log.conf from extraFiles would instead cause the default to be used, which is not the intent of this example. The same applies to removing any custom conf file that you might have previously added to extraFiles.
containerLogs:
  fluentBit:
    config:
      extraFiles:
        application-log.conf: ""
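The extraFiles rules described in this section (an omitted file keeps its default, a new file is added, and an empty string removes a file) can be modeled in a few lines. The following Python sketch is only an illustration of those rules as stated here, not the chart's actual implementation. The function name and the placeholder file contents are hypothetical.

```python
def effective_extra_files(defaults: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Model which conf files end up included after overrides are applied."""
    merged = {**defaults, **overrides}  # overrides win; omitted keys keep defaults
    # A file overridden with an empty string is excluded entirely.
    return {name: body for name, body in merged.items() if body != ""}


# Placeholder contents; the real defaults live in the chart's values.yaml.
defaults = {
    "application-log.conf": "<default application config>",
    "dataplane-log.conf": "<default dataplane config>",
    "host-log.conf": "<default host config>",
}
overrides = {
    "application-log.conf": "",                    # remove a default file
    "my-service.conf": "<custom service config>",  # add a custom file
}
print(sorted(effective_extra_files(defaults, overrides)))
```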
Manage Kubernetes tolerations for the installed pod workloads
Starting with version 1.7.0 of the Amazon CloudWatch Observability EKS add-on, the add-on and the Helm chart by default set Kubernetes tolerations to tolerate all taints on the pod workloads that are installed by the add-on or the Helm chart. This ensures that daemonsets such as the CloudWatch agent and Fluent Bit can schedule pods on all nodes in your cluster by default. For more information about tolerations and taints, see Taints and Tolerations.
The default tolerations set by the add-on or the Helm chart are as follows:
tolerations:
- operator: Exists
You can override the default tolerations by setting the tolerations field at the root level when using the add-on advanced config or when you install or upgrade the Helm chart with value overrides. An example would look like the following:
tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"
To omit tolerations completely, you can use a config that looks like the following:
tolerations: []
Any changes to tolerations apply to all pod workloads that are installed by the add-on or the Helm chart.
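To see why the default toleration lets pods schedule everywhere while a narrower override does not, it helps to model how Kubernetes matches tolerations against taints. The following Python sketch follows the matching rules in the Kubernetes documentation: an empty effect matches any effect, an empty key with operator Exists matches any key. The class and function names are hypothetical, and the sketch ignores details such as tolerationSeconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    operator: str = "Equal"
    key: str = ""
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator == "Equal":
        return tol.value == taint.value
    return False


def can_schedule(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A pod can land on a node if every blocking taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


node_taints = [Taint("key1", "NoSchedule"), Taint("gpu", "NoExecute", "true")]
default = [Toleration(operator="Exists")]  # the add-on default
custom = [Toleration(key="key1", operator="Exists", effect="NoSchedule")]
print(can_schedule(default, node_taints))  # default tolerates every taint
print(can_schedule(custom, node_taints))   # the gpu taint is not tolerated
print(can_schedule([], node_taints))       # no tolerations at all
```

In this model, replacing the default with a narrower list or an empty list means the daemonset pods are no longer scheduled on nodes that carry other blocking taints.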
Opt out of accelerated compute metrics collection
By default, Container Insights with enhanced observability collects metrics for Accelerated Compute monitoring, including NVIDIA GPU metrics, AWS Neuron metrics for AWS Trainium and AWS Inferentia, and AWS Elastic Fabric Adapter (EFA) metrics.
NVIDIA GPU metrics from Amazon EKS workloads are collected by default beginning with version v1.3.0-eksbuild.1 of the EKS add-on or the Helm chart and version 1.300034.0 of the CloudWatch agent. For a list of metrics collected and prerequisites, see NVIDIA GPU metrics.
AWS Neuron metrics for AWS Trainium and AWS Inferentia accelerators are collected by default beginning with version v1.5.0-eksbuild.1 of the EKS add-on or the Helm chart, and version 1.300036.0 of the CloudWatch agent. For a list of metrics collected and prerequisites, see AWS Neuron metrics for AWS Trainium and AWS Inferentia.
AWS Elastic Fabric Adapter (EFA) metrics from Linux nodes on Amazon EKS clusters are collected by default beginning with version v1.5.2-eksbuild.1 of the EKS add-on or the Helm chart and version 1.300037.0 of the CloudWatch agent. For a list of metrics collected and prerequisites, see AWS Elastic Fabric Adapter (EFA) metrics.
You can opt out of collecting these metrics by setting the accelerated_compute_metrics field in the CloudWatch agent configuration file to false. This field is in the kubernetes section of the metrics_collected section in the CloudWatch agent configuration file.
The following is an example of an opt-out configuration. For more information about how to use custom CloudWatch agent configurations, see the following section, Use a custom CloudWatch agent configuration.
{ "logs": { "metrics_collected": { "kubernetes": { "enhanced_container_insights": true, "accelerated_compute_metrics": false } } } }
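If you also want to keep the other default features, one approach is to layer the opt-out on top of the default agent configuration rather than writing the file by hand. The following Python sketch assumes the default configuration listed in the section Use a custom CloudWatch agent configuration. The deep_merge helper is hypothetical and is not part of any AWS tooling.

```python
import copy
import json

# Default agent configuration as listed in
# "Use a custom CloudWatch agent configuration".
DEFAULT_AGENT_CONFIG = {
    "logs": {
        "metrics_collected": {
            "application_signals": {},
            "kubernetes": {"enhanced_container_insights": True},
        }
    },
    "traces": {"traces_collected": {"application_signals": {}}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied key by key."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


opt_out = {
    "logs": {"metrics_collected": {"kubernetes": {"accelerated_compute_metrics": False}}}
}
agent_config = deep_merge(DEFAULT_AGENT_CONFIG, opt_out)

# Value for the add-on's --configuration-values option.
print(json.dumps({"agent": {"config": agent_config}}, indent=2))
```

Unlike the minimal opt-out example above, the merged result still contains the application_signals entries from the default configuration.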
Use a custom CloudWatch agent configuration
To collect other metrics, logs or traces using the CloudWatch agent, you can specify a custom configuration while also keeping Container Insights and CloudWatch Application Signals enabled. To do so, embed the CloudWatch agent configuration file within the config key under the agent key of the advanced configuration that you can use when creating or updating the EKS add-on or the Helm chart. The following represents the default agent configuration when you do not provide any additional configuration.
Important
Any custom configuration that you provide using additional configuration settings overrides the default configuration used by the agent. Be cautious not to unintentionally disable functionality that is enabled by default, such as Container Insights with enhanced observability and CloudWatch Application Signals. In the scenario that you are required to provide a custom agent configuration, we recommend using the following default configuration as a baseline and then modifying it accordingly.
For using the Amazon CloudWatch Observability EKS add-on
--configuration-values '{ "agent": { "config": { "logs": { "metrics_collected": { "application_signals": {}, "kubernetes": { "enhanced_container_insights": true } } }, "traces": { "traces_collected": { "application_signals": {} } } } } }'
For using the Helm chart
--set agent.config='{ "logs": { "metrics_collected": { "application_signals": {}, "kubernetes": { "enhanced_container_insights": true } } }, "traces": { "traces_collected": { "application_signals": {} } } }'
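Because a custom configuration replaces the default, a quick check before applying it can catch features that were dropped by accident. The following Python sketch looks for the three default entries listed above. The function name disabled_defaults is hypothetical, and the check reflects only the defaults shown in this section.

```python
def disabled_defaults(agent_config: dict) -> list[str]:
    """List default features that a custom agent config no longer enables."""
    problems = []
    metrics = agent_config.get("logs", {}).get("metrics_collected", {})
    traces = agent_config.get("traces", {}).get("traces_collected", {})
    if metrics.get("kubernetes", {}).get("enhanced_container_insights") is not True:
        problems.append("Container Insights with enhanced observability")
    if "application_signals" not in metrics:
        problems.append("Application Signals metrics")
    if "application_signals" not in traces:
        problems.append("Application Signals traces")
    return problems


# Example: a custom config that keeps only the kubernetes section.
custom = {"logs": {"metrics_collected": {"kubernetes": {"enhanced_container_insights": True}}}}
for feature in disabled_defaults(custom):
    print("missing:", feature)
```

An empty list means all three defaults are still present. A non-empty list is not necessarily an error, but each item should be an intentional choice.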
The following example shows the default agent configuration for the CloudWatch agent on Windows. The CloudWatch agent on Windows does not support custom configuration.
{ "logs": { "metrics_collected": { "kubernetes": { "enhanced_container_insights": true } } } }
Manage admission webhook TLS certificates
The Amazon CloudWatch Observability EKS add-on and the Helm chart use Kubernetes admission webhooks to validate and mutate AmazonCloudWatchAgent and Instrumentation custom resource (CR) requests, and optionally Kubernetes pod requests on the cluster if CloudWatch Application Signals is enabled. In Kubernetes, webhooks require a TLS certificate that the API server is configured to trust in order to ensure secure communication.
By default, the Amazon CloudWatch Observability EKS add-on and the Helm chart auto-generate a self-signed CA and a TLS certificate signed by this CA for securing the communication between the API server and the webhook server. This auto-generated certificate has a default expiry of 10 years and is not auto-renewed upon expiry. In addition, the CA bundle and the certificate are re-generated every time the add-on or Helm chart is upgraded or re-installed, thus resetting the expiry. If you want to change the default expiry of the auto-generated certificate, you can use the following additional configurations when creating or updating the add-on. Replace expiry-in-days with your desired expiry duration in days.
Use this for the Amazon CloudWatch Observability EKS add-on
--configuration-values '{ "admissionWebhooks": { "autoGenerateCert": { "expiryDays": expiry-in-days } } }'
Use this for the Helm chart
--set admissionWebhooks.autoGenerateCert.expiryDays=expiry-in-days
For a more secure and feature-rich certificate authority solution, the add-on has opt-in support for cert-manager. We recommend that you review best practices for management of TLS certificates on your clusters and advise you to opt in to cert-manager for production environments. Note that if you opt in to cert-manager for managing the admission webhook TLS certificates, you are required to pre-install cert-manager on your Amazon EKS cluster before you install the Amazon CloudWatch Observability EKS add-on or the Helm chart. For more information about available installation options, see cert-manager documentation.
If you are using the Amazon CloudWatch Observability EKS add-on
--configuration-values '{ "admissionWebhooks": { "certManager": { "enabled": true } } }'
If you are using the Helm chart
--set admissionWebhooks.certManager.enabled=true
The advanced configuration discussed in this section will by default use a
SelfSigned
Collect Amazon EBS volume IDs
If you want to collect Amazon EBS volume IDs in the performance logs, you must add another policy to the IAM role that is attached to the worker nodes or to the service account. Add the following as an inline policy. For more information, see Adding and Removing IAM Identity Permissions.
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "ec2:DescribeVolumes" ], "Resource": "*", "Effect": "Allow" } ] }
Troubleshooting the Amazon CloudWatch Observability EKS add-on or the Helm chart
Use the following information to help troubleshoot problems with the Amazon CloudWatch Observability EKS add-on or the Helm chart.
Updating and deleting the Amazon CloudWatch Observability EKS add-on or the Helm chart
For instructions about updating or deleting the Amazon CloudWatch Observability EKS add-on, see Managing Amazon EKS add-ons. Use amazon-cloudwatch-observability as the name of the add-on.
To delete the Helm chart in a cluster, enter the following command.
helm delete amazon-cloudwatch-observability -n amazon-cloudwatch --wait
Verify the version of the CloudWatch agent used by the Amazon CloudWatch Observability EKS add-on or the Helm chart
The Amazon CloudWatch Observability EKS add-on and the Helm chart install a custom resource of kind AmazonCloudWatchAgent that controls the behavior of the CloudWatch agent daemonset on the cluster, including the version of the CloudWatch agent being used. You can get a list of all the AmazonCloudWatchAgent custom resources installed on your cluster by entering the following command:
kubectl get amazoncloudwatchagent -A
In the output of this command, you should be able to check the version of the CloudWatch agent. Alternatively, you can also describe the amazoncloudwatchagent resource or one of the cloudwatch-agent-* pods running on your cluster to inspect the image being used.
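Inspecting pod images can also be scripted. The following Python sketch extracts container images from a pod list in the standard JSON shape returned by kubectl get pods -o json. It assumes the agent pods run in the amazon-cloudwatch namespace and are named cloudwatch-agent-*, as described above. The function name agent_images and the sample pod data, including the image string, are made up for illustration.

```python
def agent_images(pod_list: dict, name_prefix: str = "cloudwatch-agent-") -> dict[str, list[str]]:
    """Map each matching pod name to the container images it runs."""
    images = {}
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "")
        if name.startswith(name_prefix):
            containers = pod.get("spec", {}).get("containers", [])
            images[name] = [c.get("image", "") for c in containers]
    return images


# Made-up sample in the shape of `kubectl get pods -n amazon-cloudwatch -o json`.
sample = {
    "items": [
        {
            "metadata": {"name": "cloudwatch-agent-abcde"},
            "spec": {"containers": [{"name": "agent", "image": "registry.example/cloudwatch-agent:1.2.3"}]},
        },
        {
            "metadata": {"name": "fluent-bit-xyz12"},
            "spec": {"containers": [{"name": "fluent-bit", "image": "registry.example/fluent-bit:4.5.6"}]},
        },
    ]
}
print(agent_images(sample))
```

With a live cluster, the input could come from parsing the standard output of kubectl get pods -n amazon-cloudwatch -o json with json.loads.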
Handling a ConfigurationConflict when managing the add-on or the Helm chart
When you install or update the Amazon CloudWatch Observability EKS add-on or the Helm chart, if you notice a failure caused by existing resources, it is likely because you already have the CloudWatch agent and its associated components such as the ServiceAccount, the ClusterRole and the ClusterRoleBinding installed on the cluster.
The error displayed by the add-on will include Conflicts found when trying to apply. Will not continue due to resolve conflicts mode.
The error displayed by the Helm chart will be similar to Error: INSTALLATION FAILED: Unable to continue with install and invalid ownership metadata.
When the add-on or the Helm chart tries to install the CloudWatch agent and its associated components, if it detects any change in the contents, it by default fails the installation or update to avoid overwriting the state of the resources on the cluster.
If you are trying to onboard to the Amazon CloudWatch Observability EKS add-on and you see this failure, we recommend deleting an existing CloudWatch agent setup that you had previously installed on the cluster and then installing the EKS add-on or Helm chart. Be sure to back up any customizations you might have made to the original CloudWatch agent setup such as a custom agent configuration, and provide these to the add-on or Helm chart when you next install or update it. If you had previously installed the CloudWatch agent for onboarding to Container Insights, see Deleting the CloudWatch agent and Fluent Bit for Container Insights for more information.
Alternatively, the add-on supports a conflict resolution configuration option that lets you specify OVERWRITE. You can use this option to proceed with installing or updating the add-on by overwriting the conflicts on the cluster. If you are using the Amazon EKS console, you'll find the Conflict resolution method when you choose the Optional configuration settings when you create or update the add-on. If you are using the AWS CLI, you can supply the --resolve-conflicts OVERWRITE option to your command to create or update the add-on.
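As a minimal sketch of the AWS CLI path, the following Python function builds a create or update command and appends --resolve-conflicts OVERWRITE only when requested. The function name addon_command and its parameters are hypothetical. The sketch prints the command and does not run it.

```python
import shlex

ADDON_NAME = "amazon-cloudwatch-observability"


def addon_command(action: str, cluster_name: str, overwrite_conflicts: bool = False) -> list[str]:
    """Build an `aws eks create-addon` or `aws eks update-addon` command."""
    if action not in ("create-addon", "update-addon"):
        raise ValueError("action must be 'create-addon' or 'update-addon'")
    command = [
        "aws", "eks", action,
        "--addon-name", ADDON_NAME,
        "--cluster-name", cluster_name,
    ]
    if overwrite_conflicts:
        # Overwrites conflicting resources on the cluster; back up any
        # customizations before using this option.
        command += ["--resolve-conflicts", "OVERWRITE"]
    return command


print(shlex.join(addon_command("update-addon", "my-cluster-name", overwrite_conflicts=True)))
```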