Using split cost allocation data with Amazon Managed Service for Prometheus
Splitting the cost data for Amazon EKS requires that you collect and store metrics from your clusters, including memory and CPU usage. Amazon Managed Service for Prometheus can be used for this purpose.
Once you're opted in to split cost allocation data and your Amazon Managed Service for Prometheus workspace starts receiving the two required metrics (container_cpu_usage_seconds_total and container_memory_working_set_bytes), split cost allocation data recognizes the metrics and uses them automatically.
Note
The two required metrics (container_cpu_usage_seconds_total and container_memory_working_set_bytes) are present in the default Prometheus scrape configuration and the default configuration provided with an AWS managed collector. However, if you customize these configurations, do not relabel, modify, or remove the following labels from the container_cpu_usage_seconds_total and container_memory_working_set_bytes metrics: name, namespace, and pod. If you relabel, modify, or remove these labels, it can impact the ingestion of your metrics.
You can use Amazon Managed Service for Prometheus to collect EKS metrics from a single usage account, in a single Region. The Amazon Managed Service for Prometheus workspace must be in that account and Region. You need one Amazon Managed Service for Prometheus instance for each usage account and Region for which you want to monitor the costs. You can collect metrics for multiple clusters in the Amazon Managed Service for Prometheus workspace, as long as they're in the same usage account and Region.
The following sections describe how to send the correct metrics from your EKS cluster to the Amazon Managed Service for Prometheus workspace.
Prerequisites
As prerequisites for using Amazon Managed Service for Prometheus with split cost allocation data:
-
You need to enable split cost allocation data in the AWS Billing and Cost Management console. For details, see Enabling split cost allocation data. Opting in to split cost allocation data creates a service-linked role in each usage account to query Amazon Managed Service for Prometheus for the Amazon EKS cluster metrics in that account. For more information, see Service-linked roles for split cost allocation data.
-
You need an EKS cluster for which you want to track split cost allocation data. This can be an existing cluster, or you can create a new one. For more information, see Create an Amazon EKS cluster in the Amazon EKS User Guide.
Note
You will need the EKS cluster ARN, security group IDs, and at least two subnet IDs (in different Availability Zones) for use in later steps.
(Optional) Set your EKS cluster’s authentication mode to either API or API_AND_CONFIG_MAP.
-
You need an Amazon Managed Service for Prometheus instance in the same account and Region as your EKS cluster. If you do not already have one, you can create one. For more information on creating an Amazon Managed Service for Prometheus instance, see Create a workspace in the Amazon Managed Service for Prometheus User Guide.
Note
You will need the Amazon Managed Service for Prometheus workspace ARN for use in later steps. If you prefer to create the workspace from the command line instead of the console, see the sketch after this list.
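As a minimal sketch, the following AWS CLI command creates a workspace. The alias shown is only an illustrative placeholder; run the command in the same account and Region as your EKS cluster.
aws amp create-workspace \
    --alias split-cost-allocation-workspace
The create-workspace response includes the workspace ARN and workspace ID, which you use when you configure a scraper or remote write destination.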
Forwarding EKS metrics to Amazon Managed Service for Prometheus
Once you have an EKS cluster and an Amazon Managed Service for Prometheus instance, you can forward the metrics from the cluster to the instance. You can send metrics in two ways.
-
Option 1: Use an AWS managed collector. This is the simplest way to send metrics from an EKS cluster to Amazon Managed Service for Prometheus. However, it can scrape metrics at most once every 30 seconds.
-
Option 2: Create your own Prometheus agent. In this case, you have more control over the scraping configuration, but you must manage the agent after creating it.
Option 1: Using an AWS managed collector
Using an AWS managed collector (a scraper) is the simplest way to send metrics from an EKS cluster to an Amazon Managed Service for Prometheus instance. The following procedure steps you through creating an AWS managed collector. For more detailed information, see AWS managed collectors in the Amazon Managed Service for Prometheus User Guide.
Note
AWS managed collectors have a minimum scrape interval of 30 seconds. If you have short-lived pods, the recommendation is to set your scrape interval to 15 seconds. To use a 15-second scrape interval, use option 2 to create your own Prometheus agent.
There are three steps to create an AWS managed collector:
-
Create a scraper configuration.
-
Create the scraper.
-
Configure your EKS cluster to allow the scraper to access metrics.
Step 1: Create a scraper configuration
In order to create a scraper, you must have a scraper configuration. You can use a default configuration, or create your own. The following are three ways to get a scraper configuration:
-
Get the default configuration using the AWS CLI, by calling the following command (see the example after this list for saving and decoding the output):
aws amp get-default-scraper-configuration
-
Create your own configuration. For details, see the Scraper configuration instructions in the Amazon Managed Service for Prometheus User Guide.
-
Copy the sample configuration provided in that same Scraper configuration instructions in the Amazon Managed Service for Prometheus User Guide.
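As an illustration of the first option, the following sketch saves the default configuration to a local file named scraper-config.yaml (the file name is a placeholder). It assumes that the CLI returns the configuration as a base64-encoded blob in a field named configuration and that the base64 utility is available; adjust as needed for your environment.
aws amp get-default-scraper-configuration \
    --query configuration \
    --output text | base64 --decode > scraper-config.yaml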
You can edit the scraper configuration, for example to modify the scrape interval or to filter the metrics that are scraped.
To filter the metrics that are scraped to just include the two that are needed for split cost allocation data, use the following scraper configuration:
scrape_configs:
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-nodes-cadvisor
    scrape_interval: 30s
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        source_labels:
          - __meta_kubernetes_node_name
        target_label: __metrics_path__
      - replacement: kubernetes.default.svc:443
        target_label: __address__
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'container_cpu_usage_seconds_total|container_memory_working_set_bytes'
        action: keep
Once you have the scraper configuration, you must base64 encode it for use in step 2. The configuration is a text YAML file. To encode the file, you can use a website such as https://www.base64encode.org/ or a command-line tool, as shown below.
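For example, on Linux you can encode a saved configuration file (here assumed to be named scraper-config.yaml) with the base64 utility; on macOS, omit the -w0 option.
base64 -w0 scraper-config.yaml > scraper-config.b64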
Step 2: Create the scraper
Now that you have a configuration file, you need to create your scraper.
Create a scraper using the following AWS CLI command, based on the variables outlined in the prerequisites section. You must use information from your EKS cluster for the <EKS-CLUSTER-ARN>, <SG-SECURITY-GROUP-ID>, and <SUBNET-ID> fields, replace <BASE64-CONFIGURATION-BLOB> with the scraper configuration you created in the previous step, and replace <AMP_WORKSPACE_ARN> with your Amazon Managed Service for Prometheus workspace ARN.
aws amp create-scraper \
    --source eksConfiguration="{clusterArn=<EKS-CLUSTER-ARN>,securityGroupIds=[<SG-SECURITY-GROUP-ID>],subnetIds=[<SUBNET-ID>]}" \
    --scrape-configuration configurationBlob=<BASE64-CONFIGURATION-BLOB> \
    --destination ampConfiguration={workspaceArn="<AMP_WORKSPACE_ARN>"}
Note down the scraperId that is returned for use in step 3.
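Scraper creation can take some time to complete. If you want to check on it from the CLI, a sketch such as the following reports the scraper status; the query path and status values are assumptions based on the current API shape, so adjust as needed. Replace <SCRAPER-ID> with the ID returned by create-scraper.
aws amp describe-scraper \
    --scraper-id <SCRAPER-ID> \
    --query 'scraper.status.statusCode' \
    --output text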
Step 3: Configure your EKS cluster to allow the scraper to access metrics
If your EKS cluster’s authentication mode is set to either API or API_AND_CONFIG_MAP, then your scraper will automatically have the correct in-cluster access policy and will have access to your cluster. No further configuration is required, and metrics should be flowing to Amazon Managed Service for Prometheus.
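If you want to confirm that the metrics are arriving, one option is to run an ad hoc query against the workspace with the open source awscurl tool. The following is only a sketch: <REGION> and <WORKSPACE-ID> are placeholders, and it assumes awscurl is installed and that your credentials allow querying the workspace (aps:QueryMetrics).
awscurl --service aps --region <REGION> \
    "https://aps-workspaces.<REGION>.amazonaws.com/workspaces/<WORKSPACE-ID>/api/v1/query?query=container_memory_working_set_bytes"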
If your EKS cluster’s authentication mode is not set to API or API_AND_CONFIG_MAP, you will need to manually configure the cluster to allow the scraper to access your metrics through a ClusterRole and ClusterRoleBinding. To learn how to enable these permissions, see Manually configuring an EKS cluster for scraper access in the Amazon Managed Service for Prometheus User Guide.
Option 2: Creating your own Prometheus agent
If you can’t use the AWS managed collector, or already have your own Prometheus server, you can use your own Prometheus instance as an agent to scrape metrics from your EKS cluster and send them to Amazon Managed Service for Prometheus.
For detailed instructions on how to use your own Prometheus instance as an agent, see Using a Prometheus instance as a collector in the Amazon Managed Service for Prometheus User Guide.
The following is a sample Prometheus scrape configuration that includes the Prometheus server scrape interval and the container metrics required for split cost allocation data. If you have short-lived pods, the recommendation is to lower the default Prometheus server scrape interval from 30 seconds to 15 seconds. Note that this can result in high Prometheus server memory usage.
scrape_configs:
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-nodes-cadvisor
    scrape_interval: 30s
    scrape_timeout: 10s
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        source_labels:
          - __meta_kubernetes_node_name
        target_label: __metrics_path__
      - replacement: kubernetes.default.svc:443
        target_label: __address__
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'container_cpu_usage_seconds_total|container_memory_working_set_bytes'
        action: keep
If you followed Set up ingestion from a new Prometheus server using Helm in the Amazon Managed Service for Prometheus User Guide, then you can update your scrape configuration.
To update your scrape configuration
-
Edit my_prometheus_values_yaml from the guide and include the sample scrape config in the server block.
-
Run the following command, using prometheus-chart-name and prometheus-namespace from the Amazon Managed Service for Prometheus User Guide.
helm upgrade prometheus-chart-name prometheus-community/prometheus -n prometheus-namespace -f my_prometheus_values_yaml
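After the upgrade, you can optionally confirm that the release applied and that the Prometheus server pods are running with the new configuration. The following sketch uses the same chart name and namespace placeholders as above.
helm status prometheus-chart-name -n prometheus-namespace
kubectl get pods -n prometheus-namespace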
To learn more about scrape_interval or how to use a non-global scrape_interval, refer to the Prometheus scrape configuration documentation.
Alternatively, you can use the AWS Distro for OpenTelemetry collector, which includes a Prometheus Receiver, a Prometheus Remote Write Exporter, and the AWS Sigv4 Authentication Extension, to achieve remote write access to Amazon Managed Service for Prometheus.
Note
Unlike with AWS managed collectors, once you have set up your own Prometheus agent, you are responsible for keeping the agent up to date and running so that it continues to collect metrics.
Estimating your Amazon Managed Service for Prometheus costs
You can use AWS Pricing Calculator to estimate the cost of using Amazon Managed Service for Prometheus for split cost allocation data.
To configure Amazon Managed Service for Prometheus for your estimate
-
Open AWS Pricing Calculator at https://calculator.aws/#/.
-
Choose Create estimate.
-
On the Add service page, enter Amazon Managed Service for Prometheus in the search field, and then choose Configure.
-
In the Description field, enter a description for your estimate.
-
Choose a Region.
-
Select Calculate the cost using your infrastructure details. This option allows you to estimate your ingestion, storage, and query sample costs based on your current or proposed infrastructure setup.
-
For Number of EC2 instances, enter the total number of EC2 instances across all your clusters for your entire consolidated billing family (including all accounts and Regions). If you use AWS Fargate, use the number of Fargate tasks as a proxy for your EC2 instance count.
-
Split cost allocation data requires two metrics: container_cpu_usage_seconds_total and container_memory_working_set_bytes. For Prometheus metrics per EC2 instances, enter 2.
-
Split cost allocation data suggests a scrape interval of 15 seconds. For Metric collection interval (in seconds), enter 15. If you used a different interval (for example, 30 seconds), change this to the interval you set up. (A worked example of how these inputs translate into ingested samples follows this procedure.)
-
Split cost allocation data does not impose any specific requirements for the other parameters, so enter values for the remaining input parameters according to your business requirements.
-
Choose Save and add service.
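As a rough illustration of how these inputs translate into ingestion volume, assume 100 EC2 instances (an example value only), 2 metrics per instance, and a 15-second collection interval. Each metric series is then scraped 4 times per minute, or 172,800 times in a 30-day month, so the estimate covers approximately:
2 metrics × 172,800 samples × 100 instances ≈ 34.6 million ingested samples per month
The calculator applies the current Amazon Managed Service for Prometheus pricing to inputs like these; replace the instance count with your own numbers.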