Use CloudWatch metrics to monitor Amazon Managed Service for Prometheus resources
Amazon Managed Service for Prometheus vends usage metrics to CloudWatch. These metrics provide visibility into your workspace utilization. The vended metrics can be found in the AWS/Usage and AWS/Prometheus namespaces in CloudWatch. These metrics are available in CloudWatch at no charge. For more information about usage metrics, see CloudWatch usage metrics.
CloudWatch metric name | Resource name | CloudWatch namespace | Description |
---|---|---|---|
ResourceCount | IngestionRate | AWS/Usage | Sample ingestion rate. Units: count per second. Valid statistics: Average, Minimum, Maximum, Sum |
ResourceCount | ActiveSeries | AWS/Usage | Number of active series per workspace. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
ResourceCount | ActiveAlerts | AWS/Usage | Number of active alerts per workspace. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
ResourceCount | SizeOfAlerts | AWS/Usage | Total size of all alerts in the workspace. Units: bytes. Valid statistics: Average, Minimum, Maximum, Sum |
ResourceCount | SuppressedAlerts | AWS/Usage | Number of alerts in the suppressed state per workspace. An alert can be suppressed by a silence or an inhibition. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
ResourceCount | UnprocessedAlerts | AWS/Usage | Number of alerts in the unprocessed state per workspace. An alert is unprocessed once the alert manager has received it but it is still waiting for the next aggregation group evaluation. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
ResourceCount | AllAlerts | AWS/Usage | Number of alerts in any state per workspace. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
AlertManagerAlertsReceived | - | AWS/Prometheus | Total number of alerts successfully received by the alert manager. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
AlertManagerNotificationsFailed | - | AWS/Prometheus | Number of failed alert deliveries. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
AlertManagerNotificationsThrottled | - | AWS/Prometheus | Number of throttled alerts. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
DiscardedSamples* | - | AWS/Prometheus | Number of discarded samples, by reason. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
QuerySamplesProcessed | - | AWS/Prometheus | Rate of query samples processed. Units: count per second. Valid statistics: Average, Minimum, Maximum, Sum |
RuleEvaluations | - | AWS/Prometheus | Total number of rule evaluations. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
RuleEvaluationFailures | - | AWS/Prometheus | Number of rule evaluation failures in the interval. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
RuleGroupIterationsMissed | - | AWS/Prometheus | Number of rule group iterations missed in the interval. Units: count. Valid statistics: Average, Minimum, Maximum, Sum |
*Some of the reasons that cause samples to be discarded are as follows.
Reason | Meaning |
---|---|
greater_than_max_sample_age | The sample is older than one hour. |
new-value-for-timestamp | A sample was sent with the same timestamp as a previously recorded sample but with a different value. |
per_metric_series_limit | The user has hit the limit on active series per metric. |
per_user_series_limit | The user has hit the limit on the total number of active series. |
rate_limited | Ingestion was rate limited. |
sample-out-of-order | Samples were sent out of order and cannot be processed. |
label_value_too_long | The label value is longer than the allowed character limit. |
max_label_names_per_series | The user has hit the limit on label names per series. |
missing_metric_name | No metric name was provided. |
metric_name_invalid | An invalid metric name was provided. |
label_invalid | An invalid label was provided. |
duplicate_label_names | Duplicate label names were provided. |
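When reviewing DiscardedSamples by reason, it can help to separate discards caused by limits from discards caused by the data itself. The following is a minimal sketch of that idea. The reason strings come from the table above, but the two-way grouping and the function name `summarize_discards` are illustrative assumptions, not an AWS classification. Because the table lists only some of the possible reasons, anything unrecognized is counted separately.

```python
# Reasons from the table above that relate to limits or throttling.
# This grouping is an assumption made for illustration only.
LIMIT_REASONS = {
    "per_metric_series_limit",
    "per_user_series_limit",
    "rate_limited",
    "label_value_too_long",
    "max_label_names_per_series",
}

# Reasons from the table above that relate to the content of the samples.
DATA_REASONS = {
    "greater_than_max_sample_age",
    "new-value-for-timestamp",
    "sample-out-of-order",
    "missing_metric_name",
    "metric_name_invalid",
    "label_invalid",
    "duplicate_label_names",
}


def summarize_discards(counts):
    """Total discarded-sample counts per category.

    `counts` maps a discard reason string to a number of samples.
    """
    summary = {"limit": 0, "data": 0, "unknown": 0}
    for reason, count in counts.items():
        if reason in LIMIT_REASONS:
            summary["limit"] += count
        elif reason in DATA_REASONS:
            summary["data"] += count
        else:
            # The table is not exhaustive, so keep unlisted reasons visible.
            summary["unknown"] += count
    return summary
```

A high "limit" total suggests reviewing workspace limits or ingestion volume, while a high "data" total usually points at the client sending the samples.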
Note
A metric that does not exist or is missing is equivalent to that metric having a value of 0.
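If you process exported datapoints yourself, the note above means gaps can be read as zeros. The sketch below shows one way to do that, assuming the datapoints are held in a dictionary keyed by period start time. The function name and the data layout are hypothetical and chosen only for illustration. Within CloudWatch itself, the metric math expression FILL(m1, 0) has a similar effect.

```python
from datetime import datetime, timedelta, timezone


def fill_missing_with_zero(datapoints, start, end, period_seconds):
    """Return one (timestamp, value) pair per period in [start, end).

    `datapoints` maps a period start time to a value. Periods with no
    datapoint are reported as 0.0, matching the note above.
    """
    step = timedelta(seconds=period_seconds)
    filled = []
    ts = start
    while ts < end:
        filled.append((ts, datapoints.get(ts, 0.0)))
        ts += step
    return filled


# Example: three one-minute periods, with the middle one missing.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
points = {start: 3.0, start + timedelta(minutes=2): 1.0}
series = fill_missing_with_zero(points, start, start + timedelta(minutes=3), 60)
values = [value for _, value in series]
```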
Note
RuleGroupIterationsMissed, RuleEvaluations, and RuleEvaluationFailures have the RuleGroup dimension, which has the following structure:
RuleGroupNamespace;RuleGroup
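When reading these metrics programmatically, the RuleGroup dimension value can be split back into its two parts. The following is a minimal sketch. The function name is hypothetical, and splitting at the first semicolon assumes that the namespace name itself contains no semicolon.

```python
def split_rule_group_dimension(value):
    """Split a 'RuleGroupNamespace;RuleGroup' value into (namespace, group)."""
    # partition() splits at the first ';' only (assumption: the namespace
    # part does not contain a semicolon).
    namespace, separator, group = value.partition(";")
    if not separator or not namespace or not group:
        raise ValueError(f"unexpected RuleGroup dimension value: {value!r}")
    return namespace, group


namespace, group = split_rule_group_dimension("example-namespace;example-group")
```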
Setting a CloudWatch alarm on Prometheus vended metrics
You can monitor usage of Prometheus resources using CloudWatch alarms.
To set an alarm on the number of ActiveSeries in Prometheus
- Choose the Graphed metrics tab and scroll down to the ActiveSeries label.
  In the Graphed metrics view, only the metrics currently being ingested appear.
- Choose the notification icon in the Actions column.
- In Specify metric and conditions, enter the threshold condition in the Conditions value field and choose Next.
- In Configure actions, select an existing SNS topic or create a new SNS topic to send the notification to.
- In Add name and description, add the name of the alarm and an optional description.
- Choose Create alarm.
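The same alarm can also be defined programmatically. The sketch below only builds the parameters for the CloudWatch PutMetricAlarm operation and does not call AWS. It assumes that ActiveSeries is reported as the ResourceCount metric in the AWS/Usage namespace. The alarm name, period, statistic, and the dimension values passed in are illustrative assumptions, so verify the exact dimensions for your workspace in the CloudWatch console before using them.

```python
def build_active_series_alarm(threshold, sns_topic_arn, dimensions):
    """Build keyword arguments for a PutMetricAlarm call on ActiveSeries."""
    return {
        "AlarmName": "amp-active-series-high",  # hypothetical name
        "AlarmDescription": "Active series exceeded the configured threshold",
        "Namespace": "AWS/Usage",               # assumed namespace for ActiveSeries
        "MetricName": "ResourceCount",
        "Dimensions": dimensions,               # caller supplies verified dimensions
        "Statistic": "Maximum",
        "Period": 300,                          # seconds, illustrative choice
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # A missing metric is treated like a value of 0, so it should not alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder values: the dimension and the topic ARN are assumptions.
params = build_active_series_alarm(
    threshold=1_000_000,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",
    dimensions=[{"Name": "Resource", "Value": "ActiveSeries"}],
)

# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter construction separate from the API call makes the alarm definition easy to review and test without AWS credentials.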