Answers to common questions about high availability configuration in Amazon Managed Service for Prometheus
Should I copy the value of __replica__ into another label to track the sample points?
No, this is not recommended. In a high availability setting, Amazon Managed Service for Prometheus ensures that data samples are not duplicated by electing a leader in the cluster of Prometheus instances. If the leader replica stops sending data samples for 30 seconds, Amazon Managed Service for Prometheus automatically makes another Prometheus instance the leader replica and ingests data from the new leader, including any missed data. Copying the __replica__ value into another label can cause issues such as:
- Querying a count in PromQL may return a higher than expected value while a new leader is being elected.
- The number of active series increases while a new leader is being elected and can reach the active series limits. See AMP Quotas for more info.
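The effect on active series can be illustrated with a small simulation. This is only a toy model under stated assumptions, not the service's implementation: the class `HaDeduplicator`, the label name `replica_copy`, the 15-second send interval, and the replica names are all hypothetical, and the model assumes that `__replica__` itself is not part of the stored series. It also does not model the ingestion of missed data after a failover.

```python
FAILOVER_SECONDS = 30  # from the text: a new leader is chosen after 30 s of silence


class HaDeduplicator:
    """Toy model of leader-based deduplication (illustrative only)."""

    def __init__(self):
        self.leader = {}     # cluster -> name of the current leader replica
        self.last_seen = {}  # cluster -> timestamp of the last accepted sample

    def ingest(self, ts, labels):
        """Return the stored series identity, or None if the sample is dropped."""
        cluster = labels["cluster"]
        replica = labels["__replica__"]
        leader = self.leader.get(cluster)
        # Elect the first replica seen, or fail over once the leader has been silent.
        if leader is None or (
            replica != leader and ts - self.last_seen[cluster] >= FAILOVER_SECONDS
        ):
            self.leader[cluster] = leader = replica
        if replica != leader:
            return None  # duplicate sample from a non-leader replica
        self.last_seen[cluster] = ts
        # Assumption: __replica__ is not part of the stored series identity.
        return frozenset((k, v) for k, v in labels.items() if k != "__replica__")


def count_series(copy_replica):
    """Count distinct stored series when replica r1 stops sending after t=30."""
    dedup = HaDeduplicator()
    series = set()
    for ts in range(0, 121, 15):
        for replica in ("r1", "r2"):
            if replica == "r1" and ts > 30:
                continue  # r1 has failed
            labels = {"__name__": "up", "cluster": "prod", "__replica__": replica}
            if copy_replica:
                labels["replica_copy"] = replica  # the practice advised against
            stored = dedup.ingest(ts, labels)
            if stored is not None:
                series.add(stored)
    return len(series)


print(count_series(copy_replica=False))
print(count_series(copy_replica=True))
```

In this model, the extra label makes the samples from the new leader look like a different series from the one the old leader was writing, so the same metric is counted twice across the failover.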
Kubernetes seems to have its own cluster label, and my metrics are not being deduplicated. How can I fix this?
A new metric, apiserver_storage_size_bytes, was introduced in Kubernetes 1.28 with a cluster label. This can cause issues with deduplication in Amazon Managed Service for Prometheus, which depends on the cluster label. In Kubernetes 1.30, the label is renamed to storage_cluster_id (it is also renamed in later patches of 1.28 and 1.29). If your cluster is emitting this metric with the cluster label, Amazon Managed Service for Prometheus can't deduplicate the associated time series. We recommend that you upgrade your Kubernetes cluster to the latest patched version to avoid this problem. Alternatively, you can relabel the cluster label on your apiserver_storage_size_bytes metric before ingesting it into Amazon Managed Service for Prometheus.
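The relabeling option can be sketched as follows. This is a hedged illustration of the intended effect, not a Prometheus configuration file: the rules in `RULES` are hypothetical, and the helper `relabel` is a simplified stand-in that only mimics the replace action (source labels joined with `;`, a fully anchored regex, and an empty replacement removing the target label). The field names mirror those of a Prometheus relabel rule.

```python
import re

# Hypothetical rules, using the field names of a Prometheus relabel rule.
# Python writes the first capture group as \1; Prometheus itself writes $1.
RULES = [
    {   # Copy the metric's own `cluster` value into `storage_cluster_id`.
        "source_labels": ["__name__", "cluster"],
        "regex": r"apiserver_storage_size_bytes;(.+)",
        "target_label": "storage_cluster_id",
        "replacement": r"\1",
    },
    {   # Then clear `cluster` on that metric only.
        "source_labels": ["__name__"],
        "regex": r"apiserver_storage_size_bytes",
        "target_label": "cluster",
        "replacement": "",
    },
]


def relabel(labels, rules=RULES):
    """Apply simplified replace-style relabel rules to one series' labels."""
    out = dict(labels)
    for rule in rules:
        # Source label values are joined with ';' before matching.
        joined = ";".join(out.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule["regex"], joined)  # the regex is fully anchored
        if match is None:
            continue
        value = match.expand(rule["replacement"])
        if value:
            out[rule["target_label"]] = value
        else:
            out.pop(rule["target_label"], None)  # empty value removes the label
    return out


print(relabel({"__name__": "apiserver_storage_size_bytes", "cluster": "etcd-0"}))
print(relabel({"__name__": "up", "cluster": "prod"}))
```

Only the apiserver_storage_size_bytes series is changed; every other series keeps its cluster label. In an actual Prometheus setup, equivalent rules would typically be placed under `metric_relabel_configs` in the scrape configuration, with `$1` in place of `\1`; verify the result against your own configuration before relying on it.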
Note
For more details about the change to Kubernetes, see Rename Label cluster to storage_cluster_id for apiserver_storage_size_bytes metric.