Alerts in Grafana version 10 - Amazon Managed Grafana

This documentation topic is designed for Grafana workspaces that support Grafana version 10.x.

For Grafana workspaces that support Grafana version 9.x, see Working in Grafana version 9.

For Grafana workspaces that support Grafana version 8.x, see Working in Grafana version 8.

With Grafana v10, Amazon Managed Grafana includes access to an updated alerting system, Grafana alerting, that centralizes alerting information in a single, searchable view. Grafana alerting was introduced as an optional feature in Grafana v8, and Grafana Labs has announced the removal of legacy alerting in a future version.

Note

This documentation covers Grafana alerting. For information on legacy alerting, see Classic dashboard alerts.

Grafana Alerting allows you to learn about problems in your systems moments after they occur.

Monitor your incoming metrics data or log entries, and set up your Alerting system to watch for specific events or circumstances and send notifications when they occur.

In this way, you eliminate the need for manual monitoring and provide a first line of defense against system outages or changes that could turn into major incidents.

Using Grafana Alerting, you create queries and expressions from multiple data sources, no matter where your data is stored, giving you the flexibility to combine your data and alert on your metrics and logs in new and unique ways. You can then create, manage, and take action on your alerts from a single, consolidated view, and improve your team’s ability to identify and resolve issues quickly.

With Mimir and Loki alert rules you can run alert expressions closer to your data and at massive scale, all managed by the Grafana UI you are already familiar with.

Note

If you are migrating from an earlier version of Grafana in which you used legacy alerting, you might find it helpful to review the differences between legacy alerting and the new Grafana alerting.

Key features and benefits

One page for all alerts

A single Grafana Alerting page consolidates both Grafana-managed alerts and alerts that reside in your Prometheus-compatible data source.

Multi-dimensional alerts

Alert rules can create multiple individual alert instances per alert rule, known as multi-dimensional alerts, giving you the power and flexibility to gain visibility into your entire system with just a single alert rule. You do this by adding labels to your query to specify which component is being monitored and generate multiple alert instances for a single alert rule. For example, if you want to monitor each server in a cluster, a multi-dimensional alert will alert on each CPU, whereas a standard alert will alert on the overall server.
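As a rough illustration of the idea, the following Python sketch simulates how a single rule could yield one alert instance per label set. It is not Grafana code: the sample series, the `evaluate_rule` function, and the 80 percent threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical query result: one series per unique label set.
# Each entry is (labels, latest CPU usage in percent).
Series = Tuple[Dict[str, str], float]

SAMPLE_SERIES: List[Series] = [
    ({"instance": "server-1", "cpu": "0"}, 91.0),
    ({"instance": "server-1", "cpu": "1"}, 42.0),
    ({"instance": "server-2", "cpu": "0"}, 87.5),
]


def evaluate_rule(series: List[Series], threshold: float) -> List[dict]:
    """Return one alert instance per series, identified by its label set."""
    instances = []
    for labels, value in series:
        # Each label set is evaluated against the same condition.
        state = "Alerting" if value > threshold else "Normal"
        instances.append({"labels": labels, "value": value, "state": state})
    return instances


if __name__ == "__main__":
    for instance in evaluate_rule(SAMPLE_SERIES, threshold=80.0):
        print(instance["labels"], instance["state"])
```

In this sketch, one rule definition produces three instances, one for each combination of the `instance` and `cpu` labels, and each instance has its own state.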

Route alerts

Route each alert instance to a specific contact point based on labels you define. Notification policies are the set of rules for where, when, and how the alerts are routed to contact points.
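The following Python sketch shows the general idea of label-based routing in a simplified form. It assumes a flat list of policies with exact-match label matchers, where the first match wins. Real notification policies offer more options than this, and the policy list, label names, and contact point names here are hypothetical.

```python
from typing import Dict, List

# Hypothetical policies: each pairs exact-match label matchers
# with the name of a contact point. More specific policies come first.
POLICIES: List[dict] = [
    {
        "matchers": {"team": "database", "severity": "critical"},
        "contact_point": "db-oncall-pager",
    },
    {
        "matchers": {"team": "database"},
        "contact_point": "db-team-email",
    },
]

# Used when no policy matches the alert instance's labels.
DEFAULT_CONTACT_POINT = "ops-email"


def route(labels: Dict[str, str]) -> str:
    """Return the contact point of the first policy whose matchers all match."""
    for policy in POLICIES:
        if all(labels.get(key) == value for key, value in policy["matchers"].items()):
            return policy["contact_point"]
    return DEFAULT_CONTACT_POINT


if __name__ == "__main__":
    print(route({"team": "database", "severity": "critical"}))
    print(route({"team": "web", "severity": "warning"}))
```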

Silence alerts

Silences stop notifications from being created and last only for a specified window of time. Silences allow you to stop receiving persistent notifications from one or more alert rules. You can also partially pause an alert based on certain criteria. Silences have their own dedicated section for better organization and visibility, so that you can scan your paused alert rules without cluttering the main alerting view.
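To make the behavior concrete, the following Python sketch models a silence as a set of label matchers plus a one-time window. It assumes exact-match matchers only, and the `SILENCE` record, label names, and dates are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict

# Hypothetical silence: applies only to alert instances whose labels
# match, and only between starts_at (inclusive) and ends_at (exclusive).
SILENCE = {
    "matchers": {"alertname": "HighCPU", "instance": "server-1"},
    "starts_at": datetime(2024, 1, 10, 22, 0, tzinfo=timezone.utc),
    "ends_at": datetime(2024, 1, 11, 2, 0, tzinfo=timezone.utc),
}


def is_silenced(labels: Dict[str, str], now: datetime, silence: dict = SILENCE) -> bool:
    """Return True if a notification for these labels is suppressed at `now`."""
    in_window = silence["starts_at"] <= now < silence["ends_at"]
    matches = all(labels.get(k) == v for k, v in silence["matchers"].items())
    return in_window and matches


if __name__ == "__main__":
    during = datetime(2024, 1, 10, 23, 0, tzinfo=timezone.utc)
    print(is_silenced({"alertname": "HighCPU", "instance": "server-1"}, during))
    print(is_silenced({"alertname": "HighCPU", "instance": "server-2"}, during))
```

Because the matchers select on labels, the same silence can suppress notifications for one instance of a rule while other instances continue to notify.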

Mute timings

A mute timing is a recurring interval of time when no new notifications for a policy are generated or sent. Use mute timings to suppress notifications during a specific, recurring period, for example, a regular maintenance period.

Similar to silences, mute timings do not prevent alert rules from being evaluated, nor do they stop alert instances from being shown in the user interface. They only prevent notifications from being created.
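The difference between evaluating a rule and sending a notification can be sketched in Python as follows. This sketch assumes a single weekly window, and the Saturday 02:00 to 04:00 schedule and the function names are hypothetical.

```python
from datetime import datetime, time

# Hypothetical mute timing: every Saturday from 02:00 to 04:00.
MUTE_WEEKDAY = 5  # datetime.weekday(): Monday is 0, Saturday is 5
MUTE_START = time(2, 0)
MUTE_END = time(4, 0)


def is_muted(now: datetime) -> bool:
    """Return True if `now` falls inside the recurring mute window."""
    return now.weekday() == MUTE_WEEKDAY and MUTE_START <= now.time() < MUTE_END


def should_notify(alert_state: str, now: datetime) -> bool:
    """The state comes from normal rule evaluation; only the notification is gated."""
    return alert_state == "Alerting" and not is_muted(now)


if __name__ == "__main__":
    saturday_night = datetime(2024, 1, 13, 3, 0)  # a Saturday
    monday_night = datetime(2024, 1, 15, 3, 0)    # a Monday
    print(should_notify("Alerting", saturday_night))
    print(should_notify("Alerting", monday_night))
```

Note that `should_notify` receives a state that has already been computed, which mirrors the point above: the rule is evaluated and the instance is still visible during the window, and only the notification is held back.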

Design your Alerting system

Monitoring complex IT systems and understanding whether everything is up and running correctly is a difficult task. Setting up an effective alert management system is therefore essential to inform you when things are going wrong before they start to impact your business outcomes.

Designing and configuring an alert management setup that works takes time.

Here are some tips on how to create an effective alert management setup for your business:

Which are the key metrics for your business that you want to monitor and alert on?

  • Find events that are important to know about and not so trivial or frequent that recipients ignore them.

  • Alerts should only be created for big events that require immediate attention or intervention.

  • Consider quality over quantity.

Which type of Alerting do you want to use?

  • Choose Grafana-managed Alerting, Grafana Mimir or Loki-managed Alerting, or both.

How do you want to organize your alerts and notifications?

  • Be selective about who you set to receive alerts. Consider sending them to whoever is on call or to a specific Slack channel.

  • Automate as far as possible using the Alerting API or alerts as code (Terraform).
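As one possible starting point for automation, the following Python sketch builds a request that lists alert rules over HTTP. It assumes that your workspace exposes the Grafana alerting provisioning endpoint `/api/v1/provisioning/alert-rules` and that you have a token with sufficient permissions. The workspace URL, the token value, and the function names are placeholders.

```python
import json
import urllib.request


def build_list_rules_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the workspace's alert rules."""
    # Assumption: the workspace serves the alerting provisioning API at this path.
    url = workspace_url.rstrip("/") + "/api/v1/provisioning/alert-rules"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def list_alert_rules(workspace_url: str, token: str) -> list:
    """Send the request and return the decoded JSON response."""
    request = build_list_rules_request(workspace_url, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values: replace with your own workspace URL and token.
    request = build_list_rules_request("https://example.invalid", "YOUR_TOKEN")
    print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the sketch easy to check without network access. For anything beyond simple scripts, a declarative tool such as Terraform is usually easier to review and version.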

How can you reduce alert fatigue?

  • Avoid noisy, unnecessary alerts by using silences, mute timings, or pausing alert rule evaluation.

  • Continually review and tune your alert rules to keep them effective. Remove alert rules that are duplicated or ineffective.

  • Think carefully about priority and severity levels.

  • Continually review your thresholds and evaluation rules.

Grafana alerting limitations

  • When aggregating rules from other systems, the Grafana alerting system can retrieve rules from all available Amazon Managed Service for Prometheus, Prometheus, Loki, and Alertmanager data sources. It might not be able to fetch rules from other supported data sources.

  • Alert rules defined in Grafana, rather than in Prometheus, send multiple notifications to your contact point. Alerts defined in other data sources, and aggregated or shown in Grafana, do not. Enabling Grafana Alerting is recommended when using alerts defined in Prometheus-compatible data sources.