Alerting

Doing something with those metrics

Recording rules and alerts

Note

Prometheus uses the Go templating system for alerting, in both Prometheus and Alertmanager.

Prometheus splits alerting across three components:

the Prometheus server, which evaluates the alerting rules
Alertmanager, which dispatches the alerts
webhook receivers, which handle the alerts

Note

Alerting rules and recording rules are closely related. Both are queries that Prometheus runs at a regular interval, and both write new series into the TSDB.
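As a rough illustration of how close the two rule types are, the sketch below writes a rule file that contains one recording rule and one alerting rule. The file name `example.rules.yml`, the group name, the rule names, the 0.5 threshold and the `severity` label are all assumptions made for this example and are not part of the workshop material. The annotation uses Go templating to insert a label value.

```shell
# Write a small rule file (file and rule names are made up for this sketch).
# The quoted 'EOF' keeps the shell from expanding $labels.
cat > example.rules.yml <<'EOF'
groups:
  - name: example
    rules:
      # Recording rule: stores the result of the query as a new series.
      - record: job:up:avg
        expr: avg by (job) (up)
      # Alerting rule: fires once the expression has returned a result for 5 minutes.
      - alert: JobMostlyDown
        expr: job:up:avg < 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Less than half of the targets of job {{ $labels.job }} are up"
EOF

# To validate the file with the promtool binary shipped with Prometheus:
#   ./promtool check rules example.rules.yml
```

The rule file still has to be listed under `rule_files` in the Prometheus configuration before it is evaluated.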

Exercise

Create, in Prometheus, an alert when a target is down.

Exercise

Create, in Prometheus, an alert when a grafana server is down, with an extra label: priority=high.

Exercise

Create a recording rule that computes the percentage of disk space used, then alert when more than 50% of the disk space is used.

What is the difference between a recording rule and an alerting rule?

What is an annotation?

What is a “group” of recording rules?

How can you see the rules and the alerts in the UI?

What is a pending alert?

Bonus: Alerts unit test (if there is enough time)

Tip

Prometheus generates an ALERTS metric with the pending and firing alerts.
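The ALERTS series carries an `alertstate` label whose value is `pending` or `firing`, so it can be queried like any other metric. The sketch below assumes Prometheus listens on its default address `http://127.0.0.1:9090`. The helper names `alerts_selector` and `query_alerts` and the `PROMETHEUS_URL` variable are hypothetical.

```shell
# Build a PromQL selector for ALERTS in a given state ("pending" or "firing").
alerts_selector() {
  printf 'ALERTS{alertstate="%s"}' "$1"
}

# Run the selector against the Prometheus HTTP API (URL is an assumption).
query_alerts() {
  curl -s --get "${PROMETHEUS_URL:-http://127.0.0.1:9090}/api/v1/query" \
    --data-urlencode "query=$(alerts_selector "$1")"
}

# Usage, with a running Prometheus:
#   query_alerts firing
#   query_alerts pending
```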

Alertmanager

Download Alertmanager 0.23.0.

Extract it

$ tar xvf Downloads/alertmanager-0.23.0.linux-amd64.tar.gz

List the files
```
$ ls alertmanager-0.23.0.linux-amd64
```

Launch the alertmanager

$ cd alertmanager-0.23.0.linux-amd64
$ ./alertmanager

Open your browser at http://127.0.0.1:9093
Add your Alertmanager and your neighbors' Alertmanagers to Prometheus
Connect Prometheus and Alertmanager together
Watch for the incoming alerts.
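A minimal sketch of what the Prometheus side of these steps could look like. It writes a fragment to a separate file (the name `prometheus-alerting-snippet.yml` is an assumption) so that an existing `prometheus.yml` is not overwritten; the two sections are meant to be merged by hand into the real configuration. The job name `alertmanager` is also an assumption.

```shell
# Write a configuration fragment to merge into prometheus.yml.
cat > prometheus-alerting-snippet.yml <<'EOF'
# Where Prometheus sends the alerts it computes.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
            # add your neighbors' host:9093 entries here

# Scrape Alertmanager's own metrics as well.
scrape_configs:
  - job_name: alertmanager
    static_configs:
      - targets:
          - 127.0.0.1:9093
EOF
```

After merging the fragment, Prometheus has to reload its configuration before the change takes effect.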

What are the 4 roles of alertmanager?
What are the different timers in alertmanager?

Exercise

Use https://webhook.site/ to get a webhook URL.

Send alerts to that https://webhook.site/ URL.

For the priority=high alerts, send an email instead of a webhook.
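One possible shape for the routing in this exercise, as a sketch and not the only solution: a default route that sends everything to a webhook receiver, and a child route that matches `priority="high"` and sends to an email receiver. The file name, the receiver names, the webhook URL and every email setting (`example.org` addresses, SMTP host) are placeholders to replace with your own values.

```shell
# Write an example Alertmanager configuration (all values are placeholders).
cat > alertmanager-example.yml <<'EOF'
route:
  # Default receiver for every alert that no child route matches.
  receiver: webhook
  routes:
    # Alerts labeled priority=high go to email.
    - matchers:
        - priority="high"
      receiver: email

receivers:
  - name: webhook
    webhook_configs:
      - url: https://webhook.site/your-unique-id
  - name: email
    email_configs:
      - to: oncall@example.org
        from: alertmanager@example.org
        smarthost: smtp.example.org:587
EOF

# To validate the file with amtool:
#   ./amtool check-config alertmanager-example.yml
# To use it:
#   ./alertmanager --config.file=alertmanager-example.yml
```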

Can you explain the HA model of prometheus?
How can I send an alert to multiple targets?

Exercise

How can you check that two Alertmanager configs are in sync?

Note

There is an alertmanager_config_hash metric.

Solution

count(count_values("config", alertmanager_config_hash)) != 1

Exercise

Make a big cluster of Alertmanager instances.
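As a starting point, each Alertmanager instance can be given one `--cluster.peer` flag per peer it should join. The sketch below builds those flags from a list of hosts. The helper name `peer_flags` and the addresses in the usage comment are hypothetical; port 9094 is Alertmanager's default cluster port.

```shell
# Print one --cluster.peer flag per host given as argument.
peer_flags() {
  for host in "$@"; do
    printf -- '--cluster.peer=%s:9094 ' "$host"
  done
}

# Usage (replace the addresses with those of your neighbors):
#   ./alertmanager $(peer_flags 192.0.2.11 192.0.2.12)
```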

Amtool

amtool is the CLI tool for Alertmanager.

You can use it, for example, to create silences.

$ ./amtool silence --alertmanager.url=http://127.0.0.1:9093 add job=grafana priority=high -d 15m -c "we redeploy grafana" -a Julien

This returns the ID of the silence, which you can use to expire it.
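A small sketch of the full silence life cycle with amtool: create, list, expire. The wrapper function names and the `AM_URL` variable are assumptions made for this example; the amtool subcommands (`silence add`, `silence query`, `silence expire`) are the real ones.

```shell
# Alertmanager address used by the helpers below (override by exporting AM_URL).
AM_URL="${AM_URL:-http://127.0.0.1:9093}"

# Create a silence; amtool prints the ID of the new silence.
silence_add() {
  ./amtool silence --alertmanager.url="$AM_URL" add "$@"
}

# Expire a silence, given its ID.
silence_expire() {
  ./amtool silence --alertmanager.url="$AM_URL" expire "$1"
}

# Usage, with a running Alertmanager:
#   id=$(silence_add job=grafana priority=high -d 15m -c "we redeploy grafana" -a Julien)
#   ./amtool silence --alertmanager.url="$AM_URL" query
#   silence_expire "$id"
```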

Karma

Karma is a dashboard for Alertmanager.