Recording rules and alerts
Note
Prometheus uses the Go templating system for alerting, in both Prometheus and Alertmanager.
Prometheus splits the alerting role into three components:
- the Prometheus server, which evaluates the alerts
- Alertmanager, which dispatches the alerts
- webhook receivers, which handle the alerts
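To illustrate the Go templating mentioned above, here is a minimal sketch of an alerting rule whose annotations are templated. The file name, group name, alert name, threshold and `severity` label are assumptions made for this example; `prometheus_notifications_dropped_total` is a metric that the Prometheus server exposes about itself.

```shell
# Write a hypothetical rule file. The quoted 'EOF' keeps the shell from
# expanding $labels and $value, so the Go templates reach the file untouched.
cat > templating-example.rules.yml <<'EOF'
groups:
  - name: templating-example
    rules:
      - alert: NotificationsDropped
        expr: rate(prometheus_notifications_dropped_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} is dropping alert notifications"
          description: "Current drop rate: {{ $value }} notifications per second."
EOF

# If promtool is available in the current directory, it can validate the file:
# ./promtool check rules templating-example.rules.yml
```

Prometheus expands `{{ $labels.instance }}` and `{{ $value }}` each time the rule is evaluated, so every firing alert carries the instance and value that triggered it.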
Note
Alerting rules and recording rules are closely related. Both are queries that Prometheus runs at a regular interval, and both write new metrics into the TSDB.
Exercise
Create, in Prometheus, an alert when a target is down.
Exercise
Create, in Prometheus, an alert when a Grafana server is down, with an extra label: priority=high.
Exercise
Create a recording rule that computes the percentage of disk space used, and alert when it exceeds 50%.
What is the difference between recording rules and alerting rules?
What is an annotation?
What is a “group” of recording rules?
How can you see the rules and the alerts in the UI?
What is a pending alert?
Bonus: Alerts unit test (if there is enough time)
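As a possible starting point for the unit test bonus, the sketch below writes a small rule file and a matching test file for `promtool test rules`. Everything in it is hypothetical and chosen for illustration only: the metric `example_queue_length`, the alert name `QueueTooLong`, the label values and the file names.

```shell
# A hypothetical alerting rule to be tested.
cat > example.rules.yml <<'EOF'
groups:
  - name: example
    rules:
      - alert: QueueTooLong
        expr: example_queue_length > 100
        for: 2m
        labels:
          severity: warning
EOF

# The unit test: feed a synthetic series and state which alerts are expected.
cat > example.rules.test.yml <<'EOF'
rule_files:
  - example.rules.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # '150+0x5' expands to the value 150 repeated for six samples.
      - series: 'example_queue_length{instance="worker-1"}'
        values: '150+0x5'
    alert_rule_test:
      # The condition holds from 0m, so after "for: 2m" the alert fires at 3m.
      - eval_time: 3m
        alertname: QueueTooLong
        exp_alerts:
          - exp_labels:
              instance: worker-1
              severity: warning
EOF

# Run the test with promtool, if it is available in the current directory:
# ./promtool test rules example.rules.test.yml
```

The expected labels combine the labels of the input series with those added by the rule; the alert name is given separately through `alertname`.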
Tip
Prometheus generates an ALERTS metric for the pending and firing alerts.
Alertmanager
- Download the alertmanager 0.23.0.
Extract it
$ tar xvf Downloads/alertmanager-0.23.0.linux-amd64.tar.gz
List the files
$ ls alertmanager-0.23.0.linux-amd64
Launch the alertmanager
$ cd alertmanager-0.23.0.linux-amd64
$ ./alertmanager
Open your browser at http://127.0.0.1:9093
Add your Alertmanager and your neighbors' Alertmanagers to Prometheus.
Connect Prometheus and Alertmanager together.
Watch for incoming alerts.
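The connection between Prometheus and Alertmanager is declared in the `alerting` section of the Prometheus configuration. The sketch below writes that fragment to a separate file so that it can be reviewed and merged into `prometheus.yml` by hand. The file name and the neighbor's address `neighbor.example:9093` are placeholders, not real values.

```shell
# Fragment to merge into prometheus.yml (not a complete configuration).
cat > alerting-fragment.yml <<'EOF'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093          # your own Alertmanager
            - neighbor.example:9093   # placeholder for a neighbor's Alertmanager
EOF

# After merging the fragment, validate the full configuration, for example:
# ./promtool check config prometheus.yml
```

Prometheus sends every firing alert to all the Alertmanagers listed, which is why several targets can be declared side by side.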
- What are the 4 roles of Alertmanager?
- What are the different timers in Alertmanager?
Exercise
Use https://webhook.site/ to get a webhook URL.
Send alerts to that https://webhook.site/ URL.
For the priority=high alerts, send an email instead of a webhook.
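One possible shape for such a routing configuration is sketched below. It is not the only solution: the receiver names, the webhook path, the email addresses and the SMTP host are all placeholders to replace with your own values. The file is written under a separate name so that the `alertmanager.yml` shipped in the archive is left untouched.

```shell
cat > alertmanager-example.yml <<'EOF'
route:
  # Default receiver: everything goes to the webhook...
  receiver: webhook-site
  routes:
    # ...except alerts labelled priority=high, which go to email instead.
    - matchers:
        - priority="high"
      receiver: email-oncall
receivers:
  - name: webhook-site
    webhook_configs:
      - url: https://webhook.site/REPLACE-WITH-YOUR-ID
  - name: email-oncall
    email_configs:
      - to: oncall@example.com
        from: alertmanager@example.com
        smarthost: smtp.example.com:587
EOF

# Validate, then start Alertmanager with this file:
# ./amtool check-config alertmanager-example.yml
# ./alertmanager --config.file=alertmanager-example.yml
```

Because the child route matches first and does not set `continue`, high-priority alerts stop there and are not also sent to the webhook.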
- Can you explain the HA model of prometheus?
- How can I send an alert to multiple targets?
Exercise
How can you check that two Alertmanager configurations are in sync?
Note
There is an alertmanager_config_hash metric.
Exercise
Build a large cluster of Alertmanagers.
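To experiment on a single machine first, the sketch below prints, without running them, the commands for a small local cluster. The helper name `print_cluster_commands` and the port layout (web UI on 9093 and up, gossip on 9194 and up) are assumptions made for this example; the flags themselves are regular Alertmanager flags.

```shell
# Print one ./alertmanager command per instance of a local cluster.
# Usage: print_cluster_commands <number-of-instances>
print_cluster_commands() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    cmd="./alertmanager"
    cmd="$cmd --web.listen-address=127.0.0.1:$((9093 + i))"
    cmd="$cmd --cluster.listen-address=127.0.0.1:$((9194 + i))"
    cmd="$cmd --storage.path=data-$i"
    # Every instance after the first joins the cluster through the first one.
    if [ "$i" -gt 0 ]; then
      cmd="$cmd --cluster.peer=127.0.0.1:9194"
    fi
    echo "$cmd"
    i=$((i + 1))
  done
}

print_cluster_commands 3
```

Each printed command can then be started in its own terminal. To join your neighbors, point `--cluster.peer` at the address of one of their instances instead.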
Amtool
amtool is the CLI tool for Alertmanager.
You can use it, for example, to create silences.
$ ./amtool silence --alertmanager.url=http://127.0.0.1:9093 add job=grafana priority=high -d 15m -c "we redeploy grafana" -a Julien
This returns the ID of the silence, which you can use to expire it.
Karma
Karma is a dashboard for Alertmanager.