Prometheus is an open source monitoring system designed around metrics. It is a large ecosystem with many different components.
The Prometheus documentation provides an overview of those components.
How Prometheus works
Prometheus monitoring is based on metrics, exposed on HTTP endpoints. The Prometheus server is “active”: it initiates the polling. That polling (called “scraping”) happens at a regular, short interval (usually every 15s or 30s).
Each monitored target must expose a metrics endpoint. That endpoint exposes metrics in the Prometheus HTTP format or in the OpenMetrics format.
Once collected, those metrics are modified by Prometheus, which adds the instance and job labels. Optionally, extra relabeling configured by the user is applied.
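As an illustration, here is what a hypothetical counter (made-up name and values) could look like on a target’s metrics endpoint, in the Prometheus text exposition format:
# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
After the scrape, Prometheus stores it with the extra labels it adds, e.g. http_requests_total{code="200", instance="127.0.0.1:9090", job="prometheus"} (the exact instance and job values depend on your scrape configuration).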
The Prometheus server
- Download the Prometheus server 2.30.0.
Extract it
$ tar xvf Downloads/prometheus-2.30.0.linux-amd64.tar.gz
List the files
$ ls prometheus-2.30.0.linux-amd64
Launch prometheus
$ cd prometheus-2.30.0.linux-amd64
$ ./prometheus
Open your browser at http://127.0.0.1:9090
Look at the TSDB data
tsdb
Prometheus stores its data in a database called TSDB. The TSDB is self-maintained by the server, which manages the data lifecycle.
The web UI
A lot of information can be found in the Prometheus server web UI.
Try to find:
- The version of prometheus
- The duration of data retention
- The “targets” that are scraped by default
- The “scrape” interval
React UI
The Prometheus UI underwent a huge refactoring in 2020. It is now React-based, with powerful autocomplete features. There is still a link to access the “classic” UI.
promtool
promtool is a command line tool provided with Prometheus.
With promtool you can:
- Validate Prometheus configuration
$ ./promtool check config prometheus.yml
- Query Prometheus
$ ./promtool query instant http://127.0.0.1:9090 up
Info
The up metric is added by Prometheus on each scrape. Its value is 1 if the scrape has succeeded, 0 otherwise.
- Create blocks from OpenMetrics files or recording rules, aka backfill.
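As an illustration of check config, a minimal prometheus.yml sketch like the one below (close to, but not identical to, the file shipped in the tarball) should validate cleanly:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']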
Adding targets
Note
At this point, make sure you understand the basics of YAML.
Exercise
- Open prometheus.yml
- Add your neighbors’ Prometheus servers as targets of your own Prometheus server (see the sketch after the tip below).
- Look at the status (using the up metric or the Targets page)
What is a job? What is an instance?
Tip
You do not need to restart Prometheus: you can just send a SIGHUP signal to reload the configuration:
$ killall -HUP prometheus
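As an illustration, the relevant part of prometheus.yml could end up looking like this sketch (the second address is a placeholder to replace with your neighbors’ real IPs):
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
          - 127.0.0.1:9090
          - 192.0.2.10:9090   # a neighbor's Prometheus server (placeholder IP)
The job_name becomes the job label on every metric scraped through this job, and each entry in targets becomes the instance label.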
Admin commands
Enable admin commands
$ ./prometheus --web.enable-admin-api
Take a snapshot
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
Look in the data directory.
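Assuming the default storage path (./data, relative to where you launched Prometheus), the snapshot should show up under the snapshots subdirectory:
$ ls data/snapshots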
Note
This is snapshotting the TSDB. There is another kind of snapshot, Memory Snapshot on Shutdown, which is a different feature.
Delete a time series
$ curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=process_start_time_seconds{job="prometheus"}'
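Note that delete_series only marks the data as deleted: it is actually removed from disk during future compactions, or explicitly by calling the clean_tombstones endpoint:
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones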
Federation
File_sd
Now, let’s move to file_sd.
Create a file:
- targets:
    - 127.0.0.1
  labels:
    name: Julien
- targets:
    - 127.0.0.2
  labels:
    name: John
Use your own IP and your neighbors’ IPs. Name the file users.yml.
Adapt Prometheus configuration:
- job_name: 'prometheus'
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  file_sd_configs:
    - files:
        - users.yml
  relabel_configs:
    - source_labels: [__address__]
      target_label: __address__
      replacement: "${1}:9090"
Duplicate the job, but with the following instructions:
- The new job should be called “federation”
- The new job should query http://127.0.0.1:9090/federate?match[]=up
- The “up” metric fetched should be renamed to external_up
Tip
The name of a metric is a label too! It is the __name__ label.
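One possible way to approach this exercise (a sketch, not necessarily the workshop’s reference solution; it reuses the users.yml file_sd setup above):
- job_name: 'federation'
  # Query the /federate endpoint instead of /metrics.
  metrics_path: /federate
  params:
    'match[]':
      - up
  file_sd_configs:
    - files:
        - users.yml
  relabel_configs:
    - source_labels: [__address__]
      target_label: __address__
      replacement: "${1}:9090"
  metric_relabel_configs:
    # Rename the fetched "up" metric to "external_up".
    - source_labels: [__name__]
      regex: up
      target_label: __name__
      replacement: external_up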
DigitalOcean SD
Now, let’s move to digitalocean_sd.
In your VM, there is a /etc/do_read file with a digitalocean token.
The version of Prometheus you are running has native integration with DigitalOcean.
Adapt Prometheus configuration:
- job_name: 'prometheus'
  digitalocean_sd_configs:
    - bearer_token_file: /etc/do_read
      port: 9090
  relabel_configs:
    - source_labels: [__meta_digitalocean_tags]
      regex: '.*,prometheus_workshop,.*'
      action: keep
Reload Prometheus:
$ killall -HUP prometheus
You should see the 10 Prometheus servers.
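To check the result without the web UI, you could count the discovered targets with a query like the following (assuming the job is still called “prometheus”):
$ ./promtool query instant http://127.0.0.1:9090 'count(up{job="prometheus"})'
Since up exists once per scraped target, the count should match the number of discovered servers.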
Duplicate the job, but with the following instructions:
- The new job should be called “federation”
- The new job should query http://127.0.0.1:9090/federate?match[]=up
- The “up” metric fetched should be renamed to external_up
Tip
The name of a metric is a label too! It is the __name__ label.
Last exercise
Prometheus fetches metrics over HTTP.
Metrics have a name and labels.
As an exercise, let’s build on top of our previous example:
In a new directory, create a file called “metrics”
Add some metrics:
company{name="inuits"} 1
favorite_color{name="red"} 1
random_number 10
workshop_step 1
Then, run python3 -m http.server 5678 (with Python 2: python -m SimpleHTTPServer 5678)
and add it to Prometheus (and your neighbors too).
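A scrape config sketch for this last step (the job name is just a placeholder; the default metrics_path of /metrics matches the name of the file served by the HTTP server):
- job_name: 'workshop_files'
  static_configs:
    - targets:
        - 127.0.0.1:5678
        # Add your neighbors' IPs here as well.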