Prometheus is an open source monitoring system designed around metrics. It is a large ecosystem with many different components.
The Prometheus documentation provides an overview of those components.
How Prometheus works
Prometheus monitoring is based on metrics, exposed on HTTP endpoints. The Prometheus server is “active”: it initiates the polling. That polling (called “scraping”) happens at a regular, short interval (usually every 15s or 30s).
Each monitored target must expose a metrics endpoint. That endpoint exposes metrics in the Prometheus HTTP format or in the OpenMetrics format.
Once collected, those metrics are modified by Prometheus, which adds the instance and job labels. Optionally, extra relabeling configured by the user is applied.
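As an illustration, here is what a hypothetical counter (made-up name and values) could look like on a target’s metrics endpoint, in the Prometheus text exposition format:
# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
After the scrape, Prometheus stores it with the extra labels it adds, e.g. http_requests_total{code="200", instance="127.0.0.1:9090", job="prometheus"} (the exact instance and job values depend on your scrape configuration).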
The Prometheus server
- Download the Prometheus server 2.30.0.
Extract it
$ tar xvf Downloads/prometheus-2.30.0.linux-amd64.tar.gz
List the files
$ ls prometheus-2.30.0.linux-amd64
Launch prometheus
$ cd prometheus-2.30.0.linux-amd64
$ ./prometheus
Open your browser at http://127.0.0.1:9090
Look at the TSDB data
tsdb
Prometheus stores its data in a database called TSDB. The TSDB is self-maintained by the server, which manages the data lifecycle.
The web UI
A lot of information can be found in the Prometheus server web UI.
Try to find:
- The version of prometheus
- The duration of data retention
- The “targets” that are scraped by default
- The “scrape” interval
React UI
The Prometheus UI underwent a huge refactoring in 2020. It is now React-based, with powerful autocomplete features. There is still a link to access the “classic” UI.
promtool
promtool is a command line tool provided with Prometheus.
With promtool you can:
- Validate Prometheus configuration
$ ./promtool check config prometheus.yml
- Query Prometheus
$ ./promtool query instant http://127.0.0.1:9090 up
Info
The up metric is added by Prometheus on each scrape. Its value is 1 if the scrape has succeeded, 0 otherwise.
- Create blocks from OpenMetrics files or recording rules, aka backfill.
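As an illustration of check config, a minimal prometheus.yml sketch like the one below (close to, but not identical to, the file shipped in the tarball) should validate cleanly:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']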
Adding targets
Note
At this point, make sure you understand the basics of YAML.
Exercise
- Open prometheus.yml
- Add your neighbors’ Prometheus servers as targets of your own Prometheus server (see the sketch after the tip below).
- Look at the status (using the up metric or the Targets page)
What is a job? What is an instance?
Tip
You do not need to restart Prometheus: you can just send a SIGHUP signal to reload the configuration:
$ killall -HUP prometheus
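As an illustration, the relevant part of prometheus.yml could end up looking like this sketch (the second address is a placeholder to replace with your neighbors’ real IPs):
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets:
          - 127.0.0.1:9090
          - 192.0.2.10:9090   # a neighbor's Prometheus server (placeholder IP)
The job_name becomes the job label on every metric scraped through this job, and each entry in targets becomes the instance label.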
Admin commands
Enable admin commands
$ ./prometheus --web.enable-admin-api
Take a snapshot
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
Look in the data directory.
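Assuming the default storage path (./data, relative to where you launched Prometheus), the snapshot should show up under the snapshots subdirectory:
$ ls data/snapshots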
Note
This is snapshotting the TSDB. There is another kind of snapshot, Memory Snapshot on Shutdown, which is a different feature.
Delete a time series
$ curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=process_start_time_seconds{job="prometheus"}'
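Note that delete_series only marks the data as deleted: it is actually removed from disk during future compactions, or explicitly by calling the clean_tombstones endpoint:
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones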
Federation
File_sd
Now, let’s move to file_sd.
Create a file:
- targets:
    - 127.0.0.1
  labels:
    name: Julien
- targets:
    - 127.0.0.2
  labels:
    name: John
Use your own IP and your neighbors’ IPs. Name the file users.yml.
Adapt Prometheus configuration:
- job_name: 'prometheus'
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  file_sd_configs:
    - files:
        - users.yml
  relabel_configs:
    - source_labels: [__address__]
      target_label: __address__
      replacement: "${1}:9090"
Duplicate the job, but with the following instructions:
- The new job should be called “federation”
- The new job should query http://127.0.0.1:9090/federate?match[]=up
- The “up” metric fetched should be renamed to external_up
Tip
The name of a metric is a label too! It is the __name__ label.
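One possible way to approach this exercise (a sketch, not necessarily the workshop’s reference solution; it reuses the users.yml file_sd setup above):
- job_name: 'federation'
  # Query the /federate endpoint instead of /metrics.
  metrics_path: /federate
  params:
    'match[]':
      - up
  file_sd_configs:
    - files:
        - users.yml
  relabel_configs:
    - source_labels: [__address__]
      target_label: __address__
      replacement: "${1}:9090"
  metric_relabel_configs:
    # Rename the fetched "up" metric to "external_up".
    - source_labels: [__name__]
      regex: up
      target_label: __name__
      replacement: external_up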
DigitalOcean SD
Now, let’s move to digitalocean_sd.
In your VM, there is a /etc/do_read file with a digitalocean token.
The version of Prometheus you are running has native integration with DigitalOcean.
Adapt Prometheus configuration:
- job_name: 'prometheus'
  digitalocean_sd_configs:
    - bearer_token_file: /etc/do_read
      port: 9090
  relabel_configs:
    - source_labels: [__meta_digitalocean_tags]
      regex: '.*,prometheus_workshop,.*'
      action: keep
Reload Prometheus:
$ killall -HUP prometheus
You should see the 10 Prometheus servers.
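To check the result without the web UI, you could count the discovered targets with a query like the following (assuming the job is still called “prometheus”):
$ ./promtool query instant http://127.0.0.1:9090 'count(up{job="prometheus"})'
Since up exists once per scraped target, the count should match the number of discovered servers.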
Duplicate the job, but with the following instructions:
- The new job should be called “federation”
- The new job should query http://127.0.0.1:9090/federate?match[]=up
- The “up” metric fetched should be renamed to external_up
Tip
The name of a metric is a label too! It is the __name__ label.
Last exercise
Prometheus fetches metrics over HTTP.
Metrics have a name and labels.
As an exercise, let’s build on top of our previous example:
In a new directory, create a file called “metrics”
Add some metrics:
company{name="inuits"} 1
favorite_color{name="red"} 1
random_number 10
workshop_step 1
Then, run python3 -m http.server 5678 (with Python 2: python -m SimpleHTTPServer 5678)
and add it to Prometheus (and your neighbors too).
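A scrape config sketch for this last step (the job name is just a placeholder; the default metrics_path of /metrics matches the name of the file served by the HTTP server):
- job_name: 'workshop_files'
  static_configs:
    - targets:
        - 127.0.0.1:5678
        # Add your neighbors' IPs here as well.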