1. What is Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit with an active ecosystem. In short, it is an open-source monitoring solution.

Main features of Prometheus:

  • A multidimensional data model, with time series identified by metric name and key/value pairs
  • PromQL, a flexible query language
  • No dependency on distributed storage; a single server node is autonomous
  • Time series are collected through a pull model over HTTP
  • Pushing time series is supported through an intermediate gateway
  • Targets are discovered through service discovery or static configuration
  • Multiple modes of graphing and dashboarding are supported

Why pull instead of push?

Pull has the following advantages:

  • You can run the monitoring on your laptop while developing changes
  • It is easier to tell whether a target is down
  • You can manually go to a target and inspect its health with a web browser

The target exposes an HTTP endpoint, and the Prometheus server actively pulls data from it over HTTP. Because the server does the pulling, it can run anywhere, including locally on your own computer, as long as it can reach the target endpoint. A failed pull works like a failed heartbeat check, so the server can tell whether a target is offline. And because the server decides what to pull, it can switch targets freely.
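
The pull model can be sketched with nothing but the Python standard library. This is a toy illustration, not how Prometheus itself is implemented: MetricsHandler, scrape, demo, and the metric demo_requests_total are hypothetical names introduced here. The point is that the target only exposes an endpoint, the scraper initiates every request, and a failed request doubles as an "offline" signal.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Toy target: serves one hard-coded sample at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = b"demo_requests_total 42\n"  # hypothetical metric
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def scrape(host, port, path="/metrics", timeout=2.0):
    """Pull one target over HTTP. Returns (up, body); up is False on any failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8")
        if response.status != 200:
            return False, ""
        return True, body
    except (OSError, http.client.HTTPException):
        return False, ""
    finally:
        conn.close()


def demo():
    """Scrape the toy target once while it runs and once after it stops."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    online = scrape("127.0.0.1", port)
    server.shutdown()
    server.server_close()
    offline = scrape("127.0.0.1", port)
    return online, offline
```

Calling demo() scrapes the same address twice; the second scrape fails because the target has stopped, which is how a puller notices that a target is down without the target having to report anything.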

By contrast, SkyWalking has a client and a server: you install a probe (agent) on the target service, and the agent collects the service's metrics and reports them to the OAP service on the server side. Prometheus does not need an agent; when pushing is required, it goes through the Pushgateway.

A note on terminology: this article uses the term "metrics" throughout.

2. Basic concepts

2.1. Data model

Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of a query.

Metric names and labels

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

Samples form the actual time series data. Each sample consists of:

  • A 64-bit floating point value
  • A millisecond-precision timestamp

Given a metric name and a set of labels, a time series is usually identified using this notation:

<metric name>{<label name>=<label value>, ...}

For example, a time series with the metric name api_http_requests_total and the labels method="POST" and handler="/messages" could be written like this:

api_http_requests_total{method="POST", handler="/messages"}
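
As a small illustration of this notation, the sketch below renders a metric name and a label dictionary into the textual form, and builds a key showing that a series is identified by its name plus its full label set. The helper names format_series and series_key are made up for this example and are not part of any Prometheus library.

```python
def format_series(metric_name, labels=None):
    """Render <metric name>{<label name>="<label value>", ...}."""
    if not labels:
        return metric_name
    pairs = ", ".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{metric_name}{{{pairs}}}"


def series_key(metric_name, labels=None):
    """Identity of a series: the metric name plus the label set, in any order."""
    return (metric_name, frozenset((labels or {}).items()))


if __name__ == "__main__":
    print(format_series("api_http_requests_total",
                        {"method": "POST", "handler": "/messages"}))
```

Two label dictionaries with the same pairs in a different order produce the same series_key, while changing any single label value produces a different series.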

2.2. Metric types

Counter

A counter is a cumulative metric that represents a single monotonically increasing value: it can only increase, or be reset to zero on restart. For example, a counter can represent the number of requests served, tasks completed, or errors.

Do not use a counter for a value that can decrease. For example, do not use a counter for the number of currently running processes; use a gauge instead.

Gauge

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

Gauges are often used to measure values, such as temperature or current memory usage, but also for “counts” that can rise and fall, such as the number of concurrent requests.
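
The difference between the two types can be sketched with two toy classes. These are illustrative stand-ins written for this article, not the classes of an official client library: the Counter rejects negative increments, while the Gauge can be set, raised, or lowered freely.

```python
class Counter:
    """Toy counter: only goes up, or back to zero on a (simulated) restart."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount

    def reset(self):
        # Stands in for a process restart, the only way a counter goes down.
        self.value = 0.0


class Gauge:
    """Toy gauge: a single value that may move in either direction."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


if __name__ == "__main__":
    requests_served = Counter()
    requests_served.inc()
    in_flight = Gauge()
    in_flight.inc()
    in_flight.dec()
    print(requests_served.value, in_flight.value)
```

A value such as "requests served" fits the counter, while "requests currently in flight" needs the gauge because it falls again when a request finishes.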

Histogram

A histogram samples observations (typically request durations or response sizes) and counts them in configurable buckets. It also provides the sum of all observed values.

A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

  • Cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
  • The sum of all observed values, exposed as <basename>_sum
  • The number of observed events, exposed as <basename>_count
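
To make the bucket semantics concrete, here is a minimal sketch of a histogram with cumulative buckets. The Histogram class, its method names, and the bucket bounds are assumptions chosen for illustration; real client libraries differ in detail. Each le bound is inclusive, and every bucket line counts all observations at or below its bound.

```python
import math
from bisect import bisect_left


class Histogram:
    """Toy histogram with cumulative buckets, a sum, and a count."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]  # implicit +Inf bucket
        self.per_bucket = [0] * len(self.bounds)   # non-cumulative counts
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound >= value, so bounds are inclusive.
        self.per_bucket[bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1

    def expose(self):
        """Return the series a scrape would see, one text line per series."""
        lines = []
        running = 0
        for bound, n in zip(self.bounds, self.per_bucket):
            running += n  # make the bucket counters cumulative
            le = "+Inf" if math.isinf(bound) else str(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.count}")
        return lines


if __name__ == "__main__":
    h = Histogram("http_request_duration_seconds", [0.1, 0.5, 1.0])
    for seconds in (0.25, 0.5, 2.0):
        h.observe(seconds)
    print("\n".join(h.expose()))
```

In this sketch the +Inf bucket always equals the total count, because every observation is at or below infinity.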

Summary

Similar to a histogram, a summary samples observations, typically things such as request durations and response sizes. While it also provides the total number of observations and the sum of all observed values, it calculates configurable quantiles over a sliding time window.

A summary with a base metric name of <basename> exposes multiple time series during a scrape:

  • Streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
  • The sum of all observed values, exposed as <basename>_sum
  • The number of observed events, exposed as <basename>_count
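
The following sketch shows the idea behind a summary under two loudly stated simplifications: the sliding window is count-based (the last N observations) rather than time-based, and quantiles are computed by sorting the window with the nearest-rank method instead of a streaming algorithm. The Summary class and its parameters are hypothetical.

```python
import math
from collections import deque


class Summary:
    """Toy summary: quantiles over the most recent observations."""

    def __init__(self, name, quantiles=(0.5, 0.99), window=1000):
        self.name = name
        self.quantiles = quantiles
        self.recent = deque(maxlen=window)  # simplified sliding window
        self.total = 0.0                    # sum over ALL observations
        self.count = 0                      # count of ALL observations

    def observe(self, value):
        self.recent.append(value)
        self.total += value
        self.count += 1

    def quantile(self, phi):
        """Nearest-rank phi-quantile of the values currently in the window."""
        ordered = sorted(self.recent)
        if not ordered:
            return math.nan
        rank = max(1, math.ceil(phi * len(ordered)))
        return ordered[rank - 1]

    def expose(self):
        lines = [
            f'{self.name}{{quantile="{phi}"}} {self.quantile(phi)}'
            for phi in self.quantiles
        ]
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.count}")
        return lines


if __name__ == "__main__":
    s = Summary("rpc_duration_seconds")
    for i in range(1, 11):
        s.observe(float(i))
    print("\n".join(s.expose()))
```

Note the asymmetry the sketch preserves: the quantiles only reflect the window, while _sum and _count keep accumulating over every observation.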

2.3. Jobs and instances

In Prometheus terms, an endpoint that can be scraped is called an instance, usually corresponding to a single process. A collection of instances with the same purpose is called a job.

For example, an API server job with four instances:

  • job: api-server
      • instance 1: 1.2.3.4:5670
      • instance 2: 1.2.3.4:5671
      • instance 3: 5.6.7.8:5670
      • instance 4: 5.6.7.8:5671

When Prometheus scrapes a target, it automatically attaches some labels to the scraped time series to identify the scraped target:

  • job: the name of the configured job that the target belongs to
  • instance: the <host>:<port> part of the target URL that was scraped
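
A rough sketch of how these two labels could be derived and attached, assuming the target is given as a full URL. The function name attach_target_labels is invented for this example, and the sketch ignores what happens when a scraped series already carries a job or instance label.

```python
from urllib.parse import urlsplit


def attach_target_labels(series_labels, job_name, target_url):
    """Return a copy of series_labels with job and instance labels added."""
    instance = urlsplit(target_url).netloc  # the host:port part of the URL
    labeled = dict(series_labels)
    labeled["job"] = job_name
    labeled["instance"] = instance
    return labeled


if __name__ == "__main__":
    print(attach_target_labels({"method": "POST"}, "api-server",
                               "http://1.2.3.4:5670/metrics"))
```

With these labels in place, the same metric scraped from four instances of one job yields four distinct time series.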

3. Installation and configuration

Prometheus collects metrics from targets by scraping their metrics HTTP endpoints. Because Prometheus exposes data about itself in the same way, it can also scrape and monitor its own health.

With the default configuration, Prometheus scrapes its own health data, so you can run it directly without changing anything:

# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path)

./prometheus --config.file=prometheus.yml

Open localhost:9090 in a browser.

To view the raw metrics, visit localhost:9090/metrics.

For example, enter the following expression and click "Execute":

prometheus_target_interval_length_seconds

This should return a number of different time series (along with the latest value recorded for each), each with the metric name prometheus_target_interval_length_seconds but with different labels.

The same metric can also be displayed as a graph, or read in raw form at localhost:9090/metrics.

If we are only interested in the 99th percentile latency, we can use the following query:

prometheus_target_interval_length_seconds{quantile="0.99"}

To count the number of returned time series, write the query like this:

count(prometheus_target_interval_length_seconds)
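
What the three queries do can be mimicked on a small in-memory list of series. This is only a sketch of the selection logic, not of PromQL itself: select supports equality matchers only, the sample values in SERIES are made up, and counting is shown as a plain length, whereas PromQL's count() returns its result as a vector.

```python
# Each entry is (metric name, labels, latest value). Values are invented.
SERIES = [
    ("prometheus_target_interval_length_seconds",
     {"interval": "15s", "quantile": "0.5"}, 15.0),
    ("prometheus_target_interval_length_seconds",
     {"interval": "15s", "quantile": "0.99"}, 15.0),
    ("up", {"job": "prometheus"}, 1.0),
]


def select(series, metric_name, **matchers):
    """Keep series with the given name whose labels equal every matcher."""
    return [
        (name, labels, value)
        for name, labels, value in series
        if name == metric_name
        and all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


if __name__ == "__main__":
    name = "prometheus_target_interval_length_seconds"
    print(select(SERIES, name))                        # bare metric name
    print(select(SERIES, name, quantile="0.99"))       # one label matcher
    print(len(select(SERIES, name)))                   # number of series
```

A bare metric name matches every series with that name; adding a label matcher narrows the result to the series whose label has exactly that value.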

Next, let's use the Node Exporter to add a few more targets:

tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*

# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082

Next, configure Prometheus to scrape these three new targets.

First, define a job named 'node' that scrapes the three target endpoints. Imagine that the first two endpoints are production targets and the third is a non-production (canary) instance. To distinguish them, we put them in two target groups with different labels. In this example, we add the group="production" label to the first target group and group="canary" to the second.

scrape_configs:
  - job_name:       'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
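
To see what this configuration amounts to, the sketch below expands the same job into one label set per target. The configuration is mirrored as a plain Python dictionary (the standard library has no YAML parser), and expand_targets is a hypothetical helper, not a Prometheus function.

```python
# The 'node' job from the configuration above, mirrored as a Python dict.
SCRAPE_CONFIG = {
    "job_name": "node",
    "static_configs": [
        {"targets": ["localhost:8080", "localhost:8081"],
         "labels": {"group": "production"}},
        {"targets": ["localhost:8082"],
         "labels": {"group": "canary"}},
    ],
}


def expand_targets(scrape_config):
    """Return one label dict per target: job, instance, plus group labels."""
    expanded = []
    for group in scrape_config["static_configs"]:
        for target in group["targets"]:
            labels = {"job": scrape_config["job_name"], "instance": target}
            labels.update(group.get("labels", {}))
            expanded.append(labels)
    return expanded


if __name__ == "__main__":
    for labels in expand_targets(SCRAPE_CONFIG):
        print(labels)
```

Every series scraped from a target carries that target's label set, so a query can later filter on group="production" or group="canary".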

3.1. Configuration

To view all command line arguments, run the following command

./prometheus -h

The configuration file is in YAML format and is specified with the --config.file flag.

The structure of the configuration file is as follows:

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

4. Scraping a Spring Boot application

Prometheus expects to scrape or poll individual application instances for metrics. Spring Boot provides an actuator endpoint at /actuator/prometheus that presents the metrics in the format a Prometheus scrape expects.

To expose metrics in a format that the Prometheus server can scrape, add a dependency on micrometer-registry-prometheus:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.6.4</version>
</dependency>

Here is an example prometheus.yml

scrape_configs:
  - job_name: 'spring'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['HOST:PORT']

Next, create a project called prometheus-example.

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.3</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.cjs.example</groupId>
    <artifactId>prometheus-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>prometheus-example</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <scope>runtime</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

application.yml

spring:
  application:
    name: prometheus-example
management:
  endpoints:
    web:
      exposure:
        include: "*"
  metrics:
    tags:
      application: ${spring.application.name}

Don't forget this setting: management.metrics.tags.application=${spring.application.name}

Spring Boot Actuator provides many endpoints by default. For details, see:

docs.spring.io/spring-boot…

Start the project and open the /actuator/prometheus endpoint in a browser.
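
The endpoint returns plain text, one sample per line, with comment lines starting with #. As a rough illustration of how such text can be read, here is a simplified parser sketch. It is not a complete implementation of the exposition format, and SAMPLE_TEXT is an illustrative snippet written for this example, not output captured from the example application.

```python
import re

# Illustrative text only; names and values are not taken from a real run.
SAMPLE_TEXT = """\
# HELP jvm_threads_live_threads The current number of live threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads{application="prometheus-example",} 25.0
process_uptime_seconds 12.5
"""

# metric name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse simple exposition text into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, label_text, value, _timestamp = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for sample in parse_exposition(SAMPLE_TEXT):
        print(sample)
```

The application label in the sample line is the tag configured through management.metrics.tags.application in application.yml.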

Configure Prometheus to scrape the application:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  
  - job_name: 'springboot-prometheus'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['192.168.100.93:8080'] 

Restart Prometheus:

./prometheus --config.file=prometheus.yml

4.1. Grafana

grafana.com/docs/

grafana.com/tutorials/

Download & Extract

wget https://dl.grafana.com/oss/release/grafana-7.4.3.linux-amd64.tar.gz
tar -zxvf grafana-7.4.3.linux-amd64.tar.gz

Start

./bin/grafana-server web

Open http://localhost:3000 in a browser.

The default account is admin/admin.

After the first login, we changed the password to admin1234.

Configure a data source first; you will select it later when adding a dashboard.

Grafana officially provides many dashboard templates that can be used directly.

The first step is to find the template we want.

For example, we picked a template at random here.

You can either download the template's JSON file and import it, or enter the template ID to load it directly. In this example, we enter the template ID.

The dashboard is displayed immediately.

Let's add another dashboard (ID: 12856).

The original link: www.cnblogs.com/cjsblog/p/1…
