“This is the 24th day of my participation in the November Gengwen Challenge. See the event details: The Last Gengwen Challenge of 2021.”

1. Overview

Prometheus is an open source monitoring, alerting, and time series database suite, originally developed at SoundCloud and inspired by Google's Borgmon. It joined the CNCF in 2016; version 1.0 was officially released in mid-2016, and version 2.0, built on a new storage layer, was released at the end of 2017 and works much better with container and cloud platforms. Since graduating from the CNCF in August 2018, it has been the de facto official monitoring solution for Kubernetes. The community is active and third-party integrations are plentiful.

Official website: prometheus.io/

2. Monitoring objectives

In the book Site Reliability Engineering: How Google Runs Production Systems, it is pointed out that a monitoring system needs to support both white-box and black-box monitoring. White-box monitoring exposes a system's actual internal state, so that observed metrics can predict problems before they happen and potential risks can be addressed in advance. Black-box monitoring, such as HTTP and TCP probes, quickly notifies the people on call when a system or service actually fails. A sound monitoring system achieves the following goals:

  • Long-term trend analysis: by continuously collecting and aggregating sample data, analyze the long-term trend of metrics. For example, the growth rate of disk usage can predict when storage will need to be expanded.
  • Comparative analysis: how does resource usage differ between two versions of a system? How do concurrency and load vary under different traffic volumes? Monitoring makes it easy to track and compare systems.
  • Alerting: when a fault occurs, or is about to occur, the monitoring system must react and notify an administrator, so that the problem can be handled quickly or even prevented before it impacts the service.
  • Fault analysis and localization: after a failure, its cause must be investigated. Analyzing different metrics together with historical data helps find and fix the root cause.
  • Data visualization: visual dashboards give an intuitive view of system status, resource usage, service health, and so on.

3. Advantages of Prometheus

Prometheus is an open source, complete monitoring solution. It replaces the traditional probe-and-alert model with a new model built around centrally stored time series, rule-based computation, and unified analysis and alerting. Compared with traditional monitoring systems, Prometheus has the following advantages:

  • Easy to manage: a single binary with no third-party dependencies. Prometheus's pull-based architecture lets you bring up a monitoring system almost anywhere (a local machine, a development environment, a test environment). For more complex situations, monitoring targets can also be managed dynamically through Prometheus's service discovery mechanisms.
  • Monitoring a service's internal state: Prometheus encourages users to monitor the internal running state of their services. With Prometheus's rich client libraries, users can easily add Prometheus support inside their applications and expose the true state of the service.

  • Powerful data model: all monitoring data is stored in Prometheus's built-in time series database (TSDB), and every time series is uniquely identified by a metric name plus a set of labels. Labels represent dimensions and may come from the state of the monitored object, such as code=404 or content_path=/api/path, or from your environment definition, such as environment=production. Based on these labels, monitoring data can easily be aggregated, filtered, and sliced.
http_request_status{code='200',content_path='/api/path', environment='production'} => [value1@timestamp1, value2@timestamp2, ...]
http_request_status{code='200',content_path='/api/path2',environment='production'} => [value1@timestamp1, value2@timestamp2, ...]
  • Powerful query language, PromQL: the built-in data query language supports many kinds of queries and aggregations; PromQL is also what data visualization tools (such as Grafana) and alerting rules are built on. PromQL makes it easy to answer questions like:
    • What is the 95th percentile of application latency over time?
    • What will disk usage be four hours from now?
    • Which five services use the most CPU? (filtering)
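As a hedged sketch, the three questions above map naturally onto PromQL functions; the metric names below (http_request_duration_seconds_bucket, node_filesystem_free_bytes, node_cpu_seconds_total) are the conventional names exposed by common exporters, not names taken from this article:

```promql
# 95th percentile of request latency, computed from histogram buckets over 5m windows
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Free disk space predicted 4 hours from now, extrapolated from the last hour's trend
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600)

# Top five instances by CPU usage (1 minus the idle fraction)
topk(5, 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```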
  • High performance: a monitoring system with many monitoring tasks inevitably produces a large amount of data, and Prometheus handles it efficiently. A single Prometheus Server instance can handle:
    • Millions of metrics
    • Hundreds of thousands of data points per second
    • Thousands of monitoring targets
  • Easy to scale: Prometheus is simple enough that you can run a standalone Prometheus Server in every data center or for every team. Prometheus's federation support joins multiple Prometheus instances into one logical cluster; when a single Prometheus Server has too many tasks to handle, it can be scaled out across data centers with functional sharding plus federation.
  • Easy integration: monitoring can be set up quickly with Prometheus and integrated into applications easily. Client SDKs currently exist for Java, JMX, Python, Go, Ruby, .NET, Node.js, and more; they can be used either to expose an application to Prometheus directly or to build custom collection programs. Data collected with these clients is not limited to Prometheus and can also feed other monitoring tools such as Graphite. Prometheus also integrates with other monitoring systems: Graphite, StatsD, Collectd, Scollector, Munin, Nagios, and so on. The community additionally provides a wide range of third-party exporters: JMX, CloudWatch, EC2, MySQL, PostgreSQL, Haskell, Bash, SNMP, Consul, HAProxy, Mesos, BIND, CouchDB, Django, Memcached, RabbitMQ, Redis, RethinkDB, Rsyslog, and so on. In short: SDKs in many languages for application instrumentation, a rich plugin ecosystem, support for both white-box and black-box monitoring, and a DevOps-friendly workflow.
  • Visualization: Prometheus Server ships with a built-in UI for querying data directly and displaying it graphically. Prometheus once provided Promdash, a standalone Ruby on Rails dashboard, and modern Grafana offers full Prometheus support for building more elegant dashboards. You can also implement your own visualization UI on top of the Prometheus HTTP API.
  • Openness: usually, to monitor an application, the application has to speak the protocol of a particular monitoring system, which binds it to that system. To reduce this coupling, decision makers must either integrate support for a monitoring system directly into the application or run separate adapter services for different monitoring systems.

For Prometheus, the output of its client libraries supports not only Prometheus's own format but also those of other monitoring systems, such as Graphite. So you can even use the Prometheus client libraries to instrument your application without using Prometheus at all: data collected with the SDK can be consumed by other monitoring systems, not just Prometheus.
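The format those client libraries emit (and that Prometheus scrapes) is a simple line-oriented text format. As an illustrative sketch, not the official client API, the function below renders samples in that format; the metric and label names are made up for the example:

```python
def render_metric(name, label_sets, metric_type="counter", help_text=""):
    """Render samples in the Prometheus text exposition format.

    label_sets is a list of (labels_dict, value) pairs. The names used
    here are illustrative, not taken from any real application.
    """
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    for labels, value in label_sets:
        # Labels are rendered as key="value", joined with commas
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

print(render_metric(
    "http_request_status",
    [({"code": "200", "content_path": "/api/path"}, 25),
     ({"code": "404", "content_path": "/api/path2"}, 2)],
    help_text="HTTP requests by status code and path",
))
```

Because the format is this simple, any system that can parse plain text can consume the same output, which is exactly the openness the paragraph above describes.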

4. Basic architecture of Prometheus

The role and workings of Prometheus as a monitoring system can be explained with the official architecture diagram below:

Prometheus collects metrics from its monitored targets in a pull fashion, either directly or indirectly through a push gateway (and, when deployed inside Kubernetes, it can discover targets dynamically). By default it stores all scraped data locally, cleans and organizes the data according to configured rules, and keeps the results in its time series database (TSDB) for later retrieval by time; remote storage back ends such as OpenTSDB or InfluxDB can also be used. With this core monitoring mechanism in place, the remaining Prometheus components exist to support it: Pushgateway allows monitored objects to push metrics to Prometheus instead of being pulled; Alertmanager flexibly raises alerts based on metric data; and the most popular feature of all is the flexible visualization of monitoring data through Grafana.
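The push path through the gateway can be sketched with curl, assuming a Pushgateway running locally on its default port 9091 (the metric and job names are illustrative):

```shell
# Push an ephemeral job's metric to the Pushgateway; Prometheus then
# scrapes the gateway in its usual pull mode.
echo "batch_job_duration_seconds 42.5" | \
  curl --data-binary @- http://localhost:9091/metrics/job/nightly_batch
```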

5. Components

  • Prometheus Server: the core component of Prometheus, responsible for acquiring, storing, and querying monitoring data. Prometheus Server manages monitoring targets through static configuration or dynamically through service discovery, and pulls data from them. It then stores the collected data: Prometheus Server is itself a time series database and writes the collected samples to local disk in time series form. Finally, it provides the PromQL language for data query and analysis.

    • Retrieval: the sampling module
    • TSDB: the storage module; the local TSDB is the default storage
    • HTTP Server: provides the HTTP query interface and panel; the default port is 9090
  • Exporters/Jobs: collect metrics from target objects (hosts, containers, ...) and expose them over an HTTP interface so that Prometheus Server can pull them from the endpoint each exporter provides. Databases, hardware, message middleware, storage systems, HTTP servers, JMX, and more are supported; anything that conforms to the interface format can be collected.

    • Direct collection: these exporters have Prometheus support built in, such as cAdvisor, Kubernetes, etcd, and Go kit, which directly expose endpoints serving monitoring data to Prometheus.
    • Indirect collection: the original monitoring target does not support Prometheus directly, so a collection program has to be written with the client libraries Prometheus provides, for example for MySQL, JMX, Consul, and so on.
  • Short-lived jobs: transient tasks that cannot be scraped in pull mode; they use push mode together with the PushGateway instead.

  • PushGateway: an optional component, mainly for short-lived jobs. Because Prometheus collects data with a pull model, the network must allow Prometheus Server to talk to the exporters directly; when that requirement cannot be met, the PushGateway can act as a relay. Jobs on an internal network push their metrics to the gateway, and Prometheus Server then pulls from the PushGateway in the same way it would from any other target. Because such jobs are short-lived, they might disappear before Prometheus gets a chance to pull; pushing lets them hand over their metrics first. This approach is mainly used for service-level metrics, while machine-level metrics still require node_exporter.

  • Client SDK: official client libraries include Go, Java/Scala, Python, and Ruby, and there are many third-party libraries for Node.js, PHP, Erlang, and others

  • PromDash: Deprecated dashboard developed with Rails for visualizing metrics data

  • Alertmanager: Prometheus Server supports alert rules written in PromQL; when a rule's condition is met, an alert fires, and the subsequent handling flow is managed by Alertmanager. Alertmanager has built-in integrations with email, Slack, and others, plus webhooks for custom alert handling; it is the alert-processing center of Prometheus.

  • Service Discovery: Prometheus supports multiple service discovery mechanisms: files, DNS, Consul, Kubernetes, EC2, and so on. The process is not complicated: through an interface a third party provides, Prometheus looks up the list of targets to monitor, then polls those targets for monitoring data.
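The "indirect collection" case above amounts to writing your own exporter. A minimal pure-Python sketch (no exporter library; the metric names are made up) is an HTTP server that answers /metrics in the text format Prometheus scrapes:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def collect_samples():
    # A real exporter would read these values from the system it
    # monitors (a database, message middleware, ...); these are fixed
    # illustrative samples.
    return "example_up 1\nexample_queue_depth 42\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect_samples().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Port 0 asks the OS for any free port; a real exporter would pick a
# fixed port and be listed under static_configs in prometheus.yml.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/metrics"
text = urllib.request.urlopen(url).read().decode()
print(text)
server.shutdown()
```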

The general working process is as follows:

  • Prometheus Server periodically pulls metrics from configured jobs or exporters, receives metrics pushed to Pushgateway, or pulls metrics from other Prometheus servers.
  • Prometheus Server stores the collected metrics locally and runs the defined alert.rules to record new time series or push alerts to Alertmanager.
  • Alertmanager processes received alerts according to its configuration file and sends out notifications.
  • A graphical interface visualizes the collected data.
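The alert.rules step above can be sketched as a rule file; the file name, metric, and thresholds are illustrative, and it would be listed under the rule_files section of prometheus.yml:

```yaml
# illustrative_rules.yml -- referenced from rule_files in prometheus.yml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0            # the target failed its last scrape
        for: 5m                  # condition must hold 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```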

6. Install Prometheus Server

Prometheus is distributed as a compiled Golang binary with no third-party dependencies. To start Prometheus Server, download the binary package, decompress it, and add a basic configuration.

1. Install from the binary package

For non-Docker users, the latest Prometheus Server package can be downloaded from prometheus.io/download/:

export VERSION=2.25.2
curl -LO https://github.com/prometheus/prometheus/releases/download/v$VERSION/prometheus-$VERSION.darwin-amd64.tar.gz

Unzip and add the commands associated with Prometheus to the system environment variable path:

tar -xzf prometheus-${VERSION}.darwin-amd64.tar.gz
cd prometheus-${VERSION}.darwin-amd64

The current directory contains the default Prometheus configuration file, prometheus.yml:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

Since Prometheus is a time series database, the data it collects is stored locally as files. The default storage path is data/, which has to be created manually:

mkdir -p data

You can also change the local data storage path with the flag --storage.tsdb.path="data/".

When the Prometheus service starts, it loads the prometheus.yml file in the current directory by default:

./prometheus  

Under normal circumstances, you should see the following output:

level=info ts=2021-03-21T04:40:31.632Z caller=main.go:722 msg="Starting TSDB ..."
level=info ts=2021-03-21T04:40:31.632Z caller=web.go:528 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-03-21T04:40:31.632Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1615788000000 maxt=1615852800000 ulid=01F0WFKTA8JPYE3GTA5RP9XMB5
level=info ts=2021-03-21T04:40:31.633Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1615852800000 maxt=1615917600000 ulid=01F0YDDARF4MHECC8PZ603CZR1
level=info ts=2021-03-21T04:40:31.633Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1615917600000 maxt=1615982400000 ulid=01F10B6WRZNYGJXZW90R2BQENK
level=info ts=2021-03-21T04:40:31.633Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1615982400000 maxt=1616047200000 ulid=01F1290D3KN9JNGVMJXHP57EPR
level=info ts=2021-03-21T04:40:31.633Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1616047200000 maxt=1616112000000 ulid=01F146SZ7Q4MB39SRVBRSDQKMM
level=info ts=2021-03-21T04:40:31.633Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1616112000000 maxt=1616176800000 ulid=01F164KFHDRKCNXR7Y9H6Y8GE2
level=info ts=2021-03-21T04:40:31.634Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1616176800000 maxt=1616241600000 ulid=01F182D1JXT487YV58WYFZMJZS
level=info ts=2021-03-21T04:40:31.634Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1616241600000 maxt=1616263200000 ulid=01F18Q04A506Q2942TQAPKNVYE
level=info ts=2021-03-21T04:40:31.634Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1616284800000 maxt=1616292000000 ulid=01F19BK905YGNE1WV8CVX9RZQA
level=info ts=2021-03-21T04:40:31.634Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1616263200000 maxt=1616284800000 ulid=01F19BKB09BVDWBE7QZPXNETX0
level=info ts=2021-03-21T04:40:31.741Z caller=head.go:645 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2021-03-21T04:40:31.799Z caller=head.go:659 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=57.298066ms
level=info ts=2021-03-21T04:40:31.799Z caller=head.go:665 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2021-03-21T04:40:32.030Z caller=head.go:691 component=tsdb msg="WAL checkpoint loaded"
level=info ts=2021-03-21T04:40:32.475Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=1134 maxSegment=1137

2. Install using a container

For Docker users, Prometheus Server can be started directly using a Prometheus image:

docker run -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

After starting, you can access the Prometheus UI at http://localhost:9090:
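Besides the UI, the same server exposes an HTTP API; assuming the instance started above is listening on localhost:9090, an instant query can be issued with curl:

```shell
# Query the current value of the built-in "up" metric via the HTTP API;
# the response is JSON describing each scraped target's state.
curl 'http://localhost:9090/api/v1/query?query=up'
```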

3. Install using Kubernetes Operator

Reference: github.com/coreos/kube…

7. Summary

This article gave an overview of Prometheus and its strengths relative to similar solutions, to help with selecting a monitoring stack, and introduced the Prometheus ecosystem and its core capabilities. It should leave you with an intuitive picture of what Prometheus is and does.
