Preface

This article describes a case in which we quickly built a simple monitoring system on top of a user's legacy system to improve our control over it and our speed of response. Three things characterize the case: the system is a legacy one, the only machines available run Windows, and the system sits on an internal network with no Internet access. Rome was not built in a day, so in the spirit of lean thinking we solved the biggest pain point first and left gradual optimization for later. The article only sketches how the monitoring system was put together; if it helps you solve a similar problem, so much the better.

Background

Recently, users recommended that one of our company's systems be rolled out more widely. That was good news, but the number of users then soared in a short time, the existing system began to fail frequently, and problems never seen at small scale multiplied. Worse, the system is deployed on the user's intranet, which is very hard for our development and operations teams to reach. Several times we learned of a problem only because a user reported it in the group chat. Access to the servers is also inconvenient: it requires special approval from the user's manager before the necessary permissions are granted. As a result, locating and fixing problems took a long time, and customer satisfaction fell. To get out of this predicament, we had the R&D team track down potential problems, and, given the user's special circumstances and the expansion of the system, we decided it was time to set up a monitoring system. Its purpose is to notify us as soon as a problem occurs or is about to occur, so that we can act immediately and limit the impact as far as possible. Building the monitoring system began with an analysis of the current situation:

  1. The system is deployed on the user's intranet and cannot connect to the Internet.
  2. All machines of the legacy system run Windows.
  3. The legacy applications should be modified as little as possible, because the R&D team is already busy fighting fires elsewhere.

Technical selection considerations

The R&D team had in fact considered monitoring before and had tried JavaMelody for interface and method monitoring, but it clearly does not fit our current technology stack: it is intrusive and can only monitor Java applications. We need to monitor not only several Java applications but also several data services: PostgreSQL, Redis and MySQL. Given this variety of targets, we needed an independent monitoring system that does not depend on a specific technology stack, and so Prometheus became a candidate. Prometheus is an open-source system monitoring and alerting toolkit originally built at SoundCloud. Notably, it was the second project hosted by the CNCF, after Kubernetes. We had not considered Prometheus at first because it is the preferred monitoring system for Kubernetes, and we assumed it needed Kubernetes, or at least Docker, to run. In fact, while Prometheus works well with Kubernetes, its support for older environments is also commendable, and we made the decision as soon as compatibility testing on Windows had passed. It is no exaggeration to say that if installing Prometheus on a Kubernetes cluster takes 15 minutes, it can be set up on Windows in about 30 minutes, provided you know the steps and have downloaded the files in advance. With Prometheus chosen, its usual companion for visualization, Grafana, was the natural choice, and Grafana also provides a Windows version. All that remained was to pick suitable Prometheus exporters to collect the monitoring data.

Hands-on process

Monitoring system composition

Prometheus is the core component and Grafana is the display component. jmx_exporter is the collector for JVM applications, mysqld_exporter and redis_exporter are the collectors for the storage services, and Pushgateway receives metrics that are pushed rather than scraped.

Build the core component Prometheus

  1. Download Prometheus from the download address
  2. Unpack the archive
  3. Edit prometheus.yml
  4. Run prometheus.exe --config.file=prometheus.yml as an administrator

prometheus.yml example

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

After a normal startup, the console log reports that the server is ready to receive web requests, and the web UI is reachable at localhost:9090.
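
One way to confirm that startup succeeded without opening a browser is to query Prometheus's health endpoint, /-/healthy. The following is a minimal sketch using only the Python standard library; it assumes Prometheus listens on localhost:9090 as in the configuration above, and the function names are made up for illustration.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Return the URL of the Prometheus health endpoint."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Assumption: the address matches the target in prometheus.yml.
    print("Prometheus healthy:", is_healthy("http://localhost:9090"))
```

A check like this can be scheduled on the Windows host so that a stopped Prometheus process is noticed early.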

Build the presentation component Grafana

  1. Download Grafana from the download address
  2. Simply run grafana-server.exe
  3. Open the web UI: the default port is 3000 and the default account is admin/admin

When Grafana is running normally, its login page is served at localhost:3000.

Prepare data collection components

Choose the data collection components according to what you need to monitor.

jmx_exporter

jmx_exporter (see its GitHub repository) attaches to the application at startup as a Java agent and exposes the monitoring data it reads through the JMX interface. Example startup command line:

java -javaagent:./jmx_prometheus_javaagent-0.12.0.jar=8080:config.yaml -jar yourJar.jar

jmx_prometheus_javaagent-0.12.0.jar is the jar file you need to download, 8080 is the port on which the metrics are exposed, and config.yaml is the exporter's configuration file, which controls what is exposed to Prometheus.
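
Before adding the exporter to Prometheus, it can help to check that the agent really serves metrics. The sketch below fetches an exporter's metrics page and parses the Prometheus text format into a dictionary. It is a simplified parser written for illustration, not part of jmx_exporter; the URL in the usage note assumes the port 8080 from the command above and the default /metrics path, and the series names in the sample are illustrative only.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition format into {series: value}.

    Simplified sketch: comment lines are skipped and optional
    timestamps are ignored.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The series name ends at the closing brace of the label set,
        # or at the first space when there are no labels.
        if "}" in line:
            end = line.rindex("}") + 1
        elif " " in line:
            end = line.index(" ")
        else:
            continue
        fields = line[end:].split()
        if fields:
            samples[line[:end]] = float(fields[0])
    return samples


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Download and parse the metrics page of an exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Sample text with illustrative series names, not real exporter output.
    sample = (
        "# TYPE jvm_threads_current gauge\n"
        "jvm_threads_current 42.0\n"
        'jvm_memory_bytes_used{area="heap"} 1.5e7\n'
    )
    print(parse_metrics(sample))
```

With the application running, fetch_metrics("http://localhost:8080/metrics") should return a non-empty dictionary if the agent was attached correctly.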

mysqld_exporter

mysqld_exporter is the collector for MySQL. It needs a MySQL account with the right permissions to work properly.

  1. Download the exporter from its download page
  2. Configure the database user and permissions:
     CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'XXXXXXXX' WITH MAX_USER_CONNECTIONS 3;
     GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
  3. Set the environment variable DATA_SOURCE_NAME='user:password@(hostname:3306)/'
  4. Start the exporter with ./mysqld_exporter <flags>. The default port is 9104.
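
Steps 3 and 4 can also be combined in a small launcher so that the password is not stored in a system-wide environment variable. This is only a sketch under assumptions: it applies to exporter versions that read DATA_SOURCE_NAME as described above, and the executable name mysqld_exporter.exe and the helper names are hypothetical.

```python
import os
import subprocess


def build_dsn(user: str, password: str, host: str = "localhost", port: int = 3306) -> str:
    """Build the DATA_SOURCE_NAME value in the form shown in step 3."""
    return f"{user}:{password}@({host}:{port})/"


def start_exporter(exe_path: str, dsn: str) -> subprocess.Popen:
    """Start the exporter with DATA_SOURCE_NAME set for that process only."""
    env = dict(os.environ, DATA_SOURCE_NAME=dsn)
    return subprocess.Popen([exe_path], env=env)


if __name__ == "__main__":
    # Placeholder credentials; use the account created in step 2.
    print(build_dsn("exporter", "XXXXXXXX"))
    # To launch (hypothetical path to the downloaded executable):
    # start_exporter("mysqld_exporter.exe", build_dsn("exporter", "XXXXXXXX"))
```

Passing the variable through the child process environment keeps it out of the machine-wide settings, which matters on a shared Windows server.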

pushgateway

EMQ X supports Prometheus monitoring, but the support is implemented through a plugin plus Pushgateway, and the official documentation is not clear about it. The following uses EMQ X as an example to show how Pushgateway is used.

  1. Download Prometheus Pushgateway from its download page
  2. Start it by double-clicking the executable or from a script. The default port is 9091
  3. Download EMQ X (the version must be later than 3.0)
  4. Go to the bin directory and run emqx start
  5. Open localhost:18083 in a browser. The default account is admin/public
  6. Open the plugin management page, find the statsd plugin, start it, and set the configuration item statsd.push.gateway.server to the address of your Pushgateway
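
To see what a push looks like independently of EMQ X, the sketch below builds the HTTP request that a client sends to Pushgateway: a PUT to /metrics/job/<job> whose body is in the Prometheus text format. It assumes the gateway runs on localhost:9091 as in step 2; the job and metric names are invented for the example, and job names containing a slash are not handled.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway: str, job: str, metrics: dict) -> urllib.request.Request:
    """Build a PUT request that replaces all metrics of `job` on the gateway."""
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="PUT",
    )


def push(gateway: str, job: str, metrics: dict, timeout: float = 5.0) -> bool:
    """Send the metrics; return True if the gateway accepted them."""
    try:
        req = build_push_request(gateway, job, metrics)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical job and metric names, used only for this demonstration.
    print("pushed:", push("http://localhost:9091", "demo_job", {"demo_metric": 3.14}))
```

After a successful push, the metric appears on the gateway's own /metrics page, which is what Prometheus scrapes.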

Modify prometheus.yml

Update the Prometheus configuration to match the items you monitor.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'mysql-metrics'
    # metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9104']
        labels:
          instance: db1
  - job_name: pushgateway
    static_configs:
      - targets: ['localhost:9091']
        labels:
          instance: pushgateway
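  - job_name: jmx
    # The port must match the one passed to the jmx_exporter Java agent
    # (the example command above uses 8080).
    static_configs:
      - targets: ['localhost:9092']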

Configuration reload

Send a reload request with curl, replacing ip with the address of the Prometheus host:

curl -XPOST http://ip:9090/-/reload

Note that this only works if the --web.enable-lifecycle flag is added to the Prometheus startup command.

Of course, you can also stop the EXE manually and start it again.
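
On a Windows machine where curl is not available, the same reload request can be sent from Python. This is a minimal sketch using only the standard library; the helper names are invented here, and it assumes Prometheus is reachable at the given base URL and was started with the lifecycle flag.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the POST request to /-/reload, equivalent to the curl command."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def reload_config(base_url: str, timeout: float = 5.0) -> bool:
    """Ask Prometheus to reload its configuration; True on HTTP 200."""
    try:
        with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections and HTTP error statuses, for example
        # when the lifecycle API is not enabled.
        return False


if __name__ == "__main__":
    print("reloaded:", reload_config("http://localhost:9090"))
```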

After Prometheus has reloaded or restarted, the Targets page should list every configured job.
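
The same information is available from the Prometheus HTTP API at /api/v1/targets, which is convenient when the check has to be scripted. The sketch below reduces that response to a list of (job, instance, health) tuples. The function names are illustrative, and the sample payload in the code is hand-written from the job names in the configuration above rather than captured from a real server.

```python
import json
import urllib.request


def summarize_targets(payload: dict) -> list:
    """Return sorted (job, instance, health) tuples from a targets response."""
    rows = []
    for target in payload.get("data", {}).get("activeTargets", []):
        labels = target.get("labels", {})
        rows.append(
            (
                labels.get("job", "?"),
                labels.get("instance", "?"),
                target.get("health", "unknown"),
            )
        )
    return sorted(rows)


def fetch_targets(base_url: str, timeout: float = 5.0) -> dict:
    """Download the targets list from a running Prometheus server."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hand-written sample payload, not real server output.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "mysql-metrics", "instance": "db1"}, "health": "up"},
                {"labels": {"job": "jmx", "instance": "localhost:9092"}, "health": "down"},
            ]
        },
    }
    for job, instance, health in summarize_targets(sample):
        print(f"{job:15} {instance:20} {health}")
```

Any target whose health is not "up" points to an exporter that is stopped or a port that does not match the configuration.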

Customize the Grafana dashboard

Then open localhost:3000, create a dashboard that reads from the Prometheus data source, and customize it to your own needs. A good way to begin is to look for an existing template, then delete and edit panels until it fits.
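
A dashboard needs Prometheus registered as a data source first. This can be done in the Grafana UI, or, as sketched below, through Grafana's HTTP API with a POST to /api/datasources. The sketch assumes the default admin/admin account and the local addresses used earlier in this article; the data source name "Prometheus" and the function name are arbitrary choices.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url: str,
    prometheus_url: str,
    user: str = "admin",
    password: str = "admin",
) -> urllib.request.Request:
    """Build the request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",  # display name, chosen freely
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_datasource_request("http://localhost:3000", "http://localhost:9090")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=5)
```

If the default password has been changed after the first login, pass the new credentials instead of the defaults.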

Conclusion

Some things look hard but are easy to do, such as building a monitoring system; others look easy but are hard to do, such as building a good monitoring system. Prometheus's excellent design and compatibility let us build a monitoring system in 30 minutes, but a good one is still a long way off.