Authors | Chen Houdao, Feng Qing  Source | Alibaba Cloud Native official account

Introduction: This article briefly introduces the design and implementation of RocketMQ-Exporter. Readers can learn how RocketMQ-Exporter is implemented and how to use it to set up their own RocketMQ monitoring system. RocketMQ interactive online tutorials are now available at the Zhixing Hands-on Lab; open start.aliyun.com on a PC to try them.

RocketMQ Cloud Native series:

  • How Alibaba's RocketMQ achieved zero failures at the Double 11 peak
  • What happens when RocketMQ meets Serverless?
  • Cloud native RocketMQ O&M management and control tool – RocketMQ Operator
  • The evolution path of message-oriented middleware in cloud native era
  • Building a customized DevOps Platform based on RocketMQ Prometheus Exporter

RocketMQ-Exporter project GitHub address: https://github.com/apache/rocketmq-exporter

The main content of this article includes the following aspects:

  1. RocketMQ introduction
  2. Prometheus introduction
  3. The concrete implementation of RocketMQ-Exporter
  4. RocketMQ-Exporter monitoring metrics and alert metrics
  5. RocketMQ-Exporter usage example

RocketMQ introduction

RocketMQ is a distributed messaging and streaming data platform with low latency, high performance, high reliability, trillion-scale capacity, and flexible scalability. In simple terms, it consists of Broker servers and clients. There are two kinds of clients. One is the message Producer, which sends messages to the Broker server. The other is the message Consumer; multiple consumers can form a consumer group to subscribe to and pull the messages stored on the Broker server.

Because of its high performance, high reliability, and low latency, RocketMQ is increasingly combined with other protocol components in MQTT and other messaging scenarios. However, for such a powerful message middleware platform, a monitoring and management platform is still lacking in practice.

Prometheus is currently the most widely used monitoring solution in the open source community. Compared with traditional monitoring systems, Prometheus offers easy management, visibility into the internal running state of services, a powerful data model, the powerful query language PromQL, efficient data processing, scalability, easy integration, visualization, and openness. With Prometheus, a monitoring platform for RocketMQ can be built quickly.

Prometheus introduction

The following diagram shows the basic architecture of Prometheus:

1. Prometheus Server

Prometheus Server is the core component of Prometheus and is responsible for collecting, storing, and querying monitoring data. First, it manages monitoring targets either through static configuration or dynamically through service discovery, and scrapes data from them. Second, it stores the collected data: Prometheus Server is itself a time series database that stores the collected monitoring data on local disk as time series. Finally, it provides the custom PromQL language for querying and analyzing the data.

2. Exporters

An Exporter exposes an endpoint for collecting monitoring data to Prometheus Server over HTTP, and Prometheus Server accesses that endpoint to obtain the monitoring data to be collected. RocketMQ-Exporter is one such Exporter. It first collects data from the RocketMQ cluster and then, with the help of the third-party client library provided by Prometheus, normalizes the collected data into the format the Prometheus system requires. Prometheus then periodically pulls the data from the Exporter.

RocketMQ-Exporter has been officially listed by Prometheus. Its address is: https://github.com/apache/rocketmq-exporter.

The concrete implementation of RocketMQ-Exporter

The current implementation of the Exporter works as follows:

The whole system is based on the Spring Boot framework. Because MQ itself already provides comprehensive statistics, the Exporter only needs to pull those statistics from the MQ cluster and process them. The basic logic of RocketMQ-Exporter is therefore to start several scheduled tasks internally that periodically pull data from the MQ cluster, normalize the data, and expose it to Prometheus through an endpoint. It contains the following three main functional parts:

  • The MQAdminExt module obtains the statistics inside the MQ cluster by wrapping the interfaces provided by the MQ system client.
  • The MetricService module processes the results returned by the MQ cluster and converts them into the formatted data that Prometheus requires.
  • The Collector module stores the normalized data. When Prometheus periodically pulls data from the Exporter, the Exporter exposes the data held by the Collector at the /metrics endpoint over HTTP.
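The three parts above can be sketched in a few lines of Python. This is only an illustration of the pull, normalize, and expose pattern, not the Exporter's actual Java code: `fetch_broker_stats` is a hypothetical stand-in for the admin client call and returns canned data, the `cluster` and `broker` labels are assumed for the example, and the HTTP server and scheduler are left out.

```python
import threading


def escape_label_value(value):
    """Escape backslash, double quote, and newline for the text exposition format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render (name, labels, value) tuples as Prometheus text exposition lines."""
    lines = []
    for name, labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


class Collector:
    """Holds the most recent normalized samples between scrapes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._samples = []

    def replace(self, samples):
        # Called by the scheduled task after each pull.
        with self._lock:
            self._samples = list(samples)

    def scrape(self):
        # Called when Prometheus requests the metrics endpoint.
        with self._lock:
            return render_metrics(self._samples)


def fetch_broker_stats():
    """Hypothetical stand-in for the admin client call; returns canned data."""
    return [{"cluster": "DefaultCluster", "broker": "broker-a", "put_tps": 12.5}]


def normalize(raw_stats):
    """Turn raw statistics into (metric name, labels, value) samples."""
    return [
        ("rocketmq_broker_tps", {"cluster": s["cluster"], "broker": s["broker"]}, s["put_tps"])
        for s in raw_stats
    ]


def run_once(collector):
    """One iteration of the scheduled task: pull, normalize, store."""
    collector.replace(normalize(fetch_broker_stats()))


collector = Collector()
run_once(collector)
print(collector.scrape(), end="")
# rocketmq_broker_tps{broker="broker-a",cluster="DefaultCluster"} 12.5
```

Keeping the pull on a timer and serving scrapes from the stored samples means a slow MQ cluster does not delay the response to Prometheus; the trade-off is that a scrape may return data up to one pull interval old.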

RocketMQ-Exporter monitoring metrics and alert metrics

RocketMQ-Exporter works together with Prometheus to provide monitoring. Let's look at the monitoring metrics and alert metrics that the Exporter currently defines.

  • Monitoring metrics

rocketmq_message_accumulation is an aggregated metric: it is not reported directly but is computed from other reported metrics.

  • Alert metrics

The consumer backlog threshold is not a fixed value for each consumer. By default it is derived from the number of messages the producers sent in the past five minutes, and users can also set it based on their actual situation. The alert thresholds listed here are only illustrative values; set them according to how RocketMQ is actually used. Earlier monitoring systems had no query language as powerful as Prometheus's PromQL, so to handle consumer alerts, the RocketMQ maintenance staff had to set up an alert for each consumer by hand, or the backend had to add one automatically whenever it detected that a new consumer had been created. In Prometheus, this is done with a single statement like this:

(sum(rocketmq_producer_offset) by (topic) - on(topic)  group_right  sum(rocketmq_consumer_offset) by (group,topic)) 
- ignoring(group) group_left sum (avg_over_time(rocketmq_producer_tps[5m])) by (topic)*5*60 > 0

With this PromQL statement, you can not only create a consumer backlog alert for every consumer at once, but also tie the backlog threshold to the producers' sending rate. This greatly improves the accuracy of the backlog alert.
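To make the arithmetic of that expression concrete, here is a small Python sketch of the same calculation on plain dictionaries. The function name, the input shapes, and the sample numbers are assumptions made for illustration; Prometheus evaluates the real expression over label-matched time series.

```python
def backlog_alerts(producer_offset, consumer_offset, avg_producer_tps, window_seconds=5 * 60):
    """Return {(group, topic): excess} for consumers whose lag exceeds the dynamic threshold.

    producer_offset:  {topic: summed producer offset}
    consumer_offset:  {(group, topic): summed consumer offset}
    avg_producer_tps: {topic: average producer TPS over the window}
    """
    firing = {}
    for (group, topic), consumed in consumer_offset.items():
        if topic not in producer_offset or topic not in avg_producer_tps:
            continue  # like vector matching, series without a partner are dropped
        lag = producer_offset[topic] - consumed
        # Threshold: roughly how many messages were produced in the last window.
        threshold = avg_producer_tps[topic] * window_seconds
        excess = lag - threshold
        if excess > 0:
            firing[(group, topic)] = excess
    return firing


result = backlog_alerts(
    producer_offset={"orders": 10000},
    consumer_offset={("billing", "orders"): 9500, ("audit", "orders"): 4000},
    avg_producer_tps={"orders": 10.0},
)
print(result)
# {('audit', 'orders'): 3000.0}
```

With a producer rate of 10 messages per second, the threshold is 3000 messages. The billing group is 500 messages behind and stays quiet, while the audit group is 6000 behind and is reported, with no per-consumer configuration.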

RocketMQ-Exporter usage example

1. Start NameServer and Broker

To try out RocketMQ-Exporter, first make sure the RocketMQ service has been downloaded and started correctly. See the Quick Start on the RocketMQ website for the steps. Make sure that both NameServer and Broker have started correctly.

2. Compile RocketMQ-Exporter

Clone the source code and build it:

git clone https://github.com/apache/rocketmq-exporter
cd rocketmq-exporter
mvn clean install

3. Configure and run

Rocketmq-exporter has the following running options:

The above run options can be changed in the configuration file after downloading the code or set from the command line.

The compiled JAR package is named rocketmq-exporter-0.0.1-SNAPSHOT.jar and can be run as follows.

java -jar rocketmq-exporter-0.0.1-SNAPSHOT.jar [--rocketmq.config.namesrvAddr="127.0.0.1:9876" ...]

4. Install Prometheus

Download the Prometheus installation package from the official Prometheus download page. This example uses the Linux package prometheus-2.7.0-rc.1.linux-amd64.tar.gz. To start the Prometheus process, run the following commands.

tar -xzf prometheus-2.7.0-rc.1.linux-amd64.tar.gz
cd prometheus-2.7.0-rc.1.linux-amd64/
./prometheus --config.file=prometheus.yml --web.listen-address=:5555

The default listening port of Prometheus is 9090. To avoid conflicts with ports used by other processes on the system, we set the listening port to 5555 in the startup parameters. To verify that Prometheus is installed, visit http://<server IP address>:5555 in a browser. The following page is displayed:

Since the RocketMQ-Exporter process is already running, Prometheus can scrape its data once the Prometheus configuration file has been changed to include it.

The overall configuration file is as follows:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:5555']

  - job_name: 'exporter'
    static_configs:
      - targets: ['localhost:5557']

After the configuration file is changed, restart the service. After the restart, the metrics reported by RocketMQ-Exporter can be queried in the Prometheus UI. For example, querying the rocketmq_broker_tps metric produces the following result:
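The same query can also be issued from a script through the Prometheus HTTP API's instant-query endpoint, /api/v1/query. The sketch below uses only the Python standard library; the helper names are made up for this example, and the base URL assumes Prometheus listens on port 5555 as configured above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from an instant-query response body."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in body["data"]["result"]]


def query_prometheus(base_url, promql, timeout=5):
    """Run an instant query against a live Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return parse_instant_vector(response.read().decode("utf-8"))
```

With Prometheus and the Exporter running, `query_prometheus("http://localhost:5555", "rocketmq_broker_tps")` should return one entry per reported series. An empty list usually means the exporter target has not been scraped yet.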

5. Add alert rules

Once Prometheus can display the RocketMQ-Exporter metrics, RocketMQ alert rules can be configured in Prometheus. Add the following alert configuration to the Prometheus configuration file; *.rules means that multiple files with the .rules suffix can be matched.

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - /home/prometheus/prometheus-2.7.0-rc.1.linux-amd64/rules/*.rules

The alert configuration file is warn.rules, and its content is shown below. The thresholds are only examples; set them based on actual usage.

###
# Sample prometheus rules/alerts for rocketmq.
#
###
# Galera Alerts

groups:
- name: GaleraAlerts
  rules:
  - alert: RocketMQClusterProduceHigh
    expr: sum(rocketmq_producer_tps) by (cluster) >= 10
    for: 3m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.cluster}} Sending tps too high.'
      summary: cluster send tps too high
  - alert: RocketMQClusterProduceLow
    expr: sum(rocketmq_producer_tps) by (cluster) < 1
    for: 3m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.cluster}} Sending tps too low.'
      summary: cluster send tps too low
  - alert: RocketMQClusterConsumeHigh
    expr: sum(rocketmq_consumer_tps) by (cluster) >= 10
    for: 3m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.cluster}} consuming tps too high.'
      summary: cluster consume tps too high
  - alert: RocketMQClusterConsumeLow
    expr: sum(rocketmq_consumer_tps) by (cluster) < 1
    for: 3m
    labels:
      severity: warning
    annotations:
      description: '{{$labels.cluster}} consuming tps too low.'
      summary: cluster consume tps too low
  - alert: ConsumerFallingBehind
    expr: (sum(rocketmq_producer_offset) by (topic) - on(topic)  group_right  sum(rocketmq_consumer_offset) by (group,topic)) - ignoring(group) group_left sum (avg_over_time(rocketmq_producer_tps[5m])) by (topic)*5*60 > 0
    for: 3m
    labels:
      severity: warning
    annotations:
      description: 'consumer {{$labels.group}} on {{$labels.topic}} lag behind
        and is falling behind (behind value {{$value}}).'
      summary: consumer lag behind
  - alert: GroupGetLatencyByStoretime
    expr: rocketmq_group_get_latency_by_storetime > 1000
    for: 3m
    labels:
      severity: warning
    annotations:
      description: 'consumer {{$labels.group}} on {{$labels.broker}}, {{$labels.topic}} consume time lag behind message store time
        and (behind value is {{$value}}).'
      summary: message consumes time lag behind message store time too much 

Finally, view the alerts in the Prometheus UI, where rules in the alerting state are shown in red and those in the normal state are shown in green.
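Each rule in the file carries `for: 3m`, which means the expression has to stay true for three minutes before the alert fires. The following Python sketch is a simplified model of that behaviour, assuming one evaluation per minute; the function and the state names are illustrative and it is not Prometheus's own implementation.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, condition_is_true) in time order.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for timestamp, is_true in samples:
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


evaluations = [(0, True), (60, True), (120, True), (180, True), (240, False), (300, True)]
print(alert_states(evaluations, for_seconds=180))
# ['pending', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

The waiting period filters out short spikes: in the example, the single false evaluation at 240 seconds resets the timer, so the alert has to wait another three minutes before firing again.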

6. Grafana dashboard for RocketMQ

Prometheus's built-in metrics display is not as good as the popular Grafana display platform. To display RocketMQ metrics better, Grafana can be used to display the metrics that Prometheus has collected.

First, download Grafana from the official website: https://grafana.com/grafana/download. Here we again use the binary package installation as an example.

wget https://dl.grafana.com/oss/release/grafana-6.2.5.linux-amd64.tar.gz
tar -zxvf grafana-6.2.5.linux-amd64.tar.gz
cd grafana-6.2.5/

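Again, to avoid conflicts with ports used by other processes, change Grafana's listening port to 55555 in its configuration file, then start Grafana with the following command: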

./bin/grafana-server web

You can then verify that Grafana has been installed successfully by visiting http://<server IP address>:55555 in your browser. The default user name and password are both admin. The first time you log in, you are required to change the password. After you change it, the following page is displayed:

Click the Add Data Source button, and you are asked to select a data source.

Select Prometheus as the data source and set the data source address to the address of the Prometheus instance started in the previous step.

Back on the main screen, you are prompted to create a new dashboard.

Click Create Dashboard. A dashboard can be created manually or by importing a configuration file. The RocketMQ dashboard configuration file has already been uploaded to Grafana's official website, so here we create the dashboard by importing that configuration file.

Click the New Dashboard dropdown button.

Select Import Dashboard.

At this point, go to the Grafana website to download the dashboard configuration file that has already been created for RocketMQ. The address is: https://grafana.com/dashboards/10477/revisions, as shown in the figure below:

Click Download to download the configuration file, then copy its content and paste it into the paste box shown in the image above.

Finally, import the configuration file into Grafana as described above.

The final result looks like this:

About the authors

Chen Houdao previously worked for Tencent, Shanda, Douyu, and other Internet companies. He currently works at Suntech, where he is responsible for the design and development of infrastructure. He has done in-depth research on distributed message queues, microservice architecture and its adoption, DevOps, and monitoring platforms.

Feng Qing previously worked for Huawei. He currently works in the Suntech infrastructure team, where he is responsible for the development of basic components.
