1. Introduction

Message middleware is commonly used to decouple services and to smooth traffic spikes (peak shaving and valley filling). But does the decoupling actually deliver the expected effect, and how do we measure it? To answer that, this article uses Prometheus + Grafana to record metrics and view the results in real time.

First, a look at the end result:

What does this setup do?

1) Monitor MQ consumption and see the actual peak-shaving and valley-filling effect after business decoupling;

2) Provide data to support a sensible configuration of the number of consumer threads;

3) Provide the distribution of consumption durations as data support for optimizing business code.

Note: please credit the source when reposting!

2. Basics

Prometheus is a next-generation cloud-native monitoring system. Combined with Grafana dashboards, it works well for monitoring the running state of databases and servers.

Metric types: Counter, Gauge, Summary, Histogram

counter

Introduction:

A monotonically increasing counter. Common usage scenarios: machine uptime, total number of service requests.

Display format:

# HELP mq_message_total Current TPS.
# TYPE mq_message_total counter
mq_message_total{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 274.0
mq_message_total{application="yiyi-example-prometheus",method="save",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 531.0
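For orientation, here is a minimal sketch (not from the original) of declaring and incrementing such a counter with the Prometheus Java simpleclient; the full version appears in JobMetrics.java below:

import io.prometheus.client.Counter;

// Declare once; register() attaches it to the default CollectorRegistry.
static final Counter MESSAGE_TOTAL = Counter.build()
        .name("mq_message_total")
        .help("Current TPS.")
        .labelNames("application", "method", "topic", "group")
        .register();

// Increment on every consumed message:
// MESSAGE_TOTAL.labels("yiyi-example-prometheus", "save", "EAT_FINISH", "EAT_FINISH_GROUP").inc();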

gauge

Introduction:

A counter that can go up or down. Common usage scenarios: number of threads currently running, memory used by a running service.

Display format:

# HELP mq_message_working_threads Number of worker threads executing
# TYPE mq_message_working_threads gauge
mq_message_working_threads{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 0.0
mq_message_working_threads{application="yiyi-example-prometheus",method="save",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 1.0
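A minimal sketch (names and labels are illustrative) of the inc()/dec() pairing that keeps a gauge tracking in-flight work:

import io.prometheus.client.Gauge;

static final Gauge WORKING_THREADS = Gauge.build()
        .name("mq_message_working_threads")
        .help("Number of worker threads executing")
        .labelNames("application", "method", "topic", "group")
        .register();

void consume(Runnable handler) {
    WORKING_THREADS.labels("app", "save", "EAT_FINISH", "EAT_FINISH_GROUP").inc(); // thread starts working
    try {
        handler.run();
    } finally {
        WORKING_THREADS.labels("app", "save", "EAT_FINISH", "EAT_FINISH_GROUP").dec(); // decremented even on failure
    }
}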

summary

Introduction:

Records a running total (sum) and a count of observations (count). Common usage scenarios: task-duration distributions, average CPU usage.

Display format:

# HELP mq_message_deal_time Time spent to process each message
# TYPE mq_message_deal_time summary
mq_message_deal_time_count{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 274.0
mq_message_deal_time_sum{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 7093.0
mq_message_deal_time_count{application="yiyi-example-prometheus",method="save",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 531.0
mq_message_deal_time_sum{application="yiyi-example-prometheus",method="save",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 13572.0
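A minimal sketch (illustrative names): every observe() adds the observed value to _sum and 1 to _count, which is exactly what the average-duration query in section 4.2 divides:

import io.prometheus.client.Summary;

static final Summary DEAL_TIME = Summary.build()
        .name("mq_message_deal_time")
        .help("Time spent to process each message")
        .labelNames("application", "method", "topic", "group")
        .register();

// After handling one message that took elapsedMillis:
// DEAL_TIME.labels("app", "update", "EAT_FINISH", "EAT_FINISH_GROUP").observe(elapsedMillis);
// Average over 1m = rate(mq_message_deal_time_sum[1m]) / rate(mq_message_deal_time_count[1m])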

histogram

Introduction:

Generally used to analyze how data is distributed; records a total (sum), a count (count), and the distribution across buckets.

Display format:

# HELP mq_message_deal_time_histogram Processing duration grouping
# TYPE mq_message_deal_time_histogram histogram
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.005",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.01",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.025",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.05",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.075",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.1",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.25",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.5",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="0.75",} 0.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="1.0",} 4.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="2.5",} 11.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="5.0",} 24.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="7.5",} 37.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="10.0",} 55.0
mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="+Inf",} 274.0
mq_message_deal_time_histogram_count{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 274.0
mq_message_deal_time_histogram_sum{application="yiyi-example-prometheus",method="update",topic="EAT_FINISH",group="EAT_FINISH_GROUP",} 7093.0
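A minimal sketch (bucket values are assumptions): unlike a summary, a histogram's bucket boundaries (the le labels above) must be fixed at declaration time, so choose them to match your expected latency range:

import io.prometheus.client.Histogram;

static final Histogram DEAL_TIME_HIST = Histogram.build()
        .name("mq_message_deal_time_histogram")
        .help("Processing duration grouping")
        .labelNames("application", "method", "topic", "group")
        // Omitting buckets() gives the defaults (.005 .. 10) seen in the output above.
        .buckets(1.0, 2.5, 5.0, 7.5, 10.0) // le="1.0", "2.5", "5.0", "7.5", "10.0", plus "+Inf"
        .register();

// DEAL_TIME_HIST.labels("app", "update", "EAT_FINISH", "EAT_FINISH_GROUP").observe(elapsedMillis);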

3. Setting up Prometheus + Grafana

For starting up Prometheus + Grafana, see the setup guide linked in the original post.
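As a minimal sketch, assuming the Spring Boot app from section 4 runs on localhost:8080 and exposes metrics at /actuator/prometheus, the scrape job in prometheus.yml could look like:

scrape_configs:
  - job_name: 'yiyi-example-prometheus'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080']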

4. Writing the MQ consumption monitoring plug-in

Here we record MQ consumption performance metrics against a small set of requirements.

Requirements (all at second-level granularity):

  • MQ consumer: TPS of consumed messages;
  • MQ consumer: number of active worker threads;
  • MQ consumer: average message processing time;
  • MQ consumer: distribution of message processing times.

4.1 Writing the Spring Boot plug-in

pom.xml

<!-- =========================== prometheus =========================== -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.1.4</version>
</dependency>
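Note that Spring Boot does not expose the actuator's Prometheus endpoint over HTTP by default; a minimal application.yml sketch using the standard actuator properties (values assumed):

spring:
  application:
    name: yiyi-example-prometheus

management:
  endpoints:
    web:
      exposure:
        include: prometheus   # serves /actuator/prometheus for Prometheus to scrape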

Core processing class

JobMetrics.java

import java.util.Random;
import java.util.concurrent.TimeUnit;

import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;
import io.prometheus.client.Summary;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class JobMetrics {

    @Value("${spring.application.name}")
    private String application;

    private final String[] labelNameArray = {"application", "method", "topic", "group"};

    /**
     * Counts MQ message TPS.
     * Labels: (application name, method, topic, group).
     * TPS query: increase(mq_message_total{topic="EAT_FINISH"}[1m]) / 60
     */
    private final Counter tps = Counter.build()
            .name("mq_message_total")
            .labelNames(labelNameArray)
            .help("The current TPS.")
            .create();

    /**
     * Average processing time of a single message:
     * sum(rate(mq_message_deal_time_sum{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))
     * /
     * sum(rate(mq_message_deal_time_count{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))
     * Per-message delay (mq_message_delay_time): same idea.
     */
    private final Summary dealTimeSummary = Summary.build()
            .name("mq_message_deal_time")
            .labelNames(labelNameArray)
            .help("Processing time per message")
            .create();

    private final Summary delayTimeSummary = Summary.build()
            .name("mq_message_delay_time")
            .labelNames(labelNameArray)
            .help("Per-message processing delay")
            .create();

    /** Number of worker threads currently executing. */
    private final Gauge workingThreads = Gauge.build()
            .name("mq_message_working_threads")
            .labelNames(labelNameArray)
            .help("Number of working threads executing")
            .create();

    /**
     * Processing-duration grouping.
     * To use this type, the le bucket labels must be known in advance.
     * The share of each duration range is computed with the PromQL in section 4.2:
     * left   (duration <= 1.0):        rate of bucket{le="1.0"} / rate of _count
     * middle (duration in (1.0, 2.5]): (rate of bucket{le="2.5"} - rate of bucket{le="1.0"}) / rate of _count
     * right  (duration > 10.0):        (rate of _count - rate of bucket{le="10.0"}) / rate of _count
     */
    private final Histogram dealTimeHistogram = Histogram.build()
            .name("mq_message_deal_time_histogram")
            .labelNames(labelNameArray)
            .help("Processing duration grouping")
            .create();

    /**
     * Simulates message handling. In a real MQ consumer this would wrap the
     * actual handler, e.g. via AOP, adapted to your own framework.
     */
    void handleRequest(String method) {

        String[] labelArray = {application, method, "EAT_FINISH", "EAT_FINISH_GROUP"};

        // One more thread is working on a message
        workingThreads.labels(labelArray).inc();

        // Simulated "born" timestamp: the message was produced 1-5 ms ago
        long bornTime = System.currentTimeMillis() - (new Random().nextInt(5) + 1);

        // Simulated processing time of 1-50 ms
        int dealTime = new Random().nextInt(50) + 1;
        try {
            TimeUnit.MILLISECONDS.sleep(dealTime);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long delayTime = System.currentTimeMillis() - bornTime;

        tps.labels(labelArray).inc();
        dealTimeSummary.labels(labelArray).observe(dealTime);
        delayTimeSummary.labels(labelArray).observe(delayTime);
        workingThreads.labels(labelArray).dec();
        dealTimeHistogram.labels(labelArray).observe(dealTime);
    }

    @Autowired
    public JobMetrics(PrometheusMeterRegistry meterRegistry) {
        CollectorRegistry prometheusRegistry = meterRegistry.getPrometheusRegistry();
        prometheusRegistry.register(tps);               // TPS
        prometheusRegistry.register(dealTimeSummary);   // task execution time
        prometheusRegistry.register(delayTimeSummary);  // task lifecycle duration
        prometheusRegistry.register(workingThreads);    // number of worker threads
        prometheusRegistry.register(dealTimeHistogram); // processing duration grouping
    }
}

MyJob.java

Mock user request

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

/**
 * Simulates two machines processing messages.
 */
@Component
@EnableScheduling
public class MyJob {

    @Autowired
    private JobMetrics jobMetrics;

    @Async("main")
    @Scheduled(fixedDelay = 500)
    public void tpsRequestHandle1() {
        jobMetrics.handleRequest("save");
    }

    @Async("main")
    @Scheduled(fixedDelay = 1000)
    public void tpsRequestHandle2() {
        jobMetrics.handleRequest("update");
    }
}

4.2 Grafana display configuration

Number of working threads

promQL

mq_message_working_threads{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}

[Screenshot: resulting Grafana panel]

TPS statistics for consumed messages

promQL

rate(mq_message_total{topic="EAT_FINISH"}[1m])

[Screenshot: resulting Grafana panel]

Average message processing time

promQL

sum(rate(mq_message_deal_time_sum{application="yiyi-example-prometheus", topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))
/
sum(rate(mq_message_deal_time_count{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))

[Screenshot: resulting Grafana panel]

Distribution of message processing time

promQL

============ [left: share with duration <= 1.0] ============
sum(rate(mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="1.0"}[1m]))
/
sum(rate(mq_message_deal_time_histogram_count{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))

============ [middle: share with duration in (1.0, 2.5]; configure one such query per bucket range] ============
(
  sum(rate(mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="2.5"}[1m]))
  -
  sum(rate(mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="1.0"}[1m]))
)
/
sum(rate(mq_message_deal_time_histogram_count{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))

============ [right: share with duration > 10.0] ============
(
  sum(rate(mq_message_deal_time_histogram_count{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))
  -
  sum(rate(mq_message_deal_time_histogram_bucket{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP",le="10.0"}[1m]))
)
/
sum(rate(mq_message_deal_time_histogram_count{application="yiyi-example-prometheus",topic="EAT_FINISH",group="EAT_FINISH_GROUP"}[1m]))

[Screenshot: resulting Grafana panel]

References

  • prometheus-book
  • Source address – Github