Service monitoring has become a must-have, and the most popular monitoring system today is Prometheus, which is also the default one go-zero integrates with. Today we look at how go-zero plugs into Prometheus and how developers can define their own monitoring metrics.

Monitoring integration

Service metrics monitoring based on Prometheus is built into the go-zero framework. It is not enabled by default, however; the developer has to turn it on in config.yaml:

Prometheus:
  Host: 127.0.0.1
  Port: 9091
  Path: /metrics

If the developer runs Prometheus locally, add a scrape job for the service to Prometheus's configuration file prometheus.yml:

- job_name: 'file_ds'
  static_configs:
    - targets: ['your-local-ip:9091']
      labels:
        job: activeuser
        app: activeuser-api
        env: dev
        instance: your-local-ip:service-port

Since Prometheus is run locally with Docker, with the configuration kept in the docker-prometheus directory, start it like this:

docker run \
    -p 9090:9090 \
    -v dockeryml/docker-prometheus:/etc/prometheus \
    prom/prometheus

Open localhost:9090 to see the Prometheus UI.

To see the monitoring data of the service itself, open http://service-ip:9091/metrics:

In the output above, we can see two kinds of series: the histogram buckets, and the count/sum pairs.

So how does go-zero integrate these monitoring metrics? What exactly do they monitor? And how do we define our own metrics? Read on for the answers:

zeromicro.github.io/go-zero/ser…

How to integrate

In the example above, the requests are HTTP, which means the monitoring data is collected continuously as the server handles requests. This naturally suggests middleware; the full code is at github.com/tal-tech/go…

var (
	metricServerReqDur = metric.NewHistogramVec(&metric.HistogramVecOpts{
		...
		// Monitored label
		Labels: []string{"path"},
		// Histogram distribution: the bucket upper bounds
		Buckets: []float64{5, 10, 25, 50, 100, 250, 500, 1000},
	})

	metricServerReqCodeTotal = metric.NewCounterVec(&metric.CounterVecOpts{
		...
		// Monitored labels: the counter is bumped with Inc()
		Labels: []string{"path", "code"},
	})
)

func PromethousHandler(path string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Time at which the request entered
			startTime := timex.Now()
			cw := &security.WithCodeResponseWriter{Writer: w}
			defer func() {
				// When the request returns, report duration and status code
				metricServerReqDur.Observe(int64(timex.Since(startTime)/time.Millisecond), path)
				metricServerReqCodeTotal.Inc(path, strconv.Itoa(cw.Code))
			}()
			// Hand off to the subsequent middleware and the business logic;
			// when control returns here, the deferred report covers the full request
			// [🧅: onion model]
			next.ServeHTTP(cw, r)
		})
	}
}

It’s actually quite simple:

  1. HistogramVec collects request durations:
    • The buckets store the upper bounds specified in the options. Each request's duration falls into the matching bucket and is counted there.
    • The end result is a latency distribution per route, giving developers an intuitive target for optimization.
  2. CounterVec counts by the specified labels:
    • Labels: []string{"path", "code"}
    • The labels act like a tuple: go-zero treats (path, code) as a whole and records how many times each route returned each status code. If 4xx or 5xx shows up too often, shouldn't you look at the health of your service?
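To make the bucketing concrete, here is a small self-contained sketch (not go-zero source; `buckets` and `bucketFor` are illustrative names) that maps an observed duration to a bucket index using the same upper bounds as metricServerReqDur above:

```go
package main

import (
	"fmt"
	"sort"
)

// Same upper bounds (in milliseconds) as metricServerReqDur above.
var buckets = []float64{5, 10, 25, 50, 100, 250, 500, 1000}

// bucketFor returns the index of the first bucket whose upper bound
// covers the observed value; len(buckets) means the implicit +Inf bucket.
func bucketFor(ms float64) int {
	return sort.SearchFloat64s(buckets, ms)
}

func main() {
	fmt.Println(bucketFor(3))    // 0 -> le=5
	fmt.Println(bucketFor(80))   // 4 -> le=100
	fmt.Println(bucketFor(2000)) // 8 -> +Inf bucket
}
```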

How to Customize

The basic Prometheus metric package is also provided in go-zero, so developers can build their own Prometheus middleware.

Code: github.com/tal-tech/go…

| Name | Use | Collect function |
| --- | --- | --- |
| CounterVec | Monotonic counting. Use: QPS statistics | CounterVec.Inc(): metric +1 |
| GaugeVec | Plain value recording. Use: disk capacity, CPU/Mem usage (can go up or down) | GaugeVec.Inc() / GaugeVec.Add(): metric +1 / +N (N may be negative) |
| HistogramVec | Distribution of observed values. Use: request latency, response size | HistogramVec.Observe(val, labels): record the value and locate its bucket |
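The three collection semantics above can be sketched with a minimal, dependency-free implementation (illustrative only; go-zero's metric package actually wraps the official Prometheus client, and these simplified types are not its API):

```go
package main

import "fmt"

// counter only goes up (CounterVec semantics).
type counter struct{ n float64 }

func (c *counter) Inc() { c.n++ }

// gauge can move in both directions (GaugeVec semantics).
type gauge struct{ v float64 }

func (g *gauge) Inc()              { g.v++ }
func (g *gauge) Add(delta float64) { g.v += delta }

// histogram tracks count, sum and a bucket distribution
// (HistogramVec semantics).
type histogram struct {
	upperBounds []float64
	buckets     []uint64
	count       uint64
	sum         float64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{upperBounds: bounds, buckets: make([]uint64, len(bounds)+1)}
}

func (h *histogram) Observe(v float64) {
	i := len(h.upperBounds) // falls into +Inf bucket by default
	for j, ub := range h.upperBounds {
		if v <= ub {
			i = j
			break
		}
	}
	h.buckets[i]++
	h.count++
	h.sum += v
}

func main() {
	h := newHistogram([]float64{5, 10, 25})
	for _, v := range []float64{3, 8, 100} {
		h.Observe(v)
	}
	fmt.Println(h.count, h.sum, h.buckets) // 3 111 [1 1 0 1]
}
```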

HistogramVec.Observe()

From the output earlier we can see that each HistogramVec statistic produces three series:

  • _count: the number of observations
  • _sum: the sum of all observed values
  • _bucket{le=a1}: the number of observations falling in (-inf, a1]
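These three series are enough to derive useful statistics: _sum/_count gives the average, and the cumulative _bucket counts locate quantiles. A small sketch with made-up numbers (the function name is illustrative, not a Prometheus API):

```go
package main

import "fmt"

// Given cumulative _bucket counts and the total _count, find the
// smallest upper bound covering quantile q. This is the idea behind
// PromQL's histogram_quantile(), which additionally interpolates
// within the bucket.
func quantileBound(bounds []float64, cumulative []uint64, count uint64, q float64) float64 {
	target := q * float64(count)
	for i, c := range cumulative {
		if float64(c) >= target {
			return bounds[i]
		}
	}
	return bounds[len(bounds)-1]
}

func main() {
	bounds := []float64{5, 10, 25, 50, 100}     // le=... upper bounds, ms
	cumulative := []uint64{40, 70, 90, 97, 100} // _bucket values (cumulative)
	var count uint64 = 100                      // _count
	sum := 1800.0                               // _sum

	fmt.Println("avg:", sum/float64(count))                               // 18 ms
	fmt.Println("p95 <=", quantileBound(bounds, cumulative, count, 0.95)) // 50 ms
}
```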

So we can guess that internally, the statistics are maintained as three pieces of data:

// Nearly all Prometheus statistics are maintained with atomic CAS operations,
// which perform better than taking a Mutex
func (h *histogram) observe(v float64, bucket int) {
	n := atomic.AddUint64(&h.countAndHotIdx, 1)
	hotCounts := h.counts[n>>63]

	if bucket < len(h.upperBounds) {
		// the bucket that v falls into: +1
		atomic.AddUint64(&hotCounts.buckets[bucket], 1)
	}
	for {
		oldBits := atomic.LoadUint64(&hotCounts.sumBits)
		newBits := math.Float64bits(math.Float64frombits(oldBits) + v)
		// sum += v
		if atomic.CompareAndSwapUint64(&hotCounts.sumBits, oldBits, newBits) {
			break
		}
	}
	// count +1
	atomic.AddUint64(&hotCounts.count, 1)
}

So when developers want to define their own monitoring metrics:

  1. When generating API code with goctl, specify the middleware to generate: zeromicro.github.io/go-zero/mid…
  2. Write the metric collection logic in the middleware file
  3. Alternatively, developers can write the metric collection logic directly in their business logic. Same idea as above.

The above covers the HTTP part of the logic; the RPC part is similar, and you can see its design in the interceptor section.

Conclusion

This article analyzed the logic behind go-zero's service monitoring metrics. Of course, Prometheus can also monitor infrastructure components by introducing the corresponding exporter.

Project address

Github.com/tal-tech/go…

Welcome to use go-zero, and star it to support us!