Originally published on Kuricat.com

The scope of this article is limited to load testing a single Golang web app service; it does not cover scenarios such as full-link (end-to-end) load testing.

Purpose of load testing

Before talking about metrics, we need to clarify the purpose of load testing. Load testing is not just about squeezing out the highest QPS. We also need to watch how the web app's metrics behave under different levels of load, and it is just as important to use load testing to find the most suitable load range for the app or framework. With this data and these reports, we can make better decisions about resource selection and allocation.

Metrics

Besides QPS, there are other metrics worth paying attention to, such as CPU/MEM usage, HTTP status codes, timeouts, response latency, and NIC (network interface) traffic. Let's go through them one by one.

QPS

QPS, the number of requests served per second, is an objective metric. For Go web applications, the main factors affecting QPS are CPU and bandwidth.

CPU && Response latency && Timeout

Before we talk about CPU, response latency, and timeouts, let's walk through a simple experiment to understand the phenomenon.

Suppose a Go web app is in its most suitable load range at 50 concurrent connections: CPU usage is stable at 95-100%, response latency is 7-10 ms, QPS is 10K, and there are no timeouts.

When we increase the concurrency to 100, we find that QPS does not continue to rise and stays around 10K, but response latency grows to, say, around 20 ms.

We then increase the concurrency to 1000 and find that QPS is still around 10K, but response latency has skyrocketed to 300 ms, with some requests even reaching 1 s, accompanied by a small number of request timeouts.

The conclusion of the experiment: for a Go web app, QPS increases with concurrency while response latency stays flat, until the program reaches its most suitable load range. Beyond that point, adding more concurrency no longer raises QPS; instead, response latency climbs steadily and, in the extreme case, requests start to time out.
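If you want to reproduce this kind of curve yourself, a minimal sketch like the one below can help; the port, the amount of per-request hashing work, and the wrk parameters in the comments are all arbitrary assumptions, not values taken from the experiment above. The handler is deliberately CPU-bound so that stepping up concurrency exercises the behavior just described.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"log"
	"net/http"
)

// handler burns a fixed amount of CPU per request so that throughput is
// CPU-bound, matching the regime described in the experiment above.
func handler(w http.ResponseWriter, r *http.Request) {
	sum := []byte("payload")
	for i := 0; i < 10000; i++ { // arbitrary amount of work per request
		h := sha256.Sum256(sum)
		sum = h[:]
	}
	fmt.Fprintf(w, "%x\n", sum)
}

func main() {
	http.HandleFunc("/", handler)
	// Example runs, stepping up concurrency with wrk:
	//   wrk -t4 -c50   -d30s http://localhost:8080/
	//   wrk -t4 -c100  -d30s http://localhost:8080/
	//   wrk -t4 -c1000 -d30s http://localhost:8080/
	// Compare the reported QPS, latency distribution, and timeout counts.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```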

So why does this happen? We need to take a quick look at how Go's HTTP server operates.

In the standard library's net/http server implementation, a new goroutine is spawned to handle each incoming request. So when a huge amount of concurrency arrives, the HTTP server spawns a huge number of goroutines to handle it. But goroutines are not operating-system threads and do not receive CPU time slices directly; their execution is scheduled entirely by Go's own runtime scheduler. When the scheduler has not allocated a time slice to a goroutine, that goroutine is just a piece of code context stored in memory and costs little beyond that.
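A quick way to see this model in action is the minimal server below (the port and the artificial sleep are arbitrary choices): it reports the current goroutine count from inside a handler, and hitting it with concurrent requests shows the count growing with the number of in-flight requests.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"runtime"
	"time"
)

func main() {
	// net/http runs every request in its own goroutine, so under concurrent
	// load the count reported here grows with the number of in-flight requests.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(100 * time.Millisecond) // keep the request in flight briefly
		fmt.Fprintf(w, "goroutines: %d\n", runtime.NumGoroutine())
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```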

As you can see in the graph below, because computing power is limited, the CPU behaves like a fixed-size window that slides along a strip of goroutines, completing one batch of goroutine tasks at a time.

When the goroutine strip is too long, the goroutines at the tail of the strip go unscheduled for too long, and the client simply times out. This explains why, as concurrency keeps increasing, QPS stays flat while request latency and the number of timed-out requests keep growing.
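The window analogy can also be illustrated outside of HTTP. The sketch below (the amount of per-goroutine work and the goroutine counts are arbitrary) runs the same fixed piece of CPU work in more and more goroutines: each goroutine's work is constant, but the time until the last one finishes grows with the length of the strip, which is exactly the latency growth, and eventually the timeouts, seen in the experiment.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// cpuWork burns a fixed, identical amount of CPU per goroutine.
func cpuWork() {
	x := 0
	for i := 0; i < 50_000_000; i++ {
		x += i
	}
	_ = x
}

func main() {
	// GOMAXPROCS is the width of the "window" sliding over the goroutine strip.
	fmt.Println("window size (GOMAXPROCS):", runtime.GOMAXPROCS(0))
	for _, n := range []int{8, 80, 800} {
		start := time.Now()
		var wg sync.WaitGroup
		for i := 0; i < n; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				cpuWork()
			}()
		}
		wg.Wait()
		// Per-goroutine work is constant, but the last goroutine on a longer
		// strip has to wait longer for its turn in the window.
		fmt.Printf("%4d goroutines finished after %v\n", n, time.Since(start))
	}
}
```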

MEM

Usually, the memory used by a Go web program does not grow much, or grow rapidly, while it runs (special cases aside). If you find memory usage climbing continuously during a load test, it is recommended to use pprof to check for a memory leak. A pprof guide can be found here.
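A minimal way to wire pprof into a Go web app is the standard library's net/http/pprof package. The sketch below (the port is an arbitrary choice) exposes the /debug/pprof/* endpoints alongside the application so the heap can be inspected while the load test is running.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	// While the load test is running, inspect the heap with:
	//   go tool pprof http://localhost:8080/debug/pprof/heap
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```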

Network traffic

NIC traffic is very easy to overlook during load testing. It is common to find that no matter how much you tune, QPS just will not go up, and it turns out that the NIC's inbound/outbound traffic or packet rate has hit its limit.
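If you want to watch NIC throughput from your own tooling, a rough Linux-only sketch like the one below may be enough to spot an interface that has flatlined; it assumes /proc/net/dev is available and simply prints the cumulative byte counters once per second.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	for {
		data, err := os.ReadFile("/proc/net/dev") // Linux-only counter file
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		// Skip the two header lines, then print cumulative rx/tx bytes per interface.
		for _, line := range strings.Split(string(data), "\n")[2:] {
			fields := strings.Fields(strings.ReplaceAll(line, ":", " "))
			if len(fields) >= 10 {
				fmt.Printf("%-8s rx_bytes=%s tx_bytes=%s\n", fields[0], fields[1], fields[9])
			}
		}
		fmt.Println("---")
		time.Sleep(time.Second)
	}
}
```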

HTTP status codes

During a load test, the abnormal status codes we most often encounter are 499 and the 5xx series:

  • 499
    • The client canceled the request
  • 5xx
    • 500: the server program reported an error
    • 503: by definition, this means the service is overloaded. However, under the normal Go web operating model described above, an overloaded service produces timeouts rather than 503s, which is misleading feedback. To report 503 correctly, the Go web app needs a bounded goroutine/worker pool, or QoS (load shedding) applied along the request path; see the sketch after this list
    • 502 && 504: usually gateway errors
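As a sketch of the fix mentioned above (not of any particular library's implementation), the middleware below caps the number of requests handled concurrently with a buffered-channel semaphore and sheds the excess with an explicit 503, instead of letting those requests queue up until the client times out. The cap of 100 is an arbitrary placeholder.

```go
package main

import (
	"log"
	"net/http"
)

// limitConcurrency is a minimal load-shedding sketch: it caps the number of
// requests handled at once and answers the excess with an explicit 503
// instead of letting those requests queue up until the client times out.
func limitConcurrency(limit int, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // got a slot
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		default: // overloaded: shed load immediately
			http.Error(w, "service overloaded", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	// 100 is an arbitrary placeholder; tune it to the sweet spot found in your load test.
	log.Fatal(http.ListenAndServe(":8080", limitConcurrency(100, mux)))
}
```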

Load testing tools

Finally, the tools. I mostly use wrk, but current load testing tools are all basically similar: you specify the concurrency and the test duration, and you get the QPS.

However, there is another kind of load testing tool that ramps up concurrency on its own, stopping when throughput stalls, response latency starts to rise, or timeouts appear.