How to improve QPS and RT

Everyone is familiar with the terms QPS and RT, but many people are at a loss when asked how to improve them. Let's look into it today.

Terminology

RT (Response Time): the time it takes to complete one request.

QPS (Queries Per Second): the number of requests completed within one second.

The relationship between QPS and number of threads

For a single thread, QPS = 1000ms/RT

For example, a system with only one thread and a response time of 50ms would have a QPS of 1000/50=20

If it has two threads, its QPS is: 20*2=40

This assumes that CPU, I/O, and memory are not bottlenecks.
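The arithmetic above can be written as a small Java sketch. The class and method names are illustrative and not part of the original project, and the model assumes ideal scaling with no CPU, I/O, or memory contention.

```java
// Hypothetical helper: idealized QPS model, assuming no resource contention.
class QpsModel {

    /** Theoretical QPS for a given response time (in ms) and number of threads. */
    static double theoreticalQps(double rtMillis, int threads) {
        if (rtMillis <= 0 || threads <= 0) {
            throw new IllegalArgumentException("rtMillis and threads must be positive");
        }
        return 1000.0 / rtMillis * threads;
    }

    public static void main(String[] args) {
        System.out.println(theoreticalQps(50, 1)); // 20.0
        System.out.println(theoreticalQps(50, 2)); // 40.0
    }
}
```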

Theoretically, the more threads a server can support, the higher the QPS: QPS is directly proportional to the number of threads.

Of course, server resources are limited. In an actual load test, QPS rises with the number of threads at first. Once the number of threads reaches a certain point and the CPU becomes the bottleneck, QPS stops increasing.

The reason is simple. While CPU resources are abundant, every thread gets enough CPU time to run. Once the number of threads reaches the point where CPU resources are used up, threads start competing for the CPU: context switches become frequent (and they are costly), threads wait for one another, and response time naturally increases.

Optimal number of threads

From the relationship between QPS and the number of threads, a natural idea follows.

Optimal number of threads: the critical number of threads that just uses up the server's resources.

Formula: Optimal number of threads = ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs

Equivalent to: Optimal number of threads = (thread wait time / thread CPU time + 1) * number of CPUs

The first formula is the one found on the Internet; the equivalent form is my own rearrangement, because it makes the meaning easier to explain.

Of course, if you know the exact meaning of the first formula, please tell me

Formula

Suppose that, in the single-thread case, the thread's wait time is 100ms and its CPU time is 20ms. In other words, the thread uses the CPU for 20ms and then leaves it idle for 100ms. During that idle time the CPU could serve 100/20 = 5 more threads; adding the original thread gives 6, so one CPU can keep at most 6 threads busy in this period. If the server has 2 CPUs, that is 6*2 = 12.

Apply the formula: (100/20 + 1) * 2 = 12
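As a minimal sketch, the rearranged formula can be coded directly. The class and method names are made up for illustration, and the wait time and CPU time are assumed to be known, which is rarely the case in practice.

```java
// Hypothetical helper for the optimal-thread formula; names are illustrative.
class OptimalThreads {

    /** (thread wait time / thread CPU time + 1) * number of CPUs */
    static double optimalThreads(double waitMillis, double cpuMillis, int cpus) {
        if (cpuMillis <= 0 || waitMillis < 0 || cpus <= 0) {
            throw new IllegalArgumentException("invalid input");
        }
        return (waitMillis / cpuMillis + 1) * cpus;
    }

    public static void main(String[] args) {
        // Worked example from the text: 100 ms wait, 20 ms CPU, 2 CPUs.
        System.out.println(optimalThreads(100, 20, 2)); // 12.0
    }
}
```

On a real machine, the CPU count could be taken from `Runtime.getRuntime().availableProcessors()` instead of being hard-coded.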

Of course, in practice threads do not take turns running for a full 20ms each; the CPU allocates time slices and the threads execute alternately.

Features

After the optimal number of threads is reached, adding more threads leaves QPS unchanged while response time grows. If the number of threads keeps increasing, QPS starts to decrease.

How do I get the optimal number of threads

1. Slowly increase the number of threads in a load test and observe the results. Using the characteristics above, the optimal number of threads is easy to find.

2. Calculate it directly from the formula. This is somewhat difficult, because the system's thread CPU time and thread wait time are hard to know.

3. An improvement on the first method: run one load test, observe the CPU usage, and compute number of threads * (expected CPU usage / current CPU usage). This gives an approximate value, which can then be fine-tuned to reach the optimal number of threads.
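The third method boils down to one proportional estimate, sketched below. The helper name and the sample numbers (8 threads at 120% CPU) are invented for illustration and are not measurements from this article; the result is only a starting point for further tuning.

```java
// Hypothetical helper; the class, method, and sample numbers are illustrative only.
class ThreadEstimate {

    /** Scales the current thread count by (target CPU usage / observed CPU usage). */
    static long estimateThreads(int currentThreads, double observedCpuPercent, double targetCpuPercent) {
        if (currentThreads <= 0 || observedCpuPercent <= 0 || targetCpuPercent <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return Math.round(currentThreads * (targetCpuPercent / observedCpuPercent));
    }

    public static void main(String[] args) {
        // Made-up observation: 8 load-test threads drive the CPU to 120%; the target is 190%.
        System.out.println(estimateThreads(8, 120, 190)); // 13
    }
}
```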

Case

To better understand the theory above and explore how to improve QPS, we build a test case with Spring Boot.

Define a method to simulate CPU execution

public long runCpu(int count) {
  long start = System.currentTimeMillis();
  // Keep the CPU busy with arithmetic on a few variables
  int a = 0;
  double b = 0;
  long c = 0;
  for (int i = 0; i < count; i++) {
    for (int j = 0; j < 100; j++) {
      a++; b++; c++;
      a = a * 2; b = b / 2;
      a = a / 2; b = b * 2;
      c = c * 2; c = c / 2;
      a--; b--; c--;
    }
    a++; b++; c++;
  }
  System.out.println(a);
  // Return the running time
  return System.currentTimeMillis() - start;
}

The count parameter controls how long the method runs.

Define the load-test endpoint

/**
 * @param count number of loop iterations, used to simulate CPU run time
 * @param sleep simulated I/O time in milliseconds
 */
@GetMapping("/benchmark")
public String qps(int count, long sleep) throws InterruptedException {
  long start = System.currentTimeMillis();
  // CPU run time
  long cpuTime = runCpu(count);
  long ioStart = System.currentTimeMillis();
  // Simulate I/O blocking
  Thread.sleep(sleep);
  long ioTime = System.currentTimeMillis() - ioStart;
  long total = System.currentTimeMillis() - start;
  return "total: "+ total + " cpu-time:" + cpuTime + " io-time:" + ioTime;
}

For testing, I built it into an image and ran it with Docker.

Here is my docker-compose file, which allocates 2 CPUs.

version: '3.5'
services:
  qps-test:
    image: qps-test:1.0.0
    container_name: qps-test
    ports:
      - 8080:8080
    # In Compose file format 3.x, resource limits belong under deploy
    deploy:
      resources:
        limits:
          cpus: '2.00'

For the first test, set count to 100000 (about 10-20ms of CPU time on my machine) and the I/O time to 80ms.

http://192.168.65.206:8080/qps/benchmark?count=100000&sleep=80

The results are as follows

RT (ms)	QPS	CPU	Optimal number of threads
103	125	190%	13

Single-threaded QPS: 1000/103 = 9.7

Some readers may not know how I arrived at this result, so here is a brief explanation.

First, we need to know that the server's bottleneck is the CPU, because a memory bottleneck cannot occur in this test case. So we push CPU usage to around 190%, just short of the CPU limit. If it reaches 200%, the number of threads has probably already gone past the optimum and CPU resources are exhausted, so the number of threads needs to be reduced. If it has not reached 190%, keep increasing the number of load-test threads until CPU usage holds steady at about 190%.

I use JMeter as the load-testing tool.

With this baseline data, it is time to try to improve QPS.

Optimization direction

According to the formula: QPS = (1000/RT) * number of threads

Since CPU resources are already at the limit, adding threads will not help, so we can only try to reduce the response time.

Response time is divided into two parts: CPU time and thread wait time, so we start with both.

Reducing I/O wait time

We reduce the I/O time from 80ms to 40ms.

http://192.168.65.206:8080/qps/benchmark?count=100000&sleep=40

The load-test results are as follows:

RT (ms)	QPS	CPU	Optimal number of threads
65	123	190%	8

Single-threaded QPS: 1000/65 = 15.4

We found that the response time dropped from 103 to 65, but QPS barely changed, and the optimal number of threads dropped from 13 to 8.

Conclusion: reducing I/O time does not improve QPS. Why?

We apply the principle of constant CPU resources: CPU resources = CPU time of a thread * total number of threads * QPS per thread

So the following holds: CPU time consumed per second with the baseline data = CPU time consumed per second after reducing the I/O wait time

23ms * 13 * 9.7 = 25ms * x * 15.4, which gives x = 7.53

Here 23ms is RT (103) - I/O (80) and 9.7 is 1000/103; 25ms is RT (65) - I/O (40) and 15.4 is 1000/65.

Number of threads	QPS per thread	RT (ms)	CPU processing time	QPS
13	9.7	103	23ms * 13 * 9.7	125
x ≈ 8	15.4	65	25ms * x * 15.4	123
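The constant-CPU calculation can be checked with a short sketch. The class and method names are illustrative; the numbers are the ones from the table, with CPU time approximated as RT minus I/O time as in the text.

```java
import java.util.Locale;

// Hypothetical helper: solves the constant-CPU equation for the unknown thread count x.
class CpuBudget {

    /**
     * baseCpuMs * baseThreads * baseQpsPerThread = newCpuMs * x * newQpsPerThread,
     * solved for x.
     */
    static double solveThreads(double baseCpuMs, double baseThreads, double baseQpsPerThread,
                               double newCpuMs, double newQpsPerThread) {
        double cpuMsPerSecond = baseCpuMs * baseThreads * baseQpsPerThread;
        return cpuMsPerSecond / (newCpuMs * newQpsPerThread);
    }

    public static void main(String[] args) {
        // Baseline: 23 ms CPU, 13 threads, 9.7 QPS per thread.
        // After halving I/O time: 25 ms CPU, 15.4 QPS per thread.
        double x = solveThreads(23, 13, 9.7, 25, 15.4);
        System.out.println(String.format(Locale.ROOT, "x = %.2f", x)); // x = 7.53
    }
}
```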

Reducing CPU execution time

We halve the CPU run time by changing count from 100000 to 50000.

http://192.168.65.206:8080/qps/benchmark?count=50000&sleep=80

The load-test results are as follows:

RT (ms)	QPS	CPU	Optimal number of threads
101	244	190%	25

Single-threaded QPS: 1000/101 = 9.9

The response time barely changed, but QPS doubled and the optimal number of threads doubled.

Conclusion: reducing CPU time significantly improves QPS.

Again applying the principle of constant CPU resources:

23ms * 13 * 9.7 = 21ms * x * 9.9

x ≈ 14

For some unknown reason the calculation goes wrong here: x ≈ 14 is far from the measured 25. Here 21ms is RT (101) - I/O (80), but normally the CPU time should be around 10ms, because count has been halved.

If the CPU time is between 10ms and 15ms, x comes close to 25 and matches the load-test result. My guess is that the I/O time is off, which makes RT longer.
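To see how sensitive x is to the assumed CPU time, the same equation can be evaluated for several values. This is only a sketch with invented names; the baseline figures (23ms, 13 threads, 9.7 QPS per thread) and the per-thread QPS of 9.9 come from the tables above.

```java
import java.util.Locale;

// Hypothetical sweep: thread count implied by the constant-CPU equation for several CPU times.
class CpuTimeSweep {

    // CPU milliseconds consumed per second in the baseline run: 23 ms * 13 threads * 9.7 QPS/thread.
    static final double BASELINE_CPU_MS_PER_SECOND = 23 * 13 * 9.7;

    static double threadsFor(double cpuMs, double qpsPerThread) {
        return BASELINE_CPU_MS_PER_SECOND / (cpuMs * qpsPerThread);
    }

    public static void main(String[] args) {
        for (int cpuMs : new int[] {10, 12, 15, 21}) {
            double x = threadsFor(cpuMs, 9.9);
            System.out.println(String.format(Locale.ROOT, "cpu=%dms -> x=%.2f", cpuMs, x));
        }
    }
}
```

With a CPU time of about 12ms the equation gives roughly 24 threads, close to the measured 25, while 21ms gives about 14.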

Summary

count	sleep	RT (ms)	QPS	CPU	Optimal number of threads
100000	80ms	103	125	190%	13
100000	40ms	65	123	190%	8
50000	80ms	101	244	190%	25
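As a sanity check, the three measured rows can be compared against the formula QPS = (1000/RT) * number of threads. The sketch below uses invented class and method names and only the figures from the table.

```java
import java.util.Locale;

// Hypothetical check: does QPS = (1000 / RT) * threads reproduce the measured QPS?
class SummaryCheck {

    static double predictedQps(double rtMillis, int threads) {
        return 1000.0 / rtMillis * threads;
    }

    public static void main(String[] args) {
        // Each row: RT (ms), optimal number of threads, measured QPS.
        int[][] rows = { {103, 13, 125}, {65, 8, 123}, {101, 25, 244} };
        for (int[] row : rows) {
            double predicted = predictedQps(row[0], row[1]);
            System.out.println(String.format(Locale.ROOT,
                    "RT=%dms threads=%d predicted=%.1f measured=%d",
                    row[0], row[1], predicted, row[2]));
        }
    }
}
```

In all three rows the predicted value lies within a few percent of the measured QPS.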

Relationship between QPS and RT

If we simply followed the formula QPS = 1000/RT, QPS would be inversely proportional to RT, so any reduction in RT would raise QPS.

However, the case shows that in practice the relationship is not that simple. RT consists of two kinds of time, and they affect QPS differently.

When CPU execution time decreases, QPS improves significantly.

When I/O wait time decreases, QPS improves little or not at all.

Conclusion

Based on the analysis above, to improve RT:

1. Reduce I/O response time

2. Reduce CPU execution time

To improve QPS:

1. Reduce CPU execution time

2. Increase the number of CPUs

Tip: if QPS peaks during a load test before the CPU reaches its limit, there is another bottleneck, such as memory.

References

https://www.docin.com/p-73662763.html?docfrom=rrela

For more content, follow the WeChat official account: programmer PURPLE

Personal blog space: zijiancode.cn
