Welcome to the Tencent Cloud+ Community for more hands-on technical practice from Tencent.
This post was published by Mariolu in the Tencent Cloud+ Community.
Preface:
The author loads production server logs and replays them to measure machine performance metrics. The machines available for the simulation have relatively few resources, while the production logs to be replayed are large. Suppose a single production machine serves 10,000 (1W) QPS; a cluster of 5 such machines then serves 50,000 QPS. For the simulation to be meaningful, the load-generating client must therefore sustain more than 50,000 QPS.
Chapter 1: The first HTTP experience
“Life is short, I use Python.” Python 2 ships with urllib2, and urllib3 and requests are available as third-party libraries. Their support for proxies and custom request headers basically meets the functional requirements. The author wrote the client with urllib2 plus the multiprocessing library and got it running, but the measured QPS was only a little over 2k, far below the requirement. Raising the number of processes to several times the number of CPU cores only brought Python to just over 3k. Things happen for a reason, so I turned to monitoring dashboards and shell tools to look for problems on the machine.
Chapter 2: “The Dark Ages”
The Middle Ages were a long, dark time: you did a great deal of work and got very little back, and what remained was the experience of trying. I looked at CPU, memory, disk, and network data. CPU usage was above 90%, memory was full, disk I/O wait (wa) was low, and the gigabit network adapter was saturated. The first fix was to replace the gigabit machines with 10-gigabit ones. The number of TIME_WAIT connections reached 13,000 (1w3), so I started by optimizing what looked like the bottleneck: I set tcp_timestamps=1, tcp_tw_reuse=1, and tcp_tw_recycle=1. After applying them with sysctl -p, the number of TIME_WAIT connections did not drop and concurrency did not rise. If the hardware was all in place, why did everyone else's Luna play so slick while mine played like junk?
Back to the program architecture. First, urllib2 and requests block on network I/O. Second, every request uses a short-lived connection. Third, switching among so many processes is expensive. After swimming through the vast ocean of the Internet, I found that the grequests library might be the solution. It builds on gevent, a coroutine-based networking library that uses greenlets as lightweight coroutines and a libev-based event loop to provide high-performance asynchronous I/O. It looked perfect, so I rewrote the program, but performance did not improve. Is it a limitation of the Python language itself that keeps CPU usage high and concurrency low? I will leave that question here and answer it at the end of the article.
Chapter 3: Be Open-minded
No longer content with Python, and no longer fixated on it, I rewrote the client in Golang. Go's goroutines are touted as a high-performance, language-level concurrency tool that is easy to write. I wrote it and ran it, and concurrency still would not go up. In the spirit of never giving up, the author used Go's pprof to analyze the code's bottleneck again. The profiles generated by pprof can also be turned into a more intuitive flame graph with Uber's third-party tool go-torch. See Figure 1: runtime.gcBgMarkWorker (the garbage collector's background mark worker) and runtime.mallocgc take a lot of CPU time.
Next, I ran go tool pprof -alloc_space replay1 /tmp/mem.prof, as shown in Figure 2, and entered the top10 command. It showed that pull_worker had cumulatively allocated more than 600 GB of memory, 93% of the total. The list pull_worker command pinpointed the bottleneck inside the function: the variable R4, used as a temporary buffer for reading the response body, was initialized inside a for loop, so it was allocated again on every request, driving memory allocation up.
Final chapter: summary
OK: at this point a single machine reaches a maximum concurrency of 30,000 (3W) QPS, close to the planned target, so two such machines together can sustain a 50,000 QPS load. To answer the question left above: Python did poorly simply because the library does not support reading the response body into an externally supplied buffer, which made memory usage explode. This is not a problem with the language itself; Python is not that bad. Along the way, I also became familiar with a new tool, Go. Its native coroutine support and its performance analysis tools are intuitive and easy to use. Recommended!
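The capacity claim above is a ceiling division over the figures from the preface (5 × 10,000 = 50,000 QPS) and the roughly 30,000 QPS per Go client. A minimal sketch; clientsNeeded is a hypothetical helper.

```go
package main

// clientsNeeded returns how many load-generating machines are needed to
// reach targetQPS when each sustains perClientQPS. It rounds up, because a
// fraction of a machine's worth of load still requires one more machine.
func clientsNeeded(targetQPS, perClientQPS int) int {
	if perClientQPS <= 0 {
		panic("perClientQPS must be positive")
	}
	return (targetQPS + perClientQPS - 1) / perClientQPS
}
```

With the Go client, clientsNeeded(50000, 30000) is 2; with the earlier Python client at about 3,000 QPS, clientsNeeded(50000, 3000) is 17.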
Published in the Tencent Cloud+ Community with the author's authorization. Original link: https://cloud.tencent.com/developer/article/1160803?fromSource=waitui