This is not an essay that answers questions, but rather a reflection on them.
Think about it carefully: when you hear "high-concurrency site", what do you picture? Is Taobao the first thing that comes to mind? Let's think about the technology together with these questions in mind.
I wrote this because I was not satisfied with the answers I got from search engines, so I decided to share some of my own thoughts in the hope that we can discuss them.
We are often asked in interviews: do you have high-concurrency experience? Leaving aside the companies that are just showing off (and some really are), I wondered: what is high concurrency? A few PV a day surely doesn't count. I searched the Internet first and found no clear, standard definition. So what is concurrency?
Concurrency, in an operating system, means that within a period of time several programs are somewhere between having started and having finished, all running on the same processor, while only one of them is running on the processor at any given instant.
From Baidu Baike
What do we mean by high concurrency?
That definition is obviously not what we normally mean by concurrency. In the Internet age, when we say concurrency or high concurrency, we usually mean concurrent access: how many requests arrive at the same time.
I’ve seen someone define high concurrency something like this:
High concurrency usually means that we provide system services that can handle many requests in parallel at the same time.
Look closely at this definition. First, it confuses concurrency with parallelism. The difference between the two is covered here (https://laike9m.com/blog/huan-zai-yi-huo-bing-fa-he-bing-xing,61/), so I won't repeat it; let's continue with concurrency.
Second, the definition says "many requests". How many is "many"? As a Chinese speaker, this word makes my imagination run wild… OK, back on track.
There is no clear definition of high concurrency online. From what I found, though, the concept usually comes up at companies with ten million or more PV (page views). So I came up with my own working definition: if a system has a daily PV of 10 million or more, it is likely to be a high-concurrency system.
Why only "likely"? Because some companies don't take the technical route at all and simply throw machines at the problem, which is not what we are discussing here. To avoid showing off, let's just talk about concurrency and drop the "high".
Concurrency, what exactly should we care about?
To be honest, high concurrency is an abstract concept. It is difficult to have a uniform and measurable standard. Are there any other standard metrics for measuring system performance? I’m going to talk to you about some indicators from my previous computer science course.
A couple of concepts. Don’t fall asleep.
- QPS (TPS): the number of requests/transactions per second. In the Internet world, this means the number of HTTP requests per second.
- Throughput: the number of requests processed per unit of time (usually determined by QPS and concurrency).
- Response time: the average time the system takes to respond to a request. For example, if the system takes 200 ms to process an HTTP request, that 200 ms is the system's response time (I think only processing time should be counted here; network transfer time is ignored).
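To make these metrics concrete, here is a minimal Python sketch that derives QPS and average response time from one observation window. The sample durations and the function name `summarize` are hypothetical, chosen only so that the average comes out at the 200 ms used in the example above.

```python
def summarize(durations_ms, window_seconds):
    """Return (qps, average response time in ms) for one observation window.

    durations_ms: processing time of each request completed in the window.
    """
    qps = len(durations_ms) / window_seconds
    avg_response_ms = sum(durations_ms) / len(durations_ms)
    return qps, avg_response_ms


# Hypothetical sample: five requests completed within a 1-second window.
sample = [100, 200, 300, 200, 200]
qps, avg_ms = summarize(sample, window_seconds=1)
print(qps, avg_ms)  # 5.0 200.0
```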
It's important to note here that QPS does not equal concurrency.
Concurrency is how many requests are being served at the same time; QPS is the number of requests handled per second. From this it is easy to derive a formula:
QPS = Number of concurrent requests/average response time
The rest of the analysis revolves around this formula; if it isn't clear yet, go over it again.
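As a rough sketch, the formula can be written as two small Python functions. This relationship is commonly known as Little's Law; the function names are my own, and response time is taken in milliseconds to keep the arithmetic exact.

```python
def qps(in_flight, avg_response_ms):
    """QPS = concurrency / average response time (converted to seconds)."""
    return in_flight * 1000 / avg_response_ms


def concurrency(qps_value, avg_response_ms):
    """The same formula rearranged: concurrency = QPS * average response time."""
    return qps_value * avg_response_ms / 1000


print(qps(1, 100))           # 10.0  (one request at a time, 100 ms each)
print(qps(10, 100))          # 100.0 (ten requests in flight, 100 ms each)
print(concurrency(10, 100))  # 1.0
```

Note that at a fixed response time, QPS grows only if more requests can be in flight at once, which is why the two numbers should not be confused.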
Now let's assume a scenario. QPS is the number of HTTP requests processed per second, and 1 s = 1000 ms. Suppose our server takes 100 ms to complete one HTTP request (average response time = 100 ms) and handles requests one at a time. It can then handle 10 requests in 1 s, so QPS = 10. Rearranging the formula, concurrency = QPS × average response time = 10 × 0.1 s = 1.
We are often asked questions about high concurrency, but in part what people really want to know is how to improve the performance of an existing application. So let's do the analysis based on the assumptions above. Suppose there is a system whose performance matches the scenario above. It has 2 million PV per day and runs on a single machine (and, of course, it often goes down). Based on these performance figures, let's work out an optimization plan.
Improve concurrency
According to the above analysis, to improve concurrency, we need to improve our QPS.
The quickest solution is to add machines. Let’s do the actual calculation based on the above situation.
- Page views: 2 million PV per day
- QPS: 10
As a rule of thumb, 80% of the visits are concentrated in 20% of the time. From that we can calculate the QPS the machine actually has to sustain for 2 million PV:
QPS = (2,000,000 * 0.8) / (24 * 3600 * 0.2) ≈ 92.6
So at peak, a single machine would have to process about 92.6 requests per second, whereas our current system's QPS is 10. So what is the solution?
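The peak estimate can be sketched in Python with the traffic share and the peak-time share as parameters. The function names are hypothetical, and the 10 QPS per machine is the single-machine figure assumed earlier. Because the result is quite sensitive to the peak-time share, the example tries two values.

```python
import math

SECONDS_PER_DAY = 24 * 3600


def peak_qps(daily_pv, traffic_share=0.8, time_share=0.2):
    """QPS needed if `traffic_share` of the daily PV arrives within
    `time_share` of the day."""
    return (daily_pv * traffic_share) / (SECONDS_PER_DAY * time_share)


def machines_needed(required_qps, per_machine_qps):
    """Smallest whole number of machines that covers the required QPS."""
    return math.ceil(required_qps / per_machine_qps)


for share in (0.2, 0.3):
    peak = peak_qps(2_000_000, time_share=share)
    print(f"{share:.0%} of the day: {peak:.1f} QPS, "
          f"{machines_needed(peak, 10)} machines at 10 QPS each")
# 20% of the day: 92.6 QPS, 10 machines at 10 QPS each
# 30% of the day: 61.7 QPS, 7 machines at 10 QPS each
```

Either way, the required peak is several times what one machine delivers at 10 QPS.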
Option 1: Add machines
The ability of an individual is limited; the power of a team is infinite. Since one machine can't do it, we need more machines. This involves database master-slave replication, read-write separation, load balancing, and other techniques.
It works by splitting traffic, spreading out load that used to be concentrated on one machine. The change is flexible and quick to implement.
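As an illustration of how traffic gets spread out, here is a minimal round-robin sketch in Python. Round-robin is only one of several load-balancing strategies and is my choice for the example; the server names and the `dispatch` function are hypothetical, and a real deployment would normally use a dedicated load balancer rather than application code.

```python
from itertools import cycle


def dispatch(requests, servers):
    """Assign each request to the next server in turn (round-robin)."""
    assignment = {server: [] for server in servers}
    for request, server in zip(requests, cycle(servers)):
        assignment[server].append(request)
    return assignment


servers = ["app-1", "app-2", "app-3"]  # hypothetical machine names
result = dispatch(range(9), servers)
for server, handled in result.items():
    print(server, handled)
# app-1 [0, 3, 6]
# app-2 [1, 4, 7]
# app-3 [2, 5, 8]
```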
Option 2: Improve single-machine performance
How much performance you can gain on a single machine depends on the machine's configuration and on how complex your service is.
P.S. Writing this, I suddenly understand a little why people online describe high concurrency as "many requests" without giving specific numbers: it really depends on the business. 100 concurrent requests are nothing at all for static web pages, but for a compute-intensive service, I suspect…
So how do we usually improve single-machine performance? For example: caching data that changes infrequently, enabling PHP OPcache, optimizing code (the N+1 query problem, deeply nested loops, deep recursion, and so on), optimizing database tables, etc. Each of these points could fill a book, so let's move on.
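To illustrate the first item, caching data that changes infrequently, here is a minimal time-to-live cache sketch in Python. The class `TTLCache`, the loader `load_category`, and the 60-second lifetime are all hypothetical; in production this role is usually played by a shared cache service rather than an in-process dictionary.

```python
import time


class TTLCache:
    """Keep loaded values for ttl_seconds; reload on a miss or after expiry."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive load
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value


calls = []


def load_category(category_id):
    """Stand-in for a slow database query (hypothetical)."""
    calls.append(category_id)
    return {"id": category_id, "name": f"category-{category_id}"}


cache = TTLCache(load_category, ttl_seconds=60)
cache.get(1)
cache.get(1)
print(len(calls))  # 1: the second read was served from the cache
```

The trade-off is staleness: a changed value can be served for up to `ttl_seconds`, which is why this suits data that rarely changes.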
Conclusion
Since I have not actually worked on a system with ten-million-level PV, much of this may be incorrect; I wrote this article to clarify my own thinking. I hope to discuss it with more people.
I also hope this article clears up some of your doubts, so that we can move from a lofty concept to practical problems.
References
- The difference between concurrency and parallelism (https://laike9m.com/blog/huan-zai-yi-huo-bing-fa-he-bing-xing,61/)
- Concepts in system performance testing (http://www.ha97.com/5095.html)
If you are interested in my content, please follow my WeChat official account:
Official account ID: ICanfo
GitHub: https://github.com/helei112g
I open-sourced a PHP SDK for Alipay, WeChat Pay, and China Merchants Bank online payment on GitHub, which has nearly 800 stars so far. I hope it helps you develop projects more efficiently.
Project address:
https://github.com/helei112g/payment