High concurrency is an experience that almost every programmer wants to have. The reason is simple: as traffic grows, all kinds of technical problems appear, such as interface timeouts, rising CPU load, frequent GC, deadlocks, and large-scale data storage. These problems push us to deepen our technical skills. High concurrency is also something that inevitably comes up in interviews, but do you really understand it?
In past interviews, when a candidate had worked on high-traffic projects, I would often ask them to explain their understanding of high concurrency, but few could answer the question systematically. The answers generally fell into the following categories:
1. No sense of data and metrics: unclear about which metrics to use to measure a high concurrency system, not even knowing the system's total user count, active user count, or the QPS and TPS at peak and off-peak times.
2. Designed some schemes but without a firm grasp of the details: unable to state the technical points to watch and the possible side effects. For example, a read performance bottleneck usually leads to introducing a cache, but cache hit ratio, hot keys, data consistency, and similar issues are ignored.
3. A one-sided understanding that equates high concurrency design with performance optimization: talking about concurrent programming, multi-level caching, asynchronization, and horizontal scaling while ignoring high availability design, service governance, and operational support.
4. Knowing the big schemes but ignoring the basics: able to explain big ideas such as vertical layering, horizontal partitioning, and caching, yet never checking whether the data structures are reasonable or the algorithms efficient, and never thinking about detailed optimization along the two most fundamental dimensions: IO and computation.
In this article, I would like to systematically summarize the knowledge and practical ideas needed to master high concurrency based on my own experience in high concurrency projects. I hope it will be helpful to you. The content is divided into the following three parts:
- How do you understand high concurrency?
- What is the goal of high concurrency system design?
- What are the high concurrency practices?
How do you understand high concurrency?
High concurrency means large traffic volumes, and technical means are needed to withstand the impact of that traffic. These means work like channeling a flood: they let the system process the traffic more smoothly and give users a better experience.
Common high-concurrency scenarios include Double 11 on Taobao, the ticket rush during the Spring Festival travel season, and hot news from Weibo celebrities. Beyond these typical cases, a flash-sale (seckill) system handling hundreds of thousands of requests per second, an order system processing tens of millions of orders per day, and an information feed system with hundreds of millions of daily active users can all be classified as high concurrency.
Clearly, the concurrency levels in these scenarios differ widely, so how much concurrency counts as high concurrency?
1. Do not look only at the numbers; look at the specific business scenario. You cannot say that a 100,000-QPS flash sale is high concurrency while a 10,000-QPS information feed is not. An information feed involves complex recommendation models and various manually defined policies, and its business logic can be more than ten times as complex as a flash sale's. The two are not in the same dimension, so there is no meaningful comparison.
2. Business grows from 0 to 1; concurrency and QPS are only reference indicators. What matters most is this: as the business grows to 10 or 100 times its original size, do you evolve the system with high concurrency techniques, preventing and solving the resulting problems across dimensions such as architecture design, coding, and even the product solution, rather than blindly upgrading hardware and adding machines for horizontal scaling?
In addition, the business characteristics of each high concurrency scenario differ completely: some are read-heavy, write-light information feeds, others are read-heavy, write-heavy transaction systems. Is there a universal technical solution for high concurrency across such different scenarios?
I think we can learn from the big ideas and other people's solutions, but during actual implementation there will still be countless pitfalls in the details. Because hardware and software environments, technology stacks, and product logic are never exactly the same, even the same technical solution applied to the same business scenario runs into different problems, and these pits have to be crossed one by one.
Therefore, in this article I will focus on the basics, general ideas, and experiences that have worked for me in the hope of giving you a deeper understanding of high concurrency.
What is the goal of high concurrency system design?
Discussing design schemes and practical experience is only meaningful and targeted once the design goals of a high concurrency system are clear.
2.1 Macro goals
High concurrency does not mean high performance alone, which is what many people assume. From a macro perspective, a high concurrency system is designed toward three goals: high performance, high availability, and high scalability.
1. High performance: performance reflects the system's parallel processing capability. With a fixed hardware budget, better performance means lower cost. Performance also shapes the user experience: a response time of 100 milliseconds versus 1 second feels completely different to the user.
2. High availability: the proportion of time the system can serve requests normally. One system runs a whole year without outages or faults; another suffers online incidents and downtime every few days. Users will always choose the former. Moreover, a system that is only 90% available is a huge drag on the business.
3. High scalability: the system's ability to scale out, that is, whether capacity can be expanded quickly during traffic peaks to absorb the surge smoothly, for example during the Double 11 shopping festival or hot events such as a celebrity divorce.
These three goals need to be considered as a whole, because they are interrelated and may even influence one another.
For example, with scalability in mind, you might design services to be stateless. This cluster design not only ensures high scalability but also indirectly improves the system's performance and availability.
For another example, service interfaces are commonly given timeouts to ensure availability, so that too many threads do not pile up on slow requests and trigger a system avalanche. What is a reasonable timeout? In general, it is set according to the performance of the dependent downstream service.
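As a rough illustration (my own sketch, not from the original article), this is how a caller-side timeout might be set with the JDK's built-in HttpClient; the endpoint URL and the 200 ms / 500 ms values are hypothetical and would normally be derived from the downstream service's measured response times.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        // Connect timeout bounds TCP connection establishment.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(200))   // hypothetical value
                .build();

        // Request timeout bounds the whole call; set it slightly above the
        // downstream service's TP99 so slow requests fail fast instead of
        // piling up threads.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/orders")) // hypothetical endpoint
                .timeout(Duration.ofMillis(500))          // hypothetical value
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```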
2.2 Micro goals
From a micro point of view, what are the specific indicators to measure high performance, high availability and high scalability? Why are these indicators chosen?
❇ Performance metrics
Performance metrics quantify existing performance problems and serve as the baseline for evaluating optimizations. Typically, the interface response time over a period of time is used as the metric.
1. Average response time: the most commonly used metric, but it has an obvious drawback: it is insensitive to slow requests. For example, out of 10,000 requests, if 9,900 take 1 ms and 100 take 100 ms, the average response time is 1.99 ms. The average only rose by 0.99 ms, yet the response time of 1% of the requests grew 100-fold.
2. TP90 and TP99 quantiles: sort the response times from smallest to largest; TP90 is the response time at the 90th percentile. The higher the quantile, the more sensitive the metric is to slow requests.
3. Throughput: inversely related to response time; for a single serial processing path, a response time of 1 ms corresponds to a throughput of 1,000 requests per second.
Typically, performance goals combine throughput and response time, for example: at 10,000 requests per second, keep AVG under 50 ms and TP99 under 100 ms. For high concurrency systems, both the average and the TP quantiles must be watched.
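To make these metrics concrete, here is a minimal sketch (my own illustration) that computes AVG and TP quantiles from recorded response times, using the 9,900 × 1 ms plus 100 × 100 ms example above.

```java
import java.util.Arrays;

public class LatencyStats {
    // Returns the value at the given quantile (e.g. 0.99 for TP99) using the
    // nearest-rank method on a sorted copy of the samples.
    static long percentile(long[] samplesMillis, double quantile) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(quantile * sorted.length) - 1;
        return sorted[Math.max(rank, 0)];
    }

    public static void main(String[] args) {
        // 9,900 requests at 1 ms and 100 requests at 100 ms, as in the example above.
        long[] samples = new long[10_000];
        Arrays.fill(samples, 0, 9_900, 1L);
        Arrays.fill(samples, 9_900, 10_000, 100L);

        double avg = Arrays.stream(samples).average().orElse(0);
        System.out.printf("AVG=%.2fms TP90=%dms TP99=%dms TP999=%dms%n",
                avg, percentile(samples, 0.90), percentile(samples, 0.99),
                percentile(samples, 0.999));
        // Prints AVG=1.99ms TP90=1ms TP99=1ms TP999=100ms: the average (and even
        // TP99 here) hides the slow 1%, while TP999 exposes it.
    }
}
```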
In addition, from a user experience point of view, 200 milliseconds is considered the first threshold: below it, users do not perceive the delay; 1 second is the second threshold: users perceive the delay but can accept it.
Therefore, for a healthy high concurrency system, TP99 should be kept under 200 milliseconds, and TP999 or TP9999 should be kept under 1 second.
❇ Availability metrics
High availability refers to the system's ability to run for long stretches without failure. Availability = uptime / total running time, and it is usually described by the number of nines.
For a high concurrency system, the baseline requirement is three or four nines. The reason is simple: two nines mean the system is down 1% of the time, roughly 3.65 days a year, which for large companies doing billions in GMV or revenue per year translates into enormous business losses.
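As a quick sanity check (my own arithmetic, not from the article), the downtime budget implied by a given number of nines can be computed directly:

```java
public class DowntimeBudget {
    public static void main(String[] args) {
        double minutesPerYear = 365 * 24 * 60; // 525,600 minutes
        for (int nines = 2; nines <= 5; nines++) {
            double availability = 1 - Math.pow(10, -nines); // 0.99, 0.999, ...
            double downtimeMinutes = minutesPerYear * (1 - availability);
            System.out.printf("%d nines (%.3f%%): ~%.1f minutes of downtime per year%n",
                    nines, availability * 100, downtimeMinutes);
        }
        // 2 nines ≈ 5,256 min (~3.65 days), 3 nines ≈ 526 min (~8.8 h),
        // 4 nines ≈ 53 min, 5 nines ≈ 5.3 min.
    }
}
```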
❇ Scalability metrics
In the face of sudden traffic, you cannot rebuild the architecture on the spot; the quickest response is to add machines so that the system's processing capacity grows close to linearly.
For a business cluster or an underlying component, scalability = performance growth ratio / machine growth ratio. The ideal is linear scaling: N times the resources yields N times the performance. In practice, scaling efficiency should be kept above 70%; for example, if doubling the machines raises throughput by 1.6 times, the efficiency is 80%.
But from the overall architecture of a high concurrency system, the goal of scalability is more than making services stateless: when traffic grows 10-fold, the business services may well scale out 10-fold quickly, but the database can then become the new bottleneck.
Stateful storage services such as MySQL are usually hard to scale on short notice; if the architecture has not been planned in advance (vertical and horizontal splitting), scaling may involve migrating large amounts of data.
Therefore, high scalability must take into account service clusters, databases, middleware such as caches and message queues, load balancing, bandwidth, dependent third parties, and so on. Once concurrency reaches a certain level, any of these can become the bottleneck for scaling.
What are the high concurrency practices?
With the three goals of high concurrency design clarified, the following summarizes the design approach in two parts: first a general design method, then concrete practices for high performance, high availability, and high scalability.
3.1 General design method
The general design method works along two dimensions, vertical and horizontal, commonly known as the two axes of handling high concurrency: vertical scaling (scale-up) and horizontal scaling (scale-out).
❇ Scale up
The goal is to raise the processing power of a single machine. The approaches include:
1. Improving the hardware of a single machine: add memory, CPU cores, or storage capacity, or upgrade the disks to SSDs; in short, throw better hardware at the problem.
2. Improving the software performance of a single machine: reduce the number of IO operations by using caches, and raise throughput through concurrency or asynchronous processing (a minimal cache sketch follows below).
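As a minimal sketch of the caching idea (my own illustration; the TTL, key type, and loader are placeholders), a read-through local cache lets repeated reads skip the database entirely. Production systems would typically use a library such as Caffeine or Guava Cache instead of hand-rolling this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal read-through local cache with a per-entry TTL.
public class TtlCache<K, V> {
    private record Entry<V>(V value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached value if present and fresh; otherwise calls the
    // loader (e.g. a database query) and caches the result, cutting IO.
    public V get(K key, Function<K, V> loader) {
        Entry<V> entry = store.get(key);
        long now = System.currentTimeMillis();
        if (entry != null && entry.expiresAtMillis() > now) {
            return entry.value();
        }
        V value = loader.apply(key);
        store.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    public static void main(String[] args) {
        TtlCache<Long, String> userCache = new TtlCache<>(5_000); // hypothetical 5-second TTL
        String user = userCache.get(42L, id -> "user-" + id);     // loader stands in for a DB query
        System.out.println(user);
    }
}
```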
❇ Scale-out
Since single-machine performance always has a ceiling, horizontal scaling is eventually required: cluster deployment further raises concurrent processing capacity. It involves two directions:
1. Building a proper layered architecture: this is the prerequisite for horizontal scaling. High concurrency businesses are often complex, and layering simplifies complex problems and makes horizontal scaling easier to achieve.
The most common Internet layered architecture consists of a reverse proxy layer, a web layer, a business service layer, and a storage layer; a real high concurrency system refines this further: the reverse proxy layer can be LVS + NGINX, the web layer a unified API gateway, the business service layer further split into microservices along vertical business lines, and the storage layer a mix of heterogeneous databases.
2. Scaling each layer horizontally: stateless services scale out directly; stateful services need shard routing. Business clusters can usually be made stateless, while databases and caches are stateful, so a partition key must be designed for storage sharding (see the sketch below). Read performance can also be improved with master-slave replication and read/write splitting.
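As a minimal sketch of shard routing (my own illustration; the shard counts and naming convention are hypothetical), the partition key decides which database and table a row lives in, so all data for one user stays on one shard:

```java
// Real setups also need a plan for resharding, e.g. consistent hashing or a lookup table.
public class ShardRouter {
    private static final int DB_COUNT = 4;      // hypothetical: 4 databases
    private static final int TABLE_COUNT = 16;  // hypothetical: 16 tables per database

    public static String route(long userId) {
        // Use the partition key (here: userId) so that all data for one user
        // lands on the same shard and single-user queries stay single-shard.
        int dbIndex = (int) (userId % DB_COUNT);
        int tableIndex = (int) (userId / DB_COUNT % TABLE_COUNT);
        return "order_db_" + dbIndex + ".order_" + tableIndex;
    }

    public static void main(String[] args) {
        System.out.println(route(10086L)); // e.g. order_db_2.order_9
    }
}
```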
3.2 Specific practices
Next, based on my own experience, I will summarize concrete, implementable practices for high performance, high availability, and high scalability.
❇ High performance practices
1. Cluster deployment, reducing the pressure on any single machine through load balancing.
2. Multi-level caching, including CDN for static data, local caches, and distributed caches, together with handling hot keys, cache penetration, cache stampedes, and data consistency in caching scenarios.
3. Database optimization: splitting into multiple databases and tables, index tuning, and offloading complex queries to a search engine.
4. Considering NoSQL databases such as HBase and TiDB, provided the team is familiar with these components and has strong operations capability.
5. Asynchronization: handle non-critical steps asynchronously via multithreading, MQ, or even delayed tasks.
6. Rate limiting: first confirm whether the business allows it (flash-sale scenarios do, for example); it includes front-end rate limiting, rate limiting at the NGINX access layer, and server-side rate limiting (a token-bucket sketch follows below).
7. Peak shaving and valley filling: absorb traffic bursts through MQ.
8. Concurrent processing: parallelize serial logic with multiple threads.
9. Pre-computation: for red-envelope grabbing, for example, the amounts can be computed in advance and cached, then used directly when the envelopes are handed out.
10. Cache warm-up: preload data into the local or distributed cache with asynchronous tasks.
11. Reducing the number of IO operations, for example through batch reads and writes to the database and cache, batch RPC interfaces, or eliminating RPC calls by tolerating some data redundancy.
12. Reducing the data size per IO, including using lightweight communication protocols and suitable data structures, removing redundant interface fields, shortening cache keys, and compressing cache values.
13. Program logic optimization, such as moving forward the checks most likely to short-circuit the execution flow, optimizing the computation inside for loops, or choosing more efficient algorithms.
14. Pooling techniques with properly sized pools, including HTTP connection pools, thread pools (set the core parameters according to whether the workload is CPU-intensive or IO-intensive; see the sketch after this list), and database and Redis connection pools.
15. JVM tuning, including the sizes of the young and old generations and the choice of GC algorithm, to minimize GC frequency and pause time.
16. Lock selection: prefer optimistic locking in read-heavy scenarios, or reduce lock contention with segmented (striped) locks.
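For item 14, here is a sketch (my own, with illustrative numbers) of sizing thread pools differently for CPU-bound and IO-bound work; the queue sizes and multipliers are starting points to be tuned from measurements, not fixed rules.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // CPU-bound work: little benefit from more threads than cores.
        ThreadPoolExecutor cpuPool = new ThreadPoolExecutor(
                cores, cores,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1_000),            // bounded queue
                new ThreadPoolExecutor.AbortPolicy());       // fail fast when saturated

        // IO-bound work: threads mostly wait, so a larger pool is common.
        // "2 * cores" is only a starting point; tune it from measurements.
        ThreadPoolExecutor ioPool = new ThreadPoolExecutor(
                cores * 2, cores * 4,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(5_000),
                new ThreadPoolExecutor.CallerRunsPolicy());  // back-pressure when saturated

        System.out.println("cores=" + cores
                + " cpuPool=" + cpuPool.getCorePoolSize()
                + " ioPool=" + ioPool.getCorePoolSize());
        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```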
These schemes are essentially an enumeration of the possible optimization points along the two dimensions of computation and IO. They need a supporting monitoring system to show current performance in real time and back up bottleneck analysis; then follow the 80/20 rule and attack the main bottleneck first.
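For item 6 in the list above, here is a minimal token-bucket rate limiter (my own sketch; real systems often rely on Guava RateLimiter, Sentinel, or gateway-level limiting instead):

```java
public class TokenBucket {
    private final long capacity;        // maximum burst size
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double permitsPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = permitsPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Non-blocking: returns false when the request should be rejected
    // (or queued / degraded, depending on the business rules).
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket limiter = new TokenBucket(100, 1_000); // hypothetical: 1,000 QPS, burst of 100
        System.out.println(limiter.tryAcquire() ? "accepted" : "rejected");
    }
}
```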
❇ High availability practices
1. Failover between peer nodes: both NGINX and service-governance frameworks support switching to another node after one node fails.
2. Failover of non-peer nodes: primary/standby switchover via heartbeat detection (for example Redis Sentinel or cluster mode, and MySQL master-slave switchover).
3. Timeouts, retry policies, and idempotent design at the interface level (see the retry sketch after this list).
4. Degradation: protect core services by sacrificing non-core ones, with circuit breaking when necessary; and keep alternative paths ready in case the core link fails.
5. Rate limiting: directly reject requests that exceed the system's processing capacity, or return an error code.
6. Message reliability in MQ scenarios, including retries on the producer side, persistence on the broker side, and the ack mechanism on the consumer side.
7. Gray (canary) releases: deploy to a small fraction of machines first, observe system logs and business metrics, and roll out fully once everything runs smoothly.
8. Monitoring and alerting: a comprehensive monitoring system covering basic CPU, memory, disk, and network monitoring, as well as monitoring of web servers, the JVM, databases, various middleware, and business metrics.
9. Disaster recovery drills: similar to today's chaos engineering, apply destructive actions to the system and observe whether local failures cause availability problems.
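For item 3 in the list above, here is a minimal sketch (my own, with hypothetical names such as createOrder) of a bounded retry with exponential backoff, where the same idempotency key is reused across attempts so a retried request is not executed twice by the server:

```java
import java.util.UUID;
import java.util.concurrent.Callable;

public class RetryWithIdempotency {

    // Bounded retry with exponential backoff. In practice only retryable
    // errors (timeouts, 5xx, ...) should be retried, and the call must be
    // idempotent, otherwise a retry after a timed-out-but-successful attempt
    // would execute the operation twice.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialBackoffMillis)
            throws Exception {
        Exception last = null;
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff);
                    backoff *= 2; // exponential backoff between attempts
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Generated once per business operation and reused for every retry,
        // so the server can deduplicate on it.
        String idempotencyKey = UUID.randomUUID().toString();
        String result = callWithRetry(() -> createOrder(idempotencyKey, 42L), 3, 100);
        System.out.println(result);
    }

    // Hypothetical remote call; a real implementation would send the key in a
    // header or request field and the server would deduplicate on it.
    static String createOrder(String idempotencyKey, long userId) {
        return "order-created-for-" + userId;
    }
}
```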
High availability schemes are mainly considered from three angles: redundancy, trade-offs, and system operations. They also require a matching on-call mechanism and incident handling process so that online problems are followed up and resolved promptly.
❇ High scalability practices
1. A reasonable layered architecture: for example, the common Internet layered architecture described above; within microservices, a finer-grained split into a data access layer and a business logic layer is also possible (though the performance cost must be evaluated, since it may add one more network hop).
2. Splitting the storage layer: vertically by business domain, and horizontally by data characteristics (sharding into multiple databases and tables).
3. Splitting the business layer: most commonly by business domain (for example, product services and order services in e-commerce), but also by core versus non-core interfaces, or by request source (for example, To C versus To B, or App versus H5).
A final word
High concurrency is indeed a complex, systemic problem. Space is limited, so topics such as distributed tracing, full-link load testing, and flexible (BASE-style) transactions are not covered here, although they also deserve attention. Different business scenarios lead to different concrete solutions, but the overall design ideas and the schemes worth borrowing are largely the same.
High concurrency design also adheres to the three principles of architecture design: simplicity, appropriateness, and evolution. "Premature optimization is the root of all evil": never detach from the actual business situation, and certainly do not over-design; the appropriate scheme is the most perfect one.