High concurrency is something almost every programmer wants hands-on experience with. The reason is simple: as traffic grows, all kinds of technical problems appear, such as interface timeouts, rising CPU load, frequent GC pauses, deadlocks, and massive data volumes, and solving them pushes us to go deeper technically.

In past interviews, if a candidate's projects involved high concurrency, I usually asked them to talk about their understanding of it. Few could answer systematically; the gaps fall into the following categories:

1. No sense of the key metrics: they do not know which indicators to use to measure a high-concurrency system, cannot tell concurrency from QPS, and often do not even know their system's total users, active users, or off-peak and peak QPS and TPS.

2. Designed some solutions but never mastered the details: they cannot explain the points that need attention or the side effects of a scheme. For example, a cache is introduced when read performance becomes the bottleneck, but issues such as cache hit ratio, hot keys, and data consistency are ignored.

3. A one-sided view that equates high-concurrency design with performance optimization: all about concurrent programming, multi-level caching, asynchronization, and horizontal scaling, while ignoring high-availability design, service governance, and operations support.

4. Know the big-picture schemes but neglect the basics: they can clearly explain ideas such as vertical layering, horizontal partitioning, and caching, yet never think to check whether the data structures are reasonable or the algorithms efficient, and never consider optimizing from the two most fundamental dimensions: I/O and computation.

In this article, I will combine my experience with high-concurrency projects to systematically summarize the knowledge and practical approaches they require, and I hope it helps you. The content is divided into three parts:

  • How to understand high concurrency?
  • What are the goals of high concurrency system design?
  • What are the high concurrency practices?

01 How to Understand High Concurrency?

High concurrency means heavy traffic, so technical means are needed to withstand the impact of that traffic. These means act like channels regulating a flow of water: they let the system process the traffic more smoothly and give users a better experience.

Common high-concurrency scenarios include Taobao's Double 11, ticket grabbing during the Spring Festival travel rush, and trending posts from Weibo celebrities. Beyond these typical cases, seckill (flash-sale) systems handling hundreds of thousands of requests per second, order systems processing tens of millions of orders per day, and information feed systems serving hundreds of millions of requests per day can all be classified as high concurrency.

Clearly, the concurrency in the scenarios above varies widely, so how much concurrency counts as high?

1. Don't just look at numbers; look at the specific business scenario. You cannot say that 100,000 QPS counts as high concurrency while 10,000 QPS does not. An information feed scenario involves complex recommendation models and all kinds of human-defined policies, and its business logic may be more than ten times as complex as a seckill scenario's. They are not in the same dimension, so there is no direct comparison.

2. The business is built from 0 to 1; concurrency and QPS are only reference indicators. What matters most is this: as the business grows to 10 or 100 times its original size, do you use high-concurrency methods to evolve your system, preventing and solving the problems high concurrency brings across architecture design, coding, and even the product solution itself, rather than blindly upgrading hardware and adding machines for horizontal scaling?

In addition, the business characteristics of each high-concurrency scenario are completely different: some are read-heavy, write-light feed scenarios, others are transaction scenarios with heavy reads and heavy writes. Is there a universal technical solution that solves the high-concurrency problem in every scenario?

I think the big ideas and other people's solutions can serve as references, but in the actual implementation there will be countless pitfalls in the details. Moreover, because hardware and software environments, technology stacks, and product logic are never exactly the same, even the same technical solution applied to the same business scenario will run into different problems, and these pitfalls have to be worked through one by one.

Therefore, in this article I will focus on the basic knowledge, general ideas, and effective experience I have gained in practice, hoping to give you a deeper understanding of high concurrency.

02 What Are the Goals of High-Concurrency System Design?

Only once the goals of high-concurrency system design are clear does it make sense to discuss design schemes and practical experience in a targeted way.

2.1 Macro Goals

High concurrency is by no means only about pursuing high performance, which is a one-sided understanding many people hold. Viewed from a macro level, high-concurrency system design has three goals: high performance, high availability, and high scalability.

1. High performance: performance reflects the system's parallel processing capability. With limited hardware investment, improving performance means saving cost. Performance also shapes user experience: response times of 100 milliseconds and 1 second give the user completely different experiences.

2. High availability: the proportion of time the system can serve normally. One system runs all year without interruption or failure; another suffers incidents and downtime every few days; users will certainly choose the former. Moreover, if a system is only 90% available, it will drag the business down significantly.

3. High scalability: the system's capacity to scale. During traffic peaks, can capacity be expanded quickly to absorb the peak smoothly, for example during the Double 11 promotion or hot events such as a celebrity divorce?


These three goals need to be considered as a whole because they are interrelated and even affect each other.

For example, considering the scalability of the system, you might design services to be stateless. This cluster design ensures high scalability, but also indirectly improves the performance and availability of the system.

Another example: to ensure availability, it is common to set timeouts on service interfaces so that slow requests do not block large numbers of threads and cause an avalanche. How do you choose a reasonable timeout? In general, it is set based on the measured performance of the service you depend on.

2.2 Micro Goals

From a micro point of view, which specific metrics measure high performance, high availability, and high scalability? And why choose these metrics?

2.2.1 Performance Metrics

Performance metrics reveal existing performance problems and serve as the basis for evaluating optimizations. Generally, interface response time over a period of time is used as the metric.

1. Average response time: the most commonly used, but its flaw is obvious: it is insensitive to slow requests. For example, out of 10,000 requests, if 9,900 take 1 ms and 100 take 100 ms, the average response time is 1.99 ms. The average rises by only 0.99 ms, yet the response time of 1% of the requests has grown 100-fold.

2. TP90 and TP99 quantile values: response times are sorted from smallest to largest, and TP90 is the response time at the 90th percentile. The higher the quantile, the more sensitive it is to slow requests (see the sketch below).


3. Throughput: inversely related to response time. For example, if the average response time is 1 ms, the throughput is 1,000 requests per second.
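
To make these metrics concrete, here is a minimal Java sketch (the data reuses the 10,000-request example above; the class and method names are illustrative) that computes the average and TP quantile values from a batch of response-time samples and shows why the average alone is misleading:

```java
import java.util.Arrays;

public class LatencyStats {

    /** Returns the TP value (e.g. p = 90, 99, 99.9) from a batch of response times in ms. */
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // 9,900 fast requests (1 ms) and 100 slow ones (100 ms), as in the example above
        long[] samples = new long[10_000];
        Arrays.fill(samples, 0, 9_900, 1L);
        Arrays.fill(samples, 9_900, 10_000, 100L);

        double avg = Arrays.stream(samples).average().orElse(0);
        System.out.println("AVG   = " + avg + " ms");                       // 1.99 ms, hides the slow tail
        System.out.println("TP99  = " + percentile(samples, 99) + " ms");   // 1 ms: exactly 1% are slow
        System.out.println("TP999 = " + percentile(samples, 99.9) + " ms"); // 100 ms: the tail shows up
        // Throughput is the flip side of latency: one worker at 1.99 ms per request
        // handles roughly 1000 / 1.99, i.e. about 500 requests per second.
    }
}
```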

Typically, throughput and response time are taken into account when setting performance goals, such as AVG under 50ms and TP99 under 100ms at 10,000 requests per second. For high concurrency systems, AVG and TP quantile values must be considered simultaneously.

In addition, from a user experience perspective, 200 milliseconds is considered the first cut-off point where the user will not feel the delay, and 1 second is the second cut-off point where the user will feel the delay but will accept it.

Therefore, for a healthy high concurrency system, TP99 should be controlled within 200 ms, and TP999 or TP9999 should be controlled within 1 second.

2.2.2 Availability Metrics

High availability means the system has a strong ability to run without failure. Availability = uptime / total running time, and it is usually described by a number of nines.


For high-concurrency systems, the basic requirement is three or four nines. The reason is simple: if you can only achieve two nines, the system is down 1% of the time. For a large company generating over $100 billion in GMV or revenue per year, that 1% translates into a billion-scale business impact.
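
To make the "nines" concrete, here is a tiny sketch that converts an availability target into allowed downtime per year: two nines is roughly 3.7 days of downtime, three nines roughly 8.8 hours, and four nines roughly 53 minutes.

```java
public class AvailabilityNines {
    public static void main(String[] args) {
        double minutesPerYear = 365 * 24 * 60;          // 525,600 minutes
        double[] targets = {0.99, 0.999, 0.9999};       // two, three and four nines
        for (double availability : targets) {
            double downtimeMinutes = (1 - availability) * minutesPerYear;
            System.out.printf("%.2f%% available -> about %.0f minutes of downtime per year%n",
                    availability * 100, downtimeMinutes);
        }
        // 99%    -> ~5256 min (about 3.7 days)
        // 99.9%  -> ~526 min  (about 8.8 hours)
        // 99.99% -> ~53 min
    }
}
```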

2.2.3 Scalability Metrics

In the face of a sudden traffic burst, you cannot rework the architecture on the spot; the fastest way is to add machines so that the system's processing capacity grows linearly.

For a business cluster or a base component, scalability = performance growth ratio / machine growth ratio. Ideally, adding several times the resources yields several times the performance; in practice, scalability should be kept above 70%. For example, if the machines are doubled but peak throughput only rises by 60%, the scalability is 60% / 100% = 60%.

But from the perspective of a high-concurrency system's overall architecture, the goal of scalability is more than designing stateless services: when traffic grows 10x, the business services may scale out 10x quickly, but the database may become the new bottleneck.

Stateful storage services such as MySQL are usually the hard part of scaling; if the architecture is not planned ahead (vertical and horizontal splitting), scaling will involve migrating large amounts of data.

Therefore, high scalability needs to be considered: service clusters, databases, middleware such as caches and message queues, load balancers, bandwidth, and dependent third parties. When the concurrency reaches a certain level, each of these factors may become the bottleneck point of expansion.

03 What Are the High-Concurrency Practices?

With the three goals of high-concurrency design understood, I will summarize the design approach systematically in two parts: first the general design methods, then specific practices organized around high performance, high availability, and high scalability.

3.1 General Design Methods

The general design methods work along two dimensions, "vertical" and "horizontal", commonly known as the two axes of high-concurrency handling: vertical scaling and horizontal scaling.

3.1.1 Vertical Scaling (Scale-up)

Its goal is to improve the processing capacity of a single machine, and the solution includes:

1. Improve single-machine hardware: add memory, CPU cores, and storage capacity, or upgrade disks to SSDs; in short, throw better hardware at the problem.

2. Improve single-machine software performance: use caching to reduce the number of I/O operations, and use concurrency or asynchrony to increase throughput.

3.1.2 Horizontal Scaling (Scale-out)

Because single-machine performance always has a ceiling, horizontal scaling is eventually needed: cluster deployment further improves concurrent processing capability. It includes the following two directions:

1. Layered architecture: this is the prerequisite for horizontal scaling. High-concurrency systems tend to have complex business logic, and layering simplifies the problem and makes horizontal scaling easier.


The most common Internet layered architecture runs from the reverse proxy layer, through the Web layer and the business service layer, down to the storage layer, and a real high-concurrency architecture refines it further. For example, the reverse proxy layer can be LVS + Nginx, the Web layer can be a unified API gateway, the business service layer can be split into microservices along business verticals, and the storage layer can consist of various heterogeneous databases.

2. Horizontal scaling at each layer: stateless scale-out for services and sharded routing for stateful components. Service clusters can be made stateless, while databases and caches are stateful, so a partition key must be designed to shard the storage. Read performance can also be improved through primary-replica synchronization and read/write separation (see the sketch below).
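
As a minimal sketch of stateful shard routing (the partition key, the shard counts, and the naming are illustrative assumptions, not a prescribed scheme), all records belonging to one key can be routed to a fixed database and table:

```java
public class ShardRouter {
    private static final int DB_COUNT = 4;      // number of physical databases (illustrative)
    private static final int TABLE_COUNT = 8;   // number of order tables per database (illustrative)

    /** Maps a partition key (here, a user id) to a concrete database and table. */
    static String route(long userId) {
        int hash = Long.hashCode(userId) & 0x7fffffff;   // non-negative hash of the partition key
        int dbIndex = hash % DB_COUNT;
        int tableIndex = (hash / DB_COUNT) % TABLE_COUNT;
        return "db_" + dbIndex + ".order_" + tableIndex;
    }

    public static void main(String[] args) {
        // All data for one user lands in the same shard, so single-user queries stay on one node.
        System.out.println(route(123456789L));   // prints db_1.order_5 for this key
    }
}
```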

3.2 Specific Practices

Based on my personal experience, I will summarize practical solutions for high performance, high availability, and high scalability.

3.2.1 High-Performance Practices

1. Cluster deployment, with load balancing to reduce the pressure on any single node.

2. Multi-level caching: CDN for static data, plus local caches and distributed caches, while handling issues such as hot keys, cache penetration, concurrent cache rebuilds, and data consistency (see the first sketch after this list).

3. Database and table sharding plus index optimization, and using a search engine to handle complex queries.

4. Consider NoSQL databases such as HBase and TiDB, but the team must be familiar with these components and have strong operations capability.

5. Asynchronization: handle secondary flows asynchronously via multi-threading, MQ, or even delayed tasks.

6. Rate limiting at the front end, at the Nginx access layer, and on the server side; first confirm that the business allows it (it is acceptable in seckill scenarios, for example).

7. Peak shaving and valley filling, using MQ to absorb traffic bursts.

8. Concurrent processing: parallelize serial logic with multiple threads.

9. Pre-computation: in a red-envelope grabbing scenario, for example, the envelope amounts can be computed in advance and cached, then used directly when the envelopes are handed out.

10. Cache warming: preload data into the local or distributed cache via asynchronous tasks.

11. Reduce the number of I/O operations: for example, batch reads and writes for the database and cache, batch RPC interfaces, or eliminating RPC calls by keeping redundant data locally.

12. Reduce the size of each I/O payload: use lightweight communication protocols and suitable data structures, remove redundant fields from interfaces, shorten cache keys, and compress cache values.

13. Program logic optimization: for example, move up the checks most likely to short-circuit the execution flow, optimize the computation inside for loops, or adopt more efficient algorithms.

14. Use various pooling techniques and size the pools properly, including HTTP connection pools, thread pools (set the core parameters according to whether the work is CPU-bound or IO-bound; see the second sketch after this list), and database and Redis connection pools.

15. JVM tuning, including the sizes of the young and old generations and the choice of GC algorithm, to minimize GC frequency and pause time.

16. Lock selection: use optimistic locking in read-heavy, write-light scenarios, or consider segmented locks to reduce lock contention.
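
Two minimal Java sketches for the list above. The first illustrates item 2's multi-level read path with a null-marker guard against cache penetration; plain maps stand in for a real local cache (for example Caffeine) and a distributed cache (for example Redis), and all names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ProductCache {
    // Level 1: in-process cache (a real system might use Caffeine with size and TTL limits).
    private final Map<Long, String> localCache = new ConcurrentHashMap<>();
    // Level 2: stand-in for a distributed cache such as Redis; the map is only a placeholder here.
    private final Map<Long, String> distributedCache = new ConcurrentHashMap<>();

    // Cached "not found" marker: stops cache penetration, where lookups for missing ids
    // would otherwise bypass both cache levels and hit the database every time.
    private static final String NULL_MARKER = "__NULL__";

    public String getProduct(long id) {
        String value = localCache.get(id);
        if (value == null) {
            value = distributedCache.get(id);
            if (value == null) {
                String fromDb = loadFromDb(id);                    // slow path: query the database
                value = (fromDb == null) ? NULL_MARKER : fromDb;   // cache misses as well
                distributedCache.put(id, value);
            }
            localCache.put(id, value);
        }
        return NULL_MARKER.equals(value) ? null : value;
    }

    private String loadFromDb(long id) {
        return null;   // stub: a real implementation would run a SQL query and may return null
    }
}
```

The second illustrates item 14: sizing thread pools differently for CPU-bound and IO-bound work. The multipliers and queue sizes are illustrative starting points, not recommendations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Pools {
    private static final int CORES = Runtime.getRuntime().availableProcessors();

    // CPU-bound work: roughly one thread per core; a small queue and AbortPolicy fail fast under overload.
    static final ExecutorService CPU_POOL = new ThreadPoolExecutor(
            CORES, CORES, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(200),
            new ThreadPoolExecutor.AbortPolicy());

    // IO-bound work: many more threads than cores, since most of them sit waiting on the network or disk;
    // CallerRunsPolicy applies back-pressure instead of dropping tasks.
    static final ExecutorService IO_POOL = new ThreadPoolExecutor(
            CORES * 4, CORES * 8, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1000),
            new ThreadPoolExecutor.CallerRunsPolicy());
}
```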

The solutions above are essentially about examining every possible optimization point along the two dimensions of computation and I/O. You also need a supporting monitoring system that shows current performance in real time and helps you locate the bottleneck, then apply the 80/20 rule and attack the main problem first.

3.2.2 High-Availability Practices

1. Failover of peer nodes: both Nginx and service-governance frameworks support routing to another node after one node fails.

2. Failover of non-peer nodes via heartbeat detection and primary/replica switchover (for example Redis Sentinel or Cluster mode, and MySQL primary/replica failover).

3. Interface-level timeout settings, retry strategies, and idempotent design (see the sketch after this list).

4. Degradation: protect core services, sacrifice non-core ones, and trip circuit breakers when necessary; keep backup paths in case the core path fails.

5. Rate limiting: directly reject requests beyond the system's processing capacity, or return error codes for them.

6. Message reliability in MQ scenarios: producer-side retries, broker-side persistence, and consumer-side ack mechanisms.

7. Gray (canary) release: deploy to a small share of traffic by machine, watch system logs and business metrics, and roll out fully only once everything runs smoothly.

8. Monitoring and alerting: an all-round monitoring system covering basic CPU, memory, disk, and network metrics, plus the Web server, JVM, database, all kinds of middleware, and business metrics.

9. Disaster drills: similar to today's chaos engineering, apply destructive actions to the system and observe whether localized failures cause availability problems.
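
As a rough illustration of items 3 and 4 (timeouts, bounded retries, and degradation to a fallback), here is a minimal Java sketch; the pool size, timeout, and fallback value are made up, and retries like this are only safe when the downstream call is idempotent:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class RetryWithTimeout {
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    /** Runs the call with a per-attempt timeout; after maxAttempts failures, returns the fallback (degradation). */
    static <T> T call(Supplier<T> remoteCall, long timeoutMs, int maxAttempts, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<T> future = POOL.submit(remoteCall::get);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                future.cancel(true);                // give up on the slow or failed attempt, then retry
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // caller was interrupted: stop retrying
                break;
            }
        }
        return fallback;                            // degrade to a default instead of cascading the failure
    }

    public static void main(String[] args) {
        // Example: 200 ms timeout, at most 2 attempts, empty string as the degraded fallback.
        String result = call(() -> "data from a downstream service", 200, 2, "");
        System.out.println(result);
        POOL.shutdown();
    }
}
```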

High-availability solutions are mainly about three things: redundancy, trade-offs, and system operations. They also require a supporting on-call rotation and incident-handling process so that online problems are dealt with promptly.

3.2.3 High-Scalability Practices

1. A sensible layered architecture: for example, the common Internet layered architecture mentioned above; in addition, microservices can be layered at a finer granularity into a data access layer and a business logic layer (though the performance impact must be evaluated, since this may add a network hop).

2. Storage-layer splitting: split vertically along the business dimension, then split horizontally along data characteristics (database and table sharding).

3. Business-layer splitting: most commonly by business dimension (for example product services and order services), by core versus non-core interfaces, or by request source (To C versus To B, app versus H5).

Final Words

High concurrency is indeed a complex, systemic problem. Due to space constraints, points such as distributed tracing, full-link load testing, and flexible (BASE-style) transactions are not covered here even though they also need consideration. In addition, different business scenarios call for different high-concurrency implementations, but the overall design ideas and the schemes worth referencing are largely similar.

High-concurrency design also follows the three principles of architecture: simplicity, suitability, and evolution. "Premature optimization is the root of all evil": never design in isolation from the actual business situation, and never over-engineer; the suitable solution is the most perfect one.

I hope this article gives you a more comprehensive understanding of high concurrency, and if you have some useful experiences and in-depth thinking, please leave a comment in the comments section.

About the author: master's degree from a 985 university, former Amazon engineer, now a technical director at 58.com.

Feel free to follow my personal WeChat official account: IT career advancement.