High concurrency

With the rise of distributed systems, high concurrency usually refers to designing a system that can handle many requests in parallel. In plain terms, high concurrency means many users accessing the same API or URL at the same time. It typically arises in business scenarios with a large number of active users whose activity is concentrated in a short window.

Before discussing the architecture of a high-concurrency system, it is important to clarify the level of concurrency, because different levels call for different architectures. For first-tier Internet companies, the volume of concurrent traffic is enormous; product-detail page views in particular are heavy enough to be split out as a separate module. Beyond that, there are scenarios such as 12306 (the train-ticket site) and examination systems, where the high concurrency comes in instantaneous bursts.

Here we mainly discuss the high-concurrency architecture used by first-tier companies such as Alibaba and Tencent.

Note: tiers of concurrency

  1. First tier: Alibaba, Tencent, JD.com…
  2. Second tier: Meituan, 58.com…
  3. Third tier: Guazi.com…

Data consistency under high concurrency

In addition to the level of concurrency, another thing that needs to be clarified up front is the requirement for data consistency.

Data consistency is one of the core problems in handling high concurrency. Whether the data needs to be consistent at all, whether it must be strongly consistent or only eventually consistent, how the data is distributed, how it is synchronized, and what response time is acceptable should all be considered in advance.

  • Strong consistency: higher performance cost
  • Eventual consistency: updates are applied with a delay, at a lower performance cost (a minimal sketch follows this list)
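A minimal sketch of the eventual-consistency option, assuming a Jedis client and a hypothetical `OrderDao` (neither is prescribed by the original text): the database is updated first and the cached copy is invalidated asynchronously, so readers may briefly see stale data.

```java
import redis.clients.jedis.Jedis;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Eventual consistency: write the source of truth first, invalidate the cache later.
public class EventuallyConsistentWriter {
    private final Jedis jedis = new Jedis("redis-host", 6379);      // assumed Redis address
    private final ExecutorService async = Executors.newSingleThreadExecutor();
    private final OrderDao orderDao = new OrderDao();                // hypothetical DAO

    public void updateOrder(long orderId, String newStatus) {
        orderDao.updateStatus(orderId, newStatus);                   // 1. update the database
        async.submit(() -> jedis.del("order:" + orderId));           // 2. evict the cache asynchronously;
                                                                     //    readers may see stale data briefly
    }
}

class OrderDao {                                                     // stub standing in for the real data layer
    void updateStatus(long id, String status) { /* UPDATE orders SET status = ? WHERE id = ? */ }
}
```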

Peak traffic clipping

From the business point of view, a flash-sale event wants as many people as possible to participate: the more people browsing the product before the sale starts, the better.

However, once the sale starts and users begin placing orders, the server side does not want to face millions of simultaneous requests.

We all know that server processing resources are limited, so a traffic peak can easily bring the servers down and leave users unable to access the service.

This is just like the morning and evening rush hours in commuting; to solve that problem, transportation systems rely on staggered peak hours and travel restrictions.

Similarly, a comparable solution is needed to survive the traffic peak caused by everyone buying at once. This is known as peak traffic clipping (peak shaving).
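One common way to clip the peak, sketched below under assumptions not stated in the article (a Kafka topic named `seckill-orders`, a broker at `kafka-host:9092`, and string payloads): the web tier accepts each order request, drops it onto a message queue, and lets the order service consume at a rate it can sustain.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

// Peak clipping: buffer flash-sale orders in a queue instead of hitting the order service directly.
public class OrderPeakShaver {
    private final KafkaProducer<String, String> producer;

    public OrderPeakShaver() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-host:9092");          // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Return to the user immediately after enqueueing; downstream consumers drain the topic at their own pace.
    public void submitOrder(String userId, String productId) {
        producer.send(new ProducerRecord<>("seckill-orders", userId, productId));
    }
}
```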

Multi-level cache architecture for hundred-million-level traffic

When high-volume traffic arrives, it should be peak-clipped first. We can use Nginx as a reverse-proxy layer and have it proxy to dozens or hundreds of back-end servers for load balancing.

With this architecture, Nginx as the load balancer must itself be able to withstand the concurrent requests first; otherwise the architecture is not viable.

The officially cited figure for the maximum number of concurrent requests a single Nginx instance can handle is around 50,000.

In other words, the architecture above is no longer applicable once the number of concurrent requests exceeds 50,000.

When concurrency exceeds 50,000, or when traffic reaches the hundred-million level, the following architecture is generally adopted.

First, split the business into modules. Each module is a service line, and different service lines are deployed in multiple data centers.

For example, dividing domain names:

  • qq.com
  • down.qq.com
  • games.qq.com
  • … …

Each domain name is a service line and corresponds to its own group of servers.

Entry layer

Then put a CDN in front of each domain name and let the CDN cache static resources as well as small, low-consistency dynamic content.

For dynamic content such as search result pages, we can relax data consistency: cache the first few pages of results in the CDN and set up a scheduled job that refreshes the cache every five minutes, as sketched below.
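A minimal sketch of such a refresh job, assuming hypothetical `renderSearchPage` and `publishToCache` helpers (how the pages are rendered and where the CDN pulls them from is not specified in the article):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Scheduled job: re-render the first few search result pages every five minutes.
public class SearchPageRefresher {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleAtFixedRate(this::refreshHotPages, 0, 5, TimeUnit.MINUTES);
    }

    private void refreshHotPages() {
        for (int page = 1; page <= 3; page++) {                      // only the first few pages are cached
            String html = renderSearchPage(page);                    // hypothetical: query the search backend
            publishToCache("search:page:" + page, html);             // hypothetical: push to the CDN origin
        }
    }

    private String renderSearchPage(int page) { return "<html>page " + page + "</html>"; }
    private void publishToCache(String key, String html) { /* write to the origin store the CDN pulls from */ }
}
```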

The CDN can greatly reduce the concurrency that reaches the origin servers.

In addition, processed data can be placed in a cache, namely a Redis cluster: each request reads from the cache first, and on a miss the data is computed and then written back into the Redis cluster.

For example, the results of common SQL queries can be written into the Redis cluster with the query as the key, reducing pressure on the database.
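A minimal cache-aside sketch of this idea, assuming a Jedis client, a hypothetical `queryDatabase` method, and an illustrative key format and TTL:

```java
import redis.clients.jedis.Jedis;

// Cache-aside keyed by the query: hit Redis first, fall back to the database on a miss.
public class QueryCache {
    private final Jedis jedis = new Jedis("redis-host", 6379);       // assumed Redis address

    public String getProductList(String category) {
        String key = "sql:SELECT * FROM product WHERE category=" + category;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                                            // cache hit: the database is not touched
        }
        String result = queryDatabase(category);                      // cache miss: run the query once
        jedis.setex(key, 300, result);                                // write back with a 5-minute TTL
        return result;
    }

    private String queryDatabase(String category) { /* execute the real SQL */ return "[]"; }
}
```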

As we know, a Controller runs inside Tomcat; the Controller calls the Service, and the Service ultimately calls the Redis cluster. In this setup, Tomcat's concurrency should not exceed roughly 200, otherwise problems such as stalling appear.
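For reference, a minimal sketch of that Controller → Service → Redis chain using Spring MVC annotations and a Jedis connection pool; the endpoint, key format, and Redis address are illustrative assumptions rather than details from the article.

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// A Tomcat worker thread enters the Controller, which calls the Service, which reads Redis.
@RestController
class ProductController {
    private final ProductService service = new ProductService();

    @GetMapping("/product/{id}")
    public String detail(@PathVariable String id) {
        return service.getDetail(id);
    }
}

class ProductService {
    private final JedisPool pool = new JedisPool("redis-host", 6379);    // assumed Redis address

    public String getDetail(String id) {
        try (Jedis jedis = pool.getResource()) {
            return jedis.get("product:" + id);                            // the Service finally hits the Redis cluster
        }
    }
}
```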

Access layer

Behind the CDN, two LVS nodes can be added in a master-backup configuration, with Keepalived added to achieve high availability (HA).

Behind LVS, an Nginx cluster joins as a reverse proxy to distribute traffic.

The application layer

Behind the access-layer Nginx cluster, you can also add an application-layer Nginx cluster, with a primary/secondary pair of application-layer Nginx instances behind each access-layer Nginx.

On the application-layer Nginx, Lua scripts can be embedded to access the Redis cluster, or a Kafka message queue can be added in front of the Redis cluster as a further layer of peak shaving.

In addition, a Lua-level local cache can be added in the application-layer Nginx so that hot data is returned to the client directly, reducing socket requests to Redis and the associated performance overhead.
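The article has in mind a Lua shared-dict cache inside Nginx (OpenResty); as an analogous sketch in Java, a small in-process cache can be consulted before Redis so that hot reads avoid a socket round trip. The TTL and Redis address below are assumptions.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import java.util.concurrent.ConcurrentHashMap;

// Local cache in front of Redis: hot keys are served from process memory, cold keys fall back to Redis.
public class LocalThenRedisCache {
    private static final long TTL_MS = 5_000;                             // assumed local TTL
    private record Entry(String value, long expiresAt) {}

    private final ConcurrentHashMap<String, Entry> local = new ConcurrentHashMap<>();
    private final JedisPool pool = new JedisPool("redis-host", 6379);     // assumed Redis address

    public String get(String key) {
        Entry e = local.get(key);
        if (e != null && e.expiresAt() > System.currentTimeMillis()) {
            return e.value();                                              // served locally, no socket request
        }
        try (Jedis jedis = pool.getResource()) {
            String value = jedis.get(key);                                 // fall back to Redis
            if (value != null) {
                local.put(key, new Entry(value, System.currentTimeMillis() + TTL_MS));
            }
            return value;
        }
    }
}
```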

Then hash the URL on the access-layer Nginx so that a given URL is always routed to the same application-layer Nginx, and strip anchors (the #fragment part) from the URL before hashing.
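In Nginx itself this is done with a `hash` directive in the upstream block; the sketch below only illustrates the idea in Java, with assumed node addresses: drop the fragment, hash the path, and always map the same URL to the same application-layer Nginx so its local cache stays hot.

```java
import java.net.URI;
import java.util.List;

// URL hashing: the same URL always lands on the same application-layer Nginx node.
public class UrlHashRouter {
    private final List<String> appNginx = List.of("10.0.0.11", "10.0.0.12", "10.0.0.13");  // assumed nodes

    public String pick(String rawUrl) {
        URI uri = URI.create(rawUrl);
        String key = uri.getPath();                                      // anchors (#fragment) and query are ignored
        int index = Math.floorMod(key.hashCode(), appNginx.size());     // deterministic: same URL -> same node
        return appNginx.get(index);
    }
}
```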

Difficulties in this architecture

  1. Synchronizing the application-layer Nginx cache when data in the Redis cluster is updated (see the sketch after this list)
  2. How the microservice cluster updates data in Redis
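One common way to address difficulty 1 (not prescribed by the article) is cache invalidation over Redis pub/sub: whichever service updates the Redis cluster publishes the affected key, and every node holding a local cache subscribes and evicts its copy. The channel name and `LocalCache` stub below are assumptions.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

// Cache synchronization: evict the local copy whenever an invalidation message arrives.
public class CacheInvalidationListener extends JedisPubSub {
    @Override
    public void onMessage(String channel, String key) {
        LocalCache.evict(key);                                            // hypothetical local cache on this node
    }

    public static void main(String[] args) {
        Jedis subscriber = new Jedis("redis-host", 6379);                 // assumed Redis address
        subscriber.subscribe(new CacheInvalidationListener(), "cache-invalidation");  // blocks and listens
    }
}

class LocalCache {
    static void evict(String key) { /* remove the key from this node's local cache */ }
}
```

The writer side simply calls `jedis.publish("cache-invalidation", key)` after updating Redis.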

The Zuul gateway

The access layer and application layer described above already act as a gateway, but you can also add a Zuul gateway behind the access layer.

To further spread the concurrency, the gateway itself can be clustered, with different Nginx instances sending requests to different gateway nodes.
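A minimal Zuul gateway sketch using Spring Cloud Netflix; the route configuration (for example, mapping /order/** to an order service) would live in application.yml, and any service names are assumptions.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

// Zuul gateway: forwards incoming requests to back-end services according to its route table.
@EnableZuulProxy
@SpringBootApplication
public class GatewayApplication {
    public static void main(String[] args) {
        SpringApplication.run(GatewayApplication.class, args);
    }
}
```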

The registry

Behind the gateway, add a registry such as Eureka or ZooKeeper and register the microservice cluster with it.

Through the registry, requests from the gateway are routed to the microservice cluster; the microservices call one another as needed, return the results to the gateway, and the gateway passes the results back to the front-end Nginx.
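A minimal sketch of a microservice registering itself with Eureka so the gateway can discover it; the service name is an assumption, and the Eureka server address would go in application.yml (eureka.client.service-url.defaultZone).

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

// On startup this service registers with the Eureka registry; the gateway looks it up there.
@EnableEurekaClient
@SpringBootApplication
public class OrderServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(OrderServiceApplication.class, args);
    }
}
```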

This is a coarse-grained, high-concurrency architecture for read requests.