Load Balancing, Continued: Load Balancing in Trillion-Scale Traffic Scenarios

High Concurrency Optimization series (continuously updated)

1.1. Architecture Optimization: Cluster Deployment, Load Balancing
1.2. This article

Search for and follow the WeChat official account of the same name, [Coder’s Technical Road]. Earlier articles are compiled there and can be downloaded as PDFs. Bookmarks and discussion are welcome.

The first part listed the basics of load balancing. So how is load balancing actually done in real scenarios, especially those with trillion-scale traffic? This article introduces and discusses how load balancing is applied in Taobao's Double 11, the 12306 Spring Festival travel rush, WeChat red envelopes, and the Douyin Spring Festival Gala red envelopes.

Load Balancing under Alibaba Double 11 Traffic [1]

Double 11 traffic is characterized by a huge volume of requests arriving in pulses. It is a test of every service in the Alibaba ecosystem, which must provide:

Excellent performance: withstand the pulse traffic on the night of Double 11
Stable service: stay highly available despite device and network jitter
Imperceptible to users: smooth upgrades and disaster recovery (DR) switchover

Implementation principles

1) High performance relies on DPDK

Alibaba’s new-generation load balancer is built on DPDK [2]; its advantages are summarized in [3]. It is this high-performance, packet-oriented support that has allowed the load balancer to withstand the pulse traffic of Double 11 for many years.

2) Connection interruption caused by ECMP rerouting

Equal-Cost Multi-Path routing (ECMP) is a routing strategy that spreads traffic across multiple equal-cost paths, making the fullest use of the available shortest paths.

As shown in the preceding figure, SLB is deployed as a scale-out cluster. Multiple servers advertise the same route, forming ECMP routes on the switches to achieve high availability. However, if a server's hardware or network fails before its sessions have been synchronized, that server becomes unavailable. When ECMP reroutes, the connection lands on another server, so the existing connection is interrupted and user access fails. SLB uses a session synchronization mechanism to solve the problem of long-lived connections being interrupted during upgrades and DR scenarios, and uses multicast to handle machines going online and offline within that mechanism. See reference [1] for a detailed explanation.
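To make the failure mode and its fix concrete, here is a minimal sketch, assuming a simplified ECMP that picks a next hop by hashing the flow 5-tuple modulo the number of live SLB nodes (real switches may use other schemes, such as resilient hashing). The multicast group is simulated by an in-process list, and the names `SyncGroup`, `SlbNode`, and `survives_failover` are hypothetical; this is not Alibaba's implementation.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

FiveTuple = Tuple[str, int, str, int, str]


def flow_hash(flow: FiveTuple) -> int:
    """Deterministic hash of a 5-tuple (Python's built-in hash() is salted per process)."""
    key = "|".join(map(str, flow)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def ecmp_pick(flow: FiveTuple, next_hops: List[str]) -> str:
    """Simplified ECMP: hash the flow modulo the number of equal-cost next hops."""
    return next_hops[flow_hash(flow) % len(next_hops)]


class SyncGroup:
    """Stand-in for a multicast group: a published session reaches every other member."""

    def __init__(self) -> None:
        self.members: List["SlbNode"] = []

    def publish(self, sender: "SlbNode", flow: FiveTuple, backend: str) -> None:
        for node in self.members:
            if node is not sender:
                node.sessions[flow] = backend


class SlbNode:
    """Hypothetical SLB node with a session table mapping flow -> real server."""

    def __init__(self, name: str, backends: List[str], group: Optional[SyncGroup]) -> None:
        self.name = name
        self.backends = backends
        self.group = group
        self.sessions: Dict[FiveTuple, str] = {}
        self._rr = 0  # per-node round-robin cursor for new connections
        if group is not None:
            group.members.append(self)  # joining the group = coming online

    def handle(self, flow: FiveTuple, syn: bool) -> Optional[str]:
        """Return the real server for a packet, or None if the connection breaks."""
        if flow in self.sessions:
            return self.sessions[flow]
        if not syn:
            return None  # mid-connection packet with no session state
        backend = self.backends[self._rr % len(self.backends)]
        self._rr += 1
        self.sessions[flow] = backend
        if self.group is not None:
            self.group.publish(self, flow, backend)
        return backend


def survives_failover(with_sync: bool) -> bool:
    """Open a connection, fail the node that owns it, then replay a mid-connection packet."""
    group = SyncGroup() if with_sync else None
    nodes = {n: SlbNode(n, ["rs1", "rs2", "rs3"], group) for n in ("slb-a", "slb-b", "slb-c")}
    flow: FiveTuple = ("10.0.0.7", 51234, "203.0.113.10", 443, "tcp")
    owner = ecmp_pick(flow, list(nodes))
    first = nodes[owner].handle(flow, syn=True)
    failed = nodes.pop(owner)  # the owning node goes down
    if group is not None:
        group.members.remove(failed)  # leaving the group = going offline
    new_owner = ecmp_pick(flow, list(nodes))  # ECMP recomputes over the survivors
    return nodes[new_owner].handle(flow, syn=False) == first
```

Here `survives_failover(False)` returns `False`, because the surviving node has no session entry for a mid-connection packet, while `survives_failover(True)` returns `True`, because the session was replicated before the failure. The sketch does not cover a node that joins late and needs a full copy of existing sessions.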

Load Balancing at Railway 12306 [4]

12306 is so famous that it needs no introduction, and many of its scenarios and techniques are a good reference. However, I could only find a paper published in 2016, not the latest architecture deployment.

Business challenges of 12306

Dynamic inventory: remaining tickets can be split by station segment
Strongly consistent transactions: ticket orders are inherently transactional
Multi-dimensional data consistency: across online and offline ticketing channels
Traffic flood peaks: traffic surges sharply during holidays

The other issues are not discussed here; this section focuses on the role of load balancing in coping with the flood peaks.

The evolution of the 12306 architecture is as follows:

As the figure above shows, before the first optimization, performance bottlenecks appeared in almost every service along the link. Concurrent queries drove up the load on the query system, and user retries overloaded the AS. AS congestion increased response times, which in turn caused load problems at the WEB tier. The online pressure destabilized the entire ticketing system and disrupted the offline ticketing channels, until the whole link avalanched.

After the first optimization, a queuing system was introduced in which different trains use different queues, diverting the requests. The queuing system also applies dynamic flow control, releasing requests at a rate matched to the processing speed of each railway bureau's ticketing center. In addition, the passenger ticketing web services were split by different rules to balance the traffic load. This optimization carried 12306 successfully through the 2013 Spring Festival travel rush. However, with the rapid growth of the Internet and the rising number of online ticket sales, the architecture then hit bottlenecks in bandwidth, performance, and stability, so a second optimization was carried out. This article focuses on the practical role of load balancing in business scenarios, so the other optimization points are not discussed. It is precisely this multi-dimensional, multi-level load balancing that enables 12306 to withstand heavier traffic surges.
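The per-train queues and dynamic flow control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the 12306 design: the class `TrainQueueSystem`, the tick-based release loop, and the rule "next budget = requests the bureau processed last tick" are all hypothetical simplifications.

```python
from collections import deque
from typing import Deque, Dict, List


class TrainQueueSystem:
    """Hypothetical sketch: one FIFO queue per train, released at a per-bureau rate."""

    def __init__(self, train_to_bureau: Dict[str, str]) -> None:
        self.train_to_bureau = train_to_bureau
        self.queues: Dict[str, Deque[str]] = {t: deque() for t in train_to_bureau}
        # Requests each railway bureau may receive per tick; starts conservatively at 1.
        self.rate: Dict[str, int] = {b: 1 for b in set(train_to_bureau.values())}

    def enqueue(self, train: str, request_id: str) -> None:
        # Diversion: requests for different trains never share a queue.
        self.queues[train].append(request_id)

    def report_throughput(self, bureau: str, processed_last_tick: int) -> None:
        # Dynamic flow control: the next budget follows the bureau's observed speed.
        self.rate[bureau] = max(1, processed_last_tick)

    def tick(self) -> List[str]:
        """Release at most rate[bureau] queued requests per bureau for this tick."""
        budget = dict(self.rate)
        released: List[str] = []
        for train, q in self.queues.items():
            bureau = self.train_to_bureau[train]
            while q and budget[bureau] > 0:
                released.append(q.popleft())
                budget[bureau] -= 1
        return released
```

Within one bureau, this sketch drains trains in a fixed order; a real system would need fairness across trains, which is left out here for brevity.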

Load Balancing behind WeChat Red Envelopes [5]

At the start of 2017, WeChat announced that users sent and received 14.2 billion red envelopes on New Year’s Eve, with a peak of 760,000 per second.

Characteristics of a ten-billion-scale red envelope business:

Unlike an ordinary online store, a group red envelope is effectively a flash sale, so the concurrency requirements are higher.
Financial attributes: data inconsistency is not tolerated, and the security bar is higher.

So how is the WeChat red envelope system designed?

Vertical SETs: divide and conquer

If the services were split and deployed in the usual way, inventory would have to be locked to prevent over-issuing, and the massive lock contention would put enormous pressure on the DB. Even using a distributed lock in external storage to relieve the pressure up front only shifts the pressure; it does not reduce it.

The advantage of SET-based deployment is that all requests for the same red envelope are routed to the same SET, which splits the huge flood much like a reduce step. Different SETs do not affect each other, which greatly reduces resource contention between them. (This is in fact similar to the unitized deployment principle behind Alibaba’s RZone/GZone/CZone.)
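The entry-layer routing can be sketched as a stable hash of the red envelope ID modulo the number of SETs. This is a minimal sketch, assuming plain modulo hashing; `NUM_SETS`, `stable_hash`, and `route_to_set` are hypothetical names, and the source does not say which hash WeChat uses.

```python
import hashlib

NUM_SETS = 8  # hypothetical number of SETs


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def route_to_set(envelope_id: str, num_sets: int = NUM_SETS) -> int:
    """Entry layer: every send/grab request for one red envelope goes to the same SET."""
    return stable_hash(envelope_id) % num_sets
```

Because the route depends only on the envelope ID, all contention for one envelope stays inside one SET, while many different envelopes spread across all SETs. Note that plain modulo remaps most IDs if the SET count changes, so resizing would need care.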

Request queuing at the server layer

Lock contention arises because requests reaching the DB may be concurrent. If requests to the DB are guaranteed to arrive serially, there is no concurrency.

First, an ID hash ensures that all requests for the same red envelope are assigned to the same server, where the requests for each red envelope are queued. This guarantees that requests for the same red envelope reach the DB in order, reducing DB lock contention.
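One way to realize "hash by ID, then queue per envelope" is to hash each envelope ID onto one of a few single-consumer FIFO queues, so a given envelope is only ever processed by one worker thread, in arrival order. This is a minimal sketch under that assumption; `SerialGrabServer`, the in-memory `stock` dict standing in for the DB, and the worker count are hypothetical.

```python
import hashlib
import queue
import threading
from typing import Dict, List, Tuple


def stable_hash(key: str) -> int:
    """Process-independent hash of an envelope ID."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class SerialGrabServer:
    """Sketch: requests are hashed by envelope ID onto single-consumer FIFO queues,
    so requests for one envelope reach the (simulated) DB strictly one at a time."""

    def __init__(self, stock: Dict[str, int], num_workers: int = 4) -> None:
        self.stock = dict(stock)  # simulated DB rows: envelope_id -> remaining count
        self.results: List[Tuple[str, str, bool]] = []  # list.append is atomic in CPython
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.workers = [threading.Thread(target=self._drain, args=(q,)) for q in self.queues]
        for w in self.workers:
            w.start()

    def submit(self, envelope_id: str, user: str) -> None:
        # Same ID -> same queue -> same single worker thread -> serial processing.
        self.queues[stable_hash(envelope_id) % len(self.queues)].put((envelope_id, user))

    def _drain(self, q: "queue.Queue") -> None:
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            envelope_id, user = item
            # No row lock needed: only this thread ever touches envelope_id.
            ok = self.stock[envelope_id] > 0
            if ok:
                self.stock[envelope_id] -= 1
            self.results.append((envelope_id, user, ok))

    def close(self) -> None:
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()
```

With three units of stock and ten grabs for the same envelope, the first three requests in arrival order succeed and the remaining seven fail, with no over-issuing and no lock on the stock row.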

Two-dimensional database and table design

Because of the huge number of red envelopes, performance degrades once a single table's data reaches a certain size. Therefore, in addition to sharding databases and tables by red envelope ID, the data is also split into hot and cold by day, which preserves DB performance for the current day while allowing data to be migrated gracefully. At query time, database middleware routes each request to the right database and table.
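The two dimensions can be sketched as a router that picks the database and table from the envelope ID and a table suffix from the creation day. This is only an illustration of the idea; the shard counts, the `hb_db_XX.hb_t_YY_YYYYMMDD` naming, and the functions `shard_for` and `is_hot` are hypothetical and do not come from the source.

```python
import hashlib
from datetime import date

NUM_DBS = 10  # hypothetical shard counts
TABLES_PER_DB = 10


def shard_for(envelope_id: str, created: date) -> str:
    """Dimension 1: DB and table chosen by envelope ID. Dimension 2: one table per day."""
    h = int.from_bytes(hashlib.md5(envelope_id.encode()).digest()[:8], "big")
    db = h % NUM_DBS
    table = (h // NUM_DBS) % TABLES_PER_DB
    return f"hb_db_{db:02d}.hb_t_{table:02d}_{created:%Y%m%d}"


def is_hot(created: date, today: date) -> bool:
    """Only the current day's tables are hot; older day tables can move to cold storage."""
    return created == today
```

For example, the same envelope ID always maps to the same `hb_db_XX.hb_t_YY` prefix, while its day suffix separates today's hot rows from older cold ones, which is the kind of lookup the database middleware performs at query time.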

Conclusion

From a load balancing perspective, the entire red envelope architecture can be understood as three layers of load balancing. The first is the entry layer, which splits traffic into SETs to balance load across the whole SET cluster. The second is the server layer, which hashes requests by red envelope ID and processes each ID serially, balancing load within the server cluster. The third is the DB layer, where the two-dimensional database and table design preserves DB performance and balances data access.

Load Balancing behind the Douyin Spring Festival Gala Red Envelopes [6][7]

The previous sections covered the practical application of load balancing at the network layer, the architecture layer, and in internal design. This section focuses on the load balancing advantages of Service Mesh, the next-generation microservice technology used in Douyin's architecture.

What is a Service Mesh [8]

TCP was created to solve the problem of end-to-end byte-stream communication, making multi-machine communication simple and reliable. In the microservice era, Service Mesh emerged to hide the complexity of distributed systems and let developers get back to their business logic.

Load Balancing in Istio, a Service Mesh [9]

The Istio service mesh is logically divided into two parts: the control plane and the data plane. The data plane consists of a set of intelligent proxies (Envoy) deployed as sidecars, which mediate and control all network communication between microservices and Mixer. Depending on its configuration, Envoy can emit a variety of metrics and telemetry data.

Envoy acts as an intermediary in the network architecture and can add functions to traffic management, such as security, privacy protection, or load balancing policies. For service-to-service calls, the proxy hides the details of the backend topology from the client, simplifies interactions, and protects backend services from overload. It can also discover all members of a cluster, determine their health through active health checks, and then, based on that health status, use a load balancing policy to decide which member receives each request.
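The "health check, then load balancing policy" flow can be sketched as a balancer that only considers healthy hosts and, among them, picks the less-loaded of two random choices (power of two choices). This is similar in spirit to the least-request policy Envoy offers, but it is a hypothetical illustration, not Envoy's code; `Host`, `LeastRequestBalancer`, and `update_health` are made-up names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    address: str
    healthy: bool = True
    active_requests: int = 0


class LeastRequestBalancer:
    """Sketch of a sidecar-style balancer: only hosts passing health checks are eligible,
    and among them the less-loaded of two random picks wins (power of two choices)."""

    def __init__(self, hosts: List[Host], seed: Optional[int] = None) -> None:
        self.hosts = hosts
        self.rng = random.Random(seed)

    def update_health(self, address: str, healthy: bool) -> None:
        # Feed in the result of an active health check.
        for h in self.hosts:
            if h.address == address:
                h.healthy = healthy

    def pick(self) -> Optional[Host]:
        candidates = [h for h in self.hosts if h.healthy]
        if not candidates:
            return None  # no healthy member to route to
        if len(candidates) == 1:
            return candidates[0]
        a, b = self.rng.sample(candidates, 2)
        return a if a.active_requests <= b.active_requests else b
```

Sampling two hosts instead of scanning all of them keeps each pick cheap while still steering traffic away from busy members, and hosts that fail health checks are never selected.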

Conclusion

From a practical perspective, this article selected four typical cases showing how load balancing is applied at the network layer, the architecture layer, and in microservice development. I hope it helps with your work and study. Feel free to follow and send a private message to discuss.

References

[1] Supporting Double 11 with high-performance load balancing: www.aliyunhn.com/Home/Articl…

[2] Understanding DPDK: cloud.tencent.com/developer/a…

[3] Introduction to DPDK technology: www.jianshu.com/p/86af81a10…

[4] Optimization and evolution of the 12306 Internet ticketing system architecture: Railway Computer Application (journal)

[5] Design of a ten-billion-scale WeChat red envelope system: www.infoq.cn/article/201…

[6] Behind the scenes of the Douyin Spring Festival Gala: www.volcengine.cn/docs/6360/6…

[7] Behind the ten-billion-scale interactions of the Douyin Spring Festival Gala red envelopes: www.163.com/dy/article/…

[8] What is a Service Mesh: zhuanlan.zhihu.com/p/61901608

[9] Analysis of the Service Mesh Istio architecture: developer.aliyun.com/article/759…
