What the hell is load balancing? Literally, it has two parts: load and balancing. For a system, each carries its own meaning: the load is the maximum traffic the system can bear, and balancing means that front-end requests are distributed evenly across the back-end machines, with requests from the same user routed to the same machine as far as possible. Load balancing brings a system the following benefits:

1. It avoids wasting resources. A poorly chosen balancing algorithm wastes back-end resources. For example, a consistent hash algorithm makes good use of cache capacity, whereas random assignment can ruin the cache hit rate.

2. It avoids service unavailability. If we ignore the carrying capacity of the system, we may crush a machine outright. For example, if a machine's CPU utilization is already at 80% and a flood of new requests arrives, the machine goes down. Worse, this can trigger an avalanche: one machine goes down, its requests are redistributed to the other machines, those machines go down in turn, until every machine is down.

Theoretical basis

For a system to achieve load balancing, there must be algorithms supporting it behind the scenes. Let's look at those algorithms.

1. Load algorithms

There are many ways to deal with the carrying capacity of the back-end system. The common ones are as follows:

Static configuration

This approach is the most effective and stable for small and medium-sized systems, because we know the back-end machines best: their hardware configuration, which services are deployed on them, how much carrying capacity they have, and so on. For example, we often see Nginx configured with static weights.
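
A minimal sketch of such an upstream block (the addresses and weights here are made up for illustration):

    upstream backend {
        # heavier machines receive proportionally more traffic
        server 192.168.1.10:8080 weight=5;
        server 192.168.1.11:8080 weight=3;
        server 192.168.1.12:8080 weight=1;
    }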

Dynamic adjustment

When a machine fails, or cannot process requests because of performance problems, assigning new requests to that node may bring it down entirely. So it is important to adjust node weights dynamically according to each node's actual load. Of course, how to obtain a node's true load, how to define load in the first place, and whether the load data is collected promptly are all issues to consider.

Dynamic adjustment starts by measuring the request response time of every node. For a node that responds quickly, we allocate more requests to it, gradually increasing its share; when its responses slow down, we gradually reduce its share, converging on the node's optimal balance point, that is, how many requests it should be given. The same method finds the equilibrium for every node.
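
A rough sketch of this idea in Go; the node structure, the EWMA smoothing factor, the target latency, and the weight bounds are all assumptions made for illustration, not a standard implementation:

    package main

    import (
        "fmt"
        "time"
    )

    // node tracks a smoothed response time and a weight derived from it.
    type node struct {
        name   string
        avgRTT time.Duration // exponentially weighted moving average
        weight int
    }

    // observe folds a new response time into the moving average,
    // then nudges the weight up or down against a target latency.
    func (n *node) observe(rtt, target time.Duration) {
        const alpha = 0.2 // smoothing factor (assumed)
        n.avgRTT = time.Duration(float64(n.avgRTT)*(1-alpha) + float64(rtt)*alpha)
        switch {
        case n.avgRTT < target && n.weight < 100:
            n.weight++ // responding fast: send it a little more traffic
        case n.avgRTT > target && n.weight > 1:
            n.weight-- // responding slowly: back off gradually
        }
    }

    func main() {
        n := &node{name: "backend-1", avgRTT: 50 * time.Millisecond, weight: 10}
        for _, rtt := range []time.Duration{30 * time.Millisecond, 40 * time.Millisecond, 200 * time.Millisecond} {
            n.observe(rtt, 80*time.Millisecond)
            fmt.Printf("%s avgRTT=%v weight=%d\n", n.name, n.avgRTT, n.weight)
        }
    }

In practice the adjusted weight would then feed into a weighted balancing algorithm, such as the weighted round robin described below.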

The advantage of this approach is that it balances dynamically with the processing power of the back-end servers. However, every coin has two sides: taken to the extreme, this scheme can itself cause an avalanche. When a machine hits a brief period of network jitter, its responses slow down and the front end shifts its requests to other machines; shift too many, and those machines slow down too, pushing their requests onto yet other machines. The result can be an avalanche.

2. Balancing algorithms

Balancing algorithms deal with how requests are sent to the back-end services. Three algorithms are commonly used: random, round robin, and hash.

Random algorithm

The random algorithm distributes requests across the nodes using a random function. It is simple and spreads requests evenly across the nodes, so it is widely used.
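
A minimal sketch in Go (the back-end list is made up for illustration):

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        nodes := []string{"backend-1", "backend-2", "backend-3"} // illustrative back ends
        // each request goes to a uniformly random node
        for i := 0; i < 5; i++ {
            fmt.Println("request", i, "->", nodes[rand.Intn(len(nodes))])
        }
    }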

Round robin algorithm

The round robin algorithm serves all nodes with equal probability, but it ignores the performance differences between them: given the same number of requests, a powerful node finishes easily while a weak node struggles. Hence the weighted round robin algorithm, which assigns different weights to nodes of different capability.
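
A sketch of the smooth weighted round robin variant (the scheme Nginx itself uses); the node names and weights are illustrative:

    package main

    import "fmt"

    type wrrNode struct {
        name    string
        weight  int // static weight: higher means more capable
        current int // running counter used by the smooth algorithm
    }

    // next implements smooth weighted round robin: every pick raises each
    // node's counter by its weight, selects the largest counter, then
    // subtracts the total weight from the winner.
    func next(nodes []*wrrNode) *wrrNode {
        total := 0
        var best *wrrNode
        for _, n := range nodes {
            n.current += n.weight
            total += n.weight
            if best == nil || n.current > best.current {
                best = n
            }
        }
        best.current -= total
        return best
    }

    func main() {
        nodes := []*wrrNode{
            {name: "a", weight: 5},
            {name: "b", weight: 1},
            {name: "c", weight: 1},
        }
        for i := 0; i < 7; i++ {
            fmt.Print(next(nodes).name, " ")
        }
        fmt.Println() // a a b a c a a: a 5:1:1 split spread over the cycle
    }

The interleaving is what makes it "smooth": node a's five turns are spread across the cycle instead of arriving back to back.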

Hash algorithm

Usually, the user ID or IP address is used as the key: compute its hash and take it modulo the number of nodes, i.e. hash(key) mod n, where n is the number of nodes, to determine which node the request lands on. This way, the same request always falls on the same node, but the method breaks down when the number of nodes changes dynamically, because almost every key is then remapped. In that case a consistent hash algorithm should be used. In consistent hashing, each server is split into V virtual nodes, and all N * V virtual nodes are scattered across the consistent hash ring. A user's request is served by the first virtual node reached by walking clockwise from the request's position on the ring; if that node fails, the request moves clockwise to the next node as a replacement. A more detailed description can be found in the consistent hash article, which will not be expanded here.
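
A minimal sketch of such a ring in Go; CRC32 as the hash function, 100 virtual nodes per server, and the server names are all choices made for illustration:

    package main

    import (
        "fmt"
        "hash/crc32"
        "sort"
        "strconv"
    )

    // ring is a minimal consistent hash ring with V virtual nodes per server.
    type ring struct {
        keys  []uint32          // sorted hashes of all virtual nodes
        owner map[uint32]string // virtual node hash -> real server
    }

    func newRing(servers []string, vnodes int) *ring {
        r := &ring{owner: make(map[uint32]string)}
        for _, s := range servers {
            for v := 0; v < vnodes; v++ {
                h := crc32.ChecksumIEEE([]byte(s + "#" + strconv.Itoa(v)))
                r.keys = append(r.keys, h)
                r.owner[h] = s
            }
        }
        sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
        return r
    }

    // lookup walks clockwise from the key's hash to the first virtual node.
    func (r *ring) lookup(key string) string {
        h := crc32.ChecksumIEEE([]byte(key))
        i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
        if i == len(r.keys) {
            i = 0 // wrap around the ring
        }
        return r.owner[r.keys[i]]
    }

    func main() {
        r := newRing([]string{"backend-1", "backend-2", "backend-3"}, 100)
        for _, user := range []string{"alice", "bob", "carol"} {
            fmt.Println(user, "->", r.lookup(user))
        }
    }

The payoff over plain modulo hashing is that adding or removing a server only remaps the keys owned by that server's virtual nodes, roughly 1/N of the total.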

Concrete implementations

The common load balancing systems today are Nginx, LVS, and F5. Nginx is a Layer 7 software load balancer, LVS is a Layer 4 load balancer inside the Linux kernel, and F5 is a Layer 4 hardware load balancer.

The difference between software and hardware is performance: hardware far outperforms software. Nginx handles requests on the order of tens of thousands per second; an ordinary Linux server running Nginx can reach about 50,000 concurrent requests per second. F5 operates at the millions level, anywhere from 2 to 8 million requests per second, but it is expensive.

The difference between Layer 4 and Layer 7 is protocol support and flexibility. Nginx works at Layer 7 and supports protocols such as HTTP, while LVS and F5 work at Layer 4, which is protocol-independent: they can load-balance almost any application.
