The ancients said: worry not about scarcity, but about uneven distribution.

In the computing world, this idea is commonly known as load balancing: if a group of computer nodes (or processes) offer the same (or homogeneous) services, requests for those services should be evenly distributed among them. The premise of load balancing is "providing a single Internet service from multiple servers"; the nodes that provide the service are called a server farm, server pool, or backend servers.

The "service" here is a general notion: it can be simple computation, or reading and storing data. Load balancing is nothing new; the idea already existed in the era of multi-core CPUs. In a distributed system, however, load balancing is ubiquitous. It is a natural characteristic of distributed systems, which use large numbers of computers to provide computation and storage services that a single machine cannot; with so many nodes, balanced scheduling becomes very important.

Load balancing keeps every node serving at the lowest cost and in its best state, so the system achieves maximum throughput, better performance, and shorter response times for users. It also improves the reliability of the system by minimizing the probability that a single node becomes overloaded or even crashes. It is not hard to imagine a system where most requests fall on the same node: responses become slow, and once that node degrades or crashes, all of its requests shift to the next node, causing an avalanche.

There are already many articles on the web about load balancing algorithms, and most of them are similar. This article is more my own summary of, and thoughts on, these algorithms.

Original article: www.cnblogs.com/xybaby/p/78…

Learn all about load balancing in a minute

The title and content of this section come from the article Learn All About Load Balancing in a Minute. The title is of course a bit exaggerated, but the article does show at a glance how load balancing is used at every level of a large web site.

[Figure: load balancing between adjacent layers of a typical web architecture]

A typical Internet distributed architecture is divided into a client layer, a reverse proxy (Nginx) layer, a site layer, a service layer, and a data layer. Each upstream layer calls multiple downstream nodes, so it is only necessary that each upstream accesses each of its downstream nodes evenly to achieve "requests/data evenly spread across multiple operation units".

(1) Load balancing from the client layer to the reverse proxy layer is implemented through DNS round-robin.

(2) Load balancing from the reverse proxy layer to the site layer is implemented through Nginx.

(3) Load balancing from the site layer to the service layer is implemented through a "service connection pool".

(4) Load balancing at the data layer has to consider two aspects, "balance of data" and "balance of requests"; common approaches are "horizontal sharding by range" and "horizontal sharding by hash".
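
To make "horizontal sharding by hash" concrete, here is a minimal sketch; the shard names, the shard_for helper and the use of a user id as the key are all illustrative assumptions, not anything from the referenced article.

# A hypothetical list of database shards; in practice these would be real instances.
DB_SHARDS = ['db-0', 'db-1', 'db-2', 'db-3']

def shard_for(user_id):
    # In production a stable hash (md5, crc32, ...) is preferable to Python's
    # built-in hash(), whose value for strings can differ between processes.
    return DB_SHARDS[hash(user_id) % len(DB_SHARDS)]

print(shard_for(10086))   # requests and data for this user always land on the same shard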

Load balancing at the data layer was covered in detail in my previous article on data sharding in distributed systems.

Evaluating load balancing algorithms

In my opinion, when we discuss a load balancing algorithm, or its use in a specific application scenario, we should consider the following questions:

First, whether it recognizes that different nodes may have different service capacities, for example in CPU, memory, network, and geographic location.

Second, whether it recognizes that a node's service capacity changes dynamically: even a well-provisioned machine may process requests slowly for some unexpected reason.

Third, whether requests from the same client, or identical requests, should be dispatched to the same processing node. This is very important for "stateful" services, such as sessions or distributed storage.

Fourth, who performs the load balancing, that is, who acts as the load balancer, and whether the balancer itself becomes a bottleneck.

These questions are considered below in combination with the specific algorithms.

Load balancing algorithms

Round-robin algorithm

The idea is very simple: the nodes providing the homogeneous service take turns serving requests one by one, which achieves absolute balance in the number of requests. A Python sample is shown below.

def round_robin(server_lst, cur = [0]):
    # cur is a mutable default argument, used to remember the position across calls
    length = len(server_lst)
    ret = server_lst[cur[0] % length]
    cur[0] = (cur[0] + 1) % length
    return ret

As you can see, all nodes receive requests at the same rate; the differences between nodes are not taken into account. With the same number of requests, a well-provisioned machine's CPU may be at only 20% while a weaker machine's CPU is at 80%.

Weighted round-robin algorithm

The weighted round-robin algorithm builds on round-robin and takes the differences between machines into account by assigning each machine a different weight. Note that the weights should depend on the type of request: for computation-intensive requests, consider CPU and memory; for IO-intensive requests, consider disk performance. A Python sample is shown below.

WEIGHT_SERVER_LIST = {
    '10.246.10.1': 1,
    '10.246.10.2': 3,
    '10.246.10.3': 2,
}

def weight_round_robin(servers, cur = [0]):
    # expand each server into as many entries as its weight, then round-robin over them
    weighted_list = []
    for k, v in servers.items():
        weighted_list.extend([k] * v)

    length = len(weighted_list)
    ret = weighted_list[cur[0] % length]
    cur[0] = (cur[0] + 1) % length
    return ret

 

Random algorithm

This one is even easier to understand: a node is selected at random. Probabilistically, as long as the number of requests is large enough, the effect approaches absolute balance, and the implementation is much simpler.

def random_choose(server_lst):
    import random
    random.seed()
    return random.choice(server_lst)

 

Weighted random algorithm

Just as weighted round-robin builds on round-robin, the weighted random algorithm introduces node weights on top of the random algorithm, and the implementation is similar.

def weight_random_choose(servers):
    import random
    random.seed()
    weighted_list = []
    for k, v in servers.items():
        weighted_list.extend([k] * v)
    return random.choice(weighted_list)

 

Of course, if the list of nodes and their weights rarely changes, the weights can be normalized in advance and a node selected according to which probability interval a random number falls into.

def normalize_servers(servers):
    # turn the weights into cumulative probabilities in (0, 1]
    normalized_servers = {}
    total = sum(servers.values())
    cur_sum = 0
    for k, v in servers.items():
        normalized_servers[k] = 1.0 * (cur_sum + v) / total
        cur_sum += v
    return normalized_servers

def weight_random_choose_ex(normalized_servers):
    import random, operator
    random.seed()
    rand = random.random()
    for k, v in sorted(normalized_servers.items(), key = operator.itemgetter(1)):
        if v >= rand:
            return k
    else:
        assert False, 'Error normalized_servers with rand %s ' % rand

 

Hash

A hash value is computed from the client IP address, or from the request's "key", and taken modulo the number of nodes. The advantage is that the same request is always assigned to the same service node, which is necessary for "stateful" services.

def hash_choose(request_info, server_lst):
    hashed_request_info = hash(request_info)
    return server_lst[hashed_request_info % len(server_lst)]

As long as the hash values are spread out well, the load can also be balanced well.

Consistent hashing

The defect of plain hashing is also obvious: when the number of nodes changes, a request will very likely be mapped to a different node, which causes a series of problems, such as broken session stickiness. In some cases, such as distributed storage, this is simply not acceptable.

To solve the problems of plain hashing, the consistent hashing algorithm was introduced. In simple terms, each physical node is mapped to a number of virtual nodes, and hashing is done against the virtual nodes rather than the physical ones. When the set of physical nodes changes, the virtual nodes themselves do not change; only their assignment to physical nodes is adjusted, so most requests still map to the same physical node as before. Moreover, by adjusting the number of virtual nodes assigned to each physical node, physical nodes can effectively be given different weights.
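
The following is a minimal sketch of a hash ring with virtual nodes; the ConsistentHashRing class, the vnodes parameter and the node addresses are illustrative assumptions, not code from any particular library.

import hashlib
import bisect

def stable_hash(key):
    # use md5 so the result is the same in every process
    return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

class ConsistentHashRing(object):
    def __init__(self, servers, vnodes = 100):
        # servers: {physical node: weight}; the weight scales the number of virtual nodes
        self.ring = {}            # virtual node hash -> physical node
        self.sorted_keys = []
        for server, weight in servers.items():
            for i in range(vnodes * weight):
                h = stable_hash('%s#%d' % (server, i))
                self.ring[h] = server
                bisect.insort(self.sorted_keys, h)

    def choose(self, request_key):
        # walk clockwise on the ring to the first virtual node at or after hash(key)
        h = stable_hash(request_key)
        idx = bisect.bisect(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

ring = ConsistentHashRing({'10.246.10.1': 1, '10.246.10.2': 3, '10.246.10.3': 2})
print(ring.choose('client-42'))   # the same key keeps mapping to the same node

When a physical node is removed, only the keys that fell on its virtual nodes move to other nodes; the rest stay where they were.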

Least Connection algorithm

Most of the algorithms above either do not take the differences between nodes into account at all (round-robin, random, hash), or assign the weights between nodes statically (weighted round-robin, weighted random, consistent hashing).

Consider a situation where a machine fails and can no longer process requests in time, yet new requests keep being assigned to it with some probability, so requests pile up there. It is therefore very important to adjust node weights dynamically according to each node's real load. Of course, obtaining a node's true load is not a trivial matter: how to define "load", and whether the load data can be collected in a timely fashion, are questions that need to be considered.

The current number of connections per node is an easy metric to collect, which is why "least connection" is the most commonly mentioned algorithm of this kind. Other algorithms focus on different, more complex, or more objective metrics, such as least response time or least active requests.
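
A minimal sketch of the idea, assuming the balancer keeps a per-node counter of in-flight connections; the counter values and node addresses below are made up.

def least_connection_choose(conn_counts):
    # conn_counts: {node: current number of active connections}
    return min(conn_counts, key = conn_counts.get)

conn_counts = {'10.246.10.1': 12, '10.246.10.2': 3, '10.246.10.3': 7}
node = least_connection_choose(conn_counts)   # '10.246.10.2'
conn_counts[node] += 1                        # the balancer updates the counter on dispatch
# ... and decrements it again when the connection is closed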

Some thoughts

Stateful requests

Let's start with the third question from the evaluation criteria above: whether requests from the same user, or requests carrying the same unique identifier, should be dispatched to the same service node. When is it best (and necessary) to do so? When the request is stateful, that is, it depends on data that already exists in memory or on disk on some node, such as a session for a web request, or the stored data in distributed storage. There are several ways to achieve this:

(1) When distributing requests, ensure that the same request is distributed to the same service node.

This relies on the load balancing algorithm. Simple round-robin and random selection obviously will not work, and plain hashing also fails when nodes are added or removed. What does work is consistent hashing, or partitioning by range as used in distributed storage, that is, recording which requests are served by which service node, at the cost of maintaining extra data in the load balancer.
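
As a sketch of the "record which node serves which range" idea, assume the balancer (or a metadata service) keeps a table of upper bounds; the table contents and helper names below are made up.

import bisect

# Hypothetical range table: keys below each upper bound are owned by that node.
RANGE_TABLE = [
    (1000,  '10.246.10.1'),    # keys [0, 1000)
    (5000,  '10.246.10.2'),    # keys [1000, 5000)
    (10000, '10.246.10.3'),    # keys [5000, 10000)
]

def range_choose(key):
    uppers = [upper for upper, _ in RANGE_TABLE]
    idx = bisect.bisect_right(uppers, key)
    assert idx < len(RANGE_TABLE), 'key %s is out of the covered range' % key
    return RANGE_TABLE[idx][1]

print(range_choose(4096))    # '10.246.10.2': the same key always maps to the same node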

(2) State data is shared between Backend servers

Ensuring that the same request goes to the same service node is only a means of making sure the request can reach its state data. If the state data can be shared between service nodes, that also works, for example by having every service node connect to a shared database, or to an in-memory store such as memcached.
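
A toy sketch of the shared-state idea; a plain dict stands in here for a shared store such as memcached that every service node would reach over the network, and the session and node names are made up.

shared_sessions = {}    # pretend this lives in a shared store, not in process memory

def handle_request(session_id, node_name):
    # any node can serve the request, because the session lives in the shared store
    session = shared_sessions.setdefault(session_id, {'created_by': node_name})
    session['last_node'] = node_name
    return session

handle_request('sess-1', 'node-A')
print(handle_request('sess-1', 'node-B'))   # node-B still sees the session created on node-A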

(3) State data is maintained on the client

This approach is also used for web requests, in the form of cookies, but for security the data needs to be encrypted.

 

About the load balancer

Now for the fourth question: who does the load balancing; in other words, where the decision is made, on the client or on the server side, by the initiator of the request or by the side that handles it. Concretely, either the decision is made on the client, which selects a service node based on information about the nodes and sends the request to it directly; or a centralized proxy sits in front of the cluster of service nodes and is responsible for distributing requests. Either way, at least the basic information about the current list of service nodes has to be known.

To do load balancing on the client, the client needs to know the server list; this list can be statically configured or obtained through a simple interface, but detailed load information about the backend servers is usually not available to the client. Therefore, client-side load balancing algorithms tend to be relatively simple, such as round-robin (weighted round-robin), random (weighted random), and hashing; as long as each client is sufficiently random, then by the law of large numbers the load on the service nodes will also be balanced. To use more complex algorithms on the client, for example algorithms based on the backends' actual load, an external load balancing service must be queried for that information. gRPC uses this approach.

[Figure: gRPC external load balancing]

As the figure shows, the load balancer communicates with the gRPC servers to obtain their load, the gRPC client then obtains this information from the load balancer, and finally the gRPC client connects directly to the selected gRPC server.

The proxy-based approach is more common, for example Nginx at layer 7, and F5 and LVS at layer 4; there is both hardware routing and software distribution. The centralized approach has the advantage of being easy to control and able to implement more refined and complex algorithms. Its disadvantages are also obvious: first, the load balancer itself may become a performance bottleneck; second, it may introduce additional latency, since a request must first reach the load balancer and then the actual service node.

The response may return to the client without passing back through the load balance proxy, as with LVS, or it may pass back through the proxy, as with Nginx. Below is a schematic diagram of LVS (source: see the watermark).

[Figure: LVS request and response flow]

If the response also goes back through the load balancer proxy, then the whole serving process is completely transparent to the client, and it also prevents the client from trying to connect directly to a backend server, which provides an extra layer of security.

Note that the load balancer proxy must not become a single point of failure itself, so it is usually designed as a highly available active-standby (master-slave) pair.

Other

As described in this article, load balancing is a push model: a service node is always picked and the request is pushed to it. Thinking differently and using a message queue turns it into a pull model: idle service nodes actively pull requests to process, and the load on each node becomes balanced naturally. Compared with load balancing, the advantage of a message queue is that service nodes cannot be overwhelmed by a flood of requests, and adding service nodes is also easier; the obvious drawback is that requests are not processed in real time.
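
A toy sketch of the pull model using an in-process queue and worker threads; in a real system the queue would be an external message broker and the workers separate service nodes.

import threading, queue, time

task_queue = queue.Queue()

def worker():
    # an idle worker pulls the next request itself, so load evens out naturally
    while True:
        request = task_queue.get()
        if request is None:          # shutdown signal
            break
        time.sleep(0.01)             # pretend to process the request
        task_queue.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for i in range(100):
    task_queue.put('request-%d' % i)
task_queue.join()                    # all requests eventually get processed

for w in workers:
    task_queue.put(None)
for w in workers:
    w.join()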

 

Another example: in gunicorn's pre-fork model, the master (called Arbiter in gunicorn) forks a specified number of worker processes, and the worker processes all listen on the same port; whichever worker accepts a network connection first serves it. This is also a form of load balancing between the worker processes.
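
A minimal Unix-only sketch of the same pattern; this is not gunicorn's actual code, and the port and number of workers are arbitrary.

import os, socket

# the parent creates the listening socket, then forks workers that all accept() on it;
# the kernel decides which idle worker gets each incoming connection
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(('127.0.0.1', 8000))
listener.listen(128)

for _ in range(4):
    if os.fork() == 0:          # child process: serve forever
        while True:
            conn, addr = listener.accept()
            conn.sendall(b'handled by pid %d\n' % os.getpid())
            conn.close()

os.wait()                       # the parent just waits on its children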

References

Wiki: Load balancing

Learn all about load balancing in a minute

grpc load-balancing.md