Preface
In a server deployment, services are typically clustered to avoid single points of failure, so which provider instance should a consumer invoke? That is the job of load balancing. In general, a load balancing policy should spread requests evenly across all service nodes so that no single node becomes a hotspot; requirements such as node weights and session stickiness sometimes need to be considered as well
Depending on which role executes the policy, load balancing can be classified as client-side or server-side; the policies themselves apply equally to both modes. The common load balancing policies are as follows:
- Random
- Round-robin
- Least active
- Consistent hash
Each strategy has its own applicable scenarios, which are described below
1. Random
As the name implies, the random policy picks a node at random from the service list. A simple implementation is shown below:
```java
private Node select(List<Node> servers) {
    Random r = new Random();
    int serverSize = servers.size();
    return servers.get(r.nextInt(serverSize));
}
```
If nodes have different weights (for example, a well-provisioned node gets a larger weight because it is expected to handle more requests), a node with a higher weight should have a higher probability of being selected
Weights can be configured statically in advance or adjusted dynamically while the system runs. For example, during the warm-up phase just after a node starts, its weight can begin small and grow gradually to the configured value over time
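As an illustration of such dynamic warm-up weights, here is a minimal sketch; the method and parameter names are my own, though frameworks such as Dubbo use a similar formula:

```java
// Sketch of a warm-up weight calculation (hypothetical names, not the article's code).
public class WarmupWeight {
    // Scale the configured weight by how far the node is into its warm-up window.
    static int warmupWeight(long uptimeMs, long warmupMs, int configuredWeight) {
        if (uptimeMs >= warmupMs) {
            return configuredWeight; // fully warmed up
        }
        int w = (int) (configuredWeight * uptimeMs / warmupMs);
        return Math.max(w, 1); // never drop to zero, so the node still receives some traffic
    }

    public static void main(String[] args) {
        // A node with weight 100 and a 10-minute warm-up, 1 minute after start:
        System.out.println(warmupWeight(60_000, 600_000, 100)); // prints 10
    }
}
```

The key design point is that the weight ramps up linearly with uptime, so a freshly started node (cold caches, cold JIT) is not immediately hit with its full share of traffic.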
Here is an implementation of weighted random selection:
```java
private Node select(List<Node> servers) {
    // Calculate the sum of all node weights
    int totalWeight = calTotalWeight(servers);
    // Random number in the range [0, totalWeight - 1]
    int offset = new Random().nextInt(totalWeight);
    for (int i = 0; i < servers.size(); i++) {
        offset -= servers.get(i).weight;
        if (offset < 0) {
            return servers.get(i);
        }
    }
    return null; // unreachable when totalWeight equals the sum of the weights
}
```
Does this code distribute requests correctly? As an example, suppose a cluster has 3 nodes with weights 1, 2, and 3, so totalWeight is 6 and offset falls in the range [0, 5].
Each possible value of offset maps to a selected node as follows:
Random number | Node number |
---|---|
0 | 1 |
1 | 2 |
2 | 2 |
3 | 3 |
4 | 3 |
5 | 3 |
You can see that the number of times each node appears matches its weight
Since each value in [0, 5] is generated with equal probability, the probability of selecting a node equals the share of values mapped to it, which in turn equals that node's share of the total weight
Some may suspect that such a simple random strategy cannot balance load well. It is true that the algorithm does not distribute requests perfectly evenly, especially when the number of requests is small, but as the request count grows, the load converges to the weight distribution
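To make that convergence concrete, here is a small self-contained simulation (class and method names are my own) that runs the weighted random selection many times over weights 1, 2, and 3 and counts how often each node is chosen; the hit counts approach the 1:2:3 ratio:

```java
import java.util.Random;

// Simulation sketch: with enough requests, weighted random converges to the weight ratios.
public class WeightedRandomSim {
    // Same selection logic as above, operating on a plain weight array
    static int pick(int[] weights, int totalWeight, Random r) {
        int offset = r.nextInt(totalWeight);
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        return -1; // unreachable when totalWeight is the sum of the weights
    }

    // Run `rounds` selections over weights {1, 2, 3} and count hits per node
    static int[] simulate(int rounds, long seed) {
        int[] weights = {1, 2, 3};
        int[] hits = new int[weights.length];
        Random r = new Random(seed); // fixed seed so the run is reproducible
        for (int n = 0; n < rounds; n++) {
            hits[pick(weights, 6, r)]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] hits = simulate(600_000, 42L);
        // Expect roughly 100k / 200k / 300k hits
        System.out.printf("%d %d %d%n", hits[0], hits[1], hits[2]);
    }
}
```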
2. Round-robin
The round-robin (polling) policy walks through the node list in order to pick the node to invoke. Unlike the random strategy, round-robin spreads requests evenly across nodes even when the number of requests is small
Likewise, when nodes have different weights and some are expected to take on more requests, the weights must be factored into the rotation. Weighted round-robin comes in two common variants: plain weighted round-robin and smooth weighted round-robin
2.1 Plain weighted round-robin
This algorithm is simple: it sends one node a run of consecutive requests matching its weight, then moves on to the next node
For example, with nodes A, B, and C weighted 1, 2, and 3, it requests A once, then B twice, then C three times
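A minimal sketch of plain weighted round-robin (class and field names are my own; weights are assumed to be at least 1):

```java
// Plain weighted round-robin sketch: stay on one node until its weight quota is used up.
// Assumes every weight is >= 1.
public class PlainWeightedRoundRobin {
    private final int[] weights;
    private int index = 0;  // node currently receiving requests
    private int served = 0; // requests the current node has taken in this cycle

    PlainWeightedRoundRobin(int[] weights) {
        this.weights = weights;
    }

    int select() {
        // Quota of the current node exhausted: move to the next node
        if (served >= weights[index]) {
            served = 0;
            index = (index + 1) % weights.length;
        }
        served++;
        return index;
    }

    public static void main(String[] args) {
        PlainWeightedRoundRobin lb = new PlainWeightedRoundRobin(new int[]{1, 2, 3});
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + lb.select()));
        }
        System.out.println(sb); // prints ABBCCC
    }
}
```

For weights 1, 2, 3 the selection order is A, B, B, C, C, C, which shows the clustering problem discussed next.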
This does allocate requests according to node weights, but within a short window only one node receives traffic, so that node is under extra pressure and at higher risk of going down while the other nodes sit idle and their resources go unused. The smooth weighted round-robin below effectively alleviates this problem
2.2 Smooth weighted round-robin
"Smooth" means that over a given window, not only does the number of times each node is selected match its weight, but the algorithm also interleaves the nodes rather than selecting one node repeatedly
The implementation is introduced first; it may seem a bit convoluted, so it is then walked through with an example and its correctness is argued informally
```java
// Sum of the original weights of all nodes
private int totalWeight;
// Original weight of each node
private int[] nodeOriginWeight;
// Current weight of each node
private int[] nodeCurWeights;
```
totalWeight: the sum of the weights of all nodes; it never changes
nodeOriginWeight: the original weight of each node; it also never changes
nodeCurWeights: the current weight of each node. This array changes every time a request is assigned to a node; it is initialized to each node's original weight
```java
private Node select(List<Node> servers) {
    int maxIndex = 0;
    int maxCurWeight = nodeCurWeights[0];
    // Find the node with the highest current weight
    for (int i = 1; i < nodeCurWeights.length; i++) {
        if (maxCurWeight < nodeCurWeights[i]) {
            maxIndex = i;
            maxCurWeight = nodeCurWeights[i];
        }
    }
    // Subtract totalWeight from the current weight of the selected node
    nodeCurWeights[maxIndex] -= totalWeight;
    // Add each node's original weight to its current weight
    for (int i = 0; i < servers.size(); i++) {
        nodeCurWeights[i] += nodeOriginWeight[i];
    }
    // Return the selected node
    return servers.get(maxIndex);
}
```
Each time a node is selected, the following three steps are performed:
- Select the node A with the largest current weight
- Subtract totalWeight from A's current weight
- Add each node's original weight to its current weight
Here is an example:
Suppose nodes A, B, and C have weights 1, 2, and 3, and the smooth weighted round-robin algorithm handles 6 requests:
Number of requests | nodeCurWeights before selection | nodeCurWeights after subtracting totalWeight | Selected node |
---|---|---|---|
1 | [1, 2, 3] | [1, 2, -3] | C |
2 | [2, 4, 0] | [2, -2, 0] | B |
3 | [3, 0, 3] | [-3, 0, 3] | A |
4 | [-2, 2, 6] | [-2, 2, 0] | C |
5 | [-1, 4, 3] | [-1, -2, 3] | B |
6 | [0, 0, 6] | [0, 0, 0] | C |
(The "after" column shows the weights right after totalWeight is subtracted from the selected node; adding each node's original weight back yields the next row's "before" column. In request 3, A and C tie at 3; the implementation keeps the earlier index, so A is selected.)
Before the seventh request, nodeCurWeights returns to [1, 2, 3] and a new cycle begins
Across the first six requests, 3 go to node C, 2 to node B, and 1 to node A: each node's share of the requests matches its weight exactly. Moreover, selections of different nodes are interleaved, avoiding the problem of requests being concentrated on one node
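To check the walk-through end to end, the fields and select() logic above can be packaged into a self-contained class (the class shape is my own) that reproduces the C, B, A, C, B, C sequence for weights 1, 2, 3:

```java
// Self-contained smooth weighted round-robin, matching the fields and steps described above.
public class SmoothWeightedRoundRobin {
    private final int[] originWeights; // original weight of each node (never changes)
    private final int[] curWeights;    // current weight of each node (updated per selection)
    private final int totalWeight;     // sum of the original weights

    SmoothWeightedRoundRobin(int[] weights) {
        this.originWeights = weights.clone();
        this.curWeights = weights.clone();
        int sum = 0;
        for (int w : weights) sum += w;
        this.totalWeight = sum;
    }

    int select() {
        // Step 1: pick the node with the largest current weight (ties keep the lowest index)
        int maxIndex = 0;
        for (int i = 1; i < curWeights.length; i++) {
            if (curWeights[i] > curWeights[maxIndex]) maxIndex = i;
        }
        // Step 2: subtract totalWeight from the selected node's current weight
        curWeights[maxIndex] -= totalWeight;
        // Step 3: add each node's original weight to its current weight
        for (int i = 0; i < curWeights.length; i++) {
            curWeights[i] += originWeights[i];
        }
        return maxIndex;
    }

    public static void main(String[] args) {
        SmoothWeightedRoundRobin lb = new SmoothWeightedRoundRobin(new int[]{1, 2, 3});
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append((char) ('A' + lb.select()));
        }
        System.out.println(sb); // prints CBACBC
    }
}
```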
The algorithm looks abstract: why does it both honor the weights and interleave the nodes?
Each round, the algorithm picks the node with the largest current weight, subtracts the sum of all node weights from it, and then adds every node's original weight to its current weight
Because every node's current weight grows by its own original weight each round, a node with a larger weight climbs back toward the top faster and becomes the maximum more often, so it is selected more often; conversely, a node with a smaller weight climbs more slowly and is selected less often. This is how requests end up allocated in proportion to the weights
And because every selected node has the same comparatively large value (the sum of all original weights) subtracted, it takes several rounds to climb back to the maximum, so it is unlikely to be selected again immediately. That is what produces the smoothing effect
3. Least active
The active count of a node is the number of requests it is currently processing
The least active policy monitors the current active count of each node and assigns each request to the node with the smallest active count
The underlying assumption is that a node with few in-flight requests probably has a light load: nodes with a high active count are avoided, while the node with the lowest active count takes on the next request
How do we know each node's active count? Increment the node's counter just before a request starts and decrement it when the request finishes. With client-side load balancing, only requests initiated by the current client are counted; with server-side load balancing, all requests can be counted, which makes the active count more accurate
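A sketch of that bookkeeping (class and method names are my own), using an atomic counter per node and a finally block so that failed requests are counted down as well:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical active-count bookkeeping around a request.
public class ActiveCounter {
    private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();

    int get(String node) {
        return active.computeIfAbsent(node, k -> new AtomicInteger()).get();
    }

    // Wrap a call: increment before it starts, decrement in finally so
    // the count also drops when the call throws.
    <T> T invoke(String node, Supplier<T> call) {
        AtomicInteger counter = active.computeIfAbsent(node, k -> new AtomicInteger());
        counter.incrementAndGet();
        try {
            return call.get();
        } finally {
            counter.decrementAndGet();
        }
    }
}
```

Using atomic counters keeps the bookkeeping safe when many requests run concurrently, which is exactly when the active count matters.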
This strategy can be combined with random or round-robin selection. For example, when several nodes share the same minimum active count, one of them can be chosen at random. The overall implementation is as follows:
```java
// Active count of each node (assumed to have an entry for every node)
private Map<Node, Integer> activeCount;

private Node select(List<Node> servers) {
    // Indexes of the nodes with the lowest active count; an array because several may tie
    int[] minIndexes = new int[servers.size()];
    // How many nodes currently share the minimum active count
    int nodeCount = 0;
    int minActiveCount = Integer.MAX_VALUE;
    for (int i = 0; i < servers.size(); i++) {
        int curCount = activeCount.get(servers.get(i));
        // Found a new minimum: restart the tie list
        if (curCount < minActiveCount) {
            minActiveCount = curCount;
            nodeCount = 0;
            minIndexes[nodeCount++] = i;
        // Ties the current minimum: add it to the list
        } else if (curCount == minActiveCount) {
            minIndexes[nodeCount++] = i;
        }
    }
    // Only one node has the minimum active count
    if (nodeCount == 1) {
        return servers.get(minIndexes[0]);
    }
    // Several nodes tie for the minimum: pick one of them at random
    return servers.get(minIndexes[new Random().nextInt(nodeCount)]);
}
```
A related strategy is minimum response time: monitor the response time of each request on each node and route each request to the node with the lowest average response time, on the implicit assumption that a node responding quickly is probably lightly loaded
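As a sketch of how the minimum response time idea might be implemented (the class shape and names are my own; real frameworks track this differently), an exponentially weighted moving average of response time can be kept per node and the node with the lowest average selected:

```java
// Hypothetical per-node response time tracking with an exponentially weighted moving average.
public class ResponseTimeTracker {
    private final double[] avgMs; // moving average response time per node; 0 means unmeasured
    private final double alpha;   // smoothing factor, e.g. 0.2: higher = reacts faster

    ResponseTimeTracker(int nodes, double alpha) {
        this.avgMs = new double[nodes];
        this.alpha = alpha;
    }

    void record(int node, double elapsedMs) {
        // EWMA leans toward recent observations; first sample seeds the average
        avgMs[node] = avgMs[node] == 0
                ? elapsedMs
                : alpha * elapsedMs + (1 - alpha) * avgMs[node];
    }

    int select() {
        // Pick the node with the lowest average response time.
        // Note: an unmeasured node (average 0) always wins; a real implementation
        // would need a warm-up rule for nodes with no samples yet.
        int best = 0;
        for (int i = 1; i < avgMs.length; i++) {
            if (avgMs[i] < avgMs[best]) best = i;
        }
        return best;
    }
}
```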
4. Consistency hash
Consistent hashing is used more as a routing policy for cached or stored data, because there it matters how much data is invalidated when a node is removed and how much data must be migrated when one is added. Ordinary requests to a server cluster are usually stateless: it does not matter whether successive requests hit the same node, since any node can execute the service and return a response. In some cases, though, stickiness is required, for example when sessions must be kept on one node
An ordinary, non-consistent hash (such as hash(key) % nodeCount) remaps almost every key whenever the number of nodes changes, so most requests no longer reach their original node. That is unacceptable in some scenarios: in a file storage system, for example, every change in node count would require redistributing all files across all nodes to keep them findable
Consistent hashing algorithms need to maintain the following properties:
- Monotonicity: if some requests have already been assigned to certain nodes and a new node is added, every request must still map either to its original node or to the new node, never to a different existing node. This property is what distinguishes consistent hashing from other allocation strategies
- Balance: each node should handle a roughly even share of requests. This is achieved by spacing the points on the hash ring as evenly as possible; if the computed hash values cluster, consider switching to a hash function with better uniformity
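As an illustration (not part of the original article), a consistent hash ring is commonly built on a sorted map keyed by hash, with multiple virtual points per physical node to improve balance; the replica count and the supplemental hash below are assumptions:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a consistent hash ring with virtual nodes (replica count is an assumption).
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addNode(String node) {
        // Spread each physical node over many points on the ring for better balance
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    String select(String key) {
        if (ring.isEmpty()) return null;
        // First ring point clockwise from the key's hash; wrap around to the start
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Simple stand-in hash; production code would use a more uniform one (e.g. MurmurHash)
    private static int hash(String s) {
        int h = s.hashCode();
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(100);
        ring.addNode("nodeA");
        ring.addNode("nodeB");
        String before = ring.select("user-42");
        ring.addNode("nodeC"); // monotonicity: only keys landing on nodeC's points move
        String after = ring.select("user-42");
        System.out.println(before + " -> " + after);
    }
}
```

Adding a node only remaps the keys whose clockwise-nearest point now belongs to the new node, which is exactly the monotonicity property above.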
This has been only a brief introduction to consistent hashing; for details, refer to the article on consistent hashing listed below
Conclusion
This article introduced several common load balancing strategies. They are implemented differently, but each offers its own perspective on how to allocate requests so that load stays balanced. You can also design your own load balancing strategy for a specific business
Reference documentation
- How to understand smooth weighted polling?
- Talk about consistency hashing