Preface

In a server deployment, services are typically clustered to avoid single points of failure, which raises a question: which service provider instance should a consumer invoke? This is the problem of load balancing. In general, a load balancing policy should distribute requests evenly across all service nodes so that requests do not concentrate on a single point; requirements such as node weights and session stickiness sometimes need to be considered as well.

Depending on which role executes the policy, load balancing can be classified as client-side or server-side, but the same policies apply to both modes. The common policies are as follows:

  • Random
  • Round-robin
  • Minimum active number
  • Consistent hashing

Each strategy has its own applicable scenarios, which are described below.

1. Random

As the name implies, the random policy selects a node at random from the service list and invokes it. A simple implementation is shown below:

private Node select(List<Node> servers) {
    Random r = new Random();
    int serverSize = servers.size();
    return servers.get(r.nextInt(serverSize));
}

If nodes have different weights (for example, a well-provisioned node is given a larger weight because it is expected to handle more requests), then the node with the higher weight should have a higher probability of being selected.

The weight can either be configured statically in advance or adjusted dynamically as the system runs. For example, during the warm-up phase just after a node starts, its weight can begin small and gradually grow to the configured value over time.
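As one illustration of warm-up weighting, the effective weight could be ramped linearly over time. The 10-minute window, the floor of 1, and all names below are assumptions made for this sketch, not details from the original:

```java
public class WarmupWeight {
    // Assumed warm-up window: ramp linearly from 1 up to the configured weight
    static final long WARMUP_MILLIS = 10 * 60 * 1000L;

    static int effectiveWeight(int configuredWeight, long uptimeMillis) {
        if (uptimeMillis >= WARMUP_MILLIS) {
            return configuredWeight;
        }
        // Linear ramp; never drop below 1 so the node still receives some traffic
        int w = (int) (configuredWeight * uptimeMillis / WARMUP_MILLIS);
        return Math.max(w, 1);
    }

    public static void main(String[] args) {
        System.out.println(effectiveWeight(100, 0));                 // just started
        System.out.println(effectiveWeight(100, WARMUP_MILLIS / 2)); // halfway warmed up
        System.out.println(effectiveWeight(100, WARMUP_MILLIS));     // fully warmed up
    }
}
```

A load balancer would call `effectiveWeight` each time it recomputes node weights, so a freshly started node takes on traffic gradually instead of at full weight.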

Here is an implementation of weighted random selection:

private Node select(List<Node> servers) {
    // Calculate the sum of all node weights
    int totalWeight = calTotalWeight(servers);

    // Random number in the range [0, totalWeight)
    int offset = new Random().nextInt(totalWeight);

    for (int i = 0; i < servers.size(); i++) {
        offset -= servers.get(i).weight;
        if (offset < 0) {
            return servers.get(i);
        }
    }
    return null;
}

Does this code distribute requests correctly? As an example, suppose there is a cluster of 3 nodes with weights 1, 2, and 3. totalWeight is then 6, so offset falls in the range [0, 5].

Each possible random number maps to a selected node as follows:

Random number    Node number
0                1
1                2
2                2
3                3
4                3
5                3

As you can see, the number of times each node number appears matches that node's weight.

Since each value in [0, 5] is generated with equal probability, the probability of selecting a node equals its share of the possible values, which in turn equals its share of the total weight.

Some people may think that a strategy this simple to implement cannot balance load well. It is true that the algorithm does not distribute requests perfectly evenly, especially when the number of requests is small, but as the number of requests grows, the resulting load converges to the weight distribution.
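This convergence can be checked with a quick simulation. The harness below is hypothetical (fixed seed, plain int arrays instead of Node objects), but the select helper mirrors the weighted random selection shown above; with weights 1, 2, and 3, the observed shares approach 1/6, 2/6, and 3/6:

```java
import java.util.Random;

public class WeightedRandomSim {
    // Same algorithm as the weighted random select above, on a plain weight array
    static int select(int[] weights, Random r) {
        int totalWeight = 0;
        for (int w : weights) totalWeight += w;
        int offset = r.nextInt(totalWeight);   // random number in [0, totalWeight)
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) return i;
        }
        return -1; // unreachable: offset always drops below 0 within the loop
    }

    public static void main(String[] args) {
        int[] weights = {1, 2, 3};
        int[] hits = new int[3];
        Random r = new Random(42);             // fixed seed for reproducibility
        int trials = 600_000;
        for (int i = 0; i < trials; i++) hits[select(weights, r)]++;
        for (int i = 0; i < 3; i++) {
            System.out.printf("node %d share: %.3f%n", i, (double) hits[i] / trials);
        }
    }
}
```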

2. Round-robin

The round-robin policy traverses the node list in order to select the node to invoke. Unlike the random strategy, round-robin guarantees that every node receives requests even when the total number of requests is small.

Similarly, if node weights differ and some nodes are expected to take on more requests, the weights must be taken into account when rotating through the list. Weighted round-robin generally comes in two flavors: plain weighted round-robin and smooth weighted round-robin.

2.1 Plain weighted round-robin

This algorithm is straightforward: it sends a node a number of consecutive requests matching its weight, then moves on to the next node.

For example, with nodes A, B, and C weighted 1, 2, and 3, it sends one request to A, then two requests to B, then three requests to C.
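The plain scheme can be sketched as follows. This is a minimal illustration on plain weight arrays; the class and field names are made up for the example:

```java
public class PlainWeightedRoundRobin {
    private final int[] weights;
    private int index = 0;         // node currently being served
    private int sentOnCurrent = 0; // requests already sent to the current node

    public PlainWeightedRoundRobin(int[] weights) {
        this.weights = weights;
    }

    // Keep returning the same node until its weight is used up, then move on
    public int select() {
        if (sentOnCurrent >= weights[index]) {
            index = (index + 1) % weights.length;
            sentOnCurrent = 0;
        }
        sentOnCurrent++;
        return index;
    }

    public static void main(String[] args) {
        // Weights 1, 2, 3 -> nodes 0, 1, 1, 2, 2, 2, then the cycle repeats
        PlainWeightedRoundRobin rr = new PlainWeightedRoundRobin(new int[]{1, 2, 3});
        for (int i = 0; i < 7; i++) {
            System.out.print(rr.select() + " "); // 0 1 1 2 2 2 0
        }
        System.out.println();
    }
}
```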

This does allocate requests according to node weight, but over any short window only a single node receives requests, so that node's pressure spikes and its risk of going down increases, while the other nodes sit idle and resources go underused. The smooth weighted round-robin described next effectively alleviates this problem.

2.2 Smooth weighted round-robin

Smoothness here means that over a given period, not only is the number of times each node is selected consistent with its weight, but the algorithm also spreads those selections out evenly instead of bunching them together.

The implementation is introduced first; it may seem a bit convoluted, but it is explained in detail with an example below, along with a brief argument for its correctness.

// Sum of the original weights of all nodes
private int totalWeight;

// The original weight of each node
private int[] nodeOriginWeight;

// The current weight of each node
private int[] nodeCurWeights;

totalWeight: the sum of the weights of all nodes. This value remains unchanged throughout.

nodeOriginWeight: the original weight of each node; it also remains unchanged throughout.

nodeCurWeights: the current weight of each node, initialized to each node's original weight. This array changes every time a request is assigned to a node.

private Node select(List<Node> servers) {
    // Find the node with the highest current weight
    int maxIndex = 0;
    int maxCurWeight = nodeCurWeights[0];
    for (int i = 1; i < nodeCurWeights.length; i++) {
        if (maxCurWeight < nodeCurWeights[i]) {
            maxIndex = i;
            maxCurWeight = nodeCurWeights[i];
        }
    }

    // Subtract totalWeight from the current weight of the selected node
    nodeCurWeights[maxIndex] -= totalWeight;

    // Add each node's original weight to its current weight
    for (int i = 0; i < servers.size(); i++) {
        nodeCurWeights[i] += nodeOriginWeight[i];
    }

    // Return the selected node
    return servers.get(maxIndex);
}

Each time a node is selected, the following three steps are performed:

  • Select the node A with the largest current weight
  • Subtract totalWeight from A's current weight
  • Add each node's original weight to its current weight

Here is an example:

Suppose nodes A, B, and C have weights 1, 2, and 3, and the smooth weighted round-robin algorithm handles 6 requests:

Request  Before selection  Selected node  After subtracting totalWeight  After adding original weights
1        [1, 2, 3]         C              [1, 2, -3]                     [2, 4, 0]
2        [2, 4, 0]         B              [2, -2, 0]                     [3, 0, 3]
3        [3, 0, 3]         A              [-3, 0, 3]                     [-2, 2, 6]
4        [-2, 2, 6]        C              [-2, 2, 0]                     [-1, 4, 3]
5        [-1, 4, 3]        B              [-1, -2, 3]                    [0, 0, 6]
6        [0, 0, 6]         C              [0, 0, 0]                      [1, 2, 3]

Before the seventh request, nodeCurWeights has returned to [1, 2, 3], and a new cycle begins.

Notice that among the first six requests, 3 go to node C, 2 to node B, and 1 to node A: each node's share of the total exactly matches its weight. Moreover, selections of different nodes are interleaved, avoiding the problem of requests concentrating on one node.

The algorithm looks abstract. Why does it both satisfy the weight requirements and interleave the nodes?

Each round, the algorithm picks the node with the largest current weight, subtracts the sum of all node weights from it, and then increases every node's current weight by that node's own original weight.

Each node's current weight therefore grows at a rate proportional to its original weight. The larger the weight, the faster the node climbs back to the top and the more often it is selected; conversely, the smaller the weight, the more slowly it becomes the node with the largest current weight and the less often it is selected. This is how requests end up allocated by weight.

Whenever a node is selected, the same large value (the sum of all original weights) is subtracted from it, so it has a low probability of being selected for the next several requests: its current weight needs time to climb back to the top. This is what produces the smoothing effect.
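Putting the pieces together, here is a self-contained sketch of smooth weighted round-robin that reproduces the A/B/C example above. The class name and array-based state are choices made for this sketch:

```java
public class SmoothWeightedRoundRobin {
    private final int totalWeight;        // sum of original weights; never changes
    private final int[] nodeOriginWeight; // original weight per node; never changes
    private final int[] nodeCurWeights;   // current weight per node; updated every call

    public SmoothWeightedRoundRobin(int[] weights) {
        nodeOriginWeight = weights.clone();
        nodeCurWeights = weights.clone();
        int sum = 0;
        for (int w : weights) sum += w;
        totalWeight = sum;
    }

    public int select() {
        // Step 1: find the node with the highest current weight
        int maxIndex = 0;
        for (int i = 1; i < nodeCurWeights.length; i++) {
            if (nodeCurWeights[i] > nodeCurWeights[maxIndex]) maxIndex = i;
        }
        // Step 2: subtract totalWeight from the selected node's current weight
        nodeCurWeights[maxIndex] -= totalWeight;
        // Step 3: add each node's original weight back to its current weight
        for (int i = 0; i < nodeCurWeights.length; i++) {
            nodeCurWeights[i] += nodeOriginWeight[i];
        }
        return maxIndex;
    }

    public static void main(String[] args) {
        // Weights 1, 2, 3 for A, B, C yield the cycle C B A C B C
        SmoothWeightedRoundRobin rr = new SmoothWeightedRoundRobin(new int[]{1, 2, 3});
        char[] names = {'A', 'B', 'C'};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) sb.append(names[rr.select()]);
        System.out.println(sb); // CBACBC
    }
}
```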

3. Minimum active number

The active count of a node is the number of requests it is currently processing.

The minimum active number policy monitors the current number of active requests on each node and assigns each new request to the node with the smallest active count.

The underlying assumption is that a node with few active requests probably has a low load. This both avoids piling more work on heavily loaded nodes and directs traffic to the least loaded one.

So how do we know each node's active count? Increment the node's counter just before each request begins and decrement it when the request completes. With client-side load balancing, only requests initiated by the current client are counted; with server-side load balancing, all requests can be counted, which makes the active count more accurate.
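That bookkeeping might look like the following sketch. Thread-safe counters are assumed here because requests start and finish concurrently; the class and method names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ActiveCounter {
    // One counter per node; nodes are identified by name for simplicity
    private final ConcurrentHashMap<String, AtomicInteger> activeCount =
            new ConcurrentHashMap<>();

    // Call just before sending a request to the node
    public void begin(String node) {
        activeCount.computeIfAbsent(node, k -> new AtomicInteger()).incrementAndGet();
    }

    // Call when the response arrives (or the request fails)
    public void end(String node) {
        activeCount.get(node).decrementAndGet();
    }

    // Current number of in-flight requests on the node
    public int active(String node) {
        AtomicInteger c = activeCount.get(node);
        return c == null ? 0 : c.get();
    }
}
```

A client would wrap every invocation in begin/end (with end in a finally block, so failed requests are not counted forever) and feed active into the selection routine below.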

This strategy can be combined with random or round-robin selection. For example, when multiple nodes share the same minimum active count, one of them can be chosen at random. An overall implementation follows:

// Records the active count of each node
private Map<Node, Integer> activeCount;

private Node select(List<Node> servers) {
    // Indexes of the nodes with the lowest active count.
    // An array is used because several nodes may tie for the minimum.
    int[] minIndexs = new int[servers.size()];

    // Number of nodes whose active count equals the minimum
    int nodeCount = 0;

    int minActiveCount = Integer.MAX_VALUE;

    for (int i = 0; i < servers.size(); i++) {
        int curCount = activeCount.get(servers.get(i));

        // Found a new minimum; restart the tie list
        if (curCount < minActiveCount) {
            minActiveCount = curCount;
            nodeCount = 0;
            minIndexs[nodeCount++] = i;
        // Equal to the current minimum; record this index too
        } else if (curCount == minActiveCount) {
            minIndexs[nodeCount++] = i;
        }
    }

    // Only one node has the minimum active count
    if (nodeCount == 1) {
        return servers.get(minIndexs[0]);
    }

    // Several nodes tie for the minimum; pick one of them at random
    return servers.get(minIndexs[new Random().nextInt(nodeCount)]);
}

A related policy is minimum response time: monitor the response time of each request on each node and send each new request to the node with the lowest average response time, on the implicit assumption that such a node probably has a low load.

4. Consistent hashing

Consistent hashing is more often used as a routing policy for cached or stored data, where it matters how much data is invalidated, and how much must be migrated, when nodes are added or removed. Ordinary service cluster calls, by contrast, are usually stateless: it does not matter whether a request lands on the same node each time, since any node can execute the service and return a response. In some cases, however, stickiness is required, for example to keep a session on one node.

An ordinary hash is not consistent: when the number of nodes changes, most requests are no longer mapped to their original nodes, which is hard to accept in some scenarios. In a file storage system, for example, every time a node was added, all files would have to be redistributed across all nodes to keep them findable.

A consistent hashing algorithm needs to maintain the following properties:

  • Monotonicity: if requests have already been assigned to certain nodes and a new node is added, each request must still be assigned either to its original node or to the new node. This property is what distinguishes consistent hashing from other allocation strategies
  • Balance: each node should process roughly the same number of requests. In implementation terms, the nodes on the hash ring should be spaced as evenly as possible; if the computed hash values are clustered, consider switching to a hash function with better uniformity
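For illustration only, here is a minimal hash-ring sketch aimed at both properties. The virtual-replica count of 100 and the use of String.hashCode are simplifying assumptions; a real implementation would use a better-distributed hash such as MurmurHash:

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    // Maps each virtual node's hash to its physical node
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private static final int VIRTUAL_REPLICAS = 100; // more replicas -> better balance

    public ConsistentHashRing(List<String> nodes) {
        for (String node : nodes) addNode(node);
    }

    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_REPLICAS; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_REPLICAS; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise to the first virtual node at or after the key's hash,
    // wrapping around to the start of the ring if necessary
    public String select(String requestKey) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(requestKey));
        Integer slot = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(slot);
    }

    private static int hash(String s) {
        // String.hashCode is used only for brevity; it is not well distributed
        return s.hashCode();
    }
}
```

Monotonicity falls out of the structure: adding a node only inserts new points on the ring, so a key either keeps its old successor or moves to the new node.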

This is only a brief introduction to the benefits of consistent hashing. For details, see the reference on consistent hashing below.

Conclusion

This article introduced several common load balancing strategies. They are implemented in different ways, but each offers a different perspective on how to allocate requests so that load stays balanced. You can also design your own load balancing strategy for a specific business.

References

How to understand smooth weighted round-robin?

Talk about consistent hashing