Preface

A quick question for my savvy readers: if you build business services on Spring Cloud Netflix microservices, your production registry is almost certainly deployed as a cluster. So here is the question:

How does a Eureka registry cluster achieve client request load balancing and failover?

Think about it for a minute. I hope you read this article with that question in mind, and that you learn something from it!

Background

A service with Sentry enabled reported the following while interacting with the registry:

Request execution failed with message: Connection refused

Request execution succeeded on retry #2

The second message shows that our service was talking to the registry normally again after two retries.

Connection refused means the connection attempt was rejected. Network jitter at the registry caused our service to retry, and once the retry succeeded everything returned to normal.

This alarm did not affect our production business and recovered immediately, but being the curious type, I wanted to understand the logic behind it: how does a Eureka registry cluster achieve client request load balancing and failover?

Registry cluster load testing

The production registry is a cluster of three machines, each with 4 cores and 8 GB of memory. On the business side, the registry address is configured as follows (the peer names stand in for the actual IP addresses):

eureka.client.serviceUrl.defaultZone=http://peer1:8080/eureka/,http://peer2:8080/eureka/,http://peer3:8080/eureka/

We can write a Demo to test it:


1. Change the port number of the EurekaServer service to simulate a clustered registry, starting one instance on port 8761 and one on port 8762.

2. Start the client ServiceA with the registry address configured as: http://localhost:8761/eureka,http://localhost:8762/eureka

3. Set a breakpoint where the registration request is sent, AbstractJerseyEurekaHttpClient.register(), and start ServiceA, as shown in the figure below:

Here you can see that the registration request goes to the service on port 8761.

4. Change the ServiceA registry configuration to: http://localhost:8762/eureka,http://localhost:8761/eureka

5. Restart ServiceA and check the port, as shown below:

This time the registration request goes to the service on port 8762.

Registry failover tests

Start EurekaServer on two ports, then start a client, ServiceA. Once it has started successfully, shut down the server on port 8761 and check whether the client automatically moves its requests to the server on port 8762.

1. Start EurekaServer on port 8761 and on port 8762.

2. Start ServiceA with the registry address configured as: http://localhost:8761/eureka,http://localhost:8762/eureka

3. After ServiceA has started successfully, shut down the EurekaServer on port 8761.

4. Set a breakpoint where EurekaClient sends heartbeat requests: AbstractJerseyEurekaHttpClient.sendHeartBeat()

5. Inspect the data at the breakpoint. The first request goes to the EurekaServer on port 8761; because that server has been shut down, the response is null.

6. The second request goes to the server on port 8762 and returns status 200, which shows that failover succeeded, as shown in the following figure:

Thinking

From these two demos, it looks as if EurekaClient always uses the first host configured in defaultZone as the EurekaServer to talk to, and automatically switches to the next EurekaServer in the configuration if that node fails.

Does EurekaClient really always request the first node configured in defaultZone? That seems far too weak!

Is the EurekaServer cluster just a pseudo-cluster, where every node except the first one in the client configuration serves only as a backup for failover?

Is that the truth? No! What we see is not the whole story, and the source code keeps no secrets.

Time for the real substance!

Client request load principle

The principle diagram

The load principle is shown in the figure below:

The EurekaClient’s IP address is used as a random seed to shuffle the serverList. For example, the product service (192.168.10.56) is configured with the registry cluster addresses peer1,peer2,peer3; after shuffling, the order might be peer3,peer2,peer1.

The user service (192.168.22.31) is configured with the same addresses, peer1,peer2,peer3; after shuffling, the order might be peer2,peer1,peer3.

Each EurekaClient always sends its requests to the first server in its own shuffled serverList, which is how load is spread across the cluster.

Code implementation

Let’s go straight to the lowest-level code that implements this. It lives in

com.netflix.discovery.shared.resolver.ResolverUtils.randomize():

Here, random is seeded with the EurekaClient’s IP address and used to produce a reordered serverList, which is randomList in the code. The order of the serverList can therefore differ from one EurekaClient to another. At request time, the first element of the list is taken as the server host, and that is what distributes the load.
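
The reordering described above can be sketched in a few lines of Java. This is a minimal, self-contained sketch rather than the Netflix source: it reuses the swap loop from this article’s own test code, takes the client IP as a plain string parameter, and the class name SeededShuffleSketch is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleSketch {

    /**
     * Returns a reordered copy of the server list. The Random is seeded with the
     * hash code of the client IP, so the same IP always produces the same order.
     */
    public static List<String> randomize(List<String> servers, String clientIp) {
        List<String> randomList = new ArrayList<>(servers);
        if (randomList.size() < 2) {
            return randomList;
        }
        Random random = new Random(clientIp.hashCode());
        int last = randomList.size() - 1;
        for (int i = 0; i < last; i++) {
            int pos = random.nextInt(randomList.size() - i);
            if (pos != i) {
                Collections.swap(randomList, i, pos);
            }
        }
        return randomList;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("peer1", "peer2", "peer3");
        // The two client IPs are the examples used in this article.
        for (String ip : new String[] {"192.168.10.56", "192.168.22.31"}) {
            List<String> ordered = randomize(servers, ip);
            System.out.println(ip + " -> " + ordered + ", first choice: " + ordered.get(0));
        }
    }
}
```

Running it repeatedly with the same IP always yields the same order, which is consistent with a single machine reaching the same server on every run.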

Thinking

So load balancing is driven by the EurekaClient’s IP. That explains the results of the demo above: we ran every experiment from the same IP, so we always reached the same server node.

Since we are talking about load balancing, another question naturally follows:

When load balancing is based on IP, is each request evenly distributed across the server nodes?

For example, does the first request go to peer1, the second to peer2, the third to peer3, the fourth to peer1 again, and so on in a cycle?

Let’s run another experiment. Suppose we have 10,000 EurekaClient nodes and three EurekaServer nodes.

The client IPs are taken from the range 192.168.0.0 to 192.168.255.255, which contains more than 60,000 addresses (the test below walks through 65,000 of them). The test code is as follows:

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Simulates registry cluster load and checks how the IP-seeded shuffle distributes clients.
 *
 * @author a flower is not romantic
 * @date 2020/6/21
 */
public class EurekaClusterLoadBalanceTest {

    public static void main(String[] args) {
        testEurekaClusterBalance();
    }

    /**
     * Walks through a range of client IPs and counts which registry node each one picks.
     */
    private static void testEurekaClusterBalance() {
        int ipLoopSize = 65000;
        String ipFormat = "192.168.%s.%s";
        TreeMap<String, Integer> ipMap = new TreeMap<>();
        int netIndex = 0;
        int lastIndex = 0;
        for (int i = 0; i < ipLoopSize; i++) {
            // Roll over to the next third octet after 256 addresses
            if (lastIndex == 256) {
                netIndex += 1;
                lastIndex = 0;
            }

            String ip = String.format(ipFormat, netIndex, lastIndex);
            randomize(ip, ipMap);
            System.out.println("IP: " + ip);
            lastIndex += 1;
        }

        printIpResult(ipMap, ipLoopSize);
    }

    /**
     * Shuffles the server list with a Random seeded by the client IP and
     * records the first server of the shuffled list in ipMap.
     */
    private static void randomize(String eurekaClientIp, TreeMap<String, Integer> ipMap) {
        List<String> eurekaServerUrlList = new ArrayList<>();
        eurekaServerUrlList.add("http://peer1:8080/eureka/");
        eurekaServerUrlList.add("http://peer2:8080/eureka/");
        eurekaServerUrlList.add("http://peer3:8080/eureka/");

        List<String> randomList = new ArrayList<>(eurekaServerUrlList);
        Random random = new Random(eurekaClientIp.hashCode());
        int last = randomList.size() - 1;
        for (int i = 0; i < last; i++) {
            int pos = random.nextInt(randomList.size() - i);
            if (pos != i) {
                Collections.swap(randomList, i, pos);
            }
        }

        // Only the first host matters: it is the one the client will request
        ipMap.merge(randomList.get(0), 1, Integer::sum);
    }

    /**
     * Prints each server with its client count and its share of the total as a percentage.
     */
    private static void printIpResult(TreeMap<String, Integer> ipMap, int totalCount) {
        for (Map.Entry<String, Integer> entry : ipMap.entrySet()) {
            Integer count = entry.getValue();
            BigDecimal rate = new BigDecimal(count).divide(new BigDecimal(totalCount), 2, RoundingMode.HALF_UP);
            System.out.println(entry.getKey() + ":" + count + ":" + rate.multiply(new BigDecimal(100)).setScale(0, RoundingMode.HALF_UP) + "%");
        }
    }
}

The load test results are as follows:

You can see that the second machine receives 50% of the requests while the last machine receives only 17%. The load is not very even, so I don’t think balancing by IP is a good solution.

Recall RoundRobinRule, the default round-robin algorithm in Ribbon, which we discussed earlier.

It cycles through the servers in turn, which keeps the requests evenly distributed, as shown in the following figure:
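
For comparison, round-robin selection can be sketched as follows. This is an illustrative stand-in, not Ribbon’s actual RoundRobinRule implementation; the class RoundRobinSketch and its choose() method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger(0);

    public RoundRobinSketch(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    /** Picks the next server in turn; floorMod keeps the index valid if the counter overflows. */
    public String choose() {
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch rule = new RoundRobinSketch(Arrays.asList("peer1", "peer2", "peer3"));
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < 9000; i++) {
            counts.merge(rule.choose(), 1, Integer::sum);
        }
        System.out.println(counts); // prints {peer1=3000, peer2=3000, peer3=3000}
    }
}
```

Because the choice depends only on a shared counter and not on who is asking, every server receives the same number of requests over a full cycle.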

Failover principle

The principle diagram

The conclusion is as follows:

After the serverList has been reordered according to the client’s IP, every request goes to the first element as the host for talking to the server. If that request fails, the client tries the second element in the serverList. Once a request succeeds, its host is stored in a global variable, and the client uses that host directly for subsequent requests.

A request is retried at most twice.

Code implementation

Let’s look directly at the underlying interaction code, located at

com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute():

Let’s walk through this code:

  1. Line 101: get the host of the server the client last connected to successfully; if there is a value, use that host directly.
  2. Line 105: getHostCandidates() obtains the serverList configured on the client, as a list reordered by IP.
  3. Line 114: candidateHosts.get(endpointIdx++). Initially endpointIdx=0, so the first element in the list is used as the host for the request.
  4. Line 120: get the returned response. If the status code is 200, the host of this request is stored in the global delegate variable.
  5. Line 133: reaching this point means the response from line 120 did not return status 200, that is, the request failed, so the global delegate variable is cleared.
  6. The loop then repeats from step 1. This time endpointIdx=1, so the second element in the list is used as the host for the request.
  7. The loop condition on line 100 uses numberOfRetries=3, so the loop exits after at most 2 retries.

Lines 123 and 129 are where the log messages we saw from our own service are written, so everything lines up.
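
The steps above can be sketched as a small retry loop. This is a simplified illustration under stated assumptions, not the real RetryableEurekaHttpClient: the transport is a hypothetical function from host to HTTP status code, the delegateHost field stands in for the delegate variable, and failure is signalled with a plain IllegalStateException. The class name RetryFailoverSketch is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RetryFailoverSketch {
    /** Total attempts per request, matching numberOfRetries=3 (1 try plus at most 2 retries). */
    public static final int NUMBER_OF_RETRIES = 3;

    private final List<String> candidateHosts;          // assumed to be already reordered by client IP
    private final Function<String, Integer> transport;  // hypothetical: host -> HTTP status, may throw
    private String delegateHost;                        // last host that answered 200, or null

    public RetryFailoverSketch(List<String> candidateHosts, Function<String, Integer> transport) {
        this.candidateHosts = new ArrayList<>(candidateHosts);
        this.transport = transport;
    }

    public String getDelegateHost() {
        return delegateHost;
    }

    public int execute() {
        int endpointIdx = 0;
        for (int retry = 0; retry < NUMBER_OF_RETRIES; retry++) {
            // Prefer the host that worked last time; otherwise take the next candidate.
            String host = delegateHost;
            if (host == null) {
                if (endpointIdx >= candidateHosts.size()) {
                    break; // no candidates left to try
                }
                host = candidateHosts.get(endpointIdx++);
            }
            try {
                int status = transport.apply(host);
                if (status == 200) {
                    delegateHost = host; // remember the working host for later requests
                    return status;
                }
            } catch (RuntimeException e) {
                // For example "Connection refused": fall through and try the next host.
            }
            delegateHost = null; // the attempt failed, so forget the remembered host
        }
        throw new IllegalStateException("Cannot execute request on any known server");
    }

    public static void main(String[] args) {
        RetryFailoverSketch client = new RetryFailoverSketch(
                Arrays.asList("http://localhost:8761/eureka", "http://localhost:8762/eureka"),
                host -> {
                    if (host.contains("8761")) {
                        throw new RuntimeException("Connection refused"); // simulate the stopped server
                    }
                    return 200;
                });
        System.out.println(client.execute());         // prints 200
        System.out.println(client.getDelegateHost()); // prints http://localhost:8762/eureka
    }
}
```

In this sketch the first attempt hits the stopped server on port 8761, the second succeeds on port 8762, and later calls go straight to 8762 until it fails.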

Conclusion

Thank you for reading this far. I’m sure you can now answer the question from the beginning.

This article has analyzed how a Eureka client picks a server from the cluster for load balancing, and how requests are retried automatically when a node in the cluster fails.