Preface

A quick question for my savvy readers: if you build business services on Spring Cloud Netflix microservices, your online registry is probably deployed as a cluster too. So here is the question:

Do you understand how the Eureka registry cluster achieves client request load balancing and failover?

Think about it for a minute. I hope you read this article with questions, and I hope you’ve learned something!

Background

An online service with Sentry enabled reported the following error while interacting with the registry: Request execution failed with message: Connection refused

Request execution succeeded on retry #2

This second message shows that, after two retries, our service was interacting with the registry normally again.

In other words, the connection was refused: network jitter on the registry side triggered retries from our service, and once a retry succeeded everything returned to normal.

Although this alert did not affect our online business and recovered immediately, I like to think things through, so I was curious about the logic behind it: how does the Eureka registry cluster achieve client request load balancing and failover?

Registry cluster load testing

The online registry is a cluster of three machines, each in a 4C8G configuration (4 cores, 8 GB of memory). On the business side, the registry address is configured as follows (peer names are used here in place of the real IP addresses):

eureka.client.serviceUrl.defaultZone=http://peer1:8080/eureka/,http://peer2:8080/eureka/,http://peer3:8080/eureka/

We can write a demo to test it:

Registry cluster load testing

1. Change the port number of the EurekaServer service to simulate a cluster deployment of the registry, using ports 8761 and 8762 respectively.
2. Start the ServiceA client with the registry address set to: http://localhost:8761/eureka,http://localhost:8762/eureka
3. While ServiceA starts, set a breakpoint where the registration request is sent, AbstractJerseyEurekaHttpClient.register(), as shown in the figure below:

Here you can see that the service connects to port 8761 when it requests the registry.

4. Change ServiceA's registry configuration to: http://localhost:8762/eureka,http://localhost:8761/eureka
5. Restart ServiceA and check the port again, as shown in the figure below:

This time the service connects to port 8762.

Registry failover tests

Start EurekaServer on two ports, and then start a client ServiceA. After the service is successfully started, disable the service corresponding to port 8761 and check whether the client automatically migrates the request to the service corresponding to port 8762.

1. Start EurekaServer on ports 8761 and 8762.
2. Start ServiceA with the registry address set to: http://localhost:8761/eureka,http://localhost:8762/eureka
3. After a successful start, shut down the EurekaServer on port 8761.
4. Set a breakpoint where the EurekaClient sends its heartbeat request: AbstractJerseyEurekaHttpClient.sendHeartBeat()
5. Inspect the breakpoint data: the first request goes to the EurekaServer on port 8761; because that service has been shut down, the returned response is null.
6. The next request goes to the service on port 8762, and the response status is 200.

Thinking

From these two test demos, it looks as if EurekaClient always takes the first host configured in defaultZone as the address for requests to EurekaServer, and automatically switches to the next configured EurekaServer and retries if that node fails.

Does EurekaClient really always request the first node configured in defaultZone? That seems too weak!

Is the EurekaServer cluster a pseudo-cluster? Apart from the first node configured on the client, are all other registry nodes just backups for failover?

Is that the truth? No! What we see is not the whole truth, and nothing stays secret in front of the source code.

Time for the real substance!

Client request load principle

Principle diagram

The load principle is shown in the figure below:

The EurekaClient's IP address is used as the random seed, and the serverList is shuffled with it. For example, if we configure the registry cluster address in the **commodity service (192.168.10.56)** as peer1,peer2,peer3, the shuffled order might be peer3,peer2,peer1.

For the **user service (192.168.22.31)**, the registry cluster address is also configured as peer1,peer2,peer3, and the shuffled order might be peer2,peer1,peer3.

Each EurekaClient always requests the first server in its shuffled serverList, which is how the load is spread.

Code implementation

Let's go straight to the underlying load code. The implementation is in com.netflix.discovery.shared.resolver.ResolverUtils.randomize():

Here, the EurekaClient's IPv4 address is used as the seed for random, which generates a reordered serverList (the randomList in the code). The serverList order obtained by each EurekaClient may therefore differ. In use, the first element of the list is taken as the server host, which achieves load distribution.
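A minimal sketch of the seeded reordering described above, written as a standalone class. The class name, the String-based server list, and passing the client IP as a parameter are assumptions for illustration; the swap loop follows the one used in this article's test code rather than being copied from the Eureka source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededShuffleSketch {

    /** Reorders a copy of the server list, seeding Random with the client IP's hash code. */
    static List<String> randomize(List<String> serverList, String clientIp) {
        List<String> randomList = new ArrayList<>(serverList);
        if (randomList.size() < 2) {
            return randomList;
        }
        // Same seed -> same sequence of draws -> same order on every call.
        Random random = new Random(clientIp.hashCode());
        int last = randomList.size() - 1;
        for (int i = 0; i < last; i++) {
            int pos = random.nextInt(randomList.size() - i);
            if (pos != i) {
                Collections.swap(randomList, i, pos);
            }
        }
        return randomList;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("peer1", "peer2", "peer3");
        // The two client IPs used as examples in this article.
        for (String ip : new String[] {"192.168.10.56", "192.168.22.31"}) {
            List<String> order = randomize(servers, ip);
            System.out.println(ip + " -> " + order + ", requests go to " + order.get(0));
        }
    }
}
```

Because the seed depends only on the IP, a given client computes the same order every time it starts, which is consistent with the demo always hitting the same port.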

Thinking

So load is distributed by the EurekaClient's IP. That explains the results of the demo just now: we ran the experiments from the same IP, so every run hit the same server node.

Now that we're talking about load, there's another obvious question:

When load is balanced by IP, is every request evenly distributed across the server nodes?

For example, visit peer1 the first time, peer2 the second time, peer3 the third time, peer1 the fourth time, and so on in a cycle...

We can do another experiment: say we have 10,000 EurekaClient nodes and three EurekaServer nodes.

The client IP range is 192.168.0.0 to 192.168.255.255, which covers more than 60,000 IP addresses. The test code is as follows:

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Simulate registry cluster load to verify the load hashing algorithm
 *
 * @author A flower is not romantic
 * @date 2020/6/21
 */
public class EurekaClusterLoadBalanceTest {

    public static void main(String[] args) {
        testEurekaClusterBalance();
    }

    /**
     * Test registry cluster load
     */
    private static void testEurekaClusterBalance() {
        // Simulate the IP range
        int ipLoopSize = 65000;
        String ipFormat = "192.168.%s.%s";
        TreeMap<String, Integer> ipMap = new TreeMap<>();
        int netIndex = 0;
        int lastIndex = 0;
        for (int i = 0; i < ipLoopSize; i++) {
            // Roll over to the next third octet after 256 addresses
            if (lastIndex == 256) {
                netIndex += 1;
                lastIndex = 0;
            }

            String ip = String.format(ipFormat, netIndex, lastIndex);
            randomize(ip, ipMap);
            System.out.println("IP: " + ip);
            lastIndex += 1;
        }

        printIpResult(ipMap, ipLoopSize);
    }

    /**
     * Simulate the specified IP address to obtain the corresponding registry load
     */
    private static void randomize(String eurekaClientIp, TreeMap<String, Integer> ipMap) {
        List<String> eurekaServerUrlList = new ArrayList<>();
        eurekaServerUrlList.add("http://peer1:8080/eureka/");
        eurekaServerUrlList.add("http://peer2:8080/eureka/");
        eurekaServerUrlList.add("http://peer3:8080/eureka/");

        // Reorder the list using the client IP's hash code as the random seed
        List<String> randomList = new ArrayList<>(eurekaServerUrlList);
        Random random = new Random(eurekaClientIp.hashCode());
        int last = randomList.size() - 1;
        for (int i = 0; i < last; i++) {
            int pos = random.nextInt(randomList.size() - i);
            if (pos != i) {
                Collections.swap(randomList, i, pos);
            }
        }

        // Only the first server in the reordered list receives this client's requests
        for (String eurekaHost : randomList) {
            int ipCount = ipMap.get(eurekaHost) == null ? 0 : ipMap.get(eurekaHost);
            ipMap.put(eurekaHost, ipCount + 1);
            break;
        }
    }

    /**
     * Print the number and percentage of clients assigned to each server
     */
    private static void printIpResult(TreeMap<String, Integer> ipMap, int totalCount) {
        for (Map.Entry<String, Integer> entry : ipMap.entrySet()) {
            Integer count = entry.getValue();
            BigDecimal rate = new BigDecimal(count).divide(new BigDecimal(totalCount), 2, RoundingMode.HALF_UP);
            System.out.println(entry.getKey() + ":" + count + ":" + rate.multiply(new BigDecimal(100)).setScale(0, RoundingMode.HALF_UP) + "%");
        }
    }
}

The load test results are as follows:

It can be seen that the second machine receives **50%** of the requests while the last machine receives only 17%. The load is not very even, so I think distributing load by IP is not a good solution.
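One possible reading of these numbers is that part of the skew comes from the swap loop in the test code rather than from the IP seed alone. The sketch below does not use Random at all: it enumerates the six equally likely pairs of draws that the loop can make for three servers and counts which server ends up first. It assumes the draws are uniform and independent, which seeded draws only approximate; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SwapLoopEnumeration {

    /** Counts which server ends up first over every possible pair of draws. */
    static Map<String, Integer> firstServerCounts() {
        List<String> servers = Arrays.asList("peer1", "peer2", "peer3");
        Map<String, Integer> counts = new TreeMap<>();
        // For three servers the loop draws pos0 from {0,1,2} and pos1 from {0,1}.
        for (int pos0 = 0; pos0 < 3; pos0++) {
            for (int pos1 = 0; pos1 < 2; pos1++) {
                List<String> list = new ArrayList<>(servers);
                int[] draws = {pos0, pos1};
                // Same swap rule as the test code: swap index i with the drawn position.
                for (int i = 0; i < list.size() - 1; i++) {
                    if (draws[i] != i) {
                        Collections.swap(list, i, draws[i]);
                    }
                }
                counts.merge(list.get(0), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {peer1=2, peer2=3, peer3=1}, i.e. 2/6, 3/6 and 1/6 of the outcomes.
        System.out.println(firstServerCounts());
    }
}
```

Under that assumption the second server is first in 3 of 6 outcomes (50%) and the third in 1 of 6 (about 17%), which lines up with the measured percentages.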

By contrast, Ribbon's default polling algorithm is RoundRobinRule.

This is a good load-balancing algorithm (round-robin rather than hashing): it hands requests to the servers in turn, so they are spread evenly, as shown in the following figure:
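Round-robin selection can be sketched in a few lines. This is not Ribbon's RoundRobinRule implementation, only an illustration of the idea under the assumption of a fixed list of three servers; the class and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {

    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> servers) {
        this.servers = servers;
    }

    /** Returns the next server in order, wrapping around at the end of the list. */
    String choose() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    /** Sends the given number of simulated requests and counts hits per server. */
    static Map<String, Integer> distribute(int requests) {
        RoundRobinSketch rule = new RoundRobinSketch(Arrays.asList("peer1", "peer2", "peer3"));
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < requests; i++) {
            counts.merge(rule.choose(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 9000 requests over three servers: prints {peer1=3000, peer2=3000, peer3=3000}
        System.out.println(distribute(9000));
    }
}
```

Unlike the IP-seeded ordering, the choice here depends on a per-request counter, so every server receives the same share regardless of who the caller is.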

Failover principle

Principle diagram

The conclusion is as follows:

After the serverList has been reordered according to the client's IP address, each request goes to the first element as the host for interacting with the server. If the request fails, the client retries with the second element in the serverList. Once a request succeeds, the host that served it is stored in a global variable, and the next client request uses that host directly.

A request is retried at most twice.

Code implementation

Let's look directly at the underlying interaction code, located at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute():

Let’s examine this code:

  1. Line 101: get the host of the server the client last called successfully; if there is a value, use this host directly.
  2. Line 105: getHostCandidates() obtains the serverList configured on the client, reordered by IP.
  3. Line 114: candidateHosts.get(endpointIdx++); initially endpointIdx=0, so the first element in the list is used as the host for the request.
  4. Line 120: get the returned response; if the status code is 200, store the host of this request in the global delegate variable.
  5. Line 133: reaching this point means the response obtained on line 120 did not return status code 200, i.e. the request failed, so the data in the global delegate variable is cleared.
  6. The loop then repeats from the first step; this time endpointIdx=1, so the second element in the list is used as the host for the request.
  7. The loop condition on line 100 uses numberOfRetries=3, so the loop exits after at most 2 retries.

Lines 123 and 129 are where the log messages we saw in our service are emitted, so everything matches up.
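The steps above can be sketched as a small standalone class. This is a simplified illustration, not the RetryableEurekaHttpClient source: hosts are plain strings, the request is a hypothetical Function that returns a status code, and only the names delegate, candidateHosts, endpointIdx, and numberOfRetries are taken from the walkthrough.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class RetryFailoverSketch {

    private static final int NUMBER_OF_RETRIES = 3;

    // Host that served the last successful request; null when none is known.
    private final AtomicReference<String> delegate = new AtomicReference<>();
    private final List<String> candidateHosts;

    RetryFailoverSketch(List<String> candidateHosts) {
        this.candidateHosts = candidateHosts;
    }

    /** Tries the remembered host first, then walks the candidate list on failure. */
    int execute(Function<String, Integer> request) {
        int endpointIdx = 0;
        for (int retry = 0; retry < NUMBER_OF_RETRIES; retry++) {
            String host = delegate.get();
            if (host == null) {
                if (endpointIdx >= candidateHosts.size()) {
                    throw new IllegalStateException("No more hosts to try");
                }
                host = candidateHosts.get(endpointIdx++);
            }
            try {
                int status = request.apply(host);
                if (status == 200) {
                    delegate.set(host); // remember the working host for next time
                    return status;
                }
            } catch (RuntimeException e) {
                System.out.println("Request failed on " + host + ": " + e.getMessage());
            }
            delegate.compareAndSet(host, null); // forget the failed host
        }
        throw new IllegalStateException("Retry limit reached");
    }

    String currentHost() {
        return delegate.get();
    }

    public static void main(String[] args) {
        RetryFailoverSketch client = new RetryFailoverSketch(
                Arrays.asList("http://localhost:8761/eureka", "http://localhost:8762/eureka"));
        // Hypothetical transport: port 8761 is down, port 8762 answers 200.
        Function<String, Integer> heartbeat = host -> {
            if (host.contains("8761")) {
                throw new IllegalStateException("Connection refused");
            }
            return 200;
        };
        System.out.println("status=" + client.execute(heartbeat) + ", host=" + client.currentHost());
    }
}
```

With the simulated outage on port 8761, the first attempt fails, the second attempt succeeds on port 8762, and later calls go straight to the remembered host, which mirrors the failover test earlier in the article.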

Conclusion

Thank you for reading this far. I'm sure you now understand the question posed at the beginning.

The analysis above covers how, in a Eureka cluster, the client chooses a server for load balancing, and how automatic retry is implemented when a cluster node fails.

If there is anything you don't understand, please add my WeChat account or leave a message on my official account, and I will discuss it with you individually.

This article was first published on the official account "A flower is not romantic". If you repost it, please credit the source at the beginning of the article; if you need whitelisting, reply directly on the official account.