Background

At my company I am responsible for our in-house Dubbo registry, and I often get reports from business teams that deregistering a Dubbo interface fails with an error. Investigation showed that the deregister interface was being called twice for the same interface. Our registry's deregister interface cannot be called repeatedly: the second call reports that the instance cannot be found, because the instance has already been deregistered.

Although this error only prints an error log and does not affect the business, I decided to get to the bottom of it. Besides, repeated deregistration lengthens application shutdown and slows down releases and rollbacks.

Reproducing the problem

I got the Dubbo version used by the business side. It is an internal customized build based on open source 2.7.3, and its changes are mainly security fixes and some business adaptation. I wrote a demo, ran it, and then killed it, and the error was reproduced.

To make sure that the problem was not caused by the internal modifications, I tested the open source version 2.7.3 as well and found that the error was still reported.

To confirm that this is a bug, I also changed the Dubbo version to 2.7.7 and found that this version no longer reported the error.

This shows that repeated deregistration is a bug in at least open source Dubbo 2.7.3, and that it is fixed in the later version 2.7.7.

So the obvious solution is to upgrade Dubbo. But if it were that simple, this article would not exist:

  1. The internal Dubbo has been modified, so upgrading means merging those changes into the new version, which is difficult
  2. Even if the internal version of Dubbo is upgraded, the business side cannot be made to upgrade that quickly

So the plan was to first find out what caused the bug, and then see whether the registry extension could fix it. If not, it would have to be fixed in the internal Dubbo version.

Troubleshooting

Suspecting ShutdownHook

Since I had been studying ShutdownHook in recent days (see ShutdownHook Principle), I immediately suspected that the problem might lie there.

Dubbo's ShutdownHook is implemented in the DubboShutdownHook class, so I followed the code from there to work out how the hooks relate to each other.

Both Dubbo and Spring register a ShutdownHook, so I debugged to check whether the hook was registered twice. A small tip from experience: when debugging ShutdownHook execution in IntelliJ IDEA, the breakpoint is only hit if you kill the process manually, not if you click the stop button in the IDE.

I set a breakpoint in DubboShutdownHook.doDestroy and it was hit only once, which shows that between Spring and Dubbo only one ShutdownHook ends up registered. How does that happen? After a lot of testing, I found that Dubbo has a neat design here.

DubboShutdownHook has register and unregister methods, which add and remove the ShutdownHook respectively.
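As a rough illustration of such a register/unregister pair, here is a minimal sketch built on the JDK's Runtime.addShutdownHook and Runtime.removeShutdownHook. The class DemoShutdownHook and its isRegistered helper are hypothetical names chosen for this example. This is not Dubbo's implementation.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; not Dubbo's DubboShutdownHook.
class DemoShutdownHook extends Thread {
    private static final DemoShutdownHook INSTANCE = new DemoShutdownHook();

    // Tracks whether this hook is currently registered with the JVM.
    private final AtomicBoolean registered = new AtomicBoolean(false);

    private DemoShutdownHook() {
        super("DemoShutdownHook");
    }

    static DemoShutdownHook getInstance() {
        return INSTANCE;
    }

    @Override
    public void run() {
        // Executed by the JVM on shutdown if the hook is still registered.
        System.out.println("releasing resources");
    }

    // Register with the JVM; repeated calls have no further effect.
    void register() {
        if (registered.compareAndSet(false, true)) {
            Runtime.getRuntime().addShutdownHook(this);
        }
    }

    // Remove the hook again, for example when another framework's hook takes over.
    void unregister() {
        if (registered.compareAndSet(true, false)) {
            Runtime.getRuntime().removeShutdownHook(this);
        }
    }

    boolean isRegistered() {
        return registered.get();
    }
}
```

The compareAndSet guard matters because the JDK throws an IllegalArgumentException if the same hook thread is added twice.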

Dubbo registers its ShutdownHook, but when Dubbo runs inside a Spring application context, Dubbo's own hook is unregistered during initialization and Spring's hook is registered instead, so only the Spring ShutdownHook remains. It takes just a few lines of code:

public static void addApplicationContext(ApplicationContext context) {
    CONTEXTS.add(context);
    if (context instanceof ConfigurableApplicationContext) {
        // Let Spring register its own JVM shutdown hook...
        ((ConfigurableApplicationContext) context).registerShutdownHook();
        // ...and remove Dubbo's hook, so that only Spring's hook remains.
        DubboShutdownHook.getDubboShutdownHook().unregister();
    }
    BeanFactoryUtils.addApplicationListener(context, SHUTDOWN_HOOK_LISTENER);
}

So the ShutdownHook I suspected turned out not to be the problem.

Continuing from the deregistration call stack

Use the IDE debugger to look at the call stacks of the two deregistrations: add a breakpoint in the unregister method of the registry extension. The breakpoint is hit twice, and the two stacks come from different sources.

In the code, both stacks start from the same ShutdownHook run. That is, one ShutdownHook execution triggers two deregistrations.

From here it is easy to step through in the debugger. The two sources are:

  • AbstractRegistryFactory.destroyAll() destroys all registries, which calls each registry's deregister interface
  • destroyProtocols destroys all protocols; when the registry's protocol is destroyed, it obtains the Registry and then calls the Registry's deregister interface
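The effect of these two paths can be sketched with a small simulation. StrictRegistry and DoubleDeregisterDemo are hypothetical classes written for this illustration, standing in for a registry whose deregister call cannot be repeated. They do not use Dubbo's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a registry whose deregister call cannot be repeated.
class StrictRegistry {
    private final Set<String> instances = new HashSet<>();

    void register(String service) {
        instances.add(service);
    }

    void unregister(String service) {
        // Fails when the instance is already gone, like the registry in this article.
        if (!instances.remove(service)) {
            throw new IllegalStateException("instance not found: " + service);
        }
    }
}

class DoubleDeregisterDemo {
    // Runs both shutdown steps and returns how many deregister calls failed.
    static int shutdown(StrictRegistry registry, String service) {
        int failures = 0;
        // Step 1: destroying all registries deregisters the service.
        failures += tryUnregister(registry, service);
        // Step 2: destroying the protocols reaches the same registry and deregisters again.
        failures += tryUnregister(registry, service);
        return failures;
    }

    private static int tryUnregister(StrictRegistry registry, String service) {
        try {
            registry.unregister(service);
            return 0;
        } catch (IllegalStateException e) {
            System.err.println("deregister failed: " + e.getMessage());
            return 1;
        }
    }
}
```

Calling DoubleDeregisterDemo.shutdown on a registry that holds one registered service returns 1: the first call succeeds and the second one fails because the instance is already gone.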

So how does Dubbo 2.7.7 avoid this problem?

In the Dubbo 2.7.7 code, a small check has been added on the path where the registry's protocol obtains the registry during destruction.

The destroyed variable is set to true after the registries are destroyed. Therefore, when the registry's protocol obtains the registry again, it no longer gets the original registry but an empty one, so the second deregistration never reaches the real registry.
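The idea of the destroyed flag can be sketched as follows. This is a simplified illustration under my own assumptions, not Dubbo's source: SimpleRegistry, GuardedRegistryFactory, and the NOOP instance are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal registry abstraction for this sketch.
interface SimpleRegistry {
    void unregister(String service);
}

class GuardedRegistryFactory {
    // Empty registry handed out once everything has been destroyed.
    private static final SimpleRegistry NOOP = service -> { };

    private final AtomicBoolean destroyed = new AtomicBoolean(false);
    private final Map<String, SimpleRegistry> registries = new ConcurrentHashMap<>();

    void add(String key, SimpleRegistry registry) {
        registries.put(key, registry);
    }

    // After destroyAll, callers get the empty registry instead of the original one.
    SimpleRegistry getRegistry(String key) {
        if (destroyed.get()) {
            return NOOP;
        }
        return registries.get(key);
    }

    // Deregisters the service from every registry exactly once, then marks the factory destroyed.
    void destroyAll(String service) {
        if (!destroyed.compareAndSet(false, true)) {
            return;
        }
        for (SimpleRegistry registry : registries.values()) {
            registry.unregister(service);
        }
        registries.clear();
    }
}
```

With this guard in place, a later lookup during protocol destruction still returns a usable object, but calling unregister on it does nothing.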

Looking back on GitHub, the PR that fixed this is:

Github.com/apache/dubb…

The fix was released in 2.7.5.

Conclusion

  • The Dubbo double deregistration problem exists in 2.7.0 ~ 2.7.4 and was fixed in 2.7.5. The ZooKeeper registry does not report an error, so you may not notice it, but the problem is still there and slows down application shutdown
  • Tracing shows that the problem can be worked around in the registry extension by making sure the Registry's destroy takes effect only once
  • No matter how small the problem, take the time to study it and you will learn something new, such as the clever ShutdownHook design in Dubbo
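The workaround from the second point can be sketched like this: guard destroy in the registry extension so that only the first call does any work. OnceDestroyRegistry and its methods are hypothetical names for illustration. A real extension would implement Dubbo's Registry interface instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical registry extension; class and method names are illustrative only.
class OnceDestroyRegistry {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);
    private int deregisterCalls = 0;

    // Only the first call deregisters; later calls return immediately.
    void destroy() {
        if (!destroyed.compareAndSet(false, true)) {
            return;
        }
        deregisterAll();
    }

    // Placeholder for the remote deregister request sent to the registry server.
    private void deregisterAll() {
        deregisterCalls++;
    }

    int getDeregisterCalls() {
        return deregisterCalls;
    }
}
```

Because the guard lives in the extension, it works without upgrading the Dubbo version the business side uses.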

About the author: author of the public account “Bug Catching Master”, focused on back-end middleware development. Follow the account for more technical articles.