This article is included in github.com/lkxiaolou/l… Stars are welcome.
one
In the previous article “Redis’s contribution to microservices”, I used an interview to show how well Redis can serve microservices, and I also analyzed Dubbo’s Redis registry from the source code. The conclusion was that Dubbo’s Redis registry is not fit for production, for two reasons:
- It uses KEYS, which blocks single-threaded Redis: while KEYS is running, all other commands must queue
- There is no heartbeat detection. In my tests, after a provider was killed with kill -9, the consumer could not detect it. Judging from the implementation, the intent is to decide whether a service is available from the expiration time stored with it: the value for each URL should be compared with the current time, and the entry removed once it has expired. That part, however, looks incomplete
Later I looked through the latest code and found that the first point has been improved: KEYS has been replaced by SCAN. Put simply, KEYS returns every matching key in Redis in a single call, whereas SCAN iterates over the keys in batches.
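To make the difference concrete, here is a minimal in-memory sketch. It is not a Redis client, and the class `CursorScanSketch` and its methods are hypothetical names used only for illustration. Real SCAN cursors are opaque values and COUNT is only a hint, so this simplifies the behaviour to plain batching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a key space; this is NOT a Redis client.
class CursorScanSketch {
    // Result of one SCAN-like call: the cursor for the next call and one batch of keys.
    static final class Page {
        final int nextCursor;
        final List<String> keys;

        Page(int nextCursor, List<String> keys) {
            this.nextCursor = nextCursor;
            this.keys = keys;
        }
    }

    // KEYS-like: everything is returned by a single, potentially long-running call.
    static List<String> keysAll(List<String> store) {
        return new ArrayList<>(store);
    }

    // SCAN-like: at most `count` keys per call; a returned cursor of 0 means "done".
    static Page scan(List<String> store, int cursor, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        int end = Math.min(cursor + count, store.size());
        List<String> batch = new ArrayList<>(store.subList(cursor, end));
        int next = end >= store.size() ? 0 : end;
        return new Page(next, batch);
    }

    // Drives the cursor until it comes back as 0, collecting every batch.
    static List<String> scanAll(List<String> store, int count) {
        List<String> result = new ArrayList<>();
        int cursor = 0;
        do {
            Page page = scan(store, cursor, count);
            result.addAll(page.keys);
            cursor = page.nextCursor;
        } while (cursor != 0);
        return result;
    }
}
```

Between two `scan` calls the server is free to handle other commands, which is the point of the change.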
As long as the number of services is not very large, this works fine. The second point, however, was still unsolved. I wondered whether I could fix it and contribute the fix to the community. So I did.
two
The verification process is as follows:
- Using the Redis registry, start two providers and one consumer
- Run kill -9 on one of the providers
- Observe the consumer: some of its requests succeed and some fail, and the failures never recover. In other words, a provider that dies unexpectedly is not removed from the Redis registry (the case where the unregister logic is not executed, which kill -9 simulates)
Why start two providers? Because Dubbo has a protection mechanism for pushes from the registry: when the pushed provider list is empty, the push is ignored. After all, a stale provider list is better than no providers at all.
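The protection described above can be sketched roughly as follows. This is only an illustration of the idea, not Dubbo's actual code; `ProviderListHolder` and `onPush` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keeps the last known provider list and ignores empty pushes.
class ProviderListHolder {
    private List<String> providers = new ArrayList<>();

    // Returns true if the push was applied, false if it was ignored.
    boolean onPush(List<String> pushed) {
        if (pushed == null || pushed.isEmpty()) {
            // Keep the stale list rather than ending up with no providers at all.
            return false;
        }
        providers = new ArrayList<>(pushed);
        return true;
    }

    // Returns a copy so callers cannot modify the internal list.
    List<String> current() {
        return new ArrayList<>(providers);
    }
}
```

With a single provider, killing it would produce an empty list that the consumer ignores, so the experiment could not tell a working removal from a missing one. With two providers the remaining list is non-empty and any push would be applied.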
Analysis and solution
Note that the data stored in the Redis registry is a hash: the field is the provider URL and the value is its expiration time
127.0.0.1:6379> hgetall dubbo/com.newboo.sample.api.DemoService/providers
1) "dubbo://172.23.233.142:20881/com.newboo.sample.api.DemoService?anyhost=true&application=boot-samples-dubbo&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=com.newboo.sample.api.DemoService&metadata-type=remote&methods=sayHello&pid=19807&release=2.7.8&side=provider&timestamp=1621857955355"
2) "1621858734778"
That looks easy: can't we periodically delete the expired data and notify the consumers?
A second look at the code shows that this idea is already implemented. When the Redis registry starts, it launches a thread that scans once every half of the expiration period
this.expirePeriod = url.getParameter(SESSION_TIMEOUT_KEY, DEFAULT_SESSION_TIMEOUT);
this.expireFuture = expireExecutor.scheduleWithFixedDelay(() -> {
    try {
        deferExpired(); // Extend the expiration time
    } catch (Throwable t) { // Defensive fault tolerance
        logger.error("Unexpected exception occur at defer expire time, cause: " + t.getMessage(), t);
    }
}, expirePeriod / 2, expirePeriod / 2, TimeUnit.MILLISECONDS);
Each scan does two things:
- It "renews" the services registered by this instance, which is not our concern here
- If admin is true, it cleans up expired registrations and notifies the subscribers
private void deferExpired() {
    for (URL url : new HashSet<>(getRegistered())) {
        if (url.getParameter(DYNAMIC_KEY, true)) {
            String key = toCategoryPath(url);
            if (redisClient.hset(key, url.toFullString(), String.valueOf(System.currentTimeMillis() + expirePeriod)) == 1) {
                redisClient.publish(key, REGISTER);
            }
        }
    }
    if (admin) {
        clean();
    }
}
When is admin true?
admin is set to true when the subscribed service ends with *, which is probably the Dubbo console
@Override
public void doSubscribe(final URL url, final NotifyListener listener) {
    ...
    try {
        if (service.endsWith(ANY_VALUE)) {
            admin = true;
            ...
        }
    } catch (Throwable t) {
        ...
    }
}
In an earlier version of the code, the clean method also carried a one-line comment
// The monitoring center is responsible for deleting outdated dirty data
So when admin is true, the subscriber may also be the monitoring center.
In any case, very few companies run the open source monitoring center or console in production; most adapt them or build their own.
A system like that cannot guarantee stability either: if it goes down, expired providers are no longer cleaned up and failures follow easily.
Why not do the liveness check on the consumer side?
It so happens that the consumer fetches the latest data from Redis on subscription and on every change push, and a provider's renewal also produces an event. So suppose we:
- Cache this data on the consumer
- Check every half of the expiration period whether any cached entry has expired
- If one has, fetch the latest data from Redis and check again (in case a renewal event was lost)
- If it is still expired, consider that provider unhealthy
The idea is fairly simple. It took me 10 minutes to write a demo, and when I verified it with the procedure above, it really worked
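The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the class `ExpireCheckSketch`, its methods, and the `fetchLatest` function (a stand-in for reading the hash from Redis) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of consumer-side expiry checking; not the PR's implementation.
class ExpireCheckSketch {
    // Cached view: category key -> (provider URL -> expiration timestamp in ms).
    private final Map<String, Map<String, Long>> expireCache = new ConcurrentHashMap<>();
    // Stand-in for "read the whole hash for this key from Redis".
    private final Function<String, Map<String, Long>> fetchLatest;

    ExpireCheckSketch(Function<String, Map<String, Long>> fetchLatest) {
        this.fetchLatest = fetchLatest;
    }

    // Called with the data just read from Redis on subscription and on change pushes.
    void cache(String key, Map<String, Long> urlToExpire) {
        expireCache.put(key, new ConcurrentHashMap<>(urlToExpire));
    }

    // Meant to run every expirePeriod / 2. Returns the URLs considered unhealthy.
    List<String> check(String key, long now) {
        List<String> unhealthy = new ArrayList<>();
        Map<String, Long> cached = expireCache.get(key);
        if (cached == null) {
            return unhealthy;
        }
        boolean suspicious = false;
        for (long expire : cached.values()) {
            if (expire < now) {
                suspicious = true;
                break;
            }
        }
        if (!suspicious) {
            return unhealthy;
        }
        // Re-read before deciding, in case a renewal event was lost.
        Map<String, Long> latest = fetchLatest.apply(key);
        if (latest == null) {
            latest = Collections.emptyMap();
        }
        cache(key, latest);
        for (Map.Entry<String, Long> entry : latest.entrySet()) {
            if (entry.getValue() < now) {
                unhealthy.add(entry.getKey());
            }
        }
        return unhealthy;
    }
}
```

The second read is what keeps a provider from being dropped merely because the consumer missed one of its renewals.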
three
I had not contributed source code to the community for a long time, so I simply submitted a PR. Two days later I received a comment:
Would you please add some ut cases to verify this PR?
UT? Oh, unit tests. I had forgotten how the open source community works: it only trusts tested code. So I went back and added unit tests.
Writing the test turned out to be much harder than writing the code, and a registry notification mechanism is even harder to test than an ordinary asynchronous callback. The approach I came up with was to register a custom notification callback that stores whatever it receives in a map, and then have the main thread check the map in a loop.
To simulate a provider killed with kill -9, I used reflection to get the set of registered services and removed the provider from it, so that it would no longer be renewed.
There are always more solutions than difficulties.
Two days later, I received another comment:
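The polling approach described above might look roughly like this. It is a sketch only, not the test from the PR: `NotifyPollingSketch` and `awaitKey` are hypothetical names, and a scheduled task stands in for the registry invoking the callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of testing an asynchronous notification by polling.
class NotifyPollingSketch {
    // Polls the map until the key shows up or the timeout elapses.
    static boolean awaitKey(Map<String, ?> received, String key, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (received.containsKey(key)) {
                return true; // leave the loop as soon as the callback has fired
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return received.containsKey(key);
    }

    public static void main(String[] args) {
        Map<String, String> received = new ConcurrentHashMap<>();
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for the registry calling the custom notification callback later.
        executor.schedule(() -> {
            received.put("svc", "unregistered");
        }, 100, TimeUnit.MILLISECONDS);
        try {
            System.out.println("notified: " + awaitKey(received, "svc", 2000));
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Both the early return and the deadline matter: without an exit on success the loop runs for the full timeout, and without a deadline a missed notification hangs the build.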
please comment in English
Emmm, I had forgotten that comments need to be in English. Two days after fixing that, I received another comment:
Is it possible for expireCache to go leaking for it's never cleared?
expireCache is the map that caches URLs and their expiration times. If entries are only ever put in and never cleaned up, memory leaks. So I added the cleanup logic.
There was an episode along the way. Between 21:00 and 22:00 that day, I fixed the memory leak and wrote a unit test for it, using the same approach as before: let the main thread check in a loop. As soon as the test passed locally I pushed to GitHub, and the GitHub build failed. I didn't think much of it, because Dubbo is a big project and its builds fail often.
Miraculously, that night I dreamed that the unit test I had written might be missing a break, so that the loop did not exit in time when the test ran. That would explain why the local build succeeded while the GitHub build failed (timed out).
The next morning I checked, and a break really was missing!
Another two days later, I received a comment:
Also, I don't see where expireCache is used inside doNotify.
Emm, reading this, I felt the reviewer had not fully understood the code, so I replied:
expireCache mark which service may be down and call doNotity to fetch latest data from redis
Finally, a few days later, the PR was merged.
four
Here are a few things I learned from this:
- Writing articles has many benefits
- When contributing code to the community, write in English, cover the change with unit tests, and think things through
- The subconscious is really powerful
The link to this PR:
https://github.com/apache/dubbo/pull/7929
Search for and follow the WeChat official account “bug catching master” for back-end technology sharing: architecture design, performance optimization, source code reading, troubleshooting, and hands-on practice.