1. What are the difficulties in troubleshooting online problems?

  1. We’ve learned a lot of theory about troubleshooting online problems, but never had a real chance to apply it?
  2. A problem appears online only occasionally, there’s no time to preserve the scene, and afterwards it is very hard to reproduce?
  3. Even when the problem reproduces reliably, the analysis is hard to start: without hands-on experience it is difficult to apply the theory flexibly?

2. What are the advantages of this article?

  1. The analysis is based directly on the open source project dubbo-admin; there is no complex business logic, so anyone can follow the source code
  2. The problem is reproducible (we can simulate the number of services manually)
  3. The dubbo-admin source code is simple and many companies use it, so studying it can help you solve practical problems

Memory leak problem

The new dubbo-admin is compatible with Dubbo 2.6 and also supports the new Dubbo 2.7 features. With the Dubbo 2.7 metadata center we can do things like service testing, which is already supported in the current version of dubbo-admin.

Conclusion

To state the conclusion first: the code that causes the memory leak is in org.apache.dubbo.admin.service.RegistryServerSync#notify, and the core part is this snippet:

// Reuse the id if this full URL string has been seen before
if (URL_IDS_MAPPER.containsKey(url.toFullString())) {
    ids.put(URL_IDS_MAPPER.get(url.toFullString()), url);
} else {
    // Otherwise generate a 16-bit MD5 id and remember the fullString -> id mapping.
    // Note: entries are only ever added to URL_IDS_MAPPER, never removed.
    String md5 = CoderUtil.MD5_16bit(url.toFullString());
    ids.put(md5, url);
    URL_IDS_MAPPER.putIfAbsent(url.toFullString(), md5);
}

To put it simply, URL_IDS_MAPPER keeps growing, taking up more and more memory and eventually causing non-stop full GC.
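To make this concrete, here is a minimal, self-contained sketch of the growth pattern (it is not the dubbo-admin code itself). It assumes, as Dubbo registration URLs do in practice, that the full URL string carries a timestamp parameter that changes on every re-registration, and it uses a simple stand-in for CoderUtil.MD5_16bit:

import java.security.MessageDigest;
import java.util.concurrent.ConcurrentHashMap;

// Minimal simulation of the leak: every re-registration produces a slightly different
// fullString (the timestamp parameter changes), so a new entry is added to
// URL_IDS_MAPPER and none of the old entries are ever removed.
public class UrlIdsMapperLeakDemo {

    private static final ConcurrentHashMap<String, String> URL_IDS_MAPPER = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        String base = "dubbo://10.0.0.1:20880/com.example.DemoService?side=provider&timestamp=";
        // Simulate one provider going online -> offline -> online 100 times
        for (int i = 0; i < 100; i++) {
            String fullString = base + (System.currentTimeMillis() + i);
            URL_IDS_MAPPER.putIfAbsent(fullString, md5_16bit(fullString));
        }
        // One logical service instance, but 100 cached keys
        System.out.println("URL_IDS_MAPPER size: " + URL_IDS_MAPPER.size());
    }

    // Stand-in for CoderUtil.MD5_16bit: the middle 16 hex characters of the MD5 digest
    private static String md5_16bit(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes("UTF-8"));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.substring(8, 24);
    }
}

One logical service instance ends up occupying 100 cache keys here; with a real registry and constant churn this is how URL_IDS_MAPPER can grow to the million-plus entries seen in the MAT analysis below.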

Analysis

1. When is this method executed? Whenever a node under /dubbo changes

2. The intention of URL_IDS_MAPPER is to maintain the mapping between an MD5 id and the full URL, but because its growth is never controlled, its size keeps increasing; arguably URL_IDS_MAPPER is not needed at all

3. What does "never controlled" mean? For example, every time a provider or consumer goes online -> offline -> online again, its URL's fullString changes slightly, so even though there is still only one instance of the service, a new MD5 entry is generated each time. If this happens frequently, URL_IDS_MAPPER grows larger and larger (the sketch above simulates exactly this)

4. Besides URL_IDS_MAPPER, this method also maintains a registryCache. Why doesn't registryCache leak memory? Because the same method also cleans registryCache up (a simplified sketch of this contrast follows the troubleshooting steps below)

5. How was this problem discovered? If the number of services is small and services rarely change, the problem may go unnoticed. But with a large number of services and frequent online/offline churn the problem is obvious: the application takes up more and more memory and spends more and more time in full GC

6. How to troubleshoot?

  1. top: find the problematic Java process
  2. top -p pid -H: find the busiest threads within that process
  3. jstack pid | grep 0xxx: locate those threads in the thread dump by their hex id
  4. Check the GC status
  5. Take a memory dump
  6. MAT analysis: look at the dominating objects and find that URL_IDS_MAPPER contains 1 million elements
  7. Analyze the code
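As a side note to point 4 above, here is a heavily simplified contrast between the two caches (illustrative only, not the real RegistryServerSync code; Dubbo signals an offline service by notifying with an empty:// URL, which is modeled here as a boolean flag, and the MD5 id is replaced with a hash stand-in):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified contrast: registryCache entries are removed when a service goes offline,
// while URL_IDS_MAPPER only ever gains entries.
public class NotifyContrastSketch {

    // category -> (serviceKey -> full URL string), a flattened stand-in for registryCache
    private static final Map<String, Map<String, String>> registryCache = new ConcurrentHashMap<>();
    private static final Map<String, String> URL_IDS_MAPPER = new ConcurrentHashMap<>();

    static void onNotify(String category, String serviceKey, String fullUrl, boolean offline) {
        if (offline) {
            // Offline notification: clean up registryCache, so it does not grow without bound
            Map<String, String> services = registryCache.get(category);
            if (services != null) {
                services.remove(serviceKey);
            }
        } else {
            registryCache.computeIfAbsent(category, c -> new ConcurrentHashMap<>()).put(serviceKey, fullUrl);
            // ...but nothing is ever removed from URL_IDS_MAPPER
            URL_IDS_MAPPER.putIfAbsent(fullUrl, Integer.toHexString(fullUrl.hashCode()));
        }
    }
}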

Frequent YGC

Background

The memory leak was solved, but a new problem appeared: YGC was far too frequent. This showed up in two ways:

  1. Application startup: 300-500 YGCs occur
  2. Application runtime: irregular bursts of frequent YGC, with stable periods of no GC followed by sudden periods of frequent GC

Conclusion

1. dubbo-admin is essentially a tool for querying and modifying registry information. It uses ZooKeeper's watch mechanism to detect changes in the registry in a timely manner, but instead of the native ZooKeeper API it uses the ZookeeperRegistry and NotifyListener provided by Dubbo to listen on the nodes under /dubbo (a sketch of this pattern follows this list).

2. Dubbo has a service information caching mechanism. Its purpose is fault tolerance: if the registry goes down, the cached service instances can still be called, although newly added instances cannot be discovered. The cache file has to be updated whenever a service node changes, i.e. in the AbstractRegistry#notify method. This is what causes the frequent YGC.

3. Why don't normal Dubbo applications have this problem while dubbo-admin does? A normal application only listens on /dubbo/<interface name> for the interfaces it actually uses, whereas dubbo-admin listens on the /dubbo node itself, which is equivalent to subscribing to every service.
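As mentioned in point 1, here is a rough sketch of the subscription pattern (it assumes Dubbo 2.7's Registry and NotifyListener API; the ZooKeeper address is a placeholder and the parameters on the subscribe URL are only an approximation of dubbo-admin's real SUBSCRIBE URL):

import java.util.List;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.common.extension.ExtensionLoader;
import org.apache.dubbo.registry.NotifyListener;
import org.apache.dubbo.registry.Registry;
import org.apache.dubbo.registry.RegistryFactory;

// Instead of the raw ZooKeeper client API, obtain a Registry from Dubbo and subscribe
// a NotifyListener with interface=*, so every change under /dubbo ends up in notify().
public class AllServicesSubscriber implements NotifyListener {

    @Override
    public void notify(List<URL> urls) {
        // Called whenever providers/consumers/routers/configurators change
        System.out.println("registry change, " + urls.size() + " urls");
    }

    public static void main(String[] args) {
        RegistryFactory registryFactory =
                ExtensionLoader.getExtensionLoader(RegistryFactory.class).getAdaptiveExtension();
        Registry registry = registryFactory.getRegistry(URL.valueOf("zookeeper://127.0.0.1:2181"));

        // interface=* roughly means "all services", which is what dubbo-admin subscribes to
        URL subscribeUrl = URL.valueOf("admin://127.0.0.1?interface=*&group=*&version=*"
                + "&category=providers,consumers,routers,configurators&check=false");
        registry.subscribe(subscribeUrl, new AllServicesSubscriber());
    }
}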

The analysis process

If YGC is too frequent, you may not find what you are looking for in a memory dump, because the offending objects are collected too quickly

  1. Check the GC status: full GC is stable, but YGC is frequent
  2. Check the heap: the old generation is barely used, while the young generation fills up quickly and triggers YGC
  3. Try enlarging the young generation: YGC becomes less frequent but is still frequent, so the problem is not the heap sizing
  4. Take several memory dumps, but they contain little, because the objects are allocated and collected quickly
  5. Use JVisualVM, its GC plugin and its profiling features to see how much memory each thread is allocating
  6. The DubboSaveRegistryCache-thread-1 thread allocates a large number of objects in a short time; see AbstractRegistry for details (a simplified sketch follows this list)
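Item 6 refers to the cache-file mechanism in AbstractRegistry. Below is a simplified simulation of it (the class and field names are made up for illustration; it is not the real Dubbo code). The point is that every notify rewrites the whole cache, which creates a lot of short-lived objects when tens of thousands of services are being watched:

import java.io.File;
import java.io.FileOutputStream;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Simplified simulation of the registry cache file: on every notify, all known service
// URLs are joined into strings, copied into a Properties object and rewritten to disk.
// The temporary strings and Properties entries are exactly the short-lived garbage
// that keeps filling the young generation.
public class RegistryCacheSaveDemo {

    private final Map<String, List<String>> notified = new ConcurrentHashMap<>();
    private final File cacheFile = new File(System.getProperty("java.io.tmpdir"), "dubbo-registry-demo.cache");

    public void notify(String serviceKey, List<String> urls) {
        notified.put(serviceKey, urls);
        saveProperties(); // the whole cache is rewritten on every change
    }

    private void saveProperties() {
        Properties properties = new Properties();
        for (Map.Entry<String, List<String>> entry : notified.entrySet()) {
            // Re-creates the joined URL string for every service on every save
            properties.setProperty(entry.getKey(), String.join(" ", entry.getValue()));
        }
        try (FileOutputStream out = new FileOutputStream(cacheFile)) {
            properties.store(out, "demo registry cache");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}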

1. GC at the beginning (1200M young generation): frequent GC

2. GC after optimization (1200M young generation): the young generation grows slowly; with a large young generation GC is less frequent, but each GC takes longer

3. GC after optimization (512M young generation): the young generation is smaller, GC is slightly more frequent, but each GC takes less time

4. GC after optimization (512M young generation): manually simulating requests, the GC frequency speeds up, which matches expectations

5. GC after optimization (512M young generation): at this point everyone had presumably gone to dinner and no service information was being changed
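For reference, the young generation sizes compared above would be set with the standard -Xmn flag; the jar name and the remaining JVM options below are placeholders, since the article does not list the full command line:

# 1200M vs 512M young generation, as compared in the observations above
java -Xmn1200m ... -jar dubbo-admin-xxx.jar
java -Xmn512m ... -jar dubbo-admin-xxx.jar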

Solution: this caching mechanism exists purely for fault tolerance, and dubbo-admin does not need it at all, so simply turn it off.