Under high concurrency, GC problems in Java programs are a typical class of problems that can be further amplified. Whether it’s “GC too fast” or “GC takes too long”, The Stop The World problem can easily lead to service timeouts and performance problems during GC.
The advertising system in charge of our team received a large amount of C-terminal traffic, and the requests during the flat peak period basically reached thousands of QPS. We also encountered many online problems related to GC in the past.
In this Article in May, I presented a case of Full GC being too frequent and provided a systematic summary of the JVM’s heap memory structure and GC principles.
In this article, I will share a more difficult online case of Young GC taking too long, and at the same time, I will sort out the relevant knowledge points of YGC, hoping to make you learn something. The content is divided into the following two parts:
-
Let’s start with a case where YGC takes too long
-
Summary of relevant knowledge points of YGC
01 Start with a case where YGC takes too long
In April this year, after the launch of the new version of our advertising service, we received a large number of service timeout alarms. As can be seen from the monitoring chart below, the timeout volume suddenly increased greatly, and even reached thousands of interface timeout times within 1 minute. The following describes how to troubleshoot the fault.
1. Check the monitoring
After receiving the alarm, we checked the monitoring system immediately and found that YoungGC took too long. Our program went online around 21:50. It can be seen from the following figure that: before going online, YGC was basically completed within tens of milliseconds, but after going online, YGC took significantly longer, even up to more than 3 seconds.
Since The program stops The World during YGC, and our upstream system sets The service timeout time to hundreds of milliseconds, it is inferred that The service timeout is caused by The long YGC time.
Following the normal troubleshooting process for GC problems, we immediately removed a node and then dumped the heap memory file for on-site use with the following command.
jmap -dump:format=b,file=heap pid
Finally, the online service was rolled back. After the rollback, the service was immediately restored to normal. The following was a 1-day troubleshooting and repair process.
2. Confirm the JVM configuration
With the following command, we checked the JVM parameters again
ps aux | grep “applicationName=adsearch”
-Xms4g -Xmx4g -Xmn2g -Xss1024K
-XX:ParallelGCThreads=5
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+UseCMSCompactAtFullCollection
-XX:CMSInitiatingOccupancyFraction=80
You can see that the heap memory is 4G, and the new generation and the old generation are both 2G, and the new generation uses the ParNew collector.
After running the jmap-heap PID command, it can be found that the Eden area of the Cenozoic generation is 1.6G, and the S0 and S1 areas are 0.2g.
The launch did not change any parameters related to the JVM, and the volume of requests for our service was basically the same as usual. So guess: this problem is most likely related to the online code.
3. Check the code
Going back to the principle of YGC to think about this problem, an YGC process mainly includes the following two steps:
1. Scan objects from GC Root and annotate surviving objects
2. Copy the live object to S1 or promote it to Old
As can be seen from the monitoring chart below, under normal circumstances, the usage rate of Survivor zone remains at a very low level (about 30M), but after it goes online, the usage rate of Survivor zone begins to fluctuate, reaching 0.2G at the most. Moreover, YGC time is positively correlated with Survivor zone usage. Therefore, we hypothesized that the increasing number of objects with long life cycles would lead to the time-consuming annotation and replication processes.
Back to the overall performance of the service: upstream traffic does not change significantly, and the response time of the core interface is basically within 200ms under normal circumstances. The frequency of YGC is about once every 8 seconds.
Obviously, local variables can be collected immediately after each YGC. So why do so many objects survive after YGC?
We further target the suspect object: a program’s global variable or a class static variable. However, diff reviewed the code that went live this time, and we didn’t find any introduction of such variables in the code.
4. Analyze heap memory files in dump
After the code check failed, we started to look for clues from the heap memory files. After importing the dump heap files from step 1 using MAT, we saw all the large objects in the current heap through the Dominator Tree view.
The NewOldMappingService class takes up a lot of space. This class is located in the client package of a third party and provided by the commodity team of our company, which is used to realize the conversion of the old and new categories (the commodity team is reforming the category system recently, and in order to be compatible with the old business, new and old category mapping is needed).
A closer look at the code reveals that there are a number of static HashMaps in this class that cache all the data needed to convert the old and new classes to reduce RPC calls and improve conversion performance.
I thought it was very close to the truth of the problem, but I found that all static variables of this class were initialized when the class was loaded. Although it would take up more than 100 M of memory, it would hardly add data after that. Also, this class has been available since March, and the version of the Client package has not changed.
After all of the above analysis, the static HashMap of this class will survive through multiple rounds of YGC and eventually be promoted to the old age, which should not be the reason why YGC lasts too long. So, for the time being, we’ve ruled it out.
5. Analyze the time of YGC processing Reference
The team has very little experience in troubleshooting YGC problems and does not know how to analyze them further. Basically all the cases available on the Internet were swept up and found that the reasons were concentrated in two categories:
1. It takes too long to annotate living objects: For example, Finalize method of Object class is overloaded, which results in too long time to annotate Final Reference. Alternatively, the String.intern method is used improperly, causing YGC to take too long to scan StringTable.
2. Excessive accumulation of long-period objects: for example, improper use of local cache accumulates too many viable objects; Or lock contention can become so severe that threads block and the lifetime of local variables becomes longer.
For class 1 problems, you can display the time it takes for GC to process Reference with -xx :+PrintReferenceGC. When this parameter is added, you can see that different types of reference processing take very short time, so this factor is eliminated.
6. Return to the long-period object for analysis
After that, we added various GC parameters to try to find clues, but nothing came of it. We seemed to run out of ideas. The combination of monitoring and analysis suggests that only long-lived objects can cause us this problem.
After several hours, a friend finally finds a second suspect in MAT’s heap memory.
As you can see from the screenshot above, the ConfigService class, which ranks third in the list of large objects, has 270 million objects in a single ArrayList variable, most of them identical elements.
The ConfigService class is in a third-party Apollo package, but the source code has been modified twice by the corporate architecture department. The problem is that line 11 adds elements to the List every time the getConfig method is called, and it is not reprocessed.
Our advertising service stores a large number of advertising policy configurations in Apollo, and most requests call the getConfig method of ConfigService to get the configurations, causing this problem by constantly adding new objects to the static variable Namespaces.
At this point, the whole problem was finally solved. This BUG was accidentally introduced by the architecture department during the customized development of Apollo Client package. It was obviously not tested carefully, and it was released to the central warehouse just one day before our launch. However, the version of the company’s basic component library was maintained by means of super-POM, without any business awareness.
7. Solutions
To quickly verify that YGC takes too long because of this problem, we replaced it directly with the old Apollo Client package on one of the servers, then restarted the service and observed that YGC returned to normal for about 20 minutes.
In the end, we notified the architecture department to fix the BUG and re-released super-POM, which solved the problem once and for all.
02 Summary of relevant knowledge points of YGC
As you can see from the above case, YGC problems can be difficult to troubleshoot. Compared to FGC or OOM, YGC logs are very simple and only know the changes and time of the new generation memory, and the dumped heap memory must be carefully checked.
In addition, if the YGC process is not clear, it is more difficult to troubleshoot. Here, I will sort out the knowledge points related to YGC, so that you can have a more comprehensive understanding of YGC.
1. Five questions to re-understand the new generation
When YGC is carried out in Cenozoic, it is necessary to make clear the division of Cenozoic reactor structure. The new generation is divided into Eden area and two Survivor areas. Eden:from:to = 8:1:1 (the ratio can be set by parameter — XX:SurvivorRatio), which is the most basic understanding.
Why is there a new generation?
If all objects are in one region regardless of generation, each GC will need to scan the whole heap, which is an efficiency problem. After generation, the collection frequency can be controlled separately and different collection algorithms can be adopted to ensure the global optimal GC performance.
Why did the new generation adopt copying algorithms?
The new generation of objects die quickly, and about 90% of new objects can be quickly recycled, with cheap replication algorithms and free space from debris. Although the mark collation algorithm can also ensure that there is no fragment, due to the large number of objects to be cleaned in the new generation, it requires a large number of moving operations to collate the surviving objects to be cleaned, which is higher than the time complexity of the replication algorithm.
Why does the new generation need two Survivor zones?
In order to save space, if the traditional replication algorithm is adopted and there is only one Survivor zone, the size of Survivor zone should be equal to the size of Eden zone. In this case, the space consumption is 8 * 2, and two survivors can keep new objects always created in Eden zone. The surviving object can be transferred between Survivor, and the space consumption is 8+1+1, obviously the latter has higher space utilization.
What is the actual available space for the New generation?
After YGC, there is always a Survivor zone that is free, so the free memory space of the new generation is 90%. Don’t be surprised if you look at the new generation space in the YGC log or through the jmap-heap PID command and find that capacity is only 90%.
How does the Eden region speed up memory allocation?
The HotSpot VIRTUAL machine uses two techniques to speed up memory allocation. Bump -the-pointer and TLAB (Thread Local Allocation Buffers), respectively.
Since the Eden region is continuous, bump-the-pointer only needs to check if there is enough memory behind the last object when creating an object, thus speeding up memory allocation.
TLAB technology is for multi-threading. Each thread is allocated a region in Eden to reduce lock conflicts during memory allocation, speed up memory allocation and improve throughput.
2. Four types of recyclers for the new generation
SerialGC, one of the oldest, executes single-threaded and is suitable for single-CPU scenarios.
ParNew (parallel collector), the serial collector multi-threaded, suitable for multi-CPU scenarios, need to be used with the old CMS collector.
ParallelGC, which differs from ParNew in that it focuses on throughput, sets the desired pause times, and automatically adjusts the heap size and other parameters as it works.
G1 (garage-first collector), the default collector in JDK 9 and later, takes care of both old and new generations. It splits the heap into a series of regions and does not require contiguous chunks of memory.
All of The above collectors use a copy algorithm and are exclusive, stopping The World during execution.
3. Trigger time of YGC
YGC is triggered when there is insufficient space in Eden. See the detailed process in conjunction with the memory allocation of new generation objects:
1. New objects will first try to allocate on the stack. If not, they will try to allocate in TLAB.
2. If there is no suitable space in Eden area, YGC will be triggered.
3. During YGC, the surviving objects in Eden area and From Survivor area are processed. If the conditions of dynamic age judgment are met or the space in To Survivor area is insufficient, the old era is directly entered; if the space in the old era is insufficient, promotion failed occurs and the recycling of the old era is triggered. Otherwise, the surviving object is copied To the To Survivor zone.
4. At this time, the remaining objects in Eden zone and From Survivor zone are garbage objects, which can be directly erased and recycled.
In addition, if CMS collector is used in the old era, in order to reduce the time consuming of CMS Remark stage, YGC may also be triggered, which is not expanded here.
4. YGC execution process
The replication algorithm adopted by YGC is mainly divided into the following two steps:
1. Find GC Roots and copy the referenced objects to S1
2. Recursively traverse the object in step 1, copy the referenced object to S1 or promote it to Old
The entire process above requires the suspension of the business thread (STW), but the new generation of recyclers such as ParNew can execute in parallel with multiple threads to improve processing efficiency.
YGC searches down from the GC Root (the starting point of the reachable object) through the reachabness analysis algorithm to mark the currently alive objects, and the remaining unmarked objects are the objects that need to be reclaimed.
Objects that can be used as GC Root for YGC include the following:
1. Objects referenced in the virtual machine stack
2. Objects referenced by static properties and constants in the method area
Objects referenced in the local method stack
4. Objects held by Synchronized locks
Record the SystemDictionary of the currently loaded class
6. Record StringTable references to string constants
7. Objects that are referenced across generations exist
Objects in the same CardTable as GC Root
Among them, 1-3 is easy to think of, while 4-8 is easy to ignore, but it is very likely to be the clue entry for analyzing YGC problems.
Also note that in the case of cross-generation references in the figure below, the older object A must also be part of the GC Root, but there is definitely an efficiency issue if the older object is scanned every time YGC is performed. In HotSpot JVM, Card tables are introduced to speed up the marking of cross-generation references.
Simply understood, Card Table is an idea of space for time. Since objects with cross-generation references account for less than 1%, the heap space can be divided into 512-byte Card pages. If an object in a Card page has cross-generation references, 1 byte can be used to mark the Card page as dirty. The card page state is further maintained through write barrier technology.
After traversing GC Roots, you can find the first surviving objects and copy them into S1. What follows is a recursive process of finding and copying living objects.
In S1 area, two pointer variables are introduced to facilitate the maintenance of the memory area: _SAVed_mark_word and _TOP, where _saved_mark_word represents the location of the current traversal object, and _top represents the location of the current allocated memory. Obviously, the objects between _saved_mark_word and _TOP are copied but not scanned objects.
As shown in the figure above, every time an object is scanned, _saved_mark_word will move forward. During this period, if there is a new object, it will also be copied to S1, and _TOP will also move forward until _saved_mark_word catches up with _TOP, indicating that all objects in S1 have been traversed.
One detail to note is that the target space of the copied object is not necessarily S1, but may also be older. If an object’s age (the number of YGC experiences) meets the dynamic age criteria, it is promoted directly to the old age. The age of the object is stored in the Mark Word data structure of the Java object header.
Write in the last
This paper introduces the relevant knowledge of YGC in detail through online case analysis and principle explanation. From the perspective of YGC combat, let’s briefly summarize:
1. Firstly, the implementation principle of YGC should be clear, such as heap memory structure of young generation, memory allocation mechanism of Eden area, GC Roots scanning, object copy process, etc.
2. The core steps of YGC are annotation and replication. Most YGC problems are concentrated in these two steps.
About the author: 985 master, former Engineer of Amazon, now 58-year-old technical director
Welcome to pay attention to my personal public number: IT career advancement, wonderful original constantly!