1. Background

As Figures 1 and 2 show, heap and PermGen usage is not high, yet the inquiry-Center2 machines perform a Full GC almost twice a day.

2. Problem investigation

 -Xms2g 
 -Xmx2g
 -Xmn448m
 -XX:SurvivorRatio=5
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:+CMSClassUnloadingEnabled
 -XX:+UseCMSCompactAtFullCollection
 -XX:CMSFullGCsBeforeCompaction=0
 -XX:+ExplicitGCInvokesConcurrent
 -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:MetaspaceSize=300M
 -XX:MaxMetaspaceSize=300M

The production JVM is configured with CMSInitiatingOccupancyFraction=75 and a 2 GB heap. Actual heap usage stays at around 570 MB, while the old generation is 1.5625 GB (2048 MB minus the 448 MB young generation), so old generation usage is well below 75% and falls short of the condition for triggering a CMS collection. In addition, heap usage does not change significantly before and after each Full GC.

After investigation and discussion, we suspected that the Full GC was caused by System.gc. We used Pinpoint to locate the time at which each Full GC was triggered and checked the GC log for the corresponding period, which confirmed that the Full GC was indeed caused by System.gc. We ruled out calls to System.gc in the project's own code and suspected that the NIO package might be calling it, but we found no useful logs and could not locate the specific caller.

Repeated observation of the GC logs showed that these Full GCs are periodic, with a period of 10 hours, as shown in Figure 3. A Full GC that fires every 10 hours is most likely triggered by an external component. Following this clue, we learned that certain versions of Tomcat, NIO, and CXF may cause similar periodic Full GCs. Among the related resources, one blog post (Blog.csdn.net/qq_35963057…) also describes a Full GC every 10 hours. It is therefore highly probable that the Full GC is caused by the Apache CXF package.
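The threshold reasoning above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the sizes from the startup flags and the roughly 570 MB usage quoted in the text; the class and method names are illustrative and not part of the project.

```java
// Sketch: check whether the observed heap usage could reach the CMS trigger.
// Sizes come from the startup flags; class and method names are illustrative.
class CmsThresholdCheck {
    static final long HEAP_MB = 2048;        // -Xms2g / -Xmx2g
    static final long YOUNG_MB = 448;        // -Xmn448m
    static final int OCCUPANCY_PERCENT = 75; // -XX:CMSInitiatingOccupancyFraction=75

    /** Old generation capacity: total heap minus young generation. */
    static long oldGenMb() {
        return HEAP_MB - YOUNG_MB;
    }

    /** Old generation occupancy at which CMS would start a cycle. */
    static long cmsTriggerMb() {
        return oldGenMb() * OCCUPANCY_PERCENT / 100;
    }

    public static void main(String[] args) {
        long observedHeapMb = 570; // approximate total heap usage reported in the text
        System.out.println("old gen = " + oldGenMb() + " MB (" + oldGenMb() / 1024.0 + " GB)");
        System.out.println("CMS trigger = " + cmsTriggerMb() + " MB");
        // Even if all 570 MB sat in the old generation, it would stay below the trigger.
        System.out.println("below trigger: " + (observedHeapMb < cmsTriggerMb()));
    }
}
```

Since the whole heap holds less than the trigger value, occupancy alone cannot explain the collections, which is why the investigation turned to explicit System.gc calls.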

3. Positioning analysis

Following the analysis above, we checked the application's Maven dependencies, shown in Figure 4. Sure enough, the apache-CXF package is introduced indirectly through the application's dependency on the company's second-party packages.

JDKBugHacks in apache-CXF uses reflection to call the requestLatency method of sun.misc.GC, as shown in Figure 5, and calling this method creates a daemon thread, as shown in Figure 6. Figure 7 shows that the Full GC is caused by the System.gc call in the daemon thread's run method. From GC.lock.wait we can see that var1 is the 36000000 ms passed in by the reflective call in Figure 5, i.e. 10 hours, and that var4 becomes 0 after each Full GC, so the lock wait time is 10 hours. This essentially confirms that the Full GC in the production inquiry service is caused by the reflective call in JDKBugHacks in the Apache CXF package.

The purpose of this GC thread is to prevent memory leaks in certain components (for example, off-heap memory later allocated by javax.management.remote.rmi.RMIConnectorServer): calling requestLatency in the sun.misc.GC class creates a GC daemon thread with a 10-hour latency that periodically asks the JVM to collect garbage. The code first checks whether another component has already created a GC thread through requestLatency; if so, it skips this step, otherwise it creates a new daemon thread. The class name JDKBugHacks also suggests that it exists to work around known problems in the JDK or in components, but this application does not need the periodic Full GC.

Because the package is introduced by the company's second-party packages, the project never calls the relevant classes explicitly, so we still had to find where JDKBugHacks is invoked. It turned out to be called during Spring framework initialization, as shown in Figures 8 and 9 (several intermediate call nodes are omitted; you can step through them in a debugger if you are interested).
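To make the 10-hour period easier to see, here is a simplified model of the wait-then-collect loop described above. It is a sketch and not the JDK source: the class and method names are hypothetical, var4 is assumed to represent the time elapsed since the last collection, and the GC action is injected so the example runs without forcing real collections.

```java
// Simplified model of the daemon loop described in the text (not the JDK source).
// Names are illustrative; the GC action is injected instead of calling System.gc().
class LatencyGcLoopModel {
    static final long TEN_HOURS_MS = 36_000_000L; // value passed by the reflective call

    private final long latencyMs;    // corresponds to var1
    private final Runnable gcAction; // the real daemon calls System.gc() here
    private int collections = 0;

    LatencyGcLoopModel(long latencyMs, Runnable gcAction) {
        this.latencyMs = latencyMs;
        this.gcAction = gcAction;
    }

    /**
     * One loop iteration. ageMs models var4, assumed to be the time since the
     * last full collection. Returns how long the daemon would wait on the lock.
     */
    long step(long ageMs) {
        if (ageMs >= latencyMs) {
            gcAction.run(); // latency expired: trigger a collection
            collections++;
            ageMs = 0;      // var4 becomes 0 after the collection
        }
        return latencyMs - ageMs;
    }

    int collections() {
        return collections;
    }

    public static void main(String[] args) {
        LatencyGcLoopModel model = new LatencyGcLoopModel(
                TEN_HOURS_MS, () -> System.out.println("System.gc() would run here"));
        long waitMs = model.step(TEN_HOURS_MS);
        System.out.println("next wait = " + waitMs / 3_600_000.0 + " hours");
    }
}
```

Because the age is reset to 0 after every collection, the next wait is always the full latency, which matches the fixed 10-hour spacing seen in the GC logs.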

4. Solutions

1. As Figure 5 shows, the call is guarded by a condition that can be controlled. To keep the logic inside the if from running, add -Dorg.apache.cxf.JDKBugHacks.gcRequestLatency=true to the project's JVM startup parameters; the condition in JDKBugHacks then evaluates to false and this section of logic is skipped.
2. The package is introduced by margin, and the margin interfaces that are actually called do not use it, so the apache-CXF package can be excluded, as shown in Figure 10.

We adopted the second solution. After the version was released at 20:00 on the 19th, no Full GC occurred in the following three days.
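The first option relies on a boolean system property acting as a skip switch. The sketch below shows how such a guard typically behaves; only the property key comes from the text, while the class and the helper name hackEnabled are hypothetical and not taken from the CXF source.

```java
// Sketch of a boolean system-property guard like the one described in solution 1.
// Only the property key comes from the text; the helper name is hypothetical.
class GcLatencyHackGuard {
    static final String KEY = "org.apache.cxf.JDKBugHacks.gcRequestLatency";

    /** The hack runs only when the skip property is absent or not "true". */
    static boolean hackEnabled() {
        boolean skip = Boolean.parseBoolean(System.getProperty(KEY, "false"));
        return !skip;
    }

    public static void main(String[] args) {
        // Try: java -Dorg.apache.cxf.JDKBugHacks.gcRequestLatency=true GcLatencyHackGuard
        System.out.println("periodic GC hack enabled: " + hackEnabled());
    }
}
```

Setting the property only disables this one hack, whereas excluding the package removes the class altogether; the second approach is viable here because, as noted above, the margin interfaces in use do not need the package.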

5. Other

5.1 During the investigation we came across some good techniques. They were not used this time, but we record them here for future use. To print the stack of whatever calls System.gc, there are two options: 1) lightTrace, which prints the call stack whenever System.gc is called (GitHub: github.com/xpbob/light…); 2) writing code that traces System.gc calls with btrace (GitHub: github.com/btraceio/bt…).

5.2 This investigation suggests that the company should standardize how second-party packages are provided, and providers should keep them as clean as possible. Besides problems such as this Full GC, dependencies dragged in by second-party packages often prevent projects from starting during development. Such problems are usually well hidden and hard to locate quickly and accurately.