Preface

Release day came around as it does every week. I went through the familiar deployment routine and also opened the service monitoring dashboard, so that others could see I am someone who takes the business seriously. Click, click, click, click, click… Dubbo request timeouts: 400+ per minute, about 10 times the previous figure, reaching the order of hundreds.

It happened late on a Tuesday night. The requirement being released at that time was expected to lengthen the I/O time of a single thread in the service. During the gray (canary) release, I noticed that the number of Dubbo request timeouts reported by the downstream service had increased, so I immediately checked the monitoring of the underlying storage and found that the TP999 latency of the storage service was completely normal. Years of experience told me that something must be wrong with the service's GC.

Then I looked at the service's GC, and sure enough, the time taken by a single GC had increased considerably.

I immediately contacted the owners of the downstream service, who told me that this level of timeouts was within the acceptable range, and I was finally relieved. I then carried on with other work. I pretended to be calm, but inside I was panicking.

At this point you might say: just scale out and add machines. But am I that quick to compromise? Isn't this exactly the moment to test my fundamentals? I may spend most of my days writing CRUD, but what programmer doesn't like to dig into technology? This is the romance of all programmers! After finishing the work at hand, the GC tuning began.

The process
  1. Print the GC log. Add the -XX:+PrintGCDetails parameter to the JVM startup parameters to print the time taken by each GC.
  2. Analyze the logs to find out which GC phases are causing the increase in GC time.
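Before digging into the log itself, a quick first look at GC cost is also possible from inside the process. The following is only a minimal sketch using the standard java.lang.management API; the class name GcSummary is my own, and the collector names it prints depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes the collectors of the current JVM.
final class GcSummary {

    /** One line per collector: name, number of collections, accumulated time. */
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            lines.add(String.format("%s: count=%d, time=%d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        summarize().forEach(System.out::println);
    }
}
```

Sampling these counters twice and dividing the time difference by the count difference gives a rough average pause per collection, but it does not replace the per-phase breakdown in the GC log.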

GC log at that time:

[GC pause (G1 Evacuation Pause) (young), 0.0962103 secs]
   [Parallel Time: 23.3ms, GC Workers: 4]
      [GC Worker Start (ms): Min: …, Avg: 146441.3, Max: 146441.3, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 1.5, Avg: 1.8, Max: 2.4, Diff: 1.0, Sum: 7.2]
      [Update RS (ms): Min: 1.0, Avg: 1.5, Max: 1.7, Diff: 0.6, Sum: 5.9]
         [Processed Buffers: Min: …, Avg: 34.8, Max: 41, Diff: 14, Sum: 139]
      [Scan RS (ms): Min: 0.3, Avg: 0.3, Max: 0.3, Diff: 0.0, Sum: 1.3]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.5, Max: 1.1, Diff: 1.1, Sum: 1.9]
      [Object Copy (ms): Min: 18.3, Avg: 19.0, Max: 19.6, Diff: 1.2, Sum: 76.1]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.8, Max: 3, Diff: 2, Sum: 7]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
      [GC Worker Total (ms): Min: 23.1, Avg: 23.2, Max: 23.2, Diff: 0.1, Sum: 92.7]
      [GC Worker End (ms): Min: 146464.4, Avg: 146464.4, Max: 146464.5, Diff: 0.1]
      [Ref Proc: 69.9 ms]
      [Ref Enq: 0.6 ms]
      [Redirty Cards: 0.1ms]
      [Humongous Reclaim: 0.1ms]
      [Free CSet: 0.1ms]
   [… Survivors: 88.0M->92.0M Heap: 5044.1M(8192.0M)->224.1M(8192.0M)]
 [Times: user=0.17 sys=0.00, real=0.09 secs]
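To make step 2 of the process concrete, here is a rough sketch that pulls the single-value phase timings out of a log like the one above and reports the slowest one. The class name GcPhaseTimes and the regular expression are my own assumptions about the log layout (entries shaped like "[Name: 1.2 ms]"); per-worker lines with Min/Avg/Max are deliberately ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "[Phase: N ms]" entries from a G1 log snippet.
final class GcPhaseTimes {

    // Matches e.g. "[Ref Proc: 69.9 ms]" and "[Parallel Time: 23.3ms, ...".
    private static final Pattern PHASE =
            Pattern.compile("\\[([A-Za-z ]+): (\\d+(?:\\.\\d+)?) ?ms[,\\]]");

    /** Returns phase name -> milliseconds, in the order they appear in the log. */
    static Map<String, Double> parse(String log) {
        Map<String, Double> phases = new LinkedHashMap<>();
        Matcher m = PHASE.matcher(log);
        while (m.find()) {
            phases.put(m.group(1).trim(), Double.parseDouble(m.group(2)));
        }
        return phases;
    }

    /** Returns the name of the phase with the largest time, or null if none was found. */
    static String slowest(Map<String, Double> phases) {
        String name = null;
        double max = -1.0;
        for (Map.Entry<String, Double> e : phases.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                name = e.getKey();
            }
        }
        return name;
    }

    public static void main(String[] args) {
        // Values copied from the log in this article.
        String log = "[Parallel Time: 23.3ms, GC Workers: 4] [Ref Proc: 69.9 ms] "
                + "[Ref Enq: 0.6 ms] [Redirty Cards: 0.1ms] [Free CSet: 0.1ms]";
        Map<String, Double> phases = parse(log);
        String worst = slowest(phases);
        System.out.println(worst + " = " + phases.get(worst) + " ms");
    }
}
```

On the log from this article the sketch points at Ref Proc, which is far larger than the parallel evacuation time.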

At that time, the GC log contained essentially only young GC entries, so the longer GC time was not caused by old generation collection in Mixed GC. In the log above, Ref Proc alone took 69.9 ms of the 96.2 ms pause. So I adjusted a few parameters, -XX:+ParallelRefProcEnabled, -XX:ConcGCThreads and -XX:G1HeapRegionSize, and in one smooth motion kicked off the release process. Sure enough, everything went as I expected.

Wait, is that it? Of course not. The purpose of this article is to show how to tune GC for a high-concurrency service. I dare say that even many veterans who have worked for years at big companies have only a passing knowledge of GC parameter tuning, so mastering this part can easily make you stand out in interviews and at work! I will go through the JVM's GC step by step in the following sections.

Common collector comparison

Most enterprise development today is still based on JDK 8, and the main collector in use is G1. Here is a simple comparison between CMS and G1 to highlight how good the G1 collector is. Some small companies may still use the Parallel collector, but Parallel is a rather featureless collector that cannot compare with G1. The comparison here is brief and will be explained in more detail later, but it highlights why most companies choose G1.

G1 collection phases
  1. Initial mark
  2. Concurrent marking
  3. Final mark (parallel)
  4. Filtered collection (parallel)
CMS collection phases
  1. Initial mark
  2. Concurrent marking
  3. Remark
  4. Concurrent sweep
Similarities between CMS and G1
  1. The initial mark requires a pause in both collectors: it cannot run alongside user threads and needs to Stop The World. It marks only the objects directly reachable from GC Roots (the concept of GC Roots will be explained in a later JVM series and is not covered in this article).
  2. Concurrent marking runs alongside user threads, so no pause is required. It is mainly the process of analyzing object reachability along the reference chains from GC Roots.
Differences between CMS and G1
  1. The concurrent sweep of CMS runs together with user threads and needs no pause. However, garbage produced after marking cannot be collected in that cycle, so the collection is incomplete. Imagine cleaning your house while someone keeps throwing trash around: you can never get it fully clean. G1's filtered collection does suspend user threads, but it collects with multiple GC threads in parallel, so it is fast and efficient.
  2. CMS cannot collect the young generation, only the old generation, so it has to be paired with another collector such as ParNew or Serial. G1, on the other hand, can collect both young and old regions and manages the entire Java heap on its own. In addition, G1's collection algorithm is mark-compact overall (with copying in part, because the young generation uses the more efficient copying algorithm), so it does not cause memory fragmentation. CMS, by contrast, uses plain mark-sweep; in some high-QPS scenarios, memory fragmentation becomes so severe that it can easily trigger a Full GC!
  3. The Java heap has a different memory layout under the two collectors. With CMS, the young generation and the old generation are physically separate (two different contiguous memory areas). With G1, the whole heap is divided into many regions; young and old regions are interleaved in memory, and the generations are only a logical classification. The purpose of this division is to avoid scanning the entire heap for every reachability analysis. For example, each region in a Collection Set has a Remembered Set; by maintaining these Remembered Sets, G1 can select the most cost-effective regions to collect, which improves collection efficiency.

Therefore, from a business perspective G1 has clear advantages: generational collection, no memory fragmentation, complete collection in each cycle, Full GCs that are hard to trigger, concurrent marking, and a controllable pause time (which regions are worth collecting is decided from the pause time target and the space each region occupies).

G1 Related concepts

  • Region: the smallest unit of heap memory managed by G1. The heap is divided into regions according to RegionSize, and objects in the heap are distributed across these regions. If an object is larger than RegionSize, it is stored across several regions that are contiguous in memory.
  • Young regions and Old regions. As the names imply, objects of the young generation are allocated in Young regions, and most objects of the old generation are allocated in Old regions. (Humongous objects, whose size exceeds half of RegionSize, are allocated in Humongous regions, which are also counted as old generation.) The young generation consists of Eden regions and Survivor regions. Eden regions are where all newly created objects are allocated. Survivor regions hold the young generation objects that survived the marking process. Some objects in Survivor regions may be promoted to the old generation later.
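The humongous rule described above is easy to check with a little arithmetic. This is just a sketch of the rule as stated in this article (an object is humongous when its size exceeds half of RegionSize); the class name HumongousCheck and the 4 MB region size are hypothetical values chosen for illustration.

```java
// Hypothetical helper: applies the "more than half a region" rule from the text.
final class HumongousCheck {

    /** True if the object is larger than half of one region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous regions needed to hold the object (ceiling division). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long region = 4 * mb; // assumed region size, not taken from the article
        long[] sizes = {1 * mb, 3 * mb, 9 * mb};
        for (long size : sizes) {
            System.out.println((size / mb) + " MB: humongous=" + isHumongous(size, region)
                    + ", regions=" + regionsNeeded(size, region));
        }
    }
}
```

With a 4 MB region, a 3 MB object already counts as humongous and a 9 MB object needs three contiguous regions, which is why a larger RegionSize reduces the number of humongous allocations.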

The whole GC process can be understood as follows. For a young GC, the reachability of all objects in the young generation is analyzed. Reachable objects are marked as live, unreachable ones as dead. Live objects are then copied to Survivor regions or Old regions, and the regions where dead objects reside are reclaimed.

For a Mixed GC, in addition to marking and copying the whole young generation as in the young GC above, some of the old regions in the old generation are also reclaimed. The algorithm is mark-compact, which frees up as much contiguous memory as possible.

G1 collection phase classification
  1. young-only phase: promotes objects from Young regions into the old generation, stores them in Old regions, and reclaims the Young regions.
  2. space-reclamation phase: commonly known as the Mixed GC phase. Not only Young regions are collected; Old regions worth collecting are selected and collected as well.

The two phases alternate. When the space used by the old generation, as a percentage of the total old generation space, exceeds a certain threshold (explained later), the old generation is reclaimed in addition to the young generation.

Specifically, when the threshold is exceeded, a young-only phase is triggered first, and the space-reclamation phase starts after its young GCs are finished. If old generation usage does not reach the threshold, the space-reclamation phase is not triggered, and the next young-only phase simply continues the cycle.
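The threshold decision just described can be written down as a toy model. This is only an illustration of the rule as this article states it (old generation usage as a percentage of old generation space, compared against a threshold); the class and enum names are made up, and the real G1 logic is more involved and differs between JDK versions.

```java
// Toy model of the phase decision described in the text, not G1's actual code.
final class G1PhaseModel {

    enum Next { YOUNG_ONLY_GC, YOUNG_GC_WITH_INITIAL_MARK }

    /**
     * Decides what the next young collection looks like.
     * thresholdPercent plays the role of InitiatingHeapOccupancyPercent.
     */
    static Next next(long oldUsedBytes, long oldCapacityBytes, int thresholdPercent) {
        if (oldCapacityBytes <= 0) {
            throw new IllegalArgumentException("oldCapacityBytes must be positive");
        }
        double occupancyPercent = 100.0 * oldUsedBytes / oldCapacityBytes;
        return occupancyPercent > thresholdPercent
                ? Next.YOUNG_GC_WITH_INITIAL_MARK   // marking starts, Mixed GCs may follow
                : Next.YOUNG_ONLY_GC;               // keep cycling in the young-only phase
    }

    public static void main(String[] args) {
        // 45 is the documented default of -XX:InitiatingHeapOccupancyPercent.
        System.out.println(next(40, 100, 45));
        System.out.println(next(50, 100, 45));
    }
}
```

Lowering the threshold in this model makes the initial mark happen at a lower occupancy, which is the intuition behind starting the marking earlier.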

Pauses during G1 collection
  • Pauses in the young-only phase

    1. Initial Mark: marks the objects directly reachable from GC Roots, with a very short pause. It is followed by Concurrent Marking (no pause required), which determines which live objects in the Old regions need to be kept in the next space-reclamation phase. While concurrent marking has not yet finished, further young GCs that do not include an initial mark may already be running. The initial mark does not happen in every young GC: it is triggered only when the space used by the old generation, relative to the total old generation space, exceeds InitiatingHeapOccupancyPercent.

    2. Remark: this pause mainly completes the marking process. Because the preceding marking ran concurrently with user threads, some objects may have changed reachability while the user threads kept executing, so the user threads have to be stopped to deal with this floating garbage. This step mainly handles reference processing and class unloading, and although it is a pause, it can use multiple marking threads.

    3. Cleanup: also requires a pause. Regions that are completely empty are reclaimed, and G1 decides whether to move into the space-reclamation phase. If a space-reclamation phase is needed, G1 proceeds and collects some Young regions and Old regions (the Old regions worth collecting). If it is not needed, this Cleanup is equivalent to the cleanup of a young-only phase and simply indicates that the young generation has been cleaned up.

  • Pauses in the space-reclamation phase

    For example, when old-generation objects refer to young-generation objects, the region where the young-generation object resides records the referring old region in the Remembered Set it maintains. Therefore, in the space-reclamation phase, GC covers not only the young regions but also, by following their Remembered Sets, the old regions behind them, from which the most cost-effective ones are selected and collected.

So far, I have told you everything I know about G1 collection. If you are interested in the algorithms behind it, such as snapshot-at-the-beginning (SATB), you can learn more on Oracle's official website. I will publish an article on G1's collection algorithms when I have time. Next, I will go through different GC scenarios and give tuning recommendations. Sit tight and let's go!

GC tuning in practice

Full GC

Full GC is the last-resort collection for G1 (and for most collectors). For example, a program may use a template parsing engine without extracting variables, so that a new class is compiled on every call; this easily leads to Full GC. Or there may be too many humongous objects. Besides GC tuning, if Full GC occurs in your application, you will also need to spend time optimizing your business code.

To put it bluntly, too many objects are created and G1 cannot reclaim them in time. A common scenario is that Concurrent Marking does not finish in time.

Therefore, the optimization idea of Full GC is mainly divided into two aspects:

  1. Reduce the time for Concurrent Marking;
  2. Or increase the old Gen area.

Full GC tuning Tips:

  1. If there are too many large objects, check the number of Humongous regions with gc+heap=info logging. You can increase the region size with -XX:G1HeapRegionSize so that large objects do not occupy too much memory in the old generation.

  2. Increase the heap size. The effect is that G1 has more time to complete Concurrent Marking.

  3. Add threads for Concurrent Marking by setting -XX:ConcGCThreads.

  4. Force the marking phase to start earlier. G1 automatically determines the Initiating Heap Occupancy Percent (IHOP) threshold, which decides when an Initial Mark and the space-reclamation phase after the later Cleanup are triggered, based on the earlier behavior of the application. If service traffic suddenly surges or the behavior otherwise changes, a threshold predicted from past behavior becomes inaccurate. The following approaches can be used:

    1. Increase the memory buffer that G1 reserves in its adaptive IHOP calculation with -XX:G1ReservePercent, so that the prediction leaves more headroom and marking starts earlier.
    2. Turn off G1's adaptive IHOP mechanism with -XX:-G1UseAdaptiveIHOP, then specify the threshold manually with -XX:InitiatingHeapOccupancyPercent. This also saves the cost of making a prediction each time.
  5. The Full GC may be caused by a large number of Humongous objects for which the system cannot find contiguous regions to allocate. You can increase the region size with -XX:G1HeapRegionSize or increase the overall heap size.
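To see how these tips fit together on a command line, here is a small sketch that assembles the flags for a child JVM. The class name FullGcTuningFlags, the concrete values and app.jar are all placeholders of mine, not recommendations; also note that -XX:-G1UseAdaptiveIHOP only exists on JDK 9 and later, so I would expect a JDK 8 JVM to reject it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the G1 flags discussed in the tips above.
final class FullGcTuningFlags {

    static List<String> build(int regionSizeMb, int concGcThreads, int ihopPercent) {
        // G1 region sizes are powers of two.
        if (regionSizeMb <= 0 || Integer.bitCount(regionSizeMb) != 1) {
            throw new IllegalArgumentException("region size must be a power of two: " + regionSizeMb);
        }
        if (concGcThreads <= 0) {
            throw new IllegalArgumentException("concGcThreads must be positive");
        }
        if (ihopPercent < 0 || ihopPercent > 100) {
            throw new IllegalArgumentException("ihopPercent must be within 0..100");
        }
        List<String> flags = new ArrayList<>();
        flags.add("-XX:+UseG1GC");
        flags.add("-XX:G1HeapRegionSize=" + regionSizeMb + "m");          // tips 1 and 5
        flags.add("-XX:ConcGCThreads=" + concGcThreads);                   // tip 3
        flags.add("-XX:-G1UseAdaptiveIHOP");                               // tip 4, JDK 9+
        flags.add("-XX:InitiatingHeapOccupancyPercent=" + ihopPercent);    // tip 4
        return flags;
    }

    public static void main(String[] args) {
        // Placeholder values, to be replaced by what your own GC logs suggest.
        List<String> command = new ArrayList<>();
        command.add("java");
        command.addAll(build(8, 4, 35));
        command.add("-jar");
        command.add("app.jar"); // hypothetical application jar
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be passed to a ProcessBuilder or pasted into a startup script; either way, change one flag at a time and compare the GC logs before and after.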

Mixed GC or Young GC tuning

Reference Object processing takes a long time

The gc log shows the time spent processing Reference Objects in the Ref Proc and Ref Enq phases. In Ref Proc, G1 updates the referents according to the requirements of each reference type. In Ref Enq, G1 adds the Reference Objects whose referents are no longer reachable to their corresponding reference queue. If these phases take a long time, consider running them in parallel with -XX:+ParallelRefProcEnabled.
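If Reference Objects and reference queues feel abstract, the sketch below shows from the application side what the collector has to do for each of them: clear the referent and put the reference on its queue. The class name RefQueueDemo is my own; System.gc() is only a hint, so whether the reference is enqueued within the timeout is not guaranteed.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo of a weak reference being cleared and enqueued by the GC.
final class RefQueueDemo {

    /** A weak reference still resolves while a strong reference to the referent exists. */
    static boolean aliveWhileStronglyHeld() {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);
        return ref.get() == referent; // referent is still strongly reachable here
    }

    /** Returns true if the weak reference was enqueued within the timeout (not guaranteed). */
    static boolean enqueuedAfterGc(long timeoutMillis) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);
        referent = null; // drop the only strong reference
        System.gc();     // a request, the JVM may ignore it
        try {
            // timeoutMillis must be positive: 0 would wait forever.
            Reference<?> polled = queue.remove(timeoutMillis);
            return polled == ref;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("alive while held: " + aliveWhileStronglyHeld());
        System.out.println("enqueued after GC: " + enqueuedAfterGc(2000));
    }
}
```

A service that creates huge numbers of weak, soft or phantom references (caches and connection pools often do) makes the collector repeat this work for every one of them, which is where a long Ref Proc time comes from.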

Young-only collections take longer

The main reason is that the Collection Set contains too many live objects that need to be copied. You can read the corresponding time from the Evacuate Collection Set phase in the gc logs; if it is too long, lower the minimum size of the young generation with -XX:G1NewSizePercent, so that fewer objects have to be copied in each pause. It is also possible that, at some moment, a large number of objects suddenly survive, which makes the GC pause time spike. In this case, the maximum size of the young generation can be lowered with -XX:G1MaxNewSizePercent.
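To get a feeling for what the two percentages mean in absolute terms, here is a small arithmetic sketch. The class name YoungGenBounds is hypothetical; 5 and 60 are the documented defaults of G1NewSizePercent and G1MaxNewSizePercent as far as I know (both are experimental flags, so they should need -XX:+UnlockExperimentalVMOptions), and 8192 MB is the heap size from the log in this article.

```java
// Hypothetical helper: converts the two young generation percentages into sizes.
final class YoungGenBounds {

    /** Returns {minYoung, maxYoung} in the same unit as heapSize (integer division). */
    static long[] bounds(long heapSize, int newSizePercent, int maxNewSizePercent) {
        if (newSizePercent < 0 || maxNewSizePercent > 100 || newSizePercent > maxNewSizePercent) {
            throw new IllegalArgumentException("expected 0 <= newSizePercent <= maxNewSizePercent <= 100");
        }
        return new long[] {
            heapSize * newSizePercent / 100,
            heapSize * maxNewSizePercent / 100
        };
    }

    public static void main(String[] args) {
        long heapMb = 8192; // heap size taken from the GC log above
        long[] b = bounds(heapMb, 5, 60);
        System.out.println("young generation between " + b[0] + " MB and " + b[1] + " MB");
    }
}
```

G1 picks the actual young generation size inside this range based on the pause time goal, so moving either bound changes how many objects a young GC may have to copy.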

Mixed collections take longer

The gc+ergo+cset=trace log output can be used to see the predicted young region time and the predicted old region time. If the predicted old region time is long, you can use the following method:

  • Increase -XX:G1MixedGCCountTarget. A larger value spreads the old generation regions across more collections and avoids processing a large collection set at once.
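The effect of this parameter is simple division, as the rough model below shows. The class name MixedGcSpread and the figure of 400 candidate regions are made up for illustration; 8 is the documented default of G1MixedGCCountTarget as far as I know, and the real G1 applies further limits (pause time goal, other thresholds) that this sketch ignores.

```java
// Simplified model: how many old regions each Mixed GC handles for a given target count.
final class MixedGcSpread {

    /** Ceiling of candidateOldRegions / mixedGcCountTarget. */
    static int oldRegionsPerMixedGc(int candidateOldRegions, int mixedGcCountTarget) {
        if (candidateOldRegions < 0 || mixedGcCountTarget <= 0) {
            throw new IllegalArgumentException("regions must be >= 0 and target must be > 0");
        }
        return (candidateOldRegions + mixedGcCountTarget - 1) / mixedGcCountTarget;
    }

    public static void main(String[] args) {
        int candidates = 400; // hypothetical number of old regions selected for collection
        System.out.println("target 8:  " + oldRegionsPerMixedGc(candidates, 8) + " regions per Mixed GC");
        System.out.println("target 16: " + oldRegionsPerMixedGc(candidates, 16) + " regions per Mixed GC");
    }
}
```

Doubling the target halves the old regions handled per pause in this model, at the price of more Mixed GCs before the old generation is cleaned up.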

Now, looking back at the parameters I adjusted earlier, does it make sense why the GC efficiency of the service improved right after the change? In fact, tuning is never done overnight; it takes continuous polishing. Once you have that experience, you will always see and think of more than others do!

Writing all of this by hand was hard work, so likes and comments are very welcome. Congratulations to FPX and TES for entering the final together.