Java garbage collection plays an important role in the JVM. This article will briefly discuss the Java garbage collector, from the ancient Serial to the latest ZGC, and briefly analyze the characteristics of each garbage collector, focusing on G1 and ZGC.

Serial/Serial Old

The Serial collector is the most basic and oldest collector and was once (prior to JDK 1.3.1) the only choice for the new generation of HotSpot VIRTUAL machine collectors.

During garbage collection, all other worker threads must be suspended until the collection is complete and the GC threads are single-threaded. For the new generation, Serial uses a replication algorithm; For older generations, Serial uses a mark-collation algorithm.

ParNew

The new generation collector needs to be used with other older collectors. Prior to JDK7, only he and Serial could be used with CMS collectors, so they were often the first choice for the new generation of collectors.

The ParNew collector is essentially a multithreaded parallel version of The Serial collector. The control parameters, collection algorithms, Stop The World, object allocation rules, and collection strategies are exactly The same as The Serial collector. In terms of implementation, The two collectors also share a considerable amount of code.

Parallel Scavenge/Old

The Parallel Collector is a new generation collector based on the mark-copy algorithm, multi-threaded collection. The Collector cannot be used with CMS.

Parallel Old collector works in the Old days, based on mark-collation, multi-threaded collection, launched in JDK6.

Unlike a collector that focuses on pause times, a Parallel collector focuses on throughput, which is the ratio of how long the processor runs the user’s code to the total time consumed by the processor.

CMS

The CMS collector works from the old days, based on the mark-sweep algorithm, multi-threaded collection, launched in JDK5. It is the first collector to support concurrency, garbage collection threads can work at the same time as user threads.

The working process of the

The working process of CMS is mainly divided into four steps:

1) CMS Initial Mark

2) CMS Concurrent Mark

3) Re-marking (CMS Remark)

CMS Concurrent sweep

The initial tag and re-tag steps still need to “Stop The World,” and The other two steps can work simultaneously with The user thread.

Initial tags simply mark objects to which GC Roots can be directly associated

The concurrent marking phase is the process of traversing the entire object graph from the directly associated objects of GC Roots. This process is time-consuming but does not require the user thread to be paused and can be run concurrently with the garbage collection thread.

The relabelling phase is used to correct the record of the part of the object whose markup changes during concurrent marking as the user program continues to operate.

Finally, there is the concurrent cleanup phase, which cleans and deletes dead objects judged by the marking phase. Since there is no need to move living objects, this phase can also be concurrent with the user thread.

disadvantages

Two features of a CMS cause it to start a FULL GC

Floating garbage generation

Concurrent tags in CMS and concurrent cleanup phase, user thread is continues to run, program is running will also naturally accompanied by a new garbage objects are generated, but this is part of the garbage objects appear in the marking process after the CMS cannot be in when to collect to get rid of them, so we have to leave to clean up again when the next garbage collection. This part of garbage is called “floating garbage”. If you generate a lot of floating garbage, you might hit a Full GC that completely “Stop The World”.

A lot of space debris

CMS is a collector based on a “mark-and-sweep” algorithm, which means a lot of space debris is generated at the end of the collection. When there is too much space debris, the allocation of large objects will cause a lot of trouble. It is often the case that there is still a lot of free space in the old years, but there is not enough contiguous space to allocate the current object, and you have to trigger a Full GC in advance. To optimize both cases, the JVM provides two parameters that control the defragmentation of old chronospatial debris:

  1. -xx :+UseCMS -compactatFullCollection. If this parameter is enabled, the defragmentation process will be enabled before FULL GC is required.
  2. Another is – XX: CMSFullGCsBeforeCompaction, , which requires the CMS collector to defragment the next time it enters Full GC (the default is 0, indicating that it defragment each time it enters Full GC) after a number of undecluttered Full GC.

Since defragmentation requires moving living objects, it cannot be performed concurrently with the user thread, resulting in a short pause.

G1

The Garbage First(G1 for short) collector is a landmark in the history of Garbage collector technology. Based on the memory layout of Region, Garbage is collected for the entire Java heap, and users can specify their own expected pause times.

Memory layout based on Region

In the garbage collector before G1, the heap memory was generally divided into the new generation and the old generation, and the young generation was divided into Eden region and two Survivor regions. The ratio of space size between different generations and partitions was specified by parameters.

The G1 garbage collector divides the contiguous Java heap into independent regions of equal size, each of which can act as an Eden space for the new generation, a Survivor space, or an old chronolespace, as needed.

Regions have a special Humongous Region that is used to store large objects. G1 considers large objects as long as the size of a Region exceeds half of its capacity. The size of each Region can be set using -xx :G1HeapRegionSize. The value ranges from 1MB to 32MB and is the NTH power of 2. However, super-large objects that exceed the capacity of the entire Region will be stored in N consecutive Humongous regions. Most of G1 behaviors treat Humongous regions as part of the old era.

Garbage collector for entire heap memory

Garbage collectors prior to G1 were generally only for a specific generation (except for FULL GC), either for the new generation or the old generation. Although G1 is also designed based on the idea of generational collection, the concept of generational collection is weakened due to the use of region-based memory layout. Garbage collection is not measured by which generation it belongs to, but by which memory blocks hold the most garbage and collect the most revenue. During garbage collection, G1 copies the surviving objects of the Region to the empty Region and cleans up the entire space of the old Region. This is more like a mark-copy algorithm for new generation regions, and more like a mark-unscramble algorithm for regions that store large objects.

G1 collector mainly has two modes: Young GC and Mixed GC:

Young GC

Select all Cenozoic regions. The time cost of the Young GC is controlled by controlling the number of regions in the new generation, i.e. the memory size of the new generation. G1 algorithm divides the heap into several regions, which still belong to the generational collector. However, some of these areas contain the new generation, whose garbage collection still copies live objects to older generations or Survivor Spaces by suspending all application threads. The old era is also divided into regions, and the G1 collector cleans up by copying objects from one region to another. This means that, under normal processing, G1 compacts the heap (at least part of it) so that there is no CMS memory fragmentation problem.

Mixed GC

All regions in the new generation are selected, plus some old regions with high revenue collection based on global Concurrent marking statistics. Select the old Region with high income as far as possible within the cost target range specified by the user. The Mixed GC is not a full GC and can only reclaim parts of older regions. If the Mixed GC is simply unable to keep up with the program’s memory allocation rate and the old age fills up too fast to continue the Mixed GC, the serial Old GC (Full GC) is used to collect the entire GC heap. So we can see that the G1 does not provide full GC.

Predictable pause time model

The pause time model is meant to support the goal of specifying that within a time segment of M milliseconds in length, there is a high probability that no more than N milliseconds will be spent on garbage collection.

Predictable pauses model implementation approach is to allow the G1 collector to track each Region the size of the accumulation of junk inside “value”, value the recycling of space and time needed for recycling experience value, and then maintain a list of priority in the background, every time according to user setting allows collection pauses, use the parameter – XX: MaxGCP AuseMillis specifies that the default value is 200 milliseconds.

G1 Working process

The operation of the G1 collector can be roughly divided into four steps:

Initial Marking

Just mark the objects that GC Roots can be directly associated with. This phase requires the thread to pause, but it is very short, and it is done synchronously during the Minor GC, so the G1 collector actually has no additional pauses during this phase.

Concurrent Marking

Reachability analysis of objects in the heap is performed starting with GC Root, recursively scanning the entire heap object graph to find objects to reclaim, which is time-consuming but can be performed concurrently with the user program.

Final Marking

Another short pause is made on the user thread to deal with the last few unmarked objects that remain after the concurrent phase is over. (Incremental update: THE CMS collector is implemented using the incremental update algorithm, while the G1 collector is implemented using the raw snapshot (SATB) algorithm.)

Live Data Counting and Evacuation

You can update statistics of regions, sort the reclamation value and cost of each Region, and make a reclamation plan based on the expected pause time of users. You can select multiple regions Then, the living objects of the Region to be reclaimed are copied to the empty Region, and the entire space of the old Region is cleared. The operation here involves the movement of the living object, which must suspend the user thread, and is done in parallel by multiple collector threads

As you can see from the description of the phases above, in addition to the concurrent marking, the G1 collector also suspends the user thread completely during the remaining phases

ZGC

The ZGC collector is a garbage collector based on Region memory layout, with (for now) no generation, and uses techniques such as read barriers, dye Pointers, and memory multiple mapping to implement concurrent mark-collation algorithms with low latency as the primary goal.

ZGC memory model

On the X64 hardware platform,ZGC regions are divided into large, medium, and small capacities:

  1. Small Region: the capacity of a Region is fixed at 2MB. It is used to store Small objects smaller than 256KB.

  2. Medium Region: with a fixed capacity of 32MB, it is used to store objects larger than or equal to 256KB but smaller than 4MB.

  3. Large Region: the capacity of a Large Region is not fixed and can be dynamically changed. The value must be an integer multiple of 2MB and is used to store Large objects of 4MB or larger. Each large Region contains only one large object. This indicates that the actual capacity of a large Region may be smaller than that of a medium-sized Region, with the capacity as low as 4MB. Large regions are not reallocated in ZGC implementations (reallocation is a ZGC processing action used to copy objects during the collector phase) because copying a large object is very expensive.

Low latency implementation

ZGC has shorter pause times than previous garbage collectors.

G1 has a long pause because it has to “Stop The World” when it comes to collating and moving objects, because they don’t solve The problem of locating objects accurately during object movement. While the GC thread is moving the object, the application thread is constantly accessing the object. If the object is moved and the object address is not updated in time, the application thread may access the old address, causing an error.

Two problems need to be solved in the process of accurate access to the object, one is how to determine whether the object is moved, the other is to update the address of the object in time.

ZGC determines whether the object moves by coloring pointer and updates the address of the object in time by reading barrier technology, which solves the problem of accurate access to the object in the process of transfer and realizes concurrent transfer.

barrier

The barriers in question are the techniques by which the JVM inserts small pieces of code into application code. There are two types of barriers in the JVM, Load barriers and Store barriers.

The use of barriers is closely related to the architecture of modern cpus, which tend to have caches, so the values in the cache may differ from those in main memory.

For Load barriers, inserting read barriers before reading instructions invalidates data in the cache and reloads data from main memory

For a Store Barrier, inserting a write Barrier after a write instruction allows the most recent data written to the cache to be written back to main memory.

ZGC uses read barriers to ensure that the latest address of an object is read.

Coloring pointer

ZGC only supports 64-bit systems and divides the 64-bit virtual address space into multiple subspaces

[0 to 4TB) corresponds to the Java heap. [4 to 8TB) is called the M0 address space, [8 to 12TB) is called the M1 address space, [12 to 16TB) is reserved, and [16 to 20TB) is called Remapped space.

When an application creates an object, it first applies for a virtual address in the heap space, but the virtual address is not mapped to a real physical address. The ZGC applies for a virtual address for the object in the M0, M1, and Remapped address Spaces. The three virtual addresses correspond to the same physical address, but only one of the three Spaces is valid at the same time.

Corresponding to the above address space division, ZGC actually only uses bits 0-41 of the 64-bit address space, while bits 42-45 store metadata, and bits 47-63 are fixed at 0. ZGC stores object survival information in 42 to 45 bits, as opposed to traditional garbage collection and placing object survival information in object headers.

Two Mark bits indicate whether the object pointer has been marked. Using two Mark bits allows different Mark bits to be used in different GC sessions. Remapped indicates whether the current object pointer was adjusted to the moved object pointer. The Finalizable bit serves for Finalizable objects and is used to indicate whether the object pointer is only marked by Finalize objects. It is mainly used in the Mark stage and weak reference processing stage. Colored (Marked or Remapped) indicates that the Runtime is Marked (Marked or Remapped) in different GC phases.

ZGC working process

The working process of the ZGC can be abstracted into four parts: concurrent marking, concurrent preparatory reallocation, concurrent reallocation, and concurrent remapping

Concurrent Mark

Concurrent markup is the reachability analysis phase of traversing the object graph, followed by a short pause similar to G1, initial markup, and final markup (although not named as such in the ZGC). Unlike G1 and Shenandoah,ZGC is Marked on Pointers instead of objects, and Marked 0 and Marked 1 flag bits in dyeing Pointers are updated in the marking phase.

Concurrent Prepare for Relocate

The region to be cleared is formed into a Relocation Set. The surviving objects in the Relocation Set will be copied to other regions and released.

Concurrent Relocate

Redistribution is the core stage of the ZGC. This process copies the surviving objects in the redistribution set to the new Region and maintains a Forward Table for each Region in the redistribution set to record the direction relationship from the old objects to the new objects. Benefited from dyeing pointer support, can only from ZGC collector on a clear reference whether an object in the redistribution of set, if the user thread as concurrent access to the concentration of the redistribution of object, the visit will be intercepted by the preset memory barrier, then immediately turn according to the Region of the published records to forward the access to the new copy of objects, and the same Is modified to update the value of the reference so that it points directly to the new object, a behavior ZGC calls the pointer’s “SelfHealing” ability.

Concurrent Remap

All remapping does is correct all references in the heap to old objects in the redistributable set. Concurrent remapping of THE ZGC is not a task that has to be “urgent” because, as mentioned earlier, even old references are self-healing, with at most one more forward and fix operation on the first use. The main purpose of remapping to clean up these old references is not to slow them down (and to release the spin-off benefits of forwarding when the cleaning is done), so it’s not “urgent.” Therefore, the ZGC cleverly merges the work of the concurrent remapping phase into the concurrent marking phase of the next garbage collection cycle, which will traverse all objects anyway, thus saving the overhead of traversing the object graph once. Once all Pointers have been corrected, the forwarding table that records the relationship between the old and new objects can be released.

parameter

The parameters of ZGC can be roughly divided into three categories:

  • Heap size: Xmx. The ZGC is able to meet the service entry requirements of the business’s high SLAs with extremely low latency, but like concurrent GC in all programming languages, latency is trade-off in memory space. If the Allocation rate exceeds the reclamation rate and the heap memory is insufficient, an Allocation Stall is triggered. This Stall slows down the current user thread. Therefore, when we see an Allocation Stall in the GC log, we can usually assume that the heap space is too small or the number of Concurrent GC threads is too small.
  • The GC trigger: ZAllocationSpikeTolerance ZCollectionInterval. ZAllocationSpikeTolerance used to estimate the current heap memory allocation rate, the rest of the current heap memory, the greater the ZAllocationSpikeTolerance, estimate the faster time to OOM, trigger GC ZGC would sooner. The ZCollectionInterval is used to specify the interval at which GC is triggered in seconds.
  • GC threads: ParallelGCThreads, ConcGCThreads. ParallelGCThreads sets the number of GC threads used for STW tasks. The default value is 60% of the number of cpus. ConcGCThreads is the number of CONCURRENT GC threads. The default value is 12.5% of the number of cpus. Increasing the number of GC threads speeds up GC completion and reduces the time of each phase, but also increases CPU preemption overhead, which can be adjusted according to production.

conclusion

The performance optimization of garbage collector has gone through several different stages. At the beginning, garbage collector GC thread was single thread, and finally to ParNew collector, GC thread could execute concurrently with multiple threads. CMS was the first garbage collector that could execute concurrently with worker thread, followed by G1 and ZGC, More and more garbage collection processes can be executed in parallel with worker threads, and with the rise of the Internet, garbage collection is increasingly concerned with the pause time of garbage collection. Low latency has become an important measure of whether a box garbage collector is good or not.