HotSpot garbage collector

The HotSpot VIRTUAL machine provides a variety of garbage collectors, each with its own characteristics, and while we are comparing them, we are not trying to pick the best collector. We have chosen only the collector that is most appropriate for our specific application.

New generation garbage collector

Serial garbage collector (single thread)

Start only one GC thread for garbage collection and Stop all user threads during garbage collection, i.e. Stop The World.

The average client application requires less memory, does not create many objects, and does not have much heap memory, so the garbage collector takes a short time to collect, even if it stops all user threads during this time, it will not feel significantly stuck. Serial garbage collector is therefore suitable for use by clients.

Because the Serial collector uses only one GC thread, it avoids the overhead of thread switching, making it simple and efficient.

ParNew garbage collector (multithreaded)

ParNew is the multithreaded version of Serial. Garbage cleaning is done in parallel by multiple GC threads. But The clean-up process still needs to Stop The World.

ParNew pursues “low pause times”, the only difference from Serial is the use of multi-threading garbage collection, in multi-CPU environment performance can be somewhat improved than Serial. However, thread switching requires additional overhead and therefore does not perform as well as Serial in a single-CPU environment.

Insane. Parallel Insane.

The Parallel Insane, like ParNew, is a multi-threaded, new generation garbage collector. But there are big differences:

  • Insane: Chase CPU throughput, be able to perform specified tasks in a relatively short time, and therefore be suitable for background computing without interaction.
  • ParNew: Seeks to reduce user pause time, suitable for interactive applications.
Throughput = time to run user code/(Time to run user code + garbage collection time)Copy the code

The pursuit of high throughput can be achieved by reducing the amount of time the GC spends performing the actual work, however, running GC only occasionally means that there is a lot of work to be done each time the GC runs because of the high number of objects that accumulate in the heap during that time. Individual GC takes more time to complete, resulting in higher pause times. Given the low pause times, it is best to run GC frequently to complete more quickly, which in turn leads to throughput degradation.

  • Run -xx :GCTimeRadio to set the percentage of the garbage collection time to the total CPU time.
  • Set the maximum garbage processing pause time with the -xx :MaxGCPauseMillis parameter.
  • Run the -xx :+UseAdaptiveSizePolicy command to enable the adaptive policy. Once we set the heap size and MaxGCPauseMillis or GCTimeRadio, the collector automatically adjusts the size of the new generation, the ratio of Eden to Survivor, and the age at which the object enters the old age. As close as possible to the MaxGCPauseMillis or GCTimeRadio we set.

Old age garbage collector

Serial Old garbage collector

Serial Old collectors are older versions of Serial, both single-threaded collectors that enable only one GC thread and are suitable for client applications. The only difference is that Serial Old works in the Old days and uses a mark-tidy algorithm. Serial works in the new generation, using a “copy” algorithm.

Parallel Old garbage collector (Multithreaded)

The Parallel Old collector is an older version of the Parallel Insane, which seeks CPU throughput.

CMS garbage collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is to achieve the shortest collection pause time (pursuit of low pauses). It allows the user thread and the GC thread to execute concurrently during garbage collection, so that the user does not feel a significant lag during garbage collection.

  • Initial tag: Stop The World, marking all objects directly associated with GC Roots using only one initial tag thread.
  • Concurrent tagging: Multiple tagging threads are used to execute concurrently with the user thread. This process performs reachability analysis and marks all abandoned objects. It’s slow.
  • Re-mark: Stop The World, execute concurrently with multiple marking threads, and mark The newly discarded objects that just appeared during The concurrent marking process.
  • Concurrent cleanup: Only one GC thread is used, executed concurrently with the user thread, to clean the object just marked. This process is very time consuming.

Concurrent tagging and concurrent cleanup processes take the longest time and can work with user threads, so overall, the CMS collector’s memory reclamation process is performed concurrently with the user thread.

Disadvantages of CMS:

  • Low throughput
  • Unable to handle floating garbage, resulting in frequent Full GC
  • Create debris space using a mark-clean algorithm

During the concurrent cleanup step, the user thread also generates a portion of the recyclable objects, but the portion of the recyclable objects will only be reclaimed the next time the cleanup is performed. A ‘Concurrent Mode Failure’ occurs when insufficient memory is reserved for user threads during cleanup, and when this occurs, the SerialOld collection Mode is switched to.

For generating the fragments of the space problem, can through the open – XX: + UseCMSCompactAtFullCollection, in each Full GC after the completion of a memory compression, will be scattered everywhere, to a piece of object. Set parameters – XX: CMSFullGCsBeforeCompaction tell CMS, after N times again after Full GC memory consolidation.

G1 General purpose garbage collector

G1 is a garbage collector for server-side applications that doesn’t have a concept of generation and generation, but instead divides the heap into separate regions. To collect garbage, estimate the amount of garbage in each Region and collect garbage from the Region with the highest garbage collection value. Therefore, the garbage collection efficiency is maximized.

Each Region ranges from 1 MB to 32MB. The Region size can be adjusted by G1HeapRegionSize. If the Region size exceeds half, the Region is considered as Humongous

As a whole, G1 is a collector based on a mark-tidy algorithm, and locally (between two regions) based on a copy algorithm, meaning that no memory space fragmentation occurs during runtime.

Memory sets and PRT

An object may not be in the same Region as the object referenced within it. When garbage collection occurs, does the entire heap memory need to be scanned for a complete reachability analysis?

Each Region has a Remembered Set that records the Region of objects referenced by all objects in the Region. Adding Remembered Set to GC Roots prevents traversal of the entire heap during reachabability analysis.

And the memory set records Pointers to itself and marks the card table range to which these Pointers belong. It is essentially a hash table that stores the mapping between the start address of another Region and the card table index. This complex bidirectional card table is both complex and takes up more memory.

PRT

RSet internally uses Per Region Table(PRT) to record partition references. Since the records of an RSet take up space for a partition, if a partition is very “popular”, the RSet takes up space, reducing the available space of the partition. G1 addresses this problem by changing the density of the RSet, which will record references in three modes in the PRT:

  • Rare: Directly records the card index of the reference object
  • Fine-grained: Records the partitioning index of the reference object
  • Coarse-grained: Only references are recorded, one bit per partition

Scanning efficiency is inversely proportional to record granularity

The maintenance of RSet

Since whole heap scanning is not possible and the exact activity of the partition needs to be calculated, G1 requires an incremental fully labeled concurrent algorithm to maintain the RSet to obtain accurate partition reference information. In G1, rsets are maintained from two main sources: Write barriers and Concurrence Refinement Threads.

In G1, the card table structure is complex and the post-write barrier operation of Rset is complex. In order to implement STAB algorithm, the pre-write barrier is also required to track application changes. Hence the complexity of the write barrier implementation. Using log recording, asynchronous update scheme.

Write barriers

Write pre-write Barrrier

When an assignment is about to be executed and the object on the left side of the equation changes and references to another object, the partition on which the object on the left side originally referenced loses a reference, and the JVM needs to keep track of the lost reference before the assignment takes effect. The JVM does not maintain the RSet immediately, but rather updates it in the future through batch processing (see SATB).

Post-write barrier Barrrier

When an assignment is performed and the object on the right side of the equation gets a reference to the object on the left, the RSet of the object on the right side of the equation should also be updated. Similarly, to reduce overhead, the RSet is not updated immediately after a post-write barrier occurs, but is only logged for batch processing in the future (see Concurrence Refinement Threads).

Original snapshot algorithm

SATB creates an object map, which acts as a logical snapshot of the heap, to ensure that all garbage objects during the concurrent marking phase can be identified through the snapshot. When an assignment statement occurs, the application will change its object graph, and the JVM needs to keep track of the overwritten objects. Therefore, the pre-write barrier records the value in the SATB log or buffer before referencing the change. Each thread has an exclusive SATB buffer, starting with 256 record Spaces.

Concurrent optimization thread

Concurrence Refinement Threads concentrate only on scanning cards recorded in the log buffer to maintain updated Rsets, Threads maximum number through – XX: G1ConcRefinementThreads (default is equal to – XX: ParellelGCThreads) Settings. The concurrent optimization thread is always active and begins concurrent processing as soon as a record is found in the global list. If growth record soon or too late processing, through the threshold – X: G1ConcRefinementGreenZone / – XX: XX: G1ConcRefinementYellowZone / – G1ConcRefinementRedZone, G1 schedules in a hierarchical manner, allowing more threads to process the global list. If the concurrent optimizer thread cannot keep up with the number of buffers, the Mutator thread (Java application thread) suspends the application and is added to help process until it is all processed. Therefore, such scenarios must be avoided.

Concurrent tags

In order to solve the problem of object disappearing in the process of concurrency, the original snapshot (STAB) algorithm is used to record the deleted gray to white references based on the three-color marking method.

During the process of creating new objects, G1 designs two Pointers named TAMS for each Region to divide part of the space for the allocation of new objects in the concurrent process. By default, objects in this section are implicitly marked, that is, alive by default. When this part of the object allocation rate is faster than the speed of reclaim free space, the user thread is forced to freeze.

Why does G1 use SATB instead of Incremental Update?

Because incremental update is adopted to re-mark the black to gray, and the previously scanned ones have to be scanned again, which is inefficient. G1 has RSet in conjunction with SATB. The Card Table records the RSet, and the RSet records the references of other objects to itself, so that there is no need to scan other areas.

That is, when the gray — > white reference disappears, if there is no black — > white reference, the reference will be pushed to the stack, and the next scan will get the reference. Because of the RSet, there is no need to scan the heap to find the reference pointing to white, which is more efficient.

The concurrent marking and re-marking phases are reduced by increasing the cost of tracking reference changes during the run of the user program.

Activity cycles and collection steps

Global concurrent markup (concurrent markup cycle)

  • Initial tag: Stop The World, using only one initial tag thread to mark all objects directly associated with GC Roots (pushed onto The search stack). Use an external bitmap to record mark information (instead of object headers). This is done synchronously (logically separate from the MinGC) when the Minor GC is borrowed.
  • Root zone scan: From Survior zone objects, markers are referenced to older objects and pushed onto the scan stack. Concurrent execution must be completed before MinGC begins.
  • Concurrent tagging: Execute concurrently with the user thread using a tagging thread. This process performs reachability analysis, continually taking references from the scan stack and recursively scanning the whole heap for objects. Each scanned object is marked and its fields are pushed onto the scan stack. Repeat the scan process until the scan stack is empty. References recorded by the SATB write Barrier are also scanned. Concurrent Marking can be interrupted by MinGC
  • Final mark: After the concurrent mark is completed, each Java thread will have some remaining references to the SATB write Barrier record that have not been processed. This stage handles the rest of the references. This stage also performs weak reference processing.
  • Filter collection: Calculates the cost and value of collection in a Region and specifies any collection plan based on the parameters. Recycle discarded objects, also Stop The World, and use multiple filter recycle threads to execute concurrently.

There are two submodes of selected CSet in generational G1 mode, corresponding to young GC and Mixed GC respectively:

The Cset in MIXGC is to select all the regions of young Gen, plus some old Gen regions with high revenue according to global Concurrent marking statistics.

The Cset in the MinGC is the region selected for all young Gen. Control the overhead of young GC by controlling the number of regions of Young Gen.

MinGC and MIXGC are both copied and cleared by multiple threads, and the whole process will STW. The low latency of G1 is due to the fact that the reclaim area becomes more precise and the range is smaller.

The normal workflow for generational G1 is to switch between young and Mixed GC as appropriate, with periodic global concurrency markers behind it. Initial marking is executed on young GC by default; G1 will not choose to do mixed GC when global concurrent markers are working, and will not start Initial marking if a mixed GC is in progress. In normal workflow, there is no concept of full GC, and the collection of Old Gen is completed by mixed GC.

Reliable pause prediction model

Based on the theory of attenuation mean, the garbage collection time of each Region and the cost of measurable steps such as the number of dirty cards in the memory set of each Region are recorded during garbage collection.

New garbage collector

\