The JVM garbage collector is a specific implementation of the garbage collection algorithm. It is helpful to understand the previous knowledge of garbage collector.

What is garbage?

Garbage is basically objects on the heap, so how do you determine that these objects are recyclable?

The idea is, if an object can never be accessed, it’s garbage, it can be recycled how do you know that object will never be used?

Reference counting method

Adds a reference counter to the object, incrementing its value by one each time it is referenced somewhere; When a reference is invalid, the counter value is reduced by one; An object whose counter is zero at any point in time cannot be used again. But, in the field of Java, at least there’s no use in the Java virtual machine of mainstream reference counting algorithm to manage memory, the main reason is that this seemingly simple algorithm have a lot of exceptions to consider, must want to cooperate with a large number of additional processing to ensure that work correctly, such as a simple reference counting is difficult to solve the problem of mutual circular references between objects.

As shown in the figure, each object has a reference of 1, which forms a circular reference, but it cannot be accessed by other objects. These two objects have no references, and the reference counting algorithm cannot reclaim them.

Code verification:

package com.courage; public class ReferenceCountingGC { public Object instance = null; private static final int _1MB = 1024 * 1024; Private byte[] bigSize = new byte[5* _1MB]; private byte[] bigSize = new byte[5* _1MB]; public static void testGC() { //5 M ReferenceCountingGC objA = new ReferenceCountingGC(); //5 M ReferenceCountingGC objB = new ReferenceCountingGC(); objA.instance = objB; objB.instance = objA; objA = null; objB = null; // Can objA and objB be reclaimed if GC occurs on this line? System.gc(); } public static void main(String[] args) { testGC(); }}Copy the code

Execution Result:

[warning][GC] -xx :+PrintGCDetails is deprecated. Will use -xlog: GC * instead. [0.012s][info][GC,heap] heap Region size: 1M [0.015s][info][GC] Using G1 [0.015s][info][GC, Heap, COOPS] Heap Address: 0x0000000701000000, size: 4080 MB, Compressed Oops mode: Zero based, Oop shift amount: 3 ...... [0.119s][info][GC,metaspace] GC(0) metaspace: 805K->805K(1056768K) [0.119s][info][GC] GC(0) Pause Full (system.gc () 14M->0M(8M) 2.886ms [0.119s][info][GC, CPU] GC(0) User=0.03s Sys=0.00s Real=0.00s [0.120s][info][GC,heap,exit] heap......Copy the code

After system.gc (), the memory usage is 14M->0M, freeing up the object’s 10M. The JVM does not use reference counting to flag garbage.

Root reachable algorithm

The basic idea of this algorithm is to use a series of root objects called “GC Roots” as the starting node set. From these nodes, search down according to the Reference relationship. The path through the search process is called “Reference Chain”. Or in graph theory terms, when the object is unreachable from GC Roots, then the object cannot be used again.

Fixed objects that can be used as GC Roots in Java technology architecture include the following:

  1. Objects referenced in the virtual machine stack (the local variable table in the stack frame), such as parameters, local variables, temporary variables used in the stack of methods called by individual threads, and other objects referenced in the class static properties of the method area, such as Java class reference type static variables.
  2. Objects that are constant references in the method area, such as references in the String Constant pool (String Table).
  3. Objects referenced by JNI (commonly referred to as Native methods) in the Native method stack.
  4. Internal references to the Java virtual machine, such as Class objects corresponding to basic data types, resident exception objects (NullPointExcepiton, OutOfMemoryError), and system Class loaders.
  5. All objects held by the synchronized keyword.
  6. Jmxbeans that reflect Java virtual machine internals, callbacks registered in JVMTI, local code caches, and so on.

Garbage collection algorithm

This article introduces three common garbage collection algorithms (Mark-sweep, Mark-Compact, and Mark-copy) that are the basis of various Java virtual machine garbage collectors.

Garbage collection algorithm idea

Most of the current commercial VIRTUAL machine garbage collectors are designed to follow the theory of “Generational Collection,” which is essentially a set of rules of thumb for how most programs actually run. It’s based on two Generational hypotheses:

1) Weak Generational Hypothesis: the overwhelming majority of subjects were Generational and Generational.

2) The Strong Generational Hypothesis: the more times an object goes through garbage collection, the less likely it is to die.

Together, these two generational hypotheses form a consistent design principle for several commonly used garbage collectors: Collector Java heap should be divided into different areas, and then recycling objects according to their age (age object through the number of garbage collection process) are assigned to different parts of the storage is obvious, if most of the objects in an area is in the evening out, it is difficult to survive the garbage collection process, then put them on together, By focusing only on keeping a small number of objects alive each time rather than marking a large number of objects to be collected, a large amount of space can be reclaimed at a lower cost.

If all that is left are hard-to-die objects, the virtual machine can recycle the area at a low frequency, balancing the time cost of garbage collection and the efficient use of memory space.

Mark-sweep algorithm

The algorithm is divided into “mark” and “clear” two stages: first, mark all the objects to be recycled, after the completion of the mark, all the marked objects will be recycled, or conversely, mark the surviving objects, all the unmarked objects will be recycled.

It has two major disadvantages:

The first is that the execution efficiency is not stable. If the Java heap contains a large number of objects, and most of them need to be recycled, then a large number of marking and clearing actions must be carried out, resulting in the execution efficiency of both the marking and clearing process decreases with the increase of the number of objects.

The second problem is the fragmentation of memory space. After marking and clearing, a large number of discontinuous memory fragments will be generated. Too much space fragment may cause that when large objects need to be allocated in the running process of the program, enough continuous memory cannot be found and another garbage collection action has to be triggered in advance.

Mark-copy

The mark-copy algorithm, often referred to simply as the copy algorithm, divides the available memory by capacity into two equally sized pieces, using only one of them at a time. When this area of memory is used up, the surviving objects are copied to the other area, and the used memory space is cleaned up again.

If most objects are live in the memory, the algorithm will produce a lot of memory copy overhead, but in the case of most objects are recycled, algorithm need to copy is in the minority living objects, but every time is for the entire area in memory, when allocating memory, also need not consider to have the complex situation of space debris, as long as the mobile heap pointer, It can be distributed in order. This is easy to implement and efficient to run, but its drawbacks are also obvious. The cost of this copy-recycle algorithm is to reduce the available memory by half, which is too much space waste.

Mark-compression Mark-compact

The efficiency of mark-copy algorithm will decrease when the survival rate of the object is high. More importantly, if you do not want to waste 50% of the space, you need to have extra space for allocation guarantee, in the extreme case that all objects in the memory used are 100% alive, so in the old days, this algorithm generally cannot be used directly.

In mark-compress, the marking process is still the same as in mark-clean, but instead of cleaning up the recyclable objects directly, the next step is to move all surviving objects towards one end of the memory space, and then clean up the memory directly beyond the boundary:

The essential difference between the mark-clean algorithm and the mark-tidy algorithm is that the former is a non-mobile recovery algorithm while the latter is mobile. Whether to move the recovered living object is a risky decision with both advantages and disadvantages: If moving a live object, especially in the old days when there were a large number of object live areas per collection, moving a live object and updating all references to those objects would be an extremely heavy operation, and such an object move would have to suspend the user application for the entire time (STW problem).

Garbage disposal

Based on the above three garbage collection algorithms, seven garbage collectors are derived:

Serial collector

The collector is a collector of the single thread work, but it’s the meaning of the “single thread” is not just that it can only use a single processor or a collection of threads to complete garbage collection work, more important is to emphasize on its garbage collection, must suspend all other worker threads, until it is about the end of the collection.

To date, it is still the default new generation collector for HotSpot VIRTUAL machines running in client-side mode. It has an advantage over other collectors in that it is simple and efficient (compared to the single-threaded collectors), and for memory-constrained environments, It is the smallest Memory Footprint [1] of all collectors; For single-core processors or environments with a small number of processor cores, the Serial collector can achieve maximum single-thread collection efficiency by focusing on garbage collection because there is no overhead of thread interaction. The Serial collector is a good choice for virtual machines running in client mode.

ParNew collector

The ParNew collector is essentially a multithreaded parallel version of the Serial collector, except that multiple threads are used simultaneously for garbage collection

In addition, the remaining behavior includes all control parameters available to the Serial collector (e.g. -xx: SurvivorRatio, -xx: SurvivorRatio).

PretenureSizeThreshold, -xx: HandlePromotionFailure, etc.), collection algorithm, Stop The World, object allocation rule

, collection strategy, and so on are exactly the same as the Serial collector, and the two collectors share a considerable amount of code in implementation.

The ParNew collector does not offer much innovation over the Serial collector except that it supports multithreaded parallel collection, but it does

It is the preferred new generation collection of HotSpot virtual machines running in server mode, especially legacy systems prior to JDK 7

For a reason that has nothing to do with functionality or performance, but is actually important:

In addition to the Serial collector, it is currently the only one that can communicate with the CMS

Collectors work together, and the emergence of CMS, on the other hand, strengthens ParNew’s position

The ParNew collector is absolutely no better than the Serial collector in a single-core processor environment, even due to the presence of threads

This collector is not 100% guaranteed to outperform Serial even in a pseudo-dual-core processor environment implemented through hyper-threading technology. Of course, as the number of processor cores that can be used increases, ParNew is useful for garbage collection

Efficient use of system resources is very beneficial.

Parallel avenge

The Parallel Collector is a new generation collector. It is based on the mark-copy algorithm and the Parallel Collector. It is a multi-threaded collector that can collect in Parallel. The Parallel Exploiture is superficially similar to the ParNew Insane, so what’s so special about it?

The Parallel Collector is characterized by its focus on minimizing the pause time of user threads during garbage collection. The goal of the Parallel Insane is to achieve a controlled Throughput. Throughput is the ratio of the time the processor spends running user code to the total elapsed time of the processor, namely:

Throughput = time to run user code time to run user code + time to run garbage collector Throughput = time to run user code time to run user code + time to run garbage collector

If the virtual machine completes a task and the user code plus garbage collection takes 100 minutes, of which garbage collection takes 1 minute, the throughput is 99%. The shorter the pause time, the more suitable for the need to interact with users or to ensure the quality of service response, good response speed can improve user experience; The high throughput can make the most efficient use of processor resources and complete the operation tasks of the program as soon as possible, which is mainly suitable for the analysis tasks without too much interaction in the background.

The Parallel Avenge collector is also often referred to as a “through-first collector” because of its affinity for throughput.

Serial Old collector

Serial Old is an older version of the Serial collector, which is also a single-threaded collector using a mark-collation algorithm. The main purpose of this collector is also to be used by HotSpot virtual machines in client mode. If in server mode, it may also serve two purposes: The Application is used in JDK 5 and earlier with the Parallel Scavenge collector, and the application is used as a fallback in the event of a Concurrent Mode Failure in a Concurrent collection.

Parallel Old collector

Parallel Old is an older version of the Parallel Avenge collector, supported by multiple threads for concurrent collection and implemented on a mark-collation algorithm. The collector was only available in JDK 6, and up until that time, the New generation of the Parallel Exploiter had been in a fairly awkward situation because of the application of the Parallel Exploiter. Older generations have no choice but the Serial Old (PSMarkSweep) collector, with which other well-performing older collectors such as CMS cannot work.

Due to the “drag” of the Serial collector on server application performance, the Parallel Collector is not necessarily an overall throughput maximizer.

Similarly, because the single-threaded aged collection set cannot make full use of the parallel processing capability of the server’s multi-processors, the total throughput of this combination may not even be better than that of ParNew plus CMS in the operating environment with large memory space and relatively advanced hardware specifications. It wasn’t until the Parallel Old collector was invented that the Parallel Scavenge collector became a worthy combination of the insane and the Parallel Avenge collector.

CMS collector

The CMS (Concurrent Mark Sweep) collector is a collector whose goal is to obtain the shortest collection pause time. At present, a large part of Java applications are concentrated on the server side of Internet websites or B/S system based on browsers. Such applications usually pay more attention to the response speed of services and hope that the system pause time is as short as possible to bring good interactive experience to users. The CMS collector is a good fit for such applications.

As the name (including “Mark Sweep”) suggests, the CMS collector is based on a mark-sweep algorithm. Its operation is more complex than the previous collectors, and the whole process is divided into four steps:

1) CMS Initial Mark

2) CMS Concurrent Mark

3) Re-marking (CMS Remark)

CMS Concurrent sweep

The initial marking and re-marking steps still need to “Stop The World”. The initial tag simply marks the objects that GCRoots can relate to directly, which is fast; The concurrent marking phase is the process of traversing the entire object graph from the directly associated object of GC Roots. This process takes a long time but does not need to suspend the user thread and can be run concurrently with the garbage collection thread. The re-marking stage is to correct the marking record of the part of objects that are marked differently due to the continued operation of the user program during the concurrent marking period. The pause time of this stage is usually slightly longer than that of the initial marking stage, but also much shorter than that of the concurrent marking stage. Finally, there is the concurrent cleanup phase, which cleans and deletes dead objects judged by the marking phase. Since there is no need to move living objects, this phase can also be concurrent with the user thread. Since the garbage collector thread can work with the user thread during both the concurrent marking and concurrent cleanup phases, which are the longest in the process, the CMS collector’s memory reclamation process is generally performed concurrently with the user thread.

Advantages: Concurrent collection, low pauses

Disadvantages: 1. Very sensitive to processor resources

2. Unable to handle “Floating Garbage”

3. Space debris

Garbage First

The Garbage First (G1 for short) collector is a milestone in the history of Garbage collector technology, which pioneered the idea of collector design oriented to local collection and region-based memory layout. G1 is a garbage collector primarily for server applications.

All other collectors prior to the G1 collector, including the CMS, targeted garbage collection at either the entire Minor, Major, or Full Java heap. G1, however, breaks out of this trap. It can collect any part of the heap memory to form a Collection Set (generally referred to as CSet). The measurement standard is no longer the generation to which it belongs, but the memory that stores the most garbage and has the largest Collection benefit. The region-based heap memory layout pioneered by G1 is key to its ability to achieve this goal. While G1 is still designed to follow the generational collection theory, its heap memory layout is very different from other collectors: G1 no longer insists on a fixed size and a fixed number of generational regions, but divides the continuous Java heap into multiple independent regions of equal size. Each Region can act as the Eden space of the new generation, Survivor space, or old chronospace as required. The collector can apply different strategies to regions that play different roles, so that both newly created objects and old objects that have survived multiple collections for a while can be collected well.

Regions also have a special class of Humongous regions that are used to store large objects. G1 considers large objects as long as the size of a Region exceeds half of its capacity. The size of each Region can be set using -xx: G1HeapRegionSize. The value ranges from 1MB to 32MB and is the NTH power of 2. However, super objects that exceed the capacity of the entire Region are stored in N consecutive Humongous regions. Most of G1 behaviors regard Humongous Regions as part of the old period, as shown in Figure 3-12.

While G1 still retains the concept of Cenozoic and oldene, Cenozoic and oldene are no longer fixed, but rather dynamic collections of a series of regions that do not need to be continuous. The G1 collector is able to model predictable pause times because it uses regions as the smallest unit of a single collection, that is, each collection of memory is an integer multiple of Region size, so that it can systematically avoid all-region garbage collection across the entire Java heap. A more specific approach is to have the G1 collector track the “value” of garbage accumulation in each Region, which is the amount of space collected and the experience of how long it takes to collect, and then maintain a list of priorities in the background, each time according to the user set the allowable collection pause time (using the parameter -xx: MaxGCPauseMillis specifies, the default is 200 milliseconds), which prioritises regions that benefit most from collecting value, hence the name “Garbage First”. This use of regions and prioritized Region collection ensures that the G1 collector achieves the highest possible collection efficiency in a limited amount of time.

Garbage Disposal Summary

At present, the new generation of old age garbage collector combination:

Source: www.tuicool.com/articles/7j…