preface

Garbage collection algorithm is the methodology of memory collection; The garbage collector is a concrete implementation of memory reclamation;

Automatic memory management allocates memory to objects and reclaims memory occupied by obsolete objects.

Scope of garbage collection: Java heap and method area.

One, object live algorithm

1) Reference counting algorithm (mainstream virtual machines do not adopt) : add a reference counter to the object, every time a place refers to it, the counter increases by 1; When a reference is invalid, the counter value is reduced by 1; Any time an object’s reference counter reaches zero, it becomes obsolete. It can be recycled.

Advantages: simple implementation, high judgment efficiency;

Disadvantages: It is difficult to solve the problem of circular references between objects.

2) Reachability analysis method: The idea of the algorithm is to search down from a series of objects called “GC Roots” as the starting point, and the search path is called the reference chain. When there is no reference chain between an object and “GC Roots”, it indicates that the object is unavailable.

Possible “GC Roots” include the following:

1. Objects referenced in the vm stack;

2. The object referenced by the class static attribute in the method area;

3. Objects referenced by constants in the method area;

4. Objects referenced by JNI(general Native methods) in the native method stack.

3) To determine to be or not to be

Even objects that are unreachable in reachability analysis are not immediately recoverable. This is because it takes at least two “marking” processes to be recycled as discarded objects.

When an object is first discovered and has no reference chain between a set of Gc Roots nodes, it will be tagged once and filtered once based on whether it is necessary to execute the Finalize () method. When the object does not overwrite the Finalize () method, or the Finalize () method has already been called by the virtual machine. Virtual machines are considered “unnecessary”.

If the object is judged to be necessary to execute finalize(), the object will be prevented from being in an F-queue and will be executed later by a low-priority thread. The Finalize () method is the last chance for objects to escape garbage collection. F-queue is marked a second time later. If no relationship has been established with any object in the meantime, it will basically be reclaimed after the second markup.

Second, garbage collection algorithm

1) Mark-clear algorithm

There are two stages in the algorithm. The first stage is to mark all unreachable “garbage” according to the reachable analysis algorithm. The second stage is to directly release the memory space occupied by these objects.

Advantages: it is the basic idea of garbage collection algorithm, and the subsequent collection algorithm is improved based on this algorithm.

Disadvantages:

  • Efficiency problem: Marking and cleaning are inefficient, requiring two passes through the heap.
  • Space issues: After the tag is cleared, there is a lot of memory fragmentation. Once large objects need to be allocated, garbage collection often has to be triggered again.

2) Replication algorithm

The memory is divided into two equal chunks, and only one of them is used at a time. When the system initiates GC collection, all surviving objects in the current block are copied to the other block, and the memory space occupied by the current block is released as a whole.

Modern commercial virtual machines: are objects that use this collection algorithm to reclaim new generation areas. Typically, memory is divided into a large Eden and two small Survivor Spaces. The default ratio is 8:1:1.

This increases memory usage from 50 percent to 90 percent, but even this does not guarantee that no more than 10 percent of the objects collected survive each time. At this time, it is up to other memory (mainly the old) to guarantee memory allocation.

3) Label sorting algorithm

Because the replication algorithm must carry out more replication operations when the object survival rate is high, the efficiency will become very low. More importantly, if we do not want to waste 50% of the memory, we need to have extra space for memory allocation guarantee to cope with the extreme case of more than 90&object survival. So in the old days you couldn’t do it directly.

According to the characteristics of the old era, the proposed mark-sorting algorithm is actually an enhanced version of the mark-clearing algorithm. It just doesn’t clean up the recyclable objects directly. Live objects are moved to the same end, and memory beyond the boundary is reclaimed.

4) Generational collection algorithm

The current mainstream JVM virtual machine, all use “generation collection” algorithm, this algorithm is not a new idea, just according to the characteristics of the object life cycle, the Java heap into several pieces. New generation and old age. The “new generation” is generally the object of a short period of time is a large number of abandoned, using the copy algorithm is more efficient. In the “old days”, when the average object has a high survival rate and there is no extra spatial memory to guarantee memory allocation, the “mark-clean” or “mark-clean” algorithm must be used.

Garbage collector

The figure shows garbage collectors acting on different generations, which can be used together if there is a connection between the two collectors. The region in which the virtual machine is located indicates whether it belongs to the new generation collector or the old generation collector

“Stop the World” can happen in any GC algorithm. “Stop the World” means that the JVM has stopped the execution of the application for GC.

  • Serial collector: single-threaded, next-generation collector that uses a replication algorithm. It uses only one CPU or one collection thread to complete garbage collection, and must “Stop the World” while garbage collection is going on, pausing all of its worker threads until it finishes collecting.

  • ParNew collector: A multithreaded version of the Serial collector with exactly the same control parameters, collection algorithm, Stop the World, object allocation rules, and reclaim policy

  • The Parallel Scavenge collector, a generation collector that uses replication algorithms and Parallel threading.

  • Serial Old collector: An older version of the Serial collector, single-threaded, uses a mark-collation algorithm.

  • The Parallel Old Collector: an older version of the Parallel Avenge collector, multi-threaded, using the mark-collation algorithm

  • CMS collector: A collector whose goal is to obtain the shortest collection pause time, based on a mark-sweep algorithm. The operation process is divided into four steps: initial marking, concurrent marking, re-marking, and concurrent cleaning.

    The initial marking and re-marking steps still require “Stop the World”. The initial tag simply marks objects that GC Roots can be directly associated with, which is fast. Concurrent mark phase is the GC Roots Tracing process, and to mark phase is to correct for user program continue to operate during concurrent tags lead to produce changes in that part of the object’s marking record, this one phase of the pause time usually slightly longer than the initial marking phase, but is far less than the period concurrent tags.

    The collector thread can work with the user thread on both the concurrent marking and concurrent cleaning processes that take the most time, so the CMS collector’s memory reclamation process is performed concurrently with the user thread.

    Advantages: Concurrent collection, low pauses

    Disadvantages: Very CPU resource sensitive, unable to handle floating garbage, based on tag scavenging algorithm, large number of control fragments generated at the end of collection

  • G1 collector: The G1 collector is one of the most advanced collector technologies today, a garbage collector for server-side applications.

    Features: parallelism and concurrency, generational phones, spatial integration, predictable pauses

    The operation process is as follows: initial tag, concurrent tag, final tag, filter recycle.

    Initial marking phase: just mark the objects that GC Roots can directly associate with, and modify the TAMS value so that the next phase of the user program runs concurrently to create new objects in the correct Region available. This phase requires the thread to be paused, but it takes a short time.

    Concurrent marking stage: The reachability analysis of the heap objects is performed from GC Roots to find the surviving objects. This stage is time-consuming but can be executed concurrently with the user program.

    The virtual machine will record object changes in the thread Remembered Set Logs during the concurrent marking period. The virtual machine will record object changes during this period in the thread Remembered Set Logs. The final marking phase requires the consolidation of data from the Remembered Set Logs into the Remembered Set. This phase requires the thread to be paused, but can be performed in parallel.

    Finally, in the recovery screening stage: firstly, the recovery value and cost of each Region are sorted. The collection plan is based on the expected GC pause times of users.

4. Memory allocation and reclamation strategy

Objects are allocated in Eden first: In most cases, objects are allocated in Eden of the new generation. When the Eden area does not have enough space to allocate, the virtual machine will initiate a Minor GC.

Large objects go straight to the old days: Large objects are Java objects that require a large number of contiguous memory controls, most typically long strings and arrays.

Long-lived objects will be old: Virtual machines use generational collection to manage memory, so memory reclamation must be able to identify which objects should be placed in the new generation and which objects should be placed in the old generation. To do this, the virtual machine defines an object age counter for each object. If an object survives after Eden’s birth and the first Minor GC and can be accommodated by a Survivor, it is moved to a Survivor space, and the object’s age is set to 1. Each time the object “survives” a Minor GC in a Survivor area, the object’s age increases by one year. When it reaches a certain age (15 by default), it will be promoted to the old age.

Dynamic object age determination: The virtual machine does not always require that the age of the object must reach MaxTenuringThreshold to be promoted to the old age. If the sum of the size of all objects of the same age in the Survivor space is greater than half of the size in the Survivor space, the object whose age is greater than or equal to this age can directly enter the old age. There is no need to wait until the age required in MaxTenuringThreshold.

Space allocation guarantee: Before Minor GC occurs, the virtual machine checks to see if the maximum available contiguous space of the old generation is greater than the total space of all objects of the new generation. If this condition is true, then Minor GC is guaranteed to be safe. If this is not true, the virtual machine checks the HandlePromotionFailure setting to see if the guarantee failure is allowed.

Five, the summary

Memory recycling and garbage collector in many cases is one of the main factors influencing the performance of the system, the concurrent ability, the virtual machine to provide different types of collector and to provide a large number of adjustable parameters, because only based on the need of practical application, the implementation way to select the optimal methods to obtain the highest performance. Without fixed collectors, parameter combinations, and optimal tuning methods, virtual machines have no necessary memory reclamation behavior.