JVM garbage collection algorithm and garbage collector.

First, the area of garbage collection

  • Stack: The life cycle in the stack follows the thread, so it is generally not a concern.
  • Heap: Objects in the heap are the focus of garbage collection.
  • Method area: This area also occurs in garbage collection, but it is less efficient and is not usually the focus of our attention.

How to judge the survival of an object

There are generally two approaches (reference counting and reachability analysis), and the JVM uses reachability analysis.

1. Reference counting

Adds a reference counter to an object, incrementing it by 1 when a reference is added to the object, and subtracting it when the reference is invalidated. Objects with a reference count of 0 can be reclaimed (used by Python, but not by mainstream virtual machines).

  • Advantages: Fast, convenient, simple implementation.
  • Defect: When objects are referenced to each other (A.inistance =B and B.inistance =A), it is difficult to determine whether the objects should be reclaimed.

2. Accessibility analysis

To determine whether an object is alive. The basic idea of this algorithm is to search down from a series of objects called “GC Roots” as the starting point. The path taken by the search is called the Reference Chain. When an object is not connected by any Reference Chain to GC Roots, it is proved that the object is unavailable.

Objects that serve as GC Roots include the following:

  • The object referenced in the local variable table in the current virtual machine stack
  • An object referenced by a class static property in the method area
  • Objects referenced by constants in the method area

3. finalize

Java provides finalize() method. When the garbage collector is ready to free memory, it will call Finalize () first. It can save the object (not be collected), but it cannot guarantee that the object will not be collected.

3. Various references

The data stored in Reference represents the start address of another block of memory.

1. Strong reference

The general Object obj = new Object() is a strong reference.

If there is a strong reference to GCroots, the garbage collector will never reclaim it. It would rather throw an OOM error when there is insufficient memory and the program stops abnormally than reclaim a strong reference object.

2. Soft references

The SoftReference garbage collector does not reclaim it when memory is sufficient, and it does when memory is insufficient.

Example code:

public static void main(String[] args) {
    String str = new String("SunnyBear"); / / strong reference
    SoftReference<String> strSoft = new SoftReference<String>(str);
    str = null; // Eliminate strong references and make sure there are only strSoft soft references
    System.out.println(strSoft.get()); // SunnyBear
    System.gc(); Do not use this command online. This command is used as an example only
    System.out.println("------------ gc after");
    System.out.println(str); // null
    System.out.println(strSoft.get()); // SunnyBear
}
Copy the code

So soft references are often used to implement some kind of memory sensitive cache, where objects remain unreclaimed as long as there is enough memory

3. WeakReference

When the garbage collector detects the object, it reclaims the memory of the object, regardless of whether the memory is sufficient or not

Example code:

public static void main(String[] args) {
    String str = new String("SunnyBear"); / / strong reference
    WeakReference<String> strWeak = new WeakReference<String>(str);
    str = null; // Eliminate strong references and make sure there are only strSoft soft references
    System.out.println(strWeak.get()); // SunnyBear
    System.gc(); Do not use this command online. This command is used as an example only
    System.out.println("------------ gc after"); // null
    System.out.println(str); // null
    System.out.println(strWeak.get()); // null
}
Copy the code

Practical applications, such as WeakHashMap and ThreadLocal.

4. The virtual reference is PhantomReference

Ghost references, the weakest, receive a notification when they are garbage collected. If an object has only a virtual reference, then it can be collected at any time, just as if it had no reference at all.

Virtual references are primarily used to track the activity of an object being collected by the garbage collector.

Fourth, the GC

1. Minor GC

  • Characteristics: Occurs in the Cenozoic era, occurs frequently, the execution speed is fast.
  • Trigger condition: Insufficient space in Eden zone/guaranteed space allocation.

2. Full GC

  • Characteristics: Mainly occurs in the old age (the new generation will also be recycled), rarely occurs, the execution speed is slow.
  • Trigger condition:
    • Call System. The gc ().
    • Lack of space in the old area.
    • Space allocation guarantee failed.
    • The permanent generation (method area) of JDK 1.7 and earlier is out of space.

Fifth, garbage collection algorithm

1. Copying algorithms

Divide the available memory into two equal sized chunks by capacity and use only one chunk at a time. When the memory of this piece is used up, the objects that are still alive are copied to another piece, and then the used memory space is cleaned up once again. This makes every time is to the whole half area of memory reclamation, memory allocation will not have to consider the complex situation such as memory fragmentation, as long as the sequence of memory can be allocated, simple implementation, efficient operation. But the cost of this algorithm is to reduce memory size by half.

  • advantages
    • Simple and efficient, with no memory fragmentation.
  • disadvantages
    • The memory usage is low.
    • A larger number of living objects is significantly less efficient because you need to move the actual location of memory for each non-reclaimable data.

Note:

Special research shows that 90% of the objects in the new generation are “born and died”, so generally, 10% of the space occupied by reclamation is sufficient. Therefore, it is not necessary to divide the memory space in a 1:1 ratio, but to divide the memory into a large Eden space and two small Survivor Spaces. Each time Eden and one of them Survivor[1] are used. When reclaimable, the remaining objects in Eden and Survivor are copied to another Survivor space at one time, and Eden and the previously used Survivor space are cleaned up. The default size ratio of The HotSpot virtual machine to Eden and Survivor is 8:1, which means that the available memory space in each generation is 90% (80%+10%) of the entire generation capacity, and only 10% of the memory is “wasted”.

2. Mark-sweep algorithm

First, all the objects that need to be reclaimed are marked, and then the marked objects are reclaimed uniformly.

  • advantages
    • Utilization is 100%.
  • disadvantages
    • Marking and clearing are not efficient (compared to the replication algorithm).
    • A large number of discontinuous memory fragments are generated.

3. Mark-compact algorithm

First, mark all the objects that need to be reclaimed. After the mark is completed, the next step is not to clean up the recyclable objects directly, but to make all the living objects move to one end, and then directly clean up the memory outside the boundary.

  • advantages
    • Utilization is 100%.
    • No memory fragmentation.
  • disadvantages
    • The efficiency of marking and clearing is not high (compared with the replication algorithm and the clear marking algorithm).

6. Garbage collector

The JVM garbage collector uses all three of the above algorithms, using generational collection.

1. The New Generation: Replication algorithms.

The collector Collect objects and algorithms Collector type
Serial The new generation, the replication algorithm Single thread
ParNew The new generation, the replication algorithm Parallel multithreaded collector
Parallel Scavenge The new generation, the replication algorithm Parallel multithreaded collector

2, the old age: mark clearing algorithm and mark sorting algorithm

The collector Collect objects and algorithms Collector type
Serial Old The old era, tag sorting algorithm Single thread
Parallel Old The old era, tag sorting algorithm Parallel multithreaded collector
CMS (Conc Mark Sweep) In the old era, the marker removal algorithm Parallel and concurrent collectors
G1 (Garbage First) Across the new generation and the old generation, copy algorithm + tag sorting algorithm Parallel and concurrent collectors

Note:

  • Parallelization: Simultaneous garbage collection of multiple threads.
  • Concurrency: Multithreading of garbage collection and user application at the same time.
  • usejps -vYou can see the garbage collector used, for example:-XX:+UseConcMarkSweepGC(CMS)

1. A compatible garbage collector

The line represents a garbage collector that can work with the new generation and the old generation.

2. Serial/Serial Old

Oldest, single threaded, exclusive, mature, suitable for single CPU servers. -xx :+UseSerialGC Both the new generation and the old generation use serial collectors.

3. ParNew

ParNew and Serial are basically the same, the only difference is: multi-threaded, multi-CPU, less pause time than Serial.

-xx :+UseParNewGC ParNew for the new generation, Serial Old for the Old.

Work with a CMS.

Be insane/insane

Throughput focused garbage collector, high throughput can efficiently use CPU time to complete the program’s computing tasks as soon as possible, mainly suitable for background computing without much interaction. Throughput is the ratio of the CPU time spent running user code to the total CPU consumption, which is throughput = user code time/(User code time + garbage collection time). The virtual machine runs for 100 minutes, and garbage collection takes 1 minute, which is 99% throughput.

5. CMS (Concurrent Mark Sweep)

A collector is one whose goal is to obtain the shortest collection pause time. At present, a large part of Java applications are concentrated on the Internet site or the server of B/S system. This kind of applications pay special attention to the response speed of service, hoping that the pause time of the system is the shortest, so as to bring better experience to users.

The CMS collector is well suited for this type of application. -xx :+UseConcMarkSweepGC. The newer generation uses ParNew, while the older generation uses CMS. As you can see from the name (including “MarkSweep”), the CMS collector is based on the “MarkSweep” algorithm. It is a bit more complex than the previous collectors.

The recycling process

The whole process is divided into four steps, including:

1. Initial marking: Just marking the objects that GC Roots can directly relate to. This is very fast and requires stW-stop the world.

2. Concurrency markup: Reachability analysis of objects in the heap starting with GC Root to find the surviving objects, which takes the longest time throughout the collection process and does not require a pause.

3. Re-marking: Pauses (STW) are required to correct the part of the object’s marking record where the marking changes as the user program continues to operate during concurrent marking. The pause time for this phase is typically slightly longer than for the initial markup phase, but much shorter than for concurrent markup.

  1. Concurrent cleanup: No pauses required.

The advantages and disadvantages

1, the advantages of

Since the longest concurrent marking and concurrent cleanup process threads in the entire process can both work with the user thread, the MEMORY reclamation process for the CMS collector is performed concurrently with the user thread as a whole

2 and disadvantages

  • CPU resource sensitivity: Because multiple threads occupy CPU resources during the concurrent phase, if CPU resources are insufficient, efficiency will be significantly reduced.
  • Because the user thread is still running during the CONCURRENT cleanup phase of the CMS, new garbage is naturally generated as the program is running. This part of garbage appears after the marking process, and the CMS cannot dispose of it in this collection, so it has to be cleaned up in the next GC. This part of the garbage is called floating garbage.
  • The existence of floating garbage means that some memory needs to be set aside, meaning that CMS collections cannot wait for the old age to run out as other collectors do. In version 1.6, at the Old space usage threshold (92%), if the reserved memory is not enough to store floating garbage, Concurrent Mode Failure will occur, in which case the VM will temporarily enable Serial Old instead of CMS.
  • Memory fragmentation occurs: the mark-clean algorithm results in discontinuous memory fragmentation.

6. G1

G1 compared to CMS

  • Based on the mark-collation algorithm, space fragmentation is not generated. When allocating large objects, continuous space will not be obtained and a full GC is triggered in advance.
  • Controllable Pause time: G1 can control the garbage collection time by setting an expected Pause time, but this expected Pause time is only possible, not necessarily possible.

A predictable pause:

The G1 collector is able to model predictable pause times because it can systematically avoid region-wide garbage collection across the entire Java heap. G1 keeps track of the value of Garbage accumulation in each Region (the experience value of how much space is collected and how long it takes to collect), and maintains a priority list in the background. The Region with the largest value is First collected based on the allowed collection time (hence the name garbage-first). In this way, the memory space is divided by regions and the collection mode has priority, which ensures that the G1 collector can obtain as high collection efficiency as possible in a limited time.

Setting parameters for G1

  • -xx :+UseG1GC // Enable G1
  • -xx :MaxGCPauseMillis=200 // Expected pause time 200 milliseconds, default is 200
  • -xx :G1HeapRegionSize=2 // Set the size of each region to 2M. The size must be a power of 2. The range can range from 1Mb to 32Mb
  • -xx :G1NewSizePercent // The minimum value of the new generation. The default value is 5%
  • -xx :G1MaxNewSizePercent // The default value is 60%
  • -xx :ParallelGCThreads // number of ParallelGCThreads during STW
  • -xx :ConcGCThreads=n // Concurrency markup phase, the number of threads executing in parallel

How does G1 scrape heap memory

G1 divides the heap into independent regions of equal size, so that the new generation and the old are no longer physically isolated.

The G1 algorithm divides the heap into several independent regions, which are still part of the generational collector. However, some of these areas contain the new generation, which still suspends all application threads and copies the surviving objects to the old or Survivor space. For example, one of the independent areas is shown in the following figure:

The pattern of GC

1, Young GC

The Young GC mainly GC the Eden zone and is triggered when the Eden space is exhausted. In this case, the data in the Eden space is moved to the Survivor space, and if the Survivor space is insufficient, part of the data in the Eden space is directly promoted to the old space. Data from the Survivor zone is moved to the new Survivor zone, and some data is promoted to the old age space. Finally, the Data in the Eden space is empty, the GC stops working, and the application thread continues executing

2, Mixed GC

Select all the regions in the new generation, and collect several old regions with high income according to global concurrent marking statistics. Within the specified cost range, select an older Region with higher benefits as much as possible. Mixed GC is not full GC and can only reclaim some older regions. If the Mixed GC simply cannot keep up with the rate at which the program allocates memory, causing the Mixed GC to fill up and make it impossible to continue the Mixed GC, the serial Old GC (Full GC) is used to collect the entire GC heap. Therefore, we can know that G1 does not provide full GC

Collection process

There are roughly four steps:

1. Initial mark: Simply marking the objects that GC Roots can directly correlate with and changing the value of TAMS (Nest Top Mark Start) allows the next phase of user programs running concurrently to create objects in the correct Region, which requires a short STW

2. Concurrency marking: Starting from GC Root, the reachability analysis is performed on the objects in the heap to find the living objects. This phase takes a long time, but it can be executed concurrently with the user program

3. Final Mark: To correct the portion of the token record that changed during the concurrent token period as the user program continued to operate, the virtual machine records object changes during this period in the thread’s Remembered Set Logs. The final markup phase requires that the data for Remembered Set Logs be merged into the Remembered Set. This phase requires a pause thread (STW), but can be executed in parallel

4. Filtering collection: First, the collection value and cost in each Region are sorted, and the collection plan is made according to the GC pause time expected by users. In this phase, it is possible to execute concurrently with the user program. However, because only part of the Region is reclaimed, the time is user-controlled, and pausing the user thread greatly improves the collection efficiency

Some important parameters of the garbage collector

parameter describe
UseSerialGC The vm runs the default value in Client mode. After this switch is enabled, the Serial+Serial Old collector combination is used to reclaim memory
UseParNewGC When this switch is turned on, memory is reclaimed using the collector combination of ParNew + Serial Old
UseConcMarkSweepGC When this switch is turned on, memory is reclaimed using the collector combination of ParNew + CMS + Serial Old. The Serial Old collector will be used as a backup collector in the event of a Concurrent Mode Failure to the CMS collector
UseParallelGC The vm runs the default value in Server mode. When this switch is turned on, reclaim memory using the collector combination of Parallel Scavenge + Serial Old(PS MarkSweep)
UseParallelOldGC Apply the collector combination of Parallel Scavenge and Parallel Old to reclaim memory
SurvivorRatio In the new generation, the capacity ratio of Eden and Survivor is 8 by default, indicating that Eden: Survivor = 8:1
PretenureSizeThreshold If this parameter is set, objects larger than this parameter will be allocated directly in the old age
MaxTenuringThreshold The age of objects promoted to the old age increases by 1 after each object has survived a Minor GC, and enters the old age when this parameter value is exceeded
UseAdaptiveSizePolicy Dynamically resize areas of the Java heap and age into the old age
HandlePromotionFailure Whether an allocation guarantee is allowed to fail, where the remaining space in the old age is insufficient for the extreme case where all objects survive in the entire Eden and Survivor sections of the new generation
ParallelGCThreads Sets the number of threads to reclaim memory during parallel GC
GCTimeRatio GC The ratio of time to the total time, which defaults to 99, allows 1% GC time. Applies only when using the Parallel Scexploiture
MaxGCPauseMillis Sets the maximum GC pause time. Applies only to the Parallel Scavenge collector
CMSInitiatingOccupancyFraction Sets how much of the old space is used by the CMS collector to trigger garbage collection. The default is 68% and only takes effect when the CMS collector is used
UseCMSCompactAtFullCollection Setting whether the CMS collector performs a memory defragmentation after garbage collection takes effect only when the CMS collector is used
CMSFullGCsBeforeCompaction Setting the CMS collector to start defragmentation once after a number of garbage collections takes effect only when the CMS collector is used

Now that you’ve read this, like, comment, follow, or collect it!

Author: IT wang2 xiao3 er4 starting address: www.itwxe.com/posts/a4932… Copyright notice: The content of the article is subject to the authorship-non-commercial-no deduction 4.0 international license. If you want to reprint, please link to the author in a prominent position on the article page.