JVM garbage collector

Serial collector

The Serial collector is the most basic and oldest collector, and was the only choice for garbage collection for the next generation of virtual machines until JDK1.3.1. The collector is single-threaded. Its single-threaded significance does not merely mean that it uses only one CPU or one collection thread to complete the collection, but most importantly, it does garbage collection while other worker threads pause until the collection is complete. This work is automatically initiated and performed by the virtual machine in the background, shutting down all worker threads without the user’s visibility, which is unacceptable for many applications. We can imagine that our computer runs for an hour and then stops for five minutes. For this kind of design, virtual machine designers are also very aggrieved, because it is impossible to collect, but also constantly generate garbage objects, so it is not finished cleaning. So from 1.3 until now, the virtual machine development team has been working hard to reduce thread pauses due to garbage collection, and the virtual machines are getting better and better, but they are still not completely eliminated.

Serial is the default garbage collector for the new generation of virtual machines in Client mode. It has advantages over other garbage collectors, such as the fact that focusing on garbage collection naturally yields the highest thread utilization due to the lack of overhead of switching between threads. In the context of user desktop applications, the memory allocated to virtual machines is generally not too large, and the pause time can be controlled between tens of milliseconds and one hundred milliseconds for the collection of new generation objects of tens of megabytes or one to two hundred megabytes. This is acceptable as long as it does not happen frequently. Therefore, Serial collector in Client mode is still a good choice for the new generation.

ParNew collector

The ParNew collector is a multithreaded version of the Serial collector. In addition to using multithreading for garbage collection, the rest of the controllable parameters, collection algorithms, stop working threads, object allocation principles, collection strategies and so on are completely consistent with the Serial collector.

There isn’t much innovation other than multi-threaded garbage collection, but it is the preferred virtual machine collector for the new generation of Servers. One important reason is that it is the only one that can be used with the CMS, in addition to the Serial collector. In JDK1.5, HotSpot launched a revolutionary collector CMS for strong interactive applications. This collector is HotSpot’s first truly concurrent collector. For the first time, it makes it possible for garbage collection to work simultaneously with worker threads, in other words, you can pollute and collect.

However, CMS, as an older collector, does not work with the latest next-generation garbage collector released in 1.4, and instead has to use either Serial or Parnew. The ParNew collector can be forcibly specified using -xx :+UseParNewGC, or the default Cenozoic collector with the -xx :+UseConcMarkSweepGC option.

The ParNew collector is by no means better than the Serial collector in a single CPU environment, or even better than the thread interaction overhead, which is not guaranteed to exceed the Serial collector 100 percent in both CPU environments implemented through hyperthreading technology. Of course, as the number of cpus increases, it is beneficial for efficient resource utilization of the system during GC. In the case of very large cpus, you can limit the number of garbage collection threads by using -xx :ParallelGCThreads.

Parallel avenge

The Parallel Scavenge collector is a new generation collector that uses a replication algorithm and is a Parallel multithreaded garbage collector. The focus of a collector such as CMS is to reduce the amount of time a user’s thread is shut down during garbage collection, whereas the Parallel Scavenge collector is to achieve a controlled throughput, which is the amount of time the CPU runs a user’s thread as a percentage of the total CPU time. Throughput = (user thread working time)/(user thread working time + garbage collection time). For example, if the virtual machine runs for 100 minutes and garbage collection takes 1 minute, the throughput is 99%. The shorter the pause time is, the more suitable for the program that interacts with the user. The good response speed can improve the user experience, but the high throughput can efficiently use the CPU time to complete the operation task of the program as soon as possible. It is mainly suitable for the program that does not need too much interaction in the background.

Two parameters control the throughput: -xx :MaxGCPauseMills. Directly set the throughput size: -xx :GCTimeRatio

-XX:+UseAdaptiveSizePolicy

Adaptive strategies are also important to differentiate the Parallel Scavenge collector from the unparNew collector

Serial Old collector

The Serial Old collector is an older version of the Serial collector, which is also a single-threaded collector using a mark-collation algorithm. The main purpose of the collector is to be used in the Client mode. If in Server Mode, there are two applications, one to be used in conjunction with the Parallel Scavenge collector in previous versions of JDK5, and the other as a CMS fallback when Concurrent Mode failures occur in Concurrent collections.

Parallel Old collector

The Parallel Old collector is an older version of the Parallel avenge collector, using multithreading and the mark-collation algorithm. The Parallel avenge collector was only used in the jdk6 insane. The Parallel avenge collector was always in an awkward phase. The Parallel Scavenge avenge would have no choice but to insane the Serial Insane. The Parallel Avenge avenge would not be insane to maximize throughput. Because the single-threaded legacy can’t take full advantage of the multi-CPU processing power of the server, the throughput of this combination is not even as good as the Parallel Insane + CMS insane in large and advanced hardware environments.

It was not until the Parallel Old collector was invented that the Parallel Scavenge collector + Parallel Old collector became a worthy combination of the Outrage-first and CPU-resource-sensitive collector.

CMS collector

The CMS collector is a collector whose goal is to obtain the minimum pause time. As can be seen from the name Concurrent Mark Sweep, the mark-sweep algorithm adopted is divided into four steps:

Only initial marking and re-marking require suspending the user thread.

(1) Initial tag — only associate objects that GC Roots can directly associate with, fast;

(2) Concurrent marking — the process of GC Roots Tracing;

(3) re-marking — in order to correct the mark record of the part of the object whose mark is changed due to the operation of the user program during concurrent marking;

(4) Concurrent clearing

In general, the CMS’s memory reclamation process is executed concurrently with the user thread, since the collector for concurrent marking and concurrent cleaning, the longest part of the process, can work with the user thread

Three major disadvantages of the CMS collector:

(1) THE CMS collector is very sensitive to CPU resources

(2) Floating garbage cannot be processed

(3) based on tag removal algorithm because it is over, so there will be a lot of debris – XX: + UseCMSCompactAtFullCollection

G1 collector

First and foremost, the G1 is designed for simple, actionable performance tuning

-XX:+UseG1GC -Xmx32g -XX:MaxGCPauseMillis=200

-xx :+UseG1GC to start G1 garbage collector, -XMx32g design heap memory of the maximum memory of 32G, -xx :MaxGCPauseMillis=200 set the maximum pause time of GC to 200ms. If we need to tune, we only need to change the maximum pause time for a given memory size.

(1) Memory allocation

(2)Young garbage recycling

(3)Mix garbage collection

Common setting parameters:

JVM garbage collection algorithm

Garbage collection algorithms include mark-clean algorithm, copy algorithm, marked truth algorithm and generational collection algorithm, which are described in detail below.

Mark-clear algorithm:

The most basic collection algorithm is the mark-sweep algorithm, which, as its name suggests, is divided into two phases: tag and sweep. The first step is to mark the objects to be reclaimed, and all marked objects will be reclaimed after the completion of the marking. This is the most basic garbage collection algorithm because other algorithms are based on this idea and have improved upon it.

There are two main problems:

The first problem is efficiency, marking and cleaning are not efficient.

The second is the problem of space allocation. After the mark is cleared, a large amount of discontinuous memory space will be generated. Too much space fragmentation may cause that the program cannot find enough memory space when it needs to allocate space for larger objects in the running process, and it has to carry out a garbage collection operation in advance. As shown in the figure, a large number of garbage fragments will be generated, resulting in low utilization of space.

Replication algorithm

In order to solve the problem of efficiency, a kind of collection algorithms called copy appeared, and its available memory can be divided into two pieces of equal size, using only one piece, every time when it ran out of a memory area, will still live objects are copied to another piece of memory, and then put the used space at once, so every time is half a regional recycle, Memory allocation does not have to consider fragmentation and other problems, as long as the heap top pointer is moved, in order to allocate memory, simple implementation, efficient operation.

But this approach to reduce the original memory in half, the cost is too high.

Today’s commercial virtual machines all use this method to recover the new generation. IBM’s special research shows that 98% of the objects in the new generation are “live and die”. Therefore, it is not necessary to divide the memory area according to 1:1, but to divide the memory into a larger area for Eden and two smaller areas for Survivor. The surviving objects in Eden and Survivor zone are copied to another Survivor zone once, and Eden and Survivor zone are cleaned up once. The default Hotspot area has an 8:1 Eden to Survivor ratio, which means 90% of the available memory for the new generation and only 10% of the memory is classified as reserved memory. Of course, it’s 98% in most cases, but we can’t guarantee that less than 10% of the surviving objects are collected every time, and when Survivor zones are insufficient, we need to rely on other zones for allocation guarantees. If another Survivor zone becomes insufficient, the object can be moved directly to the old age via a memory guarantee mechanism.

Tag sorting algorithm

In proportion to the live object replication algorithm under the condition of higher for more copy operation, efficiency will be lower, more to the point, if you don’t want to waste of 50% of the area, you will need to allocate extra space guarantee, 100% in response to a memory object to survive extreme situation, so the old s generally do not choose this kind of algorithm.

According to the characteristics of the old days, someone proposed another mark-tidy algorithm, the marking process is the same as the mark-clean algorithm, but the next step is not to clean the recyclable objects directly, but to move all the living objects to one end, and then directly clean up the objects outside the boundary. The schematic diagram is as follows:

Generational collection algorithm

The current commercial garbage collector is the use of generational garbage collection, this algorithm has no new ideas, just according to the life cycle of the object will be divided into several pieces of memory, generally divided into the Java heap new generation and old age, so that you can choose the most appropriate recycling algorithm according to the characteristics of each generation of objects. In the new generation, a large number of objects die and a small number survive each garbage collection, making replication algorithms suitable. Garbage collection can be done with a small object copy cost, whereas in the old days, because of the high survival rate, there is no guarantee of other memory allocation, and it must be collected using mark-clean or mark-tidy.

  1. The generation is divided into the young generation and the old generation. The young generation is divided into Eden area and Survivor area. Usually, the default ratio is 8:1:1.

  2. After each garbage collection, the age of the surviving object is +1. When the surviving object has survived 15 times, we let it go straight to the old age.

  3. Another way to enter the old age is memory guarantee, which means that when the new generation runs out of space, the object directly enters the old age;

  4. The new generation of garbage collection is called Minor GC and the older generation is called Full GC.

Reference article:

What are the JVM garbage collectors? The depth of the interpretation of the

What are the garbage collection algorithms? Detailed introduction