Mind mapping

preface

The most obvious feature of Java compared to the C/C++ language is the introduction of automatic garbage collection. Garbage Collection (GC) frees programmers from having to worry about JVM memory management and focus on writing programs. It’s hard for programmers to be aware of GC, but when it comes to performance tuning, troubleshooting problems on the line, and so on, a deep understanding of GC is essential. It is often possible to improve system performance by setting a few JVM parameters.

First, the JVM memory area

To understand GC in depth, it’s important to understand what data the GC reclaims and where the data resides. Next, let’s look at the memory region of the JVM.

As can be seen from the figure, the memory area is divided into five parts:

  • Virtual machine stack: Thread private, consisting of stack frames, each stack frame corresponds to a called method, holding information such as local variables of the method. The stack frame goes on when a method is called and goes off when a method ends the call. The timing of loading and unloading is clear, so GC is not required.
  • Native Method stack: Very similar to the virtual machine stack, the difference is that the virtual machine stack executes Java methods, while the Native Method stack executes Native methods. This area also does not require GC.
  • Program counter: Thread-private, which acts as a line number indicator of the bytecode executed by the current thread. We know that JVM multithreading is implemented through a CPU time slice rotation algorithm (that is, threads take turns switching and allocating processor execution time). That is, one thread might be suspended while the other thread gets a time slice and starts executing. In the JVM, a program counter is used to track the byte code execution position of a thread, and when the suspended thread retrieves the time slice, it knows where it was last suspended. This area also does not require GC.
  • Method area: before Java8, there was the concept of permanent generation, implemented in the heap, managed by GC, mainly storing class information, constants, static variables, because of the permanent generation has -xx :MaxPermSize upper limit, so it is easy to create OOM. After Java8, the persistent generation was removed, and then the implementation of the method area was moved into the meta-space in local memory, so that the method area was out of the control of the JVM. Similar in nature to permanent generations, meta-spaces are implementations of method areas in the JVM specification. However, the biggest difference between a meta-space and a permanent generation is that the meta-space is not in the virtual machine, but uses local memory. Therefore, by default, the size of the meta-space is limited only by local memory. So after Java8, GC is not required for method areas either.
  • Heap: The heap is the storage area for Java objects, and any Java object instances and arrays allocated by new fields are allocated on the heap. The GC is mainly used in this region to collect both types of data.

How to determine whether the object is recyclable

The main area of garbage collection (GC) is in the heap. In GC, there are two algorithms, one is reference counting, the number of references to an object is 0 is garbage, and the other is reachability algorithm, if an object is not in the reference chain starting from the GC Root node, it is garbage.

2.1 Reference Counting Algorithm

If an object is referenced, the counter on the object’s head is incremented by one. Each time a reference fails, the counter is reduced by one. If there are no references (the number of references is zero), the object is reclaimed. However, this algorithm is difficult to solve the problem of cyclic reference between objects.

2.2 Accessibility algorithm

The so-called GC Roots is a set of references that must be active. The basic idea is to search from a series of GC Roots all the way down. A line formed by GC Root is called the reference chain.

Which objects can be used as GC Root objects:

  • The object referenced in the virtual machine stack (the local variable table in the stack frame)
  • The object referenced by the class static property in the method area
  • The object referenced by the constant in the method area
  • Objects referenced by JNI (commonly referred to as Native methods) in the Native method stack

Common garbage collection algorithms

You’ve already seen how to determine which objects are recyclable. So what algorithm does the GC use to collect after determining whether it is recyclable? This is to talk about several ways of garbage recycling:

  • Mark clearance
  • Label finishing
  • Replication algorithm
  • Generational collection algorithm

3.1 Mark removal method

Actually very simple, divided into mark and clear two steps. The first step is to mark the object being reclaimed according to the reachability algorithm, and the second step is to reclaim the marked object.

The obvious disadvantage of this garbage collection algorithm is that it is very easy to generate memory fragmentation.

3.2 Label finishing method

The first two steps are the same as the mark clearing algorithm, but the difference is that there is a step of sorting on the basis of the mark clearing algorithm. As shown in the figure, when defragmenting, move all surviving objects to the left and then clean up all areas on the other end so that there is no memory fragmentation.

There is no memory fragmentation, but it is inefficient because of the frequent movement of living objects.

3.3 Replication Algorithm

The memory is divided into two parts, namely area A and area B. The first step is to mark the living objects according to the reachability algorithm, the second step is to copy the living objects to area B, and the third step is to empty all area A. That’s the copy algorithm.

The replication algorithm does not generate memory fragmentation and does not need to move living objects frequently. The disadvantage is that the memory utilization is not sufficient. For example, a 500 MB memory should be divided into two parts and only 250 MB of memory can be utilized.

3.4 Generational collection algorithm

Generational collection algorithm is based on different characteristics of objects, and the use of appropriate algorithms, there is no actual new algorithm generated. The generational collection algorithm is not so much the fourth algorithm as the practical application of the first three algorithms.

First of all, let’s discuss the different characteristics of objects. Objects in memory can be roughly divided into three types according to the length of their life cycle:

  • Aborted objects (New generation) : Objects that live and die, such as local variables in a method.
  • Persistent objects (old age) : Objects that live longer but still die, such as cache objects, singletons, and so on.
  • Persistent objects (persistent generation) : Objects that almost never die after they are generated, such as objects in the String pool (share primitives), loaded class information, and so on.

The region of memory in which the above objects correspond is that aborted objects and persistent objects are in the Java heap, and permanent objects are in the method area.

The principle of generation algorithm is to divide the heap into young generation and old generation according to the different inventory period of the object. The Cenozoic era is divided into Eden region, from Survivor region (S0 region), to Survivor region (S1 region), with a ratio of 8:1:1.

Let’s look at the GC of the young generation. The collection algorithm used by the young generation is the replication algorithm. When the newly created object is created, it is allocated to the Eden region, and when the Eden region becomes full, GC is triggered.

In this step, GC will recover most aborted objects, mark the surviving objects according to the reachability algorithm, copy the surviving objects to S0, and then empty Eden area.

The next time GC is triggered, the live objects in Eden and S0 will be copied to S1, and Eden and S0 will be emptied. The roles of sections S0 and S1 are reversed after each garbage collection. After each GC, age is increased by one if the object survives.

We know that objects that survive longer in the younger generation will get older, and that if the surviving objects reach our threshold, they will be promoted from S0 (or S1) to the older generation. Since the objects of the old age are not often recycled, the algorithm adopted is the label sorting method, the recycling times of the old age are relatively few, and each recycling time is relatively long.

Stop the world

The Stop The World mechanism in Java, referred to as STW, suspends all other threads of The Java application (except The garbage collector) when The garbage collection algorithm is performed, so minimizing STW time is The primary goal of optimizing The JVM.

Common garbage collectors

Garbage collector is actually the specific implementation of the algorithm mentioned above, there is no garbage collector is the best, only according to the characteristics of the application to choose the most appropriate, so the appropriate is the best.

The most common garbage collector, except G1 garbage collector, only works on one region, either in the young generation or in the old generation, so it is generally used in combination. There are seven types of garbage collector.

5.1 Serial Collector

The Serial collector works on a young, single-threaded garbage collector, which means that it only uses one CPU or one thread to collect garbage. While it is collecting garbage, due to the SWT mechanism, other worker threads are suspended until the garbage collection is complete. This garbage collector is suitable for Client mode applications, in a single CPU environment, because there is no overhead and other threads interaction, can concentrate on garbage collection work, can give full play to the advantages of a single thread, simple and efficient. To enable the reclamation mode, run -xx :+UseSerialGC.

5.2 ParNew collector

The ParNew collector is a multithreaded version of the Serial collector that works on the younger generation. By default, the number of collection threads is enabled as well as the number of cpus. The number of runs can be set to ParallelGCThreads by modifying ParallelGCThreads.

5.3 Parallel Scavenge

The Parallel Collector, also known as the through-first collector, is used as a younger, multithreaded garbage collector using a replication algorithm, similar to the ParNew collector. Unlike the ParNew collector, the Parallel Insane collector focuses on throughput and provides two parameters to control throughput, -xx :MaxGCPauseMillis(controls the maximum garbage collection pause time), and -xx :GCTimeRatio(directly sets the throughput size).

If the -xx :+UseAdaptiveSizePolicy parameter is set, the VM will collect monitoring information according to the running status of the system and dynamically adjust the size, Eden,Survivor ratio of the new generation to achieve the maximum garbage collection time or throughput size we set. This adjustment method is called the adaptive adjustment strategy of GC. This is the key difference between the Parallel Scavenge collector and the ParNew collector.

5.4 Serial Old Collector

The Serial Old collector is a single-threaded garbage collector that works in an older era using a mark-collation algorithm. The Collector can be used in Client mode with the Serial collector or, if used in Server mode, with the Parallel Scavenge collector prior to JDK1.5. The CMS collector is a back-up to the CMS garbage collector. Used when a Concurrent Mode Failure occurs.

5.5 Parallel Old Collector

The Parallel Old collector is an older version of the Parallel Avenge collector, a multi-threaded collection using the tag collation algorithm. The Parallel Insane collector and the Parallel Old collector work together.

5.6 CMS Collector

The CMS collector is a collector whose goal is to obtain the shortest collection pause time and adopts a mark-sweep algorithm. This method is applicable to scenarios where the system pause time is short to provide better user experience.

The CMS collector runs in four steps:

  • Initial tag: Marks objects that GC Roots can directly associate with. There is Stop The World.
  • Concurrent markup: GC Roots Tracing, which can be executed concurrently with user threads.
  • Re-marking: The re-determination of The survival of objects generated during marking, The correction of The marking of these objects, The execution time is shorter than that of concurrent marking, and there is Stop The World.
  • Concurrent cleanup: Clears objects that can be executed concurrently with user threads.

Disadvantages of the CMS collector are:

  • Sensitive to CPU resources.
  • Unable to handle floating garbage. A “Concurrent Mode Failure” can occur, resulting in another Full GC. Since the user thread is still running during the Concurrent cleanup, the garbage is cleaned up while new garbage is created, and this garbage (floating garbage) can only be cleaned up during the next GC.
  • It uses the tag clearing algorithm, so it will generate memory fragmentation. Memory fragmentation causes large objects to fail to allocate contiguous memory space, which can then lead to Full GC, affecting application performance.

5.7 G1 Collector

The G1 garbage collector is primarily a service-oriented garbage collector that can be used by both young and old generations. In operation, the whole adopts the tag collation algorithm, and the part adopts the copy algorithm. Both algorithms do not generate memory fragmentation, so the collector can generate continuous memory space after recycling.

It is designed specifically for the following scenarios:

  • Like the CMS collector, can be executed concurrently with application threads.
  • Decluttering free space is faster.
  • GC pause times are needed to be more predictable.
  • You don’t want to sacrifice a lot of throughput performance.
  • A larger Java Heap is not required.

G1 garbage collector’s memory partition no longer uses the traditional memory partition, the new generation, the old physical space partition is removed.

Instead, the heap is divided into regions and only a few regions are collected at a time to control the STW generated by garbage collection. The biggest difference between G1 garbage collector and traditional garbage collector is that it weakens the concept of generation and introduces the idea of partitioning.

G1 uses discrete regions of the same size instead of contiguous storage addresses for each generation. In addition, G1 has an additional H, which stands for Humongous. The H is used to store Humongous objects. If the object size is greater than or equal to half of region, the object is directly allocated to the old region, preventing repeated copying and moving.

G1 garbage collection can be divided into four steps:

  • Initial tag. Collecting all GC roots (object origin pointer, root reference), STW, is done in the young generation.
  • Concurrent tokens. Marks the survivable object.
  • Final tag. It’s the last tagging phase, STW, which is very short and completes all the tagging.
  • Filter recycling. Reclaim regions with no viable objects and add them to the available Region queue.

conclusion

This article has summarized the JVM garbage collection theory knowledge, the idea is to understand the role of GC area is in the heap, then introduce accessibility algorithm is used to mark live objects, know what is recycled object, followed by the use of recycling garbage collection algorithm, and then introduces the common several garbage collection algorithms (tag removal, replication algorithm, Tag collation), and finally several common garbage collectors.

For the introduction of garbage collector, this is only a simple description, not in-depth explanation, because each garbage collector can be explained in detail for a long time, so if you are interested in exploring, CMS and G1 garbage collector are more important two.

That’s all for this article. I hope I can help you after reading it. Thank you for reading it.

Please give me a thumbs-up if you think it is useful. Your thumbs-up is the biggest motivation for my creation

I’m a programmer who tries to be remembered. See you next time!!

Ability is limited, if there is any mistake or improper place, please criticize and correct, study together!