Previous article
(1) The data region of the JVM runtime
2, The JVM (ii), Java object creation process
(3) Allocating memory for object creation
Garbage collection is a big topic and a very important part of our application performance optimization. Being good at diagnosing GC problems on the JVM not only makes you look good on the job, but also makes you more attractive to interviewers. GC this part I will say common garbage collection algorithm, garbage collection algorithm based algorithm, there are also have some complicated incremental algorithm and generational algorithm, etc., and then there’s the working process of the common garbage collector and optimize the details of the collector said I will focus on the CMS, ParNew, G1, ZGC these a few, still can take you read hands-on GC log, As well as for the application type and some common scenarios to do a certain optimization summary, will also introduce several troubleshooting tools. There are some things in this, such as the working process of garbage collector, here is a certain difficulty. If you have any questions, please leave a comment at the bottom of the article. As long as I see it, I will reply to you.
GC involves a lot of points, here I will be more detailed, my goal is to let you become a real person, so in the later GC will often show a variety of troubleshooting tools and my troubleshooting process, so it is inevitable that the article will be longer.
This chapter starts with the basic garbage collection algorithm, which is a big premise to understand.
First, the basis for judging the object can be recycled
There are two ways to determine whether an object can be reclaimed, and we’ll talk about them separately
1. Reference counting
This is a very old algorithm. When an object is allocated in heap memory, it is allocated an extra space for the object. This space is used to maintain a counter. If a reference to this object is null or to another object, the value of the counter is reduced by one. Each time a new reference is made to this object, the counter is incremented by one; Conversely, if a reference to this object is null or to another object, the counter is reduced by 1; When the value of the counter is 0, the object is automatically deleted.
2. Accessibility analysis
The reachability analysis method takes the root set (GCRoot) as the starting point, starts from these nodes and starts to search according to the reference relationship. The path is called the reference chain. When an object is not accessed by any reference chain, it is proved that the object is inactive and can be recycled. This is exactly what the JVM uses.
Fundamentally speaking, there is no superior or inferior between the two algorithms, only the difference between more and less used. The most used algorithm is the accessibility analysis algorithm, probably for the following reasons: 1. Reference counting can be difficult to solve the problem of recycled referrals. Although the Recycler algorithm can be used to solve this problem, it can cause unpredictable performance problems. 2. Changing the number of references in a multi-threaded environment is a significant performance drain.
GCRoot refers to the starting point from which reachable analysis is performed. In Java, GCRoot can be used as: 1. Objects referenced in virtual machine stack (local variable table in stack frame) 2. Objects referenced by class static attributes in method area 3Copy the code
The two methods mentioned above are the most basic methods used to determine whether an object can be recycled, and many algorithms such as the tag algorithm are based on this. Let’s look at the specific garbage collection algorithm.
Second, the basic garbage collection algorithm
Here are three basic garbage collection algorithms that are the foundation for understanding how the garbage collector works
(1) Mark clearing algorithm
This algorithm is divided into two stages: marking stage and clearing stage. The task of the mark phase is to mark all the objects that need to be reclaimed. The clear phase is to reclaim the space occupied by the marked objects. The process is as follows:As we can also see from the figure, the tag clearing algorithm has a serious memory fragmentation problem, which leads to inefficient allocation of memory when creating objects, and too much memory fragmentation can lead to additional garbage collection actions.
(2) Mark compression algorithm
This algorithm is designed to solve the memory fragmentation problem above. After marking, the living object is moved to one end of memory as follows:Let’s not talk about the pros and cons of this algorithm, but let’s talk about another algorithm
(3) Tag copy algorithm
In this algorithm, the memory is divided into two pieces, and only one piece is used each time. The marking action only occurs in the used piece of memory, and then all the surviving objects are copied to the other piece of memory, and then all the previous memory blocks are emptied. The process is as follows:Both algorithms solve the problem of memory fragmentation. However, the tag copy algorithm wastes half of the memory, which is the problem of low memory utilization. Therefore, the tag compression algorithm is suitable for all scenarios. Obviously not, if there are a lot of live objects in the block, using the tag compression algorithm involves copying a lot of objects and changing the reference addresses, which can reduce the execution time allocated to the user thread (in fact, the tag replication algorithm also has this problem when there are a lot of live objects).
Third, the improvement of garbage collection algorithm
The two garbage collection algorithms described below address the shortcomings of the base algorithm, such as memory fragmentation, long pause times, and low space utilization.
(1) Generational algorithm
The generational algorithm is based on the hypothesis that most objects are ephemeral. This hypothesis has been proven, and it is not hard to imagine that most OLTP applications recycle objects as soon as they are created. The generation algorithm classifies objects into generations and uses different GC algorithms for different generations: A newly generated object is called a Cenozoic object, a GC performed on a new object is called a Minor GC, an object of a certain age is called an old age OBJECT, a GC for an old age object is called a major GC, and a new generation object is called a promotion. But more algebra is not always better. In general, two or three generations of algebra are the best.
A new concept is introduced, OLTP and OLAP, which are two different types of applications. OLAP is usually a data analysis application, while OLTP is a common real-time service application. JVM optimizations are different for these two applications, so keep that in mind and we'll talk about optimizations later.Copy the code
The old age GC will not be executed until the old age space is filled by objects promoted through the new generation GC. As a result, the old GC is executed less frequently than the Cenozoic GC. By using generational garbage collection, you can reduce the time spent on GC.
(2) Incremental algorithm
Incremental algorithms mainly control STW (stop the world) time by means of concurrency. This is reflected in the fact that garbage collection thread work and user thread work are performed concurrently. This is because the JVM USES is accessibility analysis algorithm, based on analysis of accessibility marker algorithm work process is generally divided into several stages, each stage does not need to stop the user threads, so they need not stop user thread stage only needs and user threads execute concurrently (incremental if don’t understand don’t try so hard, You’ll understand better when we talk about garbage collectors later.)
4. Problems and solutions brought by the improved garbage collection algorithm
These improvements also come at a cost, because the essence of these algorithms is a tradeoff between time and space. If you get faster, you’ll have to sacrifice some memory, and if you get smaller, you’ll be slower.
(1) The object reference relationship changes during the marking process
Let’s take a look at the marking process in the marking algorithm. This process can be abstractly summarized by the three-color algorithm, that is, objects are divided into three categories according to the different degrees of marking. White: objects that are not marked by the garbage collector, gray: objects that have been marked themselves but whose member variables have not been marked, and black: Itself is marked, and all member variables of the object itself are marked.
Object at the beginning of the GC are all white, such as state 1, first of all, on the base of GCRoot scanning, will figure in A labeled gray, E marked as black as state 2, then embarks from the gray object (no scan), black, grey reference marked as (A) black, then B tag as the gray, such as state 3, then the recursive loop based on gray scanning process, At the end of the day, only D will not be scanned, so D will be garbage, and the garbage will be removed during the cleanup process.
Question 1: If the reference between A and B is removed in state 3 when the JVM is ready for the next markup, then the next markup will start with B as black and C as gray, and both B and C will be marked. But in fact, we can easily infer from the figure that B and C are unreachable from GCRoot after A and B are removed from the reference relationship. Therefore, B and C cannot be recycled in this round of recycling, so B and C become floating garbage.
Question 2: Another happens at the end of the state of 2, the JVM before ready for the next step, quote B E, and A reference to the relationship between A and B were lifted, then B and C will not be scanned in the next round of the scanning to the next time (because the scan will only starting from the gray object), then B and C will become garbage, this is very serious, because it affects the correctness of the program. The JVM introduces read and write barriers to address these two problems, and read and write barriers occur under the following conditions
Question 2 the code for this scenario is as follows: void test(A A,E E) {B B = a.b; // Trigger read barrier a.b = null; e.b = b; // Trigger write barrier}Copy the code
Specifically, (1) Write barrier: when E.b =b, mark e object or B object as gray so that the next scan can be detected. (2) Read barrier: mark A.B as gray immediately when a.B is assigned to B and B to ensure the scan accuracy. This is also called incremental update.
(2) Cross-generation references
There is a price to be using a generational, the object not only in the new generation is GCRoot references, and may be old s object references, as shown in figure, want to know whether B can be recycled, will have to scan the old s, but it lost the meaning of generational JVM must be at the same time in the safe recovery of Cenozoic old s don’t scan, Otherwise, it’s better not to divide generations.
In the figure above, we can also use C as a GCRoot so that each scan can start from C, or we can color C gray (write barrier). The method of using C as GCRoot is to record the old age object into a Set named Remember Set whenever cross-generation reference occurs, and then the object in Remember Set will also be used as GCRoot when scanning, thus avoiding the problem of scanning the whole old age.