A profile,

Understanding the underlying principle of Java virtual machine garbage collection mechanism is the foundation of system tuning and online problem detection, and also a senior Java programmer’s basic skills, this article for Java garbage collection this topic to do some sort and record.

There are many different types of Java garbage collectors, and they are designed to balance throughput (memory space) against real-time (user thread interruptions), with different scenarios (e.g. Desktop applications, Web applications), so we are only talking about the default garbage collector under JDK8, which is currently the dominant version (80%), and we are only talking about garbage collection for heap memory space.

The default garbage collector for JDK8 is UseParallelGC: Parallel (new generation) + (old generation) garbage collector

How to determine whether the object is recyclable?

Let’s start with a question: With all the objects in the heap, which objects should the collector reclaim? How do you identify the objects to recycle? So for garbage collection, determining and identifying whether an object is recyclable is the first step. Theoretically speaking, there are two ways to determine whether an object is recyclable.

The first, reference counter algorithm:

Every time an object is referenced the counter increases by 1, the object loses the reference and the counter decreases by 1. If the counter is 0, the object is dead. This algorithm is simple and efficient, but requires more overhead for circular references or other complex cases, so Java rarely uses it.

Second, root search algorithm – reachability analysis algorithm:

The so-called reachability analysis refers to searching down the root of GCRoots (summarized as “follow the trail” with an idiom). The whole searching process forms a “reference chain”, as long as the objects on the reference chain are called reachability, and those outside the reference chain (indicating that they have nothing to do with GCRoots) are called unreachable. Unreachable objects are then judged to be recyclable.

Which objects can serve as GCRoots objects? These include:

  • Reference objects (method parameters, local variables, temporary variables) in the local variable table on the vm stack frame
  • Static properties in the method area reference type objects and constants reference objects
  • Reference objects in the Native method stack
  • Reference objects within the Java virtual machine, such as exception objects and system class loaders
  • So an object held by synchronize
  • Java virtual machine internal situation registration callback, local cache, and so on

As GCRoots, these are easy to understand if you know something about the memory layout and flow of the virtual machine. They are the source of the program’s runtime, and the program must rely on them for proper operation. Objects that have nothing to do with these sources are considered recyclable. It’s like, “If the melon fell off the vine, it must be useless.”

Reachability analysis is easy to understand in theory, but in the specific run of garbage collector, the problem to consider how many times more complex, because in the reachability analysis at the same time, the program is also running in parallel, the state of the entire memory heap as the program is running is real-time change, to be real

If the analysis results are consistent with the memory state, the user thread must be paused and analyzed in a snapshot.

Third, garbage collection algorithm

Reachable analysis solves the problem of determining whether an object is recyclable, so what happens to memory space during garbage collection? This is the garbage collection algorithm to discuss the problem, according to the algorithm to take different operations on memory, garbage collection algorithm can be divided into three kinds, mark-clear algorithm, mark-copy algorithm, mark-collation algorithm.

1. Mark-clear algorithm

By name can understand the algorithm is divided into two stages: the object of the first mark all need to be recycled, and unified the labeled objects clear, clear object occupied area of memory, before the figure below shows the recovery and recycling memory area after contrast, red said recyclable objects, orange said not object, white said memory space.

Two disadvantages of the mark-clear algorithm:

The first is that the execution efficiency is not controllable. Imagine if most objects in the heap are recyclable, and the collector has to perform a lot of marking and collecting operations.

Second: a lot of memory fragmentation is generated. According to the memory state graph after the collection, the region memory after the collection is not continuous. When there are large objects to allocate and the space cannot be found to meet the size, the next garbage collection is triggered.

2. Mark-copy algorithm

In view of the efficiency of mark-clear algorithm and the shortcomings of memory fragmentation, computer scientists put forward a “half copy region” algorithm.

Tag – memory replication algorithm is divided into two regions with the same size, operation area, the reserved area, all to create a new object is assigned to run area, when the operation area of memory, will run all live objects are copied to the reserved area in the region, and then to empty the whole operation area of memory, then the role of the two areas have also changed, Each live object is kicked out like a ball in the running area and the reserved area, while the garbage object will be released as the entire area memory is empty. The state before and after memory is shown in the following figure:

In the case of a large number of garbage objects, the mark-copy algorithm only needs to copy a small number of living objects, and does not produce the memory fragmentation problem. The allocation of new memory only needs to move the heap top pointer in order, which gives consideration to both the efficiency and the problem of memory fragmentation.

The tag-copy algorithm also has disadvantages:

Reserving half of the memory area is wasteful, and if a lot of memory is alive and there are only a few garbage objects, the collector will have to perform more copies to free up the small amount of memory, which is not worth the cost.

3. Mark-tidy algorithm

Tag – replication algorithm to waste half of memory space, and in most states for survival states use efficiency is low, for this situation, a computer scientist and proposed a new algorithm “tag – sorting algorithm”, sorting algorithm of the marking and other algorithms, but in the finishing stage, algorithm move objects from one end of the memory space of survival, Then empty all space outside the boundary of the surviving object, as shown below:

The markup defragmentation algorithm solves the memory fragmentation problem, and there is no waste of space. It looks nice. However, when there are many live objects in memory, and they are all small objects, and there are few garbage objects, a large number of live objects have to be moved for a small amount of memory space.

Different garbage collection algorithms have their own advantages and disadvantages and are suitable for different garbage collection scenarios.

New generation and old age heap memory structure

How is the Java heap memory space divided between the new generation and the old generation? How are objects allocated to different regions after they are created? It can be seen from the following figure that the whole heap memory is divided into two large regions: Cenozoic and old age. By default, Cenozoic occupies 1/3 of the space and old age occupies 2/3 of the space. Cenozoic is further divided into two regions: Eden, Survial, S0 and S1, which account for 8/10 and 1/10 respectively by default. 1/10 of the space.

Why do they do that? Why are there so many different areas of memory? This is determined by the life cycle characteristics of the object and the advantages and disadvantages of various garbage collection algorithms, which is the theoretical basis of garbage collector design. After statistical analysis, most application object life cycles conform to two characteristics:

The theoretical basis of garbage recycling

  • The vast majority of objects are “ephemeral”, which means they die soon after being created
  • The more an object survives this garbage collection process, the harder it is to die

Only one garbage collection algorithm can be used for an independent memory region. According to the characteristics of the object’s life cycle, it can be divided into different regions, and then specific garbage collection algorithm can be used for specific regions. Only in this way can the advantages of garbage collection algorithm be brought into full play.

For example, the mark-copy algorithm is used in the new generation, and the mark-collation algorithm is used in the old generation.

Detailed explanation of heap memory reclamation process

We analyzed how to determine whether an object is recyclable or not, as well as three basic garbage collection algorithms, as well as the division of memory regions between the young generation and the old generation and why. Next, we will analyze the heap memory reclamation process step by step.

1. Initial state of memory

Suppose that before the first garbage collection, the state in memory is as shown in the figure. Eden area has two living objects and three garbage objects, and the available area of memory is almost empty. Survivor area is empty because no MinorGC has been carried out, and one large object is directly allocated to the old age.

2. Status after the first MinorGC execution

When a new object is allocated to Eden, it is found that the memory space is insufficient, so the first MinorGC is triggered. The garbage collector first copies the two surviving objects in Edne to S0, and then clears Eden space, as shown below:

3. State of the program after running for a period of time

After the first MinorGC program runs for a while, the heap memory state is as follows: A large number of objects are generated in Eden area again, and most of them can be reclaimed, which is also in line with the characteristics of “living in the future and going out of life”. There is also one object that can be reclaimed in S0 area, and S1 remains unchanged from the old age. In this state, what will happen if MinorGC is triggered again by new object allocation?

4. Status after the second MinorGC execution

The new object is allocated enough space in Eden area, which triggers the second MinorGC. The operation of the second MinorGC is the same as that of the first GC in Eden area: Copy the surviving objects in Eden area to S1 area, and then empty the whole Eden area. At the same time, copy the surviving objects in S0 area to S1 area and increase the age of the objects by 1, and then empty S0 area. The state after GC is as follows:

5. Status of the 2MinorGC program after running for a period of time

After the second MinorGC, the program ran for a period of time. Many objects were generated in Eden area and one object could be recovered in S1 area.

6. Memory status after the 15th minorgc

In each subsequent MinorGC, the same as in the second MinorGC, the survivor object is moved from Eden and survivor non-blank area to the blank area in survivor area, and the memory space of these two areas is emptied. The survivor object is moved from the two survivor areas each time, and the age of the object is increased by 1. The following figure shows the state of heap memory after 15 minorgcs.

For the memory collection of the young generation region, the mark-copy algorithm is used, just to reduce the memory waste in the blank area of the replication algorithm. Instead of dividing the memory into two parts, the memory is cleverly divided into three regions, and the reserved blank area only accounts for 1/10 of the whole young generation region.

7, how to enter the old age

The above is the allocation and recycling problem of the young generation, how does the object enter the old age? In my opinion, there are 2 types and 6 situations when objects enter the old age.

The first type — direct allocation:

When an object is created, it is directly assigned to an older age.

  • The default value of an object that exceeds the size set by the VM PretenureSizeThreshold parameter is 0, which means that objects of any size are allocated to the Eden area first.
  • An object larger than Eden
  • If the new generation allocation fails, a large array or large string

The second type — promotion from the younger generation:

There are also three types of promotion from the younger generation to the older generation.

  • When MinorGC is executed, the surviving objects in Eden zone should be copied to Survivor zone. However, the default space of Survivor zone is only 2/10 of that of the new generation, and only 1/10 of the actual space is used. When Survivor zone memory is not enough to allocate all the surviving objects, You need to assign Survivor objects to older generations, a mechanism called allocation guarantees.
  • The age of the object exceeds MaxTenuringThreshold. The maximum age is 15.
  • The sum of the size of all objects of the same age in the Survivor space is greater than half of the TargetSurvivorRatio of the Survivor space, and objects older than or equal to this age go directly to the old age.

Memory allocation guarantee mechanism

When MinorGC is executed, the Survivor zone should be copied to Survivor zone, but the default space of Survivor zone is only 2/10 of that of the new generation, and only 1/10 of the actual space is used. When Survivor zone memory is insufficient to allocate all viable objects, This mechanism is called allocation guarantee. However, space in the old age is also limited. If there is not enough space in the old age, the FullGC can only be executed once.

8. Old generation recycling algorithm -FullGC

FullGC is triggered when an object is about to enter the old age and the old age space is insufficient. Of course, FullGC is triggered by more than just the old age space. FullGC uses the mark-collation algorithm mentioned above.

Six, summarized

Judging whether objects can be recycled is the basis and premise of garbage collection. Through the reachability analysis, “follow the trail” from GCRoots to find unreachable objects (recyclable).

The characteristics of the life cycle of objects are the theoretical basis of garbage collection algorithm.

There are three basic garbage collection algorithms, namely, mark-clearing algorithm, mark-responsible algorithm and mark-sorting algorithm, which have their own occasions and advantages and disadvantages.

According to the characteristics of the life cycle of the object, generational garbage algorithm divides it into different regions, so as to use the most suitable garbage algorithm for optimization.

In the default configuration of JDK8, the new generation garbage collection strategy is used, the new generation area uses mark-copy algorithm, and the old age area uses mark-collation algorithm.

Author: Flying fish climbing

Original link: xie.infoq.cn/article/9d4…

This article is only for learning, copyright belongs to the original author, if there is infringement, please contact delete.

I’ve compiled the interview questions and answers in PDF files, as well as a set of learning materials covering, but not limited to, the Java Virtual Machine, the Spring framework, Java threads, data structures, design patterns and more.

Follow the public account “Java Circle” for information, as well as quality articles delivered daily.