Outline of this paper:

An overview,

1.1 Garbage Collection

Garbage Collection, the GC. In the process of running the program, objects in the Java heap and method area are dynamically recycled.

1.2 Why does garbage collection occur in the heap and method area?

In the previous blog LCODER’s JVM series: Runtime data area, we saw that the program counter, the virtual machine stack, and the local method stack are created and killed by the thread. In memory stack frame, it is with the method of entry and exit in an orderly way the execution of the stack and the stack operation, how much memory allocation in each stack frame, basically settled in the class structure is known, this a few areas of memory allocation and recovery with certainty, methods, or thread at the end of the nature of memory with recycling. Unlike method areas, the Java heap does not know which objects will be created until the program is running, and this part of memory is allocated and reclaimed dynamically.

How to judge whether the object is still alive?

Before the garbage collector can collect the heap, the first thing to do is to determine which objects in the heap are still alive and which are no longer used. Reference counting and reachability algorithms are commonly used to analyze whether an object is alive or not.

2.1 Reference counting method

Add a reference counter to the object with a value of +1 every time it is referenced; When a reference is invalidated, the counter value is -1, and at any time an object with a 0 counter is no longer used. Disadvantages: Cannot solve the problem of circular references.

2.2 Analysis of accessibility algorithm

Through a series of “GC Roots” objects as the starting point, search down from these nodes, the search path is called the reference chain. When an object is not connected to GC Roots by any reference chain, that is, from GC Roots to this object is unreachable, this object is proved to be unreferential. In Java, objects that can be used as GC Roots include the following:

  1. The object referenced in the local variable table in the virtual machine stack frame
  2. The object referenced by the class static property in the method area.
  3. The object referenced by the constant in the method area.
  4. Objects referenced by Native methods in the Native method stack

If there is no reference chain connected with GC roots after the reachability analysis of the object, it will be marked for the first time and filtered. The filtering condition is whether it is necessary to execute finalize() method. When objects do not override the Finalize () method or finalize() method has already been called by the virtual machine, the virtual machine considers both cases unnecessary. If the object is judged to be necessary to execute finalize method, then the object will be put in an F-queue. Finalize method is the last chance for the object to escape death. Later, GC will mark the object in the F-queue for a second time. If objects want to save themselves in Finalize, they will be removed from the collection when they are tagged the second time. If objects want to save themselves in Finalize, they can re-associate with any objects in the reference chain (that is, they can rewrite finalize() methods, such as assigning themselves to a class variable or a member variable of the object), they will be removed from the collection when they are tagged the second time. The Finalize () method of any object will only be called once by the system.

2.3 The four types of references

Strong references are common references in program code such as Object O = new Object(). As long as Strong references exist, the garbage collector will never reclaim the referenced Object. Soft references are used to describe objects that are useful but not necessary. Objects associated with soft references are listed in the collection scope for a second collection before the system is about to run out of memory. An out-of-memory exception is thrown if there is not enough memory for this collection. Weak references are also used to describe non-essential objects, but they are weaker than soft references, and objects associated with Weak references only survive until the next garbage collection is sent. When the garbage collector works, objects associated only with weak references are reclaimed regardless of whether there is currently enough memory. A Phantom Reference is also called a Phantom Reference. Whether an object has a Phantom Reference or not has no effect on its lifetime and cannot be used to obtain an object instance. The only purpose of a virtual reference is to receive a system notification when an object is reclaimed.

2.4 Method area recycling

Garbage collection in the method area is not as efficient as the Java heap. The method area, also known as the persistent generation in the HotSpot VIRTUAL machine, holds static classes and methods, constants, and compiled classes. These are data that don’t need to be recycled; they don’t need to be created and recycled frequently. So this is where garbage collection takes place, which typically involves collecting discarded constants and useless classes.

Recycling deprecated constants is very similar to recycling objects in the Java heap. For example, a String “ABC” is already in the constant pool, but there is no String named “ABC” in the system. In other words, no String refers to the “ABC” constant in the constant pool, and there is no other place to refer to the literal. If memory reclamation occurs at this point, the “ABC” is removed from the constant pool by the system. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are similar.

Compared with discarded constants, the judgment of a useless class is more stringent. A class must meet the following three conditions to be considered a “useless class” :

  1. All instances of the class have already been reclaimed, meaning that there are no instances of the class in the Java heap
  2. The ClassLoader that loaded the class has been reclaimed
  3. The java.lang.Class object corresponding to this Class has no reference anywhere, and its methods cannot be accessed through reflection anywhere

The JVM can reclaim useless classes that meet all three of these criteria, but only “can”, not necessarily. Scenarios where reflection, dynamic proxies, and so on are heavily used require the JVM to have the ability to unload classes to prevent overflow of permanent generations.

Third, garbage collection algorithm

3.1 Mark-Sweep Algorithm

The most basic algorithm: mark-clear algorithm, as the name implies, this algorithm is divided into two stages: mark, clear. First, all marked objects need to be recycled, after the completion of marking, all marked objects are recycled. The process of marking is that the garbage collector walks from GC Roots, marking all referenced objects and recording them as reachable in the object’s Header. The process of cleaning is that the garbage collector simply walks linearly through the heap memory from beginning to end, and if it finds that an object is not marked as reachable in its headers, it is reclaimed.

Disadvantages of mark-clear algorithm: 1. Low efficiency. 2. A large number of discrete memory fragments will be generated after the mark is cleared. Too much memory fragmentation can cause contiguous memory not to be found the next time a large object is allocated, triggering another GC, or even OutOfMemory.

3.2 Copying Algorithms

Copy algorithm: based on the mark-clear algorithm evolved, is the available memory according to the available size divided into two equal size, each time only one of the pieces. When this area of memory is used up, the surviving objects are copied to another area of memory, and then the area is completely emptied, thus avoiding the problem of creating discrete memory fragments. But the downside is that it’s half as much memory, and obviously, it’s not worth it.

3.3 Mark-Compact Algorithm

Mark-clean algorithm: mainly used in the old era, the marking process is the same as the mark-clean algorithm. The process of declacting is to move all surviving objects towards one end and then clean up memory directly beyond the end boundary.

3.4 Generational Collection algorithm

In Android virtual machines, this algorithm is mainly used for garbage collection. In the Android Advanced OS version, there is a Generational Heap Memory model for Heap Memory, as shown in the following figure:

As you can see from the figure, the entire heap is divided into three regions: Young Generation, Old Generation, and Permanent Generation.

3.4.1 Young Generation

The young generation is divided into three zones: Eden Zone, S0 zone and S1 zone. The default ratio is 8:1:1. S0 zone and S1 zone are both Survivor zones and are essentially the same. Objects of the younger generation, which have a shorter lifetime, are recycled based on Copying algorithms. For the young generation, objects are copied between Eden, S0, and S1. Recycling process:

  1. Most of the newly created objects will be stored in Eden.
  2. If the Eden area does not have enough space to allocate, the VM initiates a Minor GC. After the first Minor GC in Eden area, the surviving objects are moved to one of the Survivor areas (for example, S1, which is From), and Eden area is emptied.
  3. After performing a Minor GC in Eden (for the whole generation, not just Eden), if S1 objects are still alive, the age of the surviving objects increases. All surviving objects are then moved to another Survivor zone (such as S2, which is the From zone for the next Minor GC), and Eden space and S1 are emptied.
  4. Perform one Minor GC, and the age of the object is +1. Perform MaxTenuringThreshold (15) several times, and the remaining objects are moved to the old age.

The young generation uses a free pointer to control GC triggering. The pointer holds the position of the last allocated object in the young generation. When there is a new object to allocate memory, the pointer is used to check whether there is enough space. If the age of the object is +1 and the age of the object is 15, not all new objects will be placed in Eden. If a large object needs a lot of continuous space, it will be assigned to the old age as soon as it is born.

3.4.2 Old Generation

The old generation has only one region, which by default is twice the size of the young generation. The old age stores objects copied from the young generation above. Generally speaking, the life cycle of objects in the old age is relatively long and stable. The GC sent in this area, known as the Full GC, uses a mark-collation algorithm for garbage collection.

3.4.3 Permanent Generation

The reclamation strategy for this area, as explained in 2.4 above, will not be repeated here.

3.4.4 Guarantee of space allocation

Before Minor GC occurs, the virtual machine checks to see if the maximum available contiguous space of the old generation is greater than the total space of all objects of the new generation. If this condition is true, then Minor GC is guaranteed to be safe. If this is not true, the virtual machine checks the HandlePromotionFailure setting to see if the guarantee failure is allowed. If so, it continues to check whether the maximum available contiguous space of the old age is greater than the average size of the objects promoted to the old age, and if so, a Minor GC is attempted, although this Minor GC is risky; If less than, or if the HandlePromotionFailure setting does not allow risk, then do a Full GC instead. To explain what the “risk” is, as mentioned earlier, the new generation uses a copy-collection algorithm but uses only one Survivor space as a rotation backup for memory utilization, Therefore, when a large number of objects survive the Minor GC (the most extreme case is when all objects survive in the new generation after memory reclamation), an allocation guarantee from the old generation is required to pass Survivor objects directly into the old generation. And life of the loan guarantees, similar to the old s such a guarantee, the premise is that the old s itself and accommodate these objects of the remaining space, how many objects are to live in the actual memory recovery is not clear know before, so I had to take each time before recycling the average size of the object to the old s capacity promotion value as experience value, Compare the remaining space of the old generation and decide whether to make more space for the old generation by Full GC. Averaging comparisons is still a dynamic probability measure, that is, if the number of objects surviving a Minor GC is significantly higher than the average, it can still lead to Handle Promotion Failure. If a HandlePromotionFailure fails, the Full GC will have to be restarted after the failure. Although the loop is largest when a guarantee fails, HandlePromotionFailure is turned on in most cases to avoid Full GC too often.