What is garbage

Garbage is an object in memory that is no longer useful. Since it is “garbage collection”, you must know which objects are garbage. An algorithm called “** reachability analysis “is used in the Java virtual machine to determine whether objects can be reclaimed.

Accessibility analysis

The reachable analysis algorithm is introduced from the graph theory in discrete mathematics. JVM regards the reference relationship between all objects in memory as a graph, and starts from a set of objects named “GC Root” as the starting point. From these nodes, the search path is called the reference chain. Finally, the object can be reclaimed by judging whether the reference chain of the object is reachable. As shown below:

For example, in the figure above, objects A/B/C/D/E have A direct or indirect reference chain between them and GC Root, which also means that they are reachable to GC Root and therefore cannot be collected by GC. The objects M and K are referenced to by J, but there is no reference chain connecting them to GC Root, so when GC does garbage collection, it just iterates through J/K/M and recycles them.

Note: Although the circle icon in the figure above marks an object, it actually represents an in-memory reference to that object. Including GC Root is also a set of references rather than objects.

The GC Root object

In Java, there are several objects that can be used as GC Root: 1. Objects referenced in the Java virtual machine stack (local variable table). 2. The object to which the static reference points in the method area. 3. Thread objects that are still alive. 4. Objects referenced by JNI in Native methods.

When to recycle

1.Allocation Failure: During Allocation in heap memory, if the object memory fails to be allocated due to insufficient free space, the system will trigger a GC. System.gc() : At the application layer, Java developers can actively invoke this API to request a GC.

Garbage collection algorithm

Mark clearing algorithm

Starting from the “GC Roots” collection, the entire memory is traversed once, keeping all objects that can be directly or indirectly referenced by GC Roots, and the rest of the objects are treated as garbage and recycled. The process is divided into two steps.

  1. Mark stage: Find all GC Root objects in memory and Mark them as gray (live objects) if they are directly or indirectly connected to GC Root objects, and black (garbage objects) if they are not.
  2. Sweep phase: After all GC roots are traversed, objects marked as garbage are swept directly.

Advantages: Simple implementation, no need to move the object. Disadvantages: This algorithm needs to stop the world from executing other components in the process and can generate memory fragmentation, increasing the frequency of garbage collection.

Replication algorithm

Divide the existing memory space into two pieces, use one piece at a time, and copy living objects from the in-use memory to unused memory blocks at garbage collection time. After that, garbage collection is completed by removing all objects in the memory block being used and switching the roles of the two memory blocks.

  1. Before copying the algorithm, the memory is divided into A and B, and currently only memory A is used, as shown in the following figure:

  1. After marking, all reachable objects are copied into memory B in order and B is set to the memory currently in use. The memory status is as follows:

  • Advantages: Memory can be allocated sequentially, simple implementation, efficient operation, do not consider memory fragmentation.
  • Disadvantages: The available memory size is reduced by half, and objects are frequently copied when they have a high survival rate.

Mark-compression algorithm

All reachable objects need to be marked up, starting at the root node. After that, it does not simply clean up the unmarked objects, but compresses all living objects to one end of memory. Finally, clean up all the space outside the boundary. So tag compression is also done in two steps:

  1. Mark stage: Find all GC Root objects in memory and Mark them as gray (live objects) if they are directly or indirectly connected to GC Root objects, and black (garbage objects) if they are not.
  2. Compact phase: The remaining living objects are sequentially compressed to one end of memory.

  • Advantages: This method not only avoids fragmentation, but also does not require two identical memory Spaces, so it is cost-effective.
  • Disadvantages: The so-called compression operation still needs to carry out local object movement, so it still reduces efficiency to some extent

JVM generational reclamation strategy

The Java virtual machine (JVM) divides the heap memory into several blocks based on the lifetime of the object, usually into the new generation and the old generation. This is the memory generation strategy of JVM. Note: In HotSpot there are not only new generation and old generation, but also permanent generation

The central idea of generational collection is that memory is allocated in the new generation for newly created objects, which generally have a short lifetime. If they survive multiple cycles of recycling, they are transferred to older age.

The young generation

The newly generated objects are preferentially stored in the new generation, and the new generation objects die overnight and have a very low survival rate. In the new generation, the conventional application of a garbage collection can generally recover 70% to 95% of the space, with high recycling efficiency. In the new generation, because some replication operations need to be performed, the GC collection algorithm commonly used is the replication algorithm.

The new generation can be further divided into three parts: Eden, Survivor0 (ABBREVIATED as S0), and Survivor1 (abbreviated as S1). These three sections divide the new generation by an 8:1:1 ratio. The memory allocation process of these three areas is as follows:

Most newly created objects will be stored in Eden. As shown in the figure:

When Eden is full for the first time, garbage collection takes place. Garbage objects in Eden will be collected and cleaned first, and the surviving objects will be copied to S0, where S1 is empty. As shown in the figure:

The next time Eden is full, the garbage collection is performed again. This time, all garbage objects in Eden and S0 will be removed and the living objects will be copied to S1, at which point S0 will be empty. As shown in the figure:

After doing this repeatedly, switching between S0 and S1 a few times (15 by default), if there are any surviving objects. If these objects have a long life cycle, move them to the old age. As shown in the figure:

The old s

If an object survives long enough in the Cenozoic era to not be cleaned up, it will be copied to the old age. The memory size of the older generation is generally larger than that of the new generation and can hold more objects. If the object is large (such as a long string or a large array) and there is not enough space left for the new generation, the large object will be allocated directly to the old generation.

We can use – XX: PretenureSizeThreshold to control the size of the object directly to the old s, is greater than the value of the object will be allocated directly on the old s. In the old age, because of the long life cycle of the object, there is no need for too much copy operation, so the recovery algorithm of mark compression is generally adopted.

Note: It can be the case for older generations that objects from older generations sometimes refer to new generation objects. If you want to perform a Cenozoic GC at this point, you may need to query for possible references to Cenozoic in the entire old era, which is obviously inefficient. Therefore, a 512 byte card table is maintained in the old age, where all information about the old age objects referencing the new age objects is recorded. Whenever GC occurs in the new generation, only the card table needs to be checked, greatly improving performance.

The GC Log analysis

To make it easier for upper-layer application developers to debug Java programs, the JVM provides GC logs. Various corresponding logs are printed during the GC execution of the garbage collection event. There are differences between the logs printed by the new generation and the old generation.

  • Cenozoic GC: This area is called a Minor GC. Because Java objects are mostly ephemeral, Minor GC is frequent and generally fast.
  • Old GC: GC that occurs in this region is also called Major GC or Full GC. When a Major GC occurs, it is often accompanied by at least one Minor GC.

Note: There are some differences between the Major and Full GC in some virtual machine implementations. The Major GC simply represents the collection of the old generation, while the Full GC represents the collection of the entire heap, which is the new generation plus the old generation.

Conclusion:

The virtual machine garbage collection mechanism is often one of the major factors affecting system performance and concurrency. Especially for Android developers, garbage collection can sometimes affect UI threads to a large extent and cause the interface to stall. Therefore, understanding the garbage collection mechanism and learning to analyze GC logs is an essential skill.