Hello everyone, I am Fang Mu ~
JVM garbage collection (GC) is a new garbage collection algorithm that can be used to collect garbage in the JVM. It is a new garbage collection algorithm that can be used to collect garbage in Java. Still, you need to understand the JVM garbage collection mechanism so that you don’t get overwhelmed by the lack of heap memory, OOM, and frequent GC.
Determination of object viability
The precondition of garbage collection is to determine the object viability. The commonly used methods for determining object viability include reference counting and reachability analysis. However, since reference counting cannot solve the problem of cyclic reference of objects, the mainstream JVM tends to use reachability analysis.
Reference Counting
Reference counters are used in Microsoft’s COM component technology and in Adobe’s ActionScript3. The principle of reference counter is very simple. For an object A, as long as any object references A, the reference counter of A is increased by 1. When the reference is invalid, the reference counter is decreased by 1. Object A cannot be used again as long as its reference counter has A value of 0. Reference counters are also very simple to implement, requiring only one orthopedic counter to be configured for each object. But reference counters have a serious problem in that they can’t handle circular references. Therefore, this algorithm is not used in the Java garbage collector. A simple circular reference problem is described as follows: there are objects A and B, object A contains A reference to object B, and object B contains A reference to object A. At this point, neither object A nor object B has A reference counter of 0. But there is no third object in the system that references either A or B. In other words, A and B are garbage objects that should be collected, but the garbage objects reference each other so that the garbage collector cannot recognize them, causing A memory leak.
Reference tree traversal
A reference tree is essentially a rooted graph structure that follows the root handle of an object down to a living node and marks it down. The remaining unmarked nodes are dead nodes, and these objects can be recycled, or the living nodes can be copied away, depending on the region in the HeapSize and the algorithm. Its rough schematic diagram is shown in the figure below (note that the pointer is one-way here) :
First, all collectors count viable objects through a marking process. All modern GC algorithms used in the JVM find all surviving objects before recycling. This concept is best illustrated by the memory layout in the JVM shown below:
The so-called GC root objects include: all local variables and input parameters in the currently executing method, active threads, static variables in the loaded class, and JNI references. Next, the garbage collector iterates through the entire object graph in memory, starting with the GC root object and then other objects referenced by the root object, such as instance variables. The collector marks all objects accessed as alive. Live objects are marked blue in the figure above. When the marking phase is complete, all living objects have been marked. The others (the gray ones in the figure above) are objects that the GC root object cannot reach, meaning that your application will no longer use them. These are garbage objects that the collector will clean up in the following stages.
However, objects found unable to reach GC Roots are not immediately recycled and must be marked at least twice before they can actually be recycled. When it is found unreachable for the first time, the object is tagged once, and the Finalize () method of the object is called (if any); The object is reclaimed after it is found unreachable a second time. With the finalisze() method, an object can escape being recycled once, but only once. To escape, you need to add a hook from GCRoots to finalize() :
public class EscapeFromGC {
public static EscapeFromGC hook;
@Override
protected void finalize(a) throws Throwable {
super.finalize();
System.out.println("finalize mehtod executed!");
EscapeFromGC.hook = this; }}Copy the code
Mark-sweep (mark-sweep)
The mark-sweep algorithm divides garbage collection into two phases: the tag phase and the sweep phase. One possible implementation is to mark all larger objects starting from the root node first through the root node in the marking phase. Therefore, unmarked objects are unreferenced junk objects. Then, in the purge phase, all unmarked objects are cleared. The biggest problem with this algorithm is that there is a large amount of space debris, because the recovered space is discontinuous. In the heap space allocation of objects, especially large objects, the efficiency of discontinuous memory space is lower than that of continuous memory space.
Conceptually, the mark-clear algorithm takes the easiest approach, simply ignoring these objects. That is, when the marking phase is complete, the space where the objects are not accessed is considered free and can be used to create new objects. This approach requires a free list to record all free areas and their sizes. Managing the free list increases the amount of work to allocate objects. Another drawback of this approach is that although the size of the free region is sufficient, there may be no single region that is sufficient for the allocation, and therefore the allocation will fail (an OutOfMemoryError in Java).
Copying algorithms
The existing memory space is divided into two parts and only one part is used at a time. During garbage collection, the surviving objects in the used memory are copied to the unused memory block. After that, all objects in the used memory block are cleared and the roles of the two memory blocks are swapped to complete garbage collection. If there are many junk objects on the system, the number of live objects that the replication algorithm needs to copy is not very large. So replication algorithms are efficient when garbage collection is really needed. And because objects are uniformly copied to the new memory space during garbage collection, the reclaimed memory space is guaranteed to be free of fragmentation. The disadvantage of this algorithm is that the system memory is halved.
The idea of a replication algorithm is used in Java’s next-generation serial garbage collector. The New generation is divided into three parts: Eden Space, from Space and to space. The FROM space and to space can be regarded as two space blocks of the same size, equal status and role exchange for replication. From and to Spaces, also known as survivor Spaces, are used to hold objects that have not been reclaimed. At garbage collection, living objects in Eden space are copied to the unused survivor space (assuming to), and young objects in the used survivor space (assuming from) are copied to the TO space (large objects, If the to space is full, the object will enter the old age directly. At this point, the remaining objects in Eden space and from space are garbage objects and can be directly emptied, while to space stores the surviving objects after the reclamation. This improved replication algorithm not only ensures the continuity of the space, but also avoids a large amount of memory space waste.
Mark-copy algorithms are very similar to mark-collation algorithms in that they reassign all living objects. The difference is that the target address of the reallocation is different. The replication algorithm allocates an additional region of memory to the living objects as their new home. The advantage of the tag copy algorithm is that the tag and copy phases can be carried out simultaneously. The disadvantage is that it requires an extra chunk of memory to hold all the living objects.
Mark-compact (Marker-compression algorithm)
The high efficiency of the replication algorithm is based on the premise of fewer live objects and more garbage objects. This happens a lot in the younger generation, but it’s more common in the older generation for most objects to be alive. If the replication algorithm is still used, the cost of replication will be high due to the large number of viable objects.
Mark-compression algorithm is an old recycling algorithm, which has made some optimization on the basis of mark-clean algorithm. It also needs to mark all reachable objects first, starting at the root node, but after that, instead of simply cleaning up the unmarked objects, it compresses all living objects to one end of memory. After that, clean up all the space outside the boundary. This method not only avoids fragmentation, but also does not require two identical memory Spaces, so it is cost-effective.
The mark-compress algorithm fixes the weakness of the mark-clean algorithm by moving all marked, or live, objects to the beginning of the memory region. The downside of this approach is that GC pauses are longer because you need to copy all objects to a new location and update their reference addresses. Compared with mark-clear algorithm, its advantages are also obvious — after collation, new object partitioning can be completed by pointer bumping, which is quite simple. With this method, the location of the free area is always known and there is no fragmentation problem.
Comparison of common garbage collection algorithms
Compare the three basic garbage recovery algorithms described above:
Incremental Collecting algorithm
During garbage collection, the application will be in a state of high CPU consumption. In this high CPU consumption state, all threads of the application hang, suspending all normal work while the garbage collection completes. If garbage collection takes too long, the application will be suspended for a long time, which will seriously affect user experience or system stability.
Incremental algorithm a precursor of modern garbage collection, the basic idea is that you can alternate garbage collection threads with application threads if processing all the garbage at once requires a long pause on the system. Each time, the garbage collection thread collects only a small area of memory space, and then switches to the application thread. Repeat until garbage collection is complete. Using this approach reduces system downtime because application code is also intermittently executed during garbage collection. However, the overall cost of garbage collection increases due to the consumption of thread switches and context transitions, resulting in a decrease in system throughput.
Generational Collecting
Generational collector incremental collection is the embodiment of another, according to the properties of the garbage collection objects, different stages, the optimal way is to use the appropriate garbage collection algorithm used in this phase, a generational algorithm that is based on this idea, it will be the memory range according to the characteristics of the object into several blocks, based on the characteristics of each memory interval, using different recovery algorithm, To improve the efficiency of garbage collection. The Hot Spot VIRTUAL machine, for example, puts all newly created objects into an area of memory called the young generation. The young generation is characterized by fast collection of objects. Therefore, the efficient replication algorithm is selected in the young generation. When an object survives several collections, it is put into a memory space called the old generation. In the old days, almost all objects survived several garbage collections. Therefore, you can assume that these objects will be resident in memory for a period of time, or even for the entire life of the application. If the replication algorithm is still used to reclaim old generations, a large number of objects will need to be copied. In addition, the recycling cost ratio of the old generation is lower than that of the new generation, so this approach is not desirable. According to the idea of generation, different mark-compression algorithm can be used for the old generation to improve the efficiency of garbage collection.
Concurrent Collecting
So-called concurrent recovery algorithm which is refers to the garbage collector and the application to work alternately, concurrent collector was also suspended, but time is very short, it will not be in recovery from the start, to find, tags, clear, compression or copy process fully suspended, it found that there are several time is longer, one is the tag, Because of the recycling generally with old age, this area is usually very big, but in general most of the object should be alive, so mark time is very long, and there’s a time compression, compression is, but not necessarily every finish GC to compression, and copy does not generally used in old age, so for the time being not to consider; So what they came up with was this: First brief downtime is the pointer to the root of all the objects found, this is very easy to find, and very quickly, to find, after the GC start tag alive from the root node (parallel) can be used here, and then after the tag, the application at this time there may be new memory and be abandoned (Java release the concept itself has no memory). At this point, the JVM will record the increments in the process. For an old age, the increments in this period are usually very small, and the probability of its release is not very high. Pageout and pagein are not suitable for pageout and pagein. The JVM quickly marks the internal nodes based on this incremental information, which is also very fast, and can start reclaiming. Since there are not many nodes to kill, this process is also very fast. After a certain amount of time, the compression will do a special operation. If you don’t want it to, use quantification instead of scaling when setting the size of each region. When using concurrent recovery algorithm, for old s area, in general will not wait for memory is less than 10% recovery, when it is because of the concurrent collection is allowed when recycling is assigned, it may be too late, so complicated with recycling, the JVM might be started when I was around 68% of the old s GC.
conclusion
Finally, to summarize, we explored six garbage collection algorithms. The basic garbage collection algorithms are mark-sweep, mark-collation, and copy. In addition, three comprehensive garbage collection algorithms are discussed, namely incremental garbage collection algorithm, generational garbage collection algorithm and concurrent garbage collection algorithm.
My wechat official account “Java Architect Advanced Programming” focuses on sharing Java technology dry goods, looking forward to your attention!