Small knowledge, big challenge! This paper is participating in theEssentials for programmers”Creative activities.
This article focuses on three questions: which memory needs to be reclaimed, when to reclaim it, and how to reclaim it. Keep reading and you’ll find out.
The Java runtime memory area can be divided into program counters, virtual machine stacks, local method stacks, Java heap, method areas, and runtime constant pools. The program counter, a virtual machine stack, local stack method these three areas to exist with the existence of the thread and during the program compiled to determine, so we are talking about the garbage collection does not consider them, main consideration is the Java heap and methods of the area, this program is running is dynamically allocated, is also the main focus on the garbage collector area.
Once you’ve determined what memory needs to be reclaimed, let’s look at when.
We often use reference counting and reachability analysis to determine whether an object is available. However, there is still a long way to go between being available and being recycled. The recycling is based on whether the object is “dead” or not. There is a process to finalize the dead object from the unusable object.
Reference counting algorithm
The algorithm basically adds a reference counter to the object and increments it by one every time a reference is made to it, and decays it by one when the reference is dead. A counter of 0 means that there are no references, that is, the object can no longer be used.
This method is easy to implement and efficient to determine, but there is a problem, that is, objects reference each other, at this time the reference counter is not zero, but there are no valid references.
Accessibility analysis algorithm
In data structures we use reachability to indicate whether a point in a graph is reachability. The analogy here is that we start with GC Roots, and if we can’t reach an object, the object is unavailable. What’s disgusting here is GC Roots?
In the Java language, there are several objects that can be used as GC Roots
1 Object referenced in the VM stack.
The object referenced by the class static attribute in the method area.
The object referenced by the constant in the method area.
4 Native method stack, objects referenced in Native methods.
At this point we’ve just determined that the objects are not available, but not so much that the GC can’t reclaim them yet.
In the reachabability analysis algorithm, to determine that an object is dead, it must go through at least two marking processes. If there is no chain of references connecting the object to GC Roots, it will be marked first and filtered once.
If you don’t want the object to die so fast, you can refer to the object again in Finalize method, and you can point this object to a variable. This makes the object reachable.
If we do not rewrite Finalize method, or if the virtual machine has implemented Finalize method, then there is no chance. Simply speaking, we are waiting to die. Because the second time you mark it up, well, you still don’t have a reference chain, it’s officially recycled.
If we overwrite the Finalize method of objects, we will put them in an F-Quene queue and wait for the Finalizer thread created by a virtual machine to execute the Finalize method. There is no guarantee that the method will be executed. It should be executed on a timeshare basis, otherwise if an infinite loop occurs in one of the methods, the subsequent methods will not be executed.
You see, Finalize method really does not give up, but it is not recommended to use this method, because there is too much uncertainty. First, it will only be executed once, and second, it is not guaranteed whether it can be executed sequentially. Also, don’t think about releasing resources in this method. Try catch is better.
Review which memory needs to be reclaimed, the heap, and the method area. When to recycle, when the object is dead, to determine whether the object is dead, first determine whether the object is not available + two flags.
However, it is a pity that the recycling points mentioned above are all in the heap, and we also need to recycle the method area. As we all know, the recycling efficiency in the method area (permanent generation) is relatively low. The recovery efficiency of the new generation in the heap is about 75 to 90 percent.
The collection of the permanent generation consists of two main parts: discarded constants and useless classes.
The conditions for recycling discarded constants are similar to those for objects in the heap. If there is no place to reference the constant, it may be recycled. Determining whether a class is useless is much more complicated and requires both:
1 All instances of the class have been reclaimed, that is, there are no instances of the class in the Java heap.
2 The ClassLoader that loaded the class has been reclaimed.
3 The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.
If the virtual machine meets the above conditions, it does not mean that the class will be recycled. It only proves that the class is useless. The HotSpot virtual machine provides -xnoClassGC parameter control on whether to recycle the class.
In scenarios where reflection, dynamic proxy, CGLib, dynamic JSP generation, and OSGI frequently define classloaders are widely used, the virtual machine is required to have the ability to unload classes to ensure that the permanent generation does not overflow.
It is now clear which memory needs to be reclaimed, the Java heap and the method area. And we know when to recycle. Here’s how to recycle.
Mark-clear algorithm
The algorithm is divided into two stages, marking and clearing. It is a basic algorithm with the following shortcomings: first, the efficiency of marking and clearing is not high; second, the space problem, a large number of discontinuous memory fragments will be generated after the clearing of marks. Space debris can lead to insufficient space for allocating large memory objects and having to perform another GC.
Replication algorithm
To solve the efficiency problem, the replication algorithm emerged, which divides the available memory into two equally sized pieces by capacity, using only one piece at a time. When this block of memory is used up, surviving objects are copied onto another block. The problem, you might notice, is that the cost is so high that the memory size shrinks to half.
In practice, commercial virtual machines use this algorithm to recycle the new generation. IBM research shows that 98 percent of the new generation of objects are ephemeral, so we don’t need a 1:1 memory allocation.
Instead, the actual space is divided into one Eden and two Survivor, and the memory ratio of Eden and Survivor is 8: 1. Each time we move one Eden and one Survivor to another Survivor, so that our memory usage reaches 90%. However, there is still a problem. In this case, there will be a permanent generation as memory guarantee, not enough to allocate directly in the permanent generation.
Here to explain, mark – clear, replication algorithm, this is just a kind of guiding ideology, GC to implement what specific virtual machine, is not the same, and the virtual machine itself is using a generational collection (new generation (new generation is divided into a Eden, two Survivor) and old s) algorithm, different s, The algorithms used are also different.
Mark-collation algorithm
Object replication algorithm in survival rate is higher in recycling of the region is frequent to copy operation, efficiency is lower, and one more thing, must have a space to guarantee (new generation have old s do guarantee, must not use the old s algorithm), based on the characteristics of old age, mark – in the sorting algorithms, sorting algorithms and marking cleared, The first is to mark the recyclable object, but the collection does not leave debris behind. It moves the living object to the side and cleans up the rest of the space, which is a good solution to the mark-clean fragmentation problem.
Generational collection algorithm is to divide the memory into different areas according to the different life cycle of the object, generally divided into the new generation and the old age, and select the appropriate collection algorithm according to the characteristics of different ages.
For example, if the new generation of objects come and die quickly, the replication algorithm is suitable. In the old days when objects died slowly and there was no extra space to guarantee them, use mark-clear or mark-tidy.
Above, we have described all of GC’s capabilities algorithmically and logically. That is, what memory needs to be reclaimed, when it needs to be reclaimed, and how. However, we are not talking about GC implementation for a specific virtual machine. I’m going to stop there and understand the principles, but it will be much easier to understand the implementation.