preface
This article assumes a basic knowledge of garbage collection, including the common object-survival and garbage-collection algorithms; they are not covered here, because the focus is on the implementation details of garbage collection in the Java virtual machine.
Algorithm implementation
The following is based on the HotSpot virtual machine.
Root node enumeration
The reachability analysis algorithm traces reference chains outward from the set of GC Roots, and root node enumeration is the step of finding the nodes that can serve as GC Roots. The fixed locations that can act as GC Roots are mainly global references (such as constants or class static fields) and execution contexts (such as the local variable tables in stack frames). Although the target locations are clear, Java applications keep growing in size; the method area alone can easily run to hundreds or thousands of megabytes, so performing this lookup efficiently is not easy.
To date, all collectors have had to pause user threads ("Stop The World") during root enumeration. One important reason is that root enumeration must be performed on a snapshot that guarantees consistency. Since this STW cannot be avoided, what we can do is make enumeration as efficient as possible and keep the user-thread pause short.
- Consistency: the entire execution subsystem appears frozen at a single point in time throughout the enumeration, so the reference relationships of the root node set do not change while the analysis is running.
- Enumeration efficiency: mainstream Java virtual machines use accurate garbage collection, so once user threads have paused, the virtual machine should have a direct way of knowing where object references are stored, rather than checking every execution context and global reference location. HotSpot's solution is a set of data structures called OopMaps. Once a class has finished loading, HotSpot computes what type of data sits at each offset within its objects, and during compilation it records, at specific positions, which stack slots and registers hold references. During scanning the collector can read this information directly and does not need to search every method area and execution context from scratch to find GC Roots.
Accurate garbage collection: if there is a 32-bit value 123456 in memory, the VM can tell whether it is a reference pointing to memory address 123456 or an integer with the value 123456. This is a prerequisite for determining, during garbage collection, whether data on the heap is still in use.
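The idea behind an OopMap can be illustrated with a small Java sketch. All names here (`OopMapSketch`, `referenceSlots`, bytecode indices as position keys) are my own invention, not HotSpot's internal API; the point is only that root enumeration becomes a lookup rather than a scan of every slot.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an OopMap: for a given position in the code (here, a bytecode
// index), it records which local-variable slots hold object references, so the
// collector can enumerate roots directly instead of scanning the whole frame.
public class OopMapSketch {
    // position -> slot indices that contain references at that position
    private final Map<Integer, List<Integer>> refSlotsAt = new HashMap<>();

    // Filled in once, when the class/method metadata is prepared.
    public void record(int position, List<Integer> referenceSlots) {
        refSlotsAt.put(position, referenceSlots);
    }

    // Root enumeration: a direct lookup, no scanning of non-reference slots.
    public List<Integer> referenceSlots(int position) {
        return refSlotsAt.getOrDefault(position, List.of());
    }

    public static void main(String[] args) {
        OopMapSketch map = new OopMapSketch();
        map.record(42, List.of(0, 3)); // at position 42, slots 0 and 3 hold refs
        System.out.println(map.referenceSlots(42)); // [0, 3]
    }
}
```

The real structure is of course far richer (it covers registers, is emitted by the JIT compiler, and is keyed by machine-code offsets), but the lookup-instead-of-scan principle is the same.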
Safepoints
With the help of OopMaps, HotSpot can enumerate GC Roots quickly and accurately, but a very real problem follows: a great many instructions can change reference relationships, that is, change the contents of an OopMap. Generating an OopMap for every instruction would require an enormous amount of extra storage, and the space cost of garbage collection would become unbearably high.
Indeed, HotSpot does not generate an OopMap for every instruction; as mentioned above, it only records the information at "specific locations", called safepoints. With safepoints, a user program cannot stop for garbage collection at arbitrary points in the instruction stream; instead it is forced to keep running until it reaches a safepoint.
Safepoints must be chosen neither so sparsely that the collector waits too long, nor so densely that the runtime overhead becomes excessive. The selection criterion is essentially whether a position lets the program execute for a long time: the most obvious examples are points of instruction-sequence reuse such as method calls, loop jumps, and exception jumps, and only instructions with these characteristics generate safepoints.
Another issue for safepoints is how to get all threads (except those executing JNI calls) to run to the nearest safepoint and stop there when garbage collection occurs. There are two options:
- Preemptive Suspension: preemptive suspension does not require the executing thread's code to cooperate. When garbage collection occurs, the system first interrupts all user threads; if a thread turns out to be stopped somewhere other than a safepoint, it is resumed and interrupted again until it reaches one. Almost no virtual machine implementation now uses preemptive suspension to pause threads for GC events.
- Voluntary Suspension: with voluntary suspension, when garbage collection needs to pause the threads, it does not operate on them directly; it simply sets a flag, and each thread actively polls this flag as it executes. Once a thread finds the flag set, it suspends itself at the nearest safepoint. The polling points coincide with the safepoints, plus every site that needs to allocate memory on the Java heap (such as object creation), so a thread can also check whether garbage collection is imminent because memory is running short.
Safe regions
The safepoint mechanism guarantees that an executing program will, within a short time, reach a safepoint at which it can enter the garbage collection process. But what about a program that is not executing?
"Not executing" means not being allocated processor time; the typical scenario is a user thread in the Sleeping or Blocked state. Such a thread cannot respond to the virtual machine's interrupt request and cannot walk to a safepoint to suspend itself, and the virtual machine obviously cannot just wait for the thread to be scheduled again.
In this case, you need to introduce a Safe Region.
A safe region is a stretch of code within which reference relationships are guaranteed not to change, so it is safe to start garbage collection anywhere inside it. We can think of a safe region as a safepoint that has been stretched out.
When a user thread is about to execute code in a safe region, it first declares that it has entered the safe region, so that during garbage collection the virtual machine need not concern itself with threads that have declared themselves inside one. When the thread wants to leave the safe region, it checks whether the virtual machine has completed root enumeration (or whatever other garbage collection phase requires pausing user threads). If so, the thread carries on as if nothing happened; otherwise it must wait until it receives a signal that it may leave the safe region.
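A minimal sketch of this protocol, again with invented names, and with the blocking wait replaced by a non-blocking "may I leave?" check for simplicity (a real VM would park the thread with wait/notify instead of making it retry):

```java
// Toy model of the safe-region protocol (names are mine, not HotSpot's).
// A thread declares itself inside a safe region before it blocks; the
// collector can then ignore it. On leaving, the thread must check whether a
// phase that pauses user threads (e.g. root enumeration) is still running.
public class SafeRegionSketch {
    private boolean inSafeRegion = false;          // per-thread in reality
    private boolean rootEnumerationRunning = false;

    synchronized void enterSafeRegion() { inSafeRegion = true; }

    // GC need not suspend a thread that declared itself in a safe region.
    synchronized boolean canIgnoreForRootEnumeration() { return inSafeRegion; }

    synchronized void beginRootEnumeration() { rootEnumerationRunning = true; }
    synchronized void endRootEnumeration()   { rootEnumerationRunning = false; }

    // True if the thread may leave now; a real VM would block the thread here
    // until it receives the "you may leave" signal.
    synchronized boolean tryLeaveSafeRegion() {
        if (rootEnumerationRunning) return false;  // must keep waiting
        inSafeRegion = false;
        return true;
    }

    public static void main(String[] args) {
        SafeRegionSketch t = new SafeRegionSketch();
        t.enterSafeRegion();                  // e.g. about to block in sleep()
        t.beginRootEnumeration();             // GC starts; ignores this thread
        System.out.println(t.tryLeaveSafeRegion()); // false: GC still running
        t.endRootEnumeration();               // GC done, signal sent
        System.out.println(t.tryLeaveSafeRegion()); // true: safe to leave
    }
}
```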
Remembered sets and card tables
This section needs some background first: the cross-generational reference problem occasionally asked about in interviews.
Objects that reference each other tend to live and die together. For example, if an old-generation object holds a cross-generational reference to a new-generation object, the old object is hard to kill, so the reference keeps the new object alive through collections; as it ages, the new object is eventually promoted into the old generation, and the cross-generational reference disappears.
When collecting the new generation (a Minor GC), some new-generation objects may be referenced by the old generation. To find all live objects in the collected region, the fixed GC Roots alone would not be enough; the entire old generation would also have to be traversed to guarantee correct reachability results, which would impose a huge performance burden on memory collection. To solve the cross-generational reference problem, we only need to build a global data structure called a "remembered set" on the new generation.
A remembered set is an abstract data structure that records the set of pointers from a non-collected region into the collected region.
The remembered set divides the old generation into small blocks and identifies which blocks contain cross-generational references; then, when a Minor GC occurs, only the objects in blocks containing cross-generational references are added to GC Roots for scanning.
With that background, let's look at how HotSpot implements remembered sets.
In a garbage collection scenario, the collector only needs to learn from the remembered set whether a non-collected region contains pointers into the collected region; it does not need to know every detail of those cross-generational pointers. When implementing a remembered set, the designer can therefore choose a coarser recording granularity to save on storage and maintenance cost. The available recording precisions include (but are not limited to):
- Word precision: each record is accurate to one machine word (the processor's addressing width, commonly 32 or 64 bits, which determines the size of a pointer) that contains a cross-generational pointer.
- Object precision: each record is accurate to one object, some field of which contains a cross-generational pointer.
- Card precision: each record is accurate to one area of memory that contains objects with cross-generational pointers.
The third option, card precision, is implemented with a structure called a Card Table, which is currently the most common way to implement a remembered set.
Note that the remembered set defined earlier is an "abstract" data structure: it defines the intended behaviour of a remembered set without specifying a concrete implementation. The card table is one concrete implementation, fixing both the recording precision and the mapping to heap memory.
The simplest card table can be just a byte array, and that is exactly what the HotSpot virtual machine uses. The following line is HotSpot's default card table marking logic:
CARD_TABLE[this address >> 9] = 0;
Each element of the byte array CARD_TABLE corresponds to a memory block of a specific size within the area it covers; this block is called a Card Page. The size of a card page is generally a power of two, and as the code above shows, HotSpot's card page is 2^9 = 512 bytes. Each card table element thus maps to one 512-byte card page in the heap.
A card page usually holds more than one object. As long as a field of one (or more) of the objects in a card page contains a cross-generational pointer, the corresponding card table element is marked dirty ("the element becomes Dirty"); otherwise it is clean. Conceptually you can think of dirty as 1 and clean as 0, though as the snippet above shows, HotSpot's implementation actually stores 0 for a dirty card. During garbage collection, filtering out the dirty elements of the card table makes it easy to determine which card pages contain cross-generational pointers; only those pages are added to GC Roots for scanning.
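The whole mechanism can be sketched in a few lines of Java. This is a toy model, not HotSpot code; following the snippet above, 0 is used to mean dirty, and the class and method names are invented.

```java
// Toy card table: one byte per 512-byte card page. Per HotSpot's convention,
// 0 means dirty; CLEAN is any non-zero value.
public class CardTableSketch {
    static final int CARD_SHIFT = 9;              // card page = 2^9 = 512 bytes
    static final byte CLEAN = 1, DIRTY = 0;

    final byte[] cardTable;

    CardTableSketch(long heapBytes) {
        cardTable = new byte[(int) (heapBytes >> CARD_SHIFT)];
        java.util.Arrays.fill(cardTable, CLEAN);
    }

    // Post-write barrier: mark the card covering `address` as dirty.
    void markDirty(long address) {
        cardTable[(int) (address >> CARD_SHIFT)] = DIRTY;
    }

    // Minor GC: collect the indices of dirty cards; only these card pages are
    // scanned for cross-generational pointers and added to GC Roots.
    java.util.List<Integer> dirtyCards() {
        java.util.List<Integer> dirty = new java.util.ArrayList<>();
        for (int i = 0; i < cardTable.length; i++) {
            if (cardTable[i] == DIRTY) dirty.add(i);
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch ct = new CardTableSketch(1 << 20); // 1 MB "old gen"
        ct.markDirty(513);       // falls in card 1 (bytes 512..1023)
        ct.markDirty(100_000);   // falls in card 195
        System.out.println(ct.dirtyCards()); // [1, 195]
    }
}
```

The payoff is visible in `dirtyCards()`: instead of scanning 1 MB of old generation, the collector scans two 512-byte pages.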
Write barriers
We have now solved the problem of using remembered sets to narrow the GC Roots scan, but not the problem of how card table elements are maintained: when do they become dirty, and who makes them dirty?
When a card table element becomes dirty is clear: whenever an object in another generation writes a reference to an object in this region, the corresponding card table element should become dirty. In principle, the dirtying should happen at the moment the reference-typed field is assigned.
So how is the card table updated and maintained at the moment the reference is assigned?
Under interpreted execution this is relatively easy to handle: the virtual machine executes every bytecode instruction itself, so there is plenty of room to intervene. But what about compiled execution? Just-in-time compiled code is a pure stream of machine instructions, so a machine-level mechanism is needed to weave card table maintenance into every assignment.
- Interpreted execution: program statements are interpreted and executed one by one by the interpreter, without being compiled into machine instructions; interpretation happens on every execution.
- Compiled execution: program statements are translated into machine instructions by the compiler and then executed; compilation only needs to happen once.
HotSpot maintains the card table state using Write Barriers. A write barrier can be seen as an AOP aspect, at the virtual machine level, around the action of "assigning a reference-typed field": when a reference is assigned, an around-style notification lets the program perform extra actions, meaning both sides of the assignment are covered by the write barrier. The part that runs before the assignment is called the pre-write barrier, and the part that runs after it is called the post-write barrier.
Many HotSpot collectors use write barriers, but until the G1 collector appeared, the others used only post-write barriers. The following code shows the simplified logic of updating the card table:
void oop_field_store(oop* field, oop new_value) {
    *field = new_value;                    // assign the reference field
    post_write_barrier(field, new_value);  // write barrier: update the card table
}
Once write barriers are applied, the virtual machine generates the corresponding extra instructions for every assignment operation. Once the collector puts a card table update into the write barrier, every reference update incurs extra overhead, whether or not the reference crosses generations. Still, this is far cheaper than scanning the entire old generation during a Minor GC.
In addition to the overhead of write barriers, card tables also face the problem of “False Sharing” in high concurrency scenarios.
False sharing is a problem that often has to be considered when dealing with low-level concurrency details. Modern CPU cache systems store data in units of cache lines; when multiple threads modify independent variables that happen to share one cache line, they interfere with each other (write-backs, invalidations, synchronization) and performance degrades.
Suppose the processor's cache line is 64 bytes. Since one card table element is 1 byte, 64 card table elements share the same cache line, and those 64 elements together cover 32KB of card pages (64 × 512 bytes). So if the objects updated by different threads happen to fall within the same 32KB region, updating the card table writes to the same cache line and hurts performance.
To avoid false sharing, a simple solution is not to use an unconditional write barrier, but to check the card table first and only mark an element dirty if it is not dirty already, turning the card table update logic into the following code:
if (CARD_TABLE[this address >> 9] != 0)
    CARD_TABLE[this address >> 9] = 0;
Since JDK 7, the HotSpot virtual machine has provided the parameter -XX:+UseCondCardMark to decide whether card table updates use this conditional check. Enabling it adds the cost of an extra comparison but avoids false sharing; both choices carry a performance cost, so whether to enable it should be decided by testing under the application's actual workload.
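The effect of the conditional check can be sketched as follows. This is a toy model with invented names; `cardTableWrites` simply counts stores to the card table, which is the traffic that causes false sharing.

```java
// Toy model of conditional card marking: check before writing, so repeated
// updates within an already-dirty card do not keep storing to the shared
// cache line.
public class CondCardMarkSketch {
    static final int CARD_SHIFT = 9;
    static final byte DIRTY = 0, CLEAN = 1;
    static final byte[] CARD_TABLE = new byte[2048];
    static long cardTableWrites = 0;   // counts stores to the card table

    static { java.util.Arrays.fill(CARD_TABLE, CLEAN); }

    // Unconditional barrier: one store per reference assignment.
    static void unconditionalMark(long address) {
        CARD_TABLE[(int) (address >> CARD_SHIFT)] = DIRTY;
        cardTableWrites++;
    }

    // Conditional barrier (-XX:+UseCondCardMark): an extra read, but at most
    // one store per card until the collector cleans it again.
    static void conditionalMark(long address) {
        int card = (int) (address >> CARD_SHIFT);
        if (CARD_TABLE[card] != DIRTY) {
            CARD_TABLE[card] = DIRTY;
            cardTableWrites++;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) conditionalMark(700); // same card page
        System.out.println(cardTableWrites); // only the first call stores
    }
}
```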
Concurrent reachability analysis
As we all know, garbage collectors in mainstream programming languages basically rely on reachability analysis algorithms to determine whether objects are alive or not. Reachability analysis algorithms theoretically require that the whole process of analysis be based on a consistent snapshot, which means that the user thread must be frozen throughout the operation.
In the root enumeration step, the pauses are already very short and relatively fixed (they do not grow with heap capacity), because GC Roots are few compared with all the objects in the Java heap, and thanks to optimization techniques such as OopMaps. But the next step, traversing the object graph from GC Roots, has pause times directly proportional to the size of the Java heap: the larger the heap, the more objects it stores, the more complex the object graph, and the longer the pause needed to mark them all.
Note that "marking" is a phase common to all tracing garbage collection algorithms. If this phase's pause time grows with the heap, the impact spreads to almost every garbage collector; conversely, if this part of the pause can be cut, the gains are just as systemic.
To remove or reduce this user-thread pause, we must first ask why the object graph has to be traversed on a consistent snapshot. To explain this clearly, we introduce tri-color marking as a reasoning aid: objects encountered while traversing the object graph are painted one of three colors according to whether they have been visited:
- White: the object has not yet been visited by the garbage collector. At the start of reachability analysis all objects are white; an object still white at the end of the analysis is unreachable.
- Black: the object has been visited by the garbage collector and all of its references have been scanned. A black object has been fully scanned and is safe to keep alive; if another reference later points at it, it does not need rescanning. A black object can never point directly to a white object (without a gray object in between).
- Gray: the object has been visited by the garbage collector, but at least one of its references has not yet been scanned.
If the user threads are frozen during the reachability scan and only the collector thread is working, there is no problem. But what if user threads run concurrently with the collector? The collector is painting colors on the object graph while the user threads are modifying reference relationships, and this can go wrong in two ways. One is marking a dead object as alive; that is not ideal but tolerable, producing a bit of floating garbage that escapes the current collection and gets cleaned up next time. The other is marking an object that should be alive as dead. That is fatal, and the program is bound to break because of it.
The "object disappearance" problem — an object that should be black being mistakenly left white — occurs if and only if both of the following conditions are met:
- The mutator inserts one or more new references from black objects to the white object;
- The mutator removes all direct and indirect references from gray objects to that white object.
Therefore, to prevent objects from disappearing during a concurrent scan, we only need to break either of these two conditions. This leads to two solutions: incremental update and the original snapshot (snapshot at the beginning, SATB).
- Incremental update: breaks the first condition. When a new reference from a black object to a white object is inserted, that reference is recorded; after the concurrent scan finishes, the black objects in the recorded references are rescanned. Simplified: a black object turns gray again as soon as a new reference to a white object is inserted into it.
- Original snapshot: breaks the second condition. When a gray object is about to delete a reference to a white object, the reference to be deleted is recorded; after the concurrent scan finishes, the gray objects in those recorded references are rescanned. Simplified: regardless of whether references are deleted, the search proceeds over the snapshot of the object graph as it was at the moment the scan began.
Whether the records are of inserted or of deleted references, the virtual machine implements the recording through write barriers. Both solutions have practical applications in the HotSpot virtual machine: CMS, for example, performs concurrent marking based on incremental update, while G1 and Shenandoah are implemented using the original snapshot.
conclusion
This article has briefly introduced how the HotSpot virtual machine initiates memory collection, how it speeds collection up, and how it ensures collection is performed correctly. If you are interested in the details of how a virtual machine actually performs a collection, you can dig in yourself: the details depend on which garbage collector the virtual machine uses, and a single virtual machine often ships with several.