“This is the fifth day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”
\
Garbage collection algorithm – three – color tag details
The problem of concurrent markup
The CMS garbage collection algorithm uses three-color markup, which is illustrated with the CMS garbage collection example. The CMS garbage collection process is as follows:
There are 5 steps: initial mark, concurrent mark, re-mark, concurrent cleanup (including: concurrent cleanup, thread reset). Both The initial tag and re-tag Stop The World. During concurrent marking, because the application thread continues to run during the marking, references between objects may change, and multiple marks and missing marks may occur.
Two, what will be multi – mark – floating garbage?
Under what circumstances return multi-mark? To analyze the multi-index case. The diagram below:
In the process of concurrent marking, we find all GC roots from the stack, so we find the Math object, and at this point, the Math object has a space in the heap that is also swimming pool, which means that they are not garbage. However, while the application thread continues to execute, it is possible that the Math object is no longer referenced, and Math becomes garbage, but the heap space is not marked as garbage, so it is not collected when it is collected. That’s the multi-object case.
What are the consequences of multiple bids? Is to generate floating garbage.
What do you do when you have multiple targets? In fact, you don’t have to do anything special, wait for the next garbage meeting, re-mark, this space will be recycled.
Floating garbage: During concurrent marking, a part of the local variable (GC Root) is destroyed due to the end of the method, and the object referenced by GC Root was scanned by the garbage collector and marked as non-garbage, so the current GC does not reclaim this part of the memory. This part of memory that should be collected but is not is called “floating garbage.”
Floating garbage does not affect the correctness of the garbage collection, it just has to wait until the next garbage collection to be removed. In addition, for new objects created after the start of concurrent markup (and concurrent cleanup), it is common practice to treat them all as black and not clean them in this round. This part of the object may also become garbage in the meantime, which is also part of the floating garbage.
Three, what will be less mark missed mark – three color mark?
To deal with multiple and missing tokens, we introduced “tricolor tokens,” which fall into three categories of objects encountered during traversing object tokens GC Root through reachability analysis. These three objects are marked with different colors: “black”, “gray”, and “white”. What do they mean?
- Black: Indicates that the object has been accessed by the garbage collector and that all references to the object have been scanned. The black object indicates that it has been scanned. It is a safe and viable object. If other object references the black object, you do not need to scan again. A black object cannot point directly to a white object (without passing the gray object).
- Gray: Indicates that the object has been accessed by the garbage collector, but at least one reference to the object has not been scanned.
- White: indicates that the object has not been accessed by the garbage collector. Obviously, at the beginning of the reachability analysis, all objects are white. If at the end of the analysis, all objects are still white, that is, they are unreachable.
The following is an example to analyze the color of the object.
public class ThreeColorRemark { public static void main(String[] args) { A a = new A(); D dd = A.B.D; a.b.d = null; a.d = dd; } } class A { B b = new B(); D d = null; } class B { C c = new C(); D d = new D(); } class C { } class D { }Copy the code
There are four objects in here, A, B, C and D. In the main method, we first new an A object. The a object is a GC Root and will be marked as GC Root when initially marked. Suppose, when we enter the concurrency phase, we have just finished executing A A = new A(); What color should A be?
Analyzing the code above, we need to freeze time.
A A = new A(); B b = new B(); D d = null; C c = new C(); D D = new D(); Phase.
Let’s assume an extreme case here. The two member variables b and D in object A, which execute b first, point to the address of new b () in the heap. D does not refer to any object, so no instantiation is required. So the two member variables in object A are all iterated over, so object A will be marked black. Black means that the garbage collector scans this object and all reference objects in this object, and it is clear that object A is not garbage at this point. It won’t be recycled.
When the execution b object refers to the new b () object, there are two reference objects in the B object, respectively c and D. So let’s say we’re at C C = new C(); The c object points to a reference to new C() in the heap. So at that moment, that’s C C = new C(); D D = new D(); B is gray at the time of this code. Gray indicates that the object has been scanned by the garbage collector, but not all references in it have been scanned. In this case, the object should be the target of the next scan and cannot be collected. C is black, because there are no objects in C, it has been scanned completely.
At the same time (execute C C = new C(); D D = new D(); D = new D(); D = new D(); So D is white, which means it hasn’t been scanned yet. All objects are white in the beginning, that is, A, B, C, and D are all white. After the garbage collector has scanned all objects, some objects are still white, which means the garbage collector can’t scan them, which is garbage and will be collected.
Let’s look at the state of each object in time freeze.
Note that the above is frozen at a certain point in the GC process. The whole GC doesn’t end, so B is gray and D is white just for that frozen moment.
Summary: Black indicates that the GC has been parsed, gray indicates that it has not been parsed, and white indicates that it has not been parsed. When all the GC is done and there are still objects that are white, these objects are unreachable objects and are the objects to recycle.
At this point, another line of code is executed. a.b.d=null; That’s when object A adds a reference to d. Object B’s reference to D is broken. The diagram below:
What happens at this point? When the garbage collector scans, black object A will not be scanned again, and the target object to be scanned again is gray object B. At this point, B no longer refers to D, so all objects of B have been scanned and turned black. And d? D is still referenced by A, but the garbage collector won’t scan for black objects, so it won’t know that D is still referenced by A. At this point, D will remain a white object and will remain a white object and will not be scanned by the garbage collector. In this case, D will be removed as garbage. D is not a garbage object. This is an error deletion. This would have happened in earlier versions of the JVM, but it rarely happens now.
In fact, this missing mark problem can also be solved by code:
D dd = A.B.D; a.b.d = null; a.d = dd;Copy the code
First, we take A.B.D out and assign it to a new object, then remove the reference from a.b to d and set a.d=d. This way d is not recycled as a garbage object, because there is a root object that refers to object D.
The above is addressed at the code level. Is there a way to solve this missing label problem from the underlying JVM?
Solve the problem of missing marks from the underlying JVM
Missing marks can cause the referenced object to be cleaned up as garbage, which can cause serious bugs. There are two main solutions to this problem:
- One is Incremental Update.
- The other is Snapshot At The Beginning (SATB).
4.1 Incremental Update
Incremental updates, by their name, handle new references. Here’s the definition:
When a black object inserts a new reference to a white object, the newly inserted reference is recorded. After the concurrent scan, the black object in these recorded reference relationships is scanned again. This can be simplified to say that a black object becomes a gray object as soon as a new reference to a white object is inserted.
Object that is to say, in a black added a pointer to a white object reference, this reference will be recorded, there will be a collection, designed to put a black object added to white object reference, does not process in concurrent markers, such as concurrent markers, enter re-mark phase, to mark process will Stop The World, In processing the collection, rescan the black objects in the reference relationship as the root. This time, the white objects will become black objects or gray objects and will not be garbage collected.
4.2 Original Snapshot
Original snapshot, not new object processing, but the original object processing, let’s look at the definition:
When the gray object wants to delete the reference relation to the white object, the reference to be deleted will be recorded. After the concurrent scanning, the gray object in these recorded reference relation will be scanned again, so that the white object can be scanned. By marking white objects as black (so that they can survive the current GC cleanup and be rescan in the next GC, which may also be floating garbage), the virtual machine records are recorded through write barriers, whether they are inserted or deleted from reference records.
Take a look at this illustration:
When a reference from object B to object D is deleted, a snapshot of the reference to be deleted is saved and placed in the collection. How is the reference from figure B to D cleared? An assignment is made:
a.b.d = null;
Copy the code
That is, when the assignment is performed, the assignment is paused, the write barrier operation is performed, the reference to be deleted is extracted, stored in a collection, and the assignment is performed again. Then, the next time you re-mark the collection, rescan the gray objects in these reference relationships as roots, so that you can scan for white objects and mark all of them as black objects. The object marked black is intended to survive the current garbage collection and be rescanded for the next gc, which may be floating garbage.
4.3 write barriers
Both incremental updates and raw snapshots are implemented through write barriers.
Incremental updates and original snapshots are both operations on references, one to add and one to delete, and either way, they are eventually collected into the collection. So how do you collect them? It’s just throwing a reference into the collection before or after an assignment. Doing something before or after an assignment is called the operation barrier of the code.
For example, to assign a value to a member variable of an object, the underlying code looks something like this:
/** * @param field Member variable of an object, such as A.B.D * @param new_value new value, Null */ voidoop_field_store(oop*field,oopnew_value){*field = new_value; // Assignment}Copy the code
Write barriers refer to some processing before and after assignment (see AOP’s concept):
voidoop_field_store(oop*field,oopnew_value){ pre_write_barrier(field); // Write barrier ‐ pre-write operation *field = new_value; post_write_barrier(field, value); // write barrier ‐ post operation}Copy the code
- Write barriers implement raw snapshots
The original snapshot is the deletion of a reference from the record. For example, when executing A.B.D =null, write barrier is used to record the reference object D of the original B member variable:
Void pre_write_barrier(oop*field){oop old_value = *field; // Get the old value remark_set.add(old_value); // Record the original reference object}Copy the code
- Write barriers implement incremental updates
When the reference of the member variable of object A changes, such as the new reference (a.d = d), we can use write barrier to record the reference of the new member variable of object A to object D:
void post_write_barrier(oop*field,oopnew_value){ remark_set.add(new_value); // Record the newly referenced object}Copy the code
These are both barrier code, one executed before writing and one executed after writing. Delete is performed before write and assignment is performed after write.
How does hotspot source code implement the write barrier? Find oop. Inline-hpp
*/ template <class T> inline void oop_store(volatile T* p, oop v) {update_barrier_set_pre((T*)p, v); // cast away volatile // Used by release_obj_field_put, so use release_store_ptr. oopDesc::release_encode_store_heap_oop(p, v); update_barrier_set((void*)p, v); // cast away type }Copy the code
This is an assignment. update_barrier_set_pre((T )p, v); Is a write barrier, update_barrier_set((void)p, v); It’s a post-write barrier. That is, an operation code is added before and after the assignment. In fact, you can see that this code is very similar to our pseudocode. The name is different, but the meaning is the same.
Take a look at how SATB implements write barriers in hotspot source code.
void G1SATBCardTableModRefBS::enqueue(oop pre_val) {
// Nulls should have been already filtered.
assert(pre_val->is_oop(true), "Error");
if (!JavaThread::satb_mark_queue_set().is_active()) return;
Thread* thr = Thread::current();
if (thr->is_Java_thread()) {
JavaThread* jt = (JavaThread*)thr;
jt->satb_mark_queue().enqueue(pre_val);
} else {
MutexLockerEx x(Shared_SATB_Q_lock, Mutex::_no_safepoint_check_flag);
JavaThread::satb_mark_queue_set().shared_satb_queue()->enqueue(pre_val);
}
}
Copy the code
Satb_mark_queue_set ().shared_satb_queue()->enqueue(pre_val); Put the old value into the queue. Why would you put it in a queue? For efficiency. Since it is a write operation, adding logic before and after the write operation will affect the efficiency of the original code. In order to avoid the impact on the source code, it is put into a queue for processing.
4.4 read barrier
oopoop_field_load(oop*field){ pre_load_barrier(field); ‐ return *field; }Copy the code
D = A.B.D; D = A.B.D; D = A.B.D;
voidpre_load_barrier(oop*field){ oop old_value = *field; remark_set.add(old_value); // Record the read object}Copy the code
Modern traceable (reachable) garbage collectors almost all borrow the algorithmic idea of tricolor tagging, Although this is implemented in different ways: white/black collections are not generally present (but there are other colors), gray collections can be implemented in stacks/queues/cache logs, traversal in breadth/depth, and so on.
Five, all kinds of garbage collector to deal with the missing mark
For read/write barriers, using Java HotSpot VM as an example, the following is the solution to handle missing flags when marking concurrently:
- CMS: Write barrier + incremental update is used
- G1: Write barrier + Raw juice Snapshot (SATB)
- ZGC: The read barrier is used
In engineering implementation, read/write barriers have other functions, such as write barriers to record cross-generation/region reference changes, read barriers to support concurrent execution of moving objects, etc. Beyond functionality, there are performance considerations, so each garbage collector has its own idea of which one to choose.