Welcome to github.com/hsfxuebao/j… , hope to help you, if you feel ok, please click on the Star

preface

This paper mainly introduces the basic idea of tricolor labeling, floating garbage caused by multi-labeling, and the treatment scheme of missing labeling (read-write barrier).

1. A brief review of garbage collection

There are basically several garbage collection algorithms: mark-clean, mark-copy, and mark-tidy. On this basis, generations (new generation/old generation) can be added, and different recycling algorithms are adopted for each generation to improve the overall distribution and recycling efficiency. No matter which algorithm is used, tagging is always a necessary step. This is the adjustment of course, you do not find the garbage, how to recycle?

The garbage collector works as follows:

  1. Mark which objects are alive and which are garbage (recyclable);
  2. Recycle (clean/copy/collate) and update references if objects have been moved (copy/collate).

This article focuses on the markup section.

2. Three-color marking method

2.1 Basic Algorithm

To find the living object, according to the accessibility analysis, the traversal visit starts from GC Roots, and the reachable objects are the living objects:

Final result: A/D/E/F/G reachable

We mark the objects encountered in the process of traversing the object graph in the following three colors according to the condition of “whether they have been visited” :

  • White: not yet visited.
  • Black: This object has been accessed, and all other objects referenced by this object have also been accessed.
  • Gray: This object has been accessed, but other objects referenced by this object have not been accessed. When all access is complete, it turns black.

The first step:

The second step:

Step 3:

Step 4:

Step 5:

Three-color marking traversal process

Suppose we now have white, gray, and black (representing the color of the current object), and the traversal access process is as follows:

  1. Initially, all objects are in the White set;
  2. Move objects directly referenced by GC Roots to the grey set.
  3. Get objects from grey set: 3.1. Move all other objects referenced by this object to [Grey set]; 3.2. Move this object into the Black Collection.
  4. Repeat step 3 until [grey set] is empty.
  5. After the end, objects still in the [white set] are unreachable by GC Roots and can be recycled.

Note: If the object remains white after the marking ends, it is “lost” and cannot be re-referenced.

When Stop The World (hereinafter referred to as STW), references between objects do not change and can be easily marked. When you need to support concurrent tagging, where application threads continue to run during tagging, references between objects can change, and multiple markers and missing markers can occur.

2.2 Multi-label – Floating garbage

Objd. fieldE = null if E is already iterated (turned grey) :

The reference of D > E is broken

After this point, object E/F/G is “supposed” to be recycled. However, since E has turned gray, it will continue to be traversed as a living object. The end result is that this part of the object is still marked as alive, meaning that this GC does not reclaim this part of the memory.

This part of memory that should be collected but is not is called “floating garbage.” Floating garbage does not affect the correctness of the application, it just needs to wait until the next garbage collection to be removed.

In addition, it is common practice to treat all new objects after the concurrent marking starts as black, and this round will not clean them up. This part of the object may become garbage during this period, which is also part of the floating garbage.

2.3 Mark leakage – Read/write barrier

Assuming that the GC thread has traversed E, the application line executes first:

var G = objE.fieldG; objE.fieldG = null; Objd. fieldG = G; // Black D references white GCopy the code

E > G is disconnected, and D references G

At this point, we cut back to the GC thread and continue running. Since E has no reference to G, we do not place G in the grey set. Although G is rereferenced for D, because D is already black, the traversal will not be done again. The end result is that G stays in the white collection and is eventually removed as garbage. This directly affects the correctness of the application and is unacceptable.

It is not difficult to analyze that missing marks can occur only when the following two conditions are met: Condition 1: the gray object disconnects the reference of the white object (direct or indirect reference); That is, the reference to the original member variable of the gray object has changed. Condition 2: The black object re-references the white object. That is, the black object member variable adds a new reference.

From a code perspective:

var G = objE.fieldG; Obje. fieldG = null; Objd. fieldG = G; / / 3. WriteCopy the code
  1. Read a reference to the member variable fieldG of object E, that is, object G;
  2. Object E writes null to its member variable fieldG.
  3. Object D writes object G to its member variable fieldG;

We just need to do some manipulation in any of the three steps above, record object G, and then iterate over it as a gray object. For example, put it into a particular collection and wait for the initial GC Roots to traverse (concurrent marking), then the objects of the collection will traverse (relabelling).

STW is needed for relabelling, because if the application keeps running, the collection may keep adding new objects and never run out. Of course, it is also possible to run most of the collection first during concurrent tagging, thus shortening the time to re-tag the STW, which is an optimization problem.

Write barriers are used to intercept the second and third steps; The reading barrier is the first step in interception. The purpose of their interception is simple: to record object G before and after reading and writing.

2.3.1 Store Barrier

When assigning a value to a member variable of an object, the underlying code looks something like this:

/** * @param field Member variable of an object, such as d.fildg * @param new_value new value, Void oop_field_store(oop* field, oop new_value) {*field = new_value; // Assignment}Copy the code

Write barriers refer to some processing before and after assignment (see AOP’s concept) :

void oop_field_store(oop* field, oop new_value) { pre_write_barrier(field); *field = new_value; post_write_barrier(field, value); // write barrier - write after operation}Copy the code

(1) Write barrier + SATB

When a reference to a member variable of object E changes (obje.fieldg = null;) , we can use write barrier to record the reference object G of the original member variable of E:

void pre_write_barrier(oop* field) { oop old_value = *field; // Get the old value remark_set.add(old_value); // Record the original reference object}Copy the code

Before the reference to the original member variable changes, record the original reference object. Try to preserve The initial object map, i.e. Snapshot At The Beginning (SATB), when GC Roots At a certain moment are determined, The object map At that moment is already determined. For example, if D refers to G at that time, then subsequent markers should follow the object graph at that time (D refers to G). If there is a change during that time, it can be recorded to ensure that the tag still looks like the original view.

It is worth mentioning that the operation of scanning all GC Roots (the initial tag) usually requires a STW, otherwise it may never be finished, as new GC Roots may be added during concurrency.

SATB breaks condition 1: [the gray object disconnects the reference to the white object], thus ensuring that the label will not be missed.

A quick tip: If you are not in the concurrent marking phase of garbage collection, or have already been marked, there is no need to record, so a simple judgment can be added:

Void pre_write_barrier(OOP * field) {$gc_phase == GC_CONCURRENT_MARK &&! isMarkd(field)) { oop old_value = *field; // Get the old value remark_set.add(old_value); // Record the original reference object}}Copy the code

(2) Write barrier + incremental update

When a reference to a member variable of object D changes (objd.fieldg = G;) , we can use write barrier to record D’s new member variable reference object G:

void post_write_barrier(oop* field, oop new_value) { if($gc_phase == GC_CONCURRENT_MARK && ! isMarkd(field)) { remark_set.add(new_value); // Record the newly referenced object}}Copy the code

The idea is that instead of keeping the original snapshot, new references are logged for iteration, known as Incremental Update.

Incremental updates break condition two: the black object re-references the white object, thus ensuring that no labels are missed.

2.3.2 Load Barrier

oop oop_field_load(oop* field) { pre_load_barrier(field); Return *field; }Copy the code

The read barrier is directed at the first step: var G = obje.fieldg; When reading a member variable, record it:

void pre_load_barrier(oop* field, oop old_value) { if($gc_phase == GC_CONCURRENT_MARK && ! isMarkd(field)) { oop old_value = *field; remark_set.add(old_value); // Record the read object}}Copy the code

This is conservative, but safe. Since the black object re-references the white object in condition 2, the premise of the re-reference is that the white object is obtained, at which point the read barrier is in effect.

2.4 Tricolor labeling and modern garbage collectors

Accessibility of modern track type (analysis) of the garbage collector almost all learning algorithm thought from three color marker, although different implementation: such as white/black collection generally won’t appear, but there are other reflected color), gray collection can be through the stack/queue/cache in the form of a log and implementation, traversal method can be a breadth traversal/depth and so on.

For read/write barriers, using Java HotSpot VM as an example, the following is the solution to handle missing flags when marking concurrently:

  • CMS: Write barrier + incremental update
  • G1: Write barrier + SATB
  • ZGC: Read barrier

In engineering implementation, read/write barriers have other functions, such as write barriers to record cross-generation/range reference changes, read barriers to support concurrent execution of moving objects, etc. Beyond functionality, there are performance considerations, so each garbage collector has its own idea of which one to choose.

It is worth noting that incremental updates used in CMS also require rescan of GC Roots during the relabelling phase, in addition to traversing the write barrier records. This is because CMS does not add write barriers for instructions such as astore_x. For details, see here.

The resources

The design and implementation of a new generation of garbage collector (ZGC) in Java Virtual Machine (Java Virtual Machine)