This is the 35th original article in why Technology

The picture above was taken in a hutong near the Drum Tower when I was still drifting north.

It had just rained that day. When I passed this place, I was attracted by the colorful door panels and bicycles for a moment, so I took this picture. When I see this picture I am very happy, more vivid, more comfortable picture ah.

In future articles, my first photo will be taken by myself at any time. Share life, share technology, lol.

All right, back to the article.

In this article, we talk about the JVM. The JVM is a must have skill for an interview. It’s on the resume. Ask questions. It’s not on the resume, but it should be mentioned.

Let’s start with a simple warm-up question that leads to what we want to share in this article.

When it comes to the JVM part of the interview, the interviewer will most likely ask you how the JVM determines which objects should be recycled.

This kind of classic interview question certainly difficult you.

You can blurt out the reference counting algorithm and the reachability analysis algorithm.

And then you stopped? Don’t you know that after you’ve answered one sentence, the interviewer is sure to ask you to elaborate? So don’t stop. Be proactive. Be proactive in the interview. You want to take advantage of the valuable opportunity that the interviewer has to give you the right to speak, go ahead, you have to stand up

Because reference counting works like this: You add a reference counter to the object, incrementing it by one every time a place references it; When the reference is invalidated, the counter value is subtracted by one; An object whose counter is zero at any time is unlikely to be used again.

But there’s a problem with this algorithm. What is it?

Ask yourself a question and answer yourself. Let the interviewer listen to leng leng leng.

It just doesn’t solve the circular dependency problem.

With the paper and pen I prepared, I quickly drew the following diagram:

Object 1 and Object 2 can both be collected, but they also refer to each other, so their respective counters are 1, and they will not be collected.

Therefore, the Java virtual Machine does not use reference counting. It uses the reachability analysis algorithm.

The idea of the accessibility analysis algorithm is to use a series of “GC Roots”, that is, the root object is taken as the starting node set. Starting from the root node, search downward according to the reference relationship. The path taken in the search process is called reference chain.

In graph theory terms, when the object is unreachable from GC Roots, it proves that the object can no longer be used. So this object is an object that can be reclaimed.

When you say this, again, quickly draw the following picture on the paper:

Good, here can give the voice to the interviewer. Because at this point, there are so many points he can ask next, you don’t know what he’s going to ask, like:

You just talked about the root node, do you know which objects can be the root object?

You just talked about references, do you know what kinds of references are in Java?

Student: You just talked about the reachability analysis algorithm, so if an object is determined to be unreachable in that algorithm, does it have to be reclaimed?

Talk about a garbage collector you are familiar with and how they work.

.

These questions are so common that there are several of them in any noodle.

This article addresses the slightly less common, but sure to mention, “concurrent marking” and “floating garbage.”

CMS and G1 both have a process of concurrent tagging. What problem does concurrent tagging solve? What’s the problem? How to solve these problems?

What problem does a concurrent tag solve?

The reachability analysis algorithm we just talked about requires a theoretical premise: the entire algorithm needs to be based on a consistent snapshot, which means that the user thread must be frozen throughout.

In order not to freeze the user thread, we need to run the garbage collection thread and the user thread at the same time.

So let’s prove it by contradiction, assuming that there is no concurrent flag, that is, only the garbage collection thread is running in the process:

The first step is to find the root node, also known as the root node enumeration.

In this process, since GC Roots are much smaller than the total number of objects in the Java heap, and thanks to optimization techniques like OopMap, the pause time is very short and relatively constant, meaning that it does not increase as the number of objects in the heap increases. That’s roughly what this picture means:

But we’re done with the root node enumeration, just the first step. Next, we need to continue the process of “marking” through the object graph from GC Roots. The pause time of this step necessarily increases as the number of objects in the Java heap increases. That’s roughly what this picture means:

The logic is not complicated: the heap is larger, the more objects you store, the more complex the object graph structure is, and the more objects you have to mark, so the pause time is naturally longer.

So, from the above analysis, we know that the root node enumeration phase is not very time consuming and does not increase as the number of objects stored in the Java heap increases. The time of the markup process increases as the number of objects stored in the Java heap increases.

The “tag” phase is a phase that is common to all garbage collectors that use reachability analysis algorithms. We can therefore see that if the pause time in this part of the “marking” process can be reduced, the benefits will be substantial.

So what problem do concurrent tags solve?

This is the part of the pause time to reduce. That is to have the garbage collector and the user thread running and working concurrently. This is what we call the phase of concurrent markup.

What’s the problem with concurrent markup?

Before we talk about what the problem is, let’s get one thing straight:

Why do you have to traverse an object graph in a snapshot to ensure consistency?

And to illustrate that, we’re going to introduce the trichromatic notation. Note: “tricolor” is also a test for the JVM.

What is “tricolor”? In Understanding the Java Virtual Machine (Version 3), this is how it is described:

In the process of traversing the object graph, the objects that have access to the capital are marked with the following three colors according to the condition of “have they been accessed” :

White: indicates that the object has not been accessed by the garbage collector. Obviously, at the beginning of the reachability analysis, all the objects are white. If the objects are still white at the end of the analysis, they represent unreachable.

Black: indicates that the object has been accessed by the garbage collector and that all references to the object have been scanned. The black object indicates that it has been scanned. It is safe to survive. If there is another object reference to the black object, it does not need to be scanned again. It is not possible for a black object to directly point to a white object (without passing a gray object).

Gray: Indicates that the object has been accessed by the garbage collector, but there is at least one reference to the object that has not been scanned.

Read the above description and then sample the following picture:

As you can see, the gray object is the intermediate state between the black object and the white object. When the marking process is over, there are only black and white objects, and the white objects are the ones that need to be reclaimed.

If only the garbage collection thread was working during the scan of the reachability analysis, there would be no problem.

But the garbage collector and the user thread running at the same time? That’s when it gets interesting.

The garbage collector colors the object graph while the user thread is changing the reference. If the reference changes, the object graph changes. This can have one of two consequences:

One is to erroneously mark a dead object as alive, which is not a good thing, but is actually tolerable, producing a little floating garbage that escapes this collection and can be cleaned up next time.

One is to mistakenly mark a living object as dead. This is a very serious consequence. An object that a program still needs to use is recycled.

When an interviewer asks you why there is floating garbage, you can use these words to answer the question.

But in most cases the interviewer should be more concerned about the second.

He might ask: What about the second case you just said, “incorrectly marking a living object as dead”? How did it die? How does the garbage collector solve this problem?

So next, let’s focus on the problem of “object disappearance” during concurrent markup. I don’t know how the object is missing.

Here is an example from Understanding the Java Virtual Machine (3rd Edition), but the description of the 3rd edition example is not particularly easy to understand, so I will try to make it as clear as I can. The following three scenarios of markup will be analyzed in combination with giFs:

Normal tag

Let’s take a look at the normal marking process:

The first is the initial state, very simple, only the GC Roots are black. Note also that the arrow directions in the following images are directed, for example, one of the reference chains is: root node ->5->6->7->8->11->10

Over the course of the scan, the changes look like this:

Inner OS: God knows how hard I had to work to make the following giFs, and to fit every one of them into a pixel size, blind my titanium dog eyes.

If you look at the GIF above, the gray object is always somewhere between black and white. When the scan is complete, the object graph looks like this:

At this point, the black object is a living object and the white object is a dead object that can be reclaimed.

Remember, this is a normal situation where everything is wonderful.

Case one where the object disappears

Next, let’s look at the case where the object disappears:

If the user thread modifies the reference while marking, the following happens:

When the scan is complete, the object graph looks like this:

At this time, we can clearly see that after the scan is completed, object 9, which was originally referenced by object 5, will be garbage collected according to the three-color marking principle because it is a white object.

This is where the object disappears.

Object message case two

Here’s another “disappearing object” phenomenon:

As demonstrated above, objects that are re-referenced by the black object after the user thread has severed the reference are part of the original reference chain.

Objects 7 and 10 are originally part of the original reference chain (root node ->5->6->7->8->11->10). The modified reference chain becomes (root node ->5->6->7->10).

When the scan is complete, the object graph looks like this:

Since black objects are not rescanned, this will cause objects 10 and 11 to be reclaimed when the scan is complete. They were all part of the original reference chain before they were modified.

So, back to the original question: What’s the problem with concurrent markup?

After analyzing the giF-image of the above three cases (one normal case and two “missing objects” cases) and comparing with the final object graph after scanning, we know that concurrent marking will not only generate floating garbage, but also cause the problem of “disappearing objects”.

How do we solve the object disappearance problem?

One of the big names, Wilson, theoretically proved in 1994 that the problem of “disappearing objects” occurs if and only if the following two conditions are met: objects that should be black are mislabeled as white:

Condition 1: The assignor inserts one or more new references from the black object to the white object.

Condition two: The assigner removes all direct or indirect references from the gray object to the white object.

If and only if, are these two conditions related to each other in combination with the diagram we have shown above:

The reference between black object 5 and white object 9 is new and corresponds to condition 1.

The reference between black object 6 and white object 9 is removed, corresponding to condition 2.

Because the relationship between the two conditions is if and only if. So, to solve the problem of objects disappearing in concurrent markup, we only need to break one of the two conditions.

Two solutions emerged: Incremental Update and Snapshot At The Beginning (SATB).

In the HotSpot virtual machine, CMS does concurrent marking based on incremental updates, while G1 uses raw snapshots.

What is an incremental update?

Incremental updates to destruction is the first condition (assignment by inserting one or more new reference object from black to white objects, when the black object insert a new white object references, with the newly inserted record reference, such as concurrent scanning is done, then these records of the dark object references as the root, to scan again.

A simple way to think about it is that the black object becomes gray once a reference to the white object is inserted.

The following figure shows that after a concurrent scan, the black object 5 is pointing to the white object 9:

So object 9 is scanned as black again. It will not be collected, so objects will not disappear.

What is a raw snapshot?

Original snapshot is to destroy the second condition (assignment deleted from the gray object to all this white object reference) directly or indirectly, when grey object to remove points to the white object reference relations, will delete the record reference, at the end of the concurrent scan then these records of grey objects in the reference relationship as the root, to scan again.

This can be simplified as follows: regardless of whether the reference relationship is deleted or not, the object graph snapshot is searched at the moment the scan started.

It is important to note that both the insert and delete of a reference record in the previous section are performed using a write barrier. Write barriers are also an important point, but they are not the focus of this article and will not be covered in detail.

Just two points to add:

1. Write barriers are not the same thing as “memory barriers” that are often used to solve concurrent out-of-order execution problems.

2. Write barriers can be viewed as the VIRTUAL machine-level AOP aspect of assigning a value to a reference type field. When assigning a value to a reference object, a circular notification is generated for the program to perform additional actions, i.e. the assignment is covered by the write barrier before and after the assignment. The part before assignment is called a pre-write Barrier, and the part after assignment is called a post-write Barrier.

Therefore, after a simple derivation, we can know:

Incremental updates use a post-write Barrier to record all new references.

The original snapshot uses a pre-write Barrier to record all the old references of reference relationships that are to be deleted.

Last word (attention)

1. Most of the content of this article comes from “Understanding the Java Virtual Machine (version 3)”, but I think it is difficult to describe this point with three colors only using pictures and text, so I added my own understanding to make a GIF. It is highly recommended that you read the relevant content of the third edition first, if you do not understand, then read this article, should be able to better understand.

2. I’ve been getting a lot of requests from readers for resume revision and job advice, so I know it’s time to start hiring again.

In fact, I am not qualified to modify your resume, and I am not a skilled person. I just share what I know, which not only enables me to consolidate my knowledge, but also forces me to input knowledge. In addition, I can help you a little bit, which is the full value of my article.

Also, if you are going through spring or social recruitment, you can read my previous article to see if it helps a little:

“Thoughts after Interviewing 15 2020 Graduate Students from 985/211 Universities”

If you find something wrong, please leave me a message to point it out so that I can modify it.

Update: An error is described in the article, see the link for more details

If you think the article is good, your retweet, share, praise, like, comment is the biggest encouragement to me.

Thank you for your reading, I insist on the original, very welcome and thank you for your attention.

The above.

Welcome to pay attention to the public number [Why Technology], adhere to the output of original. Share technology, taste life, wish you and I progress together.