This article is participating in the Java Theme Month – Java Debug Notes Event.
Hello, I’m Why.
I don’t know if you remember, but I wrote an article about a technical problem that has been bothering me for 122 days, and I think I know the answer.
I gave an example like this:
import java.util.concurrent.TimeUnit;

public class VolatileExample {
    private static boolean flag = false;
    private static int i = 0;

    public static void main(String[] args) {
        new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
                flag = true;
                System.out.println("Flag changed to true");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        while (!flag) {
            i++;
        }
        System.out.println("End of program, i = " + i);
    }
}
The above program will not terminate normally, because flag is not modified by volatile.
More specifically: during the 100 ms the child thread sleeps, the flag read by the while loop is always false. Once the loop has run enough iterations, the JVM's JIT is triggered and hoists the loop expression, so flag is read only once, outside the loop, resulting in an infinite loop.
If volatile is used to modify the flag variable, however, its visibility is guaranteed and the hoisting will not happen.
To verify this, disable the JIT by running with -Xint or -Djava.compiler=NONE.
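To make the hoisting concrete, here is a sketch in plain Java of the transformation the JIT is allowed to apply. This is illustrative only: the JIT works on machine code, and the method names here are my own.

```java
public class HoistSketch {
    static boolean flag = false;
    static int i = 0;

    // What the source says: re-read flag on every iteration.
    static void original() {
        while (!flag) {
            i++;
        }
    }

    // What the JIT may effectively turn it into when flag is neither
    // volatile nor otherwise synchronized: the read of flag is hoisted
    // out of the loop and performed only once.
    static void hoisted() {
        if (!flag) {
            while (true) {   // flag is never re-read: infinite loop
                i++;
            }
        }
    }
}
```

Declaring flag volatile forbids exactly this transformation, because every read of a volatile field must observe the latest write.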
That’s not the point. The point is that I made a few minor changes and the code ran differently.
Here’s what I said in the last section of the article:
The question “about Integer” mentioned in the picture is the “metaphysics” mentioned in the article:
Yeah, I came back to fill the hole.
To explore again
In fact, the reason I had to explore this question again is that in April someone sent me a private message asking whether there was any conclusion to the metaphysical Integer question.
All I can say is:
But then I thought of this comment from the article:
Because at that time my official account had no built-in comment feature and used a third-party mini-program, I didn't notice the comment notification.
It took me quite a while to see this reader's comment, but I did reply to it:
Thank you for your analysis. I will analyze it according to this idea when I have time.
But then I put it on hold too, because I felt there wasn't much to be gained from pursuing it further.
Unexpectedly, after all this time, another reader came asking.
So over the May Day holiday, I revised the program and did a round of research with the help of search engines.
Hey, guess what?
I actually came up with something interesting.
Conclusion first: the final keyword affects the results of the program.
In this case, where is the final keyword?
When we change the int in the program to Integer, the i++ operation involves boxing and unboxing. The corresponding source code for this process is here:
And the value field in new Integer(i) is final:
The program ends normally, and it is the final keyword that affects the results of the program.
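As a sketch of what happens under the hood (the value field and the Integer.valueOf cache are from OpenJDK's java.lang.Integer source; the class name here is mine):

```java
public class BoxingSketch {
    public static void main(String[] args) {
        Integer i = 0;   // boxing: compiles to Integer.valueOf(0)
        i++;             // roughly: i = Integer.valueOf(i.intValue() + 1);

        // Each i++ therefore produces a new (or cached) Integer object
        // whose payload lives in a final field:
        //     private final int value;   // in java.lang.Integer
        System.out.println(i);   // prints 1
    }
}
```

Because value is final, every i++ yields a fresh immutable Integer rather than mutating the old one; that constructor-assigned final field is where final's memory semantics enter the picture.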
So how does final affect that?
As I explored this point, I found that things deviated somewhat from what the comment said.
The comment's claim: because of the StoreStore barrier and the happens-before relationship, flag is flushed to main memory.
With the help of search engines, I reached this conclusion instead: with and without final, two different sets of machine code are generated, which leads to the inconsistent results.
But I have to add one premise here: the processor is x86.
This conclusion is based on the following test case, which is also written in line with the comments:
The class contains a final field, which is assigned in the constructor. We then keep new-ing the object inside the while loop:
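The test case itself was only a screenshot in the original post. Here is a sketch reconstructed from that description; the class name Why, the field name age, and the value 18 are taken from later in this article, and the getAge accessor is my own addition for illustration.

```java
import java.util.concurrent.TimeUnit;

public class FinalExample {
    private static boolean flag = false;

    static class Why {
        // The experiment: run once with `final` here, once without.
        private final int age;

        Why(int age) {
            this.age = age;
        }

        int getAge() {      // accessor added for illustration only
            return age;
        }
    }

    public static void main(String[] args) {
        new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(100);
                flag = true;
                System.out.println("Flag changed to true");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }).start();
        while (!flag) {
            new Why(18);    // keep allocating inside the hot loop
        }
        System.out.println("End of program");
    }
}
```

On my JDK 8 / x86 setup, the version with final exits normally while the version without spins forever; on newer JDKs both cases loop forever.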
My operating environment is:
- JDK 1.8.0_271
- win10
- IntelliJ IDEA 2019.3.4
The results are as follows:
- If the age field is modified with final, the program exits normally.
- If the final modifier is removed from age, the program loops forever and cannot exit.
Here’s the GIF:
You can also paste the code I gave you above and run it to see if it matches what I said.
Let's talk about final
When I changed the program to look like this, it was obvious that the final keyword affected the program.
In fact, I was very excited when I came to this conclusion. I was finally going to solve a mystery that had been bothering me for more than a year.
With the conclusion in hand, how hard can it be to work backward to the reasoning?
And I know where to find the answers, hidden in a book on my desk.
So I turned to the Art of Concurrent Programming in Java, which has a section devoted to the memory semantics of final fields:
This section left a deep impression on me, because the "overflow" in section 3.6.5 should actually be "escape". Earlier, I wrote this article based on that:
To be honest, I found one mistake in this book!
So all I needed was to find evidence in this section to support the comment's argument that the StoreStore barrier plus the happens-before relationship causes flag to be flushed to main memory.
But it was not as simple as I had thought: what I found in the book was not evidence proving the argument, but evidence disproving it.
I won't copy over large passages of the book; let's focus on section 3.6.6, the implementation of final semantics in processors:
Note the underlined sentence: on x86 processors, reads/writes of final fields do not insert any memory barriers.
Since no memory barrier is inserted, the StoreStore barrier is omitted as well. Therefore, on x86 processors, the flag flush supposedly brought by the memory semantics of the final field simply does not exist.
So the previous argument is incorrect.
Where did the book get the claim that final field reads/writes insert no memory barriers on x86 processors?
As it happens, it was our old friend Doug Lea who told the author.
The JSR-133 mentioned in section 3.6.7 refers to "The JSR-133 Cookbook for Compiler Writers":
http://gee.cs.oswego.edu/dl/jmm/cookbook.html
In this recipe, there is a chart like this:
As you can see, on x86 processors, LoadStore, LoadLoad, and StoreStore are all no-ops, that is, they perform no operation.
On x86, any lock-prefixed instruction can be used as a StoreLoad barrier. (The form used in linux kernels is the no-op lock; addl $0,0(%%esp).) Versions supporting the "SSE2" extensions (Pentium4 and later) support the mfence instruction which seems preferable unless a lock-prefixed instruction like CAS is needed anyway. The cpuid instruction also works but is slower.
By the time I had read this far I was nearly dumbfounded: the little thread of reasoning I had painstakingly pieced together was blocked again.
Let me get this straight for you.
By now we can state it very clearly: the barrier that final brings (StoreStore) is a no-op on x86 processors and has no effect on memory visibility.
So why does the program stop with final?
The program stopped, which means the main thread must have observed a change in the flag, right?
So why can’t the program stop when final is removed?
The main thread must not have observed the flag change.
In other words, whether the program stops is directly tied to whether final is present.
Yet the barrier that final fields bring is a no-op on x86 processors.
This is metaphysics, right?
I had gone all the way around and still didn't know.
This round, honestly, annoyed me: all that time spent, only to circle back to where I started?
Time to dig in.
StackOverflow
After the analysis above, the conclusion mentioned in the comment could not be verified.
But I can already tell very clearly that the final keyword is definitely at fault.
So I set out to take a look around StackOverflow to see if I might find something unexpected.
Sure enough, heaven rewards the diligent: I must have gone through hundreds of posts and was on the verge of giving up when I hit a post that made me jolt in shock.
After the jolt came a gasp: good grief, is this a JVM BUG!?
Setting that aside for a moment, let me first tell you how I search for questions on StackOverflow.
First of all, in this case, the key words I can determine are Java and final.
But searching with just those two keywords returned far too many results; after flipping through a few, I realized this was looking for a needle in a haystack.
So I changed my strategy. Stackoverflow searches have tags:
If I had to assign tags to this question, they would be Java, JVM, JMM, and JIT.
And under the java-memory-model (JMM) tag, I found a treasure:
It’s this treasure question that drives the rest of the plot:
https://stackoverflow.com/questions/57427531/in-java-what-operations-are-involved-in-the-final-field-assignment-in-the-cons
I know that seeing this stirs nothing inside you; you may even want to laugh at my talk of jolting in shock.
But when I saw this question, my hands were literally shaking.
Because I knew I could solve the metaphysical problem here.
The reason for my gasp is that the sample code in the question is essentially the same as mine: the Simple class in his code corresponds to the Why class in mine, and the point being verified is much the same.
The description in the question says:
Actually, I know the storing “final” field would not emit any assembly instructions on x86 platform. But why this situation came out? Are there some particular operations I don’t know ?
The truth
Here’s the science behind metaphysics in an answer to the above StackOverflow question:
Let me translate it for you:
Dude, I saw the screenshot of your question, and you’re not posing it right.
What are the screenshots?
These are the two screenshots that the questioner attached to the question:
The final Case screenshot looks like this:
A screenshot of a non-final case looks like this:
As an aside, the source of the screenshot is the JITWatch tool, which is a very powerful tool.
As you can see from your screenshots, the run methods are compiled, but the compiled code is not actually invoked. What you need to notice is the % mark in the assembly output, which denotes OSR (on-stack replacement).
If you’re not sure what OSR is, don’t worry.
The assembly generated with and without final is different. After compiling, I kept only the relevant parts:
As you can see from the screenshot, assembly code is an infinite loop without final. With final, the flag field is loaded every time.
But notice that in both cases there is no allocation of a Simple instance at all, and no field assignment either.
So this is not about how the compiler assigns final fields; it is purely a compiler optimization.
The compiled loop contains no Simple object and no final field at all, and yet adding final still changes the program's result.
This issue has been fixed in newer JVM versions.
So, if you run the same code on JDK 11, with or without final, the program will not exit properly.
Well, with all that said, it’s pretty clear why.
The direct cause is that, in my test environment, the presence or absence of final produces two different sets of machine code.
The underlying reason involves the OSR mechanism.
Validation
After the previous analysis, now we have a new direction of investigation.
Now I have to go see if this guy is talking nonsense.
So I went to test his words:
If you run the same example on JDK 11, there will be an infinite loop in both cases, regardless of the final modifier.
So I ran both cases, with and without the final modifier, on a newer JDK.
Both really do get stuck in an infinite loop.
As you can see from the GIF below, my JDK version is 15.0.1:
The first point is validated. The same code in JDK8 and JDK15 will run differently.
I have reason to believe this may be one of those "it's a BUG yet not a BUG" things in the JVM. (Wait... how can a BUG not be a BUG?)
The second point of verification is this:
Instead, execution jumps from the interpreter to the OSR stub.
Since the difference from JDK 8 is caused by on-stack replacement, I can turn OSR off with the following option:
-XX:-UseOnStackReplacement
After removing final and adding this option, I ran the program again, and it did stop.
The second point is verified.
The third point to verify is this part:
I'll dump my own assembly as well and see whether anything similar shows up.
How do you get the assembly?
Use the following options:
-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation -XX:LogFile=jit.log
You also need an hsdis disassembler library (a DLL on Windows); there are plenty of copies online, and if you want to verify this yourself, finding one won't be hard.
When final fields are not added, the assembly looks like this:
What does the jmp instruction do?
An unconditional jump.
So this is an endless loop.
With final fields added, the assembly looks like this:
This time the first jump is je rather than jmp.
je is a conditional jump: "jump if equal".
Before the je instruction there is a movzbl instruction, which reads the value of the flag variable.
So with final added, the value of flag is read on every iteration, and the main thread sees the change to flag in time.
In my code the allocation is new Why(18), and indeed the assembly contains no instruction allocating the Why object, which verifies his statement:
You see, in both cases there is no Simple instance allocation at all, and no field assignment either.
At this point, the metaphysical problem has received a scientific explanation.
If you’re still reading this, congratulations, you’ve learned another useless lesson.
If you want to learn something relevant and useful about this article, I suggest checking out these places:
- Section 3.6 of The Art of Concurrent Programming in Java – Memory semantics for final fields.
- Part 4 of Understanding the Java Virtual Machine – Program compilation and code optimization.
- Chapter 7 – Compilation Overview, Chapter 8 – C1 Compiler, Chapter 9 – C2 Compiler.
- Chapter 10 of Java Performance Optimization Practices – Understanding just-in-time compilation
After reading the above, you should at least have a good idea of the two processes by which Java programs are compiled from source code to bytecode, and from bytecode to native machine code.
Learn about the JVM’s hot code detection scheme, just-in-time compilation of HotSpot, compile trigger conditions, and how to observe and analyze compiled data and results from outside the JVM.
You'll also learn about compiler optimization techniques such as method inlining, tiered compilation, on-stack replacement, branch prediction, escape analysis, lock elimination, lock coarsening... okay, this knowledge is basically useless day to day, but knowing it does make you sound impressive.
In addition, I strongly recommend this Zhihu column by R大 (RednaxelaFX):
https://www.zhihu.com/column/hllvm
This article in the column is a real treasure:
https://zhuanlan.zhihu.com/p/25042028
For example, on the on-stack replacement (OSR) mentioned in this article, R大's answer is:
To be blunt, OSR is useful for benchmark scores, but not of much use to normal programs:
It includes the following passage:
JIT optimizes the code very aggressively.
In fact, coming back to our article: adding or removing the final keyword appears to generate two different sets of machine code, but in essence it is that the final keyword prevented the JIT from applying an aggressive optimization.