The original article is from my blog, yuequan’s blog.
It is recommended that you already know the JMM, the CPU cache coherency protocol and the associated memory barriers, and that you understand out-of-order CPU execution. If this article has been helpful to you, please give the GitHub repository of my blog a star. If you find anything wrong in the article, you can also open an issue on GitHub directly.
The topics discussed in this article are:
- The semantics of volatile in Java
- The memory barrier
- The implementation of the JVM
- The generated assembly instruction
- How to ensure visibility and order
- Why does volatile not guarantee atomicity for compound operations?
The semantics of volatile in Java
Let’s start with a very common case
public class Test {
    public static void main(String[] args) throws InterruptedException {
        Demo demo = new Demo();
        demo.setName("demo-thread");
        demo.start();
        Thread.sleep(1000);
        demo.flag = false; // the main thread asks demo-thread to stop
        demo.join();
        System.out.println(demo.getName() + " thread completes execution: " + demo.count);
    }

    static class Demo extends Thread {
        boolean flag = true; // NOT volatile: the loop below may never see the update
        int count = 0;

        @Override
        public void run() {
            while (flag) {
                count++;
            }
        }
    }
}
In the JMM model, demo-thread reads the value true of flag from main memory and places it in its working memory. The while loop then keeps checking flag; as long as it is true, demo-thread keeps looping, and it should stop once flag becomes false. However, demo-thread never perceives that the main thread has changed flag to false, so it cannot be stopped. In practice, the JIT compiler may also hoist the read of a non-volatile flag out of the loop, because nothing tells it that another thread might change the field. Put simply, this is the problem of visibility between threads.
How do we solve this problem? With volatile, the subject of this article. (There are actually several ways to solve it; I use volatile here because it is the topic of this article.)
public class Test {
    public static void main(String[] args) throws InterruptedException { ... }

    static class Demo extends Thread {
        volatile boolean flag = true;
        ...
    }
}
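For reference, here is a complete, runnable version of the example in which only the flag declaration changes. It is a minimal sketch: the class name VolatileStopDemo, the helper runAndStop and the 5-second join timeout are my own choices, and the thread is made a daemon only so that a hung loop could never keep the JVM alive.

```java
public class VolatileStopDemo {

    static class Demo extends Thread {
        volatile boolean flag = true; // the fix: the write in the main thread becomes visible here
        long count = 0;               // work counter, as in the original example

        @Override
        public void run() {
            while (flag) {            // a volatile read on every iteration
                count++;
            }
        }
    }

    /** Starts demo-thread, lets it spin for spinMillis, then clears the flag.
     *  Returns true if the thread terminated within the join timeout. */
    static boolean runAndStop(long spinMillis) {
        Demo demo = new Demo();
        demo.setName("demo-thread");
        demo.setDaemon(true);         // never keep the JVM alive, even if the loop hangs
        demo.start();
        try {
            Thread.sleep(spinMillis);
            demo.flag = false;        // volatile write, observed by the loop above
            demo.join(5_000);         // arbitrary safety timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !demo.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("demo-thread stopped: " + runAndStop(1_000));
    }
}
```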
Declaring the variable volatile makes changes to it visible to other threads. volatile also prohibits reordering optimizations around accesses to the variable: once a variable is volatile, memory barriers are inserted whenever it is read or written.
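The ban on reordering is what makes the common "publish data, then set a volatile flag" pattern safe. In this sketch (class and method names are mine; Thread.onSpinWait needs Java 9+), the plain write data = 42 cannot be moved after the volatile write ready = true, so a reader that sees ready == true is guaranteed to see 42 and never 0.

```java
public class VolatilePublication {
    static int data;                 // plain field
    static volatile boolean ready;   // volatile flag guarding data

    /** Publishes data from one thread and reads it in another; returns what the reader saw. */
    static int publishAndRead() {
        data = 0;
        ready = false;
        int[] seen = new int[1];

        Thread writer = new Thread(() -> {
            data = 42;     // (1) plain write
            ready = true;  // (2) volatile write: (1) may not be reordered after it
        });
        Thread reader = new Thread(() -> {
            while (!ready) {          // volatile read
                Thread.onSpinWait();  // spin hint to the CPU
            }
            seen[0] = data;           // happens-before guarantees 42 here
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw data = " + publishAndRead()); // prints 42
    }
}
```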
The memory barrier
Let’s first look at processor memory barriers and then at the barriers defined by the JVM. But first, what is a memory barrier in essence? It is a kind of synchronization barrier instruction. If there are reads and writes both before and after the barrier, the reads and writes before the barrier must complete before the reads and writes after it, and the reads and writes after the barrier must not be executed ahead of it.
Processor memory barriers
- Read memory barrier: ensures that reads before the barrier complete before reads after the barrier.
- Write memory barrier: ensures that writes before the barrier complete before writes after the barrier.
- Full memory barrier: ensures that reads and writes before the barrier complete before reads and writes after the barrier.
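Read and write barriers are typically used in pairs. The sketch below uses the static fence methods of java.lang.invoke.VarHandle (Java 9+) as stand-ins: storeStoreFence() plays the write barrier and loadLoadFence() the read barrier in a message-passing pattern. The class and field names are hypothetical, the flag is accessed in opaque mode so that the spinning reader eventually sees it, and fences are defined by the VarHandle API rather than by the classic JLS chapter 17 model, so treat this as an illustration; real code would simply use a volatile flag.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class FenceMessagePassing {
    static int data;   // plain field: the "message"
    static int flag;   // accessed by the threads only through FLAG, in opaque mode

    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findStaticVarHandle(FenceMessagePassing.class, "flag", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Sends 42 from a writer thread to a reader thread; returns what the reader saw. */
    static int run() {
        data = 0;
        flag = 0;          // reset before the threads start
        int[] seen = new int[1];

        Thread writer = new Thread(() -> {
            data = 42;
            VarHandle.storeStoreFence(); // write barrier: the data store precedes the flag store
            FLAG.setOpaque(1);
        });
        Thread reader = new Thread(() -> {
            while ((int) FLAG.getOpaque() == 0) {
                Thread.onSpinWait();
            }
            VarHandle.loadLoadFence();   // read barrier: the flag load precedes the data load
            seen[0] = data;
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw " + run());
    }
}
```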
Next come a few ordering semantics that we will meet again in the JVM implementation, so I’ll explain them briefly:
Acquire: instructions after the barrier are not reordered to before the barrier.
Release: instructions before the barrier are not reordered to after the barrier.
Fence: instructions before the fence are not reordered to after it, and instructions after the fence are not reordered to before it.
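Acquire and release are exactly what a lock needs: nothing inside the critical section may move above taking the lock (acquire), and nothing inside it may move below releasing the lock (release). The sketch below is a hypothetical minimal spin lock built on the Java 9+ AtomicBoolean methods weakCompareAndSetAcquire and setRelease; it only illustrates the two semantics and is not a replacement for java.util.concurrent.locks.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private int counter = 0; // plain field, only touched while holding the lock

    void lock() {
        // Acquire: accesses inside the critical section cannot move above this CAS.
        while (!locked.weakCompareAndSetAcquire(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        // Release: accesses inside the critical section cannot move below this store.
        locked.setRelease(false);
    }

    /** Increments the shared counter from several threads under the lock. */
    static int run(int threads, int incrementsPerThread) {
        SpinLockDemo demo = new SpinLockDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    demo.lock();
                    try {
                        demo.counter++;
                    } finally {
                        demo.unlock();
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.counter;
    }

    public static void main(String[] args) {
        System.out.println("counter = " + run(4, 10_000)); // prints "counter = 40000"
    }
}
```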
Memory barriers for the JVM
LoadLoad barrier (read barrier): for the sequence Load1; LoadLoad; Load2, the barrier keeps the two reads from being reordered, so the read of Load1 has completed by the time Load2 executes.
StoreStore barrier (write barrier): for the sequence Store1; StoreStore; Store2, the barrier keeps the two writes from being reordered, so by the time Store2 executes, the write of Store1 has completed and is visible to other processors.
LoadStore barrier (read/write barrier): for the sequence Load1; LoadStore; Store2, the barrier keeps the earlier read and the later write from being reordered, so Load1 has completed by the time Store2 executes.
StoreLoad barrier (write/read barrier): for the sequence Store1; StoreLoad; Load2, the barrier keeps the earlier write and the later read from being reordered, so by the time Load2 executes, the write of Store1 has completed and is visible to all processors. It is the most expensive of the four, and on most processors it also has the effect of the other three.
Whatever the kind of barrier, the idea is the same. With a LoadStore barrier, the read before the barrier reads its value from main memory, and the write after the barrier writes its value to main memory. With a StoreLoad barrier, the write before the barrier writes its value to main memory, and the read after the barrier reads from main memory. Of course, merely writing a value to main memory does not by itself guarantee visibility, which is why the subsequent read must also read the value from main memory.
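In terms of these four JVM barriers, the commonly described conservative strategy (from the JSR-133 Cookbook) inserts a StoreStore barrier before every volatile write and a StoreLoad barrier after it, and a LoadLoad plus a LoadStore barrier after every volatile read. The sketch below approximates that placement on a plain field with the static fence methods of java.lang.invoke.VarHandle (Java 9+): releaseFence() covers LoadStore and StoreStore, fullFence() adds StoreLoad, and acquireFence() covers LoadLoad and LoadStore. The class ManualVolatileBox is hypothetical and only shows where the barriers go; real code should just declare the field volatile, and the JIT emits only what the target CPU needs (on x86, usually only the StoreLoad needs a real instruction).

```java
import java.lang.invoke.VarHandle;

/**
 * Hypothetical illustration: a plain int field with explicit fences placed
 * where the conservative JMM strategy puts barriers around volatile accesses.
 */
public class ManualVolatileBox {
    private int value; // plain field; ordering comes only from the fences below

    public void write(int v) {
        VarHandle.releaseFence(); // StoreStore (plus LoadStore) before the "volatile" write
        value = v;
        VarHandle.fullFence();    // StoreLoad after the write
    }

    public int read() {
        int v = value;
        VarHandle.acquireFence(); // LoadLoad plus LoadStore after the "volatile" read
        return v;
    }

    public static void main(String[] args) {
        ManualVolatileBox box = new ManualVolatileBox();
        box.write(7);
        System.out.println(box.read()); // prints 7
    }
}
```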
The implementation of the JVM
So how exactly does volatile make a change visible to other threads, so that our modified flag takes effect in a multithreaded environment? Let’s dig further with a very simple example.
public class Demo {
    static volatile int i;

    public static void main(String[] args) {
        i = 1;
    }
}
View the generated bytecode (partial snippet, e.g. via javap -v Demo):
static volatile int i;
descriptor: I
flags: ACC_STATIC, ACC_VOLATILE
You can see that the field carries the ACC_VOLATILE flag in the class file. Next, let’s open up the JVM source code (I use HotSpot) and take a look.
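The same flag is also visible from Java through reflection: ACC_VOLATILE has the value 0x0040 in the class file format, and java.lang.reflect.Modifier.VOLATILE has the same value. The small check below is a sketch of my own (the class VolatileFlagCheck and its fields are hypothetical).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class VolatileFlagCheck {
    static volatile int i; // same declaration as in Demo above
    static int plain;      // a non-volatile field for comparison

    /** Returns true if the named field of this class carries the volatile modifier. */
    static boolean isVolatile(String fieldName) {
        try {
            Field field = VolatileFlagCheck.class.getDeclaredField(fieldName);
            return Modifier.isVolatile(field.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("i volatile: " + isVolatile("i"));         // true
        System.out.println("plain volatile: " + isVolatile("plain")); // false
        System.out.println("VOLATILE flag: 0x" + Integer.toHexString(Modifier.VOLATILE)); // 0x40
    }
}
```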
In the JVM source code, is_volatile() checks whether a field carries the volatile access flag. Next, look at part of the source code of the bytecode interpreter.
Because the field here is an int (the top-of-stack type is itos), the interpreter calls release_int_field_put, and after the store it inserts a StoreLoad barrier by calling OrderAccess::storeload(). Let’s look at the definition of itos first.
As the name suggests, itos represents a value of type int cached at the top of the stack (tos).
Then look at release_int_field_put.
You’ll notice that it calls OrderAccess::release_store.
So what does this method actually do? First, notice that the method parameter is declared volatile. This is C++’s volatile keyword, which merely shares its name with Java’s. It marks a variable as one that may change at any time: every use of it is read from (or written to) its memory address, and the compiler does not optimize those accesses away. Unlike Java’s volatile, C++ volatile on its own gives no ordering guarantees between threads; that is what the OrderAccess barriers are for.
So what is os::atomic_copy64? It is implemented separately for each platform, but here I only look at Linux.
Crudely put, it generates assembly code that copies a 64-bit value in one indivisible operation, right?
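Why would a 64-bit copy need to be indivisible? Java guarantees that reads and writes of a long or double are single, atomic operations only when the variable is volatile (JLS §17.7); a non-volatile long may be written as two separate 32-bit halves on some platforms. My reading is that atomic_copy64 is how HotSpot keeps this promise where a plain 64-bit move is not atomic. The sketch below (class and method names are mine) checks the guarantee from Java: a writer alternates between two values whose 32-bit halves both differ, and the reader counts any value that mixes the halves. On a 64-bit JVM plain long accesses are usually atomic too, so this documents the guarantee rather than reproducing a failure.

```java
public class VolatileLongDemo {
    static final long A = 0L;   // all 64 bits clear
    static final long B = -1L;  // all 64 bits set: both 32-bit halves differ from A

    static volatile long shared = A; // volatile long: each read/write is one atomic 64-bit access

    /** Returns how many torn values (a mix of A's and B's halves) the reader observed. */
    static int countTornReads(int iterations) {
        Thread writer = new Thread(() -> {
            for (int n = 0; n < iterations; n++) {
                shared = (n & 1) == 0 ? A : B;
            }
        });
        writer.start();

        int torn = 0;
        for (int n = 0; n < iterations; n++) {
            long v = shared;
            if (v != A && v != B) {
                torn++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(1_000_000)); // prints "torn reads: 0"
    }
}
```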
Let’s keep going.
Next, look at OrderAccess::storeload.
Do these four look familiar?! They are the four JVM barriers discussed above: loadload, storestore, loadstore and storeload. Each platform defines its own implementation of them, but we are still looking at Linux.
Look at how they are implemented: each of the four is expressed through acquire, release or fence, whose semantics were explained earlier in this article (storeload, for example, maps to fence). Now on to the implementation under Linux.
So what is FULL_MEM_BARRIER?
It differs from environment to environment (CPU architecture and compiler), so I won’t go into the specifics here.
The generated assembly instruction
When I wrote this section I was using Windows, so I still need to add the source code of the Windows fence implementation.
Let’s take a look at the assembly code generated on my machine:
[Disassembling for mach='amd64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo'
# parm0: rdx:rdx = '[Ljava/lang/String;'
# [sp+0x40] (sp of caller)
0x00000000037e5320: mov dword ptr [rsp+0ffffffffffffa000h],eax
0x00000000037e5327: push rbp
0x00000000037e5328: sub rsp,30h
0x00000000037e532c: mov rsi,17cf2af8h ; {metadata(method data for {method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo')}
0x00000000037e5336: mov edi,dword ptr [rsi+0dch]
0x00000000037e533c: add edi,8h
0x00000000037e533f: mov dword ptr [rsi+0dch],edi
0x00000000037e5345: mov rsi,17cf2a30h ; {metadata({method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo')}
0x00000000037e534f: and edi,0h
0x00000000037e5352: cmp edi,0h
0x00000000037e5355: je 37e537eh ;*iconst_1
; - org.yuequan.thread.test.Demo::main@0 (line 6)
0x00000000037e535b: mov rsi,0d5b0dad0h ; {oop(a 'java/lang/Class' = 'org/yuequan/thread/test/Demo')}
0x00000000037e5365: mov edi,1h
0x00000000037e536a: mov dword ptr [rsi+68h],edi
0x00000000037e536d: lock add dword ptr [rsp],0h ;*putstatic i
; - org.yuequan.thread.test.Demo::main@1 (line 6)
0x00000000037e5372: add rsp,30h
0x00000000037e5376: pop rbp
0x00000000037e5377: test dword ptr [2f20100h],eax
; {poll_return}
0x00000000037e537d: ret
0x00000000037e537e: mov qword ptr [rsp+8h],rsi
0x00000000037e5383: mov qword ptr [rsp],0ffffffffffffffffh
0x00000000037e538b: call 37e20a0h ; OopMap{rdx=Oop off=112}
;*synchronization entry
; - org.yuequan.thread.test.Demo::main@-1 (line 6)
; {runtime_call}
0x00000000037e5390: jmp 37e535bh
0x00000000037e5392: nop
0x00000000037e5393: nop
0x00000000037e5394: mov rax,qword ptr [r15+2a8h]
0x00000000037e539b: mov r10,0h
0x00000000037e53a5: mov qword ptr [r15+2a8h],r10
0x00000000037e53ac: mov r10,0h
0x00000000037e53b6: mov qword ptr [r15+2b0h],r10
0x00000000037e53bd: add rsp,30h
0x00000000037e53c1: pop rbp
0x00000000037e53c2: jmp 374ece0h ; {runtime_call}
0x00000000037e53c7: hlt
0x00000000037e53c8: hlt
0x00000000037e53c9: hlt
0x00000000037e53ca: hlt
0x00000000037e53cb: hlt
0x00000000037e53cc: hlt
0x00000000037e53cd: hlt
0x00000000037e53ce: hlt
0x00000000037e53cf: hlt
0x00000000037e53d0: hlt
0x00000000037e53d1: hlt
0x00000000037e53d2: hlt
0x00000000037e53d3: hlt
0x00000000037e53d4: hlt
0x00000000037e53d5: hlt
0x00000000037e53d6: hlt
0x00000000037e53d7: hlt
0x00000000037e53d8: hlt
0x00000000037e53d9: hlt
0x00000000037e53da: hlt
0x00000000037e53db: hlt
0x00000000037e53dc: hlt
0x00000000037e53dd: hlt
0x00000000037e53de: hlt
0x00000000037e53df: hlt
[Exception Handler]
[Stub Code]
0x00000000037e53e0: call 3750aa0h ; {no_reloc}
0x00000000037e53e5: mov qword ptr [rsp+0ffffffffffffffd8h],rsp
0x00000000037e53ea: sub rsp,80h
0x00000000037e53f1: mov qword ptr [rsp+78h],rax
0x00000000037e53f6: mov qword ptr [rsp+70h],rcx
0x00000000037e53fb: mov qword ptr [rsp+68h],rdx
0x00000000037e5400: mov qword ptr [rsp+60h],rbx
0x00000000037e5405: mov qword ptr [rsp+50h],rbp
0x00000000037e540a: mov qword ptr [rsp+48h],rsi
0x00000000037e540f: mov qword ptr [rsp+40h],rdi
0x00000000037e5414: mov qword ptr [rsp+38h],r8
0x00000000037e5419: mov qword ptr [rsp+30h],r9
0x00000000037e541e: mov qword ptr [rsp+28h],r10
0x00000000037e5423: mov qword ptr [rsp+20h],r11
0x00000000037e5428: mov qword ptr [rsp+18h],r12
0x00000000037e542d: mov qword ptr [rsp+10h],r13
0x00000000037e5432: mov qword ptr [rsp+8h],r14
0x00000000037e5437: mov qword ptr [rsp],r15
0x00000000037e543b: mov rcx,6601c4e0h ; {external_word}
0x00000000037e5445: mov rdx,37e53e5h ; {internal_word}
0x00000000037e544f: mov r8,rsp
0x00000000037e5452: and rsp,0fffffffffffffff0h
0x00000000037e5456: call 65cd4510h ; {runtime_call}
0x00000000037e545b: hlt
[Deopt Handler Code]
0x00000000037e545c: mov r10,37e545ch ; {section_word}
0x00000000037e5466: push r10
0x00000000037e5468: jmp 3727600h ; {runtime_call}
0x00000000037e546d: hlt
0x00000000037e546e: hlt
0x00000000037e546f: hlt
Decoding compiled method 0x00000000037e4ed0:
Code:
Argument 0 is unknown.RIP: 0x37e5020 Code size: 0x00000110
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000017cf2a38} 'main' '([Ljava/lang/String;)V' in 'org/yuequan/thread/test/Demo'
# parm0: rdx:rdx = '[Ljava/lang/String;'
# [sp+0x40] (sp of caller)
0x00000000037e5020: mov dword ptr [rsp+0ffffffffffffa000h],eax
0x00000000037e5027: push rbp
0x00000000037e5028: sub rsp,30h ;*iconst_1
; - org.yuequan.thread.test.Demo::main@0 (line 6)
0x00000000037e502c: mov rsi,0d5b0dad0h ; {oop(a 'java/lang/Class' = 'org/yuequan/thread/test/Demo')}
0x00000000037e5036: mov edi,1h
0x00000000037e503b: mov dword ptr [rsi+68h],edi
0x00000000037e503e: lock add dword ptr [rsp],0h ;*putstatic i
; - org.yuequan.thread.test.Demo::main@1 (line 6)
0x00000000037e5043: add rsp,30h
0x00000000037e5047: pop rbp
0x00000000037e5048: test dword ptr [2f20100h],eax
; {poll_return}
0x00000000037e504e: ret
0x00000000037e504f: nop
0x00000000037e5050: nop
0x00000000037e5051: mov rax,qword ptr [r15+2a8h]
0x00000000037e5058: mov r10,0h
0x00000000037e5062: mov qword ptr [r15+2a8h],r10
0x00000000037e5069: mov r10,0h
0x00000000037e5073: mov qword ptr [r15+2b0h],r10
0x00000000037e507a: add rsp,30h
0x00000000037e507e: pop rbp
0x00000000037e507f: jmp 374ece0h ; {runtime_call}
0x00000000037e5084: hlt
0x00000000037e5085: hlt
0x00000000037e5086: hlt
0x00000000037e5087: hlt
0x00000000037e5088: hlt
0x00000000037e5089: hlt
0x00000000037e508a: hlt
0x00000000037e508b: hlt
0x00000000037e508c: hlt
0x00000000037e508d: hlt
0x00000000037e508e: hlt
0x00000000037e508f: hlt
0x00000000037e5090: hlt
0x00000000037e5091: hlt
0x00000000037e5092: hlt
0x00000000037e5093: hlt
0x00000000037e5094: hlt
0x00000000037e5095: hlt
0x00000000037e5096: hlt
0x00000000037e5097: hlt
0x00000000037e5098: hlt
0x00000000037e5099: hlt
0x00000000037e509a: hlt
0x00000000037e509b: hlt
0x00000000037e509c: hlt
0x00000000037e509d: hlt
0x00000000037e509e: hlt
0x00000000037e509f: hlt
[Exception Handler]
[Stub Code]
0x00000000037e50a0: call 3750aa0h ; {no_reloc}
0x00000000037e50a5: mov qword ptr [rsp+0ffffffffffffffd8h],rsp
0x00000000037e50aa: sub rsp,80h
0x00000000037e50b1: mov qword ptr [rsp+78h],rax
0x00000000037e50b6: mov qword ptr [rsp+70h],rcx
0x00000000037e50bb: mov qword ptr [rsp+68h],rdx
0x00000000037e50c0: mov qword ptr [rsp+60h],rbx
0x00000000037e50c5: mov qword ptr [rsp+50h],rbp
0x00000000037e50ca: mov qword ptr [rsp+48h],rsi
0x00000000037e50cf: mov qword ptr [rsp+40h],rdi
0x00000000037e50d4: mov qword ptr [rsp+38h],r8
0x00000000037e50d9: mov qword ptr [rsp+30h],r9
0x00000000037e50de: mov qword ptr [rsp+28h],r10
0x00000000037e50e3: mov qword ptr [rsp+20h],r11
0x00000000037e50e8: mov qword ptr [rsp+18h],r12
0x00000000037e50ed: mov qword ptr [rsp+10h],r13
0x00000000037e50f2: mov qword ptr [rsp+8h],r14
0x00000000037e50f7: mov qword ptr [rsp],r15
0x00000000037e50fb: mov rcx,6601c4e0h ; {external_word}
0x00000000037e5105: mov rdx,37e50a5h ; {internal_word}
0x00000000037e510f: mov r8,rsp
0x00000000037e5112: and rsp,0fffffffffffffff0h
0x00000000037e5116: call 65cd4510h ; {runtime_call}
0x00000000037e511b: hlt
[Deopt Handler Code]
0x00000000037e511c: mov r10,37e511ch ; {section_word}
0x00000000037e5126: push r10
0x00000000037e5128: jmp 3727600h ; {runtime_call}
0x00000000037e512d: hlt
0x00000000037e512e: hlt
0x00000000037e512f: hlt
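That’s long! Just look at the key part: the store mov dword ptr [rsi+68h],edi (writing 1 into the static field i) is immediately followed by lock add dword ptr [rsp],0h, annotated *putstatic i.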
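So a lock-prefixed instruction is used. What does the lock prefix do? A superficial explanation: it makes the instruction execute atomically. Older processors did this by asserting the LOCK# signal to lock the bus for the duration of the instruction and releasing it at the end; modern processors usually lock only the affected cache line and rely on the cache coherency protocol. In addition, a lock-prefixed instruction acts as a full memory barrier: the processor drains its store buffer, so the preceding write (i = 1) becomes visible to other processors before any later read executes. Adding 0 to the top of the stack changes no data; the instruction is there purely for this barrier effect, i.e. it is the StoreLoad barrier after the volatile write.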
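The effect of that StoreLoad barrier shows up in the classic store-buffering litmus test: each thread writes one variable and then reads the other. x86 keeps stores in a store buffer and lets a later load overtake an earlier store, so with plain fields both threads can read 0. With volatile x and y, the JMM forbids r1 == 0 && r2 == 0, and on x86 the lock add after each volatile store is what enforces it. The class below is a sketch with names of my own; starting fresh threads each round makes the forbidden outcome hard to hit even with plain fields, so treat it as an illustration of the guarantee rather than a benchmark.

```java
public class StoreLoadLitmus {
    static volatile int x, y;  // volatile: each store is followed by a StoreLoad barrier
    static int r1, r2;         // results, read only after join()

    /** Runs the store-buffering test and counts rounds where both threads read 0. */
    static int run(int rounds) {
        int bothZero = 0;
        for (int n = 0; n < rounds; n++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
            Thread t2 = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;        // forbidden when x and y are volatile
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("r1 == 0 && r2 == 0 observed " + run(10_000) + " times"); // 0 times
    }
}
```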
How to ensure visibility and order
A memory barrier tells the compiler and the CPU not to reorder instructions across it, which guarantees ordering. And because writes before the barrier are flushed to main memory and reads after it fetch their values from main memory, it also guarantees visibility between threads. (O(∩_∩)O I won’t explain any further.)
Why does volatile not guarantee atomicity for compound operations?
For example, if multiple threads increment a shared variable i with i++, a race condition occurs. If the threads increment i 5000 times in total, the result may be 5000 or it may be less than 5000, even though i is declared volatile. Remember that volatile works only through memory barriers: it orders accesses and makes them visible, but it does not make several steps execute as one.
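A sketch of that race (the class name, thread count and iteration count are my own choices): five threads each increment a volatile int and an AtomicInteger 1000 times. The volatile counter can end up below 5000 because every volatileCount++ is a separate read, add and write; AtomicInteger.incrementAndGet() performs the whole read-modify-write atomically and always reaches 5000.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileCounterDemo {
    static volatile int volatileCount;
    static final AtomicInteger atomicCount = new AtomicInteger();

    /** Runs the threads and returns {volatile count, atomic count}. */
    static int[] run(int threads, int incrementsPerThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    volatileCount++;               // read, add 1, write back: not atomic
                    atomicCount.incrementAndGet(); // one atomic read-modify-write
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] result = run(5, 1_000);
        System.out.println("volatile int:  " + result[0] + " (5000 or less)");
        System.out.println("AtomicInteger: " + result[1] + " (always 5000)");
    }
}
```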
Take an instruction sequence that contains a barrier, for example:
load1; load2; store1; store2; StoreLoad; load3; store3; ...
The barrier guarantees visibility, but not atomicity. Atomicity means that a group of instructions either executes completely without interruption or does not execute at all. Now think about i++: it is a compound operation of three steps: read the value, add 1, and assign the result. While one thread has read i but not yet assigned the new value, another thread can read the same old value, so both threads end up assigning the same result. This is clumsy to explain in words, so look at the following example:
i = 0
Thread A          Thread B
read i (0)        read i (0)
add 1             add 1
assign i = 1      assign i = 1
Result: i = 1, although i was incremented twice
So although volatile guarantees that each thread sees the latest value at the moment it reads, it cannot guarantee that the value is still the latest when the thread writes its result back.
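The interleaving in the table can also be forced deterministically. In this sketch (the class and helper names are mine), a CyclicBarrier makes both threads finish step 1 (the read) before either performs step 3 (the assignment), so i ends at 1 every time, even though it is volatile.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class LostUpdateDemo {
    static volatile int i;

    /** Replays the interleaving from the table and returns the final value of i. */
    static int run() {
        i = 0;
        CyclicBarrier bothHaveRead = new CyclicBarrier(2);
        Runnable increment = () -> {
            int value = i;              // step 1: read   (both threads read 0)
            awaitQuietly(bothHaveRead); // neither thread writes until both have read
            value = value + 1;          // step 2: add 1
            i = value;                  // step 3: assign (both threads write 1)
        };
        Thread a = new Thread(increment, "Thread A");
        Thread b = new Thread(increment, "Thread B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return i;
    }

    private static void awaitQuietly(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("i = " + run()); // prints "i = 1": one increment is lost
    }
}
```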