What is volatile

The volatile keyword is a lightweight synchronization mechanism provided by Java. It guarantees visibility and ordering, but not atomicity.
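The "not atomicity" part is easy to demonstrate. Here is a minimal sketch (the class and field names are mine, not from the original article) showing that volatile does not make a compound operation like count++ atomic:

```java
public class AtomicityDemo {
    // volatile guarantees visibility of count, but count++ is still a
    // non-atomic read-modify-write, so increments from two threads can be lost.
    static volatile int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                count++;   // load count, add 1, store count: three separate steps
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Almost always prints a value below 20000 because increments are lost.
        System.out.println("count = " + count);
    }
}
```

If you need atomic increments, use synchronized or java.util.concurrent.atomic.AtomicInteger instead.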

visibility

To see the visibility problem that volatile solves, take a look at what happens when this code runs:

  • flag defaults to true
  • Thread A checks whether flag is true; while it is, it loops executing i++
  • Two seconds later, thread B sets flag to false
  • Thread A never notices that flag has been changed to false, so it never breaks out of the loop
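The original code listing was not preserved here, but a minimal sketch matching the steps above might look like this (class name and timeouts are mine):

```java
public class VisibilityDemo {
    static boolean flag = true;   // no volatile: thread A may cache this value

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> {
            long i = 0;
            while (flag) {
                i++;              // spins as long as it sees flag == true
            }
            System.out.println("thread A exited, i = " + i);
        });
        a.setDaemon(true);        // let the JVM exit even if A spins forever
        a.start();

        Thread.sleep(2000);       // "thread B" is the main thread here
        flag = false;
        a.join(3000);
        System.out.println(a.isAlive()
                ? "thread A is still looping: it never saw flag = false"
                : "thread A observed flag = false");
    }
}
```

Under a JIT-compiled run, thread A typically keeps spinning forever; the exact behavior is JVM- and mode-dependent, which is precisely the problem.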

What is this equivalent to? It is like your goddess telling you that if you work hard and earn a million yuan a year, she will marry you. So you hustle, and three years later you are indeed earning a million yuan a year. You go back to your goddess, only to find she is already married, and she never told you! Painful, isn't it?

A goddess may choose not to tell you, but Java fields live in memory, so why is a modification made by one thread not visible to another? To answer that we need the Java Memory Model, abbreviated JMM. The JMM defines an abstract relationship between threads and main memory: shared variables are stored in main memory, and each thread has a private local memory holding copies of the shared variables it reads and writes. "Local memory" here is an abstraction that covers caches, write buffers, registers, and other hardware and compiler optimizations.

Note: the JMM is an abstract concept that hides the differences between hardware architectures and operating systems; it is just a set of Java specifications.

Now that we know about the JMM, let's go back to the code at the beginning of this article. Why does thread A keep seeing the old value after thread B changes flag?

  • Thread A copies the original flag = true into its local memory; from then on, every flag it uses is the copy in local memory.
  • After thread B changes flag, it flushes the new value to main memory, so the flag in main memory becomes false.
  • Thread A has no idea that thread B changed flag and keeps using its local copy, flag = true.

So how do we let thread A know that flag has been changed? In other words, how do we invalidate the flag cached in thread A's local memory and make the change visible across threads? Declaring flag as volatile does exactly that:
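The fixed version of the demo (again, names and timeouts are my own) only differs in the volatile modifier:

```java
public class VolatileVisibilityDemo {
    // volatile: a write by one thread invalidates other threads' cached copies
    static volatile boolean flag = true;

    static boolean demo() {
        Thread a = new Thread(() -> {
            long i = 0;
            while (flag) {
                i++;           // spins until it observes flag == false
            }
        });
        try {
            a.start();
            Thread.sleep(200); // let thread A spin for a while
            flag = false;      // "thread B" writes; the value reaches main memory
            a.join(5000);      // with volatile, thread A exits promptly
        } catch (InterruptedException e) {
            return false;
        }
        return !a.isAlive();   // true means thread A observed the write
    }

    public static void main(String[] args) {
        System.out.println(demo()
                ? "thread A saw flag = false and exited the loop"
                : "thread A never saw the update");
    }
}
```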

We can see that once flag is declared volatile, thread A notices thread B's modification, which shows that volatile guarantees visibility between threads.

reordering

Before we get into the ordering guarantee of volatile, we need some background on reordering.

Reordering is a process by which compilers and processors reorder instruction sequences to optimize program performance.

Why reorder? In simple terms, it is to improve the efficiency of execution. Why does it improve execution efficiency? Let’s look at this example:
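The original example was not preserved; the usual illustration looks something like this (the variable names are mine):

```java
public class ReorderIntuition {
    public static void main(String[] args) {
        int a = 0, b = 0;
        // Source order: the two updates of a are separated by an update of b.
        a = a + 1;    // load a, add, store a
        b = b + 1;    // load b, add, store b
        a = a + 2;    // load a again, add, store a again
        // A compiler or CPU may reorder this to
        //   a = a + 1; a = a + 2; b = b + 1;
        // so a can stay in a register: one load and one write-back instead of two.
        // Either order gives the same single-threaded result:
        System.out.println("a = " + a + ", b = " + b);   // a = 3, b = 1
    }
}
```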

As you can see, after reordering the CPU performs fewer loads and write-backs, which indirectly improves execution efficiency.

One thing must be stressed: the example above is only meant to help readers understand why reordering can improve execution efficiency. In reality, Java's reordering does not happen at the source-code level; there are many stages between your code and CPU execution, and the CPU applies further optimizations of its own, so the actual process may not match the description above. Don't get too hung up on it.

Reordering can improve program efficiency, but it must follow as-if-serial semantics. What are as-if-serial semantics? In simple terms: no matter how instructions are reordered, the result of executing the program in a single thread must not change.
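The textbook illustration of as-if-serial (sketched here in my own words) uses three statements with a data dependence:

```java
public class AsIfSerialDemo {
    static double area() {
        double pi = 3.14;           // A
        double r  = 1.0;            // B
        return pi * r * r;          // C: depends on both A and B
    }

    public static void main(String[] args) {
        // A and B are independent of each other, so a compiler or CPU may
        // execute them in either order; C can never be moved above A or B.
        // Whatever order is chosen, the single-threaded result is unchanged:
        System.out.println(area());
    }
}
```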

ordering

Now that we've covered Java reordering, let's talk about the ordering guarantee of volatile.

Let's start with a classic interview question: why does the double-checked locking (DCL) singleton need volatile?
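For reference, a standard DCL singleton looks roughly like this (I use the method name getSingleton to match the discussion below; the original listing was not preserved):

```java
public class Singleton {
    // Without volatile, the reordering described below can publish a
    // half-initialized object to other threads.
    private static volatile Singleton singleton;

    private Singleton() {}

    public static Singleton getSingleton() {
        if (singleton == null) {                    // first check, lock-free
            synchronized (Singleton.class) {
                if (singleton == null) {            // second check, under the lock
                    singleton = new Singleton();    // not an atomic operation!
                }
            }
        }
        return singleton;
    }
}
```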

singleton = new Singleton() is not an atomic operation. It actually consists of three steps:

  • Allocate a block of memory
  • Invoke the constructor to initialize the instance
  • Point singleton at the allocated memory

Reorder may occur during the actual execution, resulting in the actual execution steps like this:

  • Allocate a block of memory
  • Point singleton at the allocated memory
  • Invoke the constructor to initialize the instance

Once singleton points at the allocated memory, singleton is no longer null. But until the constructor has run, the object is only half-initialized: its fields still hold their default values. If another thread calls getSingleton() at that moment, it gets the half-initialized object, causing errors.

Volatile forbids this reordering, ensuring that singleton is not pointed at the allocated memory until the object has been initialized, which prevents these hard-to-diagnose errors. Volatile also provides a happens-before guarantee: a write to a volatile variable happens-before every subsequent read of that variable by other threads.

The underlying principle

As the DCL singleton case shows, reordering can cause obscure errors under concurrency, and volatile prevents reordering. How does volatile prevent reordering?

To implement the memory semantics of volatile, the JMM restricts certain types of compiler and processor reordering. For compilers, the JMM specifies a table of volatile reordering rules:

To sum up:

  • When the second operation is a volatile write, no reordering occurs no matter what the first operation is
  • When the first operation is a volatile read, no reordering occurs no matter what the second operation is
  • When the first operation is a volatile write and the second is a volatile read, no reordering occurs

How are these rules enforced? By inserting memory barriers, which at the JMM level combine load and store operations into four kinds: LoadLoad, StoreStore, LoadStore, and StoreLoad. For volatile operations, the JMM's barrier insertion strategy is:

  • Insert a StoreStore barrier before each volatile write
  • Insert a StoreLoad barrier after each volatile write
  • Insert a LoadLoad barrier after each volatile read
  • Insert a LoadStore barrier after each volatile read
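As an illustration of this strategy (the field names are mine, and the barrier comments mark conceptual positions rather than real instructions), this is where the barriers land around a volatile write and read:

```java
public class BarrierSketch {
    static int a;              // ordinary shared field
    static volatile int v;     // volatile shared field

    static void writer() {
        a = 1;
        // StoreStore barrier: the write to a cannot sink below the volatile write
        v = 2;                 // volatile write
        // StoreLoad barrier: the volatile write cannot reorder with later reads
    }

    static void reader() {
        int r1 = v;            // volatile read
        // LoadLoad barrier: later reads cannot float above the volatile read
        // LoadStore barrier: later writes cannot float above the volatile read
        int r2 = a;            // guaranteed to see a == 1 whenever r1 == 2
        System.out.println(r1 + " " + r2);
    }

    public static void main(String[] args) {
        writer();
        reader();   // prints "2 1"
    }
}
```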

The barriers above are at the level of the JMM specification: any JDK implementation that follows the specification guarantees that volatile memory operations are not reordered in the forbidden ways.

At the hardware level, processors also provide memory barrier instructions for the same purpose. The x86 platform, for example, provides these:

  • lfence: reads before the lfence must complete before any read after it; a load (read) barrier
  • sfence: writes before the sfence must complete before any write after it; a store (write) barrier
  • mfence: reads and writes before the mfence must complete before any read or write after it; a full barrier

The JMM specification calls for quite a few memory barriers, but real hardware does not need all of them. The common x86 processors, for example, do not reorder read-read, read-write, or write-write operations; they only reorder write-read. So the barriers corresponding to the first three cases can be dropped, and volatile only needs a StoreLoad barrier inserted after each volatile write. The JSR-133 Cookbook for Compiler Writers states this explicitly:

On x86 processors, there are three ways to implement the StoreLoad barrier:

  • mfence instruction: as mentioned above, a full barrier combining the capabilities of lfence and sfence.
  • cpuid instruction: a serializing instruction on x86 processors; its name comes from "CPU identification", since it lets software discover processor details.
  • lock instruction prefix: a bus/cache lock. The lock prefix can only be attached to certain instructions.

In fact, HotSpot's implementation of volatile uses the lock prefix: it simply emits a lock-prefixed instruction after the volatile write, rather than following the JMM barrier design literally and using the corresponding mfence instruction.

Running the main method again with the JVM flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -Xcomp, we can see a lock addl $0x0,(%rsp) in the printed assembly.

This can also be verified in the source code:

lock addl $0x0,(%rsp): addl means add, 0x0 is hexadecimal zero, and %rsp is the stack-pointer register. Adding zero to a register changes nothing, so the addl is effectively a no-op; it exists only as a carrier for the lock prefix because, as mentioned above, lock can only be attached to certain instructions such as add.

As for why HotSpot uses a lock-prefixed instruction instead of mfence: as far as I can tell, it is simply easier and cheaper. lock is powerful enough that little else needs to be considered; modern implementations prefer to lock the cache line rather than the bus, so lock performs better than you might expect, while mfence is not as fast as you might expect. That makes lock a very cost-effective choice, and lock also carries visibility semantics.

Looking up LOCK in the instruction reference of the IA-32 Architectures Software Developer's Manual:

I'm not going to go into the details of how the LOCK prefix is implemented here; it's easy to get bogged down in hardware details, and it's beyond the scope of this article. For those interested, check out the IA-32 Architectures Software Developer's Manual.

We only need to know what lock does:

  • Ensures atomic execution of the prefixed instruction. On the Pentium and earlier processors, a lock-prefixed instruction locks the bus while it executes, temporarily preventing other processors from accessing memory through the bus, which is obviously expensive. On newer processors, Intel uses cache locking instead to guarantee atomicity, which greatly reduces the overhead of lock-prefixed instructions.
  • Disallows reordering of this instruction with previous and subsequent read and write instructions.
  • Flushes all data written to the buffer to memory.

In summary, the LOCK instruction guarantees both visibility and atomicity.

To stress it again: it is the LOCK instruction that provides these visibility and atomicity guarantees; this has nothing to do with the cache coherence protocol (MESI).

To avoid confusing the cache coherence protocol with the JMM, I deliberately did not mention the protocol in earlier articles: the two live at different levels and exist for different purposes. We will cover it next time.

conclusion

The focus of the article is on the visibility and ordering of volatile, and much of it is devoted to describing some of the underlying computer concepts, which may be too boring for the reader, but if you read it carefully, I’m sure you’ll learn something.

On the surface, volatile is just an ordinary keyword, but if you dig deeper it is a very important topic. volatile ties software to hardware, and understanding it thoroughly requires drilling down to the very bottom of the computer. If you do, your understanding of Java will definitely improve.

Focusing only on the Java language feels limiting. The volatile keyword also exists in C and C++. I haven't looked at how volatile is implemented there, but I suspect the underlying ideas are related.

Final words

In line with the principle of being responsible for every article sent, I will try my best to find and verify the knowledge theory in official documents and authoritative books. Even so, I can’t guarantee that every point in the article is correct. If you find something wrong, please point it out and I will correct it.

Writing is not easy, and your positive feedback is very important to me! A “like”, a “look”, a “follow” and even a “666” in the comments section are the biggest support for me!

I’m CoderW, an ordinary programmer.

Thanks for reading, and see you next time!



References

  • The JSR-133 Cookbook for Compiler Writers: gee.cs.oswego.edu/dl/jmm/cook…
  • The Art of Concurrent Programming in Java
  • In-Depth Understanding of the Java Virtual Machine, third edition
  • IA-32 Architectures Software Developer's Manual