As the lightweight synchronization mechanism provided by the Java Virtual Machine, the volatile keyword plays an important role in Java concurrent programming. Yet volatile is not easy to understand. Most developers know that volatile variables guarantee visibility, and visibility is closely tied to the Java memory model. This article therefore gives a brief introduction to the memory model, examines volatile variables at the Java virtual machine level, and then walks through volatile and the story behind it at the hardware level.

1. The relationship between computer memory model and Java memory model

Because modern processors are several orders of magnitude faster than storage devices, modern computers insert a cache between the processor and main memory: data needed for a computation is copied into the cache, the processor reads and writes the cache directly, and results are synchronized from the cache back to main memory.

The Java virtual machine also defines its own memory model, the Java Memory Model (JMM), in service of “write once, run anywhere”. As a specification, the JMM shields programmers from the memory-access rules of particular operating systems and hardware; it is a logical abstraction over the computer memory model. It states that all shared variables reside in main memory, that each Java thread has its own working memory holding copies of the variables it uses, and that a thread must operate on the copies in its working memory rather than on main memory directly.
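To see what these rules mean in practice, here is a minimal sketch (class and field names are illustrative) of the visibility problem the JMM addresses: without volatile, the reader thread may keep using the copy of stop in its working memory and loop forever, while volatile forces a re-read from main memory. Whether the plain version actually hangs depends on the JIT and hardware.

public class VisibilityDemo {
    static volatile boolean stop = false; // try removing volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait on the working-memory copy of stop
            }
            System.out.println("reader saw stop = true");
        });
        reader.start();
        Thread.sleep(100);
        stop = true; // with volatile, this write becomes visible to the reader
    }
}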


As shown in the figure above, although both memory models solve the speed mismatch, they create a cache-inconsistency problem: multiple processors each have their own cache but share the same main memory, so a change to a variable in one cache is invisible to the others. To address this, processors must follow a cache coherence protocol, such as the MESI protocol, when operating on the cache. Given that cache coherence protocols already exist, why is the volatile keyword still needed to ensure visibility?

2. Characteristics of volatile variables

First, volatile variables have the following characteristics:

  • Visibility: a read of a volatile variable always returns the most recent write. In other words, a write to a volatile variable by any thread is immediately visible to all other threads.

  • Ordering: volatile prevents the compiler and processor from reordering instructions around it for performance (see the double-checked-locking sketch after this list);

  • Atomicity is not guaranteed. Under the JLS’s non-atomic treatment of long/double, a read or write of a non-volatile long/double may be split into two 32-bit operations (for example, on the 32-bit x86 HotSpot VM); declaring the variable volatile forbids the split. Starting with JDK 9, HotSpot treats accesses to all data types atomically anyway, so this aspect of volatile can largely be ignored.
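A classic case where the ordering guarantee matters is double-checked locking. The sketch below is the standard idiom: without volatile on instance, the store of the reference could be reordered before the constructor finishes, letting another thread observe a half-constructed object.

public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                 // first check, lock-free
            synchronized (Singleton.class) {
                if (instance == null) {         // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}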

So how do volatile variables guarantee visibility and ordering?

3. Dig deep into volatile variables

At the Java memory model level: how does the JMM make volatile variables visible? It guarantees that a new value is synchronized to main memory immediately after the write, that the copies of the variable in other threads’ working memory are invalidated, and that the variable is re-read from main memory immediately before each use.

In the Java memory model, as-if-serial semantics and the happens-before rules preserve correctness in the presence of reordering. How does the JMM implement the special happens-before rule for volatile variables, namely that a write to a volatile variable happens-before every subsequent read of it? The answer is memory barriers. The JMM defines four kinds of memory barriers:

  • LoadLoad barrier: for an instruction sequence Load1; LoadLoad; Load2, ensures that the data read by Load1 is fully loaded before Load2 and any subsequent read operations.

  • LoadStore barrier: for an instruction sequence Load1; LoadStore; Store2, ensures that the data read by Load1 is fully loaded before Store2 and any subsequent write operations.

  • StoreStore barrier: for an instruction sequence Store1; StoreStore; Store2, ensures that the write by Store1 is visible to other processors before Store2 and any subsequent write operations.

  • StoreLoad barrier: for an instruction sequence Store1; StoreLoad; Load2, ensures that the write by Store1 is visible to all processors before Load2 and any subsequent read operations.

    As shown below:

    • For volatile reads, the JMM inserts a LoadLoad barrier and a LoadStore barrier after the read.

    • For volatile writes, the JMM inserts a StoreStore barrier before the write and a StoreLoad barrier after the write.

In short, the JMM implements the visibility and ordering of volatile variables through these memory barriers.
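Since Java 9 these placements can be spelled out with the explicit fence API on java.lang.invoke.VarHandle. The sketch below is only an illustration of where the JMM conceptually puts barriers; with the volatile keyword the JVM emits them for you, and the class and field names are invented:

import java.lang.invoke.VarHandle;

public class FencePlacement {
    static int plainField;

    static void volatileStyleWrite(int v) {
        VarHandle.storeStoreFence(); // StoreStore barrier before the write
        plainField = v;              // the "volatile" write
        VarHandle.fullFence();       // StoreLoad barrier after the write
    }

    static int volatileStyleRead() {
        int v = plainField;          // the "volatile" read
        VarHandle.acquireFence();    // LoadLoad + LoadStore barriers after the read
        return v;
    }
}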

Having seen the theory, let’s verify the characteristics of volatile variables with the following example:

public class VolatileTest {
    public static volatile int race = 0;
    public static int value = 0;

    public static void increase() {
        race++;  // volatile read, add, write back: three steps, not atomic
        value++;
    }

    private static final int THREAD_COUNT = 20;

    public static void main(String[] args) {
        Thread[] threads = new Thread[THREAD_COUNT];
        for (int i = 0; i < THREAD_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        // Wait until only the main thread remains.
        while (Thread.activeCount() > 1) {
            Thread.yield();
        }
        System.out.println("race: " + race + " value: " + value);
    }
}

The program uses 20 threads, each incrementing the volatile variable race 10,000 times; if the increments were atomic, the result would be 200,000. Instead, the final result is a number less than 200,000.
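The race is possible because race++ is really three steps (read, add, write back) and threads can interleave between them. If atomic increments are required, java.util.concurrent.atomic is the standard tool; here is a sketch of the same experiment with AtomicInteger, which always prints 200000:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicRace {
    static final AtomicInteger race = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[20];
        for (int i = 0; i < 20; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    race.getAndIncrement(); // one atomic read-modify-write
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // joining is more robust than polling Thread.activeCount()
        }
        System.out.println("race: " + race.get());
    }
}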

As the result shows, volatile does not guarantee atomicity. Running VolatileTest through the JITWatch tool yields the following assembly:

In the assembly, the write to the volatile-modified variable carries a lock prefix. The lock-prefixed instruction writes the processor’s local cache line back to memory and simultaneously invalidates the copies in other processors’ caches, so when another processor needs the data for a calculation it must re-read it from main memory; this is how visibility is achieved. As for disabling instruction reordering, the lock-prefixed instruction (lock addl $0x0,(%rsp)) acts as a full memory barrier, preventing instructions from being reordered across it.

So far we have looked at the characteristics of volatile variables and how the JMM implements them. But the question raised at the beginning remains unanswered: why is volatile needed to ensure visibility when cache coherence protocols already keep caches consistent? The rest of this article looks at the hardware side, covering the cache structure, the MESI protocol, and more, to give you a deeper understanding of volatile variables.

4. Cache structure and MESI protocol analysis

First, the internal structure of the cache is as follows:

Internally, a cache is a chained hash table, similar in structure to a HashMap: it is divided into buckets, and each bucket holds a linked list of cache entries. Each cache entry consists of three parts:

  • Tag: refers to the address of the cached data in main memory

  • Cache line: Stores data of multiple variables

  • Flag: indicates the status of the cache line

    When the CPU accesses memory, it decodes the address into three parts: an index, used to locate the bucket; a tag, compared against each cache entry’s tag to find the matching entry; and an offset, used to locate the data within the cache line.
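As a concrete illustration, here is a sketch of that decoding for an assumed geometry of 64-byte cache lines and 256 buckets (real geometries vary by hardware, and the constants are invented for the example):

public class AddressDecode {
    static final int OFFSET_BITS = 6; // 2^6 = 64-byte cache line
    static final int INDEX_BITS = 8;  // 2^8 = 256 buckets

    public static void main(String[] args) {
        long address = 0x7f3a1b2c48L;                                     // example address
        long offset = address & ((1 << OFFSET_BITS) - 1);                 // byte within the line
        long index = (address >>> OFFSET_BITS) & ((1 << INDEX_BITS) - 1); // which bucket
        long tag = address >>> (OFFSET_BITS + INDEX_BITS);                // matched against entry tags
        System.out.printf("tag=%x index=%d offset=%d%n", tag, index, offset);
    }
}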

    There are four states of flag:

    • M (Modified): the cache line is valid and has just been modified; its data is inconsistent with main memory and with other caches

    • E (Exclusive): the cache line is valid and held exclusively by this processor; no other processor may modify it

    • S (Shared): the cache line is valid and its data is consistent with main memory and other caches

    • I (Invalid): the cache line is invalid

This brings us to the MESI cache coherence protocol, which imposes the following convention on all processors:

Every processor sends a message onto the bus when it operates on memory data, and every processor continuously sniffs the bus for messages; the processors coordinate through these messages.

The MESI protocol also defines two operations:

  • Flush: forces the processor to write its updated data (which may sit in a store buffer or register) into the cache or main memory (hardware implementations of MESI differ here), and to send a message onto the bus announcing that it has modified the data

  • Refresh: after sniffing from the bus that its data has been invalidated, the processor invalidates the corresponding line in its own cache and then loads the up-to-date data from the modifying processor’s cache or from main memory into its own cache

Let’s walk through the process of one of two processors (processor 0) modifying shared data, assuming the cache line starts in the S (Shared) state in both caches.

1. Processor 0 sends an invalidate message to the bus;

2. Processor 1 sniffs the invalidate message on the bus and locates the corresponding cache line via address resolution. Finding the line in state S, it changes the state to I and returns an invalidate ack message to the bus;

3. Once processor 0 has received invalidate acks from all other processors (here, only processor 1), it sets the state of the cache line to be modified to E (Exclusive), indicating exclusive access; after completing the modification it sets the state to M (Modified), and may also flush the data back to main memory.

During this process, any other processor that attempts to modify this cache line in processor 0 is blocked.

If processor 1 then reads the corresponding cache line, it finds the state is I (Invalid), so it sends a read message to the bus. Processor 0 sniffs the read message, supplies the data from its cache (or main memory) onto the bus, and sets its own cache line state to S (Shared). When processor 1 receives the data from the bus, it writes the latest value into its cache line and likewise sets the state to S (Shared). Both processors’ cache lines end up in state S (Shared).

The process of updating and reading data is as follows:
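To make the message exchange concrete, here is a minimal, illustrative Java model of the steps above. All names are invented, and real hardware does this in the cache controllers, not in software:

import java.util.ArrayList;
import java.util.List;

enum MesiState { M, E, S, I }

class Line {
    MesiState state = MesiState.I;
    int data;
}

class Cpu {
    final Line line = new Line();

    // Step 2: on sniffing an invalidate, drop our shared copy and ack.
    void onInvalidate() {
        if (line.state == MesiState.S) line.state = MesiState.I;
    }
}

public class MesiDemo {
    static final List<Cpu> cpus = new ArrayList<>();

    // Steps 1 and 3: the writer broadcasts invalidate, then modifies exclusively.
    static void write(Cpu writer, int value) {
        for (Cpu c : cpus) {
            if (c != writer) c.onInvalidate(); // invalidate ack assumed returned here
        }
        writer.line.state = MesiState.E; // exclusive once all acks are in
        writer.line.data = value;
        writer.line.state = MesiState.M; // modified after the write
    }

    // A reader with an invalid line sends a read message; the owner supplies
    // the data and both lines end up Shared.
    static void read(Cpu reader) {
        if (reader.line.state != MesiState.I) return; // cache hit, nothing to do
        for (Cpu c : cpus) {
            if (c != reader && c.line.state == MesiState.M) {
                c.line.state = MesiState.S;
                reader.line.data = c.line.data;
                reader.line.state = MesiState.S;
            }
        }
    }

    public static void main(String[] args) {
        Cpu p0 = new Cpu(), p1 = new Cpu();
        cpus.add(p0);
        cpus.add(p1);
        p0.line.state = MesiState.S; // both caches start with the line Shared
        p1.line.state = MesiState.S;
        write(p0, 1); // p0: S -> E -> M, p1: S -> I
        read(p1);     // p1: I -> S, p0: M -> S
        System.out.println("p0=" + p0.line.state + " p1=" + p1.line.state);
    }
}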

The MESI protocol keeps cached data coherent across processors, but it also introduces two serious efficiency problems:

  • After processor 0 sends the invalidate message onto the bus, it cannot change the cache line’s state to E and modify it until every other processor holding the same line has returned an invalidate ack. Processor 0 is blocked the entire time, which seriously hurts performance;

  • When processor 1 sniffs the invalidate message, it must first set the corresponding cache line’s state to I before it can return the invalidate ack to the bus, and this too costs performance.

To address these two problems, hardware designers introduced the write buffer and the invalid queue. In the scenario above, processor 0 first puts the modified data into its write buffer and sends the invalidate message to the bus to notify the other processors holding the same cache line; it can then continue executing other instructions immediately. Only after receiving invalidate acks from all other processors does it set its cache line to E and write the data from the write buffer into the cache. On the other side, when processor 1 sniffs an invalidate message, it simply places the message into its invalid queue and immediately returns the invalidate ack. Both changes raise processing speed and improve performance.

After adding the write buffer and the invalid queue, the cache structure looks like the figure below:

Problems introduced by the write buffer and invalid queue:

The write buffer and the invalid queue improve processor performance under the MESI protocol, but they also introduce new visibility and ordering problems, as the figure above shows. Suppose the shared variable x = 0 initially sits in the caches of both processor 0 and processor 1 in state S (Shared). Processor 0 changes x to 1: it writes the new value into its write buffer and sends an invalidate message onto the bus. Meanwhile, processor 1 wants to compute y = x + 1. It finds x = 0 in its own cache, still in state S, so it uses the stale x = 0 in the calculation, producing a wrong result. The error is caused precisely by the write buffer and the invalid queue: the new value of x is still sitting in processor 0’s write buffer, and the invalidate message is still sitting in processor 1’s invalid queue.
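A toy model of this failure, with the write buffer and invalid queue as explicit data structures (all names are invented for illustration):

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class BufferedCpu {
    final Map<String, Integer> cache = new HashMap<>();        // local cache
    final Map<String, Integer> writeBuffer = new HashMap<>();  // pending writes
    final Queue<String> invalidQueue = new ArrayDeque<>();     // unprocessed invalidates

    // The write parks in the write buffer; the cache is updated only after
    // all invalidate acks arrive (not modeled here).
    void write(String var, int value) { writeBuffer.put(var, value); }

    // The read never consults the invalid queue, so it can return stale data.
    int read(String var) { return cache.get(var); }
}

public class StaleReadDemo {
    public static void main(String[] args) {
        BufferedCpu p0 = new BufferedCpu();
        BufferedCpu p1 = new BufferedCpu();
        p0.cache.put("x", 0);
        p1.cache.put("x", 0);      // both start with x = 0 in state S

        p0.write("x", 1);          // new value sits in p0's write buffer
        p1.invalidQueue.add("x");  // invalidate message queued, not yet processed

        int y = p1.read("x") + 1;  // p1 still sees x = 0, so y = 1, not 2
        System.out.println("y = " + y);
    }
}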

To solve these problems, two kinds of memory barriers are introduced at the hardware level: store (write) barriers and load (read) barriers.

  • Write barrier: forces the write buffer to be flushed into the cache, or makes all writes after the barrier wait in the write buffer until the earlier entries have been flushed back to the cache. That is, processor 0 must wait for all invalidate ack messages before performing subsequent operations; this is equivalent to the flush operation.

  • Read barrier: forces the processor to check its invalid queue before reading data; if there are pending invalidate messages, it must process them all first and only then perform the read. This is equivalent to the refresh operation.

Adding read and write barriers thus ensures both visibility and ordering. Ordering is guaranteed because the out-of-order behavior arises from the write buffer: it waits asynchronously for invalidate acks from other processors and only then drains its contents into the cache, so instructions can complete out of their program order. A write barrier forces subsequent instructions to wait until that asynchronous work is done, preserving the original instruction order.

We saw earlier that the JMM guarantees the ordering and visibility of volatile variables through four memory barriers. How do the hardware read/write barriers relate to the JMM’s four barriers?

  • Write barrier vs. (StoreStore, StoreLoad) barriers: the StoreStore barrier inserted before a volatile write ensures that the contents of the write buffer are flushed into the cache before the volatile write, so earlier writes cannot be reordered with the volatile write; the StoreLoad barrier inserted after a volatile write ensures that the volatile write itself is flushed from the write buffer into the cache before any subsequent read or write, so later operations cannot be reordered with the volatile write. The hardware write barrier therefore covers the functions of both StoreStore and StoreLoad;
  • Read barrier vs. (LoadLoad, LoadStore) barriers: the LoadLoad barrier inserted after a volatile read ensures that the invalid queue is drained before any subsequent read, and the LoadStore barrier inserted after a volatile read ensures that the invalid queue is drained before any subsequent write. The hardware read barrier therefore covers the functions of both LoadLoad and LoadStore.

At this point, isn’t it clear why the volatile keyword is still needed to ensure visibility and ordering even though the MESI cache coherence protocol exists?

