📦 The article and sample source code are archived in Javacore

Java Memory Model (JMM)

The JVM defines a Java Memory Model (JMM) to mask the differences in memory access between hardware and operating systems, so that Java programs achieve consistent memory-access behavior across platforms.

I. Physical memory model

Concurrency problems in physical machines are similar to those in virtual machines, and the solutions adopted by physical machines are an important reference for the virtual machine's implementation.

Hardware processing efficiency

The first problem with physical memory is hardware processing efficiency.

  • The vast majority of computing tasks cannot be completed by the processor “computing” alone. At a minimum, the processor must interact with memory, for example to read operands and store results; this I/O is practically impossible to eliminate (you cannot complete every computing task with registers alone).
  • The computer’s storage devices are several orders of magnitude slower than the processor, and this speed gap drags down overall hardware efficiency. Modern computers therefore insert a cache as a buffer between memory and the processor: the data an operation needs is copied into the cache so the operation can proceed at full speed, and the result is synchronized from the cache back to memory when the operation completes, so the processor does not have to wait for slow memory reads and writes.

Cache consistency

Caching solves the problem of hardware efficiency, but introduces a new one: Cache Coherence.

In a multiprocessor system, each processor has its own cache, and they share the same main memory. When the computation tasks of multiple processors all involve the same main memory area, the cache data of each processor may be inconsistent.

In order to solve the problem of cache consistency, it is necessary for each processor to follow some protocol when accessing the cache, and to operate according to the protocol when reading and writing.

Out-of-order execution optimization

In addition to caching, the processor may execute the input code out of order (Out-of-Order Execution) in order to make the fullest use of its internal arithmetic units. The processor recombines the results of out-of-order execution after the calculation, guaranteeing that the result matches that of sequential execution, but not that each statement is computed in the order it appears in the input code.

Out-of-order execution is a process in which the processor optimizes the order of the code in order to improve the speed of the computation.

  • In a single-core environment, the processor guarantees that its optimizations will not change the result the program was meant to produce; this is not guaranteed in a multi-core environment.
  • In a multi-core environment, if the computing task of one core depends on intermediate results of a task running on another core, and the shared data is read and written without any protection, the ordering of the code cannot be guaranteed.

Java memory model

A memory model can be understood as an abstraction of the process of read and write access to a particular memory or cache under a particular operating protocol. Physical machines with different architectures can have different memory models, and the JVM has its own memory model as well.

The JVM attempts to define a Java Memory Model (JMM) to mask the differences in memory access between different hardware and operating systems, so that Java programs achieve consistent memory-access behavior across different platforms.

Main memory and working memory

The main goal of the JMM is to define the access rules for variables in a program, i.e. the low-level details of storing variables to and fetching them from memory in the virtual machine. Variables here include instance fields, static fields, and the elements that make up array objects, but exclude local variables and method parameters, which are thread-private and never shared. To achieve better performance, the JMM does not restrict the execution engine from using specific processor registers or caches to interact with main memory, nor does it forbid optimizations such as the compiler adjusting the order in which code executes.

The JMM specifies that all variables are stored in Main Memory.

Each thread also has its own Working Memory, which holds a copy, from main memory, of the variables that the thread uses. Working memory is a JMM abstraction that doesn’t physically exist; it covers caches, write buffers, registers, and other hardware and compiler optimizations.

All operations by a thread on a variable must be done in working memory, rather than directly reading or writing variables in main memory. Different threads cannot directly access variables in each other’s working memory, and the transfer of variable values between threads needs to be completed through the main memory.

Note:

Main memory and working memory are not the same level of memory division as the heap, stack, and method area in the Java runtime memory areas.

JMM memory operation problems

Similar to the problems faced by the physical memory model, the JMM has two problems:

  • Working memory data consistency – When each thread manipulates data, it saves a copy of the shared variable in the main memory. When multiple threads perform operations involving the same shared variable, their copies of the shared variable are inconsistent. If this happens, whose copy of the data will be synchronized back to main memory? The Java memory model ensures data consistency through a series of data synchronization protocols and rules.
  • Instruction reordering optimization – in Java, the compiler or runtime environment may reorder instructions to optimize program performance. There are two kinds: compile-time reordering and run-time reordering, corresponding to the compile-time and run-time environments respectively. Reordering is not arbitrary; it must satisfy the following conditions:
    • It cannot change the result of the program in a single-threaded environment. The just-in-time compiler (and the processor) must preserve the as-if-serial property: in the single-threaded case, the program keeps the illusion of sequential execution, i.e. the reordered execution must produce the same result as sequential execution.
    • Operations with data dependencies cannot be reordered.
    • In a multi-threaded environment, if there are dependencies between the processing logic of different threads, instruction reordering may produce results that differ from what was expected.
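The as-if-serial property and data dependence above can be illustrated with a small sketch (class and variable names are invented for illustration):

```java
// Sketch of the as-if-serial guarantee: statements 1 and 2 have no data
// dependency, so the compiler/CPU may freely reorder them; statement 3
// depends on both, so it can never be hoisted above them. The observable
// single-threaded result is therefore always the same.
public class AsIfSerial {
    public static int compute() {
        int a = 1;          // statement 1
        int b = 2;          // statement 2 (may be reordered with statement 1)
        int c = a + b;      // statement 3 (data-dependent: must run after 1 and 2)
        return c;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // always 3, regardless of reordering
    }
}
```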

Intermemory operation

The JMM defines eight operations to complete the interaction between main memory and working memory. The JVM implementation must ensure that each of the operations described below is atomic (for long and double variables, read, load, store, and write are allowed exceptions on some platforms, as described later).

  • lock – acts on main memory; marks a variable as exclusively owned by one thread.
  • unlock – acts on main memory; releases a variable that is in the locked state, so that it can be locked by other threads.
  • read – acts on main memory; transfers the value of a variable from main memory to the thread’s working memory for use by a subsequent load.
  • load – acts on working memory; puts the value obtained by the read operation into the working-memory copy of the variable.
  • use – acts on working memory; passes the value of a variable in working memory to the execution engine, performed whenever the virtual machine encounters a bytecode instruction that needs the variable’s value.
  • assign – acts on working memory; assigns a value received from the execution engine to a variable in working memory, performed whenever the virtual machine encounters a bytecode instruction that assigns to the variable.
  • store – acts on working memory; transfers the value of a variable in working memory to main memory for use by a subsequent write.
  • write – acts on main memory; puts the value obtained by the store operation into the main-memory variable.

To copy a variable from main memory into working memory, read and load must be performed in that order; to synchronize a variable from working memory back to main memory, store and write must be performed in that order. The Java memory model only requires that these operations be performed in order, not consecutively: other operations may be interleaved between a read and its load, or between a store and its write.

The JMM also requires that these eight basic operations satisfy the following rules:

  • read and load must appear in pairs, as must store and write. That is, a variable may not be read from main memory but rejected by working memory, nor written from working memory but rejected by main memory.
  • A thread may not discard its most recent assign operation; a variable changed in working memory must be synchronized back to main memory.
  • A thread may not synchronize data from working memory back to main memory for no reason (without any assign having occurred).
  • A new variable can only be born in main memory; working memory may not directly use an uninitialized variable. In other words, a load or assign must be performed on a variable before use or store can be applied to it.
  • A variable may be locked by only one thread at a time, though the same thread may repeat the lock operation several times; after locking several times, the variable is only unlocked when the same number of unlock operations have been performed. lock and unlock must therefore come in pairs.
  • Performing lock on a variable clears its value from working memory; before the execution engine uses the variable, a load or assign must re-initialize the value.
  • A thread may not unlock a variable that has not been locked by a lock operation, nor unlock a variable locked by another thread.
  • Before performing unlock on a variable, the thread must first synchronize it back to main memory (store and write).

Java memory model rules

Three main features of memory interaction

The eight basic memory-interaction operations described above are governed by three key properties: atomicity, visibility, and orderliness.

These three features, in the final analysis, are designed to achieve multi-threaded data consistency, so that the program can operate as expected in a multi-threaded concurrent, instruction reordering optimized environment.

Atomicity

Atomicity means that one or more operations are either all performed (without interruption by any factor) or none at all. Even when multiple threads are working together, once an operation has started, it cannot be disturbed by other threads.

At the bytecode level, Java provides two instructions, monitorenter and monitorexit, to ensure atomicity. The Java language construct corresponding to these two bytecodes is synchronized.

Therefore, synchronized can be used in Java to ensure that operations within methods and code blocks are atomic.
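A minimal sketch of that guarantee (class and method names are invented for illustration): the synchronized methods below compile to bodies bracketed by monitorenter/monitorexit on the instance, so the read-modify-write of the counter is atomic and two threads never lose an update.

```java
// Sketch: synchronized makes the read-modify-write of `count` atomic.
public class SyncCounter {
    private int count = 0;

    public synchronized void increment() {
        count++; // read, add, write happen as one indivisible step under the lock
    }

    public synchronized int get() {
        return count;
    }

    // Runs `threads` threads, each incrementing `times` times.
    public static int run(int threads, int times) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < times; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(2, 1000)); // always 2000
    }
}
```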

Visibility

Visibility means that when multiple threads access the same variable and one thread changes the value of the variable, other threads can immediately see the changed value.

The JMM achieves visibility by using main memory as the transfer medium: after a variable is modified, the new value is synchronized back to main memory, and before a variable is read, its value is refreshed from main memory.

Java implements multithreaded visibility in the following ways:

  • volatile
  • synchronized
  • final

Orderliness

Orderliness manifests itself in two scenarios: within a thread and between threads.

  • Within a thread – from the thread’s own point of view, instructions execute as if serially (as-if-serial), the same guarantee that sequential programming has always had.
  • Between threads – when one thread “watches” another thread concurrently execute unsynchronized code, any interleaving may be observed because of instruction reordering optimizations. The only constraint is that operations on synchronized methods and blocks (marked by the synchronized keyword) and on volatile fields remain relatively ordered.

In Java, synchronized and volatile can both be used to ensure ordering between threads, but by different means:

  • The volatile keyword forbids instruction reordering.
  • The synchronized keyword enforces mutual exclusion, ensuring that only one thread can operate on the data at a time.

The happens-before principle

The JMM defines a partial order relationship for all operations in a program, called the happens-before principle.

The happens-before principle is very important: it is the main basis for deciding whether a data race exists and whether code is thread-safe. Based on this principle, a handful of rules let us answer every question of whether two operations might conflict in a concurrent environment.

  • Program order rule – within a single thread, operations written earlier in program order happen-before operations written later.
  • Monitor lock rule – an unlock operation happens-before every subsequent lock operation on the same lock.
  • Volatile variable rule – a write to a volatile variable happens-before every subsequent read of that variable.
  • Thread start rule – a call to a Thread object’s start() method happens-before every action of the started thread.
  • Thread termination rule – every operation in a thread happens-before the detection that the thread has terminated; termination can be detected by Thread.join() returning or by Thread.isAlive() returning false.
  • Thread interrupt rule – a call to a thread’s interrupt() method happens-before the interrupted thread’s code detects the interrupt, which it can do via Thread.interrupted().
  • Object finalization rule – the completion of an object’s initialization (the end of its constructor) happens-before the start of its finalize() method.
  • Transitivity – if operation A happens-before operation B, and B happens-before C, then A happens-before C.
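The thread start and termination rules can be seen in a small sketch (names are invented for illustration): no volatile or synchronized is needed, because start() and join() themselves establish the happens-before edges.

```java
// Sketch of the thread start and termination rules: the write to `data`
// before t.start() is visible inside t (start rule), and t's write to
// `result` is visible after t.join() returns (termination rule).
public class HappensBeforeDemo {
    static int data;    // written by main before start()
    static int result;  // written by t, read by main after join()

    public static int demo() throws InterruptedException {
        data = 21;
        Thread t = new Thread(() -> result = data * 2); // guaranteed to see data == 21
        t.start();
        t.join();
        return result; // join() guarantees t's write to result is visible here
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // 42
    }
}
```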

The memory barrier

How does Java ensure ordering and visibility of the underlying operations? Through memory barriers.

A memory barrier is an instruction inserted between two CPU instructions that prevents the processor from reordering instructions across it (like a fence). In addition, to achieve the barrier’s effect, it forces the processor to flush its write buffer to main memory before a write and to drain its invalidate queue before a read, thereby ensuring visibility.

Here’s an example:

Store1; Store2; Load1;
StoreLoad;   // memory barrier
Store3; Load2; Load3;

For the above sequence of CPU instructions (Store is a write, Load is a read), the Store instructions before the StoreLoad barrier cannot be reordered with the Load instructions after it. However, instructions on the same side of the barrier may be exchanged with each other: Store1 may be swapped with Store2, and Load2 with Load3.

There are four common types of barrier:

  • LoadLoad barrier – for a sequence Load1; LoadLoad; Load2, ensures that the data read by Load1 is fully loaded before Load2 and subsequent read operations access their data.
  • StoreStore barrier – for Store1; StoreStore; Store2, ensures that Store1’s write is visible to other processors before Store2 and subsequent write operations execute.
  • LoadStore barrier – for Load1; LoadStore; Store2, ensures that the data read by Load1 is fully loaded before Store2 and subsequent write operations execute.
  • StoreLoad barrier – for Store1; StoreLoad; Load2, ensures that Store1’s write is visible to all processors before Load2 and subsequent read operations execute. It has the largest overhead of the four (it flushes the write buffer and drains the invalidate queue). On most processor implementations this barrier is a universal one that doubles as the other three.

Explicit memory barriers rarely appear in ordinary Java code; they are usually implied by constructs such as volatile variables and synchronized blocks (more on that later). Memory barriers can also be issued directly through the Unsafe class.
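As a sketch of issuing barriers by hand: on Java 9+, the VarHandle class exposes static fence methods equivalent to Unsafe’s loadFence/storeFence/fullFence (the class name and the trivial arithmetic below are only for illustration).

```java
import java.lang.invoke.VarHandle;

// Sketch (Java 9+): each fence forbids the corresponding category of
// reordering around the point where it is called.
public class FenceDemo {
    static int a, b;

    public static int demo() {
        a = 1;
        VarHandle.storeStoreFence(); // stores above cannot sink below this point
        b = 2;
        VarHandle.fullFence();       // StoreLoad: strongest barrier, flushes the write buffer
        int r = a + b;
        VarHandle.loadLoadFence();   // loads above cannot sink below this point
        return r;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```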

Special rules for volatile variables

Volatile is the lightest synchronization mechanism provided by the JVM.

The English word volatile means unstable or changeable; marking a variable volatile is intended to ensure that the variable is visible across threads.

Characteristics of volatile variables

Volatile variables have two properties:

  • Ensure the variable is visible to all threads.
  • Disable instruction reordering.

Ensure the variable is visible to all threads

Visibility here means that when one thread changes the value of a volatile variable, the new value is immediately known to other threads. Ordinary variables cannot guarantee this: their values are passed between threads through main memory with no guarantee of promptness.

A thread writes a volatile variable:

  1. Change the value of a volatile copy in the thread working memory
  2. Flushes the value of the changed copy from working memory to main memory

A thread reads a volatile variable:

  1. Reads the latest value of a volatile variable from main memory into the thread’s working memory
  2. Reads a copy of the volatile variable from working memory

Note: ensuring visibility is not the same as making concurrent operations on volatile variables safe.

Atomicity must still be guaranteed by locking in scenarios that do not satisfy the following two conditions:

– The result of the operation does not depend on the current value of the variable, or only a single thread ever changes the variable’s value.

– The variable does not participate in invariants together with other state variables.

However, if multiple threads flush the updated variable value back to main memory at the same time, the value may not be the expected result:

For example: volatile int count = 0; two threads each execute count++ 500 times. The final result is often less than 1000, because count++ decomposes into three steps:

  1. The thread reads the latest count value from main memory.
  2. The execution engine increments the value by 1 and writes the result to the thread’s working memory.
  3. The thread’s working memory saves count back to main memory.

At some moment both threads may read 100 in step 1, each compute 101 in step 2, and then each flush 101 back to main memory, so one of the two increments is lost.
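A sketch of this lost-update problem and its standard fix (names invented for illustration): the volatile counter stays visible but may lose increments, while AtomicInteger’s CAS-based incrementAndGet() makes the whole read-modify-write atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: volatile gives visibility, but count++ is still three steps
// (read, add, write), so concurrent updates can overwrite each other.
// AtomicInteger never loses an update.
public class VolatileCounter {
    static volatile int racyCount = 0;                    // visible but not atomic
    static AtomicInteger safeCount = new AtomicInteger(); // atomic increment via CAS

    // Two threads each increment both counters `times` times.
    public static int[] run(int times) throws InterruptedException {
        racyCount = 0;
        safeCount.set(0);
        Runnable task = () -> {
            for (int i = 0; i < times; i++) {
                racyCount++;                 // may lose updates under contention
                safeCount.incrementAndGet(); // never loses updates
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return new int[] { racyCount, safeCount.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(500);
        // safe is always 1000; racy is at most 1000 and often less
        System.out.println("racy=" + r[0] + " safe=" + r[1]);
    }
}
```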
Disable instruction reordering

To be specific, the rules against reordering are as follows:

  • When execution reaches a read or write of a volatile variable, all changes preceding it must already have taken place and their results must be visible to subsequent operations, while the operations after it must not yet have taken place.
  • The optimizer may not move statements that follow an access to a volatile variable to before it, nor move statements that precede it to after it.

Ordinary variables only guarantee that correct results are obtained wherever a method depends on an assigned value; they do not guarantee that assignments happen in the same order as in the program code.

Here’s an example:

// Thread A: read the configuration, then publish the fact that it is ready
volatile boolean initialized = false;
// ...
doSomethingReadConfg();    // read and process the configuration information
initialized = true;        // signals that thread A has finished initializing the configuration

// Thread B: wait until initialization completes, then use the configuration
while (!initialized) {
    sleep();
}
doSomethingWithConfig();   // uses the configuration initialized by thread A

If the initialized variable were defined without the volatile modifier, instruction reordering could cause the last line of thread A’s code, initialized = true, to execute before doSomethingReadConfg(), so the code in thread B that uses the configuration could fail. The volatile keyword prevents this through its no-reordering semantics.

The principle of volatile

When the code is compiled, memory barriers are inserted into the instruction sequence to enforce these semantics. The JMM’s conservative barrier-insertion strategy is as follows:

  • Insert a StoreStore barrier before each volatile write. Besides preventing the preceding ordinary writes from being reordered with the volatile write, this barrier guarantees that those writes are committed before the volatile write.
  • Insert a StoreLoad barrier after each volatile write. Besides preventing the volatile write from being reordered with subsequent reads, this barrier flushes the processor’s write buffer, making the volatile update visible to other threads.
  • Insert a LoadLoad barrier after each volatile read. This prevents the volatile read from being reordered with subsequent ordinary reads, so the volatile variable is read with its latest value.
  • Insert a LoadStore barrier after each volatile read. This prevents the volatile read from being reordered with subsequent ordinary writes.

Usage scenarios for volatile

In short: “write once, read everywhere” – one thread is responsible for updating the variable, while the other threads only read it (and do not update it), executing logic based on its new value. Examples include updating a status flag and publishing an observed value in the observer pattern.
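A sketch of the status-flag pattern just described (names are invented for illustration): one thread writes the volatile flag once, and the reader’s loop is guaranteed to observe the write and terminate.

```java
// Sketch of the "one writer, many readers" status-flag pattern: the writer
// publishes shutdown through a volatile flag; the reader polls it and is
// guaranteed to see the new value promptly. Without volatile, the loop
// could spin forever on a stale working-memory copy.
public class StatusFlag {
    private volatile boolean shutdownRequested = false;

    public void shutdown() { shutdownRequested = true; }

    public long workUntilShutdown() {
        long iterations = 0;
        while (!shutdownRequested) { // volatile read sees the writer's update
            iterations++;
        }
        return iterations;
    }

    // Returns true if the worker thread stopped after the flag was set.
    public static boolean demo() throws InterruptedException {
        StatusFlag f = new StatusFlag();
        Thread worker = new Thread(f::workUntilShutdown);
        worker.start();
        Thread.sleep(10);
        f.shutdown();        // observed by the worker; its loop exits
        worker.join(5000);   // terminates promptly because the flag is volatile
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // true
    }
}
```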

Special rules for long and double variables

The eight operations lock, unlock, read, load, assign, use, store, and write are required to be atomic, but for 64-bit data types (long and double) the model relaxes the rule: the virtual machine is allowed to split reads and writes of non-volatile 64-bit values into two 32-bit operations, i.e. it may choose not to guarantee atomicity of load, store, read, and write for such values. Because of this non-atomicity, another thread could read a “32-bit half” of a value that has not yet been fully written.

In practice, however, the Java memory model strongly recommends that virtual machines implement atomic reads and writes of 64-bit data, and commercial virtual machines on all major platforms do treat them as atomic operations. We therefore generally do not need to declare long and double variables volatile when writing code.

Special rules for final quantities

As we know, a final member variable must be initialized at its declaration or in the constructor, or a compilation error results. The visibility property of the final keyword means that once the initialization of a final field completes (at the declaration or in the constructor), any other thread can see the field’s value correctly without synchronization, because the value of a final variable is written back to main memory as soon as initialization finishes.
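A minimal sketch (class and field names invented): once the constructor finishes, any thread holding a reference to the object sees the final field’s value without synchronization.

```java
// Sketch of final-field publication: after the constructor completes,
// every thread that obtains a reference to a Config instance is
// guaranteed to see maxConnections correctly, with no locking.
public class Config {
    private final int maxConnections; // safely published by final-field semantics

    public Config(int maxConnections) {
        this.maxConnections = maxConnections; // must be assigned before the constructor ends
    }

    public int getMaxConnections() { return maxConnections; }

    public static void main(String[] args) {
        Config c = new Config(10);
        System.out.println(c.getMaxConnections()); // 10
    }
}
```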

References

  • Java Concurrent Programming
  • The Art of Concurrent Programming in Java
  • In-depth Understanding of the Java Virtual Machine
  • Understand the Java memory model