When multiple threads access the same object, if the object behaves correctly without the callers having to consider how those threads are scheduled or interleaved by the runtime, and without any additional synchronization or other coordination on the caller's side, then the object is thread-safe. How to interpret this definition is a matter of opinion. Thread-safety problems are generally caused by data inconsistency between main memory and working memory and by instruction reordering. The most important step in solving thread-safety problems is to understand where these two kinds of problems come from, and the key to understanding them is the Java Memory Model (JMM).

1. The virtual machine memory model: JMM

The Java Memory Model, or JMM, is an abstract concept, or protocol, used to solve the problem of memory access in concurrent programming while remaining compatible with different hardware and operating systems. The principle of the JMM is similar to that of hardware cache coherence: each CPU has its own cache, and each CPU reads and writes data in shared memory through its own cache.

As shown in the figure below, in the Java memory model all variables are stored in main memory. Each Java thread has its own working memory, which holds copies of the variables the thread uses. A thread can read and write variables only in its working memory; it can neither operate on main memory directly nor access the working memory of other threads. When a variable's value is passed between threads, it must go through main memory.

When two threads A and B communicate, they need to go through the following two steps:

  1. Thread A reads the shared variable from main memory into its working memory, operates on it, and then writes the data back to main memory.
  2. Thread B then reads the updated shared variable from main memory.

The volatile keyword forces every write to a volatile variable to be flushed to main memory, making it visible to every thread.
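
A minimal sketch of this visibility guarantee (the class and field names here are illustrative assumptions, not from the original): without volatile, the worker thread might cache stop in its working memory and never observe the write.

    public class StopFlag {
        // volatile: a write by one thread is forced to main memory and
        // is visible to every other thread
        private static volatile boolean stop = false;

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                while (!stop) {
                    // re-reads stop from main memory on every check
                }
                System.out.println("worker stopped");
            });
            worker.start();
            Thread.sleep(100);
            stop = true;  // flushed to main memory, so the worker sees it
            worker.join();
        }
    }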

It is important to note that the JMM and the Java runtime memory areas are different conceptual levels. The JMM describes a set of rules governing how variables in a program are accessed in shared and private data areas. In the JMM, main memory is the shared data area, which roughly corresponds to the heap and the method area, while working memory is the thread-private data area, which roughly corresponds to the program counter, the virtual machine stack, and the native method stack.

Memory interaction operations

The previous section described how main memory and working memory interact and how threads communicate in the JMM. As for how a variable is actually transferred between main memory and working memory, the JMM defines eight operations that implement the interaction protocol between the two:

  1. lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread;
  2. unlock: acts on a main-memory variable; releases a variable that is in the locked state so that it can be locked by other threads;
  3. read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory for the subsequent load;
  4. load: acts on a working-memory variable; puts the value obtained by read from main memory into the copy of the variable in working memory;
  5. use: acts on a working-memory variable; passes the value of the variable in working memory to the execution engine, performed whenever the virtual machine reaches a bytecode instruction that needs the variable's value;
  6. assign: acts on a working-memory variable; assigns a value received from the execution engine to the variable in working memory, performed whenever the virtual machine reaches a bytecode instruction that assigns to the variable;
  7. store: acts on a working-memory variable; transfers the variable's value from working memory to main memory for the subsequent write;
  8. write: acts on a main-memory variable; puts the value obtained by store from working memory into the variable in main memory.

To copy a variable from main memory into working memory, read and then load are performed in order; to synchronize a variable from working memory back to main memory, store and then write are performed in order. The Java memory model only requires that each pair of operations execute in order, not that they execute consecutively: other instructions may be inserted between read and load, or between store and write. For example, when reading variables A and B from main memory, a possible order is read A, read B, load B, load A.

The Java memory model also specifies that the following rules must be met when performing the eight basic operations described above:

  1. One of the read/load or store/write pairs is not allowed to appear alone;
  2. A thread is not allowed to discard its most recent assign; that is, a variable changed in working memory must be synchronized back to main memory;
  3. A thread is not allowed to synchronize data from working memory back to main memory for no reason (i.e., without any assign having occurred);
  4. A new variable can only be born in main memory; working memory may not use an uninitialized variable directly. That is, before use or store is applied to a variable, assign or load must have been performed on it;
  5. Only one thread at a time is allowed to perform a lock operation on a variable, and lock and unlock must come in pairs;
  6. Performing lock on a variable clears that variable's value from working memory, and load or assign must be re-executed to initialize the value before the execution engine can use it;
  7. It is not allowed to perform unlock on a variable that has not been locked by a lock operation, nor to unlock a variable that is locked by another thread;
  8. Before performing unlock on a variable, the variable must be synchronized back to main memory (store and write must be performed).

In addition, the virtual machine makes some special provisions for the volatile keyword and for the long and double types.

Two functions of the volatile keyword

  1. Guaranteeing variable visibility: when a volatile variable is modified by one thread, other threads immediately see the result of the modification. When a thread writes to a volatile variable, the virtual machine forces the value to be flushed to main memory; when a thread uses a volatile variable, the virtual machine forces it to read the value from main memory.
  2. Preventing instruction reordering: instruction reordering is a technique compilers and processors use to optimize programs. It only guarantees that the result of execution is correct, not that operations execute in the order the code is written. This is not a problem in a single thread, but it can be in multiple threads. The classic example is adding volatile to the field in a lazily-initialized singleton to prevent reordering, as illustrated below.

Let's use the following program as an example to illustrate how volatile prevents instruction reordering:

    public class Singleton {
        private volatile static Singleton singleton;

        private Singleton() {}

        public static Singleton getInstance() {
            if (singleton == null) {                  // 1
                synchronized (Singleton.class) {
                    if (singleton == null) {
                        singleton = new Singleton();  // 2
                    }
                }
            }
            return singleton;
        }
    }

In fact, if the singleton field were not declared with the volatile keyword, the assignment at point 2 could cause an error, because initializing an object with the new keyword is not an atomic operation. It is broken down into three steps (sketched after the list):

  1. Allocate memory for the object
  2. Call the Singleton constructor to initialize member variables
  3. Point the singleton reference at the allocated memory (singleton is non-null after this step)
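
A common way to picture this decomposition, written as pseudocode in comments (allocate and ctorInstance are hypothetical names for the JVM's internal steps, not real methods):

    // singleton = new Singleton() roughly decomposes into:
    // memory = allocate();     // step 1: allocate memory for the object
    // ctorInstance(memory);    // step 2: run the constructor, initializing fields
    // singleton = memory;      // step 3: point the reference at the memory
    //
    // Without volatile, the compiler/CPU may reorder steps 2 and 3 into 1 -> 3 -> 2.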

If the virtual machine performs instruction reordering, the relative order of steps 2 and 3 is not guaranteed. Suppose thread A enters the synchronized block and executes step 3 before step 2; singleton is now non-null. Thread B then reaches point 1, sees that singleton is not null, and returns it for use, even though the object has not actually been initialized.

Note, however, that prior to JDK 1.5, double-checked locking with volatile was problematic. The reason is that the JMM before Java 5 was flawed: even declaring a variable volatile could not completely prevent reordering, mainly because code before and after a volatile access could still be reordered. This was fixed in JDK 1.5 (JSR-133), which strengthened the semantics of volatile and added read/write barriers around volatile accesses to guarantee visibility. From then on, steps 2 and 3 execute in code order and are not reordered by the CPU, so double-checked locking with volatile is safe.

Special rules for long and double

In addition to the special rules for volatile, the virtual machine also has special rules for long and double: it allows reads and writes of non-volatile long and double values to be split into two 32-bit operations. That is, reading or writing a long or double is not atomic and may be done in two steps. However, declaring such variables volatile guarantees atomicity.
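
A minimal sketch of this rule (the field names are illustrative assumptions); the tearing can only occur on JVMs that actually split 64-bit accesses, such as some 32-bit implementations:

    public class LongAtomicity {
        // A non-volatile long may be written as two separate 32-bit halves,
        // so a concurrent reader could observe a torn value.
        private long plainValue;

        // Declaring it volatile guarantees atomic reads and writes.
        private volatile long safeValue;

        public void update(long v) {
            plainValue = v;  // potentially two 32-bit stores
            safeValue = v;   // always a single atomic store
        }
    }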

Happens-before & as-if-serial

The Java memory model is defined in terms of actions, and the JMM defines a happens-before relationship over all actions in a program. It is the main basis for deciding whether data races exist and whether threads are safe. To guarantee that the thread performing operation B sees the result of operation A, a happens-before relationship must hold between A and B; otherwise the JVM is free to reorder them arbitrarily.

The happens-before rules mainly include the following items. When two operations satisfy any one of these relations, we can conclude that they are ordered with respect to each other:

  1. Program Order Rule: within a single thread, operations written earlier happen-before operations written later, following program code order; more precisely, the control-flow order, taking branches and loops into account;
  2. Monitor Lock Rule: an unlock operation happens-before every subsequent (in time order) lock operation on the same lock;
  3. Volatile Variable Rule: a write to a volatile variable happens-before every subsequent (in time order) read of that variable (see the example after this list);
  4. Thread Start Rule: the start() method of a Thread object happens-before every action of that thread;
  5. Thread Termination Rule: every operation of a thread happens-before the detection of that thread's termination; termination can be detected by the return of Thread.join() or by Thread.isAlive() returning false;
  6. Thread Interruption Rule: a call to a thread's interrupt() method happens-before the moment the interrupted thread's code detects the interrupt; Thread.interrupted() can be used to check whether an interrupt occurred;
  7. Object Finalizer Rule: the completion of an object's initialization (the end of its constructor) happens-before the start of its finalize() method;
  8. Transitivity: if operation A happens-before operation B and B happens-before C, then A happens-before C.
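
A minimal sketch showing rules 1, 3, and 8 working together (class and field names are illustrative assumptions): the write to data happens-before the volatile write to ready (program order), which happens-before the read that sees ready == true (volatile rule), so by transitivity the reader is guaranteed to see data == 42.

    public class HappensBeforeDemo {
        private static int data = 0;
        private static volatile boolean ready = false;

        public static void main(String[] args) {
            new Thread(() -> {
                data = 42;     // ordinary write
                ready = true;  // volatile write
            }).start();

            new Thread(() -> {
                while (!ready) { /* spin until the volatile write is visible */ }
                System.out.println(data);  // guaranteed to print 42
            }).start();
        }
    }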

The time order of operations and the happens-before relation are largely independent; neither can be inferred from the other. Judgments about concurrency safety must not be swayed by time order; everything should be based on the happens-before rules.

If two operations access the same variable and at least one of them is a write, the two operations are data-dependent. There are three cases: 1) read after write; 2) write after write; 3) write after read. In all three, reordering the two operations would change the final execution result. Compilers and processors respect data dependence when reordering: they never change the execution order of two operations that have a data dependence between them.
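
The three cases in a minimal single-threaded sketch (variable names are illustrative); neither the compiler nor the processor will reorder any of these dependent pairs:

    public class DataDependence {
        static void example() {
            int a = 1;   // write a
            int b = a;   // read after write:  depends on the write above
            a = 2;       // write after read:  must stay after the read into b
            a = 3;       // write after write: must stay after the previous write
            System.out.println(b + ", " + a);  // always prints "1, 3"
        }
    }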

There is also the as-if-serial semantics: no matter how much reordering is done (by compilers and processors, to improve parallelism), the execution result of a single-threaded program must not change. The compiler, runtime, and processor must all obey as-if-serial semantics. As-if-serial semantics guarantee that the result of a program executed in a single thread is unchanged, while the happens-before relation guarantees that the result of a correctly synchronized multithreaded program is unchanged.

The happens-before rules and the as-if-serial semantics are the principles the virtual machine follows when optimizing for parallelism without changing execution results; the former applies to multithreaded code and the latter to single-threaded code.

2. Java threads

2.1 Implementation of Java threads

On Windows and Linux, Java threads are implemented with a one-to-one threading model. In the one-to-one model, language-level threads indirectly use the kernel's threading facilities: when we use a Java thread, the Java virtual machine calls a kernel thread of the current operating system to carry out the task. Kernel-Level Threads (KLT) are threads supported directly by the operating system kernel; the kernel switches between them and schedules them through its scheduler, mapping their tasks onto the processors. Each kernel thread can be seen as one incarnation of the kernel, which is how operating systems can run multiple tasks simultaneously. Because the multithreaded programs we write are language-level, they do not call kernel threads directly; instead they use Light-Weight Processes (LWP), what we ordinarily call threads, where each lightweight process maps to one kernel thread. We therefore invoke kernel threads through lightweight processes, and the operating system kernel maps the tasks onto individual processors. This one-to-one relationship between lightweight processes and kernel threads is called the one-to-one threading model.

As shown in the figure, each thread is eventually mapped onto a CPU for processing; if the CPU has multiple cores, a single CPU can execute multiple threads in parallel.

2.2 Thread Safety

There are three ways to ensure thread safety in Java: 1) mutex synchronization; 2) non-blocking synchronization; 3) no synchronization.

Mutex synchronization

The most basic way to synchronize in Java is the synchronized keyword, which compiles into monitorenter and monitorexit bytecode instructions placed before and after the synchronized block. Both instructions take a reference-type parameter specifying the object to lock or unlock. If an object is specified explicitly in the Java program, that object is used; otherwise, either the object instance or the Class object is taken as the lock object, depending on whether synchronized modifies an instance method or a static method.

synchronized is inherently reentrant: the virtual machine requires that, when executing monitorenter, the thread first try to acquire the object's lock. If the object is not locked, or the current thread already owns the lock, the lock's counter is incremented by one; correspondingly, monitorexit decrements the counter by one, and the lock is released when the counter reaches zero. If acquiring the object's lock fails, the current thread blocks and waits until the lock is released by another thread.
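
A minimal sketch of this reentrancy (class and method names are illustrative assumptions): outer() and inner() lock the same monitor, so the lock counter simply goes up and down instead of deadlocking.

    public class ReentrantDemo {
        public synchronized void outer() {
            // the monitor of 'this' is already held; entering inner()
            // just increments the lock counter
            inner();
        }

        public synchronized void inner() {
            System.out.println("re-entered without deadlock");
        }

        public static void main(String[] args) {
            new ReentrantDemo().outer();
        }
    }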

Besides synchronized, we can also use ReentrantLock from JUC (java.util.concurrent) for synchronization. It is similar to synchronized but differs mainly in three aspects:

  1. Interruptible waiting: when the thread holding the lock does not release it for a long time, a waiting thread can choose to give up waiting.
  2. Fair locking: with a fair lock, when multiple threads wait for the same lock they acquire it in the order in which they requested it; a non-fair lock does not guarantee this, and any waiting thread may acquire the lock when it is released. synchronized is a non-fair lock, and ReentrantLock is non-fair by default but can be made fair through its constructor.
  3. A lock can be bound to multiple conditions: a ReentrantLock can bind several Condition objects simply by calling newCondition() multiple times, whereas with synchronized, associating more than one condition requires an additional lock (see the sketch after this list).
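
A minimal sketch of these three features (the class name and the two condition names, notFull and notEmpty, are illustrative assumptions):

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    public class BoundedQueueSketch {
        // passing true requests a fair lock; the default is non-fair
        private final ReentrantLock lock = new ReentrantLock(true);
        // one lock can serve several wait conditions
        private final Condition notFull = lock.newCondition();
        private final Condition notEmpty = lock.newCondition();

        public void doWork() throws InterruptedException {
            // lockInterruptibly() lets a waiting thread give up when interrupted
            lock.lockInterruptibly();
            try {
                // ... await on notFull / signal notEmpty as needed ...
            } finally {
                lock.unlock();
            }
        }
    }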

Before JDK 1.5, synchronized performed somewhat worse than ReentrantLock in multithreaded environments, but since JDK 1.6 the virtual machine has optimized synchronized, and performance is no longer the main reason to use ReentrantLock instead of synchronized.

Non-blocking synchronization

Non-blocking synchronization, as opposed to mutex synchronization, is achieved without suspending threads. Mutex synchronization is essentially a pessimistic concurrency strategy, while non-blocking synchronization is an optimistic one. Many concurrent constructs in JUC are built on the CAS (compare-and-swap) primitive, which is similar to optimistic locking. It involves three values: the new value, the old (expected) value, and the value in memory. In an infinite loop, the old value is compared with the in-memory value on each iteration. If the two are equal, the in-memory value has not been modified by another thread and the new value is written into memory; otherwise it has been modified, so the in-memory value is re-read as the old value and compared again, until the old value matches the in-memory value and the new value is successfully written.
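
A minimal sketch of that retry loop using AtomicInteger (the increment logic is illustrative; AtomicInteger.incrementAndGet() does essentially the same thing internally):

    import java.util.concurrent.atomic.AtomicInteger;

    public class CasLoop {
        private final AtomicInteger value = new AtomicInteger(0);

        public int increment() {
            int oldValue;
            int newValue;
            do {
                oldValue = value.get();   // read the in-memory value as the old value
                newValue = oldValue + 1;  // compute the new value
                // compareAndSet succeeds only if memory still holds oldValue;
                // otherwise another thread changed it, so we retry
            } while (!value.compareAndSet(oldValue, newValue));
            return newValue;
        }
    }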

Note that the CAS operation above consists of three steps, but they must be executed as one indivisible unit: otherwise, after the in-memory value is found equal to the old value but before the new value is written, another thread could modify the memory and produce an incorrect result. In the JDK, native methods such as compareAndSwapInt in Unsafe do exactly this. Also note that the CAS operation has some problems:

  1. The typical ABA problem: a value in memory is changed by one thread and then changed back, so the current thread sees the value it expects even though it has actually been modified by other threads in between. To solve the ABA problem, you can fall back to a traditional mutex synchronization strategy (see also the sketch after this list).
  2. Another problem with CAS is that a thread can spin for too long. Because CAS is non-blocking, the thread is not suspended but spins (essentially an infinite loop) to retry; spinning for too long is costly to performance.
  3. As the description above shows, CAS can only guarantee atomicity for a single shared variable; it cannot do so for several variables at once. One solution is to package multiple shared variables into one, that is, define them together in a single object, and use CAS to ensure the atomicity of the whole, for example with AtomicReference.
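
Besides falling back to a mutex, the JDK also provides AtomicStampedReference for the ABA problem: it pairs the reference with a version stamp, so an A→B→A change is still detected. A minimal sketch:

    import java.util.concurrent.atomic.AtomicStampedReference;

    public class AbaDemo {
        public static void main(String[] args) {
            // the reference is paired with an integer stamp bumped on each update
            AtomicStampedReference<String> ref =
                    new AtomicStampedReference<>("A", 0);

            int stamp = ref.getStamp();
            // succeeds only if both the reference and the stamp still match
            boolean swapped = ref.compareAndSet("A", "B", stamp, stamp + 1);
            System.out.println(swapped);  // true
        }
    }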

No-synchronization schemes

A no-synchronization scheme is one that requires no synchronization at all.

  1. For example, some collections are immutable, so there is no need to synchronize access to them.
  2. Some methods are pure functions, which is common in functional programming: the output can be predicted from the input, and all variables involved in the computation are local, so no synchronization is needed.
  3. Another option is thread-local storage, such as ThreadLocal (see the sketch after this list).
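
A minimal sketch of thread confinement with ThreadLocal (the class and field names are illustrative assumptions): SimpleDateFormat is a classic non-thread-safe class, and giving each thread its own copy removes the need for synchronization.

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class DateFormatHolder {
        // each thread lazily gets its own SimpleDateFormat instance
        private static final ThreadLocal<SimpleDateFormat> FORMATTER =
                ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

        public static String format(Date date) {
            // no locking needed: the instance is confined to the current thread
            return FORMATTER.get().format(date);
        }
    }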

2.3 Lock optimization

Spin locks and adaptive spinning

Spin locks address the cost of thread switching during mutually exclusive synchronization, since switching threads has its own overhead. If the physical machine has more than one processor, allowing two or more threads to execute in parallel, we can tell the next thread requesting the lock to "wait a moment" without giving up its processor time, and see whether the thread holding the lock releases it soon. To make the thread wait, we simply have it execute a busy loop (spin); this technique is called spin locking.

Spin locking was introduced in JDK 1.4.2 but was turned off by default; it could be enabled with the -XX:+UseSpinning parameter, and it became the default in JDK 1.6. Spin waiting avoids the overhead of thread switching but occupies processor time. If locks are held for very short periods, spin waiting works very well; if locks are held for long periods, the spinning threads consume processor resources without doing any useful work, wasting performance.

We can specify the number of spins with the -XX:PreBlockSpin parameter; the default is 10. JDK 1.6 introduced adaptive spin locking: the spin time is no longer fixed but is determined by the previous spin time on the same lock and the state of the lock's owner. If a spin wait on a given lock object recently succeeded in acquiring the lock, and the thread holding the lock is running, the virtual machine assumes the spin is likely to succeed again and allows it to last relatively long, say 100 loop iterations. Conversely, if spinning rarely succeeds in acquiring a particular lock, the spin phase may be omitted in future attempts to acquire that lock, to avoid wasting processor resources.

Here is an example of an implementation of a spin lock:

    import java.util.concurrent.atomic.AtomicReference;

    public class SpinLock {
        private AtomicReference<Thread> sign = new AtomicReference<>();

        public void lock() {
            Thread current = Thread.currentThread();
            // busy-wait until sign is null, then install the current thread
            while (!sign.compareAndSet(null, current));
        }

        public void unlock() {
            Thread current = Thread.currentThread();
            // release only if the current thread is the owner
            sign.compareAndSet(current, null);
        }
    }

As the example shows, the spin lock acquires and releases the lock with CAS, comparing the expected value with the current value. In lock(), if sign holds null the lock is free, and it is acquired by setting sign to the current thread; otherwise the lock is held by another thread, and the caller waits in the loop. In unlock(), setting sign back to null lets waiting threads see that the lock has been released.
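
Typical usage of the SpinLock above, with the critical section in try/finally so the lock is always released (the shared counter is an illustrative assumption):

    public class SpinLockDemo {
        private static final SpinLock LOCK = new SpinLock();
        private static int counter = 0;

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> {
                for (int i = 0; i < 10_000; i++) {
                    LOCK.lock();
                    try {
                        counter++;  // protected by the spin lock
                    } finally {
                        LOCK.unlock();
                    }
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            System.out.println(counter);  // always 20000
        }
    }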

Lock coarsening

The concept of lock coarsening is easy to understand: merge multiple consecutive lock and unlock operations into one, expanding several consecutive locks into a single lock with a larger scope.

    public class StringBufferTest {
        StringBuffer sb = new StringBuffer();

        public void append() {
            sb.append("a");
            sb.append("b");
            sb.append("c");
        }
    }

Each call to sb.append() locks and unlocks the StringBuffer. If the virtual machine detects a series of consecutive lock and unlock operations on the same object, it combines them into one larger locking operation: locking at the first append() call and unlocking after the last append().

Lightweight lock

Lightweight locks address the performance cost of heavyweight locks during mutual exclusion. The so-called heavyweight lock is the synchronized lock: synchronized is implemented through a monitor inside the object, and the monitor in turn relies on the underlying operating system's Mutex Lock. Using a mutex requires the operating system to switch the thread from user mode to kernel mode, which is very expensive; the transition between states takes a relatively long time.

First, the object header contains a part called the Mark Word, which stores the object's runtime data, such as its hash code and GC age, with 2 bits used for the lock flag.

When code enters a synchronization block, if the object is in the unlocked state (lock flag "01"), the virtual machine first creates a space called the Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word. After the copy succeeds, the virtual machine uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, while the owner pointer in the Lock Record points to the object's Mark Word. If this succeeds, the lock flag in the object's Mark Word changes to "00", indicating the object is lightweight-locked. If the update fails, the virtual machine first checks whether the object's Mark Word already points into the current thread's stack frame: if so, the current thread already holds the object's lock and can enter the synchronization block directly. Otherwise, multiple threads are competing for the lock, so the lightweight lock inflates into a heavyweight lock and the lock flag changes to "10": a pointer to the heavyweight lock (the mutex) is stored in the Mark Word, and threads waiting for the lock enter the blocked state. Before blocking, the current thread may attempt to acquire the lock by spinning, that is, looping to acquire the lock without blocking.

In fact, when a thread holds an object's lightweight lock, the object's Mark Word points to the Lock Record in the thread's stack frame, and the Lock Record in the stack frame points back to the object's Mark Word; this is how the virtual machine determines which thread holds the lock on the object. When a thread tries to acquire an object's lock, it first uses the lock flag to determine whether the object is locked, and then uses a CAS operation to determine whether the owner is the current thread.

Lightweight locks are not designed to replace heavyweight locks: they add extra CAS operations on top of the mutex path, so under heavy contention they are slower than traditional heavyweight locks.

Biased locking

When an object is first instantiated and no thread has yet accessed it, it is biasable: it assumes for now that only one thread will access it, so when the first thread does, the object becomes biased toward that thread. The object then holds a biased lock in favor of that first thread. The thread uses CAS once to change the object header to the biased state and records its own thread ID in the header; on subsequent accesses it only needs to compare IDs, with no further CAS operations.

When a second thread accesses the object, because a biased lock is never actively released, the second thread sees the biased state. The virtual machine now knows there is contention for the object and checks whether the thread that originally owned the lock is still alive. If that thread has exited, the object can be reset to the unlocked state and then re-biased toward the new thread. If the original thread is still alive, its stack is examined to check how it is using the object: if the bias is still needed, the biased lock is upgraded to a lightweight lock (this is the moment biased locks upgrade to lightweight locks); if not, the object can be reverted to the unlocked state and then re-biased.

Lightweight locks assume contention exists but is minimal: typically two threads access the same lock at staggered times, or a thread only needs to wait a little (spin) for the other thread to release the lock. But when spinning exceeds a certain count, or when one thread holds the lock, a second is spinning, and a third arrives, the lightweight lock inflates into a heavyweight lock, which blocks all threads except the owner and keeps the CPU from spinning idly.

If a lock is usually accessed by many different threads, the biased mode is redundant and can be disabled with -XX:-UseBiasedLocking.

Both lightweight locking and biased locking rest on the observation that most of the time a lock is acquired by the same thread; they are more efficient in that case, but not necessarily when the lock is constantly accessed by many different threads. They are therefore not meant to replace heavyweight locks, but they can be efficient in certain scenarios, so we can enable or disable them with virtual machine parameters according to our application's behavior.

Conclusion

The JMM is the theoretical basis of concurrency in Java. It defines eight operations and eight rules, and makes special provisions for the volatile keyword and the long and double types.

The JVM reorders our code to optimize performance; to constrain reordering, the JMM introduces the happens-before rules and as-if-serial semantics, ensuring that the final result of a program does not change because of reordering.

Java threads are implemented through lightweight processes mapped one-to-one onto kernel threads. We can use mutex synchronization, non-blocking synchronization, or no-synchronization schemes to ensure thread safety in multithreaded code. In addition, the JVM provides several lock optimization strategies to improve code performance under multithreading.

This article focuses on the JMM, so the discussion of concurrency covers only the JMM. To really study the contents of the concurrency utilities (JUC), there is a great deal of source code to read; one article obviously cannot cover it all.