What is JUC

The java.util.concurrent (JUC) package is a set of concurrent programming utilities introduced in Java 5.0. It provides thread pools, support for asynchronous task execution, and Collection implementations designed for multithreaded contexts, among other things.
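As a quick taste, here is a minimal sketch (the class name and task are illustrative) that uses a JUC thread pool to run a task asynchronously and collect its result:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JucQuickStart {
    public static void main(String[] args) throws Exception {
        // A fixed-size thread pool from java.util.concurrent
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Submit an asynchronous task; the Future holds its eventual result
        Future<Integer> future = pool.submit(() -> 1 + 1);

        System.out.println("result = " + future.get()); // blocks until done
        pool.shutdown();
    }
}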

Memory visibility

What is the Java Memory Model (JMM)

The Java Memory Model describes how threads interact with each other through memory. Specifically, the JVM defines a Main Memory (roughly, the Java heap) shared by all threads, and each thread has its own Working Memory that holds copies of some of the variables in main memory. All of a thread's operations on variables take place in its working memory, not directly in main memory. Threads cannot access each other's working memory directly, so passing variable values between threads must go through main memory.

Two provisions of the Java memory model

  • All operations by a thread on a shared variable must be done in its own working memory; the variable cannot be read from or written to main memory directly
  • Different threads cannot directly access variables in each other's working memory; passing variable values between threads must go through main memory

What is Memory Visibility?

In a multithreaded program, reads and writes of a shared variable happen in different threads, and the reader thread may fail to see the latest value written by the writer thread in time. Memory visibility is about ensuring that a write made by one thread becomes observable to other threads.
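A minimal sketch of the visibility problem (the class name is illustrative; actual behavior depends on the JVM and hardware): the reader thread may never observe the write to the non-volatile flag and can loop forever.

public class VisibilityDemo {
    // NOTE: not volatile, so the reader may keep using a stale value
    private static boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) {
                // busy-wait; may spin forever, because the write below
                // is not guaranteed to become visible to this thread
            }
            System.out.println("reader saw ready = true");
        });
        reader.start();

        Thread.sleep(100);
        ready = true; // this write may linger in the writer's working memory/cache
    }
}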

What is a CPU cache

Because the gap between CPU speed and the speed of main memory is very large, modern computer systems add a cache, whose read/write speed is as close as possible to the processor's, to act as a buffer between memory and the processor. Data needed for a computation is copied into the cache so the computation can run quickly; when the computation ends, the result is synchronized from the cache back to memory.

Let’s look at the CPU memory structure

With a CPU cache in place, the CPU operates as follows:

  1. The data needed for the computation is loaded from main memory into the CPU cache
  2. The CPU performs its computation directly on the data in the cache
  3. The result is written back to the cache when the computation completes
  4. The data is written from the cache back to main memory

Why cache consistency becomes a problem (two explanations)

The first explanation

The core components of a computer are the CPU, memory, and I/O devices. There is a very large difference in processing speed among the three, but overall computation efficiency is determined by the slowest device. To balance the speed differences and make the most of CPU performance, a lot of optimization has been done in hardware, in the operating system, and in compilers:

  1. CPUs added caches
  2. Operating systems added processes and threads to maximize CPU utilization through time-slice switching
  3. Compilers reorder instructions to make better use of the CPU cache

The second explanation (recommended): because CPUs developed rapidly, a large gap opened between CPU processing speed and memory read/write speed, so a cache sits between memory and the processor. Each core maintains its own cache, and the caches of different cores are not visible to each other, which creates the cache consistency problem.

How to solve the cache consistency problem

  • Bus lock
    • A bus LOCK signal is used to lock the bus. When one CPU core operates on shared data, it asserts a LOCK signal on the bus; other processors attempting to access main memory are blocked, so the issuing processor core gets exclusive access to the shared memory. Put another way, a bus lock serializes otherwise parallel operations by locking up communication between memory and the CPUs, which can cause serious performance problems; as technology evolved, cache locks emerged instead.

  • Cache lock
    • After one CPU core modifies data in its cache, it tells the other CPU cores to invalidate their internal cached copies or to read the data again from main memory.

The MESI protocol

Processors have a complete set of protocols to ensure cache consistency. The classic one is the MESI protocol, which stores a tag in each CPU cache line to mark one of four states. In addition, the cache controller of each core not only knows about its own read/write operations, but also listens for the read/write operations of other caches; this is called snooping.

  • M: Modified. Data in this state is cached only in the local CPU, not in other CPUs, and has been modified relative to the value in memory, which has not yet been updated.
  • E: Exclusive. Data in this state is cached only in the local CPU and is unmodified, i.e., consistent with memory.
  • S: Shared. Data in this state is cached in multiple CPUs and is consistent with memory.
  • I: Invalid. The cache line in this CPU is invalid.

How to achieve memory visibility

1. Use the volatile keyword

Instruction reordering

Code is not necessarily executed in the same order as it is written. Reordering is an optimization performed by the compiler or the processor to improve program performance.

The role of the volatile keyword

  1. It guarantees memory visibility, but does not guarantee atomicity of operations
  2. It prevents instruction reordering (see the double-checked locking sketch below)
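A classic place where the reordering guarantee matters is double-checked locking. A minimal sketch (the Singleton class is illustrative): without volatile, instance = new Singleton() can be reordered so that the reference is published before the constructor finishes, and another thread may observe a half-constructed object.

public class Singleton {
    // volatile forbids the store of the reference from being
    // reordered before the object's construction completes
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {                  // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {          // second check, with lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}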

How does the volatile keyword guarantee visibility

For volatile variables, a store memory barrier is added after each write, which forces the latest value of the variable in working memory to be flushed to main memory. A load memory barrier is added before each read, which forces working memory to load the latest value of the variable from main memory.

When can volatile be used

  1. Writes to the variable do not depend on its current value, or you can ensure that only a single thread ever updates the value (a sketch follows below).
  2. The variable does not participate in invariants involving other variables.
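A minimal sketch of a usage that satisfies both criteria (the names are illustrative): a shutdown flag that one thread writes and another reads.

public class Worker implements Runnable {
    // satisfies rule 1: the write (true) does not depend on the current value
    private volatile boolean shutdownRequested = false;

    public void shutdown() {
        shutdownRequested = true; // becomes visible to the worker thread promptly
    }

    @Override
    public void run() {
        while (!shutdownRequested) {
            // do work...
        }
    }
}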

2. Synchronized

How it works: the JVM implements synchronization of methods and synchronized blocks by entering and exiting a Monitor, which in turn relies on the underlying operating system's Mutex Lock.

For a synchronized block, the compiler inserts a monitorenter instruction before the synchronized code and monitorexit instructions both at the normal exit and on the exception path.

A thread that fails to acquire the lock blocks at the entry until the thread holding the lock executes monitorexit; the blocked threads then attempt to acquire the lock again.

3. The difference between volatile and synchronized

  1. Volatile does not lock: volatile variables are a weaker synchronization mechanism. Accessing a volatile variable performs no locking and therefore never blocks the executing thread, which makes volatile a lighter-weight synchronization mechanism than synchronized.
  2. Volatile reads and writes act like synchronized reads and writes: from a memory visibility perspective, writing a volatile variable is equivalent to exiting a synchronized block, and reading a volatile variable is equivalent to entering a synchronized block.
  3. Volatile is less safe than synchronized: code that relies heavily on volatile variables to control the visibility of state is generally more fragile and harder to understand than code that uses locks. Volatile variables should be used only when they simplify the implementation of the code and the verification of the synchronization policy. In general, it is safer to use a synchronization mechanism.
  4. Volatile cannot guarantee both visibility and atomicity: locking (synchronization) guarantees both visibility and atomicity, whereas volatile variables only guarantee visibility. If a volatile variable's new value depends on its previous value, volatile does not help. In other words, expressions such as count++ and count = count + 1 are not atomic operations (see the sketch below).
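
A minimal sketch of the non-atomicity of count++ even with volatile (the class name is illustrative): two threads each increment 10,000 times, and the final value is frequently less than 20,000 because the read-modify-write is interleaved.

public class VolatileCounterDemo {
    private static volatile int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                count++; // read, add, write: three steps, not atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Often prints less than 20000: some increments were lost
        System.out.println("count = " + count);
    }
}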

Take a look at synchronized modifying a block

public class Demo {
    public static void main(String[] args) {
        synchronized (Demo.class) {
            System.out.println("synchronized demo...");
        }
    }
}

Compile the code

javac Demo.java

View the compiled code through Javap

javap -c Demo.class
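The exact bytecode offsets and constant-pool indices vary with the compiler, but the output will contain one monitorenter and two monitorexit instructions; an abbreviated, illustrative excerpt:

  ...
  monitorenter      // acquire the monitor of Demo.class
  ...               // the println call
  monitorexit       // release the monitor on the normal path
  ...
  monitorexit       // release again on the exception path (via the exception table)
  ...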

Synchronized modifying a method

When synchronized modifies a method, the method carries the ACC_SYNCHRONIZED flag, which marks it as a synchronized method. The JVM uses the ACC_SYNCHRONIZED access flag to determine whether a method is declared synchronized and, if so, performs the corresponding monitor acquisition and release around the call.
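A minimal sketch for comparison (the class name is illustrative): compiling this and running javap -v on the class should show ACC_SYNCHRONIZED among the flags of increment(), instead of explicit monitorenter/monitorexit instructions in its bytecode.

public class SyncMethodDemo {
    private int count = 0;

    // Compiled with the ACC_SYNCHRONIZED access flag;
    // the JVM locks 'this' on entry and unlocks on exit
    public synchronized void increment() {
        count++;
    }
}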

Synchronized characteristics

  1. Guarantees atomicity, visibility, and ordering
    • Visibility: when the lock is released, all writes are flushed back to main memory; when the lock is acquired, all reads fetch the latest data from main memory
  2. Reentrancy
    • After a thread obtains the lock, it can directly call other code that requires the same lock
    • Principle: the monitor records the owning thread and a hold count. When synchronized code is invoked, it checks whether the object is locked; if so, it checks whether the current thread holds the lock, and if it does, the count is incremented by one; if not, the thread enters the waiting queue. On exit the count is decremented by one, and when it reaches zero the lock is released. (See the reentrancy sketch after this list.)
  3. Heavyweight
    • The underlying implementation relies on a monitor object
    • The monitor lock in turn relies on the underlying operating system's Mutex Lock, which requires switching the thread from user mode to kernel mode
    • That switch is expensive, which is why synchronized is relatively inefficient and called heavyweight
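A minimal sketch of reentrancy (the names are illustrative): outer() already holds the lock on this, so calling inner(), which needs the same lock, succeeds immediately instead of deadlocking.

public class ReentrantDemo {
    public synchronized void outer() {
        System.out.println("outer: lock held, count = 1");
        inner(); // same thread, same lock: hold count goes to 2
    }

    public synchronized void inner() {
        System.out.println("inner: re-entered, count = 2");
    }

    public static void main(String[] args) {
        new ReentrantDemo().outer(); // completes without blocking
    }
}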

4. Synchronized optimizations

As the characteristics above show, synchronized is a heavyweight lock, and the operating-system mode switches it causes hurt efficiency. Therefore JDK 1.6 made various optimizations to synchronized, introducing biased locks and lightweight locks in order to reduce the cost of acquiring and releasing locks.

Biased locking

Biased locking exists to reduce the unnecessary lightweight-lock execution path when there is no multithreaded contention: acquiring and releasing a lightweight lock relies on multiple CAS atomic instructions, while a biased lock needs only a single CAS instruction, when installing the owning thread ID (once multithreaded contention does appear, the biased lock must be revoked, so the design bets that the performance loss of revocation will be less than the cost of the CAS instructions saved).

Bias lock acquisition process

  1. Check in the Mark Word whether the biased-lock bit is "1" and the lock flag bits are "01", confirming the object is in the biasable/biased state.
  2. If it is biased, check whether the thread ID points to the current thread; if so, go to step 5; otherwise, go to step 3.
  3. If the thread ID does not point to the current thread, compete for the lock via a CAS operation. If the competition succeeds, set the thread ID in the Mark Word to the current thread's ID, then execute step 5; if it fails, execute step 4.
  4. If the CAS fails to obtain the biased lock, there is contention. When the global safepoint is reached, the thread holding the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and then the thread blocked at the safepoint continues executing the synchronized code.
  5. Execute the synchronized code.

Bias lock release

A thread holding a biased lock releases it only when another thread tries to compete for it; the holder never actively releases the biased lock otherwise. Revoking a biased lock requires waiting for a global safepoint (a point at which no bytecode is executing). The JVM first suspends the thread holding the biased lock and checks whether the lock object is still locked; after revocation, the object returns to the unlocked state (flag bits "01") or to the lightweight-lock state (flag bits "00").

Lightweight lock

A lightweight lock is not meant to replace the heavyweight lock; its intent is to reduce the performance cost of traditional heavyweight locks when there is no multithreaded contention.

The process of adding lightweight locks

  1. When code enters the synchronized block, if the lock state of the synchronization object is lock-free (the lock flag bits are "01" and the biased-lock bit is "0"), the VM first creates a space called the Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word.
  2. Copy the Mark Word from the object header into the lock record (the copy is called the Displaced Mark Word).
  3. After the copy succeeds, the VM uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, and to point the owner pointer in the Lock Record at the object's Mark Word. If the update succeeds, go to step 4; otherwise, go to step 5.
  4. If the update succeeds, the thread owns the lock on this object, and the object's Mark Word lock flag bits are set to "00", indicating the object is in the lightweight-locked state.
  5. If the update fails, the VM first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already holds the lock and can directly enter the synchronized block. Otherwise, multiple threads are competing for the lock: the lightweight lock inflates into a heavyweight lock, the lock flag bits change to "10", the Mark Word stores a pointer to the heavyweight lock (mutex), and threads waiting for the lock enter the blocked state. Before inflating, the current thread tries to acquire the lock by spinning, i.e., looping to acquire it without blocking the thread.

The process of unlocking lightweight locks

  1. Attempt, via a CAS operation, to replace the object's current Mark Word with the Displaced Mark Word copied into the thread's lock record.
  2. If the replacement succeeds, the synchronization process is complete.
  3. If the replacement fails, another thread has tried to acquire the lock (and the lock has inflated), so while releasing the lock the suspended threads must also be woken up.

Other optimizations

Adaptive spinning: when a CAS operation fails, it spins and retries. Since spinning consumes the CPU, spinning for long periods wastes CPU time. JDK 1.6 added adaptive spinning: if spinning on a given lock rarely succeeds, the next spin attempt is shortened.

Spin is turned on with the -XX:+UseSpinning parameter (before JDK 1.6, spin was turned off by default). Change the number of spins with -XX:PreBlockSpin; the default value is 10.

Lock elimination: lock elimination means that the virtual machine, while the JIT compiler runs, removes locks on shared data when it detects that no contention on that data is possible. Lock elimination saves the time spent on pointless lock requests.
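A minimal sketch of a case where locks may be eliminated (assuming escape analysis is enabled, as it is by default in modern HotSpot): the StringBuffer is a local object that never escapes the method, so its internal synchronized append() locks can be removed.

public class LockElisionDemo {
    public static String concat(String a, String b) {
        // sb is local and never escapes this method; every append() is
        // synchronized, but the JIT can prove no other thread can touch sb
        // and eliminate the locking entirely
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }
}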

Lock coarsening: When writing code, we recommend limiting the scope of synchronized blocks to as small as possible — synchronizing only in the actual scope of the shared data. This is to keep the number of operations that need to be synchronized as small as possible, so that the waiting thread can acquire the lock as quickly as possible if there is a lock contention.

Note: in most cases the above principle is right. However, if a series of consecutive operations repeatedly locks and unlocks the same object, the repeated locking itself causes a lot of unnecessary performance cost; in that case the virtual machine coarsens the lock, extending its scope to cover the whole series of operations.
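A minimal sketch of a sequence the JIT may coarsen (the class name is illustrative; whether coarsening actually fires depends on the JIT):

public class LockCoarseningDemo {
    private final StringBuffer sb = new StringBuffer(); // shared, so not elidable

    public String build() {
        // Four lock/unlock pairs on the same object back to back;
        // the JIT may merge them into a single lock held across all four
        sb.append("a");
        sb.append("b");
        sb.append("c");
        sb.append("d");
        return sb.toString();
    }
}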

5. To summarize

  1. Compared with synchronized blocks, volatile variables are a lightweight form of synchronization for shared variables; consider volatile when threads communicate through shared variables.
  2. Volatile is a weaker synchronization mechanism: accessing a volatile variable performs no locking and therefore never blocks the thread, which makes volatile a lighter-weight synchronization mechanism than the synchronized keyword.
  3. Use volatile on member variables that two or more threads access. Volatile is unnecessary when the variable is only accessed inside synchronized blocks or is a constant.
  4. Volatile can be inefficient because it suppresses otherwise-valid code optimizations in the JVM, so use the keyword only when necessary.

CAS algorithm and atomic variables

CAS

Compare and Swap

  • CAS is hardware-level support for concurrency: a special processor instruction, designed for multiprocessor systems, that manages concurrent access to shared data.
  • It involves three operands: the memory value V, the thread-local expected old value A (the value expected before the operation), and the new value B. The operation compares the old value A with the memory value V to see whether anything changed; if they are equal, the memory value V is updated to the new value B, otherwise nothing happens.
  • CAS is one way to implement lock-free, non-blocking algorithms.

Let’s simulate CAS in code

public class CompareAndSwapDemo {

    public static void main(String[] args) {
        final CompareAndSwap cas = new CompareAndSwap();

        for (int i = 0; i < 10; i++) {
            new Thread(new Runnable() {

                @Override
                public void run() {
                    int expectedValue = cas.get();
                    boolean b = cas.compareAndSet(expectedValue, (int) (Math.random() * 100));
                    System.out.println(b);
                }
            }).start();
        }
    }
}

class CompareAndSwap {
    private int value;

    public synchronized int get() {
        return value;
    }

    // Returns the old value; updates only if it matched the expected value
    public synchronized int compareAndSwap(int expectedValue, int newValue) {
        int oldValue = value;

        if (oldValue == expectedValue) {
            this.value = newValue;
        }

        return oldValue;
    }

    public synchronized boolean compareAndSet(int expectedValue, int newValue) {
        return expectedValue == compareAndSwap(expectedValue, newValue);
    }
}

A loop-CAS algorithm simply retries the CAS operation until it succeeds. The atomic variable classes under the java.util.concurrent.atomic package, such as AtomicInteger, use this underlying JVM support for CAS to provide highly efficient atomic operations on numeric and reference types.
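A minimal sketch of loop-CAS with the real JUC classes: AtomicInteger.incrementAndGet() retries internally, and the explicit compareAndSet loop below shows the same pattern by hand.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicLoopCasDemo {
    private static final AtomicInteger counter = new AtomicInteger(0);

    // The same pattern incrementAndGet() uses internally
    static int incrementWithLoopCas() {
        while (true) {
            int current = counter.get();              // read the current value
            int next = current + 1;                   // compute the new value
            if (counter.compareAndSet(current, next)) {
                return next;                          // CAS succeeded
            }
            // CAS failed: another thread won the race; loop and retry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                incrementWithLoopCas();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.get()); // always 20000
    }
}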

The synchronized keyword in Java has also been optimized, starting in JDK 1.6. Before that, synchronized was always a heavyweight lock: in every case it simply attached a mutex to the object, which made it inefficient in some situations. Since then, synchronized has had a lock-upgrade path: from biased lock to lightweight lock to spin lock to heavyweight lock. The lightweight lock is implemented in a way similar to the CAS algorithm:

Before executing the synchronized block, the JVM creates a space for storing the lock record in the current thread's stack frame and copies the object header's Mark Word (officially called the Displaced Mark Word) into the lock record. The thread then tries to use CAS to replace the Mark Word in the object header with a pointer to the lock record. If this succeeds, the current thread acquires the lock; if it fails, the thread spins to acquire the lock. If spinning still fails to acquire the lock, other threads are competing for it (two or more threads contending for the same lock), and the lightweight lock inflates into a heavyweight lock.

Although CAS is very efficient in solving atomic operations, there are still three problems with CAS:

  • The ABA problem (see the AtomicStampedReference sketch below)
  • Long spin times bring high overhead
  • Atomicity is guaranteed only for operations on a single shared variable
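A minimal sketch of the ABA problem and its standard fix (the class name is illustrative): a plain compareAndSet cannot tell that the value went A -> B -> A in between, while AtomicStampedReference pairs the value with a version stamp so the stale CAS fails.

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // Small Integer values are cached by the JVM, so the reference
        // equality that AtomicStampedReference uses works in this demo
        AtomicStampedReference<Integer> ref =
                new AtomicStampedReference<>(100, 0); // value 100, stamp 0

        int stamp = ref.getStamp(); // "thread 1" remembers stamp 0

        // Meanwhile, "thread 2" changes 100 -> 101 -> 100, bumping the stamp
        ref.compareAndSet(100, 101, ref.getStamp(), ref.getStamp() + 1);
        ref.compareAndSet(101, 100, ref.getStamp(), ref.getStamp() + 1);

        // Thread 1's CAS now fails even though the value is 100 again,
        // because the stamp moved from 0 to 2
        boolean ok = ref.compareAndSet(100, 300, stamp, stamp + 1);
        System.out.println("CAS with stale stamp succeeded? " + ok); // false
    }
}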

Atomic variables

A small toolkit of classes that support lock-free, thread-safe programming on single variables. In essence, the classes in this package extend the notion of volatile values, fields, and array elements to those that also provide an atomic conditional-update operation.

The java.util.concurrent.atomic package provides these common atomic classes:

  • AtomicBoolean
  • AtomicInteger
  • AtomicLong
  • AtomicReference
  • AtomicIntegerArray
  • AtomicLongArray
  • AtomicMarkableReference
  • AtomicReferenceArray
  • AtomicStampedReference