Preface

This article is fairly long, at nearly 6,000 words. Read the sections you need from the table of contents; bookmarking it is recommended.

If you don’t know the basics of concurrent programming, read this article first

Java Concurrent Programming Thread Basics – A Beginner's Introduction: juejin.cn/post/696549…

I. What is multithreaded concurrent programming

1. The concept of concurrency and parallelism

Concurrency means that multiple tasks are in progress within the same period of time, without all of them necessarily running at the same instant, while parallelism means that multiple tasks are executing simultaneously within the same unit of time.

Concurrency emphasizes that tasks are all running within the same time period, and a time period is made up of many units of time. Therefore, concurrent tasks are not necessarily executing simultaneously within any single unit of time.

The following figure shows a dual-CPU configuration, with thread A and thread B each executing tasks on their own CPU, enabling true parallel execution.

In the practice of multithreaded programming, the number of threads is often greater than the number of CPUs, so it is generally called multithreaded concurrent programming rather than multithreaded parallel programming.

II. Why multithreaded concurrent programming

The arrival of the multi-core CPU era breaks the single-core CPU's limits on multithreading efficiency. Multiple CPUs mean that threads can run on their own CPUs, which reduces the overhead of thread context switching. At the same time, as the performance and throughput demands on applications increase and they must handle massive amounts of data and requests, highly concurrent programming becomes indispensable.

III. Thread safety in Java

1. The concept of shared resources

If a resource is owned by or accessible to multiple threads, it is a shared resource.

2. The concept of thread safety issues

A thread-safety problem occurs when multiple threads read and write a shared resource at the same time without any synchronization, producing dirty data or other unforeseeable results.

IV. Memory visibility of shared variables

Before introducing memory visibility, let’s look at Java’s memory model for handling shared variables in multiple threads, as shown in the figure below

1. Java memory model

According to the Java memory model, all variables are stored in main memory. When a thread uses variables, it copies the variables in main memory to its own workspace, or working memory. When a thread reads or writes variables, it manipulates the variables in its own working memory.

The Java memory model is an abstract concept. In practice, the working memory of threads is shown in the following figure:

The figure shows a dual-core system architecture, with each core having its own controller and arithmetic unit. The controller contains a set of registers and an operation controller, and the arithmetic unit performs arithmetic and logic operations. Each core has its own level 1 cache, and in some architectures there is a level 2 cache shared by all CPUs.

Working memory in the Java memory model corresponds to L1 or L2 cache or CPU registers here.

2. The memory visibility problem

When a thread operates on a shared variable, it copies the variable from main memory into its own working memory, processes it there, and then flushes the updated value back to main memory.

What happens if thread A and thread B handle a shared variable at the same time? If thread A and thread B run on different CPUs, and the two levels of Cache (L1 and L2) are initially empty, then memory visibility problems can arise because of the Cache.

Walk through the following example to understand the shared-variable memory visibility problem (a code sketch follows the steps):

  1. Thread A first fetches the value of shared variable X. Since both Cache levels miss, it loads the value of X from main memory, assume 0, and caches X=0 in both Cache levels. Thread A then modifies X to 1, writes it to the two Cache levels, and flushes it to main memory. After thread A completes the operation, the value of X in both Cache levels of thread A's CPU and in main memory is 1.

  2. Thread B fetches the value of X. Its level 1 Cache misses, but the shared level 2 Cache hits and returns X=1. Everything is fine so far, because main memory also holds X=1. Thread B then changes X to 2, stores it in thread B's level 1 Cache and the shared level 2 Cache, and finally updates main memory so that X is 2. Still, everything is fine.

  3. Now thread A needs to change the value of X again. Its level 1 Cache hits with X=1, but thread B has already changed X to 2. This is the memory invisibility of shared variables: the value written by thread B is not visible to thread A.
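The scenario above can be reproduced with a minimal sketch (class and field names are illustrative; the actual behavior depends on the JIT and hardware). Without any synchronization, the reader thread may keep seeing the stale value in its working memory and never terminate:

public class VisibilityDemo {
    // shared variable without volatile or any synchronization
    private static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            // may spin forever: the write to stop by the main thread
            // may never become visible to this thread's working memory
            while (!stop) {
            }
            System.out.println("reader saw stop = true");
        });
        reader.start();

        Thread.sleep(1000);
        stop = true;  // written by the main thread
        System.out.println("main set stop = true");
    }
}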

Using the volatile keyword in Java solves this problem; see Section VI.

V. The synchronized keyword

1. Introduction to synchronized

synchronized relies on an atomic built-in lock provided by Java: every object in Java can be used as a synchronization lock. These built-in locks, which are invisible to Java users, are called intrinsic locks, also known as monitor locks.

A thread automatically acquires the intrinsic lock before entering a synchronized code block; other threads that try to enter the synchronized block are then blocked and suspended.

The thread holding the intrinsic lock releases it when it exits the synchronized block, when it throws an exception, or when it calls one of the wait methods on the lock object inside the block. A built-in lock is an exclusive lock: while one thread holds it, other threads must wait for it to be released before they can acquire it.

Since Java threads map one-to-one to native operating system threads, blocking a thread requires switching from user mode to kernel mode, which is time-consuming, so using synchronized can lead to context switches.

2. Synchronized memory semantics

The memory semantics of entering a synchronized block are to clear the variables used inside the block from the thread's working memory (note: only the variables that are used, not all variables), so that when such a variable is read inside the block it is not taken from the thread's working memory but loaded directly from main memory.

The memory semantics of exiting a synchronized block are to flush changes made to shared variables in a synchronized block to main memory.

Thus, synchronized can solve the problem of shared variable memory visibility.

This is in fact the semantics of acquiring and releasing a lock: when the lock is acquired, the shared variables that will be used are cleared from local memory and loaded from main memory when they are used; when the lock is released, the shared variables modified in local memory are flushed back to main memory.
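As a minimal sketch of these semantics (the class is illustrative), a counter whose read-modify-write is wrapped in synchronized methods is both atomic and visible to every thread that synchronizes on the same lock:

public class SynchronizedCounter {
    private int count = 0;  // shared variable protected by the intrinsic lock of 'this'

    // the intrinsic lock is acquired on entry and released on exit,
    // and the modified count is flushed back to main memory on release
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}

Every thread calling increment() must first acquire the same intrinsic lock, so the increments are serialized and each thread reads the latest value from main memory.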

VI. The volatile keyword

1. Definition and implementation principle

If a field is declared volatile, the Java thread memory model ensures that all threads see the variable’s value as consistent.

A write to a volatile shared variable is compiled into an instruction prefixed with LOCK. On multicore processors this instruction does two things:

(1) Write the data of the current processor cache line into the system memory.

(2) This write-back invalidates the data cached at that address in other CPUs.

Look at point (2). How does this work?

On a multiprocessor system, a cache coherence protocol is implemented to keep each processor's cache consistent. Each processor sniffs the data propagated on the bus to check whether its cached values are stale. When a processor finds that the memory address corresponding to one of its cache lines has been modified, it marks that cache line as invalid; the next time the processor reads or modifies that data, it reloads it from system memory into its cache.

2. Avoiding the memory visibility problem

When a variable is declared volatile, the thread does not cache the value in registers or elsewhere when writing to the variable. Instead, the value is flushed to main memory. When another thread reads the shared variable, it retrieves the latest value from main memory, rather than using the value in the current thread’s working memory.

As you can see, the memory semantics of volatile are similar to those of synchronized, and it also addresses the memory visibility problem without the overhead of thread context switching compared to synchronized locking.

3. Avoiding instruction reordering

The Java memory model allows the compiler and the processor to reorder instructions to improve performance, but only instructions that have no data dependency on each other may be reordered.

Reordering in a single thread guarantees that the final result is the same as if the program had executed in order, but in multithreaded code it causes problems.

When a volatile variable is written, operations before the volatile write cannot be reordered by the compiler to after it. When a volatile variable is read, operations after the volatile read cannot be reordered to before it.
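A minimal sketch (illustrative names) of how these ordering guarantees are commonly relied on: the write to data cannot be reordered to after the volatile write to ready, so a reader that sees ready == true is guaranteed to also see data == 42:

public class VolatilePublish {
    private static int data = 0;
    private static volatile boolean ready = false;

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            data = 42;      // ordinary write
            ready = true;   // volatile write: the write to data cannot move below this line
        });
        Thread reader = new Thread(() -> {
            while (!ready) {            // volatile read
                Thread.yield();
            }
            System.out.println(data);   // guaranteed to print 42
        });
        writer.start();
        reader.start();
    }
}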

VII. Concept and implementation principle of atomic operations

1. The concept

Atomic operations are a series of operations that either all execute or none execute; there is never a case where only some of them execute.

Thread safety (memory visibility and atomicity) can be achieved by using the synchronized keyword, but synchronized is an exclusive lock, and threads that do not acquire the internal lock are blocked, which requires the overhead of context switching. Is there a better implementation? Yes, that is the CAS operation.

2. How does the processor implement atomic operations

(1) Use bus lock to ensure atomicity

A bus lock uses the LOCK# signal provided by the processor. When one processor asserts this signal on the bus, requests from other processors are blocked, so that processor can monopolize shared memory and other processors cannot operate on the caches holding the shared variable's memory address.

(2) Use cache locks to ensure atomicity

With cache locking, if the memory area is cached in the processor's cache line and is locked for the duration of the LOCK operation, then when the processor writes the result back to memory it does not assert the LOCK# signal on the bus. Instead, it simply modifies the memory address internally and relies on the cache coherence mechanism to guarantee atomicity, because the cache coherence mechanism prevents two or more processors from modifying the same cached memory region at the same time and invalidates the other processors' cache lines when the locked line is written back.

Atomic operations can be implemented in Java by locking and looping CAS.

VIII. CAS operations

1. Introduction to CAS

CAS (Compare And Swap) is a non-blocking operation provided by the JDK that uses hardware support to guarantee the atomicity of the compare-and-update operation.

The Unsafe class in the JDK provides a variety of compareAndSwap* methods. The following looks at the compareAndSwapLong method:

  • boolean compareAndSwapLong(Object obj, long valueOffset, long expect, long update): CAS has four operands: the object containing the variable, the offset of the variable within the object, the expected value of the variable, and the new value. If the value at offset valueOffset in obj equals expect, it is replaced with the new value update. This is an atomic instruction provided by the processor; see the sketch below.
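Application code normally does not call Unsafe directly; the atomic classes in java.util.concurrent.atomic wrap the same CAS primitive. A minimal sketch (illustrative class) of a non-blocking counter built on AtomicLong.compareAndSet, where the loop retries until the CAS succeeds:

import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    public long increment() {
        long current;
        long next;
        do {
            current = value.get();     // read the expected value
            next = current + 1;        // compute the new value
        } while (!value.compareAndSet(current, next));  // retry if another thread changed it first
        return next;
    }
}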

2. Three major problems with CAS atomic operations

2.1 The ABA problem and its solution

Suppose thread I uses CAS to change variable X from A to B: thread I first fetches the current value of X (A) and then uses CAS to try to change X to B. If the CAS operation succeeds, does the program necessarily run correctly?

Not necessarily. It is possible that after thread I reads the value A of variable X, and before it executes its CAS, thread II uses CAS to change X to B and then uses CAS again to change X back to A. So even though the value of X is A when thread I executes its CAS, this A is no longer the A that thread I originally read. That is the ABA problem.

ABA occurs because the value of a variable undergoes a circular transition, from A to B and then from B back to A. If the value of a variable can only move in one direction, such as A to B and B to C, without forming a cycle, there is no problem.

The AtomicStampedReference class in the JDK avoids the ABA problem by pairing each variable's value with a stamp (a version number).
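A minimal sketch (values are illustrative) of using AtomicStampedReference: the CAS succeeds only if both the reference and the stamp still match, so an A -> B -> A change is detected because the stamp has moved on:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // small Integer values are used so autoboxing returns cached instances
        // and the reference comparison behaves as expected
        AtomicStampedReference<Integer> ref =
                new AtomicStampedReference<>(100, 0);  // initial value A = 100, stamp 0

        int[] stampHolder = new int[1];
        Integer oldValue = ref.get(stampHolder);       // read value and stamp together
        int oldStamp = stampHolder[0];

        // another thread performs A -> B -> A, bumping the stamp each time
        ref.compareAndSet(100, 101, 0, 1);
        ref.compareAndSet(101, 100, 1, 2);

        // this CAS fails: the value is back to 100, but the stamp is now 2, not 0
        boolean success = ref.compareAndSet(oldValue, 102, oldStamp, oldStamp + 1);
        System.out.println(success);  // false
    }
}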

2.2 Long cycle time and high overhead

If a spinning CAS is unsuccessful for a long period of time, it imposes a significant execution overhead on the CPU. If the JVM could use the pause instruction provided by the processor, there would be some efficiency gain.

The pause instruction has two effects:

First, it can delay pipeline execution so that the CPU does not consume too many execution resources.

Second, it can avoid the CPU pipeline being emptied due to memory scheduling conflicts when exiting the loop, thus improving CPU execution efficiency.

2.3 Atomicity is guaranteed only for operations on a single shared variable

When operating on a single shared variable, a CAS loop can guarantee atomicity, but when operating on multiple shared variables a CAS loop cannot, so locks can be used instead.

However, since Java 1.5 the JDK has provided the AtomicReference class to guarantee atomicity of reference updates, which allows multiple variables to be placed in a single object and operated on with CAS.
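A minimal sketch (the class and fields are illustrative) of wrapping two related variables in one immutable object so that they can be updated together with a single CAS via AtomicReference:

import java.util.concurrent.atomic.AtomicReference;

public class AccountState {
    // immutable snapshot of two related variables
    static final class Balance {
        final long amount;
        final long version;
        Balance(long amount, long version) {
            this.amount = amount;
            this.version = version;
        }
    }

    private final AtomicReference<Balance> state =
            new AtomicReference<>(new Balance(0, 0));

    public void deposit(long delta) {
        Balance current;
        Balance updated;
        do {
            current = state.get();
            updated = new Balance(current.amount + delta, current.version + 1);
        } while (!state.compareAndSet(current, updated));  // both fields change atomically
    }
}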

IX. False sharing

1. What is false sharing

When multiple threads modify multiple variables that sit in the same cache line at the same time, performance deteriorates compared with putting each variable in its own cache line, because only one thread can operate on that cache line at a time. This is called false sharing.

So what is a cache line? To bridge the speed gap between main memory and the CPU, computer systems add one or more levels of Cache between the CPU and main memory. This Cache is usually integrated into the CPU, so it is also called the CPU Cache. The following figure shows a two-level Cache structure.

The Cache is organized in lines, each of which is called a cache line. A cache line is the unit of data exchange between the Cache and main memory, and its size is usually a power of two bytes.

When the CPU accesses a variable, it first checks whether the variable is in the CPU Cache. If it is, the variable is fetched directly from the Cache; otherwise, the variable is fetched from main memory and the cache-line-sized block of memory containing it is copied into the Cache.

Because a cache line stores a block of memory rather than a single variable, multiple variables may end up in a single cache line.

2. Why does false sharing occur?

False sharing occurs when multiple variables are placed in a single cache line and multiple threads write to different variables in that cache line at the same time.

Why are multiple variables placed in a cache row?

Because the unit of data exchange between the cache and memory is the cache line, when a variable the CPU needs is not found in the cache, a cache-line-sized block of memory containing that variable is loaded into a cache line, following the locality principle of program execution. Note: multiple variables with consecutive addresses may therefore be placed in the same cache line.

Let's look at a piece of code that fills a two-dimensional array and compare the performance of row traversal and column traversal.

(1) Traversing the array by rows

public class TestForContent {
    static final int lineNum = 1024;
    static final int columNum = 1024;

    public static void main(String[] args) {
        long[][] array = new long[lineNum][columNum];
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < lineNum; i++) {      // iterate over rows
            for (int j = 0; j < columNum; j++) { // then over columns within a row
                array[i][j] = i * 2 + j;
            }
        }
        long endTime = System.currentTimeMillis();
        long cacheTime = endTime - startTime;
        System.out.println(cacheTime);
    }
}

The results

(2) Traversing the array by columns

public class TestForContent1 {
    static final int lineNum = 1024;
    static final int columNum = 1024;

    public static void main(String[] args) {
        long[][] array = new long[lineNum][columNum];
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < columNum; i++) {     // iterate over columns
            for (int j = 0; j < lineNum; j++) {  // then over rows, jumping across cache lines
                array[j][i] = i * 2 + j;
            }
        }
        long endTime = System.currentTimeMillis();
        long cacheTime = endTime - startTime;
        System.out.println(cacheTime);
    }
}

The results

As you can see, column traversal takes noticeably longer than row traversal. This is because column traversal jumps between array elements, which breaks the locality principle of program access, and the cache has limited capacity: when it is full, cache lines are evicted according to a replacement algorithm, so a cache line loaded from memory may be evicted before all of its elements have been read.

So modifying multiple variables in one cache line sequentially in a single thread takes advantage of the locality principle and speeds the program up, while modifying multiple variables in one cache line concurrently from multiple threads causes contention for that cache line and degrades performance.

3. How to avoid false sharing?

As shown above, the root cause of false sharing is that multiple variables are stored in a single cache line and multiple threads modify different variables in that line at the same time. So the fix is to make sure a cache line holds only the one variable we actually use.

public final static class FilledLong{
    public volatile long value = 0L;
    public long p1,p2,p3,p4,p5,p6;
}

Assuming a cache line is 64 bytes, we pad the FilledLong class with six long fields (p1 to p6); together with value this roughly fills one cache line, and since we only ever use the value field, false sharing is avoided.

JDK 8 provides the sun.misc.Contended annotation to solve the false sharing problem.
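A minimal sketch of the annotation approach, assuming JDK 8; note that for application classes the JVM must be started with -XX:-RestrictContended, otherwise the annotation is ignored:

import sun.misc.Contended;

public class ContendedDemo {
    @Contended                       // the JVM pads this field onto its own cache line
    volatile long value = 0L;

    volatile long otherValue = 0L;   // a field that other threads write frequently
}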

X. Overview of locks

1. Optimistic and pessimistic locks

Optimistic locking and pessimistic locking are terms that originated in databases, but similar ideas are used by the locks in the Java concurrency package (java.util.concurrent).

Pessimistic locking takes a conservative attitude toward data being modified by other threads: it assumes the data can easily be modified, so the data is locked before it is processed and stays locked for the whole operation.

The implementation of pessimistic locking often relies on the locking mechanism provided by the database; that is, an exclusive lock is placed on the record before it is operated on. If the lock cannot be acquired, the data is being modified by another thread, and the current thread waits or throws an exception. If the lock is acquired, the record is manipulated and the exclusive lock is released when the transaction commits.

Optimistic locking is, by comparison, optimistic: it assumes that conflicts rarely happen, so it does not take an exclusive lock before accessing the record; instead, it checks for conflicts only when the update is actually committed.

Optimistic locking does not use the locking mechanism provided by the database. It is typically implemented by adding a version field to the table or by using business state. Because an optimistic lock takes effect only at commit time, it cannot deadlock.
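A minimal, hypothetical sketch of version-based optimistic locking with JDBC (the account table and its id, balance, and version columns are assumptions for illustration): the update only succeeds if the version read earlier is still current; otherwise zero rows are affected and the caller can retry or report a conflict.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdate {
    // returns true if the row was updated, false if another transaction changed it first
    public boolean updateBalance(Connection conn, long id, long newBalance, long expectedVersion)
            throws SQLException {
        String sql = "UPDATE account SET balance = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, newBalance);
            ps.setLong(2, id);
            ps.setLong(3, expectedVersion);
            return ps.executeUpdate() == 1;  // 0 rows means a concurrent modification happened
        }
    }
}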

2. Fair and unfair locks

By preemption mechanism, locks can be divided into fair locks and unfair locks. A fair lock grants the lock in the order in which threads requested it, first come, first served. An unfair lock allows barging at run time, so it is not necessarily first come, first served.

ReentrantLock provides both fair and unfair lock implementations

  • Fair lock: ReentrantLock pairLock = new ReentrantLock(true);
  • Unfair lock: ReentrantLock pairLock = new ReentrantLock(false); if the constructor is given no argument, the default is an unfair lock.

Use an unfair lock when there is no fairness requirement, because a fair lock brings extra performance overhead (thread state switching).

3. Exclusive lock and shared lock

Depending on whether a lock can be held only by a single thread or by multiple threads, locks can be classified as exclusive or shared.

An exclusive lock guarantees that only one thread can hold the lock at any time; ReentrantLock is implemented as an exclusive lock. A shared lock can be held by multiple threads at the same time; for example, ReadWriteLock allows multiple threads to read a resource concurrently.

An exclusive lock is a pessimistic lock. Because it places a mutex in front of every access to the resource, it limits concurrency: even though read operations do not affect data consistency, an exclusive lock allows only one thread to read at a time, and other threads must wait for the current thread to release the lock before they can read.

A shared lock is an optimistic lock: it relaxes the locking condition and allows multiple threads to read at the same time.
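A minimal sketch (the class is illustrative) of the shared/exclusive split with ReentrantReadWriteLock: many threads may hold the read lock at once, while the write lock is exclusive:

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteCache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long value = 0;

    public long read() {
        lock.readLock().lock();     // shared: concurrent readers are allowed
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(long newValue) {
        lock.writeLock().lock();    // exclusive: blocks readers and other writers
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }
}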

4. What is a reentrant lock?

A thread blocks when it tries to acquire an exclusive lock held by another thread, but does it block when it tries to acquire a lock it already holds? If it does not block, the lock is said to be reentrant: re-acquiring a lock it already holds does not block the thread, and such a lock is called a reentrant lock.

Synchronized is a reentrant lock.

The principle of a reentrant lock is to maintain a thread owner field inside the lock, indicating which thread currently holds it, together with a counter. A counter value of 0 means the lock is not held by any thread. When a thread acquires the lock, the counter becomes 1; when a different thread then tries to acquire the lock, it finds that the owner is not itself and is blocked.

However, when the thread that holds the lock acquires it again, it finds that the owner is itself and simply increments the counter. Each release decrements the counter; when the counter reaches 0, the owner field inside the lock is reset to null and blocked threads are woken up to compete for the lock.
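A minimal sketch of the owner-plus-counter idea described above (not production code; it uses synchronized/wait/notify only for the blocking part):

public class SimpleReentrantLock {
    private Thread owner = null;  // which thread currently holds the lock
    private int count = 0;        // how many times the owner has acquired it

    public synchronized void lock() throws InterruptedException {
        Thread current = Thread.currentThread();
        if (owner == current) {   // re-entry: just bump the counter
            count++;
            return;
        }
        while (owner != null) {   // held by another thread: wait
            wait();
        }
        owner = current;
        count = 1;
    }

    public synchronized void unlock() {
        if (owner != Thread.currentThread()) {
            throw new IllegalMonitorStateException();
        }
        if (--count == 0) {       // fully released: reset the owner and wake a waiter
            owner = null;
            notify();
        }
    }
}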

5. Spin locks

With a spin lock, when the current thread tries to acquire a lock that is already held by another thread, it does not block itself immediately. Instead, it attempts to acquire the lock multiple times (the default is 10 attempts) without giving up the CPU, on the chance that the other thread releases the lock within those few attempts. If the lock still has not been acquired after the specified number of attempts, the current thread is blocked and suspended.

Because Java threads correspond to operating system threads, when a thread fails to acquire a lock (such as an exclusive lock) it is switched to kernel mode and suspended, and when the lock later becomes available a switch to kernel mode is again needed to wake it up. Switching between user mode and kernel mode is expensive, which hurts concurrency performance to some extent.

Spin locks trade CPU time for the overhead of thread blocking and scheduling, but that CPU time may also be wasted if the lock is not released soon.
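A minimal sketch of a CAS-based spin lock (not production code): the thread spins on compareAndSet instead of blocking, which only pays off when the lock is held for very short periods:

import java.util.concurrent.atomic.AtomicReference;

public class SpinLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread current = Thread.currentThread();
        // busy-wait until the CAS from null to the current thread succeeds
        while (!owner.compareAndSet(null, current)) {
            Thread.onSpinWait();  // Java 9+: a spin-wait hint, typically the pause instruction on x86
        }
    }

    public void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }
}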

XI. Lock optimization

1. Biased locking

2. Lightweight locks