Hi, friends, long time no see. A lot has happened in the past two months. The outbreak of the epidemic has hit many families hard and pushed some to the brink of falling apart, and countless small and medium-sized enterprises are struggling to resume work and pay salaries. At this time of national crisis, we should trust our country, cooperate actively with the containment work, and hope for an early end to the epidemic so that life can return to normal. Go Wuhan, go China!
But we still have to keep learning. When you can't go out, reading some articles to deepen your knowledge is a good choice. Today I want to share the topic of concurrent programming in Java. I knew something about concurrent programming before, but it never felt systematic, so I re-learned it during this period, hoping to organize it into a coherent system and share what I have learned. This article is long and best read on a computer.
Here is an outline of the Java Concurrent programming series:
1. The source of concurrency problems
Concurrency has always been considered a relatively advanced topic in Java. Many Java programmers have learned to use the synchronized keyword to lock and synchronize code, but some may not know which tool in the concurrency toolkit fits which scenario, that is, exactly what problem each tool solves. Let's start with why concurrent programs produce all kinds of weird bugs.
With the development of technology, CPUs, memory, and I/O devices have all improved dramatically. CPUs have gone from single-core to multi-core: a CPU may be executing many tasks at once, and a single task may be run across multiple CPUs. The speed and capacity of memory and I/O devices have also grown rapidly. Yet no matter how things change, one core contradiction remains: the speed gap between the three, a gap as vast as the proverbial "one day in heaven, ten years on earth". Most programs need to access memory, and some need to access I/O, so overall performance is limited by how fast memory and I/O can be accessed, and the CPU may sit idle most of the time.
To balance the speed differences among the three and improve CPU utilization, optimizations have been made in computer architecture, operating systems, and compilers, mainly the following:
- CPU: adds caches. Putting recently used data into a cache is much faster than reading it directly from memory.
- Operating system: adds processes and threads, in order to time-multiplex the CPU and balance the speed difference between the CPU and I/O devices.
- Compiler: optimizes the order of instruction execution, so that the cached data mentioned above can be used more effectively.
But it’s these optimizations that make it so easy to get weird bugs when writing concurrent programs.
Visibility problem
As mentioned above, the CPU adds caches to even out the speed difference between CPU and memory. With a single core, all threads run on the same CPU, and this optimization causes no problem: there is only one cache, so when thread A writes to it and the CPU later runs thread B, thread B is guaranteed to see the result of that write.
Visibility means that changes made by one thread to a shared variable can immediately be seen by other threads.
But in the multi-core era, with multiple CPUs and multiple caches, a write performed by thread A in CPU A's cache is not necessarily visible to thread B running on CPU B, because the two threads operate on different caches. Here's an example.
public class Test {
  private long count = 0;
  private void add10K() {
    int idx = 0;
    while (idx++ < 10000) {
      count += 1;
    }
  }
  public static long calc() throws InterruptedException {
    final Test test = new Test();
    Thread th1 = new Thread(() -> test.add10K());
    Thread th2 = new Thread(() -> test.add10K());
    th1.start();   // start the two threads
    th2.start();
    th1.join();    // wait for both threads to finish
    th2.join();
    return test.count;
  }
}
Here two threads each increment count 10,000 times. On a single-core CPU the result is 20,000, but on a multi-core CPU the result may be some random number smaller than 20,000. Suppose thread A and thread B run at the same time: both read count = 0 into their own caches, both add 1, and both write the result back, so memory ends up holding 1 instead of the 2 we want. Because the reads and writes interleave in an indeterminate order, the final result is effectively random.
This is the visibility problem caused by multiple caches. Changes made by one thread cannot be seen by other threads in a timely manner.
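A classic way to see a pure visibility problem is a loop on a non-volatile flag. The following minimal sketch (class and variable names are my own) may loop forever without volatile, because the worker thread never sees the write made by the main thread:

```java
public class VisibilityDemo {
    private static boolean stop = false;   // try adding volatile and compare the behavior

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy loop: without volatile, the worker may keep reading a stale value
                // of stop and never terminate
            }
            System.out.println("worker stopped");
        });
        worker.start();
        Thread.sleep(1000);
        stop = true;   // this write may never become visible to the worker
        worker.join();
    }
}
```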
Atomicity problem
To improve CPU efficiency, operating systems invented processes and threads. The OS lets a thread run for a time slice and then picks another thread to run. This way, while one task is waiting on I/O, the OS can hand the time slice to another thread, improving CPU utilization.
This seems perfectly natural, but the catch is that a single statement in a modern high-level language usually requires multiple CPU instructions. The familiar count += 1, for example, takes three instructions:
- Load the variable count from memory into a CPU register;
- Perform the +1 operation in the register;
- Write the result back to memory (or the CPU cache).
A time slice switch can occur at the end of any CPU instruction, not just at statement boundaries, and that is exactly where the trap lies. For example, two threads can each perform a +1 operation on count and still leave 1 in memory instead of the expected 2.
We call it atomicity when one or more operations are executed by the CPU without being interrupted. The CPU can only guarantee atomicity at the level of individual instructions, not at the level of statements in a high-level language, so we have to do something extra to make such operations atomic.
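For instance, a minimal sketch (one option among several, assuming we simply want the counter from the earlier example to be correct) is to replace the plain long with an AtomicLong, which turns the read-modify-write into one atomic step:

```java
import java.util.concurrent.atomic.AtomicLong;

public class AtomicTest {
    private final AtomicLong count = new AtomicLong(0);

    private void add10K() {
        int idx = 0;
        while (idx++ < 10000) {
            // incrementAndGet performs the load, +1 and store as a single atomic operation
            count.incrementAndGet();
        }
    }
}
```

Wrapping count += 1 in a synchronized block would work just as well; the point is that atomicity has to be provided above the instruction level.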
Ordering problem
The compiler sometimes changes the order in which statements execute in order to optimize performance. For example, for the program "a = 6; b = 7;", the compiler may reorder it to "b = 7; a = 6;". Here the adjustment does not affect the final result, but sometimes reordering produces unexpected bugs.
A classic example in the Java world is the double-checked singleton pattern, as shown in the following code.
public class Singleton {
  static Singleton instance;
  static Singleton getInstance() {
    if (instance == null) {
      synchronized (Singleton.class) {
        if (instance == null) instance = new Singleton();
      }
    }
    return instance;
  }
}
At first glance this code looks fine, but after reading the sections above some of you may already have a hunch. Right: the problem is that the new operation is not atomic. It actually consists of three steps:
- Allocate a block of memory M;
- Initialize the Singleton object on memory M;
- Assign the address of M to the instance variable.
The compiler or processor may reorder these steps into 1 -> 3 -> 2. In that case, suppose thread A executes getInstance() and a thread switch happens right after step 3 (the assignment) but before step 2 (the initialization); if thread B then calls getInstance(), it finds instance != null and returns it directly, even though the object has not been initialized. Accessing a member of that instance may then throw a NullPointerException.
What's the solution? Declare instance as volatile, or use the static-inner-class (holder) version of the singleton.
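Here is a minimal sketch of both fixes. The volatile keyword forbids the 1 -> 3 -> 2 reordering described above; the holder version relies on the JVM's class-initialization guarantees (HolderSingleton is just an illustrative name):

```java
// Fix 1: double-checked locking with a volatile field
public class Singleton {
    private static volatile Singleton instance;
    static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) instance = new Singleton();
            }
        }
        return instance;
    }
}

// Fix 2: lazy initialization via a static inner holder class
class HolderSingleton {
    private HolderSingleton() {}
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }
    static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```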
Summary
Concurrent programs exhibit all sorts of strange problems that may at first glance look baffling, with no obvious place to start debugging. But once we deeply understand visibility, atomicity, and ordering, we know what can go wrong in which situations, and we know which problem each concurrency tool Java provides is meant to solve. With this theoretical foundation in place, we'll look next at how Java addresses these problems.
2. Basis of concurrency theory
We've looked at the common visibility, atomicity, and ordering problems in concurrent programs; now let's see what Java does to address them. I divide this into the Java Memory Model (JMM) and the fundamentals of concurrent programming (mainly thread-related knowledge). Both are important background that helps us understand concurrency and write better code.
Java Memory Model (JMM)
In concurrent programming, two key issues need to be addressed: how threads communicate with each other and how threads synchronize. Synchronization here refers to the mechanism used in a program to control the relative order in which operations occur between threads. There are two common thread communication mechanisms, shared memory and message passing. There are some differences between the two mechanisms in their treatment of two key issues:
1. Shared memory: in this model, threads share the program's common state and communicate implicitly by writing and reading that shared state in memory. The communication is implicit because the two threads never contact each other directly; each learns of the other's results only through the shared memory. Synchronization, however, is explicit: the programmer must explicitly mark which method or code block has to execute mutually exclusively between threads.
2. Message passing: in this model, threads share no common state and must send messages to each other, so communication is explicit. Because messages are sent before they are received, the relative order of the two threads is specified implicitly.
Java uses a shared memory model. Programmers need to understand how implicit communication works, or they can run into all sorts of weird memory visibility problems.
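To make "explicit synchronization, implicit communication" concrete, here is a tiny sketch (the class name is my own) of the shared-memory style Java uses:

```java
public class SharedCounter {
    private int value = 0;                  // shared state living in memory

    public synchronized void increment() {  // synchronization is explicit: we must mark it
        value++;                            // communication is implicit: another thread
    }                                       // simply observes the updated shared state later

    public synchronized int get() {
        return value;
    }
}
```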
JMM basics
Communication between Java threads is controlled by the JMM, which determines when a write by one thread to a shared variable becomes visible to another thread. The JMM also defines an abstract relationship between threads and main memory: shared variables live in main memory, and each thread has a private local memory that stores its own copies of the shared variables it reads and writes. Local memory is an abstraction covering caches, registers, and so on; it simply represents wherever the thread keeps its working data.
Note: shared variables are those stored in heap memory, such as instance fields, static fields, and array elements; only the heap is shared between threads. If this is unclear, review the JVM's heap and stack.
This is the abstract structure of the Java memory model: as mentioned earlier, shared data is stored in main memory, and each thread's private local memory holds copies of the shared variables it uses.
In section 1, we discussed how compilers and processors often reorder instructions to improve performance. There are three types of reordering:
- Compiler-optimized reordering: the compiler can rearrange the execution order of statements without changing the as-if-serial semantics (we'll explain as-if-serial shortly).
- Instruction-level parallel reordering: if there is no data dependency, the processor can change the execution order of the machine instructions corresponding to the statements.
- Memory-system reordering: because the processor uses caches and read/write buffers, loads and stores can appear to execute out of order.
The first type is compiler reordering; types 2 and 3 are processor reordering. To address the visibility and ordering problems reordering brings, the JMM forbids certain kinds of compiler reordering and restricts certain kinds of processor reordering by having the compiler insert memory barriers. We'll cover memory barriers later.
What are the conditions under which the compiler and processor will prohibit reordering? The main criterion here is whether there is data dependency between the two operations.
Data dependency: if two operations access the same variable and at least one of them is a write, then there is a data dependency between them. A simple test: if a write is involved, there is a data dependency. For example: read after write (a = 1; b = a), write after write (a = 1; a = 2), and write after read (a = b; b = 1). Reordering any of these three would change the result of execution, so the compiler and processor will not reorder two operations that have a data dependency.
As-if-serial semantics mean that no matter how the compiler and processor reorder instructions, the result of a single-threaded program must not change. Both the compiler and the processor must respect as-if-serial semantics; this is what guarantees our programs run stably and as expected. For example, suppose we have this program:
a = 1
b = 2
c = a * b
Here a and b can be reordered, but a and c, and b and c, cannot, because they have data dependencies.
As we mentioned above, caching causes visibility problems because the cache is not flushed to memory in time. But suppose it were timely: if two threads write to their caches at the same time and both caches then write to memory at the same time, will the data in memory conflict? The answer is no, and the reason lies in how the bus works.
In a computer, data is transferred between the processor and memory through the bus, and each data transfer between the processor and memory is done through a series of steps called bus transactions. Bus transactions include read and write transactions. Read transactions send data from memory to the processor, and write transactions send data from the processor to memory. Also, the bus synchronizes transactions that attempt to use the bus concurrently. This means that only one processor is executing a bus transaction at a time. The remaining processors need to wait for the previous processor to complete the transaction before performing the operation.
In this section we introduced the abstract structure of the JMM, saw how caches affect data synchronization, and covered the types of reordering and the basis for forbidding it: data dependency. Finally we introduced how the bus works, to give you a first picture of how the JMM operates. Understanding these mechanisms is a prerequisite for writing programs that run reliably.
Happens-before rules
Happens-before is the core concept of the JMM. As a Java programmer, if you understand the happens-before rules, you understand the key to the JMM, so it is worth the effort.
At the beginning of the JMM design, designers need to consider two important requirements:
- Programmers using the memory model want it to be easy to understand and easy to program against; they don't need to know every detail, but it must be stable and reliable.
- Compiler and processor implementers of the memory model want as few constraints as possible, so they can apply as many optimizations as possible.
The designers balanced these two needs by providing programmers with a set of happens-before rules; upper-level code can rely on the memory-visibility guarantees these rules provide. For the underlying compiler and processor, only reordering that would change the result of program execution is forbidden; beyond that, they may optimize however they like, for example by removing useless locking or unnecessary volatile handling.
The happens-before concept specifies the order of execution between two operations, which may be in the same thread or in different threads. It is defined as: if one action happens-before another, then the result of the first action is visible to the second. For example, suppose operation A writes variables a and b, and operation A happens-before operation B; then when operation B executes, it can see the values of a and b produced by operation A.
So let’s go back to our previous example.
a = 1      // 1
b = 2      // 2
c = a * b  // 3
For this example: operation 1 happens-before operation 2, operation 1 happens-before operation 3, and operation 2 happens-before operation 3.
Note, however, that the happens-before relation does not necessarily match the order in which the program actually executes, and this is exactly where the designers leave room for the compiler and processor. Of the three happens-before relationships above, data dependency makes the second and third mandatory, but the first is not required, so operations 1 and 2 may actually execute in reverse order. The compiler and processor optimize as much as they can within these limits.
There are six happens-before rules defined in Java:
- Program order rule: each action in a thread happens-before every subsequent action in that thread.
- Monitor lock rule: unlocking a lock happens-before every subsequent locking of that same lock.
- Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that volatile variable.
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
- start() rule: if thread A performs ThreadB.start(), then that start() call happens-before any action in ThreadB.
- join() rule: if thread A performs ThreadB.join() and it returns successfully, then any action in ThreadB happens-before thread A's return from ThreadB.join().
Because the compiler and processor must satisfy as-if-serial semantics, the program order rule is essentially a restatement of as-if-serial within a single thread. The start() and join() rules provide visibility guarantees across thread creation and completion, while the first four rules underpin the visibility guarantees of the tools we use every day.
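To make the start()/join() rules concrete, here is a minimal sketch (names are my own):

```java
public class StartJoinDemo {
    static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        shared = 1;                         // happens-before threadB.start()
        Thread threadB = new Thread(() -> {
            System.out.println(shared);     // guaranteed to print 1
            shared = 2;
        });
        threadB.start();
        threadB.join();                     // everything in threadB happens-before this return
        System.out.println(shared);         // guaranteed to print 2
    }
}
```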
Happens-before is very important and will be critical to our understanding of how locks and toolsets are implemented. Just keep in mind that we’re going to be using this a lot.
Volatile memory semantics
When most Java learners first encounter volatile, they remember it as a lightweight lock: a modifier for variables that guarantees that changes are visible across threads. When I first learned about volatile I also thought it was just a small tool Java provides, but seeing that the happens-before rules single it out made me realize it is not that simple.
We can think of a single read/write to a volatile variable as synchronizing these individual reads/writes using the same lock. So volatile variables have the following properties:
- Visibility: a read of a volatile variable always sees the last write to that variable by any thread.
- Atomicity: a single read or write of a volatile variable is atomic, but compound operations such as volatile++ are not.
Having covered the properties of volatile variables, let's use the JMM memory abstraction described above to see how the JMM handles volatile reads and writes. When a volatile variable is written, the JMM flushes the shared variables in that thread's local memory to main memory; when a volatile variable is read, the JMM forces the read to go to main memory. This effectively disables the CPU cache for that access. Note that on a volatile write the JMM flushes not just the volatile variable itself but all shared variables in local memory, a property that the utility classes in the java.util.concurrent package rely on. Here's a small example to illustrate.
public class VolatileExample {
  int x = 0;
  volatile boolean v = false;
  public void writer() {
    x = 42;
    v = true;
  }
  public void reader() {
    if (v == true) {
      System.out.println("x = " + x);
    }
  }
}
For this class, if one thread executes writer() and another thread then executes reader(), the reader will print x = 42, because the volatile write flushes the shared variables into main memory and the volatile read forces a read from main memory. Note that in writer() the volatile variable must be written last, and in reader() it must be read first.
We can summarize the memory semantics of volatile writes and volatile reads:
- Thread A writing a volatile variable is, in essence, thread A sending a message to any thread that will subsequently read that volatile variable.
- Thread B reading a volatile variable is, in essence, thread B receiving the message: the changes some thread made to shared variables before writing the volatile variable.
- Thread A writing a volatile variable and thread B then reading it is, in essence, thread A sending a message to thread B through main memory.
Does this make you more aware of the implicit communication features of the JMM shared memory model?
Memory barriers
Having covered the memory semantics of volatile, let’s look at how the JMM implements the memory semantics of volatile. We mentioned earlier that reordering is divided into compiler reordering and processor reordering, and the JMM restricts both types of reordering to achieve volatile memory semantics.
The reordering rules for volatile can be summarized as follows:
- When the second operation is a volatile write, it cannot be reordered no matter what the first operation is.
- When the first operation is a volatile read, it cannot be reordered no matter what the second operation is.
- When the first operation is a volatile write and the second is a volatile read, they cannot be reordered.
When generating bytecode, the compiler inserts memory barriers into the instruction sequence to prevent processor reordering. A memory barrier serves two purposes: it prevents instructions on either side of it from being reordered across it, and it forces cached data to be written to or refreshed from main memory. A Load barrier refreshes data from main memory into the cache, and a Store barrier writes cached data back to main memory.
The JMM takes a conservative approach and ensures that volatile semantics are correct in all cases.
- Insert a StoreStore barrier before each volatile write.
- Insert a StoreLoad barrier after each volatile write (mainly to prevent the volatile write from being reordered with a possible later volatile read).
- Insert a LoadLoad barrier after each volatile read.
- Insert a LoadStore barrier after each volatile read.
Here's what these barriers mean, using a volatile write as the example: a StoreStore barrier is inserted before the volatile write and a StoreLoad barrier after it, and the processor cannot reorder the corresponding kinds of accesses across them. You can think of a memory barrier as a railing: the code on either side must execute in the fixed order.
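The following comment-annotated sketch (my own illustration; the barriers are emitted by the JIT compiler, not written by hand, and the class name is hypothetical) shows roughly where they sit around the writer() method of the earlier VolatileExample:

```java
public class VolatileWriteBarriers {
    int x = 0;
    volatile boolean v = false;

    public void writer() {
        x = 42;       // ordinary write
        // StoreStore barrier inserted here: the write to x cannot sink below the volatile write
        v = true;     // volatile write
        // StoreLoad barrier inserted here: prevents reordering with a possible later volatile read
    }
}
```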
One question remains: when the second operation is a volatile write, the first operation cannot be reordered with it no matter what it is, yet we only insert a StoreStore barrier before the volatile write. Why not a LoadStore barrier as well? To be honest, I haven't fully figured this out; if you understand it, please point me in the right direction.
Lock memory semantics
Locking is the most important synchronization mechanism in Java. You may have started learning Java with synchronized. Let’s look at the memory semantics of locks.
When a thread releases a lock, the JMM flushes the shared variables in that thread's local memory to main memory. When another thread acquires the lock, the JMM invalidates that thread's local memory, so the critical-section code protected by the monitor must re-read shared variables from main memory. We can also derive this from the happens-before rules.
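The numbered operations referenced below come from a figure in the original post that isn't reproduced here; the following sketch (my reconstruction) shows the kind of code the numbers 1-6 refer to:

```java
class MonitorExample {
    int a = 0;

    public synchronized void writer() {  // 1: acquire the lock
        a++;                             // 2
    }                                    // 3: release the lock

    public synchronized void reader() {  // 4: acquire the lock
        int i = a;                       // 5
    }                                    // 6: release the lock
}
```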
By the program order rule, 1 happens-before 2 happens-before 3, and 4 happens-before 5 happens-before 6. By the monitor lock rule, 3 happens-before 4. By transitivity, 2 happens-before 5. In other words, any change the first thread makes inside the critical section is visible to the thread that subsequently acquires the lock.
Locking and releasing locks correspond to the same memory semantics as volatile reads and writes. The summary is as follows:
- When thread A releases a lock, it is essentially sending a message (its changes to shared variables) to the next thread that will acquire that lock.
- When thread B acquires the lock, it is essentially receiving the message that some earlier thread modified shared variables before releasing the lock.
- Thread A releasing the lock and thread B then acquiring it is essentially thread A sending a message to thread B through main memory.
Implementation of lock semantics
Let's use ReentrantLock to analyze how lock memory semantics are implemented. ReentrantLock relies on AbstractQueuedSynchronizer (AQS for short). AQS maintains synchronization state using a volatile integer field called state. ReentrantLock comes in fair and unfair variants:
- Fair lock: when the lock is released, the volatile state is written last; when the lock is acquired, state is read first. By the volatile happens-before rule, the changes the releasing thread made to shared variables before writing the volatile variable become visible as soon as another thread reads the same volatile variable.
- Unfair lock: released in exactly the same way as a fair lock, but acquired using the compareAndSet method.
This brings us to compareAndSet, often abbreviated CAS: it atomically sets the synchronization state to a given update value if the current value equals the expected value, so a CAS operation provides both atomicity and visibility. When executing CAS, the processor adds a lock prefix to the exchange instruction. The lock prefix has the following effects:
- Ensures that the read-modify-write operation on memory is executed atomically.
- Prevents the instruction from being reordered with reads and writes before and after it.
- Flushes all data in the write buffer to memory.
The latter two points give CAS the full memory semantics of volatile read and write.
Therefore, the memory semantics of locks can actually be implemented in two ways:
- Use the memory semantics of volatile writes and reads.
- Use the volatile write/read memory semantics that come with CAS.
Of course, you can also combine volatile and CAS freely. When we analyze the source code of the java.util.concurrent package, we find a generalized implementation pattern:
- Declare shared variables volatile.
- Use CAS's atomic conditional updates to synchronize between threads.
- Combine volatile reads/writes with CAS to communicate between threads.
Volatile and CAS are arguably the cornerstones of concurrent programming in Java. A small sketch of this pattern follows.
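As an illustration of the pattern (a minimal sketch, not how ReentrantLock itself is written), here is a trivial spinlock whose state lives in an AtomicInteger, that is, a volatile int updated via CAS:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SpinLock {
    // AtomicInteger keeps its value in a volatile field and updates it with CAS
    private final AtomicInteger state = new AtomicInteger(0);

    public void lock() {
        // CAS: atomically flip state from 0 (unlocked) to 1 (locked)
        while (!state.compareAndSet(0, 1)) {
            // busy-wait until the lock becomes free
        }
    }

    public void unlock() {
        // volatile write: everything done in the critical section becomes
        // visible to the next thread whose CAS on state succeeds
        state.set(0);
    }
}
```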
Threads and concurrency
With the memory model covered, let's move on to how threads work in Java concurrent programming. I'll focus here on the thread monitor and the wait/notification mechanism.
The monitor
Each object has its own monitor, and Object's wait(), notify(), and notifyAll() methods are all tied to it. When a synchronized block or method is entered on an object, the executing thread must first obtain that object's monitor before it can enter the synchronized region. You can think of the monitor as the door to a room: only the thread holding the lock may enter. Threads that fail to obtain the monitor block at the entrance and move into the BLOCKED state; we say these threads are blocked.
Wait/notification mechanism
As we all know, synchronized is an exclusive monitor lock: only one thread at a time can acquire the monitor and enter the critical section, while the other threads enter the synchronization queue and wait. When the owning thread releases the monitor (monitorexit), the waiting threads are notified and try again to acquire it.
With Object's wait and notify, the basic scenario is that thread B waits for thread A to release the monitor, after which thread B can acquire the lock.
In reality, though, things are not that simple: thread B may get the lock and still be unable to proceed because some condition isn't met. For example, thread B can only run when flag == true, but when it obtains the monitor, flag is still false. If thread B really wants to perform its work, it has two options:
- Keep holding the monitor and poll in a loop, checking at intervals whether flag has become true.
- Release the monitor, and when flag becomes true have another thread notify it, at which point it re-acquires the monitor.
Conceivably, the second is more efficient and timely. This is also what we call a wait/notification mechanism.
So when the condition is not met, the currently executing thread calls wait() on the condition variable (the lock object) and enters the wait queue; when the condition is met, another thread calls notify() or notifyAll() on the same object to move threads out of the wait queue. At this point some of you may be confused: how is this wait queue different from the blocking queue mentioned above, where threads wait because they failed to get the monitor?
They are different! The blocking (synchronization) queue holds threads that have not yet acquired the monitor, while the wait queue holds threads that did acquire the monitor but found the condition to continue unmet, and therefore went into the waiting state. To borrow the dating analogy: the blocking queue is lining up outside the girl's door; once you get in you chat with her, but afterwards you are dropped into the backup pool to wait until she picks you, at which point you line up again and the relationship continues.
Using wait(), notify(), and notifyAll(), there are some details to note (a usage sketch follows the list):
- These three methods can only be called after the calling thread has locked the object, i.e., inside a synchronized block or method on that object.
- After wait() is called, the thread's state changes from RUNNING to WAITING and the thread is placed on the object's wait queue.
- After notify() or notifyAll() is called, the waiting thread does not return from wait() immediately; it must first re-acquire the lock before it can return.
- notify() moves one waiting thread from the wait queue to the synchronization queue, while notifyAll() moves all waiting threads.
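Here is a minimal sketch of the flag example from above using the wait/notification mechanism (class and method names are my own):

```java
public class FlagGuard {
    private final Object lock = new Object();
    private boolean flag = false;

    // Called by the thread that must wait until flag becomes true
    public void awaitFlag() throws InterruptedException {
        synchronized (lock) {            // must hold the lock before calling wait()
            while (!flag) {              // always re-check the condition in a loop
                lock.wait();             // releases the monitor and enters the wait queue
            }
            // condition satisfied: safe to proceed while still holding the lock
        }
    }

    // Called by the thread that makes the condition true
    public void setFlag() {
        synchronized (lock) {
            flag = true;
            lock.notifyAll();            // moves waiting threads to the synchronization queue
        }
    }
}
```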
3. To summarize
This article covered the Java memory model and thread synchronization mechanisms. For the Java memory model, the focus is the happens-before rules, volatile, and locks; once those are understood, the implementation pattern used throughout the java.util.concurrent package makes sense. Understanding the wait/notification model for threads helps us better understand how locks are used.
That’s all for this article. In the future, I may learn the design and implementation of Java concurrent tools. If there is something worth sharing, I will write another article to share it. Please look forward to it.
References
The Art of Java Concurrent Programming; Java Concurrency in Practice; Wang Baoling
I am "Android Stupid Bird's Journey". Even a stupid bird wants to take flight, and I'm here to grow stronger with you, slowly but surely. Looking forward to your follow.