Java concurrent programming: learning and sharing goals

  • Understand how to use the common tools and classes of Java concurrent programming
  • Understand the implementation principles and design ideas behind Java's concurrency tools
  • Learn the common problems encountered in concurrent programming and their solutions
  • Know how to choose the most appropriate tool for the actual situation to arrive at an efficient design

Learning and sharing team: Peiyou – Technology Center – Platform R&D Department – Operation R&D Team

Java Concurrent Programming sharing team: @Shen Jian @Cao Weiwei @Zhang Junyong @Tian Xinwen @Zhang Chen

This chapter shared by: @Shen Jian

This chapter covers:

  • The volatile keyword
  • The synchronized keyword

The Volatile Chapter

**Goals:** Understand the relationship between Java concurrency tools, the JMM, MESI, and the underlying hardware

Using volatile

Modifying a variable:

private volatile long value;

What volatile does

  1. Makes writes to a variable globally visible
  2. Prevents problems caused by instruction reordering

Some of you might ask: why ensure global visibility at all? What goes wrong without it? An example will explain. The following simulates a track and field race: the main thread is the starter, and the 10 child threads are 10 athletes. The race begins once the starter gives the command.

    package com.company;

    public class Main {
        boolean start = false;

        public static void main(String[] args) {
            new Main().test();
        }

        public void test() {
            Thread[] threads = new Thread[10];
            for (int i = 0; i < 10; i++) {
                threads[i] = new Thread(() -> {
                    int wait = 0;
                    while (!start) {
                        wait++;
                    }
                    System.out.println(Thread.currentThread().getName() + " run after second: " + wait);
                });
                threads[i].start();
            }
            start = true;
            try {
                Thread.sleep(20000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

My local run results are as follows:



As you can see, only thread 9 sensed the start signal and triggered its response.

Why is that? The reason is that the main thread's change to the flag is not necessarily visible to the child threads: because of the various levels of caching, each Java thread effectively works on its own copy of the data. The volatile keyword fixes exactly this. After adding the volatile keyword, all ten threads sense the start and print their wait counts.
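For reference, the only change needed is in the field declaration (a fragment; everything else in the class stays the same as the example above):

    public class Main {
        // volatile makes the main thread's write to start visible to every runner thread
        volatile boolean start = false;

        // ... the rest is unchanged
    }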

The details of the Java memory model will be covered later

An unsolved mystery: why does the second-to-last thread in the group always finish fastest?

Ok, now that visibility is explained, why should you be wary of instruction reordering? Without any guarantee against it, things can go wrong. Here's an example:

    package com.company;

    public class RearrangeTest {
        public static void main(String[] args) {
            for (int i = 0; i < 10000; i++) {
                new RearrangeTest().test();
            }
        }

        public void test() {
            ReorderExample example = new ReorderExample();
            Thread write = new Thread(() -> example.writer());
            Thread read = new Thread(() -> example.reader());
            write.start();
            read.start();
        }

        class ReorderExample {
            int a = 0;
            boolean flag = false;

            public void writer() {
                a = 1;             // 1
                flag = true;       // 2
            }

            public void reader() {
                if (flag) {        // 3
                    int i = a * a; // 4
                    if (i != 1) {
                        System.out.println(i);
                    }
                }
            }
        }
    }

In this example the printed value is indeterminate: because statements 1 and 2 have no data dependency, the compiler or CPU may reorder them, so the reader thread can observe flag == true while a is still 0. Declaring flag as volatile forbids this reordering.

So where does the visibility problem come from? That is the subject of the next section on the JMM and MESI.

The JMM and MESI

The previous section introduced two problems: instruction reordering and variable visibility. This section analyzes where these two problems come from.

The first question is why there is a visibility problem at all. This is where the memory model comes in. JMM is short for Java Memory Model, the specification that controls when a write in one thread becomes visible to another thread. In the JMM, variables shared between threads are placed in main memory, and each thread keeps a copy in its own private working memory. A change a thread makes to a shared variable initially takes effect only in that private memory and is invisible to other threads. That is where the visibility problem comes from, as shown below:



The JMM also provides several mechanisms to guarantee the visibility of variables across threads, which we will discuss in detail in the section on how volatile works.

How is this JMM scheme actually realized? Does the JVM really copy main memory for each thread? If we start dozens of threads, does it make dozens of copies? That would be a huge waste of memory. In fact the JMM is a virtual concept, not a physical structure. The sole purpose of the JMM abstraction is to give developers a guarantee that, under well-defined circumstances, changes to variables will be visible to other threads. The guarantee is realized through the cooperation of processor caches, store buffers, registers, the compiler, and other hardware; in effect it is an encapsulation of the MESI model.

Let's take a look at the lowest layer, the MESI protocol.

The cache coherence protocol tracks each cache line (usually 64 bytes) in one of four states, Exclusive, Shared, Modified, and Invalid, which describe whether the line is shared among processors and whether it has been modified. Hence it is also called the MESI protocol.

  • Exclusive (E): only the current processor holds the cache line; it has not been modified and matches the latest value in main memory.
  • Shared (S): more than one processor holds the cache line; none has modified it, and all copies match the latest value.
  • Modified (M): only the current processor holds the cache line, and it has been modified; within some period it will be written back to main memory, and once the write-back succeeds the state changes to S.
  • Invalid (I): the cache line has been modified by another processor; the local copy is stale and must be re-read from main memory.

The protocol's cooperation rules are as follows:

  • A cache line in the M state must snoop all attempts to read the line's main memory address; before such a read can proceed, it must write the line back to main memory.
  • A cache line in the S state must snoop requests to invalidate or take exclusive ownership of the line; when it sees one, it must set its own state to I.
  • A cache line in the E state must snoop attempts by other processors to read the line's main memory address; when it sees one, it must downgrade its state to S.
  • When a CPU needs to read data: if the line is missing from its cache or is in the I state, it must read from main memory and set the line to S; otherwise it can read directly from its cache, but it must first wait for the snoop results of the other CPUs, and if another CPU holds the line in the M state, that CPU must first write the line back to memory.
  • When a CPU needs to write data, it may do so only if its cache line is in the M or E state. Otherwise it must issue a special RFO request (Read For Ownership, a bus transaction) telling the other CPUs to invalidate their copies (set them to I); this has a relatively high performance cost. After the write completes, the line's state becomes M.

A typical sequence ties these rules together: CPU A reads a line for the first time and caches it in the E state; CPU B reads the same line and both copies move to S; CPU A then writes, issuing an RFO that invalidates B's copy (I) while its own becomes M; when B reads again, A writes the line back and both copies return to S.

So that's MESI. In general, Java concurrency relies on the JMM, the JMM relies on MESI, and MESI relies on the hardware.

In a scenario where JMM semantics require a variable to be visible across threads, the Java compiler inserts into the generated code an instruction that operates on the CPU cache, borrowing MESI to implement the semantics. The next section on how volatile works describes this process.

How volatile works

    public class VolatileCompileTest {
        volatile int v1 = 0;
        int a = 0;

        public void write() {
            a = v1;
            v1 = v1 + 1;
        }

        public int read() {
            v1 = 0;
            return a;
        }

        public static void main(String[] args) {
            VolatileCompileTest ins = new VolatileCompileTest();
            for (int i = 0; i < 1000 * 1000; i++) {
                ins.write();
            }
            System.out.println(ins.v1);
        }
    }

What happens when the volatile keyword is used in the code shown above? We can dump the JIT-compiled assembly to take a look.
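If you want to reproduce this locally, HotSpot can print the JIT-compiled assembly. This assumes the hsdis disassembler plugin is installed on the JVM's library path, and flag availability can vary by JDK build:

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly VolatileCompileTest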

Below is a comparison of the assembly code with and without volatile:

The rsi register holds the address of this, and 0xc(%rsi) is v1. As you can see, after the volatile v1 is computed and stored, a lock addl instruction is appended.

This lock-prefixed instruction makes the processor write the cache line holding v1 back to system memory and mark it M; the other processors snoop this write, mark the corresponding line in their own caches I, and re-read from main memory on their next access.

So the JMM's cure for the visibility problem is to have the compiler insert a lock-prefixed instruction at the right points in the generated code.

For reordering problems, the JMM abstractly gives Java developers two promises: the happens-before rules and the as-if-serial rule.

Happens-before rules

The happens-before rule is defined as follows:

If operation A happens-before operation B, then A is ordered before B and the result of A is visible to B.

A happens-before relationship between two operations does not mean the program must literally execute in that order; as long as the execution result stays the same, the JMM still permits such reordering.

Which actions establish a happens-before relationship? The Java memory model defined in JSR-133 specifies, among others, the following happens-before rules:

  • Program order rule: each action in a thread happens-before every subsequent action in that same thread.
  • Monitor lock rule: unlocking a monitor happens-before every subsequent lock of that same monitor.
  • Volatile rule: a write to a volatile field happens-before every subsequent read of that field.
  • Thread start rule: a call to Thread.start() happens-before every action in the started thread.
  • Thread join rule: every action in a thread happens-before another thread successfully returns from join() on it.
  • Transitivity: if A happens-before B and B happens-before C, then A happens-before C.

As-if-serial rule

The as-if-serial rule says that the instructions the compiler and the JIT emit for single-threaded code may differ from the code the programmer wrote, but the execution result must be identical to running the written code serially.

To implement the promises above, the Java compiler inserts the following memory barriers around volatile accesses in the generated instructions:

  1. A StoreStore barrier is inserted before each volatile write, preventing earlier plain writes from being reordered below the volatile write (otherwise another thread could observe the two writes in the opposite order)
  2. A StoreLoad barrier is inserted after each volatile write, preventing the volatile write from being reordered with subsequent reads
  3. A LoadLoad barrier is inserted after each volatile read, preventing subsequent plain reads from being reordered above the volatile read
  4. A LoadStore barrier is inserted after each volatile read, preventing subsequent plain writes from being reordered above the volatile read

These barriers are also conceptual; how the compiler actually suppresses reordering in the generated code varies by platform, but the outcome is that volatile semantics hold for any program on any platform.

Here is an example:
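Since the barriers are emitted by the compiler rather than written by hand, here is a conceptual sketch with the barrier positions shown as comments (the class is illustrative, not from the original post):

    class BarrierSketch {
        int a = 0;
        volatile int v = 0;

        void write() {
            a = 1;      // plain store
            // StoreStore barrier: keeps the store to a from sinking below the volatile store
            v = 2;      // volatile store
            // StoreLoad barrier: keeps the volatile store from being reordered with later loads
        }

        void read() {
            int r1 = v; // volatile load
            // LoadLoad barrier: keeps later plain loads from floating above the volatile load
            // LoadStore barrier: keeps later plain stores from floating above the volatile load
            int r2 = a; // if r1 == 2, this is guaranteed to see a == 1
        }
    }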

A lesson from the design of volatile: get things done at the lowest cost.

Since we already have locks, why do we need volatile at all? Because volatile needs no lock, and in some scenarios it is more efficient than locking. In design work every problem has several feasible solutions, and most of the time we should choose the one with the lowest cost among them. That judgment is one of the most important skills on the way to becoming an expert developer. For comparison, here is the earlier race example rewritten with synchronized accessors instead of volatile:

    public class VisibilityTest1 {
        boolean start = false;

        public synchronized boolean isStart() {
            return start;
        }

        public synchronized void setStart(boolean start) {
            this.start = start;
        }

        public static void main(String[] args) {
            new VisibilityTest1().test();
        }

        public void test() {
            Thread[] threads = new Thread[100];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(() -> {
                    int wait = 0;
                    while (!isStart()) {
                        wait++;
                    }
                    System.out.println(Thread.currentThread().getName() + " run after second: " + wait);
                });
                threads[i].start();
            }
            setStart(true);
            try {
                Thread.sleep(20000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

Another lesson: understand the underlying principles in order to write for ultimate performance.

Concurrency guru Doug Lea wrote LinkedTransferQueue for JDK 1.7 and, when using volatile variables, padded the class with a number of seemingly useless fields to improve efficiency:

Doug Lea did this because mainstream processor cache lines were 64 bytes at the time. Padding a node out to 64 bytes means each node occupies exactly one cache line, which prevents false sharing: when two hot variables share a cache line, reads and writes from different processors keep invalidating each other's copy of that line.
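Here is a sketch of the padding idea; the real LinkedTransferQueue code differs, and the class below is purely illustrative. The seven extra longs fill out the rest of a 64-byte cache line so the hot variable never shares a line with another hot variable:

    class PaddedVolatileLong {
        volatile long value;             // the hot variable
        long p1, p2, p3, p4, p5, p6, p7; // padding: never read, only fills the cache line
    }

Since JDK 8, a similar effect can be requested with the @Contended annotation instead of hand-written padding (it requires the -XX:-RestrictContended flag outside JDK internals).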

Of course, most everyday programming doesn't need to go that far. But it's worth noticing that much of the industry's most popular software, Kafka's zero-copy technique and ClickHouse's vectorized execution among them, earned its standing precisely through this pursuit of extreme performance.

The Synchronized Chapter

Synchronized should be familiar to everyone. It is the old workhorse of the Java world, in service since the language's first release and still evolving today.

**Chapter goals:** Know how to choose the appropriate lock for the actual situation, and how to design an adaptive lock

Why Synchronized

Volatile solves visibility and instruction reordering, but it does not solve atomicity during program execution. For example:

    private volatile long value;

    value = 1000; // thread-safe
    value++;      // not thread-safe

This is because value++, once compiled to bytecode, turns out to be several separate operations (read the field, add one, write the field back), and a thread switch can occur between them, corrupting the result.
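A minimal sketch demonstrating the lost updates (the class name is illustrative): two threads each increment a volatile long 100,000 times, yet the printed total usually comes out below 200,000, because the separate read, add, and write-back steps interleave:

    public class LostUpdateDemo {
        static volatile long value = 0;

        public static void main(String[] args) throws InterruptedException {
            Runnable task = () -> {
                for (int i = 0; i < 100_000; i++) {
                    value++; // compiles to separate read, add, and write-back operations
                }
            };
            Thread t1 = new Thread(task);
            Thread t2 = new Thread(task);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            System.out.println(value); // usually less than 200000
        }
    }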

Using synchronized

Can you name five ways to use synchronized? Two are shown below, and three more are sketched after them.

  • Modifying an instance method

    public synchronized void test1() { start++; }

  • Modifying a code block

    public void test2() { synchronized (this) { start++; } }
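The remaining common forms, sketched here for completeness (the class and method names are illustrative): a static method, a block on the Class object, and a block on a dedicated lock object.

    public class SyncForms {
        static long counter = 0;
        private final Object lock = new Object();

        // static synchronized method: the lock is the SyncForms.class object
        public static synchronized void test3() { counter++; }

        // block on the Class object: takes the same lock as test3
        public static void test4() { synchronized (SyncForms.class) { counter++; } }

        // block on a dedicated lock object
        public void test5() { synchronized (lock) { counter++; } }
    }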

How synchronized works

Before discussing how the synchronized keyword works, let's look at what changes when it is added:

public synchronized void test1();
    descriptor: ()V
    flags: (0x0021) ACC_PUBLIC, ACC_SYNCHRONIZED
    Code:
      stack=5, locals=1, args_size=1
         0: aload_0
         1: dup
         2: getfield      #2                  // Field start:J
         5: lconst_1
         6: ladd
         7: putfield      #2                  // Field start:J
        10: return
      LineNumberTable:
        line 6: 0
        line 7: 10
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      11     0  this   Lcom/company/SyncTest1;
 
  public void test2();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=5, locals=3, args_size=1
         0: aload_0
         1: dup
         2: astore_1
         3: monitorenter
         4: aload_0
         5: dup
         6: getfield      #2                  // Field start:J
         9: lconst_1
        10: ladd
        11: putfield      #2                  // Field start:J
        14: aload_1
        15: monitorexit
        16: goto          24
        19: astore_2
        20: aload_1
        21: monitorexit
        22: aload_2
        23: athrow
        24: return
      Exception table:
         from    to  target type
             4    16    19   any
            19    22    19   any

As you can see, a method marked with the synchronized keyword carries the ACC_SYNCHRONIZED flag in the compiled bytecode, while a synchronized code block is compiled into a pair of monitorenter and monitorexit instructions around it.

For a synchronized block the JVM executes monitorenter and monitorexit explicitly; for an ACC_SYNCHRONIZED method the JVM performs the equivalent monitor acquire and release implicitly when the method is entered and exited.

Looking at the monitorenter and monitorexit instructions, each consumes one operand, an object reference, and produces nothing. Which object is passed?

  • When synchronized modifies an instance method, the lock object is the instance (this)

  • When it modifies a static method, the lock object is the Class object

  • When it modifies a code block, the lock object is the object named in the block. In the two examples below, the write and read methods are therefore mutually exclusive in both classes; in the second class this works because Integer.valueOf(1) always returns the same cached Integer instance:

    public class SynchronizedExample1 {
        int a = 0;

        public synchronized void write(int i) {
            a = i;
        }

        public synchronized int read() {
            return a;
        }
    }

    public class SynchronizedExample2 {
        int a = 0;

        public void write(int i) {
            synchronized (Integer.valueOf(1)) {
                a = i;
            }
        }

        public int read() {
            synchronized (Integer.valueOf(1)) {
                return a;
            }
        }
    }

Each Java object is associated with a unique monitor object, as shown in the following figure:



_owner points to the thread that currently holds the lock; ownership is acquired with CAS

_waitSet is the set of threads that have called Object.wait() and are waiting to be notified

_EntryList is the set of threads blocked waiting to acquire the lock
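A sketch relating these fields to everyday code (the class is illustrative): a thread blocked at the synchronized keyword sits in _EntryList, while a thread that has called wait() inside the lock sits in _waitSet until another thread calls notify():

    public class MonitorDemo {
        private final Object lock = new Object();

        public void waiter() throws InterruptedException {
            synchronized (lock) {   // while blocked here, the thread is in _EntryList
                lock.wait();        // releases the monitor and moves the thread to _waitSet
            }
        }

        public void notifier() {
            synchronized (lock) {
                lock.notify();      // moves one thread from _waitSet back to contend for the lock
            }
        }
    }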

Why do monitorenter and monitorexit take an object as their operand? Because synchronized stores its lock state in that object's header.

Here is the composition of the Java object header, taking a 64-bit machine as the example.

The first 8 bytes of a SynchronizedExample1 object on the heap are the Mark Word, whose structure is as follows:



As you can see, the contents of the Mark Word differ depending on the lock state. What every state has in common is a two-bit lock flag marking the kind of lock. In other words, once the monitor inspects those flag bits (bits 63 and 64 in the figure), it knows which lock the object currently carries, can deduce how the rest of the Mark Word is laid out, and can act accordingly. For example, if the flag says biased lock, it reads bits 1 to 54 to find the ID of the thread the lock is biased toward, and then decides whether to release the lock directly or upgrade it to a lightweight lock.



The table above maps lock kinds to flag bits. Notice that the unlocked state and the biased lock share the same flag value, 01; the two are distinguished by the separate biased_lock bit shown in the previous figure.
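To observe the Mark Word in practice, one option is the OpenJDK JOL tool; this sketch assumes the org.openjdk.jol:jol-core dependency is on the classpath:

    import org.openjdk.jol.info.ClassLayout;

    public class HeaderDemo {
        public static void main(String[] args) {
            Object o = new Object();
            // prints the object layout, including the mark word bits
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
            synchronized (o) {
                // while locked, the header shows the lock bits changed
                System.out.println(ClassLayout.parseInstance(o).toPrintable());
            }
        }
    }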

In the next section, we'll look at how a synchronized lock gets upgraded.

Synchronized Upgrade Process

A lesson from the design of synchronized: don't lock if you don't have to. Why does synchronized start with a biased lock? Because a biased lock is practically equivalent to no lock: if the critical section only ever runs on a single thread, a biased lock delivers nearly lock-free performance.

Look at the following example and consider what is wrong with the program and how it could be optimized (one possible optimization is sketched after the code):

    import java.util.Random;

    public class SynchronizedExample3 {
        static int a = 0;
        private static Object lock = new Object();

        public static void main(String[] args) {
            Thread[] threads = new Thread[10];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(() -> {
                    synchronized (lock) {
                        a += new Random().nextInt(10);
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            System.out.println("result is: " + a);
        }
    }
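One possible optimization, sketched below: every thread serializes on a single lock just to add a random number, so the synchronized block can be replaced with a lock-free AtomicInteger, and ThreadLocalRandom avoids constructing a new Random on every call (the class name is illustrative):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicInteger;

    public class SynchronizedExample3Optimized {
        static final AtomicInteger a = new AtomicInteger();

        public static void main(String[] args) throws InterruptedException {
            Thread[] threads = new Thread[10];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(() ->
                        a.addAndGet(ThreadLocalRandom.current().nextInt(10)));
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
            System.out.println("result is: " + a.get());
        }
    }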

Optimistic locks for low contention, pessimistic locks for high contention.

Studying the synchronized lock upgrade process shows that synchronized combines several kinds of locks to stay efficient across different workloads. When critical-section data starts being accessed by several threads at once, synchronized first upgrades to a lightweight lock: if contention over the data is mild and the lock can be acquired within a bounded number of CAS attempts, the cost of a mutex, which is usually high, is avoided entirely. If contention is fierce, however, and the optimistic CAS fails again and again, the optimistic lock itself becomes very expensive, because the spinning threads burn CPU doing nothing. At that point synchronized upgrades the lock to a heavyweight, pessimistic lock.

The takeaway is that different locks suit different conditions: when contention is light and the lock is acquired quickly, an optimistic lock is better; when contention is heavy and acquiring the lock takes a long time, a pessimistic lock is the right choice.

General-purpose tools pursue universality, so they do a great deal of adaptive work; to pursue the ultimate, we must optimize general components for our actual application. As the upgrade process shows, synchronized, as a general component, does a lot of extra work to perform well across scenarios. But if we already know our scenario, that extra work consumes performance instead of improving it. For example, in a heavily contended scenario, or one with long-running tasks, the lightweight lock's CAS spinning should be avoided and a heavyweight lock used directly. The JVM exposes startup flags to disable these adaptive modes (for instance, biased locking can be turned off with -XX:-UseBiasedLocking).

Practice after class

Use CAS to implement a lock similar to Synchronized.

  • With a single thread, efficiency close to having no lock at all
  • Under light contention, efficiency better than a plain mutex
  • Under heavy contention, efficiency better than pure optimistic spinning
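A minimal sketch of one possible approach (not a complete solution: reentrancy, an owner check in unlock, and fairness are all omitted). It spins with CAS for a bounded number of attempts, then falls back to parking the thread, approximating the optimistic-then-pessimistic behavior the exercise asks for:

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.locks.LockSupport;

    public class AdaptiveSpinLock {
        private final AtomicReference<Thread> owner = new AtomicReference<>();
        private final ConcurrentLinkedQueue<Thread> waiters = new ConcurrentLinkedQueue<>();
        private static final int SPIN_LIMIT = 1000; // arbitrary tuning knob

        public void lock() {
            Thread current = Thread.currentThread();
            int spins = 0;
            // optimistic phase: spin on CAS, cheap while contention is low
            while (!owner.compareAndSet(null, current)) {
                if (++spins > SPIN_LIMIT) {
                    // pessimistic phase: stop burning CPU and park until woken
                    waiters.add(current);
                    while (!owner.compareAndSet(null, current)) {
                        LockSupport.park();
                    }
                    waiters.remove(current);
                    return;
                }
            }
        }

        public void unlock() {
            owner.set(null);              // release the lock
            Thread next = waiters.peek(); // wake one parked waiter, if any
            if (next != null) {
                LockSupport.unpark(next);
            }
        }
    }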

Resources & extended reading: The Art of Concurrent Programming in Java
