Preface
Take a look at the test code below and come up with your own answer, without the help of any external tools.
import java.util.*;
import java.util.concurrent.CountDownLatch;
import org.junit.Test;

public class Reordering {
    static int a = 0;
    static int b = 0;
    static int x = 0;
    static int y = 0;
    static final Set<Map<Integer, Integer>> ans = new HashSet<>(4);

    public void help() throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(2);
        Thread threadOne = new Thread(() -> {
            a = 1;
            x = b;
            latch.countDown();
        });
        Thread threadTwo = new Thread(() -> {
            b = 1;
            y = a;
            latch.countDown();
        });
        threadOne.start();
        threadTwo.start();
        latch.await();
        Map<Integer, Integer> map = new HashMap<>();
        map.put(x, y);
        if (!ans.contains(map)) {
            ans.add(map);
        }
    }

    @Test
    public void testReordering() throws InterruptedException {
        // stop early once all four (x, y) combinations have been observed
        for (int i = 0; i < 20000 && ans.size() != 4; i++) {
            help();
            a = x = b = y = 0;  // reset shared state for the next round
        }
        System.out.println(ans);
    }
}
Your ans will most likely contain [{0=1}, {1=1}, {1=0}]. Thread scheduling is random: one thread may run to completion before the other gets the CPU, or the two may interleave, and those schedules produce exactly these three results. I will not walk through the interleaving behind each one here; that is not the point. The interesting case is the fourth result, {0=0}. How can x and y both end up 0? One possible schedule:
- threadOne executes x = b => x = 0
- threadTwo executes b = 1, y = a => y = 0
- threadOne executes a = 1
(or the same with the roles of threadOne and threadTwo reversed). You may be wondering how x = b can execute before a = 1. This is exactly instruction reordering.
Instruction reordering
Most modern microprocessors execute instructions out of order: when conditions permit, they run later instructions that are already able to execute instead of stalling while the data for the next instruction is fetched. Out-of-order execution lets the processor greatly improve execution efficiency. Besides the CPU reordering instructions to optimize performance, the Java JIT compiler also reorders instructions.
When instruction reordering is not allowed
So when is instruction reordering disallowed, and how? Without such limits, all hell would break loose.
Data dependency
First, instructions with data dependencies do not undergo instruction reordering! What does that mean?
a = 1;
x = a;
Just like the above two instructions, x depends on a, so x = a will not be reordered before a = 1.
There are three types of data dependency:
- Write then read, as in the example above: a = 1 followed by x = a. This is the classic case, and the pair will not be reordered.
- Write then write, e.g. a = 1 followed by a = 2. Again, no reordering happens.
- Read then write, e.g. x = a followed by a = 1.
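A tiny single-threaded sketch of the three dependency kinds (class and variable names here are illustrative, not from the original example):

```java
public class DataDependence {
    public static void main(String[] args) {
        int a = 0;
        // write then read: x must see a's new value, so this pair cannot be reordered
        a = 1;
        int x = a;   // x == 1
        // write then write: the later write must win
        a = 1;
        a = 2;       // a == 2
        // read then write: the read must see the old value before it is overwritten
        int b = 3;
        int r = b;   // r == 3
        b = 4;       // the write comes after the read
        System.out.println(x + "," + a + "," + r + "," + b);
    }
}
```

Any compiler or CPU reordering must preserve these three orderings, which is why the program always prints 1,2,3,4.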
The as-if-serial semantics
What is as-if-serial? The as-if-serial semantics mean that no matter how much the compiler and processor reorder instructions to improve parallelism, the execution result of a single-threaded program must not change. So the compiler and CPU follow the as-if-serial semantics when reordering instructions. Here's an example:
x = 1;       // 1
y = 1;       // 2
ans = x + y; // 3
Of these three instructions, instruction 1 and instruction 2 have no data dependency, while instruction 3 depends on both. Since reordering must not break data dependencies, we can be sure instruction 3 will never be moved before instructions 1 and 2. Now let's compile the code and look at the resulting bytecode:
public int add() {
    int x = 1;
    int y = 1;
    int ans = x + y;
    return ans;
}
The corresponding bytecode:
public int add();
  Code:
    0: iconst_1  // push int constant 1 onto the operand stack
    1: istore_1  // pop the top of the stack into local variable 1 (slot 0 holds this in an instance method)
    2: iconst_1  // push int constant 1
    3: istore_2  // pop into local variable 2
    4: iload_1   // push the value of local variable 1
    5: iload_2   // push the value of local variable 2
    6: iadd      // pop the top two ints and push their sum
    7: istore_3  // pop the sum into local variable 3
    8: iload_3   // push local variable 3
    9: ireturn   // return the int on top of the stack
We only care about instructions 0 through 7 of this bytecode. Those eight instructions break down into five operations:
- Write x
- Write y
- Read x
- Read y
- Add, and write the result back to ans
Of these five operations, operation 1 may be reordered with 2 and 4; operation 2 may be reordered with 1 and 3; operation 3 may be reordered with 2 and 4; and operation 4 may be reordered with 1 and 3. In particular, the assignments to x and y may be swapped. This is not hard to understand, because writing x and writing y have no data dependency on each other. But operations 1, 3 and 5 cannot be reordered among themselves, because 3 depends on 1 and 5 depends on 3; for the same reason operations 2, 4 and 5 cannot be reordered either.
Therefore, to ensure that data dependencies are not broken, reordering follows the as-if-serial semantics.
@Test
public void testReordering2() {
    int x = 1;
    int y;
    try {
        x = 2;      // A
        y = 2 / 0;  // B: always throws ArithmeticException
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        System.out.println(x);
    }
}
Code A and B may be reordered, because x and y have no data dependency and carry no special semantics. To preserve as-if-serial semantics, the Java exception handling mechanism does something special for this reordering: when the JIT moves A after B, it inserts compensation code into the catch path so that x still holds 2 when the exception is handled. This can complicate the logic around the catch statement, but the JIT's guiding principle is to optimize the normal, non-exceptional path of the program as much as possible, even at the cost of more complicated catch-block logic.
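A runnable sketch of this guarantee (the class name is mine): whatever reordering the JIT performs internally, the observable result must match program order, so this always prints 2.

```java
public class AsIfSerialCatch {
    public static void main(String[] args) {
        int x = 1;
        try {
            x = 2;          // A
            int y = 2 / 0;  // B: always throws ArithmeticException
        } catch (ArithmeticException e) {
            // even if the JIT moved A after B, compensation code must make x == 2 here
        } finally {
            System.out.println(x);
        }
    }
}
```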
The happens-before principle
- If A happens-before B
- and B happens-before C
- then A happens-before C
This is called happens-before transitivity.
Reordering and the JMM
The Java Memory Model (JMM) defines the following eight happens-before rules. If operation A happens-before operation B, then A is not reordered after B and A's result is visible to B in memory.
- Program order rule: every action A in a thread happens-before every action B that comes after A in that thread's program order.
- Monitor lock rule: unlocking a monitor lock happens-before each subsequent lock of that same monitor.
- Volatile variable rule: a write to a volatile field happens-before each subsequent read of that same field.
- Thread start rule: a call to Thread.start() happens-before every action in the started thread.
- Thread termination rule: every action in a thread happens-before any other thread detects that the thread has terminated, whether by successfully returning from Thread.join() or by Thread.isAlive() returning false.
- Interrupt rule: a thread calling interrupt() on another thread happens-before the interrupted thread detects the interrupt.
- Finalizer rule: the end of an object's constructor happens-before the start of that object's finalizer.
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
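Two of these rules already make the following sketch deterministic without any volatile or synchronized: the start rule makes the write before start() visible to the new thread, and the termination rule makes the thread's writes visible after join() returns. (The class name is mine.)

```java
public class HappensBeforeDemo {
    static int data = 0;  // deliberately not volatile

    public static void main(String[] args) throws InterruptedException {
        data = 1;                                        // happens-before t.start() (start rule)
        Thread t = new Thread(() -> data = data + 41);   // reads 1, writes 42
        t.start();
        t.join();                                        // t's writes happen-before join() returning
        System.out.println(data);                        // guaranteed to print 42
    }
}
```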
Double-checked locking singleton broken by instruction reordering
Almost everyone has written the following double-checked locking singleton:
public class Singleton {
    private static Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
But is this double-checked locking singleton correct? No. Creating the instance is not an atomic operation; it breaks down into three steps:
- Allocate memory
- Initialize the object
- Point instance at the allocated memory
Steps 2 and 3 may be reordered. If step 3 runs before step 2, another thread can observe a non-null instance that has not yet been initialized, which is unsafe.
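Roughly speaking, instance = new Singleton() compiles to bytecode like the following (a simplified sketch); the JIT and CPU may effectively move the publication in step 3 before the initialization in step 2:

```
0: new           #Singleton  // step 1: allocate memory
3: dup
4: invokespecial #<init>     // step 2: run the constructor
7: putstatic     #instance   // step 3: publish the reference
```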
So how do you prevent this instruction reordering? Modified as follows:
public class Singleton {
    private static volatile Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
The volatile keyword has two semantics. First, it guarantees visibility, which we will cover in the next blog post: a change made by one thread is immediately visible to other threads; without volatile, each thread may work on its own copy of the variable, and a modified value is not flushed to main memory right away, so other threads cannot see it. Second, it forbids instruction reordering. With reordering around the new forbidden, we get exactly the behavior we want.
As an aside, thread-safe lazy singletons are often best implemented with a static inner holder class:
public class Singleton {
    public static Singleton getInstance() {
        return Helper.instance;
    }

    private static class Helper {
        private static final Singleton instance = new Singleton();
    }
}
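The holder idiom is both lazy and thread-safe because the JVM initializes Helper only on first access, under the JVM's own class-initialization lock. A small sketch (class and member names are mine) that makes the laziness visible:

```java
public class HolderDemo {
    static class Helper {
        static final String INSTANCE = create();  // runs only when Helper is first used
        static String create() {
            System.out.println("Helper initialized");
            return "singleton";
        }
    }

    public static void main(String[] args) {
        System.out.println("before first access");
        System.out.println(Helper.INSTANCE);  // triggers Helper's class initialization
    }
}
```

"before first access" prints first, showing that Helper is not initialized until getInstance-style access actually happens.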
How do I disable instruction reordering
We have been talking about which reorderings are allowed and which are forbidden, but how is the forbidding actually implemented? With memory barrier CPU instructions, which, as the name suggests, erect a barrier that reordering may not cross.
Memory barriers can be categorized into the following types:
- LoadLoad barrier: for a sequence Load1; LoadLoad; Load2, ensures that the data read by Load1 is loaded before Load2 and any subsequent reads are executed.
- StoreStore barrier: for a sequence Store1; StoreStore; Store2, ensures that Store1's write is visible to other processors before Store2 and subsequent writes are executed.
- LoadStore barrier: for a sequence Load1; LoadStore; Store2, ensures that the data read by Load1 is loaded before Store2 and subsequent writes are flushed.
- StoreLoad barrier: for a sequence Store1; StoreLoad; Load2, ensures that Store1's write is visible to all processors before Load2 and any subsequent reads are executed. It is the most expensive of the four barriers, and on most processors it acts as a universal barrier that subsumes the other three.
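This is also how volatile forbids the reordering in the broken double-check example: conceptually, a StoreStore barrier is emitted before a volatile write and LoadLoad/LoadStore barriers after a volatile read. A minimal sketch (class name mine):

```java
public class VolatileBarrierDemo {
    static int a = 0;
    static volatile boolean flag = false;

    public static void main(String[] args) {
        new Thread(() -> {
            a = 1;
            // conceptual StoreStore barrier: the write to a is committed before the volatile write
            flag = true;
        }).start();
        while (!flag) { }       // volatile read; LoadLoad/LoadStore barriers follow it
        System.out.println(a);  // guaranteed to print 1, never 0
    }
}
```

Without volatile on flag, the two writes in the child thread could be reordered and the reader could print 0 (or spin forever on a stale flag).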