Hello, I'm Xiao Huang, a Java development engineer at a unicorn company. Thank you for finding me in the vast sea of people. As the saying goes: when your talent and ability are not yet enough to support your dream, calm down and learn. I hope you will study and work hard with me so that we can each realize our dreams.

One, introduction

Java developers usually treat the layers underneath their code (the JVM, the operating system, the CPU) as a black box that never needs to be opened.

But as the industry matures, it has become important to open this black box and see how it works.

This series of articles walks you through what is inside.

Before reading this article, I recommend downloading the OpenJDK source code so that you can follow along.

OpenJDK download address: openJDK

If the download is slow, follow the official account "love to knock code xiao Huang" and send the message "openJDK" to get a Baidu Netdisk link.


Two, operating system

1. Out-of-order CPU execution

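While the CPU waits for a slow memory read to complete, it goes ahead and executes later instructions that do not depend on that read. This is the root cause of out-of-order execution: the goal is efficiency, not disorder for its own sake.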

Let’s look at the following program:

// The class name is illustrative; the loop is implied by the original's use of i and break.
public class Disorder {
    private static int x, y, a, b;

    public static void main(String[] args) throws InterruptedException {
        for (long i = 1; ; i++) {
            x = 0; y = 0; a = 0; b = 0;
            Thread one = new Thread(new Runnable() {
                public void run() {
                    // Thread one starts first, so the line below can make it wait a little for
                    // thread other; adjust the wait time to suit your machine.
                    //shortWait(100000);
                    a = 1;
                    x = b;
                }
            });
            Thread other = new Thread(new Runnable() {
                public void run() {
                    b = 1;
                    y = a;
                }
            });
            one.start(); other.start();
            one.join(); other.join();
            String result = "Run " + i + " (" + x + "," + y + ")";
            if (x == 0 && y == 0) {
                System.err.println(result);
                break;
            }
        }
    }
}

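If nothing were reordered (neither by the CPU nor by the compiler), a = 1 would always execute before x = b, and b = 1 would always execute before y = a.

In that case, whichever thread runs its read last must see the other thread's write, so x and y could never both be 0 at the same time.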

Let's run the program and see what happens.

In my run, the program stopped with x and y both 0 on iteration 2,728,842, which confirms that reordering really does occur.

2.1 Possible problems caused by disorder

A common example: why does double-checked locking (DCL) need volatile?

Let’s take the following example:

class T{
	int m = 8;
}
T t = new T();

The disassembled bytecode:

0 new #2 <T>
3 dup
4 invokespecial #3 <T.<init>>
7 astore_1
8 return 

Let's go through the bytecode step by step:

  • new #2 <T>: allocates the object with m = 0 (the default value) and pushes a reference to it onto the operand stack of the stack frame

  • dup: duplicates the reference on top of the operand stack

  • invokespecial #3 <T.<init>>: pops one reference from the operand stack and calls the constructor on it, which sets m = 8

  • astore_1: pops the remaining reference and stores it in t; the 1 is the index of t in the local variable table

That raises a question: we have shown that execution can be reordered, so what harm can that do here?

If astore_1 is executed before invokespecial #3 <T.<init>>, a reference to an object whose constructor has not yet run is assigned to t. Another thread can then see a non-null t whose m is still 0. To avoid this, the instance field in DCL must be declared volatile.
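The DCL pattern mentioned above can be sketched as follows. This is only a minimal illustration: the names Singleton and getInstance are assumptions, and the field m = 8 is borrowed from class T. The instance field is volatile so that the reference cannot be published before the constructor has finished.

```java
// Illustrative sketch: Singleton and getInstance() are hypothetical names.
class Singleton {
    // volatile forbids reordering the constructor call with the store of the reference
    private static volatile Singleton instance;

    int m = 8;

    private Singleton() {
    }

    static Singleton getInstance() {
        Singleton local = instance;           // first check, without the lock
        if (local == null) {
            synchronized (Singleton.class) {
                local = instance;             // second check, under the lock
                if (local == null) {
                    local = new Singleton();
                    instance = local;         // volatile write publishes a fully constructed object
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Singleton s = Singleton.getInstance();
        System.out.println(s.m + " " + (s == Singleton.getInstance())); // prints "8 true"
    }
}
```

Without volatile, a thread that passes the first check could in principle receive a reference to an object whose m is still 0.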

The question is: how does volatile guarantee ordering?

2.2 How to disable instruction reordering

We look at how instruction reordering is disabled at the following four levels:

  • Java code level
  • Bytecode level
  • JVM level
  • CPU level

2.2.1 Java Code level

  • Just add the volatile keyword:
public class TestVolatile {
    public static volatile int counter = 1;

    public static void main(String[] args) {
        counter = 2;
        System.out.println(counter);
    }
}
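To see the keyword at work, here is a hedged sketch that repeats the earlier two-thread experiment with a and b declared volatile. The names VolatileOrdering and countBothZero are assumptions made for this illustration. Because volatile reads and writes may not be reordered with each other, the result (0,0) should no longer appear.

```java
// Illustrative sketch: VolatileOrdering and countBothZero are hypothetical names.
class VolatileOrdering {
    // a and b are volatile: the write in each thread cannot be reordered with the read that follows it
    private static volatile int a, b;
    // x and y are plain fields; join() makes their final values visible to the calling thread
    private static int x, y;

    // Runs the experiment the given number of times and returns how often (0,0) was observed.
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0; y = 0; a = 0; b = 0;
            Thread one = new Thread(() -> { a = 1; x = b; });
            Thread other = new Thread(() -> { b = 1; y = a; });
            one.start();
            other.start();
            try {
                one.join();
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (x == 0 && y == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("(0,0) observed " + countBothZero(10_000) + " times");
    }
}
```

The results (1,0), (0,1) and (1,1) are all still possible; only (0,0) is ruled out.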

2.2.2 Bytecode level

At the bytecode level, a volatile field shows up with the ACC_VOLATILE flag.

Let's disassemble the code above to see its bytecode.

Compile the class into a class file with javac TestVolatile.java, then disassemble the class file with the command javap -v TestVolatile.class

For the field public static volatile int counter = 1; the output contains:

    public static volatile int counter;
    descriptor: I
    flags: ACC_PUBLIC, ACC_STATIC, ACC_VOLATILE
    // Below is the bytecode in main that assigns counter = 2
    0: iconst_2
    1: putstatic     #2                  // Field counter:I
    4: getstatic     #3                  // Field 
  • descriptor: the type descriptor of the field; I means int

  • flags: ACC_PUBLIC, ACC_STATIC, ACC_VOLATILE: the access flags of the field
  • putstatic: writes a value to a static field

The ACC_VOLATILE flag tells us that the field was declared volatile.
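The same flag can also be inspected from Java code through reflection. The sketch below is only an illustration (the class name VolatileFlagCheck and the helper isVolatile are assumptions); it relies on java.lang.reflect.Modifier.isVolatile, which tests the modifier bit corresponding to ACC_VOLATILE.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Illustrative sketch: VolatileFlagCheck and isVolatile are hypothetical names.
class VolatileFlagCheck {
    public static volatile int counter = 1;

    // Returns true if the named field declared in this class carries the volatile flag.
    static boolean isVolatile(String fieldName) {
        try {
            Field field = VolatileFlagCheck.class.getDeclaredField(fieldName);
            return Modifier.isVolatile(field.getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isVolatile("counter")); // prints "true"
    }
}
```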

2.2.3 HotSpot source level

How does the JVM implement volatile variables?

Articles on the web usually mention four barrier names: StoreStore, StoreLoad, LoadStore, LoadLoad

The JVM really does implement them. Let's look at the implementation.

In Java, static fields belong to the class. The bytecode instruction that writes a static field is putstatic

The code that handles the putstatic instruction is in the bytecodeInterpreter.cpp file, under the path jdk\src\hotspot\share\interpreter\zero in the openjdk8 root directory:

CASE(_putstatic):
    {
          // ... several lines omitted
          // Now store the result
          // ConstantPoolCacheEntry* cache; cache is a constant pool cache entry
          // cache->is_volatile() -- checks whether the volatile access flag is set
          int field_offset = cache->f2_as_index();
          // **** key judgment logic ****
          if (cache->is_volatile()) {
            // Assignment logic for volatile variables
            if (tos_type == itos) {
              obj->release_int_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == atos) { // Object reference assignment
              VERIFY_OOP(STACK_OBJECT(-1));
              obj->release_obj_field_put(field_offset, STACK_OBJECT(-1));
              OrderAccess::release_store(&BYTE_MAP_BASE[(uintptr_t)obj >> CardTableModRefBS::card_shift], 0);
            } else if (tos_type == btos) { // byte assignment
              obj->release_byte_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == ltos) { // long assignment
              obj->release_long_field_put(field_offset, STACK_LONG(-1));
            } else if (tos_type == ctos) { // char assignment
              obj->release_char_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == stos) { // short assignment
              obj->release_short_field_put(field_offset, STACK_INT(-1));
            } else if (tos_type == ftos) { // float assignment
              obj->release_float_field_put(field_offset, STACK_FLOAT(-1));
            } else { // double assignment
              obj->release_double_field_put(field_offset, STACK_DOUBLE(-1));
            }
            // *** StoreLoad barrier after writing the value ***
            OrderAccess::storeload();
          } else {
            // Assignment logic for non-volatile variables
          }
    }

cache->is_volatile() checks the volatile access flag; the flag tests are in jdk\src\hotspot\share\utilities\accessFlags.hpp:

  // Java access flags
  bool is_public      () const         { return (_flags & JVM_ACC_PUBLIC      ) != 0; }
  bool is_private     () const         { return (_flags & JVM_ACC_PRIVATE     ) != 0; }
  bool is_protected   () const         { return (_flags & JVM_ACC_PROTECTED   ) != 0; }
  bool is_static      () const         { return (_flags & JVM_ACC_STATIC      ) != 0; }
  bool is_final       () const         { return (_flags & JVM_ACC_FINAL       ) != 0; }
  bool is_synchronized() const         { return (_flags & JVM_ACC_SYNCHRONIZED) != 0; }
  bool is_super       () const         { return (_flags & JVM_ACC_SUPER       ) != 0; }
  bool is_volatile    () const         { return (_flags & JVM_ACC_VOLATILE    ) != 0; }
  bool is_transient   () const         { return (_flags & JVM_ACC_TRANSIENT   ) != 0; }
  bool is_native      () const         { return (_flags & JVM_ACC_NATIVE      ) != 0; }
  bool is_interface   () const         { return (_flags & JVM_ACC_INTERFACE   ) != 0; }
  bool is_abstract    () const         { return (_flags & JVM_ACC_ABSTRACT    ) != 0; }

obj->release_long_field_put(field_offset, STACK_LONG(-1)) is defined in jdk\src\hotspot\share\oops\oop.inline.hpp:

jlong oopDesc::long_field_acquire(int offset) const                   { return Atomic::load_acquire(field_addr<jlong>(offset)); }
void oopDesc::release_long_field_put(int offset, jlong value)         { Atomic::release_store(field_addr<jlong>(offset), value); }

Let's go to jdk\src\hotspot\share\runtime\atomic.hpp and take a look at the Atomic::release_store method:

template <typename T>
inline T Atomic::load_acquire(const volatile T* p) {
  return LoadImpl<T, PlatformOrderedLoad<sizeof(T), X_ACQUIRE> >()(p);
}
template <typename D, typename T>
inline void Atomic::release_store(volatile D* p, T v) {
  StoreImpl<D, T, PlatformOrderedStore<sizeof(D), RELEASE_X> >()(p, v);
}

Note that the parameters const volatile T* p and volatile D* p are declared with the C/C++ volatile keyword.

Moving on: after the assignment has been performed, this call follows: OrderAccess::storeload();

In the file jdk\src\hotspot\share\runtime\orderAccess.hpp we find this code:

  // barriers
  static void     loadload();
  static void     storestore();
  static void     loadstore();
  static void     storeload();

  static void     acquire();
  static void     release();
  static void     fence();

These are clearly the JVM memory barriers that so many websites describe.

Of course, we also need to see how they are implemented on linux_x86, in orderAccess_linux_x86.hpp under jdk\src\hotspot\os_cpu\linux_x86:

// A compiler barrier, forcing the C++ compiler to invalidate all memory assumptions
static inline void compiler_barrier() {
  __asm__ volatile ("" : : : "memory");
}

inline void OrderAccess::loadload()   { compiler_barrier(); }
inline void OrderAccess::storestore() { compiler_barrier(); }
inline void OrderAccess::loadstore()  { compiler_barrier(); }
inline void OrderAccess::storeload()  { fence();            }

inline void OrderAccess::acquire()    { compiler_barrier(); }
inline void OrderAccess::release()    { compiler_barrier(); }

2.2.4 CPU level

  • Intel primitive instructions: mfence (full memory barrier), lfence (read barrier), sfence (write barrier)

As we can see, the most important line of code is __asm__ volatile ("" : : : "memory");

  • __asm__: instructs the compiler to insert an assembly statement here
  • volatile: tells the compiler not to optimize away, move, or merge this assembly statement with other statements. That is: emit the assembly exactly as written.
  • ("" : : : "memory"): the "memory" clobber forces the GCC compiler to assume that the assembly statement may have read or modified any memory location. Values the compiler was keeping in registers are therefore treated as stale and must be reloaded from memory afterwards, and the compiler may not move memory accesses across the statement. The statement itself emits no machine instruction and does not touch the CPU caches.

In a nutshell: it tells the compiler not to optimize blindly across this point; memory accesses must stay in program order.

As you can see, on x86 three of the four barriers need nothing more than this compiler barrier, because the hardware itself does not perform LoadLoad, LoadStore or StoreStore reordering. Only StoreLoad needs a real instruction.

This much is already enough to beat 80% of interviewees and interviewers, but we are not done yet!

Among these methods there is one called fence(). Let's look at it:

inline void OrderAccess::fence() {
   // always use locked addl since mfence is sometimes expensive
#ifdef AMD64
  __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
  __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  compiler_barrier();
}

As the comment says, HotSpot avoids the primitive mfence instruction (memory barrier) here, because mfence is sometimes more expensive than a locked addl.

The #ifdef selects the stack pointer register: rsp on AMD64, esp otherwise.

"lock; addl $0,0(%%rsp)" adds 0 to the value at the memory location that rsp points to (the top of the stack), which changes nothing. What matters is the lock prefix: a locked instruction is a full barrier that locks the memory subsystem to ensure in-order execution, even across multiple CPUs.

At this point we have a fairly deep understanding of volatile, and should be able to beat 90% of interviewers.
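A full barrier can also be requested directly from Java. The following sketch assumes Java 9 or later, where java.lang.invoke.VarHandle provides the static method fullFence(); the names FullFenceOrdering and countBothZero are assumptions made for this illustration. It repeats the two-thread experiment with plain (non-volatile) fields and places a full fence between the store and the load in each thread, which is the ordering that fence() enforces. On HotSpot this should prevent the (0,0) result.

```java
import java.lang.invoke.VarHandle;

// Illustrative sketch (Java 9+): FullFenceOrdering and countBothZero are hypothetical names.
class FullFenceOrdering {
    // All four fields are plain; the ordering comes only from the explicit fences.
    private static int a, b, x, y;

    // Runs the experiment the given number of times and returns how often (0,0) was observed.
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0; y = 0; a = 0; b = 0;
            Thread one = new Thread(() -> {
                a = 1;
                VarHandle.fullFence(); // the store to a may not be reordered with the load of b
                x = b;
            });
            Thread other = new Thread(() -> {
                b = 1;
                VarHandle.fullFence(); // the store to b may not be reordered with the load of a
                y = a;
            });
            one.start();
            other.start();
            try {
                one.join();
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (x == 0 && y == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("(0,0) observed " + countBothZero(10_000) + " times");
    }
}
```

In everyday code, declaring the fields volatile is the simpler choice; explicit fences are shown here only because they correspond closely to the barrier discussed in this section.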

2.3 The happens-before principle

In simple terms, the Java Memory Model specifies rules that any reordering must respect.

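  • Program order rule: within a single thread, each action happens-before every action that comes later in program order
  • Monitor lock rule: an unlock of a monitor happens-before every subsequent lock of the same monitor
  • volatile variable rule: a write to a volatile field happens-before every subsequent read of that field
  • Thread start rule: a call to Thread.start() happens-before every action in the started thread
  • Thread termination rule: every action in a thread happens-before another thread detects its termination, for example by returning from Thread.join()
  • Thread interrupt rule: a call to Thread.interrupt() happens-before the interrupted thread detects the interrupt
  • Object finalization rule: the end of an object's constructor happens-before the start of its finalize() method
  • Transitivity: if A happens-before B and B happens-before C, then A happens-before C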
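As a small illustration of how these rules combine, the sketch below passes data through plain fields and relies only on the thread start rule, the thread termination rule, program order and transitivity. The names HappensBeforeDemo and run are assumptions made for this example.

```java
// Illustrative sketch: HappensBeforeDemo and run are hypothetical names.
class HappensBeforeDemo {
    private static int input;   // plain field, written before start()
    private static int output;  // plain field, written by the worker thread

    static int run(int value) {
        input = value;                  // (1) program order: happens-before start()
        Thread worker = new Thread(() -> {
            output = input * 2;         // (2) thread start rule: the worker sees the write in (1)
        });
        worker.start();
        try {
            worker.join();              // (3) thread termination rule
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;                  // (4) by transitivity, the write in (2) is visible here
    }

    public static void main(String[] args) {
        System.out.println(run(21)); // prints "42"
    }
}
```

No volatile or synchronized is needed here, because start() and join() already establish the required happens-before edges.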

2.4 as-if-serial

No matter how instructions are reordered, the result of single-threaded execution must not change.
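A tiny sketch of what this means in practice (the class name AsIfSerialDemo and the method compute are assumptions): statements without a data dependency between them may be reordered, while a statement that depends on earlier results may not move ahead of them.

```java
// Illustrative sketch: AsIfSerialDemo and compute are hypothetical names.
class AsIfSerialDemo {
    static int compute() {
        int a = 1;      // (1) independent of (2): these two may be executed in either order
        int b = 2;      // (2)
        int c = a + b;  // (3) depends on (1) and (2), so it cannot be moved ahead of them
        return c;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints "3"
    }
}
```

Whichever order (1) and (2) actually run in, a single thread always observes c == 3.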

Three, summary

I spent about a week writing this article. The hardest part was finding a path that goes from shallow to deep; for a long time I did not know how to structure it.

In the end it worked out, and it took my own understanding of volatile one step further.

After reading this article, you should at least not be afraid of any interviewer on the subject of volatile.

Next, I plan to write about write combining, processes, threads and fibers, or algorithms.

I am a Java development engineer at a unicorn company. I hope that you, smart, lovely and kind reader, will give me a follow. If you have questions, leave a comment or send me a private message. See you next time!