Author's note: all articles in this series are original work. Any reference to another article will be cited; if I have missed a citation, corrections are welcome. If you find this article plagiarized elsewhere, please report it and file an issue in the GitHub repository. Thank you for your support!
This article draws on a large number of articles, documents, and papers, and the subject is genuinely complicated. My knowledge is limited and my understanding may fall short in places; if you disagree with anything, please leave a comment. This series will be continuously updated, and questions, errors, and omissions are welcome in the comments.
If you prefer the single-page version, see: The Hardcore Java New Memory Model Analysis and Experiments, single-page edition (continuously updated in Q&A form). If you prefer this split version, here is the table of contents:
- 1. What Is the Java Memory Model?
- 2. Atomic Access and Word Tearing
- 3. Core Understanding of Memory Barriers (CPU and Compiler)
- 4. The New Java Memory Access Modes and Experiments
- 5. JVM Low-Level Memory Barrier Source Analysis
JMM-related documents:
- Java Language Specification Chapter 17
- The JSR-133 Cookbook for Compiler Writers – Doug Lea
- Using JDK 9 Memory Order Modes – Doug Lea
Memory barriers, CPUs, and memory models:
- Weak vs. Strong Memory Models
- Memory Barriers: a Hardware View for Software Hackers
- A Detailed Analysis of Contemporary ARM and x86 Architectures
- Memory Model = Instruction Reordering + Store Atomicity
- Out-of-Order Execution
x86 CPUs:
- x86 wiki
- Intel® 64 and IA-32 Architectures Software Developer Manuals
- Formal Specification of the x86 Instruction Set Architecture
ARM CPUs:
- ARM wiki
- aarch64 Cortex-A710 Specification
Various understandings of consistency:
- Coherence and Consistency
Aleksey Shipilev on the JMM:
- Aleksey Shipilev – Don’t misunderstand the Java Memory Model (Part 1)
- Aleksey Shipilev – Don’t misunderstand the Java Memory Model (part 2)
Many Java developers use Java's concurrency and synchronization mechanisms, such as volatile, synchronized, and Lock. Many have also read Chapter 17 of the JLS, "Threads and Locks" (docs.oracle.com/javase/spec…), which covers synchronization, wait/notify, sleep & yield, and the memory model. But I believe most readers are like me: after a first pass we only know *what* the rules are, without any clear understanding of *why* they were specified that way and not another.

At the same time, when we combine the specification with the HotSpot implementation and its source code, we find that because of javac's static compilation optimizations and the JIT optimizations of C1 and C2, the code's actual behavior does not quite match the understanding we formed from the specification alone. Such inconsistencies lead to misunderstandings when we try to learn the Java Memory Model (JMM) and its design through actual code.

I have been continually trying to understand the Java memory model, rereading the JLS and the analyses of various experts. This series organizes my personal insights from reading those specifications and analyses, together with some JCStress experiments, to help you understand the Java memory model and the API abstractions introduced after Java 9. Let me emphasize again: the whole point of a memory model is to abstract away the underlying details so that you do not have to care about them. The subject involves a great deal, my knowledge is limited, and my understanding may fall short; I will try to give the reasoning and references behind every conclusion. Please do not take everything here on faith, and if you disagree, please refute it with concrete examples and leave a comment.
8. Analysis of underlying JVM implementation
8.1. Definition of OrderAccess in the JVM
The JVM uses memory barriers in various ways:
- Implement Java’s various syntactic elements (volatile, final, synchronized, etc.)
- Implementing various JDK APIs (VarHandle, Unsafe, Thread, etc.)
- Memory barriers needed by the GC: depending on whether the GC threads and the application threads (called mutators in GC algorithms) run stop-the-world or concurrently
- Object reference barriers: for example, in a generational copying GC, a young GC usually copies live objects from one survivor region to the other. If we do not want to stop the world (STW) during copying but instead run concurrently with the application threads, we need memory barriers
- Maintenance barriers: for example, partition-based GC algorithms need to maintain cross-partition reference tables and per-partition usage tables, such as card tables. If we want application threads to modify and access memory concurrently with the GC threads rather than stopping the world, memory barriers are also needed
- The JIT also needs memory barriers: similarly, memory barriers are needed to determine whether an application thread executes code in the interpreter or executes JIT-optimized code
These memory barriers need different low-level code on different CPUs and different operating systems. The unified interface is designed as follows:
Source code address:orderAccess.hpp
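As a rough sketch of what that unified interface looks like (this is a simplified illustration, not the actual HotSpot declaration in orderAccess.hpp; portable C++11 fences stand in here for the per-OS, per-CPU assembly that HotSpot supplies):

```cpp
#include <atomic>

// Simplified OrderAccess-style interface: one static method per barrier
// kind. Real HotSpot routes these to platform-specific implementations;
// C++11 fences stand in for them in this sketch.
struct OrderAccessSketch {
  static void loadload()   { std::atomic_thread_fence(std::memory_order_acquire); }
  static void storestore() { std::atomic_thread_fence(std::memory_order_release); }
  static void loadstore()  { std::atomic_thread_fence(std::memory_order_acquire); }
  static void storeload()  { std::atomic_thread_fence(std::memory_order_seq_cst); }
  static void acquire()    { std::atomic_thread_fence(std::memory_order_acquire); }
  static void release()    { std::atomic_thread_fence(std::memory_order_release); }
  static void fence()      { std::atomic_thread_fence(std::memory_order_seq_cst); }
};

// Typical publish/consume usage: release before publishing the flag,
// acquire after observing it, so `data` is visible once `ready` is seen.
int  data  = 0;
bool ready = false;

void publisher() {
  data = 42;
  OrderAccessSketch::release();  // data must become visible before ready
  ready = true;
}

int consumer() {
  while (!ready) { /* spin */ }
  OrderAccessSketch::acquire();  // ready observed => data is visible
  return data;
}
```

The publisher/consumer pair above shows why the interface is shaped this way: callers think in terms of barrier kinds, not CPU instructions.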
The implementation differs across CPUs and operating systems. Combined with the CPU reordering table from earlier:
Let’s look at the implementation of Linux + x86:
Source code address:orderAccess_linux_x86.hpp
On x86, Load-with-Load, Load-with-Store, and Store-with-Store ordering is already guaranteed by the CPU, so as long as the compiler does not reorder, the StoreStore, LoadLoad, and LoadStore barriers come built in. That is why we see these three barriers implemented as mere compiler barriers. Acquire is equivalent to adding LoadLoad and LoadStore barriers after a Load; on x86 this still needs a compiler barrier. Release is equivalent to adding LoadStore and StoreStore barriers before a Store; on x86 this also still needs a compiler barrier. Thus we have the following table:
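The x86 idea can be sketched roughly as follows (hedged: this is simplified from orderAccess_linux_x86.hpp, not the exact HotSpot source; the empty inline assembly is the classic GCC compiler barrier, and the locked stack add is the full-fence idiom):

```cpp
// On x86 only StoreLoad needs a real CPU fence; the other three barriers
// only need to stop the compiler from reordering.
inline void compiler_barrier() {
  // Emits no instruction; the "memory" clobber forbids the compiler
  // from moving memory accesses across this point.
  __asm__ volatile ("" : : : "memory");
}

inline void loadload()   { compiler_barrier(); }
inline void storestore() { compiler_barrier(); }
inline void loadstore()  { compiler_barrier(); }

inline void storeload() {
#if defined(__x86_64__)
  // A locked read-modify-write on the stack acts as a full fence on x86,
  // and is often cheaper than mfence.
  __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
  __sync_synchronize();  // fallback full barrier on other architectures
#endif
}
```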
Let's take a look at the implementation of Linux aarch64, which we often use:
Source code address:orderAccess_linux_aarch64.hpp
As the earlier table showed, on ARM CPUs Load with Load, Load with Store, Store with Store, and Store with Load can all be reordered. On aarch64 the JVM does not use CPU instructions directly but a memory barrier implementation wrapped by C++. The C++ wrappers correspond closely to the simple CPU memory barriers we discussed earlier, namely the read memory barrier (__atomic_thread_fence(__ATOMIC_ACQUIRE)), the write memory barrier (__atomic_thread_fence(__ATOMIC_RELEASE)), and the read-write memory barrier (full memory barrier, __sync_synchronize()). Acquire acts as a receive point that unpacks incoming updates; analogous to the simple CPU model, it blocks and waits for the invalidate queue to be fully processed, ensuring there is no stale data left in the CPU cache. Release acts as a send point that packs up the preceding updates and sends them out; analogous to the simple CPU model, it blocks and waits for the store buffer to flush completely into the CPU cache. Acquire and release are therefore implemented with the read and write memory barriers, respectively.
LoadLoad ensures that the first Load happens before the second, so a read memory barrier after the first Load suffices: it blocks until the invalidate queue has been fully processed, after which the current CPU holds no stale data. There is no need to wait for the current CPU's store buffer to empty.
StoreStore ensures that the first Store happens before the second, so a write memory barrier after the first Store blocks until the store buffer has flushed into the CPU cache. StoreLoad is special: the second Load must see the latest value of the Store, which means the update cannot sit only in the store buffer, and pending invalidations cannot be left unprocessed in the invalidate queue. Therefore a read-write memory barrier (full barrier) is required.
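The aarch64 mapping described above can be sketched like this (hedged: simplified from orderAccess_linux_aarch64.hpp; the GCC builtins are real and portable, compiling to dmb instructions on aarch64 and to little or nothing on x86):

```cpp
// Sketch of the Linux aarch64 barrier mapping: acquire/release use the
// GCC __atomic fences, StoreLoad needs a full barrier.
namespace aarch64_sketch {
  inline void acquire()    { __atomic_thread_fence(__ATOMIC_ACQUIRE); } // wait for invalidate queue
  inline void release()    { __atomic_thread_fence(__ATOMIC_RELEASE); } // flush store buffer
  inline void loadload()   { acquire(); }  // read barrier after the first Load
  inline void loadstore()  { acquire(); }
  inline void storestore() { release(); }  // write barrier after the first Store
  inline void storeload()  { __sync_synchronize(); } // full barrier: both directions
}
```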
8.2. Volatile and final memory barrier source code
Let's look at where the memory barriers for volatile are inserted, using ARM as an example. We can trace the iload bytecode to see what happens when the loaded field is volatile or final, and istore to see what happens when the stored field is volatile or final.
For field access, the JVM also has fast paths and slow paths; here we will only look at the fast path code:
Corresponding source code:
Source code address:templateTable_arm.cpp
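Before diving into the source, here is a hedged sketch of where the interpreter conceptually places barriers around a volatile field access. The real templateTable_arm.cpp emits assembly through the MacroAssembler; in this sketch, C++11 fences stand in for those barrier emissions, and `field` stands in for a volatile Java field:

```cpp
#include <atomic>

// `field` stands in for a volatile Java field; the fences mark the
// conceptual barrier positions, not the actual interpreter code.
std::atomic<int> field{0};

int volatile_load() {
  int v = field.load(std::memory_order_relaxed);        // the plain load itself
  std::atomic_thread_fence(std::memory_order_acquire);  // LoadLoad | LoadStore after the load
  return v;
}

void volatile_store(int v) {
  std::atomic_thread_fence(std::memory_order_release);  // LoadStore | StoreStore before the store
  field.store(v, std::memory_order_relaxed);            // the plain store itself
  std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad after the store
}
```

The placement follows the usual JMM recipe: acquire semantics after a volatile load, release semantics before a volatile store, and a full StoreLoad barrier after the store so a subsequent volatile load cannot be reordered past it.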