Introduction
This article draws on my personal understanding of the Java memory model, together with related books and materials, to analyze the JMM thoroughly. It first covers the JVM memory model, the hardware and OS (operating system) memory architectures, the principles of Java multithreading, and how all of these relate to the Java memory model (JMM), and then analyzes the JMM itself in more depth. The goal is to help you understand the Java memory model thoroughly. (My articles are written from personal understanding plus relevant books and materials; if you spot a mistake or have doubts, you are welcome to raise them in the comments section. Thank you!)
First, understand thoroughly the differences between the JVM memory model and the Java memory model (JMM)
1.1 JVM Memory Model (JVM Memory Region Partitioning)
As we all know, Java programs can only run on top of the JVM. Unlike C, which talks to the operating system directly, Java uses the JVM to shield programs from the underlying OS, so the Java language runs on top of the JVM and is platform-independent: compile once, run anywhere.
When running a Java program, the JVM divides the memory it manages into the areas shown above (the runtime data areas). Each area has its own purpose and plays its own role while the program runs, and the runtime data areas are further divided into thread-private areas and shared areas (GC does not happen in the thread-private areas). The specific roles of each area are as follows:
Method Area: The method area (replaced by Metaspace since Java 8) is a thread-shared memory area, also known as non-heap. It stores data such as class information loaded by the virtual machine, constants, static variables, and just-in-time-compiled code. According to the Java Virtual Machine specification, an OutOfMemoryError is thrown when the method area cannot satisfy a memory allocation request. It is worth noting that the method area contains a region called the Runtime Constant Pool, which stores the various literals and symbolic references generated by the compiler; these are placed in the Runtime Constant Pool after the class is loaded, for later use.
JVM Heap: The Java heap is also a memory area shared by threads. It is created when the virtual machine starts, is the largest area of memory managed by the Java virtual machine, and is used to hold object instances. Note that the Java heap is the main area managed by the garbage collector, so it is often referred to as the GC heap. An OutOfMemoryError is thrown if the heap has no memory left to allocate an instance and cannot be extended.
Program Counter Register: A thread-private data area. It is a small memory space that acts as the line-number indicator of the bytecode being executed by the current thread. When the bytecode interpreter works, it selects the next bytecode instruction to execute by changing the value of this counter; basic features such as branches, loops, jumps, exception handling, and thread resumption all depend on it. Its main purpose: when a thread's CPU time slice expires, the thread is "interrupted" so another thread can run; when that "interrupted" thread is later rescheduled by the CPU, how does it know which instruction it executed last? The program counter is responsible for exactly that.
Java Virtual Machine Stacks: A thread-private data area, created together with the thread and sharing its lifetime, which represents the memory model for the execution of Java methods. As the thread runs, each method invocation creates a stack frame that stores the method's local variable table, operand stack, dynamic linking information, return value, return address, and so on. Each method invocation corresponds to a stack frame being pushed onto, and popped off, the virtual machine stack, as follows:
Native Method Stacks: The native method stack is also a thread-private data area. It mainly serves the Native methods (typically written in C) used by the virtual machine. When a program calls a Native method, the JVM maintains a registration table of native methods in the native method stack; it only records which thread called which native method interface. The call does not actually take place in the native method stack, because the stack is just a call registry: the actual call goes through the native method interface to invoke C functions in the native method library. In general, we do not need to worry about this area.
The point here is to make it clear that the JVM memory model and the JMM are two entirely different concepts. The JVM memory model exists at the level of the Java virtual machine; from the operating system's point of view, the JVM itself still lives in main memory. The JMM, by contrast, sits at the level of the Java language, the OS, and the hardware architecture: its main function is to specify a memory model for the Java language across hardware architectures. There is no concrete "JMM" entity; the JMM is just a specification, not a technical implementation.
1.2 Overview of the Java memory model (JMM)
The Java Memory Model, or JMM, is an abstract concept that doesn't really exist. It describes a set of rules or specifications that define how the variables in a program (including instance fields, static fields, and the elements that make up array objects) may be accessed. Because the entity that actually runs a program on the JVM is a thread, the JVM creates a working memory (sometimes called stack space) for each thread when it is created, used to store the thread's private data. The Java memory model states that all variables are stored in main memory; main memory is a shared memory region that all threads can access. However, if a thread wants to read or assign to a variable, it must do so in its working memory: the thread first copies the variable from main memory into its own working memory space, operates on the copy there, and flushes the result back to main memory when the operation completes. It cannot operate on variables in main memory directly; the working memory only holds copies of main-memory variables.

Doesn't the stack store only the reference addresses of objects? Briefly: when the thread actually reaches that line, it finds the real object in main memory via the object reference address in the local variable table, then copies the object into its own working memory before operating on it (when the object is large, say 1 MB or more, it is not copied in its entirety; only the members the thread needs to operate on are copied). As said before, working memory is each thread's private data area, so different threads cannot access each other's working memory, and communication between threads (by value) must go through main memory. The brief access process is shown below:
Pay attention!! The JMM is a set of rules governing how variables in a Java program may be accessed across the shared and private data regions, and it is built around atomicity, visibility, and ordering. The JMM and the JVM memory areas are only loosely analogous: both have a shared data region and a private data region. In the JMM, main memory belongs to the shared data region and, to some extent, should include the heap and the method area, while working memory belongs to the thread-private data region and, to some extent, should include the program counter, the virtual machine stack, and the native method stack. In some places you may see main memory described as heap memory and working memory as the thread stack; they mean the same thing. Main memory and working memory in the JMM are described below:
Main memory: Main memory chiefly stores Java object instances. Every instance created by any thread is stored in main memory (excluding objects allocated on the stack through escape analysis and scalar replacement, or allocated in a TLAB), regardless of whether the instance is assigned to a member variable or to a local variable of a method; it also includes shared class information, constants, and static variables. Because it is a shared data area, multiple threads performing non-atomic operations on the same variable may run into thread-safety issues.
Working memory: Each thread can only access its own working memory; that is, local variables in a thread are invisible to other threads, even if two threads execute the same code. Each thread creates local variables belonging to itself in its own working memory, which also holds its bytecode line-number indicator and information about Native methods. Note that since working memory is each thread's private data, threads cannot access each other's working memory, and communication between threads still depends on main memory; consequently, data stored only in working memory does not suffer from thread-safety problems.
Now that we know what main memory and working memory are, let's look at how data is stored and operated on. According to the virtual machine specification, for a member method of an instance object: if a local variable in the method is of a primitive type (boolean, byte, short, char, int, long, float, double), it is stored directly in working memory, in the local variable table of the stack frame; if the local variable is a reference type, the object's reference address is stored in the stack frame in working memory, while the object instance itself is stored in main memory (the shared data area, the heap). A member variable of an instance object, however, whether a primitive, a wrapper type (Integer, Double, etc.), or a reference type, is stored on the heap (except for stack allocation and TLAB allocation). Static variables and information about the class itself are stored in main memory. It is important to note that instance objects in main memory can be shared by multiple threads: if two threads call the same method of the same class at the same time, each thread copies the data it operates on into its own working memory and flushes it back to main memory when the operation completes. A simplified diagram is shown below:
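Beyond the diagram, here is a minimal, hypothetical sketch of these storage rules in code (the class and field names are mine, and it assumes no escape analysis moves the allocation onto the stack):

```java
public class StorageDemo {
    static int classCounter = 0;      // static variable: stored in main memory (method area / Metaspace)
    private int member = 1;           // instance field: lives on the heap (main memory) with the object

    public void run() {
        int local = 42;               // primitive local: stored in this thread's stack frame (working memory)
        Object ref = new Object();    // 'ref' is in the stack frame; the Object instance is on the heap
        System.out.println(local + ", " + member + ", " + classCounter + ", " + ref);
    }
}
```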
Second, computer hardware memory architecture, OS and Java multithreading implementation principles, and the Java memory model
2.1 Computer hardware memory architecture
The figure above is a simplified diagram of CPU and memory interaction; real hardware is not quite this simple (the north and south bridges are omitted for ease of understanding). Modern computers usually have multiple CPUs, and each CPU may have multiple cores: a multi-core processor integrates two or more complete computation engines (cores) into a single processor, which makes true parallel multitasking possible. From the scheduler's point of view, each thread is mapped onto a CPU core and runs in parallel.

Inside the CPU there is a set of registers; registers hold the data the CPU accesses and processes directly. In general the CPU loads data from memory into its registers and then processes it, but because main memory is far slower than the CPU, the CPU would spend too much of its instruction-processing time waiting for memory. So a cache was added between the registers and main memory. The CPU cache is small, but its access speed is much faster than main memory's. If the CPU always operated directly on the same main-memory location, its execution speed would suffer; instead, the cache temporarily holds data fetched from memory, so when the registers need data from the same memory location again, they can take it from the cache without going back to main memory. Note that the registers do not always get data from the cache: if the data is not cached, the registers must bypass the cache and fetch it directly from memory. So not every access is served by the cache; this phenomenon has a technical name, the cache hit ratio. Taking data from the cache is a hit; having to go to memory is a miss. Clearly the cache hit ratio also affects CPU performance.

That is a brief sketch of the interaction among the CPU, the cache, and main memory. In short, when the CPU needs to read from main memory, it first loads part of main memory into the CPU cache (or reads directly from the cache if the data is already there) and then reads from the cache into its registers; when the CPU needs to write to main memory, it likewise flushes register data to the cache first, and the data is then flushed from the cache to main memory.

In fact this is similar to Application (Java) --> Cache (Redis) --> DB (MySQL). A Java program's performance suffers because the DB has to go to disk, so the program has to wait for the DB's result when processing a request: the thread handling the request blocks until the DB returns. The real problem in this model is that the DB cannot keep up with the Java program, so the whole request becomes slow, and while the DB works the Java thread is blocked and doing nothing useful, which ultimately drags down overall system throughput. Adding a Cache (Redis) improves response time and thus overall throughput and performance. In fact, the goal of performance optimization is to speed up the processing of every layer of a system, and architecture is really about designing a system that can handle a larger number of requests.
2.2 The relationship between OS and JVM threads and the implementation principle of Java threads
From the discussion above we now roughly understand the hardware memory architecture, the JVM memory model, and the Java memory model. Next, understanding how threads are implemented in Java will help us understand the relationship between the Java memory model and the hardware memory architecture. On Windows and Linux, Java threads are implemented with a one-to-one thread model: a language-level program indirectly drives the operating system's kernel thread model. That is, when we use a Java thread, for example: new Thread(Runnable);
inside the JVM, a kernel thread of the current operating system is ultimately used to run the Runnable task. A term worth knowing here is the Kernel-Level Thread (KLT), which is supported directly by the operating system kernel: the kernel switches these threads, schedules them via its scheduler, and maps their work onto the processors. Each kernel thread can be seen as an avatar of the kernel, which is how operating systems can run multiple tasks simultaneously. Because the multithreaded programs we write are language-level, they do not call kernel threads directly; instead, they use light-weight processes (what we usually just call threads), and each light-weight process maps to one kernel thread. So we drive a kernel thread through a light-weight process, and the operating system kernel then maps the work onto the processors. This one-to-one relationship between light-weight processes and kernel threads is what is meant by the one-to-one model between Java threads and the OS. The diagram below:
Each thread in a Java program is mapped by the OS onto a CPU for processing; of course, if the CPU has multiple cores, a single CPU can schedule multiple threads to execute in parallel.
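As a small sketch of what the one-to-one model looks like from Java (the pthread_create detail is a HotSpot-on-Linux implementation note, stated here as an assumption the code itself cannot observe):

```java
public class OneToOneDemo {
    public static void main(String[] args) {
        // start() asks the JVM to create a native thread; on HotSpot for Linux
        // this ultimately goes through pthread_create, i.e. one Java thread
        // maps to one kernel thread that the OS schedules onto a CPU core.
        Thread t = new Thread(() ->
                System.out.println("running on: " + Thread.currentThread().getName()));
        t.start();
    }
}
```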
2.3 Relationship between JMM and hardware memory architecture
From the preceding discussion of the JVM memory model, the Java memory model (JMM), the hardware memory architecture, and the implementation of Java multithreading, we can see that multithreaded execution is ultimately mapped onto the hardware processors. The Java memory model and the hardware memory architecture, however, do not correspond exactly: hardware only has the concepts of registers, cache, and main memory; it knows nothing of working memory (the thread-private data area) and main memory (heap memory) as the JMM defines them. In other words, the JMM's division of memory has no effect on the hardware's memory, because the JMM is an abstract concept, a set of rules that does not physically exist. Whether data belongs to working memory or to main memory, on the hardware it all ends up in the computer's main memory (and, of course, possibly in the CPU caches or registers as well). So in general the Java memory model and the hardware memory architecture overlap: it is a crossing of an abstract, conceptual division with real physical hardware. (Note that the same is true of the JVM's memory partitioning.)
2.4. Why is there a need for JMM?
Let's move on to why the Java memory model is necessary, because when we learn something we should know why it exists. Since the thread is the smallest unit the OS schedules, every running program entity is essentially a thread, including Java programs running on the OS. When each thread is created, the JVM creates a working memory (sometimes called stack space) for it to store thread-private data. If a thread wants to operate on a variable in main memory, it must do so indirectly through its working memory: it copies the variable from main memory into its own working memory space, operates on the copy there, and writes the variable back to main memory when the operation completes. Thread-safety problems can arise if two threads operate on a variable of an instance object in main memory. The first case: suppose there is a shared variable in main memory, int i = 0.
Now two threads, A and B, both operate on the variable i. Each thread copies i from main memory into its own working memory as its private copy of the shared variable, and then increments it. Suppose A and B both copy i = 0 from main memory into their working memory at the same time and then operate on it. A's increment is performed on the copy in A's own working memory and is invisible to B, and A then flushes its result, 1, back to main memory. Meanwhile B has also performed i++, but B's result is based on the value i = 0 it copied from main memory earlier, so the value B flushes back to main memory is also 1. Both threads incremented i in main memory, so ideally i = 2, but in this case i = 1.
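A minimal runnable sketch of this first case (the class is my own illustration; the exact final value varies from run to run):

```java
public class LostUpdateDemo {
    static int i = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int n = 0; n < 100_000; n++) {
                i++; // read from main memory, +1 in working memory, write back: not atomic
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        // Ideally 200000, but lost updates usually leave it smaller.
        System.out.println("i = " + i);
    }
}
```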
The second case:
Suppose thread A now wants to change the value of i to 2, while thread B wants to read i. Does B read the value 2 after A's update, or the value 1 from before it? The answer is: it is not certain. B may read the value 1 from before thread A's update, or it may read the updated value 2. This is because working memory is each thread's private data area: when thread A modifies i, it first copies the variable from main memory into its own working memory, operates on it there, and only writes i back to main memory when the operation completes; the same holds for thread B. So consistency problems can arise between main memory and working memory. If thread A has not yet written its result back to main memory when thread B reads from main memory, B copies the old value i = 1 into its working memory and reads 1; but if thread A writes i = 2 back to main memory before thread B starts reading, B reads 2. Which happens first? It is not certain.
Neither of these situations should be allowed in a program. Imagine that the variable i is the inventory of a product on Taobao's Singles' Day and threads A and B are users taking part in the sale: for the Taobao business team this could mean overselling, selling the same item twice, and similar problems, turning a technical defect into business and economic losses. In a promotion on the scale of Taobao's Double Eleven, if such problems are not properly controlled, the risk grows exponentially. This is, in essence, the so-called thread-safety problem.
To solve problems like those described above, the JVM defines a set of rules that determine when a write to a shared variable by one thread becomes visible to another thread. This set of rules is known as the Java memory model (JMM). The JMM is built around the atomicity, ordering, and visibility of program execution; let's look at these three features.
2.5. The Java Memory Model revolves around three major features
2.5.1 Atomicity
Atomicity means that an operation is uninterruptible: even in a multithreaded environment, once an operation starts it cannot be affected by other threads. For example, if a static variable int i = 0 is assigned by two threads at the same time, thread A writing i = 1 and thread B writing i = 2, then no matter how the threads are scheduled the final value of i is either 1 or 2: the assignments by thread A and thread B do not interfere with each other. That is the uninterruptible property.

One caveat: on 32-bit systems, reads and writes of long and double data are not atomic (for the other primitive types, byte, short, int, float, boolean, and char, reads and writes are atomic operations). That is, if two threads read and write a long or double at the same time, they can interfere with each other: on a 32-bit virtual machine each atomic read or write is 32 bits wide, while long and double are 64-bit values. After one thread has atomically written the first 32 bits, another thread's read may see only half-updated data, yielding a value that is neither the original nor the modified value: the value of a "half variable", the 64-bit datum split across two 32-bit accesses by two threads. But don't worry too much: reading "half a variable" is rare, and in today's commercial virtual machines reads and writes of 64-bit data are almost always performed as atomic operations.

So, in essence, an atomic operation is a group of operations that either all succeed or all fail. Take placing an order: {create the order, decrement the inventory}. For the user, placing an order must be atomic: either the order is created and the inventory is decremented, both successfully, or neither happens; it must never happen that the order is created but the inventory decrement fails. Viewed at this macro level, that is an atomic operation. Conversely, the root cause of thread-safety problems is precisely non-atomic operations performed by multiple threads on a shared resource.

One more thing to understand before we delve deeper into concurrent programming in Java, and before we get to visibility, is how the computer optimizes a program as it executes: instruction reordering. To improve performance, the compiler and the processor often reorder instructions when a program executes. The reorderings fall into the following three types:
- Compiler reordering: the compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program.
- Instruction-level parallel reordering: modern processors use instruction-level parallelism to overlap multiple instructions. If there is no data dependency (that is, a later statement does not depend on the result of an earlier one), the processor can change the order in which the machine instructions for those statements execute.
- Memory system reordering: because the processor uses caches and read/write buffers, load and store operations can appear to execute out of order, since the multiple levels of cache introduce a lag between memory and cached data being synchronized.
Compiler reordering is a compile-time reordering, while instruction-level parallel reordering and memory system reordering are processor reorderings. In a multithreaded environment, these reordering optimizations can cause memory visibility problems in a program. The following illustrates the problems the two kinds of reordering can cause.
2.5.1.1 Compiler optimization instruction reordering
int a = 0;
int b = 0;
// Thread A              // Thread B
code1: int x = a;        code3: int y = b;
code2: b = 1;            code4: a = 2;
There are four lines of code here, 1 through 4: code1 and code2 belong to thread A, code3 and code4 to thread B, and the two threads execute at the same time. Looking at program order, parallel execution can produce results such as x = 0, y = 0, but you should never be able to get x = 2 together with y = 1: x = 2 requires code4 to run before code1, and y = 1 requires code2 to run before code3, which together contradict each thread's program order. Yet in practice this result can occur with some probability, because the compiler generally reorders instructions for lines of code that have no influence on one another (zero coupling). Suppose the compiler reorders the code; the following situation may then occur:
// Thread A              // Thread B
code2: b = 1;            code4: a = 2;
code1: int x = a;        code3: int y = b;
In this case, combined with the thread-safety discussion earlier, the result x = 2, y = 1 can appear. This means that in a multithreaded environment, because the compiler performs instruction-reordering optimizations on the code (code normally executes in order; instruction reordering is an optimization made for single-threaded execution), the consistency of variables used across multiple threads is not guaranteed. (PS: the compiler only reorders code that has no dependencies. Dependencies come in two kinds: data dependencies, e.g. int a = 1; int b = a; and control dependencies, e.g. boolean f = true; if (f) { System.out.println("123"); }.)
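Under stated assumptions, here is a minimal, hypothetical harness for the example above: it retries until it observes x = 2 and y = 1, a result only reordering can produce. Whether and how soon it triggers depends on your JIT and CPU; it may spin for a long time.

```java
public class ReorderDemo {
    static int a, b, x, y;

    public static void main(String[] args) throws InterruptedException {
        for (long iter = 1; ; iter++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { x = a; b = 1; });  // code1, code2
            Thread t2 = new Thread(() -> { y = b; a = 2; });  // code3, code4
            t1.start(); t2.start();
            t1.join(); t2.join();
            if (x == 2 && y == 1) { // impossible if both threads ran in program order
                System.out.println("reordering observed at iteration " + iter);
                break;
            }
        }
    }
}
```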
2.5.1.2 Processor instruction reordering
Processor instruction reordering is an optimization of CPU performance. From the perspective of instruction execution, an instruction can be divided into several steps, such as:

- Instruction fetch: IF
- Instruction decode and register operand fetch: ID
- Execute or effective address calculation: EX
- Memory access: MEM
- Write back: WB
While working, the CPU executes the steps of each instruction in sequence (note that different hardware may divide the steps differently). Each step uses different hardware: instruction fetch uses the PC register and memory, decode uses the instruction register set, execute uses the ALU (arithmetic logic unit), and write back uses the register set again. To improve hardware utilization, CPUs pipeline instructions, as follows:
(Pipelining is like an assembly line in a factory: each worker does their own part, passes the piece on to the next worker, and starts a new one, and so on. Instruction execution is the same: if we waited for one instruction to complete entirely before starting the next, it would be like a factory line waiting for one product to be completely finished before starting the next, which is very inefficient and a waste of labor, since only one worker on the line would ever be working while the others watched, and the first worker could only start the second product after the piece had travelled to the last worker and the final assembly was done.)
As the figure shows, before instruction 1 has finished executing, instruction 2 can already start on the idle hardware, which is a good thing: if each step costs 1 ms, then waiting for instruction 1 to finish completely before starting instruction 2 would make instruction 2 wait 5 ms, whereas with pipelining instruction 2 only needs to wait 1 ms before it starts executing. This greatly improves CPU performance. Although pipelining can greatly improve CPU performance, unfortunately once the pipeline stalls, all the hardware units enter a pause, and refilling the pipeline from the break point can take several cycles, so the performance loss is large. It is like a phone factory assembly line: once one station runs out of a part, the workers downstream may sit through one or more rounds of waiting until the part is available again. So we should prevent pipeline stalls as much as possible, and instruction reordering is one means of doing so. Let's illustrate with an example how instruction reordering prevents pipeline stalls, as follows:
i = a + b;
y = c - d;
| Instruction | Description |
|---|---|
| LW R1,a | LW stands for load: LW R1,a loads the value of a into register R1 |
| LW R2,b | Loads the value of b into register R2 |
| ADD R3,R1,R2 | ADD stands for addition: adds the values of R1 and R2 and stores the result in register R3 |
| SW i,R3 | SW stands for store: stores the value of register R3 into variable i |
| LW R4,c | Loads the value of c into register R4 |
| LW R5,d | Loads the value of d into register R5 |
| SUB R6,R4,R5 | SUB stands for subtraction: subtracts R5 from R4 and stores the result in register R6 |
| SW y,R6 | Stores the value of register R6 into variable y |
The above shows the execution of the assembly instructions. In the original figure, some instruction steps are marked with an X, which denotes a stall: wherever an X appears, the pipeline halts, which also affects the execution of subsequent instructions and may take one or more instruction cycles to recover from. Why the stalls? Partly because data is not ready yet: for example, executing the ADD instruction requires the data loaded into R1 and R2 by the preceding instructions, but R2's MEM step has not completed at that point, i.e. the value has not yet been fetched from memory, so the addition cannot proceed and must wait until the MEM step completes, hence the stall; the other instructions stall for similar reasons. As noted above, stalls degrade CPU performance, so we want to eliminate them, and that is where instruction reordering comes in. Since the ADD instruction has to wait anyway, we can use the waiting time to do other things, such as moving LW R4,c and LW R5,d up front: after all, LW R4,c and LW R5,d have no data dependencies on the earlier instructions, and the instruction that depends on them, SUB R6,R4,R5, executes only after R4 and R5 are loaded. The process is as follows:
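The original shows this as a pipeline figure; one reordered schedule consistent with the description above (my reconstruction, for illustration) is:

```
LW  R1,a       // load a
LW  R2,b       // load b
LW  R4,c       // moved up: load c during what used to be the stall
LW  R5,d       // moved up: load d
ADD R3,R1,R2   // R1 and R2 are ready by now: no stall
SW  i,R3       // store i = a + b
SUB R6,R4,R5   // R4 and R5 are ready by now: no stall
SW  y,R6       // store y = c - d
```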
As you can see in the figure above, all of the stalls are eliminated and the instruction pipeline runs without interruption, which gives the CPU a huge boost. That is processor instruction reordering. Now that the material on compiler reordering and processor reordering (from here on we will call both simply instruction reordering) is clear, we must realize that for a single thread, instruction reordering has no ill effect: reordering is performed on the premise of preserving the serial semantics of execution. But in a multithreaded environment, instruction reordering can cause serious out-of-order execution problems in a program, as follows:
int a = 0;
boolean f = false;
public void methodA() {
    a = 1;
    f = true;
}

public void methodB() {
    if (f) {
        int i = a + 1;
    }
}
As shown in the code above, threads A and B operate on the same instance object: thread A calls methodA while thread B calls methodB. Due to instruction reordering and similar causes, the program may execute in the following order:
// Thread A (methodA)        // Thread B (methodB)
code1: f = true;
                             code2: reads f == true, reads a == 0   // the un-updated a is read
                             code3: int i = a + 1;                  // i = 1
code4: a = 1;
Because of instruction reordering, thread A's write f = true is performed ahead of time, while a = 1 has not yet executed (or its result has not yet been written back to main memory). Since f is already true, thread B reads f as true and goes on to read a, but thread A is still operating on its working-memory copy of a, so the value of a thread B reads is still 0: thread B copies a = 0 into its working memory and performs i = a + 1 there. Because of the processor's instruction reordering, thread B reads a as 0, and the final value of i is 1 instead of the expected 2. This is out-of-order program execution caused by instruction reordering in a multithreaded environment. So remember: instruction reordering only guarantees consistent serial semantics within a single thread. Reordering can optimize single-threaded programs by eliminating CPU stalls, but it does not care about semantic consistency across threads.
2.5.2 Visibility
Visibility means that when one thread changes the value of a shared variable, other threads learn of the change immediately. For a serial program, visibility problems do not exist: if we change a variable's value in one operation, every subsequent operation reads the variable and gets the new, changed value. In a multithreaded environment this is not necessarily so. As we analyzed earlier, a thread operates on a shared variable by copying it into its working memory and writing it back to main memory later, so there can be a moment when thread A has modified shared variable i but has not yet written it back to main memory while thread B operates on the same shared variable i in main memory; at that moment the value of i in thread A's working memory is not visible to thread B. This kind of visibility problem is caused by the synchronization delay between working memory and main memory. In addition, instruction reordering and compiler optimizations can also cause visibility problems: as analyzed before, reordering by both compiler and processor optimizations in a multithreaded environment does lead to out-of-order execution and hence to visibility problems.
2.5.3 Orderliness
Orderliness means that, for single-threaded code, we can always assume the code executes in order. That is fine in a single-threaded environment: there, code does execute from top to bottom in the order it was written, and even if instructions are reordered, all hardware optimizations obey the as-if-serial semantics, so no matter how instructions are rearranged, the results of a single-threaded program are not, and cannot be, affected; we call this ordered execution. In a multithreaded environment, by contrast, out-of-order behavior can appear, because after a program is compiled into machine code its instructions may be reordered, and the reordered instruction order need not match the original. The thing to understand is this: in a Java program, if you observe within one thread, all operations appear ordered; if you observe one thread from another thread, all of its operations appear unordered. The first half of that sentence refers to the consistency of serial semantics within a single thread; the second half refers to instruction reordering plus the synchronization delay between main memory and working memory.
2.6 How does JMM solve the above problems in Java?
Now that we have a real understanding of the above, let's look at the solutions Java provides. For atomicity, beyond the atomicity the JVM itself guarantees for basic reads and writes, atomicity at the method or code-block level can be ensured with the synchronized keyword or an implementation class of the Lock interface (details on synchronized, which guarantees all three properties although it does not prohibit instruction reordering, will come later). Visibility problems caused by the synchronization delay between working memory and main memory can be solved with locking or with the volatile keyword, both of which make one thread's changes immediately visible to other threads. Visibility and ordering problems caused by instruction reordering can be addressed with the volatile keyword, because another effect of volatile is to forbid reordering optimizations; more on volatile later. Besides guaranteeing atomicity, visibility, and ordering through the synchronized and volatile keywords (note that volatile does not guarantee atomicity; it only addresses instruction reordering and visibility), the JMM also defines a happens-before principle to guarantee atomicity, visibility, and ordering between two operations in a multithreaded environment.
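As a minimal sketch (my own example, not from the original) of using synchronized to get atomicity plus visibility for a compound action:

```java
public class SafeCounter {
    private int i = 0;

    // Only one thread at a time can run this method on a given instance, so the
    // read / +1 / write-back of i++ executes as one indivisible unit, and the
    // monitor release/acquire also makes the new value visible to other threads.
    public synchronized void increment() {
        i++;
    }

    public synchronized int get() {
        return i;
    }
}
```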
2.7 The happens-before principle in the Java memory model (JMM)
2.7.1 Interaction between thread and memory during execution
Before learning the happens-before principle of the JMM, though, it helps to have a simple understanding of how threads interact with memory during execution. While a Java program runs, the OS schedules the JVM's "threads", and the interaction between a thread and memory consists of eight kinds of memory operations (the virtual machine implementation must ensure that each operation is atomic and indivisible, with exceptions for load, store, read, and write of double and long variables on some platforms):
- lock: acts on a variable in main memory; marks the variable as exclusively owned by one thread;
- unlock: acts on a variable in main memory; releases a locked variable so that it can be locked by another thread;
- read: acts on a variable in main memory; transfers the variable's value from main memory to the thread's working memory for the subsequent load action;
- load: acts on a variable in working memory; puts the value obtained by the read operation from main memory into the working-memory copy of the variable;
- use: acts on a variable in working memory; passes the variable's value from working memory to the execution engine; this operation is performed whenever the virtual machine encounters an instruction that needs the variable's value;
- assign: acts on a variable in working memory; puts a value received from the execution engine into the working-memory copy of the variable;
- store: acts on a variable in working memory; transfers the variable's value from working memory to main memory for the subsequent write operation;
- write: acts on a variable in main memory; puts the value obtained by the store operation from working memory into the main-memory variable.
The JMM lays down the following rules for the use of these eight directives:
- 1) One of read/load, or store/write, is not allowed to appear alone: a variable that is read must be loaded, and a variable that is stored must be written;
- 2) A thread is not allowed to discard its most recent assign operation: any change to a variable's data in working memory must be synchronized back to main memory;
- 3) A thread is not allowed to synchronize unassigned data from working memory back to main memory;
- 4) A new variable can only be born in main memory: working memory is not allowed to use an uninitialized variable directly, i.e. load and assign must have been performed on a variable before use and store are performed on it;
- 5) Only one thread may lock a variable at a time, but the same thread may lock it multiple times, in which case it must unlock it the same number of times;
- 6) If a lock operation is performed on a variable, the variable's value is cleared from working memory, and before the execution engine uses the variable, a load or assign must be performed again to initialize its value;
- 7) You may not unlock a variable that has not been locked, nor unlock a variable locked by another thread;
- 8) Before an unlock operation is performed on a variable, the variable must first be synchronized back to main memory.
With these eight rules, plus some special rules for volatile, the JMM can determine which operations are thread-safe and which are not. But these rules are too cumbersome to apply directly in practice, so we generally do not analyze programs with them; more often, we analyze with the happens-before rules of the JMM.
2.7.2 Happens-before principle in JMM
If we had to solve all these problems with locking or volatile alone, writing programs would be very troublesome; worse, locking essentially turns the parallel execution of multiple threads into serial execution, which greatly hurts performance. Fortunately, that is not always necessary, because the JMM also provides the happens-before principle to help guarantee the atomicity, visibility, and ordering of program execution; it is the basis for judging whether there is a data race and whether a thread is safe. The happens-before principle reads as follows:
- 1. Program order rule: within a single thread, semantic serialization must be guaranteed; in other words, code executes in program order.
- 2. Locking rule: an unlock operation happens-before a subsequent lock operation on the same lock; that is, if a lock is taken after it was released, the lock action must come after the unlock action (for the same lock).
- 3. Volatile rule: a write to a volatile variable happens-before subsequent reads of it, which guarantees the visibility of volatile variables. Put simply, every time a thread accesses a volatile variable it is forced to read the variable's value from main memory, so all threads always see the variable's latest value.
- 4. Thread start rule: a thread's start() method happens-before each of that thread's actions. That is, if thread A modifies a shared variable before calling thread B's start() method, then when thread B runs, thread A's changes to the shared variable are visible to thread B.
- 5. Transitivity rule: if A happens-before B and B happens-before C, then A happens-before C.
- 6. Thread termination rule: all operations of a thread happen-before the detection of its termination. Thread.join() waits for the joined thread to terminate: suppose thread B modifies a shared variable before it terminates; after thread A returns successfully from thread B's join() method, thread B's changes to the shared variable are visible to thread A.
- 7. Thread interruption rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detects the interrupt (e.g. via Thread.interrupted()).
- 8. Object finalization rule: the completion of an object's constructor happens-before the start of its finalize() method.
The happens-before principle does not require any additional means of guarantee. It is stipulated by the JMM. Java programs follow the above rules by default.
int a = 0;
boolean f = false;
public void methodA() {
    a = 1;
    f = true;
}

public void methodB() {
    if (f) {
        int i = a + 1;
    }
}
If thread A calls methodA() and thread B calls methodB(), with thread A starting first and thread B starting later, what value of i does thread B read? Check against the eight rules: the program order rule does not apply, because two threads are calling concurrently; neither methodA() nor methodB() uses synchronization, so the locking rule does not apply; the volatile rule does not apply, because the volatile keyword is not used; the thread start, thread termination, thread interruption, object finalization, and transitivity rules do not fit this test case either. Even though thread A and thread B start at different times, the result thread B sees is indeterminate: the code above fits none of the eight rules and uses no synchronization, so the operations are not thread-safe for thread B, and the value it reads is uncertain. The fix is simple: add synchronization (a lock) to methodA() and methodB(), or declare the shared variable volatile, ensuring that changes made by one thread are always visible to other threads.
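For instance, a minimal sketch of the volatile fix (my own illustration): by the volatile rule, the write f = true happens-before thread B's read of f, and by program order plus transitivity, a = 1 is then visible to thread B:

```java
public class HappensBeforeFix {
    int a = 0;
    volatile boolean f = false;

    public void methodA() {
        a = 1;      // ordered before the volatile write below (program order rule)
        f = true;   // volatile write
    }

    public void methodB() {
        if (f) {             // volatile read: happens-after the write of f
            int i = a + 1;   // guaranteed to see a = 1, so i = 2
        }
    }
}
```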
Third, the volatile keyword
3.1 Visibility guaranteed by the volatile keyword
volatile is a lightweight synchronization tool provided by Java. It guarantees visibility and, by forbidding instruction reordering, ordering, but it does not guarantee atomicity. If your program needs atomicity, consider the atomic classes in JUC's atomic package (discussed in a future chapter) or locking. If a shared variable is declared volatile, however, it is guaranteed that one thread's modification of that variable is always visible to other threads, as follows:
volatile int i = 0;
public void add() {
    i++;
}
For the code above, when any thread calls add() and performs i++ on i, the result is visible to other threads; but doesn't this code still have a thread-safety problem? Yes. Why? Because i++ is not an atomic operation: it actually consists of three steps, reading the value from main memory, adding 1 in working memory, and writing the result back to main memory, and the thread may be interrupted between any of these steps. So with multiple threads calling add(), the thread-safety problem still occurs (compare the first case of the thread-safety problem above). To solve it you need synchronized, a lock, or an atomic class; the volatile keyword only forbids instruction reordering and guarantees visibility. Now let's look at another example where modifying a variable with volatile does achieve thread safety, as follows:
volatile boolean flag;
public void toTrue() {
    flag = true;
}

public void methodA() {
    while (!flag) {
        System.out.println("I is false... false..... false.......");
    }
}
Because changing the value of the boolean variable flag is itself atomic, modifying flag with volatile makes its latest value immediately visible to other threads, so this usage is thread-safe. How does the JMM make a volatile variable immediately visible to other threads? In fact, when a volatile variable is written, the JMM flushes the value of the shared variable from the thread's working memory to main memory; when a volatile variable is read, the JMM invalidates the thread's working-memory copy, so the thread must re-read the shared variable from main memory. It is through this write-flush / read-invalidate pattern that one thread's writes to a volatile variable become visible to other threads (though its memory semantics are actually implemented through memory barriers, as described later).
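Coming back to the add() example above: if the increment itself must be atomic, one fix is an atomic class. A minimal sketch (my example) using java.util.concurrent.atomic.AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger i = new AtomicInteger(0);

    public void add() {
        i.incrementAndGet(); // the whole read-modify-write runs as one atomic CAS-based operation
    }

    public int get() {
        return i.get();
    }
}
```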
3.2 How does volatile prevent instruction reordering?
The volatile keyword also prevents the compiler and the processor from reordering, avoiding out-of-order execution in multithreaded environments. Let's start with the concept of the memory barrier. A memory barrier (also called a memory fence) is a CPU instruction that ensures the order in which certain operations execute and the memory visibility of certain data. Because both the compiler and the processor can perform instruction-reordering optimizations, inserting a Memory Barrier between two instructions tells the compiler and the CPU that no instruction may be reordered across that barrier: inserting a barrier forbids reordering optimizations between the instructions before and after it. Another effect of a memory barrier is to force the various CPUs' caches to be flushed, so that any thread on any CPU can read the latest version of the data.
| Barrier type | Instruction example | Description |
|---|---|---|
| LoadLoad Barriers | Load1; LoadLoad; Load2 | Ensures that Load1's data is loaded before the data of Load2 and of all subsequent load instructions. |
| StoreStore Barriers | Store1; StoreStore; Store2 | Ensures that Store1's data is visible to other processors (flushed to memory) before the data of Store2 and of all subsequent store instructions is written. |
| LoadStore Barriers | Load1; LoadStore; Store2 | Ensures that Load1's data is loaded before the data of Store2 and of all subsequent store instructions is flushed to memory. |
| StoreLoad Barriers | Store1; StoreLoad; Load2 | Ensures that Store1's data becomes visible to other processors (flushed to memory) before the data of Load2 and of all subsequent load instructions is loaded. A StoreLoad barrier makes all memory-access instructions (stores and loads) before it complete before any memory-access instruction after it executes. |
The Java compiler inserts memory barrier instructions at appropriate places in the generated instruction sequence to forbid particular types of processor reordering, so that the program runs as expected. The JMM classifies memory barrier instructions into these four categories; StoreLoad Barriers is a "universal" barrier that has the effects of the other three. Most modern multiprocessors support this barrier (the other barrier types are not necessarily supported by every processor).
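As a conceptual sketch (in the spirit of the JSR-133 cookbook; real JIT output differs by platform) of where these barriers conceptually sit around accesses to a volatile field:

```java
public class VolatileBarriers {
    int a;
    volatile boolean f;

    void writer() {
        a = 1;          // normal store
        // <-- StoreStore barrier: earlier stores complete before the volatile store
        f = true;       // volatile store
        // <-- StoreLoad barrier: the volatile store is visible before any later load
    }

    void reader() {
        boolean r = f;  // volatile load
        // <-- LoadLoad barrier: later loads cannot float above the volatile read
        // <-- LoadStore barrier: later stores cannot float above the volatile read
        if (r) {
            int i = a;  // guaranteed to see a = 1
        }
    }
}
```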
In short, it is through memory barriers that volatile implements its memory semantics, namely visibility and the prohibition of reordering optimizations. An example follows:
public class Singleton {
    private static Singleton singleton;

    private Singleton() {}

    public static Singleton getInstance() {
        if (singleton == null) {
            synchronized (Singleton.class) {
                if (singleton == null) {
                    singleton = new Singleton();
                }
            }
        }
        return singleton;
    }
}
The code above is the classic double-checked locking singleton. It has no problem in a single-threaded environment, but in a multithreaded environment it can have thread-safety issues. The reason is that a thread may, at the first null check, read a singleton reference that is not null yet refers to an object that has not finished initializing. singleton = new Singleton() can be broken into the following 3 steps (pseudocode):
memory = allocate();   // 1. Allocate the object's memory space
singleton(memory);     // 2. Initialize the object
singleton = memory;    // 3. Point singleton at the allocated address (singleton is now != null)
Since steps 2 and 3 may be reordered, the order can become:
memory = allocate();   // 1. Allocate the object's memory space
singleton = memory;    // 3. Point singleton at the allocated address (singleton is now != null)
singleton(memory);     // 2. Initialize the object
This reordering optimization is allowed because there is no data dependency between steps 2 and 3, and the execution result of the program does not change in a single thread, either before or after the reordering. However, instruction reordering only ensures consistent execution of serial semantics (single thread), but does not care about semantic consistency across multiple threads. So when a thread accesses a Singleton that is not null, there is a thread-safety issue because the Singleton instance may not have been initialized. The solution is simply to use volatile to prevent singleton variables from being optimized by instruction reordering.
private volatile static Singleton singleton;
Fourth, summary
If you have read this article carefully, you should now have a clear understanding of the Java memory model (JMM). This article is really the first threshold to cross in exploring concurrent programming in Java, and I will continue to publish articles on concurrent programming in the future. If you have other opinions about, or objections to, any points in this article, you are welcome to discuss them in the comments section. Thank you!
Reference materials and books
- Deep Understanding of the Java Virtual Machine
- The Beauty of Concurrent Programming in Java
- Java High Concurrency Programming
- Core Technology of Website Architecture with 100 million Traffic
- Java Concurrent Programming