This article is participating in "Java Theme Month – Java Development Practice"; for details see: juejin.cn/post/696826…
This is the fifth day of my participation in the Gwen Challenge.
Overview
There are two types of instruction reordering: compiler reordering and processor reordering. (The more complex reordering done by the memory system is not the focus of this chapter.)
In other words, compiler reordering happens at compile time and processor reordering happens at run time. The original purpose of instruction reordering is to improve the concurrent efficiency of a program, under one constraint: the result of the reordered program must be the same as the result of running it in a single thread (the as-if-serial semantics).
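As a minimal sketch of the as-if-serial constraint (the class and variable names are assumptions, purely for illustration):

```java
class AsIfSerialSketch {
    static int demo() {
        int a = 1;       // (1)
        int b = 2;       // (2) no dependency on (1); may be reordered with (1)
        int c = a + b;   // (3) depends on (1) and (2), so it must run after both
        return c;        // always 3, whatever order (1) and (2) actually ran in
    }
}
```

Statements (1) and (2) have no dependency on each other, so the compiler or processor may execute them in either order; statement (3) depends on both, so the single-threaded result never changes.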
Why instructions are reordered
- Why does instruction reordering improve a program's concurrent efficiency? To understand this, start from the fact that the minimum unit of CPU scheduling is the thread.
- A CPU core can only run one thread at a time. On the original single-core CPUs, multithreading was achieved by time-slicing: the CPU polls the threads and performs context switches between them.
- Today's CPUs are multicore; each core still polls threads, but multiple cores give much better concurrent efficiency.
The problem is that the CPU computes much faster than it can access memory. When data is written from working memory back to main memory (physical memory), if two CPUs need to write to the same memory area at the same time, one CPU has to wait for the other to finish, which wastes CPU time. This situation does not exist on a single-core CPU, and it is one reason instruction reordering is needed.
For example
int a = 1;
int b = 2;
Instruction reordering: a case study
- Variables a and b need to be written to different memory areas. In a multithreaded scenario:
- If CPU1 writes a to memory area A and then b to memory area B,
- and CPU2 executes in the same order, the two CPUs can easily end up writing to memory area A at the same time, so one CPU has to wait for the other to finish, wasting CPU time.
- But if the instructions in thread 2 are reordered to int b = 2; int a = 1;
- then CPU2 writes b to memory area B first and a to memory area A afterwards.
- Both CPUs can now write at the same time, which is what truly exploits a multicore CPU, and this is the purpose of instruction reordering.
Limitations of instruction reordering
Of course, instruction reordering is conditional. There is a notion of dependencies between statements, which fall into data dependencies and control dependencies.
Data and control dependencies
A data dependency means a statement uses data produced by a previous statement; a control dependency means a statement depends on the outcome of a previous statement's condition. Naturally, statements that depend on each other cannot be reordered. But under high concurrency, instruction reordering can still cause serious problems if shared variables are not handled properly. Here's an example:
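A minimal sketch of the write/read scenario discussed below (the class and field names are assumptions for illustration):

```java
class ReorderExample {
    int a = 0;
    boolean b = false;

    void write() {          // called by thread A
        a = 1;              // (1)
        b = true;           // (2) may be reordered before (1)
    }

    void read() {           // called by another thread
        if (b) {            // (3)
            int i = a * a;  // (4) can observe a == 0 if (1) and (2) were reordered
        }
    }
}
```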
- If thread A calls the write method and the instructions are reordered so that b = true is written to memory before a = 1, one situation can occur:
- another thread calls the read method after b = true has been written to memory but before a = 1 has been written;
- if (b) then succeeds and a is read, but a has not yet been set to 1, so the thread sees a wrong value of the shared variable.
Visibility
The volatile keyword can be used to modify a shared variable: once the variable is modified in working memory, the new value is immediately written back to main memory, and reads of the variable always go to main memory, so the change is visible to other threads. This guarantees the variable's visibility.
By comparison, a synchronized lock guarantees visibility by clearing the cached copy when the lock is acquired (so the value is re-read from main memory) and by writing the value back to main memory when the lock is released. A volatile variable is simply read from and written to main memory directly.
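A minimal sketch of using volatile for visibility (the class, flag name and methods are assumptions for illustration):

```java
class VisibilityDemo {
    // without volatile, the reader thread might loop forever on a stale value
    private volatile boolean running = true;

    void reader() {
        while (running) {
            // do work; each read of running sees the latest written value
        }
    }

    void writer() {
        running = false;  // the new value is flushed to main memory immediately
    }
}
```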
Ordering
- Volatile variables also forbid instruction reordering around them and local caching of their value (this is implemented with memory barriers).
- To avoid instruction-reordering problems in multithreaded code, we follow the happens-before principle, which underpins the locking mechanisms of concurrent programming. Its rules are listed below (a small demonstration follows the list):
- Monitor lock rule: an unlock of a lock happens-before every subsequent lock of that same lock, including locks acquired by other threads.
- Volatile rule: a write to a volatile variable happens-before every subsequent read of it, which guarantees the variable's visibility.
- Thread start rule: a call to a thread's start() method happens-before every action of that thread.
- Thread termination rule: every action of a thread happens-before another thread returns from join() on it; the return value of Thread.isAlive() can be used to tell whether the thread has terminated.
- Thread interruption rule: a call to Thread.interrupt() happens-before the point where the interrupted thread's code detects the interrupt.
- Finalizer rule: the completion of an object's initialization, that is, the end of its constructor, happens-before the start of its finalizer (the finalize() method).
- Transitivity: if action A happens-before action B, and action B happens-before action C, then action A happens-before action C.
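As a minimal sketch of the thread-start and thread-termination rules above (the class and field names are assumptions for illustration):

```java
public class HappensBeforeDemo {
    static int shared = 0;

    public static void main(String[] args) throws InterruptedException {
        shared = 1;                  // written before start()
        Thread t = new Thread(() -> {
            // thread start rule: the write shared = 1 above is visible here
            shared = 2;
        });
        t.start();
        t.join();                    // thread termination rule: t's actions are visible after join()
        System.out.println(shared);  // guaranteed to print 2
    }
}
```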
In a concurrent program, pay special attention to data synchronization between processes and threads. In particular, when multiple threads modify the same variable at the same time, reliable synchronization or other measures must be taken to guarantee that the data is modified correctly. An important principle here: never make assumptions about the order in which instructions execute, because you cannot predict how instructions from different threads will be interleaved.
The ideal model is one in which instructions execute in a single, well-defined order, namely the order in which they are written in the code, regardless of the processor or other factors. This is called the sequential consistency model, and it matches the intuition behind the von Neumann architecture.
This assumption is reasonable in itself, and programs rarely observe violations of it in practice, but no modern multiprocessor architecture actually adopts this model because it is too inefficient: compiler optimization and CPU pipelining almost always involve instruction reordering.
Compile-time reordering
Typical compile-time reordering adjusts the order of instructions, without changing the semantics of the program, to reduce the number of register reads and stores as much as possible and to fully reuse values already held in registers. Consider three instructions:
- The first instruction computes a value, assigns it to variable A and stores it in a register.
- The second instruction has nothing to do with A but needs a register (assume it would occupy the same register A is in).
- The third instruction uses the value of A and is independent of the second instruction.
- Under the sequential consistency model, A is placed in the register after the first instruction, is evicted from the register while the second instruction executes, and must be re-read into the register for the third instruction; throughout this process the value of A never changes.
- The compiler will usually swap the second and third instructions so that A is still in the register when the first instruction finishes and can be read directly from it, avoiding the cost of the extra read. A sketch of this is shown below.
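A minimal sketch of this register-reuse situation in Java (the method and variable names are assumptions; whether a compiler actually reorders these statements depends on the compiler):

```java
class RegisterReuseSketch {
    static int compute(int x, int y, int p, int q) {
        int a = x * y;   // (1) compute a value; the compiler keeps it in a register
        int t = p + q;   // (2) unrelated to a, but competes for the same register
        int b = a + 1;   // (3) uses a; independent of (2)
        // the compiler may emit (1), (3), (2) so that (3) reads a straight from
        // the register; the single-threaded result is unchanged (as-if-serial)
        return b + t;
    }
}
```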
Reordering implications for pipelining
Nearly all modern CPUs use pipelining to speed up instruction processing. In general, one instruction takes several CPU clock cycles, but by executing the stages in parallel on the pipeline, parts of several instructions can be processed within the same clock cycle. In short, each instruction is divided into stages such as fetch, addressing, decode and execute, and these stages of different instructions overlap. Meanwhile, inside the execution unit (EU), the functional units are divided into components such as an adder, a multiplier, a load unit and a store unit, which further allows different calculations to run in parallel.
The pipelined architecture means instructions are meant to be executed in parallel rather than strictly in the sequential model, and reordering helps make full use of the pipeline and achieve a superscalar effect.