Key issues in concurrent programming
The JDK was built with multithreading in mind, and multithreading can greatly speed up a program. But everything has trade-offs: concurrent programming inevitably involves communication and synchronization between threads, which are usually discussed in terms of visibility, atomicity, and ordering.
Thread communication
Thread communication refers to the mechanism threads use to exchange information. Two communication mechanisms are commonly used in programming: shared memory and message passing.
- Shared memory.
In the shared-memory concurrency model, threads share the program's common data state and transfer information implicitly by reading and writing a common area of memory. The typical shared-memory style of communication is through shared objects.
- Message passing, as in Linux pipes, signals, message queues, semaphores, and sockets.
In the message-passing concurrency model there is no shared state between threads; threads must communicate explicitly by sending messages. The typical way to do this in Java is with wait() and notify().
C/C++ supports both the shared-memory and the message-passing mechanisms; Java uses the shared-memory model.
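A minimal sketch of shared-memory communication in Java (the class below is illustrative, not from the article): two threads exchange a value through a shared object and use wait()/notify() on its monitor to signal that the data is ready.

// Illustrative only: a producer thread writes a shared field, a consumer
// thread waits on the same monitor until it is notified that the data is ready.
public class SharedObjectDemo {
    private final Object lock = new Object();
    private int sharedData;           // shared state, lives in main memory
    private boolean ready = false;

    public void produce(int value) {
        synchronized (lock) {
            sharedData = value;       // write the shared memory area
            ready = true;
            lock.notify();            // signal the waiting consumer
        }
    }

    public int consume() throws InterruptedException {
        synchronized (lock) {
            while (!ready) {
                lock.wait();          // release the lock and wait for the signal
            }
            return sharedData;        // read the value written by the producer
        }
    }

    public static void main(String[] args) {
        SharedObjectDemo demo = new SharedObjectDemo();
        new Thread(() -> {
            try {
                System.out.println("consumed: " + demo.consume());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
        demo.produce(42);
    }
}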
Thread synchronization
Synchronization is the mechanism that a program uses to control the relative order in which operations occur between different threads.
- In the shared-memory concurrency model, synchronization is done explicitly: the programmer must explicitly mark the methods or blocks of code that need to execute mutually exclusively between threads.
- In the message-passing concurrency model, synchronization is implicit, because a message must be sent before it can be received.
JMM
The memory model of modern physical computers
The cache
Before we look at the JMM, let's look at the data storage model of a modern physical computer. As CPU technology developed, CPU execution became faster and faster. Memory technology, however, did not change much, so for a task that took 10 seconds, the CPU might spend 8 seconds fetching data and only 2 seconds computing; most of the time went into fetching data.
How is this solved? By adding a cache between the CPU and memory. A cache keeps a copy of the data; it is characterized by high speed, small capacity, and high cost.
After adding the cache, a running program obtains data through the cache first. As CPU computing power kept improving, a single layer of cache gradually became insufficient, so multi-level caches were introduced. CPU caches are divided into a level 1 cache (L1) and a level 2 cache (L2); some high-end CPUs also have a level 3 cache (L3). All data stored in each level of cache is part of the next level. The technical difficulty and manufacturing cost of these caches decrease level by level, so their capacities increase level by level. The performance comparison is as follows:
A single-core CPU has only one set of L1, L2, and L3 caches; in a multi-core CPU, each core has its own L1 (and often L2) cache, and the cores share the L3 (or L2) cache.
Cache consistency
As computers became more powerful, they began to support multithreading. So here's the problem. Let's analyze the effect of single-threading and multithreading on single-core and multi-core CPUs respectively.
- Single thread. The CPU core's cache is accessed by only one thread, so the cache is exclusive to it and there are no access conflicts or similar issues.
- Single-core CPU multi-threading. Multiple threads in a process can access the shared data in the process at the same time. After the CPU loads a block of memory into the cache, different threads will map to the same cache location when accessing the same physical address. In this way, the cache will not be invalid even if the thread switch occurs. But because only one thread can be executing at any time, cache access conflicts do not occur.
- Multi-core CPU, multi-threaded. Each core has at least an L1 cache. If multiple threads in a process access the same shared memory and execute on different cores, each core keeps a copy of that shared memory in its own cache. Since the cores run in parallel, multiple threads may write to their own caches at the same time, and the data in those caches may then differ.
Adding caches between the CPU and main memory leads to cache consistency problems in multi-threaded scenarios: in a multi-core CPU, each core caches the same data, and the cache contents may become inconsistent.

Cache coherence: in a multi-processor system, each processor has its own cache, and they share the same main memory. When the computation tasks of multiple processors all involve the same main-memory region, the cached data of each processor may become inconsistent; for example, a shared variable may be cached by multiple CPUs. If this happens, whose cached data should be used when synchronizing back to main memory? To solve this consistency problem, each processor must follow some protocol when accessing the cache, reading and writing according to that protocol, such as MSI, the Illinois protocol (MESI), MOSI, Synapse, Firefly, and the Dragon protocol.
An example of the inconsistency is as follows:
// Thread A executes as follows
a = 1 // A1
x = b // A2
-----
// Thread B executes as follows
b = 2 // B1
y = a // B2
Processors A and B perform memory accesses in parallel, in program order, and may end up with x=y=0.
- Processor A and processor B write the shared variables into their own write buffers at the same time (A1, B1): a = 1 and b = 2.
- The writes a = 1 and b = 2 only complete once the write buffers are flushed to the shared cache (steps A3 and B3).
- If the reads (A2, B2) execute before that flush, x = b and y = a both read the old values, and the program ends up with x = y = 0.
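A runnable Java sketch of this litmus test (the class and loop are mine, not the article's); whether x = y = 0 is actually observed depends on the hardware and the JIT:

// Illustrative store-buffering test: repeat A1/A2 and B1/B2 on two threads
// until both reads observe the stale value 0.
public class StoreBufferingDemo {
    static int a, b, x, y;

    public static void main(String[] args) throws InterruptedException {
        for (long run = 1; ; run++) {
            a = 0; b = 0; x = 0; y = 0;
            Thread t1 = new Thread(() -> { a = 1; x = b; }); // A1, A2
            Thread t2 = new Thread(() -> { b = 2; y = a; }); // B1, B2
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (x == 0 && y == 0) {   // both threads read the old values
                System.out.println("x = y = 0 observed on run " + run);
                break;
            }
        }
    }
}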
Processor optimization and instruction rearrangement
As mentioned above, adding cache between CPU and main memory can cause cache consistency issues in multi-threaded scenarios. In addition to this, there is another hardware issue that is important. The processor may execute the input code out of order in order to maximize the use of the processor’s internal arithmetic unit. This is processor optimization.
In addition to the fact that many popular processors optimize out-of-order code, compilers in many programming languages, such as the Java JIT, have similar optimizations.
As you can imagine, all sorts of problems can result if processor optimizations and compiler rearrangements of instructions are left unchecked. These issues are addressed at both the hardware and compiler levels.
Concurrent programming problems
We need to remember that software ultimately runs on hardware, and when software runs on such hardware, problems of atomicity, visibility, and ordering arise. Atomicity, visibility, and ordering are the abstract names people have given to these problems; the underlying issues behind the abstraction are the cache consistency, processor optimization, and instruction reordering problems mentioned above.
Generally speaking, to ensure data security, concurrent programming needs to meet the following three features:
- Atomicity: an operation either runs to completion without being interrupted by CPU scheduling (a context switch), or it does not execute at all.
- Visibility: When multiple threads access the same variable, if one thread changes the value of the variable, the other threads can see the changed value immediately.
- Orderliness: The order in which a program is executed is the order in which the code is executed.
You can see that the cache consistency problem is really a visibility problem. Processor optimization can cause atomicity problems. Reordering of instructions leads to ordering problems.
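For example, a plain counter incremented from two threads shows the atomicity problem (a minimal sketch of my own, not from the article):

// counter++ is read-modify-write, so concurrent increments can be lost.
public class LostUpdateDemo {
    static int counter = 0;   // not volatile, not atomic

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++;    // three steps: read, add 1, write back
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Usually prints less than 200000 because some updates are lost.
        System.out.println("counter = " + counter);
    }
}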
The memory model
As mentioned earlier, cache consistency, processor optimization, and instruction reordering issues are the result of hardware upgrades. So, is there a mechanism that works well to solve all of these problems in order to ensure that atomicity, visibility, and order can be satisfied in concurrent programming? One important concept is the memory model.
To ensure the correctness (visibility, ordering, atomicity) of shared memory, the memory model defines a specification for the read and write operations of multithreaded programs in a shared-memory system. These rules regulate reads and writes of memory so as to guarantee the correctness of instruction execution. The memory model concerns the processor, the caches, concurrency, and the compiler. It solves the memory-access problems caused by multi-level CPU caches, processor optimization, and instruction reordering, and guarantees consistency, atomicity, and ordering in concurrent scenarios.
The memory model solves the concurrency problem in two main ways: limiting processor optimization and using memory barriers.
JMM
I mentioned earlier that the computer memory model is an important specification to solve the concurrency problem in multi-threaded scenarios. So what’s the implementation of that? It might vary from language to language.
As we know, Java programs run on the Java virtual machine. The Java Memory Model (JMM) is a memory-model specification that shields the access differences of various hardware and operating systems: a mechanism and specification that ensures Java programs access memory consistently on all platforms.
When people refer to the Java memory model, they generally mean the new memory model used since JDK 5, mainly described by JSR-133: JavaTM Memory Model and Thread Specification. The JMM's function:
This is a virtual specification for data synchronization between working memory and main memory. The purpose is to solve the problems caused by the inconsistency of local memory data, the reordering of code instructions by the compiler, and the out-of-order execution of code by the processor when multiple threads communicate through shared memory.
PS:
The concepts of main memory and working memory (caches, registers) mentioned in this article are simply analogous to main memory and cache in the computer memory model. In particular, note that main memory and working memory are not directly equivalent to the Java heap, stack, and method area in the JVM's memory structure. According to Understanding the Java Virtual Machine, if the two must be compared at all in terms of the definitions of variables, main memory, and working memory, then main memory corresponds primarily to the object instance data in the Java heap, and working memory corresponds to part of the virtual machine stack.
Communication between any two threads works roughly as follows: inside the JVM, the Java memory model divides memory into two parts, the thread stack area and the heap area. Only the general architecture diagram was given here; the details have been written up in an earlier post.
Problems with the JMM
- Visibility of shared objects between threads.
Thread A modifies data it read from main memory, but thread B reads the main-memory data before thread A has synchronized its modification back to main memory.
- Contention on shared objects.
Threads A and B both read the same value from main memory at the same time, each increments it by 1, and writes it back, so one update is lost.
Problems like these are solved with operations such as declaring the variable volatile, locking, or CAS (see the sketch below).
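A hedged sketch of those fixes using standard JDK tools (class and field names are mine): volatile addresses the visibility problem, while synchronized (a lock) or AtomicInteger (CAS) addresses the contended increment.

import java.util.concurrent.atomic.AtomicInteger;

// Illustrative fixes for the two problems above.
public class SharedStateFixes {
    private volatile boolean ready = false;          // visibility: a write is seen by later reads
    private int guardedCounter = 0;
    private final AtomicInteger casCounter = new AtomicInteger();

    public void publish() { ready = true; }          // volatile write
    public boolean isReady() { return ready; }       // volatile read sees the latest value

    public synchronized void incrementWithLock() {   // mutual exclusion via the monitor lock
        guardedCounter++;
    }

    public void incrementWithCas() {                 // lock-free increment via CAS
        casCounter.incrementAndGet();
    }
}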
Instruction rearrangement
When executing a program, the compiler and processor often reorder instructions to improve performance. Such as:
code1 // Takes 10 seconds
code2 // Takes 2 seconds
----
// If code1 and code2 satisfy the instruction reordering requirements, code2 does not have to wait for code1 to finish executing.
The compiled source code may be rearranged to speed up the final CPU execution.
- Compiler optimized reordering
The compiler can rearrange the execution order of statements without changing the semantics of a single-threaded program.
- Instruction-level parallel reordering
Modern processors use instruction-level parallelism (ILP) to overlap the execution of multiple instructions. If there is no data dependence, the processor can change the execution order of the machine instructions corresponding to the statements.
- Memory system reordering
The processor uses caches and read/write buffers, which makes it appear that load and store operations may be executed out of order (processor reordering).
Data dependency and control dependency
Reordering does not reorder code that has data dependences or control dependences.
- Data dependence: if two operations access the same variable and at least one of the two operations is a write, there is a data dependence between them, and such code is not allowed to be reordered. There are three types of data dependence: write after write, write after read, and read after write.
- Control dependence: the flag variable is a flag used to indicate whether variable a has been written. In the use() method below, the statement that computes i depends on if (flag); this is called a control dependence.
public void use() {
    if (flag) {        // A
        int i = a * a; // B
        // ...
    }
}
as-if-serial
No matter how code is reordered, it must still run correctly in a single thread; if it cannot even be correct in a single thread, there is no point discussing multi-threaded concurrency. Hence the concept of as-if-serial. The as-if-serial semantics mean:
No matter how much reordering is done (by the compiler and processor to improve parallelism), the execution result of a (single-threaded) program must not change. The compiler, runtime, and processor must all comply with the as-if-serial semantics. To comply with it, the compiler and processor do not reorder operations that have data dependences, because such reordering would change the execution result. (Note that data dependence here refers only to instruction sequences executed on a single processor and operations within a single thread; dependences between different processors or different threads are not considered by the compiler and processor.) If there is no data dependence between operations, however, they may still be reordered by the compiler and processor. For example:
int a = 1;     // 1
int b = 2;     // 2
int c = a + b; // 3
Data dependences exist between 1 and 3 and between 2 and 3, so 3 cannot be reordered before 1 and 2 in the final instruction sequence (the result of the program would change if 3 came before 1 and 2). But there is no data dependence between 1 and 2, so the compiler and processor may reorder their execution order. The as-if-serial semantics mean that, in a single thread, reordering causes no interference and there are no memory visibility issues.
Rearrangement problem under multithreading
For example, suppose threads A and B execute the two functions at the same time:
- Thread A reorders instructions 1 and 2, so the two threads may end up executing in the order 2-3-4-1.
- Thread B reorders instructions 3 and 4: it first reads the value of a as 0, computes a * a = 0, and stores the result temporarily; then, once thread A finishes and the flag check passes, i in the use() function ends up being 0. A sketch of the full example follows.
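For reference, here is the two-function example as a whole (the writer() method is my reconstruction; only use() appears in the article):

// Thread A calls writer(), thread B calls use(); neither method is synchronized.
class ReorderExample {
    int a = 0;
    boolean flag = false;

    public void writer() {       // thread A
        a = 1;                   // 1
        flag = true;             // 2  (no data dependence on 1, so 2 may be reordered before 1)
    }

    public void use() {          // thread B
        if (flag) {              // 3
            int i = a * a;       // 4  (may speculatively read a == 0 before 3 is decided)
            System.out.println(i);
        }
    }
}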
Solving problems under concurrency
The memory barrier
A Memory Barrier (sometimes called a Memory Fence) is a CPU instruction used to control reordering and memory visibility under certain conditions. The Java compiler also uses memory-barrier rules to disallow reordering: it inserts memory barrier instructions at the appropriate places in the generated instruction sequence to prohibit particular types of processor reordering, so that the program runs as expected. A memory barrier does two things:
- Ensure the order in which certain operations are performed.
- Affects the memory visibility of some data, or the execution result of an instruction.
The compiler and CPU may reorder instructions to optimize performance, as long as the end result stays the same. Inserting a Memory Barrier tells the compiler and CPU:
- No instruction may be reordered across this Memory Barrier.
- A Memory Barrier also forces the various CPU caches to be flushed; for example, a Write-Barrier flushes everything written to the cache before the barrier, so that any thread on any CPU can read the latest version of the data.
There are currently four types of barriers.
- LoadLoad barrier (read-read)
Sequence: Load1; LoadLoad; Load2. In plain English, Load1 must execute before Load2; even if Load1 is slow, Load2 must wait until Load1 has finished. Processors that can execute preload instructions or support out-of-order processing generally need an explicit LoadLoad barrier, because waiting load instructions on these processors can bypass waiting store instructions. On processors that always guarantee processing order, this barrier amounts to a no-op.
- StoreStore barrier (write-write)
Sequence: Store1; StoreStore; Store2. In plain English, Store1 must be written from the cache to the shared area in time, so that other threads can read the latest data before Store2; this can be understood as guaranteeing visibility. In general, a processor needs StoreStore barriers if it cannot guarantee sequential flushing of data from write buffers and/or caches to other processors and main memory.
- LoadStore barrier (read-write)
Sequence: Load1; LoadStore; Store2. This ensures that Load1's data is read before Store2 and subsequent store instructions are flushed. LoadStore barriers are required on out-of-order processors where waiting store instructions can bypass loads.
- StoreLoad barrier (write-read)
Sequence: Store1; StoreLoad; Load2. This ensures that Store1's data becomes visible to other processors (i.e. is flushed to memory) before Load2 and all subsequent load instructions execute. A StoreLoad barrier makes all memory access instructions (stores and loads) before the barrier complete before any memory access instruction after the barrier executes. StoreLoad is an all-purpose barrier that has the effect of the other three; most modern multiprocessors support it (the other types of barriers are not necessarily supported by all processors).
Critical sections
In other words, the same lock is applied to both functions when they run. This guarantees that the two threads executing the two functions are ordered with respect to each other; each synchronized block then only needs to preserve as-if-serial semantics internally.
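A minimal sketch, assuming the reconstructed ReorderExample above: synchronizing both methods on the same monitor makes the two threads mutually exclusive and restores the expected result.

// Both functions synchronize on the monitor of this, so they cannot interleave.
class SynchronizedExample {
    int a = 0;
    boolean flag = false;

    public synchronized void writer() {    // thread A acquires the lock
        a = 1;
        flag = true;
    }                                      // releasing the lock flushes the writes to main memory

    public synchronized void use() {       // thread B runs only before or after writer(), never during
        if (flag) {
            int i = a * a;                 // if flag is true, a == 1 is guaranteed to be visible
            System.out.println(i);
        }
    }
}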
Happens-Before
Because instruction reordering makes the CPU's internal execution rules hard to reason about, the JDK uses the concept of happens-before to describe memory visibility between operations. In the JMM, if the result of one operation needs to be visible to another, there must be a happens-before relationship between the two operations. Some happens-before relationships hold without any explicit synchronization.
- If one operation happens-before another, the result of the first operation is visible to the second, and the first operation is ordered before the second. (This is the guarantee given to programmers.)
- A happens-before relationship between two operations does not mean that a specific Java platform implementation must execute them in that order. If a reordering produces the same result as executing in happens-before order, the reordering is allowed. (This is the latitude given to the compiler and processor.)
The specific happens-before rules are listed below for reference:
- Program order rule: every action in a thread happens-before any subsequent action in that thread.
- Monitor lock rule: an unlock of a lock happens-before every subsequent lock of that same lock.
- Volatile variable rule: a write to a volatile field happens-before every subsequent read of that volatile field.
- Transitivity: if A happens-before B and B happens-before C, then A happens-before C.
- start() rule: if thread A performs the operation ThreadB.start() (starting ThreadB), then A's ThreadB.start() happens-before any operation in ThreadB.
- join() rule: if thread A performs the operation ThreadB.join() and returns successfully, then any operation in ThreadB happens-before thread A's successful return from ThreadB.join().
- Thread interrupt rule: a call to thread.interrupt() happens-before the interrupted thread's code detects the interrupt.
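A small sketch (field names are mine) of how the program order, volatile, and transitivity rules combine:

// Writes made before a volatile write are visible after a later volatile read.
class HappensBeforeExample {
    int data = 0;
    volatile boolean ready = false;

    void writer() {          // thread A
        data = 42;           // 1: program order rule => 1 happens-before 2
        ready = true;        // 2: volatile write
    }

    void reader() {          // thread B
        if (ready) {         // 3: volatile rule => 2 happens-before 3
            // transitivity => 1 happens-before 3, so data is guaranteed to be 42 here
            System.out.println(data);
        }
    }
}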
Volatile semantics
Volatile guarantees the visibility of variables and provides only weak atomicity (single reads and writes, not compound operations). The details of volatile were covered in a previous post and are not repeated here; the instruction-reordering rules for volatile are as follows:
- Insert a StoreStore barrier before each volatile write. Insert a StoreLoad barrier after each volatile write.
- Insert a LoadLoad barrier after each volatile read. Insert a LoadStore barrier after each volatile read
Locked memory semantics
Locks are somewhat like a heavyweight version of volatile; they do the following:
- When a thread releases the lock, the JMM flushes the shared variables in the thread's local memory to main memory.
- When a thread acquires a lock, the JMM invalidates the thread’s local memory. This makes critical section code protected by the monitor have to read shared variables from main memory.
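A hedged sketch of these release/acquire semantics using java.util.concurrent.locks.ReentrantLock (class and field names are illustrative):

import java.util.concurrent.locks.ReentrantLock;

// Releasing the lock publishes writes; acquiring it forces a fresh read from main memory.
class LockSemanticsExample {
    private final ReentrantLock lock = new ReentrantLock();
    private int shared = 0;

    void write(int value) {
        lock.lock();
        try {
            shared = value;   // written in the thread's local memory...
        } finally {
            lock.unlock();    // ...and flushed to main memory when the lock is released
        }
    }

    int read() {
        lock.lock();          // acquiring the lock invalidates local memory,
        try {                 // so shared is re-read from main memory
            return shared;
        } finally {
            lock.unlock();
        }
    }
}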
Final memory semantics
For final fields, the compiler and processor follow two reordering rules:
- There is no reordering between a write to a final field within a constructor and a subsequent assignment of a reference to the constructed object to a reference variable. Look at code note 1
- There is no reordering between the first reading of a reference to an object containing a final field and the subsequent first reading of the final field. Look at code note 2
class SoWhat{
final int b;
SoWhat(){
b = 1412;
}
public static void main(String[] args) {
SoWhat soWhat = new SoWhat();
// Note 1: the reference to the new object must not be assigned to soWhat before b = 1412 has executed.
System.out.println(soWhat); //A
System.out.println(soWhat.b); //B
// Note 2: Instructions A and B cannot be reordered.
}
}
When final is a reference type, the following rules are added:
There is no reordering between writing to the member field of a final referenced object inside the constructor and then assigning a reference to the constructed object outside the constructor to a reference variable.
class SoWhat{
final Object b;
SoWhat(){
this.b = new Object(); // A
}
public static void main(String[] args) {
SoWhat soWhat = new SoWhat(); //B
// B can execute only after A has executed
}
}
Implementation of final semantics in the processor
- The compiler is required to insert a StoreStore barrier after a write to a final field but before the constructor returns.
- The reordering rule for reading final fields requires the compiler to insert a LoadLoad barrier before reading a final field.
synchronized