Daily sentence
Love your life in order to fulfill your role in it. – Rodin
Background
Consistency issues also exist between the CPU and memory. Many people think of the CPU as a pure computing component with no data consistency issues, but in reality there are multiple layers of cache between the CPU and memory, because memory speed has not kept pace with CPU speed. And because of multiple cores, these caches tend to come in several layers per core. Synchronization is also a problem if a thread's time slices span multiple CPUs.
import java.util.stream.IntStream;

public class JMMDemo {

    int value = 0;

    void add() {
        value++;
    }

    public static void main(String[] args) throws Exception {
        final int count = 100000;
        final JMMDemo demo = new JMMDemo();
        Thread t1 = new Thread(() -> IntStream.range(0, count).forEach(i -> demo.add()));
        Thread t2 = new Thread(() -> IntStream.range(0, count).forEach(i -> demo.add()));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(demo.value);
    }
}
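The dump below shows why value++ is not a single step; it is the sort of output you get by disassembling the compiled class, for example with javap -v JMMDemo.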
void add();
descriptor: ()V
flags:
Code:
stack=3, locals=1, args_size=1
0: aload_0
1: dup
2: getfield #2 // Field value:I
5: iconst_1
6: iadd
7: putfield #2 // Field value:I
10: return
LineNumberTable:
line 7: 0
line 8: 10
LocalVariableTable:
Start Length Slot Name Signature
0 11 0 this LJMMDemo;
In addition, the CPU may execute instructions out of order at run time, and the Java virtual machine's just-in-time compiler performs similar optimizations: instruction reordering and caching. The execution of the whole method is far more fine-grained than the bytecode suggests, for both bytecode and hardware reasons. Simplifying to a coarse granularity, the most obvious factor is that the add method is not atomic.
To solve this problem, add the synchronized keyword to the add method, which ensures not only memory synchronization but also CPU synchronization. Now threads can only enter the add method one at a time.
synchronized void add() {
    value++;
}
Looking closely at the add method, you can see that a simple value++ expands into quite a few bytecode instructions, all dutifully executed in sequence. That is fine for a single thread, but in a multithreaded environment the interleaving of these instructions becomes unpredictable.
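As an aside, synchronized is not the only fix. Here is a minimal sketch using java.util.concurrent.atomic.AtomicInteger (the class name AtomicDemo is just for illustration), which turns the increment into a single atomic read-modify-write:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class AtomicDemo {

    final AtomicInteger value = new AtomicInteger();

    void add() {
        value.incrementAndGet(); // one atomic read-modify-write, no lock needed
    }

    public static void main(String[] args) throws Exception {
        final int count = 100000;
        final AtomicDemo demo = new AtomicDemo();
        Thread t1 = new Thread(() -> IntStream.range(0, count).forEach(i -> demo.add()));
        Thread t2 = new Thread(() -> IntStream.range(0, count).forEach(i -> demo.add()));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(demo.value.get()); // always prints 200000
    }
}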
Note that only the L1 and L2 caches are private to each CPU core; L3 (and L4, on processors that support it) is shared.
Problem analysis
1. (Visibility problem + atomicity problem) This out-of-order execution is shown in the figure above. Thread A and thread B "concurrently" execute the same add block, with steps numbered as in the figure. The steps are ordered within each thread (1, 2, 5 or 3, 4, 6), but the overall interleaving is unpredictable.
2. (Visibility problem) Thread A and thread B each perform an increment, but thread B's putfield directly overwrites thread A's result, leaving a value of 101 instead of 102 (see the interleaving sketch at the end of this analysis).
The above analysis is only at the bytecode level. To complicate matters, there is also the CPU-memory consistency problem described earlier: multiple layers of cache sit between the CPU and memory, each core keeps its own caches, and a thread whose time slices span several CPUs faces a synchronization problem as well.
In addition, out-of-order execution in the CPU and instruction reordering in the JIT compiler make the actual execution even more fine-grained and fragmented than the bytecode instructions suggest.
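To make the lost update concrete, here is one possible interleaving of the bytecode above (the starting value of 100 is only for illustration):

// Thread A: getfield  reads value = 100
// Thread B: getfield  reads value = 100   (A has not written back yet)
// Thread A: iadd      computes 101
// Thread A: putfield  writes value = 101
// Thread B: iadd      computes 101 from its stale copy
// Thread B: putfield  writes value = 101  (A's increment is lost)
// Two increments ran, but value only advanced by one.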
Concurrent scenarios
- Single thread: the CPU core's cache is accessed by only one thread, so the cache is exclusive and there are no access conflicts.
- Single-core CPU, multi-threading: multiple threads in a process can access the process's shared data. After the CPU loads a block of memory into its cache, different threads accessing the same physical address map to the same cache location, so the cached data stays valid even across thread switches.
- Multi-core CPU, multi-threading: each core has at least its own L1 and L2 cache. When multiple threads running on different cores access the same shared memory, each core keeps its own cached copy of that memory. Because the cores run in parallel, several threads may write to their own caches at the same time, and the data in the different caches can then diverge.
Another case (lack of standardization and uniformity)
- Memory access logic varies between hardware manufacturers and operating systems. As a result, code that works well and is thread-safe on one system may run into all sorts of problems when you switch to another.
- Different processors also differ in how they optimize and reorder instructions, so the same code may produce different final results after being optimized and reordered on different processors, which is unacceptable.
- The JMM came into being to solve exactly these problems in concurrent environments: keeping data safe by guaranteeing visibility, atomicity, and ordering.
The basic concept
- The JMM is an abstract concept: a set of rules, or a specification, governing how multiple threads access shared variables. Keywords such as volatile and synchronized are language-level syntax built around the JMM. "Variables" here include instance fields and static fields, but not local variables or method parameters, because the latter are thread-private and never contended.
- The JMM defines a unified memory model that hides the differences in memory access between underlying hardware and operating systems, so that Java programs behave consistently in concurrent code across different platforms.
The JMM specifies that shared variables are stored in main memory and that each thread has its own working memory, which holds copies of the variables the thread uses. A thread performs all operations on variables in its working memory and cannot manipulate main memory directly. Threads also cannot access each other's working memory directly; they must communicate through main memory.
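The classic symptom of a stale working copy is a loop that never observes another thread's write. A minimal sketch (class and field names are illustrative; whether the loop actually hangs depends on the JIT, since the JMM merely permits it):

import java.util.concurrent.TimeUnit;

public class StaleReadDemo {

    boolean stop = false; // plain field: the reader may keep using its working copy

    public static void main(String[] args) throws Exception {
        StaleReadDemo demo = new StaleReadDemo();
        Thread worker = new Thread(() -> {
            while (!demo.stop) {
                // busy-wait; the JIT may hoist the read of stop out of the
                // loop, so this can spin forever
            }
            System.out.println("worker saw stop = true");
        });
        worker.start();
        TimeUnit.SECONDS.sleep(1);
        demo.stop = true; // may never propagate to the worker's working memory
        worker.join();    // without volatile on stop, this join may hang
    }
}

Declaring stop as volatile forces the worker to re-read it from main memory, and the loop exits promptly.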
The structure of the JMM
JMM is divided into Main Memory and Working Memory.
- Main memory is the area where instances live: all instances, including the fields they own, reside in main memory, which is shared by all threads.
- Working memory is the work area owned by a thread; each thread has its own. Working memory holds copies of the parts of main memory the thread needs; such a copy is called a working copy.
In this model, threads cannot operate on main memory directly. In the figure below, thread A can communicate with thread B only through main memory.
- Where do these memory areas physically live? If you had to draw a correspondence, you could think of main memory as objects on the Java heap and working memory as data in the virtual machine stacks.
- But in reality, main memory data can also sit in a CPU cache or register, and working memory can likewise map onto hardware memory, so we should not take the correspondence too literally.
Eight operation types
To support the JMM, Java defines eight atomic operations that control the interaction between main memory and working memory.
(1) Read acts on main memory: it transfers a variable from main memory to the thread's working memory for the subsequent load action.
(2) Load acts on working memory: it puts the value obtained by read into the variable copy in working memory.
(3) Store acts on working memory: it transfers a variable from working memory to main memory for the subsequent write action.
(4) Write acts on main memory: it puts the value transferred by store into the variable in main memory.
(5) Use acts on working memory: it passes the value of a variable in working memory to the execution engine; the virtual machine performs this action whenever it reaches an instruction that needs the variable's value.
(6) Assign acts on working memory: it assigns a value received from the execution engine to a variable in working memory; the virtual machine performs this action whenever it reaches an instruction that assigns to the variable.
(7) Lock acts on main memory: it marks a variable as exclusively owned by one thread.
(8) Unlock acts on main memory: it releases a locked variable so that other threads can lock it.
As shown in the figure above, to copy a variable from main memory to working memory, read and load are executed in order; to synchronize a variable from working memory back to main memory, store and write are executed in order.
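As an illustration (this mapping is mine, not the specification's wording), a single value++ in one thread walks through the operations roughly like this:

// read   (main memory)    : transfer value from main memory to working memory
// load   (working memory) : put the transferred value into the working copy
// use    (working memory) : hand the working copy to the execution engine (getfield)
// assign (working memory) : put the engine's result, value + 1, into the working copy
// store  (working memory) : transfer the working copy back toward main memory
// write  (main memory)    : put the stored value into the variable in main memory
// lock / unlock only participate when the access is synchronized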
The three major characteristics
(1) Atomicity
The JMM guarantees that the read, load, assign, use, store, and write operations are atomic. Except for long and double, access to the memory unit of each basic data type is atomic (a plain 64-bit long or double may be read and written as two 32-bit halves).
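A minimal sketch of the long/double exception (the class name TornLongDemo is hypothetical). On some 32-bit JVMs a plain long may be written as two 32-bit halves, so a reader can observe a "torn" value that neither writer ever produced; on most modern 64-bit JVMs you will likely never see this happen:

public class TornLongDemo {

    long value; // not volatile: the JMM allows 64-bit access to be split in two

    public static void main(String[] args) {
        TornLongDemo demo = new TornLongDemo();
        Thread a = new Thread(() -> { while (true) demo.value = 0L; });
        Thread b = new Thread(() -> { while (true) demo.value = -1L; });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        while (true) {
            long observed = demo.value;
            if (observed != 0L && observed != -1L) {
                // e.g. 0xffffffff00000000: half of each thread's write
                System.out.println("torn read: " + Long.toHexString(observed));
                return;
            }
        }
    }
}

Declaring the field volatile restores atomic 64-bit reads and writes.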
If you need atomicity over a wider scope, you can use lock and unlock.
(2) Visibility
Visibility means that when one thread changes the value of a shared variable, other threads can immediately perceive the change. As the earlier figure shows, several operations are needed to achieve this: a change made by one thread must be synchronized back to main memory, and another thread must refresh its copy from main memory before it can read the new value.
Volatile, synchronized, final, and explicit locking can all ensure visibility.
Volatile deserves a special mention, because visibility is its most prominent feature. Whenever a volatile variable changes, the change is immediately synchronized to main memory, and whenever a thread wants to use the variable, it refreshes its working copy from main memory first. This keeps the variable visible to all threads. (Visibility + ordering)
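Note that volatile alone would not fix the add example from earlier: value++ is still a three-step read-modify-write, and volatile makes each step visible without making the sequence atomic. A sketch (the class name is illustrative):

import java.util.stream.IntStream;

public class VolatileNotAtomicDemo {

    volatile int value = 0; // every write is visible, but ++ is still three steps

    void add() {
        value++; // still getfield / iadd / putfield under the hood
    }

    public static void main(String[] args) throws Exception {
        final int count = 100000;
        final VolatileNotAtomicDemo demo = new VolatileNotAtomicDemo();
        Thread t1 = new Thread(() -> IntStream.range(0, count).forEach(i -> demo.add()));
        Thread t2 = new Thread(() -> IntStream.range(0, count).forEach(i -> demo.add()));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(demo.value); // usually still less than 200000
    }
}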
The lock and synchronized approaches are easier to understand: they force larger groups of operations to behave atomically. With only one thread holding the lock at a time, visibility of the variables involved is easy to guarantee. (Atomicity)
(3) Ordering
Java programs are interesting here: as the add example above shows, all operations appear ordered when viewed from within the executing thread, but unordered when observed from another thread.
Besides the disorder observed across threads, disorder also comes from instruction reordering.
Instruction reordering is an optimization in which the JVM reorders instructions according to certain rules to improve efficiency, without affecting the result of a single-threaded program. Under concurrent execution, however, the reordered logic can lead to different results (see the sketch below).
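The classic illustration of this risk (the class is hypothetical, not from the original example): without synchronization, the two writes in writer() may be reordered, so reader() can see flag == true while a is still 0:

public class ReorderDemo {

    int a = 0;
    boolean flag = false;

    void writer() {
        a = 1;       // (1)
        flag = true; // (2) may be reordered before (1)
    }

    void reader() {
        if (flag) {                // (3)
            System.out.println(a); // (4) may print 0 if (1) and (2) swapped
        }
    }
}

Declaring flag as volatile forbids this reordering, by the volatile rule in the happens-before list below.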
The Java language also ships some default "ordered" behavior, known as the happens-before principle. You may not notice it while writing code, precisely because it is default behavior.
Happens-before is a very important concept: if operation A happens-before operation B, then the effects of operation A are observable by operation B.
- Program order rule: within a thread, operations earlier in code order happen-before operations written later.
- Monitor lock rule: an unlock on a lock happens-before every subsequent lock on the same lock.
- Volatile rule: a write to a volatile variable happens-before every subsequent read of that variable.
- Transitivity rule: if operation A happens-before operation B, and operation B happens-before operation C, then operation A happens-before operation C.
- Thread start rule: a call to Thread.start() happens-before every action in the started thread.
- Thread interruption rule: a call to interrupt() happens-before the interrupted thread detects the interruption, for example via Thread.interrupted().
- Thread termination rule: all operations in a thread happen-before the detection that the thread has terminated, for example via the return of Thread.join() or a false return from Thread.isAlive().
- Object finalization rule: the completion of an object's construction happens-before the start of its finalize() method.
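A small sketch of the thread start and termination rules in action (the class name is illustrative): no explicit synchronization is needed for the started thread to observe writes made before start():

public class HappensBeforeStartDemo {

    static int shared = 0;

    public static void main(String[] args) throws Exception {
        shared = 42; // happens-before t.start(), so t is guaranteed to see it
        Thread t = new Thread(() -> System.out.println(shared)); // always prints 42
        t.start();
        t.join(); // all of t's actions happen-before join() returning
    }
}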
The memory barrier
So how do we guarantee all these rules and features mentioned above?
Memory barriers are used to control reordering and memory visibility under certain conditions. JMM memory barriers can be divided into read (Load) barriers and write (Store) barriers; the Java memory barriers are combinations of these two, each providing a particular set of ordering and data synchronization guarantees. When the Java compiler generates instruction sequences, it inserts memory barriers at the appropriate places to restrict processor reordering. Here are the four combinations.
LoadLoad barriers: guarantee that load1's data is loaded before load2 and all subsequent load instructions. Inserting a Load barrier before an instruction invalidates cached data, forcing it to be reloaded from main memory.
- load1
- LoadLoad
- load2
LoadStore barriers: guarantee that load1's data is loaded before store2 and all subsequent store instructions flush their data to memory.
- load1
- LoadStore
- store2
StoreStore barriers: guarantee that store1's data is flushed and visible to other processors before store2 and all subsequent store instructions execute. Inserting a Store barrier after an instruction writes the latest data back to main memory, making it visible to other threads.
- store1
- StoreStore
- store2
StoreLoad barriers: guarantee that store1's write is visible to all processors before load2 and all subsequent loads execute. This barrier is an all-purpose one: it has the effect of the other three combined and is the most expensive of the four.
- store1
- StoreLoad
- load2
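Application code rarely inserts these barriers by hand, but since Java 9 the VarHandle class exposes fence methods that correspond roughly to the combinations above. A sketch (class and field names are illustrative):

import java.lang.invoke.VarHandle;

public class FenceDemo {

    int data = 0;
    boolean ready = false;

    void publish() {
        data = 42;
        // roughly StoreStore + LoadStore: earlier accesses cannot be
        // reordered past the following store
        VarHandle.releaseFence();
        ready = true;
    }

    void consume() {
        if (ready) {
            // roughly LoadLoad + LoadStore: the preceding load cannot be
            // reordered past the following accesses
            VarHandle.acquireFence();
            System.out.println(data); // observes 42 once ready is seen as true
        }
    }
}

VarHandle.fullFence() roughly corresponds to the all-purpose StoreLoad barrier.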