Interviewer: Today I want to talk to you about the Java memory model. Do you know anything about it?
Candidate: Sure, let me briefly tell you what I understand. Why don’t I start with why we even need a Java memory model?
Interviewer: Go ahead, the stage is yours.
Candidate: Let me give you some background first.
Candidate: 1. Modern computers are generally multi-core, with caches under each core. Caches exist because of the speed gap between the CPU and main memory, and the L1 and L2 caches are typically private to each core.
Candidate: 2. To keep the CPU busy, the processor may execute the incoming code out of order, which is known as “instruction reordering”.
Candidate: 3. Modifying a value is often not atomic (i++ is actually split into several instructions when the computer executes it); the sketch below shows what that costs once threads are involved.
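Candidate: To make point 3 concrete, here is a minimal sketch (the class and names are my own, purely for illustration) showing how increments on a plain field get lost when two threads race on it:

```java
// Minimal sketch: "count++" is really read -> add 1 -> write back,
// so concurrent increments can overwrite each other and updates get lost.
public class LostUpdateDemo {
    private static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++; // not atomic: three steps, not one
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000, but a smaller number is usually printed.
        System.out.println("count = " + count);
    }
}
```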
Candidate: None of the above is a problem as long as everything stays single-threaded, because a single thread means no concurrency. And within a single thread, the compiler/runtime/processor must obey as-if-serial semantics, which means operations with data dependencies between them are never reordered.
Candidate: CPUs have grown more complex in the name of efficiency: caching, instruction reordering, and so on. Meanwhile we write programs that want to squeeze as much out of the CPU as possible, so we end up using multithreading.
Candidate: Multithreading means concurrency, and concurrency means we have to think about thread safety.
Candidate: 1. Cache inconsistency: multiple threads modify a “shared variable” at the same time, but each core’s cache is private, so how do the various caches and main memory stay consistent?
Candidate: 2. Instruction reordering under multithreading can make code execute in an unexpected order, which ultimately produces wrong results.
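Candidate: Problem 1 shows up in Java like this (a minimal sketch, names invented): without any synchronization, the reading thread may never observe the writer’s update to the flag.

```java
// Minimal sketch of the visibility problem: the reading thread may keep
// using a stale cached value of "stop" and spin forever on some JVMs/CPUs.
public class VisibilityDemo {
    private static boolean stop = false; // declaring this volatile fixes it

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait; the read of "stop" may never see the update
            }
            System.out.println("reader saw stop = true");
        });
        reader.start();
        Thread.sleep(1000);
        stop = true; // this write may not become visible to the reader in time
        System.out.println("writer set stop = true");
    }
}
```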
Candidate: CPUs have their own solutions to the cache-inconsistency problem; two are widely known:
Candidate: 1. Bus lock: while one core is modifying data, the other cores cannot modify that data in memory. (Think of it as making memory exclusive: as long as one CPU is writing, the others have to wait for it to release the bus.)
Candidate: 2. Cache coherence protocol. There are many of these; MESI (Modified, Exclusive, Shared, Invalid) is simply the one you have most likely seen.
Candidate: The cache coherence protocol, often called a “cache lock”, “locks” at the level of cache lines, which are the smallest unit of cache storage.
Interviewer: Hmm…
Candidate: The MESI protocol basically works like this: before a CPU reads a shared variable, it checks the state of the cache line holding the data (Modified, Exclusive, Shared, or Invalid).
Candidate: Exclusive: the data the current CPU is about to read is up to date, and no other CPU is reading it at the same time.
Candidate: Shared: the data is still up to date and other CPUs are reading it at the same time, but nobody has modified it.
Candidate: Modified: the current CPU is changing the value of the variable. It sends a message telling the other CPUs to mark their copies as invalid; after they respond (changing their copies from Shared to Invalid), the current CPU writes the cached data back to main memory and changes the line’s state from Modified to Exclusive.
Candidate: Invalid: the local copy is stale because the data was modified elsewhere, so the latest value has to be read from main memory.
Candidate: So what MESI does is track the cache-line state and apply a different policy for each state. The key point is that when one CPU modifies data, it has to “synchronously” tell the other CPUs: I have changed this data, and your copy can no longer be used.
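Candidate: Just to keep the four states straight, here is a toy model of them (the real protocol lives in hardware, of course; this enum is only a memory aid):

```java
// Toy model of the four MESI cache-line states; real coherence protocols
// are implemented in hardware and are considerably more involved.
enum MesiState {
    MODIFIED,  // this core changed the line; main memory is stale
    EXCLUSIVE, // only this core holds the line, and it matches main memory
    SHARED,    // several cores hold identical, unmodified copies of the line
    INVALID    // the line was changed elsewhere and must be re-read
}
```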
Candidate: MESI locks at a much finer granularity than a bus lock, so its performance is naturally higher.
Interviewer: But as far as I know, CPUs optimize even further on top of that. Do you know about that?
Candidate: Well, a little bit.
Candidate: As we just saw, when a CPU modifies data, it has to “synchronously” notify the other CPUs and wait for their invalidation acknowledgements before it can write the cached data back to main memory.
Candidate: Synchronous means waiting, and waiting means doing nothing. CPU designers didn’t like that, so they optimized it.
Candidate: The idea of the optimization is to go from “synchronous” to “asynchronous”.
Candidate: Previously the CPU told the other CPUs about a change “synchronously”. Now it writes the new value into a store buffer, tells the other CPUs to remember to invalidate their copies, and goes back to doing other work. Only when the responses from the other CPUs come back does it move the data from the store buffer into the cache.
Candidate: On their side, when the other CPUs receive the invalidation message, they don’t act on it right away either: they drop it into an “invalidate queue” and immediately send back an acknowledgement to the CPU that requested the invalidation.
Candidate: But asynchrony brings a new problem. Say I’ve written A into the store buffer and the CPU has gone off to do other work: if the CPU then needs A again, the most recent value of A is still sitting in the store buffer, not in the cache.
Candidate: So when the CPU reads data, it first checks the store buffer. If the value is there, it uses it directly; only if it is not there does it read the cache. This is known as “store forwarding”.
Candidate: OK, that solves the first problem caused by asynchrony: when the same core writes and then reads the same data, it could otherwise read a stale value, so it reads the store buffer first. A rough model is sketched below.
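Candidate: Here is that read path as a toy model, purely for illustration (all names are invented; real store forwarding happens inside the CPU):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of store forwarding: a core's read first checks its own store
// buffer for a pending write to the same address and only then falls back
// to the cache, so the core always sees its own latest write.
class ToyCore {
    record PendingStore(long address, long value) {}

    private final Deque<PendingStore> storeBuffer = new ArrayDeque<>();
    private final Map<Long, Long> cache = new HashMap<>();

    void write(long address, long value) {
        // park the write in the store buffer instead of waiting for the
        // other cores to acknowledge the invalidation
        storeBuffer.addLast(new PendingStore(address, value));
    }

    long read(long address) {
        // store forwarding: the newest pending store to this address wins
        var it = storeBuffer.descendingIterator();
        while (it.hasNext()) {
            PendingStore s = it.next();
            if (s.address() == address) {
                return s.value();
            }
        }
        return cache.getOrDefault(address, 0L); // otherwise read the cache
    }
}
```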
Interviewer: Anything else?
Candidate: Of course. The “asynchrony” causes problems not only when the same core reads and writes a shared variable, but also when different cores read and write it.
Candidate: Say CPU1 has modified the value of A, written the new value into its store buffer, and told CPU2 to invalidate its copy. CPU2 may not have processed the invalidation yet and may have gone off doing other things, so CPU2 can still read the old value.
Candidate: Even if CPU2 has processed the invalidation, CPU1’s new value may not have been written back to main memory yet, so when CPU2 re-reads from main memory it still gets the old value…
Candidate: And variables are often related to one another (a = 1; b = 0; b = a), relationships the CPU knows nothing about…
Candidate: In general, because of the asynchronous optimization that store buffers and invalidate queues add on top of the cache coherence protocol, a later instruction may well fail to see the result of an earlier one (the order in which instructions take effect is no longer the order of the code). This phenomenon is commonly called “CPU out-of-order execution”.
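Candidate: Seen from Java, the symptom looks like this (a minimal sketch, names invented): without any barrier, the reading thread may observe flag == true while still reading the stale a == 0.

```java
// Minimal sketch of "a later instruction doesn't see the earlier result":
// the reader may observe flag == true yet still read a == 0.
public class ReorderingDemo {
    static int a = 0;
    static boolean flag = false; // making this volatile forbids the bad outcome

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            a = 1;        // (1)
            flag = true;  // (2) may become visible before (1) does
        });
        Thread reader = new Thread(() -> {
            if (flag) {
                System.out.println("a = " + a); // can print "a = 0" on some runs/platforms
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```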
Candidate: To deal with this out-of-order problem (which can also be described as a visibility problem: changes are not propagated to the other CPUs in time), the concept of the “memory barrier” was introduced.
Interviewer: Hmm…
Candidate: A “memory barrier” is the cure for the “out-of-order execution” / “cache changes not visible in time” problems caused by the asynchronous optimization. How does it work? Roughly speaking, by “turning off” that asynchronous optimization where it matters :)
Candidate: There are three kinds of memory barriers: write barriers, read barriers, and full barriers (which combine the two). A barrier can be understood as a special instruction inserted into the stream of operations: once that instruction is encountered, everything issued before it must be fully completed.
Candidate: A write barrier can be understood like this: when the CPU encounters a write-barrier instruction, it flushes all the writes already sitting in the store buffer into the cache.
Candidate: That way, the data this CPU has modified is exposed to the other CPUs right away, giving visibility to “write operations”.
Candidate: The read barrier is similar: when the CPU encounters a read-barrier instruction, it processes every invalidation already queued in its invalidate queue.
Candidate: That ensures the current CPU’s cache state is accurate, so a “read operation” is guaranteed to see the latest value.
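Candidate: In Java you normally don’t issue barrier instructions yourself, but since Java 9 the VarHandle class does expose explicit fences. A minimal sketch (field names are mine) pairing a write barrier with a read barrier:

```java
import java.lang.invoke.VarHandle;

// Minimal sketch of explicit fences (Java 9+): storeStoreFence() plays the
// role of a write barrier, loadLoadFence() the role of a read barrier.
public class FenceDemo {
    static int data = 0;
    static boolean ready = false;

    static void writer() {
        data = 42;
        VarHandle.storeStoreFence(); // earlier stores become visible before later stores
        ready = true;
    }

    static void reader() {
        if (ready) {
            VarHandle.loadLoadFence(); // later loads cannot be satisfied before earlier ones
            System.out.println(data);  // once ready is seen as true, data is seen as 42
        }
    }
}
```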
Candidate: Different CPU architectures come with different cache hierarchies, different coherence protocols, different reordering strategies, and different memory-barrier instructions. To spare Java developers from all of that, Java wraps it behind a single specification known as the Java Memory Model.
Candidate: More precisely, the Java Memory Model aims to hide the differences in memory access across hardware and operating systems, so that a Java program accessing memory behaves consistently on every platform. Its goal is to address the atomicity, visibility (cache coherence), and ordering problems of multithreading.
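Candidate: At the level application code actually works with, those guarantees are expressed through the JMM’s happens-before rules rather than CPU instructions. A minimal sketch (names invented): the volatile write/read pair makes the earlier plain write visible and ordered.

```java
// Minimal sketch: a write to a volatile field happens-before every subsequent
// read of that field, so a reader that sees ready == true also sees data == 42.
public class HappensBeforeDemo {
    static int data = 0;
    static volatile boolean ready = false;

    static void writer() {
        data = 42;     // ordinary write
        ready = true;  // volatile write: orders and publishes the write above
    }

    static void reader() {
        if (ready) {                  // volatile read
            System.out.println(data); // guaranteed to print 42
        }
    }
}
```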
Interviewer: How about briefly walking me through what the Java Memory Model actually specifies and what it contains?
Candidate: Better not; I’m afraid that would take a whole afternoon. Maybe another time?
To sum up:

- The three main sources of concurrency problems are visibility, ordering, and atomicity.
- Visibility: under modern CPU architectures each core has its own L1/L2 caches, and those caches are not shared between cores (hence not visible to one another).
- Ordering: three things can break it:
  - Compiler optimizations that reorder code (the compiler may reorder statements as long as the semantics of the single-threaded program are unchanged)
  - Instruction-level parallelism (the CPU itself may reorder instructions)
  - Memory-system reordering (CPU architectures typically have store buffers / invalidate queues, and this “asynchrony” can make instructions appear reordered)
- Atomicity: a single Java statement often takes several CPU instructions to complete (i++), so an operating-system thread switch can interrupt i++ midway, letting another thread modify the shared variable i and producing an unexpected final result.
- At the CPU level, the “cache coherence” problem is handled with “locks”: “bus locks” and “cache locks”.
  - A bus lock locks the bus, so only one CPU at a time can modify a shared variable.
  - A cache lock locks a single cache line; this is the MESI protocol, which tags each cache line with a state and achieves visibility and ordering of data through “synchronous notification”.
  - But “synchronous notification” hurts performance, so store buffers / invalidate queues were introduced to make it “asynchronous” and keep the CPU busy.
  - Once those buffers exist, “visibility” and “ordering” problems reappear. Most of the time the benefits of “asynchrony” are worth it, but in the rare cases that need strong “visibility” and “ordering”, the only option is to “disable” the optimization.
  - For that, the CPU provides “memory barriers” (read barriers / write barriers / full barriers): essentially a “barrier instruction” that forces every operation sitting in the buffers (store buffer / invalidate queue) before it to be processed, making reads and writes visible and ordered at the CPU level.
- CPU implementations differ in architecture and in these optimizations. To hide the differences in how hardware and operating systems access memory, Java defines the “Java Memory Model” specification, which guarantees that Java programs get consistent memory-access behavior on every platform.
Welcome to follow my WeChat official account [Java3y] for more Java interview content; the online-interview series keeps updating!
Online Interviewer, mobile series: two new installments a week!
Online Interviewer, desktop series: two new installments a week!
Writing original content isn’t easy; a like, share, and follow would mean a lot!