This is the 25th day of my participation in the Gwen Challenge. Following on from the previous post, “Hello, what about the volatile keyword? (1)”.

4. Analyzing Visibility from the Hardware Level

At its core, volatile is about visibility. What is the nature of this visibility?
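Before diving into the hardware, here is a minimal Java sketch of the visibility problem that volatile addresses (the class and field names are mine, purely for illustration): a reader thread spins on a flag, and without volatile it may never observe the writer's update.

```java
// A minimal sketch of the visibility problem volatile solves. Without
// volatile, the reader thread may keep using a stale copy of `running`
// (from a register or CPU cache) and spin forever.
public class VisibilityDemo {
    // Try removing `volatile`: on many JVMs the loop below never exits.
    static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) {
                // busy-wait; an empty body makes the JIT more likely to
                // hoist the read of `running` when it is not volatile
            }
            System.out.println("reader saw running = false, exiting");
        });
        reader.start();

        Thread.sleep(1000);   // give the reader time to start spinning
        running = false;      // volatile write: guaranteed visible to the reader
        System.out.println("writer set running = false");
        reader.join();
    }
}
```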

As we know, the CPU, memory, and I/O devices are the core components of a computer. Hardware manufacturers improve a computer's overall performance by continuously improving these components. However, their processing speeds differ greatly: CPU > memory > I/O devices. According to the barrel principle, how much water a barrel holds is decided by its shortest stave; likewise, the slowest component has the greatest impact on overall performance. To balance the three and make the most of computing resources, as the previous article “Concurrent Programming: Processes, Threads, and Coroutines” mentioned, a group of smart people developed operating systems, processes, and threads, maximizing CPU utilization through time-slice switching.

However, this is not enough. Could the CPU, when executing instructions, access data stored close to itself instead of reaching out to memory every time? It can: that is the CPU cache. And once the cache existed, compilers began optimizing and reordering instructions to use it more effectively. These optimizations truly squeezed the CPU dry, but they also created all sorts of problems. Industrious people set out to solve those problems, pushing concurrent programming into an era of explosive growth.

4.1 CPU Cache

The CPU cache is a small, fast memory located between the CPU and main memory. Its main role is to bridge the gap between the CPU's computing speed and memory's read/write speed. To put it more bluntly: what we ultimately use a computer for is its computing power, but completing a computation is complicated. The CPU does not work alone; it must constantly interact with memory through read and write operations. To eliminate as much of this I/O time as possible, people introduced the cache.

As shown in the figure below:

  • L1 Cache: the level 1 cache, which mainly caches instructions and data and is exclusive to each Core. Because of its complex structure and high cost, it has the smallest capacity of the three; common sizes range from 32KB to 512KB.
  • L2 Cache: the level 2 cache, which directly affects CPU performance and is also exclusive to each Core. Common sizes range from 128KB to 24MB.
  • L3 Cache: the level 3 cache, which further reduces memory latency and improves performance on larger data sets; it is shared among Cores. Common sizes range from 2MB to 32MB.

In general, the closer a cache level is to the CPU core, the smaller its capacity and the lower its access latency (that is, the faster it is).
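You can see the cache hierarchy at work even from Java. Here is a minimal sketch (the class name is mine; timings are machine-dependent and illustrative): summing a matrix row by row walks memory sequentially and mostly hits cache lines that are already loaded, while column-by-column traversal jumps across rows and misses far more often.

```java
// A minimal sketch of the cache effect: the same sum, computed
// row-major (cache-friendly) vs column-major (cache-hostile).
public class CacheEffectDemo {
    public static void main(String[] args) {
        int n = 4096;
        int[][] m = new int[n][n];

        long t0 = System.nanoTime();
        long sumRowMajor = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sumRowMajor += m[i][j];   // sequential access, cache-friendly
        long rowMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        long sumColMajor = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                sumColMajor += m[i][j];   // strided access, cache-hostile
        long colMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("row-major:    " + rowMs + " ms");
        System.out.println("column-major: " + colMs + " ms (typically several times slower)");
    }
}
```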

This storage interaction through caches resolves the speed mismatch between processor and memory, but it introduces a new problem: cache consistency.

4.2 Cache Consistency

The flow of an operation is to load the data it needs into the cache at the start and synchronize the result back to main memory when the operation completes. This is where the biggest problem arises on a multi-core CPU: the same shared data from main memory may be cached by multiple cores at the same time. When one core writes, the new value lands in that core's cache first and is only later synchronized back to main memory, while the other cores' caches may not perceive the change at all and keep serving the old value. For example, cores A and B both load x = 0; core A writes x = 1 into its own cache; core B still reads its stale copy, x = 0. This is the cache inconsistency problem.
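To make the scenario concrete, here is a toy single-threaded model in Java (purely illustrative and entirely my own construction, not how real hardware works): each "core" holds a private cached copy of a value from "main memory", so a write that only updates one core's copy stays invisible to the other.

```java
// A toy model of the cache-inconsistency scenario described above.
public class CacheInconsistencyToy {
    static int mainMemoryX = 0;           // the shared variable in "main memory"

    // One core's private cache holding a copy of x.
    static class Core {
        Integer cachedX;                  // null = not cached yet

        int read() {
            if (cachedX == null) cachedX = mainMemoryX; // cache miss: load from memory
            return cachedX;                             // cache hit: may be stale
        }
        void write(int v) { cachedX = v; }              // write lands in the local cache first
        void writeBack()  { mainMemoryX = cachedX; }    // later synchronized to memory
    }

    public static void main(String[] args) {
        Core a = new Core(), b = new Core();
        System.out.println("core A reads x = " + a.read()); // 0
        System.out.println("core B reads x = " + b.read()); // 0
        a.write(1);                       // A updates only its own cache
        a.writeBack();                    // even after write-back to main memory...
        System.out.println("core A reads x = " + a.read()); // 1
        System.out.println("core B reads x = " + b.read()); // still 0: stale copy!
    }
}
```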

There are two ways to solve this problem and ensure cache consistency:

  • Bus lock
  • Cache lock

A bus lock, in simple terms, locks the communication between the CPU and memory at the bus level. While the lock is held, other processors cannot access the locked memory address. This approach is expensive: it affects a wide range of memory and its locking granularity is coarse, so it is poorly suited to solving cache inconsistency.

So, to improve on the bus lock, the cache lock was proposed. The core mechanism of the cache lock is the cache consistency protocol, which shrinks the locking granularity and ensures that, for the same shared data in main memory, the copies cached on each CPU core stay consistent.

4.3 CPU Cache and Cache Line

The cache lock mentioned above uses a cache consistency protocol to ensure cache consistency. There are many such protocols; the most common is MESI, named after the four states a cache line can be in: Modified, Exclusive, Shared, and Invalid. Since the protocol is implemented per cache line, one piece of hardware needs to be introduced first: the difference between a CPU Cache and a Cache Line.

A CPU Cache is divided into fixed-size units called Cache Lines; a common Cache Line size is 64 bytes. If the L1 Cache is 512KB, it contains 8192 (512KB / 64B = 8192) Cache Lines. Refer to the diagram below for details. MESI works at Cache Line granularity, so in the best case, when data needs to be synchronized between main memory and the cache, only the affected Cache Line is involved, not the entire CPU Cache.
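Cache Line granularity also has a flip side worth knowing: when two threads write to different variables that happen to share one Cache Line, the line ping-pongs between cores ("false sharing"). Here is a minimal, non-rigorous sketch (class and field names are mine; the actual field layout is JVM-dependent, so results may vary):

```java
// A minimal sketch (not a rigorous benchmark) illustrating false sharing:
// two threads writing to fields on the same cache line invalidate each
// other's copy of the line on every write.
public class FalseSharingDemo {
    // Two longs laid out next to each other usually share a 64-byte line.
    static class SharedLine {
        volatile long a;
        volatile long b;
    }

    // Padding pushes b onto a different cache line (p1..p7 are just filler;
    // the exact layout is up to the JVM).
    static class PaddedLine {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7; // 56 bytes of padding
        volatile long b;
    }

    static long run(Runnable w1, Runnable w2) throws InterruptedException {
        Thread t1 = new Thread(w1), t2 = new Thread(w2);
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }

    public static void main(String[] args) throws InterruptedException {
        final int N = 50_000_000;
        SharedLine s = new SharedLine();
        PaddedLine p = new PaddedLine();

        long shared = run(() -> { for (int i = 0; i < N; i++) s.a++; },
                          () -> { for (int i = 0; i < N; i++) s.b++; });
        long padded = run(() -> { for (int i = 0; i < N; i++) p.a++; },
                          () -> { for (int i = 0; i < N; i++) p.b++; });

        System.out.println("same cache line:      " + shared + " ms");
        System.out.println("separate cache lines: " + padded + " ms (typically much faster)");
    }
}
```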

It’s a long story, so click here to continue: “Hello, what about the volatile keyword? (3)”.


Don’t rush off just yet! Leave a thumbs-up and join the discussion in the comments. You are welcome to check out my column “Interviews, Don’t Panic | Java Concurrent Programming”, so that a raise and an interview are nothing to worry about. Also feel free to follow me; I will keep writing and keep getting better.