
1. CPU cache structure

Modern CPUs typically use a three-level cache architecture (L1, L2, and L3).

On Windows, you can see the cache sizes in Task Manager (Performance > CPU).

On Linux, you can inspect the CPU cache with lscpu:

```
[root@public-server9 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:
Model name:            Intel(R) Xeon(R) CPU E5-2680 V4 @ 2.40GHz
Stepping:              1
CPU MHz:               2399.998
BogoMIPS:              4799.99
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
```
  • The approximate access latency at each level, measured in CPU clock cycles, is as follows:

    | Storage level | Approximate clock cycles |
    | ------------- | ------------------------ |
    | Register      | 1                        |
    | L1            | 3 ~ 4                    |
    | L2            | 10 ~ 20                  |
    | L3            | 40 ~ 45                  |
    | Main memory   | 120 ~ 240                |
  • Why is it designed this way?

    The point of a cache is to hold hot data. As technology advanced and the amount of hot data grew, a single level of cache was no longer enough.

    The L1 cache is the fastest but also the most expensive per byte, so its capacity is small; the L2 cache serves as a buffer behind it.

    The L3 cache in turn buffers the L2 cache and is slower than it. The difference is that the L3 cache is shared by multiple cores; you can think of it as a smaller but faster stand-in for main memory.

2. Cache Line

You’ve probably heard of a cache line, the smallest cache unit in a CPU cache, usually 64 bytes in size.

On Linux, we can check the cache line size as follows:

```
[root@public-server9 ~]# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
64
```

The CPU does not move data one byte at a time; instead, it transfers data in units of cache lines.

When the CPU loads data from memory into the cache, it places 64 adjacent bytes on the same cache line.

This follows the principle of spatial locality: data adjacent to recently accessed data is likely to be accessed in the near future.
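
To make this concrete, here is a rough Java sketch (my own illustration, not from the original article; the timings are indicative, not a rigorous benchmark). Touching every element and touching only every 16th element take roughly the same time, because 16 ints are 64 bytes: both loops pull in every cache line of the array, and moving cache lines dominates the cost.

```java
// CacheLineDemo.java -- rough illustration of cache-line granularity.
public class CacheLineDemo {

    // Multiply elements in place with the given step; returns elapsed ms.
    static long touch(int[] data, int step) {
        long start = System.nanoTime();
        for (int i = 0; i < data.length; i += step) {
            data[i] *= 3;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] data = new int[16 * 1024 * 1024]; // 16M ints = 64 MB

        // Warm-up so the JIT compiles touch() before we measure.
        touch(data, 1);
        touch(data, 16);

        System.out.println("step 1  (every element)     : " + touch(data, 1) + " ms");
        // 16 ints * 4 bytes = 64 bytes = exactly one cache line per iteration,
        // so this loop still touches every cache line despite doing 1/16 the work.
        System.out.println("step 16 (every 16th element): " + touch(data, 16) + " ms");
    }
}
```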

3. CPU cache consistency

When multiple CPU cores read and write the same block of memory at the same time, conflicting copies of the data can appear in their caches. This is the CPU cache consistency problem.

How do you ensure data consistency in multilevel caches?

The MESI protocol is an invalidate-based cache coherence protocol and one of the most commonly used protocols supporting write-back caches. When a cache miss occurs and the data block is present in another cache, MESI allows the data to be copied cache-to-cache instead of being fetched from main memory. Reducing main-memory transactions this way greatly improves performance.

In MESI, each cache line is in one of four states:

  • E (Exclusive)

    The cache line exists only in the current cache and is clean: the cached data matches main memory. When another cache reads it, the state becomes Shared (S); when the data is written locally, the state becomes Modified (M).

  • M (Modified)

    The cache line is dirty: its value differs from the value in main memory. If another CPU core wants to read this data from main memory, the cache line must first be written back to main memory, after which its state becomes Shared (S).

  • S (Shared)

    The cache line also exists in other caches and is clean.

  • I (Invalid)

    The cache line is invalid.

The following is a brief description of the state transition process and relationships:

1) A cache line in the M, E, or S state can serve the CPU's read requests directly; a line in the I state is invalid, so the data must be fetched again.

2) A cache line in the E state changes to M when a local write occurs, without synchronizing to main memory at that point. A line in E also snoops read requests from other cores; when one occurs, its state changes to S.

3) A cache line in the M state must snoop other cores' attempts to read its address; when one occurs, it first writes its data back to main memory and then changes to the S state.

4) A cache line in the S state changes to M when a local write occurs, and the copies of the same line in other caches are invalidated (set to I).

5) A cache line in the S state must also snoop invalidation requests for the line; when one occurs, it changes to the I state.

6) A cache line in the I state must fetch the data from main memory when a read request occurs.

Together, these transitions form a closed loop.
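
To help memorize these transitions, here is a toy Java model of the state machine above. This is purely my own illustration: real MESI is implemented by the cache controller in hardware, not in software.

```java
// MesiToy.java -- a toy software model of the MESI transitions listed above.
public class MesiToy {

    enum State { M, E, S, I }

    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    // Maps (current state, observed event) to the next state.
    static State next(State s, Event e) {
        switch (s) {
            case M:
                if (e == Event.REMOTE_READ)  return State.S; // write back, then share (rule 3)
                if (e == Event.REMOTE_WRITE) return State.I; // write back, then invalidate
                return State.M;
            case E:
                if (e == Event.LOCAL_WRITE)  return State.M; // no main-memory sync yet (rule 2)
                if (e == Event.REMOTE_READ)  return State.S; // snooped read (rule 2)
                if (e == Event.REMOTE_WRITE) return State.I;
                return State.E;
            case S:
                if (e == Event.LOCAL_WRITE)  return State.M; // other copies invalidated (rule 4)
                if (e == Event.REMOTE_WRITE) return State.I; // snooped invalidation (rule 5)
                return State.S;
            case I:
                if (e == Event.LOCAL_READ)   return State.S; // refetch; E if no other sharer (rules 1, 6)
                if (e == Event.LOCAL_WRITE)  return State.M;
                return State.I;
        }
        return s;
    }

    public static void main(String[] args) {
        State s = State.I;
        s = next(s, Event.LOCAL_READ);  // I -> S (fetched from main memory)
        s = next(s, Event.LOCAL_WRITE); // S -> M (other copies invalidated)
        s = next(s, Event.REMOTE_READ); // M -> S (written back first)
        System.out.println("final state: " + s); // prints S
    }
}
```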

This article only aims to build a basic understanding of CPU cache consistency as groundwork for learning multithreaded concurrent programming, so it will not go deeper. One place where the coherence protocol bites in everyday concurrent code is false sharing, sketched below.
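
The following is a rough, illustrative Java sketch of false sharing (my own example; the class name and iteration count are made up). Two threads update two different long fields that happen to sit on the same 64-byte cache line, so under MESI each write by one core invalidates the other core's copy of the line. Padding the fields apart usually makes the same work run several times faster, though the JVM may reorder fields, so hand padding is not guaranteed (HotSpot has an internal @Contended annotation for exactly this purpose).

```java
// FalseSharingDemo.java -- rough illustration (not JMH-quality) of cache-line
// "ping-pong": two threads write two different longs that share one cache
// line, so MESI keeps invalidating each core's copy of that line.
public class FalseSharingDemo {

    static class Shared {
        volatile long a;
        // Uncomment the padding to push 'b' onto a different cache line
        // (not guaranteed -- the JVM may reorder fields):
        // long p1, p2, p3, p4, p5, p6, p7;
        volatile long b;
    }

    public static void main(String[] args) throws InterruptedException {
        Shared s = new Shared();
        long iterations = 200_000_000L;

        Thread t1 = new Thread(() -> { for (long i = 0; i < iterations; i++) s.a++; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < iterations; i++) s.b++; });

        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.printf("took %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}
```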

4. Memory barrier

Here’s a quick look at what a memory barrier is.

In previous articles, we talked about visibility and ordering between threads. How are these implemented? At the time we talked about the volatile keyword, whose underlying mechanism is the memory barrier.

Memory barriers are mainly divided into read barriers and write barriers.

  • Visibility (a volatile sketch illustrating both effects follows this list)

    • Write barrier: changes to shared variables made before the write barrier are synchronized to main memory.
    • Read barrier: reads of shared variables after the read barrier load fresh data from main memory.
  • Ordering

    • Write barrier: ensures that when instructions are reordered, writes before the write barrier are not moved after it.
    • Read barrier: ensures that when instructions are reordered, reads after the read barrier are not moved before it.
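
Here is a minimal Java sketch of the visibility effect (my own example, not from the original article). The volatile write crosses a write barrier and each volatile read crosses a read barrier, so the reader thread observes the change promptly; removing volatile may make the reader spin forever, although whether that reproduces depends on the JIT and the CPU.

```java
// VolatileVisibilityDemo.java -- illustrative only.
public class VolatileVisibilityDemo {

    // Try removing 'volatile' to observe the visibility problem
    // (not guaranteed to reproduce on every machine).
    private static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // Busy-wait: each volatile read acts as a read barrier,
                // so every iteration sees the latest value of 'stop'.
            }
            System.out.println("reader saw stop = true, exiting");
        });
        reader.start();

        Thread.sleep(1000);
        stop = true; // volatile write: flushed to main memory (write barrier semantics)
        reader.join();
    }
}
```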

In the next article, we’ll look at the underlying principles of volatile.