
CPU Cache

The CPU cache is temporary storage that sits between the CPU and main memory. Its capacity is much smaller than main memory's, but it exchanges data much faster. The cache exists to bridge the mismatch between the CPU's computing speed and memory's read/write speed: since the CPU computes far faster than memory can be read or written, the CPU would otherwise spend a long time waiting for data to arrive or to be written back. When the CPU needs data, it looks in the cache first, which greatly shortens read time.

CPU Multi-Level Cache

Not long after the advent of CPU caches, systems grew more complex and the speed gap between the cache and main memory widened, until simply enlarging the fast first-level cache was no longer economical. The solution was to add another level of cache, larger but slower than the first; hence the level 2 cache, and later even a level 3 cache. The data stored in each cache level is a subset of the data in the next level. Moving down the hierarchy, the technical difficulty and manufacturing cost decrease, so the capacity of each level increases. When the CPU wants to read a piece of data, it first looks in the level 1 cache, then in the level 2 cache if it misses, and then in the level 3 cache or main memory if it misses again. Generally speaking, the level 1 cache hit ratio is about 80%: 80% of the data the CPU needs is found in the level 1 cache, and only 20% has to be read from the level 2 cache, level 3 cache, or main memory. This makes the level 1 cache the most important part of the entire CPU cache architecture.
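
A rough way to observe this hierarchy from user code is to compare a sequential scan with a strided scan that touches a new cache line on every access. The Java sketch below is illustrative only: the array size, the 64-byte cache-line assumption, and the timing method are all assumptions of mine, and exact numbers vary by CPU.

```java
// A minimal sketch (not a rigorous benchmark): a sequential scan reuses each
// 64-byte cache line for 16 ints, while a stride-16 scan touches a new line on
// every access, so it does 1/16 of the work but misses the cache far more often.
public class CacheLineDemo {
    public static void main(String[] args) {
        int[] data = new int[16 * 1024 * 1024]; // 64 MB, larger than any cache level
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < data.length; i++) {     // cache-friendly: sequential
            sum += data[i];
        }
        long sequentialNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < data.length; i += 16) { // one new cache line per access
            sum += data[i];
        }
        long stridedNs = System.nanoTime() - t0;

        // Despite doing 16x less work, the strided loop typically takes a
        // comparable amount of time, because almost every access is a miss.
        System.out.println("sequential: " + sequentialNs + " ns, strided: "
                + stridedNs + " ns (sum=" + sum + ")");
    }
}
```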

CPU Cache Consistency

With a multi-core CPU there are multiple level 1 caches. How can we keep the data inside these caches consistent and prevent the system's data from becoming chaotic? This is the problem solved by the cache coherence protocol MESI.

Each cache line in the CPU is marked with one of four states:

  • M (Modified)

The cache line is cached only in this CPU's cache and has been modified, so its data differs from main memory; it must be written back to main memory at some point in the future. Once written back, the line's state becomes Exclusive.

  • E (Exclusive)

The cache line is cached only in this CPU's cache and its data is consistent with main memory. It changes to the Shared state whenever another CPU reads the same memory, and to the Modified state when this CPU writes to the line.

  • S (Shared)

The cache line may be cached by multiple CPUs, and every copy is consistent with main memory. When one CPU modifies the line, the copies in the other CPUs' caches are invalidated.

  • I (Invalid)

The cache line is invalid (another CPU may have modified it).
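
As a compact reference, here is a sketch of the four states as a Java enum, with the per-state guarantees from the list above recorded as comments. The enum and method names are mine for illustration, not part of any real API.

```java
// The four MESI states of a cache line, as described above.
enum MesiState {
    MODIFIED,  // only this cache holds the line; it differs from main memory
    EXCLUSIVE, // only this cache holds the line; it matches main memory
    SHARED,    // several caches may hold the line; every copy matches main memory
    INVALID;   // this copy is stale and must not be used

    // State after this CPU writes to the line (a SHARED line first requires
    // the other copies to be invalidated on the bus; the end state is MODIFIED).
    MesiState afterLocalWrite() { return MODIFIED; }

    // State after a MODIFIED line is written back to main memory.
    MesiState afterWriteBack()  { return this == MODIFIED ? EXCLUSIVE : this; }
}
```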

In a typical system, several caches share a main memory bus, and each cache's CPU issues read and write requests on it. The purpose of the caches is to reduce the number of CPU reads and writes that reach the shared main memory.

A cache can satisfy a CPU read request from a line in any state except I. A line in the Invalid state must be fetched from main memory to satisfy the request.

A write request can only be executed against a line in the M or E state. If the line is in the S state, the copies in all other caches must first be set to the Invalid state.

A cache holding a line in the M state must snoop every attempt by other caches to read the corresponding main memory location; such reads must be deferred until this cache writes the line back to main memory and changes its state to S.

A cache holding a line in the S state must likewise snoop requests from other caches to invalidate the line or take exclusive ownership of it, and must invalidate its own copy when such a request arrives.

A cache holding a line in the E state must also snoop reads of that line from main memory by other caches; once such a read occurs, the line must change to the S state.

The M and E states are always exact: they match the true ownership of the cache line. The S state may be inexact: if another cache discards its S-state copy, a cache that is in fact the only remaining holder of the line will not be promoted to E, because caches do not broadcast a notification when they discard a line.

And because a cache does not track how many copies of a line exist elsewhere, there would be no way, even with such a notification, to know whether it held the line exclusively.
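
Putting the snooping rules above in one place, the sketch below shows how a single cache line might react to bus traffic from other CPUs. The event names are illustrative, and MesiState is repeated from the earlier sketch so this snippet compiles on its own.

```java
// How one cache line reacts to snooped bus events, per the rules above.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID } // as in the earlier sketch
enum BusEvent  { REMOTE_READ, REMOTE_INVALIDATE }

class SnoopingCacheLine {
    MesiState state = MesiState.INVALID;

    void onSnoop(BusEvent event) {
        if (event == BusEvent.REMOTE_READ) {
            if (state == MesiState.MODIFIED) {
                writeBackToMainMemory();      // the remote read waits until done
                state = MesiState.SHARED;
            } else if (state == MesiState.EXCLUSIVE) {
                state = MesiState.SHARED;     // another cache now holds a copy
            }
        } else { // REMOTE_INVALIDATE: another CPU wants exclusive ownership
            state = MesiState.INVALID;        // our copy becomes stale
        }
    }

    private void writeBackToMainMemory() { /* flush the dirty line (omitted) */ }
}
```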

CPU Out-of-Order Execution

Out-of-order execution is an optimization in which the processor, to speed up processing, executes instructions in an order that violates the original order of the code.

For example: a = 10, b = 20, result = a + b. Normally a = 10 executes first, then b = 20, and finally a + b. But suppose a is not in the cache while b is: a = 10 must wait for a read from main memory, and b = 20 would needlessly wait behind it. To improve efficiency, the CPU runs b = 20 first, then a = 10, then a + b; the result is the same, but execution is faster.
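
A classic way to witness such reordering from Java code is the store-load litmus test below. The class and field names are mine for illustration; on most hardware, running it long enough will eventually print the "impossible" outcome.

```java
// Litmus test: under strict program order at least one thread must observe the
// other's write, so seeing r1 == 0 and r2 == 0 together is evidence of reordering.
public class ReorderDemo {
    static int a, b, r1, r2;

    public static void main(String[] args) throws InterruptedException {
        for (long run = 1; ; run++) {
            a = 0; b = 0; r1 = 0; r2 = 0;

            Thread t1 = new Thread(() -> { a = 1; r1 = b; });
            Thread t2 = new Thread(() -> { b = 1; r2 = a; });
            t1.start(); t2.start();
            t1.join();  t2.join();

            if (r1 == 0 && r2 == 0) {
                System.out.println("reordering observed on run " + run);
                break;
            }
        }
    }
}
```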

Memory Barriers

Out-of-order CPU execution is a good optimization in a single-threaded environment, but in a multi-threaded environment it can cause data inconsistency, so memory barriers are used to deal with the problem.

1. Store Memory Barrier: inserting a store barrier after a write forces the latest data to be flushed to main memory, where other threads can see it. This explicit flush also prevents the CPU from reordering instructions across the barrier.

2. Load Memory Barrier: inserting a load barrier before a read invalidates the corresponding data in the cache, forcing it to be reloaded from main memory. It likewise prevents the CPU from reordering instructions across the barrier.
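
In Java these barriers are not invoked directly; declaring a field volatile makes the JVM emit a store barrier after each write and a load barrier before each read. Below is a minimal sketch of the resulting visibility guarantee (the class and field names are illustrative):

```java
// volatile write = store barrier (flush to main memory before continuing);
// volatile read  = load barrier  (discard the stale cached value and re-read).
public class VisibilityDemo {
    static int payload = 0;
    static volatile boolean ready = false;

    public static void main(String[] args) {
        new Thread(() -> {
            payload = 42;   // ordinary write
            ready = true;   // store barrier: payload is flushed before ready
        }).start();

        while (!ready) { }  // load barrier on each read: eventually sees true
        System.out.println(payload); // guaranteed to print 42, never 0
    }
}
```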