Hello everyone, I'm Java Not Confused (WeChat public account of the same name). In this second article of the column, I'll give you a brief overview of volatile and CAS. Why only a brief one? Because different processors implement these mechanisms differently, and processors are too complex to cover exhaustively; a high-level understanding is enough for our purposes. In this article, I'll introduce the cache coherence protocol and show how it provides visibility and ordering.
Lock instruction prefix
For writes to volatile variables, the JIT-compiled assembly carries the lock instruction prefix:
lock addl $0x0,(%esp)
A CAS operation likewise compiles down to a lock-prefixed instruction:
lock cmpxchg
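At the Java level, this lock cmpxchg sits behind the atomic classes. As an illustration (the class and method names below are mine, not from this article), here is a minimal sketch of the classic CAS retry loop built on AtomicInteger.compareAndSet, which the JIT compiles to lock cmpxchg on x86:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    // Increment via CAS: read the current value, then atomically swap in
    // current + 1 only if no other thread changed it in between; retry otherwise.
    static int casIncrement(AtomicInteger counter) {
        int current;
        do {
            current = counter.get();                                // expected value
        } while (!counter.compareAndSet(current, current + 1));     // lock cmpxchg under the hood
        return current + 1;
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        System.out.println(casIncrement(counter));        // prints 1
        System.out.println(counter.compareAndSet(5, 9));  // prints false: expected 5, actual 1
    }
}
```

The loop is what makes CAS lock-free rather than wait-free: a thread that loses the race simply re-reads and tries again.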
It is this lock prefix that gives volatile its visibility and ordering guarantees, and lets CAS swap a variable atomically. So what exactly does the lock prefix tell the processor to do? In the previous article, we learned that cores communicate with each other, but how? Let's read on. Processors implement the lock prefix in two ways: the bus lock and the cache coherence protocol.
The bus lock
A bus lock, as its name implies, locks the bus. The CPU bus carries communication between the CPU and external components (caches, memory, and so on). With a bus lock, one core holds the bus exclusively, and no other core can talk to memory. Since no other core can touch shared memory during that time, the locking core has it to itself, which solves the atomicity problem. But stalling every other core is expensive. Early processors offered only this mechanism; modern ones prefer the cache coherence protocol.
Cache coherence mechanism
Modern processors use dedicated protocols to keep data consistent across per-core caches. The classic one is the MESI protocol.
In a MESI implementation, each cache line contains three parts: valid, tag, and block:
- Valid indicates whether the line's data is valid
- Tag identifies the memory address the data corresponds to
- Block stores the data itself (typically 64 bytes)
Valid is the flag we use to decide whether a cache line is usable. There are two strategies for maintaining it:
- When one core modifies data, every other core holding a copy marks its line invalid. This is called write invalidation.
- When one core modifies data, every other core holding a copy updates it to the new value and keeps it valid. This is called write update.
The MESI protocol uses write invalidation. In MESI, a cache line's valid field has four states:
1. M (Modified): the line is cached only in this core's cache and has been modified. It no longer matches main memory and must be written back at some point.
2. E (Exclusive): the line is cached only in this core's cache and is unmodified, matching main memory. It becomes Shared when another core reads it.
3. S (Shared): the line may be cached by multiple cores, and every copy matches main memory. When one core modifies the data, the copies in the other cores are set to Invalid.
4. I (Invalid): the line is invalid.
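The four states above can be sketched as a toy state machine. This is purely illustrative, assuming a single line and a simplified world (real hardware tracks this per line with snooping or a directory, not with Java objects), and the class and method names are mine:

```java
public class MesiLine {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    State state = State.INVALID;

    // Local read miss: fetch the line; EXCLUSIVE if no other core holds it, else SHARED.
    State localRead(boolean otherCoresHoldLine) {
        if (state == State.INVALID) {
            state = otherCoresHoldLine ? State.SHARED : State.EXCLUSIVE;
        }
        return state;
    }

    // Local write: the line becomes MODIFIED; other copies must be invalidated.
    State localWrite() {
        state = State.MODIFIED;
        return state;
    }

    // Another core reads our MODIFIED/EXCLUSIVE line: we drop to SHARED.
    State remoteRead() {
        if (state == State.MODIFIED || state == State.EXCLUSIVE) state = State.SHARED;
        return state;
    }

    // Another core writes the line: our copy becomes INVALID (write invalidation).
    State remoteWrite() {
        state = State.INVALID;
        return state;
    }
}
```

Tracing a line through localRead → localWrite → remoteRead → remoteWrite walks it through E → M → S → I, which matches the descriptions of the four states above.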
Communication between cores
The MESI protocol lets a core that misses in its own cache fetch a copy of the data from another core's cache, reducing reads from main memory.
First, let's look at what happens to the cache lines when a core modifies data:
- When core 0 modifies the data, it first broadcasts to all other cores, telling them which data is changing
- Core 1 receives core 0's broadcast and looks up the cache line holding that data in its own cache
- Core 1 invalidates the line it finds (sets valid to I)
- Core 0 receives core 1's acknowledgment, writes the data into its own cache line, and sets valid to E or M
- Core 0 later writes the data back to memory.
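The steps above can be sketched as a toy write-invalidate simulation. This is a sequential model just to make the protocol steps concrete (real cores run concurrently and exchange acknowledgments over the interconnect), and all names here are mine:

```java
import java.util.List;

public class WriteInvalidate {
    enum State { MODIFIED, SHARED, INVALID }

    static class Core {
        State lineState = State.SHARED;  // every core starts with a shared copy
        int value;
        Core(int value) { this.value = value; }
    }

    // The writer broadcasts an invalidation; once every peer has invalidated
    // its copy (and acknowledged), the writer performs the store locally.
    static void write(Core writer, List<Core> others, int newValue) {
        for (Core other : others) {
            other.lineState = State.INVALID;  // peer drops its copy and acks
        }
        writer.value = newValue;              // safe to write after all acks
        writer.lineState = State.MODIFIED;    // dirty until written back to memory
    }
}
```

After `write`, the writer holds the only valid (Modified) copy, exactly as the bullet steps describe.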
To guarantee that every core reads the latest value of a shared variable, core 0 would have to wait for core 1's acknowledgment before continuing. But core 0 cannot afford to stall until the reply arrives. So a buffering component was added: the store buffer.
Store Buffer
When core 0 broadcasts an invalidation, it first places the pending write into its store buffer, so it does not have to wait to learn whether the other cores have received the message.
Introducing the store buffer removed the waiting between cores, but introduced a new problem: keeping the store buffer and the cache in sync.
To solve this, the CPU can flush all store buffer entries to the cache, in order, before any subsequent write takes effect. This is called a store barrier.
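The JVM exposes explicit fences through java.lang.invoke.VarHandle (Java 9+); releaseFence roughly plays the role of the store barrier described above, and acquireFence the load barrier discussed next. The sketch below is illustrative only (class and field names are mine), and the driver is single-threaded just to show where the fences sit; in real concurrent code `ready` should itself be volatile or accessed with acquire/release semantics:

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int data = 0;
    static boolean ready = false;

    static void writer() {
        data = 42;                 // plain write: may linger in the store buffer
        VarHandle.releaseFence();  // store barrier: earlier writes drain before later ones
        ready = true;              // published only after `data` is out
    }

    static int reader() {
        while (!ready) { }         // spin until the flag is seen (plain read; see caveat above)
        VarHandle.acquireFence();  // load barrier: apply pending invalidations before later reads
        return data;
    }
}
```

With the fences paired this way, a reader that observes `ready == true` is guaranteed not to read a stale `data`.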
Invalidate Queue
Symmetrically, a core cannot process core 0's invalidation broadcast the moment it arrives. Instead, core 1 adds a component, the invalidate queue, to hold the invalidation messages it receives.
- When a core receives an invalidation message, it does not invalidate the cache line right away. It puts the message into the invalidate queue and immediately returns an acknowledgment, then processes the queued invalidations when it is idle.
- The invalidate queue solved the problem of cores being unable to acknowledge promptly, but created another: a core may keep using a cache line that should already have been invalidated.
- For this reason, before a subsequent read, the core must first apply every entry in the invalidate queue to its cache, so that reads appear properly ordered to the other cores. This is called a load barrier.
Volatile and the cache coherence protocol
Above we looked at bus locks and cache coherence. With lock-prefixed instructions, the cores rely on the cache coherence protocol to keep shared data consistent.
Memory barriers include read barriers, write barriers, and full barriers.
- Read barrier: pick up other cores' changes so the current core sees the latest values, i.e. apply the entries in the invalidate queue to the cache.
- Write barrier: make this core's changes visible to other cores, i.e. write the store buffer entries out to the cache/memory.
- Full barrier: the most expensive of the barriers; it combines the effects of the others.
Volatile owes its visibility to cache coherence and bus locking. The lock prefix also acts as a full barrier, which is how volatile guarantees ordering.
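These guarantees are exactly what makes the classic volatile-flag publication pattern safe. A minimal sketch (the class and field names are mine): the volatile write to `done` acts as the write barrier, and the volatile read of it acts as the read barrier, so the plain write to `result` is guaranteed visible once the flag is seen:

```java
public class VolatileFlag {
    static volatile boolean done = false;
    static int result = 0;

    static int runOnce() {
        done = false;
        result = 0;
        Thread worker = new Thread(() -> {
            result = 42;   // 1: plain write
            done = true;   // 2: volatile write publishes `result` along with the flag
        });
        worker.start();
        while (!done) {
            Thread.onSpinWait();   // volatile read: sees the flag promptly
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;     // guaranteed 42: the barrier pairing forbids reordering 1 after 2
    }

    public static void main(String[] args) {
        System.out.println(runOnce());  // prints 42
    }
}
```

Without volatile on `done`, the reader might spin forever on a stale flag, or exit the loop yet still read a stale `result`.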
Conclusion
In this article, I described the relationship between cache coherence and volatile, and how volatile achieves visibility and ordering. I hope you got something out of it; my knowledge is limited, so if the article contains mistakes, please don't hesitate to point them out in the comments.
In the next article, I'll introduce the design ideas behind synchronized and ReentrantLock, known as semaphores and monitors in the university textbook Computer Operating Systems.
Finally, if my article helped you, please like and share it! If you're not yet familiar with volatile and CAS, you can check out my first article in this column.