Preface
In a processor with multiple cores, each core has its own cache, and keeping the contents of those caches consistent is a common problem. MESI is a protocol designed specifically to solve this cache coherence problem. Many processors use the MESI protocol or a variant of it; MESI itself is an extension of the MSI protocol. The MESI protocol uses a write-back policy to update the cache, which further improves performance, but it also introduces an additional risk, one that programs can avoid by using memory barriers.
Introduction to the MESI protocol
The MESI protocol takes its name from the four cache states it describes: M (Modified), E (Exclusive), S (Shared) and I (Invalid). Each state is described below:
State | Description |
---|---|
Modified | The contents of the current cache line are valid. The data has been modified and is inconsistent with memory; it exists only in the current cache |
Exclusive | The contents of the current cache line are valid. The data is consistent with memory and exists only in the current cache |
Shared | The contents of the current cache line are valid. The data is consistent with memory and copies may exist in multiple caches |
Invalid | The contents of the current cache line are invalid |
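To make the four states concrete, here is a minimal sketch (not from the protocol specification) of how a simple cache simulator written in C might label each cache line; the type and constant names are illustrative only:

```c
/* Illustrative only: the four MESI states a simulator might attach to a
 * cache line. Real hardware encodes this in a couple of tag bits. */
typedef enum {
    MESI_MODIFIED,  /* dirty: only copy, differs from main memory       */
    MESI_EXCLUSIVE, /* clean: only copy, matches main memory            */
    MESI_SHARED,    /* clean: matches memory, other caches may hold it  */
    MESI_INVALID    /* the line holds no usable data                    */
} mesi_state_t;
```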
State transitions
The MESI protocol is a state machine; the state of a cache line changes in response to two kinds of events: requests from the processor to its own cache, and requests to the cache that arrive over the bus (observed by snooping):
- PrRd: The processor requests to read a cache block
- PrWr: The processor requests to write a cache block
- BusRd: A snooper request indicates that another processor is requesting to read a cache block
- BusRdX: A snooper request indicates that another processor is requesting to write a cache block that it does not already hold
- BusUpgr: A snooper request indicates that another processor is requesting to write a cache block that it already holds
- Flush: A snooper request indicates that an entire cache block is being written back to main memory
- FlushOpt: A snooper request indicates that an entire cache block is being placed on the bus to supply it to another processor (a cache-to-cache transfer)
The transitions between states are shown in the two tables below, first for processor requests and then for bus requests:
Initial state | Operation | Response |
---|---|---|
Invalid (I) | PrRd | Issue a BusRd on the bus. Other caches see the BusRd and check whether they hold a valid copy. If another cache has a copy, it supplies the data and the line moves to Shared (S); otherwise the data is fetched from memory and the line moves to Exclusive (E) |
Invalid (I) | PrWr | Issue a BusRdX on the bus. Other caches invalidate their copies; the data is fetched, the write is performed, and the line moves to Modified (M) |
Exclusive (E) | PrRd | Read hit. No bus transaction; the line stays in Exclusive (E) |
Exclusive (E) | PrWr | Write hit. No bus transaction; the line moves to Modified (M) |
Shared (S) | PrRd | Read hit. No bus transaction; the line stays in Shared (S) |
Shared (S) | PrWr | Issue a BusUpgr on the bus. Other caches invalidate their copies; the line moves to Modified (M) |
Modified (M) | PrRd | Read hit. No bus transaction; the line stays in Modified (M) |
Modified (M) | PrWr | Write hit. No bus transaction; the line stays in Modified (M) |
Initial state | Operation | Response |
---|---|---|
Invalid (I) | BusRd | The block is not held here; no action, the line stays Invalid (I) |
Invalid (I) | BusRdX/BusUpgr | No action; the line stays Invalid (I) |
Exclusive (E) | BusRd | Move to Shared (S); put a FlushOpt on the bus to supply the block to the requesting cache |
Exclusive (E) | BusRdX | Move to Invalid (I); put a FlushOpt on the bus to supply the block to the requesting cache |
Shared (S) | BusRd | Stay in Shared (S); a FlushOpt may be put on the bus to supply the block |
Shared (S) | BusRdX | Move to Invalid (I); a FlushOpt may be put on the bus to supply the block |
Modified (M) | BusRd | Move to Shared (S); put a Flush on the bus, writing the modified block back to main memory (it can also be supplied to the requester) |
Modified (M) | BusRdX | Move to Invalid (I); put a Flush on the bus, writing the modified block back to main memory before the requester takes ownership |
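The processor-request half of the transition table can be sketched as a small C function. This is purely illustrative: the enum, the event names, and helpers such as issue_bus_rd are hypothetical stand-ins for what real coherence hardware does.

```c
#include <stdbool.h>

typedef enum { MESI_MODIFIED, MESI_EXCLUSIVE, MESI_SHARED, MESI_INVALID } mesi_state_t;
typedef enum { PR_RD, PR_WR } pr_event_t;

/* Hypothetical bus helpers; in hardware these are real bus transactions. */
static bool issue_bus_rd(void)   { return false; } /* true if another cache holds the block */
static void issue_bus_rdx(void)  {}                /* read-for-ownership, invalidates others */
static void issue_bus_upgr(void) {}                /* invalidate other copies, no data needed */

/* Processor-side MESI transition for one cache line (sketch of the table above). */
static mesi_state_t on_processor_event(mesi_state_t s, pr_event_t ev)
{
    switch (s) {
    case MESI_INVALID:
        if (ev == PR_RD)                     /* read miss: S if shared elsewhere, else E */
            return issue_bus_rd() ? MESI_SHARED : MESI_EXCLUSIVE;
        issue_bus_rdx();                     /* write miss: take ownership */
        return MESI_MODIFIED;
    case MESI_EXCLUSIVE:                     /* hits, no bus traffic */
        return (ev == PR_WR) ? MESI_MODIFIED : MESI_EXCLUSIVE;
    case MESI_SHARED:
        if (ev == PR_WR) {
            issue_bus_upgr();                /* other caches drop their copies */
            return MESI_MODIFIED;
        }
        return MESI_SHARED;                  /* read hit */
    case MESI_MODIFIED:                      /* read/write hits, no bus traffic */
        return MESI_MODIFIED;
    }
    return s;
}
```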
Introduction of memory barriers
MESI's design is straightforward, but two situations hurt performance. First, when writing to a cache line that is in the Invalid state, the CPU must first fetch the latest data from another CPU's cache or even from memory. Second, when writing to a line held by other CPUs, it must wait for those CPUs to acknowledge the invalidation of their copies. Both operations take a long time, and if the CPU simply stalls waiting for them, that time is wasted.
Store buffer
To reduce the latency of writing to an Invalid cache line, a Store Buffer can be introduced. Since the write is going to happen anyway, the CPU sends the invalidate message to the other CPUs, records the write in its Store Buffer, and only applies it to the cache line (and eventually to memory) after the other CPUs have acknowledged the invalidation.
This avoids waiting for the other CPUs' acknowledgements, but it also means the update does not reach the cache immediately. Therefore, when the CPU needs to read data, it must first check whether its Store Buffer holds a newer value; this mechanism is called Store Forwarding. Note that the contents of a CPU's own Store Buffer are visible only to that CPU: the pending changes cannot be seen by other CPUs.
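As a sketch of the reordering this allows (the data/flag variables and the scenario are illustrative, not from the article): a CPU that writes a value and then sets a flag may have the flag become visible to other CPUs first, because the earlier store is still sitting in its Store Buffer. In C11 terms, relaxed atomics permit exactly this outcome:

```c
#include <stdatomic.h>

atomic_int data = 0;
atomic_int flag = 0;

/* Runs on CPU 0. With no barrier, the store to `data` may still be sitting
 * in CPU 0's store buffer while the store to `flag` has already reached the
 * cache, so another CPU can observe flag == 1 while data is still 0. */
void writer(void)
{
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    atomic_store_explicit(&flag, 1,  memory_order_relaxed);
}
```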
Invalidate queue
When a CPU receives a message asking it to invalidate a cache line, the expected behavior would be to invalidate the line immediately. Instead, the CPU sends the acknowledgement right away and places the invalidate operation into an Invalidate Queue, to be executed at some convenient later time (not necessarily immediately). The Invalidate Queue is needed because invalidation is expensive: the CPU has to discard the cache line, which lowers its cache hit ratio. Deferring the work improves CPU performance, but it also means stale data may linger in the cache.
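The Invalidate Queue creates the mirror-image problem on the reading side. Continuing the illustrative data/flag example from above, the reader can observe the flag yet still load a stale value:

```c
#include <stdatomic.h>

extern atomic_int data;  /* written by the writer sketched above */
extern atomic_int flag;

/* Runs on CPU 1. Even after it sees flag == 1, the invalidation of the line
 * holding `data` may still be queued in CPU 1's invalidate queue, so the
 * load of `data` can return the stale value 0. */
int reader(void)
{
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;  /* spin until the writer's flag becomes visible */
    return atomic_load_explicit(&data, memory_order_relaxed);  /* may still be 0 */
}
```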
The memory barrier
The problems introduced by the Store Buffer and Invalidate Queue optimizations are solved with memory barriers. Memory barriers put the fix in the programmer's hands: the programmer inserts them where the problems described above must be avoided.
Memory barriers are divided into write barriers and read barriers, and they can be placed wherever the program needs them. A write barrier forces the CPU to drain its Store Buffer: all pending changes are written to the cache, and from there to memory, so they become visible to other CPUs. A read barrier forces the CPU to process all pending invalidate operations in its Invalidate Queue, invalidating the affected lines in its own cache so that subsequent reads obtain the latest data from memory or from other CPUs.
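Putting both halves together, here is a self-contained sketch that uses C11 fences as stand-ins for the hardware barriers (in the Linux kernel the corresponding primitives would be something like smp_wmb() and smp_rmb()): the release fence on the writer plays the role of the write barrier, and the acquire fence on the reader plays the role of the read barrier.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>

atomic_int data = 0;
atomic_int flag = 0;

void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    /* Write barrier: earlier stores become visible before later stores
     * (conceptually, the store buffer is drained first). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

void *reader(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;  /* wait until the writer's flag is visible */
    /* Read barrier: pending invalidations are applied before later loads
     * (conceptually, the invalidate queue is drained first). */
    atomic_thread_fence(memory_order_acquire);
    assert(atomic_load_explicit(&data, memory_order_relaxed) == 42);
    return NULL;
}

int main(void)
{
    pthread_t w, r;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}
```

Compile with -pthread. On a strongly ordered architecture such as x86 these fences may compile down to little more than compiler barriers, while on weaker architectures they emit real barrier instructions; in both cases the assertion can no longer fire.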
Other
At first glance, the MESI protocol looks closely related to the Java memory model and the volatile keyword in Java, but more on that later.