Preface

In a processor with multiple cores, each core has its own cache, and keeping the contents of those caches consistent is a problem that is easy to run into. MESI is a protocol designed specifically to solve this cache-coherence problem. Many processors use the MESI protocol or a variant of it; MESI itself is a variant of the simpler MSI protocol. The MESI protocol uses a write-back strategy to update the cache, which further improves performance but also introduces an additional risk, which programs can guard against by using memory barriers.

Introduction to the MESI protocol

The MESI protocol takes its name from the four cache-line states it describes: M (Modified), E (Exclusive), S (Shared) and I (Invalid). Each state is described in the table below (a small code sketch follows the table):

State: Description
Modified: The cache line is valid. The data has been modified and differs from the data in main memory. The data exists only in this cache.
Exclusive: The cache line is valid. The data matches the data in main memory. The data exists only in this cache.
Shared: The cache line is valid. The data matches the data in main memory. The data exists in multiple caches.
Invalid: The cache line does not contain valid data.
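
To make the states concrete in code, here is a minimal C sketch; the type and constant names are illustrative, not taken from any real implementation:

    /* The four MESI states of a cache line (illustrative names). */
    typedef enum {
        MESI_INVALID,   /* the line holds no valid data                          */
        MESI_SHARED,    /* valid, matches memory, may also exist in other caches */
        MESI_EXCLUSIVE, /* valid, matches memory, present only in this cache     */
        MESI_MODIFIED   /* valid, dirty (differs from memory), present only here */
    } mesi_state_t;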

State transitions

The MESI protocol is a state machine whose cache-line state transitions are triggered by two kinds of external events: requests from the processor to its cache, and requests snooped from the bus (a matching code sketch follows the list):

  1. PrRd: The processor requests to read a cache block
  2. PrWr: The processor requests to write a cache block
  3. BusRd: A snooped request indicating that another processor is requesting to read a cache block
  4. BusRdX: A snooped request indicating that another processor is requesting to write a cache block it does not already hold
  5. BusUpgr: A snooped request indicating that another processor is requesting to write a cache block it already holds
  6. Flush: A snooped request indicating that an entire cache block is being written back to main memory
  7. FlushOpt: A snooped request indicating that an entire cache block is being put on the bus to be sent to another processor (cache-to-cache transfer)
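
The seven events can be written down the same way; this sketch continues the illustrative naming above and is reused by the transition sketches that follow the tables:

    /* Processor-side and snooped bus-side events (illustrative names). */
    typedef enum {
        EV_PR_RD,    /* PrRd: processor read                                     */
        EV_PR_WR,    /* PrWr: processor write                                    */
        EV_BUS_RD,   /* BusRd: another processor reads the block                 */
        EV_BUS_RDX,  /* BusRdX: another processor writes a block it doesn't hold */
        EV_BUS_UPGR, /* BusUpgr: another processor writes a block it holds       */
        EV_FLUSH,    /* Flush: a block is written back to main memory            */
        EV_FLUSH_OPT /* FlushOpt: a block is put on the bus (cache-to-cache)     */
    } mesi_event_t;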

The transitions between the states are listed in the two tables below; a small code sketch follows each table.

State transitions resulting from processor operations
Initial state / Operation / Response
Invalid (I), PrRd:
  • A BusRd signal is sent on the bus
  • Other processors see the BusRd, check whether they hold a valid copy of the data, and notify the requesting cache
  • If another cache holds a valid copy, the state transitions to Shared (S)
  • If no other cache holds a valid copy, the state transitions to Exclusive (E)
  • If another cache holds a valid copy, one of those caches supplies the data; otherwise the data is fetched from main memory
Invalid (I), PrWr:
  • A BusRdX signal is sent on the bus
  • The state transitions to Modified (M)
  • If another cache holds a valid copy, one of those caches supplies the data; otherwise the data is fetched from main memory
  • The BusRdX signal causes any valid copies in other caches to be invalidated
  • The modified value is written into the cache block
Exclusive (E), PrRd:
  • No bus transaction is generated
  • The state is unchanged
  • The read is a cache hit
Exclusive (E), PrWr:
  • No bus transaction is generated
  • The state transitions to Modified (M)
  • The modified value is written into the cache block
Shared (S), PrRd:
  • No bus transaction is generated
  • The state is unchanged
  • The read is a cache hit
Shared (S), PrWr:
  • A BusUpgr signal is sent on the bus
  • The state transitions to Modified (M)
  • Other caches see the BusUpgr signal and mark their copies as Invalid (I)
Modified (M), PrRd:
  • No bus transaction is generated
  • The state is unchanged
  • The read is a cache hit
Modified (M), PrWr:
  • No bus transaction is generated
  • The state is unchanged
  • The write is a cache hit
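
As a rough sketch of the processor-side half of the table above (reusing the two enums defined earlier), the function below only does the state bookkeeping; *bus_out names the signal that would be put on the bus, and the whole thing is a simplified model rather than real cache-controller logic:

    /* Processor-side transitions (PrRd / PrWr), following the table above.
     * *bus_out receives the bus signal this cache would send, or -1 for none.
     * other_copy_exists: whether another cache reported a valid copy. */
    static mesi_state_t on_processor_event(mesi_state_t s, mesi_event_t ev,
                                           int other_copy_exists, int *bus_out)
    {
        *bus_out = -1;                                  /* default: no bus transaction */
        switch (s) {
        case MESI_INVALID:
            if (ev == EV_PR_RD) {                       /* read miss: send BusRd */
                *bus_out = EV_BUS_RD;
                return other_copy_exists ? MESI_SHARED : MESI_EXCLUSIVE;
            }
            if (ev == EV_PR_WR) {                       /* write miss: send BusRdX */
                *bus_out = EV_BUS_RDX;                  /* other copies become Invalid */
                return MESI_MODIFIED;
            }
            break;
        case MESI_EXCLUSIVE:
            if (ev == EV_PR_RD) return MESI_EXCLUSIVE;  /* hit, no bus traffic */
            if (ev == EV_PR_WR) return MESI_MODIFIED;   /* hit, no bus traffic */
            break;
        case MESI_SHARED:
            if (ev == EV_PR_RD) return MESI_SHARED;     /* hit, no bus traffic */
            if (ev == EV_PR_WR) {                       /* hit, but others must be invalidated */
                *bus_out = EV_BUS_UPGR;
                return MESI_MODIFIED;
            }
            break;
        case MESI_MODIFIED:
            return MESI_MODIFIED;                       /* reads and writes both hit */
        }
        return s;                                       /* unhandled event: no change */
    }

For example, calling on_processor_event(MESI_INVALID, EV_PR_RD, 1, &sig) returns MESI_SHARED and sets sig to EV_BUS_RD, matching the first row of the table.
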
State transitions resulting from bus operations
Initial state / Operation / Response
Invalid (I), BusRd:
  • The state is unchanged; the signal is ignored
Invalid (I), BusRdX/BusUpgr:
  • The state is unchanged; the signal is ignored
Exclusive (E), BusRd:
  • The state transitions to Shared (S)
  • A FlushOpt signal is put on the bus together with the contents of the block
Exclusive (E), BusRdX:
  • The state transitions to Invalid (I)
  • A FlushOpt signal is put on the bus together with the contents of the block
Shared (S), BusRd:
  • The state remains Shared (S)
  • A FlushOpt signal, together with the contents of the block, may be put on the bus (which of the sharing caches supplies the data is decided at design time)
Shared (S), BusRdX:
  • The state transitions to Invalid (I)
  • A FlushOpt signal, together with the contents of the block, may be put on the bus (which of the sharing caches supplies the data is decided at design time)
Modified (M), BusRd:
  • The state transitions to Shared (S)
  • A FlushOpt signal is put on the bus together with the contents of the block; the receivers are the cache that issued the BusRd and the main memory controller, which writes the block back to main memory
Modified (M), BusRdX:
  • The state transitions to Invalid (I)
  • A FlushOpt signal is put on the bus together with the contents of the block; the receivers are the cache that issued the BusRdX and the main memory controller, which writes the block back to main memory
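
The snooping side of the second table can be sketched in the same style; *supplies_block is set when this cache would put a FlushOpt (with the block contents) on the bus, and again this is only a simplified model:

    /* Bus-side (snooped) transitions for BusRd / BusRdX, following the table above.
     * *supplies_block is set to 1 if this cache would issue FlushOpt with the data. */
    static mesi_state_t on_bus_event(mesi_state_t s, mesi_event_t ev, int *supplies_block)
    {
        *supplies_block = 0;
        switch (s) {
        case MESI_INVALID:
            return MESI_INVALID;                        /* BusRd/BusRdX/BusUpgr are ignored */
        case MESI_EXCLUSIVE:
            *supplies_block = 1;                        /* FlushOpt: put the block on the bus */
            return (ev == EV_BUS_RD) ? MESI_SHARED : MESI_INVALID;
        case MESI_SHARED:
            *supplies_block = 1;                        /* may supply data: a design-time choice */
            return (ev == EV_BUS_RD) ? MESI_SHARED : MESI_INVALID;
        case MESI_MODIFIED:
            *supplies_block = 1;                        /* block is also written back to memory */
            return (ev == EV_BUS_RD) ? MESI_SHARED : MESI_INVALID;
        }
        return s;
    }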

Introducing memory barriers

MESI’s design is straightforward, but it hurts performance in two places. First, when writing to a cache line in the Invalid state, the CPU has to fetch the latest data from another CPU or even from memory. Second, invalidating a cache line held by other caches requires waiting for the other CPUs to acknowledge the invalidation. Both operations are time consuming, and having the CPU sit idle while they complete is wasteful.

Store buffer

To reduce the latency of writing to a cache line that must first be invalidated elsewhere, a store buffer can be introduced. Since the write is going to happen anyway, the CPU notifies the other CPUs that their copies are invalid, records the write in the store buffer, and writes the result out only after the other CPUs acknowledge the invalidation.

This avoids the time spent waiting for the other CPUs to confirm the update, but it also means the update is not written to the cache right away. Therefore, when the CPU reads, it must first check whether the store buffer holds the required data; this mechanism is called store forwarding. Note that when a CPU reads from or writes to its own store buffer, the corresponding data changes are not visible to other CPUs.
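
The following C sketch illustrates the idea of a store buffer with store forwarding. It is a purely conceptual model (a tiny fixed-size array of pending address/value pairs), not how real hardware is organized:

    #include <stdint.h>
    #include <stddef.h>

    /* A write that has been issued but has not reached the cache yet. */
    struct pending_store { uintptr_t addr; uint32_t value; int valid; };

    #define SB_SIZE 8
    static struct pending_store store_buffer[SB_SIZE];

    /* CPU write: record the store locally; invalidate messages go out to the
     * other CPUs, and the cache is updated only after their acks arrive. */
    static void cpu_store(uintptr_t addr, uint32_t value)
    {
        for (size_t i = 0; i < SB_SIZE; i++) {
            if (!store_buffer[i].valid) {
                store_buffer[i] = (struct pending_store){ addr, value, 1 };
                return;
            }
        }
        /* Buffer full: a real CPU would stall here until entries drain. */
    }

    /* CPU read with store forwarding: a pending entry in the store buffer takes
     * precedence over whatever value the cache currently holds for that address. */
    static uint32_t cpu_load(uintptr_t addr, uint32_t cached_value)
    {
        for (size_t i = 0; i < SB_SIZE; i++) {
            if (store_buffer[i].valid && store_buffer[i].addr == addr)
                return store_buffer[i].value;           /* forwarded from the buffer */
        }
        return cached_value;                            /* nothing pending: use the cache */
    }

Because the forwarded value lives only in this CPU's store buffer, another CPU reading the same address at this point would still see the old data, which is exactly the visibility gap described above.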

Invalidate queue

When a CPU receives a message invalidating a cache line, the expected behavior would be for the CPU to invalidate the line immediately. Instead, the CPU sends an acknowledgement right away and adds the invalidate operation to an invalidate queue, to be executed at a convenient time (not necessarily immediately). The invalidate queue exists because invalidation is expensive: the CPU has to discard the cache line, which lowers the cache hit ratio. Deferring the work improves CPU performance, but it also means stale data may remain in the cache for a while.
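
A matching sketch of the invalidate queue, with the same caveats (the two helper functions are hypothetical placeholders for hardware actions):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical placeholders for what the hardware would actually do. */
    static void send_invalidate_ack(void)          { /* acknowledge on the interconnect */ }
    static void invalidate_cache_line(uintptr_t a) { (void)a; /* discard the cache line */ }

    #define IQ_SIZE 16
    static uintptr_t invalidate_queue[IQ_SIZE];     /* addresses waiting to be invalidated */
    static size_t iq_count;

    /* On receiving an invalidate message: acknowledge immediately and queue the
     * work instead of invalidating the cache line right away. */
    static void on_invalidate_message(uintptr_t addr)
    {
        if (iq_count < IQ_SIZE)
            invalidate_queue[iq_count++] = addr;
        send_invalidate_ack();
    }

    /* Drained later (for example by a read barrier): only now are lines discarded,
     * so until this runs the cache may still return stale data for these addresses. */
    static void drain_invalidate_queue(void)
    {
        for (size_t i = 0; i < iq_count; i++)
            invalidate_cache_line(invalidate_queue[i]);
        iq_count = 0;
    }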

Memory barriers

The problems introduced by the store buffer and invalidate queue optimizations are solved with memory barriers. Memory barriers put the fix in the hands of the programmer, who inserts them to work around the problems described above.

Memory barriers are classified as write barriers and read barriers, and they can be inserted wherever the program needs them. A write barrier forces the CPU to drain the store buffer: all pending changes are written to the cache and then to memory, making them visible to other CPUs. A read barrier forces the CPU to perform all pending operations in the invalidate queue, invalidating its own stale cache lines, so that subsequent reads obtain the latest data from memory or from other CPUs.
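
As a usage illustration, here is a minimal C11 message-passing sketch: the writer publishes data and then sets a flag, with a release fence playing the write-barrier role ("drain the store buffer first"), and the reader spins on the flag and then uses an acquire fence playing the read-barrier role ("process the invalidate queue first") before reading the data. It maps the barriers above onto standard C11 fences rather than onto any particular hardware instruction:

    #include <stdatomic.h>
    #include <stdbool.h>

    static int payload;                              /* ordinary shared data */
    static atomic_bool ready;                        /* publication flag     */

    /* Writer: make the data visible before the flag becomes visible. */
    static void producer(void)
    {
        payload = 42;
        atomic_thread_fence(memory_order_release);   /* write barrier: drain the store buffer   */
        atomic_store_explicit(&ready, true, memory_order_relaxed);
    }

    /* Reader: wait for the flag, then make sure the data read is fresh. */
    static int consumer(void)
    {
        while (!atomic_load_explicit(&ready, memory_order_relaxed))
            ;                                        /* spin until the flag is published         */
        atomic_thread_fence(memory_order_acquire);   /* read barrier: drain the invalidate queue */
        return payload;                              /* now guaranteed to observe 42             */
    }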

Other

At first glance, the MESI protocol looks closely related to the Java memory model and the volatile keyword, but more on that later.