This article refers to:

  • Courses.cs.washington.edu/courses/cse…
  • Cache performance measurement and metric
  • Cache placement policies

The article

I’m sure you’re already familiar with this (feel free to close the page if you’re hearing about it for the first time :)). The closer a level of storage is to the CPU, the faster it is, but also the smaller and more expensive. Making efficient use of the cache (especially L1) is a very important part of operating system design.

In the last article, we talked about how cores in the same CPU can compete with each other for the last-level cache, which can hurt system performance. In a UMA architecture, if multiple threads are running on different cores of the same CPU, threads can be migrated from one core to another to alleviate last-level cache contention and improve speed. (We also explained why this approach does not work on NUMA architectures; see the previous article for the reasons.)

The reason last-level cache contention affects performance is straightforward. Because the cache size is limited, a thread that occupies a portion of the cache is likely to evict data belonging to other threads. Those evictions show up as a kind of cache miss, and the cache miss rate is an important metric for measuring cache efficiency.

This article will cover some of the basics of cache first and then go further.

Direct Mapped Cache

In the figure below, main memory has 16 bytes and cache has 4 bytes.

Addresses 0, 4, 8, and 12 in main memory map to location 0 in the cache; 1, 5, 9, and 13 map to location 1, and so on. Yes, it’s just a modulo operation. That answers our first question.
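As a quick illustration, here is a minimal C sketch of that modulo mapping, assuming the 16-byte main memory and the 4-block cache from the figure (the helper name is mine):

```c
#include <stdio.h>

#define NUM_BLOCKS 4                       /* the cache holds 4 one-byte blocks */

/* Direct mapping: a memory address always lands in slot (address mod NUM_BLOCKS). */
static int cache_index(int address) {
    return address % NUM_BLOCKS;
}

int main(void) {
    /* 0, 4, 8, 12 -> index 0; 1, 5, 9, 13 -> index 1; and so on. */
    for (int addr = 0; addr < 16; addr++)
        printf("memory address %2d -> cache index %d\n", addr, cache_index(addr));
    return 0;
}
```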

This method is called a direct mapped cache. It is the simplest mapping method, but not the most efficient; there are several other placement policies, which you can read about here: en.wikipedia.org/wiki/Cache_…

For the more complex schemes, see the comparison chart on Wikipedia; the original address is: en.wikipedia.org/wiki/Cache_…

Read data from the cache

Using the method above, it is easy to know where a byte of main memory is located in the cache, but remember that many different main memory blocks map to the same location in the cache.

You can store the highest bits of the main memory address in the cache as a tag, as shown in the following figure:

For example, if the index of an entry in the cache is 01 and its tag is 11, the corresponding main memory address is the tag followed by the index: 1101.
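In code, splitting an address into a tag and an index is just bit slicing. A hedged sketch, assuming the 4-bit addresses and the 2-bit index of this example (the function names are mine):

```c
#define INDEX_BITS 2                                    /* 4 cache blocks -> 2 index bits */
#define INDEX_MASK ((1 << INDEX_BITS) - 1)

static int index_of(int address) { return address & INDEX_MASK;  }  /* low bits: cache index */
static int tag_of(int address)   { return address >> INDEX_BITS; }  /* high bits: tag        */

/* Going the other way: tag 0b11 and index 0b01 give address 0b1101 = 13. */
static int address_of(int tag, int index) { return (tag << INDEX_BITS) | index; }
```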

In addition, a valid bit is added to indicate whether the cache block is valid.

  • At first, the cache is empty, so all the valid bits are 0.
  • When data is loaded into a block, its valid bit becomes 1.

What happens when a cache hit occurs?

We already know how to locate data in the cache using a main memory address. If, as we hope, we do find the data in the cache, what happens next?

When the CPU needs to read data from memory, the address is sent to the cache controller (a small C sketch follows the diagram):

  • The lowest bits of the address select a block in the cache
  • If the valid bit is 1 and the tag bits match, it is a cache hit and the data can be returned to the CPU

The diagram below:
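Here is a small C sketch of that hit check. It reuses index_of and tag_of from the snippet above; the cache_line structure and the function name are my own assumptions, not something defined in the original:

```c
#include <stdbool.h>
#include <stdint.h>

struct cache_line {
    bool    valid;   /* is there anything in this block? */
    int     tag;     /* high bits of the cached address  */
    uint8_t data;    /* the cached byte                  */
};

static struct cache_line cache[4];                        /* 4-block direct mapped cache */

/* Returns true on a hit and writes the byte to *out; on a miss the CPU
 * has to go to main memory (see the next section). */
static bool cache_read(int address, uint8_t *out) {
    struct cache_line *line = &cache[index_of(address)];  /* low bits pick the block */
    if (line->valid && line->tag == tag_of(address)) {    /* valid and tags match    */
        *out = line->data;                                /* hit: hand the data back */
        return true;
    }
    return false;                                         /* miss */
}
```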

What happens when a cache miss occurs?

If a cache miss occurs, the CPU must first load the data from main memory into the cache. Once the data has been fetched from main memory, it is loaded into the cache with the following steps (a C sketch follows the figure):

  • The low k bits specify the cache block location (the index)
  • The high (m - k) bits become the tag of the cache block
  • The data from main memory is copied into the data field of the cache block
  • The valid bit is set to 1

As shown in the figure:
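Continuing the same sketch, the miss path looks roughly like this (it reuses cache, index_of, and tag_of from the earlier snippets; memory[] is a stand-in for main memory):

```c
static uint8_t memory[16];                                /* 16-byte main memory from the figure */

/* On a miss: fetch the byte from main memory and install it in the cache. */
static uint8_t cache_fill(int address) {
    struct cache_line *line = &cache[index_of(address)];  /* low k bits: the block location */
    line->tag   = tag_of(address);                        /* high (m - k) bits: the tag     */
    line->data  = memory[address];                        /* copy the data from main memory */
    line->valid = true;                                   /* mark the block valid           */
    return line->data;
}
```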

More about cache misses

First, the performance difference between a cache miss and a cache hit is an order of magnitude, so the probability of cache misses should be minimized. There are many causes of cache misses; the two main ones are listed below, and a detailed list can be found at: en.wikipedia.org/wiki/Cache_…

  • Unavoidable (compulsory) misses: the data is not in the cache the first time it is accessed, so it must be loaded from main memory (unless you try prefetching: en.wikipedia.org/wiki/Cache_…).
  • The data was already in the cache, but was evicted by another program. (Recall the last article about last-level cache contention, which can be mitigated by scheduling.)
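A common way to put a number on "misses should be minimized" is the standard average memory access time (AMAT) formula; this snippet and its toy numbers are mine, purely for illustration:

```c
/* Average memory access time: AMAT = hit time + miss rate * miss penalty. */
static double amat(double hit_time_ns, double miss_rate, double miss_penalty_ns) {
    return hit_time_ns + miss_rate * miss_penalty_ns;
}

/* Example: amat(1.0, 0.05, 100.0) == 6.0 ns. Even a 5% miss rate makes the
 * average access six times slower than a pure hit. */
```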

Spatial locality

So far, each byte has been its own block. We assume that if you access an address, the next access will most likely be to a nearby address. This assumption holds in most cases and is called spatial locality. How can we take advantage of it? By increasing the size of each block.

Now set the size of each cache block to 2 bytes, so we can load two bytes at once. When we load location 12 into the cache, we also load location 13 into the cache.

In this case, main memory can also be divided into blocks of two bytes. The mapping between byte addresses and block addresses is also simple: byte addresses 0 and 1 both correspond to block address 0, and so on.
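A tiny C sketch of that byte-address-to-block-address mapping with 2-byte blocks (the helper names are mine):

```c
#define BLOCK_SIZE 2                                                               /* two bytes per block */

static int block_address(int byte_address) { return byte_address / BLOCK_SIZE; }   /* 12 and 13 -> block 6    */
static int block_offset (int byte_address) { return byte_address % BLOCK_SIZE; }   /* which byte in the block */
```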

The process of loading into the cache is the same as before, so there is nothing new to explain, except that bytes 12 and 13 are now loaded from main memory at the same time.

Conclusion

Cache resources have a great influence on the performance of the whole system, and there are endless papers on how to design caches and how to schedule cache resources. The direct mapped cache described in this article is the simplest design, but understanding it goes a long way toward understanding caches in general. We’ll learn more about caches later.

Finally, the customary Gakki picture asking for a like.

Feel free to follow my WeChat official account. This article is also available on GitHub: github.com/liaochangji…