This article comes from the public account yongxing
Original link: mp.weixin.qq.com/s… Author: Yongxing ingenuity
Caches are something we are all familiar with. Most systems exhibit locality of access: some data is used far more often than the rest. By caching that hot data up front we can, on the one hand, improve the throughput of the system and, on the other hand, reduce the request pressure on third-party systems such as databases.
Cache replacement happens when the cache is full and new data needs to be cached: an existing entry must be evicted to make room for the new one. If the replacement policy is poorly designed, we will often fail to find the data we want in the cache and will have to replace entries frequently. The cache then adds little value, or even hurts: a request that originally needed only one call to the external system now also incurs extra reads and writes against the cache.
When evicting data from the cache, we would like to evict data that will not be used again and keep data that will be accessed frequently in the future. The problem is that a cache cannot predict the future. One solution is to approximate the future with LRU (Least Recently Used): data that was accessed recently is more likely to be accessed again.
Conventional LRU algorithm implementation
Common LRU implementations use a hash linked list, which combines a doubly linked list with a hash table.
On a lookup, the hash table tells us in O(1) whether a key is in the cache (the linked list) and lets us read its value; after each access, the entry is moved to the head of the list. As a result, data that has not been accessed recently sits at the tail of the list. When a replacement happens, the entry at the tail is deleted and the new entry is written at the head.
Why a doubly linked list? What is wrong with a singly linked list? The doubly linked list is what lets us move an entry to the head of the list in O(1). Moving an entry within the list amounts to deleting its node and re-inserting it at the head, and to delete a node we need to know both its predecessor and its successor so that they can be linked together. With a singly linked list we would have to traverse the list to find the predecessor, which clearly does not meet the requirement.
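To make the structure concrete, here is a minimal Python sketch of the hash-table-plus-doubly-linked-list LRU described above. The class and method names (LRUCache, _push_front, and so on) are illustrative, not taken from any particular library.

```python
class Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                       # key -> Node, O(1) lookup
        self.head = Node()                  # sentinel on the most-recently-used side
        self.tail = Node()                  # sentinel on the least-recently-used side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node: Node) -> None:
        # O(1) removal is exactly why the list must be doubly linked:
        # we already hold both neighbours via the prev/next pointers.
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node: Node) -> None:
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        self._unlink(node)                  # move the hit entry to the head
        self._push_front(node)
        return node.value

    def put(self, key, value) -> None:
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:  # cache full: evict from the tail
            victim = self.tail.prev
            self._unlink(victim)
            del self.map[victim.key]
        node = Node(key, value)
        self.map[key] = node
        self._push_front(node)
```

The `_unlink` helper is where the doubly linked list earns its keep: removing a node needs both neighbours, and both are one pointer away.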
Redis's approximate LRU implementation
The LRU implementation above uses a doubly linked list to record how recently data was accessed, and every list node has to maintain a predecessor pointer and a successor pointer. When the cache is large, the extra memory taken by these two pointers is not negligible. To save memory, Redis therefore implements an approximate LRU replacement algorithm based on sampling.
The cached data is still managed by a hash table, so an entry can be found quickly by key. Besides the key and value, each entry also stores a timestamp of its last access (last_read_time). When a replacement is needed, N entries are sampled at random and the one that has gone unaccessed the longest is evicted.
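Below is a rough Python sketch of this sampling idea, assuming a plain dict of key -> (value, last_read_time) pairs. Redis itself samples entries from its internal dict in place rather than materializing a key list, so treat this only as an illustration of the policy; the names are made up for the example.

```python
import random
import time


class SampledLRUCache:
    def __init__(self, capacity: int, sample_size: int = 5):
        self.capacity = capacity
        self.sample_size = sample_size
        self.data = {}  # key -> (value, last_read_time)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, _ = entry
        self.data[key] = (value, time.monotonic())  # refresh last_read_time
        return value

    def put(self, key, value) -> None:
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict_one()
        self.data[key] = (value, time.monotonic())

    def _evict_one(self) -> None:
        # Randomly sample N keys and evict the one idle for the longest time.
        k = min(self.sample_size, len(self.data))
        sampled = random.sample(list(self.data.keys()), k)
        oldest = min(sampled, key=lambda key: self.data[key][1])
        del self.data[oldest]
```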
Although this scheme may not pick the entry that has gone unaccessed the longest across the whole cache, it is simple to compute and runs in O(1). It can also be optimized further. On each eviction, one of the N sampled entries is removed and N-1 are left behind; those leftovers may have older last-access times than the N entries drawn in the next round of sampling, yet under the basic scheme they are simply forgotten and the older data they represent never gets evicted.
New data: accessed recently, so its last_read_time value is large.
Old data: its last_read_time value is small.
To avoid wasting the effort of earlier samples and make the algorithm evict older data whenever possible, we can additionally maintain a max-heap of size M ordered by last_read_time, so that the newest entry in the heap sits at the top. For each newly sampled entry, if its last_read_time is smaller than that of the heap top (i.e., the sample is older), we remove the top and insert the sample into the heap; if it is larger (the sample is newer), we do nothing. Once all samples have been compared, we evict the oldest entry in the heap. In this way the current samples are combined with the older candidates left over from earlier rounds.
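Here is a sketch of such a candidate pool in Python, assuming last_read_time timestamps as above. The class name EvictionPool and its methods are invented for illustration; this is not Redis's actual eviction-pool structure.

```python
import heapq


class EvictionPool:
    def __init__(self, size: int):
        self.size = size
        self.heap = []  # max-heap on last_read_time, stored as (-last_read_time, key)

    def offer(self, key, last_read_time) -> None:
        """Feed one freshly sampled candidate into the pool."""
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, (-last_read_time, key))
        elif last_read_time < -self.heap[0][0]:
            # The sample is older than the newest candidate in the pool:
            # drop that newest candidate and keep the sample instead.
            heapq.heapreplace(self.heap, (-last_read_time, key))
        # Otherwise the sample is newer than everything kept; ignore it.

    def pop_oldest(self):
        """After all samples have been offered, name the oldest candidate to evict."""
        oldest = max(self.heap)   # largest negated time = smallest last_read_time
        self.heap.remove(oldest)  # O(M) is fine for a small pool
        heapq.heapify(self.heap)
        return oldest[1]
```

During one eviction, each of the N sampled entries is offered to the pool and pop_oldest() then names the key to evict; whatever survives in the pool carries over to the next round, which is exactly how older leftovers from earlier samples stay in play.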
Emulating LRU at the bit level
Both implementations so far rely on a hash table. In some scenarios, however, the cache itself is expensive and costly to operate, and it needs a much more low-level design to make good use of the limited space. For example, the cache inside a CPU is tiny, perhaps only hundreds or thousands of cache lines, yet it sits right next to the CPU, its cost is very high, and the performance demands on it are even higher.
We can abstract this kind of cache as an array of fixed length, with each slot holding one cached entry.
To look up an entry quickly, we can still borrow the idea behind hash tables and design a hash function that maps a key directly to a position in the array. The drawback is just as obvious: once a key has been hashed, its position in the array is fixed, so the entry that gets replaced is also fixed, and there is no way to choose to discard older data. The root of the problem is that each key maps to exactly one position in the array. If we allow a key to map to several positions instead, we can evict the oldest entry among those positions.
Here is one such scheme: after a key hashes to position a, the data may live in any of the N positions starting from a, and these N positions form a selection group. For example, with a total cache capacity of 100 and a group size of 8, a key such as "lru" hashes to some position a, and its selection group is the 8 slots starting from a. When new data needs to be cached, the hash determines the N slots of its selection group, and an older entry among those N is chosen to be replaced by the new data. So how do we choose?
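As a small illustration, the key-to-group mapping could look like the sketch below, where CAPACITY, GROUP_SIZE, and the use of Python's built-in hash are all assumptions for demonstration, and the group is taken as a window that wraps around the end of the array.

```python
CAPACITY = 100   # total number of cache slots
GROUP_SIZE = 8   # slots per selection group


def selection_group(key) -> list[int]:
    start = hash(key) % CAPACITY
    # The group is the GROUP_SIZE consecutive slots starting at `start`,
    # wrapping around the end of the array.
    return [(start + i) % CAPACITY for i in range(GROUP_SIZE)]
```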
It is easy to think of the Redis-style approach: record a last-access timestamp for every cached entry and, on replacement, evict the oldest entry in the selection group. For a CPU cache, however, storing an extra timestamp per entry is rather too expensive.
Simulating LRU with 1 bit
Here is a more "economical" way to simulate LRU replacement: each cache entry sets aside a single LRU bit to record whether it has been accessed. When an entry is accessed, its bit is set to 1, and on replacement we look for an entry whose bit is 0 and replace it. Once every entry in the selection group has its bit set to 1, all of the group's LRU bits are reset to 0.
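A sketch of this 1-bit scheme for a single selection group might look like the following; the class and field names are invented for illustration, and the reset-when-all-ones rule follows the description above.

```python
class OneBitLRUGroup:
    def __init__(self, size: int):
        self.slots = [None] * size  # cached keys (None = empty slot)
        self.bits = [0] * size      # 1 = accessed since the last reset

    def on_access(self, index: int) -> None:
        self.bits[index] = 1
        # Once every entry in the group is marked, reset all the bits
        # so that future accesses can be distinguished again.
        if all(self.bits):
            self.bits = [0] * len(self.bits)

    def choose_victim(self) -> int:
        # Replace any entry whose bit is still 0 (not recently accessed).
        for i, bit in enumerate(self.bits):
            if bit == 0:
                return i
        # Unreachable if on_access resets the bits, but kept for safety.
        self.bits = [0] * len(self.bits)
        return 0
```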
Simulating LRU with a bit tree
Finally, let's look at an even more ingenious way of simulating LRU. For each selection group we build a full binary tree out of a handful of bits, as shown in the figure below. Each node of the tree is a single bit: when the bit is 0 the node's arrow points to its left child, and when it is 1 the arrow points to its right child. In the initial state every bit is 0, i.e., all arrows point left. When a cache entry is read, the arrows on its path that point toward it are flipped. For example, after A is read, the arrows that pointed toward A are flipped, giving the following figure.
Of course, reading an entry does not require flipping the arrows of the entire tree; reading D, for instance, only changes the arrows on its own path.
When a replacement occurs, we start from the root and follow the arrows to find the entry that should be evicted and replaced, and as we walk down we flip every bit on the path, turning 0 into 1 and 1 into 0. For example, writing a new entry "K" produces the figure below. In short, the entries that the tree's arrows currently lead to are the ones accessed longest ago, and they are evicted first.
Is there any requirement on the number of cache entries in a selection group when building the bit-tree simulated LRU?
Yes, it should be a power of two. The search tree is a full binary tree, so the number of leaf nodes is a power of two and each leaf node chooses between two cached entries; the number of entries in the group must therefore also be 2^n, otherwise some entries could never be reached (and discarded) during a replacement.
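Putting the tree scheme together, here is a Python sketch for one selection group; the class name TreePLRU and its methods are illustrative. The bits are stored heap-style, and the power-of-two assertion reflects the answer above.

```python
class TreePLRU:
    def __init__(self, ways: int):
        # The tree only works cleanly when the group size is a power of two.
        assert ways > 1 and ways & (ways - 1) == 0
        self.ways = ways
        # Internal nodes of a full binary tree stored heap-style:
        # node i has children 2*i+1 and 2*i+2.
        # Bit 0 = arrow points to the left child, 1 = points to the right child.
        self.bits = [0] * (ways - 1)

    def on_access(self, way: int) -> None:
        """Flip the arrows on `way`'s path so they all point away from it."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:                # accessed entry is in the left half
                self.bits[node] = 1      # so point the arrow right (away from it)
                node, hi = 2 * node + 1, mid
            else:                        # accessed entry is in the right half
                self.bits[node] = 0      # so point the arrow left (away from it)
                node, lo = 2 * node + 2, mid

    def choose_victim(self) -> int:
        """Follow the arrows to the entry to evict, flipping each bit passed."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            go_right = self.bits[node] == 1
            self.bits[node] ^= 1         # reverse the arrow as we pass through
            if go_right:
                node, lo = 2 * node + 2, mid
            else:
                node, hi = 2 * node + 1, mid
        return lo
```

For an 8-entry group this needs only 7 bits of metadata per group, instead of a full timestamp per entry.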