The full name of LRU is Least Recently Used, and it is widely used in caching mechanisms. When the space used by a cache reaches its upper limit, part of the existing data has to be evicted to keep the cache usable, and choosing which data to evict is the job of the LRU algorithm.
The basic idea of the LRU algorithm is temporal locality, one aspect of the principle of locality:
If an information item is being accessed, it is likely to be accessed again in the near future.
So, as the name implies, the LRU algorithm selects the least recently used data for elimination.
The principle of LRU
Generally speaking, LRU keeps the data, together with its access order or access time, in a container. When a piece of data is accessed:
- If the data is not in the container, give it the highest priority and add it to the container.
- If the data is already in the container, raise its priority to the highest.
When the total amount of data reaches the upper limit, the data with the lowest priority is removed from the container.
For example, if we access data in the order 7 0 1 2 0 3 0 4 and the container can hold at most 3 items, the LRU algorithm will evict 7, 1 and 2 in turn.
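This eviction order is easy to verify in code. The sketch below is only an illustration (the class name LruTrace is made up); it uses the JDK's LinkedHashMap, which is discussed in more detail later in this article, to replay the access sequence and record which keys get evicted:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LruTrace {
    public static void main(String[] args) {
        final int capacity = 3;
        List<Integer> evicted = new ArrayList<>();
        // accessOrder = true keeps entries ordered from least to most recently accessed
        Map<Integer, Integer> cache = new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                if (size() > capacity) {
                    evicted.add(eldest.getKey()); // remember which key was thrown out
                    return true;
                }
                return false;
            }
        };
        for (int key : new int[]{7, 0, 1, 2, 0, 3, 0, 4}) {
            if (cache.containsKey(key)) {
                cache.get(key);      // hit: refresh its recency
            } else {
                cache.put(key, key); // miss: insert, possibly evicting the LRU entry
            }
        }
        System.out.println(evicted); // prints [7, 1, 2]
    }
}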
Naive LRU algorithm
Following the principle above, let us now implement a naive LRU algorithm. There are three options:
- Based on an array
Solution: attach an extra attribute, a timestamp, to each piece of data, and update the timestamp to the current time whenever the data is accessed. When the data space is full, scan the entire array and evict the data with the smallest timestamp (see the sketch after this list).
Disadvantages: extra space is needed to maintain the timestamps, and the entire array has to be scanned whenever data is evicted.
- Based on a doubly linked list of finite length
Solution: when a piece of data is accessed, insert it at the head of the list if it is not already there, or move it to the head if it is. When the data space is full, evict the data at the tail of the list.
Disadvantage: inserting or looking up data may require scanning the entire list.
- Based on a doubly linked list plus a hash table
Solution: to fix the list-scanning defect above, a hash table is used to map each piece of data to its node in the linked list, reducing the time complexity of both insert and read operations from O(N) to O(1).
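To make the first option concrete, here is a rough sketch of the array-plus-timestamp scheme; the class name ArrayLruCache and its internal layout are made up for illustration, and the comments mark where the O(N) costs come from:

// Hypothetical sketch of the array + timestamp scheme
public class ArrayLruCache {

    private static class Entry {
        int key;
        int value;
        long stamp; // last-access time, here a logical clock to avoid duplicate timestamps
        Entry(int key, int value, long stamp) { this.key = key; this.value = value; this.stamp = stamp; }
    }

    private final Entry[] entries;
    private int size = 0;
    private long clock = 0;

    public ArrayLruCache(int capacity) {
        this.entries = new Entry[capacity];
    }

    public Integer get(int key) {
        for (int i = 0; i < size; i++) {          // O(N) lookup
            if (entries[i].key == key) {
                entries[i].stamp = ++clock;       // refresh the timestamp on every access
                return entries[i].value;
            }
        }
        return null;
    }

    public void put(int key, int value) {
        for (int i = 0; i < size; i++) {          // O(N) lookup for an existing entry
            if (entries[i].key == key) {
                entries[i].value = value;
                entries[i].stamp = ++clock;
                return;
            }
        }
        if (size == entries.length) {
            evictOldest();                        // O(N) scan for the smallest timestamp
        }
        entries[size++] = new Entry(key, value, ++clock);
    }

    private void evictOldest() {
        int oldest = 0;
        for (int i = 1; i < size; i++) {
            if (entries[i].stamp < entries[oldest].stamp) {
                oldest = i;
            }
        }
        entries[oldest] = entries[--size];        // overwrite the victim with the last slot
        entries[size] = null;
    }
}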
LRU based on a doubly linked list + hash table
Here we implement an LRU cache based on a doubly linked list and a hash table.
public class LRUCache {
private int size; // Current number of entries
private int capacity; // Maximum capacity
private Map<Integer, DoubleQueueNode> map; // Maps keys to nodes in the linked list
private DoubleQueueNode head; // Sentinel head node, avoids null checks
private DoubleQueueNode tail; // Sentinel tail node, avoids null checks
public LRUCache(int capacity) {
this.capacity = capacity;
this.map = new HashMap<>(capacity);
this.head = new DoubleQueueNode(0, 0);
this.tail = new DoubleQueueNode(0, 0);
this.head.next = tail;
this.tail.pre = head;
}
public Integer get(Integer key) {
DoubleQueueNode node = map.get(key);
if (node == null) {
return null;
}
// If the data is in the list, it moves to the head of the list
moveToHead(node);
return node.val;
}
public Integer put(Integer key, Integer value) {
Integer oldValue;
DoubleQueueNode node = map.get(key);
if (node == null) {
// Evict the least recently used entry if the cache is full
eliminate();
// The data is not in the list, insert the data to the header
DoubleQueueNode newNode = new DoubleQueueNode(key, value);
DoubleQueueNode temp = head.next;
head.next = newNode;
newNode.next = temp;
newNode.pre = head;
temp.pre = newNode;
map.put(key, newNode);
size++;
oldValue = null;
} else {
// If the data is in the list, it moves to the head of the list
moveToHead(node);
oldValue = node.val;
node.val = value;
}
return oldValue;
}
public Integer remove(Integer key) {
DoubleQueueNode deletedNode = map.get(key);
if (deletedNode == null) {
return null;
}
deletedNode.pre.next = deletedNode.next;
deletedNode.next.pre = deletedNode.pre;
map.remove(key);
size--;
return deletedNode.val;
}
// Move the node to the head of the list
private void moveToHead(DoubleQueueNode node) {
node.pre.next = node.next;
node.next.pre = node.pre;
DoubleQueueNode temp = head.next;
head.next = node;
node.next = temp;
node.pre = head;
temp.pre = node;
}
private void eliminate() {
if (size < capacity) {
return;
}
// Remove the last node in the list
DoubleQueueNode last = tail.pre;
map.remove(last.key);
last.pre.next = tail;
tail.pre = last.pre;
size--;
last = null;
}
}

// Doubly linked list node
class DoubleQueueNode {
int key;
int val;
DoubleQueueNode pre;
DoubleQueueNode next;
public DoubleQueueNode(int key, int val) {
this.key = key;
this.val = val;
}
}
The code above is essentially a direct translation of the LRU idea described earlier and is fairly simple; the main things to watch are keeping the pre and next pointers consistent and updating the hash table in sync with the list. Both put() and get() run in O(1) time, and the space complexity is O(N).
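As a quick sanity check, the class above can be exercised like this (a hypothetical snippet with capacity 2):

    LRUCache cache = new LRUCache(2);
    cache.put(1, 1);
    cache.put(2, 2);
    cache.get(1);    // key 1 becomes the most recently used entry
    cache.put(3, 3); // the cache is full, so key 2 (the least recently used) is evicted
    cache.get(2);    // returns null
    cache.get(1);    // returns 1
    cache.get(3);    // returns 3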
LRU based on the LinkedHashMap implementation
We can also implement an LRU cache directly with the LinkedHashMap provided by the JDK. Since LinkedHashMap is itself built on a doubly linked list combined with a hash table, it can be used almost as-is.
public class LRUCache extends LinkedHashMap {
private int capacity;
public LRUCache(int capacity) {
// Note that accessOrder of LinkedHashMap is set to true
super(16, 0.75f, true);
this.capacity = capacity;
}
@Override
protected boolean removeEldestEntry(Map.Entry eldest) {
return super.size() > capacity;
}
}
By default LinkedHashMap does not evict data, so we override its removeEldestEntry() method to remove the eldest entry once the number of entries exceeds the preset capacity, and we set accessOrder to true so that entries are ordered by access order. The whole implementation takes only a few lines of code and mainly relies on the features of LinkedHashMap.
Because LinkedHashMap is so convenient, Dubbo's LRU cache, LRUCache, is also implemented on top of it.
Optimization of LRU algorithm
The naive LRU algorithm is already good enough for caching, but it still has shortcomings. When there is a large amount of hot data, the cache hit ratio is high; however, an occasional batch operation can squeeze the hot data out of the container with data that is accessed only once, "polluting" the cache. To eliminate this effect, the following optimizations were developed.
LRU-K
The LRU-K algorithm is an improvement on LRU: the criterion for admitting data into the cache queue is changed from a single access to K accesses. In that sense, the naive LRU algorithm is simply LRU-1.
LRU-K maintains two queues: a cache queue and a data-access history queue. When a piece of data is accessed, its count in the history queue is incremented first; only when the recorded number of accesses exceeds K is the data moved into the cache queue, which keeps the cache queue from being polluted. Data in the history queue is itself evicted according to the LRU rule.
Let’s implement an LRU-K cache:
// Inherit LRUCache directly from what we wrote earlier
public class LRUKCache extends LRUCache {
private int k; // Criteria for entering the cache queue
private LRUCache historyList; // Access data history
public LRUKCache(int cacheSize, int historyCapacity, int k) {
super(cacheSize);
this.k = k;
this.historyList = new LRUCache(historyCapacity);
}
@Override
public Integer get(Integer key) {
// Record the number of data accesses
Integer historyCount = historyList.get(key);
historyCount = historyCount == null ? 0 : historyCount;
historyList.put(key, ++historyCount);
return super.get(key);
}
@Override
public Integer put(Integer key, Integer value) {
if (value == null) {
return null;
}
// Return the cached data if it is already in the cache
if (super.get(key) != null) {
return super.put(key, value);
}
// If the number of historical accesses reaches the upper limit, the data is added to the cache
Integer historyCount = historyList.get(key);
historyCount = historyCount == null ? 0 : historyCount;
if (historyCount >= k) {
// Remove the historical access record
historyList.remove(key);
return super.put(key, value);
}
// Not enough recorded accesses yet, so the data is not cached
return null;
}
}
The above is a simple model without the necessary concurrency control.
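For illustration, here is how the cache above behaves (a hypothetical snippet; the parameters cacheSize = 3, historyCapacity = 10 and k = 2 are chosen arbitrarily):

    LRUKCache cache = new LRUKCache(3, 10, 2);
    cache.put(1, 100); // the history count for key 1 is still 0, so nothing is cached yet
    cache.get(1);      // miss (returns null), but the access is recorded in the history queue
    cache.get(1);      // miss again; the history count for key 1 reaches 2
    cache.put(1, 100); // the count has reached k, so the entry is admitted into the cache
    cache.get(1);      // hit: returns 100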
Generally speaking, the larger K is, the higher the cache hit ratio, but it also makes cached data harder to evict once admitted. Overall, LRU-2 gives the best performance.
Two Queue
Two Queue is a variant of LRU-2 that replaces the data-access history with a FIFO queue. The benefit is obvious: FIFO is simpler and consumes fewer resources, but the cache hit ratio drops somewhat compared with LRU-2.
// Directly inherit LinkedHashMap
public class TwoQueueCache extends LinkedHashMap<Integer, Integer> {
private int k; // Criteria for entering the cache queue
private int historyCapacity; // Maximum size of access data history records
private LRUCache lruCache; // We wrote LRUCache earlier
public TwoQueueCache(int cacheSize, int historyCapacity, int k) {
// Note that accessOrder of LinkedHashMap defaults to false here
super();
this.historyCapacity = historyCapacity;
this.k = k;
this.lruCache = new LRUCache(cacheSize);
}
public Integer get(Integer key) {
// Record data access records
Integer historyCount = super.get(key);
historyCount = historyCount == null ? 0 : historyCount;
super.put(key, historyCount + 1);
return lruCache.get(key);
}
public Integer put(Integer key, Integer value) {
if (value == null) {
return null;
}
// Return the cached data if it is already in the cache
if (lruCache.get(key) != null) {
return lruCache.put(key, value);
}
// If the number of historical accesses reaches the upper limit, the data is added to the cache
Integer historyCount = super.get(key);
historyCount = historyCount == null ? 0 : historyCount;
if (historyCount >= k) {
// Remove historical access records
super.remove(key);
return lruCache.put(key, value);
}
return null;
}
@Override
protected boolean removeEldestEntry(Map.Entry eldest) {
return super.size() > historyCapacity;
}
}
Here we inherit LinkedHashMap directly; accessOrder defaults to false, meaning entries are kept in insertion order, so the map serves as the FIFO queue for the access history. We also override removeEldestEntry() so that the oldest history record is automatically discarded once the history exceeds its limit.
Multi Queue
Compared with the two optimizations above, Multi Queue is considerably more complex to implement. As the name implies, Multi Queue consists of multiple LRU queues, each with its own priority; a piece of data is placed in a queue according to a priority computed from its access count.
- Data insertion and access: when data is inserted for the first time, it is put into Q0, the queue with the lowest priority. When it is accessed again, it moves to the head of its queue according to the LRU rule. Once the priority computed from its access count rises, the data is moved to the head of a higher-priority queue and removed from its original queue; likewise, when its priority drops, it is moved to a lower-priority queue.
- Data eviction: eviction always takes place from the tail of the lowest-priority queue, and the evicted entry is added to the head of the Q-history queue. If data in Q-history is accessed again, its priority is recalculated and it is moved back into the queue with the corresponding priority; otherwise it is eventually dropped entirely according to the LRU rule.
Multi Queue can also be seen as a variant of LRU-K that expands from two queues to multiple queues. The advantage is that admission to and eviction from the cache become more fine-grained, at the cost of extra bookkeeping overhead; a rough sketch follows below.
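No reference implementation is given for Multi Queue, so the following is only a rough sketch under a few assumptions: queue level i holds entries with roughly 2^i recorded accesses, each level is an LRU queue built on LinkedHashMap, demotion of cold data is omitted for brevity, and all class and method names (MultiQueueCache, levelFor and so on) are made up:

import java.util.LinkedHashMap;
import java.util.Map;

public class MultiQueueCache {

    private static class Entry {
        int value;
        int accessCount;
        Entry(int value, int accessCount) { this.value = value; this.accessCount = accessCount; }
    }

    private final int capacity;
    private final LinkedHashMap<Integer, Entry>[] queues; // Q0..Qn, each kept in LRU order
    private final LinkedHashMap<Integer, Entry> history;  // Q-history of evicted entries
    private int size = 0;

    @SuppressWarnings("unchecked")
    public MultiQueueCache(int capacity, int levels, int historyCapacity) {
        this.capacity = capacity;
        this.queues = new LinkedHashMap[levels];
        for (int i = 0; i < levels; i++) {
            queues[i] = new LinkedHashMap<>(16, 0.75f, true); // accessOrder keeps LRU order per level
        }
        this.history = new LinkedHashMap<Integer, Entry>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Entry> eldest) {
                return size() > historyCapacity; // the history queue itself is bounded
            }
        };
    }

    public Integer get(int key) {
        for (int i = 0; i < queues.length; i++) {
            Entry e = queues[i].get(key); // get() also refreshes recency inside this level
            if (e != null) {
                e.accessCount++;
                promoteIfNeeded(key, e, i);
                return e.value;
            }
        }
        return null;
    }

    public void put(int key, int value) {
        for (int i = 0; i < queues.length; i++) { // already cached: update in place
            Entry e = queues[i].get(key);
            if (e != null) {
                e.value = value;
                e.accessCount++;
                promoteIfNeeded(key, e, i);
                return;
            }
        }
        Entry e = history.remove(key); // re-admitted entries keep their old access count
        if (e == null) {
            e = new Entry(value, 1);
        } else {
            e.value = value;
            e.accessCount++;
        }
        if (size >= capacity) {
            evict();
        }
        queues[levelFor(e.accessCount)].put(key, e);
        size++;
    }

    // Priority level grows with the access count: floor(log2(count)), capped at the top queue
    private int levelFor(int accessCount) {
        int level = 31 - Integer.numberOfLeadingZeros(accessCount);
        return Math.min(level, queues.length - 1);
    }

    private void promoteIfNeeded(int key, Entry e, int currentLevel) {
        int target = levelFor(e.accessCount);
        if (target != currentLevel) {
            queues[currentLevel].remove(key);
            queues[target].put(key, e);
        }
    }

    // Evict from the LRU end of the lowest non-empty queue and remember the victim in Q-history
    private void evict() {
        for (LinkedHashMap<Integer, Entry> queue : queues) {
            if (!queue.isEmpty()) {
                Map.Entry<Integer, Entry> eldest = queue.entrySet().iterator().next();
                queue.remove(eldest.getKey());
                history.put(eldest.getKey(), eldest.getValue());
                size--;
                return;
            }
        }
    }
}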
Conclusion
This article introduced the basic LRU algorithm and several optimization strategies, and compared their similarities and differences. I had not realized there were so many variations of LRU; later articles will look at how Redis and MySQL apply the LRU algorithm.