LRU stands for Least Recently Used, and it is widely used in caching mechanisms. When the space used by a cache reaches its upper limit, some of the existing data must be evicted to keep the cache usable, and the LRU algorithm decides which data to evict.

The basic idea of the LRU algorithm is temporal locality, which follows from the principle of locality:

If an information item is being accessed, it is likely to be accessed again in the near future.

So, as the name implies, the LRU algorithm selects the least recently used data for elimination.

The principle of LRU

Generally speaking, LRU keeps the data itself together with its access order (or access time) in a container. When a piece of data is accessed:

  1. If the data is not in the container, set its priority to the highest and add it to the container.
  2. If the data is already in the container, update its priority to the highest.

When the total amount of data reaches the upper limit, the data with the lowest priority is removed from the container. Here is a simple schematic diagram of the LRU principle:

If we access data in the order 7 0 1 2 0 3 0 4 with a capacity of 3, then, as shown in the figure above, the LRU algorithm will evict 7, 1, and 2 in turn.
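This trace is easy to reproduce in a few lines of code. The sketch below is only illustrative (the LruTrace class name is made up): it keeps the most recently used key at the head of a LinkedList and prints each evicted key.

import java.util.LinkedList;

public class LruTrace {
    public static void main(String[] args) {
        int capacity = 3;
        LinkedList<Integer> cache = new LinkedList<>(); // head = most recently used
        for (int key : new int[]{7, 0, 1, 2, 0, 3, 0, 4}) {
            cache.remove((Integer) key);   // drop the old position if the key is already cached
            cache.addFirst(key);           // the accessed key becomes the most recently used
            if (cache.size() > capacity) {
                System.out.println("evict " + cache.removeLast()); // prints 7, then 1, then 2
            }
        }
    }
}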

Naive LRU algorithm

Now let's follow the principle above and implement a naive LRU algorithm. There are three options:

  1. Based on an array

    Solution: Attach an extra attribute, a timestamp, to each piece of data, and update the timestamp to the current time each time the data is accessed. When the data space is full, scan the entire array and evict the data with the smallest timestamp.

    Disadvantages: Extra space is required to maintain the timestamps, and the entire array must be scanned when data is evicted (a small sketch of this approach appears right after this list).

  2. Based on a doubly linked list of finite length

    Solution: When a piece of data is accessed, insert it at the head of the list if it is not already there; if it is in the list, move it to the head. When the data space is full, evict the data at the tail of the list.

    Disadvantage: Finding a piece of data during insertion or lookup may require scanning the entire list.

  3. Based on a doubly linked list and a hash table

    Solution: To fix the list-scanning defect above, use a hash table to map each piece of data to its node in the linked list, reducing the time complexity of both insert and read operations from O(N) to O(1).
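To make the trade-off of option 1 concrete, here is a minimal sketch of the array-based approach (the class and field names are illustrative): every slot carries a timestamp, and eviction scans the whole array for the smallest one, which is exactly the O(N) cost noted above.

public class ArrayLRUCache {
    private final int[] keys;
    private final int[] values;
    private final long[] stamps;   // last-access time per slot
    private int size;

    public ArrayLRUCache(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        stamps = new long[capacity];
    }

    public Integer get(int key) {
        for (int i = 0; i < size; i++) {
            if (keys[i] == key) {
                stamps[i] = System.nanoTime(); // refresh the timestamp on access
                return values[i];
            }
        }
        return null;
    }

    public void put(int key, int value) {
        for (int i = 0; i < size; i++) {
            if (keys[i] == key) {              // already cached: update in place
                values[i] = value;
                stamps[i] = System.nanoTime();
                return;
            }
        }
        int slot = size;
        if (size == keys.length) {             // full: scan for the smallest timestamp and overwrite that slot
            slot = 0;
            for (int i = 1; i < size; i++) {
                if (stamps[i] < stamps[slot]) slot = i;
            }
        } else {
            size++;
        }
        keys[slot] = key;
        values[slot] = value;
        stamps[slot] = System.nanoTime();
    }
}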

LRU based on a doubly linked list + hash table

Here we will implement an LRU cache based on a doubly linked list and a hash table:

import java.util.HashMap;
import java.util.Map;

public class LRUCache {
    private int size; // Current capacity
    private int capacity; // Limit the size
    private Map<Integer, DoubleQueueNode> map; // Map data to nodes in the linked list
    private DoubleQueueNode head; // Dummy head node, avoids null checks
    private DoubleQueueNode tail; // Dummy tail node, avoids null checks
    
    public LRUCache(int capacity) {
        this.capacity = capacity;
        this.map = new HashMap<>(capacity);
        this.head = new DoubleQueueNode(0, 0);
        this.tail = new DoubleQueueNode(0, 0);
        this.head.next = tail;
        this.tail.pre = head;
    }

    public Integer get(Integer key) {

        DoubleQueueNode node = map.get(key);
        if (node == null) {
            return null;
        }

        // If the data is in the list, it moves to the head of the list
        moveToHead(node);

        return node.val;
    }

    public Integer put(Integer key, Integer value) {

        Integer oldValue;
        DoubleQueueNode node = map.get(key);
        if (node == null) {
            // Discard data
            eliminate();
            // The data is not in the list, insert the data to the header
            DoubleQueueNode newNode = new DoubleQueueNode(key, value);
            DoubleQueueNode temp = head.next;
            head.next = newNode;
            newNode.next = temp;
            newNode.pre = head;
            temp.pre = newNode;
            map.put(key, newNode);
            size++;
            oldValue = null;
        } else {
            // If the data is in the list, it moves to the head of the list
            moveToHead(node);
            oldValue = node.val;
            node.val = value;
        }
        return oldValue;
    }

    public Integer remove(Integer key) {
        DoubleQueueNode deletedNode = map.get(key);
        if (deletedNode == null) {
            return null;
        }
        deletedNode.pre.next = deletedNode.next;
        deletedNode.next.pre = deletedNode.pre;
        map.remove(key);
        size--;
        return deletedNode.val;
    }
    
    // Move the node to the head of the list
    private void moveToHead(DoubleQueueNode node) {
        node.pre.next = node.next;
        node.next.pre = node.pre;
        DoubleQueueNode temp = head.next;
        head.next = node;
        node.next = temp;
        node.pre = head;
        temp.pre = node;
    }

    private void eliminate() {
        if (size < capacity) {
            return;
        }
        
        // Remove the last node in the list
        DoubleQueueNode last = tail.pre;
        map.remove(last.key);
        last.pre.next = tail;
        tail.pre = last.pre;
        size--;
        last = null;
    }
}

// Doubly linked list node
class DoubleQueueNode {
    int key;
    int val;
    DoubleQueueNode pre;
    DoubleQueueNode next;
    public DoubleQueueNode(int key, int val) {
        this.key = key;
        this.val = val;
    }
}

The code above is basically a direct implementation of the LRU idea described earlier. It is relatively simple; we just need to keep the pre and next pointers consistent and update the hash table in sync with the list. Both put() and get() run in O(1) time, and the space complexity is O(N).
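A quick usage sketch of the class above (the key/value pairs are arbitrary, and the comments assume the code compiles as shown):

LRUCache cache = new LRUCache(3);
cache.put(7, 7);
cache.put(0, 0);
cache.put(1, 1);
cache.put(2, 2);                    // capacity exceeded, 7 (the least recently used) is evicted
System.out.println(cache.get(7));   // null
System.out.println(cache.get(0));   // 0, and key 0 becomes the most recently used
cache.put(3, 3);                    // now 1 is evicted
System.out.println(cache.get(1));   // null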

LRU based on LinkedHashMap

We can also implement LRU directly with the LinkedHashMap provided by the JDK. Under the hood, LinkedHashMap is itself a combination of a doubly linked list and a hash table, so it can be used directly:

import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCache extends LinkedHashMap {

    private int capacity;

    public LRUCache(int capacity) {
        // Note that accessOrder of LinkedHashMap is set to true
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry eldest) {
        return super.size() > capacity;
    }
}

By default LinkedHashMap never evicts data, so we override its removeEldestEntry() method to evict the eldest entry once the preset limit is exceeded; setting accessOrder to true means entries are ordered by access. The whole implementation is very little code and mostly relies on the features of LinkedHashMap.
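A short usage sketch of this LinkedHashMap-based version (with an illustrative capacity of 3, and assuming the eviction check shown above):

LRUCache cache = new LRUCache(3);
cache.put(1, 1);
cache.put(2, 2);
cache.put(3, 3);
cache.get(1);                       // touch key 1 so it becomes the most recently used
cache.put(4, 4);                    // key 2, now the least recently used, is evicted
System.out.println(cache.keySet()); // [3, 1, 4]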

Because LinkedHashMap is so useful, we can see that Dubbo’s LRU cache LRUCache is also implemented based on it.

Optimization of LRU algorithm

The naive LRU algorithm already meets the basic requirements of a cache, but it still has shortcomings. When there is a large amount of hot data, the cache hit ratio is high; however, an occasional batch operation can squeeze the hot data out of the container in favour of data that will not be accessed again, "polluting" the cache. The following optimizations were derived to reduce this effect.

LRU-K

The LRU-K algorithm is an improvement on LRU: the criterion for entering the cache queue is changed from a single access to K accesses. In that sense, the naive LRU algorithm is LRU-1.

LRU-K keeps two queues: a cache queue and a data-access-history queue. When a piece of data is accessed, its count in the history queue is incremented first; once the recorded accesses reach K, the data is promoted into the cache queue, which protects the cache queue from pollution. The history queue itself can also evict entries according to LRU rules. The details are shown in the figure below:

Let’s implement an LRU-K cache:

// Inherit LRUCache directly from what we wrote earlier
public class LRUKCache extends LRUCache {
    
    private int k; // Criteria for entering the cache queue
    private LRUCache historyList; // Access data history

    public LRUKCache(int cacheSize, int historyCapacity, int k) {
        super(cacheSize);
        this.k = k;
        this.historyList = new LRUCache(historyCapacity);
    }

    @Override
    public Integer get(Integer key) {

        // Record the number of data accesses
        Integer historyCount = historyList.get(key);
        historyCount = historyCount == null ? 0 : historyCount;
        historyList.put(key, ++historyCount);

        return super.get(key);
    }

    @Override
    public Integer put(Integer key, Integer value) {

        if (value == null) {
            return null;
        }
        
        // Return the cached data if it is already in the cache
        if (super.get(key) != null) {
            return super.put(key, value);
        }

        // If the number of historical accesses reaches the upper limit, the data is added to the cache
        Integer historyCount = historyList.get(key);
        historyCount = historyCount == null ? 0 : historyCount;
        if (historyCount >= k) {
            // Remove historical access records
            historyList.remove(key);
            return super.put(key, value);
        }

        return null;
    }
}

The above is a simple model without the necessary concurrency control.
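If thread safety is needed, one simple (if coarse-grained) option is to guard the public methods with a lock. The sketch below is only illustrative, and the SynchronizedLRUKCache name is made up:

public class SynchronizedLRUKCache extends LRUKCache {

    public SynchronizedLRUKCache(int cacheSize, int historyCapacity, int k) {
        super(cacheSize, historyCapacity, k);
    }

    @Override
    public synchronized Integer get(Integer key) {
        return super.get(key);
    }

    @Override
    public synchronized Integer put(Integer key, Integer value) {
        return super.put(key, value);
    }
}

A production cache would more likely use a ReadWriteLock or segment the data to reduce contention.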

Generally speaking, the larger K is, the higher the cache hit ratio, but the cache also adapts more slowly, since data needs more accesses before it is admitted and stale data is harder to weed out. Overall, LRU-2 tends to give the best performance.

Two Queue

Two Queue is a variant of LRU-2 that changes the data-access-history queue into a FIFO queue. The benefit is obvious: a FIFO queue is simpler and consumes fewer resources, but it lowers the cache hit ratio compared to LRU-2.

import java.util.LinkedHashMap;
import java.util.Map;

// Directly inherit LinkedHashMap
public class TwoQueueCache extends LinkedHashMap<Integer, Integer> {

    private int k; // Criteria for entering the cache queue
    private int historyCapacity; // Maximum size of access data history records
    private LRUCache lruCache; // We wrote LRUCache earlier

    public TwoQueueCache(int cacheSize, int historyCapacity, int k) {
        // Note that accessOrder of LinkedHashMap is left at its default of false
        super();
        this.historyCapacity = historyCapacity;
        this.k = k;
        this.lruCache = new LRUCache(cacheSize);
    }

    public Integer get(Integer key) {
        // Record data access records
        Integer historyCount = super.get(key);
        historyCount = historyCount == null ? 0 : historyCount;
        super.put(key, ++historyCount); // accumulate the access count
        return lruCache.get(key);
    }

    public Integer put(Integer key, Integer value) {

        if (value == null) {
            return null;
        }

        // Return the cached data if it is already in the cache
        if (lruCache.get(key) != null) {
            return lruCache.put(key, value);
        }

         // If the number of historical accesses reaches the upper limit, the data is added to the cache
        Integer historyCount = super.get(key);
        historyCount = historyCount == null ? 0 : historyCount;
        if (historyCount >= k) {
            // Remove historical access records
            super.remove(key);
            return lruCache.put(key, value);
        }

        return null;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return super.size() > historyCapacity;
    }
}

Here we inherit LinkedHashMap directly and leave accessOrder at its default of false, which means entries are ordered by insertion, so the map behaves as a FIFO queue for the access history. Overriding removeEldestEntry() automatically weeds out the earliest inserted entry once the history is full.
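A short usage sketch (with k = 2 and illustrative sizes) makes the admission rule visible:

TwoQueueCache cache = new TwoQueueCache(3, 16, 2);
cache.put(1, 1);                  // first sighting: not cached yet, and put() alone records no history
System.out.println(cache.get(1)); // null, but this access is counted in the FIFO history
System.out.println(cache.get(1)); // null, second counted access
cache.put(1, 1);                  // the history count has reached k = 2, so key 1 enters the LRU cache
System.out.println(cache.get(1)); // 1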

Multi Queue

Compared to the two optimizations above, Multi Queue is considerably more complex to implement. As the name implies, Multi Queue consists of multiple LRU queues. Each LRU queue has a corresponding priority, and data is placed into a queue whose priority is computed from its access count.

  • Data insertion and access: When data is inserted for the first time, it goes into Q0, the queue with the lowest priority. When it is accessed again, it moves to the head of its queue according to the LRU rule. When its priority, computed from its access count, increases, the data is moved to the head of a higher-priority queue and removed from its original queue; similarly, when its priority drops, it is moved to a lower-priority queue.

  • Data eviction: Eviction always removes data from the tail of the lowest-priority queue and adds it to the head of a Q-history queue. If data in Q-history is accessed again, its priority is recalculated and it is put back into the queue with the corresponding priority; otherwise it is eventually discarded from Q-history according to the LRU rule.

Multi Queue can also be seen as a variant of LRU-K that expands from two queues to multiple queues. The advantage is finer-grained control over how data enters and leaves the cache, at the cost of extra overhead.
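No code accompanies this variant above, so here is a simplified, hedged sketch. The queue count, the promotion rule (one level per extra access) and the handling of history are all illustrative assumptions rather than a reference implementation; each priority level is simply a LinkedHashMap kept in access order.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiQueueCache {

    private final LinkedHashMap<Integer, Integer>[] queues; // Q0 (lowest priority) .. Qn-1, each kept in LRU order
    private final Map<Integer, Integer> accessCounts = new HashMap<>(); // access count per key
    private final int capacity; // total number of cached entries across all queues
    private int size;

    @SuppressWarnings("unchecked")
    public MultiQueueCache(int levels, int capacity) {
        this.capacity = capacity;
        this.queues = new LinkedHashMap[levels];
        for (int i = 0; i < levels; i++) {
            queues[i] = new LinkedHashMap<>(16, 0.75f, true); // accessOrder = true, so each queue is LRU-ordered
        }
    }

    public Integer get(Integer key) {
        Integer level = levelOf(key);
        if (level == null) {
            return null;
        }
        Integer value = queues[level].get(key); // the access also moves the entry to the "head" of its queue
        promote(key, value, level);
        return value;
    }

    public void put(Integer key, Integer value) {
        Integer level = levelOf(key);
        if (level != null) {                    // already cached: update in place and maybe promote
            queues[level].put(key, value);
            promote(key, value, level);
            return;
        }
        if (size == capacity) {
            evict();
        }
        accessCounts.put(key, 1);
        queues[0].put(key, value);              // new data always starts in the lowest-priority queue
        size++;
    }

    // Move the key to the queue that matches its new access count.
    private void promote(Integer key, Integer value, int currentLevel) {
        int count = accessCounts.merge(key, 1, Integer::sum);
        int target = Math.min(count - 1, queues.length - 1);
        if (target != currentLevel) {
            queues[currentLevel].remove(key);
            queues[target].put(key, value);
        }
    }

    // Evict the least recently used entry of the lowest-priority non-empty queue.
    private void evict() {
        for (LinkedHashMap<Integer, Integer> queue : queues) {
            if (!queue.isEmpty()) {
                Integer eldest = queue.keySet().iterator().next();
                queue.remove(eldest);
                accessCounts.remove(eldest);    // a full MQ implementation would move this key into a Q-history queue instead
                size--;
                return;
            }
        }
    }

    private Integer levelOf(Integer key) {
        for (int i = 0; i < queues.length; i++) {
            if (queues[i].containsKey(key)) {
                return i;
            }
        }
        return null;
    }
}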

Conclusion

This article introduced the basic LRU algorithm and several optimization strategies, and compared their similarities and differences. I had not realized before that LRU came in so many variants; a follow-up article will cover how Redis and MySQL apply the LRU algorithm.