Preface
Implementing Redis in Java from scratch (part 1): how to implement a fixed-size cache?
Implementing Redis in Java from scratch (part 2): the principle behind Redis expire
Implementing Redis in Java from scratch (part 3): how to keep in-memory data from being lost on restart?
Implementing Redis in Java from scratch (part 4): adding listeners
Implementing Redis in Java from scratch (part 5): another way to implement an expiration strategy
Implementing Redis in Java from scratch (part 6): AOF persistence explained and implemented
Implementing Redis in Java from scratch (part 7): the LRU cache eviction strategy in detail
Implementing Redis in Java from scratch (part 8): performance optimization of the naive LRU eviction algorithm
In this section, we will learn another common cache eviction algorithm: LFU, the Least Frequently Used algorithm.
LFU basics
Concept
LFU (Least Frequently Used) is, as the name suggests, an eviction algorithm based on access frequency.
LRU is based on time: it evicts the data that has gone unaccessed the longest, and for performance it moves each accessed entry to the head of a list. LFU is based on frequency: it evicts the data that is accessed least often.
Since it is frequency-based, the number of accesses must be stored for each entry.
In terms of storage, then, LFU needs extra space to hold these counts compared with LRU.
Core idea
If a piece of data has been used infrequently in the recent past, it is less likely to be used in the future.
Implementation approaches
O(n) deletion
To evict the least-used entry, my first instinct is a HashMap<String, Integer>, where String is the key and Integer is the access count.
Each access increments the count by 1, so set and read are O(1). Deletion, however, is troublesome: finding the minimum requires traversing and comparing every entry, which is O(n). A sketch of this approach follows.
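Here is a minimal sketch of that idea (illustrative only; the class and method names are my own, not from the project):

```java
import java.util.HashMap;
import java.util.Map;

// Naive LFU bookkeeping: O(1) get/put, O(n) eviction.
public class NaiveLfu<K> {

    private final Map<K, Integer> countMap = new HashMap<>();

    // O(1): increment the access count on every read/write
    public void access(K key) {
        countMap.merge(key, 1, Integer::sum);
    }

    // O(n): scan every entry to find the least frequently used key
    // (assumes the map is non-empty)
    public K evict() {
        K victim = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<K, Integer> e : countMap.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                victim = e.getKey();
            }
        }
        countMap.remove(victim);
        return victim;
    }
}
```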
O(log n) deletion
Another idea is a min-heap plus a HashMap. Heap insertion and deletion are O(log n), so this is more efficient than the first approach. A balanced tree such as Java's TreeMap achieves the same bound; a sketch follows.
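A rough sketch of that idea (again, names are my own): keys are grouped into buckets by frequency inside a TreeMap, so the minimum bucket is found in O(log n) via firstEntry().

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.TreeMap;

// LFU bookkeeping with a TreeMap: all operations are O(log n).
public class TreeMapLfu<K> {

    private final Map<K, Integer> freqOfKey = new HashMap<>();
    private final TreeMap<Integer, LinkedHashSet<K>> keysOfFreq = new TreeMap<>();

    // O(log n): move the key from its old frequency bucket to the next one
    public void access(K key) {
        int freq = freqOfKey.getOrDefault(key, 0);
        if (freq > 0) {
            LinkedHashSet<K> set = keysOfFreq.get(freq);
            set.remove(key);
            if (set.isEmpty()) {
                keysOfFreq.remove(freq);
            }
        }
        freqOfKey.put(key, freq + 1);
        keysOfFreq.computeIfAbsent(freq + 1, f -> new LinkedHashSet<>()).add(key);
    }

    // O(log n): firstEntry() holds the minimum frequency; evict its oldest key
    // (FIFO within a bucket; assumes the cache is non-empty)
    public K evict() {
        Map.Entry<Integer, LinkedHashSet<K>> min = keysOfFreq.firstEntry();
        K victim = min.getValue().iterator().next();
        min.getValue().remove(victim);
        if (min.getValue().isEmpty()) {
            keysOfFreq.remove(min.getKey());
        }
        freqOfKey.remove(victim);
        return victim;
    }
}
```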
O(1) deletion
Can it be further optimized?
In fact, there is an O(1) algorithm; see the paper:
An O(1) algorithm for implementing the LFU cache eviction scheme
A brief personal comment:
If we want O(1) operations, we cannot do without hashing. Even in the O(n) approach above, put/get are already O(1);
what drags it down is deletion, because finding the least-used entry requires comparing every count.
```java
private Map<K, Node> map;                          // mapping between key and node
private Map<Integer, LinkedHashSet<Node>> freqMap; // frequency -> nodes at that frequency

class Node {
    K key;
    V value;
    int frequency = 1;
}
```
A double hash basically solves this problem.
map stores the mapping between keys and nodes, so put/get is clearly O(1).
The node mapped from each key carries its frequency, and nodes with the same frequency are grouped together in freqMap, so the list for any given frequency can be fetched directly.
Deletion is simple as well: the minimum frequency to evict from is normally 1, and if the list at frequency 1 is empty we can scan frequencies from 1 up to n and evict the first element of the list at the smallest non-empty frequency (in practice, we track this minimum frequency explicitly).
As for the ordering within each frequency's list, FIFO is a natural choice, but any policy you like will do.
Core content of the paper
As the saying goes, stones from other hills may serve to polish jade: we can learn from others' work.
Before we implement the code, let’s read the O(1) paper.
Introduction
The structure of the paper is as follows:

- A description of LFU's use cases, where it can prove superior to other cache eviction algorithms
- The dictionary operations an LFU cache implementation should support, i.e. the operations that determine a strategy's runtime complexity
- A description of the currently best-known LFU algorithm and its runtime complexity
- A description of the proposed LFU algorithm, whose runtime complexity is O(1) for each operation
Uses of LFU
Consider a caching network proxy application for the HTTP protocol.
Such a proxy typically sits between the Internet and a user or a group of users.
It ensures that all users can access the Internet, and it enables sharing of cacheable resources for optimal network utilization and response times.
Such a caching proxy should try to maximize the amount of data it can cache within the limited storage or memory at its disposal.
Typically, static resources (such as images, CSS stylesheets, and JavaScript code) can easily be cached for a long time before they are replaced by newer versions.
These static resources, or "assets" as programmers call them, are included in almost every page, so caching them is most beneficial because nearly every request will need them.
Furthermore, since a network proxy is required to handle thousands of requests per second, the overhead of doing so should be kept to a minimum.
To that end, it should evict only those resources that are not used often.
Frequently used resources should therefore be kept in preference to less frequently used ones, because the former have proven themselves useful over time.
There is of course a contrary argument, that resources used heavily so far may not be needed in the future, but we have found this not to be the case in most situations.
For example, the static resources of a heavily used page are requested by every user of that page.
So when memory runs low, these caching proxies can use the LFU cache replacement strategy to evict the least frequently used items from their cache.
LRU might also be an applicable strategy here, but it fails when the request pattern is such that the requested items do not all fit in the cache and are requested cyclically.
P.S.: cyclic access to data is exactly the scenario where LRU is unsuitable.
With LRU, items would constantly enter and leave the cache, with hardly any request ever being served from the cache.
Under the same conditions, however, the LFU algorithm performs better, with most of the cached items resulting in cache hits.
Pathological behavior of the LFU algorithm is not impossible, of course.
We are not presenting a case for LFU here; rather, we are trying to show that if LFU is an applicable strategy, then there is a better way to implement it than previously published approaches.
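To make the cyclic failure mode concrete, here is a minimal, self-contained demonstration built on Java's access-ordered LinkedHashMap (my own sketch, unrelated to the project code): a capacity-3 LRU cache served a repeating pattern of four keys never hits, because each key is evicted just before it is requested again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCycleDemo {
    public static void main(String[] args) {
        // An LRU cache of capacity 3, built on an access-ordered LinkedHashMap
        Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > 3;
            }
        };
        String[] cycle = {"A", "B", "C", "D"};
        int hits = 0;
        for (int i = 0; i < 20; i++) {
            String key = cycle[i % 4];
            if (lru.containsKey(key)) {
                hits++; // never happens: the working set (4) exceeds the capacity (3)
            }
            lru.put(key, key);
        }
        System.out.println("hits = " + hits); // prints: hits = 0
    }
}
```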
Dictionary operations supported by the LFU cache
When we talk about cache eviction algorithms, there are mainly three operations we need on the cached data:

- Set (insert) an item in the cache
- Retrieve (look up) an item in the cache, and increment its usage count (for LFU)
- Evict (delete) the least frequently used item from the cache (or whatever the eviction algorithm's strategy dictates)
Complexity of the currently best-known LFU algorithm
At the time the paper was written, the best-known runtimes for each of these operations on an LFU cache eviction policy were as follows:

Insert: O(log n)
Lookup: O(log n)
Delete: O(log n)
These complexity values follow directly from a binomial-heap implementation and a standard collision-free hash table.
An LFU caching strategy can be implemented easily and reasonably efficiently with a min-heap data structure plus a hash map.
The min-heap is ordered by usage count, and the hash table indexes elements by their key.
All operations on a collision-free hash table are O(1), so the running time of such an LFU cache is governed by the running time of operations on the min-heap.
When an element is inserted into the cache, it enters with a usage count of 1, and since insertion into a min-heap costs O(log n), inserting into the LFU cache takes O(log n) time.
When an element is looked up, it is found via the hash function that maps the key to the element. At the same time, its usage count is incremented by 1, which causes the min-heap to reorganize and the element to move away from the root.
Since the element may sink up to log(n) levels at any stage, this operation also takes O(log n) time.
When an element is selected for eviction and finally removed from the heap, this can cause significant reorganization of the heap data structure.
The element with the least usage count sits at the root of the min-heap.
Deleting the root of a min-heap means replacing the root with the last leaf node in the heap and sifting that node down to its correct position.
The runtime complexity of this operation is also O(log n).
The proposed LFU algorithm
The proposed LFU algorithm has a runtime complexity of O(1) for each of the dictionary operations (insert, lookup, delete) that can be performed on an LFU cache.
This is achieved by maintaining two kinds of linked lists: one over the access frequencies, and, for each frequency, one over all elements with that access frequency.
A hash table is used to access elements by key (omitted from the paper's figure for clarity).
A doubly linked list is used to link together nodes that each represent a set of elements with the same access frequency (drawn as rectangular blocks in the paper's figure).
We call this doubly linked list the frequency list. The set of elements with the same access frequency is itself a doubly linked list of such nodes (drawn as circular nodes in the paper's figure).
We call this (frequency-local) doubly linked list a node list.
Each node in a node list holds a pointer to its parent node in the frequency list (also omitted from the figure for clarity). So nodes x and y would hold a pointer back to frequency node 1, nodes z and a would hold a pointer back to frequency node 2, and so on. These structures are sketched below.
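As a reading aid, the two structures might look like this in Java (field names are my own; the paper itself works in pseudocode):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// A node on the (doubly linked) frequency list: one per distinct frequency.
class FrequencyNode<K> {
    int frequency;                                  // the access count this node represents
    FrequencyNode<K> prev, next;                    // neighbors on the frequency list
    Set<ItemNode<K>> items = new LinkedHashSet<>(); // all items at this frequency
}

// A cached item; it knows which frequency node it currently belongs to.
class ItemNode<K> {
    K key;
    FrequencyNode<K> parent; // pointer back to its frequency node
}
```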
The pseudocode below shows how the LFU cache is initialized.
The hash table used to locate elements by key is represented by a key variable in the pseudocode.
To simplify the implementation, a SET is used in place of the linked list for storing elements with the same access frequency.
This is a standard SET data structure holding the keys of all elements that share an access frequency.
Its insert, lookup, and delete operations all run in O(1).
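LinkedHashSet is a convenient concrete choice here: add/remove/contains are O(1), and iteration follows insertion order, so the first element of a frequency's set is always the oldest entry at that frequency, giving the FIFO tie-break mentioned earlier. A tiny demo, assuming nothing beyond the JDK:

```java
import java.util.LinkedHashSet;

public class SetOrderDemo {
    public static void main(String[] args) {
        LinkedHashSet<String> sameFreq = new LinkedHashSet<>();
        sameFreq.add("A");
        sameFreq.add("B");
        sameFreq.add("C");
        sameFreq.remove("B"); // O(1) removal by key
        // iteration order equals insertion order: the head is the oldest entry
        System.out.println(sameFreq.iterator().next()); // prints: A
    }
}
```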
Pseudocode
What follows in the paper is pseudocode, which we will skip here.
Understanding the core idea is enough; let's get to the real code.
Thoughts
There is not much to the core of the O(1) algorithm; it would rate as a medium-difficulty problem on LeetCode.
Oddly enough, though, the paper was only published in 2010. Was O(log n) really believed to be the limit before that?
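Before walking through the project code, here is a compact standalone sketch of the O(1) scheme in that LeetCode style. It is my own illustrative code, independent of the framework implementation that follows:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

public class LfuCache<K, V> {

    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> freqOfKey = new HashMap<>();
    private final Map<Integer, LinkedHashSet<K>> keysOfFreq = new HashMap<>();
    private int minFreq = 0;

    public LfuCache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        if (!values.containsKey(key)) {
            return null;
        }
        touch(key);
        return values.get(key);
    }

    public void put(K key, V value) {
        if (capacity <= 0) {
            return;
        }
        if (values.containsKey(key)) {
            values.put(key, value);
            touch(key);
            return;
        }
        if (values.size() >= capacity) {
            // evict: the oldest key in the lowest-frequency bucket (FIFO tie-break)
            K victim = keysOfFreq.get(minFreq).iterator().next();
            keysOfFreq.get(minFreq).remove(victim);
            values.remove(victim);
            freqOfKey.remove(victim);
        }
        values.put(key, value);
        freqOfKey.put(key, 1);
        keysOfFreq.computeIfAbsent(1, f -> new LinkedHashSet<>()).add(key);
        minFreq = 1; // a brand-new key always starts at frequency 1
    }

    // move the key from bucket f to bucket f + 1; every step is O(1)
    private void touch(K key) {
        int freq = freqOfKey.get(key);
        LinkedHashSet<K> set = keysOfFreq.get(freq);
        set.remove(key);
        if (set.isEmpty() && minFreq == freq) {
            minFreq++; // the key itself moved up, so the next bucket is non-empty
        }
        freqOfKey.put(key, freq + 1);
        keysOfFreq.computeIfAbsent(freq + 1, f -> new LinkedHashSet<>()).add(key);
    }
}
```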
Java code implementation
Basic attributes
```java
public class CacheEvictLfu<K, V> extends AbstractCacheEvict<K, V> {

    private static final Log log = LogFactory.getLog(CacheEvictLfu.class);

    /**
     * Key mapping information
     * @since 0.0.14
     */
    private final Map<K, FreqNode<K, V>> keyMap;

    /**
     * Frequency map
     * @since 0.0.14
     */
    private final Map<Integer, LinkedHashSet<FreqNode<K, V>>> freqMap;

    /**
     * Minimum frequency
     * @since 0.0.14
     */
    private int minFreq;

    public CacheEvictLfu() {
        this.keyMap = new HashMap<>();
        this.freqMap = new HashMap<>();
        this.minFreq = 1;
    }
}
```
Node definition
- FreqNode.java
```java
public class FreqNode<K, V> {

    /**
     * Key
     * @since 0.0.14
     */
    private K key;

    /**
     * Value
     * @since 0.0.14
     */
    private V value = null;

    /**
     * Frequency
     * @since 0.0.14
     */
    private int frequency = 1;

    public FreqNode(K key) {
        this.key = key;
    }

    // fluent getters & setters
    // toString(), equals(), hashCode()
}
```
Removing an element
```java
/**
 * Remove an element.
 * 1. remove from freqMap; 2. remove from keyMap; 3. update minFreq
 *
 * @param key the element key
 * @since 0.0.14
 */
@Override
public void removeKey(final K key) {
    FreqNode<K, V> freqNode = this.keyMap.remove(key);
    // 1. obtain the frequency via the key
    int freq = freqNode.frequency();
    LinkedHashSet<FreqNode<K, V>> set = this.freqMap.get(freq);
    // 2. remove the node from its frequency list
    set.remove(freqNode);
    log.debug("freq={}, removed element node: {}", freq, freqNode);
    // 3. update minFreq: the removed node may have been the only one at the
    // minimum frequency, so advance to the next non-empty frequency; otherwise
    // minFreq would point at an empty set
    if (CollectionUtil.isEmpty(set) && minFreq == freq) {
        while (CollectionUtil.isEmpty(this.freqMap.get(minFreq))
                && !this.keyMap.isEmpty()) {
            minFreq++;
        }
        log.debug("minFreq updated to: {}", minFreq);
    }
}
```
Updating an element
```java
/**
 * Update an element and refresh the minFreq information.
 *
 * @param key the element key
 * @since 0.0.14
 */
@Override
public void updateKey(final K key) {
    FreqNode<K, V> freqNode = keyMap.get(key);
    // 1. the key already exists
    if (ObjectUtil.isNotNull(freqNode)) {
        // 1.1 remove the original node information
        int frequency = freqNode.frequency();
        LinkedHashSet<FreqNode<K, V>> oldSet = freqMap.get(frequency);
        oldSet.remove(freqNode);
        // 1.2 update the minimum frequency: the node itself moves up by one,
        // so the next bucket is guaranteed to be non-empty
        if (minFreq == frequency && oldSet.isEmpty()) {
            minFreq++;
            log.debug("minFreq increased to: {}", minFreq);
        }
        // 1.3 update the frequency information
        frequency++;
        freqNode.frequency(frequency);
        // 1.4 put the node into the new frequency set
        this.addToFreqMap(frequency, freqNode);
    } else {
        // 2. the key does not exist yet
        // 2.1 build the new element
        FreqNode<K, V> newNode = new FreqNode<>(key);
        // 2.2 a new element always goes into the frequency-1 list
        this.addToFreqMap(1, newNode);
        // 2.3 update the minFreq information
        this.minFreq = 1;
        // 2.4 add to keyMap
        this.keyMap.put(key, newNode);
    }
}

/**
 * Add a node to the frequency map.
 *
 * @param frequency the frequency
 * @param freqNode the node
 */
private void addToFreqMap(final int frequency, FreqNode<K, V> freqNode) {
    LinkedHashSet<FreqNode<K, V>> set = freqMap.get(frequency);
    if (set == null) {
        set = new LinkedHashSet<>();
    }
    set.add(freqNode);
    freqMap.put(frequency, set);
    log.debug("freq={}, added element node: {}", frequency, freqNode);
}
```
Data eviction
```java
@Override
protected ICacheEntry<K, V> doEvict(ICacheEvictContext<K, V> context) {
    ICacheEntry<K, V> result = null;
    final ICache<K, V> cache = context.cache();
    // when the limit is reached, evict the element with the lowest frequency
    if (cache.size() >= context.size()) {
        FreqNode<K, V> evictNode = this.getMinFreqNode();
        K evictKey = evictNode.key();
        V evictValue = cache.remove(evictKey);
        log.debug("evicting minimum-frequency entry, key: {}, value: {}, freq: {}",
                evictKey, evictValue, evictNode.frequency());
        result = new CacheEntry<>(evictKey, evictValue);
    }
    return result;
}

/**
 * Get the node with the lowest frequency.
 *
 * @return the node
 * @since 0.0.14
 */
private FreqNode<K, V> getMinFreqNode() {
    LinkedHashSet<FreqNode<K, V>> set = freqMap.get(minFreq);
    if (CollectionUtil.isNotEmpty(set)) {
        return set.iterator().next();
    }
    throw new CacheRuntimeException("No key found at the minimum frequency");
}
```
Test
Code
```java
ICache<String, String> cache = CacheBs.<String, String>newInstance()
        .size(3)
        .evict(CacheEvicts.<String, String>lfu())
        .build();
cache.put("A", "hello");
cache.put("B", "world");
cache.put("C", "FIFO");

// access A once, so its frequency rises to 2
cache.get("A");
cache.put("D", "LRU");

Assert.assertEquals(3, cache.size());
System.out.println(cache.keySet());
```
Log
```
[DEBUG] [2020-10-03 21:23:43.722] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1, added element node: FreqNode{key=A, value=null, frequency=1}
[DEBUG] [2020-10-03 21:23:43.723] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1, added element node: FreqNode{key=B, value=null, frequency=1}
[DEBUG] [2020-10-03 21:23:43.725] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1, added element node: FreqNode{key=C, value=null, frequency=1}
[DEBUG] [2020-10-03 21:23:43.727] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=2, added element node: FreqNode{key=A, value=null, frequency=2}
[DEBUG] [2020-10-03 21:23:43.728] [main] [c.g.h.c.c.s.e.CacheEvictLfu.doEvict] - evicting minimum-frequency entry, key: B, value: world, freq: 1
[DEBUG] [2020-10-03 21:23:43.731] [main] [c.g.h.c.c.s.l.r.CacheRemoveListener.listen] - Remove key: B, value: world, type: evict
[DEBUG] [2020-10-03 21:23:43.732] [main] [c.g.h.c.c.s.e.CacheEvictLfu.addToFreqMap] - freq=1, added element node: FreqNode{key=D, value=null, frequency=1}
[D, A, C]
```
LFU vs LRU
Differences
LFU is based on access frequency, while LRU is based on access time.
Advantages
Compared with the LRU algorithm, LFU achieves a higher cache hit ratio when data access conforms to a normal distribution.
Disadvantages
- LFU is more complex than LRU.
- The access frequency of each entry must be maintained and updated on every access.
- Early data accumulates counts more easily than late data, which makes newly arrived data hard to keep cached.
- Entries newly added to the cache are evicted easily, causing "thrashing" at the tail of the cache.
Summary
In practice, though, LFU is not applied all that widely.
Real data is skewed and hotspot data is the norm, so LRU generally performs better than LFU.
Open source address: github.com/houbb/cache
If you found this article helpful, please give it a like. Your encouragement is my greatest motivation~
We have now implemented both the LRU and LFU algorithms with good performance, yet operating systems do not actually use either of them directly. In the next section, we will learn about the clock eviction algorithm favored by operating systems.
What have you taken away from this article? If you have other ideas, feel free to discuss them with me in the comments section. I look forward to hearing your thoughts.