Basic concepts of caching
Caching means placing data in a storage medium that responds faster than the one it originally lives in. Doing so speeds up data access and therefore improves the overall responsiveness of the system.
Caches are not without drawbacks, though: cache storage is typically one tier more expensive than the storage it sits in front of.
In a computer system, the most common storage medium is the disk, which has a large capacity and is relatively cheap, but is slow. Memory is much faster, but it is more expensive and its capacity is far more limited.
For example, an ordinary 8 GB DDR4 memory module costs about 200-300 yuan, roughly the same as a 1 TB mechanical hard disk. Cache capacity is therefore not something we can waste freely; we have to size the cache sensibly and make the most of the resources we have.
MySQL, the commonly used data storage middleware, keeps its data on disk, while Redis, the common caching middleware, keeps its data in memory, which is why Redis can help us speed up data access.
A cache eviction policy decides which existing entry should be removed when new data needs to be cached but the limited cache space is already full.
Cache eviction algorithms
Common cache eviction algorithms include LFU, LRU, ARC, MRU, NMRU, FIFO and 2Q. Briefly:
LFU (Least Frequently Used): evicts the entry that has been accessed least often, based on its historical access frequency.
LRU (Least Recently Used): evicts the entry that has gone the longest without being accessed, based on its access history.
ARC (Adaptive Replacement Cache): an adaptive replacement algorithm that combines LRU and LFU to make the best use of the available cache.
MRU (Most Recently Used): evicts the most recently used entry first.
NMRU (Not Most Recently Used): randomly picks one of the entries that was not used most recently as the replacement victim.
FIFO (First In, First Out): entries are evicted in the order they were added.
2Q (Two Queues): keeps two cache queues, a FIFO queue and an LRU queue. When data is accessed for the first time, 2Q caches it in the FIFO queue; when it is accessed a second time, it is promoted from the FIFO queue to the LRU queue, and each queue evicts entries in its own way (a minimal sketch follows below).
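To make the 2Q description more concrete, here is a minimal, hypothetical sketch of the idea in Go. The type and field names are illustrative only, both capacities are assumed to be at least 1, and promotion is simplified to happen only on a second Get:

package cache

import "container/list"

// entry and TwoQueue are illustrative names for this sketch only.
type entry struct {
    key   string
    value interface{}
}

type TwoQueue struct {
    fifoCap, lruCap int
    fifo, lru       *list.List // front = newest / most recently used
    inFIFO, inLRU   map[string]*list.Element
}

func NewTwoQueue(fifoCap, lruCap int) *TwoQueue {
    return &TwoQueue{
        fifoCap: fifoCap,
        lruCap:  lruCap,
        fifo:    list.New(),
        lru:     list.New(),
        inFIFO:  make(map[string]*list.Element),
        inLRU:   make(map[string]*list.Element),
    }
}

func (q *TwoQueue) Get(key string) (interface{}, bool) {
    if e, ok := q.inLRU[key]; ok {
        q.lru.MoveToFront(e) // already "hot": just refresh its recency
        return e.Value.(*entry).value, true
    }
    if e, ok := q.inFIFO[key]; ok {
        // second access: promote from the FIFO queue to the LRU queue
        ent := e.Value.(*entry)
        q.fifo.Remove(e)
        delete(q.inFIFO, key)
        if q.lru.Len() >= q.lruCap {
            back := q.lru.Back()
            q.lru.Remove(back)
            delete(q.inLRU, back.Value.(*entry).key)
        }
        q.inLRU[key] = q.lru.PushFront(ent)
        return ent.value, true
    }
    return nil, false
}

func (q *TwoQueue) Put(key string, value interface{}) {
    if e, ok := q.inLRU[key]; ok {
        e.Value.(*entry).value = value
        q.lru.MoveToFront(e)
        return
    }
    if e, ok := q.inFIFO[key]; ok {
        e.Value.(*entry).value = value
        return
    }
    // first access: cache in the FIFO queue, evicting its oldest entry if full
    if q.fifo.Len() >= q.fifoCap {
        back := q.fifo.Back()
        q.fifo.Remove(back)
        delete(q.inFIFO, back.Value.(*entry).key)
    }
    q.inFIFO[key] = q.fifo.PushFront(&entry{key, value})
}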
There are many cache eviction algorithms; in this article we focus on the LRU algorithm, which is by far the most common.
LRU implementation idea
LeetCode 146 asks you to design and implement an LRU cache, but it only requires two operations, put and get. In practice we usually need a bit more than that. The LRU interface used here is defined as follows:
type Key interface{}

type Value interface{}

type Node struct {
    Key   Key
    Value Value
}

type LRU interface {
    Add(key Key, value Value) error
    Get(key Key) (value Value, ok bool)
    GetAll() []*Node
    Del(key Key)
}
Hash table plus array
The simplest approach is to use an array (a Go slice) as the underlying storage structure and a hash map to record each key's index in it. Here is the code:
package lru

import (
    "errors"
    "sync"
)

type LRU struct {
    mu   *sync.Mutex
    keys map[Key]int // key -> index of the node in l
    l    []*Node     // index 0 is the least recently used node, the tail is the most recently used
    max  int
}

func New(max int) *LRU {
    return &LRU{
        mu:   new(sync.Mutex),
        l:    make([]*Node, 0),
        max:  max,
        keys: make(map[Key]int, max),
    }
}

func (l *LRU) Add(key Key, value Value) error {
    if l.l == nil {
        return errors.New("not init lru")
    }
    if l.max <= 0 {
        return nil
    }
    l.mu.Lock()
    defer l.mu.Unlock()
    // The key already exists: shift everything after it one slot to the left
    // and place the fresh node at the tail.
    if idx, ok := l.keys[key]; ok {
        for i := idx + 1; i < len(l.l); i++ {
            l.l[i-1] = l.l[i]
            l.keys[l.l[i-1].Key] = i - 1
        }
        l.l[len(l.l)-1] = &Node{key, value}
        l.keys[key] = len(l.l) - 1
        return nil
    }
    // The cache is full: drop the least recently used node at index 0.
    if len(l.l) == l.max {
        delete(l.keys, l.l[0].Key)
        for i := 1; i < len(l.l); i++ {
            l.l[i-1] = l.l[i]
            l.keys[l.l[i-1].Key] = i - 1
        }
        l.l[len(l.l)-1] = &Node{key, value}
        l.keys[key] = len(l.l) - 1
        return nil
    }
    l.l = append(l.l, &Node{Key: key, Value: value})
    l.keys[key] = len(l.l) - 1
    return nil
}

func (l *LRU) Get(key Key) (value Value, ok bool) {
    if l.keys == nil {
        return nil, false
    }
    l.mu.Lock()
    defer l.mu.Unlock()
    idx, ok := l.keys[key]
    if !ok {
        return nil, ok
    }
    v := l.l[idx]
    // Move the accessed node to the tail so it becomes the most recently used.
    for i := idx + 1; i < len(l.l); i++ {
        l.l[i-1] = l.l[i]
        l.keys[l.l[i-1].Key] = i - 1
    }
    l.l[len(l.l)-1] = v
    l.keys[key] = len(l.l) - 1
    return v.Value, ok
}

func (l *LRU) GetAll() []*Node {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.l
}

func (l *LRU) Del(key Key) {
    l.mu.Lock()
    defer l.mu.Unlock()
    idx, ok := l.keys[key]
    if !ok {
        return
    }
    delete(l.keys, key)
    // Shift the nodes after the deleted one to the left and shrink the slice.
    for i := idx + 1; i < len(l.l); i++ {
        l.l[i-1] = l.l[i]
        l.keys[l.l[i-1].Key] = i - 1
    }
    l.l = l.l[:len(l.l)-1]
}
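A short, hypothetical walk-through of the array-backed cache above; the import path is made up for the sketch and assumes the code lives in a local package named lru:

package main

import (
    "fmt"

    "example.com/demo/lru" // hypothetical module path for the package above
)

func main() {
    c := lru.New(2)
    _ = c.Add("a", 1) // l = [a],    keys = {a:0}
    _ = c.Add("b", 2) // l = [a, b], keys = {a:0, b:1}
    _, _ = c.Get("a") // "a" moves to the tail: l = [b, a]
    _ = c.Add("c", 3) // full: index 0 ("b") is dropped, l = [a, c]
    for _, n := range c.GetAll() {
        fmt.Println(n.Key, n.Value) // prints "a 1" then "c 3"
    }
}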
Intuitively the query should be O(1), while insert and delete are O(n). In fact the query is also O(n), because every element after the accessed one has its index decremented by one and the accessed node is moved to the end of the array.
Operation | Time complexity
---|---
Insert | O(n)
Delete | O(n)
Query | O(n)
Rearrangement | O(n)
Hash table plus linked list
Since the hash-table-plus-array approach has poor time complexity and unsatisfactory performance, we can try a linked list instead. Thanks to the nature of linked lists, only looking a node up by key is O(n); inserting, deleting, and moving a node we already hold a reference to is O(1). With a plain linked list the query is still slow because it has to traverse the whole list, so we combine the two structures: a hash map stores, for each key, a pointer to its node in the list, while the list itself records the usage order. On every insert the node is placed at the head of the list, on every query the accessed node is moved to the head, and when the cache is full the node at the tail is discarded. This gives O(1) time complexity for both queries and updates. Here is the code:
package lru

import (
    "container/list"
    "errors"
    "sync"
)

type Key interface{}

type Value interface{}

type Node struct {
    Key   Key
    Value Value
}

type LRUCache struct {
    max   int                                       // maximum number of entries, 0 means unlimited
    l     *list.List                                // front = most recently used, back = least recently used
    Call  func(key interface{}, value interface{})  // optional callback invoked when an entry is evicted
    cache map[interface{}]*list.Element             // key -> element in l
    mu    *sync.Mutex
}

func New(max int) *LRUCache {
    return &LRUCache{
        max:   max,
        l:     list.New(),
        cache: make(map[interface{}]*list.Element, max),
        mu:    new(sync.Mutex),
    }
}

func (l *LRUCache) Add(key interface{}, value interface{}) error {
    if l.l == nil {
        return errors.New("not init lru")
    }
    l.mu.Lock()
    defer l.mu.Unlock()
    // The key already exists: update the value and move the node to the front.
    if e, ok := l.cache[key]; ok {
        e.Value.(*Node).Value = value
        l.l.MoveToFront(e)
        return nil
    }
    element := l.l.PushFront(&Node{key, value})
    l.cache[key] = element
    // Over capacity: evict the least recently used entry at the back of the list.
    if l.max != 0 && l.l.Len() > l.max {
        if e := l.l.Back(); e != nil {
            l.l.Remove(e)
            node := e.Value.(*Node)
            delete(l.cache, node.Key)
            if l.Call != nil {
                l.Call(node.Key, node.Value)
            }
        }
    }
    return nil
}

func (l *LRUCache) GetAll() []*Node {
    l.mu.Lock()
    defer l.mu.Unlock()
    data := make([]*Node, 0, len(l.cache))
    for _, v := range l.cache {
        data = append(data, v.Value.(*Node))
    }
    return data
}

func (l *LRUCache) Get(key interface{}) (value interface{}, ok bool) {
    if l.cache == nil {
        return nil, false
    }
    l.mu.Lock()
    defer l.mu.Unlock()
    if element, ok := l.cache[key]; ok {
        l.l.MoveToFront(element)
        return element.Value.(*Node).Value, true
    }
    return nil, false
}

func (l *LRUCache) Del(key interface{}) {
    if l.cache == nil {
        return
    }
    l.mu.Lock()
    defer l.mu.Unlock()
    if element, ok := l.cache[key]; ok {
        l.l.Remove(element)
        delete(l.cache, key)
        if l.Call != nil {
            node := element.Value.(*Node)
            l.Call(node.Key, node.Value)
        }
    }
}
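A similar hypothetical walk-through for the list-backed LRUCache, this time using the Call hook that fires when an entry is evicted (again, the import path is made up and assumes the code above sits in a local package named lru):

package main

import (
    "fmt"

    "example.com/demo/lru" // hypothetical module path for the package above
)

func main() {
    c := lru.New(2)
    c.Call = func(key interface{}, value interface{}) {
        fmt.Println("evicted:", key, value)
    }
    _ = c.Add("a", 1) // list: a
    _ = c.Add("b", 2) // list: b, a
    _, _ = c.Get("a") // "a" moves to the front: a, b
    _ = c.Add("c", 3) // over capacity: "b" is evicted from the back, Call prints "evicted: b 2"
    if _, ok := c.Get("b"); !ok {
        fmt.Println(`"b" is gone`)
    }
}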
Third-party libraries
In addition to the implementations above, there are mature third-party cache libraries, most of which ship with an LRU implementation that is ready to use out of the box, for example golang/groupcache, allegro/bigcache and coocood/freecache. Note, however, that some of these LRU implementations are not concurrency safe.
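For example, a minimal sketch using the lru package bundled with golang/groupcache might look like the following (based on that package's API at the time of writing; this particular implementation is not safe for concurrent use, so guard it with a mutex when sharing it between goroutines):

package main

import (
    "fmt"

    "github.com/golang/groupcache/lru"
)

func main() {
    c := lru.New(2) // at most two entries
    c.OnEvicted = func(key lru.Key, value interface{}) {
        fmt.Println("evicted:", key)
    }
    c.Add("a", 1)
    c.Add("b", 2)
    c.Add("c", 3) // "a" is the least recently used entry and is evicted
    if v, ok := c.Get("b"); ok {
        fmt.Println(v) // 2
    }
}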