When using Redis as a cache, we usually set a maximum size for it rather than letting it store unlimited data.

For Redis, once you have determined the maximum cache size, say 4GB, you can set the cache size using the following command:

CONFIG SET maxmemory 4gb

Because Redis caps the cache size, it is inevitable that the cache will eventually fill up, and we then have to perform eviction. Cache eviction requires solving two problems: deciding which data to evict, and what to do with the evicted data.

What are Redis’ eviction policies

Redis provides eight cache eviction policies; volatile-lfu and allkeys-lfu are new in Redis 4.0.

  • noeviction: once the cache is full, Redis no longer serves write requests and returns an error instead. When Redis is used as a cache, the data set is usually larger than the cache capacity and there is always new data to write. Since this policy never evicts anything, it cannot free up space, so we do not use it for a Redis cache.
  • volatile-ttl: evicts key-value pairs that have an expiration time set; the sooner a key expires, the sooner it is evicted.
  • volatile-random: as the name suggests, randomly evicts key-value pairs that have an expiration time set.
  • volatile-lru: uses the LRU algorithm (described below) to select among key-value pairs with an expiration time set.
  • volatile-lfu: uses the LFU algorithm (described below) to select among key-value pairs with an expiration time set.
  • allkeys-random: randomly selects and evicts from all key-value pairs.
  • allkeys-lru: uses the LRU algorithm to select from all data.
  • allkeys-lfu: uses the LFU algorithm to select from all data.

The allkeys-lru policy is generally preferred. It takes full advantage of LRU, the classical cache algorithm, keeping the most recently accessed data in the cache and improving application access performance. If there is a clear distinction between hot and cold data in your business data, I recommend the allkeys-lru strategy.

If your application accesses data with roughly uniform frequency and there is no obvious hot/cold distinction, the allkeys-random policy, which picks eviction victims at random, is a reasonable choice.
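The policy is configured the same way as the memory limit. For example, to cap the cache at 4GB and switch to allkeys-lru at runtime:

```
CONFIG SET maxmemory 4gb
CONFIG SET maxmemory-policy allkeys-lru
```

or persistently in redis.conf:

```
maxmemory 4gb
maxmemory-policy allkeys-lru
```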

LRU and LFU algorithms in Redis

LRU algorithm

LRU stands for Least Recently Used. As the name suggests, it filters data by how recently it was used: data that has not been used for a long time is evicted, while recently used data is kept in the cache.

LRU organizes all the data into a linked list, with the MRU end holding the most recently used data and the LRU end holding the least recently used data. Let’s look at an example.

Suppose a new piece of data, 45, needs to be written to the cache, but the cache has no space left. The LRU algorithm then does two things: since data 45 has just been accessed, it is placed at the MRU end; and data 5 at the LRU end is deleted from the cache and removed from the linked list.

LRU assumes that data that has just been accessed is likely to be accessed again, so it is placed at the MRU end, while data that has not been accessed for a long time is unlikely to be needed again, so it gradually drifts toward the LRU end and is evicted first when the cache is full.

When the LRU algorithm is actually implemented, it needs to manage all the cached data with a linked list, which incurs additional space overhead. In addition, when data is accessed, the data needs to be moved to the MRU side of the linked list. If a large amount of data is accessed, a lot of linked list movement operations will occur, which will be time-consuming and reduce Redis cache performance.

Therefore, Redis simplifies the LRU algorithm to reduce the impact of eviction on cache performance. Specifically, Redis records the timestamp of each key's most recent access (in the lru field of the key-value pair's redisObject structure). The first time it needs to evict, Redis randomly samples N keys as a candidate set, compares their lru fields, and removes the key with the smallest lru value from the cache.

Redis provides a configuration parameter, maxmemory-samples, which sets this sample size N. For example, we can have Redis sample 100 candidate keys by executing the following command:

CONFIG SET maxmemory-samples 100

On subsequent evictions, Redis refills the candidate set created during the first eviction. The selection criterion is that a key may only enter the candidate set if its lru field value is smaller than the smallest lru value already in the set. Once the set again holds maxmemory-samples keys, Redis evicts the one with the smallest lru value. This avoids maintaining a large linked list for all data and moving list entries on every access, which improves the performance of the Redis cache.

LFU algorithm

LFU appeared in Redis 4.0. LRU, least recently used, is in fact not always accurate. Consider two keys A and B: if, at the moment of eviction, A was last accessed longer ago than B, LRU evicts A; but if A is actually accessed more frequently than B, the more reasonable choice is to evict B. LFU was born to handle this situation.

LFU (Least Frequently Used): the least frequently used key is evicted; it depends on how often a key has been accessed.

LRU (Least Recently Used): the least recently used key is evicted; it depends only on when a key was last accessed.

~~~~~A~~~~~A~~~~~A~~~~A~~~~~A~~~~~A~~|
~~R~~R~~R~~R~~R~~R~~R~~R~~R~~R~~R~~R~|
~~~~~~~~~~C~~~~~~~~~C~~~~~~~~~C~~~~~~|
~~~~~D~~~~~~~~~~D~~~~~~~~~D~~~~~~~~~D|

Each tilde represents one second: A is accessed every five seconds, R every two seconds, and C and D every ten seconds. The most recently accessed key is D, but the next access is clearly more likely to be R than D.

LRU is relatively simple to implement: at minimum you only need a linked list and a map, removing elements from the tail of the list and adding them to the head.

But the actual Redis implementation is not like this. Redis's implementation is very direct, almost the literal meaning of LRU: Redis keeps a global clock, server.lruclock (in seconds, 24 bits, overflowing after about 194 days), randomly samples N keys, compares their access times, and evicts the one accessed longest ago.

A Redis object (essentially the value stored for a key) is defined as follows:

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

LFU implementation is more complex and requires consideration of several issues:

  1. If implemented as a linked list, it may be inefficient to move objects to an ordered position in the list by number of accesses when they are accessed, because there may be a large number of keys with the same number of accesses, worst-case O(n).
  2. Some keys can have a very large access count, theoretically unbounded, but in practice we do not need the exact count.
  3. A key with a particularly high access count may never be accessed again, yet its large count keeps it in memory, so a way to gradually “age it out” (as LRU does) is needed. The simplest method is to gradually decrement the access count.

Redis uses only 24 bits (server.lruclock is also 24 bits) to record all of the above information: not 24 bytes, smaller than even a 32-bit pointer!

16 bits: last decrement time (solves the third problem)

8 bits: access counter (solves the second problem)

The access counter is incremented as follows:

uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}

The core idea is that the larger the counter already is, the less likely another access is to actually increment it; the maximum value is 255. You can also tune how fast the counter grows via the lfu-log-factor setting in redis.conf.

Since the access count is bounded, the first problem is solved as well: an array of 255 buckets, or a linked list per count, suffices.

How is the 16-bit part used? It stores the low 16 bits of the timestamp in minutes, i.e., the time of the last decrement. The algorithm works like this: sample N keys at random (as in the LRU version) and check each key's decrement time; if more than lfu-decay-time minutes have passed (configurable), decrement the counter, or halve it if the count is large.

In addition, since a brand-new key's access count would otherwise be smaller than that of old keys that are no longer accessed, a new key's counter is initialized to 5 so that it is not immediately evicted.