The “Playing with Redis” series covers Redis basics and advanced applications. This is article [14] in the Redis series. Follow the public account “Zxiaofan” or search “Redis zxiaofan” for more.

Key words: Redis; Redis data eviction strategies; the 8 eviction strategies; what to do when the Redis cache is full; Redis approximate LRU algorithm; Redis LFU algorithm.

Redis: How to Import, Export, and Delete Large Amounts of Data in a Production Environment

Outline

  • Why does Redis need a data eviction mechanism?
  • Redis’s 8 data eviction strategies
  • Redis’s approximate LRU algorithm
    • How the LRU algorithm works
    • How the approximate LRU algorithm works
  • Redis’s LFU algorithm
    • Differences between LFU and LRU
    • How the LFU algorithm works
  • Tips
    • Why does Redis use its own clock?
    • How do I find hot keys?

1. Why does Redis need a data eviction mechanism?

As we all know, Redis, a well-known in-memory NoSQL store, greatly improves the performance of data access, and it appears in almost every high-performance Internet application. To push performance further, Redis has evolved from the standalone version to master/replica, cluster, and read/write-splitting cluster versions, and the industry offers well-known third-party extensions such as Codis and Twemproxy.

The performance-enhanced cluster edition of Alibaba Cloud’s enterprise Redis (Tair) goes even further, with up to 4096 GB of memory and support for roughly 61.44 million QPS. The Tair hybrid-storage edition is a clustered Redis instance that stores data in both memory and disk, with up to 1024 GB of memory and 8192 GB of disk (16 nodes). help.aliyun.com/document_de…

If Redis is so great, should we simply store all of our data in it?

A 32 GB DDR4 memory module costs about 900 yuan, while a 1 TB SSD costs about 1000 yuan; the price gap is huge. Moreover, even when the total data volume is large, the frequently accessed portion is usually small, so keeping everything in memory is poor value for money. The 80/20 principle applies here too.

Since memory space is limited, data must be evicted from memory to avoid filling it up. In short, eviction exists for two reasons:

  • cost effectiveness;
  • limited memory space.

2. Redis’s 8 data eviction strategies

In redis.conf, maxmemory sets the maximum amount of memory Redis may use. If maxmemory is 0, there is no memory limit on a 64-bit system, and the limit is 3 GB on a 32-bit system. When the actual mem_used reaches the configured maxmemory threshold, Redis evicts data according to the configured eviction policy.

# redis.conf maximum memory configuration examples, public id zxiaofan
# if no unit is given, the value is in bytes
maxmemory 1048576
maxmemory 1048576B
maxmemory 1000KB
maxmemory 100MB
maxmemory 1GB
maxmemory 1000K
maxmemory 100M
maxmemory 1G

In addition to editing the configuration file, you can change maxmemory dynamically with the CONFIG SET command.

# Dynamically setting and viewing maxmemory, public id zxiaofan
# change maxmemory dynamically
config set maxmemory 10GB
# check the maxmemory setting
config get maxmemory
info memory | grep maxmemory
redis-cli -h 127.0.0.1 -p 6379 config get maxmemory

Next, the 8 data eviction strategies. Since Redis 4.0, there have been 8 eviction mechanisms.

Policy name Policy meaning
noeviction Default policy; evicts nothing. Most write commands return an error (with a few exceptions such as DEL)
allkeys-lru Evicts from all keys, selected by the LRU algorithm
volatile-lru Evicts from keys with an expiration set, selected by the LRU algorithm
allkeys-random Evicts keys chosen at random from all keys
volatile-random Evicts keys chosen at random from keys with an expiration set
volatile-ttl Evicts from keys with an expiration set, choosing those that expire soonest
allkeys-lfu Evicts from all keys, selected by the LFU algorithm (version 4.0 and above)
volatile-lfu Evicts from keys with an expiration set, selected by the LFU algorithm (version 4.0 and above)
// redis.conf, Redis 6.0.6
// The default policy is noeviction; changing it in production is recommended.
# The default is:
#
# maxmemory-policy noeviction

// Set maxmemory-policy at runtime:
config set maxmemory-policy volatile-lfu

Write commands that return an error under noeviction include:

set, setnx, setex, append, incr, decr, rpush, lpush, rpushx, lpushx, linsert, lset, rpoplpush, sadd, sinter, sinterstore, sunion, sunionstore, sdiff, sdiffstore, zadd, zincrby, zunionstore, zinterstore, hset, hsetnx, hmset, hincrby, incrby, decrby, getset, mset, msetnx, exec, sort.

Apart from noeviction, the allkeys-* policies evict from all keys, while the volatile-* policies evict only from keys with an expiration time set. The eviction algorithms are LRU, random, TTL, and LFU.
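To make the allkeys-* / volatile-* split concrete, here is a minimal sketch (hypothetical helper names, not Redis code) in which the volatile-* policies restrict the eviction candidate set to keys that carry an expiration:

```java
import java.util.*;

public class EvictionCandidates {
    // Keys mapped to an optional TTL in ms; null means "no expiration set".
    static Set<String> candidates(Map<String, Long> keys, boolean volatileOnly) {
        Set<String> out = new HashSet<>();
        for (Map.Entry<String, Long> e : keys.entrySet()) {
            // volatile-* policies only consider keys that have an expiration
            if (!volatileOnly || e.getValue() != null) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> keys = new HashMap<>();
        keys.put("session:1", 60_000L); // has a TTL
        keys.put("config:a", null);     // no TTL
        System.out.println(candidates(keys, true));  // volatile-*: only session:1
        System.out.println(candidates(keys, false)); // allkeys-*: both keys
    }
}
```

The eviction algorithm (LRU, LFU, random, TTL) then picks a victim only from within this candidate set.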

Let’s summarize it with a picture:

3. Redis’s approximate LRU algorithm

Before looking at Redis’s approximate LRU algorithm, let’s first review the classic LRU algorithm.

3.1 LRU algorithm

LRU (Least Recently Used). The core idea: “if data has been accessed recently, it has a higher chance of being accessed again in the future”.

The underlying structure of LRU is a hash table plus a doubly linked list: the hash table keeps lookups at O(1), and the doubly linked list keeps node insertion and deletion at O(1).

Why a doubly linked list instead of a singly linked list? With a singly linked list, inserting a new node at the head and removing the tail node are O(1), but promoting an intermediate node is O(n): to move an intermediate node c to the head we must unlink it, and since we only know c’s next node, finding its previous node requires traversing the entire list.

LRU GET: if the node exists, move it to the head of the list and return its value. LRU PUT: ① if the node does not exist, create it and put it at the head of the list; ② if it exists, update it and move it to the head.

A reference LRU implementation can be found via LeetCode: www.programcreek.com/2013/03/lee… .

// LRU underlying structure, pseudo-code made concrete, public id zxiaofan
class Node {
    int key;
    int value;
    Node prev;
    Node next;
}

class LRUCache {
    Node head;
    Node tail;
    HashMap<Integer, Node> map = new HashMap<>();
    int cap;

    public LRUCache(int capacity) { this.cap = capacity; }

    public int get(int key) {
        Node n = map.get(key);
        if (n == null) return -1;
        removeNode(n);
        offerNode(n);                  // move the accessed node to the head
        return n.value;
    }

    public void put(int key, int value) {
        Node n = map.get(key);
        if (n == null) {
            if (map.size() >= cap) {   // full: evict the least recently used (tail)
                map.remove(tail.key);
                removeNode(tail);
            }
            n = new Node();
            n.key = key;
            map.put(key, n);
        } else {
            removeNode(n);             // existing key: refresh its position
        }
        n.value = value;
        offerNode(n);
    }

    private void removeNode(Node n) {  // unlink n from the list in O(1)
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }

    private void offerNode(Node n) {   // insert n at the head
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
        if (tail == null) tail = n;
    }
}
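For comparison, Java’s standard library already packages the same hash + doubly linked list combination: LinkedHashMap with accessOrder=true yields a compact LRU cache in a few lines (a sketch, not Redis code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int cap;

    public LruSketch(int capacity) {
        // accessOrder=true: get() moves the entry to the tail (most recently used)
        super(16, 0.75f, true);
        this.cap = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > cap; // evict the least recently used entry when over capacity
    }
}
```

With capacity 2, putting keys 1 and 2, reading key 1, then putting key 3 evicts key 2 rather than key 1, because the read refreshed key 1’s recency.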

[LRU cache] [Hash + bidirectional list] structure diagram

3.2 The Approximate LRU Algorithm

Why doesn’t Redis use the classic LRU algorithm?

  • Classic LRU needs a doubly linked list to manage the data, which costs extra memory;
  • Every access moves nodes around, which costs performance;
  • Redis’s existing data structures would have to be reworked.

Turning this around answers another question: what are the advantages of Redis’s approximate LRU algorithm?

In Redis, every key’s value is wrapped in a redisObject, whose lru:LRU_BITS field records the Redis clock at the key’s last access. Whenever a key is read or written, the lookupKey method is called to refresh that key’s clock.

server.lruclock is a 24-bit integer. By default it is the Unix timestamp divided by LRU_CLOCK_RESOLUTION (1000 ms) and taken modulo 2^24, so the clock ticks once per second.

// redisObject definition, source located in server.h
typedef struct redisObject {
    unsigned type:4;       // type
    unsigned encoding:4;   // encoding
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;          // reference count
    void *ptr;             // pointer to the actual value; the concrete structure is determined by type and encoding
} robj;

How is the value of server.lruclock updated?

When Redis starts, the initServer method registers serverCron as a time event via aeCreateTimeEvent (serverCron is Redis’s core periodic task). serverCron then refreshes the Redis clock with server.lruclock = getLRUClock().

// Redis core periodic task serverCron, source located in server.c, public id zxiaofan
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ...
    /* We have just LRU_BITS bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock.
     *
     * Note that even if the counter wraps it's not a big problem,
     * everything will still work but some object will appear younger
     * to Redis. However for this to happen a given object should never be
     * touched for all the time needed to the counter to wrap, which is
     * not likely.
     *
     * Note that you can change the resolution altering the
     * LRU_CLOCK_RESOLUTION define. */
    server.lruclock = getLRUClock();
    ...
}

When mem_used > maxmemory, Redis evicts data via the freeMemoryIfNeeded method. For the LRU policies, the core candidate-selection logic lives in the evictionPoolPopulate method.

The logic of Redis’s approximate LRU eviction:

  • First round: randomly sample at most N keys into the candidate pool evictionPoolEntry;
    • N is set by maxmemory-samples in redis.conf; the default is 5, and 10 approximates true LRU very closely at the cost of more CPU;
  • Later rounds: randomly sample at most N more keys; any key whose LRU clock is older than at least one entry already in evictionPoolEntry is inserted into the pool;
    • evictionPoolEntry holds at most EVPOOL_SIZE = 16 entries;
    • see the comments on the evictionPoolPopulate method in the source;
  • Eviction: evict the key in the candidate pool with the smallest (oldest) LRU clock.
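The sampling idea behind those steps can be sketched in a few lines (hypothetical names; the real logic lives in evict.c): sample some random keys and evict the one with the oldest access clock.

```java
import java.util.*;

public class SampledLruDemo {
    // lru maps key -> last-access clock (smaller = older = better eviction candidate).
    static String evictOne(Map<String, Long> lru, int samples, Random rnd) {
        List<String> keys = new ArrayList<>(lru.keySet());
        String victim = null;
        for (int i = 0; i < samples && !keys.isEmpty(); i++) {
            String k = keys.get(rnd.nextInt(keys.size()));
            // keep the oldest key seen so far among the sampled ones
            if (victim == null || lru.get(k) < lru.get(victim)) victim = k;
        }
        return victim;
    }
}
```

With a small sample size the result only approximates true LRU; raising the sample count (like raising maxmemory-samples) makes the choice converge on the genuinely oldest key.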

dictGetSomeKeys caps the work done in a single sampling pass at maxsteps = count * 10, where count is maxmemory-samples. An important consequence: a single pass may return fewer keys than maxmemory-samples. Also, if the candidate set in Redis (all keys, or keys with an expiration) is itself smaller than maxmemory-samples, count is reduced to the size of that set.

A product analogy: many list pages in apps no longer return a fixed page size. The user scrolls manually, each batch returned is of variable size, and the backend pulls more data asynchronously as needed, so scrolling feels continuous and seamless to the user.

// Command-processing entry, source located in server.c, public id zxiaofan.
// 1. Call lookupCommand() to look up the Redis command;
// 2. Check whether the command can run (this includes triggering data eviction);
// 3. Call the call() method to execute the command.

int processCommand(client *c) {
    ...
    if (server.maxmemory && !server.lua_timedout) {
        int out_of_memory = freeMemoryIfNeededAndSafe() == C_ERR;
        ...
    }
    ...
}
// The source code is located at evict.c, public id zxiaofan.
/* This is a wrapper for freeMemoryIfNeeded() that only really calls the * function if right now there are the conditions to do so safely: * * - There must be no script in timeout condition. * - Nor we are loading data right now. * */
int freeMemoryIfNeededAndSafe(void) {
    if (server.lua_timedout || server.loading) return C_OK;
    return freeMemoryIfNeeded();
}

// The source code is located in evict.c
int freeMemoryIfNeeded(void) {
    // Redis memory-freeing core logic:
    // 1. compute the memory currently in use;
    // 2. apply the configured eviction policy accordingly;
    //    for the LRU/LFU/TTL policies, eviction candidates are gathered by calling
    //    evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict,
    //                         struct evictionPoolEntry *pool);
    ...
}

// evictionPoolPopulate fills evictionPoolEntry
// The source code is located in evict.c

/* This is an helper function for freeMemoryIfNeeded(), it is used in order * to populate the evictionPool with a few entries every time we want to * expire a key. Keys with idle time smaller than one of the current * keys are added. Keys are always added if there are free entries. * * We insert keys on place in ascending order, so keys with the smaller * idle time are on the left, and keys with the higher idle time on the * right. */

void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    // sampledict: db->dict (when evicting from all keys) or db->expires (when evicting from keys with an expiration);
    // pool: the pool of eviction candidates;
    // sample up to maxmemory_samples keys for subsequent comparison;
    count = dictGetSomeKeys(sampledict, samples, server.maxmemory_samples);
    ...
}

// [evictionPoolEntry] — the pool of eviction candidates
// The source code is located in evict.c
// EVPOOL_SIZE: number of entries the pool can hold;
// EVPOOL_CACHED_SDS_SIZE: maximum size of a cached key name. Keys longer than 255 bytes get
//   memory allocated separately; shorter keys reuse memory allocated at initialization.
// evictionPoolEntry is initialized by evictionPoolAlloc(), which is called from initServer().

#define EVPOOL_SIZE 16
#define EVPOOL_CACHED_SDS_SIZE 255
struct evictionPoolEntry {
    unsigned long long idle;    /* The idle time of the key (the inverse of the LFU access frequency) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

static struct evictionPoolEntry *EvictionPoolLRU;
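Following the source comment quoted above, the pool keeps candidates sorted by idle time in ascending order, so the best eviction candidate sits at the right end. A simplified sketch of the insertion rule (a hypothetical Java rendering of the C logic, not Redis code):

```java
import java.util.List;

public class EvictionPoolSketch {
    static final int EVPOOL_SIZE = 16;

    static class Entry {
        final String key;
        final long idle;
        Entry(String key, long idle) { this.key = key; this.idle = idle; }
    }

    // Insert keeping ascending idle order; the rightmost entry is the best candidate.
    static void offer(List<Entry> pool, String key, long idle) {
        int i = 0;
        while (i < pool.size() && pool.get(i).idle < idle) i++;
        if (i == 0 && pool.size() >= EVPOOL_SIZE) return; // worse than every pooled entry and pool full: skip
        pool.add(i, new Entry(key, idle));
        if (pool.size() > EVPOOL_SIZE) pool.remove(0);    // drop the entry with the smallest idle time
    }
}
```

Eviction then takes the rightmost (largest idle time) entry from the pool.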

4. Redis’s LFU algorithm

LFU: Least Frequently Used.

  • The core idea: “if data has rarely been accessed in the recent period, it is unlikely to be accessed in the future”.

4.1 Differences between LFU and LRU

If a piece of data receives only a single burst of accesses (and may never be accessed again), the LRU algorithm treats it as hot data and evicts it last. In real production environments, however, we often want to measure a key’s access frequency over a period of time and evict the data that is cold during that period.

Compared with LRU, the LFU algorithm improves the hit ratio in some scenarios: data that is used more frequently is more likely to be retained.

Comparison Approximate LRU algorithm LFU algorithm
First data evicted Not accessed recently Accessed least often in a recent period
Suitable scenario Data is accessed continuously Data is accessed continuously over a period of time
Drawback New keys crowd out the cache How fast historically hot keys are evicted depends on lfu-decay-time

4.2 How the LFU algorithm works

LFU maintains access frequency in just a few bits using a Morris counter, a probabilistic counter. The Morris algorithm increments the count randomly: the stored value is not the true count but represents its order of magnitude.
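A classic base-2 Morris counter can be sketched as follows (an illustration of the general idea; Redis uses its own variant of the formula): each event increments the stored value only with probability 1/2^c, so c tracks roughly the logarithm of the true count.

```java
import java.util.Random;

public class MorrisCounter {
    private int c = 0;
    private final Random rnd;

    public MorrisCounter(long seed) { this.rnd = new Random(seed); }

    public void hit() {
        // increment with probability 1 / 2^c, so c grows roughly as log2(events)
        if (rnd.nextDouble() < 1.0 / (1L << c)) c++;
    }

    public long estimate() { return (1L << c) - 1; } // estimate of the true event count
    public int raw() { return c; }
}
```

After 100,000 events the raw counter sits around log2(100000) ≈ 17: a handful of bits stand in for a six-digit count, which is exactly why Redis can afford only 8 bits per key.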

Under the LFU eviction policy, the lru:LRU_BITS field (24 bits) of a redisObject is split in two:

  • ldt: Last Decrement Time, 16 bits, minute precision; stores the time of the last counter update;
  • logc: logarithmic counter, 8 bits, maximum 255; stores the key’s access frequency.

Note:

  • logc stores an access frequency, not an access count;
  • the logc frequency decays over time;
    • Why does logc decay over time? Consider a flash-sale scenario: a hot key is accessed a huge number of times during the event. Without decay, the stale counter would keep the key pinned in memory long after the event ends;
  • a new object starts with logc = LFU_INIT_VAL = 5, so it is not evicted immediately after creation.
           16 bits         8 bits
      +----------------+--------+
      | Last decr time |  logc  |
      +----------------+--------+

For details, search for “LFU (Least Frequently Used) implementation” in the source file evict.c.

Core LFU configuration:

  • lfu-log-factor: the logarithmic growth factor of the counter; the larger the factor, the more slowly the counter grows. Default 10.
  • lfu-decay-time: the decay period of the counter, in minutes. Default 1.
    • after N minutes without access, the counter is decremented by N / lfu-decay-time, until it decays to 0;
    • if set to 0, the counter is decayed every time the key is accessed.

The counter ranges from 0 to 255 and grows logarithmically with the number of accesses: the more accesses accumulate, the more slowly it grows. The Redis website gives example counter values under different factors and hit counts:

+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 1      | 18         | 49         | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 10     | 10         | 18         | 142        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+
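The growth and decay rules can be simulated from the formulas Redis documents: the counter is incremented with probability 1 / ((counter - LFU_INIT_VAL) * lfu_log_factor + 1), and decayed by one point per elapsed lfu-decay-time period. A sketch under those assumptions (not the exact Redis source):

```java
import java.util.Random;

public class LfuCounterDemo {
    static final int LFU_INIT_VAL = 5;

    // Logarithmic increment, modelled on the formula documented for LFULogIncr().
    static int logIncr(int counter, int logFactor, Random rnd) {
        if (counter == 255) return 255;             // the 8-bit counter saturates at 255
        double baseval = Math.max(counter - LFU_INIT_VAL, 0);
        double p = 1.0 / (baseval * logFactor + 1); // higher counter => lower increment chance
        return rnd.nextDouble() < p ? counter + 1 : counter;
    }

    // Decay: lose one point per elapsed lfu-decay-time period, floored at 0.
    // (Assumes decayTime >= 1; decayTime == 0 is handled specially by Redis.)
    static int decay(int counter, long idleMinutes, int decayTime) {
        long periods = idleMinutes / decayTime;
        return (int) Math.max(counter - periods, 0);
    }
}
```

Running logIncr a million times with factor 10 drives the counter to 255, matching the 1M-hits column of the table above, while a long idle stretch pulls it back down through decay.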

Note that under LFU, the ldt field and the counter are refreshed when the key is accessed (via updateLFU), and the decayed counter value is also evaluated when memory reaches maxmemory and the eviction policy is triggered.

The logic of Redis’s LFU eviction:

  • First round: randomly sample at most N keys into evictionPoolEntry;
  • Later rounds: randomly sample at most N more keys and update their ldt and counter values; any key whose counter is smaller than at least one entry in evictionPoolEntry is inserted into the pool;
    • evictionPoolEntry holds at most EVPOOL_SIZE = 16 entries;
  • Eviction: evict the key in the candidate pool with the smallest counter.

When Redis reads or writes data, lookupKey is called. If the eviction policy is LFU, lookupKey calls the updateLFU() method to recompute and store the key’s lru field in LFU format; otherwise it simply refreshes the key’s clock with val->lru = LRU_CLOCK().

// Source code is located in db.c, public id zxiaofan.
/* Update LFU when an object is accessed.
 * Firstly, decrement the counter if the decrement time is reached.
 * Then logarithmically increment the counter, and update the access time. */
void updateLFU(robj *val) {
    // first decay the counter according to lfu-decay-time;
    unsigned long counter = LFUDecrAndReturn(val);
    // then increment it according to lfu-log-factor;
    counter = LFULogIncr(counter);
    // finally store the new lru value;
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

Redis data eviction schematic diagram:

5. Tips

5.1. Why does Redis use its own clock?

  • Obtaining the system timestamp requires a system call;
  • single-threaded Redis is extremely performance-sensitive, and reading the timestamp from its own cached clock greatly improves performance.

5.2. How do I find hot keys?

The object freq key command returns a key’s LFU counter, so we can scan all keys and read each key’s counter with object freq.

Note that object freq works only when the eviction policy is LFU.

127.0.0.1:6379> object freq key1
(error) ERR An LFU maxmemory policy is not selected, access frequency not tracked. Please note that when switching between policies at runtime LRU and LFU data will take some time to adjust.
127.0.0.1:6379> config set maxmemory-policy volatile-lfu
OK
127.0.0.1:6379> object freq key1
(integer) 0
127.0.0.1:6379> get key1
"v1"
127.0.0.1:6379> object freq key1
(integer) 1

Redis 4.0.3 also added a hotkeys feature to redis-cli; run “./redis-cli --hotkeys” to find hot keys. Note that --hotkeys is essentially scan + object freq, so it may take a long time on large datasets.

> ./redis-cli -p 6378 -a redisPassword --hotkeys
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.

# Scanning the entire keyspace to find hot keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands

[00.00%] Hot key '"key-197804"' found so far with counter 4
[00.00%] Hot key '"key-242392"' found so far with counter 4
[00.00%] Hot key '"key-123994"' found so far with counter 4
[00.00%] Hot key '"key-55821"' ...
...
[03.72%] Hot key '"key1"' found so far with counter 6

-------- summary -------

Sampled 300002 keys in the keyspace!
hot key found with counter: 6  keyname: "key1"
hot key found with counter: 4  keyname: "key-197804"
hot key found with counter: 4  keyname: "key-242392"
hot key found with counter: 4  keyname: "key-123994"
hot key found with counter: 4  keyname: "key-55821"
...

Using Redis as an LRU cache: redis.io/topics/lru-…

【Selected recent articles in the Redis series @zxiaofan】

Redis: How to Import, Export, and Delete Large Amounts of Data in a Production Environment

Redis: Deleted 2 Million Keys, Why Is the Memory Still Not Freed?

Play with the Use and Principle of Bloom Filter in Redis

Exploration of Redis-Hyperloglog Principle

“Playing with Redis: HyperLogLog for Daily and Monthly Active User Counts”


Good luck!

Life is all about choices!

In the future, you will be grateful for your hard work now!

【CSDN】 【GitHub】 【OSCHINA】 【The Denver nuggets】 【Language finches】 【Wechat official account】