Preface

However, finding hot keys has always been a challenge for many users. Redis 4.0 brings many new features, among them an LFU-based hot key discovery mechanism.

Least Frequently Used

Least Frequently Used (LFU) is a new memory eviction strategy introduced in Redis 4.0. From the name alone, it is easy to associate it with key access frequency. However, the initial 4.0 release used LFU only for memory eviction and did not expose the recorded access frequency. After some modifications, Redis officially supports LFU-based hot key discovery as of version 4.0.3.

LFU algorithm introduction

Each object in Redis has 24 bits of space to record LRU/LFU information:

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

When the 24 bits are used for LFU, they are divided into two parts:

  • The high 16 bits record the last access time, in minutes (the ldt field)
  • The low 8 bits record the access frequency (the counter field)
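As a minimal standalone sketch (not Redis's actual helpers; the values are made up), this is how the two parts pack into and out of the 24-bit field:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Pack an LFU value the way Redis lays it out: high 16 bits = access
     * time in minutes, low 8 bits = counter. */
    uint32_t lru = (12345u << 8) | 42;

    uint32_t ldt     = lru >> 8;    /* high 16 bits: last access time */
    uint32_t counter = lru & 255;   /* low 8 bits: frequency counter */
    printf("ldt=%u counter=%u\n", ldt, counter);
    return 0;
}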

Counter: Logarithmic counter based on probability

The maximum value 8 bits can hold is 255. Is it really enough to record access frequency with only 8 bits?

For counter, Redis uses a trick: counter is not a simple linear counter, but a probability-based logarithmic counter. The algorithm is as follows:

uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;       /* already saturated */
    double r = (double)rand()/RAND_MAX;   /* uniform random in [0, 1] */
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;                 /* increment with probability p */
    return counter;
}
The counter is therefore incremented with the following probability:
p = 1/((counter-LFU_INIT_VAL)*server.lfu_log_factor+1)

Here LFU_INIT_VAL is 5. Using the default server.lfu_log_factor = 10, the probability p of incrementing falls off sharply as counter grows: the larger counter is, the less likely it is to be incremented, so 8 bits is enough to record even very high access frequencies. The following table (from the documentation in the default redis.conf) shows the relationship between server.lfu_log_factor, the number of accesses, and the resulting counter:

+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
| 1      | 18         | 49         | 255        | 255        | 255        |
| 10     | 10         | 18         | 142        | 255        | 255        |
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+

That is, with the default server.lfu_log_factor of 10, the 8-bit counter can distinguish access frequencies up to about 1 million hits (beyond that it saturates at 255).
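To see the saturation behavior yourself, here is a small standalone simulation; the increment function is copied out of the Redis source, with LFU_INIT_VAL and the default lfu_log_factor hardcoded, so the exact numbers vary slightly from run to run:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define LFU_INIT_VAL   5
#define LFU_LOG_FACTOR 10   /* default server.lfu_log_factor */

/* Standalone copy of Redis's probabilistic increment. */
static uint8_t lfu_log_incr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*LFU_LOG_FACTOR+1);
    if (r < p) counter++;
    return counter;
}

int main(void) {
    uint8_t counter = LFU_INIT_VAL;   /* new keys start at LFU_INIT_VAL */
    long milestones[] = {100, 1000, 100000, 1000000};
    long hit = 0;
    for (int i = 0; i < 4; i++) {
        for (; hit < milestones[i]; hit++)
            counter = lfu_log_incr(counter);
        printf("%8ld hits -> counter = %u\n", milestones[i], counter);
    }
    return 0;
}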

The decay factor of counter

As can be seen from the growth function LFULogIncr, as a key is accessed more and more, its counter eventually converges to 255. That raises a problem: if counter only grows and never decays, it cannot tell recently hot keys apart from keys that were hot long ago.

To solve this problem, Redis provides a decay factor, server.lfu_decay_time (in minutes): if a key is not accessed for some time, its counter is reduced accordingly.

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;      /* high 16 bits: last access time */
    unsigned long counter = o->lru & 255; /* low 8 bits: frequency counter */
    unsigned long num_periods = server.lfu_decay_time ?
        LFUTimeElapsed(ldt) / server.lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}

With the default lfu-decay-time of 1, a key that has not been accessed for N minutes simply has its counter reduced by N.
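Here is a minimal standalone sketch of the decay rule, with elapsed_minutes standing in for LFUTimeElapsed(ldt) and lfu_decay_time hardcoded to the default of 1:

#include <stdio.h>

static unsigned long lfu_decr(unsigned long counter, unsigned long elapsed_minutes) {
    unsigned long lfu_decay_time = 1;   /* default lfu-decay-time */
    unsigned long num_periods = lfu_decay_time ?
        elapsed_minutes / lfu_decay_time : 0;
    if (num_periods)
        counter = (num_periods > counter) ? 0 : counter - num_periods;
    return counter;
}

int main(void) {
    /* A key with counter 100 left idle for 30 minutes decays to 70. */
    printf("counter after 30 idle minutes: %lu\n", lfu_decr(100, 30));
    return 0;
}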

Both the log factor and the decay factor are configurable. The Redis defaults are recommended:

lfu-log-factor 10
lfu-decay-time 1

Hot Key Discovery

With the LFU algorithm covered, we can now turn to the hot key discovery mechanism itself.

On every read or write of a key, Redis updates the access time and counter held in the 24-bit lru field, so every key carries a correct, up-to-date LFU value:

void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val); /* first apply any pending decay */
    counter = LFULogIncr(counter);                 /* then increment probabilistically */
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

So how does a user obtain the access frequency? Redis provides the OBJECT FREQ subcommand to expose the LFU counter, but note that the memory eviction policy must first be set to allkeys-lfu or volatile-lfu, otherwise an error is returned:

127.0.0.1:6379> config get maxmemory-policy
1) "maxmemory-policy"
2) "noeviction"
127.0.0.1:6379> object freq counter:000000006889
(error) ERR An LFU maxmemory policy is not selected, access frequency not tracked. Please note that when switching between policies at runtime LRU and LFU data will take some time to adjust.

127.0.0.1:6379> config set maxmemory-policy allkeys-lfu
OK
127.0.0.1:6379> object freq counter:000000006889
(integer) 3

From there, a straightforward approach suggests itself: traverse all keys with the SCAN command, fetch each key's access frequency with OBJECT FREQ, and sort to obtain the hot keys.
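Here is a sketch of that approach, assuming the hiredis client library (https://github.com/redis/hiredis); error handling is trimmed, and the final sort over the collected (key, frequency) pairs is left out:

#include <stdio.h>
#include <string.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    char cursor[32] = "0";
    do {
        /* SCAN one batch; the reply is [next-cursor, [key, key, ...]]. */
        redisReply *r = redisCommand(c, "SCAN %s COUNT 100", cursor);
        snprintf(cursor, sizeof(cursor), "%s", r->element[0]->str);
        for (size_t i = 0; i < r->element[1]->elements; i++) {
            const char *key = r->element[1]->element[i]->str;
            redisReply *f = redisCommand(c, "OBJECT FREQ %s", key);
            if (f->type == REDIS_REPLY_INTEGER)  /* errors unless policy is *-lfu */
                printf("%lld\t%s\n", f->integer, key);
            freeReplyObject(f);
        }
        freeReplyObject(r);
    } while (strcmp(cursor, "0") != 0);   /* cursor 0 means traversal done */

    redisFree(c);
    return 0;
}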

Redis 4.0.3 also ships hot key discovery in redis-cli: pass the --hotkeys option when invoking redis-cli, as shown in the following example:

$ ./redis-cli --hotkeys

# Scanning the entire keyspace to find hot keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Hot key 'counter:000000000002' found so far with counter 87
[00.00%] Hot key 'key:000000000001' found so far with counter 254
[00.00%] Hot key 'mylist' found so far with counter 107
[00.00%] Hot key 'key:000000000000' found so far with counter 254
[45.45%] Hot key 'counter:000000000001' found so far with counter 87
[45.45%] Hot key 'key:000000000002' found so far with counter 254
[45.45%] Hot key 'myset' found so far with counter 64
[45.45%] Hot key 'counter:000000000000' found so far with counter 93

-------- summary -------

Sampled 22 keys in the keyspace!
hot key found with counter: 254    keyname: key:000000000001
hot key found with counter: 254    keyname: key:000000000000
hot key found with counter: 254    keyname: key:000000000002
hot key found with counter: 107    keyname: mylist
hot key found with counter: 93    keyname: counter:000000000000
hot key found with counter: 87    keyname: counter:000000000002
hot key found with counter: 87    keyname: counter:000000000001
hot key found with counter: 64    keyname: myset
Copy the code

As you can see, the summary lists the hottest keys first.


Introduction to the SCAN operation

As anyone familiar with Redis knows, command execution is single-threaded, so be very careful with O(N) commands: used carelessly, they can block the process and bring Redis to a standstill.

Sometimes we need to operate on the subset of keys that meet some condition, for example deleting all keys that begin with test_. How do we get those keys? Prior to Redis 2.8, we could use the KEYS command with a glob pattern to fetch the keys we needed. But this command has two drawbacks:

  • There is no way to limit the output: we get all the matching keys at once, and if there are millions of them, we are left waiting on an "endless" string of output.

  • KEYS is a full traversal with O(N) time complexity. As mentioned above, such a command can easily make the Redis service stall, so we should avoid using it in production.

So how do we satisfy the need without getting Redis stuck? Redis provided a way out of this dilemma in version 2.8: the SCAN command.

Advantages of Scan operation

Scan has two distinct advantages over keys:

  • The total time complexity of the SCAN command is also O(N), but the work is carried out in batches across multiple calls and does not block the thread for long.

  • The SCAN command provides a COUNT parameter to control how much work is done, and thus roughly how many results are returned, per call.

These two advantages help us with the problem above, but the SCAN command is not perfect: the results it returns may contain duplicates, which the client has to filter out itself.

To summarize, the SCAN command has the following characteristics:

  • Importantly, the returned results may contain duplicates; it is up to the client to deduplicate them.
  • If data is modified during the traversal, it is undefined whether the modified data will be visited.
  • An empty batch does not mean the traversal is over; the traversal ends only when the returned cursor is 0.

Example of the scan command

// The first argument is the cursor, the second the match pattern, the third the count.
// Note: COUNT limits the number of dictionary slots the server traverses in a
// single call, not the number of keys returned.

127.0.0.1:6379> scan 0 match key99* count 1000
1) "13976"
2)  1) "key9911"
    2) "key9974"
    3) "key9994"
    4) "key9910"
    5) "key9907"
    6) "key9989"
    7) "key9971"
    8) "key9966"
   10) "key992"
   11) "key9903"
   12) "key9905"
127.0.0.1:6379> scan 13976 match key99* count 1000
1) "1996"
2)  1) "key9982"
    2) "key9997"
    3) "key9963"
    4) "key996"
    5) "key9912"
    6) "key9999"
    7) "key9921"
    8) "key994"
    9) "key9956"
   10) "key9919"
127.0.0.1:6379> scan 1996 match key99* count 1000
1) "12594"
2) 1) "key9939"
   2) "key9941"
   3) "key9967"
   4) "key9938"
   5) "key9906"
   6) "key999"
   7) "key9909"
   8) "key9933"
   9) "key9992"
......
127.0.0.1:6379> scan 11687 match key99* count 1000
1) "0"
2)  1) "key9969"
    2) "key998"
    3) "key9986"
    4) "key9968"
    5) "key9965"
    6) "key9990"
    7) "key9915"
    8) "key9928"
    9) "key9908"
   10) "key9929"
   11) "key9944"
...

This article mainly discusses how SCAN works from the perspective of the underlying structure and source code.

The structure of Redis

For efficiency and simplicity, Redis uses a hash table as the underlying implementation of its keyspace. Speaking of hash tables, the first thing many Java programmers think of is HashMap, and indeed the underlying key storage of Redis is an array-plus-linked-list structure similar to HashMap's. The size of the first-dimension array is always 2^n (n >= 0), and the array doubles in size with each expansion.
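For reference, here are the corresponding structures from dict.h: table is the one-dimensional array, and dictEntry's next pointer forms the linked list hanging off each slot:

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;   /* chaining: next entry in the same bucket */
} dictEntry;

typedef struct dictht {
    dictEntry **table;        /* the one-dimensional bucket array */
    unsigned long size;       /* always a power of two */
    unsigned long sizemask;   /* size - 1, masks a hash into a slot index */
    unsigned long used;       /* number of elements stored */
} dictht;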

The SCAN command traverses this one-dimensional array. The cursor it returns is simply an index into the array, and the COUNT parameter indicates how many array slots to examine in one call, with every matching key attached to those slots returned. Because the linked list hanging off each slot has a different length, the number of results returned per call varies.

SCAN traversal order

The traversal order of the SCAN command is best seen through a small example.

127.0.0.1:6379> keys *
1) "db_number"
2) "key1"
3) "myKey"
127.0.0.1:6379> scan 0 MATCH * COUNT 1
1) "2"
2) 1) "db_number"
127.0.0.1:6379> scan 2 MATCH * COUNT 1
1) "1"
2) 1) "myKey"
127.0.0.1:6379> scan 1 MATCH * COUNT 1
1) "3"
2) 1) "key1"
127.0.0.1:6379> scan 3 MATCH * COUNT 1
1) "0"
2) (empty list or set)

We have three keys in Redis, and with COUNT 1 each call iterates over exactly one slot of the one-dimensional array. As shown above, the traversal order of the SCAN cursor is:

0 -> 2 -> 1 -> 3

This order looks a little strange. Converting it to binary makes it easier to understand:

00 -> 10 -> 01 -> 11

Notice that each step increments the sequence by one, but with the carry running in the opposite direction: ordinary binary addition carries from right to left, whereas this sequence carries from left to right. The same thing shows up in the Redis source code.

In the dictScan function of dict.c, the cursor is advanced as follows:

v = rev(v);
v++;
v = rev(v);

That is: reverse the cursor's bits, add one, then reverse them back, which is exactly the "carry from the high bit" operation described above.
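Here is a standalone sketch of this cursor step. rev_bits reverses only the low bits of the cursor (Redis's rev() reverses the whole machine word, which matters when the table is resized, but for a fixed-size table the visiting order comes out the same):

#include <stdio.h>

/* Reverse the low `bits` bits of v. */
static unsigned long rev_bits(unsigned long v, int bits) {
    unsigned long r = 0;
    for (int i = 0; i < bits; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/* The dictScan cursor step: reverse, increment, reverse back.
 * Overflow past `bits` bits wraps the cursor to 0, ending the scan. */
static unsigned long next_cursor(unsigned long v, int bits) {
    v = rev_bits(v, bits);
    v++;
    return rev_bits(v, bits);
}

int main(void) {
    /* For a 4-slot table (2 cursor bits) this prints: 0 2 1 3 0 */
    unsigned long v = 0;
    for (int i = 0; i < 5; i++) {
        printf("%lu ", v);
        v = next_cursor(v, 2);
    }
    printf("\n");
    return 0;
}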

Now, you might wonder why this order is used instead of the ordinary 0, 1, 2... The reason is that the traversal has to tolerate dictionary expansion and shrinkage while it is in progress (one has to admire how thoroughly the developers considered the problem).

Let's look at how a SCAN traversal behaves when an expansion happens mid-scan. Suppose our original array has four slots, so the cursor is two bits wide; expanding to eight slots requires three bits, and everything is rehashed.

All elements originally attached to slot xx are redistributed to slots 0xx and 1xx. If the dict rehashes just as we are about to iterate slot 10, the scan continues from 010, and slots 000 and 100 (holding the elements that used to hang under 00) are not iterated again.

Now consider shrinking. Suppose the dict shrinks from 3 cursor bits back to 2 just as we are about to traverse 110. The scan then continues from cursor 10, and that slot now also holds the elements that used to hang under 010, a slot we have already visited, so those elements get iterated a second time. Slots before 010, however, are not revisited. In short, shrinking may produce some duplicate elements.

Rehash of Redis

Rehashing is a relatively complex process, and in order not to block the Redis process it is carried out via a progressive (incremental) rehash mechanism. First, the dictionary structure:

/* dict */
typedef struct dict {
    dictType *type;          /* type-specific functions */
    void *privdata;          /* private data */
    dictht ht[2];            /* two hash tables: old and new */
    long rehashidx;          /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

In Redis's dictionary structure there are two hash tables, one old and one new. During a rehash, Redis progressively migrates elements from the old table to the new one. Let's look at the source code of dict's rehash operation:

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

As the comment explains, the rehash process migrates bucket by bucket. Buckets are the elements of the one-dimensional array we mentioned earlier, and each step migrates the entire linked list attached to one bucket. Walking through the code:

  1. First check whether a rehash is actually in progress; if not, return immediately.
  2. Then perform up to n steps of incremental rehash, also checking that elements remain in the old table as a safety measure.
  3. Before migrating a bucket, assert that the rehash index has not run past the end of the old table.
  4. Empty buckets are skipped, bounded by the empty_visits counter, which caps how many empty buckets may be visited in one call so that Redis is not blocked for too long.
  5. Next the elements are migrated: every element in the current bucket is rehashed into the new table, and the element counts of both tables are updated.
  6. After a bucket is migrated, its slot in the old table is pointed to NULL.
  7. Finally, check whether the whole migration is complete; if so, free the old table and reset the rehash index, otherwise tell the caller that data remains to be migrated.

Because Redis uses this progressive rehash mechanism, the SCAN command has to scan both the old and the new table and return the combined results to the client.
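For context on how these steps are driven: the server never calls dictRehash with a huge n. Every dictionary operation performs a single-bucket step, and the server cron rehashes in 100-bucket bursts under a time budget, essentially as follows (this mirrors dictRehashMilliseconds in dict.c, where timeInMilliseconds is a small gettimeofday-based helper in the same file):

/* Rehash in bursts of 100 buckets until `ms` milliseconds have elapsed,
 * so even a huge table never blocks the server in a single call. */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while (dictRehash(d, 100)) {
        rehashes += 100;
        if (timeInMilliseconds() - start > ms) break;
    }
    return rehashes;
}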