As anyone familiar with Redis knows, it is single-threaded. So be very careful about using some O(N) time complexity commands. You can accidentally block the process, causing Redis to stall.

Sometimes, we need to operate on a subset of commands that meet the criteria, such as deleting keys that begin with test_. So how do I get these keys? Prior to Redis2.8, we could use the keys command to get the key we needed by following the re match. But this command has two drawbacks:

  1. Without limit, we can only get all the keys that meet the criteria at once, and if there are millions of them, you are waiting for an “endless” string output.
  2. The keys command is a traversal algorithm with O(N) time complexity. As we mentioned earlier, this command can easily cause the Redis service to stall. Therefore, we should try to avoid using this command in production.

How do you choose between satisfying your needs and having Redis stuck? Redis provides a solution to this dilemma in version 2.8, the scan command.

Scan has two distinct advantages over keys:

  1. The time complexity of the scan command is also O(N), but it is performed in batches and does not block threads.
  2. The scan command provides the limit parameter to control the maximum number of results returned at a time.

These two advantages help us solve the above problem, but the scan command is not perfect, and the results may be repeated, so the client has to redo them. As for why it is repeated, I believe you will have the answer after reading this article.

For the basic usage of the scan command, see the Redis command: Keys.

Today we’ll focus on the underlying structure and source code for how SCAN works.

The structure of Redis

Redis uses Hash tables as the underlying implementation for reasons of efficiency and simplicity. When it comes to Hash tables, the first thing many Java programmers think of is a HashMap. Yes, the storage structure of the underlying key of Redis is an array + linked list structure similar to HashMap. Where the array size of the first dimension is 2n(n>=0). The size of the array doubles with each expansion.

The scan command traverses this one-dimensional array. The returned value is also the index of the array. The limit parameter indicates how many elements of the array are iterated over, returning all qualified results attached to those elements. Because the size of the linked list attached to each element is different, the number of results returned each time is different.

SCAN traversal order

The traversal order of the scan command can be looked at in detail with a small chestnut.

127.0.0.1:6379 > keys * 1)"db_number"
2) "key1"
3) "myKey"
127.0.0.1:6379> scan 0 MATCH * COUNT 1
1) "2"
2) 1) "db_number"
127.0.0.1:6379> scan 2 MATCH * COUNT 1
1) "1"
2) 1) "myKey"
127.0.0.1:6379> scan 1 MATCH * COUNT 1
1) "3"
2) 1) "key1"
127.0.0.1:6379> scan 3 MATCH * COUNT 1
1) "0"
2) (empty list or set)
Copy the code

We have three keys in our Redis, and we only iterate through the elements of one one-dimensional array at a time. As shown above, the traversal order of the SCAN command is

0 – > 2 – > 1 – > 3

This order seems a little strange. Let’s convert it to binary to make a little bit more sense.

00 – > 10 – > 01 – > 11

And we find that every time this sequence is incremented by one. Ordinary binary addition, from right to left, carry. And this sequence adds up and carries from left to right. This is also demonstrated in the redis source code.

Cursors are treated as follows in the dictScan function of the dict.c file

v = rev(v);
v++;
v = rev(v);
Copy the code

This means that you turn the cursor upside down, add one, and then invert it again, which is what we call the “high plus one” operation.

Now, you might wonder why this order is used instead of the normal 0, 1, 2… This order is due to the need to take into account dictionary expansion and shrinkage throughout the process (I have to admire the developers for considering the comprehensiveness of the problem).

Let’s take a look at how the SCAN traversal works when an expansion occurs during SCAN traversal. Adding our original array had four elements, two digits in the index, we need to expand it to three digits and rehash it.

All elements originally attached to xx are assigned to 0xx and 1xx. In the figure above, as we are about to iterate over 10, the dict rehashes so that the scan command is iterated from 010, and 000 and 100 (the elements that were attached under 00) are not iterated again.

Now let’s look at the shrinkage. Assume that dict shrinks from 3 bits to 2 bits and immediately traverses 110. When dict shrinks, scan traverses 10. Elements attached to 010 will be iterated, but elements before 010 will not be iterated. So, there may be some repeating elements when you shrink.

Rehash of Redis

Rehash is a complex process that uses a progressive rehash mechanism in order not to block the Redis process.

* / / * dictionary
typedef struct dict {
    // Type specific functions
    dictType *type;
    // Private data
    void *privdata;
    / / a hash table
    dictht ht[2];
    / / rehash index
    // When rehash is not in progress, the value is -1
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // The number of security iterators currently running
    int iterators; /* number of iterators currently running */
} dict;
Copy the code

In the dictionary structure of Redis, there are two hash tables, one new and one old. During the rehash process, Redis migrates elements from the old table to the new table, so let’s look at the source code for dict’s rehash operation.

/* Performs N steps of incremental rehashing. Returns 1 if there are still * keys to move from the old to the new hash table, otherwise 0 is returned. * * Note that a rehashing step consists in moving a bucket (that may have more * than one key as we use chaining) from the old to the new hash table, however * since part of the hash table may be composed of empty spaces, it is not * guaranteed that this function will rehash even a single bucket, since it * will visit at max N*10 empty buckets in total, otherwise the amount of * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if(! dictIsRehashing(d))return 0;

    while(n-- && d->ht[0].used ! =0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more * elements because ht[0].used ! = 0 * /
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... * /
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = - 1;
        return 0;
    }

    /* More to rehash... * /
    return 1;
}
Copy the code

As you can see from the comments, the rehash process migrates in buckets. Buckets are the elements of the one-dimensional array we mentioned earlier. Migrate the list one at a time. Let’s explain this code.

  • First determine if you’re rehashing, and if so, proceed; Otherwise, return directly.
  • The next step is incremental rehash. It also determines whether there are any remaining elements to ensure security.
  • Before rehash, determine whether the bucket to be migrated is out of bounds.
  • The empty buckets are then skipped, with an empty_visits variable that represents the maximum number of accessible empty buckets, which is mainly used to ensure that there are not too many blocked Redis.
  • The next step is to migrate the elements, rehash all the elements of the current bucket and update the number of elements in both tables.
  • After each bucket migration, point the bucket in the old table to NULL.
  • Finally, determine if all the migration is complete, if so, reclaim the space and reset the Rehash index, otherwise tell the caller that there is still data that has not been migrated.

Because Redis uses a progressive rehash mechanism, the scan command returns the results to the client when both the old and new tables need to be scanned.