Redis Scan algorithm design idea

What if I want to return all the keys in redis’s current database? Using keys? In the case of a large number of keys, this command takes a long time for the single-threaded Redis server to execute, resulting in no response to subsequent commands. In addition, this command also causes certain pressure on the memory and seriously reduces the availability of the Redis service

For this purpose, Redis 2.8.0 and later provides multiple scan related commands to provide related traversal functions for different data structures (such as databases, collections, hashes, and ordered collections)

The SCAN command and its related SSCAN, HSCAN, and ZSCAN commands are used to incrementally iterate a collection of elements.

The SCAN command is used to iterate over the database keys in the current database
The SSCAN command is used to iterate over elements in a collection key
The HSCAN command is used to iterate over the key-value pairs in a hash key
The ZSCAN command is used to iterate over elements in an ordered collection (including element members and element scores)

SCAN

SCAN cursor [MATCH pattern] [COUNT COUNT]
The SCAN command is a cursor based iterator: Each time the SCAN command is invoked, a new cursor is returned to the user. The user needs to use the new cursor as the cursor parameter of the SCAN command in the next iteration to continue the previous iteration process
whenSCANThe cursor parameter of the command is set to0, the server will start a new iteration, and when the server returns a value of0Represents the end of the iteration
SSCAN/HSCAN /ZSCAN commands are similar to SCAN commands except for slightly different command formats and different traversal modes in non-hash table implementations. For details, please click the link to query
Guarantee: All elements that have been in the data set from the start of the full walk until the end of the full walk are returned by the full walk
Disadvantages: 1) the same element may be returned several times, after the rehash narrow traversal or rehash traversal in the process of shrinking may be happen this situation (personal) 2) if an element is in the process of iteration is added to the data set, or is in the process of iteration is deleted from the data set, then this element may be returned, Or maybe not, it’s not certain

Note: the above content from http://redisdoc.com/key/scan.html

For the SCAN command and the set, hash and ordered set implemented by hash table at the bottom, the same SCAN algorithm is used for traversal (dictScan function is called). The dictScan function is short and concise, which is the core of this paper’s attempt to explain, as follows

unsigned long dictScan(dict *d,// To iterate over the hash table
                       unsigned long v,//cursor value, the traversal position, the initial is 0
                       dictScanFunction *fn,// The single item traversal function copies the item object according to the item type so that it can be added to the return object
                       dictScanBucketFunction* bucketfn,//null
                       void *privdata)// Return the object
{
    dictht *t0, *t1;
    const dictEntry *de, *next;
    unsigned long m0, m1;

    if (dictSize(d) == 0) return 0;// If dict is empty, return directly

    if(! dictIsRehashing(d)) {// If hashing is not at rehashing, only HT [0] has data
        t0 = &(d->ht[0]);// set ht[0] as traversal table
        m0 = t0->sizemask;// the size of the traversal table is sizemask. If the size of the traversal table is 8, then m0 is 111

        /* Walk over all entries at the cursor position */
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);// This line will not be executed if the condition is false
        de = t0->table[v & m0];
        while (de) {// Iterate over all entries at the current cursor. That is, the hash key moulds all entries of the hash table with the same size
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        /* Set v cursor to 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
        v |= ~m0;

        /* Change the cursor position 0 to 1 or (successive high 1's become 0 and the first 0 becomes 1) */
        v = rev(v);// Reverse the binary order of the cursor to 100+61 ones
        v++;// Add 1 to the last digit, which is 101+61 zeros
        v = rev(v);// Set cursor to binary reverse order, i.e. 61 zeros +101

    } else {// Hash tables are rehashing
        t0 = &d->ht[0];
        t1 = &d->ht[1];

        /* Make sure t0 is small and t1 is large */
        if (t0->size > t1->size) {
            t0 = &d->ht[1];
            t1 = &d->ht[0];
        }

        m0 = t0->sizemask;// if the table size is 8, then m0 is 7, i.e. 0111
        m1 = t1->sizemask;// if the table size is 64, m1 is 63, that is, 00111111

        /* Add all entries at the cursor position */
        if (bucketfn) bucketfn(privdata, &t0->table[v & m0]);/ / does not perform
        de = t0->table[v & m0];
        while (de) {// Add all entries for t0
            next = de->next;
            fn(privdata, de);
            de = next;
        }

        * If the cursor is 1, the small table is 8, and the large table is 64, the entries at 0, 8, 16, 24, 32, 40, 48, 56, etc. are added to the cursor
        do {
            /* Add all elements at position V of the large table, notice that position V is constantly changing with the while loop */
            if (bucketfn) bucketfn(privdata, &t1->table[v & m1]);/ / does not perform
            de = t1->table[v & m1];
            while (de) {// Add all entries for position v
                next = de->next;
                fn(privdata, de);
                de = next;
            }

            The cursor position of a small table is extended to all positions of a large table */
            v |= ~m1;
            v = rev(v);
            v++;
            v = rev(v);

            /* For example, m0 is a 3-bit 1, m1 is a 6-bit 1, and m0 is a 6-bit 1, and m1 is a 6-bit 1, and m0 is a 3-bit 1, and M1 is a 6-bit 1. That is, return */ for all entries that m0 might rehash to position M1
        } while (v & (m0 ^ m1));
    }

    return v;
}
Copy the code

This algorithm is very subtle, looked for a long time to understand the point of meaning, if there is improper, welcome to clap brick

Hash tables have multiple states, and traversals can be in

After the hash is extended
After the contract
In rehashing

As a result, scan algorithm is faced with a very complicated situation. How to traverse all elements (elements that have not changed in the traverse process are guaranteed to complete the traverse) and return as few duplicate elements as possible is a difficult problem. Redis Scan iterator — dictScan reverse binary iterator (search online, process is long, careful, but there are some good diagrams)

Specific algorithm flow can also refer to the above source notes

Algorithmic thinking (thinking of shrinkage as expansion in reverse)

1, If the hash table size is expanded from N to 2^M x N, then the I element of the original hash table may be distributed to I + j x N where j <- [0, 2^ m-1], for example, N = 4, M = 3. Then I (originally 1) may be divided into 1, 1+1×4, 1+2×4… And 1+7×4 positions. Note that the log(N) bits after these positions are the same, that is, the M bits before them are different, e.g

00001
00101
01001
.
11101

If the can will expand after traversal process behind the location of the both is the same as 01 are ignored, which is just behind the same N traverse, means that the front M all possibilities are listed out, which is always put in front of the possibility of exhaustion, exhaustive back again, then extended slot (e.g., 1 of 1, 5, 9… 29) There is no need to iterate again. The contraction is similar, but the position after contraction may contain the possibility that the original hash table has not been exhausted, so it needs to iterate again

2. How to traverse the possibility of high position first? DictScan gives the reverse binary iterative algorithm (always invert the highest position first, exhaust the possibility of high position, and advance to the low position in turn. This transformation ensures that all elements will be traversed) :

Set the position corresponding to the first encountered high position 0 to 1 (that is, the two have the most consecutive same low position from right to left before and after transformation, that is, the maximum range of the same modulus). Under this rule, the next position of the 32-size hash table after 00001 traversal is 10001. If the next traversal of 10001 shrinks the hash table to 16 size, Position 0001 is retraversed (modulo 10001 and 16), and both 00001 and 10001 shrink to that position, in which case the element may return repeatedly; If 32 expands to 64, then 00001 expands into two positions 000001/100001. Due to the principle of high level exhaustion, these positions will not be processed again, reducing the probability of repeated return of elements
Or will the frontRow 1When set to 0, the first 0 is set to 1, such as 10001 and the next 0 is 01001, the next high is exhausted

3, rehashing this situation, after traversing the cursor position of the small table, the cursor position of the small table may rehash to all the positions of the large table, and then return to traversal elements and the next small table traversal position

Welcome to follow my wechat official account

SCAN

Algorithmic thinking (thinking of shrinkage as expansion in reverse)

Welcome to follow my wechat official account

Related Posts

Logback combines custom filter to trace access logs

Deadknock Java Threads series 8 ways to create threads

Java8 List to Map, I am stuck……