Compared with arrays and linked lists, a dictionary is a higher-level data structure. Like a Chinese dictionary, in which a character can be looked up uniquely by its pinyin or radical, a program calls each such mapping relationship a key-value pair, and many key-value pairs together constitute the dictionary structure.

There are many sophisticated dictionary implementations. For example, the underlying implementation of HashMap in Java distributes key-value pairs evenly across an array according to the hash value of each key. When hash collisions occur, the colliding key-value pairs are chained together in a singly linked list, and a chain is converted into a red-black tree once it grows beyond eight nodes.

So how does that work in Redis? Let’s take a look.

1. Definition of the dictionary structure

Dictionary-related constructs in Redis are defined in dict.h, where dict represents a dictionary structure:

typedef struct dict {
    dictType *type;          /* type-specific callbacks */
    void *privdata;          /* private data passed to the callbacks */
    dictht ht[2];            /* two hash tables, for progressive rehash */
    long rehashidx;          /* -1 when no rehash is in progress */
    unsigned long iterators; /* number of safe iterators currently running */
} dict;

Among them, the type field points to the dictType structure, which defines several polymorphic methods as follows:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

The hashFunction pointer: when we use the set command to store data into the dictionary, the key of the key-value pair is first passed through hashFunction to obtain a well-distributed hash value, and only then is the data actually stored. This is where the hash function comes in. If you need a different hash for your dictionary structure, provide your own implementation of the hash function in dictType when initializing the dictionary.

keyDup and valDup make copies of a key and a value respectively, while keyDestructor and valDestructor destroy them. These are polymorphic hooks: the concrete implementations are provided by the user.

Moving on through the dict structure: the privdata pointer stores additional information attached to the dictionary, and ht is an array of dictht structures, where dictht is a hash table structure that we will look at in a moment. The rehashidx field records the index of the bucket currently being transferred during a rehash. The iterators field counts the safe iterators currently running on the dictionary.

dictht is the hash table structure:

typedef struct dictht {
    dictEntry **table;      /* array of buckets, each a chain of entries */
    unsigned long size;     /* number of buckets, always a power of two */
    unsigned long sizemask; /* always size - 1, used to mask hash values */
    unsigned long used;     /* number of key-value pairs stored */
} dictht;

table is an array of pointers to dictEntry structures, and each dictEntry represents one key-value pair. In effect the layout is two-dimensional. Why is that?

In fact, our dictionary normally stores data like this:

Each dictEntry stores one key-value pair, and we can reach all of them through the table pointer. But what happens when the key of one pair hashes to a slot already occupied by another node, the situation commonly known as a hash collision?

The practice in Redis, as in most dictionary implementations, is to chain the colliding nodes into a linked list, so the dictionary structure becomes this:

All nodes in the same chain have keys with the same hash value, which is exactly why they can be strung together. Logically, the dictionary looks like the picture above; in code, it is a two-dimensional structure: the first dimension is an array of pointers to nodes, and the second dimension is the chain of key-value pair structures those pointers lead to. Each dictEntry carries a next pointer that links up all the nodes that collided on the same bucket.

In addition, the size attribute in dictht describes the total number of buckets in the hash table, which is also the maximum length of the first dimension of the two-dimensional array. The sizemask attribute is always equal to size - 1 and acts as a mask used to compute a node's initial position in the array. used records the number of key-value pairs currently stored in the whole hash table.

ht is a two-element array in the dict structure. Normally we use only the table ht[0]; ht[1] is used during progressive rehash to receive all the nodes transferred out of ht[0].

Finally, let’s look at the dictEntry key-value pair structure:

typedef struct dictEntry {
    void *key;              /* the key, any Redis object type */
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                    /* the value, in the member matching its type */
    struct dictEntry *next; /* next entry in the same bucket's collision chain */
} dictEntry;

key is a pointer to an arbitrary structure, meaning the key can be any object type in Redis. v is a union, which can hold a pointer, a uint64_t, an int64_t, or a double; the appropriate member is used depending on the actual type of the value.

The next pointer points to another dictEntry structure that is used to link the next key-value pair node in the event of a hash collision.

dict describes a dictionary, dictht describes a hash table, and dictEntry describes a key-value pair. We will come back to iterators shortly.

2. Progressive Rehash data migration

Redis’s rehash differs slightly from Java’s and other hashing implementations. Because Redis is single-threaded, it needs no concurrency control to keep data consistent, but doing the whole rehash in one pass on a single thread would block clients for too long. So what does Redis do?

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while (n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while (d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while (de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

The default value of rehashidx is -1, indicating that the dictionary is not in the rehash phase. In other cases, the value of this field is equal to the index of the bucket that is being transferred.

The newer version of dictRehash takes a parameter n that bounds the work done per call: at most n buckets are transferred, and at most n*10 empty buckets may be visited along the way. What does that mean? Let’s look at a picture:

Suppose the buckets with index values 2 and 3 are empty, i.e. they contain no key-value pairs. A transfer step that finishes the bucket at index 1 moves on to the next bucket; if that one is empty, it keeps scanning forward until it finds a non-empty bucket. In the extreme case where only the last bucket of the table holds a node, a single rehash step would take O(n) time and make the client wait too long. That is why the newer version adds a parameter controlling the maximum number of empty buckets a single call may traverse.

The relevant code snippet is as follows:

while(d->ht[0].table[d->rehashidx] == NULL) {
    d->rehashidx++;
    if (--empty_visits == 0) return 1;
}

If the transfer of the current bucket completes the whole rehash, ht[1] is assigned to ht[0], ht[1] is reset, and rehashidx is set back to -1 to mark the end of the rehash.

That’s the whole process of Rehash in Redis, and it’s fairly simple, but why is it progressive? Let’s look at adding and querying key-value pairs.

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    int index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

This is what runs underneath a set call when a key-value pair is added. The function first calls dictIsRehashing to check whether the dictionary is in the rehash state, i.e. whether rehashidx is not equal to -1, and if so performs a single rehash step via _dictRehashStep:

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

By default, a single rehash step visits at most 10 empty buckets, so the call returns quickly instead of lingering. It is also worth noting the subsequent logic of dictAddRaw: if the dictionary is being rehashed, the new key-value pair is no longer added to ht[0] but goes directly into ht[1].

Let’s look at the underlying API call of the get command for querying key value pairs, which calls the dictFind method.

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    if (d->ht[0].used + d->ht[1].used == 0) return NULL; 
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

As you can see, dictFind also checks dictIsRehashing: if the dictionary is in the rehash state, it performs one bucket-transfer step before proceeding. Note that the body of the method is a for loop over the two tables: the first pass searches ht[0] for the given key, and if it is not found while a rehash is in progress, the second pass searches ht[1].

The reason Redis’s rehash is called progressive is that even while the dictionary is in the rehash state, insertion, lookup, and deletion of nodes all keep working; the migration is spread across those operations until the whole rehash completes and Redis frees the memory previously occupied by ht[0].

PS: the dictionary implementation in Redis is much simpler than Java’s, mainly because Redis runs single-threaded and needs no extra concurrency control.

3. Dictionary iterators

An iterator is a tool for traversing all the nodes in a dictionary. There are two kinds: safe iterators and unsafe iterators. A safe iterator allows you to modify the dictionary's structure during iteration: adding, deleting, and modifying key-value pairs is permitted. An unsafe iterator allows no modification of any node in the dictionary.

The dictIterator structure is defined as follows:

typedef struct dictIterator {
    dict *d;
    long index;
    int table, safe;
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;
} dictIterator;

The d field points to the dictionary being iterated. index records the bucket index the iteration has reached; table is 0 or 1 and indicates which hash table is being traversed; safe marks whether the iterator is safe or unsafe. entry records the node of the current iteration, and nextEntry caches entry->next so that the next node is not lost if the current one is deleted. fingerprint stores a value computed by dictFingerprint from the dictionary's basic state; if that state changes even slightly, the fingerprint changes too, which is used to detect misuse of unsafe iterators.

The plain (unsafe) iterator is obtained with dictGetIterator, which leaves safe set to 0:

dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;
    return iter;
}

A safe iterator is obtained by wrapping it and setting safe to 1:

dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);

    i->safe = 1;
    return i;
}

The iterator’s core method, dictNext, is used to fetch the next node in the dictionary.

dictEntry *dictNext(dictIterator *iter)
{
    while (1) {
        /* If the iterator works for the first time, entry must be NULL */
        if (iter->entry == NULL) {
            dictht *ht = &iter->d->ht[iter->table];
            if (iter->index == -1 && iter->table == 0) {
                if (iter->safe)
                    /* Safe iterator: bump the iterators count to
                     * disable rehash steps while we run */
                    iter->d->iterators++;
                else
                    /* Unsafe iterator: record the fingerprint instead */
                    iter->fingerprint = dictFingerprint(iter->d);
            }
            /* The iterator starts working (or advances), pointing at
             * the next bucket; bucket 0 on the first call */
            iter->index++;
            /* index >= size means this table's last bucket is done */
            if (iter->index >= (long) ht->size) {
                if (dictIsRehashing(iter->d) && iter->table == 0) {
                    /* The dictionary is rehashing: continue with ht[1] */
                    iter->table++;
                    iter->index = 0;
                    ht = &iter->d->ht[1];
                } else {
                    /* Otherwise the iteration is indeed complete */
                    break;
                }
            }
            iter->entry = ht->table[iter->index];
        } else {
            /* entry is not NULL: try its successor in the chain */
            iter->entry = iter->nextEntry;
        }
        /* At this point the iterator has reached the next node */
        if (iter->entry) {
            /* Record nextEntry in case the current node gets deleted */
            iter->nextEntry = iter->entry->next;
            return iter->entry;
        }
    }
    return NULL;
}

Most of the logic is covered by the comments. The whole method is an infinite loop. If entry is NULL, either the iterator is running for the first time or it has exhausted the chain of some bucket; in the latter case the if logic decides whether the whole dictionary has been traversed, and if not, moves on to the next bucket.

For a safe iterator, incrementing the iterators property on the first call prevents subsequent operations from triggering rehash steps (recall the check in _dictRehashStep). If the dictionary was already mid-rehash before the iterator started, do not panic: the nodes already migrated to ht[1] will also be visited, because the loop moves on to ht[1] after finishing ht[0]. And since no further rehash migrations can happen once iterators is incremented, no node is visited twice.

The iterator should be released after iteration. The corresponding method in Redis is:

void dictReleaseIterator(dictIterator *iter)
{
    if (!(iter->index == -1 && iter->table == 0)) {
        if (iter->safe)
            iter->d->iterators--;
        else
            assert(iter->fingerprint == dictFingerprint(iter->d));
    }
    zfree(iter);
}

On release, a safe iterator simply decrements the dictionary's iterators count, re-enabling rehash. An unsafe iterator recalculates the fingerprint and compares it, via an assert, with the one computed when iteration began: if they differ, the dictionary was modified during unsafe iteration, and the program exits with an error.

Those are the two basic iterators, safe and unsafe, in the Redis dictionary and how they work. Neither of them, after all, permits iterating while a rehash is running. Redis actually has a more advanced traversal method called scan, which does allow iteration while a rehash is in progress. We will analyze its source code later, stay tuned!


All the case code and materials used in these articles are uploaded to my personal GitHub:
Github.com/SingleYam/o…