The dict is the core data structure of Redis, because it holds all of Redis's data. You can roughly think of Redis as one big dict that stores all the key-value pairs.

A dict in Redis is essentially a hashtable, so it has to deal with all the usual hashtable concerns: how to organize the key-value pairs, how to resolve hash conflicts, when and how to expand, and so on. In fact, the hashtable implementation in Redis is an ordinary hashtable; what Redis adds is progressive rehashing, which reduces the impact of hashtable expansion on performance. Next, let's look at the concrete implementation of the hashtable in Redis.

Redis Dict implementation

Dict is defined in dict.h, and its fields and meanings are as follows:

typedef struct dict {
    dictType *type;  // Pointer to a dictType, which bundles function pointers for operating on the stored data so that dict can handle arbitrary data types (similar to an interface in an object-oriented language: its methods can be overridden)
    void *privdata;  // A private data pointer passed in by the caller when the dict is created
    dictht ht[2];  // Two hashtables: ht[0] is the main hashtable, ht[1] is only used during progressive rehashing
    long rehashidx;  // rehashidx == -1 indicates that no rehash is in progress
    unsigned long iterators; /* Number of iterators running */
} dict;

The dictType *type field (personally I feel `type` is not a great name for it) deserves a closer look. Its purpose is to let dict support arbitrary data types, because different data types need different operations; for example, hash codes are computed differently for strings and for integers. dictType therefore encapsulates the type-specific operations as function pointers. From an object-oriented perspective, dictType can be seen as the interface for all data-type-specific operations in dict, and each data type only needs to implement its own versions. dictType encapsulates the following function pointers:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);  // Generate hash values for keys
    void *(*keyDup)(void *privdata, const void *key); // Copy the key
    void *(*valDup)(void *privdata, const void *obj);  // make a copy of val
    int (*keyCompare)(void *privdata, const void *key1, const void *key2); // Compare two keys
    void (*keyDestructor)(void *privdata, void *key); // Key destruction
    void (*valDestructor)(void *privdata, void *obj); // Destruction of val
} dictType;
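To make the role of dictType more concrete, here is a minimal sketch of what a dictType for plain C-string keys could look like. The callback names and the hash function below are purely illustrative assumptions, not the ones Redis actually ships with (Redis defines its own dictTypes elsewhere, e.g. in server.c):

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include "dict.h"

/* Hypothetical hash callback: a simple djb2-style hash over the C string. */
static uint64_t myStrHash(const void *key) {
    uint64_t h = 5381;
    for (const char *p = key; *p; p++) h = ((h << 5) + h) + (uint8_t)*p;
    return h;
}

/* Hypothetical compare callback: must return non-zero when the keys are equal. */
static int myStrCompare(void *privdata, const void *key1, const void *key2) {
    (void)privdata;
    return strcmp(key1, key2) == 0;
}

/* Hypothetical destructor callback: frees a heap-allocated key. */
static void myStrDestructor(void *privdata, void *key) {
    (void)privdata;
    free(key);
}

/* Wire the callbacks together; slots that are not needed can stay NULL. */
dictType myStringDictType = {
    .hashFunction  = myStrHash,
    .keyDup        = NULL,
    .valDup        = NULL,
    .keyCompare    = myStrCompare,
    .keyDestructor = myStrDestructor,
    .valDestructor = NULL,
};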

Dict has another important field, dictht ht[2]. A dictht is the actual hashtable, but why an array of two? This is where the progressive rehashing of the Redis dict comes in: the expansion of a dict's hashtable is not completed in one shot. Redis first builds a larger new hashtable in ht[1], then gradually migrates the data from ht[0] into ht[1]; rehashidx records how far this migration of ht[0] has progressed. The progressive rehashing process is explained in detail later.

Here’s the definition of dictht:

typedef struct dictht {
    dictEntry **table;  // The bucket array: a contiguous array of dictEntry pointers
    unsigned long size; // Size of the bucket array
    unsigned long sizemask;  // Mask applied to hash codes, always size - 1
    unsigned long used; // Number of entries stored
} dictht;
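Because size is always a power of two, sizemask is simply size - 1, and mapping a hash code to a bucket index becomes a cheap bitwise AND instead of a modulo. A tiny sketch of that mapping (the same expression appears later in dictFind() as idx = h & d->ht[table].sizemask):

#include <stdint.h>

/* Sketch: map a hash code to a bucket index when size is a power of two. */
static unsigned long bucket_index(uint64_t hash, unsigned long sizemask) {
    return hash & sizemask;   /* equivalent to hash % size for size == 2^n */
}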

A dictEntry wraps a single key-value pair in the dict. Besides the key and value themselves, it also contains some additional information:

typedef struct dictEntry {
    void *key;
    union {   // The value is a union, so values of several kinds can be stored inline
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; // Next pointer of the singly linked list used to chain entries on hash conflicts
} dictEntry;

When hash conflicts occur, the dict's hashtables use chaining: entries that fall into the same bucket are stored in a singly linked list.

If we were to draw a dict in memory, it would look like this:

(figure: dict memory layout)

Before we look at dict's core API implementations, let's first look at how a dict expands, that is, Redis's progressive rehashing. What is progressive rehashing? Why does Redis use it? And how is it implemented?

To answer these questions, consider the expansion process of a hashtable. If you are familiar with Java, you may know that a Java HashMap expands by allocating a larger array once the number of elements crosses a threshold and then migrating all the old data into the new array in one go before continuing. When the amount of data is large, this expansion is very time-consuming, which is why some coding guidelines recommend specifying a HashMap's capacity up front to avoid automatic resizing.

However, Redis cannot know the amount of data when a dict is created, and if it expanded the way a Java HashMap does, the single-threaded Redis would be unable to do anything else during the expansion, blocking subsequent requests and ultimately hurting the performance of the whole server. How to solve this? The idea is simple: split one large expansion into many small steps, so the cost is spread out and each individual operation is barely affected. The concrete implementation is as follows.

As we saw above, the dict definition contains dictht ht[2]. During expansion the two hashtables are used as follows: ht[0] is the old hashtable and ht[1] is the new, larger one.

/* Check whether dict needs to be expanded */
static int _dictExpandIfNeeded(dict *d)
{
    /* If a progressive rehash is already in progress, return directly */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If resizing is allowed, expand once the load factor reaches 100% (used >= size); if resizing is currently disallowed, a forced expansion still happens once the load factor exceeds dict_force_resize_ratio (5) */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2); // Double the capacity
    }
    return DICT_OK;
}

Every time Redis computes the bucket index for a key, it checks whether ht[0] needs to expand. If resizing is allowed, an expansion is triggered once the load factor (used/size) reaches 100%; if resizing is temporarily disallowed, an expansion is still forced once the load factor exceeds 500%. The code that performs the expansion is as follows:

/* Dict creation and expansion */ 
int dictExpand(dict *d, unsigned long size)
{
    /* If a rehash is already in progress, or size is smaller than the number of elements in the hashtable, size is invalid: return an error */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    dictht n; /* New hashtable */
    // Round the new capacity up to the smallest power of 2 that is >= size (with an upper limit)
    unsigned long realsize = _dictNextPower(size);

    // If the new capacity is the same as the old capacity, no need to continue, return err
    if (realsize == d->ht[0].size) return DICT_ERR;

    /* Create a larger hashtable */
    n.size = realsize;
    n.sizemask = realsize - 1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    // If this is the first initialization of the dict, simply assign the new hashtable to ht[0]
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    // Otherwise this is a real expansion: put the new table in ht[1] and set rehashidx to 0 to start rehashing
    d->ht[1] = n;
    d->rehashidx = 0; // rehashidx is the index of the next bucket in ht[0] to be rehashed
    return DICT_OK;
}

Note that dictExpand only creates the new table and sets rehashidx to 0 (rehashidx == -1 means no rehash is in progress); it does not migrate any data from ht[0] to ht[1]. The migration is driven by _dictRehashStep(), which is called from dict lookups, insertions, and deletions and migrates a single bucket per call; a sketch of it is shown below. The actual work is done by dictRehash(), which follows:
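For reference, _dictRehashStep() itself is tiny; roughly (as it appears in the dict.c version this post is based on) it just forwards to dictRehash() with n = 1, and it skips the step entirely while iterators are active so that iteration does not miss or duplicate entries:

/* One incremental rehash step: rehash a single bucket, but only when no
 * iterators are currently bound to the dict. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}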

 /* Progressive rehash: migrate buckets from ht[0] to ht[1] in small batches, avoiding the performance problems that migrating a large amount of data at once would cause during hashtable expansion. The parameter n is the number of (non-empty) buckets to rehash in this call. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Maximum number of empty buckets to visit; if we hit this many empty buckets, this rehash step ends early */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1; // Too many empty buckets visited; stop this step here
        }
        // Iterate over the linked list in the current bucket and move it directly to the new HashTable
        de = d->ht[0].table[d->rehashidx];
        /* Move all keys from the old hash bucket to the new hash bucket */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Compute the key's bucket index in the new hashtable */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check whether the whole table has been rehashed */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);  // Free the memory occupied by the old table
        d->ht[0] = d->ht[1];  // ht[0] now points to the new table
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;  // The whole table has been rehashed; return 0
    }

    /* There is still data left to rehash; return 1 */
    return 1;
}

As you can see, rehashing moves the data in ht[0] to ht[1] in small batches, splitting what would be one large operation into many small ones and avoiding the situation where Redis becomes temporarily unavailable while a dict expands. The downsides are that both tables occupy memory while the rehash is in progress, and the overall rehash takes longer to complete.
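Besides the per-operation steps above, dict.c also provides a time-bounded helper, shown roughly below as it appears in the dict.c version this post is based on. It rehashes in batches of 100 buckets until the given millisecond budget is exceeded (timeInMilliseconds() is a small static helper in dict.c); a caller such as Redis's periodic background task can use it to keep a long rehash moving even when the dict is not being accessed:

/* Rehash in batches of 100 buckets for at most ms milliseconds. */
int dictRehashMilliseconds(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while(dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}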

Core API

insert

/* Add an element to the dict */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);  
    // dictAddRaw returns NULL if the key already exists in the dict
    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

/* This function only creates and returns the entry for the key; it does not
 * set a value, leaving that to the caller. It is also exposed directly as an
 * API, mainly for storing non-pointer values in a dict, e.g.:
 *     entry = dictAddRaw(dict,mykey,NULL);
 *     if (entry != NULL) dictSetSignedIntegerVal(entry,1000);
 * If the key already exists in the dict, NULL is returned and a pointer to the
 * existing entry is stored in *existing; otherwise an entry is created for the
 * key and a pointer to it is returned. */
dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the bucket index for the new element; -1 means the key already exists, so return NULL */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    /* Otherwise, the new element is allocated and inserted into the head of the list (generally, newly inserted data is accessed more frequently)*/
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the key on the new entry */
    dictSetKey(d, entry, key);
    return entry;
}

The insert process is fairly straightforward: locate the bucket index and insert the new entry at the head of its singly linked list. Note that rehashing has to be taken into account here: while a rehash is in progress, new data is always inserted into ht[1].
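As a quick illustration, here is a hypothetical usage sketch that reuses the myStringDictType defined earlier; everything except the dict.h API itself (dictCreate, dictAdd, dictAddRaw, dictSetSignedIntegerVal, dictRelease) is an assumption made up for this example:

#include <string.h>

void example_insert(void) {
    dict *d = dictCreate(&myStringDictType, NULL);

    /* Keys are heap-allocated because myStringDictType's keyDestructor frees them. */
    char *key = strdup("counter");
    if (dictAdd(d, key, "some value") == DICT_OK) {
        /* the new entry was hashed into ht[0] (or ht[1] while rehashing) */
    }

    /* Non-pointer values go through dictAddRaw + dictSetSignedIntegerVal,
     * exactly as the comment above dictAddRaw describes. */
    dictEntry *entry = dictAddRaw(d, strdup("hits"), NULL);
    if (entry != NULL) dictSetSignedIntegerVal(entry, 1000);

    dictRelease(d);
}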

Lookup

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (dictSize(d) == 0) return NULL; /* Dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);
    // We may be rehashing, so we need to check both old and new hashtables
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        // If we are not in a rehash, a miss in ht[0] means the key is not present, so there is no need to check ht[1]
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

The lookup process is simple: use the hash code to locate the bucket, then walk the singly linked list. The only subtlety is that during a rehash both hashtables, ht[0] and ht[1], may need to be searched.
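A minimal lookup sketch, continuing the hypothetical string dict from the earlier examples (dictGetVal is a macro defined in dict.h that reads entry->v.val):

void example_lookup(dict *d) {
    dictEntry *he = dictFind(d, "counter");
    if (he != NULL) {
        void *val = dictGetVal(he);  /* read the stored value */
        (void)val;                   /* ... use val ... */
    }
}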

delete

/* Find and delete an element; helper function for dictDelete() and dictUnlink(). */
static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

    if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;

    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);

    // Search both ht[0] and ht[1]; the key is removed from whichever table it is found in
    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                /* Unlink elements from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                // If nofree is 0, the memory space corresponding to k and v needs to be freed
                // If nofree is 0, also free the key, the value and the entry itself
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                    zfree(he);
                }
                d->ht[table].used--;
                return he;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return NULL; /* Not found for key */
}
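The difference between dictDelete() and dictUnlink() is the nofree flag above: dictDelete() frees the entry immediately, while dictUnlink() only detaches it so the caller can still read it before freeing. A small usage sketch, continuing the hypothetical string dict from the earlier examples:

void example_delete(dict *d) {
    /* Detach the entry without freeing it, use it, then free it explicitly. */
    dictEntry *de = dictUnlink(d, "counter");
    if (de != NULL) {
        /* ... the key and value of de can still be inspected here ... */
        dictFreeUnlinkedEntry(d, de);
    }

    /* Or simply delete and free in one call. */
    dictDelete(d, "hits");
}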

Other API

The other API implementations are relatively simple. I have added extensive annotations to the dict.c source, so if you are interested please read them there; here I only list the functions and briefly describe what they do.

dict *dictCreate(dictType *type, void *privDataPtr);  // Create a dict
int dictExpand(dict *d, unsigned long size);  // Expand (or shrink) the hashtable to the given size
int dictAdd(dict *d, void *key, void *val);  // Add a key-value pair
dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing); // Add a key and return its dictEntry (the value is set by the caller)
dictEntry *dictAddOrFind(dict *d, void *key); // Add the key, or return the existing entry if it is already present
int dictReplace(dict *d, void *key, void *val); // Replace the value for a key, adding the key-value pair if it does not exist
int dictDelete(dict *d, const void *key);  // Delete the data for a key
dictEntry *dictUnlink(dict *ht, const void *key); // Detach the entry for a key without freeing it
void dictFreeUnlinkedEntry(dict *d, dictEntry *he); // Free an entry previously detached with dictUnlink
void dictRelease(dict *d);  // Release the entire dict
dictEntry * dictFind(dict *d, const void *key);  // Look up a key
void *dictFetchValue(dict *d, const void *key);  // Get the value for a key
int dictResize(dict *d);  // Resize the table to the minimal size that holds its elements (mainly used for shrinking)
/************ iterator related ************/
dictIterator *dictGetIterator(dict *d);
dictIterator *dictGetSafeIterator(dict *d);
dictEntry *dictNext(dictIterator *iter);
void dictReleaseIterator(dictIterator *iter);
/************ iterator related ************/
dictEntry *dictGetRandomKey(dict *d);  // Return a random entry
dictEntry *dictGetFairRandomKey(dict *d);   // Return a random entry, with a more even distribution across entries
unsigned int dictGetSomeKeys(dict *d, dictEntry **des, unsigned int count); // Fetch up to count entries from the dict
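Iterating over a dict uses the iterator functions listed above; a small sketch (dictGetKey and dictGetVal are macros in dict.h):

void example_iterate(dict *d) {
    dictIterator *it = dictGetIterator(d);
    dictEntry *de;
    while ((de = dictNext(it)) != NULL) {
        void *k = dictGetKey(de);
        void *v = dictGetVal(de);
        /* ... process k and v ... */
        (void)k; (void)v;
    }
    dictReleaseIterator(it);
}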

See dict.c and dict.h for the other APIs.

This article is part of a series of blog posts analyzing the Redis source code, and it comes with a corresponding Chinese-annotated copy of the Redis source; if you want to study Redis further, you are welcome to star and follow it. Redis Chinese annotated version repository: github.com/xindoo/Redi… Redis source code analysis column: zxs.io/s/1h If you found this article useful, please like, bookmark, and share.