Redis source code analysis series of articles

Redis source code series: installation and common APIs on Linux

Why analyze the Redis source code

String low-level implementation: the dynamic string SDS

Everything you need to know about the Redis doubly linked list

Preface

Hello again, lovely readers. Today's article comes from an interview question I ran into last year, one that thoroughly beat me up. I won't go into that part here; some other time I'll write a separate piece about the famous scene of my interview beatdown, it's too embarrassing to recount in full right now. Hahaha. 😂




Today I'm going to write about the Hash type in Redis. If you're not familiar with Redis, or are interested in other data types, you can follow the series above. (Links above, above, above. Important things are said three times.)



The Hash type is widely used in Redis; its key-to-value mapping is maintained by a dictionary (dict) structure. Taking notes on it here.



Use the API

The API is relatively simple to use, so the commands are only sketched out below.

Insert data: hset

Use the hset command to insert two key/value pairs, (name, zhangsan) and (age, 20), into myhash; the command returns the number of fields that were newly added.

Get data: hget

Run the hget command to obtain the value stored under the key name in myhash.



Get all data: hgetall

Use the hgetall command to get all the keys and values in myhash.

Get all keys: hkeys

Use the hkeys command to get all the keys in myhash.



Get the length: hlen

Use the hlen command to get the length of myhash.

Get all values: hvals

Get all values in myhash using the hvals command.
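
Putting the commands above together, a typical redis-cli session might look like the following. The key name myhash and the sample fields are just this article's examples; note that passing multiple field-value pairs to a single hset call requires Redis 4.0 or later.

127.0.0.1:6379> hset myhash name zhangsan age 20
(integer) 2
127.0.0.1:6379> hget myhash name
"zhangsan"
127.0.0.1:6379> hgetall myhash
1) "name"
2) "zhangsan"
3) "age"
4) "20"
127.0.0.1:6379> hkeys myhash
1) "name"
2) "age"
127.0.0.1:6379> hlen myhash
(integer) 2
127.0.0.1:6379> hvals myhash
1) "zhangsan"
2) "20"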



Concrete logic diagram

The bottom layer of the Hash type is mainly the dict structure, which is wrapped layer by layer.

A dict contains a dictType (type-specific functions), the core dictht hash tables, rehashidx (the progressive rehash marker), and an iterator count. The most important of these are dictht and rehashidx.

Next up is dictht. The dict holds two of them: one stores the real data, and the other is used during the rehash procedure. Each dictht contains the actual data table plus a few bookkeeping variables.

Finally come the data nodes. Just like the linked list covered in the previous article, each node has a next pointer to the following node, which is how hash collisions are resolved. See the picture below for details.

It doesn't matter if you don't understand it yet; we'll explain each module later. (Don't skip ahead ☺)


Dict structure dict

Let's start with dict, a dictionary structure with four parts, focusing on ht[2] (the two hash tables that hold the real data) and rehashidx (the progressive rehash marker). The figure below shows the layout.



The specific code is as follows:

typedef struct dict {
    dictType *type;          // Type-specific functions for handling keys and values
    void *privdata;          // Private data
    dictht ht[2];            // Two hash tables
    long rehashidx;          // Progressive rehash marker; -1 means no rehash is in progress
    unsigned long iterators; // Number of iterators currently running
} dict;

Array structure dictht

dictht mainly consists of four parts: 1) table, the real data array of dictEntry pointers, which stores the data nodes; 2) size, the array length; 3) sizemask, used when mapping hash values to slots, always equal to size-1; 4) used, the number of data nodes stored.



The specific code is as follows:

// Hash table
typedef struct dictht {
    dictEntry **table;      // Array holding the real data
    unsigned long size;     // Array size
    unsigned long sizemask; // Maps hash values to table indexes; always equal to size-1
    unsigned long used;     // Number of data nodes stored
} dictht;

Data node dictEntry

dictEntry is the real data node, holding the key, the value, and a pointer to the next node.



typedef struct dictEntry {
    void *key;              // key
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                    // value
    struct dictEntry *next; // Pointer to the next data node
} dictEntry;


Expansion process and progressive Hash diagrams

Let's start with the first question: why does ht[2] keep two hash tables when only one of them stores the real data?



In fact, this is similar to Java's HashMap, which is an array-plus-linked-list structure. As the amount of data grows, hash collisions happen more often, the linked list behind each slot gets longer, and the whole structure becomes cumbersome. If the business issues a lot of queries, every lookup has to walk a linked list from its head, so query time keeps growing.

The answer is expansion: the first table holds the real data, and the second table is used while scaling up. Nodes in the first table are rehashed one by one into the second table. Can the dictionary still serve requests during this process? Yes, because the migration can stop at any time, which brings us to the next variable, rehashidx. (Not a stiff transition at all, hahaha)

rehashidx is really just a marker. If it is -1, no expansion is in progress; otherwise it records the index that the migration has reached, so the next round can continue from that position.

If that still sounds abstract and leaves you confused, the step-by-step walkthrough of the expansion process below should clear it all up. Be sure to like and comment if it helps. (More and more shameless...)



Step 1

First, before capacity expansion, rehashidx is -1, indicating that no expansion is in progress. The dictEntry array of the first table has length 4, and there are 5 nodes in total, so used is 5.



Step 2

When expansion starts, rehashidx is set to the first index of the first table, which is 0. The new size is the smallest power of 2 that is greater than or equal to used*2. Since there are currently 5 data nodes, used*2 = 10, and the smallest power of 2 not less than 10 is 16, so the new table size is 16. Starting at index 0 of the first table, find the first element, the node whose key is name and value is zhangsan, hash it, find that it maps to index 1 of the second table, and move it over; the move is really just a pointer reassignment, so I'll keep it simple here.
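
To make the sizing rule concrete, here is a minimal standalone C sketch (not the Redis source itself; next_power simply mirrors the behavior of _dictNextPower shown later) that reproduces the "5 nodes leads to size 16" calculation:

#include <stdio.h>

#define DICT_HT_INITIAL_SIZE 4

/* Smallest power of 2 that is >= size, starting from the initial size 4 */
static unsigned long next_power(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE;
    while (i < size) i *= 2;
    return i;
}

int main(void) {
    unsigned long used = 5;                          /* 5 data nodes */
    unsigned long target = used * 2;                 /* expansion target: 10 */
    printf("new size = %lu\n", next_power(target));  /* prints: new size = 16 */
    return 0;
}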




Step 3

After the node whose key is name and value is zhangsan has been moved, the remaining nodes at index 0 of the first table ht[0] are moved in the same way.



Step 4

Once the remaining nodes at index 0 of the first table ht[0] have been moved, move on to index 1; it turns out to be empty, so skip to index 2 and set rehashidx to 2 at the same time. The move procedure is the same as above.



The whole process revolves around rehashidx, which records the index of the first table currently being moved; the migration can stop at any time if memory is running low or the server is busy.

What happens when the dictionary is operated on while the rehash is paused?

  • For inserts, the new node goes straight into the second table; adding it to the first table would only mean moving it again later, so there is no need to waste the effort
  • For deletes, updates, and queries, the first table is searched first; if the key is not found there, the second table is searched (see the lookup sketch below).
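
As a rough illustration of that lookup rule, here is a simplified sketch modeled on the dictFind function in the Redis source. It is trimmed down (the real function also advances the rehash by one step before searching), so treat it as illustrative rather than the exact implementation:

/* Simplified two-table lookup in the spirit of dictFind (illustrative only) */
dictEntry *dictFindSketch(dict *d, const void *key) {
    if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* empty dictionary */
    for (int table = 0; table <= 1; table++) {
        /* Map the key's hash to a slot in this table */
        unsigned long idx = dictHashKey(d, key) & d->ht[table].sizemask;
        dictEntry *he = d->ht[table].table[idx];
        while (he) {
            if (dictCompareKeys(d, key, he->key))
                return he;              /* found in this table */
            he = he->next;              /* walk the collision chain */
        }
        /* If no rehash is in progress, ht[1] is empty, so stop here */
        if (!dictIsRehashing(d)) break;
    }
    return NULL;                        /* not found in either table */
}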

Dictionary implementation (source code analysis)

Create and initialize the dictionary

First, memory is allocated, then the initializer _dictInit is called, which is mostly simple assignment; note that rehashidx is set to -1 (this matches the diagram: -1 means no rehash is in progress). Finally it returns whether initialization succeeded.

/* Create and initialize the dictionary */
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));
    _dictInit(d,type,privDataPtr);
    return d;
}

/* Initialize the hash table */
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1; // -1 means no rehash is in progress
    d->iterators = 0;
    return DICT_OK;
}

Expansion

dict has a static function called _dictExpandIfNeeded that determines whether an expansion is required.

First, the macro dictIsRehashing(d), defined as ((d)->rehashidx != -1), checks whether a rehash is in progress. If rehashidx is -1, the dictionary is not rehashing, the condition is false, and expansion may proceed; if it is not -1, a rehash is in progress, the condition is true, and the function returns without expanding.

Then check whether the size of the first array is 0. If it is 0, expand to the default size 4. If it is not 0, execute the following code.

Then determine whether expansion is needed; the if statement combines three conditions, analyzed in the code comments below.

Finally, dictExpand is called with twice the current node count, ht[0].used*2, as the target size. This matches the expansion walkthrough above, where the new array size came out to 16.

dictExpand itself is relatively simple: compute the real new size, build a new table of that size, and install it as the second hash table (or as the first one when the dictionary is being initialized).

If that feels a bit abstract, look at the flow chart.

Capacity Expansion Flowchart



Specific code:

/* Decide whether the hash table needs to be expanded */
static int _dictExpandIfNeeded(dict *d)
{
    /* The macro used to check whether a rehash is in progress:
     * #define dictIsRehashing(d) ((d)->rehashidx != -1)
     * If a rehash is already in progress, do not expand again */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the first table's size is 0, expand it to the initial size,
     * defined by the macro #define DICT_HT_INITIAL_SIZE 4 */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* The two globals referenced in the if below:
     *   static int dict_can_resize = 1;
     *   static unsigned int dict_force_resize_ratio = 5;
     * Expand when the number of nodes has reached the table size (the table
     * is getting crowded) and either resizing is currently allowed or the
     * ratio of nodes to buckets exceeds the forced-resize threshold of 5. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        /* Expand to twice the current number of nodes */
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}

/* Expand or create the hash table */
int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hash table */
    /* Round the requested size up to the next power of 2 */
    unsigned long realsize = _dictNextPower(size);

    /* Refuse to expand while a rehash is in progress, or to a size
     * smaller than the number of elements already stored */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    /* Expanding to the same size is pointless */
    if (realsize == d->ht[0].size) return DICT_ERR;

    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* If the first hash table is empty, this is an initialization,
     * not a rehash: just install the new table as ht[0] */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Otherwise this is a rehash: install the new table as ht[1] and
     * mark the dictionary as rehashing by setting rehashidx to 0 */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

/* Find the smallest power of 2 greater than or equal to size */
static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;

    if (size >= LONG_MAX) return LONG_MAX;
    while(1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}


Progressive rehash

The progressive rehash process has been illustrated above; now let's look at how the code implements it and verify that the diagrams are correct.

After the expansion is triggered, the dictRehash function does the moving; it takes the dictionary d to be migrated and the number of steps n.

First, check the marker rehashidx. If it equals -1, no rehash is in progress and the function returns immediately; otherwise it continues with the code below.

The outer loop then walks the buckets of the first table, advancing rehashidx each time a bucket is finished (and skipping over empty buckets).

The inner loop then walks that bucket's linked list node by node and migrates the data, which is mostly pointer reassignment plus updating a few counters.

Finally, it returns an int: 0 means the entire table has been rehashed; 1 means only part of it has, and the next call can pick up from the saved rehashidx.

The specific code is as follows:

/* Perform n steps of incremental rehashing.
 * The Redis dict holds two tables, ht[0] and ht[1]; normally only ht[0]
 * is used, and ht[1] comes into play during a rehash.
 * Parameters:
 *   1. d: the dictionary being rehashed; its rehashidx records which
 *      bucket the rehash has reached.
 *   2. n: the number of steps (buckets) to migrate in this call.
 * Return value: 0 means the entire table has been rehashed,
 * 1 means there are still buckets left to move. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* maximum number of empty buckets to visit */
    /* If rehashidx == -1, no rehash is in progress, so return 0 */
    if (!dictIsRehashing(d)) return 0;

    /* Move at most n buckets, stopping early once ht[0] has no nodes left */
    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        /* The first loop advances rehashidx past empty buckets. Because some
         * buckets are empty, rehashidx may advance several positions at a
         * time, but never past empty_visits empty buckets in total. */
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* The second loop moves every node of this non-empty bucket's
         * linked list (or its single node) from ht[0] to ht[1] */
        while(de) {
            unsigned int h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    return 1;
}
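
To see how a caller can make use of that 0/1 return value, here is a sketch in the spirit of Redis's dictRehashMilliseconds, which keeps calling dictRehash in batches of 100 buckets until a time budget runs out. timeInMilliseconds stands for a millisecond-clock helper, and the real function has a few extra safety checks, so treat this as an approximation rather than the exact source.

/* Rehash in 100-bucket batches for at most ms milliseconds
 * (approximate sketch of dictRehashMilliseconds, not the verbatim source) */
int dictRehashMillisecondsSketch(dict *d, int ms) {
    long long start = timeInMilliseconds();
    int rehashes = 0;

    while (dictRehash(d, 100)) {       /* returns 1 while buckets remain */
        rehashes += 100;
        if (timeInMilliseconds() - start > ms) break; /* time budget spent */
    }
    return rehashes;
}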


Conclusion

This article mainly covers the dict structure that underlies the Redis Hash data type. Starting from a few Hash API commands, it derives the dict structure and analyzes its three main components: the dictionary structure dict, the hash table structure dictht, and the data node structure dictEntry. It then explains the expansion and progressive rehash processes through a series of diagrams, and finally walks through the source code for creating a dictionary, expanding it, and rehashing it incrementally, with flow charts along the way.

If you think this article was worth reading, please give it a like 👍; your recognition is what keeps me writing!

If you think something is wrong, please feel free to comment.

All right, bye.

And give me a follow, too.