preface

This article introduces the dictionary, the underlying structure of the Redis hash data type.

Overview

A dictionary, also known as a symbol table, associative array, or map, is an abstract data structure that holds key-value pairs. In a dictionary, each key is associated with (mapped to) a value; together, the key and its value are called a key-value pair.

Many high-level programming languages ship with a dictionary as a built-in data structure, but C, the language Redis is written in, does not, so Redis implements its own.

Dictionaries are used widely inside Redis. For example, a Redis database itself is backed by a dictionary, and adding, deleting, looking up, and updating keys in a database are all built on dictionary operations.

For example, suppose "website" is a hash key containing 10086 key-value pairs, where each key is a database name and each value is that database's main URL:

redis> HLEN website
(integer) 10086

redis> HGETALL website
1) "Redis"
2) "Redis.io"
3) "MariaDB"
4) "MariaDB.org"
5) "MongoDB"
6) "MongoDB.org"

The underlying implementation of the “website” key is a dictionary containing 10086 key-value pairs, such as:

  • Key "Redis", value "Redis.io"
  • Key "MariaDB", value "MariaDB.org"
  • Key "MongoDB", value "MongoDB.org"

Next, I’ll show you how the Redis dictionary is implemented.

The dictionary implementation

Hash table

The hash table structure is defined as follows:

typedef struct dictht {
    // Hash table array
    dictEntry **table;
    // Hash table size
    unsigned long size;
    // Size mask, used to compute index values; always equal to size - 1
    unsigned long sizemask;
    // Number of nodes currently in the hash table
    unsigned long used;
} dictht;

Hash table node

The hash table node structure is defined as follows:

typedef struct dictEntry {
    // Key
    void *key;
    // Value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;

    // Pointer to the next hash table node, forming a linked list
    struct dictEntry *next;
} dictEntry;

The dictionary

The dictionary data structure in Redis is defined as follows:

typedef struct dict {
    // Type-specific functions
    dictType *type;
    // Private data
    void *privdata;
    // Hash tables
    dictht ht[2];
    // Rehash index; -1 when no rehash is in progress
    int rehashidx;
} dict;

The type and privdata attributes exist to support polymorphic dictionaries that hold different kinds of key-value pairs:

  • The type attribute is a pointer to a dictType structure. Each dictType holds a family of functions for operating on a particular kind of key-value pair, and Redis sets different type-specific functions for dictionaries with different purposes.
  • The privdata attribute holds optional arguments to be passed to those type-specific functions.

The following is a dictionary in its normal state (without rehash)

The hash algorithm

To add a new key-value pair to a dictionary, the program first computes a hash value and an index value from the pair's key. Then, based on the index value, the hash table node containing the new key-value pair is placed at that index of the hash table array.

The structure of an empty dictionary is shown below:

Now add a key-value pair k0 and v0 to the dictionary, and the program will first execute the statement:

// Compute the hash of the key using the dictionary's hash function
hash = dict->type->hashFunction(k0);

This computes the hash value of k0. Assuming the hash comes out to 8, the program then continues:

index = hash & dict->ht[0].sizemask;  // = 8 & 3 = 0

The index of k0 is 0, so k0 and v0 are placed at index 0 of the hash table array, as shown in the figure below:
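The two statements above can be sketched as a tiny helper (`slot_index` is a hypothetical name; Redis inlines this arithmetic rather than wrapping it in a function):

```c
#include <stdint.h>

/* Because the table size is always a power of two, sizemask = size - 1,
 * so hash & sizemask gives the same result as hash % size, but cheaper. */
unsigned long slot_index(uint64_t hash, unsigned long sizemask) {
    return (unsigned long)(hash & sizemask);
}
```

With a table of size 4 (sizemask 3), a hash of 8 lands in slot 0, exactly as in the example.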

Key conflict

When two or more keys are assigned to the same index of the hash table array, we say those keys have a collision.

Redis's hash tables use separate chaining to resolve key collisions: each hash table node has a next pointer, and colliding nodes are linked into a singly linked list via that pointer.

The following figure shows the structure after (k1, v1) collides with an existing key and is linked into the hash table array:
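Chained insertion can be sketched in miniature (`entry` and `chain_prepend` are hypothetical names; Redis does the equivalent work on dictEntry inside dictAdd). Note the new node is prepended to the bucket's list, which keeps insertion O(1) no matter how long the chain is:

```c
#include <stdlib.h>

/* Simplified hash table node, mirroring dictEntry's chaining. */
typedef struct entry {
    void *key;
    void *val;
    struct entry *next;  /* next node in the same bucket */
} entry;

/* Insert at the head of a bucket's list; colliding keys are prepended. */
entry *chain_prepend(entry *head, void *key, void *val) {
    entry *e = malloc(sizeof(entry));
    e->key = key;
    e->val = val;
    e->next = head;   /* old head becomes the second node */
    return e;         /* new head of the chain */
}
```

Prepending is also why, in the figure, the most recently inserted colliding pair appears first in the bucket.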

rehash

As operations continue, the number of key-value pairs stored in the hash table gradually grows or shrinks. To keep the hash table's load factor within a reasonable range, the program expands or contracts the hash table when the number of stored key-value pairs becomes too large or too small.

The work of expanding and contracting hash tables can be done by performing a rehash operation. Redis rehashes the dictionary’s hash table as follows:

1. Allocate space for ht[1]. Its size depends on the operation to be performed and on the number of key-value pairs ht[0] currently holds (ht[0].used):

  • For an expansion, ht[1]'s size is the first 2^n greater than or equal to ht[0].used * 2.
  • For a contraction, ht[1]'s size is the first 2^n greater than or equal to ht[0].used.

2. Rehash all key-value pairs from ht[0] onto ht[1]: recompute each key's hash and index values, then place the pair at the computed position in the ht[1] hash table.

3. Once every key-value pair in ht[0] has been migrated to ht[1] (ht[0] becomes empty), free ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1], ready for the next rehash.

Here’s an example:

Suppose the program wants to expand ht[0]. The steps are:

1. ht[0].used is currently 4, and 4 * 2 = 8; 8 is the first power of two greater than or equal to 8, so ht[1]'s size is set to 8.

2. Rehash all four key-value pairs from ht[0] onto ht[1].

3. Free ht[0], make ht[1] the new ht[0], and allocate a blank hash table at ht[1]. The expansion is now complete: the program has successfully grown the hash table's size from 4 to 8.
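The sizing rule in step 1 can be sketched as follows (modeled on Redis's `_dictNextPower`; the starting value of 4 matches Redis's `DICT_HT_INITIAL_SIZE`):

```c
/* Smallest power of two >= size, starting from the initial table size of 4. */
unsigned long next_power(unsigned long size) {
    unsigned long i = 4;  /* Redis's DICT_HT_INITIAL_SIZE */
    while (i < size)
        i *= 2;
    return i;
}
```

For the example above, `next_power(4 * 2)` returns 8, which becomes ht[1]'s size.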

Expansion and contraction of hash tables

The program automatically begins expanding the hash table when either of the following conditions is met:

  • The server is not currently running a BGSAVE or BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 1.
  • The server is currently running a BGSAVE or BGREWRITEAOF command, and the hash table's load factor is greater than or equal to 5.

The load factor required to trigger an expansion differs depending on whether BGSAVE or BGREWRITEAOF is running because both commands fork a child of the server process, and most operating systems optimize child processes with copy-on-write technology. While a child process exists, the server raises the load factor required to trigger an expansion, avoiding hash table expansions during that period as much as possible. This prevents unnecessary memory writes and saves as much memory as possible.

On the other hand, when the load factor of the hash table is less than 0.1, the program automatically starts to shrink the hash table.
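The conditions above can be sketched as a predicate (a hypothetical helper; Redis's actual check lives in `_dictExpandIfNeeded`, where the higher threshold roughly corresponds to `dict_force_resize_ratio`). The load factor here is ht[0].used / ht[0].size:

```c
/* Sketch of the expansion rule: the load-factor threshold rises from
 * 1 to 5 while a BGSAVE/BGREWRITEAOF child process exists.
 * Assumes size > 0. */
int needs_expand(unsigned long used, unsigned long size, int child_running) {
    double load = (double)used / size;             /* the load factor */
    double threshold = child_running ? 5.0 : 1.0;  /* higher bar while forked */
    return load >= threshold;
}
```

So a table with 4 slots and 4 entries expands immediately in the normal case, but must reach 20 entries before expanding while a child process is saving.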

Progressive rehash

Expanding or contracting the hash table requires rehashing all key-value pairs from ht[0] into ht[1]. However, this rehash is not done all at once in a single, centralized action, but incrementally over multiple steps.

The reason is that if ht[0] holds only four key-value pairs, the server can rehash all of them into ht[1] in an instant; but if it holds four million, forty million, or even four hundred million key-value pairs, rehashing them all at once could take the server out of service for some time.

Therefore, to avoid the impact of rehashing on server performance, the server does not rehash all key-value pairs from ht[0] into ht[1] at once. Instead, it migrates them slowly, in multiple steps.

Here are the steps for progressive rehash of a hash table:

1. Allocate space for ht[1], so that the dictionary holds both the ht[0] and ht[1] hash tables.

2. Maintain an index counter variable rehashidx in the dictionary and set it to 0, indicating that rehash work has officially begun.

3. During the rehash, every add, delete, find, or update on the dictionary also rehashes all key-value pairs in ht[0]'s bucket at index rehashidx onto ht[1], then increments rehashidx by one.

4. As dictionary operations continue, at some point every key-value pair in ht[0] will have been rehashed onto ht[1]; the program then sets rehashidx to -1, indicating that the rehash operation is complete.
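Step 3 can be sketched in miniature. All names here (`entry`, `table`, `table_put`, `rehash_step`, `toy_hash`) are hypothetical; Redis's real single-step rehash is `_dictRehashStep`/`dictRehash`, which additionally skips empty buckets and limits how many buckets one call visits:

```c
#include <stdlib.h>
#include <stdint.h>

/* Simplified node and table, mirroring dictEntry and dictht. */
typedef struct entry { void *key; void *val; struct entry *next; } entry;
typedef struct table {
    entry **slots;
    unsigned long size;
    unsigned long sizemask;   /* always size - 1 */
    unsigned long used;
} table;

/* Toy hash for the sketch: the key pointer's bits are used directly.
 * Real Redis hashes key contents via dict->type->hashFunction. */
static unsigned long toy_hash(const void *key) {
    return (unsigned long)(uintptr_t)key;
}

static table make_table(unsigned long size) {
    table t;
    t.slots = calloc(size, sizeof(entry *));
    t.size = size;
    t.sizemask = size - 1;
    t.used = 0;
    return t;
}

static void table_put(table *t, void *key) {
    unsigned long idx = toy_hash(key) & t->sizemask;
    entry *e = malloc(sizeof(entry));
    e->key = key;
    e->val = NULL;
    e->next = t->slots[idx];   /* prepend into the bucket's chain */
    t->slots[idx] = e;
    t->used++;
}

/* One progressive-rehash step: move every entry in ht[0]'s bucket at
 * rehashidx over to ht[1], then advance rehashidx.
 * Returns -1 once ht[0] is empty, mirroring how Redis marks completion. */
long rehash_step(table ht[2], long rehashidx) {
    entry *e = ht[0].slots[rehashidx];
    while (e) {
        entry *next = e->next;
        /* recompute the index against ht[1]'s sizemask */
        unsigned long idx = toy_hash(e->key) & ht[1].sizemask;
        e->next = ht[1].slots[idx];
        ht[1].slots[idx] = e;
        ht[0].used--;
        ht[1].used++;
        e = next;
    }
    ht[0].slots[rehashidx] = NULL;
    return (ht[0].used == 0) ? -1 : rehashidx + 1;
}
```

Each call does a bounded amount of work (one bucket), which is what spreads the rehash cost across many dictionary operations.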

The benefit of progressive rehashing is that it takes a divide-and-conquer approach, spreading the computation needed to rehash the key-value pairs across every add, delete, find, and update operation on the dictionary, thereby avoiding the massive computation that a centralized rehash would require.

Because the dictionary uses both the ht[0] and ht[1] hash tables during a progressive rehash, operations such as delete, find, and update are performed on both tables during that period. For example, to look up a key, the program searches ht[0] first, and if the key is not found there, continues searching in ht[1].
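A lookup during a rehash can be sketched like this (all names are hypothetical; Redis's `dictFind` walks both tables in the same order):

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified node and table, mirroring dictEntry and dictht. */
typedef struct entry { void *key; void *val; struct entry *next; } entry;
typedef struct table { entry **slots; unsigned long sizemask; } table;

/* Toy hash for the sketch: the key pointer's bits are used directly. */
static unsigned long toy_hash(const void *key) {
    return (unsigned long)(uintptr_t)key;
}

/* Check ht[0]'s bucket first, then fall back to ht[1]. */
void *find_during_rehash(table ht[2], void *key) {
    for (int t = 0; t < 2; t++) {
        entry *e = ht[t].slots[toy_hash(key) & ht[t].sizemask];
        while (e) {
            if (e->key == key)
                return e->val;
            e = e->next;
        }
    }
    return NULL;
}
```

Once the rehash finishes and ht[1] becomes the only table, the fallback probe simply never finds anything extra.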

In addition, during a progressive rehash, new key-value pairs are always added to ht[1]; ht[0] receives no additions. This ensures that the number of key-value pairs in ht[0] only ever decreases, eventually reaching zero as the rehash proceeds.

summary

Dictionaries are widely used to implement Redis features, including databases and hash keys. Each dictionary has two hash tables: one for everyday use, and the other used only during rehashing.

If you’re interested in Redis, check out the rest of this column.