This is the second day of my participation in Gwen Challenge

Summary:

There are many places in Redis to use the dictionary structure, such as our redis hash structure is to use the dictionary, and the bottom of the dictionary structure is to use the hash table to achieve (😄 a little around ~), XDM can understand the following knowledge points through this article.

  1. Hash table, hash table node, and dictionary structure and implementation.

  2. How the hash algorithm Redis resolves hash conflicts.

  3. Rehash principle and implementation of Redis

Hash table, hash table node, and dictionary structure and implementation

Hash table

The structure of a hash table looks like this:

Typedef struct dictht {// dictEntry **table; // Hash table size unsigned long size; // Always equal to size-1 unsigned long sizemask; // The number of existing nodes in the hash table unsigned long used; } dictht;Copy the code

Table is an array of Pointers, each of which points to a hash table node called dictEntry.

Size is the size of the current hash table, which is the number of elements in our table array

Used records the number of nodes (key-value pairs) that the hash table currently has

The sizemask attribute always equals size-1. This attribute, along with the hash value, determines which index of the array a key should be placed on. This should be used for rehash if MEMORY serves.

Hash table node

Typedef struct dictEntry {// key void *key; // value union {void *val; uint64_t u64; int64_t s64; } v; Struct dictEntry *next; struct dictEntry *next; } dictEntry;Copy the code

Key is the key that we hash, which refers to an SDS object or an integer.

Next is a pointer to a dictEntry object, which is used to resolve hash conflicts. When hash conflicts occur, all the keys in conflict form a linked list.

The dictionary

Typedef struct dict {// type specific function dictType *type; // Private data void * privData; // Dictht ht[2]; // Rehash index // If rehash is not in progress, the value is -1 int rehashidx; /* rehashing not in progress if rehashidx == -1 */ } dict;Copy the code

Type is a pointer to the dictType structure. Each dictType contains methods to manipulate different types of strings, and privData holds the parameters required by these methods

DictType structure is as follows:

Typedef struct dictType {// calculate the hashFunction unsigned int (*hashFunction)(const void *key); Void *(*keyDup)(void * privData, const void *key); Void *(*valDup)(void * privData, const void *obj); Int (*keyCompare)(void * privData, const void *key1, const void *key2); Void (*keyDestructor)(void * privData, void *key); Void (*valDestructor)(void * privData, void *obj); } dictType;Copy the code

We can see that there are also two fields ht[2] and rehashidx. Normally our data is in HT [0]. Ht [1] is only used for rehash, and rehashidx is our rehash progress. When rehashidx is -1, it indicates that the current hash table has not been rehash. We can see this judgment in redis source code.

Here is a normal dictionary structure without rehash:

How does redis solve hash conflicts

A hash collision occurs when two or more keys are hashed to the same value. As mentioned earlier in the basic structure, next is a pointer to a dictEntry object. When a hash collision occurs, all the conflicting keys form a linked list. When the query is performed, the linked list will be followed one by one to find the value equal to the required key, and then return.

Rehash principle and implementation of Redis

The load factor

First of all, we will introduce the concept of load factor, which will cause our hash table to grow as we add and delete the operation. Ht [0]. Used/HT [0]. Size imbalance

So we can think about this ratio when is this hash table optimal?

It should be 1, because if it’s 1 there’s no wasted space, there’s not that many hash collisions.

Redis will use this ratio to determine whether our hash table should expand or shrink. Let’s take a look at the rehash process.

  1. The first step is to use our HT [1] and allocate space for it. How much space is allocated depends on whether we want to expand or shrink, and how many elements are currently in HT [0]. Ht [1] is the first operation whose size is greater than or equal to HT [0]. Used * 2^n Is the first operation whose size is greater than or equal to ht[0]. Used * 2^n For example, if our used is 6, the space we need to apply is (6*2 = 12) <2^4 is 16. It should all be understandable.
  2. Then we put the key on our HT [0] on our HT [1] using the new recalculated hash and index value.
  3. Ht [0] will be null when all key pairs on ht[0] are migrated to HT [1], and ht[0] will point to HT [1] for null.

When is rehash going to happen?

  1. The server is not currently running the BGSAVE or BGREWRITEAOF command, and the load factor of the hash table is greater than or equal to 1

    .

  2. The server is running the BGSAVE or BGREWRITEAOF command, and the load factor of the hash table is greater than or equal to 5.

  3. When the load factor of the hash table is less than 0.1, the program automatically starts to shrink the hash table

Why the load factor values are different during the execution of BGSAVE or BGREWRITEAOF, I guess it is because of performance considerations, XDM can check the relevant information and comment.

If we have a large hash table, the rehash process will take a long time. Our redis is single-threaded, and the logic that consumes the main thread time is fatal. Redis has a really elegant design here, and XDM must have figured out that we still have a rehashidx field missing, which is important for the progressive Rehash we’ll be writing later.

conclusion

XDM this first to this, the main sort of Redis dictionary hash table implementation, and the process of rehash. I’ll take a look at Redis’s elegant design progressive Rehash and its source code implementation.

Stay tuned ~