
Dictionaries in Redis are widely used to implement various Redis features, including databases and hash keys.

The underlying implementation of a dictionary is a hash table. Each dictionary holds two hash tables: one for everyday use and one that serves only as scratch space during a rehash.

Dictionary structure definition

typedef struct dict {
	
	// Type specific functions
	dictType *type;
	
	// Private data
	void *privdata;
	
	// Hash table, two elements
	dictht ht[2];
	
	// Index subscript of the record when rehash is performed. If there is no rehash, the value is -1
	int rehashidx;

} dict;

While a rehash is in progress, rehashidx is incremented by 1 each time all the entries at one index have been migrated.

The hash table structure, dictht, is defined as:

typedef struct dictht {
	
	// Hash table array
	dictEntry **table;
	
	// Hash table size
	unsigned long size;
	
	// Hash table size mask, used to calculate index values
	unsigned long sizemask;
	
	// The number of existing nodes in the hash table
	unsigned long used;

} dictht;

table is an array whose elements are pointers to dictEntry structures; each dictEntry holds one key-value pair.

Here we can also see that hash table nodes are linked into lists to resolve hash collisions, i.e., the chained-address method (separate chaining).

Hashing conflicts and hashing algorithms

To achieve fast key-to-value access, Redis uses a hash table to hold all key-value pairs. The key is the key set by the Redis user; the value stored is not the value itself but a pointer to it. The biggest advantage of a hash table is that key-value pairs can be found in O(1) time. But with any hash table there is always the problem of hash collisions.

A hash collision occurs when two different keys, after their hash values are mapped to bucket positions, end up in the same hash bucket.

Redis resolves hash collisions with chained hashing, also known as the zipper method. When multiple keys map to the same hash bucket, the entries for that bucket are stored in a linked list, connected in turn by pointers.

The hash algorithm

To add a new key-value pair to a dictionary, the program first computes the hash value from the key, then derives the index value from it, and finally places the hash table node containing the new key-value pair at the specified index of the hash table array.

ReHash process

The hash table maintains a load factor that reflects how many key-value pairs it holds relative to its size; when the load factor crosses certain thresholds, a rehash operation is required. The load factor is calculated as follows:

// Load factor = number of nodes stored in the hash table / size of the hash table
load_factor = ht[0].used / ht[0].size

The conditions for expanding the hash table are as follows:

  • The server is not currently running BGSAVE or BGREWRITEAOF, and the hash table's load factor is greater than or equal to 1; or
  • The server is currently running BGSAVE or BGREWRITEAOF, and the hash table's load factor is greater than or equal to 5.

If either condition is met, an expansion rehash is performed. A shrink rehash is performed when the load factor falls below 0.1.

The threshold is higher during BGSAVE or BGREWRITEAOF because Redis has forked a child process for those commands, and most operating systems implement fork with copy-on-write; postponing expansion while a child process exists avoids unnecessary memory writes and saves memory.

The rehash process can be divided into three steps:

  1. Allocate more space to hash table 2, for example, double the current hash table 1;
  2. Remap and copy the data from hash table 1 into hash table 2;
  3. Free hash table 1 and make hash table 2 the new hash table 1, leaving an empty table in its place ready for the next rehash;

The size of the newly allocated space is determined by whether the rehash is an expansion or a shrink and by the number of key-value pairs currently in the hash table.

  • When an expansion is performed, the space allocated is the first 2^n greater than or equal to ht[0].used * 2 (twice the current number of key-value pairs);

    Assuming the current number of key-value pairs is 4, then 4 * 2 = 8; since 8 is exactly 2^3, it is the first power of two greater than or equal to 8, so the expanded size is 8;

  • When a shrink is performed, the space allocated is the first 2^n greater than or equal to ht[0].used (the current number of key-value pairs);

Progressive reHash

When a hash table holds a large number of entries, copying all of the data in one go would very likely stall the server. So Redis performs the rehash in many small steps; this is progressive rehash.

To put it simply, during step two Redis continues to serve client requests normally: starting from the first index of hash table 1, it copies all the entries at that index into hash table 2; on a subsequent request it copies the entries at the next index, and so on.

This cleverly spreads the cost of making a large number of copies at once over multiple requests, avoiding time-consuming operations and ensuring fast data access.

Hash table operation during rehash

During a progressive rehash, dictionary deletions, lookups, updates, and so on are performed against both hash tables. For example, to look up a key, the dictionary first searches the original table and, if the key is not found there, searches the new table.

New key-value pairs, however, are added only to the new table, which guarantees that the number of entries in the old table only ever decreases.