When asked if you know that the hash data type of Redis is based on a hash table, do you know the difference between the hash table of Redis and the HASHmap of JDK? At that time, my friend was confused, and later came back to me to exchange a wave, decided to take a look at the redis hash table structure, not to say more, first to see the source code of redis hash table.
Redis hash table structure
typedef struct dictEntry {
void *key; / / key
union {
void *val;
uint64_t u64;
int64_t s64;
} v; / / value
struct dictEntry *next; // Point to the next hash table node to form a linked list
} dictEntry;
typedef struct dictht {
dictEntry **table; // Array pointer, each element is a pointer to dictEntry
unsigned long size; // This dictht allocates 2 to the n elements
unsigned long sizemask; Sizemask = size-1 = 2^n-1
unsigned long used; // The number of elements used
} dictht;
typedef struct dict {
dictType *type; // A pointer to a method
void *privdata; // Private data that can be passed to the dict
dictht ht[2]; // An array of two dicthts
int rehashidx; // Whether to rehash the flag bit
int iterators; / / the iterator
} dict;
Copy the code
It is composed of arrays and linked lists. The actual object that stores the data is a dictEntry object corresponding to a Hashmap, which is an Entry object. The biggest difference is that there are two dictEntry array objects in redis hash table. Therefore, why redis hash table uses two Dictht arrays to store dictEntry list?
Let’s start by reviewing the hashmap expansion process. In simple terms, you need to copy a new array, rehash the elements of the old array, and then insert the elements back into the new array. Since the resize operation is completed in a one-time and centralized manner, when the array size of the HashMap is 16, the whole resize process is controllable. However, after multiple expansion, the array elements become more and the operation elements also increase during resize, and the time required for resize becomes uncontrollable. However, redis, as a high-performance cache, needs to avoid such a scenario of uncontrollable operation time from the structural design, and adopts a pleasing strategy of exchanging space for time. Therefore, dict contains an array of two Dicthts. So what does Redis do?
Progressive rehash
The trigger condition
When any of the following conditions are met, the program automatically starts to expand the hash table:
- The server is not running BGSAVE or BGREWRITEAOF, and the hash table load factor is greater than or equal to 1.
- The server is running the BGSAVE or BGREWRITEAOF command and the hash table load factor is greater than or equal to 5. (Tips: In order to avoid the existence of child process to expand the hash table operation, avoid unnecessary memory write, save the maximum memory)
Steps:
- Allocate space for HT [1] so that the dictionary holds both ht[0] and HT [1] hash tables
- Maintain an index counter variable, rehashidx, in the dictionary and set its value to 0 to indicate that rehash work has officially begun
- Every time a dictionary is added, deleted, found, or updated during rehash, the program rehashes to HT [1] all key/value pairs on the REhashidx index of the HT [0] hash table, in addition to the specified operation. When the rehash is complete, The program increments the value of the rehashidx property by 1
- As the dictionary operation continues, eventually at some point in time all key-value pairs of HT [0] are rehashed to HT [1], at which point the program sets the value of the rehashidx property to -1 to indicate that the rehash operation is complete
Due to the addition of a DICtht maintenance, when the load factor of the hash table is less than 0.1, it will also shrink the hash table, the operation process is the same as progressive rehash.
To summarize the differences with the JDK’s HashMap
- In terms of data structure, two arrays are used to store data. When hash conflicts occur, only the chain address method is used to resolve hash conflicts. Unlike JDK1.8, when the linked list exceeds 8, it is optimized into a red-black tree.
- Unlike the JDK’s hashMap, incremental rehash is used when expansion occurs. Each operation will only operate on the current element, remove it from the current array, or store it in a new array, until the elements of the old array are completely empty.
- When the load factor is less than 0.1, the system automatically shrinks the capacity. The JDK’s HashMap does not provide scaling for performance reasons.
- Whereas Redis uses MurmurHash to compute the hash value of a hash table key, JDK hashMap uses the high and low 16 bits of key.hashcode() to compute the hash value of the key.