preface
Redis is a key-value pair database whose keys are stored by hashing. The entire Redis can be thought of as an outer hash, which is called an outer hash because Redis also provides a hash type inside, which can be called an inner hash. When we use hash objects for data storage, there are two levels of hash storage for Redis as a whole.
The hash object
The hash object itself is also a key-value storage structure, and the underlying storage structure can be divided into two types: ziplist (compressed list) and hashtable (hashtable). The two storage structures are also distinguished by encoding:
Encoding attribute | describe | Object encoding Returned value of the command |
---|---|---|
OBJ_ENCODING_ZIPLIST | Implement hash objects using compressed lists | ziplist |
OBJ_ENCODING_HT | Implement hash objects using dictionaries | hashtable |
hashtable
The key-value in Redis is wrapped by the dictEntry object, and the hash table is wrapped by the dictEntry object again. This is the hash table object Dictht:
typedef struct dictht {
dictEntry **table;// Hash table array
unsigned long size;// Hash table size
unsigned long sizemask;// Mask size, used to calculate the index value, always equal to size-1
unsigned long used;// Hash table has nodes
} dictht;
Copy the code
Note: The table in the structure definition above is an array, each element of which is a dictEntry object.
The dictionary
Dictionaries, also known as symbol table, associative array or map, are nested with hash table DICtht objects. The following is a definition of dictionary HT:
typedef struct dict {
dictType *type;// Some specific functions of dictionary type
void *privdata;// Private data, which may be needed by specific functions in type
dictht ht[2];// Hash table (note that there are two hash tables)
long rehashidx; // Rehash index. When not rehash, the value is -1
unsigned long iterators; // The number of iterators in use
} dict;
Copy the code
DictType defines some common functions internally, and its data structure is defined as follows:
typedef struct dictType {
uint64_t (*hashFunction)(const void *key);// Compute the hash function
void *(*keyDup)(void *privdata, const void *key);// Copy the key function
void *(*valDup)(void *privdata, const void *obj);// Copy the value function
int (*keyCompare)(void *privdata, const void *key1, const void *key2);// Compare key functions
void (*keyDestructor)(void *privdata, void *key);// Destroy the key function
void (*valDestructor)(void *privdata, void *obj);// Destroy the value function
} dictType;
Copy the code
When we create a hash object, we get the following schematic (some properties are omitted) :
Rehash operations
Dict defines an array HT [2], and HT [2] defines two hash tables: HT [0] and HT [1]. By default, Redis only uses HT [0], not HT [1], and does not allocate space for HT [1] initialization.
When setting a hash object, which subscript will fall in the hash array (dictEntry[3] in the figure above) is determined by calculating the hash value. If a hash collision occurs, then there will be multiple dictentries of the same index, forming a linked list (the right-hand side of the figure points to NULL), but it is important to note that the last element inserted is always at the top of the list (in case of hash collisions, the last element inserted is always at the top of the list). Always put nodes to the head of the list).
When reading data, a node with multiple elements needs to traverse the linked list, so the longer the linked list, the worse the performance. To ensure the performance of the hash table, you need to rehash the hash table when either of the following conditions is met:
- The load factor is greater than or equal to
1
且dict_can_resize
为1
At the right time. - The load factor is greater than or equal to the security threshold (
dict_force_resize_ratio=5
).
PS: Load factor = Number of nodes in the hash table/size of the hash table (h[0]. Used /h[0].size).
Rehash steps
Both extended and contracted hashing are done by performing rehash, which involves allocating and freeing space through the following five steps:
-
Allocate space for dict DICT HT [1] hash tables, depending on the number of nodes currently stored in the table (i.e. Ht [0]. Used) :
- If the operation is extended
ht[1]
The size of2
nThe first one is greater than or equal to
ht[0].used * 2The value of the property (e.g
used=3At this time,
ht[0].used * 2=6, therefore,
2the
3The power to
8So the first one is greater than
used * 2(2 to the 2nd power < 6 and 2 to the 3rd power > 6).
- If it is a shrink operation
ht[1]
The first one is greater than or equal to magnitude 2 to the nht[0].used
The value of the.
- If the operation is extended
-
Setting the value of the attribute rehashix in the dictionary to 0 indicates that a rehash operation is being performed.
-
All key/value pairs in HT [0] are rehashed in sequence and placed in the corresponding position of ht[1] array. The value of Rehashix increases by 1 after each rehash of key/value pairs is completed.
-
After all key-value pairs in HT [0] have been migrated to HT [1], release HT [0] and change HT [1] to HT [0], then create a new HT [1] array in preparation for the next rehash.
-
If rehashix in the dictionary is set to -1, the rehash operation is complete and the next rehash operation is waiting.
Progressive rehash
This rehash operation in Redis is also called progressive rehash because it does not rehash all of ht[0] at once, but rehashes key/value pairs from HT [0] into HT [1] over several times. Progressive Rehash is a divide-and-conquer idea that avoids the massive amount of computation that centralized rehash entails.
In a progressive rehash process, since new key-value pairs may also be stored, ** Redis puts the newly added key-value pairs into HT [1], ensuring that the number of HT [0] key-value pairs is only reduced **.
When the rehash operation is being performed, if the server receives a command request from the client, it queries HT [0] first and then ht[1].
ziplist
Some of ziplist’s features have been analyzed separately in previous articles. For more details, click here. But the ziplist in a hash object differs from the Ziplist in a list object in that the hash object is a key-value, so the ziplist is also a key-value. Key and value are next to each other:
Ziplist and Hashtable encoding conversion
When a hash object can satisfy either of the following two conditions, the hash object will choose to use ziplist encoding for storage:
- The total length of all key-value pairs (both key and value) in a hash object is less than or equal to
64
Byte (This threshold can be passed by a parameterhash-max-ziplist-value
To control). - The number of key-value pairs in a hash object is less than or equal to
512
This threshold can be specified by the parameterhash-max-ziplist-entries
To control).
If either of these conditions is not met, the hash object chooses to be stored using hashTable encoding.
Common command used to hash objects
- Hset Key field value: Sets a single key field
field
Hash objectkey
Value). - Hmset Key field1 value1 field2 value2: Set multiple fields
field
Hash objectkey
Value). - Hsetnx Key field value: Hash table
key
In the domainfield
Is set tovalue
If thefield
If yes, no operation is performed. - Hget Key field: Gets the hash table
key
In the domainfield
The correspondingvalue
. - Hmget key field1 field2: Get the hash table
key
Multiple fields infield
The correspondingvalue
. - Hdel key field1 field2: Delete the hash table
key
One or more of themfield
. - Hlen key: Returns the number of fields in the hash table key.
- Hincrby Key Field Increment: Indicates a hash table
key
In the domainfield
Delta plus deltaincrement
,increment
It could be negative iffield
An error will be reported if it is not a number. - Hincrbyfloat Key field Increment: indicates a hash table
key
In the domainfield
Delta plus deltaincrement
.increment
It could be negative iffield
notfloat
Type will report an error. - Hkeys key: Obtains the hash table
key
All fields in. - Hvals key: Retrieves the values of all fields in the hash table.
To flushes the Redis database, run the flushall command to prevent interference from other key values.
Then run the following commands in sequence:
hset address country china
type address
object encoding address
Copy the code
The following results are obtained:
You can see that when we have only one key-value pair in our hash object, the underlying encoding is ziplist.
Now let’s change the hash-max-ziplist-entries parameter to 2, restart Redis and type the following command to test:
hmset key field1 value1 field2 value2 field3 value3
object encoding key
Copy the code
The output results in the following:
As you can see, the encoding has changed to hashTable.
conclusion
This paper mainly introduces the use of hashtable, the storage structure underlying the hash type among the five common data types in Redis, and how Redis re-hashes when the hash distribution is not uniform. Finally, some common commands for hash objects are introduced. Some examples are given to verify the conclusion of this paper.