The underlying storage of Hash in Redis, and some suggestions for optimizing a big Hash

Hash is the most frequently used composite data structure in Redis. Besides the hash type itself being backed by a dict, all the keys and values in a Redis database form one global Hash, and the set of keys with expiration times is also a Hash. A Set is equivalent to a Hash whose values are all null, and the mapping between value and score inside a Zset is likewise implemented with the Hash structure.

Due to insufficient consideration of the business requirements, one hash structure in our production environment ended up storing 400,000 (40W) entries. Redis's memory usage kept climbing, and queries against that key became slower and slower, defeating the original purpose of using a cache to speed up queries. This article analyzes the implementation of hash and its expansion mechanism during storage.

Hash principle

The internal implementation of Hash is roughly the same as Java's HashMap: a two-dimensional structure of array + linked list. When two keys hash to the same array slot, the colliding elements are chained together in a linked list; if a chain grows too long, query time complexity degrades to O(n).

In Java 8, a chain longer than 8 nodes is automatically converted into a red-black tree to improve query efficiency. Redis instead uses two structures, ziplist and hashtable, to store a hash.

The underlying structure

zipList

Ziplist is a compressed-list data structure designed to save memory. It is a sequential structure made up of a series of specially encoded, contiguous memory blocks. A ziplist can contain any number of entries, and each entry can hold a byte array or an integer value. Instead of storing pointers to the previous and next nodes, each entry stores the length of the previous node and of the current node, sacrificing some read/write performance for efficient memory utilization: it trades time for space. Ziplist is suitable for scenarios with few fields and small field values.

Ziplist consists of:

```
<zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
```
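Per the comments in the Redis source, these fields mean roughly the following:

```
zlbytes  uint32_t  total number of bytes the ziplist occupies
zltail   uint32_t  offset of the last entry, so the tail can be reached without a scan
zllen    uint16_t  number of entries (once it overflows, counting requires a full traversal)
entry    variable  each entry stores the previous entry's length, its own encoding, and the payload
zlend    uint8_t   fixed terminator byte 0xFF
```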

hashtable

Hashtable is built from dictEntry nodes, which are wrapped up into a hash-table object, dictht:

```c
typedef struct dictht {
    dictEntry **table;       /* hash table array; each slot chains dictEntry nodes */
    unsigned long size;      /* size of the table array */
    unsigned long sizemask;  /* mask used to compute an index; always size - 1 */
    unsigned long used;      /* number of nodes currently in the hash table */
} dictht;
```
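For reference, the dictEntry node that the table array chains together looks roughly like this in the Redis source (slightly simplified here):

```c
typedef struct dictEntry {
    void *key;                  /* the field name */
    union {                     /* the value; numbers are stored inline */
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* chains entries that collide in the same bucket */
} dictEntry;
```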

Inside the dictionary is a nested hash table dictht object. Here is the definition of the dictionary:

```c
typedef struct dict {
    dictType *type;          /* type-specific functions for this dictionary */
    void *privdata;          /* private data that the functions in type may need */
    dictht ht[2];            /* hash tables (note that there are two of them) */
    long rehashidx;          /* rehash index; -1 when no rehash is in progress */
    unsigned long iterators; /* number of iterators currently in use */
} dict;
```

So when a hash object is created, the overall structure is: a hash-typed redisObject points to a dict, the dict holds two dictht hash tables (ht[0] and ht[1], the second used only during rehash), and each bucket of a dictht chains dictEntry nodes.

Ziplist and Hashtable conversion mechanism

When a hash object satisfies both of the following conditions, it will be stored as a ziplist:

  1. Every key and value stored in the hash object is less than 64 bytes long (this threshold can be adjusted with the hash-max-ziplist-value parameter).
  2. The number of key-value pairs in a hash object is less than 512 (this threshold can be controlled with the hash-max-ziplist-entries parameter).

If either condition is violated, the hash object is converted to hashtable storage (the conversion is one-way: it never switches back to ziplist).
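The two thresholds can be tuned in redis.conf, and the conversion can be observed with the OBJECT ENCODING command. A rough illustration (the key and values are arbitrary, the threshold values mirror the ones above though your defaults may differ, and note that Redis 7.0 replaced ziplist with the listpack encoding):

```
# redis.conf -- the two thresholds discussed above
hash-max-ziplist-entries 512
hash-max-ziplist-value   64
```

```
redis> HSET h field short-value
(integer) 1
redis> OBJECT ENCODING h
"ziplist"
redis> HSET h field2 <a value longer than 64 bytes>
(integer) 1
redis> OBJECT ENCODING h
"hashtable"
```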

Expansion process

Expanding a big dictionary is very time-consuming, because a new array has to be allocated. Under normal circumstances, expansion begins when the number of elements in the hash table equals the length of the first-dimension array; the new array is twice the size of the old one, and every element in the old table's bucket chains must be re-attached to the new array, an O(n)-level operation. To avoid that lump of work, Redis uses progressive rehash: it moves the key-value pairs from the old array to the new one slowly, spread over many operations, avoiding the heavy computation of a one-shot rehash. Because new key-value pairs may be written while a progressive rehash is in progress, Redis puts all newly added pairs into ht[1], which guarantees that the number of pairs in ht[0] only ever shrinks. While rehashing, a lookup queries ht[0] first and, if nothing is found, then queries ht[1].
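Below is a simplified sketch of a single rehash step, modeled loosely on dictRehash() in the Redis source; the real function also bounds how many empty buckets it will visit per call, and dictHashKey is the hashing macro from Redis's dict.h:

```c
#include <stdlib.h>
#include <string.h>

/* Move one bucket's chain from ht[0] to ht[1]; Redis performs this kind of
 * step a little at a time, piggybacked on normal dictionary operations. */
static void dictRehashStep(dict *d) {
    if (d->rehashidx == -1) return;                /* no rehash in progress */

    while (d->ht[0].table[d->rehashidx] == NULL)   /* skip empty buckets */
        d->rehashidx++;

    dictEntry *de = d->ht[0].table[d->rehashidx];
    while (de) {                                   /* re-chain every entry */
        dictEntry *next = de->next;
        unsigned long idx = dictHashKey(d, de->key) & d->ht[1].sizemask;
        de->next = d->ht[1].table[idx];            /* head-insert into ht[1] */
        d->ht[1].table[idx] = de;
        d->ht[0].used--;
        d->ht[1].used++;
        de = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;

    if (d->ht[0].used == 0) {                      /* done: ht[1] becomes ht[0] */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        memset(&d->ht[1], 0, sizeof(d->ht[1]));
        d->rehashidx = -1;
    }
}
```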

Problem analysis

1. Storage problems

By the time a hash holds about 400,000 (40W) fields, the underlying storage is bound to have been converted to hashtable. Compared with hashtable, the ziplist structure uses far fewer pointers, greatly reducing memory usage, and memory is precious to Redis. When data is stored as a ziplist, memory allocation is also contiguous, which makes access cache-friendly and queries faster at small sizes.

2. Capacity expansion

Each expansion must allocate a new array twice the size of the current one, so the bigger the old array, the more memory the new one doubles onto. Because Redis is single-threaded, it must keep serving other operations while the old data is being migrated, so a migration can run for a long time; if the table then hits the expansion condition again, the performance of the whole Redis server suffers.

3. Query problems

As the number of keys grows, so does the probability of hash collisions. At the bottom layer Redis only chains colliding entries into linked lists, with no more efficient structure such as a search tree behind them, so lookups can degrade from O(1) to O(n), hurting query efficiency and the user experience.

Optimization Suggestions

  1. Re-examine what the old data is actually needed for and whether changing the data structure would do. If the requirement is simply to judge whether a piece of data exists, a Bloom filter can be used: it is essentially a bit array with a very small memory footprint. False positives are possible, but it will not cause a massive cache-penetration problem, and the small number of misjudged requests can be handled at the database level without affecting normal traffic.
  2. Key names usually take the form xxx:xxx:xxx, where the colon acts like a tree structure. We can split data of different types into different hashes, or partition by time interval, so that queries can be routed by the type field or the time field. This also avoids per-key existence checks, and the number of fields in each hash can be kept at around 10,000 (1W); see the bucketing sketch after this list.
  3. A proper deletion mechanism is required, because an expiration time can only be set on the hash as a whole, not on each individual field. Expiry therefore has to be determined in application code, and rarely used fields should be deleted promptly, leaving only hot data.
  4. Use plain strings (STR) instead of a hash. The benefit is that each string can be given its own expiration time, refreshed on every access so that live data never times out while cold data expires automatically; see the redis-cli sketch after this list. This approach has its own problems, though: Redis cleans up expired data with a combination of the periodic-random strategy and the lazy strategy, which prevents a mass expiry from tying up the main thread, but it also means a lot of expired data may never actually be purged, so Redis's memory footprint can keep growing.
  5. Optimize the size of key and value content. For example, user can be shortened to u and order to o; keep data naming simple and clear.
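A minimal sketch of the bucketing idea in point 2, in C. The key pattern order:<bucket>, the bucket count, and the djb2 hash are illustrative choices, not anything Redis itself mandates:

```c
#include <stdio.h>

/* Route fields of one oversized hash into one of BUCKETS smaller hashes,
 * so that each hash stays near the ~1W-field target mentioned above. */
#define BUCKETS 64

/* Simple djb2 string hash, used here only to pick a bucket. */
static unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    while (*s) h = ((h << 5) + h) + (unsigned char)*s++;
    return h;
}

/* Write the bucketed Redis key for `field` into `out`, e.g. "order:17". */
static void bucket_key(const char *field, char *out, size_t outlen) {
    snprintf(out, outlen, "order:%lu", djb2(field) % BUCKETS);
}

int main(void) {
    char key[32];
    bucket_key("user:40231:order", key, sizeof(key));
    /* Instead of HSET big_hash user:40231:order <value>, the write becomes:
     *   HSET order:NN user:40231:order <value>   (NN printed below) */
    printf("%s\n", key);
    return 0;
}
```

Every reader and writer must compute the bucket the same way, so keep the hash function and bucket count fixed once data has been stored.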
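And a rough redis-cli illustration of point 4 (the key name, value, and TTL are arbitrary): each string carries its own TTL, and the EXPIRE after each read refreshes it, so hot data stays alive while cold data expires on its own:

```
redis> SET user:40231:profile "<serialized value>" EX 3600
OK
redis> GET user:40231:profile
"<serialized value>"
redis> EXPIRE user:40231:profile 3600
(integer) 1
```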