1. Data structures used by Redis
We all know that the data types we commonly use in Redis are String, List, Hash, Set, and Sorted Set. But these are only the forms key-value pairs take when we use them. The underlying data structures they are actually built on are simple dynamic strings, doubly linked lists, compressed lists, hash tables, skip lists, and integer arrays.
As you can see, String has only one underlying implementation: the simple dynamic string. List, Hash, Set, and Sorted Set each have two underlying implementation structures. We usually refer to these four as collection types, whose defining characteristic is that one key corresponds to a collection of data.
2. The data structures of Redis keys and values
2.1 The data structure of Redis key-value pairs
Since Redis itself must fetch data quickly, the time complexity of a lookup has to be O(1). To achieve fast key-to-value access, Redis uses a hash table to hold all key-value pairs. The data structure works as follows.
Redis maintains a global hash table that stores the mapping between every key and its value; each value slot stores the address of the corresponding object.
2.2 Hash conflicts
Since a hash table is used, hash conflicts are bound to occur. Redis resolves them the same way hash tables in Java do: with chained hashing. Chained hashing is easy to understand: multiple elements that fall into the same hash bucket are kept in a linked list, connected one after another by pointers.
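The idea of chained hashing can be sketched in a few lines. This is an illustrative toy in Python, not Redis's real dict (which is written in C); the class name and bucket layout are invented for the example:

```python
# Minimal sketch of chained hashing: colliding keys share a bucket,
# and each bucket is a linked chain (here, a Python list of pairs).

class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]  # one chain per bucket

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # collision: append to the chain

    def get(self, key):
        # Worst case we walk the whole chain: this is the O(N) drain
        # the article describes when chains grow long.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

t = ChainedHashTable(size=2)              # tiny table to force collisions
for k in ("a", "b", "c"):
    t.put(k, k.upper())
print(t.get("c"))  # -> C
```

With only two buckets, at least two of the three keys must share a chain, yet all lookups still succeed by walking the chain.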
So when there are a great many keys, walking the chains becomes a real time drain. Especially when we have tens of millions of keys, that cost adds up.
2.3 Rehash and blocking
When hash conflicts leave the linked lists too long, Redis performs a rehash operation on the hash table.
Rehashing increases the number of hash buckets so that the growing number of entries can be spread across more buckets, reducing the number of elements in any single bucket and thus reducing conflicts within it.
The specific procedure:
By default Redis keeps two global hash tables: hash table 1 and hash table 2. At first, when you insert data, only hash table 1 is used, and hash table 2 has no space allocated. As the data grows, Redis starts a rehash, a three-step process:
- 1. Allocate more space to hash table 2, for example, twice the current size of hash table 1;
- 2. Remap the data in hash table 1 into hash table 2;
- 3. Free the space of hash table 1.
We can then switch from hash table 1 to hash table 2, use the enlarged hash table 2 to store more data, and keep the emptied hash table 1 for the next rehash expansion.
This process may seem simple, but the second step involves a huge amount of data copying. If all the data in hash table 1 were migrated at once, the Redis thread would block and be unable to serve other requests, and Redis could no longer access data quickly.
To avoid this problem, Redis uses progressive Rehash.
2.4 Progressive Rehash
To put it simply: during the second step, Redis keeps serving client requests normally. Starting from the first index position in hash table 1, it copies all the entries at that index into hash table 2; when the next request is processed, it copies the entries at the next index position in hash table 1, and so on.
In this way, the cost of one massive copy is cleverly spread over the processing of many requests, avoiding a time-consuming blocking operation and keeping data access fast.
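The mechanism above can be sketched in miniature. This is a hypothetical Python simplification (the class, bucket layout, and per-request migration hook are invented for illustration), not Redis's actual C implementation:

```python
# Sketch of progressive rehash: two tables coexist, and each client
# request migrates just one bucket from table 1 to table 2.

class IncrementalDict:
    def __init__(self):
        # Old table: bucket index -> chain of (key, value) pairs.
        self.t1 = {0: [("a", 1)], 1: [("b", 2)], 2: [("c", 3)]}
        self.t2 = {}          # new, larger table (bucket detail elided)
        self.cursor = 0       # next bucket index in t1 to migrate

    def _step(self):
        # Migrate a single bucket instead of copying everything at once.
        if self.cursor in self.t1:
            for k, v in self.t1.pop(self.cursor):
                self.t2[k] = v
        self.cursor += 1

    def get(self, key):
        self._step()          # piggyback one migration step on each request
        for bucket in self.t1.values():
            for k, v in bucket:
                if k == key:
                    return v
        return self.t2.get(key)

d = IncrementalDict()
print(d.get("b"))   # -> 2 (and bucket 0 was migrated as a side effect)
print(d.t2)         # -> {'a': 1}
```

After three requests, table 1 is empty and every entry has moved to table 2, yet no single request paid for the whole migration.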
3. Compressed lists
As mentioned earlier, the collection types have five main underlying data structures: integer arrays, doubly linked lists, hash tables, compressed lists, and skip lists. Hash tables have just been covered, and integer arrays and doubly linked lists are common enough, so let's look specifically at compressed lists and skip lists.
- **zlbytes:** records the number of memory bytes the compressed list occupies; used when reallocating memory or calculating the position of zlend. Takes 4 bytes.
- **zltail:** records the number of bytes from the start of the compressed list to its tail node. With this offset, the program can locate the tail node without traversing the entire list. Takes 4 bytes.
- **zllen:** records the number of nodes in the list. Takes 2 bytes.
- **zlend:** marks the end of the compressed list. Takes 1 byte.
- **entry:** the data saved in the compressed list; each entry has the following attributes:
- prev_len: the length of the previous entry. prev_len takes one of two sizes: 1 byte or 5 bytes. If the previous entry is shorter than 254 bytes, prev_len takes 1 byte. Although one byte can represent values from 0 to 255, the compressed list reserves 255 as the value of zlend, which marks the end of the list, so 255 is never used anywhere else to indicate a length. Therefore prev_len is 1 byte when the previous entry is shorter than 254 bytes, and 5 bytes otherwise.
- len: the length of this entry's data, taking 4 bytes.
- encoding: the encoding mode, taking 1 byte.
- content: the actual data saved.
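The prev_len rule above can be made concrete. This is a simplified sketch, assuming the 5-byte form is a marker byte followed by a 4-byte little-endian length; the function name is invented, and real ziplist encoding has more detail than shown here:

```python
# Sketch of the prev_len sizing rule: lengths below 254 fit in one byte;
# anything longer uses a marker byte (254, the value not reserved for
# zlend) plus a 4-byte length.
import struct

def encode_prev_len(prev_entry_len):
    if prev_entry_len < 254:
        return bytes([prev_entry_len])                  # 1-byte form
    return b"\xfe" + struct.pack("<I", prev_entry_len)  # 1 + 4 = 5 bytes

print(len(encode_prev_len(100)))   # -> 1
print(len(encode_prev_len(1000)))  # -> 5
```

Note how 255 never appears as a length: a previous entry of exactly 254 bytes already falls into the 5-byte form, leaving 255 free to mark the end of the list.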
If we want to locate the first or last element, we can find it directly from the lengths in the first three header fields, which is O(1). Looking up any other element is not as efficient: we have to scan the entries one by one, which is O(N).
4. Skip lists
An ordered linked list can only be searched element by element, which makes operations very slow; hence skip lists appeared. Concretely, a skip list adds multi-level indexes on top of the linked list, and by jumping through a few index positions it can locate data quickly.
With such a skip list, adding a single index level cuts the search time roughly in half, to O(n/2), because by first following each node's top-level pointer we can skip about half of the nodes.
For example, to find 50, we first compare it with 20; since 50 is larger, we move on and compare it with 40, and then with 70. Since 70 is larger than 50, we know the target lies between 40 and 70, so we drop down a level and search there. In this process, 30 is skipped entirely.
A skip list is essentially an algorithm that trades space for time. Each node in the list records not only the next node at the bottom level but also, according to its level, the next node at each index level. The method embodies the idea of "first take big strides to narrow the range, then close in step by step." In algorithmic efficiency, skip lists are very close to red-black trees.
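The search just described can be sketched with a hand-built two-level skip list over the example values (20, 30, 40, 50, 70, with an index level over 20, 40, 70). This is a deterministic toy for illustration; Redis's real skip list assigns levels randomly and is written in C:

```python
# Two-level skip list: level 1 is the sparse index, level 0 links
# every node. Search strides along the top level first, then drops down.

class Node:
    def __init__(self, value, levels):
        self.value = value
        self.forward = [None] * levels   # forward[i] = next node on level i

def search(head, target):
    node = head
    for level in reversed(range(len(head.forward))):
        # Big strides first: advance while the next value is still <= target.
        while node.forward[level] and node.forward[level].value <= target:
            node = node.forward[level]
    return node.value == target

# Build: level 0 links 20 -> 30 -> 40 -> 50 -> 70; level 1 links 20 -> 40 -> 70.
head = Node(None, 2)
n20, n30, n40, n50, n70 = (Node(v, 2) for v in (20, 30, 40, 50, 70))
head.forward = [n20, n20]
n20.forward = [n30, n40]
n30.forward = [n40, None]
n40.forward = [n50, n70]
n50.forward = [n70, None]
n70.forward = [None, None]

print(search(head, 50))  # -> True (30 was never visited)
print(search(head, 60))  # -> False
```

Searching for 50 follows exactly the path in the text: 20 and 40 on the index level, then 50 on the bottom level, skipping 30 entirely.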
5. Time complexity of Redis data structures
6. Suggestions for using Redis
- First, for single-element operations, prefer Redis's single-element commands; their efficiency is generally high.
- Second, range operations are traversal operations on a collection type that can return all the data in the collection. Such operations are generally time-consuming and should be avoided as far as possible.
- Third, statistical operations are the counts a collection type keeps of all the elements it contains. Because these data structures themselves record the number of elements, the time complexity is O(1) and the efficiency is very high.
- Fourth, the exceptions are the special records kept by some data structures, such as compressed lists and doubly linked lists, which record the offsets of the head and tail of the list. For inserting or deleting elements at the head or tail, the position can be found directly from the offset, so the time complexity is also O(1), making these operations fast.