“This article has participated in the good article call order activity, click to see: back end, big front end double track submission, 20,000 yuan prize pool for you to challenge!”
preface
-
Believe you must have used xinhua dictionary! The words I couldn’t read as a child were looked up in the dictionary. There’s a similar feature in Redis called a dictionary or symbol table! Is an abstract data structure that holds key – value pairs
-
This article is still part of the [Redis Prequel] series because it is still parsing Redis data structures! Only when you try to understand Redis can you see how it works! To understand why Redis is touted as fast, not told!
Application scenarios
- There are many scenarios in Redis that use dictionaries as the underlying data structure! We should use redis library Settings and five basic data types Hash structure data!
- In the last part of the Redis prequel, we learned about the list data structure. Today we’ll continue our study of the dominant data structure Hash.
- There is a dictionary structure and a hash structure inside Redis, but the hash structure here is not the same as the basic data hash in Redis. We simply understand the dictionary structure and hash structure as a more low-level abstract structure of Redis. The basic hash data structure we use is known as the hash tool
- Today we will focus on the Hash analysis of five data structures. It uses the dictionary structure at the bottom. Inside the dictionary structure is the underlying hash structure. A little round! Understand that you can
Hash table
- The figure above illustrates hashes as the underlying structure of Redis. Internally Redis calls it dictht. Now, why are we going to conflict with the hash structure that we had before? We’re going to use the class name called the dictht class.
- There are four properties in the hictht class: table, size, Sizemask, used; Where the table is an array; The elements in the array are another class called the dictEntry class.
- DictEntry is actually storing data. Inside is the key and value storage structure. A simple hash table looks like this. The data is eventually stored in a dictEntry object in the table.
- Why sizemask = size -1; This is needed to compute the hash index. So why not just use size-1 instead of a variable? This!! I don’t know. Let me go to Baidu, Baidu.
An array of node
- The hash table above is familiar. It is the same as the Map data structure in Java. Yes and no, they’re similar but they’re different.
- We mentioned above that the data is ultimately stored as elements in the table array in the hash table. This element is called dictEntry. Let’s see what the dictEntry structure looks like.
- We can see from the definition of dictEntry on the left. The value stored by dictEntry can be a pointer, a positive number, a floating point, or any number of data types! Similar to the Object property in Java. What doesn’t mean anything to the key above is just a key.
- Since it’s an array and the index is of fixed length, the classic problem with finite length is hash conflict. In Java we resolve conflicts with linked lists and red-black trees! In Redis it is solved by linked lists. The conflicting elements are concatenated in the dictEntry with the next pointer.
- Here we can understand the Map structure in Java. They are very similar inside!!
- It is important to note that redis is indeed stored by the linked list in the case of hash collisions, but the dicthT does not record the tail node of each unindexed linked list, only the head node information. We also know that lists are not very efficient at querying, so redis adds the new node to the top of the list when a hash conflict occurs!
The dictionary
Polymorphic dictionary
- Dictionary is the structure proposed at the beginning of this article! It is also a kind of low-level data structure that is widely used in Redis. The name is dict in Redis.
- It’s clear from the diagram that the inside contains the hash table. In fact, we can see from the name why the hash table is called dictht. The author here thinks it is DicthashCodeTable. A hash – related array inside the dictionary table
- As mentioned earlier, dictionaries are used in many places within Redis! Just like we go to school to use the xinhua Dictionary, idiom dictionary, Xiehouyu dictionary and so on. The names are different but the internal structure is a dictionary for us to quickly locate. Internally, the dict in Redis uses the type field to distinguish each dictionary. Privdata is the specific parameter required for each dictionary. With type and privData, you can easily implement a variety of different dictionaries, which have a proper name ~~Polymorphic dictionary~ ~
rehash
- Besides type and privData, the rest are HT and rehashidx. Where HT is an array of length 2. The elements in the array are the hash table that we mentioned earlier. The reason why ht has length 2 is that we need to understand the rehash process of Redis. And rehashidx is a record of the progress of the rehash! Rehashidx =-1 when no rehash occurs;
- In practice, in the dictionary, all of our data will be stored in the hash table corresponding to HT [0]. Ht [1] is always an empty array. All of these are the reasons for rehash preparation. Before we get started, let’s understand why Redis needs to rehash
- First of all, we have a fixed length array in our hash table and when there’s a conflict internally it’s solved by a linked list. In theory, a hash table can store enough data, and enough here means as much as space allows. But we know that the characteristics of the list is to add, delete quickly but the query is slow, especially when the list is very long when there will be low query efficiency! In order to avoid the list is too long Redis will in certain conditions on the hash table array length extension to solve the local list is too long!
- Each time the length of the array changes, the previous hash needs to go through the process of hashing and addressing the index all over again. This process is calledrehash 。
- The rehash function is the same as the resize function of Map in Java! In Java, resize is a piece of memory that is copied directly out of new memory and it is expanded twice every time. Redis’ rehash is slightly different and basically we can think of it as a 2x extension! The two-memory replication is a little bit similar to garbage collection in the JVM. Sometime we can look at the JVM section together.
- So when do I need to rehash? This is the same as the Java load factor; But in addition to the load factor as a spatial measure, Redis also considers a performance issue. Because in the case of single threading we also have to consider the perception of client use! Single threaded means that the execution commands are executed sequentially. We can’t block all client usage during our rehash process. It’s not user-friendly to the stability of the operation experience.
- What we call a background command involving the above two commands combined with a load factor produces the following conditions
Progressive rehash
-
Always emphasized that Redis is single threaded. So what is a single threaded model? The redis service is a linear operation! However, the commands of each client are unordered, and the first to arrive will be queued first. Redis service will fetch the commands from the queue once for execution. In addition to client-side commands, there are also system-generated commands such as the rehash operation we mentioned above!
-
Firstly, in order to avoid blocking the client or try to control the blocking time within the scope of client awareness, rehash within Redis is not a one-time operation but a gradual process. Copy only one part at a time
-
(2) Remember the dict attribute rehashidx, which tracks the progress of the rehash. Because the inside of a hash table is an array and rehashidx is the index of that array. Thus we can also know that each time the rehash is copied, a fully indexed linked list is copied as a unit.
-
(3) All other operations will affect ht[0] and HT [1] because both arrays are in use during the rehash process
-
(4) When new value is added, it only needs to be added to HT [1]. The ultimate goal is to synchronize all values into HT [1]. The value of HT [0] will gradually decrease; – There is no need to add ht[0]
-
(5) Looking for elements in the rehash process will look for elements in the union of the two arrays. This is why new elements in the rehash process only need to be added to HT [1]
conclusion
① Dictionary table is widely used in Redis. The excellent design of dictionary table is used to solve the single thread problem in Redis
The dictionary contains a hash table, and the internal use node of the hash table is responsible for storing keys and values
3, dictionary type to achieve polymorphic dictionary for multiple scenarios!
(4) Progressive Rehash to solve the problem of service lag