Redis source code analysis series of articles

Redis source code series installation and common APIS in Liunx

Why analyze from Redis source code

String low-level implementation – dynamic String SDS

Redis bidirectional linked list article all know

Interviewer: Tell me about Redis Hash base:…… (Interview questions from reading)

preface

I’ll see you on Friday. In the past few weeks we have looked at the underlying data structure of Redis, such as dynamic string SDS, bidirectional linked list Adlist, Dict Dict. If you do not understand the common types or underlying data structure of Redis, please see the above portal.



Today we will take a look at the underlying architecture of ZSET. If you don’t know what ZSET is, you can read the first portal above. Simply put, ZSET is a data structure provided by Redis to judge its ranking based on data and scores. The most common is the ranking of wechat sports. Each user corresponds to the number of steps, and the ranking of users can be given every night.

There is no need to introduce the ZSET structure of Redis.

Of course, if the sorting method is used, it can achieve the same function, but the code requires hard coding, will add work, but also provide code bugs, ha ha ha. And the bottom of Redis is C implementation, direct operation of memory, speed will be higher than Java method implementation.

To sum up, there are many advantages to using Redis ZSET structure. Don’t say any more of that. Start. Before we begin in earnest, we need to introduce the concept of a down-skip list, which is the underlying implementation of the ZSET structure. This is going to be a little boring, but I’ll try to keep it simple.



What is a skip list?

For the linked list structure with a large amount of data, the insertion and deletion are fast, but the query speed is slow. That’s because you can’t get a node directly. You need to start at the beginning and get the next node using the next pointer of one node. Even if the data is orderly discharged, if you want to query a certain data, you can only traverse the variables from the beginning to the end. The query efficiency is very low, and the time complexity is O(n).

What if we need to query a linked list quickly? Some of you said array, but what if you don’t change the data structure?

Let’s think about the dichotomy in an ordered array structure, which reduces the range by half each time, and makes the query much faster. Can we use this idea in linked lists?

This brings us to the main character of today’s lecture — skip lists. (The introduction of the concept is also a bit stiff 😊)



Step 1 Create an ordered single necklace table

Take a look at the ordered one-way list below, which contains seven elements: 1,2,3,4,5,6,7.



Step 2 Extract secondary index nodes

We can extract some nodes from the linked list, as shown in the figure below, 1,3,5,7 nodes are extracted, that is, one node for every two nodes is extracted to the higher level, and the extracted nodes are called indexes.

Notice that it’s not always perfect, but it’s like flipping a coin, where the probability of each coin is the same, 1/2. When the amount of data is small, the probability of heads and tails can vary greatly. But as the data gets bigger and bigger, the probability of heads and tails gets closer and closer to 1/2. By analogy, it means the same thing. Every node has the same chance, either staying at the original level or extracting to the higher level, with a probability of 1/2. However, as the number of nodes increases, the extracted nodes are closer and closer to 1/2.



Step 3 Extract level 3 index nodes

We can extract some nodes from the linked list. In the figure below, 1 and 5 nodes are extracted, that is, one node for every two nodes is extracted to the higher level. The extracted nodes are called indexes.

Step four types of binary query

Let’s say we want to find a node with a value of 6, start with the tertiary index, find the node with a value of 1, find the node with a value of 5, based on the next pointer to the node with a value of 1, find the node with a value of 5, there is no other tertiary index after 5.

Therefore, we searched down and reached the secondary index. According to the next pointer of the node with the value of 5, we found the node with the value of 7, which was smaller than 6, indicating that the node 6 to be found was within this range.

After that, the next pointer of the node with the value of 5 points to the node with the value of 6. It is found to be the data to be queried, so the query process ends.

Based on the above query process (the blue line in the figure below), we can see that the core idea used is the dichotomy method, constantly narrowing the query scope, if the upper index is found in the range, then go down to the next level to find the real data.



conclusion

It can be seen from the whole process above that when the amount of data is small, this method of exchanging space for time and consuming memory is not the optimal solution. Therefore, the ZSET structure of Redis adopts the compression table when the amount of data is small, and the skip table when the amount of data is large.

Such a linked list and multi-level index structure is a skip list. The name of the image, the process is jumping to query.



Redis skip list diagram

It is to jump table below simple improvements and encapsulation again, first of all introduced the concept of header, the two-way linked list, dictionary structure, are the data encapsulation, because they are using a pointer, the pointer inevitably leads to the calculation length, obtain the final node data will produce the performance of the query is too slow, So wrapping headers is a way to speed things up, and the time wasted on adding, deleting, etc., is negligible in comparison.

The second is the introduction of a layer array that manages all nodes. We can see that there are 32 layers, that is, 32 arrays, which is the same as the data node structure. It was introduced to make it easy to locate each element directly based on the number of layers in this array.

Secondly, each level of the data node has a level and span (that is, the number on the arrow pointer in the figure below, which is convenient to calculate the distance between the two nodes).

The last is the backward pointer of the data node. The purpose of introducing it is that the Level array has only the front pointer, that is, it can only point to the address of the next node, while the backward pointer is to find the node backwards.



The figure above is mainly divided into three blocks :(it is ok to have a general look here, and the codes of each module will be explained in detail below)

header

It mainly includes four attributes, including header, tail, length of node, and maximum level of all nodes.

Header: indicates the table head node of the skip table. The table head can be directly found by this pointer address. The time complexity is O (1).

Tail: Indicates the node at the end of a skip table. You can use this pointer to find the end of a skip table. The time complexity is O (1).

Length: Records the length of the skip table, that is, the number of elements in the skip table excluding the head node.

Level: records the level with the maximum number of layers of all nodes in the current skip table (excluding the table head node).

Manages an array of levels for all nodes

The object value is empty and the level array is 32 layers, in order to manage real data nodes. What attributes are placed on the data node for the specific level.

Data nodes

It mainly includes four attributes: object value obj, score score, backward pointer backward and level array. The number of levels in the Level array for each data is randomly generated, just like the skip list mentioned above.

Member object obj: The real real data, the data of each node is unique, but the scores of nodes may be the same. Two nodes of the same score are sorted according to the size of their member objects in the dictionary, with nodes with smaller member objects first and nodes with larger member objects second.

Score score: The number in each node is the number of points saved by the node. In the skip list, nodes are arranged in ascending order of the number of points saved by each node.

Backward pointer Backward: Used to traverse from the end of the table to the top of the table, each node has only one backward pointer, that is, only one backward at a time.

Level: Each layer of a node is marked with the words 1, 2, and 3. L1 represents the first layer, L2 represents the second layer, L3 represents the third layer, and so on.

Skip list definition

Table header structure zskiplist



Typedef struct zskiplist {// tail struct zskiplistNode *header, *tail; Length unsigned long length; // Maximum level of all nodes level int level; } zskiplist;Copy the code

Specific data node zskiplistNode



// typedef struct zskiplistNode {SDS ele; // Double score; Struct zskiplistNode *backward; // Backward pointer // hierarchy array struct zskiplistLevel {struct zskiplistNode *forward; // Forward pointer forward unsigned int span; // span span level[]; } zskiplistNode;Copy the code

Implementation of skip lists (source code analysis)

Redis apis for skip lists are defined in the t_zset.c file.

Don’t run away from source code analysis, be sure to read it.



Create skip list

Create an empty skip list, which creates the table header and manages the level array of all the nodes. First, define some variables to try to allocate memory. Next, initialize the level and length of the table header, assigning 1 and 0, respectively. We then create an array that manages the levels of all nodes by calling the zslCreateNode function with the array size macro constant ZSKIPLIST_MAXLEVEL (32), score 0, and object value NULL. (This is the key to the implementation of skip lists). This is followed by the initialization of the forword and span Pointers to each element of the array. Finally initializes the tail pointer and returns the value.

Please refer to the following diagram and source code:



Zskiplist *zslCreate(void) {int j; zskiplist *zsl; ZSL = zmalloc(sizeof(* ZSL)); ZSL ->level = 1; zsl->length = 0; // Call zslCreateNode with an array length ZSKIPLIST_MAXLEVEL 32. ZSL ->header = zslCreateNode(ZSKIPLIST_MAXLEVEL,0, NuLL); zslCreateNode(ZSKIPLIST_MAXLEVEL,0, NuLL); // Assign the 32 arrays to pointer forward and spanfor(j = 0; j < ZSKIPLIST_MAXLEVEL; j++) { zsl->header->level[j].forward = NULL; zsl->header->level[j].span = 0; } // Set tail pointer ZSL ->header->backward = NULL; zsl->tail = NULL; // Return the objectreturn zsl;
}
zskiplistNode *zslCreateNode(int level, double score, sds ele) {
    zskiplistNode *zn =
        zmalloc(sizeof(*zn)+level*sizeof(struct zskiplistLevel));
    zn->score = score;
    zn->ele = ele;
    return zn;
}
Copy the code

Insert the node

For example, there are 6 elements in the figure below, and we need to insert the element with the value of zhao 6 and the score of 101. Let’s think about it roughly. The general steps include finding the position to insert, creating a new data node, and then adjusting the level array of the related head and tail Pointers. Let’s take a look at redis and see if it’s the same as we thought.

Boom, boom, boom, the answer. Of course the big picture is the same.



Here we go :(pictures first)



1. Traverse the level array of all nodes managed, starting with the largest level, that is, 3. Pair the ratio one by one.

That’s too abstract. Take the above figure as an example, starting from level 3 of the table head, first go to L3 of Zhang SAN and find that the score is 70, which is smaller than the target score 101, skip. According to the previous pointer, find L3 of Zhao Liu and find that the score is 102, which is larger than the target score 101. Record Zhao Liu L3 in the array update to be updated, and record span is 4. Then go to the next layer, the L2 layer of Zhang SAN, find that the score 70 is smaller than the target score 101 skip, according to the previous pointer to find Wang 5’s L2, find that the score 90 is smaller than the target score 101 skip, according to the previous pointer to find Zhao 6’s L2, find that the score 102 is larger than the target score 101, record zhao 6’s L2 to the array update, The record span is 2. Finally to the next layer, Zhang SAN’s L1 layer, logic and just the same, also record Zhao Six’s L1 layer and span span is 1.

2. Generate the number of levels for the new node randomly (by bit operation). If the level generated is greater than the current maximum level 3, the parts larger than the level will be traversed one by one and the span information will be recorded in the update table above.

For example, if the level of a new node is 5 and the current maximum level is 3, there will only be one node and it will span all previous nodes, then we will iterate through the fourth and fifth levels and record them into the array update.

3. After all the preparation work is done, the node will be inserted into which position, which layer, and the corresponding span of each layer are found, the next data node will be added. Add the information from the previous two steps to the new node, and adjust the cursor before and after the position.

4. Finally, some finishing work is done, such as modifying the table header level, node size length, tail pointer and other attributes.

So the whole process is over. This might look a little complicated, but look at the code below.



// Insert node, Ele: insert element score ele zskiplistNode *zslInsert(zskiplist * ZSL, double score, ZskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x; Unsigned int rank[ZSKIPLIST_MAXLEVEL]; unsigned int rank[ZSKIPLIST_MAXLEVEL]; int i, level; serverAssert(! isnan(score)); x = zsl->header; // Start with the highest level and find the position of each layer to be inserted from the top to the bottomfor(i = zsl->level-1; i >= 0; i--) { /* store rank that is crossed to reach the insert position */ rank[i] = i == (zsl->level-1) ? 0 : rank[i+1]; // Find the location where the first fraction is greater than the element directly // or the location where the fraction is the same as the element but the object's ASSICC code is greater than the elementwhile(x->level[i].forward && (x->level[i].forward->score < score || (x->level[i].forward->score == score && SDSCMP (x->level[I]. Forward ->ele,ele) < 0)) {rank[I] += x->level[I]. x = x->level[i].forward; } update[I] = x; } // Generate a random level between 1 and 32, the higher the level, the lower the probability of generating level = zslRandomLevel(); // If the number of layers generated is greater than the existing number of highest levels, all additional layers need to be initializedif(level > ZSL ->level) {// Start the loopfor(i = zsl->level; i < level; i++) { rank[i] = 0; Header update[I] = ZSL ->header; Update [I]->level[I]. Span = ZSL ->length; update[I]->level[I]. } zsl->level = level; } // create node x = zslCreateNode(level,score,ele);for(i = 0; i < level; X ->level[I]. Forward = update[I]->level[I]. Forward; update[i]->level[i].forward = x; // rank[0] is the span between P1 and P1 at level 0 and P1 at level 1 The span f(X,Y) between the inserted node X and the post-inserted node Y can be calculated by the following formula-f(P2,0)+1 is equal to the span between the new node and P2, This is because the span is fanned down to the bottom layer. Update [I]->level[I]. Span = update[I]->level[I]. Span - (rank[0] -rank [I]); Update [I]->level[I]. Span = (rank[0] -rank [I]) + 1; update[I]->level[I]. Span = (rank[0] -rank [I]) + 1; } /* increment spanfor untouched levels */
    for(i = level; i < zsl->level; i++) { update[i]->level[i].span++; } // Layer 0 is a bidirectional linked list, so redis often supports backward classes to find x->backward = (update[0] == ZSL ->header)? NULL : update[0];if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        zsl->tail = x;
    zsl->length++;
    return x;
}Copy the code

int zslRandomLevel(void) {
    int level = 1;
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}Copy the code

Obtaining node Rankings

In case you forget this picture, paste it again. The following figure, this part of the logic is relatively simple, I will not write, specific reference code analysis.



// input parameters are ZSL, score, Real data ele unsigned long zslGetRank(zskiplist * ZSL, double score, SDS ele) {zskiplistNode *x; unsigned long rank = 0; int i; X = ZSL ->header; // Start the loop from level, the maximum value of the table headerfor(i = zsl->level-1; i >= 0; I --) {// If the score is less than the target score, the ranking plus the span // or the score is the same, but the specific data is less than the target data, the ranking plus the spanwhile(x->level[i].forward && (x->level[i].forward->score < score || (x->level[i].forward->score == score && sdscmp(x->level[i].forward->ele,ele) <= 0))) { rank += x->level[i].span; x = x->level[i].forward; } // Make sure the same score is found at level I and the same object is returnedif (x->ele && sdscmp(x->ele,ele) == 0) {
            returnrank; }}return 0;
}Copy the code

conclusion

This article is mainly about Redis ZSET data type of the underlying implementation jump table, what is first to jump from the table, the concept of jump tables and data structures, analyses its main constituent, and through many explained the Redis process image is how to design the jump table, finally combined with the source code to jump table is described, such as the creation process, add node process, Get a node ranking process, interspersed with examples and process diagrams.

If you feel that writing is also good, trouble to give a praise 👍, your recognition is the power of my writing!

If you think something is wrong, please feel free to comment.

All right, bye.

And a focus on