Welcome to pay attention to github.com/hsfxuebao, I hope to help you, if you think it can trouble to click on the Star ha

A skip list is an ordered data structure that enables fast access to nodes by maintaining multiple Pointers to other nodes in each node.

In most cases, skip lists are as efficient as balanced trees, and because skip lists are much easier to implement than balanced trees, many programs use skip lists instead of balanced trees.

Redis uses skip lists as one of the underlying implementations of ordered set keys. If an ordered set contains a large number of elements, or if the elements in the ordered set are long strings, Redis uses skip lists as the underlying implementation of ordered set keys.

Ordered collections are implemented using two data structures that allow insert and delete operations to achieve O(log(N)) time complexity. The two types of data structures are hash tables and skip tables. Add elements to the hash table that map member objects to scores; The element is also added to the skip list to sort by score.

Unlike linked lists, dictionaries and other data structures that are widely used in Redis, Redis only uses skip lists in two places: one is to realize ordered set keys, and the other is to be used as internal data structures in cluster nodes. Beyond that, skip lists have no other use in Redis.

1. The reason for introducing the skip table

Arrays and linked lists pain points:

The solution: Dimension, also known asSpace for time. The diagram below:

2. Data structure

Skiplist is a space-for-time structure. Because of the linked list, binary search can not be carried out, so draw lessons from the idea of database index, extract the key node in the linked list (index), search on the key node first, and then enter the lower linked list search. A skip list is formed by extracting multiple key nodes.

2.1 Skip list nodes

The skip table nodes in Redis are defined in the SRC /server.h file

typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;
Copy the code
  • Ele: Records the value of the skip list node. The type is SDS.
  • Score: Saves the score of a skip table node. In a skip table, the nodes are arranged in ascending order of their saved score.
  • Backward: A backward pointer that points to the previous node of the current node and is used when the program traverses the table from the end to the top. Because a skip list node has only one back pointer, you can only go back to the previous node at a time.
  • Level: Layers of skip table nodes, each with two properties: forward pointer forward and span span. The forward pointer is used to access other nodes at the end of the table, while the span records the distance between the node to which the forward pointer points and the current node. Each skip list node has a maximum of 32 layers.

2.2 jump table

In Redis, skip lists are defined in the SRC /server.h file

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;
Copy the code
  • Header: Points to the table header node of the skip table.
  • Tail: Refers to the end-of-table node of a skip table.
  • Length: records the length of the skip table, that is, the total number of nodes excluding the table head node.
  • Level: Records the number of layers of the node with the largest number of layers in the skip table (the number of layers of the top node is not counted).

A skiplist can be composed of multiple skiplist nodes, but by using a Zskiplist structure to hold these nodes, programs can more easily process the entire skiplist, such as quickly accessing the top and bottom of the skiplist, or quickly obtaining the number of skiplist nodes.

Redis’ Skip list implementation is similar to the Skip list algorithm described by WillianmPugh in “Skip Lists: A Probabilistic Alternative to Balanced Trees”, but there are some differences:

  • Redis skip lists allow repetition of scores;
  • Redis’s skip list sorts not only by score, but also by comparing member objects when the score is the same.
  • Redis skip lists have a backward pointer, forming a bidirectional linked list that can be traversed from the end of the table to the beginning of the table, forZREVRANGEImplementation of a command.

The specific structure diagram is as follows:

headertailThe pointer points to the head and tail of the skip table, respectively. By using these two Pointers, the program locates the head and tail of the skip tableO(1).

By using the length attribute to record the number of nodes, the program can return the length of the skip list at O(1) complexity.

The level attribute is used to get the number of levels of the node with the highest middle height of the skip table in O(1) complexity. Note that the levels of the top and bottom of the skip table are not counted.

Note:

  • Redis skiplist implementation is composed of zskiplist and zskiplistNode. Zskiplist is used to store skiplist information (head node, tail node, length), and zskiplistNode is used to represent skip nodes

  • The height of each jump node is a random number between 1 and 32

  • In the same skip list, multiple nodes can contain the same branches, but the member objects of each node must be unique

  • The nodes in the skip list are sorted by the size of the point value, and the nodes are sorted by the size of the member objects when the points are the same

3. Add, delete, and query operations

Skip lists are based on ordered single linked lists. On the basis of linked lists, each node contains not only one pointer, but also multiple Pointers to subsequent nodes. In this way, some unnecessary nodes can be skipped to speed up search and delete operations. Here is a skip list:

The traditional single-linked list is a linear structure, and it takes O(n) time to insert and find a node into the ordered list. If you use the skip list shown above, you can reduce the time required to find.

Skip lists insert and delete operations are based on lookup operations, understand lookup operations, also understand the nature of skip lists. A lookup is given a key and finds out if that key is present in the skip list.

Combined with the figure above, if you want to find out if 19 exists, you start at the highest level and compare it to the highest level successor of the head node, 9, which is greater than 9, so you compare it to the highest level successor of 9, 21, which is less than 21, so it must be somewhere between 9 and 21.

So we move down one level, and then we compare it to the successor of 9 at that level, 17, which is greater than 17, and then we compare it to 21, which is less than 21, so it’s definitely between 17 and 21.

Then move down one level and compare it with 17’s successor at that level, 19, and you finally find it.

Reference: skip lists in Redis

4. Time and space complexity

Time complexity:

  1. First of all, we increased the span of each index by 2 times, that is, we reduced the number of steps by 2 times, so n/2, N /4, N /8 and so on; The number of k-th index nodes is n/(2^k);

  2. Suppose the index has grade H, and the highest index has two nodes; N over 2 to the h is equal to 2, and from this formula we get h is equal to log2 of n minus 1; So it turns out that the time of the hop table is order logN.

See the derivation for details: Understanding the search time complexity from the Redis skip table implementation

Space complexity:

  1. First, the length of the original list isn, if the index is one index node for every 2 nodes, the node number of each index layer is:n/2, n/4, n/8 ... , 8, 4, 2And so on;
  2. Or so there is one index node for every 3 nodes, and the number of nodes per index level:n/3, n/9, n/27 ... , 9, 3, 1And so on; So the space complexity is zeroO(n);

5. The advantages and disadvantages

  • Skip tables are one of the most typical space-for-time solutions, and only benefit when there is a large amount of data. And should be read more write less under the circumstances can be used, so its scope of application should be relatively limited

  • Maintenance costs are relatively high: all indexes need to be updated when adding or deleting;

  • And finally, updates in the process of adding and deleting are order log n.

Skip list API

function role Time complexity
zslCreate Create a new skip list O(1)
zslFree Releases the skip list and all its containing nodes O(N)
zslInsert Adds a new node to the skip list with the given member and score Average order log to the N, worst order N.
zslDelete Deletes a skip table containing a given member and a value node Average order log to the N, worst order N.
zslGetRank Returns the rank of nodes in a skip list containing the given members and values Average order log to the N, worst order N.
zslGetElementByRank Returns the node of the skip list in the given rank Average order log to the N, worst order N.
zslIsInRange Determines whether any node in the skip list has a score within a given range O(1)
zslFirstInRange Returns the first node in the skip list in a range of points Average order log to the N, worst order N.
zslLastInRange Returns the last character in the skip list in a range of points Average order log to the N, worst order N.
zslDeleteRangeByScore Deletes all nodes in the skip table for a range of points O(N)
zslDeleteRangeByRank Deletes all nodes in a rank range from the skip table O(N)

6.1 Creating a Skip Table

#define ZSKIPLIST_MAXLEVEL 32 /* Should be enough for 2^64 elements */

/* Create a skiplist node with the specified number of levels. * The SDS string 'ele' is referenced by the node after the call. */
// Create a skip list node
zskiplistNode *zslCreateNode(int level, double score, sds ele) {
    zskiplistNode *zn =
        zmalloc(sizeof(*zn)+level*sizeof(struct zskiplistLevel));
    zn->score = score;
    zn->ele = ele;
    return zn;
}

/* Create a new skiplist. */
// Create a skip list
zskiplist *zslCreate(void) {
    int j;
    zskiplist *zsl;

    zsl = zmalloc(sizeof(*zsl));
    zsl->level = 1;
    zsl->length = 0;
    // A skip list contains an empty head node with a maximum of 32 layers
    zsl->header = zslCreateNode(ZSKIPLIST_MAXLEVEL,0.NULL);
    for (j = 0; j < ZSKIPLIST_MAXLEVEL; j++) {
        zsl->header->level[j].forward = NULL;
        zsl->header->level[j].span = 0;
    }
    zsl->header->backward = NULL;
    zsl->tail = NULL;
    return zsl;
}
Copy the code

6.2 Layer random algorithm of skip table nodes

#define ZSKIPLIST_MAXLEVEL 32 /* Should be enough for 2^64 elements */
#define ZSKIPLIST_P 0.25      /* Skiplist P = 1/4 */

/* Returns a random level for the new skiplist node we are going to create. * The return value of this function is between 1 and ZSKIPLIST_MAXLEVEL * (both inclusive), with a powerlaw-alike distribution where higher * levels are less likely to be returned. */
int zslRandomLevel(void) {
    int level = 1;
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
Copy the code

In the Redis skip table, if a new node is added, the number of layers of the node is randomly calculated by using the function zslRandomLevel. The number of layers of the node ranges from 1 to 32,0 xFFFF is 65535, that is, a number between 0 and 65535 is generated each time. The following rules can be summarized:

  1. becauselevelInitialize to 1, so the probability that the final number of layers is 1 is1-ZSKIPLIST_P, i.e.,0.75.
  2. When the final number of layers is greater than 1, the probability of each increase in the number of layers isZSKIPLIST_PSo the probability of n layers is zero(1-ZSKIPLIST_P ) * ZSKIPLIST_P ^ (n - 1), i.e., 。
  3. Because the probability of increasing the number of layers is ZSKIPLIST_P, it can be regarded as the number of nodes at the k layer is 1/ZSKIPLIST_P times of the number of nodes at the K +1 layer, so the skip list of Redis is equivalent to a quadtree.

6.3 Inserting a Skip Table Node

/* Insert a new node in the skiplist. Assumes the element does not already * exist (up to the caller to enforce that). The skiplist takes ownership * of the passed SDS string 'ele'. */
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {
    // Update array records the new node in each layer of the precursor node
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    // The rank array is used to record the rank of the new node in the skip list
    // The top node of the table is ranked 0, so the sum of the nodes passing through is the ranking of the inserted node
    unsigned int rank[ZSKIPLIST_MAXLEVEL];
    inti, level; serverAssert(! isnan(score));//x points to the head node of the skip list
    x = zsl->header;
    // Work your way down from the top
    for (i = zsl->level- 1; i >= 0; i--) {
        /* store rank that is crossed to reach the insert position */
        // The top rank element is initialized to 0, and the other levels are initialized to the top rank element
        rank[i] = i == (zsl->level- 1)?0 : rank[i+1];
        // A node below the current layer exists
        // The point of the node below the current layer is less than the point of the inserted node
        // When the score is equal, the element of the inserted node is larger than the element of the next node
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                    sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            // Count the span of nodes searched in the current layer
            rank[i] += x->level[i].span;
            // The node pointer cursor moves back to find the insertion location
            x = x->level[i].forward;
        }
        // Record the precursor nodes of the new node in the current layer
        update[i] = x;
    }
    /* we assume the element is not already inside, since we allow duplicated * scores, reinserting the same element should never happen since the * caller of zslInsert() should test in the hash table if the element is * already inside or not. */
    // We assume that the element has not yet been inserted into the skip list. Although we allow duplicate node scores in the skip list, repeated insertion of the same element is not allowed
    //, so we need to test whether the same elements already exist in the hash table before calling zslInsert.
    // The number of layers of random new nodes
    level = zslRandomLevel();
    // The height of the new node is greater than the maximum height of the current skip list node
    if (level > zsl->level) {
        for (i = zsl->level; i < level; i++) {
            // There is only one node at the current level, so the sum of nodes at this level is 0
            rank[i] = 0;
            // The precursor node of the node inserted by the taller layer is the head node
            update[i] = zsl->header;
            // Initializes the span of the layer's head nodes
            update[i]->level[i].span = zsl->length;
        }
        // Update the highest level of the skip list
        zsl->level = level;
    }
    // Create a skip list node
    x = zslCreateNode(level,score,ele);
    for (i = 0; i < level; i++) {
        // Insert a new node in the current layer (similar to insert a node in a linked list)
        x->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = x;

        /* update span covered by update[i] as x is inserted here */
        // The span of each layer of the new node needs to be calculated
        // The span of each layer is the rank of subsequent nodes minus the rank of new nodes
        Update [I]->level[I].span + rank[I] + 1
        // Rank [0] + 1
        x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
        // The span of the precursor nodes of each layer of the new node needs to be calculated
        // The span of the precursor node is: the rank of the new node minus the rank of the precursor node
        // Rank [0] + 1
        // rank[I]
        update[i]->level[i].span = (rank[0] - rank[i]) + 1;
    }

    /* increment span for untouched levels */
    for (i = level; i < zsl->level; i++) {
        // At the higher layer, the new node is preceded by the head node. The span of the head node is the length from the tail node
        update[i]->level[i].span++;
    }

    // Set the back node
    // If the new node is preceded by a head node, the back node is empty
    x->backward = (update[0] == zsl->header) ? NULL : update[0];
    // If a new node is not a tail node, set the back pointer of its next node to point to the new node
    // The backward node can only make the reverse traversal backward by one bit, so only the first level is judged
    if (x->level[0].forward)
        x->level[0].forward->backward = x;
    else
        zsl->tail = x;
    // The number of skip list nodes increases
    zsl->length++;
    return x;
}
Copy the code

6.4 Deleting a Node from a Skip Table

/* Internal function used by zslDelete, zslDeleteRangeByScore and * zslDeleteRangeByRank. */
void zslDeleteNode(zskiplist *zsl, zskiplistNode *x, zskiplistNode **update) {
    int i;
    for (i = 0; i < zsl->level; i++) {
        if (update[i]->level[i].forward == x) {
            // After node x is deleted, the span of the current layer X node's precursor needs to be modified
            // The span of the precursor: the span of the precursor + the span of the x node -1
            update[i]->level[i].span += x->level[i].span - 1;
            update[i]->level[i].forward = x->level[i].forward;
        } else {
            update[i]->level[i].span -= 1; }}if (x->level[0].forward) {
        x->level[0].forward->backward = x->backward;
    } else {
        zsl->tail = x->backward;
    }
    while(zsl->level > 1 && zsl->header->level[zsl->level- 1].forward == NULL)
        zsl->level--;
    zsl->length--;
}

/* Delete an element with matching score/element from the skiplist. * The function returns 1 if the node was found and deleted, otherwise * 0 is returned. * * If 'node' is NULL the deleted node is freed by zslFreeNode(), otherwise * it is not freed (but just unlinked) and *node is set to the node pointer, * so that it is possible for the caller to reuse the node (including the * referenced SDS string at node->ele). */
int zslDelete(zskiplist *zsl, double score, sds ele, zskiplistNode **node) {
    // The update array records the nodes to be deleted at the beginning of each layer
    zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
    int i;

    x = zsl->header;
    for (i = zsl->level- 1; i >= 0; i--) {
        while (x->level[i].forward &&
                (x->level[i].forward->score < score ||
                    (x->level[i].forward->score == score &&
                     sdscmp(x->level[i].forward->ele,ele) < 0)))
        {
            x = x->level[i].forward;
        }
        update[i] = x;
    }
    /* We may have multiple elements with the same score, what we need * is to find the element with both the right score and object. */
    // The same score may correspond to different elements. The node to be deleted must have the same score and element value
    x = x->level[0].forward;
    if (x && score == x->score && sdscmp(x->ele,ele) == 0) {
        //x is the node to be deleted
        zslDeleteNode(zsl, x, update);
        if(! node) zslFreeNode(x);else
            *node = x;
        return 1;
    }
    return 0; /* not found */
}
Copy the code

Interview: Why does Redis use skiplist instead of Red-black?

  1. Skiplist is as complex as a red-black tree and much simpler to implement.
  2. Skiplist has another advantage in concurrent environments. Inserting and deleting red-black trees may require some rebalance operations, which may involve the rest of the tree. Skiplist is more local, with fewer nodes to lock on. So the performance is better in this case.

See The Choose Concurrency-Friendly Data Structures by Herb Sutter for details.

In addition this paper have a more detailed description and comparison, page50 ~ 53: www.cl.cam.ac.uk/research/sr…

Why skiplist The Skip List

There are a few reasons:

  1. They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.
  2. A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
  3. They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don’t think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I’m using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores) , and using the “Redis Cluster” solution that I plan to develop in the future.

Redis SkipList SkipList SkipList SkipList SkipList SkipList SkipList SkipList