5000 words detailed solution jump table implementation (challenge the most detailed Chinese network)

Recently, I have been studying the storage architecture of the company’s business. The existing storage uses Redis and LevelDB to do data landing through the middleware written by myself. So write business and data recovery a little trouble, thinking about optimization, to study redis and LevelDB source code. I found that the data structure of the hop table is very interesting, with good performance and relatively simple implementation, so I wanted to use go to achieve a hop table, and through this hop table to achieve a zset function similar to Redis. I’ll go through all the implementation details in as much detail as possible. Looking at this article, first familiarize yourself with related knowledge of linked lists, such as the insertion and deletion of linked lists. The grammar of GO is very simple, which is a little similar to THAT of C. Those who are basic in computer can quickly understand the grammar of GO.

The following points to introduce

What is a skip watch
Advantages and disadvantages of a skip table
The structure of a jumper
How to implement a skip table (add, delete, change, check)

What is a skip watch

The time of search, insert and delete operations is O(logn). The detailed definition of the hop table is very useful data structure, but it is not written in many books. I also did not write the hop table in my data structure textbook in university. Lead to a lot of people are not familiar with the jumper.

A skip list is essentially a linked list, and because linked lists have poor random lookup performance, O(N), lookup elements can only be traversed from the beginning node or the end node.

To find node 6 in this picture, you have to go a little bit to the right from node 1.

Can we use binary search in the linked list, of course it is possible, is to add index to the linked list, also known as a hop table

The image above is a simple hop chart (the various colors and numbers will be explained in more detail later). As can be seen from the figure, the skip list is on the basis of bidirectional linked list, with a multi-layer index to achieve.

Advantages and disadvantages of a skip table

As a quick lookup data structure, skip tables are often compared to red black trees. Let’s list the pros and cons of both

Advantages of a jumper:

Skip tables are relatively simple to implement. The definition of a red-black tree and the operation of left-right rotation, it is really complicated, I am not smart enough to understand it. I’ll explain the simplicity of the implementation later.
Interval lookup is convenient. Once a node is found in the hop table, adjacent elements can be found through the front and back Pointers. Red-black trees need to find through the parent node, child node, relatively troublesome.

Advantages of red-black trees

Small memory footprint, only 3 Pointers are needed (left subtree, right subtree, parent node) and the skip table has a backward pointer, each time with a forward pointer
A red black tree has a very strict definition. Every time data is inserted and deleted, it balances the structure of the tree by turning left and right. A red black tree has a stable search time O(logn)

Compared with regular linked lists, skip lists seem to have no disadvantages except memory

Github skip table simple said can have two kinds of implementation ideas, one is that there can not be repeated elements in the hop table, the other is that there are repeated elements in the hop table. Redis in the jump table is allowed to have repeated elements, I can implement repeated elements.

##### Skip table Node structure

type SkipListLevel struct {
	// point to the next node
	forward *SkipListNode

	/* * The distance to the next node; * Think about how far it is to the next node instead of the previous node */
	span int64
}

type SkipListNode struct {
	// point to the last node
	backward *SkipListNode
	// The index layer
	level []SkipListLevel
	// The stored value
	value ISkipListNode
	// The score used for ranking
	score float64
}
Copy the code

In a node, the black 1 is value, the yellow square is level, the white number is SPAN, the two Pointers backward and forward are not reflected in the figure, there will only be a backward pointer, what is the length of the level array, How many Pointers are going to be forward. Span is the span to the next node, and the adjacent node is 1. This value is used to calculate the node’s rank in the hop table. At first I didn’t understand the value of span, thinking it was the distance from the last node to here. It wasn’t until I was writing and inserting and searching nodes that I realized that this was the span to the next node.

Jump table structure

type SkipList struct {
	// Header and tail
	// The header is a real object, the tail is just a pointer
	head, tail *SkipListNode

	size int64 / / the total number of the node

	level int // Indicates the highest level of the current hop table

	maxLevel int // The current maximum number of layers
}
Copy the code

Size refers to the number of nodes in the table. Level refers to the current maximum height of the table. The height of the table is not fixed, but changes dynamically with insertion and deletion. MaxLevel is the maximum height that a jumper can reach. This value is fixed from the start.

Take the above diagram again and analyze the organizational relationship between the nodes of the entire hop table

There is a head node in a hop table, but there is no tail node. Tail is just a pointer to the last node in the hop table. This is shown in the initialization function

// Initialize a default skip table
func NewDefaultSkipTable(a) *SkipList {
	rand.Seed(time.Now().UnixNano())
	return &SkipList{
		head:     NewSkipListNode(SKIP_TABLE_DEFAULT_MAX_LEVEL, 0.nil),
		size:     0,
		level:    1,
		maxLevel: SKIP_TABLE_DEFAULT_MAX_LEVEL,
	}
}
Copy the code

In the diagram, the Pointers forward, namely the orange arrow, and the Pointers backward, namely the green arrow. A forward pointer is in level, that is, each level has a forward pointer, and a backward pointer only exists in node, that is, each node has only one backward pointer. Why only a backward pointer is needed? Watch the entire search process to see why you only need a backward pointer.

Insert the node

The basic knowledge is almost ready, start to jump into the table insert logic, first through a diagram to show the logic

The first step in inserting an element is to find the element. Because it’s an ordered list, the elements are arranged in order, so find where the elements should be before you insert them.

In the figure, the element with score of 3 is inserted into the hop table. The numbered arrow (pointer) is the search order, where the black arrow is the actual path determined. During the search, the rank should be recorded, which is the sum of the span that passes through all nodes. So let’s figure out what node we’re searching for. We want to find the largest number less than 3, which is represented in the diagram, so we want to find the position of element 2.

Determine the highest level value of the current hop table before searching. You can start at the top without thinking, but that doesn’t make sense. The highest level in the figure is 4, which is Level3 (because the Level array starts at 0). Then start at level level (Level3) of the head node.

Let’s set a temporary pointer t to head. First, the pointer 1 in the figure points to element 4, and the node to be inserted is 3, which is obviously greater than 3, so it does not conform. The point is that when the current node and the next node in the current layer don’t meet the criteria, you need to start searching for the next layer and that’s kind of a transition condition. The current node is still in head, so start at Level2 of head and search to the right. The next node of node is 1, 1 is less than 3, so the condition is met. The t pointer points to 1 node.

At this point, the next node to node2 is greater than 3, which is also the last layer, so node2 is the maximum element less than 3, which is the element to be sought.

Finding the right location, the next step is to insert Node 3 and adjust the span.

After determining the insertion location, determine the level height of the node3 element. Ideally, the node in the middle is the highest, similar to a mountain shape, the left side of the mountain, and then locate the middle node, which is the second highest, and so on. In reality, however, this ideal state is not strictly enforced by the hop table. The node height is determined by a random number, you read that correctly. This is why jumpers are easier to implement than red-black trees.

This is a random function

// The probability of adding an index to the table
var SKIPLIST_P = 0.25

// The number of random index levels
func (list *SkipList) randLevel(a) int {
	level := 1
	for (rand.Uint32()&0xFFFF) < uint32(0xFFFF*SKIPLIST_P) && level < list.maxLevel {
		level++
	}
	return level
}

Copy the code

func (list *SkipList) randLevel(a) int {
	level := 1
	for rand.Int31n(100) < 25 && level < list.maxLevel {
		level++
	}
	return level
}

Copy the code

The above two functions have the same function: they both have a 25% chance of increasing the number of node levels by 1.

Here is the entire insert node source code

// Insert a node
func (list *SkipList) InsertByScore(score float64, value ISkipListNode) *SkipListNode {
	rank := make([]int64, list.maxLevel)
	update := make([]*SkipListNode, list.maxLevel)
	t := list.head
    / / search node
	for i := list.level - 1; i >= 0; i-- {
		if i == list.level- 1 {
			rank[i] = 0
		} else {
			rank[i] = rank[i+1]}/ / the current layer to the next node is && (the next node score < score | | phase when the score at the same time, compare these two nodes, the next node < the newly inserted node)
		fort.Next(i) ! =nil && (t.Next(i).score < score || (t.Next(i).score == value.Score() && t.Next(i).value.Compare(value) < 0)) {
			rank[i] += t.level[i].span
			t = t.Next(i)
		}
		update[i] = t
	}

	level := list.randLevel()

	if level > list.level {
		// Handle rand level, level> current level
		for i := list.level; i < level; i++ {
			rank[i] = 0
			update[i] = list.head
			update[i].SetSpan(i, list.size)
		}
		list.level = level
	}
	newNode := NewSkipListNode(level, score, value)
    // Insert a new node
	for i := 0; i < level; i++ {
		newNode.SetNext(i, update[i].Next(i))
		update[i].SetNext(i, newNode)

		newNode.SetSpan(i, update[i].Span(i)-(rank[0]-rank[i]))
		update[i].SetSpan(i, rank[0]-rank[i]+1)}// Process the span of the new node
	for i := level; i < list.level; i++ {
		update[i].level[i].span++
	}
	// Handle the back pointer to the new node
	if update[0] == list.head {
		newNode.backward = nil
	} else {
		newNode.backward = update[0]}// Determine whether the newly inserted node is the last node
	if newNode.Next(0) != nil {
		newNode.Next(0).backward = newNode
	} else {
		// If it is the last node, let the tail pointer point to the newly inserted node
		list.tail = newNode
	}
	list.size++
	return newNode
}
Copy the code

In the search process, you need two auxiliary structures, a Rank array and an UPDATE array. Rank is used to record the rank value of each tier, which can be used later to adjust the rank of the new node. Update is used to record the path from the top down, which is the last node in the hop table for each layer of the new node.

        / / the current layer to the next node is && (the next node score < score | | phase when the score at the same time, compare these two nodes, the next node < the newly inserted node)
        fort.Next(i) ! =nil && (t.Next(i).score < score || (t.Next(i).score == value.Score() && t.Next(i).value.Compare(value) < 0)) {
            rank[i] += t.level[i].span
            t = t.Next(i)
        }
Copy the code

Above is the logic that determines whether to move to the right or to the next level during the search. In this case, it is not only necessary to judge the SOcore of nodes, but also to consider the situation that the score of two nodes is the same. If the socRE of two nodes is the same, the size of two nodes should be judged by Compare function.

    level := list.randLevel()
    if level > list.level {
        // Handle rand level, level> current level
        for i := list.level; i < level; i++ {
            rank[i] = 0
            update[i] = list.head
            update[i].SetSpan(i, list.size)
        }
        list.level = level
    }
Copy the code

This code handles the new layer after determining the height of the new node. Because the current higher level head was not placed in the update during the search, it is now placed in the update.

    // Insert a new node
    for i := 0; i < level; i++ {
        newNode.SetNext(i, update[i].Next(i))
        update[i].SetNext(i, newNode)

        newNode.SetSpan(i, update[i].Span(i)-(rank[0]-rank[i]))
        update[i].SetSpan(i, rank[0]-rank[i]+1)}Copy the code

This code adds a new node to the hop table by adjusting its front and back Pointers. And adjust the span value of node. In this case, the span value is the distance from the next node, not the distance from the previous node to the next node, so you can change the span value of the current node. If you store the distance from the previous node to this node, you need to change the span value of the next node. It’s gonna be a problem if you change it. Say may be a little wordy, oneself derivation once can understand come out.

    // Process the span of the new node
    for i := level; i < list.level; i++ {
        update[i].level[i].span++
    }
Copy the code

If the level of the new node is less than the maximum level in the hop table (if the level of the new node is 2, the maximum level in the hop table is 5), span all nodes above 2 +1.

What remains is a simple case of dealing with a backward pointer for the new node and determining if it is the last node.

By now, inserting node is complete. In fact, the most complex hop table is the insert process, the rest of the delete, update, search is similar logic, first find the key node, and then do processing.

Delete the node

Similarly, to delete a node, search first, find the node before the node to be deleted through the path of the black arrow, then modify the pointer of the node before the target node, skip the target node, complete the deletion, and finally adjust the span value of the node before the target node.

First search node and record into the UPDATE array

// Get the path of each layer of the node.
func (list *SkipList) GetUpdateList(node *SkipListNode) (update []*SkipListNode) {
	update = make([]*SkipListNode, list.maxLevel)
	t := list.head
	for i := list.level - 1; i >= 0; i-- {
		fort.Next(i) ! =nil && (t.Next(i).score < node.score || (t.Next(i).score == node.score && t.Next(i).value.Compare(node.value) < 0)) {
			t = t.Next(i)
		}
		update[i] = t
	}
	return
}
Copy the code

This logic is the same as the search logic for insertion

Here is the deletion logic

// Delete the corresponding node
func (list *SkipList) Delete(node *SkipListNode, update []*SkipListNode) {
	if node == nil {
		return
	}
	/ / the head can't delete
	if node == list.head {
		return
	}

	for i := 0; i < list.level; i++ {
		if update[i].Next(i) == node {
			/ / modify span
			update[i].SetSpan(i, update[i].Span(i)+node.Span(i)- 1)
			// Delete the corresponding node
			update[i].SetNext(i, node.Next(i))
		} else {
			update[i].level[i].span--
		}
	}

	// Handle the back pointer to node
	if node.Next(0) = =nil { // Node is the last node, and the tail pointer points to the last node (update[0]).
		list.tail = update[0]}else { // Node is not the last node, the next node points to the previous node (update[0])
		node.Next(0).backward = update[0]}// Delete the highest level, the current level should be corresponding to --
	for list.level > 1 && list.head.Next(list.level- 1) = =nil {
		list.level--
	}

	list.size--
}
Copy the code

The code for deleting node is much simpler, passing in the path from the highest to the node, the update array.

    for i := 0; i < list.level; i++ {
        if update[i].Next(i) == node {
            / / modify span
            update[i].SetSpan(i, update[i].Span(i)+node.Span(i)- 1)
            // Delete the corresponding node
            update[i].SetNext(i, node.Next(i))
        } else {
            update[i].level[i].span--
        }
    }
Copy the code

This section of logic is used to delete the node, namely, delete the target node in front of the back pointer to the node behind the target node. All that is left is to process the back pointer to node and determine if it is the last node.

Update the node

// Update node score
func (list *SkipList) UpdateScore(node *SkipListNode, score float64) {
	if score == node.score {
		return
	}
	// After the update, the score is still < next node position unchanged
	if score > node.score {
		if node.Next(0) != nil && score < node.Next(0).score {
			node.score = score
			return}}// After the update, the score is still > per node position unchanged
	if score < node.score {
		ifnode.Pre() ! =nil && score > node.Pre().score {
			node.score = score
			return}}// Delete node and insert it again
	updateList := list.GetUpdateList(node)
	list.Delete(node, updateList)
	// Re-insert
	list.InsertByScore(score, node.value)
}
Copy the code

Update node score. The premise of updating node is to find the target node. The principle is the same as the previous search logic for insert and delete. To update a node score, delete the node and insert it again. Of course, this logic can be optimized, for example, in one case

Node4 score = 5 node4 score = 5 Node4 score = 5 Node4 score = 5

In cases where changing the score changes the position, you need to delete the node and then reinsert it.

##### Find node Find node is very simple, because the previous insert, update, delete all need to find node. But the search is divided into two cases, one is based on ranking search, the other is based on score search.

First look at the search according to the ranking

// Find nodes based on the ranking range
func (list *SkipList) GetNodeByRank(left, right int64) (result []*SkipListNode) {
	// Scope error
	if list.Size() == 0 || left == 0 || right == 0 || right < left || left > list.Size() {
		return
	}
	tRank := int64(0)
	t := list.head
	result = make([]*SkipListNode, 0, right-left+1)
	// First find the least ranked element, then search a little to the right until you find the most ranked element
	for i := list.level - 1; i >= 0; i-- {
		fort.Next(i) ! =nil && tRank+t.level[i].span <= left {
			tRank += t.level[i].span
			t = t.Next(i)
		}
		if tRank == left {
			for; t ! =nil && tRank <= right; t = t.Next(0) {
				result = append(result, t)
				tRank++
			}
			return}}return
}
Copy the code

First exclude non-conforming situations, and then according to the previous search logic, step by step to find. The difference is that the restrictions now apply when ranking.

Look up according to score


// Determine whether the maximum and minimum values of the hop table contain the score range to be queried
func (list *SkipList) ScoreInRange(findRange *SkipListFindRange) bool {
	if! findRange.MaxInf && list.head.Next(0).score > findRange.Max {
		return false
	}
	if! findRange.MinInf && list.tail.score < findRange.Min {return false
	}
	return true
}

// Find node based on score range
func (list *SkipList) GetNodeByScore(findRange *SkipListFindRange) (result []*SkipListNode) {
	if findRange == nil || list.Size() == 0 {
		return
	}
	// Return if the search scope is not in the table
	if! list.ScoreInRange(findRange) {return
	}
	t := list.head
	if findRange.MinInf {
		// Start from scratch
		t = list.head.Next(0)}else {
		// Instead of starting from scratch, find the smallest element
		for i := list.level - 1; i >= 0; i-- {
			fort.Next(i) ! =nil && t.Next(i).score < findRange.Min {
				t = t.Next(i)
			}
		}
	}
	for {
		/ / can meet the requirements of range (from minus infinity | | current score > = to find the minimum) && (to is infinitely | | current element < = looking for maximum)
		if (findRange.MinInf || t.score >= findRange.Min) && (findRange.MaxInf || t.score <= findRange.Max) {
			result = append(result, t)
		}
		if t.Next(0) = =nil| | (! findRange.MaxInf && t.Next(0).score > findRange.Max) {
			/ / the next element is empty (to finish) | | (not find are infinite && the next element score > to find the maximum)
			break
		} else {
			// Move to the right
			t = t.Next(0)}}return
}
Copy the code

It’s a little bit more complicated to look up by score, because by score you can have both positive infinity and negative infinity. It is no longer enough to give two simple range values; a structure is needed to describe the range.

// Find the condition of the element according to scores
type SkipListFindRange struct {
	Min, Max       float64 // Maximum and minimum values
	MinInf, MaxInf bool    // whether it is plus infinity and minus infinity
}
Copy the code

Check whether the given range is within the range of the hop table before lookup. If not, skip the search.

    t := list.head
    if findRange.MinInf {
        // Start from scratch
        t = list.head.Next(0)}else {
        // Instead of starting from scratch, find the smallest element
        for i := list.level - 1; i >= 0; i-- {
            fort.Next(i) ! =nil && t.Next(i).score < findRange.Min {
                t = t.Next(i)
            }
        }
    }
Copy the code

This logic is to determine the starting position of the search, that is, to find the target node (find the smallest node according to the range). If you start at negative infinity, you don’t need to identify the starting node in the previous way, just start at head. The logic behind nothing special, find the target lane, began to look back, until the last or do not conform to the scope, the end.

Well, the search is done.

The basic function of the hop table is now complete, the only thing left is to wrap the hop table layer, to implement the function of Zset in Redis. I will not post the specific code, and FINALLY I will give the github connection.

conclusion

A hopelist can be regarded as an ordered bidirectional linked list that supports binary lookup
At the heart of a hop table is searching, whether you’re inserting, updating, deleting, or searching
The hop table determines the intermediate node number by random number when inserting node
The node of the hop table has a backward pointer and a forward pointer at each level
The advantage of a skip table over a red-black tree is that it is relatively easy to implement and easy to find a range

Finally, the application of the common skip table. Redis zset is based on the hop table, of course, I also read redis source code, learning the hop table. Leveldb, developed by Google, is also based on hoptables.

Finally put all the source code, Github Gitee

If this article is helpful to you, please give me a star. Thank you

5000 words detailed solution jump table implementation (challenge the most detailed Chinese network)

What is a skip watch

Advantages and disadvantages of a skip table

Jump table structure

Insert the node

Delete the node

Update the node

conclusion

Related Posts

HBase analysis and service practices

How architects design highly concurrent architectures

Tell you about guangzhou rent in Python