Inheritance, is the continuation of happiness; Overload, is the rebirth of happiness.
Arrays and linked lists
An array of
In computer science, an array data structure, or array for short, is a data structure composed of a collection of elements of the same type, allocated a contiguous chunk of memory for storage. The index of an element can be used to calculate the corresponding storage address of the element.
advantages
- Random access is faster (subscription-based access).
- Simple to implement and use.
- Disruptor leverages CPU cache + array address continuity to maximize performance.
disadvantages
- Memory contiguity can be a plus or a minus, and arrays can be greatly limited if memory is tight.
- Inserts and deletions result in slow movement of elements (data copying).
- The array size is fixed, which greatly limits the number of elements and is not friendly to many dynamic data.
The list
Linked list is a common basic data structure. It is a linear list, but it does not store data in a linear order. Instead, it stores Pointers to the next node in each node. Because they do not have to be stored sequentially, linked lists can be O(1) complex when inserted, much faster than the other linear sequential lists, but it takes O(n) time to find a node or access a node with a specific number.
advantages
- Linked list structure can make full use of computer memory space and realize flexible memory dynamic management.
- Delete inserts without moving other elements.
- Element size is not limited, can be expanded at will.
disadvantages
- The advantage of random array reading is lost, and the linked list has a large space overhead due to the increase of node pointer field.
- Random access is less efficient than arrays.
Time complexity analysis
-
Random access arrays: O(1) do not support random access
-
Insert, delete array: ordered array ->O(n), unordered array -> (1) linked list: O(1)
List can be divided into single table and two-way chain table again, but this is not the problem of this article is to discuss, tell me the arrays and linked list here, just do the bedding, because when it comes to jump table below will use the list of relevant knowledge, so the list also not very familiar with classmates, can learn first, and then at the start of today’s text: jump table.
What is a skip watch?
The concept of linked list was mentioned above. The query time complexity of linked list is O(n). Even if it is an ordered linked list, it has to start from the first element of the list to traverse the search. That’s where the skip table comes in.
Skip list: In computer science, a skip list is a data structure. It makes the average time complexity of both lookup and insert operations for ordered sequences containing n elements:
Let’s draw a normal linked list structure
This is a common single-linked list structure, if we want to find the node 58, we have to query 1 -> 4 -> 7 -> 15 -> 20 -> 35 -> 50 -> 58, a total of 8 times to find the result we need, time complexity: Order n, you can imagine the efficiency, so let’s try to optimize it together.
Mysql query efficiency also drops dramatically when the data volume is large. How do they optimize the efficiency problem? That’s an index, and here we can also use something like an index to optimize a regular single linked list.We made a copy of the nodes after every two nodes in the original linked list for making indexes to facilitate data search.
1 -> 7 -> 20 -> 50 -> 58. Let me explain this figure now. Since we have established an index layer, we search the first index layer first. When we search the index layer, we find that 50 and the previous nodes are all less than 58. So you find the down node in 50, and then you go down one more time and you find 58.
Added “index” also less query three times, optimization is not very obvious ah, this optimization is necessary to exist? So here we have a layer of index. What if we have several more layers of index? What would it look like?At this time, we found that we only need 4 times to find the node 58 we need to search, 4 times is half faster than 8 times. I don’t have too much data here, when there are enough data, the effect will be more obvious.
This kind of data structure composed of layer by layer index, we call him: hop table, now believe you have a deeper understanding of what is hop table, this time you may have a question? In order to build the index layer, we will certainly consume a lot of space, which is a way of space for time, but when there is a large amount of data, will there be insufficient space?
Now let’s analyze the time/space complexity of the hop table
Jump table analysis
Assume that the number of original linked list nodes is: N, we know that we are out of every two nodes in a node as a node of index layer, then the first index layer nodes number should be the original list of number of 1/2, which is n / 2, the second index number of the nodes is the first layer of the index number of 1/2, 1/4, original list node number, which is n / 4, The number of nodes in the third layer index is 1/2 of the number of nodes in the second layer, 4/1 of the number of nodes in the first layer, 1/8 of the number of nodes in the original linked list, and so on. The number of nodes in the k layer index is 1/2 of the number of nodes in the K-1 layer index, so the number of nodes in the K layer index is:
A linked list, assuming that we are all assigned an index node every two nodes, the highest index has only two nodes, and the index level is h, will result in the following formula:
Let’s solve for h:
Let’s break this formula down:
So the value of h is:
Since h is the height of the index layer, we calculate the original linked list, so the height of the real hop table is:
Now let’s calculate the time complexity of the hop table query. The hop table has many layers of indexes, and each layer of indexes will be traversed for corresponding times. Assuming that the maximum number of traverses for each index is x, then the number of traverses for a query is:
Therefore, expressed in O notation, the time complexity of skip table is:
Why is the base 2 missing from the logarithm? In fact, in time complexity analysis, the analysis is only a trend, not a fixed value, relative level, the base can be ignored, specific introduction, not to expand the introduction, if you are interested in baidu, it is not complicated, what we need to understand is what is this x on earth?
In the structure diagram of the hop table drawn above, we extract one index node every two nodes, so when we need to query a node in the hop table, we need to traverse the index layer from top to bottom, each layer is not more than 3, why 3 instead of 4, 5, 6? Let’s draw a simple graph to illustrate this.
We need to find 12, and when we get to 9 we find that 12 is greater than 9 and less than 13, so we go down one index, we get to the first index, and there are only three nodes between 9 and 13, even if we go down to the original list, there are only three nodes before the two range nodes, Therefore, the maximum number of nodes to be traversed on each layer can be obtained in this way. Since x is a constant 3, it can be directly omitted, so the final time complexity of the skip table is:
Although the efficiency of the query is greatly improved, there is a very important problem, which is that the hop table takes up extra memory space, because it requires many layers of indexes, so this data structure is not recommended. Now that Redis is officially in use, what are you afraid of? Let’s analyze the space complexity of the hop table, to see whether there is a large memory consumption?
If the length of the original list is n, then the length of the index at the first level is N /2, the length of the index at the second level is N /4, the length of the index at the third level is N /8, and the length of the index at the last level is 2. Obviously, this is a geometric sequence:
Sum formula of geometric sequence:
Insert the formula:
The total size of the index layer is n-2, plus the original linked list n = 2n-2, so the spatial complexity of the hop table is O(n), and the constant can be omitted. This is the spatial complexity of extracting an index node from each two nodes. How many nodes can extract an index node from each node? For example, extract one every three, extract one every five, the calculation is the same as above, resulting in space complexity: O(n), but it does reduce the space a lot.
Now that we’re done with query, what about insert and delete hops? Will they also perform well? Let’s take a look.
Since our linked list is ordered, we have to find the insertion location before insertion. The query time complexity of the hop table is O(logn), and the insertion operation is after finding the insertion location. The insertion time complexity of the single linked list is O(1), which will not be proved here. So the whole insert is going to be the same thing as finding the insert position plus insert is order logn.
So the time to delete is order logn. Why is that? In fact, it is the same as insert, but there will be another operation to find the precursor node, because deletion will cause the pointer change of the precursor node, the latter node exists in the current deleted node, but the former and precursor node does not, and the query is also O(logn), so delete O(logn)+O(logn), so the time complexity of deletion is still: O(logn), but there is a point to note: if the node to be deleted also exists in the index layer, then the node of the index layer needs to be deleted at the same time. The problem of precursor nodes can be solved by using a bidirectional linked list.
Index update
The query, delete, add performance of a hop table is good, but can it remain so good? Since jump table is used to store data, so will certainly be accompanied by frequent addition and deletion, assuming that at a certain interval of adjacent index node is inserted between the more data, then leads to uneven distribution of data, in a certain range query time efficiency is lower, the worst may be degraded into singly linked lists, all of the data in this range, Therefore, when inserting data, we update the index layer in real time to ensure that the data structure of the hop table does not degenerate excessively. Then, how do we maintain the change of the index layer?
Actually not difficult also, jumping by random function to maintain the balance of a data table, which is to keep data is relatively uniform, when we jump to add data in the table, through the random function to generate a random number, that number is index layer, then need to add in this layer node is added to the index layer node, Assume that the data we need to insert is 7, and the random number generated by the random function is: 2, then the insertion will be as follows:
Here, the random function is very demanding, it can ensure the balance of the table, will not degrade the performance of the table, this point is the same as red black tree left/right rotation.
Code implementation
The above are the theoretical basis, we finished the theory, that must be true to the dry, or not white see it, then we have a look at how to achieve the code jump table
` `
package com.lx.cloud.test;
import java.util.Random;
/**
* @ClassName SkipList
* @Description 跳表
*/
public class SkipList {
private static final int MAX_LEVEL = 16;
private int levelCount = 1;
/**
* 带头链表
*/
private Node head = new Node(MAX_LEVEL);
private Random r = new Random();
public Node find(int value) {
Node p = head;
// 从最大层开始查找,找到前一节点,通过--i,移动到下层再开始查找
for (int i = levelCount - 1; i >= 0; --i) {
while (p.forwards[i] != null && p.forwards[i].data < value) {
// 找到前一节点
p = p.forwards[i];
}
}
if (p.forwards[0] != null && p.forwards[0].data == value) {
return p.forwards[0];
} else {
return null;
}
}
/**
* 优化了作者zheng的插入方法
*
* @param value 值
*/
public void insert(int value) {
int level = head.forwards[0] == null ? 1 : randomLevel();
// 每次只增加一层,如果条件满足
if (level > levelCount) {
level = ++levelCount;
}
Node newNode = new Node(level);
newNode.data = value;
Node update[] = new Node[level];
for (int i = 0; i < level; ++i) {
update[i] = head;
}
Node p = head;
// 从最大层开始查找,找到前一节点,通过--i,移动到下层再开始查找
for (int i = levelCount - 1; i >= 0; --i) {
while (p.forwards[i] != null && p.forwards[i].data < value) {
// 找到前一节点
p = p.forwards[i];
}
// levelCount 会 > level,所以加上判断
if (level > i) {
update[i] = p;
}
}
for (int i = 0; i < level; ++i) {
newNode.forwards[i] = update[i].forwards[i];
update[i].forwards[i] = newNode;
}
}
/**
* 优化了作者zheng的插入方法2
*
* @param value 值
*/
public void insert2(int value) {
int level = head.forwards[0] == null ? 1 : randomLevel();
// 每次只增加一层,如果条件满足
if (level > levelCount) {
level = ++levelCount;
}
Node newNode = new Node(level);
newNode.data = value;
Node p = head;
// 从最大层开始查找,找到前一节点,通过--i,移动到下层再开始查找
for (int i = levelCount - 1; i >= 0; --i) {
while (p.forwards[i] != null && p.forwards[i].data < value) {
// 找到前一节点
p = p.forwards[i];
}
// levelCount 会 > level,所以加上判断
if (level > i) {
if (p.forwards[i] == null) {
p.forwards[i] = newNode;
} else {
Node next = p.forwards[i];
p.forwards[i] = newNode;
newNode.forwards[i] = next;
}
}
}
}
/**
* 作者zheng的插入方法,未优化前,优化后参见上面insert()
*
* @param value
* @param level 0 表示随机层数,不为0,表示指定层数,指定层数
* 可以让每次打印结果不变动,这里是为了便于学习理解
*/
public void insert(int value, int level) {
// 随机一个层数
if (level == 0) {
level = randomLevel();
}
// 创建新节点
Node newNode = new Node(level);
newNode.data = value;
// 表示从最大层到低层,都要有节点数据
newNode.maxLevel = level;
// 记录要更新的层数,表示新节点要更新到哪几层
Node update[] = new Node[level];
for (int i = 0; i < level; ++i) {
update[i] = head;
}
/**
*
* 1,说明:层是从下到上的,这里最下层编号是0,最上层编号是15
* 2,这里没有从已有数据最大层(编号最大)开始找,(而是随机层的最大层)导致有些问题。
* 如果数据量为1亿,随机level=1 ,那么插入时间复杂度为O(n)
*/
Node p = head;
for (int i = level - 1; i >= 0; --i) {
while (p.forwards[i] != null && p.forwards[i].data < value) {
p = p.forwards[i];
}
// 这里update[i]表示当前层节点的前一节点,因为要找到前一节点,才好插入数据
update[i] = p;
}
// 将每一层节点和后面节点关联
for (int i = 0; i < level; ++i) {
// 记录当前层节点后面节点指针
newNode.forwards[i] = update[i].forwards[i];
// 前一个节点的指针,指向当前节点
update[i].forwards[i] = newNode;
}
// 更新层高
if (levelCount < level){
levelCount = level;
}
}
public void delete(int value) {
Node[] update = new Node[levelCount];
Node p = head;
for (int i = levelCount - 1; i >= 0; --i) {
while (p.forwards[i] != null && p.forwards[i].data < value) {
p = p.forwards[i];
}
update[i] = p;
}
if (p.forwards[0] != null && p.forwards[0].data == value) {
for (int i = levelCount - 1; i >= 0; --i) {
if (update[i].forwards[i] != null && update[i].forwards[i].data == value) {
update[i].forwards[i] = update[i].forwards[i].forwards[i];
}
}
}
}
/**
* 随机 level 次,如果是奇数层数 +1,防止伪随机
*
* @return
*/
private int randomLevel() {
int level = 1;
for (int i = 1; i < MAX_LEVEL; ++i) {
if (r.nextInt() % 2 == 1) {
level++;
}
}
return level;
}
/**
* 打印每个节点数据和最大层数
*/
public void printAll() {
Node p = head;
while (p.forwards[0] != null) {
System.out.print(p.forwards[0] + " ");
p = p.forwards[0];
}
System.out.println();
}
/**
* 打印所有数据
*/
public void printAll_beautiful() {
Node p = head;
Node[] c = p.forwards;
Node[] d = c;
int maxLevel = c.length;
for (int i = maxLevel - 1; i >= 0; i--) {
do {
System.out.print((d[i] != null ? d[i].data : null) + ":" + i + "-------");
} while (d[i] != null && (d = d[i].forwards)[i] != null);
System.out.println();
d = c;
}
}
/**
* 跳表的节点,每个节点记录了当前节点数据和所在层数数据
*/
public class Node {
private int data = -1;
/**
* 表示当前节点位置的下一个节点所有层的数据,从上层切换到下层,就是数组下标-1,
* forwards[3]表示当前节点在第三层的下一个节点。
*/
private Node forwards[];
/**
* 这个值其实可以不用,看优化insert()
*/
private int maxLevel = 0;
public Node(int level) {
forwards = new Node[level];
}
@Override
public String toString() {
StringBuilder builder = new StringBuilder();
builder.append("{ data: ");
builder.append(data);
builder.append("; levels: ");
builder.append(maxLevel);
builder.append(" }");
return builder.toString();
}
}
public static void main(String[] args) {
SkipList list = new SkipList();
list.insert(1, 3);
list.insert(2, 3);
list.insert(3, 2);
list.insert(4, 4);
list.insert(5, 10);
list.insert(6, 4);
list.insert(8, 5);
list.insert(7, 4);
list.printAll_beautiful();
list.printAll();
SkipList list2 = new SkipList();
list2.insert2(1);
list2.insert2(2);
list2.insert2(6);
list2.insert2(7);
list2.insert2(8);
list2.insert2(3);
list2.insert2(4);
list2.insert2(5);
System.out.println();
list2.printAll_beautiful();
}
}
Copy the code
Print the result
`null:15——- null:14——- null:13——- null:12——- null:11——- null:10——- 5:9——- 5:8——- For 12 — — — — — — — — — — — — — — 5:5 — — — — — — — and — — — — — — — now — — — — — — — and — — — — — — — and — — — — — — — o — — — — — — — and — — — — — — — and — — — — — — – 1:2 2:2 and the things — — — — — — — — — — — — — — and — — — — — — – a soul — — — — — — — o — — — — — — — when — — — — — — — — — — — — — — lo 1:1 2-1 — — — — — — — — — — — — — — 3:1 4:1 — — — — — — — — — — — — — — — — — — — — — 6:1 5-1 — — — — — — — 7:1 — — — — — — — 1 — — — — — — – 1:0——-2:0——-3:0——-4:0——-5:0——-6:0——-7:0——-8:0——- { data: 1; levels: 3 } { data: 2; levels: 3 } { data: 3; levels: 2 } { data: 4; levels: 4 } { data: 5; levels: 10 } { data: 6; levels: 4 } { data: 7; levels: 4 } { data: 8; levels: 5 }
null:15——- null:14——- null:13——- null:12——- null:11——- null:10——- null:9——- null:8——- null:7——- null:6——- null:5——- 5:4——- 3:3——-5:3——- 3:2 4-2 — — — — — — — — — — — — — — — — — — a soul — when — — — — — — — — — — — — — — lo 3-1 2:1 — — — — — — — — — — — — — — — — — — — — — 5-1 4-1 — — — — — — — — — — — — — — country 7:1 — — — — — — — 1 — — — — — — – 1-0 — — — — — — — — — — — — — — 2-0 3-0 — — — — — — — — — — — — — — 4:0 5-0 — — — — — — — lost — — — — — — – away — — — — — — — 8-0 — — — — — — –
`
Code source github, not written by myself: github.com/wangzheng08…
conclusion
- What is a skip watch?
Skip list: In computer science, a skip list is a data structure. It makes the average time complexity of both lookup and insert operations for ordered sequences containing n elements:
- Time/space complexity of a hop table
Time complexity: O(logn), space complexity: O(n), the space complexity will decrease with the increase of the index interval distance, but the trend of space complexity is O(n) unchanged. Because the information stored in the index layer is much less than that in the original linked list, even O(n) space complexity is acceptable.
- The application of skip tables
1. Ordered collection of Redis.
Google’s open source Key/Value storage engine.
3.HBase MemStore internal storage structure
- Dynamic update of the hop table
Due to the jump table is composed of multilayer index, so frequent insert again leads to an adjacent index node before the end of the data too much, the worst nixuefeileam condition, can lead to all the data are concentrated in a certain period between adjacent index nodes, which can lead to jump table degenerated into ordinary chain table, time complexity will be degraded to O (n), Therefore, it is necessary to dynamically update the index nodes in the hop table, which is currently implemented by random functions.
- Why does Redis use hoppers to implement ordered collections?
1. Skip table is a data structure composed of linked list and multi-level index. Through the design idea of space for time, “binary search” based on linked list is realized, and the time complexity of search, delete and insert is O(logn).
2. Compared with B tree, red-black tree, AVL tree, etc., the implementation of hoppers is much simpler. The balance maintenance of these trees is quite troublesome, while the dynamic index update of hoppers is much simpler.
3. Interval lookup of hop tables is better than red-black trees.
My friend, did you quit school?
This article is published by OpenWrite, a blogging tool platform