This article will focus on the tree data structure, related content. In fact, this article should be regarded as a reading note.
Content overview
- The tree
- Binary tree
- Binary search tree
- Heaps and some operations on heaps
- Heap sort
- The application of the heap
In addition, I first give the source address of the JS implementation here:
- Tree and binary tree
- The realization of the heap
- Heap sort
The tree
So let me just tell you what a tree is.
A tree is a nonlinear data structure. Elements in a tree are called “nodes”. Each node has a finite number of children or no children, and there can be no loops in the tree.
The relationship between two connected nodes is called a parent-child relationship.
Some terms (from Wikipedia) :
- Degree of a node: The number of subtrees a node contains is called the degree of the node;
- Tree degree: In a tree, the degree of the largest node is called tree degree;
- 2. A node of degree zero;
- 2. A node of degree not zero;
- Parent node or parent node: If a node has children, the node is called the parent of the children;
- Child node or child node: The root node of the subtree contained by a node is called the child node of that node;
- Sibling node: Nodes with the same parent node are called sibling nodes.
- The hierarchy of nodes is defined from the root, which is at layer 1, its child nodes at layer 2, and so on.
- Depth: For any node N, the depth of n is the length of the unique path from the root to n, and the root depth is 0;
- Height: For any node N, the height of n is the longest path length from n to a leaf, and the height of all leaves is 0;
- Cousin node: Nodes whose parent node is in the same layer are Cousins to each other;
- Ancestor of a node: all nodes on the branch from the root to the node;
- Descendant: Any node in a subtree whose root is a node is called a descendant of that node.
- Forest: the set of m (m>=0) trees that do not intersect each other is called forest;
Binary tree
There are many kinds of trees, such as binary tree, trigtree, quadtree and so on. But the most common tree is a binary tree.
A binary tree is a tree structure in which each node has at most two branches. The nodes of these two branches are called the left child node and the right child node.
A full binary tree is one in which every node has two child nodes in addition to the leaves.
Complete binary tree: a binary tree in which the number of nodes in all layers except the last layer is maximum and the nodes in the last layer are all arranged to the left.
For those of you who think that a perfect binary tree doesn’t look very useful, why is it still to the left, not to the middle? It’s left because one of the ways that binary trees store data is in arrays, and if you use a full binary tree you don’t waste array space.
Binary tree storage
1. Chain storage
Chain storage is a method of recording the parent-child relationship by means of Pointers. It’s kind of like a linked list, in that each node holds its own data and has left and right Pointers to the other two nodes.
Const node = {data: 1, // left: node2, // left: node2, // right: null // null means there is no right node}Copy the code
2. Sequential storage
It’s stored in an array. In order to make the code more readable, we generally choose to waste the array at the zero subscript, that is, the root node at the one subscript. At this time, the subscript relationship between the parent node and the left and right nodes is as follows:
left = 2 * i; right = 2 * i + 1; i = left / 2; i = right / 2; // This is rounding downCopy the code
Where I is the subscript of the parent node, left and right are the subscripts of the two child nodes.
There is one caveat: here the parent node is subtracted by the child node divided by 2 and rounded. (Some programming languages automatically remove the decimal part of the result when dividing integers, while others, such as Javascript, do and need to be rounded down manually.)
If you just don’t want to waste the first element of the array, make sure the root node is stored first in the array. Then the subscript relationship between the parent node and the child node is:
left = 2 * i + 1;
right = 2 * i + 2;
i = (left - 1) / 2;
i = (right - 1) / 2;
Copy the code
If a binary tree is a complete binary tree, array storage is undoubtedly the most memory efficient way.
Traversal of a binary tree
This is a very common interview question.
1. Preorder traversal
Around the root. The “before” here describes the root node, which prints first, then the left subtree, and finally the right subtree.
The tree in the code is stored in a chain store. The code implementation uses recursion.
// Preorder traversal (left and right root)preOrder() {
let order = ' '; r(this.root); order = order.slice(0, order.length - 1); // Delete the last comma.returnorder; // Recursive functionfunction r(node) {
if(! node)return;
order += `${node.val}`; r(node.left); r(node.right); }},Copy the code
2. In order traversal
The left the right. The “middle” of “middle order” also means that the output position of the root node is in the middle. In order traversal first output the left subtree, then output the root node, and finally output the right subtree.
// In order traversalinOrder() {
let order = ' ';
r(this.root);
order = order.slice(0, order.length - 1);
returnorder; // Recursive functionfunction r(node) {
if(! node)return;
r(node.left);
order += `${node.val}`; r(node.right); }},Copy the code
3. Follow-up traversal
Around the root. Print the left subtree first, then the root node, and finally the right subtree.
postOrder() {
let order = ' ';
r(this.root);
order = order.slice(0, order.length - 1);
returnorder; // Recursive functionfunction r(node) {
if(! node)return;
r(node.left);
r(node.right);
order += `${node.val}`; }},Copy the code
4. Hierarchical traversal
Hierarchy traversal is the process of traversing nodes at each level from left to right until all nodes are traversed. If you store it sequentially, you just iterate through the array. If you store the tree chained, the implementation is a little more complicated, and you use a queue
levelOrder() {
if (this.root == null) return ' ';
leta = [], left, right; a.push(this.root); // The node joins the queue, pointing to the header element, if it has left/right, joins the queue. // Move the pointer back and continue the same step... Until the pointer goes to the back of the queue...for(leti = 0; i < a.length; I ++) {left = a[I].left;if (left) a.push(left);
right = a[i].right;
if (right) a.push(right);
}
return a.map(item => item.val).join(', ');
}
Copy the code
Binary search tree
A binary search tree is also called a binary search tree. In addition, it is also called binary sort tree, because in order to walk through the data can be ordered sequence (very efficient, time complexity is O(n)).
The purpose of a binary search tree is to find things quickly. In addition to fast lookup, it also supports fast insertion and deletion of data.
So what kind of binary tree is a binary search tree? A binary search tree is a binary tree in which the nodes of the left subtree of any node are all less than that node and the nodes of the right subtree of any node are all greater than that node.
By definition, binary lookup trees are not allowed to have two nodes with the same data.
A binary search tree lookup operation
First, compare the value of the root node. If it is equal, it is found. If you want to find a value smaller than the root, you recursively look in the left subtree; If it is larger than the root, it recursively searches in the right subtree.
Find (val) {// Assume that the binary tree has no duplicate datalet p = this.root;
while(p ! = null) {if (val == p.val) return p;
else if (val < p.val) p = p.left;
else p = p.right;
}
returnnull; // Not found},Copy the code
Binary lookup tree insertion operation
Similar to the lookup operation. Starting from the root node, compare the data to be inserted with the size of the node. If the data to be inserted is larger than that of the current node, and the right subtree is empty, it is the right subnode of the node. If the right subtree is not empty, the recursion continues to the right subtree. Similarly, if the data is smaller than the current node, look at the left subtree of the node. If it is empty, insert it into the position of the left subnode. If it is not empty, recurse the left subtree.
// Insert (val) {if (this.root == null) {
this.root = node;
return true;
}
let node = new Node(val);
let p = this.root;
while(p ! = null) {if (val < p.val) {
if (p.left == null) {
p.left = node;
returnthis.inOrder(); // Return the result of the middle order traversal and check whether the insert was correct. You can change it totrue} p = p.lapt; }else if (val > p.val) {
// preP = p;
if (p.right == null) {
p.right = node;
return this.inOrder();
}
p = p.right;
}
if (val == p.val) {
console.warn('Binary tree contains data of the same value, cannot be inserted')
return false; }}},Copy the code
Binary search tree deletion operation
This one is very complicated. There are three cases:
- The node to be deleted has no children: update the pointer to it from the parent node to null
- The node to be deleted has only one child: the pointer in the parent to the node to be deleted, updated to the child of the node to be deleted.
- The node to be deleted has two children: find the smallest (typically leaf) node in the right subtree and replace it with the node to be deleted. (Of course, you can choose to find the largest node in the left subtree)
Remove (val) {// Assume that the binary tree has no duplicate datalet p = this.root;
letparent, dir; // For the time being, we will not consider the case where there is only one root node.while(p ! = null) {if(val == p.val) {// The node to be deleted has no childrenif (p.left == null && p.right == null) {
parent[dir] = null;
return true; } // The node to be deleted has only one child nodeelse if(p.left == null && p.right ! = null) { parent[dir] = p.right console.log('Right node only');
return true;
} else if(p.right == null && p.left ! = null) { parent[dir] = p.left; console.log('Left node only');
return true; } // The node to be deleted has two children // You can replace the smallest node in the right subtree with the node to be deleted and delete the smallest node // You can also find the largest node in the left subtree.else if(p.left ! = null && p.right ! = null) {this.findm cannot be used because the parent of the smallest node is to be recordedinStep 1: Find the smallest node minPlet minParent,
minP = p;
while (minP) {
if(minp.left == null) {// Found.break;
// returnminP; } minParent = minP; minP = minP.left; } var = minp.val;} var = minp.val; // Step 3: Delete the smallest nodeif (minP.right == null) {
minParent.left = null;
console.log('Minimum node has no children');
return true;
} else if(minP.right ! = null) { minParent.left = minP.right; console.log('The smallest node is only the right node');
return true; }}returnp; } // Continue looking for the node to delete.else {
parent = p;
if (val < p.val) {
p = p.left;
dir = 'left';
} else {
p = p.right;
dir = 'right'; }}}returnnull; // To save the parent node and record whether the current node is left or right of the parent node. },Copy the code
Another simple deletion operation is to mark a node as “deleted.” Although this becomes simpler, the “deleted” data is still in memory, wasting memory space.
Binary lookup trees that support duplicate data
In general, binary lookup trees are by definition not allowed to have duplicate data. But in real development, the data is generally impossible not to repeat. So let’s look at how binary lookup trees can support duplicate data stores:
- Nodes can store multiple data instead of just one. Consider linked lists and dynamically augmented arrays.
- If duplicate data is found during the insert process, place it on the leftmost node of the right subtree of the node (or consider placing it on the rightmost node of the left subtree). In such an implementation, the find and delete operations are followed by some minor modifications.
Balanced binary tree
A Balanced Binary Tree is a structurally Balanced Binary search Tree in which the absolute value of the leaf height difference is less than 1 and both the left and right subtrees are a Balanced Binary Tree. It can complete insertion, search and delete operations in O(logn). The earliest balanced binary search tree invented is AVL tree.
The invention of balanced binary tree is to solve the problem that the time complexity of binary tree degrades after the dynamic operation such as continuous insertion and deletion. It’s going to keep the binary tree as balanced as possible, short and chubby.
The most famous of the balanced binary trees are the red-black trees. How often do groups of friends joke that when they hire someone, they have to write a red-black tree on the spot? Red-black trees have good performance and are widely used in practical development, while other balanced binary trees rarely appear in people’s view.
The implementation of balanced binary tree is very complicated, so I won’t analyze it in detail.
The heap
A heap is a complete binary tree in which the data of each node is greater than or equal to its children.
A heap where each node’s value is greater than or equal to the value of each node in the subtree is called a large top heap. The heap where each node’s value is less than or equal to the value of each node in the subtree is called the small top heap.
Since the heap is a full binary tree, we store it in an array.
1. Insert an element
You insert elements into the heap, you do that by inserting them at the end of the array, and then by heapification, you resize the tree and make it back into the heap.
Heapify, “heapify,” is when a node is constantly swapped up and down until it finds the right place to make the current binary tree into a heap.
The heap that is done at insertion is the heap that is done from the bottom up. The newly inserted element is at the end and needs to be compared and swapped with the parent until it finds the right position, at which point the tree becomes a heap again.
// Stack up from bottom to top. Insert (val) {// count is the size of the current array, n is the size of the array // (of course js arrays are dynamic arrays). The n here is the limit that I put on manually.)if (this.count >= this.n) {
console.log('It's piled up, don't add it!! ')
return;
}
this.count++;
leta = this.a; A [this.count] = val;leti = this.count, j = Math.floor(i/2); // Store I /2 temporarilywhile (i > 1 && a[i] > a[j]) {
[a[i], a[j]] = [a[j], a[i]];
i = j;
j = Math.floor(i/2);
}
return true;
}
Copy the code
2. Delete the top element of the heap
After the top element is removed, we need to move the last element to the top element and then heapize from top to bottom.
When the last element replaces the top element, the top element is compared to its two children to see which is the largest. If not, the top element is swapped with the child node with the largest value. Repeat the above steps until the current node reaches its maximum or becomes a leaf.
// Remove the top element of the heapremoveMax() {
if (this.count == 0) return false; A [1] = this.a[this.count]; this.a[this.count] = undefined; this.count--; // Heap from top to bottomlet i = 1,
maxPos = i;
while (true) {
if (i * 2 <= this.count && this.a[i*2] > this.a[maxPos]) maxPos = i * 2;
if (i * 2 + 1 <= this.count && this.a[i*2 + 1] > this.a[maxPos]) maxPos = i * 2 + 1;
if (maxPos == i) {
break;
}
[this.a[i], this.a[maxPos]] = [this.a[maxPos], this.a[i]];
i = maxPos;
}
return true;
}
Copy the code
Heap sort
The heapsort algorithm consists of two steps: first build the heap, then sort it.
1. Build the heap
Heaps the array in place. In situ means that you operate on the original array without having to open a separate heap.
There are two ways to build a heap:
- By way of the insert operation mentioned earlier.
In this case, you insert elements at the end of the heap area and then heap them from the bottom up. Similar to insertion sort, we divide the array into “heap region” and “unprocessed region” and continue to insert elements from the “unprocessed region” into the “heap region” until we iterate through the entire array.
- Heapize from top to bottom, starting with the last non-leaf node.
Leaf nodes do not need to be heaped because they have no children. In addition, for a complete binary tree, the last non-leaf node has the subscript I / 2 (I starts at 1, you can draw your own full binary tree to verify).
The complexity of building a heap is order n.
2. The sorting
The top node of the heap is swapped with the last node, and then the remaining n-1 elements are heaped from the top down. And then we swap the top of the heap and the n-1st element, and then we heap the remaining n-2 elements from the top down, and so on and so on, until there’s only one element in the heap.
Performance analysis
1. The time complexity of heap sort is O(nlogn).
It takes O(n) to build the heap, and it takes O(nlogn) to sort, so it takes O(nlogn) to sort the heap.
2. Heap sort is an unstable sort
Because sorting is unstable because the element at the top of the heap is swapped with the last element in the heap.
3. Heap sort is sort in place
Comparison of heap sort and quicksort
Quicksort is better than heapsort. The reasons are as follows:
- Heapsort is not a good way to access data. It is a skipping access to array elements, which is not good for CPU cache.
- Heap sort has more swap operations. When heapsort is built, it reduces the ordered heap of data, which causes more exchange operations.
Code implementation is in the original array for data exchange.
The application of the heap
1. Priority queue
Priority can be used to solve problems such as merging ordered small files and high performance timers.
2. Obtain TopK data
This is maintaining a small top heap of size K. The unheaped element is compared to the top element. If it is larger than the top element, it is heaped. After all the elements are heaped, the heap is a TopK element.
The time complexity is order nlogK. In the worst case, a heap takes order logK, n times of heap.
3. Find the median
If you have static data, you sort it first and then you find the median, so the marginal cost is low.
If the data is dynamic, you need to maintain a large top heap and a small top heap. The number of large top heaps is equal to the number of small top heaps or the number of small top heaps +1, and the elements of the large top heaps are all less than the elements of the small top heaps. The median is the top element of the big top heap.
When initializing, you can use a similar topK algorithm to make a number of k n/2 small top stack array to the right, and then convert the remaining elements into a large top heap.
Then each time you add data, you compare the top element of the big top heap to the top element of the small top heap, and decide which side to insert. After the insertions, some heap insertions and deletions are performed to maintain that the two heaps are roughly the same (with the same number of left and right elements or one more heap on the left than on the right).
In addition to using the heap to calculate the median, we can also use the heap to calculate the “99% response time” problem. That is, maintain a 99/n number of large top heaps and a 1/n number of small top heaps.
reference
- Wikipedia – Trees (data structures)
- The beauty of data structures and algorithms