Trees

Common tree terminology

The lines connecting adjacent nodes represent what we call the parent-child relationship.

Node A is the parent of node B, and node B is a child of node A. Nodes B, C, and D share the same parent, so they are called sibling nodes. A node without a parent is called the root node, which is node E in the figure. Nodes without children are called leaf nodes (or terminal nodes); for example, G, H, I, J, K, and L in the figure are leaf nodes.

The number of subtrees a node has is called the degree of the node. Nodes with degree 0 are called leaf nodes; nodes with nonzero degree are called branch nodes. Branch nodes other than the root node are also called internal nodes. The degree of a tree is the maximum degree among all of its nodes.

Height, depth, number of layers

Height of a node = the number of edges on the longest path from the node down to a leaf node

Depth of a node = the number of edges on the path from the root node to the node

Level of a node = depth of the node + 1

Height of a tree = height of its root node

Binary tree

A binary tree is a finite set of n (n >= 0) nodes. It is either the empty set (called an empty binary tree) or consists of a root node and two disjoint binary trees, called the left subtree and the right subtree of the root.

Full binary tree: all leaf nodes are on the bottom level, and every non-leaf node has both a left and a right child. This kind of binary tree is called a full binary tree.

Complete binary tree: all leaves are on the bottom two levels, the leaves on the last level are packed to the left, and every level except the last holds the maximum number of nodes.

Storage structures for binary trees

Chained storage structure based on linked nodes

Each node has three fields: one stores the data, and the other two are pointers to the left and right child nodes. As long as we hold on to the root node, we can reach the entire tree by following the left and right child pointers.
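A minimal sketch of such a three-field node; the field names (`data`, `left`, `right`) are illustrative, not from the original text:

```python
class TreeNode:
    def __init__(self, data):
        self.data = data      # the stored value
        self.left = None      # pointer to the left child node
        self.right = None     # pointer to the right child node

# Holding the root is enough to reach every node through the child pointers.
root = TreeNode("E")
root.left = TreeNode("B")
root.right = TreeNode("C")
```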

Array-based sequential storage structure

If node X is stored at index i in the array, then its left child is stored at index 2 * i and its right child at index 2 * i + 1. Conversely, the node at index i / 2 (integer division) is its parent. This way, as long as we know where the root node is stored (usually at index 1, which makes computing child indices convenient), we can reach the whole tree through index arithmetic.
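The index arithmetic can be sketched as follows, with the root at index 1 as the text suggests (index 0 is left unused); the example tree is hypothetical:

```python
def left(i):   return 2 * i        # index of the left child
def right(i):  return 2 * i + 1    # index of the right child
def parent(i): return i // 2       # index of the parent (integer division)

# A complete binary tree stored as an array; tree[1] is the root.
tree = [None, 10, 7, 15, 5, 8, 12, 20]
assert tree[left(1)] == 7 and tree[right(1)] == 15
assert tree[parent(4)] == 7
```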

However, the example I just gave is a complete binary tree, so only the single storage slot at index 0 is “wasted”. If the tree is not a complete binary tree, a lot of array space is actually wasted. You can see my example below.

So, **if a binary tree is a complete binary tree, an array is undoubtedly the most memory-efficient way to store it.** Arrays do not need to store the extra pointers to the left and right child nodes that chained storage requires. That is why complete binary trees are singled out as their own category, and why they require the nodes of the last level to be packed to the left.

A heap is a complete binary tree, and the most common way to store one is an array.

Traversal of binary trees

Pre-order traversal

For any node in the tree, print that node first, then its left subtree, and finally its right subtree.

In-order traversal

For any node in the tree, print its left subtree first, then itself, and finally its right subtree.

Post-order traversal

For any node in the tree, print its left subtree first, then its right subtree, and finally the node itself.

Conclusion

**In fact, pre-order, in-order, and post-order traversal of a binary tree are all recursive processes.** For example, pre-order traversal is essentially printing the root node, then recursively printing the left subtree, and finally recursively printing the right subtree.
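The three recursive traversals can be sketched as below; “printing” is replaced by collecting values into a list so the orders are easy to compare (names are illustrative):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def pre_order(node, out):
    if node is None:
        return
    out.append(node.data)          # root first
    pre_order(node.left, out)
    pre_order(node.right, out)

def in_order(node, out):
    if node is None:
        return
    in_order(node.left, out)
    out.append(node.data)          # root in the middle
    in_order(node.right, out)

def post_order(node, out):
    if node is None:
        return
    post_order(node.left, out)
    post_order(node.right, out)
    out.append(node.data)          # root last
```

For the tree `Node(2, Node(1), Node(3))`, pre-order yields `[2, 1, 3]`, in-order `[1, 2, 3]`, and post-order `[1, 3, 2]`.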

From the recursion pattern of pre-order, in-order, and post-order traversal, each node is visited at most twice, so the traversal time is proportional to the number of nodes n; in other words, the time complexity of binary tree traversal is O(n).

Binary search tree

The most important feature of a binary search tree is that it supports fast insertion, deletion, and lookup on a dynamic data set.

A binary search tree requires that, at any node in the tree, the value of every node in its left subtree is less than the node’s value, and the value of every node in its right subtree is greater than the node’s value.

Search

We start from the root node. If its value equals the one we are looking for, we return it. If the value we are looking for is smaller than the root node’s value, we recursively search the left subtree; if it is larger, we recursively search the right subtree.
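An iterative sketch of this search, assuming nodes with `data`/`left`/`right` fields (the example tree is hypothetical):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def find(node, value):
    while node is not None:
        if value == node.data:
            return node                 # found it
        # smaller values live in the left subtree, larger in the right
        node = node.left if value < node.data else node.right
    return None                         # not in the tree

root = Node(8, Node(3), Node(10))
```

`find(root, 10)` returns the node holding 10, while `find(root, 7)` returns `None`.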

Insert

Starting at the root node, we compare the data to be inserted with the node’s value. If the data to be inserted is larger than the node’s value and the node’s right subtree is empty, we place the new data at the right child position; if the right subtree is not empty, we recurse into it to find the insertion point. Similarly, if the data to be inserted is smaller than the node’s value and the node’s left subtree is empty, we place the new data at the left child position; otherwise we recurse into the left subtree to find the insertion point.
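A minimal iterative sketch of this insertion, assuming distinct keys and `data`/`left`/`right` fields (names are illustrative):

```python
class Node:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def insert(root, value):
    if root is None:
        return Node(value)          # empty tree: the new node becomes the root
    node = root
    while True:
        if value < node.data:       # smaller values go into the left subtree
            if node.left is None:
                node.left = Node(value)
                return root
            node = node.left
        else:                       # larger values go into the right subtree
            if node.right is None:
                node.right = Node(value)
                return root
            node = node.right
```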

Delete

Depending on how many children the node to be deleted has, there are three cases to handle.

  • In the first case, if the node to be deleted has no children, we simply set the pointer to the node to be deleted to NULL in the parent node. For example, delete node 55 in the figure.

  • In the second case, if the node to be deleted has only one child (either the left child or the right child), we simply update the pointer in the parent node to point to the child of the node to be deleted. For example, delete node 13 in the figure.

  • The third case is more complicated: the node to be deleted has two children. We find the smallest node in its right subtree and replace the node to be deleted with it, then delete that smallest node. Since the smallest node definitely has no left child (if it did, it would not be the smallest), deleting it falls under one of the first two cases. For example, delete node 18 in the figure.
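The three cases can be sketched in one iterative routine; this is a minimal version that tracks the parent pointer explicitly, with illustrative names:

```python
class Node:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def delete(root, value):
    p, pp = root, None                     # p: node to delete, pp: its parent
    while p is not None and p.data != value:
        pp = p
        p = p.left if value < p.data else p.right
    if p is None:
        return root                        # value not in the tree

    # Case 3: two children -> swap in the smallest node of the right subtree
    if p.left is not None and p.right is not None:
        min_p, min_pp = p.right, p
        while min_p.left is not None:
            min_pp, min_p = min_p, min_p.left
        p.data = min_p.data                # copy the successor's value up
        p, pp = min_p, min_pp              # then delete the successor instead

    # Cases 1 and 2: p now has at most one child (possibly none)
    child = p.left if p.left is not None else p.right
    if pp is None:
        root = child                       # deleting the root itself
    elif pp.left is p:
        pp.left = child
    else:
        pp.right = child
    return root
```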

In fact, there is a very simple, tricky way to implement deletion in a binary search tree: simply mark the node as “deleted” without actually removing it from the tree. The deleted nodes still occupy memory, which wastes space, but the delete operation becomes much easier. Moreover, this approach does not make the insert and search code any harder to implement.

Other operations

Besides insert, delete, and search, a binary search tree also supports quickly finding the maximum node, the minimum node, and the predecessor and successor of a given node.

**In-order traversal of a binary search tree outputs an ordered data sequence with O(n) time complexity, which is very efficient.** For this reason, binary search trees are also called binary sort trees.

Binary search tree that supports repeated data

All the binary search tree operations we have discussed so far assume there are no duplicate keys. What if two objects are stored with the same key? Here are two solutions.

  • The first way is easier. We can use a linked list or a dynamically resizable array to store all data with the same key, and have the node of the binary search tree store a pointer to that list or array.
  • The second method is harder to understand, but more elegant.
    • Insert: each node still stores only one data item. While searching for the insertion position, if we encounter a node whose value equals the data to be inserted, we put the new data into that node’s right subtree; that is, we treat the newly inserted data as if it were greater than the node’s value.
    • Search: when looking up a value, we do not stop at the first node with a matching value. Instead, we keep searching in its right subtree until we reach a leaf node. This finds every node whose key equals the value we are looking for.
    • Delete: first find each node to be deleted, then delete them one by one using the deletion procedure described earlier.
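The second method can be sketched as follows: equal keys are routed into the right subtree on insert, and search keeps walking right past matches to collect them all (names are illustrative):

```python
class Node:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def insert(root, value):
    if root is None:
        return Node(value)
    node = root
    while True:
        if value < node.data:
            if node.left is None:
                node.left = Node(value)
                return root
            node = node.left
        else:                           # equal values are treated as "greater"
            if node.right is None:
                node.right = Node(value)
                return root
            node = node.right

def find_all(root, value):
    hits, node = [], root
    while node is not None:
        if value == node.data:
            hits.append(node)           # keep going: duplicates live to the right
            node = node.right
        elif value < node.data:
            node = node.left
        else:
            node = node.right
    return hits
```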

Time complexity analysis of binary search tree

In a binary search tree, the time complexity of search, insert, delete, and many other operations is proportional to the height of the tree. The two extreme cases are O(n) and O(log n), corresponding to the tree degenerating into a linked list and to a complete binary tree, respectively.

In a binary search tree, whether the operation is insert, delete, or search, the time complexity is proportional to the height of the tree, i.e., O(height). So the question becomes: what is the height of a complete binary tree with n nodes?

The height of a tree equals its maximum level minus one, and levels are easier to count. As the figure shows, in a complete binary tree with n nodes, the first level contains 1 node, the second level 2 nodes, the third level 4 nodes, and so on: each level has twice as many nodes as the one above it, so level k contains 2^(k-1) nodes.

However, for a complete binary tree, the last level is a bit different: it contains between 1 and 2^(L-1) nodes (where L is the maximum number of levels). Adding up the nodes on every level gives the total node count, so if there are n nodes, n satisfies the following relation:

n >= 1 + 2 + 4 + 8 + ... + 2^(L-2) + 1
n <= 1 + 2 + 4 + 8 + ... + 2^(L-2) + 2^(L-1)

Using the formula for the sum of a geometric series, we can work out that L lies in the range [log2(n+1), log2 n + 1]. So the number of levels of a complete binary tree is at most log2 n + 1, which means its height is at most log2 n, and therefore the time complexity of operations on a complete binary tree is O(log n).
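The bound on L follows directly from summing the geometric series in the relation above; a worked version of that step, using the same n and L:

```latex
% The full levels 1..L-1 form a geometric series; bound the last level by 1 and 2^{L-1}:
2^{L-1} \;=\; 1 + 2 + \dots + 2^{L-2} + 1 \;\le\; n \;\le\; 1 + 2 + \dots + 2^{L-2} + 2^{L-1} \;=\; 2^{L} - 1
% Taking logarithms of each bound:
n \ge 2^{L-1} \;\Rightarrow\; L \le \log_2 n + 1, \qquad
n \le 2^{L} - 1 \;\Rightarrow\; L \ge \log_2 (n+1)
```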

Why do we need binary search trees when we have efficient hash tables?

The insert, delete, and search operations of a hash table can run in constant O(1) time, which is very efficient. A balanced binary search tree takes O(log n) for the same operations; compared with a hash table, it seems to have no advantage, so why do we still use binary search trees?

I think there are several reasons:

  1. The data in a hash table is stored unordered; to output ordered data, you must sort it first. With a binary search tree, an in-order traversal outputs an ordered sequence in O(n) time.
  2. Resizing a hash table is time-consuming, and its performance becomes unstable when hash collisions occur. Although an ordinary binary search tree’s performance is also unstable, in engineering practice the most commonly used variant, the balanced binary search tree, performs very stably, with time complexity steady at O(log n).
  3. In general, although hash table lookups take constant time, that constant is not necessarily smaller than log n because of hash collisions, so actual lookups are not necessarily faster than O(log n). Add in the cost of computing the hash function, and a hash table is not necessarily more efficient than a balanced binary search tree.
  4. A hash table is more complex to build than a binary search tree; there is a lot to consider, such as hash function design, collision resolution, and growing and shrinking the table. A balanced binary search tree only has to solve the balancing problem, and the solutions to that problem are mature and well established.
  5. To avoid too many hash collisions, the load factor of a hash table should not be too large, especially if the table uses open addressing; this means some storage space is wasted.

Taken together, balanced binary search trees are superior to hash tables in some respects, so the two can coexist. In actual development, we choose between them based on the specific requirements.

Balanced binary search tree

The strict definition of a balanced binary tree: for any node in the tree, the height difference between its left and right subtrees is at most 1.

A balanced binary search tree must satisfy both this balance definition and the properties of a binary search tree. However, many balanced binary search trees do not strictly meet the definition above; red-black trees are one example.

We study data structures and algorithms in order to apply them in real development, so there is no need to be rigid about the definition. Balanced binary search trees were invented to solve the problem of an ordinary binary search tree’s time complexity degrading under frequent dynamic updates such as insertions and deletions.

So the meaning of “balance” in a balanced binary search tree is that the whole tree looks “symmetric” and “balanced”: no subtree is much taller than its sibling. This keeps the height of the whole tree relatively low, which makes insert, delete, and search operations more efficient.

Red-black trees

How do you define a red-black tree?

There are many kinds of balanced binary search trees, but when people talk about them, they almost always mean red-black trees; sometimes “balanced binary search tree” is even used as a synonym for red-black tree. So what exactly is a red-black tree?

A red-black tree (R-B tree for short) is a self-balancing binary search tree whose nodes are colored red or black. It must satisfy the following properties:

  • Each node is either black or red.

  • The root node is black

  • Every leaf node is a black, empty (NIL) node; that is, the leaf nodes store no data

  • Both children of every red node must be black

  • Every path from any node down to each of its leaf nodes contains the same number of black nodes

The requirement that “all leaves are black empty nodes” looks a bit strange; it exists mainly to simplify the code that implements red-black trees. We will set it aside for now, so in the figures and explanations that follow I omit the black empty leaves.

Why are red-black trees approximately balanced?

Red-black trees guarantee that the longest path is no more than twice as long as the shortest path, so they are approximately balanced.

The original purpose of balancing a binary search tree is to prevent the performance degradation caused by dynamic updates. So “balanced” can be understood as: performance does not degrade; “approximately balanced” as: performance does not degrade too much.

The performance of many operations on a binary search tree is proportional to the tree’s height. The height of a perfectly balanced binary tree (a full or complete binary tree) is about log2 n, so to show that a red-black tree is approximately balanced, we just need to check whether its height stays close to log2 n.

The height of a red-black tree is not easy to analyze, so I’ll walk you through it step by step.

First, if we remove all the red nodes from a red-black tree, what is the height of the remaining tree, which contains only black nodes?

After the red nodes are removed, some nodes no longer have a parent, so they take their grandparent as their parent. What was a binary tree thus becomes a four-ary tree.

The definition of a red-black tree states that every path from a node to its reachable leaf nodes contains the same number of black nodes. If we take some nodes out of the four-ary tree and move them down to leaf positions, the four-ary tree becomes a complete binary tree. Therefore, the height of a four-ary tree containing only black nodes is smaller than the height of a complete binary tree with the same number of nodes.

Earlier we said that the height of a complete binary tree is approximately log2 n, and the four-ary black tree is even shorter, so the height of the black tree left after removing the red nodes does not exceed log2 n.

Now we know the height of the tree containing only black nodes. What happens to the height when we add the red nodes back?

From the example and definition above, red nodes in a red-black tree cannot be adjacent: every red node is separated from other red nodes by at least one black node. Since the path containing the most black nodes is at most log2 n long, the longest path after adding the red nodes back is at most 2 log2 n; in other words, the height of a red-black tree is approximately 2 log2 n.

So the height of a red-black tree is only about twice the height (log2 n) of a strictly balanced AVL tree, and the performance loss is small. This analysis is not very tight; in practice red-black trees actually perform even better.

Why do people like to use red black trees in projects?

As mentioned earlier, Treaps and Splay Trees operate efficiently in most cases, but they cannot avoid time-complexity degradation in extreme cases. Although such cases are rare, they make these trees unsuitable for scenarios that are sensitive to the time of a single operation.

An AVL tree is a strictly height-balanced binary tree, so its search efficiency is very high. But everything has a cost: to maintain such strict balance, an AVL tree must adjust itself after every insertion and deletion, which is complicated and time-consuming. For data sets with frequent insert and delete operations, AVL trees can therefore be somewhat expensive.

Red-black trees are only approximately balanced, not strictly balanced, so the cost of maintaining balance is lower than in AVL trees.

As a result, the performance of insertion, deletion, and search in red-black trees is relatively stable. For engineering applications that must cope with all kinds of abnormal situations, we prefer this kind of stable-performance balanced binary search tree.

Red-black tree implementation principles

A red-black tree keeps itself balanced through rotations and color changes. There are three operations: left rotation, right rotation, and recoloring.

  • Left rotation: take a node as the pivot (the rotation node); its right child becomes the rotation node’s parent, the right child’s left child becomes the rotation node’s right child, and the rotation node’s left child is unchanged.

  • Right rotation: take a node as the pivot (the rotation node); its left child becomes the rotation node’s parent, the left child’s right child becomes the rotation node’s left child, and the rotation node’s right child is unchanged.

  • Recoloring: a node’s color changes from red to black or from black to red.
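The two rotations can be sketched as follows, assuming plain nodes with `left`/`right` fields and no parent pointers (each function returns the new subtree root; names are illustrative):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def rotate_left(x):
    y = x.right          # y becomes the new subtree root
    x.right = y.left     # y's left child becomes x's right child
    y.left = x           # x becomes y's left child
    return y

def rotate_right(x):
    y = x.left           # mirror image of rotate_left
    x.left = y.right     # y's right child becomes x's left child
    y.right = x          # x becomes y's right child
    return y
```

For example, left-rotating the chain 1 -> 2 -> 3 (each node the right child of the previous) around node 1 yields node 2 as the root with 1 and 3 as its children.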

Red-black tree insertion strategy

Naming conventions for nodes in the insert operation

All insertion scenarios

Red-black tree deletion strategy

Naming conventions for nodes in the delete operation

All deletion scenarios