One: The basic concept of trees

What is a tree?

Tree is a kind of data set used to simulate the nature of Tree structure. It is a hierarchical set composed of n(n > 0) finite nodes. The reason it’s called a tree is because it’s a data structure that looks like an upside-down tree, that is, it has roots up and leaves down.

The data structure of tree has the following characteristics:

  • Each node has finite or no children
  • A node without a parent is called the root node
  • Each non-root node has one and only one parent node
  • In addition to the root node, each child node can be divided into multiple disjoint subtrees
  • Is there a cycle in the tree?

A noun term for a tree

There are several common terms used in tree data structures:

  • Degree of node: the number of subtrees contained by a node is called the degree of the node;
  • Tree degree: in a tree, the largest node degree is called tree degree;
  • Leaf node or terminal node: node with degree zero;
  • Non-terminal node or branch node: node whose degree is not zero;
  • Parent node or parent node: If a node has children, this node is called the parent node of its children.
  • Child node or child node: The root node of the subtree that a node contains is called the child node of that node.
  • Sibling node: nodes with the same parent node are called sibling nodes.
  • Node hierarchy: the root is defined as the first layer, the child nodes of the root are defined as the second layer, and so on.
  • Depth: For any node n, the depth of n is the length of the unique path from the root to n, and the depth of the root is 0. Some books define the depth of the root as 1.
  • Height: For any node N, the height of n is the longest path from n to a leaf, and the height of all leaves is 0. Some books define the height of leaves as 1.
  • Cousin node: nodes whose parent node is in the same layer are Cousins of each other.
  • Ancestor of a node: all nodes from the root to the branch through which the node passes;
  • Descendant: Any node in the subtree rooted in a node is the descendant of the node.
  • Forest: the collection of m (m >= 0) trees that do not intersect each other is called forest;

Binomial trees, full binomial trees, complete binomial trees and balanced binomial trees

Binary tree

A Binary Tree is an important branch of a Tree structure. Each node can have at most two children, called left child and right child respectively.

Full binary tree

For a binary tree, its nodes are either leaf nodes or have two child nodes. Such a tree is called a full binary tree.

A full binary tree has the property that if the number of layers of a full binary tree is KKK, the total number of nodes in the tree is 2K−12^ K-12K −1.

Complete binary tree

If the depth of a binary tree is KKK and the number of nodes in all other layers (111 ~ K−1K-1K−1) reaches the maximum except for the KKK layer, starting from the KKK layer, all nodes are continuously concentrated in the far left, such a tree is called a complete binary tree.

Balanced binary trees

The idea of a balanced binary tree is one that we can look at first, and then come back and understand a little bit more after we look at binary search trees.

First, the balanced binary tree must be a binary search tree. It is either an empty tree, or the absolute value of the difference between the height of its left and right subtrees (the balance factor) is no more than 1, and both its left and right subtrees are balanced binary trees.

Two: binary search tree

What is a Binary Search Tree?

First of all, binary search tree is a binary tree, and binary search tree has the following two characteristics:

  • The nodes of a binary search tree must store elements that are comparable.
  • The value of each node in a binary search tree is greater than the value of all nodes in its left subtree and less than the value of all nodes in its right subtree.

The nodes of a binary search tree are defined as follows:

public class Node<E extends Comparable<E>> {
    public E e;
    public Node left;
    public Node right;

    public Node(E e) {
        this.e = e;
        left = null;
        right = null; }}Copy the code

Since the tree is a natural recursive data structure, every subtree of a binary search tree is also a binary search tree.

Binary search tree basic operations

The definition of binary search tree does not contain repeated nodes (the equivalent value of nodes is considered to be repeated). Therefore, the binary search tree I implemented follows the original definition of binary search tree, that is, there will be no repeated nodes. However, this is only a matter of design, if you want to design a binary search tree that can add repeating elements.

Adds an element to a binary search tree

If the root node of the current binary search tree is empty, the newly inserted node becomes the root node.

If the root node of the current binary search tree is not empty, the root is used as the node for the current comparison: the newly inserted node is compared with the current node; If the value is smaller than the current node, “go left”, if the value is larger than the current node, “go right”, and then let the node at the next level continue as the current comparison node until it reaches the insertion position.

The following figure shows the process of adding node “28” to the current binary search tree:

Java code:

/ * * *@paramE adds a new element */ to the binary search tree
public void add(E e) {
    root = add(e, root);
}

/ * * *@paramE searches the binary tree for the newly inserted node *@paramNode Node of the current comparison *@returnReturns the root node of the binary search tree */
private Node add(E e, Node node) {
    if (node == null) {
        size++;
        return new Node(e);
    }
    if (e.compareTo((E) node.e) < 0) {
        node.left = add(e, node.left);
    } else if (e.compareTo((E) node.e) > 0) {
        node.right = add(e, node.right);
    }
    return node;
}
Copy the code

Queries the binary search tree for the presence of elements

The logic of querying an element in a binary search tree is essentially the same as adding an element. We start at the root and let the root node be the current comparison node: if the value is equal to the current node, we have found the element; Otherwise, we “go left” if the value is smaller than the current node and “go right” if the value is larger than the current node, recursively processing our algorithmic logic.

If we get to the leaf node and still don’t find the element, then the element is not in the current binary search tree.

Java code for querying an element in a binary search tree:

/ * * *@paramE is looking for the element e star@returnReturns whether the current binary search tree contains the element e */
public boolean contains(E e) {
    return contains(e, root);
}

/ * * *@paramE is looking for the element e star@paramNode Node of the current comparison *@returnReturns whether the current binary search tree contains the element e */
private boolean contains(E e, Node node) {
    if (node == null) {
        return false;
    }

    if (e.compareTo((E) node.e) == 0) {
        return true;
    } else {
        if (e.compareTo((E) node.e) < 0) {
            return contains(e, node.left);
        } else {
            returncontains(e, node.right); }}}Copy the code

Removes an element from a binary search tree

Delete the maximum and minimum values in the binary search tree

The smallest element in a binary search tree represents the node that “goes left” along the root, the “leftmost” node; The node represented by the largest element in a binary search tree is the node “to the right”, “to the right”, along the root.

For the binary search tree shown in the figure above, the smallest element is “22”, which is the “leftmost” node that goes “left” along the root; The largest element is “58”, which is the “right-most” node that goes “right” along the root.

If the “leftmost node” does not have a right subtree, then we just need to delete this node, without any adjustment to the whole binary search tree; If the “leftmost node” has a right subtree, then we replace the node with the right child of the “leftmost node”, as shown below:

This completes the operation of deleting the smallest node in the binary search tree.

The Java code is as follows:

/ * * *@returnReturns the smallest element of the binary search tree */
public E minimum(a) {
    if (size == 0) {
        throw new RuntimeException("BST is empty");
    }
    return (E) minimum(root).e;
}

/ * * *@paramNode returns the node * that has the minimum value of the binary search tree rooted in node@returnReturns the node */ that has the minimum value of the binary search tree rooted in node
private Node minimum(Node node) {
    if (node.left == null) {
        return node;
    }
    return minimum(node.left);
}
/ * * *@returnRemoves the node with the minimum value from the binary search tree and returns the minimum value */
public E removeMin(a) {
    E ret = minimum();
    root = removeMin(root);
    return ret;
}

/ * * *@paramNode Removes the smallest node * in the binary search tree rooted in node@returnReturns the root */ of the new binary search tree after the node is deleted
private Node removeMin(Node node) {
    if (node.left == null) {
        Node rightNode = node.right;
        node.right = null;
        size--;
        return rightNode;
    }
    node.left = removeMin(node.left);
    return node;
}
Copy the code

Similarly, deleting the largest node in a binary search tree is a similar idea. If the “rightmost node” does not have a left subtree, then we just need to delete this node, without any adjustment to the whole binary search tree; If the “rightmost node” has a left subtree, we replace the node with the left child of the “rightmost node”, thus completing the operation of deleting the largest node in the binary search tree.

The Java code is as follows:

/ * * *@returnReturns the largest element of the binary search tree */
public E maximum(a) {
    if (size == 0) {
        throw new RuntimeException("BST is empty");
    }
    return (E) maximum(root).e;
}

/ * * *@paramNode returns the node * that has the maximum value of the binary search tree rooted in node@returnReturns the node */ that has the maximum value of the binary search tree rooted in node
private Node maximum(Node node) {
    if (node.right == null) {
        return node;
    }
    return maximum(node.right);
}
/ * * *@returnDelete the node with the maximum value from the binary search tree and return the maximum value */
public E removeMax(a) {
    E ret = maximum();
    root = removeMax(root);
    return ret;
}

/ * * *@paramNode Removes the largest node * in the binary search tree rooted in node@returnReturns the root */ of the new binary search tree after the node is deleted
private Node removeMax(Node node) {
    if (node.right == null) {
        Node leftNode = node.left;
        node.left = null;
        size--;
        return leftNode;
    }
    node.right = removeMax(node.right);
    return node;
}
Copy the code
Removes any element in the binary search tree

First, let’s consider two cases:

If we delete nodes that only have left subtrees and no right subtrees

Then we just need to replace the deleted node with the left child of the deleted node.

If we delete only the right subtree and no left subtree

Then we just need to let the right child of the deleted node replace the deleted node.

But what if we delete a node that has both left and right subtrees?

In 1962, Hibbard, a computer scientist, proposed an algorithm called Hibbard Deletion. If we want to delete a node that has both left and right subtrees, we first need to find the successors of the nodes to be deleted.

What is a successor node? We perform middle-order traversal on the binary tree, and the next node of this node is the successor node of this node in the order of middle-order traversal.

For a binary search tree, the successor node of the node to be deleted is the “leftmost” node in the right subtree of the node. The deletion operation is completed by replacing the successor node with the node to be deleted.

In addition to looking for the successor node of the node to be deleted, we can also look for the precursor node of the node to be deleted. For binary search tree, the precursor node of the node to be deleted is the “rightmost” node of the left subtree of the node. It is also possible to replace the precursor node with the node to be deleted, and I will use the idea of the successor node here.

The Hibbard Deletion algorithm is as follows:

The Java code for deleting a node in a binary search tree is as follows

public void remove(E e) {
    root = remove(e, root);
}

private Node remove(E e, Node node) {
    if (node == null) {
        return null;
    }
    if (e.compareTo((E) node.e) < 0) {
        node.left = remove(e, node.left);
        return node;
    } else if (e.compareTo((E) node.e) > 0) {
        node.right = remove(e, node.right);
        return node;
    } else {
        if (node.left == null) {
            Node rightNode = node.right;
            node.right = null;
            size--;
            return rightNode;
        }
        if (node.right == null) {
            Node leftNode = node.left;
            node.left = null;
            size--;
            return leftNode;
        }
        
        // Hibbard Deletion
        Node successor = minimum(node.right);
        Node right = removeMin(node.right);
        Node left = node.left;
        successor.left = left;
        successor.right = right;
        node.left = null;
        node.right = null;
        returnsuccessor; }}Copy the code

The complete code for my binary search tree implementation is linked at the end of this article.

The complexity analysis of add, delete, change and search operation in binary search tree

In the best case, when the binary search tree is a balanced binary tree. The time complexity of add, delete, modify and query is O(logN)O(logN)O(logN).

Unfortunately, a binary search tree is not a balanced binary tree.

If we add the elements of an increasing array to a binary search tree, the binary search tree is reduced to a linked list, and the time complexity of adding, deleting, modifying and searching is reduced to O(N)O(N)O(N).

In order to solve the problem of binary search tree, the data structure of binary search tree is constantly improved in the future, such as AVL tree, such as absolute balanced binary tree, and red black tree, such as weak balanced binary tree data structure.

Stay tuned as I introduce each of these data structures in turn.

Three: binary search method

After we understand binary search tree such a data structure, it is not difficult to understand binary search method.

Binary Search, also known as Binary Search, is a highly efficient Search algorithm, but the premise of Binary Search is that the sequence to be searched must be an ordered sequence. In other words, sorting is a prerequisite for binary search.

The principle of binary search is very simple:

If our target value is smaller than v in the middle of the array, we continue to look to the left of V. If our target value is greater than the middle number v, we continue to look to the right of v. It’s easy to imagine that this is really just a recursive process.

What is the time complexity of binary search?

Each time we order to locate the halfway point of the array, and let the target compared with mid-range, we will return a result directly, if found, if not found, the target is not divided in mid-range as left side is the mid-range as dividing the right half, in other words, each time, we can cut down half of the data. So, the time complexity of binary search is actually the number of ordered array N divided by 2 until we find the target value, which is O(logN)O(logN)O(logN).

Another way to think about it is to assume that all of our data is distributed in a binary search tree, and that our binary search tree is an absolutely balanced binary tree. The maximum number of times we need to search for an element is the height of the tree h, where the relationship between h and the amount of data N is as follows: N=2h−1N =2 ^ h-1n =2h−1. Therefore, we can also conclude that the time complexity of the binary search method is O(logN)O(logN)O(logN).

The idea of binary search seems simple enough, but implementation requires a lot of attention to detail. It is worth mentioning that the idea of binary search was put forward by John Mauchly in 1946. It was not until 1962 that the first bug-free binary search algorithm was implemented, with a full gap of 16 years :).

The realization of binary search method

Binary search method Java code:

public class BinarySearch {
    private BinarySearch(a) {}/** * Nonrecursive algorithm for binary lookup: Returns target's index in the ordinal group data, or -1 ** if not@param data
     * @param target
     * @param <E>
     * @return* /
    public static <E extends Comparable<E>> int search(E[] data, E target) {
        int l = 0, r = data.length - 1;

        while (l <= r) {
            int mid = l + ((r - l) >> 1);

            if (target.compareTo(data[mid]) == 0)
                return mid;
            else if (target.compareTo(data[mid]) > 0)
                l = mid + 1;
            else
                r = mid - 1;
        }
        return -1;
    }

    /** * Binary search recursive algorithm: returns target's index in the ordinal group data, or -1 ** if not@param data
     * @param target
     * @param <E>
     * @return* /
    public static <E extends Comparable<E>> int searchR(E[] data, E target) {
        return searchR(data, 0, data.length - 1, target);
    }

    private static <E extends Comparable<E>> int searchR(E[] data, int l, int r, E target) {
        if (l > r)
            return -1;

        int mid = l + ((r - l) >> 1);

        if (data[mid].compareTo(target) == 0)
            return mid;
        if (target.compareTo(data[mid]) > 0)
            return searchR(data, mid + 1, r, target);

        return searchR(data, l, mid - 1, target); }}Copy the code

The implementation of binary lookup algorithms seems very simple now, so why was the implementation of binary lookup buggy before 1962?

The main bug in this is the way to write the median mid.

The previous code was written like this:

int mid = (l + r) / 2;
Copy the code

In the case that L and R do not overflow, we cannot guarantee that L + R does not overflow.

So, the improved code is written like this:

// int mid = l + (r - l)/2;
int mid = l + ((r - l) >> 1);
Copy the code

In this way, we can guarantee the problem of data overflow range in calculation when L and R are very large.

Four:

In this article, I introduced binary search tree data structure and its code implementation, and also introduced binary search algorithm. As you can see, binary search and binary search tree data structures are inextricably linked in nature.

However, binary search tree is not a balanced binary tree, so in the extreme case of given data, binary search tree will degenerate into a linked list, which seriously reduces its add, delete, change and query performance.

I’m going to introduce you to some balanced binary trees later, and we can explore how these balanced binary trees are optimized from the binary search tree.

Links to the code used in this article:

Github.com/jinrunheng/…

Ok, so far, this article I will introduce the end ~ welcome to pay attention to my public number [kim_talk], here I hope you can harvest more knowledge, we see you next issue!