What is an AVL tree

What is a binary search tree

The binary search method can be regarded as one of the most widely used basic algorithms. For the ordered sequence, the binary search method can optimize the time complexity from O(n) to O(logn), which can greatly improve the running speed of the program.

Using the idea of binary search, we can construct a binary search tree in which each node is larger than its left child and smaller than its right child. Like this:

In this ideal case, the entire tree is uniform and the ideal time complexity for each operation is O(logN).

What is a balanced tree

In reality we have to consider extreme cases. For example, the same set of data might generate a tree like this:



Such cases are rare, but when they do occur, the results can be fatal. Moreover, in general, the probability of unbalanced binary tree is not small, but once unbalanced operation efficiency will decrease. For example, in this extreme case, the entire binary tree is reduced to a linked list, and binary search is reduced to a linear search.

To solve this problem, many balanced trees have been designed. The balance tree maintains its own balance after each operation, which can maintain the high efficiency of binary search.

What is an AVL tree

The most basic of all balanced trees is the AVL tree, so it’s a good place to start learning about balanced trees.

Tree and node definition

It’s easy to define a node, but here I’m going to design the node as a tree inner class, and then I’m going to use a little generics.

public class AVLTree<T extends Comparable<? super T>>
    class Node {
        int height;
        Node left;
        Node right;
        T key;
        Node (T key, Node l, Node r, int h) {
            left = l;
            right = r;
            this.key = key; height = h; }}}Copy the code

A couple of things to say here, left and right are obviously the left and right subtrees, and then key is the key that compares the size. Since this is the simplest AVL tree, there is no value corresponding to a key, but the principle is the same. And then height, the definition of height varies from source to source. Here I define the height of the leaf node as 0, the height of the null node as -1, and the height of the other nodes as Max {left.height, right.height}.


p ( p t r e e Sunday afternoon p indicates n u l l ) ( p . l e f t . h e i g h t p . r i g h t . h e i g h t Or less 1 ) \forall p (p \in tree \wedge p \ne null) \rightarrow (|p.left.height – p.right.height| \le 1)

The reason for this definition is clear: if the height of the left and right subtrees is exactly the same, then no nodes can be added or deleted (for a single node, it is not balanced), but keeping the height difference less than or equal to 1 will do all the operations and keep the tree well balanced. To eliminate null judgments when getting height, define a helper method

private int getHeight(Node p) {
        if (p == null)
            return -1;
        else
            return p.height;
}
Copy the code

On this basis, define the function that determines equilibrium (p is not null).

private boolean isBalance(Node p) {
    return Math.abs(getHeight(p.left) - getHeight(p.right)) <= 1;
}
Copy the code

Now consider the add and delete operations separately.

How to add nodes

Consider the state before and after adding nodes. Since this is a balanced tree, the tree must be balanced at every moment. The initial state of this tree is an empty tree, which by definition is an equilibrium tree. Now the job is to make sure, like mathematical induction, that if you add nodes to a balanced tree, it’s still a balanced tree. Now define the insert method:

private Node insert(Node p, T key) {}
Copy the code

Now to illustrate the behavior of this method, for some node p (p can be null), insert key (key cannot be null).

  1. If the key already exists in the p-tree, this method does not change the structure of the p-tree.
  2. If the key does not exist, insert the new node into the p-tree.
  3. Maintain the balance of the p tree and return to the balanced p root (the root may change after the balance, more details below).
  4. Update the height of the P tree.

Now to use a bit of recursion, our operation to the current node could look like this (here we change it directly with p as the return value) :

if (p == null) {
    p = new Node(key, null.null.0);
} else {
    int cmp = key.CompareTo(p.key);
}
Copy the code

Here we define a variable CMP to represent the size relationship. If CMP is equal to 0, then this key already exists, so we don’t need to insert it. We focus on the state where CMP is not equal to 0. Ignoring the equilibrium operation, we can write:

    if (cmp < 0)
        p.left = insert(p.left, key);
    else if (cmp > 0)
        p.right = insert(p.right, key);
    if(! isBalance(p)) p = keepBalance(p); p.height = Math.max(p.left.height, p.right.height);Copy the code

So how to keep the balance?

How to Keep your Balance

Define what balance is

The initial state of this operation is that the left and right subtrees are balanced, but the p-tree is unbalanced (the height difference between the left and right subtrees is 2). The final state is that the left and right subtrees are balanced and the p-tree is balanced, so this balancing operation can also be used to delete nodes.

LL rotation

Right at the height of the left subtree is higher than the subtree, under the condition of only add a node in the left subtree is likely to make the entire tree imbalance (the height of a node can only make it + 1), is now considering a left subtree, also is to add a node b (according to the method definition, after adding b is still a balance tree), and then when the height of the b + 1, A tree rooted in A is unbalanced.



Obviously, b must not be null in this case. Now let’s analyze the b subtree case.



In this case, the height of the left subtree is determined, and the value of the right subtree H2 can only be h0, h0 + 1, and then we use a clever transformation (don’t ask me how I came up with this thing) called LL rotation to adjust the whole tree like this:



Then we analyze the height case:


known h e = h 0 or h 0 + 1 . h c = h 0 . h d = h 0 + 1 for a Two subtrees of theta h e h c Or less 1 , so a balance for a for h a = m a x ( h e . h c ) = h 0 + 1 or h 0 + 2 for b I have two subtrees of phi h d h a Or less 1 . so b balance \ begin} {aligned & known h_e = h_0 or h_0 + 1, h_c = h_0, h_d = h_0 + 1 \ \ & for a two KeZi trees | h_e – h_c | \ le 1, So a balance for a \ \ & h_a = Max (h_c h_e) = h_0 + 1 or h_0 + 2 \ \ & for two KeZi tree also has a | b h_d – h_a | \ le 1, so the balance b \ \ \ end} {aligned

Now that the entire subtree has been rebalanced, the most important thing is that the middle-order traversal hasn’t changed. . We can write this method in Java:

//p ! = null && p.left ! = null Returns the new root and changes the height
private Node rotateLL(Node p) {
    Node p1 = p.left;
    p.left = p1.right;
    p1.right = p;

    p.height = Math.max(getHeight(p.left), getHeight(p.right)) + 1;
    p1.height = Math.max(getHeight(p1.left), getHeight(p1.right)) + 1;
    return p1;
}
Copy the code

And similarly RR can be written:

private Node rotateRR(Node p) {
    Node p1 = p.right;
    p.right = p1.left;
    p1.left = p;

    p.height = Math.max(getHeight(p.left), getHeight(p.right)) + 1;
    p1.height = Math.max(getHeight(p1.left), getHeight(p1.right)) + 1;
    return p1;
}
Copy the code

Both methods are valid when called. For example, for LL operations, b is obviously not null (higher than C).

LR rotation



Now we’re talking about a situation where H1 is not equal to h0+1, which is h1 = h0, and we can also make a neat adjustment while keeping the middle order traversal constant, so we’ll give the conclusion and then prove it. We only need to perform RR transform on node B (then node B is also obviously not null, and e is higher than D is not null), and then perform LL transform on node A, resulting in the following figure (let the left and right children of e be el, er) :

Do RR rotation for B:



Then rotate A LL:

Now analyze the equilibrium:


Now known h d = h 0 . h e = h 0 + 1 . h c = h 0 ; First analysis b Due to e The subtree with root is a balanced tree, so h e l . h e r = h 0 or h 0 1 ; so h d h e l Or less 1 . so b Balanced and h b = h 0 + 1 ; reanalysis a The same h c h e r Or less 1 . so a Balance, and h a = h 0 + 1 ; so h a = h b ; so e Balance. \begin{aligned} & Now we know h_d = h_0, h_E = h_0 + 1, h_c = h_0; Since the subtree root e is a balanced tree, h_{el}, h_{er} = h_0 or h_0-1; \ \ & so | h_d h_ – {el} | \ le 1, so balanced and b h_b = h_0 + 1; A \ \ \ \ & analysis & the same | h_c h_ – {er} | \ le 1, so a balance, and h_a = h_0 + 1; \\ & Then h_a = h_b; So e is equilibrium. \end{aligned}

As for middle-order traversal, it is easy to verify that there is no change. So far, we have discussed both cases when we insert to the left. Because of symmetry, we insert to the right the node is completely symmetric, and we can write out the way to maintain equilibrium: first LR:

private Node rotateLR(Node p) {
    p.left = rotateRR(p.left);
    returnrotateLL(p); } &emsp; &emsp; Then there's balance:Copy the code
// Can be called when p is unbalanced and the left and right subtrees are 2 different in height, return the new root and update the height
private Node keepBalance(Node p) {
    int dh = getHeight(p.left) - getHeight(p.right);
    if (dh > 0) {
        if (getHeight(p.left) == getHeight(p.left.left) + 1)
            p = rotateLL(p);
        else
            p = rotateLR(p);
    } else {
        if (getHeight(p.right) == getHeight(p.right.right) + 1)
            p = rotateRR(p);
        else
            p = rotateRL(p);
    }
    return p;
}
Copy the code

The complete insertion method is as follows:

private Node insert(Node p, T key) {
    if (p == null) {
        p = new Node(key, null.null.0);
    } else {
        int cmp = key.compareTo(p.key);
        if (cmp < 0)
            p.left = insert(p.left, key);
        else if (cmp > 0)
            p.right = insert(p.right, key);
        if(! isBalance(p)) p = keepBalance(p); p.height = Math.max(getHeight(p.left), getHeight(p.right)) +1;
    }
    return p;
}
Copy the code

Insert maximum height plus one

We have described the call condition of the balanced method, now we can prove it:


For an empty tree . After inserting the node height from 1 become 0 , highly + 1 For any non-empty tree, if you insert nodes into two subtrees, the height of the subtree is maximum + 1 If the balancing method is not called, p The height of the tree is maximum + 1 . If you call the balancing method (which of course is eligible), According to the above analysis, p Maximum tree height + 1 And through this left and right subtree construction method, you can construct any tree A V L Tree, so this is true \begin{aligned} & For empty trees, the height changes from -1 to 0, +1\\ & For any non-empty tree, if you insert nodes into two subtrees, the height of the subtree is at most +1\\ & if no balancing method is called, the height of the p-tree is at most +1. \\ & If the balancing method is called (which of course is true), \\ & according to the above analysis, the p-tree height is at most +1\\ & and any AVL tree can be constructed using the left/right subtree construction method, so the statement \end{aligned} is true.

delete

Delete is the reverse of insert, and the principle is similar. Let’s also use recursion. Define the behavior of this method first:

private Node delete(Node p, T key){}
Copy the code
  1. Delete key from the subtree root p and return the new root.
  2. P can be null, but key cannot be null
  3. Update the height of the subtree and keep the subtree balanced.

Implement the simple part first:

if(p ! =null) {
        int cmp = key.compareTo(p.key);
        if(cmp ! =0) {
            if (cmp < 0)
                p.left = delete(p.left, key);
            else if (cmp > 0)
                p.right = delete(p.right, key); 
            if (!isBalance(p))
                p = keepBalance(p);
        }
}
Copy the code

And then the node that you want to delete is the current node, and in that case, if one of the left and right subtrees is empty or both of them are empty, then it’s easy to just delete them like a list, and the height of the whole tree is reduced by one. In the case that neither subtree is empty, replace the current node with the largest node from the left subtree if the height of the left subtree is higher, otherwise replace the current node with the smallest node from the right subtree. The operation sequence is to delete the node found first, and then replace the value of node P, and the whole tree is still balanced after the operation, which is proved as follows:


Proof: When a node is removed from a tree, the height of the tree is reduced at most 1 This is obviously true when the tree is empty (the height is constant), and now assuming that this is true for the left and right subtrees of some tree, Consider four scenarios 1. Delete from the left subtree, obviously p The height of the tree is reduced at most 1 2. Delete from the right subtree, obviously p The height of the tree is reduced at most 1 3. Delete the current node, but at least one subtree of the current node is empty, obviously p Tree height reduction 1 4. Delete the current node, and the left and right subtrees are not empty When the left subtree is higher, the maximum node is removed from the left subtree and the maximum height of the left subtree is reduced 1 If I subtract the height of the left subtree 1 . p The height of the tree goes down 1 If the left subtree is not taller than the right subtree, the smallest node of the right subtree is deleted The height of the right subtree is at most subtracted 1 , so p The height of the tree is at most reduced 1 In conclusion, the above proposition is valid, and it can be seen that the deletion operation is correct \begin{aligned} &\ prove: removing a node from the tree reduces the height of the tree by at most 1\\ & This is obviously true if the tree is empty (the height remains the same). Now assume that this is true for the left and right subtrees of a tree, \\ & consider four cases \\ &1. Deleting from the left subtree obviously reduces the height of the p-tree by at most 1\\ &2. Deleting from the right subtree obviously reduces the height of the p-tree by at most 1\\ &3. Delete the current node, but at least one subtree of the current node is empty, obviously the height of the p-tree is reduced by 1\\ &4. Delete the current node, and the left and right subtrees are not empty \ \ & when left subtree is higher, most to delete from the left subtree nodes, the left subtree is most highly minus 1 \ \ & if left subtree is minus 1, the height of the tree of p minus 1 \ \ & when left subtree than right subtree is high, the minimum node delete right subtree \ \ & the height of the right subtree is minus 1, So the height of the p-tree is reduced by at most 1 \\ &\ \

Now write the delete operation in code, the first is to find the Max/min method:

private T findMin(Node p) { //p should not be null
    T ret = p.key;
    if(p.left ! =null)
        ret = findMin(p.left);
    return ret;
}

private T findMax(Node p) { // p should not be null
    T ret = p.key;
    if(p.right ! =null)
        ret = findMax(p.right);
    return ret;
} 
Copy the code

On this basis it is easy to write the delete operation:

private Node delete(Node p, T key) {
    if(p ! =null) {
        int cmp = key.compareTo(p.key);
        if(cmp ! =0) {
            if (cmp < 0)
                p.left = delete(p.left, key);
            else
                p.right = delete(p.right, key);
            if(! isBalance(p)) p = keepBalance(p); }else {
            if (p.left == null && p.right == null)
                p = null;
            else if (p.right == null) {
                p = p.left;
            } else if (p.left == null) {
                p = p.right;
            } else {
                if (getHeight(p.left) > getHeight(p.right)) {
                    T maxKey = findMax(p.left);
                    p.left = delete(p.left, maxKey);
                    p.key = maxKey;
                } else{ T minKey = findMin(p.right); p.right = delete(p.right, minKey); p.key = minKey; }}}if(p ! =null)
                p.height = Math.max(getHeight(p.left), getHeight(p.right)) + 1;
    }
    return p;
}
Copy the code

Remaining operations

All that is left is the search and traverse. These two operations do not involve element modification, so they are common to the ordinary binary search tree method, which is implemented as follows:

@Override
public String toString(a) {
    builder.delete(0, builder.length());// StringBuilder is used
    walk(root);
    return builder.toString();
}

private void walk(Node p) { // p can be null.
    if(p ! =null) {
        walk(p.left);
        builder.append(p.key).append(' '); walk(p.right); }}private boolean contain(Node p, T key) { // p can be null, but key should not be null
    boolean ret = false;
    if(p ! =null) {
        int cmp = key.compareTo(p.key);
        if (cmp < 0) {
            ret = contain(p.left, key);
        } else if (cmp > 0) {
            ret = contain(p.right, key);
        } else {
            ret = true; }}return ret;
}
Copy the code

encapsulation

The final step is to shell the previous recursive method so that the outside can only operate on the root:

public void add(T key) {
    root = insert(root, key);
}

public void remove(T key) {
    root = delete(root, key);
}

public boolean contain(T key) {
    return contain(root, key);
}
Copy the code

conclusion

The balancing method of AVL is written in detail, and the operation of inserting and deleting elements is implemented on this basis, and then the whole AVL tree is completed. In the case of both diagram and code, a little proof was added to make the article more rigorous. Of course, there are many deficiencies, and we will correct them if we find them.