The prefix tree

The problem

Words =[‘accommodate’,’accompany’,’accord’,’account’,’accurate’,’adjoin’,’bang’,’bare’,’duplicate’,’migration’,’ Account ‘,’accurate’,’adjoin’,’bang’] To solve the following problems

Sort the string array words
Count the number of occurrences of the string ‘bang’ in words
Count the number of strings in the string array words prefixed with ‘ba’

You can probably come up with several solutions, some of which are good. But I thought that without them, there is actually a solution to solve, and the effect is also good, this solution is the prefix tree.

What is a prefix tree

A prefix tree, also known as a Trie, dictionary tree, or word lookup tree, is a tree consisting of a large number of words with many child nodes. Properties:

The root node does not contain any keys
Each child node contains a different key

Core idea: Use the same prefix to reduce repeated comparison and improve search efficiency disadvantages: The Trie tree has a large space consumption (if a node has 26 children, each child has 26 children 26 * 26 * 26 *…… , exponential increase) Advantage: use the same prefix, improve the efficiency of comparison, reduce useless work

Building a Prefix Tree

Use the string array words to build a prefix tree

Graphs show prefix trees

This graph is an array of strings to build a prefix tree, without the path and end notation

Code implementation

There’s no complicated logic to the implementation of the code, just breaking up the string into individual characters, and then following a path of insertion, okay

// Node information
   private  char key;//
    private int path;// Indicates passing the node
    private int end;// Ends with the node
    private TrieNode[] childs = new TrieNode[26];

    public  TrieNode getIndexNode(int index){
        return childs[index];
    }
    public  TrieNode getIndexNodeNoEmpty(int index){
        if(childs[index] ==null){
            childs[index] = new TrieNode();
        }

        return childs[index];
    }
    public void updatePath(boolean isUp){
        if(isUp){
            path++;
        }else{ path--; }}public  void updateEnd( boolean isUp){
        if(isUp) {
            end++;
        }else{ end --; }}Copy the code

In node information, path is used to count the number of words passing through the node. End indicates the number of words ending with this node. Childs are all child nodes;

// Build prefix tree, also insert string
 public  void  buildTrie(String value){
        if(value == null ||value.length() == 0) {return;
        }
        if(root == null){
            root = new TrieNode();
        }
      char[] chars =  value.toCharArray();
        TrieNode current = root;
        int length = chars.length;
        for(int i = 0; i<length; i++){ TrieNode node = current.getIndexNodeNoEmpty(chars[i]-'a');
            node.setKey(chars[i]);
            node.updatePath(true);// Increase the number of passes
            current =node;
        }
        current.updateEnd(true);// Increase the number of endings

    }
Copy the code

The Trie code is implemented by splitting words into character arrays and then following the character path to create or share existing nodes

Basic operations and applications of Trie

The basic operation is add, delete, find, plus a traversal

Augmentation: Augmentation is simply adding strings to the prefix tree, which is called constructing. See the previous section
Change: Change can be viewed as the sum of delete and add operations
Find and delete: The code for these two operations is similar to the logic of the add operation, search only returns the result, delete is a subtraction of the path through the operation
Traversal: Similar to tree traversal

To find the

 public int  searchValue(String value){
        if(value == null ||value.length() == 0) {return 0;
        }
        if(root == null) {return  0;
        }
        char[] chars =  value.toCharArray();
        TrieNode current = root;
        int length = chars.length;
        for(int i = 0; i<length; i++){ TrieNode node = current.getIndexNode(chars[i]-'a');
              if(node == null) {return  0;
              }
              current = node;
        }
        return  current.getEnd();
    }
Copy the code

delete

 public  void delete(String value){
        if(value == null ||value.length() == 0) {return;
        }
        char[] chars =  value.toCharArray();
        TrieNode current = root;
        int length = chars.length;
        for(int i = 0; i<length; i++){ TrieNode node = current.getIndexNode(chars[i]-'a');
            if(node ! =null) {
                node.updatePath(false);// Reduce the number of passes
                current = node;
            }
        }
        current.updateEnd(false);// Reduce the number of endings

    }
Copy the code

traverse

This traversal excludes repeated strings. If you want to include all strings, you can just loop through the number of ends when adding to the list

// Go through the root children one by one
  public  List travelTrie(a){
        if(root == null) {return null;
        }
        if(result == null){
            result = new ArrayList<>();
        }
        result.clear();
        for(int i = 0; i<26; i++){ travel(root.getIndexNode(i),"");
        }
        return  result;
    }
// recursive traversal
 private   void travel(TrieNode node ,String value){
        if(node == null) {return;
        }
        value = value+node.getKey();
        if(node.getEnd()>0){
            result.add(value);
        }
        for(int i =0; i<26; i++){if(node.getIndexNode(i)! =null){ travel(node.getIndexNode(i),value); }}}Copy the code

Apply (solve the problem at the beginning of the article)

If the words array becomes thousands or even tens of thousands and the array increases and decreases at any time, the query string changes, the superiority of the prefix tree is not reflected

Problem 1: Sort the string array words

Solution: Make sure childs are ordered. By iterating through the Trie, synthesizing the path to the string, output the string according to the end tag, and get an ordered array of strings

Problem 2: Count the number of occurrences of the string ‘bang’ in words

Solution: This is easier to do, is to find and output end, is the searchValue operation can be

Problem 3: Count the number of strings in the string array words prefixed with ‘ba’

Solution: Similar to problem 2, just change end to PATH

The compression

Look at the first diagram and see if there are many nodes, some nodes are not shared. Like the word migration, the entire path is by itself. There are no forks. You can combine the nodes that have no forks consecutively into one node to save space. The advantages of this optimization are obvious, spatial optimization is very large, and the disadvantages are also obvious to increase the workload of coding. The image below is a compressed version of the image above

This is the result of space compression

conclusion

Prefix tree is a sharp tool to deal with strings, including but not limited to the problems mentioned in the article, its efficiency is good, the algorithm is easy to understand, the focus and difficulty is how to abstract the problems encountered into a prefix tree, with the idea of prefix tree to solve the problems encountered.