The problem
Words =[‘accommodate’,’accompany’,’accord’,’account’,’accurate’,’adjoin’,’bang’,’bare’,’duplicate’,’migration’,’ Account ‘,’accurate’,’adjoin’,’bang’] To solve the following problems
- Sort the string array words
- Count the number of occurrences of the string ‘bang’ in words
- Count the number of strings in the string array words prefixed with ‘ba’
You can probably come up with several solutions, some of which are good. But I thought that without them, there is actually a solution to solve, and the effect is also good, this solution is the prefix tree.
What is a prefix tree
A prefix tree, also known as a Trie, dictionary tree, or word lookup tree, is a tree consisting of a large number of words with many child nodes. Properties:
- The root node does not contain any keys
- Each child node contains a different key
Core idea: Use the same prefix to reduce repeated comparison and improve search efficiency disadvantages: The Trie tree has a large space consumption (if a node has 26 children, each child has 26 children 26 * 26 * 26 *…… , exponential increase) Advantage: use the same prefix, improve the efficiency of comparison, reduce useless work
Building a Prefix Tree
Use the string array words to build a prefix tree
Graphs show prefix trees
This graph is an array of strings to build a prefix tree, without the path and end notation
Code implementation
There’s no complicated logic to the implementation of the code, just breaking up the string into individual characters, and then following a path of insertion, okay
// Node information
private char key;//
private int path;// Indicates passing the node
private int end;// Ends with the node
private TrieNode[] childs = new TrieNode[26];
public TrieNode getIndexNode(int index){
return childs[index];
}
public TrieNode getIndexNodeNoEmpty(int index){
if(childs[index] ==null){
childs[index] = new TrieNode();
}
return childs[index];
}
public void updatePath(boolean isUp){
if(isUp){
path++;
}else{ path--; }}public void updateEnd( boolean isUp){
if(isUp) {
end++;
}else{ end --; }}Copy the code
In node information, path is used to count the number of words passing through the node. End indicates the number of words ending with this node. Childs are all child nodes;
// Build prefix tree, also insert string
public void buildTrie(String value){
if(value == null ||value.length() == 0) {return;
}
if(root == null){
root = new TrieNode();
}
char[] chars = value.toCharArray();
TrieNode current = root;
int length = chars.length;
for(int i = 0; i<length; i++){ TrieNode node = current.getIndexNodeNoEmpty(chars[i]-'a');
node.setKey(chars[i]);
node.updatePath(true);// Increase the number of passes
current =node;
}
current.updateEnd(true);// Increase the number of endings
}
Copy the code
The Trie code is implemented by splitting words into character arrays and then following the character path to create or share existing nodes
Basic operations and applications of Trie
The basic operation is add, delete, find, plus a traversal
- Augmentation: Augmentation is simply adding strings to the prefix tree, which is called constructing. See the previous section
- Change: Change can be viewed as the sum of delete and add operations
- Find and delete: The code for these two operations is similar to the logic of the add operation, search only returns the result, delete is a subtraction of the path through the operation
- Traversal: Similar to tree traversal
To find the
public int searchValue(String value){
if(value == null ||value.length() == 0) {return 0;
}
if(root == null) {return 0;
}
char[] chars = value.toCharArray();
TrieNode current = root;
int length = chars.length;
for(int i = 0; i<length; i++){ TrieNode node = current.getIndexNode(chars[i]-'a');
if(node == null) {return 0;
}
current = node;
}
return current.getEnd();
}
Copy the code
delete
public void delete(String value){
if(value == null ||value.length() == 0) {return;
}
char[] chars = value.toCharArray();
TrieNode current = root;
int length = chars.length;
for(int i = 0; i<length; i++){ TrieNode node = current.getIndexNode(chars[i]-'a');
if(node ! =null) {
node.updatePath(false);// Reduce the number of passes
current = node;
}
}
current.updateEnd(false);// Reduce the number of endings
}
Copy the code
traverse
This traversal excludes repeated strings. If you want to include all strings, you can just loop through the number of ends when adding to the list
// Go through the root children one by one
public List travelTrie(a){
if(root == null) {return null;
}
if(result == null){
result = new ArrayList<>();
}
result.clear();
for(int i = 0; i<26; i++){ travel(root.getIndexNode(i),"");
}
return result;
}
// recursive traversal
private void travel(TrieNode node ,String value){
if(node == null) {return;
}
value = value+node.getKey();
if(node.getEnd()>0){
result.add(value);
}
for(int i =0; i<26; i++){if(node.getIndexNode(i)! =null){ travel(node.getIndexNode(i),value); }}}Copy the code
Apply (solve the problem at the beginning of the article)
If the words array becomes thousands or even tens of thousands and the array increases and decreases at any time, the query string changes, the superiority of the prefix tree is not reflected
- Problem 1: Sort the string array words
Solution: Make sure childs are ordered. By iterating through the Trie, synthesizing the path to the string, output the string according to the end tag, and get an ordered array of strings
- Problem 2: Count the number of occurrences of the string ‘bang’ in words
Solution: This is easier to do, is to find and output end, is the searchValue operation can be
- Problem 3: Count the number of strings in the string array words prefixed with ‘ba’
Solution: Similar to problem 2, just change end to PATH
The compression
Look at the first diagram and see if there are many nodes, some nodes are not shared. Like the word migration, the entire path is by itself. There are no forks. You can combine the nodes that have no forks consecutively into one node to save space. The advantages of this optimization are obvious, spatial optimization is very large, and the disadvantages are also obvious to increase the workload of coding. The image below is a compressed version of the image above
This is the result of space compression
conclusion
Prefix tree is a sharp tool to deal with strings, including but not limited to the problems mentioned in the article, its efficiency is good, the algorithm is easy to understand, the focus and difficulty is how to abstract the problems encountered into a prefix tree, with the idea of prefix tree to solve the problems encountered.