Dictionary trees for data structures and algorithms

preface

The Trie tree is a data structure that is not very common but is very useful and one of the data structures that the blogger recently learned about was doing a few bytes of problems. And will share their learning content with everyone.

define

First, what is a Trie tree? As the name implies, when querying the target, access the nodes of the tree in the same order as the dictionary according to certain criteria and steps. For a simple example, if you look up the word “He” in the English dictionary, you must first find the letter h in the order of A-Z, and then find the letter E in the same order. A dictionary tree is a structure like a dictionary.

The process of finding the word mentioned above is to continuously find the string with the same prefix as the word to be looked up until the last letter of the word to be looked up. Therefore, a dictionary Tree is also called a prefix Tree.

The nature of the

Use hell, hi, new, and NOp as examples to create a dictionary tree

The structural properties of the dictionary tree can be obtained according to the above

The root node of the Trie tree does not store any characters. Each node except the root node stores only one character
The string formed on the path from the root node to a node is the corresponding string of the node
The child nodes of each node store different characters

Construct the dictionary tree according to the above three points.

The construction of a dictionary tree

The construction of a dictionary tree is relatively simple, not much different from the construction of a normal tree. The following will be the dictionary tree insertion, deletion, query operations are analyzed and explained.

The insert

In the absence of a dictionary tree, we need to build a dictionary tree first.

For example, insert hell:

Check to see if ‘h’ exists in the children of the root node. If no, insert a node with ‘h’. If yes, continue to access ‘h’ node
Continue to the ‘h’ node, check to see if ‘e’ exists in the ‘h’ node, if not insert a node with ‘e’, and so on
Until all characters in the string to be inserted are inserted, the last node is marked with a word tag, indicating that a word is reached from the root node to this point

Then insert the word hit, the process is as follows: check = > exists, access/not exist, create a new node and then access = > until the word to be inserted reaches the last character.

The insertion operation of the dictionary tree is relatively simple and does not require much sorting.

Query operation

As stated above, in accordance with the standards of a certain query string, each node to store one character at a time, the root node to child nodes path string is the node of the string, the query target string according to the corresponding character in step by step from the root node access nodes, actually also is the process of matching string prefix.

In the dictionary tree, look for “hell”,

Select ‘h’ from ‘root’ where ‘h’ is found
Query ‘e’ from ‘h’ child node
Query ‘l’ from ‘e’ child node
Select ‘l’ from ‘l’ child node where ‘l’ has word marker bits, string match successful, return

If you query no in this dictionary

Select ‘n’ from root; select ‘n’ from root;
Select ‘o’ from ‘n’; select ‘o’ from ‘n’; ‘o’ from ‘n’

Delete operation

Delete is a bit more complex than insert and query, but it’s simple, as long as the word already exists in the dictionary tree.

The operation of removing a dictionary tree node takes into account whether the last character of the target string is a leaf node in the tree.

Because a word may be a prefix to another word, if it is not a leaf node, we only need to empty the word flag bit of the word, instead of deleting the whole “branch”.

The deleted word is in the prefix

For example, if you want to delete the word “no”

So let’s start at the root and find the node ‘n’
Start with ‘n’ and find the node ‘0’
At the end of string matching, the ‘o’ node has a word flag bit. Since ‘o’ is not a leaf node at this time, the word flag bit of this node can be cleared.

Delete the entire word

For example, if you want to delete the word “hell”, you would do the same as the first deletion, but start with the last node, the ‘l’ node is the leaf node, and then go up.

Delete the word on the branch

For example, if you want to delete “hi”, you can access the leaf node ‘I ‘, delete the leaf node, and go up to ‘h’. Since ‘h’ is still not a leaf node after’ I ‘is deleted, you will not continue to delete the node.

The deleted words contain other words

For example, to delete “nop”, just like the previous ones, access the leaf node ‘p’ to delete, then move up to find that ‘o’ is a leaf node, but ‘o’ has a word marker bit, so there is no further deletion.

There are several delete operations above, and we get the delete criteria:

If the last character of the word is a non-leaf node, only the word flag bit is cleared
If the word is on its own, separate from other words, delete the whole branch
If a non-leaf node is encountered during word deletion, the deletion operation ends
If the word flag bit of another word is encountered during the word deletion, the deletion is complete

The purpose of the dictionary tree

With so many operations of the dictionary tree, I believe you have a general understanding of the purpose of the dictionary tree, the most important role of the dictionary tree is for == string matching ==, prefix matching (fuzzy search), string lookup (dictionary) and so on.

Prefix matching (fuzzy search)

The blogger typed in just four keywords, and the search list returned a number of options led by trickle

String lookup (dictionary)

Just as the name suggests, it is a simple dictionary, not many examples.

meaning

Dictionary trees are built to make the insertion and lookup of strings efficient by using the space-for-time idea and the common prefixes of strings to reduce invalid string comparisons. Its insertion or search time is O(n), where n is the length of the string.

Dictionary trees have their downsides, of course. When inserted words don’t have many common prefixes, the construction of a dictionary tree becomes complicated and inefficient.

Code implementation (Java + Go)

Java version

import java.util.*;

public class TrieTree {
	class TrieNode{
		public TrieNode[] subNode;
		public int count;// Number of child nodes of this node
		public boolean isWord;// Word marker bits
		public TrieNode(a) {
			this.count = 0;
			this.isWord = false;
			this.subNode = new TrieNode[26];// Contains only lowercase letters a to Z}}public TrieNode root;
	public TrieTree(a) {
		root = new TrieNode();
	}
	
	public boolean search(String word) {
		TrieNode curNode = root;
		int index;
		for(int i = 0; i < word.length(); i++) {
			index = word.charAt(i)-'a';
			if(curNode.subNode[index]! =null) {
				curNode = curNode.subNode[index];
			}else{
				return false; }}return curNode.isWord;
	}
	
	public void insert(String word) {
		if(search(word)) {
			System.out.println("The word already exists.");
			return;
		}
		TrieNode node = root;
		int index;
		for(int i = 0; i < word.length(); i++) {
			index = word.charAt(i) - 'a';
			if(node.subNode[index]==null) {
				node.subNode[index]= new TrieNode();
			}
			node.count++;
			node = node.subNode[index];
		}
		node.isWord = true;
	}
	
	
	public void delete(String word) {
		if(! search(word)) { System.out.println("No such word.");
			return;
		}
		TrieNode node = root;
		LinkedList<Integer> indexList = new LinkedList();
		LinkedList<TrieNode> nodeList = new LinkedList();
		int index;
		for(int i = 0; i < word.length(); i++) {
			index = word.charAt(i) - 'a';
			indexList.add(index);
			nodeList.add(node);
			node = node.subNode[index];
		}
		for(int i = word.length() - 1; i >= 0; i--) {
			node = nodeList.pollLast();
			index = indexList.pollLast();
			if(node.subNode[index].subNode == null) {
				if(i ! = word.length() -1) {
					if(node.subNode[index].isWord == true) {// If the prefix node has a word marker bit, then the deletion is not continued
						return;
					}
				}
				node.subNode[index] = null;
				node.count--;
			}
			if(i == word.length()-1) {
				if(node.subNode[index].subNode ! =null) {
					node.subNode[index].isWord = false;
					return; }}}}public static void main(String[] args) {
		TrieTree myTrieTree = new TrieTree();
		String[] words = {"hello"."face"."hi"."hell"."why"};
		// Insert string
		for(String word : words)
			myTrieTree.insert(word);
		
		// Insert duplicate string
		myTrieTree.insert("hello");
		// Delete the string
		myTrieTree.delete("hell");
		// Deduplicates the string
		myTrieTree.delete("hell");
		myTrieTree.delete("hi");
		// Query the string, true if found, false if not found
		System.out.println(myTrieTree.search("hello"));
		System.out.println(myTrieTree.search("hi"));
		System.out.println(myTrieTree.search("hell")); }}Copy the code

Go version

package main

import (
	"fmt"
	)

type TrieNode struct {
	children []*TrieNode
	count int
	isWord bool
}
func createTrieNode(a) *TrieNode{
	return &TrieNode{
		children: make([]*TrieNode, 26),
		count:    0,
		isWord:   false,}}type TrieTree struct {
	root *TrieNode
}

func createTrieTree(a) *TrieTree{
	root := createTrieNode()
	return &TrieTree{
		root: root,
	}
}

type method interface {
	search(word string) bool
	insert(word string)
	delete(word string)}func(tireTree *TrieTree) search(word string) bool{
	tmpNode := tireTree.root
	for _,w := range word {
		index := int(w - 'a')
		iftmpNode.children[index] ! =nil{
			tmpNode = tmpNode.children[index]
		}else{
			return false}}return tmpNode.isWord
}

func(tireTree *TrieTree) insert(word string) {
	if tireTree.search(word){
		fmt.Println("The word already exists!")
	}
	tmpNode := tireTree.root
	for _,w := range word {
		index := int(w - 'a')
		if tmpNode.children[index] == nil{
			tmpNode.children[index] = createTrieNode()
			tmpNode.count++
		}
		tmpNode = tmpNode.children[index]
	}
	tmpNode.isWord = true
}

func(tireTree *TrieTree) delete(word string) {
	if! tireTree.search(word){ fmt.Println("No such word")
	}
	tmpNode := tireTree.root
	var (
		indexList []int
		nodeList  []*TrieNode
		)
	for _,w := range word {
		index := int(w - 'a')
		indexList = append(indexList, index)
		nodeList = append(nodeList, tmpNode)
		tmpNode = tmpNode.children[index]
	}

	for i := len(indexList)- 1; i >=0; i++ {
		index := indexList[i]
		tmpNode = nodeList[i]
		if tmpNode.children[index].children == nil{
			ifi ! =len(indexList) - 1 {
				if tmpNode.children[index].isWord {
					return
				}
			}
			tmpNode.children[index] = nil
			tmpNode.count--
		}
		if i == len(indexList) - 1{
			iftmpNode.children[index].children ! =nil{
				tmpNode.children[index].isWord = false
				return}}}}func main(a){
	words := []string{"hello"."face"."hi"."hell"."why"}
	myTrieTree := createTrieTree()
	// Insert string
	for _, word := range words {
		myTrieTree.insert(word)
	}

	// Insert duplicate string
	myTrieTree.insert("hello")
	// Delete the string
	myTrieTree.delete("hell")
	// Deduplicates the string
	myTrieTree.delete("hell")
	myTrieTree.delete("hi")
	// Query the string, true if found, false if not found
	fmt.Println(myTrieTree.search("hello"))
	fmt.Println(myTrieTree.search("hi"))
	fmt.Println(myTrieTree.search("hell"))}Copy the code

conclusion

The dictionary tree is not very difficult, but it is a very useful data structure that can be learned to solve some problems with string matching and common prefixes.

Of course, as we’ve said, dictionary trees have their drawbacks. Because of the space for time, when you have a bunch of words with very few common prefixes, the space requirement is very large.