“This is the first day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021”

preface

Recently, I found the input reminder of intelligent question answering system interesting while using the company’s system. The specific effect is similar to baidu search when the text reminder. Today we’ll look at the implementation. The main data structure is the dictionary tree.

First look at the specific effect to achieve (here to Baidu search as an example) :

Read it: Think of the text you want to search for by typing it. Ideal for domain specific and knowledge base searches.

The body of the

Introduction to dictionary Tree

Dictionary tree, or trie. As the name suggests, it’s a tree that looks like a dictionary.

The dictionary tree has the following characteristics:

  • The root node contains no characters, and each node except the root node contains only one character.
  • From the root node to a node, the characters passing through the path are concatenated to be the string corresponding to the node.
  • All children of each node contain different characters.

Figure 1 is a dictionary tree 1->2->6->11 is a string ABA.

Dictionary tree implementation

See Letcode for an implementation

Application of dictionary tree

As the picture in the preface shows, can be used in the search of automatic completion. The following is a dictionary tree in Python, in addition to basic functions, complete the automatic recommendation of data.

class Trie:

    def __init__(self) :
        self.root = {}

        
    def insert(self, word: str) - >None:
        node = self.root
        for s in word:
            if s in node.keys():
                node = node[s]
            else:
                node[s] = {}
                node = node[s]
        node['is_word'] = True
                
    def search(self, word: str) - >bool:
        node = self.root
        for s in word:
            if s in node.keys():
                node = node[s]
            else:
                return False
        
        if 'is_word' in node.keys():
            return True
        else:
            return False
        

    def startsWith(self, prefix: str) - >bool:
        node = self.root
        for s in prefix:
            if s in node.keys():
                node = node[s]
            else:
                return False
        
        return True
    
    def startsWithWords(self, prefix: str) :
        node = self.root
        for s in prefix:
            if s in node.keys():
                node = node[s]
            else:
                return [""]
        res = []
        self.printWords(node, prefix, res)
        
        return res
    
    def printWords(self, node: dict, start: str, res) :
        if node.get('is_word'):
            res.append(start)
            return 
        for k in node.keys():
            self.printWords(node[k], start+k, res)
Copy the code

Insert data manually to test, the result is as follows:

t = Trie()
t.insert("Catering invoice")
t.insert("How do I get paid for overtime meals?")
t.insert("How to get taxi invoice")
t.insert("Standards for Travel expenses.")
t.insert("Invoice mailing")
t.insert("Requirements for f&B invoice types")
t.insert("How to write the invoice title")

print(t.startsWithWords("Invoice"))
# [' invoice mailing ', 'invoice heading ']
Copy the code

PS: here only demonstrate the implementation of the code, the implementation of the specific search box interested partners can package THEIR own API.

conclusion

  1. The above is just an introduction to the basic dictionary tree implementation. Here we recommend a Dictionary tree based on Google’s Python implementation

The function is similar to matching possible results based on the prefix:

import pygtrie 

t = pygtrie.StringTrie()
t['invoice'] = 'invoice'
t['Invoice/Express'] = 'Invoice Express'
t['Invoice/Reimbursement'] = 'Invoice reimbursement'
t['Invoice/Tax Rate'] = 'Invoice rate'

t['Catering/Invoice'] = 'Food and beverage Invoice'
t['Traffic/Invoice'] = 'Traffic invoice'

print(t.items(prefix='invoice')) 

# [(' invoice ', 'invoice'), (' invoice/delivery ', 'invoice delivery), (' invoice/reimbursement,' invoice reimbursement), (' invoice/tax ', 'tax invoice)]
Copy the code
  1. The dictionary tree implementation here does not consider the space complexity, for more dictionary tree types and implementation can refer to Trie tree.

  2. Letcode has a lot of dictionary tree implementation, interested partners can learn by themselves.