figure

Graphs are a common structure, both in data structure algorithms and probabilistic graphs in machine learning. A graph is a graph composed of several vertices and the edges connecting two vertices. It can be used to describe a certain relationship between some things.

Directed acyclic graph

Directed Acyclic Graph (Directed Acyclic Graph) belongs to Directed Graph, and there is no ring in the Graph structure, which can be used to represent the dependencies between events.

Trie tree

A Trie is a search tree whose key is a string, from which a value can be found. Can achieve efficient query and insert, time complexity is O(K), the disadvantage is memory consumption. Its core idea is to reduce unnecessary character comparison, make the query efficient, that is, space for time, use common prefix to improve the query efficiency.

The root node of the Trie tree contains no characters. The string that links the root node to the path of a node is the string corresponding to that node. Each node contains only one character.

For example, add five words to the Trie tree as shown in the following figure.

TrieTree tree = new TrieTree();
tree.put("America");
tree.put("Beautiful");
tree.put("Gold");
tree.put("Gold");
tree.put("Emperor");
Copy the code

Double array Trie

Trie trees can be implemented in many ways, the main difference is the storage structure, different storage structure will lead to different space. There are fixed – length array, variable – length list and so on.

One way to achieve a good level of performance and footprint is the binary Trie tree. A double array means that it is stored in two arrays, base and check, which are parallel to each other.

The base and check arrays record all transition states. If a character C is received and the transition from state S to state T is performed, the conditions in the even-number group are as follows:

check[base[s] + c] = s
base[s] + c = t
Copy the code

After all the states are established according to the existing dictionary, the subsequent query can judge whether the word search has been completed directly according to the state conversion between words and the value of the check array. The main function of the base array is to judge whether there is some state conversion from word to word. The check array, on the other hand, is mainly used to determine whether a word is complete.

Directed acyclic graph action

Such as for such a task: there is a dictionary, according to the dictionary to find a passage contains all of the possible word, so you can build several path from start to finish, and every word corresponding edge may have different weights, and how will this article word segmentation depends on which path which path to walk to the end of the value of the minimum or maximum probability.

Building this directed acyclic graph involves traversing a large number of data sets, which is why the binary Trie tree was introduced. By handing over the time-consuming search to the binary Trie tree, a directed acyclic graph can be created by violent matching from start to finish.

Continue to optimize?

The introduction of AC automata should enable further optimization.

github

https://github.com/sea-boat/TextAnalyzer/blob/master/src/main/java/com/seaboat/text/analyzer/data/structure/DAGModel.jav a

————- Recommended reading ————

Summary of my open Source projects (Machine & Deep Learning, NLP, Network IO, AIML, mysql protocol, Chatbot)

Why to write “Analysis of Tomcat Kernel Design”

My 2017 article summary – Machine learning

My 2017 article summary – Java and Middleware

My 2017 article summary – Deep learning

My 2017 article summary — JDK source code article

My 2017 article summary – Natural Language Processing

My 2017 Article Round-up — Java Concurrent Article


Talk to me, ask me questions:

Welcome to: