Dependency parsing is a key problem in the field of natural language processing. As a grammatical formalism within syntactic analysis, dependency parsing has become a hot research topic in recent years and is increasingly being applied to other natural language processing tasks. On the evening of March 21st, Peima.com invited Hu Xiang, who holds a master's degree in Computer Science and Information from Shanghai Jiaotong University and has worked at UBS and Ant Financial, to share the research progress on dependency parsing and some of its mainstream methods.
Here’s what we shared:
I. The main problems of natural language understanding
Let's start by looking at some of the problems that the field of natural language understanding is trying to solve. The field is very broad: information retrieval, intelligent question answering, sentiment analysis, machine translation, and so on are all difficult, open problems. Today we will mainly discuss semantic representation and knowledge representation.
Let's start with a simple example of semantic representation. The two sentences in the figure below look almost identical to current automatic semantic matching models, yet humans understand them as having completely different meanings. Why?
From the perspective of semantics, the meaning of a sentence has levels and main elements, which in academic terms form a semantic frame, i.e. what we colloquially call "subject, predicate, and object". Not every sentence fits the subject-verb-object pattern, but let us take a subject-verb-object sentence as an example.
The subject-verb-object pattern can be understood as three different types of slots in the sentence, where each slot can be filled with only one word or phrase. Once we structure the two sentences this way, the differences between them become obvious, as the sketch below illustrates.
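As a minimal sketch (the example pair below is hypothetical, since the original slide is not reproduced here), slot-level structure cleanly separates two sentences that share nearly all their words:

    # A hypothetical pair of near-identical sentences, structured as
    # subject / predicate / object slots. The word overlap is high,
    # but the slot assignments differ, so the meanings differ.
    sentence_a = {"subject": "the dog", "predicate": "bit", "object": "the man"}
    sentence_b = {"subject": "the man", "predicate": "bit", "object": "the dog"}

    # A bag-of-words comparison sees the two sentences as identical...
    assert set(sentence_a.values()) == set(sentence_b.values())
    # ...while the structured (slot-level) comparison does not.
    assert sentence_a != sentence_b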
So the question becomes: is there a structured representation that effectively solves the problem of semantic representation? Thinking about it carefully, there are several difficulties. First, if such a structure exists, it must be universal. Second, and harder, is how to get a computer to translate a serialized sentence into such a structure.
II. Several mainstream theories of natural language understanding
Below, we discuss the current solutions to, and thinking about, these two problems. There are three main theories of natural language understanding.
1. Phrase structure:
This theory was put forward by Chomsky. In this syntactic structure, a non-terminal can produce at most two non-terminals, or a single terminal.
Non-terminal symbols are content-free labels such as "S, NP, VP" in the figure. In the tree structure, they have child nodes beneath them; nodes with child nodes are called non-terminals.
Terminal symbols are the concrete content words in the figure, such as "she, bought, car"; they are the leaf nodes of the tree.
At the sentence level, a sentence can be expressed as a binary tree according to its semantic hierarchy, i.e. the tightness of the connections between its parts, as shown in the figure above.
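As a sketch, such a phrase-structure tree can be written directly as nested structures in code; the exact tree shape below is an assumption built from the example words "she, bought, car" mentioned above:

    # A phrase-structure (constituency) tree as nested tuples:
    # (label, children...). Non-terminals (S, NP, VP) carry labels only;
    # terminals ("she", "bought", ...) are the leaves.
    tree = ("S",
            ("NP", "she"),
            ("VP",
             ("V", "bought"),
             ("NP", ("Det", "a"), ("N", "car"))))

    def leaves(node):
        """Collect the terminal symbols (content words) of the tree."""
        if isinstance(node, str):
            return [node]
        label, *children = node
        return [w for child in children for w in leaves(child)]

    print(leaves(tree))  # ['she', 'bought', 'a', 'car']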
So what is wrong with this syntactic structure? Look at the figure below. First, it imposes a strong ordering requirement. Second, it is not a universal framework for all languages: it relies on language-specific rules and generalizes poorly. Finally, it conveys limited semantic information.
2. Dependency structure (dependency syntax):
It is simpler and more intuitive in form, because it expresses grammatical relations through directed edges between words.
In the example above, we can see the edge SBJ, which marks the subject of the sentence; OBJ, which marks the object; and NMOD, which marks a modifier of a noun. The relationship between words is defined by the type of the edge, and there are many edge types: more than 30 at present.
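As a sketch (again assuming the running example "she bought a car"), a dependency parse can be stored simply as a list of labeled, directed edges:

    # A dependency parse as (head, label, dependent) triples.
    # The edge label encodes the grammatical relation between the words.
    arcs = [
        ("bought", "SBJ",  "she"),   # "she" is the subject of "bought"
        ("bought", "OBJ",  "car"),   # "car" is the object of "bought"
        ("car",    "NMOD", "a"),     # "a" modifies the noun "car"
    ]

    for head, label, dep in arcs:
        print(f"{head} --{label}--> {dep}")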
3. Frame semantics:
Compared with the previous two theories, frame semantics places more emphasis on semantic and knowledge representation. This theory holds that a complete expression should be combined with background knowledge. The word "eat", for example, cannot be discussed independently of knowledge about how the word is used; it should be placed within a full semantic structure. The same is true of the word "transfer".
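As an illustrative sketch in the style of FrameNet (the specific frame and role names below are assumptions for illustration, not taken from the talk), the word "transfer" might evoke a frame whose roles must be filled from context:

    # A FrameNet-style frame for "transfer": the word only makes full
    # sense once its roles (background knowledge) are filled in.
    transfer_frame = {
        "frame": "Transfer",
        "evoked_by": "transfer",
        "roles": {
            "Donor":     "who gives",
            "Recipient": "who receives",
            "Theme":     "what is transferred",
        },
    }

    # One possible instantiation from a concrete sentence.
    instance = {"Donor": "Alice", "Recipient": "Bob", "Theme": "$100"}
    missing = set(transfer_frame["roles"]) - set(instance)
    print("unfilled roles:", missing or "none")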
III. The significance of dependency parsing
Dependency parsing is our main topic today, so let us discuss its significance. In fact, this may be a difficult topic, since dependency parsing is relatively basic research in the field of natural language processing, but its practical effects mainly include the following aspects:
It can be seen that Tree-LSTM has no obvious advantage over Bi-LSTM, so whether structuring is necessary remains a point of controversy. In Hu Xiang's view, research on dependency parsing amounts to revealing the nature of language: its findings are useful for understanding natural language, though not necessarily useful in practical applications.
IV. Dependency parser structure
In the traditional dependency parser architecture, several constraints apply:
Under these constraints, the natural result is a tree. The controversial point when constructing a dependency structure is whether crossing edges (non-projectivity) are allowed. Allowing crossing edges makes the sentence representation more semantically expressive, but subsequent parsing becomes harder.
With the specification of dependency syntax understood, we now introduce several mainstream parsing methods. A parser teaches the computer to automatically translate a sentence into the specified structure, so we need methods that help the computer narrow its search space, using context or certain operations, to build a tree efficiently.
1. Transition-based parser:
This is the classic method. Transition-based parsing essentially defines a set of operations; we apply those operations step by step until the sentence has been turned into a tree, which makes it fairly intuitive.
This parsing method is based on state transitions. Let's walk through the following example to visualize the process:
The shift operation takes the token (or partially built structure) at the front of the input buffer I and pushes it onto the top of the stack S on the left.
Next, look at the left-arc operation in Figure 3. This operation takes one element from I and one element from S, draws a left arc connecting them, and adds the arc to A on the far right. A is the record of the operations performed (the arc set).
A left arc is formed between "she" and "bought". The meaning of the left arc is that the token the arc points to is the child node, while the other token is the parent node. Here "bought" is pushed back onto the stack as the parent node, and the operations are repeated until the buffer I is empty, at which point the process terminates.
A records the operations performed during this process; by reading off the arcs in A, we can assemble the tree.
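A minimal executable sketch of these transitions follows. The exact transition system varies across papers (arc-standard, arc-eager, and so on); here we assume a simple variant in which left-arc and right-arc connect the top of the stack S with the front of the buffer I, and we drive it with a hand-written "gold" operation sequence for "she bought a car":

    # Transition-based parsing sketch: stack S, buffer I, arc set A.
    S, I, A = [], ["she", "bought", "a", "car"], []

    def shift():
        S.append(I.pop(0))            # move front of buffer onto the stack

    def left_arc(label):
        # head = front of buffer, dependent = top of stack
        # (the arc points at the child, as described above)
        A.append((I[0], label, S.pop()))

    def right_arc(label):
        # head = top of stack, dependent = front of buffer
        A.append((S[-1], label, I.pop(0)))

    # Hand-written gold sequence (normally predicted by a model).
    shift()              # S=[she]         I=[bought, a, car]
    left_arc("SBJ")      # bought -> she   S=[]         I=[bought, a, car]
    shift()              # S=[bought]      I=[a, car]
    shift()              # S=[bought, a]   I=[car]
    left_arc("NMOD")     # car -> a        S=[bought]   I=[car]
    right_arc("OBJ")     # bought -> car   S=[bought]   I=[]

    print(A)  # the arcs in A form the dependency tree rooted at "bought"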
In short, a transition-based parser merges words locally, step by step, until the whole sentence has been consumed. It has advantages in runtime performance and captures richer semantics.
2. Graph-based parser:
Compared with the local approach above, the graph-based parser is a global approach.
The idea is to turn each potential dependency relation into a score, regardless of the distance between the words, and then generate a spanning tree from the highest-scoring dependency paths. Its advantage is that it finds a globally optimal solution, so an early local error does not corrupt the final parse result.
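A minimal sketch of the global step, assuming toy edge scores for "she bought a car" (the scores below are made up for illustration) and using networkx's implementation of the Chu-Liu/Edmonds algorithm:

    import networkx as nx

    # Toy scores: (head, dependent) -> score; in practice a model produces these.
    scores = {
        ("ROOT", "bought"): 10.0,
        ("bought", "she"):   9.0,
        ("bought", "car"):   8.0,
        ("car", "a"):        7.0,
        ("she", "car"):      1.0,   # low-scoring distractor edges
        ("a", "she"):        0.5,
    }

    G = nx.DiGraph()
    for (head, dep), s in scores.items():
        G.add_edge(head, dep, weight=s)

    # Maximum spanning arborescence = the highest-scoring dependency tree.
    tree = nx.maximum_spanning_arborescence(G)
    print(sorted(tree.edges()))
    # [('ROOT', 'bought'), ('bought', 'car'), ('bought', 'she'), ('car', 'a')]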
V. Parser models based on deep learning
Finally, we introduce some deep learning parser models based on the theories above.
1. Transition-based methods:
The idea is to first turn a sentence into a syntax tree and then convert it into a dependency structure, which is a bit roundabout, but it works. It likewise defines a set of operations that are applied repeatedly over the sequence until the sequence forms a tree. There are three main operations:
This model treats parsing as an atypical sequence labeling problem of inserting open and close parentheses into the sequence. The role of deep learning in this model is to compose each span enclosed by an open and a close parenthesis into a higher-level representation.
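As a sketch of this bracket-insertion view (the bracket placement below is an assumption derived from the earlier example tree, not the talk's exact operations), a sentence can be linearized into a token sequence interleaved with parentheses:

    # Linearize the earlier nested-tuple tree into a bracket sequence:
    # each constituent is wrapped in "(" ... ")", so parsing becomes a
    # sequence problem of deciding where to insert parentheses.
    tree = ("S", ("NP", "she"),
                 ("VP", ("V", "bought"), ("NP", ("Det", "a"), ("N", "car"))))

    def linearize(node):
        if isinstance(node, str):
            return [node]
        label, *children = node
        return ["("] + [t for c in children for t in linearize(c)] + [")"]

    print(" ".join(linearize(tree)))
    # ( ( she ) ( ( bought ) ( ( a ) ( car ) ) ) )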
2. Graph-based model:
This model turns the tree-construction problem into a distance computation problem. Essentially, it builds a matrix whose rows and columns are the tokens of the sentence; the cell for a pair (A, B) holds the probability, or a score, of A modifying B or of B modifying A. Finally, to form a tree, we find a minimum spanning tree over this distance matrix, and the resulting dependency parse is generated.
Therefore, in the deep learning version of this model, the key step is computing the distance matrix. Likewise, it is the advent of word embeddings that lets us transform a sentence into continuous vectors.
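A minimal numpy sketch of such a scorer, in the spirit of biaffine attention (the dimensions and random embeddings are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    tokens = ["she", "bought", "a", "car"]
    d = 8                                    # embedding size (assumed)
    H = rng.normal(size=(len(tokens), d))    # token vectors; in practice
                                             # these come from an encoder
    W = rng.normal(size=(d, d))              # learned bilinear weights

    # scores[i, j] ~ how plausible it is that token i heads token j.
    scores = H @ W @ H.T
    np.fill_diagonal(scores, -np.inf)        # a token cannot head itself

    # Greedy head choice per token (no ROOT node here, for brevity).
    # A real parser would decode a valid tree instead, e.g. with the
    # spanning-tree step shown earlier.
    heads = scores.argmax(axis=0)
    for j, i in enumerate(heads):
        print(f"{tokens[i]} -> {tokens[j]}")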
So how much practical significance does dependency parsing have? We cannot deny its significance, but it has several major limitations:
So why is the parsing accuracy for Chinese so low? There are two main reasons:
To solve this problem, we designed a simplified annotation scheme:
How do we simplify it? We let the annotators mark the semantic structure and identify the semantic blocks inside it. The scheme still has defects, however, which we can only fall back on reinforcement learning to address.
The above is the main content of this live online broadcast. I believe you now have a deeper understanding of dependency parsing. For more details, you can follow the official account FMI Pegas.com and click the Pegas.com menu bar to learn more.