Authors
Yufeng Zhang*, Xueli Yu*, Zeyu Cui, Shu Wu, Zhongzhen Wen and Liang Wang
Overview
Graph neural networks (GNNs) have recently been applied to text classification. However, existing models neither capture the contextual word relationships within each document nor support inductive learning of new words. In this work, the authors build an individual graph for each document and learn text-level word interactions with a model called TextING. Experiments on four benchmark datasets show that the method outperforms state-of-the-art text classification approaches.
Motivation
Text classification is a building block for other NLP tasks such as sentiment analysis and intent detection. Traditional methods include Naive Bayes, k-nearest neighbors, and support vector machines, but they rely largely on hand-crafted features, at the cost of labor and efficiency.
To address this, a variety of deep learning methods have been proposed, among which recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are the most fundamental. However, both focus on local word sequences and therefore miss long-distance, discontinuous word interactions.
In recent years, graph-based approaches have been used to tackle this problem by treating text not as a sequence but as a set of co-occurring words. These approaches, however, have two major drawbacks. First, they ignore the context-aware word relationships within each document. Second, because they build a single global graph over the corpus, test documents must already be present during training. They are therefore inherently transductive, which makes inductive learning difficult.
Model
This paper proposes a text classification method based on graph neural networks. In contrast to earlier approaches built on a global corpus-level graph, the authors train a GNN on the training documents alone to describe fine-grained word-word relationships, then generalize it to new documents at test time. Each document is an independent graph, so word relationships can be learned at the document level, and the model also handles words that never appear during training.
The model has three key parts: graph construction, graph-based word interaction, and a readout function. The architecture is shown in the figure below.
(1) Graph construction. The authors use a sliding window of length 3 to capture co-occurrence relationships between words and build an individual graph for each document.
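To make step (1) concrete, here is a minimal Python sketch of per-document graph construction. The whitespace tokenization, self-loops, and binary (unweighted) edges are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def build_graph(tokens, window=3):
    """Build the word graph of one document: nodes are unique words,
    and two words are connected if they co-occur within a sliding window."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    adj = np.eye(len(vocab))  # self-loops keep each node's own features
    for start in range(max(len(tokens) - window + 1, 1)):
        span = tokens[start:start + window]
        for i, wi in enumerate(span):
            for wj in span[i + 1:]:
                adj[index[wi], index[wj]] = 1.0
                adj[index[wj], index[wi]] = 1.0
    return vocab, adj

# Each document yields its own small graph, independent of the corpus.
words, A = build_graph("the movie was not that bad at all".split())
```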
(2) Graph-based word interaction. The authors use a Gated Graph Neural Network (GGNN) to update the word-node embeddings within each document graph.
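A minimal numpy sketch of one GGNN propagation step with GRU-style gating. The weight names and the omission of bias terms are simplifications meant to illustrate the mechanism, not to reproduce the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, adj, p):
    """One gated propagation step over a document graph.
    h: (n, d) node embeddings, adj: (n, n) adjacency, p: weight matrices.
    Neighbours are aggregated through adj, then a GRU-style gate decides
    how much of the incoming information each node keeps."""
    a = adj @ h @ p["Wa"]                      # message passing
    z = sigmoid(a @ p["Wz"] + h @ p["Uz"])     # update gate
    r = sigmoid(a @ p["Wr"] + h @ p["Ur"])     # reset gate
    h_new = np.tanh(a @ p["Wh"] + (r * h) @ p["Uh"])
    return h_new * z + h * (1.0 - z)           # gated combination

# Example with random weights (d = 4); stacking several steps lets
# information travel further along the graph.
rng = np.random.default_rng(0)
d = 4
p = {k: rng.normal(size=(d, d))
     for k in ["Wa", "Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}
h = ggnn_step(rng.normal(size=(3, d)), np.eye(3), p)
```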
(3) Readout function. Each word node is passed through two multilayer perceptrons (MLPs) to obtain its final representation, and these node representations are then aggregated into a representation of the whole graph, i.e., of the document.
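A sketch of the readout under the two-MLP reading of step (3): single-layer perceptrons stand in for the MLPs, and the mean-plus-max pooling follows the common graph-readout pattern; treat these details as assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def readout(h, W1, b1, W2, b2):
    """Pool the word nodes of one document into a single graph vector.
    One perceptron (W1, b1) scores each word's importance, the other
    (W2, b2) transforms its features; the gated features are combined
    by mean and max pooling."""
    attention = sigmoid(h @ W1 + b1)   # soft gate: how much each word matters
    features = np.tanh(h @ W2 + b2)    # transformed node features
    gated = attention * features
    return gated.mean(axis=0) + gated.max(axis=0)  # document representation
```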
The authors also propose a variant of the model, TextING-M, which combines the local (per-document) and global (corpus-level) graphs: the two models are trained separately and their predictions are interpolated with equal (1:1) weight. This variant is no longer inductive; the authors use it to examine whether the micro and macro perspectives complement each other.
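How the 1:1 combination might look in code; the probability vectors here are illustrative placeholders, not results from the paper.

```python
import numpy as np

# Hypothetical class-probability vectors from the two separately trained
# models: p_local from TextING's per-document graphs, p_global from a
# corpus-level graph model.
p_local = np.array([0.7, 0.3])
p_global = np.array([0.4, 0.6])

p_final = 0.5 * p_local + 0.5 * p_global  # equal-weight interpolation
predicted_class = int(p_final.argmax())
```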
Experiments
The authors split the training and validation sets at a ratio of 9:1. The learning rate is 0.01, dropout is 0.5, and the initial word features are 300-dimensional GloVe embeddings. For a fair comparison, the baseline models share the same embeddings. The results are shown in the table below.
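The reported hyperparameters, collected in one place for reference (a convenience summary of the text above, not the authors' actual configuration file).

```python
# Training setup as described in the text.
config = {
    "train_val_split": "9:1",  # training vs. validation
    "learning_rate": 0.01,
    "dropout": 0.5,
    "word_embedding": "GloVe",
    "embedding_dim": 300,
}
```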
The values are classification accuracies. Each experiment was run ten times, and the ± values give the spread across runs. TextING achieves the best result on every task. Its advantage over TextGCN is largest on MR: MR consists of short reviews, which makes TextGCN's corpus-level graph sparse and limits label message passing between document nodes, whereas TextING's individual graphs do not rely on that mechanism.