“What is a graph convolutional network?

Since 2012, deep learning has enjoyed great success in fields such as computer vision and natural language processing, and is now the most common approach in artificial intelligence. Convolutional neural network is suitable for processing data in Euclidean domain. For example, the position relationship of pixels in an image is very clear. Therefore, the output of the current layer can be calculated by row by row scanning with a convolutional template of fixed size (such as a 3×3 matrix).

Image source: http://xilinx.eetrend.com/d6-xilinx/article/2017-01/10849.html

In the image field, pixels are closely arranged and have a clear position relationship. However, in real life, many data, such as the relationship between users in social networks, the relationship between molecules in chemical structures, do not have the same strict correlation as pixels.

Image source: http://www.sohu.com/a/131115660_489557

These data cannot be described directly by pixel matrix, but can be described by graph. For example, users can be abstracted as vertices (V), which are connected with each other by lines. Graph G=(E,V) is a topological structure based on vertices and lines. For Graph data, the number of adjacent vertices of each vertex is uncertain, and processing Graph data directly with Convolutional neural Network will cause great problems. Therefore, Graph Convolutional Network (GCN) is proposed by some scholars to deal with topological data.

In order to realize the convolution calculation on the graph, we use the adjacency matrix of the graph to represent the relationship between nodes, and introduce the Laplacian operator as the basis vector of convolution. For the specific derivation process, please refer to [1].

Graph convolutional network is a very effective graph processing model. Usually, the two-layer graph convolutional network obtained by random initialization can effectively extract feature representation in graph vertices. Each layer of graph convolutional network processes the graph according to the topological structure: first, each vertex transforms its own characteristic information and sends it to its neighbor; Secondly, each vertex gathers the feature information transmitted by adjacent vertices. Finally, the vertex information is transformed nonlinear, which is similar to the activation layer in the convolutional neural network. The structure of graph convolutional networks is usually shown in the following figure.

Photo: www.jianshu.com/p/89fbed65c…

“Algorithms for text classification in graph convolutional networks

The algorithm proposed by Yao et al. [2] can process text classification simply and efficiently through graph convolution network. Firstly, a large text atlas is constructed according to the training data. With word and document as vertices, word-word edges and document-word edges are established. Then, adjacency matrix A is established according to the graph defined above.

A is NXN dimensional square matrix, n represents the number of all vertices in the graph, that is, the sum of the total number of documents and words. Each element (I, j) in A represents the weight between vertex I and vertex J. Generally, the more correlated the vertices are, the greater the value of the element is. If there is no correlation between vertices, the element at this position is 0. To describe the relationship between vertices, edges are defined using the following four rules:

1. Connect the edge of the document and the word, word frequency in the document – reverse text frequency (TF-IDF) as the weight, which indicates the importance of a word in a text;

2. The edges of words are connected, and the mutual information between words and words (PMI) is used as the weight, which represents the correlation between words. The specific calculation method is to scan documents with a sliding window of fixed length, count and normalize the frequency of words that simultaneously appear in the sliding window;

3. The relation between vertices and themselves, i.e. the elements on the diagonal of matrix A, is denoted as 1;

4. Other cases, such as the weight between documents, are marked as 0.

After the adjacency matrix is defined, the Laplacian matrix A’ of the text graph can be easily derived according to the algorithm introduced in [1].

Where, W0 is the model parameter obtained through training. ρ is the ReLU activation function. The relation of convolution of adjacent layers is

The graph convolutional network “refines” the input data layer by layer, and classifies the text by classifier.

Where Z represents the output category information, softmax is the most commonly used multi-class classifier in machine learning.

During training, parameters W0, W1…… in the model are learned with the training data set. , Wj, and then during the test, input the text data to be classified, and output the text classification result through graph convolutional network.

“Text classification experiments for graph convolutional networks

In order to test the effect of graph convolutional network (GCN) in text classification, we found the corpus of Chinese and English FAQ in some projects to do classification, and compared with the classification results of FastText. Among them, fasttext is a very mature fasttext classification algorithm at present, and also one of the commonly used algorithms in FAQ projects.

In the absence of the pre-training model, the default initial word vector and sentence vector of GCN are 0. After the training, the model will automatically generate the vector of each word and sentence.

Reference to the original

[1] zhuanlan.zhihu.com/p/54505069

[2] Yao L, Mao C, Luo Y. Graph convolutional networks for text classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2019, 33: 7370-7377.