Graph convolutional neural network from text classification

“Graph neural networks have been hot for so long that it’s time to master them.”

This article contains the following contents, reading time 10min\

What does a graph neural network mean
How does text construct diagrams
Graph convolutional neural network
Source code implementation
Recent progress in graph convolutional neural networks

This article reads basics \

Fundamentals of neural networks
This article does not cover the mathematical derivation of the Laplace matrix

–

What is a graph neural network

Neural networks have dominated machine learning in the past few years. For example, the success of convolutional neural network (CNN) in the field of image recognition and the success of recurrent neural network (LSTM) in the field of text recognition. For the image, the computer quantizes it into a multidimensional matrix; For text, document sentences can also be quantified into regular matrix representation by word embedding. Deep learning technology represented by neural network has been successfully applied to these standardized data. But in real life there are many irregular data in the form of graphs. For example, the connection between people in the social relationship graph, or the relationship between people and goods in the e-commerce system, etc., these data structures look like the following:

The actor-movie relationship is located in the graph data of Neo4j

Graph Neural Network (GNN) refers to the general name of the model applied by Neural Network on Graph. There are five categories of Graph Neural Network: Graph Convolution Networks GCN, Graph Attention Networks, Graph Autoencoders, Graph Generative Networks and Graph space-time network Spatial and temporal Networks). This paper only focuses on introducing the most classical and meaningful basic model GCN.

Professor Sun Maosong and his team from Tsinghua University published A paper Graph Neural Networks: A Review of Methods and Applications on arXiv. The author made A detailed and comprehensive Review of existing GNN models.

— – \

How does text construct diagrams

We’re going to build a graph with n well-defined nodes and m edges. \

Take the classic task of sorting. I have five different machine learning books in my drawer, with a total of a chapters and b different types of words (not the number of words, but all kinds of words) in each book. Then we can tag a chapter and b words with unique ids, n=a+b nodes, which are the nodes of our graph.

The edge of the create

We have two types of nodes, chapters and words. Then edges are constructed from chapter-word relationships and word-word relationships. For the side section-word, the weight of the side is reused by the TF-IDF algorithm of the word in the chapter, which can better represent the relationship between the word and the chapter. This algorithm works better than using word frequency directly [1]. The weight of the edges of the word-word relationship depends on the co-occurrence relationship of the word. We can calculate the relationship between two words by smoothing the contents of five books with a fixed-width slide window, similar to the training sampling process for word2vector. The specific algorithm is implemented by PMI algorithm. \

Point-wise Mutual Information (PMI) is a popular algorithm for calculating the relationship between two words. We can use it to calculate the weights of two word nodes. The weight calculation formula of nodes I and J is as follows:

PMI(I, j) is calculated as follows: \

#W(I) represents the number of all sliders that contain the word node I.

#W(i; J) represents the number of word nodes I and j in all slide Windows.

#W is the total number of slides

If THE PMI value is regular, it indicates that the two words are highly correlated; if the PMI value is negative, it indicates that the correlation is not high. Therefore, in the final graph construction process, only edges composed of positive pairs of word nodes are reserved.

The nodes and edges of the graph are determined, and then we introduce how to apply graph convolutional neural network for some learning applications.

A 2019 AAAI paper used this method to classify chapters. Graph Convolutional Networks for Text Classification

— – \

Graph convolutional neural network

Graph Convolutional neural Network (GCN) is a kind of neural Network that adopts Graph convolution. It has been developed into numerous versions based on the simplest Graph convolution, and its status in the field of Graph Network is just as that of convolution operation in image processing.

What is convolution \

The essence of discrete convolution is a weighted sum.

www.zhihu.com/question/22…

Diagram of convolution process \

The essence of convolution in CNN is to extract spatial features by calculating the weighted sum of central pixels and adjacent pixels by using kernel, a filter with shared parameters. The weighted coefficient is the weight coefficient of the convolution kernel. The weight coefficient of convolution kernel is optimized iteratively by BP algorithm. It is through optimization that the parameters of convolution kernel can realize feature extraction. An important point of GCN theory is to introduce convolution parameters that can be optimized to realize feature acquisition of graph structure data.

Graph structure in social networks

The goal of graph convolution is similar, hoping to learn a node representation that depends on each node and its neighboring nodes. The node representation can then be output as a classification task, which is often referred to as node classification.

The definition of the figure

For the figure ， Is the set of nodes,Is the set of edges, for each nodeBoth have their own characteristicsYou can use a matrixSaid. Among themRepresents the number of nodes,Represents the feature number of each node, which can also be said to be the dimension of feature vector.

So what is there to measure nodesA neighbor nodeWhat about this relationship?Laplace matrix. For a simple example, for the left figure below, its degree matrix, adjacency matrixAnd the Laplace matrixThe degree matrix is shown in the figure below.Only the value on the diagonal is the degree of the corresponding node, and the rest is 0. Adjacency matrix1 only between two nodes with edge connections, 0 everywhere else; Laplace matrix 为 . This is a simpler Laplace matrix.

Various representations of graph structured data

Here are the highlights

The propagation formula of the first layer of graph convolution network (GCN) is as follows: \

ρ is an activation function, such as ReLU.

So let’s think about the adjacency matrixARepresents the topological structure and dimension of the graphN*N, N represents the number of nodes;

X is the eigenmatrix of the input of the first layer, dimension N*M, M represents the eigenvector dimension of each node;

Wo is the weight parameter matrix, dimension M*K, K represents the vector dimension transferred to the next layer.

So the vector dimension of the first output L1 is N times K. \

In the text categorization task described above,

X is the original input, and we represent it as an identity matrix with a diagonal of one, dimension N by N; Is a one-hot representation of a node. The parameter Wo adopts is N*K random initialization (K=200),. \

The dimension of XWo is N*200, which means that each input node is added with 200 dimension. \

How do I understand the matrix multiplication of A times XWo? That’s the key to understanding graph convolution. Reviewing the matrix multiplication formula, it is found that the K dimensions of each node of the newly generated L1 N*K matrix are the sum of the weights of adjacent nodes of corresponding nodes multiplied by the values of adjacent nodes in this dimension. Thus, through one convolution, GCN can make each node have the information of its neighbor node.

When the adjacency matrix of graph is multiplied by graph node embedding, it equals to do a convolution

Below I draw a diagram \

Conclusion: All the newly generated vectors of node 0 are equal weighted sum of the vectors of adjacent node 1 and node 3. In this way, the convolution (weighted sum) of surrounding nodes is realized to obtain a new self. \

(The first row of adjacency matrix A, 0 1 0 1, indicates that node 0 is connected to node 1 and node 3, but not to node 2)\

If you want a node to have more information about the surrounding nodes, you can convolve multiple times.

The aboveThere are two disadvantages to using an adjacency matrix instead.

The influence of the node on itself is not considered, because the diagonal of the adjacency matrix is 0.
Adjacency matrixIt is not normalized, which may cause problems when extracting graph features, such as nodes with more neighbors tend to have more influence.

So the more common formula is:

Also known as normalized Symmetric Adjacency matrix. For understanding of this formula, please refer to [1]

— – \

Pytorch code implementation

Some people understand it better when they look at the code. The model definition of two-layer graph convolutional network is introduced below:

class gcn(nn.Module) :
    def __init__(self, X_size, A_hat, args, bias=True): # X_size = num features
        super(gcn, self).__init__()
        self.A_hat = torch.tensor(A_hat, requires_grad=False).float()
        self.weight = nn.parameter.Parameter(torch.FloatTensor(X_size, args.hidden_size_1))
        var = 2. / (self.weight.size(1) +self.weight.size(0))
        self.weight.data.normal_(0,var)
        self.weight2 = nn.parameter.Parameter(torch.FloatTensor(args.hidden_size_1, args.hidden_size_2))
        var2 = 2. / (self.weight2.size(1) +self.weight2.size(0))
        self.weight2.data.normal_(0,var2)
        if bias:
            self.bias = nn.parameter.Parameter(torch.FloatTensor(args.hidden_size_1))
            self.bias.data.normal_(0,var)
            self.bias2 = nn.parameter.Parameter(torch.FloatTensor(args.hidden_size_2))
            self.bias2.data.normal_(0,var2)
        else:
            self.register_parameter("bias", None)
        self.fc1 = nn.Linear(args.hidden_size_2, args.num_classes)


    def forward(self, X): ### 2-layer GCN architecture
        X = torch.mm(X, self.weight)
        if self.bias is not None:
            X = (X + self.bias)
        X = F.relu(torch.mm(self.A_hat, X))
        X = torch.mm(X, self.weight2)
        if self.bias2 is not None:
            X = (X + self.bias2)
        X = F.relu(torch.mm(self.A_hat, X))
        return self.fc1(X)
Args. Hidden_size_1 = 200;
Args. Hidden_size_2 = 20;
# args.num_classes=5
Copy the code

There are 100 chapter nodes and 5000 word nodes in the first 5 book chapters and word nodes. Each chapter node is labeled to which book it belongs. There are five categories. It is hoped that the network will learn which book the remaining 50 chapters belong to by marking and training the labels of 50 chapters. It belongs to semi-supervised learning.

— – \

Recent progress in graph convolutional neural networks

The writing basis of this paper is Graph Convolutional Networks for Text Classification, which is derived from AAAI2019 and uses GCN for Text Classification. At THE AAAI2020 conference, scholars from Tsinghua University IFLYtek proposed the Tensor Graph Convolutional Networks for Text Classification. By using texts to form multiple Graph structures, Further improve the performance of text classification.

In the New Year of 2020, what are the new development possibilities of graph neural network GNN? Share a PPT of AAAI2020 explaining GNN in detail, which answers these questions well.

link

Cse.msu.edu/~mayao4/tut…

\ \

[1] Alternative interpretation

zhuanlan.zhihu.com/p/89503068

[2] Alternative interpretation

www.zhihu.com/question/54…

[3] Good writing on digging

Yao, Liang, Chengsheng Mao, and Yuan Luo. “Graph convolutional networks for text classification.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.

[4] A graph convolution classification project code

Github.com/plkmo/Bible…

Machine Learning Online Manual Deep Learning online Manual AI Basics download (PDF updated to25Set) site QQ group1003271085To join the wechat group, please reply to "add group" to get a discount station knowledge planet coupon, please reply to "knowledge planet" like the article, click on itCopy the code

Graph convolutional neural network from text classification

Related Posts

Dubbo source code analysis (nine) load balancing algorithm

Logback Abnormal logs are monitored and alarms are generated

Spring Boot 2.X(14) : Logback