Updated on 11 March 2020:

  1. Add links to cited papers
  2. Optimize some details

This article is mainly compiled from the paper-interpretation videos “Graph Convolutional Network Models on Networks” and “Semi-supervised Classification Based on Graph Convolutional Networks”.

Prerequisites: convolutional neural networks (CNNs), fundamentals of linear algebra

Introduction to the speakers:

  1. Xin Ruyue: Master’s student, College of Systems Science, Beijing Normal University; co-editor of the book Approaching 2050. Research interests: complex-network classifiers, deep learning for complex networks, GCN, etc.
  2. Gao Fei: Ph.D. candidate, College of Systems Science, Beijing Normal University. Research interest: deep learning in complex networks

This article mainly introduces the graph convolutional network (GCN), that is, applying the methods of convolutional neural networks (CNNs) to graph data.

What is graph data, and why process it?

At present, machine learning has developed very well in many fields, especially images and language. The classic representative models are CNN and RNN, along with models built on them, and they perform well in tasks such as image recognition and speech recognition.

However, these models all operate on regular, Euclidean data: images are regular two-dimensional grid data, and text or audio are regular one-dimensional sequence data. Without such regularity, it is hard for these models to perform well. Yet much of the data we encounter daily is irregular, non-Euclidean data: social networks, transportation networks, power grids, the brain’s nervous system, weather systems, and so on. We urgently need better ways to process such data and to accomplish more complex tasks.

There are also traditional methods for processing irregular data, but they are currently limited to small datasets; once the amount of data reaches a certain scale, they become very hard to apply. With the advent of deep learning, people began to think about how to apply it to irregular data, and there has been a great deal of such research in recent years under the name geometric deep learning. The following article is an overview of processing non-Euclidean data: Bronstein M. M., Bruna J., et al., “Geometric deep learning: going beyond Euclidean data,” IEEE Signal Processing Magazine (2017).

According to that article, non-Euclidean data can be divided into manifold data and graph data. Since our group has long studied complex networks, we pay little attention to manifold data and focus mainly on graph-structured data, i.e., network data.

Application of machine learning to graph data

When applying machine learning to graph-structured data, the first thing that comes to mind is, of course, extending the classical RNN and CNN. Here is a brief mention of the RNN extension: the Gated Graph Sequence Neural Network proposed by Li et al. in 2015. It processes graph-structured data following the RNN idea, treating every node on the graph as a neuron.

One advantage of using an RNN is that it is good at handling temporal data. The downside is that it is inefficient on very large networks.

Li Y., Tarlow D., Brockschmidt M., “Gated Graph Sequence Neural Networks,” arXiv:1511.05493 (2015).

Our research team mainly focuses on using CNN ideas to process graph data. This line of research attracts more attention, and its methods tend to work better. The rest of this article therefore introduces how to use CNNs to process graph-structured data, an approach also known as the graph convolutional network (GCN).

The application of CNN on the graph — GCN

Next, we introduce GCN by focusing on the following paper, which establishes the theoretical basis of GCN and mainly solves how to define the convolution operation on a topological graph: Defferrard M., Bresson X., Vandergheynst P., “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering,” NIPS 2016.

Background

CNN is mainly used for image classification. Take the handwritten-digit dataset MNIST as an example: given an input image, the model tells you which digit appears in it (classification). CNNs achieve very high accuracy on image classification, and they also perform well on recognition and detection tasks.

In this paper, the final experiment uses GCN to perform a basic classification task on the MNIST dataset. There are two kinds of classification tasks on graph data: graph classification and node classification. The input of a graph classification task is multiple graphs, and the output is the category of each graph. The input of a node classification task is one network, and the output is the label of each node in the network.

We will focus on graph classification here, because it is the closest analogue of image classification.

Let’s explain how GCN works. Since GCN can be said to be an extension of CNN, let’s first briefly review the network architecture of CNN.

For a classification task performed with a CNN:

  1. The input is a set of images, and the output is a category for each image.
  2. The layer-by-layer operation in the middle of the figure repeatedly applies the convolution + pooling process.
  3. Convolution strengthens image feature information, while pooling extracts features and reduces dimensionality while preserving image features.
  4. Finally, several fully connected layers map the features to the output.

The most important part of the whole process is the convolution operation using the convolution kernel and the pooling process.

First, the convolution layer defines a convolution filter (kernel). The kernel scans the image continuously, convolving with each scanned region, and the result can be regarded as a feature map of the image. Convolution helps us find specific local image features (such as enhanced edges).
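As a minimal sketch of this scanning process (a toy NumPy implementation, not tied to any framework; note that deep-learning libraries actually compute cross-correlation and call it convolution), a 3 × 3 edge-detecting kernel slides over a small image:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image, summing element-wise products ('valid' mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a sharp vertical edge, and a vertical-edge-detecting kernel
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)
print(conv2d(image, kernel))  # every 2x2 output entry is 3: a strong edge response
```

Each output entry is the sum of element-wise products between the kernel and the image patch beneath it; here every valid position straddles the vertical edge, so all responses are equally strong.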


The pooling layer can be understood as a zooming operation on the image. Its purpose is to reduce the feature dimension and the number of training parameters, and to eliminate redundant information, while preserving the original image features.
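A hedged sketch of 2 × 2 max pooling in NumPy (a simplified stand-in for the pooling layers in real CNN frameworks):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Downsample by taking the max over non-overlapping size x size windows."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # drop any ragged edge
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

features = np.array([[1, 3, 2, 0],
                     [4, 2, 1, 1],
                     [0, 1, 5, 6],
                     [2, 2, 7, 8]], dtype=float)
print(max_pool2d(features))  # 4x4 -> 2x2; each entry keeps the local maximum
```

The strongest activation in each window survives, which is exactly the "keep the features, shrink the dimensions" behaviour described above.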

Having covered CNN, let’s look at the GCN structure in this article. Again, we use it to perform a classification task.

GCN input and output

By analogy with the gray value of each pixel in an image, each node in graph data also has a “signal value” serving as the node’s feature. Therefore, in a GCN graph classification task, the input is a set of graphs (including the signal values of their nodes), and the output is the category of each graph.

Graph classification still needs to be considered case by case; detailed node information is not required in every case. In citation networks, for example, graph classification is mainly determined by node information, so node information must be part of the input. In other cases the node information is insufficient, or the classification of the graph is mainly determined by its topology, so node information is not a necessary input.

Let’s use the graph classification task as an example to visualize how the “signal” here is transmitted.

As shown in the figure above, each red dot represents a node in the network, and the blue columns represent the signals carried by the nodes. Each node has its own signal value, that is, a node state, which can be represented as a vector. Suppose we have M networks, each with N nodes. If the vector representing a node’s state is D-dimensional, then the input to the model is the signal values of the nodes of these networks, i.e., a matrix of size M × N × D.

The corresponding output is the label of each network, and each category can also be represented by a vector. Assuming the output vector representing the category is D′-dimensional, the output of the model is a matrix of size M × D′. All the model has to do is map the inputs to the outputs.
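The shape bookkeeping above can be sketched as follows. The mean-then-linear “readout” is purely illustrative (a real GCN would apply graph convolutions first), and all sizes are made-up examples:

```python
import numpy as np

M, N, D, D_out = 5, 10, 4, 3   # hypothetical: 5 graphs, 10 nodes each, 4 features, 3 classes
X = np.random.rand(M, N, D)    # input: node signal values for each graph

W = np.random.rand(D, D_out)   # stand-in for learned parameters
# trivially simple readout: average node features per graph, then a linear map
logits = X.mean(axis=1) @ W
print(logits.shape)            # (5, 3): one D'-dimensional output per graph
```

Whatever the internal layers look like, the model is a map from the M × N × D input to the M × D′ output.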

The convolution operation on the graph

The convolution in GCN is similar to that in CNN: it extracts features of the input nodes, which are finally mapped to the graph’s classification. The graph convolution operation also defines a convolution kernel (filter). The biggest difference from CNN is that, in an image, each pixel is connected to a fixed number of neighbouring pixels, so CNN’s filter can have a fixed size; in non-Euclidean data, the degrees of nodes differ. For GCN, the primary task is therefore to define a “convolution kernel” that can accomplish feature extraction.

Here, inspired mainly by the field of graph signal processing, spectral graph theory is used to realize convolution on a topological graph. The researchers suggest thinking of convolution on the graph by analogy with ordinary signal processing: by the convolution theorem, a convolution in the time domain can be computed by transforming the signals into the frequency domain and multiplying them pointwise.

Fourier transform: decomposing a signal into a linear combination of basis wave functions (a discrete sum for periodic signals, a continuous integral in general). This is the process of the Fourier transform (spectral decomposition).

By the convolution theorem, the convolution of two functions is the inverse Fourier transform of the product of their Fourier transforms.
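The convolution theorem can be checked numerically. This sketch compares a direct circular convolution with the transform, pointwise-multiply, inverse-transform route using NumPy’s FFT (the signals are arbitrary toy data):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.0, 0.5, 0.0])
n = len(f)

# Direct circular convolution: (f * g)[i] = sum_k f[k] g[(i - k) mod n]
direct = np.array([sum(f[k] * g[(i - k) % n] for k in range(n)) for i in range(n)])

# Via the convolution theorem: transform, multiply pointwise, transform back
spectral = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(direct, spectral))  # True: both routes agree
```

On a graph, the discrete Fourier basis is replaced by the eigenvectors of the graph Laplacian, but the same multiply-in-the-frequency-domain trick applies.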

What’s the benefit? It means that, instead of defining an explicit convolution kernel as in image tasks, we only need to define the kernel’s Fourier transform, i.e., its response in the frequency domain.

For more details on the mathematics of spectral decomposition, read this paper.

Shuman D. I., Narang S. K., Frossard P., “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” IEEE Signal Processing Magazine (2013).

This paper applies spectral graph theory and uses Chebyshev polynomials to define the convolution. The definition of graph convolution has evolved considerably since then, but this definition remains a classic. What we later usually call “the GCN” is, to some extent, Kipf’s simplification of this earlier work: building on the spectral-decomposition approach to graph convolution, it achieved good results on downstream classification tasks.

GCN: Kipf T. N., Welling M., “Semi-Supervised Classification with Graph Convolutional Networks” (2016).
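As an illustrative sketch of Kipf and Welling’s propagation rule, H′ = σ(D̂^(−1/2) Â D̂^(−1/2) H W) with Â = A + I (a toy NumPy version; the graph, features, and random untrained weights are all made up):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One Kipf-Welling propagation step: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0)       # ReLU nonlinearity

# Toy graph: a triangle (nodes 0-2) plus a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)                                  # one-hot node features
W = np.random.rand(4, 2)                       # untrained weights, random here
print(gcn_layer(A, H, W).shape)                # (4, 2): 2 new features per node
```

Each layer thus averages (with degree normalization) every node’s features with its neighbours’, then applies a learned linear map: a localized, fixed-size-free analogue of the CNN kernel.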

Coarsening and pooling

Coarsening and pooling come next. These two steps, the equivalent of CNN’s pooling operation, aim to retain features while improving training efficiency. Coarsening can be understood as simplifying the structure of the graph: the Graclus algorithm is used to find similar nodes by comparing edge weights and merge them, yielding a new graph that can be considered to retain the structural information of the original. The node signals are then fused on the simplified graph, i.e., pooled.
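A greatly simplified sketch of one coarsening level via greedy heavy-edge matching (the paper actually uses the Graclus heuristic, which is more involved; the function and the weighted toy graph below are illustrative only):

```python
import numpy as np

def coarsen_once(A):
    """Pair each unmatched node with its heaviest unmatched neighbour,
    then merge each pair into a supernode (a stand-in for Graclus)."""
    n = A.shape[0]
    cluster = -np.ones(n, dtype=int)
    next_id = 0
    for u in range(n):
        if cluster[u] != -1:
            continue
        cluster[u] = next_id
        # heaviest still-unmatched neighbour of u, if any
        nbrs = [v for v in range(n) if A[u, v] > 0 and cluster[v] == -1]
        if nbrs:
            cluster[max(nbrs, key=lambda v: A[u, v])] = next_id
        next_id += 1
    # coarsened adjacency: sum edge weights between clusters, drop self-weights
    P = np.zeros((n, next_id))
    P[np.arange(n), cluster] = 1
    A_coarse = P.T @ A @ P
    np.fill_diagonal(A_coarse, 0)
    return A_coarse, cluster

A = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 2],
              [0, 1, 2, 0]], dtype=float)
A_coarse, cluster = coarsen_once(A)
print(A_coarse.shape)  # 4 nodes merge into 2 supernodes -> (2, 2)
```

Node signals would then be pooled (e.g., max-pooled) within each supernode, mirroring CNN pooling on the simplified graph.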


The effect of GCN

GCN can be considered a generalization of CNN: it is not limited to the image domain, but extends to networks with more complex structure and more complex data, enabling more tasks to be completed.

Subsequent studies and extensions of GCN have shown that GCN can reach the level of CNN even in the image domain. For example, the experiment in this paper applies the GCN method to the MNIST dataset; compared with CNN, GCN achieves almost the same accuracy on the recognition task, and it also performs well on other classification tasks.


Conclusion

For tasks based on graph data, the graph structure matters as much as the node features. To exploit the information contained in the graph topology, GCN borrows CNN’s convolution operation to aggregate neighbour information, defining convolution on graphs with inspiration from spectral graph theory. Further exploration of convolution on graphs followed, simplifying and extending GCN. The development of the GCN family of models gives us a very good means of handling graph data. Since graph data is ubiquitous in the real world, the development of GCN may also mean that our future exploration of the world will be deeper and more thorough.