
GCNII (ICML 2020) Paper Sharing: Graph Convolutional Networks via Initial Residual and Identity Mapping

1. Motivation

In computer vision, a CNN can learn richer feature information as its layers deepen; stacking 64 or even 128 layers is routine and achieves better results than a shallow network.

Graph convolutional networks (GCNs) are a deep learning method for graph-structured data, yet most GCN models are shallow: GCN and GAT, for example, achieve their best results with only two layers. As the network deepens, the representations of neighboring nodes become more and more similar until nodes can no longer be distinguished; this is the over-smoothing problem.

In the figure above, test accuracy on Cora drops steadily as the model gets deeper. To quantify this, a smoothness metric SMV has been proposed for over-smoothing, as shown in the following formula:

$$\mathrm{SMV}(\mathbf{H}) = \frac{1}{n(n-1)} \sum_{i \neq j} \left\lVert \mathbf{h}_i - \mathbf{h}_j \right\rVert_2$$

It measures the (normalized) sum of Euclidean distances between every pair of node representations in the graph: the smaller SMV becomes during learning, the more severe the over-smoothing, and when SMV = 0 all node representations in the graph are identical. The figure also shows that SMV keeps shrinking as the layers deepen.
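To make the metric concrete, here is a minimal NumPy sketch of the smoothness computation described above; the function name and the mean-over-pairs normalization are my own assumptions, not code from the paper:

```python
import numpy as np

def smoothness_metric(H: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between node representations.

    H: (n, d) matrix, one row per node. A value near 0 means the
    representations have collapsed, i.e. severe over-smoothing.
    """
    n = H.shape[0]
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(H ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * H @ H.T
    d = np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negative rounding errors
    # Average over the n(n-1) ordered pairs with i != j (diagonal is 0)
    return float(d.sum() / (n * (n - 1)))
```

Tracking this value after each layer reproduces the trend in the figure: it shrinks toward zero as depth grows.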

2. Method

To solve the over-smoothing problem in deep GCNs, GCNII proposes two simple techniques, initial residual and identity mapping, which together allow GCNs to be made deep without the usual degradation.

1. Initial Residual

Residual connections have long been one of the most common techniques for relieving over-smoothing. A traditional GCN layer with a residual connection can be expressed as:

$$\mathbf{H}^{(\ell+1)} = \sigma\left(\tilde{\mathbf{P}}\,\mathbf{H}^{(\ell)}\mathbf{W}^{(\ell)}\right) + \mathbf{H}^{(\ell)}$$

where $\tilde{\mathbf{P}} = \tilde{\mathbf{D}}^{-1/2}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1/2}$ is the normalized adjacency matrix with self-loops.

GCNII's initial residual does not take information from the previous layer; instead, it builds a residual connection back to the initial representation $\mathbf{H}^{(0)}$ and weights the two parts with a hyperparameter $\alpha_\ell$:

$$\mathbf{H}^{(\ell+1)} = \sigma\left(\left((1-\alpha_\ell)\,\tilde{\mathbf{P}}\mathbf{H}^{(\ell)} + \alpha_\ell\,\mathbf{H}^{(0)}\right)\mathbf{W}^{(\ell)}\right)$$

Here the initial representation $\mathbf{H}^{(0)}$ is not the raw input feature matrix itself, but the result of passing the input features through a linear transformation (a fully connected layer).
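To make the contrast concrete, here is a minimal PyTorch sketch of the two update rules above, assuming dense tensors and a precomputed normalized adjacency `P_hat` (the $\tilde{\mathbf{P}}$ above); the function names and the alpha default are illustrative assumptions, not the official GCNII code:

```python
import torch
import torch.nn.functional as F

def gcn_residual_layer(P_hat, H, W):
    """Plain GCN layer plus a residual connection to the previous layer."""
    return F.relu(P_hat @ H @ W) + H

def initial_residual_layer(P_hat, H, H0, W, alpha=0.1):
    """GCNII-style initial residual: mix propagated features with H0,
    the (linearly transformed) initial representation."""
    support = (1.0 - alpha) * (P_hat @ H) + alpha * H0
    return F.relu(support @ W)
```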

However, initial residual was not first proposed by GCNII; the idea appeared earlier in APPNP (ICLR 2019).

2. Identity Mapping

Using initial residual alone only eases over-smoothing, so GCNII also borrows the identity-mapping idea from ResNet. Where initial residual interpolates between the current layer's representation and the initial representation, identity mapping interpolates between the weight matrix $\mathbf{W}^{(\ell)}$ and the identity matrix $\mathbf{I}$, giving the full GCNII layer:

$$\mathbf{H}^{(\ell+1)} = \sigma\left(\left((1-\alpha_\ell)\,\tilde{\mathbf{P}}\mathbf{H}^{(\ell)} + \alpha_\ell\,\mathbf{H}^{(0)}\right)\left((1-\beta_\ell)\,\mathbf{I} + \beta_\ell\,\mathbf{W}^{(\ell)}\right)\right)$$

In the formula above, the first bracket is the initial residual and the second is the identity mapping; $\alpha_\ell$ and $\beta_\ell$ are hyperparameters (the paper sets $\beta_\ell = \log(\lambda/\ell + 1)$ so that deeper layers stay closer to an identity mapping). GCNII also gives a theoretical argument for why identity mapping helps alleviate over-smoothing in deep GNNs; in short, it speeds up the model's convergence and reduces the loss of useful information.
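Putting both tricks together, here is a minimal PyTorch sketch of one GCNII layer following the equation above; `P_hat` is the precomputed normalized adjacency with self-loops (dense here for simplicity), and the class name and defaults are illustrative rather than the authors' implementation:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNIILayer(nn.Module):
    """One GCNII layer: initial residual (alpha) + identity mapping (beta)."""

    def __init__(self, dim: int, layer_idx: int, alpha: float = 0.1,
                 lam: float = 0.5):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.alpha = alpha
        # beta_l = log(lambda / l + 1): decays toward 0 as l grows, so deep
        # layers stay close to an identity mapping. layer_idx starts at 1.
        self.beta = math.log(lam / layer_idx + 1.0)

    def forward(self, P_hat: torch.Tensor, H: torch.Tensor,
                H0: torch.Tensor) -> torch.Tensor:
        # Initial residual: mix propagated features with the initial H0.
        support = (1.0 - self.alpha) * (P_hat @ H) + self.alpha * H0
        # Identity mapping: apply (1 - beta) * I + beta * W to the support.
        out = (1.0 - self.beta) * support + self.beta * self.W(support)
        return F.relu(out)
```

Stacking such layers, all sharing the same $\mathbf{H}^{(0)}$ (the linearly transformed input features), yields the full model.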

3. Conclusion

1. Experimental data

The experiments use three citation datasets, Cora, Citeseer, and PubMed, all homogeneous graphs commonly used in transductive learning tasks. The three datasets share the same storage format, each consisting of the following eight files (the standard Planetoid split): ind.{name}.x, ind.{name}.y, ind.{name}.tx, ind.{name}.ty, ind.{name}.allx, ind.{name}.ally, ind.{name}.graph, and ind.{name}.test.index.
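For reference, a common way to load these datasets is PyTorch Geometric's Planetoid wrapper, which downloads and parses the eight files automatically; this is a standard library call, not the paper's own data pipeline:

```python
from torch_geometric.datasets import Planetoid

# Downloads ind.cora.{x, y, tx, ty, allx, ally, graph, test.index} on first use
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]  # a single graph with train/val/test masks

print(data.num_nodes, dataset.num_classes)  # 2708 nodes, 7 classes for Cora
```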

2. Experimental results

Deep-model experiments were run on Cora, Citeseer, and PubMed. As the network deepens, GCNII's accuracy does not decline the way it does for traditional GNNs; instead it improves with depth, indicating that the over-smoothing problem of deep GNNs is effectively relieved.