
Introduction

Since you have clicked on this blog, I assume you already understand the most basic concepts of graphs, including but not limited to the definitions of directed and undirected graphs. I won't go into those details here; I'll only cover the important parts.

A graph neural network (GNN) is a neural network that operates directly on graph structure. A typical application of GNNs is node classification: essentially, each node in the graph is associated with a label. As shown in the figure below, node A has a vector representation such as [0.3, 0.02, 7, 4, …], and this vector is sent to a downstream classifier to obtain node A's category.

How exactly the vector representation of node A is obtained will be elaborated later. For now, we can simply assume that the vector representation of node A must be related to its adjacent nodes and adjacent edges. Assuming the one-hot vector representation of each node $v$ is $X_v$, then


$$h_v = f(X_v,\; X_{co[v]},\; h_{ne[v]},\; X_{ne[v]})$$

$X_{co[v]}$ denotes the one-hot representation of the edges connected to $v$, $h_{ne[v]}$ denotes the embeddings of the nodes adjacent to $v$, and $X_{ne[v]}$ denotes the one-hot representation of the nodes adjacent to $v$. Finally, the vector $h_v$ and the true label are used to compute the loss. In some formulations, however, $h_v$ and $X_v$ are fused once more before the loss is computed, for example:


$$\begin{aligned} O_v &= g(h_v, X_v)\\ loss &= \sum_{i=1}^p (t_i - o_i) \end{aligned}$$

The formulas and notation above may be confusing, but a helpful analogy is Word2Vec: in Word2Vec, the input is the one-hot vector of each word, i.e. $X_v$, and the output is the corresponding embedding, i.e. $h_v$.
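
To make the notation more concrete, here is a minimal numerical sketch of one aggregation round in Python. The toy graph, the feature dimensions, and the choice of $f$ (a sum of linear maps followed by a tanh) are all invented for illustration and are not taken from any particular GNN model:

```python
import numpy as np

# Toy illustration of h_v = f(X_v, X_co[v], h_ne[v], X_ne[v]).
# The graph, feature sizes, and the choice of f are made up for demonstration only.
num_nodes, feat_dim, edge_dim, hid_dim = 4, 4, 2, 8
rng = np.random.default_rng(0)

X = np.eye(num_nodes, feat_dim)                  # one-hot node features X_v
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}     # undirected toy graph
edge_feat = {}                                   # toy edge features X_co[v] (random here, one-hot in the text)
for v, nbrs in adj.items():
    for u in nbrs:
        edge_feat[(v, u)] = rng.random(edge_dim)

h = rng.random((num_nodes, hid_dim))             # current node embeddings h_v

# random parameters of f (one "round" of message passing)
W_self, W_edge = rng.random((feat_dim, hid_dim)), rng.random((edge_dim, hid_dim))
W_nbr_h, W_nbr_x = rng.random((hid_dim, hid_dim)), rng.random((feat_dim, hid_dim))

def update(v):
    """h_v = f(X_v, X_co[v], h_ne[v], X_ne[v]) with f = tanh of summed linear maps."""
    msg = X[v] @ W_self
    for u in adj[v]:
        msg += edge_feat[(v, u)] @ W_edge        # incident edge features
        msg += h[u] @ W_nbr_h                    # neighbour embeddings
        msg += X[u] @ W_nbr_x                    # neighbour one-hot features
    return np.tanh(msg)

h_new = np.stack([update(v) for v in range(num_nodes)])
print(h_new.shape)   # (4, 8): one updated embedding per node
```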

DeepWalk: the first unsupervised node embedding algorithm

DeepWalk can be summarised in one sentence: randomly generate sequences of graph nodes and run Word2Vec on those sequences.

Specifically, given a graph, we randomly select a starting node and then repeatedly “walk” to a randomly chosen neighbor until the node sequence reaches a given maximum length. For example, in the figure below, d, e and f are selected as starting points, producing three node sequences.

In this setting, nodes and node sequences can be regarded as “words” and “sentences” respectively, and the embedding of each node can then be trained with the Skip-Gram or CBOW algorithm.
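
As a rough sketch of this idea, the following snippet generates uniform random walks over a toy adjacency list and feeds them into gensim's Word2Vec as a Skip-Gram model. The graph, walk length, number of walks, and all hyperparameters are illustrative; parameter names follow gensim 4.x:

```python
import random
from gensim.models import Word2Vec  # assumes gensim 4.x is installed

# Toy undirected graph as an adjacency list; node names are arbitrary.
adj = {
    "a": ["b", "c"], "b": ["a", "c", "d"],
    "c": ["a", "b"], "d": ["b", "e"], "e": ["d"],
}

def random_walk(start, walk_length=10):
    """Uniform random walk: at each step jump to a random neighbour."""
    walk = [start]
    while len(walk) < walk_length:
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# Generate several walks from every node and treat them as "sentences".
walks = [random_walk(v) for v in adj for _ in range(20)]

# Skip-Gram (sg=1) Word2Vec over the node sequences; hyperparameters are illustrative.
model = Word2Vec(sentences=walks, vector_size=32, window=5, min_count=1, sg=1, epochs=5)
print(model.wv["a"][:5])           # first 5 dimensions of node a's embedding
print(model.wv.most_similar("a"))  # nodes with the most similar embeddings
```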

Node2vec: biased random walk

The general random walk formula


$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \frac{1}{|N(v)|}, & \text{if } (v, x) \in E \\ 0, & \text{otherwise} \end{cases}$$

where $|N(v)|$ is the number of neighbors of node $v$, $c_{i-1}$ is the current node, and $c_i$ is the next node to be selected.

General random walks have the following problems:

  1. If the graph is weighted, it does not take the edge weights into account
  2. The walk is completely random, so the model has no way to learn which direction of walking is better

The walks in the figure above can actually be divided into two categories, namely DFS and BFS.

In order to introduce edge weights and to choose between DFS and BFS probabilistically, we first need to modify the general random walk formula:


$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \frac{\pi_{vx}}{Z}, & \text{if } (v, x) \in E \\ 0, & \text{otherwise} \end{cases}$$

Here $\frac{\pi_{vx}}{Z}$ plays the same role as $\frac{1}{|N(v)|}$ above, and $Z$ can be interpreted as a normalization factor.

$\pi_{vx}$ can be rewritten as $\alpha_{pq}(t, x) \cdot w_{vx}$, where $w_{vx}$ is the weight of edge $(v, x)$:


$$\alpha_{pq}(t, x) = \begin{cases} \frac{1}{p}, & \text{if } d_{tx} = 0 \\ 1, & \text{if } d_{tx} = 1 \\ \frac{1}{q}, & \text{if } d_{tx} = 2 \end{cases}$$

$d_{tx}$ denotes the distance between node $t$ (the previously visited node) and node $x$, a first-order neighbor of the current node $v$.

  • $p$: controls the probability that the random walk turns back to the previous node
  • $q$: controls whether the random walk is biased towards DFS or BFS
    • When $q$ is large ($q > 1$), the walk tends towards BFS
    • When $q$ is small ($q < 1$), the walk tends towards DFS
  • When $p = q = 1$, $\pi_{vx} = w_{vx}$, i.e. the walk degenerates to an ordinary weighted random walk
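
The following snippet is a minimal sketch of how these rules could be turned into a biased walk. The toy weighted graph, the parameter values, and the helper names (alpha, biased_walk) are all hypothetical, and the normalization factor $Z$ is handled implicitly by random.choices:

```python
import random

# Toy weighted, undirected graph: adj[v] maps each neighbour x to the edge weight w_vx.
adj = {
    "a": {"b": 1.0, "c": 2.0}, "b": {"a": 1.0, "c": 1.0, "d": 3.0},
    "c": {"a": 2.0, "b": 1.0}, "d": {"b": 3.0, "e": 1.0}, "e": {"d": 1.0},
}

def alpha(t, x, p, q):
    """alpha_pq(t, x) based on the distance d_tx between the previous node t and candidate x."""
    if x == t:                 # d_tx = 0: x is the previous node itself (turn back)
        return 1.0 / p
    if t in adj[x]:            # d_tx = 1: x is also a neighbour of t (BFS-like move)
        return 1.0
    return 1.0 / q             # d_tx = 2: x moves further away from t (DFS-like move)

def biased_walk(start, walk_length=10, p=1.0, q=1.0):
    """node2vec-style walk: pi_vx = alpha_pq(t, x) * w_vx, normalised over neighbours of v."""
    walk = [start]
    while len(walk) < walk_length:
        v = walk[-1]
        nbrs = list(adj[v])
        if len(walk) == 1:     # first step: no previous node, fall back to a weighted choice
            weights = [adj[v][x] for x in nbrs]
        else:
            t = walk[-2]
            weights = [alpha(t, x, p, q) * adj[v][x] for x in nbrs]  # unnormalised pi_vx
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])   # Z is implicit here
    return walk

print(biased_walk("a", p=0.5, q=2.0))  # p < 1 encourages backtracking, q > 1 favours BFS
```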

As far as embedding algorithms for graphs are concerned, there are many better ones, such as LINE and struc2vec. However, I feel that embedding algorithms are not the core part of GNNs; they merely serve as tools, much like the role of word embeddings in the Transformer.
