What I want to share this time is the 2014 paper "DeepWalk: Online Learning of Social Representations" [1] (reference code [2]). It is an early paper in the field of graph representation learning, and one that anyone studying graph learning cannot get around. Although the method itself is not technically difficult, the idea proposed in the paper is very original and interesting.

Thesis motivation and innovation

  • This paper proposes a representation learning method for graph structures. From a certain perspective, the development of machine learning and deep learning has revolved around representation learning.
  • In the field of natural language processing, word2vec (previously summarized in [3]) is a foundational and famous word representation method. It uses the co-occurrence and adjacency relationships between words in sentences to learn word representations, and the learned vectors can accurately capture the semantic relationships between words.
  • So in other, non-linguistic structures such as graphs, can we learn a representation of each node from the relationships between nodes and their neighbors? Obviously, we can. The paper proposes performing a random walk in the graph starting from any node; when the walk reaches its maximum length, we obtain a sequence of nodes, which is analogous to a sentence in natural language, with nodes analogous to words. Then the word2vec method is used to learn a representation for each node.
  • The proposed method is an unsupervised feature learning method. Specifically, it obtains node sequences from truncated random walks and uses word2vec to learn a representation vector for each node. The authors argue that vectors learned this way capture both the proximity similarity between nodes and the community (category) to which they belong.
  • The proposed method has the following properties: (1) Adaptability: real social networks evolve constantly, and this method can learn adaptively without repeating training from scratch. (2) Community awareness: the latent space learned by this method should reflect the distances between homogeneous or adjacent nodes in the network. (3) Low dimensionality: when labeled data is sparse, a low-dimensional representation generalizes better and makes training converge more easily. (4) Continuity: the learned representations are continuous, giving smooth decision boundaries and higher robustness.
  • Experimental results show that the proposed method outperforms other unsupervised representation learning methods when labeled data is insufficient.
  • Personally, I find some details of the experiment section unclear. For example, how is each graph constructed? What is the specific business meaning of the nodes and edges? The paper only reports general statistics, such as how many nodes and edges each dataset has.
  • In practical applications, nodes must be defined according to business logic, and edges connecting nodes must be defined according to some rules, so that a graph can be formed for representation learning.

Learning Social Representations

Graph definition

Take a multi-class classification task in a social network as an example:

Let $G = (V, E)$ denote the graph, where $V$ is the set of nodes (the members of the network) and $E \subseteq V \times V$ is the set of edges. A partially labeled graph is written $G_L = (V, E, X, Y)$, where $X \in \mathbb{R}^{|V| \times S}$ ($S$ is the size of each node's feature space) and $Y \in \mathbb{R}^{|V| \times |\mathcal{Y}|}$ ($\mathcal{Y}$ is the set of labels).

In practical applications, a graph must be built according to the specific logic of the business. Traditional machine learning maps $X$ to $Y$, and the model must learn this mapping accurately. The method proposed in this paper is independent of the distribution of labels: it learns the vector representation of nodes purely from the topology of the graph. It is a completely unsupervised learning method.

The representation vectors learned by this method can be used in any classification algorithm.

Random walk

Take any node as the starting node, and at each step walk to a randomly chosen neighboring node until the maximum walk length is reached. This truncated random walk offers two advantages (a minimal sketch follows below):

  • The local exploration is easy to parallelize: several different walkers can explore several different parts of the graph at the same time.
  • Information is obtained from truncated random walks, so when the graph structure changes slightly, there is no need to re-learn everything from scratch.

This makes the method very convenient in practical applications.
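As a minimal sketch of the truncated random walk, assuming the graph is stored as an adjacency list (a dict mapping each node to its neighbor list; all names here are illustrative, not from the paper's code):

```python
import random

def random_walk(graph, start, walk_length):
    """Truncated random walk: from `start`, repeatedly hop to a uniformly
    random neighbor until `walk_length` nodes have been visited."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:          # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk

def build_corpus(graph, num_walks, walk_length, seed=42):
    """Run `num_walks` passes over all nodes (shuffled each pass),
    yielding one walk per starting node -- the 'sentences' for word2vec."""
    random.seed(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(num_walks):
        random.shuffle(nodes)      # a fresh random order each pass
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length))
    return corpus

# toy adjacency list (illustrative only)
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = build_corpus(graph, num_walks=10, walk_length=5)
print(walks[0])  # e.g. [2, 3, 2, 0, 1]
```

Because each walk depends only on its own starting node, the outer loops parallelize trivially, which is the first advantage listed above.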

The long tail distribution

The paper refers to power laws, which are similar to long-tail distributions. In natural language, we find that most words have very low frequency and only a few words have very high frequency, which follows a long-tail distribution. Random walks on the YouTube social graph show that the visit frequency of nodes also follows this long-tail distribution. [Figure: vertex visit frequency in short random walks compared with word frequency in natural-language text, both following a power law.] Since the word frequency distribution is consistent with the node frequency distribution, the word2vec method, which is effective in natural language processing, can be reused on graph structures.
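To see the analogy concretely, one can count how often each node appears across the generated walks, reusing the `walks` list from the sketch above. On a large scale-free graph the sorted counts decay like a power law; the toy four-node graph is of course too small to show a real tail:

```python
from collections import Counter

# visit frequency of each node across all walks
freq = Counter(node for walk in walks for node in walk)

# sorted from most to least visited; on real social graphs this
# rank-frequency curve is heavy-tailed, like word frequencies
for node, count in freq.most_common():
    print(node, count)
```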

DeepWalk

[Figure: pseudocode of the DeepWalk and SkipGram algorithms from the paper.]

The two algorithm listings above explain the steps of DeepWalk well. SkipGram, one of the models in the word2vec method, uses a node to predict the nodes around it within a window. For each node $v_j$ in a walk, with window size $w$, the objective is to minimize

$$-\log \Pr\big(\{v_{j-w}, \dots, v_{j+w}\} \setminus v_j \,\big|\, \Phi(v_j)\big)$$

so that neighboring nodes can be predicted from the node itself. The by-product of this prediction task, the embedding matrix $\Phi$, is exactly what we want to learn.
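A minimal sketch of this pipeline using gensim's Word2Vec on the walks generated earlier (assuming gensim 4.x; `sg=1` selects skip-gram and `hs=1` hierarchical softmax, matching the paper's setup, though the hyperparameter values here are illustrative):

```python
from gensim.models import Word2Vec

# word2vec expects sentences as lists of string tokens
sentences = [[str(v) for v in walk] for walk in walks]

model = Word2Vec(
    sentences,
    vector_size=128,  # embedding dimension d
    window=5,         # context window w around each node
    sg=1,             # skip-gram: predict context nodes from the center node
    hs=1,             # hierarchical softmax instead of full softmax
    negative=0,       # disable negative sampling so only hs is used
    min_count=0,      # keep every node, even rarely visited ones
    workers=4,
)

embedding = model.wv[str(0)]  # learned representation vector of node 0
```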

Just as in natural language processing, Skip-Gram here faces high computational complexity: without any optimization, every update requires computing a probability over all nodes, which makes training very slow. Hierarchical softmax is therefore introduced.

Hierarchical softmax is a common technique in NLP; for details see [4]. The main idea is to build a Huffman tree based on word frequency. The leaves of the tree are the words of the vocabulary, with high-frequency words closer to the root. To compute the probability of generating a word, instead of a softmax over all words, we follow the path from the root to that word's leaf in the Huffman tree and multiply the probabilities of the binary decisions along the way, reducing the time complexity from O(N) to O(log N) (with N leaves, the tree depth is about log N). The same technique applies to the Skip-Gram model on graph structures.
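In the paper's notation, hierarchical softmax factorizes the probability of a context node $u_k$ given the center node's representation $\Phi(v_j)$ into binary decisions along the tree path, each modeled by a sigmoid classifier:

$$
\Pr(u_k \mid \Phi(v_j)) = \prod_{l=1}^{\lceil \log |V| \rceil} \Pr(b_l \mid \Phi(v_j)),
\qquad
\Pr(b_l \mid \Phi(v_j)) = \frac{1}{1 + e^{-\Phi(v_j) \cdot \Psi(b_l)}}
$$

where $(b_0, b_1, \dots, b_{\lceil \log |V| \rceil})$ is the path from the root $b_0$ to the leaf $b_{\lceil \log |V| \rceil} = u_k$, and $\Psi(b_l)$ is the parameter vector of the binary classifier assigned to the parent of tree node $b_l$.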

The experiment

Three datasets were used in the paper: BlogCatalog, Flickr, and YouTube.

However, the paper does not explain in detail the actual business meaning of the nodes and edges in the graphs formed by these three datasets. For example, what is the specific meaning of a node in each dataset? By what rules are the edges defined?

In the experiments, some samples are randomly selected as the training set (with labels) and the rest as the test set. The representation vector of each node is learned in the way described above (DeepWalk + Skip-Gram + hierarchical softmax). Then an LR (logistic regression) classifier is trained, using the learned representation vectors as features, and evaluated on the test set. [Table: classification results on the three datasets for varying fractions of labeled nodes.] As the results show, when labeled training data is sparse, DeepWalk performs better than the other methods.
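A hedged sketch of that evaluation protocol with scikit-learn, reusing `graph` and `model` from the sketches above; the `labels` array here is a hypothetical toy labeling, not data from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score

# stack each node's embedding into a feature matrix
nodes = sorted(graph)
X = np.stack([model.wv[str(v)] for v in nodes])
labels = np.array([0, 0, 1, 1])  # hypothetical labels for the 4-node toy graph

# hold out part of the nodes as the labeled training set
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, train_size=0.5, random_state=0, stratify=labels)

# one-vs-rest logistic regression on the frozen embeddings
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("micro-F1:", f1_score(y_test, clf.predict(X_test), average="micro"))
```

Note that the embeddings themselves are learned unsupervised over the whole graph; only the downstream classifier sees the labels, which is what makes the label-sparse regime favorable to DeepWalk.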

Personal summary

The method proposed in this paper is not difficult, and word2vec is a very basic method, but applying this simple, effective method to graph structures is interesting and meaningful, and it gives word2vec a much wider range of applications.

References

[1] DeepWalk: www.perozzi.net/publication…

[2] CODE: github.com/phanein/dee…

[3] Previously summarized word2vec: blog.csdn.net/Mr_tyting/a…

[4] Hierarchical Softmax: blog.csdn.net/itplus/arti…
