STGCN: Spatio-temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting

A reading of the paper “Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting”.

Please refer to the full paper together with the accompanying slides and code: STGCN-keras.

Problem definition

How to accurately perform medium- and long-term traffic forecasting (medium and long term: more than 30 minutes ahead).

The paper focuses on predicting the traffic speed at each sensor location.

Previous work

Before this paper, traffic prediction methods fell into several categories: dynamic modeling, data-driven methods, and, more recently, deep learning. However, these earlier methods treat the data as grid data, and their accuracy for medium- and long-term prediction is poor.

The solution proposed in this paper

A graph can be represented by an adjacency matrix; the W in the figure is the weighted adjacency matrix of the road graph. The PeMSD7(M) dataset used in the experiments has 228 sensor stations in total, which corresponds to a graph with 228 vertices. Since the model is used to predict speed, each vertex has only one feature: the speed.
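To make this concrete, here is a small sketch, not taken from the official code, of one way to build the weighted adjacency matrix W from pairwise road-network distances with a thresholded Gaussian kernel, which is the construction described in the STGCN paper; the values of sigma2 and epsilon below are placeholder assumptions.

```python
import numpy as np

def weighted_adjacency(dist, sigma2=10.0, epsilon=0.5):
    """Build a weighted adjacency matrix W from a pairwise distance matrix.

    dist: (n, n) array of road-network distances between sensor stations.
    Edges are weighted with a Gaussian kernel and thresholded to keep the
    graph sparse; sigma2 and epsilon are tunable and depend on how the
    distances are scaled.
    """
    w = np.exp(-dist ** 2 / sigma2)   # Gaussian kernel on distances
    w[w < epsilon] = 0.0              # drop weak connections
    np.fill_diagonal(w, 0.0)          # no self-loops in W itself
    return w

# PeMSD7(M): 228 stations -> W has shape (228, 228);
# the node signal (speed) has shape (batch, time_steps, 228, 1).
```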

A structure called the ST-Conv Block is then proposed to model the spatial and temporal dependencies jointly.

The proposed framework

For background on graph convolution, see the companion post “Understanding Graph Convolution Dynamically”.

(That post is still being polished…)

Graph CNNs for Extracting Spatial Features

Graph convolution is first used to capture the spatial correlations. The paper adopts the Chebyshev polynomial approximation and then its first-order approximation of the graph convolution. Looking only at the final first-order formula: D_hat is the degree matrix, and A_hat is the adjacency matrix plus the identity matrix (A_hat = A + I), so that during convolution not only the states of a node's neighbors but also the node's own state are taken into account. The layer then computes D_hat^{-1/2} A_hat D_hat^{-1/2} X Θ.
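As a minimal illustration of this first-order formula (a sketch, not the authors' implementation), the graph convolution on a single graph signal could look like this:

```python
import numpy as np

def first_order_gconv(x, adj, theta):
    """First-order (GCN-style) graph convolution, a minimal sketch.

    x:     (n_nodes, c_in)   node features (here c_in = 1, the speed)
    adj:   (n_nodes, n_nodes) adjacency matrix A
    theta: (c_in, c_out)     learnable weights
    """
    a_hat = adj + np.eye(adj.shape[0])           # A_hat = A + I: include self-loops
    d_hat = np.diag(a_hat.sum(axis=1) ** -0.5)   # D_hat^{-1/2}
    a_norm = d_hat @ a_hat @ d_hat               # symmetric normalization
    return a_norm @ x @ theta                    # aggregate neighbors + self, then project
```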

Gated CNNs for Extracting Temporal Features

In the time dimension, gated convolution is used to capture the temporal dependencies. Unlike ordinary convolution, causal convolution is used here because we are dealing with time series. Since the temporal modeling relies on convolution rather than on an RNN, there is no need to wait for the previous output at each step, so the data can be processed in parallel and the model trains much faster.

A GLU (gated linear unit) is also adopted. The GLU was proposed in the paper “Language Modeling with Gated Convolutional Networks”. The STGCN paper does not explain it in much detail; my understanding is that this gating alleviates the vanishing-gradient problem while retaining the non-linear capacity of the model.
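Below is a minimal Keras sketch of such a gated temporal convolution, assuming an input laid out as (batch, time, nodes, channels); the function name and the channel/kernel parameters are illustrative and not taken from the official STGCN-keras code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def gated_temporal_conv(x, c_out, kt=3):
    """Gated temporal convolution (GLU), a sketch in Keras.

    x: tensor of shape (batch, time, n_nodes, c_in).
    A convolution over the time axis produces 2 * c_out channels,
    split into P and Q; the output is P * sigmoid(Q) (the GLU).
    With 'valid' padding the time length shrinks by kt - 1.
    """
    pq = layers.Conv2D(filters=2 * c_out,
                       kernel_size=(kt, 1),      # convolve over time only
                       padding='valid')(x)
    p, q = tf.split(pq, num_or_size_splits=2, axis=-1)
    return p * tf.sigmoid(q)                     # gate keeps the model non-linear
```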

We can also verify that the model behaves as described in the paper: when Kt neighbors along the time dimension are considered (a temporal kernel of size Kt), the length of the output sequence shrinks by Kt - 1. In the code Kt is 3 and the input has 12 time steps, so the convolution output has 12 - (3 - 1) = 10 time steps.

ST-Conv Block

The graph convolution and gated CNN described above are combined into the structure shown in the figure, in which a bottleneck strategy is used to compress both the scale and the number of feature channels. A normalization layer follows each block to prevent overfitting.

The formula of the ST-Conv block is simply another way to read the figure: the input is first convolved along the time dimension, the result passes through the graph convolution followed by a ReLU, and that output is convolved along the time dimension once more, giving the output of the whole ST-Conv block; a sketch of this “sandwich” structure follows.
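A hedged Keras sketch of this temporal-spatial-temporal sandwich, reusing the GLU idea above and a pre-normalized adjacency matrix; st_conv_block, c_spatial and c_out are illustrative names and channel sizes, not the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def st_conv_block(x, a_norm, c_spatial, c_out, kt=3):
    """ST-Conv block sketch: temporal conv -> graph conv (ReLU) -> temporal conv.

    x:      (batch, time, n_nodes, c_in) input signal
    a_norm: (n_nodes, n_nodes) pre-normalized adjacency D_hat^-1/2 (A+I) D_hat^-1/2
    Channel sizes follow the bottleneck idea: squeeze to c_spatial for the
    graph convolution, expand back to c_out afterwards.
    """
    def glu_tconv(h, c):
        pq = layers.Conv2D(2 * c, (kt, 1), padding='valid')(h)
        p, q = tf.split(pq, 2, axis=-1)
        return p * tf.sigmoid(q)

    h = glu_tconv(x, c_spatial)                                 # temporal gated conv (squeeze)
    h = tf.einsum('ij,btjc->btic', a_norm, h)                   # spatial aggregation over neighbors
    h = layers.Conv2D(c_spatial, (1, 1), activation='relu')(h)  # per-node projection + ReLU
    h = glu_tconv(h, c_out)                                     # temporal gated conv (expand)
    return layers.LayerNormalization()(h)                       # normalization against overfitting
```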

Model

In the final model, two ST-Conv blocks are stacked and followed by an output layer. The output layer first collapses the remaining time dimension with a temporal convolution spanning its full length, and then produces the final prediction through another convolution, yielding the graph signal for the next time step with shape [1, 228, 1]. The model is trained with an L2 loss.
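Putting the pieces together, here is a sketch of how the full model could be assembled in Keras, reusing the st_conv_block sketch above; the channel sizes, the optimizer and the helper name build_stgcn are assumptions for illustration, not the official configuration.

```python
from tensorflow.keras import layers, Model

def build_stgcn(a_norm, n_nodes=228, n_his=12, kt=3):
    """End-to-end STGCN sketch: two stacked ST-Conv blocks + output layer.

    Assumes the st_conv_block sketch above. With n_his = 12 and kt = 3,
    each block removes 2 * (kt - 1) = 4 time steps, so 12 -> 8 -> 4 remain;
    the output layer's temporal conv spans those remaining steps, then a
    1x1 convolution maps to the single-step prediction of shape (1, 228, 1).
    """
    inp = layers.Input(shape=(n_his, n_nodes, 1))
    h = st_conv_block(inp, a_norm, c_spatial=16, c_out=64, kt=kt)
    h = st_conv_block(h, a_norm, c_spatial=16, c_out=64, kt=kt)
    t_left = n_his - 4 * (kt - 1)                                # remaining time steps (here 4)
    h = layers.Conv2D(64, (t_left, 1), activation='relu')(h)    # collapse the time dimension
    out = layers.Conv2D(1, (1, 1))(h)                            # final prediction: (batch, 1, 228, 1)
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse')                  # L2 loss; optimizer is illustrative
    return model
```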

The model summary

  • STGCN is a general framework for processing structured time series. It can not only solve traffic network modeling and prediction problems, but also be applied to more general spatio-temporal sequence learning tasks.
  • The spatio-temporal convolution block combines graph convolution with gated temporal convolution, which can extract the most useful spatial features and capture the most essential temporal features coherently.
  • The model is built entirely from convolutional structures, so the input can be processed in parallel; it has fewer parameters and trains faster. More importantly, this economical architecture allows the model to handle large-scale networks efficiently.

The overall structure of the model can be viewed in TensorBoard.

The experiment

Dataset description: two real-world traffic datasets, BJER4 and PeMSD7, collected by the Beijing Municipal Traffic Commission and the California Department of Transportation, respectively.

PeMSD7 website: pems.dot.ca.gov/?dnode=Clea…

The figure shows the road network displayed on the PeMSD7 front page.

PeMSD7 comes in two versions: PeMSD7(M) and PeMSD7(L). The data is aggregated into 5-minute intervals, so there are 12 records per hour and 288 records per day.
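As an illustration of how such a series can be turned into training samples, here is a small sliding-window sketch; n_his = 12 (one hour of history) matches the post, while n_pred and the function name are assumptions made for this example.

```python
import numpy as np

def make_windows(speed, n_his=12, n_pred=3):
    """Slice the speed series into training samples, a sketch.

    speed: (n_timesteps, n_nodes) array; one row every 5 minutes,
           so 12 rows = 1 hour and 288 rows = 1 day.
    Each sample uses n_his past steps as input and the value n_pred
    steps ahead (here 3 steps = 15 minutes) as the target.
    """
    xs, ys = [], []
    for i in range(len(speed) - n_his - n_pred + 1):
        xs.append(speed[i:i + n_his])              # (n_his, n_nodes) history window
        ys.append(speed[i + n_his + n_pred - 1])   # single-step target
    x = np.asarray(xs)[..., None]                  # (samples, n_his, n_nodes, 1)
    y = np.asarray(ys)[..., None]                  # (samples, n_nodes, 1)
    return x, y
```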

STGCN captured the trend of rush hour more accurately than other methods, and it detected the end of rush hour earlier than other models.

With a dual Titan XP card setup, an i7-8700K (12-core, 32 GB) configuration and a batch_size of 50, one epoch of training takes around 8 seconds.

In TensorBoard, we can also see how the learning rate and train_loss change during training.

