Author: Wei Zuchang

1 Introduction

In the real world, a large number of sensors are already deployed (e.g., weather stations). Each sensor has a unique geospatial location and continuously produces time series readings. When a group of sensors jointly monitors the environment of a space, their readings exhibit spatial correlation; we call such readings geo-sensory time series. In addition, a single sensor often monitors several quantities at once and thus generates multiple geo-sensory time series. For example, as shown in Figure 1(a), a loop detector on a road reports the count and speed of passing vehicles over time, while Figure 1(b) shows a sensor producing three different water-quality chemical indicators every 5 minutes. Beyond monitoring, there is a growing need to predict geo-sensory time series, for example in traffic forecasting.

Figure 1: (a)-(b) Examples of geo-sensory time series data.

However, predicting geo-sensory time series is very challenging, mainly due to the following two complex factors:

  1. Dynamic spatio-temporal correlations. The readings of a sensor are correlated with its own recent readings and with the readings of nearby sensors, and these correlations change dynamically over time.
  2. External factors. Sensor readings are also influenced by the surrounding environment, such as weather (e.g., strong winds), time of day (e.g., rush hours) and land use.

To address these challenges, the paper proposes a multi-level attention network (GeoMAN) to predict geo-sensory readings over the next few hours. The work makes three contributions:

  • We build a multi-level attention mechanism to model the spatio-temporal dynamics. In the first level, an innovative attention mechanism (consisting of local spatial attention and global spatial attention) captures the complex spatial correlations between the time series of different sensors, including correlations within a single sensor and between sensors. In the second level, a temporal attention models the dynamic temporal correlations between different time intervals of a series.
  • We design a general extraction module to integrate external factors from different domains. The extracted latent representations are then fed into the multi-level attention network to strengthen the influence of these external factors.
  • We evaluate the model on two kinds of real-world geo-sensory datasets, where it achieves the best RMSE and MAE against nine other models (see Sections 3 and 4).

2 Multi-Level Attention Network

Figure 2 shows the overall framework of the paper. Following the encoder-decoder framework, we use two separate LSTM networks: one to encode the input sequence (i.e., the historical time series of the geo-sensors), and the other to predict the output sequence. More specifically, the GeoMAN model consists of two major parts:

  1. Multi-level attention. The encoder uses two spatial attention mechanisms, while the decoder uses a temporal attention mechanism. Specifically, two kinds of spatial attention (local spatial attention and global spatial attention) are used in the encoder. As shown in Figure 2, the encoder combines its previous hidden state with the historical sensor data and spatial information (e.g., the sensor network) to capture the complex relationships between sensors within each time interval. In the decoder, a temporal attention automatically selects similar previous time intervals for prediction.
  2. External factor extraction. This module handles the effects of external factors and feeds its output to the decoder as part of the decoder's input. Here $\mathbf{h}_t$ and $\mathbf{s}_t$ denote the hidden state and cell state of the encoder at time $t$; similarly, $\mathbf{d}_{t'}$ and $\mathbf{s}'_{t'}$ denote the corresponding states of the decoder. A minimal skeleton of this encoder-decoder flow is sketched after this list.
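As a quick orientation, here is a minimal sketch of the two-LSTM encoder-decoder flow. All shapes and names are illustrative, not the authors' code, and the context vector is faked with a mean over encoder states just to show the data flow; the real decoder input also includes the external factor representation introduced in Section 2.3.

```python
# Minimal encoder-decoder skeleton (illustrative shapes, not the paper's code).
import torch
import torch.nn as nn

T = 12                     # history length (illustrative)
n_input, hidden = 8, 64    # attention-weighted input size, hidden size

encoder = nn.LSTM(input_size=n_input, hidden_size=hidden, batch_first=True)
decoder = nn.LSTM(input_size=1 + hidden, hidden_size=hidden, batch_first=True)

x = torch.randn(32, T, n_input)        # attention-weighted encoder inputs
enc_out, (h_T, s_T) = encoder(x)       # h_t / s_t: encoder hidden & cell states

# At each decoder step, the previous prediction and a context vector
# (from temporal attention over enc_out) form the input; here the context
# is a stand-in mean over encoder states.
y_prev = torch.zeros(32, 1, 1)
context = enc_out.mean(dim=1, keepdim=True)
dec_in = torch.cat([y_prev, context], dim=-1)
dec_out, _ = decoder(dec_in, (h_T, s_T))   # d_t / s'_t: decoder states
```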

Figure 2: Framework of the paper. Attn: attention; Local: local spatial attention; Global: global spatial attention; Concat: concatenation; $\hat{y}_{t'}$: predicted value at time $t'$; $\mathbf{c}_t$: context vector at time $t$; $\mathbf{h}_0$: initial hidden state of the encoder.

2.1 Spatial Attention

2.1.1 Local Spatial Attention

We first introduce the local spatial attention mechanism. For a single sensor, there are complex correlations among its local time series. For example, an air quality monitoring station reports time series of different measurements, such as PM2.5 (particulate matter), NO and SO2. In practice, PM2.5 concentrations are often influenced by the other local time series, including other air pollutants and local weather conditions. To capture this, given the $k$-th local feature vector of the $i$-th sensor over the past $T$ time steps (i.e., $\mathbf{y}^{i,k} \in \mathbb{R}^T$), we use an attention mechanism to adaptively capture the dynamic correlation between the target series and each local feature, as follows:

$$e_t^k = \mathbf{v}_l^\top \tanh\left(\mathbf{W}_l\left[\mathbf{h}_{t-1};\ \mathbf{s}_{t-1}\right] + \mathbf{U}_l\,\mathbf{y}^{i,k} + \mathbf{b}_l\right), \qquad \alpha_t^k = \frac{\exp\left(e_t^k\right)}{\sum_{j=1}^{N_l} \exp\left(e_t^j\right)}$$

where $[\cdot;\cdot]$ denotes concatenation and $\mathbf{v}_l$, $\mathbf{b}_l$, $\mathbf{W}_l$ and $\mathbf{U}_l$ are learned parameters. The attention weight of each local feature is jointly determined by the input local features and the historical states of the encoder (i.e., $\mathbf{h}_{t-1}$ and $\mathbf{s}_{t-1}$), and it represents the importance of that feature. Once the attention weights are obtained, the output vector of the local spatial attention at time $t$ is computed as:

$$\mathbf{x}_t^{local} = \left(\alpha_t^1 y_t^{i,1},\ \alpha_t^2 y_t^{i,2},\ \ldots,\ \alpha_t^{N_l} y_t^{i,N_l}\right)^\top$$

where $N_l$ is the number of local features.
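To make the computation concrete, here is a minimal numpy sketch of the local attention weights under the formula above; the shapes are illustrative and the random matrices stand in for the learned parameters.

```python
# Local spatial attention weights: a numpy sketch of the formula above.
import numpy as np

T, m, N_l = 12, 64, 5           # history length, hidden size, #local features
rng = np.random.default_rng(0)

Y = rng.normal(size=(N_l, T))   # local feature series y^{i,k} of sensor i
h_prev = rng.normal(size=m)     # encoder hidden state h_{t-1}
s_prev = rng.normal(size=m)     # encoder cell state s_{t-1}

v_l = rng.normal(size=T)        # "learned" parameters, randomly set here
W_l = rng.normal(size=(T, 2 * m))
U_l = rng.normal(size=(T, T))
b_l = np.zeros(T)

hs = np.concatenate([h_prev, s_prev])   # [h_{t-1}; s_{t-1}]
e = np.array([v_l @ np.tanh(W_l @ hs + U_l @ Y[k] + b_l) for k in range(N_l)])
alpha = np.exp(e) / np.exp(e).sum()     # softmax over the local features

t = 3                                   # some encoder time step
x_local = alpha * Y[:, t]               # attention-weighted local input x_t^local
```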

2.1.2 Global Spatial Attention

The historical time series recorded by other sensors can directly affect the series to be predicted, but the influence weights are highly dynamic and change over time. Since many of those series are unrelated, feeding all of them into the encoder to capture inter-sensor correlations would incur a very high computational cost and degrade performance. Note that the influence weight of another sensor also depends on its local conditions; for example, when wind blows in from far away, air quality in some regions is affected more than it was before. Inspired by this, we build a new attention mechanism to capture the dynamic correlations between different sensors. Suppose the $i$-th sensor is the one to be predicted and sensor $l$ is one of the $N_g$ other sensors; the attention score (i.e., influence weight) between them is computed as follows:

$$g_t^l = \mathbf{v}_g^\top \tanh\left(\mathbf{W}_g\left[\mathbf{h}_{t-1};\ \mathbf{s}_{t-1}\right] + \mathbf{U}_g\,\mathbf{y}^{l} + \mathbf{W}'_g\,\mathbf{X}^{l}\mathbf{u}_g + \mathbf{b}_g\right)$$

where $\mathbf{y}^l$ is the target time series of sensor $l$ and $\mathbf{X}^l$ denotes its local features.

Here $\mathbf{v}_g$, $\mathbf{b}_g$, $\mathbf{u}_g$, $\mathbf{W}_g$, $\mathbf{U}_g$ and $\mathbf{W}'_g$ are learned parameters. This attention mechanism adaptively selects the relevant sensors for prediction by referring to the target series and the local features of the other sensors. Meanwhile, by conditioning on the encoder's previous hidden state $\mathbf{h}_{t-1}$ and cell state $\mathbf{s}_{t-1}$, historical information is propagated across time steps.

Note that spatial factors also affect the correlations between different sensors: in general, geo-sensors are connected to each other explicitly or implicitly. Here we use a matrix $\mathbf{P} \in \mathbb{R}^{N_g \times N_g}$ to represent the geospatial similarity, where $P_{i,l}$ denotes the similarity between sensors $i$ and $l$. Unlike the attention weights, the geospatial similarity can be regarded as prior knowledge. In particular, if $N_g$ is very large, it is better to first select only the nearest or most similar sensors. We then use a softmax function to ensure that all attention weights sum to 1, taking the geospatial similarity into account:

$$\beta_t^l = \frac{\exp\left((1-\lambda)\,g_t^l + \lambda P_{i,l}\right)}{\sum_{j=1}^{N_g} \exp\left((1-\lambda)\,g_t^j + \lambda P_{i,j}\right)}$$

where $\lambda$ is a tunable hyperparameter; when $\lambda$ is large, the attention weights stay close to the geospatial similarity. Using these attention weights, the output vector of the global spatial attention is computed as:

$$\mathbf{x}_t^{global} = \left(\beta_t^1 y_t^{1},\ \beta_t^2 y_t^{2},\ \ldots,\ \beta_t^{N_g} y_t^{N_g}\right)^\top$$
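The blending of the learned scores with the geospatial prior is easy to see in a tiny numerical sketch; all values below are made up for illustration.

```python
# Blending learned attention scores g_t^l with the geospatial prior P_{i,l}.
import numpy as np

lam = 0.2                              # tradeoff hyperparameter lambda
g_t = np.array([0.8, -0.3, 0.1, 0.5])  # learned scores for N_g = 4 sensors
P_i = np.array([0.9, 0.1, 0.4, 0.2])   # geospatial similarity row P_{i,:}

logits = (1 - lam) * g_t + lam * P_i
beta = np.exp(logits) / np.exp(logits).sum()   # softmax: weights sum to 1

y_t = np.array([1.2, 0.7, -0.4, 2.0])  # other sensors' readings at time t
x_global = beta * y_t                  # attention-weighted global input x_t^global
```

As the text notes, raising `lam` toward 1 pushes `beta` toward a pure function of the geographic prior, while `lam = 0` recovers purely learned attention.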

2.2 Temporal Attention

As the length of the encoding grows, the performance of an encoder-decoder architecture degrades rapidly. We therefore add a temporal attention mechanism that adaptively selects relevant hidden states of the encoder when generating the output sequence, i.e., it models the dynamic temporal correlations between different time intervals of the series. Specifically, to compute the attention weight of each encoder hidden state $\mathbf{h}_o$ at output time $t'$, we define:

$$u_{t'}^{o} = \mathbf{v}_d^\top \tanh\left(\mathbf{W}'_d\left[\mathbf{d}_{t'-1};\ \mathbf{s}'_{t'-1}\right] + \mathbf{W}_d\,\mathbf{h}_o + \mathbf{b}_d\right), \qquad \gamma_{t'}^{o} = \frac{\exp\left(u_{t'}^{o}\right)}{\sum_{j=1}^{T} \exp\left(u_{t'}^{j}\right)}, \qquad \mathbf{c}_{t'} = \sum_{o=1}^{T} \gamma_{t'}^{o}\,\mathbf{h}_o$$

where $\mathbf{v}_d$, $\mathbf{b}_d$, $\mathbf{W}_d$ and $\mathbf{W}'_d$ are learned parameters, $\mathbf{h}_o$ is the $o$-th hidden state of the encoder, and $\mathbf{c}_{t'}$ is the context vector passed to the decoder at output time $t'$.
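A numpy sketch of the same computation, with illustrative shapes and random stand-ins for the learned parameters:

```python
# Temporal attention: score each encoder state h_o, then build the context.
import numpy as np

T, m, n = 12, 64, 64            # encoder length, encoder/decoder hidden sizes
rng = np.random.default_rng(1)

H = rng.normal(size=(T, m))     # encoder hidden states h_1 ... h_T
d_prev = rng.normal(size=n)     # decoder hidden state d_{t'-1}
sp_prev = rng.normal(size=n)    # decoder cell state s'_{t'-1}

v_d = rng.normal(size=m)        # "learned" parameters, randomly set here
W_d = rng.normal(size=(m, m))
Wp_d = rng.normal(size=(m, 2 * n))
b_d = np.zeros(m)

ds = np.concatenate([d_prev, sp_prev])  # [d_{t'-1}; s'_{t'-1}]
u = np.array([v_d @ np.tanh(Wp_d @ ds + W_d @ H[o] + b_d) for o in range(T)])
gamma = np.exp(u) / np.exp(u).sum()     # attention over encoder time steps
c = gamma @ H                           # context vector c_{t'}
```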

2.3 Extraction of External Factors

The time series of geo-sensors are strongly tied to spatial factors, such as POIs and sensor networks; formally, these factors together determine the function of a region. In addition, many temporal factors (such as weather and time of day) affect the sensor readings. Inspired by related work, this paper designs a simple yet effective component to handle these factors. As shown in Figure 2, the temporal factors, including time features, meteorological features, and the SensorID of the sensor to be predicted, are first collected. Since future weather conditions are unknown, we use weather forecasts to improve performance. Note that most of these factors are categorical values and cannot be fed into a neural network directly, so each categorical attribute is passed through its own embedding layer and converted into a low-dimensional vector. For the spatial factors, we use the densities of different POI categories as POI features. Since sensor-network features depend on the specific setting, we simply use structural features of the network (such as the number of residents and intersections). Finally, we concatenate the obtained embedding vectors with the spatial feature vector as the output of this module, denoted $\mathbf{ex}_{t'}$, where $t'$ is a future time step of the decoder.
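A minimal sketch of this module, assuming made-up vocabulary sizes and embedding dimensions (the paper's exact dimensions are not reproduced here): each categorical attribute gets its own embedding, and the results are concatenated with the continuous spatial features.

```python
# External factor module: per-category embeddings + concatenation (sketch).
import torch
import torch.nn as nn

n_sensors, n_hours, n_weather = 30, 24, 17   # illustrative vocabulary sizes
embed_sensor = nn.Embedding(n_sensors, 6)
embed_hour = nn.Embedding(n_hours, 4)
embed_weather = nn.Embedding(n_weather, 4)

sensor_id = torch.tensor([3])      # SensorID of the sensor to predict
hour = torch.tensor([8])           # time feature, e.g. 8 a.m.
weather = torch.tensor([2])        # forecasted weather category
spatial = torch.randn(1, 10)       # POI densities + network features

ex = torch.cat([embed_sensor(sensor_id), embed_hour(hour),
                embed_weather(weather), spatial], dim=-1)  # ex_{t'}
```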

2.4 Encoder-Decoder and Model Training

In the encoder, we simply concatenate the local and global spatial attention outputs as the input at each time step and update the encoder state:

$$\mathbf{x}_t = \left[\mathbf{x}_t^{local};\ \mathbf{x}_t^{global}\right], \qquad \mathbf{h}_t = f_e\!\left(\mathbf{h}_{t-1}, \mathbf{x}_t\right)$$

where $f_e$ is an LSTM unit. In the decoder, another LSTM unit $f_d$ updates its hidden state by combining the context vector $\mathbf{c}_{t'}$, the external factor representation $\mathbf{ex}_{t'}$, and the previous prediction $\hat{y}_{t'-1}$:

$$\mathbf{d}_{t'} = f_d\!\left(\mathbf{d}_{t'-1}, \left[\hat{y}_{t'-1};\ \mathbf{ex}_{t'};\ \mathbf{c}_{t'}\right]\right)$$

Then the previous context vector $\mathbf{c}_{t'}$ and the current hidden state $\mathbf{d}_{t'}$ are combined to produce the final prediction:

$$\hat{y}_{t'} = \mathbf{v}_y^\top\left(\mathbf{W}_m\left[\mathbf{c}_{t'};\ \mathbf{d}_{t'}\right] + \mathbf{b}_m\right) + b_y$$

where $\mathbf{v}_y$, $\mathbf{W}_m$, $\mathbf{b}_m$ and $b_y$ are learned parameters. Since the model is differentiable end to end, it is trained by back-propagation, minimizing the mean squared error between the predicted and true readings with the Adam optimizer.
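The training step implied above is standard; a minimal sketch, with a random tensor standing in for the GeoMAN forward pass:

```python
# MSE loss + Adam training step (pred is a stand-in for the model output).
import torch

pred = torch.randn(32, 6, requires_grad=True)  # \hat{y}_{t'} for 6 future steps
target = torch.randn(32, 6)                    # ground-truth readings

loss = torch.mean((pred - target) ** 2)        # mean squared error
opt = torch.optim.Adam([pred], lr=0.001)       # learning rate from Section 3.3
opt.zero_grad()
loss.backward()
opt.step()
```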


3 Experiments

3.1 Experimental Data

In this paper, two datasets are used to train the model. The details of the datasets are shown in Figure 3:

Figure 3: Data set details.

However, since the complete datasets are not publicly available, we use the sample_data provided by the authors for the reproduction that follows; it contains the vectors obtained after their preprocessing, so this part is not covered in depth here. If you have questions about or interest in this part, please refer to the corresponding section of the paper.

3.2 Evaluation Indicators

We use several criteria to evaluate our model, including root mean square error (RMSE) and mean absolute error (MAE), both of which are widely used in regression tasks.
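For reference, with $y_i$ the ground truth, $\hat{y}_i$ the prediction, and $n$ the number of test samples, these are the standard definitions:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$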

3.3 Hyperparameters

Following previous studies, the paper makes short-term predictions over the next 6 hours. During training, we set the batch size to 256 and the learning rate to 0.001. In the external feature extraction module, the SensorID and the time features are each embedded into low-dimensional vectors. In total, the model has 4 hyperparameters. The tradeoff parameter $\lambda$ is empirically fixed between 0.1 and 0.5. For the window length $T$, we search over $T \in \{6, 12, 24, 36, 48\}$. For simplicity, the encoder and decoder hidden layers share the same dimension, selected by grid search over {32, 64, 128, 256}. In addition, we use stacked LSTMs ($q$ layers) as the encoder and decoder units to improve performance. Experiments show the best performance on the validation set with $q = 2$, $m = n = 64$, and $\lambda = 0.2$.
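For quick reference, the search space described above can be written as a small Python config; the dictionary form is ours, but the values are those reported in the text.

```python
# Hyperparameter search space from Section 3.3 (dict layout is illustrative).
search_space = {
    "lambda": (0.1, 0.5),          # tradeoff parameter, fixed in this range
    "T": [6, 12, 24, 36, 48],      # encoder window length
    "hidden": [32, 64, 128, 256],  # shared encoder/decoder hidden size (m = n)
}
best = {"q": 2, "hidden": 64, "lambda": 0.2}  # best on the validation set
```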

4 Model Comparison

In this section, we compare the paper's model against the baselines on the two datasets. For fairness, Figure 4 reports the best performance of each method over its different parameter settings.

Figure 4: Comparison of performance in different models

For water quality prediction, the proposed method clearly outperforms the other methods on both metrics. In particular, GeoMAN surpasses the state-of-the-art method (DA-RNN) by 14.2% on MAE and 13.5% on RMSE. Moreover, because the concentration of residual chlorine (RC) follows a certain periodic pattern, stDNN and the RNN-based methods (Seq2seq, DA-RNN and GeoMAN) outperform stMTMVL and FFA by modeling longer temporal dependencies. GeoMAN and Seq2seq also improve significantly over the plain LSTM when predicting future time steps, thanks to the active role of their decoder components. It is worth noting that GBRT performs better than most baselines, which illustrates the strength of ensemble methods.

Compared with the relatively stable water quality readings, PM2.5 concentrations fluctuate greatly and are difficult to predict. Figure 4 gives a comprehensive comparison on the Beijing air quality data; it is easy to see that our model achieves the best performance on both MAE and RMSE. Following our previous work, we focus on MAE: the proposed method reduces it by 7.2% to 63.5% compared with the other methods, indicating better generalization across applications. Another interesting observation is that stMTMVL works well for water quality prediction but is at a disadvantage here, because the number of joint learning tasks for air quality prediction is much larger than that for water quality prediction.

5 Conclusion

This paper proposes a multi-level attention-based network for time series prediction. At the first level, local and global spatial attention mechanisms capture the dynamic inter-sensor correlations in geo-sensory data. At the second level, a temporal attention adaptively selects the relevant time steps for prediction. In addition, the model accounts for the influence of external factors through a general feature extraction module. The model is evaluated on two kinds of geo-sensory datasets, and the experimental results show that it achieves the best RMSE and MAE against nine other models.

Project address: momodel.cn/workspace/5…

6 References

  • Paper: GeoMAN: Multi-level Attention Networks for Geo-sensory Time Series Prediction (IJCAI 2018)
  • Blog: Softmax details
  • Blog: Attention
  • Blog: Adam Optimizer

About us

Mo (https://momodel.cn) is a Python-based online modeling platform for artificial intelligence that helps you quickly develop, train, and deploy models.


The Mo Artificial Intelligence Club is initiated by the site's R&D and product design teams and is committed to lowering the barriers to developing and using artificial intelligence. The team has experience in big data processing and analysis, visualization, and data modeling, has undertaken intelligent projects in multiple domains, and has full design and development capability from the back end to the front end. Its research interests are big data management and analysis and artificial intelligence technologies, in support of data-driven scientific research.

At present, the club holds offline paper-sharing and academic exchange sessions in Hangzhou every two weeks. We hope to gather friends from all walks of life who are interested in artificial intelligence, keep communicating and growing together, and promote the democratization and popularization of artificial intelligence.