The Long Short-Term Memory (LSTM) network is a powerful recurrent neural network capable of learning over long sequences of observations.
A promise of LSTMs is that they may be effective at time series forecasting, although they can be difficult to configure and use for this purpose.
A key feature of LSTMs is that they maintain an internal state that can assist with prediction. This raises the question of how best to seed the state of a fit LSTM model before making a forecast.
In this tutorial, you will discover how to design, run, and interpret the results of an experiment to determine whether it is better to seed the state of a fit LSTM from the training dataset or to use no prior state.
After completing this tutorial, you will know:
- The open question of how best to seed the state of a fit LSTM model before forecasting.
- How to develop a robust test harness for evaluating LSTM models on univariate time series forecasting problems.
- How to determine whether or not to seed the state of your LSTM before forecasting on your own time series forecasting problem.
Let’s get started.
How to Seed State for LSTMs for Time Series Forecasting in Python
Tutorial overview
This tutorial is divided into five parts; they are:
- LSTM state seeding
- Shampoo sales dataset
- LSTM model and test harness
- Code listing
- Experimental results
The environment
This tutorial assumes that you have the Python SciPy environment installed. You can use Python 2 or 3 as you follow this example.
You must have Keras (version 2.0 or higher) installed with either the TensorFlow or Theano backend.
This tutorial also assumes that you have scikit-learn, Pandas, NumPy, and Matplotlib installed.
If you need help installing your environment, check out this article:
How to Setup Your Python Environment for Machine Learning with Anaconda
1. LSTM state seeding
When using a stateful LSTM in Keras, you have fine-grained control over when the internal state of the model is cleared.
This is achieved using the model.reset_states() function.
When training a stateful LSTM, it is important to clear the state of the model between training epochs. This ensures that the state built up during each training epoch matches the sequence of observations in that epoch.
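As a minimal sketch, assuming a compiled stateful Keras model named model and prepared training arrays X and y (with n_epochs and batch_size already defined), this manual training loop looks as follows:

```python
# Minimal sketch: fit a stateful LSTM one epoch at a time, clearing the
# internal state between epochs. `model`, `X`, `y`, `n_epochs`, and
# `batch_size` are assumed to be defined already.
for epoch in range(n_epochs):
    model.fit(X, y, epochs=1, batch_size=batch_size, shuffle=False, verbose=0)
    model.reset_states()  # clear the accumulated state before the next epoch
```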
Given this fine-grained control, the question arises as to whether, and how, to seed the state of the LSTM before making a forecast.
The options are:
- Reset the state before forecasting.
- Seed the state with the training dataset before forecasting.
Hypothetically, seeding the model state with the training dataset should be better, but this needs to be confirmed with experiments.
In addition, there may be multiple ways to seed the state; for example:
- Complete a training epoch, including weight updates. For example, do not reset the state after the final training epoch.
- Complete a prediction pass over the training data.
Generally, it is believed that both of these approaches are somewhat equivalent. The latter, predicting the training dataset, is preferred because it does not require any modification to the network weights and could be a repeatable procedure for an immutable network saved to file.
In this tutorial, we will consider the difference between the two approaches below (a short code sketch follows the list):
- Forecasting the test dataset with a fit LSTM that carries no state (e.g., after a reset).
- Forecasting the test dataset with a fit LSTM whose state has been seeded by first predicting the training dataset.
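As a rough sketch of the difference, assuming a compiled stateful Keras model named model and prepared arrays train_X and test_X:

```python
# Approach 1: forecast with no state (reset immediately before predicting).
model.reset_states()
no_seed_forecasts = model.predict(test_X, batch_size=batch_size)

# Approach 2: seed the state by first predicting the training dataset,
# then forecast the test dataset without resetting in between.
model.reset_states()
model.predict(train_X, batch_size=batch_size)  # output discarded; state retained
seeded_forecasts = model.predict(test_X, batch_size=batch_size)
```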
Next, let’s take a look at the standard time series data set that we will use in this experiment.
2. Shampoo sales dataset
This data set describes the monthly sales of a shampoo over a 3-year period.
The units are a sales count, and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).
You can download and learn more about this data set by following this link:
https://datamarket.com/data/set/22r0/sales-of-shampoo-over-a-three-year-period
The example below loads the dataset and generates a line plot of it.
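A minimal loading sketch is shown below; it assumes the dataset has been saved locally as 'shampoo-sales.csv' with the 'Month' column first:

```python
from pandas import read_csv
from matplotlib import pyplot

# Load the dataset with the 'Month' column as the index and
# squeeze the single data column into a Series.
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
print(series.head())  # print the first five rows
series.plot()         # line plot of monthly sales
pyplot.show()
```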
Running the example loads the dataset as a Pandas Series and prints the first five rows.
A line plot of the series is then created, showing a clear increasing trend.
Line plot of the shampoo sales dataset
Next, we will take a look at the LSTM configuration and test harness used in the experiment.
3. LSTM model and test harness
Data partitioning
We will divide the shampoo sales data set into two sets: a training set and a test set.
The first two years of sales data will be used as a training data set and the last year of sales data will be used as a test set.
We will develop the model on the training dataset and then make predictions on the test dataset.
Model evaluation
We will use a rolling-forecast scenario, also known as walk-forward model validation.
Each time step of the test dataset will be walked one at a time. A model will be used to make a forecast for the time step, then the actual expected value from the test set will be taken and made available to the model for the forecast of the next time step.
This mimics a real-world scenario where new shampoo sales observations become available at the end of each month and are used to forecast sales for the following month.
This will be simulated by the structure of the training and test datasets. We will make all of the forecasts in a one-shot manner.
Finally, all forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model. Root mean squared error (RMSE) is used because it punishes large errors and yields a score in the same units as the forecast data, namely monthly shampoo sales.
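For example, assuming expected and forecasts are aligned lists of values on the original sales scale, the error score can be computed as:

```python
from math import sqrt
from sklearn.metrics import mean_squared_error

# RMSE in the same units as the data (monthly shampoo sales).
rmse = sqrt(mean_squared_error(expected, forecasts))
print('Test RMSE: %.3f' % rmse)
```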
Data preparation
Before we can fit an LSTM model to the dataset, we must transform the data.
The following three data transforms are performed on the dataset prior to fitting a model and making a forecast (a code sketch follows the list).
- Transform the time series data so that it is stationary. Specifically, a lag=1 differencing to remove the increasing trend in the data.
- Transform the time series into a supervised learning problem. Specifically, organize the data into input and output patterns where the observation at the previous time step is used as the input to forecast the observation at the current time step.
- Transform the observations to have a specific scale. Specifically, rescale the data to values between -1 and 1 to suit the default hyperbolic tangent activation function of the LSTM model.
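A minimal sketch of the three transforms is given below, assuming raw_values is a one-dimensional NumPy array of the monthly sales; inverting the transforms on the forecasts before scoring is omitted here:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# 1. Difference with lag=1 to remove the increasing trend.
diff = np.diff(raw_values)

# 2. Frame as supervised learning: previous observation -> current observation.
X, y = diff[:-1].reshape(-1, 1), diff[1:].reshape(-1, 1)

# 3. Rescale to [-1, 1] to suit the LSTM's default tanh activation.
scaler = MinMaxScaler(feature_range=(-1, 1))
X = scaler.fit_transform(X)
y = scaler.transform(y)
```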
LSTM model
We will use an LSTM model that is skillful at the problem but not tuned.
This means that the model will be fit to the data and able to make meaningful forecasts, but it will not be the optimal model for the dataset.
The network topology consists of 1 input, a hidden layer with 4 units, and an output layer with 1 output value.
The model will be fit with a batch size of 4 and 3,000 training epochs. The training dataset is reduced to 20 observations after data preparation. This allows the batch size to evenly divide both the training dataset and the test dataset (a requirement).
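In Keras, a network matching this description might be defined as follows; this is a sketch of the described topology, not the original tuned code:

```python
from keras.models import Sequential
from keras.layers import Dense, LSTM

batch_size = 4
model = Sequential()
# Stateful LSTM with 4 units; each sample is 1 time step of 1 feature.
model.add(LSTM(4, batch_input_shape=(batch_size, 1, 1), stateful=True))
model.add(Dense(1))  # single output value
model.compile(loss='mean_squared_error', optimizer='adam')
```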
Experimental runs
Each scenario will be run 30 times.
That means 30 models will be created and evaluated per scenario. The RMSE from each run will be collected, giving a population of results per scenario that can be summarized using descriptive statistics such as the mean and standard deviation.
This is required because neural networks like the LSTM are influenced by their initial conditions (e.g., their initial random weights).
The mean result of each scenario will allow us to reason about the average behavior of each scenario and how the scenarios compare.
Next, let's take a look at the code.
4. Code listing
To let you reuse this experimental setup, the key modular behaviors are separated into readable and testable functions.
The experiment() function describes the specifics of a given scenario, tying together the data preparation, model fitting, and evaluation described above.
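A condensed sketch of such a harness is given below; the data trimming, the file name, and the exact structure of experiment() are illustrative assumptions rather than the original listing:

```python
# Condensed, illustrative experiment harness for comparing seeded and
# unseeded forecasting. Helper details here are assumptions.
from math import sqrt
import numpy as np
from pandas import DataFrame, read_csv
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
from matplotlib import pyplot

def experiment(repeats, series, seed_state, batch_size=4, n_epochs=3000):
    # difference, frame as supervised learning, and scale to [-1, 1]
    diff = np.diff(series.values)
    X = diff[:-1].reshape(-1, 1)
    y = diff[1:].reshape(-1, 1)
    scaler = MinMaxScaler(feature_range=(-1, 1))
    X, y = scaler.fit_transform(X), scaler.transform(y)
    # keep the last 32 pairs so both splits divide by the batch size
    X, y = X[-32:].reshape(-1, 1, 1), y[-32:]
    train_X, train_y = X[:20], y[:20]   # 20 training observations
    test_X, test_y = X[20:], y[20:]     # final 12 months held out
    error_scores = []
    for r in range(repeats):
        model = Sequential()
        model.add(LSTM(4, batch_input_shape=(batch_size, 1, 1), stateful=True))
        model.add(Dense(1))
        model.compile(loss='mean_squared_error', optimizer='adam')
        for _ in range(n_epochs):
            model.fit(train_X, train_y, epochs=1, batch_size=batch_size,
                      shuffle=False, verbose=0)
            model.reset_states()
        if seed_state:
            # seed the internal state by predicting the training dataset
            model.predict(train_X, batch_size=batch_size)
        forecasts = model.predict(test_X, batch_size=batch_size)
        # score on the transformed scale for brevity; the full procedure
        # inverts the scaling and differencing before scoring
        rmse = sqrt(mean_squared_error(test_y, forecasts))
        print('%d) Test RMSE: %.3f' % (r + 1, rmse))
        error_scores.append(rmse)
    return error_scores

series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
results = DataFrame()
results['with-seed'] = experiment(30, series, seed_state=True)
results['without-seed'] = experiment(30, series, seed_state=False)
print(results.describe())   # mean and standard deviation per scenario
results.boxplot()           # box and whisker plot of both populations
pyplot.savefig('boxplot.png')
```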
5. Experimental results
Running the experiments takes some time, depending on your CPU or GPU hardware.
The RMSE of each run is printed to give an idea of progress.
At the end of the run, summary statistics, including the mean and standard deviation of the RMSE scores, are calculated and printed for each scenario.
A box and whisker plot of the two populations of results is also created and saved to file.
Box and whisker plot of the seeded and unseeded LSTM results
The results were surprising.
They show that the scenario that did not seed the state of the LSTM before forecasting the test dataset achieved better performance.
This can be seen by comparing the mean RMSE of the unseeded scenario (146.600505) with that of the seeded scenario (186.432143). The difference is clearer still in the box and whisker plot.
It may be that the chosen model configuration resulted in a model too small to benefit from seeding the sequence and internal state before forecasting. Larger experiments may also be required.
Extensions
These surprising results open the door to further experiments.
- Evaluate the effect of clearing versus not clearing the state after each training epoch.
- Evaluate the effect of forecasting the training and test sets all at once versus one time step at a time.
- Evaluate the effect of resetting versus not resetting the LSTM state at the end of each epoch.
Have you tried any of these extensions?
Conclusion
In this tutorial, you discovered how to experimentally determine the best way to seed the state of an LSTM when solving univariate time series forecasting problems.
Specifically, you learned:
- The open problem of seeding the state of an LSTM before forecasting, and ways to address it.
- How to develop a robust test harness for evaluating LSTM models on time series forecasting problems.
- How to determine whether or not to seed the state of your LSTM model with the training data before forecasting.
Author: Jason Brownlee