Foreword
This article first gives a brief introduction to the basic principles of the temporal convolutional network (TCN), a state-of-the-art time series forecasting method, and then, building on open-source TCN code, walks you step by step through forecasting stock prices with a temporal convolutional network. Interested readers can also train this model on their own data sets and make their own forecasts.
1
The basic principle and structure of TCN
TCN, which stands for Temporal Convolutional Network, is a model proposed in 2018 for processing temporal data. Convolutional networks have proved effective at extracting high-level features from structured data. A temporal convolutional network is a neural network model that uses causal convolution and dilated convolution; it respects the ordering of time series data and provides a large receptive field for time series modeling.
1. Causal convolution
When a temporal convolutional network is trained to predict the next value of an input time series, it is assumed that the input sequence is x0, ..., xT and that the corresponding outputs y0, ..., yT are to be predicted, where each output equals the input shifted forward by one time step. The main constraint when making predictions is that, to predict the output at time step t, only inputs observed no later than t may be used.
Therefore, TCN imposes two main constraints: the output of the network must have the same length as its input, and the network can only use information from past time steps. To satisfy these principles, TCN uses a 1-dimensional fully convolutional structure, that is, all convolutional layers produce outputs of the same length, with zero padding to keep each higher layer the same length as the layer below it. In addition, TCN uses causal convolution: the output at time step t in each layer is computed only from positions no later than time step t in the previous layer, as shown in the figure below. For one-dimensional data, causal convolution can be implemented simply by shifting the output of a conventional convolution by a few time steps.
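As a small, self-contained illustration (not part of the article's model), Keras provides this behaviour directly: a Conv1D layer with padding='causal' pads only on the left, so the output keeps the input's length and each output step only sees inputs up to that step:

import numpy as np
from tensorflow import keras

# A single causal convolution: left-padding only, so the output has the same
# length as the input and output[t] depends only on inputs at steps <= t.
causal_conv = keras.layers.Conv1D(filters=1, kernel_size=3, padding='causal')

x = np.arange(10, dtype='float32').reshape(1, 10, 1)  # (batch, time steps, features)
y = causal_conv(x)
print(y.shape)  # (1, 10, 1): same length as the input

Stacking several such layers extends how far back in time each output can see, which leads directly to the dilated convolutions discussed next.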
2. Dilated convolution
In general, we want the network to remember long-term information when dealing with time series. However, with the causal convolution shown above, the receptive field stays small unless many layers are stacked. To overcome this problem, dilated convolution is used, which allows a network with a limited number of layers to have an exponentially large receptive field. A dilated convolution skips inputs at a given step size and applies the filter over a region larger than the filter itself. It is similar to pooling or strided convolution in that it enlarges the receptive field, but it keeps the output the same length as the input. When dilated convolution is used, the dilation factor d is usually increased exponentially with the depth of the network. This ensures that every input falls within the receptive field of some filter, and lets a deep network use a very long effective history. The figure below is a diagram of dilated causal convolution.
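To make the exponential growth of the receptive field concrete, the following sketch (an illustration only, not the article's code) stacks causal convolutions with dilation factors 1, 2, 4 and 8; for kernel size k and dilation factors d_i, the receptive field of the stack is 1 + (k - 1) * sum(d_i):

from tensorflow import keras

kernel_size = 3
dilations = [1, 2, 4, 8]

inputs = keras.layers.Input(shape=(None, 1))
x = inputs
for d in dilations:
    # Each layer is causal and dilated; the sequence length is preserved.
    x = keras.layers.Conv1D(filters=8, kernel_size=kernel_size,
                            padding='causal', dilation_rate=d,
                            activation='relu')(x)
model = keras.Model(inputs, x)

# Receptive field of the stacked layers: 1 + (k - 1) * sum of the dilation factors.
receptive_field = 1 + (kernel_size - 1) * sum(dilations)
print(receptive_field)  # 31 time steps

With only four layers, each output already sees 31 past time steps; doubling the dilation at each layer is what makes this growth exponential rather than linear.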
3. Residual connection
Since the receptive field of a TCN depends on the network depth n, the filter size k and the dilation factor d, making the TCN deeper and larger is the key to obtaining a sufficiently large receptive field. Empirically, making the network deeper and narrower, i.e., stacking a large number of layers and choosing a small filter size, is an effective architecture. Residual connections have proved very effective for training such deep networks: the skip connections speed up training and mitigate the vanishing gradient problem of deep models.
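A residual block in this style can be sketched as below. This is a simplified illustration of the idea, not the exact block used by the keras-tcn library (which adds details such as normalization and dropout):

from tensorflow import keras

def residual_block(x, filters, kernel_size, dilation):
    # Main branch: two dilated causal convolutions.
    out = keras.layers.Conv1D(filters, kernel_size, padding='causal',
                              dilation_rate=dilation, activation='relu')(x)
    out = keras.layers.Conv1D(filters, kernel_size, padding='causal',
                              dilation_rate=dilation, activation='relu')(out)
    # If the channel count changes, project the input with a 1x1 convolution
    # so that it can be added to the branch output.
    if x.shape[-1] != filters:
        x = keras.layers.Conv1D(filters, kernel_size=1)(x)
    # Residual (skip) connection: add the input back to the branch output.
    return keras.layers.Add()([x, out])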
2
Stock price forecast based on TCN
Below is a basic example of stock price forecasting based on TCN. The TCN model follows the paper "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling", and its implementation comes from the open-source code on GitHub: Github.com/philipperem…
Local Environment:
Python 3.7
IDE: PyCharm
Library versions:
pandas 1.0.3
numpy 1.18.1
matplotlib 3.2.1
tensorflow 2.3.1
sklearn 0.22.2
1. Data preprocessing and data set division
For the sake of illustration, daily data of the Shanghai Composite Index from 2005 to 2018 are used; interested readers can replace them with their own local data. First, we import the required modules, where the TCN module comes from the code in the GitHub link above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tcn.tcn import TCN
from tensorflow import keras
Next we need to define a few global variables:
window_size = 10   # size of the sliding window
batch_size = 32    # training batch size
epochs = 200       # number of training epochs
filter_nums = 10   # number of filters
kernel_size = 4    # convolution kernel size
Then we define a function that reads the data and splits it into a training set and a test set. First, the data is read and normalized to make the model easier to train; here the opening price of the Shanghai Composite Index is used as an example. Next, the data is split into samples using a sliding window. Finally, the first 2000 samples are used as the training set and the following 1000 samples as the test set. The specific code is as follows:
def get_dataset():
    df = pd.read_csv('./000001_Daily_2006_2018.csv')
    scaler = MinMaxScaler()
    open_arr = scaler.fit_transform(df['Open'].values.reshape(-1, 1)).reshape(-1)
    X = np.zeros(shape=(len(open_arr) - window_size, window_size))
    label = np.zeros(shape=(len(open_arr) - window_size))
    for i in range(len(open_arr) - window_size):
        X[i, :] = open_arr[i:i+window_size]
        label[i] = open_arr[i+window_size]
    train_X = X[:2000, :]
    train_label = label[:2000]
    test_X = X[2000:3000, :]
    test_label = label[2000:3000]
    return train_X, train_label, test_X, test_label, scaler
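To make the sliding-window split above concrete, here is a toy example (illustration only, not part of the article's code) of what it produces for a short sequence with window_size = 3:

import numpy as np

window_size = 3
open_arr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

X = np.zeros(shape=(len(open_arr) - window_size, window_size))
label = np.zeros(shape=(len(open_arr) - window_size))
for i in range(len(open_arr) - window_size):
    X[i, :] = open_arr[i:i + window_size]   # window of past values
    label[i] = open_arr[i + window_size]    # the next value, used as the label

print(X)      # [[1. 2. 3.] [2. 3. 4.] [3. 4. 5.]]
print(label)  # [4. 5. 6.]

Each row of X is a window of past values, and the corresponding entry of label is the value immediately after that window; the real code applies exactly this transformation to the normalized opening prices.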
2. Model construction
Keras is used to build the model. The input layer must follow the form expected by the TCN layer: window_size is the length of the input sequence, and 1 is the feature dimension at each time point. For the TCN layer that follows, the parameters have the following meanings:
- nb_filters: an integer. The number of filters used in the convolutional layers, similar to units in an LSTM layer.
- kernel_size: an integer. The size of the kernel used in each convolutional layer.
- dilations: a list. The dilation factors used in each layer, for example [1, 2, 4, 8, 16, 32, 64].
- nb_stacks: an integer. The number of residual stacks to use.
- padding: a string. The padding used in the convolutions: 'causal' for a causal network, 'same' for a non-causal network.
- use_skip_connections: Boolean. Whether to add a skip connection from the input to each residual block.
- return_sequences: Boolean. Whether to return the complete output sequence or only the last output.
- dropout_rate: a float between 0 and 1. The fraction of units to drop.
- activation: the activation function to use.
- kernel_initializer: the initializer for the kernel weight matrix (Conv1D).
- use_batch_norm: whether to use batch normalization.
- kwargs: any other arguments used to configure the parent layer.
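As an illustration of how these parameters fit together, a fully specified TCN layer could be written as follows; the values below are arbitrary placeholders (only nb_filters, kernel_size and dilations are actually set in this article's model):

from tcn.tcn import TCN

tcn_layer = TCN(nb_filters=10,            # number of convolution filters
                kernel_size=4,            # kernel size of each convolution
                dilations=[1, 2, 4, 8],   # dilation factor per layer
                nb_stacks=1,              # number of residual stacks
                padding='causal',         # causal network
                use_skip_connections=True,
                return_sequences=False,   # only the last output is needed here
                dropout_rate=0.0,
                activation='relu',
                kernel_initializer='he_normal',
                use_batch_norm=False)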
After configuring the TCN layer, a fully connected output layer with a single unit produces the final prediction, using ReLU as the activation function. Finally, the model is compiled and trained, with MAE as the loss and Adam as the optimizer:
def build_model():
    train_X, train_label, test_X, test_label, scaler = get_dataset()
    model = keras.models.Sequential([
        keras.layers.Input(shape=(window_size, 1)),
        TCN(nb_filters=filter_nums,
            kernel_size=kernel_size,
            dilations=[1, 2, 4, 8]),
        keras.layers.Dense(units=1, activation='relu')
    ])
    model.summary()
    model.compile(optimizer='adam', loss='mae', metrics=['mae'])
    model.fit(train_X, train_label, batch_size=batch_size, validation_split=0.2, epochs=epochs)
Finally, the model is evaluated on the test set, and the predictions are inverse-normalized and visualized (the RMSE and plot helpers used below are defined in the complete code at the end of this article):
model.evaluate(test_X, test_label)
prediction = model.predict(test_X)
scaled_prediction = scaler.inverse_transform(prediction.reshape(-1, 1)).reshape(-1)
scaled_test_label = scaler.inverse_transform(test_label.reshape(-1, 1)).reshape(-1)
print('RMSE ', RMSE(scaled_prediction, scaled_test_label))
plot(scaled_prediction, scaled_test_label)
The structure of the model and the training process are as follows:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
tcn (TCN) (None, 10) 2940
_________________________________________________________________
dense (Dense) (None, 1) 11
=================================================================
Total params: 2,951
Trainable params: 2,951
Non-trainable params: 0
_________________________________________________________________
Epoch 1/200
50/50 [==============================] - 1s 14ms/step - loss: 0.0739 - mae: 0.0739 - val_loss: 0.0131 - val_mae: 0.0131
Epoch 2/200
50/50 [==============================] - 0s 5ms/step - loss: 0.0141 - mae: 0.0141 - val_loss: 0.0063 - val_mae: 0.0063
Epoch 3/200
50/50 [==============================] - 0s 7ms/step - loss: 0.0130 - mae: 0.0130 - val_loss: 0.0048 - val_mae: 0.0048
Epoch 4/200
50/50 [==============================] - 0s 6ms/step - loss: 0.0115 - mae: 0.0115 - val_loss: 0.0081 - val_mae: 0.0081
Epoch 5/200
50/50 [==============================] - 0s 6ms/step - loss: 0.0130 - mae: 0.0130 - val_loss: 0.0052 - val_mae: 0.0052
Epoch 6/200
50/50 [==============================] - 0s 6ms/step - loss: 0.0122 - mae: 0.0122 - val_loss: 0.0046 - val_mae: 0.0046
Epoch 7/200
50/50 [==============================] - 0s 6ms/step - loss: 0.0109 - mae: 0.0109 - val_loss: 0.0058 - val_mae: 0.0058
Epoch 8/200
50/50 [==============================] - 0s 6ms/step - loss: 0.0116 - mae: 0.0116 - val_loss: 0.0063 - val_mae: 0.0063
Epoch 9/200
50/50 [==============================] - 0s 6ms/step - loss: 0.0111 - mae: 0.0111 - val_loss: 0.0062 - val_mae: 0.0062
Epoch 10/200
50/50 [==============================] - 0s 5ms/step - loss: 0.0112 - mae: 0.0112 - val_loss: 0.0056 - val_mae: 0.0056
After training, the printed RMSE on the test set is as follows:
RMSE 35.78251267773438
The prediction results are shown in the figure below. The model fits the data reasonably well and captures the basic trend of the price.
3
Complete code
The complete code is shown below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tcn.tcn import TCN
from tensorflow import keras

window_size = 10   # size of the sliding window
batch_size = 32    # training batch size
epochs = 200       # number of training epochs
filter_nums = 10   # number of filters
kernel_size = 4    # convolution kernel size

def get_dataset():
    df = pd.read_csv('./000001_Daily_2006_2018.csv')
    scaler = MinMaxScaler()
    open_arr = scaler.fit_transform(df['Open'].values.reshape(-1, 1)).reshape(-1)
    X = np.zeros(shape=(len(open_arr) - window_size, window_size))
    label = np.zeros(shape=(len(open_arr) - window_size))
    for i in range(len(open_arr) - window_size):
        X[i, :] = open_arr[i:i+window_size]
        label[i] = open_arr[i+window_size]
    train_X = X[:2000, :]
    train_label = label[:2000]
    test_X = X[2000:3000, :]
    test_label = label[2000:3000]
    return train_X, train_label, test_X, test_label, scaler

def RMSE(pred, true):
    return np.sqrt(np.mean(np.square(pred - true)))

def plot(pred, true):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(range(len(pred)), pred)
    ax.plot(range(len(true)), true)
    plt.show()

def build_model():
    train_X, train_label, test_X, test_label, scaler = get_dataset()
    model = keras.models.Sequential([
        keras.layers.Input(shape=(window_size, 1)),
        TCN(nb_filters=filter_nums,        # number of filters, analogous to units in an LSTM
            kernel_size=kernel_size,       # size of the convolution kernel
            dilations=[1, 2, 4, 8]),       # dilation factors
        keras.layers.Dense(units=1, activation='relu')
    ])
    model.summary()
    model.compile(optimizer='adam', loss='mae', metrics=['mae'])
    model.fit(train_X, train_label, batch_size=batch_size, validation_split=0.2, epochs=epochs)
    model.evaluate(test_X, test_label)
    prediction = model.predict(test_X)
    scaled_prediction = scaler.inverse_transform(prediction.reshape(-1, 1)).reshape(-1)
    scaled_test_label = scaler.inverse_transform(test_label.reshape(-1, 1)).reshape(-1)
    print('RMSE ', RMSE(scaled_prediction, scaled_test_label))
    plot(scaled_prediction, scaled_test_label)

if __name__ == '__main__':
    build_model()
4
Conclusion
In this article, we first briefly introduced the basics of temporal convolutional networks, and then used the open-source TCN code to implement a simple example applied to price prediction for the Shanghai Composite Index. The results show that the temporal convolutional network has strong data fitting and sequence modeling ability. When applied to stock price forecasting, however, it is still inevitably affected by overfitting, model migration and market noise. Even so, in a field of sequence processing dominated by RNNs and LSTMs, TCN, as a relatively new technique, still has strong research significance and value.