1. Definition and classification of time series

Definition: A time series describes the relationship between observed values and time, which makes it possible to predict values and changes over a future period. Mathematically, it is defined as a sequence of random variables:


$$\{x_t,\ t = 1, 2, \dots, T\}$$

often abbreviated as $\{X_t\}$.

  1. Time series can be divided into:

    • Stationary series: a series in which there is no trend and observations fluctuate at a fixed level.
    • Nonstationary series: a series containing one or more of trend, seasonality, or cyclicity.
  2. Generally, a time series is decomposed into four influencing factors according to the length of the period:

    • Trend: a monotonic upward or downward movement over a long period of time, such as long-term economic growth or the growth of an emerging industry.
    • Seasonality: periodic fluctuations that recur within a year and follow a similar pattern in the same period of each year, such as the four seasons, the two halves of the year, or low and high seasons.
    • Cyclicity: alternating fluctuations, usually caused by changes in the economic environment, as distinct from the trend. Cyclical fluctuations usually last longer than a year.
    • Irregular: unpredictable changes caused by contingent factors, usually noise.

    These influencing factors can be decomposed with the statsmodels time series toolkit, for example:

    import statsmodels.api as sm

    # "freq" was renamed to "period" in statsmodels 0.11+
    result = sm.tsa.seasonal_decompose(data, period=365)

2. Time series model

There are two kinds of time series models in common use: linear models based on statistics, and neural network models.

2.1 Linear Model

2.1.1 Autoregression Model (AR)

The Auto Regressive (AR) model is the simplest linear time series model. It uses the correlation between earlier and later data in the series to build a regression equation, and it develops from linear regression. Given a time series $x_1, x_2, x_3, \dots, x_n$, the mathematical expression of its p-order AR model AR(p) is:


$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + u_t$$

where $\phi_i$ are the autoregressive coefficients and $u_t$ is white noise, the random fluctuation in the time series. When $p$ equals one, this is a first-order autoregressive model.

  • For an AR model, the partial autocorrelation function (PACF) cuts off while the autocorrelation function (ACF) tails off. "Cuts off" (truncation) means the function is (almost) zero from some order onward; "tails off" means there is no order at which it suddenly drops to zero, but it gradually decays to zero. The ACF measures the correlation of the series with itself at two different times; the PACF removes the interference of the other random variables between those two times and is therefore a purer measure of correlation.

  • In AR modeling, determining the order p is called order determination; the PACF and AIC are commonly used. The usual practice is to plot the PACF and judge where it cuts off, so as to select the best order p.

  • The least squares method and maximum likelihood estimation are used to estimate the parameters of the model.

  • In financial modeling, the AR model is mainly used to model financial series; a fitting sketch follows.
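A minimal sketch of AR order selection and fitting with statsmodels (the simulated series is an illustrative stand-in for real data):

    import numpy as np
    from statsmodels.graphics.tsaplots import plot_pacf
    from statsmodels.tsa.ar_model import AutoReg

    # Simulate an AR(2) series as a stand-in for real data
    rng = np.random.default_rng(0)
    data = np.zeros(500)
    for t in range(2, 500):
        data[t] = 0.6 * data[t - 1] - 0.3 * data[t - 2] + rng.normal()

    plot_pacf(data)                    # the PACF should cut off after lag 2
    res = AutoReg(data, lags=2).fit()  # parameter estimation (conditional least squares)
    print(res.params, res.aic)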

2.1.2 Moving Average Model (MA)

Moving Average (MA) is also a linear time series model. Unlike the AR model, the predicted value at the current time point is a linear combination not of historical series values but of historical white noise terms. Its principle is that historical white noise indirectly affects the current prediction by influencing the historical series values. Given a white noise sequence $u_1, u_2, u_3, \dots, u_n$, the mathematical expression of the q-order MA model MA(q) is:


$$x_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$$

where $\theta_i$ are the moving average coefficients and $u_t$ is the white noise of each period.

  • In theory, the autocorrelation function of an MA(q) model cuts off after lag q, so the last lag in the ACF plot that is significantly different from 0 can be provisionally taken as the order of MA. In practice, AIC can also be used to determine the order.
  • The parameters of MA can be estimated by moment estimation, the inverse correlation function method, conditional maximum likelihood estimation, or exact maximum likelihood estimation.
  • In financial models, MA is often used to describe shock effects; a fitting sketch follows.
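A corresponding sketch for MA, reusing the simulated `data` from the AR example (an MA(q) model can be fitted as an ARIMA model with p = d = 0):

    from statsmodels.graphics.tsaplots import plot_acf
    from statsmodels.tsa.arima.model import ARIMA

    plot_acf(data)                             # the ACF of an MA(q) series cuts off after lag q
    res = ARIMA(data, order=(0, 0, 2)).fit()   # MA(2) written as ARIMA(0, 0, 2)
    print(res.params)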

2.1.3 Autoregressive Moving Average Model (ARMA)

If the two models are combined, the result can handle causal time series whose PACF or ACF does not cut off at a low order. With comparable goodness of fit, the ARMA model can therefore capture both the dependence between current and past data and the random shocks, without requiring the PACF or the ACF to cut off. Its mathematical expression is as follows:


$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$$

ARMA has two orders, $(p, q)$, which are the orders of the AR and MA parts respectively.

  • The order of ARMA is usually determined by searching a range of order combinations for the minimum AIC and BIC, which can be done with a grid search, as sketched below.
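A minimal sketch of such a grid search with statsmodels (the search range and the series `data` are illustrative assumptions):

    import itertools
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    best_aic, best_order = np.inf, None
    for p, q in itertools.product(range(4), range(4)):
        try:
            res = ARIMA(data, order=(p, 0, q)).fit()   # ARMA(p, q) is ARIMA with d = 0
        except Exception:
            continue                                   # some combinations may fail to converge
        if res.aic < best_aic:
            best_aic, best_order = res.aic, (p, q)
    print(best_order, best_aic)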

2.1.4 ARIMA Model

The ARMA model can handle stationary time series, but for non-stationary series, differencing must be introduced; this leads to the ARIMA model. Its mathematical expression is as follows:


$$\left(1 - \sum_{i=1}^{p}\phi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q}\theta_i L^i\right)\varepsilon_t$$

where $L$ is the lag operator and $d \in \mathbb{Z}$.

Difference operator:


$$\Delta^d x_t = (1 - L)^d x_t$$

Let $w_t$ equal:


$$w_t = \Delta^d x_t = (1 - L)^d x_t$$

Then the ARIMA formula can be rewritten as:


$$w_t = \phi_1 w_{t-1} + \phi_2 w_{t-2} + \dots + \phi_p w_{t-p} + \delta + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \dots + \theta_q u_{t-q}$$

A third parameter $d$ is introduced as the differencing order, so ARIMA's order combination is $(p, q, d)$. Differencing is introduced to eliminate the linear trend and turn a non-stationary series into a stationary one; higher-order differencing can handle the case where a low-order difference is still non-stationary. A small illustration follows.
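A small sketch of how first-order differencing removes a linear trend (synthetic data, purely for demonstration):

    import numpy as np
    import pandas as pd

    t = np.arange(100)
    series = pd.Series(2.0 * t + np.random.default_rng(1).normal(size=100))  # linear trend plus noise
    diffed = series.diff().dropna()   # first-order difference: the trend collapses to a constant
    print(diffed.mean())              # fluctuates around the slope, roughly 2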

The development process of ARIMA is as follows:

  1. Visualize the data to identify stationarity (an ADF test sketch follows this list).
  2. Difference non-stationary time series data to obtain a stationary series.
  3. Build an appropriate model:
    • After the series is stationary, if the PACF cuts off and the ACF tails off, establish an AR model.
    • If the PACF tails off and the ACF cuts off, establish an MA model.
    • If both the PACF and the ACF tail off, the series fits an ARMA model.
  4. After the order of the model is determined, estimate the parameters of the ARMA model; the most commonly used method is least squares.
  5. Perform hypothesis testing to diagnose whether the residual series is white noise.
  6. Make predictions with the model that has passed the tests.
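A common way to check stationarity at steps 1-2 is the ADF unit-root test (a sketch; `data` is a placeholder series):

    from statsmodels.tsa.stattools import adfuller

    pvalue = adfuller(data)[1]   # null hypothesis: the series has a unit root (is non-stationary)
    print("stationary" if pvalue < 0.05 else "non-stationary")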

Basic steps of the project code:

  1. Read and load the data with pandas, and preprocess it to handle null values and missing time points.
  2. Explore the data with visualizations, using charts to understand basic statistics and possible seasonality and trend.
  3. Select the model, specify the parameter range of the grid search, and exhaustively find the parameter combination with the lowest AIC score.
  4. Set the time period to be predicted, then use the optimal model to forecast; a sketch follows.
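A minimal sketch of these steps with pandas and statsmodels (the file name, column names, and order are illustrative assumptions; the order would come from a grid search as above):

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Step 1: load the data and fill gaps in the time index
    df = pd.read_csv("series.csv", parse_dates=["date"], index_col="date")
    series = df["value"].asfreq("D").interpolate()

    # Steps 3-4: fit the chosen order and forecast the next 30 periods
    res = ARIMA(series, order=(2, 1, 1)).fit()
    print(res.forecast(steps=30))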

2.1.5 Seasonal ARIMA (SARIMA)

SARIMA: an ARIMA model with seasonal differencing, used when seasonality is present in the time series. A seasonal difference is similar to a regular difference, but it subtracts the value from the previous season rather than the immediately preceding term. The order combination of SARIMA is:


$$(p, q, d) \times (P, Q, D, m)$$

where $(P, Q, D)$ are the orders of the seasonal AR, seasonal MA, and seasonal differencing, and $m$ is the length of the seasonal period, commonly 12.

SARIMAX: SARIMA with exogenous variables added, as sketched below.
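A sketch of fitting a seasonal model with statsmodels (the series and the orders are placeholders):

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # seasonal_order = (P, D, Q, m); m = 12 suits monthly data with a yearly cycle
    res = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    print(res.summary())
    # Passing a regressor via the `exog` argument turns this into SARIMAX proper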

2.1.6 Model evaluation

AIC: the Akaike information criterion, which measures the goodness of a statistical model from the perspective of prediction. The smaller the AIC, the better the model; the model with the smallest AIC is usually selected.


$$AIC = 2k - 2\ln(L)$$

(some references normalize this by the number of observations, i.e. $(2k - 2\ln(L))/n$), where:

  • $k$ is the number of parameters; a small $k$ means the model is parsimonious;
  • $L$ is the likelihood function; for a Gaussian model, $\ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\left(\frac{sse}{n}\right) - \frac{n}{2}$, where $sse$ is the sum of squared residuals. A large $L$ means the model is accurate;
  • $n$ is the number of observations.

BIC: the Bayesian information criterion, which measures how well the model fits the data from the perspective of fitting. Similarly, the smaller the BIC, the better the model.


$$BIC = k\ln(n) - 2\ln(L)$$

HQ: the Hannan-Quinn criterion, $HQ = 2k\ln(\ln(n)) - 2\ln(L)$.

Generally, AIC and BIC scores are examined together to determine the best parameter combination.
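In statsmodels, fitted results expose these criteria directly (a sketch, assuming `res` is a fitted result from the examples above):

    print(res.aic, res.bic, res.hqic)   # smaller is better for all three criteria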

2.2 Nonlinear model LSTM

The principle of LSTM (Long Short-Term Memory network) is summarized here. LSTM improves on the recurrent neural network (RNN), mainly to solve the vanishing-gradient problem in RNNs and the loss of early information when the sequence is too long. Its main internal mechanism regulates the flow of information through three gates, learning which data in the sequence should be kept or discarded.

Core concepts: the core concepts of LSTM are the cell state, three gates, and two activation functions. The cell state acts as a highway, carrying relevant information along the sequence chain. The gates are small neural networks that decide which information is allowed onto the cell state; during training, the gates learn which information to keep or forget.

The figure above shows one LSTM cell. The upper input is $c_{t-1}$, the state of the previous cell, and the lower input is $h_{t-1}$, the hidden state output by the previous cell. After passing through the cell, the upper line outputs the cell state $c_t$ and the lower line outputs the hidden state $h_t$.

2.2.1 Forget Gate

The forget gate decides which information should be discarded or retained. Information from the previous hidden state and from the current input is passed through a sigmoid function, producing values between 0 and 1: closer to 0 means forget, closer to 1 means keep.

As shown in the figure above, the previous hidden state $h_{t-1}$ and the cell input $x_t$ are combined and passed into the forget gate. After a sigmoid function, $f_t$ is obtained; its value determines whether this combined information should be discarded. $f_t$ is then multiplied element-wise with the previous cell state $c_{t-1}$.

2.2.2 Input Gate

The input gate updates the cell state: the previous hidden state and the current input are passed through a sigmoid function and a tanh function respectively, and the outputs of the two functions are then multiplied.

As shown in the figure above, $h_{t-1} + x_t$ is mapped by the two activation functions of the input gate and the results are multiplied. This product is added to $c_{t-1} \times f_t$ to obtain the new cell state $c_t$.

2.2.3 Output Gate

The output gate determines the next hidden state, which can be used for prediction. First, the previous hidden state and the current input, $h_{t-1} + x_t$, are passed through a sigmoid function; the newly updated cell state $c_t$ is passed through a tanh function; and the two results are multiplied. The product is the new hidden state $h_t$. The new cell state and hidden state are then carried to the next time step.

2.2.4 Mathematical representation

Based on the graphical operations above, the mathematical description of the LSTM can be summarized as:


$$c^t = z^f \odot c^{t-1} + z^i \odot z$$

where $c^t$ is the current cell state, $z^f$ is the forget gate, and $z^i$ and $z$ are the two operations of the input gate. The term $z^f \odot c^{t-1}$ represents the forgetting stage of the LSTM, selectively forgetting information passed in from the previous step, while $z^i \odot z$ represents the selective-memory stage, in which the input $x^t$ is selectively remembered: what is important is recorded, what is not is recorded less.


$$h^t = z^o \odot \tanh(c^t)$$

where $h^t$ is the current hidden state and $z^o$ is the sigmoid operation of the output gate. This equation represents the output stage of the LSTM.


$$y^t = \sigma(W' h^t)$$

This represents the final output of the LSTM, obtained by a transformation of the current hidden state $h^t$.
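For reference, a common formulation of the gate operations used above (standard convention, not given in the original text; $W^f, W^i, W^o, W$ are weight matrices and $[x^t, h^{t-1}]$ is the concatenated input):

$$z^f = \sigma(W^f[x^t, h^{t-1}]), \quad z^i = \sigma(W^i[x^t, h^{t-1}]), \quad z = \tanh(W[x^t, h^{t-1}]), \quad z^o = \sigma(W^o[x^t, h^{t-1}])$$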

LSTM time series project development process:

  1. Preprocess the data and convert the dataset into time series samples using a sliding window;
  2. Build the LSTM model (LSTM layer, dense layer, optimizer, loss function);
  3. Fit the model; it is best to set up checkpointing so that the best-fitting model is saved during training;
  4. Make predictions for future time steps, as sketched below.
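A minimal Keras sketch of this workflow (the window size, layer sizes, and the series `data` are illustrative assumptions):

    import numpy as np
    from tensorflow import keras

    def make_windows(series, window):
        # Step 1: slide a window over the series; X holds `window` past values, y the next value
        X = np.array([series[i:i + window] for i in range(len(series) - window)])
        return X[..., np.newaxis], np.asarray(series[window:])  # LSTM expects (samples, timesteps, features)

    X, y = make_windows(data, window=30)

    # Step 2: LSTM layer, dense output layer, optimizer, and loss function
    model = keras.Sequential([
        keras.layers.LSTM(64, input_shape=(30, 1)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Step 3: checkpointing keeps the best model seen during training
    ckpt = keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True)
    model.fit(X, y, epochs=20, validation_split=0.2, callbacks=[ckpt])

    print(model.predict(X[-1:]))   # step 4: predict the next value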
