Learning notes
Bibliography: Econometrics, Time Series Analysis and Application of R Language, Fundamentals of Econometrics, Econometric Models and Application of R Language
The autoregressive (AR) model and its stationarity conditions
Why is stationarity so important?
We know that if a random time series $X_t$ has the following properties:
① Mean: $E(X_t) = \mu$, a constant independent of time $t$;
② Variance: $Var(X_t) = \sigma^2$, a constant independent of time $t$;
③ Covariance: $Cov(X_t, X_{t+k}) = \gamma_k$, which depends only on the time interval $k$, not on $t$;
then the series $X_t$ is said to be weakly stationary.
At this point, if we shift the time origin of $X$ from $X_t$ to $X_{t+m}$, then $X_{t+m}$ has the same mean, variance, and autocovariances as $X_t$. In short, if a time series is stationary, its mean, variance, and autocovariance remain the same no matter when they are measured: they do not change over time. Such a stationary series tends to return to its mean, and its fluctuations around that mean have a roughly constant amplitude. It should be pointed out that the speed of mean reversion of a stationary process depends on its autocovariance: the smaller the autocovariance, the faster the reversion; the larger the autocovariance, the slower.
If a process is nonstationary, then either its mean changes over time, or its variance changes over time, or both.
Why are stationary time series so important? Because if a series is nonstationary, we can only study its behavior during the sample period and cannot generalize the results to other periods. From the point of view of prediction, then, a nonstationary series is of little value.
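To make this concrete, here is a minimal R sketch (my own illustration, not taken from the books above) comparing a stationary AR(1) series, which keeps returning to its mean, with a random walk, whose variance grows with time:

```r
# A stationary AR(1) reverts to its mean; a random walk does not.
set.seed(123)
e  <- rnorm(300)                                     # common white-noise shocks
ar <- filter(e, filter = 0.6, method = "recursive")  # X_t = 0.6 X_{t-1} + e_t
rw <- cumsum(e)                                      # X_t = X_{t-1} + e_t
par(mfrow = c(1, 2))
plot.ts(ar, main = "Stationary AR(1)")  # hovers around 0, stable spread
plot.ts(rw, main = "Random walk")       # drifts; variance grows with t
```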
Autoregressive AR model
After discussing the importance of stationary time series, the next practical problem is how to build a model for a stationary time series and how to use that model to predict. Unlike classical regression analysis, the time series models built here are not mainly based on causal relationships between different variables; instead, they look for regularities in the evolution of the series itself. Likewise, when predicting future changes of a time series, we no longer use a set of other variables causally related to it, but simply use the past behavior of the series to predict its future.
First order autoregressive process
The simplest example of using past behavior to predict the future is the first-order autoregressive model, abbreviated AR(1):

$$Y_t = \phi Y_{t-1} + e_t \tag{1}$$

where $e_t$ is white noise, assumed independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$; we further assume the mean of the process has been removed, so that the series has mean zero.
We know that this model makes sense only if the underlying process is stationary, so we first study the stationarity condition of the process.
For the first-order autoregressive model, repeated substitution of (1) into itself gives the recursion

$$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \phi^3 e_{t-3} + \cdots \tag{2}$$

It can be seen that the first-order autoregressive process defined by equation (1) can be expressed as a linear combination of the white-noise sequence.
Since $E(e_t) = 0$, we have $E(Y_t) = 0$, so stationarity condition ① holds.
Taking the variance of both sides of equation (2):

$$Var(Y_t) = \sigma_e^2 \,(1 + \phi^2 + \phi^4 + \cdots) \tag{3}$$

If and only if $|\phi| < 1$, the series in (3) converges and

$$Var(Y_t) = \frac{\sigma_e^2}{1 - \phi^2} \tag{4}$$

So only when $|\phi| < 1$ is stationarity condition ② satisfied.
Meanwhile, applying equation (2) to $Y_{t+k}$:

$$Y_{t+k} = e_{t+k} + \phi e_{t+k-1} + \cdots + \phi^k e_t + \phi^{k+1} e_{t-1} + \cdots \tag{5}$$

Then the covariance of $Y_t$ and $Y_{t+k}$ is

$$Cov(Y_t, Y_{t+k}) = \sigma_e^2 \,\phi^k (1 + \phi^2 + \phi^4 + \cdots) \tag{6}$$

If and only if $|\phi| < 1$, (6) converges to

$$Cov(Y_t, Y_{t+k}) = \phi^k \frac{\sigma_e^2}{1 - \phi^2} = \gamma_k \tag{7}$$

Equation (7) shows that $Cov(Y_t, Y_{t+k})$ depends only on the time interval $k$ and not on the time point $t$, so stationarity condition ③ holds.
In conclusion, when $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$ and $\sigma_e^2 > 0$, the solution defined by the AR(1) recursion is stationary if and only if $|\phi| < 1$. The requirement $|\phi| < 1$ is therefore often called the stationarity condition of the AR(1) process.
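As a quick numerical check of this condition, the following sketch simulates the AR(1) recursion directly (the helper name simulate_ar1 is mine) for one $\phi$ inside and one outside the stationary region, and compares the sample variance with the theoretical value in equation (4):

```r
# Simulate Y_t = phi * Y_{t-1} + e_t by direct recursion.
simulate_ar1 <- function(phi, n = 1000, sd_e = 1) {
  e <- rnorm(n, sd = sd_e)
  y <- numeric(n)
  for (t in 2:n) y[t] <- phi * y[t - 1] + e[t]
  y
}
set.seed(42)
y_stat <- simulate_ar1(0.7)           # |phi| < 1: stationary
var(y_stat)                           # sample variance, roughly ...
1 / (1 - 0.7^2)                       # ... sigma_e^2 / (1 - phi^2) = 1.96
y_expl <- simulate_ar1(1.05, n = 200) # |phi| > 1: explosive
range(y_expl)                         # wanders far from zero
```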
Second order autoregressive process
Consider the sequence satisfying the following equation:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t \tag{8}$$

where $e_t$ is white noise, assumed independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$
To investigate stationarity, the AR characteristic polynomial is introduced:

$$\phi(x) = 1 - \phi_1 x - \phi_2 x^2 \tag{9}$$

together with the corresponding AR characteristic equation:

$$1 - \phi_1 x - \phi_2 x^2 = 0 \tag{10}$$

A quadratic equation always has two roots (possibly complex); these are called the characteristic roots.
It can be proved that, when $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$, a stationary solution of equation (8) exists if and only if the absolute values (moduli) of both roots of the AR characteristic equation (10) are greater than 1. This result carries over, without any change, to order $p$.
In the second-order autoregressive model, the roots of the quadratic characteristic equation are easily found to be

$$x = \frac{-\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2\phi_2} \tag{11}$$

For stationarity, the absolute value of each root must be greater than 1, which holds if and only if the following three conditions are met:

$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad |\phi_2| < 1 \tag{12}$$
As with AR(1), we call (12) the stationarity condition of the AR(2) model.
The stationarity region of the AR(2) model, that is, the set of $(\phi_1, \phi_2)$ pairs satisfying (12), is a triangular region in the $(\phi_1, \phi_2)$ plane.
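Under the numbering above, the following sketch verifies condition (12) for the coefficients $\phi_1 = 0.8$, $\phi_2 = -0.5$ used in the simulation later in this post, and computes the characteristic roots from the quadratic formula (11):

```r
phi1 <- 0.8; phi2 <- -0.5

# The three triangle conditions (12): all should be TRUE.
c(phi1 + phi2 < 1, phi2 - phi1 < 1, abs(phi2) < 1)

# Roots of 1 - phi1*x - phi2*x^2 = 0 via formula (11);
# as.complex() keeps sqrt() valid when the discriminant is negative.
disc  <- as.complex(phi1^2 + 4 * phi2)
roots <- (-phi1 + c(1, -1) * sqrt(disc)) / (2 * phi2)
Mod(roots)   # both moduli exceed 1, so the process is stationary
```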
General autoregressive process
Consider the $p$th-order autoregressive model:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t \tag{14}$$

Its AR characteristic polynomial is

$$\phi(x) = 1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p$$

with the corresponding AR characteristic equation:

$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0$$
Suppose $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots$. Then equation (14) has a stationary solution if and only if the absolute value (modulus) of every root of the AR characteristic equation is greater than 1. For the moduli of all roots to exceed 1, the following two inequalities are necessary but not sufficient:

$$\phi_1 + \phi_2 + \cdots + \phi_p < 1, \qquad |\phi_p| < 1$$

And the following inequality is sufficient:

$$|\phi_1| + |\phi_2| + \cdots + |\phi_p| < 1$$
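Since base R's polyroot() returns the roots of a polynomial given its coefficients in increasing order of power, the exact root criterion is easy to automate; the helper name ar_stationary below is my own:

```r
# Exact AR(p) stationarity check: all characteristic roots must have
# modulus > 1. 'phi' is the coefficient vector c(phi1, ..., phip).
ar_stationary <- function(phi) {
  all(Mod(polyroot(c(1, -phi))) > 1)
}
ar_stationary(c(0.8, -0.5))  # TRUE:  the coefficients simulated below
ar_stationary(c(0.8,  0.5))  # FALSE: the coefficients from the postscript
```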
R language implementation
Now we simulate a second-order autoregressive process and test its stationarity:
```r
library(tseries)

set.seed(1236)
# AR(2) with phi1 = 0.8, phi2 = -0.5: satisfies stationarity condition (12)
data01 <- arima.sim(n = 50, model = list(ar = c(0.8, -0.5)))
plot(data01, main = 'Sequence diagram', type = 'o')
adf.test(data01)  # augmented Dickey-Fuller test; H0: unit root
```
Sequence diagram:
Unit root test results:
```
	Augmented Dickey-Fuller Test

data:  data01
Dickey-Fuller = -3.8032, Lag order = 3, p-value = 0.02523
alternative hypothesis: stationary
```
The p-value (0.02523) is below the 0.05 significance level, so the null hypothesis of a unit root is rejected: the simulated second-order autoregressive process is stationary.
Postscript: this post has been revised twice. When the second-order AR model was first simulated, the autoregressive coefficients were written as 0.8 and 0.5, which do not satisfy the stationarity condition; simulating with R should then have raised an error, yet 0.8 and 0.5 still ended up in the draft. Take warning.
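For completeness, here is the failure the postscript alludes to, wrapped in try() so a script keeps running; the quoted error text is what current versions of arima.sim() emit when their internal root check fails:

```r
# With ar = c(0.8, 0.5) the built-in stationarity check fails:
try(arima.sim(n = 50, model = list(ar = c(0.8, 0.5))))
# Expected: Error in arima.sim(...) : 'ar' part of model is not stationary
```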