Original link:tecdat.cn/?p=2831

Original source: Tecdat (拓端数据部落) WeChat public account

 

"Prediction is very difficult, especially about the future." (Niels Bohr, Danish physicist)

Many people have seen this quote, and prediction is the theme of this blog post. In this article, we introduce the popular ARIMA forecasting model, apply it to predict stock returns, and demonstrate the step-by-step process of ARIMA modeling using R programming.

What is a forecasting model in time series?

Forecasting means estimating the future value of a variable from its own historical data points, or estimating the change in one variable given a change in the value of another. Forecasting methods are mainly divided into qualitative and quantitative methods. Time series forecasting belongs to the quantitative category, in which statistical principles and concepts are applied to the historical data of a variable to predict its future values. Some of the time series forecasting techniques used include:

  • Autoregressive model (AR)
  • Moving Average Model (MA)
  • Seasonal regression model
  • Distributed lag model

What is the autoregressive integrated moving average (ARIMA) model?

ARIMA stands for Autoregressive Integrated Moving Average and is also known as the Box-Jenkins method. Box and Jenkins showed that non-stationary data can be made stationary by differencing the series Y(t).

ARIMA model combines three basic approaches:

  • Autoregression (AR) – A regression of the time series on its own lagged values; the number of lags used is the "p" value in the model.
  • Integration (I) – This involves differencing the time series to remove trends and convert a non-stationary series into a stationary one; the degree of differencing is the "d" value in the model. If d = 1, we take the difference between consecutive observations; if d = 2, we difference the series obtained at d = 1; and so on.
  • Moving average (MA) – The moving average component regresses on lagged forecast errors; the number of lagged error terms is the "q" value in the model.

The combined model is called the autoregressive integrated moving average model, written ARIMA(p, d, q). We will follow the steps listed below to build our model.
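As a quick illustration (not from the original post), the three orders map directly onto the `order` argument of R's built-in `arima()` function; the series below is simulated purely for demonstration:

```r
set.seed(1)

# Simulated series standing in for real data (illustrative only)
y = arima.sim(model = list(ar = 0.5, ma = 0.3), n = 200)

# order = c(p, d, q): p = AR lags, d = degree of differencing, q = MA lags
fit = arima(y, order = c(1, 0, 1))
fit
```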

Step 1: Test and ensure stationarity

To model a time series with the Box-Jenkins method, the series must be stationary. A stationary time series is one with no trend, whose mean and variance are constant over time, which makes its values easier to predict.

**Test stationarity -** We test stationarity using the Augmented Dickey-Fuller (ADF) unit root test. For a stationary time series, the p-value obtained from the ADF test must be less than 0.05 (5%). If the p-value is greater than 0.05, we conclude that the time series has a unit root, which means it is a non-stationary process.
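As a small illustration (not part of the original article), `adf.test()` from the `tseries` package behaves exactly as described; on simulated data, a stationary AR(1) series yields a small p-value while a random walk does not:

```r
library(tseries)

set.seed(42)
stationary_series = arima.sim(model = list(ar = 0.5), n = 500)  # stationary AR(1)
random_walk = cumsum(rnorm(500))                                # non-stationary (unit root)

adf.test(stationary_series)  # p-value < 0.05: reject the unit root, series is stationary
adf.test(random_walk)        # p-value > 0.05: cannot reject the unit root, non-stationary
```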

**Differencing -** To convert a non-stationary process into a stationary one, we apply differencing. Differencing a time series means taking the differences between consecutive values of the series. The differenced values form a new time series that can be tested for stationarity and for new correlations or other interesting statistical properties.

We can apply differencing repeatedly, producing the "first-order difference", "second-order difference", and so on.

Before proceeding to the next step, we apply the appropriate order of differencing (d) to make the time series stationary.
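For reference, here is a minimal sketch (not from the original post) of first- and second-order differencing in R; `ndiffs()` from the forecast package can also suggest a suitable d:

```r
library(forecast)

set.seed(7)
y = cumsum(rnorm(300))          # simulated random walk (non-stationary)

d1 = diff(y)                    # first-order difference
d2 = diff(y, differences = 2)   # second-order difference

ndiffs(y)                       # estimated number of differences needed (d)
```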

Step 2: Identify p and q

In this step, we determine the appropriate orders of the autoregressive (AR) and moving average (MA) processes by using the autocorrelation function (ACF) and the partial autocorrelation function (PACF).

Identifying the order (p) of the AR model

For an AR model, the ACF decays exponentially and the PACF is used to identify the order (p) of the AR model. If the PACF has a significant spike at lag 1, we have a first-order AR model, i.e. AR(1). If the PACF has significant spikes at lags 1, 2, and 3, we have a third-order AR model, i.e. AR(3).

Identifying the order (q) of the MA model

For an MA model, the PACF decays exponentially and the ACF plot is used to identify the order (q) of the MA process. If the ACF has a significant spike at lag 1, we have an MA model of order 1, i.e. MA(1). If the ACF has significant spikes at lags 1, 2, and 3, we have an MA model of order 3, i.e. MA(3).
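The following sketch (simulated data, not from the original article) shows the pattern described above: for an AR(2) process the PACF cuts off after lag 2, while for an MA(2) process the ACF cuts off after lag 2:

```r
set.seed(123)
ar2 = arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)  # AR(2) process
ma2 = arima.sim(model = list(ma = c(0.5, 0.4)), n = 500)   # MA(2) process

par(mfrow = c(2, 2))
acf(ar2, main = "AR(2): ACF decays gradually")
pacf(ar2, main = "AR(2): PACF cuts off at lag 2")
acf(ma2, main = "MA(2): ACF cuts off at lag 2")
pacf(ma2, main = "MA(2): PACF decays gradually")
```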

Step 3: Estimate and forecast

Once we have identified the parameters (p, d, q), we estimate the ARIMA model on the training data set and then use the fitted model to predict values for the test data set with a forecast function. Finally, we cross-check whether the predicted values are consistent with the actual values.

Use R programming to build the ARIMA model

Now, let's follow the steps explained above to build the ARIMA model in R. There are many R packages available for time series analysis and forecasting. We load the relevant packages for time series analysis and pull the stock data from Yahoo Finance.

```r
# Load the packages used below
# (assumed set: quantmod for the data, tseries for the ADF test, forecast for forecasting)
library(quantmod)
library(tseries)
library(forecast)

# Download the stock data from Yahoo Finance
getSymbols('TECHM.NS', from = '2012-01-01', to = '2015-01-01')

# Select the corresponding closing price series
stock_prices = TECHM.NS[, 4]
```

In the next step, we compute the logarithmic returns of the stock, because we want the ARIMA model to forecast log returns rather than stock prices. We also plot the log return series using the plot function.

```r
# Compute the daily log returns and drop the leading NA created by differencing
stock = diff(log(stock_prices), lag = 1)
stock = stock[!is.na(stock)]

# Plot the log return series
plot(stock, type = 'l', main = 'log returns plot')
```

Next, we run the ADF test on the log return series to check for stationarity. A p-value of 0.01 from the ADF test tells us that the series is stationary. If the series were non-stationary, we would first difference the return series to make it stationary.
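The ADF call itself is not shown in the excerpted code; assuming the tseries package is loaded, it would likely look like this:

```r
# Augmented Dickey-Fuller test on the log return series
# (coerced to a plain numeric vector to keep tseries::adf.test happy)
print(adf.test(as.numeric(stock)))
```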

In the next step, we split the data set into two parts: training and testing.
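The split point itself is not shown in the excerpt; one plausible choice, keeping roughly the last tenth of the observations for testing, would be:

```r
# Break point for the train/test split (illustrative choice: ~90% training data)
breakpoint = floor(nrow(stock) * 0.9)
```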

```r
# ACF and PACF plots of the training window to identify the AR and MA orders
acf.stock = acf(stock[c(1:breakpoint), ], main = 'ACF Plot', lag.max = 100)
pacf.stock = pacf(stock[c(1:breakpoint), ], main = 'PACF Plot', lag.max = 100)
```

We can inspect these plots to determine the autoregressive (AR) and moving average (MA) orders.

We know that for AR models the ACF decays exponentially and the PACF plot is used to identify the order (p), while for MA models the PACF decays exponentially and the ACF plot is used to identify the order (q). From these plots we choose AR order = 2 and MA order = 2. Therefore, our ARIMA parameters will be (2, 0, 2).

Our goal is to forecast the entire return series beyond the break point. We use a for loop in R, in which we forecast the return for each data point in the test data set.

In the code shown below, we first initialize a series that will store the actual returns and another that will store the forecasted returns. Inside the for loop, we split the data into training and test sets according to the dynamic split point.

We call the arima function on the training data set, specifying the order (2, 0, 2). We then use this fitted model to forecast the next data point with the forecast function from the forecast package (exposed as forecast.Arima in older versions). The confidence level is set to 99%; the confidence bands could be used to refine the model further, but here we use only the point estimate of the forecast. The "h" parameter of the forecast function indicates the number of values we want to predict.

We can use the summary function to check that the results of the ARIMA model are within an acceptable range. In the last part of the loop, we append each forecasted return and actual return to the forecasted and actual return series, respectively.

```r
# Initialize an xts series for the actual returns and a data frame for the forecasted returns
actual_series = xts(0, as.Date("2014-11-25", "%Y-%m-%d"))
forecasted_series = data.frame(Forecasted = numeric())

for (b in breakpoint:(nrow(stock) - 1)) {

  # Dynamic train/test split
  stock_train = stock[1:b, ]
  stock_test = stock[(b + 1):nrow(stock), ]

  # Fit the ARIMA(2,0,2) model on the training window
  fit = arima(stock_train, order = c(2, 0, 2), include.mean = FALSE)

  # Forecast the next log return at the 99% confidence level
  # (older versions of the forecast package call this forecast.Arima)
  arima.forecast = forecast(fit, h = 1, level = 99)

  # Append the point forecast to the forecasted return series
  forecasted_series = rbind(forecasted_series, arima.forecast$mean[1])
  colnames(forecasted_series) = c("Forecasted")

  # Append the corresponding actual return to the actual return series
  actual_return = stock[(b + 1), ]
  actual_series = c(actual_series, xts(actual_return))
  rm(actual_return)
}
```

Before we move to the final part of the code, let's examine the ARIMA results for a sample data point from the test data set.
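The inspection itself is not reproduced in the excerpt; with the objects created in the loop above, it would look something like this:

```r
summary(fit)             # coefficients, standard errors and AIC of the fitted ARIMA(2,0,2)
summary(arima.forecast)  # point forecast and 99% prediction interval for the next return
```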

From the coefficients obtained, the return equation can be written as:

Y(t) = 0.6072 * Y(t-1) - 0.8818 * Y(t-2) - 0.5447 * ε(t-1) + 0.8972 * ε(t-2)

The coefficients are reported with their standard errors, which need to be within an acceptable range. The Akaike Information Criterion (AIC) score is a good indicator of the ARIMA model's quality: the lower the AIC score, the better the model. We can also look at the ACF plot of the residuals; a good ARIMA model has residual autocorrelations within the threshold limits. The forecasted point return of -0.001326978 is given in the last line of the output.
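These two diagnostics can be checked on the fitted model with a couple of lines (names follow the loop above):

```r
AIC(fit)                                              # lower AIC indicates a better fit
acf(residuals(fit), main = "ACF of ARIMA residuals")  # spikes should stay within the dashed bounds
```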

Let's check the accuracy of the ARIMA model by comparing the forecasted returns with the actual returns. The last part of the code computes this accuracy metric.

```r
# Drop the initial placeholder value from the actual return series
actual_series = actual_series[-1]

# Align the forecasted series with the dates of the actual returns
forecasted_series = xts(forecasted_series, index(actual_series))

# Create a table comparing the actual and forecasted returns
comparison = merge(actual_series, forecasted_series)
colnames(comparison) = c("Actual", "Forecasted")

# A forecast is counted as accurate when it has the same sign as the actual return
comparison$Accuracy = sign(comparison$Actual) == sign(comparison$Forecasted)

# Compute the accuracy percentage
accuracy_percentage = sum(comparison$Accuracy == 1) * 100 / length(comparison$Accuracy)
accuracy_percentage
```

The accuracy of the model is about 55%. You can try other possible combinations of (p, d, q), or use the auto.arima function to choose the best-fitting parameters automatically.
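A minimal sketch of that automatic alternative, assuming the forecast package is loaded and fitting on the training window only:

```r
# Let auto.arima search over (p, d, q) on the training window
auto_fit = auto.arima(stock[c(1:breakpoint), ])
summary(auto_fit)
```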

Conclusion

In this article, we introduced the ARIMA model and applied it to forecast stock returns using the R programming language. We also checked our forecasts against the actual returns.


