Original link: tecdat.cn/?p=22849

Original source: Tuoduan Data Tribe (tecdat) official account

When choosing the most appropriate forecasting model or method for the data, forecasters usually split the available sample into two parts: the in-sample (also known as the “training set”) and the holdout sample (also called the out-of-sample or “test set”). The model is then estimated on the in-sample, and its forecasting performance is evaluated on the holdout using some error measures.

If such a procedure is done only once, it is called a “fixed origin” evaluation. However, the time series may contain outliers, so on a single holdout a poor model may happen to perform better than a more suitable one. To make the evaluation more robust, we use a method called “rolling origin”.

Rolling origin is an evaluation technique in which the forecast origin is updated successively and forecasts are produced from each origin (Tashman 2000). This yields several forecast errors for the same time series, which gives a better understanding of how the model performs.

How do you do that?

The following figure illustrates the basic idea of a rolling origin. The white cells correspond to the in-sample data, while the light gray cells correspond to the one- to three-step-ahead forecasts. The time series in this figure has 25 observations, and forecasts are produced from 8 origins, starting from origin 15. The model is re-estimated at each iteration and forecasts are produced. Then a new observation is added to the end of the in-sample and the process continues. The process stops when there is no more data to add. This can be thought of as a rolling origin with a constant holdout sample size. The result of this procedure is eight sets of one- to three-step-ahead forecasts. Based on them, we can calculate error measures and select the best-performing model.
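The index arithmetic behind this scheme can be sketched in a few lines of base R. This is only an illustration of which observation is forecast from which origin; the variable names (n_obs, origin_index, targets) are chosen here for the example and no model is estimated.

```r
# Rolling origin with a constant holdout size (indices only, no model)
n_obs   <- 25   # length of the series in the figure
h       <- 3    # forecast horizon
origins <- 8    # number of forecast origins

# With a constant holdout, the last origin must leave h observations after it
last_origin  <- n_obs - h
origin_index <- seq(last_origin - origins + 1, last_origin)

# Rows are horizons 1..h, columns are origins;
# each cell is the index of the observation being forecast
targets <- outer(seq_len(h), origin_index, "+")
dimnames(targets) <- list(paste0("h", seq_len(h)),
                          paste0("origin", origin_index))
print(targets)
```

The first column starts at origin 15 and the last column ends at observation 25, which matches the description of the figure.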

Another option for producing forecasts from eight origins is to start at origin 17 instead of 15 (see the figure below). In this case, the procedure continues as before until origin 22, where the last three-step-ahead forecast is produced, and then carries on with a shrinking forecast horizon. Thus a two-step-ahead forecast is produced from origin 23, and only a one-step-ahead forecast from origin 24. As a result, we get eight one-step-ahead forecasts, seven two-step-ahead forecasts and six three-step-ahead forecasts. This can be thought of as a rolling origin with a non-constant holdout sample size. It can be useful with small samples, when we have no observations to spare.
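The counts of forecasts per horizon can be checked with a short base R sketch. As before, this only manipulates indices under the assumptions of the figure (25 observations, horizon 3, 8 origins); the variable names are illustrative.

```r
# Rolling origin with a non-constant holdout size (indices only, no model)
n_obs   <- 25
h       <- 3
origins <- 8

# The last origin only needs one observation after it
last_origin  <- n_obs - 1
origin_index <- seq(last_origin - origins + 1, last_origin)

# Index of the observation targeted at each horizon from each origin
targets <- outer(seq_len(h), origin_index, "+")

# Targets beyond the end of the sample have no actual value to compare with
targets[targets > n_obs] <- NA

# Number of usable forecasts for horizons 1, 2 and 3
forecasts_per_horizon <- rowSums(!is.na(targets))
print(forecasts_per_horizon)
```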

Finally, in both cases above the in-sample grows with each iteration. For some research purposes, however, we may need a constant in-sample size. The figure below shows such a situation: at each iteration we add one observation to the end of the series and remove one observation (dark gray cell) from its beginning.
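The difference between a growing and a constant in-sample comes down to where each estimation window starts. A minimal base R sketch, assuming the same 25 observations, horizon 3 and 8 origins as above (the column names are made up for the illustration):

```r
# Growing versus constant in-sample windows (indices only, no model)
n_obs   <- 25
h       <- 3
origins <- 8

origin_index <- seq(n_obs - h - origins + 1, n_obs - h)
window_size  <- origin_index[1]   # size of the first in-sample

windows <- data.frame(
  origin         = origin_index,
  growing_start  = 1,                               # always starts at observation 1
  growing_size   = origin_index,                    # grows by one at each origin
  constant_start = origin_index - window_size + 1,  # start moves forward by one
  constant_size  = window_size                      # size stays the same
)
print(windows)
```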

R implementation: univariate time series ARIMA case

In R, the rolling origin function used in this article, predro(), runs the rolling origin evaluation for any function supplied through a predefined call and returns the requested values.

Let’s start with a simple example and generate a series from a normal distribution.

x <- rnorm(100, 100, 10)

We use ARIMA(0,1,1) in this example.

OurCall <- "predict(arima(x = data, order = c(0, 1, 1)), n.ahead = h)"

The call includes two important elements: data and h. data marks the place where the in-sample values go in the function we use, and h marks the place where the forecast horizon is specified in that function. In this example, arima(x=data,order=c(0,1,1)) produces the desired ARIMA(0,1,1) model, and predict(...,n.ahead=h) then produces forecasts from that model.

You also need to specify what the function should return. This can be the conditional mean (point forecasts), prediction intervals, or the parameters of the model. However, depending on what the function you use returns, the rolling origin returns something slightly different. If it is a vector, the rolling origin produces a matrix (with the values for each origin in columns). If it is a matrix, an array is returned. Finally, if it is a list, a list of lists is returned.
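These shape rules can be mimicked in base R. The sketch below does not call the rolling origin function; it only stacks simulated per-origin results, assuming a horizon of 3 and 8 origins, to show how a vector, a matrix and a list per origin end up as a matrix, an array and a list of lists.

```r
set.seed(123)
h       <- 3
origins <- 8

# A vector per origin becomes a matrix with one column per origin
vector_results <- lapply(seq_len(origins), function(k) rnorm(h))
point_matrix   <- do.call(cbind, vector_results)

# A matrix per origin (for example point forecast plus two bounds) becomes an array
matrix_results <- lapply(seq_len(origins), function(k) {
  matrix(rnorm(h * 3), nrow = h,
         dimnames = list(NULL, c("fit", "lwr", "upr")))
})
interval_array <- simplify2array(matrix_results)

# A list per origin stays a list, giving a list of lists
list_of_lists <- lapply(seq_len(origins), function(k) {
  list(pred = rnorm(h), se = abs(rnorm(h)))
})

print(dim(point_matrix))
print(dim(interval_array))
print(length(list_of_lists))
```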

First, we collect the conditional mean from the predict() function.

OurValue <- "pred"

Now we can use the rolling origin to produce forecasts from the model. For example, we want three-step-ahead forecasts from eight origins, with default values for all other parameters.

returned1 <- predro(x, h = 3, origins = 8, call = OurCall, value = OurValue)

This function returns a list with all the values we asked for, plus the actual values from the holdout sample. From these values we can calculate some basic error measures, for example the mean absolute error scaled by the mean of the series.

# na.rm = TRUE is needed because the returned matrices may contain missing values
apply(abs(returned1$holdout - returned1$pred), 1, mean, na.rm = TRUE) / mean(returned1$actuals)

In this example, we use the apply() function to separate the forecast horizons and see how the model behaves at each of them. The performance of other models can be evaluated in the same way and compared with the errors of the first model. These numbers say little on their own, but comparing them with those of another model shows which of the two suits the data better.
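To make the mechanics concrete, the same evaluation can be written as an explicit loop in base R. This is a sketch under stated assumptions, not the rolling origin function used in the article: the helper name rolling_origin_arima and its arguments are hypothetical, the holdout size is kept constant, and the series is simulated as in the example above.

```r
set.seed(42)
x <- rnorm(100, 100, 10)

# Hypothetical helper: rolling origin for an ARIMA model with a constant holdout.
# constant_insample = TRUE drops one observation from the start at each step.
rolling_origin_arima <- function(x, h, origins, order = c(0, 1, 1),
                                 constant_insample = FALSE) {
  n_obs        <- length(x)
  first_origin <- n_obs - h - origins + 1
  pred    <- matrix(NA_real_, h, origins)
  holdout <- matrix(NA_real_, h, origins)
  for (k in seq_len(origins)) {
    origin <- first_origin + k - 1
    start  <- if (constant_insample) k else 1
    # Re-estimate the model on the current in-sample
    fit <- arima(x[start:origin], order = order)
    # Point forecasts for horizons 1..h and the matching actual values
    pred[, k]    <- as.numeric(predict(fit, n.ahead = h)$pred)
    holdout[, k] <- x[origin + seq_len(h)]
  }
  list(pred = pred, holdout = holdout, actuals = x)
}

res <- rolling_origin_arima(x, h = 3, origins = 8)

# Mean absolute error per horizon, scaled by the mean of the series
scaled_mae <- apply(abs(res$holdout - res$pred), 1, mean) / mean(res$actuals)
print(scaled_mae)
```

Calling the helper with constant_insample = TRUE gives the variant in which the estimation window keeps its size and moves forward by one observation at each origin.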

We can also plot the forecasts from the rolling origin.

plot(returned1)

In this example, the forecasts from different origins are close to each other. This is because the data is stationary and the model is fairly stable.

If we look at the returned matrices, we will notice that they contain missing values.

returned1$holdout

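This is because, by default, the holdout sample is set to be non-constant. The in-sample is also set to be non-constant, which is why the model is re-estimated on a growing sample at each iteration. We can change both settings.

# ci and co request a constant in-sample and a constant holdout;
# these argument names are an assumption, the source does not show them
returned2 <- predro(x, h = 3, origins = 8, call = OurCall, value = OurValue, ci = TRUE, co = TRUE)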

Note that the values in returned2 are not directly comparable with those in returned1 because they are produced from different origins. We can see this when we plot them.

plot(returned2)

If you use functions from the forecast package, you can modify the call and the returned values as follows.

OurCall <- "forecast(ets(data), h = h, level = 95)"
OurValue <- c("mean", "lower", "upper")

Multivariate time series ARIMA case

When you have one model and one time series, the rolling origin is a handy tool. But what if you need to apply different models to different time series? We will need a loop, and even then there is a simple way to use the rolling origin. Let’s first introduce several time series.

For this example, we need an array for the returned values.

Fore <- array(NA, c(3, 2, 3, 8))

Here we have 3 time series, 2 models and 3-step-ahead forecasts from 8 origins. Our models are stored in a separate list. In this example, we use ARIMA(0,1,1) and ARIMA(1,1,0).

Models <- list(c(0, 1, 1), c(1, 1, 0))

We return the same forecast values from the function, but we need to change the call, because now we have to take the two different models into account.

OurCall <- "predict(arima(data, order = Models[[i]]), n.ahead = h)"

Instead of specifying the model directly, we use the i-th element of the list.

We also want to save the actual values from the holdout sample so that we can calculate the errors.

Holdout <- array(NA, c(3, 3, 8))

The dimensions of this array are three time series and three-step-ahead forecasts from eight origins.

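Finally, we can write a loop and produce the forecasts.

# x is assumed to be a matrix with one time series per column
for (j in 1:3) {
    for (i in 1:2) {
        Return <- predro(x[, j], h = 3, origins = 8, call = OurCall, value = "pred")
        Fore[j, i, , ] <- Return$pred
        Holdout[j, , ] <- Return$holdout
    }
}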

Now we compare the performance of the two models across the time series.

exp(mean(log(apply(abs(Holdout - Fore[, 1, , ]), 1, mean, na.rm = TRUE) / apply(abs(Holdout - Fore[, 2, , ]), 1, mean, na.rm = TRUE))))

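This is the geometric mean, across the series, of the ratio between the mean absolute errors of the two models, so a value below one favours the first model. Based on these results, it can be concluded that ARIMA(0,1,1) is, on average, more accurate than ARIMA(1,1,0) on our three time series.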
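The comparison can be reproduced end to end with a hand-written loop in base R. The sketch below is an assumption-laden stand-in for the code above: the three series are simulated here (means 100, 50 and 150 are arbitrary choices), the holdout size is constant, and the names mae and rel_mae are illustrative. The resulting number depends on the simulated data, so it is not meant to confirm the conclusion stated in the article.

```r
set.seed(1)
# Three simulated series stored as columns of a matrix
x <- matrix(rnorm(120 * 3, c(100, 50, 150), c(10, 5, 15)),
            nrow = 120, ncol = 3, byrow = TRUE)

models  <- list(c(0, 1, 1), c(1, 1, 0))
h       <- 3
origins <- 8

# Mean absolute error for each series (rows) and each model (columns)
mae <- matrix(NA_real_, ncol(x), length(models))
for (j in seq_len(ncol(x))) {
  for (i in seq_along(models)) {
    errors <- matrix(NA_real_, h, origins)
    for (k in seq_len(origins)) {
      origin <- nrow(x) - h - origins + k
      fit <- arima(x[seq_len(origin), j], order = models[[i]])
      forecasts   <- as.numeric(predict(fit, n.ahead = h)$pred)
      errors[, k] <- x[origin + seq_len(h), j] - forecasts
    }
    mae[j, i] <- mean(abs(errors))
  }
}

# Geometric mean of the relative MAE of model 1 against model 2;
# values below one favour model 1
rel_mae <- exp(mean(log(mae[, 1] / mae[, 2])))
print(rel_mae)
```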

Linear regression and the ARIMAX case

For our final examples, we create a data frame and fit a linear regression.

Note that in this example the regression implemented in the lm() function relies on the data frame and does not use a forecast horizon.

OurCall <- "predict(lm(y ~ x1 + x2 + x3, xreg[counti, ]), newdata = xreg[counto, ], interval = 'p')"

In addition, the function predict.lm() returns a matrix of values here, not a list. Finally, we call the rolling origin.

Return <- predro(xreg$y, h = 3, origins = 8, call = OurCall)

In this case, we need to pass the dependent variable as the data argument of the function, because the function needs to extract the holdout values from it.
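The role of the in-sample and holdout row indices can be seen in a hand-written version of the same loop. This is a base R sketch, not the article's function: the data frame is simulated with made-up coefficients, the holdout size is constant, and counti and counto are ordinary index vectors created inside the loop.

```r
set.seed(7)
n_obs <- 120
xreg <- data.frame(x1 = rnorm(n_obs, 100, 10),
                   x2 = rnorm(n_obs, 50, 5),
                   x3 = rnorm(n_obs, 150, 15))
# Hypothetical data-generating process for the dependent variable
xreg$y <- 0.5 * xreg$x1 + 0.2 * xreg$x2 + 0.3 * xreg$x3 + rnorm(n_obs, 0, 5)

h       <- 3
origins <- 8

# predict.lm() with an interval returns an h x 3 matrix, so we collect an array
forecasts <- array(NA_real_, c(h, 3, origins),
                   dimnames = list(NULL, c("fit", "lwr", "upr"), NULL))
holdout <- matrix(NA_real_, h, origins)

for (k in seq_len(origins)) {
  origin  <- n_obs - h - origins + k
  counti  <- seq_len(origin)       # rows of the in-sample
  counto  <- origin + seq_len(h)   # rows of the holdout
  model   <- lm(y ~ x1 + x2 + x3, data = xreg[counti, ])
  forecasts[, , k] <- predict(model, newdata = xreg[counto, ],
                              interval = "prediction")
  holdout[, k] <- xreg$y[counto]
}

# Share of holdout values that fall inside the prediction intervals
coverage <- mean(holdout >= forecasts[, "lwr", ] & holdout <= forecasts[, "upr", ])
print(coverage)
```

No forecast horizon is passed to the model here: the number of rows in the holdout part of the data frame determines how many values are predicted.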

OurCall <- "predict(lm(y ~ x1 + x2 + x3, xreg[counti, ]), newdata = xreg[counto, ], interval = 'p')"
Return <- predro(xreg$y, h = 3, origins = 8, call = OurCall)
plot(Return)

As a final example, consider an ARIMAX model, and modify the call accordingly.

OurCall <- "predict(arima(x = data, order = c(0, 1, 1), xreg = xreg[counti, ]), n.ahead = h, newxreg = xreg[counto, ])"

Since we are now dealing with ARIMA, we need to specify both data and h in the call. In addition, xreg differs from the previous example: it should no longer contain the dependent variable.
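A hand-written base R loop shows how the pieces of such a call fit together. This is a sketch under assumptions made for the illustration: the regressors and the response are simulated (the response has a local-level error so that ARIMA(0,1,1) is a reasonable choice), the holdout size is constant, and counti and counto are built explicitly.

```r
set.seed(11)
n_obs <- 120
# Regressors only; the dependent variable is kept separately in y
xreg <- cbind(x1 = rnorm(n_obs, 100, 10),
              x2 = rnorm(n_obs, 50, 5),
              x3 = rnorm(n_obs, 150, 15))
# Hypothetical response: regression part plus a random-walk level plus noise
y <- as.numeric(xreg %*% c(0.5, 0.2, 0.3)) +
  cumsum(rnorm(n_obs, 0, 2)) + rnorm(n_obs, 0, 2)

h       <- 3
origins <- 8
pred    <- matrix(NA_real_, h, origins)
holdout <- matrix(NA_real_, h, origins)

for (k in seq_len(origins)) {
  origin <- n_obs - h - origins + k
  counti <- seq_len(origin)       # in-sample rows
  counto <- origin + seq_len(h)   # holdout rows
  fit <- arima(y[counti], order = c(0, 1, 1), xreg = xreg[counti, ])
  # Future values of the regressors must be supplied through newxreg
  pred[, k]    <- as.numeric(predict(fit, n.ahead = h,
                                     newxreg = xreg[counto, ])$pred)
  holdout[, k] <- y[counto]
}

# Mean absolute error for each forecast horizon
mae_by_horizon <- rowMeans(abs(holdout - pred))
print(mae_by_horizon)
```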

If you use the ETSX model, the call can be simplified to:

OurCall <- "es(x = data, xreg = xreg, h = h)"

Finally, all of the examples above can be run in parallel, which helps most when there is a lot of data and the samples are large.

References

Davydenko, Andrey, and Robert Fildes. 2013. “Measuring Forecasting Accuracy: The Case of Judgmental Adjustments to SKU-Level Demand Forecasts.” International Journal of Forecasting 29 (3). Elsevier B.V.: 510-22. doi.org/10.1016/j.i…

Petropoulos, Fotios, and Nikolaos Kourentzes. 2015. “Forecast Combinations for Intermittent Demand.” Journal of the Operational Research Society 66 (6). Nature Publishing Group: 914-24. doi.org/10.1057/jor…

