Original link:tecdat.cn/?p=6962

Original source:Tuo End number according to the tribe public number

 

 

Suppose you have time series data, as shown below. Experience shows that the target variable Y seems to be related to the explanatory variable X. However, at first glance, Y fluctuates between levels, so it doesn’t always seem to have a stable relationship (with multiple states behind it).

The sample data above is created as follows. The relationship between x and y changes over time.

 

x <- rpois(500, lambda = 10)  
y1 <- x * 4 + 20     
y2 <- x * 2 + 60    

 
noise <- rnorm(1:500, mean = 10, sd = 5)
y1 <- y1 + noise
y2 <- y2 + noise

 y <- c(y1[1:200], y2[201:400], y1[401:500])
 observed <- data.frame(x = x, y = y)
Copy the code

The relationship between x and y1 and y2 is shown below.

data

 

In the Markov transformation model, the observed data is thought to be generated from several states and can be well separated as shown above.

Observed data

Create a Markov transformation model

 

Model formula

 

# Call: # lm(formula = y ~ x, data = observed) # # Residuals: # Min 1Q Median 3Q Max # -24.303-9.354-1.914 9.617 29.224 # # Coefficients: # Estimate Std. Error t value (Pr > | | t) # (Intercept) 45.7468 1.7202 26.59 < 2-16 * * * e # 3.2262 0.1636 19.71 x < 2-16 * * * e # # # - Signif. Codes: 0 '* * *' 0.001 '* *' 0.01 '*' 0.05 '. '0.1' '1 # # Residual standard error: 11.51 on 498 degrees of freedom # Multiple R-squared: 0.4383, Adjusted R-squared: 0.4372 # f-statistic: 388.7 on 1 and 498 DF, p-value: < 2.2e-16Copy the code

 

Parameter meaning is

  • k: The number of states of the Markov transformation model. In this case, it is specified as having two states following it.
  • sw: Specifies whether each parameter changes when the state changes
  • p: AR model coefficient
  • familyFamily of probability distributions (in the case of GLM)
# # Markov transformation models # # AIC BIC logLik # 3038.846 3101.397-1513.423 # # Coefficients: 1 # # # Regime -- -- -- -- -- -- -- -- - # Estimate Std. Error t value Pr (> | | t) # (Intercept) (S) 69.3263 4.0606 17.0729 < 2 e - * * * # 16 X (S) 2.1795 0.1187 18.3614 <2e-16 *** # y_1(S) -0.0103 0.0429-0.2401 0 '***' 0.001 '**' 0.01 '*' 0.05 '1 # # Residual standard error: Multiple r-squared: 0.6288 # # Standardized Residuals: # Min Q1 Med Q3 Max # -1.431396e+ 01-2.056292E-02 + 1.536781E-03 + 1.098923E-05 + 584478e # # # Regime 2 # --------- # T value Estimate Std. Error (Pr > | | t) # (Intercept) (S) 30.2820 1.7687 17.1210 < 2 e - * * * # 16 x 3.9964 0.0913 43.7722 (S) < 2 e - * * * # 16 y_1 (S) to 0.0045 0.0203 0.2217 0.8245 # # - Signif... codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '1 # # Residual standard error: Multiple r-squared: # # Standardized Residuals: # Min Q1 Med Q3 Max # -13.202056966-0.771854514 0.002211602 1.162769110 12.417873232 # # Transition probabilities: # Regime 2 # Regime 1 0.994973376 0.003347279 # Regime 2 0.005026624 0.996652721Copy the code

Extents 1 and 2 in the output represent two states of the model.

1 # # Regime -- -- -- -- -- -- -- -- - # Estimate Std. Error t value Pr (> | | t) # (Intercept) (S) 69.3263 4.0606 17.0729 < 2-16 # * * * e x (S) 2.1795 0.1187 18.3614 <2e-16 *** # y_1(S) -0.0103 0.0429-0.2401Copy the code

You can see that block 2 matches y1 < -x * 4 + 20.

From the adjusted R square value, it has improved on the whole.

2 # # Regime -- -- -- -- -- -- -- -- - # Estimate Std. Error t value Pr (> | | t) # (Intercept) (S) 30.2820 1.7687 17.1210 < 2-16 # * * * e x (S) 3.9964 0.0913 43.7722 <2e-16 *** # y_1(S) -0.0045 0.0203-0.2217Copy the code

model

For each state, the probability of being in that state is shaded

The probability at each point in time

Get the status and change point each time

If you want to know your regime at a particular point in time, choose the one with the highest probability at that point.

> probable [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [30] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2...Copy the code

Outliers/change points are when the state changes

c(FALSE, diff(probable) ! = 0) [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [11] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ... [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [191] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE [201] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ... [381] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [391] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE [401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ... [491] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSECopy the code

Therefore, we can see that the point of change specified during the first data creation is detected.