Original link:tecdat.cn/?p=6962
Original source:Tuo End number according to the tribe public number
Suppose you have time series data, as shown below. Experience shows that the target variable Y seems to be related to the explanatory variable X. However, at first glance, Y fluctuates between levels, so it doesn’t always seem to have a stable relationship (with multiple states behind it).
The sample data above is created as follows. The relationship between x and y changes over time.
x <- rpois(500, lambda = 10)
y1 <- x * 4 + 20
y2 <- x * 2 + 60
noise <- rnorm(1:500, mean = 10, sd = 5)
y1 <- y1 + noise
y2 <- y2 + noise
y <- c(y1[1:200], y2[201:400], y1[401:500])
observed <- data.frame(x = x, y = y)
Copy the code
The relationship between x and y1 and y2 is shown below.
data
In the Markov transformation model, the observed data is thought to be generated from several states and can be well separated as shown above.
Observed data
Create a Markov transformation model
Model formula
# Call: # lm(formula = y ~ x, data = observed) # # Residuals: # Min 1Q Median 3Q Max # -24.303-9.354-1.914 9.617 29.224 # # Coefficients: # Estimate Std. Error t value (Pr > | | t) # (Intercept) 45.7468 1.7202 26.59 < 2-16 * * * e # 3.2262 0.1636 19.71 x < 2-16 * * * e # # # - Signif. Codes: 0 '* * *' 0.001 '* *' 0.01 '*' 0.05 '. '0.1' '1 # # Residual standard error: 11.51 on 498 degrees of freedom # Multiple R-squared: 0.4383, Adjusted R-squared: 0.4372 # f-statistic: 388.7 on 1 and 498 DF, p-value: < 2.2e-16Copy the code
Parameter meaning is
k
: The number of states of the Markov transformation model. In this case, it is specified as having two states following it.sw
: Specifies whether each parameter changes when the state changesp
: AR model coefficientfamily
Family of probability distributions (in the case of GLM)
# # Markov transformation models # # AIC BIC logLik # 3038.846 3101.397-1513.423 # # Coefficients: 1 # # # Regime -- -- -- -- -- -- -- -- - # Estimate Std. Error t value Pr (> | | t) # (Intercept) (S) 69.3263 4.0606 17.0729 < 2 e - * * * # 16 X (S) 2.1795 0.1187 18.3614 <2e-16 *** # y_1(S) -0.0103 0.0429-0.2401 0 '***' 0.001 '**' 0.01 '*' 0.05 '1 # # Residual standard error: Multiple r-squared: 0.6288 # # Standardized Residuals: # Min Q1 Med Q3 Max # -1.431396e+ 01-2.056292E-02 + 1.536781E-03 + 1.098923E-05 + 584478e # # # Regime 2 # --------- # T value Estimate Std. Error (Pr > | | t) # (Intercept) (S) 30.2820 1.7687 17.1210 < 2 e - * * * # 16 x 3.9964 0.0913 43.7722 (S) < 2 e - * * * # 16 y_1 (S) to 0.0045 0.0203 0.2217 0.8245 # # - Signif... codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '1 # # Residual standard error: Multiple r-squared: # # Standardized Residuals: # Min Q1 Med Q3 Max # -13.202056966-0.771854514 0.002211602 1.162769110 12.417873232 # # Transition probabilities: # Regime 2 # Regime 1 0.994973376 0.003347279 # Regime 2 0.005026624 0.996652721Copy the code
Extents 1 and 2 in the output represent two states of the model.
1 # # Regime -- -- -- -- -- -- -- -- - # Estimate Std. Error t value Pr (> | | t) # (Intercept) (S) 69.3263 4.0606 17.0729 < 2-16 # * * * e x (S) 2.1795 0.1187 18.3614 <2e-16 *** # y_1(S) -0.0103 0.0429-0.2401Copy the code
You can see that block 2 matches y1 < -x * 4 + 20.
From the adjusted R square value, it has improved on the whole.
2 # # Regime -- -- -- -- -- -- -- -- - # Estimate Std. Error t value Pr (> | | t) # (Intercept) (S) 30.2820 1.7687 17.1210 < 2-16 # * * * e x (S) 3.9964 0.0913 43.7722 <2e-16 *** # y_1(S) -0.0045 0.0203-0.2217Copy the code
model
For each state, the probability of being in that state is shaded
The probability at each point in time
Get the status and change point each time
If you want to know your regime at a particular point in time, choose the one with the highest probability at that point.
> probable [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [30] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2...Copy the code
Outliers/change points are when the state changes
c(FALSE, diff(probable) ! = 0) [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [11] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ... [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [191] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE [201] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ... [381] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [391] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE [401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ... [491] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSECopy the code
Therefore, we can see that the point of change specified during the first data creation is detected.