Original link: tecdat.cn/?p=6129
Introduction
Finite mixture models are useful when the observations come from several different populations and the population each observation belongs to is unknown.
Simulated data
First, we'll simulate some data: two normal distributions, one with a mean of 0 and the other with a mean of 50, both with a standard deviation of 5. The first sample has 100 observations and the second only 10.
m1 <- 0
m2 <- 50
sd1 <- sd2 <- 5
N1 <- 100
N2 <- 10
a <- rnorm(n=N1, mean=m1, sd=sd1)
b <- rnorm(n=N2, mean=m2, sd=sd2)
Now let's "mix" the data together...
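The code for this step is not shown in the text. A minimal sketch of what it might look like, assuming the flexmix package is installed, follows. The names `data`, `x`, `class`, `flexfit`, `c1`, `c2` and `lam` are chosen to match the calls used later; the seed, the single `FLXMRglm` driver and the way `lam` is built are assumptions, not the original author's code.

```r
library(flexmix)  # assumed to be installed

set.seed(42)  # assumption: fixed seed for reproducibility; none appears in the text

# Repeat the simulation so that this block runs on its own.
m1 <- 0; m2 <- 50
sd1 <- sd2 <- 5
N1 <- 100; N2 <- 10
a <- rnorm(n = N1, mean = m1, sd = sd1)
b <- rnorm(n = N2, mean = m2, sd = sd2)

# Pool both samples and keep the true membership for later comparison.
data <- data.frame(x = c(a, b), class = c(rep(1, N1), rep(2, N2)))

# Fit a two-component Gaussian mixture with an intercept-only model.
flexfit <- flexmix(x ~ 1, data = data, k = 2,
                   model = FLXMRglm(family = "gaussian"))

# One column per component; the rows hold the mean (intercept) and sigma.
params <- parameters(flexfit)
c1 <- params[, 1]
c2 <- params[, 2]

# Cluster sizes, usable as unnormalised mixing weights.
lam <- table(clusters(flexfit))
```

Component labels are arbitrary, so component 1 of the fit need not correspond to the first simulated population, and a different seed can change the estimates.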
print(table(clusters(flexfit), data$class))
##
##       1   2
##   1 100   0
##   2   0  10
What about the parameters?
cat('pred:', c1[1], '\n')
cat('true:', m1, '\n\n')
cat('pred:', c1[2], '\n')
cat('true:', sd1, '\n\n')
cat('pred:', c2[1], '\n')
cat('true:', m2, '\n\n')
cat('pred:', c2[2], '\n')
cat('true:', sd2, '\n\n')
## pred: 0.5613484
## true: 0
##
## pred: 4.799484
## true: 5
##
## pred: 52.86911
## true: 50
##
## pred: 6.89413
## true: 5
Let's visualize the real data and the mixture model we fit.
ggplot(data) +
  geom_histogram(aes(x, ..density..), binwidth = 1, colour = "black", fill = "white") +
  stat_function(geom = "line", fun = plot_mix_comps, args = list(c1[1], c1[2], lam[1]/sum(lam)), colour = "red", lwd = 1.5) +
  stat_function(geom = "line", fun = plot_mix_comps, args = list(c2[1], c2[2], lam[2]/sum(lam)), colour = "blue", lwd = 1.5) +
  ylab("Density")
Looks like we’re doing great!
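The plotting call above relies on a helper, `plot_mix_comps`, whose definition is not shown. Judging from how it is called (x values first, then a mean, a standard deviation and a mixing weight), it plausibly returns a weighted normal density. The sketch below is a reconstruction under that assumption, using base R only; the example values are illustrative and not the fitted ones.

```r
# Hypothetical reconstruction of the helper passed to stat_function():
# the density of one mixture component, scaled by its mixing weight.
plot_mix_comps <- function(x, mu, sigma, lam) {
  lam * dnorm(x, mean = mu, sd = sigma)
}

# Illustrative values only: the true simulation parameters with weights
# 100/110 and 10/110. In the plot, lam is assumed to hold the cluster
# sizes, so lam[k] / sum(lam) plays the role of the weight.
grid <- seq(-20, 70, by = 0.5)
mix <- plot_mix_comps(grid, 0, 5, 100 / 110) +
  plot_mix_comps(grid, 50, 5, 10 / 110)
```

Because each curve is scaled by its weight, the curves sum to the overall mixture density and can be drawn on the same axis as the density histogram.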
Example
Now let's consider a real-world example: the petal widths in the iris data set.
p <- ggplot(iris, aes(x = Petal.Width)) +
  geom_histogram(aes(x = Petal.Width, ..density..), binwidth = 0.1, colour = "black", fill = "white")
p
flexfit <- flexmix(Petal.Width ~ 1, data = iris, k = 3, model = list(mo1, mo2, mo3))
print(table(clusters(flexfit), iris$Species))
##
##     setosa versicolor virginica
##   1      0          2        46
##   2      0         48         4
##   3     50          0         0

ggplot(iris) +
  geom_histogram(aes(x = Petal.Width, ..density..), binwidth = 0.1, colour = "black", fill = "white") +
  stat_function(geom = "line", fun = plot_mix_comps, args = list(c1[1], c1[2], lam[1]/sum(lam)), colour = "red", lwd = 0.5) +
  stat_function(geom = "line", fun = plot_mix_comps, args = list(c2[1], c2[2], lam[2]/sum(lam)), colour = "blue", lwd = 0.5) +
  stat_function(geom = "line", fun = plot_mix_comps, args = list(c3[1], c3[2], lam[3]/sum(lam)), colour = "green", lwd = 0.5) +
  ylab("Density")
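Even without knowing which species each flower belongs to, the mixture model lets us make statements about the underlying distribution of petal widths: in the table above, the clusters agree with the species labels for 144 of the 150 flowers.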
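Since the species labels happen to be available for iris, one way to judge the fitted components is to compare them with per-species summaries computed directly from the labels. The following base-R sketch does only that comparison side; `species_stats` is an illustrative name, not something from the original code.

```r
# Reference values computed from the known labels: mean, standard
# deviation and share of each species. The fitted component means,
# sigmas and mixing weights can be compared against these.
species_stats <- data.frame(
  Species = levels(iris$Species),
  mean    = as.vector(tapply(iris$Petal.Width, iris$Species, mean)),
  sd      = as.vector(tapply(iris$Petal.Width, iris$Species, sd)),
  prop    = as.vector(table(iris$Species)) / nrow(iris)
)
print(species_stats)
```

A component whose mean and sigma sit close to one row of this table is behaving like that species, which is the kind of statement the mixture model allows when the labels are hidden.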