Original link: tecdat.cn/?p=22609

Original source: Tuodan (拓端) Data Tribe WeChat public account

 

Abstract

This article presents a set of methods for analyzing a variety of finite mixture models. It covers both traditional methods, such as the EM algorithm for univariate and multivariate normal mixtures, and some newer methods for finite mixture models. Many of the algorithms are EM algorithms or are based on EM-like ideas, so the article includes an overview of EM algorithms for finite mixture models.

 

1. Introduction to finite mixture models

Individuals in a population can often be divided into subgroups. However, even when we observe characteristics of these individuals, we may not observe their group memberships. This task is sometimes referred to in the literature as "unsupervised clustering", and in fact mixture models can generally be thought of as the basis of the subset of clustering methods known as "model-based clustering".

Finite mixture models can also be used in situations beyond those in which clustering of individuals is of interest. For one thing, a finite mixture model gives a description of entire subgroups rather than assigning individuals to those subgroups. Sometimes a finite mixture model simply provides a means of adequately describing a particular distribution, such as the distribution of the residuals in a linear regression model where outliers are present. Whatever the modeler's goal in adopting a mixture model, most of the theory of these models assumes that the subgroups follow a particular parametric form, and that form is usually univariate or multivariate normal.

Recent research aims to relax or modify the multivariate normality assumption and to develop computational techniques for analyzing finite mixture models in which the components are regressions, multinomial vectors arising from discretization of multivariate data, or even completely unspecified distributions.
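For reference in the sections that follow (the displayed equation from the original is not reproduced here), the finite mixture density referred to as equation (1) has the standard form g(x) = λ1 φ1(x) + ... + λm φm(x), with nonnegative mixing proportions λj that sum to one. A minimal R sketch, using made-up parameter values purely for illustration, evaluates and plots such a two-component normal mixture:

# Illustrative two-component normal mixture density g(x) = sum_j lambda_j * phi_j(x)
lambda <- c(0.35, 0.65)   # mixing proportions (assumed values, sum to 1)
mu     <- c(55, 80)       # component means (assumed values)
sigma  <- c(6, 6)         # component standard deviations (assumed values)
dmix <- function(x) {
  lambda[1] * dnorm(x, mu[1], sigma[1]) + lambda[2] * dnorm(x, mu[2], sigma[2])
}
curve(dmix, from = 30, to = 110, xlab = "x", ylab = "mixture density g(x)")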

2. EM algorithms for finite mixture models

 

An EM algorithm iteratively maximizes, instead of the observed log-likelihood Lx(θ), the operator

Q(θ | θ(t)) = E[ Lc(θ) | x, θ(t) ],

where Lc(θ) denotes the complete-data log-likelihood and θ(t) is the current parameter value. Iteration t + 1 consists of two steps:

1. E-step: compute Q(θ | θ(t)).
2. M-step: set θ(t+1) = argmax_{θ ∈ Φ} Q(θ | θ(t)).

For finite mixture models, the E-step does not depend on the structure of the component family, because the missing data involve only the component labels Z.

 

The Z's are discrete, and their conditional distribution is given by Bayes' theorem. The M-step itself can be divided into two parts: the maximization with respect to λ, which does not depend on the component family, and the maximization with respect to φ, which must be handled specifically for each model (parametric, semiparametric, or nonparametric). The EM algorithm for these mixture models therefore has the following common structure.

1. E-step: compute the posterior probability of component membership for each observation, conditional on the data and θ(t),

 

for all i = 1, ..., n and j = 1, ..., m. Numerically, it can be dangerous to implement formula (2) exactly as written, because the quantities λj(t) φj(t)(xi) may all underflow to zero when xi is far from every component, leaving the posterior probability in the indeterminate form 0/0. Many routines therefore use an equivalent, numerically stable expression, or some variation of it.

2. M-step for λ: set

λj(t+1) = (1/n) Σ_{i=1}^{n} pij(t),   j = 1, ..., m,

where pij(t) denotes the posterior probability computed in the E-step.
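To make these two generic steps concrete, here is a minimal numerical sketch (my own illustration, not the package's code) of one E-step and one λ M-step for a two-component univariate normal mixture; the posterior probabilities are computed from log densities to avoid the 0/0 problem noted above.

# Illustrative sketch: E-step and lambda M-step for a 2-component
# univariate normal mixture (toy data and starting values assumed).
set.seed(1)
x      <- c(rnorm(100, 55, 6), rnorm(200, 80, 6))   # toy data
lambda <- c(0.5, 0.5)                                # current lambda^(t)
mu     <- c(50, 85)                                  # current mu^(t)
sigma  <- c(10, 10)                                  # current sigma^(t)

# E-step: posterior probabilities p_ij, computed from log densities so that
# observations far from every component do not yield the form 0/0.
logdens <- sapply(1:2, function(j) log(lambda[j]) + dnorm(x, mu[j], sigma[j], log = TRUE))
p <- exp(logdens - apply(logdens, 1, max))   # subtract row maxima before exponentiating
p <- p / rowSums(p)                          # n x 2 matrix of posterior probabilities

# M-step for lambda: average posterior membership in each component.
lambda_new <- colMeans(p)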

2.3. An example of the EM algorithm

As an example, we consider a univariate normal mixture analysis of the waiting times between eruptions of the Old Faithful geyser (shown as a histogram in Figure 2 below). This fully parametric situation corresponds to the mixture of univariate Gaussians described in Section 1, where the jth component density φj(x) in (1) is normal with mean μj and variance σ²j.

The M-step for the parameters (μj, σ²j), j = 1, ..., m, of this univariate mixture is straightforward; it can be found, for example, in McLachlan and Peel (2000).
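For reference, these closed-form updates are just posterior-weighted sample means and variances. Continuing the toy sketch above (again purely illustrative, not the package's internal code):

# M-step for the normal component parameters, reusing the data x and the
# posterior matrix p from the E-step sketch above (illustration only).
mu_new    <- colSums(p * x) / colSums(p)
sigma_new <- sqrt(sapply(1:2, function(j) sum(p[, j] * (x - mu_new[j])^2) / sum(p[, j])))
cbind(lambda = lambda_new, mu = mu_new, sigma = sigma_new)   # updated parameters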

 

R> library(mixtools)   # the EM routines used below come from this package
R> data("faithful")
R> attach(faithful)
R> wait1 <- normalmixEM(waiting, lambda = .5, mu = c(55, 80), sigma = 5)   # illustrative starting values

The code above fits a two-component mixture (because mu is a vector of length two) in which the two standard deviations are assumed equal (because sigma is a scalar rather than a vector).

Figure 1: Sequence of observed log-likelihood values Lx(θ(t)).

Figure 2: Old Faithful waiting data fitted with the parametric EM algorithm, showing the fitted Gaussian components.

R> plot(wait1, density = TRUE, cex.axis = 1.4, cex.lab = 1.4, cex.main = 1.8,
+   main2 = "Time between Old Faithful eruptions", xlab2 = "Minutes")

This produces the two graphs of Figures 1 and 2: the sequence of observed log-likelihood values Lx(θ(t)) plotted against the iteration number t, and a histogram of the data with the m (here m = 2) fitted Gaussian component densities N(μ̂j, σ̂²j), j = 1, ..., m, superimposed. The parameter estimates θ̂ can then be examined.
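For instance, assuming the fit was produced by mixtools' normalmixEM() as above, the result is an object of class "mixEM" and the individual estimates can be read directly from its components:

R> wait1$lambda   # estimated mixing proportions
R> wait1$mu       # estimated component means
R> wait1$sigma    # estimated component standard deviation(s)
R> wait1$loglik   # final observed log-likelihood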

Alternatively, you can get the same output using summary.

summary(wait1)

3. Cutpoint methods

Traditionally, most of the literature on finite mixture models has assumed that the component density functions φj(x) in equation (1) come from a known parametric family. However, some authors have recently considered the problem in which φj(x) is unspecified except for some conditions necessary to ensure the identifiability of the parameters in the model. Here we use the cutpoint approach of Elmore et al. (2004), following their choice of cutpoints spaced 10.5 apart from -63 to 63. A multinomial (cutpoint) dataset is then created from the raw data, as shown below.

 

R> cutpts <- 10.5 * (-6:6)                          # 13 cutpoints from -63 to 63
R> mult_data <- makemultdata(data, cuts = cutpts)   # bin the raw data into multinomial counts

Once the multinomial data have been created, we can apply the EM algorithm to estimate the multinomial-mixture parameters. Finally, the estimated component distribution functions are computed and plotted. Figure 3 shows the 3-component and 4-component solutions; these plots are very similar to the corresponding Figures 1 and 2 of Elmore et al. (2004).
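The EM call itself is not shown in the text. A minimal sketch, assuming the cutpoint data were built with mixtools' makemultdata() as above (so the multinomial counts are in the $y component) and using the package's multmixEM() routine, might look like this:

R> em3 <- multmixEM(mult_data$y, k = 3)   # 3-component multinomial mixture
R> em4 <- multmixEM(mult_data$y, k = 4)   # 4-component solution for comparison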

 

R> # 'posterior' below denotes the matrix of posterior membership probabilities from the EM fit
R> plot(data, posterior, lwd = 2,
+   main = "Three-component solution")

Figure 3 (a)

Figure 3 (b)

You can also summarize EM output with summary.

4. A semiparametric example: univariate symmetric, location-shifted components

Under the additional assumption that φ(·) is absolutely continuous with respect to Lebesgue measure, Bordes et al. (2007) propose a stochastic algorithm for estimating the model parameters (λ, µ, φ). In this model each component density is a location shift of a common density φ that is symmetric about zero, so the mixture density is Σj λj φ(x − µj). A special case is illustrated below.

R> plot(wait1, which = 2)
R> wait2 <- spEMsymloc(waiting, mu0 = c(55, 80))   # semiparametric EM for the symmetric location model; mu0 gives illustrative starting centers
R> plot(wait2, lty = 2)

Figure 4 (a)

 

Figure 4 (b)

 

Because the semiparametric version relies on a kernel density estimation step (8), a bandwidth must be selected for that step. By default, "Silverman's rule of thumb" (Silverman 1986) is applied to the entire dataset.

R> bw.nrd0(waiting)

But the choice of bandwidth can make a big difference, as shown in Figure 4(b).

R> wait2a <- spEMsymloc(waiting, mu0 = c(55, 80), bw = 1)   # narrow bandwidth
R> wait2b <- spEMsymloc(waiting, mu0 = c(55, 80), bw = 8)   # wide bandwidth
R> plot(wait2a, lty = 1)
R> plot(wait2b, lty = 2)

We find that for bandwidths near 2, the semiparametric solution looks very similar to the normal mixture solution of Figure 2. Reducing the bandwidth further produces the bumpiness shown by the solid line in Figure 4(b). On the other hand, with a bandwidth of 8 the semiparametric solution performs poorly, because the algorithm tries to make each component density resemble the whole mixture distribution.

