Author: CHEONG
AI Machine Learning and Knowledge Graph
Research interests: natural language processing and knowledge graphs
Before reading this article, note the following two points:
1. Articles in this machine learning series often contain a large amount of formula derivation and proof. To make them easier to follow, the important conclusions of this article are given near the beginning, so that you can grasp its core ideas as quickly as possible. Read further down if you want the details of the derivations.
2. The article contains a large number of formulas. If you would like the original Word document containing the formulas, follow the public account [AI machine Learning and Knowledge Graph] and reply: GMM Second Lecture. You can also add WeChat [17865190919] to join the learning exchange group; when adding, please note that you came from Nuggets. Original work is not easy; if you reprint, please notify the author and indicate the source!
This article mainly introduces the Learning problem of the Gaussian mixture model, discusses whether to solve it with maximum likelihood estimation or with the EM algorithm, and presents the complete solution procedure.
I. Problem definition
The following table shows a Gaussian mixture model composed of $K$ Gaussian distributions $c_1, c_2, \dots, c_K$, where the latent variable $z$ indicates which Gaussian distribution a sample belongs to, and the probability that a sample belongs to the $k$-th Gaussian distribution is $p_k$:
| $z$ | 1 | 2 | … | $K$ |
|---|---|---|---|---|
| $P(z)$ | $p_1$ | $p_2$ | … | $p_K$ |
Based on the table above, we first give the probability density function $p(x)$ of the Gaussian mixture model; the formula is as follows:
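(The original formula image is not reproduced here; written out from the definitions above, it is the marginal over the latent variable:)

$$p(x) = \sum_{k=1}^{K} p(z = c_k)\, p(x \mid z = c_k)$$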
Combining this with the table data, $p(z=c_k)$ is the probability value $p_k$, and given $z=c_k$, $x$ follows a Gaussian distribution, so the probability density function of the Gaussian mixture model can be expressed as:
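(Again reconstructing the formula image in standard notation:)

$$p(x) = \sum_{k=1}^{K} p_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$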
Therefore, the problem addressed in this article is the Learning problem of the Gaussian mixture model, that is, learning the parameters of the Gaussian mixture model. The Observed Data are $X=(x_1,x_2,\dots,x_N)$, the hidden variable is $z$, and the Complete Data is $(X,Z)$. Combined with the probability density function above, the parameters to be learned are:
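(Reconstructed from the description above, the parameter set is:)

$$\Theta = \{p_1, \dots, p_K,\ \mu_1, \dots, \mu_K,\ \Sigma_1, \dots, \Sigma_K\}, \qquad \sum_{k=1}^{K} p_k = 1$$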
In the following sections, we discuss how to solve for $\Theta$.
II. Conclusions of this article
Conclusion 1: If maximum likelihood estimation (MLE) is used for the Gaussian mixture model Learning problem, no analytical solution can be obtained. Therefore, the EM algorithm is needed to obtain an approximate solution to the Gaussian mixture model Learning problem.
Conclusion 2: The EM algorithm solution process is divided into an E-Step and an M-Step. The formula derivation in the solution process is complicated, but what matters is to grasp the main purpose of the E-Step and the M-Step, rather than diving into the mass of formulas first. The idea of the EM algorithm is as follows: first, the E-Step computes the model expectation $Q(\Theta,\Theta^{(t)})$; then the M-Step maximizes this expectation; when the iteration converges, the optimal parameter value $\Theta$ is obtained.
III. Why maximum likelihood estimation cannot solve GMM
The derivation for solving the Gaussian mixture model Learning problem using maximum likelihood estimation is given below:
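(The original derivation is shown as formula images; assuming the $N$ observed samples are i.i.d., the objective it arrives at can be written as:)

$$\hat{\Theta}_{\mathrm{MLE}} = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(x_i \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \log \sum_{k=1}^{K} p_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$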
From the above derivation, the final expression contains a summation inside the log, and with a summation inside the log, no further closed-form solution can be obtained. Therefore, the Gaussian mixture model cannot be solved analytically by MLE, even though a single Gaussian distribution can be solved by MLE.
IV. Solving GMM with the EM algorithm
Important: Once again, although the derivation of the formulas below is complicated, the key is to understand what the E-Step and the M-Step are mainly doing. Getting the overall idea straight is much more important than getting stuck in the formula derivations.
Next, we explain how to use the EM algorithm to solve the Gaussian mixture model. First, the iterative formula of the EM algorithm is given:
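(Reconstructing the formula image, the standard EM update is:)

$$\Theta^{(t+1)} = \arg\max_{\Theta}\ \mathbb{E}_{Z \sim p(Z \mid X, \Theta^{(t)})}\big[\log p(X, Z \mid \Theta)\big] = \arg\max_{\Theta}\ Q(\Theta, \Theta^{(t)})$$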
The idea of the EM algorithm is as follows: first, the E-Step computes the model expectation $Q(\Theta,\Theta^{(t)})$; then the M-Step maximizes this expectation; when the iteration converges, the optimal parameter value $\Theta$ is obtained.
Before explaining how the EM algorithm solves the Gaussian mixture model, we first give the marginal probability density function $p(x)$, the joint probability density function $p(x,z)$, and the conditional probability density function $p(z \mid x)$ of the Gaussian mixture model.
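(The original formula images are not reproduced; the standard expressions, consistent with the definitions above, are:)

$$p(x) = \sum_{k=1}^{K} p_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

$$p(x, z = c_k) = p(z = c_k)\, p(x \mid z = c_k) = p_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

$$p(z = c_k \mid x) = \frac{p(x, z = c_k)}{p(x)} = \frac{p_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} p_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$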
EM algorithm: E-Step solution process
Next, we explain the E-Step of the EM algorithm for solving the Gaussian mixture model, namely computing the expectation $Q(\Theta,\Theta^{(t)})$. The derivation is relatively involved, so the result is given directly first; readers who do not want to follow the derivation can skip it. The expectation obtained through the E-Step is:
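(Reconstructing the result image in standard notation, for observed samples $x_1, \dots, x_N$:)

$$Q(\Theta, \Theta^{(t)}) = \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_i = c_k \mid x_i, \Theta^{(t)})\, \log\big[p_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\big]$$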
where $p(z \mid x)$ and $p(x,z)$ are the conditional and joint probability density functions of the Gaussian mixture model given above:
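(In particular, the conditional probability evaluated at the current parameters $\Theta^{(t)}$, which appears in $Q(\Theta,\Theta^{(t)})$, is:)

$$p(z_i = c_k \mid x_i, \Theta^{(t)}) = \frac{p_k^{(t)}\, \mathcal{N}(x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^{K} p_j^{(t)}\, \mathcal{N}(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}$$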
The derivation of the E-Step result is given below; readers who do not need to follow the more involved steps can skip them.
As we can see, the formula above is very complicated, so we now need to simplify it. Let us first take one term out of the summation and simplify it:
The simplification process of the above formula is shown in the figure below:
After simplification, the derivation continues as follows:
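(A condensed sketch of this step, since the original derivation is shown as images: the expectation is taken under $\prod_{j=1}^{N} p(z_j \mid x_j, \Theta^{(t)})$, and within each term of the sum the factors with $j \neq i$ marginalize to one, so that:)

$$Q(\Theta, \Theta^{(t)}) = \sum_{i=1}^{N} \sum_{z_i} p(z_i \mid x_i, \Theta^{(t)})\, \log p(x_i, z_i \mid \Theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_i = c_k \mid x_i, \Theta^{(t)})\, \log\big[p_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\big]$$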
So far, we have obtained the expectation $Q(\Theta,\Theta^{(t)})$ of the Gaussian mixture model through the E-Step.
EM algorithm: M-Step solution process
In the E-Step we have already calculated the expectation $Q(\Theta,\Theta^{(t)})$.
We replace $z_i$ with the index $k = 1, 2, \dots, K$, and keep the conditional probability in the following form without expanding it:
Then the M-Step solves for the parameter values by maximizing the expectation $Q(\Theta,\Theta^{(t)})$:
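(Reconstructing the formula image:)

$$\Theta^{(t+1)} = \arg\max_{\Theta}\ Q(\Theta, \Theta^{(t)}) = \arg\max_{\Theta} \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_i = c_k \mid x_i, \Theta^{(t)})\, \log\big[p_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\big]$$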
Here we take the parameters $p_1, p_2, \dots, p_K$ as an example to explain the solution process, using the Lagrange multiplier method.
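(Keeping only the terms of $Q(\Theta,\Theta^{(t)})$ that involve the mixing weights, the constrained problem to solve is, in standard notation:)

$$p^{(t+1)} = \arg\max_{p}\ \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_i = c_k \mid x_i, \Theta^{(t)})\, \log p_k \qquad \text{s.t.}\ \sum_{k=1}^{K} p_k = 1$$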
To find the maximum of this constrained problem, the Lagrange multiplier method is used as follows:
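(The Lagrangian, reconstructed from the problem above, is:)

$$\mathcal{L}(p, \lambda) = \sum_{i=1}^{N} \sum_{k=1}^{K} p(z_i = c_k \mid x_i, \Theta^{(t)})\, \log p_k + \lambda \Big(\sum_{k=1}^{K} p_k - 1\Big)$$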
Taking $k = 1, 2, \dots, K$ in turn, we obtain:
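(Setting the partial derivative of the Lagrangian with respect to each $p_k$ to zero, this presumably takes the standard form:)

$$\frac{\partial \mathcal{L}(p, \lambda)}{\partial p_k} = \sum_{i=1}^{N} \frac{p(z_i = c_k \mid x_i, \Theta^{(t)})}{p_k} + \lambda = 0, \qquad k = 1, 2, \dots, K$$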
where:
So we can get:
At this point, the following two expressions are combined:
The final solution is:
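(The standard M-Step update for the mixing weights, which the combination above yields, is:)

$$p_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} p(z_i = c_k \mid x_i, \Theta^{(t)}), \qquad k = 1, 2, \dots, K$$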
The other two groups of parameters, $\mu^{(t+1)}$ and $\Sigma^{(t+1)}$, are solved in the same way as $p^{(t+1)}$.
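To make the E-Step/M-Step procedure concrete, here is a minimal NumPy/SciPy sketch of the EM iteration for a GMM. It is not from the original article: the function and variable names are illustrative, and the $\mu$ and $\Sigma$ updates use the standard textbook results rather than the derivation above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """Minimal EM sketch for a Gaussian mixture model on an (N, d) data matrix X."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialization: uniform weights, means picked from the data, identity covariances
    p = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    Sigma = np.stack([np.eye(d) for _ in range(K)])

    for _ in range(n_iter):
        # E-Step: responsibilities gamma[i, k] = p(z_i = c_k | x_i, Theta^(t))
        dens = np.column_stack([
            p[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])                                   # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-Step: maximize Q(Theta, Theta^(t))
        Nk = gamma.sum(axis=0)               # effective number of samples per component
        p = Nk / N                           # p_k^(t+1) = (1/N) * sum_i gamma[i, k]
        mu = (gamma.T @ X) / Nk[:, None]     # weighted sample means
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(d)     # small ridge for numerical stability

    return p, mu, Sigma
```

For example, calling `em_gmm(X, K=3)` on an `(N, d)` data matrix returns the estimated weights, means, and covariances after the fixed number of iterations; in practice one would also monitor the log-likelihood for convergence.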