After the previous two articles, we have finally arrived at Gibbs sampling. Why am I so excited? Because Gibbs sampling is the method I used to solve the parameter estimation problem of the LDA model.

Since Toutiao's Markdown support is not great, you can follow this link to read the original: https://www.zybuluo.com/zhuanxu/note/1027270.

All right, let’s start with Gibbs sampling.

Gibbs sampling

In a previous article, I introduced the detailed balance condition, namely:
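$$\pi(x)\,P(x \to x') = \pi(x')\,P(x' \to x)$$

That is, a Markov chain whose transition kernel $P$ satisfies this condition with respect to a distribution $\pi$ has $\pi$ as its stationary distribution.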

Gibbs sampling algorithm is as follows:
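  1. Initialize $x^{(0)} = (x_1^{(0)}, \dots, x_D^{(0)})$.

  2. For $t = 0, 1, 2, \dots$ and for each component $k = 1, \dots, D$, sample

$$x_k^{(t+1)} \sim p\left(x_k \mid x_1^{(t+1)}, \dots, x_{k-1}^{(t+1)}, x_{k+1}^{(t)}, \dots, x_D^{(t)}\right)$$

In other words, each component is resampled from its full conditional given the latest values of all the other components.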

We now show that the Gibbs sampling algorithm also satisfies the detailed balance condition.

Assume $x = (x_1, \dots, x_D)$. When we sample the $k$-th component, the proposal distribution is the full conditional $Q(x' \mid x) = p(x'_k \mid x_{-k})$, where $x_{-k}$ denotes all components except the $k$-th.

At this point, our acceptance rate is:
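$$\alpha(x \to x') = \min\left(1,\ \frac{\pi(x')\,Q(x \mid x')}{\pi(x)\,Q(x' \mid x)}\right) = \min\left(1,\ \frac{p(x'_k \mid x_{-k})\,p(x_{-k})\,p(x_k \mid x_{-k})}{p(x_k \mid x_{-k})\,p(x_{-k})\,p(x'_k \mid x_{-k})}\right) = 1$$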

A key part of the formula above is:
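$$\pi(x) = p(x_k \mid x_{-k})\,p(x_{-k}), \qquad x'_{-k} = x_{-k}$$

Since only the $k$-th component changes in a Gibbs move, the $p(x_{-k})$ factors and the two conditionals cancel exactly, which is why the ratio is always 1.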

In other words, Gibbs sampling is an MH sampler whose proposals are always accepted.

Now let's work through a concrete example.

example

Suppose we have data $x_1, x_2, \dots, x_N$, where observations $1$ through $n$ follow one Poisson distribution and observations $n+1$ through $N$ follow another Poisson distribution.

The parameters of the two Poisson distributions in turn follow Gamma distributions. Summarizing the current prior assumptions:
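$$x_i \sim \text{Poisson}(\lambda_1)\ \ (i \le n), \qquad x_i \sim \text{Poisson}(\lambda_2)\ \ (i > n)$$

$$\lambda_1, \lambda_2 \sim \text{Gamma}(a, b), \qquad n \sim \text{Uniform}\{1, \dots, N\}$$

(Shared Gamma$(a, b)$ priors for both rates and a uniform prior over the changepoint $n$ are assumed here; this is the standard setup for this classic example.)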

At this point, the posterior probability is:
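$$p(\lambda_1, \lambda_2, n \mid x) \propto \prod_{i=1}^{n} \text{Poisson}(x_i \mid \lambda_1)\ \prod_{i=n+1}^{N} \text{Poisson}(x_i \mid \lambda_2)\ p(\lambda_1)\,p(\lambda_2)\,p(n)$$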

Let's start our sampling by first calculating $\log P$:
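$$\log P = \sum_{i=1}^{n}\left(x_i \log \lambda_1 - \lambda_1\right) + \sum_{i=n+1}^{N}\left(x_i \log \lambda_2 - \lambda_2\right) + (a-1)\log \lambda_1 - b\lambda_1 + (a-1)\log \lambda_2 - b\lambda_2 + \text{const}$$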

Then we sample each parameter in turn from its full conditional distribution.

We only keep the terms related to the parameter currently being sampled, because the rest are just normalizing constants and do not affect the sampling.

For $\lambda_1$ and $\lambda_2$, the full conditionals are both Gamma distributions:
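$$\lambda_1 \mid x, n \sim \text{Gamma}\left(a + \sum_{i=1}^{n} x_i,\ b + n\right), \qquad \lambda_2 \mid x, n \sim \text{Gamma}\left(a + \sum_{i=n+1}^{N} x_i,\ b + N - n\right)$$

This is the usual Gamma–Poisson conjugacy: the shape gains the sum of the counts and the rate gains the number of observations.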

The last parameter is $n$:
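$$p(n \mid x, \lambda_1, \lambda_2) \propto \prod_{i=1}^{n} \text{Poisson}(x_i \mid \lambda_1)\ \prod_{i=n+1}^{N} \text{Poisson}(x_i \mid \lambda_2)$$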

This conditional for $n$ is not a named distribution, but since $n$ takes one of finitely many values we can simply treat it as a multinomial: evaluate the unnormalized log probability at every candidate changepoint, normalize, and sample.

Here is the key Python code:
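A minimal sketch of such a sampler, assuming the Gamma$(a, b)$ priors above (the function name `gibbs_changepoint`, the default hyperparameters, and the toy data are illustrative, not the original code):

```python
import numpy as np
from scipy import stats

def gibbs_changepoint(x, a=2.0, b=1.0, n_iter=5000, seed=0):
    """Gibbs sampler for the two-Poisson changepoint model.

    Assumed priors: lambda1, lambda2 ~ Gamma(a, b); n ~ Uniform over changepoints.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    N = len(x)
    n = N // 2                        # initial changepoint guess
    samples = []
    for _ in range(n_iter):
        # lambda1 | x, n ~ Gamma(a + sum(x[:n]), b + n)   (shape, rate)
        lam1 = rng.gamma(a + x[:n].sum(), 1.0 / (b + n))
        # lambda2 | x, n ~ Gamma(a + sum(x[n:]), b + N - n)
        lam2 = rng.gamma(a + x[n:].sum(), 1.0 / (b + N - n))
        # n | x, lambda1, lambda2: categorical over all candidate changepoints
        candidates = np.arange(1, N)
        logp = np.array([stats.poisson.logpmf(x[:m], lam1).sum()
                         + stats.poisson.logpmf(x[m:], lam2).sum()
                         for m in candidates])
        p = np.exp(logp - logp.max())  # subtract the max for numerical stability
        n = rng.choice(candidates, p=p / p.sum())
        samples.append((lam1, lam2, n))
    return np.array(samples)

# Toy usage: true changepoint at 60, rates 3 and 8
x = np.concatenate([np.random.poisson(3, 60), np.random.poisson(8, 40)])
trace = gibbs_changepoint(x)
```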

See Gibbs for the full code.

The output after sampling is as follows:

LDA

Now let's look at the most exciting part, the LDA model, and see how it can be solved with Gibbs sampling.

First, look at the LDA model:

The whole generative process can be described as:

  1. For document $m$, sample its topic distribution from the Dirichlet prior, i.e. $\theta_m \sim \text{Dir}(\alpha)$

  2. For each word position of the document, sample a topic $k$ from the multinomial distribution, i.e. $z_{mj} \sim \text{Multinomial}(\theta_m)$

  3. For topic $k$, sample its word distribution from the Dirichlet prior, i.e. $\beta_k \sim \text{Dir}(\eta)$

  4. For each word position, sample the concrete word $t$ from its topic's word distribution, i.e. $w_{mj} \sim \text{Multinomial}(\beta_{z_{mj}})$

In the whole model above, the parameters are: the topic distribution $\theta_m$ of each document, the word distribution $\beta_k$ of each topic, and the topic assignment $z_{mj}$ of each word.

In order to do Gibbs sampling, we first write down the joint distribution of these parameters:
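$$p(w, z, \theta, \beta \mid \alpha, \eta) = \prod_{k=1}^{K} p(\beta_k \mid \eta)\ \prod_{m=1}^{M} p(\theta_m \mid \alpha) \prod_{j=1}^{N_m} p(z_{mj} \mid \theta_m)\, p(w_{mj} \mid \beta_{z_{mj}})$$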

The above formula contains not only multinomial distributions but also Dirichlet distributions, and we know that the Dirichlet and the multinomial are conjugate, so the posterior distributions are easy to write down.

Now we calculate the conditional probability of each parameter, mainly the following three:
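$$p(\theta_m \mid z), \qquad p(\beta_k \mid w, z), \qquad p(z_{mj} \mid w, \theta, \beta)$$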

First, let's calculate the conditional distribution of each document's topic distribution:
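$$\theta_m \mid z \sim \text{Dir}\left(\alpha + n_m\right), \qquad n_m = \left(n_m^{(1)}, \dots, n_m^{(K)}\right)$$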

where $n_m^{(k)}$ represents the number of words in the $m$-th document that belong to topic $k$.

Next, the word distribution of each topic is calculated:
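$$\beta_k \mid w, z \sim \text{Dir}\left(\eta + n_k\right), \qquad n_k = \left(n_k^{(1)}, \dots, n_k^{(V)}\right)$$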

where $n_k^{(v)}$ counts how many times word $v$ (over the vocabulary $v = 1, \dots, V$) is assigned to topic $k$. Finally, we estimate the topic of each word.

We can see that $p(z_{mj} = k)$ is a multinomial distribution whose class probabilities are proportional to $\theta_{mk} \cdot \beta_{k, w_{mj}}$, and $\theta$ and $\beta$ themselves would need to be sampled from their Dirichlet distributions. A natural idea is to use point estimates instead, and from the Dirichlet distribution we can easily compute them.
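$$p(z_{mj} = k \mid \theta_m, \beta) \propto \theta_{mk}\,\beta_{k, w_{mj}}$$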

The mathematical background for this can be found in my earlier Toutiao article on topic models: Mathematical Basis of LDA.

One of the key points there is the Dirichlet distribution and its expectation:
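$$X \sim \text{Dir}(\alpha_1, \dots, \alpha_K) \implies E[X_k] = \frac{\alpha_k}{\sum_{k'} \alpha_{k'}}$$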

Applying this to the Gibbs sampling above, at each step we can use the expectation of the Dirichlet posterior as the new parameter value:
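$$\hat{\theta}_{mk} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k'}\left(n_m^{(k')} + \alpha_{k'}\right)}, \qquad \hat{\beta}_{kv} = \frac{n_k^{(v)} + \eta_v}{\sum_{v'}\left(n_k^{(v')} + \eta_{v'}\right)}$$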

coding

After introducing the math basics, we can see how to implement it. Here is the pseudocode.
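A minimal runnable Python sketch of that pseudocode, using the plug-in Dirichlet expectations described above (the function name `lda_gibbs` and the symmetric priors `alpha`, `eta` are illustrative):

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, eta=0.01, n_iter=100, seed=0):
    """Gibbs sampler for LDA using Dirichlet-expectation plug-in estimates.

    docs: list of documents, each a list of word ids in [0, V).
    """
    rng = np.random.default_rng(seed)
    M = len(docs)
    n_mk = np.zeros((M, K))   # words in document m assigned to topic k
    n_kv = np.zeros((K, V))   # times word v is assigned to topic k
    n_k = np.zeros(K)         # total words assigned to topic k
    z = []
    # Random initialization of all topic assignments
    for m, doc in enumerate(docs):
        z_m = rng.integers(0, K, size=len(doc))
        z.append(z_m)
        for w, k in zip(doc, z_m):
            n_mk[m, k] += 1
            n_kv[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for j, w in enumerate(doc):
                k = z[m][j]
                # Remove the current word's assignment from the counts
                n_mk[m, k] -= 1; n_kv[k, w] -= 1; n_k[k] -= 1
                # p(z = k) ∝ theta_hat[k] * beta_hat[k] (Dirichlet expectations)
                theta_hat = n_mk[m] + alpha
                beta_hat = (n_kv[:, w] + eta) / (n_k + V * eta)
                p = theta_hat * beta_hat
                k = rng.choice(K, p=p / p.sum())
                # Record the new assignment and restore the counts
                z[m][j] = k
                n_mk[m, k] += 1; n_kv[k, w] += 1; n_k[k] += 1
    return n_mk, n_kv, z

# Toy usage: 3 documents over a 5-word vocabulary, 2 topics
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 4, 2]]
n_mk, n_kv, z = lda_gibbs(docs, K=2, V=5)
```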

The complete code can be found in the article: A Slightly More Advanced Introduction to the Topic Model LDA (small improvements + source code attached).

PyMC3 implementation

Let's also implement it with PyMC3.
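A minimal sketch of what the PyMC3 model can look like (the toy corpus and variable names are illustrative stand-ins; see the link below for the full version):

```python
import numpy as np
import pymc3 as pm

# Illustrative toy corpus: 3 documents over a 5-word vocabulary, 2 topics
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 4, 2]]
M, K, V = len(docs), 2, 5

with pm.Model() as lda_model:
    # Document-topic and topic-word distributions with Dirichlet priors
    theta = pm.Dirichlet("theta", a=0.1 * np.ones((M, K)), shape=(M, K))
    beta = pm.Dirichlet("beta", a=0.01 * np.ones((K, V)), shape=(K, V))
    for m, doc in enumerate(docs):
        # Topic assignment of each word, then the observed word itself
        z = pm.Categorical(f"z_{m}", p=theta[m], shape=len(doc))
        pm.Categorical(f"w_{m}", p=beta[z], observed=doc)
    # Discrete latents force Metropolis/Gibbs-style steps, hence the slowness
    trace = pm.sample(1000)
```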

The PyMC3 code is really neat, but it's just too slow. For the full code, see Gibbs-lda.

conclusion

This article introduced Gibbs sampling as a special case of the MH algorithm, explained why Gibbs sampling works, and finally used Gibbs sampling to solve the parameter estimation problem of LDA.

Your encouragement is my motivation to continue writing, and I look forward to our common progress.