
Today we will explore a method for comparing two probability distributions called Kullback-Leibler divergence (often referred to simply as KL divergence). Let's start with the formula:


$$D_{KL}(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

$$D_{KL}(P\|Q) = \int P(x) \log \frac{P(x)}{Q(x)}\, dx$$

The formula compares the predicted probability $Q(i)$ with the target probability $P(i)$ for each outcome. KL divergence is asymmetric ($D_{KL}(P\|Q) \neq D_{KL}(Q\|P)$ in general) and non-negative.
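As a quick illustration, here is a minimal sketch (assuming NumPy, with two example distributions made up for demonstration) of computing the discrete KL divergence directly from the sum above:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) = sum_i p_i * log(p_i / q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms where p_i == 0 contribute 0 by convention (0 * log 0 = 0).
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example distributions (made up for illustration)
p = [0.5, 0.5]   # target distribution P
q = [0.7, 0.3]   # predicted distribution Q

print(kl_divergence(p, q))  # D_KL(P || Q) ~ 0.087
print(kl_divergence(q, p))  # D_KL(Q || P) ~ 0.082 -- a different value: KL is asymmetric
```

Printing both directions makes the asymmetry visible: swapping $P$ and $Q$ generally gives a different number.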

For the coin-toss problem, suppose that out of $N$ flips we observe $N_H$ heads and $N_T$ tails. Consider two probability distributions: $P$ (the real coin), where the probability of heads is $p_1$ and the probability of tails is $p_2$, and the predicted distribution $Q$ (coin 1), where the probability of heads is $q_1$ and the probability of tails is $q_2$.


$$P(\text{Observations} \mid \text{real coin}) = p_1^{N_H} p_2^{N_T}$$

$$P(\text{Observations} \mid \text{coin 1}) = q_1^{N_H} q_2^{N_T}$$

These are the likelihoods: the probability, under distribution $P$ and under distribution $Q$ respectively, of observing $N_H$ heads and $N_T$ tails in $N$ flips.
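To make this concrete, here is a small sketch (with hypothetical counts $N_H$, $N_T$ and made-up coin probabilities) that evaluates the two likelihoods in log space, which avoids numerical underflow for large $N$:

```python
import numpy as np

# Hypothetical observation: 100 flips, 55 heads, 45 tails
N_H, N_T = 55, 45

p1, p2 = 0.5, 0.5   # real coin P
q1, q2 = 0.7, 0.3   # predicted coin Q (coin 1)

# log P(Observations | real coin) = N_H*log(p1) + N_T*log(p2)
log_lik_P = N_H * np.log(p1) + N_T * np.log(p2)
# log P(Observations | coin 1)   = N_H*log(q1) + N_T*log(q2)
log_lik_Q = N_H * np.log(q1) + N_T * np.log(q2)

print(log_lik_P, log_lik_Q)
```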


$$\log\left(\frac{p_1^{N_H} p_2^{N_T}}{q_1^{N_H} q_2^{N_T}}\right)^{\frac{1}{N}}$$

The likelihood ratio above is normalized by raising it to the power $\frac{1}{N}$ before taking the log, which is the same as dividing the log-ratio by $N$. Expanding the log gives:


$$\frac{N_H}{N} \log p_1 + \frac{N_T}{N} \log p_2 - \frac{N_H}{N} \log q_1 - \frac{N_T}{N} \log q_2$$

Since the flips come from the real coin, as $N$ grows the observed frequencies converge to the true probabilities: $\frac{N_H}{N} \to p_1$ and $\frac{N_T}{N} \to p_2$. Substituting:

$$p_1 \log p_1 + p_2 \log p_2 - p_1 \log q_1 - p_2 \log q_2$$

$$= p_1 \log \frac{p_1}{q_1} + p_2 \log \frac{p_2}{q_2}$$

$$= \sum_i p_i \log \frac{p_i}{q_i}$$

which is exactly $D_{KL}(P\|Q)$.

That’s the whole derivation.
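As a sanity check, here is a sketch (again with made-up coin probabilities) that simulates flips of the real coin and shows the normalized log-likelihood ratio approaching $D_{KL}(P\|Q)$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

p1, p2 = 0.5, 0.5   # real coin P
q1, q2 = 0.7, 0.3   # predicted coin Q (coin 1)

kl_pq = p1 * np.log(p1 / q1) + p2 * np.log(p2 / q2)

for N in (100, 10_000, 1_000_000):
    # Flip the real coin N times; count heads and tails
    N_H = rng.binomial(N, p1)
    N_T = N - N_H
    # (1/N) * log( p1^N_H * p2^N_T / (q1^N_H * q2^N_T) )
    ratio = (N_H * np.log(p1 / q1) + N_T * np.log(p2 / q2)) / N
    print(f"N={N:>9}: normalized log ratio = {ratio:.4f}")

print(f"D_KL(P||Q)             = {kl_pq:.4f}")
```

As $N$ increases, the printed ratio settles near the analytic KL divergence, mirroring the limit taken in the derivation above.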