Exercise 11.2 - EM for mixture of Gaussians

Answers

We are to optimize the following target w.r.t. $\theta$:

$$
\begin{aligned}
Q(\theta, \theta^{\text{old}})
&= \mathbb{E}_{p(\mathbf{z} \mid \mathcal{D}, \theta^{\text{old}})}\left[\sum_{n=1}^{N} \log p(\mathbf{x}_n, \mathbf{z}_n \mid \theta)\right] \\
&= \sum_{n=1}^{N} \mathbb{E}_{p(\mathbf{z} \mid \mathcal{D}, \theta^{\text{old}})}\left[\log \prod_{k=1}^{K} \left(\pi_k\, p(\mathbf{x}_n \mid z_k, \theta)\right)^{z_{nk}}\right] \\
&= \sum_{n=1}^{N} \sum_{k=1}^{K} \mathbb{E}_{p(\mathbf{z} \mid \mathcal{D}, \theta^{\text{old}})}\left[z_{nk} \log\left(\pi_k\, p(\mathbf{x}_n \mid z_k, \theta)\right)\right] \\
&= \sum_{n=1}^{N} \sum_{k=1}^{K} \mathbb{E}_{p(\mathbf{z} \mid \mathcal{D}, \theta^{\text{old}})}\left[z_{nk}\right] \log\left(\pi_k\, p(\mathbf{x}_n \mid z_k, \theta)\right) \\
&= \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \log\left(\pi_k\, p(\mathbf{x}_n \mid z_k, \theta)\right),
\end{aligned}
$$

where:

$$
r_{nk} = p(z_{nk} = 1 \mid \mathbf{x}_n, \theta^{\text{old}}).
$$
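In practice the responsibilities are computed with Bayes' rule,

$$
r_{nk} = \frac{\pi_k^{\text{old}}\, \mathcal{N}(\mathbf{x}_n \mid \mu_k^{\text{old}}, \Sigma_k^{\text{old}})}{\sum_{j=1}^{K} \pi_j^{\text{old}}\, \mathcal{N}(\mathbf{x}_n \mid \mu_j^{\text{old}}, \Sigma_j^{\text{old}})}.
$$

A minimal sketch of this E-step in NumPy/SciPy (the function name `e_step` and the array shapes are illustrative, not part of the exercise):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, Sigmas):
    """Responsibilities r[n, k] = p(z_nk = 1 | x_n, theta_old).

    Assumed shapes: X (N, D), pis (K,), mus (K, D), Sigmas (K, D, D).
    """
    N, K = X.shape[0], len(pis)
    r = np.empty((N, K))
    for k in range(K):
        # Unnormalized posterior: mixing weight times Gaussian likelihood.
        r[:, k] = pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
    return r / r.sum(axis=1, keepdims=True)  # normalize over components
```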

(Recall the graphical structure of the GMM: $\mathbf{z}_n$ is a one-hot variable encoding which mixture component sample $\mathbf{x}_n$ belongs to, which is why the complete-data likelihood above factorizes as $p(\mathbf{x}_n, \mathbf{z}_n \mid \theta) = \prod_{k=1}^{K} \left(\pi_k\, p(\mathbf{x}_n \mid z_k, \theta)\right)^{z_{nk}}$.) When the base distribution $p(\mathbf{x} \mid \mathbf{z}, \theta)$ is Gaussian, consider first the terms involving $\mu_k$ and $\Sigma_k$ in $Q(\theta, \theta^{\text{old}})$ (adopting a non-informative prior):

$$
\begin{aligned}
\sum_{n=1}^{N} r_{nk} \log p(\mathbf{x}_n \mid z_k = 1, \theta)
&= \sum_{n=1}^{N} r_{nk} \log \mathcal{N}(\mathbf{x}_n \mid \mu_k, \Sigma_k) \\
&= \sum_{n=1}^{N} r_{nk} \left(C - \tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(\mathbf{x}_n - \mu_k)^{T} \Sigma_k^{-1} (\mathbf{x}_n - \mu_k)\right) \\
&= L(\mu_k, \Sigma_k).
\end{aligned}
$$
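The gradients below use standard matrix-calculus identities (stated here for convenience; they are not part of the original solution but can be found in, e.g., the Matrix Cookbook). For symmetric $\Lambda$:

$$
\frac{\partial}{\partial \mu}(\mathbf{x} - \mu)^{T} \Lambda (\mathbf{x} - \mu) = -2 \Lambda (\mathbf{x} - \mu),
\qquad
\frac{\partial}{\partial \Lambda} \log |\Lambda| = \Lambda^{-1},
\qquad
\frac{\partial}{\partial \Lambda} \operatorname{tr}(\Lambda A) = A^{T},
$$

together with $\log|\Sigma_k| = -\log|\Lambda_k|$ and the trace trick $(\mathbf{x} - \mu)^{T} \Lambda (\mathbf{x} - \mu) = \operatorname{tr}\!\left(\Lambda (\mathbf{x} - \mu)(\mathbf{x} - \mu)^{T}\right)$.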

Optimizing $L(\mu_k, \Sigma_k)$ w.r.t. $\mu_k$ and $\Sigma_k$ is tantamount to fitting the mean and covariance of a weighted Gaussian model, hence:

$$
\frac{\partial L}{\partial \mu_k} = \sum_{n=1}^{N} r_{nk}\, \Sigma_k^{-1} (\mathbf{x}_n - \mu_k).
$$

Since $\Sigma_k^{-1}$ is positive definite (hence invertible), setting this gradient to zero yields:

$$
\mu_k = \frac{\sum_{n=1}^{N} r_{nk}\, \mathbf{x}_n}{\sum_{n=1}^{N} r_{nk}}.
$$
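This is simply a responsibility-weighted average of the data. For instance, reusing the illustrative arrays from the E-step sketch above:

```python
# Weighted-mean update for component k; equivalent to the formula above.
mu_k = np.average(X, axis=0, weights=r[:, k])
```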

Finally, applying the trace trick to the quadratic form and differentiating w.r.t. the precision matrix $\Lambda_k$:

$$
\frac{\partial L}{\partial \Lambda_k} = \sum_{n=1}^{N} r_{nk} \left(\frac{1}{2} \Lambda_k^{-1} - \frac{1}{2} (\mathbf{x}_n - \mu_k)(\mathbf{x}_n - \mu_k)^{T}\right),
$$

where $\Lambda_k = \Sigma_k^{-1}$. Setting it to zero yields:

$$
\Sigma_k = \Lambda_k^{-1} = \frac{\sum_{n=1}^{N} r_{nk}\, (\mathbf{x}_n - \mu_k)(\mathbf{x}_n - \mu_k)^{T}}{\sum_{n=1}^{N} r_{nk}}.
$$

So far we have proven (11.114) and (11.115).
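Putting the two updates together, a minimal M-step sketch in NumPy (a hypothetical companion to the `e_step` function above; the mixing-weight update $\pi_k = \frac{1}{N}\sum_{n} r_{nk}$ is the standard one but is not derived in this excerpt):

```python
def m_step(X, r):
    """Update means and covariances from responsibilities r of shape (N, K),
    implementing the two closed-form updates derived above."""
    N, D = X.shape
    K = r.shape[1]
    Nk = r.sum(axis=0)                    # effective counts, shape (K,)
    mus = (r.T @ X) / Nk[:, None]         # weighted means, shape (K, D)
    Sigmas = np.empty((K, D, D))
    for k in range(K):
        Xc = X - mus[k]                   # center data at the new mean
        # Weighted outer products: sum_n r_nk (x_n - mu_k)(x_n - mu_k)^T / N_k
        Sigmas[k] = (r[:, k, None] * Xc).T @ Xc / Nk[k]
    return mus, Sigmas
```

Alternating `e_step` and `m_step` (together with the mixing-weight update) then gives one full EM iteration for the GMM.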
