Exercise 24.2 - Gibbs sampling for a 1D Gaussian mixture model

Answers

The joint distribution for this model is (assume that there are $K$ Gaussian components; we treat the general multivariate normal case):

$$
\begin{aligned}
p(\mathbf{X}, \mathbf{Z}, \pi, \mu, \Sigma) &= p(\pi)\left(\prod_{k=1}^K p(\mu_k)\,p(\Sigma_k)\right)\left(\prod_{n=1}^N p(\mathbf{z}_n \mid \pi)\,p(\mathbf{x}_n \mid \mathbf{z}_n, \mu, \Sigma)\right) \\
&= \frac{1}{B(\alpha)} \prod_{k=1}^K \pi_k^{\alpha_k - 1} \prod_{k=1}^K \mathcal{N}(\mu_k \mid \mathbf{m}_0, \mathbf{V}_0)\,\mathrm{IW}(\Sigma_k \mid \mathbf{S}_0, \nu_0) \prod_{n=1}^N \prod_{k=1}^K \bigl(\pi_k\,\mathcal{N}(\mathbf{x}_n \mid \mu_k, \Sigma_k)\bigr)^{z_{nk}}.
\end{aligned}
$$

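Before deriving the updates, it can help to see the joint written out in code. The following is a minimal sketch (the function and variable names are my own, not from the text), assuming $\mathbf{X}$ is an $(N, D)$ array and $\mathbf{Z}$ is one-hot with shape $(N, K)$; note that SciPy's `invwishart` takes as `scale` the matrix inside the trace of the exponent, which is $\mathbf{S}_0^{-1}$ in the parameterization used here.

```python
import numpy as np
from scipy.stats import dirichlet, invwishart, multivariate_normal

def log_joint(X, Z, pi, mu, Sigma, alpha, m0, V0, S0, nu0):
    """Log of p(X, Z, pi, mu, Sigma) for a K-component GMM."""
    lp = dirichlet.logpdf(pi, alpha)                                  # log p(pi)
    for k in range(len(pi)):
        lp += multivariate_normal.logpdf(mu[k], mean=m0, cov=V0)      # log p(mu_k)
        # SciPy's scale is the trace-side matrix, i.e. S0^{-1} here.
        lp += invwishart.logpdf(Sigma[k], df=nu0, scale=np.linalg.inv(S0))
        log_lik_k = multivariate_normal.logpdf(X, mean=mu[k], cov=Sigma[k])
        lp += np.sum(Z[:, k] * (np.log(pi[k]) + log_lik_k))           # log p(z_n) p(x_n | z_n)
    return lp
```
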
For Gibbs sampling, we derive the full conditional of each of $\mathbf{Z}$, $\pi$, $\mu$, and $\Sigma$ given all the other variables.

For the latent variables:

$$
p(z_{nk} = 1 \mid \mathbf{X}, \pi, \mu, \Sigma) = \frac{p(z_{nk} = 1, \mathbf{X}, \pi, \mu, \Sigma)}{p(\mathbf{X}, \pi, \mu, \Sigma)} \propto p(z_{nk} = 1 \mid \pi)\,p(\mathbf{x}_n \mid z_{nk} = 1, \mu, \Sigma) = \pi_k\,\mathcal{N}(\mathbf{x}_n \mid \mu_k, \Sigma_k).
$$

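In code, this update computes the responsibilities and draws each $\mathbf{z}_n$ from the resulting categorical distribution. A minimal sketch under the same assumptions as above (illustrative names; `rng` is a `numpy.random.Generator`):

```python
import numpy as np
from scipy.stats import multivariate_normal

def sample_Z(X, pi, mu, Sigma, rng):
    """Sample z_n ~ Cat(r_n) with r_nk proportional to pi_k N(x_n | mu_k, Sigma_k)."""
    N, K = X.shape[0], pi.shape[0]
    r = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
        for k in range(K)
    ])
    r /= r.sum(axis=1, keepdims=True)      # normalize responsibilities row-wise
    return np.array([rng.multinomial(1, r[n]) for n in range(N)])  # one-hot (N, K)
```
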
For the weights:

$$
p(\pi \mid \mathbf{X}, \mathbf{Z}, \mu, \Sigma) = \frac{p(\pi, \mathbf{X}, \mathbf{Z}, \mu, \Sigma)}{p(\mathbf{X}, \mathbf{Z}, \mu, \Sigma)} \propto p(\pi)\,p(\mathbf{Z} \mid \pi) \propto \left(\prod_{k=1}^K \pi_k^{\alpha_k - 1}\right)\left(\prod_{n=1}^N \prod_{k=1}^K \pi_k^{z_{nk}}\right) = \prod_{k=1}^K \pi_k^{\alpha_k + \left(\sum_{n=1}^N z_{nk}\right) - 1}.
$$

Hence the conditional distribution for $\pi$ is $\mathrm{Dir}\bigl(\alpha_1 + \sum_n z_{n1}, \ldots, \alpha_K + \sum_n z_{nK}\bigr)$, which is exactly (24.11).

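The corresponding update is a one-liner, under the same assumptions as above:

```python
import numpy as np

def sample_pi(Z, alpha, rng):
    """Sample pi ~ Dir(alpha_1 + N_1, ..., alpha_K + N_K), where N_k = sum_n z_nk."""
    return rng.dirichlet(alpha + Z.sum(axis=0))
```
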
For the means:

$$
p(\mu_k \mid \mathbf{X}, \mathbf{Z}, \pi, \mu_{-k}, \Sigma) = \frac{p(\mu_k, \mathbf{X}, \mathbf{Z}, \pi, \mu_{-k}, \Sigma)}{p(\mathbf{X}, \mathbf{Z}, \pi, \mu_{-k}, \Sigma)} \propto p(\mu_k)\,p(\mathbf{X} \mid \mu_k, \mathbf{Z}, \Sigma, \mu_{-k}) \propto \mathcal{N}(\mu_k \mid \mathbf{m}_0, \mathbf{V}_0) \prod_{n=1}^N \bigl(\pi_k\,\mathcal{N}(\mathbf{x}_n \mid \mu_k, \Sigma_k)\bigr)^{z_{nk}}.
$$

At this point it is easier to work with the logarithm:

$$
\log p(\mu_k \mid \mathbf{X}, \mathbf{Z}, \pi, \mu_{-k}, \Sigma) = -\frac{1}{2}(\mu_k - \mathbf{m}_0)^T \mathbf{V}_0^{-1} (\mu_k - \mathbf{m}_0) - \sum_{n=1}^N \frac{z_{nk}}{2} (\mu_k - \mathbf{x}_n)^T \Sigma_k^{-1} (\mu_k - \mathbf{x}_n) + \text{const},
$$

from which we can read off the coefficients of the quadratic and linear terms in $\mu_k$:

$$
-\frac{1}{2}\left(\mathbf{V}_0^{-1} + \Sigma_k^{-1} \sum_{n=1}^N z_{nk}\right),
$$

$$
\mathbf{V}_0^{-1}\mathbf{m}_0 + \sum_{n=1}^N z_{nk}\,\Sigma_k^{-1}\mathbf{x}_n.
$$

Therefore the conditional distribution for $\mu_k$ is a Gaussian with covariance

$$
\mathbf{V}_k = \left(\mathbf{V}_0^{-1} + \Sigma_k^{-1} \sum_{n=1}^N z_{nk}\right)^{-1}
$$

and mean

$$
\mathbf{m}_k = \mathbf{V}_k\left(\mathbf{V}_0^{-1}\mathbf{m}_0 + \sum_{n=1}^N z_{nk}\,\Sigma_k^{-1}\mathbf{x}_n\right).
$$

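A sketch of this update, computing $\mathbf{V}_k$ and $\mathbf{m}_k$ exactly as above (names are illustrative):

```python
import numpy as np

def sample_mu(X, Z, Sigma, m0, V0, rng):
    """Sample mu_k ~ N(m_k, V_k) for each component."""
    K, D = Sigma.shape[0], X.shape[1]
    V0_inv = np.linalg.inv(V0)
    mu = np.empty((K, D))
    for k in range(K):
        Nk = Z[:, k].sum()
        Sk_inv = np.linalg.inv(Sigma[k])
        Vk = np.linalg.inv(V0_inv + Nk * Sk_inv)
        # Z[:, k] @ X computes sum_n z_nk x_n.
        mk = Vk @ (V0_inv @ m0 + Sk_inv @ (Z[:, k] @ X))
        mu[k] = rng.multivariate_normal(mk, Vk)
    return mu
```
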
Finally, for the covariance:

$$
p(\Sigma_k \mid \mathbf{X}, \mathbf{Z}, \pi, \mu, \Sigma_{-k}) = \frac{p(\Sigma_k, \mathbf{X}, \mathbf{Z}, \pi, \mu, \Sigma_{-k})}{p(\mathbf{X}, \mathbf{Z}, \pi, \mu, \Sigma_{-k})} \propto p(\Sigma_k)\,p(\mathbf{X} \mid \Sigma_k, \mathbf{Z}, \mu, \Sigma_{-k}).
$$

In the log domain this reads:

$$
\begin{aligned}
\log p(\Sigma_k \mid \mathbf{X}, \mathbf{Z}, \pi, \mu, \Sigma_{-k}) &= -\frac{\nu_0 + D + 1}{2}\log|\Sigma_k| - \frac{1}{2}\operatorname{tr}\bigl[\mathbf{S}_0^{-1}\Sigma_k^{-1}\bigr] + \sum_{n=1}^N z_{nk}\left(-\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(\mathbf{x}_n - \mu_k)^T\Sigma_k^{-1}(\mathbf{x}_n - \mu_k)\right) + \text{const} \\
&= -\frac{\nu_0 + \sum_{n=1}^N z_{nk} + D + 1}{2}\log|\Sigma_k| - \frac{1}{2}\operatorname{tr}\left[\left(\mathbf{S}_0^{-1} + \sum_{n=1}^N z_{nk}(\mathbf{x}_n - \mu_k)(\mathbf{x}_n - \mu_k)^T\right)\Sigma_k^{-1}\right] + \text{const},
\end{aligned}
$$

where $D$ is the data dimension and the second line uses $(\mathbf{x}_n - \mu_k)^T\Sigma_k^{-1}(\mathbf{x}_n - \mu_k) = \operatorname{tr}\bigl[(\mathbf{x}_n - \mu_k)(\mathbf{x}_n - \mu_k)^T\Sigma_k^{-1}\bigr]$.

Hence the conditional distribution of $\Sigma_k$ is an inverse-Wishart with parameters:

$$
\nu_k = \nu_0 + \sum_{n=1}^N z_{nk},
$$

$$
\mathbf{S}_k = \left(\mathbf{S}_0^{-1} + \sum_{n=1}^N z_{nk}(\mathbf{x}_n - \mu_k)(\mathbf{x}_n - \mu_k)^T\right)^{-1}.
$$

The only difference from (24.18) is in the definition of $\mathbf{S}_0$.

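A sketch of this last update. One caveat on parameterization: SciPy's `invwishart` puts its `scale` matrix inside the trace of the exponent, which corresponds to $\mathbf{S}_k^{-1}$ in the convention used here, so we pass $\mathbf{S}_0^{-1}$ plus the weighted scatter matrix directly rather than $\mathbf{S}_k$ itself:

```python
import numpy as np
from scipy.stats import invwishart

def sample_Sigma(X, Z, mu, S0, nu0, rng):
    """Sample Sigma_k ~ IW(S_k, nu_k) for each component."""
    K, D = mu.shape
    S0_inv = np.linalg.inv(S0)
    Sigma = np.empty((K, D, D))
    for k in range(K):
        Xc = X - mu[k]                           # centered data
        scatter = (Z[:, k, None] * Xc).T @ Xc    # sum_n z_nk (x_n - mu_k)(x_n - mu_k)^T
        Sigma[k] = invwishart.rvs(df=nu0 + Z[:, k].sum(),
                                  scale=S0_inv + scatter,
                                  random_state=rng)
    return Sigma
```

One sweep of the Gibbs sampler then cycles these four conditionals (`sample_Z`, `sample_pi`, `sample_mu`, `sample_Sigma`) in turn, repeating until the chain has mixed.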