Exercise 11.13 - EM for EB estimation of Gaussian shrinkage model

Answers

This is an example of a non-mixture latent graphical model. Here the latent variables are no longer of the one-hot type, which makes the model different from the EM forms we have developed so far.

Recall that the complete likelihood for the Gaussian shrinkage model is:

$$
p(\boldsymbol{\theta}, \mathcal{D} \mid \mu, \tau^2, \{\sigma_j^2\}_{j=1}^D)
= p(\boldsymbol{\theta} \mid \mu, \tau^2)\, p(\mathcal{D} \mid \boldsymbol{\theta}, \{\sigma_j^2\}_{j=1}^D)
= \prod_{j=1}^D \left[ \mathcal{N}(\theta_j \mid \mu, \tau^2) \prod_{i=1}^{N_j} \mathcal{N}(x_{ij} \mid \theta_j, \sigma_j^2) \right].
$$

Taking the logarithm yields:

$$
\begin{aligned}
\log p(\boldsymbol{\theta}, \mathcal{D} \mid \mu, \tau^2, \{\sigma_j^2\}_{j=1}^D)
&= \sum_{j=1}^D \left[ \log \mathcal{N}(\theta_j \mid \mu, \tau^2) + \sum_{i=1}^{N_j} \log \mathcal{N}(x_{ij} \mid \theta_j, \sigma_j^2) \right] \\
&= \sum_{j=1}^D \left[ -\frac{1}{2} \log 2\pi\tau^2 - \frac{1}{2\tau^2} (\theta_j - \mu)^2 \right] + \sum_{j=1}^D \sum_{i=1}^{N_j} \left[ -\frac{1}{2} \log 2\pi\sigma_j^2 - \frac{1}{2\sigma_j^2} (x_{ij} - \theta_j)^2 \right] \\
&= -\frac{D}{2} \log 2\pi\tau^2 - \sum_{j=1}^D \frac{(\theta_j - \mu)^2}{2\tau^2} - \sum_{j=1}^D \frac{N_j}{2} \log 2\pi\sigma_j^2 - \sum_{j=1}^D \sum_{i=1}^{N_j} \frac{(x_{ij} - \theta_j)^2}{2\sigma_j^2}.
\end{aligned}
$$

Note that $p(\boldsymbol{\theta}, \mathcal{D} \mid \mu, \tau^2, \{\sigma_j^2\}_{j=1}^D)$ is jointly Gaussian, so the posterior over $\boldsymbol{\theta}$ can be written down analytically with (4.125) (though this is tedious). Since every $\boldsymbol{\theta}$-dependent term in the complete log-likelihood involves only $\theta_j$ or $\theta_j^2$, each such term can be replaced by the corresponding posterior moment. This completes the E-step.
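Concretely, by conjugacy the posterior factorizes over $j$ and each factor is Gaussian (this is just (4.125) specialized to our notation, with $\bar{x}_j = \frac{1}{N_j}\sum_{i=1}^{N_j} x_{ij}$):

$$
p(\theta_j \mid \mathcal{D}, \mu, \tau^2, \sigma_j^2) = \mathcal{N}(\theta_j \mid \hat{m}_j, \hat{v}_j^2), \qquad
\hat{v}_j^2 = \left( \frac{1}{\tau^2} + \frac{N_j}{\sigma_j^2} \right)^{-1}, \qquad
\hat{m}_j = \hat{v}_j^2 \left( \frac{\mu}{\tau^2} + \frac{N_j \bar{x}_j}{\sigma_j^2} \right),
$$

so the required moments are $\mathbb{E}[\theta_j] = \hat{m}_j$ and $\mathbb{E}[\theta_j^2] = \hat{m}_j^2 + \hat{v}_j^2$.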

For the M-step, this model is no different from the others we have developed so far: taking partial derivatives of the expected complete log-likelihood with respect to $\mu$ and $\tau^2$ and setting them to zero yields the update rules.
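Writing them out in the posterior-moment notation above, the stationarity conditions give

$$
\mu^{\text{new}} = \frac{1}{D} \sum_{j=1}^D \mathbb{E}[\theta_j], \qquad
(\tau^2)^{\text{new}} = \frac{1}{D} \sum_{j=1}^D \left( \mathbb{E}[\theta_j^2] - 2\mu^{\text{new}}\, \mathbb{E}[\theta_j] + (\mu^{\text{new}})^2 \right).
$$

Below is a minimal NumPy sketch of the full EM loop, treating each $\sigma_j^2$ as fixed, consistent with the M-step above, which only updates $\mu$ and $\tau^2$. The function name `em_shrinkage`, the initialization, and the fixed iteration count are my own illustrative choices, not from the text.

```python
import numpy as np

def em_shrinkage(data, sigma2, n_iters=100):
    """EM for empirical Bayes estimation of (mu, tau^2) in the Gaussian
    shrinkage model: theta_j ~ N(mu, tau^2), x_ij ~ N(theta_j, sigma_j^2).

    data   : list of D 1-D arrays; data[j] holds the N_j observations of group j
    sigma2 : length-D array of known observation variances sigma_j^2
    """
    N = np.array([len(x) for x in data], dtype=float)  # group sizes N_j
    xbar = np.array([x.mean() for x in data])          # group means
    mu, tau2 = xbar.mean(), xbar.var() + 1e-6          # crude initialization

    for _ in range(n_iters):
        # E-step: posterior moments of each theta_j (conjugate Gaussian)
        v2 = 1.0 / (1.0 / tau2 + N / sigma2)        # posterior variances v_j^2
        m = v2 * (mu / tau2 + N * xbar / sigma2)    # posterior means m_j
        # M-step: maximize the expected complete log-likelihood
        mu = m.mean()                               # update mu first...
        tau2 = np.mean((m - mu) ** 2 + v2)          # ...then tau^2 with the new mu
    return mu, tau2
```

Note that the $\tau^2$ update reuses the freshly updated $\mu$; this is valid because the stationary point for $\mu$ does not depend on $\tau^2$, and $\mathrm{mean}((m_j - \mu)^2 + v_j^2)$ expands to exactly the $(\tau^2)^{\text{new}}$ formula above.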
