Exercise 12.1 - M-step for FA

Answers

The EM for FA, as a useful exercise if you want to become proficient at the math, is presented in detail as follows. As for the mixture of FAs, you can refer to The EM Algorithm for Mixtures of Factor Analyzers, Zoubin Gharamani, Geoffrey E.Hinton, 1996.

We begin with: (centralize 𝐗 to cancel μ w.l.o.g):

p ( 𝐳 n ) = 𝒩 ( 𝐳 | 0 , 𝐈 ) , p ( 𝐱 n | 𝐳 n , 𝐖 , Ψ ) = 𝒩 ( 𝐱 | 𝐖 𝐳 , Ψ ) ,

where we have centralized 𝐗 to simplify the deduction. Now we apply (4.124) and (4.125) to the two equations before, this ends up with:

p ( 𝐳 n | 𝐱 n , 𝐖 , Ψ ) = 𝒩 ( 𝐳 n | 𝐦 , Σ ) , Σ = ( 𝐈 + 𝐖 T Ψ 1 𝐖 ) 1 , 𝐦 = Σ 𝐖 T Ψ 1 𝐱 n .

The log-likelihood for the complete data set { 𝐱 , 𝐳 } is:

log n = 1 N p ( 𝐱 n , 𝐳 n | 𝐖 , Ψ ) = n = 1 N [ log p ( 𝐳 n ) + log p ( 𝐱 n | 𝐳 n , 𝐖 , Ψ ) ] = n = 1 N 𝐳 n T 𝐳 n 2 N 2 log | Ψ | + n = 1 N 1 2 ( 𝐱 n 𝐖 𝐳 n ) T Ψ 1 ( 𝐱 n 𝐖 𝐳 n ) = 1 2 ( n = 1 N 𝐳 n T 𝐳 n ) N 2 log | Ψ | 1 2 ( n = 1 N 𝐱 n T Ψ 1 𝐱 n + 𝐳 n T 𝐖 T Ψ 1 𝐖 𝐳 n 2 𝐳 n T 𝐖 T Ψ 1 𝐱 n ) .

We are now ready to formulate the auxiliary function, let 𝜃 = ( 𝐖 , Ψ ) :

Q ( 𝜃 , 𝜃 old ) = 𝔼 p ( 𝐙 | 𝐗 , 𝜃 old ) [ n = 1 N log p ( 𝐱 n , 𝐳 n | 𝜃 ) ] = 1 2 n = 1 N 𝔼 [ 𝐳 n T 𝐳 n ] N 2 log | Ψ | 1 2 n = 1 N ( 𝐱 n T Ψ 1 𝐱 n + 𝔼 [ 𝐳 n T 𝐖 T Ψ 1 𝐖 𝐳 n ] 2 𝔼 [ 𝐳 n T ] 𝐖 T Ψ 1 𝐱 n ) ,

where the conditional first and second moments are:

𝔼 [ 𝐳 n ] = ( 𝐈 + 𝐖 T Ψ 1 𝐖 ) 1 𝐖 T Ψ 1 𝐱 n = 𝐦 n , 𝔼 [ 𝐳 n 𝐳 n T ] = ( 𝐈 + 𝐖 T Ψ 1 𝐖 ) 1 + 𝐦 n 𝐦 n T = Σ + 𝐦 n 𝐦 n T .

Therefore:

Q ( 𝜃 , 𝜃 old ) = 1 2 n = 1 N tr [ Σ + 𝐦 n 𝐦 n T ] N 2 log | Ψ | 1 2 n = 1 N ( 𝐱 n T Ψ 1 𝐱 n + tr [ 𝐖 T Ψ 1 𝐖 ( Σ + 𝐦 n 𝐦 n T ) ] 2 𝐦 n T 𝐖 T Ψ 1 𝐱 n ) .

Finally, let us take partial gradient of the auxiliary function w.r.t. 𝐖 and Ψ . We start with 𝐖 , note that Σ and 𝐦 n only dependents on 𝐖 old , hence the first term is a constant for 𝜃 :

∂𝑄 𝐖 = 1 2 n = 1 N 𝐖 tr [ 𝐖 T Ψ 1 𝐖 𝔼 [ 𝐳 n 𝐳 n T ] ] 2 𝐖 tr [ 𝐱 n 𝐦 n T Ψ 1 𝐖 ] ] = 1 2 n = 1 N 𝔼 [ 𝐳 n 𝐳 n T ] 2 Ψ 1 W 2 𝐱 n 𝐦 n T Ψ 1 .

Setting it to zero yields:

𝐖 = ( n = 1 N 𝐱 n 𝐦 n T ) ( n = 1 N 𝔼 [ 𝐳 n 𝐳 n T ] ) 1 .

Meanwhile, let Λ = Ψ 1 :

∂𝑄 Λ = N 2 Λ 1 1 2 n = 1 N ( 𝐱 n 𝐱 n T + 𝐖 𝔼 [ 𝐳 n 𝐳 n T ] 𝐖 T 2 𝐱 n 𝐦 n T 𝐖 T ) .

Hence:

Ψ = 1 N n = 1 N ( 𝐱 n 𝐱 n T + 𝐖 𝔼 [ 𝐳 n 𝐳 n T ] 𝐖 T 2 𝐱 n 𝐦 n T 𝐖 T ) .

This completes the proof.

User profile picture
2021-03-24 13:42
Comments