Exercise 11.5 - Gradient descent for fitting GMM

Answers

From the given (11.118) and (11.119):

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \sum_k \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \qquad \ell(\boldsymbol{\theta}) = \sum_{n=1}^N \log p(\mathbf{x}_n \mid \boldsymbol{\theta}).$$

The responsibility $r_{nk} = p(z_{nk} = 1 \mid \mathbf{x}_n, \boldsymbol{\theta})$ is defined by (11.120).
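These quantities are easy to evaluate numerically. The following NumPy/SciPy sketch (not part of the original solution; the data, mixture weights, means, and covariances are made-up toy values) computes $\ell(\boldsymbol{\theta})$ and the responsibilities $r_{nk}$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                      # N = 5 toy points in D = 2 dims
pis = np.array([0.3, 0.7])                        # mixing weights pi_k
mus = np.array([[0.0, 0.0], [1.0, -1.0]])         # component means mu_k
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])   # component covariances Sigma_k

# Weighted densities pi_k * N(x_n | mu_k, Sigma_k), shape (N, K)
dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                 for pi, m, S in zip(pis, mus, Sigmas)], axis=1)

loglik = np.log(dens.sum(axis=1)).sum()          # l(theta), as in (11.119)
r = dens / dens.sum(axis=1, keepdims=True)       # responsibilities r_nk, (11.120)
print(loglik, r.sum(axis=1))                     # each row of r sums to 1
```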

For question (a), recall that:

$$\ell(\boldsymbol{\theta}) = \sum_{n=1}^N \log\Big[\sum_{\mathbf{z}_n} p(\mathbf{x}_n, \mathbf{z}_n \mid \boldsymbol{\theta})\Big] = \sum_{n=1}^N \log\Big[\sum_{k=1}^K p(\mathbf{x}_n, z_{nk}=1 \mid \boldsymbol{\theta})\Big] = \sum_{n=1}^N \log\Big[\sum_{k=1}^K \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\Big].$$

We are now ready to take the partial gradient of $\ell$ w.r.t. $\boldsymbol{\mu}_k$, which yields:

$$\begin{aligned}
\frac{\partial \ell}{\partial \boldsymbol{\mu}_k}
&= \sum_{n=1}^N \frac{\partial}{\partial \boldsymbol{\mu}_k} \log\Big[\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})\Big] \\
&= \sum_{n=1}^N \frac{\pi_k}{\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})} \, \frac{\partial \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\partial \boldsymbol{\mu}_k} \\
&= \sum_{n=1}^N \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})} \, \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k),
\end{aligned}$$

using (4.10) for the last step. The fraction in the last line is exactly $r_{nk}$, so we arrive at (11.121): $\frac{\partial \ell}{\partial \boldsymbol{\mu}_k} = \sum_{n=1}^N r_{nk} \, \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k)$.
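As a sanity check on (11.121), the sketch below (made-up toy data and parameters, not from the original solution) compares the analytic gradient with a central finite-difference approximation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def loglik(X, pis, mus, Sigmas):
    # l(theta) = sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k)
    dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                     for pi, m, S in zip(pis, mus, Sigmas)], axis=1)
    return np.log(dens.sum(axis=1)).sum()

def resp(X, pis, mus, Sigmas):
    # responsibilities r_nk
    dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                     for pi, m, S in zip(pis, mus, Sigmas)], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
pis = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.5], [1.0, -1.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])

k, eps = 0, 1e-6
r = resp(X, pis, mus, Sigmas)

# analytic: sum_n r_nk Sigma_k^{-1} (x_n - mu_k), i.e. (11.121)
analytic = np.linalg.inv(Sigmas[k]) @ (r[:, [k]] * (X - mus[k])).sum(axis=0)

# central finite differences w.r.t. each coordinate of mu_k
numeric = np.zeros_like(analytic)
for d in range(mus.shape[1]):
    mp, mm = mus.copy(), mus.copy()
    mp[k, d] += eps
    mm[k, d] -= eps
    numeric[d] = (loglik(X, pis, mp, Sigmas) - loglik(X, pis, mm, Sigmas)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # expected: True
```

The same pattern, perturbing one parameter at a time and comparing with the analytic expression, is reused for the remaining parts below.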

For question (b):

$$\frac{\partial \ell}{\partial \pi_k} = \sum_{n=1}^N \frac{\partial}{\partial \pi_k} \log\Big[\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})\Big] = \sum_{n=1}^N \frac{\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})}.$$
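Note that each summand equals $r_{nk}/\pi_k$. A quick numerical check of the expression (toy data and arbitrary parameters, not from the original solution; $\pi_k$ is perturbed directly, ignoring the sum-to-one constraint, exactly as the partial derivative above assumes):

```python
import numpy as np
from scipy.stats import multivariate_normal

def loglik(X, pis, mus, Sigmas):
    dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                     for pi, m, S in zip(pis, mus, Sigmas)], axis=1)
    return np.log(dens.sum(axis=1)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
pis = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.5], [1.0, -1.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])

k, eps = 0, 1e-6
dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                 for pi, m, S in zip(pis, mus, Sigmas)], axis=1)

# analytic: sum_n N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
analytic = (multivariate_normal(mus[k], Sigmas[k]).pdf(X) / dens.sum(axis=1)).sum()

pp, pm = pis.copy(), pis.copy()
pp[k] += eps
pm[k] -= eps
numeric = (loglik(X, pp, mus, Sigmas) - loglik(X, pm, mus, Sigmas)) / (2 * eps)
print(np.isclose(analytic, numeric))  # expected: True
```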

For question (c), with:

$$\pi_k = \frac{\exp(w_k)}{\sum_{k'=1}^K \exp(w_{k'})},$$

we have:

$$\begin{aligned}
\frac{\partial \ell}{\partial w_k}
&= \sum_j \frac{\partial \ell}{\partial \pi_j} \frac{\partial \pi_j}{\partial w_k}
 = \sum_j \sum_{n=1}^N \frac{\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}{\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})} \, \pi_j \, (1 - \pi_j)^{\mathbb{I}(j=k)} (-\pi_k)^{\mathbb{I}(j \neq k)} \\
&= \sum_{n=1}^N \sum_j r_{nj} \, (1 - \pi_j)^{\mathbb{I}(j=k)} (-\pi_k)^{\mathbb{I}(j \neq k)} \\
&= \sum_{n=1}^N \big[ r_{nk}(1 - \pi_k) + (1 - r_{nk})(-\pi_k) \big] \\
&= \sum_{n=1}^N (r_{nk} - \pi_k),
\end{aligned}$$

where we used $\partial \pi_j / \partial w_k = \pi_j \big(\mathbb{I}(j=k) - \pi_k\big)$ and $\sum_{j \neq k} r_{nj} = 1 - r_{nk}$, so no constant factor has been dropped.
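The compact form $\sum_{n=1}^N (r_{nk} - \pi_k)$ is straightforward to verify numerically under the softmax parameterization; the sketch below (made-up data and weights, not from the original solution) compares it with a finite-difference derivative w.r.t. $w_k$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def loglik(X, w, mus, Sigmas):
    pis = softmax(w)                                  # pi_k = exp(w_k) / sum_j exp(w_j)
    dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                     for pi, m, S in zip(pis, mus, Sigmas)], axis=1)
    return np.log(dens.sum(axis=1)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
w = np.array([0.2, -0.1])
mus = np.array([[0.0, 0.5], [1.0, -1.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])

pis = softmax(w)
dens = np.stack([pi * multivariate_normal(m, S).pdf(X)
                 for pi, m, S in zip(pis, mus, Sigmas)], axis=1)
r = dens / dens.sum(axis=1, keepdims=True)            # responsibilities r_nk

k, eps = 0, 1e-6
analytic = (r[:, k] - pis[k]).sum()                   # sum_n (r_nk - pi_k)
wp, wm = w.copy(), w.copy()
wp[k] += eps
wm[k] -= eps
numeric = (loglik(X, wp, mus, Sigmas) - loglik(X, wm, mus, Sigmas)) / (2 * eps)
print(np.isclose(analytic, numeric))  # expected: True
```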

For question (d), we have:

$$\frac{\partial \ell}{\partial \boldsymbol{\Sigma}_k} = \sum_{n=1}^N \frac{\partial}{\partial \boldsymbol{\Sigma}_k} \log\Big[\sum_{k'=1}^K \pi_{k'} \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{k'}, \boldsymbol{\Sigma}_{k'})\Big] = -\frac{1}{2} \sum_{n=1}^N r_{nk} \Big( \boldsymbol{\Sigma}_k^{-1} - \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf T} \boldsymbol{\Sigma}_k^{-1} \Big),$$

In this step we use (4.10) together with the fact that

$$\frac{\partial |\mathbf{A}|}{\partial \mathbf{A}} = |\mathbf{A}| \, \mathbf{A}^{-1}$$

for a symmetric matrix $\mathbf{A}$. (Differentiating w.r.t. the precision $\boldsymbol{\Lambda}_k = \boldsymbol{\Sigma}_k^{-1}$ instead gives the equivalent form $-\frac{1}{2} \sum_{n=1}^N r_{nk} \big( (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf T} - \boldsymbol{\Sigma}_k \big)$.) Setting the gradient to zero, the optimal $\boldsymbol{\Sigma}_k$ takes the same form as the one derived in Exercise 11.2.
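The covariance gradient can be checked entry by entry. The sketch below (toy data and arbitrary parameters, not from the original solution) uses a hand-written Gaussian log-density so that individual entries of $\boldsymbol{\Sigma}_k$ can be perturbed independently in the finite-difference comparison:

```python
import numpy as np

def log_gauss(X, mu, Sigma):
    # log N(x | mu, Sigma), written with generic inv/det so single matrix
    # entries can be perturbed independently in the finite-difference check
    D = mu.shape[0]
    diff = X - mu
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ni,ij,nj->n', diff, Sinv, diff)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + quad)

def weighted_dens(X, pis, mus, Sigmas):
    # pi_k * N(x_n | mu_k, Sigma_k), shape (N, K)
    return np.stack([pi * np.exp(log_gauss(X, mu, S))
                     for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)

def loglik(X, pis, mus, Sigmas):
    return np.log(weighted_dens(X, pis, mus, Sigmas).sum(axis=1)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
pis = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.5], [1.0, -1.0]])
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])

dens = weighted_dens(X, pis, mus, Sigmas)
r = dens / dens.sum(axis=1, keepdims=True)            # responsibilities r_nk

k, eps = 0, 1e-6
Sinv = np.linalg.inv(Sigmas[k])
diff = X - mus[k]
S_hat = np.einsum('n,ni,nj->ij', r[:, k], diff, diff)  # sum_n r_nk (x_n-mu_k)(x_n-mu_k)^T
analytic = -0.5 * (r[:, k].sum() * Sinv - Sinv @ S_hat @ Sinv)

numeric = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        Sp, Sm = Sigmas.copy(), Sigmas.copy()
        Sp[k, i, j] += eps
        Sm[k, i, j] -= eps
        numeric[i, j] = (loglik(X, pis, mus, Sp) - loglik(X, pis, mus, Sm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))      # expected: True
```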

For question (e), this extra step is redundant, since the MLE for $\boldsymbol{\Sigma}_k$ is already a positive definite matrix.
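As a small illustration of this claim (toy data and made-up responsibilities, not from the original solution), the weighted-scatter estimate from Exercise 11.2 is positive semi-definite by construction and, generically, positive definite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
r_k = rng.uniform(size=50)            # toy responsibilities for one component
mu_k = (r_k[:, None] * X).sum(axis=0) / r_k.sum()

diff = X - mu_k
Sigma_k = np.einsum('n,ni,nj->ij', r_k, diff, diff) / r_k.sum()   # MLE form from Exercise 11.2
print(np.linalg.eigvalsh(Sigma_k))    # all eigenvalues > 0 -> positive definite
```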
