Exercise 17.3 - EM for HMMs with mixture of Gaussian observations

Answers

The complete likelihood takes the form:

$$
p(\mathbf{z}_{1:T}, \mathbf{x}_{1:T} \mid \pi, \mathbf{A}, \mathbf{W}, \mu, \Sigma)
= \left( \prod_{j=1}^{J} \pi_j^{\mathbb{I}[z_1 = j]} \right)
\left( \prod_{t=2}^{T} \prod_{i=1}^{J} \prod_{j=1}^{J} \mathbf{A}_{i,j}^{\mathbb{I}[z_{t-1} = i,\, z_t = j]} \right)
\left[ \prod_{t=1}^{T} \prod_{j=1}^{J} \left( \sum_{k=1}^{K} w_{jk}\, \mathcal{N}(\mathbf{x}_t \mid \mu_{jk}, \Sigma_{jk}) \right)^{\mathbb{I}[z_t = j]} \right],
$$

where $\pi$ is the initial distribution of the hidden state, $\mathbf{A}$ is a $J \times J$ matrix of transition probabilities, $\mathbf{W}$ is a $J \times K$ matrix with entries $w_{jk}$, and $\mu$ and $\Sigma$ are tensors whose $(j,k)$-th components are $\mu_{jk}$ and $\Sigma_{jk}$ respectively. Its logarithm reads (we temporarily drop the conditioning on the parameters for conciseness):
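Before taking logarithms, it may help to see the structure of this likelihood in code. The following is a minimal sketch (my own, not part of the original solution) that evaluates the complete log-likelihood for a single sequence, assuming hypothetical array shapes `pi: (J,)`, `A: (J, J)`, `W: (J, K)`, `mu: (J, K, D)`, `Sigma: (J, K, D, D)`:

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_log_likelihood(x, z, pi, A, W, mu, Sigma):
    """log p(z_{1:T}, x_{1:T} | pi, A, W, mu, Sigma) for one sequence.

    x: (T, D) observations; z: (T,) integer array of hidden states in {0, ..., J-1}.
    """
    T = len(z)
    ll = np.log(pi[z[0]])                    # initial-state term
    ll += np.sum(np.log(A[z[:-1], z[1:]]))   # transition terms
    for t in range(T):
        j = z[t]
        # emission term: log of the mixture density under state j
        mix = sum(W[j, k] * multivariate_normal.pdf(x[t], mu[j, k], Sigma[j, k])
                  for k in range(W.shape[1]))
        ll += np.log(mix)
    return ll
```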

$$
\log p(\mathbf{z}_{1:T}, \mathbf{x}_{1:T})
= \sum_{j=1}^{J} \mathbb{I}[z_1 = j] \log \pi_j
+ \sum_{t=2}^{T} \sum_{i=1}^{J} \sum_{j=1}^{J} \mathbb{I}[z_{t-1} = i,\, z_t = j] \log \mathbf{A}_{i,j}
+ \sum_{t=1}^{T} \sum_{j=1}^{J} \mathbb{I}[z_t = j] \log\left( \sum_{k=1}^{K} w_{jk}\, \mathcal{N}(\mathbf{x}_t \mid \mu_{jk}, \Sigma_{jk}) \right).
$$

When we take the expectation w.r.t. $p(\mathbf{z}_{1:T} \mid \mathbf{x}_{1:T})$, the only terms that matter are identical to those in Exercise 17.1. This completes the E-step for this model.
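Concretely (restating what carries over from Exercise 17.1, with $n$ indexing sequences), the required expectations are the smoothed posteriors

$$
\mathbb{E}\left[\mathbb{I}[z_{n,t} = j]\right] = p(z_{n,t} = j \mid \mathbf{x}_{n,1:T_n}), \qquad
\mathbb{E}\left[\mathbb{I}[z_{n,t-1} = i,\, z_{n,t} = j]\right] = p(z_{n,t-1} = i,\, z_{n,t} = j \mid \mathbf{x}_{n,1:T_n}),
$$

both obtained from the forward-backward recursions, with the emission density of state $j$ taken to be the mixture $\sum_{k} w_{jk}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk})$.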

For the M-step, the updates for $\mathbf{A}$ and $\pi$ are identical to those for an ordinary HMM since their gradients remain the same. To update $\mathbf{W}$, $\mu$, and $\Sigma$, note that the auxiliary function depends on them only through terms of the form (one for each $j$):

$$
\sum_{n=1}^{N} \sum_{t=1}^{T_n} \mathbb{E}\left[\mathbb{I}[z_{n,t} = j]\right] \log\left( \sum_{k=1}^{K} w_{jk}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk}) \right).
$$
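(For reference, the updates for $\pi$ and $\mathbf{A}$ mentioned above take the familiar form

$$
\pi_j \propto \sum_{n=1}^{N} \mathbb{E}\left[\mathbb{I}[z_{n,1} = j]\right], \qquad
\mathbf{A}_{i,j} \propto \sum_{n=1}^{N} \sum_{t=2}^{T_n} \mathbb{E}\left[\mathbb{I}[z_{n,t-1} = i,\, z_{n,t} = j]\right],
$$

with $\pi$ normalized over $j$ and each row of $\mathbf{A}$ normalized over its second index.)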

Maximizing these terms is tantamount to estimating the parameters of $J \times K$ Gaussian components independently, with an extra weight on each sample. Let us denote

$$
\alpha_{n,t}(j) = \mathbb{E}\left[\mathbb{I}[z_{n,t} = j]\right],
$$

so $\alpha_{n,t}(j)$ is determined from the old set of parameters. Then consider the likelihood, in which we use a one-hot vector of length $JK$ to encode the new latent variables:

$$
p(\mathbf{x}_{n,t} \mid \mathbf{h}_{n,t}) = \prod_{j=1}^{J} \prod_{k=1}^{K} \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk})^{\mathbb{I}[h_{n,t,j,k} = 1]}.
$$

Although tedious, one can regard $(n,t)$ as a composite index for the data and $(j,k)$ as a composite index for the Gaussian components. Now the inner auxiliary function reads (the evidence of the latent variables does not depend on the new parameters and is thus omitted):

$$
Q_I(\theta, \theta^{\text{old}}) = \sum_{n,t} \sum_{j,k} \mathbb{E}\left[\mathbb{I}[h_{n,t,j,k} = 1]\right] \log \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk}),
$$

where:

$$
\mathbb{E}\left[\mathbb{I}[h_{n,t,j,k} = 1]\right] = p(h_{n,t,j,k} = 1) = p(z_{n,t} = j)\, p(h_{n,t,j,k} = 1 \mid z_{n,t} = j),
$$

in which $p(z_{n,t} = j) = \alpha_{n,t}(j)$ and $p(h_{n,t,j,k} = 1 \mid z_{n,t} = j)$ can be computed as in an ordinary GMM by restricting attention to the $K$ Gaussian components under hidden state $j$. This concludes the E-step for the second auxiliary function. The M-step for $Q_I$ is thus similar to that for an ordinary GMM, except for the introduction of an extra factor $\alpha_{n,t}(j)$. This also completes the remainder of the M-step for the first auxiliary function.
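To spell this out (a sketch of the resulting updates under the notation above, with the old parameters used on the right-hand side of the responsibilities), define

$$
r_{n,t,j,k} = \alpha_{n,t}(j)\, \frac{w_{jk}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk})}{\sum_{k'=1}^{K} w_{jk'}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk'}, \Sigma_{jk'})};
$$

then the weighted-GMM updates are

$$
w_{jk} = \frac{\sum_{n,t} r_{n,t,j,k}}{\sum_{n,t} \alpha_{n,t}(j)}, \qquad
\mu_{jk} = \frac{\sum_{n,t} r_{n,t,j,k}\, \mathbf{x}_{n,t}}{\sum_{n,t} r_{n,t,j,k}}, \qquad
\Sigma_{jk} = \frac{\sum_{n,t} r_{n,t,j,k}\, (\mathbf{x}_{n,t} - \mu_{jk})(\mathbf{x}_{n,t} - \mu_{jk})^{\top}}{\sum_{n,t} r_{n,t,j,k}}.
$$

The following sketch (again my own, for a single sequence, with the same hypothetical array shapes as in the earlier snippet) implements this emission M-step:

```python
import numpy as np
from scipy.stats import multivariate_normal

def emission_m_step(x, alpha, W, mu, Sigma):
    """Weighted-GMM update for the emission parameters of one sequence.

    x: (T, D) observations; alpha: (T, J) smoothed state posteriors;
    W: (J, K), mu: (J, K, D), Sigma: (J, K, D, D) are the old parameters.
    """
    T, D = x.shape
    J, K = W.shape
    # dens[t, j, k] = w_{jk} * N(x_t | mu_{jk}, Sigma_{jk})
    dens = np.zeros((T, J, K))
    for j in range(J):
        for k in range(K):
            dens[:, j, k] = W[j, k] * multivariate_normal.pdf(x, mu[j, k], Sigma[j, k])
    # responsibilities r[t, j, k] = alpha[t, j] * per-state mixture posterior
    r = alpha[:, :, None] * dens / dens.sum(axis=2, keepdims=True)

    Nk = r.sum(axis=0)                                   # (J, K) effective counts
    W_new = Nk / alpha.sum(axis=0)[:, None]              # weighted mixture weights
    mu_new = np.einsum('tjk,td->jkd', r, x) / Nk[..., None]
    diff = x[:, None, None, :] - mu_new[None, :, :, :]   # (T, J, K, D)
    Sigma_new = np.einsum('tjk,tjkd,tjke->jkde', r, diff, diff) / Nk[..., None, None]
    return W_new, mu_new, Sigma_new
```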
