Exercise 17.3 - EM for HMMs with mixture of Gaussian observations

Answers

The complete likelihood takes the form:

$$
p(\mathbf{z}_{1:T}, \mathbf{x}_{1:T} \mid \pi, \mathbf{A}, \mathbf{W}, \mu, \Sigma)
= \left( \prod_{j=1}^{J} \pi_j^{\mathbb{I}[z_1 = j]} \right)
\left( \prod_{t=2}^{T} \prod_{i=1}^{J} \prod_{j=1}^{J} \mathbf{A}_{i,j}^{\mathbb{I}[z_{t-1} = i,\, z_t = j]} \right)
\left[ \prod_{t=1}^{T} \prod_{j=1}^{J} \left( \sum_{k=1}^{K} w_{jk}\, \mathcal{N}(\mathbf{x}_t \mid \mu_{jk}, \Sigma_{jk}) \right)^{\mathbb{I}[z_t = j]} \right],
$$

where $\pi$ is the initial distribution of the hidden state, $\mathbf{A}$ is a $J \times J$ matrix of transition probabilities, $\mathbf{W}$ is a $J \times K$ matrix with entries $w_{jk}$, and $\mu$ and $\Sigma$ are tensors whose $(j,k)$-th components are $\mu_{jk}$ and $\Sigma_{jk}$ respectively. Its logarithm reads (we temporarily drop the conditioning on the parameters for conciseness):
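Before taking logarithms, it may help to see the structure of this likelihood in code. The following is a minimal sketch (my own, not part of the original solution) that evaluates the complete log-likelihood for a single sequence, assuming hypothetical array shapes `pi: (J,)`, `A: (J, J)`, `W: (J, K)`, `mu: (J, K, D)`, `Sigma: (J, K, D, D)`:

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_log_likelihood(x, z, pi, A, W, mu, Sigma):
    """log p(z_{1:T}, x_{1:T} | pi, A, W, mu, Sigma) for one sequence.

    x: (T, D) observations; z: (T,) integer array of hidden states in {0, ..., J-1}.
    """
    T = len(z)
    ll = np.log(pi[z[0]])                    # initial-state term
    ll += np.sum(np.log(A[z[:-1], z[1:]]))   # transition terms
    for t in range(T):
        j = z[t]
        # emission term: log of the mixture density under state j
        mix = sum(W[j, k] * multivariate_normal.pdf(x[t], mu[j, k], Sigma[j, k])
                  for k in range(W.shape[1]))
        ll += np.log(mix)
    return ll
```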

$$
\log p(\mathbf{z}_{1:T}, \mathbf{x}_{1:T})
= \sum_{j=1}^{J} \mathbb{I}[z_1 = j] \log \pi_j
+ \sum_{t=2}^{T} \sum_{i=1}^{J} \sum_{j=1}^{J} \mathbb{I}[z_{t-1} = i,\, z_t = j] \log \mathbf{A}_{i,j}
+ \sum_{t=1}^{T} \sum_{j=1}^{J} \mathbb{I}[z_t = j] \log\left( \sum_{k=1}^{K} w_{jk}\, \mathcal{N}(\mathbf{x}_t \mid \mu_{jk}, \Sigma_{jk}) \right).
$$

When we take the expectation w.r.t. $p(\mathbf{z}_{1:T} \mid \mathbf{x}_{1:T})$, the only terms that matter are identical to those in Exercise 17.1. This completes the E-step for this model.
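Concretely (restating what carries over from Exercise 17.1, with $n$ indexing sequences), the required expectations are the smoothed posteriors

$$
\mathbb{E}\left[\mathbb{I}[z_{n,t} = j]\right] = p(z_{n,t} = j \mid \mathbf{x}_{n,1:T_n}), \qquad
\mathbb{E}\left[\mathbb{I}[z_{n,t-1} = i,\, z_{n,t} = j]\right] = p(z_{n,t-1} = i,\, z_{n,t} = j \mid \mathbf{x}_{n,1:T_n}),
$$

both obtained from the forward-backward recursions, with the emission density of state $j$ taken to be the mixture $\sum_{k} w_{jk}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk})$.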

For the M-step, the updates for $\mathbf{A}$ and $\pi$ are identical to those for an ordinary HMM since their gradients remain the same. To update $\mathbf{W}$, $\mu$, and $\Sigma$, note that the auxiliary function depends on them only through terms of the form (one for each $j$):

$$
\sum_{n=1}^{N} \sum_{t=1}^{T_n} \mathbb{E}\left[\mathbb{I}[z_{n,t} = j]\right] \log\left( \sum_{k=1}^{K} w_{jk}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk}) \right).
$$
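(For reference, the updates for $\pi$ and $\mathbf{A}$ mentioned above take the familiar form

$$
\pi_j \propto \sum_{n=1}^{N} \mathbb{E}\left[\mathbb{I}[z_{n,1} = j]\right], \qquad
\mathbf{A}_{i,j} \propto \sum_{n=1}^{N} \sum_{t=2}^{T_n} \mathbb{E}\left[\mathbb{I}[z_{n,t-1} = i,\, z_{n,t} = j]\right],
$$

with $\pi$ normalized over $j$ and each row of $\mathbf{A}$ normalized over its second index.)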

Maximizing these terms is tantamount to estimating the parameters of $J \times K$ Gaussian components independently, with an extra weight on each sample. Let us denote

$$
\alpha_{n,t}(j) = \mathbb{E}\left[\mathbb{I}[z_{n,t} = j]\right],
$$

so $\alpha_{n,t}(j)$ is determined from the old set of parameters. Then consider the likelihood, in which we use a one-hot vector of length $JK$ to encode the new latent variables:

$$
p(\mathbf{x}_{n,t} \mid \mathbf{h}_{n,t}) = \prod_{j=1}^{J} \prod_{k=1}^{K} \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk})^{\mathbb{I}[h_{n,t,j,k} = 1]}.
$$

Although tedious, one can regard $(n,t)$ as a composite index for the data and $(j,k)$ as a composite index for the Gaussian components. Now the inner auxiliary function reads (the evidence of the latent variables does not depend on the new parameters and is thus omitted):

$$
Q_I(\theta, \theta^{\text{old}}) = \sum_{n,t} \sum_{j,k} \mathbb{E}\left[\mathbb{I}[h_{n,t,j,k} = 1]\right] \log \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk}),
$$

where:

$$
\mathbb{E}\left[\mathbb{I}[h_{n,t,j,k} = 1]\right] = p(h_{n,t,j,k} = 1) = p(z_{n,t} = j)\, p(h_{n,t,j,k} = 1 \mid z_{n,t} = j),
$$

in which $p(z_{n,t} = j) = \alpha_{n,t}(j)$ and $p(h_{n,t,j,k} = 1 \mid z_{n,t} = j)$ can be computed as in an ordinary GMM by restricting attention to the $K$ Gaussian components under hidden state $j$. This concludes the E-step for the second auxiliary function. The M-step for $Q_I$ is thus similar to that for an ordinary GMM, except for the introduction of an extra factor $\alpha_{n,t}(j)$. This also completes the remainder of the M-step for the first auxiliary function.
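To spell this out (a sketch of the resulting updates under the notation above, with the old parameters used on the right-hand side of the responsibilities), define

$$
r_{n,t,j,k} = \alpha_{n,t}(j)\, \frac{w_{jk}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk}, \Sigma_{jk})}{\sum_{k'=1}^{K} w_{jk'}\, \mathcal{N}(\mathbf{x}_{n,t} \mid \mu_{jk'}, \Sigma_{jk'})};
$$

then the weighted-GMM updates are

$$
w_{jk} = \frac{\sum_{n,t} r_{n,t,j,k}}{\sum_{n,t} \alpha_{n,t}(j)}, \qquad
\mu_{jk} = \frac{\sum_{n,t} r_{n,t,j,k}\, \mathbf{x}_{n,t}}{\sum_{n,t} r_{n,t,j,k}}, \qquad
\Sigma_{jk} = \frac{\sum_{n,t} r_{n,t,j,k}\, (\mathbf{x}_{n,t} - \mu_{jk})(\mathbf{x}_{n,t} - \mu_{jk})^{\top}}{\sum_{n,t} r_{n,t,j,k}}.
$$

The following sketch (again my own, for a single sequence, with the same hypothetical array shapes as in the earlier snippet) implements this emission M-step:

```python
import numpy as np
from scipy.stats import multivariate_normal

def emission_m_step(x, alpha, W, mu, Sigma):
    """Weighted-GMM update for the emission parameters of one sequence.

    x: (T, D) observations; alpha: (T, J) smoothed state posteriors;
    W: (J, K), mu: (J, K, D), Sigma: (J, K, D, D) are the old parameters.
    """
    T, D = x.shape
    J, K = W.shape
    # dens[t, j, k] = w_{jk} * N(x_t | mu_{jk}, Sigma_{jk})
    dens = np.zeros((T, J, K))
    for j in range(J):
        for k in range(K):
            dens[:, j, k] = W[j, k] * multivariate_normal.pdf(x, mu[j, k], Sigma[j, k])
    # responsibilities r[t, j, k] = alpha[t, j] * per-state mixture posterior
    r = alpha[:, :, None] * dens / dens.sum(axis=2, keepdims=True)

    Nk = r.sum(axis=0)                                   # (J, K) effective counts
    W_new = Nk / alpha.sum(axis=0)[:, None]              # weighted mixture weights
    mu_new = np.einsum('tjk,td->jkd', r, x) / Nk[..., None]
    diff = x[:, None, None, :] - mu_new[None, :, :, :]   # (T, J, K, D)
    Sigma_new = np.einsum('tjk,tjkd,tjke->jkde', r, diff, diff) / Nk[..., None, None]
    return W_new, mu_new, Sigma_new
```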
