Exercise 18.1 - Derivation of EM for LG-SSM

Answers

We directly work on the auxiliary function, in which 𝜃 denotes the collection of all six matrices. The dependence on 𝐮 is omitted for simplicity. The first subscript for variables 𝐳 and 𝐲 is the index of the sequence. :

Q ( 𝜃 , 𝜃 old ) = 𝔼 p ( 𝐙 | 𝐘 , 𝜃 old ) [ log n = 1 N p ( 𝐳 n , 1 : T n , 𝐲 n , 1 : T n | 𝜃 ) ] = 𝔼 [ log ( n = 1 N p ( 𝐳 n , 1 | 𝜃 ) ( i = 2 T n p ( 𝐳 n , i | 𝐳 n , i 1 , 𝜃 ) ) ( i = 1 T n p ( 𝐲 n , i | 𝐳 n , i , 𝜃 ) ) ) ] = 𝔼 [ n = 1 N log 𝒩 ( 𝐳 n , 1 | μ 0 , Σ 0 ) + n = 1 N i = 2 T n log 𝒩 ( 𝐳 n , i | 𝐀 i 𝐳 n , i 1 + 𝐁 i 𝐮 i , 𝐐 i ) + n = 1 N i = 1 T n log 𝒩 ( y n , i | 𝐂 i z n , i + 𝐃 i u i , 𝐑 i ) ] = 𝔼 [ N 2 log | Σ 0 | + { 1 2 n = 1 N ( 𝐳 n , 1 μ 0 ) T Σ 0 1 ( 𝐳 n , 1 μ 0 ) } 1 2 n = 1 N i = 2 T n log | 𝐐 i | + { 1 2 n = 1 N i = 2 T n ( 𝐳 n , i 𝐀 i 𝐳 n , i 1 𝐁 i 𝐮 i ) T 𝐐 i 1 ( 𝐳 n , i 𝐀 i 𝐳 n , i 1 𝐁 i 𝐮 i ) } 1 2 n = 1 N i = 2 T n log | 𝐑 i | + { 1 2 n = 1 N i = 2 T n ( 𝐲 n , i 𝐂 i 𝐳 n , i 𝐃 i 𝐮 i ) T 𝐑 i 1 ( 𝐲 n , i 𝐂 i 𝐳 n , i 𝐃 i 𝐮 i ) } ] .

For the E-step, we are to compute:

𝔼 p ( 𝐙 | 𝐘 , 𝜃 old ) [ 𝐳 n , i ] , 𝔼 p ( 𝐙 | 𝐘 , 𝜃 old ) [ 𝐳 n , i T 𝐄 𝐳 n , i ] , 𝔼 p ( 𝐙 | 𝐘 , 𝜃 old ) [ 𝐳 n , i T 𝐅 𝐳 n , i 1 ] ,

respectively, in which 𝐄 can be Σ 0 1 , 𝐐 i 1 , 𝐀 i T 𝐐 i 1 𝐀 i and 𝐂 i T 𝐑 i 1 𝐂 i , while 𝐅 is 𝐐 i 1 𝐀 i . We then evoke the linearity of the trace operator:

𝔼 p ( 𝐙 | 𝐘 , 𝜃 old ) [ 𝐳 T 𝐄 𝐳 ] = 𝔼 [ tr ( 𝐄 𝐳 𝐳 T ) ] = 𝐄 𝔼 [ 𝐳 𝐳 T ] .

Therefore we only need the conditional expectation of 𝐳 n , i , 𝐳 n , i 𝐳 n , i T and 𝐳 n , i 𝐳 n , i 1 T . The first two variables have been covered by (18.56)-(18.59). For the last variable, we simply need to borrow the form from (18.63). Adding all these together completes the E-step.

For the M-step, we should note that the auxiliary function after the E-step has become the form such that for μ 0 :

∂𝑄 μ 0 = μ 0 ( N 2 μ 0 T Σ 0 1 μ 0 + ( n = 1 N 𝔼 [ 𝐳 n , 1 ] ) T Σ 0 1 μ 0 ) ,

whose form is close to that for the mean in an ordinary MVN. For Σ 0 , 𝐐 i and 𝐑 i , the conclusion is similar. As for matrices such as 𝐀 i , 𝐁 i , 𝐂 i and 𝐃 i , note that ∂𝑄 𝐀 i is composed of a series with form:

𝐀 i 𝐳 T 𝐀 i 𝐯 = 𝐯 𝐳 T ,

and

𝐀 i 𝐳 T 𝐀 i T 𝐐 i 1 𝐀 i 𝐳 = 2 𝐳 𝐳 T 𝐀 i T 𝐐 .

(assuming symmetry!) Hence we end up with an involving update formula between 𝐀 i and 𝐁 i , so is that for 𝐂 i and 𝐃 i . This finishes the M-step for the EM for LG-SSM.

User profile picture
2021-03-24 13:42
Comments