Exercise 13.2 - Derivation of M-step for EB for linear regression

Answers

We derive the EM algorithm for Automatic Relevance Determination (ARD) in the linear regression setting. The model is defined by:

$$p(\mathbf{Y} \mid \mathbf{X}, \mathbf{w}, \beta) = \mathcal{N}(\mathbf{Y} \mid \mathbf{X}^T\mathbf{w},\, \beta^{-1}\mathbf{I}_N), \qquad p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0},\, \mathbf{A}^{-1}), \qquad \mathbf{A} = \operatorname{diag}(\alpha_1, \ldots, \alpha_D).$$

Here the latent variable is $\mathbf{w}$, and it is the hyperparameters $\mathbf{A}$ (and $\beta$) that require estimation.

In the E-step, we compute the posterior of $\mathbf{w}$ conditioned on $\mathbf{X}$ and $\mathbf{Y}$. Recall (4.125):

$$p(\mathbf{w} \mid \mathbf{Y}, \mathbf{X}, \beta, \mathbf{A}) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_E, \boldsymbol{\Sigma}_E),$$

where:

$$\boldsymbol{\Sigma}_E = \left(\mathbf{A} + \beta\,\mathbf{X}\mathbf{X}^T\right)^{-1}, \qquad \boldsymbol{\mu}_E = \beta\,\boldsymbol{\Sigma}_E\,\mathbf{X}\mathbf{Y}.$$
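As an illustration, the E-step can be sketched in NumPy. The data, shapes, and parameter values below are hypothetical, chosen only to exercise the formulas; $\mathbf{X}$ is taken as $D \times N$ so that $\mathbf{X}^T\mathbf{w}$ is an $N$-vector, matching the model above.

```python
import numpy as np

# Hypothetical synthetic data (not from the exercise).
rng = np.random.default_rng(0)
D, N = 3, 50
X = rng.normal(size=(D, N))          # design matrix, columns are inputs
w_true = np.array([1.0, 0.0, -2.0])  # assumed ground-truth weights
beta = 4.0                           # assumed known noise precision
Y = X.T @ w_true + rng.normal(scale=beta ** -0.5, size=N)

alpha = np.ones(D)                   # current hyperparameters, A = diag(alpha)
A = np.diag(alpha)

# E-step: posterior covariance and mean of w
Sigma_E = np.linalg.inv(A + beta * X @ X.T)
mu_E = beta * Sigma_E @ X @ Y
```

Note that $\boldsymbol{\mu}_E = \beta(\mathbf{A} + \beta\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y} = (\mathbf{A}/\beta + \mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}$, i.e. a ridge-regression-style solution.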

We are now ready to write down the complete-data likelihood:

$$p(\mathbf{Y}, \mathbf{w} \mid \mathbf{X}, \beta, \mathbf{A}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \mathbf{A}^{-1})\,\mathcal{N}(\mathbf{Y} \mid \mathbf{X}^T\mathbf{w}, \beta^{-1}\mathbf{I}_N) \propto |\mathbf{A}|^{\frac{1}{2}}\,\beta^{\frac{N}{2}}\exp\left\{-\frac{\beta}{2}(\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w}) - \frac{1}{2}\mathbf{w}^T\mathbf{A}\mathbf{w}\right\}.$$

Hence the auxiliary function is (writing $\boldsymbol{\theta} = (\beta, \mathbf{A})$, with the expectation taken under $p(\mathbf{w} \mid \mathbf{Y}, \mathbf{X}, \boldsymbol{\theta}^{\text{old}})$):

$$\begin{aligned}
Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) &= \mathbb{E}\left[\frac{1}{2}\log|\mathbf{A}| + \frac{N}{2}\log\beta - \frac{\beta}{2}(\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w}) - \frac{1}{2}\mathbf{w}^T\mathbf{A}\mathbf{w}\right] \\
&= \frac{1}{2}\log|\mathbf{A}| + \frac{N}{2}\log\beta - \frac{\beta}{2}\mathbb{E}\left[(\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w})\right] - \frac{1}{2}\mathbb{E}\left[\mathbf{w}^T\mathbf{A}\mathbf{w}\right].
\end{aligned}$$

The dependence of $Q$ on $\mathbf{A}$ is through (using $\mathbb{E}[\mathbf{w}^T\mathbf{A}\mathbf{w}] = \operatorname{tr}(\mathbf{A}\,\mathbb{E}[\mathbf{w}\mathbf{w}^T])$):

$$\frac{1}{2}\log|\mathbf{A}| - \frac{1}{2}\operatorname{tr}\left(\mathbf{A}\,\mathbb{E}[\mathbf{w}\mathbf{w}^T]\right) = \frac{1}{2}\log|\mathbf{A}| - \frac{1}{2}\operatorname{tr}\left(\mathbf{A}\left(\boldsymbol{\Sigma}_E + \boldsymbol{\mu}_E\boldsymbol{\mu}_E^T\right)\right).$$

Hence, since $\log|\mathbf{A}| = \sum_j \log\alpha_j$:

$$\frac{\partial Q}{\partial \alpha_j} = \frac{1}{2\alpha_j} - \frac{1}{2}\left(\boldsymbol{\Sigma}_E + \boldsymbol{\mu}_E\boldsymbol{\mu}_E^T\right)_{jj}.$$

Setting this to zero gives $\alpha_j = 1 / (\boldsymbol{\Sigma}_E + \boldsymbol{\mu}_E\boldsymbol{\mu}_E^T)_{jj}$; with a conjugate prior on $\alpha_j$ this yields (13.166).
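Numerically, the resulting $\alpha$ update is a one-liner. The posterior moments below are made-up values for a $D = 2$ illustration:

```python
import numpy as np

# Hypothetical posterior moments for a D = 2 example.
Sigma_E = np.array([[0.5, 0.1],
                    [0.1, 0.25]])
mu_E = np.array([2.0, 0.0])

# M-step for alpha: alpha_j = 1 / (Sigma_E + mu_E mu_E^T)_{jj}
second_moment = Sigma_E + np.outer(mu_E, mu_E)   # E[w w^T]
alpha_new = 1.0 / np.diag(second_moment)
```

The weight with large posterior mean gets a small precision ($\alpha_1 = 1/4.5$), while the weight concentrated near zero gets a large one ($\alpha_2 = 4$), which is the ARD pruning effect.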

Finally, the dependence of Q on β is through:

$$\frac{N}{2}\log\beta - \frac{\beta}{2}\mathbb{E}\left[(\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w})\right],$$

where:

$$\begin{aligned}
\mathbb{E}\left[(\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w})\right] &= \mathbf{Y}^T\mathbf{Y} - 2\,\mathbb{E}[\mathbf{w}]^T\mathbf{X}\mathbf{Y} + \operatorname{tr}\left(\mathbf{X}\mathbf{X}^T\,\mathbb{E}[\mathbf{w}\mathbf{w}^T]\right) \\
&= \mathbf{Y}^T\mathbf{Y} - 2\boldsymbol{\mu}_E^T\mathbf{X}\mathbf{Y} + \operatorname{tr}\left(\mathbf{X}\mathbf{X}^T\left[\boldsymbol{\Sigma}_E + \boldsymbol{\mu}_E\boldsymbol{\mu}_E^T\right]\right).
\end{aligned}$$

Hence, setting the derivative with respect to $\beta$ to zero, we have:

$$\beta = \frac{N}{\mathbf{Y}^T\mathbf{Y} - 2\boldsymbol{\mu}_E^T\mathbf{X}\mathbf{Y} + \operatorname{tr}\left(\mathbf{X}\mathbf{X}^T\left[\boldsymbol{\Sigma}_E + \boldsymbol{\mu}_E\boldsymbol{\mu}_E^T\right]\right)}.$$
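A quick NumPy sketch of this $\beta$ update, with hypothetical inputs. The identity $\mathbb{E}[\cdot] = \lVert\mathbf{Y} - \mathbf{X}^T\boldsymbol{\mu}_E\rVert^2 + \operatorname{tr}(\mathbf{X}\mathbf{X}^T\boldsymbol{\Sigma}_E)$ offers a handy sanity check on the expansion:

```python
import numpy as np

# Hypothetical inputs (not from the exercise).
rng = np.random.default_rng(1)
D, N = 2, 20
X = rng.normal(size=(D, N))
Y = rng.normal(size=N)
Sigma_E = 0.1 * np.eye(D)        # assumed posterior covariance
mu_E = np.array([0.5, -0.3])     # assumed posterior mean

# Expected sum of squared errors, expanded as in the text
second_moment = Sigma_E + np.outer(mu_E, mu_E)
expected_sse = Y @ Y - 2.0 * mu_E @ X @ Y + np.trace(X @ X.T @ second_moment)
beta_new = N / expected_sse
```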

Adding a Gamma prior on $\beta$ results in (13.168).
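Putting the E- and M-steps together, a minimal EM loop for ARD might look like the following sketch. The function name `em_ard` and the synthetic data are assumptions, not from the text, and the plain maximum-likelihood updates derived above are used, without the conjugate priors:

```python
import numpy as np

def em_ard(X, Y, n_iter=50):
    """EM for ARD linear regression (sketch). X: (D, N), Y: (N,)."""
    D, N = X.shape
    alpha = np.ones(D)   # initial hyperparameters, A = diag(alpha)
    beta = 1.0           # initial noise precision
    for _ in range(n_iter):
        # E-step: posterior over w given current alpha, beta
        Sigma = np.linalg.inv(np.diag(alpha) + beta * X @ X.T)
        mu = beta * Sigma @ X @ Y
        # M-step: update alpha and beta
        second_moment = Sigma + np.outer(mu, mu)     # E[w w^T]
        alpha = 1.0 / np.diag(second_moment)
        sse = Y @ Y - 2.0 * mu @ X @ Y + np.trace(X @ X.T @ second_moment)
        beta = N / sse
    return mu, Sigma, alpha, beta

# Synthetic data with one irrelevant feature (w_true[1] = 0)
rng = np.random.default_rng(42)
D, N = 3, 100
X = rng.normal(size=(D, N))
w_true = np.array([2.0, 0.0, -1.0])
Y = X.T @ w_true + rng.normal(scale=0.5, size=N)

mu, Sigma, alpha, beta = em_ard(X, Y)
```

On data like this, ARD should drive $\alpha$ for the irrelevant second feature much higher than for the others, effectively pruning it from the model.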
