Exercise 13.9 - EM for sparse probit regression with Laplace prior

Answers

Ordinary probit regression involves no latent variables. Introducing a Laplace prior for the linear weights $\mathbf{w}$ results in its lasso version. Since the Laplace distribution is a continuous mixture of Gaussians according to (13.86), we introduce a latent vector $\boldsymbol{\tau}^2$ with the same dimension as $\mathbf{w}$: for each component $w_d$ of $\mathbf{w}$ there is a corresponding latent variable $\tau_d^2$ that governs its variance. The PGM for this probit regression is the chain:

$$\gamma \;\to\; \boldsymbol{\tau}^2 \;\to\; \mathbf{w} \;\to\; \mathbf{y} \;\leftarrow\; \mathbf{X}.$$

The joint distribution is:

$$p(\boldsymbol{\tau}^2, \mathbf{w}, \mathbf{y} \mid \mathbf{X}, \gamma) = \left( \prod_{d=1}^{D} p(\tau_d^2 \mid \gamma)\, p(w_d \mid \tau_d^2) \right) \left( \prod_{n=1}^{N} \Phi(\mathbf{w}^T \mathbf{x}_n)^{y_n} \big(1 - \Phi(\mathbf{w}^T \mathbf{x}_n)\big)^{1 - y_n} \right),$$

where $\Phi$ is the c.d.f. of the standard Gaussian. According to (13.86):

$$p(\tau_d^2 \mid \gamma) = \mathrm{Ga}\!\left(\tau_d^2 \,\Big|\, 1, \frac{\gamma^2}{2}\right),$$

$$p(w_d \mid \tau_d^2) = \mathcal{N}(w_d \mid 0, \tau_d^2).$$
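The scale-mixture identity (13.86) can be sanity-checked numerically: sampling $\tau_d^2 \sim \mathrm{Ga}(1, \gamma^2/2)$ and then $w \sim \mathcal{N}(0, \tau_d^2)$ should reproduce the moments of $\mathrm{Laplace}(w \mid 0, 1/\gamma)$. A minimal Monte Carlo sketch (the seed, sample size, and value of $\gamma$ are arbitrary choices for illustration):

```python
import numpy as np

# Check (13.86): the Gaussian scale mixture
#   int N(w | 0, tau2) Ga(tau2 | 1, gamma^2/2) dtau2
# should equal Laplace(w | 0, 1/gamma).
rng = np.random.default_rng(0)
gam = 2.0
n = 400_000

# Ga(tau2 | shape=1, rate=gamma^2/2)  ->  numpy's scale = 2 / gamma^2
tau2 = rng.gamma(shape=1.0, scale=2.0 / gam**2, size=n)
w = rng.normal(0.0, np.sqrt(tau2))

# For Laplace(0, b) with b = 1/gamma:  E|w| = b,  E[w^2] = 2 b^2.
print(np.abs(w).mean(), 1.0 / gam)       # should be close
print(np.mean(w**2), 2.0 / gam**2)       # should be close
```

The only subtlety is the Gamma parameterization: (13.86) uses a rate $\gamma^2/2$, while `numpy` takes a scale, hence `scale=2/gam**2`.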

Hence:

$$p(\boldsymbol{\tau}^2, \mathbf{w}, \mathbf{y} \mid \mathbf{X}, \gamma) \propto \exp\left\{ -\frac{1}{2} \sum_{d=1}^{D} \left( \gamma^2 \tau_d^2 + \frac{w_d^2}{\tau_d^2} \right) \right\} \prod_{d=1}^{D} \frac{1}{\tau_d} \prod_{n=1}^{N} \Phi(\mathbf{w}^T \mathbf{x}_n)^{y_n} \big(1 - \Phi(\mathbf{w}^T \mathbf{x}_n)\big)^{1 - y_n}.$$
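This unnormalized log-joint translates directly into code. A sketch (the function name is mine; it uses $1 - \Phi(z) = \Phi(-z)$ and `logcdf` for numerical stability):

```python
import numpy as np
from scipy.stats import norm

def log_joint_unnorm(tau2, w, y, X, gam):
    """log p(tau2, w, y | X, gamma) up to an additive constant,
    following the proportionality above."""
    # exp{-1/2 sum(gamma^2 tau_d^2 + w_d^2 / tau_d^2)} * prod 1/tau_d
    prior = -0.5 * np.sum(gam**2 * tau2 + w**2 / tau2) - 0.5 * np.sum(np.log(tau2))
    # probit likelihood; 1 - Phi(z) = Phi(-z)
    z = X @ w
    lik = np.sum(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z))
    return prior + lik
```

Evaluating this alongside the exact Gamma and Gaussian log-densities should give a difference that is constant in $(\boldsymbol{\tau}^2, \mathbf{w})$, confirming the proportionality.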

To build the auxiliary function, we treat $\mathbf{w}$ as the parameter to be estimated and $\boldsymbol{\tau}^2$ as the latent variable, thus:

$$Q(\mathbf{w}, \mathbf{w}^{\text{old}}) = \mathbb{E}_{p(\boldsymbol{\tau}^2 \mid \mathbf{w}^{\text{old}}, \mathbf{y}, \mathbf{X}, \gamma)} \left[ \log p(\boldsymbol{\tau}^2, \mathbf{w}, \mathbf{y} \mid \mathbf{X}, \gamma) \right].$$

We now extract the terms involving $\mathbf{w}$ from $\log p(\boldsymbol{\tau}^2, \mathbf{w}, \mathbf{y} \mid \mathbf{X}, \gamma)$:

$$Q(\mathbf{w}, \mathbf{w}^{\text{old}}) = c - \frac{1}{2} \sum_{d=1}^{D} w_d^2\, \mathbb{E}\!\left[ \frac{1}{\tau_d^2} \,\Big|\, \mathbf{w}^{\text{old}} \right] + \sum_{n=1}^{N} y_n \log \Phi(\mathbf{w}^T \mathbf{x}_n) + (1 - y_n) \log\big(1 - \Phi(\mathbf{w}^T \mathbf{x}_n)\big).$$

Thus, for the E-step we only need to calculate the conditional expectation

$$\mathbb{E}\!\left[ \frac{1}{\tau_d^2} \,\Big|\, \mathbf{w}^{\text{old}} \right],$$

whose result is already given in Exercise 13.8. The M-step is the same as for probit regression with a Gaussian prior, and is hence omitted.
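The full EM loop can be sketched as follows. This is an illustrative implementation, not a definitive one: it assumes the Exercise 13.8 result $\mathbb{E}[1/\tau_d^2 \mid \mathbf{w}^{\text{old}}] = \gamma / |w_d^{\text{old}}|$ for the E-step, and uses a generic L-BFGS optimizer for the M-step (maximizing $Q$, i.e. the probit log-likelihood with a weighted ridge penalty); the function name and the small damping constant `eps` are my own choices:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def em_sparse_probit(X, y, gamma, n_iter=20, eps=1e-8):
    """EM for probit regression with a Laplace prior on w (sketch)."""
    N, D = X.shape
    w = np.full(D, 0.1)  # avoid |w_d| = 0 in the first E-step
    for _ in range(n_iter):
        # E-step: lam_d = E[1/tau_d^2 | w_old] = gamma / |w_d^old|
        # (result quoted from Exercise 13.8; eps guards against division by zero)
        lam = gamma / (np.abs(w) + eps)
        # M-step: maximize Q(w, w_old) =
        #   sum_n [y_n log Phi(w.x_n) + (1-y_n) log Phi(-w.x_n)]
        #   - 0.5 * sum_d lam_d w_d^2
        def neg_Q(v):
            z = X @ v
            ll = y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z)
            return -(ll.sum() - 0.5 * np.sum(lam * v**2))
        w = minimize(neg_Q, w, method="L-BFGS-B").x
    return w
```

On synthetic data with a sparse true weight vector, the penalty $\lambda_d = \gamma/|w_d^{\text{old}}|$ grows as a weight shrinks, so irrelevant components are driven toward zero across iterations, which is the sparsifying behavior the Laplace prior is meant to produce.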
