Exercise 21.10 - VB for binary FA with probit link

Answers

To approach this question, we assume the following mean-field variational distribution:

$$q(\mathbf{X}, \mathbf{Z}, \mathbf{W}) = \left(\prod_{l=1}^{L} q(\mathbf{w}_l)\right)\left(\prod_{i=1}^{N} q(\mathbf{x}_i)\, q(z_i)\right).$$

Since we have three independent families of variables to estimate, this pseudo-VB procedure consists of three update steps (rather than simply an E step and an M step). We assume $L = 1$ w.l.o.g.

In the first step, we update $q(\mathbf{x}_i)$. This is done by collecting the terms relevant to $\mathbf{x}_i$ from the expectation of the log joint w.r.t. $q(\mathbf{w}, \mathbf{Z})$:

$$\begin{aligned}
\mathbb{E}_{q(\mathbf{w}, \mathbf{Z})}\left[\log p(\mathbf{x}_i, \mathbf{w}, z_i, y_i)\right]
&= \mathbb{E}_{q(\mathbf{w})\, q(z_i)}\left[\log p(\mathbf{x}_i) + \log p(z_i \mid \mathbf{x}_i, \mathbf{w})\right] + \mathrm{const} \\
&= \mathbb{E}_{q(\mathbf{w})\, q(z_i)}\left[\log p(\mathbf{x}_i) - \frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] + \mathrm{const}.
\end{aligned}$$
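
Expanding the quadratic term under the expectation makes the dependence on $\mathbf{x}_i$ explicit:

$$\mathbb{E}_{q(\mathbf{w})\, q(z_i)}\left[-\frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] = -\frac{1}{2}\, \mathbf{x}_i^T\, \mathbb{E}_{q(\mathbf{w})}\!\left[\mathbf{w}\mathbf{w}^T\right] \mathbf{x}_i + \mathbb{E}[z_i]\, \mathbb{E}[\mathbf{w}]^T \mathbf{x}_i + \mathrm{const}.$$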

Since this is quadratic in $\mathbf{x}_i$ (and $\log p(\mathbf{x}_i)$ is Gaussian), we observe that $q(\mathbf{x}_i)$ takes the form of a Gaussian. Writing the prior as $p(\mathbf{x}_i) = \mathcal{N}\!\left(\mathbf{x}_i \mid \boldsymbol{\mu}_{\mathbf{x}_i}^{0}, (\Lambda_{\mathbf{x}_i}^{0})^{-1}\right)$, the update for its precision is:

$$\Lambda_{\mathbf{x}_i} \leftarrow \Lambda_{\mathbf{x}_i}^{0} + \mathbb{E}_{q(\mathbf{w})}\left[\mathbf{w}\mathbf{w}^T\right].$$

For its mean:

$$\boldsymbol{\mu}_{\mathbf{x}_i} \leftarrow \Lambda_{\mathbf{x}_i}^{-1}\left(\Lambda_{\mathbf{x}_i}^{0} \boldsymbol{\mu}_{\mathbf{x}_i}^{0} + \mathbb{E}[\mathbf{w}]\, \mathbb{E}[z_i]\right).$$
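
As a concrete illustration, here is a minimal NumPy sketch of this update. The standard-normal prior $p(\mathbf{x}_i) = \mathcal{N}(\mathbf{0}, \mathbf{I})$ used as the default, and all function and variable names, are illustrative assumptions rather than part of the exercise.

```python
import numpy as np

def update_qx(E_w, E_wwT, E_z_i, Lambda0=None, mu0=None):
    """One coordinate-ascent update of q(x_i) = N(mu, Lambda^{-1}).

    E_w    : (d,)   mean of w under q(w)
    E_wwT  : (d, d) second moment E[w w^T] under q(w)
    E_z_i  : float  mean of the latent utility z_i under q(z_i)
    Lambda0, mu0 : precision and mean of the prior p(x_i)
                   (defaults assume a standard-normal prior).
    """
    d = E_w.shape[0]
    if Lambda0 is None:
        Lambda0 = np.eye(d)        # assumed prior precision
    if mu0 is None:
        mu0 = np.zeros(d)          # assumed prior mean

    Lambda = Lambda0 + E_wwT                                   # precision update
    mu = np.linalg.solve(Lambda, Lambda0 @ mu0 + E_w * E_z_i)  # mean update
    return mu, Lambda
```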

In the second step, we update $q(\mathbf{w})$. Note that:

$$\begin{aligned}
\mathbb{E}_{q(\mathbf{X}, \mathbf{Z})}\left[\log p(\mathbf{X}, \mathbf{w}, \mathbf{Z}, \mathbf{Y})\right]
&= \mathbb{E}_{q(\mathbf{X})\, q(\mathbf{Z})}\left[\log p(\mathbf{w}) + \sum_i \log p(z_i \mid \mathbf{x}_i, \mathbf{w})\right] + \mathrm{const} \\
&= \mathbb{E}_{q(\mathbf{X})\, q(\mathbf{Z})}\left[\log p(\mathbf{w}) - \sum_i \frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] + \mathrm{const}.
\end{aligned}$$

Therefore the variational distribution for $\mathbf{w}$ is likewise a Gaussian. With prior $p(\mathbf{w}) = \mathcal{N}\!\left(\mathbf{w} \mid \boldsymbol{\mu}_{\mathbf{w}}^{0}, (\Lambda_{\mathbf{w}}^{0})^{-1}\right)$, its updates take the form:

$$\Lambda_{\mathbf{w}} \leftarrow \Lambda_{\mathbf{w}}^{0} + \sum_i \mathbb{E}\!\left[\mathbf{x}_i \mathbf{x}_i^T\right],$$

$$\boldsymbol{\mu}_{\mathbf{w}} \leftarrow \Lambda_{\mathbf{w}}^{-1}\left(\Lambda_{\mathbf{w}}^{0} \boldsymbol{\mu}_{\mathbf{w}}^{0} + \sum_i \mathbb{E}[z_i]\, \mathbb{E}[\mathbf{x}_i]\right).$$
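
Under the same assumptions, a sketch of the corresponding update for $q(\mathbf{w})$ might look as follows (again, names and the default prior are illustrative):

```python
def update_qw(E_X, E_XXT, E_z, Lambda0=None, mu0=None):
    """Update q(w) = N(mu_w, Lambda_w^{-1}) from the moments of all q(x_i), q(z_i).

    E_X   : (N, d)    rows are E[x_i]
    E_XXT : (N, d, d) stack of second moments E[x_i x_i^T]
    E_z   : (N,)      means E[z_i]
    """
    d = E_X.shape[1]
    if Lambda0 is None:
        Lambda0 = np.eye(d)        # assumed prior precision of w
    if mu0 is None:
        mu0 = np.zeros(d)          # assumed prior mean of w

    Lambda = Lambda0 + E_XXT.sum(axis=0)                       # precision update
    mu = np.linalg.solve(Lambda, Lambda0 @ mu0 + E_X.T @ E_z)  # mean update
    return mu, Lambda
```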

Finally, we update $q(z_i)$:

$$\mathbb{E}_{q(\mathbf{X}, \mathbf{w})}\left[\log p(\mathbf{x}_i, \mathbf{w}, z_i, y_i)\right] = \mathbb{E}_{q(\mathbf{x}_i)\, q(\mathbf{w})}\left[-\frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] + \mathrm{const}.$$

Moreover, the support of $z_i$ is restricted to $(0, \infty)$ if $y_i = 1$ and to $(-\infty, 0]$ otherwise. The variational distribution for $z_i$ is therefore truncated, yet its log density is quadratic in $z_i$, so it is a truncated Gaussian, as was to be shown. The variance of the underlying (untruncated) Gaussian is one, and its mean is:

$$\mathbb{E}[\mathbf{w}]^T\, \mathbb{E}[\mathbf{x}_i].$$
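
The required mean $\mathbb{E}[z_i]$ of the truncated Gaussian has a standard closed form in terms of the normal pdf $\varphi$ and cdf $\Phi$; a small SciPy-based sketch (names are illustrative, and no numerical safeguards are included):

```python
from scipy.stats import norm

def update_qz_mean(m_i, y_i):
    """Mean of q(z_i): a N(m_i, 1) Gaussian truncated to (0, inf) if y_i = 1
    and to (-inf, 0] if y_i = 0, where m_i = E[w]^T E[x_i]."""
    if y_i == 1:
        return m_i + norm.pdf(m_i) / norm.cdf(m_i)           # E[z | z > 0]
    else:
        return m_i - norm.pdf(m_i) / (1.0 - norm.cdf(m_i))   # E[z | z <= 0]
```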

Cycling through these three update steps yields a fixed-point (coordinate-ascent) learning procedure.
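
Putting the three updates together, a rough coordinate-ascent loop (reusing the helper sketches above, with the same illustrative assumptions and a crude initialization) could look like this:

```python
def vb_binary_fa(Y, d, n_iter=50):
    """Coordinate-ascent VB for binary FA with a probit link (L = 1).

    Y : (N,) array of binary observations y_i in {0, 1}
    d : dimensionality of the latent factors x_i (and of w)
    Returns the variational means of w and of the x_i.
    """
    N = Y.shape[0]
    rng = np.random.default_rng(0)
    E_X = np.zeros((N, d))
    E_XXT = np.tile(np.eye(d), (N, 1, 1))          # initial E[x_i x_i^T]
    E_w, Lambda_w = 0.1 * rng.standard_normal(d), np.eye(d)  # break symmetry
    E_z = np.where(Y == 1, 1.0, -1.0)              # crude initialization of E[z_i]

    for _ in range(n_iter):
        # Step 1: update every q(x_i).
        E_wwT = np.linalg.inv(Lambda_w) + np.outer(E_w, E_w)
        for i in range(N):
            mu, Lam = update_qx(E_w, E_wwT, E_z[i])
            E_X[i] = mu
            E_XXT[i] = np.linalg.inv(Lam) + np.outer(mu, mu)
        # Step 2: update q(w).
        E_w, Lambda_w = update_qw(E_X, E_XXT, E_z)
        # Step 3: update every q(z_i).
        m = E_X @ E_w
        E_z = np.array([update_qz_mean(m[i], Y[i]) for i in range(N)])

    return E_w, E_X
```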
