Exercise 21.10 - VB for binary FA with probit link

Answers

To approach this question, we assume the following mean-field variational distribution:

$$q(\mathbf{X}, \mathbf{Z}, \mathbf{W}) = \left(\prod_{l=1}^{L} q(\mathbf{w}_l)\right)\left(\prod_{i=1}^{N} q(\mathbf{x}_i)\, q(z_i)\right).$$

Since we have three independent families of variables to estimate, this pseudo-VB procedure consists of three update steps (rather than simply an E step and an M step). We assume $L = 1$ w.l.o.g.

In the first step, we update $q(\mathbf{x}_i)$. This is done by collecting the terms relevant to $\mathbf{x}_i$ from the expectation of the log joint w.r.t. $q(\mathbf{w}, \mathbf{Z})$:

$$\begin{aligned}
\mathbb{E}_{q(\mathbf{w}, \mathbf{Z})}\left[\log p(\mathbf{x}_i, \mathbf{w}, z_i, y_i)\right]
&= \mathbb{E}_{q(\mathbf{w})\, q(z_i)}\left[\log p(\mathbf{x}_i) + \log p(z_i \mid \mathbf{x}_i, \mathbf{w})\right] + \mathrm{const} \\
&= \mathbb{E}_{q(\mathbf{w})\, q(z_i)}\left[\log p(\mathbf{x}_i) - \frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] + \mathrm{const}.
\end{aligned}$$
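
Expanding the quadratic term under the expectation makes the dependence on $\mathbf{x}_i$ explicit:

$$\mathbb{E}_{q(\mathbf{w})\, q(z_i)}\left[-\frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] = -\frac{1}{2}\, \mathbf{x}_i^T\, \mathbb{E}_{q(\mathbf{w})}\!\left[\mathbf{w}\mathbf{w}^T\right] \mathbf{x}_i + \mathbb{E}[z_i]\, \mathbb{E}[\mathbf{w}]^T \mathbf{x}_i + \mathrm{const}.$$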

Since this is quadratic in $\mathbf{x}_i$ (and $\log p(\mathbf{x}_i)$ is Gaussian), we observe that $q(\mathbf{x}_i)$ takes the form of a Gaussian. Writing the prior as $p(\mathbf{x}_i) = \mathcal{N}\!\left(\mathbf{x}_i \mid \boldsymbol{\mu}_{\mathbf{x}_i}^{0}, (\Lambda_{\mathbf{x}_i}^{0})^{-1}\right)$, the update for its precision is:

$$\Lambda_{\mathbf{x}_i} \leftarrow \Lambda_{\mathbf{x}_i}^{0} + \mathbb{E}_{q(\mathbf{w})}\left[\mathbf{w}\mathbf{w}^T\right].$$

For its mean:

$$\boldsymbol{\mu}_{\mathbf{x}_i} \leftarrow \Lambda_{\mathbf{x}_i}^{-1}\left(\Lambda_{\mathbf{x}_i}^{0} \boldsymbol{\mu}_{\mathbf{x}_i}^{0} + \mathbb{E}[\mathbf{w}]\, \mathbb{E}[z_i]\right).$$
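
As a concrete illustration, here is a minimal NumPy sketch of this update. The standard-normal prior $p(\mathbf{x}_i) = \mathcal{N}(\mathbf{0}, \mathbf{I})$ used as the default, and all function and variable names, are illustrative assumptions rather than part of the exercise.

```python
import numpy as np

def update_qx(E_w, E_wwT, E_z_i, Lambda0=None, mu0=None):
    """One coordinate-ascent update of q(x_i) = N(mu, Lambda^{-1}).

    E_w    : (d,)   mean of w under q(w)
    E_wwT  : (d, d) second moment E[w w^T] under q(w)
    E_z_i  : float  mean of the latent utility z_i under q(z_i)
    Lambda0, mu0 : precision and mean of the prior p(x_i)
                   (defaults assume a standard-normal prior).
    """
    d = E_w.shape[0]
    if Lambda0 is None:
        Lambda0 = np.eye(d)        # assumed prior precision
    if mu0 is None:
        mu0 = np.zeros(d)          # assumed prior mean

    Lambda = Lambda0 + E_wwT                                   # precision update
    mu = np.linalg.solve(Lambda, Lambda0 @ mu0 + E_w * E_z_i)  # mean update
    return mu, Lambda
```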

In the second step, we update $q(\mathbf{w})$. Note that:

$$\begin{aligned}
\mathbb{E}_{q(\mathbf{X}, \mathbf{Z})}\left[\log p(\mathbf{X}, \mathbf{w}, \mathbf{Z}, \mathbf{Y})\right]
&= \mathbb{E}_{q(\mathbf{X})\, q(\mathbf{Z})}\left[\log p(\mathbf{w}) + \sum_i \log p(z_i \mid \mathbf{x}_i, \mathbf{w})\right] + \mathrm{const} \\
&= \mathbb{E}_{q(\mathbf{X})\, q(\mathbf{Z})}\left[\log p(\mathbf{w}) - \sum_i \frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] + \mathrm{const}.
\end{aligned}$$

Therefore the variational distribution for $\mathbf{w}$ is likewise a Gaussian. With prior $p(\mathbf{w}) = \mathcal{N}\!\left(\mathbf{w} \mid \boldsymbol{\mu}_{\mathbf{w}}^{0}, (\Lambda_{\mathbf{w}}^{0})^{-1}\right)$, its updates take the form:

$$\Lambda_{\mathbf{w}} \leftarrow \Lambda_{\mathbf{w}}^{0} + \sum_i \mathbb{E}\!\left[\mathbf{x}_i \mathbf{x}_i^T\right],$$

$$\boldsymbol{\mu}_{\mathbf{w}} \leftarrow \Lambda_{\mathbf{w}}^{-1}\left(\Lambda_{\mathbf{w}}^{0} \boldsymbol{\mu}_{\mathbf{w}}^{0} + \sum_i \mathbb{E}[z_i]\, \mathbb{E}[\mathbf{x}_i]\right).$$
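
Under the same assumptions, a sketch of the corresponding update for $q(\mathbf{w})$ might look as follows (again, names and the default prior are illustrative):

```python
def update_qw(E_X, E_XXT, E_z, Lambda0=None, mu0=None):
    """Update q(w) = N(mu_w, Lambda_w^{-1}) from the moments of all q(x_i), q(z_i).

    E_X   : (N, d)    rows are E[x_i]
    E_XXT : (N, d, d) stack of second moments E[x_i x_i^T]
    E_z   : (N,)      means E[z_i]
    """
    d = E_X.shape[1]
    if Lambda0 is None:
        Lambda0 = np.eye(d)        # assumed prior precision of w
    if mu0 is None:
        mu0 = np.zeros(d)          # assumed prior mean of w

    Lambda = Lambda0 + E_XXT.sum(axis=0)                       # precision update
    mu = np.linalg.solve(Lambda, Lambda0 @ mu0 + E_X.T @ E_z)  # mean update
    return mu, Lambda
```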

Finally, we update $q(z_i)$:

$$\mathbb{E}_{q(\mathbf{X}, \mathbf{w})}\left[\log p(\mathbf{x}_i, \mathbf{w}, z_i, y_i)\right] = \mathbb{E}_{q(\mathbf{x}_i)\, q(\mathbf{w})}\left[-\frac{(z_i - \mathbf{w}^T \mathbf{x}_i)^2}{2}\right] + \mathrm{const}.$$

Moreover, the support of $z_i$ is restricted to $(0, \infty)$ if $y_i = 1$ and to $(-\infty, 0]$ otherwise. The variational distribution for $z_i$ is therefore truncated, yet its log density is quadratic in $z_i$, so it is a truncated Gaussian, as was to be shown. The variance of the underlying (untruncated) Gaussian is one, and its mean is:

$$\mathbb{E}[\mathbf{w}]^T\, \mathbb{E}[\mathbf{x}_i].$$
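
The required mean $\mathbb{E}[z_i]$ of the truncated Gaussian has a standard closed form in terms of the normal pdf $\varphi$ and cdf $\Phi$; a small SciPy-based sketch (names are illustrative, and no numerical safeguards are included):

```python
from scipy.stats import norm

def update_qz_mean(m_i, y_i):
    """Mean of q(z_i): a N(m_i, 1) Gaussian truncated to (0, inf) if y_i = 1
    and to (-inf, 0] if y_i = 0, where m_i = E[w]^T E[x_i]."""
    if y_i == 1:
        return m_i + norm.pdf(m_i) / norm.cdf(m_i)           # E[z | z > 0]
    else:
        return m_i - norm.pdf(m_i) / (1.0 - norm.cdf(m_i))   # E[z | z <= 0]
```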

Cycling through these three update steps yields a fixed-point (coordinate-ascent) learning procedure.
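
Putting the three updates together, a rough coordinate-ascent loop (reusing the helper sketches above, with the same illustrative assumptions and a crude initialization) could look like this:

```python
def vb_binary_fa(Y, d, n_iter=50):
    """Coordinate-ascent VB for binary FA with a probit link (L = 1).

    Y : (N,) array of binary observations y_i in {0, 1}
    d : dimensionality of the latent factors x_i (and of w)
    Returns the variational means of w and of the x_i.
    """
    N = Y.shape[0]
    rng = np.random.default_rng(0)
    E_X = np.zeros((N, d))
    E_XXT = np.tile(np.eye(d), (N, 1, 1))          # initial E[x_i x_i^T]
    E_w, Lambda_w = 0.1 * rng.standard_normal(d), np.eye(d)  # break symmetry
    E_z = np.where(Y == 1, 1.0, -1.0)              # crude initialization of E[z_i]

    for _ in range(n_iter):
        # Step 1: update every q(x_i).
        E_wwT = np.linalg.inv(Lambda_w) + np.outer(E_w, E_w)
        for i in range(N):
            mu, Lam = update_qx(E_w, E_wwT, E_z[i])
            E_X[i] = mu
            E_XXT[i] = np.linalg.inv(Lam) + np.outer(mu, mu)
        # Step 2: update q(w).
        E_w, Lambda_w = update_qw(E_X, E_XXT, E_z)
        # Step 3: update every q(z_i).
        m = E_X @ E_w
        E_z = np.array([update_qz_mean(m[i], Y[i]) for i in range(N)])

    return E_w, E_X
```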
