Exercise 21.7 - Forwards vs reverse KL divergence

Answers

To minimize the forward KL $\mathbb{KL}(p \,\|\, q)$ over factorized $q(x, y) = q(x)\,q(y)$, we expand:

$$\begin{aligned}
\mathbb{KL}(p(x, y) \,\|\, q(x, y)) &= \mathbb{E}_{p(x, y)}\left[\log \frac{p(x, y)}{q(x, y)}\right] \\
&= \sum_{x, y} p(x, y) \log p(x, y) - \sum_{x, y} p(x, y) \log q(x) - \sum_{x, y} p(x, y) \log q(y) \\
&= \sum_{x, y} p(x, y) \log p(x, y) - \sum_{x} \Big( \sum_{y} p(x, y) \Big) \log q(x) - \sum_{y} \Big( \sum_{x} p(x, y) \Big) \log q(y) \\
&= \mathrm{const} + \mathbb{KL}(p(x) \,\|\, q(x)) + \mathbb{KL}(p(y) \,\|\, q(y)).
\end{aligned}$$

Since each KL term is non-negative and vanishes exactly when its two arguments coincide, the forward KL is minimized by matching the marginals: $q(x) = p(x)$ and $q(y) = p(y)$.
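As a quick numerical sanity check of this conclusion, the sketch below compares the forward KL at the matched marginals against random factorized candidates. The concrete 4×4 table for $p$ is an assumption (the exercise's joint distribution is not reproduced in this excerpt); the check goes through for any joint.

```python
import numpy as np

# Hypothetical joint p(x, y); an assumed stand-in for the exercise's table.
p = np.array([
    [1/8, 1/8, 0.0, 0.0],
    [1/8, 1/8, 0.0, 0.0],
    [0.0, 0.0, 1/4, 0.0],
    [0.0, 0.0, 0.0, 1/4],
])

def forward_kl(p, qx, qy):
    """KL(p || qx qy), summing only over cells where p > 0 (0 log 0 := 0)."""
    q = np.outer(qx, qy)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

px, py = p.sum(axis=1), p.sum(axis=0)  # marginals of p
kl_at_marginals = forward_kl(p, px, py)

# No random factorized q should beat the matched marginals.
rng = np.random.default_rng(0)
for _ in range(1000):
    qx, qy = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
    assert forward_kl(p, qx, qy) >= kl_at_marginals - 1e-12
print(f"forward KL at marginals: {kl_at_marginals:.4f} nats")
```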

For the tabular case, let $q = \{q_{ij}\}_{i, j = 1}^{4}$ be the variational distribution such that:

$$q(x, y) = q_x(x)\, q_y(y),$$

with six free parameters: $q_x(1)$, $q_x(2)$, $q_x(3)$, $q_y(1)$, $q_y(2)$, and $q_y(3)$ ($q_x(4)$ and $q_y(4)$ are then fixed by normalization). The reverse KL is:

$$\mathbb{KL}(q \,\|\, p) = -\sum_{x, y} q(x, y) \log p(x, y) - \mathbb{H}_q(x, y),$$

where $\mathbb{H}_q(x, y)$ is the joint entropy of $q$.

Since $q$ factorizes, $x$ and $y$ are independent under $q$, so the entropy decomposes:

$$\mathbb{H}_q(x, y) = \mathbb{H}_q(x) + \mathbb{H}_q(y).$$
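This additivity is easy to sanity-check numerically; a minimal sketch with hypothetical marginals:

```python
import numpy as np

def entropy(dist):
    d = dist[dist > 0]                   # drop zero cells (0 log 0 := 0)
    return float(-np.sum(d * np.log(d)))

qx = np.array([0.1, 0.2, 0.3, 0.4])      # hypothetical marginals
qy = np.array([0.25, 0.25, 0.25, 0.25])
q = np.outer(qx, qy)                     # q(x, y) = qx(x) qy(y)
assert np.isclose(entropy(q.ravel()), entropy(qx) + entropy(qy))
```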

Meanwhile, the symmetry of the table means that the roles of $q_x(1)$ and $q_x(2)$ are interchangeable, as are those of $q_x(3)$ and $q_x(4)$, of $q_y(1)$ and $q_y(2)$, and of $q_y(3)$ and $q_y(4)$. This observation, together with the concavity of the entropy (and the Cauchy–Schwarz inequality), implies that $q_x$ takes the form:

$$\left( \alpha, \; \alpha, \; \tfrac{1}{2} - \alpha, \; \tfrac{1}{2} - \alpha \right), \qquad \alpha \in \left[ 0, \tfrac{1}{2} \right].$$

The same form holds for $q_y$; hence minimizing the reverse KL reduces to:

$$\min_{\alpha \in [0, 1/2]} \left\{ 16\alpha^2 - 4\alpha + 1 - 2\,\mathbb{H}\!\left( \alpha, \alpha, \tfrac{1}{2} - \alpha, \tfrac{1}{2} - \alpha \right) \right\},$$

with logarithms taken base $2$.

The minimizing $\alpha$ lies between $\tfrac{1}{8}$ and $\tfrac{1}{4}$ (numerically, $\alpha \approx 0.2$). Within this one-parameter family the minimizer is in fact unique, since the objective is the sum of a convex quadratic and the convex negative-entropy term $-2\,\mathbb{H}$; over the full space of factorized $q$, however, the reverse KL is not convex and can have multiple local optima.
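A short grid search over $\alpha$ confirms the location of the restricted minimizer. The coefficients of the quadratic come from the reconstruction above and depend on the exercise's table, so treat this as a sketch under that assumption.

```python
import numpy as np

def H2(dist):
    """Entropy in bits, with the convention 0 log 0 = 0."""
    d = dist[dist > 0]
    return float(-np.sum(d * np.log2(d)))

def objective(a):
    qx = np.array([a, a, 0.5 - a, 0.5 - a])
    return 16 * a**2 - 4 * a + 1 - 2 * H2(qx)

alphas = np.linspace(1e-6, 0.5 - 1e-6, 100_001)
values = np.array([objective(a) for a in alphas])
print(f"argmin ~ {alphas[values.argmin()]:.4f}")  # ~0.1993, inside (1/8, 1/4)
```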
