Exercise 21.7 - Forwards vs reverse KL divergence
Answers
To formulate $\mathrm{KL}(p\|q)$ with $q(x,y) = q(x)\,q(y)$, we have:

$$
\begin{aligned}
\mathrm{KL}(p\|q) &= \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{q(x)\,q(y)} \\
&= \sum_{x,y} p(x,y)\,\log p(x,y) \;-\; \sum_{x} p(x)\,\log q(x) \;-\; \sum_{y} p(y)\,\log q(y) \\
&= -\mathbb{H}(p(x,y)) + \mathbb{H}(p(x)) + \mathrm{KL}(p(x)\,\|\,q(x)) + \mathbb{H}(p(y)) + \mathrm{KL}(p(y)\,\|\,q(y)).
\end{aligned}
$$

Only the two KL terms depend on $q$, so the non-negativity of the KL divergence implies that we should set $q(x) = p(x)$ and $q(y) = p(y)$: the best factored approximation under the forward KL is the product of the marginals.
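As a quick numerical sanity check (an addition, not part of the original answer), the sketch below assumes NumPy and the $4\times 4$ joint table from the exercise statement (rows index $y$, columns index $x$). It evaluates the forward KL at the product of marginals and at a few random factored distributions, which should never do better.

```python
import numpy as np

# Joint distribution from the exercise (rows = y, columns = x).
p = np.array([
    [1/8, 1/8, 0,   0],
    [1/8, 1/8, 0,   0],
    [0,   0,   1/4, 0],
    [0,   0,   0,   1/4],
])

def forward_kl(qx, qy):
    """KL(p || qx*qy), using the convention 0 * log 0 = 0."""
    q = np.outer(qy, qx)          # q(x, y) = qx(x) * qy(y), laid out like p
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

px = p.sum(axis=0)  # marginal over x (column sums)
py = p.sum(axis=1)  # marginal over y (row sums)
print("KL(p||q) at the product of marginals:", forward_kl(px, py))

# Random factored candidates should all give a larger forward KL.
rng = np.random.default_rng(0)
for _ in range(5):
    qx = rng.dirichlet(np.ones(4))
    qy = rng.dirichlet(np.ones(4))
    print("random factored q:", forward_kl(qx, qy))
```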
For the tabular case, let $q(x,y) = q(x)\,q(y)$ be the variational distribution s.t.:

$$
q(x) = (a_1, a_2, a_3, a_4), \qquad q(y) = (b_1, b_2, b_3, b_4), \qquad \sum_i a_i = \sum_j b_j = 1,
$$

with six free parameters: $a_1$, $a_2$, $a_3$, $b_1$, $b_2$, and $b_3$ ($a_4$ and $b_4$ are then fixed by normalization). The reverse KL is:

$$
\mathrm{KL}(q\|p) = \sum_{x,y} q(x)\,q(y)\,\log\frac{q(x)\,q(y)}{p(x,y)}.
$$
Since the decomposition assumes independence between $x$ and $y$, we have:

$$
\mathrm{KL}(q\|p) = -\mathbb{H}(q(x)) - \mathbb{H}(q(y)) - \sum_{x,y} q(x)\,q(y)\,\log p(x,y).
$$
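As a small illustration (again an addition, not from the original solution), the sketch below checks numerically that the direct definition of $\mathrm{KL}(q\|p)$ agrees with the decomposition above for one factored $q$ whose support lies inside the support of $p$; it assumes NumPy and the same joint table.

```python
import numpy as np

# Joint distribution from the exercise (rows = y, columns = x).
p = np.array([
    [1/8, 1/8, 0,   0],
    [1/8, 1/8, 0,   0],
    [0,   0,   1/4, 0],
    [0,   0,   0,   1/4],
])

def entropy(v):
    v = v[v > 0]
    return -np.sum(v * np.log(v))

# A factored q supported inside the support of p, so log p(x, y) stays finite.
qx = np.array([0.3, 0.7, 0.0, 0.0])
qy = np.array([0.6, 0.4, 0.0, 0.0])
q = np.outer(qy, qx)              # q(x, y) = qx(x) * qy(y)
mask = q > 0

# Direct definition of the reverse KL ...
kl_direct = np.sum(q[mask] * np.log(q[mask] / p[mask]))
# ... and the decomposition -H(q(x)) - H(q(y)) - sum q(x) q(y) log p(x, y).
kl_decomp = -entropy(qx) - entropy(qy) - np.sum(q[mask] * np.log(p[mask]))

print(kl_direct, kl_decomp)       # the two values agree
```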
Note that the $\log p(x,y)$ term above is finite only if $q(x)\,q(y) = 0$ wherever $p(x,y) = 0$, so the product support of $q$ must lie inside one of the three blocks of the table: $\{1,2\}\times\{1,2\}$, $\{(3,3)\}$, or $\{(4,4)\}$. Meanwhile, the symmetry within the table indicates that the roles of $x=1$ and $x=2$ are interchangeable, so are $y=1$ and $y=2$, and so are the cells $(3,3)$ and $(4,4)$. On the block $\{1,2\}\times\{1,2\}$ we can write $q(x) = (\alpha, 1-\alpha, 0, 0)$ and $q(y) = (\beta, 1-\beta, 0, 0)$ with $\alpha$ and $\beta$ taking values between $0$ and $1$; since $p(x,y) = 1/8$ throughout this block, the objective reduces to

$$
\mathrm{KL}(q\|p) = -\mathbb{H}(q(x)) - \mathbb{H}(q(y)) + \log 8,
$$

and the concavity of the entropy forces $\alpha = \beta = 1/2$, so the KL takes the value

$$
\mathrm{KL}(q\|p) = -\log 2 - \log 2 + 3\log 2 = \log 2.
$$

The other two candidates place all the mass on a single cell, $q(x) = q(y) = (0,0,1,0)$ or $q(x) = q(y) = (0,0,0,1)$, each giving

$$
\mathrm{KL}(q\|p) = \log\frac{1}{1/4} = 2\log 2.
$$

Hence the reverse KL has three distinct local minima (the global one on the $\{1,2\}\times\{1,2\}$ block). A unique minimum is not guaranteed because the function to be minimized is the sum of a convex negative-entropy term and a bilinear cross term, and is therefore not jointly convex in $(q(x), q(y))$.
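Finally, a short sketch (again an addition, under the same assumptions as the snippets above) that evaluates the reverse KL at the three candidate minima, recovering $\log 2$ for the block solution and $2\log 2$ for the two single-cell solutions.

```python
import numpy as np

# Joint distribution from the exercise (rows = y, columns = x).
p = np.array([
    [1/8, 1/8, 0,   0],
    [1/8, 1/8, 0,   0],
    [0,   0,   1/4, 0],
    [0,   0,   0,   1/4],
])

def reverse_kl(qx, qy):
    """KL(qx*qy || p), using the convention 0 * log 0 = 0."""
    q = np.outer(qy, qx)                  # q(x, y) = qx(x) * qy(y)
    mask = q > 0
    if np.any(p[mask] == 0):              # q puts mass where p has none
        return np.inf
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

candidates = {
    "block {1,2}x{1,2}": (np.array([0.5, 0.5, 0, 0]), np.array([0.5, 0.5, 0, 0])),
    "point (3,3)":       (np.array([0, 0, 1, 0]),     np.array([0, 0, 1, 0])),
    "point (4,4)":       (np.array([0, 0, 0, 1]),     np.array([0, 0, 0, 1])),
}
for name, (qx, qy) in candidates.items():
    print(name, reverse_kl(qx, qy))   # log 2 ~ 0.693, then 2 log 2 ~ 1.386 twice
```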