Exercise 21.7 - Forwards vs reverse KL divergence

Answers

To minimize the forward KL $\mathbb{KL}(p \,\|\, q)$ over factorized $q(x, y) = q(x)\,q(y)$, we expand:

$$\begin{aligned}
\mathbb{KL}(p(x, y) \,\|\, q(x, y)) &= \mathbb{E}_{p(x, y)}\left[\log \frac{p(x, y)}{q(x, y)}\right] \\
&= \sum_{x, y} p(x, y) \log p(x, y) - \sum_{x, y} p(x, y) \log q(x) - \sum_{x, y} p(x, y) \log q(y) \\
&= \sum_{x, y} p(x, y) \log p(x, y) - \sum_{x} \Big( \sum_{y} p(x, y) \Big) \log q(x) - \sum_{y} \Big( \sum_{x} p(x, y) \Big) \log q(y) \\
&= \mathrm{const} + \mathbb{KL}(p(x) \,\|\, q(x)) + \mathbb{KL}(p(y) \,\|\, q(y)).
\end{aligned}$$

Since each KL term is non-negative and vanishes exactly when its two arguments coincide, the forward KL is minimized by matching the marginals: $q(x) = p(x)$ and $q(y) = p(y)$.
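As a quick numerical sanity check of this conclusion, the sketch below compares the forward KL at the matched marginals against random factorized candidates. The concrete 4×4 table for $p$ is an assumption (the exercise's joint distribution is not reproduced in this excerpt); the check goes through for any joint.

```python
import numpy as np

# Hypothetical joint p(x, y); an assumed stand-in for the exercise's table.
p = np.array([
    [1/8, 1/8, 0.0, 0.0],
    [1/8, 1/8, 0.0, 0.0],
    [0.0, 0.0, 1/4, 0.0],
    [0.0, 0.0, 0.0, 1/4],
])

def forward_kl(p, qx, qy):
    """KL(p || qx qy), summing only over cells where p > 0 (0 log 0 := 0)."""
    q = np.outer(qx, qy)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

px, py = p.sum(axis=1), p.sum(axis=0)  # marginals of p
kl_at_marginals = forward_kl(p, px, py)

# No random factorized q should beat the matched marginals.
rng = np.random.default_rng(0)
for _ in range(1000):
    qx, qy = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
    assert forward_kl(p, qx, qy) >= kl_at_marginals - 1e-12
print(f"forward KL at marginals: {kl_at_marginals:.4f} nats")
```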

For the tabular case, let $q = \{q_{ij}\}_{i, j = 1}^{4}$ be the variational distribution such that:

$$q(x, y) = q_x(x)\, q_y(y),$$

with six free parameters: $q_x(1)$, $q_x(2)$, $q_x(3)$, $q_y(1)$, $q_y(2)$, and $q_y(3)$ ($q_x(4)$ and $q_y(4)$ are then fixed by normalization). The reverse KL is:

$$\mathbb{KL}(q \,\|\, p) = -\sum_{x, y} q(x, y) \log p(x, y) - \mathbb{H}_q(x, y),$$

where $\mathbb{H}_q(x, y)$ is the joint entropy of $q$.

Since $q$ factorizes, $x$ and $y$ are independent under $q$, so the entropy decomposes:

$$\mathbb{H}_q(x, y) = \mathbb{H}_q(x) + \mathbb{H}_q(y).$$
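This additivity is easy to sanity-check numerically; a minimal sketch with hypothetical marginals:

```python
import numpy as np

def entropy(dist):
    d = dist[dist > 0]                   # drop zero cells (0 log 0 := 0)
    return float(-np.sum(d * np.log(d)))

qx = np.array([0.1, 0.2, 0.3, 0.4])      # hypothetical marginals
qy = np.array([0.25, 0.25, 0.25, 0.25])
q = np.outer(qx, qy)                     # q(x, y) = qx(x) qy(y)
assert np.isclose(entropy(q.ravel()), entropy(qx) + entropy(qy))
```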

Meanwhile, the symmetry of the table means that the roles of $q_x(1)$ and $q_x(2)$ are interchangeable, as are those of $q_x(3)$ and $q_x(4)$, of $q_y(1)$ and $q_y(2)$, and of $q_y(3)$ and $q_y(4)$. This observation, together with the concavity of the entropy (and the Cauchy–Schwarz inequality), implies that $q_x$ takes the form:

$$\left( \alpha, \; \alpha, \; \tfrac{1}{2} - \alpha, \; \tfrac{1}{2} - \alpha \right), \qquad \alpha \in \left[ 0, \tfrac{1}{2} \right].$$

The same form holds for $q_y$; hence minimizing the reverse KL reduces to:

$$\min_{\alpha \in [0, 1/2]} \left\{ 16\alpha^2 - 4\alpha + 1 - 2\,\mathbb{H}\!\left( \alpha, \alpha, \tfrac{1}{2} - \alpha, \tfrac{1}{2} - \alpha \right) \right\},$$

with logarithms taken base $2$.

The minimizing $\alpha$ lies between $\tfrac{1}{8}$ and $\tfrac{1}{4}$ (numerically, $\alpha \approx 0.2$). Within this one-parameter family the minimizer is in fact unique, since the objective is the sum of a convex quadratic and the convex negative-entropy term $-2\,\mathbb{H}$; over the full space of factorized $q$, however, the reverse KL is not convex and can have multiple local optima.
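A short grid search over $\alpha$ confirms the location of the restricted minimizer. The coefficients of the quadratic come from the reconstruction above and depend on the exercise's table, so treat this as a sketch under that assumption.

```python
import numpy as np

def H2(dist):
    """Entropy in bits, with the convention 0 log 0 = 0."""
    d = dist[dist > 0]
    return float(-np.sum(d * np.log2(d)))

def objective(a):
    qx = np.array([a, a, 0.5 - a, 0.5 - a])
    return 16 * a**2 - 4 * a + 1 - 2 * H2(qx)

alphas = np.linspace(1e-6, 0.5 - 1e-6, 100_001)
values = np.array([objective(a) for a in alphas])
print(f"argmin ~ {alphas[values.argmin()]:.4f}")  # ~0.1993, inside (1/8, 1/4)
```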
