Solution manual for Kevin P. Murphy, Machine Learning: a Probabilistic Perspective

Exercise 2.15 - MLE minimizes KL divergence to the empirical distribution

Answers

Expand the KL divergence:

$$
\begin{aligned}
\hat{\theta}
&= \arg\min_{\theta} \, \mathbb{KL}\big(p_{\text{emp}} \,\|\, q(\theta)\big) \\
&= \arg\min_{\theta} \, \mathbb{E}_{p_{\text{emp}}}\!\left[ \log \frac{p_{\text{emp}}}{q(\theta)} \right] \\
&= \arg\min_{\theta} \left\{ -\mathbb{H}(p_{\text{emp}}) - \frac{1}{N} \sum_{\mathbf{x} \in \text{dataset}} \log q(\mathbf{x}; \theta) \right\} \\
&= \arg\max_{\theta} \, \sum_{\mathbf{x} \in \text{dataset}} \log q(\mathbf{x}; \theta).
\end{aligned}
$$

In the third step the expectation under $p_{\text{emp}}$ reduces to an average over the dataset, since the empirical distribution places mass $1/N$ on each data point. In the last step we drop the entropy of the empirical distribution, which is independent of $\theta$. The remaining objective is exactly the log-likelihood, so minimizing the forward KL divergence to $p_{\text{emp}}$ recovers the MLE. The other direction of optimization, $\arg\min_{\theta} \, \mathbb{KL}\big(q(\theta) \,\|\, p_{\text{emp}}\big)$, contains an expectation with respect to $q(\theta)$ itself and is harder to solve.
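The equivalence can be checked numerically. The sketch below (my own illustration, not part of the original solution) fits a Bernoulli model to a small hypothetical coin-flip dataset two ways: by minimizing $\mathbb{KL}(p_{\text{emp}} \,\|\, q(\theta))$ directly, and by minimizing the negative average log-likelihood. Both objectives are minimized over the same grid of $\theta$ values and should land on the sample mean, the Bernoulli MLE.

```python
import math

# Hypothetical dataset: 7 heads (1) and 3 tails (0), so the empirical
# distribution puts mass 0.7 on 1 and 0.3 on 0.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
n = len(data)
p_emp = {1: data.count(1) / n, 0: data.count(0) / n}

def kl_to_model(theta):
    """KL(p_emp || Bernoulli(theta)) = sum_x p_emp(x) * log(p_emp(x) / q(x))."""
    q = {1: theta, 0: 1.0 - theta}
    return sum(p * math.log(p / q[x]) for x, p in p_emp.items() if p > 0)

def neg_avg_log_lik(theta):
    """Negative average log-likelihood of the data under Bernoulli(theta)."""
    return -sum(math.log(theta if x == 1 else 1.0 - theta) for x in data) / n

# Minimize both objectives over a fine grid of theta values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_kl = min(grid, key=kl_to_model)
theta_mle = min(grid, key=neg_avg_log_lik)

print(theta_kl, theta_mle)  # both land on 0.7, the sample mean
```

The two objectives differ only by the constant $-\mathbb{H}(p_{\text{emp}})$, so their minimizers coincide; the grid search is just a simple way to avoid assuming any optimizer.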

2021-03-24 13:42