Solution manual for Kevin P. Murphy, Machine Learning: a Probabilistic Perspective

Exercise 2.15 - MLE minimizes KL divergence to the empirical distribution

Answers

Expand the KL divergence:

$$
\begin{aligned}
\hat{\theta}
&= \arg\min_{\theta} \, \mathbb{KL}\big(p_{\text{emp}} \,\|\, q(\theta)\big) \\
&= \arg\min_{\theta} \, \mathbb{E}_{p_{\text{emp}}}\!\left[ \log \frac{p_{\text{emp}}}{q(\theta)} \right] \\
&= \arg\min_{\theta} \left\{ -\mathbb{H}(p_{\text{emp}}) - \frac{1}{N} \sum_{\mathbf{x} \in \text{dataset}} \log q(\mathbf{x}; \theta) \right\} \\
&= \arg\max_{\theta} \, \sum_{\mathbf{x} \in \text{dataset}} \log q(\mathbf{x}; \theta).
\end{aligned}
$$

In the third step the expectation under $p_{\text{emp}}$ reduces to an average over the dataset, since the empirical distribution places mass $1/N$ on each data point. In the last step we drop the entropy of the empirical distribution, which is independent of $\theta$. The remaining objective is exactly the log-likelihood, so minimizing the forward KL divergence to $p_{\text{emp}}$ recovers the MLE. The other direction of optimization, $\arg\min_{\theta} \, \mathbb{KL}\big(q(\theta) \,\|\, p_{\text{emp}}\big)$, contains an expectation with respect to $q(\theta)$ itself and is harder to solve.
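The equivalence can be checked numerically. The sketch below (my own illustration, not part of the original solution) fits a Bernoulli model to a small hypothetical coin-flip dataset two ways: by minimizing $\mathbb{KL}(p_{\text{emp}} \,\|\, q(\theta))$ directly, and by minimizing the negative average log-likelihood. Both objectives are minimized over the same grid of $\theta$ values and should land on the sample mean, the Bernoulli MLE.

```python
import math

# Hypothetical dataset: 7 heads (1) and 3 tails (0), so the empirical
# distribution puts mass 0.7 on 1 and 0.3 on 0.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
n = len(data)
p_emp = {1: data.count(1) / n, 0: data.count(0) / n}

def kl_to_model(theta):
    """KL(p_emp || Bernoulli(theta)) = sum_x p_emp(x) * log(p_emp(x) / q(x))."""
    q = {1: theta, 0: 1.0 - theta}
    return sum(p * math.log(p / q[x]) for x, p in p_emp.items() if p > 0)

def neg_avg_log_lik(theta):
    """Negative average log-likelihood of the data under Bernoulli(theta)."""
    return -sum(math.log(theta if x == 1 else 1.0 - theta) for x in data) / n

# Minimize both objectives over a fine grid of theta values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_kl = min(grid, key=kl_to_model)
theta_mle = min(grid, key=neg_avg_log_lik)

print(theta_kl, theta_mle)  # both land on 0.7, the sample mean
```

The two objectives differ only by the constant $-\mathbb{H}(p_{\text{emp}})$, so their minimizers coincide; the grid search is just a simple way to avoid assuming any optimizer.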

2021-03-24 13:42