Exercise 5.7 - Bayes model averaging helps predictive accuracy

Answers

Suppose the variable Δ is generated from a mixture of models, so:

$$p(\Delta \mid \mathcal{D}) = \sum_{m \in \mathcal{M}} p(\Delta \mid m, \mathcal{D}) \, p(m \mid \mathcal{D}).$$

The Bayes model averaging (BMA) predictive distribution is just:

$$p_{\mathrm{BMA}}(\Delta) = \sum_{m \in \mathcal{M}} p(\Delta \mid m, \mathcal{D}) \, p(m \mid \mathcal{D}),$$

while that of an individual model $m \in \mathcal{M}$ is:

$$p_m(\Delta) = p(\Delta \mid m, \mathcal{D}).$$

The expected loss of the BMA result is:

$$\mathbb{E}_{p(\Delta)}\left[-\log p_{\mathrm{BMA}}(\Delta)\right],$$

while that of model m is:

$$\mathbb{E}_{p(\Delta)}\left[-\log p_m(\Delta)\right].$$

Now it is easy to see that:

$$\mathbb{E}_{p(\Delta)}\left[-\log p_m(\Delta)\right] - \mathbb{E}_{p(\Delta)}\left[-\log p_{\mathrm{BMA}}(\Delta)\right] = \mathbb{E}_{p_{\mathrm{BMA}}(\Delta)}\!\left[\log \frac{p_{\mathrm{BMA}}(\Delta)}{p_m(\Delta)}\right] = \mathbb{KL}\left(p_{\mathrm{BMA}}(\Delta) \,\middle\|\, p_m(\Delta)\right),

since the distribution with respect to which the expectations are taken, $p(\Delta \mid \mathcal{D})$, is exactly $p_{\mathrm{BMA}}(\Delta)$. The non-negativity of the KL divergence then yields (5.127).
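
As a quick sanity check of the inequality, here is a minimal numerical sketch. The two candidate predictive distributions and the model weights below are made-up toy values, not part of the exercise; the script simply verifies that the gap between a single model's expected log loss and the BMA loss equals the corresponding KL divergence and is therefore non-negative.

```python
import numpy as np

# Toy illustration (hypothetical numbers): two candidate models over a
# discrete Delta with three values, plus made-up posterior model weights.
p_m1 = np.array([0.7, 0.2, 0.1])   # p(Delta | m=1, D)
p_m2 = np.array([0.1, 0.3, 0.6])   # p(Delta | m=2, D)
w    = np.array([0.4, 0.6])        # p(m | D), the BMA weights

# BMA predictive distribution: p_BMA(Delta) = sum_m p(Delta | m, D) p(m | D)
p_bma = w[0] * p_m1 + w[1] * p_m2

# Expected log loss E_{p(Delta)}[-log q(Delta)], where the true p(Delta)
# is the mixture p_BMA itself, as assumed in the exercise.
def expected_log_loss(q, p_true=p_bma):
    return -(p_true * np.log(q)).sum()

# KL divergence between two discrete distributions.
def kl(p, q):
    return (p * np.log(p / q)).sum()

loss_bma = expected_log_loss(p_bma)
for name, q in [("model 1", p_m1), ("model 2", p_m2)]:
    gap = expected_log_loss(q) - loss_bma
    # gap matches KL(p_BMA || p_m) and is >= 0, so BMA never does worse.
    print(f"{name}: gap = {gap:.4f}, KL = {kl(p_bma, q):.4f}")
```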

The conclusion from this exercise is of limited practical significance, since the true data-generating distribution is usually unknown and the expected loss cannot be evaluated exactly, even for the BMA mixture. Once the generating distribution is assumed to be exactly $p_{\mathrm{BMA}}(\Delta)$, it is immediate that any other predictive distribution incurs a higher expected log loss.
