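We are to optimize the following target w.r.t. $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$:

$$Q(\theta, \theta^{\text{old}}) = \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \log \pi_k + \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \log p(x_i \mid \theta_k),$$

where:

$$r_{ik} = p(z_i = k \mid x_i, \theta^{\text{old}}).$$

(Recall the graphical structure of the GMM: $z_i$ is the one-hot variable that encodes which mixture component the sample $x_i$ belongs to.) When the base distribution $p(x \mid \theta_k)$ is Gaussian, consider first the terms in $Q$ that involve $\mu_k$ and $\Sigma_k$ (adopting a non-informative prior):

$$\ell(\mu_k, \Sigma_k) = -\frac{1}{2} \sum_{i=1}^{N} r_{ik} \left[ \log |\Sigma_k| + (x_i - \mu_k)^T \Sigma_k^{-1} (x_i - \mu_k) \right].$$

Optimizing this target w.r.t. $\mu_k$ and $\Sigma_k$ is tantamount to optimizing the mean and covariance of a weighted Gaussian model, hence:

$$\frac{\partial \ell}{\partial \mu_k} = \sum_{i=1}^{N} r_{ik} \, \Sigma_k^{-1} (x_i - \mu_k).$$

Setting it to zero yields:

$$\mu_k = \frac{\sum_{i} r_{ik} x_i}{r_k}, \qquad r_k = \sum_{i=1}^{N} r_{ik}.$$

Finally:

$$\frac{\partial \ell}{\partial \Sigma_k} = -\frac{1}{2} \left( r_k \Sigma_k^{-1} - \Sigma_k^{-1} S_k \Sigma_k^{-1} \right),$$

where $S_k = \sum_{i} r_{ik} (x_i - \mu_k)(x_i - \mu_k)^T$. Setting it to zero yields:

$$\Sigma_k = \frac{S_k}{r_k} = \frac{1}{r_k} \sum_{i=1}^{N} r_{ik} (x_i - \mu_k)(x_i - \mu_k)^T.$$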
So far we have proven (11.114) and (11.115).
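As a numerical sanity check, the two updates can be sketched in NumPy. This is a minimal illustration rather than a reference implementation: the function name `m_step_gaussian`, the array names `X` (data, one sample per row) and `R` (responsibilities, one row per sample and one column per component), and the toy data are all assumptions introduced here. With hard one-hot responsibilities, the weighted updates should reduce to the ordinary per-cluster sample mean and (biased) sample covariance.

```python
import numpy as np

def m_step_gaussian(X, R):
    """Weighted maximum-likelihood updates for GMM means and covariances.

    X: (N, D) data matrix.
    R: (N, K) responsibilities; R[i, k] weights sample i for component k.
    Returns means of shape (K, D) and covariances of shape (K, D, D).
    """
    r_k = R.sum(axis=0)                       # effective counts, shape (K,)
    mu = (R.T @ X) / r_k[:, None]             # weighted means
    diff = X[:, None, :] - mu[None, :, :]     # (N, K, D): x_i - mu_k
    # Weighted sum of outer products, normalized by the effective counts.
    sigma = np.einsum("nk,nkd,nke->kde", R, diff, diff) / r_k[:, None, None]
    return mu, sigma

# Hypothetical toy data: 6 samples in 2-D, two components, hard assignments.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
R = np.zeros((6, 2))
R[:3, 0] = 1.0
R[3:, 1] = 1.0

mu, sigma = m_step_gaussian(X, R)
```

With soft responsibilities the same function applies unchanged; only the rows of `R` become fractional weights that sum to one.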