Exercise 11.3 - EM for mixtures of Bernoullis
Answers
For the mixture of Bernoullis model, consider $K$ base distributions, each of which is a product of independent Bernoullis:

$$p(\mathbf{x}_i \mid z_i = k, \boldsymbol{\mu}) = \prod_{j=1}^{D} \mu_{kj}^{x_{ij}} (1-\mu_{kj})^{1-x_{ij}}.$$
The auxiliary function, which we are to optimize w.r.t. $\mu_{kj}$, is:

$$Q(\theta, \theta^{\text{old}}) = \sum_i \sum_k r_{ik}\Big[\log \pi_k + \sum_j \big(x_{ij}\log\mu_{kj} + (1-x_{ij})\log(1-\mu_{kj})\big)\Big],$$

where $r_{ik} = p(z_i = k \mid \mathbf{x}_i, \theta^{\text{old}})$ is the responsibility computed in the E step.
Taking the derivative w.r.t. $\mu_{kj}$ and setting it to zero:

$$\frac{\partial Q}{\partial \mu_{kj}} = \sum_i r_{ik}\Big(\frac{x_{ij}}{\mu_{kj}} - \frac{1-x_{ij}}{1-\mu_{kj}}\Big) = 0 \quad\Longrightarrow\quad \mu_{kj} = \frac{\sum_i r_{ik}\,x_{ij}}{\sum_i r_{ik}}.$$
This is exactly (11.116), up to notation.
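To make the update concrete, here is a minimal NumPy sketch of one EM iteration for this model. The function name em_step, the array layout, and the eps smoothing are my own illustrative choices, not code from the book.

```python
import numpy as np

def em_step(X, pi, mu, eps=1e-10):
    """One EM iteration for a mixture of Bernoullis (illustrative sketch).

    X  : (N, D) binary data matrix
    pi : (K,)   mixing weights
    mu : (K, D) Bernoulli means, one row per component
    """
    # E step: responsibilities r[i, k] = p(z_i = k | x_i, theta_old)
    log_lik = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T   # (N, K)
    log_post = np.log(pi + eps) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)                       # numerical stability
    r = np.exp(log_post)
    r /= r.sum(axis=1, keepdims=True)

    # M step: mu_kj = sum_i r_ik x_ij / sum_i r_ik  (the update derived above),
    # plus the usual update for the mixing weights.
    Nk = r.sum(axis=0)
    mu_new = (r.T @ X) / Nk[:, None]
    pi_new = Nk / X.shape[0]
    return pi_new, mu_new, r
```

Repeatedly calling `pi, mu, r = em_step(X, pi, mu)` until the parameters stop changing gives the usual EM fit.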
If a $\mathrm{Beta}(\alpha, \beta)$ prior is introduced for each base parameter $\mu_{kj}$, then we introduce $\alpha - 1$ pseudo positive samples and $\beta - 1$ pseudo negative samples into the computation; this is tantamount to adding these pseudocounts to the sufficient statistics of each component, so:

$$\mu_{kj} = \frac{\sum_i r_{ik}\,x_{ij} + \alpha - 1}{\sum_i r_{ik} + \alpha + \beta - 2}.$$
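Under the same assumed array layout as the sketch above, the MAP version only changes the M step for the means; alpha and beta below are the Beta hyperparameters (again a sketch, not the book's code).

```python
def m_step_map(r, X, alpha=2.0, beta=2.0):
    """MAP M step for the Bernoulli means, assuming a Beta(alpha, beta) prior on each mu_kj."""
    Nk = r.sum(axis=0)
    # alpha - 1 pseudo "ones" and beta - 1 pseudo "zeros" added per component and dimension
    return (r.T @ X + alpha - 1) / (Nk[:, None] + alpha + beta - 2)
```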
At this point one might wonder about the necessity of introducing a mixture of Bernoullis. Unlike the mixture of Gaussians, the Bernoulli case seems less compelling. Let $\bar\mu_j$ denote the weighted average of the base models:

$$\bar\mu_j = \sum_k \pi_k\,\mu_{kj};$$

then each bit of the mixture model is marginally $\mathrm{Ber}(\bar\mu_j)$, so its variance remains $\bar\mu_j(1-\bar\mu_j)$ (see the check below). There is no need to use a mixture of Bernoullis (regarding prediction) unless we must explicitly model a scenario in which there is a genuine mixture structure. For example, if we were told that a binary string is generated from a set of unbalanced coins, each with its own dynamics, and we are asked to tell which coin generated a specific toss. But even this scenario can degenerate: consider a coin that always yields heads mixed with another that always yields tails.
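A one-line check of that claim (my own addition, not part of the original solution):

$$p(x_j = 1) = \sum_k \pi_k\,\mathrm{Ber}(x_j = 1 \mid \mu_{kj}) = \sum_k \pi_k\,\mu_{kj} = \bar\mu_j,$$

so each bit's marginal under the mixture is exactly the single Bernoulli $\mathrm{Ber}(\bar\mu_j)$, with variance $\bar\mu_j(1-\bar\mu_j)$.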