Exercise 8.6 - Elementary properties of l2 regularized logistic regression
Answers
For question (a), the Hessian of the regularized negative log likelihood is:

$$H = X^T S X + 2\lambda I, \qquad S = \mathrm{diag}\big(\mu_1(1-\mu_1), \dots, \mu_N(1-\mu_N)\big),$$

where $X^T S X$, following the derivation in exercise 8.3, is at least positive semi-definite. So for any non-trivial $\lambda > 0$ the Hessian of this model is strictly positive definite, hence there is a unique optimal solution. The answer is False.
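As a quick numerical sanity check, here is a minimal sketch (pure NumPy; the toy design matrix, labels, and weight vector are made up for illustration) that builds the regularized Hessian above and confirms all of its eigenvalues are strictly positive:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # made-up design matrix
w = rng.normal(size=3)         # arbitrary weight vector
lam = 0.1                      # regularization strength, lambda > 0

mu = 1.0 / (1.0 + np.exp(-X @ w))            # sigmoid(X w)
S = np.diag(mu * (1.0 - mu))                 # S = diag(mu_i (1 - mu_i))
H = X.T @ S @ X + 2.0 * lam * np.eye(3)      # Hessian of the regularized NLL

print(np.linalg.eigvalsh(H))   # all eigenvalues > 0: strictly positive definite
```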
For question (b), the result is not necessarily true: the $\ell_2$ penalty shrinks the weights toward zero but does not force any of them to be exactly zero. For a sparse optimum, one should resort to the lasso model, where a Laplace prior is placed on the weights. The answer is False.
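To see the contrast empirically, here is a minimal sketch (assuming scikit-learn is available; the synthetic data, in which only the first two features matter, is made up) comparing exact zeros in the fitted coefficients under the $\ell_2$ penalty versus an $\ell_1$ (lasso-style) penalty:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)

l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print("l2 exact zeros:", np.sum(l2.coef_ == 0))  # typically 0: weights shrink, never vanish
print("l1 exact zeros:", np.sum(l1.coef_ == 0))  # typically several exact zeros
```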
For question (c), if $\lambda = 0$ then the model reduces to ordinary logistic regression. If the dataset is linearly separable then there exists $w$ such that (using the $y_i \in \{-1, +1\}$ encoding):

$$y_i x_i^T w > 0 \quad \text{for all } i.$$

Now for an arbitrary number $c > 1$, the scaled weight vector $cw$ also meets the separation condition, and the training likelihood $\prod_i \sigma(c \, y_i x_i^T w)$ strictly increases with $c$. Letting $c \to \infty$ justifies that the statement in (c) is True: the maximum likelihood weights can become infinite.
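The scaling argument is easy to check numerically; below is a minimal sketch on a made-up linearly separable dataset with labels $y_i \in \{-1, +1\}$:

```python
import numpy as np

# made-up separable data: the sign of the first feature determines the label
X = np.array([[1.0, 0.5], [2.0, -1.0], [-1.0, 0.3], [-2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])       # separating weights: y_i * x_i^T w > 0 for all i

def log_lik(w):
    # sum_i log sigmoid(y_i x_i^T w), computed stably via logaddexp
    margins = y * (X @ w)
    return -np.sum(np.logaddexp(0.0, -margins))

for c in [1, 10, 100, 1000]:
    print(c, log_lik(c * w))   # climbs toward 0, so there is no finite maximizer
```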
For question (d), the statement is False, since the model now has to trade off between data fit and prior knowledge. Concretely, we can prove the opposite statement: as we increase $\lambda$, the likelihood of the training dataset monotonically decreases (more precisely, it never increases).
Assume that $w_1$ minimizes (8.131) with $\lambda = \lambda_1$, and denote this objective by $J_{\lambda_1}$. Now increase $\lambda$ to $\lambda_2 > \lambda_1$, so that we are optimizing the loss:

$$J_{\lambda_2}(w) = \mathrm{NLL}(w) + \lambda_2 \lVert w \rVert_2^2,$$

whose optimal solution is denoted by $w_2$. Optimality of $w_1$ for $J_{\lambda_1}$ and of $w_2$ for $J_{\lambda_2}$ gives:

$$\mathrm{NLL}(w_1) + \lambda_1 \lVert w_1 \rVert_2^2 \le \mathrm{NLL}(w_2) + \lambda_1 \lVert w_2 \rVert_2^2,$$
$$\mathrm{NLL}(w_2) + \lambda_2 \lVert w_2 \rVert_2^2 \le \mathrm{NLL}(w_1) + \lambda_2 \lVert w_1 \rVert_2^2.$$

Adding the two inequalities and cancelling the NLL terms yields:

$$(\lambda_2 - \lambda_1)\big(\lVert w_2 \rVert_2^2 - \lVert w_1 \rVert_2^2\big) \le 0,$$

so $\lVert w_2 \rVert_2^2 \le \lVert w_1 \rVert_2^2$ because $\lambda_2 > \lambda_1$. Substituting this back into the first inequality gives:

$$\mathrm{NLL}(w_2) \ge \mathrm{NLL}(w_1) + \lambda_1\big(\lVert w_1 \rVert_2^2 - \lVert w_2 \rVert_2^2\big) \ge \mathrm{NLL}(w_1).$$

Hence the training NLL never decreases, i.e. the training likelihood never increases, as $\lambda$ grows. This finishes the proof.
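The monotonicity is easy to observe in practice; here is a minimal sketch (assuming scikit-learn, whose C parameter plays the role of $1/\lambda$; the data is made up, and small violations on the order of the optimizer's tolerance are possible):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ rng.normal(size=5) + 0.5 * rng.normal(size=200) > 0).astype(int)

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = LogisticRegression(C=1.0 / lam).fit(X, y)   # larger lambda <=> smaller C
    nll = log_loss(y, model.predict_proba(X))           # mean training NLL
    print(lam, round(nll, 4))  # non-decreasing in lambda, as the proof predicts
```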
For question (e), the statement is False. This can easily be shown by imagining $\lambda \to \infty$: then $\hat{w} \to 0$ and the test-set likelihood tends to that of the trivial model predicting $p(y \mid x) = 1/2$ everywhere, which is generally worse than what a moderate $\lambda$ achieves, so the test likelihood cannot always increase with $\lambda$.
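To visualize this, here is a minimal sketch (made-up noisy data with a train/test split, again mapping $\lambda$ to scikit-learn's $C = 1/\lambda$) that traces the test NLL over a grid of $\lambda$ values; on noisy data the curve is typically U-shaped rather than monotone:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = (X @ rng.normal(size=20) + 2.0 * rng.normal(size=300) > 0).astype(int)
Xtr, Xte, ytr, yte = X[:100], X[100:], y[:100], y[100:]

for lam in [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]:
    model = LogisticRegression(C=1.0 / lam).fit(Xtr, ytr)
    test_nll = log_loss(yte, model.predict_proba(Xte))
    print(lam, round(test_nll, 4))  # typically falls then rises: not monotone in lambda
```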