Exercise 3.10

Answers

(a) If $\eta = 1$, then the SGD algorithm updates $w$ by: $w(t+1) = w(t) - \eta \nabla e_n(w) = w(t) - \nabla e_n(w)$.

When $e_n(w) = \max(0, -y_n w^T x_n)$, the gradient $\nabla e_n(w)$ is zero when $y_n w^T x_n > 0$ (the sample is correctly classified), and $\nabla e_n(w) = -y_n x_n$ when $y_n w^T x_n < 0$ (the sample is misclassified).

Substituting these gradients into the SGD update equation gives $w(t+1) = w(t) + y_n x_n$ for a misclassified point and $w(t+1) = w(t)$ for a correctly classified one, which is exactly the PLA update.
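To make the equivalence concrete, here is a minimal Python sketch of one SGD step on $e_n(w) = \max(0, -y_n w^T x_n)$ with $\eta = 1$ (the toy point and weight vector are assumptions for illustration):

```python
import numpy as np

def sgd_step_pla(w, x_n, y_n, eta=1.0):
    """One SGD step on e_n(w) = max(0, -y_n * w @ x_n).

    The (sub)gradient is -y_n * x_n when the point is misclassified
    (y_n * w @ x_n < 0) and zero otherwise, so with eta = 1 this step
    is exactly the PLA update.
    """
    if y_n * (w @ x_n) < 0:            # misclassified: gradient is -y_n * x_n
        w = w - eta * (-y_n * x_n)     # i.e. w + y_n * x_n, the PLA update
    return w                           # correctly classified: gradient is zero

# Hypothetical toy point: w misclassifies (x_n, y_n), so w moves toward it.
w = np.array([0.0, 1.0, -1.0])
x_n = np.array([1.0, 2.0, 3.0])
y_n = +1                               # y_n * (w @ x_n) = -1 < 0
print(sgd_step_pla(w, x_n, y_n))       # [1. 3. 2.] == w + y_n * x_n
```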

(b) For logistic regression, we have $e_n(w) = \ln(1 + e^{-y_n w^T x_n})$, so $\nabla e_n(w) = \dfrac{-y_n x_n}{1 + e^{y_n w^T x_n}}$. If $\|w\|$ is very large:

* When $y_n w^T x_n \gg 0$, $\nabla e_n(w) \approx 0$.
* When $y_n w^T x_n \ll 0$, $\nabla e_n(w) \approx -y_n x_n$.

These limiting values are consistent with the updates used by PLA.

This is another indication that the weights learned by logistic regression can serve as a good approximation for classification.
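A small numeric check of these limits (a Python sketch; the toy point and scaling factors are assumptions): scaling $w$ up drives the logistic gradient toward $0$ on a correctly classified point and toward $-y_n x_n$ on a misclassified one.

```python
import numpy as np

def logistic_grad(w, x_n, y_n):
    """Gradient of e_n(w) = ln(1 + exp(-y_n * w @ x_n))."""
    return -y_n * x_n / (1.0 + np.exp(y_n * (w @ x_n)))

x_n = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.5, 0.5])          # w @ x_n = 3 > 0

for scale in (1, 10, 100):
    w_big = scale * w
    # Correctly classified (y_n = +1): gradient shrinks toward 0.
    print(scale, logistic_grad(w_big, x_n, +1))
    # Misclassified (y_n = -1): gradient approaches -y_n * x_n = x_n.
    print(scale, logistic_grad(w_big, x_n, -1))
```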
