Learning from Data (Yaser Abu-Mostafa), Exercise 3.10
Answers
(a) If $e_n(\mathbf{w}) = \max(0,\, -y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n)$, then the SGD algorithm updates the weights by $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla e_n(\mathbf{w})$.
When $\eta = 1$, the gradient of $e_n(\mathbf{w})$ is $\mathbf{0}$ when $y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n > 0$ (the sample is correctly classified), and the gradient is $-y_n \mathbf{x}_n$ when $y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n < 0$ (the sample is misclassified).
Substituting these gradients into the SGD update gives $\mathbf{w} \leftarrow \mathbf{w} + y_n \mathbf{x}_n$ on misclassified samples and leaves $\mathbf{w}$ unchanged otherwise, which is exactly PLA.
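As a quick check, here is a minimal NumPy sketch of this single SGD step (the function name and toy numbers are mine, not from the text):

```python
import numpy as np

def sgd_perceptron_step(w, x, y, eta=1.0):
    """One SGD step on e_n(w) = max(0, -y * w.x)."""
    if y * np.dot(w, x) < 0:       # misclassified: gradient of e_n is -y*x
        w = w + eta * y * x        # w <- w - eta * (-y*x), the PLA update
    return w                       # correctly classified: gradient is 0

# a misclassified point: y * w.x = 1 * (0.5 - 2.0) < 0
w = sgd_perceptron_step(np.array([0.5, -1.0]), np.array([1.0, 2.0]), y=1)
print(w)  # [0.5, -1.0] + [1.0, 2.0] = [1.5, 1.0], the PLA update
```

With $\eta = 1$ the misclassified branch is literally the PLA update rule.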
(b) For logistic regression, we have $e_n(\mathbf{w}) = \ln\!\left(1 + e^{-y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}\right)$, so $\nabla e_n(\mathbf{w}) = \dfrac{-y_n \mathbf{x}_n}{1 + e^{\,y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n}}$. If $\|\mathbf{w}\|$ is very large:

* When $y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n > 0$, $\nabla e_n(\mathbf{w}) \approx \mathbf{0}$.
* When $y_n \mathbf{w}^{\mathsf T}\mathbf{x}_n < 0$, $\nabla e_n(\mathbf{w}) \approx -y_n \mathbf{x}_n$.
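This limiting behavior can be verified numerically; the sketch below (variable names are mine) evaluates the logistic-loss gradient at a weight vector with large norm:

```python
import numpy as np

def logistic_grad(w, x, y):
    # gradient of e_n(w) = ln(1 + exp(-y * w.x)) with respect to w
    return -y * x / (1.0 + np.exp(y * np.dot(w, x)))

x = np.array([1.0, 2.0])
w = 100.0 * np.array([1.0, 1.0])   # very large ||w||

g_correct = logistic_grad(w, x, y=1)    # y * w.x = 300 >> 0
g_wrong = logistic_grad(w, x, y=-1)     # y * w.x = -300 << 0

print(g_correct)  # ~ [0, 0]: essentially no update when correctly classified
print(g_wrong)    # ~ -y*x = [1, 2]: the PLA direction when misclassified
```

The large exponent drives the denominator to either $\approx \infty$ or $\approx 1$, which is exactly the two cases listed above.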
These gradients coincide with the update directions used in PLA: no change for correctly classified samples, and a step of $y_n \mathbf{x}_n$ for misclassified ones.
This is another indication that the logistic regression weights can be used as a good approximation for classification.
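To illustrate that last point, here is a hedged sketch (the toy data, learning rate, and epoch count are my own choices) that trains logistic regression with SGD on separable data and then classifies with the sign of $\mathbf{w}^{\mathsf T}\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data, separable by the line x1 + x2 = 0 with a small margin
X = rng.normal(size=(200, 2))
X = X[np.abs(X[:, 0] + X[:, 1]) > 0.3]
y = np.sign(X[:, 0] + X[:, 1])

# SGD on the logistic loss e_n(w) = ln(1 + exp(-y_n w.x_n))
w = np.zeros(2)
eta = 0.5
for _ in range(200):
    for i in rng.permutation(len(X)):
        s = min(y[i] * np.dot(w, X[i]), 500.0)   # cap exponent for stability
        w -= eta * (-y[i] * X[i] / (1.0 + np.exp(s)))

# use the logistic regression weights directly as a linear classifier
accuracy = np.mean(np.sign(X @ w) == y)
print(accuracy)
```

On this separable toy set the learned weights separate the training points, supporting the claim that logistic regression weights serve as a good classification approximation.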