Exercise 7.14

Answers

Taking the derivative of $E(w)$ with respect to $w$, we have $\nabla E(w) = (Q^T + Q)(w - w^*) = 2Q(w - w^*)$, using the symmetry of $Q$. At $w = 0$ this gives $\nabla E(0) = -2Qw^*$.
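The gradient formula can be checked by expanding the quadratic form. The following is a short derivation sketch, assuming the error function from the exercise statement is $E(w) = (w - w^*)^T Q (w - w^*)$ with $Q$ symmetric positive definite (the exercise text itself is not reproduced here, so this form is an assumption):

```latex
\begin{align*}
E(w) &= w^T Q w - w^T Q w^* - (w^*)^T Q w + (w^*)^T Q w^* \\
\nabla E(w) &= (Q + Q^T)\, w - Q w^* - Q^T w^* \\
            &= (Q^T + Q)(w - w^*) \\
            &= 2Q(w - w^*) \qquad \text{(using } Q^T = Q\text{)} \\
\nabla E(0) &= -2 Q w^*
\end{align*}
```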

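The weights that minimize $E(w)$ are $w = w^*$, where $E(w^*) = 0$. From the calculated $\nabla E(w)$, a gradient descent step at $w = 0$ moves along $-\nabla E(0) = 2Qw^*$. Because $Q$ is positive definite, $(w^*)^T Q w^* > 0$, so this step makes an acute angle with $w^*$, but it points exactly toward $w^*$ only when $w^*$ is an eigenvector of $Q$.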

Ideally the step would point along $w^*$. The negative gradient is the best direction only for a sufficiently small step, so we normalize $\nabla E(w)$ to get a unit direction and let the step size control how far we move. Otherwise we may overshoot, even if we are moving in a descent direction.
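The argument above can be illustrated numerically. This is a minimal sketch in plain Python, assuming $E(w) = (w - w^*)^T Q (w - w^*)$; the matrix `Q`, the vector `w_star`, the step sizes, and all helper names are hypothetical values chosen for illustration and do not come from the exercise.

```python
# Illustrative 2x2 example; Q, w_star and the step sizes are assumed values.

def matvec(Q, v):
    """Multiply matrix Q (list of rows) by vector v."""
    return [sum(q * x for q, x in zip(row, v)) for row in Q]

def error(Q, w, w_star):
    """E(w) = (w - w*)^T Q (w - w*)."""
    d = [a - b for a, b in zip(w, w_star)]
    return sum(di * qi for di, qi in zip(d, matvec(Q, d)))

def gradient(Q, w, w_star):
    """Gradient 2Q(w - w*), valid for symmetric Q."""
    d = [a - b for a, b in zip(w, w_star)]
    return [2.0 * g for g in matvec(Q, d)]

def step(Q, w, w_star, eta):
    """One gradient descent step with step size eta."""
    g = gradient(Q, w, w_star)
    return [wi - eta * gi for wi, gi in zip(w, g)]

Q = [[10.0, 0.0], [0.0, 1.0]]   # symmetric positive definite
w_star = [1.0, 1.0]             # minimizer of E
w0 = [0.0, 0.0]

grad0 = gradient(Q, w0, w_star)      # equals -2 Q w*
direction = [-g for g in grad0]      # descent direction 2 Q w*

# A nonzero 2D cross product means the step is not parallel to w*.
cross = direction[0] * w_star[1] - direction[1] * w_star[0]

e_start = error(Q, w0, w_star)
e_small = error(Q, step(Q, w0, w_star, 0.01), w_star)  # small step
e_large = error(Q, step(Q, w0, w_star, 0.2), w_star)   # large step

print("gradient at 0:", grad0)
print("parallel to w*:", cross == 0)
print("E start, small step, large step:", e_start, e_small, e_large)
```

With these assumed values the small step lowers the error while the large step raises it, which matches the overshoot concern: the negative gradient is a descent direction, but it is only reliable for small steps and does not point straight at $w^*$.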

2021-12-08 09:56