Exercise 7.11
Answers
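For weight elimination, the augmented error is
$$E_{\text{aug}}(\mathbf{w}) = E_{\text{in}}(\mathbf{w}) + \frac{\lambda}{N}\sum_{l,i,j}\frac{\left(w_{ij}^{(l)}\right)^2}{1+\left(w_{ij}^{(l)}\right)^2}.$$
Taking the derivative of the second term of $E_{\text{aug}}$ with respect to $w_{ij}^{(l)}$, only one summand depends on it, and by the quotient rule
$$\frac{\partial}{\partial w_{ij}^{(l)}}\left[\frac{\lambda}{N}\cdot\frac{\left(w_{ij}^{(l)}\right)^2}{1+\left(w_{ij}^{(l)}\right)^2}\right] = \frac{\lambda}{N}\cdot\frac{2w_{ij}^{(l)}\left(1+\left(w_{ij}^{(l)}\right)^2\right) - \left(w_{ij}^{(l)}\right)^2\cdot 2w_{ij}^{(l)}}{\left(1+\left(w_{ij}^{(l)}\right)^2\right)^2} = \frac{2\lambda}{N}\cdot\frac{w_{ij}^{(l)}}{\left(1+\left(w_{ij}^{(l)}\right)^2\right)^2}.$$
Adding the derivative of the first term gives
$$\frac{\partial E_{\text{aug}}}{\partial w_{ij}^{(l)}} = \frac{\partial E_{\text{in}}}{\partial w_{ij}^{(l)}} + \frac{2\lambda}{N}\cdot\frac{w_{ij}^{(l)}}{\left(1+\left(w_{ij}^{(l)}\right)^2\right)^2},$$
which proves the equation.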
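As a sanity check on the quotient-rule step, the closed-form derivative of a single penalty summand can be compared against a central finite difference. This is a minimal sketch; the values of `lam` and `n` are arbitrary illustrative choices, not taken from the exercise.

```python
def penalty(w, lam, n):
    """Weight-elimination penalty for one weight: (lam/n) * w^2 / (1 + w^2)."""
    return lam / n * w**2 / (1 + w**2)


def penalty_grad(w, lam, n):
    """Closed-form derivative derived above: (2*lam/n) * w / (1 + w^2)^2."""
    return 2 * lam / n * w / (1 + w**2) ** 2


def central_difference(f, w, h=1e-6):
    """Numerical derivative of f at w using a symmetric difference."""
    return (f(w + h) - f(w - h)) / (2 * h)


if __name__ == "__main__":
    lam, n = 0.5, 100  # illustrative values only
    for w in [-2.0, -0.3, 0.0, 0.7, 3.0]:
        numeric = central_difference(lambda x: penalty(x, lam, n), w)
        exact = penalty_grad(w, lam, n)
        print(f"w={w:5.2f}  closed form={exact:.8f}  numeric={numeric:.8f}")
```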
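To compare how fast weights of different sizes shrink, we look at the ratio of the regularization gradient to the weight itself, that is, the rate of decay relative to the weight's own size.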
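Dividing the second term of the derivative by the weight $w_{ij}^{(l)}$ gives
$$\frac{2\lambda}{N}\cdot\frac{1}{\left(1+\left(w_{ij}^{(l)}\right)^2\right)^2},$$
whose factor $1/\left(1+\left(w_{ij}^{(l)}\right)^2\right)^2$ attains its maximum value of 1 at $w_{ij}^{(l)} = 0$ and decreases monotonically as $\left|w_{ij}^{(l)}\right|$ grows. So the smaller the weight, the larger its decay relative to its own size.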
This indicates that small weights decay much faster than large ones.
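To see the effect numerically, the sketch below applies gradient-descent updates driven only by the penalty term (ignoring $E_{\text{in}}$) and reports what fraction of each initial weight survives. The learning rate `eta`, the step count, and `lam`, `n` are hypothetical choices; `eta * 2 * lam / n` is kept below 1 so that each update multiplies the weight by a factor in $(0, 1)$ and never flips its sign.

```python
def relative_decay(w, lam, n):
    """Ratio of the penalty gradient to the weight: (2*lam/n) / (1 + w^2)^2."""
    return 2 * lam / n / (1 + w**2) ** 2


def shrink(w, lam, n, eta, steps):
    """Run `steps` gradient-descent updates using only the penalty gradient."""
    for _ in range(steps):
        # w <- w - eta * (2*lam/n) * w / (1 + w^2)^2 = w * (1 - eta * relative_decay(w))
        w -= eta * relative_decay(w, lam, n) * w
    return w


if __name__ == "__main__":
    lam, n, eta, steps = 1.0, 1, 0.1, 50  # illustrative values only
    for w0 in [0.1, 1.0, 5.0]:
        kept = shrink(w0, lam, n, eta, steps) / w0
        print(f"w0={w0:4.1f}  relative decay={relative_decay(w0, lam, n):.5f}  "
              f"fraction kept after {steps} steps={kept:.5f}")
```

Small initial weights keep a much smaller fraction of their value than large ones, matching the ratio argument above.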