Exercise 7.12

Answers

"More data is better" applies to a fixed model $(\mathcal{H}, \mathcal{A})$. With early stopping, however, we are selecting among nested hypothesis sets $\mathcal{H}_1 \subset \mathcal{H}_2 \subset \cdots$ that are determined by $D_{\text{train}}$: at each step $t$, the weight $w_t$ is chosen as the one with minimal in-sample error on $D_{\text{train}}$, and the next hypothesis set is built around it. If we use the full data $D$ instead, the iterates $w_1, w_2, \ldots$ will be different, and as a result the hypothesis sets will change even if we keep the step size $\eta$ the same.

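That is why "more data is better" does not apply here: training on $D$ for the same number of iterations searches a different sequence of hypothesis sets from the one the early-stopping point was selected on.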
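The dependence of the iterates on the data can be checked numerically. The following is only a minimal sketch under assumptions not in the exercise: a one-dimensional linear model with squared error, plain fixed-step gradient descent, synthetic data, and hypothetical names (`gradient`, `descent_path`, `D`, `D_train`). It runs the same number of steps with the same η from the same starting point, once on a training subset and once on the full data, and prints the two paths side by side.

```python
import random

def gradient(w, data):
    """Gradient of the mean squared error for h(x) = w[0] + w[1] * x."""
    g0 = g1 = 0.0
    for x, y in data:
        err = w[0] + w[1] * x - y
        g0 += 2 * err
        g1 += 2 * err * x
    n = len(data)
    return [g0 / n, g1 / n]

def descent_path(data, eta=0.1, steps=5):
    """Return the iterates w_0, w_1, ..., w_steps of fixed-step gradient descent."""
    w = [0.0, 0.0]  # same starting point w_0 for every data set
    path = [tuple(w)]
    for _ in range(steps):
        g = gradient(w, data)
        w = [w[0] - eta * g[0], w[1] - eta * g[1]]
        path.append(tuple(w))
    return path

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(20)]
D = [(x, 2 * x + 1 + random.gauss(0, 0.5)) for x in xs]
D_train = D[:15]  # the remaining 5 points would play the role of the validation set

path_train = descent_path(D_train)
path_full = descent_path(D)

# Same eta, same w_0, same number of steps: only the data differs.
for t, (w_train, w_full) in enumerate(zip(path_train, path_full)):
    print(t, w_train, w_full)
```

Both paths start at the same w_0, but the first gradient is already an average over different points, so the paths separate from w_1 onward. Since each hypothesis set in the nested sequence is built around the previous iterate, the sets themselves differ between the two runs.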

2021-12-08 09:55