Exercise 4.7

Answers

(a)
Note that, for a single validation point, the variance with respect to $\mathcal{D}_{\text{val}}$ equals the variance with respect to $x$, because the validation points are drawn independently and each $y$ is assumed to be generated by a true $f(x)$. Independence also means that the variance of the sum below is the sum of the variances.

$$
\begin{aligned}
\sigma_{\text{val}}^2 &= \operatorname{Var}_{\mathcal{D}_{\text{val}}}\left[E_{\text{val}}(g)\right]
= \operatorname{Var}_{\mathcal{D}_{\text{val}}}\!\left[\frac{1}{K}\sum_{x_n\in\mathcal{D}_{\text{val}}} e\big(g(x_n),y_n\big)\right] \\
&= \frac{1}{K^2}\operatorname{Var}_{\mathcal{D}_{\text{val}}}\!\left[\sum_{x_n\in\mathcal{D}_{\text{val}}} e\big(g(x_n),y_n\big)\right]
= \frac{1}{K^2}\sum_{x_n\in\mathcal{D}_{\text{val}}}\operatorname{Var}_{\mathcal{D}_{\text{val}}}\left[e\big(g(x_n),y_n\big)\right] \\
&= \frac{1}{K^2}\sum_{x_n\in\mathcal{D}_{\text{val}}}\operatorname{Var}_{x}\left[e\big(g(x_n),y_n\big)\right]
= \frac{1}{K^2}\sum_{x_n\in\mathcal{D}_{\text{val}}}\sigma^2(g)
= \frac{1}{K}\,\sigma^2(g)
\end{aligned}
$$
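As a quick sanity check (not part of the exercise), here is a minimal Monte Carlo sketch in Python. The setup is assumed: each pointwise error is an independent $\text{Bernoulli}(p)$ variable, which is the classification case treated in (b), so $\sigma^2(g) = p(1-p)$.

```python
import numpy as np

# Assumed toy setup: each pointwise error e(g(x_n), y_n) is an independent
# Bernoulli(p) variable, so sigma^2(g) = p * (1 - p).
rng = np.random.default_rng(0)
p, K, trials = 0.2, 100, 200_000

errors = rng.random((trials, K)) < p   # pointwise errors, one row per validation set
E_val = errors.mean(axis=1)            # E_val(g) = (1/K) * sum_n e(g(x_n), y_n)

print(E_val.var())                     # empirical Var_{D_val}[E_val(g)]
print(p * (1 - p) / K)                 # predicted  sigma^2(g) / K
```

The two printed numbers agree to within Monte Carlo noise.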
(b)
In a classification problem, $e\big(g(x),y\big) = \mathbb{1}\big[g(x)\neq y\big]$. We have

$$
\mathbb{E}_x\left[e\big(g(x),y\big)\right] = P\big[g(x)\neq y\big]\times 1 + P\big[g(x)= y\big]\times 0 = P\big[g(x)\neq y\big]
$$

So the variance is:

$$
\begin{aligned}
\sigma^2(g) &= \operatorname{Var}_x\left[e\big(g(x),y\big)\right] = \mathbb{E}_x\left[\big(e-\mathbb{E}_x[e]\big)^2\right] \\
&= P\big[g(x)\neq y\big]\big(1-\mathbb{E}_x[e]\big)^2 + \big(1-P\big[g(x)\neq y\big]\big)\big(0-\mathbb{E}_x[e]\big)^2 \\
&= P(1-P)^2 + (1-P)P^2 = P(1-P),
\end{aligned}
$$

where $P := P\big[g(x)\neq y\big]$.
(c)
Combining (a) and (b):

$$
\sigma_{\text{val}}^2 = \frac{1}{K}\,\sigma^2(g) = \frac{P(1-P)}{K} = \frac{-(P-0.5)^2 + 0.25}{K} \le \frac{1}{4K}
$$
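A few lines of Python (again a hypothetical check, with `p` playing the role of $P$) confirm both the Bernoulli variance from (b) and the $\tfrac{1}{4K}$ bound:

```python
import numpy as np

# Check (b) and (c): the variance of the Bernoulli pointwise error equals
# p * (1 - p), which peaks at p = 0.5 with value 0.25, so
# sigma_val^2 = p * (1 - p) / K never exceeds 1 / (4 * K).
rng = np.random.default_rng(1)
p = np.linspace(0.0, 1.0, 101)
empirical = np.array([(rng.random(1_000_000) < pi).var() for pi in p])

print(np.abs(empirical - p * (1 - p)).max())           # ~0: matches p(1-p)
print(p[(p * (1 - p)).argmax()], (p * (1 - p)).max())  # 0.5, 0.25
```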
(d)
The squared error $e\big(g(x),y\big) = \big(g(x)-y\big)^2$ is unbounded, so its variance can also be arbitrarily large. Hence there is no upper bound on $\operatorname{Var}_{\mathcal{D}_{\text{val}}}\left[E_{\text{val}}(g)\right] = \frac{1}{K}\,\sigma^2(g)$ that holds uniformly across all problems.
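To make this concrete, here is a hedged illustration under an assumed noise model: $y = f(x) + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \text{scale}^2)$ and $g = f$, so that $e = \epsilon^2$ and $\operatorname{Var}[e] = 2\,\text{scale}^4$, which grows without bound:

```python
import numpy as np

# Assumed setup for (d): y = f(x) + eps with eps ~ N(0, scale^2) and g = f,
# so e(g(x), y) = eps^2 and Var[e] = 2 * scale^4. No bound in terms of K
# alone can cap Var[E_val(g)] = Var[e] / K across all problems.
rng = np.random.default_rng(2)
for scale in (1.0, 10.0, 100.0):
    eps = rng.normal(0.0, scale, size=1_000_000)
    print(scale, (eps**2).var())       # grows like 2 * scale**4
```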
(e)
For regression with squared error, training on fewer points (smaller $N-K$) generally yields a worse $g$, so the expected squared error $\mathbb{E}\left[e\big(g(x),y\big)\right]$ grows. For a non-negative quantity like the squared error, a larger mean typically comes with a larger spread, so $\sigma^2(g)$ tends to grow as well.
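The following experiment sketch illustrates the trend under an assumed setup (a noisy 5-dimensional linear target fit by least squares; none of the specifics come from the exercise): as the training set shrinks, the estimated $\sigma^2(g)$ grows.

```python
import numpy as np

# Assumed setup for (e): least-squares fit of a noisy 5-d linear target on
# n_train = N - K points; sigma^2(g) = Var[(g(x) - y)^2] is then estimated
# on a large fresh sample and averaged over repetitions.
rng = np.random.default_rng(3)
d = 5
w_true = rng.normal(size=d)

def sigma2_of_g(n_train, reps=200):
    vals = []
    for _ in range(reps):
        X = rng.normal(size=(n_train, d))
        y = X @ w_true + rng.normal(size=n_train)
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        Xt = rng.normal(size=(20_000, d))
        yt = Xt @ w_true + rng.normal(size=20_000)
        vals.append(((Xt @ w_hat - yt) ** 2).var())  # pointwise squared errors
    return float(np.mean(vals))

for n_train in (100, 20, 8):           # shrinking N - K
    print(n_train, sigma2_of_g(n_train))
```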
(f)
When we increase the size $K$ of the validation set, the typical deviation of $E_{\text{val}}(g)$ from $E_{\text{out}}(g)$ is $\sigma(g)/\sqrt{K}$. In classification this can only shrink, since $\sigma(g)=\sqrt{P(1-P)}\le 1/2$ regardless of $g$. In regression it depends on whether $\sigma(g)$ or $\sqrt{K}$ grows faster, so $E_{\text{val}}(g)$ as an estimate of $E_{\text{out}}(g)$ can become better or worse.

Does this mean that for classification the estimate always becomes better as we increase $K$?

Not quite. $E_{\text{out}}(g)$ is the out-of-sample error of the particular hypothesis $g$, which is trained on only $N-K$ points and can be pretty bad when $K$ is large. So in classification, even though the gap between $E_{\text{val}}(g)$ and $E_{\text{out}}(g)$ goes to zero, $E_{\text{out}}(g)$ itself can be quite large.
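A toy simulation (entirely assumed: two unit-variance Gaussian classes centred at $-1$ and $+1$, with $g$ thresholding at the midpoint of class means estimated from the $N-K$ training points) shows both effects at once: the gap $|E_{\text{val}}(g) - E_{\text{out}}(g)|$ shrinks as $K$ grows, while $E_{\text{out}}(g)$ itself gets worse.

```python
import numpy as np
from math import erf, sqrt

# Assumed toy setup for (f): class -1 ~ N(-1, 1), class +1 ~ N(+1, 1), equally
# likely; g(x) = sign(x - thr), where thr is the midpoint of the class means
# estimated from (N - K) / 2 training points per class. E_val uses K points.
rng = np.random.default_rng(4)
N = 40
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF

def trial(K):
    n = (N - K) // 2                                # training points per class
    thr = (rng.normal(-1, 1, n).mean() + rng.normal(+1, 1, n).mean()) / 2
    # exact E_out of sign(x - thr) on the 50/50 Gaussian mixture
    e_out = 0.5 * (1 - Phi(thr + 1)) + 0.5 * Phi(thr - 1)
    labels = rng.choice([-1, 1], size=K)            # K validation points
    x = rng.normal(labels.astype(float), 1.0)
    e_val = np.mean(np.sign(x - thr) != labels)
    return e_out, abs(e_val - e_out)

for K in (4, 16, 32):
    res = np.array([trial(K) for _ in range(20_000)])
    print(K, res[:, 0].mean(), res[:, 1].mean())    # E_out grows, gap shrinks
```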
