Exercise 4.7

Answers

(a)
Note that, for a single validation point, the variance with respect to $\mathcal{D}_{\text{val}}$ equals the variance with respect to $x$, because the validation points are drawn independently and each $y$ is assumed to be generated by a true $f(x)$. Independence also means that the variance of the sum below is the sum of the variances.

$$
\begin{aligned}
\sigma_{\text{val}}^2 &= \operatorname{Var}_{\mathcal{D}_{\text{val}}}\left[E_{\text{val}}(g)\right]
= \operatorname{Var}_{\mathcal{D}_{\text{val}}}\!\left[\frac{1}{K}\sum_{x_n\in\mathcal{D}_{\text{val}}} e\big(g(x_n),y_n\big)\right] \\
&= \frac{1}{K^2}\operatorname{Var}_{\mathcal{D}_{\text{val}}}\!\left[\sum_{x_n\in\mathcal{D}_{\text{val}}} e\big(g(x_n),y_n\big)\right]
= \frac{1}{K^2}\sum_{x_n\in\mathcal{D}_{\text{val}}}\operatorname{Var}_{\mathcal{D}_{\text{val}}}\left[e\big(g(x_n),y_n\big)\right] \\
&= \frac{1}{K^2}\sum_{x_n\in\mathcal{D}_{\text{val}}}\operatorname{Var}_{x}\left[e\big(g(x_n),y_n\big)\right]
= \frac{1}{K^2}\sum_{x_n\in\mathcal{D}_{\text{val}}}\sigma^2(g)
= \frac{1}{K}\,\sigma^2(g)
\end{aligned}
$$
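As a quick sanity check (not part of the exercise), here is a minimal Monte Carlo sketch in Python. The setup is assumed: each pointwise error is an independent $\text{Bernoulli}(p)$ variable, which is the classification case treated in (b), so $\sigma^2(g) = p(1-p)$.

```python
import numpy as np

# Assumed toy setup: each pointwise error e(g(x_n), y_n) is an independent
# Bernoulli(p) variable, so sigma^2(g) = p * (1 - p).
rng = np.random.default_rng(0)
p, K, trials = 0.2, 100, 200_000

errors = rng.random((trials, K)) < p   # pointwise errors, one row per validation set
E_val = errors.mean(axis=1)            # E_val(g) = (1/K) * sum_n e(g(x_n), y_n)

print(E_val.var())                     # empirical Var_{D_val}[E_val(g)]
print(p * (1 - p) / K)                 # predicted  sigma^2(g) / K
```

The two printed numbers agree to within Monte Carlo noise.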
(b)
In a classification problem, $e\big(g(x),y\big) = \mathbb{1}\big[g(x)\neq y\big]$. We have

$$
\mathbb{E}_x\left[e\big(g(x),y\big)\right] = P\big[g(x)\neq y\big]\times 1 + P\big[g(x)= y\big]\times 0 = P\big[g(x)\neq y\big]
$$

So the variance is:

$$
\begin{aligned}
\sigma^2(g) &= \operatorname{Var}_x\left[e\big(g(x),y\big)\right] = \mathbb{E}_x\left[\big(e-\mathbb{E}_x[e]\big)^2\right] \\
&= P\big[g(x)\neq y\big]\big(1-\mathbb{E}_x[e]\big)^2 + \big(1-P\big[g(x)\neq y\big]\big)\big(0-\mathbb{E}_x[e]\big)^2 \\
&= P(1-P)^2 + (1-P)P^2 = P(1-P),
\end{aligned}
$$

where $P := P\big[g(x)\neq y\big]$.
(c)
Combining (a) and (b):

$$
\sigma_{\text{val}}^2 = \frac{1}{K}\,\sigma^2(g) = \frac{P(1-P)}{K} = \frac{-(P-0.5)^2 + 0.25}{K} \le \frac{1}{4K}
$$
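A few lines of Python (again a hypothetical check, with `p` playing the role of $P$) confirm both the Bernoulli variance from (b) and the $\tfrac{1}{4K}$ bound:

```python
import numpy as np

# Check (b) and (c): the variance of the Bernoulli pointwise error equals
# p * (1 - p), which peaks at p = 0.5 with value 0.25, so
# sigma_val^2 = p * (1 - p) / K never exceeds 1 / (4 * K).
rng = np.random.default_rng(1)
p = np.linspace(0.0, 1.0, 101)
empirical = np.array([(rng.random(1_000_000) < pi).var() for pi in p])

print(np.abs(empirical - p * (1 - p)).max())           # ~0: matches p(1-p)
print(p[(p * (1 - p)).argmax()], (p * (1 - p)).max())  # 0.5, 0.25
```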
(d)
The squared error $e\big(g(x),y\big) = \big(g(x)-y\big)^2$ is unbounded, so its variance can also be arbitrarily large. Hence there is no upper bound on $\operatorname{Var}_{\mathcal{D}_{\text{val}}}\left[E_{\text{val}}(g)\right] = \frac{1}{K}\,\sigma^2(g)$ that holds uniformly across all problems.
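To make this concrete, here is a hedged illustration under an assumed noise model: $y = f(x) + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \text{scale}^2)$ and $g = f$, so that $e = \epsilon^2$ and $\operatorname{Var}[e] = 2\,\text{scale}^4$, which grows without bound:

```python
import numpy as np

# Assumed setup for (d): y = f(x) + eps with eps ~ N(0, scale^2) and g = f,
# so e(g(x), y) = eps^2 and Var[e] = 2 * scale^4. No bound in terms of K
# alone can cap Var[E_val(g)] = Var[e] / K across all problems.
rng = np.random.default_rng(2)
for scale in (1.0, 10.0, 100.0):
    eps = rng.normal(0.0, scale, size=1_000_000)
    print(scale, (eps**2).var())       # grows like 2 * scale**4
```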
(e)
For regression with squared error, training on fewer points (smaller $N-K$) generally yields a worse $g$, so the expected squared error $\mathbb{E}\left[e\big(g(x),y\big)\right]$ grows. For a non-negative quantity like the squared error, a larger mean typically comes with a larger spread, so $\sigma^2(g)$ tends to grow as well.
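The following experiment sketch illustrates the trend under an assumed setup (a noisy 5-dimensional linear target fit by least squares; none of the specifics come from the exercise): as the training set shrinks, the estimated $\sigma^2(g)$ grows.

```python
import numpy as np

# Assumed setup for (e): least-squares fit of a noisy 5-d linear target on
# n_train = N - K points; sigma^2(g) = Var[(g(x) - y)^2] is then estimated
# on a large fresh sample and averaged over repetitions.
rng = np.random.default_rng(3)
d = 5
w_true = rng.normal(size=d)

def sigma2_of_g(n_train, reps=200):
    vals = []
    for _ in range(reps):
        X = rng.normal(size=(n_train, d))
        y = X @ w_true + rng.normal(size=n_train)
        w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        Xt = rng.normal(size=(20_000, d))
        yt = Xt @ w_true + rng.normal(size=20_000)
        vals.append(((Xt @ w_hat - yt) ** 2).var())  # pointwise squared errors
    return float(np.mean(vals))

for n_train in (100, 20, 8):           # shrinking N - K
    print(n_train, sigma2_of_g(n_train))
```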
(f)
When we increase the size $K$ of the validation set, the typical deviation of $E_{\text{val}}(g)$ from $E_{\text{out}}(g)$ is $\sigma(g)/\sqrt{K}$. In classification this can only shrink, since $\sigma(g)=\sqrt{P(1-P)}\le 1/2$ regardless of $g$. In regression it depends on whether $\sigma(g)$ or $\sqrt{K}$ grows faster, so $E_{\text{val}}(g)$ as an estimate of $E_{\text{out}}(g)$ can become better or worse.

Does this mean that for classification the estimate always becomes better as we increase $K$?

Not quite. $E_{\text{out}}(g)$ is the out-of-sample error of the particular hypothesis $g$, which is trained on only $N-K$ points and can be pretty bad when $K$ is large. So in classification, even though the gap between $E_{\text{val}}(g)$ and $E_{\text{out}}(g)$ goes to zero, $E_{\text{out}}(g)$ itself can be quite large.
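A toy simulation (entirely assumed: two unit-variance Gaussian classes centred at $-1$ and $+1$, with $g$ thresholding at the midpoint of class means estimated from the $N-K$ training points) shows both effects at once: the gap $|E_{\text{val}}(g) - E_{\text{out}}(g)|$ shrinks as $K$ grows, while $E_{\text{out}}(g)$ itself gets worse.

```python
import numpy as np
from math import erf, sqrt

# Assumed toy setup for (f): class -1 ~ N(-1, 1), class +1 ~ N(+1, 1), equally
# likely; g(x) = sign(x - thr), where thr is the midpoint of the class means
# estimated from (N - K) / 2 training points per class. E_val uses K points.
rng = np.random.default_rng(4)
N = 40
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF

def trial(K):
    n = (N - K) // 2                                # training points per class
    thr = (rng.normal(-1, 1, n).mean() + rng.normal(+1, 1, n).mean()) / 2
    # exact E_out of sign(x - thr) on the 50/50 Gaussian mixture
    e_out = 0.5 * (1 - Phi(thr + 1)) + 0.5 * Phi(thr - 1)
    labels = rng.choice([-1, 1], size=K)            # K validation points
    x = rng.normal(labels.astype(float), 1.0)
    e_val = np.mean(np.sign(x - thr) != labels)
    return e_out, abs(e_val - e_out)

for K in (4, 16, 32):
    res = np.array([trial(K) for _ in range(20_000)])
    print(K, res[:, 0].mean(), res[:, 1].mean())    # E_out grows, gap shrinks
```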
