Exercise 6.5

Answers

(a)
Since we are selecting a hypothesis from the fixed set $H_{\text{train}}$, there are $N-K$ hypotheses, and the validation set plays the role of the 'input data set', with size $K$. Applying the generalization bound of equation (2.1), for every $g_k$, $k = 1, 2, \dots, N-K$, we have

\[
E_{\text{out}}(g_k) \le E_{\text{val}}(g_k) + \sqrt{\frac{1}{2K}\ln\frac{2(N-K)}{\delta}}
\]

If we assume $K \gg \log(N-K)$, the penalty term vanishes, and we have $E_{\text{out}}(g_k) \approx E_{\text{val}}(g_k)$.
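To see why $K \gg \log(N-K)$ suffices, the penalty term can be evaluated numerically. The sketch below uses illustrative values of $N-K$ and $\delta$ (not taken from the exercise) to show that the penalty shrinks like $1/\sqrt{K}$ while growing only logarithmically in the number of hypotheses:

```python
import math

def val_bound_penalty(n_hyp, K, delta=0.05):
    """Penalty term of the validation bound:
    sqrt( ln(2 * n_hyp / delta) / (2K) ),
    where n_hyp = N - K is the number of hypotheses tried."""
    return math.sqrt(math.log(2 * n_hyp / delta) / (2 * K))

# Illustrative values: doubling the hypothesis count barely moves the
# penalty, while increasing K shrinks it like 1/sqrt(K).
for K in (10, 100, 1000):
    print(K, round(val_bound_penalty(n_hyp=1000, K=K), 4))
```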

Since $g$ is the hypothesis with minimum validation error, $E_{\text{val}}(g) \le E_{\text{val}}(g^*)$, where $g^*$ denotes the minimizer of $E_{\text{out}}$, so we have

\[
E_{\text{out}}(g) \approx E_{\text{val}}(g) \le E_{\text{val}}(g^*) \approx E_{\text{out}}(g^*)
\]

On the other hand, $g^*$ minimizes $E_{\text{out}}$, so we always have $E_{\text{out}}(g^*) \le E_{\text{out}}(g)$.

Comparing the two inequalities, we conclude $E_{\text{out}}(g) \approx E_{\text{out}}(g^*)$.
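This sandwich argument can be checked with a small simulation. The sketch below is built on assumed ingredients (hypothesis true errors drawn uniformly at random, not derived from any actual k-NN rule): selecting by validation error on $K$ points yields a hypothesis whose true error is close to the best achievable in the set.

```python
import random

def select_by_validation(true_errors, K, seed=0):
    """Simulate validation-based selection: hypothesis k has true
    out-of-sample error true_errors[k]; its validation error is the
    fraction of K i.i.d. validation points it misclassifies."""
    rng = random.Random(seed)
    val_errors = [sum(rng.random() < e for _ in range(K)) / K
                  for e in true_errors]
    g = min(range(len(true_errors)), key=lambda k: val_errors[k])
    g_star = min(range(len(true_errors)), key=lambda k: true_errors[k])
    return true_errors[g], true_errors[g_star]

rng = random.Random(1)
true_errors = [rng.uniform(0.1, 0.5) for _ in range(50)]  # assumed error rates
e_g, e_star = select_by_validation(true_errors, K=500)
# E_out(g) tracks E_out(g*): validation nearly matches selection by
# the (unknown) true error.
print(round(e_g, 3), round(e_star, 3))
```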

(b)
If $N - K \to \infty$, then according to Theorem 6.2 we can find a sequence $k(N-K)$ such that $k(N-K) \to \infty$ and $k(N-K)/(N-K) \to 0$; then we know that $E_{\text{in}}(g_k) \to E_{\text{out}}(g_k)$ and $E_{\text{out}}(g_k) \to E_{\text{out}}^*$.

Since $E_{\text{out}}^*$ is the optimal out-of-sample error we can ever achieve, $E_{\text{out}}^* \le E_{\text{out}}(g^*) \le E_{\text{out}}(g_k) \to E_{\text{out}}^*$, so $E_{\text{out}}(g^*) \to E_{\text{out}}^*$. By part (a), we thus conclude $E_{\text{out}}(g) \to E_{\text{out}}^*$.
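One concrete sequence satisfying the two conditions of Theorem 6.2 is $k(n) = \lfloor\sqrt{n}\rfloor$; this particular choice is an illustration, not mandated by the theorem:

```python
import math

def k_of_n(n):
    # k(n) = floor(sqrt(n)) satisfies both conditions of Theorem 6.2:
    # k(n) -> infinity while k(n)/n -> 0 as n -> infinity.
    return max(1, math.isqrt(n))

for n in (100, 10_000, 1_000_000):
    k = k_of_n(n)
    print(n, k, k / n)
```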

(c)
If we used the $k$-NN rule on the full data set $\mathcal{D}$ (with the validated choice of $k$), we would expect a performance improvement: the learning curve tells us that training on more data achieves better out-of-sample performance.
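The learning-curve claim can be illustrated with a toy experiment. The sketch below is built entirely on assumptions (a 1-D threshold target with 10% label noise and $k = 5$, none of which come from the exercise); the test error of the $k$-NN rule typically decreases toward the noise level as the training set grows:

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among the k nearest training points
    (1-D inputs, absolute-distance metric)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def make_data(n, rng):
    # Assumed toy target: label = 1 iff x > 0, with 10% label noise.
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
test = make_data(500, rng)
for n_train in (20, 200, 1000):
    train = make_data(n_train, rng)
    err = sum(knn_predict(train, x, k=5) != y for x, y in test) / len(test)
    print(n_train, round(err, 3))
```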