Homepage › Solution manuals › Yaser Abu-Mostafa › Learning from Data › Exercise 8.14

Exercise 8.14

Answers

Suppose $α^{∗}$ is the optimal solution of the original dual problem, and we remove a data point $(x_{n}, y_{n})$ with $α_{n}^{∗} = 0$ .

(a)

Since $α^{∗}$ is the optimal solution of the previous dual problem, it satisfies the constraints in (8.21), i.e. $α_{i}^{∗} \geq 0$ for $i = 1, \dots, N$ and $\sum_{i = 1}^{N} y_{i} α_{i}^{∗} = 0$ . Because $α_{n}^{∗} = 0$ , we have $\sum_{i = 1}^{N} y_{i} α_{i}^{∗} = \sum_{i \neq n} y_{i} α_{i}^{∗} = 0$ . The second sum is exactly the equality constraint of the problem with $(x_{n}, y_{n})$ removed, and the nonnegativity constraints on the remaining $α_{i}^{∗}$ carry over unchanged.

So the solution $α^{∗}$ (after removing $α_{n}^{∗}$ ) is feasible for the new dual problem.
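The feasibility argument in (a) can be checked numerically. The following is a minimal sketch, assuming a small hand-checkable data set with labels `y` and a dual solution `alpha_star` whose last entry is zero; the helper name `is_feasible` and the data are illustrative and do not come from the exercise.

```python
def is_feasible(alpha, y, tol=1e-12):
    """Check the dual constraints: alpha_i >= 0 and sum_i y_i * alpha_i = 0."""
    nonneg = all(a >= 0 for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return nonneg and balanced

# Assumed toy labels and dual solution (illustrative values only).
y = [-1, -1, 1, 1]
alpha_star = [0.5, 0.5, 1.0, 0.0]

n = 3  # index of the removed point; alpha_star[n] is zero
alpha_reduced = alpha_star[:n] + alpha_star[n + 1:]
y_reduced = y[:n] + y[n + 1:]

feasible_before = is_feasible(alpha_star, y)
feasible_after = is_feasible(alpha_reduced, y_reduced)
print(feasible_before, feasible_after)
```

Dropping an entry that is zero removes a zero term from the sum $\sum_{i} y_{i} α_{i}$ , so the equality constraint still holds for the reduced vector.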

(b)

Suppose, for contradiction, that there is another feasible solution $α'$ for the new dual problem with a lower objective value than $α^{∗}$ (with $α_{n}^{∗}$ removed). We construct a solution $α^{c}$ for the previous dual problem by inserting $α_{n}^{c} = 0$ into $α'$ . Then $α^{c}$ is feasible for the previous dual problem: all of its entries are nonnegative, and $\sum_{i = 1}^{N} y_{i} α_{i}^{c} = \sum_{i \neq n} y_{i} α_{i}' = 0$ .

From (8.21), the objective value of $α^{c}$ for the previous dual problem is thus:

$$\begin{aligned} V (α^{c}) & = \frac{1}{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} y_{i} y_{j} α_{i}^{c} α_{j}^{c} x_{i}^{T} x_{j} - \sum_{i = 1}^{N} α_{i}^{c} \\ & = \frac{1}{2} \sum_{i \neq n} \sum_{j \neq n} y_{i} y_{j} α_{i}^{c} α_{j}^{c} x_{i}^{T} x_{j} - \sum_{i \neq n} α_{i}^{c} \\ & < \frac{1}{2} \sum_{i \neq n} \sum_{j \neq n} y_{i} y_{j} α_{i}^{∗} α_{j}^{∗} x_{i}^{T} x_{j} - \sum_{i \neq n} α_{i}^{∗} \\ & = V (α^{∗}) \end{aligned}$$

This contradicts the fact that $α^{∗}$ is the optimal solution for the previous problem. So we conclude there’s no other feasible solution for the new dual problem that has a lower objective value than $α^{∗}$ .
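The key step in (b) is that padding a vector with a zero entry leaves the dual objective unchanged. Here is a minimal sketch of that step, assuming the minimization form of the dual objective from (8.21) and an illustrative toy data set; `dual_objective`, `alpha_new`, and the data values are assumptions made for the example.

```python
def dual_objective(alpha, X, y):
    """0.5 * sum_ij y_i y_j a_i a_j x_i.x_j - sum_i a_i (minimization form)."""
    quad = 0.0
    for i in range(len(alpha)):
        for j in range(len(alpha)):
            dot = sum(p * q for p, q in zip(X[i], X[j]))
            quad += y[i] * y[j] * alpha[i] * alpha[j] * dot
    return 0.5 * quad - sum(alpha)

# Assumed toy data; the point at index n has alpha_star[n] = 0.
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, 1, 1]
alpha_star = [0.5, 0.5, 1.0, 0.0]
n = 3

X_new = X[:n] + X[n + 1:]
y_new = y[:n] + y[n + 1:]

# Some feasible vector for the new (reduced) dual, chosen arbitrarily.
alpha_new = [0.25, 0.25, 0.5]
# Pad it with a zero at position n to get a vector for the previous dual.
alpha_c = alpha_new[:n] + [0.0] + alpha_new[n:]

v_new = dual_objective(alpha_new, X_new, y_new)
v_padded = dual_objective(alpha_c, X, y)
v_star = dual_objective(alpha_star, X, y)
print(v_new, v_padded, v_star)
```

The first two printed values agree, so a vector that beat $α^{∗}$ in the new problem would also beat it in the previous problem after padding, which is the contradiction used above.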

(c)

Hence $α^{∗}$ (with $α_{n}^{∗}$ removed) is optimal for the new dual problem: by (a) it is feasible, and by (b) no feasible solution has a lower objective value.

(d)

Since $w^{∗} = \sum_{i = 1}^{N} y_{i} α_{i}^{∗} x_{i} = \sum_{i \neq n} y_{i} α_{i}^{∗} x_{i}$ , $w^{∗}$ is the same as in the previous problem. Also, $b^{∗}$ is computed from a point with $α_{s}^{∗} > 0$ (so $s \neq n$ ), which means it is not affected by removing $(x_{n}, y_{n})$ either. So we conclude that the optimal hyperplane doesn’t change.
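As a sanity check on (d), the hyperplane can be recovered from the dual solution before and after the removal. This is a minimal sketch, assuming an illustrative toy data set and using $b^{∗} = y_{s} - w^{∗T} x_{s}$ for the first index $s$ with $α_{s}^{∗} > 0$ ; the helper name `hyperplane` is hypothetical.

```python
def hyperplane(alpha, X, y):
    """Recover (w, b) from a dual solution of the hard-margin SVM."""
    dim = len(X[0])
    # w = sum_i y_i * alpha_i * x_i
    w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X))
         for k in range(dim)]
    # Pick any support vector s (alpha_s > 0); then y_s * (w.x_s + b) = 1,
    # and since y_s is +1 or -1 this gives b = y_s - w.x_s.
    s = next(i for i, a in enumerate(alpha) if a > 0)
    b = y[s] - sum(wk * xk for wk, xk in zip(w, X[s]))
    return w, b

# Assumed toy data; alpha_star[n] = 0 for the removed point.
X = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, 1, 1]
alpha_star = [0.5, 0.5, 1.0, 0.0]
n = 3

w_full, b_full = hyperplane(alpha_star, X, y)
w_red, b_red = hyperplane(alpha_star[:n] + alpha_star[n + 1:],
                          X[:n] + X[n + 1:],
                          y[:n] + y[n + 1:])
print(w_full, b_full)
print(w_red, b_red)
```

Both calls return the same pair, since the removed point contributes a zero term to $w^{∗}$ and is never selected as the support vector used for $b^{∗}$ .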

(e)

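The final hypothesis does not change when we leave out any data point with $α_{n}^{∗} = 0$ . Since the original hypothesis separates the data, it classifies $x_{n}$ correctly, so the leave-one-out error is $e_{n} = 0$ for every such point. Only the data points with $α_{n}^{∗} > 0$ can contribute an error, and each contributes $e_{n} \leq 1$ . Thus we have shown that

$$E_{CV} = \frac{1}{N} \sum_{n = 1}^{N} e_{n} \leq \frac{\text{number of } α_{n}^{∗} > 0}{N}$$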
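The bound only needs the dual solution, not a retraining loop. Below is a minimal sketch, assuming an illustrative dual vector `alpha_star`; the function name `loo_cv_bound` and the tolerance `tol` are assumptions for the example.

```python
def loo_cv_bound(alpha, tol=1e-12):
    """Upper bound on leave-one-out error: (# of alpha_n > 0) / N."""
    num_support = sum(1 for a in alpha if a > tol)
    return num_support / len(alpha)

# Assumed dual solution with three positive entries out of four.
alpha_star = [0.5, 0.5, 1.0, 0.0]
bound = loo_cv_bound(alpha_star)
print(bound)
```

The tolerance guards against solvers that return tiny positive values instead of exact zeros; with exact arithmetic it could be set to zero.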

niuers

2021-12-08 10:13
