Exercise 6.9

Answers

(a)
$e(f(x)) = P[f(x) \neq y] = \sum_{c=1}^{C} P[y = c,\, f(x) \neq c]$. Since $f(x) = \operatorname{argmax}_c \pi_c(x)$, suppose $f(x) = m$, where $\pi_m(x) = \max_c \pi_c(x)$. Then in the error sum, whenever $c \neq m$ we have $f(x) \neq c$, so

\[
e(f(x)) = \sum_{c=1}^{C} P[y = c,\, f(x) \neq c] = \sum_{c \neq m} P[y = c] = \sum_{c \neq m} \pi_c(x) = 1 - \pi_m(x) = \eta(x)
\]
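As a quick numerical sanity check of part (a): the optimal hypothesis picks the most probable class, so its pointwise error equals the total probability of the remaining classes. A minimal sketch, where the vector `pi` is an illustrative stand-in for $\pi_c(x)$:

```python
# Sanity check of part (a): the optimal hypothesis f picks the most
# probable class, so its pointwise error is eta(x) = 1 - max_c pi_c(x).
# `pi` is an illustrative conditional class-probability vector.
pi = [0.5, 0.3, 0.2]

m = max(range(len(pi)), key=lambda c: pi[c])         # argmax_c pi_c(x)
bayes_error = 1.0 - pi[m]                            # eta(x) = 1 - pi_m(x)
direct = sum(p for c, p in enumerate(pi) if c != m)  # sum_{c != m} pi_c(x)

assert abs(bayes_error - direct) < 1e-12
```

Both ways of computing the error agree, matching the chain of equalities above.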

(b)
Suppose the nearest neighbor of $x$ is $x_{[1]}$, with label $y_{[1]}$.

\begin{align*}
e(g_N(x)) &= P[g_N(x) \neq y] = \sum_{c=1}^{C} P[y = c,\, g_N(x) \neq c] = \sum_{c=1}^{C} P[y = c,\, y_{[1]} \neq c] \\
&= \sum_{c=1}^{C} P[y = c]\, P[y_{[1]} \neq c] = \sum_{c=1}^{C} \pi_c(x)\bigl(1 - \pi_c(x_{[1]})\bigr) \\
&= \sum_{c=1}^{C} \pi_c(x) - \pi_c(x)\pi_c(x_{[1]}) \\
&= \sum_{c=1}^{C} \pi_c(x) - \pi_c^2(x) + \pi_c^2(x) - \pi_c(x)\pi_c(x_{[1]}) \\
&= \sum_{c=1}^{C} \Bigl[\pi_c(x)\bigl(1 - \pi_c(x)\bigr) + \pi_c(x)\bigl(\pi_c(x) - \pi_c(x_{[1]})\bigr)\Bigr] \\
&= \sum_{c=1}^{C} \pi_c(x)\bigl(1 - \pi_c(x)\bigr) + \epsilon_N(x)
\end{align*}

where $\epsilon_N(x) = \sum_{c=1}^{C} \pi_c(x)\bigl(\pi_c(x) - \pi_c(x_{[1]})\bigr)$, and the factorization in the second line uses the conditional independence of $y$ and $y_{[1]}$ given $x$ and $x_{[1]}$.

Observe that

\begin{align*}
|\epsilon_N(x)| &= \Bigl|\sum_{c=1}^{C} \pi_c(x)\bigl(\pi_c(x) - \pi_c(x_{[1]})\bigr)\Bigr| \le \sum_{c=1}^{C} \pi_c(x)\,\bigl|\pi_c(x) - \pi_c(x_{[1]})\bigr| \\
&\le \Bigl(\sum_{c=1}^{C} \pi_c(x)\Bigr) \max_c \bigl|\pi_c(x) - \pi_c(x_{[1]})\bigr| = \max_c \bigl|\pi_c(x) - \pi_c(x_{[1]})\bigr|
\end{align*}

since $\sum_{c=1}^{C} \pi_c(x) = 1$.
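This bound on $\epsilon_N(x)$ is easy to check numerically. A sketch with randomly drawn probability vectors (the helper `rand_simplex` is illustrative, not from the text):

```python
import random

# Numerical check of |eps_N(x)| <= max_c |pi_c(x) - pi_c(x_[1])|.
random.seed(0)

def rand_simplex(C):
    """Draw a random probability vector of length C (illustrative helper)."""
    w = [random.random() for _ in range(C)]
    s = sum(w)
    return [x / s for x in w]

C = 5
for _ in range(1000):
    pi_x = rand_simplex(C)    # stand-in for pi_c(x)
    pi_nn = rand_simplex(C)   # stand-in for pi_c(x_[1])
    eps = sum(p * (p - q) for p, q in zip(pi_x, pi_nn))
    bound = max(abs(p - q) for p, q in zip(pi_x, pi_nn))
    assert abs(eps) <= bound + 1e-12
```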

When $N \to \infty$, we expect $\epsilon_N(x) \to 0$: when the data set gets very large, every point $x$ has a nearest neighbor that is close by, i.e. $x_{[1]} \to x$ for all $x$. This is the case if $P(x)$ has bounded support.

By the continuity of each $\pi_c(x)$, this implies $\pi_c(x_{[1]}) \to \pi_c(x)$, and since $|\epsilon_N(x)| \le \max_c |\pi_c(x) - \pi_c(x_{[1]})|$, it follows that $\epsilon_N(x) \to 0$.

So we conclude that $e(g_N(x)) \to \sum_{c=1}^{C} \pi_c(x)(1 - \pi_c(x))$ as $N \to \infty$, with high probability.

(c)
We first prove that for any $C$ numbers $a_i$, $i = 1, \dots, C$, we have $C \sum a_i^2 \ge \bigl(\sum a_i\bigr)^2$. This can be proved by noting that

\begin{align*}
\sum_{i=1}^{C} \sum_{j=1}^{C} (a_i - a_j)^2 &\ge 0 \\
\sum_{i=1}^{C} \Bigl(C a_i^2 + \sum_{j=1}^{C} a_j^2 - \sum_{j=1}^{C} 2 a_i a_j\Bigr) &\ge 0 \\
2C \sum_{i=1}^{C} a_i^2 - 2 \sum_{i=1}^{C} a_i \sum_{j=1}^{C} a_j &\ge 0 \\
C \sum_{i=1}^{C} a_i^2 &\ge \Bigl(\sum_{i=1}^{C} a_i\Bigr)^2
\end{align*}

Since we also have $\sum_{i=1}^{C} a_i = 1$, applying the above inequality to the $C-1$ numbers $a_2, a_3, \dots, a_C$ gives

\[
(C-1) \sum_{i \neq 1} a_i^2 \ge \Bigl(\sum_{i \neq 1} a_i\Bigr)^2 = (1 - a_1)^2.
\]

Adding $(C-1)a_1^2$ to both sides, we have

\[
(C-1) \sum_{i=1}^{C} a_i^2 \ge (C-1)a_1^2 + (1 - a_1)^2.
\]

Dividing both sides by $C-1$,

\[
\sum_{i=1}^{C} a_i^2 \ge a_1^2 + \frac{(1 - a_1)^2}{C - 1}.
\]
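This inequality can be verified numerically; note that the derivation only used Cauchy–Schwarz on the remaining $C-1$ entries, so it holds for any choice of $a_1$, not just the maximum. A sketch over random probability vectors (names are illustrative):

```python
import random

# Numerical check of sum_i a_i^2 >= a_1^2 + (1 - a_1)^2 / (C - 1)
# for probability vectors a with sum(a) = 1.
random.seed(1)

for _ in range(1000):
    C = random.randint(2, 8)
    w = [random.random() for _ in range(C)]
    a = [x / sum(w) for x in w]              # normalize so sum(a) == 1
    lhs = sum(x * x for x in a)              # sum of squares
    rhs = a[0] ** 2 + (1 - a[0]) ** 2 / (C - 1)
    assert lhs >= rhs - 1e-12
```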

Now, taking the expectation with respect to $x$ of the error of $g_N$, we have

\begin{align*}
E_{\text{out}}(g_N) &= E_{x}\Bigl[\sum_{c=1}^{C} \pi_c(x)\bigl(1 - \pi_c(x)\bigr)\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[\sum_{c=1}^{C} \bigl(\pi_c(x) - \pi_c^2(x)\bigr)\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[1 - \sum_{c=1}^{C} \pi_c^2(x)\Bigr] + E_{x}[\epsilon_N(x)]
\end{align*}

Applying the above inequality with $a_c = \pi_c(x)$ and $a_1 = \pi_1(x) = \max_c \pi_c(x)$, we have

\[
E_{\text{out}}(g_N) = E_{x}\Bigl[1 - \sum_{c=1}^{C} \pi_c^2(x)\Bigr] + E_{x}[\epsilon_N(x)] \le E_{x}\Bigl[1 - \pi_1^2(x) - \frac{(1 - \pi_1(x))^2}{C - 1}\Bigr] + E_{x}[\epsilon_N(x)]
\]

From part (a), we have $\eta(x) = 1 - \pi_1(x)$; substituting this into the above inequality, we have

\begin{align*}
E_{\text{out}}(g_N) &\le E_{x}\Bigl[1 - \pi_1^2(x) - \frac{(1 - \pi_1(x))^2}{C - 1}\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[1 - \bigl(1 - \eta(x)\bigr)^2 - \frac{\eta^2(x)}{C - 1}\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[2\eta(x) - \frac{C}{C - 1}\,\eta^2(x)\Bigr] + E_{x}[\epsilon_N(x)] \\
&= 2E_{\text{out}}^* - \frac{C}{C - 1}\,E_{x}[\eta^2(x)] + E_{x}[\epsilon_N(x)] \\
&\le 2E_{\text{out}}^* - \frac{C}{C - 1}\,\bigl(E_{\text{out}}^*\bigr)^2 + E_{x}[\epsilon_N(x)]
\end{align*}

where $E_{\text{out}}^* = E_{x}[\eta(x)]$ is the Bayes error and we have used $E_{x}[\eta^2(x)] \ge \bigl(E_{x}[\eta(x)]\bigr)^2 = \bigl(E_{\text{out}}^*\bigr)^2$ (Jensen's inequality).

As $N \to \infty$, we have $E_{x}[\epsilon_N(x)] \to 0$, so

\[
E_{\text{out}}(g_N) \le 2E_{\text{out}}^* - \frac{C}{C - 1}\,\bigl(E_{\text{out}}^*\bigr)^2
\]

This demonstrates that for the multiclass problem, the simple nearest neighbor rule is asymptotically at most a factor of 2 from optimal.
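The bound also holds pointwise before taking expectations: when $\pi_c(x_{[1]}) = \pi_c(x)$ (so $\epsilon_N(x) = 0$), the asymptotic NN error $\sum_c \pi_c(1 - \pi_c)$ should not exceed $2\eta - \frac{C}{C-1}\eta^2$. A sketch over random probability vectors (an illustrative check, not part of the proof):

```python
import random

# Pointwise check: sum_c pi_c(1 - pi_c) <= 2*eta - C/(C-1)*eta^2,
# where eta = 1 - max_c pi_c, for random probability vectors pi.
random.seed(2)

for _ in range(1000):
    C = random.randint(2, 6)
    w = [random.random() for _ in range(C)]
    pi = [x / sum(w) for x in w]
    eta = 1.0 - max(pi)                       # Bayes error at x
    nn_error = sum(p * (1 - p) for p in pi)   # asymptotic NN error at x
    bound = 2 * eta - C / (C - 1) * eta ** 2  # the derived bound
    assert nn_error <= bound + 1e-12
```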

2021-12-08 09:45