Exercise 6.7

Answers

(a)
If S is not training set consistent, and x is a point which is not classified consistently, then g_S(x) ≠ g_D(x). According to the CNN heuristic, we then look for the nearest data point to x which is not in S and has class g_D(x). We need to prove that such a point exists.

We first fix the number of nearest neighbours at k, and consider the neighbourhood N_S of x in S and the neighbourhood N_D of x in D. Both N_S and N_D contain the same number of points, namely k.

N_S and N_D cannot be equal, otherwise x would not be an inconsistent point. So there must be some point that is in N_D but not in N_S.

Now we claim that among these points of N_D that are not in N_S (and hence not in S, since any point of N_D that lies in S must also lie in N_S), there must be a point x′ whose class is g_D(x).

If no such point exists, then every point in N_D but not in N_S has the other class, g_S(x). Suppose the intersection of N_D and N_S has m points; then the k − m points of N_D outside N_S all have class g_S(x). Now compare the votes. N_S consists of the m common points plus k − m other points, and its majority vote is g_S(x) even though those k − m points are not required to all have class g_S(x). N_D consists of the same m common points plus k − m points that do all have class g_S(x), so the vote for g_S(x) in N_D is at least as large as in N_S. Hence the majority vote over N_D is also g_S(x), i.e. g_D(x) = g_S(x).
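In LaTeX notation, the vote count reads as follows (a sketch; n_c(A) is shorthand, not from the text, for the number of points of class c = g_S(x) in a set A, and k is assumed odd so that a majority is strict):

n_c(N_D) = n_c(N_D \cap N_S) + (k - m) \ge n_c(N_D \cap N_S) + n_c(N_S \setminus N_D) = n_c(N_S) > \frac{k}{2}

so the majority class over N_D is also g_S(x).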

This is a contradiction, so we have proved that the CNN heuristic will surely find a point x′ that is not in S and has class g_D(x).

(b)
Since the set S is a subset of D, any of the k nearest neighbours of x in D, i.e. any point of N_D, that belongs to S must also belong to the neighbourhood of x in S, i.e. to N_S. The newly added point x′ is in N_D, and the updated S is still a subset of D, so x′ is among the k nearest neighbours of x in the updated S as well. In other words, the new point becomes a neighbour of x in N_S.
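The same argument as a distance count (a sketch; d(·, ·) denotes the distance used by the k-NN rule): since x′ is one of the k nearest points to x in D, at most k − 1 points of D are strictly closer to x than x′, and the updated S is a subset of D, so

|\{ z \in S \cup \{x'\} : d(z, x) < d(x', x) \}| \le |\{ z \in D : d(z, x) < d(x', x) \}| \le k - 1

which means x′ is among the k nearest neighbours of x in the updated S.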
(c)
There are N points in total, and S starts with k of them. From part (a), as long as S is not training set consistent, each iteration adds one point to S. The full set D of N points is automatically training set consistent, so the process cannot add more than N − k points. Hence after at most N − k iterations the CNN heuristic must terminate with a training set consistent S.
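A minimal Python sketch of the whole heuristic, tying the three parts together. The implementation details are assumptions, not from the text: binary labels in {-1, +1}, Euclidean distance, k odd, and S seeded with the first k points; the function and variable names are illustrative.

import numpy as np

def knn_predict(X_ref, y_ref, x, k):
    # Majority vote of the k nearest points to x among X_ref (labels in {-1, +1}).
    dist = np.linalg.norm(X_ref - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return 1 if y_ref[nearest].sum() > 0 else -1

def cnn_condense(X, y, k=1):
    # Grow S until it is training set consistent: while some point x is
    # classified differently by S and by the full set D, add the nearest
    # point to x that is outside S and has class g_D(x).
    N = len(X)
    in_S = np.zeros(N, dtype=bool)
    in_S[:k] = True                                   # seed S with k points (assumed)
    while True:
        S = np.flatnonzero(in_S)
        bad = next((i for i in range(N)
                    if knn_predict(X[S], y[S], X[i], k)
                       != knn_predict(X, y, X[i], k)), None)
        if bad is None:                               # S is training set consistent
            return S
        target = knn_predict(X, y, X[bad], k)         # g_D(x) for the inconsistent point
        cand = np.flatnonzero(~in_S & (y == target))  # non-empty by part (a)
        dist = np.linalg.norm(X[cand] - X[bad], axis=1)
        in_S[cand[np.argmin(dist)]] = True            # add exactly one point

Each pass through the loop adds exactly one point, and the full index set is trivially consistent, so the loop ends after at most N − k additions, as in part (c). For instance, with X of shape (200, 2) and y = np.where(X[:, 0] > 0, 1, -1), cnn_condense(X, y, k=3) returns the indices of a training set consistent subset.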