Exercise 6.9

Answers

(a)
$e(f(x)) = P[f(x) \neq y] = \sum_{c=1}^{C} P[y = c,\, f(x) \neq c]$. Since $f(x) = \operatorname{argmax}_c \pi_c(x)$, suppose $f(x) = m$, where $\pi_m(x) = \max_c \pi_c(x)$. Then in the error sum, whenever $c \neq m$ we have $f(x) \neq c$, so

\[
e(f(x)) = \sum_{c=1}^{C} P[y = c,\, f(x) \neq c] = \sum_{c \neq m} P[y = c] = \sum_{c \neq m} \pi_c(x) = 1 - \pi_m(x) = \eta(x)
\]
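As a quick numerical sanity check of part (a): the optimal hypothesis picks the most probable class, so its pointwise error equals the total probability of the remaining classes. A minimal sketch, where the vector `pi` is an illustrative stand-in for $\pi_c(x)$:

```python
# Sanity check of part (a): the optimal hypothesis f picks the most
# probable class, so its pointwise error is eta(x) = 1 - max_c pi_c(x).
# `pi` is an illustrative conditional class-probability vector.
pi = [0.5, 0.3, 0.2]

m = max(range(len(pi)), key=lambda c: pi[c])         # argmax_c pi_c(x)
bayes_error = 1.0 - pi[m]                            # eta(x) = 1 - pi_m(x)
direct = sum(p for c, p in enumerate(pi) if c != m)  # sum_{c != m} pi_c(x)

assert abs(bayes_error - direct) < 1e-12
```

Both ways of computing the error agree, matching the chain of equalities above.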

(b)
Suppose the nearest neighbor of $x$ is $x_{[1]}$, with label $y_{[1]}$.

\begin{align*}
e(g_N(x)) &= P[g_N(x) \neq y] = \sum_{c=1}^{C} P[y = c,\, g_N(x) \neq c] = \sum_{c=1}^{C} P[y = c,\, y_{[1]} \neq c] \\
&= \sum_{c=1}^{C} P[y = c]\, P[y_{[1]} \neq c] = \sum_{c=1}^{C} \pi_c(x)\bigl(1 - \pi_c(x_{[1]})\bigr) \\
&= \sum_{c=1}^{C} \pi_c(x) - \pi_c(x)\pi_c(x_{[1]}) \\
&= \sum_{c=1}^{C} \pi_c(x) - \pi_c^2(x) + \pi_c^2(x) - \pi_c(x)\pi_c(x_{[1]}) \\
&= \sum_{c=1}^{C} \Bigl[\pi_c(x)\bigl(1 - \pi_c(x)\bigr) + \pi_c(x)\bigl(\pi_c(x) - \pi_c(x_{[1]})\bigr)\Bigr] \\
&= \sum_{c=1}^{C} \pi_c(x)\bigl(1 - \pi_c(x)\bigr) + \epsilon_N(x)
\end{align*}

where $\epsilon_N(x) = \sum_{c=1}^{C} \pi_c(x)\bigl(\pi_c(x) - \pi_c(x_{[1]})\bigr)$, and the factorization in the second line uses the conditional independence of $y$ and $y_{[1]}$ given $x$ and $x_{[1]}$.

Observe that

\begin{align*}
|\epsilon_N(x)| &= \Bigl|\sum_{c=1}^{C} \pi_c(x)\bigl(\pi_c(x) - \pi_c(x_{[1]})\bigr)\Bigr| \le \sum_{c=1}^{C} \pi_c(x)\,\bigl|\pi_c(x) - \pi_c(x_{[1]})\bigr| \\
&\le \Bigl(\sum_{c=1}^{C} \pi_c(x)\Bigr) \max_c \bigl|\pi_c(x) - \pi_c(x_{[1]})\bigr| = \max_c \bigl|\pi_c(x) - \pi_c(x_{[1]})\bigr|
\end{align*}

since $\sum_{c=1}^{C} \pi_c(x) = 1$.
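This bound on $\epsilon_N(x)$ is easy to check numerically. A sketch with randomly drawn probability vectors (the helper `rand_simplex` is illustrative, not from the text):

```python
import random

# Numerical check of |eps_N(x)| <= max_c |pi_c(x) - pi_c(x_[1])|.
random.seed(0)

def rand_simplex(C):
    """Draw a random probability vector of length C (illustrative helper)."""
    w = [random.random() for _ in range(C)]
    s = sum(w)
    return [x / s for x in w]

C = 5
for _ in range(1000):
    pi_x = rand_simplex(C)    # stand-in for pi_c(x)
    pi_nn = rand_simplex(C)   # stand-in for pi_c(x_[1])
    eps = sum(p * (p - q) for p, q in zip(pi_x, pi_nn))
    bound = max(abs(p - q) for p, q in zip(pi_x, pi_nn))
    assert abs(eps) <= bound + 1e-12
```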

When $N \to \infty$, we expect $\epsilon_N(x) \to 0$: when the data set gets very large, every point $x$ has a nearest neighbor that is close by, i.e. $x_{[1]} \to x$ for all $x$. This is the case if $P(x)$ has bounded support.

By the continuity of each $\pi_c(x)$, this implies $\pi_c(x_{[1]}) \to \pi_c(x)$, and since $|\epsilon_N(x)| \le \max_c |\pi_c(x) - \pi_c(x_{[1]})|$, it follows that $\epsilon_N(x) \to 0$.

So we conclude that $e(g_N(x)) \to \sum_{c=1}^{C} \pi_c(x)(1 - \pi_c(x))$ as $N \to \infty$, with high probability.

(c)
We first prove that for any $C$ numbers $a_i$, $i = 1, \dots, C$, we have $C \sum a_i^2 \ge \bigl(\sum a_i\bigr)^2$. This can be proved by noting that

\begin{align*}
\sum_{i=1}^{C} \sum_{j=1}^{C} (a_i - a_j)^2 &\ge 0 \\
\sum_{i=1}^{C} \Bigl(C a_i^2 + \sum_{j=1}^{C} a_j^2 - \sum_{j=1}^{C} 2 a_i a_j\Bigr) &\ge 0 \\
2C \sum_{i=1}^{C} a_i^2 - 2 \sum_{i=1}^{C} a_i \sum_{j=1}^{C} a_j &\ge 0 \\
C \sum_{i=1}^{C} a_i^2 &\ge \Bigl(\sum_{i=1}^{C} a_i\Bigr)^2
\end{align*}

Since we also have $\sum_{i=1}^{C} a_i = 1$, applying the above inequality to the $C-1$ numbers $a_2, a_3, \dots, a_C$ gives

\[
(C-1) \sum_{i \neq 1} a_i^2 \ge \Bigl(\sum_{i \neq 1} a_i\Bigr)^2 = (1 - a_1)^2.
\]

Adding $(C-1)a_1^2$ to both sides, we have

\[
(C-1) \sum_{i=1}^{C} a_i^2 \ge (C-1)a_1^2 + (1 - a_1)^2.
\]

Dividing both sides by $C-1$,

\[
\sum_{i=1}^{C} a_i^2 \ge a_1^2 + \frac{(1 - a_1)^2}{C - 1}.
\]
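This inequality can be verified numerically; note that the derivation only used Cauchy–Schwarz on the remaining $C-1$ entries, so it holds for any choice of $a_1$, not just the maximum. A sketch over random probability vectors (names are illustrative):

```python
import random

# Numerical check of sum_i a_i^2 >= a_1^2 + (1 - a_1)^2 / (C - 1)
# for probability vectors a with sum(a) = 1.
random.seed(1)

for _ in range(1000):
    C = random.randint(2, 8)
    w = [random.random() for _ in range(C)]
    a = [x / sum(w) for x in w]              # normalize so sum(a) == 1
    lhs = sum(x * x for x in a)              # sum of squares
    rhs = a[0] ** 2 + (1 - a[0]) ** 2 / (C - 1)
    assert lhs >= rhs - 1e-12
```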

Now, taking the expectation with respect to $x$ of the error of $g_N$, we have

\begin{align*}
E_{\text{out}}(g_N) &= E_{x}\Bigl[\sum_{c=1}^{C} \pi_c(x)\bigl(1 - \pi_c(x)\bigr)\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[\sum_{c=1}^{C} \bigl(\pi_c(x) - \pi_c^2(x)\bigr)\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[1 - \sum_{c=1}^{C} \pi_c^2(x)\Bigr] + E_{x}[\epsilon_N(x)]
\end{align*}

Applying the above inequality with $a_c = \pi_c(x)$ and $a_1 = \pi_1(x) = \max_c \pi_c(x)$, we have

\[
E_{\text{out}}(g_N) = E_{x}\Bigl[1 - \sum_{c=1}^{C} \pi_c^2(x)\Bigr] + E_{x}[\epsilon_N(x)] \le E_{x}\Bigl[1 - \pi_1^2(x) - \frac{(1 - \pi_1(x))^2}{C - 1}\Bigr] + E_{x}[\epsilon_N(x)]
\]

From part (a), we have $\eta(x) = 1 - \pi_1(x)$; substituting this into the above inequality, we have

\begin{align*}
E_{\text{out}}(g_N) &\le E_{x}\Bigl[1 - \pi_1^2(x) - \frac{(1 - \pi_1(x))^2}{C - 1}\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[1 - \bigl(1 - \eta(x)\bigr)^2 - \frac{\eta^2(x)}{C - 1}\Bigr] + E_{x}[\epsilon_N(x)] \\
&= E_{x}\Bigl[2\eta(x) - \frac{C}{C - 1}\,\eta^2(x)\Bigr] + E_{x}[\epsilon_N(x)] \\
&= 2E_{\text{out}}^* - \frac{C}{C - 1}\,E_{x}[\eta^2(x)] + E_{x}[\epsilon_N(x)] \\
&\le 2E_{\text{out}}^* - \frac{C}{C - 1}\,\bigl(E_{\text{out}}^*\bigr)^2 + E_{x}[\epsilon_N(x)]
\end{align*}

where $E_{\text{out}}^* = E_{x}[\eta(x)]$ is the Bayes error and we have used $E_{x}[\eta^2(x)] \ge \bigl(E_{x}[\eta(x)]\bigr)^2 = \bigl(E_{\text{out}}^*\bigr)^2$ (Jensen's inequality).

As $N \to \infty$, we have $E_{x}[\epsilon_N(x)] \to 0$, so

\[
E_{\text{out}}(g_N) \le 2E_{\text{out}}^* - \frac{C}{C - 1}\,\bigl(E_{\text{out}}^*\bigr)^2
\]

This demonstrates that for the multiclass problem, the simple nearest neighbor rule is asymptotically at most a factor of 2 from optimal.
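The bound also holds pointwise before taking expectations: when $\pi_c(x_{[1]}) = \pi_c(x)$ (so $\epsilon_N(x) = 0$), the asymptotic NN error $\sum_c \pi_c(1 - \pi_c)$ should not exceed $2\eta - \frac{C}{C-1}\eta^2$. A sketch over random probability vectors (an illustrative check, not part of the proof):

```python
import random

# Pointwise check: sum_c pi_c(1 - pi_c) <= 2*eta - C/(C-1)*eta^2,
# where eta = 1 - max_c pi_c, for random probability vectors pi.
random.seed(2)

for _ in range(1000):
    C = random.randint(2, 6)
    w = [random.random() for _ in range(C)]
    pi = [x / sum(w) for x in w]
    eta = 1.0 - max(pi)                       # Bayes error at x
    nn_error = sum(p * (1 - p) for p in pi)   # asymptotic NN error at x
    bound = 2 * eta - C / (C - 1) * eta ** 2  # the derived bound
    assert nn_error <= bound + 1e-12
```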

2021-12-08 09:45