Exercise 6.1 - Pessimism of LOOCV

Answers

Under the setting in this problem, the best accuracy any classification method can achieve is 50%, since the features are independent of the labels. This is, however, the idealistic situation in which we assume that

$$N \to \infty,$$

where $N = N_1 = N_2$. When $N$ is finite, the classification accuracy merely fluctuates around 50%. For the labels to behave as if randomly assigned, any (probabilistic) algorithm/machine that terminates within time polynomial in $N_i$ should have a classification accuracy that deviates from 50% by no more than $\frac{1}{p(N)}$, where $p$ is an arbitrary polynomial. Such a machine is referred to as a probabilistic polynomial-time (PPT) machine and is of remarkable interest to cryptologists: generating cheap pseudo-random numbers efficiently is one of the most important tasks in cryptography.
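As a quick illustration of the finite-sample fluctuation (the function name and setup below are my own, not part of the exercise), any fixed classification rule evaluated on labels drawn independently of the features attains an empirical accuracy that merely hovers around 50%, with deviations shrinking as the sample size grows:

```python
import random

def empirical_accuracy(rule, n_samples, seed=0):
    """Accuracy of a fixed classification rule on data whose binary
    labels are drawn independently of everything else."""
    rng = random.Random(seed)
    hits = sum(rule(i) == rng.randrange(2) for i in range(n_samples))
    return hits / n_samples

# Any fixed rule (here: always predict class 0) fluctuates around 50%.
for n in (20, 200, 2000, 20000):
    print(n, empirical_accuracy(lambda x: 0, n))
```

The fluctuation is binomial, so its standard deviation decays as $1/\sqrt{n}$.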

For LOOCV, if a sample from class one is omitted, then a classifier that assigns labels in proportion to the class frequencies of the remaining $2N_1 - 1$ training samples predicts the two classes with probabilities:

$$\left(\frac{N_1 - 1}{2N_1 - 1},\; \frac{N_2}{2N_1 - 1}\right).$$

Hence the probability that the omitted sample is correctly labelled is:

$$\frac{N_1 - 1}{2N_1 - 1} < \frac{1}{2},$$

which justifies the pessimism of LOOCV in this setting.
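The argument can be checked numerically. The sketch below (a minimal simulation of my own, assuming the frequency-proportional random classifier described above) runs leave-one-out trials on a balanced, randomly labelled dataset and compares the empirical accuracy with the closed-form probability $(N_1 - 1)/(2N_1 - 1)$:

```python
import random

def loocv_accuracy(n_per_class=50, trials=20000, seed=0):
    """Monte Carlo estimate of LOOCV accuracy for a classifier that
    predicts a label in proportion to the class frequencies of the
    training fold, on labels independent of the features."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    correct = 0
    for _ in range(trials):
        # Leave one sample out, chosen uniformly at random
        # (equivalent in expectation to averaging over all folds).
        held_out = rng.randrange(len(labels))
        train = labels[:held_out] + labels[held_out + 1:]
        # Predict by sampling a label uniformly from the remaining
        # 2N - 1 training labels, i.e. in proportion to frequency.
        prediction = rng.choice(train)
        correct += prediction == labels[held_out]
    return correct / trials

n = 50
exact = (n - 1) / (2 * n - 1)  # closed-form probability of a correct label
print(f"exact: {exact:.4f}, simulated: {loocv_accuracy(n_per_class=n):.4f}")
```

Both values sit strictly below 0.5, matching the inequality above.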

The dependency between features and labels is what a learner exploits, and removing it (decoupling the features from the labels) is relevant to security and forensics in machine learning. Note, however, an asymmetry: showing that a dataset can be learned is extremely simple, since it suffices to exhibit one successful learner, whereas it is essentially impossible to prove that a dataset cannot be learned by any machine.

2021-03-24 13:42