Exercise 6.1 - Pessimism of LOOCV

Answers

Under the setting in this problem, the best accuracy any classification method can achieve is 50%, since the features are independent of the labels. This is, however, the idealistic situation in which we assume that

$$N \to \infty,$$

where $N = N_1 = N_2$. When $N$ is finite, the classification accuracy merely fluctuates around 50%. For the labels to behave as if randomly assigned, any (probabilistic) algorithm/machine that terminates within time polynomial in $N_i$ should have a classification accuracy that deviates from 50% by no more than $\frac{1}{p(N)}$, where $p$ is an arbitrary polynomial. Such a machine is referred to as a probabilistic polynomial-time (PPT) machine and is of remarkable interest to cryptologists: generating cheap pseudo-random numbers efficiently is one of the most important tasks in cryptography.
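As a quick illustration of the finite-sample fluctuation (the function name and setup below are my own, not part of the exercise), any fixed classification rule evaluated on labels drawn independently of the features attains an empirical accuracy that merely hovers around 50%, with deviations shrinking as the sample size grows:

```python
import random

def empirical_accuracy(rule, n_samples, seed=0):
    """Accuracy of a fixed classification rule on data whose binary
    labels are drawn independently of everything else."""
    rng = random.Random(seed)
    hits = sum(rule(i) == rng.randrange(2) for i in range(n_samples))
    return hits / n_samples

# Any fixed rule (here: always predict class 0) fluctuates around 50%.
for n in (20, 200, 2000, 20000):
    print(n, empirical_accuracy(lambda x: 0, n))
```

The fluctuation is binomial, so its standard deviation decays as $1/\sqrt{n}$.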

For LOOCV, if a sample from class one is omitted, then a classifier that assigns labels in proportion to the class frequencies of the remaining $2N_1 - 1$ training samples predicts the two classes with probabilities:

$$\left(\frac{N_1 - 1}{2N_1 - 1},\; \frac{N_2}{2N_1 - 1}\right).$$

Hence the probability that the omitted sample is correctly labelled is:

$$\frac{N_1 - 1}{2N_1 - 1} < \frac{1}{2},$$

which justifies the pessimism of LOOCV in this setting.
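The argument can be checked numerically. The sketch below (a minimal simulation of my own, assuming the frequency-proportional random classifier described above) runs leave-one-out trials on a balanced, randomly labelled dataset and compares the empirical accuracy with the closed-form probability $(N_1 - 1)/(2N_1 - 1)$:

```python
import random

def loocv_accuracy(n_per_class=50, trials=20000, seed=0):
    """Monte Carlo estimate of LOOCV accuracy for a classifier that
    predicts a label in proportion to the class frequencies of the
    training fold, on labels independent of the features."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    correct = 0
    for _ in range(trials):
        # Leave one sample out, chosen uniformly at random
        # (equivalent in expectation to averaging over all folds).
        held_out = rng.randrange(len(labels))
        train = labels[:held_out] + labels[held_out + 1:]
        # Predict by sampling a label uniformly from the remaining
        # 2N - 1 training labels, i.e. in proportion to frequency.
        prediction = rng.choice(train)
        correct += prediction == labels[held_out]
    return correct / trials

n = 50
exact = (n - 1) / (2 * n - 1)  # closed-form probability of a correct label
print(f"exact: {exact:.4f}, simulated: {loocv_accuracy(n_per_class=n):.4f}")
```

Both values sit strictly below 0.5, matching the inequality above.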

The dependency between features and labels is what a learner exploits, and removing it (decoupling the features from the labels) is relevant to security and forensics in machine learning. Note, however, an asymmetry: showing that a dataset can be learned is extremely simple, since it suffices to exhibit one successful learner, whereas it is essentially impossible to prove that a dataset cannot be learned by any machine.

2021-03-24 13:42