Homepage › Solution manuals › Kevin P. Murphy › Machine Learning: a Probabilistic Perspective › Exercise 6.1 - Pessimism of LOOCV
Exercise 6.1 - Pessimism of LOOCV
Answers
Under the setting in this problem, the best accuracy any classification method can achieve is 50%, since the features are independent of the labels. This is, however, the idealistic situation where we assume that

$$N \to \infty,$$

where $N$ is the number of samples. When $N$ is finite, the classification accuracy usually fluctuates around 50%. For the labels to behave as if randomly assigned, we should have that for an arbitrary (probabilistic) algorithm/machine that terminates within time polynomial in $N$, its classification accuracy does not deviate from 50% by more than $1/p(N)$, where $p$ is an arbitrary polynomial. Such a machine is referred to as a probabilistic polynomial-time (PPT) machine and is of remarkable interest to cryptologists: generating cheap pseudo-random numbers efficiently is one of the most important tasks in cryptography.
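The finite-$N$ fluctuation around 50% can be checked with a quick Monte Carlo sketch (the function names and the choice of a majority-vote classifier are illustrative assumptions, not part of the exercise statement): since the features carry no information, even the best rule, predicting the majority training label, scores only about 50% on fresh data.

```python
import random

def trial(n, seed):
    """One experiment: train and test labels are pure coin flips,
    so features carry no information about the label."""
    rng = random.Random(seed)
    train = [rng.randint(0, 1) for _ in range(n)]
    test = [rng.randint(0, 1) for _ in range(n)]
    # Best available rule: always predict the majority training label.
    majority = int(2 * sum(train) >= n)
    return sum(y == majority for y in test) / n

# Repeat the experiment; individual accuracies fluctuate around 0.5,
# but the average is very close to 0.5.
accs = [trial(1000, s) for s in range(200)]
mean = sum(accs) / len(accs)
```

The per-trial spread shrinks on the order of $1/\sqrt{N}$, consistent with accuracy hovering near 50% for large $N$.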
For LOOCV, if a sample from class one is omitted, the remaining training set contains $N/2 - 1$ samples of class one and $N/2$ samples of class two, so the majority-class classifier predicts class two:

$$\hat{y} = \operatorname*{argmax}_{c \in \{1, 2\}} N_c = 2.$$

Hence the probability that the omitted sample is correctly labelled is

$$P(\hat{y} = 1) = 0,$$

and by symmetry the same holds when a sample from class two is omitted. The LOOCV accuracy is therefore 0%, far below the true 50%, which justifies the pessimism of LOOCV in this setting.
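The derivation above can be verified with a short simulation (the helper name `loocv_accuracy` and the tie-breaking rule are my own assumptions): on a balanced label set, leaving any sample out flips the majority to the opposite class, so every fold is misclassified.

```python
def loocv_accuracy(labels):
    """LOOCV accuracy of the majority-class rule when features are uninformative."""
    n = len(labels)
    correct = 0
    for i in range(n):
        rest = labels[:i] + labels[i + 1:]
        ones = sum(rest)
        # Predict the majority label of the remaining N-1 samples
        # (ties broken toward class 0; no ties occur for balanced labels).
        majority = 1 if ones > len(rest) - ones else 0
        correct += (majority == labels[i])
    return correct / n

# Balanced labels: N/2 zeros and N/2 ones.
labels = [0] * 10 + [1] * 10
loocv_accuracy(labels)  # -> 0.0: the held-out class is always the minority
```

Every held-out sample leaves its own class with $N/2 - 1$ members against the other class's $N/2$, so the prediction is always wrong, reproducing the 0% LOOCV accuracy computed above.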
The independence between features and labels suggests a way to deliberately make data unlearnable, which is of interest for security and forensics in machine learning. However, proving that some dataset cannot be learned by any machine is extremely hard, whereas the converse (demonstrating that a dataset can be learned) is extremely simple: it suffices to exhibit one machine that learns it.