Exercise 6.6

Answers

In a 10-fold cross validation, suppose we have data folds as D1,,D10, each fold will have N 10 data points. For a given k and a given fold Di, for each point x in Di, we need compute its distance with all other points in d dimension, which requires O(Nd) computations. We then need select the smallest k distances, which requires O(Nlog k) computations. So for each point in fold Di, we need O(Nd + Nlog k) computations.

We need to do the same for all N 10 points in fold Di, so the total computation for a given fold is O(N 10(Nd + Nlog k)), with 10 folds, we have total cost of O(N2d + N2 log k) for a given k.

Now sum up over k, we have the total cost

T = j=1N+1 2 O(N2d+N2 log (2j1)) = O(N2dN + 1 2 )+O(N2 j=1N+1 2 log (2j1)).

The first term is approximately on the order of O(N3d)

For the second term, we have

j=1N+1 2 log (2j 1)) j=1N+1 2 log (2j)) N + 1 2 log 2 + log (N + 1 2 )!

By Sterling approximation formula, we have

ln n! = nln n n + O(ln n),

so

log (N + 1 2 )! N + 1 2 log N + 1 2 N + 1 2 + O(log N + 1 2 ) = O(Nlog N N).

Then

O(N2 j=1N+1 2 log (2j 1)) = O(N2(Nlog N N)) = O(N3 log N N3).

Adding up with the first term, we have

T = O(N3d + N3 log N).

User profile picture
2021-12-08 09:40
Comments