Homepage › Solution manuals › Yaser Abu-Mostafa › Learning from Data › Exercise 6.8

Exercise 6.8

Answers

(a)

Pseudo code for the recursive branch and bound search for the nearest neighbor.

Given a test point $x$ , we search for its nearest neighbor in the data set $D$ .
We look for algorithm called $F (S, x)$ to find the nearest neighbor of $x$ in a given cluster $S$ . Let’s initialize with the whole data set as our first cluster $D$ . From clustering algorithm, we should know its two children clusters, their centers, $μ_{1}, μ 2$ , and radii $r_{1}, r_{2}$ .
Here is how we implement $F (S, x)$ :
(a)
Set the nearest neighbor be $x_{nn} = None$ as in python.
(b)
If the input cluster has no children cluster, i.e. it has at most 2 data points
- Find the nearest neighbor in the cluster by computing the distance from $x$ to all the points in the cluster. There are at most 2 computations needed since we assume every cluster with more than 2 points is split into 2 sub-clusters.
- Return this point as the nearest neighbor $x_{nn}$ in the input cluster to $x$
- Else, suppose the two children clusters of input cluster are: $S_{1}, S_{2}$
- If $∥ x - μ_{1} ∥ \leq ∥ x - μ_{2} ∥$ : let $x_{nn} = F (S_{1}, x)$
- Else: let $x_{nn} = F (S_{2}, x)$
(c)
Now we have $x_{nn}$ , let’s call the sibling cluster of $S$ be $S^{∗}$ , let’s compute the bound condition
(d)
If ∥x − xnn∥ > ∥x − μ∗∥− r∗: (Need check cluster S∗ for the nearest neighbor)
- $x_{nn}^{∗} = F (S^{∗}, x)$
- If $∥ x - x_{nn}^{∗} ∥ < ∥ x - x_{nn} ∥$ : $x_{nn} = x_{nn}^{∗}$
(e)
Return $x_{nn}$
We call $F (S, x)$ with $S = D$ to get the nearest neighbor in the data set to $x$

(b)

If the sub-clusters of a cluster are balanced, and assume the number of data points is power of 2, i.e.

N = 2^{k}

, the maximum depth of the recursive search for a nearest neighbor can be

\log_{2} N = k

(c)

If we have balanced sub-clusters and if the bound conditions always hold, then the if-condition in step 5 will be always False, so we just need to go through

\log_{2} N

steps, each with a computation of

d

for distance in the

R^{d}

space. The total cost is thus

d \log_{2} N

(d)

To apply the branch and bound algorithm to

k

-nearest neighbors, assume that every cluster with more than

k

points is split into 2 sub-clusters. Instead of search for the nearest neighbor in a cluster, we search for

k

-nearest neighbors in a clusters, then apply the bound condition on the

k

-th neighbor with another cluster.

niuers

2021-12-08 09:44

Exercise 6.8

Answers

Comments

Add answer