Homepage › Solution manuals › Yaser Abu-Mostafa › Learning from Data › Exercise 6.8
Exercise 6.8
Answers
- (a)
- Pseudo code for the recursive branch and bound search for the nearest
neighbor.
- Given a test point , we search for its nearest neighbor in the data set .
- We look for algorithm called to find the nearest neighbor of in a given cluster . Let’s initialize with the whole data set as our first cluster . From clustering algorithm, we should know its two children clusters, their centers, , and radii .
-
Here is how we implement :
- (a)
- Set the nearest neighbor be as in python.
- (b)
- If the input cluster has no children cluster, i.e. it has at most 2 data
points
- Find the nearest neighbor in the cluster by computing the distance from to all the points in the cluster. There are at most 2 computations needed since we assume every cluster with more than 2 points is split into 2 sub-clusters.
- Return this point as the nearest neighbor in the input cluster to
- Else, suppose the two children clusters of input cluster are:
- If : let
- Else: let
- (c)
- Now we have , let’s call the sibling cluster of be , let’s compute the bound condition
- (d)
- If : (Need
check cluster
for the nearest neighbor)
- If :
- (e)
- Return
- We call with to get the nearest neighbor in the data set to
- (b)
- If the sub-clusters of a cluster are balanced, and assume the number of data points is power of 2, i.e. , the maximum depth of the recursive search for a nearest neighbor can be
- (c)
- If we have balanced sub-clusters and if the bound conditions always hold, then the if-condition in step 5 will be always False, so we just need to go through steps, each with a computation of for distance in the space. The total cost is thus
- (d)
- To apply the branch and bound algorithm to -nearest neighbors, assume that every cluster with more than points is split into 2 sub-clusters. Instead of search for the nearest neighbor in a cluster, we search for -nearest neighbors in a clusters, then apply the bound condition on the -th neighbor with another cluster.
2021-12-08 09:44