Exercise 6.1

Answers

(a)
Let $x = (0, 1)$ and $y = (0.01, 1)$. Then the Euclidean distance is $d(x, y) = \|x - y\| = 0.01$ and the cosine similarity is $\mathrm{CosSim}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \approx 0.99995$.

These two points have a very high cosine similarity and a very small Euclidean distance.

Let $x = (1, 0)$ and $y = (-1, 0)$. Then the Euclidean distance is $d(x, y) = \|x - y\| = 2$ and the cosine similarity is $\mathrm{CosSim}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = -1$.

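import numpy as np

x = np.array([0, 1])
y = np.array([0.01, 1])

# Euclidean distance between x and y
d = np.linalg.norm(x-y)
print('Euclidean distance: ', d)

# Cosine similarity: dot product divided by both norms
xdoty = np.dot(x,y)
cos = xdoty/np.linalg.norm(x)/np.linalg.norm(y)
print('Cosine similarity: ', cos)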
     Euclidean distance:  0.01
     Cosine similarity:  0.9999500037496877
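
The second pair in (a) can be checked the same way. This is a minimal sketch, assuming the pair is x = (1, 0) and y = (-1, 0), which is the reading consistent with the stated distance of 2:

```python
import numpy as np

# Assumed second pair from part (a): two opposite unit vectors.
x = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])

# Euclidean distance between x and y
d = np.linalg.norm(x - y)

# Cosine similarity: dot product divided by the product of the norms
cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print('Euclidean distance: ', d)
print('Cosine similarity: ', cos)
```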

(b)
Suppose $x$ and $y$ are given with respect to the original origin. If we move the origin to a point $P$, the coordinates with respect to the new origin are $x' = x - P$ and $y' = y - P$. Since $x' - y' = x - y$, the Euclidean distance does not change. The cosine similarity, however, changes in general. To see this, take two vectors that are perpendicular to each other: after moving to a new origin, they are generally no longer perpendicular.
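
The argument in (b) can be checked numerically. The sketch below is illustrative only: the helper `cos_sim` and the new origin `P = (-1, -1)` are hypothetical choices, not part of the exercise. It starts from two perpendicular vectors and compares both measures before and after the shift.

```python
import numpy as np

def cos_sim(u, v):
    """Cosine similarity of two nonzero vectors (hypothetical helper)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Two perpendicular vectors with respect to the original origin
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# Hypothetical new origin; coordinates relative to it are x - P and y - P
P = np.array([-1.0, -1.0])
x_new, y_new = x - P, y - P

dist_before = np.linalg.norm(x - y)
dist_after = np.linalg.norm(x_new - y_new)  # unchanged, since P cancels

cos_before = cos_sim(x, y)          # perpendicular vectors
cos_after = cos_sim(x_new, y_new)   # no longer perpendicular

print('Distance before/after: ', dist_before, dist_after)
print('Cosine before/after:   ', cos_before, cos_after)
```

With this choice of `P` the shifted vectors are (2, 1) and (1, 2), so the cosine similarity goes from 0 to 4/5 while the distance stays at the square root of 2.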

This restricts how features can be preprocessed. If we want to use cosine similarity, we cannot freely shift the features, e.g. by mean subtraction, because that moves the origin. This may hurt algorithms that perform badly when there are large differences between features.
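
As a rough illustration of the mean-subtraction case, the sketch below centers a small made-up data matrix `X` (rows are samples; the values are hypothetical) and compares the cosine similarity of the first two rows before and after centering.

```python
import numpy as np

def cos_sim(u, v):
    """Cosine similarity of two nonzero vectors (hypothetical helper)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up data: each row is a sample, each column a feature
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 9.0]])

# Mean subtraction: shift every feature so its column mean is zero
X_centered = X - X.mean(axis=0)

# Rows 0 and 1 point in the same direction before centering
cos_raw = cos_sim(X[0], X[1])
cos_centered = cos_sim(X_centered[0], X_centered[1])

print('Cosine similarity (raw):      ', cos_raw)
print('Cosine similarity (centered): ', cos_centered)
```

Here the first two samples are collinear in the raw data, so their cosine similarity is 1, but after centering it drops below 1 because the origin has moved to the feature means.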

2021-12-08 09:37