Exercise 6.1
Answers
- (a)
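- Let $\mathbf{x}$ and $\mathbf{y}$ be two vectors that point in (almost) the same direction but have very different lengths. Then the Euclidean distance $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$ is large, while the cosine similarity $\mathrm{CosSim}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}$ is close to 1.
They have very high cosine similarity but very low Euclidean distance similarity.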
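A quick numerical check may help here. The sketch below uses two hypothetical pairs of vectors (chosen for illustration, not taken from the original answer): one pair with high cosine similarity but low Euclidean distance similarity, and one pair showing the reverse situation. The helper names `euclidean_distance` and `cosine_similarity` are also assumptions made for this example.

```python
import numpy as np


def euclidean_distance(x, y):
    """Euclidean distance ||x - y|| between two vectors."""
    return float(np.linalg.norm(x - y))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be non-zero)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


# Hypothetical pair 1: same direction, very different lengths.
x1, y1 = np.array([0.0, 1.0]), np.array([0.0, 100.0])
d_far = euclidean_distance(x1, y1)    # large distance: low distance similarity
cos_far = cosine_similarity(x1, y1)   # parallel vectors: cosine similarity 1

# Hypothetical pair 2: two short vectors at a right angle.
x2, y2 = np.array([0.0, 0.01]), np.array([0.01, 0.0])
d_near = euclidean_distance(x2, y2)   # small distance: high distance similarity
cos_near = cosine_similarity(x2, y2)  # orthogonal vectors: cosine similarity 0

print('pair 1:', d_far, cos_far)
print('pair 2:', d_near, cos_near)
```

Pair 1 is far apart (distance 99) yet perfectly aligned, while pair 2 is very close together (distance below 0.02) yet orthogonal, so the two measures can disagree in either direction.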
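Let $\mathbf{x} = (0, 1)$ and $\mathbf{y} = (0.01, 1)$, then we have the Euclidean distance $d(\mathbf{x}, \mathbf{y}) = 0.01$ and the cosine similarity $\mathrm{CosSim}(\mathbf{x}, \mathbf{y}) \approx 0.99995$.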
    import numpy as np

    x = np.array([0, 1])
    y = np.array([0.01, 1])

    # Euclidean distance ||x - y||
    d = np.linalg.norm(x - y)
    print('Euclidean distance:', d)

    # Cosine similarity x.y / (||x|| ||y||)
    xdoty = np.dot(x, y)
    cos = xdoty / np.linalg.norm(x) / np.linalg.norm(y)
    print('Cosine similarity:', cos)

Output:

    Euclidean distance: 0.01
    Cosine similarity: 0.9999500037496877

- (b)
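- Suppose we have two points $\mathbf{x}$ and $\mathbf{y}$ w.r.t. the original origin. If the origin of the coordinate system changes, say we move the origin to $\mathbf{a}$, then we have $\mathbf{x}' = \mathbf{x} - \mathbf{a}$ and $\mathbf{y}' = \mathbf{y} - \mathbf{a}$ w.r.t. the new origin, and $\|\mathbf{x}' - \mathbf{y}'\| = \|\mathbf{x} - \mathbf{y}\|$, so it's easy to see that the Euclidean distance similarity doesn't change. The cosine similarity, however, does change in general. This can be seen by checking two vectors that are perpendicular to each other: since $\mathbf{x}' \cdot \mathbf{y}' = \mathbf{x} \cdot \mathbf{y} - \mathbf{a} \cdot (\mathbf{x} + \mathbf{y}) + \|\mathbf{a}\|^2$, after we move to a new origin they are in general no longer perpendicular to each other.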
This puts a restriction on the choice of features: if we want to use cosine similarity, we cannot freely shift the features, e.g. by mean subtraction, because doing so changes the similarities. This may be a problem for algorithms that perform badly when different features differ greatly in magnitude, since such preprocessing is then not available without side effects.
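The origin-shift argument in (b) can also be checked numerically. The following is a minimal sketch, assuming two perpendicular vectors and a new origin `a` that are picked purely for illustration; the helper name `cosine_similarity` is likewise an assumption of this example.

```python
import numpy as np


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be non-zero)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


# Two perpendicular vectors w.r.t. the original origin (illustrative values).
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# Hypothetical new origin; coordinates w.r.t. it are x - a and y - a.
a = np.array([-1.0, -1.0])
x_new, y_new = x - a, y - a

# The Euclidean distance is unchanged, because (x - a) - (y - a) = x - y.
d_old = float(np.linalg.norm(x - y))
d_new = float(np.linalg.norm(x_new - y_new))

# The cosine similarity changes: the shifted vectors are no longer perpendicular.
cos_old = cosine_similarity(x, y)
cos_new = cosine_similarity(x_new, y_new)

print('distance before/after shift:', d_old, d_new)
print('cosine similarity before/after shift:', cos_old, cos_new)
```

In this example the distance stays at $\sqrt{2}$ in both coordinate systems, while the cosine similarity moves from 0 to about 0.8, which is the same effect mean subtraction would have on real feature vectors.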