Exercise 12.4 - Deriving the second principal component
Answers
For this exercise, minimizing $J(\mathbf{v}_2, \mathbf{z}_2)$ makes $\mathbf{v}_1$ and $\mathbf{v}_2$ the first and second principal components for the dataset $\{\mathbf{x}_i\}_{i=1}^{n}$. By definition, $z_{i1} = \mathbf{v}_1^T\mathbf{x}_i$ (where $\|\mathbf{v}_1\| = 1$) is the projection of $\mathbf{x}_i$ onto $\mathbf{v}_1$.
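To make the notation concrete, here is a minimal numerical sketch (not from the original solution; the toy data, dimensions, and names such as `X`, `C`, `v1`, `v2` are illustrative assumptions) that builds the empirical covariance $\mathbf{C}$ and reads off the first two principal components as its leading eigenvectors, with $z_{ij} = \mathbf{v}_j^T\mathbf{x}_i$:

```python
import numpy as np

# Hypothetical toy dataset: n samples in d dimensions (illustration only).
rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d)) @ np.diag([3.0, 2.0, 0.5])  # anisotropic spread
X = X - X.mean(axis=0)                                   # center the data

# Empirical covariance C = (1/n) * sum_i x_i x_i^T.
C = X.T @ X / n

# Eigendecomposition of the symmetric PSD matrix C, eigenvalues sorted descending.
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

v1, v2 = evecs[:, 0], evecs[:, 1]  # first and second principal components
z1 = X @ v1                        # z_{i1} = v1^T x_i
z2 = X @ v2                        # z_{i2} = v2^T x_i
print(evals)                       # eigenvalues in decreasing order
```

(NumPy's `eigh` returns eigenvalues in ascending order, hence the explicit reordering.)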
For question (a), $J(\mathbf{v}_2, \mathbf{z}_2)$ is the reconstruction loss measured by the $\ell_2$ norm, hence we would have:

$$z_{i2} = \mathbf{v}_2^T\mathbf{x}_i$$

from the geometry of projection. On the other hand, we could deduce this more formally: with

$$J(\mathbf{v}_2, \mathbf{z}_2) = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{x}_i - z_{i1}\mathbf{v}_1 - z_{i2}\mathbf{v}_2\right)^T\left(\mathbf{x}_i - z_{i1}\mathbf{v}_1 - z_{i2}\mathbf{v}_2\right),$$

we have:

$$\frac{\partial J}{\partial z_{i2}} = \frac{1}{n}\left(-2\,\mathbf{v}_2^T\mathbf{x}_i + 2\,z_{i1}\,\mathbf{v}_2^T\mathbf{v}_1 + 2\,z_{i2}\,\mathbf{v}_2^T\mathbf{v}_2\right) = 0.$$

Using the fact that $\mathbf{v}_2^T\mathbf{v}_1 = 0$ and $\mathbf{v}_2^T\mathbf{v}_2 = 1$, we arrive at:

$$z_{i2} = \mathbf{v}_2^T\mathbf{x}_i.$$
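As a numerical sanity check on part (a), the following sketch (my own construction on synthetic data, not part of the book's solution) solves for the coefficients $(z_{i1}, z_{i2})$ that minimize the reconstruction loss by least squares and confirms they agree with the projections $\mathbf{v}_j^T\mathbf{x}_i$:

```python
import numpy as np

# Synthetic data and covariance (hypothetical setup, as in the previous sketch).
rng = np.random.default_rng(1)
n, d = 200, 4
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)
C = X.T @ X / n

# v1, v2: top two (orthonormal) eigenvectors of C, stacked as columns of V.
evals, evecs = np.linalg.eigh(C)
V = evecs[:, np.argsort(evals)[::-1]][:, :2]

# Minimize ||x_i - z_{i1} v1 - z_{i2} v2||^2 over (z_{i1}, z_{i2}) by least squares.
Z_lstsq, *_ = np.linalg.lstsq(V, X.T, rcond=None)  # shape (2, n)

# Analytic answer from part (a): z_{ij} = v_j^T x_i.
Z_proj = V.T @ X.T

print(np.allclose(Z_lstsq, Z_proj))  # True: the projections minimize the loss
```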
For question (b), with:

$$\tilde{J}(\mathbf{v}_2) = -\mathbf{v}_2^T\mathbf{C}\mathbf{v}_2 + \lambda_2\left(\mathbf{v}_2^T\mathbf{v}_2 - 1\right) + \lambda_{12}\left(\mathbf{v}_2^T\mathbf{v}_1 - 0\right),$$

we adopt straightforward matrix algebra:

$$\frac{\partial \tilde{J}}{\partial \mathbf{v}_2} = -2\,\mathbf{C}\mathbf{v}_2 + 2\lambda_2\mathbf{v}_2 + \lambda_{12}\mathbf{v}_1,$$

where we have used the fact that $\mathbf{C}$ is symmetric and positive semi-definite, so that $\partial(\mathbf{v}_2^T\mathbf{C}\mathbf{v}_2)/\partial\mathbf{v}_2 = (\mathbf{C} + \mathbf{C}^T)\mathbf{v}_2 = 2\mathbf{C}\mathbf{v}_2$. The next step is to decompose $\mathbf{v}_2$ along the eigenvectors of $\mathbf{C}$ as:

$$\mathbf{v}_2 = \sum_{j=1}^{d} c_j \mathbf{u}_j,$$

where the orthonormal eigenvectors $\mathbf{u}_j$ are arranged in decreasing order of their eigenvalues $\lambda_j$, i.e., $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$, so that $\mathbf{u}_1 = \mathbf{v}_1$. (The Lagrange multiplier $\lambda_2$ in $\tilde{J}$ shares a symbol with the second eigenvalue; the two will turn out to coincide at the optimum.) Taking the partial derivatives w.r.t. $\lambda_2$ and $\lambda_{12}$ and setting them to zero recovers the constraints:

$$\mathbf{v}_2^T\mathbf{v}_2 = \sum_{j} c_j^2 = 1, \qquad \mathbf{v}_1^T\mathbf{v}_2 = c_1 = 0.$$

Hence, setting the gradient w.r.t. $\mathbf{v}_2$ to zero is tantamount to writing:

$$\sum_{j=1}^{d} 2\,c_j\left(\lambda_2 - \lambda_j\right)\mathbf{u}_j + \lambda_{12}\,\mathbf{u}_1 = \mathbf{0}.$$

Recall that the eigenvectors of $\mathbf{C}$ are orthogonal to each other; hence, matching the coefficient of each $\mathbf{u}_j$, we have:

$$2\,c_1\left(\lambda_2 - \lambda_1\right) + \lambda_{12} = 0, \qquad c_j\left(\lambda_2 - \lambda_j\right) = 0 \ \text{ for } j \ge 2.$$

These equations tell us that $\lambda_{12} = 0$ (since $c_1 = 0$), and that the multiplier $\lambda_2$ equals one eigenvalue of $\mathbf{C}$, say $\lambda_k$ with $k \ge 2$, whose corresponding eigenvector $\mathbf{u}_k$ is the optimal value for $\mathbf{v}_2$. (We ignore the degenerate case of repeated eigenvalues here, but the generalization is straightforward.) For such a $\mathbf{v}_2$, the value of $\tilde{J}$ is $-\mathbf{v}_2^T\mathbf{C}\mathbf{v}_2 = -\lambda_k$, whose minimum is reached when $\mathbf{v}_2$ is the eigenvector corresponding to the second largest eigenvalue, i.e., $\mathbf{v}_2 = \mathbf{u}_2$.
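The conclusion of part (b) can also be checked empirically. The sketch below (again a hypothetical setup with synthetic data, not from the original solution) compares $\mathbf{u}_2$ against many random unit vectors orthogonal to $\mathbf{v}_1$; none attains a larger $\mathbf{v}^T\mathbf{C}\mathbf{v}$, i.e., a smaller $\tilde{J}$, than the eigenvector with the second largest eigenvalue:

```python
import numpy as np

# Synthetic data with a clearly separated spectrum (hypothetical, for illustration).
rng = np.random.default_rng(2)
n, d = 1000, 5
X = rng.normal(size=(n, d)) @ np.diag([4.0, 3.0, 2.0, 1.0, 0.5])
X -= X.mean(axis=0)
C = X.T @ X / n

evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
v1, u2 = evecs[:, 0], evecs[:, 1]

def captured_variance(v):
    return float(v @ C @ v)  # v^T C v; minimizing J~ means maximizing this

# Random unit vectors constrained to be orthogonal to v1 (the feasible set of part (b)).
best = -np.inf
for _ in range(10_000):
    v = rng.normal(size=d)
    v -= (v @ v1) * v1            # project out the v1 component
    v /= np.linalg.norm(v)
    best = max(best, captured_variance(v))

print(np.isclose(captured_variance(u2), evals[1]))  # True: u2 attains the 2nd eigenvalue
print(best <= captured_variance(u2) + 1e-9)         # True: no feasible v does better
```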