Exercise 12.4 - Deriving the second principal component

Answers

For this exercise, minimizing $J$ makes $\mathbf{v}_1$ and $\mathbf{v}_2$ the first and second principal components of the dataset. By definition, $z_{i,j}$ (where $j = 1, 2$) is the projection of $\mathbf{x}_i$ onto $\mathbf{v}_j$.

For question (a), $J(\mathbf{v}_1, \mathbf{v}_2)$ is the reconstruction loss measured in the $\ell_2$ norm, hence we would expect

$$z_{i,2} = \mathbf{v}_2^T \mathbf{x}_i,$$

from the geometry of orthogonal projection. Alternatively, we can derive this algebraically. Starting from

$$J(\mathbf{v}_2, \mathbf{z}_2) = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - z_{n,1}\mathbf{v}_1 - z_{n,2}\mathbf{v}_2)^T (\mathbf{x}_n - z_{n,1}\mathbf{v}_1 - z_{n,2}\mathbf{v}_2),$$

the optimality condition for $z_{m,2}$ is:

$$\frac{\partial J}{\partial z_{m,2}} = \frac{1}{N}\left(2 z_{m,2}\,\mathbf{v}_2^T\mathbf{v}_2 - 2\,\mathbf{v}_2^T(\mathbf{x}_m - z_{m,1}\mathbf{v}_1)\right) = 0.$$

Using the fact that $\mathbf{v}_2^T\mathbf{v}_2 = 1$ and $\mathbf{v}_2^T\mathbf{v}_1 = 0$, we arrive at:

$$z_{m,2} = \mathbf{v}_2^T \mathbf{x}_m.$$
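
As a quick numerical sanity check of this result (not part of the required derivation), the sketch below, using made-up toy data and illustrative variable names, confirms that the least-squares coefficient along $\mathbf{v}_2$ for the residual $\mathbf{x}_m - z_{m,1}\mathbf{v}_1$ coincides with the projection $\mathbf{v}_2^T\mathbf{x}_m$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N centered samples in D dimensions (illustrative only).
N, D = 500, 4
X = rng.normal(size=(N, D))
X -= X.mean(axis=0)

# First two principal directions from the sample covariance matrix.
C = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
v2 = eigvecs[:, -2]                    # eigenvector of the second largest eigenvalue

# For one sample x_m, minimize ||x_m - z1*v1 - z2*v2||^2 over z2
# with z1 fixed at v1^T x_m: the least-squares solution is
# z2 = v2^T (x_m - z1*v1) / (v2^T v2).
x_m = X[0]
z1 = v1 @ x_m
z2_opt = v2 @ (x_m - z1 * v1) / (v2 @ v2)

# Part (a): z2 = v2^T x_m, using v2^T v2 = 1 and v2^T v1 = 0.
assert np.isclose(z2_opt, v2 @ x_m)
print(z2_opt, v2 @ x_m)
```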

For question (b), with:

$$\tilde{J}(\mathbf{v}_2) = -\mathbf{v}_2^T \mathbf{C}\,\mathbf{v}_2 + \lambda_2 (\mathbf{v}_2^T\mathbf{v}_2 - 1) + \lambda_{12} (\mathbf{v}_2^T\mathbf{v}_1 - 0),$$

straightforward matrix calculus gives:

$$\frac{\partial \tilde{J}}{\partial \mathbf{v}_2} = -2\,\mathbf{C}\mathbf{v}_2 + 2\lambda_2 \mathbf{v}_2 + \lambda_{12}\mathbf{v}_1,$$

where we have used the identity $\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{x}) = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ together with the fact that $\mathbf{C}$ is symmetric (and positive semi-definite). The next step is to decompose $\mathbf{v}_2$ along the eigenvectors of $\mathbf{C}$ as:

$$\mathbf{v}_2 = \sum_{d=1}^{D} f_d\, \mathbf{u}_d,$$

where the indices $d$ are arranged in decreasing order of the corresponding eigenvalues $y_d$, so that $\mathbf{u}_1 = \mathbf{v}_1$ and $y_1 = \lambda_1$ is the largest eigenvalue. Setting the partial derivatives of $\tilde{J}$ with respect to $\lambda_2$ and $\lambda_{12}$ to zero implies:

$$\sum_{d=1}^{D} f_d^2 = 1, \qquad f_1 = 0.$$

Hence, since $\mathbf{C}\mathbf{v}_2 = \sum_{d=1}^{D} f_d\, y_d\, \mathbf{u}_d$, setting the gradient with respect to $\mathbf{v}_2$ to zero is equivalent to writing:

$$\lambda_{12}\mathbf{v}_1 = \sum_{d=1}^{D} 2 f_d (y_d - \lambda_2)\, \mathbf{u}_d.$$

Recall that the eigenvectors of $\mathbf{C}$ are orthogonal to each other; matching the coefficients along each $\mathbf{u}_d$ then gives:

$$f_1 (y_1 - \lambda_2) = \frac{\lambda_{12}}{2}, \qquad f_d (y_d - \lambda_2) = 0 \;\; \text{for all } d \neq 1.$$

Combined with $f_1 = 0$, the first equation tells us that $\lambda_{12} = 0$, and the second that $\lambda_2$ must equal one of the eigenvalues $y_d$ of $\mathbf{C}$ with $d \neq 1$, whose corresponding eigenvector $\mathbf{u}_d$ is then the optimal value of $\mathbf{v}_2$. (We ignore the degenerate case of repeated eigenvalues here, but the generalization is straightforward.) For such a $\mathbf{v}_2$, the value of $\tilde{J}$ is $-y_d$, which is minimized when $\mathbf{v}_2$ is the eigenvector corresponding to the second largest eigenvalue, $y_2$.
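
The conclusion can also be verified numerically. The sketch below is a minimal illustration (the covariance matrix and variable names are made up for the example): it builds a symmetric positive definite $\mathbf{C}$, reads off the eigenvectors of the largest and second largest eigenvalues, and checks that the latter attains the smallest value of $\tilde{J}$ among unit vectors orthogonal to $\mathbf{v}_1$, on which the constraint terms vanish:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy symmetric positive definite "covariance" matrix (illustrative only).
A = rng.normal(size=(5, 5))
C = A @ A.T + 1e-3 * np.eye(5)

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
v1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
v2 = eigvecs[:, -2]                    # eigenvector of the second largest eigenvalue

def J_tilde(v):
    # On the feasible set (v^T v = 1, v^T v1 = 0) the constraint terms
    # vanish, so J~ reduces to -v^T C v.
    return -v @ C @ v

# v2 should attain the smallest value of J~ among unit vectors orthogonal to v1.
best = J_tilde(v2)                     # equals -(second largest eigenvalue)
assert np.isclose(best, -eigvals[-2])
for _ in range(10_000):
    w = rng.normal(size=5)
    w -= (w @ v1) * v1                 # project out the v1 component
    w /= np.linalg.norm(w)             # renormalize to unit length
    assert J_tilde(w) >= best - 1e-9
print("minimum of J~ over the feasible set:", best)
```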
