Exercise 12.3 - Heuristic for assessing applicability of PCA

Answers

We derive this heuristic from an information-theoretic perspective. Recall that the differential entropy of a multivariate normal (MVN) is, writing $\sigma_i^2 = \lambda_i$ for the eigenvalues of the covariance matrix (sorted in decreasing order):

$$h(\{\lambda_i\}_{i=1}^d) = \frac{1}{2}\sum_{i=1}^{d}\left[\log_2(2\pi e) + \log_2(\sigma_i^2)\right].$$
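This formula can be checked numerically. The sketch below (the function name and the random covariance are illustrative choices, not part of the exercise) compares the eigenvalue form above against the equivalent log-determinant form $\frac{1}{2}\log_2\big((2\pi e)^d \det\Sigma\big)$:

```python
import numpy as np

def mvn_entropy_bits(eigvals):
    """Differential entropy (bits) of a MVN whose covariance has the
    given eigenvalues: 0.5 * sum_i [log2(2*pi*e) + log2(lambda_i)]."""
    eigvals = np.asarray(eigvals, dtype=float)
    return 0.5 * np.sum(np.log2(2 * np.pi * np.e) + np.log2(eigvals))

# Cross-check against 0.5 * log2((2*pi*e)^d * det(Sigma)) on a random
# symmetric positive-definite covariance matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)          # SPD by construction
lam = np.linalg.eigvalsh(Sigma)
d = len(lam)
h_eig = mvn_entropy_bits(lam)
h_det = 0.5 * np.log2((2 * np.pi * np.e) ** d * np.linalg.det(Sigma))
print(h_eig, h_det)                      # the two forms agree
```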

After PCA, the covariance for this MVN model is obtained by replacing all but the $d$ largest variances with a common value $\sigma^2$, hence the difference in entropy (relative to a fully isotropic model with variance $\sigma^2$ in every direction) is:

$$\Delta h(d) = \frac{1}{2}\sum_{i=1}^{d}\log_2\frac{\sigma_i^2}{\sigma^2} = \frac{1}{2}\sum_{i=1}^{d}\log_2\lambda_i - \frac{d}{2}\log_2\bar\lambda,$$

where the second equality takes $\sigma^2 = \bar\lambda$, the mean eigenvalue.
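For a concrete spectrum this quantity is easy to evaluate. In the sketch below the eigenvalues are made-up illustrative numbers, and $\sigma^2$ is set to the mean eigenvalue $\bar\lambda$ as above:

```python
import numpy as np

def delta_h_bits(eigvals, d):
    """Entropy difference (bits) of the top-d components relative to an
    isotropic model with sigma^2 = lambda-bar (the mean eigenvalue)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    sigma2 = lam.mean()                                    # sigma^2 = lambda-bar
    return 0.5 * np.sum(np.log2(lam[:d] / sigma2))

spectrum = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]   # illustrative eigenvalues
for d in range(1, len(spectrum) + 1):
    print(d, round(delta_h_bits(spectrum, d), 3))
```

Note that with $d$ equal to the full dimension, $\Delta h \le 0$ by the AM–GM inequality (the geometric mean of the eigenvalues never exceeds their arithmetic mean $\bar\lambda$), while for small $d$ the leading eigenvalues exceed $\bar\lambda$ and the sum is positive.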

For two eigenvalue spectra with the same mean $\bar\lambda$, the spectrum with the larger variance concentrates more of the total variance in its leading eigenvalues, so the product of its largest $d$ values is larger and $\Delta h(d)$ — the information captured by the top $d$ components — is larger. The information lost in compression is correspondingly smaller, which is what makes PCA effective for compressing such data.
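This comparison can be illustrated with two made-up spectra sharing the same mean (the numbers and function name below are arbitrary illustrative choices):

```python
import numpy as np

def top_d_info_bits(eigvals, d):
    """Information (bits) carried by the top-d components relative to an
    isotropic model with variance lambda-bar: 0.5 * sum log2(lambda_i / lambda-bar)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return 0.5 * np.sum(np.log2(lam[:d] / lam.mean()))

d = 2
flat   = [1.2, 1.1, 0.9, 0.8]   # mean 1.0, low-variance spectrum
peaked = [2.5, 1.0, 0.4, 0.1]   # mean 1.0, high-variance spectrum
print(top_d_info_bits(flat, d), top_d_info_bits(peaked, d))
# The higher-variance spectrum packs more information into its top-d
# components, so less is lost when the remaining ones are discarded.
```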

2021-03-24 13:42