Exercise 13.6 - Shrinkage in linear regression

Answers

For ordinary least squares (OLS), the loss is defined as:

$$\mathrm{RSS}(\mathbf{w}) = (\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w}).$$

Expanding the quadratic and using $\mathbf{X}\mathbf{X}^T = \mathbf{I}$, so that $\mathbf{w}^T\mathbf{X}\mathbf{X}^T\mathbf{w} = \mathbf{w}^T\mathbf{w}$:

$$\mathrm{RSS}(\mathbf{w}) = \mathbf{Y}^T\mathbf{Y} + \mathbf{w}^T\mathbf{w} - 2\,\mathbf{Y}^T\mathbf{X}^T\mathbf{w}.$$

Take its derivative with respect to $w_k$, noting that the $D$ weights are decoupled in the RSS:

$$\frac{\partial}{\partial w_k}\,\mathrm{RSS}(\mathbf{w}) = 2 w_k - 2\sum_{n=1}^{N} y_n x_{nk}.$$

Setting this to zero, we end up with

$$\hat{w}_k^{\mathrm{OLS}} = \sum_{n=1}^{N} y_n x_{nk} = \frac{c_k}{2},$$

where $c_k = 2\sum_{n=1}^{N} y_n x_{nk}$.
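As a quick sanity check (my own, not part of the exercise), the closed form can be verified numerically. The sketch below assumes NumPy and a randomly generated matrix $\mathbf{X}$ with orthonormal rows, so that $\mathbf{X}\mathbf{X}^T = \mathbf{I}$; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 5, 50

# Build X (D x N) with orthonormal rows via QR, so that X @ X.T = I_D.
Q, _ = np.linalg.qr(rng.normal(size=(N, D)))      # Q: N x D, orthonormal columns
X = Q.T
y = rng.normal(size=N)

w_lstsq = np.linalg.lstsq(X.T, y, rcond=None)[0]  # generic OLS on the design X^T
c = 2 * X @ y                                     # c_k = 2 * sum_n y_n x_nk
w_closed = c / 2                                  # closed form derived above

assert np.allclose(w_lstsq, w_closed)
```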

For ridge regression, the penalized loss is

$$\mathrm{RSS}(\mathbf{w}) = (\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w}) + \lambda_2\,\mathbf{w}^T\mathbf{w}.$$

Taking its derivative with respect to $w_k$ and setting it to zero gives

$$(2 + 2\lambda_2)\, w_k = 2\sum_{n=1}^{N} y_n x_{nk}.$$

Thus

$$\hat{w}_k^{\mathrm{ridge}} = \frac{\sum_{n=1}^{N} y_n x_{nk}}{1 + \lambda_2} = \frac{c_k}{2(1 + \lambda_2)}.$$
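Again as a sanity check (not part of the exercise), this shortcut can be compared against the generic ridge solution $(\mathbf{A}^T\mathbf{A} + \lambda_2\mathbf{I})^{-1}\mathbf{A}^T\mathbf{Y}$ with design matrix $\mathbf{A} = \mathbf{X}^T$; the sketch assumes NumPy and the same synthetic setup as above.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, lam2 = 5, 50, 1.0

Q, _ = np.linalg.qr(rng.normal(size=(N, D)))
X = Q.T                      # D x N with X @ X.T = I_D
y = rng.normal(size=N)
A = X.T                      # N x D design matrix

# Generic ridge solution vs. the orthonormal-design closed form above.
w_generic = np.linalg.solve(A.T @ A + lam2 * np.eye(D), A.T @ y)
c = 2 * X @ y
w_closed = c / (2 * (1 + lam2))

assert np.allclose(w_generic, w_closed)
```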

Finally, recall that

$$\hat{w}_k^{\mathrm{lasso}} = \mathrm{sign}\!\left(\hat{w}_k^{\mathrm{OLS}}\right)\left(\left|\hat{w}_k^{\mathrm{OLS}}\right| - \frac{\lambda_1}{2}\right)_+.$$
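This is the soft-thresholding operator. A minimal NumPy sketch of it (the function name and the example values are mine, not from the text):

```python
import numpy as np

def soft_threshold(w_ols, lam1):
    """Elementwise lasso estimate: sign(w) * max(|w| - lam1/2, 0)."""
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam1 / 2.0, 0.0)

# Entries with |w_ols| <= lam1/2 are set exactly to zero; the rest move
# toward zero by lam1/2.
w_ols = np.array([-1.5, -0.3, 0.0, 0.2, 2.0])
print(soft_threshold(w_ols, lam1=1.0))
```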

Looking at Figure 13.24, the black line is readily identified as OLS, the gray one as ridge, and the dotted one as lasso, and evidently $\lambda_1 = \lambda_2 = 1$. Ridge shrinks the estimate toward zero by a constant factor (tilting the line toward the horizontal axis), whereas lasso shifts the estimate toward zero and sets it exactly to zero below the threshold $\lambda_1/2$.
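The three shrinkage curves can be reproduced with a few lines of matplotlib. This is only a sketch in the spirit of the figure, with $\lambda_1 = \lambda_2 = 1$ assumed as read off the plot and the line styles taken from the description above.

```python
import numpy as np
import matplotlib.pyplot as plt

lam1 = lam2 = 1.0
w_ols = np.linspace(-2, 2, 401)          # OLS estimate c_k / 2 on the x-axis

w_ridge = w_ols / (1 + lam2)             # constant-factor shrinkage
w_lasso = np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam1 / 2, 0)  # soft threshold

plt.plot(w_ols, w_ols, "k-", label="OLS")
plt.plot(w_ols, w_ridge, color="gray", label="ridge")
plt.plot(w_ols, w_lasso, "k:", label="lasso")
plt.xlabel(r"$\hat{w}_k^{\mathrm{OLS}} = c_k/2$")
plt.ylabel("shrunken estimate")
plt.legend()
plt.show()
```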
