
Exercise 13.11 - Projected gradient descent for l1 regularized least squares

Answers

Generally, we take the gradient with respect to $\mathbf{w}$ and optimize. Whenever a gradient descent step would violate a constraint on $\mathbf{w}$, the increment is moderated (projected) so that the constraint remains satisfied. For the following loss function:

$$\min_{\mathbf{w}} \left\{ \mathrm{NLL}(\mathbf{w}) + \lambda \|\mathbf{w}\|_1 \right\},$$

consider it in the linear regression context, where:

$$\mathrm{NLL}(\mathbf{w}) = \frac{1}{2} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2.$$
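As a minimal sketch of this objective (function name `lasso_objective` and variable `lam` are illustrative, not from the book), it can be evaluated with NumPy as:

```python
import numpy as np

def lasso_objective(w, X, y, lam):
    """NLL(w) + lambda * ||w||_1 for the linear regression case."""
    residual = y - X @ w
    return 0.5 * residual @ residual + lam * np.abs(w).sum()
```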

Since $\lambda \|\mathbf{w}\|_1$ is not differentiable, it is suggested to split $\mathbf{w}$ into its positive and negative parts:

$$\mathbf{w} = \mathbf{u} - \mathbf{v},$$

where

$$u_i = (w_i)_+ = \max\{0, w_i\},$$

$$v_i = (-w_i)_+ = \max\{0, -w_i\}.$$

With $\mathbf{u} \geq 0$ and $\mathbf{v} \geq 0$, we have:

$$\|\mathbf{w}\|_1 = \mathbf{1}_n^T \mathbf{u} + \mathbf{1}_n^T \mathbf{v}.$$
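A tiny numerical check of this split (the values are made up for illustration):

```python
import numpy as np

w = np.array([1.5, -0.3, 0.0, 2.0])

# Positive and negative parts: w = u - v, with u >= 0 and v >= 0.
u = np.maximum(0.0, w)
v = np.maximum(0.0, -w)

assert np.allclose(w, u - v)
assert np.isclose(np.abs(w).sum(), u.sum() + v.sum())  # ||w||_1 = 1^T u + 1^T v
```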

Hence the original problem is translated into:

$$\min_{\mathbf{u}, \mathbf{v}} \left\{ \frac{1}{2} \|\mathbf{y} - \mathbf{X}(\mathbf{u} - \mathbf{v})\|_2^2 + \lambda \mathbf{1}_n^T \mathbf{u} + \lambda \mathbf{1}_n^T \mathbf{v} \right\}$$

$$\text{s.t. } \mathbf{u} \geq 0, \ \mathbf{v} \geq 0.$$
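The two objectives agree whenever $\mathbf{u}$ and $\mathbf{v}$ are exactly the positive and negative parts of $\mathbf{w}$; a small synthetic check (data generated only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
lam = 0.1
w = rng.normal(size=4)

u, v = np.maximum(0.0, w), np.maximum(0.0, -w)

original = 0.5 * np.sum((y - X @ w) ** 2) + lam * np.abs(w).sum()
reformulated = 0.5 * np.sum((y - X @ (u - v)) ** 2) + lam * u.sum() + lam * v.sum()
assert np.isclose(original, reformulated)
```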

Denote:

$$\mathbf{z} = \begin{pmatrix} \mathbf{u} \\ \mathbf{v} \end{pmatrix},$$

then we can rewrite the objective as:

$$\min_{\mathbf{z}} \left\{ f(\mathbf{z}) = \mathbf{c}^T \mathbf{z} + \frac{1}{2} \mathbf{z}^T \mathbf{A} \mathbf{z} \right\}$$

$$\text{s.t. } \mathbf{z} \geq 0,$$

where:

$$\mathbf{c} = \begin{pmatrix} \lambda \mathbf{1}_n - \mathbf{X}^T \mathbf{y} \\ \lambda \mathbf{1}_n + \mathbf{X}^T \mathbf{y} \end{pmatrix},$$

$$\mathbf{A} = \begin{pmatrix} \mathbf{X}^T \mathbf{X} & -\mathbf{X}^T \mathbf{X} \\ -\mathbf{X}^T \mathbf{X} & \mathbf{X}^T \mathbf{X} \end{pmatrix}.$$
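A hedged sketch of how $\mathbf{c}$ and $\mathbf{A}$ could be assembled (the helper name `build_qp` is an assumption, not code from the book or the GPSR paper):

```python
import numpy as np

def build_qp(X, y, lam):
    """Build c and A of the bound-constrained QP (objective up to the constant 0.5 * ||y||^2)."""
    n = X.shape[1]
    Xty = X.T @ y
    c = np.concatenate([lam * np.ones(n) - Xty, lam * np.ones(n) + Xty])
    XtX = X.T @ X
    A = np.block([[XtX, -XtX], [-XtX, XtX]])
    return c, A
```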

Now we have turned the problem into a quadratic program with a simple bound constraint. The gradient is given by:

$$\nabla f(\mathbf{z}) = \mathbf{c} + \mathbf{A}\mathbf{z}.$$
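For larger problems one would typically not materialize the $2n \times 2n$ matrix $\mathbf{A}$; a sketch of the gradient that exploits the block structure (function name `grad_f` is illustrative):

```python
import numpy as np

def grad_f(z, X, y, lam):
    """Gradient c + A z, computed without forming the 2n x 2n matrix A."""
    n = X.shape[1]
    u, v = z[:n], z[n:]
    Xtr = X.T @ (X @ (u - v)) - X.T @ y      # X^T X (u - v) - X^T y
    return np.concatenate([lam * np.ones(n) + Xtr, lam * np.ones(n) - Xtr])
```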

An ordinary gradient descent step is:

$$\mathbf{z}^{k+1} = \mathbf{z}^k - \alpha \nabla f(\mathbf{z}^k).$$

For the projected case, define $\mathbf{g}^k$ elementwise as:

$$g_i^k = \min\left\{ z_i^k, \ \alpha \nabla f(\mathbf{z}^k)_i \right\}.$$

And:

$$\mathbf{z}^{k+1} = \mathbf{z}^k - \mathbf{g}^k,$$

so that $\mathbf{z}^{k+1} \geq 0$ always holds and every iterate remains a feasible weight candidate. This is exactly the projection of the plain gradient step onto the nonnegative orthant: $\mathbf{z}^{k+1} = \max\{0, \mathbf{z}^k - \alpha \nabla f(\mathbf{z}^k)\}$.
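Putting the pieces together, here is a hedged end-to-end sketch of the method with a fixed step size (the function name, the step-size choice, and the synthetic data are all assumptions, not the book's or the paper's code):

```python
import numpy as np

def lasso_pgd(X, y, lam, alpha=None, n_iters=500):
    """Projected gradient descent on the nonnegative QP; returns w = u - v."""
    n = X.shape[1]
    if alpha is None:
        # Fixed step: 1 / Lipschitz constant of grad f, using lambda_max(A) = 2 * lambda_max(X^T X).
        alpha = 1.0 / (2.0 * np.linalg.eigvalsh(X.T @ X).max())
    z = np.zeros(2 * n)
    lam_ones = lam * np.ones(n)
    for _ in range(n_iters):
        u, v = z[:n], z[n:]
        Xtr = X.T @ (X @ (u - v)) - X.T @ y
        grad = np.concatenate([lam_ones + Xtr, lam_ones - Xtr])
        # Projected step: equivalent to z <- max(0, z - alpha * grad).
        g = np.minimum(z, alpha * grad)
        z = z - g
    return z[:n] - z[n:]

# Example usage with synthetic data (illustrative only):
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[[0, 3]] = [2.0, -1.5]
y = X @ w_true + 0.1 * rng.normal(size=50)
w_hat = lasso_pgd(X, y, lam=1.0)
```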

The original paper suggests more refined rules for choosing the step size; refer to "Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems" by Mário A. T. Figueiredo, Robert D. Nowak, and Stephen J. Wright.
