Exercise 7.10 - Bayesian linear regression using the g-prior

Answers

For the Bayesian linear regression model, the likelihood is, as usual:

$$p(\mathcal{D} \mid \mathbf{w}, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid \mathbf{w}^T \mathbf{x}_n, \sigma^2).$$
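As a quick numerical illustration (a minimal sketch on synthetic data; NumPy/SciPy and every variable name here are my own choices, not from the exercise), the log-likelihood is just a sum of univariate Gaussian log-densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D, N = 3, 50
X = rng.normal(size=(D, N))            # columns are the inputs x_n
w_true = rng.normal(size=D)
sigma2 = 0.25
y = X.T @ w_true + np.sqrt(sigma2) * rng.normal(size=N)

def log_likelihood(w, sigma2, X, y):
    # log p(D | w, sigma^2) = sum_n log N(y_n | w^T x_n, sigma^2)
    return norm.logpdf(y, loc=X.T @ w, scale=np.sqrt(sigma2)).sum()

print(log_likelihood(w_true, sigma2, X, y))
```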

The prior is a normal-inverse-gamma (NIG) distribution:

$$\begin{aligned}
p(\mathbf{w}, \sigma^2) &= \mathrm{NIG}(\mathbf{w}, \sigma^2 \mid \mathbf{w}_0, \mathbf{V}_0, a_0, b_0) = \mathcal{N}(\mathbf{w} \mid \mathbf{w}_0, \sigma^2 \mathbf{V}_0)\,\mathrm{IG}(\sigma^2 \mid a_0, b_0) \\
&= \frac{1}{(2\pi)^{\frac{D}{2}}} \frac{1}{|\sigma^2 \mathbf{V}_0|^{\frac{1}{2}}} \exp\left\{ -\frac{1}{2} (\mathbf{w} - \mathbf{w}_0)^T (\sigma^2 \mathbf{V}_0)^{-1} (\mathbf{w} - \mathbf{w}_0) \right\} \frac{b_0^{a_0}}{\Gamma(a_0)} (\sigma^2)^{-(a_0 + 1)} \exp\left\{ -\frac{b_0}{\sigma^2} \right\} \\
&= \frac{b_0^{a_0}}{(2\pi)^{\frac{D}{2}} |\mathbf{V}_0|^{\frac{1}{2}} \Gamma(a_0)} (\sigma^2)^{-(a_0 + \frac{D}{2} + 1)} \exp\left\{ -\frac{(\mathbf{w} - \mathbf{w}_0)^T \mathbf{V}_0^{-1} (\mathbf{w} - \mathbf{w}_0) + 2 b_0}{2 \sigma^2} \right\}.
\end{aligned}$$
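The factored form $\mathcal{N} \times \mathrm{IG}$ gives a direct way to evaluate this density numerically. Below is a minimal sketch assuming SciPy, whose `invgamma` shape and scale parameters play the roles of $a_0$ and $b_0$; the hyperparameter values are arbitrary (the g-prior itself takes the improper limit $a_0 = b_0 = 0$, which has no normalized density to evaluate):

```python
import numpy as np
from scipy.stats import multivariate_normal, invgamma

def nig_pdf(w, sigma2, w0, V0, a0, b0):
    # NIG(w, sigma^2 | w0, V0, a0, b0) = N(w | w0, sigma^2 V0) * IG(sigma^2 | a0, b0)
    gauss = multivariate_normal.pdf(w, mean=w0, cov=sigma2 * V0)
    ig = invgamma.pdf(sigma2, a0, scale=b0)
    return gauss * ig

D = 2
print(nig_pdf(np.array([0.1, -0.2]), 1.3, np.zeros(D), np.eye(D), a0=2.0, b0=1.0))
```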

The posterior distribution takes the form:

$$\begin{aligned}
p(\mathbf{w}, \sigma^2 \mid \mathcal{D}) &\propto p(\mathbf{w}, \sigma^2)\, p(\mathcal{D} \mid \mathbf{w}, \sigma^2) \\
&\propto \frac{b_0^{a_0}}{(2\pi)^{\frac{D}{2}} |\mathbf{V}_0|^{\frac{1}{2}} \Gamma(a_0)} (\sigma^2)^{-(a_0 + \frac{D}{2} + 1)} \exp\left\{ -\frac{(\mathbf{w} - \mathbf{w}_0)^T \mathbf{V}_0^{-1} (\mathbf{w} - \mathbf{w}_0) + 2 b_0}{2 \sigma^2} \right\} (\sigma^2)^{-\frac{N}{2}} \exp\left\{ -\frac{\sum_{n=1}^{N} (y_n - \mathbf{w}^T \mathbf{x}_n)^2}{2 \sigma^2} \right\}.
\end{aligned}$$

To read off the updated hyperparameters from this form, we need to identify:

  • The exponent of $\sigma^2$.
  • The quadratic, linear, and constant terms in $\mathbf{w}$ inside the exponential.

The exponent of $\sigma^2$ in the posterior is:

$$-\left(a_0 + \frac{D}{2} + 1\right) - \frac{N}{2},$$

and matching it against the NIG form $-(a_N + \frac{D}{2} + 1)$ gives

$$a_N = a_0 + \frac{N}{2} = \frac{N}{2},$$

since the g-prior uses the improper choice $a_0 = 0$.

The matrix defining the quadratic form in $\mathbf{w}$ inside the exponential is

$$-\frac{\mathbf{V}_0^{-1}}{2\sigma^2} - \frac{\mathbf{X}\mathbf{X}^T}{2\sigma^2},$$

where the columns of $\mathbf{X}$ are the inputs $\mathbf{x}_n$, so that $\sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^T = \mathbf{X}\mathbf{X}^T$. Therefore

$$\mathbf{V}_N^{-1} = \mathbf{V}_0^{-1} + \mathbf{X}\mathbf{X}^T,$$

and with the g-prior $\mathbf{V}_0 = g(\mathbf{X}\mathbf{X}^T)^{-1}$ this becomes $\mathbf{V}_N^{-1} = \frac{1}{g}\mathbf{X}\mathbf{X}^T + \mathbf{X}\mathbf{X}^T = \frac{g+1}{g}\mathbf{X}\mathbf{X}^T$, hence

$$\mathbf{V}_N = \frac{g}{g+1}\,(\mathbf{X}\mathbf{X}^T)^{-1}.$$
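A small numerical sanity check of this update, on random data (the dimensions and $g$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, g = 3, 40, 10.0
X = rng.normal(size=(D, N))            # columns are x_n, so sum_n x_n x_n^T = X X^T
XXt = X @ X.T

V0 = g * np.linalg.inv(XXt)            # g-prior: V0 = g (X X^T)^{-1}
VN = np.linalg.inv(np.linalg.inv(V0) + XXt)

# Closed form derived above: V_N = g/(g+1) (X X^T)^{-1}
assert np.allclose(VN, g / (g + 1) * np.linalg.inv(XXt))
```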

The coefficient of the linear term in $\mathbf{w}$ inside the exponential is

$$\frac{\mathbf{V}_0^{-1}\mathbf{w}_0}{\sigma^2} + \frac{\sum_{n=1}^{N} y_n \mathbf{x}_n}{\sigma^2} = \frac{\mathbf{V}_0^{-1}\mathbf{w}_0}{\sigma^2} + \frac{\mathbf{X}\mathbf{Y}}{\sigma^2},$$

where $\mathbf{Y} = (y_1, \ldots, y_N)^T$.

So we have:

$$\mathbf{V}_0^{-1}\mathbf{w}_0 + \mathbf{X}\mathbf{Y} = \mathbf{V}_N^{-1}\mathbf{w}_N.$$

Since the g-prior sets $\mathbf{w}_0 = \mathbf{0}$, this yields:

$$\mathbf{w}_N = \mathbf{V}_N\left(\mathbf{V}_0^{-1}\mathbf{w}_0 + \mathbf{X}\mathbf{Y}\right) = \frac{g}{g+1}(\mathbf{X}\mathbf{X}^T)^{-1}\left(\frac{\mathbf{X}\mathbf{X}^T}{g}\,\mathbf{0} + \mathbf{X}\mathbf{Y}\right) = \frac{g}{g+1}(\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y} = \frac{g}{g+1}\,\mathbf{w}_{\text{MLE}},$$

where $\mathbf{w}_{\text{MLE}} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}$ is the maximum likelihood estimate.
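So the posterior mean is the MLE shrunk toward zero by the factor $g/(g+1)$, which a quick sketch on random data confirms (dimensions and $g$ again arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N, g = 3, 40, 10.0
X = rng.normal(size=(D, N))
y = rng.normal(size=N)
XXt = X @ X.T

w_mle = np.linalg.solve(XXt, X @ y)    # (X X^T)^{-1} X Y
VN = g / (g + 1) * np.linalg.inv(XXt)
wN = VN @ (X @ y)                      # w0 = 0, so the prior-mean term vanishes

# Posterior mean = shrunken MLE
assert np.allclose(wN, g / (g + 1) * w_mle)
```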

Finally, completing the square within the exponential yields:

$$(\mathbf{w} - \mathbf{w}_0)^T\mathbf{V}_0^{-1}(\mathbf{w} - \mathbf{w}_0) + 2b_0 + (\mathbf{Y} - \mathbf{X}^T\mathbf{w})^T(\mathbf{Y} - \mathbf{X}^T\mathbf{w}) = (\mathbf{w} - \mathbf{w}_N)^T\mathbf{V}_N^{-1}(\mathbf{w} - \mathbf{w}_N) + 2b_N.$$

Plugging in the expressions for $\mathbf{w}_N$ and $\mathbf{V}_N$, with $\mathbf{w}_0 = \mathbf{0}$ and $b_0 = 0$, gives $b_N = \frac{1}{2}\left(\mathbf{Y}^T\mathbf{Y} - \mathbf{w}_N^T\mathbf{V}_N^{-1}\mathbf{w}_N\right)$, i.e.:

$$b_N = \frac{\mathbf{Y}^T\mathbf{Y}}{2} - \frac{g}{2(g+1)}\,\mathbf{w}_{\text{MLE}}^T \mathbf{X}\mathbf{X}^T \mathbf{w}_{\text{MLE}}.$$
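Since the completed-square identity must hold for every $\mathbf{w}$, it yields a direct numerical test of this $b_N$ (a sketch with random data, using $\mathbf{w}_0 = \mathbf{0}$ and $b_0 = 0$ as above):

```python
import numpy as np

rng = np.random.default_rng(3)
D, N, g = 3, 40, 10.0
X = rng.normal(size=(D, N))
y = rng.normal(size=N)
XXt = X @ X.T

V0inv = XXt / g                        # V0 = g (X X^T)^{-1}
VNinv = V0inv + XXt
w_mle = np.linalg.solve(XXt, X @ y)
wN = g / (g + 1) * w_mle
bN = 0.5 * y @ y - g / (2 * (g + 1)) * w_mle @ XXt @ w_mle

# Both sides of the completed square must agree for arbitrary w
for _ in range(5):
    w = rng.normal(size=D)
    lhs = w @ V0inv @ w + (y - X.T @ w) @ (y - X.T @ w)
    rhs = (w - wN) @ VNinv @ (w - wN) + 2 * bN
    assert np.isclose(lhs, rhs)
```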

This establishes (7.113)-(7.116).
