
Exercise 7.8 - Bayesian linear regression in 1d with known $\sigma^{2}$

Answers

For question (a), the estimate of $\sigma^2$ is 0.3173.
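As a quick numerical check, such a value can presumably be reproduced as the mean squared residual of the least-squares fit $\hat{y} = \hat{w}_0 + \hat{w}_1 x$ (the MLE of $\sigma^2$; an unbiased variant would divide by $N-2$ instead of $N$). A minimal sketch, with placeholder data standing in for the exercise's actual $(x_n, y_n)$ pairs:

```python
import numpy as np

# Placeholder data; substitute the exercise's (x_n, y_n) values here.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=20)
y = 1.5 * x - 2.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of y = w0 + w1 * x.
X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x_n]
w0_hat, w1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# MLE of sigma^2: mean squared residual (divide by N).
residuals = y - (w0_hat + w1_hat * x)
sigma2_mle = residuals @ residuals / len(x)
print(sigma2_mle)
```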

For question (b), we have:

$$p(\mathbf{w}) \propto \mathcal{N}(w_1 \mid 0, 1) \propto \exp\left\{-\frac{1}{2} w_1^2\right\}.$$

To simplify the algebra, we observe that $w_1$ and $w_0$ are independent under this prior. The prior can therefore be written as $\mathcal{N}(w_1 \mid 0, 1)\,\mathcal{N}(w_0 \mid \gamma, \infty)$, where $\gamma$ can be an arbitrary finite number, so:

$$\mathbf{m}_0 = \begin{pmatrix} 0 \\ \gamma \end{pmatrix},$$

$$\mathbf{V}_0 = \begin{pmatrix} 1 & 0 \\ 0 & \infty \end{pmatrix}.$$
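Numerically, the infinite prior variance on $w_0$ can be approximated by a large finite number, after which the standard conjugate Gaussian update $\mathbf{V}_N = \left(\mathbf{V}_0^{-1} + \frac{1}{\sigma^2}\mathbf{X}^\top\mathbf{X}\right)^{-1}$, $\mathbf{m}_N = \mathbf{V}_N\left(\mathbf{V}_0^{-1}\mathbf{m}_0 + \frac{1}{\sigma^2}\mathbf{X}^\top\mathbf{y}\right)$ applies directly. A sketch under that approximation (placeholder data again); note that the arbitrary $\gamma$ has vanishing influence, since it enters the update only through $\mathbf{V}_0^{-1}\mathbf{m}_0$:

```python
import numpy as np

# Placeholder data and an assumed known noise variance.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=20)
y = 1.5 * x - 2.0 + rng.normal(scale=0.5, size=x.size)
sigma2 = 0.25

BIG = 1e12                                  # stands in for the infinite variance
gamma = 123.0                               # arbitrary: enters only as gamma / BIG
m0 = np.array([0.0, gamma])                 # prior mean over (w1, w0)
V0_inv = np.diag([1.0, 1.0 / BIG])          # prior precision over (w1, w0)

X = np.column_stack([x, np.ones_like(x)])   # columns ordered (w1, w0)
VN = np.linalg.inv(V0_inv + X.T @ X / sigma2)   # posterior covariance
mN = VN @ (V0_inv @ m0 + X.T @ y / sigma2)      # posterior mean
print(mN, VN[0, 0])                         # VN[0, 0] is the variance of w1
```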

For questions (c) and (d), we consider the posterior distribution of the parameters:

$$p(\mathbf{w} \mid \mathcal{D}, \sigma^2) \propto \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{V}_0) \prod_{n=1}^{N} \mathcal{N}(y_n \mid w_0 + w_1 x_n, \sigma^2) \propto \exp\left\{-\frac{w_1^2}{2}\right\} \prod_{n=1}^{N} \exp\left\{-\frac{(y_n - w_0 - w_1 x_n)^2}{2\sigma^2}\right\},$$

where we keep only the terms that depend on $w_0$ and $w_1$. To marginalize out $w_0$, we have:

$$\begin{aligned}
p(w_1 \mid \mathcal{D}, \sigma^2) &= \int p(w_1, w_0 \mid \mathcal{D}, \sigma^2) \,\mathrm{d}w_0 \\
&\propto \exp\left\{-\frac{w_1^2}{2}\right\} \int \exp\left\{A w_1^2 + B w_0^2 + C w_0 w_1 + D w_1 + E w_0 + F\right\} \mathrm{d}w_0 \\
&= \exp\left\{-\frac{w_1^2}{2} + A w_1^2 + D w_1 + F\right\} \int \exp\left\{B w_0^2 + C w_0 w_1 + E w_0\right\} \mathrm{d}w_0 \\
&= \exp\left\{-\frac{w_1^2}{2} + A w_1^2 + D w_1 + F - \frac{(C w_1 + E)^2}{4B}\right\} \int \exp\left\{B w_0^2 + (C w_1 + E) w_0 + \frac{(C w_1 + E)^2}{4B}\right\} \mathrm{d}w_0 \\
&\propto \exp\left\{-\frac{w_1^2}{2} + A w_1^2 + D w_1 + F - \frac{(C w_1 + E)^2}{4B}\right\}.
\end{aligned}$$
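The only non-mechanical step above is completing the square in $w_0$, i.e. $B w_0^2 + (C w_1 + E) w_0 = B\left(w_0 + \frac{C w_1 + E}{2B}\right)^2 - \frac{(C w_1 + E)^2}{4B}$, after which the remaining integral no longer depends on $w_1$ (it is a finite Gaussian integral when $B < 0$) and can be dropped. The identity is easy to confirm symbolically, e.g. with sympy:

```python
import sympy as sp

w0, w1, B, C, E = sp.symbols('w0 w1 B C E', real=True)
k = C * w1 + E                               # linear coefficient of w0

# Completing the square in w0 leaves the w1-dependent term -k**2 / (4B).
lhs = B * w0**2 + k * w0
rhs = B * (w0 + k / (2 * B))**2 - k**2 / (4 * B)
assert sp.simplify(lhs - rhs) == 0           # the identity holds
```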

Hence the posterior distribution over $w_1$ is Gaussian. The coefficients of $w_1^2$ and $w_1$ in the exponent are, respectively:

$$-\frac{1}{2} + A - \frac{C^2}{4B},$$

$$D - \frac{CE}{2B}.$$

Hence its posterior variance is:

$$\frac{1}{1 - 2A + \frac{C^2}{2B}},$$

its mean is:

$$\frac{2BD - CE}{2B - 4AB + C^2}.$$
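Both expressions follow from the fact that a Gaussian kernel $\exp\{a w_1^2 + b w_1\}$ has variance $-\frac{1}{2a}$ and mean $-\frac{b}{2a}$; they can be double-checked symbolically by reading $a$ and $b$ off the exponent:

```python
import sympy as sp

w1, A, B, C, D, E = sp.symbols('w1 A B C D E', real=True)

# Exponent of the marginal posterior over w1 (constant terms dropped).
expo = -w1**2 / 2 + A * w1**2 + D * w1 - (C * w1 + E)**2 / (4 * B)
poly = sp.Poly(sp.expand(expo), w1)
a = poly.coeff_monomial(w1**2)               # quadratic coefficient
b = poly.coeff_monomial(w1)                  # linear coefficient

variance = -1 / (2 * a)
mean = -b / (2 * a)
assert sp.simplify(variance - 1 / (1 - 2*A + C**2 / (2*B))) == 0
assert sp.simplify(mean - (2*B*D - C*E) / (2*B - 4*A*B + C**2)) == 0
```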

Finally, let us express $A$ through $E$ in terms of the statistics of $\mathcal{D}$:

$$A = -\frac{\sum_{n=1}^{N} x_n^2}{2\sigma^2}, \quad B = -\frac{N}{2\sigma^2}, \quad C = -\frac{\sum_{n=1}^{N} x_n}{\sigma^2}, \quad D = \frac{\sum_{n=1}^{N} x_n y_n}{\sigma^2}, \quad E = \frac{\sum_{n=1}^{N} y_n}{\sigma^2}.$$

The posterior variance is:

$$\frac{\sigma^2}{\sigma^2 + \sum_{n=1}^{N} x_n^2 - \frac{\left(\sum_{n=1}^{N} x_n\right)^2}{N}},$$

from which we observe that, as $N$ grows, the denominator increases, since by the Cauchy–Schwarz inequality:

$$\sum_{n=1}^{N} x_n^2 - \frac{\left(\sum_{n=1}^{N} x_n\right)^2}{N} \geq 0.$$

In other words, the larger this Cauchy–Schwarz gap, the more confident we are in the estimate of $w_1$. The gap is determined by the empirical variance of the $x_n$, so the posterior variance reduces to:

$$\frac{\sigma^2}{\sigma^2 + N \operatorname{var}(x)},$$

where $\operatorname{var}(x) = \frac{1}{N}\sum_{n} x_n^2 - \left(\frac{1}{N}\sum_{n} x_n\right)^2$ is the empirical variance.

Therefore, for any fixed generative distribution on $x$, the uncertainty in $w_1$ declines as $\mathcal{O}(1/N)$.
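A final end-to-end check: with synthetic $x$ values and an assumed $\sigma^2$, the coefficient-based variance $\frac{1}{1 - 2A + C^2/(2B)}$ should match $\frac{\sigma^2}{\sigma^2 + N \operatorname{var}(x)}$ exactly, and shrink roughly as $1/N$:

```python
import numpy as np

sigma2 = 0.25                                # assumed known noise variance
rng = np.random.default_rng(0)

for N in (10, 100, 1000, 10000):
    x = rng.uniform(0.0, 10.0, size=N)       # fixed generative distribution on x
    A = -np.sum(x**2) / (2 * sigma2)
    B = -N / (2 * sigma2)
    C = -np.sum(x) / sigma2
    var_coef = 1 / (1 - 2 * A + C**2 / (2 * B))       # via A, B, C
    var_closed = sigma2 / (sigma2 + N * np.var(x))    # closed form
    assert np.isclose(var_coef, var_closed)
    print(N, var_closed)                     # decays roughly as O(1/N)
```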
