
Exercise 7.8 - Bayesian linear regression in 1d with known $\sigma^{2}$

Answers

For question (a), the estimate of $\sigma^2$ is 0.3173.
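As a quick numerical check, such a value can presumably be reproduced as the mean squared residual of the least-squares fit $\hat{y} = \hat{w}_0 + \hat{w}_1 x$ (the MLE of $\sigma^2$; an unbiased variant would divide by $N-2$ instead of $N$). A minimal sketch, with placeholder data standing in for the exercise's actual $(x_n, y_n)$ pairs:

```python
import numpy as np

# Placeholder data; substitute the exercise's (x_n, y_n) values here.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=20)
y = 1.5 * x - 2.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of y = w0 + w1 * x.
X = np.column_stack([np.ones_like(x), x])        # design matrix [1, x_n]
w0_hat, w1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# MLE of sigma^2: mean squared residual (divide by N).
residuals = y - (w0_hat + w1_hat * x)
sigma2_mle = residuals @ residuals / len(x)
print(sigma2_mle)
```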

For question (b), we have:

$$p(\mathbf{w}) \propto \mathcal{N}(w_1 \mid 0, 1) \propto \exp\left\{-\frac{1}{2} w_1^2\right\}.$$

To simplify the algebra, we observe that $w_1$ and $w_0$ are independent under this prior. The prior can therefore be written as $\mathcal{N}(w_1 \mid 0, 1)\,\mathcal{N}(w_0 \mid \gamma, \infty)$, where $\gamma$ can be an arbitrary finite number, so:

$$\mathbf{m}_0 = \begin{pmatrix} 0 \\ \gamma \end{pmatrix},$$

$$\mathbf{V}_0 = \begin{pmatrix} 1 & 0 \\ 0 & \infty \end{pmatrix}.$$
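Numerically, the infinite prior variance on $w_0$ can be approximated by a large finite number, after which the standard conjugate Gaussian update $\mathbf{V}_N = \left(\mathbf{V}_0^{-1} + \frac{1}{\sigma^2}\mathbf{X}^\top\mathbf{X}\right)^{-1}$, $\mathbf{m}_N = \mathbf{V}_N\left(\mathbf{V}_0^{-1}\mathbf{m}_0 + \frac{1}{\sigma^2}\mathbf{X}^\top\mathbf{y}\right)$ applies directly. A sketch under that approximation (placeholder data again); note that the arbitrary $\gamma$ has vanishing influence, since it enters the update only through $\mathbf{V}_0^{-1}\mathbf{m}_0$:

```python
import numpy as np

# Placeholder data and an assumed known noise variance.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=20)
y = 1.5 * x - 2.0 + rng.normal(scale=0.5, size=x.size)
sigma2 = 0.25

BIG = 1e12                                  # stands in for the infinite variance
gamma = 123.0                               # arbitrary: enters only as gamma / BIG
m0 = np.array([0.0, gamma])                 # prior mean over (w1, w0)
V0_inv = np.diag([1.0, 1.0 / BIG])          # prior precision over (w1, w0)

X = np.column_stack([x, np.ones_like(x)])   # columns ordered (w1, w0)
VN = np.linalg.inv(V0_inv + X.T @ X / sigma2)   # posterior covariance
mN = VN @ (V0_inv @ m0 + X.T @ y / sigma2)      # posterior mean
print(mN, VN[0, 0])                         # VN[0, 0] is the variance of w1
```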

For questions (c) and (d), we consider the posterior distribution of the parameters:

$$p(\mathbf{w} \mid \mathcal{D}, \sigma^2) \propto \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{V}_0) \prod_{n=1}^{N} \mathcal{N}(y_n \mid w_0 + w_1 x_n, \sigma^2) \propto \exp\left\{-\frac{w_1^2}{2}\right\} \prod_{n=1}^{N} \exp\left\{-\frac{(y_n - w_0 - w_1 x_n)^2}{2\sigma^2}\right\},$$

where we keep only the terms that depend on $w_0$ and $w_1$. To marginalize out $w_0$, we have:

$$\begin{aligned}
p(w_1 \mid \mathcal{D}, \sigma^2) &= \int p(w_1, w_0 \mid \mathcal{D}, \sigma^2) \,\mathrm{d}w_0 \\
&\propto \exp\left\{-\frac{w_1^2}{2}\right\} \int \exp\left\{A w_1^2 + B w_0^2 + C w_0 w_1 + D w_1 + E w_0 + F\right\} \mathrm{d}w_0 \\
&= \exp\left\{-\frac{w_1^2}{2} + A w_1^2 + D w_1 + F\right\} \int \exp\left\{B w_0^2 + C w_0 w_1 + E w_0\right\} \mathrm{d}w_0 \\
&= \exp\left\{-\frac{w_1^2}{2} + A w_1^2 + D w_1 + F - \frac{(C w_1 + E)^2}{4B}\right\} \int \exp\left\{B w_0^2 + (C w_1 + E) w_0 + \frac{(C w_1 + E)^2}{4B}\right\} \mathrm{d}w_0 \\
&\propto \exp\left\{-\frac{w_1^2}{2} + A w_1^2 + D w_1 + F - \frac{(C w_1 + E)^2}{4B}\right\}.
\end{aligned}$$
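The only non-mechanical step above is completing the square in $w_0$, i.e. $B w_0^2 + (C w_1 + E) w_0 = B\left(w_0 + \frac{C w_1 + E}{2B}\right)^2 - \frac{(C w_1 + E)^2}{4B}$, after which the remaining integral no longer depends on $w_1$ (it is a finite Gaussian integral when $B < 0$) and can be dropped. The identity is easy to confirm symbolically, e.g. with sympy:

```python
import sympy as sp

w0, w1, B, C, E = sp.symbols('w0 w1 B C E', real=True)
k = C * w1 + E                               # linear coefficient of w0

# Completing the square in w0 leaves the w1-dependent term -k**2 / (4B).
lhs = B * w0**2 + k * w0
rhs = B * (w0 + k / (2 * B))**2 - k**2 / (4 * B)
assert sp.simplify(lhs - rhs) == 0           # the identity holds
```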

Hence the posterior distribution over $w_1$ is Gaussian. The coefficients of $w_1^2$ and $w_1$ in the exponent are, respectively:

$$-\frac{1}{2} + A - \frac{C^2}{4B},$$

$$D - \frac{CE}{2B}.$$

Hence its posterior variance is:

$$\frac{1}{1 - 2A + \frac{C^2}{2B}},$$

its mean is:

$$\frac{2BD - CE}{2B - 4AB + C^2}.$$
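Both expressions follow from the fact that a Gaussian kernel $\exp\{a w_1^2 + b w_1\}$ has variance $-\frac{1}{2a}$ and mean $-\frac{b}{2a}$; they can be double-checked symbolically by reading $a$ and $b$ off the exponent:

```python
import sympy as sp

w1, A, B, C, D, E = sp.symbols('w1 A B C D E', real=True)

# Exponent of the marginal posterior over w1 (constant terms dropped).
expo = -w1**2 / 2 + A * w1**2 + D * w1 - (C * w1 + E)**2 / (4 * B)
poly = sp.Poly(sp.expand(expo), w1)
a = poly.coeff_monomial(w1**2)               # quadratic coefficient
b = poly.coeff_monomial(w1)                  # linear coefficient

variance = -1 / (2 * a)
mean = -b / (2 * a)
assert sp.simplify(variance - 1 / (1 - 2*A + C**2 / (2*B))) == 0
assert sp.simplify(mean - (2*B*D - C*E) / (2*B - 4*A*B + C**2)) == 0
```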

Finally, let us express $A$ through $E$ in terms of the statistics of $\mathcal{D}$:

$$A = -\frac{\sum_{n=1}^{N} x_n^2}{2\sigma^2}, \quad B = -\frac{N}{2\sigma^2}, \quad C = -\frac{\sum_{n=1}^{N} x_n}{\sigma^2}, \quad D = \frac{\sum_{n=1}^{N} x_n y_n}{\sigma^2}, \quad E = \frac{\sum_{n=1}^{N} y_n}{\sigma^2}.$$

The posterior variance is:

$$\frac{\sigma^2}{\sigma^2 + \sum_{n=1}^{N} x_n^2 - \frac{\left(\sum_{n=1}^{N} x_n\right)^2}{N}},$$

from which we observe that, as $N$ grows, the denominator increases, since by the Cauchy–Schwarz inequality:

$$\sum_{n=1}^{N} x_n^2 - \frac{\left(\sum_{n=1}^{N} x_n\right)^2}{N} \geq 0.$$

In other words, the larger this Cauchy–Schwarz gap, the more confident we are in the estimate of $w_1$. The gap is determined by the empirical variance of the $x_n$, so the posterior variance reduces to:

$$\frac{\sigma^2}{\sigma^2 + N \operatorname{var}(x)},$$

where $\operatorname{var}(x) = \frac{1}{N}\sum_{n} x_n^2 - \left(\frac{1}{N}\sum_{n} x_n\right)^2$ is the empirical variance.

Therefore, for any fixed generative distribution on $x$, the uncertainty in $w_1$ declines as $\mathcal{O}(1/N)$.
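A final end-to-end check: with synthetic $x$ values and an assumed $\sigma^2$, the coefficient-based variance $\frac{1}{1 - 2A + C^2/(2B)}$ should match $\frac{\sigma^2}{\sigma^2 + N \operatorname{var}(x)}$ exactly, and shrink roughly as $1/N$:

```python
import numpy as np

sigma2 = 0.25                                # assumed known noise variance
rng = np.random.default_rng(0)

for N in (10, 100, 1000, 10000):
    x = rng.uniform(0.0, 10.0, size=N)       # fixed generative distribution on x
    A = -np.sum(x**2) / (2 * sigma2)
    B = -N / (2 * sigma2)
    C = -np.sum(x) / sigma2
    var_coef = 1 / (1 - 2 * A + C**2 / (2 * B))       # via A, B, C
    var_closed = sigma2 / (sigma2 + N * np.var(x))    # closed form
    assert np.isclose(var_coef, var_closed)
    print(N, var_closed)                     # decays roughly as O(1/N)
```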
