Exercise 7.9 - Generative model for linear regression

Answers

For question (a), assume that 𝐱 and y are jointly Gaussian:

$$\mathcal{N}\left(\begin{pmatrix}\mathbf{x}\\ y\end{pmatrix}\,\middle|\,\begin{pmatrix}\boldsymbol{\mu}_{\mathbf{x}}\\ \mu_y\end{pmatrix},\begin{pmatrix}\Sigma_{xx} & \Sigma_{xy}\\ \Sigma_{xy}^{\mathrm T} & \Sigma_{yy}\end{pmatrix}\right).$$

We know from (4.68) that the marginal distributions of 𝐱 and y are $\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_{\mathbf{x}},\Sigma_{xx})$ and $\mathcal{N}(y\mid\mu_y,\Sigma_{yy})$ respectively, which is sufficient for the maximum-likelihood estimates of $\boldsymbol{\mu}_{\mathbf{x}}$, $\mu_y$, $\Sigma_{xx}$ and $\Sigma_{yy}$:

$$\boldsymbol{\mu}_{\mathbf{x}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n,\qquad \mu_y = \frac{1}{N}\sum_{n=1}^{N} y_n,$$

$$\Sigma_{xx} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n-\bar{\mathbf{x}})(\mathbf{x}_n-\bar{\mathbf{x}})^{\mathrm T} = \frac{1}{N}\bar{\mathbf{X}}\bar{\mathbf{X}}^{\mathrm T},\qquad \Sigma_{yy} = \frac{1}{N}\sum_{n=1}^{N}(y_n-\bar{y})^2 = \frac{1}{N}\bar{\mathbf{Y}}^{\mathrm T}\bar{\mathbf{Y}},$$

where $\bar{\mathbf{X}}$ is the $D\times N$ matrix whose columns are the centred inputs $\mathbf{x}_n-\bar{\mathbf{x}}$, and $\bar{\mathbf{Y}}$ is the $N\times 1$ vector with entries $y_n-\bar{y}$.
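As a quick numerical sanity check, these maximum-likelihood estimates can be computed with NumPy. The data, sizes, and variable names below are illustrative assumptions, not part of the exercise:

```python
import numpy as np

# Synthetic data (illustrative only): N samples of a D-dimensional x and a scalar y
rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))                      # row n holds x_n
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=N)

mu_x = X.mean(axis=0)                            # MLE of mu_x
mu_y = y.mean()                                  # MLE of mu_y
Xbar = (X - mu_x).T                              # D x N matrix of centred x_n
Ybar = (y - mu_y)[:, None]                       # N x 1 vector of centred y_n

Sigma_xx = Xbar @ Xbar.T / N                     # (1/N) Xbar Xbar^T
Sigma_yy = (Ybar.T @ Ybar).item() / N            # (1/N) Ybar^T Ybar
```

Note that these are the biased (divide-by-N) estimators, matching the maximum-likelihood formulas above rather than the unbiased divide-by-(N−1) convention.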

To estimate $\Sigma_{xy}$, stack 𝐱 and y into a single vector, estimate that vector's covariance matrix (this is an ordinary Gaussian model), and keep only the first D components of its last column:

$$\Sigma_{xy} = \frac{1}{N}\bar{\mathbf{X}}\bar{\mathbf{Y}}.$$
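This block estimate can be cross-checked against the full empirical covariance of the stacked vector $(\mathbf{x}, y)$; again, the data and names are illustrative assumptions:

```python
import numpy as np

# Illustrative synthetic data
rng = np.random.default_rng(1)
N, D = 500, 2
X = rng.normal(size=(N, D))
y = X @ np.array([0.7, -1.3]) + rng.normal(scale=0.2, size=N)

Xbar = (X - X.mean(axis=0)).T        # D x N matrix of centred x_n
Ybar = (y - y.mean())[:, None]       # N x 1 vector of centred y_n

Sigma_xy = Xbar @ Ybar / N           # (1/N) Xbar Ybar, a D x 1 vector

# The same quantity read off the full (D+1) x (D+1) empirical covariance:
# the first D entries of its last column
full_cov = np.cov(np.hstack([X, y[:, None]]), rowvar=False, bias=True)
```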

Finally, plugging these estimates into (4.69):

$$\mu_{y\mid\mathbf{x}} = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(\mathbf{x}-\boldsymbol{\mu}_{\mathbf{x}}),$$

which equals:

$$\bar{y} + \frac{1}{N}\bar{\mathbf{Y}}^{\mathrm T}\bar{\mathbf{X}}^{\mathrm T}\cdot N\left(\bar{\mathbf{X}}\bar{\mathbf{X}}^{\mathrm T}\right)^{-1}(\mathbf{x}-\bar{\mathbf{x}}) = \bar{y} + \bar{\mathbf{Y}}^{\mathrm T}\bar{\mathbf{X}}^{\mathrm T}\left(\bar{\mathbf{X}}\bar{\mathbf{X}}^{\mathrm T}\right)^{-1}(\mathbf{x}-\bar{\mathbf{x}}).$$

This is identical to what (7.109)-(7.111) imply, which completes the proof.
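The equivalence can also be verified numerically: the conditional mean of the fitted joint Gaussian reproduces the prediction of a least-squares fit with a bias term. The data and variable names below are illustrative assumptions:

```python
import numpy as np

# Illustrative synthetic data
rng = np.random.default_rng(2)
N, D = 300, 3
X = rng.normal(size=(N, D))
y = X @ np.array([2.0, 0.0, -1.0]) + 1.5 + rng.normal(scale=0.3, size=N)

mu_x, mu_y = X.mean(axis=0), y.mean()
Xbar = (X - mu_x).T                  # D x N centred inputs
Ybar = (y - mu_y)[:, None]           # N x 1 centred targets
Sigma_xx = Xbar @ Xbar.T / N
Sigma_xy = Xbar @ Ybar / N           # note Sigma_yx = Sigma_xy^T

x_new = rng.normal(size=D)

# Generative prediction: mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x)
pred_gen = mu_y + Sigma_xy.ravel() @ np.linalg.solve(Sigma_xx, x_new - mu_x)

# Discriminative prediction: least squares with an appended bias column
A = np.hstack([X, np.ones((N, 1))])
w = np.linalg.lstsq(A, y, rcond=None)[0]
pred_ols = np.append(x_new, 1.0) @ w
```

The two predictions agree (up to floating-point error), as the algebra above shows they must whenever $\bar{\mathbf{X}}\bar{\mathbf{X}}^{\mathrm T}$ is invertible.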

For question (b), there is no significant difference, except that we are now also equipped with a distribution over 𝐱. This lets us generate new samples and suggests active-querying strategies, in which we query an oracle for labels in order to accelerate convergence of the learning process.

2021-03-24 13:42