Exercise 4.11 - Derivation of the NIW posterior

Answers

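We begin with the likelihood for an MVN:

$$p(\mathbf{X} \mid \mu, \Sigma) = (2\pi)^{-\frac{ND}{2}} \, |\Sigma|^{-\frac{N}{2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{N} (\mathbf{x}_i - \mu)^T \Sigma^{-1} (\mathbf{x}_i - \mu) \right\}.$$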

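The sum in the exponent can be rewritten using (4.195), which is proven as follows:

$$\begin{aligned}
\sum_{i=1}^{N} (\mathbf{x}_i - \mu)^T \Sigma^{-1} (\mathbf{x}_i - \mu)
&= \sum_{i=1}^{N} \left( \bar{\mathbf{x}} - \mu + (\mathbf{x}_i - \bar{\mathbf{x}}) \right)^T \Sigma^{-1} \left( \bar{\mathbf{x}} - \mu + (\mathbf{x}_i - \bar{\mathbf{x}}) \right) \\
&= N (\bar{\mathbf{x}} - \mu)^T \Sigma^{-1} (\bar{\mathbf{x}} - \mu) + \sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})^T \Sigma^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \\
&= N (\bar{\mathbf{x}} - \mu)^T \Sigma^{-1} (\bar{\mathbf{x}} - \mu) + \mathrm{tr}\left\{ \Sigma^{-1} \sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}}) (\mathbf{x}_i - \bar{\mathbf{x}})^T \right\} \\
&= N (\bar{\mathbf{x}} - \mu)^T \Sigma^{-1} (\bar{\mathbf{x}} - \mu) + \mathrm{tr}\left\{ \Sigma^{-1} \mathbf{S}_{\bar{\mathbf{x}}} \right\},
\end{aligned}$$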

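where the cross terms vanish because $\sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}}) = 0$, and we have used the fact that $\mathrm{tr}(\mathbf{Y}^T \mathbf{Z}) = \mathrm{tr}(\mathbf{Z} \mathbf{Y}^T)$, with $\mathbf{Y}$ the $D \times N$ shifted design matrix whose columns are $\mathbf{x}_i - \bar{\mathbf{x}}$, and $\mathbf{Z} = \Sigma^{-1} \mathbf{Y}$.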
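The decomposition above can be sanity-checked numerically. The following is a minimal sketch using NumPy on synthetic data; the sizes, the random seed, and the way the covariance is generated are assumptions chosen for illustration only and are not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3

# Synthetic data and parameters (illustrative values, not from the exercise).
X = rng.normal(size=(N, D))           # rows are the observations x_i
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)       # symmetric positive definite
Sigma_inv = np.linalg.inv(Sigma)

x_bar = X.mean(axis=0)
centered = X - x_bar                  # rows are x_i - x_bar
S_xbar = centered.T @ centered        # scatter matrix about the sample mean

# Left-hand side: sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
diff = X - mu
lhs = np.einsum("ij,jk,ik->", diff, Sigma_inv, diff)

# Right-hand side: N (x_bar - mu)^T Sigma^{-1} (x_bar - mu) + tr(Sigma^{-1} S_xbar)
d = x_bar - mu
rhs = N * d @ Sigma_inv @ d + np.trace(Sigma_inv @ S_xbar)

identity_holds = bool(np.isclose(lhs, rhs))
print(identity_holds)
```

Note that the code stores observations as rows, so `centered` is the transpose of the shifted design matrix used in the trace argument.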

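The conjugate prior for the MVN parameters $(\mu, \Sigma)$ is the Normal-inverse-Wishart (NIW) distribution, defined by:

$$\begin{aligned}
\mathrm{NIW}(\mu, \Sigma \mid \mathbf{m}_0, k_0, v_0, \mathbf{S}_0)
&= \mathcal{N}\left( \mu \,\Big|\, \mathbf{m}_0, \frac{1}{k_0} \Sigma \right) \cdot \mathrm{IW}(\Sigma \mid \mathbf{S}_0, v_0) \\
&= \frac{1}{Z} \, |\Sigma|^{-\frac{v_0 + D + 2}{2}} \exp\left\{ -\frac{k_0}{2} (\mu - \mathbf{m}_0)^T \Sigma^{-1} (\mu - \mathbf{m}_0) - \frac{1}{2} \mathrm{tr}\left\{ \Sigma^{-1} \mathbf{S}_0 \right\} \right\}.
\end{aligned}$$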

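Hence the posterior reads (where we have omitted the conditioning on the hyperparameters):

$$p(\mu, \Sigma \mid \mathbf{X}) \propto |\Sigma|^{-\frac{v_{\mathbf{X}} + D + 2}{2}} \exp\left\{ -\frac{k_{\mathbf{X}}}{2} (\mu - \mathbf{m}_{\mathbf{X}})^T \Sigma^{-1} (\mu - \mathbf{m}_{\mathbf{X}}) - \frac{1}{2} \mathrm{tr}\left\{ \Sigma^{-1} \mathbf{S}_{\mathbf{X}} \right\} \right\},$$

where $v_{\mathbf{X}}$, $k_{\mathbf{X}}$, $\mathbf{m}_{\mathbf{X}}$ and $\mathbf{S}_{\mathbf{X}}$ are quantities whose values are yet to be determined. Only terms that depend on $\mu$ or $\Sigma$ need to appear explicitly on the r.h.s.; constant factors are absorbed into the proportionality.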

Firstly, by comparing the exponents of $|\Sigma|$ in the product of likelihood and prior with that in the posterior, we have:

$$v_{\mathbf{X}} = v_0 + N.$$

Secondly, comparing the coefficients of the term $\mu^T \Sigma^{-1} \mu$ inside the exponential, we have:

$$k_{\mathbf{X}} = k_0 + N.$$

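Thirdly, matching the coefficients of $\mu^T$ gives:

$$N \Sigma^{-1} \bar{\mathbf{x}} + k_0 \Sigma^{-1} \mathbf{m}_0 = k_{\mathbf{X}} \Sigma^{-1} \mathbf{m}_{\mathbf{X}},$$

therefore:

$$\mathbf{m}_{\mathbf{X}} = \frac{N \bar{\mathbf{x}} + k_0 \mathbf{m}_0}{k_{\mathbf{X}}}.$$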

Finally, recall that for an arbitrary column vector $A$:

$$A^T \Sigma^{-1} A = \mathrm{tr}\left( A^T \Sigma^{-1} A \right) = \mathrm{tr}\left( \Sigma^{-1} A A^T \right).$$
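As a quick illustration of this trace identity, here is a small NumPy sketch. The vector, the matrix, and the seed are arbitrary assumptions made for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4

# Arbitrary column vector A (stored as a 1-D array) and an SPD matrix to invert.
a = rng.normal(size=D)
B = rng.normal(size=(D, D))
Sigma_inv = np.linalg.inv(B @ B.T + D * np.eye(D))

quad = a @ Sigma_inv @ a                         # A^T Sigma^{-1} A, a scalar
tr_form = np.trace(Sigma_inv @ np.outer(a, a))   # tr(Sigma^{-1} A A^T)

trace_trick_holds = bool(np.isclose(quad, tr_form))
print(trace_trick_holds)
```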

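The terms that depend solely on $\Sigma^{-1}$ (and not on $\mu$) must agree on both sides, so:

$$\mathrm{tr}\left( \Sigma^{-1} \left( k_0 \mathbf{m}_0 \mathbf{m}_0^T + \mathbf{S}_0 \right) \right) + \mathrm{tr}\left( \Sigma^{-1} \left( N \bar{\mathbf{x}} \bar{\mathbf{x}}^T + \mathbf{S}_{\bar{\mathbf{x}}} \right) \right) = \mathrm{tr}\left( \Sigma^{-1} \left( k_{\mathbf{X}} \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^T + \mathbf{S}_{\mathbf{X}} \right) \right).$$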

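Since this must hold for every $\Sigma$, we arrive at:

$$N \bar{\mathbf{x}} \bar{\mathbf{x}}^T + \mathbf{S}_{\bar{\mathbf{x}}} + k_0 \mathbf{m}_0 \mathbf{m}_0^T + \mathbf{S}_0 = k_{\mathbf{X}} \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^T + \mathbf{S}_{\mathbf{X}},$$

and hence obtain:

$$\mathbf{S}_{\mathbf{X}} = N \bar{\mathbf{x}} \bar{\mathbf{x}}^T + \mathbf{S}_{\bar{\mathbf{x}}} + k_0 \mathbf{m}_0 \mathbf{m}_0^T + \mathbf{S}_0 - k_{\mathbf{X}} \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^T.$$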

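Recalling the definition of the sample mean, this is exactly (4.214), since:

$$\mathbf{S} = \sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^T = \mathbf{S}_{\bar{\mathbf{x}}} + N \bar{\mathbf{x}} \bar{\mathbf{x}}^T,$$

so that $\mathbf{S}_{\mathbf{X}} = \mathbf{S}_0 + \mathbf{S} + k_0 \mathbf{m}_0 \mathbf{m}_0^T - k_{\mathbf{X}} \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^T$.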
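The relation between the uncentered sum of outer products and the centered scatter matrix can be confirmed numerically. This is a minimal NumPy sketch on synthetic data; the data, sizes, and seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 30, 2

# Synthetic observations with a nonzero mean, stored as rows.
X = rng.normal(loc=1.5, size=(N, D))

x_bar = X.mean(axis=0)
S = X.T @ X                            # sum_i x_i x_i^T (uncentered)
centered = X - x_bar
S_xbar = centered.T @ centered         # sum_i (x_i - x_bar)(x_i - x_bar)^T

# S should equal the centered scatter plus N * x_bar x_bar^T.
scatter_identity_holds = bool(np.allclose(S, S_xbar + N * np.outer(x_bar, x_bar)))
print(scatter_identity_holds)
```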

This completes the proof that the posterior distribution for the MVN parameters takes the form $\mathrm{NIW}(\mathbf{m}_{\mathbf{X}}, k_{\mathbf{X}}, v_{\mathbf{X}}, \mathbf{S}_{\mathbf{X}})$.
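The result can also be checked end to end: if the update rules are right, the log-likelihood plus the log-prior kernel should equal the log-posterior kernel at every $(\mu, \Sigma)$, once normalizing constants are dropped. The sketch below does this with NumPy. The function names `niw_posterior`, `log_niw_kernel`, and `log_likelihood_kernel` are hypothetical helpers introduced here, and the prior hyperparameters and data are arbitrary assumptions.

```python
import numpy as np

def niw_posterior(X, m0, k0, v0, S0):
    """Return (m_X, k_X, v_X, S_X) for data X of shape (N, D)."""
    N = X.shape[0]
    x_bar = X.mean(axis=0)
    S = X.T @ X                                   # sum_i x_i x_i^T
    k_X = k0 + N
    v_X = v0 + N
    m_X = (N * x_bar + k0 * m0) / k_X
    S_X = S0 + S + k0 * np.outer(m0, m0) - k_X * np.outer(m_X, m_X)
    return m_X, k_X, v_X, S_X

def log_niw_kernel(mu, Sigma, m, k, v, S):
    """Unnormalized log NIW density (the 1/Z factor is dropped)."""
    D = mu.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    d = mu - m
    return (-(v + D + 2) / 2 * logdet
            - k / 2 * (d @ Sigma_inv @ d)
            - 0.5 * np.trace(Sigma_inv @ S))

def log_likelihood_kernel(X, mu, Sigma):
    """MVN log-likelihood without the (2 pi)^(-ND/2) constant."""
    N = X.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    diff = X - mu
    quad = np.einsum("ij,jk,ik->", diff, Sigma_inv, diff)
    return -N / 2 * logdet - 0.5 * quad

rng = np.random.default_rng(2)
N, D = 20, 3
X = rng.normal(size=(N, D))

# Illustrative prior hyperparameters (assumed, not from the exercise).
m0, k0, v0, S0 = np.zeros(D), 1.0, D + 2.0, np.eye(D)
post = niw_posterior(X, m0, k0, v0, S0)

# Evaluate likelihood * prior / posterior kernel at several random (mu, Sigma).
gaps = []
for _ in range(5):
    mu = rng.normal(size=D)
    B = rng.normal(size=(D, D))
    Sigma = B @ B.T + np.eye(D)                   # symmetric positive definite
    gap = (log_likelihood_kernel(X, mu, Sigma)
           + log_niw_kernel(mu, Sigma, m0, k0, v0, S0)
           - log_niw_kernel(mu, Sigma, *post))
    gaps.append(gap)

# With all constants dropped, the gap should vanish at every test point.
proportional = bool(np.allclose(gaps, 0.0, atol=1e-8))
print(proportional)
```

Comparing kernels at several random points, rather than at one, is what makes this a check of proportionality in both $\mu$ and $\Sigma$ instead of a single coincidental match.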

2021-03-24 13:42