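We begin with the likelihood for an MVN:

$$p(\mathcal{D} \mid \mu, \Sigma) = (2\pi)^{-ND/2}\, |\Sigma|^{-N/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^{N}(x_i-\mu)^\top \Sigma^{-1}(x_i-\mu)\right\}.$$

By (4.195), the sum in the exponent splits into a scatter term and a mean term, which can be proven by expanding around the sample mean:

$$\begin{aligned}
\sum_{i=1}^{N}(x_i-\mu)^\top \Sigma^{-1}(x_i-\mu)
&= \sum_{i=1}^{N}\left[(x_i-\bar{x}) + (\bar{x}-\mu)\right]^\top \Sigma^{-1}\left[(x_i-\bar{x}) + (\bar{x}-\mu)\right] \\
&= \sum_{i=1}^{N}(x_i-\bar{x})^\top \Sigma^{-1}(x_i-\bar{x}) + N(\bar{x}-\mu)^\top \Sigma^{-1}(\bar{x}-\mu) \\
&= \mathrm{tr}\left(\Sigma^{-1} S_{\bar{x}}\right) + N(\bar{x}-\mu)^\top \Sigma^{-1}(\bar{x}-\mu),
\end{aligned}$$

where we have used the fact that the cross terms vanish because $\sum_{i}(x_i-\bar{x}) = 0$, and that $S_{\bar{x}} = \sum_{i}(x_i-\bar{x})(x_i-\bar{x})^\top = \tilde{X}^\top \tilde{X}$, with $\tilde{X}$ the shifted design matrix and $\bar{x} = \frac{1}{N}\sum_{i} x_i$.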
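The decomposition of the quadratic sum into a trace term and a mean term can be checked numerically. The following is a minimal sketch under stated assumptions: the data, the value of $\mu$, and the matrix `Lam` (standing in for $\Sigma^{-1}$) are arbitrary illustrative numbers, and the helper names are hypothetical.

```python
# Numerical check of the scatter-matrix decomposition, using plain lists.
# Lam plays the role of Sigma^{-1}; all numbers below are illustrative.

def quad(v, A):
    """Quadratic form v^T A v for a vector v and a square matrix A."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def outer(u, v):
    """Outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def trace_prod(A, B):
    """tr(A B) for square matrices of equal size."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))

X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]  # N = 4, D = 2
mu = [0.3, -0.7]
Lam = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite

N, D = len(X), len(X[0])
xbar = [sum(x[d] for x in X) / N for d in range(D)]

# Scatter matrix S_xbar = sum_i (x_i - xbar)(x_i - xbar)^T
S_xbar = [[0.0] * D for _ in range(D)]
for x in X:
    c = [x[d] - xbar[d] for d in range(D)]
    o = outer(c, c)
    for i in range(D):
        for j in range(D):
            S_xbar[i][j] += o[i][j]

# Left side: sum of quadratic forms around mu.
lhs = sum(quad([x[d] - mu[d] for d in range(D)], Lam) for x in X)
# Right side: trace term plus N times the quadratic form of (xbar - mu).
rhs = trace_prod(Lam, S_xbar) + N * quad([xbar[d] - mu[d] for d in range(D)], Lam)

print(abs(lhs - rhs) < 1e-9)
```

The two sides should agree up to floating-point rounding for any choice of data, $\mu$, and symmetric `Lam`.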
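The conjugate prior for the MVN parameters $(\mu, \Sigma)$ is the Normal-inverse-Wishart (NIW) distribution, defined by:

$$\begin{aligned}
\mathrm{NIW}(\mu, \Sigma \mid m_0, \kappa_0, \nu_0, S_0)
&= \mathcal{N}\left(\mu \mid m_0, \tfrac{1}{\kappa_0}\Sigma\right) \cdot \mathrm{IW}(\Sigma \mid S_0, \nu_0) \\
&= \frac{1}{Z}\, |\Sigma|^{-(\nu_0+D+2)/2} \exp\left\{-\frac{\kappa_0}{2}(\mu-m_0)^\top \Sigma^{-1}(\mu-m_0) - \frac{1}{2}\mathrm{tr}\left(\Sigma^{-1} S_0\right)\right\},
\end{aligned}$$

where $Z$ is a normalization constant that does not depend on $\mu$ or $\Sigma$.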
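Hence the posterior reads (where we have omitted the conditioning on the hyperparameters):

$$\begin{aligned}
p(\mu, \Sigma \mid \mathcal{D})
&\propto |\Sigma|^{-(\nu_0+N+D+2)/2} \exp\left\{-\frac{N}{2}(\bar{x}-\mu)^\top \Sigma^{-1}(\bar{x}-\mu) - \frac{\kappa_0}{2}(\mu-m_0)^\top \Sigma^{-1}(\mu-m_0) - \frac{1}{2}\mathrm{tr}\left(\Sigma^{-1}(S_{\bar{x}} + S_0)\right)\right\} \\
&\propto |\Sigma|^{-(\nu_N+D+2)/2} \exp\left\{-\frac{\kappa_N}{2}(\mu-m_N)^\top \Sigma^{-1}(\mu-m_N) - \frac{1}{2}\mathrm{tr}\left(\Sigma^{-1} S_N\right)\right\},
\end{aligned}$$

where $m_N$, $\kappa_N$, $\nu_N$ and $S_N$ are variables whose values are to be decided. Only terms that depend on $\mu$ and $\Sigma$ need to appear explicitly on the r.h.s.; everything else is absorbed into the proportionality constant.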
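Firstly, by comparing the exponent of $|\Sigma|$, we have:

$$\nu_N = \nu_0 + N.$$

Secondly, comparing the coefficient of the term $\mu^\top \Sigma^{-1} \mu$ inside the exponential, we have:

$$\kappa_N = \kappa_0 + N.$$

Thirdly, comparing the coefficient of $\mu^\top$ (the terms linear in $\mu$), we have:

$$\kappa_N \Sigma^{-1} m_N = \Sigma^{-1}\left(N\bar{x} + \kappa_0 m_0\right),$$

therefore:

$$m_N = \frac{\kappa_0 m_0 + N\bar{x}}{\kappa_N}.$$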
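Finally, recall that for an arbitrary column vector $a$:

$$a^\top \Sigma^{-1} a = \mathrm{tr}\left(a^\top \Sigma^{-1} a\right) = \mathrm{tr}\left(\Sigma^{-1} a a^\top\right).$$

The terms inside the exponential that depend solely on $\Sigma$ should be equal on both sides, so:

$$\mathrm{tr}\left(\Sigma^{-1}\left(N\bar{x}\bar{x}^\top + S_{\bar{x}} + \kappa_0 m_0 m_0^\top + S_0\right)\right) = \mathrm{tr}\left(\Sigma^{-1}\left(\kappa_N m_N m_N^\top + S_N\right)\right).$$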
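Since this must hold for every $\Sigma$, we arrive at:

$$N\bar{x}\bar{x}^\top + S_{\bar{x}} + \kappa_0 m_0 m_0^\top + S_0 = \kappa_N m_N m_N^\top + S_N,$$

from which we obtain:

$$S_N = S_0 + S_{\bar{x}} + N\bar{x}\bar{x}^\top + \kappa_0 m_0 m_0^\top - \kappa_N m_N m_N^\top.$$

Recalling the definition of the sample mean, we end up with (4.214), since:

$$S_{\bar{x}} + N\bar{x}\bar{x}^\top = \sum_{i=1}^{N} x_i x_i^\top - N\bar{x}\bar{x}^\top + N\bar{x}\bar{x}^\top = \sum_{i=1}^{N} x_i x_i^\top \triangleq S,
\qquad\text{so that}\qquad
S_N = S_0 + S + \kappa_0 m_0 m_0^\top - \kappa_N m_N m_N^\top.$$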
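As a side check, the terms involving the means can also be collected into a single rank-one correction. The worked equation below is a sketch that assumes $S_N = S_0 + S_{\bar{x}} + N\bar{x}\bar{x}^\top + \kappa_0 m_0 m_0^\top - \kappa_N m_N m_N^\top$ with $\kappa_N = \kappa_0 + N$ and $m_N = (\kappa_0 m_0 + N\bar{x})/\kappa_N$:

```latex
\begin{aligned}
N\bar{x}\bar{x}^\top + \kappa_0 m_0 m_0^\top - \kappa_N m_N m_N^\top
&= N\bar{x}\bar{x}^\top + \kappa_0 m_0 m_0^\top
   - \frac{(\kappa_0 m_0 + N\bar{x})(\kappa_0 m_0 + N\bar{x})^\top}{\kappa_0 + N} \\
&= \frac{\kappa_0 N}{\kappa_0 + N}
   \left(\bar{x}\bar{x}^\top + m_0 m_0^\top - \bar{x} m_0^\top - m_0 \bar{x}^\top\right) \\
&= \frac{\kappa_0 N}{\kappa_0 + N}\,(\bar{x} - m_0)(\bar{x} - m_0)^\top .
\end{aligned}
```

Under these assumptions, $S_N = S_0 + S_{\bar{x}} + \frac{\kappa_0 N}{\kappa_0 + N}(\bar{x} - m_0)(\bar{x} - m_0)^\top$, which makes it explicit that $S_N$ is $S_0$ plus positive semidefinite terms.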
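This finishes the proof that the posterior distribution for the MVN takes the form:

$$p(\mu, \Sigma \mid \mathcal{D}) = \mathrm{NIW}(\mu, \Sigma \mid m_N, \kappa_N, \nu_N, S_N).$$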
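The resulting update can be sketched in a few lines of code. This is an illustrative sketch rather than a reference implementation: the function name `niw_posterior`, the helper names, and the toy data are assumptions, and matrices are stored as plain nested lists to keep the example dependency-free.

```python
# Sketch of the NIW posterior update using plain Python lists.
# Function and variable names are illustrative, not taken from the text.

def outer(u, v):
    """Outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def mat_add(*mats):
    """Elementwise sum of equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(M[i][j] for M in mats) for j in range(cols)] for i in range(rows)]

def mat_scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

def niw_posterior(X, m0, kappa0, nu0, S0):
    """Return (m_N, kappa_N, nu_N, S_N) for data X given as a list of rows."""
    N, D = len(X), len(X[0])
    xbar = [sum(x[d] for x in X) / N for d in range(D)]
    kappa_N = kappa0 + N
    nu_N = nu0 + N
    m_N = [(kappa0 * m0[d] + N * xbar[d]) / kappa_N for d in range(D)]
    # Uncentered sum of squares S = sum_i x_i x_i^T
    S = mat_add(*[outer(x, x) for x in X])
    # S_N = S0 + S + kappa0 m0 m0^T - kappa_N m_N m_N^T
    S_N = mat_add(S0, S, mat_scale(kappa0, outer(m0, m0)),
                  mat_scale(-kappa_N, outer(m_N, m_N)))
    return m_N, kappa_N, nu_N, S_N

X = [[1.0, 0.0], [3.0, 0.0]]  # toy data: N = 2, D = 2
m_N, kappa_N, nu_N, S_N = niw_posterior(
    X, m0=[0.0, 0.0], kappa0=2.0, nu0=4.0, S0=[[1.0, 0.0], [0.0, 1.0]])
print(m_N, kappa_N, nu_N, S_N)
```

The uncentered form of $S_N$ is used here because it only needs the running sum $\sum_i x_i x_i^\top$ and the running mean, so it could be updated incrementally as data arrive.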