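From the given (11.118) and (11.119), the model and its log-likelihood are:
$$p(x|\theta) = \sum_k \pi_k N(x|\mu_k, \Sigma_k), \qquad \ell(\theta) = \sum_n \log p(x_n|\theta),$$
while the responsibility
$$r_{nk} = \frac{\pi_k N(x_n|\mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} N(x_n|\mu_{k'}, \Sigma_{k'})}$$
is defined by (11.120).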
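For question (a), recall the Gaussian density, with $D$ the dimension of $x$:
$$N(x|\mu_k, \Sigma_k) = (2\pi)^{-D/2} |\Sigma_k|^{-1/2} \exp\left\{-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right\}$$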
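We are now ready to take the gradient of $\ell(\theta)$ w.r.t. $\mu_k$, which yields:
$$\frac{\partial \ell(\theta)}{\partial \mu_k} = \sum_n \frac{\pi_k N(x_n|\mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} N(x_n|\mu_{k'}, \Sigma_{k'})} \, \frac{\partial}{\partial \mu_k}\left[-\frac{1}{2}(x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k)\right] = \sum_n r_{nk} \, \Sigma_k^{-1} (x_n - \mu_k),$$
using (4.10) for the last step. We have now arrived at (11.121).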
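The result in (11.121) can be sanity-checked numerically. The sketch below is illustrative only: it assumes (11.121) reads $\partial \ell / \partial \mu_k = \sum_n r_{nk} \Sigma_k^{-1}(x_n - \mu_k)$, and all function names, the random data, and the parameter values are hypothetical choices, not part of the exercise. It compares the analytic gradient with a central finite difference of the log-likelihood.

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Density N(x | mu, Sigma) evaluated at each row of X."""
    D = X.shape[1]
    diff = X - mu
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

def log_lik(X, pis, mus, Sigmas):
    """ell(theta) = sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    mix = sum(p * gauss_pdf(X, m, S) for p, m, S in zip(pis, mus, Sigmas))
    return np.sum(np.log(mix))

def grad_mu(X, pis, mus, Sigmas, k):
    """Analytic gradient: sum_n r_nk Sigma_k^{-1} (x_n - mu_k)."""
    dens = np.stack([p * gauss_pdf(X, m, S)
                     for p, m, S in zip(pis, mus, Sigmas)], axis=1)
    r = dens[:, k] / dens.sum(axis=1)          # responsibilities r_nk
    weighted = (r[:, None] * (X - mus[k])).sum(axis=0)
    return np.linalg.solve(Sigmas[k], weighted)

def numeric_grad_mu(X, pis, mus, Sigmas, k, eps=1e-6):
    """Central finite difference of ell w.r.t. each entry of mu_k."""
    g = np.zeros(X.shape[1])
    for d in range(X.shape[1]):
        up = [m.copy() for m in mus]
        dn = [m.copy() for m in mus]
        up[k][d] += eps
        dn[k][d] -= eps
        g[d] = (log_lik(X, pis, up, Sigmas) - log_lik(X, pis, dn, Sigmas)) / (2 * eps)
    return g

# Hypothetical toy problem: 50 points in 2-D, two components.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
pis = np.array([0.3, 0.7])
mus = [np.array([-1.0, 0.5]), np.array([1.0, -0.5])]
Sigmas = [np.array([[1.0, 0.2], [0.2, 0.8]]),
          np.array([[0.6, -0.1], [-0.1, 1.2]])]

for k in range(2):
    gap = np.abs(grad_mu(X, pis, mus, Sigmas, k)
                 - numeric_grad_mu(X, pis, mus, Sigmas, k)).max()
    print(f"component {k}: max |analytic - numeric| = {gap:.2e}")
```

A small gap for both components is consistent with the derivation; it does not replace the proof.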
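For question (b):
$$\frac{\partial \ell(\theta)}{\partial \pi_k} = \sum_n \frac{N(x_n|\mu_k, \Sigma_k)}{\sum_{k'} \pi_{k'} N(x_n|\mu_{k'}, \Sigma_{k'})} = \frac{1}{\pi_k} \sum_n r_{nk}$$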
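For question (c), with:
$$\pi_k = \frac{\exp(w_k)}{\sum_{k'} \exp(w_{k'})}, \qquad \frac{\partial \pi_j}{\partial w_k} = \pi_j (\delta_{jk} - \pi_k),$$
we have:
$$\frac{\partial \ell(\theta)}{\partial w_k} = \sum_j \frac{\partial \ell(\theta)}{\partial \pi_j} \frac{\partial \pi_j}{\partial w_k} = \sum_j \sum_n r_{nj} (\delta_{jk} - \pi_k) = \sum_n (r_{nk} - \pi_k),$$
where the last step uses $\sum_j r_{nj} = 1$ and no constant factor has been dropped.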
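The softmax gradient for question (c) can be checked the same way. This is a minimal sketch under stated assumptions: the reparameterization is taken to be $\pi_k = \exp(w_k) / \sum_{k'} \exp(w_{k'})$ and the gradient to be $\sum_n (r_{nk} - \pi_k)$. The matrix `P` is a hypothetical stand-in for the component densities $N(x_n|\mu_k, \Sigma_k)$, which do not depend on $w$ and can therefore be held fixed.

```python
import numpy as np

def softmax(w):
    """pi_k = exp(w_k) / sum_k' exp(w_k'), shifted for numerical stability."""
    e = np.exp(w - w.max())
    return e / e.sum()

def log_lik_w(w, P):
    """ell(w) = sum_n log sum_k pi_k(w) * P[n, k], with P held fixed."""
    return np.sum(np.log(P @ softmax(w)))

def grad_w(w, P):
    """Analytic gradient: sum_n (r_nk - pi_k)."""
    pi = softmax(w)
    weighted = P * pi                                   # pi_k * P[n, k]
    r = weighted / weighted.sum(axis=1, keepdims=True)  # responsibilities r_nk
    return (r - pi).sum(axis=0)

# Hypothetical inputs: 40 data points, 3 components.
rng = np.random.default_rng(1)
P = rng.uniform(0.1, 2.0, size=(40, 3))
w = rng.normal(size=3)

# Central finite difference along each coordinate of w.
eps = 1e-6
numeric = np.array([
    (log_lik_w(w + eps * e, P) - log_lik_w(w - eps * e, P)) / (2 * eps)
    for e in np.eye(3)
])
analytic = grad_w(w, P)
print("max |analytic - numeric| =", np.abs(analytic - numeric).max())
```

The entries of the analytic gradient sum to zero, because $\sum_k \sum_n (r_{nk} - \pi_k) = N - N$. This reflects the fact that adding the same constant to every $w_k$ leaves $\pi$ unchanged.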
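For question (d), we have:
$$\frac{\partial \ell(\theta)}{\partial \Sigma_k} = \frac{1}{2} \sum_n r_{nk} \left[\Sigma_k^{-1} (x_n-\mu_k)(x_n-\mu_k)^T \Sigma_k^{-1} - \Sigma_k^{-1}\right],$$
a derivation that uses (4.10) and the fact:
$$\frac{\partial}{\partial A}\left(a^T A^{-1} a\right) = -A^{-1} a a^T A^{-1}$$
for a symmetric matrix $A$. Thus, setting the gradient to zero, the optimal $\Sigma_k$
takes the same form as what has been derived in exercise 11.2.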
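The covariance gradient for question (d) can also be checked by finite differences. The sketch below assumes the gradient has the form $\frac{1}{2} \sum_n r_{nk} [\Sigma_k^{-1}(x_n-\mu_k)(x_n-\mu_k)^T \Sigma_k^{-1} - \Sigma_k^{-1}]$ and, in line with ignoring the constraints on $\Sigma_k$, perturbs each entry independently. All names and test values are hypothetical.

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Density N(x | mu, Sigma) evaluated at each row of X."""
    D = X.shape[1]
    diff = X - mu
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

def log_lik(X, pis, mus, Sigmas):
    """ell(theta) = sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    mix = sum(p * gauss_pdf(X, m, S) for p, m, S in zip(pis, mus, Sigmas))
    return np.sum(np.log(mix))

def grad_Sigma(X, pis, mus, Sigmas, k):
    """Analytic gradient w.r.t. Sigma_k, entries treated as unconstrained."""
    dens = np.stack([p * gauss_pdf(X, m, S)
                     for p, m, S in zip(pis, mus, Sigmas)], axis=1)
    r = dens[:, k] / dens.sum(axis=1)        # responsibilities r_nk
    diff = X - mus[k]
    S = (r[:, None] * diff).T @ diff         # sum_n r_nk (x_n-mu_k)(x_n-mu_k)^T
    Sinv = np.linalg.inv(Sigmas[k])
    return 0.5 * (Sinv @ S @ Sinv - r.sum() * Sinv)

def numeric_grad_Sigma(X, pis, mus, Sigmas, k, eps=1e-6):
    """Central finite difference w.r.t. each entry of Sigma_k."""
    D = X.shape[1]
    G = np.zeros((D, D))
    for i in range(D):
        for j in range(D):
            E = np.zeros((D, D))
            E[i, j] = eps
            up = [S + E if q == k else S for q, S in enumerate(Sigmas)]
            dn = [S - E if q == k else S for q, S in enumerate(Sigmas)]
            G[i, j] = (log_lik(X, pis, mus, up) - log_lik(X, pis, mus, dn)) / (2 * eps)
    return G

# Hypothetical toy problem: 60 points in 2-D, two components.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
pis = np.array([0.4, 0.6])
mus = [np.array([-0.5, 0.0]), np.array([0.5, 0.3])]
Sigmas = [np.array([[1.0, 0.3], [0.3, 0.9]]),
          np.array([[0.7, -0.2], [-0.2, 1.1]])]

G_analytic = grad_Sigma(X, pis, mus, Sigmas, 0)
G_numeric = numeric_grad_Sigma(X, pis, mus, Sigmas, 0)
print("max |analytic - numeric| =", np.abs(G_analytic - G_numeric).max())
```

Setting the analytic expression to zero with the $r_{nk}$ held fixed gives $\Sigma_k = \sum_n r_{nk}(x_n-\mu_k)(x_n-\mu_k)^T / \sum_n r_{nk}$, the responsibility-weighted covariance that also appears in the EM update.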
For question (e), this step is redundant since the MLE for $\Sigma_k$
is already a positive definite matrix.