
Exercise 8.4 - Gradient and Hessian of log-likelihood for multinomial logistic regression

Answers

For question (a), given a sample indexed by $i$, we have

$$\mu_{ik} = \frac{\exp(\mathbf{w}_k^T \mathbf{x}_i)}{\sum_c \exp(\mathbf{w}_c^T \mathbf{x}_i)},$$

$$\eta_{ij} = \mathbf{w}_j^T \mathbf{x}_i.$$

Now we have:

$$\frac{\partial \mu_{ik}}{\partial \eta_{ij}} = \frac{\frac{\partial \exp(\eta_{ik})}{\partial \eta_{ij}} \sum_c \exp(\eta_{ic}) - \exp(\eta_{ik}) \frac{\partial}{\partial \eta_{ij}} \sum_c \exp(\eta_{ic})}{\left( \sum_c \exp(\eta_{ic}) \right)^2} = \frac{\exp(\eta_{ik}) \, \delta_{kj} \sum_c \exp(\eta_{ic}) - \exp(\eta_{ij}) \exp(\eta_{ik})}{\left( \sum_c \exp(\eta_{ic}) \right)^2} = \mu_{ik} \delta_{kj} - \mu_{ij} \mu_{ik},$$

which is nothing more than the quotient rule from elementary calculus.
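This Jacobian identity is easy to sanity-check numerically. Below is a minimal sketch, assuming NumPy; the helper names `softmax` and `softmax_jacobian` are illustrative choices, not library functions, and the analytic Jacobian is compared against central finite differences.

```python
import numpy as np

def softmax(eta):
    # Numerically stable softmax: shift by the max before exponentiating.
    e = np.exp(eta - eta.max())
    return e / e.sum()

def softmax_jacobian(eta):
    # J[k, j] = d mu_k / d eta_j = mu_k * (delta_kj - mu_j), the result of (a).
    mu = softmax(eta)
    return np.diag(mu) - np.outer(mu, mu)

rng = np.random.default_rng(0)
eta = rng.normal(size=5)
eps = 1e-6
J_num = np.empty((5, 5))
for j in range(5):
    d = np.zeros(5)
    d[j] = eps
    # Central difference in the j-th coordinate of eta.
    J_num[:, j] = (softmax(eta + d) - softmax(eta - d)) / (2 * eps)

print(np.allclose(softmax_jacobian(eta), J_num, atol=1e-8))  # expect True
```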

For question (b), recall that:

$$\ell(\mathbf{W}) = \sum_{i=1}^{N} \sum_c y_{ic} \log \mu_{ic}.$$

Let $\ell_i(\mathbf{W}) = \sum_c y_{ic} \log \mu_{ic}$; we can now compute the gradient term by term:

$$\begin{aligned}
\frac{\partial \ell_i}{\partial \mathbf{w}_j} &= \frac{\partial}{\partial \mathbf{w}_j} \sum_c y_{ic} \log \mu_{ic} = \sum_c \frac{y_{ic}}{\mu_{ic}} \frac{\partial \mu_{ic}}{\partial \eta_{ij}} \frac{\partial \eta_{ij}}{\partial \mathbf{w}_j} = \sum_c \frac{y_{ic}}{\mu_{ic}} \mu_{ic} (\delta_{cj} - \mu_{ij}) \mathbf{x}_i \\
&= \sum_c y_{ic} (\delta_{cj} - \mu_{ij}) \mathbf{x}_i = y_{ij} (1 - \mu_{ij}) \mathbf{x}_i - \sum_{c \neq j} y_{ic} \mu_{ij} \mathbf{x}_i \\
&= y_{ij} (1 - \mu_{ij}) \mathbf{x}_i + (y_{ij} - 1) \mu_{ij} \mathbf{x}_i = (y_{ij} - \mu_{ij}) \mathbf{x}_i,
\end{aligned}$$

where the penultimate step uses $\sum_{c \neq j} y_{ic} = 1 - y_{ij}$, which holds because the labels are one-hot encoded, so $\sum_c y_{ic} = 1$.

Summing over $i$ yields (8.126).
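The same finite-difference trick checks the gradient formula. A minimal sketch, again assuming NumPy; `log_likelihood`, `grad_wj`, and the problem sizes are illustrative choices, with the labels `Y` one-hot encoded as in the derivation.

```python
import numpy as np

def log_likelihood(W, X, Y):
    # l(W) = sum_i sum_c y_ic log mu_ic; W is (C, D), X is (N, D), Y is one-hot (N, C).
    logits = X @ W.T                             # eta_ij = w_j^T x_i
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    log_mu = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return (Y * log_mu).sum()

def grad_wj(W, X, Y, j):
    # dl/dw_j = sum_i (y_ij - mu_ij) x_i, the result of (b).
    logits = X @ W.T
    mu = np.exp(logits - logits.max(axis=1, keepdims=True))
    mu /= mu.sum(axis=1, keepdims=True)
    return ((Y[:, j] - mu[:, j])[:, None] * X).sum(axis=0)

rng = np.random.default_rng(1)
N, D, C = 20, 4, 3
X = rng.normal(size=(N, D))
Y = np.eye(C)[rng.integers(C, size=N)]           # one-hot labels
W = rng.normal(size=(C, D))

# Central-difference check of dl/dw_j, one coordinate at a time.
eps, j = 1e-6, 0
g_num = np.empty(D)
for d in range(D):
    Wp, Wm = W.copy(), W.copy()
    Wp[j, d] += eps
    Wm[j, d] -= eps
    g_num[d] = (log_likelihood(Wp, X, Y) - log_likelihood(Wm, X, Y)) / (2 * eps)

print(np.allclose(grad_wj(W, X, Y, j), g_num, atol=1e-6))  # expect True
```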

For question (c), we have by definition:

$$\mathbf{H}_{c,c'} = \frac{\partial}{\partial \mathbf{w}_{c'}} \frac{\partial}{\partial \mathbf{w}_c} \ell(\mathbf{W}).$$

Hence we begin with the result from question (b):

$$\frac{\partial}{\partial \mathbf{w}_{c'}} \frac{\partial \ell_i}{\partial \mathbf{w}_c} = \frac{\partial}{\partial \mathbf{w}_{c'}} (y_{ic} - \mu_{ic}) \mathbf{x}_i = -\mathbf{x}_i \left( \frac{\partial \mu_{ic}}{\partial \eta_{ic'}} \frac{\partial \eta_{ic'}}{\partial \mathbf{w}_{c'}} \right)^T = -\mu_{ic} (\delta_{c,c'} - \mu_{ic'}) \, \mathbf{x}_i \mathbf{x}_i^T,$$

where in the last step differentiating the vector $(y_{ic} - \mu_{ic}) \mathbf{x}_i$ with respect to the vector $\mathbf{w}_{c'}$ produces the outer product $\mathbf{x}_i \mathbf{x}_i^T$, which spans the Hessian block. Summing over $i$ yields the desired result (8.127).
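The Hessian block can likewise be checked by numerically differentiating the analytic gradient from question (b). A minimal sketch under the same assumptions (NumPy; `hessian_block` and `grad_wc` are illustrative names); note it implements the block for the log-likelihood $\ell$, matching the sign convention of this derivation.

```python
import numpy as np

def softmax_rows(logits):
    # Row-wise stable softmax: mu_ic for each sample i.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def grad_wc(W, X, Y, c):
    # dl/dw_c = sum_i (y_ic - mu_ic) x_i, from question (b).
    mu = softmax_rows(X @ W.T)
    return ((Y[:, c] - mu[:, c])[:, None] * X).sum(axis=0)

def hessian_block(W, X, c, cp):
    # H_{c,c'} = -sum_i mu_ic (delta_{c,c'} - mu_ic') x_i x_i^T, the result of (c).
    mu = softmax_rows(X @ W.T)
    coef = -mu[:, c] * ((c == cp) - mu[:, cp])
    return np.einsum('i,ia,ib->ab', coef, X, X)  # sum_i coef_i x_i x_i^T

rng = np.random.default_rng(2)
N, D, C = 20, 4, 3
X = rng.normal(size=(N, D))
Y = np.eye(C)[rng.integers(C, size=N)]
W = rng.normal(size=(C, D))

# Differentiate the gradient w.r.t. w_c numerically along each coordinate of w_c'.
eps, c, cp = 1e-6, 0, 1
H_num = np.empty((D, D))
for d in range(D):
    Wp, Wm = W.copy(), W.copy()
    Wp[cp, d] += eps
    Wm[cp, d] -= eps
    H_num[:, d] = (grad_wc(Wp, X, Y, c) - grad_wc(Wm, X, Y, c)) / (2 * eps)

print(np.allclose(hessian_block(W, X, c, cp), H_num, atol=1e-6))  # expect True
```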
