Kevin P. Murphy, Machine Learning: a Probabilistic Perspective

Exercise 5.8 - MLE and model selection for a 2d discrete distribution

Answers

For question (a), the joint distribution $p(x, y \mid \theta_1, \theta_2)$ is given by:

$$p(x=0, y=0) = (1-\theta_1)\,\theta_2, \qquad p(x=0, y=1) = (1-\theta_1)(1-\theta_2),$$
$$p(x=1, y=0) = \theta_1(1-\theta_2), \qquad p(x=1, y=1) = \theta_1\,\theta_2.$$

This can be compactly written as:

$$p(x, y \mid \theta_1, \theta_2) = \theta_1^{x}(1-\theta_1)^{1-x}\,\theta_2^{x \odot y}(1-\theta_2)^{1-(x \odot y)},$$

where $\odot$ denotes the exclusive NOR (XNOR) operator, $x \odot y = 1 - (x \oplus y)$, which equals $1$ exactly when $x = y$.
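The compact form can be checked against the table above; a minimal sketch, using arbitrary example parameter values:

```python
from itertools import product

theta_1, theta_2 = 0.3, 0.8  # arbitrary example values, just for checking

def joint(x, y, t1, t2):
    """Compact form: t1^x (1-t1)^(1-x) * t2^(x XNOR y) (1-t2)^(1-(x XNOR y))."""
    xnor = 1 - (x ^ y)
    return t1 ** x * (1 - t1) ** (1 - x) * t2 ** xnor * (1 - t2) ** (1 - xnor)

# The four table entries, in the same order as above
table = {
    (0, 0): (1 - theta_1) * theta_2,
    (0, 1): (1 - theta_1) * (1 - theta_2),
    (1, 0): theta_1 * (1 - theta_2),
    (1, 1): theta_1 * theta_2,
}
for xv, yv in product([0, 1], repeat=2):
    assert abs(joint(xv, yv, theta_1, theta_2) - table[(xv, yv)]) < 1e-12
print("compact form matches the table")
```

Note that the four table entries sum to one for any $\theta_1, \theta_2$, as they must.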

For question (b), the MLE for $\theta_1$ is $4/7$, and the MLE for $\theta_2$ is also $4/7$. Since the model factorizes so that $\theta_1$ and $\theta_2$ are independent parameters, both MLEs are obtained by simple counting: $\hat\theta_1$ is the fraction of samples with $x = 1$, and $\hat\theta_2$ is the fraction with $y = x$. The likelihood of the data at the MLE is:

$$p(\mathcal{D} \mid \hat\theta_{\mathrm{MLE}}) = \left(\frac{4}{7}\right)^4 \left(\frac{3}{7}\right)^3 \left(\frac{3}{7}\right)^3 \left(\frac{4}{7}\right)^4 = \left(\frac{4}{7}\right)^8 \left(\frac{3}{7}\right)^6.$$
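As a sanity check, the counting argument can be verified numerically; a minimal sketch, using the dataset that appears in the part (d) script:

```python
x = [1, 1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 0, 1]
n = len(x)

theta_1 = sum(x) / n                                 # fraction with x = 1  -> 4/7
theta_2 = sum(xi == yi for xi, yi in zip(x, y)) / n  # fraction with y = x  -> 4/7

# likelihood at the MLE: theta_1^4 (1-theta_1)^3 * theta_2^4 (1-theta_2)^3
lik = theta_1 ** 4 * (1 - theta_1) ** 3 * theta_2 ** 4 * (1 - theta_2) ** 3
print(theta_1, theta_2, lik)
```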

For question (c), the MLE for $\theta$ is obtained by normalizing the count vector $(2, 1, 2, 2)$ over the states $(0,0), (0,1), (1,0), (1,1)$, so:

$$\hat\theta_{\mathrm{MLE}} = \left(\frac{2}{7}, \frac{1}{7}, \frac{2}{7}, \frac{2}{7}\right).$$

The likelihood at the MLE is:

$$\left(\frac{2}{7}\right)^2 \left(\frac{1}{7}\right) \left(\frac{2}{7}\right)^2 \left(\frac{2}{7}\right)^2 = \frac{1}{7}\left(\frac{2}{7}\right)^6.$$
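This can likewise be checked numerically; a minimal sketch on the same dataset:

```python
from collections import Counter

x = [1, 1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 0, 1]
n = len(x)

counts = Counter(zip(x, y))                   # joint counts of (x, y) pairs
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
theta = [counts[s] / n for s in states]       # (2/7, 1/7, 2/7, 2/7)

# likelihood at the MLE: product over states of theta_s^{count_s}
lik = 1.0
for s, t in zip(states, theta):
    lik *= t ** counts[s]
print(theta, lik)
```

As expected, the more flexible four-parameter model attains a slightly higher maximized likelihood than $M_2$, which is exactly why a complexity penalty such as cross-validation or BIC is needed to compare the two.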

For question (d):

import math

x = [1, 1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 0, 1]
n = len(x)
l2 = 0.0     # LOOCV log-likelihood for M2
l4 = 0.0     # LOOCV log-likelihood for M4
eps = 1e-5   # small constant to avoid log(0)

for held_out in range(n):
    # counts over the training fold (all points except held_out)
    n1 = 0     # number of points with x = 1
    n_eq = 0   # number of points with y = x
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for i in range(n):
        if i == held_out:
            continue
        n1 += x[i]
        n_eq += (x[i] == y[i])
        counts[(x[i], y[i])] += 1

    theta_1 = n1 / (n - 1)
    theta_2 = n_eq / (n - 1)

    # M2 predictive probability of the held-out point
    xnor = 1 - (x[held_out] ^ y[held_out])
    p2 = (theta_1 ** x[held_out] * (1 - theta_1) ** (1 - x[held_out])
          * theta_2 ** xnor * (1 - theta_2) ** (1 - xnor))
    # M4 predictive probability: the empirical frequency of the held-out pair
    p4 = counts[(x[held_out], y[held_out])] / (n - 1)

    l2 += math.log(p2 + eps)
    l4 += math.log(p4 + eps)

print(l2)  # approximately -12.1364
print(l4)  # approximately -22.2631

The result is approximately:

-12.1364 
-22.2631

Hence CV picks $M_2$. The reason is that $M_4$ assigns zero probability to the pair $(0, 1)$: it appears only once in the data, so whenever it is held out the training fold contains no such pair, and that fold contributes $\log \epsilon$ to the score.

For question (e), using $\mathrm{BIC}(M, \mathcal{D}) = \log p(\mathcal{D} \mid \hat\theta) - \frac{\mathrm{dof}(M)}{2} \log N$ with $N = 7$, $\mathrm{dof}(M_2) = 2$, and $\mathrm{dof}(M_4) = 3$, the BICs for $M_2$ and $M_4$ are respectively:

$$\mathrm{BIC}(M_2, \mathcal{D}) \approx -11.51,$$

$$\mathrm{BIC}(M_4, \mathcal{D}) \approx -12.38.$$

Hence the BIC prefers $M_2$ as well.
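These numbers can be reproduced in a few lines; a minimal sketch, assuming the penalized log-likelihood convention above and the maximized likelihoods from parts (b) and (c):

```python
import math

N = 7
# maximized log-likelihoods from parts (b) and (c)
loglik_m2 = 8 * math.log(4 / 7) + 6 * math.log(3 / 7)
loglik_m4 = 6 * math.log(2 / 7) + math.log(1 / 7)

bic_m2 = loglik_m2 - (2 / 2) * math.log(N)  # M2 has dof = 2
bic_m4 = loglik_m4 - (3 / 2) * math.log(N)  # M4 has dof = 3

print(round(bic_m2, 2))  # -11.51
print(round(bic_m4, 2))  # -12.38
```

Even though $M_4$ has the higher maximized likelihood, its extra degree of freedom costs an additional $\frac{1}{2}\log 7 \approx 0.97$, which tips the comparison toward $M_2$.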

2021-03-24 13:42