Kevin P. Murphy, Machine Learning: a Probabilistic Perspective

Exercise 5.8 - MLE and model selection for a 2d discrete distribution

Answers

For question (a), the joint distribution $p(x, y \mid \theta_1, \theta_2)$ is given by:

$$p(x=0, y=0) = (1-\theta_1)\,\theta_2, \qquad p(x=0, y=1) = (1-\theta_1)(1-\theta_2),$$
$$p(x=1, y=0) = \theta_1(1-\theta_2), \qquad p(x=1, y=1) = \theta_1\,\theta_2.$$

This can be compactly written as:

$$p(x, y \mid \theta_1, \theta_2) = \theta_1^{x}(1-\theta_1)^{1-x}\,\theta_2^{x \odot y}(1-\theta_2)^{1-(x \odot y)},$$

where $\odot$ denotes the exclusive NOR (XNOR) operator, $x \odot y = 1 - (x \oplus y)$, which equals $1$ exactly when $x = y$.
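The compact form can be checked against the table above; a minimal sketch, using arbitrary example parameter values:

```python
from itertools import product

theta_1, theta_2 = 0.3, 0.8  # arbitrary example values, just for checking

def joint(x, y, t1, t2):
    """Compact form: t1^x (1-t1)^(1-x) * t2^(x XNOR y) (1-t2)^(1-(x XNOR y))."""
    xnor = 1 - (x ^ y)
    return t1 ** x * (1 - t1) ** (1 - x) * t2 ** xnor * (1 - t2) ** (1 - xnor)

# The four table entries, in the same order as above
table = {
    (0, 0): (1 - theta_1) * theta_2,
    (0, 1): (1 - theta_1) * (1 - theta_2),
    (1, 0): theta_1 * (1 - theta_2),
    (1, 1): theta_1 * theta_2,
}
for xv, yv in product([0, 1], repeat=2):
    assert abs(joint(xv, yv, theta_1, theta_2) - table[(xv, yv)]) < 1e-12
print("compact form matches the table")
```

Note that the four table entries sum to one for any $\theta_1, \theta_2$, as they must.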

For question (b), the MLE for $\theta_1$ is $4/7$, and the MLE for $\theta_2$ is also $4/7$. Since the model factorizes so that $\theta_1$ and $\theta_2$ are independent parameters, both MLEs are obtained by simple counting: $\hat\theta_1$ is the fraction of samples with $x = 1$, and $\hat\theta_2$ is the fraction with $y = x$. The likelihood of the data at the MLE is:

$$p(\mathcal{D} \mid \hat\theta_{\mathrm{MLE}}) = \left(\frac{4}{7}\right)^4 \left(\frac{3}{7}\right)^3 \left(\frac{3}{7}\right)^3 \left(\frac{4}{7}\right)^4 = \left(\frac{4}{7}\right)^8 \left(\frac{3}{7}\right)^6.$$
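As a sanity check, the counting argument can be verified numerically; a minimal sketch, using the dataset that appears in the part (d) script:

```python
x = [1, 1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 0, 1]
n = len(x)

theta_1 = sum(x) / n                                 # fraction with x = 1  -> 4/7
theta_2 = sum(xi == yi for xi, yi in zip(x, y)) / n  # fraction with y = x  -> 4/7

# likelihood at the MLE: theta_1^4 (1-theta_1)^3 * theta_2^4 (1-theta_2)^3
lik = theta_1 ** 4 * (1 - theta_1) ** 3 * theta_2 ** 4 * (1 - theta_2) ** 3
print(theta_1, theta_2, lik)
```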

For question (c), the MLE for $\theta$ is obtained by normalizing the count vector $(2, 1, 2, 2)$ over the states $(0,0), (0,1), (1,0), (1,1)$, so:

$$\hat\theta_{\mathrm{MLE}} = \left(\frac{2}{7}, \frac{1}{7}, \frac{2}{7}, \frac{2}{7}\right).$$

The likelihood at the MLE is:

$$\left(\frac{2}{7}\right)^2 \left(\frac{1}{7}\right) \left(\frac{2}{7}\right)^2 \left(\frac{2}{7}\right)^2 = \frac{1}{7}\left(\frac{2}{7}\right)^6.$$
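This can likewise be checked numerically; a minimal sketch on the same dataset:

```python
from collections import Counter

x = [1, 1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 0, 1]
n = len(x)

counts = Counter(zip(x, y))                   # joint counts of (x, y) pairs
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
theta = [counts[s] / n for s in states]       # (2/7, 1/7, 2/7, 2/7)

# likelihood at the MLE: product over states of theta_s^{count_s}
lik = 1.0
for s, t in zip(states, theta):
    lik *= t ** counts[s]
print(theta, lik)
```

As expected, the more flexible four-parameter model attains a slightly higher maximized likelihood than $M_2$, which is exactly why a complexity penalty such as cross-validation or BIC is needed to compare the two.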

For question (d):

import math

x = [1, 1, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 0, 1]
n = len(x)
l2 = 0.0     # LOOCV log-likelihood for M2
l4 = 0.0     # LOOCV log-likelihood for M4
eps = 1e-5   # small constant to avoid log(0)

for held_out in range(n):
    # counts over the training fold (all points except held_out)
    n1 = 0     # number of points with x = 1
    n_eq = 0   # number of points with y = x
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for i in range(n):
        if i == held_out:
            continue
        n1 += x[i]
        n_eq += (x[i] == y[i])
        counts[(x[i], y[i])] += 1

    theta_1 = n1 / (n - 1)
    theta_2 = n_eq / (n - 1)

    # M2 predictive probability of the held-out point
    xnor = 1 - (x[held_out] ^ y[held_out])
    p2 = (theta_1 ** x[held_out] * (1 - theta_1) ** (1 - x[held_out])
          * theta_2 ** xnor * (1 - theta_2) ** (1 - xnor))
    # M4 predictive probability: the empirical frequency of the held-out pair
    p4 = counts[(x[held_out], y[held_out])] / (n - 1)

    l2 += math.log(p2 + eps)
    l4 += math.log(p4 + eps)

print(l2)  # approximately -12.1364
print(l4)  # approximately -22.2631

The result is approximately:

-12.1364 
-22.2631

Hence CV picks $M_2$. The reason is that $M_4$ assigns zero probability to the pair $(0, 1)$: it appears only once in the data, so whenever it is held out the training fold contains no such pair, and that fold contributes $\log \epsilon$ to the score.

For question (e), using $\mathrm{BIC}(M, \mathcal{D}) = \log p(\mathcal{D} \mid \hat\theta) - \frac{\mathrm{dof}(M)}{2} \log N$ with $N = 7$, $\mathrm{dof}(M_2) = 2$, and $\mathrm{dof}(M_4) = 3$, the BICs for $M_2$ and $M_4$ are respectively:

$$\mathrm{BIC}(M_2, \mathcal{D}) \approx -11.51,$$

$$\mathrm{BIC}(M_4, \mathcal{D}) \approx -12.38.$$

Hence the BIC prefers $M_2$ as well.
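These numbers can be reproduced in a few lines; a minimal sketch, assuming the penalized log-likelihood convention above and the maximized likelihoods from parts (b) and (c):

```python
import math

N = 7
# maximized log-likelihoods from parts (b) and (c)
loglik_m2 = 8 * math.log(4 / 7) + 6 * math.log(3 / 7)
loglik_m4 = 6 * math.log(2 / 7) + math.log(1 / 7)

bic_m2 = loglik_m2 - (2 / 2) * math.log(N)  # M2 has dof = 2
bic_m4 = loglik_m4 - (3 / 2) * math.log(N)  # M4 has dof = 3

print(round(bic_m2, 2))  # -11.51
print(round(bic_m4, 2))  # -12.38
```

Even though $M_4$ has the higher maximized likelihood, its extra degree of freedom costs an additional $\frac{1}{2}\log 7 \approx 0.97$, which tips the comparison toward $M_2$.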

2021-03-24 13:42