Exercise 3.2 - Marginal likelihood for the Beta-Bernoulli model

Answers

This exercise continues the discussion of the toy coin-toss experiment, so we borrow all symbols from the exercise above. The likelihood takes the form:

$$ p(\mathcal{D}|\theta) = \theta^{N_1} (1-\theta)^{N_0}. $$

The prior distribution of 𝜃 takes the form:

$$ p(\theta|a,b) = \mathrm{Beta}(\theta|a,b) = C_1(a,b)\,\theta^{a-1}(1-\theta)^{b-1} \propto \theta^{a-1}(1-\theta)^{b-1}, $$

where $C_1(a,b) = \Gamma(a+b)/(\Gamma(a)\Gamma(b))$ is the normalization constant of the Beta distribution. We write it explicitly to avoid the ambiguity of $\propto$, which simplifies the notation but easily leads to errors.

The posterior distribution takes the form:

$$ p(\theta|\mathcal{D},a,b) = \frac{p(\theta|a,b)\,p(\mathcal{D}|\theta,a,b)}{p(\mathcal{D}|a,b)} = \frac{p(\theta|a,b)\,p(\mathcal{D}|\theta)}{p(\mathcal{D}|a,b)} = \frac{C_1(a,b)}{p(\mathcal{D}|a,b)}\,\theta^{N_1+a-1}(1-\theta)^{N_0+b-1}. $$

The first step is Bayes' rule; the second is the Markov property: given $\theta$, the data $\mathcal{D}$ are conditionally independent of $a$ and $b$. The last step substitutes the likelihood and the prior given above. Since $p(\theta|\mathcal{D},a,b)$ is proportional to $\theta^{N_1+a-1}(1-\theta)^{N_0+b-1}$ and must be normalized w.r.t. $\theta$, it has to be a Beta distribution with hyperparameters $N_1+a$, $N_0+b$. We can now derive the evidence of $\mathcal{D}$ in terms of $a$ and $b$ explicitly. The normalization of $p(\theta|\mathcal{D},a,b)$ implies that:

$$ \frac{C_1(a,b)}{p(\mathcal{D}|a,b)} = C_1(N_1+a, N_0+b), $$

so:

$$ p(\mathcal{D}|a,b) = \frac{C_1(a,b)}{C_1(N_1+a, N_0+b)}, $$

where $C_1(\cdot,\cdot)$ is the normalization factor of the Beta distribution. Recalling that this factor is a ratio of Gamma functions, this is enough to derive (3.80). The value of $p(\mathcal{D}|a,b)$ can help us select suitable hyperparameters.
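As a rough illustration of using the evidence to compare hyperparameters, the ratio of Beta normalizers can be evaluated in log space. This is a minimal sketch: the function names and the two example priors are assumptions made here, not part of the exercise.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_likelihood(n1: int, n0: int, a: float, b: float) -> float:
    """log p(D | a, b) = log B(n1 + a, n0 + b) - log B(a, b).

    Since the Beta normalizer is 1 / B(., .), this is the log of the
    ratio of normalizers derived above.
    """
    return log_beta(n1 + a, n0 + b) - log_beta(a, b)


# Hypothetical comparison on 3 heads and 2 tails:
# a uniform prior Beta(1, 1) versus a prior strongly favouring heads.
evidence_uniform = exp(log_marginal_likelihood(3, 2, 1.0, 1.0))
evidence_skewed = exp(log_marginal_likelihood(3, 2, 10.0, 1.0))
```

Working with `lgamma` rather than `gamma` avoids overflow when the counts are large; the prior with the larger evidence is the one the data support more.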

As for prediction:

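$$
\begin{aligned}
p(x_{\text{new}}=1|\mathcal{D},a,b) &= \int p(x_{\text{new}}=1|\theta,a,b)\,p(\theta|\mathcal{D},a,b)\,d\theta \\
&= \int p(x_{\text{new}}=1|\theta)\,p(\theta|\mathcal{D},a,b)\,d\theta \\
&= \int \theta\,p(\theta|\mathcal{D},a,b)\,d\theta \\
&= \mathbb{E}_{\mathrm{Beta}(N_1+a,\,N_0+b)}(\theta) = \frac{N_1+a}{N_1+a+N_0+b}.
\end{aligned}
$$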

The first step marginalizes over $\theta$ using the sum and product rules; the second is the Markov property. The rest is straightforward algebra, using the mean of the Beta distribution.
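The predictive probability can be sanity-checked numerically by integrating $\theta$ against the unnormalized posterior. The sketch below is only an illustration: the helper names, the midpoint rule and the example values (3 heads, 2 tails, $a = b = 2$) are assumptions made here.

```python
from fractions import Fraction


def posterior_predictive(n1: int, n0: int, a: int, b: int) -> Fraction:
    """Closed form: p(x_new = 1 | D, a, b) = (n1 + a) / (n1 + a + n0 + b)."""
    return Fraction(n1 + a, n1 + a + n0 + b)


def posterior_mean_numeric(n1: int, n0: int, a: int, b: int,
                           steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the posterior mean of theta.

    Uses the unnormalized posterior theta^(n1+a-1) * (1-theta)^(n0+b-1)
    and divides by its integral, so no normalizer is needed.
    """
    h = 1.0 / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        weight = theta ** (n1 + a - 1) * (1.0 - theta) ** (n0 + b - 1)
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


exact = posterior_predictive(3, 2, 2, 2)
approx = posterior_mean_numeric(3, 2, 2, 2)
```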

Concretely, we calculate $p(\mathcal{D})$ for $\mathcal{D} = \{1, 0, 0, 1, 1\}$:

$$ p(\mathcal{D}) = p(x_1)\,p(x_2|x_1)\,p(x_3|x_2,x_1) \cdots p(x_N|x_{N-1},x_{N-2},\ldots,x_1) = \frac{a}{a+b} \cdot \frac{b}{a+b+1} \cdot \frac{b+1}{a+b+2} \cdot \frac{a+1}{a+b+3} \cdot \frac{a+2}{a+b+4}. $$
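The chain-rule product can be checked against the closed-form evidence with exact rational arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the hyperparameters are restricted to positive integers so that the Beta function reduces to factorials.

```python
from fractions import Fraction
from math import factorial


def sequential_evidence(data, a: int, b: int) -> Fraction:
    """Multiply the one-step-ahead predictive probabilities in order."""
    heads = tails = 0
    prob = Fraction(1)
    for x in data:
        total = a + b + heads + tails
        if x == 1:
            prob *= Fraction(a + heads, total)
            heads += 1
        else:
            prob *= Fraction(b + tails, total)
            tails += 1
    return prob


def beta_int(x: int, y: int) -> Fraction:
    """B(x, y) for positive integers: (x-1)! (y-1)! / (x+y-1)!."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))


def closed_form_evidence(n1: int, n0: int, a: int, b: int) -> Fraction:
    """p(D | a, b) = B(n1 + a, n0 + b) / B(a, b)."""
    return beta_int(n1 + a, n0 + b) / beta_int(a, b)


data = [1, 0, 0, 1, 1]
seq = sequential_evidence(data, a=2, b=3)
closed = closed_form_evidence(n1=3, n0=2, a=2, b=3)
```

Both routes give the same value, and the result does not depend on the order of the observations, only on the counts of ones and zeros.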

Renaming the variables as $\alpha = a + b$, $\alpha_1 = a$, $\alpha_0 = b$, we obtain (3.83). To derive (3.80), we make use of:

$$ \alpha_1 (\alpha_1+1) \cdots (\alpha_1+N_1-1) = \frac{(\alpha_1+N_1-1)!}{(\alpha_1-1)!} = \frac{\Gamma(\alpha_1+N_1)}{\Gamma(\alpha_1)}. $$
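Applying the same identity to the $\alpha_0$ and $\alpha$ products gives the Gamma-function form of the evidence. The following is a sketch of that step, writing $N = N_1 + N_0$ (a symbol assumed here). The factorial expression is valid only for integer arguments, whereas the Gamma ratio holds for any positive argument by repeated use of $\Gamma(x+1) = x\,\Gamma(x)$.

```latex
\begin{aligned}
p(\mathcal{D})
&= \frac{\bigl[\alpha_1(\alpha_1+1)\cdots(\alpha_1+N_1-1)\bigr]\,
         \bigl[\alpha_0(\alpha_0+1)\cdots(\alpha_0+N_0-1)\bigr]}
        {\alpha(\alpha+1)\cdots(\alpha+N-1)} \\
&= \frac{\Gamma(\alpha_1+N_1)}{\Gamma(\alpha_1)}\,
   \frac{\Gamma(\alpha_0+N_0)}{\Gamma(\alpha_0)}\,
   \frac{\Gamma(\alpha)}{\Gamma(\alpha+N)},
\qquad \alpha = \alpha_1 + \alpha_0 .
\end{aligned}
```

This agrees with the ratio of Beta normalizers obtained from the posterior, since the normalizer of $\mathrm{Beta}(a, b)$ is $\Gamma(a+b)/(\Gamma(a)\Gamma(b))$.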

2021-03-24 13:42