Exercise 3.2 - Marginal likelihood for the Beta-Bernoulli model

Answers

This exercise continues the discussion of the toy coin-toss experiment, so we borrow all symbols from the exercise above. The likelihood takes the form:

p (𝒟 | 𝜃) = 𝜃^{N_{1}} {(1 - 𝜃)}^{N_{0}} .

The prior distribution of $𝜃$ takes the form:

p (𝜃 | a, b) = Beta (𝜃 | a, b) = C_{1} (a, b) \cdot 𝜃^{a - 1} {(1 - 𝜃)}^{b - 1} \propto 𝜃^{a - 1} {(1 - 𝜃)}^{b - 1},

where we write the normalization constant $C_{1} (a, b)$ explicitly instead of relying on $\propto$ , which shortens the notation but makes it easy to lose track of constants.

The posterior distribution takes the form:

\begin{aligned} p (𝜃 | 𝒟, a, b) = & \frac{p (𝜃 | a, b) \cdot p (𝒟 | 𝜃, a, b)}{p (𝒟 | a, b)} \\ = & \frac{p (𝜃 | a, b) \cdot p (𝒟 | 𝜃)}{p (𝒟 | a, b)} \\ = & \frac{C_{1} (a, b)}{p (𝒟 | a, b)} \cdot 𝜃^{N_{1} + a - 1} \cdot {(1 - 𝜃)}^{N_{0} + b - 1} . \end{aligned}

The first step is Bayes' rule. The second uses the fact that, given $𝜃$ , the data $𝒟$ is conditionally independent of the hyperparameters $a, b$ . The last step substitutes the likelihood and the prior given above. Since $p (𝜃 | 𝒟, a, b)$ must be normalized w.r.t. $𝜃$ and is proportional to $𝜃^{N_{1} + a - 1} {(1 - 𝜃)}^{N_{0} + b - 1}$ , it has to be a Beta distribution with hyperparameters $N_{1} + a, N_{0} + b$ . We can now derive the evidence of $𝒟$ w.r.t. $a$ and $b$ explicitly. The normalization of $p (𝜃 | 𝒟, a, b)$ indicates that:

\frac{C_{1} (a, b)}{p (𝒟 | a, b)} = C_{1} (N_{1} + a, N_{0} + b),

so:

p (𝒟 | a, b) = \frac{C_{1} (a, b)}{C_{1} (N_{1} + a, N_{0} + b)},

where $C_{1} (\cdot, \cdot)$ is the normalization factor of the Beta distribution, $C_{1} (a, b) = \frac{Γ (a + b)}{Γ (a) Γ (b)}$ . Substituting this expression gives (3.80). The value of $p (𝒟 | a, b)$ can be used to compare candidate hyperparameters.
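
As a numerical check of the ratio above, here is a minimal Python sketch. The function names `log_beta` and `log_evidence` are illustrative and not from the book; the computation is done in log space through `math.lgamma` only to avoid overflow for large counts.

```python
from math import lgamma, exp

def log_beta(a: float, b: float) -> float:
    """log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b), i.e. -log C_1(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(n1: int, n0: int, a: float, b: float) -> float:
    """log p(D | a, b) = log C_1(a, b) - log C_1(N1 + a, N0 + b)."""
    return log_beta(n1 + a, n0 + b) - log_beta(a, b)

# Three ones and two zeros under a uniform Beta(1, 1) prior:
# the exact value is 1/60, about 0.0167.
print(exp(log_evidence(3, 2, 1.0, 1.0)))

# The same data under a Beta(2, 2) prior: the exact value is 3/140, about 0.0214.
print(exp(log_evidence(3, 2, 2.0, 2.0)))
```

On this small data set the Beta(2, 2) prior yields the larger evidence, which is the kind of comparison the evidence is meant to support.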

As for prediction:

\begin{aligned} p (x_{new} = 1 | 𝒟, a, b) = & \int p (x_{new} = 1 | 𝜃, a, b) \cdot p (𝜃 | 𝒟, a, b) 𝑑𝜃 \\ = & \int p (x_{new} = 1 | 𝜃) \cdot p (𝜃 | 𝒟, a, b) 𝑑𝜃 \\ = & \int 𝜃 \cdot p (𝜃 | 𝒟, a, b) 𝑑𝜃 \\ = & 𝔼_{Beta (N_{1} + a, N_{0} + b)} (𝜃) = \frac{N_{1} + a}{N_{1} + a + N_{0} + b} . \end{aligned}

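The first step marginalizes over $𝜃$ (the sum rule), using the fact that $x_{new}$ is independent of $𝒟$ given $𝜃$ . The second drops $a, b$ because $x_{new}$ depends on them only through $𝜃$ . The third uses $p (x_{new} = 1 | 𝜃) = 𝜃$ , and the last is the mean of the Beta posterior.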
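
The predictive formula can be sanity-checked by integrating $𝜃$ against the Beta posterior numerically. The sketch below uses a plain midpoint rule; the function names and the number of grid steps are arbitrary choices made for illustration.

```python
from math import lgamma, exp, log

def predictive_closed_form(n1: int, n0: int, a: float, b: float) -> float:
    """Posterior mean of theta: (N1 + a) / (N1 + a + N0 + b)."""
    return (n1 + a) / (n1 + n0 + a + b)

def predictive_numeric(n1: int, n0: int, a: float, b: float,
                       steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of theta * Beta(theta | N1 + a, N0 + b)."""
    a_post, b_post = n1 + a, n0 + b
    # log C_1(a_post, b_post), the log normalizer of the posterior
    log_norm = lgamma(a_post + b_post) - lgamma(a_post) - lgamma(b_post)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoints lie strictly inside (0, 1)
        log_pdf = (log_norm + (a_post - 1) * log(theta)
                   + (b_post - 1) * log(1 - theta))
        total += theta * exp(log_pdf) * h
    return total

# Three ones, two zeros, uniform prior: the closed form is 4/7.
print(predictive_closed_form(3, 2, 1.0, 1.0))
print(predictive_numeric(3, 2, 1.0, 1.0))
```

The two printed values should agree to several decimal places; the midpoint rule is only a check here, since the closed form is exact.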

Concretely, we calculate $p (𝒟)$ for $𝒟 = \{1, 0, 0, 1, 1\}$ by applying the chain rule and the predictive formula above to each toss in turn:

\begin{align} p (𝒟) = & p (x_{1}) p (x_{2} | x_{1}) p (x_{3} | x_{2}, x_{1}) \cdots p (x_{N} | x_{N - 1}, x_{N - 2}, \ldots, x_{1}) \\ = & \frac{a}{a + b} \frac{b}{a + b + 1} \frac{b + 1}{a + b + 2} \frac{a + 1}{a + b + 3} \frac{a + 2}{a + b + 4} . \end{align}

Renaming the variables as $α = a + b, α_{1} = a, α_{0} = b$ gives (3.83). To derive (3.80), we make use of:

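α_{1} (α_{1} + 1) \cdots (α_{1} + N_{1} - 1) = \frac{Γ (α_{1} + N_{1})}{Γ (α_{1})},

which follows from $Γ (x + 1) = x Γ (x)$ ; for integer $α_{1}$ the right-hand side equals $\frac{(α_{1} + N_{1} - 1)!}{(α_{1} - 1)!}$ . The same identity applies to the products over $α_{0}$ and $α$ .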
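
The agreement between the sequential product and the closed form can be verified exactly with rational arithmetic. The following is a small sketch that assumes integer hyperparameters so that `fractions.Fraction` applies; the function names are made up for this illustration.

```python
from fractions import Fraction

def sequential_evidence(data, a: int, b: int) -> Fraction:
    """Chain rule: multiply the one-step-ahead predictive probability of each toss."""
    n1 = n0 = 0
    prob = Fraction(1)
    for x in data:
        p_one = Fraction(n1 + a, n1 + n0 + a + b)  # p(x = 1 | earlier tosses)
        prob *= p_one if x == 1 else 1 - p_one
        n1 += x
        n0 += 1 - x
    return prob

def rising_factorial(x: int, n: int) -> Fraction:
    """x (x + 1) ... (x + n - 1), which equals Γ(x + n) / Γ(x)."""
    result = Fraction(1)
    for k in range(n):
        result *= x + k
    return result

def closed_form_evidence(n1: int, n0: int, a: int, b: int) -> Fraction:
    """Ratio of rising factorials obtained by grouping the chain-rule factors."""
    return (rising_factorial(a, n1) * rising_factorial(b, n0)
            / rising_factorial(a + b, n1 + n0))

data = [1, 0, 0, 1, 1]
print(sequential_evidence(data, 1, 1))   # 1/60
print(closed_form_evidence(3, 2, 1, 1))  # 1/60
```

Reordering `data` leaves the result unchanged, since the evidence depends on the tosses only through the counts $N_{1}$ and $N_{0}$ .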

solour_lfq

2021-03-24 13:42
