Solution manual: Kevin P. Murphy, Machine Learning: A Probabilistic Perspective

Exercise 3.12 - MAP estimation for the Bernoulli with non-conjugate priors

Answers

For question (a), we adopt the following non-conjugate prior:

$$p(\theta) = \begin{cases} 0.5, & \text{if } \theta = 0.5, \\ 0.5, & \text{if } \theta = 0.4. \end{cases}$$

The posterior distribution now reads:

$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})},$$

whose support is $\{0.4, 0.5\}$. Since the prior assigns equal mass to both values, it cancels in the comparison, and the MAP estimate reduces to maximizing the likelihood over the support:

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta \in \{0.4,\, 0.5\}} \theta^{N_1} (1-\theta)^{N_0},$$

where $N_1$ and $N_0$ are the numbers of heads and tails, respectively.
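This discrete-prior MAP is easy to check numerically. A minimal sketch in Python (the function name and the example counts are my own, not from the text); log-likelihoods are compared to avoid underflow for large counts:

```python
import math

def discrete_map(n1, n0, support=(0.4, 0.5)):
    """MAP under a prior that puts equal mass on each value in `support`.

    With equal prior weights, the MAP reduces to maximizing the Bernoulli
    likelihood theta^n1 * (1 - theta)^n0 over the support.
    """
    return max(support,
               key=lambda t: n1 * math.log(t) + n0 * math.log(1 - t))

print(discrete_map(4, 6))    # 10 tosses, 4 heads  -> 0.4
print(discrete_map(55, 45))  # 100 tosses, 55 heads -> 0.5
```

Note that no normalization constant is needed: with only two candidate values and equal prior mass, comparing (log-)likelihoods suffices.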

For question (b), the data are generated with true parameter $\theta = 0.41$. It is intuitive that the non-conjugate prior performs better when $N$ is small: its MAP estimate is $0.4$, already within $0.01$ of the truth, while a conjugate Beta posterior is still dominated by its prior. The conjugate Bayesian method prevails as $N$ grows, since its posterior mean converges to $0.41$, whereas the discrete prior can never produce an estimate closer than $0.4$. For a solid verification, consider the case where $N$ is large: the probability that $N_1/N$ deviates by more than $\epsilon$ from $0.41$ can be bounded with the Chernoff bound. Let $\{\xi_n\}_{n=1}^{N}$ be a collection of i.i.d. Bernoulli random variables with distribution:

$$\xi_n = \begin{cases} 1, & \text{with probability } 0.41, \\ 0, & \text{with probability } 0.59. \end{cases}$$

Denote by $X = \sum_{n=1}^{N} \xi_n$ their sum. Then, for any $\lambda > 0$:

$$\begin{aligned}
\Pr\bigl(X \geq N(0.41+\epsilon)\bigr) &= \Pr\bigl(e^{\lambda X} \geq e^{N\lambda(0.41+\epsilon)}\bigr) \\
&\leq \mathbb{E}\bigl[e^{\lambda X}\bigr]\, e^{-N\lambda(0.41+\epsilon)} \\
&= \bigl(\mathbb{E}\bigl[e^{\lambda \xi_1}\bigr]\bigr)^{N} e^{-N\lambda(0.41+\epsilon)} \\
&= \bigl(0.41\, e^{\lambda} + 0.59\bigr)^{N} e^{-N\lambda(0.41+\epsilon)}.
\end{aligned}$$

Here $\lambda$ can be an arbitrary positive number: the inequality is Markov's inequality applied to $e^{\lambda X}$, and the third line uses the independence of the $\xi_n$. The probability that $X$ exceeds $N(0.41+\epsilon)$, denoted $P_1$, is therefore bounded by minimizing the last line over $\lambda > 0$. With $N = 1000$ and $\epsilon = 0.05$, the bound is numerically about $0.00602$. The other tail, $\Pr(X \leq N(0.41-\epsilon))$, with probability denoted $P_2$, can be bounded in the same way (using $\lambda < 0$). The probability that $N_1/N$ deviates from $0.41$ by more than $\epsilon$, and hence that the conjugate Bayesian estimate is beaten by the non-conjugate one, is no higher than $P_1 + P_2$. Taking $\epsilon$ as small as $0.01$ (the margin by which the discrete prior's estimate of $0.4$ misses the truth) and letting $N \to \infty$, the bound still decays exponentially, so for large $N$ the conjugate approach wins with overwhelming probability.
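The quoted value can be reproduced by minimizing the final expression over a grid of $\lambda$ values. A sketch, with a helper name of my own choosing; the grid range $(0, 2]$ is an assumption that comfortably covers the optimal $\lambda \approx 0.2$ here:

```python
import math

def chernoff_upper(p, eps, n, num_lambda=2000):
    """Upper bound on Pr(X >= n*(p + eps)) for X ~ Binomial(n, p),
    obtained by minimizing (p*e^lam + 1-p)^n * e^(-n*lam*(p+eps))
    over a grid of lam in (0, 2]. Logs avoid overflow for large n."""
    best = float("inf")
    for i in range(1, num_lambda + 1):
        lam = 2.0 * i / num_lambda
        log_bound = (n * math.log(p * math.exp(lam) + 1.0 - p)
                     - n * lam * (p + eps))
        best = min(best, log_bound)
    return math.exp(best)

# Upper tail P1 = Pr(X >= N(0.41 + eps)) with N = 1000, eps = 0.05.
p1 = chernoff_upper(0.41, 0.05, 1000)
# Lower tail P2 = Pr(X <= N(0.41 - eps)): since N - X ~ Binomial(N, 0.59),
# this equals the upper tail of N - X at level N(0.59 + eps).
p2 = chernoff_upper(0.59, 0.05, 1000)
print(round(p1, 5), round(p2, 5))  # ~0.00602 and ~0.00529
```

The lower tail is handled by symmetry rather than a separate $\lambda < 0$ derivation: $N - X$ counts the zeros, so the same routine applies with success probability $0.59$.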

2021-03-24 13:42