Exercise 3.20 - Class conditional densities for binary data

Answers

For question (a), we have:

p ( 𝐱 | y = c ) = i = 1 D p ( x i | y = c , x 1 , . . . , x i 1 ) .

The number of parameter in this case is:

C i = 0 D 1 2 i = C ( 2 D + 1 2 ) = 𝒪 ( C 2 D ) .

For question (b) and (c), the overfitting is generally assumed to decline when N grows. The dependence within the variables generally hinder generalization when N is small, but it could correctly capture the dependence, be it exist.

For question (d), fitting each parameter for the naive Bayes model requires averaging all components in the samples belong to one class, which is of complexity order 𝒪 ( N ) . The total complexity is thus of order 𝒪 ( C D N ) . For the full Bayes classifier, the complexity is of order 𝒪 ( C 2 D N ) .

For question (e), the cost of inference in the naive Bayes model is 𝒪 ( C D ) for each sample. While that for full Bayes model is 𝒪 ( C 2 D ) .

For question (f), we have:

p ( y | 𝐱 v , 𝜃 ) p ( y | 𝜃 ) p ( 𝐱 v | 𝜃 ) 𝐱 h p ( 𝐱 v , 𝐱 h | 𝜃 ) ,

where we assumed a uniform prior on all classes since it is not the bottleneck of complexity. The complexity for the naive model is then:

𝒪 ( C v 2 h ) ,

while that for the full model is:

𝒪 ( C 2 v ) 2 h ) .

User profile picture
2021-03-24 13:42
Comments