Exercise 1.62 (Birthday problem)

In the birthday problem, we assumed that all 365 days of the year are equally likely (and excluded February 29). In reality, some days are slightly more likely as birthdays than others. For example, scientists have long struggled to understand why more babies are born 9 months after a holiday. Let p = (p1,p2,,p365) be the vector of birthday probabilities, with pj the probability of being born on the j th day of the year (February 29 is still excluded, with no offense intended to Leap Dayers). The kth elementary symmetric polynomial in the variables x1,,xn is defined by

ek (x1,,xn) = 1j1<j2<<jknxj1xjk

This just says to add up all of the ( n k ) terms we can get by choosing and multiplying k of the variables. For example, e1 (x1,x2,x3) = x1 + x2 + x3,e2 (x1,x2,x3) = x1x2+ x1x3 + x2x3, and e3 (x1,x2,x3) = x1x2x3. Now let k 2 be the number of people.
(a) Find a simple expression for the probability that there is at least one birthday match, in terms of p and an elementary symmetric polynomial.
(b) Explain intuitively why it makes sense that P (at least one birthday match) is minimized when pj = 1 365 for all j, by considering simple and extreme cases.
(c) The famous arithmetic mean-geometric mean inequality says that for x,y 0,

x + y 2 xy

This inequality follows from adding 4xy to both sides of x2 2xy + y2 = (x y)2 0. Define r = (r1,,r365) by r1 = r2 = (p1 + p2) 2,rj = pj for 3 j 365. Using the arithmetic mean-geometric mean bound and the fact, which you should verify, that

ek (x1,,xn) = x1x2ek2 (x3,,xn) + (x1 + x2) ek1 (x3,,xn) + ek (x3,,xn) ,

show that P( at least one birthday mat ch p) P( at least one birthday match r), with strict inequality if pr, where the "given r notation means that the birthday probabilities are given by r. Using this, show that the value of p that minimizes the probability of at least one birthday match is given by pj = 1 365 for all j.

Answers

(a)
1 k!ek (p)
(b)
Consider the extreme case where p1 = 1 and pi = 0 for i1. Then, the probability that there is at least one birthday match is 1. In general, if pi > 1 365 for a particular i, then a birthday match is more likely, since that particular day is more likely to be sampled multiple times. Thus, it makes intuitive sense that the probability of at least one birthday match is minimized when pi = 1 365.
(c)
First, consider ek (x1,...,xn). We can break up this sum into the sum of three disjoint cases.
(a)
Sum of terms that contain both x1 and x2. This sum is given by x1x2ek2 (x3,...,xn)
(b)
Sum of terms that contain either x1 or x2 but not both. This sum is given by (x1 + x2) ek1 (x3,...,xn)
(c)
Sum of terms that don’t contain either x1 or x2. This sum is give by ek (x3,...,xn)

Thus,

ek(x1,...,xn) = x1x2ek2(x3,...,xn) + (x1 + x2)ek1(x3,...,xn) + ek(x3,...,xn)

Next, compare ek (p) and ek (r). Expanding the elementary symmetric polynomials, it is easy to see that the only difference between the two are the terms that contain either the first, the second or both terms from p and r respectively.

Notice that because r1 = r2 = p1+p2 2 , the sum of the terms with only r1 and only r2 but not both is exactly equal to (p1 + p2)ek1(x3,...,xn). Thus, the only difference between ek (p) and ek (r) are the terms p1p2ek2(x3,...,xn) and r1r2ek2(x3,...,xn).

By the arithmetic geometric mean inequailty, r1r2ek2(x3,...,xn) p1p2ek2(x3,...,xn). Hence, 1 k!ek(p) 1 k!ek(r).

In other words, given birthday probabilities p, we can potentially reduce the probability of having at least one birthday match by taking any two birthday probabilities and replacing them with their average. For a minimal probability of at least one birthday match then, all values pi in p must be equal, so that averaging any pi and pj does not change anything.

User profile picture
2021-12-05 00:00
Comments