# Dice Collectors

Assume a standard 6-sided dice.

Q
A
Q

What is the expected number of times a die must be rolled until the numbers 1 through 6 have all shown up at least once?

A

Let $X$ be the number of times the die is rolled. Let's express $X$ as the sum of simpler r.v.s:

$X = T_1 + T_2 + \ldots + T_6$

where $T_1$ is the number of rolls until the $1^{st}$ unique number shows up, $T_2$ is the number of additional rolls until the $2^{nd}$ unique number, and so on. Note that $T_1$ always equals 1. Using the story of the Geometric, $T_2 \sim 1 + Geom\left({\frac{5}{6}}\right)$, $T_3 \sim 1 + Geom\left({\frac{4}{6}}\right)$, etc.

By linearity of expectation:

\begin{aligned} E(X) &= E(T_1) + E(T_2) + \ldots + E(T_6) \\ &= 1 + \frac{6}{5} + \frac{6}{4} + \frac{6}{3} + \frac{6}{2} + \frac{6}{1} \\ &= \boxed{14.7}\end{aligned}

# Mutual Friends

Alice and Bob have just met, and wonder whether they have a mutual friend. Each has 50 friends, out of 1000 other people who live in their town. They think that it's unlikely that they have a friend in common, saying each of us is only friends with 5% of the people here, so it would be very unlikely that our two 5%'s overlap.

Assume that Alice's 50 friends are a random sample of the 1000 people (equally likely to be any 50 of the 1000), and similarly for Bob. Also assume that knowing who Alice's friends are gives no information about who Bob's friends are. Let $X$ be the number of mutual friends they have.

Q
A
Q

Compute $E(X)$

A

Let $I_k$ be an indicator random variable representing whether the $k$th person is a mutual friend of Alice and Bob. Then, we can write

\begin{aligned} E(X) &= E\left( \sum_{k = 1}^{1000} I_k \right) \\ &= \sum_{k = 1}^{1000} E(I_k) \\ &= 1000 \cdot P(I_k = 1) \\ &= 1000 \cdot \frac{50}{1000}^2 = \boxed{2.5}\end{aligned}

Here, we used linearity of expectation and then the fundamental bridge. The probability that any particular person is a mutual friend of Alice and Bob is $(50/1000)^2$ because being a friend of Alice and a friend of Bob are independent events.

Q
A
Q

Find the PMF of $X$.

A

Suppose that Alice is friends with a fixed group of 50 people. We need to calculate the probability that out of those 50 people, exactly $k$ of them are also Bob's friends. This would be the probability $P(X = k)$.

First, we need to pick $k$ of these 50 of Alice's friends to be Bob's friends as well. Then, we need to choose $50-k$ other friends for Bob from the 950 remaining non-Alice friends. We multiply those two together to get the total number of ways Bob can have 50 friends. Finally, we divide by the number of ways Bob could have any 50 friends out of 1000. We get:

$P(X = k) = \frac{{50 \choose k} {950 \choose 50-k}}{{1000 \choose 50}}$
Q
A
Q

Is the distribution of X one of the important distributions we have looked at? If so, which?

A

Yes, this is the PMF of a hypergeometric distribution. We can think of the problem as follows: Bob is choosing 50 balls out of 50 white balls (Alice's friends) and 950 black balls (not Alice's friends). The number of white balls he ends up choosing is distributed $HGeom(50, 950, 50)$.

# Coin Runs

A coin with probability $p$ of Heads is flipped $n$ times. The sequence of outcomes canbe divided into runs (blocks of H's or blocks of T's), e.g., $HHHTTHTTTH$ becomes $\boxed{HHH}\boxed{TT}\boxed{H}\boxed{TTT}\boxed{H}$ , which has 5 runs.

Q
A
Q

Find the expected number of runs.

Hint: Start by finding the expected number of tosses (other than the first) where the outcome is different from the previous one.

A

Let $I_j$ be an indicator random variable for whether or not the $j$th flip is different from the $j-1$st, for $j = 2, 3, \ldots n$. The probability that the $j$th flip is different from the $j-1$st is equal to $p(1-p) + (1-p)p = 2p(1-p)$, meaning $E(I_j) = 2p(1-p)$. Let $X$ be the number of flips such that that flip is different from the previous one. Then $X = I_2 + \ldots + I_n$ and

\begin{aligned} E(X) &= (n-1) \cdot E(I_j) \\ &= 2(n-1)(p)(1-p)\end{aligned}

The number of flips for which this is true is one less than the number of runs, so the expected number of runs is: $\boxed{1 + 2(n-1)(p)(1-p)}$.