STAT110: Fall 2020
Section 7: Questions

Housing Day

Suppose Harvard College is conducting its housing lottery. For simplicity's sake, we'll say that there are 1200 freshmen who will be randomly assigned to 12 houses, each student independently and with equal probability. Let $X_1, X_2, \ldots, X_{12}$ count how many students are placed in Pforzheimer ($X_1$), all the way to Eliot ($X_{12}$) (ordered from best house to worst).

Are $X_1$ and $X_2$ independent?

No, they are not. Since the total number of freshmen is fixed at 1200, knowing that many students were placed in one house decreases the number who could be in the remaining houses. For instance, if $X_1 = 1200$, then $X_2$ must be 0.

What is the joint distribution of $X_1, X_2, \ldots, X_{12}$?

By the story of the Multinomial distribution,

$$(X_1, X_2, \ldots, X_{12}) \sim Mult_{12}\left(1200, \left(1/12, \ldots, 1/12\right)\right)$$
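The dependence between $X_1$ and $X_2$ shows up quickly in a simulation of the Multinomial story. The sketch below is a minimal Monte Carlo check (assuming Python's standard library; the function names, seed, and trial count are arbitrary illustrative choices): it assigns each of the 1200 students to one of 12 equally likely houses, confirms every trial places exactly 1200 students, and estimates $Cov(X_1, X_2)$, which the Multinomial covariance formula $-np_ip_j$ puts at $-1200/144 \approx -8.33$.

```python
import random
from collections import Counter

def simulate_housing(n_students=1200, n_houses=12, trials=4000, seed=110):
    """Run the lottery `trials` times; each student independently picks one of
    n_houses equally likely houses. Returns a list of [X_1, ..., X_k] count vectors."""
    rng = random.Random(seed)
    houses = range(n_houses)
    results = []
    for _ in range(trials):
        counts = Counter(rng.choices(houses, k=n_students))
        results.append([counts[h] for h in houses])
    return results

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

sims = simulate_housing()
x1 = [c[0] for c in sims]  # Pforzheimer counts
x2 = [c[1] for c in sims]  # second house's counts
print(all(sum(c) == 1200 for c in sims))  # True: the total is always 1200
print(sample_cov(x1, x2))  # should land near -1200/144, i.e. negative
```

A nonzero (negative) covariance is enough to rule out independence; the simulated value is noisy because it comes from a finite number of lotteries.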

What is the marginal distribution of $X_1$, the number of students who are placed into Pforzheimer House, and what is the joint distribution of $X_1$ and $1200 - X_1$?

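In this case, we can lump all of the houses other than Pforzheimer together into a single category. We have

$$X_1 \sim Bin(1200, 1/12)$$

$$(X_1, 1200 - X_1) \sim Mult_2\left(1200, (1/12, 11/12)\right)$$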
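The lumping argument can be checked exactly on a scaled-down lottery. This sketch (an illustration with assumed values: 6 students and 3 equally likely houses; `multinomial_pmf` and `binom_pmf` are hypothetical helper names) sums the joint Multinomial PMF over all ways to split the non-Pforzheimer students and compares the result with the Binomial PMF, using exact `Fraction` arithmetic so that equality is exact.

```python
from fractions import Fraction
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """Exact P(N_1 = counts[0], ..., N_k = counts[-1]) for N ~ Mult_k(sum(counts), probs)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # multinomial coefficient, kept as an exact integer
    prob = Fraction(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def binom_pmf(k, n, p):
    """Exact Binomial PMF P(N = k) for N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Scaled-down lottery (assumed for illustration): 6 students, 3 equally likely houses.
n = 6
third = Fraction(1, 3)
for x1 in range(n + 1):
    # Marginal of X_1: sum the joint PMF over every split of the other n - x1 students.
    marginal = sum(multinomial_pmf([x1, a, n - x1 - a], [third] * 3) for a in range(n - x1 + 1))
    assert marginal == binom_pmf(x1, n, third)
    # Lumping houses 2 and 3 into one category gives (X_1, n - X_1) ~ Mult_2(n, (1/3, 2/3)).
    assert multinomial_pmf([x1, n - x1], [third, 1 - third]) == marginal
print("lumping checks passed for n =", n)
```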

What is the conditional distribution of $X_1$ given $X_{10} + X_{11} + X_{12} = 450$?

Given that 450 students landed in the last three houses, the remaining 750 students are spread uniformly over the other 9 houses, so

$$X_1 \mid X_{10} + X_{11} + X_{12} = 450 \sim Bin(750, 1/9)$$
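The same exact-arithmetic approach can check the conditional result on a smaller analogue. In this sketch (assumed values: 10 students and 6 equally likely houses, conditioning on the last 2 houses holding $m = 4$ students; helper names are hypothetical), lumping gives $(X_1, X_2 + X_3 + X_4, X_5 + X_6) \sim Mult_3(10, (1/6, 3/6, 2/6))$, so dividing the joint PMF by $P(X_5 + X_6 = m)$ should reproduce $Bin(10 - m, 1/4)$, just as the full problem gives $Bin(750, 1/9)$.

```python
from fractions import Fraction
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """Exact Multinomial PMF at the given category counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = Fraction(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def binom_pmf(k, n, p):
    """Exact Binomial PMF P(N = k) for N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed small analogue: 10 students, 6 equally likely houses, condition on X5 + X6 = m.
n, m = 10, 4
p1, p_mid, p_last = Fraction(1, 6), Fraction(3, 6), Fraction(2, 6)  # house 1, houses 2-4, houses 5-6
p_cond = binom_pmf(m, n, p_last)  # P(X5 + X6 = m), by lumping
for x in range(n - m + 1):
    joint = multinomial_pmf([x, n - m - x, m], [p1, p_mid, p_last])
    # The remaining n - m students land in house 1 w.p. (1/6) / (4/6) = 1/4.
    assert joint / p_cond == binom_pmf(x, n - m, Fraction(1, 4))

# Back to the real lottery: the conditional mean of X_1 is 750 * (1/9).
print(750 * Fraction(1, 9))  # 250/3
```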

Jelly Beans

I have a jar of 30 jellybeans: 10 red, 8 green, and 12 blue. I draw a sample of 12 jellybeans without replacement. Let $X$ be the number of red jellybeans in the sample and $Y$ the number of green jellybeans.

Find $Cov(X, Y)$.

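Let $X = I_1 + \ldots + I_{12}$ and $Y = J_1 + \ldots + J_{12}$, where

$$\begin{aligned} I_i &= \begin{cases} 1 & \text{if the } i\text{th jellybean in the sample is red} \\ 0 & \text{otherwise} \end{cases} \\ J_i &= \begin{cases} 1 & \text{if the } i\text{th jellybean in the sample is green} \\ 0 & \text{otherwise} \end{cases} \end{aligned}$$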

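We can now solve using indicator variables and the fundamental bridge. By symmetry, every pair $(I_i, J_i)$ has the same covariance as $(I_1, J_1)$, and every pair $(I_i, J_j)$ with $i \neq j$ has the same covariance as $(I_1, J_2)$. Since the first jellybean cannot be both red and green, $E(I_1 J_1) = 0$, while $E(I_1 J_2)$ is the probability that the first jellybean is red and the second is green:

$$\begin{aligned} Cov(I_1, J_1) &= E(I_1 J_1) - E(I_1)E(J_1) = 0 - \left(\frac{10}{30}\right)\left(\frac{8}{30}\right) = -\frac{4}{45} \\ Cov(I_1, J_2) &= E(I_1 J_2) - E(I_1)E(J_2) = \left(\frac{10}{30}\right)\left(\frac{8}{29}\right) - \left(\frac{10}{30}\right)\left(\frac{8}{30}\right) = \frac{4}{1305} \\ Cov(X, Y) &= \sum_{i=1}^{12} Cov(I_i, J_i) + \sum_{i \neq j} Cov(I_i, J_j) \\ &= 12 \cdot Cov(I_1, J_1) + 12 \cdot 11 \cdot Cov(I_1, J_2) \\ &= -\frac{48}{45} + \frac{528}{1305} = -\frac{96}{145} \end{aligned}$$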
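As a check on the arithmetic, the sketch below (exact fractions; variable names are illustrative) recomputes $Cov(X, Y)$ from the two indicator covariances, then recomputes it by brute force from the joint PMF of the sample counts, $P(X = x, Y = y) = \binom{10}{x}\binom{8}{y}\binom{12}{12-x-y} / \binom{30}{12}$.

```python
from fractions import Fraction
from math import comb

N, n_sample = 30, 12
red, green, blue = 10, 8, 12

# Indicator covariances for positions in the sample.
cov_same = 0 - Fraction(red, N) * Fraction(green, N)  # Cov(I_1, J_1)
cov_diff = Fraction(red, N) * Fraction(green, N - 1) - Fraction(red, N) * Fraction(green, N)  # Cov(I_1, J_2)
cov_xy = n_sample * cov_same + n_sample * (n_sample - 1) * cov_diff
print(cov_xy)  # -96/145

# Brute-force check: exact moments from the joint PMF of (X, Y).
total = comb(N, n_sample)
EX = EY = EXY = Fraction(0)
for x in range(red + 1):
    for y in range(green + 1):
        b = n_sample - x - y  # number of blue jellybeans in the sample
        if b < 0 or b > blue:
            continue
        prob = Fraction(comb(red, x) * comb(green, y) * comb(blue, b), total)
        EX += x * prob
        EY += y * prob
        EXY += x * y * prob
print(EXY - EX * EY)  # -96/145
```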

It's good to do a little sanity check at the end: it makes sense that the covariance is negative, since if the sample contains many red jellybeans, it probably contains fewer green ones. Another way to solve this is to create an indicator for each red jellybean and each green jellybean in the jar, where the indicator equals 1 if that jellybean is in the sample and 0 otherwise.
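The alternative approach from the last sentence can be sketched the same way. Here (names are illustrative) $A_r$ indicates that red jellybean $r$ is in the sample and $B_g$ that green jellybean $g$ is. Each bean is sampled with probability $12/30$, two specific beans are both sampled with probability $(12/30)(11/29)$, and $Cov(X, Y)$ is the sum of the $10 \cdot 8 = 80$ identical pairwise covariances.

```python
from fractions import Fraction

N, n_sample = 30, 12
red, green = 10, 8

# A_r = 1 if red bean r (in the jar) is sampled; B_g = 1 if green bean g is sampled.
p_in = Fraction(n_sample, N)  # P(a given bean is in the sample)
p_both = Fraction(n_sample, N) * Fraction(n_sample - 1, N - 1)  # P(two given beans are both sampled)
cov_pair = p_both - p_in * p_in  # Cov(A_r, B_g), identical for every (r, g) pair
# X = sum of A_r and Y = sum of B_g; a red bean is never also green, so every pair is distinct.
cov_xy = red * green * cov_pair
print(cov_xy)  # -96/145
```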

Stat Courses

Let $X$ be the number of statistics majors in a certain college in the Class of 2030, viewed as an r.v. Each statistics major chooses between two tracks: a general track in statistical principles, and a track in quantitative finance. Suppose that each statistics major chooses randomly which of these two tracks to follow, independently, with probability $p$ of choosing the general track. Let $Y$ be the number of statistics majors who choose the general track, and $Z$ be the number of statistics majors who choose the quantitative finance track.

Suppose that $X \sim \text{Pois}(\lambda)$. Find the correlation between $X$ and $Y$.

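By the chicken-egg story, $Y$ and $Z$ are independent Poisson random variables with rate parameters $\lambda p$ and $\lambda q$, respectively, where $q = 1 - p$. We must first find the covariance between $X$ and $Y$. Since $X = Y + Z$ and $Cov(Y, Z) = 0$ by independence,

$$Cov(X,Y) = Cov(Y+Z, Y) = Var(Y) + Cov(Y,Z) = \lambda p$$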
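The chicken-egg claim can be spot-checked numerically. This sketch (with arbitrary illustrative values $\lambda = 6$, $p = 0.3$; `pois_pmf` and `binom_pmf` are hypothetical helper names) compares the two-stage joint PMF $P(X = y + z)\,P(Y = y \mid X = y + z)$ with the product of the two claimed Poisson PMFs on a grid of $(y, z)$ values.

```python
from math import comb, exp, factorial, isclose

def pois_pmf(k, lam):
    """Poisson PMF P(N = k) for N ~ Pois(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def binom_pmf(k, n, p):
    """Binomial PMF P(N = k) for N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam, p = 6.0, 0.3  # illustrative values, not from the problem
q = 1 - p
for y in range(25):
    for z in range(25):
        # Two-stage story: X = y + z majors, and y of them pick the general track.
        two_stage = pois_pmf(y + z, lam) * binom_pmf(y, y + z, p)
        # Chicken-egg claim: independent Y ~ Pois(lam * p) and Z ~ Pois(lam * q).
        product = pois_pmf(y, lam * p) * pois_pmf(z, lam * q)
        assert isclose(two_stage, product, rel_tol=1e-9)
print("factorization holds on the 25 x 25 grid")
```

Because the joint PMF factors into a function of $y$ times a function of $z$, $Cov(Y, Z) = 0$, which is the fact used in the covariance step.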

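We now plug this into the equation for correlation, using $Var(X) = \lambda$ and $Var(Y) = \lambda p$:

$$Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{\lambda p}{\sqrt{\lambda \cdot \lambda p}} = \sqrt{p}$$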
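The result $Corr(X, Y) = \sqrt{p}$ does not depend on $\lambda$. A quick deterministic check (a sketch with arbitrary illustrative parameter values; `corr_xy` is a hypothetical helper) computes the moments directly from the joint PMF $P(X = x, Y = y) = P(X = x)\binom{x}{y}p^y q^{x-y}$, truncating $X$ far into its tail.

```python
from math import comb, exp, factorial, isclose, sqrt

def pois_pmf(k, lam):
    """Poisson PMF P(N = k) for N ~ Pois(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def corr_xy(lam, p, x_max=80):
    """Corr(X, Y) from P(X=x, Y=y) = Pois(x; lam) * Bin(y; x, p), truncating X at
    x_max (the omitted tail mass is negligible for moderate lam)."""
    EX = EY = EX2 = EY2 = EXY = 0.0
    for x in range(x_max + 1):
        px = pois_pmf(x, lam)
        for y in range(x + 1):
            w = px * comb(x, y) * p**y * (1 - p) ** (x - y)
            EX += x * w
            EY += y * w
            EX2 += x * x * w
            EY2 += y * y * w
            EXY += x * y * w
    cov = EXY - EX * EY
    return cov / sqrt((EX2 - EX**2) * (EY2 - EY**2))

for lam, p in [(4.0, 0.3), (10.0, 0.75)]:
    print(lam, p, corr_xy(lam, p), sqrt(p))  # last two columns should agree
```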

Let $n$ be the size of the Class of 2030, where $n$ is a known constant. For this part and the next, instead of assuming that $X$ is Poisson, assume that each of the $n$ students chooses to be a statistics major with probability $r$, independently. Find the joint distribution of $Y$, $Z$, and the number of non-statistics majors, and their marginal distributions.

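Under this new model, we have that $X \sim \text{Bin}(n, r)$. By the multiplication rule, the probability that a student becomes a general statistician is $rp$, a Goldman Sachs statistician is $rq$, and a non-statistician (lame) is $1 - r$. Therefore, we can apply the story of the Multinomial here:

$$(Y, Z, n - X) \sim Mult_3\left(n, (rp, rq, 1 - r)\right)$$

Lumping categories as in the housing problem, the marginal distributions are $Y \sim \text{Bin}(n, rp)$, $Z \sim \text{Bin}(n, rq)$, and $n - X \sim \text{Bin}(n, 1 - r)$.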
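The Multinomial claim can be confirmed exactly for a small class. This sketch (assumed values $n = 7$, $r = 2/5$, $p = 1/3$; helper names are hypothetical) computes $P(Y = y, Z = z)$ from the two-stage story, first $X \sim \text{Bin}(n, r)$ and then $Y \mid X = x \sim \text{Bin}(x, p)$, compares it with the $Mult_3(n, (rp, rq, 1 - r))$ PMF, and then checks that the marginal of $Y$ is $\text{Bin}(n, rp)$.

```python
from fractions import Fraction
from math import comb, factorial

def binom_pmf(k, n, p):
    """Exact Binomial PMF P(N = k) for N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def mult3_pmf(a, b, c, pa, pb, pc):
    """Exact Mult_3(a + b + c, (pa, pb, pc)) PMF at counts (a, b, c)."""
    coef = factorial(a + b + c) // (factorial(a) * factorial(b) * factorial(c))
    return coef * pa**a * pb**b * pc**c

n, r, p = 7, Fraction(2, 5), Fraction(1, 3)  # illustrative values
q = 1 - p
for y in range(n + 1):
    for z in range(n - y + 1):
        x = y + z  # number of statistics majors
        # Two-stage story: X ~ Bin(n, r), then Y | X = x ~ Bin(x, p).
        two_stage = binom_pmf(x, n, r) * binom_pmf(y, x, p)
        assert two_stage == mult3_pmf(y, z, n - x, r * p, r * q, 1 - r)
    # Marginal of Y: sum the joint PMF over z.
    marginal = sum(mult3_pmf(y, z, n - y - z, r * p, r * q, 1 - r) for z in range(n - y + 1))
    assert marginal == binom_pmf(y, n, r * p)
print("joint and marginal checks passed for n =", n)
```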

Continuing as in the previous part, find the correlation between $X$ and $Y$.

We use the fact that, in a Multinomial, the covariance between two different counts $N_i$ and $N_j$ is $-np_ip_j$. Since $X = Y + Z$:

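$$\begin{aligned} Cov(X,Y) &= Cov(Y+Z, Y) \\ &= Var(Y) + Cov(Z,Y) \\ &= nrp(1-rp) - n(rp)(rq) \\ &= npr(1-r) \end{aligned}$$

where the last step uses $1 - rp - rq = 1 - r$ (because $p + q = 1$). Since $Var(X) = nr(1-r)$ and $Var(Y) = nrp(1-rp)$,

$$Corr(X,Y) = \frac{npr(1-r)}{\sqrt{nr(1-r) \cdot nrp(1-rp)}} = \sqrt{\frac{p(1-r)}{1-rp}}$$

As $r \to 0$, this approaches $\sqrt{p}$, matching the Poisson answer from the first part.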
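Both the covariance and the squared correlation can be verified with exact arithmetic for a small class. In this sketch (assumed values $n = 9$, $r = 3/10$, $p = 2/3$; `cov_and_corr_sq` is a hypothetical helper), the moments come from the two-stage model and are compared with $npr(1-r)$ and with the squared correlation implied by $Var(X) = nr(1-r)$ and $Var(Y) = nrp(1-rp)$, namely $p(1-r)/(1-rp)$.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Exact Binomial PMF P(N = k) for N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def cov_and_corr_sq(n, r, p):
    """Exact Cov(X, Y) and Corr(X, Y)^2 under X ~ Bin(n, r), Y | X = x ~ Bin(x, p)."""
    EX = EY = EX2 = EY2 = EXY = Fraction(0)
    for x in range(n + 1):
        px = binom_pmf(x, n, r)
        for y in range(x + 1):
            w = px * binom_pmf(y, x, p)  # P(X = x, Y = y)
            EX += x * w
            EY += y * w
            EX2 += x * x * w
            EY2 += y * y * w
            EXY += x * y * w
    cov = EXY - EX * EY
    return cov, cov**2 / ((EX2 - EX**2) * (EY2 - EY**2))

n, r, p = 9, Fraction(3, 10), Fraction(2, 3)  # illustrative values
cov, corr_sq = cov_and_corr_sq(n, r, p)
assert cov == n * p * r * (1 - r)
assert corr_sq == p * (1 - r) / (1 - r * p)
print(cov, corr_sq)  # 63/50 7/12
```

Working with the squared correlation keeps everything rational, so the comparison is exact rather than up to floating-point error.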