STAT110: Fall 2020

Section 9

Conditional Expectation, LLN, and CLT (BH Chapter 9)

Conditioning on an Event - We can find the expected value of Y given that some event, such as A or X = x, has occurred.

This means finding the values of E(Y|A) and E(Y|X = x). Note that conditioning on an event results in a number. Note the similarity between finding an ordinary expectation and finding a conditional expectation. For example, the expected value of a die roll given that it is prime is $\frac{1}{3}(2) + \frac{1}{3}(3) + \frac{1}{3}(5) = \frac{10}{3}$. We are still taking an average.

Conditioning on a Random Variable - We can also find the expected value of Y given the random variable X. The resulting expectation, E(Y|X), is not a number but a function of the random variable X. An easy way to find E(Y|X): find E(Y|X = x) and then plug in X for x. This changes the conditional expectation of Y from a function of a number x to a function of the random variable X.
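As a quick illustration (a hypothetical setup, not from the course notes): if $X \sim \text{Expo}(1)$ and $Y \mid X = x \sim \text{Pois}(x)$, then $E(Y \mid X = x) = x$, so plugging in X gives $E(Y \mid X) = X$. A minimal simulation sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: X ~ Expo(1) and Y | X = x ~ Pois(x),
# so E(Y | X = x) = x and hence E(Y | X) = X.
x = rng.exponential(scale=1.0, size=1_000_000)
y = rng.poisson(lam=x)

# The average of Y over samples with X near x0 should be close to x0.
for x0 in (0.5, 1.0, 2.0):
    mask = np.abs(x - x0) < 0.05
    print(x0, y[mask].mean())
```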

Law of Total Expectation

This is an extension of the Law of Total Probability. For any set of events $B_1, B_2, B_3, \dots, B_n$ that partition the sample space (the simplest case being $\{B, B^c\}$):

$$E(X) = \sum_{x} x P(X=x) = \sum_{x} x \sum_{i} P(X=x \mid B_i) P(B_i) = \sum_i \sum_x x P(X=x \mid B_i) P(B_i) = \sum_i E(X \mid B_i) P(B_i)$$
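For example, revisiting the die roll with the partition {die is prime, die is not prime}:

$$E(X) = E(X \mid \text{prime})P(\text{prime}) + E(X \mid \text{not prime})P(\text{not prime}) = \frac{10}{3} \cdot \frac{1}{2} + \frac{11}{3} \cdot \frac{1}{2} = \frac{7}{2}$$

which matches the unconditional mean of a fair die, 3.5.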

Properties of Conditional Expectation

Independence - If X and Y are independent, then $Y \mid X \sim Y$, as conditioning on $X$ gives us no additional information about $Y$:

$$E(Y \mid X) = E(Y)$$

Taking out what’s Known - If we are finding an expectation that involves a function $h(X)$ and we are conditioning on $X$, then we can treat $h(X)$ as a constant because $X$ is known:

$$E(h(X) Y \mid X) = h(X) E(Y \mid X)$$

Linearity - We have linearity in the first argument:

$$E(aY_1 + bY_2 \mid X) = a E(Y_1 \mid X) + b E(Y_2 \mid X)$$

Law of Iterated Expectation (Adam’s Law) - For any two random variables X, Y,

$$E(E(Y \mid X)) = E(Y)$$

We can prove the law of iterated expectation with the law of total expectation. Let $g(X) = E(Y \mid X)$; this makes sense since $E(Y \mid X)$ is a random variable, i.e., a function of $X$. Then,

$$E(E(Y \mid X)) = E(g(X)) = \sum_{x} g(x) \, P(X = x) = \sum_x E(Y \mid X=x) \, P(X = x) = E(Y)$$

where the last equality is the law of total expectation.

You can also use Adam's Law with extra conditioning: $E(Y \mid X) = E(E(Y \mid X, Z) \mid X)$.
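For example (a standard illustration): if $N \sim \text{Pois}(\lambda)$ and $X \mid N \sim \text{Bin}(N, p)$, then $E(X \mid N) = Np$, so

$$E(X) = E(E(X \mid N)) = E(Np) = \lambda p$$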

Eve’s Law - For any two random variables X, Y:

$$Var(Y) = E_X(Var(Y \mid X)) + Var_X(E(Y \mid X))$$

Both $Var(Y \mid X)$ and $E(Y \mid X)$ are functions of the random variable $X$.
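Continuing the example above with $N \sim \text{Pois}(\lambda)$ and $X \mid N \sim \text{Bin}(N, p)$: here $Var(X \mid N) = Np(1-p)$ and $E(X \mid N) = Np$, so

$$Var(X) = E(Np(1-p)) + Var(Np) = \lambda p(1-p) + p^2 \lambda = \lambda p$$

consistent with the fact that $X \sim \text{Pois}(\lambda p)$.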

Law of Large Numbers (LLN)

Let $X_1, X_2, X_3, \dots$ be i.i.d. We define $\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}$. The Law of Large Numbers states that as $n \to \infty$, $\bar{X}_n \to E(X_1)$.
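A minimal simulation sketch, assuming NumPy and fair die rolls (so $E(X_1) = 3.5$):

```python
import numpy as np

rng = np.random.default_rng(0)

# By the LLN, the sample mean of die rolls approaches E(X_1) = 3.5.
for n in (10, 1_000, 100_000):
    rolls = rng.integers(1, 7, size=n)  # uniform on {1, ..., 6}
    print(n, rolls.mean())
```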

Central Limit Theorem (CLT)

Approximation using CLT: We use $\dot{\sim}$ to denote “is approximately distributed.” We can use the Central Limit Theorem when we have a random variable Y that is a sum of n i.i.d. random variables, with n large. Let us say that $E(Y) = \mu_Y$ and $Var(Y) = \sigma^2_Y$. Then,

$$Y \,\dot{\sim}\, N(\mu_Y, \sigma^2_Y)$$

When we use the Central Limit Theorem to approximate Y, we usually have $Y = X_1 + X_2 + \dots + X_n$ or $Y = \bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$. Specifically, if each of the $X_i$ has mean $\mu_X$ and variance $\sigma^2_X$, then we have the following approximations.

$$X_1 + X_2 + \dots + X_n \,\dot{\sim}\, N(n\mu_X, n\sigma^2_X)$$

$$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n) \,\dot{\sim}\, N\left(\mu_X, \frac{\sigma^2_X}{n}\right)$$
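A minimal sketch checking the first approximation, assuming NumPy and $\text{Expo}(1)$ summands (so $\mu_X = \sigma^2_X = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)

n, reps = 50, 100_000
# Each row holds X_1 + ... + X_n with X_i ~ Expo(1).
sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)

# CLT: sums should be approximately N(n * mu_X, n * sigma_X^2) = N(50, 50).
print(sums.mean(), sums.var())  # both close to 50
```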

Asymptotic Distributions using CLT: We use $\xrightarrow{d}$ to denote “converges in distribution to” as $n \to \infty$. These are the same results as in the previous section, only letting $n \to \infty$ and not letting our normal distribution have any $n$ terms.

$$\frac{1}{\sigma_X\sqrt{n}} (X_1 + \dots + X_n - n\mu_X) \xrightarrow{d} N(0, 1)$$

$$\frac{\bar{X}_n - \mu_X}{\sigma_X/\sqrt{n}} \xrightarrow{d} N(0, 1)$$
