Section 9

Conditional Expectation, LLN, and CLT (BH Chapter 9)

Conditioning on an Event - We can find the expected value of $Y$ given that an event, such as $A$ or $X = x$, has occurred.

This would be finding the values of $E(Y|A)$ and $E(Y|X = x)$. Note that conditioning on an event results in a number. Note the similarities between finding an ordinary expectation and finding a conditional expectation. The expected value of a fair die roll given that it is prime is $\frac{1}{3}(2) + \frac{1}{3}(3) + \frac{1}{3}(5) = \frac{10}{3}$. We are still taking an average.
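A quick sanity check of the die example, using exact fractions:

```python
from fractions import Fraction

# Given that a fair die roll is prime, the outcomes {2, 3, 5} are
# equally likely, so the conditional expectation is their average.
primes = [2, 3, 5]
cond_exp = sum(Fraction(1, 3) * p for p in primes)
print(cond_exp)  # 10/3
```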

Conditioning on a Random Variable - We can also find the expected value of $Y$ given the random variable $X$. The resulting expectation, $E(Y|X)$, is not a number but a function of the random variable $X$. For an easy way to find $E(Y|X)$, find $E(Y|X = x)$ and then plug in $X$ for $x$. This changes the conditional expectation of $Y$ from a function of a number $x$ to a function of the random variable $X$.
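A small worked example of this recipe, under an illustrative setup not from the text: $X$ is the first of two fair die rolls and $Y$ is the total of both rolls.

```python
from fractions import Fraction
from itertools import product

# E(Y | X = x) computed directly from the joint pmf of two fair dice,
# where X = first roll and Y = total of both rolls.
def E_Y_given_X(x):
    pairs = [(a, b) for a, b in product(range(1, 7), repeat=2) if a == x]
    return Fraction(sum(a + b for a, b in pairs), len(pairs))

# As a function of x this is x + 7/2; plugging in X gives E(Y|X) = X + 7/2,
# a random variable rather than a number.
print([E_Y_given_X(x) for x in range(1, 7)])
```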

Law of Total Expectation

This is an extension of the Law of Total Probability. For any set of events $B_1, B_2, B_3, \dots, B_n$ that partition the sample space (the simplest case being $\{B, B^c\}$):

$$E(X) = \sum_{x} x P(X=x) = \sum_{x} x \sum_{i} P(X = x \mid B_i) P(B_i) = \sum_i \sum_x x P(X=x \mid B_i) P(B_i) = \sum_i E(X \mid B_i) P(B_i)$$
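A numeric check of the Law of Total Expectation for a fair die, using the illustrative partition {prime, not prime}:

```python
from fractions import Fraction

# E(X) = E(X|B)P(B) + E(X|B^c)P(B^c) with B = "roll is prime".
outcomes = set(range(1, 7))
primes = {2, 3, 5}

def cond_mean(event):
    # Conditional mean over an equally likely set of outcomes.
    return Fraction(sum(event), len(event))

p = Fraction(len(primes), 6)
total = cond_mean(primes) * p + cond_mean(outcomes - primes) * (1 - p)
print(total)  # 7/2, which is E(X) for a fair die
```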

Properties of Conditional Expectation

Independence - If $X$ and $Y$ are independent, then $Y \mid X \sim Y$, as conditioning on $X$ gives us no additional information about $Y$:

$$E(Y \mid X) = E(Y)$$

Taking out what’s Known - If we are finding an expectation that involves some function $h(X)$ and we are conditioning on $X$, then we can treat $h(X)$ as a constant because $X$ is known: $$E(h(X) Y \mid X) = h(X) E(Y \mid X)$$

Linearity - We have linearity in the first argument:

$$E(aY_1 + bY_2 \mid X) = a E(Y_1 \mid X) + b E(Y_2 \mid X)$$

Law of Iterated Expectation (Adam’s Law) - For any two random variables X, Y,

$$E(E(Y \mid X)) = E(Y)$$

We can prove the Law of Iterated Expectation with the Law of Total Expectation. Let $g(X) = E(Y \mid X)$; since $E(Y \mid X)$ is a function of $X$, it is itself a random variable. Then,

$$E(E(Y \mid X)) = E(g(X)) = \sum_{x} g(x) \, P(X = x) = \sum_x E(Y \mid X = x) \, P(X = x) = E(Y)$$

where the last step is the Law of Total Expectation.

You can also use Adam's Law with extra conditioning: $E(Y \mid X) = E(E(Y \mid X, Z) \mid X)$
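A simulation check of Adam's Law under an illustrative hierarchy (the distributions are assumptions chosen for the example, not from the text): $X \sim \mathrm{Unif}(0,1)$ and $Y \mid X \sim \mathrm{Pois}(5X)$, so $E(Y \mid X) = 5X$ and $E(Y) = E(E(Y \mid X)) = E(5X) = 2.5$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.uniform(0, 1, n)   # X ~ Unif(0, 1), illustrative choice
y = rng.poisson(5 * x)     # Y | X ~ Pois(5X), so E(Y | X) = 5X
# Adam's Law: the mean of Y matches the mean of E(Y | X) = 5X.
print(y.mean(), (5 * x).mean())  # both approximately 2.5
```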

Eve’s Law - For any two random variables X, Y:

$$\mathrm{Var}(Y) = E_X\big(\mathrm{Var}(Y \mid X)\big) + \mathrm{Var}_X\big(E(Y \mid X)\big)$$

Both $\mathrm{Var}(Y \mid X)$ and $E(Y \mid X)$ are functions of the random variable $X$.
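A simulation check of Eve's Law with an illustrative hierarchy (assumed for the example): $X \sim \mathrm{Unif}(0,1)$, $Y \mid X \sim \mathrm{Pois}(5X)$. Then $E(\mathrm{Var}(Y \mid X)) = E(5X) = 2.5$ (a Poisson's variance equals its mean) and $\mathrm{Var}(E(Y \mid X)) = \mathrm{Var}(5X) = 25/12 \approx 2.083$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 10**6)   # X ~ Unif(0, 1), illustrative choice
y = rng.poisson(5 * x)         # Y | X ~ Pois(5X)
# Eve's Law predicts Var(Y) = 2.5 + 25/12, about 4.583.
print(y.var())
```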

Law of Large Numbers (LLN)

Let us have $X_1, X_2, X_3, \dots$ be i.i.d. We define $\bar{X}_n = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$. The Law of Large Numbers states that as $n \to \infty$, $\bar{X}_n \to E(X)$.
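The LLN is easy to watch in simulation. Here the distribution is an illustrative choice: i.i.d. Exponential draws with $E(X) = 2$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=10**6)   # E(X) = 2
# The running sample mean settles toward E(X) = 2 as n grows.
for n in (10, 1_000, 1_000_000):
    print(n, x[:n].mean())
```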

Central Limit Theorem (CLT)

Approximation using CLT: We use $\dot\sim$ to denote "is approximately distributed as." We can use the Central Limit Theorem when we have a random variable $Y$ that is a sum of $n$ i.i.d. random variables, with $n$ large. Let us say that $E(Y) = \mu_Y$ and $\mathrm{Var}(Y) = \sigma^2_Y$. Then,

$$Y \,\dot\sim\, N(\mu_Y, \sigma^2_Y)$$

When we use the Central Limit Theorem to approximate $Y$, we usually have $Y = X_1 + X_2 + \dots + X_n$ or $Y = \bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$. Specifically, if we say that each of the $X_i$ has mean $\mu_X$ and variance $\sigma^2_X$, then we have the following approximations.

$$X_1 + X_2 + \dots + X_n \,\dot\sim\, N(n\mu_X, n\sigma^2_X)$$
$$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n) \,\dot\sim\, N\!\left(\mu_X, \frac{\sigma^2_X}{n}\right)$$
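A simulation check of the sample-mean approximation, with an illustrative choice of $X_i \sim \mathrm{Unif}(0,1)$ (so $\mu_X = 1/2$, $\sigma^2_X = 1/12$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 100_000
# Each row is one sample of n i.i.d. Unif(0, 1) draws; take its mean.
xbar = rng.uniform(0, 1, (reps, n)).mean(axis=1)
print(xbar.mean())  # approximately mu_X = 0.5
print(xbar.var())   # approximately sigma_X^2 / n = (1/12)/50 ≈ 0.00167
# Standardized means are approximately N(0, 1): ~95% land within ±1.96.
z = (xbar - 0.5) / np.sqrt((1 / 12) / n)
print((np.abs(z) < 1.96).mean())
```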

Asymptotic Distributions using CLT: We use $\xrightarrow{d}$ to denote "converges in distribution to" as $n \to \infty$. These are the same results as in the previous section, only letting $n \to \infty$ so that the limiting normal distribution has no $n$ terms.

$$\frac{1}{\sigma_X \sqrt{n}} (X_1 + \dots + X_n - n\mu_X) \xrightarrow{d} N(0, 1)$$

$$\frac{\bar{X}_n - \mu_X}{\sigma_X / \sqrt{n}} \xrightarrow{d} N(0, 1)$$