# Section 9

Conditional Expectation, LLN, and CLT (BH Chapter 9)

Conditioning on an Event - We can find the expected value of Y given that an event A has occurred, or given that X = x.

This means finding $E(Y|A)$ or $E(Y|X = x)$. Note that conditioning on an event results in a number. Note also the similarity between ordinary expectation and conditional expectation: the expected value of a fair die roll given that it is prime is $\frac{1}{3}\cdot 2 + \frac{1}{3}\cdot 3 + \frac{1}{3}\cdot 5 = 3\frac{1}{3}$. We are still taking an average, just over the restricted set of outcomes.
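The die example above can be checked directly, since given the event "prime" the outcomes 2, 3, 5 are equally likely:

```python
from fractions import Fraction

# Exact conditional expectation of a fair die roll given it is prime.
# Given the event, each prime outcome {2, 3, 5} has probability 1/3.
primes = [2, 3, 5]
cond_exp = sum(Fraction(1, len(primes)) * x for x in primes)
print(cond_exp)  # 10/3, i.e. 3 1/3
```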

Conditioning on a Random Variable - We can also find the expected value of Y given the random variable X. The resulting expectation, $E(Y|X)$, is not a number but a function of the random variable X (and is therefore itself a random variable). For an easy way to find $E(Y|X)$, find $E(Y|X = x)$ and then plug in X for x. This changes the conditional expectation of Y from a function of a number x to a function of the random variable X.
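A small illustrative example (not from the text): let $Y$ be a fair die roll and $X$ the indicator that $Y$ is even. Then $E(Y|X=1) = 4$ and $E(Y|X=0) = 3$, so plugging in $X$ gives the random variable $E(Y|X) = 3 + X$:

```python
# Hypothetical example: Y is a fair die roll, X = 1 if Y is even, else 0.
# E(Y | X = x) is a number for each x; E(Y | X) = 3 + X is a random variable.
def cond_exp_given(x):
    """E(Y | X = x): average of the die outcomes with the given parity."""
    outcomes = [y for y in range(1, 7) if y % 2 == (0 if x == 1 else 1)]
    return sum(outcomes) / len(outcomes)

print(cond_exp_given(1), cond_exp_given(0))  # 4.0 3.0
```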

# Law of Total Expectation

This is an extension of the Law of Total Probability. For any set of events $B_1, B_2, B_3, ... B_n$ that partition the sample space (simplest case being {$B$, $B^c$}):

$E(X) = \sum_{x} x P(X=x)= \sum_{x} x \sum_{i} P(X =x| B_i) P(B_i) = \sum_i \sum_x xP(X=x|B_i) P(B_i) = \sum_i E(X|B_i)P(B_i)$
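As a quick sanity check of this formula, take a fair die roll $X$ partitioned by parity (an example chosen here for illustration): $E(X|\text{even}) = 4$ with probability $\frac{1}{2}$ and $E(X|\text{odd}) = 3$ with probability $\frac{1}{2}$, recovering $E(X) = 3.5$:

```python
from fractions import Fraction

# Law of total expectation for a fair die, partitioned by parity.
even, odd = [2, 4, 6], [1, 3, 5]
p_even = p_odd = Fraction(1, 2)
e_even = Fraction(sum(even), len(even))  # E(X | even) = 4
e_odd = Fraction(sum(odd), len(odd))     # E(X | odd)  = 3
total = e_even * p_even + e_odd * p_odd
print(total)  # 7/2, matching E(X) computed directly
```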

# Properties of Conditional Expectation

Independence - If X and Y are independent, then $Y | X \sim Y$, as conditioning on $X$ gives us no additional information about $Y$:

$E(Y | X) = E(Y)$

Taking out what’s Known - If we are finding an expectation that involves some function $h(X)$ of $X$, and we are conditioning on $X$, then we can treat $h(X)$ as a constant because $X$ is known: $E(h(X) Y | X) = h(X) E(Y | X)$

Linearity - We have linearity in the first argument:

$E(aY_1 + bY_2 | X) = a E(Y_1 | X) + bE(Y_2 | X)$

Law of Iterated Expectation (Adam’s Law) - For any two random variables X, Y,

$E(E(Y | X)) = E(Y)$

We can prove the law of iterated expectation with the law of total expectation. Let $g(X) = E(Y|X)$; recall that $E(Y|X)$ is a random variable, a function of $X$. Then,

$E(E(Y|X)) = E(g(X)) = \sum_{x} g(x) \cdot P(X = x) = \sum_x E(Y | X=x) \cdot P(X = x) = E(Y)$

where the last equality is the law of total expectation applied to the partition {$X = x$}.

You can also do Adam's Law with extra conditioning: $E(Y |X) = E(E(Y |X, Z)|X)$
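Adam's law can be checked by simulation. Here is a sketch using a hypothetical two-stage experiment (not from the text): roll a fair die to get $X$, then flip $X$ fair coins and let $Y$ count the heads. Since $E(Y|X) = X/2$, Adam's law gives $E(Y) = E(X/2) = 3.5/2 = 1.75$:

```python
import random

random.seed(0)

# Two-stage experiment: X ~ fair die, then Y | X ~ Binomial(X, 1/2).
# Adam's law: E(Y) = E(E(Y | X)) = E(X / 2) = 1.75.
n = 200_000
total_y = 0
for _ in range(n):
    x = random.randint(1, 6)                          # stage 1: die roll
    y = sum(random.random() < 0.5 for _ in range(x))  # stage 2: count heads
    total_y += y
print(total_y / n)  # close to 1.75
```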

Eve’s Law - For any two random variables X, Y:

$Var(Y) = E(Var(Y | X)) + Var(E(Y | X))$

Both $Var{(Y | X)}$ and $E(Y | X)$ are functions of the random variable $X$.
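Eve's law can be verified exactly on a small example (the same hypothetical setup as before: $X$ a fair die roll, $Y | X \sim \text{Bin}(X, \frac{1}{2})$, so $Var(Y|X) = X/4$ and $E(Y|X) = X/2$):

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)
px = Fraction(1, 6)  # P(X = x) for a fair die

# Var(Y) computed directly from the joint distribution of (X, Y).
e_y = sum(px * comb(x, y) * half**x * y
          for x in range(1, 7) for y in range(x + 1))
e_y2 = sum(px * comb(x, y) * half**x * y**2
           for x in range(1, 7) for y in range(x + 1))
var_y = e_y2 - e_y**2

# Eve's law pieces: E(Var(Y|X)) = E(X/4), Var(E(Y|X)) = Var(X/2).
e_condvar = sum(px * Fraction(x, 4) for x in range(1, 7))
var_condexp = (sum(px * Fraction(x, 2) ** 2 for x in range(1, 7))
               - sum(px * Fraction(x, 2) for x in range(1, 7)) ** 2)

print(var_y, e_condvar + var_condexp)  # both equal 77/48
```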

# Law of Large Numbers (LLN)

Let $X_1, X_2, X_3, \dots$ be i.i.d. with mean $\mu = E(X_1)$. We define $\bar{X}_n = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$. The Law of Large Numbers states that as $n \longrightarrow \infty$, $\bar{X}_n \longrightarrow \mu$.
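A minimal simulation sketch of the LLN, using fair die rolls (so $E(X_1) = 3.5$):

```python
import random

random.seed(0)

# The sample mean of i.i.d. fair die rolls should settle near E(X1) = 3.5
# as n grows (it will typically, though not monotonically, get closer).
for n in [10, 1_000, 100_000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```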

# Central Limit Theorem (CLT)

Approximation using the CLT: We use $\dot{\,\sim\,}$ to denote "is approximately distributed as". We can use the central limit theorem when we have a random variable $Y$ that is a sum of $n$ i.i.d. random variables, with $n$ large. Let us say that $E(Y) = \mu_Y$ and $Var(Y) = \sigma^2_Y$. Then,

$Y \dot{\,\sim\,} N(\mu_Y, \sigma^2_Y)$

When we use the central limit theorem to approximate $Y$, we usually have $Y = X_1 + X_2 + \dots + X_n$ or $Y = \bar{X}_n= \frac{1}{n}(X_1 + X_2 + \dots + X_n)$. Specifically, if we say that each of the $X_i$ has mean $\mu_X$ and variance $\sigma^2_X$, then we have the following approximations.

$X_1 + X_2 + \dots + X_n \dot{\,\sim\,} N(n\mu_X, n\sigma^2_X)$
$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n) \dot{\,\sim\,} N(\mu_X, \frac{\sigma^2_X}{n})$
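Both approximations can be checked by simulation. A sketch using $X_i \sim \text{Unif}(0,1)$ (chosen here for illustration), where $\mu_X = \frac{1}{2}$ and $\sigma^2_X = \frac{1}{12}$, so the sum is approximately $N(\frac{n}{2}, \frac{n}{12})$:

```python
import random

random.seed(0)

# Sum of n i.i.d. Uniform(0, 1) draws; CLT suggests roughly N(n/2, n/12).
n, trials = 30, 100_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials
print(mean, var)  # near n/2 = 15 and n/12 = 2.5

# Standardize and compare P(Z <= 1) with the N(0, 1) value, about 0.8413.
sd = (n / 12) ** 0.5
frac = sum(1 for s in sums if (s - n / 2) / sd <= 1) / trials
print(frac)
```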

Asymptotic Distributions using the CLT: We use $\xrightarrow{d}$ to denote "converges in distribution to" as $n \longrightarrow \infty$. These are the same results as in the previous section, only letting $n \longrightarrow \infty$ and not leaving any $n$ terms in the normal distribution:

$\frac{1}{\sigma_X\sqrt{n}} (X_1 + \dots + X_n - n\mu_X) \xrightarrow{d} N(0, 1)$

$\frac{\bar{X}_n - \mu_X}{\sigma_X / \sqrt{n}} \xrightarrow{d} N(0, 1)$