Conditional Expectation, LLN, and CLT (BH Chapter 9)

**Conditioning on an Event** - We can find the expected value of *Y* given that event *A* or *X* = *x* has occurred.

This means finding the values of *E*(*Y*|*A*) and *E*(*Y*|*X* = *x*). Note that conditioning on an event results in a *number*. Note the similarities between finding an ordinary expectation and finding a conditional expectation. For example, the expected value of a fair die roll given that it is prime is $\frac{1}{3}\cdot 2 + \frac{1}{3}\cdot 3 + \frac{1}{3}\cdot 5 = 3\frac{1}{3}$. We are still taking an average, just restricted to the event.
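The die example can be checked exactly: conditioning on the event renormalizes the probabilities of the outcomes inside it. A minimal sketch with exact rational arithmetic:

```python
from fractions import Fraction

# E(Y | A) for a fair die and A = {roll is prime}: conditioning
# renormalizes, so P(X = x | A) = (1/6) / P(A) = 1/3 for x in {2, 3, 5}.
primes = [2, 3, 5]
p_A = Fraction(len(primes), 6)                    # P(A) = 1/2
cond_exp = sum(x * Fraction(1, 6) / p_A for x in primes)
print(cond_exp)  # 10/3, i.e. 3 1/3
```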

**Conditioning on a Random Variable** - We can also find the expected value of *Y* given the random variable *X*. The resulting expectation, *E*(*Y*|*X*) is *not a number but a function of the random variable X*. For an easy way to find *E*(*Y*|*X*), find *E*(*Y*|*X* = *x*) and then plug in *X* for all *x*. This changes the conditional expectation of *Y* from a function of a number *x*, to a function of the random variable *X*.

This gives the *Law of Total Expectation*, which extends the *Law of Total Probability*. For any set of events $B_1, B_2, B_3, \dots, B_n$ that partition the sample space (the simplest case being $\{B, B^c\}$):

$E(X) = \sum_{x} x P(X=x)= \sum_{x} x \sum_{i} P(X =x| B_i) P(B_i) = \sum_i \sum_x xP(X=x|B_i) P(B_i) = \sum_i E(X|B_i)P(B_i)$
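The identity above can be verified on a small example. As a sketch, take a fair die with the (hypothetical) partition into even and odd outcomes; within each block the outcomes remain equally likely after conditioning:

```python
from fractions import Fraction

# Law of total expectation on a fair die, with partition
# B1 = {even outcomes}, B2 = {odd outcomes}:
# E(X) should equal E(X|B1)P(B1) + E(X|B2)P(B2).
evens, odds = [2, 4, 6], [1, 3, 5]

def cond_mean(block):
    # Conditioning on the block keeps its outcomes equally likely.
    return Fraction(sum(block), len(block))

via_partition = cond_mean(evens) * Fraction(1, 2) + cond_mean(odds) * Fraction(1, 2)
direct = Fraction(sum(range(1, 7)), 6)
print(via_partition, direct)  # both 7/2
```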

**Independence** - If *X* and *Y* are independent, then $Y \mid X \sim Y$, since conditioning on $X$ gives us no additional information about $Y$. In particular,

$E(Y | X) = E(Y)$

**Taking out what’s Known** - If we are finding an expectation that involves some function $h(X)$ of $X$ and we are conditioning on $X$, then we can treat $h(X)$ as a constant because $X$ is known: $E(h(X) Y | X) = h(X) E(Y | X)$

**Linearity** - Conditional expectation is linear in its first argument:

$E(aY_1 + bY_2 | X) = a E(Y_1 | X) + bE(Y_2 | X)$

**Law of Iterated Expectation (Adam’s Law)** - For any two random variables *X*, *Y*,

$E(E(Y | X)) = E(Y)$

We can prove Adam's law with the law of total expectation. Let $g(X) = E(Y|X)$; note that $g(X)$ is a random variable, being a function of $X$. Then,

$E(E(Y|X)) = E(g(X)) = \sum_{x} g(x) \cdot P(X = x) = \sum_x E(Y | X=x) \cdot P(X = x) = E(Y)$

You can also do Adam's Law with extra conditioning: $E(Y |X) = E(E(Y |X, Z)|X)$
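Adam's law can be checked numerically. A sketch on a hypothetical two-stage example: let $X$ be a fair die roll and $Y \mid X = x$ be uniform on $\{1, \dots, x\}$, so $E(Y \mid X = x) = \frac{x+1}{2}$:

```python
from fractions import Fraction

# Adam's law check: X is a fair die roll and Y | X = x is uniform on
# {1, ..., x}, so E(Y | X = x) = (x + 1)/2.
# Iterating: E(E(Y|X)) = sum_x E(Y | X = x) P(X = x).
iterated = sum(Fraction(x + 1, 2) * Fraction(1, 6) for x in range(1, 7))
# Directly: E(Y) from the joint PMF P(X = x, Y = y) = (1/6)(1/x).
direct = sum(Fraction(1, 6) * Fraction(1, x) * y
             for x in range(1, 7) for y in range(1, x + 1))
print(iterated, direct)  # both 9/4
```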

**Eve’s Law** - For any two random variables *X*, *Y*:

$Var{(Y)} = E_X( Var{(Y | X)}) + Var_X{(E(Y | X))}$

Both $Var{(Y | X)}$ and $E(Y | X)$ are **functions** of the random variable $X$.
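Eve's law can be verified on the same kind of hypothetical two-stage example: $X$ a fair die roll, $Y \mid X = x$ uniform on $\{1, \dots, x\}$, for which $E(Y \mid X = x) = \frac{x+1}{2}$ and $\mathrm{Var}(Y \mid X = x) = \frac{x^2 - 1}{12}$:

```python
from fractions import Fraction

# Eve's law check: Var(Y) computed from the joint PMF should equal
# E(Var(Y|X)) + Var(E(Y|X)).
xs = range(1, 7)
px = Fraction(1, 6)
cond_mean = {x: Fraction(x + 1, 2) for x in xs}      # E(Y | X = x)
cond_var = {x: Fraction(x * x - 1, 12) for x in xs}  # Var(Y | X = x)

# Var(Y) directly from the joint PMF P(X = x, Y = y) = (1/6)(1/x).
joint = {(x, y): px * Fraction(1, x) for x in xs for y in range(1, x + 1)}
e_y = sum(p * y for (x, y), p in joint.items())
e_y2 = sum(p * y * y for (x, y), p in joint.items())
var_y = e_y2 - e_y ** 2

# The two pieces of Eve's law (E(E(Y|X)) = E(Y) by Adam's law).
e_condvar = sum(px * cond_var[x] for x in xs)
var_condmean = sum(px * cond_mean[x] ** 2 for x in xs) - e_y ** 2
print(var_y, e_condvar + var_condmean)  # both 275/144
```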

Let $X_1, X_2, X_3, \dots$ be i.i.d. We define $\bar{X}_n = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$. The Law of Large Numbers states that as $n \longrightarrow \infty$, $\bar{X}_n \longrightarrow E(X_1)$.
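The LLN is easy to see in simulation. A minimal sketch using fair-die rolls, where $E(X_1) = 3.5$ (the seed and sample sizes are arbitrary choices):

```python
import random

random.seed(0)

# LLN sketch: the sample mean of n i.i.d. fair-die rolls should
# settle near E(X_1) = 3.5 as n grows.
def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
for n, m in means.items():
    print(n, m)
```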

**Approximation using CLT**: We use $\dot{\,\sim\,}$ to denote *is approximately distributed*. We can use the central limit theorem when we have a random variable, *Y* that is a sum of *n* i.i.d. random variables with *n* large. Let us say that $E(Y) = \mu_Y$ and $Var(Y) = \sigma^2_Y$. Then,

$Y \dot{\,\sim\,} N(\mu_Y, \sigma^2_Y)$

When we use the central limit theorem to approximate the distribution of $Y$, we usually have $Y = X_1 + X_2 + \dots + X_n$ or $Y = \bar{X}_n= \frac{1}{n}(X_1 + X_2 + \dots + X_n)$. Specifically, if each of the $X_i$ has mean $\mu_X$ and variance $\sigma^2_X$, then we have the following approximations.

$X_1 + X_2 + \dots + X_n \dot{\,\sim\,} N(n\mu_X, n\sigma^2_X)$

$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n) \dot{\,\sim\,} N(\mu_X, \frac{\sigma^2_X}{n})$
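A simulation sketch of the second approximation, using Uniform(0, 1) draws as an illustrative (not from the text) choice of $X_i$, with $\mu_X = \frac{1}{2}$ and $\sigma^2_X = \frac{1}{12}$; standardized sample means should land below 1 about $\Phi(1) \approx 0.8413$ of the time:

```python
import math
import random

random.seed(1)

# CLT sketch: standardize Xbar_n for n i.i.d. Uniform(0, 1) draws and
# compare the empirical CDF at 1.0 against Phi(1) for a N(0, 1).
n, trials = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)

def standardized_mean(n):
    xbar = sum(random.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

frac = sum(standardized_mean(n) < 1.0 for _ in range(trials)) / trials
phi1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))  # Phi(1) ~ 0.8413
print(round(frac, 3), round(phi1, 3))
```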

**Asymptotic Distributions using CLT**: We use $\xrightarrow{d}$ to denote *converges in distribution to* as $n \longrightarrow \infty$. These are the same results as in the previous section, only letting $n \longrightarrow \infty$ and keeping the limiting normal distribution free of any $n$ terms.

$\frac{1}{\sigma_X\sqrt{n}} (X_1 + \dots + X_n - n\mu_X) \xrightarrow{d} N(0, 1)$

$\frac{\bar{X}_n - \mu_X}{\frac{\sigma_X}{\sqrt{n}}} \xrightarrow{d} N(0, 1)$