Expectations, Variance, and the Fundamental Bridge (BH Chapter 4)

The **Expected Value** (or *expectation, mean*) of a random variable can be thought of as the "weighted average" of the possible outcomes of the random variable. Mathematically, if $x_1, x_2, x_3, \ldots$ are all of the possible values that *X* can take, the expected value of *X* can be calculated as follows:

$E(X) = \sum\limits_{i}x_iP(X=x_i)$
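As a quick sanity check of this definition, here is a minimal sketch (the variable names are illustrative) computing the expected value of a fair six-sided die from its PMF:

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum_i x_i * P(X = x_i)
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)  # 7/2
```

Using `Fraction` keeps the arithmetic exact, so the answer is the familiar $7/2 = 3.5$.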

**Linearity of Expectation**

The most important property of expected value is **Linearity of Expectation**. For **any** two random variables *X* and *Y*, and any constants *a*, *b*, and *c*, the following holds:

$E(aX + bY + c) = aE(X) + bE(Y) + c$

The above is true regardless of whether *X* and *Y* are independent.
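To illustrate that independence is not needed, this sketch (names are illustrative) uses a joint PMF where $Y = X^2$, so *X* and *Y* are strongly dependent, and checks linearity directly:

```python
from fractions import Fraction

# Joint PMF over (x, y) pairs with y = x**2, so X and Y are dependent
joint = {(x, x**2): Fraction(1, 3) for x in (-1, 0, 1)}

a, b, c = 2, 3, 1
# Left side: E(aX + bY + c) computed directly from the joint PMF
lhs = sum((a * x + b * y + c) * p for (x, y), p in joint.items())
# Right side: aE(X) + bE(Y) + c from the marginal expectations
EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
rhs = a * EX + b * EY + c
print(lhs == rhs)  # True
```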

Conditional distributions are still distributions, so applying the definition of expectation to the conditional PMF gives the **conditional expectation**:

$E(X | A) = \sum\limits_{i}x_iP(X=x_i | A)$
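A small sketch of this formula (names are illustrative): condition a fair die on the event $A = \{X \text{ is even}\}$ by renormalizing the PMF over the even faces:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

# A = {X is even}; P(X = x | A) = P(X = x) / P(A) for x in A
PA = sum(p for x, p in pmf.items() if x % 2 == 0)
E_given_A = sum(x * p / PA for x, p in pmf.items() if x % 2 == 0)
print(E_given_A)  # 4
```

The conditional distribution is uniform on $\{2, 4, 6\}$, so $E(X \mid A) = 4$.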

**Variance** tells us how spread out the distribution of a random variable is. It is defined as

$Var(X) = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2$
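Both expressions for variance give the same answer, as this sketch (names are illustrative) checks for a fair die:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

EX = sum(x * p for x, p in pmf.items())
# Definition: E((X - E(X))^2)
var_def = sum((x - EX) ** 2 * p for x, p in pmf.items())
# Shortcut: E(X^2) - (E(X))^2
EX2 = sum(x ** 2 * p for x, p in pmf.items())
var_short = EX2 - EX ** 2
print(var_def, var_short)  # 35/12 35/12
```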

**Properties of Variance**

$Var(cX) = c^2 Var(X)$

$Var(X \pm Y) = Var(X) + Var(Y)$ if $X$ and $Y$ are independent
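Note the plus sign on the right-hand side even for $X - Y$. A sketch verifying both cases for two independent die rolls (the helper `var` is an illustrative name, not a library function):

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die

def var(pmf):
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Independence: the joint PMF is the product of the marginals
sum_pmf, diff_pmf = {}, {}
for (x, px), (y, py) in product(die.items(), die.items()):
    sum_pmf[x + y] = sum_pmf.get(x + y, 0) + px * py
    diff_pmf[x - y] = diff_pmf.get(x - y, 0) + px * py

print(var(sum_pmf) == 2 * var(die))   # True
print(var(diff_pmf) == 2 * var(die))  # True
```

Variances add in both cases because $Var(-Y) = (-1)^2 Var(Y) = Var(Y)$.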

Indicator Random Variables are random variables whose value is 1 when a particular event happens, or 0 when it does not. Let $I_A$ be an indicator random variable for the event $A$. Then, we have:

$I_A = \begin{cases}
1 & \text{if $A$ occurs} \\
0 & \text{if $A$ does not occur}
\end{cases}$

Suppose $P(A) = p$. Then $I_A \sim Bern(p)$, since $I_A$ equals 1 with probability $p$ and 0 with probability $1-p$.

**Properties of Indicators**

$(I_A)^2 = I_A$, and $(I_A)^k = I_A$ for any power $k$.

$I_{A^c} = 1 - I_A$

$I_{A \cap B} = I_A I_B$ is the indicator for the event $A \cap B$; that is, $I_A I_B = 1$ if both $A$ and $B$ occur, and $0$ otherwise.

$I_{A \cup B} = I_A + I_B - I_A I_B$
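Since indicators only take the values 0 and 1, all four identities can be checked exhaustively. A minimal sketch over every case:

```python
from itertools import product

# ia, ib play the roles of I_A and I_B, each ranging over {0, 1}
for ia, ib in product((0, 1), repeat=2):
    assert ia ** 2 == ia                     # (I_A)^2 = I_A
    assert ia ** 5 == ia                     # (I_A)^k = I_A for any power k
    assert 1 - ia == (0 if ia else 1)        # I_{A^c} = 1 - I_A
    assert ia * ib == min(ia, ib)            # intersection: 1 iff both occur
    assert ia + ib - ia * ib == max(ia, ib)  # union: 1 iff at least one occurs
print("all indicator identities hold")
```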

The **fundamental bridge** is the idea that $E(I_A) = P(A)$. When we want to calculate the expected value of a complicated random variable, we can often break it down into a sum of indicator random variables and then apply linearity of expectation. For example, if $X = I_{A_1} + I_{A_2} + \ldots + I_{A_n}$, where $I_{A_j}$ is the indicator of event $A_j$, then:

$E(X) = E(I_{A_1}) + E(I_{A_2}) + \ldots + E(I_{A_n}) = P(A_1) + P(A_2) + \ldots + P(A_n)$
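A classic application of this strategy is counting the fixed points of a random permutation: write the count as a sum of indicators, one per position, so the bridge gives $E(X) = n \cdot \frac{1}{n} = 1$. This sketch (names are illustrative) confirms the answer by exact enumeration for $n = 5$:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# X = number of fixed points of a uniformly random permutation of {0,...,n-1}.
# X = I_1 + ... + I_n, where I_j indicates that position j is fixed.
# Bridge: E(I_j) = P(position j fixed) = 1/n, so E(X) = n * (1/n) = 1.
n = 5
total = Fraction(0)
for perm in permutations(range(n)):
    total += sum(perm[j] == j for j in range(n))  # fixed points of this perm
expected_fixed = total / factorial(n)
print(expected_fixed)  # 1
```

Enumeration quickly becomes infeasible as $n$ grows, while the indicator argument gives the answer for any $n$ in one line.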