MGF and Joint, Conditional, and Marginal Distributions (BH Chapter 6)

**Moment** - Moments describe the shape of a distribution. The first three moments are related to the mean, variance, and skewness of a distribution. The $k$-th moment of a random variable *X* is

$\mu'_k = E(X^k)$

The mean, variance, and other shape quantities (skewness, kurtosis, etc.) can be expressed in terms of the moments of a random variable.

Mean $\mu'_1 = E(X)$

Second moment $\mu'_2 = E(X^2) = Var(X) + (\mu_1')^2$, so $Var(X) = \mu'_2 - (\mu_1')^2$
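As a quick numeric sanity check (the fair die here is my own example, not from the text), the identity $Var(X) = E(X^2) - (E(X))^2$ can be verified directly from a PMF:

```python
# Check Var(X) = E(X^2) - (E X)^2 for a fair six-sided die (example values are ours).
support = [1, 2, 3, 4, 5, 6]
p = 1 / 6  # uniform PMF

mean = sum(x * p for x in support)               # mu'_1 = E(X) = 3.5
second_moment = sum(x**2 * p for x in support)   # mu'_2 = E(X^2) = 91/6
variance = second_moment - mean**2               # Var(X) = 35/12

print(mean, second_moment, variance)
```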

**MGF** For any random variable *X*, the **moment generating function (MGF)** of *X* is the expected value of $e^{tX}$, provided this expectation is finite on some open interval around 0. The MGF is just a function of a dummy variable *t*.

$M_X(t) = E(e^{tX})$

**Why is it called the Moment Generating Function?** Because the $k$-th derivative of the moment generating function evaluated at 0 is the $k$-th moment of *X*!

$\mu_k' = E(X^k) = M_X^{(k)}(0)$

Why does this relationship hold? By differentiation under the integral sign and then plugging in $t=0$:

$M_X^{(k)}(t) = \frac{d^k}{dt^k}E(e^{tX}) = E(\frac{d^k}{dt^k}e^{tX}) = E(X^ke^{tX})$

$M_X^{(k)}(0) = E(X^ke^{0X}) = E(X^k) = \mu_k'$
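This relationship can be checked numerically (a sketch with a fair die as the example distribution, which is my own choice): approximate the derivatives of $M_X(t)$ at 0 by finite differences and compare them to the moments computed directly from the PMF.

```python
import math

# Approximate M_X'(0) and M_X''(0) by central finite differences for a fair die
# and compare to E(X) and E(X^2). The distribution and step size are our choices.
support = [1, 2, 3, 4, 5, 6]
p = 1 / 6

def mgf(t):
    """M_X(t) = E(e^{tX}) for the fair die."""
    return sum(p * math.exp(t * x) for x in support)

h = 1e-4
first = (mgf(h) - mgf(-h)) / (2 * h)                 # approx M'(0) = E(X)
second = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2      # approx M''(0) = E(X^2)

print(first, second)
```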

**MGF of linear combination of X**: If $Y = aX + c$, then $M_Y(t) = E(e^{t(aX + c)}) = e^{ct}E(e^{(at)X}) = e^{ct}M_X(at)$
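The identity $M_Y(t) = e^{ct}M_X(at)$ can be verified numerically; the sketch below uses a fair coin flip and arbitrary constants $a$, $c$ of my own choosing:

```python
import math

# For Y = aX + c with X a fair coin flip (X in {0, 1}, prob 1/2 each),
# check M_Y(t) = e^{ct} * M_X(at). Constants a, c are arbitrary.
a, c = 2.0, 3.0
support = [0, 1]
p = 0.5

def mgf_X(t):
    return sum(p * math.exp(t * x) for x in support)

def mgf_Y(t):
    # Y's support is {a*x + c for x in support}, with the same probabilities.
    return sum(p * math.exp(t * (a * x + c)) for x in support)

for t in [-1.0, 0.0, 0.5, 2.0]:
    lhs = mgf_Y(t)
    rhs = math.exp(c * t) * mgf_X(a * t)
    assert abs(lhs - rhs) < 1e-9 * max(1.0, abs(lhs))
```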

**Uniqueness of the MGF**: *If it exists, the MGF uniquely determines the distribution*. That is, two random variables *X* and *Y* have the same distribution (equal CDFs/PDFs) if and only if their MGFs are equal. Two random variables with the same MGF cannot have different PDFs.

**Summing Independent R.V.s by Multiplying MGFs**:

If *X* and *Y* are independent, then the MGF of the sum of two random variables is the product of the MGFs of those two random variables.

$M_{(X+Y)}(t) = E(e^{t(X + Y)}) = E(e^{tX}e^{tY}) = E(e^{tX})E(e^{tY}) = M_X(t) \cdot M_Y(t)$

$M_{(X+Y)}(t) = M_X(t) \cdot M_Y(t)$
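A numeric check of the product rule (two independent fair dice are my own example): the MGF of the sum, computed from the joint PMF of the 36 equally likely outcomes, matches the product of the individual MGFs.

```python
import math

# For two independent fair dice X and Y, check M_{X+Y}(t) = M_X(t) * M_Y(t).
faces = [1, 2, 3, 4, 5, 6]
p = 1 / 6

def mgf_die(t):
    return sum(p * math.exp(t * x) for x in faces)

def mgf_sum(t):
    # By independence, the joint PMF puts mass p*p on each of the 36 outcomes.
    return sum(p * p * math.exp(t * (x + y)) for x in faces for y in faces)

for t in [-0.5, 0.0, 0.3, 1.0]:
    assert abs(mgf_sum(t) - mgf_die(t) ** 2) < 1e-9 * max(1.0, mgf_sum(t))
```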

Sometimes we have more than one random variable of interest, and we want to study probabilities associated with all of the random variables. Instead of studying the distributions of $X_1, X_2, X_3$ separately, we can study the distribution of the multivariate vector $\textbf{X} = (X_1, X_2, X_3)$. Joint PDFs and CDFs are analogous to multivariate versions of univariate PDFs and CDFs. Usually joint PDFs and PMFs carry more information than the marginal ones do, because they account for the interactions between the various random variables. If, however, the random variables are independent, then the joint PMF/PDF is just the product of the marginals and we get no extra information by studying them jointly rather than marginally.

Both the Joint PMF and Joint PDF must be non-negative and sum/integrate to 1: $\sum_x \sum_y P(X=x, Y=y) = 1$ and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1$. Like in the univariate case, you sum/integrate the PMF/PDF to get the CDF.
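A tiny discrete illustration (the probabilities below are made-up numbers, not from the text): a joint PMF on two binary variables sums to 1, and summing out *Y* recovers the marginal PMF of *X*.

```python
# Hypothetical joint PMF P(X=x, Y=y) on {0,1} x {0,1}; values are invented.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

total = sum(joint.values())  # should be 1 (up to floating-point error)

p_X = {}  # marginal of X: sum the joint PMF over y
for (x, y), pr in joint.items():
    p_X[x] = p_X.get(x, 0.0) + pr

print(total, p_X)
```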

By Bayes' rule, $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$. Analogous relations hold for conditional distributions of random variables.

For discrete random variables: $P(Y=y|X=x) = \frac{P(X=x, Y=y)}{P(X=x)} = \frac{P(X=x|Y=y)P(Y=y)}{P(X=x)}$

For continuous random variables: $f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{f_{X|Y}(x|y)f_Y(y)}{f_X(x)}$
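For the discrete case, both forms can be checked on a small joint PMF (the numbers below are hypothetical, the same invented table style as above): the direct ratio $P(X=x, Y=y)/P(X=x)$ agrees with the Bayes'-rule expression.

```python
# Hypothetical joint PMF; check P(Y=y|X=x) = joint/marginal against Bayes' rule.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def marginal(var_index, value):
    """Marginal PMF of coordinate var_index (0 for X, 1 for Y)."""
    return sum(pr for xy, pr in joint.items() if xy[var_index] == value)

def cond_Y_given_X(y, x):
    return joint[(x, y)] / marginal(0, x)

def cond_X_given_Y(x, y):
    return joint[(x, y)] / marginal(1, y)

# Direct definition vs. the Bayes'-rule form agree on every outcome:
for x in (0, 1):
    for y in (0, 1):
        bayes = cond_X_given_Y(x, y) * marginal(1, y) / marginal(0, x)
        assert abs(cond_Y_given_X(y, x) - bayes) < 1e-12
```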