Transformations, Beta, Gamma, and Order Statistics (BH Chapter 8)

**Why do we need the Jacobian?** We need the Jacobian to rescale our PDF so that it still integrates to 1 after a change of variables.

**One Variable Transformations** Let’s say that we have a random variable *X* with PDF $f_X(x)$, but we are also interested in some function of *X*. We call this function *Y* = *g*(*X*). Note that *Y* is a random variable as well. If *g* is differentiable and one-to-one (every value of *X* gets mapped to a unique value of *Y*), then the following is true:

$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| {\ or\ } f_Y(y) \left|\frac{dy}{dx}\right|= f_X(x)$

To find $f_Y(y)$ as a function of $y$, plug in $x = g^{-1}(y)$.

$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|$

The derivative of the inverse transformation is referred to as the **Jacobian**, denoted as $J$.

$J = \frac{d}{dy}g^{-1}(y)$
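The one-variable formula can be sanity-checked numerically. Below is a minimal sketch; the specific transformation $Y = X^2$ with $X \sim Expo(1)$ is an illustrative choice, not from the text above:

```python
import math

# Sketch: verify the change-of-variables formula for Y = g(X) = X**2
# with X ~ Expo(1), so f_X(x) = exp(-x) on (0, inf).
# The inverse is x = g^{-1}(y) = sqrt(y), with Jacobian J = dx/dy = 1/(2*sqrt(y)).

def f_X(x):
    return math.exp(-x)

def f_Y(y):
    x = math.sqrt(y)              # g^{-1}(y)
    J = 1 / (2 * math.sqrt(y))    # |dx/dy|, the Jacobian
    return f_X(x) * J

# Cross-check: F_Y(y) = P(X <= sqrt(y)) = 1 - exp(-sqrt(y)); its numerical
# derivative should match the f_Y built from the Jacobian formula.
def F_Y(y):
    return 1 - math.exp(-math.sqrt(y))

y, h = 2.0, 1e-6
numeric = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
assert abs(numeric - f_Y(y)) < 1e-6
```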

**Two Variable Transformations** Similarly, let’s say we know the joint distribution of *U* and *V* but are also interested in the random vector (*X*, *Y*) found by (*X*, *Y*) = *g*(*U*, *V*). If *g* is differentiable and one-to-one, then the following is true:

$f_{X,Y}(x, y) = f_{U,V}(u, v)\left|\left|\frac{\partial(u, v)}{\partial(x, y)}\right|\right|$

The outer $||$ signs around our matrix tell us to take the absolute value. The inner $||$ signs tell us to take the matrix's determinant. Thus the two pairs of $||$ signs together tell us to take the absolute value of the determinant of the matrix of partial derivatives.

The determinant of the matrix of partial derivatives is referred to as the **Jacobian**, denoted as $J$.

**The Letter Gamma** - $\Gamma$ is the capital Greek letter Gamma. It is used in statistics for both the Gamma function and the Gamma Distribution.

**Recursive Definition** - The Gamma function is an extension of the factorial function to all real (and complex) numbers, with the argument shifted down by 1. When *n* is a positive integer,

$\Gamma(n) = (n-1)!$

For all values of $n$ except the non-positive integers $(0, -1, -2, \dots)$,

$\Gamma(n + 1) = n\Gamma(n)$

**Closed-form Definition** - The Gamma function is defined as:

$\Gamma(n) = \int_0^\infty t^{n-1}e^{-t}dt$
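Both properties are easy to check numerically, since Python's `math.gamma` implements the closed-form integral above:

```python
import math

# Gamma(n) = (n-1)! when n is a positive integer.
assert math.gamma(5) == math.factorial(4)   # Gamma(5) = 4! = 24

# Recursion Gamma(n+1) = n * Gamma(n) at a non-integer point.
n = 3.7
assert abs(math.gamma(n + 1) - n * math.gamma(n)) < 1e-9
```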

Let us say that *X* is distributed Gamma(*a*, *λ*). We know the following:

**Story** You sit waiting for shooting stars, and you know that the waiting time for a star is distributed Expo(*λ*). You want to see “*a*” shooting stars before you go home. *X* is the total waiting time for the *a*th shooting star.

**Example** You are at a bank, and there are 3 people ahead of you. The serving time for each person is distributed Exponentially with a mean of 2 time units (so the rate is $\lambda = \frac{1}{2}$). The distribution of your waiting time until you begin service is $Gamma(3,\frac{1}{2})$

**PDF** The PDF of a Gamma is:

$f(x) = \frac{1}{\Gamma(a)}(\lambda x)^ae^{-\lambda x}\frac{1}{x},
\hspace{.1 in}
x \in [0, \infty)$
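The story can be checked by simulation: summing $a$ i.i.d. Expo($\lambda$) waiting times should give a Gamma($a$, $\lambda$) with mean $a/\lambda$. A minimal sketch of the bank example above, using only the standard library:

```python
import random

# Bank example: a = 3 people ahead, each Expo(1/2) (mean 2 time units).
# The total wait should be Gamma(3, 1/2) with mean a / lambda = 6.
random.seed(0)
a, lam = 3, 0.5
waits = [sum(random.expovariate(lam) for _ in range(a)) for _ in range(100_000)]
mean = sum(waits) / len(waits)
assert abs(mean - a / lam) < 0.1   # E[Gamma(a, lambda)] = a / lambda = 6
```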

Let us say that *X* is distributed Beta(*a*, *b*). We know the following:

**Story** Let’s say that your waiting time at the bank is distributed *X* ∼ Gamma(*a*, *λ*) and that your total waiting time at the post office is distributed *Y* ∼ Gamma(*b*, *λ*). You visit both of them while doing errands. Your total waiting time at both is *X* + *Y* ∼ Gamma(*a* + *b*, *λ*) and the fraction of your time that you spend waiting at the bank is $\frac{X}{X+Y} \sim$ Beta(*a*, *b*). The fraction does not depend on *λ*, and $\frac{X}{X+Y}$ is independent of $X + Y$.

**Example** You are tasked with finding Jules, Vernes, and Nemo in a friendly game of hide and seek. You look for them in order, and the time it takes to find one of them is distributed Exponentially with a mean of $\frac{1}{3}$ of a time unit. The time it takes to find both Jules and Vernes is Expo(3) + Expo(3) ∼ Gamma(2, 3). The time it takes to find Nemo is Expo(3) ∼ Gamma(1, 3). Thus the proportion of the total hide-and-seek time that you spend finding Nemo is distributed Beta(1, 2) and is independent of the total time that you’ve spent playing the game.

**PDF** The PDF of a Beta is:

$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1},
\hspace{.1 in}
x \in (0, 1)$

**. . . as the Order Statistics of the Uniform** - The smallest of three Uniforms is distributed $U_{(1)} \sim Beta(1, 3)$. The middle of three Uniforms is distributed $U_{(2)} \sim Beta(2, 2)$, and the largest $U_{(3)} \sim Beta(3, 1)$. The distribution of the *j*th order statistic of *n* i.i.d. Uniforms is:

$U_{(j)} \sim Beta(j, n - j + 1)$

$f_{U_{(j)}}(u) = \frac{n!}{(j-1)!(n-j)!}u^{j-1}(1-u)^{n-j}$
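This can be checked by simulation: the $j$th order statistic of $n$ Uniforms should have the Beta($j$, $n-j+1$) mean $\frac{j}{n+1}$. A minimal sketch (the choice $n = 5$, $j = 2$ is illustrative):

```python
import random

# j-th smallest of n i.i.d. Uniform(0,1) draws ~ Beta(j, n - j + 1),
# whose mean is j / (n + 1).
random.seed(1)
n, j = 5, 2
samples = [sorted(random.random() for _ in range(n))[j - 1]
           for _ in range(100_000)]
mean = sum(samples) / len(samples)
assert abs(mean - j / (n + 1)) < 0.01   # Beta(2, 4) mean = 2/6
```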

**. . . as the Conjugate Prior of the Binomial** - A prior is the distribution of a parameter before you observe any data ($f(x)$). A posterior is the distribution of a parameter after you observe data $y$ ($f(x|y)$). Beta is the *conjugate* prior of the Binomial because if you have a Beta-distributed prior on *p* (the parameter of the Binomial), then the posterior distribution on *p* given observed data is also Beta-distributed. This means that in a two-level model:

$X|p \sim Bin(n, p)$

$p \sim Beta(a, b)$

Then after observing the value $X = x$, we get a posterior distribution $p|(X=x) \sim Beta(a + x, b + n - x)$

We can derive the following relationships between the distributions:

$Gamma(1, \lambda) \sim Expo(\lambda)$

$Beta(1, 1) \sim Unif(0, 1)$

Let us say that we have $X \sim Gamma(a, \lambda)$ and $Y \sim Gamma(b, \lambda)$, and that $X$ is independent of $Y$. By the Bank-Post Office result, we have that:

$X + Y \sim Gamma(a + b, \lambda)$

$\frac{X}{X + Y} \sim Beta(a, b)$

$X + Y$ is independent from $\frac{X}{X+Y}$.
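A simulation sketch of the Bank-Post Office result, using only the standard library (the parameter values are illustrative; note that `random.gammavariate` takes a *scale* parameter, so we pass $1/\lambda$):

```python
import random

# X ~ Gamma(a, lam), Y ~ Gamma(b, lam) independent.
random.seed(2)
a, b, lam = 2.0, 3.0, 1.5
pairs = []
for _ in range(100_000):
    x = random.gammavariate(a, 1 / lam)   # shape a, scale 1/lam
    y = random.gammavariate(b, 1 / lam)
    pairs.append((x + y, x / (x + y)))

# The fraction X/(X+Y) ~ Beta(a, b), with mean a/(a+b).
frac_mean = sum(f for _, f in pairs) / len(pairs)
assert abs(frac_mean - a / (a + b)) < 0.01

# Independence check: sample correlation of X+Y and X/(X+Y) should be ~0.
m = len(pairs)
ms = sum(s for s, _ in pairs) / m
cov = sum((s - ms) * (f - frac_mean) for s, f in pairs) / m
vs = sum((s - ms) ** 2 for s, _ in pairs) / m
vf = sum((f - frac_mean) ** 2 for _, f in pairs) / m
corr = cov / (vs * vf) ** 0.5
assert abs(corr) < 0.02
```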

**Definition** - Let’s say you have *n* i.i.d. random variables $X_1, X_2, X_3, \dots X_n$. If you arrange them from smallest to largest, the *i*th element in that list is the *i*th order statistic, denoted $X_{(i)}$. $X_{(1)}$ is the smallest out of the set of random variables, and $X_{(n)}$ is the largest.

**Properties** - The order statistics are dependent random variables. The smallest value in a set of random variables will always vary and itself has a distribution. For any value of $i$, $X_{(i+1)} \geq X_{(i)}$.

**Distribution** - Taking *n* i.i.d. random variables $X_1, X_2, X_3, \dots X_n$ with CDF $F(x)$ and PDF $f(x)$, the CDF and PDF of $X_{(i)}$ are as follows:

$F_{X_{(i)}}(x) = P (X_{(i)} \leq x) = \sum_{k=i}^n {n \choose k} F(x)^k(1 - F(x))^{n - k}$

$f_{X_{(i)}}(x) = n{n - 1 \choose i - 1}F(x)^{i-1}(1 - F(x))^{n-i}f(x)$
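The CDF sum and the PDF above can be checked against each other numerically: differentiating the CDF should recover the PDF. A sketch for the Uniform(0, 1) case, where $F(x) = x$ and $f(x) = 1$ (the values $i = 3$, $n = 7$ are illustrative):

```python
import math

# Order-statistic CDF and PDF for n i.i.d. Uniform(0,1), where F(x) = x.
def cdf_order(i, n, x):
    return sum(math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(i, n + 1))

def pdf_order(i, n, x):
    return n * math.comb(n - 1, i - 1) * x**(i - 1) * (1 - x)**(n - i)

# The numerical derivative of the CDF should match the PDF.
i, n, x, h = 3, 7, 0.4, 1e-6
numeric = (cdf_order(i, n, x + h) - cdf_order(i, n, x - h)) / (2 * h)
assert abs(numeric - pdf_order(i, n, x)) < 1e-6
```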

**Universality of the Uniform** - We can also express the distribution of the order statistics of *n* i.i.d. random variables $X_1, X_2, X_3, \dots X_n$ in terms of the order statistics of *n* uniforms. We have that

$F(X_{(j)}) \sim U_{(j)}$