Section 8

Transformations, Beta, Gamma, and Order Statistics (BH Chapter 8)

Continuous Transformations

Why do we need the Jacobian? We need the Jacobian to rescale our PDF so that it integrates to 1.

One Variable Transformations Let’s say that we have a random variable X with PDF f_X(x), but we are also interested in some function of X. We call this function Y = g(X). Note that Y is a random variable as well. If g is differentiable and one-to-one (every value of X gets mapped to a unique value of Y), then the following is true:

f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| \quad \text{or} \quad f_Y(y)\left|\frac{dy}{dx}\right| = f_X(x)

To find f_Y(y) as a function of y, plug in x = g^{-1}(y):

f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|

The derivative of the inverse transformation is referred to as the Jacobian, denoted J.

J = \frac{d}{dy}g^{-1}(y)
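As a numerical sanity check, the one-variable formula can be verified against a known closed form. The example below is an illustrative sketch (not from the text), taking X ∼ Expo(1) and Y = g(X) = e^X, so that the formula should reproduce the known density 1/y² on [1, ∞):

```python
import math

# Sketch of the one-variable change of variables, with X ~ Expo(1)
# (f_X(x) = e^{-x} for x >= 0) and Y = g(X) = e^X.
# Then g^{-1}(y) = log(y) and d/dy g^{-1}(y) = 1/y, so the formula gives
# f_Y(y) = f_X(log y) * |1/y| = e^{-log y} / y = 1 / y**2 for y >= 1.

def f_X(x):
    return math.exp(-x) if x >= 0 else 0.0

def f_Y(y):
    """Density of Y = e^X via f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|."""
    return f_X(math.log(y)) * abs(1.0 / y)

# The formula should match the known closed form 1/y^2 on [1, inf).
for y in (1.0, 2.0, 5.0, 10.0):
    assert abs(f_Y(y) - 1.0 / y**2) < 1e-12
print("change-of-variables check passed")
```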

Two Variable Transformations Similarly, let’s say we know the joint distribution of U and V but are also interested in the random vector (X, Y) found by (X, Y) = g(U, V). If g is differentiable and one-to-one, then the following is true:

f_{X,Y}(x, y) = f_{U,V}(u, v)\left|\left|\begin{matrix}\frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y}\end{matrix}\right|\right|

The outer || signs around our matrix tell us to take the absolute value. The inner || signs tell us to take the matrix's determinant. Thus the two pairs of || signs together tell us to take the absolute value of the determinant of the matrix of partial derivatives.

The determinant of the matrix of partial derivatives is referred to as the Jacobian, denoted J.
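The two-variable formula can be checked numerically as well. The sketch below (an illustrative example, not from the text) takes U, V i.i.d. Unif(0, 1) and the one-to-one map (X, Y) = (U + V, U − V); the inverse is u = (x + y)/2, v = (x − y)/2 with |det J| = 1/2, and integrating f_{X,Y} over y should recover the triangular density f_X(x) = x on (0, 1):

```python
# Sketch: verify the 2D change of variables for (X, Y) = (U + V, U - V)
# with U, V i.i.d. Unif(0, 1). Inverse: u = (x + y)/2, v = (x - y)/2,
# and the absolute value of the determinant of partials is 1/2.

def f_UV(u, v):
    # joint density of two independent Unif(0, 1) random variables
    return 1.0 if (0 < u < 1 and 0 < v < 1) else 0.0

def f_XY(x, y):
    # f_{X,Y}(x, y) = f_{U,V}(u, v) * |det J|
    return f_UV((x + y) / 2, (x - y) / 2) * 0.5

def f_X(x, N=40_000):
    # marginal of X = U + V by midpoint integration over y
    lo, hi = -2.0, 2.0
    h = (hi - lo) / N
    return sum(f_XY(x, lo + (k + 0.5) * h) for k in range(N)) * h

# On (0, 1) the sum of two uniforms has the triangular density f_X(x) = x.
for x in (0.25, 0.5, 0.75):
    assert abs(f_X(x) - x) < 1e-3
print("2D change-of-variables check passed")
```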

Gamma Function

The Letter Gamma - Γ is the (capital) Greek letter Gamma. It is used in statistics for both the Gamma function and the Gamma Distribution.

Recursive Definition - The Gamma function is an extension of the factorial function to all real (and complex) numbers, with the argument shifted down by 1. When n is a positive integer,

\Gamma(n) = (n-1)!

For all values of n at which both sides are defined (that is, n not equal to 0 or a negative integer),

\Gamma(n + 1) = n\Gamma(n)

Closed-form Definition - The Gamma function is defined as:

\Gamma(n) = \int_0^\infty t^{n-1}e^{-t}\,dt
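Both identities are easy to confirm numerically; Python's `math.gamma` implements the integral definition above, so a quick sketch suffices:

```python
import math

# Sketch: math.gamma implements the integral definition of the Gamma
# function, so we can check both identities from the text numerically.

# Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(n + 1) = n * Gamma(n), including non-integer arguments
for n in (0.5, 1.7, 3.2, 6.0):
    assert math.isclose(math.gamma(n + 1), n * math.gamma(n))

print("Gamma identities verified")
```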

Gamma Distribution

Let us say that X is distributed Gamma(a, λ). We know the following:

Story You sit waiting for shooting stars, and you know that the waiting time for each star is distributed Expo(λ). You want to see a shooting stars before you go home. X is the total waiting time for the a-th shooting star.

Example You are at a bank, and there are 3 people ahead of you. The serving time for each person is distributed Exponentially with a mean of 2 time units. The distribution of your waiting time until you begin service is Gamma(3, \frac{1}{2}).

PDF The PDF of a Gamma is:

f(x) = \frac{1}{\Gamma(a)}(\lambda x)^a e^{-\lambda x}\frac{1}{x}, \hspace{.1 in} x \in [0, \infty)
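The story above also gives a way to simulate a Gamma: sum a i.i.d. Expo(λ) waiting times. A minimal sketch, using the bank example's parameters (a = 3, λ = 1/2, so E[X] = a/λ = 6):

```python
import random

# Sketch of the Gamma story: X = total waiting time for the a-th arrival,
# where each inter-arrival time is Expo(lam). Then X ~ Gamma(a, lam),
# with E[X] = a / lam.
random.seed(0)
a, lam = 3, 0.5            # the bank example: 3 people, mean service time 2
n = 200_000
samples = [sum(random.expovariate(lam) for _ in range(a)) for _ in range(n)]
mean = sum(samples) / n
print(round(mean, 2))       # should be close to a / lam = 6
```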

Beta Distribution

Let us say that X is distributed Beta(a, b). We know the following:

Story Let’s say that your waiting time at the bank is distributed X ∼ Gamma(a, λ) and that your waiting time at the post office is distributed Y ∼ Gamma(b, λ). You visit both of them while doing errands. Your total waiting time at both is X + Y ∼ Gamma(a + b, λ), and the fraction of your time that you spend waiting at the bank is \frac{X}{X+Y} \sim Beta(a, b). The fraction does not depend on λ, and \frac{X}{X+Y} is independent of X + Y.

Example You are tasked with finding Jules, Vernes, and Nemo in a friendly game of hide and seek. You look for them in order, and the time it takes to find each of them is distributed Exponentially with a mean of \frac{1}{3} of a time unit. The time it takes to find both Jules and Vernes is Expo(3) + Expo(3) ∼ Gamma(2, 3). The time it takes to find Nemo is Expo(3) ∼ Gamma(1, 3). Thus the proportion of the total hide-and-seek time that you spend finding Nemo is distributed Beta(1, 2) and is independent of the total time that you’ve spent playing the game.

PDF The PDF of a Beta is:

f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}, \hspace{.1 in} x \in (0, 1)
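A quick sketch can confirm that the Gamma-function prefactor is exactly the normalizing constant, by integrating the PDF over (0, 1) with a midpoint rule (Beta(2, 3) is used here only as an example):

```python
import math

# Sketch: evaluate the Beta(a, b) PDF using math.gamma and check that it
# integrates to 1 over (0, 1) via a simple midpoint rule.
def beta_pdf(x, a, b):
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * x**(a - 1) * (1 - x)**(b - 1)

N = 100_000
total = sum(beta_pdf((k + 0.5) / N, 2, 3) for k in range(N)) / N
assert abs(total - 1.0) < 1e-6
print("Beta(2, 3) PDF integrates to 1")
```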

Notable Uses of the Beta Distribution

. . . as the Order Statistics of the Uniform - The smallest of three Uniforms is distributed U_{(1)} ∼ Beta(1, 3). The middle of three Uniforms is distributed U_{(2)} ∼ Beta(2, 2), and the largest U_{(3)} ∼ Beta(3, 1). The distribution of the j-th order statistic of n i.i.d. Uniforms is:

U_{(j)} \sim Beta(j, n - j + 1)
f_{U_{(j)}}(u) = \frac{n!}{(j-1)!(n-j)!}u^{j-1}(1-u)^{n-j}
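This is straightforward to check by simulation, since Beta(j, n − j + 1) has mean j/(n + 1). A sketch for n = 3 Uniforms:

```python
import random

# Sketch: the j-th order statistic of n i.i.d. Unif(0, 1) draws is
# Beta(j, n - j + 1), whose mean is j / (n + 1). Check by simulation.
random.seed(0)
n, trials = 3, 100_000
sums = [0.0] * n
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    for j in range(n):
        sums[j] += u[j]
means = [s / trials for s in sums]

# E[U_(j)] = j / (n + 1): here 1/4, 2/4, 3/4 for the three order statistics
for j in range(n):
    assert abs(means[j] - (j + 1) / (n + 1)) < 0.01
print("order-statistic means match Beta(j, n - j + 1)")
```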

. . . as the Conjugate Prior of the Binomial - A prior is the distribution of a parameter before you observe any data (f(x)). A posterior is the distribution of a parameter after you observe data y (f(x|y)). Beta is the conjugate prior of the Binomial because if you have a Beta-distributed prior on p (the parameter of the Binomial), then the posterior distribution on p given observed data is also Beta-distributed. This means that, in a two-level model:

X|p \sim Bin(n, p)
p \sim Beta(a, b)

Then after observing the value X = x, we get a posterior distribution p|(X=x) \sim Beta(a + x, b + n - x).
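The conjugacy claim can be checked by brute force: compute the posterior on a fine grid as prior × likelihood and compare its mean with the mean of Beta(a + x, b + n − x), which is (a + x)/(a + b + n). The parameter values below are arbitrary examples:

```python
# Sketch of Beta-Binomial conjugacy: with p ~ Beta(a, b) and
# X | p ~ Bin(n, p), after observing X = x the posterior is
# Beta(a + x, b + n - x). Compare a brute-force grid posterior mean
# with the conjugate answer (a + x) / (a + b + n).
a, b, n, x = 2.0, 3.0, 10, 7

def prior(p):            # Beta(a, b) density, up to its normalizing constant
    return p**(a - 1) * (1 - p)**(b - 1)

def likelihood(p):       # Bin(n, p) PMF at x; the binomial coefficient cancels
    return p**x * (1 - p)**(n - x)

N = 100_000
grid = [(k + 0.5) / N for k in range(N)]
w = [prior(p) * likelihood(p) for p in grid]
grid_mean = sum(p * wi for p, wi in zip(grid, w)) / sum(w)

conj_mean = (a + x) / (a + b + n)   # mean of Beta(a + x, b + n - x)
assert abs(grid_mean - conj_mean) < 1e-4
print("conjugate posterior matches grid posterior")
```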

Convolutions and Limits

We can derive the following relationships between the distributions:

Gamma(1, \lambda) \sim Expo(\lambda)
Beta(1, 1) \sim Unif(0, 1)

Let us say that we have X \sim Gamma(a, \lambda) and Y \sim Gamma(b, \lambda), and that X is independent of Y. By the Bank-Post Office result, we have that:

X + Y \sim Gamma(a + b, \lambda)
\frac{X}{X + Y} \sim Beta(a, b)

X + Y is independent of \frac{X}{X+Y}.
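Both claims can be checked by simulation: the fraction should have the Beta(a, b) mean a/(a + b), and its correlation with the total should be near zero. A sketch with arbitrary example parameters a = 2, b = 3, λ = 1:

```python
import random

# Sketch of the Bank-Post Office result: with X ~ Gamma(a, lam) and
# Y ~ Gamma(b, lam) independent, X/(X+Y) ~ Beta(a, b) (mean a/(a+b)),
# and X/(X+Y) is independent of X + Y (so their correlation is ~0).
random.seed(0)
a, b, lam, n = 2, 3, 1.0, 100_000

def gamma_rv(shape):
    # Gamma(shape, lam) as a sum of shape i.i.d. Expo(lam) draws
    return sum(random.expovariate(lam) for _ in range(shape))

frac, tot = [], []
for _ in range(n):
    x, y = gamma_rv(a), gamma_rv(b)
    frac.append(x / (x + y))
    tot.append(x + y)

mean_frac = sum(frac) / n
assert abs(mean_frac - a / (a + b)) < 0.01   # Beta(2, 3) has mean 2/5

mf, mt = mean_frac, sum(tot) / n
cov = sum((f - mf) * (t - mt) for f, t in zip(frac, tot)) / n
sf = (sum((f - mf) ** 2 for f in frac) / n) ** 0.5
st = (sum((t - mt) ** 2 for t in tot) / n) ** 0.5
assert abs(cov / (sf * st)) < 0.02           # correlation near 0
print("Bank-Post Office check passed")
```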

Order Statistics

Definition - Let’s say you have n i.i.d. random variables X_1, X_2, X_3, \dots, X_n. If you arrange them from smallest to largest, the i-th element in that list is the i-th order statistic, denoted X_{(i)}. X_{(1)} is the smallest out of the set of random variables, and X_{(n)} is the largest.

Properties - The order statistics are dependent random variables. The smallest value in a set of random variables varies from sample to sample and so has a distribution of its own. By construction, X_{(i)} \leq X_{(i+1)} for every i.

Distribution - Taking n i.i.d. random variables X_1, X_2, X_3, \dots, X_n with CDF F(x) and PDF f(x), the CDF and PDF of X_{(i)} are as follows:

F_{X_{(i)}}(x) = P(X_{(i)} \leq x) = \sum_{k=i}^n {n \choose k} F(x)^k(1 - F(x))^{n - k}
f_{X_{(i)}}(x) = n{n - 1 \choose i - 1}F(x)^{i-1}(1 - F(x))^{n-i}f(x)
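The CDF formula is easy to verify by simulation. The sketch below (an illustrative example with n = 5 Uniforms, so F(x) = x on (0, 1)) compares the formula with the empirical frequency of the event X_{(i)} ≤ x:

```python
import random, math

# Sketch: check the order-statistic CDF formula against simulation for
# n = 5 i.i.d. Unif(0, 1) draws, where F(x) = x on (0, 1).
def order_stat_cdf(x, n, i):
    F = x  # CDF of Unif(0, 1)
    return sum(math.comb(n, k) * F**k * (1 - F)**(n - k)
               for k in range(i, n + 1))

random.seed(0)
n, i, x0, trials = 5, 2, 0.3, 100_000
hits = 0
for _ in range(trials):
    draws = sorted(random.random() for _ in range(n))
    if draws[i - 1] <= x0:   # i-th smallest (1-indexed)
        hits += 1
empirical = hits / trials

assert abs(empirical - order_stat_cdf(x0, n, i)) < 0.01
print("order-statistic CDF formula verified")
```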

Universality of the Uniform - We can also express the distribution of the order statistics of n i.i.d. random variables X_1, X_2, X_3, \dots, X_n in terms of the order statistics of n Uniforms. We have that

F(X_{(j)}) \sim U_{(j)}