Why do we need the Jacobian? We need the Jacobian to rescale our PDF so that it integrates to 1.
One Variable Transformations Let’s say that we have a random variable X with PDF f_X(x), but we are also interested in some function of X. We call this function Y = g(X). Note that Y is a random variable as well. If g is differentiable and one-to-one (every value of X gets mapped to a unique value of Y), then the following is true:

$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right|$$
To find f_Y(y) as a function of y, plug in x = g^{-1}(y):

$$f_Y(y) = f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}\,g^{-1}(y)\right|$$
The derivative of the inverse transformation is referred to as the Jacobian, denoted $\frac{dx}{dy}$.
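As a quick numeric sketch of the one-variable formula (the choice of X ~ Expo(1) and Y = X² is my own assumed illustration, not from the text), the Jacobian-rescaled PDF of Y should still integrate to about 1:

```python
import math

# Assumed illustration: X ~ Expo(1) with f_X(x) = e^{-x} for x > 0, and
# Y = g(X) = X^2, which is one-to-one on x > 0 with inverse x = sqrt(y).
# The Jacobian is dx/dy = 1 / (2 sqrt(y)).

def f_Y(y):
    x = math.sqrt(y)                      # x = g^{-1}(y)
    jacobian = 1 / (2 * math.sqrt(y))     # dx/dy
    return math.exp(-x) * abs(jacobian)   # f_Y(y) = f_X(g^{-1}(y)) |dx/dy|

# Riemann-sum check that the rescaled PDF still integrates to (roughly) 1.
dy = 1e-4
total = sum(f_Y(k * dy) * dy for k in range(1, 400_000))
print(total)
```

Dropping the Jacobian factor would make the integral come out far from 1, which is exactly why the rescaling is needed.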
Two Variable Transformations Similarly, let’s say we know the joint PDF of U and V, but we are also interested in the random vector (X, Y) defined by (X, Y) = g(U, V). If g is differentiable and one-to-one, then the following is true:

$$f_{X,Y}(x, y) = f_{U,V}(u, v)\left|\begin{vmatrix}\dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y}\\[4pt] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y}\end{vmatrix}\right|$$
The outer bars around our matrix tell us to take the absolute value. The inner bars tell us to take the matrix's determinant. Thus the two pairs of bars together tell us to take the absolute value of the determinant of the matrix of partial derivatives.
The determinant of the matrix of partial derivatives is referred to as the Jacobian, denoted $\left|\frac{\partial(u,v)}{\partial(x,y)}\right|$.
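A tiny worked instance of the two-variable case (the map (X, Y) = (U + V, U − V) for Uniform U, V is my own assumed example):

```python
# Assumed illustration: U, V i.i.d. Unif(0, 1) with joint PDF f_{U,V} = 1,
# and (X, Y) = g(U, V) = (U + V, U - V).  The inverse map is
# u = (x + y) / 2, v = (x - y) / 2, so the matrix of partial derivatives is
# [[du/dx, du/dy], [dv/dx, dv/dy]] = [[1/2, 1/2], [1/2, -1/2]].
a, b = 0.5, 0.5
c, d = 0.5, -0.5

det = a * d - b * c        # inner bars: the determinant (here -1/2)
f_XY = 1.0 * abs(det)      # outer bars: absolute value, giving f_{X,Y} = 1/2

# The image of the unit square is a parallelogram of area 2, so the
# density 1/2 again integrates to 1 over its support.
print(det, f_XY, f_XY * 2.0)
```

The support doubles in area while the density halves, so the total probability stays 1.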
The Letter Gamma - $\Gamma$ is the capital Greek letter Gamma. It is used in statistics for both the Gamma function and the Gamma distribution.
Recursive Definition - The Gamma function is an extension of the factorial function to all real (and complex) numbers, with its argument shifted down by 1. When n is a positive integer,

$$\Gamma(n) = (n-1)!$$
For all values of n other than the non-positive integers (where $\Gamma$ is undefined),

$$\Gamma(n+1) = n\,\Gamma(n)$$
Closed-form Definition - The Gamma function is defined as:

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx$$
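Both definitions can be sanity-checked numerically with the standard library's `math.gamma`:

```python
import math

# Check the recursive definition: Γ(n) = (n - 1)! for positive integers,
# and Γ(n + 1) = n Γ(n) for non-integer arguments too.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))
assert math.isclose(math.gamma(5.5), 4.5 * math.gamma(4.5))

# Riemann-sum check of the closed form at t = 5: ∫ x^4 e^{-x} dx over (0, 50).
dx = 1e-3
integral = sum((k * dx) ** 4 * math.exp(-k * dx) * dx for k in range(1, 50_000))
print(integral)  # should be close to Γ(5) = 4! = 24
```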
Let us say that X is distributed Gamma(a, λ). We know the following:
Story You sit waiting for shooting stars, and you know that the waiting time for a star is distributed Expo(λ). You want to see a shooting stars before you go home. X is the total waiting time for the ath shooting star.
Example You are at a bank, and there are 3 people ahead of you. The serving time for each person is distributed Exponentially with a mean of 2 time units, i.e. Expo(1/2). The distribution of your waiting time until you begin service is Gamma(3, 1/2).
PDF The PDF of a Gamma(a, λ) is:

$$f(x) = \frac{\lambda^a x^{a-1} e^{-\lambda x}}{\Gamma(a)}, \qquad x > 0$$
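The bank example above can be simulated by summing Exponentials, matching the Gamma story (the simulation details are my own sketch):

```python
import math
import random

random.seed(0)

# Bank example as a simulation: a = 3 services, each Expo(λ = 1/2),
# so the total wait is Gamma(3, 1/2) with mean a/λ = 6.
a, lam, N = 3, 0.5, 100_000
totals = [sum(random.expovariate(lam) for _ in range(a)) for _ in range(N)]
mean = sum(totals) / N

# The Gamma PDF above, for comparison against a histogram if desired.
def gamma_pdf(x, a, lam):
    return lam ** a * x ** (a - 1) * math.exp(-lam * x) / math.gamma(a)

print(mean, gamma_pdf(6.0, a, lam))
```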
Let us say that X is distributed Beta(a, b). We know the following:
Story Let’s say that your waiting time at the bank is distributed X ∼ Gamma(a, λ) and that your waiting time at the post office is distributed Y ∼ Gamma(b, λ). You visit both of them while doing errands. Your total waiting time at both is X + Y ∼ Gamma(a + b, λ), and the fraction of your time that you spend waiting at the bank is X/(X + Y) ∼ Beta(a, b). The fraction does not depend on λ, and it is independent of the total X + Y.
Example You are tasked with finding Jules, Vernes, and Nemo in a friendly game of hide and seek. You look for them in order, and the time it takes to find each of them is distributed Exponentially with a mean of 1/3 of a time unit, i.e. Expo(3). The time it takes to find both Jules and Vernes is Expo(3) + Expo(3) ∼ Gamma(2, 3). The time it takes to find Nemo is Expo(3) ∼ Gamma(1, 3). Thus the proportion of the total hide-and-seek time that you spend finding Nemo is distributed Beta(1, 2) and is independent of the total time that you’ve spent playing the game.
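The hide-and-seek example can be checked by simulation (the simulation itself is my own sketch of the numbers in the example):

```python
import random

random.seed(0)

# Simulation of the hide-and-seek example: Nemo takes Expo(3) ~ Gamma(1, 3);
# Jules and Vernes together take Gamma(2, 3).
N = 100_000
fracs = []
for _ in range(N):
    nemo = random.expovariate(3)
    jules_vernes = random.expovariate(3) + random.expovariate(3)
    fracs.append(nemo / (nemo + jules_vernes))

# Beta(1, 2) has mean a / (a + b) = 1/3, so the sample mean should be near 1/3.
mean_frac = sum(fracs) / N
print(mean_frac)
```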
PDF The PDF of a Beta(a, b) is:

$$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,x^{a-1}(1-x)^{b-1}, \qquad 0 < x < 1$$
. . . as the Order Statistics of the Uniform - The smallest of three Uniforms is distributed U(1) ∼ Beta(1, 3). The middle of three Uniforms is distributed U(2) ∼ Beta(2, 2), and the largest U(3) ∼ Beta(3, 1). The distribution of the j-th order statistic of n i.i.d. Uniforms is:

$$U_{(j)} \sim \textrm{Beta}(j,\; n - j + 1)$$
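A simulation check of the three-Uniforms case (the check-by-means approach is my own sketch): Beta(j, n − j + 1) has mean j/(n + 1), so the three order statistics should average roughly 1/4, 2/4, and 3/4.

```python
import random

random.seed(0)

# Sample the order statistics of n = 3 Uniforms and compare their sample
# means to the Beta(j, n - j + 1) means j / (n + 1).
n, N = 3, 100_000
sums = [0.0] * n
for _ in range(N):
    u = sorted(random.random() for _ in range(n))
    for j in range(n):
        sums[j] += u[j]

means = [s / N for s in sums]
print(means)
```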
. . . as the Conjugate Prior of the Binomial - A prior is the distribution of a parameter before you observe any data (f(p)). A posterior is the distribution of a parameter after you observe data x (f(p|x)). Beta is the conjugate prior of the Binomial because if you have a Beta-distributed prior on p (the parameter of the Binomial), then the posterior distribution on p given observed data is also Beta-distributed. This means that, in a two-level model:

$$p \sim \textrm{Beta}(a, b)$$
$$X \mid p \sim \textrm{Bin}(n, p)$$
Then after observing the value X = x, we get the posterior distribution

$$p \mid X = x \sim \textrm{Beta}(a + x,\; b + n - x)$$
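The conjugacy claim can be verified on a grid (the specific numbers a = 2, b = 3, n = 10, x = 4 are my own assumed example): multiplying the Beta prior by the Binomial likelihood and normalizing should reproduce the Beta(a + x, b + n − x) PDF.

```python
import math

# Grid check of conjugacy with assumed numbers: prior p ~ Beta(a = 2, b = 3),
# then x = 4 successes observed in n = 10 trials; the posterior should be
# Beta(a + x, b + n - x) = Beta(6, 9).
a, b, n, x = 2, 3, 10, 4

def beta_pdf(p, a, b):
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b)) \
        * p ** (a - 1) * (1 - p) ** (b - 1)

grid = [k / 1000 for k in range(1, 1000)]
# Unnormalized posterior = prior * Binomial likelihood (constants drop out).
unnorm = [beta_pdf(p, a, b) * p ** x * (1 - p) ** (n - x) for p in grid]
Z = sum(unnorm) / 1000                       # Riemann normalizing constant
posterior = [u / Z for u in unnorm]
target = [beta_pdf(p, a + x, b + n - x) for p in grid]

max_err = max(abs(q - t) for q, t in zip(posterior, target))
print(max_err)
```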
We can derive the following relationships between the distributions:
Let us say that we have X ∼ Gamma(a, λ) and Y ∼ Gamma(b, λ), and that X is independent of Y. By the Bank–Post Office result, we have that:

$$X + Y \sim \textrm{Gamma}(a + b, \lambda) \qquad \frac{X}{X + Y} \sim \textrm{Beta}(a, b)$$
X + Y is independent of X/(X + Y).
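A simulation consistent with this independence claim (the parameters a = 2, b = 3, λ = 1 are my own assumed example; a near-zero correlation does not prove independence, but a clearly nonzero one would refute it):

```python
import math
import random

random.seed(1)

# Assumed parameters: X ~ Gamma(2, 1), Y ~ Gamma(3, 1), independent, sampled
# as sums of Expo(1)s.  Then X + Y ~ Gamma(5, 1) and X/(X+Y) ~ Beta(2, 3).
N = 100_000

def gamma_rv(shape):
    return sum(random.expovariate(1) for _ in range(shape))

pairs = [(gamma_rv(2), gamma_rv(3)) for _ in range(N)]
T = [x + y for x, y in pairs]          # total; mean should be near 5
W = [x / (x + y) for x, y in pairs]    # fraction; mean should be near 2/5

# Sample Pearson correlation: independence predicts a value near 0.
mT, mW = sum(T) / N, sum(W) / N
cov = sum((t - mT) * (w - mW) for t, w in zip(T, W)) / N
sT = math.sqrt(sum((t - mT) ** 2 for t in T) / N)
sW = math.sqrt(sum((w - mW) ** 2 for w in W) / N)
corr = cov / (sT * sW)
print(mT, mW, corr)
```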
Definition - Let’s say you have n i.i.d. random variables X1, X2, . . . , Xn. If you arrange them from smallest to largest, the ith element in that list is the ith order statistic, denoted X(i). X(1) is the smallest and X(n) is the largest out of the set of random variables.
Properties - The order statistics are dependent random variables. The smallest value in a set of random variables will always vary, and it has a distribution of its own. For any i, X(i+1) ≥ X(i).
Distribution - Taking n i.i.d. random variables X1, . . . , Xn with CDF F(x) and PDF f(x), the CDF and PDF of X(i) are as follows:

$$F_{X_{(i)}}(x) = P\big(X_{(i)} \le x\big) = \sum_{k=i}^{n} \binom{n}{k} F(x)^k \big(1 - F(x)\big)^{n-k}$$

$$f_{X_{(i)}}(x) = n \binom{n-1}{i-1} F(x)^{i-1} \big(1 - F(x)\big)^{n-i} f(x)$$
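For Uniforms the order-statistic PDF must agree with the Beta(i, n − i + 1) PDF from the earlier section, which gives a cheap cross-check (the choice n = 5, i = 2 is my own assumed example):

```python
import math

# Cross-check of the order-statistic PDF for Uniforms, where F(x) = x and
# f(x) = 1 on (0, 1): X_(i) of n Uniforms should match the Beta(i, n - i + 1) PDF.
def order_stat_pdf(x, i, n, F, f):
    return n * math.comb(n - 1, i - 1) * F(x) ** (i - 1) * (1 - F(x)) ** (n - i) * f(x)

def beta_pdf(x, a, b):
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b)) \
        * x ** (a - 1) * (1 - x) ** (b - 1)

n, i = 5, 2
vals = [(order_stat_pdf(x, i, n, lambda t: t, lambda t: 1.0),
         beta_pdf(x, i, n - i + 1)) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
ok = all(math.isclose(p, q) for p, q in vals)
print(ok)
```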
Universality of the Uniform - We can also express the distribution of the order statistics of n i.i.d. random variables in terms of the order statistics of n Uniforms. We have that

$$X_{(j)} \stackrel{d}{=} F^{-1}\big(U_{(j)}\big)$$
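A simulation sketch of this fact for the Exponential, where the inverse CDF is F^{-1}(u) = −ln(1 − u) (the choice of Expo(1) and comparing minima is my own assumed illustration): sorting n Expos directly should match applying F^{-1} to sorted Uniforms, at least in distribution.

```python
import math
import random

random.seed(0)

# Universality check with Expo(1), where F^{-1}(u) = -ln(1 - u): the smallest
# of n Expos should match F^{-1} applied to the smallest of n Uniforms.
n, N = 3, 100_000
direct = [min(random.expovariate(1) for _ in range(n)) for _ in range(N)]
via_unif = [-math.log(1 - min(random.random() for _ in range(n))) for _ in range(N)]

# Both are the minimum of 3 i.i.d. Expo(1), i.e. Expo(3), with mean 1/3.
m1, m2 = sum(direct) / N, sum(via_unif) / N
print(m1, m2)
```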