Section 5

Continuous Probability and Poisson Process (BH Chapter 5)

What is a Continuous Random Variable (CRV)? A continuous random variable can take on any value within a certain interval (for example, [0, 1]), whereas a discrete random variable can only take on values in a countable list (for example, all the integers, or the values 1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, etc.).

PMFs vs. PDFs Discrete R.V.s have Probability Mass Functions, while continuous R.V.s have Probability Density Functions. We visualize a PDF as a graph whose x-axis is the support of the random variable and whose y-axis is the density f(x).

Intuitively, what do the y values represent? It doesn't make sense to talk about P(X = x) for a continuous r.v. X, because P(X = x) = 0 for all x. Instead, think of the y value f(x) as a relative frequency: the probability of landing within a small distance \epsilon of x is approximately 2\epsilon f(x).
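As a quick sanity check (my own sketch, not from the chapter), here is a simulation using an Expo(1) density f(x) = e^{-x}: the fraction of samples that land within a small \epsilon of a point x, divided by the window width 2\epsilon, approximates f(x).

```python
import numpy as np

# Simulate an Expo(1) random variable, whose density is f(x) = e^{-x} for x > 0.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)

x, eps = 0.5, 0.01
# Fraction of samples within eps of x, i.e. an estimate of P(x - eps < X < x + eps)
frac_in_window = np.mean(np.abs(samples - x) < eps)

print(frac_in_window / (2 * eps))  # density estimate, ~0.61
print(np.exp(-x))                  # true density f(0.5) = e^{-0.5} ~ 0.607
```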

What is the Cumulative Distribution Function (CDF)? It is the following function of x.

F(x) = P(X \leq x)

CDF properties

1) F is increasing.

2) F is right-continuous.

3) F(x) \rightarrow 1 as x \rightarrow \infty, and F(x) \rightarrow 0 as x \rightarrow -\infty.

What is the Probability Density Function (PDF)? The PDF, f(x), is the derivative of the CDF.

F'(x) = f(x)
F(x) = \int_{-\infty}^x f(t)\,dt
F(b) - F(a) = \int_a^b f(x)\,dx

Thus, to find the probability that a CRV takes on a value in an interval, integrate the PDF over that interval; the probability is the area under the density curve.

Two additional properties of a PDF:

  • It must integrate to 1 (because the probability that a CRV falls somewhere in (-\infty, \infty) is 1).

  • The PDF must always be nonnegative.

\int^\infty_{-\infty}f(x)\,dx = 1, \hspace{2 cm} f(x) \geq 0
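Here is a small numerical sketch (my example, using the Expo(1) density f(x) = e^{-x} on x > 0, not a formula from the text) that checks both properties and computes an interval probability as an area:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)  # a valid density on x > 0

total_area, _ = quad(f, 0, np.inf)   # must equal 1
prob_1_to_2, _ = quad(f, 1, 2)       # P(1 <= X <= 2), the area between 1 and 2

print(total_area)                            # ~1.0
print(prob_1_to_2, np.exp(-1) - np.exp(-2))  # both ~0.2325
```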

How do I find the expected value of a CRV? Whereas in the discrete case you sum each value weighted by its probability, in the continuous case you integrate each value weighted by the density.

E(X) = \int^\infty_{-\infty}x f(x)\,dx
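Continuing the same hypothetical Expo(1) example, a sketch of computing E(X) by numerical integration (the exact answer is 1):

```python
import numpy as np
from scipy.integrate import quad

# E(X) = integral of x * f(x) dx, with f(x) = e^{-x} on x > 0
mean, _ = quad(lambda x: x * np.exp(-x), 0, np.inf)
print(mean)  # ~1.0
```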

Universality of the Uniform

When you plug any continuous random variable into its own CDF, you get a Uniform(0,1) random variable. When you plug a Uniform(0,1) random variable into an inverse CDF, you get a random variable with that CDF. For example, let’s say that a random variable X has CDF

F(x) = 1 - e^{-x}, \quad x > 0

By the Universality of the Uniform, if we plug X into this function, then we get a uniformly distributed random variable:

F(X) = 1 - e^{-X} \sim \textrm{Unif}(0,1)

Similarly, going the other direction, if U \sim \textrm{Unif}(0,1), then F^{-1}(U) has the same distribution as X. The key point is that for any continuous random variable X, we can transform it into a Uniform random variable and back by using its CDF.
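A minimal simulation sketch of both directions, assuming the Expo(1) CDF F(x) = 1 - e^{-x} from the example above (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Forward direction: plug X ~ Expo(1) into its own CDF
x = rng.exponential(size=100_000)
u_from_x = 1 - np.exp(-x)               # F(X), should look Unif(0,1)
print(u_from_x.mean(), u_from_x.var())  # ~0.5 and ~1/12 ~ 0.083

# Reverse direction: plug U ~ Unif(0,1) into the inverse CDF F^{-1}(u) = -ln(1 - u)
u = rng.uniform(size=100_000)
x_from_u = -np.log(1 - u)               # should look Expo(1)
print(x_from_u.mean(), x_from_u.var())  # ~1 and ~1
```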

Multivariate LOTUS (Law of the Unconscious Statistician)

In one dimension, we have E(g(X)) = \sum_x g(x)P(X=x) in the discrete case, or E(g(X)) = \int_{-\infty}^{\infty}g(x)f_X(x)\,dx in the continuous case.

For discrete random variables:

E(g(X, Y)) = \sum_x\sum_y g(x, y)P(X=x, Y=y)

For continuous random variables:

E(g(X, Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x, y)f_{X,Y}(x, y)\,dx\,dy
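A brief sketch (my example, not from the text) of the continuous 2D LOTUS: for independent X, Y \sim \textrm{Unif}(0,1), the joint density is 1 on the unit square, so E[(X + Y)^2] = \int\int (x + y)^2\,dx\,dy = 7/6.

```python
import numpy as np
from scipy.integrate import dblquad

# 2D LOTUS by numerical double integration (dblquad integrates func(y, x);
# the integrand here is symmetric in x and y, so the order does not matter)
val, _ = dblquad(lambda y, x: (x + y) ** 2, 0, 1, 0, 1)
print(val)  # ~1.1667 = 7/6

# Monte Carlo check of the same expectation
rng = np.random.default_rng(2)
x, y = rng.uniform(size=(2, 1_000_000))
print(np.mean((x + y) ** 2))  # ~1.1667
```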

Poisson Process

The Poisson process gives a story that links the Exponential distribution with the Poisson distribution. A Poisson process with rate \lambda has the following properties:

  • The number of arrivals that occur in an interval of length t is distributed Pois(\lambda t).

  • The numbers of arrivals that occur in disjoint intervals are independent of each other.

Count-Time Duality: Instead of asking how many events occur within some amount of time, we can flip the question around and ask how long it takes until some number of events occur. Let T_n be the amount of time it takes until the n-th event occurs, and let N_t be the number of events that occur within time t. What relationship do we have between T_n and N_t?

P(T_n > t) = P(N_t < n)

To reason about this in words, the event that the n-th arrival time is greater than t is equivalent to the event that the number of arrivals by time t is less than n.

Using count-time duality, we can discern the distribution of T_n. Let us first look at the distribution of the first arrival time.

For the first arrival, since N_t \sim Pois(\lambda t) gives P(N_t = 0) = e^{-\lambda t}, we have

P(T_1 \leq t) = 1 - P(T_1 > t) = 1 - P(N_t < 1) = 1 - P(N_t = 0) = 1 - e^{-\lambda t}

This is the Expo(\lambda) CDF!
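Here is a simulation sketch of the whole story (the rate, time horizon, and seed are my choices, not from the text): interarrival times are drawn i.i.d. Expo(\lambda), the count of arrivals by time t should then behave like Pois(\lambda t), and P(T_1 > s) should match e^{-\lambda s}.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, n_sims = 2.0, 5.0, 100_000

# Build arrival times as cumulative sums of i.i.d. Expo(lam) interarrival times
interarrivals = rng.exponential(scale=1 / lam, size=(n_sims, 50))
arrival_times = np.cumsum(interarrivals, axis=1)

# N_t: number of arrivals by time t; should be ~ Pois(lam * t)
counts = np.sum(arrival_times <= t, axis=1)
print(counts.mean(), counts.var())  # both ~ lam * t = 10

# T_1: first arrival time; count-time duality says P(T_1 > s) = e^{-lam * s}
s = 0.3
print(np.mean(arrival_times[:, 0] > s), np.exp(-lam * s))  # both ~0.549
```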

Moment: Moments describe the shape of a distribution. The first three moments are related to the Mean, Variance, and Skewness of a distribution. The k-th moment of a random variable X is

\mu'_k = E(X^k)

Mean, Variance, and other moments (Skewness, Kurtosis, etc.) can be expressed in terms of the moments of a random variable.

  • Mean: \mu'_1 = E(X)

  • Variance: \mu'_2 = E(X^2) = Var(X) + (\mu'_1)^2
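As a quick illustration of the two bullets above (my example, using simulated Expo(1) draws), the variance can be recovered from the first two moments:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(size=1_000_000)  # Expo(1): mean 1, variance 1, E(X^2) = 2

m1 = np.mean(x)       # first moment ~1
m2 = np.mean(x ** 2)  # second moment ~2
print(m1, m2, m2 - m1 ** 2)  # the last value recovers Var(X) ~ 1
```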

Moment Generating Functions

MGF For any random variable X, the moment generating function (MGF) of X is the expected value of e^{tX}, provided this expectation is finite on some interval of positive length centered around 0. The MGF is just a function of a dummy variable t.

M_X(t) = E(e^{tX})

Why is it called the Moment Generating Function? Because the k-th derivative of the moment generating function, evaluated at 0, is the k-th moment of X!

\mu_k' = E(X^k) = M_X^{(k)}(0)

Why does this relationship hold? By differentiation under the integral sign and then plugging in t = 0:

M_X^{(k)}(t) = \frac{d^k}{dt^k}E(e^{tX}) = E\left(\frac{d^k}{dt^k}e^{tX}\right) = E(X^ke^{tX})
M_X^{(k)}(0) = E(X^ke^{0X}) = E(X^k) = \mu_k'
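A short symbolic sketch (my example, not from the text): for X \sim Expo(1) the MGF is M_X(t) = 1/(1 - t) for t < 1, and differentiating it k times at 0 should return E(X^k) = k!.

```python
import sympy as sp

t = sp.symbols("t")
M = 1 / (1 - t)  # MGF of Expo(1), valid for t < 1

for k in range(1, 5):
    # k-th derivative of the MGF, evaluated at t = 0, gives the k-th moment
    print(k, sp.diff(M, t, k).subs(t, 0))  # 1, 2, 6, 24  (= k!)
```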

MGF Properties

MGF of a linear combination of X: If we have Y = aX + c, then M_Y(t) = E(e^{t(aX + c)}) = e^{ct}E(e^{(at)X}) = e^{ct}M_X(at).

Uniqueness of the MGF: If it exists, the MGF uniquely determines the distribution. This means that for any two random variables X and Y, they have the same distribution (their CDFs/PDFs are equal) if and only if their MGFs are equal. Two random variables with the same MGF cannot have different distributions.

Summing Independent R.V.s by Multiplying MGFs:

If X and Y are independent, then the MGF of the sum of two random variables is the product of the MGFs of those two random variables.

M_{X+Y}(t) = E(e^{t(X + Y)}) = E(e^{tX}e^{tY}) = E(e^{tX})E(e^{tY}) = M_X(t) \cdot M_Y(t)
M_{X+Y}(t) = M_X(t) \cdot M_Y(t)
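A quick numerical check of this property (my example): for independent X, Y \sim Expo(1) and a fixed t < 1 where the MGFs exist, a Monte Carlo estimate of E(e^{t(X+Y)}) should match M_X(t) M_Y(t) = 1/(1 - t)^2.

```python
import numpy as np

rng = np.random.default_rng(5)
t = 0.3
x = rng.exponential(size=2_000_000)
y = rng.exponential(size=2_000_000)

print(np.mean(np.exp(t * (x + y))))  # Monte Carlo estimate of M_{X+Y}(t)
print(1 / (1 - t) ** 2)              # M_X(t) * M_Y(t) ~ 2.0408
```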
