# Section 5

Continuous Probability and Poisson Process (BH Chapter 5)

What is a Continuous Random Variable (CRV)? A continuous random variable can take on any value within a certain interval (for example, [0, 1]), whereas a discrete random variable can only take on values in a countable list (for example, all the integers, or the values $1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}$, etc.).

PMFs vs. PDFs Discrete r.v.s have Probability Mass Functions, while continuous r.v.s have Probability Density Functions. We visualize a PDF as a graph where the x-axis is the support of the random variable and the y-axis is the density.

Intuitively, what do the $y$ values represent? It doesn't make sense to talk about $P(X = x)$ for a continuous r.v. $X$, because $P(X = x) = 0$ for all $x$. Instead, think of the $y$ value of the PDF at $x$ as the relative frequency of getting a value within $\epsilon$ of $x$, where $\epsilon$ is small: $P(x - \epsilon \leq X \leq x + \epsilon) \approx 2\epsilon f(x)$.

What is the Cumulative Distribution Function (CDF)? It is the following function of $x$.

$F(x) = P(X \leq x)$

# CDF properties

1) $F$ is increasing in the weak sense: if $a \leq b$, then $F(a) \leq F(b)$.

2) $F$ is right-continuous.

3) $F(x) \rightarrow 1$ as $x \rightarrow \infty$, $F(x) \rightarrow 0$ as $x \rightarrow -\infty$

What is the Probability Density Function (PDF)? The PDF, $f(x)$, is the derivative of the CDF.

$F'(x) = f(x)$

$F(x) = \int_{-\infty}^x f(t)dt$

$F(b) - F(a) = \int^b_a f(x)dx$

Thus, to find the probability that a CRV takes on a value in an interval, you integrate the PDF over that interval, which gives the area under the density curve.
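As a rough numerical check of this relationship, the sketch below integrates a density with a midpoint rule and compares the area with $F(b) - F(a)$. The Expo(1) density, the interval endpoints, and the helper name `integrate` are illustrative choices, not part of the notes above.

```python
import math

def pdf(x):
    """Density of an Expo(1) random variable (illustrative choice)."""
    return math.exp(-x) if x >= 0 else 0.0

def cdf(x):
    """Matching CDF: F(x) = 1 - e^{-x} for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = 0.5, 2.0
area = integrate(pdf, a, b)   # area under the density curve on [a, b]
exact = cdf(b) - cdf(a)       # F(b) - F(a)
print(area, exact)            # the two values should agree closely
```

The same helper can be pointed at a wide interval such as `integrate(pdf, 0, 50)` to see that the total area is approximately 1.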

Two additional properties of a PDF:

• It must integrate to 1, because the probability that a CRV falls in the interval $(-\infty, \infty)$ is 1.

• The PDF must always be nonnegative.

$\int^\infty_{-\infty}f(x)dx = 1, \hspace{2 cm} f(x) \geq 0$

How do I find the expected value of a CRV? Whereas in the discrete case you sum the values weighted by their probabilities, in the continuous case you integrate the values weighted by the density.

$E(X) = \int^\infty_{-\infty}xf(x)dx$
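As a small worked example (the density here is chosen only for illustration), suppose $X$ has PDF $f(x) = 2x$ on $[0, 1]$ and $f(x) = 0$ elsewhere. It is a valid PDF, since it is nonnegative and

$\int_0^1 2x \, dx = 1$

The expected value is then

$E(X) = \int_0^1 x \cdot 2x \, dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}$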

# Universality of the Uniform

When you plug any continuous random variable into its own CDF, you get a Uniform[0,1] random variable. When you plug a Uniform[0,1] random variable into an inverse CDF, you get a random variable with the corresponding distribution. For example, let’s say that a random variable X has a CDF

$F(x) = 1 - e^{-x}$

By the Universality of the Uniform, if we plug X into this function, then we get a uniformly distributed random variable.

$F(X) = 1 - e^{-X} \sim U$

Similarly, since $F(X) \sim U$, then $X \sim F^{-1}(U)$. The key point is that for any continuous random variable X, we can transform it into a uniform random variable and back by using its CDF.
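Both directions can be tried out with a short simulation. This is a minimal sketch for the example CDF $F(x) = 1 - e^{-x}$, whose inverse is $F^{-1}(u) = -\ln(1 - u)$; the sample size, the seed, and the names `F` and `F_inv` are arbitrary choices made for this illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def F(x):
    """CDF from the example: F(x) = 1 - e^{-x} for x >= 0."""
    return 1.0 - math.exp(-x)

def F_inv(u):
    """Inverse CDF, obtained by solving u = 1 - e^{-x} for x."""
    return -math.log(1.0 - u)

n = 100_000

# Direction 1: plug Uniform[0,1] draws into the inverse CDF.
xs = [F_inv(random.random()) for _ in range(n)]
mean_x = sum(xs) / n  # should be close to 1, the mean of Expo(1)

# Direction 2: plug draws of X (here from the library's Expo(1) sampler)
# into their own CDF and check that the results look Uniform[0,1].
us = [F(random.expovariate(1.0)) for _ in range(n)]
mean_u = sum(us) / n                       # should be close to 1/2
frac_below = sum(u <= 0.25 for u in us) / n  # should be close to 0.25

print(mean_x, mean_u, frac_below)
```

The first direction is the basis of inverse transform sampling: any distribution with a computable inverse CDF can be simulated from uniform draws.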

# Multivariate LOTUS (Law of the Unconscious Statistician)

In one dimension, we have: $E(g(X)) = \sum_xg(x)P(X=x)$, or $E(g(X)) = \int_{-\infty}^{\infty}g(x)f_X(x) d x$

For discrete random variables: $E(g(X, Y)) = \sum_x\sum_yg(x, y)P(X=x, Y=y)$

For continuous random variables: $E(g(X, Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x, y)f_{X,Y}(x, y) d x d y$
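The discrete double sum can be computed directly for a small joint PMF. The sketch below assumes two independent fair dice (so every pair has probability 1/36); the helper name `lotus` and the choices of $g$ are illustrative only.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint PMF of two independent fair dice: P(X=x, Y=y) = 1/36 for every pair.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def lotus(g, joint_pmf):
    """E(g(X, Y)) as the double sum of g(x, y) * P(X=x, Y=y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

e_product = lotus(lambda x, y: x * y, joint)    # E(XY)
e_max = lotus(lambda x, y: max(x, y), joint)    # E(max(X, Y))
print(e_product, e_max)
```

Note that the distribution of $g(X, Y)$ itself is never worked out; the joint PMF of $(X, Y)$ is enough.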

# Poisson Process

The Poisson process gives a story that links the Exponential distribution with the Poisson distribution. A Poisson process with rate $\lambda$ has the following properties:

• The number of arrivals that occur in an interval of length $t$ is distributed $Pois(\lambda t)$.

• The numbers of arrivals that occur in disjoint intervals are independent of each other.

Count-Time Duality: Instead of asking how many events occur within some amount of time, we can flip the question around and ask how long it takes until some number of events occur. Let $T_n$ be the amount of time it takes until the $n$-th event occurs, and let $N_t$ be the number of events that occur within time $t$. What relationship do we have between $T_n$ and $N_t$?

$P(T_n > t) = P(N_t < n)$

To reason about this in words, the event that the n-th arrival time is greater than $t$ is equivalent to the event that the number of arrivals by time $t$ is less than $n$.
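The equivalence of the two events can be checked path by path in a simulation. This sketch builds arrival times from i.i.d. $Expo(\lambda)$ gaps, which is an assumption of the simulation (it is the standard way to generate a Poisson process); the rate, $t$, $n$, and the helper name `arrival_times` are illustrative.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def arrival_times(rate, how_many):
    """First `how_many` arrival times, built from i.i.d. Expo(rate) gaps."""
    t, times = 0.0, []
    for _ in range(how_many):
        t += random.expovariate(rate)
        times.append(t)
    return times

rate, t, n = 2.0, 1.5, 3   # illustrative values
trials = 20_000
mismatches = 0             # paths where the two events disagree
hits = 0                   # paths where T_n > t

for _ in range(trials):
    times = arrival_times(rate, 50)
    T_n = times[n - 1]                       # time of the n-th arrival
    # Counting among the first 50 arrivals is enough to decide N_t < n here.
    N_t = sum(1 for s in times if s <= t)
    if (T_n > t) != (N_t < n):
        mismatches += 1
    hits += T_n > t

estimate = hits / trials
mu = rate * t
# P(N_t < n) from the Pois(rate * t) PMF
exact = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n))
print(mismatches, estimate, exact)
```

Because the two events are the same event, `mismatches` stays at zero on every path, while `estimate` only approximates the Poisson probability up to simulation noise.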

Using count-time duality, we can discern the distribution of $T_n$. Let us first look at the distribution for the first arrival.

For the first arrival, we have $P(T_1 \leq t) = 1 - P(T_1 > t) = 1 - P(N_t < 1) = 1 - P(N_t = 0) = 1 - e^{-\lambda t}$, since $N_t \sim Pois(\lambda t)$ gives $P(N_t = 0) = e^{-\lambda t}$. This is the $Expo(\lambda)$ CDF!
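The same argument extends to later arrivals. As a sketch, using only count-time duality and the $Pois(\lambda t)$ PMF of $N_t$:

$P(T_n \leq t) = 1 - P(N_t < n) = 1 - \sum_{k=0}^{n-1} \frac{e^{-\lambda t}(\lambda t)^k}{k!}$

For $n = 1$ the sum has the single term $e^{-\lambda t}$, which recovers the $Expo(\lambda)$ CDF. For general $n$ this is the CDF of the $Gamma(n, \lambda)$ distribution.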

Moment - Moments describe the shape of a distribution. The first three moments are related to the mean, variance, and skewness of a distribution. The $k$-th moment of a random variable X is

$\mu'_k = E(X^k)$

Mean, Variance, and other moments (Skewness, Kurtosis, etc.) can be expressed in terms of the moments of a random variable.

• Mean $\mu'_1 = E(X)$

• Variance $Var(X) = \mu'_2 - (\mu_1')^2$, since $\mu'_2 = E(X^2) = Var(X) + (\mu_1')^2$
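To make the relationship between moments, mean, and variance concrete, the sketch below computes the first two moments of a fair six-sided die exactly. The die and the helper name `moment` are illustrative choices.

```python
from fractions import Fraction

# PMF of a fair six-sided die (illustrative choice of distribution).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def moment(k, pmf):
    """k-th moment E(X^k) of a discrete random variable with the given PMF."""
    return sum(x**k * p for x, p in pmf.items())

m1 = moment(1, pmf)        # first moment, the mean
m2 = moment(2, pmf)        # second moment, E(X^2)
variance = m2 - m1**2      # Var(X) = E(X^2) - (E(X))^2
print(m1, m2, variance)
```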

# Moment Generating Functions

MGF For any random variable X, the moment generating function (MGF) of X is the expected value of $e^{tX}$, viewed as a function of a dummy variable $t$. The MGF exists if this expected value is finite on some open interval centered around 0.

$M_X(t) = E(e^{tX})$

Why is it called the Moment Generating Function? Because the $k$-th derivative of the moment generating function, evaluated at 0, is the $k$-th moment of X!

$\mu_k' = E(X^k) = M_X^{(k)}(0)$

Why does this relationship hold? By differentiation under the integral sign and then plugging in $t=0$:

$M_X^{(k)}(t) = \frac{d^k}{dt^k}E(e^{tX}) = E(\frac{d^k}{dt^k}e^{tX}) = E(X^ke^{tX})$

$M_X^{(k)}(0) = E(X^ke^{0X}) = E(X^k) = \mu_k'$
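As a worked example (the distribution is chosen only for illustration), take $X \sim Expo(1)$, with PDF $f(x) = e^{-x}$ for $x \geq 0$. For $t < 1$,

$M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx}e^{-x}dx = \frac{1}{1 - t}$

Differentiating $k$ times and plugging in $t = 0$:

$M_X^{(k)}(t) = \frac{k!}{(1 - t)^{k+1}}, \hspace{2 cm} M_X^{(k)}(0) = k!$

So $E(X) = 1$ and $E(X^2) = 2$, which gives $Var(X) = 2 - 1^2 = 1$.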

# MGF Properties

MGF of a linear transformation of X: If we have $Y = aX + c$, then $M_Y(t) = E(e^{t(aX + c)}) = e^{ct}E(e^{(at)X}) = e^{ct}M_X(at)$

Uniqueness of the MGF: If it exists, the MGF uniquely defines the distribution. This means that any two random variables X and Y have the same distribution (their CDFs/PDFs are equal) if and only if their MGFs are equal. You can’t have different PDFs when two random variables have the same MGF.

Summing Independent R.V.s by Multiplying MGFs:

If X and Y are independent, then the MGF of the sum of two random variables is the product of the MGFs of those two random variables.

$M_{(X+Y)}(t) = E(e^{t(X + Y)}) = E(e^{tX}e^{tY}) = E(e^{tX})E(e^{tY}) = M_X(t) \cdot M_Y(t)$
$M_{(X+Y)}(t) = M_X(t) \cdot M_Y(t)$
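The product rule can be checked numerically for a small discrete case. The sketch below assumes X and Y are independent fair dice, builds the PMF of $X + Y$ by convolution, and compares its MGF with the product of the individual MGFs at a few values of $t$; the helper name `mgf` and the chosen $t$ values are illustrative.

```python
import math
from itertools import product

# PMF of one fair die; X and Y are assumed independent with this PMF.
die = {x: 1 / 6 for x in range(1, 7)}

def mgf(pmf, t):
    """M(t) = E(e^{tX}) for a discrete random variable with the given PMF."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# PMF of X + Y by convolution, which uses independence: P(X=x, Y=y) = P(X=x)P(Y=y).
total = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    total[x + y] = total.get(x + y, 0.0) + px * py

for t in (-0.5, 0.0, 0.3, 1.0):
    lhs = mgf(total, t)              # MGF of the sum
    rhs = mgf(die, t) * mgf(die, t)  # product of the two MGFs
    print(t, lhs, rhs)               # the two columns should agree
```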