The Linguistics of Probability (BH Chapter 3)

**Formal Definition** - A random variable X is a *function* mapping the sample space *S* into the real line.

**Descriptive Definition** - A random variable takes on a numerical summary of an experiment. The randomness comes from the randomness of what outcome occurs. Each outcome has a certain probability. A discrete random variable may only take on a finite (or countably infinite) number of values. Random variables are often denoted by capital letters, usually *X* and *Y*.

A distribution describes the probability that a random variable takes on certain values. Some distributions are commonly used in statistics because they can help model real life phenomena.

**PMF, CDF, and Independence**

**Probability Mass Function (PMF)** (Discrete Only) gives the probability that a random variable takes on the value *x*.

$P_X(x) = P(X=x)$

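**Cumulative Distribution Function (CDF)** gives the probability that a random variable takes on the value *x* or less.

$F_X(x) = P(X \leq x)$

**Independence** Discrete random variables *X* and *Y* are independent if, for all values of *x* and *y*:

$P(X=x, Y=y) = P(X = x)P(Y = y)$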
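To make the definitions above concrete, here is a minimal sketch that treats random variables as functions on a sample space, builds a PMF and a CDF from it, and checks the product rule for independence. The two-dice experiment and all function names (`joint_pmf`, `marginal`, `cdf`, `independent`) are illustrative assumptions, not part of the original notes.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space (assumed for illustration): two fair dice, 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))

def joint_pmf(f, g):
    """Joint PMF of two random variables f and g, each a function of an outcome."""
    joint = defaultdict(Fraction)
    for s in outcomes:
        joint[(f(s), g(s))] += prob
    return dict(joint)

def marginal(joint, axis):
    """Marginal PMF of the first (axis=0) or second (axis=1) random variable."""
    pmf = defaultdict(Fraction)
    for pair, p in joint.items():
        pmf[pair[axis]] += p
    return dict(pmf)

def cdf(pmf, x):
    """P(X <= x), obtained by summing the PMF over all values at most x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

def independent(joint):
    """Check P(X=x, Y=y) == P(X=x) P(Y=y) for every pair of values."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, y), Fraction(0)) == px[x] * py[y]
               for x in px for y in py)

first = lambda s: s[0]          # value of the first die
second = lambda s: s[1]         # value of the second die
total = lambda s: s[0] + s[1]   # sum of both dice

pmf_first = marginal(joint_pmf(first, second), 0)
print(pmf_first[3])                            # P(X = 3)
print(cdf(pmf_first, 3))                       # P(X <= 3)
print(independent(joint_pmf(first, second)))   # the two dice
print(independent(joint_pmf(first, total)))    # first die vs. the sum
```

The first die and the sum fail the check because, for example, a sum of 12 is impossible when the first die shows 1, even though both events have positive probability on their own.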

**Bernoulli Distribution** The Bernoulli distribution is the simplest case of the Binomial distribution, where we only have one trial, or *n* = 1. Let us say that X is distributed Bern(*p*). We know the following:

**Story.** *X* “succeeds” (is 1) with probability *p*, and *X* “fails” (is 0) with probability 1 − *p*.

**Example.** A fair coin flip is distributed $Bern(\frac{1}{2})$

**PMF.** The probability mass function of a Bernoulli is:

$P(X = x) = p^x(1-p)^{1-x}$ for $x \in \{0, 1\}$

or, equivalently,

$P(X = x) = \begin{cases} p, & x = 1 \\ 1-p, & x = 0 \end{cases}$
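The two ways of writing the Bernoulli PMF describe the same function. A short sketch that checks this numerically, with function names chosen here purely for illustration:

```python
from fractions import Fraction

def bern_pmf_formula(x, p):
    """Bernoulli PMF in the single-expression form p^x (1 - p)^(1 - x)."""
    if x not in (0, 1):
        return Fraction(0)  # zero outside the support {0, 1}
    return p**x * (1 - p)**(1 - x)

def bern_pmf_cases(x, p):
    """Bernoulli PMF in the piecewise form."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return Fraction(0)

p = Fraction(1, 2)  # the fair coin flip from the example
for x in (0, 1):
    print(x, bern_pmf_formula(x, p), bern_pmf_cases(x, p))
```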

**Binomial** Let us say that *X* is distributed Bin(*n*, *p*). We know the following:

**Story** *X* is the number of “successes” that we will achieve in *n* independent trials, where each trial can be either a success or a failure, each with the same probability *p* of success.

**Example** If Lebron James shoots 10 free throws and each one independently has a $\frac{3}{4}$ chance of going in, then the number of free throws he makes is distributed $Bin(10, \frac{3}{4})$. In other words, letting $X$ be the number of free throws that he makes, $X$ is a Binomial random variable distributed $Bin(10, \frac{3}{4})$.

**PMF** The probability mass function of a Binomial is:

$P(X = x) = {n \choose x} p^x(1-p)^{n-x}$
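As a rough illustration, the Binomial PMF can be evaluated exactly with Python's standard library. The helper name `binom_pmf` is an assumption made for this sketch; the parameters match the $Bin(10, \frac{3}{4})$ free-throw example.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p); zero outside the support 0..n."""
    if x < 0 or x > n:
        return Fraction(0)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, Fraction(3, 4)
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print(sum(pmf))          # total probability over the support
print(pmf[10])           # probability that all 10 free throws go in
print(float(pmf[8]))     # probability of exactly 8 makes, as a decimal
```

Setting `n = 1` recovers the Bernoulli PMF, consistent with the Bernoulli being the one-trial case of the Binomial.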

**Hypergeometric** Let us say that *X* is distributed HGeom(*w*, *b*, *n*). We know the following:

**Story** In a population of *b* undesired objects and *w* desired objects, *X* is the number of “successes” we will have in a draw of *n* objects, without replacement.

**Example** 1) Let’s say that we have only *b* Weedles (failure) and *w* Pikachus (success) in Viridian Forest. We encounter *n* of the Pokemon in the forest, and *X* is the number of Pikachus in our encounters. 2) The number of aces that you draw in 5 cards (without replacement). 3) You have *w* white balls and *b* black balls, and you draw *n* balls without replacement. *X* is the number of white balls in your sample.

**PMF** The probability mass function of a Hypergeometric is:

$P(X = k) = \frac{{w \choose k}{b \choose n-k}}{{w + b \choose n}}$
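A minimal sketch of the Hypergeometric PMF, applied to the aces example (4 aces as successes, 48 other cards as failures, 5 cards drawn). The function name `hgeom_pmf` is hypothetical, and the sketch assumes $n \leq w + b$.

```python
from fractions import Fraction
from math import comb

def hgeom_pmf(k, w, b, n):
    """P(X = k) for X ~ HGeom(w, b, n), assuming n <= w + b."""
    if k < 0 or k > n:
        return Fraction(0)
    # math.comb returns 0 when asked to choose more items than are available,
    # so impossible values of k (e.g. k > w) get probability 0 automatically.
    return Fraction(comb(w, k) * comb(b, n - k), comb(w + b, n))

w, b, n = 4, 48, 5  # aces, non-aces, cards drawn
pmf = [hgeom_pmf(k, w, b, n) for k in range(n + 1)]

print(sum(pmf))         # total probability over k = 0..n
print(float(pmf[0]))    # probability of drawing no aces
print(pmf[5])           # five aces is impossible with only four in the deck
```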

**Geometric** Let us say that *X* is distributed Geom(*p*). We know the following:

**Story** *X* is the number of “failures” that we will achieve before we achieve our first success. Each trial succeeds independently with probability *p*.

**Example** If each pokeball we throw has a $\frac{1}{10}$ probability to catch Mew, the number of failed pokeballs will be distributed $Geom(\frac{1}{10})$.

**PMF** With *q* = 1 − *p*, the probability mass function of a Geometric is:

$P(X = k) = q^kp$
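The Geometric PMF can be sketched the same way, using the $Geom(\frac{1}{10})$ pokeball example. The name `geom_pmf` is an assumption for illustration. Because the support is infinite, the code only sums a finite prefix and compares it with the closed form $1 - q^{m}$ for the first $m$ terms.

```python
from fractions import Fraction

def geom_pmf(k, p):
    """P(X = k) for X ~ Geom(p): k failures before the first success."""
    if k < 0:
        return Fraction(0)
    q = 1 - p
    return q**k * p

p = Fraction(1, 10)
print(geom_pmf(0, p))   # catch Mew on the first throw
print(geom_pmf(2, p))   # two failed pokeballs, then a catch

# Sum of the first m terms is a geometric series equal to 1 - q^m.
m = 50
partial = sum(geom_pmf(k, p) for k in range(m))
print(partial == 1 - (1 - p) ** m)
```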