Ethan P. Marzban
2023-07-10
Let’s actually conduct an experiment together!
Specifically, suppose we toss a coin 3 times and record the outcomes.
First question: what is the outcome space?
Additionally, let’s keep track of the number of heads we observe each time we run this experiment.
Note that each time we run this experiment, we (sure enough) get an element of the outcome space.
But, also note that each time we run the experiment, we get a (potentially) different number of heads.
In fact, each outcome in the outcome space corresponds to a different number of heads:
Outcome | Number of Heads |
---|---|
(H, H, H) | 3 |
(H, H, T) | 2 |
(H, T, H) | 2 |
(T, H, H) | 2 |
(H, T, T) | 1 |
(T, H, T) | 1 |
(T, T, H) | 1 |
(T, T, T) | 0 |
This leads us to the notion of random variables.
Loosely speaking, a random variable is a variable or process with a random numerical outcome.
We denote random variables using capital letters; e.g. \(X\), \(Y\), \(Z\), \(W\), etc.
So, for example, \(X =\) “the number of heads in 3 tosses of a coin” is a random variable because (a) it is a numerical outcome of an experiment and (b) it is random (i.e. its value changes depending on the outcome of the experiment).
By the way, note that we also use capital letters to denote events. So, how will we know whether something is an event or a random variable?
That’s right; based on how it is defined! So, again, make sure you are defining everything clearly and explicitly.
A key part of the definition of random variables is that they must be numerical.
What this means is we can always look at the set of values a random variable could take: this is what we call the state space of a random variable.
For example: if \(X =\) “number of heads in 3 tosses of a coin”, we see that \(X\) will only ever be \(0\), \(1\), \(2\), or \(3\).
We often denote the state space of a random variable using the notation \(S_{\verb|<variable>|}\); e.g. \(S_X\) to mean the state space of \(X\), \(S_Y\) to mean the state space of \(Y\), etc.
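For example, for the coin-tossing variable \(X\) above, \(S_X = \{0, 1, 2, 3\}\).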
Because random variables are numerical, their state spaces will always be numerical sets of values.
This means we can classify state spaces using our Variable Classification scheme from Week 1!
We extend the same classification language to random variables:
Definition
Given a random variable \(X\), we say that:
- \(X\) is a discrete random variable if its state space \(S_X\) is discrete (e.g. a finite or countably infinite set of values);
- \(X\) is a continuous random variable if its state space \(S_X\) is continuous (e.g. an interval, or a union of intervals).
Let’s return to our coin tossing example.
What is the probability that we observe zero heads?
Well, in the language of our random variable \(X\) (which counts the number of heads in these three tosses of our coin), we can translate “zero heads” to the event \(\{X = 0\}\), meaning we want to find \(\mathbb{P}(X = 0)\).
Observing zero heads is equivalent to observing all tails, meaning the event \(\{X = 0\}\) is equivalent to the event \(\{ (T, T, T) \}\).
Now, up to this point I have been careful to avoid explicitly mentioning whether our coin is fair or not.
So, let's consider the more general case in which our coin lands heads with probability \(p\) (and hence tails with probability \(1 - p\)), with tosses independent of one another. In this case, the probability of observing (T, T, T) is simply \((1 - p)^3\).
Why? Observing (T, T, T) means “T first and T second and T third”. By independence, the probability of this string of occurrences is simply \[ \mathbb{P}(\text{\texttt{T} first}) \times \mathbb{P}(\text{\texttt{T} second}) \times \mathbb{P}(\text{\texttt{T} third}) \] which is just \((1 - p) \times (1 - p) \times (1 - p) = (1 - p)^3\). Hence, \(\mathbb{P}(X = 0) = (1 - p)^3\).
What about the other probabilities? I.e., how would we go about computing \(\mathbb{P}(X = 1)\)?
Well, let’s use the same logic we used before: that is, let’s see what outcomes comprise the event \(\{X = 1\}\).
Upon inspection, we see that the event \(\{X = 1\}\) is equivalent to the event \(\{ (H, T, T), \ (T, H, T), \ (T, T, H) \}\).
Again by independence:
- the probability of (H, T, T) is \(p \times (1 - p) \times (1 - p) = p \times (1 - p)^2\);
- the probability of (T, H, T) is \((1 - p) \times p \times (1 - p) = p \times (1 - p)^2\);
- the probability of (T, T, H) is \((1 - p)^2 \times p = p \times (1 - p)^2\).

Therefore, putting these facts together, we find \[ \mathbb{P}(X = 1) = 3 \times p \times (1 - p)^2 \]
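By identical reasoning, each of the three outcomes comprising \(\{X = 2\}\), namely (H, H, T), (H, T, H), and (T, H, H), has probability \(p^2 \times (1 - p)\), and the single outcome (H, H, H) comprising \(\{X = 3\}\) has probability \(p^3\). Hence, \[ \mathbb{P}(X = 2) = 3 \times p^2 \times (1 - p) \quad \text{and} \quad \mathbb{P}(X = 3) = p^3 \] Collecting everything we have found so far: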
Probability | Value |
---|---|
\(\mathbb{P}(X = 0)\) | \((1 - p)^3\) |
\(\mathbb{P}(X = 1)\) | \(3 \times p \times (1 - p)^2\) |
\(\mathbb{P}(X = 2)\) | \(3 \times p^2 \times (1 - p)\) |
\(\mathbb{P}(X = 3)\) | \(p^3\) |
\[\begin{array}{r|cccc} \boldsymbol{k} & 0 & 1 & 2 & 3 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & (1 - p)^3 & 3 p (1 - p)^2 & 3 p^2 (1 - p) & p^3 \end{array}\]
The table on the previous slide is called a probability mass function, and is often abbreviated as p.m.f..
In general, the p.m.f. of an arbitrary random variable \(X\) is a table or formula that specifies all the possible values a random variable can take (i.e. the state space), along with the probability with which the random variable attains those values.
We use the term “function” to describe this because, in abstraction, we can notate the p.m.f. as \[ p_X(k) := \mathbb{P}(X = k) \] where \(k\) can be any value in the state space of \(X\).
Worked-Out Example 1
Suppose we toss three fair coins independently, and let \(X\) denote the number of heads observed. Construct the p.m.f. (probability mass function) of \(X\).
By our work from above, the p.m.f. of \(X\) is given by \[\begin{array}{r|cccc} \boldsymbol{k} & 0 & 1 & 2 & 3 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & 1/8 & 3/8 & 3/8 & 1/8 \end{array}\]
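For those who like to check things computationally, here is a minimal Python sketch (the number of repetitions and variable names are just illustrative) that estimates this p.m.f. by simulating the experiment many times:

```python
import random

# Estimate the p.m.f. of X = number of heads in 3 tosses of a fair coin
# by repeating the experiment many times and tallying the results.
num_reps = 100_000
counts = [0, 0, 0, 0]   # counts[k] = number of runs in which we saw exactly k heads

for _ in range(num_reps):
    heads = sum(random.random() < 0.5 for _ in range(3))  # one run: toss 3 fair coins
    counts[heads] += 1

for k in range(4):
    print(f"P(X = {k}) is approximately {counts[k] / num_reps:.3f}")
# Theoretical values: 1/8 = 0.125, 3/8 = 0.375, 3/8 = 0.375, 1/8 = 0.125
```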
By the way, notice that the probabilities in the p.m.f. sum up to 1.
Properties of a PMF
For any random variable \(X\) with p.m.f. \(p_X\):
- each probability must be between \(0\) and \(1\): \(0 \leq p_X(k) \leq 1\) for every \(k \in S_X\);
- the probabilities must sum to \(1\): \(\sum_{k \in S_X} p_X(k) = 1\).
Worked-Out Example 2
A random variable \(X\) has the following p.m.f.: \[\begin{array}{r|cccc} \boldsymbol{k} & -1.4 & 0 & 3 & 4.15 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & 0.1 & 0.2 & \boldsymbol{a} & 0.6 \end{array}\] What must be the value of \(\boldsymbol{a}\)?
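Since the probabilities in a p.m.f. must sum to \(1\), we need \[ 0.1 + 0.2 + a + 0.6 = 1 \] so \(a = 1 - 0.9 = \boxed{0.1}\).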
Worked-Out Example 3
A random variable \(X\) has the following p.m.f.: \[\begin{array}{r|cccc} \boldsymbol{k} & -1.4 & 0 & 3 & 4.15 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & 0.1 & 0.2 & 0.1 & 0.6 \end{array}\] Compute both \(\mathbb{P}(X = 0)\) and \(\mathbb{P}(X \leq 0)\).
Reading directly off the p.m.f., we see that \(\mathbb{P}(X = 0) = 0.2\). For the second probability: there are only two values in \(S_X\) [by the way, as an aside, can anyone tell me what \(S_X\) is in this problem?] that are less than or equal to \(0\): \(0\) itself, and \(-1.4\).
Hence, saying that \(X\) was less than or equal to \(0\) is equivalent to saying \(X\) was either \(-1.4\) or \(0\).
Therefore, \[ \mathbb{P}(X \leq 0) = \mathbb{P}(X = -1.4) + \mathbb{P}(X = 0) = 0.1 + 0.2 = \boxed{0.3} \]
Computing \(\mathbb{P}(X \leq k)\)
To compute \(\mathbb{P}(X \leq k)\), we sum up the values \(\mathbb{P}(X = x)\) for all values of \(x\) in the state space that are less than or equal to \(k\).
Definition
The expected value (or just expectation) of a discrete random variable \(X\) is \[ \mathbb{E}[X] = \sum_{\text{all $k$}} k \cdot \mathbb{P}(X = k) \] where the sum ranges over all values of \(k\) in the state space.
In words: multiply each value in the state space by the corresponding probability, and then sum.
The expected value is a sort of ‘center’ of a random variable.
Worked-Out Example 4
A random variable \(X\) has the following p.m.f.: \[\begin{array}{r|cccc} \boldsymbol{k} & -1.4 & 0 & 3 & 4.15 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & 0.1 & 0.2 & 0.1 & 0.6 \end{array}\] Compute \(\mathbb{E}[X]\).
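Multiplying each value in the state space by its probability and summing: \[ \mathbb{E}[X] = (-1.4)(0.1) + (0)(0.2) + (3)(0.1) + (4.15)(0.6) = -0.14 + 0 + 0.3 + 2.49 = \boxed{2.65} \]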
Definition
The variance of a discrete random variable \(X\) is \[ \mathrm{Var}(X) = \sum_{\text{all $k$}} (k - \mathbb{E}[X])^2 \cdot \mathbb{P}(X = k) \] where the sum ranges over all values of \(k\) in the state space. The standard deviation is the square root of the variance: \[ \mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} \]
Second Formula for Variance
\[ \mathrm{Var}(X) = \left( \sum_{\text{all $k$}} k^2 \cdot \mathbb{P}(X = k) \right) - \left( \mathbb{E}[X] \right)^2 \]
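For those curious, here is a quick sketch of why the two formulas agree: expand the square in the first formula and use the facts that \(\sum_{\text{all } k} k \cdot \mathbb{P}(X = k) = \mathbb{E}[X]\) and \(\sum_{\text{all } k} \mathbb{P}(X = k) = 1\): \[ \sum_{\text{all } k} (k - \mathbb{E}[X])^2 \, \mathbb{P}(X = k) = \sum_{\text{all } k} k^2 \, \mathbb{P}(X = k) - 2 \, \mathbb{E}[X] \sum_{\text{all } k} k \, \mathbb{P}(X = k) + (\mathbb{E}[X])^2 \sum_{\text{all } k} \mathbb{P}(X = k) = \left( \sum_{\text{all } k} k^2 \, \mathbb{P}(X = k) \right) - \left( \mathbb{E}[X] \right)^2 \]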
Worked-Out Example 5
A random variable \(X\) has the following p.m.f.: \[\begin{array}{r|cccc} \boldsymbol{k} & -1.4 & 0 & 3 & 4.15 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & 0.1 & 0.2 & 0.1 & 0.6 \end{array}\] Compute \(\mathrm{Var}(X)\) and \(\mathrm{SD}(X)\).
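Using the second formula for variance, along with \(\mathbb{E}[X] = 2.65\) from the previous example: \[ \sum_{\text{all } k} k^2 \, \mathbb{P}(X = k) = (-1.4)^2(0.1) + (0)^2(0.2) + (3)^2(0.1) + (4.15)^2(0.6) = 0.196 + 0 + 0.9 + 10.3335 = 11.4295 \] so that \[ \mathrm{Var}(X) = 11.4295 - (2.65)^2 = 11.4295 - 7.0225 = \boxed{4.407}, \qquad \mathrm{SD}(X) = \sqrt{4.407} \approx 2.10 \]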
Exercise 1
Suppose \(X\) is a random variable with p.m.f. (probability mass function) given by \[\begin{array}{r|cccc} \boldsymbol{k} & -1 & 0 & 1 & 2 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & 0.3 & 0.2 & 0.1 & \boldsymbol{a} \end{array}\]
Exercise 2
Consider the following game: a fair six-sided die is rolled. If the number showing is 1 or 2, you win a dollar; if the number showing is 3, 4, or 5, you win 2 dollars; if the number showing is 6, you lose 1 dollar. Let \(W\) denote your net winnings after playing this game once.
Alright, let’s close out this lecture by returning to our coin tossing example.
As a reminder: if we let \(X\) denote the number of heads in 3 tosses of a \(p-\)coin (i.e. a coin that lands ‘heads’ with probability \(p\)), the p.m.f. of \(X\) is given by
\[\begin{array}{r|cccc} \boldsymbol{k} & 0 & 1 & 2 & 3 \\ \hline \boldsymbol{\mathbb{P}(X = k)} & (1 - p)^3 & 3 p (1 - p)^2 & 3 p^2 (1 - p) & p^3 \end{array}\]
What if instead of tossing 3 coins, we had tossed 4? Or 5? Or 10?
We could go through the same steps we did before, when deriving the p.m.f. for three tosses, but let’s be a little smarter about this; let’s answer the following more general question:
If \(X\) denotes the number of heads in \(n\) tosses of a \(p-\)coin, what is the p.m.f. of \(X\)?
It’s always a good idea to start with the state space. If we are tossing \(n\) coins, we cannot observe any more than \(n\) heads, nor can we observe any fewer than \(0\) heads. As such, \[ S_X = \{0, 1, 2, \cdots, n\} \]
Now, let’s consider an arbitrary \(k \in S_X\), and examine the event \(\{X = k \}\).
In words, \(\{X = k \}\) means “we observe exactly \(k\) heads”, which is equivalent to “we observe exactly \(k\) heads and \((n - k)\) tails.”
Suppose, for the moment, that these \(k\) heads occurred consecutively, and at the beginning of our tosses. I.e., suppose we have the outcome \[ (\underbrace{H, \ H, \ \cdots, \ H}_{\text{$k$ heads}}, \ \underbrace{T, \ T, \ \cdots, \ T}_{\text{$n - k$ tails}}) \]
By independence, the probability of this particular outcome is \(p^k \times (1 - p)^{n - k}\). Of course, the \(k\) heads need not have occurred at the beginning: in total, there are \(\binom{n}{k}\) outcomes in the event \(\{X = k\}\), corresponding to the \(\binom{n}{k}\) different ways to place the \(k\) heads among the \(n\) tosses, and each of these outcomes has the same probability \(p^k \times (1 - p)^{n - k}\).
So, what we have is \[ \mathbb{P}(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n - k} \]
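As a sanity check, note that these probabilities sum to \(1\), as they should; this follows from the Binomial Theorem: \[ \sum_{k = 0}^{n} \binom{n}{k} p^k (1 - p)^{n - k} = \left[ p + (1 - p) \right]^n = 1^n = 1 \]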
The Binomial Distribution
Suppose the probability of a single trial resulting in a ‘success’ is \(p\). Letting \(X\) denote the number of successes in \(n\) independent trials, then we say that \(X\) follows the Binomial Distribution with parameters \(n\) and \(p\). We use the notation \(X \sim \mathrm{Bin}(n, p)\) to denote this.
Facts about the Binomial Distribution
If \(X \sim \mathrm{Bin}(n, p)\), then:
- the state space of \(X\) is \(S_X = \{0, 1, 2, \cdots, n\}\);
- the p.m.f. of \(X\) is \(\mathbb{P}(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\) for \(k \in S_X\);
- \(\mathbb{E}[X] = np\);
- \(\mathrm{Var}(X) = np(1 - p)\).
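As a quick check of the expectation formula against our \(n = 3\) coin example: \[ \mathbb{E}[X] = 0 \cdot (1 - p)^3 + 1 \cdot 3p(1 - p)^2 + 2 \cdot 3p^2(1 - p) + 3 \cdot p^3 = 3p \left[ (1 - p)^2 + 2p(1 - p) + p^2 \right] = 3p \left[ (1 - p) + p \right]^2 = 3p \] which agrees with \(np = 3p\).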
Four Conditions to Check
If \(X\) counts the number of successes in \(n\) trials, there are four conditions that need to be satisfied in order for \(X\) to follow the Binomial Distribution:
- there is a fixed number of trials, \(n\);
- each trial results in one of only two outcomes (a ‘success’ or a ‘failure’);
- the probability of ‘success’, \(p\), is the same on every trial;
- the trials are independent of one another.
Worked-Out Example 6
(a) If we roll a fair \(6-\)sided die \(13\) times (assume rolls are independent of each other) and let \(X\) denote the number of times we observe an even number, is \(X\) binomially distributed?
(b) In a large population of \(100\) students, of which \(70\) own Android phones, we draw a random sample of 10 without replacement and let \(Y\) denote the number of students in this sample that have Android phones. Is \(Y\) binomially distributed?
(c) Consider the same setup as in part (b) above, except this time suppose students are selected with replacement. Is \(Y\) binomially distributed?
Exercise 3
Suppose Jana tosses \(65\) different \(12-\)sided dice, independently of each other; let \(Z\) denote the number of times a multiple of three results.