Ethan P. Marzban
2023-04-11
Definition
An experiment is any procedure that can, in principle, be repeated an infinite number of times, and each time the procedure is repeated there is a fixed set of outcomes that could occur.
For example, tossing a coin is an experiment: each time we toss the coin, we will observe either heads or tails.

Definition
The outcome space of an experiment is the set \(\Omega\) consisting of all outcomes of the experiment.
For instance, in the coin tossing example the outcome space is
\(\Omega = \{\texttt{heads}, \ \texttt{tails}\}\).
As an aside: some textbooks/professors refer to the outcome space as the sample space, and use the letter \(S\) to denote it.
Worked-Out Exercise 1
Consider the experiment of rolling two four-sided dice and recording the faces that appear. What is an appropriate outcome space for this experiment?
On each die roll, we will observe either a \(1\), \(2\), \(3\), or \(4\).
But, we cannot simply say that our outcome space is \(\{1, 2, 3, 4\}\) as this does not take into account the fact that we rolled two dice!
\[\begin{align*} \Omega = \{ & (1, 1), \ (1, 2), \ (1, 3), \ (1, 4), \\ & (2, 1), \ (2, 2), \ (2, 3), \ (2, 4), \\ & (3, 1), \ (3, 2), \ (3, 3), \ (3, 4), \\ & (4, 1), \ (4, 2), \ (4, 3), \ (4, 4) \} \end{align*}\]
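As an aside, we can sketch the construction of this outcome space in Python (the variable names here are our own, not part of the course material):

```python
from itertools import product

# Build the outcome space for rolling two four-sided dice:
# each outcome is an ordered pair (first roll, second roll).
faces = [1, 2, 3, 4]
omega = list(product(faces, repeat=2))

print(omega)       # all 16 ordered pairs, from (1, 1) to (4, 4)
print(len(omega))  # 16 outcomes in total
```

Note that `product` generates *ordered* pairs, which is exactly what we want: \((1, 2)\) and \((2, 1)\) are different outcomes.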
Exercise 1
Consider the experiment of tossing a coin, rolling a 4-sided die, and then tossing another coin. What is the outcome space of this experiment?
There are a few other ways to describe the outcome space of an experiment.
Let’s return to the example of rolling two four-sided dice from a few slides ago. Another way we could have kept track of the outcomes was by using a table, recording the outcome of the first die roll in the rows and the outcomes of the second in the columns:
|   | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | (1, 1) | (1, 2) | (1, 3) | (1, 4) |
| 2 | (2, 1) | (2, 2) | (2, 3) | (2, 4) |
| 3 | (3, 1) | (3, 2) | (3, 3) | (3, 4) |
| 4 | (4, 1) | (4, 2) | (4, 3) | (4, 4) |
So, tables are a good way of keeping track of outcomes.
But, they really only work when we have two of something (e.g. two dice, two coins, etc.). What happens if we, for example, toss three coins?
This is where tree diagrams can become useful.
For example, tracing along the top branch of the tree diagram corresponds to observing heads on all three tosses, which is the outcome \((H, H, H)\).
Sometimes, it will be useful to consider quantities that are a bit more complex than single outcomes.
For example, consider the experiment of rolling two 4-sided dice. I could ask myself: in how many outcomes does the second die roll result in a higher number than the first?
This leads us to the notion of an event.
Definition
An event is a subset of the outcome space. In other words, an event is just a set consisting of one or more outcomes.
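Returning to the question above, we can sketch the event "the second die roll is higher than the first" as a subset of the outcome space in Python (a sketch with our own variable names, not part of the course material):

```python
from itertools import product

# Outcome space for rolling two four-sided dice
omega = list(product([1, 2, 3, 4], repeat=2))

# The event "the second roll is higher than the first",
# represented as the subset of outcomes satisfying the condition
E = {(a, b) for (a, b) in omega if b > a}

print(sorted(E))  # {(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)}
print(len(E))     # 6 outcomes
```

This mirrors the mathematical definition directly: an event is nothing more than a collection of outcomes.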
Exercise 2
Consider the experiment of tossing three coins and recording the faces that appear. If \(F\) denotes the event “an even number of heads was observed”, what is the mathematical representation of \(F\) (i.e. as a collection of outcomes)?
Remember how we talked about the union of two sets last week?
Well, since events are just sets, we can talk about the union of two events.
In words, the union corresponds to an “or” statement.
For example, let \(E\) denote the event “it is raining” and \(F\) denote the event “the ground is wet”, then the event \(E \cup F\) would be the event “it is raining or the ground is wet”.
The intersection of two events (denoted with the \(\cap\) symbol) corresponds to an “and” statement.
The complement of an event \(E\), denoted \(E^{\complement}\), represents the event “not \(E\)”.
Recall that \(E \cap F\) denotes the event “both \(E\) and \(F\) occurred.”
Also recall that \(A^{\complement}\) denotes “not \(A\)”; i.e. “\(A\) did not occur”
As such, \((E \cap F)^{\complement}\) denotes the event “it is not the case that both \(E\) and \(F\) occurred.”
This means that either \(E\) did not occur, or \(F\) did not occur (or both).
Mathematically, this is equivalent to \(E^\complement \cup F^\complement\).
As such, it seems we have arrived at the following equality: \[ (E \cap F)^{\complement} = E^\complement \cup F^\complement \]
DeMorgan’s Laws
Given two events \(E\) and \(F\), we have the following:
\[ (E \cap F)^{\complement} = E^{\complement} \cup F^{\complement} \]
\[ (E \cup F)^{\complement} = E^{\complement} \cap F^{\complement} \]
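We can sketch a quick check of De Morgan's laws on a small concrete outcome space in Python (the particular events chosen here are our own illustration):

```python
# Check De Morgan's laws on the outcome space for tossing two coins.
omega = {('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')}

E = {o for o in omega if o[0] == 'H'}  # "the first coin lands heads"
F = {o for o in omega if o[1] == 'H'}  # "the second coin lands heads"

def comp(A):
    """Complement of A relative to the outcome space."""
    return omega - A

# (E ∩ F)^c == E^c ∪ F^c
assert comp(E & F) == comp(E) | comp(F)
# (E ∪ F)^c == E^c ∩ F^c
assert comp(E | F) == comp(E) & comp(F)
print("Both of De Morgan's laws hold on this outcome space.")
```

Of course, a check on one example is not a proof, but it is a useful way to build intuition for why the laws are true.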
Exercise 3
Consider the experiment of tossing two coins at the same time. Define the following events:
Express the following events in words:
Now, there is something interesting about the events defined in the previous example.
The outcome space of the underlying experiment is \[ \Omega = \{ (H, H), \ (H, T), \ (T, H), \ (T, T)\} \] and every outcome appearing in the events above is also an element of \(\Omega\).
This is an example of a subset: we say that a set \(A\) is a subset of another set \(B\) if all elements of \(A\) are also elements of \(B\). We use the notation \(A \subseteq B\).
Another thing to note is that \(A \cap B\) has no elements: there is nothing common to both \(A\) and \(B\)!
The set with no elements is called the empty set, and is denoted \(\varnothing\).
Two events whose intersection is the empty set are said to be disjoint.
Now, you may note that we have yet to mention the term “probability.”
To get a better sense of “probability”, let’s examine how we use the word in everyday speech:

- “There is a high probability of rain today.”
- “The probability of winning big at a casino is low.”
- “What is the probability of scoring 100% on the PSTAT 5A Midterm 1?”

Notice that “rain”, “winning big at a casino”, and “scoring 100% on the PSTAT 5A Midterm 1” are all events.
As such, “probability” seems to take in an event and spit out a number.
The symbol we use for a probability measure is \(\mathbb{P}\); i.e. we write \(\mathbb{P}(E)\) to denote “the probability of event \(E\)”.
Now, this doesn’t really tell us how to define \(\mathbb{P}(E)\) for an arbitrary event \(E\).
There are (roughly) two schools of thought when it comes to defining the probability of an event: the long-run frequency approach, and the classical approach.
The long-run frequency approach defines the probability of an event \(E\) to be the proportion of times \(E\) occurs, if the underlying experiment were to be repeated a large number of times.
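We can sketch what "a large number of repetitions" looks like with a quick simulation in Python (a sketch under the assumption of a fair coin; the seed is our own choice, for reproducibility):

```python
import random

random.seed(5)  # fix the seed so the simulation is reproducible

# Simulate tossing a fair coin many times and compute the proportion of heads.
# Under the long-run frequency interpretation, this proportion should settle
# near 1/2 as the number of tosses grows.
n_tosses = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_tosses))
print(heads / n_tosses)  # close to 0.5
```

Try increasing `n_tosses`: the proportion of heads drifts ever closer to \(1/2\), which is exactly the long-run frequency definition in action.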
To help us understand the notion of long-run frequencies, let’s go through an example together. Suppose we toss a coin and record whether the outcome lands heads or tails, and further suppose we observe the following tosses: H, T, T, H, T, H, H, H, T, T.
Suppose we are interested in the probability of heads: after each toss, we count the number of times we have observed heads so far and divide by the total number of tosses observed.

| Toss | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Outcome | H | T | T | H | T | H | H | H | T | T |
| Raw freq. of H | 1 | 1 | 1 | 2 | 2 | 3 | 4 | 5 | 5 | 5 |
| Rel. freq. of H | 1/1 | 1/2 | 1/3 | 2/4 | 2/5 | 3/6 | 4/7 | 5/8 | 5/9 | 5/10 |
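The running relative frequencies above can be recomputed in a few lines of Python (a sketch; the variable names are our own):

```python
# Recompute the running relative frequency of heads for the ten observed tosses.
tosses = ['H', 'T', 'T', 'H', 'T', 'H', 'H', 'H', 'T', 'T']

rel_freqs = []
heads_so_far = 0
for n, toss in enumerate(tosses, start=1):
    heads_so_far += (toss == 'H')       # raw frequency of H after n tosses
    rel_freqs.append(heads_so_far / n)  # relative frequency of H after n tosses

print(rel_freqs)  # starts at 1/1 = 1.0 and ends at 5/10 = 0.5
```

After ten tosses the relative frequency is \(5/10 = 0.5\), matching the final column of the table.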
The second way to define probabilities is what is known as the classical approach.
As an important note: we can only apply the classical approach if we believe all outcomes in our experiment to be equally likely.
If we make the equally likely outcomes assumption, then the classical approach to probability tells us to define \(\mathbb{P}(E)\) as \[ \mathbb{P}(E) = \frac{\text{number of ways $E$ can occur}}{\text{total number of outcomes}} \]
So, for example, if we toss a fair coin once, then the classical approach to probability (which can be used since the coin is fair) states that \[ \mathbb{P}(\texttt{heads}) = \frac{1}{2} \]
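The classical formula translates directly into code: count the outcomes in the event, and divide by the size of the outcome space. Here is a sketch in Python (the helper function and the sum-to-5 event are our own illustration, valid only under the equally-likely-outcomes assumption):

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, omega):
    """Classical probability: (# outcomes in the event) / (total # outcomes).
    Only valid when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))

# Two fair four-sided dice; the event "the two rolls sum to 5"
omega = set(product([1, 2, 3, 4], repeat=2))
E = {o for o in omega if sum(o) == 5}

print(classical_probability(E, omega))  # 4/16 reduces to 1/4
```

Using `Fraction` keeps the answer exact, which matches how we typically report classical probabilities.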
Let’s quickly compare these two approaches to defining the probability of an event.
The long-run frequency definition has the benefit of not requiring the assumption of equally likely outcomes. However, it defines a probability only by appealing to a hypothetical large number of repetitions of the experiment. The classical approach does not rely on such repetitions, making the definitions it produces perhaps a bit more easily interpretable.
In situations where we do not provide a probability a priori, there will likely be some key word or phrase that lets you know we are looking for the classical definition.
Exercise 4
Part (a) Consider the experiment of rolling a fair six-sided die once and recording the number on the face that is showing. What is the probability that the die lands on the number 1?
Part(b) A coin is tossed repeatedly, and the following relative frequency diagram is constructed to track the relative frequency of heads
. What is the probability that the coin will land heads? Is the coin fair?
It turns out that there are three axioms that a probability measure must satisfy, collectively called the axioms of probability:

1. For any event \(E\), we have \(\mathbb{P}(E) \geq 0\).
2. \(\mathbb{P}(\Omega) = 1\).
3. If \(E\) and \(F\) are disjoint events, then \(\mathbb{P}(E \cup F) = \mathbb{P}(E) + \mathbb{P}(F)\).
If you are not familiar with the notion of axioms: an axiom is a fundamental “truth” of math, that does not need to be proven.
Let’s quickly summarize the concepts/terms we’ve covered:
These are the basic building blocks of probability.
We will now combine them!
Important
In this class, using proper notation is very important.
The Complement Rule
Given an event \(E\), we have \(\mathbb{P}(E^\complement) = 1 - \mathbb{P}(E)\)
The Probability of the Empty Set
\(\mathbb{P}(\varnothing) = 0\).
The Addition Rule
Given events \(E\) and \(F\), we have \(\mathbb{P}(E \cup F) = \mathbb{P}(E) + \mathbb{P}(F) - \mathbb{P}(E \cap F)\)
Worked-Out Example 2
A recent survey at the Isla Vista Co-Op revealed that 50% of shoppers buy bread, 30% buy jam, and 20% buy both bread and jam. What is the probability that a randomly selected shopper will (a) not purchase jam, and (b) purchase either bread or jam?
Let \(J\) denote the event “a randomly selected shopper will purchase jam”.
The event “a randomly selected shopper will not purchase jam” is given by \(J^\complement\), meaning the quantity we seek is \(\mathbb{P}(J^\complement)\).
By the Complement Rule, we have \[ \mathbb{P}(J^\complement) = 1 - \mathbb{P}(J) = 1 - 0.3 = \boxed{0.7 = 70\%} \]
Let \(J\) be defined as before, and let \(B\) denote the event “a randomly selected shopper will purchase bread”.
The first quantity provided in the problem statement tells us that \(\mathbb{P}(B) = 0.5\)
The final quantity provided in the problem statement tells us that \(\mathbb{P}(B \cap J) = 0.2\)
The event “a randomly selected shopper will purchase either bread or jam” is given by \(B \cup J\), meaning we seek \(\mathbb{P}(B \cup J)\).
By the Addition Rule,
\[\begin{align*} \mathbb{P}(B \cup J) & = \mathbb{P}(B) + \mathbb{P}(J) - \mathbb{P}(B \cap J) \\ & = 0.5 + 0.3 - 0.2 = \boxed{0.6 = 60\%} \end{align*}\]
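As a quick sanity check, we can verify both computations from this worked example in Python using exact fractions (a sketch; the variable names are our own):

```python
from fractions import Fraction

# Survey figures, written as exact fractions:
# P(B) = 0.5 (bread), P(J) = 0.3 (jam), P(B and J) = 0.2 (both)
P_B = Fraction(1, 2)
P_J = Fraction(3, 10)
P_B_and_J = Fraction(1, 5)

# Addition Rule: P(B or J) = P(B) + P(J) - P(B and J)
P_B_or_J = P_B + P_J - P_B_and_J
print(P_B_or_J)  # 3/5, i.e. 0.6 = 60%

# Complement Rule: P(not J) = 1 - P(J)
print(1 - P_J)   # 7/10, i.e. 0.7 = 70%
```

Subtracting \(\mathbb{P}(B \cap J)\) in the Addition Rule prevents the 20% of shoppers who buy both items from being counted twice.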
General Strategy for Probability Word Problems
Exercise 5
Two fair six-sided dice are rolled.
Today, we began our introduction to the field of probability.
We discussed the notions of experiments, outcomes, outcome spaces, events, and probabilities.
We then discussed three probability rules: the complement rule, the probability of the empty set, and the addition rule.
Next time, we will start talking about ways to compute the probability of more complex events under the assumption of equally likely outcomes.