Ethan P. Marzban
2023-05-11
Over the course of the past few lectures, we’ve been dealing primarily with population proportions.
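A natural point estimate of a population proportion $p$ is the sample proportion $\hat{p}$.
We then used the sampling distribution of $\hat{p}$ to construct confidence intervals for $p$.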
Now we will turn our attention to a different population parameter.
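Again, it will be useful to establish some notation: $\mu$ and $\sigma$ denote the population mean and standard deviation, $\bar{x}$ and $s$ denote the sample mean and standard deviation, and $n$ denotes the sample size.
Just as $\hat{p}$ is a natural point estimate of $p$, the sample mean $\bar{x}$ is a natural point estimate of the population mean $\mu$.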
The sampling distribution of
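We will follow the general idea we used before of constructing confidence intervals as (point estimate) ± (margin of error).
In this case, we use the sample mean $\bar{x}$ as our point estimate.
It turns out that, assuming the population mean is $\mu$ and the population standard deviation is $\sigma$, the sample mean $\bar{x}$ has mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, where $n$ is the sample size.
Therefore, our confidence intervals will take the form $\bar{x} \pm c \cdot \sigma / \sqrt{n}$, where the constant $c$ depends on the confidence level and on the sampling distribution of $\bar{x}$.
Let’s work on finding the sampling distribution of $\bar{x}$.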
It turns out that the first thing we need to ask is whether the underlying population is normally distributed or not.
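If the underlying population is normally distributed [again with population mean $\mu$ and population standard deviation $\sigma$], then the sample mean $\bar{x}$ is also normally distributed, with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.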
Worked-Out Example 1
The heights of adult males are assumed to follow a normal distribution with mean 70 in and standard deviation 15 in. A representative sample of 120 adult males is taken, and the average height of males in this sample is recorded.
The value of 70 in is a population parameter, as it is the true average height of all adult males.
The quantity we seek is
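As a minimal sketch of the normal-population case, the following uses the numbers from Worked-Out Example 1 (mean 70 in, standard deviation 15 in, sample size 120) to build the sampling distribution of the sample mean. The 2-inch margin is a hypothetical question chosen only for illustration; it is not taken from the example.

```python
from math import sqrt
from statistics import NormalDist

mu = 70.0      # population mean height (in), from the example
sigma = 15.0   # population standard deviation (in), from the example
n = 120        # sample size, from the example

# For a normal population, the sample mean is normal with
# mean mu and standard deviation sigma / sqrt(n).
se = sigma / sqrt(n)
xbar_dist = NormalDist(mu, se)

# Hypothetical question (an assumption, not part of the example):
# how likely is the sample average to land within 2 in of mu?
margin = 2.0
prob_within = xbar_dist.cdf(mu + margin) - xbar_dist.cdf(mu - margin)

print(f"standard deviation of the sample mean: {se:.3f} in")
print(f"P(|xbar - mu| <= {margin} in) = {prob_within:.3f}")
```

Any other probability about the sample mean can be read off `xbar_dist` in the same way.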
Alright, so that explains what to do if the population values follow a normal distribution.
But what if they don’t? In real-world settings, we don’t typically get to know exactly what the population distribution is.
If our population is not normally distributed, we need to ask ourselves whether we have a “large enough sample”.
Admittedly, there isn’t a single agreed-upon cutoff for “large enough”; for the purposes of this class, we will use
If the population is non-normal, and the sample size is not large enough…
… we can’t do anything.
More specifically, there aren’t any results we can use to confidently make inferences about the population mean; between the unknown shape of the population’s distribution and the small sample size, there is just too much uncertainty.
If the population is non-normal, and the sample size is large enough…
… we’re still (perhaps surprisingly) in business!
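It turns out that if the sample size $n$ is large enough, the sampling distribution of the sample mean $\bar{x}$ is approximately normal, even when the population itself is not.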
In fact, this is such an important result, we give it a name:
Central Limit Theorem for the Sample Mean
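If we have reasonably representative samples of large enough size $n$, taken from a population with mean $\mu$ and standard deviation $\sigma$, then the sample mean $\bar{x}$ is approximately normally distributed with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$.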
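One way to see the Central Limit Theorem in action is a small simulation. The sketch below assumes a hypothetical, clearly non-normal population (exponential with mean 1 and standard deviation 1) and a sample size of 50; both choices are illustrative assumptions, not values from the lecture.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)   # fixed seed so the simulation is reproducible

n = 50           # sample size (assumed "large enough" for illustration)
reps = 5000      # number of simulated samples
rate = 1.0       # hypothetical exponential population: mean 1, sd 1
pop_mean = 1 / rate
pop_sd = 1 / rate

# Draw many samples and record the mean of each one.
sample_means = [
    sum(random.expovariate(rate) for _ in range(n)) / n
    for _ in range(reps)
]

center = mean(sample_means)       # should be close to pop_mean
spread = stdev(sample_means)      # should be close to pop_sd / sqrt(n)
theory_sd = pop_sd / sqrt(n)

# Share of sample means within 1.96 theoretical sds of the population
# mean; roughly 95% if the normal approximation is reasonable.
share = sum(abs(m - pop_mean) <= 1.96 * theory_sd for m in sample_means) / reps

print(f"mean of sample means: {center:.4f} (population mean {pop_mean})")
print(f"sd of sample means:   {spread:.4f} (sigma/sqrt(n) = {theory_sd:.4f})")
print(f"share within 1.96 sds: {share:.3f}")
```

Rerunning with a smaller `n` shows the approximation getting worse, which is why the sample-size check matters.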
Worked-Out Example 2
The temperatures collected at all weather stations in Antarctica follow some unknown distribution with unknown mean and known standard deviation 8°F. A researcher records the temperature measurements from a representative sample of 81 different weather stations, and finds the average temperature to be 26°F.
The population is the set of all weather stations in Antarctica.
The sample is the 81 weather stations selected by the researcher.
The random variable of interest is $\bar{x}$, the average temperature across a representative sample of 81 weather stations.
Part (d): This is where things get interesting!
Again, what we have found is
We seek
Computing the necessary
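The setup of Worked-Out Example 2 (known population standard deviation 8°F, sample size 81, sample mean 26°F) can be turned into a confidence interval along the following lines. The 95% confidence level here is an assumption made for illustration, so treat the resulting endpoints as a sketch rather than the example's official answer.

```python
from math import sqrt
from statistics import NormalDist

xbar = 26.0        # sample mean (°F), from the example
sigma = 8.0        # known population standard deviation (°F), from the example
n = 81             # sample size, from the example
conf_level = 0.95  # ASSUMED confidence level, chosen for illustration

# The sample size is large, so by the CLT the sample mean is approximately
# normal and we may use a percentile of the standard normal distribution.
z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
se = sigma / sqrt(n)

lower = xbar - z * se
upper = xbar + z * se

print(f"critical value: {z:.3f}, standard error: {se:.3f}")
print(f"{conf_level:.0%} confidence interval: [{lower:.3f}, {upper:.3f}]")
```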
Notice that in the previous worked-out example (and, indeed, in the CLT for sample means), we need information on the true population standard deviation $\sigma$.
What happens if we don’t have access to $\sigma$?
Well, we encountered a somewhat similar situation in our discussion on proportions; the standard error of $\hat{p}$, namely $\sqrt{p(1 - p)/n}$, depends on the unknown parameter $p$.
Does anyone remember how we solved this issue in the context of population proportions?
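Can anyone propose a point estimator for $\sigma$?
That’s right; the sample standard deviation $s$.
In other words, our proposition is to use confidence intervals of the form $\bar{x} \pm c \cdot s / \sqrt{n}$.
Notice, however, that this introduces additional uncertainty into the problem as $s$, unlike $\sigma$, varies from sample to sample.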
It turns out that the additional uncertainty introduced is so large that we become no longer able to use the normal distribution.
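Firstly, recall that we used percentiles of the standard normal distribution because the standardized quantity $(\bar{x} - \mu) / (\sigma / \sqrt{n})$ follows (at least approximately) a standard normal distribution.
Mathematically, what the above discussion is saying is that the distribution of $(\bar{x} - \mu) / (s / \sqrt{n})$ is no longer standard normal.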
It turns out that, still assuming a large enough sample size, the quantity above follows what is known as a t-distribution.
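The t-distribution resembles the standard normal distribution: its density curve is bell-shaped and symmetric about 0.
However, one key difference is that the t-distribution has a parameter, called the degrees of freedom, that determines its exact shape.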
Another key property is that, for all finite degrees of freedom, the tails of the t-distribution density curve are “wider” (i.e. higher) than the tails of the standard normal density curve.
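To make the comparison of tails concrete, here is a small sketch that evaluates the t-distribution density directly from its standard formula and compares it with the standard normal density. The evaluation point (3) and the degrees of freedom (5 and 100000) are arbitrary choices for illustration.

```python
from math import exp, lgamma, log, log1p, pi
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom at x."""
    # Work on the log scale so that large df does not overflow the gamma function.
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log1p(x * x / df))

z_pdf = NormalDist().pdf

x = 3.0                            # a point far out in the right tail
tail_t5 = t_pdf(x, 5)              # few degrees of freedom
tail_t_large = t_pdf(x, 100_000)   # very many degrees of freedom
tail_z = z_pdf(x)

print(f"t (df=5) density at {x}:      {tail_t5:.5f}")
print(f"t (df=100000) density at {x}: {tail_t_large:.5f}")
print(f"standard normal density at {x}: {tail_z:.5f}")
```

With few degrees of freedom the t density sits well above the normal density in the tail, and with very many degrees of freedom the two are nearly indistinguishable.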
An interesting fact is that the t-distribution with infinite degrees of freedom coincides with the standard normal distribution.
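Here is the result we’ve been working toward: with samples of reasonably large size $n$, the quantity $(\bar{x} - \mu) / (s / \sqrt{n})$ approximately follows a t-distribution with $n - 1$ degrees of freedom.
As such, our confidence intervals become $\bar{x} \pm c \cdot s / \sqrt{n}$, where $c$ is now the appropriate percentile of the t-distribution with $n - 1$ degrees of freedom.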
Worked-Out Example 3
A sociologist is interested in performing inference on the true average monthly income (in thousands of dollars) of all citizens of the nation of Gauchonia. As such, she takes a representative sample of 49 people, and finds that these 49 people have an average monthly income of 2.25 and a standard deviation of 1.66.
The population is the set of all Gauchonian residents.
The sample is the set of 49 Gauchonian residents included in the sociologist’s sample.
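The random variable of interest is $\bar{x}$, the average monthly income (in thousands of dollars) of a representative sample of 49 Gauchonian residents.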
Part (d)
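Is the population normally distributed? We are not told that it is.
Is the sample size large enough? Yes; the sample contains 49 people.
Do we know the population standard deviation? No; we only have the sample standard deviation, 1.66.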
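Therefore, we need to use the t-distribution with $n - 1 = 48$ degrees of freedom.
Specifically, we need to find the 2.5th percentile of the $t_{48}$ distribution, which is approximately $-2.01$.
On Monday, during Discussion Section, you will talk about how to read a t-table.
Therefore, our 95% confidence interval takes the form $2.25 \pm 2.01 \cdot 1.66 / \sqrt{49} \approx [1.773, \ 2.727]$.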
The interpretation of this interval is much the same as our intervals for proportions:
We are 95% confident that the true average monthly income (in thousands of dollars) of Gauchonian residents is between 1.773 and 2.727.
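The interval in Worked-Out Example 3 can be checked numerically. The sketch below uses only the Python standard library, so instead of a t-table it finds the percentile by numerically integrating the t density and bisecting; the helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical and written just for this illustration.

```python
from math import exp, lgamma, log, log1p, pi, sqrt

def t_pdf(x, df):
    """Density of the t-distribution with `df` degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Percentile of the t-distribution, found by bisection on the CDF."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xbar, s, n = 2.25, 1.66, 49   # sample summaries from the example
df = n - 1                    # degrees of freedom

t_low = t_quantile(0.025, df) # 2.5th percentile (negative)
c = abs(t_low)                # by symmetry, also the 97.5th percentile
se = s / sqrt(n)

lower = xbar - c * se
upper = xbar + c * se

print(f"2.5th percentile of t with {df} df: {t_low:.4f}")
print(f"95% confidence interval: [{lower:.3f}, {upper:.3f}]")
```

In practice a statistics library or a t-table supplies the percentile directly; the numerical integration here only keeps the example self-contained.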