Sampling Distributions: Engineering Data Analysis
Chapter 6
Sampling Distributions
Example 6.1.1
A rowing team consists of four rowers who weigh 152, 156, 160, and 164 pounds.
Find all possible random samples with replacement of size two and compute the
sample mean for each one. Use them to find the probability distribution, the
mean, and the standard deviation of the sample mean 𝑋̅.
Solution
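The following table shows all 16 possible samples with replacement of size two, along
with the mean of each:

Sample    Mean   Sample    Mean   Sample    Mean   Sample    Mean
152, 152  152    156, 152  154    160, 152  156    164, 152  158
152, 156  154    156, 156  156    160, 156  158    164, 156  160
152, 160  156    156, 160  158    160, 160  160    164, 160  162
152, 164  158    156, 164  160    160, 164  162    164, 164  164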
The table shows that there are seven possible values of the sample mean 𝑋̅. The
value 𝑥̅ = 152 happens only one way (the rower weighing 152 pounds must be
selected both times), as does the value 𝑥̅ = 164, but the other values happen
more than one way, hence are more likely to be observed than 152 and 164 are.
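Since the 16 samples are equally likely, we obtain the probability distribution of
the sample mean just by counting:

x̄       152    154    156    158    160    162    164
P(x̄)    1/16   2/16   3/16   4/16   3/16   2/16   1/16

Applying the formulas for the mean and standard deviation of a discrete random
variable to this distribution gives

$$\mu_{\bar{X}} = \sum \bar{x}\,P(\bar{x}) = 152\left(\tfrac{1}{16}\right) + 154\left(\tfrac{2}{16}\right) + \cdots + 164\left(\tfrac{1}{16}\right) = 158$$

$$\sigma_{\bar{X}} = \sqrt{\sum \bar{x}^2\,P(\bar{x}) - \mu_{\bar{X}}^2} = \sqrt{24{,}974 - 158^2} = \sqrt{10} \approx 3.16$$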
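The counting argument can be checked by brute force. The following is a minimal sketch, not part of the original solution, that enumerates every ordered sample of size two and tallies the sample means; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

weights = [152, 156, 160, 164]  # rower weights from Example 6.1.1

# All ordered samples of size two drawn with replacement (4 * 4 = 16 samples).
samples = list(product(weights, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Every sample is equally likely, so P(x_bar) = (number of samples with that mean) / 16.
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean and standard deviation of the sample mean, computed from its distribution.
mean_of_xbar = sum(m * p for m, p in distribution.items())
var_of_xbar = sum((m - mean_of_xbar) ** 2 * p for m, p in distribution.items())
sd_of_xbar = sqrt(var_of_xbar)

for m, p in distribution.items():
    print(f"x_bar = {m}: P = {p}")
print("mean:", mean_of_xbar, "sd:", round(sd_of_xbar, 3))
```

Exact fractions are used so that the probabilities are not blurred by floating-point rounding; note that `Fraction` prints probabilities in lowest terms.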
Suppose random samples of size n are drawn from a population with mean μ and
standard deviation σ. The mean 𝜇𝑋̅ and standard deviation 𝜎𝑋̅ of the sample
mean 𝑋̅ satisfy

$$\mu_{\bar{X}} = \mu \qquad \text{and} \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
The first equation says that if we could take every possible sample from the
population and compute the corresponding sample mean, then those numbers
would center at the number we wish to estimate, the population mean μ. The
second equation says that averages computed from samples vary less than
individual measurements on the population do, and quantifies the relationship.
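As a quick sanity check, the two relationships can be evaluated for the rowing team of Example 6.1.1 (population 152, 156, 160, 164 and samples of size two drawn with replacement). This is an illustrative sketch using Python's standard `statistics` module; the names `predicted_mean` and `predicted_sd` are our own.

```python
from math import sqrt
from statistics import mean, pstdev

population = [152, 156, 160, 164]  # rower weights from Example 6.1.1
n = 2                              # sample size used in that example

mu = mean(population)       # population mean
sigma = pstdev(population)  # population standard deviation (divides by N, not N - 1)

# Values the two equations predict for the sample mean.
predicted_mean = mu
predicted_sd = sigma / sqrt(n)

print(predicted_mean, round(predicted_sd, 3))
```

The predicted standard deviation is smaller than the population standard deviation by the factor √n, which is the sense in which averages vary less than individual measurements.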
Example 6.1.2
The mean and standard deviation of the tax value of all vehicles registered in a
certain state are μ = $13,525 and σ = $4,180. Suppose random samples of size 100
are drawn from the population of vehicles. What are the mean 𝜇𝑋̅ and standard
deviation 𝜎𝑋̅ of the sample mean 𝑋̅?
Solution
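Since n = 100, the formulas yield

$$\mu_{\bar{X}} = \mu = \$13{,}525 \qquad \text{and} \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\$4{,}180}{\sqrt{100}} = \$418$$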
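The same arithmetic can be wrapped in a small helper. The function name `sample_mean_parameters` is hypothetical and introduced only for illustration.

```python
from math import sqrt

def sample_mean_parameters(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return mu, sigma / sqrt(n)

# Example 6.1.2: tax values with mu = 13,525 and sigma = 4,180, samples of size 100.
mu_xbar, sigma_xbar = sample_mean_parameters(mu=13_525, sigma=4_180, n=100)
print(mu_xbar, sigma_xbar)
```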
What we are seeing in these examples does not depend on the particular
population distributions involved. In general, one may start with any distribution
and the sampling distribution of the sample mean will increasingly resemble the
bell-shaped normal curve as the sample size increases. This is the content of the
Central Limit Theorem.
The dashed vertical lines in the figures locate the population mean. Regardless of
the distribution of the population, as the sample size is increased the shape of the
sampling distribution of the sample mean becomes increasingly bell-shaped,
centered on the population mean. Typically, by the time the sample size is 30 the
distribution of the sample mean is practically the same as a normal distribution.
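The effect can be seen in a small simulation. The sketch below is an illustration under our own assumptions, not taken from the text: the population is exponential with rate 1 (mean 1, standard deviation 1, strongly right-skewed), and skewness is used as a rough measure of how far the distribution of the sample mean is from a symmetric bell shape.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def skewness(values):
    """Population skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from an exponential(1) population; return their means."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for n in (1, 5, 30):
    means = simulate_sample_means(n, trials=10_000, rng=rng)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    print(f"n={n:2d}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}  "
          f"(sigma/sqrt(n)={1 / sqrt(n):.3f})  skewness={results[n][2]:.3f}")
```

As n grows, the simulated means stay centered near the population mean, their spread tracks σ/√n, and the skewness shrinks toward zero.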
The importance of the Central Limit Theorem is that it allows us to make probability
statements about the sample mean, specifically in relation to its value in
comparison to the population mean, as we will see in the examples. But to use
the result properly we must first realize that there are two separate random
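variables (and therefore two probability distributions) at play:
1. X, the measurement of a single element selected at random from the population.
Its distribution is the distribution of the population, with mean μ and standard
deviation σ.
2. 𝑋̅, the mean of the measurements in a random sample of size n. Its distribution
is the sampling distribution of the sample mean, with mean 𝜇𝑋̅ = μ and standard
deviation 𝜎𝑋̅ = σ/√n.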
Example 6.2.1
Let 𝑋̅ be the mean of a random sample of size 50 drawn from a population with
mean 112 and standard deviation 40.
Solution:
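1. By the formulas in the previous section,

$$\mu_{\bar{X}} = \mu = 112 \qquad \text{and} \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{50}} \approx 5.657$$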
2. Since the sample size is at least 30, the Central Limit Theorem applies: 𝑋̅ is
approximately normally distributed. We compute probabilities using the normal
distribution in the usual way, just being careful to use 𝜎𝑋̅ and not σ when we
standardize:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
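3. Similarly,

$$P(\bar{X} > 113) = P\left(Z > \frac{113 - 112}{40/\sqrt{50}}\right) = P(Z > 0.18) = 1 - 0.5714 = 0.4286$$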
Note that if in the above example we had been asked to compute the probability
that the value of a single randomly selected element of the population exceeds
113, that is, to compute the number P(X > 113), we would not have been able to
do so, since we do not know the distribution of X, but only that its mean is 112 and
its standard deviation is 40. By contrast we could compute P(𝑋̅ > 113) even without
complete knowledge of the distribution of X because the Central Limit Theorem
guarantees that 𝑋̅ is approximately normal.
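A minimal sketch of that computation, using `statistics.NormalDist` from the Python standard library in place of a normal table. The key point is that the standard deviation passed to the normal model is σ/√n, not σ.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 40, 50  # population mean, population sd, sample size (Example 6.2.1)

# Approximate sampling distribution of the sample mean, by the Central Limit Theorem.
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(X_bar > 113) is the upper-tail area beyond 113.
p_greater = 1 - xbar_dist.cdf(113)
print(round(p_greater, 4))
```

A table lookup with z rounded to two decimal places gives a value that differs slightly in the third decimal place.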
For samples of any size drawn from a normally distributed population, the sample
mean is normally distributed, with mean 𝜇𝑋̅ = 𝜇 and standard deviation 𝜎𝑋̅ = σ/√n,
where n is the sample size.
Example 6.2.2
An automobile battery manufacturer claims that its midgrade battery has a mean
life of 50 months with a standard deviation of 6 months. Suppose the distribution
of battery lives of this particular brand is approximately normal.
1. On the assumption that the manufacturer’s claims are true, find the probability
that a randomly selected battery of this type will last less than 48 months.
2. On the same assumption, find the probability that the mean of a random
sample of 36 such batteries will be less than 48 months.
Solution:
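1. Since the population is known to have a normal distribution,

$$P(X < 48) = P\left(Z < \frac{48 - 50}{6}\right) = P(Z < -0.33) = 0.3707$$

2. The sample mean has mean 𝜇𝑋̅ = 𝜇 = 50 and standard deviation 𝜎𝑋̅ = σ/√n = 6/√36 = 1.
Thus

$$P(\bar{X} < 48) = P\left(Z < \frac{48 - 50}{1}\right) = P(Z < -2) = 0.0228$$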
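The contrast between the two parts of Example 6.2.2 can be reproduced numerically. This is an illustrative sketch using `statistics.NormalDist`; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50, 6  # claimed mean life and standard deviation, in months
n = 36             # sample size for part 2

single_battery = NormalDist(mu, sigma)         # distribution of X
sample_mean = NormalDist(mu, sigma / sqrt(n))  # distribution of X_bar (sd = 6/6 = 1)

p_single = single_battery.cdf(48)  # part 1: P(X < 48)
p_mean = sample_mean.cdf(48)       # part 2: P(X_bar < 48)

print(round(p_single, 4), round(p_mean, 4))
```

A single battery falling short of 48 months is unremarkable, while a mean of 36 batteries doing so would be rare if the manufacturer's claims were true. Exact computation of part 1 uses z = −1/3, so it differs slightly in the third decimal place from a table lookup with z rounded to two decimals.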
Suppose random samples of size n are drawn from a population in which the
proportion with a characteristic of interest is p. The mean 𝜇𝑃̂ and standard
deviation 𝜎𝑃̂ of the sample proportion 𝑃̂ satisfy

$$\mu_{\hat{P}} = p \qquad \text{and} \qquad \sigma_{\hat{P}} = \sqrt{\frac{pq}{n}}$$

where q = 1 − p.
The Central Limit Theorem has an analogue for the sample proportion 𝑃̂. To
see how, imagine that every element of the population that has the characteristic
of interest is labeled with a 1, and that every element that does not is labeled with
a 0. This gives a numerical population consisting entirely of zeros and ones. Clearly
the proportion of the population with the special characteristic is the proportion
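of the numerical population that are ones; in symbols,

$$p = \frac{\text{number of 1s}}{N}$$

where N denotes the size of the population. But of course, the sum of all the zeros
and ones is simply the number of ones, so the mean μ of the numerical population is

$$\mu = \frac{\sum x}{N} = \frac{\text{number of 1s}}{N}$$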
Thus, the population proportion p is the same as the mean μ of the corresponding
population of zeros and ones. In the same way the sample proportion 𝑝̂ is the
same as the sample mean 𝑥̅ . Thus, the Central Limit Theorem applies to 𝑝̂ .
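The zeros-and-ones argument is easy to verify on a toy population. The sketch below uses a hypothetical population of 100 elements, 38 of which have the characteristic; these counts are chosen only for illustration. It also shows that the standard deviation of such a population is √(pq), which is why the formula σ/√n becomes √(pq/n) for proportions.

```python
from math import isclose, sqrt
from statistics import fmean, pstdev

# Hypothetical population: 38 elements labeled 1 (have the characteristic), 62 labeled 0.
population = [1] * 38 + [0] * 62

p = population.count(1) / len(population)  # population proportion
mu = fmean(population)                     # mean of the zeros and ones
sigma = pstdev(population)                 # standard deviation of the zeros and ones

print(p, mu, round(sigma, 4), round(sqrt(p * (1 - p)), 4))
```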
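However, the condition that the sample be large is a little more complicated than
just being of size at least 30: the sample is considered large enough when the
interval

$$\left[\,p - 3\sigma_{\hat{P}},\; p + 3\sigma_{\hat{P}}\,\right]$$

lies wholly within the interval [0,1].
In actual practice p is not known, hence neither is 𝜎𝑃̂. In that case, in order to
check that the sample is sufficiently large, we substitute the known quantity 𝑝̂ for
p. This means checking that the interval

$$\left[\,\hat{p} - 3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + 3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\,\right]$$

lies wholly within the interval [0,1]. This is illustrated in the examples.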
Figure 6.3.1 shows that when p = 0.1, a sample of size 15 is too small but a sample
of size 100 is acceptable.
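The check can be sketched in a few lines. The sketch assumes the interval in question extends three standard deviations of the sample proportion on either side of p; the function names `three_sigma_interval` and `is_large_enough` are hypothetical.

```python
from math import sqrt

def three_sigma_interval(p, n):
    """Interval p +/- 3 * sqrt(p(1 - p)/n) (assumed three-standard-deviation rule)."""
    sd = sqrt(p * (1 - p) / n)
    return p - 3 * sd, p + 3 * sd

def is_large_enough(p, n):
    """True when the three-standard-deviation interval lies wholly within [0, 1]."""
    low, high = three_sigma_interval(p, n)
    return 0 <= low and high <= 1

for n in (15, 100):
    low, high = three_sigma_interval(0.1, n)
    print(f"p=0.1, n={n}: [{low:.3f}, {high:.3f}] -> {is_large_enough(0.1, n)}")
```

With p = 0.1 the interval dips below zero for n = 15 but fits inside [0,1] for n = 100, in agreement with the figure.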
Example 6.3.1
Suppose that in a population of voters in a certain region 38% are in favor of a
particular bond issue. Nine hundred randomly selected voters are asked if they
favor the bond issue.
1. Verify that the sample proportion 𝑝̂ computed from samples of size 900 meets
the condition that its sampling distribution be approximately normal.
2. Find the probability that the sample proportion computed from a sample of
size 900 will be within 5 percentage points of the true population proportion.
Solution:
1. The information given is that p = 0.38, hence q = 1 – p = 0.62. First, we use the
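formulas to compute the mean and standard deviation of 𝑝̂:

$$\mu_{\hat{P}} = p = 0.38 \qquad \text{and} \qquad \sigma_{\hat{P}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.38)(0.62)}{900}} \approx 0.01618$$

Then 3𝜎𝑃̂ = 3(0.01618) = 0.04854, so the interval to check is

$$[\,0.38 - 0.04854,\; 0.38 + 0.04854\,] = [\,0.33146,\; 0.42854\,]$$

which lies wholly within the interval [0,1], so it is safe to assume that 𝑝̂ is
approximately normally distributed.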
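For part 2, being within 5 percentage points of the true proportion means that the sample proportion falls between 0.33 and 0.43. A minimal sketch of that probability under the normal approximation, using `statistics.NormalDist` rather than a table (variable names are our own):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.38, 900                # population proportion and sample size (Example 6.3.1)
sd_phat = sqrt(p * (1 - p) / n) # standard deviation of the sample proportion

# Approximate sampling distribution of the sample proportion.
phat_dist = NormalDist(mu=p, sigma=sd_phat)

# P(p - 0.05 < P_hat < p + 0.05): within 5 percentage points of the true proportion.
prob_within = phat_dist.cdf(p + 0.05) - phat_dist.cdf(p - 0.05)
print(round(sd_phat, 5), round(prob_within, 4))
```

Because 0.05 is a little more than three standard deviations of the sample proportion, nearly all samples of size 900 land within that margin.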