Sampling Distributions: Engineering Data Analysis


Romblon State University

College of Engineering and Technology


Civil Engineering Department

Engineering Data Analysis

Chapter 6
Sampling Distributions

Prepared by:

Engr. Jeffy Jones F. Fetalvero


Lecturer
ENGINEERING DATA ANALYSIS SAMPLING DISTRIBUTIONS

Topic Overview

A statistic, such as the sample mean or the sample standard deviation, is a number computed from a sample. Since a sample is random, every statistic is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. As a random variable it has a mean, a standard deviation, and a probability distribution. The probability distribution of a statistic is called its sampling distribution. Typically, sample statistics are not ends in themselves, but are computed in order to estimate the corresponding population parameters. This chapter introduces the concepts of the mean, the standard deviation, and the sampling distribution of a sample statistic, with an emphasis on the sample mean.

Intended Learning Outcomes

At the end of this chapter, students are expected to:

1. become familiar with the concept of the probability distribution of the sample mean.
2. understand the meaning of the formulas for the mean and standard deviation of the sample mean.
3. know what the sampling distribution of 𝑋̅ is when the sample size is large.
4. know what the sampling distribution of 𝑋̅ is when the population is normal.
5. understand the meaning of the formulas for the mean and standard deviation of the sample proportion.
6. know what the sampling distribution of 𝑝̂ is when the sample size is large.

ENGR. JEFFY JONES F. FETALVERO 2



6.1: The Mean and Standard Deviation of the Sample Mean

Suppose we wish to estimate the mean μ of a population. In actual practice we would typically take just one sample. Imagine, however, that we take sample after sample, all of the same size n, and compute the sample mean 𝑥̅ each time. The sample mean is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. We will write 𝑋̅ when the sample mean is thought of as a random variable, and write 𝑥̅ for the values that it takes. The random variable 𝑋̅ has a mean, denoted 𝜇𝑋̅, and a standard deviation, denoted 𝜎𝑋̅. Here is an example with such a small population and small sample size that we can actually write down every single sample.

Example 6.1.1
A rowing team consists of four rowers who weigh 152, 156, 160, and 164 pounds.
Find all possible random samples with replacement of size two and compute the
sample mean for each one. Use them to find the probability distribution, the
mean, and the standard deviation of the sample mean 𝑋̅.

Solution
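The following table shows all possible samples with replacement of size two, along with the mean of each:

Sample      𝑥̅      Sample      𝑥̅      Sample      𝑥̅      Sample      𝑥̅
152, 152    152    156, 152    154    160, 152    156    164, 152    158
152, 156    154    156, 156    156    160, 156    158    164, 156    160
152, 160    156    156, 160    158    160, 160    160    164, 160    162
152, 164    158    156, 164    160    160, 164    162    164, 164    164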

The table shows that there are seven possible values of the sample mean 𝑋̅. The
value 𝑥̅ = 152 happens only one way (the rower weighing 152 pounds must be
selected both times), as does the value 𝑥̅ = 164, but the other values happen
more than one way, hence are more likely to be observed than 152 and 164 are.
Since the 16 samples are equally likely, we obtain the probability distribution of the sample mean just by counting:

𝑥̅        152    154    156    158    160    162    164
P(𝑥̅)     1/16   2/16   3/16   4/16   3/16   2/16   1/16


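For 𝜇𝑋̅, we obtain

$$\mu_{\bar{X}} = \sum \bar{x}\,P(\bar{x}) = 152\left(\tfrac{1}{16}\right) + 154\left(\tfrac{2}{16}\right) + 156\left(\tfrac{3}{16}\right) + 158\left(\tfrac{4}{16}\right) + 160\left(\tfrac{3}{16}\right) + 162\left(\tfrac{2}{16}\right) + 164\left(\tfrac{1}{16}\right) = 158$$

For 𝜎𝑋̅, we first compute

$$\sum \bar{x}^2 P(\bar{x}) = 152^2\left(\tfrac{1}{16}\right) + 154^2\left(\tfrac{2}{16}\right) + 156^2\left(\tfrac{3}{16}\right) + 158^2\left(\tfrac{4}{16}\right) + 160^2\left(\tfrac{3}{16}\right) + 162^2\left(\tfrac{2}{16}\right) + 164^2\left(\tfrac{1}{16}\right)$$

which is 24,974, so that

$$\sigma_{\bar{X}} = \sqrt{\sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{X}}^2} = \sqrt{24{,}974 - 158^2} = \sqrt{10}$$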
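The counting argument above can be checked mechanically. The short Python sketch below lists all 16 ordered samples of size two drawn with replacement, builds the distribution of 𝑋̅ with exact fractions, and recomputes 𝜇𝑋̅ and 𝜎𝑋̅ with the same shortcut used above. The variable names (`population`, `dist`, `second_moment`) are hypothetical choices for this sketch, not notation from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Population of rower weights from Example 6.1.1 (pounds)
population = [152, 156, 160, 164]

# All ordered samples of size two drawn with replacement: 4 * 4 = 16, equally likely
samples = list(product(population, repeat=2))
means = [Fraction(sum(s), len(s)) for s in samples]

# Probability distribution of the sample mean: count each value, divide by 16
counts = Counter(means)
dist = {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}

# Mean of Xbar, then sum of xbar^2 P(xbar), then sigma = sqrt(E[Xbar^2] - mu^2)
mu_xbar = sum(x * p for x, p in dist.items())
second_moment = sum(x * x * p for x, p in dist.items())
sigma_xbar = sqrt(second_moment - mu_xbar ** 2)

for xbar, p in dist.items():
    print(int(xbar), p)
print("mean:", mu_xbar, "sum of xbar^2 P(xbar):", second_moment, "sd:", round(sigma_xbar, 4))
```

Running it should reproduce the mean 158, the value 24,974, and a standard deviation of √10 ≈ 3.1623. `Fraction` prints probabilities in lowest terms, so 2/16 appears as 1/8.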

The mean and standard deviation of the population {152,156,160,164} in the example are μ = 158 and 𝜎 = √20. The mean of the sample mean 𝑋̅ that we have just computed is exactly the mean of the population. The standard deviation of the sample mean 𝑋̅ that we have just computed is the standard deviation of the population divided by the square root of the sample size: √10 = √20/√2. These relationships are not coincidences, but are illustrations of the following formulas.

Suppose random samples of size n are drawn from a population with mean μ and standard deviation σ. The mean 𝜇𝑋̅ and standard deviation 𝜎𝑋̅ of the sample mean 𝑋̅ satisfy

$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

The first equation says that if we could take every possible sample from the
population and compute the corresponding sample mean, then those numbers
would center at the number we wish to estimate, the population mean μ. The
second equation says that averages computed from samples vary less than
individual measurements on the population do, and quantifies the relationship.


Example 6.1.2
The mean and standard deviation of the tax value of all vehicles registered in a
certain state are μ = $13,525 and σ = $4,180. Suppose random samples of size 100
are drawn from the population of vehicles. What are the mean 𝜇𝑋̅ and standard
deviation 𝜎𝑋̅ of the sample mean 𝑋̅?

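Solution
Since n = 100, the formulas yield

$$\mu_{\bar{X}} = \mu = \$13{,}525 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\$4{,}180}{\sqrt{100}} = \$418$$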
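As a quick arithmetic check, here is a small helper that applies the two formulas. The name `sample_mean_params` is hypothetical, not a library function. The last line is an added illustration rather than part of the example: quadrupling the sample size to 400 halves 𝜎𝑋̅.

```python
from math import sqrt

def sample_mean_params(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n,
    using mu_xbar = mu and sigma_xbar = sigma / sqrt(n)."""
    return mu, sigma / sqrt(n)

# Example 6.1.2: tax values with mu = $13,525, sigma = $4,180, samples of size 100
mu_xbar, sigma_xbar = sample_mean_params(13_525, 4_180, 100)
print(f"mean = ${mu_xbar:,}, standard deviation = ${sigma_xbar:,.0f}")

# Added illustration: four times the sample size gives half the standard deviation
print(f"n = 400: standard deviation = ${sample_mean_params(13_525, 4_180, 400)[1]:,.0f}")
```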

6.2: The Sampling Distribution of the Sample Mean

In Example 6.1.1, we constructed the probability distribution of the sample mean for samples of size two drawn from the population of four rowers. The probability distribution is:

𝑥̅        152    154    156    158    160    162    164
P(𝑥̅)     1/16   2/16   3/16   4/16   3/16   2/16   1/16

Figure 6.2.1 shows a side-by-side comparison of a histogram for the original population and a histogram for this distribution. Whereas the distribution of the population is uniform, the sampling distribution of the mean has a shape approaching the shape of the familiar bell curve. This phenomenon of the sampling distribution of the mean taking on a bell shape even though the population distribution is not bell-shaped happens in general. Here is a somewhat more realistic example.

Figure 6.2.1. Distribution of a Population and a Sample Mean


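Suppose we take samples of size 1, 5, 10, or 20 from a population that consists entirely of the numbers 0 and 1, half the population 0 and half 1, so that the population mean is 0.5. For a sample of size n, the sample mean 𝑋̅ can take the values 0, 1/n, 2/n, …, 1, and since the number of ones in the sample follows a binomial distribution with parameters n and 0.5, the sampling distributions are

$$P\left(\bar{X} = \frac{k}{n}\right) = \binom{n}{k}\left(\frac{1}{2}\right)^{n}, \qquad k = 0, 1, \ldots, n.$$

For example, for n = 5 the values 0, 0.2, 0.4, 0.6, 0.8, and 1 have probabilities 1/32, 5/32, 10/32, 10/32, 5/32, and 1/32.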

Histograms illustrating these distributions are shown in Figure 6.2.2.

Figure 6.2.2. Distributions of the Sample Mean
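The distributions plotted in Figure 6.2.2 can be generated exactly. This sketch assumes the draws are independent (sampling with replacement, or a population large enough that this is a fair approximation), so the number of ones in a sample of size n is binomial with parameters n and 1/2. The function name `xbar_distribution` is hypothetical.

```python
from fractions import Fraction
from math import comb

def xbar_distribution(n):
    """Exact distribution of the sample mean of n independent draws from a
    population that is half 0s and half 1s."""
    # The number of ones k is binomial(n, 1/2), so P(Xbar = k/n) = C(n, k) / 2**n
    return {Fraction(k, n): Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

for n in (1, 5, 10, 20):
    dist = xbar_distribution(n)
    # Probability mass sitting on the two extreme values 0 and 1
    tails = dist[Fraction(0)] + dist[Fraction(1)]
    print(f"n = {n:2d}: {len(dist)} values, P(Xbar in {{0, 1}}) = {float(tails):.6f}")
```

The printed probability of the two extreme values drops from 1 at n = 1 to 0.0625 at n = 5 and keeps shrinking as n grows, which is the thinning of the tails described next.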


As n increases, the sampling distribution of 𝑋̅ evolves in an interesting way: the probabilities on the lower and the upper ends shrink and the probabilities in the middle become larger in relation to them. If we were to continue to increase n, the shape of the sampling distribution would become smoother and more bell-shaped.

What we are seeing in these examples does not depend on the particular
population distributions involved. In general, one may start with any distribution
and the sampling distribution of the sample mean will increasingly resemble the
bell-shaped normal curve as the sample size increases. This is the content of the
Central Limit Theorem.

The Central Limit Theorem


For samples of size 30 or more, the sample mean is approximately normally distributed, with mean and standard deviation

$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

where n is the sample size. The larger the sample size, the better the approximation. The Central Limit Theorem is illustrated for several common population distributions in Figure 6.2.3.

Figure 6.2.3. Distribution of Populations and Sample Means


The dashed vertical lines in the figures locate the population mean. Regardless of
the distribution of the population, as the sample size is increased the shape of the
sampling distribution of the sample mean becomes increasingly bell-shaped,
centered on the population mean. Typically, by the time the sample size is 30 the
distribution of the sample mean is practically the same as a normal distribution.
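A small simulation can make this concrete. The sketch below works under assumptions of its own: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape chosen only for demonstration), samples have size 30, and 20,000 samples are drawn with a fixed seed. It compares the mean and standard deviation of the simulated sample means with μ and σ/√n, and checks what fraction fall within 1.96 standard errors of μ, which should be close to 0.95 if 𝑋̅ is nearly normal.

```python
import random
import statistics
from math import sqrt

# Assumed population: exponential with mean 1 and standard deviation 1 (right-skewed)
MU, SIGMA = 1.0, 1.0
n, reps = 30, 20_000
rng = random.Random(12345)  # fixed seed so the run is reproducible

# Draw many samples of size n and record each sample mean
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
se = SIGMA / sqrt(n)  # theoretical sigma_xbar

# Share of sample means within 1.96 standard errors of mu (about 0.95 for a normal Xbar)
coverage = sum(abs(m - MU) <= 1.96 * se for m in means) / reps

print(f"mean of sample means = {mean_of_means:.4f} (theory {MU})")
print(f"sd of sample means   = {sd_of_means:.4f} (theory {se:.4f})")
print(f"within 1.96 SE       = {coverage:.3f} (normal approximation 0.950)")
```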

The importance of the Central Limit Theorem is that it allows us to make probability
statements about the sample mean, specifically in relation to its value in
comparison to the population mean, as we will see in the examples. But to use
the result properly we must first realize that there are two separate random
variables (and therefore two probability distributions) at play:

1. X, the measurement of a single element selected at random from the population; the distribution of X is the distribution of the population, with mean the population mean μ and standard deviation the population standard deviation σ;

2. 𝑋̅, the mean of the measurements in a sample of size n; the distribution of 𝑋̅ is its sampling distribution, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Example 6.2.1
Let 𝑋̅ be the mean of a random sample of size 50 drawn from a population with
mean 112 and standard deviation 40.

1. Find the mean and standard deviation of 𝑋̅.
2. Find the probability that 𝑋̅ assumes a value between 110 and 114.
3. Find the probability that 𝑋̅ assumes a value greater than 113.

Solution:
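1. By the formulas in the previous section,

$$\mu_{\bar{X}} = \mu = 112 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{50}} \approx 5.65685$$

2. Since the sample size is at least 30, the Central Limit Theorem applies: 𝑋̅ is approximately normally distributed. We compute probabilities using the normal distribution in the usual way, just being careful to use 𝜎𝑋̅ and not σ when we standardize:

$$P(110 < \bar{X} < 114) = P\left(\frac{110 - 112}{5.65685} < Z < \frac{114 - 112}{5.65685}\right) = P(-0.35 < Z < 0.35) = 0.6368 - 0.3632 = 0.2736$$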


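3. Similarly,

$$P(\bar{X} > 113) = P\left(Z > \frac{113 - 112}{5.65685}\right) = P(Z > 0.18) = 1 - P(Z < 0.18) = 1 - 0.5714 = 0.4286$$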

Note that if in the above example we had been asked to compute the probability
that the value of a single randomly selected element of the population exceeds
113, that is, to compute the number P(X > 113), we would not have been able to
do so, since we do not know the distribution of X, but only that its mean is 112 and
its standard deviation is 40. By contrast we could compute P(𝑋̅ > 113) even without
complete knowledge of the distribution of X because the Central Limit Theorem
guarantees that 𝑋̅ is approximately normal.
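The calculation in Example 6.2.1 can be reproduced with Python's standard-library `statistics.NormalDist`. One difference from a hand solution: the code does not round z to two decimal places, so it gives about 0.2763 and 0.4298. If z is first rounded to two decimals and a standard normal table is used, the results are 0.2736 and 0.4286 instead.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 40, 50
se = sigma / sqrt(n)                 # sigma_xbar, about 5.65685
xbar = NormalDist(mu=mu, sigma=se)   # CLT: Xbar is approximately normal

p_between = xbar.cdf(114) - xbar.cdf(110)   # P(110 < Xbar < 114)
p_above = 1 - xbar.cdf(113)                 # P(Xbar > 113)

print(f"sigma_xbar          = {se:.5f}")
print(f"P(110 < Xbar < 114) = {p_between:.4f}")
print(f"P(Xbar > 113)       = {p_above:.4f}")
```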

Normally Distributed Populations


The Central Limit Theorem says that no matter what the distribution of the
population is, as long as the sample is “large,” meaning of size 30 or more, the
sample mean is approximately normally distributed. If the population is normal to
begin with then the sample mean also has a normal distribution, regardless of the
sample size.

For samples of any size drawn from a normally distributed population, the sample mean is normally distributed, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where n is the sample size.

The effect of increasing the sample size is shown in Figure 6.2.4.


Figure 6.2.4: Distribution of Sample Means for a Normal Population

Example 6.2.2
An automobile battery manufacturer claims that its midgrade battery has a mean
life of 50 months with a standard deviation of 6 months. Suppose the distribution
of battery lives of this particular brand is approximately normal.

1. On the assumption that the manufacturer’s claims are true, find the probability
that a randomly selected battery of this type will last less than 48 months.
2. On the same assumption, find the probability that the mean of a random
sample of 36 such batteries will be less than 48 months.

Solution:
1. Since the population is known to have a normal distribution,

$$P(X < 48) = P\left(Z < \frac{48 - 50}{6}\right) = P(Z < -0.33) = 0.3707$$


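2. The sample mean has mean 𝜇𝑋̅ = 𝜇 = 50 and standard deviation

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{36}} = 1.$$

Thus

$$P(\bar{X} < 48) = P\left(Z < \frac{48 - 50}{1}\right) = P(Z < -2) = 0.0228$$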
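The same `statistics.NormalDist` approach applies to Example 6.2.2. Because z is not rounded here, part 1 comes out near 0.3694 (a table lookup at z = −0.33 gives 0.3707); part 2 agrees with 0.0228 to four decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50, 6                                # claimed mean and sd of battery life (months)
single = NormalDist(mu, sigma)                   # population assumed normal
mean_of_36 = NormalDist(mu, sigma / sqrt(36))    # Xbar is exactly normal for a normal population

p_single = single.cdf(48)     # P(X < 48) for one battery
p_mean = mean_of_36.cdf(48)   # P(Xbar < 48) for the mean of 36 batteries

print(f"P(X < 48)    = {p_single:.4f}")
print(f"P(Xbar < 48) = {p_mean:.4f}")
```

The contrast between the two answers is the point of the example: a single battery falling short of 48 months is fairly common, while a sample mean of 36 batteries doing so is rare if the claim is true.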

6.3: The Sample Proportion

Often sampling is done in order to estimate the proportion of a population that has a specific characteristic, such as the proportion of all items coming off an assembly line that are defective or the proportion of all people entering a retail store who make a purchase before leaving. The population proportion is denoted p and the sample proportion is denoted 𝑝̂. Thus, if in reality 43% of people entering a store make a purchase before leaving,

$$p = 0.43;$$

if in a sample of 200 people entering the store, 78 make a purchase,

$$\hat{p} = \frac{78}{200} = 0.39.$$

The sample proportion is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. Viewed as a random variable it will be written 𝑃̂. It has a mean 𝜇𝑃̂ and a standard deviation 𝜎𝑃̂. Here are formulas for their values.

Suppose random samples of size n are drawn from a population in which the proportion with a characteristic of interest is p. The mean 𝜇𝑃̂ and standard deviation 𝜎𝑃̂ of the sample proportion 𝑃̂ satisfy

$$\mu_{\hat{P}} = p \quad\text{and}\quad \sigma_{\hat{P}} = \sqrt{\frac{pq}{n}}$$


where q = 1 − p.

The Central Limit Theorem has an analogue for the sample proportion 𝑃̂. To see how, imagine that every element of the population that has the characteristic of interest is labeled with a 1, and that every element that does not is labeled with a 0. This gives a numerical population consisting entirely of zeros and ones. Clearly the proportion of the population with the special characteristic is the proportion of the numerical population that are ones; in symbols, if N is the number of elements in the population,

$$p = \frac{\text{number of 1s}}{N}$$

But of course, the sum of all the zeros and ones is simply the number of ones, so the mean μ of the numerical population is

$$\mu = \frac{\sum x}{N} = \frac{\text{number of 1s}}{N}$$

Thus, the population proportion p is the same as the mean μ of the corresponding
population of zeros and ones. In the same way the sample proportion 𝑝̂ is the
same as the sample mean 𝑥̅ . Thus, the Central Limit Theorem applies to 𝑝̂ .
However, the condition that the sample be large is a little more complicated than
just being of size at least 30.
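To illustrate the 0/1 labeling argument, this sketch uses a hypothetical list of labels constructed to match the store example (200 shoppers, 78 purchases). The proportion of ones and the ordinary sample mean of the labels come out as the same number, 0.39. The last computation is an added fact not stated in the text: the standard deviation of a population of zeros and ones with proportion p of ones is √(pq), which is why σ/√n becomes √(pq/n) for 𝑃̂; here it is evaluated by treating the 200 labels as the population.

```python
import statistics
from math import sqrt

# Hypothetical labels matching the text: 200 shoppers, 78 of whom made a purchase.
# Each shopper is labeled 1 (purchased) or 0 (did not purchase).
labels = [1] * 78 + [0] * 122

p_hat_as_proportion = labels.count(1) / len(labels)   # proportion with the characteristic
p_hat_as_mean = statistics.fmean(labels)               # ordinary mean of the 0/1 labels

# Population standard deviation of 0/1 values equals sqrt(p * (1 - p))
sd_of_labels = statistics.pstdev(labels)

print(p_hat_as_proportion, p_hat_as_mean)
print(sd_of_labels, sqrt(0.39 * 0.61))
```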

The Sampling Distribution of the Sample Proportion


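For large samples, the sample proportion is approximately normally distributed, with mean and standard deviation

$$\mu_{\hat{P}} = p \quad\text{and}\quad \sigma_{\hat{P}} = \sqrt{\frac{pq}{n}}$$

A sample is large if the interval $[p - 3\sigma_{\hat{P}},\ p + 3\sigma_{\hat{P}}]$ lies wholly within the interval [0,1].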

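In actual practice p is not known, hence neither is 𝜎𝑃̂. In that case, to check that the sample is sufficiently large, we substitute the known quantity 𝑝̂ for p. This means checking that the interval

$$\left[\hat{p} - 3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$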


lies wholly within the interval [0,1]. This is illustrated in the examples.

Figure 6.3.1 shows that when p = 0.1, a sample of size 15 is too small but a sample
of size 100 is acceptable.

Figure 6.3.1: Distribution of Sample Proportions

Figure 6.3.2 shows that when p = 0.5 a sample of size 15 is acceptable.

Figure 6.3.2: Distribution of Sample Proportions for p = 0.5 and n = 15
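The large-sample rule can be written as a short test. The function name `is_large_enough` is hypothetical; it takes p (or, in practice, 𝑝̂ substituted for p) and n and checks whether [p − 3𝜎𝑃̂, p + 3𝜎𝑃̂] lies inside [0,1]. The three calls reproduce the cases shown in Figures 6.3.1 and 6.3.2.

```python
from math import sqrt

def is_large_enough(p, n):
    """Return True if [p - 3*sigma, p + 3*sigma] lies wholly within [0, 1],
    where sigma = sqrt(p * (1 - p) / n) is the standard deviation of P-hat."""
    sigma = sqrt(p * (1 - p) / n)
    return 0 <= p - 3 * sigma and p + 3 * sigma <= 1

# Cases from Figures 6.3.1 and 6.3.2
print(is_large_enough(0.1, 15))   # p = 0.1, n = 15
print(is_large_enough(0.1, 100))  # p = 0.1, n = 100
print(is_large_enough(0.5, 15))   # p = 0.5, n = 15
```

These should print False, True, and True: n = 15 is too small when p = 0.1, n = 100 is acceptable, and n = 15 is acceptable when p = 0.5.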

Example 6.3.1
Suppose that in a population of voters in a certain region 38% are in favor of a particular bond issue. Nine hundred randomly selected voters are asked if they favor the bond issue.

1. Verify that the sample proportion 𝑝̂ computed from samples of size 900 meets
the condition that its sampling distribution be approximately normal.
2. Find the probability that the sample proportion computed from a sample of
size 900 will be within 5 percentage points of the true population proportion.


Solution:
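1. The information given is that p = 0.38, hence q = 1 − p = 0.62. First, we use the formulas to compute the mean and standard deviation of 𝑝̂:

$$\mu_{\hat{P}} = p = 0.38 \quad\text{and}\quad \sigma_{\hat{P}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.38)(0.62)}{900}} \approx 0.01618$$

Then 3𝜎𝑃̂ = 3(0.01618) = 0.04854 ≈ 0.05, so

$$[p - 3\sigma_{\hat{P}},\ p + 3\sigma_{\hat{P}}] \approx [0.38 - 0.05,\ 0.38 + 0.05] = [0.33,\ 0.43]$$

which lies wholly within the interval [0,1], so it is safe to assume that 𝑝̂ is approximately normally distributed.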

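2. To be within 5 percentage points of the true population proportion 0.38 means to be between 0.38 − 0.05 = 0.33 and 0.38 + 0.05 = 0.43. Thus

$$P(0.33 < \hat{P} < 0.43) = P\left(\frac{0.33 - 0.38}{0.01618} < Z < \frac{0.43 - 0.38}{0.01618}\right) = P(-3.09 < Z < 3.09) = 0.9990 - 0.0010 = 0.9980$$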
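A sketch of Example 6.3.1 end to end, using `statistics.NormalDist` for the normal approximation. Without rounding z to 3.09, the probability is still about 0.9980.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.38, 900
q = 1 - p
sigma_phat = sqrt(p * q / n)          # about 0.01618

# Part 1: large-sample check with the interval [p - 3*sigma, p + 3*sigma]
low, high = p - 3 * sigma_phat, p + 3 * sigma_phat
normal_ok = 0 <= low and high <= 1

# Part 2: P(0.33 < P-hat < 0.43) under the normal approximation
phat = NormalDist(mu=p, sigma=sigma_phat)
prob = phat.cdf(0.43) - phat.cdf(0.33)

print(f"sigma_phat = {sigma_phat:.5f}, interval = [{low:.4f}, {high:.4f}], normal ok: {normal_ok}")
print(f"P(0.33 < P-hat < 0.43) = {prob:.4f}")
```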
