Sampling Distribution


Overview

 7.1 Introduction to Sampling Distributions

 7.2 Central Limit Theorem for Means

 7.3 Central Limit Theorem for Proportions


7.1 Introduction to Sampling
Distributions
Objectives:
By the end of this section, I will be
able to…

1) Compute point estimates and sampling error.
2) Explain the sampling distribution of the sample
mean x̄.
3) Describe the sampling distribution of the
sample mean x̄ when the population is normal.
4) Use normal probability plots to assess
normality.
5) Find probabilities and percentiles for the
sample mean when the population is normal.
Point Estimates
 Use known statistics to estimate unknown
parameters and report a single number
as the estimate

 The value of the statistic is called the point
estimate

Table 7.1 Point estimation: Use statistics to estimate
unknown population parameters
Sampling error
 The distance between the point estimate and
its target parameter

Table 7.3 Sampling error for common
characteristics
Example

Page 349

Do problems 10 and 12
Example

Solutions:

10) |s - σ| = |0.9 - 1.0| = 0.1

12) |p̂ - p| = |0.70 - 0.75| = 0.05
Example 7.3 - Commuting
times for student government
members
We are interested in how long it takes the five
members of the student government to
commute to school. The times (in minutes)
are given in Table 7.4. Since these five
people are all the members of the student
government, we can consider them to
constitute a population.
Example 7.3 continued

Table 7.4 Commuting times for the five members of the
student government

 a. Calculate the population mean commuting
time μ.

 b. Take a sample of the following student
government members: Amber, Brandon, and
Chantal. Find the sample mean commuting
time x̄ and the sampling error |x̄ - μ|.
Example 7.3 continued
Solution

 The mean commuting time of this population is
μ = Σx/N = (10 + 20 + 5 + 30 + 15)/5 = 16 minutes
 For Amber, Brandon, and Chantal, the sample
mean commuting time is
x̄₁ = Σx/n = (10 + 20 + 5)/3 = 11.67 minutes
 The sampling error of this sample mean is
|x̄₁ - μ| = |11.67 - 16| = 4.33 minutes.
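The arithmetic in this solution can be checked with a few lines of Python. This is a minimal sketch: only Amber, Brandon, and Chantal are named in the example, so the labels "Member 4" and "Member 5" are placeholders, and which of the remaining two times belongs to which member is an assumption.

```python
# Commuting times in minutes for the five student government members.
# "Member 4" and "Member 5" are placeholder labels (not from the example).
times = {"Amber": 10, "Brandon": 20, "Chantal": 5, "Member 4": 30, "Member 5": 15}

# Population mean: all five members make up the population.
mu = sum(times.values()) / len(times)

# Sample mean for the sample Amber, Brandon, Chantal.
sample = ["Amber", "Brandon", "Chantal"]
x_bar = sum(times[name] for name in sample) / len(sample)

# Sampling error: distance between the point estimate and its target.
sampling_error = abs(x_bar - mu)

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:.2f}")
```

Rounded to two decimals, the printed values agree with the 16, 11.67, and 4.33 minutes found above.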
Consider the previous example and find all possible
samples of size 3 of student government members,
the sample means, the population mean,
and the sampling errors (page 339):

Table 7.5 All possible samples of size 3 from population of student government members
Sampling distribution of the
sample mean x̄

 Collection of the sample means of all
possible samples of size n
Mean of the Sample Means

Calculate the mean of the sample
means as the average of the
sample means:

Note: the mean of the sample
means equals the population
mean in this example.
Fact 1
 The mean of the sampling distribution of the
sample mean x̄ is the value of the population
mean μ.

 Denoted as μ_x̄ = μ

 Read as “the mean of the sampling
distribution of x̄ is μ”
Mean of the Sample Means

For the previous example

μ_x̄ = μ = 16 minutes
Standard Deviation of the Population
Standard Deviation of the Sample Means

σ_x̄ = 3.5119 minutes
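The mean and standard deviation of the sample means can be reproduced by listing every sample of size 3. The sketch below assumes the five commuting times from Example 7.3 and sampling without replacement, which gives the ten samples of Table 7.5; the variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev

times = [10, 20, 5, 30, 15]  # commuting times from Example 7.3, in minutes

# Every possible sample of size 3, drawn without replacement (10 samples).
sample_means = [mean(sample) for sample in combinations(times, 3)]

# The ten sample means are the entire sampling distribution, so the
# population formulas (divide by the count) are the right ones here.
mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)

print(f"number of samples = {len(sample_means)}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"sd of sample means   = {sd_of_means:.4f}")
```

The mean of the sample means comes out equal to the population mean of 16 minutes, and the standard deviation rounds to 3.5119.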
Fact 2
 The standard deviation of the sampling
distribution of the sample mean x̄ is

σ_x̄ = σ/√n

 σ is the population standard deviation

 n is the sample size, where the population is
assumed to be very large relative to n (if it is
not, see the note on page 340)
Previous example

x
Example
Page 350
Example

Solutions

a)
μ_x̄ = μ = 68 inches

σ_x̄ = σ/√n = 3 inches/√10 = 0.95 inches


Example

Solutions

b)
μ_x̄ = μ = 68 inches

σ_x̄ = σ/√n = 3 inches/√100 = 0.30 inches


Example

Solutions

c)
μ_x̄ = μ = 68 inches

σ_x̄ = σ/√n = 3 inches/√1000 = 0.09 inches
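Parts (a) through (c) show the standard deviation of the sample mean shrinking as n grows. A short sketch of Fact 2 follows, using the σ = 3 inches from these solutions; the function name is illustrative.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean (Fact 2): sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 3.0  # population standard deviation, in inches

for n in (10, 100, 1000):
    print(f"n = {n:4d}: sd of sample mean = {sd_of_sample_mean(sigma, n):.2f} inches")
```

Multiplying the sample size by 100 divides the standard deviation of the sample mean by 10, because n enters through its square root.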


Sampling Distribution of the
Sample Mean for a Normal
Population
Fact 3
 Is itself normal, regardless of sample size
Sampling Distribution of the
Sample Mean for a Normal
Population

Fact 4
 Distributed as normal with mean:

μ_x̄ = μ

 And standard deviation:

σ_x̄ = σ/√n
Fact 5: Standardizing a Normal
Sampling Distribution for
Means
 When the sampling distribution of x̄
is normal, we may standardize to produce
the standard normal random variable Z as
follows:
Z = (x̄ - μ_x̄)/σ_x̄ = (x̄ - μ)/(σ/√n)
where μ is the population mean, σ is the
population standard deviation, and n
is the sample size.
Example

Page 350
Example

Solutions:

μ_x̄ = μ = $50,000

σ_x̄ = σ/√n = $5000/√25 = $1000
Example

Solutions:

a) Z-score for sample mean of
$52,000 is

Z = (x̄ - μ)/(σ/√n) = (52000 - 50000)/1000 = 2
Example

Solutions:

a) P(x̄ > $52,000) = P(Z > 2)
= 1 - P(Z ≤ 2)
= 1 - 0.9772
= 0.0228
Look up in Table C
Example

(b) calculator
Example

(c) calculator

P(x̄ < $47,000)
= normalcdf(-10^99, 47000, 50000, 1000)
= 0.0013
Example

(c) calculator

P($52,000 < x̄ < $53,000)
= normalcdf(52000, 53000, 50000, 1000)
= 0.0215
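For readers without a graphing calculator, the same areas can be sketched with Python's standard library. The helper name normalcdf below only mimics the calculator function and is defined here; it is not a built-in. The mean of $50,000 and standard deviation of $1000 are the values of the sampling distribution found earlier.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean: mean 50,000, sd 1,000.
x_bar_dist = NormalDist(mu=50_000, sigma=1_000)

def normalcdf(lower, upper, dist):
    """Area under dist between lower and upper (mimics the calculator)."""
    return dist.cdf(upper) - dist.cdf(lower)

p_above_52k = 1 - x_bar_dist.cdf(52_000)           # P(x_bar > 52,000)
p_below_47k = x_bar_dist.cdf(47_000)               # P(x_bar < 47,000)
p_between = normalcdf(52_000, 53_000, x_bar_dist)  # P(52,000 < x_bar < 53,000)

print(f"{p_above_52k:.4f}  {p_below_47k:.4f}  {p_between:.4f}")
```

The first two values agree with 0.0228 and 0.0013. The third prints as 0.0214; the 0.0215 shown above is what subtracting the rounded table values 0.9987 and 0.9772 gives.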
Example

Page 350

Do part (a)
Example

Solution:

a) $50,000 (note: for a normal
distribution, the mean and the
median are equal)
Example

Page 350

Do part (b)
Example

Solutions:

b) Find X_C so that
P(x̄ < X_C) = 0.95

for a normal distribution with
μ_x̄ = μ = $50,000

σ_x̄ = σ/√n = $5000/√25 = $1000
Example

First find Z_C so that

P(Z < Z_C) = 0.95

using the standard normal
distribution. From Table C
we get that:
Z_C = 1.645
Example

Use Z_C to get X_C by using the
z-score formula:
Z_C = (X_C - μ_x̄)/σ_x̄ = (X_C - $50,000)/$1000 = 1.645

X_C = $51,645
Example

Solutions (directly with calculator):

b)
X_C = invNorm(0.95, 50000, 1000) = $51,644.85
Example

Page 350

Do parts (c) and (d)


Example

Solutions:

c) $48,355

X_C = invNorm(0.05, 50000, 1000) = $48,355.15

d) $48,355 and $51,645
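The percentiles in parts (b) through (d) can also be sketched in Python, where NormalDist.inv_cdf plays the role of the calculator's invNorm. The variable names are illustrative.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean: mean 50,000, sd 1,000.
x_bar_dist = NormalDist(mu=50_000, sigma=1_000)

p95 = x_bar_dist.inv_cdf(0.95)  # 95th percentile of the sample mean
p05 = x_bar_dist.inv_cdf(0.05)  # 5th percentile of the sample mean

print(f"95th percentile: ${p95:,.2f}")
print(f" 5th percentile: ${p05:,.2f}")
```

By symmetry the two percentiles sit the same distance on either side of $50,000, which is why together they bound the middle 90% of sample means.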


Normal Probability Plots

 A normal probability plot is a scatterplot of
the estimated cumulative normal
probabilities (expressed as percents) against
the corresponding data values in a data set.
Normal Probability Plot

FIGURE 7.4 Normal probability plot of
normal data.
Analyzing Normal Probability
Plots

 If the points in the normal probability plot
either cluster around a straight line or nearly
all fall within the curved bounds, then it is
likely that the data set is normal.

 Systematic deviations off the straight line
are evidence against the claim that the data
set is normal.
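To make the idea of "clustering around a straight line" concrete, here is a rough sketch of how the plotted points could be computed. Two things are assumptions rather than the textbook's method: the plotting position (i - 0.5)/n used to estimate the cumulative probabilities, and the correlation used as a one-number summary of straightness. Both data sets are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each sorted value with its estimated cumulative percent.

    Assumes the plotting position (i - 0.5) / n; software may differ.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(x, 100 * (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]

def straightness(data):
    """Correlation between the data values and matching normal quantiles."""
    points = normal_plot_points(data)
    xs = [x for x, _ in points]
    zs = [NormalDist().inv_cdf(pct / 100) for _, pct in points]
    mx, mz = mean(xs), mean(zs)
    cov = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((z - mz) ** 2 for z in zs))

# Hypothetical data: one roughly bell-shaped set, one right-skewed set.
bell_shaped = [61, 64, 66, 67, 68, 69, 70, 72, 75]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 10, 25]

print(f"bell-shaped:  {straightness(bell_shaped):.3f}")
print(f"right-skewed: {straightness(right_skewed):.3f}")
```

A value near 1 corresponds to points lying close to a straight line; the right-skewed set scores lower, matching the systematic bend seen in plots of skewed data.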
Normal Probability Plot

FIGURE 7.5 Normal probability plot of
right-skewed data.
Summary
 We can use sample statistics as point
estimates of the unknown population
parameters.

 For each statistic, sampling error is the
distance between the point estimate and its
target parameter.

 The sampling distribution of the sample
mean x̄ for a given sample size n consists of
the collection of the sample means of all
possible samples of size n from the
population.
Summary
 The mean of the sampling distribution of the
sample mean x̄ is the value of the population
mean μ (Fact 1).

 The standard deviation of the sampling
distribution of the sample mean x̄ is
σ_x̄ = σ/√n, where σ is the population standard
deviation (Fact 2).

 The sampling distribution of the sample
mean for a normal population is itself
normal, regardless of sample size (Fact 3).
Summary
 For a normal population, the sampling
distribution of the sample mean x̄ is
distributed as normal (μ, σ/√n), where μ is
the population mean and σ is the population
standard deviation (Fact 4).

 Normal probability plots are used to assess
the normality of a data set.

 We can use Fact 4 to find probabilities and
percentiles using sample means.
7.2 Central Limit Theorem for
Means
Objectives:
By the end of this section, I will be
able to…

1) Describe the sampling distribution of x̄ for
skewed and symmetric populations as the
sample size increases.

2) Apply the Central Limit Theorem for Means
to solve probability questions about the
sample mean.
Main Idea
 In this section we want to be able to
describe the shape of the distribution of the
sample means.
Symmetric Populations

 For a symmetric distribution, at n=20 the
sampling distribution of the mean is
approximately normal.
Example

Roll a fair die once. Make up a
table that represents the
probability distribution of
X=number on the die. Also, plot
the probability distribution in a
bar chart.
Example

Distribution (table format) is:

x P(x)
1 0.1667
2 0.1667
3 0.1667
4 0.1667
5 0.1667
6 0.1667

μ = Σ x·P(x) = 3.5
FIGURE 7.11 Distribution of a single fair die roll is symmetric.
Example

Roll a fair die one hundred times.
Take random samples of size 10
from these 100 rolls. For each
sample of size 10, calculate the
sample mean. Plot the
distribution of the means as a
histogram.
FIGURE 7.12 Sample means of size n = 10: already approaching normality.
Example

Roll a fair die one hundred times.
Take random samples of size 20
from these 100 rolls. For each
sample of size 20, calculate the
sample mean. Plot the
distribution of the means as a
histogram.
FIGURE 7.13 Sample means of size n = 20: approximately normal.
FIGURE 7.14 Normal probability plot for n = 20: acceptable normality.
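The die experiment is easy to imitate by simulation. The sketch below departs from the slides in one labeled way: instead of resampling from a fixed set of 100 rolls, each sample is made of fresh rolls of a fair die. The seed, the number of samples, and the function name are arbitrary choices.

```python
import random
from statistics import mean, pstdev

def simulate_sample_means(n, num_samples=5000, seed=1):
    """Return the means of num_samples samples, each of n fair-die rolls."""
    rng = random.Random(seed)
    return [mean(rng.randint(1, 6) for _ in range(n)) for _ in range(num_samples)]

for n in (1, 10, 20):
    means = simulate_sample_means(n)
    print(f"n = {n:2d}: mean of means = {mean(means):.3f}, "
          f"sd of means = {pstdev(means):.3f}")
```

The mean of the sample means should stay near 3.5 for every n, while their spread shrinks roughly like σ/√n, where σ ≈ 1.708 for a single fair die. A histogram of the n = 20 means would show the bell shape described in Figure 7.13.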
Skewed Populations
 For a skewed population, the sampling
distribution of the mean becomes
approximately normal as the sample size
approaches 30.

FIGURE 7.10 Sampling distribution of x̄ and normal probability plots for n = 10, 20, and 30.
Central Limit Theorem for Means
 Population with mean μ

 Standard deviation σ

 The sampling distribution of the sample
mean x̄ becomes approximately normal with
mean μ and standard deviation σ/√n as the
sample size gets larger

 Regardless of the shape of the population.


Rule of Thumb

 We consider n ≥ 30 as large enough
to apply the Central Limit Theorem for
any population.
Three Cases for the Sampling
Distribution of the Sample
Mean x̄
Case 1

 The population is normal.

 Then the sampling distribution of x̄ is normal
(Fact 3 from 7.1).
Three Cases for the
Sampling Distribution of the
Sample Mean x̄ continued
Case 2

 The population is either non-normal or of
unknown distribution and the sample size is
at least 30.

 Apply Central Limit Theorem for Means:
The sampling distribution of the sample
mean is approximately normal
Three Cases for the Sampling
Distribution of the Sample
Mean x̄ continued
Case 3

 The population is either non-normal or of
unknown distribution and the sample size is
less than 30.

 Insufficient information to conclude that the
sampling distribution of the sample mean x̄
is either normal or approximately normal
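The three cases amount to a small decision rule, sketched below. The function name and return format are illustrative, and the threshold of 30 is the rule of thumb stated above.

```python
def sampling_distribution_case(population_is_normal, n):
    """Return (case number, shape of the sampling distribution of the mean)."""
    if population_is_normal:
        return 1, "normal"                # Case 1: any sample size
    if n >= 30:
        return 2, "approximately normal"  # Case 2: Central Limit Theorem
    return 3, "unknown"                   # Case 3: not enough information

print(sampling_distribution_case(True, 25))
print(sampling_distribution_case(False, 64))
print(sampling_distribution_case(False, 16))
```

Note that the check for a normal population comes first, so a normal population with a small sample still lands in Case 1.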
Example

Page 362-363
Example

Solutions

6) Case 2
8) Case 3
10) Case 1
Example

Page 363
Example

Solutions

16(a) μ_x̄ = μ = $60,000

16(b)

σ_x̄ = σ/√n = $10,000/√16 = $2,500

16(c) unknown
Example

Page 363
Example

Solutions

20(a) μ_x̄ = μ = 50 miles per gallon

20(b)

σ_x̄ = σ/√n = 6/√64 = 0.75 mpg

20(c) approximately normal


Example

Page 363
Example

Solution:
first notice that it is possible to find the
probability since the systolic blood pressure
readings are normally distributed so the
distribution of the sample mean is also
normal (case 1)

μ_x̄ = μ = 80

σ_x̄ = σ/√n = 8/√25 = 1.6
Example

Solution:

P(78 < x̄ < 82) = normalcdf(78, 82, 80, 1.6) = 0.7887


Example

Page 363
Example

Solution

Not possible: since the pollen
count is not normally
distributed and the sample size is
smaller than 30, the sampling
distribution of the sample mean x̄ is
unknown.
Example

Page 363
Example

Solution:
first notice that it is possible to find the
probability since even though the pollen
count distribution is not normal, the sample
size is at least 30, so the distribution of the
sample mean is approximately normal (case 2)

μ_x̄ = μ = 8

σ_x̄ = σ/√n = 1/√64 = 0.125
Example

Solution (directly with calculator):

Find X_C so that P(x̄ < X_C) = 0.75

X_C = invNorm(0.75, 8, 0.125) = 8.1


Example

Page 364
Example
Solution:
(a) Yes, case 1 applies

μ_x̄ = μ = 38.6°

σ_x̄ = σ/√n = 10°/√25 = 2°

P(x̄ > 40°) = 1 - P(x̄ ≤ 40°)
= 1 - 0.7580
= 0.2420
Example
Solution:
(b) case 2 does not apply since the sample
size is less than 30.
Summary
 In this section, we examine the behavior of
the sample mean when the population is not
normal.

 The approximate normality of the sampling
distribution of the sample mean kicks in
much quicker when the original population is
symmetric rather than skewed.

 The Central Limit Theorem is one of the
most important results in statistics and is
stated as follows:
Summary
 Given a population with mean μ and
standard deviation σ, the sampling
distribution of the sample mean x̄ becomes
approximately normal(μ, σ/√n) as the
sample size gets larger, regardless of the
shape of the population.

 This approximation applies for smaller
sample sizes when the original distribution is
more symmetric.
7.3 Central Limit Theorem for
Proportions
Objectives:
By the end of this section, I will be
able to…

1) Explain the sampling distribution of the
sample proportion p̂.
2) Describe the sampling distribution of the
sample proportion p̂ for extreme and
moderate values of p.
3) Apply the Central Limit Theorem for
Proportions to solve probability questions
about the sample proportion.
Sample Proportion p̂
 Suppose each individual in a population
either has or does not have a particular
characteristic
 For a sample of size n, the sample proportion p̂
(read “p-hat”) is
p̂ = x/n
 x represents the number of individuals in the
sample that have the particular
characteristic
 Use p̂ to estimate the unknown value of the
population proportion p
Example

Page 367, Example 7.14


Table 7.6 All possible samples of size 3 from population of student government members
Example 7.15 - Mean of sample
proportions
Calculate the mean of the ten sample
proportions p̂ from Table 7.6 page 367.
Example 7.15 continued
Solution

 Mean is

(2/3 + 1/3 + 2/3 + 2/3 + 3/3 + 2/3 + 1/3 + 2/3 + 1/3 + 2/3)/10 = 0.6
 μ_p̂ equals the population proportion of
females for the original population,
p = 3/5 = 0.6.
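The mean of the ten sample proportions can be verified exactly with fractions, which avoids any rounding. This is a small sketch; the list holds the numerators of the ten proportions shown in the solution, and the variable names are illustrative.

```python
from fractions import Fraction

# Number of females in each of the ten samples of size 3 (Example 7.15).
females_per_sample = [2, 1, 2, 2, 3, 2, 1, 2, 1, 2]

# Sample proportion for each sample: p_hat = x / n with n = 3.
proportions = [Fraction(x, 3) for x in females_per_sample]

# Mean of the sampling distribution of p_hat.
mean_p_hat = sum(proportions) / len(proportions)

print(mean_p_hat, float(mean_p_hat))
```

The exact result is 3/5, the same fraction as the population proportion p.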
Fact 6: Mean of the Sampling
Distribution of the Sample
Proportion p̂
 The value of the population proportion p

 Denoted as μ_p̂

 where μ_p̂ = p

 Read as “the mean of the sampling
distribution of p̂ is p”
Fact 7 - Standard Deviation of the
Sampling Distribution of the
Sample Proportion p̂

σ_p̂ = √(p(1 - p)/n) = √(pq/n)

where q = 1 - p

 p is the population proportion

 n is the sample size


Example

Page 379

Do 8 and 10 parts (a) and (b)


Example
Solutions

8(a) μ_p̂ = p = 0.5

8(b) q = 1 - p = 0.5

σ_p̂ = √(pq/n) = √((0.5)(0.5)/5) = 0.2236
Example
Solutions

10(a) μ_p̂ = p = 0.01

10(b) q = 1 - p = 0.99

σ_p̂ = √(pq/n) = √((0.01)(0.99)/500) = 0.0044
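Fact 7 translates directly into a one-line function. The sketch below checks problems 8 and 10; the function name is illustrative.

```python
from math import sqrt

def sd_of_sample_proportion(p, n):
    """Standard deviation of the sample proportion (Fact 7): sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

print(f"Problem 8:  {sd_of_sample_proportion(0.5, 5):.4f}")
print(f"Problem 10: {sd_of_sample_proportion(0.01, 500):.4f}")
```

Both printed values agree with the solutions above to four decimal places.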
Fact 8 - Conditions for Approximate
Normality for the Sampling
Distribution of the Sample
Proportion p̂

 May be considered approximately normal
only if both the following conditions hold:

(1) np ≥ 5 and (2) n(1 - p) ≥ 5


Example

Page 379

Do part (c)
Example
Solutions

8(c)

np = 5(0.5) = 2.5 < 5
n(1 - p) = nq = 5(0.5) = 2.5 < 5
Unknown (we cannot conclude that the
sampling distribution of the proportion is
normal in this case)
Example
Solutions

10(c)

np = 500(0.01) = 5 ≥ 5
n(1 - p) = nq = 500(0.99) = 495 ≥ 5
We conclude that the sampling distribution
of the proportion is approximately normal in
this case
Fact 8 continued
 Given a value for p, the minimum sample
size required to produce approximate
normality in the sampling distribution of the
proportion can be found by solving each of
these for n:

np ≥ 5 and nq ≥ 5

and choosing the smallest integer value
that is at least as large as both of these
values for n.
Example

Page 379

Do problem 16
Example
Solutions

16)
np = n(0.05) ≥ 5 ⇒ n ≥ 100
nq = n(0.95) ≥ 5 ⇒ n ≥ 5.26

The minimum sample size is 100
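Solving both inequalities for n and rounding up can be packaged as a small function. This is a sketch under one design assumption: p is passed as a string such as "0.05" and converted to an exact fraction, so that borderline cases like n(0.05) = 5 are not thrown off by floating-point rounding. The function names are illustrative.

```python
from fractions import Fraction
from math import ceil

def min_sample_size(p, threshold=5):
    """Smallest integer n with n*p >= threshold and n*(1 - p) >= threshold.

    Pass p as a string (for example "0.05") to keep the arithmetic exact.
    """
    p = Fraction(p)
    # The smaller of p and 1 - p gives the more demanding condition.
    return ceil(threshold / min(p, 1 - p))

def is_approximately_normal(p, n):
    """Check the Fact 8 conditions for a given p and sample size n."""
    return n >= min_sample_size(p)

print(min_sample_size("0.05"))             # problem 16
print(is_approximately_normal("0.5", 5))   # problem 8(c)
print(is_approximately_normal("0.01", 500))  # problem 10(c)
```

Only the smaller of p and q matters in practice, which is why the nq condition in problem 16 (n ≥ 5.26) has no effect on the answer.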


Fact 9 - Standardizing a
Normal Sampling Distribution
for Proportions
 When the sampling distribution of p̂ is
normal (or approximately normal), we
can standardize to produce the standard
normal Z:
Z = (p̂ - μ_p̂)/σ_p̂ = (p̂ - p)/√(p(1 - p)/n)

 where p is the population proportion of
successes and n is the sample size.
Example

Page 379

Do problem 30
Example
Solutions

30) First check that we can
assume a normal distribution:
np = 5(0.5) = 2.5 < 5
n(1 - p) = nq = 5(0.5) = 2.5 < 5

We cannot conclude that the sampling
distribution of the proportion is normal in
this case and we cannot find the probability.
Example

Page 379

Do problem 32
Example
Solutions

First check that we can assume a
normal distribution for p̂:

np = 500(0.01) = 5 ≥ 5
n(1 - p) = nq = 500(0.99) = 495 ≥ 5
Example
Solutions

For normal distribution use

mean: μ_p̂ = p = 0.01

Standard deviation:

σ_p̂ = √(pq/n) = √((0.01)(0.99)/500) = 0.0044
Example
Solutions

z-score method (Table)

Z = (p̂ - μ_p̂)/σ_p̂ = (0.011 - 0.01)/0.0044 = 0.23
Example
Solutions

using standard normal distribution
and table lookup:

P(Z > 0.23) = 1 - P(Z ≤ 0.23)
= 1 - 0.5910
= 0.4090
From Table T-10
Example
Solutions

It is more accurate to compute Z as

Z = (p̂ - μ_p̂)/σ_p̂ = (0.011 - 0.01)/√((0.01)(0.99)/500) = 0.22
so that:
P(Z > 0.22) = 1 - P(Z ≤ 0.22)
= 1 - 0.5871
= 0.4129
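The gap between 0.4090 and 0.4129 comes entirely from rounding along the way. The sketch below carries full precision through Fact 9 for the same inputs (p = 0.01, n = 500, p̂ = 0.011); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n, p_hat = 0.01, 500, 0.011

# Standard deviation of the sample proportion, without rounding.
sigma_p_hat = sqrt(p * (1 - p) / n)

# Standardize (Fact 9), then take the upper-tail area.
z = (p_hat - p) / sigma_p_hat
prob = 1 - NormalDist().cdf(z)

print(f"z = {z:.4f}, P(Z > z) = {prob:.4f}")
```

The unrounded z falls between 0.22 and 0.23, so the resulting probability falls between the two table-based answers.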
Example

Page 379

Do problem 36
Example
Solutions

First check that we can assume a
normal distribution for p̂:

np = 400(0.5) = 200
n(1 - p) = nq = 400(0.50) = 200

Yes, both are greater than 5


Example
Solutions

Find the value p̂_c

so that

P(p̂ < p̂_c) = 0.90

p̂_c is the 90th percentile of values of p̂


Example
Solutions

For normal distribution use

mean: μ_p̂ = p = 0.5

standard deviation:

σ_p̂ = √(pq/n) = √((0.5)(0.5)/400) = 0.025
Example
Solutions

Using calculator gives:

p̂_c = invNorm(0.90, 0.5, 0.025) = 0.532
Central Limit Theorem for
Proportions
 The sampling distribution of the
sample proportion p̂ follows an
approximately normal distribution with
 mean μ_p̂ = p

 standard deviation σ_p̂ = √(p(1 - p)/n) = √(pq/n)

when the following conditions are
satisfied:
np ≥ 5 and n(1 - p) ≥ 5
Example

Page 379
Example
Solutions

(a) Take p = 0.87 and choose the smallest
integer value of n so that both of
the following are true:
np = n(0.87) ≥ 5
n(1 - p) = nq = n(0.13) ≥ 5

Answer: n = n* = 39
Example
Solutions

(b) Check that we can assume a
normal distribution for p̂ with a
sample size of n = 39

np = 39(0.87) = 33.93
n(1 - p) = nq = 39(0.13) = 5.07

Yes, both are larger than 5


Example
Solutions

(c) The Central Limit Theorem tells us
that the sampling distribution of
p̂ is approximately normal when
n = 39
Example
Use n=50

(d) The Central Limit Theorem gives
that the sampling distribution of
p̂ is approximately normal with
μ_p̂ = p = 0.87

σ_p̂ = √(p(1 - p)/n) = √((0.87)(0.13)/50) = 0.0476
Example
(d) For n=50, find

P(p̂ > 45/50) = P(p̂ > 0.90)

= normalcdf(0.90, 10^99, 0.87, 0.0476) = 0.2643
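The whole of part (d) can be sketched end to end in Python: check the conditions, build the approximating normal distribution, and take the upper-tail area. The values p = 0.87 and n = 50 come from this example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.87, 50

# Conditions for approximate normality (Fact 8).
conditions_hold = n * p >= 5 and n * (1 - p) >= 5

# Central Limit Theorem for Proportions: mean p, sd sqrt(p(1 - p)/n).
p_hat_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

# Probability that the sample proportion exceeds 45/50 = 0.90.
prob = 1 - p_hat_dist.cdf(45 / 50)

print(conditions_hold, f"{prob:.4f}")
```

Because the standard deviation is not rounded to 0.0476 first, the printed probability can differ from 0.2643 in the fourth decimal place.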


Summary
 The sampling distribution of the sample
proportion p̂ for a given sample size n
consists of the collection of the sample
proportions of all possible samples of size n
from the population.

 The approximate normality of the sampling
distribution of the sample proportion kicks in
much quicker when the population
proportion is moderate rather than extreme.
Summary
 According to the Central Limit Theorem for
Proportions, the sampling distribution of the
sample proportion p̂ follows an
approximately normal distribution with mean
μ_p̂ = p and standard deviation σ_p̂ = √(p(1 - p)/n) when
the following conditions are satisfied:
(1) np ≥ 5 and (2) n(1 - p) ≥ 5.

 We can use Fact 9 to find probabilities and
percentiles for sample proportions.
