Section 7: Sampling Distributions & CLT
Introduction to Probability & Statistics
Dr. Oliver Russell
201 - SN1
Lecture 2: Sections 7.2-7.4
RECAP: parameters vs. statistics
Definition
A population parameter is a numerical descriptive measure of a
population. Because it is based on the observations in the population, its
value is almost always unknown.
Definition
A sample statistic is a numerical descriptive measure of a sample. It is
calculated from the observations in the sample.
RECAP: good estimator: summary
In order to make an inference about a population parameter, we choose a
sample statistic whose sampling distribution:
is centered at the parameter (i.e., the statistic is unbiased), and
has a smaller standard deviation than those of the other candidate
statistics (i.e., the statistic has the smallest standard error).
RECAP: sampling distribution of X̄ from a normal population
Theorem
Consider a random sample of n observations selected from a normal
population with mean µ and standard deviation σ. Then the sampling
distribution of X̄ is normal with mean
$$\mu_{\bar X} = \mu$$
and standard deviation
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}.$$
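The mean and standard deviation in this theorem follow from a short calculation, sketched here assuming the n observations X₁, …, Xₙ are independent, each with mean µ and standard deviation σ (independence is what allows the variances to be added):

$$
E[\bar X] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\cdot n\mu = \mu,
\qquad
\operatorname{Var}(\bar X) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n},
$$

so σ_X̄ = σ/√n. The normality of X̄ in this case comes from the fact that a sum of independent normal random variables is itself normal.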
RECAP: Central Limit Theorem (CLT)
Theorem
Consider a random sample of n observations selected from any population
with mean µ and standard deviation σ. Then, when n is sufficiently large,
the sampling distribution of X̄ will be approximately normal with mean
$$\mu_{\bar X} = \mu$$
and standard deviation
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}.$$
RECAP: sampling distribution of X̄
Thus, the CLT says that for large enough n, approximately,
$$\bar X \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
or, equivalently,
$$\frac{\bar X - \mu}{\sigma/\sqrt{n}} = Z \sim N(0, 1).$$
The larger n is, the closer the sampling distribution of X̄ is to a normal
distribution. For most sampled populations, a sample size of n ≥ 30
suffices for the normal approximation to be reasonable.
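A quick simulation can make the CLT concrete. The sketch below is illustrative only: the Exponential(rate = 1) population (mean 1, standard deviation 1, strongly right-skewed), the sample size n = 30, the 10,000 repetitions and the seed are arbitrary choices, not taken from the slides. It draws many samples, records each sample mean, and compares the spread of those means with σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of an
    Exponential(rate=1) population (mean 1, standard deviation 1)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

mu, sigma, n = 1.0, 1.0, 30
means = simulate_sample_means(n, reps=10_000)
se = sigma / math.sqrt(n)  # CLT standard error sigma / sqrt(n), about 0.183

# If Xbar is roughly N(mu, se^2), about 95% of sample means lie within 1.96 se of mu.
share_within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(f"mean of sample means: {statistics.fmean(means):.3f} (mu = {mu})")
print(f"SD of sample means:   {statistics.stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE: {share_within:.3f} (normal predicts 0.950)")
```

Swapping `expovariate` for another population, or lowering n, is an easy way to see when the n ≥ 30 rule of thumb starts to look reasonable.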
Furthermore,
Quick quiz (Example 7.2.1)
At Marianopolis, the R-score of the student population has a mean of 29.2
and a standard deviation of 2.8.
1 If a sample of size n = 49 is selected from the Marianopolis student
population, what approximate distribution does the sample mean
(that is, X̄) obey?
2 What is the probability that the average R-score computed from a
sample of 49 randomly selected Marianopolis students is at least 30?
3 What values of the sample mean, centered around µ = 29.2, are likely
to occur with 95% probability?
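Once the CLT is applied, all three quiz questions reduce to normal calculations. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+), one possible tool among many; the variable names are illustrative:

```python
from statistics import NormalDist

mu, sigma, n = 29.2, 2.8, 49
se = sigma / n ** 0.5           # standard error sigma / sqrt(n) = 2.8 / 7 = 0.4
xbar_dist = NormalDist(mu, se)  # Q1: by the CLT, Xbar is approximately N(29.2, 0.4^2)

# Q2: P(Xbar >= 30) = P(Z >= (30 - 29.2) / 0.4) = P(Z >= 2)
p_at_least_30 = 1 - xbar_dist.cdf(30)

# Q3: central 95% range mu +/- z * se, where z = z_{0.025} is about 1.96
z = NormalDist().inv_cdf(0.975)
low, high = mu - z * se, mu + z * se

print(f"Xbar ~ N({mu}, {se:.1f}^2)")
print(f"P(Xbar >= 30) = {p_at_least_30:.4f}")
print(f"95% range: ({low:.3f}, {high:.3f})")
```

With σ/√n = 2.8/7 = 0.4, question 2 becomes P(Z ≥ 2) ≈ 0.0228, and question 3 gives roughly 29.2 ± 0.78, i.e. about (28.42, 29.98).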
Population vs. sample proportions
Definition
For data with only two possible outcomes (say, success or failure), the
binomial proportion of a population, p, is the proportion of successes in
the population.
Definition
The sample proportion, P̂, is a random variable representing the
proportion of successes in a randomly drawn sample.
Corollary of CLT for proportions
Corollary
By the CLT, if the sample size is large enough, then the random variable
P̂ is also approximately normally distributed, with mean
$$\mu_{\hat P} = p$$
and standard deviation
$$\sigma_{\hat P} = \sqrt{\frac{p(1-p)}{n}}.$$
Sampling distribution of P̂
In other words, if n is large enough, then approximately,
$$\hat P \sim N\!\left(p, \frac{p(1-p)}{n}\right)$$
or, equivalently,
$$\frac{\hat P - p}{\sqrt{p(1-p)/n}} \sim N(0, 1).$$
Here, large enough means n ≥ 30, np ≥ 10 and n(1 − p) ≥ 10.
Quick quiz
Prove the result on the previous slide. Hint: start with the original CLT
(for X̄ ) and recognize that any randomly chosen observation, Xi , in a
dichotomous sample is a Bernoulli(p) random variable.
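One way to carry out the proof, sketched along the lines of the hint: code each observation as Xᵢ = 1 for a success and Xᵢ = 0 for a failure, so that P̂ is exactly the sample mean X̄ of n Bernoulli(p) variables. Since Xᵢ² = Xᵢ,

$$
E[X_i] = 1\cdot p + 0\cdot(1-p) = p,
\qquad
\operatorname{Var}(X_i) = E[X_i^2] - \left(E[X_i]\right)^2 = p - p^2 = p(1-p),
$$

$$
\hat P = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar X
\quad\Longrightarrow\quad
\hat P \approx N\!\left(p, \frac{p(1-p)}{n}\right) \text{ for large } n,
$$

by the CLT applied with µ = p and σ = √(p(1 − p)).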
Quick quiz
60% of a city’s voters favour candidate A for mayor. In a random sample
of 100 voters, what is the probability that fewer than half are in favour of
candidate A?
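A sketch of the voter calculation, assuming the normal approximation from the slides without a continuity correction. The exact binomial sum (via `math.comb`, Python 3.8+) is included only as a cross-check and is not part of the slides' method. Here n = 100, np = 60 and n(1 − p) = 40, so the "large enough" conditions hold.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.60, 100
se = sqrt(p * (1 - p) / n)  # sqrt(0.24 / 100), about 0.049

# Normal approximation: P(P_hat < 0.5) with P_hat approximately N(p, se^2)
approx = NormalDist(p, se).cdf(0.5)

# Cross-check: exact binomial P(X <= 49), X = number of voters in favour
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(50))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The approximation gives z = (0.5 − 0.6)/0.049 ≈ −2.04, so the probability is about 0.02; the small gap to the exact value reflects approximating a discrete count by a continuous distribution.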
Let us assume that 72% of current Marianopolis students exercise at least
5 hours per week (on average). A random sample of 81 students is
selected, and these students are asked to complete a questionnaire about
their health and exercise habits.
1 True or false: 72% of the students in the sample exercise at least 5
hours per week (on average).
2 Let P̂ = “the proportion of students in the sample who exercise 5 or
more hours per week.” What distribution does P̂ obey? Justify your claim.
3 What is the probability that two thirds or fewer of the surveyed
students exercise at least 5 hours per week?
4 What values of the sample proportion, centered around p = 0.72, are
likely to occur with 90% probability?
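A sketch for questions 2 to 4, again using `statistics.NormalDist`; the variable names are illustrative. It first checks the slides' "large enough" rule (n ≥ 30, np ≥ 10, n(1 − p) ≥ 10), which is the justification question 2 asks for.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.72, 81

# Q2: the slides' "large enough" rule; here np = 58.32 and n(1 - p) = 22.68
conditions_ok = n >= 30 and n * p >= 10 and n * (1 - p) >= 10
se = sqrt(p * (1 - p) / n)     # sqrt(0.72 * 0.28 / 81), about 0.0499
phat_dist = NormalDist(p, se)  # P_hat approximately N(0.72, se^2)

# Q3: P(P_hat <= 2/3)
prob_two_thirds_or_less = phat_dist.cdf(2 / 3)

# Q4: central 90% range p +/- z * se, where z = z_{0.05} is about 1.645
z = NormalDist().inv_cdf(0.95)
low, high = p - z * se, p + z * se

print(f"conditions hold: {conditions_ok}")
print(f"P_hat ~ N({p}, {se:.4f}^2)")
print(f"P(P_hat <= 2/3) = {prob_two_thirds_or_less:.4f}")
print(f"90% range: ({low:.3f}, {high:.3f})")
```

For question 3 this gives z ≈ −1.07 and a probability of about 0.14; for question 4, roughly 0.72 ± 0.082. For question 1, note that 0.72 × 81 = 58.32 is not a whole number, so no sample of 81 students can have a sample proportion of exactly 72%.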
Summary: sampling distributions of X̄ and P̂
