Section 7: Sampling Distributions & CLT: Introduction To Probability & Statistics Dr. Oliver Russell
201 - SN1
201 - SN1 Section 7: Sampling Distributions & CLT Lecture 2: Sections 7.2-7.4 1 / 17
RECAP: parameters vs. statistics
Definition
A population parameter is a numerical descriptive measure of a
population. Because it is based on the observations in the population, its
value is almost always unknown.
Definition
A sample statistic is a numerical descriptive measure of a sample. It is
calculated from the observations in the sample.
RECAP: good estimator: summary
A good estimator is unbiased and has a smaller standard deviation than the
other candidate estimators (i.e. it is the statistic with the smallest
standard error).
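The two criteria can be illustrated with a small simulation. This is a minimal sketch, assuming a normal population with hypothetical values µ = 10 and σ = 2, and comparing the sample mean with the sample median as estimators of µ. The names `MU`, `SIGMA`, `N` and `REPS` are illustrative and do not come from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 10.0, 2.0   # hypothetical population mean and standard deviation
N, REPS = 25, 2000      # sample size and number of repeated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Both averages should sit near MU: each estimator is unbiased for this population.
avg_of_means = statistics.mean(means)
avg_of_medians = statistics.mean(medians)

# The standard deviation of a statistic across samples estimates its standard error.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)

print(f"sample mean:   average {avg_of_means:.3f}, standard error {se_mean:.3f}")
print(f"sample median: average {avg_of_medians:.3f}, standard error {se_median:.3f}")
```

Under these assumptions both statistics centre on µ, but the sample mean shows the smaller standard error, which is why it would be preferred here.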
RECAP: sampling distribution of X̄ from normal population
Theorem
Consider a random sample of n observations selected from a normal
population with mean µ and standard deviation σ. Then the sampling
distribution of X̄ is normal with mean
µX̄ = µ
and standard deviation
σX̄ = σ/√n.
RECAP: Central Limit Theorem (CLT)
Theorem
Consider a random sample of n observations selected from any population
with mean µ and standard deviation σ. Then, when n is sufficiently large,
the sampling distribution of X̄ will be approximately normal with mean
µX̄ = µ
and standard deviation
σX̄ = σ/√n.
RECAP: sampling distribution of X̄
Thus, the CLT says that for large enough n, approximately,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right),$$
or, equivalently,
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = Z \sim N(0, 1).$$
The larger n is, the closer the sampling distribution of X̄ is to a normal
distribution. For most sampled populations, a sample size of n ≥ 30 will
suffice for the normal approximation to be reasonable.
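A short simulation can make the n ≥ 30 rule of thumb concrete. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (a skewed, clearly non-normal choice that is not from the lecture), so that µ = σ = 1. All constant names are illustrative.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

RATE = 1.0                 # assumed exponential population (skewed, not normal)
MU = SIGMA = 1.0 / RATE    # an exponential population has mean = sd = 1/rate
N, REPS = 30, 5000         # sample size and number of repeated samples

sample_means = [
    statistics.mean([random.expovariate(RATE) for _ in range(N)])
    for _ in range(REPS)
]

center = statistics.mean(sample_means)    # should be close to MU
spread = statistics.stdev(sample_means)   # should be close to SIGMA / sqrt(N)
theoretical_se = SIGMA / math.sqrt(N)

# If the normal approximation is reasonable, roughly 95% of the
# standardized sample means should fall within +/- 1.96.
z_scores = [(m - MU) / theoretical_se for m in sample_means]
within = sum(abs(z) <= 1.96 for z in z_scores) / REPS

print(f"mean of sample means: {center:.3f} (theory: {MU:.3f})")
print(f"sd of sample means:   {spread:.3f} (theory: {theoretical_se:.3f})")
print(f"share of |Z| <= 1.96: {within:.3f}")
```

Rerunning with a smaller `N` (for example 5) is one way to see the approximation degrade for a skewed population.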
Furthermore,
Quick quiz (Example 7.2.1)
At Marianopolis, the R-score of the student population has a mean of 29.2
and a standard deviation of 2.8.
1 If a sample of size n = 49 is selected from the Marianopolis student
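With the numbers given in this example, the approximate sampling distribution of X̄ can be set up as below. This is only a sketch: the value `THRESHOLD = 30.0` is a hypothetical cutoff chosen for illustration and is not taken from the example.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 29.2, 2.8, 49            # values stated in the example
standard_error = SIGMA / math.sqrt(N)   # sigma / sqrt(n) = 2.8 / 7 = 0.4

# Approximate sampling distribution of the sample mean under the CLT.
xbar_dist = NormalDist(mu=MU, sigma=standard_error)

THRESHOLD = 30.0                        # hypothetical cutoff, NOT from the example
z = (THRESHOLD - MU) / standard_error   # standardized value of the cutoff
prob_below = xbar_dist.cdf(THRESHOLD)   # P(X-bar < THRESHOLD), approximately

print(f"standard error: {standard_error:.3f}")
print(f"z = {z:.2f}, P(X-bar < {THRESHOLD}) is about {prob_below:.4f}")
```

The same two steps (compute σ/√n, then standardize) apply to any cutoff asked about X̄.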
Population vs. sample proportions
Definition
For data that have only two possible outcomes (say, success or failure),
the binomial proportion of a population, p, is the population’s
proportion of successes.
Definition
The sample proportion, P̂, is a random variable representing the
proportion of successes in a randomly drawn sample.
Corollary of CLT for proportions
Corollary
By the CLT, if the sample size is large enough, the random variable P̂ is
also approximately normally distributed with mean
µP̂ = p
and standard deviation
σP̂ = √(p(1 − p)/n).
Sampling distribution of P̂
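Thus, for a large enough sample, approximately,
$$\hat{P} \sim N\left(p, \frac{p(1-p)}{n}\right),$$
or, equivalently,
$$\frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1).$$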
Here, large enough means n ≥ 30, np ≥ 10 and n(1 − p) ≥ 10.
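The three conditions are easy to check mechanically. The helper below is a minimal sketch: `proportion_sampling_summary` is a hypothetical name, and the two calls use illustrative values of n and p rather than values from a specific example.

```python
import math


def proportion_sampling_summary(n: int, p: float) -> dict:
    """Check the rule-of-thumb conditions and return the normal approximation's parameters."""
    conditions_met = n >= 30 and n * p >= 10 and n * (1 - p) >= 10
    return {
        "conditions_met": conditions_met,
        "mean": p,                                     # mean of P-hat
        "standard_error": math.sqrt(p * (1 - p) / n),  # sd of P-hat
    }


# Illustrative inputs: the first satisfies all three conditions,
# the second fails because n * p = 4 is below 10.
print(proportion_sampling_summary(100, 0.60))
print(proportion_sampling_summary(40, 0.10))
```

When `conditions_met` is false, the normal approximation for P̂ should not be relied on, even though the mean and standard error formulas themselves still hold.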
Quick quiz
Prove the result on the previous slide. Hint: start with the original CLT
(for X̄ ) and recognize that any randomly chosen observation, Xi , in a
dichotomous sample is a Bernoulli(p) random variable.
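One possible outline of the argument, following the hint, is sketched below. It assumes the observations X_1, ..., X_n are independent, each equal to 1 for a success and 0 for a failure.

```latex
\begin{align*}
X_i &\sim \text{Bernoulli}(p), \qquad \mu = E[X_i] = p, \qquad \sigma^2 = \operatorname{Var}(X_i) = p(1-p), \\
\hat{P} &= \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}, \\
\bar{X} &\overset{\text{approx.}}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right)
\;\Longrightarrow\;
\hat{P} \overset{\text{approx.}}{\sim} N\!\left(p, \frac{p(1-p)}{n}\right), \\
\frac{\hat{P} - p}{\sqrt{p(1-p)/n}} &\overset{\text{approx.}}{\sim} N(0, 1).
\end{align*}
```

The key step is the second line: the sample proportion is just the sample mean of 0/1 observations, so the CLT for X̄ applies with µ = p and σ² = p(1 − p).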
Quick quiz
60% of a city’s voters favour candidate A for mayor. In a random sample
of 100 voters, what is the probability that fewer than half are in favour of
candidate A?
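The calculation can be sketched as follows, using the normal approximation for P̂ without a continuity correction. The exact binomial probability is computed alongside as a sanity check; treating the voters as independent draws is an assumption of this sketch.

```python
import math
from statistics import NormalDist

P, N = 0.60, 100                               # values from the question
standard_error = math.sqrt(P * (1 - P) / N)    # sqrt(0.6 * 0.4 / 100)

# Normal approximation: P(P-hat < 0.50), no continuity correction.
z = (0.50 - P) / standard_error
prob_normal = NormalDist().cdf(z)

# Exact binomial check: "fewer than half" of 100 means at most 49 successes.
prob_exact = sum(
    math.comb(N, k) * P**k * (1 - P) ** (N - k) for k in range(50)
)

print(f"standard error: {standard_error:.4f}")
print(f"z = {z:.3f}, normal approximation: {prob_normal:.4f}")
print(f"exact binomial probability: {prob_exact:.4f}")
```

Here np = 60 and n(1 − p) = 40, so the rule-of-thumb conditions for the normal approximation are met.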
Quick quiz (Example 7.4.1)
Let us assume that 72% of current Marianopolis students exercise at least
5 hours (on average) per week. A random sample of 81 students is
collected, and the students are asked to complete a questionnaire about
their health and exercise habits.
1 True or false: 72% of the students in the sample exercise at least 5
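To see how the sample proportion behaves in this setting, the sampling can be simulated. This is a sketch under an assumption: each sampled student is modelled as an independent Bernoulli(0.72) draw, which is reasonable only if the student population is large relative to the sample of 81. The function name `sample_proportion` is hypothetical.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

P, N, REPS = 0.72, 81, 5000   # p and n from the example; REPS is illustrative


def sample_proportion() -> float:
    """Draw one sample of N students and return the proportion of successes."""
    successes = sum(random.random() < P for _ in range(N))
    return successes / N


p_hats = [sample_proportion() for _ in range(REPS)]
theoretical_se = math.sqrt(P * (1 - P) / N)

print(f"smallest and largest P-hat: {min(p_hats):.3f}, {max(p_hats):.3f}")
print(f"mean of P-hat: {statistics.mean(p_hats):.3f} (theory: {P})")
print(f"sd of P-hat:   {statistics.stdev(p_hats):.3f} (theory: {theoretical_se:.3f})")
```

The spread of the simulated values shows that P̂ is a random variable: it centres on p but changes from sample to sample.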
Summary: sampling distributions of X̄ and P̂