Section 7: Sampling Distributions & CLT: Introduction To Probability & Statistics Dr. Oliver Russell


Section 7: Sampling Distributions & CLT

Introduction to Probability & Statistics

Dr. Oliver Russell

201 - SN1

Lecture 2: Sections 7.2-7.4

201 - SN1 Section 7: Sampling Distributions & CLT Lecture 2: Sections 7.2-7.4 1 / 17
RECAP: parameters vs. statistics
Definition
A population parameter is a numerical descriptive measure of a
population. Because it is based on the observations in the population, its
value is almost always unknown.

Definition
A sample statistic is a numerical descriptive measure of a sample. It is
calculated from the observations in the sample.

RECAP: good estimator: summary

In order to make an inference about a population parameter, we choose a sample statistic with a sampling distribution that:

is unbiased,

has a smaller standard deviation than the other candidate statistics (i.e. it is the statistic with the smallest standard error).
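The two criteria above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the normal population with µ = 10 and σ = 2, the sample size, and the choice of the sample median as the competing statistic are all hypothetical and do not come from the lecture. It estimates the center and the standard error of the sample mean and the sample median over many repeated samples.

```python
import random
import statistics

# Hypothetical population and simulation settings (illustrative assumptions).
MU, SIGMA = 10.0, 2.0
N, REPS = 25, 4000

rng = random.Random(0)  # fixed seed so the run is reproducible

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Center of each simulated sampling distribution (both should be near MU).
center_mean = statistics.fmean(means)
center_median = statistics.fmean(medians)

# Standard error of each statistic, estimated by the SD across repetitions.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)

print(f"mean:   center = {center_mean:.3f}, standard error = {se_mean:.3f}")
print(f"median: center = {center_median:.3f}, standard error = {se_median:.3f}")
```

For a normal population both statistics are centered at µ, but the sample mean should show the smaller standard error, which is why it is preferred here.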

RECAP: sampling distribution of X̄ from normal population

Theorem
Consider a random sample of n observations selected from a normal population with mean µ and standard deviation σ. Then the sampling distribution of X̄ is normal with mean

$$\mu_{\bar{X}} = \mu$$

and standard deviation

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$

RECAP: Central Limit Theorem (CLT)

Theorem
Consider a random sample of n observations selected from any population with mean µ and standard deviation σ. Then, when n is sufficiently large, the sampling distribution of X̄ will be approximately normal with mean

$$\mu_{\bar{X}} = \mu$$

and standard deviation

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
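A quick simulation can make the theorem concrete. The following is a minimal sketch, assuming a clearly non-normal population (exponential with rate 1, so µ = σ = 1) and hypothetical settings n = 40 and 5000 repetitions; none of these values come from the lecture. It compares the simulated mean and standard deviation of X̄ with µ and σ/√n, and checks how often X̄ lands within 1.96 standard errors of µ.

```python
import math
import random
import statistics

# Illustrative assumptions: exponential population with rate 1 (mu = sigma = 1).
MU, SIGMA = 1.0, 1.0
N, REPS = 40, 5000

rng = random.Random(1)  # fixed seed so the run is reproducible

# Draw REPS samples of size N and record each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(N))
    for _ in range(REPS)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
theoretical_sd = SIGMA / math.sqrt(N)

# Share of sample means within 1.96 standard errors of MU;
# roughly 0.95 is expected if the normal approximation is reasonable.
within = sum(abs(m - MU) <= 1.96 * theoretical_sd for m in sample_means) / REPS

print(f"mean of sample means: {observed_mean:.3f} (theory: {MU})")
print(f"SD of sample means:   {observed_sd:.3f} (theory: {theoretical_sd:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Changing `N` to a small value such as 5 is one way to see the approximation degrade for a skewed population.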

RECAP: sampling distribution of X̄
Thus, the CLT says that for large enough n, approximately,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

or, equivalently,

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = Z \sim N(0, 1).$$
The larger n is, the closer the sampling distribution of X̄ is to a normal distribution. For most sampled populations, sample sizes of n ≥ 30 will suffice for the normal approximation to be reasonable.
Furthermore,

Quick quiz (Example 7.2.1)
At Marianopolis, the R-score of the student population has an average value of 29.2, with a standard deviation of 2.8.
1. If a sample of size n = 49 is selected from the Marianopolis student population, what approximate distribution does the sample mean (that is, X̄) obey?
2. What is the probability that the average R-score computed from a sample of 49 randomly selected Marianopolis students is at least 30?
3. What values of the sample mean, centered around µ = 29.2, are likely to occur with 95% probability?
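One way to check answers to this quiz numerically is sketched below. It is not the lecture's worked solution, only a sketch that applies the CLT result for X̄ using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 29.2, 2.8, 49

# Part 1: by the CLT (n = 49 >= 30), X-bar is approximately N(mu, sigma^2 / n).
standard_error = sigma / math.sqrt(n)  # 2.8 / 7, i.e. about 0.4
xbar_dist = NormalDist(mu=mu, sigma=standard_error)

# Part 2: P(X-bar >= 30) = 1 - P(X-bar < 30).
prob_at_least_30 = 1 - xbar_dist.cdf(30)

# Part 3: central 95% range, mu +/- z * standard error with z = z_{0.975}.
z = NormalDist().inv_cdf(0.975)
lower, upper = mu - z * standard_error, mu + z * standard_error

print(f"standard error:  {standard_error:.3f}")
print(f"P(X-bar >= 30):  {prob_at_least_30:.4f}")
print(f"central 95%:     ({lower:.3f}, {upper:.3f})")
```

With these numbers the standard error is 0.4, so part 2 reduces to P(Z ≥ 2) ≈ 0.0228 and part 3 to roughly 29.2 ± 0.78.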

Population vs. sample proportions

Definition
When discussing data that have only two possible outcomes (say, success
or failure), the binomial proportion of a population, p, is the
population’s proportion of successes.

Definition
The sample proportion, P̂, is a random variable representing the
proportion of successes in a randomly drawn sample.

Corollary of CLT for proportions

Corollary
By the CLT, if the sample size n is large enough, then the random variable P̂ is also approximately normally distributed with mean

$$\mu_{\hat{P}} = p$$

and standard deviation

$$\sigma_{\hat{P}} = \sqrt{\frac{p(1 - p)}{n}}.$$

Sampling distribution of P̂

In other words, if n is large enough, then approximately,

$$\hat{P} \sim N\left(p, \frac{p(1 - p)}{n}\right)$$

or, equivalently,

$$\frac{\hat{P} - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1).$$

Here, large enough means n ≥ 30, np ≥ 10 and n(1 − p) ≥ 10.
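The three "large enough" conditions and the standard deviation of P̂ are easy to encode. The helper names below (`proportion_clt_ok`, `proportion_standard_error`) are hypothetical, introduced only for this sketch, and the example inputs are made up.

```python
import math


def proportion_clt_ok(n: int, p: float) -> bool:
    """Return True when n >= 30, np >= 10 and n(1 - p) >= 10 all hold."""
    return n >= 30 and n * p >= 10 and n * (1 - p) >= 10


def proportion_standard_error(n: int, p: float) -> float:
    """Standard deviation of P-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Made-up inputs: the first satisfies all three conditions,
# the second fails because np = 5 < 10.
ok_large = proportion_clt_ok(200, 0.25)
ok_small_np = proportion_clt_ok(50, 0.10)
se_example = proportion_standard_error(200, 0.25)

print(ok_large, ok_small_np, round(se_example, 4))
```

Checking the conditions before using the normal approximation guards against cases where p is close to 0 or 1 and the distribution of P̂ is still noticeably skewed.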

Quick quiz
Prove the result on the previous slide. Hint: start with the original CLT
(for X̄ ) and recognize that any randomly chosen observation, Xi , in a
dichotomous sample is a Bernoulli(p) random variable.
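Following the hint, one possible outline of the argument is given here. It is a sketch rather than a full proof, and it assumes successes are coded as 1 and failures as 0, so that the sample proportion is just the sample mean of the coded observations.

```latex
\begin{align*}
X_i &\sim \text{Bernoulli}(p), \quad E[X_i] = p, \quad \operatorname{Var}(X_i) = p(1 - p), \\
\hat{P} &= \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}
  \quad \text{(each success counts as 1, each failure as 0)}, \\
\bar{X} &\overset{\text{approx.}}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right)
  \quad \text{with } \mu = p,\ \sigma^2 = p(1 - p) \text{ (CLT, large } n\text{)}, \\
\hat{P} &\overset{\text{approx.}}{\sim} N\!\left(p, \frac{p(1 - p)}{n}\right).
\end{align*}
```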

Quick quiz
60% of a city’s voters favour candidate A for mayor. In a random sample
of 100 voters, what is the probability that fewer than half are in favour of
candidate A?
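A numerical check of this quiz might look like the sketch below. It assumes the plain normal approximation for P̂ with no continuity correction, which is a modelling choice made here rather than something stated on the slide.

```python
import math
from statistics import NormalDist

p, n = 0.60, 100

# Conditions for the normal approximation: n >= 30, np >= 10, n(1 - p) >= 10.
conditions_ok = n >= 30 and n * p >= 10 and n * (1 - p) >= 10

# P-hat is approximately N(p, p(1 - p) / n).
standard_error = math.sqrt(p * (1 - p) / n)
phat_dist = NormalDist(mu=p, sigma=standard_error)

# "Fewer than half" is treated as P(P-hat < 0.5), without continuity correction.
prob_fewer_than_half = phat_dist.cdf(0.5)

print(f"conditions met:  {conditions_ok}")
print(f"standard error:  {standard_error:.4f}")
print(f"P(P-hat < 0.5):  {prob_fewer_than_half:.4f}")
```

The standardized value is (0.5 − 0.6)/0.049 ≈ −2.04, so the probability comes out at about 0.02.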

Quick quiz (Example 7.4.1)
Let us assume that 72% of current Marianopolis students exercise at least 5 hours (on average) per week. A random sample of 81 students is collected, and the students are asked to complete a questionnaire about their health and exercise habits.
1. True or false: 72% of the students in the sample exercise at least 5 hours per week (on average)?
2. Let P̂ = “The proportion of students in the sample who exercise 5 or more hours per week.” What distribution does the variable P̂ obey? Justify your claim.
3. What is the probability that two thirds or less of the surveyed students exercise at least 5 hours per week?
4. What values of the sample proportion, centered around p = 0.72, are likely to occur with 90% probability?
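Parts 2 to 4 can be checked numerically with a sketch like the one below. It is not the lecture's worked solution: it applies the normal approximation for P̂ without a continuity correction, and the variable names are illustrative. For part 1, the statement is false in general, because P̂ varies from sample to sample and need not equal p = 0.72.

```python
import math
from statistics import NormalDist

p, n = 0.72, 81

# Part 2: justify the normal approximation by checking the three conditions.
conditions_ok = n >= 30 and n * p >= 10 and n * (1 - p) >= 10
standard_error = math.sqrt(p * (1 - p) / n)
phat_dist = NormalDist(mu=p, sigma=standard_error)

# Part 3: P(P-hat <= 2/3).
prob_two_thirds_or_less = phat_dist.cdf(2 / 3)

# Part 4: central 90% range, p +/- z * standard error with z = z_{0.95}.
z = NormalDist().inv_cdf(0.95)
lower, upper = p - z * standard_error, p + z * standard_error

print(f"conditions met:   {conditions_ok}")
print(f"standard error:   {standard_error:.4f}")
print(f"P(P-hat <= 2/3):  {prob_two_thirds_or_less:.4f}")
print(f"central 90%:      ({lower:.4f}, {upper:.4f})")
```

Here np = 58.32 and n(1 − p) = 22.68, so all three conditions hold; the standard error is about 0.05, giving a probability near 0.14 for part 3 and a range of roughly 0.72 ± 0.08 for part 4.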

Summary: sampling distributions of X̄ and P̂
