CH 07 CLT

Sampling Distributions
___________________________________________
1) Revisit the difference between a statistic and a
parameter.
2) Discuss factors that determine whether an estimate
of a parameter is ‘good’ or ‘bad’.
3) Define a ‘sampling distribution’ and discuss its
properties.
4) Answer the following burning question: Why do
we take relatively large samples of data?
How can I estimate the number of siblings that
people in this class have?
___________________________________________
Take a sample and calculate:

a) mean
b) median
c) mode
d) (High Score + Low Score) / 2
How do I know which of these options is the best?
1) Working with a known population

Take a sample from a population with
known parameters and calculate different
stats (e.g., mean, median, mode, [High +
Low] / 2) and compare them with the
population parameter.
2) Repeated Samples method

Take population with known parameters and
see how the distributions of the different
statistics compare with the population
parameter.
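The Repeated Samples method can be sketched in a few lines of Python. This is only an illustration: the population of sibling counts is invented (it is not class data), and the sample size of 10 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of sibling counts (invented for illustration).
population = [0] * 15 + [1] * 35 + [2] * 25 + [3] * 15 + [4] * 7 + [6] * 3
mu = statistics.mean(population)  # known population parameter


def midrange(xs):
    """(High Score + Low Score) / 2."""
    return (max(xs) + min(xs)) / 2


estimators = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,  # on ties, returns the first mode seen (Python 3.8+)
    "midrange": midrange,
}


def repeated_samples(estimator, n=10, reps=2000):
    """Apply one statistic to many random samples of size n."""
    return [estimator(random.sample(population, n)) for _ in range(reps)]


results = {}
for name, estimator in estimators.items():
    values = repeated_samples(estimator)
    # Center and spread of this statistic's sampling distribution.
    results[name] = (statistics.mean(values), statistics.stdev(values))

print(f"population mean = {mu:.2f}")
for name, (center, spread) in results.items():
    print(f"{name:>8}: center = {center:.2f}, spread = {spread:.2f}")
```

A statistic whose center sits near the population mean and whose spread is small is doing a better job of estimating the parameter.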
What's a Sampling Distribution?
___________________________________________
Sampling Distribution - probability distribution
of a statistic (e.g., the mean) calculated from
repeated samples of n measurements
We are going to model sampling distributions as
continuous RVs (eventually).
Why is this appropriate?
Because the mean does not have to be a possible
outcome of the experiment (e.g., a sample can
average 1.5 siblings).
What does this buy us?
Because we know how to calculate the area under the
normal curve for continuous RVs, we can
calculate the probability of obtaining a sample
with a given statistic (e.g., mean) from a
population.
How are we going to do this?

Patience, my child. All will be revealed.
Constructing a sampling distribution
___________________________________________
Population: 3 5 7 9 11
μ = 7
σ = 3.2 ≈ 3
How many unique samples could we draw from this
population (without replacement) if n = 2?
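For a population this small, the whole sampling distribution of the mean can be listed by brute force. The sketch below assumes samples are unordered and drawn without replacement. It prints the population standard deviation both ways (dividing by N and by N - 1), since the 3.2 quoted above appears to match the N - 1 version.

```python
from itertools import combinations
from statistics import mean, pstdev, stdev

population = [3, 5, 7, 9, 11]

# Every unique sample of n = 2 drawn without replacement.
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

print(len(samples))          # 10 unique samples
print(sorted(sample_means))  # the sampling distribution of the mean
print(mean(sample_means))    # 7, same as the population mean
print(round(pstdev(population), 2), round(stdev(population), 2))  # 2.83 3.16
print(round(pstdev(sample_means), 2))                             # 1.73
```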
Important things about this example
___________________________________________
1) A sampling distribution can be constructed by
taking repeated samples from the population.
2) This information can be used to determine how
well the sample statistic matches the population.
3) Note in this case that the mean of the sampling
distribution was equal to the mean of the
population, and that the standard deviation of the
sampling distribution was smaller than that of
the population.
4) I still haven’t told you what makes for a good
statistic.
Properties of a good estimator
________________________________________
Point Estimator - rule or formula that tells us how to
use the sample data to calculate a single number
that can be used as an estimate of the population
parameter (really just another word for statistic).
A good point estimator (statistic) is:
(a) unbiased
The mean of the sampling distribution
equals the population parameter.
(b) minimum variability
The variability (standard deviation) of the
sampling distribution is called the Standard
Error. Sometimes referred to as reliability.
Can we control biasedness?

What if the mean of the sampling distribution is
too high/low?
Can we control variability?

a) Choose random samples
b) Choose large samples
If we can only have one, which one do we want?

So, you want to construct a sampling distribution…
________________________________________
Not so fast, Skippy. Can you envision a problem that

might prevent you from constructing a sampling
distribution?
Let’s construct a sampling distribution for n=5 for

this class:
a) How many observations would be in the
sampling distribution?
b) What about samples of 20 at AC?
Can computer technology save us?

Restricted samples: Yes.
Unrestricted samples: No.
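To get a feel for the size of the problem, the number of possible samples can be counted directly. The class size and the enrollment at AC below are hypothetical placeholders, and the count assumes unordered samples drawn without replacement.

```python
from math import comb

class_size = 30     # hypothetical class size
campus_size = 1600  # hypothetical enrollment at AC

samples_class = comb(class_size, 5)     # samples of n = 5 from the class
samples_campus = comb(campus_size, 20)  # samples of n = 20 from AC

print(f"{samples_class:,}")                        # 142,506
print(f"about 10^{len(str(samples_campus)) - 1}")  # order of magnitude only
```

Even with these modest made-up numbers, the second count is far too large to enumerate one sample at a time.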
________________________________________
Is this the end?

Is class dismissed until the final?
Is there no way to save the semester?
Our hero
________________________________________
Central Limit Theorem - When n, the number of
observations in a sample taken from a
population, is sufficiently large (n ≥ 30), the
sampling distribution of M (the mean of the
sample) will be approximately normal.
Further, the larger n gets, the more closely the
sampling distribution will approximate a normal
distribution.
Finally,
a) μ_M = μ
b) σ_M = σ / √n
c) z = (M - μ) / σ_M
     = (M - μ) / (σ / √n)
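A quick simulation can make parts (a) and (b) concrete. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (a skewed distribution with μ = 1 and σ = 1), and the 5,000 repetitions per sample size are an arbitrary choice.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0


def sample_mean(n):
    """Mean of one random sample of n observations."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])


summary = {}
for n in (2, 30, 100):
    means = [sample_mean(n) for _ in range(5000)]
    summary[n] = {
        "mean_of_M": statistics.fmean(means),       # compare with mu
        "sd_of_M": statistics.stdev(means),         # observed standard error
        "sigma_over_root_n": sigma / math.sqrt(n),  # predicted standard error
    }
    print(n, {k: round(v, 3) for k, v in summary[n].items()})
```

The observed standard deviation of M should track σ / √n at every sample size, while a histogram of the means (not drawn here) would look more bell-shaped as n grows.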
Using the CLT: Rush Example
________________________________________
You are deciding whether or not to rush  (it’s a

special Stats Honor Fraternity) and, because you are
the type of person who would rush a Stats Honor
Fraternity, you want to know what the average
intelligence level of the frat is. You ask Eric
Stratton, the Rush Chairman (he seemed real glad to
meet you) what the average GPA in the house is. He
says, “μ = 3.5 and σ = .6”. You randomly poll 36
fraternity members and find that the mean of the
sample is 3.4. What do you conclude?
P(z ≤ [M - μ] / [σ / √n])
P(z ≤ [3.4 - 3.5] / [.6 / √36])
P(z ≤ [-.1 / .1])
P(z ≤ -1) = Area(Tail -1.0) = .1587
Would you alter your conclusion if the mean of the

sample was 3.2? How?
P(z ≤ [M - μ] / [σ / √n])
P(z ≤ [3.2 - 3.5] / [.6 / √36])
P(z ≤ [-.3 / .1])
P(z ≤ -3) = Area(Tail -3.0) = .0013
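Both tail areas can be checked without a z table. The sketch below uses Python's standard-library NormalDist; the helper name lower_tail is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail(sample_mean, mu, sigma, n):
    """Return (z, P(Z <= z)) for a sample mean under the CLT."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error
    return z, NormalDist().cdf(z)


z_a, p_a = lower_tail(3.4, mu=3.5, sigma=0.6, n=36)
z_b, p_b = lower_tail(3.2, mu=3.5, sigma=0.6, n=36)

print(round(z_a, 2), round(p_a, 4))  # -1.0 0.1587
print(round(z_b, 2), round(p_b, 4))  # -3.0 0.0013
```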
More Chips Ahoy
___________________________________________
Remember a few weeks ago, when you and Biff were
trying to figure out the probability that ONE Chips
Ahoy cookie, which is supposed to have 23 chips,
could have as few as 17 chips? Let's say you
re-conduct the experiment, but you're smarter now, so
rather than examine 1 cookie, you collect a sample of
49 cookies (I imagine you got sick after eating the
stimuli). The mean number of chips in your sample
was 20, and the standard deviation was 17.5 chips.
Do you have just cause for a legal action against
Chips Ahoy? In other words, what is the probability
that your sample of cookies was drawn from a
population with μ = 23?
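One way to set this up is sketched below. It assumes the sample standard deviation (17.5) can stand in for the unknown population σ, and that the probability of interest is the lower tail, that is, a sample mean of 20 or less.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean = 23   # chips per cookie the population is supposed to have
sample_mean = 20
sample_sd = 17.5    # used here in place of the unknown population sigma
n = 49

standard_error = sample_sd / sqrt(n)             # 17.5 / 7 = 2.5
z = (sample_mean - claimed_mean) / standard_error
p = NormalDist().cdf(z)                          # P(Z <= z)

print(round(z, 2), round(p, 4))  # -1.2 0.1151
```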
Central Limit Theorem with Proportions
________________________________________
The central limit theorem applies to proportions
just as well as it does to numerical data (e.g., the
coin-flipping example).
Central Limit Theorem with proportions
(p = sample proportion, P = population proportion):
1. The larger the sample, the more normal the
sampling distribution will be.
2. μ_p = P
3. σ_p = √(P(1 - P) / n)
4. z = (p - P) / σ_p
Applying the CLT with proportions: Blood Example
________________________________________
Nine percent of the U.S. population has Type B
blood. What is the probability that at least 12.5% of
a random sample of 400 people will have Type B
blood?
P(p ≥ .125)
σ_p = √(P(1 - P) / n)
    = √(.09 × .91 / 400)
    = .014
P(z ≥ [.125 – .09] / .014)
P(z ≥ 2.5)
Area (Tail: 2.5) = .0062
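The same calculation can be run in Python. The sketch below shows the result twice: once with the standard error rounded to .014 as in the hand calculation, and once unrounded, which gives a slightly larger tail area (roughly .007). The variable names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

P = 0.09        # population proportion with Type B blood
p_hat = 0.125   # sample proportion of interest
n = 400

sigma_p = sqrt(P * (1 - P) / n)
z_exact = (p_hat - P) / sigma_p
p_exact = 1 - NormalDist().cdf(z_exact)          # upper tail, unrounded

# Same calculation with the standard error rounded to .014, as in the example.
z_rounded = (p_hat - P) / 0.014
p_rounded = 1 - NormalDist().cdf(z_rounded)

print(round(sigma_p, 4))                         # 0.0143
print(round(z_rounded, 2), round(p_rounded, 4))  # 2.5 0.0062
print(round(z_exact, 2), round(p_exact, 4))
```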

CLT with proportions: Christmas Example
________________________________________
Sixty percent of the U.S. Population believes that

Christmas presents should be opened on Christmas
morning, as opposed to Christmas Eve. What is the
probability that 65 people out of a random sample of
125 will agree that Christmas morning is the
appropriate time to open presents?
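A possible setup for this one is sketched below. The question does not say which tail is wanted, so the code assumes the reading "65 or fewer people agree" (a lower tail); a different reading would change the last step.

```python
from math import sqrt
from statistics import NormalDist

P = 0.60           # population proportion favoring Christmas morning
n = 125
p_hat = 65 / n     # sample proportion, 0.52

sigma_p = sqrt(P * (1 - P) / n)
z = (p_hat - P) / sigma_p
p_lower = NormalDist().cdf(z)   # assumed reading: 65 or fewer agree

print(round(p_hat, 2), round(sigma_p, 4), round(z, 2))  # 0.52 0.0438 -1.83
print(round(p_lower, 4))
```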
Why do we sample?
________________________________________
1) To ensure an unbiased estimator (i.e., random
sample).
2) To decrease the variability of our estimator (i.e.,
increase its reliability).
3) To enable us to use the Central Limit Theorem as a
way of modeling chance variation in our sample.
