CH 07 CLT


Sampling Distributions

___________________________________________

1) Revisit the difference between a statistic and a parameter.

2) Discuss factors that determine whether an estimate of a parameter is ‘good’ or ‘bad’.

3) Define a ‘sampling distribution’ and discuss its properties.

4) Answer the following burning question: Why do we take relatively large samples of data?
How can I estimate the number of siblings that
people in this class have?
___________________________________________

Take a sample and calculate:


a) mean
b) median
c) mode
d) (High Score + Low Score) / 2

How do I know which of these options is the best?

1) Working with a known population

Take a sample from a population with known parameters, calculate different statistics (e.g., mean, median, mode, [High + Low] / 2), and compare them with the population parameter.

2) Repeated Samples method

Take repeated samples from a population with known parameters and see how the distributions of the different statistics compare with the population parameter.
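
The Repeated Samples method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sibling counts in `population` are made up, the sample size of 10 and the 2000 repeats are arbitrary, and the mode is left out to keep the sketch short.

```python
import random
import statistics

def midrange(values):
    """(High Score + Low Score) / 2."""
    return (max(values) + min(values)) / 2

def compare_estimators(population, n, repeats, seed=0):
    """Draw `repeats` samples of size n (without replacement) and summarize
    each statistic's distribution as (mean of estimates, SD of estimates)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [rng.sample(population, n) for _ in range(repeats)]
    estimators = {"mean": statistics.mean,
                  "median": statistics.median,
                  "midrange": midrange}
    summary = {}
    for name, estimator in estimators.items():
        estimates = [estimator(sample) for sample in samples]
        summary[name] = (statistics.mean(estimates), statistics.pstdev(estimates))
    return summary

# Hypothetical population of sibling counts (made-up data, not from the class).
population = [0] * 10 + [1] * 30 + [2] * 25 + [3] * 15 + [4] * 10 + [5] * 6 + [6] * 3 + [9]
mu = statistics.mean(population)  # the known population parameter

summary = compare_estimators(population, n=10, repeats=2000)
for name, (center, spread) in summary.items():
    print(f"{name}: center={center:.2f}, spread={spread:.2f}, population mean={mu:.2f}")
```

A statistic whose center sits close to the population mean and whose spread is small is the better candidate.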
What's a Sampling Distribution?
___________________________________________

Sampling Distribution - the probability distribution of a statistic calculated from repeated samples of n measurements

We are going to model sampling distributions as continuous RVs (eventually).

Why is this appropriate?

Because the mean does not have to be a possible outcome of the experiment.

What does this buy us?

Because we know how to calculate the area under the normal curve for continuous RVs, we can calculate the probability of obtaining a sample with a given statistic (e.g., mean) from a population.

How are we going to do this?

Patience, my child. All will be revealed.
Constructing a sampling distribution
___________________________________________

3 5 7 9 11

μ = 7
σ = 3.2 ≈ 3

How many unique samples could we draw from this population (without replacement) if n = 2?
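
For a population this small, the whole sampling distribution of the mean can be enumerated directly. The sketch below lists every unordered sample of size 2 drawn without replacement; the variable names are illustrative only.

```python
from itertools import combinations
import statistics

population = [3, 5, 7, 9, 11]

# Every unordered sample of n = 2 drawn without replacement.
samples = list(combinations(population, 2))
sample_means = [statistics.mean(sample) for sample in samples]

print(len(samples))                     # 10 unique samples
print(sorted(sample_means))             # the sampling distribution of the mean
print(statistics.mean(sample_means))    # 7, the same as the population mean
print(statistics.pstdev(sample_means))  # spread of the sample means
print(statistics.pstdev(population))    # spread of the population, for comparison
```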
Important things about this example
___________________________________________

1) A sampling distribution can be constructed by taking repeated samples from the population.

2) This information can be used to determine how well the sample statistic matches the population.

3) Note in this case that the mean of the sampling distribution was equal to the mean of the population, and that the standard deviation of the sampling distribution was smaller than that of the population.

4) Still haven’t told you what makes for a good statistic.
Properties of a good estimator
________________________________________

Point Estimator - rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter (really just another word for statistic).

A good point estimator (statistic) is:

(a) unbiased
The mean of the sampling distribution equals the mean of the population.
(b) minimum variability
The standard deviation of the sampling distribution is called the Standard Error. Low variability is sometimes referred to as reliability.

Can we control bias?

What if the mean of the sampling distribution is too high/low?

Can we control variability?

a) Choose random samples
b) Choose large samples

If we can only have one, which one do we want?



So, you want to construct a sampling distribution…
________________________________________

Not so fast, Skippy. Can you envision a problem that might prevent you from constructing a sampling distribution?

Let’s construct a sampling distribution for n = 5 for this class:
a) How many observations would be in the
sampling distribution?

b) What about samples of 20 at AC?
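
To get a feel for the size of the problem, the number of distinct samples can be counted with `math.comb`. The class size of 40 and the college enrollment of 1500 below are hypothetical placeholders, since the slides do not give either figure.

```python
import math

class_size = 40        # hypothetical class size (not given in the slides)
college_size = 1500    # hypothetical enrollment at AC (not given in the slides)

# Number of distinct samples drawn without replacement, ignoring order.
class_samples = math.comb(class_size, 5)
college_samples = math.comb(college_size, 20)

print(class_samples)              # 658008 samples of n = 5 from a class of 40
print(len(str(college_samples)))  # number of digits in the count for n = 20
```

Even a modest population makes the full sampling distribution far too large to build by hand.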

Can computer technology save us?

Restricted samples: Yes.
Unrestricted samples: No.
________________________________________

Is this the end?

Is class dismissed until the final?
Is there no way to save the semester?
Our hero
________________________________________

Central Limit Theorem - When n, the number of observations in a sample taken from a population, is sufficiently large (n ≥ 30), the sampling distribution of M (the mean of the sample) will be approximately normal.

Further, the larger n gets, the more closely the sampling distribution will approximate a normal distribution.

Finally,
a) μ_M = μ
b) σ_M = σ / √n
c) z = (M − μ) / σ_M
= (M − μ) / (σ / √n)
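
The claims in (a) and (b) can be checked with a small simulation. This is a sketch under assumptions not in the slides: the population is an exponential distribution with mean 2 (chosen because it is clearly non-normal and has σ = μ), n = 36, and 5000 repeated samples.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
mu = 2.0                  # mean of the assumed exponential population
sigma = 2.0               # an exponential distribution has sigma equal to its mean
n = 36                    # sample size
repeats = 5000            # number of repeated samples

# Mean M of each repeated sample of n observations.
sample_means = []
for _ in range(repeats):
    sample = [rng.expovariate(1 / mu) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
observed_se = statistics.pstdev(sample_means)    # should be close to sigma / sqrt(n)
predicted_se = sigma / math.sqrt(n)

print(f"mean of M: {mean_of_means:.3f} (mu = {mu})")
print(f"SD of M:   {observed_se:.3f} (sigma / sqrt(n) = {predicted_se:.3f})")
```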
Using the CLT: Rush Example
________________________________________

You are deciding whether or not to rush a special Stats Honor Fraternity and, because you are the type of person who would rush a Stats Honor Fraternity, you want to know what the average intelligence level of the frat is. You ask Eric Stratton, the Rush Chairman (he seemed real glad to meet you), what the average GPA in the house is. He says, “μ = 3.5 and σ = .6”. You randomly poll 36 fraternity members and find that the mean of the sample is 3.4. What do you conclude?

P(z ≤ [M − μ] / [σ / √n])

P(z ≤ [3.4 − 3.5] / [.6 / √36])
P(z ≤ [−.1 / .1])
P(z ≤ −1) = Area(Tail −1.0) = .1587

Would you alter your conclusion if the mean of the sample was 3.2? How?

P(z ≤ [M − μ] / [σ / √n])

P(z ≤ [3.2 − 3.5] / [.6 / √36])
P(z ≤ [−.3 / .1])
P(z ≤ −3) = Area(Tail −3.0) = .0013
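
Both tail areas can be reproduced without a z table. The sketch below uses `NormalDist` from Python's standard library; the helper name `lower_tail_for_mean` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_for_mean(sample_mean, mu, sigma, n):
    """Return (z, P(Z <= z)) for a sample mean, using the CLT formulas."""
    standard_error = sigma / sqrt(n)          # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error   # z = (M - mu) / sigma_M
    return z, NormalDist().cdf(z)             # lower-tail area of the standard normal

z_a, p_a = lower_tail_for_mean(3.4, mu=3.5, sigma=0.6, n=36)
z_b, p_b = lower_tail_for_mean(3.2, mu=3.5, sigma=0.6, n=36)

print(round(z_a, 2), round(p_a, 4))  # -1.0 0.1587
print(round(z_b, 2), round(p_b, 4))  # -3.0 0.0013
```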
More Chips Ahoy
___________________________________________

Remember a few weeks ago, when you and Biff were trying to figure out the probability that ONE Chips Ahoy cookie, which is supposed to have 23 chips, could have as few as 17 chips? Let's say you re-conduct the experiment, but you're smarter now, so rather than examine 1 cookie, you collect a sample of 49 cookies (I imagine you got sick after eating the stimuli). The mean number of chips in your sample was 20, and the standard deviation was 17.5 chips. Do you have just cause for a legal action against Chips Ahoy? In other words, what is the probability that your sample of cookies was drawn from a population with μ = 23?
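
One way to set this problem up is sketched below. It assumes the sample standard deviation of 17.5 can stand in for σ (the slides do not give a population value) and that the quantity of interest is the lower tail, P(M ≤ 20).

```python
from math import sqrt
from statistics import NormalDist

mu_claimed = 23      # chips per cookie the population is supposed to have
sample_mean = 20     # mean chips in the sample
s = 17.5             # sample standard deviation, used here in place of sigma (assumption)
n = 49               # cookies in the sample

standard_error = s / sqrt(n)                   # 17.5 / 7 = 2.5
z = (sample_mean - mu_claimed) / standard_error
p_lower = NormalDist().cdf(z)                  # P(Z <= z)

print(round(z, 2), round(p_lower, 4))  # -1.2 0.1151
```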
Central Limit Theorem with Proportions
________________________________________

The central limit theorem applies to proportional data just as well as it does to numerical data (e.g., coin-flipping example).

Central Limit Theorem with proportions (p = sample proportion, P = population proportion):

1. The larger the sample, the more normal the sampling distribution will be.

2. μ_p = P

3. σ_p = √(P[1 − P] / n)

4. z = (p − P) / σ_p
Applying the CLT with proportions: Blood Example
________________________________________

Nine percent of the U.S. population has Type B blood. What is the probability that at least 12.5% of a random sample of 400 people will have Type B blood?

P(p ≥ .125)

σ_p = √(P[1 − P] / n)
= √([.09][.91] / 400)
= .014

P(z ≥ [.125 – .09] / .014)

P(z ≥ 2.5)

Area (Tail: 2.5) = .0062
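
The same calculation can be checked in Python. The sketch below computes the answer twice: once with the standard error rounded to .014, as on the slide, and once unrounded, to show how much the rounding moves the tail area. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P = 0.09        # population proportion with Type B blood
n = 400         # sample size
p_hat = 0.125   # sample proportion of interest

standard_error = sqrt(P * (1 - P) / n)        # about 0.0143 before rounding

# Slide version: standard error rounded to .014, giving z = 2.5.
z_rounded = (p_hat - P) / 0.014
tail_rounded = 1 - NormalDist().cdf(z_rounded)

# Unrounded version for comparison.
z_exact = (p_hat - P) / standard_error
tail_exact = 1 - NormalDist().cdf(z_exact)

print(round(z_rounded, 2), round(tail_rounded, 4))  # 2.5 0.0062
print(round(z_exact, 2), round(tail_exact, 4))
```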


CLT with proportions: Christmas Example
________________________________________

Sixty percent of the U.S. population believes that Christmas presents should be opened on Christmas morning, as opposed to Christmas Eve. What is the probability that 65 people out of a random sample of 125 will agree that Christmas morning is the appropriate time to open presents?
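
A possible setup for this problem is sketched below. It assumes the question means "65 or fewer" (a lower tail, since 65 / 125 = .52 is below .60); the slides leave the direction unstated.

```python
from math import sqrt
from statistics import NormalDist

P = 0.60             # population proportion favoring Christmas morning
n = 125              # sample size
p_hat = 65 / n       # sample proportion, .52

standard_error = sqrt(P * (1 - P) / n)   # sigma_p = sqrt(P[1 - P] / n)
z = (p_hat - P) / standard_error         # z = (p - P) / sigma_p
p_lower = NormalDist().cdf(z)            # P(p <= .52), assuming a lower-tail question

print(round(standard_error, 4), round(z, 2), round(p_lower, 4))
```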
Why do we sample?
________________________________________

1) To ensure an unbiased estimator (i.e., random sample).

2) To decrease the variability of our estimator (i.e., increase its reliability).

3) To enable us to use the Central Limit Theorem as a way of modeling chance variation in our sample.
