c08 Sampling
Bias
Any sampling procedure that produces inferences that consistently overestimate or consistently underestimate some characteristic of the population is said to be biased.

To eliminate any possibility of bias in the sam-

Sample Mean
To calculate the average, or mean, add all values, then divide by the number of individuals:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

where $\bar{X}$ is the special symbol of the sample mean and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ denotes its value, or the realization of $\bar{X}$.
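The difference between the random variable $\bar{X}$ and a realization $\bar{x}$ can be illustrated with a minimal Python sketch. The population (the integers 1 to 100) and the helper name `sample_mean_realizations` are hypothetical and chosen only for illustration; each repeated random sample yields a different realized value of the same statistic.

```python
import random
from statistics import mean

def sample_mean_realizations(population, n, repeats, seed=0):
    """Draw `repeats` simple random samples of size n (without replacement)
    and return the realized sample mean of each one."""
    rng = random.Random(seed)
    return [mean(rng.sample(population, n)) for _ in range(repeats)]

# Hypothetical population: the integers 1..100, whose mean is 50.5.
population = list(range(1, 101))

# Five realizations of the sample mean for samples of size 10.
realizations = sample_mean_realizations(population, n=10, repeats=5)
print(realizations)
```

Because the samples are drawn at random, the realized means scatter around the population mean rather than falling consistently above or below it.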
38 Chapter 8. Fundamental Sampling Distributions and Data Descriptions
NOTE. The mean is the balance point. It is the "center of mass".

EXAMPLE 8.1. The weights of a group of students (in lbs) are given below:

135 105 118 163 172 183 122 150 121 162

Find the mean. If another student joins the group and his weight is 250 lbs, what would be the new mean?

Find the median. If another student joins the group and his weight is 250 lbs, what would be the new median?

Sample Mode
The mode of a data set is the value that occurs most frequently.

The cases are unimodal, bimodal, multimodal, and no mode. The mode is/are the value(s) whose frequencies are the largest (the peaks).

EXAMPLE 8.3. The weights of three groups of students (in lbs) are given below:

(a) 135, 105, 118, 163, 172, 183, 122, 150
(b) 135, 105, 118, 163, 172, 183, 122, 135
(c) 135, 135, 118, 118, 122, 118, 122, 135

Find the mode for each group.

The sample variance $S^2$ is used to describe the variation around the mean. We use

$$s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right] = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}$$

NOTE. Properties of Standard Deviation

• s measures spread about the mean and should be used only when the mean is the measure of center.
• s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
• s gets larger as the observations become more spread out about their mean.
• s has the same units of measurement as the original observations.

NOTE. The standard deviation of a population is defined by $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$, where N is the population size and µ is the population mean. Be careful: the denominator inside the square root is N, not N − 1.

EXAMPLE 8.4. Calculate the sample variance and the sample standard deviation of the following set of data:

0 1 −2 −3 9
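The descriptive statistics above can be checked with a short Python sketch based on the standard `statistics` module. The helper names `shortcut_variance` and `modes` are illustrative, not part of the notes; `modes` adopts the convention that a data set in which every value occurs exactly once has no mode.

```python
from statistics import mean, median, multimode, stdev, variance

weights = [135, 105, 118, 163, 172, 183, 122, 150, 121, 162]  # Example 8.1
with_new = weights + [250]                                     # a 250 lb student joins

def shortcut_variance(xs):
    """Sample variance via the computing formula
    (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

def modes(xs):
    """Most frequent value(s); an empty list when every value occurs once."""
    found = multimode(xs)
    return [] if len(found) == len(xs) else found

group_a = [135, 105, 118, 163, 172, 183, 122, 150]  # Example 8.3 (a)
group_b = [135, 105, 118, 163, 172, 183, 122, 135]  # Example 8.3 (b)
group_c = [135, 135, 118, 118, 122, 118, 122, 135]  # Example 8.3 (c)
data = [0, 1, -2, -3, 9]                             # Example 8.4

print(mean(weights), mean(with_new))      # the mean is pulled up by the outlier
print(median(weights), median(with_new))  # the median moves much less
print(modes(group_a), modes(group_b), modes(group_c))
print(variance(data), shortcut_variance(data), stdev(data))
```

The definitional formula (used by `statistics.variance`) and the computing formula agree on the Example 8.4 data, which is a useful sanity check when working by hand.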
8.3 Sampling Distributions

Sampling Distribution
In general, the sampling distribution of a given statistic is the distribution of the values taken by the statistic in all possible samples of the same size from the same population.

In other words, if we repeatedly collect samples of the same size from the population, compute a statistic (mean, standard deviation, proportion) for each sample, and then draw a histogram of those values, the distribution that the histogram tends toward is called the sampling distribution of that statistic.

NOTE. The statistical applets are good tools to study the sampling distribution. Check out the Rice University Applets at http://onlinestatbook.com/stat_sim/sampling_dist/index.html.

8.4 Sampling Distribution of Means and the Central Limit Theorem

8.4.1 Sampling Distribution of Sample Means from a Normal Population

Mean and Standard Deviation of a Sample Mean
Theorem. Let $\bar{X}$ be the sample mean of a random sample of size n drawn from a population having mean µ and standard deviation σ. Then the mean of $\bar{X}$ is

$$\mu_{\bar{X}} = \mu$$

and the standard deviation of $\bar{X}$ is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$

EXAMPLE 8.6. Prove the above theorem.

NOTE.
• The sample mean $\bar{X}$ is an unbiased estimator of the population mean µ and is less variable than a single observation.
• The variation of $\bar{X}$ is much smaller than that of the population. The standard deviation of $\bar{X}$ decreases as the sample size n increases.
• The above results do NOT require any assumptions on the shape of the population. However, a random sample is a must.

EXAMPLE 8.7. The mean and standard deviation of the strength of a packaging material are 55 kg and 6 kg, respectively. A quality manager takes a random sample of specimens of this material and tests their strength. If the manager wants to reduce the standard deviation of $\bar{X}$ to 1.5 kg, how many specimens should be tested?

EXAMPLE 8.8. A soft-drink machine is regulated so that the amount of drink dispensed averages 240 milliliters with a standard deviation of 15 milliliters. Periodically, the machine is checked by taking a sample of 40 drinks and computing the average content. If the mean of the 40 drinks is a value within the interval $\mu_{\bar{X}} \pm 2\sigma_{\bar{X}}$, the machine is thought to be operating satisfactorily; otherwise, adjustments are made. The company official found the mean of 40 drinks to be $\bar{x} = 236$ milliliters and concluded that the machine needed no adjustment. Was this a reasonable decision?

Sampling Distribution of Sample Means from a Normal Population
Theorem. Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean of a random sample of size n drawn from a normal population having mean µ and standard deviation σ. Then $\bar{X}$ follows an exact normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$. That is,

$$X_i \sim N(\mu, \sigma) \implies \bar{X} \sim N\left(\mu, \sigma/\sqrt{n}\right).$$

EXAMPLE 8.9. Prove the above theorem.

NOTE.
• One of the essential assumptions is a random sample.

(b) What is the probability that the average content of the bottles in a 12-pack of beer is less than 339 ml?
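The relation between the standard deviation of the sample mean and the sample size, σ/√n, can be explored numerically. The following is a minimal Python sketch applied to the numbers in Examples 8.7 and 8.8; the function names are illustrative and not part of the notes.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def sample_size_for(sigma, target_se):
    """Smallest n such that sigma / sqrt(n) <= target_se."""
    return math.ceil((sigma / target_se) ** 2)

def within_two_se(xbar, mu, sigma, n):
    """True when xbar lies inside mu +/- 2 * sigma / sqrt(n)."""
    half_width = 2 * standard_error(sigma, n)
    return mu - half_width <= xbar <= mu + half_width

# Example 8.7: sigma = 6 kg, target standard deviation of the mean 1.5 kg.
print(sample_size_for(sigma=6, target_se=1.5))

# Example 8.8: mu = 240 ml, sigma = 15 ml, n = 40, observed mean 236 ml.
print(standard_error(15, 40))
print(within_two_se(xbar=236, mu=240, sigma=15, n=40))
```

Halving the standard deviation of the mean requires four times as many observations, since n enters only through its square root.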
EXAMPLE 8.11. A patient is classified as having gestational diabetes if the glucose level is above 140 milligrams per deciliter (mg/dl) one hour after a sugary drink is ingested. Sheila's measured glucose level one hour after ingesting the sugary drink varies according to the normal distribution with µ = 125 mg/dl and σ = 10 mg/dl.

(a) If a single glucose measurement is made, what is the probability that Sheila is diagnosed as having gestational diabetes?

(b) If measurements are made on three separate days and the mean result is compared with the criterion 140 mg/dl, what is the probability that Sheila is diagnosed as having gestational diabetes?

(c) What is the level L such that there is probability only 5% that the mean glucose level of three test results falls above L for Sheila's glucose level distribution?

8.4.2 The Central Limit Theorem (CLT)

... approximate probability statement concerning the sample mean, without knowledge of the shape of the population distribution.

• Again, one of the essential assumptions is a random sample.
• The distribution of $\bar{X}$ is approximately normal, for sufficiently large n, even if the random sample is from a population other than normal.
• How large a sample size? Usually, it would be safe to apply the CLT if n ≥ 30. It also depends on the population distribution, however. More observations are required if the population distribution is far from normal.

EXAMPLE 8.12. The time a family physician spends seeing a patient follows some right-skewed distribution with a mean of 15 minutes and a standard deviation of 11.6 minutes.

(a) Can you calculate the probability that the doctor spends less than 12 minutes with the next patient she sees? If so, do it. If not, explain why.

(b) What is the probability that the doctor spends an average time between 13 and 18 minutes with her 30 patients of the day?

(c) One day, 35 patients have an appointment to see the doctor. What is the probability that she will have to work overtime, beyond her 8-hour shift?
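Probabilities for a sample mean, as in Examples 8.11 and 8.12, can be sketched with `statistics.NormalDist` from the Python standard library. The helper `mean_dist` is an illustrative name. For Example 8.11 the normal model is exact because the population is normal; for Example 8.12 it is only the CLT approximation (n = 30 or 35 from a right-skewed population), so those values should be read as approximate.

```python
import math
from statistics import NormalDist

def mean_dist(mu, sigma, n):
    """Normal model for the sample mean: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / math.sqrt(n))

# Example 8.11: glucose level ~ N(125, 10), criterion 140 mg/dl.
p_single = 1 - mean_dist(125, 10, 1).cdf(140)   # (a) one measurement
p_mean3 = 1 - mean_dist(125, 10, 3).cdf(140)    # (b) mean of three measurements
level_L = mean_dist(125, 10, 3).inv_cdf(0.95)   # (c) 5% of means fall above L

# Example 8.12: mu = 15 min, sigma = 11.6 min, population shape unknown.
visits30 = mean_dist(15, 11.6, 30)
p_between = visits30.cdf(18) - visits30.cdf(13)  # (b) CLT approximation

# (c) Overtime means the total exceeds 480 minutes,
# i.e. the mean of 35 visits exceeds 480 / 35 minutes.
p_overtime = 1 - mean_dist(15, 11.6, 35).cdf(480 / 35)

print(p_single, p_mean3, level_L)
print(p_between, p_overtime)
```

Part (a) of Example 8.12 has no counterpart here on purpose: a single visit time follows the unspecified skewed population, so the normal model for the mean does not apply to it.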
Theorem. If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the differences of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

So,

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \overset{\cdot}{\sim} N(0, 1).$$

NOTE. If both samples are from normal populations, the sampling distribution of $\bar{X}_1 - \bar{X}_2$ will be exactly normal, instead of approximately normal.

EXAMPLE 8.13. We take a random sample of five 10-year-old boys and four 10-year-old girls and measure their heights. Suppose that we know that heights $X_1$ of 10-year-old boys follow a normal distribution with mean 55.7 inches and standard deviation 2.9 inches, and that heights $X_2$ of 10-year-old girls follow a normal distribution with mean 54.1 inches and standard deviation 2.6 inches. What is the probability that the mean height of the girls in the sample is smaller than the mean height of the boys in the sample?

EXAMPLE 8.14. A study of bulimia among college women examines the connection between childhood sexual abuse and a measure of family cohesion (the higher the score, the greater the cohesion). Assume that sexually abused students have an average family cohesion scale of 2.8 and a standard deviation of 2.1, while non-abused students have an average scale of 4.8 and a standard deviation of 3.2. What is the probability that a random sample of 49 non-abused students will have an average family cohesion scale that is at least 0.5 points higher than the average scale of a random sample of 36 sexually abused students? What can you conclude?

8.5 Sampling Distribution of S²

Distribution of (n − 1)S²/σ²
If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with ν = n − 1 degrees of freedom.

NOTE (Degrees of Freedom). There are n degrees of freedom, or independent pieces of information, in the random sample from the normal distribution. When the data (the values in the sample) are used to compute the mean (i.e., when µ is replaced by $\bar{x}$), a degree of freedom is lost in the estimation of µ. Hence, there are (n − 1) remaining degrees of freedom in the information used to estimate $\sigma^2$.

Let $\chi^2_\alpha(\nu)$ be the $\chi^2$ value above which we find an area of α under the curve of the chi-squared distribution with ν degrees of freedom. That is,

$$P\left(\chi^2(\nu) > \chi^2_\alpha(\nu)\right) = \alpha.$$

We use Table A.5 to find these critical values of the chi-squared distribution with ν degrees of freedom.

EXAMPLE 8.15. Find the critical values

(a) $\chi^2_{0.95}(4)$
(b) $\chi^2_{0.75}(22)$

EXAMPLE 8.16. Find k such that $P\left(\chi^2(12) < k\right) = 0.80$.

EXAMPLE 8.17. Use Table A.5 to give the best estimate of each of the following probabilities.

(a) $P\left(\chi^2(5) \ge 3\right)$
(b) $P\left(\chi^2(8) > 3.33\right)$
(c) $P\left(\chi^2(10) \le 6.66\right)$
(d) $P\left(\chi^2(25) > 99.9\right)$

8.6 t-Distribution

We have learned that $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ (exactly or approximately) follows the standard normal distribution, where the data are from a random sample of size n from the population with mean µ and standard deviation σ. And, it is very likely that both µ and σ are unknown parameters. In practice, it suffices that the distribution is symmetric and single-peaked unless the sample is very small.

Since most of the simple work in statistical inference focuses on the unknown population mean µ, we will need to deal with the unknown σ, especially when n is not large. It is quite intuitive and natural to estimate the unknown population standard deviation σ using the sample standard deviation S.

We have another statistic $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ as an analogous sample version of $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
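The behavior of the statistics $(n-1)S^2/\sigma^2$, Z, and T can be explored by simulation. The sketch below is a minimal Python example under assumed settings (a standard normal population, n = 5, 20000 repetitions, a fixed seed); the function name `simulate` and the cutoff 2 used to compare tails are illustrative choices, not part of the notes.

```python
import math
import random

def simulate(n, mu, sigma, repeats, seed=0):
    """Draw `repeats` normal samples of size n and return, for each sample,
    the statistics (n-1)S^2/sigma^2, Z, and T."""
    rng = random.Random(seed)
    chi2_vals, z_vals, t_vals = [], [], []
    for _ in range(repeats):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance
        chi2_vals.append((n - 1) * s2 / sigma ** 2)
        z_vals.append((xbar - mu) / (sigma / math.sqrt(n)))
        t_vals.append((xbar - mu) / (math.sqrt(s2) / math.sqrt(n)))
    return chi2_vals, z_vals, t_vals

def tail_fraction(values, cutoff):
    """Fraction of values whose absolute value exceeds the cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)

chi2_vals, z_vals, t_vals = simulate(n=5, mu=0.0, sigma=1.0, repeats=20000)

# A chi-squared variable has mean equal to its degrees of freedom, here n - 1 = 4.
print(sum(chi2_vals) / len(chi2_vals))

# T uses S in place of sigma, so it lands beyond +/-2 more often than Z does.
print(tail_fraction(z_vals, 2.0), tail_fraction(t_vals, 2.0))
```

With a sample this small, replacing σ by S visibly fattens the tails, which is why T is not treated as standard normal when n is not large.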