Topic07 Written
• The resulting number is called a point estimate. It can be regarded as a sensible value
for θ, and is obtained by selecting a suitable statistic and computing its value from the
given sample data.
Parameter θ              Statistic θ̂
Mean µ                   X̄
Variance σ²              S²
Standard deviation σ     S
Proportion p             P̂
Example 7.1. An automobile manufacturer has developed a new type of bumper, which is
supposed to absorb impacts with less damage than previous bumpers. The manufacturer has
used this bumper in a sequence of 25 controlled crashes against a wall, each at 15 km/h, using
one of its compact car models. Let X = number of crashes that result in no visible damage
to the automobile. The parameter to be estimated is p = proportion of all such crashes that
result in no damage. If X is observed to be x = 15, find a point estimate of the parameter and
state the corresponding point estimator.
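The computation in Example 7.1 is a single division of the observed count by the number of trials. Below is a minimal Python sketch that uses only the numbers stated in the example (n = 25 crashes, x = 15 with no visible damage). The helper name `sample_proportion` is hypothetical.

```python
def sample_proportion(successes: int, trials: int) -> float:
    """Point estimate p-hat = x / n for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


# Example 7.1: x = 15 undamaged crashes out of n = 25
p_hat = sample_proportion(15, 25)
print(p_hat)  # 0.6
```

The estimator is the random quantity P̂ = X/n. The estimate is its value for this sample, 0.6.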
Example 7.2. A study reported observations on X = voids filled with asphalt (%) for 52
specimens of a certain type of hot-mix asphalt. Find a point estimate of the variance σ² of the
population distribution given that $\sum (x_i - \bar{x})^2 = 2097.4124$, and state the corresponding point
estimator.
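The point estimate of σ² divides the given sum of squared deviations by n − 1, the divisor used in S². Here is a small sketch using the numbers from Example 7.2 (n = 52, sum of squared deviations 2097.4124). The function name is illustrative only.

```python
def sample_variance_from_ss(sum_sq_dev: float, n: int) -> float:
    """Point estimate s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return sum_sq_dev / (n - 1)


# Example 7.2: n = 52 specimens, sum of squared deviations = 2097.4124
s2 = sample_variance_from_ss(2097.4124, 52)
print(round(s2, 4))  # 41.1257
```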
Example 7.3. Consider the accompanying 20 observations on dielectric breakdown voltage for
pieces of epoxy resin.
Consider the following estimators and compute the corresponding estimates. Which of the
estimates do you think is closest to the true value?
a. estimator = $\bar{X} = \dfrac{\sum X_i}{n}$
b. estimator = $\dfrac{\min(X_i) + \max(X_i)}{2}$
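The two estimators of Example 7.3 can be written as plain functions. The 20 breakdown-voltage observations are not reproduced in this text. The short list below is therefore purely hypothetical illustrative data, not the epoxy-resin sample. The function names are also illustrative.

```python
from statistics import mean


def estimator_mean(xs):
    """Estimator (a): the sample mean, sum(X_i) / n."""
    return mean(xs)


def estimator_midrange(xs):
    """Estimator (b): the midrange, (min(X_i) + max(X_i)) / 2."""
    return (min(xs) + max(xs)) / 2


# Hypothetical illustrative sample (NOT the epoxy-resin data of Example 7.3)
sample = [2.0, 3.0, 4.0, 10.0]
print(estimator_mean(sample))      # 4.75
print(estimator_midrange(sample))  # 6.0
```

The midrange depends only on the two most extreme observations. A single unusual value, such as 10.0 in this toy list, therefore moves it more than it moves the sample mean.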
Without knowing the true value, we will not know which of the estimates is closest to the
true value. A typical way to choose an estimator is to find estimators that have some specified
desirable property and then find the best estimator in this restricted group. A popular property
of this sort in the statistical community is unbiasedness.
Example 7.4. Suppose we have two measuring instruments; one instrument has been accurately
calibrated, but the other systematically gives readings larger than the true value being
measured. When each instrument is used repeatedly on the same object, because of measurement
error, the observed measurements will not be identical. However, the measurements
produced by the first instrument will be distributed about the true value in such a way that
on average this instrument measures what it purports to measure, so it is called an unbiased
instrument. The second instrument yields observations that have a systematic error component
or bias.
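A point estimator θ̂ is said to be an unbiased estimator of θ if
$$E(\hat{\theta}) = \theta$$
for every possible value of θ. If θ̂ is biased, its bias is
$$\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$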
Example 7.5. Let X be a binomial random variable with parameters n and p, i.e. X ∼ Bin(n, p).
Show that the sample proportion $\hat{P} = X/n$ is an unbiased estimator of p.
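A Monte Carlo check can complement the algebraic argument E(P̂) = E(X)/n = np/n = p, but it does not replace it. The sketch draws many Bin(n, p) counts and averages X/n. The values n = 25 and p = 0.6, the seed, and the helper name are arbitrary illustrative choices.

```python
import random


def simulate_mean_p_hat(n, p, reps, seed=0):
    """Average of X/n over `reps` simulated Bin(n, p) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # X = number of successes in n independent Bernoulli(p) trials
        x = sum(1 for _ in range(n) if rng.random() < p)
        total += x / n
    return total / reps


avg = simulate_mean_p_hat(n=25, p=0.6, reps=20000)
print(avg)  # close to 0.6, as unbiasedness predicts
```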
a. Show that the sample mean X̄ is an unbiased estimator of µ and the sample variance S²
is an unbiased estimator of σ².
b. Determine whether the estimator $\dfrac{(n-1)S^2}{n}$ is an unbiased estimator of σ². If it is not
unbiased, find the bias.
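A simulation sketch can accompany the algebra in parts a and b. It assumes a normal population with σ = 2 and n = 5; both are arbitrary choices, and the helper name is hypothetical. In Python's `statistics` module, `variance` uses divisor n − 1, which gives S². `pvariance` uses divisor n, which equals (n − 1)S²/n.

```python
import random
import statistics


def simulate_variance_estimators(n, sigma, reps, seed=1):
    """Average S^2 and (n-1)S^2/n over many normal samples of size n."""
    rng = random.Random(seed)
    sum_s2 = 0.0
    sum_biased = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        sum_s2 += statistics.variance(xs)       # divisor n - 1, i.e. S^2
        sum_biased += statistics.pvariance(xs)  # divisor n, i.e. (n-1)S^2/n
    return sum_s2 / reps, sum_biased / reps


mean_s2, mean_biased = simulate_variance_estimators(n=5, sigma=2.0, reps=10000)
print(mean_s2)      # close to sigma^2 = 4
print(mean_biased)  # close to (n-1)/n * sigma^2 = 3.2
```

The gap between the two averages, about σ²/n = 0.8 here, is the size of the bias that part b asks for.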
Among all estimators of θ that are unbiased, choose the one that has minimum variance.
The resulting θ̂ is called the minimum variance unbiased estimator (MVUE) of θ.
Figure (a) below shows distributions of two different unbiased estimators. Use of the estimator
with the more concentrated distribution is more likely than the other one to result in an estimate
closer to θ. Figure (b) displays estimates from the two estimators based on 10 different samples.
The MVUE is, in a certain sense, the most likely among all unbiased estimators to produce an
estimate close to the true θ.
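The idea behind Figure (a) can be imitated numerically with the two estimators of Example 7.3. When the population is symmetric, both the sample mean and the midrange are unbiased for µ. The sketch assumes N(0, 1) data with n = 20, which is an illustrative choice. It compares the sampling variances of the two estimators; for a normal population, X̄ is known to be the MVUE of µ. The function name is hypothetical.

```python
import random
import statistics


def compare_spread(n, reps, seed=2):
    """Sampling variance of the sample mean and of the midrange for N(0, 1) data."""
    rng = random.Random(seed)
    means, midranges = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(xs) / n)
        midranges.append((min(xs) + max(xs)) / 2)
    return statistics.pvariance(means), statistics.pvariance(midranges)


var_mean, var_mid = compare_spread(n=20, reps=5000)
print(var_mean, var_mid)  # the sample mean has the smaller (more concentrated) spread
```

The variance of the sample mean should land near σ²/n = 0.05. The midrange's variance comes out noticeably larger.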
Note 7.1. In some situations, it is possible to obtain an estimator with small bias that would
be preferred to the best unbiased estimator.
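Besides reporting the value of a point estimate, some indication of its precision should be given.
The standard error of an estimator θ̂ is its standard deviation, $\sigma_{\hat{\theta}} = \sqrt{\operatorname{Var}(\hat{\theta})}$.
If the standard error itself involves unknown parameters whose values can be estimated,
substitution of these estimates into $\sigma_{\hat{\theta}}$ yields the estimated standard error of the estimator:
$$\hat{\sigma}_{\hat{\theta}} = s_{\hat{\theta}}$$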
Example 7.9. Recall Example 7.1, where we found that a point estimator of the parameter p
is $\hat{P} = X/n$.
a. What is the standard error of P̂?
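Because Var(X) = np(1 − p) for a binomial count, Var(P̂) = p(1 − p)/n. The standard error therefore depends on the unknown p, and substituting p̂ gives the estimated standard error. The sketch below uses the Example 7.1 data (x = 15, n = 25). The helper name is hypothetical.

```python
import math


def se_proportion(p, n):
    """Standard error of P-hat = X/n: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# Example 7.1 data: x = 15, n = 25, so p-hat = 0.6
p_hat = 15 / 25
est_se = se_proportion(p_hat, 25)  # estimated standard error: p replaced by p-hat
print(round(est_se, 4))  # 0.098
```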
• Parameter of interest: µ.
d. Show that $P\left(\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$.
• Interpretation: We are 95% confident that the population mean lies between the lower
limit and the upper limit.
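The "95% confident" statement describes the procedure: over many repeated samples, about 95% of the intervals constructed this way contain µ. The simulation below illustrates this under assumed values µ = 10, σ = 2 and n = 10, all chosen only for illustration. The function name is hypothetical.

```python
import math
import random


def coverage(mu, sigma, n, reps, z=1.96, seed=3):
    """Fraction of intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


cov = coverage(mu=10.0, sigma=2.0, n=10, reps=10000)
print(cov)  # close to 0.95
```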
$$100(1-\alpha)\%\ \text{CI for } \mu = \left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
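A 100(1 − α)% CI for the mean µ of a normal population when the value of σ is known is
given by
$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
or, equivalently, by
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$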
Confidence level    $z_{\alpha/2}$
90%                 1.645
95%                 1.96
99%                 2.576
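The critical values $z_{\alpha/2}$ for common confidence levels can be reproduced with `statistics.NormalDist.inv_cdf` from Python's standard library. A minimal sketch follows; the function name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```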
Example 7.12. Industrial engineers who specialize in ergonomics are concerned with designing
workspace and worker-operated devices so as to achieve high productivity and comfort. A sample
of n = 31 trained typists was selected, and the preferred keyboard height was determined for
each typist. The resulting sample average preferred height was x̄ = 80.0 cm. Assuming that
the preferred height is normally distributed with σ = 2.0 cm, obtain a 95% confidence interval
for µ, the true average preferred height for the population of all experienced typists.
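The known-σ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ takes only a few lines to compute. The sketch plugs in the values stated in Example 7.12 (n = 31, x̄ = 80.0 cm, σ = 2.0 cm, 95%). The helper `z_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """100(1 - alpha)% CI for mu when sigma is known: xbar +/- z_{alpha/2} sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 7.12: n = 31, xbar = 80.0 cm, sigma = 2.0 cm, 95% confidence
lo, hi = z_interval(80.0, 2.0, 31, 0.95)
print(round(lo, 2), round(hi, 2))  # 79.3 80.7
```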
Example 7.13. The production process for engine control housing units of a particular type
has recently been modified. Prior to this modification, historical data had suggested that the
distribution of hole diameters for bushings on the housings was normal with a standard deviation
of 0.100 mm. It is believed that the modification has not affected the shape of the distribution or
the standard deviation, but that the value of the mean diameter may have changed. A sample
of 40 housing units is selected and the hole diameter is determined for each one, resulting in a
sample mean diameter of 5.426 mm. Calculate a confidence interval for the true average hole
diameter using a confidence level of 90%.
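The same known-σ interval applies to Example 7.13 (n = 40, x̄ = 5.426 mm, σ = 0.100 mm, 90%). This is a self-contained sketch; the helper name is hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """100(1 - alpha)% CI for mu when sigma is known: xbar +/- z_{alpha/2} sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 7.13: n = 40, xbar = 5.426 mm, sigma = 0.100 mm, 90% confidence
lo, hi = z_interval(5.426, 0.100, 40, 0.90)
print(round(lo, 3), round(hi, 3))  # 5.4 5.452
```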
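The sample size n necessary for a 100(1 − α)% CI for the mean µ of a normal population,
when the value of σ is known and the error of estimation is to be at most B, can be derived as follows:
$$100(1-\alpha)\%\ \text{CI for } \mu = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Let $B = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$. Then
$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{B} \implies n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$
Since n must be an integer, round this value up to the next whole number.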
Example 7.14. Extensive monitoring of a computer time-sharing system has suggested that
response time to a particular editing command is normally distributed with standard deviation
25 millisec. A new operating system has been installed, and we wish to estimate the true
average response time µ for the new environment. Assuming that response times are still
normally distributed with σ = 25, what sample size is necessary to ensure that the resulting
95% confidence interval has a width of at most 10?
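The sample-size formula $n = (z_{\alpha/2}\,\sigma/B)^2$, rounded up, is easy to evaluate in code. In Example 7.14 the interval width is 2B, so a width of at most 10 means B = 5. A sketch with a hypothetical helper name:

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma, bound, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound: n = (z sigma / B)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / bound) ** 2)


# Example 7.14: sigma = 25 ms, width at most 10, so bound B = 10 / 2 = 5
print(sample_size_mean(25, 5, 0.95))  # 97
```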
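When n is sufficiently large, $\bar{X}$ is approximately distributed as
$$N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
because
– $E(\bar{X}) = \mu$
– $\operatorname{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$
– $\bar{X}$ is approximately normal by the Central Limit Theorem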
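A 100(1 − α)% CI for the mean µ of a population when n is sufficiently large is given by
$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$
or, equivalently, by
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
If σ is unknown, the sample standard deviation s may be used in place of σ.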
The sample size necessary for a 100(1 − α)% CI for the mean µ of a population when n is
sufficiently large and an error bound of B is given by
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2$$
Example 7.16. The time (in minutes) for a certain chemical reaction is to be determined in
a sample of size n. If the investigator believes that almost all times in the distribution are
between 320 and 440, what sample size would be appropriate for estimating the true average
time to within 5 minutes with a confidence level of 95%?
• In this example, σ is unknown. The value of s is also not available before the data has
been gathered.
• By being conservative and guessing a larger value of s, an n larger than necessary will be
chosen. The investigator may be able to specify a reasonably accurate value of the population
range (the difference between the largest and smallest values). If the population
distribution is not too skewed, dividing the range by 4 gives a ballpark value of what s
might be.
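Example 7.16 combines two pieces: the range-over-4 rule from the bullet above and the large-sample sample-size formula. The range 440 − 320 = 120 gives a guessed σ of 30 minutes, with B = 5 and 95% confidence. This is a sketch under that rough guess for σ; the helper name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma, bound, confidence):
    """n = (z_{alpha/2} * sigma / B)^2, rounded up to the next integer."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / bound) ** 2)


# Example 7.16: range 440 - 320 = 120, so sigma is guessed as 120 / 4 = 30
sigma_guess = (440 - 320) / 4
print(sample_size_mean(sigma_guess, 5, 0.95))  # 139
```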
Let X1, X2, . . ., Xn be a random sample from a population of zeros and ones with probability
of success p. If n is sufficiently large (np ≥ 10 and n(1 − p) ≥ 10), then approximately
$$\hat{P} = \bar{X} \sim N\!\left(p, \frac{p(1-p)}{n}\right)$$
because
• $E(\bar{X}) = p$
• $\operatorname{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$
• $\bar{X}$ is approximately normal by the Central Limit Theorem
With n sufficiently large, $\operatorname{Var}(\bar{X}) \approx \dfrac{\hat{p}(1-\hat{p})}{n}$.
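A 100(1 − α)% CI for the proportion p of a population when n is sufficiently large is given
by
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
or, equivalently, by
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$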
The sample size necessary for a 100(1 − α)% CI for the proportion p of a population when
n is sufficiently large and an error bound of B is given by
$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2 p(1-p) \quad \text{if a value of } p \text{ is available}$$
$$n = \left(\frac{z_{\alpha/2}}{2B}\right)^2 \quad \text{otherwise}$$
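The large-sample proportion interval and both sample-size formulas can be sketched together. For illustration the interval uses the Example 7.1 data (p̂ = 0.6, n = 25), which meets np̂ ≥ 10 and n(1 − p̂) ≥ 10. B = 0.05 is an arbitrary illustrative bound. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_crit(confidence):
    """z_{alpha/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_interval(p_hat, n, confidence):
    """Large-sample CI: p_hat +/- z_{alpha/2} sqrt(p_hat(1 - p_hat)/n)."""
    half = z_crit(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def sample_size_proportion(bound, confidence, p=None):
    """(z/B)^2 p(1 - p) when a value of p is available, else the conservative (z/(2B))^2."""
    z = z_crit(confidence)
    if p is None:
        return math.ceil((z / (2 * bound)) ** 2)
    return math.ceil((z / bound) ** 2 * p * (1 - p))


lo, hi = proportion_interval(0.6, 25, 0.95)
print(round(lo, 3), round(hi, 3))
print(sample_size_proportion(0.05, 0.95))         # conservative n
print(sample_size_proportion(0.05, 0.95, p=0.5))  # same n: p(1 - p) peaks at 0.25
```

The conservative formula coincides with the p = 0.5 case because p(1 − p) ≤ 1/4 for every p.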
Example 7.17. Let X1, X2, . . ., Xn be a random sample from a population of zeros and ones
with probability of success p. Show that $E(\bar{X}) = p$ and $\operatorname{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$.
Example 7.18. Derive a formula for the sample size necessary for a 100(1 − α)% confidence
interval for p from a population when n is sufficiently large and B is a bound on the error of
estimation.
Example 7.19. Referring to the formula derived above, explain why the necessary sample size is
maximized at p = 0.5.
Example 7.20. The article “Repeatability and Reproducibility for Pass/Fail Data” (J. of
Testing and Eval., 1997: 151–153) reported that in n = 48 trials in a particular laboratory, 16
resulted in ignition of a particular type of substrate by a lighted cigarette. Let p denote the
long-run proportion of all such trials that would result in ignition.
b. Find the sample size necessary to ensure a width of 0.10 assuming p = 0.33.
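For Example 7.20 the point estimate is p̂ = 16/48. For part b, a width of 0.10 corresponds to B = 0.05. Part b as shown here does not state the confidence level, so the sketch ASSUMES 95%. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_width(width, p, confidence):
    """n = (z/B)^2 p(1 - p) with B = width / 2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    bound = width / 2
    return math.ceil((z / bound) ** 2 * p * (1 - p))


p_hat = 16 / 48  # point estimate from the n = 48 trials
print(round(p_hat, 3))  # 0.333

# Part b, ASSUMING 95% confidence (the level is not shown in this part)
print(sample_size_for_width(0.10, 0.33, 0.95))  # 340
```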
– Parameter of interest: µ.
– The population distribution is normal, i.e., X ∼ N(µ, σ²).
– The population standard deviation σ is unknown.
Then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
• $t_{n-1}$: t distribution with n − 1 degrees of freedom.
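A 100(1 − α)% CI for the mean µ of a normal population when σ is unknown is given
by
$$\left(\bar{x} - t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}}\right)$$
or, equivalently, by
$$\bar{x} \pm t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}}$$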
• To check whether a sample is from a normal population, one can use a normal probability
plot or normal quantile-quantile plot (Q-Q plot). If the sample observations
are in fact drawn from a normal distribution with some mean value and standard deviation,
the points should fall close to a straight line. Thus a plot for which the points fall close
to some straight line suggests that the assumption of a normal population distribution is
plausible. (The interested reader is referred to Section 4.6 of the textbook for further
details.)
(Drawing the Q-Q plot is beyond the scope of this course. The emphasis here is to know
how to interpret the plot.)
Example 7.21. Consider the following sample of fat content (in percentage) of n = 10 randomly
selected hot dogs:
25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5
a. Check if the sample is from a normal population.
b. Compute a 95% confidence interval for the population mean fat content.
Exercises
Section 6.1: 1, 7, 8, 9, 11, 13
Section 7.1: 1, 2, 3, 4, 5; Section 7.2: 13, 19, 23, 25; Section 7.3: 28, 29, 33(b)(c), 35(a), 37(a)