Sec. 6.7 T Test
6.7 Small-Sample Tests for the Difference Between Two Means 439
advertiser provided each couple in a random sample of 500 married couples with a new type
of TV remote control that is supposed to be easier to find when needed. Of the 500 husbands,
62% said that the new remote was easier to find than their old one. Of the 500 wives, only
54% said the new remote was easier to find. Let $p_1$ be the population proportion of married
men who think that the new remote is easier to find, and let $p_2$ be the corresponding
proportion of married women. Can the statistic $\hat{p}_1 - \hat{p}_2 = 0.62 - 0.54$ be used
to test $H_0: p_1 - p_2 = 0$ versus $H_1: p_1 - p_2 \neq 0$? If so, perform the test and
compute the P-value. If not, explain why not.
14. The following MINITAB output presents the results of a hypothesis test for the difference p1 − p2 between two
population proportions.
15. The following MINITAB output presents the results of a hypothesis test for the difference p1 − p2 between two
population proportions. Some of the numbers are missing. Fill in the numbers for (a) through (d).
is therefore between 0.025 and 0.05 (see Figure 6.13). We conclude that the mean count
is lower when the enzyme is present.
FIGURE 6.13 The null distribution is Student’s t with seven degrees of freedom. The
observed value of the test statistic is 2.038, which lies between 1.895 (upper-tail area
0.05) and 2.365 (upper-tail area 0.025). If H0 is true, the probability that t takes on
a value as extreme as or more extreme than that observed is between 2.5% and 5%.
Example 6.12
Good website design can make Web navigation easier. The article “The Implications
of Visualization Ability and Structure Preview Design for Web Information Search
Tasks” (H. Zhang and G. Salvendy, International Journal of Human-Computer Interaction,
2001:75–95) presents a comparison of item recognition between two designs.
A sample of 10 users using a conventional Web design averaged 32.3 items identified,
with a standard deviation of 8.56. A sample of 10 users using a new structured Web
design averaged 44.1 items identified, with a standard deviation of 10.09. Can we
conclude that the mean number of items identified is greater with the new structured
design?
Solution
Let $\bar{X} = 44.1$ be the sample mean for the structured Web design. Then $s_X = 10.09$ and
$n_X = 10$. Let $\bar{Y} = 32.3$ be the sample mean for the conventional Web design. Then
$s_Y = 8.56$ and $n_Y = 10$. Let $\mu_X$ and $\mu_Y$ denote the population mean measurements
made by the structured and conventional methods, respectively. The null and alternate
hypotheses are
$$H_0: \mu_X - \mu_Y \le 0 \quad \text{versus} \quad H_1: \mu_X - \mu_Y > 0$$
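The test statistic is
$$t = \frac{(\bar{X} - \bar{Y}) - 0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}$$
Substituting the values of $\bar{X}$, $\bar{Y}$, $s_X$, $s_Y$, $n_X$, and $n_Y$, we compute
$t = 2.820$. Under $H_0$, this statistic has an approximate Student’s t distribution; the
number of degrees of freedom, computed from the sample variances and sample sizes and
rounded down, is 17.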
Consulting the t table with 17 degrees of freedom, we find that the value cutting
off 1% in the right-hand tail is 2.567, and the value cutting off 0.5% in the right-
hand tail is 2.898. Therefore the area in the right-hand tail corresponding to values as
extreme as or more extreme than the observed value of 2.820 is between 0.005 and
0.010. Therefore 0.005 < P < 0.01 (see Figure 6.14). There is strong evidence that
the mean number of items identified is greater for the new design.
FIGURE 6.14 Solution to Example 6.12. The P-value is the area in the right-hand tail,
which is between 0.005 and 0.01. The observed value 2.820 lies between 2.567 (upper-tail
area 0.01) and 2.898 (upper-tail area 0.005).
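As a check on the 17 degrees of freedom used in Example 6.12, the following worked computation substitutes the sample values into the degrees-of-freedom formula for unequal variances given in the Summary of this section. Intermediate values are rounded for display, so the last digit may differ slightly from a full-precision calculation.

```latex
\nu = \frac{\left(\dfrac{10.09^2}{10} + \dfrac{8.56^2}{10}\right)^2}
           {\dfrac{(10.09^2/10)^2}{9} + \dfrac{(8.56^2/10)^2}{9}}
    = \frac{(10.181 + 7.327)^2}{11.517 + 5.966}
    = \frac{306.54}{17.483}
    \approx 17.53
```

Rounding down to the nearest integer gives $\nu = 17$.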
The following computer output (from MINITAB) presents the results from Example 6.12.
Note that the 95% lower confidence bound is consistent with the alternate hypothesis.
This indicates that the P-value is less than 5%.
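To see why a lower confidence bound above 0 goes together with a P-value below 5%, the bound can be reconstructed by hand. This is a sketch that assumes the standard t-table value $t_{17,.05} = 1.740$ (brought in from a t table, not shown in this section) and uses the sample values from Example 6.12:

```latex
(\bar{X} - \bar{Y}) - t_{17,\,.05}\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}
  = 11.8 - (1.740)(4.184)
  \approx 4.52
```

Because the bound is greater than 0, the value $\mu_X - \mu_Y = 0$ is excluded at the 95% level, which corresponds to rejecting $H_0: \mu_X - \mu_Y \le 0$ at the 5% level.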
The methods described in this section can be used to test a hypothesis that two
population means differ by a specified constant. Example 6.13 shows how.
Example 6.13
Refer to Example 6.12. Can you conclude that the mean number of items identified
with the new structured design exceeds that of the conventional design by more
than 2?
Solution
The null and alternate hypotheses are
$$H_0: \mu_X - \mu_Y \le 2 \quad \text{versus} \quad H_1: \mu_X - \mu_Y > 2$$
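A sketch of how the test statistic could be computed for these hypotheses, reusing the sample values and the 17 degrees of freedom from Example 6.12. The t-table values 2.110 (upper-tail area 0.025) and 2.567 (upper-tail area 0.01) are standard table entries assumed here.

```latex
t = \frac{(\bar{X} - \bar{Y}) - 2}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}
  = \frac{(44.1 - 32.3) - 2}{\sqrt{10.09^2/10 + 8.56^2/10}}
  = \frac{9.8}{4.184}
  \approx 2.342
```

Since $2.110 < 2.342 < 2.567$, this calculation places the P-value between 0.01 and 0.025.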
Navidi-3810214 book November 11, 2013 14:8
Summary
Let $X_1, \ldots, X_{n_X}$ and $Y_1, \ldots, Y_{n_Y}$ be samples from normal populations with
means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, respectively. Assume
the samples are drawn independently of each other.
If $\sigma_X$ and $\sigma_Y$ are not known to be equal, then, to test a null hypothesis of the
form $H_0: \mu_X - \mu_Y \le \Delta_0$, $H_0: \mu_X - \mu_Y \ge \Delta_0$, or $H_0: \mu_X - \mu_Y = \Delta_0$:
■ Compute
$$\nu = \frac{[(s_X^2/n_X) + (s_Y^2/n_Y)]^2}{[(s_X^2/n_X)^2/(n_X - 1)] + [(s_Y^2/n_Y)^2/(n_Y - 1)]},$$
rounded down to the nearest integer.
■ Compute the test statistic
$$t = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}.$$
■ Compute the P-value. The P-value is an area under the Student’s t curve
with $\nu$ degrees of freedom, which depends on the alternate hypothesis as
follows:
Alternate Hypothesis                    P-value
$H_1: \mu_X - \mu_Y > \Delta_0$         Area to the right of $t$
$H_1: \mu_X - \mu_Y < \Delta_0$         Area to the left of $t$
$H_1: \mu_X - \mu_Y \neq \Delta_0$      Sum of the areas in the tails cut off by $t$ and $-t$
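The three steps in this Summary can be sketched in Python using only the standard library. The function names `t_upper_tail` and `welch_t_test` are hypothetical, chosen for illustration. The tail area is approximated by numerically integrating the t density with Simpson's rule rather than read from a table, so it is an approximation, not an exact value. The sample values come from Example 6.12.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must
    be even) and uses the symmetry of the distribution about 0.
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - total * h / 3  # area to the right of |t|
    return upper if t >= 0 else 1.0 - upper


def welch_t_test(xbar, s_x, n_x, ybar, s_y, n_y, delta0=0.0, alternative="greater"):
    """Two-sample t test without assuming equal population variances."""
    vx, vy = s_x ** 2 / n_x, s_y ** 2 / n_y
    # Step 1: degrees of freedom, rounded down to the nearest integer.
    nu = math.floor((vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1)))
    # Step 2: test statistic.
    t = ((xbar - ybar) - delta0) / math.sqrt(vx + vy)
    # Step 3: P-value, depending on the alternate hypothesis.
    if alternative == "greater":
        p = t_upper_tail(t, nu)            # area to the right of t
    elif alternative == "less":
        p = t_upper_tail(-t, nu)           # area to the left of t, by symmetry
    elif alternative == "two-sided":
        p = 2 * t_upper_tail(abs(t), nu)   # both tails cut off by t and -t
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, nu, p


# Example 6.12: structured design (X) versus conventional design (Y).
t, nu, p = welch_t_test(44.1, 10.09, 10, 32.3, 8.56, 10, alternative="greater")
print(f"t = {t:.3f}, nu = {nu}, P = {p:.4f}")
```

For Example 6.13, the same hypothetical function would be called with `delta0=2`.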
Example 6.14
Two methods have been developed to determine the nickel content of steel. In a
sample of five replications of the first method on a certain kind of steel, the average
measurement (in percent) was $\bar{X} = 3.16$ and the standard deviation was $s_X = 0.042$.
The average of seven replications of the second method was $\bar{Y} = 3.24$ and the standard
deviation was $s_Y = 0.048$. Assume that it is known that the population variances are
nearly equal. Can we conclude that there is a difference in the mean measurements
between the two methods?
Solution
Substituting the sample sizes $n_X = 5$ and $n_Y = 7$ along with the sample standard
deviations $s_X = 0.042$ and $s_Y = 0.048$, we compute the pooled standard deviation,
obtaining $s_p = 0.0457$.
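The pooled standard deviation above can be verified with the following worked substitution, which uses the pooled formula stated in the Summary for equal variances later in this section:

```latex
s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}
    = \sqrt{\frac{(4)(0.042)^2 + (6)(0.048)^2}{10}}
    = \sqrt{0.002088}
    \approx 0.0457
```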
The value of the test statistic is therefore
$$t = \frac{3.16 - 3.24}{0.0457\sqrt{1/5 + 1/7}} = -2.990$$
Under $H_0$, the test statistic has the Student’s t distribution with 10 degrees of freedom.
Consulting the Student’s t table, we find that the area under the curve in each tail is
between 0.01 and 0.005. Since the null hypothesis stated that the means are equal, this
is a two-tailed test, so the P-value is the sum of the areas in both tails. We conclude
that 0.01 < P < 0.02 (see Figure 6.15). There does appear to be a difference in the
mean measurements between the two methods.
FIGURE 6.15 Solution to Example 6.14. The P-value is the sum of the areas in both
tails, which is between 0.01 and 0.02. The values −2.990 and 2.990 each cut off a tail
area between 0.005 and 0.01.
In situations where the sample variances are nearly equal, it is tempting to assume
that the population variances are nearly equal as well. This assumption is not justified,
however, because it is possible for the sample variances to be nearly equal even when
the population variances are quite different. Computer packages often offer a choice of
assuming variances to be equal or unequal. The best practice is to assume the variances to
be unequal unless it is quite certain that they are equal. See the discussion in Section 5.6.
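A small simulation can illustrate how often sample variances look similar when the population variances differ. The settings below are assumptions chosen for illustration only: samples of size 5, population standard deviations 1 and 2 (so the population variances differ by a factor of 4), and "nearly equal" taken to mean that the ratio of sample variances falls between 0.8 and 1.25.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 5                        # sample size for each group (assumed)
sigma_x, sigma_y = 1.0, 2.0  # population standard deviations (assumed)
trials = 10_000

close = 0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma_x) for _ in range(n)]
    ys = [random.gauss(0.0, sigma_y) for _ in range(n)]
    ratio = statistics.variance(ys) / statistics.variance(xs)
    # Count the trial if the two sample variances are within a factor of 1.25.
    if 0.8 <= ratio <= 1.25:
        close += 1

fraction_close = close / trials
print(f"Fraction of trials with nearly equal sample variances: {fraction_close:.3f}")
```

Under these assumed settings, a noticeable fraction of trials produce nearly equal sample variances even though one population variance is four times the other, which is why similar sample variances alone do not justify pooling.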
Summary
Let $X_1, \ldots, X_{n_X}$ and $Y_1, \ldots, Y_{n_Y}$ be samples from normal populations with
means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, respectively. Assume
the samples are drawn independently of each other.
If $\sigma_X$ and $\sigma_Y$ are known to be equal, then, to test a null hypothesis of the
form $H_0: \mu_X - \mu_Y \le \Delta_0$, $H_0: \mu_X - \mu_Y \ge \Delta_0$, or $H_0: \mu_X - \mu_Y = \Delta_0$:
■ Compute
$$s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}.$$
■ Compute the test statistic
$$t = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{s_p\sqrt{1/n_X + 1/n_Y}}.$$
■ Compute the P-value. The P-value is an area under the Student’s t curve
with $n_X + n_Y - 2$ degrees of freedom, which depends on the alternate
hypothesis as follows:
Alternate Hypothesis                    P-value
$H_1: \mu_X - \mu_Y > \Delta_0$         Area to the right of $t$
$H_1: \mu_X - \mu_Y < \Delta_0$         Area to the left of $t$
$H_1: \mu_X - \mu_Y \neq \Delta_0$      Sum of the areas in the tails cut off by $t$ and $-t$
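The pooled procedure in this Summary can be sketched in Python as follows, applied to the nickel data of Example 6.14. The function name `pooled_t_statistic` is hypothetical. Instead of computing a tail area, the sketch brackets the P-value the way a table lookup does; the two critical values for 10 degrees of freedom (2.764 and 3.169) are standard t-table entries assumed here, not values taken from this section.

```python
import math


def pooled_t_statistic(xbar, s_x, n_x, ybar, s_y, n_y, delta0=0.0):
    """Pooled standard deviation, test statistic, and degrees of freedom."""
    df = n_x + n_y - 2
    s_p = math.sqrt(((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / df)
    t = ((xbar - ybar) - delta0) / (s_p * math.sqrt(1 / n_x + 1 / n_y))
    return s_p, t, df


# Example 6.14: nickel content measured by two methods.
s_p, t, df = pooled_t_statistic(3.16, 0.042, 5, 3.24, 0.048, 7)

# Assumed t-table entries for 10 degrees of freedom: upper-tail area -> critical value.
T_TABLE_10_DF = {0.01: 2.764, 0.005: 3.169}

# |t| between the two critical values means each tail area is between 0.005 and 0.01.
in_bracket = T_TABLE_10_DF[0.01] < abs(t) < T_TABLE_10_DF[0.005]

# Two-tailed test: double the one-tail bounds.
p_low, p_high = 2 * 0.005, 2 * 0.01

print(f"s_p = {s_p:.4f}, t = {t:.3f}, df = {df}")
if in_bracket:
    print(f"{p_low} < P < {p_high}")
```

Using table brackets keeps the sketch close to the hand calculation in the example; a statistics package would report the tail area directly.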
penetration resistance, expressed as a multiple of a standard quantity, for a certain
fine-grained soil. Fifteen measurements taken at a depth of 1 m had a mean of 2.31 with a
standard deviation of 0.89. Fifteen measurements taken at a depth of 2 m had a mean of 2.80
with a standard deviation of 1.10. Can you conclude that the penetration resistance differs
between the two depths?
4. The article “Time Series Analysis for Construction Productivity Experiments”
(T. Abdelhamid and J. Everett, Journal of Construction Engineering and Management,
1999:87–95) presents a study comparing the effectiveness of a video system that allows a
crane operator to see the lifting point while operating the crane with the old system in
which the operator relies on hand signals from a tagman. Three different lifts, A, B, and C,
were studied. Lift A was of little difficulty, lift B was of moderate difficulty, and lift C
was of high difficulty. Each lift was performed several times, both with the new video
system and with the old tagman system. The time (in seconds) required to perform each lift
was recorded. The following tables present the means, standard deviations, and sample
sizes.
Low Difficulty
          Mean     Standard Deviation   Sample Size
Tagman    47.79    2.19                 14
Video     47.15    2.65                 40
a. Can you conclude that the mean time to perform a lift of low difficulty is less when
using the video system than when using the tagman system? Explain.
b. Can you conclude that the mean time to perform a lift of moderate difficulty is less when
using the video system than when using the tagman system? Explain.
c. Can you conclude that the mean time to perform a lift of high difficulty is less when
using the video system than when using the tagman system? Explain.
5. The Mastic tree (Pistacia lentiscus) is used in reforestation efforts in southeastern
Spain. The article “Nutrient Deprivation Improves Field Performance of Woody Seedlings in a
Degraded Semi-arid Shrubland” (R. Trubata, J. Cortina, and A. Vilagrosaa, Ecological
Engineering, 2011:1164–1173) presents a study that investigated the effect of adding
slow-release fertilizer to the usual solution on the growth of trees. Following are the
heights, in cm, of 10 trees grown with the usual fertilizer (the control group), and 10 trees
grown with the slow-release fertilizer (treatment). These data are consistent with the mean
and standard deviation reported in the article. Can you conclude that the mean height of
plants grown with slow-release fertilizer is greater than that of plants with the usual
fertilizer?
Usual          17.3  22.0  19.5  18.7  19.5  18.5  18.6  20.3  20.3  20.3
Slow-release   25.2  23.2  25.2  26.2  25.0  25.5  25.2  24.1  24.8  23.6
of producing 100 L of the chemical was determined each time. The results, in dollars, were
as follows:
New Process: 51 52 55 53 54 53
Old Process: 50 54 59 56 50 58
Can you conclude that the mean cost of the new method is less than that of the old method?
8. The article “Effects of Aerosol Species on Atmospheric Visibility in Kaohsiung City,
Taiwan” (C. Lee, C. Yuan, and J. Chang, Journal of Air and Waste Management,
2005:1031–1041) reported that for a sample of 20 days in the winter, the mass ratio of fine
to coarse particles averaged 0.51 with a standard deviation of 0.09, and for a sample of
14 days in the spring the mass ratio averaged 0.62 with a standard deviation of 0.09. Let
$\mu_1$ represent the mean mass ratio during the winter and let $\mu_2$ represent the mean
mass ratio during the summer. It is desired to test $H_0: \mu_2 - \mu_1 = 0$ versus
$H_1: \mu_2 - \mu_1 \neq 0$.
a. Someone suggests that since the sample standard deviations are equal, the pooled variance
should be used. Do you agree? Explain.
b. Using an appropriate method, perform the test.
9. The article “Wind-Uplift Capacity of Residential Wood Roof-Sheathing Panels Retrofitted
with Insulating Foam Adhesive” (P. Datin, D. Prevatt, and W. Pang, Journal of Architectural
Engineering, 2011:144–154) presents a study of the failure pressures of roof panels. A
sample of 15 panels constructed with 8-inch nail spacing on the intermediate framing members
had a mean failure pressure of 8.38 kPa with a standard deviation of 0.96 kPa. A sample of
15 panels constructed with 6-inch nail spacing on the intermediate framing members had a
mean failure pressure of 9.83 kPa with a standard deviation of 1.02 kPa. Can you conclude
that 6-inch spacing provides a higher mean failure pressure?
10. The article “Magma Interaction Processes Inferred from Fe-Ti Oxide Compositions in the
Dölek and Sariçiçek Plutons, Eastern Turkey” (O. Karsli, F. Aydin, et al., Turkish Journal
of Earth Sciences, 2008:297–315) presents chemical compositions (in weight-percent) for
several rock specimens. Fourteen specimens (two outliers were removed) of limenite grain had
an average iron oxide (Fe2O3) content of 9.30 with a standard deviation of 2.71, and seven
specimens of limenite lamella had an average iron oxide content of 9.47 with a standard
deviation of 2.22. Can you conclude that the mean iron oxide content differs between
limenite grain and limenite lamella?
11. The article “Structural Performance of Rounded Dovetail Connections Under Different
Loading Conditions” (T. Tannert, H. Prion, and F. Lam, Can J Civ Eng, 2007:1600–1605)
describes a study of the deformation properties of dovetail joints. In one experiment,
10 rounded dovetail connections and 10 double-rounded dovetail connections were loaded until
failure. The rounded connections had an average load at failure of 8.27 kN with a standard
deviation of 0.62 kN. The double-rounded connections had an average load at failure of
6.11 kN with a standard deviation of 1.31 kN. Can you conclude that the mean load at failure
is greater for rounded connections than for double-rounded connections?
12. The article “Variance Reduction Techniques: Experimental Comparison and Analysis for
Single Systems” (I. Sabuncuoglu, M. Fadiloglu, and S. Celik, IIE Transactions,
2008:538–551) describes a study of the effectiveness of the method of Latin Hypercube
Sampling in reducing the variance of estimators of the mean time-in-system for queueing
models. For the M/M/1 queueing model, ten replications of the experiment yielded an average
reduction of 6.1 with a standard deviation of 4.1. For the serial line model, ten
replications yielded an average reduction of 6.6 with a standard deviation of 4.3. Can you
conclude that the mean reductions differ between the two models?
13. In an experiment to test the effectiveness of a new sleeping aid, a sample of 12 patients
took the new drug, and a sample of 14 patients took a commonly used drug. Of the patients
taking the new drug, the average time to fall asleep was 27.3 minutes with a standard
deviation of 5.2 minutes, and for the patients taking the commonly used drug the average
time was 32.7 minutes with a standard deviation of 4.1 minutes. Can you conclude that the
mean time to sleep is less for the new drug?
14. Refer to Exercise 11 in Section 5.6. Can you conclude that the mean sodium content is
higher for brand B than for brand A?
15. Refer to Exercise 12 in Section 5.6. Can you conclude that the mean permeability
coefficient at 60°C differs from that at 61°C?
16. The following MINITAB output presents the results of a hypothesis test for the difference $\mu_X - \mu_Y$ between two
population means.
Two-sample T for X vs Y
Vehicle
1 2 3 4 5 6 7 8