Ch09-Tests of Hypotheses For A Single Sample
for Engineers
Seventh Edition
Douglas C. Montgomery George C. Runger
Chapter 9
Tests of Hypotheses for a Single Sample
Copyright © 2019 John Wiley & Sons, Inc. All Rights Reserved
9 Tests of Hypotheses for a Single Sample
CHAPTER OUTLINE
9.1 Hypothesis Testing
9.1.1 Statistical Hypotheses
9.1.2 Tests of Statistical Hypotheses
9.1.3 1-Sided & 2-Sided Hypotheses
9.1.4 𝑃-Values in Hypothesis Tests
9.1.5 Connection between Hypothesis Tests & Confidence Intervals
9.1.6 General Procedure for Hypothesis Tests
9.2 Tests on the Mean of a Normal Distribution, Variance Known
9.2.1 Hypothesis Tests on the Mean
9.2.2 Type II Error & Choice of Sample Size
9.2.3 Large-Sample Test
9.3 Tests on the Mean of a Normal Distribution, Variance Unknown
9.3.1 Hypothesis Tests on the Mean
9.3.2 Type II Error & Choice of Sample Size
9.4 Tests of the Variance & Standard Deviation of a Normal Distribution
9.4.1 Hypothesis Tests on the Variance
9.4.2 Type II Error & Choice of Sample Size
9.5 Tests on a Population Proportion
9.5.1 Large-Sample Tests on a Proportion
9.5.2 Type II Error & Choice of Sample Size
9.6 Summary Table of Inference Procedures for a Single Sample
9.7 Testing for Goodness of Fit
9.8 Contingency Table Tests
9.9 Non-Parametric Procedures
9.9.1 The Sign Test
9.9.2 The Wilcoxon Signed-Rank Test
9.9.3 Comparison to the 𝑡-test
9.10 Equivalence Testing
9.11 Combining 𝑃-values
Learning Objectives for Chapter 9
After careful study of this chapter, you should be able to do the following:
1. Structure engineering decision-making problems as hypothesis tests
2. Test hypotheses on the mean of a normal distribution using a Z-test or a t-test
3. Test hypotheses on the variance or standard deviation of a normal distribution
4. Test hypotheses on a population proportion
5. Use the P-value approach for making decisions in hypothesis tests
6. Compute power & Type II error probability and make sample size selection decisions for tests
on means, variances and proportions
7. Explain & use the relationship between confidence intervals & hypothesis tests
8. Use the chi-square goodness-of-fit test to check distributional assumptions
9. Apply contingency table tests
10. Apply nonparametric tests
11. Use equivalence testing
12. Combine P-values
Hypothesis Testing
• Hypothesis-testing procedures rely on the information in a random sample from the population
of interest. If this information is consistent with the hypothesis, we will not reject the
hypothesis; if this information is inconsistent with the hypothesis, we will conclude
that the hypothesis is false.
Computing the Probability of Type 𝑰 Error
$$\alpha = P(\bar{X} < 48.5 \text{ when } \mu = 50) + P(\bar{X} > 51.5 \text{ when } \mu = 50)$$
Computing the Probability of Type 𝑰𝑰 Error
$$\beta = P(48.5 \le \bar{X} \le 51.5 \text{ when } \mu = 52)$$
Power of a Statistical Test
• The power is computed as 1 − 𝛽, and power can be interpreted as the probability of correctly
rejecting a false null hypothesis.
• For example, consider the propellant burning rate problem when we are testing 𝐻0: 𝜇 = 50 centimeters
per second against 𝐻1: 𝜇 ≠ 50 centimeters per second. Suppose that the true value of the mean is
𝜇 = 52.
• When 𝑛 = 10, we found that 𝛽 = 0.2643, so the power of this test is 1 – 𝛽 = 1 − 0.2643 = 0.7357
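The error probabilities for this test can be checked numerically. The following is a minimal sketch, assuming the acceptance region 48.5 ≤ x̄ ≤ 51.5 from the preceding slides, 𝑛 = 10, and 𝜎 = 2.5 (the value quoted for this burning rate problem in the P-value example); the function name is illustrative only. Exact normal probabilities differ slightly in the third decimal from the table-rounded values on the slides.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu0, mu_true, sigma, n, lower, upper):
    """Return (alpha, beta, power) for the acceptance region lower <= xbar <= upper."""
    se = sigma / sqrt(n)                  # standard error of the sample mean
    null_dist = NormalDist(mu0, se)       # distribution of xbar when H0 is true
    alt_dist = NormalDist(mu_true, se)    # distribution of xbar at the true mean
    alpha = null_dist.cdf(lower) + (1 - null_dist.cdf(upper))
    beta = alt_dist.cdf(upper) - alt_dist.cdf(lower)
    return alpha, beta, 1 - beta

# Assumed values: sigma = 2.5 and n = 10 for the burning rate problem.
alpha, beta, power = error_probabilities(50, 52, 2.5, 10, 48.5, 51.5)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Increasing 𝑛 in the call shrinks the standard error, which lowers both 𝛼 and 𝛽 for the same acceptance region.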
One-Sided and Two-Sided Hypotheses
Two-Sided Test:
𝐻0: 𝜇 = 𝜇0
𝐻1: 𝜇 ≠ 𝜇0
One-Sided Tests:
𝐻0: 𝜇 = 𝜇0, 𝐻1: 𝜇 > 𝜇0
or
𝐻0: 𝜇 = 𝜇0, 𝐻1: 𝜇 < 𝜇0
𝑷-value
𝑷-values in Hypothesis Tests
• Consider the two-sided hypothesis test 𝐻0: 𝜇 = 50 against 𝐻1: 𝜇 ≠ 50 with 𝑛 = 16 and 𝜎 = 2.5.
Suppose that the observed sample mean is 𝑥̄ = 51.3 centimeters per second.
• The P-value of the test is the probability above 51.3 plus the probability below 48.7.
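As a quick numerical check, the two tail probabilities can be computed from the standard normal distribution. This is a sketch using only the values stated above (𝑛 = 16, 𝜎 = 2.5, 𝑥̄ = 51.3); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_pvalue(xbar, mu0, sigma, n):
    """Return (z0, P-value) for H0: mu = mu0 against H1: mu != mu0, sigma known."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    # Probability in both tails beyond |z0| under the standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value

z0, p_value = two_sided_z_pvalue(51.3, 50, 2.5, 16)
print(f"z0 = {z0:.2f}, P-value = {p_value:.4f}")
```

The test statistic is 𝑧0 = 2.08, so the P-value is the area above 2.08 plus the area below −2.08, which is the same as the probability above 51.3 plus the probability below 48.7 on the scale of the sample mean.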
Connection Between Hypothesis Tests
and Confidence Intervals
• A close relationship exists between the test of a hypothesis for 𝜃, and the confidence interval
for 𝜃.
• If [𝑙, 𝑢] is a 100(1 − 𝛼)% confidence interval for the parameter 𝜃, the test of size 𝛼 of the
hypothesis
𝐻0: 𝜃 = 𝜃0
𝐻1: 𝜃 ≠ 𝜃0
will lead to rejection of 𝐻0 if and only if 𝜃0 is not in the 100(1 − 𝛼)% CI [𝑙, 𝑢].
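The equivalence can be illustrated for a mean with known variance. The sketch below assumes the burning rate values used in the P-value example (𝑥̄ = 51.3, 𝜎 = 2.5, 𝑛 = 16) and 𝛼 = 0.05; the candidate null values in the loop are hypothetical and chosen only to show both outcomes.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha):
    """Two-sided 100(1 - alpha)% CI for the mean, sigma known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def z_test_rejects(xbar, mu0, sigma, n, alpha):
    """True if the size-alpha two-sided z-test rejects H0: mu = mu0."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z0) > NormalDist().inv_cdf(1 - alpha / 2)

xbar, sigma, n, alpha = 51.3, 2.5, 16, 0.05
lower, upper = z_confidence_interval(xbar, sigma, n, alpha)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")

# Hypothetical candidate null values: the test rejects exactly those outside the CI.
for mu0 in (49.0, 50.0, 51.0, 52.0, 53.0):
    outside = not (lower <= mu0 <= upper)
    print(mu0, "outside CI:", outside, "| test rejects:", z_test_rejects(xbar, mu0, sigma, n, alpha))
```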
General Procedure for Hypothesis Tests
Hypothesis Tests on the Mean
7. Conclusion: Since z0 = 3.25 and the P-value is 2[1 − Φ(3.25)] = 0.0012, we reject
𝐻0: 𝜇 = 50 at the 0.05 level of significance.
Practical Interpretation: The mean burning rate differs from 50 centimeters per second,
based on a sample of 25 measurements.
$$\beta = \Phi\left(1.96 - \frac{\sqrt{25}}{2}\right) - \Phi\left(-1.96 - \frac{\sqrt{25}}{2}\right) = \Phi(-0.54) - \Phi(-4.46) = 0.295$$
• The probability is about 0.3 that the test will fail to reject the null hypothesis when the true
burning rate is 49 centimeters per second.
• Practical Interpretation: A sample size of n = 25 results in reasonable, but not great
power = 1 − 𝛽 = 1 − 0.3 = 0.70.
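The Type II error calculation above can be reproduced with a short function. This is a sketch of the two-sided formula 𝛽 = Φ(𝑧 − 𝛿√𝑛/𝜎) − Φ(−𝑧 − 𝛿√𝑛/𝜎), assuming 𝜎 = 2, 𝑛 = 25, and 𝛼 = 0.05 as in this example; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def beta_two_sided_z(delta, sigma, n, alpha):
    """Type II error of the two-sided z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = delta * sqrt(n) / sigma         # standardized departure from H0
    return std_normal.cdf(z - shift) - std_normal.cdf(-z - shift)

# True mean 49 against the null value 50, so delta = -1; the result is symmetric in delta.
beta = beta_two_sided_z(49 - 50, 2, 25, 0.05)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Calling the function with a larger 𝑛 shows how quickly 𝛽 falls as the sample size grows.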
Example 9.3b | Propellant Burning Rate Type II Error
• Suppose that the analyst wishes to design the test so that if the true mean burning rate differs from
50 centimeters per second by as much as 1 centimeter per second, the test will detect this (i.e.,
reject H0: μ = 50) with a high probability, say, 0.90. Now, we note that σ = 2, δ = 51 − 50 = 1, α =
0.05, and β = 0.10.
• Since $z_{\alpha/2} = z_{0.025} = 1.96$ and $z_\beta = z_{0.10} = 1.28$, the sample size required to detect this departure from
H0: μ = 50 is found by Equation 9.22 as
$$n \simeq \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{\delta^2} = \frac{(1.96 + 1.28)^2\, 2^2}{(1)^2} \simeq 42$$
• The approximation is good here because $\Phi(-z_{\alpha/2} - \delta\sqrt{n}/\sigma) = \Phi(-1.96 - (1)\sqrt{42}/2) = \Phi(-5.20) \simeq 0$,
which is small relative to β.
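The sample size calculation can be sketched in a few lines. The code below uses the tabled values 1.96 and 1.28 quoted in the example and also evaluates the neglected lower-tail term to confirm that the approximation is reasonable; the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_sided(delta, sigma, z_half_alpha, z_beta):
    """Approximate n for a two-sided z-test: (z_{alpha/2} + z_beta)^2 sigma^2 / delta^2."""
    return ceil((z_half_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

n = sample_size_two_sided(delta=1, sigma=2, z_half_alpha=1.96, z_beta=1.28)

# The term dropped by the approximation: Phi(-z_{alpha/2} - delta * sqrt(n) / sigma).
neglected = NormalDist().cdf(-1.96 - 1 * sqrt(n) / 2)
print(f"n = {n}, neglected term = {neglected:.2e}")
```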
• Curves are provided for two-sided alternatives on Charts VIIe and VIIf.
The abscissa scale factor d on these charts is defined as
$$d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma}$$
• For the one-sided alternative μ > μ0 or μ < μ0, use Charts VIIg and VIIh. The
abscissa scale factor d on these charts is defined as
$$d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma}$$
Example 9.7 | Golf Club Design Sample Size
• Consider the golf club testing problem from Example 9.6. If the mean coefficient of
restitution exceeds 0.82 by as much as 0.02, is the sample size n = 15 adequate to
ensure that H0: μ = 0.82 will be rejected with probability at least 0.8?
• To solve this problem, we will use the sample standard deviation s = 0.02456 to estimate
σ. Then d = |δ|/σ = 0.02/0.02456 = 0.81.
• By referring to the operating characteristic curves in Appendix Chart VIIg (for α = 0.05)
with d = 0.81 and n = 15, we find that β = 0.10, approximately.
• Thus, the probability of rejecting H0: μ = 0.82 if the true mean exceeds this by 0.02 is
approximately 1 − β = 1 − 0.10 = 0.90, and we conclude that a sample size of n = 15 is
adequate to provide the desired sensitivity.
• This is the abscissa parameter for Chart VIIk. From this chart, with n = 20 and 𝜆 = 1.25, we find that 𝛽 ≅ 0.6.
Therefore, there is only about a 40% chance that the null hypothesis will be rejected if the true standard deviation
is really as large as 𝜎 = 0.125 fluid ounce.
• To reduce the 𝛽-error, a larger sample size must be used. From the operating characteristic curve with 𝛽 = 0.20
and 𝜆 = 1.25, we find that n = 75, approximately. Thus, if we want the test to perform as required above, the
sample size must be at least 75 bottles.
Large-Sample Tests on a Proportion
Type II Error and Choice of Sample Size
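• For a two-sided alternative
$$\beta = \Phi\left(\frac{p_0 - p + z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right) - \Phi\left(\frac{p_0 - p - z_{\alpha/2}\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right)$$
• If the alternative is p < p0
$$\beta = 1 - \Phi\left(\frac{p_0 - p - z_{\alpha}\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right)$$
• If the alternative is p > p0
$$\beta = \Phi\left(\frac{p_0 - p + z_{\alpha}\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right)$$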
$$n = \left[\frac{z_{\alpha}\sqrt{p_0(1-p_0)} + z_{\beta}\sqrt{p(1-p)}}{0.03 - 0.05}\right]^2 \simeq 832$$
• Conclusion: Note that n = 832 is a very large sample size. However, we are trying to detect a fairly
small deviation from the null value p0 = 0.05.
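The arithmetic behind a sample size of this magnitude can be sketched as follows. Only p0 = 0.05, p = 0.03, and the result n ≅ 832 appear on the slide; the one-sided form of the formula and the tabled values z = 1.645 (α = 0.05) and z = 1.28 (β = 0.10) are assumptions made here because they reproduce the stated result.

```python
from math import ceil, sqrt

def sample_size_one_sided_proportion(p0, p, z_alpha, z_beta):
    """n = [(z_alpha*sqrt(p0(1-p0)) + z_beta*sqrt(p(1-p))) / (p - p0)]^2, rounded up."""
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p * (1 - p))
    return ceil((numerator / (p - p0)) ** 2)

# Assumed (not stated on the slide): one-sided test, alpha = 0.05, beta = 0.10.
n = sample_size_one_sided_proportion(p0=0.05, p=0.03, z_alpha=1.645, z_beta=1.28)
print(f"n = {n}")
```

The denominator (p − p0)² = 0.0004 is what drives the large answer: halving the difference to be detected roughly quadruples the required sample size.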
Testing for Goodness of Fit
• Based on chi-square distribution
• Requires a random sample of size n from the population whose probability distribution is
unknown
• Let Oi be the observed frequency in the ith class interval.
• Let Ei be the expected frequency in the ith class interval.
Example 9.12a | Printed Circuit Board
Defects-Poisson Distribution
• The number of defects in printed circuit boards is hypothesized to follow
a Poisson distribution. A random sample of n = 60 printed boards has
been collected, and the following numbers of defects were observed:
Number of Defects    Observed Frequency
0                    32
1                    15
2                    9
3                    4
• The estimate of the mean number of defects per board is the sample
average, (32·0 + 15·1 + 9·2 + 4·3)/60 = 0.75. From the Poisson
distribution with parameter 0.75, we may compute pi, the theoretical,
hypothesized probability associated with the ith class interval. We may
find the pi as follows:
$$p_1 = P(X = 0) = \frac{e^{-0.75}(0.75)^0}{0!} = 0.472$$
$$p_2 = P(X = 1) = \frac{e^{-0.75}(0.75)^1}{1!} = 0.354$$
$$p_3 = P(X = 2) = \frac{e^{-0.75}(0.75)^2}{2!} = 0.133$$
$$p_4 = P(X \ge 3) = 1 - (p_1 + p_2 + p_3) = 0.041$$
Example 9.12b | Printed Circuit Board
Defects-Poisson Distribution
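• The expected frequencies are computed by multiplying the sample size n
= 60 times the probabilities pi. That is, Ei = npi. The expected frequencies
follow:
Number of Defects    Probability    Expected Frequency
0                    0.472          28.32
1                    0.354          21.24
2                    0.133          7.98
3 (or more)          0.041          2.46
• Because the expected frequency in the last cell is less than 3, we combine
the last two cells. The combined cell "2 (or more)" has observed frequency 9 + 4 = 13 and
expected frequency 7.98 + 2.46 = 10.44.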
Example 9.12c | Printed Circuit Board
Defects-Poisson Distribution
The seven-step hypothesis-testing procedure may now be applied, using α = 0.05, as follows:
1. Parameter of interest: The variable of interest is the form of the distribution of defects in printed circuit
boards.
2. Null hypothesis: H0: The form of the distribution of defects is Poisson.
3. Alternative hypothesis: H1: The form of the distribution of defects is not Poisson.
4. Test statistic: The test statistic is
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
5. Reject H0 if: The P-value is less than 0.05.
6. Computations:
$$\chi_0^2 = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94$$
7. Conclusions: We find from Appendix Table III that $\chi^2_{0.10,1} = 2.71$ and $\chi^2_{0.05,1} = 3.84$. Because $\chi_0^2 = 2.94$ lies
between these values, we conclude that the P-value is between 0.05 and 0.10. Therefore, since the P-value
exceeds 0.05 we are unable to reject the null hypothesis that the distribution of defects in printed circuit boards
is Poisson. The exact P-value computed from Minitab is 0.0864.
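The whole goodness-of-fit calculation can be sketched with the standard library. The observed counts (32, 15, 9, 4) are taken from the sample-average computation in Example 9.12a. The code uses unrounded Poisson probabilities, so its statistic is slightly larger than the 2.94 obtained from three-decimal probabilities. With three cells and one estimated parameter there is 3 − 1 − 1 = 1 degree of freedom, and for 1 degree of freedom the upper-tail chi-square probability equals erfc(√(x/2)).

```python
from math import erfc, exp, factorial, sqrt

observed = {0: 32, 1: 15, 2: 9, 3: 4}      # defects per board -> number of boards
n = sum(observed.values())
lam = sum(k * count for k, count in observed.items()) / n   # sample mean

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

# Cells after combining the last two: 0, 1, and "2 or more".
probs = [poisson_pmf(0, lam), poisson_pmf(1, lam)]
probs.append(1 - sum(probs))
obs_cells = [observed[0], observed[1], observed[2] + observed[3]]
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, expected))
df = len(obs_cells) - 1 - 1                # k - p - 1, one parameter estimated
p_value = erfc(sqrt(chi2 / 2))             # valid for df = 1 only
print(f"lambda = {lam}, chi2 = {chi2:.3f}, df = {df}, P-value = {p_value:.4f}")
```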
Contingency Table Tests
Assuming independence, the estimators of ui and vj are:
$$\hat{u}_i = \frac{1}{n}\sum_{j=1}^{c} O_{ij}$$
$$\hat{v}_j = \frac{1}{n}\sum_{i=1}^{r} O_{ij}$$
Contingency Table Tests
Therefore, the expected frequency of each cell is
$$E_{ij} = n\hat{u}_i\hat{v}_j = \frac{1}{n}\sum_{j=1}^{c} O_{ij}\sum_{i=1}^{r} O_{ij}$$
Then, for large n, the statistic
$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
has an approximate chi-square distribution with (r − 1)(c − 1) degrees of freedom if the null
hypothesis is true. We should reject the null hypothesis if the value of the test statistic $\chi_0^2$ is too
large. The P-value would be calculated as the probability beyond $\chi_0^2$ on the $\chi^2_{(r-1)(c-1)}$
distribution, or $P = P(\chi^2_{(r-1)(c-1)} > \chi_0^2)$. For a fixed-level test, we would reject the hypothesis of
independence if the observed value of the test statistic $\chi_0^2$ exceeded $\chi^2_{\alpha,(r-1)(c-1)}$.
Example 9.14a | Health Insurance Plan
Preference
A company has to choose among three health insurance plans. Management wishes to
know whether the preference for plans is independent of job classification and wants to
use α = 0.05. The opinions of a random sample of 500 employees are shown in Table 9.3.
Example 9.14c | Health Insurance Plan
Preference
The seven-step hypothesis-testing procedure may now be applied to this problem.
1. Parameter of Interest: The variable of interest is employee preference among health
insurance plans.
2. Null hypothesis: H0: Preference is independent of salaried versus hourly job classification.
3. Alternative hypothesis: H1: Preference is not independent of salaried versus hourly job
classification.
4. Test statistic: The test statistic is
$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
5. Reject H0 if: We will use a fixed-significance level test with α = 0.05. Therefore, since r = 2 and
c = 3, the degrees of freedom for chi-square are (r − 1)(c − 1) = (1)(2) = 2, and we would reject
H0 if $\chi_0^2 > \chi^2_{0.05,2} = 5.99$.
Example 9.14d | Health Insurance Plan
Preference
6. Computations:
$$\chi_0^2 = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(160 - 136)^2}{136} + \frac{(140 - 136)^2}{136} + \frac{(40 - 68)^2}{68} + \frac{(40 - 64)^2}{64} + \frac{(60 - 64)^2}{64} + \frac{(60 - 32)^2}{32} = 49.63$$
7. Conclusions: Since $\chi_0^2 = 49.63 > \chi^2_{0.05,2} = 5.99$, we reject the hypothesis of independence and
conclude that the preference for health insurance plans is not independent of job
classification. The P-value for $\chi_0^2 = 49.63$ is $P = 1.671 \times 10^{-11}$ (this value was computed using
computer software).
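The expected frequencies and the test statistic can be reproduced from the observed counts that appear in the computation above. This is a sketch: the 2 × 3 layout of the observed matrix is inferred from the six terms of the sum, and the P-value line relies on the fact that for 2 degrees of freedom the upper-tail chi-square probability is exp(−x/2); other degrees of freedom would need a statistics library.

```python
from math import exp

# Observed counts, arranged as 2 rows (job classes) by 3 columns (plans).
observed = [[160, 140, 40],
            [40, 60, 60]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, E_ij = (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = exp(-chi2 / 2)        # closed form, valid for df = 2 only
print(expected)
print(f"chi2 = {chi2:.2f}, df = {df}, P-value = {p_value:.3e}")
```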
Nonparametric Procedures
The Sign Test
• Used to test hypotheses about the median 𝜇̃ of a continuous distribution.
• Suppose that the hypotheses are H0: 𝜇̃ = 𝜇̃0 against a one-sided alternative such as H1: 𝜇̃ < 𝜇̃0.
• Test procedure: Let X1, X2, ..., Xn be a random sample from the population of interest.
• Form the differences Xi − 𝜇̃0, i = 1, 2, …, n.
• An appropriate test statistic is the number of these differences that are positive, say R+.
• The P-value for the observed number of plus signs r+ can be calculated directly from the binomial
distribution with p = 1/2.
• If the computed P-value is less than or equal to the significance level α, we will reject H0.
• The two-sided alternative H1: 𝜇̃ ≠ 𝜇̃0 may also be tested; again, if the computed P-value is less
than or equal to the significance level α, we will reject H0.
Example 9.15a | Propellant Shear Strength
Sign Test
• Montgomery, Peck, and Vining (2012) reported on a
study in which a rocket motor is formed by binding
an igniter propellant and a sustainer propellant
together inside a metal housing.
• The shear strength of the bond between the two
propellant types is an important characteristic.
• The results of testing 20 randomly selected motors
are shown in Table 9.5. Test the hypothesis that the
median shear strength is 2000 psi, using α = 0.05.
Example 9.15b | Propellant Shear
Strength Sign Test
The seven-step hypothesis-testing procedure is:
1. Parameter of Interest: The variable of interest is the median of the distribution of propellant shear
strength.
2. Null hypothesis: H0: 𝜇̃ = 2000 psi
3. Alternative hypothesis: H1: 𝜇̃ ≠ 2000 psi
4. Test statistic: The test statistic is the observed number of plus differences in Table 9.5, i.e., r+ = 14.
5. Reject H0 if: The P-value corresponding to r+ = 14 is less than or equal to α = 0.05.
6. Computations: Since r+ = 14 is greater than n/2 = 20/2 = 10, we calculate the P-value from
$$P = 2P\left(R^+ \ge 14 \text{ when } p = \tfrac{1}{2}\right) = 2\sum_{r=14}^{20}\binom{20}{r}(0.5)^r(0.5)^{20-r} = 0.1153$$
7. Conclusions: Since the P-value is greater than α = 0.05, we cannot reject
the null hypothesis that the median shear strength is 2000 psi.
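The binomial tail sum in step 6 is easy to verify directly. The sketch below takes only r+ = 14 and n = 20 from the example; the function name is illustrative, and the doubling of the smaller tail is the usual two-sided convention assumed here.

```python
from math import comb

def sign_test_two_sided_pvalue(r_plus, n):
    """Two-sided sign test P-value: twice the smaller binomial(n, 1/2) tail at r_plus."""
    upper_tail = sum(comb(n, r) for r in range(r_plus, n + 1)) / 2 ** n   # P(R+ >= r_plus)
    lower_tail = sum(comb(n, r) for r in range(0, r_plus + 1)) / 2 ** n   # P(R+ <= r_plus)
    return min(1.0, 2 * min(upper_tail, lower_tail))

p_value = sign_test_two_sided_pvalue(r_plus=14, n=20)
print(f"P-value = {p_value:.4f}")
```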
The Wilcoxon Signed-Rank Test
• A test procedure that uses both direction (sign) and magnitude.
• Suppose that the hypotheses are H0: μ = μ0 against H1: μ ≠ μ0.
• Test procedure: Let X1, X2, ..., Xn be a random sample from a continuous and symmetric distribution with mean
(and median) μ. Form the differences Xi − μ0.
• Rank the absolute differences |Xi − μ0| in ascending order, and give the ranks the signs of their corresponding
differences.
• Let W+ be the sum of the positive ranks and W− be the absolute value of the sum of the negative ranks, and let
W = min(W+, W−).
• Critical values of W can be found in Appendix Table IX.
• If the computed value is less than or equal to the critical value, we will reject H0.
• For one-sided alternatives:
H1: μ > μ0, reject H0 if W− ≤ critical value
H1: μ < μ0, reject H0 if W+ ≤ critical value
Example 9.16a | Propellant Shear Strength-
Wilcoxon Signed-Rank Test
• We illustrate the Wilcoxon signed-rank test by applying it to the propellant shear strength
data from Table 9.5. Assume that the underlying distribution is a continuous symmetric
distribution.
• The seven-step hypothesis-testing procedure is applied as follows:
1. Parameter of Interest: The variable of interest is the mean (or median) of the
distribution of propellant shear strength.
2. Null hypothesis: H0: μ = 2000 psi
3. Alternative hypothesis: H1: μ ≠ 2000 psi
4. Test statistic: The test statistic is 𝑤 = min(𝑤+, 𝑤−)
5. Reject H0 if: 𝑤 ≤ 52 (from Appendix Table IX).
6. Computations: The sum of the positive ranks is 𝑤+ = (1 + 2 + 3 + 4 + 5 + 6 + 11 + 13
+ 15 + 16 + 17 + 18 + 19 + 20) = 150, and the sum of the absolute values of the
negative ranks is 𝑤− = (7 + 8 + 9 + 10 + 12 + 14) = 60.
Example 9.16b | Propellant Shear Strength-
Wilcoxon Signed-Rank Test
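𝑤 = min(𝑤+, 𝑤−) = min(150, 60) = 60
7. Conclusions: Since 𝑤 = 60 is not less than or equal to the critical value of 52, we cannot reject
the null hypothesis that the mean (or median) shear strength is 2000 psi.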
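The rank sums can be checked with a short script. The signed ranks below are the ones listed in the computations of Example 9.16a, and the critical value 52 is the one quoted from Appendix Table IX. The helper `signed_rank_sums` is an illustrative sketch of the ranking step that assumes no tied absolute differences; the small data set passed to it is hypothetical and is not the Table 9.5 data.

```python
def signed_rank_sums(xs, mu0):
    """Return (w_plus, w_minus) for data xs and null value mu0, assuming no ties."""
    diffs = [x - mu0 for x in xs if x != mu0]     # drop zero differences
    ordered = sorted(diffs, key=abs)              # rank 1 = smallest |difference|
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus

# Ranks from the example: positive and negative differences.
positive_ranks = [1, 2, 3, 4, 5, 6, 11, 13, 15, 16, 17, 18, 19, 20]
negative_ranks = [7, 8, 9, 10, 12, 14]
w_plus, w_minus = sum(positive_ranks), sum(negative_ranks)
w = min(w_plus, w_minus)
reject = w <= 52                                  # critical value quoted in step 5
print(f"w+ = {w_plus}, w- = {w_minus}, w = {w}, reject H0: {reject}")

# Hypothetical data, only to show the ranking helper.
print(signed_rank_sums([2.1, 1.2, 3.5, 0.4], mu0=2.0))
```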
The Wilcoxon Signed-Rank Test
Equivalence Testing
• A test that is used when we want to reject the null hypothesis (and support the alternative)
in the form of:
𝐻0 : 𝜇 ≠ 80 𝑣𝑠. 𝐻1 : 𝜇 = 80
• To test the above, the following two sets of one-sided alternative hypotheses are tested:
𝐻0: 𝜇 = 80 + 𝛿 vs. 𝐻1: 𝜇 < 80 + 𝛿
𝐻0: 𝜇 = 80 − 𝛿 vs. 𝐻1: 𝜇 > 80 − 𝛿
where 𝛿 is called the equivalence band, a practical threshold or limit within which the mean
is considered to be the same as the standard.
• Note that these are just two one-sided tests. For this reason, a test of equivalence is
sometimes called two one-sided tests (TOST).
Important Terms and Concepts
• Acceptance region
• Alternative hypothesis
• 𝛼 and 𝛽
• Chi-square tests
• Combining P-values
• Confidence interval
• Connection between hypothesis tests and confidence intervals
• Contingency table
• Critical region for a test statistic
• Critical values
• Equivalence testing
• Fixed significance level
• Goodness-of-fit test
• Homogeneity test
• Hypotheses
• Hypothesis testing
• Independence test
• Inference
• Non-parametric or distribution-free methods
• Normal approximation to non-parametric tests
• Null distribution
• Null hypothesis
• Observed significance level
• One- and two-sided alternative hypotheses
• Operating characteristic (OC) curves
• Parametric
• Power of a statistical test
• P-value
• Ranks
• Reference distribution for a test statistic
• Rejection region
• Sampling distribution
• Sample size determination for hypothesis tests
• Sign test
• Significance level of a test
• Statistical hypotheses
• Statistical vs. practical significance
• Symmetric continuous distributions
• t-test
• Test statistic
• Type I & Type II errors
• Wilcoxon signed-rank test
• Z-test