Ch09-Tests of Hypotheses For A Single Sample


Applied Statistics and Probability for Engineers
Seventh Edition
Douglas C. Montgomery, George C. Runger

Chapter 9
Tests of Hypotheses for a Single Sample
Copyright © 2019 John Wiley & Sons, Inc. All Rights Reserved
9 Tests of Hypotheses for a Single Sample
CHAPTER OUTLINE
9.1 Hypothesis Testing
9.1.1 Statistical Hypotheses
9.1.2 Tests of Statistical Hypotheses
9.1.3 1-Sided & 2-Sided Hypotheses
9.1.4 𝑃-Values in Hypothesis Tests
9.1.5 Connection between Hypothesis Tests & Confidence Intervals
9.1.6 General Procedure for Hypothesis Tests
9.2 Tests on the Mean of a Normal Distribution, Variance Known
9.2.1 Hypothesis Tests on the Mean
9.2.2 Type II Error & Choice of Sample Size
9.2.3 Large-Sample Test
9.3 Tests on the Mean of a Normal Distribution, Variance Unknown
9.3.1 Hypothesis Tests on the Mean
9.3.2 Type II Error & Choice of Sample Size
9.4 Tests of the Variance & Standard Deviation of a Normal Distribution
9.4.1 Hypothesis Tests on the Variance
9.4.2 Type II Error & Choice of Sample Size
9.5 Tests on a Population Proportion
9.5.1 Large-Sample Tests on a Proportion
9.5.2 Type II Error & Choice of Sample Size
9.6 Summary Table of Inference Procedures for a Single Sample
9.7 Testing for Goodness of Fit
9.8 Contingency Table Tests
9.9 Non-Parametric Procedures
9.9.1 The Sign Test
9.9.2 The Wilcoxon Signed-Rank Test
9.9.3 Comparison to the 𝑡-test
9.10 Equivalence Testing
9.11 Combining 𝑃-values
Learning Objectives for Chapter 9
After careful study of this chapter, you should be able to do the following:
1. Structure engineering decision-making problems as hypothesis tests
2. Test hypotheses on the mean of a normal distribution using a Z-test or a t-test
3. Test hypotheses on the variance or standard deviation of a normal distribution
4. Test hypotheses on a population proportion
5. Use the P-value approach for making decisions in hypothesis tests
6. Compute power & Type II error probability and make sample size selection decisions for tests
on means, variances and proportions
7. Explain & use the relationship between confidence intervals & hypothesis tests
8. Use the chi-square goodness-of-fit test to check distributional assumptions
9. Apply contingency table tests
10. Apply nonparametric tests
11. Use equivalence testing
12. Combine P-values
Hypothesis Testing

Two-sided Alternative Hypothesis


Let 𝐻0: 𝜇 = 50 centimeters per second and 𝐻1: 𝜇 ≠ 50 centimeters per second
• The statement 𝐻0: 𝜇 = 50 is called the null hypothesis.
• The statement 𝐻1: 𝜇 ≠ 50 is called the alternative hypothesis.
One-sided Alternative Hypotheses
• 𝐻0: 𝜇 = 50 centimeters per second versus 𝐻1: 𝜇 < 50 centimeters per second
or
• 𝐻0: 𝜇 = 50 centimeters per second versus 𝐻1: 𝜇 > 50 centimeters per second

Sec 9.1.1 Statistical Hypotheses


Hypothesis Testing
Test of a Hypothesis
• A procedure leading to a decision about a particular hypothesis

• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.

• If this information is consistent with the hypothesis, we will not reject the hypothesis; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.

Sec 9.1.2 Tests of Statistical Hypotheses


Tests of Statistical Hypotheses

Sec 9.1.2 Tests of Statistical Hypotheses


Decisions in Hypothesis Testing

Sometimes the type 𝐼 error probability is called the significance level, the 𝛼-error, or the size of the test.

Sec 9.1.2 Tests of Statistical Hypotheses

Computing the Probability of Type 𝑰 Error

$$\alpha = P(\bar{X} < 48.5 \text{ when } \mu = 50) + P(\bar{X} > 51.5 \text{ when } \mu = 50)$$

This implies that 5.74% of all random samples would lead to rejection of the hypothesis 𝐻0: 𝜇 = 50.
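The 𝛼 calculation can be checked numerically. This Python sketch assumes the textbook setting for the burning-rate example, σ = 2.5 and n = 10 (so σ/√n ≈ 0.79), with critical values 48.5 and 51.5; it uses only `statistics.NormalDist` from the standard library, and the function name is illustrative. Without rounding it gives 𝛼 ≈ 0.0578; the 5.74% above comes from rounding the standardized critical values to ±1.90.

```python
from statistics import NormalDist


def type_i_error(mu0, sigma, n, lower, upper):
    """P(reject H0 | mu = mu0) when H0 is accepted for lower <= xbar <= upper."""
    # Sampling distribution of the sample mean under H0
    xbar_dist = NormalDist(mu=mu0, sigma=sigma / n ** 0.5)
    return xbar_dist.cdf(lower) + (1.0 - xbar_dist.cdf(upper))


# Assumed setting: sigma = 2.5, n = 10, acceptance region 48.5 <= xbar <= 51.5
alpha = type_i_error(mu0=50, sigma=2.5, n=10, lower=48.5, upper=51.5)
print(f"alpha = {alpha:.4f}")
```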

Sec 9.1.2 Tests of Statistical Hypotheses

Computing the Probability of Type 𝑰𝑰 Error

$$\beta = P(48.5 \le \bar{X} \le 51.5 \text{ when } \mu = 52)$$

This implies that the probability that we will fail to reject the false null hypothesis is 0.2643.
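Under the same assumed setting (σ = 2.5, n = 10, acceptance region 48.5 ≤ x̄ ≤ 51.5), 𝛽 is the probability that the sample mean falls inside the acceptance region when the true mean is 52. A minimal sketch with an illustrative function name: unrounded arithmetic gives 𝛽 ≈ 0.2635 and power ≈ 0.7365, while the 0.2643 above uses z-values rounded to −0.63 and −4.43.

```python
from statistics import NormalDist


def type_ii_error(mu_true, sigma, n, lower, upper):
    """P(fail to reject H0 | mu = mu_true): probability that xbar lands in [lower, upper]."""
    # Sampling distribution of the sample mean when the true mean is mu_true
    xbar_dist = NormalDist(mu=mu_true, sigma=sigma / n ** 0.5)
    return xbar_dist.cdf(upper) - xbar_dist.cdf(lower)


beta = type_ii_error(mu_true=52, sigma=2.5, n=10, lower=48.5, upper=51.5)
power = 1.0 - beta  # probability of correctly rejecting the false H0
print(f"beta = {beta:.4f}, power = {power:.4f}")
```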

Sec 9.1.2 Tests of Statistical Hypotheses

Power of a Statistical Test

• The power is computed as 1 − 𝛽, and power can be interpreted as the probability of correctly
rejecting a false null hypothesis.

• For example, consider the propellant burning rate problem when we are testing 𝐻0: 𝜇 = 50 centimeters per second against 𝐻1: 𝜇 ≠ 50 centimeters per second. Suppose that the true value of the mean is 𝜇 = 52.

• When 𝑛 = 10, we found that 𝛽 = 0.2643, so the power of this test is 1 – 𝛽 = 1 − 0.2643 = 0.7357

Sec 9.1.2 Tests of Statistical Hypotheses

One-Sided and Two-Sided Hypotheses
Two-Sided Test: H0: μ = μ0 versus H1: μ ≠ μ0
One-Sided Tests: H0: μ = μ0 versus H1: μ > μ0, or H0: μ = μ0 versus H1: μ < μ0

Sec 9.1.3 One-Sided and Two-Sided Hypotheses

𝑷-value

P-value is the observed significance level.

Sec 9.1.4 P-Values in Hypothesis Tests

𝑷-values in Hypothesis Tests

• Consider the two-sided hypothesis test 𝐻0: μ = 50 against 𝐻1: μ ≠ 50 with 𝑛 = 16 and σ = 2.5. Suppose that the observed sample mean is x̄ = 51.3 centimeters per second.

• The P-value of the test is the probability, computed under 𝐻0: μ = 50, of a sample mean above 51.3 plus the probability of a sample mean below 48.7 (the value equally far below 50).
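The P-value described above is the sum of two tail areas of the sampling distribution of X̄ under 𝐻0, which is normal with mean 50 and standard deviation σ/√n = 2.5/4 = 0.625. A minimal standard-library sketch (variable names are illustrative); it gives P ≈ 0.0375, corresponding to z0 = 2.08.

```python
from statistics import NormalDist

mu0, sigma, n, xbar = 50.0, 2.5, 16, 51.3

# Sampling distribution of X-bar when H0: mu = 50 is true
xbar_dist = NormalDist(mu=mu0, sigma=sigma / n ** 0.5)

mirror = 2 * mu0 - xbar  # 48.7, equally far below mu0
p_value = (1.0 - xbar_dist.cdf(xbar)) + xbar_dist.cdf(mirror)
print(f"P-value = {p_value:.4f}")
```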

Sec 9.1.4 P-Values in Hypothesis Tests

Connection Between Hypothesis Tests
and Confidence Intervals
• A close relationship exists between the test of a hypothesis for 𝜃, and the confidence interval
for 𝜃.

• If [𝑙, 𝑢] is a 100(1 − 𝛼)% confidence interval for the parameter 𝜃, the test of size 𝛼 of the hypothesis
H0: θ = θ0
H1: θ ≠ θ0
will lead to rejection of 𝐻0 if and only if 𝜃0 is not in the 100(1 − 𝛼)% CI [𝑙, 𝑢].
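To illustrate this equivalence, the sketch below reuses the propellant data of Example 9.2 (x̄ = 51.3, σ = 2, n = 25, 𝛼 = 0.05). The 95% interval is about [50.52, 52.08]; it excludes θ0 = 50, so the size-0.05 test rejects H0: μ = 50, in agreement with the z-test of that example. The helper names are illustrative, not from the text.

```python
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha):
    """Two-sided 100(1 - alpha)% confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def rejects_via_ci(theta0, interval):
    """The size-alpha two-sided test rejects H0: theta = theta0 iff theta0 is outside the CI."""
    lower, upper = interval
    return not (lower <= theta0 <= upper)


ci = z_confidence_interval(xbar=51.3, sigma=2.0, n=25, alpha=0.05)
print(f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}], reject H0: {rejects_via_ci(50.0, ci)}")
```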

Sec 9.1.5 Connection between Hypothesis Tests and Confidence Intervals

General Procedure for Hypothesis Tests

Sec 9.1.6 General Procedure for Hypothesis Tests

Hypothesis Tests on the Mean

Sec 9.2.1 Hypothesis Tests on the Mean


Example 9.2a | Propellant Burning Rate
• Air crew escape systems are powered by a solid propellant. The burning rate of this
propellant is an important product characteristic. Specifications require that the mean
burning rate must be 50 centimeters per second, and the standard deviation is σ = 2 centimeters per second. The experimenter chooses a significance level of α = 0.05 and selects a random sample of n = 25, which has a sample average burning rate of x̄ = 51.3 centimeters per second. Draw conclusions.
• The seven-step procedure is
1. Parameter of interest: The parameter of interest is μ, the mean burning rate.
2. Null hypothesis: H0: μ = 50 centimeters per second
3. Alternative hypothesis: H1: μ ≠ 50 centimeters per second

Sec 9.2.1 Hypothesis Tests on the Mean


Example 9.2b | Propellant Burning Rate
4. Test statistic: The test statistic is $z_0 = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
5. Reject H0 if: Reject H0 if the P-value is less than 0.05. The boundaries of the critical region would be $z_{0.025} = 1.96$ and $-z_{0.025} = -1.96$.
6. Computations: Since x̄ = 51.3 and σ = 2,
$$z_0 = \frac{51.3 - 50}{2/\sqrt{25}} = 3.25$$
7. Conclusion: Since z0 = 3.25 and the P-value is 2[1 − Φ(3.25)] = 0.0012, we reject H0: μ = 50 at the 0.05 level of significance.
Practical Interpretation: The mean burning rate differs from 50 centimeters per second,
based on a sample of 25 measurements.
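Steps 4 to 7 can be reproduced with a short Python sketch (standard library only; the function name is illustrative). It returns z0 = 3.25 and a two-sided P-value of about 0.0012, and compares |z0| with z0.025 ≈ 1.96.

```python
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n):
    """Return (z0, P-value) for H0: mu = mu0 vs H1: mu != mu0 with sigma known."""
    z0 = (xbar - mu0) / (sigma / n ** 0.5)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value


z0, p = z_test_two_sided(xbar=51.3, mu0=50.0, sigma=2.0, n=25)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{0.025}, about 1.96
print(f"z0 = {z0:.2f}, P = {p:.4f}, reject H0: {abs(z0) > z_crit}")
```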

Sec 9.2.1 Hypothesis Tests on the Mean


Type II Error and Choice of Sample Size

Sec 9.2.2 Type II Error and Choice of Sample Size


Example 9.3a | Propellant Burning Rate Type II
Error
• Consider the rocket propellant problem of Example 9.2. Suppose that the true burning rate is 49 centimeters per second. What is β for the two-sided test with α = 0.05, σ = 2, and n = 25?
• Here |δ| = |49 − 50| = 1 and $z_{\alpha/2} = 1.96$. Because β for the two-sided test depends on δ only through |δ|, we may use δ = 1. From Equation 9.20,
$$\beta = \Phi\left(1.96 - \frac{(1)\sqrt{25}}{2}\right) - \Phi\left(-1.96 - \frac{(1)\sqrt{25}}{2}\right) = \Phi(-0.54) - \Phi(-4.46) = 0.295$$
• The probability is about 0.3 that the test will fail to reject the null hypothesis when the true burning rate is 49 centimeters per second.
• Practical Interpretation: A sample size of n = 25 results in reasonable, but not great, power: 1 − 𝛽 ≈ 1 − 0.3 = 0.70.
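Equation 9.20 is easy to evaluate in code. This minimal Python sketch assumes the standard two-sided z-test form β = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ); the function name is illustrative. Called with δ = −1 or δ = +1, it returns β ≈ 0.295, confirming that only |δ| matters.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def beta_two_sided_z(delta, sigma, n, alpha):
    """Type II error of the two-sided z-test when the true mean is mu0 + delta."""
    z_half = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shift = delta * n ** 0.5 / sigma
    return PHI(z_half - shift) - PHI(-z_half - shift)


beta = beta_two_sided_z(delta=-1.0, sigma=2.0, n=25, alpha=0.05)  # true mean 49
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```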

Sec 9.2.2 Type II Error and Choice of Sample Size

Example 9.3b | Propellant Burning Rate Type II
Error
• Suppose that the analyst wishes to design the test so that if the true mean burning rate differs from 50 centimeters per second by as much as 1 centimeter per second, the test will detect this (i.e., reject H0: μ = 50) with a high probability, say, 0.90. Now, we note that σ = 2, δ = 51 − 50 = 1, α = 0.05, and β = 0.10.
• Since $z_{\alpha/2} = z_{0.025} = 1.96$ and $z_\beta = z_{0.10} = 1.28$, the sample size required to detect this departure from H0: μ = 50 is found by Equation 9.22 as
$$n \cong \frac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{\delta^2} = \frac{(1.96 + 1.28)^2\, 2^2}{(1)^2} \cong 42$$
• The approximation is good here because $\Phi(-z_{\alpha/2} - \delta\sqrt{n}/\sigma) = \Phi(-1.96 - (1)\sqrt{42}/2) = \Phi(-5.20) \cong 0$, which is small relative to β.
• Practical Interpretation: To achieve a much higher power of 0.90, we need a considerably larger sample size, n = 42 instead of n = 25.
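A minimal sketch of Equation 9.22 in Python follows (the function name is illustrative). It uses unrounded normal quantiles from `statistics.NormalDist`, which give n ≈ 42.03, so rounding up conservatively would give 43; the table values 1.96 and 1.28 used above give 41.99 ≈ 42.

```python
import math
from statistics import NormalDist


def n_two_sided_z(delta, sigma, alpha, beta):
    """Approximate sample size from Equation 9.22: ((z_{alpha/2} + z_beta) * sigma / delta)^2."""
    std_normal = NormalDist()
    z_half_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(1.0 - beta)
    return ((z_half_alpha + z_beta) * sigma / delta) ** 2


n_raw = n_two_sided_z(delta=1.0, sigma=2.0, alpha=0.05, beta=0.10)
print(f"n = {n_raw:.2f}, rounded up: {math.ceil(n_raw)}")
```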
Sec 9.2.2 Type II Error and Choice of Sample Size
Large-Sample Test
• A test procedure for the null hypothesis H0: μ = μ0 has been developed assuming that the population is normally distributed and that σ² is known. In most practical situations, σ² will be unknown. Furthermore, we may not be certain that the population is normally distributed.
• In such cases, if n is large (say, n > 40), the sample standard deviation s can be substituted for σ in the test procedures. Thus, while we have given a test for the mean of a normal distribution with known σ², it can easily be converted into a large-sample test procedure for unknown σ² that is valid regardless of the form of the distribution of the population.
• Exact treatment of the case where the population is normal, σ² is unknown, and n is small involves use of the t distribution.
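The large-sample procedure is the usual z-statistic with s in place of σ. A minimal Python sketch, assuming a two-sided alternative and a sample supplied as a list of numbers; the function name is illustrative, and the result is only trustworthy for large n as described above.

```python
import math
from statistics import NormalDist, mean, stdev


def large_sample_z_test(sample, mu0):
    """Two-sided large-sample test of H0: mu = mu0 with s substituted for sigma.

    Returns (z0, P-value). Intended for large n (say, n > 40), where no
    normality assumption on the population is needed.
    """
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 divisor)
    z0 = (mean(sample) - mu0) / (s / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value
```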

Sec 9.2.3 Large-Sample Test


Hypothesis Tests on the Mean

9.3.1 Hypothesis Tests on the Mean


Example 9.6a | Golf Club Design
• An experiment was performed in which 15 drivers produced by a particular club maker were selected at random and their coefficients of restitution measured. It is of interest to determine if there is evidence (with α = 0.05) to support a claim that the mean coefficient of restitution exceeds 0.82.
• The observations are:
0.8411 0.8191 0.8182 0.8125 0.8750
0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660
• The sample mean and sample standard deviation are x̄ = 0.83725 and s = 0.02456. The objective of the experimenter is to demonstrate that the mean coefficient of restitution exceeds 0.82; hence a one-sided alternative hypothesis is appropriate.
The seven-step procedure for hypothesis testing is as follows:
1. Parameter of interest: The parameter of interest is the mean coefficient of restitution, μ.
2. Null hypothesis: H0: μ = 0.82
3. Alternative hypothesis: H1: μ > 0.82
9.3.1 Hypothesis Tests on the Mean
Example 9.6b | Golf Club Design
4. Test Statistic: The test statistic is $t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
5. Reject H0 if: Reject H0 if the P-value is less than 0.05.
6. Computations: Since x̄ = 0.83725, s = 0.02456, μ0 = 0.82, and n = 15, we have
$$t_0 = \frac{0.83725 - 0.82}{0.02456/\sqrt{15}} = 2.72$$
7. Conclusions: From Appendix A Table II, for a t distribution with 14 degrees of freedom, t0 = 2.72 falls between two values: 2.624, for which α = 0.01, and 2.977, for which α = 0.005. Since this is a one-tailed test, the P-value is between those two values, that is, 0.005 < P < 0.01. Therefore, since P < 0.05, we reject H0 and conclude that the mean coefficient of restitution exceeds 0.82.
Practical Interpretation: There is strong evidence to conclude that the mean coefficient of restitution
exceeds 0.82.
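The computations can be reproduced from the raw data. The Python standard library has no t distribution, so this sketch only computes x̄, s, and t0, then brackets the P-value with the two Table II values quoted above (2.624 and 2.977); a statistics package would be needed for an exact P-value. From the unrounded data, t0 ≈ 2.72.

```python
import math
from statistics import mean, stdev

# Coefficients of restitution for the 15 drivers in Example 9.6
cor = [0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
       0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
       0.8042, 0.8730, 0.8282, 0.8359, 0.8660]

mu0 = 0.82
n = len(cor)
xbar, s = mean(cor), stdev(cor)  # stdev uses the n - 1 divisor
t0 = (xbar - mu0) / (s / math.sqrt(n))

# Upper-tail t values for 14 degrees of freedom, as quoted from Appendix A Table II
t_alpha_0_01, t_alpha_0_005 = 2.624, 2.977
print(f"xbar = {xbar:.5f}, s = {s:.5f}, t0 = {t0:.2f}")
if t_alpha_0_01 < t0 < t_alpha_0_005:
    print("0.005 < P < 0.01, so H0 is rejected at alpha = 0.05")
```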
9.3.1 Hypothesis Tests on the Mean
Type II Error and Choice of Sample Size
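• The type II error of the two-sided alternative would be
$$\beta = P(-t_{\alpha/2,n-1} \le T_0 \le t_{\alpha/2,n-1} \mid \delta \ne 0) = P(-t_{\alpha/2,n-1} \le T_0' \le t_{\alpha/2,n-1})$$
where T0′ has a noncentral t distribution with n − 1 degrees of freedom and noncentrality parameter δ√n/σ.
• Curves are provided for two-sided alternatives on Charts VIIe and VIIf. The abscissa scale factor d on these charts is defined as
$$d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma}$$
• For the one-sided alternative μ > μ0 or μ < μ0, use Charts VIIg and VIIh. The abscissa scale factor d on these charts is defined as
$$d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma}$$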
9.3.2 Type II Error and Choice of Sample Size
Example 9.7 | Golf Club Design Sample Size
• Consider the golf club testing problem from Example 9.6. If the mean coefficient of
restitution exceeds 0.82 by as much as 0.02, is the sample size n = 15 adequate to
ensure that H0: μ = 0.82 will be rejected with probability at least 0.8?
• To solve this problem, we will use the sample standard deviation s = 0.02456 to estimate σ. Then d = |δ|/σ = 0.02/0.02456 = 0.81.
• By referring to the operating characteristic curves in Appendix Chart VIIg (for α = 0.05) with d = 0.81 and n = 15, we find that β = 0.10, approximately.

• Thus, the probability of rejecting H0: μ = 0.82 if the true mean exceeds this by 0.02 is approximately 1 − β = 1 − 0.10 = 0.90, and we conclude that a sample size of n = 15 is adequate to provide the desired sensitivity.
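Chart readings are approximate, so a simulation offers a rough cross-check. This sketch swaps in a Monte Carlo method (it is not the OC-chart method of the text): it assumes the true σ equals s = 0.02456, the true mean is 0.84, and t0.05,14 = 1.761 from a standard t table. The estimated β should land close to the chart value of about 0.10; the function name and replication count are illustrative choices.

```python
import math
import random


def simulated_beta_upper_t(mu0, mu_true, sigma, n, t_crit, reps=20000, seed=1):
    """Monte Carlo estimate of beta for the one-sided t-test of H0: mu = mu0 vs H1: mu > mu0.

    Normal samples are drawn with the assumed true mean and standard deviation;
    beta is the fraction of samples whose t0 does not exceed t_crit.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    misses = 0
    for _ in range(reps):
        x = [rng.gauss(mu_true, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        t0 = (xbar - mu0) / (s / math.sqrt(n))
        if t0 <= t_crit:
            misses += 1
    return misses / reps


# Assumptions: true mean 0.84 (0.02 above 0.82), sigma = s = 0.02456, t_{0.05,14} = 1.761
beta = simulated_beta_upper_t(mu0=0.82, mu_true=0.84, sigma=0.02456, n=15, t_crit=1.761)
print(f"simulated beta = {beta:.3f}, power = {1 - beta:.3f}")
```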

9.3.2 Type II Error and Choice of Sample Size


Hypothesis Tests on the Variance

9.4.1 Hypothesis Tests on the Variance


Example 9.8 | Automated Filling
• An automated filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data to suggest that the manufacturer has a problem with underfilled or overfilled bottles? Use α = 0.05, and assume that fill volume has a normal distribution.
• Using the seven-step procedure results in the following:
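As a minimal sketch of the computation, assuming the textbook's upper-tailed alternative H1: σ² > 0.01 and the chi-square statistic χ²0 = (n − 1)s²/σ0², the test can be written as below. The critical values χ²0.10,19 = 27.20 and χ²0.05,19 = 30.14 are taken from standard chi-square tables. The statistic is 29.07, which falls between them, so 0.05 < P < 0.10 and H0 is not rejected at α = 0.05.

```python
# Chi-square test on a normal variance, assuming H1: sigma^2 > 0.01
n = 20
s2 = 0.0153         # sample variance, (fluid ounces)^2
sigma0_sq = 0.01    # hypothesized variance, (fluid ounces)^2

chi2_0 = (n - 1) * s2 / sigma0_sq  # test statistic with n - 1 = 19 degrees of freedom

# Upper-tail critical values for 19 degrees of freedom from a standard chi-square table
chi2_crit_010 = 27.20
chi2_crit_005 = 30.14

print(f"chi2_0 = {chi2_0:.2f}")
if chi2_crit_010 < chi2_0 < chi2_crit_005:
    print("0.05 < P < 0.10: do not reject H0 at alpha = 0.05")
```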

9.4.1 Hypothesis Tests on the Variance


Type II Error and Choice of Sample Size
For the two-sided alternative hypothesis, the operating characteristic curves are plotted against
$$\lambda = \frac{\sigma}{\sigma_0}$$
where σ denotes the true value of the standard deviation.
Example 9.9 | Automated Filling Sample Size
• Consider the bottle-filling problem from Example 9.8. If the variance of the filling process exceeds 0.01 (fluid ounces)², too many bottles will be underfilled. Thus, the hypothesized value of the standard deviation is σ0 = 0.10. Suppose that if the true standard deviation of the filling process exceeds this value by 25%, we would like to detect this with probability at least 0.8. Is the sample size of n = 20 adequate?
• To solve this problem, note that we require $\lambda = \dfrac{\sigma}{\sigma_0} = \dfrac{0.125}{0.10} = 1.25$
• This is the abscissa parameter for Chart VIIk. From this chart, with n = 20 and 𝜆 = 1.25, we find that 𝛽 ≅ 0.6. Therefore, there is only about a 40% chance that the null hypothesis will be rejected if the true standard deviation is really as large as σ = 0.125 fluid ounce.
• To reduce the β-error, a larger sample size must be used. From the operating characteristic curve with β = 0.20 and λ = 1.25, we find that n = 75, approximately. Thus, if we want the test to perform as required above, the sample size must be at least 75 bottles.
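The chart reading β ≅ 0.6 can be cross-checked by simulation. This Monte Carlo sketch is a swapped-in technique, not the chart method: it assumes normal fill volumes with true σ = 0.125, the upper-tailed test of H0: σ = 0.10 with n = 20, and the table value χ²0.05,19 = 30.14. Its estimate comes out in the same neighborhood as the chart (a little below 0.6), which is about as close as a chart reading can be expected to agree. The function name and replication count are illustrative.

```python
import random


def simulated_beta_variance_upper(sigma0, sigma_true, n, chi2_crit, reps=20000, seed=1):
    """Monte Carlo estimate of beta for H0: sigma = sigma0 vs H1: sigma > sigma0.

    Each replicate draws n normal observations with standard deviation sigma_true and
    computes chi2_0 = (n - 1) s^2 / sigma0^2; beta is the fraction with chi2_0 <= chi2_crit.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    not_rejected = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma_true) for _ in range(n)]  # the mean does not affect s^2
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
        if (n - 1) * s2 / sigma0 ** 2 <= chi2_crit:
            not_rejected += 1
    return not_rejected / reps


# Assumptions: normal fill volumes, true sigma = 0.125, chi2_{0.05,19} = 30.14 from a table
beta = simulated_beta_variance_upper(sigma0=0.10, sigma_true=0.125, n=20, chi2_crit=30.14)
print(f"simulated beta = {beta:.2f}")
```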

9.4.2 Type II Error and Choice of Sample Size


Large-Sample Tests on a Proportion

9.5.1 Large-Sample Tests on a Proportion


Example 9.10 | Automobile Engine Controller
A semiconductor manufacturer produces controllers used in automobile engine applications. The customer
requires that the process fallout or fraction defective at a critical manufacturing step not exceed 0.05 and that the
manufacturer demonstrate process capability at this level of quality using α = 0.05. The semiconductor
manufacturer takes a random sample of 200 devices and finds that four of them are defective. Can the
manufacturer demonstrate process capability for the customer?
1. Parameter of Interest: The parameter of interest is the process fraction defective p.
2. Null hypothesis: H0: p = 0.05
3. Alternative hypothesis: H1: p < 0.05
4. Test Statistic: $z_0 = \dfrac{x - np_0}{\sqrt{np_0(1 - p_0)}}$, where x = 4, n = 200, and p0 = 0.05.
5. Reject H0 if: Reject H0: p = 0.05 if the P-value is less than 0.05.
6. Computations: The test statistic is $z_0 = \dfrac{4 - 200(0.05)}{\sqrt{200(0.05)(0.95)}} = -1.95$
7. Conclusions: Since z0 = −1.95, the P-value is Φ(−1.95) = 0.0256, so we reject H0 and conclude that the process fraction defective p is less than 0.05.
Practical Interpretation: We conclude that the process is capable.
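A minimal Python sketch of Steps 4 to 7 (standard library only; the function name is illustrative). Without rounding z0, it gives z0 ≈ −1.947 and P ≈ 0.0258, slightly different from the 0.0256 obtained from Φ(−1.95).

```python
import math
from statistics import NormalDist


def proportion_z_lower(x, n, p0):
    """Large-sample test of H0: p = p0 vs H1: p < p0; returns (z0, P-value)."""
    z0 = (x - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    return z0, NormalDist().cdf(z0)  # lower-tail P-value


z0, p_value = proportion_z_lower(x=4, n=200, p0=0.05)
print(f"z0 = {z0:.2f}, P = {p_value:.4f}")
```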

9.5.1 Large-Sample Tests on a Proportion
Large-Sample Tests on a Proportion

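Another form of the test statistic Z0 is $Z_0 = \dfrac{\hat{P} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $\hat{P} = X/n$ is the sample proportion.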

9.5.1 Large-Sample Tests on a Proportion
Type II Error and Choice of Sample Size
• For a two-sided alternative:
$$\beta = \Phi\left(\frac{p_0 - p + z_{\alpha/2}\sqrt{p_0(1 - p_0)/n}}{\sqrt{p(1 - p)/n}}\right) - \Phi\left(\frac{p_0 - p - z_{\alpha/2}\sqrt{p_0(1 - p_0)/n}}{\sqrt{p(1 - p)/n}}\right)$$
• If the alternative is p < p0:
$$\beta = 1 - \Phi\left(\frac{p_0 - p - z_\alpha\sqrt{p_0(1 - p_0)/n}}{\sqrt{p(1 - p)/n}}\right)$$
• If the alternative is p > p0:
$$\beta = \Phi\left(\frac{p_0 - p + z_\alpha\sqrt{p_0(1 - p_0)/n}}{\sqrt{p(1 - p)/n}}\right)$$
 

9.5.2 Type II Error and Choice of Sample Size


Example 9.11 | Automobile Engine Controller
Type II Error
• Consider the semiconductor manufacturer from Example 9.10. Suppose that its process fallout is really p = 0.03. What is the β-error for a test of process capability that uses n = 200 and α = 0.05?
$$\beta = 1 - \Phi\left(\frac{0.05 - 0.03 - (1.645)\sqrt{0.05(0.95)/200}}{\sqrt{0.03(1 - 0.03)/200}}\right) = 1 - \Phi(-0.44) = 0.67$$
• Suppose that the semiconductor manufacturer was willing to accept a β-error as large as 0.10 if the true value of the process fraction defective was p = 0.03. If the manufacturer continues to use α = 0.05, what sample size would be required?
$$n = \left[\frac{1.645\sqrt{0.05(0.95)} + 1.28\sqrt{0.03(0.97)}}{0.03 - 0.05}\right]^2 \cong 832$$
• Conclusion: Note that n = 832 is a very large sample size. However, we are trying to detect a fairly
small deviation from the null value p0 = 0.05.
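Both calculations can be sketched in Python, assuming the normal-approximation formulas for H1: p < p0 used in this example; the function names are illustrative. With unrounded quantiles the sketch gives β ≈ 0.67 and n ≈ 832.6 (833 when rounded up); the 832 above comes from the rounded values 1.645 and 1.28.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_lower_tail_proportion(p0, p, n, alpha):
    """Type II error of the large-sample test of H0: p = p0 vs H1: p < p0 at true value p."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    numerator = p0 - p - z_alpha * math.sqrt(p0 * (1 - p0) / n)
    return 1.0 - STD_NORMAL.cdf(numerator / math.sqrt(p * (1 - p) / n))


def n_one_sided_proportion(p0, p, alpha, beta):
    """Approximate n so that a one-sided test at level alpha has type II error beta at p."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    root = (z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p * (1 - p))) / (p - p0)
    return root ** 2


beta = beta_lower_tail_proportion(p0=0.05, p=0.03, n=200, alpha=0.05)
n_raw = n_one_sided_proportion(p0=0.05, p=0.03, alpha=0.05, beta=0.10)
print(f"beta = {beta:.2f}, n = {n_raw:.1f} (round up to {math.ceil(n_raw)})")
```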
9.5.2 Type II Error and Choice of Sample Size
Testing for Goodness of Fit
• Based on chi-square distribution
• Requires a random sample of size n from the population whose probability distribution is
unknown
• Let Oi be the observed frequency in the ith class interval.
• Let Ei be the expected frequency in the ith class interval.

9.7 Testing for Goodness of Fit
Example 9.12a | Printed Circuit Board
Defects-Poisson Distribution
• The number of defects in printed circuit boards is hypothesized to follow a Poisson distribution. A random sample of n = 60 printed boards has been collected, and the following numbers of defects were observed:
Number of defects      0    1    2    3
Observed frequency    32   15    9    4

• The estimate of the mean number of defects per board is the sample average, (32·0 + 15·1 + 9·2 + 4·3)/60 = 0.75. From the Poisson distribution with parameter 0.75, we may compute pi, the theoretical, hypothesized probability associated with the ith class interval. We may find the pi as follows:
$$p_1 = P(X = 0) = \frac{e^{-0.75}(0.75)^0}{0!} = 0.472$$
$$p_2 = P(X = 1) = \frac{e^{-0.75}(0.75)^1}{1!} = 0.354$$
$$p_3 = P(X = 2) = \frac{e^{-0.75}(0.75)^2}{2!} = 0.133$$
$$p_4 = P(X \ge 3) = 1 - (p_1 + p_2 + p_3) = 0.041$$

9.7 Testing for Goodness of Fit
Example 9.12b | Printed Circuit Board
Defects-Poisson Distribution
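• The expected frequencies are computed by multiplying the sample size n = 60 times the probabilities pi. That is, Ei = npi. The expected frequencies follow:
Number of defects     0       1       2       3 (or more)
Probability           0.472   0.354   0.133   0.041
Expected frequency    28.32   21.24   7.98    2.46

• Because the expected frequency in the last cell is less than 3, we combine the last two cells:
Number of defects     0       1       2 (or more)
Observed frequency    32      15      13
Expected frequency    28.32   21.24   10.44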

9.7 Testing for Goodness of Fit
Example 9.12c | Printed Circuit Board
Defects-Poisson Distribution
The seven-step hypothesis-testing procedure may now be applied, using α = 0.05, as follows:
1. Parameter of interest: The variable of interest is the form of the distribution of defects in printed circuit boards.
2. Null hypothesis: H0: The form of the distribution of defects is Poisson.
3. Alternative hypothesis: H1: The form of the distribution of defects is not Poisson.
4. Test statistic: The test statistic is $\chi_0^2 = \sum_{i=1}^{k} \dfrac{(o_i - E_i)^2}{E_i}$
5. Reject H0 if: Reject H0 if the P-value is less than 0.05.
6. Computations:
$$\chi_0^2 = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94$$
7. Conclusions: We find from Appendix Table III that $\chi^2_{0.10,1} = 2.71$ and $\chi^2_{0.05,1} = 3.84$. Because $\chi_0^2 = 2.94$ lies between these values, we conclude that the P-value is between 0.05 and 0.10. Therefore, since the P-value exceeds 0.05, we are unable to reject the null hypothesis that the distribution of defects in printed circuit boards is Poisson. The exact P-value computed from Minitab is 0.0864.
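The whole calculation can be reproduced in a short Python sketch (helper names are illustrative). It uses unrounded Poisson probabilities, so it gives χ²0 ≈ 2.96 and P ≈ 0.085 instead of 2.94 and 0.0864, which use probabilities rounded to three decimals; the conclusion at α = 0.05 is the same. Because the test has 3 − 1 − 1 = 1 degree of freedom (three cells, one estimated parameter), the P-value can be obtained from the standard normal through P(χ²1 > x) = 2[1 − Φ(√x)], which avoids needing a chi-square library.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = 60
frequencies = {0: 32, 1: 15, 2: 9, 3: 4}  # defects per board -> number of boards
lam = sum(k * f for k, f in frequencies.items()) / n  # sample mean, 0.75

p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
probs = [p0, p1, 1.0 - p0 - p1]  # cells "0", "1", and "2 or more" after combining
observed = [32, 15, 9 + 4]
expected = [n * p for p in probs]

chi2_0 = chi_square_statistic(observed, expected)
# 3 cells - 1 estimated parameter - 1 = 1 degree of freedom;
# for 1 df, P(chi2 > x) = 2 * (1 - Phi(sqrt(x))).
p_value = 2.0 * (1.0 - NormalDist().cdf(math.sqrt(chi2_0)))
print(f"lambda = {lam:.2f}, chi2_0 = {chi2_0:.2f}, P = {p_value:.4f}")
```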
9.7 Testing for Goodness of Fit
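Contingency Table Tests

Let ui be the probability that a randomly selected element falls in the ith row class and vj the probability that it falls in the jth column class. Assuming independence, the estimators of ui and vj are:
$$\hat{u}_i = \frac{1}{n}\sum_{j=1}^{c} O_{ij}, \qquad \hat{v}_j = \frac{1}{n}\sum_{i=1}^{r} O_{ij}$$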

9.8 Contingency Table Tests
Contingency Table Tests
Therefore, the expected frequency of each cell is
$$E_{ij} = n\hat{u}_i\hat{v}_j = \frac{1}{n}\left(\sum_{k=1}^{c} O_{ik}\right)\left(\sum_{l=1}^{r} O_{lj}\right)$$
Then, for large n, the statistic
$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
has an approximate chi-square distribution with (r − 1)(c − 1) degrees of freedom if the null hypothesis is true. We should reject the null hypothesis if the value of the test statistic $\chi_0^2$ is too large. The P-value would be calculated as the probability beyond $\chi_0^2$ on the $\chi^2_{(r-1)(c-1)}$ distribution, or $P = P\left(\chi^2_{(r-1)(c-1)} > \chi_0^2\right)$. For a fixed-level test, we would reject the hypothesis of independence if the observed value of the test statistic $\chi_0^2$ exceeded $\chi^2_{\alpha,(r-1)(c-1)}$.

9.8 Contingency Table Tests
Example 9.14a | Health Insurance Plan
Preference
A company has to choose among three health insurance plans. Management wishes to know whether the preference for plans is independent of job classification and wants to use α = 0.05. The opinions of a random sample of 500 employees are shown in Table 9.3.

Job Classification   Plan 1   Plan 2   Plan 3   Totals
Salaried workers     160      140      40       340
Hourly workers       40       60       60       160
Totals               200      200      100      500

To find the expected frequencies, we must first compute
û1 = (340/500) = 0.68, û2 = (160/500) = 0.32, v̂1 = (200/500) = 0.40, v̂2 = (200/500) = 0.40, and v̂3 = (100/500) = 0.20
9.8 Contingency Table Tests
Example 9.14b | Health Insurance Plan
Preference
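• The expected frequencies may now be computed from Equation 9.49.
• For example, the expected number of salaried workers favoring health insurance plan 1 is
$E_{11} = n\hat{u}_1\hat{v}_1 = 500(0.68)(0.40) = 136$
• The expected frequencies are shown in the table below:
Job Classification   Plan 1   Plan 2   Plan 3   Totals
Salaried workers     136      136      68       340
Hourly workers       64       64       32       160
Totals               200      200      100      500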

9.8 Contingency Table Tests
Example 9.14c | Health Insurance Plan
Preference
The seven-step hypothesis-testing procedure may now be applied to this problem.
1. Parameter of Interest: The variable of interest is employee preference among health
insurance plans.
2. Null hypothesis: H0: Preference is independent of salaried versus hourly job classification.
3. Alternative hypothesis: H1: Preference is not independent of salaried versus hourly job
classification.
4. Test statistic: The test statistic is $\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(o_{ij} - E_{ij})^2}{E_{ij}}$

5. Reject H0 if: We will use a fixed-significance-level test with α = 0.05. Therefore, since r = 2 and c = 3, the degrees of freedom for chi-square are (r − 1)(c − 1) = (1)(2) = 2, and we would reject H0 if $\chi_0^2 > \chi^2_{0.05,2} = 5.99$.

9.8 Contingency Table Tests
Example 9.14d | Health Insurance Plan
Preference
6. Computations:
$$\chi_0^2 = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{(o_{ij} - E_{ij})^2}{E_{ij}} = \frac{(160 - 136)^2}{136} + \frac{(140 - 136)^2}{136} + \frac{(40 - 68)^2}{68} + \frac{(40 - 64)^2}{64} + \frac{(60 - 64)^2}{64} + \frac{(60 - 32)^2}{32} = 49.63$$

7. Conclusions: Since $\chi_0^2 = 49.63 > \chi^2_{0.05,2} = 5.99$, we reject the hypothesis of independence and conclude that the preference for health insurance plans is not independent of job classification. The P-value for $\chi_0^2 = 49.63$ is P = 1.671 × 10⁻¹¹ (this value was computed from computer software).
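A minimal Python sketch of the independence test follows; the function name is illustrative. It rebuilds the expected frequencies Eij = (row total)(column total)/n, computes χ²0 ≈ 49.63 with 2 degrees of freedom, and, because the chi-square upper tail with 2 degrees of freedom is exactly e^(−x/2), recovers P ≈ 1.67 × 10⁻¹¹ without a statistics package.

```python
import math


def independence_test(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (chi2_0, degrees_of_freedom, expected), where expected[i][j] = E_ij.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # E_ij = n * u_i * v_j = (row total i) * (column total j) / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2_0 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2_0, df, expected


observed = [
    [160, 140, 40],  # salaried workers: plans 1, 2, 3
    [40, 60, 60],    # hourly workers: plans 1, 2, 3
]
chi2_0, df, expected = independence_test(observed)
# Here df = 2, and the chi-square upper tail with 2 degrees of freedom is exactly exp(-x / 2).
p_value = math.exp(-chi2_0 / 2)
print(expected)
print(f"chi2_0 = {chi2_0:.2f}, df = {df}, P = {p_value:.3e}")
```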
9.8 Contingency Table Tests
Nonparametric Procedures
The Sign Test
• Used to test hypotheses about the median μ̃ of a continuous distribution.
• Suppose that the hypotheses are H0: μ̃ = μ̃0 versus a one-sided alternative.
• Test procedure: Let X1, X2, ..., Xn be a random sample from the population of interest.
• Form the differences Xi − μ̃0, i = 1, 2, …, n.
• An appropriate test statistic is the number of these differences that are positive, say R+.
• Under H0, each difference is positive with probability 1/2, so the P-value for the observed number of plus signs r+ can be calculated directly from the binomial distribution with p = 1/2.
• If the computed P-value is less than or equal to the significance level α, we will reject H0.
• The two-sided alternative H1: μ̃ ≠ μ̃0 may also be tested.
• If the computed P-value is less than the significance level α, we will reject H0.

9.9.1 The Sign Test
Example 9.15a | Propellant Shear Strength
Sign Test
• Montgomery, Peck, and Vining (2012) reported on a
study in which a rocket motor is formed by binding
an igniter propellant and a sustainer propellant
together inside a metal housing.
• The shear strength of the bond between the two
propellant types is an important characteristic.
• The results of testing 20 randomly selected motors
are shown in Table 9.5. Test the hypothesis that the
median shear strength is 2000 psi, using α = 0.05.

9.9.1 The Sign Test
Example 9.15b | Propellant Shear
Strength Sign Test
The seven-step hypothesis-testing procedure is:
1. Parameter of Interest: The variable of interest is the median of the distribution of propellant shear
strength.
2. Null hypothesis: H0: μ̃ = 2000 psi
3. Alternative hypothesis: H1: μ̃ ≠ 2000 psi
4. Test statistic: The test statistic is the observed number of plus differences in Table 9.5, i.e., r+ = 14.
5. Reject H0 if: The P-value corresponding to r+ = 14 is less than or equal to α = 0.05.
6. Computations: Because r+ = 14 is greater than n/2 = 20/2 = 10, we calculate the P-value from
$$P = 2P\left(R^+ \ge 14 \text{ when } p = \tfrac{1}{2}\right) = 2\sum_{r=14}^{20}\binom{20}{r}(0.5)^r(0.5)^{20-r} = 0.1153$$
7. Conclusions: Since the P-value is greater than α = 0.05, we cannot reject the null hypothesis that the median shear strength is 2000 psi.
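The binomial sum in Step 6 can be evaluated exactly with `math.comb` (Python 3.8+). A minimal sketch; the function name is illustrative, and doubling the larger tail is the two-sided rule used in Step 6.

```python
from math import comb


def sign_test_two_sided_p(r_plus, n):
    """Two-sided sign-test P-value from the Binomial(n, 1/2) distribution of R+."""
    r = max(r_plus, n - r_plus)  # count in the larger tail
    upper_tail = sum(comb(n, k) for k in range(r, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


p = sign_test_two_sided_p(r_plus=14, n=20)
print(f"P = {p:.4f}")
```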
9.9.1 The Sign Test
The Wilcoxon Signed-Rank Test
• A test procedure that uses both direction (sign) and magnitude.
• Suppose that the hypotheses are H0: μ = μ0 and H1: μ ≠ μ0.
• Test procedure: Let X1, X2, ..., Xn be a random sample from a continuous and symmetric distribution with mean (and median) μ. Form the differences Xi − μ0.
• Rank the absolute differences |Xi − μ0| in ascending order, and give the ranks the signs of their corresponding differences.
• Let W+ be the sum of the positive ranks and W− be the absolute value of the sum of the negative ranks, and let W = min(W+, W−).
• Critical values of W, $w^*_\alpha$, can be found in Appendix Table IX.
• If the computed value w is less than or equal to the critical value, we will reject H0.
• For one-sided alternatives: for H1: μ > μ0, reject H0 if w− ≤ critical value; for H1: μ < μ0, reject H0 if w+ ≤ critical value.

9.9.2 The Wilcoxon Signed-Rank Test
Example 9.16a | Propellant Shear Strength-
Wilcoxon Signed-Rank Test
• We illustrate the Wilcoxon signed-rank test by applying it to the propellant shear strength
data from Table 9.5. Assume that the underlying distribution is a continuous symmetric
distribution.
• The seven-step hypothesis-testing procedure is applied as follows:
1. Parameter of Interest: The variable of interest is the mean (or median) of the distribution of propellant shear strength.
2. Null hypothesis: H0: μ = 2000 psi
3. Alternative hypothesis: H1: μ ≠ 2000 psi
4. Test statistic: The test statistic is 𝑤 = min(𝑤+, 𝑤−)
5. Reject H0 if: 𝑤 ≤ $w^*_{0.05}$ = 52 (from Appendix Table IX).
6. Computations: The sum of the positive ranks is 𝑤+ = (1 + 2 + 3 + 4 + 5 + 6 + 11 + 13 + 15 + 16 + 17 + 18 + 19 + 20) = 150, and the sum of the absolute values of the negative ranks is 𝑤− = (7 + 8 + 9 + 10 + 12 + 14) = 60.

9.9.2 The Wilcoxon Signed-Rank Test
Example 9.16b | Propellant Shear Strength-
Wilcoxon Signed-Rank Test
𝑤 = min(𝑤+, 𝑤−) = min(150, 60) = 60

7. Conclusions: Since 𝑤 = 60 is not ≤ $w^*_{0.05}$ = 52, we fail to reject the null hypothesis that the mean or median shear strength is 2000 psi.
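A minimal Python sketch of the signed-rank computation follows (the function name is illustrative). Because the Table 9.5 data are not reproduced here, the usage feeds in the signed ranks listed in Step 6 as stand-in differences; their absolute values are 1 to 20, so ranking them reproduces the same ranks. Dropping zero differences and averaging tied ranks are common conventions assumed in the sketch, not rules stated on these slides.

```python
def wilcoxon_signed_rank(diffs):
    """Return (w_plus, w_minus) for the differences x_i - mu0.

    Zero differences are dropped and tied |d| values share their average rank
    (common conventions, assumed here).
    """
    d = [v for v in diffs if v != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus


negative = {7, 8, 9, 10, 12, 14}  # ranks carrying a minus sign in Step 6
stand_in = [-r if r in negative else r for r in range(1, 21)]
w_plus, w_minus = wilcoxon_signed_rank(stand_in)
w = min(w_plus, w_minus)
critical = 52  # w*_{0.05} for n = 20, as quoted from Appendix Table IX
print(w_plus, w_minus, w, "reject H0" if w <= critical else "fail to reject H0")
```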

9.9.2 The Wilcoxon Signed-Rank Test
The Wilcoxon Signed-Rank Test

9.9.2 The Wilcoxon Signed-Rank Test
Equivalence Testing
• A test that is used when we want to reject the null hypothesis (and support the alternative) in the form of:
𝐻0: 𝜇 ≠ 80 vs. 𝐻1: 𝜇 = 80
• To test the above, the following two sets of one-sided hypotheses are tested:
𝐻0: 𝜇 = 80 − 𝛿 vs. 𝐻1: 𝜇 > 80 − 𝛿
𝐻0: 𝜇 = 80 + 𝛿 vs. 𝐻1: 𝜇 < 80 + 𝛿
where 𝛿 is called the equivalence band, a practical threshold or limit within which the mean is considered to be the same as the standard.
• Note that these are just two one-sided tests. For this reason, a test of equivalence is sometimes called two one-sided tests (TOST).
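A TOST decision can be sketched in a few lines, assuming a normal population with unknown σ (so each one-sided test is a t-test) and a critical value t_{α,n−1} supplied from a t table, since the Python standard library has no t distribution. The function name and the numbers in the usage line are hypothetical, chosen only to illustrate the call.

```python
import math


def tost_equivalence(xbar, s, n, mu0, delta, t_crit):
    """Two one-sided t-tests for equivalence of mu to mu0 within +/- delta.

    t_crit is the upper-alpha critical value t_{alpha, n-1}, supplied from a t table.
    Returns True when both one-sided nulls are rejected (equivalence is supported).
    """
    se = s / math.sqrt(n)
    t_lower = (xbar - (mu0 - delta)) / se  # H0: mu = mu0 - delta vs H1: mu > mu0 - delta
    t_upper = (xbar - (mu0 + delta)) / se  # H0: mu = mu0 + delta vs H1: mu < mu0 + delta
    return t_lower > t_crit and t_upper < -t_crit


# Hypothetical numbers for illustration only: xbar = 80.5, s = 2, n = 25, delta = 2,
# and t_{0.05,24} = 1.711 from a standard t table.
print(tost_equivalence(xbar=80.5, s=2.0, n=25, mu0=80.0, delta=2.0, t_crit=1.711))
```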
9.10 Equivalence Testing
Important Terms and Concepts
• Acceptance region
• Alternative hypothesis
• 𝛼 and 𝛽
• Chi-square tests
• Combining P-values
• Confidence interval
• Connection between hypothesis tests and confidence intervals
• Contingency table
• Critical region for a test statistic
• Critical values
• Equivalence testing
• Fixed significance level
• Goodness-of-fit test
• Homogeneity test
• Hypotheses
• Hypothesis testing
• Independence test
• Inference
• Non-parametric or distribution-free methods
• Normal approximation to non-parametric tests
• Null distribution
• Null hypothesis
• Observed significance level
• One- and two-sided alternative hypotheses
• Operating characteristic (OC) curves
• Parametric
• Power of a statistical test
• P-value
• Ranks
• Reference distribution for a test statistic
• Rejection region
• Sampling distribution
• Sample size determination for hypothesis tests
• Sign test
• Significance level of a test
• Statistical hypotheses
• Statistical vs. practical significance
• Symmetric continuous distributions
• t-test
• Test statistic
• Type I & Type II errors
• Wilcoxon signed-rank test
• Z-test
