STA408: Statistics for Science and Engineering

Tutorial 4
The following exercises assume that all the assumptions required to apply the ANOVA procedures hold
true.

One-way ANOVA
1. A large company buys thousands of lightbulbs every year. The company is currently considering
four brands of lightbulbs to choose from. Before the company decides which lightbulbs to buy, it
wants to investigate whether the mean lifetimes of the four brands of lightbulbs are the same. The
company’s research department randomly selected a few bulbs of each brand and tested them. The
following table lists the number of hours (in thousands) that each bulb of each brand lasted
before burning out.
Brand I 23 24 19 26 22 23 25
Brand II 19 23 18 24 20 22 19
Brand III 23 27 25 26 23 21 17
Brand IV 26 24 21 29 28 24 28
At 2.5% significance level, test the null hypothesis that the mean lifetime of bulbs for each of these
four brands is the same.
(Answer: critical value = 3.72, test statistic, 𝐹 = 3.878, reject 𝐻0 )
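As a cross-check on the hand calculation, the sketch below computes the one-way ANOVA quantities in plain Python using the computational formulas SST = ∑𝑥² − 𝐺²/𝑁 and SSB = ∑𝑇ᵢ²/𝑛ᵢ − 𝐺²/𝑁, where 𝑇ᵢ is the total of sample 𝑖 and 𝐺 is the grand total. The function name `one_way_anova` is our own and no statistics library is assumed, so the critical value 3.72 still has to be read from an F table with (3, 24) degrees of freedom.

```python
# Plain-Python one-way ANOVA using the computational formulas
#   SST = sum(x^2) - G^2/N   and   SSB = sum(T_i^2 / n_i) - G^2/N
def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples (lists of numbers)."""
    k = len(groups)                                 # number of treatments
    n_total = sum(len(g) for g in groups)           # N
    grand_total = sum(sum(g) for g in groups)       # G
    correction = grand_total ** 2 / n_total         # G^2 / N
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)               # df = k - 1
    ms_within = ss_within / (n_total - k)           # df = N - k
    return ss_between, ss_within, ms_between, ms_within, ms_between / ms_within


brands = [
    [23, 24, 19, 26, 22, 23, 25],  # Brand I
    [19, 23, 18, 24, 20, 22, 19],  # Brand II
    [23, 27, 25, 26, 23, 21, 17],  # Brand III
    [26, 24, 21, 29, 28, 24, 28],  # Brand IV
]
ssb, ssw, msb, msw, f_stat = one_way_anova(brands)
print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, F = {f_stat:.3f}")
```

Because each squared total is divided by its own sample size, the same function applies unchanged to unequal samples such as those in Question 2.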

2. A university employment office wants to compare the time taken by graduates with three different
majors to find their first full-time job after graduation. The following table lists the time (in days)
taken to find their first full-time job after graduation for a random sample of eight business majors,
seven computer science majors and six engineering majors who graduated last year.
Business 208 162 240 180 148 312 176 292
Computer Science 156 113 281 128 305 147 232
Engineering 126 275 363 146 298 392
At the 5% significance level, can you conclude that the mean time taken to find a first full-time job
is the same for all of last year’s graduates in these three fields?
(Answer: critical value = 3.55, test statistic, 𝐹 = 1.298, do not reject 𝐻0 )

3. The following ANOVA table, based on information obtained for three samples selected from three
independent populations that are normally distributed with equal variances, has a few missing
values.
ANOVA table
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Value of the Test Statistic
Treatments | 2 | | 19.2813 | 𝐹 = ____ = ____
Error | | 89.3677 | |
Total | 12 | | |
(a) Find the missing values and complete the ANOVA table.
(b) Using 𝛼 = 0.01, what is your conclusion for the test of the null hypothesis that the means of
the three populations are all equal against the alternative hypothesis that the means of the
three populations are not all equal?
(Answer: (a) 𝐹 = 2.1575 (b) critical value= 7.56, do not reject 𝐻0 )
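The column layout of this table was lost in extraction, so the sketch below assumes that 19.2813 is the treatment mean square and 89.3677 is the error sum of squares. Under that reading, the relations df_error = df_total − df_treatment, SS = MS × df and 𝐹 = MS_treatment / MS_error reproduce the stated answer 𝐹 = 2.1575. The variable names are illustrative only.

```python
# Filling in the Question 3 table, assuming 19.2813 is the treatment MS
# and 89.3677 is the error SS (the original column layout was flattened).
df_treat, df_total = 2, 12
ms_treat = 19.2813
ss_error = 89.3677

df_error = df_total - df_treat       # degrees of freedom add up to the total
ss_treat = ms_treat * df_treat       # SS = MS * df
ms_error = ss_error / df_error       # MS = SS / df
ss_total = ss_treat + ss_error       # sums of squares add up to the total
f_stat = ms_treat / ms_error         # F = MS_treatments / MS_error

print(f"df_error = {df_error}, SS_treat = {ss_treat:.4f}, MS_error = {ms_error:.5f}")
print(f"SS_total = {ss_total:.4f}, F = {f_stat:.4f}")
```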

4. An ophthalmologist is interested in determining whether a golfer’s type of vision (far-sightedness,
near-sightedness, no prescription) impacts how well he or she can judge distance. Random
samples of golfers from these three groups were selected, and these golfers were blindfolded and
taken to the same location on a golf course. Each of them was then asked to estimate the distance
from this location to the pin at the end of the hole. The data (in metres) in the following table
represent how far off the estimates (let us call these errors) of these golfers were from the actual
STA408 Tutorial 4 Chapter 4: Analysis of Variance

distance. A negative value implies that the person underestimated the distance, and a positive value
implies that a person overestimated the distance.
Far-sighted 11 9 8 10 3 11 8 1 4
Near-sighted 2 5 7 8 6 9 2 10 10
No prescription 5 1 0 4 3 2 0 8
The Minitab output for the data above is shown below.
One-way ANOVA: Far-sighted, Near-sighted, No prescription
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Factor 2 182.3 91.14 5.58 0.011
Error 23 375.8 16.34
Total 25 558.0

Test at the 1% significance level whether the mean errors in estimating distance are the same for all
golfers of the three vision types.
(Answer: Do not reject 𝐻0 )
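Because Minitab reports a p-value, the test can be decided without an F table: reject 𝐻0 only when the p-value is at most 𝛼. The short sketch below applies that rule to the output above and recomputes 𝐹 as the ratio of the two mean squares; the decision is equivalent to comparing 𝐹 with the tabulated critical value 𝐹0.01(2, 23). The variable names are illustrative.

```python
# Deciding the Question 4 test from the Minitab output.
ms_factor, ms_error = 91.14, 16.34   # Adj MS column
p_value = 0.011                      # P-Value column
alpha = 0.01                         # significance level in the question

f_stat = ms_factor / ms_error        # recomputes the printed F-Value (about 5.58)
reject_h0 = p_value <= alpha         # p-value rule: reject H0 only if p <= alpha

print(f"F = {f_stat:.2f}, p = {p_value}, reject H0 at alpha = {alpha}: {reject_h0}")
```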

5. A billiards parlour in a small town is open just 4 days per week, i.e., Thursday through Sunday.
Revenues vary considerably from day to day and week to week, so the owner is not sure whether
some days of the week are more profitable than others. He takes random samples of 5 Thursdays,
5 Fridays, 5 Saturdays and 5 Sundays from last year’s records and lists the revenue for these 20
days. His bookkeeper finds the average revenue for each of the four samples and then calculates
∑𝑥², the sum of the squares of all 20 revenues. The results are shown in the following table; the value of ∑𝑥² is 2,890,000.
Day Mean Revenue (RM) Sample Size
Thursday 295 5
Friday 380 5
Saturday 405 5
Sunday 345 5
At a 1% level of significance, can you conclude that the mean revenue is the same for each of the
four days of the week?
(Answer: critical value = 5.29, test statistic = 0.5725, do not reject 𝐻0 )
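When only the sample means, sample sizes and ∑𝑥² are available, the treatment totals can be recovered as 𝑇ᵢ = 𝑛ᵢ𝑥̄ᵢ and the usual computational formulas applied. The function `anova_from_summary` below is a hypothetical helper written for this sketch, not a library routine.

```python
# One-way ANOVA from group means, sample sizes and sum(x^2) of all observations.
def anova_from_summary(means, sizes, sum_x_squared):
    k = len(means)
    n_total = sum(sizes)
    totals = [m * n for m, n in zip(means, sizes)]       # T_i = n_i * mean_i
    grand_total = sum(totals)                            # G
    correction = grand_total ** 2 / n_total              # G^2 / N
    ss_between = sum(t * t / n for t, n in zip(totals, sizes)) - correction
    ss_total = sum_x_squared - correction
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ss_between, ss_within, ms_between / ms_within


ssb, ssw, f_stat = anova_from_summary(
    means=[295, 380, 405, 345],   # Thursday, Friday, Saturday, Sunday
    sizes=[5, 5, 5, 5],
    sum_x_squared=2_890_000,
)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.4f}")
```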

Randomised Complete Block Design


6. The following data represent the final grades obtained by 5 students in mathematics, English,
French, and biology:
Subject
Student Math English French Biology
1 68 57 73 61
2 83 94 91 86
3 72 81 63 59
4 55 73 77 66
5 92 68 75 87
Below is the Minitab output for the data above.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Student (i) 1618.70 404.67 (iv) 0.021
Subject 3 (ii) 14.05 0.15 0.927
Error 12 1112.10 (iii)
Total 19 2772.95

(a) Show that the total sum of squares is 2772.95.
(b) Find the missing values (i), (ii), (iii) and (iv)
(c) Test the hypothesis that the courses are of equal difficulty.
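For a randomized complete block design with one observation per cell, the block and treatment sums of squares come from the row and column totals, and the error sum of squares is what remains of the total. The sketch below follows the labels in the Minitab output above (students as blocks, subjects as treatments); `rcbd_anova` is an illustrative name, not a library function.

```python
# Randomized complete block design, one observation per block-treatment cell.
def rcbd_anova(table):
    """table[i][j] is the response for block i (row) and treatment j (column)."""
    b, t = len(table), len(table[0])
    cf = sum(sum(row) for row in table) ** 2 / (b * t)             # G^2 / N
    ss_total = sum(x * x for row in table for x in row) - cf
    ss_blocks = sum(sum(row) ** 2 for row in table) / t - cf       # from row totals
    col_totals = [sum(row[j] for row in table) for j in range(t)]
    ss_treat = sum(c * c for c in col_totals) / b - cf             # from column totals
    ss_error = ss_total - ss_blocks - ss_treat
    df_blocks, df_treat = b - 1, t - 1
    df_error = df_blocks * df_treat
    ms_error = ss_error / df_error
    return {
        "SS_total": ss_total,
        "SS_blocks": ss_blocks,
        "SS_treat": ss_treat,
        "SS_error": ss_error,
        "df_error": df_error,
        "F_blocks": (ss_blocks / df_blocks) / ms_error,
        "F_treat": (ss_treat / df_treat) / ms_error,
    }


grades = [                # rows: students 1-5 (blocks); columns: Math, English, French, Biology
    [68, 57, 73, 61],
    [83, 94, 91, 86],
    [72, 81, 63, 59],
    [55, 73, 77, 66],
    [92, 68, 75, 87],
]
result = rcbd_anova(grades)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```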

7. Consider a randomized block experiment summarized in the following ANOVA table.
Source of Degrees of Sum of Mean Value of the Test
Variation freedom Squares Square Statistic
Treatments 𝐹𝐴 = 3.78
Blocks 3 595.82 𝐹𝐵 =
Error 15 14.99
Total
(a) Fill in all the missing values.
(b) Find the total sample size of the above experiment.
(c) At the 0.025 level of significance, is there evidence of a difference among the six group means?
(d) At the 0.025 level of significance, is there evidence of an effect due to blocks?

8. An experiment is conducted in which 4 treatments are to be compared in 5 blocks. The data are
given below.
Block
Treatment 1 2 3 4 5
1 12.8 10.6 11.7 10.7 11
2 11.7 14.2 11.8 9.9 13.8
3 11.5 14.7 13.6 10.7 15.9
4 12.6 16.5 15.4 9.6 17.1
(a) Compute estimates of the treatment and block variance components.
(b) Assuming a random effects model, test the null hypothesis, at 0.01 level of significance, that
there is no difference between treatment means.
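Under a random effects model for a randomized block design, the expected mean squares are E(MS_Tr) = σ² + bσ_τ² and E(MS_Bl) = σ² + aσ_β², with a = 4 treatments and b = 5 blocks, so the variance components can be estimated by equating observed and expected mean squares. The sketch below assumes this standard ANOVA (method-of-moments) estimator; the treatment test uses 𝐹 = MS_Tr / MS_E on (3, 12) degrees of freedom. Variable names are illustrative.

```python
# ANOVA estimates of variance components in a randomized block design, assuming
# both treatments and blocks are random effects:
#   E(MS_treat) = sigma^2 + b * sigma_tau^2,  E(MS_block) = sigma^2 + a * sigma_beta^2
data = [  # rows = treatments (a = 4), columns = blocks (b = 5)
    [12.8, 10.6, 11.7, 10.7, 11.0],
    [11.7, 14.2, 11.8, 9.9, 13.8],
    [11.5, 14.7, 13.6, 10.7, 15.9],
    [12.6, 16.5, 15.4, 9.6, 17.1],
]
a, b = len(data), len(data[0])
cf = sum(map(sum, data)) ** 2 / (a * b)                               # G^2 / N
ss_total = sum(x * x for row in data for x in row) - cf
ss_treat = sum(sum(row) ** 2 for row in data) / b - cf               # row (treatment) totals
ss_block = sum(sum(row[j] for row in data) ** 2 for j in range(b)) / a - cf
ss_error = ss_total - ss_treat - ss_block

ms_treat = ss_treat / (a - 1)
ms_block = ss_block / (b - 1)
ms_error = ss_error / ((a - 1) * (b - 1))

sigma2_tau = (ms_treat - ms_error) / b    # treatment variance component
sigma2_beta = (ms_block - ms_error) / a   # block variance component
f_treat = ms_treat / ms_error             # tests H0: sigma_tau^2 = 0

print(f"sigma^2 = {ms_error:.4f}, sigma_tau^2 = {sigma2_tau:.4f}, sigma_beta^2 = {sigma2_beta:.4f}")
print(f"F = {f_treat:.3f} on ({a - 1}, {(a - 1) * (b - 1)}) df")
```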

Two-way ANOVA
9. A gardening company is testing new ways to improve plant growth. Twelve plants are randomly
selected and exposed to a combination of two factors: a “Grow-light” in two different strengths and
two plant food supplements (A and B) with different mineral content. After a number of days, the
growth of each plant is measured and the results (in inches) are recorded in the appropriate cells of the table below.

Grow-light 1 Grow-light 2
Plant food A 9.2, 9.4, 8.9 8.5, 9.2, 8.9
Plant food B 7.1, 7.2, 8.5 5.5, 5.8, 7.6

A two-way ANOVA summary table
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | Value of the Test Statistic
Plant food | | 12.8133 | |
Light | | 1.9200 | |
Food×Light | | 0.7500 | |
Error | | 4.1733 | |
Total | | 19.6567 | |

Can an interaction between the two factors be concluded? Is there a difference in mean growth with
respect to light? With respect to plant food? Use 𝛼 = 0.05.
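For a two-way design with r replicates per cell, the interaction sum of squares is the between-cells sum of squares minus the two main-effect sums of squares, and the error sum of squares is the within-cell variation. The sketch below assumes a balanced layout (the same r in every cell); `two_way_anova` is an illustrative helper, with factor A as the row factor (plant food) and factor B as the column factor (grow-light).

```python
# Balanced two-way ANOVA with r replicates per cell.
def two_way_anova(cells):
    """cells[i][j] holds the replicates for level i of factor A and level j of factor B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    cell_totals = [[sum(c) for c in row] for row in cells]
    grand = sum(sum(row) for row in cell_totals)
    cf = grand ** 2 / (a * b * r)                                  # correction factor G^2 / N

    ss_total = sum(x * x for row in cells for c in row for x in c) - cf
    ss_a = sum(sum(row) ** 2 for row in cell_totals) / (b * r) - cf
    ss_b = sum(sum(row[j] for row in cell_totals) ** 2 for j in range(b)) / (a * r) - cf
    ss_cells = sum(t * t for row in cell_totals for t in row) / r - cf
    ss = {
        "A": ss_a,
        "B": ss_b,
        "AB": ss_cells - ss_a - ss_b,                              # interaction
        "Error": ss_total - ss_cells,                              # within-cell variation
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
    ms_error = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f


growth = [                                   # A = plant food (rows), B = grow-light (columns)
    [[9.2, 9.4, 8.9], [8.5, 9.2, 8.9]],      # plant food A
    [[7.1, 7.2, 8.5], [5.5, 5.8, 7.6]],      # plant food B
]
ss, df, f = two_way_anova(growth)
for source in ("A", "B", "AB", "Error"):
    f_text = f"{f[source]:.3f}" if source in f else ""
    print(f"{source:5s} df={df[source]}  SS={ss[source]:.4f}  F={f_text}")
```

Each 𝐹 is then compared with the tabulated 𝐹0.05(1, 8); the same function also applies to the balanced data in Questions 11 and 12.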

10. Two types of outdoor paint, enamel and latex, were tested to see how long (in months) each lasted
before it began to crack, flake and peel. They were tested in four geographical locations in Malaysia
to study the effects of climate on the paint. The data are given as follows:
Geographical Location
Type of paint North East South West
Enamel 60, 53, 58, 62, 57 54, 63, 62, 71, 76 80, 82, 62, 88, 71 62, 76, 55, 48, 61
Latex 36, 41, 54, 65, 53 62, 61, 77, 53, 64 68, 72, 71, 82, 86 63, 65, 72, 71, 63
Below is the Minitab output for the data above.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Paint 1 14.40 14.40 (v) 0.655
Location (i) (ii) 847.90 11.96 0.000
Paint*Location 3 282.60 (iv) 1.33 0.282
Error 32 (iii) 70.91
Total 39 5109.90

(a) Determine the experimental design used in this analysis.
(b) What are the assumptions required for the ANOVA analysis?
(c) How many observations were involved in this study?
(d) Find the values of (i), (ii), (iii), (iv) and (v) in the ANOVA table above.
(e) Do the data provide sufficient evidence to indicate an interaction effect between the
type of paint and the geographical location? Test using 𝛼 = 0.05.
(f) Based on the result in (e), should a further test be conducted on the main effects?
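Missing entries in a Minitab two-way table can be recovered from MS = SS/df, 𝐹 = MS/MS_error and the fact that the degrees of freedom and sums of squares add up to the Total row; part (c) follows from total df = 𝑁 − 1. The sketch below applies these relations to Question 10 with illustrative variable names. Because Minitab rounds its printout, the error SS found by subtraction (2269.2) differs slightly from 32 × 70.91.

```python
# Completing the Question 10 two-way ANOVA table from the printed entries.
# Relations used: df and SS add up to the Total row, MS = SS / df,
# F = MS / MS_error, and df_interaction = df_paint * df_location.
df_paint, df_inter, df_error, df_total = 1, 3, 32, 39
ss_paint, ss_inter, ss_total = 14.40, 282.60, 5109.90
ms_location, ms_error = 847.90, 70.91

df_location = df_total - df_paint - df_inter - df_error    # (i)
ss_location = ms_location * df_location                    # (ii)  SS = MS * df
ss_error = ss_total - ss_paint - ss_location - ss_inter    # (iii) by subtraction
ms_inter = ss_inter / df_inter                             # (iv)  MS = SS / df
f_paint = (ss_paint / df_paint) / ms_error                 # (v)   F = MS / MS_error

n_obs = df_total + 1                                       # total df = N - 1
cells = (df_paint + 1) * (df_location + 1)                 # 2 paints x 4 locations
replicates = n_obs // cells                                # observations per cell

print(f"(i) {df_location}  (ii) {ss_location:.2f}  (iii) {ss_error:.2f}  (iv) {ms_inter:.2f}  (v) {f_paint:.3f}")
print(f"interaction df consistent: {df_paint * df_location == df_inter}; N = {n_obs}; {replicates} per cell")
```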

11. A researcher conducted a study to compare two different diets and two different exercise programs. Three
randomly selected subjects were assigned to each group for one month. The values indicate the
amount of weight each lost.
Diet
Exercise Program A B
I 5, 6, 4 8, 10, 15
II 3, 4, 8 12, 16, 11

The Minitab output for the above data is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Exercise Program 1 3.000 3.000 0.43 0.531
Diets 1 147.000 147.000 21.00 0.002
Exercise Program*Diets 1 3.000 3.000 0.43 0.531
Error 8 56.000 7.000
Total 11 209.000

(a) What procedure should be used to analyse the data above?
(b) What are the names of the two variables?
(c) How many levels does each variable contain?
(d) What are the hypotheses for the study?
(e) Which of the tests in (d) is (are) significant?
(f) Based on (e), how can you conclude this study?

12. A pigment laboratory is testing dry additives and solution additives to see their effect on the
durability rating (a number from 1 to 10) of a finished paint product. The paint to be tested is divided
into four equal quantities, and a different combination of the two additives is added to one-fourth of
each quantity. After a prescribed number of hours, the durability rating is obtained for each of the
16 samples, and the results are recorded below.

Dry additive I Dry additive II
Solution additive A 9, 8, 5, 6 4, 5, 8, 9
Solution additive B 7, 7, 6, 8 10, 8, 6, 7
The Minitab output for the above data is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Solution 1 1.5625 1.56250 0.50 0.494
Dry 1 0.0625 0.06250 0.02 0.890
Solution*Dry 1 1.5625 1.56250 0.50 0.494
Error 12 37.7500 3.14583
Total 15 40.9375

Can an interaction be concluded between the dry and solution additives? Is there a difference in
mean durability rating with respect to dry additive used? With respect to solution additive? Use
𝛼 = 0.05.

Miscellaneous Questions
13. The following are the numbers of mistakes made in five successive weeks by four technicians
working for a medical laboratory.
Technician I 13 16 12 14 15
Technician II 14 16 11 19 15
Technician III 13 18 16 14 18
Technician IV 18 10 14 15 12
(a) Find the treatment sum of squares.
(b) Show that the total sum of squares is 114.55.
(c) Test at the 0.05 level of significance whether there is a difference among the four population
means. (Do not reject 𝐻0 )
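For part (b), the computational formula for the total sum of squares can be checked by hand. Here 𝐺 = 70 + 75 + 79 + 69 = 293 is the grand total of the 𝑁 = 20 counts, and 990, 1159, 1269 and 989 are the sums of squared counts for Technicians I to IV:

```latex
\begin{aligned}
SST &= \sum x^2 - \frac{G^2}{N} \\
    &= (990 + 1159 + 1269 + 989) - \frac{293^2}{20} \\
    &= 4407 - 4292.45 = 114.55
\end{aligned}
```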

14. An electrical engineer is investigating a plasma etching process used in semiconductor
manufacturing. It is of interest to study the effects of two factors, the C2F6 gas flow rate and the
power applied to the cathode. The response is the etch rate. The data and Minitab output are given
as follows.
Power Supplied
C2F6 Flow Rate 1 2 3
1 288, 360 488, 465 670, 720
2 385, 411 482, 521 692, 724
3 488, 462 595, 612 761, 801
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
C2F6 Flow Rate 2 46343 23172 (iv) 0.000
Power Supplied 2 (ii) 165002 212.16 0.000
C2F6 Flow Rate*Power Supplied (i) 3162 791 1.02 0.448
Error 9 6999 (iii)
Total 17 386508

(a) Find the values indicated by (i), (ii), (iii), (iv) of the ANOVA table.
(b) Test at 5% significance level whether there is an interaction effect between C2F6 flow rate
and power supplied.
(c) Based on the result in (b), should a further test be conducted on the main effects? Explain.

15. In a study on the mercury concentration in periphyton at different locations, a researcher measured
the total mercury concentration in periphyton total solids at six different stations on five different days. The
recorded data set is given in the table below.
Station
Day I II III IV V VI
1 0.45 3.24 1.33 2.04 3.93 5.93
2 0.10 0.10 0.99 4.31 9.92 6.49
3 0.25 0.25 1.65 3.13 7.39 4.43
4 0.09 0.06 0.92 3.66 7.88 6.24
5 0.15 0.16 2.17 3.50 8.82 5.39
The Minitab output below gives the analysis of variance for the data set.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Station 5 (ii) 43.5867 27.40 0.000
Day (i) 2.974 0.7435 (iv) 0.759
Error 20 31.811 (iii)
Total 29 252.718

(a) Identify the response, treatment and block variables.
(b) Determine the values of (i), (ii), (iii) and (iv) in the Minitab output.
(c) Show that the total sum of squares is 252.718.
(d) At 𝛼 = 0.01, determine whether the mean mercury content is significantly different
between stations?
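The same bookkeeping completes a randomized block table. The sketch below fills in the Question 15 output from the printed values, taking the rows exactly as Minitab labels them (Station and Day), and then checks the result against the printed total and the printed station 𝐹 of 27.40. The variable names are illustrative.

```python
# Completing the Question 15 randomized block ANOVA table from the printed entries.
# Relations used: df and SS add up to the Total row, MS = SS / df, F = MS / MS_error.
df_station, df_error, df_total = 5, 20, 29
ms_station = 43.5867
ss_day, ms_day = 2.974, 0.7435
ss_error, ss_total = 31.811, 252.718

df_day = df_total - df_station - df_error   # (i)   number of days minus 1
ss_station = ms_station * df_station        # (ii)  SS = MS * df
ms_error = ss_error / df_error              # (iii) MS = SS / df
f_day = ms_day / ms_error                   # (iv)  F = MS / MS_error

# Consistency checks against the rest of the printout.
f_station = ms_station / ms_error           # should be close to the printed 27.40
ss_sum = ss_station + ss_day + ss_error     # should be close to the printed 252.718

print(f"(i) {df_day}  (ii) {ss_station:.4f}  (iii) {ms_error:.5f}  (iv) {f_day:.3f}")
print(f"F_station = {f_station:.2f}, SS sum = {ss_sum:.4f}, error df check: {df_station * df_day == df_error}")
```

The same three relations complete the randomized block tables in Questions 18, 21, 24 and 26.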

16. A drug company tested three formulations of a pain relief medicine for migraine headache sufferers.
For the experiment, 27 volunteers were selected and randomly assigned to the three drug formulations,
9 to each. The subjects were instructed to take the drug during their next migraine headache
episode and to report their pain on a scale of 1 to 10 (10 being most pain).
Drug A 4 5 4 3 2 4 3 4 4
Drug B 6 8 4 5 4 6 5 8 6
Drug C 6 7 6 6 7 5 6 5 5
The Minitab output below shows the analysis of variance for the data above.
Analysis of Variance
Source DF Adj SS Adj MS F-Value
Factor 2 28.22 14.111 11.91
Error 24 28.44 1.185
Total 26 56.67
(a) Show that the total sum of squares is 56.67.
(b) Test at 𝛼 = 0.01 whether the mean pain levels of all migraine headache sufferers are not all the
same for the three formulations. (Reject 𝐻0 )

17. Suppose you want to determine whether the brand of laundry detergent used and the temperature
affect the amount of dirt removed from your laundry. To wash your laundry, you bought 𝒎
different brands of detergent and chose 𝒏 levels of temperature. The Minitab output of the analysis
of the amount of dirt removed from your laundry is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Type of Detergent 1 20.17 20.167 (iv) 0.006
Temperature 2 200.33 (iii) 48.73 0.000
Type of Detergent*Temperature (i) 16.33 8.167 3.97 0.037
Error 18 (ii) 2.056
Total 23 273.83

(a) Identify the type of experimental design used in this study.
(b) Find the missing values of (i), (ii), (iii) and (iv) of the output.
(c) Test at the 2% significance level whether the data provide sufficient evidence to indicate an
interaction effect between the brands of detergent and levels of temperature.
(Do not reject 𝐻0 )
(d) Based on your result in (c), should a further test be conducted on the main effects? (Yes)

18. Place and Abramson (2008) put diamondback rattlesnakes (Crotalus atrox) in a "rattlebox," a box
with a lid that would slide open and shut every 5 minutes. At first, the snake would rattle its tail
each time the box opened. After a while, the snake would become habituated to the box opening
and stop rattling its tail. They counted the number of box openings until a snake stopped rattling;
fewer box openings means the snake was more quickly habituated. They repeated this experiment
on each snake on four successive days, which is the block variable. The table below gives the data for the 6
snakes that became habituated on each day:
Snake ID Day 1 Day 2 Day 3 Day 4
S1 85 58 15 57
S2 107 51 30 12
S3 61 60 68 36
S4 22 41 63 21
S5 40 45 28 10
S6 65 27 3 16
Below is the Minitab output for the data above.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Snake 5 (II) 608.4 (IV) 0.338
Day 3 4878 (III) 3.32 0.049
Error (I) 7346 489.7
Total 23 15266
(a) Show that the total sum of squares is 15266.
(b) Find the values of (I), (II), (III) and (IV) in the Minitab output above.
(c) Test at the 5% significance level whether the mean number of box openings before the rattlesnakes
stopped rattling differs across the four successive days. (Reject 𝐻0 )

19. Twenty young cattle are assigned at random among four experimental groups. Each group is fed a
different diet. The data below are the cattle’s weight in kg after being raised on these diets for 10
months.
Feed 1 60.8 57.1 65.0 58.7 61.8
Feed 2 68.3 67.7 74.0 66.3 69.9
Feed 3 102.6 102.2 100.5 97.5 98.9
Feed 4 87.9 84.7 83.2 85.8 90.3
The ANOVA table from the Minitab output for the data above is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value
Factor 3 4703.2 1567.73 206.72
Error 16 121.3 7.58
Total 19 4824.5

(a) Show that the total sum of squares for the data above is 4824.5.
(b) Test at the 5% significance level whether the mean weights of the cattle are the same for all
four diets in this experiment.

20. A study is conducted to examine if the mean corn yield (measured in metric tons) depends on the
levels of manure and nitrogen-based fertilizer. The Minitab output below shows the Analysis of
Variance of the study done.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Fert 1 17.298 17.298 5.77 0.029
Manure 1 15.842 15.842 (iii) 0.035
Fert*Manure 1 3.872 3.872 1.29 0.273
Error 16 (i) (ii)
Total 19 85.012

(a) Find the missing values of (i), (ii) and (iii) in the ANOVA table above.
(b) Determine the experimental design used in this study. State the sample size for each group.
(c) Test at 5% significance level whether there is an interaction effect between the manure and
nitrogen-based fertilizer on the mean corn yield.
(d) Based on the result in (c), should a further test be conducted on the main effects? Explain.

21. A company that plans to introduce a new type of herbicide wants to determine which dosage
produces the best crop yield for cotton. Four fields are available for testing with each field having
fairly uniform characteristics although there are some differences between the fields. Each field is
divided into 6 equal sized plots with dosages of 5, 10, 15, 20, 25 and 30 units of herbicide assigned
to the plots at random. The yields are as shown in the Table below.
Dosage of herbicide
Field 5 10 15 20 25 30
1 9.7 19.2 8.3 6.3 6.7 5.8
2 13.0 20.2 6.4 6.0 5.2 3.6
3 15.6 17.7 6.1 4.0 3.1 4.1
4 7.1 11.8 1.7 0.5 0.8 0.2
The ANOVA table for the above data produced by Minitab is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Dosage 5 (II) 122.329 48.39 0.000
Field (I) 127.12 42.374 (IV) 0.000
Error 15 37.92 (III)
Total 23 776.69
Assuming that all the assumptions for ANOVA are met, answer the questions below.
(a) Identify the response and block variables.
(b) Use an appropriate formula to show that the total sum of squares is 776.69.
(c) Find the missing values in the ANOVA table above.
(d) Is there sufficient evidence at 1% significance level to conclude a significant difference in the
true mean yield of cotton due to the use of different dosages of herbicide?

22. A food company wished to test four different package designs for a new breakfast cereal. Twenty
stores, with approximately equal sales volume, were selected as the experimental units. Each store
was randomly assigned one of the package designs, with each package design assigned to five
stores. A fire occurred in one store during the study period, so this store had to be dropped from the
study. The data below are the number of cases of breakfast cereal sold by each store.
Package design 1 11 17 16 14 15
Package design 2 12 10 15 19 11
Package design 3 23 20 18 17
Package design 4 27 33 22 26 28

The ANOVA table from the Minitab output for the data above is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Package 3 588.2 196.07 18.59 0.000
Error 15 158.2 10.55
Total 18 746.4

(a) Show that the total sum of squares for the data above is 746.4.
(b) Test at the 5% significance level whether the mean number of cases of breakfast cereal sold is
the same for the four different package designs.
(Answer: reject 𝐻0 )

23. A study was conducted to examine the effects of height of the shelf display (bottom, middle, top)
and width of the shelf display (regular, wide) on the sales of the bread at a bakery. The Minitab
output below shows the Analysis of Variance of the study done.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Display height 2 1544.00 772.00 (ii) 0.000
Display width 1 12.00 12.00 1.16 0.323
Display height*Display width 2 24.00 12.00 (iii) 0.375
Error 6 (i) 10.33
Total 11 1642.00

(a) Find the missing values of (i), (ii) and (iii) in the ANOVA table above.
(b) What is the experimental design used in this analysis? State the sample size for each group.
(c) Test at 5% significance level whether there is an interaction effect between the display height
and display width on the mean sales of bread.
(d) Based on the result in (c), should a further test be conducted before concluding on the main
effects? Explain.

24. An experiment was designed to study the performance of four different detergents in cleaning
clothes. The cleanness readings (a higher reading means cleaner clothes) given in the table below were obtained with
specially designed equipment for three different types of common stains.
Detergent 1 Detergent 2 Detergent 3 Detergent 4
Stain 1 45 47 48 42
Stain 2 43 46 50 37
Stain 3 51 52 55 49

The ANOVA table for the above data produced by Minitab is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Detergent 3 110.92 (III) 11.78 0.006
Stain 2 (II) 67.583 (IV) 0.002
Error (I) 18.83 3.139
Total 11 264.92

Assuming that all the assumptions for ANOVA are met, answer the questions below.
(a) Identify the response and block variables.
(b) Show that the total sum of squares is 264.92.
(c) Find the missing values, (I), (II), (III) and (IV) in the ANOVA table above.
(d) Test at the 2% significance level whether there is a significant difference in the mean cleanness
reading of clothes due to the use of different detergents.

25. Four machines are set up to produce alloy spacers for use in the assembly of microlight aircraft.
The spacers are supposed to be identical, but the four machines produce spacers of the following
varying lengths (in mm).
Machine A 46 54 48 46 56
Machine B 56 55 56 60 53
Machine C 55 51 50 51 53
Machine D 49 53 57 60 51
The ANOVA table from the Minitab output for the data above is as follows.
Analysis of Variance
Source DF Adj SS Adj MS F-Value
Factor 3 100.0 33.33 2.54
Error 16 210.0 13.12
Total 19 310.0
(a) Show that the total sum of squares for the data above is 310.0.
(b) At the 5% significance level, test the null hypothesis that the mean lengths produced by the four
machines are the same.

26. An experiment was performed to determine the effect of four different chemicals on the strength
of a fabric. These chemicals are used on five fabric samples as part of the permanent press finishing
process. The Minitab output below shows the Analysis of Variance for the study done.
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Chemical Type 3 18.0440 6.01467 (iii) 0.000
Fabric Sample 4 6.6930 (ii) 21.11 0.000
Error 12 (i) 0.07925
Total 19 25.6880
(a) Find the missing values of (i), (ii) and (iii) in the ANOVA table above.
(b) Determine the experimental design used in this analysis. Identify the response, treatment
and block variables in this experiment.
(c) Test at 1% significance level whether there is a difference in the mean fabric strengths due
to use of different chemicals.
