ANOVA Ajay 29 11 21 - Copy


ANOVA

(Analysis of Variance)

Thanks to unknown
sources for sharing slides

ANOVA Ajay Gupta NIT Jalandhar 1


ANOVA: Introduction
• Many studies involve comparisons between more
than two groups of subjects.
• If the outcome is categorical (count) data, a chi-square
test on a table larger than 2 × 2 can be used to
compare proportions between groups.
• If the outcome is numerical, ANOVA can be used to
compare the means between groups.
• ANOVA is an abbreviation for the full name of the
method: Analysis of Variance

Invented by R.A. Fisher in the 1920s

Why ANOVA instead of multiple t-tests?
• If you are comparing means between more than two
groups, why not just do several two sample t-tests to
compare the mean from one group with the mean
from each of the other groups?

Before ANOVA, this was the only option available to
compare means between more than two groups.
• The problem with the multiple t-tests approach is
that as the number of groups increases, the number
of two sample t-tests also increases.
• As the number of tests increases the probability of
making a Type I error also increases.
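The growth in Type I error can be illustrated numerically. The sketch below assumes, for simplicity, that the pairwise t-tests are independent; they are not exactly independent because they share groups, so the figures are only indicative. The function name `familywise_error_rate` is made up for this illustration.

```python
from math import comb


def familywise_error_rate(k_groups, alpha=0.05):
    """Return (number of pairwise t-tests, P(at least one Type I error)).

    Assumption: the tests are treated as independent, which is only
    approximately true for pairwise t-tests on shared groups.
    """
    n_tests = comb(k_groups, 2)  # every pair of groups gets one t-test
    return n_tests, 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    n_tests, rate = familywise_error_rate(k)
    print(f"{k} groups: {n_tests} t-tests, overall Type I error about {rate:.3f}")
```

With 3, 4 and 5 groups the number of tests is 3, 6 and 10, and the approximate overall error rate rises from about 0.14 to about 0.40.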

ANOVA: a single test for multiple comparisons
• The advantage of using ANOVA over multiple t-tests
is that ANOVA identifies, with a single test, whether
any of the group means differ significantly.
• If the significance level is set at 0.05, the
probability of a Type I error for ANOVA = 0.05
regardless of the number of groups being
compared.
• If the ANOVA F-test is significant, further
comparisons can be done to determine which
group means differ.

ANOVA Hypotheses
The null hypothesis for ANOVA is that the means for all groups are equal:
H0: μ1 = μ2 = μ3 = ... = μk

The alternative hypothesis for ANOVA is that at least two of the means are not equal.

The test statistic for ANOVA is the ANOVA F-statistic.


Analysis of Variance



ANOVA: F-statistic
If variability between groups is large relative to
the variability within groups, the F-statistic will be
large.
If variability between groups is similar or
smaller than variability within groups, the F-
statistic will be small.
If the F-statistic is large enough, the null
hypothesis that all means are equal is rejected.



Illustration of small F-statistic

[Figure: within-group spread shown against the Group 1, Group 2 and Group 3 means and the overall mean]

Source: Introduction to the Practice of Statistics, Moore and McCabe

Illustration of large F-statistic

[Figure: within-group spread shown against the Group 1, Group 2 and Group 3 means and the overall mean]

Source: Introduction to the Practice of Statistics, Moore and McCabe


Total variation in two parts
The difference of each observation from
the overall mean can be divided into two
parts:
– The difference between the observation and the
group mean
– The difference between the group mean and the
overall mean
ANOVA makes use of this partitioned
variability.

Partitioned Variability
• Consider the following VERY small data set

Group      Observations      Group mean
Group 1    12  15  18        15
Group 2    10  13  16        13
Group 3     8   9  10  13    10

– Overall mean = 12.4
• Subject 3 in group 1 has value 18
– Difference between subject 3 and overall mean: 18 – 12.4 = 5.6
– Difference between subject 3 and group mean: 18 – 15 = 3
– Difference between group mean and overall mean: 15 – 12.4 = 2.6
– The two parts add up to the total difference: 3 + 2.6 = 5.6

SST = SSW(E) + SSB
This partitioned relationship also holds for the sums of
squared differences:
– The variability between each observation and the
overall (or grand) mean is measured by the ‘sum of
squares total’ (SST).
– The variability within groups is measured by the ‘sum
of squares within’ (SSW, or SSE for ‘error’).
– The variability between groups is measured by the
‘sum of squares between’ (SSB).
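The identity can be checked on the very small data set used earlier. The following is a minimal sketch in plain Python; the variable names are chosen for illustration only.

```python
groups = {
    "Group 1": [12, 15, 18],
    "Group 2": [10, 13, 16],
    "Group 3": [8, 9, 10, 13],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)  # overall mean
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Sum of squares total: every observation against the overall mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Sum of squares within: every observation against its own group mean
ssw = sum((x - group_means[name]) ** 2
          for name, values in groups.items() for x in values)

# Sum of squares between: each group mean against the overall mean,
# counted once per observation in that group
ssb = sum(len(values) * (group_means[name] - grand_mean) ** 2
          for name, values in groups.items())

print(f"SST = {sst:.1f}, SSW = {ssw:.1f}, SSB = {ssb:.1f}")
print(f"SSW + SSB = {ssw + ssb:.1f}")
```

For this data set SST = 94.4, SSW = 50.0 and SSB = 44.4.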



Mean Square Within and Mean Square Between
The mean squares are measures of the
average variability and are calculated from
the sum of squares divided by the degrees of
freedom.
MSW = SSW / (N-j)
– MSW has N-j degrees of freedom, where N = total
number of observations and j = number of
groups
MSB = SSB / (j-1)
– MSB has j-1 degrees of freedom, where j = number
of groups
The ANOVA F-statistic is the ratio F = MSB / MSW.

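As a rough illustration, the mean squares and the F-statistic for the very small data set from the Partitioned Variability slide can be computed as follows. The helper name `mean_squares` is hypothetical and the code is only a sketch of the formulas above.

```python
def mean_squares(groups):
    """Return (MSB, MSW, F) for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)  # N
    j = len(groups)                        # number of groups
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (j - 1)        # j-1 degrees of freedom
    msw = ssw / (n_total - j)  # N-j degrees of freedom
    return msb, msw, msb / msw


msb, msw, f_stat = mean_squares([[12, 15, 18], [10, 13, 16], [8, 9, 10, 13]])
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_stat:.2f}")
```

Here j = 3 and N = 10, so MSB has 2 and MSW has 7 degrees of freedom.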
ANOVA Assumptions
• The observations are from a random sample and are
independent of each other.
• The observations are normally distributed within each
group.
As a rule of thumb, ANOVA is still appropriate when this assumption
is not met, provided the sample size in each group is at least 30.
• The variances are approximately equal between groups.
As a rule of thumb, this assumption is considered met if the ratio
largest SD / smallest SD < 2.
• It is not required to have equal sample sizes in all
groups.
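The SD-ratio rule of thumb is easy to check by hand or in code. A minimal sketch using Python's `statistics.stdev` (sample standard deviation) on the very small data set from the Partitioned Variability slide; the function name `sd_ratio` is illustrative.

```python
from statistics import stdev


def sd_ratio(groups):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)


ratio = sd_ratio([[12, 15, 18], [10, 13, 16], [8, 9, 10, 13]])
print(f"largest SD / smallest SD = {ratio:.2f}")
print("equal-variance rule of thumb met" if ratio < 2 else "variances may differ too much")
```

With groups this small the check is only indicative, since sample SDs from 3 or 4 observations are very noisy.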

Sampling Distribution of ANOVA F-statistic
When the null hypothesis is true, the sampling distribution
of the ANOVA F-statistic is the F-distribution
– takes only non-negative values, since the F-statistic is a ratio
of two mean squares and neither can be negative
– indexed by two degrees of freedom
• Numerator df = number of groups minus 1 (j-1)
• Denominator df = total sample size minus number of
groups (N-j)
– The shape of the F-distribution varies depending on the
two degrees of freedom


Adjustments for Multiple Comparisons
• When multiple comparisons are being done it is
customary to adjust the significance level of each
individual comparison so that the overall
experiment significance level remains at 0.05.
• For an ANOVA with 3 groups, there are 3
pairwise t-tests (1 vs 2, 1 vs 3 and 2 vs 3).
• A conservative adjustment (the Bonferroni
adjustment) is to divide 0.05 by 3, so that alpha for
each test = 0.017. Each comparison is then
significant only if its p-value < 0.017.
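The Bonferroni arithmetic generalises to any number of groups. A small sketch, with the hypothetical helper name `bonferroni_alpha`:

```python
from math import comb


def bonferroni_alpha(k_groups, overall_alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha)."""
    n_comparisons = comb(k_groups, 2)
    return n_comparisons, overall_alpha / n_comparisons


n_comparisons, alpha_each = bonferroni_alpha(3)
print(n_comparisons, round(alpha_each, 3))  # 3 0.017
```

With 4 groups there are 6 pairwise comparisons, so each would be tested at 0.05 / 6, roughly 0.008.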

Multiple Comparison Tests
• Bonferroni procedure
• Duncan’s multiple range test
• Dunnett’s multiple comparison test
• Newman-Keuls test
• Scheffé’s test
• Tukey’s test
• Holm t-test

Other ANOVA Procedures
• One-way ANOVA is Analysis of Variance for one
factor
• With more than one factor, the analysis becomes a two-,
three- or four-way ANOVA
• A continuous variable (covariate) can be added to the model;
this is Analysis of Covariance (ANCOVA)
• Repeated Measures ANOVA can handle repeated
measurements on the same observation unit
(subject)

Steps for one way ANOVA

• It is customary to summarize the results of ANOVA
in the form of the ANOVA table given below
(K = number of samples, N = total number of observations)

Source of Variation   DoF   Sum of squares   Mean Sum of squares   F-Ratio
Between Samples       K-1   SSC              MSC = SSC/(K-1)       MSC/MSE
Within Samples        N-K   SSE              MSE = SSE/(N-K)
Total                 N-1   SST

• The following yields (in kg) were obtained from three varieties of
wheat, WL 711, WG 357 and 1562, sown in 14 plots

Variety   Yield (kg)
WL 711    12  13  14  13
WG 357    10   9  10   9   9
1562      13  14  13  12  14

Is there any significant difference in yield among the three varieties?


H0: Mean yield of the three varieties is the same
H1: Mean yields of at least two varieties differ significantly

Each yield x is listed with its square x²:

         WL 711         WG 357          1562
         x     x²       x     x²       x     x²
        12    144      10    100      13    169
        13    169       9     81      14    196
        14    196      10    100      13    169
        13    169       9     81      12    144
                        9     81      14    196
Total   52    678      47    443      66    874

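The sums of squares for the wheat data can be obtained from the totals and squares with the usual correction-factor shortcut. The sketch below assumes that this is the method the squares table is set up for; the names `ssc`, `sse` and `sst` follow the ANOVA table notation, and everything else is illustrative.

```python
yields = {
    "WL 711": [12, 13, 14, 13],
    "WG 357": [10, 9, 10, 9, 9],
    "1562": [13, 14, 13, 12, 14],
}

n_total = sum(len(v) for v in yields.values())      # N = 14 plots
k = len(yields)                                     # K = 3 varieties
grand_total = sum(sum(v) for v in yields.values())  # G
correction_factor = grand_total ** 2 / n_total      # G^2 / N

# Total SS: sum of all squared observations minus the correction factor
sst = sum(x * x for v in yields.values() for x in v) - correction_factor
# Between-varieties SS: sum of (variety total)^2 / (variety size) minus CF
ssc = sum(sum(v) ** 2 / len(v) for v in yields.values()) - correction_factor
# Within-varieties SS by subtraction
sse = sst - ssc

msc = ssc / (k - 1)
mse = sse / (n_total - k)
f_cal = msc / mse

print(f"SSC = {ssc:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"MSC = {msc:.2f}, MSE = {mse:.3f}, Fcal = {f_cal:.2f}")
```

The three sums of squares agree with the 44.36, 6 and 50.36 reported in the ANOVA table for this example.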


ANOVA Table

Source of Variation   DoF   Sum of squares   Mean Sum of squares   F-Ratio
Between Varieties      2    44.36            22.18                 40.66
Within Varieties      11     6.00             0.55
Total                 13    50.36


F tab for (2,11) DoF at 5% level of significance is 3.98
Since Fcal = 40.66 > F tab, H0 is rejected

Conclusion: Mean yields of at least two varieties differ significantly


Steps for two way ANOVA

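• The ANOVA table is given below
(r = number of levels of factor A, c = number of levels of factor B)

Source of Variation   DoF          Sum of squares   Mean Sum of squares       F-Ratio
Factor A              r-1          SSA              MSA = SSA/(r-1)           MSA/MSE
Factor B              c-1          SSB              MSB = SSB/(c-1)           MSB/MSE
Error                 (r-1)(c-1)   SSE              MSE = SSE/((r-1)(c-1))
Total                 rc-1         SST
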
• A company appoints four salesmen P, Q, R and S and observes their
sales in three seasons, i.e. summer, winter and monsoon. The
sales figures are given in the following table

Season     Salesmen
           P    Q    R    S
Summer    13   16   16   14
Winter    17   16   17   16
Monsoon   13   14   15   15

Carry out the analysis of variance and comment on the results obtained.


Null hypotheses
H0: sales do not vary significantly among the seasons
H0: sales do not vary significantly among the salesmen
Alternative hypotheses
H1: there is a significant difference in sales between at least two
seasons
H1: there is a significant difference in sales between at least two
salesmen
To make the calculations easier, let us subtract 15 from each
observation (subtracting a constant does not change any of the
sums of squares)

Season     Salesmen                Total
           P    Q    R    S
Summer    -2    1    1   -1        -1
Winter     2    1    2    1         6
Monsoon   -2   -1    0    0        -3
Total     -2    1    3    0     G = 2

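The two-way sums of squares for the sales data can be sketched as below. This is an illustration using the correction-factor shortcut on the original (uncoded) figures, which gives the same sums of squares as the coded values; the variable names are assumptions made for this example.

```python
sales = {
    "Summer": [13, 16, 16, 14],   # salesmen P, Q, R, S
    "Winter": [17, 16, 17, 16],
    "Monsoon": [13, 14, 15, 15],
}

rows = list(sales.values())
r, c = len(rows), len(rows[0])    # 3 seasons, 4 salesmen
grand_total = sum(sum(row) for row in rows)
correction_factor = grand_total ** 2 / (r * c)

sst = sum(x * x for row in rows for x in row) - correction_factor
# Seasons: squared row totals, each based on c observations
ss_seasons = sum(sum(row) ** 2 for row in rows) / c - correction_factor
# Salesmen: squared column totals, each based on r observations
ss_salesmen = sum(sum(col) ** 2 for col in zip(*rows)) / r - correction_factor
sse = sst - ss_seasons - ss_salesmen

mse = sse / ((r - 1) * (c - 1))
f_seasons = (ss_seasons / (r - 1)) / mse
f_salesmen = (ss_salesmen / (c - 1)) / mse

print(f"SS seasons = {ss_seasons:.2f}, SS salesmen = {ss_salesmen:.2f}, "
      f"SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"F seasons = {f_seasons:.2f}, F salesmen = {f_salesmen:.2f}")
```

The resulting F-ratios are then compared with the tabulated values for (2,6) and (3,6) degrees of freedom.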


Source of Variation   DoF   Sum of squares   Mean Sum of squares   F-Ratio
Between Seasons        2    11.17            5.58                  5.43
Between Salesmen       3     4.33            1.44                  1.41
Error                  6     6.17            1.03
Total                 11    21.67

F tab for (2,6) and (3,6) DoF at 5% level of significance are
5.14 and 4.76 respectively
For effect of seasons
Since Fcal = 5.43 > F tab = 5.14, H0 is rejected
Conclusion: sales vary significantly with seasons

For effect of salesmen
Since Fcal = 1.41 < F tab = 4.76, H0 cannot be rejected
Conclusion: sales do not vary significantly among the
salesmen

The ANOVA table for a two way ANOVA with interaction (p levels of
factor A, q levels of factor B, r observations per cell) is given below

Source of Variation   DoF          Sum of squares   Mean Sum of squares          F-Ratio
Factor A              p-1          SSA              MSA = SSA/(p-1)              MSA/MSE
Factor B              q-1          SSB              MSB = SSB/(q-1)              MSB/MSE
Interaction AB        (p-1)(q-1)   SSAB             MSAB = SSAB/((p-1)(q-1))     MSAB/MSE
Error                 pq(r-1)      SSE              MSE = SSE/(pq(r-1))
Total                 pqr-1        SST

An experiment was conducted to study the effects of 4 different
varieties of Moong (M1, M2, M3, M4) and three different spacings
(S1, S2, S3), and also to see whether the varieties behave differently
at different spacings. Data on yield from three plots for each
variety-spacing combination are given below. Carry out ANOVA on
this data.

Varieties   Spacings
            S1          S2          S3
M1          35,40,38    40,42,45    42,39,46
M2          42,48,45    40,47,42    52,41,50
M3          32,40,36    40,45,50    51,42,43
M4          51,50,48    54,52,55    50,56,48



Null hypotheses
• H0: Different varieties produce the same yield.
• H0: Yield does not vary significantly with spacing.
• H0: The interaction effect between varieties and spacings
is not significant.
Alternative hypotheses
• H1: Different varieties produce different yields.
• H1: Yield is significantly affected by spacing.
• H1: The interaction effect between varieties and spacings
is significant.


To make calculations easier, subtract 45 from all
observations

Varieties   Spacings
            S1           S2          S3
M1          -10,-5,-7    -5,-3,0     -3,-6,1
M2          -3,3,0       -5,2,-3     7,-4,5
M3          -13,-5,-9    -5,0,5      6,-3,-2
M4          6,5,3        9,7,10      5,11,3


By taking the sum of all observations in each
cell, we get

Varieties       Spacings              Row totals
                S1     S2     S3
M1             -22     -8     -8      -38
M2               0     -6      8        2
M3             -27      0      1      -26
M4              14     26     19       59
Column totals  -35     12     20      G = -3

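The sums of squares for this two-factor layout with three plots per cell can be sketched as follows, again with the correction-factor shortcut on the uncoded yields. One assumption is made loudly here: the M2/S2 plots are entered as 40, 47, 42, because only those values reproduce the cell total of -6 and the sums of squares in the ANOVA table. All variable names are illustrative.

```python
# cells[i][j] holds the three plot yields for variety i at spacing j
cells = [
    [[35, 40, 38], [40, 42, 45], [42, 39, 46]],  # M1 at S1, S2, S3
    [[42, 48, 45], [40, 47, 42], [52, 41, 50]],  # M2 (S2 assumed 40, 47, 42)
    [[32, 40, 36], [40, 45, 50], [51, 42, 43]],  # M3
    [[51, 50, 48], [54, 52, 55], [50, 56, 48]],  # M4
]

p, q, r = len(cells), len(cells[0]), len(cells[0][0])  # 4 varieties, 3 spacings, 3 plots
observations = [x for row in cells for cell in row for x in cell]
correction_factor = sum(observations) ** 2 / (p * q * r)

row_totals = [sum(sum(cell) for cell in row) for row in cells]
col_totals = [sum(sum(row[j]) for row in cells) for j in range(q)]
cell_totals = [sum(cell) for row in cells for cell in row]

sst = sum(x * x for x in observations) - correction_factor
ss_varieties = sum(t * t for t in row_totals) / (q * r) - correction_factor
ss_spacings = sum(t * t for t in col_totals) / (p * r) - correction_factor
ss_cells = sum(t * t for t in cell_totals) / r - correction_factor
ss_interaction = ss_cells - ss_varieties - ss_spacings
sse = sst - ss_cells

mse = sse / (p * q * (r - 1))
f_varieties = ss_varieties / (p - 1) / mse
f_spacings = ss_spacings / (q - 1) / mse
f_interaction = ss_interaction / ((p - 1) * (q - 1)) / mse

print(f"SS: varieties {ss_varieties:.2f}, spacings {ss_spacings:.2f}, "
      f"interaction {ss_interaction:.2f}, error {sse:.2f}, total {sst:.2f}")
print(f"F: varieties {f_varieties:.2f}, spacings {f_spacings:.2f}, "
      f"interaction {f_interaction:.2f}")
```

The interaction sum of squares is obtained by subtracting the two main-effect sums of squares from the between-cells sum of squares, and the error sum of squares is what remains of the total.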


The ANOVA table is given below

Source of Variation   DoF   Sum of squares   Mean Sum of squares   F-Ratio
Varieties              3     622.53          207.51                14.76
Spacings               2     147.17           73.58                 5.24
Interaction AB         6     121.72           20.29                 1.44
Error                 24     337.33           14.06
Total                 35    1228.75



Effect of Varieties
F tab for (3,24) DoF at 5% level of significance is 3.01
Since Fcal = 14.76 > F tab, H0 is rejected
Conclusion: Different varieties significantly affect the yield

Effect of spacings
F tab for (2,24) DoF at 5% level of significance is 3.40
Since Fcal = 5.24 > F tab, H0 is rejected
Conclusion: Different spacings significantly affect the yield



Interaction effect
F tab for (6,24) DoF at 5% level of significance is 2.51
Since Fcal = 1.44 < F tab, H0 cannot be rejected

Conclusion: There is no significant interaction effect
between varieties and spacings, i.e. the yield from a
particular variety does not change sharply at some
specific spacings as compared to other varieties.