Readings for Lecture 5

Topic | Chapters | Pages
Theory of F Distribution | Primarily from Instructor |
F Test for Differences in Two Variances | 11 | 458-467
One-Way ANOVA, Completely Randomized Design, Fixed Effects Model: Assumptions, Model and Rationale; Calculations | Instructor, 12 | 476-488

Lecture 5, Notes
F Distribution: The F distribution is the sampling distribution of the test statistic used in all ANOVA (Analysis of Variance) techniques. It is also used in all multiple regression techniques. An F statistic is the ratio of two independent chi-square variables, each divided by its degrees of freedom.
In the example below, I have begun with two chi-square variables, each divided by its degrees of freedom. Notice that in the second step, especially if the sample sizes were huge, one would expect to get a value reasonably close to one.

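$$F \;=\; \underbrace{\frac{\left[\dfrac{(n_1-1)\,s_1^2}{\sigma_1^2}\right] \Big/ (n_1-1)}{\left[\dfrac{(n_2-1)\,s_2^2}{\sigma_2^2}\right] \Big/ (n_2-1)}}_{(1)} \;=\; \underbrace{\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}}_{(2)} \;=\; \underbrace{\frac{s_1^2}{s_2^2}}_{(3)}$$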

The third step is only true if $\sigma_1^2$ were equal to $\sigma_2^2$. If it were true, then again, one would expect to get a value close to one, especially if the sample sizes were huge. If, however, $\sigma_1^2$ were not equal to $\sigma_2^2$, then the value one would get by calculating $s_1^2/s_2^2$ would of course deviate from one. Thus a null hypothesis is associated with this test statistic, and it is
Ho: $\sigma_1^2 = \sigma_2^2$ versus
Ha: $\sigma_1^2 \neq \sigma_2^2$
This looks like a two-tailed test procedure, but because we always put the larger sample variance on top (call it $s_1^2$), we only employ the upper tail of the F distribution.
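Example:

Men | $(X_i - \bar{X}_m)^2$ | Women | $(X_i - \bar{X}_w)^2$
60 | 100 | 80 | 0
70 | 0 | 78 | 4
80 | 100 | 82 | 4
 | | 80 | 0
Sum | 200 (Df = 2) | Sum | 8 (Df = 3)

$s_m^2 = \frac{200}{2} = 100$, $\quad s_w^2 = \frac{8}{3}$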
Notice, students, the sample size is very small, and even though the difference in the sample variances is huge, we may not be able to reject the null hypothesis of equality (because of the small sample size, and thus large variability).
For this specific two-tailed F test, the larger variance is assigned to the numerator.
$$F_{2,3} = \frac{100}{8/3} = \frac{300}{8} = 37.5, \qquad \frac{\alpha}{2} = \frac{0.02}{2} = 0.01$$

In this case, in order to find the critical F value, we actually look up the F value under $\alpha/2$. This is due to the fact that we always put the larger value of the sample variance on the top. With $\alpha = 0.02$, we look up in the F tables under 0.01, with 2 degrees of freedom for the numerator and 3 degrees of freedom for the denominator. The critical value would be 30.816. Since 37.5 exceeds 30.816, we would reject the null hypothesis of equality.

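[Figure: upper tail of the F distribution, marking the critical value $F_{0.01} = 30.82$ and the observed value 37.5]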
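The arithmetic of this example can be checked with a short sketch. It assumes the example data read as men = 60, 70, 80 and women = 80, 78, 82, 80 (a reading that reproduces the sums of squares 200 and 8), and it relies on a closed-form upper-tail probability for the F distribution that holds only when the numerator has 2 degrees of freedom. The function name is illustrative and not part of the lecture.

```python
from statistics import variance


def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 numerator degrees of freedom.

    This closed form is valid only for df1 = 2; other cases need tables
    or a statistics library.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


# Assumed reading of the example data (reproduces SS = 200 and SS = 8).
men = [60, 70, 80]
women = [80, 78, 82, 80]

s2_men = variance(men)      # sample variance, divisor n - 1
s2_women = variance(women)

# The larger sample variance goes in the numerator.
f_stat = s2_men / s2_women
df_num, df_den = len(men) - 1, len(women) - 1

alpha = 0.02
p_upper = f_upper_tail_df1_2(f_stat, df_den)
reject = p_upper < alpha / 2   # two-tailed test: compare with alpha / 2

print(f"F({df_num},{df_den}) = {f_stat:.1f}")
print(f"upper-tail probability = {p_upper:.4f}")
print(f"reject Ho at alpha = {alpha}: {reject}")
```

Comparing the upper-tail probability with $\alpha/2$ is equivalent to comparing the observed F with the tabled critical value.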

Now, dear students:
If you have a test of hypothesis as follows:
Ho: $\sigma_1^2 \le \sigma_2^2$
Ha: $\sigma_1^2 > \sigma_2^2$
You would again place the value of $s_1^2$ on top, and then use the upper-tail value of alpha, not alpha divided by two.
And again, if you have a test of hypothesis as follows:
Ho: $\sigma_1^2 \ge \sigma_2^2$
Ha: $\sigma_1^2 < \sigma_2^2$
You would need to change it to
Ho: $\sigma_2^2 \le \sigma_1^2$
Ha: $\sigma_2^2 > \sigma_1^2$
and then proceed, with $s_2^2$ on top, by using the upper-tail value of alpha, not alpha/2.
ANOVA (Analysis of Variance): We will begin our study of ANOVAs by examining a Fixed Effects Model, completely randomized design.
The null hypothesis associated with this ANOVA technique might appear as follows:
Ho: $\mu_1 = \mu_2 = \mu_3$
Ha: The null hypothesis is not true
The amazing thing is that we test the above hypothesis by calculating the ratio of two variances.
$$F_{test} = \frac{s_n^2}{s_d^2}$$
In a fixed effects model, we are only interested in the specific treatments. For example, imagine that we are interested in selling a product, and we are thinking of packaging the product in one of three manners. We are only thinking of these 3 packaging designs. That is, these packaging designs are not a sample of three packaging designs from a world of many possible packaging designs.
When we use the phrase completely randomized design, that means that the samples are assigned to the treatments at random. Imagine, for example, that we have 30 stores that are willing to participate in the above study. We could accomplish a completely randomized design by randomly assigning ten of these stores to sell the product using packaging design one, assigning the next ten randomly picked stores packaging design two, and leaving the last ten with packaging design three. In a one-way ANOVA we may have anywhere from two treatments to r treatments.
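The random assignment described above can be sketched in a few lines. This is only an illustration: the store labels, the function name, and the seed are hypothetical, and any proper shuffling procedure would serve equally well.

```python
import random


def assign_completely_randomized(units, n_treatments, seed=None):
    """Randomly split `units` into `n_treatments` groups of equal size."""
    if len(units) % n_treatments != 0:
        raise ValueError("units must divide evenly among the treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)          # random order decides the assignment
    size = len(shuffled) // n_treatments
    return {t + 1: shuffled[t * size:(t + 1) * size]
            for t in range(n_treatments)}


# Hypothetical labels for the 30 participating stores.
stores = [f"store_{i:02d}" for i in range(1, 31)]
assignment = assign_completely_randomized(stores, n_treatments=3, seed=5)

for design, group in assignment.items():
    print(f"packaging design {design}: {len(group)} stores, e.g. {group[:3]}")
```

Every store ends up in exactly one packaging design, and which design it gets depends only on the shuffle.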

Examples of hypotheses that could be tested.
Testing that:
1) Teaching techniques are equal.
2) Marketing techniques are equal.
3) Manufacturing techniques are equal.
In an ANOVA technique, Ho may not be true, but in order to test that hypothesis about equality of means, the following needs to be true:
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_e^2$ (the common variance)
That is, in order for us to test the hypothesis of equality of means, there must exist equal population variances within each treatment group.
The model is $X_{ij} = \mu + \tau_j + e_{ij}$ and $E(X_{ij}) = \mu + \tau_j$. That is to say, the average value in each treatment group is equal to the grand mean plus the treatment effect. Certainly there is variance around this mean. The population variance (not necessarily the sample variance) around one treatment mean is equal to the population variance around the other treatment means. This variance term is labeled $\sigma_e^2$. An important assumption of the ANOVA technique is that these error terms are normally distributed with mean equal to zero and variance equal to $\sigma_e^2$. Notice, there is no subscript j associated with the treatment group; that is, the amount of the variance is not an artifact of which treatment group we are dealing with. All treatment groups have the same population variance.
You should know, however, that this test procedure is very robust in terms of this assumption of equality of variances. That is, even if this assumption is moderately violated, the ANOVA procedure still yields appropriate results.
Again, the F test that is used to test the hypothesis of equality of population means is the following:
$$F_{test} = \frac{s_n^2}{s_d^2}$$

The estimate of the variance in the denominator is a valid estimate of the common variance ($\sigma_e^2$) regardless of whether or not the hypothesis of equality of means is true. This estimate of the common variance turns out to be nothing more than the average of the sample variances if the sample sizes are equal. If they are not equal, the estimate of the common variance used in the denominator is the weighted average of the sample variances (where the sample variances are weighted by their degrees of freedom).

But the variance in the numerator is an estimate of the common variance only if Ho is true ($\mu_1 = \mu_2 = \mu_3$). How do we get the value of $s^2$ in the numerator? Recall that
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$
Therefore
$$\sigma^2 = n\,\sigma_{\bar{x}}^2$$
And thus:
$$s^2 = n\,s_{\bar{x}}^2$$
where n is the sample size used to obtain the treatment means. This assumes that all treatment population means are the same.

Let us refer to the example below. We have three sample means, all obtained using a sample size of five. We obtain the sample variance of these means; we then multiply by five, and we have an estimate of the common variance.
But it is only an estimate of the common variance if these means came from populations with the same mean. Notice the argument below. In the first picture the sample means (using a specific sample size) came from one population, and the only reason the means differ is the random nature of the variable itself, which causes the sample means to vary.
But in the second picture, the reason that the sample means vary is two-fold. The sample means come from different worlds, and there is variability in each of those worlds.

[Figure, first picture: "Same Population", with $\mu_1$, $\mu_2$, $\mu_3$]
[Figure, second picture: "Different Populations", with sample means $\bar{x}_1$, $\bar{x}_2$]

If the second picture is the reality, then the above estimate of the common variance will not be valid because the sample means come from different populations.
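A small simulation makes the two pictures concrete. It is a sketch under stated assumptions: normal errors, a hypothetical common standard deviation of 10 (so the common variance is 100), three treatments, and a sample size of five. The function name, seed, and number of replications are illustrative choices, not taken from the class spreadsheet.

```python
import random
from statistics import mean, variance


def average_numerator_estimate(pop_means, sigma, n, reps, seed=0):
    """Average of n * (sample variance of the treatment means) over many
    simulated experiments with normal errors of standard deviation sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample_means = [
            mean([rng.gauss(mu, sigma) for _ in range(n)])
            for mu in pop_means
        ]
        total += n * variance(sample_means)
    return total / reps


sigma = 10.0  # hypothetical common standard deviation, so sigma^2 = 100

# First picture: all sample means come from one population.
same = average_numerator_estimate([30, 30, 30], sigma, n=5, reps=5000)

# Second picture: the sample means come from different populations.
different = average_numerator_estimate([20, 30, 40], sigma, n=5, reps=5000)

print(f"same population:       average n * s_xbar^2 = {same:.1f}")
print(f"different populations: average n * s_xbar^2 = {different:.1f}")
```

When the population means are equal, the average should land near the common variance of 100. When they differ, it is inflated by the spread of the population means, which is why the numerator is a valid estimate only under Ho.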

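Example 1

Treatment (1) | | Treatment (2) | | Treatment (3) |
$X_1$ | $(X_1 - \bar{X}_1)^2$ | $X_2$ | $(X_2 - \bar{X}_2)^2$ | $X_3$ | $(X_3 - \bar{X}_3)^2$
20 | 0 | 30 | 0 | 40 | 0
10 | 100 | 20 | 100 | 30 | 100
30 | 100 | 40 | 100 | 50 | 100
15 | 25 | 25 | 25 | 35 | 25
25 | 25 | 35 | 25 | 45 | 25
Total | 250 | | 250 | | 250

$S_1^2 = \frac{250}{4} = 62.5$, $\quad S_2^2 = \frac{250}{4} = 62.5$, $\quad S_3^2 = \frac{250}{4} = 62.5$

For the treatment means 20, 30, and 40: $S_{\bar{X}}^2 = \frac{200}{2} = 100$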

First: Find the estimate of the common variance which is used in the denominator of the F ratio:

$$S_P^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{(n_1-1) + (n_2-1) + (n_3-1)} = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{n_1 + n_2 + n_3 - 3}$$

= (4*62.5 + 4*62.5 + 4*62.5)/12 = 62.5
= (62.5 + 62.5 + 62.5)/3 = 62.5, since the sample sizes are equal.

Note: when we turn to the F table, we have to have the degrees of freedom used to estimate the common variance in the denominator. In general that will be N - r, where N is the total sample size and r is the number of treatments. In this case the degrees of freedom are $n_1 + n_2 + n_3 - 3 = 12$.

Now we have to obtain the estimate of the common variance that is used in the numerator. Remember, this estimate of the common variance is only a valid estimate of $\sigma_e^2$ if the hypothesis of equality of means is true.
$$s^2 = n\,s_{\bar{x}}^2 = 5\,[(20-30)^2 + (30-30)^2 + (40-30)^2]/(3-1) = 5 \times (200/2) = 500$$
$$F_{test} = \frac{S_n^2}{S_d^2} = \frac{500}{62.5} = 8$$

For a ratio that is supposed to be close to one, this value of 8 is quite large. How large would too large be? We need to look up the $F_{0.05}$ value associated with two degrees of freedom for the numerator and 12 degrees of freedom for the denominator.
$F(\alpha = 0.05,\ 2,\ 12) = 3.89$
Since 8 exceeds 3.89, the observed F falls in the rejection region (its p-value is below 0.05), and we reject Ho of equality of means.
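The two variance estimates and their ratio can be reproduced with a few lines of code. This sketch assumes the Example 1 data read as five observations per treatment, as listed in the code; the p-value line uses a closed form that is valid only because the numerator has 2 degrees of freedom, and is not a general-purpose F routine.

```python
from statistics import mean, variance

# Example 1 data as read from the table, one list per treatment (n = 5 each).
groups = [
    [20, 10, 30, 15, 25],
    [30, 20, 40, 25, 35],
    [40, 30, 50, 35, 45],
]
n = len(groups[0])   # common sample size
r = len(groups)      # number of treatments

# Denominator: sample variances pooled with degrees-of-freedom weights.
dfs = [len(g) - 1 for g in groups]
s2_denominator = sum(df * variance(g) for df, g in zip(dfs, groups)) / sum(dfs)

# Numerator: n times the sample variance of the treatment means
# (this shortcut assumes equal sample sizes).
s2_numerator = n * variance([mean(g) for g in groups])

f_test = s2_numerator / s2_denominator
df_num, df_den = r - 1, sum(dfs)

# Upper-tail probability; this closed form holds only when df_num == 2.
p_value = (1.0 + 2.0 * f_test / df_den) ** (-df_den / 2.0)

print(f"denominator estimate = {s2_denominator}")
print(f"numerator estimate   = {s2_numerator}")
print(f"F({df_num},{df_den}) = {f_test}, p-value = {p_value:.4f}")
```

A p-value below 0.05 leads to the same decision as finding that the observed F exceeds the tabled critical value.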


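More background
Model: $X_{ij} = \mu + \tau_j + e_{ij}$; estimated form: $X_{ij} = \hat{\mu} + \hat{\tau}_j + \hat{e}_{ij}$
$X_{ij}$: observation
$\hat{\mu}$: estimate of the population (grand) mean
$\hat{\tau}_j$: estimate of the treatment effect; the sum of the treatment effects equals 0
$\hat{e}_{ij}$: estimate of an error term

$\hat{\mu} = \bar{\bar{X}} = 30$
$\hat{\tau}_1 = -10, \quad \hat{\tau}_2 = 0, \quad \hat{\tau}_3 = 10$
$$X_{ij} = \hat{\mu} + \hat{\tau}_j + \hat{e}_{ij} = \hat{\mu} + (\bar{X}_j - \hat{\mu}) + (X_{ij} - \bar{X}_j)$$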
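The estimated pieces of the model can be computed directly from the Example 1 data. The sketch below assumes the data read as five observations per treatment and uses hypothetical variable names; it splits every observation into the estimated grand mean, the estimated treatment effect, and the estimated error term.

```python
from statistics import mean

# Example 1 data as read from the table, keyed by treatment number.
groups = {
    1: [20, 10, 30, 15, 25],
    2: [30, 20, 40, 25, 35],
    3: [40, 30, 50, 35, 45],
}

# Estimate of the grand mean: the mean of all observations.
mu_hat = mean([x for g in groups.values() for x in g])

# Estimated treatment effect: treatment mean minus grand mean.
tau_hat = {j: mean(g) - mu_hat for j, g in groups.items()}

# Estimated error term: observation minus its own treatment mean.
e_hat = {j: [x - mu_hat - tau_hat[j] for x in g] for j, g in groups.items()}

print("mu_hat  =", mu_hat)
print("tau_hat =", tau_hat)
for j, errors in e_hat.items():
    print(f"treatment {j}: estimated errors {errors}, sum = {sum(errors)}")
```

The estimated errors sum to zero within each treatment because they are deviations from that treatment's own sample mean. With equal sample sizes, the estimated treatment effects also sum to zero.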

To calculate the grand sum of squares, we subtract the grand mean from each observation, square, and then sum the squares. Note the following.

(1) $X_1$ | $(X_1 - \bar{\bar{X}})^2$ | (2) $X_2$ | $(X_2 - \bar{\bar{X}})^2$ | (3) $X_3$ | $(X_3 - \bar{\bar{X}})^2$
20 | 100 | 30 | 0 | 40 | 100
10 | 400 | 20 | 100 | 30 | 0
30 | 0 | 40 | 100 | 50 | 400
15 | 225 | 25 | 25 | 35 | 25
25 | 25 | 35 | 25 | 45 | 225
Total | 750 | | 250 | | 750

Treatment mean $\bar{X}_j$ | $(\bar{X}_j - \bar{\bar{X}})^2$
20 | 100
30 | 0
40 | 100
Total | 200

The sum of squares total is equal to 1,750 = 750 + 250 + 750.
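Let us break the total sum of squares, SST, down into its two parts, SSW (sum of squares within) and SSA (sum of squares among means):
$$SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j} (X_{ij} - \bar{\bar{X}})^2 = \sum_{j=1}^{r}\sum_{i=1}^{n_j} \left[(X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{\bar{X}})\right]^2$$
$$= \sum_{j=1}^{r}\sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + \sum_{j=1}^{r}\sum_{i=1}^{n_j} (\bar{X}_j - \bar{\bar{X}})^2 = SSW + SSA$$
Note, the cross product is equal to zero.
r = 3 (# of treatment groups)
$$SSA = \sum_{j=1}^{r}\sum_{i=1}^{n_j} (\bar{X}_j - \bar{\bar{X}})^2 = \sum_{j=1}^{r} n_j\,(\bar{X}_j - \bar{\bar{X}})^2 = n \sum_{j=1}^{r} (\bar{X}_j - \bar{\bar{X}})^2$$
where the last equality holds when the sample sizes are equal.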
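The claim that the cross product equals zero can be spelled out in one line. The derivation below uses the same symbols as the decomposition, with $\bar{\bar{X}}$ for the grand mean and $\bar{X}_j$ for the mean of treatment j:

```latex
\sum_{j=1}^{r}\sum_{i=1}^{n_j} 2\,(X_{ij}-\bar{X}_j)(\bar{X}_j-\bar{\bar{X}})
  = 2\sum_{j=1}^{r} (\bar{X}_j-\bar{\bar{X}}) \sum_{i=1}^{n_j} (X_{ij}-\bar{X}_j)
  = 2\sum_{j=1}^{r} (\bar{X}_j-\bar{\bar{X}}) \cdot 0
  = 0
```

The inner sum vanishes because, within any treatment group, the deviations of the observations from their own sample mean always add to zero.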

Now remember, SSA is equal to $n \sum_j (\bar{X}_j - \bar{\bar{X}})^2$ given equal sample sizes, and turns out to be $5\,[(20-30)^2 + (30-30)^2 + (40-30)^2]$, which is 1,000, and SSW is $\sum_j \sum_i (X_{ij} - \bar{X}_j)^2 = 250 + 250 + 250 = 750$, which are displayed in the table below.

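(1) $X_1$ | $(X_1 - \bar{X}_1)^2$ | (2) $X_2$ | $(X_2 - \bar{X}_2)^2$ | (3) $X_3$ | $(X_3 - \bar{X}_3)^2$
20 | 0 | 30 | 0 | 40 | 0
10 | 100 | 20 | 100 | 30 | 100
30 | 100 | 40 | 100 | 50 | 100
15 | 25 | 25 | 25 | 35 | 25
25 | 25 | 35 | 25 | 45 | 25
Total | 250 | | 250 | | 250

Treatment mean $\bar{X}_j$ | $(\bar{X}_j - \bar{\bar{X}})^2$
20 | 100
30 | 0
40 | 100
Total | 200

SSW = 250 + 250 + 250 = 750
SSA = 5 * 200 = 1000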

SST (total sum of squares) = SSW + SSA = 750 + 5(200) = 750 + 1,000 = 1,750
So, students, we have arrived at SST two ways: first by calculating it directly, and then by calculating the sum of SSW plus SSA.
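Both routes to SST can be verified numerically. The sketch below assumes the Example 1 data read as five observations per treatment; it weights each squared deviation of a treatment mean by that group's own sample size, so the same lines would also apply to unequal sample sizes.

```python
from statistics import mean

# Example 1 data as read from the table, one list per treatment.
groups = [
    [20, 10, 30, 15, 25],
    [30, 20, 40, 25, 35],
    [40, 30, 50, 35, 45],
]

grand_mean = mean([x for g in groups for x in g])
group_means = [mean(g) for g in groups]

# Route 1: deviations of every observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

# Route 2: within-group part plus among-group part.
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

print(f"SST = {sst}, SSW = {ssw}, SSA = {ssa}, SSW + SSA = {ssw + ssa}")
```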

Now, we have two independent estimates of the common variance: MSW and MSA.

$$MSW = \frac{SSW}{N-r} = S_P^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2 + (n_3-1)S_3^2}{(n_1-1) + (n_2-1) + (n_3-1)}$$
where r = 3 and $N = n_1 + n_2 + n_3$.

$$MSA = \frac{SSA}{r-1} = \frac{n \sum_{j} (\bar{X}_j - \bar{\bar{X}})^2}{r-1} = n\,S_{\bar{X}}^2$$
which estimates the common variance if the sample means are from one population. The expected value of MSA equals $\sigma_e^2$ only if there are no treatment effects.

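Source of variation | Sum of squares | Degrees of freedom | Mean square | EMS
Among groups | SSA | r - 1 | $MSA = \frac{SSA}{r-1}$ | $E(MSA) = \sigma_e^2 + \frac{1}{r-1}\sum_{j=1}^{r} n_j \tau_j^2$
Within groups | SSW | N - r | $MSW = \frac{SSW}{N-r}$ | $E(MSW) = \sigma_e^2$
Total | SST | N - 1 | |

$$F = \frac{MSA}{MSW}$$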

MSA: Mean Square Due to Treatment. It estimates $\sigma_e^2$ only if Ho of equal means is true.
MSW: Mean Square Due to Error. It estimates $\sigma_e^2$ regardless of whether Ho of equal means is true.

Homework: Lecture 5
A. Dear Students: This is going to be a challenge that will sharpen both your Excel skills and your knowledge of one-way ANOVA. I want you to duplicate the simulation spreadsheet that I have up on Blackboard associated with this one-way ANOVA lecture. The spreadsheet that I want you to duplicate is the one labeled Fixed RN (RN stands for Random Numbers). Note that the formulas in the spreadsheet are not displayed. They were displayed in class in the spreadsheet I used.
Please form three teams of equal size (you have done that) and construct this simulation. I want you to be able to demonstrate its proper working in class.
Construct the simulated ANOVA table as a team, carefully following the procedure demonstrated in class. Please email the Excel sheet to me once you are done, with your names on it.
B. Prove that $\hat{\sigma}_e^2 = s_p^2 = MSW = \frac{SSW}{N-r}$. Note, it is not necessary to assume that $n_1 = n_2 = \dots = n_r$. That which you will obtain is a weighted average of the sample variances, each sample variance weighted by its degrees of freedom.
However, when one does assume that $n_1 = n_2 = \dots = n_r$, then $s_p^2 = (s_1^2 + s_2^2 + \dots + s_r^2)/r$. Students, use your notes.
1. P. 467 #19, 22, 23, 24
2. P. 468 #25, 26, 29
Students, for problems 3 through 6 do not use the Tukey-Kramer procedure. Your work on each of these problems is done after the conclusion concerning the F test.
3. P. 493 #1
4. P. 494 #2, 3, 4, 5
5. P. 495 #9
6. P. 496 #11
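7.
(1) $X_1$ | $(X_1 - \bar{X}_1)^2$ | (2) $X_2$ | $(X_2 - \bar{X}_2)^2$ | (3) $X_3$ | $(X_3 - \bar{X}_3)^2$
20 | 0 | 30 | 0 | 40 | 0
10 | 100 | 20 | 100 | 30 | 100
30 | 100 | 40 | 100 | 50 | 100
15 | 25 | 25 | 25 | 35 | 25
25 | 25 | 35 | 25 | 45 | 25
Total | 250 | | 250 | | 250

Treatment mean $\bar{X}_j$ | $(\bar{X}_j - \bar{\bar{X}})^2$
20 | 100
30 | 0
40 | 100
Total | 200

$MSW = \frac{SSW}{N-r} = ?$ $\qquad MSA = \frac{SSA}{r-1} = ?$ $\qquad F = ?$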

8. Complete the ANOVA table using the following data set.

20 | 30 | 40
10 | 20 | 30
30 | 40 | 50
15 | 25 | 35
25 | 35 | 45
18 | 33 | 44
22 | 27 | 36

9) Students: I now want you to study the ANOVA simulation that I demonstrated in class. Study both the spreadsheet labeled Fixed RN and the one labeled RNs vary.
a) I want you to increase the treatment effects, determine what will happen to the F value and its corresponding p-value associated with the overall hypothesis, and be able to explain why.
b) I want you to increase the standard deviation of the error term. Now again, tell me what will happen to the F value and the corresponding p-value associated with the overall hypothesis, and be able to explain why.
c) Now, tell me why the sum of the real error terms is not zero, and the sum of the estimates of the error terms is equal to zero. Tell me what the relationship is between the estimate of the treatment effect and the mean of the error terms.
