Repeated Measure ANOVA - Between and Within Subjects
1. Introduction
2. Structural model, SS partitioning, and the ANOVA table
3. Assumptions
4. Analysis of omnibus ANOVA effects
5. Contrasts
6. Simple effects tests
7. Final thoughts
8. An example: Changes in bone calcium over time
Appendix: Extra examples
9. Example #1: 3 (between) * 3 (within) design
10. Example #2: 2 (between) * 4 (within) design
2007 A. Karpinski
Multi-Factor Repeated Measures ANOVA Designs with One Between-Subjects and One Within-Subjects Factor

1. Introduction
Let's start with a simple example: one between-subjects factor and one within-subjects factor. Imagine that our previous data on the effects of a test prep class did not come from pre- and post-test scores from the same participants, but instead were scores from two different groups of people. In this case, we randomly assigned people either to take the test prep class or not to take it. Every participant was then scored on all three subscales of the test.
No Training                          Training
Subscale1  Subscale2  Subscale3      Subscale1  Subscale2  Subscale3
42         42         48             48         60         78
42         48         48             36         48         60
48         48         54             66         78         78
42         54         54             48         78         90
54         66         54             48         66         72
36         42         36             36         48         54
48         48         60             54         72         84
48         60         66             54         72         90
54         60         54             48         72         78
48         42         54             54         66         78
46.2       51.0       52.8           49.2       66.0       76.2     (means)
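Cell and marginal means. The rows (training condition) form the between-subjects comparison; the columns (subscale of the test) form the within-subjects comparison.

Training   Subscale 1   Subscale 2   Subscale 3   Marginal
No         46.2         51.0         52.8         50.0
Yes        49.2         66.0         76.2         63.8
Marginal   47.7         58.5         64.5

In the notation used below, a cell mean is $\bar{X}_{.kj}$ (subscale $k$, training condition $j$; e.g., $\bar{X}_{.21} = 51.0$), a training marginal mean is $\bar{X}_{..j}$ (e.g., $\bar{X}_{..1} = 50.0$), and a subscale marginal mean is $\bar{X}_{.k.}$ (e.g., $\bar{X}_{.1.} = 47.7$).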
2. Structural model, SS partitioning, and the ANOVA table
To understand the structural model of a between- and within-subjects design, let's start with the model for a design containing two within-subjects factors and see what changes. We will treat A as the between-subjects factor and B as the within-subjects factor.
$$Y_{ijk} = \mu + \alpha_j + \beta_k + \pi_i + (\alpha\beta)_{jk} + (\alpha\pi)_{ij} + (\beta\pi)_{ik} + (\alpha\beta\pi)_{ijk}$$
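o $\mu$ is the grand mean of all scores.
o $\alpha_j$ is the effect of the between-subjects factor (training marginal means: No Training = 50.0, Training = 63.8).
o $\beta_k$ is the effect of the within-subjects factor, and $(\alpha\beta)_{jk}$ is the A × B interaction; both are estimated from the cell means:

            No Training   Training
Subscale1   46.2          49.2
Subscale2   51.0          66.0
Subscale3   52.8          76.2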
o These fixed-effect parameters are computed exactly as they are for an all-between-subjects or all-within-subjects design.
o $\pi_i$ is the subject effect, but now we have a subject effect for each level of A, the between-subjects factor. We refer to this as the subject effect within (each level of) A, $\pi_{i(j)}$. The subject means (averaged across the three subscales) are:

No Training: 44  46  50  50  58  38  52  58  56  48
Training:    62  48  74  72  62  46  70  72  66  66
Note that $\pi_{i(j)}$ measures how much the factor A effect varies by subject. We can think of the $\pi_{i(j)}$ terms as a measure of the error in the $\alpha_j$ effect. For this to work, we need the subject variability to be the same in both groups: $\sigma^2_{\pi(1)} = \sigma^2_{\pi(2)}$.
o $(\alpha\pi)_{ij}$ is the interaction between subject and A. But subjects are not crossed with factor A: there are different subjects in each level of A. Thus, we cannot estimate this term. When a factor (Subjects) is not crossed with each level of another factor (A), but instead appears within only a single level of that factor, we say that subjects are nested within A.
o $(\beta\pi)_{ik}$ is the interaction between subject and B. We can estimate this term, because each subject receives every level of the within-subjects factor. Because we have two groups of subjects, we will have two estimates of $(\beta\pi)_{ik}$, one for each level of A. We refer to this as the subject by B effect within (each level of) A, $(\beta\pi)_{ik(j)}$.
Note that $(\beta\pi)_{ik(j)}$ measures how much the factor B effect varies by subject. It is also a measure of the extent to which the A × B interaction varies by subject. Thus, we can think of the $(\beta\pi)_{ik(j)}$ terms as a measure of the error in the $\beta_k$ and $(\alpha\beta)_{jk}$ effects. Again, for this to work nicely, we need

$$\sigma^2_{\beta\pi(1)} = \sigma^2_{\beta\pi(2)}$$
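o Finally, the $(\alpha\beta\pi)_{ijk}$ effect is the three-way interaction between subject, A, and B. But as we already noted, subjects are not crossed with factor A; subjects are nested within A. Thus, we cannot estimate this term either: because each subject is observed at only one level of A, we cannot see how a subject's B effect would change across levels of A.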
So we are left with the following model for a design with a between-subjects factor (A) and a within-subjects factor (B):

$$Y_{ijk} = \mu + \alpha_j + \beta_k + \pi_{i(j)} + (\alpha\beta)_{jk} + (\beta\pi)_{ik(j)}$$
Let's look at the expected mean squares for each of the terms in the model to see whether our intuitions about the error terms are correct:
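$$
\begin{array}{lll}
\text{Source} & E(MS) & F \\
\hline
\text{Factor } A & \sigma^2_\varepsilon + b\,\sigma^2_\pi + \dfrac{nb\sum_j \alpha_j^2}{a-1} & \dfrac{MS_A}{MS_{S/A}} \\
\text{Subjects}/A \text{ (between error)} & \sigma^2_\varepsilon + b\,\sigma^2_\pi & \\
\text{Factor } B & \sigma^2_\varepsilon + \sigma^2_{\beta\pi} + \dfrac{na\sum_k \beta_k^2}{b-1} & \dfrac{MS_B}{MS_{B \times S/A}} \\
A \times B & \sigma^2_\varepsilon + \sigma^2_{\beta\pi} + \dfrac{n\sum_j\sum_k (\alpha\beta)_{jk}^2}{(a-1)(b-1)} & \dfrac{MS_{AB}}{MS_{B \times S/A}} \\
B \times \text{Subjects}/A \text{ (within error)} & \sigma^2_\varepsilon + \sigma^2_{\beta\pi} &
\end{array}
$$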
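ANOVA Table

$$
\begin{array}{lllll}
\text{Source} & SS & df & MS & F \\
\hline
\text{Factor } A & SS_A & a-1 & \dfrac{SS_A}{a-1} & \dfrac{MS_A}{MS_{S/A}} \\
S/A & SS_{S/A} & N-a & \dfrac{SS_{S/A}}{N-a} & \\
\text{Factor } B & SS_B & b-1 & \dfrac{SS_B}{b-1} & \dfrac{MS_B}{MS_{B \times S/A}} \\
A \times B & SS_{AB} & (a-1)(b-1) & \dfrac{SS_{AB}}{(a-1)(b-1)} & \dfrac{MS_{AB}}{MS_{B \times S/A}} \\
B \times S/A & SS_{B \times S/A} & (N-a)(b-1) & \dfrac{SS_{B \times S/A}}{(N-a)(b-1)} &
\end{array}
$$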
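As a check on the table above, the sketch below partitions the sums of squares for the test-prep data by hand. It is a minimal Python illustration, not SPSS's algorithm, and it assumes a balanced design (equal n per group); the names `mixed_anova`, `NO_TRAINING`, and `TRAINING` are hypothetical. Each SS follows the structural model: subject means give $SS_{S/A}$, and what is left after removing the subject, B, and A × B effects gives $SS_{B \times S/A}$.

```python
# Test-prep data from the introduction: one row per participant,
# columns = subscale 1, 2, 3. Names are illustrative only.
NO_TRAINING = [
    [42, 42, 48], [42, 48, 48], [48, 48, 54], [42, 54, 54], [54, 66, 54],
    [36, 42, 36], [48, 48, 60], [48, 60, 66], [54, 60, 54], [48, 42, 54],
]
TRAINING = [
    [48, 60, 78], [36, 48, 60], [66, 78, 78], [48, 78, 90], [48, 66, 72],
    [36, 48, 54], [54, 72, 84], [54, 72, 90], [48, 72, 78], [54, 66, 78],
]


def mean(xs):
    return sum(xs) / len(xs)


def mixed_anova(groups):
    """SS, df and F for one between factor (A) and one within factor (B).

    groups: a list of a groups, each a list of participant rows holding the
    b repeated scores. Assumes a balanced design (equal n in every group).
    """
    a, n, b = len(groups), len(groups[0]), len(groups[0][0])
    N = a * n
    grand = mean([x for g in groups for row in g for x in row])
    a_mean = [mean([x for row in g for x in row]) for g in groups]
    b_mean = [mean([row[k] for g in groups for row in g]) for k in range(b)]
    cell = [[mean([row[k] for row in g]) for k in range(b)] for g in groups]

    ss = dict.fromkeys(["A", "S/A", "B", "AB", "BxS/A"], 0.0)
    for k in range(b):
        ss["B"] += N * (b_mean[k] - grand) ** 2
    for j, g in enumerate(groups):
        ss["A"] += n * b * (a_mean[j] - grand) ** 2
        for k in range(b):
            ss["AB"] += n * (cell[j][k] - a_mean[j] - b_mean[k] + grand) ** 2
        for row in g:
            s_mean = mean(row)  # participant's own mean
            ss["S/A"] += b * (s_mean - a_mean[j]) ** 2
            for k in range(b):
                # residual after removing the subject, B and A x B effects
                ss["BxS/A"] += (row[k] - s_mean - cell[j][k] + a_mean[j]) ** 2

    df = {"A": a - 1, "S/A": N - a, "B": b - 1,
          "AB": (a - 1) * (b - 1), "BxS/A": (N - a) * (b - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    f = {"A": ms["A"] / ms["S/A"],       # between effect vs. between error
         "B": ms["B"] / ms["BxS/A"],     # within effects vs. within error
         "AB": ms["AB"] / ms["BxS/A"]}
    return ss, df, f


ss, df, f = mixed_anova([NO_TRAINING, TRAINING])
for key in ss:
    print(f"{key:6s} SS = {ss[key]:8.1f}  df = {df[key]}")
for key in f:
    print(f"F({key}) = {f[key]:.2f}")
```

On these data the sketch gives SS values of 2856.6 (A), 3682.8 (S/A), 2899.2 (B), 1051.2 (A × B), and 849.6 (B × S/A), with F = 13.96, 61.42, and 22.27, the same values SPSS reports in Section 4.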
3. Assumptions for between and within factor designs

Assumptions for between-subjects tests: these are identical to the assumptions for a one-way between-subjects ANOVA.
o To conduct the omnibus test for the between-subjects effect, assumptions are made on the marginal between-subjects means:
  - Samples are independent and randomly drawn from the population
  - Each group is normally (symmetrically) distributed
  - All groups have a common variance
o If you will perform simple effects tests on the between-subjects factor, then you need to make the following assumptions on the between-subjects cell means at each level of the within-subjects factor:
  - Each group is normally (symmetrically) distributed
  - All groups have a common variance

Assumptions for within-subjects tests:
o When examining the model parameters, we noted that we needed the error terms to be equal in the two samples: $\sigma^2_{\pi(1)} = \sigma^2_{\pi(2)}$ and $\sigma^2_{\beta\pi(1)} = \sigma^2_{\beta\pi(2)}$. To satisfy this assumption, we must have homogeneity of the variance/covariance matrices across the levels of A:

$$
\begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}_{A_1}
=
\begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}_{A_2}
$$
Homogeneity of variance/covariance matrices is required for any omnibus comparisons on the within-subjects marginal means and for omnibus interaction tests on the between × within cell means. SPSS provides Box's M test and Levene's test as checks of the homogeneity of the variance/covariance matrices. If this assumption is violated, the omnibus tests should not be performed for the main effect of the within-subjects factor or for the interaction between the within-subjects and between-subjects factors.
o If homogeneity of variance/covariance matrices is satisfied, then in order to conduct omnibus tests for the main effect of the within-subjects factor or for the interaction between the within-subjects and between-subjects factors, we must have:
  - Sphericity of the pooled variance/covariance matrix
  - Normality of the repeated measures (but we already checked this)
  - Participants who are independent and randomly selected from the population (but we already checked this)
o If we wish to conduct simple effects tests for the effect of the repeated-measures factor at each level of the between-subjects factor, then we must have sphericity of the variance/covariance matrix for each between-subjects group. Note that we do not need homogeneity of variance/covariance matrices in order to test this assumption.

Testing assumptions: Normality
o For all tests on the marginal within-subjects means and on the cell means, we need to check normality on a cell-by-cell basis.
EXAMINE VARIABLES=scale1 scale2 scale3 BY cond /PLOT BOXPLOT NPPLOT SPREADLEVEL /COMPARE VARIABLES.
[Figure: boxplots of SCALE1, SCALE2, and SCALE3 by COND (N = 10 per group)]

Tests of Normality (Shapiro-Wilk)
Variable   cond   Statistic   df   Sig.
scale1     1.00   .911        10   .287
scale1     2.00   .897        10   .202
scale2     1.00   .886        10   .151
scale2     2.00   .869        10   .097
scale3     1.00   .897        10   .203
scale3     2.00   .892        10   .180
o In order to conduct all tests on the between-subjects marginal means, we need the marginal means to be normally distributed. To test the marginal means, we must manually average across the repeated measures, compute the marginal effects, and conduct our usual tests for normality.
COMPUTE between = (scale1+scale2+scale3)/3. EXAMINE VARIABLES=between BY cond /PLOT BOXPLOT NPPLOT SPREADLEVEL.
[Figure: boxplot and dot plot of BETWEEN by COND (N = 10 per group)]
To check homogeneity/sphericity, we will adopt a three-step approach:
o Check the equality of the variance/covariance matrices across the different samples
o Check the sphericity of the pooled variance/covariance matrix (overall sphericity)
o Check the sphericity of the variance/covariance matrix for each group separately (multi-sample sphericity)
To check the homogeneity of the variance/covariance matrices across the different samples, we use Box's test of equality of the variance/covariance matrices and Levene's test of variances.
GLM scale1 scale2 scale3 by cond /WSFACTOR = scale 3 /PRINT = DESC HOMO.
$$H_0: \Sigma_{A_1} = \Sigma_{A_2}$$

Note that this test does not examine whether the var/cov matrices are spherical, only whether they are equal. If we reject the null hypothesis, we cannot pool the matrices to test within-subjects effects (and we will need to consider alternative approaches to omnibus analyses).
Box's Test of Equality of Covariance Matrices
Box's M   F      df1   df2        Sig.
5.682     .774   6     2347.472   .591
Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
We fail to reject the null hypothesis, so we have no evidence that the variance/covariance matrices are unequal.
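o Levene's test is a more focused test of the equivalence of only the variances (the diagonal elements of the two matrices):

$$\sigma^2_{1,A_1} = \sigma^2_{1,A_2}, \qquad \sigma^2_{2,A_1} = \sigma^2_{2,A_2}, \qquad \sigma^2_{3,A_1} = \sigma^2_{3,A_2}$$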
Levene's Test of Equality of Error Variances
          F       df1   df2   Sig.
SCALE1    .635    1     18    .436
SCALE2    .248    1     18    .624
SCALE3    1.204   1     18    .287
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
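This test only examines whether the variances of the different groups are equal, separately for each subscale:

$$H_0: \sigma^2_1 = \sigma^2_2 = \dots = \sigma^2_a \qquad H_1: \text{at least one } \sigma^2_j \text{ differs from the others}$$

For subscale 1: F(1,18) = 0.64, p = .44
For subscale 2: F(1,18) = 0.25, p = .62
For subscale 3: F(1,18) = 1.20, p = .29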
o If either Box's test or any of the Levene's tests is significant, then we reject the assumption of homogeneity of the variance/covariance matrices.
o In this case, we have no evidence to conclude that the matrices are different, so we may pool them and test for sphericity.
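To make the Levene's F values concrete, here is a minimal sketch of the test as a one-way ANOVA on absolute deviations from each group's mean. It assumes that mean-centred version of the test (other variants centre on the median); the helper names are hypothetical, and the data are the subscale columns from the introduction.

```python
# Subscale columns from the introduction: (no training, training).
SCALE1 = ([42, 42, 48, 42, 54, 36, 48, 48, 54, 48],
          [48, 36, 66, 48, 48, 36, 54, 54, 48, 54])
SCALE2 = ([42, 48, 48, 54, 66, 42, 48, 60, 60, 42],
          [60, 48, 78, 78, 66, 48, 72, 72, 72, 66])
SCALE3 = ([48, 48, 54, 54, 54, 36, 60, 66, 54, 54],
          [78, 60, 78, 90, 72, 54, 84, 90, 78, 78])


def mean(xs):
    return sum(xs) / len(xs)


def levene_f(*groups):
    """Levene's F: one-way ANOVA on |score - group mean|."""
    devs = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = mean([z for d in devs for z in d])
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ss_within = sum((z - mean(d)) ** 2 for d in devs for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


for name, groups in [("SCALE1", SCALE1), ("SCALE2", SCALE2), ("SCALE3", SCALE3)]:
    print(f"{name}: F(1, 18) = {levene_f(*groups):.3f}")
```

This version prints F values of 0.635, 0.248, and 1.204, matching the SPSS table.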
Next, we test for overall sphericity by averaging across the between-subjects factor and examining epsilon:

Entire Sample (Measure: MEASURE_1)
Within-Subjects Effect   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
SCALE                    .961                 1.000         .500
o In this case, the overall sphericity assumption is satisfied.
o We may conduct unadjusted omnibus tests on the within-subjects effects: SCALE and SCALE*COND.
If we plan on conducting simple effects tests (of the within-subjects factor at each level of the between-subjects factor), then we need to examine the epsilon for each condition (the multi-sample sphericity).
temporary. select if cond=1. GLM scale1 scale2 scale3 /WSFACTOR = scale 3.
Condition #1 (Measure: MEASURE_1)
Within-Subjects Effect   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
SCALE                    .864                 1.000         .500
o We conclude that, separately, the var/cov matrix for each condition is not spherical, but the violation is fixable.
o If we want to conduct follow-up tests on each condition, we need to adjust all omnibus tests.
Overall we conclude that:
o The var/cov matrix for condition 1 equals the var/cov matrix for condition 2.
o When we combine the two conditions, the overall var/cov matrix is spherical.
o BUT neither the var/cov matrix for condition 1 nor the var/cov matrix for condition 2 is spherical!
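The Greenhouse-Geisser epsilons above can be reproduced from the raw scores. The sketch below is an illustration, not SPSS's code, and the helper names are hypothetical: it uses the standard formula $\varepsilon = (\sum\lambda)^2 / [(b-1)\sum\lambda^2]$, where the $\lambda$ are eigenvalues of the covariance matrix projected onto $b-1$ orthonormal contrasts. Double-centring the matrix has the same nonzero eigenvalues, so its trace and its sum of squared entries supply $\sum\lambda$ and $\sum\lambda^2$ without an eigen-solver.

```python
NO_TRAINING = [
    [42, 42, 48], [42, 48, 48], [48, 48, 54], [42, 54, 54], [54, 66, 54],
    [36, 42, 36], [48, 48, 60], [48, 60, 66], [54, 60, 54], [48, 42, 54],
]
TRAINING = [
    [48, 60, 78], [36, 48, 60], [66, 78, 78], [48, 78, 90], [48, 66, 72],
    [36, 48, 54], [54, 72, 84], [54, 72, 90], [48, 72, 78], [54, 66, 78],
]


def sscp(rows):
    """Within-group sums of squares and cross-products (b x b matrix)."""
    b = len(rows[0])
    m = [sum(r[k] for r in rows) / len(rows) for k in range(b)]
    return [[sum((r[p] - m[p]) * (r[q] - m[q]) for r in rows) for q in range(b)]
            for p in range(b)]


def gg_epsilon(s):
    """Greenhouse-Geisser epsilon from a symmetric covariance or SSCP matrix."""
    b = len(s)
    row = [sum(s[p]) / b for p in range(b)]   # row means = column means (symmetric)
    grand = sum(row) / b
    d = [[s[p][q] - row[p] - row[q] + grand for q in range(b)] for p in range(b)]
    trace = sum(d[p][p] for p in range(b))                        # sum of eigenvalues
    sum_sq = sum(d[p][q] ** 2 for p in range(b) for q in range(b))  # sum of squared eigenvalues
    return trace ** 2 / ((b - 1) * sum_sq)


s_no, s_yes = sscp(NO_TRAINING), sscp(TRAINING)
pooled = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(s_no, s_yes)]
print(f"pooled:      {gg_epsilon(pooled):.3f}")
print(f"no training: {gg_epsilon(s_no):.3f}")
print(f"training:    {gg_epsilon(s_yes):.3f}")
```

Epsilon is unaffected by rescaling the matrix, so SSCP matrices can be used in place of covariance matrices. The printed values are .961, .864, and .776; the last matches the condition 2 epsilon SPSS reports in Section 6. The Huynh-Feldt correction is not computed here.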
Remember that all this funny business of checking the var/cov matrix can be skipped if we avoid omnibus tests and stick to contrasts!

If assumptions are violated: a recap
i. If normality/symmetry is not satisfied:
   o All F-tests may be biased.
   o Try advanced non-parametric/distribution-free tests.
ii. If the variances are not equal between groups within each condition (Levene's test and boxplots suggest heterogeneity):
   o Then we cannot conduct between-subjects tests that require equal variances (omnibus tests and/or standard contrasts).
   o Test all between-subjects contrasts with unequal-variance contrasts.
   o Test all between-subjects omnibus tests with the Brown-Forsythe F* test.
iii. If variance/covariance matrices are not equal across all groups (Box's M is significant or Levene's test suggests heterogeneity):
   o Then we cannot pool var/cov matrices over the between-subjects groups.
   o The omnibus within-subjects error term (used to test within-subjects effects and between/within interactions) is not valid.
   o Use the MANOVA approach for omnibus tests of within-subjects effects and between/within interactions, OR use contrasts for between/within tests.
iv. If sphericity of the combined variance/covariance matrix (overall sphericity) is violated:
   o Note: If assumption (iii) is violated, then we cannot pool the var/cov matrices, and this assumption is automatically violated.
   o The omnibus within-subjects error term (used to test within-subjects effects and between/within interactions) is not valid.
   o If the violation is moderate, use epsilon-adjusted omnibus tests or contrasts.
   o If the violation is extreme, use contrasts for between/within tests or the MANOVA approach for omnibus tests of within-subjects effects and between/within interactions.
v. If the sphericity of the variance/covariance matrix for each group separately (multi-sample sphericity) is violated:
   o Note: If assumption (iii) is violated, then this assumption may still be satisfied.
   o The omnibus within-subjects error term calculated at each level of the between-subjects factor (for simple effects of the within-subjects factor at each level of the between-subjects factor) is not valid.
   o If the violation is moderate, use epsilon-adjusted simple effect omnibus tests or contrasts.
   o If the violation is extreme, use the MANOVA approach for simple effect omnibus tests of within-subjects effects, or use contrasts.
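4. Analysis of omnibus ANOVA effects
Partial eta-squared is a measure of the percentage of variance accounted for (in the sample) that can be used for omnibus tests:

$$\eta^2_{(Effect)} = \frac{SS_{Effect}}{SS_{Effect} + SS_{ErrorTermForEffect}} \qquad \text{e.g.,} \qquad \eta^2_A = \frac{SS_A}{SS_A + SS_{S/A}}$$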
In this case, we may conduct unadjusted within-subjects tests (see the sphericity checks in Section 3).
GLM scale1 scale2 scale3 by cond /WSFACTOR = scale 3 /PRINT = DESC.
Tests of Within-Subjects Effects (Measure: MEASURE_1)
Source         Correction           Type III SS   df       Mean Square   F        Sig.
SCALE          Sphericity Assumed   2899.200      2        1449.600      61.424   .000
               Greenhouse-Geisser   2899.200      1.921    1508.872      61.424   .000
               Huynh-Feldt          2899.200      2.000    1449.600      61.424   .000
               Lower-bound          2899.200      1.000    2899.200      61.424   .000
SCALE * COND   Sphericity Assumed   1051.200      2        525.600       22.271   .000
               Greenhouse-Geisser   1051.200      1.921    547.091       22.271   .000
               Huynh-Feldt          1051.200      2.000    525.600       22.271   .000
               Lower-bound          1051.200      1.000    1051.200      22.271   .000
Error(SCALE)   Sphericity Assumed   849.600       36       23.600
               Greenhouse-Geisser   849.600       34.586   24.565
               Huynh-Feldt          849.600       36.000   23.600
               Lower-bound          849.600       18.000   47.200
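$$\eta^2_{Scale} = \frac{2899.2}{2899.2 + 849.6} = .77 \qquad \eta^2_{Scale \times Cond} = \frac{1051.2}{1051.2 + 849.6} = .55$$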
Tests of Between-Subjects Effects (Measure: MEASURE_1; Transformed Variable: Average)
Source      Type III SS   df   Mean Square   F         Sig.
Intercept   194256.600    1    194256.600    949.446   .000
COND        2856.600      1    2856.600      13.962    .002
Error       3682.800      18   204.600
$$\eta^2_{Condition} = \frac{2856.6}{2856.6 + 3682.8} = .44$$
o Tests of the within-subjects factors:
  - The main effect of scale: F(2,36) = 61.42, p < .01, $\eta^2$ = .77. Collapsing across levels of training, there are significant differences in the scores on the three subscales of the test.
  - The scale by training interaction: F(2,36) = 22.27, p < .01, $\eta^2$ = .55. The effect of training is not the same for each subscale of the test.
o Tests of the between-subjects factors:
  - The main effect of training: F(1,18) = 13.96, p < .01, $\eta^2$ = .44. Averaging across subscales, those who received training performed better than those who did not.
To interpret effects, use the same logic outlined for factorial ANOVA. Start with the highest-order significant (or important) effect. Interpret lower-order effects only if they are meaningful.
o In this case, we have a significant scale by training interaction. We could follow up this result with simple effects tests.
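5. Contrasts
For a contrast with coefficients $c_j$ on means $\bar{X}_{.j}$, each based on $n_j$ observations:

$$\hat{\psi} = \sum_j c_j \bar{X}_{.j} \qquad SS_{\hat{\psi}} = \frac{\hat{\psi}^2}{\sum_j c_j^2 / n_j} \qquad F(1, df) = \frac{SS_{\hat{\psi}}}{MSE}$$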
o For between-subjects tests on the marginal means:
  - If the homogeneity of variances assumption is satisfied, then MSE will be the between-subjects error term, $MSE = MS_{S/A}$ (with df = N - a).
  - If the homogeneity of variances assumption is not satisfied, then we can use the unequal-variance test for contrasts (Welch's test).
o For between-subjects tests within one level of the within-subjects factor:
  - If the homogeneity of variances assumption is satisfied at that level of the within-subjects factor, then MSE will be the between-subjects error term computed from that level only, $MSE = MS_{S/A \text{ at } B_k}$ (with df = N - a).
  - If the homogeneity of variances assumption is not satisfied, then we can use the unequal-variance test for contrasts (Welch's test).
o For within-subjects tests (either on the marginal within-subjects means or on the between × within cell means):
  - MSE will be a contrast-specific error term (with df = N - a).
  - If the data are spherical, then we could use an omnibus error term instead: for contrasts on the marginal within-subjects means or on the between/within cell means, use the omnibus within-subjects error term, $MSE = MS_{B \times S/A}$ (with df = (b - 1)(N - a)). However, I recommend that you always use the contrast-specific error term.
o Note that all contrasts should have df = N-a.
(Unless for some reason you decide to use the omnibus error term for within-subject or between*within contrasts.)
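The contrast formulas above can be applied to the training marginal means as a sketch. It assumes that each marginal mean counts as n = 30 observations (10 participants × 3 subscales) and uses $MSE = MS_{S/A} = 204.6$ (df = 18) from the between-subjects table in Section 4; that reading of n is an assumption, and the helper name `contrast_f` is hypothetical. Under it, $SS_{\hat{\psi}}$ equals $SS_A$ and F equals the main-effect F for training.

```python
def contrast_f(coefs, means, n_per_mean, mse):
    """Return psi-hat, SS(psi) and F(1, df_error) for a contrast on independent means."""
    psi = sum(c * m for c, m in zip(coefs, means))
    ss_psi = psi ** 2 / sum(c ** 2 / n for c, n in zip(coefs, n_per_mean))
    return psi, ss_psi, ss_psi / mse


# Training vs. no training on the marginal means (No = 50.0, Yes = 63.8).
psi, ss_psi, f = contrast_f(coefs=[-1, 1], means=[50.0, 63.8],
                            n_per_mean=[30, 30], mse=204.6)  # MSE = MS_{S/A}, df = 18
print(f"psi = {psi:.1f}, SS = {ss_psi:.1f}, F(1, 18) = {f:.2f}")
```

The output is psi = 13.8, SS = 2856.6, and F(1, 18) = 13.96.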
Effect sizes for contrasts o Partial eta-squared is a measure of percentage of the variance accounted for (in the sample) that can be used for contrasts:
$$\eta^2_{Contrast} = \frac{SS_{Contrast}}{SS_{Contrast} + SS_{ErrorTermForContrast}}$$
o For contrasts (except maybe polynomial trends), we can also compute a d as a measure of the effect size, just as we did for the paired t-test.
$$d = \frac{\bar{\psi}}{\hat{\sigma}_\psi}$$

where $\bar{\psi}$ is the average value of the contrast of interest and $\hat{\sigma}_\psi$ is the standard deviation of the contrast values. For between-subjects contrasts, we can compute d directly from the t-statistic:

$$d = \frac{2t}{\sqrt{df}}$$
o For all contrasts, we can also compute an r as a measure of the effect size.
$$r = \sqrt{\frac{t^2_{Contrast}}{t^2_{Contrast} + df_{Contrast}}}$$
To perform contrasts on the between-subjects marginal means, you need to compute an average across the within-subjects factor.

Between-Subjects Marginal Means
Training   Subscale 1   Subscale 2   Subscale 3   Marginal
No         46.2         51.0         52.8         50.0
Yes        49.2         66.0         76.2         63.8
Marginal   47.7         58.5         64.5
o To run a test on the marginal between-subject means, we need to compute a new variable and then run an ANOVA (or t-test).
COMPUTE between = (scale1+scale2+scale3)/3. T-TEST GROUPS=cond(1 2) /VARIABLES=between .
Group Statistics (between)
cond   N    Mean      Std. Deviation   Std. Error Mean
1.00   10   50.0000   6.39444          2.02210
2.00   10   63.8000   9.77298          3.09049
Independent Samples Test (between)
Levene's Test for Equality of Variances: Sig. = .250
t-test for Equality of Means:
                              t        df       95% CI of the Difference
Equal variances assumed       -3.737   18       -21.55920 to -6.04080
Equal variances not assumed   -3.737   15.512   -21.64936 to -5.95064
$$d = \frac{2t}{\sqrt{df}} = \frac{2(3.737)}{\sqrt{18}} = 1.76$$
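$$\eta^2 = \frac{SS_{Contrast}}{SS_{Contrast} + SS_{ErrorTermForContrast}} = \frac{2856.6}{2856.6 + 3682.8} = .44$$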
Individuals in the training condition performed better than individuals without training, t(18) = 3.74, p < .01, d = 1.76. This test is identical to the main effect of training obtained from the repeated-measures analysis, F(1,18) = 13.96, p < .01, $\eta^2$ = .44 (t² = F, up to rounding).
o In this example, the between-subjects factor has only two levels, so follow-up tests are unnecessary. If the between-subjects factor had more than two levels, you could use the CONTRAST command to test the between-subjects contrasts. If the between-subjects variances are unequal, you can use unequal-variance contrasts. You may need to adjust the p-values of the tests, depending on whether the tests are planned or post hoc.
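The two effect-size conversions from earlier in this section can be checked against the training comparison. This is a small sketch with hypothetical helper names; it takes t = 3.737 and df = 18 from the equal-variance t-test above. Because $t^2 = F$ and $F = SS_{Contrast} / (SS_{Error}/df)$, the squared r is algebraically the same quantity as partial eta-squared for a 1-df contrast.

```python
import math


def d_from_t(t, df):
    """d for a two-group between-subjects contrast with equal n: 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)


def r_from_t(t, df):
    """Effect-size r for any 1-df contrast: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


t, df = 3.737, 18   # training vs. no training, equal variances assumed
print(round(d_from_t(t, df), 2))        # 1.76
print(round(r_from_t(t, df), 2))        # 0.66
print(round(r_from_t(t, df) ** 2, 2))   # 0.44, the same as partial eta-squared
```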
Within-Subjects Marginal Means
o The easiest approach to conducting contrasts on the within-subjects marginal means is to use SPSS's built-in contrasts. Specify a type of contrast on the within-subjects factor using the WSFACTOR subcommand. To test whether subscale 2 differs from subscale 3 (the second Helmert contrast, level 2 vs. level 3):
GLM scale1 scale2 scale3 by cond /WSFACTOR = scale 3 helmert.
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source         SCALE                 Type III SS   df   Mean Square   F         Sig.
SCALE          Level 1 vs. Later     3808.800      1    3808.800      110.400   .000
               Level 2 vs. Level 3   720.000       1    720.000       14.876    .001
SCALE * COND   Level 1 vs. Later     1312.200      1    1312.200      38.035    .000
               Level 2 vs. Level 3   352.800       1    352.800       7.289     .015
Error(SCALE)   Level 1 vs. Later     621.000       18   34.500
               Level 2 vs. Level 3   871.200       18   48.400
$$\eta^2 = \frac{SS_{Contrast}}{SS_{Contrast} + SS_{ErrorTermForContrast}} = \frac{720}{720 + 871.2} = .45$$
We want to conduct tests on the marginal scale means (averaged across condition), so we need to read the line labeled SCALE. Averaging across levels of training, we find that scores on subscale 3 are higher than scores on subscale 2, F(1,18) = 14.88, p < .01, $\eta^2$ = .45.
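o An alternative approach to conducting contrasts on the within-subjects marginal means is to use the SPECIAL keyword to enter the contrast coefficients directly (the first row of 1s corresponds to the overall mean; the remaining rows, L1 = (0, -1, 1) and L2 = (-1, 0, 1), are the contrasts):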
GLM scale1 scale2 scale3 by cond /WSFACTOR = scale 3 special ( 1 1 1 0 -1 1 -1 0 1).
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source         SCALE   Type III SS   df   Mean Square   F         Sig.
SCALE          L1      720.000       1    720.000       14.876    .001
               L2      5644.800      1    5644.800      103.007   .000
SCALE * COND   L1      352.800       1    352.800       7.289     .015
               L2      2080.800      1    2080.800      37.971    .000
Error(SCALE)   L1      871.200       18   48.400
               L2      986.400       18   54.800
The contrast labeled SCALE L1 (coefficients 0, -1, 1) gives us the same results as the previous analysis.
o If we create a new variable reflecting the contrast and run a one-sample t-test on it, we get an incorrect result, because the between-subjects factor is no longer included in the analysis (we are ignoring the fact that we have different groups of participants). You should not use this method.
compute c1 = scale3 - scale2. T-TEST /TESTVAL=0 /VARIABLES=c1.
One-Sample Test (Test Value = 0)
      t       df   95% CI of the Difference
C1    3.343   19   2.2436 to 9.7564
When we convert this to an F-value, F(1,19) = 11.18, p = .003. The degrees of freedom are off by one, and this method uses a slightly different error term, because it drops the between-subjects factor from the analysis completely.
o Again, depending on the nature of these tests, the p-values may need adjustment.
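The difference between the two analyses of c1 = scale3 - scale2 comes down to the error term, as this sketch shows. It assumes equal group sizes (so $N\bar{c}^2$ equals the SPSS SS for SCALE L1), and the variable names are hypothetical; the c1 values were computed from the raw scores in the introduction. The mixed-design error pools deviations within each training group (df = N - a), whereas the one-sample t-test uses deviations from the overall mean (df = N - 1), which also absorbs the training difference.

```python
import math

# c1 = scale3 - scale2 for each participant, computed from the raw scores
C1_NO_TRAINING = [6, 0, 6, 0, -12, -6, 12, 6, -6, 12]
C1_TRAINING = [18, 12, 0, 12, 6, 6, 12, 18, 6, 12]


def mean(xs):
    return sum(xs) / len(xs)


def ss_dev(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


both = C1_NO_TRAINING + C1_TRAINING
N, a = len(both), 2

# Mixed-design contrast: error pooled within the training groups (df = N - a).
ss_contrast = N * mean(both) ** 2          # equal n assumed
ss_error = ss_dev(C1_NO_TRAINING) + ss_dev(C1_TRAINING)
f_mixed = ss_contrast / (ss_error / (N - a))

# One-sample t-test that ignores the groups (df = N - 1): not recommended.
t_one = mean(both) / math.sqrt(ss_dev(both) / (N - 1) / N)

print(f"mixed design: F(1, {N - a}) = {f_mixed:.3f}")
print(f"one-sample:   t({N - 1}) = {t_one:.3f}, F = {t_one ** 2:.2f}")
```

The first line reproduces the SCALE L1 result, F(1, 18) = 14.876, and the second reproduces the one-sample result, t(19) = 3.343 and F = 11.18.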
SPSS makes it difficult to conduct contrasts on the between-subjects by within-subjects cell means.
o To compute a between/within contrast in SPSS, we must be able to write the contrast as an interaction contrast (a difference of differences). Suppose we want to examine whether the difference between scores on subscale 2 and subscale 3 depends on training:
              Subscale1   Subscale2   Subscale3
No Training   0           1           -1
Training      0           -1          1
o Method #1: Use brand-name contrast × condition tests.
This contrast is a test of whether the (scale3 - scale2) contrast differs by condition.
  - The effect of scale3 - scale2 for no training: $\mu_{NoTrain,3} - \mu_{NoTrain,2}$
  - The effect of scale3 - scale2 for training: $\mu_{Train,3} - \mu_{Train,2}$
  - Do these effects differ?

$$\psi = (\mu_{NoTrain,3} - \mu_{NoTrain,2}) - (\mu_{Train,3} - \mu_{Train,2}) = \mu_{NoTrain,3} - \mu_{NoTrain,2} - \mu_{Train,3} + \mu_{Train,2}$$
I can obtain this contrast from SPSS by asking for an interaction between the (scale3 - scale2) contrast on the marginal scale means and a (Training - No Training) contrast on the marginal training condition means:

Marginal scale contrast:     Subscale1 = 0, Subscale2 = -1, Subscale3 = 1
Marginal training contrast:  No Training = -1, Training = 1
This test result was printed when we asked for the Helmert contrasts:
GLM scale1 scale2 scale3 by cond /WSFACTOR = scale 3 helmert.
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source         SCALE                 Type III SS   df   Mean Square   F         Sig.
SCALE          Level 1 vs. Later     3808.800      1    3808.800      110.400   .000
               Level 2 vs. Level 3   720.000       1    720.000       14.876    .001
SCALE * COND   Level 1 vs. Later     1312.200      1    1312.200      38.035    .000
               Level 2 vs. Level 3   352.800       1    352.800       7.289     .015
Error(SCALE)   Level 1 vs. Later     621.000       18   34.500
               Level 2 vs. Level 3   871.200       18   48.400
$$\eta^2 = \frac{SS_{Contrast}}{SS_{Contrast} + SS_{ErrorTermForContrast}} = \frac{352.8}{352.8 + 871.2} = .29$$
o Method #3: Manually compute the main-effect contrast of interest for each participant, and run a t-test (or a one-way ANOVA) comparing that variable across levels of training:
compute c1 = scale3 - scale2. UNIANOVA c1 by cond.
Tests of Between-Subjects Effects (Dependent Variable: C1)
Source            Type III SS   df   Mean Square   F        Sig.
Corrected Model   352.800a      1    352.800       7.289    .015
Intercept         720.000       1    720.000       14.876   .001
COND              352.800       1    352.800       7.289    .015
Error             871.200       18   48.400
Total             1944.000      20
Corrected Total   1224.000      19
Independent Samples Test (c1, equal variances assumed)
t        df   95% CI of the Difference
-2.700   18   -14.93654 to -1.86346
$$d = \frac{2t}{\sqrt{df}} = \frac{2(2.700)}{\sqrt{18}} = 1.27$$
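We conclude that the difference between subscale 2 and subscale 3 scores depends on training, t(18) = 2.70, p = .015, d = 1.27: the difference between scores on subscale 2 and subscale 3 is larger for participants who received training (10.2 points) than for those who did not (1.8 points).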
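The equivalence behind Method #3 is easy to verify: the between/within interaction contrast is just an equal-variance two-sample t-test on each participant's c1 = scale3 - scale2 score. The sketch below (hypothetical helper names; c1 values computed from the raw scores) reproduces t(18) = -2.700, shows that t² equals the SCALE * COND F of 7.289, and computes d.

```python
import math

C1_NO_TRAINING = [6, 0, 6, 0, -12, -6, 12, 6, -6, 12]
C1_TRAINING = [18, 12, 0, 12, 6, 6, 12, 18, 6, 12]


def mean(xs):
    return sum(xs) / len(xs)


def pooled_t(g1, g2):
    """Equal-variance two-sample t-test; returns (t, df)."""
    n1, n2 = len(g1), len(g2)
    ss = sum((x - mean(g1)) ** 2 for x in g1) + sum((x - mean(g2)) ** 2 for x in g2)
    df = n1 + n2 - 2
    se = math.sqrt(ss / df * (1 / n1 + 1 / n2))
    return (mean(g1) - mean(g2)) / se, df


t, df = pooled_t(C1_NO_TRAINING, C1_TRAINING)
d = 2 * abs(t) / math.sqrt(df)
print(f"t({df}) = {t:.3f}, t^2 = F = {t * t:.3f}, d = {d:.2f}")
# t(18) = -2.700, t^2 = F = 7.289, d = 1.27
```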
o For contrasts that are not differences across levels of the between-subjects factor, more advanced techniques are required (but it is not clear that you should be running these types of contrasts. How would you interpret them?).
              Subscale1   Subscale2   Subscale3
No Training   -1          1           0
Training      0           -1          1
o If the between-subjects factor has more than two levels, then testing between/within contrasts is trickier (see example 1).
                  Subscale1   Subscale2   Subscale3
No Training       0           1           -1
Type A Training   0           -1          1
Type B Training   0           0           0
o If these contrasts are post-hoc and need adjustment, follow the adjustment procedures for factorial designs (using the appropriate error term and error degrees of freedom).
6. Simple Effects Tests
To conduct simple effects tests (of the between-subjects factor at each level of the within-subjects factor), we can run between-subjects analyses on each subscale.
o We want to compute an error term based only on the information that is specific to each test (the within-group variability on that subscale). Thus, it is acceptable to run separate tests on each subscale.
o The variances of the training conditions are equal for each subscale (recall the Levene's tests in Section 3), so standard tests may be conducted.
Training   Subscale 1   Subscale 2   Subscale 3
No         46.2         51.0         52.8
Yes        49.2         66.0         76.2
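One-way ANOVA for each subscale
           Source           SS       df   F        Sig.
SCALE1     Between Groups   45.0     1    0.81
           Within Groups    997.2    18
           Total            1042.2   19
SCALE2     Between Groups   1125.0   1    11.598   .003
           Within Groups    1746.0   18
           Total            2871.0   19
SCALE3     Between Groups   2737.8   1    27.543   .000
           Within Groups    1789.2   18
           Total            4527.0   19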
$$\eta^2_{Scale1} = \frac{45}{45 + 997.2} = .04 \qquad \eta^2_{Scale2} = \frac{1125}{1125 + 1746} = .39 \qquad \eta^2_{Scale3} = \frac{2737.8}{2737.8 + 1789.2} = .60$$

$$p_{crit} = \frac{.05}{3} = .0167$$
There is no effect of training on performance on subscale 1, F(1,18) = 0.81, ns, $\eta^2$ = .04. For subscales 2 and 3, training improves performance, F(1,18) = 11.60, p = .003, $\eta^2$ = .39, and F(1,18) = 27.54, p < .001, $\eta^2$ = .60, respectively; both p-values are below the adjusted criterion of .0167.
o These contrasts could also be run as t-tests.
T-TEST GROUPS = cond(1 2) /VARIABLES = scale1 scale2 scale3.
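The simple-effect statistics above follow directly from the between-groups and within-groups sums of squares, as this short sketch shows (helper names are hypothetical; the subscale 1 within-groups SS of 997.2 was computed from the raw scores). The Bonferroni criterion divides .05 by the three tests.

```python
# Between-groups and within-groups SS for each subscale (df = 1 and 18).
SIMPLE_EFFECTS = {
    "scale1": (45.0, 997.2),
    "scale2": (1125.0, 1746.0),
    "scale3": (2737.8, 1789.2),
}
DF_BETWEEN, DF_WITHIN = 1, 18
P_CRIT = 0.05 / len(SIMPLE_EFFECTS)   # Bonferroni over the three tests


def simple_effect(ss_between, ss_within):
    """F(1, 18) and eta-squared for one between-subjects simple effect."""
    f = (ss_between / DF_BETWEEN) / (ss_within / DF_WITHIN)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, eta_sq


for name, (ss_b, ss_w) in SIMPLE_EFFECTS.items():
    f, eta_sq = simple_effect(ss_b, ss_w)
    print(f"{name}: F(1, 18) = {f:.2f}, eta^2 = {eta_sq:.2f}")
print(f"Bonferroni criterion: p < {P_CRIT:.4f}")
```

The printed F values are 0.81, 11.60, and 27.54 (eta² = .04, .39, .60), and the criterion is p < 0.0167.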
To conduct simple effects tests (of the within-subjects factor at each level of the between-subjects factor), we can run within-subjects analyses at each level of training.
o These are omnibus within-subjects tests, so an epsilon adjustment is required for each test.
o The variance/covariance matrices for the training and no training conditions are equal. Thus, we would like to pool information from both between-subjects conditions to calculate the error term (in order to increase power and the precision of the estimate of the error term).
Training   Subscale 1   Subscale 2   Subscale 3
No         46.2         51.0         52.8
Yes        49.2         66.0         76.2
o If we analyze the training and no training groups separately, the error terms will contain information only from the training and no training groups, respectively. (Note that this procedure would be acceptable if the variances of the training and no training groups were unequal.) Thus, unless the between-subjects variances are unequal, we should avoid doing the following:
Temporary. Select if cond = 1. GLM scale1 scale2 scale3 /WSFACTOR = scale 3. Temporary. Select if cond = 2. GLM scale1 scale2 scale3 /WSFACTOR = scale 3.
Each of these tests will have only (n - 1)(b - 1) = 18 error degrees of freedom (assuming equal n per group), rather than (N - a)(b - 1) = 36. Thus, with this approach, we lose power and accuracy (assuming homogeneity of variance).
However, we can select each group separately to obtain the sums of squares for the simple effects tests. We can then manually compute tests for the effect of scale for the training and no training groups separately, using the omnibus within-subjects error term:
$$F[(b-1), (N-a)(b-1)] = \frac{MS_{Scale\ (No\ Training\ only)}}{MS_{Scale \times Subject/Training}}$$
From the full within-subjects omnibus tests we can obtain the appropriate epsilon correction and error mean squares.
Mauchly's Test of Sphericity: Epsilon (Measure: MEASURE_1)
Within-Subjects Effect   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
scale                    .961                 1.000         .500
Tests of Within-Subjects Effects (Measure: MEASURE_1)
Source         Correction           Type III SS   df       Mean Square   F        Sig.
scale          Sphericity Assumed   2899.200      2        1449.600      61.424   .000
               Greenhouse-Geisser   2899.200      1.921    1508.872      61.424   .000
               Huynh-Feldt          2899.200      2.000    1449.600      61.424   .000
               Lower-bound          2899.200      1.000    2899.200      61.424   .000
scale * cond   Sphericity Assumed   1051.200      2        525.600       22.271   .000
               Greenhouse-Geisser   1051.200      1.921    547.091       22.271   .000
               Huynh-Feldt          1051.200      2.000    525.600       22.271   .000
               Lower-bound          1051.200      1.000    1051.200      22.271   .000
Error(scale)   Sphericity Assumed   849.600       36       23.600
               Greenhouse-Geisser   849.600       34.586   24.565
               Huynh-Feldt          849.600       36.000   23.600
               Lower-bound          849.600       18.000   47.200
From the within-subjects omnibus tests at each level of the between subjects factor, we can obtain the value of epsilon, and the appropriate numerator mean squares.
SORT CASES BY cond . SPLIT FILE LAYERED BY cond . GLM scale1 scale2 scale3 /WSFACTOR = scale 3. SPLIT FILE OFF.
Mauchly's Test of Sphericity: Epsilon (Measure: MEASURE_1)
cond   Within-Subjects Effect   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
1.00   scale                    .864                 1.000         .500
2.00   scale                    .776                 .907          .500
Tests of Within-Subjects Effects (Measure: MEASURE_1)
cond   Source         Correction           Type III SS   df       Mean Square   F        Sig.
1.00   scale          Sphericity Assumed   232.800       2        116.400       5.046    .018
                      Greenhouse-Geisser   232.800       1.727    134.792       5.046    .024
       Error(scale)   Sphericity Assumed   415.200       18       23.067
                      Greenhouse-Geisser   415.200       15.544   26.711
2.00   scale          Sphericity Assumed   3717.600      2        1858.800      77.022   .000
                      Greenhouse-Geisser   3717.600      1.551    2396.735      77.022   .000
       Error(scale)   Sphericity Assumed   434.400       18       24.133
                      Greenhouse-Geisser   434.400       13.960   31.117
We are conducting two simple effects tests, and thus, we need to apply a p-value correction.
$$p_{crit} = \frac{.05}{2} = .025$$
o Conclusions:
  - For the no training condition, we find a significant difference in performance over the three subscales, F(2,36) = 4.93, p < .025, $\eta^2$ = .22.
  - For the training condition, we find a significant difference in performance over the three subscales, F(2,36) = 78.76, p < .025, $\eta^2$ = .81.
  - Further pairwise tests must be conducted to understand these differences.
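The manual simple-effects F values in the conclusions combine each group's SS for scale (from the split-file runs) with the pooled error term from the full analysis. This sketch reproduces them; the helper name is hypothetical, and the tests are sphericity-assumed (an epsilon correction would rescale both df before the p-value is looked up).

```python
SS_ERROR = 849.6              # Error(scale), pooled over both training groups
DF_ERROR = 36                 # (N - a)(b - 1)
MS_ERROR = SS_ERROR / DF_ERROR
DF_SCALE = 2                  # b - 1

# SS for scale within each group, from the split-file analyses
SS_SCALE = {"no training": 232.8, "training": 3717.6}


def pooled_simple_effect(ss_scale):
    """F(b - 1, (N - a)(b - 1)) and eta-squared using the pooled error term."""
    f = (ss_scale / DF_SCALE) / MS_ERROR
    eta_sq = ss_scale / (ss_scale + SS_ERROR)
    return f, eta_sq


for group, ss in SS_SCALE.items():
    f, eta_sq = pooled_simple_effect(ss)
    print(f"{group}: F({DF_SCALE}, {DF_ERROR}) = {f:.2f}, eta^2 = {eta_sq:.2f}")
```

The output is F(2, 36) = 4.93 with eta² = .22 for no training and F(2, 36) = 78.76 with eta² = .81 for training.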
7. Final thoughts
The approach to repeated measures that we have studied is known as the univariate approach. We assumed that the differences between all pairs of repeated measures were drawn from the same population. This assumption led us to a restrictive assumption on the covariance matrix and correlation matrix:
$$\begin{pmatrix} \sigma^2 & c & c & c \\ c & \sigma^2 & c & c \\ c & c & \sigma^2 & c \\ c & c & c & \sigma^2 \end{pmatrix}$$
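To make "all differences drawn from the same population" concrete, the sketch below computes the variance of every pairwise difference, $\mathrm{Var}(Y_p - Y_q) = \sigma_p^2 + \sigma_q^2 - 2\sigma_{pq}$, for two matrices. The numbers are made-up illustrations (variance 10 and covariance 6 for the compound-symmetry case; a correlation row of 1.0, 0.8, 0.5, 0.2 for a structured matrix whose entries depend only on how far apart two measurements are, a Toeplitz pattern), and the helper names are hypothetical.

```python
def compound_symmetry(b, var, cov):
    """b x b matrix with `var` on the diagonal and `cov` everywhere else."""
    return [[var if p == q else cov for q in range(b)] for p in range(b)]


def toeplitz(first_row):
    """Matrix whose (p, q) entry depends only on |p - q|."""
    b = len(first_row)
    return [[first_row[abs(p - q)] for q in range(b)] for p in range(b)]


def difference_variances(s):
    """Var(Y_p - Y_q) = s_pp + s_qq - 2 s_pq for every pair p < q."""
    b = len(s)
    return [s[p][p] + s[q][q] - 2 * s[p][q] for p in range(b) for q in range(p + 1, b)]


print(difference_variances(compound_symmetry(4, var=10.0, cov=6.0)))
# [8.0, 8.0, 8.0, 8.0, 8.0, 8.0]: every difference has variance 2 * (10 - 6)
print(difference_variances(toeplitz([1.0, 0.8, 0.5, 0.2])))
# unequal: larger for measurements that are further apart
```

Under compound symmetry every difference has the same variance, which is what the univariate approach requires; the structured matrix violates that requirement even though it is a perfectly reasonable covariance model.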
Other approaches are possible and, if omnibus tests are called for, are usually preferable. One approach is to assume that each difference between variables is drawn from a different population. This approach is known as the multivariate approach, and it places no restrictions on the covariance/correlation matrix:
12 12 13 14 2 12 2 23 24 2 13 23 3 34 2 14 24 34 4 1 12 13 14
12
1
13 23
1
23 24
14 24 34
1
34
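More recently, people have begun trying to model the structure of the variance-covariance matrix, for example with correlations that depend only on the lag between measurements:
$$R = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ \rho_1 & 1 & \rho_1 & \rho_2 \\ \rho_2 & \rho_1 & 1 & \rho_1 \\ \rho_3 & \rho_2 & \rho_1 & 1 \end{pmatrix}$$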
o This approach is complicated, but it has much appeal if you:
  - have missing observations,
  - have unequal spacing in your repeated measurements, or
  - are interested in the variance components.
8. An example: Changes in bone calcium over time (2 * 4)
A diet/exercise treatment was developed to stop bone calcium loss in women. A sample of older women was obtained, and the women were placed in either a control group (n = 15) or a treatment group (n = 16). Bone calcium levels were obtained by photon absorptiometry readings of the dominant ulna bone at the time of enrollment in the study and at one-year, two-year, and three-year follow-ups. Investigators were interested in:
o Whether the treatment group had less bone loss than the control group.
o Whether the rate of bone loss differs between the treatment group and the control group.
The following data were obtained:
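Control Group (n = 15)

| Baseline | 1 Year | 2 Year | 3 Year |
|---|---|---|---|
| 87.3 | 86.9 | 86.7 | 75.5 |
| 59.0 | 60.2 | 60.0 | 53.6 |
| 76.7 | 76.5 | 75.7 | 69.5 |
| 70.6 | 76.1 | 72.1 | 65.3 |
| 54.9 | 55.1 | 57.2 | 49.0 |
| 78.2 | 75.3 | 69.1 | 67.6 |
| 73.7 | 70.8 | 71.8 | 74.6 |
| 61.8 | 68.7 | 68.2 | 57.4 |
| 85.3 | 84.4 | 79.2 | 67.0 |
| 82.3 | 86.9 | 79.4 | 77.4 |
| 68.6 | 65.4 | 72.3 | 60.8 |
| 67.8 | 69.2 | 66.3 | 57.9 |
| 66.2 | 67.0 | 67.0 | 56.2 |
| 81.0 | 82.3 | 86.8 | 73.9 |
| 72.3 | 74.6 | 75.3 | 66.1 |

Treatment Group (n = 16), 3 Year: 81.2, 60.6, 75.2, 66.7, 54.2, 68.6, 71.6, 64.1, 70.3, 67.9, 65.9, 48.0, 51.5, 68.0, 65.7, 53.0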
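Cell and marginal means:

| Group | Baseline | Year 1 | Year 2 | Year 3 | Marginal |
|---|---|---|---|---|---|
| Control | 72.38 | 73.29 | 72.47 | 64.79 | 70.73 |
| Treatment | 69.23 | 70.66 | 71.18 | 64.53 | 68.90 |
| Marginal | 70.75 | 71.93 | 71.81 | 64.65 | |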
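Because the entries in the means table are simple averages, the control row can be recomputed from the raw control-group scores (only the control group's four columns are listed in full above). This sketch assumes each row of the control table is one woman measured at the four time points; the dictionary keys are hypothetical labels.

```python
from statistics import mean

# Control-group bone calcium readings (n = 15), copied from the table above.
control = {
    "baseline": [87.3, 59.0, 76.7, 70.6, 54.9, 78.2, 73.7, 61.8,
                 85.3, 82.3, 68.6, 67.8, 66.2, 81.0, 72.3],
    "year1": [86.9, 60.2, 76.5, 76.1, 55.1, 75.3, 70.8, 68.7,
              84.4, 86.9, 65.4, 69.2, 67.0, 82.3, 74.6],
    "year2": [86.7, 60.0, 75.7, 72.1, 57.2, 69.1, 71.8, 68.2,
              79.2, 79.4, 72.3, 66.3, 67.0, 86.8, 75.3],
    "year3": [75.5, 53.6, 69.5, 65.3, 49.0, 67.6, 74.6, 57.4,
              67.0, 77.4, 60.8, 57.9, 56.2, 73.9, 66.1],
}

# Cell means: one per time point.
cell_means = {time: mean(scores) for time, scores in control.items()}

# Marginal (group) mean: average each woman's four readings, then average those.
person_averages = [mean(readings) for readings in zip(*control.values())]
group_mean = mean(person_averages)

for time, m in cell_means.items():
    print(f"{time}: {m:.2f}")
print(f"marginal: {group_mean:.2f}")
```

With equal numbers of readings per woman, the mean of the person averages equals the mean of the four cell means, which is why the marginal value matches either way.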
[Figure: Control and Treatment group means over time]
First, let's consider how we might test the hypotheses.
o Question #1: Does the treatment group have less bone loss than the control group?
o Question #2: Is the rate of bone loss different between the treatment group and the control group?
Next, let's test all the assumptions for this model.
o Normality
EXAMINE VARIABLES = baseline year1 year2 year3 BY group
  /PLOT BOXPLOT NPPLOT SPREADLEVEL
  /COMPARE VARIABLES.
[Figure: boxplots of baseline, year1, year2, and year3 for group 1.00 and group 2.00]
If we want to perform tests on the marginal group means, then we should check normality on the marginal group means. (Would tests on the marginal group means make sense?)
COMPUTE average = SUM(baseline, year1, year2, year3)/4.
EXAMINE VARIABLES = average BY group
  /PLOT BOXPLOT NPPLOT SPREADLEVEL.
[Figure: boxplots of average by group]

Tests of Normality (average), Shapiro-Wilk:

| group | Statistic | df | Sig. |
|---|---|---|---|
| Control | .968 | 15 | .827 |
| Treatment | .950 | 16 | .497 |
o Homogeneity of the variance/covariance matrices across groups. Box's test of equality of covariance matrices tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
We have no evidence that the variance/covariance matrices are different across the two treatment groups. This assumption is satisfied. We may pool the data from the two groups to test the within-subjects effects (the marginal time means and the time-by-group interaction).
o Homogeneity of variances for between-group tests. This is necessary for equal-variance tests of all between-group effects.
GLM baseline year1 year2 year3 BY group
  /WSFACTOR = time 4 Polynomial
  /PRINT = DESCRIPTIVE HOMOGENEITY.
Levene's Test of Equality of Error Variances (tests the null hypothesis that the error variance of the dependent variable is equal across groups):

| Variable | F | df1 | df2 | Sig. |
|---|---|---|---|---|
| baseline | .076 | 1 | 29 | .784 |
| year1 | .042 | 1 | 29 | .839 |
| year2 | .163 | 1 | 29 | .689 |
| year3 | .013 | 1 | 29 | .911 |
We do not have any evidence that the variances are different across the two groups. We may conduct all between group tests under the assumption that the variances between groups are equal.
o Overall sphericity (averaging over the between-subjects factor). This is necessary for omnibus tests on the marginal time means and for omnibus time*group interaction tests. (Are these tests meaningful?)
GLM baseline year1 year2 year3 BY group
  /WSFACTOR = time 4 Polynomial
  /PRINT = DESCRIPTIVE HOMOGENEITY.
Mauchly's Test of Sphericity (Measure: MEASURE_1), epsilon estimates:

| Within-Subjects Effect | Greenhouse-Geisser | Huynh-Feldt | Lower-bound |
|---|---|---|---|
| time | .911 | 1.000 | .333 |
The data are spherical. We can conduct omnibus tests for the within-subjects effect (time) and for the between/within interaction (group*time).
o Multi-sample sphericity: sphericity within each level of the between-subjects factor (group). This is necessary for simple-effect omnibus tests of the effect of time in the treatment group and of the effect of time in the control group. (Are these tests meaningful?)
SORT CASES BY group .
SPLIT FILE LAYERED BY group .
GLM baseline year1 year2 year3
  /WSFACTOR = time 4 Polynomial
  /PRINT = DESCRIPTIVE HOMOGENEITY.
SPLIT FILE OFF.
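Mauchly's Test of Sphericity (Measure: MEASURE_1), epsilon estimates by group:

| group | Within-Subjects Effect | Greenhouse-Geisser | Huynh-Feldt | Lower-bound |
|---|---|---|---|---|
| Control | time | .879 | 1.000 | .333 |
| Treatment | time | .779 | .932 | .333 |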
Within each treatment level, the data are not spherical, but the violation is fixable. We can conduct epsilon-adjusted simple effect omnibus tests for the within-subject effect (time) at each level of the between-subjects factor (group).
o Conclusions from tests of assumptions:
  Tests of between-subjects effects
  - We may perform an omnibus test (and/or standard contrasts) on the marginal between-subjects (group) means.
  - We may perform standard simple-effects tests (in this case, contrasts) for the effect of the between-subjects factor (group) at each level of the within-subjects factor (time).
  Tests of within-subjects effects
  - We may perform standard omnibus tests on the marginal within-subjects (time) effect and on the between/within (group by time) interaction.
  - Within each group, the data are not spherical, but the violation is fixable. We can conduct epsilon-adjusted simple-effect omnibus tests for the within-subjects effect (time) at each level of the between-subjects factor (group).
Question #1: Does the treatment group have less bone loss than the control group?
o We can test the effect of group at each year. The hypotheses about the rate of bone loss are more important; those will be our planned tests. Thus, we will consider these four pairwise comparisons to be post hoc tests.
[Figure: Bone Calcium at Baseline, Year 1, Year 2, and Year 3 for the Control and Treatment groups]
o The easiest way to run these tests is as four separate independent-samples t-tests.
T-TEST GROUPS = group(1 2)
  /VARIABLES = baseline year1 year2 year3.
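Group Statistics:

| Variable | group | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| baseline | Control | 15 | 72.3800 | 9.59786 | 2.47816 |
| baseline | Treatment | 16 | 69.2313 | 9.89186 | 2.47297 |
| year1 | Control | 15 | 73.2933 | 9.43803 | 2.43689 |
| year1 | Treatment | 16 | 70.6563 | 10.02975 | 2.50744 |
| year2 | Control | 15 | 72.4733 | 8.47884 | 2.18923 |
| year2 | Treatment | 16 | 71.1813 | 9.29245 | 2.32311 |
| year3 | Control | 15 | 64.7867 | 8.68586 | 2.24268 |
| year3 | Treatment | 16 | 64.5313 | 9.02306 | 2.25577 |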
Independent Samples Test (t-test for Equality of Means):

| Variable | df | 95% CI of the Difference: Lower | Upper |
|---|---|---|---|
| baseline | 29 | -4.01876 | 10.31626 |
| year1 | 29 | -4.52862 | 9.80278 |
| year2 | 29 | -5.25645 | 7.84062 |
| year3 | 29 | -6.25851 | 6.76934 |
$$d_{baseline} = \frac{2(0.898)}{\sqrt{29}} = .33 \qquad d_{Year1} = \frac{2(0.753)}{\sqrt{29}} = .28 \qquad d_{Year2} = \frac{2(0.404)}{\sqrt{29}} = .15 \qquad d_{Year3} = \frac{2(0.080)}{\sqrt{29}} = .03$$
We should apply a Tukey HSD post hoc correction to these tests. Because none of these tests is significant, it is not necessary to do the calculations; we can report that the tests are non-significant with the Tukey HSD procedure. However, for completeness, here is the correction:
$$t_{crit} = \frac{q_{(1-\alpha,\,8,\,29)}}{\sqrt{2}} = \frac{4.613}{\sqrt{2}} = 3.26$$
Applying a Tukey HSD correction to these pairwise tests, we find:
o No evidence that the treatment and the control group differed in their bone calcium at baseline, t(29) = 0.90, ns, d = .33.
o No evidence that the treatment and the control group differed in their bone calcium at the one-year follow-up, t(29) = 0.75, ns, d = .28.
o No evidence that the treatment and the control group differed in their bone calcium at the two-year follow-up, t(29) = 0.40, ns, d = .15.
o No evidence that the treatment and the control group differed in their bone calcium at the three-year follow-up, t(29) = 0.08, ns, d = .03.
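The two calculations above (the Tukey cutoff on the t scale and d = 2t/√df) can be bundled into a short sketch. It takes q(.95; 8, 29) = 4.613 from the text rather than computing it, and the function name `cohens_d_from_t` is hypothetical.

```python
import math

DF = 29
Q_CRIT = 4.613                      # q(.95; 8, 29), as given in the text
T_CRIT = Q_CRIT / math.sqrt(2)      # Tukey HSD cutoff on the |t| scale


def cohens_d_from_t(t, df):
    """Effect size for an independent-samples t-test: d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)


t_values = {"baseline": 0.898, "year1": 0.753, "year2": 0.404, "year3": 0.080}

for time, t in t_values.items():
    verdict = "significant" if abs(t) > T_CRIT else "ns"
    print(f"{time}: t({DF}) = {t:.2f}, {verdict}, d = {cohens_d_from_t(t, DF):.2f}")
```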
Question #2: Is the rate of bone loss different between the treatment group and the control group?
o We can test for:
  - (downward) polynomial trends in the control condition,
  - (downward) polynomial trends in the treatment condition, and
  - differences in the polynomial trends between the groups.
[Figure: Bone Calcium at Baseline, Year 1, Year 2, and Year 3; separate panels for the Control and Treatment groups]
o Although there are many tests here (nine), they are the key tests of the hypotheses, and we have a strong theory supporting these hypotheses. Thus, were these my own data, I would not apply a p-value correction to them.
o If you were to apply a correction: these are complex contrasts, so you could use a Scheffé correction:
$$F_{crit} = 3 \cdot F_{(\alpha = .05,\,3,\,29)} = 3(2.934) = 8.802$$
Alternatively, you will be conducting 9 planned contrasts, so a Bonferroni correction could also be appropriate:
$$p_{crit} = \frac{.05}{9} = .0056$$
You can select whichever of these two methods is less conservative. In this case, the Bonferroni correction is less conservative by a hair, so were we to apply a correction, we should use the Bonferroni correction.
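To compare the two corrections, it helps to put them on the same scale: the Scheffé cutoff for a 1-df contrast as an F (or |t|) value, and the Bonferroni correction as a per-test α. The sketch below only does that conversion, using the critical value F(.05; 3, 29) = 2.934 quoted in the text; deciding which is more lenient still requires the t(29) quantile at α = .0056, which you would look up in a table or a statistics library (not computed here).

```python
import math

# Critical value quoted in the text above (not computed here).
F_CRIT_3_29 = 2.934        # F(.05; 3, 29)
N_CONTRASTS = 9

scheffe_f = 3 * F_CRIT_3_29          # Scheffé cutoff for an F(1, 29) contrast test
scheffe_t = math.sqrt(scheffe_f)     # the same cutoff expressed as |t| on 29 df
bonferroni_alpha = 0.05 / N_CONTRASTS

print(f"Scheffe: F > {scheffe_f:.3f}  (|t| > {scheffe_t:.3f})")
print(f"Bonferroni: p < {bonferroni_alpha:.4f}")
```

If the two-tailed t(29) critical value at α = .0056 exceeds the Scheffé |t| cutoff, the Scheffé procedure is the more lenient of the two, and vice versa.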
Question #2 A and B: Are there polynomial trends in the control condition? Are there polynomial trends in the treatment condition?
o These are contrast tests within one level of the between-subjects variable.
o Method #1: Select the level of the between-subjects variable of interest and conduct polynomial trend tests on that level.
SORT CASES BY group .
SPLIT FILE LAYERED BY group .
GLM baseline year1 year2 year3
  /WSFACTOR = time 4 Polynomial.
SPLIT FILE OFF.
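Tests of Within-Subjects Contrasts (Measure: MEASURE_1), run separately by group:

| group | Source | Trend | Type III SS | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|---|
| Control | time | Linear | 417.720 | 1 | 417.720 | 40.180 | .000 |
| Control | time | Quadratic | 277.350 | 1 | 277.350 | 32.683 | .000 |
| Control | time | Cubic | 19.763 | 1 | 19.763 | 3.125 | .099 |
| Control | Error(time) | Linear | 145.547 | 14 | 10.396 | | |
| Control | Error(time) | Quadratic | 118.805 | 14 | 8.486 | | |
| Control | Error(time) | Cubic | 88.540 | 14 | 6.324 | | |
| Treatment | time | Linear | 147.425 | 1 | 147.425 | | |
| Treatment | time | Quadratic | 260.823 | 1 | 260.823 | | |
| Treatment | time | Cubic | 31.500 | 1 | 31.500 | | |
| Treatment | Error(time) | Linear | 57.790 | 15 | 3.853 | | |
| Treatment | Error(time) | Quadratic | 16.208 | 15 | 1.081 | | |
| Treatment | Error(time) | Cubic | 78.106 | 15 | 5.207 | | |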
$$\eta^2_{Linear:Control} = \frac{417.720}{417.720 + 145.547} = .74 \qquad \eta^2_{Quadratic:Control} = \frac{277.350}{277.350 + 118.805} = .70 \qquad \eta^2_{Cubic:Control} = \frac{19.763}{19.763 + 88.540} = .18$$

F_LinearControl(1, 14) = 40.18, p < .01, η² = .74
F_QuadraticControl(1, 14) = 32.68, p < .01, η² = .70
F_CubicControl(1, 14) = 3.13, p = .10, η² = .18

$$\eta^2_{Linear:Treatment} = \frac{147.425}{147.425 + 57.790} = .72 \qquad \eta^2_{Quadratic:Treatment} = \frac{260.823}{260.823 + 16.208} = .94 \qquad \eta^2_{Cubic:Treatment} = \frac{31.500}{31.500 + 78.106} = .29$$

F_LinearTreatment(1, 15) = 38.27, p < .01, η² = .72
F_QuadraticTreatment(1, 15) = 241.39, p < .01, η² = .94
F_CubicTreatment(1, 15) = 6.05, p = .03, η² = .29

[Figure: Bone Calcium at Baseline, Year 1, Year 2, and Year 3, plotted separately for the Control and Treatment groups]
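Each η² above is SS_contrast / (SS_contrast + SS_error), and each F is SS_contrast divided by MS_error = SS_error / df_error (a trend contrast has 1 df, so its MS equals its SS). The sketch recomputes both from the sums of squares in the table; small differences in the last digit come from the rounding of the printed SS values. The function name is hypothetical.

```python
def contrast_f_and_eta2(ss_effect, ss_error, df_error):
    """F ratio and partial eta-squared for a 1-df contrast with its own error term."""
    ms_error = ss_error / df_error
    f_ratio = ss_effect / ms_error            # 1 df, so MS_effect equals SS_effect
    eta2 = ss_effect / (ss_effect + ss_error)
    return f_ratio, eta2


# (SS_contrast, SS_error, df_error), copied from the table above.
contrasts = {
    ("Control", "Linear"): (417.720, 145.547, 14),
    ("Control", "Quadratic"): (277.350, 118.805, 14),
    ("Control", "Cubic"): (19.763, 88.540, 14),
    ("Treatment", "Linear"): (147.425, 57.790, 15),
    ("Treatment", "Quadratic"): (260.823, 16.208, 15),
    ("Treatment", "Cubic"): (31.500, 78.106, 15),
}

results = {key: contrast_f_and_eta2(*vals) for key, vals in contrasts.items()}
for (group, trend), (f_ratio, eta2) in results.items():
    df_error = contrasts[(group, trend)][2]
    print(f"{group} {trend}: F(1, {df_error}) = {f_ratio:.2f}, eta^2 = {eta2:.2f}")
```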
o Advantages of method #1
  - It is easy to run.
o Disadvantages of method #1
  - Each test has fewer than (N - a) degrees of freedom. If the variances are homogeneous across groups, then we are (voluntarily) sacrificing accuracy and power.
o Method #2: Compute the contrast of interest. Trick SPSS into testing it within each group separately, using an error term with information from all between-subjects groups. (This method is only appropriate if the variances are equal across groups.)
compute linear = -3*baseline + -1*year1 + 1*year2 + 3*year3.
compute quad = 1*baseline + -1*year1 + -1*year2 + 1*year3.
compute cubic = -1*baseline + 3*year1 + -3*year2 + 1*year3.
ONEWAY linear quad cubic BY group
  /CONTRAST = 1 0
  /CONTRAST = 0 1
  /STATISTICS DESCRIPTIVES HOMOGENEITY.
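Because each COMPUTE line is a linear combination of the four measurements, the mean of a contrast score within a group equals the same contrast applied to that group's cell means. Applying the coefficients to the control means should therefore reproduce the Value of Contrast entries for contrast 1 in the output (-23.6000, -8.6000, -5.1333). A sketch; the helper names are hypothetical.

```python
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)
CUBIC = (-1, 3, -3, 1)


def contrast_value(coefs, values):
    """Weighted sum of the four repeated measures (or of four cell means)."""
    return sum(c * v for c, v in zip(coefs, values))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Each set of coefficients sums to zero, and the three sets are mutually orthogonal.
for coefs in (LINEAR, QUADRATIC, CUBIC):
    assert sum(coefs) == 0
assert dot(LINEAR, QUADRATIC) == dot(LINEAR, CUBIC) == dot(QUADRATIC, CUBIC) == 0

control_means = (72.3800, 73.2933, 72.4733, 64.7867)   # baseline, year1, year2, year3
for name, coefs in (("linear", LINEAR), ("quad", QUADRATIC), ("cubic", CUBIC)):
    print(name, round(contrast_value(coefs, control_means), 2))
```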
Contrast #1:
$$H_0: 1\cdot\psi_{Linear:Control} + 0\cdot\psi_{Linear:Treatment} = 0 \;\Rightarrow\; H_0: \psi_{Linear:Control} = 0 \;\Rightarrow\; H_0: -3\mu_{Control:Baseline} - 1\mu_{Control:Year1} + 1\mu_{Control:Year2} + 3\mu_{Control:Year3} = 0$$
Contrast #2:
$$H_0: 0\cdot\psi_{Linear:Control} + 1\cdot\psi_{Linear:Treatment} = 0 \;\Rightarrow\; H_0: \psi_{Linear:Treatment} = 0 \;\Rightarrow\; H_0: -3\mu_{Treatment:Baseline} - 1\mu_{Treatment:Year1} + 1\mu_{Treatment:Year2} + 3\mu_{Treatment:Year3} = 0$$
By using the contrast subcommand, we obtain an error term that uses information from both groups (and has N-a dfs).
Note that the "Assume equal variances" tests have N - a df, and that the "Does not assume equal variances" tests are identical to Method #1, where we ran the contrast only on the (between-subjects) group of interest.
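Contrast Tests:

| Variable | Variance assumption | Contrast | Value of Contrast | Std. Error | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|
| linear | Assume equal | 1 | -23.6000 | 3.05758 | -7.719 | 29 | .000 |
| linear | Assume equal | 2 | -13.5750 | 2.96049 | -4.585 | 29 | .000 |
| linear | Does not assume equal | 1 | -23.6000 | 3.72312 | -6.339 | 14.000 | .000 |
| linear | Does not assume equal | 2 | -13.5750 | 2.19449 | -6.186 | 15.000 | .000 |
| quad | Assume equal | 1 | -8.6000 | 1.11422 | -7.718 | 29 | .000 |
| quad | Assume equal | 2 | -8.0750 | 1.07884 | -7.485 | 29 | .000 |
| quad | Does not assume equal | 1 | -8.6000 | 1.50431 | -5.717 | 14.000 | .000 |
| quad | Does not assume equal | 2 | -8.0750 | .51974 | -15.537 | 15.000 | .000 |
| cubic | Assume equal | 1 | -5.1333 | 2.76800 | -1.855 | 29 | .074 |
| cubic | Assume equal | 2 | -6.2750 | 2.68011 | -2.341 | 29 | .026 |
| cubic | Does not assume equal | 1 | -5.1333 | 2.90385 | -1.768 | 14.000 | .099 |
| cubic | Does not assume equal | 2 | -6.2750 | 2.55123 | -2.460 | 15.000 | .027 |

The "Does not assume equal variances" tests match the Method #1 output (n_j - 1 degrees of freedom).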
Assuming equal variances (pooled error term, N - a = 29 df):
t_LinearControl(29) = 7.72, p < .01, r = .82
t_QuadraticControl(29) = 7.72, p < .01, r = .82
t_CubicControl(29) = 1.86, p = .07, r = .33
t_LinearTreatment(29) = 4.59, p < .01, r = .65
t_QuadraticTreatment(29) = 7.49, p < .01, r = .81
t_CubicTreatment(29) = 2.34, p = .03, r = .40

Not assuming equal variances (n_j - 1 df; matches Method #1):
t_LinearControl(14) = 6.34, p < .01, r = .86
t_QuadraticControl(14) = 5.72, p < .01, r = .84
t_CubicControl(14) = 1.77, p = .10, r = .43
t_LinearTreatment(15) = 6.19, p < .01, r = .85
t_QuadraticTreatment(15) = 15.54, p < .01, r = .97
t_CubicTreatment(15) = 2.46, p = .03, r = .54
Which method is right? It depends!
  - If the variances are equal across groups, then we should pool the error term to include information from both groups. This procedure results in more accurate error estimates and tests with greater power.
  - If the variances are unequal across groups, then we should not pool the error term; we should use only information from the group of interest to calculate the error term.
o Advantages of method #2
  - It gives us both equal-variance and unequal-variance output.
o Disadvantages of method #2
  - It is more time consuming to run than method #1.
Question #2 C: Are there differences in the polynomial trends between the groups?
[Figure: Bone Calcium at Baseline, Year 1, Year 2, and Year 3 for the Control and Treatment groups]
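Linear: Treatment

| Group | Baseline | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| Control | 0 | 0 | 0 | 0 |
| Treatment | -3 | -1 | 1 | 3 |

Linear: Control

| Group | Baseline | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| Control | -3 | -1 | 1 | 3 |
| Treatment | 0 | 0 | 0 | 0 |

Linear: Treatment - Linear: Control

| Group | Baseline | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| Control | 3 | 1 | -1 | -3 |
| Treatment | -3 | -1 | 1 | 3 |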
$$\psi_{Linear:Treatment} = -3\mu_{Treatment:Baseline} - 1\mu_{Treatment:Year1} + 1\mu_{Treatment:Year2} + 3\mu_{Treatment:Year3}$$
$$\psi_{Linear:Control} = -3\mu_{Control:Baseline} - 1\mu_{Control:Year1} + 1\mu_{Control:Year2} + 3\mu_{Control:Year3}$$
$$\begin{aligned}
\psi_{Linear:Treatment-Control} &= \psi_{Linear:Treatment} - \psi_{Linear:Control} \\
&= \left(-3\mu_{Treatment:Baseline} - 1\mu_{Treatment:Year1} + 1\mu_{Treatment:Year2} + 3\mu_{Treatment:Year3}\right) - 1\left(-3\mu_{Control:Baseline} - 1\mu_{Control:Year1} + 1\mu_{Control:Year2} + 3\mu_{Control:Year3}\right) \\
&= -3\mu_{Treatment:Baseline} - 1\mu_{Treatment:Year1} + 1\mu_{Treatment:Year2} + 3\mu_{Treatment:Year3} + 3\mu_{Control:Baseline} + 1\mu_{Control:Year1} - 1\mu_{Control:Year2} - 3\mu_{Control:Year3}
\end{aligned}$$
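The difference-in-linear-trends contrast is one 8-cell contrast: the treatment cells keep the linear coefficients and the control cells get them negated. Applying it to the cell means from the Group Statistics table should give about 10.025, the difference between the two linear contrast values (-13.575 - (-23.600)). A sketch with hypothetical names.

```python
LINEAR = (-3, -1, 1, 3)

# Cell means (baseline, year1, year2, year3) from the Group Statistics table.
control_means = (72.3800, 73.2933, 72.4733, 64.7867)
treatment_means = (69.2313, 70.6563, 71.1813, 64.5313)


def contrast_value(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))


# Linear:Treatment - Linear:Control written as one contrast over all 8 cells.
interaction_coefs = {
    "Control": tuple(-c for c in LINEAR),   # (3, 1, -1, -3)
    "Treatment": LINEAR,                    # (-3, -1, 1, 3)
}

diff = (contrast_value(interaction_coefs["Control"], control_means)
        + contrast_value(interaction_coefs["Treatment"], treatment_means))
print(f"difference in linear trends: {diff:.3f}")
```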
o We can repeat this procedure for differences in the quadratic and cubic trends
o Method #1: Examine the interaction between the polynomial trends on time (the repeated measures factor) and condition.
GLM baseline year1 year2 year3 BY group
  /WSFACTOR = time 4 Polynomial.
Time * Group (Linear)

| Group | Baseline | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| Control | 3 | 1 | -1 | -3 |
| Treatment | -3 | -1 | 1 | 3 |
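Tests of Within-Subjects Contrasts (Measure: MEASURE_1):

| Source | Trend | Type III SS | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| time | Linear | 534.960 | 1 | 534.960 | 76.296 | .000 |
| time | Quadratic | 538.172 | 1 | 538.172 | 115.597 | .000 |
| time | Cubic | 50.381 | 1 | 50.381 | 8.767 | .006 |
| time * group | Linear | 38.903 | 1 | 38.903 | 5.548 | .025 |
| time * group | Quadratic | .533 | 1 | .533 | .115 | .737 |
| time * group | Cubic | .505 | 1 | .505 | .088 | .769 |
| Error(time) | Linear | 203.337 | 29 | 7.012 | | |
| Error(time) | Quadratic | 135.013 | 29 | 4.656 | | |
| Error(time) | Cubic | 166.645 | 29 | 5.746 | | |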
$$\eta^2_{Difference\ in\ Linear} = \frac{38.903}{38.903 + 203.337} = .16 \qquad \eta^2_{Difference\ in\ Quadratic} = \frac{0.533}{0.533 + 135.013} < .01 \qquad \eta^2_{Difference\ in\ Cubic} = \frac{0.505}{0.505 + 166.645} < .01$$
Difference in linear trends: F(1, 29) = 5.55, p = .03, η² = .16
Difference in quadratic trends: F(1, 29) = 0.12, p = .74, η² < .01
Difference in cubic trends: F(1, 29) = 0.09, p = .77, η² < .01
o Advantages of method #1
  - It is easy to run.
o Disadvantages of method #1
  - It only works (provides a 1-df contrast test of the difference between polynomial trends) when a = 2.
o Method #2: Compute the contrast of interest and (manually) ask for a comparison between the treatment group and the control group.
ONEWAY linear quad cubic BY group /CONTRAST= -1 1.
| Group | Baseline | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| Control | 3 | 1 | -1 | -3 |
| Treatment | -3 | -1 | 1 | 3 |
Thus, the contrast command tests for a difference in linear, quadratic, and cubic trends between the control and treatment groups (exactly the same as Method #1).
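Contrast Tests:

| Variable | Contrast | Value of Contrast | Std. Error | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|
| linear | 1 | 10.0250 | 4.25597 | 2.356 | 29 | .025 |
| quad | 1 | .5250 | 1.55093 | .339 | 29 | .737 |
| cubic | 1 | -1.1417 | 3.85290 | -.296 | 29 | .769 |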
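$$r = \sqrt{\frac{t^2}{t^2 + df}}$$
$$r_{LinearDiff} = \sqrt{\frac{2.356^2}{2.356^2 + 29}} = .40 \qquad r_{QuadDiff} = \sqrt{\frac{0.339^2}{0.339^2 + 29}} = .06 \qquad r_{CubicDiff} = \sqrt{\frac{0.296^2}{0.296^2 + 29}} = .05$$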
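For a 1-df contrast, F = t² and partial η² = r², so Method #1 (F from the GLM table) and Method #2 (t from ONEWAY) carry the same information. A quick numeric check with the difference in linear trends, using values copied from the two tables; the variable names are hypothetical.

```python
import math

t_linear_diff, df = 2.356, 29                  # ONEWAY contrast (Method #2)
f_equiv = t_linear_diff ** 2                   # should match F(1, 29) = 5.548 (Method #1)
r = math.sqrt(f_equiv / (f_equiv + df))        # effect size from the t-test
eta2_from_ss = 38.903 / (38.903 + 203.337)     # partial eta-squared from the GLM table

print(f"t^2 = {f_equiv:.3f}, r = {r:.2f}, r^2 = {r**2:.3f}, eta^2 = {eta2_from_ss:.3f}")
```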
Difference in linear trends: t(29) = 2.36, p = .03, r = .40
Difference in quadratic trends: t(29) = 0.34, p = .74, r = .06
Difference in cubic trends: t(29) = 0.30, p = .77, r = .05
o Advantages of method #2
  - It can be used to test for differences in trends when there are more than two groups in the between-subjects factor (a > 2).
  - It also provides output to test the contrasts when the variances are not homogeneous across groups.
o Disadvantages of method #2
  - It is more time consuming to run than method #1.
Conclusions
o Question #2: Is the rate of calcium loss different between the treatment group and the control group?
  Yes. There are significant linear, quadratic, and (significant or marginally significant) cubic trends in bone calcium loss for both the treatment and control groups. These trends indicate that over time, participants in both groups are losing calcium in their bones. However, the linear rate of bone calcium loss is stronger in the control group than in the treatment group. Thus, there is some evidence that the treatment is associated with less bone loss.
o Question #1: Does the treatment group have less calcium loss than the control group?
  No. There were no differences in bone calcium levels between the groups at baseline or at any of the follow-up assessments.
This example is an illustration of growth curve analysis. In growth curve analysis, the rate/pattern of change over time is modeled and usually compared between two or more groups.
Appendix: Two Additional Between/Within Examples
9. Example #1: Effects of brain damage on memory (3 * 3)
A neuropsychologist is exploring short-term memory deficits in brain-damaged individuals. Patients were classified as having left-hemisphere damage, right-hemisphere damage, or no damage (control). Participants viewed stimuli consisting of strings of all digits, all letters, and mixed letters and digits. The longest string that each participant could remember in each condition is listed below:
Damage Left Brain Right Brain Control Digits
6 8 7 9 8 9 8 10 9 8 6 7 7 7 9 9 8 10
Stimuli Letters
5 7 7 8 8 7 8 9 10 5 4 6 8 6 8 7 8 10
Mixed
6 5 4 6 7 8 7 9 8 8 7 5 8 7 7 9 8 9
The researcher would like to know:
o Does recall vary by type of stimuli?
o Does this difference vary by type of brain damage?
o Does recall vary by type of brain damage?
o Does this difference vary by type of stimuli?
[Figures: Recall by stimulus type (digit, letter, mixed) for the Left Brain, Right Brain, and Control groups; Recall by damage group for each stimulus type]
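Tests of Normality:

| Variable | Damage | Statistic | Sig. |
|---|---|---|---|
| DIGIT | Left Brain | .853 | .167 |
| DIGIT | Right Brain | .775 | .035 |
| DIGIT | Control | .853 | .167 |
| LETTER | Left Brain | .907 | .415 |
| LETTER | Right Brain | .822 | .091 |
| LETTER | Control | .907 | .415 |
| MIXED | Left Brain | .958 | .804 |
| MIXED | Right Brain | .960 | .820 |
| MIXED | Control | .822 | .091 |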
[Figure: boxplots of between (each participant's average recall) by damage: Left Brain, Right Brain, Control]
Descriptives (between):

| Damage | Skewness (Std. Error) | Kurtosis (Std. Error) |
|---|---|---|
| Left Brain | .811 (.845) | -1.029 (1.741) |
| Right Brain | -1.153 (.845) | 2.500 (1.741) |
| Control | .000 (.845) | -1.875 (1.741) |
The data look relatively symmetrical.
o Homogeneity of variances / sphericity
  - Homogeneity of variances for between-group tests:
GLM digit letter mixed BY damage
  /WSFACTOR = recall 3
  /PRINT = DESC HOMO.
Levene's Test of Equality of Error Variances:

| Variable | F | df1 | df2 | Sig. |
|---|---|---|---|---|
| DIGIT | .250 | 2 | 15 | .782 |
| LETTER | 1.000 | 2 | 15 | .391 |
| MIXED | 1.250 | 2 | 15 | .315 |
COMPUTE between = (digit + letter + mixed)/3.
EXAMINE VARIABLES = between BY damage
  /PLOT BOXPLOT SPREADLEVEL.
Test of Homogeneity of Variance (between):

| Levene Statistic | df1 | df2 | Sig. |
|---|---|---|---|
| 1.573 | 2 | 15 | .240 |
We do not have any evidence that the variances are different across the between-subjects groups. This assumption is satisfied.
o Homogeneity of the variance/covariance matrices across groups. Box's test of equality of covariance matrices tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
We do not have any evidence that the variance/covariance matrices are different across the three groups. This assumption is satisfied.
o Overall sphericity (averaging over the between-subjects factor):
GLM digit letter mixed BY damage
  /WSFACTOR = recall 3
  /PRINT = HOMOGENEITY.
Mauchly's Test of Sphericity (Measure: MEASURE_1), epsilon estimates:

| Within-Subjects Effect | Greenhouse-Geisser | Huynh-Feldt | Lower-bound |
|---|---|---|---|
| RECALL | .689 | .837 | .500 |
The data are not spherical, and the violation is severe. We cannot conduct omnibus tests for the within-subjects effect (recall) or for the between/within interaction (damage*recall).
o Multi-sample sphericity: The data are only spherical for patients with right-brain damage. For the other two groups, the data are not spherical, and the violation is severe and unfixable. If we want to use the same methods to test effects at each level, then we cannot conduct simple-effect omnibus tests for the within-subjects effect (recall) within each level of the between-subjects factor (damage).
o Conclusions from tests of assumptions:
  - We may perform an omnibus test and/or standard contrasts on the marginal between-subjects (damage) means.
  - We may not perform any omnibus tests involving within-subjects effects. Tests on the marginal within-subjects (stimuli) means or on the between/within interaction (damage by stimuli) must use a contrast-specific error term.
Hypothesis testing: o The researcher is basically asking for all possible tests of interest to be conducted. We will consider all tests to be exploratory (post-hoc).
| Damage | Digits | Letters | Mixed | Marginal |
|---|---|---|---|---|
| Left Brain | 7.00 | 5.67 | 5.83 | 6.17 |
| Right Brain | 8.17 | 7.33 | 7.50 | 7.67 |
| Control | 9.00 | 8.67 | 8.33 | 8.67 |
| Marginal | 8.06 | 7.22 | 7.22 | |
o Does recall vary by type of stimuli?
  - Main effect for stimuli (within-subjects effect).
  - We cannot conduct a standard omnibus test.
  - We will conduct pairwise tests on the marginal (within-subjects) stimuli means.
o Does this difference vary by type of brain damage?
  - Interaction between damage and stimuli (between-by-within effect).
  - We cannot conduct a standard interaction omnibus test.
  - We will conduct pairwise tests on the effect of stimuli within each level of brain damage.
o Does recall vary by type of brain damage?
  - Main effect for brain damage (between-subjects effect).
  - We can conduct a standard omnibus test.
  - We will follow this test with pairwise tests on the marginal (between-subjects) brain damage means.
o Does this difference vary by type of stimuli?
  - Interaction between damage and stimuli (between-by-within effect).
  - We cannot conduct a standard omnibus interaction test.
  - We can examine the simple effect of brain damage within each level of stimuli and follow each test with pairwise comparisons to identify differences.
o Does recall vary by type of stimuli? We will conduct pairwise tests on marginal stimuli means.
GLM digit letter mixed BY damage
  /WSFACTOR = recall 3 simple (1).
GLM digit letter mixed BY damage
  /WSFACTOR = recall 3 simple (2).
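Tests of Within-Subjects Contrasts, simple (1) (Level 1 = digit, Level 2 = letter, Level 3 = mixed):

| Source | Contrast | Type III SS | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| recall | Level 2 vs. Level 1 | 12.500 | 1 | 12.500 | 10.714 | .005 |
| recall | Level 3 vs. Level 1 | 12.500 | 1 | 12.500 | 7.353 | .016 |
| Error(recall) | Level 2 vs. Level 1 | 17.500 | 15 | 1.167 | | |
| Error(recall) | Level 3 vs. Level 1 | 25.500 | 15 | 1.700 | | |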
Tests of Within-Subjects Contrasts, simple (2):

| Source | Contrast | Type III SS | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| recall | Level 3 vs. Level 2 | .000 | 1 | .000 | .000 | 1.000 |
| Error(recall) | Level 3 vs. Level 2 | 53.000 | 15 | 3.533 | | |
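$$\eta^2_{Digits\ vs.\ Letters} = \frac{12.5}{12.5 + 17.5} = .42 \qquad \eta^2_{Digits\ vs.\ Mixed} = \frac{12.5}{12.5 + 25.5} = .33 \qquad \eta^2_{Letters\ vs.\ Mixed} = \frac{0}{0 + 53} = .00$$
$$F_{crit} = \frac{\left(q_{(.05,\,3,\,15)}\right)^2}{2} = \frac{(3.67)^2}{2} \approx 6.747$$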
Digits vs. Letters: F(1, 15) = 10.71, p < .05, η² = .42
Digits vs. Mixed: F(1, 15) = 7.35, p < .05, η² = .33
Letters vs. Mixed: F(1, 15) = 0.00, ns, η² = .00
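The Tukey HSD cutoff for these 1-df within-subjects contrasts is q²/2 on the F scale. With q rounded to 3.67, as printed, the cutoff comes out at about 6.73; the 6.747 above corresponds to an unrounded q of about 3.673. The decisions are the same either way. A sketch using values copied from the tables above.

```python
q = 3.67                    # q(.05; 3, 15), as printed in the text (rounded)
f_crit = q ** 2 / 2         # Tukey HSD cutoff expressed as an F(1, 15)

observed = {
    "Digits vs. Letters": 10.714,
    "Digits vs. Mixed": 7.353,
    "Letters vs. Mixed": 0.000,
}

for pair, f in observed.items():
    print(f"{pair}: F(1, 15) = {f:.2f}, {'p < .05' if f > f_crit else 'ns'}")
print(f"F_crit = {f_crit:.3f}")
```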
o Does this difference in recall of types of stimuli vary by type of brain damage? We want to repeat the three contrasts we just ran, but within each level of brain damage (rather than averaging over the types of damage).
Digits vs. Letters
| Damage | Digits | Letters | Mixed | Marginal | Digits - Letters |
|---|---|---|---|---|---|
| Left Brain | 7.00 | 5.67 | 5.83 | 6.17 | 1.33 |
| Right Brain | 8.17 | 7.33 | 7.50 | 7.67 | 0.83 |
| Control | 9.00 | 8.67 | 8.33 | 8.67 | 0.33 |
| Marginal | 8.06 | 7.22 | 7.22 | | |
To examine these differences at each level of brain damage, we can compute the difference of interest and use the ONEWAY command and the CONTRAST subcommand:
Compute dig_let = digit - letter.
ONEWAY dig_let BY damage
  /STAT = DESC
  /CONT = 1 0 0
  /CONT = 0 1 0
  /CONT = 0 0 1.
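ANOVA (dig_let):

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 3.000 | 2 | 1.500 | 1.286 | .305 |
| Within Groups | 17.500 | 15 | 1.167 | | |
| Total | 20.500 | 17 | | | |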
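Contrast Tests (dig_let):

| Contrast | Value of Contrast | Std. Error | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|
| 1 (Left Brain) | 1.3333 | .44096 | 3.024 | 15 | .009 |
| 2 (Right Brain) | .8333 | .44096 | 1.890 | 15 | .078 |
| 3 (Control) | .3333 | .44096 | .756 | 15 | .461 |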
$$t_{crit} = \frac{q_{(.05/3,\,3,\,15)}}{\sqrt{2}} = \frac{4.473}{\sqrt{2}} = 3.163 \qquad t_{crit(.10)} = \frac{q_{(.10/3,\,3,\,15)}}{\sqrt{2}} = \frac{3.973}{\sqrt{2}} = 2.809$$
$$d_{LeftBrain} = \frac{1.333}{1.29} = 1.03 \qquad d_{RightBrain} = \frac{0.833}{1.29} = 0.65 \qquad d_{Control} = \frac{0.333}{1.29} = 0.26$$
Digits vs. Letters
  Left Brain: t(15) = 3.02, p < .10, d = 1.03
  Right Brain: t(15) = 1.89, ns, d = 0.65
  No Damage: t(15) = 0.76, ns, d = 0.26
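The cutoffs above convert a studentized-range value into a |t| cutoff, t_crit = q/√2, with the q values for α = .05/3 and α = .10/3 taken from the text rather than computed. The sketch classifies the digits-vs-letters tests the same way; the helper names are hypothetical.

```python
import math


def tukey_t_crit(q):
    """Convert a studentized-range critical value to a |t| cutoff."""
    return q / math.sqrt(2)


T_CRIT_05 = tukey_t_crit(4.473)   # q(.05/3; 3, 15), from the text
T_CRIT_10 = tukey_t_crit(3.973)   # q(.10/3; 3, 15), from the text


def label(t):
    if abs(t) > T_CRIT_05:
        return "p < .05"
    if abs(t) > T_CRIT_10:
        return "p < .10"
    return "ns"


digit_minus_letter = {"Left Brain": 3.024, "Right Brain": 1.890, "No Damage": 0.756}
for group, t in digit_minus_letter.items():
    print(f"{group}: t(15) = {t:.2f}, {label(t)}")
```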
Compute dig_mix = digit - mixed.
ONEWAY dig_mix BY damage
  /STAT = DESC
  /CONT = 1 0 0
  /CONT = 0 1 0
  /CONT = 0 0 1.
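ANOVA (dig_mix):

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 1.000 | 2 | .500 | .294 | .749 |
| Within Groups | 25.500 | 15 | 1.700 | | |
| Total | 26.500 | 17 | | | |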
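Contrast Tests (dig_mix):

| Contrast | Value of Contrast | Std. Error | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|
| 1 (Left Brain) | 1.1667 | .53229 | 2.192 | 15 | .045 |
| 2 (Right Brain) | .6667 | .53229 | 1.252 | 15 | .230 |
| 3 (Control) | .6667 | .53229 | 1.252 | 15 | .230 |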
$$t_{crit} = \frac{q_{(.05/3,\,3,\,15)}}{\sqrt{2}} = \frac{4.473}{\sqrt{2}} = 3.163 \qquad t_{crit(.10)} = \frac{q_{(.10/3,\,3,\,15)}}{\sqrt{2}} = \frac{3.973}{\sqrt{2}} = 2.809$$
$$d_{LeftBrain} = \frac{1.167}{1.303} = 0.90 \qquad d_{RightBrain} = \frac{0.667}{1.303} = 0.51 \qquad d_{Control} = \frac{0.667}{1.303} = 0.51$$
Digits vs. Mixed
  Left Brain: t(15) = 2.19, ns, d = 0.90
  Right Brain: t(15) = 1.25, ns, d = 0.51
  No Damage: t(15) = 1.25, ns, d = 0.51
Compute let_mix = letter - mixed.
ONEWAY let_mix BY damage
  /STAT = DESC
  /CONT = 1 0 0
  /CONT = 0 1 0
  /CONT = 0 0 1.
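ANOVA (let_mix):

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 1.000 | 2 | .500 | .142 | .869 |
| Within Groups | 53.000 | 15 | 3.533 | | |
| Total | 54.000 | 17 | | | |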
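Contrast Tests (let_mix):

| Contrast | Value of Contrast | Std. Error | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|
| 1 (Left Brain) | -.1667 | .76739 | -.217 | 15 | .831 |
| 2 (Right Brain) | -.1667 | .76739 | -.217 | 15 | .831 |
| 3 (Control) | .3333 | .76739 | .434 | 15 | .670 |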
$$t_{crit} = \frac{q_{(.05/3,\,3,\,15)}}{\sqrt{2}} = \frac{4.473}{\sqrt{2}} = 3.163 \qquad t_{crit(.10)} = \frac{q_{(.10/3,\,3,\,15)}}{\sqrt{2}} = \frac{3.973}{\sqrt{2}} = 2.809$$
$$d_{LeftBrain} = \frac{0.1667}{1.880} = 0.09 \qquad d_{RightBrain} = \frac{0.1667}{1.880} = 0.09 \qquad d_{Control} = \frac{0.333}{1.880} = 0.18$$
Letters vs. Mixed
  Left Brain: t(15) = 0.22, ns, d = .09
  Right Brain: t(15) = 0.22, ns, d = .09
  No Damage: t(15) = 0.43, ns, d = .18
Overall, the effects are relatively consistent within each level of brain damage, although there is some (marginal) evidence that the advantage of digits over letters is stronger in left-brain damaged individuals than in control or right-brain damaged participants.
| Damage | Digits | Letters | Mixed |
|---|---|---|---|
| Left Brain | 7.00 | 5.67 | 5.83 |
| Right Brain | 8.17 | 7.33 | 7.50 |
| Control | 9.00 | 8.67 | 8.33 |
| Marginal | 8.06 (bc) | 7.22 (b) | 7.22 (c) |
Note: Within each row, means with a common subscript are significantly different from each other.
[Figure: Recall by damage group (Left Brain, Right Brain, Control) for digit, letter, and mixed stimuli]
o Does recall vary by type of brain damage? Main effect for brain damage (between-subjects effect). We will follow up this test with pairwise tests on the marginal brain damage means.
GLM digit letter mixed BY damage
  /WSFACTOR = recall 3
  /POSTHOC = damage (TUKEY).
Tests of Between-Subjects Effects (Measure: MEASURE_1; Transformed Variable: Average):

| Source | Type III SS | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Intercept | 3037.500 | 1 | 3037.500 | 2462.838 | .000 |
| DAMAGE | 57.000 | 2 | 28.500 | 23.108 | .000 |
| Error | 18.500 | 15 | 1.233 | | |
$$\eta^2 = \frac{57}{57 + 18.5} = .75$$
There is a significant main effect for brain damage, F(2, 15) = 23.11, p < .01, η² = .75. Overall, recall varies by type of brain damage.
Multiple Comparisons (Dependent Variable: BETWEEN; Tukey HSD):

| (I) DAMAGE | (J) DAMAGE | Mean Difference (I-J) | 95% CI Lower Bound | Upper Bound |
|---|---|---|---|---|
| Left Brain | Right Brain | -1.5000* | -2.4615 | -.5385 |
| Left Brain | Control | -2.5000* | -3.4615 | -1.5385 |
| Right Brain | Left Brain | 1.5000* | .5385 | 2.4615 |
| Right Brain | Control | -1.0000* | -1.9615 | -.0385 |
| Control | Left Brain | 2.5000* | 1.5385 | 3.4615 |
| Control | Right Brain | 1.0000* | .0385 | 1.9615 |
Left-brain vs. right-brain: t(15) = 4.05, p < .01, d = 1.35
Left-brain vs. control: t(15) = 6.75, p < .01, d = 2.25
Right-brain vs. control: t(15) = 2.70, p = .04, d = 0.90
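The numbers above can be reproduced from the between-subjects table. The d values divide each mean difference by √MS_error = √1.233, as in the text. For the t values, the error has to be put on the scale of each participant's average over the three stimulus types, MS_error/3 (the damage SS computed directly on the averages is 19.0, one third of the 57.0 in the GLM table); with n = 6 per group this reproduces t = 4.05, 6.75, and 2.70. A sketch under those assumptions, with values copied from the tables.

```python
import math

ss_damage, df_damage = 57.0, 2
ss_error, df_error = 18.5, 15
n_per_group, n_within_levels = 6, 3

ms_error = ss_error / df_error                   # 1.233
f_ratio = (ss_damage / df_damage) / ms_error     # omnibus F for damage
eta2 = ss_damage / (ss_damage + ss_error)

# Error on the scale of each participant's average over the three stimulus types.
ms_error_avg = ms_error / n_within_levels
se_diff = math.sqrt(ms_error_avg * (2 / n_per_group))

diffs = {"Left vs. Right": 1.5, "Left vs. Control": 2.5, "Right vs. Control": 1.0}

print(f"F({df_damage}, {df_error}) = {f_ratio:.2f}, eta^2 = {eta2:.2f}")
for pair, diff in diffs.items():
    t = diff / se_diff
    d = diff / math.sqrt(ms_error)               # d as computed in the text
    print(f"{pair}: t({df_error}) = {t:.2f}, d = {d:.2f}")
```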
These exact same tests can be conducted by manually averaging over the within-subjects factor and conducting an ANOVA on this average variable.
COMPUTE between = (digit + letter + mixed)/3.
ONEWAY between BY damage
  /POSTHOC = TUKEY.
o Does this difference vary by type of stimuli? We can examine the simple effect of brain damage within each level of stimuli and follow each test with pairwise comparisons to identify differences.
To conduct simple effects within each level of stimuli, we can select the appropriate level and run an omnibus test comparing the levels of damage. (Note that we can select a level of stimuli because stimuli is a within-subjects factor. For within-subjects factors, we compute error estimates based only on the information involved in the comparison.)
Simple effect of damage for digits only:
ONEWAY digit BY damage
  /STAT = DESC
  /CONTRAST = -1 1 0
  /CONTRAST = -1 0 1
  /CONTRAST = 0 -1 1.
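ANOVA (DIGIT):

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 12.111 | 2 | 6.056 | 7.078 | .007 |
| Within Groups | 12.833 | 15 | .856 | | |
| Total | 24.944 | 17 | | | |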
$p_{crit} = \frac{.05}{3} = .0167$
$\eta^2 = \frac{SS_{Between}}{SS_{Total}} = \frac{12.111}{24.944} = .49$
There is a significant simple effect for brain damage on recall of digits only, F(2,15) = 7.08, p < .05, η² = .49. Overall, recall of digits varies by type of brain damage.
Contrast Tests: DIGIT
Contrast   Value of Contrast   Std. Error   t       df   Sig. (2-tailed)
1          1.1667              .53403       2.185   15   .045
2          2.0000              .53403       3.745   15   .002
3           .8333              .53403       1.560   15   .139
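$t_{crit} = \frac{q_{(.05/3;\,3,\,15)}}{\sqrt{2}} = \frac{4.473}{\sqrt{2}} = 3.163$
$d_{LeftVRight} = \frac{1.1667}{.925} = 1.26, \quad d_{LeftVControl} = \frac{2}{.925} = 2.16, \quad d_{RightVControl} = \frac{.8333}{.925} = 0.90 \qquad (\sqrt{MS_{Within}} = \sqrt{.856} = .925)$
Recall of digits:
Left-brain vs. right-brain: t(15) = 2.18, ns, d = 1.26
Left-brain vs. control: t(15) = 3.75, p < .05, d = 2.16
Right-brain vs. control: t(15) = 1.56, ns, d = 0.90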
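The critical value and effect sizes for these pairwise contrasts follow from the reported contrast values and standard errors. The Python sketch below assumes 6 participants per group (18 in total, consistent with df = 15) and takes the studentized range value 4.473 from the notes. For a pairwise contrast, SE = √(2·MS_Within/n), so the code recovers MS_Within from the standard error. It then computes t, d, and a verdict against q/√2.

```python
import math

N_PER_GROUP = 6  # assumed: 18 participants split equally over 3 damage groups
Q_CRIT = 4.473   # studentized range value used in these notes (3 comparisons, df = 15)


def tukey_t_crit(q: float) -> float:
    """Critical |t| for a pairwise comparison: q / sqrt(2)."""
    return q / math.sqrt(2)


def contrast_stats(value: float, se: float, n: int = N_PER_GROUP) -> tuple:
    """Return (t, d) for a pairwise contrast on two group means with equal n.

    Since SE = sqrt(MS_within * 2 / n), MS_within = SE^2 * n / 2.
    """
    ms_within = se ** 2 * n / 2
    return value / se, value / math.sqrt(ms_within)


t_crit = tukey_t_crit(Q_CRIT)
# (label, value of contrast, std. error) from the DIGIT Contrast Tests table
digit_contrasts = [
    ("Left vs. Right", 1.1667, 0.53403),
    ("Left vs. Control", 2.0000, 0.53403),
    ("Right vs. Control", 0.8333, 0.53403),
]
for label, value, se in digit_contrasts:
    t, d = contrast_stats(value, se)
    verdict = "p < .05" if abs(t) > t_crit else "ns"
    print(f"{label}: t(15) = {t:.2f}, {verdict}, d = {d:.2f}")
```

The same function applies unchanged to the letter and mixed contrast tables by swapping in their values and standard errors.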
Simple effect of damage for letters only:
$p_{crit} = \frac{.05}{3} = .0167$
There is a significant simple effect for brain damage on recall of letters only, F(2,15) = 11.30, p < .05, η² = .60. Overall, recall of letters varies by type of brain damage.
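The η² values for the letter and mixed stimuli can be checked from F alone. In a one-way ANOVA, SS_Between/SS_Total = (df_Between·F)/(df_Between·F + df_Within). Here is a small Python check. The function name is illustrative.

```python
def eta_squared_from_f(f: float, df_effect: int, df_error: int) -> float:
    """One-way ANOVA: SS_effect/SS_total = df_effect*F / (df_effect*F + df_error)."""
    return df_effect * f / (df_effect * f + df_error)


# Simple-effect F values for damage within each stimulus type, df = (2, 15)
for stimulus, f in [("digits", 7.078), ("letters", 11.30), ("mixed", 7.41)]:
    print(f"{stimulus}: eta^2 = {eta_squared_from_f(f, 2, 15):.2f}")
```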
Contrast Tests: LETTER
Contrast   Value of Contrast   Std. Error   t       df   Sig. (2-tailed)
1          1.6667              .63246       2.635   15   .019
2          3.0000              .63246       4.743   15   .000
3          1.3333              .63246       2.108   15   .052
$t_{crit} = \frac{q_{(.05/3;\,3,\,15)}}{\sqrt{2}} = \frac{4.473}{\sqrt{2}} = 3.163$
$d_{LeftVRight} = \frac{1.6667}{1.095} = 1.52, \quad d_{LeftVControl} = \frac{3}{1.095} = 2.74, \quad d_{RightVControl} = \frac{1.3333}{1.095} = 1.22$
Recall of letters:
Left-brain vs. right-brain: t(15) = 2.64, ns, d = 1.52
Left-brain vs. control: t(15) = 4.74, p < .05, d = 2.74
Right-brain vs. control: t(15) = 2.11, ns, d = 1.22
[Figure: Recall by stimulus type (digit, letter, mixed)]
Simple effect of damage for mixed stimuli only:
$p_{crit} = \frac{.05}{3} = .0167$
There is a significant simple effect for brain damage on recall of mixed stimuli (letters and digits), F(2,15) = 7.41, p < .05, η² = .50. Overall, recall of mixed stimuli varies by type of brain damage.
Contrast Tests: MIXED
Contrast   Value of Contrast   Std. Error   t       df   Sig. (2-tailed)
1          1.6667              .66109       2.521   15   .024
2          2.5000              .66109       3.782   15   .002
3           .8333              .66109       1.261   15   .227
$t_{crit} = \frac{q_{(.05/3;\,3,\,15)}}{\sqrt{2}} = \frac{4.473}{\sqrt{2}} = 3.163$
$d_{LeftVRight} = \frac{1.6667}{1.145} = 1.46, \quad d_{LeftVControl} = \frac{2.5}{1.145} = 2.18, \quad d_{RightVControl} = \frac{0.8333}{1.145} = 0.73$
Recall of mixed stimuli (digits and letters):
Left-brain vs. right-brain: t(15) = 2.52, ns, d = 1.46
Left-brain vs. control: t(15) = 3.78, p < .05, d = 2.18
Right-brain vs. control: t(15) = 1.26, ns, d = 0.73
[Figure: Recall for digit, letter, and mixed stimuli by damage group (Left Brain, Right Brain, Control)]
Conclusions from the simple effects of brain damage on recall for each type of stimulus:
Damage        Digits   Letters   Mixed
Left Brain    7.00a    5.67b     5.83c
Right Brain   8.17     7.33      7.50
Control       9.00a    8.67b     8.33c
Mean          8.06     7.22      7.22
Note: Within each column, means with a common subscript are significantly different from each other.
o The simple effects and pairwise tests allow us to test the stimuli by damage interaction indirectly. However, we never actually tested any interaction contrasts (all of our contrasts on cell means were within a level of a factor). When the between-subjects factor has more than two levels, testing interaction contrasts is not straightforward.
10. Example #2: Relationship between time of year and cholesterol (2 * 4 design)
The Seasons data come from a longitudinal study recently conducted by the UMass Medical School (Merriam et al., 1999). Subjects were volunteers recruited from the membership of a large HMO in central Massachusetts. For some of the variables, subjects provided data during each season of the year. The number at the end of each variable name indicates the season: 1 = winter, 2 = spring, 3 = summer, and 4 = fall. Participants' total cholesterol (TC) level was measured in each of the four seasons. The researcher would like to know whether total cholesterol levels varied by season, and whether this variation differed for men and women.
Descriptive Statistics
Variable   SEX      Mean       Std. Deviation   N
TC1        Male     224.0591   40.79346         220
           Female   216.4171   42.84937         211
           Total    220.3179   41.93859         431
TC2        Male     218.8182   40.11304         220
           Female   213.2204   40.43307         211
           Total    216.0777   40.32061         431
TC3        Male     222.1636   41.60071         220
           Female   214.0924   41.07910         211
           Total    218.2123   41.49518         431
TC4        Male     222.5182   39.90822         220
           Female   215.0948   42.98048         211
           Total    218.8840   41.55878         431
EXAMINE VARIABLES = tc1 tc2 tc3 tc4 BY sex
  /PLOT BOXPLOT NPPLOT SPREADLEVEL
  /COMPARE VARIABLES.
[Tests of Normality output, with boxplots of TC1, TC2, TC3, and TC4 by SEX (N = 220 men, 211 women) and of TC_MEAN by SEX; several outlying cases are flagged]
The data look relatively symmetrical, but there are a number of outliers. A sensitivity analysis would be in order.
Box's Test of Equality of Covariance Matrices: tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
We do not have any evidence that the variance/covariance matrices are different across the two groups. This assumption is satisfied.
Homogeneity of variances for between-group tests:
GLM tc1 tc2 tc3 tc4 BY sex
  /WSFACTOR = time 4
  /PRINT = DESC HOMO.
Levene's Test of Equality of Error Variances (df1 = 1 for each of TC1 to TC4): tests the null hypothesis that the error variance of the dependent variable is equal across groups.
COMPUTE mean_tc = (tc1 + tc2 + tc3 + tc4)/4.
EXAMINE VARIABLES = mean_tc BY sex
  /PLOT BOXPLOT SPREADLEVEL.
Test of Homogeneity of Variance (TC_MEAN): df1 = 1, df2 = 429, Sig. = .406
We do not have any evidence that the variances are different across the two groups. This assumption is satisfied.
The data are spherical. We can conduct omnibus tests for the within-subjects effect (time) and for the between/within interaction (sex*time).
TEMPORARY.
SELECT IF sex = 1.
GLM tc1 tc2 tc3 tc4
  /WSFACTOR = time 4.
Mauchly's Test of Sphericity (Measure: MEASURE_1)
Within Subjects Effect   Epsilon: Greenhouse-Geisser   Huynh-Feldt   Lower-bound
TIME                     .952                          .966          .333
Within each level of sex, the data are spherical. We can conduct simple-effect omnibus tests for the within-subjects effect (time) within each level of the between-subjects factor (sex).
o Conclusions from tests of assumptions:
We may perform an omnibus test and/or standard contrasts on the marginal between-subjects (sex) means.
We may perform standard omnibus tests on the marginal within-subjects (time) effect and on the between/within (sex by time) interaction.
We may perform standard simple-effect omnibus tests for the effect of the within-subjects factor (time) within each level of the between-subjects factor (sex).
Contrasts on the marginal within-subjects (time) means or on the between/within (sex by time) cell means may use the omnibus error term. However, I recommend always using a contrast-specific error term, so all of the following tests use contrast-specific error terms.
There are a number of outliers, so a sensitivity analysis should be conducted.
General ANOVA omnibus tests:
GLM tc1 tc2 tc3 tc4 BY sex
  /WSFACTOR = time 4
  /PRINT = DESC HOMO.
Tests of Within-Subjects Effects (Measure: MEASURE_1)
Source        Correction           Type III SS   df         Mean Square   F       Sig.
TIME          Sphericity Assumed   3982.386      3          1327.462      5.382   .001
              Greenhouse-Geisser   3982.386      2.939      1355.223      5.382   .001
              Huynh-Feldt          3982.386      2.968      1341.813      5.382   .001
              Lower-bound          3982.386      1.000      3982.386      5.382   .021
TIME * SEX    Sphericity Assumed   384.530       3          128.177       .520    .669
              Greenhouse-Geisser   384.530       2.939      130.857       .520    .665
              Huynh-Feldt          384.530       2.968      129.562       .520    .667
              Lower-bound          384.530       1.000      384.530       .520    .471
Error(TIME)   Sphericity Assumed   317410.663    1287       246.628
              Greenhouse-Geisser   317410.663    1260.637   251.786
              Huynh-Feldt          317410.663    1273.235   249.295
              Lower-bound          317410.663    429.000    739.885
o Omnibus tests using the within-subjects error term, $MS_{Time \times Sub/Sex}$:
Main effect of time: F(3,1287) = 5.38, p = .001
Time by gender interaction: F(3,1287) = 0.52, p = .67
Tests of Between-Subjects Effects (Measure: MEASURE_1; Transformed Variable: Average)
Source      Type III SS   df    Mean Square   F           Sig.
Intercept   82119676.7    1     82119676.65   13558.029   .000
SEX         22231.744     1     22231.744     3.670       .056
Error       2598411.686   429   6056.904
o Omnibus test using the between-subjects error term, $MS_{Sub/Sex}$:
Main effect of gender: F(1,429) = 3.67, p = .056
Simple effects of season within each gender:
o There are two simple effects tests (one for men and one for women). We need to use an adjusted critical p-value to maintain $\alpha_{FW} = .05$:
$p_{crit} = \frac{.05}{2} = .025$
o We want our test of season to be based on an error term containing information from both men and women (because overall sphericity is satisfied, we should use the omnibus within-subjects error term). If we select men and women separately, the error terms will only contain information from the male and female participants, respectively. However, we can select each group separately to obtain the sum of squares for the simple effects tests. We can then manually compute tests for the effect of time for men and women separately using the omnibus error term:
$F_{(b-1,\,(N-a)(b-1))} = \frac{MS_{Time\,(Men\ Only)}}{MS_{Time \times Sub/Sex}} \qquad F_{(b-1,\,(N-a)(b-1))} = \frac{MS_{Time\,(Women\ Only)}}{MS_{Time \times Sub/Sex}}$
Tests of Within-Subjects Effects, men only (Measure: MEASURE_1)
Source        Correction           Type III SS   df        Mean Square   F       Sig.
TIME          Sphericity Assumed   3214.312      3         1071.437      4.356   .005
              Greenhouse-Geisser   3214.312      2.938     1093.868      4.356   .005
              Huynh-Feldt          3214.312      2.983     1077.619      4.356   .005
              Lower-bound          3214.312      1.000     3214.312      4.356   .038
Error(TIME)   Sphericity Assumed   161583.937    657       245.942
              Greenhouse-Geisser   161583.937    643.528   251.091
              Huynh-Feldt          161583.937    653.231   247.361
              Lower-bound          161583.937    219.000   737.826
We can use the SS and MS for the effect of time within men, but we should not use the F-test. Because overall sphericity is satisfied, we should use the omnibus within-subjects error term for this simple effect test:
$F_{(b-1,\,(N-a)(b-1))} = \frac{MS_{Time\,(Men\ Only)}}{MS_{Time \times Sub/Sex}}$
$F_{(3,\,1287)} = \frac{1071.437}{246.628} = 4.344, \quad p = .0048 < p_{crit} = .025$
There is a significant simple effect of time on total cholesterol levels for men, F (3,1287) = 4.34, p < .05 .
Tests of Within-Subjects Effects, women only (Measure: MEASURE_1)
Source        Correction           Type III SS   df        Mean Square   F       Sig.
TIME          Sphericity Assumed   1194.775      3         398.258       1.610   .186
              Greenhouse-Geisser   1194.775      2.855     418.543       1.610   .188
              Huynh-Feldt          1194.775      2.898     412.266       1.610   .187
              Lower-bound          1194.775      1.000     1194.775      1.610   .206
Error(TIME)   Sphericity Assumed   155826.725    630       247.344
              Greenhouse-Geisser   155826.725    599.467   259.942
              Huynh-Feldt          155826.725    608.594   256.044
              Lower-bound          155826.725    210.000   742.032
$F_{(b-1,\,(N-a)(b-1))} = \frac{MS_{Time\,(Women\ Only)}}{MS_{Time \times Sub/Sex}}$
$F_{(3,\,1287)} = \frac{398.258}{246.628} = 1.615$
There is no significant simple effect of time on total cholesterol levels for women, F(3,1287) = 1.61, ns.
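The manual simple-effect F tests divide each group's MS for time by the omnibus error MS. Here is a Python sketch that uses the values above. Computing p-values would also need the F distribution, for example scipy.stats.f.sf(F, df1, df2), which is not used here.

```python
def simple_effect_df(n_total: int, a: int, b: int) -> tuple:
    """(b - 1, (N - a)(b - 1)): df for a within-subjects simple effect
    tested against the omnibus within-subjects error term."""
    return b - 1, (n_total - a) * (b - 1)


MS_ERROR_OMNIBUS = 246.628  # Error(TIME), sphericity assumed, omnibus analysis

# MS for TIME from the separate men-only and women-only analyses
ms_time = {"men": 1071.437, "women": 398.258}

df1, df2 = simple_effect_df(n_total=431, a=2, b=4)
for group, ms in ms_time.items():
    f = ms / MS_ERROR_OMNIBUS
    print(f"Simple effect of time for {group}: F({df1},{df2}) = {f:.2f}")
```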
Simple effects of gender within each time:
o There are four simple effects tests (one for each season). We need to use an adjusted critical p-value to maintain $\alpha_{FW} = .05$:
$p_{crit} = \frac{.05}{4} = .0125$
Tests of Between-Subjects Effects (Dependent Variable: TC1)
Source            Type III SS   df    Mean Square   F           Sig.
Corrected Model   6289.922      1     6289.922      3.598       .059
Intercept         20896457.5    1     20896457.46   11952.558   .000
SEX               6289.922      1     6289.922      3.598       .059
Error             750013.530    429   1748.283
Total             21677027.0    431
Corrected Total   756303.452    430
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source        TIME        Type III SS   df    Mean Square   F        Sig.
TIME          Linear      102.937       1     102.937       .362     .548
              Quadratic   2583.051      1     2583.051      10.575   .001
              Cubic       1296.398      1     1296.398      6.140    .014
TIME * SEX    Linear      17.789        1     17.789        .063     .803
              Quadratic   52.504        1     52.504        .215     .643
              Cubic       314.237       1     314.237       1.488    .223
Error(TIME)   Linear      122043.560    429   284.484
              Quadratic   104784.659    429   244.253
              Cubic       90582.444     429   211.148
Linear trend in total cholesterol over seasons: F(1,429) = 0.36, ns
Quadratic trend in total cholesterol over seasons: F(1,429) = 10.58, p < .05
Cubic trend in total cholesterol over seasons: F(1,429) = 6.14, ns
o Next, we test if these polynomial trends differ by gender: These tests (linear*sex, quadratic*sex, and cubic*sex) were printed in the previous analysis. These are complex post-hoc tests and require a Scheffe correction:
$F_{crit} = df_{season \times sex} \cdot F_{(.05;\,df_{season \times sex},\,df_{error})} = 3 \cdot F_{(.05;\,3,\,429)} = 3 \times 2.626 = 7.88$
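The Scheffé criterion is simple arithmetic once the tabled F is known. Here is a Python sketch. The value F(.05; 3, 429) ≈ 2.626 is an approximation supplied here, for example from an F table or scipy.stats.f.ppf(0.95, 3, 429). It is not a number printed in the SPSS output.

```python
import math


def scheffe_f_crit(df_effect: int, f_tabled: float) -> float:
    """Scheffe critical F for a 1-df contrast: df_effect * F(.05; df_effect, df_error)."""
    return df_effect * f_tabled


def scheffe_t_crit(df_effect: int, f_tabled: float) -> float:
    """Equivalent critical |t| for the same contrast (t^2 = F)."""
    return math.sqrt(scheffe_f_crit(df_effect, f_tabled))


# Assumption: F(.05; 3, 429) is approximately 2.626
F_TABLED = 2.626
print(f"Scheffe F_crit = {scheffe_f_crit(3, F_TABLED):.2f}")  # Scheffe F_crit = 7.88
print(f"Scheffe t_crit = {scheffe_t_crit(3, F_TABLED):.2f}")  # Scheffe t_crit = 2.81
```

The TIME * SEX trend contrasts (F = 0.06, 0.22, and 1.49) fall well below this criterion.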
Linear * Sex coefficients:
         Winter   Spring   Summer   Fall
Male     -3       -1        1        3
Female    3        1       -1       -3
There is no difference in linear trends in total cholesterol over seasons between men and women: F (1,429) = 0.06, ns
Quadratic * Sex coefficients:
         Winter   Spring   Summer   Fall
Male      1       -1       -1        1
Female   -1        1        1       -1
There is no difference in quadratic trends in total cholesterol over seasons between men and women: F (1,429) = 0.22, ns
Cubic * Sex coefficients:
         Winter   Spring   Summer   Fall
Male     -1        3       -3        1
Female    1       -3        3       -1
There is no difference in cubic trends in total cholesterol over seasons between men and women: F (1,429) = 1.49, ns
Next, we conduct repeated contrasts on the marginal time means (comparing each level to the previous level):
GLM tc1 tc2 tc3 tc4 BY sex
  /WSFACTOR = time 4 Repeated
  /PRINT = DESC HOMO.
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source        TIME                  Type III SS   df    Mean Square   F        Sig.
TIME          Level 1 vs. Level 2   7667.696      1     7667.696      17.697   .000
              Level 2 vs. Level 3   1915.740      1     1915.740      4.288    .039
              Level 3 vs. Level 4   198.305       1     198.305       .402     .527
TIME * SEX    Level 1 vs. Level 2   450.076       1     450.076       1.039    .309
              Level 2 vs. Level 3   658.904       1     658.904       1.475    .225
              Level 3 vs. Level 4   45.200        1     45.200        .092     .762
Error(TIME)   Level 1 vs. Level 2   185873.819    429   433.272
              Level 2 vs. Level 3   191652.790    429   446.743
              Level 3 vs. Level 4   211850.594    429   493.824
o These are post-hoc pairwise comparisons and require a Tukey HSD correction.
$F_{crit} = \frac{\left(q_{(.05;\,4,\,429)}\right)^2}{2} = \frac{(3.633)^2}{2} = 6.60$
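The Tukey criterion converts the studentized range statistic into an F for a single pairwise contrast. Here is a Python sketch that uses the q value above and the TIME F values from the repeated-contrast table.

```python
def tukey_f_crit(q: float) -> float:
    """Tukey HSD critical F for a pairwise contrast: q^2 / 2 (t_crit = q / sqrt(2))."""
    return q ** 2 / 2


F_CRIT = tukey_f_crit(3.633)  # q(.05; 4, 429) as used in these notes

# F values for TIME from the repeated-contrast table
repeated = {"Winter vs. Spring": 17.697, "Spring vs. Summer": 4.288, "Summer vs. Fall": 0.402}
for label, f in repeated.items():
    verdict = "p < .05" if f > F_CRIT else "ns"
    print(f"{label}: F(1,429) = {f:.2f}, {verdict}")
```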
Winter vs. Spring: F(1,429) = 17.70, p < .05
Spring vs. Summer: F(1,429) = 4.29, ns
Summer vs. Fall: F(1,429) = 0.40, ns
We also want to test whether these repeated contrasts differ for men and women.
o Again, tests of these contrasts were provided as interaction contrasts when we asked for the repeated contrasts.
o These are complex, interaction post-hoc tests and require a Scheffe correction:
$F_{crit} = df_{season \times sex} \cdot F_{(.05;\,df_{season \times sex},\,df_{error})} = 3 \cdot F_{(.05;\,3,\,429)} = 3 \times 2.626 = 7.88$
         Winter   Spring   Summer   Fall
Male      1       -1        0        0
Female   -1        1        0        0
Difference in winter vs. spring total cholesterol levels between men and women: F (1,429) = 1.04, ns
         Winter   Spring   Summer   Fall
Male      0        1       -1        0
Female    0       -1        1        0
Difference in spring vs. summer total cholesterol levels between men and women: F (1,429) = 1.48, ns
         Winter   Spring   Summer   Fall
Male      0        0        1       -1
Female    0        0       -1        1
Difference in summer vs. fall total cholesterol levels between men and women: F (1,429) = 0.09, ns
o These differences in repeated contrasts between men and women can also be tested by computing repeated contrasts on the marginal season means and then testing whether these contrasts differ by gender.
Step 1: Compute a contrast comparing winter to spring.
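$\psi_{Winter-Spring} = \mu_{Winter} - \mu_{Spring}$ (season coefficients: Winter 1, Spring -1, Summer 0, Fall 0)
Then compare this contrast between men and women (sex coefficients: Male 1, Female -1):
$\psi_{interaction} = \psi_{Winter-Spring\,(men)} - \psi_{Winter-Spring\,(women)} = \left(\mu_{Winter\,(men)} - \mu_{Spring\,(men)}\right) - \left(\mu_{Winter\,(women)} - \mu_{Spring\,(women)}\right) = \mu_{Winter\,(men)} - \mu_{Spring\,(men)} - \mu_{Winter\,(women)} + \mu_{Spring\,(women)}$
Interaction contrast coefficients:
         Winter   Spring   Summer   Fall
Male      1       -1        0        0
Female   -1        1        0        0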
A test of whether the difference in winter vs. spring total cholesterol levels is equal for men and women is equivalent to a test of the interaction contrast, $H_0: \psi_{interaction} = 0$:
$H_0: \psi_{interaction} = 0 \;\Leftrightarrow\; H_0: \psi_{Winter-Spring\,(men)} - \psi_{Winter-Spring\,(women)} = 0 \;\Leftrightarrow\; H_0: \psi_{Winter-Spring\,(men)} = \psi_{Winter-Spring\,(women)}$
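The interaction contrast can be computed directly from the cell means. Here is a Python sketch that uses the means from the Descriptive Statistics table. The function names are illustrative.

```python
SEASONS = ("Winter", "Spring", "Summer", "Fall")
# Cell means of total cholesterol from the Descriptive Statistics table
cell_means = {
    "Male": {"Winter": 224.0591, "Spring": 218.8182, "Summer": 222.1636, "Fall": 222.5182},
    "Female": {"Winter": 216.4171, "Spring": 213.2204, "Summer": 214.0924, "Fall": 215.0948},
}


def season_contrast(sex: str, coefs) -> float:
    """psi for one sex: sum of season coefficients times that sex's cell means."""
    return sum(c * cell_means[sex][s] for c, s in zip(coefs, SEASONS))


def interaction_contrast(coefs) -> float:
    """Season contrast crossed with the sex contrast (Male 1, Female -1)."""
    return season_contrast("Male", coefs) - season_contrast("Female", coefs)


winter_vs_spring = (1, -1, 0, 0)
print(f"{interaction_contrast(winter_vs_spring):.4f}")  # 2.0442
```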
o In SPSS:
COMPUTE t1vst2 = tc1 - tc2.
COMPUTE t2vst3 = tc2 - tc3.
COMPUTE t3vst4 = tc3 - tc4.
T-TEST GROUPS = sex(0 1)
  /VARIABLES = t1vst2 t2vst3 t3vst4.
Independent Samples Test
Variable   Variances                     Levene's F   Levene's Sig.   t
T1VST2     Equal variances assumed       .491         .484            1.019
           Equal variances not assumed                                1.021
T2VST3     Equal variances assumed       .856         .355            -1.214
           Equal variances not assumed                                -1.217
T3VST4     Equal variances assumed       .656         .418            .303
           Equal variances not assumed                                .302
$t_{crit} = \sqrt{df_{season \times sex} \cdot F_{(.05;\,3,\,429)}} = \sqrt{3 \times 2.626} = 2.81$
Difference in winter vs. spring total cholesterol levels between men and women: t(429) = 1.02, ns
Difference in spring vs. summer total cholesterol levels between men and women: t(429) = 1.21, ns
Difference in summer vs. fall total cholesterol levels between men and women: t(429) = 0.30, ns
o These results exactly match the results we obtain by asking for repeated contrasts (and repeated*gender interaction contrasts) in the repeated-measures ANOVA (t² = F). Both analyses test the same hypotheses and include gender as a between-subjects factor in the design.
o An advantage of this method is that it can be used when the variances of the male and female contrast scores are not equal (use the "Equal variances not assumed" row).
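With only two levels of sex, the independent-samples t on contrast scores and the 1-df TIME * SEX contrast F should satisfy t² = F. Here is a quick Python check using the printed values. Agreement is expected only up to the rounding in the output.

```python
# (t from T-TEST, equal variances assumed; F for the TIME * SEX repeated contrast)
equivalents = {
    "winter vs. spring": (1.019, 1.039),
    "spring vs. summer": (-1.214, 1.475),
    "summer vs. fall": (0.303, 0.092),
}

for label, (t, f) in equivalents.items():
    # Both statistics test the same 1-df hypothesis, so t^2 should match F
    print(f"{label}: t^2 = {t ** 2:.3f}, F = {f:.3f}")
```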
If we wish to conduct simple contrasts (comparing cholesterol levels at each time point to the cholesterol levels in winter):
GLM tc1 tc2 tc3 tc4 BY sex
  /WSFACTOR = time 4 Simple(1)
  /PRINT = DESC HOMO.
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source        TIME                  Type III SS   df    Mean Square   F        Sig.
TIME          Level 2 vs. Level 1   7667.696      1     7667.696      17.697   .000
              Level 3 vs. Level 1   1918.108      1     1918.108      3.468    .063
              Level 4 vs. Level 1   882.930       1     882.930       1.621    .204
TIME * SEX    Level 2 vs. Level 1   450.076       1     450.076       1.039    .309
              Level 3 vs. Level 1   19.839        1     19.839        .036     .850
              Level 4 vs. Level 1   5.148         1     5.148         .009     .923
Error(TIME)   Level 2 vs. Level 1   185873.819    429   433.272
              Level 3 vs. Level 1   237265.107    429   553.066
              Level 4 vs. Level 1   233599.217    429   544.520
o These are post-hoc pair-wise comparisons and require a Tukey HSD correction.
$F_{crit} = \frac{\left(q_{(.05;\,4,\,429)}\right)^2}{2} = \frac{(3.633)^2}{2} = 6.60$
Winter vs. Spring: F(1,429) = 17.70, p < .05
Winter vs. Summer: F(1,429) = 3.47, ns
Winter vs. Fall: F(1,429) = 1.62, ns
We can also test whether these simple contrasts differ for men and women.
o These are complex, interaction post-hoc tests and require a Scheffe correction:
$F_{crit} = df_{season \times sex} \cdot F_{(.05;\,df_{season \times sex},\,df_{error})} = 3 \cdot F_{(.05;\,3,\,429)} = 3 \times 2.626 = 7.88$
o The tests of these contrasts were provided as interaction contrasts when we asked for the simple contrasts:
Winter vs. Spring:
         Winter   Spring   Summer   Fall
Male      1       -1        0        0
Female   -1        1        0        0
Difference in winter vs. spring total cholesterol levels between men and women: F (1,429) = 1.04, ns
         Winter   Spring   Summer   Fall
Male      1        0       -1        0
Female   -1        0        1        0
Difference in winter vs. summer total cholesterol levels between men and women: F (1,429) = 0.04, ns
         Winter   Spring   Summer   Fall
Male      1        0        0       -1
Female   -1        0        0        1
Difference in winter vs. fall total cholesterol levels between men and women: F (1,429) = 0.01, ns
o Again, differences in simple contrasts between men and women can also be tested by computing simple contrasts on the marginal season means and then testing whether these contrasts differ by gender.
o In SPSS:
COMPUTE t1vst2 = tc1 - tc2.
COMPUTE t1vst3 = tc1 - tc3.
COMPUTE t1vst4 = tc1 - tc4.
T-TEST GROUPS = sex(0 1)
  /VARIABLES = t1vst2 t1vst3 t1vst4.
Independent Samples Test
Variable   Variances                     Levene's F   Levene's Sig.   t
T1VST2     Equal variances assumed       .491         .484            1.019
           Equal variances not assumed                                1.021
T1VST3     Equal variances assumed       .153         .696            -.189
           Equal variances not assumed                                -.189
T1VST4     Equal variances assumed       .018         .895            .097
           Equal variances not assumed                                .097
$t_{crit} = \sqrt{df_{season \times sex} \cdot F_{(.05;\,3,\,429)}} = \sqrt{3 \times 2.626} = 2.81$
Difference in winter vs. spring total cholesterol levels between men and women: t(429) = 1.02, ns
Difference in winter vs. summer total cholesterol levels between men and women: t(429) = 0.19, ns
Difference in winter vs. fall total cholesterol levels between men and women: t(429) = 0.10, ns
Finally, when we look at the data, we may decide to examine some complex contrasts on the marginal season means:
Gender     Winter   Spring   Summer   Fall
Male       224.06   218.82   222.16   222.52
Female     216.42   213.22   214.09   215.09
Overall    220.32   216.08   218.21   218.88
i. Do cholesterol levels in winter differ from average cholesterol levels in summer and fall?
   Coefficients: Winter 2, Spring 0, Summer -1, Fall -1
ii. Do cholesterol levels in spring differ from average cholesterol levels in summer and fall?
   Coefficients: Winter 0, Spring 2, Summer -1, Fall -1
iii. Do average cholesterol levels in winter and fall differ from average cholesterol levels in spring and summer?
   Coefficients: Winter -1, Spring 1, Summer 1, Fall -1
o We cannot test these hypotheses on the marginal means in SPSS by computing a variable that reflects each contrast, because we need to keep gender in the analysis.
o Instead, we must enter these contrasts in the SPECIAL subcommand as contrast coefficients on the marginal season means.
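The three complex contrasts can be checked for validity (coefficients summing to zero) and given a descriptive value from the marginal means. The Python sketch below uses the Total row of the Descriptive Statistics table. That row is weighted by group size, so these descriptive values may differ slightly from the estimates underlying the SPSS tests.

```python
SEASONS = ("Winter", "Spring", "Summer", "Fall")
# Marginal (Total row) means from the Descriptive Statistics table
marginal = {"Winter": 220.3179, "Spring": 216.0777, "Summer": 218.2123, "Fall": 218.8840}

contrasts = {
    "i: Winter vs. (Summer, Fall)": (2, 0, -1, -1),
    "ii: Spring vs. (Summer, Fall)": (0, 2, -1, -1),
    "iii: (Winter, Fall) vs. (Spring, Summer)": (-1, 1, 1, -1),
}


def estimate(coefs) -> float:
    """Descriptive value of psi = sum(c_j * mean_j); rejects invalid contrasts."""
    if sum(coefs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * marginal[s] for c, s in zip(coefs, SEASONS))


for label, coefs in contrasts.items():
    print(f"{label}: psi = {estimate(coefs):.2f}")
```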
GLM tc1 tc2 tc3 tc4 BY sex
  /WSFACTOR = time 4 Special( 1  1  1  1
                              2  0 -1 -1
                              0 -2  1  1
                             -1  1  1 -1).
Tests of Within-Subjects Contrasts (Measure: MEASURE_1)
Source        TIME   Type III SS   df    Mean Square   F        Sig.
TIME          L1     5403.773      1     5403.773      3.176    .075
              L2     10326.706     1     10326.706     7.505    .006
              L3     10332.205     1     10332.205     10.575   .001
TIME * SEX    L1     4.775         1     4.775         .003     .958
              L2     1990.511      1     1990.511      1.447    .230
              L3     210.014       1     210.014       .215     .643
Error(TIME)   L1     729878.055    429   1701.347
              L2     590257.230    429   1375.891
              L3     419138.635    429   977.013
Winter vs. (Summer and Fall): F(1,429) = 3.18, ns
Spring vs. (Summer and Fall): F(1,429) = 7.50, ns
(Winter and Fall) vs. (Spring and Summer): F(1,429) = 10.58, p < .05
We should also test whether these comparisons differ for men and women.
o Again, these are complex, interaction post-hoc tests and require a Scheffe correction:
$F_{crit} = df_{season \times sex} \cdot F_{(.05;\,df_{season \times sex},\,df_{error})} = 3 \cdot F_{(.05;\,3,\,429)} = 3 \times 2.626 = 7.88$
o The tests of these contrasts were provided as interaction contrasts when we asked for the special contrasts:
L1 * Sex:
         Winter   Spring   Summer   Fall
Male      2        0       -1       -1
Female   -2        0        1        1
Difference in winter vs. (summer and fall) total cholesterol levels between men and women: F(1,429) < 0.01, ns
L2 * Sex:
         Winter   Spring   Summer   Fall
Male      0        2       -1       -1
Female    0       -2        1        1
Difference in spring vs. (summer and fall) total cholesterol levels between men and women: F (1,429) = 1.45, ns
L3 * Sex:
         Winter   Spring   Summer   Fall
Male     -1        1        1       -1
Female    1       -1       -1        1
Difference in (winter and fall) vs. (spring and summer) total cholesterol levels between men and women: F(1,429) = 0.22, ns
o These differences in complex interaction contrasts can also be tested by computing the complex contrasts on the marginal season means and then testing whether these contrasts differ by gender. For example, first compute a contrast comparing winter to (summer and fall):
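$\psi_{Win\,vs.\,SumFall} = 2\mu_{Win} - (\mu_{Sum} + \mu_{Fall})$ (season coefficients: Winter 2, Spring 0, Summer -1, Fall -1; sex coefficients: Male 1, Female -1)
Interaction contrast coefficients:
         Winter   Spring   Summer   Fall
Male      2        0       -1       -1
Female   -2        0        1        1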
A test of whether the difference in winter vs. (summer and fall) total cholesterol levels is equal for men and women is equivalent to testing whether the interaction contrast differs from zero, $H_0: \psi_{interaction} = 0$:
$H_0: \psi_{interaction} = 0 \;\Leftrightarrow\; H_0: \psi_{Win\,vs.\,SumFall\,(men)} - \psi_{Win\,vs.\,SumFall\,(women)} = 0 \;\Leftrightarrow\; H_0: \psi_{Win\,vs.\,SumFall\,(men)} = \psi_{Win\,vs.\,SumFall\,(women)}$
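The difference-score approach can also be sketched outside SPSS. The Python below computes one contrast score per participant. It then compares the two groups with a pooled-variance t and with a Welch t, which corresponds to the "Equal variances not assumed" row. The data rows are hypothetical and are included for illustration only. They are not the Seasons data.

```python
import math
from statistics import mean, variance


def contrast_scores(rows, coefs):
    """One contrast score per participant; coefs (1, 0, -0.5, -0.5) gives
    tc1 - (tc3 + tc4)/2, matching the SPSS COMPUTE statement for t1vst34."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in rows]


def pooled_t(x, y):
    """Independent-samples t assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch t (equal variances not assumed); returns (t, Welch-Satterthwaite df)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical (tc1, tc2, tc3, tc4) rows, for illustration only
men = [(230, 224, 228, 226), (210, 205, 212, 209), (245, 238, 240, 244)]
women = [(215, 213, 214, 216), (222, 218, 220, 219), (198, 199, 197, 200)]
coefs = (1, 0, -0.5, -0.5)  # winter vs. (summer and fall)
x, y = contrast_scores(men, coefs), contrast_scores(women, coefs)
print(pooled_t(x, y), welch_t(x, y))
```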
o In SPSS:
COMPUTE t1vst34 = tc1 - (tc3 + tc4)/2.
COMPUTE t2vst34 = tc2 - (tc3 + tc4)/2.
COMPUTE t14vst23 = (tc1 + tc4)/2 - (tc2 + tc3)/2.
T-TEST GROUPS = sex(0 1)
  /VARIABLES = t1vst34 t2vst34 t14vst23.
Independent Samples Test
Variable   Variances                     Levene's F   Levene's Sig.   t
T1VST34    Equal variances assumed       .007         .934            -.053
           Equal variances not assumed                                -.053
T2VST34    Equal variances assumed       .712         .399            -1.203
           Equal variances not assumed                                -1.205
T14VST23   Equal variances assumed       .743         .389            .464
           Equal variances not assumed                                .463
$t_{crit} = \sqrt{df_{season \times sex} \cdot F_{(.05;\,3,\,429)}} = \sqrt{3 \times 2.626} = 2.81$
Difference in winter vs. (summer and fall) total cholesterol levels between men and women: t(429) = 0.05, ns
Difference in spring vs. (summer and fall) total cholesterol levels between men and women: t(429) = 1.20, ns
Difference in (winter and fall) vs. (spring and summer) total cholesterol levels between men and women: t(429) = 0.46, ns
o These results exactly match the results we obtain by asking for special contrasts (and special*gender interaction contrasts) in the repeated-measures ANOVA. Both analyses test the same hypotheses and include gender as a between-subjects factor in the design.
Remember that our check of assumptions revealed a number of outliers. We should conduct a sensitivity analysis to see if the outliers affected any of our conclusions.