PRESENTS
Statistical Analysis
A Manual on Dissertation and Thesis
Statistics in SPSS
AMOS
How to run the Partial Correlation in SPSS
Canonical Correlation
What is Canonical Correlation analysis?
Correlations
The Factorial ANOVA in SPSS
The Double-Multivariate Profile Analysis in SPSS
The Multiple Linear Regression in SPSS
WELCOME MESSAGE
Statistics Solutions is dedicated to expediting the dissertation and thesis process for graduate students
by providing statistical help and guidance to ensure a successful graduation. Having worked on my own
mixed method (qualitative and quantitative) dissertation, and with over 18 years of experience in
research design, methodology, and statistical analyses, I present this SPSS user guide, on behalf of
Statistics Solutions, as a gift to you.
The purpose of this guide is to enable students with little to no knowledge of SPSS to open the program
and conduct and interpret the most common statistical analyses in the course of their dissertation or
thesis. Included is an introduction explaining when and why to use a specific test as well as where to
find the test in SPSS and how to run it. Lastly, this guide lets you know what to expect in the results and
informs you how to interpret the results correctly.
Statistics Solutions offers a family of solutions to assist you towards your degree. If you would like to
learn more or schedule your free 30-minute consultation to discuss your dissertation research, you can
visit us at www.StatisticsSolutions.com or call us at 877-437-8622.
CHAPTER 1: FIRST CONTACT WITH SPSS
SPSS originally stood for Statistical Package for the Social Sciences and was rebranded in version 18 as PASW (Predictive Analytics Software). Throughout this manual, we will use the familiar name, SPSS.
The screenshots you will see are taken from version 18. If you use an earlier version, some of the paths might be different because the makers of SPSS sometimes move the menu entries around. If you have worked with older versions before, the two most noticeable changes are found within the Chart Builder and the nonparametric tests.
SPSS has three basic windows: the data window, the syntax window, and the output window. The
particular view can be changed by going to the Window menu. What you typically see first is the data
window.
Data Window. The data editor window is where the data is either entered or imported. The data editor window has two views: the data view and the variable view. These two views can be swapped by clicking the buttons in the lower left corner of the data window. In the data view, your
data is presented in a spreadsheet style very similar to Excel. The data is organized in rows and
columns. Each row represents an observation and each column represents a variable.
In the variable view, the logic behind each variable is stored. Each variable has a name (in the name column), a type (numeric, percentage, date, string, etc.), a label (usually the full wording of the question), and the values assigned. For example, in the name column we may have a variable called gender. In the label column we may specify the full question wording, and in the values column we may assign the codes 1 for males and 2 for females. You can also manage the value that indicates a missing answer, the measurement level (scale, which covers metric, ratio, or interval data; ordinal; or nominal), and, new to SPSS 18, a pre-defined role.
The Syntax Window. In the syntax editor window you can program SPSS. Although you do not need to write syntax to run the vast majority of analyses, the syntax editor is quite useful for two purposes: 1) to save your analysis steps and 2) to run repetitive tasks. Firstly, you can document your analysis steps and
save them in a syntax file, so others may re-run your tests and you can re-run them as well. To do this
you simply hit the PASTE button you find in most dialog boxes. Secondly, if you have to repeat a lot of
steps in your analysis, for example, calculating variables or re-coding, it is most often easier to specify
these things in syntax, which saves you the time and hassle of scrolling and clicking through endless lists
of variables.
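As a minimal sketch of what such syntax looks like (the variable names and the age cut-off here are hypothetical and only for illustration):

* Compute a total score from two hypothetical test variables.
COMPUTE total_score = Test_Score + Test2_Score.
* Recode a hypothetical age variable into two groups.
RECODE age (LOWEST THRU 9=1) (10 THRU HIGHEST=2) INTO age_group.
VALUE LABELS age_group 1 'nine or younger' 2 'ten or older'.
EXECUTE.

Highlighting these lines and running them has the same effect as clicking through the dialogs, but the steps are documented and can be repeated on new data.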
The Output Window. The output window is where SPSS presents the results of the analyses you
conducted. Besides the usual status messages, you'll find all of the results, tables, and graphs in here.
In the output window you can also manipulate tables and graphs and reformat them (e.g., to APA 6th
edition style).
SPSS Regression
SPSS Regression is an add-on module that extends the regression analysis capabilities of SPSS. This module includes multinomial and binary logistic regression, constrained and unconstrained nonlinear regression, weighted least squares, and probit analysis.
AMOS
AMOS is a program that allows you to specify and estimate structural equation models. Structural equation models are widely published, especially in the social sciences. In basic terms, structural equation models are a way of combining multiple regression analyses and interaction effects.
CHAPTER 2: CHOOSING THE RIGHT STATISTICAL ANALYSIS
This manual is a guide to help you select the appropriate statistical technique. Your quantitative study is
a process that presents many different options that can lead you down different paths. With the help of
this manual, we will ensure that you choose the right paths during your statistical selection process. The
next section will help guide you towards the right statistical test for your work. However, before you
can select a test it will be necessary to know a thing or two about your data.
When it comes to selecting your test, the level of measurement of your data is important. The measurement level is also referred to as the scale of your data. The easy (and slightly simplified) answer is that there are three different levels of measurement: nominal, ordinal, and scale. In your SPSS data editor, the Measure column can take exactly those three values.
Measurement Scales
Nominal data is the most basic level of measurement. All data is at least nominal. A characteristic is
measured on a nominal scale if the answer contains different groups or categories like male/female;
treatment group/control group; or multiple categories like colors or occupations, highest degrees
earned, et cetera.
Ordinal data contains a bit more information than nominal data. On an ordinal scale your answer is one
of a set of different possible groups like on a nominal scale, however the ordinal scale allows you to
order the answer choices. Examples of this include all questions where the answer choices are grouped
in ranges, like income bands, age groups, and diagnostic cut-off values, and can also include rankings
(first place, second place, third place), and strengths or quantities of substances (high dose/ medium
dose/ low dose).
Scale data also contains more information than nominal data. If your data is measured on a continuous-
level scale then the intervals and/or ratios between your groups are defined. Technically, you can
define the distance between two ordinal groups by either a ratio or by an interval. What is a ratio scale?
A ratio scale is best defined by what it allows you to do. With scale data you can make claims such as 'first place is twice as good as second place', whereas on an ordinal scale you are unable to make these claims because you cannot know them for certain. Fine examples of scale data include the findings that a
temperature of 120K is half of 240K, and sixty years is twice as many years as thirty years, which is
twice as many years as fifteen. What is an interval scale? An interval scale enables you to establish
intervals. Examples include the findings that the difference between 150ml and 100ml is the same as
the difference between 80ml and 30ml, and five minus three equals two which is the same as twelve
minus two. Most often you'll also find Likert-like scales in the interval scale category of levels of
measurement. An example of a Likert-like scale would include the following question and statements:
How satisfied are you with your life? Please choose an answer from 1 to 7, where 1 is completely
dissatisfied, 2 is dissatisfied, 3 is somewhat dissatisfied, 4 is neither satisfied nor dissatisfied, 5 is
somewhat satisfied, 6 is satisfied, and 7 is completely satisfied. These scales are typically interval scales
and not ratio scales because you cannot really claim that dissatisfied (2) is half as satisfied as neither (4).
Similarly, logarithmic scales such as those you find in a lot of indices don't have the same intervals between values, but the distance between observations can be expressed by ratios. [A word of caution: statisticians often become overly obsessed with the latter category; they want to know, for instance, whether the scale has a natural zero. For our purposes it is enough to know that if the distance between groups can be expressed as an interval or ratio, we run the more advanced tests. In this manual we will refer to interval or ratio data as being of continuous-level scale.]
Decision chart: I am interested in... Relationships, Differences, Predictions, or Classifications.
Are you interested in the relationship between two variables, for example, the higher X, the higher Y? Or are you interested in comparing differences, for example, does group A score differently than group B? Are you interested in predicting an outcome variable, for example, does X predict Y? Or are you interested in classifications, for example, does an observation fall into group A or B?
Decision chart for analyses of relationship between X and Y: ordinal variables lead to the Spearman correlation; scale variables lead to the point-biserial correlation or Pearson's bivariate correlation.
Decision chart for analyses of difference between X and Y. The relevant questions are: What is the scale of the dependent variable (nominal, ordinal, or scale, i.e., ratio or interval)? Is the dependent variable normally distributed (K-S test not significant)? Is homoscedasticity given (equal or non-equal variances)? Are there 2 independent samples (independent samples t-test) or 2 dependent samples (dependent samples t-test)? Is there a confounding factor, and is there 1 or more than 1 independent variable (profiles)? Repeated measures of the dependent variable lead to the repeated measures ANOVA, or to the repeated measures ANCOVA when a confounding factor is included.
Decision Tree for Predictive Analyses
You have chosen a straightforward family of statistical techniques. Given that your independent
variables are often continuous-level data (interval or ratio scale), you need only consider the scale of
your dependent variable and the number of your independent variables.
Decision chart for predictive analyses between X and Y, where the independent variables are scale (ratio or interval).
Decision Tree for Classification Analyses
If you want to classify your observations you only have two choices. The discriminant analysis has more
reliability and better predictive power, but it also makes more assumptions than multinomial regression.
Thoroughly weigh your two options and choose your own statistical technique.
Decision chart for classification analyses: if the assumptions are met, discriminant analysis; if not, multinomial regression.
In quantitative testing we are always interested in the question, Can I generalize my findings from my
sample to the general population? This question refers to the external validity of the analysis. The
ability to establish external validity of findings and to measure it with statistical power is one of the key
strengths of quantitative analyses. To do so, every statistical analysis includes a hypothesis test. If you
took statistics as an undergraduate, you may remember the null hypothesis and levels of significance.
In SPSS, all tests of significance give a p-value. The p-value is the probability of obtaining the observed result (or a more extreme one) if the null hypothesis were true. The critical value that is widely used for p is 0.05. That is, for p ≤ 0.05 we reject the null hypothesis; in most tests this means that we may generalize our findings to the general population. In statistical terms, the p-value is the probability of wrongly rejecting a correct null hypothesis and thus is equal to the Type I or alpha error.
Remember: If you run a test in SPSS and the p-value is less than or equal to 0.05, what you found in the sample is considered externally valid and can be generalized onto the population.
CHAPTER 3: Introducing the two Examples used throughout this manual
In this manual we will rely on the example data gathered from a fictional educational survey. The
sample consists of 107 students aged nine and ten. The pupils were taught by three different methods
(frontal teaching, small group teaching, blended learning, i.e., a mix of classroom and e-learning).
Among the data collected were multiple test scores. These were standardized test scores in
mathematics, reading, writing, and five aptitude tests that were repeated over time. Additionally the
pupils got grades on final and mid-term exams. The data also included several mini-tests which included
newly developed questions for the standardized tests that were pre-tested, as well as other variables
such as gender and age. After the team finished the data collection, every student was given a
computer game (the choice of which was added as a data point as well).
The data set has been constructed to illustrate all the tests covered by the manual. Some of the results
switch the direction of causality in order to show how different measurement levels and number of
variables influence the choice of analysis. The full data set contains 37 variables from 107 observations.
CHAPTER 4: Analyses of Relationship
Firstly, the Chi-Square Test can test whether the distribution of a variable in a sample approximates an
assumed theoretical distribution (e.g., normal distribution, Beta). [Please note that the Kolmogorov-Smirnov test is another test for goodness of fit. The Kolmogorov-Smirnov test has higher power, but can only be applied to continuous-level variables.]
Secondly, the Chi-Square Test can be used as a test of independence between two variables. That means
that it tests whether one variable is independent from another one. In other words, it tests whether or
not a statistically significant relationship exists between a dependent and an independent variable.
When used as test of independence, the Chi-Square Test is applied to a contingency table, or cross
tabulation (sometimes called crosstabs for short).
Typical questions answered with the Chi-Square Test of Independence are as follows:
Medicine - Are children more likely to get infected with virus A than adults?
Sociology - Is there a difference between the marital status of men and women in their early
30s?
Management - Is customer segment A more likely to make an online purchase than segment B?
As we can see from these questions and the decision tree, the Chi-Square Test of Independence works
with nominal scales for both the dependent and independent variables. These example questions ask
for answer choices on a nominal scale or a tick mark in a distinct category (e.g., male/female,
infected/not infected, buy online/do not buy online).
In more academic terms, the test statistic computed from the contingency table can be shown to have a distribution that approximates a Chi-Square distribution. Pearson's Chi-Square Test of Independence is an approximate test. This means that the distribution of the test statistic is only approximately Chi-Square. This approximation improves with large sample sizes. However, it poses a problem with small sample sizes, for which a typical cut-off point is a cell size below five expected occurrences.
Taking this into consideration, Fisher developed an exact test for contingency tables with small samples. Exact tests do not approximate a theoretical distribution, in this case the Chi-Square distribution. Fisher's exact test calculates all needed information from the sample using a hypergeometric distribution.
What does this mean? Because it is an exact test, a significance value p calculated with Fisher's Exact Test will be correct; i.e., a test with p = 0.01 will (in the long run) actually reject a true null hypothesis in 1% of all tests conducted. For an approximate test such as Pearson's Chi-Square Test of Independence this is only asymptotically the case. Therefore the exact test has exactly the Type I error rate indicated by its p-value.
When applied to a research problem, however, this difference often has only a small impact on the results. The rule of thumb is to use exact tests with sample sizes less than ten. Also both Fisher's
exact test and Pearson's Chi-Square Test of Independence can be easily calculated with statistical
software such as SPSS.
The Chi-Square Test of Independence is the simplest test of a relationship between an independent variable and one or more dependent variables. As the decision tree for tests of independence shows, the Chi-Square Test can always be used.
The Chi-Square Test of Independence is found in Analyze/Descriptive Statistics/Crosstabs. This menu entry opens the Crosstabs menu. Crosstabs is short for cross tabulation, which is sometimes referred to as a contingency table.
The first step is to add the variables to rows and columns by simply clicking on the variable name in the
left list and adding it with a click on the arrow to either the row list or the column list.
The Exact... button opens the dialog for the exact tests. Exact tests are needed with small cell sizes, below ten respondents per cell. SPSS offers the choice between Monte Carlo simulation and Fisher's Exact Test. Since our cells have counts greater than or equal to ten, we stick with the asymptotic test, that is, Pearson's Chi-Square Test of Independence.
The Statistics... button opens the dialog for the additional statistics we want SPSS to compute. Since we want to run the Chi-Square Test of Independence we need to tick Chi-Square. We also want to include the contingency coefficient and the correlations, which are the tests of interdependence between our two variables.
The next step is to click on the Cells... button. This brings up the dialog to specify the information each cell should contain. Per default, only the observed counts are selected; this would create a simple contingency table of our sample. However, the outcome of the test, the directionality of the correlation, and the dependence between the variables are interpreted with greater ease when we look at the differences between observed and expected counts and at the percentages.
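The same specification can also be run from the syntax window. A minimal sketch, assuming the two variables are named Gender and Exam (adjust to the names in your own data file):

CROSSTABS
  /TABLES=Gender BY Exam
  /STATISTICS=CHISQ CC CORR
  /CELLS=COUNT EXPECTED
  /COUNT ROUND CELL.

Here CHISQ requests the Chi-Square tests, CC the contingency coefficient, and CORR the correlations, mirroring the dialog choices described above.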
The Output of the Chi-Square Test of Independence
The output is quite straightforward and includes only four tables. The first table shows the sample
description.
The second table in our output is the contingency table for the Chi-Square Test of Independence. We
find that there seems to be a gender difference between those who fail and those who pass the exam.
We find that more male students failed the exam than were expected (22 vs. 19.1) and more female
students passed the exam than were expected (33 vs. 30.1).
This is a first indication that our hypothesis should be supported, our hypothesis being that gender has an influence on whether the student passed the exam. The results of the Chi-Square Test of Independence are in the SPSS output table Chi-Square Tests:
Alongside the Pearson Chi-Square, SPSS automatically computes several other values, Yates' continuity correction for 2x2 tables being one of them. Yates introduced this value to correct the Chi-Square approximation in tables with small degrees of freedom. However, Yates' continuity correction has little practical relevance since SPSS calculates Fisher's Exact Test as well. Moreover, the rule of thumb is that for large sample sizes (n > 50) the continuity correction can be omitted.
Secondly, the Likelihood Ratio, or G-Test, is based on the maximum likelihood theory and for large
samples it calculates a Chi-Square similar to the Pearson Chi-Square. G-Test Chi-Squares can be added
to allow for more complicated test designs.
Thirdly, Fisher's Exact Test, which we discussed earlier, should be used for small samples with a cell size
below ten as well as for very large samples. The Linear-by-Linear Association, which calculates the
association between two linear variables, can only be used if both variables have an ordinal or
continuous-level scale.
The first row shows the results of the Chi-Square Test of Independence: the Chi-Square value is 1.517 with 1 degree of freedom, which results in a p-value of .218. Since 0.218 is larger than 0.05 we cannot reject the null hypothesis that the two variables are independent, thus we cannot say that gender has an influence on passing the exam.
The last table in the output shows us the contingency coefficient, which is the result of the test of interdependence for two nominal variables. It is similar to the correlation coefficient r and is in this case 0.118 with a significance of 0.218. Again the contingency coefficient's test of significance is larger than the critical value 0.05, and therefore we cannot reject the null hypothesis that the contingency coefficient is zero in the population.
[SPSS output table: Symmetric Measures, with columns Value, Asymp. Std. Error, Approx. T, and Approx. Sig.]
Bivariate (Pearson) Correlation
A correlation expresses the strength of linkage or co-occurrence between two variables in a single value between -1 and +1. This value that measures the strength of linkage is called the correlation coefficient, which is typically represented by the letter r.
The correlation coefficient between two continuous-level variables is also called Pearson's r or Pearson
product-moment correlation coefficient. A positive r value expresses a positive relationship between
the two variables (the larger A, the larger B) while a negative r value indicates a negative relationship
(the larger A, the smaller B). A correlation coefficient of zero indicates no relationship between the
variables at all. However correlations are limited to linear relationships between variables. Even if the
correlation coefficient is zero, a non-linear relationship might exist.
Bivariate (Pearson's) Correlation in SPSS
At this point it would be beneficial to create a scatter plot to visualize the relationship between our two
test scores in reading and writing. The purpose of the scatter plot is to verify that the variables have a
linear relationship. Other forms of relationship (e.g., a circular or square-shaped pattern) will not be detected when running Pearson's Correlation Analysis. This would create a type II error because it would not reject the null
hypothesis of the test of independence ('the two variables are independent and not correlated in the
universe') although the variables are in reality dependent, just not linearly.
In the Chart Builder we simply choose in the Gallery tab the Scatter/Dot group of charts and
drag the 'Simple Scatter' diagram (the first one) on the chart canvas. Next we drag variable Test_Score
on the y-axis and variable Test2_Score on the x-Axis.
SPSS generates the scatter plot for the two variables. A double click on the output diagram opens the
chart editor and a click on 'Add Fit Line' adds a linearly fitted line that represents the linear association
that is represented by Pearson's bivariate correlation.
To calculate Pearson's bivariate correlation coefficient in SPSS we have to open the dialog in Analyze/Correlate/Bivariate. This opens the dialog box for all bivariate correlations (Pearson's, Kendall's, Spearman's). Simply select the variables you want to calculate the bivariate correlation for and add them with the arrow.
Select the bivariate correlation coefficient you need, in this case Pearson's. For the Test of Significance
we select the two-tailed test of significance, because we do not have an assumption whether it is a
positive or negative correlation between the two variables Reading and Writing. We also leave the
default tick mark at flag significant correlations which will add a little asterisk to all correlation
coefficients with p<0.05 in the SPSS output.
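Pasting the dialog produces syntax along these lines; this is a sketch that assumes the reading and writing scores are stored in the variables Test_Score and Test2_Score used for the scatter plot above:

CORRELATIONS
  /VARIABLES=Test_Score Test2_Score
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.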
In this example Pearson's correlation coefficient is .645, which signifies a medium, positive, linear correlation. The significance test has the null hypothesis that there is no positive or negative correlation between the two variables in the universe (r = 0). The results show a very high statistical significance of p < 0.001, thus we can reject the null hypothesis and assume that the Reading and Writing test scores are positively, linearly associated in the general universe.
The initial hypothesis predicted a linear relationship between the test results scored on the
Reading and Writing tests that were administered to a sample of 107 students. The scatter
diagrams indicate a linear relationship between the two test scores. Pearson's bivariate
correlation coefficient shows a medium positive linear relationship between both test scores (r =
.645) that is significantly different from zero (p < 0.001).
Partial Correlation
Do storks bring Babies? Pearson's Bivariate Correlation Coefficient shows a positive and
significant relationship between the number of births and the number of storks in a sample of 52
US counties.
Spurious correlations are caused by not observing a third variable that influences the two analyzed
variables. This third, unobserved variable is also called the confounding factor, hidden factor,
suppressor, mediating variable, or control variable. Partial Correlation is the method to correct for the
overlap of the moderating variable.
In the stork example, one confounding factor is the size of the county: larger counties tend to have larger populations of women and storks, and, as a clever replication of this study in the Netherlands showed, another confounding factor is the weather nine months before the date of observation. Partial correlation is the statistical test to identify and correct for spurious correlations.
The Partial Correlation Analysis is found in Analyze/Correlate/Partial. This opens the dialog of the Partial Correlation Analysis. First, we select the variables for which we want to calculate the partial correlation. In our example, these are Aptitude Test 2 and Aptitude Test 5. We want to control the partial correlation for Aptitude Test 1, which we add to the list of control variables.
The Options... dialog allows us to display additional descriptive statistics (means and standard deviations) and the zero-order correlations. If you have not run a bivariate correlation analysis already, check the zero-order correlations, as this will include Pearson's Bivariate Correlation Coefficients for all variables in the output. Furthermore, we can manage how missing values will be handled.
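For reference, a sketch of the corresponding syntax, assuming the aptitude test scores are stored in variables with the hypothetical names Aptitude1, Aptitude2, and Aptitude5:

PARTIAL CORR
  /VARIABLES=Aptitude2 Aptitude5 BY Aptitude1
  /SIGNIFICANCE=TWOTAIL
  /STATISTICS=DESCRIPTIVES CORR
  /MISSING=LISTWISE.

The control variable is the one listed after the keyword BY.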
The Output of Partial Correlation Analysis
The output of the Partial Correlation Analysis is quite straightforward and only consists of a single table.
The first half displays the zero-order correlation coefficients, which are the three Pearson's Correlation Coefficients without any control variable taken into account.
The zero-order correlations seem to support our hypothesis that a higher test score on aptitude test 2 increases the test score on aptitude test 5. The two tests show a weak association of r = 0.339, which is highly significant (p < 0.001). However, the variable aptitude test 1 also significantly correlates with both test scores (r = -0.499 and r = -0.468). The second part of the table shows the Partial Correlation Coefficient between Aptitude Test 2 and Aptitude Test 5 when controlled for the baseline test Aptitude Test 1. The Partial Correlation Coefficient is now r = 0.138 and not significant (p = 0.159).
The observed bivariate correlation between the Aptitude Test 2 score and the Aptitude Test 5 score is almost completely explained by the correlation of both variables with the baseline Aptitude Test 1. The partial correlation between both variables is very weak (r = 0.138) and not significant with p = 0.159. Therefore we cannot reject the null hypothesis that both variables are independent.
Spearman Rank Correlation
All correlation analyses express the strength of linkage or co-occurrence between two variables in a single value between -1 and +1. This value is called the correlation coefficient. A positive correlation coefficient indicates a positive relationship between the two variables (the larger A, the larger B) while a negative correlation coefficient expresses a negative relationship (the larger A, the smaller B). A correlation coefficient of 0 indicates that no relationship between the variables exists at all. However, correlations are limited to linear relationships between variables. Even if the correlation coefficient is zero, a non-linear relationship might exist.
Compared to Pearson's bivariate correlation coefficient, the Spearman Correlation does not require continuous-level data (interval or ratio), because it uses ranks instead of assumptions about the distributions of the two variables. This allows us to analyze the association between variables of ordinal measurement level. Moreover, the Spearman Correlation is a non-parametric test, which does not assume that the variables approximate a multivariate normal distribution. Spearman Correlation Analysis can therefore be used in many cases where the assumptions of Pearson's Bivariate Correlation (continuous-level variables, linearity, homoscedasticity, and multivariate normal distribution of the variables for the test of significance) are not met.
Sociology: Do people with a higher level of education have a stronger opinion of whether or not
tax reforms are needed?
Medicine: Does the number of symptoms a patient has indicate a higher severity of illness?
Business: Are consumers more satisfied with products that are higher ranked in quality?
Theoretically, the Spearman correlation calculates the Pearson correlation for variables that are
converted to ranks. Similar to Pearson's bivariate correlation, the Spearman correlation also tests the
null hypothesis of independence between two variables. However, this can lead to difficult interpretations. Kendall's Tau-b rank correlation improves on this by reflecting the strength of the dependence between the variables being compared.
Since both variables need to be of ordinal scale or ranked data, Spearman's correlation requires
converting interval or ratio scales into ranks before it can be calculated. Mathematically, Spearman
correlation and Pearson correlation are very similar in the way that they use difference measurements
to calculate the strength of association. The Pearson correlation uses standard deviations, while the Spearman correlation uses differences in ranks. However, this leads to an issue with the Spearman correlation when tied
ranks exist in the sample. An example of this is when a sample of marathon results awards two silver
medals but no bronze medal. A statistician is even crueler to these runners because a rank is defined as
average position in the ascending order of values. For a statistician, the marathon result would have
one first place, two places with a rank of 2.5, and the next runner ranks 4. If tied ranks occur, a more
complicated formula has to be used to calculate rho, but SPSS automatically and correctly calculates tied
ranks.
The Spearman Correlation requires ordinal or ranked data, therefore it is very important that
measurement levels are correctly defined in SPSS. Grade 2 and Grade 3 are ranked data and therefore
measured on an ordinal scale. If the measurement levels are specified correctly, SPSS will automatically
convert continuous-level data into ordinal data. Should we have raw data that already represents
rankings but without specification that this is an ordinal scale, nothing bad will happen.
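If a variable's measurement level needs to be corrected, it can also be set in syntax; a minimal sketch with the assumed variable names Grade2 and Grade3:

* Declare the two grade variables as ordinal.
VARIABLE LEVEL Grade2 Grade3 (ORDINAL).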
The Spearman correlation is found in Analyze/Correlate/Bivariate. This opens the dialog for all Bivariate Correlations, which also includes Pearson's Bivariate Correlation.
Using the arrow, we add Grade 2 and Grade 3 to the list of variables for analysis. Then we need to tick
the correlation coefficients we want to calculate. In this case the ones we want are Spearman and
Kendall's Tau-b.
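The pasted syntax might look like this sketch, again assuming the variables are named Grade2 and Grade3; PRINT=BOTH requests Kendall's tau-b and Spearman's rho in one run:

NONPAR CORR
  /VARIABLES=Grade2 Grade3
  /PRINT=BOTH TWOTAIL NOSIG
  /MISSING=PAIRWISE.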
The first coefficient in the output table is Kendall's Tau-b. Kendall's Tau is a simpler correlation
coefficient that calculates how many concordant pairs (same rank on both variables) exist in a sample.
Tau-b measures the strength of association when both variables are measured at the ordinal level. It
adjusts the sample for tied ranks.
The second coefficient is Spearman's rho. SPSS shows, for example, that Kendall's Tau-b for the two grades is 0.507, while Spearman's rho is 0.634. In both cases the significance is p < 0.001. For small samples, SPSS automatically calculates a permutation test of significance instead of the classical t-test, whose assumption of multivariate normality is violated when the sample size is small.
One possible interpretation and write-up of Spearman's Correlation Coefficient rho and the test of
significance is as follows:
We analyzed the question of whether the grade achieved in the reading test and the grade
achieved in the writing test are somewhat linked. Spearman's Correlation Coefficient indicates a
strong association between these two variables (0.634). The test of significance indicates
that with p < 0.001 we can reject the null hypothesis that both variables are independent in the
general population. Thus we can say that with a confidence of more than 95% the observed
positive correlation between grade writing and grade reading is not caused by random effects
and both variables are interdependent.
Please always bear in mind that correlation alone does not make for causality.
Point-Biserial Correlation
Like all correlation analyses the Point-Biserial Correlation measures the strength of association or co-
occurrence between two variables. Correlation analyses express this strength of association in a single
value, the correlation coefficient.
The Point-Biserial Correlation Coefficient is a correlation measure of the strength of association between
a continuous-level variable (ratio or interval data) and a binary variable. Binary variables are variables of
nominal scale with only two values. They are also called dichotomous variables or dummy variables in
Regression Analysis. Binary variables are commonly used to express the existence of a certain
characteristic (e.g., reacted or did not react in a chemistry sample) or the membership in a group of
observed specimen (e.g., male or female). If needed for the analysis, binary variables can also be
created artificially by grouping cases or recoding variables. However, it is not advised to artificially create a binary variable from ordinal or continuous-level (ratio or interval) data, because ordinal and continuous-level data contain more variance information than nominal data and thus make any correlation analysis more reliable. For ordinal data use the Spearman Correlation Coefficient rho; for continuous-level (ratio or interval) data use Pearson's Bivariate Correlation Coefficient r. The Point-Biserial Correlation Coefficient is typically denoted as rpb.
Like all Correlation Coefficients (e.g. Pearson's r, Spearman's rho), the Point-Biserial Correlation
Coefficient measures the strength of association of two variables in a single measure ranging from -1 to
+1, where -1 indicates a perfect negative association, +1 indicates a perfect positive association and 0
indicates no association at all. All correlation coefficients are interdependency measures that do not
express a causal relationship.
Mathematically, the Point-Biserial Correlation Coefficient is calculated just as Pearson's Bivariate Correlation Coefficient would be calculated, wherein the dichotomous variable takes only the values 0 or 1, which is why it is also called a binary variable. Since we use the same mathematical concept, we need to fulfill the same assumptions, which are normal distribution of the continuous variable and homoscedasticity.
Biology - Do fish react differently to a red or green light stimulus as a food signal? Is there an association between the color of the stimulus (red or green light) and the reaction time?
Medicine - Does a cancer drug prolong life? How strong is the association between administering the drug (placebo, drug) and the length of survival after treatment?
Sociology - Does gender have an influence on salary? Is there an association between gender (female, male) and the income earned?
Social psychology - Is satisfaction with life higher the older you are? Is there an association between age group (elderly, not elderly) and satisfaction with life?
Economics - Does illiteracy indicate a weaker economy? How strong is the association between literacy (literate vs. illiterate societies) and GDP?
Since all correlation analyses require the variables to be randomly independent, the Point-Biserial Correlation is not the best choice for analyzing data collected in experiments. For these cases a Linear Regression Analysis with Dummy Variables is the better choice. Also, many of the questions typically answered with a Point-Biserial Correlation Analysis can be answered with an independent samples t-test or other dependency tests (e.g., Mann-Whitney U, Kruskal-Wallis H, and Chi-Square). Not only are some of these tests robust regarding the requirement of normally distributed variables, but these tests also analyze the dependency or causal relationship between an independent variable and the dependent variable in question.
Since we use the Pearson r as Point-Biserial Correlation Coefficient, we should first test whether there is
a relationship between both variables. As described in the section on Pearson's Bivariate Correlation in
SPSS, the first step is to draw the scatter diagram of both variables. For the Point-Biserial Correlation
Coefficient this diagram would look like this.
The diagram shows a positive slope and indicates a positive relationship between the math score and
passing the final exam or failing it. Since our variable exam is measured on nominal level (0, 1), a better
way to display the data is to draw a box plot. To create a box plot we open the Chart Builder (Graphs/Chart Builder) and select the Simple Boxplot from the list in the Gallery. Drag Exam onto the x-axis and Math Test onto the y-axis.
In our example we can see in the box plot that not only are the math scores higher on average for
students who passed the final exam but also that there is almost no overlapping between the two
groups. Now that we have an understanding of the direction of our association between the two
variables we can conduct the Point-Biserial Correlation Analysis.
SPSS does not have a special procedure for the Point-Biserial Correlation Analysis. If a Point-Biserial Correlation is to be calculated in SPSS, the procedure for Pearson's r has to be used. Therefore we open the Bivariate Correlations dialog (Analyze/Correlate/Bivariate) and calculate Pearson's r as if the binary, nominal variable were measured on the continuous level (ratio or interval data).
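A sketch of the syntax for this step, assuming the binary exam variable and the three test scores are stored under the hypothetical names Exam, Math_Test, Reading_Test, and Writing_Test:

CORRELATIONS
  /VARIABLES=Exam Math_Test Reading_Test Writing_Test
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.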
The table shows that the correlation coefficient for math is r = 0.810, for reading is r = 0.545 and for
writing is r = 0.673. This indicates a strong association between the outcome of the exam and the
previous test scores in reading, writing, and math. The two-tailed test of independence is significant
with p < 0.001. Therefore we can reject our null hypotheses that each variable is independent in the
universe with a confidence level greater than 95%.
A write-up could read as follows:
We analyzed the relationship between passing or failing the final exam and the previously
achieved scores in math, reading, and writing tests. The point-biserial correlation analysis finds
that the variables are strongly and positively associated (r = 0.810, 0.545, and 0.673). The statistical test of significance confirms that the correlation we found in our sample can be
generalized onto the population our sample was drawn from (p < 0.001). Thus we might say that
a higher test score increases the probability of passing the exam and vice versa.
Canonical Correlation
Canonical is the statistical term for analyzing latent variables (which are not directly observed) that
represent multiple variables (which are directly observed). The term can also be found in canonical
regression analysis and in multivariate discriminant analysis.
Canonical Correlation analysis is the analysis of multiple-X multiple-Y correlation. The Canonical
Correlation Coefficient measures the strength of association between two Canonical Variates.
A Canonical Variate is the weighted sum of the variables in the analysis. The canonical variate is
denoted CV. Similarly to the discussions on why to use factor analysis instead of creating unweighted
indices as independent variables in regression analysis, canonical correlation analysis is preferable in
analyzing the strength of association between two constructs. This is such because it creates an internal
structure, for example, a different importance of single item scores that make up the overall score (as
found in satisfaction measurements and aptitude testing).
For multiple x and y variables the canonical correlation analysis constructs two variates, CVX1 = a1x1 + a2x2 + a3x3 + ... + anxn and CVY1 = b1y1 + b2y2 + b3y3 + ... + bmym. The canonical weights a1...an and b1...bm are chosen so that they maximize the correlation between the canonical variates CVX1 and CVY1. A pair of canonical variates is called a canonical root. This step is repeated for the residuals to generate additional pairs of canonical variates until the cut-off value = min(n,m) is reached; for example, if we calculate the canonical correlation between three variables for test scores and five variables for aptitude testing, we would extract three pairs of canonical variates, or three canonical roots. Note that this is a major difference from factor analysis. In factor analysis the factors are calculated to maximize between-group variance while minimizing in-group variance. They are factors because they group the underlying variables.
Canonical variates are not factors because only the first pair of canonical variates groups the variables in such a way that the correlation between them is maximized. The second pair is constructed out of the residuals of the first pair in order to maximize the correlation between them. Therefore the canonical variates cannot be interpreted in the same way as factors in factor analysis. Also, the calculated canonical variates are automatically orthogonal, i.e., they are independent from each other.
Similar to factor analysis, the central results of canonical correlation analysis are the canonical
correlations, the canonical factor loadings, and the canonical weights. They can also be used to
calculate d, the measure of redundancy. The redundancy measurement is important in questionnaire
design and scale development. It can answer questions such as, When I measure a five item
satisfaction with the last purchase and a three item satisfaction with the after sales support, can I
exclude one of the two scales for the sake of shortening my questionnaire? Statistically it represents the
proportion of variance of one set of variables explained by the variant of the other set of variables.
The canonical correlation coefficients test for the existence of overall relationships between two sets of
variables, and redundancy measures the magnitude of the relationships. Lastly, Wilks' lambda (also called the U value) and Bartlett's V are used as tests of significance of the canonical correlation coefficient. Typically Wilks' lambda is used to test the significance of the first canonical correlation coefficient and Bartlett's V is used to test the significance of all canonical correlation coefficients.
A final remark: Please note that the Discriminant Analysis is a special case of the canonical correlation
analysis. Every nominal variable with n different factor steps can be replaced by n-1 dichotomous
variables. The Discriminant Analysis is then nothing but a canonical correlation analysis of a set of
binary variables with a set of continuous-level (ratio or interval) variables.
In the SPSS syntax we need to use the MANOVA command and the subcommand /discrim in a one-factorial design. We need to include all variables in one single command, separating the two groups of variables by the keyword WITH. The list of variables in the MANOVA command contains the dependent variables first, followed by the independent variables (please do not use the keyword BY instead of WITH, because that would cause the factors to be separated as in a MANOVA analysis).
The subcommand /discrim produces a canonical correlation analysis for all covariates. Covariates are specified after the keyword WITH. ALPHA specifies the significance level required before a canonical variable is extracted; the default is 0.25, and it is typically set to 1.0 so that all discriminant functions are reported. Your syntax should look like this:
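A sketch with hypothetical variable names for the five aptitude tests and the three standardized test scores (adjust the names to match your data file):

MANOVA Aptitude1 Aptitude2 Aptitude3 Aptitude4 Aptitude5
  WITH Math_Score Reading_Score Writing_Score
  /DISCRIM ALL ALPHA(1)
  /PRINT=SIG(EIGEN DIM).

The dependent variables are listed before WITH and the covariates after it; ALPHA(1) makes sure that all canonical roots are reported.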
To execute the syntax, just highlight the code you just wrote and click on the big green Play button.
The next section reports the canonical correlation coefficients and the eigenvalues of the canonical roots. The first canonical correlation coefficient is .81108 with an explained variance of 96.87% and an eigenvalue of 1.92265. This indicates that our hypothesis is generally correct: the standardized test scores and the aptitude test scores are positively correlated.
So far the output only showed overall model fit. The next part tests the significance of each of the roots.
We find that of the three possible roots only the first root is significant with p < 0.05. Since our model
contains the three test scores (math, reading, writing) and five aptitude tests, SPSS extracts three
canonical roots or dimensions. The first test of significance tests all three canonical roots together (F = 9.26, p < 0.05), the second test excludes the first root and tests roots two to three, and the last test tests root three by itself. In our example only the first root is significant with p < 0.05.
In the next parts of the output SPSS presents the results separately for each of the two sets of variables.
Within each set, SPSS gives the raw canonical coefficients, standardized coefficients, correlations between the observed variables and the canonical variate, and the percent of variance explained by the canonical variate. Below are the results for the three test variables.
The raw canonical coefficients are similar to the coefficients in linear regression; they can be used to
calculate the canonical scores.
Easier to interpret are the standardized coefficients (mean = 0, st.dev. = 1). Only the first root is relevant, since roots two and three are not significant. The strongest influence on the first root comes from the variable Test_Score (which represents the math score).
The next section shows the same information (raw canonical coefficients, standardized coefficients, correlations between the observed variables and the canonical variate, and the percent of variance explained by the canonical variate) for the aptitude test variables.
Again, in the table of standardized coefficients, we find the importance of the variables on the canonical roots. The first canonical root is dominated by Aptitude Test 1.
The next part of the table shows the multiple regression analysis of each dependent variable (Aptitude
Test 1 to 5) on the set of independent variables (math, reading, writing score).
The next section with the analysis of constant effects can be ignored as it is not relevant for the
canonical correlation analysis. One possible write-up could be:
The initial hypothesis is that scores in the standardized tests and the aptitude tests are
correlated. To test this hypothesis we conducted a canonical correlation analysis. The analysis
included three variables with the standardized test scores (math, reading, and writing) and five
variables with the aptitude test scores. Thus we extracted three canonical roots. The overall
model is significant (Wilks' Lambda = .32195, with p < 0.001); however, the individual tests of
significance show that only the first canonical root is significant on p < 0.05. The first root
explains a large proportion of the variance of the correlation (96.87%, eigenvalue 1.92265). Thus
we find that the canonical correlation coefficient between the first roots is 0.81108 and we can
assume that the standardized test scores and the aptitude test scores are positively correlated in
the population.
CHAPTER 5: Analyses of Differences
However, the independent variable t-test is somewhat different. The independent variable t-test tests whether a variable is zero in the population. This can be done because many statistical measures follow a t-distribution when the underlying variables that go into their calculation are multivariate normally distributed.
How is that relevant? The independent variable t-test is most often used in two scenarios: (1) as the
test of significance for estimated coefficients and (2) as the test of independence in correlation analyses.
To start, the independent variable t-test is extremely important for statistical analyses that calculate
variable weights, for example linear regression analysis, discriminant analysis, canonical correlation
analysis, or structural equation modeling. These analyses use a general linear model and an
optimization mechanic, for example maximum likelihood estimates, to build a linear model by adding
the weighted variables in the analysis. These estimations calculate the coefficients for the variables in
the analysis. The independent variable t-test is used to check whether these variable weights exist in
the general population from which our sample was drawn or whether these weights are statistical
artifacts only found by chance.
Most statistical packages, like SPSS, test the weights of linear models using the ANOVA and the t-test, because the ANOVA has a higher statistical power and also because the relationship between the two is quite simple (t² = F). However, the ANOVA makes additional assumptions, for example homoscedasticity, that the t-test
does not need. Most statistical packages used to estimate Structural Equation Models, e.g., AMOS, EQS,
LISREL, call the independent variable t-test z-statistics. In some older versions the tests are called T-
values. It is also noteworthy that the overall goodness of fit of a structural equation model when based
on the covariance matrix uses the chi-square distribution. Individual path weights are based on the t-
test or ANOVA, which typically can be specified when defining the model.
Once again, it is important to point out that all t-tests assume multivariate normality of the underlying
variables in the sample. The test is robust for large sample sizes and in data sets where the underlying
distributions are similar even when they are not multivariate normal. If the normality assumption is
violated, the t-values and therefore the levels of significance are too optimistic.
Secondly, the independent variable t-test is used as the test of independence in correlation analyses, e.g., Pearson's bivariate r, point-biserial r, canonical correlation analysis, and the classical test of Spearman's rho. These analyses calculate a test value from the sample to measure independence, and the distribution of this test value approximates a t-distribution for large sample sizes. In correlation analysis the independent variable t-test assesses whether the variables are independent in the population or whether the linkage between the variables found in the sample is caused by chance.
The independent variable t-test is virtually identical to the 1-sample t-test. But where the independent variable t-test tests the significance or independence of a derived statistical measurement, the 1-sample t-test tests whether a mean score calculated from a sample equals a certain hypothetically assumed value (e.g., zero, or a known population mean).
Correlations
In the bivariate correlation analyses it is included in the general correlation menu. It is marked per default and cannot be unchecked. The only thing that can be selected is whether a one-tailed or a two-tailed independent variable t-test shall be calculated.
The SPSS syntax for the independent variable t-test is in the subcommand /PRINT=TWOTAIL NOSIG, where TWOTAIL requests a two-tailed t-test and NOSIG indicates that significant correlations are to be flagged.
The result of the test is included in the correlation matrix in the rows labeled Sig. (2-tailed). In this example resident population and murder rate are not correlated. The correlation coefficient is relatively small, and with p = 0.814 the null hypothesis of the independent variable t-test cannot be rejected. Thus we assume that the coefficient is not different from zero in the universe the sample was drawn from.
Regression
In regression analysis the independent variable t-test can be switched on in the Statistics dialog box of the linear regression menu (Analyze/Regression/Linear).
This will produce the coefficients table that includes the t-test; otherwise only the F-test for the overall significance of the model will be calculated (null hypothesis: all coefficients are zero). The table will also include the 95% confidence interval for the estimated coefficients and the constant.
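A sketch of the corresponding syntax, assuming the math and reading scores are stored in variables named Math_Test and Reading_Test; CI(95) requests the 95% confidence intervals (older versions may paste the keyword simply as CI):

REGRESSION
  /STATISTICS COEFF CI(95) R ANOVA
  /DEPENDENT Math_Test
  /METHOD=ENTER Reading_Test.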
In this example the estimated linear regression equation is math test score = 36.824 + 0.795 * reading test score. The independent variable t-test shows that the regression constant b = 36.824 is significantly different from zero, and that the regression coefficient a = 0.795 is significantly different from zero as well. The independent variable t-test can also be used to construct the confidence interval of the coefficients; in this case the 95% confidence interval for the coefficient is [0.580, 1.009].
One-Way ANOVA
For some statisticians the ANOVA doesn't end there - they assume a cause-effect relationship and say that one or more independent, controlled variables (the factors) cause the significant difference in one or more characteristics. The way this works is that the factors sort the data points into one of the groups, thereby causing the difference in the mean values of the groups.
Example: Let us claim that women have on average longer hair than men. We find twenty undergraduate students and measure the length of their hair. A conservative statistician would then claim that we measured the hair of ten female and ten male students, that we conducted an analysis of variance, and that we found that the average hair of female undergraduate students is significantly longer than the hair of their fellow male students.
The ANOVA is a popular test; it is the test to use when conducting experiments. This is due to the fact
that it only requires a nominal scale for the independent variables - other multivariate tests (e.g.,
regression analysis) require a continuous-level scale. This following table shows the required scales for
some selected tests.
Dependent variable metric, independent variable metric: Regression
Dependent variable metric, independent variable non-metric: ANOVA
Dependent variable non-metric, independent variable metric: Discriminant Analysis
Dependent variable non-metric, independent variable non-metric: Chi-Square
The F-test, the T-test, and the MANOVA are all similar to the ANOVA. The F-test is another name for an
ANOVA that only compares the statistical means in two groups. This happens if the independent
variable for the ANOVA has only two factor steps, for example male or female as a gender.
The T-test compares the means of two (and only two) groups when the variances are not equal. The
equality of variances (also called homoscedasticity or homogeneity) is one of the main assumptions of
the ANOVA (see assumptions, Levene Test, Bartlett Test). MANOVA stands for Multivariate Analysis of
Variance. Whereas the ANOVA can have one or more independent variables, it always has only one
dependent variable. On the other hand the MANOVA can have two or more dependent variables.
Medicine - Does a drug work? Does the average life expectancy significantly differ between the
three groups that received the drug versus the established product versus the control?
Sociology - Are rich people happier? Do different income classes report a significantly different
satisfaction with life?
Management Studies - What makes a company more profitable? A one, three or five-year
strategy cycle?
First we examine the multivariate normality of the dependent variable. We can check this graphically, either with a histogram or with a Q-Q plot (Analyze/Descriptive Statistics/Q-Q Plots). Both plots show a somewhat normal distribution, with a skew around the mean.
Secondly, we can test for normality with the Kolmogorov-Smirnov goodness of fit test (Analyze/Nonparametric Tests/Legacy Dialogs/1-Sample K-S). An alternative to the K-S test is the Chi-Square goodness of fit test, but the K-S test is more robust for continuous-level variables.
If normality is not present, we could exclude the outliers to fix the problem, center the variable by subtracting the mean, or apply a non-linear transformation to the variable to create an index.
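As a sketch, the same normality checks can be run from the syntax window, assuming the dependent variable is named math in the data file:

* Histogram and normal Q-Q plot of the dependent variable.
EXAMINE VARIABLES=math
  /PLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES.

* One-sample Kolmogorov-Smirnov test against the normal distribution.
NPAR TESTS
  /K-S(NORMAL)=math.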
Options
The Options dialog of the One-Way ANOVA allows us to add descriptive statistics and the Levene test of homogeneity of variances to the output. The homogeneity of variances is the criterion that decides between the t-test and the ANOVA.
Contrasts
The last dialog box is Contrasts. Contrasts are differences in mean scores. The dialog allows you to pool multiple groups into one and test the average mean of the pooled groups against a third group. Please note that the contrast is not always the mean of the pooled groups: Contrast = (mean of first group + mean of second group)/2. It is only equal to the pooled mean if the groups are of equal size. It is also possible to specify weights for the contrasts, e.g., 0.7 for group 1 and 0.3 for group 2. We do not specify contrasts for this demonstration.
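For readers who prefer syntax, a minimal sketch of this One-Way ANOVA follows; the variable names math and exam are placeholders, and the second command only illustrates how a pooled contrast could be requested for a hypothetical three-level factor named group:

* One-Way ANOVA of the math score by exam outcome, with descriptives and the Levene test.
ONEWAY math BY exam
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /MISSING ANALYSIS.

* For a hypothetical three-level factor named group, this contrast pools groups 1 and 2 against group 3.
ONEWAY math BY group
  /CONTRAST=0.5 0.5 -1
  /STATISTICS HOMOGENEITY.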
The Output of the One-Way ANOVA
The result consists of several tables. The first table is the Levene Test, or the Test of Homogeneity of Variances (homoscedasticity). The null hypothesis of the Levene Test is that the variances are equal. The test in our example is significant with p = 0.000 < 0.05; thus we can reject the null hypothesis and cannot (!) assume that the variances are equal between the groups. Technically this means that the t-test with unequal variances is the right test to answer our research question. However, we proceed with the ANOVA.
The next table presents the results of the ANOVA. Mathematically, the ANOVA splits the total variance into explained variance (between groups) and unexplained variance (within groups), where the variance is defined as Var(x) = sum of squares(x) / degrees of freedom(x). The F-value, which is the critical test value that we need for the ANOVA, is defined as F = Varb / Varw.
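Written out as a formula, with k groups and N cases in total (so that the between-groups and within-groups variances use k - 1 and N - k degrees of freedom, respectively):

F = \frac{\mathrm{Var}_{b}}{\mathrm{Var}_{w}} = \frac{SS_{\mathrm{between}}/(k-1)}{SS_{\mathrm{within}}/(N-k)}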
The ANOVA shows that a significant difference in the average math test score exists between the students who passed and students who failed the final exam (p < 0.001).
One-Way ANCOVA
In basic terms, the ANCOVA examines the influence of an independent variable on a dependent variable while removing the effect of a covariate. The ANCOVA first conducts a regression of the dependent variable on the covariate. The residuals (the unexplained variance in the regression model) are then subject to an ANOVA. Thus the ANCOVA tests whether the independent variable still influences the dependent variable after the influence of the covariate(s) has been removed. The One-Way ANCOVA can include more than one covariate, and SPSS handles up to ten. If the ANCOVA model has more than one covariate, it is possible to calculate the one-way ANCOVA using contrasts, just like in the ANOVA, to identify the influence of each covariate.
The ANCOVA is most useful in that it (1) explains an ANOVA's within-group variance, and (2) controls for confounding factors. Firstly, as explained in the chapter on the ANOVA, the analysis of variance splits the total variance of the dependent variable into:
1. Variance explained by the independent variable (also called between-groups variance), and
2. Unexplained variance (also called within-groups variance).
The ANCOVA looks at the unexplained variance and tries to explain some of it with the covariate(s). Thus it increases the power of the ANOVA by explaining more variability in the model.
Note that just like in regression analysis and all linear models, over-fitting might occur. That is, the more
covariates you enter into the ANCOVA, the more variance it will explain, but the fewer degrees of
freedom the model has. Thus entering a weak covariate into the ANCOVA decreases the statistical
power of the analysis instead of increasing it.
Secondly, the ANCOVA eliminates the covariate's effect on the relationship between the independent and dependent variable that is tested with an ANOVA. The concept is very similar to the partial correlation analysis; technically it is a semi-partial regression and correlation.
The One-Way ANCOVA needs at least three variables. These variables are:
- The independent variable, which groups the cases into two or more groups. The independent variable has to be at least of nominal scale.
- The dependent variable, which the independent variable influences. The dependent variable has to be of continuous-level scale (interval or ratio data).
- The covariate, or variable that moderates the impact of the independent variable on the dependent variable. The covariate needs to be a continuous-level variable (interval or ratio data). The covariate is sometimes also called the confounding factor or concomitant variable. The ANCOVA covariate is often a pre-test value or a baseline.
Medicine - Does a drug work? Does the average life expectancy significantly differ between
the three groups that received the drug versus the established product versus the control?
This question can be answered with an ANOVA. The ANCOVA allows us to additionally control for covariates that might influence the outcome but have nothing to do with the drug, for example healthiness of lifestyle, risk-taking activities, or age.
Sociology - Are rich people happier? Do different income classes report a significantly
different satisfaction with life? This question can be answered with an ANOVA. Additionally
the ANCOVA controls for confounding factors that might influence satisfaction with life, for
example, marital status, job satisfaction, or social support system.
Management Studies - What makes a company more profitable? A one, three or five-year
strategy cycle? While an ANOVA answers the question above, the ANCOVA controls for additional moderating influences, for example company size, turnover, or stock market indices.
Is there a difference in the standardized math test scores between students who passed the
exam and students who failed the exam, when we control for reading abilities?
In the dialog boxes Model, Contrasts, and Plots we leave all settings on the default. The Post Hoc field is disabled when one or more covariates are entered into the analysis. If it is of interest, a contrast can be added to the analysis for the factor level that has the biggest influence. If we want to compare all groups against a specific group, we need to select Simple as the contrast method. We also need to specify whether the first or the last group should be the group to which all other groups are compared. For our example we want to compare all groups against the first group, thus we add the contrast Exam (Simple) with the first category as the reference.
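A syntax sketch of this One-Way ANCOVA; the variable names math, exam, and reading are placeholders for the variables in your own data file:

* One-Way ANCOVA of the math score by exam outcome, controlling for the reading score.
* The simple contrast uses the first factor level as the reference category.
UNIANOVA math BY exam WITH reading
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CONTRAST(exam)=SIMPLE(1)
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=reading exam.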
The output shows the ANCOVA results. Note that the total Type III sum of squares is the same as in the regular ANOVA model for the same dependent and independent variable; the ANCOVA just modifies how the variance is explained.
The results show that the covariate and the independent variable are both significant to the ANCOVA
model. The next table shows the estimates for the marginal means. These are the group differences in
the dependent variable after the effect of the covariate has been accounted for.
The last table is the Univariate Test, which is the ANOVA test of the difference of the estimated marginal
means.
The analysis of covariance was used to investigate the hypothesis that the observed difference in mean scores of the standardized math test is caused by differences in reading ability as measured by the standardized reading test. The ANCOVA, however, found that the marginal means of the students who failed the exam and the students who passed the exam still differ highly significantly (F = 113.944, p < 0.001), that is, even after the effect of the reading score has been accounted for.
Factorial ANOVA
For some statisticians, the factorial ANOVA doesn't only compare differences but also assumes a cause-effect relationship; this implies that one or more independent, controlled variables (the factors) cause the significant difference in one or more characteristics. The way this works is that the factors sort the data points into one of the groups, causing the difference in the mean value of the groups.
                                Independent Variables
                            1                      2+
 Dependent      1           One-Way ANOVA          Factorial ANOVA
 Variables      2+          Multiple ANOVAs        MANOVA

Example: Let us claim that blonde women have on average longer hair than brunette women as well as men of all hair colors. We find 100 undergraduate students and measure the length of their hair. A conservative statistician would then state that we measured the hair of 50 female (25 blondes, 25 brunettes) and 50 male students, and we conducted an analysis of variance and found that the average hair of blonde female undergraduate students was significantly longer than the hair of their fellow students. A more aggressive statistician would claim that gender and hair color have a direct influence on the length of a person's hair.
Most statisticians fall into the second category. It is generally assumed that the factorial ANOVA is an analysis of dependencies. It is referred to as such because it tests an assumed cause-effect relationship between the two or more independent variables and the dependent variable. In more statistical terms, it tests the effect of one or more independent variables on one dependent variable. It assumes an effect of Y = f(x1, x2, ..., xn).
The factorial ANOVA is closely related to both the one-way ANOVA (which we already discussed) and the MANOVA (Multivariate Analysis of Variance). Whereas the one-way ANOVA has exactly one independent variable, the factorial ANOVA can have two or more independent variables; both always have only one dependent variable. The MANOVA, on the other hand, can have two or more dependent variables.
The table helps to quickly identify the right Analysis of Variance to choose in different scenarios. The
factorial ANOVA should be used when the research question asks for the influence of two or more
independent variables on one dependent variable.
Examples of typical questions that are answered by the ANOVA are as follows:
Medicine - Does a drug work? Does the average life expectancy differ significantly between the 3 x 2 groups that received the drug versus the established product versus the control, and a high dose versus a low dose?
Sociology - Are rich people living in the countryside happier? Do different income classes report a significantly different satisfaction with life, also comparing living in urban versus suburban versus rural areas?
Management Studies - Which brands from the BCG matrix have a higher customer loyalty? The BCG matrix measures brands in a brand portfolio by their business growth rate (high versus low) and their market share (high versus low). To which brand are customers more loyal: stars, cash cows, dogs, or question marks?
Do gender and passing the exam have an influence on how well a student scored on the standardized math test?
This question indicates that the dependent variable is the score achieved on the standardized math tests
and the two independent variables are gender and the outcome of the final exam (pass or fail).
The factorial ANOVA is part of the SPSS GLM procedures, which are found in the menu Analyze/General
Linear Model/Univariate.
The F-test of the factorial ANOVA does not tell us which of the means in our design are different, or if indeed they are different. In order to do this, post hoc tests would be needed. If you want to include post hocs, a good test to use is the Student-Newman-Keuls test (S-N-K for short). The SNK pools the groups that do not differ significantly from each other, thereby improving the reliability of the post hoc comparison by increasing the sample size used in the comparison. Another advantage is that it is simple to interpret.
The Options dialog allows us to add descriptive statistics, the Levene Test and the practical significance
(estimated effect size) to the output and also the mean comparisons.
The Contrasts dialog in the GLM procedure allows us to group multiple groups into one and test the average mean of the pooled groups against a third group. Please note that the contrast is not always the mean of the pooled groups: Contrast = (mean first group + mean second group)/2. It is only equal to the pooled mean if the groups are of equal size. In our example we do without contrasts.
And finally, the Plots dialog allows us to add profile plots for the main and interaction effects to our factorial ANOVA.
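Pasting these dialog choices produces UNIANOVA syntax roughly along the following lines; the variable names math, gender, and exam are placeholders, and SPSS only computes SNK post hocs for factors with three or more levels:

* Factorial ANOVA of the math score by gender and exam.
* SPSS computes the SNK post hocs only for factors with three or more levels.
UNIANOVA math BY gender exam
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=gender exam(SNK)
  /PLOT=PROFILE(exam*gender)
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=gender exam gender*exam.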
The first table of the output shows the Levene test. In our example the test is significant, so we cannot assume that the error variance is homogenous. Although this violates the ANOVA assumption, we proceed with the test.
The next table shows the results of the factorial ANOVA. The ANOVA splits the total variance (Type III
Sum of Squares) into explained variance (between groups) and unexplained variance (within groups),
where the variance is Var = sum of squares / df. The F-value is then the F = Varb / Varw. The variance is
split into explained and unexplained parts for the main effects of each factor and for the interaction
effect of the factors in the analysis.
If both effects are significant, a marginal means plot is used to illustrate the different effect sizes. While the tables are hard to read quickly, charting them in Excel helps to understand the effect sizes of our
factorial ANOVA. The slope of the line of the factor Exam is steeper than the slope of the factor Gender,
thus in our factorial ANOVA model exam has a larger impact than gender on the dependent variable
math score.
In summary, we can conclude:
The factorial ANOVA shows that a significant difference exists between the average math scores achieved by students who passed the exam and students who failed the final exam. However, there is no significant effect of gender, nor of the interaction between gender and the outcome of the final exam.
Factorial ANCOVA
In basic terms, the factorial ANCOVA looks at the influence of two or more independent variables on a dependent variable while removing the effect of one or more covariates. The ANCOVA first conducts a regression of the dependent variable on the covariate(s). The residuals (the unexplained variance in the regression model) are then subject to an ANOVA. Thus the ANCOVA tests whether the independent variables still influence the dependent variable after the influence of the covariate(s) has been removed.
The factorial ANCOVA includes more than one independent variable, and it can include more than one covariate (SPSS handles up to ten). If the ANCOVA model has more than one covariate, it is possible to run the factorial ANCOVA with contrasts and post hoc tests, just like the one-way ANCOVA or the ANOVA, to identify the influence of each covariate.
The factorial ANCOVA is most useful in two ways: 1) it explains a factorial ANOVA's within-group variance, and 2) it controls confounding factors.
First, the analysis of variance splits the total variance of the dependent variable into:
- Variance explained by each of the independent variables (the main effects),
- Variance explained by the independent variables together (the interaction effect), and
- Unexplained (within-group) variance.
The factorial ANCOVA looks at the unexplained variance and tries to explain some of it with the
covariate(s). Thus it increases the power of the factorial ANOVA by explaining more variability in the
model. [Note that just like in regression analysis and all linear models, over-fitting might occur. That is,
the more covariates you enter into the factorial ANCOVA the more variance it will explain, but the fewer
degrees of freedom the model has. Thus entering a weak covariate into the factorial ANCOVA decreases
the statistical power of the analysis instead of increasing it.]
Secondly, the factorial ANCOVA eliminates the covariates' effect on the relationship between the independent variables and the dependent variable, which is tested with a factorial ANOVA. The concept is very similar to the partial correlation analysis. Technically it is a semi-partial regression and correlation.
The factorial ANCOVA needs at least four variables (the simplest case, with two factors, is called a two-way ANCOVA):
- Two or more independent variables, which group the cases into four or more groups. The independent variables have to be at least of nominal scale.
- The dependent variable, which the independent variables influence. It has to be of continuous-level scale (interval or ratio data).
- The covariate, also referred to as the confounding factor or concomitant variable, which moderates the impact of the independent variables on the dependent variable. The covariate needs to be a continuous-level variable (interval or ratio data). The ANCOVA covariate is often a pre-test value or a baseline.
Medicine - Does a drug work? Does the average life expectancy significantly differ between
the three groups that received the drug versus the established product versus the control
and accounting for the dose (high/low)? This question can be answered with a factorial
ANOVA. The factorial ANCOVA allows additional control of covariates that might influence
the outcome but have nothing to do with the drug, for example healthiness of lifestyle, risk
taking activities, age.
Sociology - Are rich people living in the countryside happier? Do different income classes
report a significantly different satisfaction with life when looking where they live (urban,
suburban, rural)? This question can be answered with a factorial ANOVA. Additionally the
factorial ANCOVA controls for confounding factors that might influence satisfaction with life,
e.g., marital status, job satisfaction, social support system.
Management Studies - Which brands from the BCG matrix have a higher customer loyalty?
The BCG matrix measures brands in a brand portfolio with their business growth rate
(high/low) and their market share (high/low). A factorial ANOVA answers the question of which brand customers are more loyal to: stars, cash cows, dogs, or question marks. And a
factorial ANCOVA can control for confounding factors, like satisfaction with the brand or
appeal to the customer.
We return to the research question from the chapter on the factorial ANOVA. This time we want to
know if gender and the outcome of the final exam (pass /fail) have an influence on the math score when
we control for the reading ability as measured by the score of the standardized reading test.
The Dependent Variable is the Math Test, and the
Covariate is the Reading Test.
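A syntax sketch of this factorial ANCOVA, again with placeholder variable names (math, gender, exam, reading):

* Factorial ANCOVA of the math score by gender and exam, controlling for the reading score.
UNIANOVA math BY gender exam WITH reading
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=reading gender exam gender*exam.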
The next table shows the Factorial ANCOVA results. The goodness of fit of the model indicates that the
covariate reading test score (p = 0.002) and the direct effects of exam (p<0.001) are significant, while
neither the direct effect of gender (p = 0.274) nor the interaction effect gender * exam (p = 0.776) are
significant.
The practical significance of each of the variables in our factorial ANCOVA model is displayed as Partial Eta Squared, which is the partial variance explained by that variable. Eta squared ranges from 0 to 1, where 0 indicates no explanatory power and 1 indicates perfect explanatory power. Eta squared is useful for comparing the explanatory power of different variables, especially when designing experiments or questionnaires; variables can be selected for inclusion on the basis of this analysis.
In our example we see that the two non-significant effects, gender and gender * exam, have a very small partial eta squared (< 0.01). Between the two significant effects (the covariate and the outcome of the exam), the exam has a higher explanatory power than the reading test score.
The factorial ANCOVA shows that a significant difference exists between the average math
scores achieved by students who passed the exam and students who failed the final exam when
controlling for the reading ability as measured by the score achieved in the standardized reading
test. However, there is no significant effect of gender, nor of the interaction between gender and the outcome of the final exam.
One-Way MANOVA
What is the One-Way MANOVA?
MANOVA is short for Multivariate ANalysis Of Variance. The main purpose of a one-way MANOVA is to test whether two or more groups differ from each other significantly on two or more characteristics at once. Recall that a one-way ANOVA has one independent variable that splits the sample into two or more groups, whereas the factorial ANOVA has two or more independent variables that split the sample into four or more groups; both have a single dependent variable. A MANOVA, in contrast, has one or more independent variables and two or more dependent variables.
For some statisticians the MANOVA doesn't only compare differences in mean scores between multiple
groups but also assumes a cause effect relationship whereby one or more independent, controlled
variables (the factors) cause the significant difference of one or more characteristics. The factors sort
the data points into one of the groups causing the difference in the mean value of the groups.
Example:
A research team wants to test the user acceptance with a new online travel booking tool. The
team conducts a study where they assign 30 randomly chosen people into 3 groups. The first
group needs to book their travel through an automated online-portal; the second group books
over the phone via a hotline; the third group sends a request via the online-portal and receives a
call back. The team measures user acceptance as the behavioral intention to use the system; they measure the latent construct behavioral intention with three variables: ease of use, perceived usefulness, and effort to use.
In the example, conservative statisticians would argue that the MANOVA can only find differences in the behavioral intention to use the system between the groups. Other statisticians, however, argue that you can establish a causal relationship between the channel used and the behavioral intention for future use. It is generally assumed that the MANOVA is an analysis of dependencies. It is referred to as such because it tests an assumed cause-effect relationship between two or more independent variables and two or more dependent variables. In more statistical terms, it tests the effect of one or more independent variables on two or more dependent variables.
When faced with a question similar to the one in our example, you could also try to run three separate ANOVAs, testing the influence of the independent variable (the booking channel) on each of the three dependent variables (ease of use, perceived usefulness, effort to use) individually. However, running multiple ANOVAs does not account for the full variability in all three dependent variables, and thus the test has less power than the MANOVA.
Another thing you might want to try is running a factor analysis on the three dependent variables and then running a factorial ANOVA on the resulting factor. The factor analysis reduces the variance within the three dependent variables to one factor, so this procedure also has less power than the MANOVA.
A third approach would be to conduct a discriminant analysis and switch the dependent and
independent variables. That is the discriminant analysis uses the three groups (online, phone, call back)
as the dependent variable and identifies the significantly discriminating variables from the list of
continuous-level variables (ease of use, perceived usefulness, effort to use).
Mathematically, the MANOVA is fully equivalent to the discriminant analysis; the difference consists of a switching of the independent and dependent variables. Both the MANOVA and the discriminant analysis are a series of canonical regressions. The MANOVA is therefore the best test to use when conducting experiments with latent variables. This is due to the fact that it only requires a nominal scale for the independent variables, which typically represent the treatment, while it includes multiple continuous-level dependent variables, which typically measure one latent (not directly observable) construct.
The MANOVA is much like the one-way ANOVA and the factorial ANOVA: the one-way ANOVA has exactly one independent and one dependent variable, and the factorial ANOVA can have two or more independent variables but always has only one dependent variable. The MANOVA, on the other hand, can have two or more dependent variables.
The following table helps to quickly identify the right analysis of variance to choose in different
scenarios.
Sociology - Are rich people living in the country side happier? Do they enjoy their lives more
and have a more positive outlook on their futures? Do different income classes report a
significantly different satisfaction, enjoyment and outlook on their lives? Does the area in
which they live (suburbia/city/rural) affect their happiness and positive outlook?
Management Studies - Which brands from the BCG matrix have a higher customer loyalty, brand appeal, and customer satisfaction? The BCG matrix measures brands in a brand portfolio by their business growth rate (high/low) and their market share (high/low). To which brand are customers more loyal, more attracted, and more satisfied: stars, cash cows, dogs, or question marks?
Do gender and the outcome of the final exam influence the standardized test scores of math,
reading, and writing?
The research question indicates that this analysis has multiple independent variables (exam and gender)
and multiple dependent variables (math, reading, and writing test scores). We will skip the check for
multivariate normality of the dependent variables; the sample we are going to look at has some
violations of the assumption set forth by the MANOVA.
The MANOVA can be found in SPSS in Analyze/General Linear Model/Multivariate, which opens the
dialog for multivariate GLM procedure (that is GLM with more than one dependent variable). The
multivariate GLM model is used to specify the MANOVAs.
To answer our research question we need to specify a full-factorial model that includes the test scores for math, reading, and writing as dependent variables, plus the independent variables gender and exam, which represent the fixed factors in our research design.
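A syntax sketch of this MANOVA specification, with placeholder variable names:

* MANOVA of the math, reading, and writing scores by gender and exam.
GLM math reading writing BY gender exam
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=gender exam gender*exam.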
As in the other GLM procedures, the Contrasts dialog allows us to pool groups; the contrast is only equal to the pooled mean if the groups are of equal size. In our example we do without contrasts.
Lastly, the Plots dialog allows us to add profile plots for the main and interaction effects to our MANOVA. However, it is easier to create the marginal means plots that are typically reported in academic journals in Excel.
The next table shows the overall model tests of significance. Although Wilks' Lambda is typically used to measure the overall goodness of fit of the model, SPSS computes other measures as well. In our example we see that exam has a significant influence on the dependent variables, while neither gender nor the interaction effect of gender * exam has a significant influence on the dependent variables.
The next table is the result of the Levene Test of homogeneity of error variances. In the case of a MANOVA the Levene Test technically tests for the homogeneity of the error variances, that is, the variability in the error of measurement along the scale. As we can see, the test is not significant for two of the three dependent variables (p = 0.000, 0.945, and 0.524). Therefore we cannot reject the null hypothesis that the error variance is homogenous for reading and writing, whereas for the math score the assumption is violated.
The next table shows the results of the MANOVA. The MANOVA extracts the roots of the dependent variables and then basically runs a factorial ANOVA on the roots of the dependent variables for all independent variables. The MANOVA splits their total variance into explained variance (between groups) and unexplained variance (within groups), where the variance is Var = sum of squares / df. The F-value is then F = Varb / Varw. This is done for the main effects each factor has on its own and for the interaction effect of the factors.
In our example the MANOVA shows that the exam (pass versus fail) has a significant influence on the
math, reading, and writing scores, while neither gender nor the interaction effect between gender and
exam have a significant influence on the dependent variables. The MANOVA's F-test tests the null
hypothesis that the mean scores are equal, which is the same as saying that there is no effect on the
dependent variable.
The MANOVA shows that the outcome of the final exam significantly influences the standardized
test scores of reading, writing, and math. All scores are significantly higher for students passing
the exam. However no significant difference for gender or the interaction effect of exam and
gender could be found in our sample.
One-Way MANCOVA
In basic terms, the MANCOVA looks at the influence of one or more independent variables on two or more dependent variables while removing the effect of one or more covariates. To do that the One-Way MANCOVA first conducts a regression of the dependent variables on the covariates; it thereby eliminates the influence of the covariates from the analysis. Then the residuals (the unexplained variance in the regression model) are subject to a MANOVA, which tests whether the independent variable still influences the dependent variables after the influence of the covariate(s) has been removed. The One-Way MANCOVA includes one independent variable and two or more dependent variables, and it can include more than one covariate; SPSS handles up to ten. If the One-Way MANCOVA model has more than one covariate, it is possible to run the MANCOVA with contrasts and post hoc tests, just like the one-way ANCOVA or the ANOVA, to identify the strength of the effect of each covariate.
The One-Way MANCOVA is most useful for two things: 1) explaining a MANOVA's within-group variance, and 2) controlling confounding factors. Firstly, as explained in the section on the MANOVA, the analysis of variance splits the total variance of the dependent variables into:
- Variance explained by the independent variable (the between-groups variance), and
- Unexplained variance (the within-groups variance).
The One-Way MANCOVA looks at the unexplained variance and tries to explain some of it with the
covariate(s). Thus it increases the power of the MANOVA by explaining more variability in the model.
[Note that just like in regression analysis and all linear models over-fitting might occur. That is, the
more covariates you enter into the MANCOVA the more variance will be explained, but the fewer degrees
of freedom the model has. Thus entering a weak covariate into the One-Way MANCOVA decreases the
statistical power of the analysis instead of increasing it.]
Secondly, the One-Way MANCOVA eliminates the covariates' effects on the relationship between the independent variable and the dependent variables, an effect that is typically tested using a MANOVA. The concept is very similar to the concept behind partial correlation analysis; technically a MANCOVA is a semi-partial regression and correlation.
The One-Way MANCOVA needs at least the following variables:
- One independent variable, which groups the cases into two or more groups, i.e., it has two or more factor levels. The independent variable has to be at least of nominal scale.
- Two or more dependent variables, which the independent variable influences. The dependent variables have to be of continuous-level scale (interval or ratio data). Also, they need to be homoscedastic and multivariate normal.
- One or more covariates, also called confounding factors or concomitant variables. These variables moderate the impact of the independent factor on the dependent variables. The covariates need to be continuous-level variables (interval or ratio data). The One-Way MANCOVA covariate is often a pre-test value or a baseline.
Does the score achieved in the standardized math, reading, and writing test depend on the
outcome of the final exam, when we control for the age of the student?
This research question means that the three test scores are the dependent variables, the outcome of
the exam (fail vs. pass) is the independent variable and the age of the student is the covariate factor.
A click on the menu entry Analyze/General Linear Model/Multivariate brings up the GLM dialog, which allows us to specify any linear model. For the MANCOVA design we need to add the independent variable (exam) to the list of fixed factors.
[Remember that the factor is fixed, if it is deliberately manipulated and not just randomly drawn from a
population. In our MANCOVA example this is the case. This also makes the ANCOVA the model of choice
when analyzing semi-partial correlations in an experiment, instead of the partial correlation analysis
which requires random data.]
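For our One-Way MANCOVA (the three test scores by exam, controlling for age), the pasted syntax looks roughly like this; the variable names are placeholders for those in your own data file:

* One-Way MANCOVA of the three test scores by exam outcome, controlling for age.
GLM math reading writing BY exam WITH age
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /DESIGN=age exam.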
If the MANCOVA were a factorial MANCOVA and not a One-Way MANCOVA, i.e., if it included more than one independent variable, you could choose to compare the main effects of those independent variables. The MANCOVA output would then include multiple ANOVAs that compare the factor levels of the independent variables. However, even if we adjust the confidence interval using the Bonferroni method, conducting multiple pairwise ANOVAs will multiply the error terms. Thus this method of testing main effects is typically not used anymore, and has been replaced by multivariate tests, e.g., Wilks' Lambda.
The Levene test checks the null hypothesis that the variances are equal. We may assume that the data of the reading and writing tests is homoscedastic in all cells. For the math test, however, we may not assume homoscedasticity. For large samples, the t-tests and F-tests that follow are somewhat robust against this problem. Again, a log transformation or centering the data on the mean might rectify the problem.
The next table finally shows the MANCOVA results. The direct effects of the exam are significant for all three tests (math, reading, and writing). The effect of the covariate age is not significant (p > 0.05; p = 0.665, 0.070, and 0.212 for the three tests).
The MANCOVA shows that the outcome of the final exam significantly influences the
standardized test scores of reading, writing, and math. All scores are significantly higher for
students passing the exam than for the students failing the exam. When controlling for the age of the student, however, we find no significant effect of the covariate factor and no changes in the external validity of the influence of the exam outcome on the test scores.
Repeated Measures ANOVA
Example:
A research team wants to test the user acceptance of a new online travel booking tool. The
team conducts a study where they assign 30 randomly chosen people into two groups. One
group uses the new system and another group acts as a control group and books its travel via
phone. The team measures the user acceptance of the system as the behavioral intention to use
the system in the first four weeks after it went live. Since user acceptance is a latent behavioral construct, the researchers measure it with three items: ease of use, perceived usefulness, and effort to use.
When faced with a question similar to the one in our example, you could also try to run 4 MANOVAs,
testing the influence of the independent variables on each of the observations of the four weeks.
Running multiple ANOVAs, however, does not account for individual differences in baselines of the
participants of the study.
The repeated measures ANOVA is similar to the dependent sample t-test, because it also compares mean scores across different observations of the same cases. It is necessary for the repeated measures ANOVA that the cases in one observation are directly linked with the cases in all other observations. This automatically happens when repeated measures are taken, or when analyzing similar units or comparable specimens.
Pairing observations or taking repeated measurements is very common when conducting experiments or making observations with time lags. Pairing the measured data points is typically done in order to exclude any confounding or hidden factors (cf. partial correlation). It is also often used to account for individual differences in the baselines, such as pre-existing conditions in clinical research. Consider for example a drug trial where the participants have individual differences that might have an impact on the outcome of the trial. The typical drug trial splits all participants into a control and a treatment group and measures the effect of the drug in months 1 to 18. The repeated measures ANOVA
can correct for the individual differences or baselines. The baseline differences that might have an effect on the outcome could be typical parameters like blood pressure, age, or gender. Thus the repeated measures ANOVA analyzes the effect of the drug while excluding the influence of different baseline levels of health when the trial began.
Since the pairing is explicitly defined and thus new information added to the data, paired data can
always be analyzed with a regular ANOVA as well, but not vice versa. The baseline differences, however,
will not be accounted for.
A typical guideline to determine whether the repeated measures ANOVA is the right test is to answer the following questions:
- Is there a direct relationship between each pair of observations, e.g., before vs. after scores on the same subject?
- Are the observations of the data points definitely not random (i.e., they must not be a randomly selected specimen of the same population)?
If the answer is yes to these questions, the repeated measures ANOVA is the right test. If not, use the ANOVA or the t-test. In statistical terms the repeated measures ANOVA requires that the within-group variation, which is a source of measurement errors, can be identified and excluded from the analysis.
The repeated measures ANOVA can be found in SPSS in the menu Analyze/General Linear Model/Repeated Measures.
The dialog box that opens is different from the GLM module you might know from the MANOVA. Before specifying the model we need to group the repeated measures.
Since our example does not have an independent variable, the post hoc tests and contrasts are not needed to compare individual differences between levels of a between-subjects factor. We also go with the default option of the full factorial model (in the Model dialog box). If you were to conduct a post hoc test, SPSS would run a couple of pairwise dependent samples t-tests. We only add some useful statistics to the output in the Options dialog.
Technically we only need the Levene test for homoscedasticity when we include at least one independent variable in the model. However, it is checked here out of habit so that we don't forget to select it for the other GLM procedures we run.
It is also quite useful to include the descriptive statistics, because we have not yet compared the
longitudinal development of the five administered aptitude tests.
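A syntax sketch of this repeated measures ANOVA; the five aptitude measurements are assumed to be stored in variables apt1 to apt5 (placeholder names):

* Repeated measures ANOVA with one within-subjects factor built from the five aptitude measurements.
* EMMEANS adds the Bonferroni-adjusted pairwise comparisons of the observations.
GLM apt1 apt2 apt3 apt4 apt5
  /WSFACTOR=Aptitude_Tests 5 Polynomial
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(Aptitude_Tests) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Aptitude_Tests.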
The next table shows the results of the regression modeling the GLM procedure conducts. Since our rather simple example of a repeated measures ANOVA does not include any regression component, we can skip this table.
One of the key assumptions of the repeated measures ANOVA is sphericity. Sphericity is a measure of the structure of the covariance matrix in repeated designs. Because repeated designs violate the assumption of independence between measurements, the covariances need to be spherical. One stricter form of sphericity is compound symmetry, which occurs if all the covariances are approximately equal and all the variances are approximately equal in the samples. Mauchly's sphericity test tests this assumption. If there is no sphericity in the data, the repeated measures ANOVA can still be done when the F-values are corrected by adjusting the degrees of freedom, e.g., with the Greenhouse-Geisser or Huynh-Feldt correction.
Mauchly's Test tests the null hypothesis that the error covariance of the orthonormalized transformed dependent variable is proportional to an identity matrix. In other words, the relationship between different observation points is similar; the differences between the observations have equal variances. This assumption is similar to homoscedasticity (tested by the Levene Test), which assumes equal variances between groups, not observations. In our example, the assumption of sphericity has not been met because Mauchly's Test is significant. This means that the F-values of our repeated measures ANOVA are likely to be too large. This can be corrected by decreasing the degrees of freedom used. The last three columns (Epsilon) tell us the appropriate correction method to use: if epsilon is greater than 0.75, the Huynh-Feldt correction is typically used; otherwise the more conservative Greenhouse-Geisser correction is recommended. SPSS automatically reports the corrected F-values in the F-statistics table of the repeated measures ANOVA.
The next table shows the F statistics (also called the within-subjects effects). As discussed earlier, the assumption of sphericity has not been met and thus the degrees of freedom in our repeated measures ANOVA need to be decreased. The table shows that the differences in our repeated measures are significant at a level of p < 0.001. The table also shows that the Greenhouse-Geisser correction has decreased the degrees of freedom from 4 to 2.831.
Thus we can reject the null hypothesis that the repeated measures are equal and we might assume that our repeated measures are different from each other. Since the repeated measures ANOVA only conducts a global F-test, the pairwise comparison table helps us find the significant differences in the observations. Here we find that the first aptitude test is significantly different from the second and third; the second is only significantly different from the first, the fourth, and the fifth; and so on.
Repeated Measures ANCOVA
Example:
A research team wants to test the user acceptance of a new online travel booking tool. The team conducts a study where they assign 30 randomly chosen people into two groups. One group uses the new system and the other group acts as a control group and books its travel via phone. The team also records the self-reported computer literacy of each user.
The team measures the user acceptance of the system as the behavioral intention to use the system in the first four weeks after it went live. Since user acceptance is a latent behavioral construct, the researchers measure it with three items: ease of use, perceived usefulness, and effort to use. They now use the repeated measures ANCOVA to find out whether the weekly measurements differ significantly from each other and whether the treatment and control group differ significantly from each other, all while controlling for the influence of computer literacy.
When faced with a question similar to the one in our example, you could also try to run 4 MANCOVAs to test the influence of the independent variables on each of the observations of the four weeks while controlling for the covariate. Keep in mind that running multiple analyses does not account for individual differences in the baselines of the participants of the study. Technically, the assumption of independence is violated, because the measurements of week two are not completely independent of the measurements from week one.
The repeated measures ANCOVA is similar to the dependent sample t-test and the repeated measures ANOVA, because it also compares mean scores across different observations of the same cases. It is necessary for the repeated measures ANCOVA that the cases in one observation are directly linked with the cases in all other observations. This automatically happens when repeated measures are taken, or when analyzing similar units or comparable specimens.
Both strategies (pairing of observations or making repeated measurements) are very common when
conducting experiments or making observations with time lags. Pairing the observed data points is typically done to exclude any confounding or hidden factors (cf. partial correlation). It is also used to
account for individual differences in the baselines, for example pre-existing conditions in clinical
research. Consider the example of a drug trial where the participants have individual differences that
might have an impact on the outcome of the trial. The typical drug trial splits all participants into a
control and the treatment group and measures the effect of the drug in months 1 - 18. The repeated
measures ANCOVA can correct for the individual differences or baselines. The baseline differences that
might have an effect on the outcome could be a typical parameter like blood pressure, age, or gender.
Not only does the repeated measures ANCOVA account for difference in baselines, but also for effects of
confounding factors. This allows the analysis of interaction effects between the covariate, the time and
the independent variables' factor levels.
Since the pairing is explicitly defined and thus new information added to the data, paired data can
always be analyzed with a regular ANCOVA, but not vice versa. The baseline differences, however, will
not be accounted for.
Are there individual differences in the longitudinal measures of aptitude between students who passed the final exam and the students who failed the final exam, when we control for the mathematical abilities as measured by the standardized math test score?
The repeated measures ANCOVA uses the GLM module of SPSS, like the factorial ANOVAs, MANOVAs, and MANCOVAs. The repeated measures ANCOVA can be found in SPSS in the menu Analyze/General Linear Model/Repeated Measures.
The dialog box that opens is different from the GLM module you might know from the MANCOVA. Before specifying the model we need to group the repeated measures.
This is done by creating a within-subject factor. It is called a within-subject factor of our repeated measures ANCOVA because it represents the different observations of one subject. We measured the aptitude on five different data points, which creates five factor levels. We specify a factor called Aptitude_Tests with five factor levels (that is, the number of our repeated observations).
As usual we go with the standard settings in the Model, Contrasts, Plots, and Save dialogs. Also note that post hoc tests are disabled because of the inclusion of a covariate in the model.
We simply add some useful statistics to the repeated measures ANCOVA output in the Options dialog. These include the comparison of main effects with adjusted degrees of freedom, some descriptive statistics, the practical significance (eta squared), and the Levene test for homoscedasticity, since we included Exam as an independent variable in the analysis.
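A syntax sketch of this repeated measures ANCOVA, assuming the five aptitude measurements are stored in apt1 to apt5 and the covariate is the math test score (all variable names are placeholders):

* Repeated measures ANCOVA: five aptitude measurements, between-subjects factor exam, covariate math.
GLM apt1 apt2 apt3 apt4 apt5 BY exam WITH math
  /WSFACTOR=Aptitude_Tests 5 Polynomial
  /METHOD=SSTYPE(3)
  /EMMEANS=TABLES(Aptitude_Tests) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Aptitude_Tests
  /DESIGN=math exam.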
The second table of the repeated measures ANCOVA shows the descriptive
statistics (mean, standard deviation, and sample size) for each cell in our
analysis design.
The next table shows the results of the regression modeling the GLM procedure conducts. Regression is used to test the significance of the factor effects. The analysis finds that the aptitude tests do not have a significant influence in the covariate regression model; that is, we cannot reject the null hypothesis that the mean scores are equal across observations. We find only that the interaction of the repeated tests with the independent variable (exam) is significant.
One of the special assumptions of repeated designs is sphericity. Sphericity is a measure of the structure of the covariance matrix in repeated designs. Because repeated designs violate the assumption of independence between measurements, the covariances need to be spherical. One stricter form of sphericity is compound symmetry, which occurs if all the covariances are approximately equal and all the variances are approximately equal in the samples. Mauchly's sphericity test tests this assumption. If there is no sphericity in the data, the repeated measures ANOVA can still be done when the F-values are corrected by adjusting the degrees of freedom (e.g., with the Greenhouse-Geisser or Huynh-Feldt correction).
Mauchly's Test analyzes whether this assumption is fulfilled. It tests the null hypothesis that the error
covariance of the orthonormalized transformed dependent variable is proportional to an identity matrix.
In simpler terms, the relationship between different observation points is similar; the differences
between the observations have equal variances. This assumption is similar to homoscedasticity (tested
by the Levene Test) which assumes equal variances between groups, not observations.
In our example, the assumption of sphericity has not been met, because the Mauchly's Test is highly
significant. This means that the F-values of our repeated measures ANOVA are likely to be too large.
This can be corrected by decreasing the degrees of freedom used to calculate F. The last three columns (Epsilon) tell us the appropriate correction method to use: if epsilon is greater than 0.75 we can use the Huynh-Feldt correction; otherwise the more conservative Greenhouse-Geisser correction is recommended. SPSS automatically includes the corrected F-values in the F-statistics table of the repeated measures ANCOVA.
The next table shows the F statistics. As discussed earlier, the assumption of sphericity has not been met and thus the degrees of freedom in our repeated measures ANCOVA need to be decreased using the
Huynh-Feldt correction. The results show that neither the aptitude test scores nor the interaction effect
of the aptitude scores with the covariate factor are significantly different. The only significant factor in
the model is the interaction between the repeated measures of the aptitude scores and the
independent variable (exam) on a level of p = 0.005.
Thus we cannot reject the null hypothesis that the repeated measures are equal when controlling for
the covariate and we might unfortunately not assume that our repeated measures are different from
each other when controlling for the covariate.
The last two tables we reviewed ran a global F-test. The next tables look at individual differences between subjects and measurements. First, the Levene test is not significant for all repeated measures but the first one; thus we cannot reject our null hypothesis and might assume equal variances in the cells of our design. Secondly, we find that in our linear repeated measures ANCOVA model the covariate factor levels (Test_Score) are not significantly different (p = 0.806), and also that the exam factor levels (pass vs. fail) are not significantly different (p = 0.577).
The last output of our repeated measures ANCOVA is the pairwise tests. The pairwise comparisons between groups are meaningless since the groups are not globally different to begin with; the interesting table is the pairwise comparison of observations. It is here that we find that in our ANCOVA model tests 1, 2, and 3 differ significantly from each other, as well as tests 2 and 3 compared to tests 4 and 5, when controlling for the covariate.
During the fieldwork five repeated aptitude tests were administered to the students. We
analyzed whether the differences between the five repeated measures are significant and
whether they are significant between the students who passed the final exam and the students
who failed the final exam when we controlled for their mathematical ability as measured by the
standardized math test. The repeated measures ANCOVA shows that the achieved aptitude scores do not differ significantly between the repeated measures or between the groups of students. However, a pairwise comparison identifies aptitude tests 1, 2, and 3 as still being significantly different from each other when controlling for the students' mathematical abilities.
Profile Analysis
When a psychologist administers a personality test (e.g., the NEO), the respondent gets a test profile in return showing the scores on the Neuroticism, Extraversion, Agreeableness, Conscientiousness, and Openness dimensions. Similarly, many tests such as the GMAT, GRE, SAT, and various intelligence questionnaires report profiles for abilities in reading, writing, calculating, and critical reasoning.
Typically test scores are used to predict behavioral items. In education studies it is common to predict
test performance, for example using the SAT to predict the college GPA when graduating. Cluster
analysis and Q-test have been widely used to build predictive models for this purpose.
What is the purpose of Profile Analysis? Profile Analysis helps researchers to identify whether two or more groups of test takers show significantly distinct profiles. It helps to analyze patterns of tests, subtests, or scores. The analysis may be across groups or across scores for one individual.
What does that mean? The profile analysis looks at profile graphs. A profile graph simply plots the mean scores of one group of test takers against those of the other group of test takers along all items in the battery.
The main purpose of the profile analysis is to identify how good a test is. Typically the tests consist of
multiple item measurements and are administered over a series of time points. You could use a simple
ANOVA to compare the test items, but this violates the independence assumption in two very important ways. Firstly, the scores on each item are not independent; item batteries are deliberately designed to have a high correlation among each other. Secondly, if you design a test to predict group membership (e.g., depressed vs. not depressed, likely to succeed vs. not likely to succeed in college), you want the item battery to best predict the outcome. Thus the item battery and group membership are also not independent.
What is the solution to this problem? Since neither the single measurements on the items nor the group membership are independent, they need to be treated as a paired sample. Statistically the Profile Analysis is similar to a repeated measures ANOVA.
Example:
A research team wants to create a new test for a form of cancer that seems to present in
patients with a very specific history and diet. The researchers collect data on ten questions from
patients that present with the cancer and a randomly drawn sample of people who do not
present with the cancer.
Profile Analysis is now used to check whether the ten questions significantly differentiate between the group that presents with the illness and the group that does not. Profile analysis takes into account that neither the items among each other nor the assignment of subjects to groups are independent.
Profile Analysis is also a great way to understand and explore complex data. The results of the profile
analysis help to identify and focus on the relevant differences and help the researcher to select the right
contrasts, post hoc analysis, and statistical tests when a simple ANOVA or t-test would not suffice.
However profile analysis has its limitations, especially when it comes to standard error of measurement
and predicting a single person's score.
Alternatives to the Profile Analysis are Multidimensional Scaling and Q-Analysis. In Q-Analysis the
scores of an individual on the item battery are treated as an independent block (just as in Profile
Analysis). The Q-Analysis then conducts a rotated factor analysis on these blocks, extracting relevant
factors and flagging the items that define a factor.
Another alternative to Profile Analysis is a two-way MANOVA (or doubly MANOVA). In this design the
repeated measures would enter the model as the second dependent variable and thus the model
elegantly circumvents the sphericity assumption.
Do the students who passed the final exam and the students who failed the final exam have a
significantly different ranking in their math, reading, and writing test?
The Profile Analysis uses the repeated measures GLM module of SPSS, like the repeated measures
ANOVA and ANCOVA. The Profile Analysis can be found in SPSS in the menu Analyze/General Linear Model/Repeated Measures.
The dialog box that opens is different from the GLM module for independent measures. Before specifying the model we need to group the repeated measures, that is, the item battery we want to test. In our example we want to test whether the standardized test, which consists of three items (math, reading, writing), correctly classifies the two groups of students that either pass or fail the final exam.
The next dialog box allows us to specify the Profile Analysis. First we need to add the three test items to the list of within-subjects variables. We then add the exam variable to the list of between-subjects factors. We can leave all other settings on default, apart from the plots.
To create the profile plots we want the items (or subtests) on the horizontal axis with the groups as separate lines. We also need the Levene test for homoscedasticity to check the assumptions of the Profile Analysis; the Levene test can be included in the Options dialog.
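A syntax sketch of this Profile Analysis, with the three subtests treated as levels of a within-subjects factor (variable names are placeholders):

* Profile Analysis: the three subtests form the within-subjects factor, exam is the grouping factor.
GLM math reading writing BY exam
  /WSFACTOR=TestItem 3 Polynomial
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(TestItem*exam)
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=TestItem
  /DESIGN=exam.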
Box's M Test verifies the assumption that covariance matrices of each cell in our Profile Analysis design
are equal. Box's M is significant (p < 0.001); thus we reject the null hypothesis and might not assume homogenous covariance structures. Also we can verify that sphericity might not be assumed,
since the Mauchly's Test is significant (p = 0.003). Sphericity is a measure for the structure of the
covariance matrix in repeated designs. Because repeated designs violate the assumption of independence between measurements, the covariances need to be spherical. One stricter form of
sphericity is compound symmetry, which occurs if all the covariances are approximately equal and all
the variances are approximately equal in the samples. Mauchly's sphericity test tests this assumption. If
there is no sphericity in the data, the repeated measures ANOVA can still be done when the F-values are
corrected by deducting additional degrees of freedom (e.g., Greenhouse-Geisser or Huynh-Feldt). Thus
we need to correct the F-values when testing the significance of the main and interaction effects. The
epsilon is greater than 0.75, thus we can work with the less conservative Huynh-Feldt correction.
The first results table of our Profile Analysis shows the within-subjects effects. It tells us that the items that build our standardized test are significantly different from each other and also that the interaction effect between passing the exam and the standardized test items is significant. However, the Profile Analysis does not tell us how many items differ or in which direction they differ.
The Profile Analysis shows a highly significant between-subjects effect. This indicates that the two student groups differ significantly on the average of all factor levels of the standardized test (p < 0.001). We can conclude that the factor levels of the exam variable are significantly different. However, we cannot say in which direction they differ (e.g., whether failing the exam results in a lower score on the test). Also, if we had a grouping variable with more than two levels, this result would not tell us whether all levels are significantly different or only a subset is different. The profile plots answer this question and also indicate the practical significance of the single items.
We investigated whether the administered standardized test that measures the students' ability
in math, reading, and writing can sufficiently predict the outcome of the final exam. We
conducted a profile analysis and the profile of the two student groups is significantly different
along all three dimensions of the standardized test, with students passing the exam scoring
consistently higher.
What is a double multivariate analysis? A double multivariate profile analysis (sometimes called doubly
multivariate) is a multivariate profile analysis with more than one dependent variable. Dependent
variables in Profile Analysis are the item batteries or subtests tested.
A Double Multivariate Profile Analysis can be double multivariate in two different ways: 1) two or more
dependent variables are measured multiple times, or 2) two or more sets of non-commensurate
measures are measured at once.
Let us first discuss the former, a set of multiple non-commensurate items that are measured two or
more different times. Non-commensurate items are items with different scales. In such a case we have
a group and a time, as well as an interaction effect group*time. The double multivariate profile analysis
will now estimate a linear canonical root that combines the dependent variables and maximizes the
main and interaction effects. Now we can find out if the time or the group effect is significant and we
can do simpler analyses to test the specific effects.
As for two or more sets of non-commensurate dependent variables of one subject measured at one time,
this could, for instance, be the level of reaction towards three different stimuli and the reaction time.
Since both sets of measures are neither commensurate nor independent, we would need to use a
double multivariate profile analysis. The results of that analysis will then tell us the main effects of our
three stimuli, the reaction times, and the interaction effect between them. The double multivariate
profile analysis will show which effects are significant and worth exploring in multivariate analysis with
one dependent variable.
Additionally, the profile analysis looks at profile graphs. A profile graph simply depicts the mean scores
of one group of test takers along the sets of measurements and compares them to the other groups of
test takers along all items in the battery. Thus the main purpose of the profile analysis is to identify if
non-independent measurements on two or more scales are significantly different between several
groups of test takers.
Example:
A research team wants to create a new test for a form of cardiovascular disease that seems to present in patients with a very specific combination of blood pressure, heart rate, cholesterol, and diet. The researchers collect data on these four dependent variables.
Profile Analysis can then be used to check whether the dependent variables significantly differentiate between the group that presents with the illness and the group that does not. Profile analysis takes into account that neither the items among each other nor the subjects' assignment to groups is random.
Profile Analysis is also a great way to understand and explore complex data. The results of the profile
analysis help to identify and focus on the relevant differences and help the researcher to select the right
contrasts, post hoc analysis, and statistical tests, when a simple ANOVA or t-test would not be sufficient.
An alternative to profile analysis is also the double multivariate MANOVA, where the time and
treatment effect are entered in a non-repeated measures MANOVA to circumvent the sphericity
assumption on the repeated observations.
The Double Multivariate Profile Analysis looks at profiles of data and checks whether the profiles are significantly distinct in pattern and significantly different in level. Technically, Double Multivariate
Profile analysis analyzes respondents as opposed to factor analysis which analyzes variables. At the
same time, Double Multivariate Profile Analysis is different from cluster analysis in that cluster analysis
does not take a dependent variable into account.
The purpose of Double Multivariate Profile Analysis is to check if four or more profiles are parallel. It tests four statistical hypotheses. In addition, the Double Multivariate Profile Analysis tests two practical hypotheses:
1. There are no within-subjects effects: the profile analysis tests whether the items within the different batteries of subtests are significantly different; if items do not differ significantly, they might be redundant and could be excluded.
2. There are no between-subjects effects: the subtest batteries do not produce different profiles for the groups.
The profile analysis optimizes the covariance structure. The rationale behind using the covariance
structure is that the observations are correlated and that the correlation of observations is naturally
larger when they come from the same subject.
Our research question for the Doubly Multivariate Profile Analysis is as follows:
Does the test profile for the five midyear mini-tests and the snippets from the standardized tests (math, reading, and writing) differ between students who failed the final exam and students who passed the final exam? (The example is a 3x5 design, i.e., (3 standardized test snippets) x (5 repeated mini-tests); thus the analysis requires 15 observations for each participant!)
The Profile Analysis uses the repeated measures GLM module of SPSS, like the repeated measures
ANOVA and ANCOVA. The Profile Analysis can be found in SPSS in the menu Analyze/General Linear Model/Repeated Measures…
The dialog box that opens is different than the GLM module for
independent measures. Before specifying the model we need to define
the repeated measures, or rather, inform SPSS how we designed the study.
In our example we want to test the three factor levels of standardized
testing and the five factor levels of aptitude testing.
The next dialog box allows us to specify the Profile Analysis. We need to add all nested observations to the list of within-subjects variables. Every factor level is explicitly marked: the first factor level on both variables is (1,1), and (3,2) is the third level on the first factor and the second level on the second factor. Remember that this analysis needs 3x5 data points per participant! We leave all other settings on default, apart from the plots, where we add the marginal means plots.
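For reference, a syntax sketch of this doubly multivariate specification is shown below. The 15 variable names (t1_math through t5_write), the factor names, and the factor ordering are hypothetical placeholders; what matters is that the variables are listed so that the levels of the last within-subjects factor vary fastest.

* Doubly multivariate profile analysis: 5 mini-tests x 3 standardized subtests (placeholder names).
GLM t1_math t1_read t1_write t2_math t2_read t2_write t3_math t3_read t3_write
    t4_math t4_read t4_write t5_math t5_read t5_write BY exam
  /WSFACTOR=minitest 5 Polynomial subtest 3 Polynomial
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(minitest*exam subtest*exam)
  /PRINT=DESCRIPTIVE HOMOGENEITY
  /WSDESIGN=minitest subtest minitest*subtest
  /DESIGN=exam.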
Box's M test would typically be the next result to examine. However, SPSS finds singularity in the covariance matrices (that is, perfect correlation). Usually, Box's M verifies the assumption that the covariance matrices of each cell in our Double Multivariate Profile Analysis design are equal. It does so by testing the null hypothesis that the covariance structures are homogenous.
The next assumption to test is sphericity. In our Double Multivariate Profile Analysis sphericity cannot be assumed for the main effects, since Mauchly's Test is highly significant (p = 0.000) and we must reject the null hypothesis of sphericity. Thus we need to correct the F-values when testing the significance of the interaction effects. The estimated epsilon is less than 0.75, thus we need to work with the more conservative Greenhouse-Geisser correction.
The first results table of our Double Multivariate Profile Analysis reports the within-subjects effects. It shows that the levels of factor 1 (the mini-tests we administered) are significantly different. However, the sample questions for the standardized tests (factor 2) that were included in our mini-tests do not show a significant within-subjects effect, which is related to the singularities in the covariance structure. Additionally, the interaction effect factor1 * exam (group of students who passed vs. students who failed) is significant, as is the two-way interaction between the two factors and the three-way interaction between the factors and the outcome of the exam variable. This is a good indication that we found distinct measurements and that we do not see redundancy in our measurement approach.
The next step in our Double Multivariate Profile Analysis tests the discriminating power of our groups. It will reveal whether or not the profiles of the groups are distinct and parallel. Before we test, however, we need to verify homoscedasticity. The Levene Test (below, right) is not significant for almost all measures, so we cannot reject the null hypothesis that the variances are equal and might assume homoscedasticity in almost all tests.
The Double Multivariate Profile Analysis shows a significant between-subjects effect. This indicates that the student groups (defined by our external criterion of failing or passing the final exam) differ significantly across all factor levels (p = 0.016). We can conclude that the factor levels of the tests are significantly different. However, we cannot say in which direction they differ, for example whether the students that failed the final exam scored lower or not. Also, a grouping variable with more than two levels would not tell us whether all levels are significantly different or only a subset is different. The Profile Plots of the Double Multivariate Profile Analysis answer this question.
We find that for both repeated measures, the mini-tests and the sample questions from the standardized tests, the double multivariate profiles are somewhat distinct, albeit more so for the standardized test questions. The students who failed the exam scored consistently lower than the students who passed the final exam.
We investigated whether five repeated mini-tests that included prospective new questions for the standardized test (three scores for math, reading, and writing) have significantly distinct profiles. The doubly multivariate profile analysis finds that the five mini-tests are significantly different, and that the students who passed the exam are significantly different from the students who failed the exam on all scores measured (five mini-tests and three standardized test questions). Also, all two- and three-way interaction effects are significant. However, due to singularity in the covariance structures, the hypothesis could not be tested for the standardized test questions.
The t-test family is based on the t-distribution, because the difference of the mean scores of two normally distributed variables approximates the t-distribution. The t-distribution, and thus the t-test, is sometimes called Student's t. Student is the pseudonym used by W. S. Gosset in 1908 to publish the t-distribution based on his empirical findings on the height and the length of the left middle finger of criminals in a local prison.
Within the t-test family, the independent samples t-test compares the mean scores of two groups in a
given variable, that is, two mean scores of the same variable, whereby one mean represents the average
of that characteristic for one group and the other mean represents the average of that specific
characteristic in the other group. Generally speaking, the independent samples t-test compares one
measured characteristic between two groups of observations or measurements. It tells us whether the
difference we see between the two independent samples is a true difference or whether it is just a
random effect (statistical artifact) caused by skewed sampling.
The independent samples t-test is also called the unpaired t-test. It is the t-test to use when two separate, independent and identically distributed variables are measured. Independent samples are most easily obtained when selecting the participants by random sampling.
The independent samples t-test is similar to the dependent samples t-test, which compares the mean scores of paired observations; these are typically obtained when re-testing, conducting repeated measurements, or grouping similar participants in a treatment-control study to account for differences in baseline. However, the pairing information needs to be present in the sample; therefore a paired sample can always be analyzed with an independent samples t-test, but not the other way around.
Examples of typical questions that the independent samples t-test answers are as follows:
Medicine - Has the quality of life improved for patients who took drug A as opposed to patients who took drug B?
Sociology - Are men more satisfied with their jobs than women? Do they earn more?
Economics - Is the economic growth of developing nations larger than the economic growth of the first world?
Marketing - Does customer segment A spend more on groceries than customer segment B?
Does the standardized test score for math, reading, and writing differ between students who
failed and students who passed the final exam?
Let's start by verifying the assumptions of the t-test to check whether we made the right choices in our
decision tree. First, we are going to create some descriptive statistics to get an impression of the
distribution. In order to do this, we open the Frequencies menu in Analyze/Descriptive Statistics/Frequencies…
Next we add the two groups to the list of variables. For the moment our two groups are stored in the variables A and B. We deselect the frequency tables but add the distribution parameters and the histograms with normal distribution curve to the output.
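The same descriptives can be requested through the syntax window; in this sketch group_a and group_b stand in for whatever the two variables are called in your data file.

* Descriptive statistics plus histograms with a normal curve overlay.
FREQUENCIES VARIABLES=group_a group_b
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV SKEWNESS KURTOSIS
  /HISTOGRAM NORMAL.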
The histograms show quite nicely that the variables approximate a normal distribution, and they also show the difference between the two distributions. We could verify this eyeball test with a K-S test; however, because our sample is larger than 30, we will skip this step.
In the dialog box of the independent samples t-test we select the variable with our standardized test
scores as the three test variables and the grouping variable is the outcome of the final exam (pass = 1 vs.
fail = 0). The independent samples t-test can only compare two groups (if your independent variable
defines more than two groups, you either would need to run multiple t-tests or an ANOVA with post hoc
tests). The groups need to be defined upfront; to do so, click on the button Define Groups… and enter the values of the independent variable that characterize the groups.
The dialog box Options allows us to define how
missing cases shall be managed (either exclude them
listwise or analysis by analysis). We can also define
the width of the confidence interval that is used to
test the difference of the mean scores in this
independent samples t-test.
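The complete specification can also be run from the syntax window. The sketch below assumes the three test scores are stored as math, reading, and writing and the grouping variable exam is coded 0 = fail, 1 = pass.

* Independent samples t-test of the three test scores between the two exam groups.
T-TEST GROUPS=exam(0 1)
  /VARIABLES=math reading writing
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.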
We are not going to calculate the test manually because the second table nicely displays the results of the independent samples t-test. If you remember, there is one question from the decision tree still left: Are the groups homoscedastic?
The output includes the Levene Test in the first two columns. The Levene test tests the null hypothesis
that the variances are homogenous (equal) in each group of the independent variable. In our example it
is highly significant for the math test and not significant for the writing and reading test. That is why we
must reject the null hypothesis for the math test and assume that the variances are not equal for the
math test. We cannot reject the null hypothesis for the reading and writing tests, so that we might
assume that the variances of these test scores are equal between the groups of students who passed
the exam and students who failed the final exam.
We find the correct results of the t-test next to it. For the math score we have to stick to the row 'Equal
variances not assumed' whereas for reading and writing we go with the 'Equal variances assumed' row.
We find that for all three test scores the differences are highly significant (p < 0.001). The table also tells
us the 95% confidence intervals for the difference of the mean scores; none of the confidence intervals
include zero. If they did, the t-test would not be significant and we would not find a significant
difference between the groups of students.
We analyzed the standardized test scores for students who passed the final exam and students
who failed the final exam. An independent samples t-test confirms that students who pass the
exam score significantly higher on all three tests with p < 0.001 (t = 12.629, 6.686, and 9.322).
The independent samples t-test has shown that we can reject our null hypothesis that both
samples have the same mean scores for math, reading, and writing.
One-Sample T-Test
The independent samples t-test compares the mean of one distinct group to the mean of another group from the same sample. It would examine the question, 'Are old people smaller than the rest of the population?' The dependent samples t-test compares before/after measurements, for example, 'Do pupils' grades improve after they receive tutoring?'
So if only a single mean is calculated from the sample, what does the 1-sample t-test compare the mean
with? The 1-sample t-test compares the mean score found in an observed sample to a hypothetically
assumed value. Typically the hypothetically assumed value is the population mean or some other
theoretically derived value.
There are some typical applications of the 1-sample t-test: 1) testing a sample against a pre-defined value, 2) testing a sample against an expected value, 3) testing a sample against common sense or expectations, and 4) testing the results of a replicated experiment against the original study.
First, the hypothetical mean score can be a generally assumed or pre-defined value. For example, a researcher wants to disprove that the average age of retiring is 65. The researcher would draw a representative sample of people entering retirement and collect their ages when they did so. The 1-sample t-test compares the mean score obtained in the sample (e.g., 63) to the hypothetical test value of 65. The t-test analyzes whether the difference we find in our sample is just due to random effects of chance or if our sample mean differs systematically from the hypothesized value.
Secondly, the hypothetical mean score also can be some derived expected value. For instance, consider
the example that the researcher observes a coin toss and notices that it is not completely random. The
researcher would measure multiple coin tosses, assign one side of the coin a 0 and the flip side a 1. The
researcher would then conduct a 1-sample t-test to establish whether the mean of the coin tosses is
really 0.5 as expected by the laws of chance.
Thirdly, the 1-sample t-test can also be used to test for the difference against a commonly established
and well known mean value. For instance a researcher might suspect that the village she was born in is
more intelligent than the rest of the country. She therefore collects IQ scores in her home village and
uses the 1-sample t-test to test whether the observed IQ score differs from the defined mean value of
100 in the population.
Lastly, the 1-sample t-test can be used to compare the results of a replicated experiment or research
analysis. In such a case the hypothesized value would be the previously reported mean score. The new
sample can be checked against this mean value. However, if the standard deviation of the first
measurement is known a proper 2-sample t-test can be conducted, because the pooled standard
deviation can be calculated if the standard deviations and mean scores of both samples are known.
Although the 1-sample t-test is mathematically the twin brother of the independent samples t-test, the interpretation is somewhat different. The 1-sample t-test checks whether the mean score in a sample differs from a fixed, hypothesized value, whereas the independent samples t-test checks whether the difference between the mean scores of two groups is different from zero.
The statement we will examine for the 1-sample t-test is as follows: The average age in our student sample is 9.5 years.
Before we actually conduct the 1-sample t-test, our first step is to check the distribution for normality. This is best done with a Q-Q Plot. We simply add the variable we want to test (age) to the box and confirm that the test distribution is set to Normal. This will create the diagram you see below. The output shows that small and large values somewhat deviate from normality. As a check we can run a K-S Test, which tests the null hypothesis that the variable is normally distributed. We find that the K-S Test is not significant; thus we cannot reject H0 and we might assume that the variable age is normally distributed.
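Both checks can be reproduced in the syntax window; the sketch below assumes the variable is simply named age.

* Q-Q plot of age against a normal distribution.
PPLOT
  /VARIABLES=age
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.
* One-sample Kolmogorov-Smirnov test of normality for age.
NPAR TESTS
  /K-S(NORMAL)=age
  /MISSING ANALYSIS.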
Let's move on to the 1-sample t-test, which can be found in Analyze/Compare Means/One-Sample T-Test…
The 1-sample t-test dialog box is fairly simple. We add the test variable age to the list of Test Variables and enter the Test Value. In our case the hypothetical test value is 9.5. The dialog Options… lets us define how to manage missing values and also gives us the opportunity to specify the width of the confidence interval used for testing.
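The corresponding syntax is short; the test value 9.5 is taken from the example above and age is again an assumed variable name.

* One-sample t-test of age against the hypothesized value 9.5.
T-TEST
  /TESTVAL=9.5
  /VARIABLES=age
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.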
The second table contains the actual 1-sample t-test statistics. The output shows for each variable the
sample t-value, degrees of freedom, two-tailed test of significance, mean difference, and the confidence
interval.
The hypothesis that the students have an average age of 9.5 years was tested with a 1-sample t-test. The test rejects the null hypothesis with p < 0.001 and a mean difference of .49997. Thus we can assume that the sample has a significantly different mean than 9.5 and that the hypothesis is not true.
Within the t-test family the dependent sample T-Test compares the mean scores of one group in
different measurements. It is also called the paired t-test, because measurements from one group must
be paired with measurements from the other group. The dependent sample t-test is used when the
observations or cases in one sample are linked with the cases in the other sample. This is typically the
case when repeated measures are taken, or when analyzing similar units or comparable specimen.
Making repeated measurements or pairing observations is very common when conducting experiments
or making observations with time lags. Pairing the measured data points is typically done in order to
exclude any confounding or hidden factors (cf. partial correlation). It is also often used to account for
individual differences in the baselines, for example pre-existing conditions in clinical research. Consider
the example of a drug trial where the participants have individual differences that might have an impact
on the outcome of the trial. The typical drug trial splits all participants into a control and the treatment
group. The dependent sample t-test can correct for the individual differences or baselines by pairing
comparable participants from the treatment and control group. Typical grouping variables are easily
obtainable statistics such as age, weight, height, blood pressure. Thus the dependent-sample t-test
analyzes the effect of the drug while excluding the influence of different baseline levels of health when
the trial began.
Pairing data points and conducting the dependent sample t-test is a common approach to establish
causality in a chain of effects. However, the dependent sample t-test only signifies the difference
between two mean scores and a direction of changeit does not automatically give a directionality of
cause and effect.
Since the pairing is explicitly defined and thus new information added to the data, paired data can
always be analyzed with the independent sample t-test as well, but not vice versa. A typical guideline to
determine whether the dependent sample t-test is the right test is to answer the following three
questions:
- Is there a direct relationship between each pair of observations (e.g., before vs. after scores on the same subject)?
- Are the observations of the data points definitely not random (e.g., they must not be randomly selected specimens of the same population)?
If the answer is yes to all of these questions, the dependent samples t-test is the right test;
otherwise use the independent sample t-test. In statistical terms the dependent samples t-test requires
that the within-group variation, which is a source of measurement errors, can be identified and
excluded from the analysis.
Do students' Aptitude Test 1 scores differ from their Aptitude Test 2 scores?
The dependent samples t-test can be found in SPSS under Analyze/Compare Means/Paired-Samples T-Test… The paired samples dialog assumes that the second dimension of the pairing is the case number, i.e., that case number 1 is a pair of measurements between variable 1 and variable 2.
Although we could specify multiple dependent samples t-tests that are executed at the same time, our example only looks at the first and the second aptitude test. Thus we drag & drop 'Aptitude Test 1' into the cell of pair 1 and variable 1, and 'Aptitude Test 2' into the cell of pair 1 and variable 2. The Options… button allows us to define the width of the confidence interval and how missing values are managed. We leave all settings as they are.
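As a syntax sketch, the same pairing can be specified in one command; apt1 and apt2 are placeholder names for the two aptitude test variables.

* Dependent (paired) samples t-test of Aptitude Test 1 against Aptitude Test 2.
T-TEST PAIRS=apt1 WITH apt2 (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.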
The second table in the output of the dependent samples t-test shows the correlation analysis between
the paired variables. This result is not part of any of the other t-tests in the t-test family. The purpose of
the correlation analysis is to show whether the use of dependent samples can increase the reliability of
the analysis compared to the independent samples t-test. The higher the correlation coefficient, the stronger the association between both variables and thus the higher the impact of pairing the
data compared to conducting an unpaired t-test. In our example the Pearson's bivariate correlation
analysis finds a medium negative correlation that is significant with p < 0.001. We can therefore assume
that pairing our data has a positive impact on the power of t-test.
The third table contains the actual dependent sample t-statistics. The table includes the mean of the
differences Before-After, the standard deviation of that difference, the standard error, the t-value, the
degrees of freedom, the p-value and the confidence interval for the difference of the mean scores.
Unlike the independent samples t-test it does not include the Levene Test for homoscedasticity.
In our example the dependent samples t-test shows that aptitude scores decreased on average by 4.766
with a standard deviation of 14.939. This results in a t-value of t = 3.300 with 106 degrees of freedom.
The t-test is highly significant with p = 0.001. The 95% confidence interval for the average difference of
the mean is [1.903, 7.630].
The dependent samples t-test showed an average reduction in achieved aptitude scores by 4.766
scores in our sample of 107 students. The dependent sample t-test was used to account for
individual differences in the aptitude of the students. The observed decrease is highly significant
(p = 0.001). Therefore, we can reject the null hypothesis that there is no difference in means and
can assume with 99.9% confidence that the observed reduction in aptitude score can also be
found in the general population. With a 5% error rate we can assume that the difference in
aptitude scores will be between 1.903 and 7.630.
Mann-Whitney U-Test
Other dependency tests that compare the mean scores of two or more groups are the F-test, ANOVA, and the t-test family. Unlike the t-test and F-test, the Mann-Whitney U-test is a nonparametric test. That means that the test does not assume any properties regarding the distribution of the underlying variables in the analysis. This makes the Mann-Whitney U-test the analysis to use when analyzing variables of ordinal scale. The Mann-Whitney U-test is also the mathematical basis for the H-test (also called Kruskal-Wallis H), which is basically nothing more than a series of pairwise U-tests.
Because the test was initially designed in 1945 by Wilcoxon for two samples of the same size, and further developed in 1947 by Mann and Whitney to cover different sample sizes, the test is also called Mann-Whitney-Wilcoxon (MWW), Wilcoxon rank-sum test, Wilcoxon-Mann-Whitney test, or Wilcoxon two-sample test.
The Mann-Whitney U-test is mathematically identical to conducting an independent samples t-test (also called a 2-sample t-test) with ranked values. This approach is similar to the step from Pearson's bivariate correlation coefficient to Spearman's rho. The U-test, however, applies a pooled ranking of all observations.
The U-test is a nonparametric test; in contrast to the t-tests and the F-test, it does not compare mean scores but the median scores of two samples. Thus it is much more robust against outliers and heavy-tailed distributions. Because the Mann-Whitney U-test is a nonparametric test it does not require a special distribution of the dependent variable in the analysis. Thus it is the best choice when the dependent variable is not normally distributed and is at least of ordinal scale.
For the test of significance of the Mann-Whitney U-test it is assumed that with n > 80, or with each of the two samples larger than 30, the distribution of the U-value from the sample approximates the normal distribution. The U-value calculated from the sample can then be compared against the normal distribution to calculate the confidence level.
The goal of the test is to test for differences of the medians that are caused by the independent variable. Another interpretation of the test is to test whether one sample stochastically dominates the other sample. The U-value represents the number of times observations in one sample precede observations in the other sample in the ranking. That is, with the two samples X and Y, stochastic dominance means that Prob(X > Y) > Prob(Y > X).
Sometimes it can also be found that the Mann-Whitney U-test tests whether the two samples are from the same population because they have the same distribution. Other nonparametric tests to compare the central tendency are the Kolmogorov-Smirnov Z-test and the Wilcoxon sign test.
Do the students that passed the exam achieve a higher grade on the standardized reading test?
The question indicates that the independent variable is whether the students have passed the final
exam or failed the final exam, and the dependent variable is the grade achieved on the standardized
reading test (A to F).
In the dialog box for the nonparametric two-independent-samples test, we select the ordinal test variable 'mid-term exam 1', which contains the pooled ranks, and our nominal grouping variable 'Exam'. With a click on 'Define Groups…' we need to specify the valid values for the grouping variable Exam, which in this case are 0 = fail and 1 = pass.
We also need to select the Test Type. The Mann-Whitney U-Test is marked by default. Like the Mann-Whitney U-Test, the Kolmogorov-Smirnov Z-Test and the Wald-Wolfowitz runs test have the null hypothesis that both samples are from the same population. Moses extreme reactions test has a different null hypothesis: the range of both samples is the same.
The U-test compares the ranking, Z-test compares the differences in distributions, Wald-Wolfowitz
compares sequences in ranking, and Moses compares the ranges of the two samples. The Kolmogorov-
Smirnov Z-Test requires continuous-level data (interval or ratio scale), the Mann-Whitney U-Test, Wald-
Wolfowitz runs, and Moses extreme reactions require ordinal data.
If we select Mann-Whitney U, SPSS will calculate the U-value and Wilcoxon's W, which is the sum of the ranks for the smaller sample. If the values in the sample are not already ranked, SPSS will sort the observations according to the test variable and assign ranks to each observation.
The dialog box also allows us to specify an exact test of significance, and the dialog Options… defines how missing values are managed and whether SPSS should output additional descriptive statistics.
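For the syntax window, a minimal sketch of this test looks as follows, assuming the graded test variable is stored as grade (a placeholder name) and the grouping variable exam is coded 0 = fail, 1 = pass.

* Mann-Whitney U-test of the reading grade between the two exam groups.
NPAR TESTS
  /M-W= grade BY exam(0 1)
  /MISSING ANALYSIS.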
The second table shows the actual test results. The SPSS output contains the Mann-Whitney U, which is computed from the rank sums of the two samples (U = n1*n2 + n1*(n1+1)/2 - R1, where R1 is the sum of the ranks in the first sample), and Wilcoxon's W. In our case U = 492.5 and W = 1438.5, which results in a Z-value of -5.695. The test statistic z is approximately normally distributed for large samples, so that p < 0.001. We know that the critical z-value for a two-tailed test is 1.96 and for a one-tailed test 1.645. Thus the observed difference in grading is statistically significant.
In our observation, 107 pupils were graded on a standardized reading test (grades A to F). Later
that year the students wrote a final exam. We analyzed the question whether the students who
passed the final exam achieved a better grade in the standardized reading test than the students
who failed the final exam. The Mann-Whitney U-test shows that the observed difference
between both groups of students is highly significant (p < 0.001, U = 492.5). Thus we can reject
the null hypothesis that both samples are from the same population, and that the observed
difference is not only caused by random effects of chance.
The Wilcoxon sign test is a test of dependency. All dependence tests assume that the variables in the analysis can be split into independent and dependent variables. A dependence test that compares the averages of an independent and a dependent variable assumes that differences in the average of the dependent variable are caused by the independent variable. Sometimes the independent variable is also called a factor because the factor splits the sample into two or more groups, also called factor steps. Dependence tests analyze whether there is a significant difference between the factor levels. The t-test family uses mean scores as the average to compare the differences, the Mann-Whitney U-test uses mean ranks as the average, and the Wilcoxon sign test uses signed ranks.
Unlike the t-test and F-test, the Wilcoxon sign test is a nonparametric test. That means that the test does not assume any properties regarding the distribution of the underlying variables in the analysis. This makes the Wilcoxon sign test the analysis to conduct when analyzing variables of ordinal scale or variables that are not multivariate normal.
The Wilcoxon sign test is mathematically similar to conducting a Mann-Whitney U-test (which is sometimes also called the Wilcoxon 2-sample t-test). It is also similar in its basic principle to the dependent samples t-test, because just like the dependent samples t-test, the Wilcoxon sign test tests the difference of paired observations.
However, the Wilcoxon signed rank test pools all differences, ranks them, and applies a negative sign to all the ranks where the difference between the two observations is negative. This is called the signed rank. The Wilcoxon signed rank test is a nonparametric test, in contrast to the dependent samples t-test. Whereas the dependent samples t-test tests whether the average difference between two observations is 0, the Wilcoxon test tests whether the difference between two observations has a mean signed rank of 0. Thus it is much more robust against outliers and heavy-tailed distributions. Because the Wilcoxon sign test is a nonparametric test it does not require a special distribution of the dependent variable in the analysis. Therefore it is the best test to compare scores when the dependent variable is not normally distributed and is at least of ordinal scale.
For the test of significance of Wilcoxon signed rank test it is assumed that with at least ten paired
observations the distribution of the W-value approximates a normal distribution. Thus we can
normalize the empirical W-statistics and compare this to the tabulated z-ratio of the normal distribution
to calculate the confidence level.
Does the before-after measurement of the first and the last mid-term exam differ between the
students who have been taught in a blended learning course and the students who were taught
in a standard classroom setting?
We only measured the outcome of the mid-term exam on an ordinal scale (grade A to F); therefore a dependent samples t-test cannot be used. This is because the grading is only ordinal and we do not assume that it approximates a normal distribution. Also, the two measurements are not independent from each other, and therefore we cannot use the Mann-Whitney U-test.
The Wilcoxon sign test can be found in Analyze/Nonparametric Tests/Legacy Dialogs/2 Related Samples…
In the next dialog box for the nonparametric two related samples test we need to define the paired observations. We enter 'Grade on Mid-Term Exam 1' as variable 1 of the first pair and 'Grade on Mid-Term Exam 2' as variable 2 of the first pair. We also need to select the Test Type. The Wilcoxon Signed Rank Test is marked by default. Alternatively we could choose Sign, McNemar, or Marginal Homogeneity.
Sign: The sign test has the null hypothesis that both samples are from the same population. The sign test compares the two dependent observations and counts the number of negative and positive differences. It uses the standard normally distributed z-value for the test of significance.
McNemar: The McNemar test has the null hypothesis that differences in both samples are equal for both directions. The test uses dichotomous (binary) variables to test whether the observed differences in a 2x2 matrix including all 4 possible combinations differ significantly from the expected count. It uses a Chi-Square test of significance.
Marginal Homogeneity: The marginal homogeneity test has the null hypothesis that the differences in both samples are equal in both directions. The test is similar to the McNemar test, but it uses nominal variables with more than two levels. It tests whether the observed differences in an n*m matrix including all possible combinations differ significantly from the expected count. It uses a Chi-Square test of significance.
If the values in the sample are not already ranked, SPSS will sort the observations according to the test variable and assign ranks to each observation, correcting for tied observations. The dialog box also allows us to specify an exact test of significance, and the dialog box Options… defines how missing values are managed and whether SPSS should output additional descriptive statistics.
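A syntax sketch of the same specification, with midterm1 and midterm2 as placeholder names for the two graded mid-term exams:

* Wilcoxon signed rank test for the paired mid-term grades.
NPAR TESTS
  /WILCOXON=midterm1 WITH midterm2 (PAIRED)
  /MISSING ANALYSIS.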
In our example we see that 107*2 observations were made for Exam 1 and Exam 2. The Wilcoxon Sign Test answers the question whether the difference is significantly different from zero, and therefore whether the observed difference in mean ranks (39.28 vs. 30.95) can also be found in the general population.
One-hundred and seven pupils learned with a novel method. A before and after measurement
of a standardized test for each student was taken on a classical grading scale from A (rank 1) to
F (rank 6). The results seem to indicate that the after measurements show a decrease in test
scores (we find more positive ranks than negative ranks). However, the Wilcoxon signed rank
test shows that the observed difference between both measurements is not significant when we
account for the individual differences in the baseline (p = 0.832). Thus we cannot reject the null
hypothesis that both samples are from the same population, and we might assume that the
novel teaching method did not cause a significant change in grades.
CHAPTER 6: Predictive Analyses
Linear Regression
Sometimes the dependent variable is also called endogenous variable, prognostic variable or
regressand. The independent variables are also called exogenous variables, predictor variables or
regressors. However, Linear Regression Analysis consists of more than just fitting a linear line through a
cloud of data points. It consists of 3 stages: 1) analyzing the correlation and directionality of the data,
2) estimating the model, i.e., fitting the line, and 3) evaluating the validity and usefulness of the model.
There are three major uses for Regression Analysis: 1) causal analysis, 2) forecasting an effect, 3) trend
forecasting. Other than correlation analysis, which focuses on the strength of the relationship between
two or more variables, regression analysis assumes a dependence or causal relationship between one or
more independent and one dependent variable.
Firstly, it might be used to identify the strength of the effect that the independent variable(s) have on a
dependent variable. Typical questions are what is the strength of relationship between dose and effect,
sales and marketing spending, age and income.
Secondly, it can be used to forecast effects or impacts of changes. That is, regression analysis helps us
to understand how much the dependent variable will change when we change one or more
independent variables. Typical questions are, 'How much additional Y do I get for one additional unit of X?'
Thirdly, regression analysis predicts trends and future values. The regression analysis can be used to get
point estimates. Typical questions are, 'What will the price of gold be 6 months from now?' and 'What is the total effort for a task X?'
In our sample of 107 students can we predict the standardized test score of reading when we
know the standardized test score of writing?
The first step is to check whether there is a linear relationship in the data. For that we check the scatter plot (Graphs/Chart Builder…). The scatter plot indicates a good linear relationship, which allows us to conduct a linear regression analysis. We can also check the Pearson's Bivariate Correlation (Analyze/Correlate/Bivariate…) and find that both variables are strongly correlated (r = .645 with p < 0.001).
Secondly, we need to check for normality. We have a look at the Q-Q plots (Analyze/Descriptive Statistics/Q-Q Plots…) for both of our variables and see that they are not perfect, but they might be close enough.
We can check our eyeball test with the 1-Sample Kolmogorov-Smirnov test (Analyze/Nonparametric Tests/Legacy Dialogs/1-Sample K-S…). The test has the null hypothesis that the variable approximates a normal distribution. The results confirm that the reading score can be assumed to be normally distributed (p = 0.474) while the writing test score cannot (p = 0.044). To fix this problem we could try to transform the writing test scores using a non-linear transformation (e.g., log). However, we do have a fairly large sample, in which case the linear regression is quite robust against violations of normality; it may, however, report too optimistic t-values and F-values.
We now can conduct the linear regression analysis. Linear regression is found in SPSS under Analyze/Regression/Linear…
To answer our simple research question we just need to add the Reading Test Score as the dependent variable and the Writing Test Score as the independent variable. The menu Statistics… allows us to include additional information that we need to assess the validity of our linear regression analysis. In order to assess autocorrelation (especially if we have time series data) we add the Durbin-Watson Test, and to check for multicollinearity we add the Collinearity diagnostics.
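The full specification, including the Durbin-Watson statistic, the collinearity diagnostics, and the residual plot, can be written as the following syntax sketch (reading and writing are placeholder variable names):

* Simple linear regression of the reading score on the writing score.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /DEPENDENT reading
  /METHOD=ENTER writing
  /RESIDUALS DURBIN
  /SCATTERPLOT=(*ZRESID ,*ZPRED).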
The next table is the F-test. The linear regression's F-test has the null hypothesis that there is no linear relationship between the two variables (in other words, R² = 0). With F = 53.828 and 106 degrees of freedom the test is highly significant; thus we can assume that there is a linear relationship between the variables in our model.
The next table shows the regression coefficients, the intercept, and the significance of all coefficients
and the intercept in the model. We find that our linear regression analysis estimates the linear
regression function to be y = 36.824 + .795* x. This means that an increase in one unit of x results in an
increase of .795 units of y. The test of significance of the linear regression analysis tests the null
hypothesis that the estimated coefficient is 0. The t-test finds that both intercept and variable are
highly significant (p < 0.001) and thus we might say that they are significantly different from zero.
This table also includes the Beta weights. Beta weights are the standardized coefficients and they allow comparing the size of the effects of different independent variables if the variables have different units of measurement. The table also includes the collinearity statistics. However, since we have only one independent variable in our analysis, we do not need to pay attention to either of the two values.
The last thing we need to check is the homoscedasticity and normality of the residuals. The scatterplot indicates constant variance. The plot of z*pred and z*resid shows us that in our linear regression analysis there is no tendency in the error terms.
We investigated the relationship between the reading and writing scores achieved on our standardized tests. The correlation analysis found a medium positive correlation between the two variables (r = 0.645). We then conducted a simple regression analysis to further substantiate the suspected relationship. The estimated regression model is Reading Score = 36.824 + .795 * Writing Score with an adjusted R² of 33.3%; it is highly significant with p < 0.001 and F = 53.828. The standard error of the estimate is 14.58556. Thus we can not only show a positive linear relationship, but we can also conclude that for every additional point on the writing score the reading score will increase by approximately .795 units.
At the center of the multiple linear regression analysis lies the task of fitting a single line through a
scatter plot. More specifically, the multiple linear regression fits a line through a multi-dimensional
cloud of data points. The simplest form has one dependent and two independent variables. The
general form of the multiple linear regression is defined as y_i = β0 + β1*x_i1 + β2*x_i2 + … + βp*x_ip for i = 1, …, n.
Sometimes the dependent variable is also called endogenous variable, criterion variable, prognostic
variable or regressand. The independent variables are also called exogenous variables, predictor
variables or regressors.
Multiple Linear Regression Analysis consists of more than just fitting a linear line through a cloud of data
points. It consists of three stages: 1) analyzing the correlation and directionality of the data, 2)
estimating the model, i.e., fitting the line, and 3) evaluating the validity and usefulness of the model.
There are three major uses for Multiple Linear Regression Analysis: 1) causal analysis, 2) forecasting an
effect, and 3) trend forecasting. Other than correlation analysis, which focuses on the strength of the
relationship between two or more variables, regression analysis assumes a dependence or causal
relationship between one or more independent and one dependent variable.
Firstly, it might be used to identify the strength of the effect that the independent variables have on a
dependent variable. Typical questions would seek to determine the strength of relationship between
dose and effect, sales and marketing spend, age and income.
Secondly, it can be used to forecast effects or impacts of changes. That is to say, multiple linear regression analysis helps us to understand how much the dependent variable will change when we change the independent variables. A typical question would be, 'How much additional Y do I get for one additional unit of X?'
Thirdly, multiple linear regression analysis predicts trends and future values. The multiple linear regression analysis can be used to get point estimates. Typical questions are, 'What will the price of gold be six months from now?' and 'What is the total effort for a task X?'
Can we explain the reading score that a student achieved on the standardized test with the five
aptitude tests?
First, we need to check whether there is a linear relationship between the independent variables and
the dependent variable in our multiple linear regression model. To do so, we check the scatter plots.
We could create five individual scatter plots using the Graphs menu. Alternatively we can use the Matrix Scatter Plot found in Graphs/Legacy Dialogs/Scatter/Dot…
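The matrix scatter plot can also be produced with a short syntax sketch (reading and apt1 to apt5 are placeholder variable names):

* Matrix scatter plot of the dependent variable and the five aptitude tests.
GRAPH
  /SCATTERPLOT(MATRIX)=reading apt1 apt2 apt3 apt4 apt5
  /MISSING=LISTWISE.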
The scatter plots indicate a good linear relationship between the reading score and the aptitude tests 1 to 5, where there seems to be a positive relationship for aptitude test 1 and a negative linear relationship for aptitude tests 2 to 5.
Secondly, we need to check for multivariate normality. This can either be done with an eyeball test on the Q-Q plots or by using the 1-Sample K-S test to test the null hypothesis that the variable approximates a normal distribution. The K-S test is not significant for any of the variables, thus we can assume normality.
To answer our research question we
need to enter the variable reading
scores as the dependent variable in our
multiple linear regression model and
the aptitude test scores (1 to 5) as
independent variables. We also select
stepwise as the method. The default
method for the multiple linear
regression analysis is 'Enter', which
means that all variables are forced to
be in the model. But since over-fitting
is a concern of ours, we want only the
variables in the model that explain
additional variance. Stepwise means
that the variables are entered into the
regression model in the order of their
explanatory power.
In the field Options… we can define the criteria for stepwise inclusion in the model. We want to include variables in our multiple linear regression model if the significance of their F-to-enter is at most 0.05, and we want to exclude them again if the significance of their F-to-remove exceeds 0.1. This dialog box also allows us to manage missing values (e.g., replace them with the mean).
The dialog Statistics… allows us to include additional statistics that we need to assess the validity of our linear regression analysis. Even though it is not a time series, we include the Durbin-Watson test to check for autocorrelation, and we include the collinearity diagnostics to check for multicollinearity.
In the dialog Plots, we add the standardized residual plot (ZPRED on x-axis and ZRESID on y-axis), which
allows us to eyeball homoscedasticity and normality of residuals.
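Putting the dialog choices together, the stepwise model can be run with the following syntax sketch (again with placeholder variable names):

* Stepwise multiple linear regression of the reading score on the five aptitude tests.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /DEPENDENT reading
  /METHOD=STEPWISE apt1 apt2 apt3 apt4 apt5
  /RESIDUALS DURBIN
  /SCATTERPLOT=(*ZRESID ,*ZPRED).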
With the stepwise method the variables are entered into the model one by one; if all five variables added explanatory power, they would be entered one by one and we would find five regression models. In this case, however, we find that the best explaining variable is Aptitude Test 1, which is entered in the first step, while Aptitude Test 2 is entered in the second step. After the second model is estimated, SPSS stops building new models because none of the remaining variables increases F sufficiently. That is to say, none of the remaining variables adds significant explanatory power to the regression model.
The next table shows the multiple linear regression model summary and overall fit statistics. We find that the adjusted R² of our model 2 is 0.624, with R² = .631. This means that the linear regression model with the independent variables Aptitude Test 1 and 2 explains 63.1% of the variance of the Reading Test Score. The Durbin-Watson d = 1.872, which is between the two critical values of 1.5 and 2.5 (1.5 < d < 2.5); therefore we can assume that there is no first order linear autocorrelation in our multiple linear regression data.
If we had forced all independent variables into the linear regression model (Method: Enter), we would have seen a slightly higher R of 80.2% but an almost identical adjusted R² of 62.5%.
The next table is the F-test, or ANOVA. The F-test is the test of significance of the multiple linear regression. The F-test has the null hypothesis that there is no linear relationship between the variables (in other words, R² = 0). The F-test for Model 2 is highly significant, thus we can assume that there is a linear relationship between the variables in our model.
The next table shows the multiple linear regression coefficient estimates, including the intercept and the significance levels. In our second model we find a non-significant intercept (which commonly happens and is nothing to worry about) but also highly significant coefficients for Aptitude Test 1 and 2. Our regression equation would be: Reading Test Score = 7.761 + 0.836*Aptitude Test 1 - 0.503*Aptitude Test 2. For every additional point achieved on Aptitude Test 1, we can interpret that the Reading Score increases by 0.836, while for every additional point on Aptitude Test 2 the Reading Score decreases by 0.503.
Since we have multiple independent variables in the analysis the Beta weights compare the relative
importance of each independent variable in standardized terms. We find that Test 1 has a higher impact
than Test 2 (beta = .599 and beta = .302). This table also checks for multicollinearity in our multiple
linear regression model. Multicollinearity is the extent to which independent variables are correlated
with each other. Tolerance should be greater than 0.1 (or VIF < 10) for all variables, which is the case here. If tolerance is less than 0.1 there is a suspicion of multicollinearity, and with tolerance less than 0.01 there is strong evidence of multicollinearity.
Lastly, since the Goldfeld-Quandt test is not supported in SPSS, we check the homoscedasticity and normality of the residuals with an eyeball test of the plot of z*pred and z*resid. The plot indicates that in our multiple linear regression analysis there is no tendency in the error terms.
We investigated the relationship between the reading scores achieved on our standardized tests and the scores achieved on the five aptitude tests. The stepwise multiple linear regression analysis found that Aptitude Tests 1 and 2 have relevant explanatory power. Together the estimated regression model (Reading Test Score = 7.761 + 0.836*Aptitude Test 1 - 0.503*Aptitude Test 2) explains 63.1% of the variance of the achieved Reading Score, with an adjusted R² of 62.4%. The regression model is highly significant with p < 0.001 and F = 88.854. The standard error of the estimate is 8.006. Thus we can not only show a linear relationship between Aptitude Test 1 (positive) and Aptitude Test 2 (negative) and the reading score, but we can also conclude that for every additional point on Aptitude Test 1 the reading score increases by approximately 0.8, and for every additional point on Aptitude Test 2 it decreases by approximately 0.5.
Logistic Regression
It is quite common to run a regular linear regression analysis with dummy independent variables. A
dummy variable is a binary variable that is treated as if it would be continuous. Practically speaking, a
dummy variable increases the intercept thereby creating a second parallel line above or below the
estimated regression line.
Alternatively, we could try to just create a multiple linear regression with a dummy dependent variable. This approach, however, has two major shortcomings. Firstly, it can lead to predicted probabilities outside of the (0,1) interval, and secondly the residuals will not behave well (think of the two parallel lines of residuals in the zpred*zresid plot).
To solve these shortcomings we can use a logistic function to restrict the probability values to (0,1). The
logistic function is p(x) = 1 / (1 + exp(-x)). Technically this can be resolved to ln(p/(1-p)) = a + b*x; ln(p/(1-p)) is also called the log odds. Sometimes, instead of a logit model for logistic regression, a probit model is used. The following graph shows the difference between a logit and a probit model for values in [-4, 4]. Both models are commonly used in logistic regression; in most cases a model is fitted with both functions and the function with the better fit is chosen. However, probit assumes a normal distribution of the probability of the event, whereas logit assumes a log distribution. Thus the difference between logit and probit is usually only visible in small samples.
At the center of the logistic regression analysis lies the task of estimating the log odds of an event. Mathematically, logistic regression estimates a multiple linear regression function defined as logit(p) = log( p(y=1) / (1 - p(y=1)) ) = β0 + β1*x_i1 + β2*x_i2 + … + βp*x_ip.
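As a small numerical illustration of the logit link, the following syntax sketch builds a hypothetical linear predictor, converts it into a probability, and converts the probability back into log odds; all variable names and coefficients here are made up for the example.

* Hypothetical linear predictor for illustration only.
COMPUTE xb = -1 + 0.5 * apt1.
* Logistic function: probability bounded between 0 and 1.
COMPUTE p_event = 1 / (1 + EXP(-xb)).
* Log odds: recovers the linear predictor xb.
COMPUTE log_odds = LN(p_event / (1 - p_event)).
EXECUTE.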
Logistic regression is similar to the Discriminant Analysis. Discriminant analysis uses the regression line
to split a sample in two groups along the levels of the dependent variable. Whereas the logistic
regression analysis uses the concept of probabilities and log odds with cut-off probability 0.5, the
discriminant analysis cuts the geometrical plane that is represented by the scatter cloud. The practical
difference is in the assumptions of both tests. If the data is multivariate normal, homoscedasticity is present in the variances and covariances, and the independent variables are linearly related, then discriminant analysis is used because it is more statistically powerful and efficient. Discriminant analysis is
typically more accurate than logistic regression in terms of predictive classification of the dependent
variable.
A research study is conducted on 107 pupils. These pupils have been measured with five different aptitude tests, one for each important category (reading, writing, understanding, summarizing, etc.). How do these aptitude tests predict whether the pupils pass the year-end exam?
First we need to check that all cells in our model are populated. Since we don't have any categorical
variables in our design we will skip this step.
The logistic regression can be found in SPSS under Analyze > Regression > Binary Logistic. This opens the dialog box to specify the model. Here we need to enter the nominal variable Exam (pass = 1, fail = 0) into the dependent variable box and we enter all aptitude tests as the first block of covariates in the model.
The Categorical menu allows us to specify contrasts for categorical variables (which we do not have in our logistic regression model), and Options offers several additional statistics, which we don't need.
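The same specification can be written as syntax. This is a minimal sketch; the variable names (Exam, Apt1 to Apt5) follow the example above and are otherwise assumptions.
* Binary logistic regression with all five aptitude tests entered in one block.
LOGISTIC REGRESSION VARIABLES Exam
  /METHOD=ENTER Apt1 Apt2 Apt3 Apt4 Apt5
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).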
The Output of the Logistic Regression Analysis
The first table simply shows the case processing summary, which lists nothing more than the valid
sample size.
Case Processing Summary
Unweighted Cases(a)                        N      Percent
Selected Cases    Included in Analysis     107    100.0
                  Missing Cases            0      .0
                  Total                    107    100.0
Unselected Cases                           0      .0
Total                                      107    100.0
a. If weight is in effect, see classification table for the total number of cases.
The next three tables are the results for the intercept model. That is the Maximum Likelihood model if only the intercept is included, without any of the independent variables in the analysis. This is basically only interesting for calculating the Pseudo R² statistics that describe the goodness of fit for the logistic model.
Classification Table(a,b)
                                   Predicted
                                   Exam                Percentage
Observed                           Fail      Pass      Correct
Step 0   Exam     Fail             64        0         100.0
                  Pass             43        0         .0
         Overall Percentage                            59.8
a. Constant is included in the model.
b. The cut value is .500
Variables not in the Equation
Score df Sig.
Step 0 Variables Apt1 30.479 1 .000
Apt2 10.225 1 .001
Apt3 2.379 1 .123
Apt4 6.880 1 .009
Apt5 5.039 1 .025
Overall Statistics 32.522 5 .000
The relevant tables can be found in the section 'Block 1' in the SPSS output of our logistic regression
analysis. The first table includes the Chi-Square goodness of fit test. It has the null hypothesis that
intercept and all coefficients are zero. We can reject this null hypothesis.
Omnibus Tests of Model Coefficients
              Chi-square   df   Sig.
Step 1  Step  38.626       5    .000
The next table includes the Pseudo R² statistics; the -2 log likelihood is the minimization criterion used by SPSS. We see that Nagelkerke's R² is 0.409, which indicates that the model is good but not great. Cox & Snell's R² is the nth root (in our case the 107th) of the -2 log likelihood improvement. Thus we can interpret this as: the logistic model explains roughly 30% of the probability of the event (passing the exam).
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      105.559(a)          .303                   .409
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
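For readers who want to see where the Cox & Snell value comes from, a quick arithmetic check (using the chi-square improvement of 38.626 from the omnibus test and n = 107) is: Cox & Snell R² = 1 - exp(-38.626 / 107) = 1 - exp(-0.361) ≈ 0.30, which matches the .303 reported above.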
The next table contains the classification results: with almost 80% correct classification the model is not too bad; generally, a discriminant analysis is better at classifying data correctly.
Classification Table(a)
                                   Predicted
                                   Exam                Percentage
Observed                           Fail      Pass      Correct
Step 1   Exam     Fail             53        11        82.8
                  Pass             11        32        74.4
         Overall Percentage                            79.4
a. The cut value is .500
The last table is the most important one for our logistic regression analysis. It shows the regression function -1.898 + .148*x1 - .022*x2 - .047*x3 - .052*x4 + .011*x5. The table also includes the test of significance for each of the coefficients in the logistic regression model. For small samples the t-values are not valid and the Wald statistic should be used instead; Wald is basically t², which is Chi-Square distributed with df = 1. SPSS also gives the significance level of each coefficient. As we can see, only Apt1 is significant; all other variables are not.
If we change the method from Enter to Forward: Wald the quality of the logistic regression improves.
Now only the significant coefficients are included in the logistic regression equation. In our case the
model simplifies to Aptitude Test Score 1 and the intercept. Then we get the logistic equation
p = 1 / (1 + e^-(-5.270 + 0.158*Apt1)).
This equation is easier to interpret, because we now know that a score one point higher on Aptitude Test 1 multiplies the odds of passing the exam by 1.17 (exp(.158)). We can also calculate the critical value for p = 50%, which is Apt1 = -intercept/coefficient = 5.270/.158 = 33.35. That is, if a pupil scores higher than 33.35 on Aptitude Test 1, the logistic regression predicts that this pupil will pass the final exam.
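As a sketch, the stepwise model and the two hand calculations above can be reproduced in syntax (variable names are assumptions as before; the COMPUTE lines simply repeat the arithmetic and store the constants in new columns):
* Forward stepwise (Wald) logistic regression.
LOGISTIC REGRESSION VARIABLES Exam
  /METHOD=FSTEP(WALD) Apt1 Apt2 Apt3 Apt4 Apt5
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
* Odds ratio for one additional point on Aptitude Test 1: exp(0.158) = 1.17.
COMPUTE odds_apt1 = EXP(0.158).
* Aptitude Test 1 score at which the predicted probability of passing reaches 0.5.
COMPUTE cutoff_apt1 = 5.270 / 0.158.
EXECUTE.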
We conducted a logistic regression to predict whether a student will pass the final exam based
on the five aptitude scores the student achieved. The stepwise logistic regression model finds
only the Aptitude Test 1 to be of relevant explanatory power. The logistic equation indicates that
an additional score point on Aptitude Test 1 multiplies the odds of passing by 1.17. We also predict that students who scored higher than 33.35 on Aptitude Test 1 will pass the final exam.
Ordinal Regression
Like all regression analyses, ordinal regression is a predictive analysis used to describe data and to explain the relationship between one dependent variable and one or more independent variables. In ordinal regression analysis, the dependent variable is ordinal (statistically it is polytomous ordinal) and the independent variables are ordinal or continuous-level (ratio or interval).
Sometimes the dependent variable is also called response, endogenous variable, prognostic variable or
regressand. The independent variables are also called exogenous variables, predictor variables or
regressors.
Linear regression estimates a line to express how a change in the independent variables affects the dependent variable. The independent variables are added linearly as a weighted sum of the form y = β0 + β1*x1 + β2*x2 + … + βp*xp. Linear regression estimates the regression coefficients by minimizing the sum of squares between the left and the right side of the regression equation. Ordinal regression, however, is a bit trickier. Let us consider a linear regression of income = 15,000 + 980 * age. We know that for a 30 year old person the expected income is 44,400 and for a 35 year old the income is 49,300. That is a difference of 4,900. We also know that if we compare a 55 year old with a 60 year old, the difference of 73,800 - 68,900 = 4,900 is exactly the same as the difference between the 30 and the 35 year old. This
however is not always true for measures that have ordinal scale. For instance if we classify the income
to be low, medium, high, it is impossible to say if the difference between low and medium is the same as
between medium and high, or if 3*low = high.
There are three major uses for Ordinal Regression Analysis: 1) causal analysis, 2) forecasting an effect,
and 3) trend forecasting. Unlike correlation analysis for ordinal variables (e.g., Spearman's rho), which focuses on the strength of the relationship between two or more variables, ordinal regression analysis
assumes a dependence or causal relationship between one or more independent and one dependent
variable. Moreover the effect of one or more covariates can be accounted for.
Firstly, ordinal regression might be used to identify the strength of the effect that the independent variables have on the dependent variable. A typical question is: What is the strength of the relationship between dose (low, medium, high) and effect (mild, moderate, severe)?
Secondly, ordinal regression can be used to forecast effects or impacts of changes. That is, ordinal regression analysis helps us to understand how much the dependent variable will change when we change the independent variables. A typical question is: When is the response most likely to jump into the next category?
Finally, ordinal regression analysis predicts trends and future values. The ordinal regression analysis can
be used to get point estimates. A typical question is: If I invest a medium study effort, what grade (A-F) can I expect?
In our study the 107 students have been given six different tests. The pupils either failed or
passed the first five tests. For the final exam, the students were graded as fail, pass, good, or distinction. We now want to analyze how the first five tests predict the outcome of the final
exam.
To answer this we need to use ordinal regression to analyze the question above. Although technically
this method is not ideal because the observations are not completely independent, it best suits the
purpose of the research team.
In SPSS, ordinal regression is found under Analyze > Regression > Ordinal. The dialog box allows us to specify the ordinal regression model. For our example the final exam (four levels: fail, pass, good, distinction) is the dependent variable, and the factors are the five pass/fail exams taken during the term. Please note that this works correctly only if the right measurement scales have been defined within SPSS.
Furthermore, SPSS offers the option to include one or more covariates of continuous-level scale (interval
or ratio). However, adding more than one covariate typically results in a large cell probability matrix
with a large number of empty cells.
The Options dialog allows us to manage various settings for the iterative solution; more interestingly, we can also change the link function for the ordinal regression here. In ordinal regression the link function is a transformation of the cumulative probabilities of the ordered dependent variable that allows for estimation of the model. There are five different link functions.
1. Logit function: The logit function is the default in SPSS for ordinal regression. This function is usually used when the dependent ordinal variable has equally distributed categories. Mathematically, the logit function is p(z) = ln(z / (1 - z)).
2. Probit model: This is the inverse of the standard normal cumulative distribution function. This function is more suitable when the dependent variable is normally distributed. Mathematically, the probit function is p(z) = Φ^(-1)(z).
[Figure: logit and probit link functions plotted for values from -4 to 4]
Both models (logit and probit) are the most commonly used in ordinal regression; in most cases a model is fitted with both functions and the function with the better fit is chosen. However, probit assumes a normal distribution of the probability of the categories of the dependent variable, whereas logit assumes the log distribution. Thus the difference between logit and probit is typically only seen in small samples.
3. Negative log-log: This link function is recommended when the probability of the lower category is high. Mathematically the negative log-log is p(z) = -log(-log(z)).
4. Complementary log-log: This function is the inverse of the negative log-log function. This function is recommended when the probability of the higher category is high. Mathematically the complementary log-log is p(z) = log(-log(1 - z)).
5. Cauchit: This link function is used when extreme values are present in the data. Mathematically the Cauchit is p(z) = tan(π(z - 0.5)).
We leave the ordinal regression's other dialog boxes at their default settings; we just add the test of
parallel lines in the Output menu.
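The same ordinal regression can be requested with the PLUM command. A minimal sketch, assuming the dependent variable is called Final_Exam and the five pass/fail exams are Ex1 to Ex5 (the names are assumptions):
* Ordinal regression with the default logit link and the test of parallel lines.
PLUM Final_Exam BY Ex1 Ex2 Ex3 Ex4 Ex5
  /LINK=LOGIT
  /PRINT=FIT PARAMETER SUMMARY TPARALLEL.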
The Wald ratio is defined as (coefficient/standard error) and is the basis for the test of significance (null
hypothesis: the coefficient is zero). We find that Ex1 and Ex4 are significantly different from zero.
Therefore there seems to be a relationship between pupils' performance on Ex1 and Ex4 and their final exam grades.
The next interesting table is the test for parallel lines. It tests the null hypothesis that the lines run
parallel. Our test is not significant and thus we cannot reject the null hypothesis. A significant test
typically indicates that the ordinal regression model uses the wrong link function.
CHAPTER 7: Classification Analyses
Multinomial Regression
Like all linear regressions, the multinomial regression is a predictive analysis. Multinomial regression is
used to describe data and to explain the relationship between one dependent nominal variable and one
or more continuous-level(interval or ratio scale) independent variables.
How can we apply the logistic regression principle to a multinomial variable (e.g. 1/2/3)?
Example:
We analyze our class of pupils that we observed for a whole term. At the end of the term we gave each pupil a computer game as a gift for their effort. Each participant was free to choose between three games: an action game, a puzzle game, or a sports game. The researchers want to know how the initial baseline (doing well in math, reading, and writing) affects the choice of the game.
Note that the choice of the game is a nominal dependent variable with more than two levels.
Therefore multinomial regression is the best analytic approach to the question.
How do we get from logistic regression to multinomial regression? Multinomial regression is a multi-equation model, similar to multiple linear regression. For a nominal dependent variable with k categories the multinomial regression model estimates k-1 logit equations. Although SPSS does compare all combinations of the k groups, it only displays the comparisons against one reference category, typically either the first or the last category. The multinomial regression procedure in SPSS allows us to freely select the reference group against which the others are compared.
What are logits? The basic idea behind logits is to use a logistic function to restrict the probability values to (0,1). Technically the logit is the log odds (the logarithm of the odds that y = 1). Sometimes a probit model is used instead of a logit model for multinomial regression. The following graph shows the difference between a logit and a probit model for values in [-4, 4]. Both models are commonly used as the link function in ordinal regression; however, most multinomial regression models are based on the logit function. The difference between the two functions is typically only seen in small samples because probit assumes a normal distribution of the probability of the event, whereas logit assumes the log distribution.
[Figure: logit and probit curves for values from -4 to 4, with the probability (0 to 1) on the vertical axis]
At the center of the multinomial regression analysis is the task of estimating the k-1 log odds of each category. In our k = 3 computer game example, with the last category as the reference, multinomial regression estimates k-1 multiple linear regression functions defined as
logit(y=1) = log( p(y=1) / (1 - p(y=1)) ) = β0 + β1*x1 + β2*x2 + … + βp*xp
logit(y=2) = log( p(y=2) / (1 - p(y=2)) ) = β0 + β1*x1 + β2*x2 + … + βp*xp,
where each equation has its own set of coefficients.
Multinomial regression is similar to the Multivariate Discriminant Analysis. Discriminant analysis uses
the regression line to split a sample in two groups along the levels of the dependent variable. In the
case of three or more categories of the dependent variable multiple discriminant equations are fitted
through the scatter cloud. In contrast multinomial regression analysis uses the concept of probabilities
and k-1 log odds equations that assume a cut-off probability 0.5 for a category to happen. The practical
difference is in the assumptions of both tests. If the data is multivariate normal, homoscedasticity is
present in variance and covariance and the independent variables are linearly related, then we should
use discriminant analysis because it is more statistically powerful and efficient. Discriminant analysis is
also more accurate in predictive classification of the dependent variable than multinomial regression.
We conducted a research study with 107 students. The students were measured on a
standardized reading, writing, and math test at the start of our study. At the end of the study,
we offered every pupil a computer game as a thank you gift. They were free to choose one of three games: a sports game, a puzzle game, and an action game. How does the ability to read, write, or calculate influence their game choice?
First we need to check that all cells in our model are populated. Although the multinomial regression is robust against violations of multivariate normality and therefore better suited for smaller samples than a probit model, we still need to check. We find that some of the cells are empty. We must therefore collapse some of the factor levels. The easiest way to check is to create a contingency table (Analyze > Descriptive Statistics > Crosstabs).
But even if we collapse the factor levels of our multinomial regression model down to two levels
(performance good vs. not good) we observe empty cells. We proceed with the analysis regardless,
noting and reporting this limitation of our analysis.
In the Model menu we need to specify the model for the multinomial regression. The huge advantage over ordinal regression analysis is the ability to conduct a stepwise multinomial regression for all main and interaction effects. If we want to include additional measures about the multinomial regression model in the output, we can do so in the Statistics dialog box.
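For reference, a basic multinomial model can also be requested with the NOMREG command. The sketch below fits a simple main-effects model with the last category as the reference; the variable names (Gift, Good2, Good3) follow this example and the stepwise settings would normally be pasted from the dialogs.
* Multinomial logistic regression of game choice on the collapsed test results.
NOMREG Gift (BASE=LAST) BY Good2 Good3
  /INTERCEPT=INCLUDE
  /PRINT=CLASSTABLE FIT PARAMETER SUMMARY LRT.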
[Case Processing Summary table. Note: the dependent variable has only one value observed in 5 (62.5%) subpopulations.]
The next table details which variables are entered into the multinomial regression. Remember that we selected a stepwise model. In our example the writing test results (Good3) and then the reading test results (Good2) were entered. The null model (step 0) is also shown, as the change in -2 log likelihood between models is used for significance testing and for calculating the Pseudo-R²s.
Step Summary
Model   Action    Effect(s)   -2 Log Likelihood   Chi-Square(a)   df   Sig.
0       Entered   Intercept   216.336             .
1       Entered   Good3       111.179             105.157         2    .000
2       Entered   Good2       7.657               103.522         2    .000
Stepwise Method: Forward Entry
a. The chi-square for entry is based on the likelihood ratio test.
The next three tables contain the goodness-of-fit criteria. We find that the goodness-of-fit chi-square tests (which test the null hypothesis that the model fits the observed data) are not significant and Nagelkerke's R² is close to 1. Remember that Cox & Snell's R² does not scale up to 1; Cox & Snell's R² is the nth root (in our case the 107th) of the -2 log likelihood improvement. We can interpret the Pseudo-R² as: our multinomial regression model explains 85.6% of the probability that a given computer game is chosen by the pupil.
[Model Fitting Information table: Model, -2 Log Likelihood, Chi-Square, df, Sig.]
The classification table shows that the estimated multinomial regression functions correctly
classify 97.2% of the events. Although this is sometimes reported, it is a less powerful goodness of fit
test than Pearson's or Deviance.
Classification
                        Predicted
Observed                Superblaster   Puzzle Mania   Polar Bear Olympics   Percent Correct
Superblaster            40             0              0                     100.0%
Puzzle Mania            2              28             1                     90.3%
Polar Bear Olympics     0              0              36                    100.0%
Overall Percentage      39.3%          26.2%          34.6%                 97.2%
The most important table for our multinomial regression analysis is the Parameter Estimates table. It includes the coefficients for the two logistic regression functions. The table also includes the test of significance for each of the coefficients in the multinomial regression model. For small samples the t-values are not valid and the Wald statistic should be used instead; SPSS reports the significance level of each coefficient.
Parameter Estimates
In this analysis the parameter estimates are quite wild because we collapsed our factors to binary level for lack of sample size. This results in the standard errors either skyrocketing or dropping. The intercept is the multinomial regression estimate when all other predictors are zero. The coefficient for Good3 is 48.030. So, if a pupil were to increase his score on Test 3 by one unit (that is, he jumps from fail to pass because of our collapsing), the log-odds of preferring the action game over the sports game would decrease by 48.030. In other words, pupils that fail Tests 2 and 3 (variables Good2, Good3) are more likely to prefer the Superblaster game.
The standard error, Wald statistic, and test of significance are given for each coefficient in our
multinomial regression model. Because of our use of binary variables, the standard error is zero and
thus the significance is 0 as well. This is a serious limitation of this analysis and should be reported
accordingly.
Sequential One-Way Discriminant Analysis
Sequential one-way discriminant analysis assumes that the discriminating independent variables are not equally important. This might be due to a suspected difference in the explanatory power of the variables, a hypothesis deduced from theory, or a practical assumption, for example in customer segmentation studies.
Like the standard one-way discriminant analysis, sequential one-way discriminant analysis is useful
mainly for two purposes: 1) identifying differences between groups, and 2) predicting group
membership.
Firstly, sequential one-way discriminant analysis identifies the independent variables that significantly
discriminate between the groups that are defined by the dependent variable. Typically, sequential one-
way discriminant analysis is conducted after a cluster analysis or a decision tree analysis to identify the
goodness of fit for the cluster analysis (remember that cluster analysis does not include any goodness of
fit measures itself). Sequential one-way discriminant analysis tests whether each of the independent
variables has discriminating power between the groups.
Secondly, sequential one-way discriminant analysis can be used to predict group membership. One
output of the sequential one-way discriminant analysis is Fisher's discriminant coefficients. Originally
Fisher developed this approach to identify the species to which a plant belongs. He argued that instead
of going through a whole classification table, only a subset of characteristics is needed. If you then plug the scores of respondents into these linear equations, the result predicts the group membership. This
is typically used in customer segmentation, credit risk scoring, or identifying diagnostic groups.
Because sequential one-way discriminant analysis assumes that group membership is given and that the
variables are split into independent and dependent variables, the sequential one-way discriminant
analysis is a so called structure testing method as opposed to structure exploration methods (e.g., factor
analysis, cluster analysis).
The sequential one-way discriminant analysis assumes that the dependent variable represents group membership, so the variable should be nominal. The independent variables represent the characteristics explaining group membership.
The independent variables need to be continuous-level (interval or ratio scale). Thus the sequential one-way discriminant analysis is similar to a MANOVA, logistic regression, and multinomial and ordinal regression. Sequential one-way discriminant analysis is different from the MANOVA because it works
the other way around. MANOVAs test for the difference of mean scores of dependent variables of
continuous-level scale (interval or ratio). The groups are defined by the independent variable.
Sequential one-way discriminant analysis is different from logistic, ordinal and multinomial regression
because it uses ordinary least squares instead of maximum likelihood; sequential one-way discriminant
analysis, therefore, requires smaller samples. Also continuous variables can only be entered as
covariates in the regression models; the independent variables are assumed to be ordinal in scale.
Reducing the scale level of an interval or ratio variable to ordinal in order to conduct multinomial
regression takes out variation from the data and reduces the statistical power of the test. Whereas
sequential one-way discriminant analysis assumes continuous variables, logistic/ multinomial/ ordinal
regression assume categorical data and thus use a Chi-Square like matrix structure. The disadvantage of
this is that extremely large sample sizes are needed for designs with many factors or factor levels.
Moreover, sequential one-way discriminant analysis is a better predictor of group membership if the
assumptions of multivariate normality, homoscedasticity, and independence are met. Thus we can
prevent over-fitting of the model, that is to say we can restrict the model to the relevant independent
variables and focus subsequent analyses. Also, because it is an analysis of the covariance, we can
measure the discriminating power of a predictor variable when removing the effects of the other
independent predictors.
The students in our sample were taught with different methods and their ability in different tasks was repeatedly graded on aptitude tests and exams. At the end of the study the pupils got to choose from three thank-you gifts: a sports game (Superblaster), a puzzle game (Puzzle Mania), and an action game (Polar Bear Olympics). The researchers wish to learn what guided the choice of gift.
The independent variables are the three test scores from the standardized math, reading, and writing tests (viz. Test_Score, Test2_Score, and Test3_Score). From previous correlation analysis we
suspect that the writing and the reading score have the highest influence on the outcome. In our
logistic regression we found that pupils scoring lower had higher risk ratios of preferring the action
game over the sports or the puzzle game.
The sequential one-way discriminant analysis is not part of the graphical user interface of SPSS. However, if we want to include our variables in a specific order in the sequential one-way discriminant model, we can do so by specifying the order in the /ANALYSIS subcommand of the DISCRIMINANT syntax. The SPSS syntax for a sequential one-way discriminant analysis specifies the sequence in which the variables are included in the analysis by defining an inclusion level: variables with a higher inclusion level are entered before variables with a lower level, and variables with level 0 are never included in the analysis.
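* Inclusion levels below: Test3_Score is entered first (level 3), then Test2_Score (level 2), then Test_Score (level 1); a variable with level 0 would never be entered.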
DISCRIMINANT
/GROUPS=Gift(1 3)
/VARIABLES=Test_Score Test2_Score Test3_Score
/ANALYSIS Test3_Score (3), Test2_Score (2), Test_Score (1)
/METHOD=WILKS
/FIN=3.84
/FOUT=2.71
/PRIORS SIZE
/HISTORY
/STATISTICS=BOXM COEFF
/CLASSIFY=NONMISSING POOLED.
Test Results
Box's M 34.739
F Approx. 5.627
df1 6
df2 205820.708
Sig. .000
Tests null hypothesis of equal population
covariance matrices.
The next table shows the variables entered in each step of the sequential one-way discriminant analysis.
Variables Entered/Removed(a,b,c,d)
Wilks' Lambda
Exact F
Step Entered Statistic df1 df2 df3 Statistic df1 df2 Sig.
At each step, the variable that minimizes the overall Wilks' Lambda is entered.
a. Maximum number of steps is 4.
b. Minimum partial F to enter is 3.84.
c. Maximum partial F to remove is 2.71.
d. F level, tolerance, or VIN insufficient for further computation.
We find that the writing test score was entered first, followed by the reading test score (based on Wilks' Lambda). The third variable we specified, the math test score, was not entered because it did not explain any more variance in the data. The table also shows the significance of each variable by running the F-test for the specified model.
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1(a)       5.601        99.9            99.9           .921
2(a)       .007         .1              100.0          .085
a. First 2 canonical discriminant functions were used in the analysis.
The next few tables show the variables in the analysis, the variables not in the analysis, and Wilks' Lambda. All of these tables contain virtually the same data. The next table shows the discriminant eigenvalues. The eigenvalues are defined as J = SS(between) / SS(within) and are maximized using ordinary least squares. We find that the first function explains 99.9% of the variance and the second function explains the rest. This is quite unusual for a discriminant model. This table also shows the canonical correlation coefficient for the sequential discriminant analysis, which is defined as c = sqrt( J / (1 + J) ).
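As a quick check with the values from the Eigenvalues table above: for the first function J = 5.601, so c = sqrt(5.601 / (1 + 5.601)) = sqrt(0.849) ≈ .921, which matches the reported canonical correlation.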
The next table in the output of our sequential one-way discriminant function describes the standardized canonical discriminant coefficients; these are the estimated beta coefficients. Since we have more than two groups in our analysis we need at least two functions (each canonical discriminant function can differentiate between two groups). We see the following coefficients:
                Function 1    Function 2
Writing Test    .709          .723
Reading Test    .827          -.585
This however has no inherent meaning other than knowing that a high score on both tests gives function
1 a high value, while simultaneously giving function 2 a lower value. In interpreting this table, we need
to look at the group centroids of our one-way sequential discriminant analysis at the same time.
We find that a high score (around three) on the first function indicates a preference for the sports game, a score close to zero indicates a preference for the puzzle game, and a low score indicates a preference for the action game. Remember that this first function explained 99.9% of the variance in the data. We also know that the sequential one-way discriminant function 1 scores higher for high results in the writing and the reading tests, whereby reading was a bit more important than writing.
Thus we can say that pupils who did well on our reading and writing test are more likely to choose the
sports game, and pupils who did not do well on the tests are more likely to choose the action game.
The final interesting table in the sequential one-way discriminant function output is the classification
coefficient table. Fisher's classification coefficients can be used to predict group membership.
Superblaster = -2.249 + .151 * writing + .206 * reading
Puzzle Mania = -8.521 + .403 * writing + .464 * reading
Polar Bear Olympics = -26.402 + .727 * writing + .885 * reading
If we plug in the numbers for a new student joining the class who scores 40 on both tests, we get three scores:
Superblaster = 12.031
Puzzle Mania = 26.159
Polar Bear Olympics = 38.078
Thus the student would most likely choose the Polar Bear Olympics (the highest value predicts the group
membership).
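To score new cases directly in SPSS, the three classification functions can be typed in as COMPUTE statements. This is a sketch; Writing_Score and Reading_Score are placeholder names for the writing and reading test variables, and the constants are Fisher's coefficients from the table above.
* Fisher's classification scores; the largest of the three predicts the chosen game.
COMPUTE score_superblaster = -2.249 + 0.151 * Writing_Score + 0.206 * Reading_Score.
COMPUTE score_puzzlemania = -8.521 + 0.403 * Writing_Score + 0.464 * Reading_Score.
COMPUTE score_polarbear = -26.402 + 0.727 * Writing_Score + 0.885 * Reading_Score.
EXECUTE.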
The classification results table shows that, specifically in the cases where we predicted that the student would choose the sports game, 13.9% chose the puzzle game instead. This serves to alert us to the risk behind this classification function.
Cluster Analysis
The Cluster Analysis is often part of the sequence of analyses of factor analysis, cluster analysis, and
finally, discriminant analysis. First, a factor analysis that reduces the dimensions and therefore the
number of variables makes it easier to run the cluster analysis. Also, the factor analysis minimizes
multicollinearity effects. The next analysis is the cluster analysis, which identifies the grouping. Lastly, a
discriminant analysis checks the goodness of fit of the model that the cluster analysis found and profiles
the clusters. In almost all analyses a discriminant analysis follows a cluster analysis because the cluster
analysis does not have any goodness of fit measures or tests of significance. The cluster analysis relies
on the discriminant analysis to check if the groups are statistically significant and if the variables
significantly discriminate between the groups. However, this does not ensure that the groups are
actually meaningful; interpretation and choosing the right clustering is somewhat of an art. It is up to
the understanding of the researcher and how well he/she understands and makes sense of his/her data!
Furthermore, the discriminant analysis builds a predictive model that allows us to plug in the numbers of
new cases and to predict the cluster membership.
Medicine What are the diagnostic clusters? To answer this question the researcher would devise a
diagnostic questionnaire that entails the symptoms (for example in psychology standardized scales for
anxiety, depression etc.). The cluster analysis can then identify groups of patients that present with
similar symptoms and simultaneously maximize the difference between the groups.
Marketing What are the customer segments? To answer this question a market researcher conducts a
survey most commonly covering needs, attitudes, demographics, and behavior of customers. The
researcher then uses the cluster analysis to identify homogenous groups of customers that have similar
needs and attitudes but are distinctively different from other customer segments.
Education What are student groups that need special attention? The researcher measures a couple of
psychological, aptitude, and achievement characteristics. A cluster analysis then identifies what
homogenous groups exist among students (for example, high achievers in all subjects, or students that
excel in certain subjects but fail in others, etc.). A discriminant analysis then profiles these performance
clusters and tells us what psychological, environmental, aptitudinal, affective, and attitudinal factors
characterize these student groups.
Biology What is the taxonomy of species? The researcher has collected a data set of different plants
and noted different attributes of their phenotypes. A hierarchical cluster analysis groups those
observations into a series of clusters and builds a taxonomy tree of groups and subgroups of similar
plants.
Other techniques you might want to try in order to identify similar groups of observations are Q-
analysis, multi-dimensional scaling (MDS), and latent class analysis.
Q-analysis, also referred to as Q factor analysis, is still quite common in biology but now rarely used outside of that field. Q-analysis uses factor analytic methods (which rely on R, the correlation between variables, to identify homogenous dimensions of variables) and switches the variables in the analysis for observations (thus changing the R into a Q).
Multi-dimensional scaling for scale data (interval or ratio) and correspondence analysis (for nominal
data) can be used to map the observations in space. Thus, it is a graphical way of finding groupings in
the data. In some cases MDS is preferable because it is more relaxed regarding assumptions (normality,
scale data, equal variances and covariances, and sample size).
Lastly, latent class analysis is a more recent development that is quite common in customer
segmentations. Latent class analysis introduces a dependent variable into the cluster model, thus the
cluster analysis ensures that the clusters explain an outcome variable, (e.g., consumer behavior,
spending, or product choice).
When we examine our standardized test scores in mathematics, reading, and writing,
what do we consider to be homogenous clusters of students?
In SPSS, the cluster analyses can be found under Analyze > Classify. SPSS offers three methods for the cluster analysis: K-Means Cluster, Hierarchical Cluster, and Two-Step Cluster.
K-means cluster is a method to quickly cluster large data sets, which typically take a while to compute with the preferred hierarchical cluster analysis. The researcher must define the number of clusters in advance. This is useful for testing different models with different assumed numbers of clusters (for example, in customer segmentation).
Hierarchical cluster is the most common method. We will discuss this method shortly. It takes
time to calculate, but it generates a series of models with cluster solutions from 1 (all cases in one
cluster) to n (all cases are an individual cluster). Hierarchical cluster also works with variables as
opposed to cases; it can cluster variables together in a manner somewhat similar to factor analysis. In
addition, hierarchical cluster analysis can handle nominal, ordinal, and scale data; however, it is not recommended to mix different levels of measurement.
Two-step cluster analysis is more of a tool than a single analysis. It identifies the groupings by
running pre-clustering first and then by hierarchical methods. Because it uses a quick cluster algorithm
upfront, it can handle large data sets that would take a long time to compute with hierarchical cluster
methods. In this respect, it combines the best of both approaches. Also two-step clustering can handle
scale and ordinal data in the same model. Two-step cluster analysis also automatically selects the
number of clusters, a task normally assigned to the researcher in the two other methods.
The hierarchical cluster analysis follows three basic steps: 1) calculate the distances, 2) link the clusters,
and 3) choose a solution by selecting the right number of clusters.
Before we start we have to select the variables upon which we base our clusters. In the dialog we add
math, reading, and writing test to the list of variables. Since we want to cluster cases we leave the rest
of the tick marks on the default.
In the Statistics dialog box we can specify whether we want to output the proximity matrix (these are the distances calculated in the first step of the analysis) and the predicted cluster membership of the cases in our observations. Again, we leave all settings at their defaults.
In the Plots dialog box we should add the dendrogram. The dendrogram will graphically show how the clusters are merged and allows us to identify the appropriate number of clusters.
The Method dialog box is very important! Here we can specify the distance measure and the clustering method. First, we need to define the correct distance measure. SPSS offers three large blocks of distance measures for interval (scale), counts (ordinal), and binary (nominal) data.
For scale data, the most common is the squared Euclidean distance. It is based on the Euclidean distance between two observations, which uses Pythagoras' formula for the right triangle: the distance is the square root of the sum of the squared distances on the x and y dimensions. The squared Euclidean distance is this distance squared; it thus increases the importance of large distances, while weakening the importance of small distances.
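As a small worked example (with hypothetical scores), two pupils scoring (40, 50) and (43, 46) on two tests have a squared Euclidean distance of (40 - 43)² + (50 - 46)² = 9 + 16 = 25, whereas the plain Euclidean distance would be √25 = 5.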
If we have ordinal data (counts) we can select between Chi-Square (think cross-tab) or a standardized
Chi-Square called Phi-Square. For binary data SPSS has a plethora of distance measures. However, the
Square Euclidean distance is a good choice to start with and quite commonly used. It is based on the
number of discordant cases.
Next we have to choose the Cluster Method. Typically choices are Between-groups linkage (distance
between clusters is the average distance of all data points within these clusters), nearest neighbor
(single linkage: distance between clusters is the smallest distance between two data points), furthest
neighbor (complete linkage: distance is the largest distance between two data points), and Ward's
method (distance is the distance of all clusters to the grand average of the sample). Single linkage works
best with long chains of clusters, while complete linkage works best with dense blobs of clusters and
between-groups linkage works with both cluster types. The usual recommendation is to use single
linkage first. Although single linkage tends to create chains of clusters, it helps in identifying outliers.
After excluding these outliers, we can move on to Ward's method. Ward's method uses the F value (like an ANOVA) to maximize the significance of differences between clusters, which gives it the highest statistical power of all methods. The downside is that it is prone to outliers and creates small clusters.
A last consideration is standardization. If the variables have different scales and means we might want
to standardize either to Z scores or just by centering the scale. We can also transform the values to
absolute measures if we have a data set where this might be appropriate.
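For reference, the hierarchical cluster analysis described above corresponds roughly to the following syntax. This is a minimal sketch; the variable names (Math_Score, Reading_Score, Writing_Score) are assumptions.
* Hierarchical cluster analysis: squared Euclidean distance, single linkage, dendrogram and icicle plot.
CLUSTER Math_Score Reading_Score Writing_Score
  /MEASURE=SEUCLID
  /METHOD SINGLE
  /PRINT SCHEDULE
  /PLOT DENDROGRAM VICICLE.
* When rerunning, add /SAVE CLUSTER(2) to store the two-cluster membership as a new variable.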
The icicle and dendrogram plot show the agglomeration schedule in a slightly more readable format. It
shows from top to bottom how the cases are merged into clusters. Since we used single linkage we find
that three cases form a chain and should be excluded as outliers.
Judging from the jumps in the agglomeration schedule coefficients, the best solution is a two-cluster solution. The biggest step is from 2 to 1 cluster, and it is by far bigger than the step from 3 to 2 clusters.
Upon rerunning the analysis, this time in the Save dialog box we can specify that we want to save the two-cluster solution. A new variable will be added that specifies the cluster membership. This would then be the dependent variable in our discriminant analysis, which would check the goodness of fit and the profiles of our clusters.
Factor Analysis
Factor Analysis reduces the information in a model by reducing the dimensions of the observations. This
procedure has multiple purposes. It can be used to simplify the data, for example reducing the number
of variables in predictive regression models. If factor analysis is used for these purposes, most often
factors are rotated after extraction. Factor analysis has several different rotation methodssome of
them ensure that the factors are orthogonal. Then the correlation coefficient between two factors is
zero, which eliminates problems of multicollinearity in regression analysis.
Factor analysis is also used in theory testing to verify scale construction and operationalizations. In such
a case, the scale is specified upfront and we know that a certain subset of the scale represents an
independent dimension within this scale. This form of factor analysis is most often used in structural
equation modeling and is referred to as Confirmatory Factor Analysis. For example, we know that the questions pertaining to the big five personality traits cover all five dimensions (openness, conscientiousness, extraversion, agreeableness, and neuroticism). If we want to build a regression model that predicts the influence of the personality dimensions on an outcome variable, for example anxiety in public places, we would start by modeling a confirmatory factor analysis of the twenty questionnaire items that load onto five factors and then regress onto an outcome variable.
Factor analysis can also be used to construct indices. The most common way to construct an index is to
simply sum up the items in an index. In some contexts, however, some variables might have a greater
explanatory power than others. Also sometimes similar questions correlate so much that we can justify
dropping one of the questions completely to shorten questionnaires. In such a case, we can use factor
analysis to identify the weight each variable should have in the index.
What are the underlying dimensions of our standardized and aptitude test scores?
That is, how do aptitude and standardized tests form performance dimensions?
In the dialog box of the factor analysis we start by adding our variables (the standardized tests math, reading, and writing, as well as the aptitude tests 1-5) to the list of variables.
In the Descriptives dialog we need to add a few statistics with which we can verify the assumptions made by the factor analysis. If you want the univariate descriptives, that is your choice, but to verify the assumptions we need the KMO and Bartlett's test of sphericity and the anti-image correlation matrix.
The Extraction dialog box allows us to specify the extraction method and the cut-off value for the extraction. Let's start with the easy one, the cut-off value. Generally, SPSS can extract as many factors as we have variables. The eigenvalue is calculated for each factor extracted. If the eigenvalue drops below 1 it means that the factor explains less variance than a single variable would (all variables are standardized to have mean = 0 and variance = 1). Thus we want all factors that explain more variance than a single variable would.
The more complex bit is the appropriate extraction method. Principal Components (PCA) is the standard extraction method. It extracts uncorrelated linear combinations of the variables. The first factor has maximum variance. The second and all following factors explain smaller and smaller portions of the variance and are all uncorrelated with each other. It is very similar to Canonical Correlation Analysis. Another advantage is that PCA can be used when a correlation matrix is singular.
The second most common analysis is principal axis factoring, also called common factor analysis or principal factor analysis. Although mathematically very similar to principal components, it is interpreted differently: principal axis factoring identifies the latent constructs behind the observations, whereas principal components analysis identifies similar groups of variables. Generally speaking, principal components is preferred when using factor analysis to reduce the data, and principal axis factoring when the interest lies in the latent dimensions behind the variables. In our research question we are interested in the dimensions behind the variables, and therefore we are going to use Principal Axis Factoring.
The next step is to select a rotation method. After extracting the factors, SPSS can rotate the factors to
better fit the data. The most commonly used method is Varimax. Varimax is an orthogonal rotation
method (that produces independent factors = no multicollinearity) that minimizes the number of
variables that have high loadings on each factor. This method simplifies the interpretation of the
factors.
A second, frequently used method is Quartimax. Quartimax rotates the factors in order to minimize the
number of factors needed to explain each variable. This method simplifies the interpretation of the
observed variables.
Another method is Equamax. Equamax is a combination of the Varimax method, which simplifies the
factors, and the Quartimax method, which simplifies the variables. The number of variables that load
highly on a factor and the number of factors needed to explain a variable are minimized. We choose
Varimax.
In the Options dialog box we can manage how missing values are treated; it might be appropriate to replace them with the mean, which does not change the correlation matrix but ensures that we don't over-penalize missing values. Also, we can specify that we don't want to include all factor loadings in the output. The factor loading tables are much easier to interpret when we suppress small factor loadings; the default value is 0.1, and it is appropriate to increase this value to 0.4. The last step would be to save the results in the Scores dialog. This calculates, for every respondent, the score they would have had if they had answered the factor questions (whatever they might be) directly. Before we save these results to the data set, we should run the factor analysis first, check all assumptions, ensure that the results are meaningful and that they are what we are looking for, and then we should re-run the analysis and save the factor scores.
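The whole specification above corresponds roughly to the following FACTOR command. This is a sketch; the variable names are assumptions, and the BLANK(.40) option mirrors the suppression of small loadings just described.
* Principal axis factoring with Varimax rotation, KMO/Bartlett test and anti-image matrices.
FACTOR
  /VARIABLES Math_Score Reading_Score Writing_Score Apt1 Apt2 Apt3 Apt4 Apt5
  /MISSING MEANSUB
  /PRINT INITIAL KMO AIC EXTRACTION ROTATION
  /FORMAT SORT BLANK(.40)
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PAF
  /ROTATION VARIMAX.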
The next table is the KMO and Bartlett's test of sphericity. The KMO criterion can take values between 0 and 1, where the usual interpretation is that a value around 0.8 indicates good adequacy of the data for a factor analysis. If the KMO criterion is less than 0.5 we cannot extract factors in a meaningful way.
The next table is the Anti-Image Matrices. Image theory splits the variance into image and anti-image.
Next we can check the correlation and covariances of the anti-image. The rule of thumb is that in the
anti-image covariance matrix only a maximum of 25% of all non-diagonal cells should be greater than
|0.09|.
The second part of the table shows the anti-image correlations; the diagonal elements of that matrix are the MSA values. Like the KMO criterion, the MSA criterion shows whether each single variable is adequate for a factor analysis. A figure of 0.8 indicates good adequacy; if the MSA is less than 0.5, we should exclude the variable from the analysis. We find that although aptitude test 5 has an MSA value of .511 and might be a candidate for exclusion, we can proceed with our factor analysis.
The next table shows the communalities. The communality of a variable is the variance of the variable that is explained by all factors together. Mathematically it is the sum of the squared factor loadings for each variable. A rule of thumb is for all communalities to be greater than 0.5. In our example that does not hold true; however, for all intents and purposes, we proceed with the analysis.
The next table shows the total explained variance of the model. The table also includes the eigenvalues of each factor. The eigenvalue is the sum of the squared factor loadings for each factor. SPSS extracts all factors that have an eigenvalue greater than 1. In our case the analysis extracts three factors. This table also shows us the total explained variance before and after rotation. The rule of thumb is that the model should explain more than 70% of the variance; in our example the model explains 55%.
The eigenvalues for each possible solution are graphically shown in the Scree Plot. As we find with the
Kaiser criterion (eigenvalue > 1), the optimal solution has three factors. However, in our case we could also argue in favor of a two-factor solution, because this is the point where the explained variance
makes the biggest jump (the elbow criterion). This decision rule is similar to the rule we applied in
cluster analysis.
Another commonly used criterion is that the extracted factors should explain at least 95% of the variance; in our example we would then need six factors. Yet another criterion is that the number of factors should be less than half the number of variables in the analysis, or to extract as many factors as you can interpret plausibly and sensibly. Again, factor analysis is somewhat of an art.
The next two tables show the factor matrix and the rotated factor matrix. These tables are the key to
interpreting our three factors. The factor loadings shown in these tables are the correlation coefficients between the variables and the factors. The factor loadings should be greater than 0.4 and the
structure should be easy to interpret.
Labeling these factors is quite controversial as every researcher would interpret them differently. The
best way to increase validity of the findings is to have this step done by colleagues and other students
that are familiar with the matter. In our example we find that after rotation, the first factor makes
students score high in reading, high on aptitude test 1, and low on aptitude tests 2, 4, and 5. The second
factor makes students score high in math, writing, reading and aptitude test 1. And the third factor
makes students score low on aptitude test 2. However, even if we can show the mechanics of the factor
analysis, we cannot find a meaningful interpretation of these factors. We would most likely need to go
back and look at the individual results within these aptitude tests to better understand what we see
here.
CHAPTER 8: Data Analysis and Statistical Consulting Services
Statistics Solutions is dedicated to facilitating the dissertation process for students by providing
statistical help and guidance to ensure a successful graduation. Having worked on my own mixed
method (qualitative and quantitative) dissertation, and with 18 years of experience in research design
and methodology, I present this SPSS user guide, on behalf of Statistics Solutions, as a gift to you.
The purpose of this guide is to enable students with little to no knowledge of SPSS to open the program
and conduct and interpret the most common statistical analyses in the course of their dissertation or
thesis. Included is an introduction explaining when and why to use a specific test as well as where to
find the test in SPSS and how to run it. Lastly, this guide lets you know what to expect in the results and
informs you how to interpret the results correctly.
Statistics Solutions offers a family of solutions to assist you towards your degree. If you would like to
learn more or schedule your free 30-minute consultation to discuss your dissertation research, you can
visit us at www.StatisticsSolutions.com or call us at 877-437-8622.
Terms of Use
We make no representation or warranty about the accuracy or completeness of the materials made
available.
We do not warrant that the materials are error free. You assume all risk with using and accessing the
materials, including without limitation the entire cost of any necessary service, or correction for any loss
or damage that results from the use and access of the materials.
Under no circumstances shall we, nor our affiliates, agents, or suppliers, be liable for any damages,
including without limitation, direct, indirect, incidental, special, punitive, consequential, or other
damages (including without limitation lost profits, lost revenues, or similar economic loss), whether in
contract, tort, or otherwise, arising out of the use or inability to use the materials available here, even if
we are advised of the possibility thereof, nor for any claim by a third party.
This material is for your personal and non-commercial use, and you agree to use this for lawful purposes
only. You shall not copy, use, modify, transmit, distribute, reverse engineer, or in any way exploit
copyrighted or proprietary materials available from here, except as expressly permitted by the
respective owner(s) thereof.
You agree to defend, indemnify, and hold us and our affiliates harmless from and against any and all
claims, losses, liabilities, damages and expenses (including attorney's fees) arising out of your use of the
materials.
The terms of use shall be governed in accordance with the laws of the State of Florida, U.S.A., excluding its conflict of laws provisions. We reserve the right to add, delete, or modify any or all terms of use at
any time with or without notice.