Statistical Methods and SPSS
BY
Kimwise Aaron
PhD MIS, MBA-IT, BSE
1
Introduction
• Data analysis is only one part of the research process.
Before you can use SPSS
to analyze your data there are a number of things that need
to happen.
First, you have to design your study and choose appropriate
data collection instruments.
• Once you have conducted your study, the information
obtained must be prepared for entry into SPSS (using
something called a ‘codebook’).
2
Introduction
• SPSS stands for Statistical Package for the Social Sciences. It is a combination of computer software, programs and instructions for data management. Data management deals with data entry, storage, retrieval, analysis and reporting.
4
SPSS Data
• SPSS accepts data in a rectangular form. The data in SPSS is rectangular/tabular, i.e. in spreadsheet form, made up of rows and columns. Each row records a case or an entity, which can be the respondent for one questionnaire, i.e. a case records one respondent, e.g. a student. Each column is reserved for a variable, i.e. a particular characteristic of a case or an individual in a questionnaire.
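To make the rectangular layout concrete, here is a minimal sketch in plain Python (no SPSS needed; the two cases are the first two rows of Table 1, and the dictionary keys are illustrative variable names):

```python
# Sketch of the rectangular data layout SPSS uses: each row is a case
# (one respondent), each column is a variable (one question).
cases = [
    # id, read, write, math, ses, sex  (coded values, as in Table 1)
    {"id": 1, "read": 10, "write": 8, "math": 4, "ses": 1, "sex": 2},
    {"id": 2, "read": 8,  "write": 7, "math": 9, "ses": 1, "sex": 1},
]

# One row = one case; one key = one variable.
print(len(cases))             # 2 cases (rows)
print(list(cases[0].keys()))  # the variables (columns)
```

One row per respondent and one column per question is exactly the shape the SPSS data grid enforces.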
5
6
SPSS Data cont…
• Note that there are many variables or characteristics for each case. Here, a variable is a question asked in the questionnaire, e.g. for each student, the researcher may want to know scores in reading, writing and math, and other characteristics like sex, socio-economic status and learner type.
7
SPSS Screen
• Starting/launching/opening SPSS
• There are many ways of starting (opening) SPSS. For example, if there is an SPSS icon on your desktop, simply double-click it and wait; or go through the Start menu, All Programs, SPSS for Windows, SPSS 24.0 for Windows, or the route that appears on top when you click on SPSS for Windows. After this, the SPSS screen or interface will open, which looks similar to an Excel spreadsheet.
8
9
Features of the SPSS screen
• Title bar (Untitled – SPSS Data Editor): it is called Untitled because there is no current data set. If we type in any data or open a data set, the word “Untitled” changes to a file name, e.g. “employee.sav – SPSS Data Editor”. The phrase “SPSS Data Editor” is the official name of the screen. This data editor allows us to enter data into SPSS and edit it.
10
11
Features of the SPSS screen
• Menu bar. This is the second topmost row. It shows a list of menus or commands for different purposes. SPSS for Windows is menu-driven, i.e. whatever activity one wants to do, a certain menu item must be selected. E.g. the File menu is used for saving, creating, opening and closing files; the Analyze menu is used for various analyses (e.g. descriptive statistics, compare means, correlate etc.); the Graphs menu is used to create graphs like pie charts, scatter/dot, histograms etc.
12
13
Features of the SPSS screen
cont…
• Tools or icon bar. This is the third topmost row. It gives an icon or picture for some items on the menu bar. Note that when there is no active data set, the tools or icon bar is inactive. To know what each icon does or is used for, just put the cursor on it.
14
15
Features of the SPSS screen
cont…
• Data grid. This is the spreadsheet that appears below the icon bar. This spreadsheet or data grid has rows labelled 1, 2, … and columns labelled var, var, …. The heading var is a short form of “variable”: once a variable name is typed in, the word var changes to that variable name.
• This rectangular grid reminds us that SPSS accepts data in a rectangular format, where rows are reserved for cases and columns for variables.
16
17
Features of the SPSS screen cont…
• Status bar. This is at the bottom of the data grid and it indicates the current status of the SPSS processor. There are basically two statuses: sometimes the status bar reads “SPSS Processor is ready”, meaning that it can perform the different tasks or analyses as commanded; sometimes it reads “SPSS Processor is unavailable”, meaning that the processor cannot perform any analysis. When running any analysis, the status bar will indicate accordingly.
18
19
Features of the SPSS screen cont…
• Exiting/closing SPSS
• We can exit or close the SPSS window in two ways: clicking the close (red) icon at the extreme right corner, or going through the File menu and clicking Exit. In either case, we are reminded to first save any unsaved changes.
20
21
Data processing using SPSS
• Data processing refers to how data are prepared for analysis. It involves coding or categorizing data, entering data into the computer, editing data and summarizing it for presentation, i.e. presenting it in a summary form.
22
Coding/categorizing data for
SPSS
• Before entering data, we have to code or categorize it first. To code your data in SPSS, click on the Variable View tab at the bottom left of the data grid. The grid there has columns entitled Name, …, Label and Values, among others. Although there are other columns, beginning SPSS users should focus on three columns (Name, Label and Values).
23
24
Coding/categorizing data for
SPSS cont…
• Name column: here you give a short name for each variable or question. The name should not exceed eight characters, with no spaces and no punctuation of any kind.
• Label column: here you give/type a full name or description of the variable. In most cases the label is the question from the questionnaire, but it may be edited to fit data presentation.
25
26
Coding/categorizing data for
SPSS cont…
• Values column: here you categorize or give the categories of each variable. This is done only for categorical variables; if the variable is numerical, we leave this box empty (with its None default). To create categories, click in the box (the three dots …), then type a numerical value (e.g. 1) in the Value box.
27
Coding/categorizing data for
SPSS cont…
• This number or code represents a category or a particular answer, which is typed in the Label box. Click Add to transfer your code and label into the big box. Then type another code (say 2) in the Value box and the accompanying label in the Label box, and click Add. Repeat this exercise until all categories for that question are represented, then go to the next question.
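The code-and-label pairs built up in the Values box amount to a small lookup table per variable. A hypothetical plain-Python sketch (the SES labels follow the low/medium/high grouping used later in these notes; the sex labels are assumptions, since the slides do not state which code is which):

```python
# Sketch of what the Values column records: a mapping from numeric
# codes to category labels for each categorical variable.
# NOTE: the sex labels below are assumed for illustration only.
value_labels = {
    "ses": {1: "low", 2: "medium", 3: "high"},
    "sex": {1: "male", 2: "female"},
}

# Decoding a stored code back to its label:
print(value_labels["ses"][2])   # medium
```

This is the same decoding SPSS performs when you turn on View, Value Labels.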
28
Coding/categorizing data for
SPSS cont…
29
Data entry
• After completing data coding, we can now start entering our data. To do this, we click on the Data View tab at the extreme bottom left corner of the data grid. In the Data View window, observe that while the rows are labelled 1, 2, …, the first few columns have changed from var, var, … to the short names we gave to our variables, such as ID, Read, Write and so on.
30
31
Data entry
32
Table 1: Data on six variables on how students’ scores vary by SES, sex and learner type
ID Read Write Math SES Sex Learner type
1 10 8 4 1 2 1
2 8 7 9 1 1 1
3 12 5 5 1 2 1
4 5 4 7 1 1 1
5 11 9 6 1 2 1
6 10 10 8 1 2 1
7 7 8 10 1 1 1
8 6 5 8 1 1 1
9 8 6 8 1 1 2
10 9 8 6 1 2 2
11 8 9 5 2 2 2
12 15 15 10 2 2 2
13 12 12 14 2 1 1
14 10 10 10 2 1 1
15 10 10 8 2 2 2
16 12 12 9 2 2 2
17 11 11 8 2 2 2
18 12 12 10 2 1 1
19 10 10 13 2 1 1
20 16 16 12 3 2 2
21 13 13 10 3 2 2
22 11 11 12 3 1 2
23 13 13 12 3 1 2
24 14 14 10 3 2 2
25 10 10 12 3 1 2
26 15 15 14 3 2 2
27 17 17 10 3 2 2
28 18 18 17 3 1 1
29 17 17 15 3 2 2
30 12 12 13 1 2 2
31 13 13 9 3 2 2
32 14 14 13 3 1 1
33 12 12 8 2 2 1
33
Legend:
34
• NB: Code the data using the Variable View window and save it. After coding (in the Variable View window), click the Data View window to enter the data. Note, however, that you enter the data row by row (case by case).
35
Data Editing or Cleaning data
using SPSS
• Data editing means checking your data set to see whether it is clean and suitable for analysis. You check for errors, omissions, outliers (values outside the permissible range) and wrong answers. It also involves data transformation, i.e. modifying your variables to make them suitable for analysis. The following tools can help you to check for errors, omissions and outliers: frequency tables, graphs, the View menu and your own eyes.
36
Frequency Tables
• To compute a frequency table: click Analyze, then Descriptive Statistics, then Frequencies.
• Transfer any variable, e.g. sex, into the Variables box and then click OK. (A frequency table will appear.) Here you check whether there is any awkward or unexpected category, also called an outlier or value outside the permissible range. For example, under sex we expect only two categories (male & female); any other category apart from these two is taken to be an outlier.
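The same error check can be sketched in plain Python: tally each code and flag anything outside the permissible range (the data below are made up, with one deliberate wrong entry):

```python
from collections import Counter

# Sketch of the check a frequency table performs: count each code for
# sex and flag any value outside the permissible range {1, 2}.
sex = [1, 2, 2, 1, 1, 2, 1, 5]   # the 5 is a deliberate data-entry error
freq = Counter(sex)
print(freq)                       # counts per code

outliers = [code for code in freq if code not in (1, 2)]
print(outliers)                   # [5]
```

Any code that appears in the tally but not in the codebook is exactly the “unexpected category” the slide describes.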
37
Graphs
38
View menu
• From the menu bar, click View, then Value Labels. This will display all the value labels or responses, where you can easily identify a wrong answer by looking through the answers with your own eyes.
39
• In any of the three cases above, if an error or outlier is identified, go back to your data set and identify the questionnaire number, then trace that respondent or questionnaire using the ID, and correct the outlier by replacing it with the correct value.
40
Manipulating the data
• This looks at how SPSS helps us to change or transform existing data to make it suitable for analysis, e.g. SPSS helps us to create a new variable from existing variables. Example: we may use SPSS to compute totals/sums, means/averages, create ranked variables and so on.
41
Manipulating the data
• Other examples: collapsing continuous variables (e.g. read) into groups to do some analyses such as analysis of variance; and reducing or collapsing the number of categories of a categorical variable.
42
Manipulating the data
• For example, if we want to compute the total score for each student in the three subjects, we may follow these steps:
• From the menu bar, click Transform, then Compute Variable.
• Type Total in the Target Variable box.
• Transfer read + write + math into the Numeric Expression box.
• Click OK.
• Note that a new variable or column (Total) has been added as the last column in the Data View window. We can now use this new variable for other analyses, as we shall see.
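What Compute Variable does here can be sketched in plain Python: add a new total field to every case (the three cases below are the first three rows of Table 1):

```python
# Sketch of Transform -> Compute Variable for Total = read + write + math,
# using the first three cases from Table 1.
cases = [
    {"read": 10, "write": 8, "math": 4},
    {"read": 8,  "write": 7, "math": 9},
    {"read": 12, "write": 5, "math": 5},
]
for c in cases:
    c["total"] = c["read"] + c["write"] + c["math"]   # the new column

print([c["total"] for c in cases])   # [22, 24, 22]
```

The Average variable on the next slide is the same idea with the expression (read + write + math)/3.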
43
Manipulating the data
• We may also use SPSS to compute the average score for each student in the said three subjects. To create the average, we follow these steps:
• From the menu bar, click Transform, then Compute Variable.
• Type Average in the Target Variable box.
• Then transfer (read + write + math)/3 into the Numeric Expression box.
• Click OK.
44
Manipulating the data
• Again note that a new variable or column (Average) has been added as the last column in the Data View window. We can now use this new variable for other analyses, as we shall see.
45
Manipulating the data
• Collapsing a continuous variable into groups.
• For some analyses (e.g. analysis of variance) you may wish to divide the sample into equal groups according to respondents’ scores on some variable (e.g. to give low, medium and high scoring groups). This technique leaves the original variable, measured as a continuous variable, intact so that you can use it for other analyses.
46
Manipulating the data
• To illustrate this process, we will use the data file that you saved.
• Procedure for collapsing a continuous variable into groups:
• 1. From the menu at the top of the screen click on Transform, and choose Visual Bander.
• 2. Select the continuous variable that you want to use (e.g. read). Transfer it into the Variables to Band box. Click on the Continue button.
• 3. In the Visual Bander screen that appears, click on the variable to highlight it. A histogram, showing the distribution of read scores, should appear.
47
Manipulating the data
• 4. In the section at the top labelled Banded Variable, type in the name for the new categorical variable that you will create (e.g. Readband3). You can also change the suggested label that is shown (e.g. to “read in 3 groups”).
• 5. Click on the button labelled Make Cutpoints, and then OK.
48
Manipulating the data
• SPSS will try to put an equal percentage of the sample in each group. Click on the Apply button.
• 6. Click on the Make Labels button back in the main dialogue box. This will automatically generate value labels for each of the new groups created. You can modify these if you wish by clicking in the cells of the grid.
• 7. Click on OK and a new variable (Readband3) will appear at the end of your data file (go back to your Data Editor window, choose the Variable View tab, and it should be at the bottom).
49
Manipulating the data
• Run Descriptives or Frequencies on your newly created variable (Readband3) to check the number of cases in each of the categories.
50
Manipulating the data
• In the dialogue box that appears, click on the option Equal Percentiles Based on Scanned Cases. In the box Number of Cutpoints, specify a number one less than the number of groups that you want (e.g. if you want three groups, type in 2 for cutpoints). In the Width (%) section below you will then see 33.33 appear: this means that each of the three groups will contain about 33 per cent of the cases.
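The Equal Percentiles option is just percentile-based cutpoints. A rough plain-Python sketch of the same idea, using the first twelve read scores from Table 1 (with tied scores, the three bands need not be exactly equal in size):

```python
import statistics

# Rough sketch of "Equal Percentiles Based on Scanned Cases": with 2
# cutpoints the scores split into 3 bands of roughly a third each.
read = [10, 8, 12, 5, 11, 10, 7, 6, 8, 9, 8, 15]     # first 12 read scores
low_cut, high_cut = statistics.quantiles(read, n=3)  # the two cutpoints

def band(score):
    # Band codes are illustrative: 1 = low, 2 = medium, 3 = high.
    if score <= low_cut:
        return 1
    if score <= high_cut:
        return 2
    return 3

bands = [band(s) for s in read]
print(low_cut, high_cut)                       # 8.0 10.0
print({b: bands.count(b) for b in (1, 2, 3)})  # {1: 6, 2: 3, 3: 3}
```

Note how the ties at 8 pull extra cases into the low band, which is why checking the band counts with Frequencies afterwards (as the slides advise) is worthwhile.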
51
Presenting or summarizing
data using SPSS
52
Frequency tables
• Frequency tables are perhaps the most popular tools for data presentation, because of their efficiency, i.e. their ability to present so much data in a small space. However, we must note that frequency tables are only used to present data which is categorical in nature. To generate a frequency table, use the following steps:
• Click Analyze, then Descriptive Statistics and then Frequencies. Transfer any categorical variable, e.g. sex, into the Variables box and then
• Click OK. NB: in case there are other variables in the Variables box, first click Reset to remove any existing variable from the box before you transfer the variables.
53
Tutorial problem
55
Types of frequency tables
• To generate a cross tabulation or two-way table, we may follow these steps:
• Click Analyze, then Descriptive Statistics, then Crosstabs.
• Transfer the first categorical variable, e.g. sex, into the Row(s) box.
• Transfer the second categorical variable, e.g. SES, into the Column(s) box.
• Click OK.
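A cross tabulation is simply a two-way count. A minimal plain-Python sketch with a few made-up coded cases (codes are illustrative):

```python
from collections import Counter

# Sketch of a two-way (cross-tabulation) count: sex in the rows,
# SES in the columns.
cases = [(2, 1), (1, 1), (2, 1), (1, 2), (2, 2), (2, 3)]  # (sex, ses)
table = Counter(cases)

for sex in (1, 2):
    row = [table[(sex, ses)] for ses in (1, 2, 3)]
    print(sex, row)   # row of counts for that sex across SES 1, 2, 3
```

Each cell of the SPSS Crosstabs output is just one of these (row, column) counts.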
56
Tutorial problem
57
DESCRIPTIVE DATA ANALYSIS USING SPSS
• Descriptive data analysis is an attempt to describe data collected from a specific sample, without any attempt to generalize results to the population from which the sample was chosen. In this section we see how SPSS helps us to describe our data using four tools, as per this section’s headings.
58
Analysis of frequencies or frequency counts using SPSS
• Frequency counts and their corresponding relative frequencies or percentages help data analysts to make sense of their data in a one-way or univariate frequency table. Let us teach ourselves how SPSS helps us generate frequency tables showing frequency counts and relative frequencies.
• From our sample data set, generate a simple, univariate or one-way frequency table for the respective categorical variables therein (SES, sex and Ltype). Transfer all these into the Variables box and click OK.
59
The output generated from this procedure is shown
below
60
• From the results you obtain, note that SPSS gives relative frequencies under two columns, labelled Percent and Valid Percent respectively. In interpreting our data, if there are no missing responses the two columns are identical, and we can simply use the Percent column. But if there are some missing responses, then we should use the Valid Percent column, which ignores the missing scores.
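The difference between the two columns can be sketched in plain Python with one made-up missing response:

```python
# Sketch of Percent vs. Valid Percent: with one missing response
# (None), the two figures differ.
sex = [1, 2, 2, 1, None]          # 5 cases, 1 missing
valid = [s for s in sex if s is not None]

percent_female = sex.count(2) / len(sex) * 100            # over all cases
valid_percent_female = valid.count(2) / len(valid) * 100  # missing ignored
print(percent_female, valid_percent_female)   # 40.0 50.0
```

Percent divides by all cases (including the missing one), Valid Percent only by the cases that actually answered.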
61
• Also note that SPSS gives the Cumulative Percent column, but we should remember, from our Statistical Methods, that cumulative percents only make sense if data are for ordinal or ranked variables.
• Also remember that apart from helping us generate one-way, simple or univariate frequency tables, SPSS can help us generate two-way, complex or cross-tabulation frequency tables, which involve two variables.
62
• Let us teach ourselves how SPSS helps us generate two-way, complex or cross-tabulation frequency tables.
• From our sample data set, generate a two-way, complex or cross-tabulation frequency table for the respective categorical variables therein (SES vs sex and Ltype; sex vs Ltype).
63
Procedure for cross-tabulation
64
Complex or Cross-tabulation
frequency table for SES and SEX
65
Analysis of Central tendencies or location using SPSS
• We should note now that we cannot use frequency tables to present or describe data on numeric variables. Thus data on numeric variables is presented or described using measures of central tendency or location, where data is simply summarised in tables (not frequency tables), describing our sample using these statistics. We should remember from our Statistics that most numerical variables tend to have the so-called normal frequency distribution or curve, where the majority of the scores tend to be located in the centre.
66
Analysis of Central tendencies or location using
SPSS
• SPSS helps us determine the extent to which a given numerical variable is normally distributed by helping us to generate the pertinent frequency histograms or curves.
• We should note that when describing any numerical data set, we want to locate its centre, where most scores tend to be located. This is achieved by measures of central tendency or location such as the mean (average), mode and median. Let us see how SPSS helps us describe our data by generating these.
67
To generate the (arithmetic) mean and median for numerical variables:
• Click Analyze, Descriptive Statistics and then Explore.
• Transfer a numeric variable (e.g. read) into the Dependent List box.
• Click OK.
• Note that among these statistics, SPSS gives us the (arithmetic) mean and median. However, it does not give us the mode, because modal scores as measures of central tendency or location are more suitable for categorical variables.
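For readers who want to verify the centre measures by hand, a plain-Python sketch using the full read column from Table 1:

```python
import statistics

# The read column for all 33 cases of Table 1.
read = [10, 8, 12, 5, 11, 10, 7, 6, 8, 9, 8, 15, 12, 10, 10, 12, 11,
        12, 10, 16, 13, 11, 13, 14, 10, 15, 17, 18, 17, 12, 13, 14, 12]

print(statistics.mean(read))    # arithmetic mean, roughly 11.5
print(statistics.median(read))  # middle score: 12
```

These are the same two figures Explore reports for read.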
68
The output generated from this
procedure is shown below
69
Analysis of Dispersion using
SPSS
• We should note that while two data sets may
have a common measure of central tendency or
location, the dispersion of the scores may differ.
Thus in addition to measures of central tendency
or location, we also need measures of dispersion
such as range, variance and standard deviation
of scores in a given data set. Let us see how
SPSS helps us to generate these indices.
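These three indices can be sketched in plain Python (illustrative scores; variance and standard deviation here are the sample versions, dividing by n − 1, which is what SPSS reports):

```python
import statistics

# Dispersion measures for a small set of illustrative read scores.
scores = [10, 8, 12, 5, 11]

rng = max(scores) - min(scores)      # range: 12 - 5 = 7
var = statistics.variance(scores)    # sample variance (divides by n - 1)
sd = statistics.stdev(scores)        # its square root
print(rng, var, sd)
```

Two data sets with the same mean can give very different values here, which is exactly the point of the slide above.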
70
To generate the range, variance and standard deviation for any numerical variable:
71
The output generated from this
procedure is shown below
72
Analysis of Skew using SPSS
• We should also note that while some numerical variables tend to have the so-called normal frequency distribution or curve, where most observations are located in the centre, some curves deviate from this normality and become skewed, either positively or negatively. SPSS can help us determine the extent to which data on a given numeric variable (e.g. read) is skewed, by helping us generate the pertinent histograms or curves. You can also observe that among the several statistics SPSS gives you, there is a statistic on skewness. Its figure will show you the direction of the skew, whether positive or negative. Remember that a negative value implies that the data is skewed to the left, a positive value implies that the data is skewed to the right, and a value of zero implies a symmetric (e.g. normal) distribution.
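The sign convention can be illustrated with a simple moment-based skewness estimate in plain Python (made-up data; SPSS applies a small-sample correction to this formula, so its figure will differ slightly):

```python
import statistics

def skewness(xs):
    # Simple moment-based estimate: mean of cubed standardized scores.
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    n = len(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

# One large score pulls the tail to the right -> positive skew;
# one small score pulls the tail to the left -> negative skew.
print(skewness([1, 2, 2, 3, 12]) > 0)     # True: skewed to the right
print(skewness([1, 10, 10, 11, 12]) < 0)  # True: skewed to the left
```

The sign, not the exact magnitude, is what the slide's interpretation rule relies on.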
73
Graphs
• SPSS for Windows provides a number of different types of graphs (referred to by SPSS as charts). In this chapter I’ll cover the basic procedures to obtain the following graphs:
• histograms;
• bar graphs;
• scatterplots;
• boxplots; and
• line graphs.
74
Histograms
• Histograms are used to display the
distribution of a single continuous variable
• (e.g. read, write, math).
75
Procedure for creating a histogram
1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then click on Histogram.
2. Click on your variable of interest and move it into the Variable box. This should be a continuous variable (e.g. read).
3. Click on Display normal curve. This option will give you the distribution of your variable and, superimposed over the top, how a normal curve for this distribution would look.
4. If you wish to give your graph a title, click on the Titles button and type the desired title in the box (e.g. Histogram of reading scores).
5. Click on Continue, and then OK.
76
The output generated from this
procedure is shown below.
77
Interpretation of output from Histogram
Inspection of the shape of the histogram provides information about the distribution of scores on the continuous variable. We assume that the scores on each of the variables are normally distributed (i.e. follow the shape of the normal curve). In this example, the scores are reasonably normally distributed, with most scores occurring in the centre, tapering out towards the extremes. It is quite common in the social sciences, however, to find variables that are not normally distributed.
78
Bar graphs
• Bar graphs can be simple or very complex, depending on how many variables you wish to include. The bar graph can show the number of cases in particular categories, or it can show the score on some continuous variable for different categories. Basically you need two main variables: one categorical and one continuous. You can also break this down further with another categorical variable if you wish.
79
Procedure for creating a bar graph
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then Bar.
• 2. Click on Clustered.
• 3. In the Data in Chart Are section, click on Summaries for groups of cases. Click on Define.
• 4. In the Bars Represent box, click on Other summary function.
• 5. Click on the continuous variable you are interested in (e.g. read). This should appear in the box listed as Mean (reading). This indicates that the mean on the reading scale for the different groups will be displayed.
80
Procedure for creating a bar
graph
• 6. Click on your first categorical variable (e.g. SES). Click on the arrow button to move it into the Category Axis box. This variable will appear across the bottom of your bar graph (X axis).
• 7. Click on another categorical variable (e.g. sex) and move it into the Define Clusters by: box. This variable will be represented in the legend.
• 8. Click on OK.
81
The output generated from this procedure,
after it has been slightly modified, is shown
below.
82
Interpretation of output from Bar Graph
83
Scatterplots
• Scatterplots are typically used to explore the relationship between two continuous variables (e.g. reading and writing). It is a good idea to generate a scatterplot before calculating correlations. The scatterplot will give you an indication of whether your variables are related in a linear (straight-line) or curvilinear fashion. Only linear relationships are suitable for correlation analyses.
84
Scatterplots..cont
• The scatterplot will also indicate whether your variables are positively related (high scores on one variable are associated with high scores on the other) or negatively related (high scores on one are associated with low scores on the other). For positive correlations, the points form a line pointing upwards to the right (that is, they start low on the left-hand side and move higher on the right). For negative correlations, the line starts high on the left and moves down on the right (see an example of this in the output below).
85
Scatterplots..cont
• The scatterplot also provides a general
indication of the strength of the relationship
between your two variables. If the
relationship is weak, the points will be all
over the place, in a blob-type arrangement.
For a strong relationship the points will form
a vague cigar shape, with a definite clumping
of scores around an imaginary straight line.
86
• In the example that follows I request a scatterplot of scores on reading and writing. I have asked for two groups in my sample (males and females) to be represented separately on the one scatterplot (using different symbols). This not only provides me with information concerning my sample as a whole but also gives additional information on the distribution of scores for males and females. If you wish to obtain a scatterplot for the full sample (not split by group), just ignore the instructions below in the section labelled Set Markers by.
87
Procedure for creating a scatterplot
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then on Scatter.
• 2. Click on Simple and then Define.
• 3. Click on your first variable, usually the one you consider to be the dependent variable (e.g. reading).
• 4. Click on the arrow to move it into the box labelled Y axis. This variable will appear on the vertical axis.
• 5. Move your other variable (e.g. writing) into the box labelled X axis. This variable will appear on the horizontal axis.
88
Procedure for creating a scatterplot
• 6. You can also have SPSS mark each of the points according to some other categorical variable (e.g. sex). Move this variable into the Set Markers by: box. This will display males and females using different markers.
• 7. Move the ID variable into the Label Cases by: box. This will allow you to find out the ID number of a case from the graph if you find an outlier.
• 8. If you wish to attach a title to the graph, click on the Titles button. Type in the desired title and click on Continue.
• 9. Click on OK.
89
The output generated from this procedure, modified
slightly for display purposes, is
shown below
90
Interpretation of output from
Scatterplot
• From the output above, there appears to be a strong, positive correlation between the two variables (reading and writing) for the sample as a whole. There is no indication of a curvilinear relationship, so it would be appropriate to calculate a Pearson correlation coefficient for these two variables. Remember, the scatterplot does not give you definitive answers; you need to follow it up with the calculation of the appropriate statistic (in this case, the Pearson correlation coefficient).
91
Boxplots
• Boxplots are useful when you wish to compare the distribution of scores on variables. You can use them to explore the distribution of one continuous variable (e.g. reading), or alternatively you can ask for scores to be broken down for different groups (e.g. SES). You can also add an extra categorical variable to compare (e.g. males and females). In the example below I will explore the distribution of scores on the reading scale for males and females.
92
Procedure for creating a boxplot
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then click on Boxplot.
• 2. Click on Simple. In the Data in Chart Are section click on Summaries for groups of cases. Click on the Define button.
• 3. Click on your continuous variable (e.g. reading). Click the arrow button to move it into the Variable box.
• 4. Click on your categorical variable (e.g. sex). Click on the arrow button to move it into the Category Axis box.
• 5. Click on ID and move it into the Label Cases by box. This will allow you to identify the ID numbers of any cases with extreme values.
• 6. Click on OK.
93
The output generated from this
procedure is shown below.
94
Interpretation of output from
Boxplot
• The output from Boxplot gives you a lot of
information about the distribution of your
continuous variable and the possible
influence of your other categorical variable
(and cluster variable if used).
95
Interpretation of output from
Boxplot..cont
• Each distribution of scores is represented by a
box and protruding lines (called whiskers).
The length of the box is the variable’s
interquartile range and contains 50 per cent of
cases. The line across the inside of the box
represents the median value. The whiskers
protruding from the box go out to the
variable’s smallest and largest values.
96
Interpretation of output from Boxplot..cont
• Any scores that SPSS considers to be outliers appear as little circles with a number attached (this is the ID number of the case). Outliers are cases with scores that are quite different from the remainder of the sample, either much higher or much lower. SPSS defines points as outliers if they extend more than 1.5 box-lengths from the edge of the box. Extreme points (indicated with an asterisk, *) are those that extend more than 3 box-lengths from the edge of the box. In the example above there are no outliers at the low values of reading for either males or females.
97
Interpretation of output from
Boxplot..cont
• In addition to providing information on outliers, a boxplot allows you to inspect the pattern of scores for your various groups. It provides an indication of the variability in scores within each group and allows a visual inspection of the differences between groups. In the example presented above, the distribution of scores on reading for males and females is not similar.
98
Line graphs
• A line graph allows you to inspect the mean scores
of a continuous variable across a number of
different values of a categorical variable (e.g. SES,
Learner type). They are also useful for graphically
exploring the results of a one- or two-way analysis
of variance. Line graphs are provided as an optional
extra in the output of analysis of variance. This
procedure shows you how to generate a line graph
without having to run ANOVA.
99
Procedure for creating a line graph
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then click on Line.
• 2. Click on Multiple. In the Data in Chart Are section, click on Summaries for groups of cases. Click on Define.
• 3. In the Lines Represent box, click on Other summary function. Click on the continuous variable you are interested in (e.g. reading). Click on the arrow button. The variable should appear in the box listed as Mean (reading). This indicates that the mean on the reading scale for the different groups will be displayed.
100
Procedure for creating a line graph
• 4. Click on your first categorical variable (e.g. SES). Click on the arrow button to move it into the Category Axis box. This variable will appear across the bottom of your line graph (X axis).
• 5. Click on another categorical variable (e.g. sex) and move it into the Define Lines by: box. This variable will be represented in the legend.
• 6. Click on OK.
101
The output generated from this procedure, modified
slightly for display purposes, is
shown below.
102
Interpretation of output from Line Graph
• The line graph displayed above contains a good deal of information.
• First, you can look at the impact of SES on mean scores in reading for each of the sexes separately. Females with high SES appear to have higher mean scores in reading than those in the medium or low SES groups. Females and males in the medium SES group have almost the same mean scores in reading.
103
Interpretation of output from
Line Graph
• The results presented above suggest that to understand
the impact of SES on reading you must consider the
respondents’ gender. This sort of relationship is referred
to, when doing analysis of variance, as an interaction
effect. While the use of a line graph does not tell you
whether this relationship is statistically significant, it
certainly gives you a lot of information and raises a lot
of additional questions.
104
Exercises for practice
Check the given data files.
105
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation coefficient
(PLCC)
• The PLCC is used to test for a relationship between two variables, IV & DV, which are both numerical in nature. For example, we may be interested in establishing whether students’ scores in reading (numerical variable) and scores in writing (numerical variable) are significantly related, that is, linearly correlated. In this case, we shall test a research hypothesis that
• H1: the scores in reading and in writing are significantly linearly correlated,
• against a null hypothesis that
• H0: the scores in reading and in writing are not significantly linearly correlated.
107
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
108
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
• To compute the PLCC statistic (r), we can take the
following steps;
• Click Analyze, then Correlate and then Bivariate
• Transfer read and write into the Variables box
• Click Options /Statistics and flag Means and
Standard Deviations
• Click Continue and then Ok.
• NB: before you click Ok, ensure that Pearson is
ticked/flagged under Correlation Coefficients.
109
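The point-and-click steps above can be cross-checked outside SPSS. The sketch below computes the same r-statistic and its sig. (p-value) with SciPy's `pearsonr`; the read and write scores are hypothetical illustration data, not the course data file.

```python
# Cross-check of SPSS's Bivariate (Pearson) output.
# NOTE: these read/write scores are made-up illustration data.
from scipy.stats import pearsonr

read  = [52, 47, 60, 63, 55, 41, 58, 50]
write = [54, 44, 59, 65, 52, 40, 60, 49]

r, p = pearsonr(read, write)  # r-statistic and its two-tailed sig. (p-value)
print(f"r = {r:.3f}, sig. = {p:.4f}")
```

As in the SPSS output, a larger |r| comes with a smaller sig. value.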
Analysis of Correlation between two numerical variables;
using Pearson’s linear correlation coefficient (PLCC)
• At least two tables will be generated, one for
Descriptive Statistics and another for
Correlations, which can be summarized into one as
follows;
Pearson’s Correlation results for students’
scores in reading and writing
Variables | Sample size | Sample mean | Sample std deviation | r-value | Sig.
Write | 33 | 11.09 | 3.60 | |
110
Analysis of Correlation between two numerical variables; using
Pearson’s linear correlation coefficient (PLCC)
111
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
• SPSS saves us the bother of going through the
tedious steps of calculation by giving us the r-
statistic (r-value) together with its
accompanying significance (sig.) level or p-
value. This sig. level or p-value behaves in such
a way that as the computed or observed r-
statistic becomes bigger, or more significant, the
sig. or p-value reduces.
112
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
113
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
114
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation coefficient
(PLCC)
• Determining the strength of the relationship
• The other thing to consider in the output table
is the size of the value of the Pearson correlation
(r). This can range from –1.00 to 1.00. This value will
indicate the strength of the relationship between your two
variables. A correlation of 0 indicates no relationship at
all, a correlation of 1.0 indicates a perfect positive
correlation, and a value of –1.0 indicates a perfect
negative correlation. How do you interpret values
between 0 and 1?
115
Determining the strength of the relationship
• Different authors suggest different interpretations;
however, Cohen (1988) suggests the following
guidelines:
• r = .10 to .29 or r = –.10 to –.29 small
• r = .30 to .49 or r = –.30 to –.49 medium
• r = .50 to 1.0 or r = –.50 to –1.0 large
• These guidelines apply whether or not there is a negative
sign in front of your r value. Remember, the
negative sign refers only to the direction of the
relationship, not the strength. The strength of correlation of
r = .5 and r = –.5 is the same.
116
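Cohen's cut-offs above are easy to mechanize. A minimal sketch follows; the function name and the "negligible" label for |r| below .10 are my own additions, not from Cohen.

```python
def cohen_strength(r: float) -> str:
    """Classify a correlation r using Cohen's (1988) guidelines."""
    size = abs(r)  # the sign gives direction only, never strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # below Cohen's smallest band

print(cohen_strength(0.5), cohen_strength(-0.5))  # same strength either way
```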
Determining the strength of the relationship cont,…
117
Tutorial problem
• a) Repeat the same example by testing
whether mean scores in reading and Maths
are significantly correlated.
• b) Repeat the same example by testing
whether mean scores in writing and Maths
are significantly correlated.
118
Analysis of Correlation between two ordinal or ranked
variables; using Spearman’s rank correlation
coefficient (SRCC) using SPSS
• In the SRCC test, we are interested in testing whether
two variables (IV & DV), both ordinal or ranked, are
significantly correlated. We should observe, however, that
the SRCC is much easier to compute and is similar in
interpretation to the PLCC test already seen in this paper,
because the SRCC is an approximation of the PLCC.
119
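As with the PLCC, the SRCC can be computed outside SPSS for checking. A sketch with SciPy's `spearmanr`, using hypothetical ranked data (two judges ranking the same six essays):

```python
from scipy.stats import spearmanr

# Hypothetical ranked data: two judges ranking the same six essays
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]

rho, p = spearmanr(judge_a, judge_b)  # SRCC and its sig. (p-value)
print(f"rho = {rho:.3f}, sig. = {p:.4f}")
```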
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• The Pearson’s Chi-square (χ2) test is used to test for a
relationship between two variables (IV & DV) which are both
categorical in nature. For example, if we are interested in
testing whether students’ socio-economic status and sex (both
categorical variables) are significantly correlated, we shall test a
research hypothesis that;
• H1: the two categorical variables (SES and sex) are
significantly correlated,
• against a null hypothesis that
• H0: the two categorical variables (SES and sex) are not
significantly correlated.
120
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• To test this null hypothesis, SPSS can be
used as follows;
• Click Analyze, then Descriptive Statistics
and then Crosstabs
• Transfer SES into the Rows box
• Transfer Sex into the Columns box
• Click Statistics and flag Chi-square
• Click Continue and then Ok.
121
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• At least two tables will be generated, one
for Cross tabulations and another for Chi-
Square Tests which can be summarized
into one as follows;
• Table produced : Chi-square test results
for students’ SES and sex
Categories of SES | Male | Female | Total | χ2 | Sig.
Low | 5 | 6 | 11 | |
Medium | 4 | 6 | 10 | 0.068 | 0.966
High | 5 | 7 | 12 | |
Total | 14 | 19 | 33 | |
122
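The crosstab above fully determines the χ2 result, so the SPSS figures can be verified directly. A sketch with SciPy's `chi2_contingency`, using the observed counts from the table produced:

```python
from scipy.stats import chi2_contingency

# Observed counts from the SES-by-sex crosstab above
#            Male  Female
observed = [[5, 6],   # Low
            [4, 6],   # Medium
            [5, 7]]   # High

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, sig. = {p:.3f}, df = {df}")
# Agrees with the table produced: chi2 = 0.068, sig. = 0.966
```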
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
123
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
124
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• Thus, to check whether the observed χ2-statistic is
statistically significant, or big enough, all we need
to do is check whether its accompanying sig. or p-
value is small enough, that is, less than the conventional
significance level in the social sciences of α = 0.05 or 5%,
in which case we reject the null hypothesis and accept the
alternative.
125
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
126
Tutorial problem
127
Exploring Differences Between Groups Or
Comparative Data Analysis
• In comparative data analysis we want to compare
two variables (IV & DV) where the IV is categorical
and the DV is numerical; in other words, we want to
establish whether there is a significant difference
between groups. Most analyses involve comparing the
mean score for each group on one or more dependent
variables. There are a number of different but related
statistics in this group. The main techniques are very
briefly discussed in the following slides.
128
T-tests
• T-tests are used when you have two groups
(e.g. males and females) or two sets
of data (before and after), and you wish to
compare the mean score on some
continuous variable. There are two main
types of t-tests. Paired sample t-tests
(also called repeated measures) are used
when you are interested in changes in
scores for subjects tested at Time 1, and
then again at Time 2 (often after some
intervention or event).
129
T-tests…cont
• The samples are ‘related’ because they are the
same people tested each time. Independent
sample t-tests are used when you have two
different (independent) groups of people (males
and females), and you are interested in
comparing their scores. In this case you collect
information on only one occasion, but from two
different sets of people.
130
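For the paired (repeated measures) case described above, a minimal sketch using SciPy's `ttest_rel`; the before/after scores are hypothetical illustration data:

```python
from scipy.stats import ttest_rel

# Hypothetical before/after scores for the same six subjects
before = [50, 55, 43, 60, 48, 52]
after  = [56, 58, 47, 66, 50, 59]

t, p = ttest_rel(before, after)  # samples are 'related': same people twice
print(f"t = {t:.3f}, sig. = {p:.4f}")
```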
Example of T-test
• For example, we may be interested in testing whether
the mean scores in reading (a numerical DV) differed
significantly according to sex (a binary categorical
IV, with two categories, male and female).
131
Example of T-test
• In other words if we let μm and μf represent
the mean scores in reading for all male and
female students respectively, then we shall
test a research hypothesis that;
• H1: the mean scores in reading for male and
female students differ significantly.
• Against a null hypothesis that;
132
Example of T-test
• H0: the mean scores in reading for male and
female students do not differ significantly.
• i.e. H1: μm ≠ μf against a null hypothesis that; H0: μm =
μf.
133
Example of T-test
• SPSS relieves us of the tedious calculations, as it can
help us test this null hypothesis easily through the
following procedure;
• To compute the t-test,
• Click Analyze, Compare means and Independent
samples t-test,
• Transfer read into the Test Variables box
• Transfer sex into the Grouping Variables box.
• Click Define Groups and then type 1 in group 1 box
and 2 in group 2 box (1 and 2 refer to the codes you
gave for male and female respectively).
• Click Continue and then Ok.
134
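The SPSS procedure above can be mirrored with SciPy's `ttest_ind`. The scores below are hypothetical; note that `ttest_ind` defaults to the equal-variances t-test, i.e. the first row of SPSS's Independent Samples Test table.

```python
from scipy.stats import ttest_ind

# Hypothetical reading scores split by sex (codes 1 = male, 2 = female)
male_read   = [52, 47, 60, 55, 41, 58]
female_read = [54, 63, 59, 65, 52, 60]

t, p = ttest_ind(male_read, female_read)  # equal-variances (pooled) t-test
print(f"t = {t:.3f}, sig. = {p:.4f}")
```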
Example of T-test
• At least two tables will be generated, one for
Group (descriptive) statistics and another for
the t-statistic and its sig. or p-value. The two
tables can be summarized into one as follows;
• Table produced: Descriptive statistics and t-
test results for students’ scores in reading by
sex
Categories of sex | Sample size | Sample mean | Sample std dev | t | Sig. or p-value
135
Example of T-test
• Note that the computed t-statistic is not computed
for its own sake; rather, it is to help us test the
previously stated null hypothesis, on whether the
mean scores in reading for male and female
students differ significantly.
• In testing this null hypothesis we are actually
posing a question; is the computed/observed t-
statistic statistically significant or big enough for
us to reject the H0?
136
Example of T-test
137
Tutorial problem
• a) Repeat the same procedure by testing whether the
mean scores in writing differ significantly according
to sex.
• b) Repeat the same procedure by testing whether the
mean scores in Maths differ significantly according to
sex.
• c) Repeat the same procedure by testing whether the
mean scores in reading differ significantly according
to learner type.
138
Tutorial problem
• a) Repeat the same procedure by testing whether the mean
scores in writing differ significantly according to sex.
• b) Repeat the same procedure by testing whether the mean
scores in maths differ significantly according to sex.
• c) Repeat the same procedure by testing whether the mean
scores in reading differ significantly according to learner type.
• d) Repeat the same procedure by testing whether the mean
scores in writing differ significantly according to learner type.
• e) Repeat the same procedure by testing whether the mean
scores in maths differ significantly according to learner type.
139
Tutorial problem
• d) Repeat the same procedure by testing
whether the mean scores in writing differ
significantly according to learner type.
• e) Repeat the same procedure by testing
whether the mean scores in Maths differ
significantly according to learner type.
140
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• Fisher’s Analysis of Variance is a generalization of
Student’s two independent samples t-test. It is a comparative
tool used to analyze data where the researcher is interested
in comparing the means of a numerical DV across the
categories of a categorical IV which has more than two
categories.
141
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• If we let μL, μm, and μH stand for the mean scores in reading
for low, medium and high socio-economic status students
respectively, then we shall test a research hypothesis that;
• H1: the mean scores in reading differ significantly according
to SES (not all of μL, μm and μH are equal),
• against a null hypothesis that
• H0: μL = μm = μH (the mean scores in reading do not differ
significantly according to SES).
142
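Using the μL, μm and μH notation above, the one-way ANOVA of H0: μL = μm = μH can be sketched with SciPy's `f_oneway`; the group scores are hypothetical illustration data:

```python
from scipy.stats import f_oneway

# Hypothetical reading scores grouped by socio-economic status
low    = [41, 47, 50, 44]
medium = [52, 55, 50, 58]
high   = [60, 63, 59, 65]

F, p = f_oneway(low, medium, high)  # tests H0: all group means are equal
print(f"F = {F:.3f}, sig. = {p:.4f}")
```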
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
143
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• At least two tables will be generated, one for
Descriptives and the second for ANOVA results.
These two tables can be summed up into one table that
appears as follows;
• Table produced: Descriptive statistics and ANOVA
(F) results for students’ scores in reading by SES
Categories of SES | Sample size | Sample mean | Sample std deviation | F-value | Sig.
144
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• Note that the computed F-statistic in Table produced is not
computed for its own sake; rather it is to help us test the null
hypothesis on equality of mean scores in reading for all students in
the three socio-economic statuses. In testing this null hypothesis, we
are asking a question: is the observed F-statistic in the table
statistically significant or big enough for us to reject the null
hypothesis?
145
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• This sig. level or p-value, behaves in such a way that as the
computed or observed F-statistic increases or becomes bigger
or more significant, the sig. or p-value reduces.
146
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• From our example, since the sig. or p-value
(0.000) is less than 0.05, then at α =
0.05, or the 5% level of significance, we
reject the null hypothesis, accept the
alternative, and conclude or infer
that the mean scores in reading differ
significantly according to students’ socio-
economic status.
147
Tutorial problem
• a) Repeat the same example by testing whether
mean scores in writing differ significantly
according to socio-economic status.
• b) Repeat the same example by testing whether
mean scores in Maths differ significantly
according to socio-economic status.
148
Comparison of two or more population means for
equality; Fishers’ Two-way analysis of variance
• Two-way analysis of variance allows you to test
the impact of two independent variables on one
dependent variable. The advantage of using a
two-way ANOVA is that it allows you to test for
an interaction effect—that is, when the effect of
one independent variable is influenced by
another; for example, when you suspect that
performance in reading varies with SES, but
only for males.
149
Comparison of two or more population means for
equality; Fishers’ Two-way analysis of variance
• It also tests for ‘main effects’—that is, the
overall effect of each independent variable (e.g.
sex, ses and ltype).
• There are two different two-way ANOVAs:
between-groups ANOVA (when the groups are
different) and repeated measures ANOVA
(when the same people are tested on more than
one occasion).
150
• Factor analysis
• Factor analysis allows you to condense a large set of variables
or scale items down to a smaller, more manageable number of
dimensions or factors. It does this by summarising the underlying
patterns of correlation and looking for ‘clumps’ or groups of
closely related items. This technique is often used when
developing scales and measures, to identify the underlying
structure.
151