Statistical Methods and SPSS
BY
Kimwise Aaron
PhD MIS, MBA-IT, BSE
1
Introduction
• Data analysis is only one part of the research process.
Before you can use SPSS
to analyze your data there are a number of things that need
to happen.
First, you have to design your study and choose appropriate
data collection instruments.
• Once you have conducted your study, the information
obtained must be prepared for entry into SPSS (using
something called a ‘codebook’).
2
Introduction
• SPSS stands for Statistical Package for the Social Sciences. It is a combination of computer software, programs and instructions for data management. Data management deals with data entry, storage, retrieval, analysis and reporting.
4
SPSS Data
• SPSS accepts data in a rectangular form. The data in SPSS is rectangular/tabular, i.e. in spreadsheet form, made up of rows and columns. Each row records a case or an entity, which can be the respondent for one questionnaire, i.e. a case records one respondent, e.g. a student. Each column is reserved for a variable, i.e. a particular characteristic of a case or an individual in a questionnaire.
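To make the rectangular layout concrete, here is a minimal sketch in plain Python (no SPSS needed; the two cases are the first two rows of Table 1, and the dictionary keys are illustrative variable names):

```python
# Sketch of the rectangular data layout SPSS uses: each row is a case
# (one respondent), each column is a variable (one question).
cases = [
    # id, read, write, math, ses, sex  (coded values, as in Table 1)
    {"id": 1, "read": 10, "write": 8, "math": 4, "ses": 1, "sex": 2},
    {"id": 2, "read": 8,  "write": 7, "math": 9, "ses": 1, "sex": 1},
]

# One row = one case; one key = one variable.
print(len(cases))             # 2 cases (rows)
print(list(cases[0].keys()))  # the variables (columns)
```

One row per respondent and one column per question is exactly the shape the SPSS data grid enforces.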
5
6
SPSS Data cont…
• Note that there are many variables or characteristics for each case. Here, a variable is a question asked in the questionnaire, e.g. for each student, the researcher may want to know scores in reading, writing and math, and other characteristics like sex, socio-economic status and learner type.
7
SPSS Screen
• Starting/launching/opening SPSS
• There are many ways of starting (opening) SPSS. For example, if there is an SPSS icon on your desktop, simply double-click it and wait; or go through the Start menu, All Programs, SPSS for Windows, SPSS 24.0 for Windows, or the route that appears on top when you click on SPSS for Windows. After this, the SPSS screen or interface will open, which looks similar to an Excel spreadsheet.
8
9
Features of the SPSS screen
• Title bar (Untitled – SPSS Data Editor): it is called Untitled because there is no current data set. If we type in any data or open a data set, the word “Untitled” changes to a file name, e.g. “employee.sav – SPSS Data Editor”. The phrase “SPSS Data Editor” is the official name of the screen. This data editor allows us to enter data into SPSS and edit it.
10
11
Features of the SPSS screen
• Menu bar. This is the second topmost row. It shows a list of menus or commands for different purposes. SPSS for Windows is menu-driven, i.e. whatever activity one wants to do, a certain menu item must be selected. E.g. the File menu is used for saving, creating, opening and closing files; the Analyze menu is used for various analyses (e.g. descriptive statistics, compare means, correlate etc.); the Graphs menu is used to create graphs like pie charts, scatter/dot, histograms etc.
12
13
Features of the SPSS screen
cont…
• Tools or icon bar. This is the third topmost row. It gives an icon or picture for some items on the menu bar. Note that when there is no active data set, the tools or icon bar is inactive. To know what each icon does or is used for, just put the cursor on it.
14
15
Features of the SPSS screen
cont…
• Data grid. This is the spreadsheet that appears below the icon bar. This spreadsheet or data grid has rows labelled 1, 2, … and columns labelled var, var, …. The heading var is a short form of “variable”: once a variable name is typed in, the word var changes to that variable name.
• This rectangular grid reminds us that SPSS accepts data in a rectangular format, where rows are reserved for cases and columns for variables.
16
17
Features of the SPSS screen cont…
• Status bar. This is at the bottom of the data grid and it indicates the current status of the SPSS processor. There are basically two statuses: sometimes the status bar reads “SPSS Processor is ready”, meaning that it can perform the different tasks or analyses as commanded; sometimes it reads “SPSS Processor is unavailable”, meaning that the processor cannot perform any analysis. When running any analysis, the status bar will indicate accordingly.
18
19
Features of the SPSS screen cont…
• Exiting/closing SPSS
• We can exit or close the SPSS window in two ways: clicking the close (red) icon at the extreme right corner, or going through the File menu and clicking Exit. In either case, we are reminded to first save any unsaved changes.
20
21
Data processing using SPSS
• Data processing refers to how data are prepared for analysis. It involves coding or categorizing data, entering data into the computer, editing data and summarizing it for presentation, i.e. presenting it in a summary form.
22
Coding/categorizing data for
SPSS
• Before entering data, we have to code or categorize it first. To code your data in SPSS, click on the Variable View tab at the bottom left of the data grid. The grid there has columns entitled Name, …, Label and Values, among others. Although there are other columns, beginning SPSS users should focus on three columns (Name, Label and Values).
23
24
Coding/categorizing data for
SPSS cont…
• Name column: here you give a short name for each variable or question. The name should not exceed eight characters, with no spaces and no punctuation of any kind.
• Label column: here you give/type a full name or description of the variable. In most cases the label is the question from the questionnaire, but it may be edited to fit data presentation.
25
26
Coding/categorizing data for
SPSS cont…
• Values column: here you categorize or give the categories of each variable. This is done only for categorical variables; if the variable is numerical, we leave this box empty (with its None default). To create categories, click in the box (the three dots …), then type a numerical value (e.g. 1) in the Value box.
27
Coding/categorizing data for
SPSS cont…
• This number or code represents a category or a particular answer, which is typed in the Label box. Click Add to transfer your code and label into the big box. Then type another code (say 2) in the Value box and the accompanying label in the Label box, and click Add. Repeat this exercise until all categories for that question are represented, then go to the next question.
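The code-and-label pairs built up in the Values box amount to a small lookup table per variable. A hypothetical plain-Python sketch (the SES labels follow the low/medium/high grouping used later in these notes; the sex labels are assumptions, since the slides do not state which code is which):

```python
# Sketch of what the Values column records: a mapping from numeric
# codes to category labels for each categorical variable.
# NOTE: the sex labels below are assumed for illustration only.
value_labels = {
    "ses": {1: "low", 2: "medium", 3: "high"},
    "sex": {1: "male", 2: "female"},
}

# Decoding a stored code back to its label:
print(value_labels["ses"][2])   # medium
```

This is the same decoding SPSS performs when you turn on View, Value Labels.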
28
Coding/categorizing data for
SPSS cont…
29
Data entry
• After completing data coding, we can now start entering our data. To do this, we click on the Data View tab at the extreme bottom left corner of the data grid. In the Data View window, observe that while the rows are labelled 1, 2, …, the first few columns have changed from var, var, … to the short names we gave to our variables, such as ID, Read, Write and so on.
30
31
Data entry
32
Table 1: Data on six variables on how students’ scores vary by SES, sex and learner type
ID Read Write Math SES Sex Learner type
1 10 8 4 1 2 1
2 8 7 9 1 1 1
3 12 5 5 1 2 1
4 5 4 7 1 1 1
5 11 9 6 1 2 1
6 10 10 8 1 2 1
7 7 8 10 1 1 1
8 6 5 8 1 1 1
9 8 6 8 1 1 2
10 9 8 6 1 2 2
11 8 9 5 2 2 2
12 15 15 10 2 2 2
13 12 12 14 2 1 1
14 10 10 10 2 1 1
15 10 10 8 2 2 2
16 12 12 9 2 2 2
17 11 11 8 2 2 2
18 12 12 10 2 1 1
19 10 10 13 2 1 1
20 16 16 12 3 2 2
21 13 13 10 3 2 2
22 11 11 12 3 1 2
23 13 13 12 3 1 2
24 14 14 10 3 2 2
25 10 10 12 3 1 2
26 15 15 14 3 2 2
27 17 17 10 3 2 2
28 18 18 17 3 1 1
29 17 17 15 3 2 2
30 12 12 13 1 2 2
31 13 13 9 3 2 2
32 14 14 13 3 1 1
33 12 12 8 2 2 1
33
Legend:
34
• NB: Code the data using the Variable View window and save it. After coding (in the Variable View window), click the Data View window to enter the data. Note, however, that you enter the data row by row (case by case).
35
Data Editing or Cleaning data
using SPSS
• Data editing means checking your data set to see whether it is clean and suitable for analysis. You check for errors, omissions, outliers (values outside the permissible range) and wrong answers. It also involves data transformation, i.e. modifying your variables to make them suitable for analysis. The following tools can help you to check for errors, omissions and outliers: frequency tables, graphs, the View menu and your own eyes.
36
Frequency Tables
• To compute a frequency table: click Analyze, then Descriptive Statistics, then Frequencies.
• Transfer any variable, e.g. sex, into the Variables box and then click OK. (A frequency table will appear.) Here you check whether there is any awkward or unexpected category, also called an outlier or value outside the permissible range. For example, under sex we expect only two categories (male & female); any other category apart from these two is taken to be an outlier.
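The same error check can be sketched in plain Python: tally each code and flag anything outside the permissible range (the data below are made up, with one deliberate wrong entry):

```python
from collections import Counter

# Sketch of the check a frequency table performs: count each code for
# sex and flag any value outside the permissible range {1, 2}.
sex = [1, 2, 2, 1, 1, 2, 1, 5]   # the 5 is a deliberate data-entry error
freq = Counter(sex)
print(freq)                       # counts per code

outliers = [code for code in freq if code not in (1, 2)]
print(outliers)                   # [5]
```

Any code that appears in the tally but not in the codebook is exactly the “unexpected category” the slide describes.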
37
Graphs
38
View menu
• From the menu bar, click View, then Value Labels. This will display all the value labels or responses, where you can easily identify a wrong answer by looking through the answers with your own eyes.
39
• In any of the three cases above, if an error or outlier is identified, go back to your data set and identify the questionnaire number, then trace that respondent or questionnaire using the ID, and correct the outlier by replacing it with the correct value.
40
Manipulating the data
• This looks at how SPSS helps us to change or transform existing data to make it suitable for analysis, e.g. SPSS helps us to create a new variable from existing variables. Example: we may use SPSS to compute totals/sums, means/averages, create ranked variables and so on.
41
Manipulating the data
• Other examples: collapsing continuous variables (e.g. read) into groups to do some analyses such as analysis of variance; and reducing or collapsing the number of categories of a categorical variable.
42
Manipulating the data
• For example, if we want to compute the total score for each student in the three subjects, we may follow these steps:
• From the menu bar, click Transform, then Compute Variable.
• Type Total in the Target Variable box.
• Transfer read + write + math into the Numeric Expression box.
• Click OK.
• Note that a new variable or column (Total) has been added as the last column in the Data View window. We can now use this new variable for other analyses, as we shall see.
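What Compute Variable does here can be sketched in plain Python: add a new total field to every case (the three cases below are the first three rows of Table 1):

```python
# Sketch of Transform -> Compute Variable for Total = read + write + math,
# using the first three cases from Table 1.
cases = [
    {"read": 10, "write": 8, "math": 4},
    {"read": 8,  "write": 7, "math": 9},
    {"read": 12, "write": 5, "math": 5},
]
for c in cases:
    c["total"] = c["read"] + c["write"] + c["math"]   # the new column

print([c["total"] for c in cases])   # [22, 24, 22]
```

The Average variable on the next slide is the same idea with the expression (read + write + math)/3.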
43
Manipulating the data
• We may also use SPSS to compute the average score for each student in the said three subjects. To create the average, we follow these steps:
• From the menu bar, click Transform, then Compute Variable.
• Type Average in the Target Variable box.
• Then transfer (read + write + math)/3 into the Numeric Expression box.
• Click OK.
44
Manipulating the data
• Again note that a new variable or column (Average) has been added as the last column in the Data View window. We can now use this new variable for other analyses, as we shall see.
45
Manipulating the data
• Collapsing a continuous variable into groups.
• For some analyses (e.g. analysis of variance) you may wish to divide the sample into equal groups according to respondents’ scores on some variable (e.g. to give low, medium and high scoring groups). This technique leaves the original variable, measured as a continuous variable, intact so that you can use it for other analyses.
46
Manipulating the data
• To illustrate this process, we will use the data file that you saved.
• Procedure for collapsing a continuous variable into groups:
• 1. From the menu at the top of the screen click on Transform, and choose Visual Bander.
• 2. Select the continuous variable that you want to use (e.g. read). Transfer it into the Variables to Band box. Click on the Continue button.
• 3. In the Visual Bander screen that appears, click on the variable to highlight it. A histogram, showing the distribution of read scores, should appear.
47
Manipulating the data
• 4. In the section at the top labelled Banded Variable, type in the name for the new categorical variable that you will create (e.g. Readband3). You can also change the suggested label that is shown (e.g. to “read in 3 groups”).
• 5. Click on the button labelled Make Cutpoints, and then OK.
48
Manipulating the data
• SPSS will try to put an equal percentage of the sample in each group. Click on the Apply button.
• 6. Click on the Make Labels button back in the main dialogue box. This will automatically generate value labels for each of the new groups created. You can modify these if you wish by clicking in the cells of the grid.
• 7. Click on OK and a new variable (Readband3) will appear at the end of your data file (go back to your Data Editor window, choose the Variable View tab, and it should be at the bottom).
49
Manipulating the data
• Run Descriptives or Frequencies on your newly created variable (Readband3) to check the number of cases in each of the categories.
50
Manipulating the data
• In the dialogue box that appears, click on the option Equal Percentiles Based on Scanned Cases. In the box Number of Cutpoints, specify a number one less than the number of groups that you want (e.g. if you want three groups, type in 2 for cutpoints). In the Width (%) section below you will then see 33.33 appear: this means that each of the three groups will contain about 33 per cent of the cases.
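The Equal Percentiles option is just percentile-based cutpoints. A rough plain-Python sketch of the same idea, using the first twelve read scores from Table 1 (with tied scores, the three bands need not be exactly equal in size):

```python
import statistics

# Rough sketch of "Equal Percentiles Based on Scanned Cases": with 2
# cutpoints the scores split into 3 bands of roughly a third each.
read = [10, 8, 12, 5, 11, 10, 7, 6, 8, 9, 8, 15]     # first 12 read scores
low_cut, high_cut = statistics.quantiles(read, n=3)  # the two cutpoints

def band(score):
    # Band codes are illustrative: 1 = low, 2 = medium, 3 = high.
    if score <= low_cut:
        return 1
    if score <= high_cut:
        return 2
    return 3

bands = [band(s) for s in read]
print(low_cut, high_cut)                       # 8.0 10.0
print({b: bands.count(b) for b in (1, 2, 3)})  # {1: 6, 2: 3, 3: 3}
```

Note how the ties at 8 pull extra cases into the low band, which is why checking the band counts with Frequencies afterwards (as the slides advise) is worthwhile.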
51
Presenting or summarizing
data using SPSS
52
Frequency tables
• Frequency tables are perhaps the most popular tools for data presentation, because of their efficiency, i.e. their ability to present so much data in a small space. However, we must note that frequency tables are only used to present data which is categorical in nature. To generate a frequency table, use the following steps:
• Click Analyze, then Descriptive Statistics and then Frequencies. Transfer any categorical variable, e.g. sex, into the Variables box and then
• Click OK. NB: in case there are other variables in the Variables box, first click Reset to remove any existing variable from the box before you transfer the variables.
53
Tutorial problem
55
Types of frequency tables
• To generate a cross tabulation or two-way table, we may follow these steps:
• Click Analyze, then Descriptive Statistics, then Crosstabs.
• Transfer the first categorical variable, e.g. sex, into the Row(s) box.
• Transfer the second categorical variable, e.g. SES, into the Column(s) box.
• Click OK.
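A cross tabulation is simply a two-way count. A minimal plain-Python sketch with a few made-up coded cases (codes are illustrative):

```python
from collections import Counter

# Sketch of a two-way (cross-tabulation) count: sex in the rows,
# SES in the columns.
cases = [(2, 1), (1, 1), (2, 1), (1, 2), (2, 2), (2, 3)]  # (sex, ses)
table = Counter(cases)

for sex in (1, 2):
    row = [table[(sex, ses)] for ses in (1, 2, 3)]
    print(sex, row)   # row of counts for that sex across SES 1, 2, 3
```

Each cell of the SPSS Crosstabs output is just one of these (row, column) counts.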
56
Tutorial problem
57
DESCRIPTIVE DATA ANALYSIS USING SPSS
• Descriptive data analysis is an attempt to describe data collected from a specific sample, without any attempt to generalize results to the population from which the sample was chosen. In this section we see how SPSS helps us to describe our data using four tools, as per this section’s headings.
58
Analysis of frequencies or frequency counts using SPSS
• Frequency counts and their corresponding relative frequencies or percentages help data analysts to make sense of their data in a one-way or univariate frequency table. Let us teach ourselves how SPSS helps us generate frequency tables showing frequency counts and relative frequencies.
• From our sample data set, generate a simple, univariate or one-way frequency table for the respective categorical variables therein (SES, sex and Ltype). Transfer all these into the Variables box and click OK.
59
The output generated from this procedure is shown
below
60
• From the results you obtain, note that SPSS gives relative frequencies under two columns, labelled Percent and Valid Percent respectively. In interpreting our data, if there are no missing responses the two columns are identical, and we can simply use the Percent column. But if there are some missing responses, then we should use the Valid Percent column, which ignores the missing scores.
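The difference between the two columns can be sketched in plain Python with one made-up missing response:

```python
# Sketch of Percent vs. Valid Percent: with one missing response
# (None), the two figures differ.
sex = [1, 2, 2, 1, None]          # 5 cases, 1 missing
valid = [s for s in sex if s is not None]

percent_female = sex.count(2) / len(sex) * 100            # over all cases
valid_percent_female = valid.count(2) / len(valid) * 100  # missing ignored
print(percent_female, valid_percent_female)   # 40.0 50.0
```

Percent divides by all cases (including the missing one), Valid Percent only by the cases that actually answered.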
61
• Also note that SPSS gives the Cumulative Percent column, but we should remember, from our Statistical Methods, that cumulative percents only make sense if data are for ordinal or ranked variables.
• Also remember that apart from helping us generate one-way, simple or univariate frequency tables, SPSS can help us generate two-way, complex or cross-tabulation frequency tables, which involve two variables.
62
• Let us teach ourselves how SPSS helps us generate two-way, complex or cross-tabulation frequency tables.
• From our sample data set, generate a two-way, complex or cross-tabulation frequency table for the respective categorical variables therein (SES vs sex and Ltype; sex vs Ltype).
63
Procedure for cross-tabulation
64
Complex or Cross-tabulation
frequency table for SES and SEX
65
Analysis of Central tendencies or location using SPSS
• We should note now that we cannot use frequency tables to present or describe data on numeric variables. Thus data on numeric variables is presented or described using measures of central tendency or location, where data is simply summarised in tables (not frequency tables), describing our sample using these statistics. We should remember from our Statistics that most numerical variables tend to have the so-called normal frequency distribution or curve, where the majority of the scores tend to be located in the centre.
66
Analysis of Central tendencies or location using
SPSS
• SPSS helps us determine the extent to which a given numerical variable is normally distributed by helping us to generate the pertinent frequency histograms or curves.
• We should note that when describing any numerical data set, we want to locate its centre, where most scores tend to be located. This is achieved by measures of central tendency or location such as the mean (average), mode and median. Let us see how SPSS helps us describe our data by generating these.
67
To generate the (arithmetic) mean and median for numerical variables:
• Click Analyze, Descriptive Statistics and then Explore.
• Transfer a numeric variable (e.g. read) into the Dependent List box.
• Click OK.
• Note that among these statistics, SPSS gives us the (arithmetic) mean and median. However, it does not give us the mode, because modal scores as measures of central tendency or location are more suitable for categorical variables.
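For readers who want to verify the centre measures by hand, a plain-Python sketch using the full read column from Table 1:

```python
import statistics

# The read column for all 33 cases of Table 1.
read = [10, 8, 12, 5, 11, 10, 7, 6, 8, 9, 8, 15, 12, 10, 10, 12, 11,
        12, 10, 16, 13, 11, 13, 14, 10, 15, 17, 18, 17, 12, 13, 14, 12]

print(statistics.mean(read))    # arithmetic mean, roughly 11.5
print(statistics.median(read))  # middle score: 12
```

These are the same two figures Explore reports for read.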
68
The output generated from this
procedure is shown below
69
Analysis of Dispersion using
SPSS
• We should note that while two data sets may
have a common measure of central tendency or
location, the dispersion of the scores may differ.
Thus in addition to measures of central tendency
or location, we also need measures of dispersion
such as range, variance and standard deviation
of scores in a given data set. Let us see how
SPSS helps us to generate these indices.
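These three indices can be sketched in plain Python (illustrative scores; variance and standard deviation here are the sample versions, dividing by n − 1, which is what SPSS reports):

```python
import statistics

# Dispersion measures for a small set of illustrative read scores.
scores = [10, 8, 12, 5, 11]

rng = max(scores) - min(scores)      # range: 12 - 5 = 7
var = statistics.variance(scores)    # sample variance (divides by n - 1)
sd = statistics.stdev(scores)        # its square root
print(rng, var, sd)
```

Two data sets with the same mean can give very different values here, which is exactly the point of the slide above.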
70
To generate the range, variance and standard deviation for any numerical variable:
71
The output generated from this
procedure is shown below
72
Analysis of Skew using SPSS
• We should also note that while some numerical variables tend to have the so-called normal frequency distribution or curve, where most observations are located in the centre, some curves deviate from this normality and become skewed, either positively or negatively. SPSS can help us determine the extent to which data on a given numeric variable (e.g. read) is skewed, by helping us generate the pertinent histograms or curves. You can also observe that among the several statistics SPSS gives you, there is a statistic on skewness. Its figure will show you the direction of the skew, whether positive or negative. Remember that a negative value implies that the data is skewed to the left, a positive value implies that the data is skewed to the right, and a value of zero implies a symmetric (e.g. normal) distribution.
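The sign convention can be illustrated with a simple moment-based skewness estimate in plain Python (made-up data; SPSS applies a small-sample correction to this formula, so its figure will differ slightly):

```python
import statistics

def skewness(xs):
    # Simple moment-based estimate: mean of cubed standardized scores.
    m = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    n = len(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

# One large score pulls the tail to the right -> positive skew;
# one small score pulls the tail to the left -> negative skew.
print(skewness([1, 2, 2, 3, 12]) > 0)     # True: skewed to the right
print(skewness([1, 10, 10, 11, 12]) < 0)  # True: skewed to the left
```

The sign, not the exact magnitude, is what the slide's interpretation rule relies on.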
73
Graphs
• SPSS for Windows provides a number of different types of graphs (referred to by SPSS as charts). In this chapter I’ll cover the basic procedures to obtain the following graphs:
• histograms;
• bar graphs;
• scatterplots;
• boxplots; and
• line graphs.
74
Histograms
• Histograms are used to display the
distribution of a single continuous variable
• (e.g. read, write, math).
75
Procedure for creating a histogram
1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then click on Histogram.
2. Click on your variable of interest and move it into the Variable box. This should be a continuous variable (e.g. read).
3. Click on Display normal curve. This option will give you the distribution of your variable and, superimposed over the top, how a normal curve for this distribution would look.
4. If you wish to give your graph a title, click on the Titles button and type the desired title in the box (e.g. Histogram of reading scores).
5. Click on Continue, and then OK.
76
The output generated from this
procedure is shown below.
77
Interpretation of output from Histogram
Inspection of the shape of the histogram provides information about the distribution of scores on the continuous variable. We assume that the scores on each of the variables are normally distributed (i.e. follow the shape of the normal curve). In this example, the scores are reasonably normally distributed, with most scores occurring in the centre, tapering out towards the extremes. It is quite common in the social sciences, however, to find variables that are not normally distributed.
78
Bar graphs
• Bar graphs can be simple or very complex, depending on how many variables you wish to include. The bar graph can show the number of cases in particular categories, or it can show the score on some continuous variable for different categories. Basically you need two main variables: one categorical and one continuous. You can also break this down further with another categorical variable if you wish.
79
Procedure for creating a bar graph
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then Bar.
• 2. Click on Clustered.
• 3. In the Data in Chart Are section, click on Summaries for groups of cases. Click on Define.
• 4. In the Bars Represent box, click on Other summary function.
• 5. Click on the continuous variable you are interested in (e.g. read). This should appear in the box listed as Mean (reading). This indicates that the mean on the reading scale for the different groups will be displayed.
80
Procedure for creating a bar
graph
• 6. Click on your first categorical variable (e.g. SES). Click on the arrow button to move it into the Category Axis box. This variable will appear across the bottom of your bar graph (X axis).
• 7. Click on another categorical variable (e.g. sex) and move it into the Define Clusters by: box. This variable will be represented in the legend.
• 8. Click on OK.
81
The output generated from this procedure,
after it has been slightly modified, is shown
below.
82
Interpretation of output from Bar Graph
83
Scatterplots
• Scatterplots are typically used to explore the relationship between two continuous variables (e.g. reading and writing). It is a good idea to generate a scatterplot before calculating correlations. The scatterplot will give you an indication of whether your variables are related in a linear (straight-line) or curvilinear fashion. Only linear relationships are suitable for correlation analyses.
84
Scatterplots..cont
• The scatterplot will also indicate whether your variables are positively related (high scores on one variable are associated with high scores on the other) or negatively related (high scores on one are associated with low scores on the other). For positive correlations, the points form a line pointing upwards to the right (that is, they start low on the left-hand side and move higher on the right). For negative correlations, the line starts high on the left and moves down on the right (see an example of this in the output below).
85
Scatterplots..cont
• The scatterplot also provides a general
indication of the strength of the relationship
between your two variables. If the
relationship is weak, the points will be all
over the place, in a blob-type arrangement.
For a strong relationship the points will form
a vague cigar shape, with a definite clumping
of scores around an imaginary straight line.
86
• In the example that follows I request a scatterplot of scores on reading and writing. I have asked for two groups in my sample (males and females) to be represented separately on the one scatterplot (using different symbols). This not only provides me with information concerning my sample as a whole but also gives additional information on the distribution of scores for males and females. If you wish to obtain a scatterplot for the full sample (not split by group), just ignore the instructions below in the section labelled Set Markers by.
87
Procedure for creating a scatterplot
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then on Scatter.
• 2. Click on Simple and then Define.
• 3. Click on your first variable, usually the one you consider to be the dependent variable (e.g. reading).
• 4. Click on the arrow to move it into the box labelled Y axis. This variable will appear on the vertical axis.
• 5. Move your other variable (e.g. writing) into the box labelled X axis. This variable will appear on the horizontal axis.
88
Procedure for creating a scatterplot
• 6. You can also have SPSS mark each of the points according to some other categorical variable (e.g. sex). Move this variable into the Set Markers by: box. This will display males and females using different markers.
• 7. Move the ID variable into the Label Cases by: box. This will allow you to find out the ID number of a case from the graph if you find an outlier.
• 8. If you wish to attach a title to the graph, click on the Titles button. Type in the desired title and click on Continue.
• 9. Click on OK.
89
The output generated from this procedure, modified
slightly for display purposes, is
shown below
90
Interpretation of output from
Scatterplot
• From the output above, there appears to be a strong, positive correlation between the two variables (reading and writing) for the sample as a whole. There is no indication of a curvilinear relationship, so it would be appropriate to calculate a Pearson correlation coefficient for these two variables. Remember, the scatterplot does not give you definitive answers; you need to follow it up with the calculation of the appropriate statistic (in this case, the Pearson correlation coefficient).
91
Boxplots
• Boxplots are useful when you wish to compare the distribution of scores on variables. You can use them to explore the distribution of one continuous variable (e.g. reading), or alternatively you can ask for scores to be broken down for different groups (e.g. SES). You can also add an extra categorical variable to compare (e.g. males and females). In the example below I will explore the distribution of scores on the reading scale for males and females.
92
Procedure for creating a boxplot
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then click on Boxplot.
• 2. Click on Simple. In the Data in Chart Are section click on Summaries for groups of cases. Click on the Define button.
• 3. Click on your continuous variable (e.g. reading). Click the arrow button to move it into the Variable box.
• 4. Click on your categorical variable (e.g. sex). Click on the arrow button to move it into the Category Axis box.
• 5. Click on ID and move it into the Label Cases by box. This will allow you to identify the ID numbers of any cases with extreme values.
• 6. Click on OK.
93
The output generated from this
procedure is shown below.
94
Interpretation of output from
Boxplot
• The output from Boxplot gives you a lot of
information about the distribution of your
continuous variable and the possible
influence of your other categorical variable
(and cluster variable if used).
95
Interpretation of output from
Boxplot..cont
• Each distribution of scores is represented by a
box and protruding lines (called whiskers).
The length of the box is the variable’s
interquartile range and contains 50 per cent of
cases. The line across the inside of the box
represents the median value. The whiskers
protruding from the box go out to the
variable’s smallest and largest values.
96
Interpretation of output from Boxplot..cont
• Any scores that SPSS considers to be outliers appear as little circles with a number attached (this is the ID number of the case). Outliers are cases with scores that are quite different from the remainder of the sample, either much higher or much lower. SPSS defines points as outliers if they extend more than 1.5 box-lengths from the edge of the box. Extreme points (indicated with an asterisk, *) are those that extend more than 3 box-lengths from the edge of the box. In the example above there are no outliers at the low values of reading for either males or females.
97
Interpretation of output from
Boxplot..cont
• In addition to providing information on outliers, a boxplot allows you to inspect the pattern of scores for your various groups. It provides an indication of the variability in scores within each group and allows a visual inspection of the differences between groups. In the example presented above, the distribution of scores on reading for males and females is not similar.
98
Line graphs
• A line graph allows you to inspect the mean scores
of a continuous variable across a number of
different values of a categorical variable (e.g. SES,
Learner type). They are also useful for graphically
exploring the results of a one- or two-way analysis
of variance. Line graphs are provided as an optional
extra in the output of analysis of variance. This
procedure shows you how to generate a line graph
without having to run ANOVA.
99
Procedure for creating a line graph
• 1. From the menu at the top of the screen click on: Graphs, followed by Legacy Dialogs, then click on Line.
• 2. Click on Multiple. In the Data in Chart Are section, click on Summaries for groups of cases. Click on Define.
• 3. In the Lines Represent box, click on Other summary function. Click on the continuous variable you are interested in (e.g. reading). Click on the arrow button. The variable should appear in the box listed as Mean (reading). This indicates that the mean on the reading scale for the different groups will be displayed.
100
Procedure for creating a line graph
• 4. Click on your first categorical variable (e.g. SES). Click on the arrow button to move it into the Category Axis box. This variable will appear across the bottom of your line graph (X axis).
• 5. Click on another categorical variable (e.g. sex) and move it into the Define Lines by: box. This variable will be represented in the legend.
• 6. Click on OK.
101
The output generated from this procedure, modified
slightly for display purposes, is
shown below.
102
Interpretation of output from Line Graph
• The line graph displayed above contains a good deal of information.
• First, you can look at the impact of SES on mean scores in reading for each of the sexes separately. Females with high SES appear to have higher mean scores in reading than those in the medium or low SES groups. Females and males in the medium SES group have almost the same mean scores in reading.
103
Interpretation of output from
Line Graph
• The results presented above suggest that to understand
the impact of SES on reading you must consider the
respondents’ gender. This sort of relationship is referred
to, when doing analysis of variance, as an interaction
effect. While the use of a line graph does not tell you
whether this relationship is statistically significant, it
certainly gives you a lot of information and raises a lot
of additional questions.
104
Exercises for practice
Check the given data files.
105
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation coefficient
(PLCC)
• The PLCC is used to test for a relationship between two variables, IV & DV, which are both numerical in nature. For example, we may be interested in establishing whether students’ scores in reading (numerical variable) and scores in writing (numerical variable) are significantly related, that is, linearly correlated. In this case, we shall test a research hypothesis that
• H1: the scores in reading and in writing are significantly linearly correlated,
• against a null hypothesis that
• H0: the scores in reading and in writing are not significantly linearly correlated.
107
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
108
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
• To compute the PLCC statistic (r), we can take the
following steps;
• Click Analyze, then Correlate and then Bivariate
• Transfer read and write into the Variables box
• Click Options /Statistics and flag Means and
Standard Deviations
• Click Continue and then Ok.
• NB: before you click Ok, ensure that Pearson is
ticked/flagged under Correlation Coefficients.
109
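The point-and-click steps above can be cross-checked outside SPSS. The sketch below computes the same r-statistic and its sig. (p-value) with SciPy's `pearsonr`; the read and write scores are hypothetical illustration data, not the course data file.

```python
# Cross-check of SPSS's Bivariate (Pearson) output.
# NOTE: these read/write scores are made-up illustration data.
from scipy.stats import pearsonr

read  = [52, 47, 60, 63, 55, 41, 58, 50]
write = [54, 44, 59, 65, 52, 40, 60, 49]

r, p = pearsonr(read, write)  # r-statistic and its two-tailed sig. (p-value)
print(f"r = {r:.3f}, sig. = {p:.4f}")
```

As in the SPSS output, a larger |r| comes with a smaller sig. value.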
Analysis of Correlation between two numerical variables;
using Pearson’s linear correlation coefficient (PLCC)
• At least two tables will be generated, one for
Descriptive Statistics and another for
Correlations, which can be summarized into one as
follows;
Pearson’s Correlation results for students’
scores in reading and writing
Variables | Sample size | Sample mean | Sample std deviation | r-value | Sig.
Write | 33 | 11.09 | 3.60 | |
110
Analysis of Correlation between two numerical variables; using
Pearson’s linear correlation coefficient (PLCC)
111
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
• SPSS saves us the bother of going through the
tedious steps of calculation by giving us the r-
statistic (r-value) together with its
accompanying significance (sig.) level or p-
value. This sig. level or p-value behaves in such
a way that as the computed or observed r-
statistic becomes bigger, or more significant, the
sig. or p-value reduces.
112
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
113
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation
coefficient (PLCC)
114
Analysis of Correlation between two numerical
variables; using Pearson’s linear correlation coefficient
(PLCC)
• Determining the strength of the relationship
• The other thing to consider in the output table
is the size of the value of the Pearson correlation
(r). This can range from –1.00 to 1.00. This value will
indicate the strength of the relationship between your two
variables. A correlation of 0 indicates no relationship at
all, a correlation of 1.0 indicates a perfect positive
correlation, and a value of –1.0 indicates a perfect
negative correlation. How do you interpret values
between 0 and 1?
115
Determining the strength of the relationship
• Different authors suggest different interpretations;
however, Cohen (1988) suggests the following
guidelines:
• r = .10 to .29 or r = –.10 to –.29 small
• r = .30 to .49 or r = –.30 to –.49 medium
• r = .50 to 1.0 or r = –.50 to –1.0 large
• These guidelines apply whether or not there is a negative
sign in front of your r value. Remember, the
negative sign refers only to the direction of the
relationship, not the strength. The strength of correlation of
r = .5 and r = –.5 is the same.
116
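Cohen's cut-offs above are easy to mechanize. A minimal sketch follows; the function name and the "negligible" label for |r| below .10 are my own additions, not from Cohen.

```python
def cohen_strength(r: float) -> str:
    """Classify a correlation r using Cohen's (1988) guidelines."""
    size = abs(r)  # the sign gives direction only, never strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # below Cohen's smallest band

print(cohen_strength(0.5), cohen_strength(-0.5))  # same strength either way
```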
Determining the strength of the relationship cont,…
117
Tutorial problem
• a) Repeat the same example by testing
whether mean scores in reading and Maths
are significantly correlated.
• b) Repeat the same example by testing
whether mean scores in writing and Maths
are significantly correlated.
118
Analysis of Correlation between two ordinal or ranked
variables; using Spearman’s rank correlation
coefficient (SRCC) using SPSS
• In the SRCC test, we are interested in testing whether
two variables (IV & DV), both ordinal or ranked, are
significantly correlated. We should observe, however, that
the SRCC is much easier to compute and is similar in
interpretation to the PLCC test already seen in this paper,
because the SRCC is an approximation of the PLCC.
119
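As with the PLCC, the SRCC can be computed outside SPSS for checking. A sketch with SciPy's `spearmanr`, using hypothetical ranked data (two judges ranking the same six essays):

```python
from scipy.stats import spearmanr

# Hypothetical ranked data: two judges ranking the same six essays
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]

rho, p = spearmanr(judge_a, judge_b)  # SRCC and its sig. (p-value)
print(f"rho = {rho:.3f}, sig. = {p:.4f}")
```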
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• The Pearson’s Chi-square (χ2) test is used to test for a
relationship between two variables (IV & DV) which are both
categorical in nature. For example, if we are interested in
testing whether students’ socio-economic status and sex (both
categorical variables) are significantly correlated, we shall test a
research hypothesis that;
• H1: the two categorical variables (SES and sex) are
significantly correlated,
• against a null hypothesis that
• H0: the two categorical variables (SES and sex) are not
significantly correlated.
120
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• To test this null hypothesis, SPSS can be
used as follows;
• Click Analyze, then Descriptive Statistics
and then Crosstabs
• Transfer SES into the Rows box
• Transfer Sex into the Columns box
• Click Statistics and flag Chi-square
• Click Continue and then Ok.
121
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• At least two tables will be generated, one
for Cross tabulations and another for Chi-
Square Tests which can be summarized
into one as follows;
• Table produced : Chi-square test results
for students’ SES and sex
Categories of SES | Male | Female | Total | χ2 | Sig.
Low | 5 | 6 | 11 | |
Medium | 4 | 6 | 10 | 0.068 | 0.966
High | 5 | 7 | 12 | |
Total | 14 | 19 | 33 | |
122
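The crosstab above fully determines the χ2 result, so the SPSS figures can be verified directly. A sketch with SciPy's `chi2_contingency`, using the observed counts from the table produced:

```python
from scipy.stats import chi2_contingency

# Observed counts from the SES-by-sex crosstab above
#            Male  Female
observed = [[5, 6],   # Low
            [4, 6],   # Medium
            [5, 7]]   # High

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, sig. = {p:.3f}, df = {df}")
# Agrees with the table produced: chi2 = 0.068, sig. = 0.966
```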
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
123
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
124
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
• Thus, to check whether the observed χ2-statistic is
statistically significant, or big enough, all we need
to do is check whether its accompanying sig. or p-
value is small enough, that is, less than the conventional
significance level in the social sciences of α = 0.05 or 5%,
in which case we reject the null hypothesis and accept the
alternative.
125
Analysis of Co-relation between two categorical
variables; Pearson’s Chi-square
126
Tutorial problem
127
Exploring Differences Between Groups Or
Comparative Data Analysis
• In comparative data analysis we want to compare
two variables (IV & DV) where the IV is categorical
and the DV is numerical; in other words, we want to
establish whether there is a significant difference
between groups. Most analyses involve comparing the
mean score for each group on one or more dependent
variables. There are a number of different but related
statistics in this group. The main techniques are very
briefly discussed in the following slides.
128
T-tests
• T-tests are used when you have two groups
(e.g. males and females) or two sets
of data (before and after), and you wish to
compare the mean score on some
continuous variable. There are two main
types of t-tests. Paired sample t-tests
(also called repeated measures) are used
when you are interested in changes in
scores for subjects tested at Time 1, and
then again at Time 2 (often after some
intervention or event).
129
T-tests…cont
• The samples are ‘related’ because they are the
same people tested each time. Independent
sample t-tests are used when you have two
different (independent) groups of people (males
and females), and you are interested in
comparing their scores. In this case you collect
information on only one occasion, but from two
different sets of people.
130
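For the paired (repeated measures) case described above, a minimal sketch using SciPy's `ttest_rel`; the before/after scores are hypothetical illustration data:

```python
from scipy.stats import ttest_rel

# Hypothetical before/after scores for the same six subjects
before = [50, 55, 43, 60, 48, 52]
after  = [56, 58, 47, 66, 50, 59]

t, p = ttest_rel(before, after)  # samples are 'related': same people twice
print(f"t = {t:.3f}, sig. = {p:.4f}")
```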
Example of T-test
• For example, we may be interested in testing whether
the mean scores in reading (a numerical DV) differed
significantly according to sex (a binary categorical
IV, with two categories, male and female).
131
Example of T-test
• In other words if we let μm and μf represent
the mean scores in reading for all male and
female students respectively, then we shall
test a research hypothesis that;
• H1: the mean scores in reading for male and
female students differ significantly.
• Against a null hypothesis that;
132
Example of T-test
• H0: the mean scores in reading for male and
female students do not differ significantly.
• i.e. H1: μm ≠ μf against a null hypothesis that; H0: μm =
μf.
133
Example of T-test
• SPSS relieves us of the tedious calculations, as it can
help us test this null hypothesis easily through the
following procedure;
• To compute the t-test,
• Click Analyze, Compare means and Independent
samples t-test,
• Transfer read into the Test Variables box
• Transfer sex into the Grouping Variables box.
• Click Define Groups and then type 1 in group 1 box
and 2 in group 2 box (1 and 2 refer to the codes you
gave for male and female respectively).
• Click Continue and then Ok.
134
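The SPSS procedure above can be mirrored with SciPy's `ttest_ind`. The scores below are hypothetical; note that `ttest_ind` defaults to the equal-variances t-test, i.e. the first row of SPSS's Independent Samples Test table.

```python
from scipy.stats import ttest_ind

# Hypothetical reading scores split by sex (codes 1 = male, 2 = female)
male_read   = [52, 47, 60, 55, 41, 58]
female_read = [54, 63, 59, 65, 52, 60]

t, p = ttest_ind(male_read, female_read)  # equal-variances (pooled) t-test
print(f"t = {t:.3f}, sig. = {p:.4f}")
```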
Example of T-test
• At least two tables will be generated, one for
Group (descriptive) statistics and another for
the t-statistic and its sig. or p-value. The two
tables can be summarized into one as follows;
• Table produced: Descriptive statistics and t-
test results for students’ scores in reading by
sex
Categories of sex | Sample size | Sample mean | Sample std dev | t | Sig. or p-value
135
Example of T-test
• Note that the computed t-statistic is not computed
for its own sake; rather, it is to help us test the
previously stated null hypothesis, on whether the
mean scores in reading for male and female
students differ significantly.
• In testing this null hypothesis we are actually
posing a question; is the computed/observed t-
statistic statistically significant or big enough for
us to reject the H0?
136
Example of T-test
137
Tutorial problem
• a) Repeat the same procedure by testing whether the
mean scores in writing differ significantly according
to sex.
• b) Repeat the same procedure by testing whether the
mean scores in Maths differ significantly according to
sex.
• c) Repeat the same procedure by testing whether the
mean scores in reading differ significantly according
to learner type.
138
Tutorial problem
• a) Repeat the same procedure by testing whether the mean
scores in writing differ significantly according to sex.
• b) Repeat the same procedure by testing whether the mean
scores in maths differ significantly according to sex.
• c) Repeat the same procedure by testing whether the mean
scores in reading differ significantly according to learner type.
• d) Repeat the same procedure by testing whether the mean
scores in writing differ significantly according to learner type.
• e) Repeat the same procedure by testing whether the mean
scores in maths differ significantly according to learner type.
139
Tutorial problem
• d) Repeat the same procedure by testing
whether the mean scores in writing differ
significantly according to learner type.
• e) Repeat the same procedure by testing
whether the mean scores in Maths differ
significantly according to learner type.
140
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• Fisher’s Analysis of Variance is a generalization of
Student’s two independent samples t-test. It is a comparative
tool used to analyze data where the researcher is interested
in comparing the means of a numerical DV across the
categories of a categorical IV which has more than two
categories.
141
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• If we let μL, μm, and μH stand for the mean scores in reading
for low, medium and high socio-economic status students
respectively, then we shall test a research hypothesis that;
• H1: the mean scores in reading differ significantly according
to SES (not all of μL, μm and μH are equal),
• against a null hypothesis that
• H0: μL = μm = μH (the mean scores in reading do not differ
significantly according to SES).
142
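Using the μL, μm and μH notation above, the one-way ANOVA of H0: μL = μm = μH can be sketched with SciPy's `f_oneway`; the group scores are hypothetical illustration data:

```python
from scipy.stats import f_oneway

# Hypothetical reading scores grouped by socio-economic status
low    = [41, 47, 50, 44]
medium = [52, 55, 50, 58]
high   = [60, 63, 59, 65]

F, p = f_oneway(low, medium, high)  # tests H0: all group means are equal
print(f"F = {F:.3f}, sig. = {p:.4f}")
```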
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
143
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• At least two tables will be generated, one for
Descriptives and the second for ANOVA results.
These two tables can be summed up into one table that
appears as follows;
• Table produced: Descriptive statistics and ANOVA
(F) results for students’ scores in reading by SES
Categories of SES | Sample size | Sample mean | Sample std deviation | F-value | Sig.
144
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• Note that the computed F-statistic in Table produced is not
computed for its own sake; rather it is to help us test the null
hypothesis on equality of mean scores in reading for all students in
the three socio-economic statuses. In testing this null hypothesis, we
are asking a question: is the observed F-statistic in the table
statistically significant or big enough for us to reject the null
hypothesis?
145
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• This sig. level or p-value, behaves in such a way that as the
computed or observed F-statistic increases or becomes bigger
or more significant, the sig. or p-value reduces.
146
Comparison of two or more population means for
equality; Fishers’ One-Way ANOVA
• From our example, since the sig. or p-value
(0.000) is less than 0.05, then at α =
0.05, or the 5% level of significance, we
reject the null hypothesis, accept the
alternative, and conclude or infer
that the mean scores in reading differ
significantly according to students’ socio-
economic status.
147
Tutorial problem
• a) Repeat the same example by testing whether
mean scores in writing differ significantly
according to socio-economic status.
• b) Repeat the same example by testing whether
mean scores in Maths differ significantly
according to socio-economic status.
148
Comparison of two or more population means for
equality; Fishers’ Two-way analysis of variance
• Two-way analysis of variance allows you to test
the impact of two independent variables on one
dependent variable. The advantage of using a
two-way ANOVA is that it allows you to test for
an interaction effect—that is, when the effect of
one independent variable is influenced by
another; for example, when you suspect that
performance in reading varies with SES, but
only for males.
149
Comparison of two or more population means for
equality; Fishers’ Two-way analysis of variance
• It also tests for ‘main effects’—that is, the
overall effect of each independent variable (e.g.
sex, ses and ltype).
• There are two different two-way ANOVAs:
between-groups ANOVA (when the groups are
different) and repeated measures ANOVA
(when the same people are tested on more than
one occasion).
150
• Factor analysis
• Factor analysis allows you to condense a large set of variables
or scale items down to a smaller, more manageable number of
dimensions or factors. It does this by summarising the underlying
patterns of correlation and looking for ‘clumps’ or groups of
closely related items. This technique is often used when
developing scales and measures, to identify the underlying
structure.
151