PY1PR1 Stats Lecture 6 Handout

PY1PR1 Lecture 6: Nonparametric statistics
Dr David Field
Parametric vs. non-parametric

The t test covered in Lecture 5 is an example of a parametric test.
Parametric tests assume the data is of sufficient quality; the results can be misleading if the assumptions are wrong.
Quality is defined in terms of certain properties of the data.
Non-parametric tests can be used when the data is not of sufficient quality to satisfy the assumptions of a parametric test.
Parametric tests are preferred when the assumptions are met because they are more sensitive, and many of the parametric tests you will encounter in year 2 have no non-parametric equivalent.
Chapter 15 of the Andy Field textbook covers non-parametric tests.
Chapter 5 covers assumptions in detail.
Chapter 9 (9.3.2 and 9.8) covers specific assumptions of t tests.
Assumptions of t tests: a list

1) The sampling distribution is normally distributed.
We don't have access to the sampling distribution.
But the central limit theorem (textbook 2.5.1) indicates that the sampling distribution will be approximately normal if sample size is 30 or greater.
For N < 30, if the sample data is normally distributed then the sampling distribution will also be normal.
For an independent samples t test this means both samples should be normally distributed.
For a related samples t test or a one sample t test this means the difference scores, not the raw scores, should be normally distributed.
2) The data should come from an interval or ratio scale.
In practice an ordinal scale with 5 or more levels is OK.
3) There should not be extreme scores or outliers, because these have a disproportionate influence on the mean and the variance.
4) For the independent samples t test the variance in the two samples should be approximately equal.
This assumption is more important if sample size < 30 and/or sample sizes are unequal.
As a rule of thumb, if the variance of one group is 3 or more times greater than the variance of the other group, then use a non-parametric test.
Assumption 1: normality
This can be checked by inspecting a histogram.
With small samples the histogram is unlikely to ever be exactly bell shaped.
This assumption is only broken if there are large and obvious departures from normality.
In severe skew the most extreme histogram interval usually has the highest frequency.
In moderate skew the most extreme histogram interval does not have the highest frequency.
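
The histogram rule of thumb above can be sketched in Python. This is only an illustration and not part of the original lecture: the function names, the choice of 5 equal-width intervals, and the two small data sets are all assumptions made for the example. A real check should still involve looking at the histogram.

```python
def histogram_counts(data, n_bins=5):
    """Count how many scores fall into each of n_bins equal-width intervals."""
    lowest, highest = min(data), max(data)
    if highest == lowest:
        raise ValueError("all scores are identical, so no histogram can be drawn")
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for score in data:
        # The maximum score would land one past the end, so clamp it into the last bin.
        index = min(int((score - lowest) / width), n_bins - 1)
        counts[index] += 1
    return counts


def extreme_interval_is_peak(data, n_bins=5):
    """True if the lowest or highest interval has the highest frequency."""
    counts = histogram_counts(data, n_bins)
    return max(counts) in (counts[0], counts[-1])


# Hypothetical scores, invented for illustration only.
severe = [1, 1, 1, 1, 2, 2, 3, 5, 10]
moderate = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 8]

print(histogram_counts(severe), extreme_interval_is_peak(severe))
print(histogram_counts(moderate), extreme_interval_is_peak(moderate))
```

The result depends on how many intervals are used, so this is a rough screen and not a formal test of normality.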
Assumption 3: no extreme scores
It is sometimes legitimate to exclude extreme scores from the sample or alter them to make them less extreme. See section 5.7.1 of the textbook. You may then use a parametric test.
Assumption 4: equal variance (independent samples t only)
[Figure: two samples with unequal spread, one with variance 25.2 and one with variance 4.1]
Assumption 4: equal variances (independent samples t only)
Sometimes the variance in the two groups is unequal, but the larger variance is less than 3 times bigger than the smaller variance.
In this case you can perform a t test with a correction for unequal variance.
SPSS provides a statistical test called Levene's Test of the null hypothesis that the variances in the two groups are the same.
If that null hypothesis is rejected you need to make a correction to the t test.
If the variance of one group is 3 or more times bigger than the other then perform a Mann-Whitney U test (see later).
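
The decision rule described above can be written out as a short Python sketch. The function name `choose_test`, the returned labels, and the 0.05 significance level are assumptions made for this illustration; the Levene's Test p value is assumed to come from SPSS or similar software.

```python
def choose_test(var1, var2, levene_p, alpha=0.05):
    """Apply the handout's rule of thumb for the independent samples design."""
    larger, smaller = max(var1, var2), min(var1, var2)
    if larger >= 3 * smaller:
        # One variance is 3 or more times the other: go non-parametric.
        return "Mann-Whitney U test"
    if levene_p < alpha:
        # Levene's Test rejects equal variances: correct the t test.
        return "t test with correction for unequal variance"
    return "standard independent samples t test"


# Standard deviations and Levene's p value from the SPSS example in this handout.
print(choose_test(5.04689 ** 2, 7.79554 ** 2, levene_p=0.013))

# Variances from the Assumption 4 illustration (25.2 and 4.1); the p value is hypothetical.
print(choose_test(25.2, 4.1, levene_p=0.50))
```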
Levene's test and correcting for unequal variance

Group Statistics (DV)
group   N    Mean      Std. Deviation   Std. Error Mean
1.00    12   25.5673   5.04689          1.45691
2.00    12   31.1920   7.79554          2.25038

The variances (squared standard deviations) are approximately 25.5 and 60.8.

Independent Samples Test (DV)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                              F       Sig.            t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference
Equal variances assumed       7.236   .013            -2.098   22       .048              -5.62476          2.6808
Equal variances not assumed                           -2.098   18.843   .050              -5.62476          2.6808

Levene's test is significant here (p = .013), so the null hypothesis of equal variances is rejected and the "Equal variances not assumed" row should be reported.
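
As a check on the SPSS output above, the t value and the corrected degrees of freedom can be recomputed from the group means, standard deviations and sample sizes. This is a sketch using the standard Welch-Satterthwaite formula for the unequal-variance correction; the handout does not say which formula SPSS uses, so treat that as an assumption. The function name is also invented for the example.

```python
import math


def unequal_variance_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, standard error, corrected df) without assuming equal variances."""
    # Squared standard error contributed by each group.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df


t, se, df = unequal_variance_t(25.5673, 5.04689, 12, 31.1920, 7.79554, 12)
print(round(t, 3), round(se, 4), round(df, 3))
```

Because the two groups are the same size, the standard error and t are identical with and without the correction; only the degrees of freedom (and therefore the p value) change.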
Digression: testing the null hypothesis that two samples have the same variance
Suppose some researchers predict that children educated in a traditional way will have a greater range of scores in end of year tests compared to children educated with the modern approach.
40 children are randomly allocated to either traditional or modern classrooms.
Levene's Test can be used to test the null hypothesis that the two groups show the same amount of dispersion around the mean.
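
To make the idea of "dispersion around the mean" concrete, here is a minimal sketch of how a Levene-type F statistic can be computed: each score is replaced by its absolute distance from its own group mean, and a one-way ANOVA F ratio is then calculated on those distances. The function name and the tiny data set are assumptions for illustration, and the sketch stops at F without deriving a p value.

```python
from statistics import mean


def levene_f(*groups):
    """F statistic for equality of variances, using deviations from group means."""
    # Absolute deviation of every score from the mean of its own group.
    devs = [[abs(x - mean(g)) for x in g] for g in groups]
    all_devs = [d for g in devs for d in g]
    grand_mean = mean(all_devs)
    k, n_total = len(devs), len(all_devs)
    # One-way ANOVA on the deviations: between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in devs)
    ss_within = sum((d - mean(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical end of year scores, invented for illustration only.
modern = [1, 2, 3]
traditional = [2, 4, 6]
print(levene_f(modern, traditional))
```

In practice the p value would come from statistical software. For example, SciPy provides `scipy.stats.levene`, whose `center="mean"` option corresponds to the mean-based version sketched here.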
Non-parametric tests
These are sometimes referred to as distribution-free tests, because they do not make assumptions about the normality or variance of the data.
The Mann-Whitney U test is appropriate for a 2 condition independent samples design.
The Wilcoxon Signed Rank test is appropriate for a 2 condition related samples design.
If you have decided to use a non-parametric test then the most appropriate measure of central tendency will probably be the median.
Mann-Whitney U test
To avoid making the assumptions about the data that are made by parametric tests, the Mann-Whitney U test first converts the data to ranks.
If the data were originally measured on an interval or ratio scale then after converting to ranks the data will have an ordinal level of measurement.
Mann-Whitney U test: ranking the data

[Table: scores in Sample 1 and Sample 2 with their ranks (columns: Score, Rank 1, Score, Rank 2); two tied scores each receive rank 5.5]
Scores are ranked irrespective of which experimental group they come from.
Tied scores take the mean of the ranks they occupy. In this example, ranks 5 and 6 are shared in this way between 2 scores. (Then the next highest score is ranked 7.)
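
The ranking rule, including the treatment of ties, can be sketched in Python as follows. The function name and the scores are hypothetical; the scores were chosen so that two tied values share ranks 5 and 6, as in the example described above.

```python
def average_ranks(scores):
    """Rank scores from lowest (rank 1) to highest; ties share the mean rank."""
    # Indices of the scores, ordered from the lowest score to the highest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Find the end of the run of scores tied with the one at pos.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Mean of the 1-based ranks that this run occupies.
        shared_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


# Hypothetical scores: the two 12s occupy ranks 5 and 6, so each gets 5.5.
print(average_ranks([7, 8, 9, 10, 12, 12, 13]))
```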
Rationale of Mann-Whitney U
Imagine two samples of scores drawn at random from the same population.
The two samples are combined into one larger group and then ranked from lowest to highest.
In this case there should be a similar number of high and low ranked scores in each original group.
If you sum the ranks in each group the totals should be about the same.
This is the null hypothesis.
If, however, the two samples are from different populations with different medians then most of the scores from one sample will be lower in the ranked list than most of the scores from the other sample.
The sum of ranks in each group will differ.
Mann-Whitney U test: sum of ranks

[Table: scores in Sample 1 and Sample 2 with their ranks (columns: Score, Rank 1, Score, Rank 2); the sums of ranks are 20.5 and 15.5]
The next step in computing the Mann-Whitney U is to sum the ranks in the two groups.
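
Summing the ranks per group can be sketched as below. The two samples are hypothetical, and the helper uses a compact way of getting a tied rank: the number of lower scores plus the midpoint of the tied block.

```python
def rank_sums(sample1, sample2):
    """Rank the combined scores, then sum the ranks separately for each sample."""
    combined = sample1 + sample2

    def rank(score):
        lower = sum(1 for x in combined if x < score)
        tied = sum(1 for x in combined if x == score)
        # Tied scores share the mean of the ranks they occupy.
        return lower + (tied + 1) / 2

    return sum(rank(x) for x in sample1), sum(rank(x) for x in sample2)


# Hypothetical scores, invented for illustration only.
sum1, sum2 = rank_sums([3, 5, 8], [5, 9, 10])
print(sum1, sum2)
```

A useful check is that the two sums always add up to N(N + 1)/2, where N is the total number of scores (here 6 * 7 / 2 = 21).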
Mann-Whitney U - SPSS

Ranks (DV)
group   N    Mean Rank   Sum of Ranks
1.00    12   10.75       129.00
2.00    12   14.25       171.00
Total   24

Test Statistics (b) (DV)
Mann-Whitney U                    51.000
Wilcoxon W                        129.000
Z                                 -1.212
Asymp. Sig. (2-tailed)            .225
Exact Sig. [2*(1-tailed Sig.)]    .242 (a)
a. Not corrected for ties.
b. Grouping Variable: group
The value of U is calculated using a formula that compares the summed ranks of the two groups and takes into account sample size.
You don't need to know the formula.
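
For the curious, the usual textbook formula can be sketched in a few lines. The handout does not give the formula, so this is an assumption about how U is obtained; it does reproduce the SPSS value above when given the sums of ranks and sample sizes from the Ranks table.

```python
def mann_whitney_u(rank_sum1, n1, rank_sum2, n2):
    """U from the two sums of ranks; the smaller of the two candidate values is reported."""
    # Subtract the smallest sum of ranks each group could possibly have.
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = rank_sum2 - n2 * (n2 + 1) / 2
    return min(u1, u2)


# Sums of ranks and sample sizes from the SPSS Ranks table.
print(mann_whitney_u(129.00, 12, 171.00, 12))
```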
You should generally report the asymptotic p value.
To calculate this SPSS converts the value of U to a Z score, i.e. a value on the standard normal distribution.
The Z score is converted to a p value in the same way as for the Z test (Lecture 4).
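
The conversion from U to Z and then to a two-tailed p value can be sketched as follows. This uses the standard normal approximation without any correction for ties, which is an assumption; SPSS may apply a tie correction, so small differences in the last decimal place are possible with other data.

```python
import math


def u_to_z(u, n1, n2):
    """Z score for U under the null hypothesis (normal approximation, no tie correction)."""
    expected_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - expected_u) / sd_u


def two_tailed_p(z):
    """Two-tailed p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = u_to_z(51, 12, 12)
p = two_tailed_p(z)
print(round(z, 3), round(p, 3))
```

With U = 51 and 12 participants per group this gives values matching the Z and asymptotic p value in the SPSS output.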
Mann-Whitney U - reporting

As the data was skewed, and the two sample sizes were unequal, the most appropriate statistical test was Mann-Whitney. Descriptive statistics showed that group 1 (median = ____) scored higher on the DV than group 2 (median = ____). However, the Mann-Whitney U was found to be 51 (Z = -1.21), p > 0.05, and so the null hypothesis that the difference between the medians arose through sampling effects cannot be rejected.
For a significant result: ... Mann-Whitney U was found to be 276.5 (Z = -2.56), p = 0.01 (one-tailed), and so the null hypothesis that the difference between the medians arose through sampling effects can be rejected in favour of the alternative hypothesis that the IV had an influence on the DV.
Wilcoxon signed ranks test

This is appropriate for within participants designs.
The t test lecture used a within participants example based upon testing reaction time in the morning and in the afternoon, using the same group of participants in both conditions.
The Wilcoxon test is conceptually similar to the related samples t test.
Between subjects variation is minimised by calculation of difference scores.
Wilcoxon test: ranking the data

[Table: columns Score cond 1, Score cond 2, Difference, Ranked difference ignoring +/-; for example, two differences of -4 each receive rank 3.5]
First rank the difference scores, ignoring the sign of the difference. Differences of 0 receive no rank.
Rationale of Wilcoxon test

Some difference scores will be large, others will be small.
Some difference scores will be positive, others negative.
If there is no difference between the two experimental conditions then the numbers and sizes of positive and negative differences will be similar.
This is the null hypothesis.
If there is a difference between the two experimental conditions then there will either be more positive ranks than negative ones, or the other way around.
Also, the larger ranks will tend to lie in one direction.

Wilcoxon test: ranking the data

[Table: columns Score cond 1, Score cond 2, Difference, Ranked difference ignoring +/-, Ranked difference with +/- reattached; for example, a difference of -4 with rank 3.5 becomes -3.5]
Add the sign of the difference back into the ranks.
Separately, sum the positive ranks and the negative ranks. In this example the positive sum is 2 and the negative sum is -8.5. The Wilcoxon T is whichever is smaller in absolute size (2 in this case).
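
The steps for obtaining the Wilcoxon T can be sketched in Python. The function name and the scores are hypothetical and are not the data from the lecture table. Differences are taken as condition 2 minus condition 1, and the sum of negative ranks is returned as a positive size.

```python
def wilcoxon_t(cond1, cond2):
    """Return (sum of positive ranks, sum of negative ranks, T) for paired scores."""
    diffs = [b - a for a, b in zip(cond1, cond2)]
    # Differences of 0 receive no rank, so drop them first.
    nonzero = [d for d in diffs if d != 0]
    sizes = [abs(d) for d in nonzero]

    def rank(size):
        lower = sum(1 for s in sizes if s < size)
        tied = sum(1 for s in sizes if s == size)
        # Tied differences share the mean of the ranks they occupy.
        return lower + (tied + 1) / 2

    positive_sum = sum(rank(abs(d)) for d in nonzero if d > 0)
    negative_sum = sum(rank(abs(d)) for d in nonzero if d < 0)
    return positive_sum, negative_sum, min(positive_sum, negative_sum)


# Hypothetical paired scores giving differences of -4, 2, 0, -1 and -4.
cond1 = [10, 6, 7, 9, 12]
cond2 = [6, 8, 7, 8, 8]
print(wilcoxon_t(cond1, cond2))
```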
Wilcoxon T - SPSS

Ranks: RTM in the afternoon - RTM in the morning
                  N        Mean Rank   Sum of Ranks
Negative Ranks    10 (a)   8.45        84.50
Positive Ranks    3 (b)    2.17        6.50
Ties              1 (c)
Total             14
a. RTM in the afternoon < RTM in the morning
b. RTM in the afternoon > RTM in the morning
c. RTM in the afternoon = RTM in the morning

Test Statistics (b): RTM in the afternoon - RTM in the morning
Z                         -2.732 (a)
Asymp. Sig. (2-tailed)    .006
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

The value of T is equal to whichever of the sums of ranks is lower (6.50 here).
T is converted to a Z score by SPSS, taking into account sample size, and the p value is derived from the standard normal distribution.
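
The conversion of a Wilcoxon T to a Z score can be sketched with the standard normal approximation. The inputs below are an assumption about how to read the SPSS output: T is taken as the smaller sum of ranks (6.50) and n as the 13 non-tied pairs. The sketch applies no correction for tied ranks, so it gives about -2.726 where SPSS reports -2.732; the gap is presumably due to such a correction.

```python
import math


def wilcoxon_t_to_z(t, n):
    """Z score for T with n non-zero differences (normal approximation, no tie correction)."""
    expected_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - expected_t) / sd_t


def two_tailed_p(z):
    """Two-tailed p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = wilcoxon_t_to_z(6.5, 13)
p = two_tailed_p(z)
print(round(z, 3), round(p, 3))
```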
Wilcoxon T - reporting
As the difference scores were not normally distributed, the most appropriate statistical test was the Wilcoxon signed-rank test. Descriptive statistics showed that measurement in condition 1 (median = ____) produced higher scores than in condition 2 (median = ____). The Wilcoxon test (T = 6.5) was converted into a Z score of -2.73, p = 0.006 (two-tailed). It can therefore be concluded that the experimental and control treatments produced different scores.
Limitations of non-parametric methods

Converting ratio level data to ordinal ranked data entails a loss of information.
This reduces the sensitivity of the non-parametric test compared to the parametric alternative in most circumstances.
Sensitivity is the power to reject the null hypothesis, given that it is false in the population.
Lower sensitivity gives a higher type 2 error rate.
Many parametric tests have no non-parametric equivalent, e.g. two-way ANOVA, where two IVs and their interaction are considered simultaneously.
List of statistical terms for revision
Parametric
Non-parametric
Assumption
Outlier
Levene's test for equality of variances
Mann-Whitney U test
Wilcoxon signed ranks test
