Correlation Certificate


Correlations and scatter diagrams

Correlation refers to the degree of correspondence or relationship between two variables.


Correlated variables tend to change together. If one variable gets larger, the other one
systematically becomes either larger or smaller.

The degree of correlation (association) between two sets of ranks is measured by a rank correlation coefficient.

There are two types:
1. Spearman's rank correlation coefficient (ρ)
2. Kendall's rank correlation coefficient (τ)

Interpretation of the rank correlation coefficients


A rank correlation coefficient measures the degree of similarity between two rankings.
The table below is used to interpret the size of the coefficient (ignoring its sign).
Correlation coefficient    Interpretation
0 – 0.19                   Very low correlation
0.2 – 0.39                 Low correlation
0.4 – 0.59                 Moderate correlation
0.6 – 0.79                 High correlation
0.8 – 1.0                  Very high correlation

Note: a positive or negative sign indicates a positive or negative relationship respectively, that is, the variables are directly or inversely related.
The closer the coefficient is to zero, the weaker the relationship.
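
The interpretation table can be sketched as a small lookup. This is only an illustration: the function name `interpret` is hypothetical, and placing values that fall between the printed bands (for example 0.195) in the higher band is an assumption, since the table does not say how to treat them.

```python
def interpret(coefficient):
    """Return the verbal label from the interpretation table."""
    size = abs(coefficient)  # the band depends on size only, not on sign
    if size > 1:
        raise ValueError("a rank correlation coefficient lies between -1 and 1")
    if size < 0.2:
        strength = "very low"
    elif size < 0.4:
        strength = "low"
    elif size < 0.6:
        strength = "moderate"
    elif size < 0.8:
        strength = "high"
    else:
        strength = "very high"
    if coefficient == 0:
        return "very low correlation"  # no direction can be attached to zero
    direction = "positive" if coefficient > 0 else "negative"
    return f"{strength} {direction} correlation"


print(interpret(-0.282))  # low negative correlation
print(interpret(0.75))    # high positive correlation
```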

Spearman rank correlation (ρ)


It is given by ρ = 1 − 6∑d² / [n(n² − 1)]
where d = difference between the two ranks of a pair
      n = total number of pairs
Example 1
Two examiners, x and y, marked the scripts of 8 candidates. The table shows the marks they awarded.
x    72  60  56  76  68  52  80  64
y    56  44  60  74  66  38  68  52
Calculate the rank correlation coefficient and comment on your results.
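
No solution is printed for Example 1, so the short sketch below works it through. It assumes that rank 1 goes to the highest mark and that there are no tied marks (true for this data); the helper names `ranks_descending` and `spearman_rho` are illustrative only.

```python
def ranks_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


x = [72, 60, 56, 76, 68, 52, 80, 64]
y = [56, 44, 60, 74, 66, 38, 68, 52]
rho = spearman_rho(x, y)
print(ranks_descending(x))  # [3, 6, 7, 2, 4, 8, 1, 5]
print(ranks_descending(y))  # [5, 7, 4, 1, 3, 8, 2, 6]
print(round(rho, 3))        # 0.786
```

Here ∑d² = 18, so ρ = 1 − 108/504 ≈ 0.786, which the interpretation table classes as a high positive correlation between the two examiners.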
Example 2
The following shows the marks obtained by 10 students in mathematics and physics exams.
Mathematics 80 80 70 60 65 80 68 90 95 50
Physics 50 45 70 80 70 90 70 80 70 95
Calculate the rank correlation coefficient and comment on your results.
Solution
Let RM and RP be the ranks in mathematics and physics (1 = highest mark; tied marks share the average of the ranks they occupy) and let d = RM − RP.
RM    RP     d      d²
4     9      -5     25
4     10     -6     36
6     6.5    -0.5   0.25
9     3.5    5.5    30.25
8     6.5    1.5    2.25
4     2      2      4
7     6.5    0.5    0.25
2     3.5    -1.5   2.25
1     6.5    -5.5   30.25
10    1      9      81
∑d² = 211.5

ρ = 1 − 6∑d² / [n(n² − 1)]
  = 1 − (6 × 211.5) / [10(10² − 1)]
  = −0.282
There is a low negative correlation between the marks in mathematics and physics.
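
The tied ranks in Example 2 are the step most easily got wrong, so here is a minimal sketch of the averaging rule. It assumes rank 1 is the highest mark and that tied marks share the mean of the rank positions they occupy; `average_ranks` is an illustrative name, not a library function.

```python
def average_ranks(values):
    """Rank with 1 = largest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first rank position held by v
        count = ordered.count(v)                # number of values tied with v
        ranks.append(first + (count - 1) / 2)   # mean of first .. first+count-1
    return ranks


maths = [80, 80, 70, 60, 65, 80, 68, 90, 95, 50]
physics = [50, 45, 70, 80, 70, 90, 70, 80, 70, 95]
rm, rp = average_ranks(maths), average_ranks(physics)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rm, rp))
n = len(maths)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rm)
print(rp)
print(sum_d2, round(rho, 3))  # 211.5 -0.282
```

The three marks of 80 occupy ranks 3, 4 and 5 and so each receives 4; the four physics marks of 70 occupy ranks 5 to 8 and each receives 6.5.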

Example 3
The following table gives the order in which six candidates were ranked in two tests x and y
x E C B F D A
y F A D E C C
Calculate the rank correlation coefficient and comment on your results
Solution
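Taking A as rank 1 through to F as rank 6, the two candidates placed at C in test y share ranks 2 and 3, so each is given 2.5. With d = Rx − Ry:
Rx    Ry     d      d²
5     6      -1     1
3     1      2      4
2     4      -2     4
6     5      1      1
4     2.5    1.5    2.25
1     2.5    -1.5   2.25
∑d² = 14.5

ρ = 1 − 6∑d² / [n(n² − 1)]
  = 1 − (6 × 14.5) / [6(6² − 1)]
  = 1 − 0.414
  = 0.586
There is a moderate positive correlation between x and y.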
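
The arithmetic for Example 3 can be checked with a few lines of code. The sketch assumes the letters are places with A first (so alphabetical order is rank order) and that the two candidates at C share ranks 2 and 3; the helper name is hypothetical.

```python
def average_ranks_ascending(items):
    """Rank with 1 = smallest item; tied items share the mean of their ranks."""
    ordered = sorted(items)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in items]


x_places = list("ECBFDA")  # test x, candidate by candidate
y_places = list("FADECC")  # test y; two candidates share place C
rx = average_ranks_ascending(x_places)
ry = average_ranks_ascending(y_places)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(rx)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(ry)                     # [6.0, 1.0, 4.0, 5.0, 2.5, 2.5]
print(sum_d2, round(rho, 3))  # 14.5 0.586
```

Since 6 × 14.5 / 210 ≈ 0.414 is less than 1, the coefficient is positive.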

Kendall’s Rank correlation coefficient


It is a coefficient that represents the degree of concordance (agreement) between two columns of ranked data. The greater the number of 'inversions', the smaller the coefficient will be.
Kendall's rank correlation coefficient, τ (tau) = (C − D) / (C + D)
where C = number of concordant pairs (pairs in agreement)
      D = number of discordant pairs (pairs in disagreement)

The tau coefficient takes a value from −1 to 1, where 0 is no relationship, 1 is a perfect positive relationship and −1 is a perfect negative relationship.
With the pairs arranged in order of the first ranking:
- The concordant pairs for a particular rank are the ranks below it in the second column that are larger than that rank.
- The discordant pairs for a particular rank are the ranks below it in the second column that are smaller than that rank.

Example 4
Two examiners marked the scripts of 8 candidates. The table shows the marks awarded by two
examiners x and y.
x 72 60 56 76 68 52 80 64
y 56 44 60 74 66 38 68 52

Using Kendall's method, calculate the rank correlation coefficient and comment on your results
Solution

The rank of each mark (1 = highest) is shown in brackets.
x    72 (3)   60 (6)   56 (7)   76 (2)   68 (4)   52 (8)   80 (1)   64 (5)
y    56 (5)   44 (7)   60 (4)   74 (1)   66 (3)   38 (8)   68 (2)   52 (6)

- The ranks are entered in the table below, those of x in ascending order and those of y alongside the corresponding rank of x.
- The value of C for a row is the number of values in column Ry that lie below that row and are bigger than its Ry value, while the value of D is the number of values in column Ry that lie below that row and are smaller than its Ry value.
- For instance, the first value of C in the table below is the number of values below and bigger than 2 in column Ry, i.e. (5, 3, 6, 7, 4, 8) = 6; the first value of D is the number of values below and smaller than 2 in column Ry, i.e. (1) = 1.

Rx    Ry    C    D
1     2     6    1
2     1     6    0
3     5     3    2
4     3     4    0
5     6     2    1
6     7     1    1
7     4     1    0
8     8     -    -
            ∑C = 23    ∑D = 5

τ (tau) = (C − D) / (C + D)
        = (23 − 5) / (23 + 5)
        = 0.643
There is a high positive correlation between the marks awarded by the two examiners.
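
The counting rule used in this solution can be sketched as follows. The function name `kendall_counts` is illustrative; it sorts the pairs by Rx and then, for each Ry, counts the values further down the column that are larger (C) or smaller (D). Equal values are counted in neither total, which is an assumption about how ties are meant to be treated.

```python
def kendall_counts(rx, ry):
    """Return (C, D) using the column method described in the notes."""
    column = [b for _, b in sorted(zip(rx, ry))]  # Ry values in order of Rx
    c = d = 0
    for i, value in enumerate(column):
        below = column[i + 1:]
        c += sum(1 for other in below if other > value)  # agreements
        d += sum(1 for other in below if other < value)  # disagreements
    return c, d


rx = [3, 6, 7, 2, 4, 8, 1, 5]  # ranks of the x marks in Example 4
ry = [5, 7, 4, 1, 3, 8, 2, 6]  # ranks of the y marks in Example 4
c, d = kendall_counts(rx, ry)
tau = (c - d) / (c + d)
print(c, d, round(tau, 3))  # 23 5 0.643
```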

Example 5
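The heights (cm) and ages (years) of a random sample of ten farmers are given in the table below.

Height (cm)    156  151  152  160  146  157  149  142  158  140
Age (years)    47   38   44   55   46   49   54   52   45   30
(a)(i) Calculate the Kendall rank correlation coefficient.
(ii) Comment on your result. (06 marks)

Solution
Let the farmers be A, B, C, D, E, F, G, H, I, J (the rank of each value, 1 = largest, is shown in brackets).

Farmer    A         B         C         D         E         F         G         H         I         J
Height    156 (4)   151 (6)   152 (5)   160 (1)   146 (8)   157 (3)   149 (7)   142 (9)   158 (2)   140 (10)
Age       47 (5)    38 (9)    44 (8)    55 (1)    46 (6)    49 (4)    54 (2)    52 (3)    45 (7)    30 (10)

Rearranging the farmers in order of height rank, we have

Farmer               D   I   F   A   C   B   G   E   H   J
Height rank          1   2   3   4   5   6   7   8   9   10
Age rank             1   7   4   5   8   9   2   6   3   10
Agreements (C)       9   3   5   4   2   1   3   1   1   -     ∑C = 29
Disagreements (D)    0   5   2   2   3   3   0   1   0   -     ∑D = 16

For farmer I (age rank 7) the smaller age ranks to the right are 4, 5, 2, 6 and 3, so D = 5. As a check, C + D = 29 + 16 = 45 = (10 × 9)/2, the total number of pairs.

τ (tau) = (C − D) / (C + D)
        = (29 − 16) / (29 + 16)
        = 0.29
There is a low positive correlation between the heights and ages of the farmers.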
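
As a cross-check on the counts for Example 5, the sketch below classifies every pair of farmers directly from the raw data: a pair is concordant when the taller farmer is also the older one, and discordant otherwise. This pairwise form is an equivalent restatement of the column method rather than the method used in the notes, and the function name `concordance` is illustrative. A recount this way gives C = 29 and D = 16, which together account for all 10 × 9 / 2 = 45 pairs.

```python
from itertools import combinations

heights = [156, 151, 152, 160, 146, 157, 149, 142, 158, 140]
ages = [47, 38, 44, 55, 46, 49, 54, 52, 45, 30]


def concordance(x, y):
    """Count concordant and discordant pairs over all pairs of observations."""
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:    # both variables differ in the same direction
            c += 1
        elif product < 0:  # the variables differ in opposite directions
            d += 1
    return c, d


c, d = concordance(heights, ages)
n = len(heights)
tau = (c - d) / (c + d)
print(c, d, c + d == n * (n - 1) // 2)  # 29 16 True
print(round(tau, 2))                    # 0.29
```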
Significance of rank correlation coefficients
For a calculated rank correlation coefficient to be statistically significant, its absolute value should be greater than the critical value read from tables for the given sample size and significance level (α).
That is:
- If |ρc| > |ρT| or |τc| > |τT|, a significant relationship exists.
- If |ρc| < |ρT| or |τc| < |τT|, no significant relationship exists.
where ρc = calculated Spearman's correlation coefficient
      ρT = table value of Spearman's correlation coefficient at either 1% (α = 0.01) or 5% (α = 0.05)
      τc = calculated Kendall's correlation coefficient
      τT = table value of Kendall's correlation coefficient at either 1% (α = 0.01) or 5% (α = 0.05)
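
The decision rule amounts to a comparison of absolute values, sketched below. The function name `is_significant` is hypothetical, the critical values are the ones quoted in Example 6 for 8 pairs at the 5% level, and treating a calculated value exactly equal to the table value as not significant is an assumption, since the rule above does not cover that case.

```python
def is_significant(calculated, critical):
    """True when |calculated| exceeds |critical| (the table value)."""
    return abs(calculated) > abs(critical)


# Critical values quoted in Example 6 (n = 8, 5% level).
RHO_TABLE, TAU_TABLE = 0.71, 0.64

print(is_significant(0.75, RHO_TABLE))   # True
print(is_significant(-0.75, RHO_TABLE))  # True: the sign is ignored
print(is_significant(0.50, TAU_TABLE))   # False (0.50 is a made-up value)
```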
Example 6
The following shows the marks obtained by 8 students in mathematics and physics exams.
Mathematics    65  65  70  75  75  80  85  85
Physics        50  55  58  55  65  58  61  65
Calculate the rank correlation coefficient and comment on the significance of your results at the 5% level (table values: Spearman's ρ = 0.71, Kendall's τ = 0.64).
Solution
(a) Using Spearman's correlation coefficient
RM     RP     d      d²
7.5    8      -0.5   0.25
7.5    6.5    1      1
6      4.5    1.5    2.25
4.5    6.5    -2     4
4.5    1.5    3      9
3      4.5    -1.5   2.25
1.5    3      -1.5   2.25
1.5    1.5    0      0
∑d² = 21

ρ = 1 − 6∑d² / [n(n² − 1)]
  = 1 − (6 × 21) / [8(8² − 1)]
  = 0.75
Since ρc (0.75) > ρT (0.71), a significant relationship exists.

(b) Using Kendall's correlation coefficient
RM     RP     C    D
1.5    1.5    6    0
1.5    3      5    1
3      4.5    3    1
4.5    1.5    4    0
4.5    6.5    1    1
6      4.5    2    0
7.5    6.5    1    0
7.5    8      -    -
              ∑C = 22    ∑D = 3

τ (tau) = (C − D) / (C + D)
        = (22 − 3) / (22 + 3)
        = 0.76
Since τc (0.76) > τT (0.64), a significant relationship exists.
Scatter graphs
They are graphs showing the relationship between two variables

Example 7
The heights and masses of ten students are given in the table below.
Height (cm)    156  151  152  146  160  157  149  142  158  141
Mass (kg)      62   58   63   58   70   60   55   57   68   56
(a)(i) Plot the data on a scatter diagram.
(ii) Draw the line of best fit. Hence estimate the mass corresponding to a height of 155 cm.
(b)(i) Calculate the rank correlation coefficient for the data.
(ii) Comment on the significance of the relationship between the heights and the masses of the students. [Spearman's ρ = 0.79 and Kendall's τ = 0.64 at the 1% level of significance based on 10 observations]
Solution
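(a)(i) [Scatter diagram of mass (kg) against height (cm) with the line of best fit; figure not reproduced]
(ii) The mass corresponding to a height of 155 cm is 65 kg.

Note: this value may vary from 63 to 67 kg depending on how the line of best fit has been drawn.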
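
A line of best fit drawn by eye cannot be reproduced exactly, but a least-squares line gives one objective version of it. The sketch below is a swapped-in technique and not the graphical method the question asks for; it uses the heights as they appear in the solution table (151 and 141 for the second and last students), and `least_squares_line` is an illustrative name.

```python
heights = [156, 151, 152, 146, 160, 157, 149, 142, 158, 141]
masses = [62, 58, 63, 58, 70, 60, 55, 57, 68, 56]


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(heights, masses)
estimate = slope * 155 + intercept  # predicted mass in kg at 155 cm
print(round(slope, 3), round(estimate, 1))
```

With this data the estimate comes out at roughly 63 kg, at the lower end of the 63 to 67 kg range quoted in the note.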
(b)(i) Using Spearman's rank correlation coefficient
Height (x)   Mass (y)   Rx    Ry     d      d²
156          62         4     4      0      0
151          58         6     6.5    -0.5   0.25
152          63         5     3      2      4
146          58         8     6.5    1.5    2.25
160          70         1     1      0      0
157          60         3     5      -2     4
149          55         7     10     -3     9
142          57         9     8      1      1
158          68         2     2      0      0
141          56         10    9      1      1
∑d² = 21.5

ρ = 1 − 6∑d² / [n(n² − 1)]
  = 1 − (6 × 21.5) / [10(10² − 1)]
  = 0.8697
Since ρc (0.87) > ρT (0.79), a significant relationship exists between the heights and masses of the students at the 1% level.

Using Kendall's method

By naming the pairs we have
A(156, 62), B(151, 58), C(152, 63), D(146, 58), E(160, 70), F(157, 60), G(149, 55), H(142, 57), I(158, 68), J(141, 56)

Arranging the pairs in order of height rank (x), with the corresponding mass rank (y):
     E   I   F   A   C   B     G    D     H   J
x    1   2   3   4   5   6     7    8     9   10
y    1   2   5   4   3   6.5   10   6.5   8   9
C    9   8   5   5   5   3     0    2     1   -     ∑C = 38
D    0   0   2   1   0   0     3    0     0   -     ∑D = 6

τ (tau) = (C − D) / (C + D)
        = (38 − 6) / (38 + 6)
        = 0.73
Since τc (0.73) > τT (0.64), a significant relationship exists between the heights and masses of the students.

Revision questions
1.
Eight candidates seeking admission to a university sat written and oral tests. The scores are shown below.
Written (x)    55  54  35  62  87  53  71  50
Oral (y)       57  60  47  65  83  56  74  63
(a) Plot the results on a scatter diagram. Comment on the relationship between the written test and the oral test.
(b) Draw the line of best fit on your graph and use it to estimate y when x = 70.
(c) Calculate the rank correlation coefficient. Comment on your results.
2.
Ten pairs of observations have been made on two random variables x and y. The ten pairs (x, y) are
(0, 20), (-7, 12), (-10, 15), (-12, 22), (-17, 5), (-30, -5), (-32, 13), (10, 30), (15, 40), and (-12, 8).
(a) Plot the results on a scatter diagram.
(b) Draw the line of best fit.
(c) Estimate the expected value of y corresponding to x = -7.
(d) Calculate the rank correlation coefficient and comment on the significance of the results at the 1% significance level. (ρ = 0.894, τ = 0.778)
3.
Three examiners X, Y and Z each marked the scripts of ten candidates who sat a mathematics examination. The table below shows the examiners' rankings of the candidates.
     A   B   C   D   E    F   G   H    I   J
X    8   5   9   2   10   1   7   6    3   4
Y    5   3   6   1   4    7   2   10   8   9
Z    6   3   7   2   5    4   1   10   9   8
Calculate the coefficient of rank correlation of the rankings of
(i) X and Y
(ii) Y and Z
(iii) Comment on the significance of each at the 5% significance level.
4.
Three weighing scales from three different shops W, X and Y in a market were used to weigh 10 bags of beans (A, B, C, …) and the results (in kg) are given in the table below.
     A    B    C    D    E    F    G    H    I    J
W    65   68   70   63   64   62   73   75   72   78
X    63   68   68   60   65   60   72   73   70   66
Y    63   74   78   75   64   73   79   70   67   79
Determine the rank correlation coefficient for the performance of the scales
(i) W and X
(ii) X and Y
(iii) Which of the three scales W, X and Y were in good working condition?
5.
(a) In many government institutions, officers complain about typing errors. A test was designed to investigate the relationship between typing speed and errors made. Twelve typists A, B, C, … L were picked at random to type a text. The table below shows the rankings of the typists according to speed and errors made. (N.B. the lowest ranking in errors implies the fewest errors.)
Typist    A   B   C   D   E    F    G    H   I   J    K   L
Speed     3   4   2   1   8    11   10   6   7   12   5   9
Errors    2   6   5   1   10   9    8    3   4   12   7   11
(i) Calculate the coefficient of rank correlation.
(ii) Comment on the significance at the 1% significance level.
(b) The cost of travelling to a place away from the city centre is found to depend on the route and on the distance of the place from the centre. The table below gives the average rates charged for distances travelled away from the city centre.
Distance (s km)         9     12     14     21     24     30     33     45     46     50
Rate charged (r shs)    750   1000   1150   1200   1350   1250   1400   1750   1600   2000
(i) Plot the above data on a scatter diagram and draw a line of best fit through the points of the scatter diagram.
(ii) Estimate the expected value of r corresponding to s = 40 km.
