UNIT 3 PARTIAL AND MULTIPLE CORRELATIONS

Structure
3.0 Introduction
3.1 Objectives
3.2 Partial Correlation (rp)
3.2.1 Formula and Example
3.2.2 Alternative Use of Partial Correlation
3.0 INTRODUCTION
While learning about correlation, we understood that it indicates the relationship between two variables. There are, however, correlation coefficients that involve more than two variables. This sounds unusual, and you may wonder how it can be done, and under what circumstances. Let me give you two examples. The first is about the correlation between cholesterol level and bank balance for adults. Suppose we find a positive correlation between these two variables: as the bank balance increases, the cholesterol level also increases. But this is not a correct picture of the relationship, because cholesterol level also increases as age increases; and as age increases, the bank balance may also increase, since a person can save from his or her salary over the years. Thus there is an age factor which influences both cholesterol level and bank balance. Suppose we want to know only the correlation between cholesterol and bank balance without the influence of age. We could take persons from the same age group and thus control age; but if this is not possible, we can statistically control the age factor and thus remove its influence on both cholesterol and bank balance. This, if done, is called partial correlation. That is, we can use partial and part correlation for this purpose. Sometimes in psychology we have factors which are influenced by a large number of variables. For instance, academic achievement is affected by intelligence, work habits, extra coaching, socio-economic status, and so on. The correlation of academic achievement with various other factors, as mentioned above, can be found by multiple correlation. In this unit we will be learning about partial, part and multiple correlation.
3.1 OBJECTIVES
After completing this unit, you will be able to:
• Describe and explain the concept of partial correlation;
• Explain the difference between partial and semipartial correlation;
• Describe and explain the concept of multiple correlation;
• Compute and interpret partial and semipartial correlations;
• Test their significance and apply these correlations to real data;
• Compute and interpret multiple correlation; and
• Apply the correlation techniques to real data.
Look at the data on academic achievement, anxiety and intelligence. Here, an academic achievement test, an anxiety scale and an intelligence test were administered to ten students. The data for the ten students on the three variables are provided in the table below.

Table 3.1: Data of academic achievement, anxiety and intelligence for 10 subjects

Subject  Academic Achievement  Anxiety  Intelligence
1        15                    6        25
2        18                    3        29
3        13                    8        27
4        14                    6        24
5        19                    2        30
6        11                    3        21
7        17                    4        26
8        20                    4        31
9        10                    5        20
10       16                    7        25
For this data, the partial correlation between academic achievement and anxiety, with intelligence controlled, works out to rAB.C = −0.375. The significance of a partial correlation can be tested using the t-distribution:

t = \frac{r_P \sqrt{n - v}}{\sqrt{1 - r_P^2}}    (eq. 3.3)

Where,
rP = the partial correlation computed on the sample, rAB.C
n = sample size
v = the total number of variables employed in the analysis

The significance of rP is tested at df = n − v.
In the present example, we can carry out the significance test as follows:

t = \frac{-0.375 \sqrt{10 - 3}}{\sqrt{1 - (-0.375)^2}} = \frac{-0.992}{0.927} = -1.07

We test the significance of this value at df = 7 in the t-distribution table in the appendix. You will realise that at df = 7 the table gives a critical value of 2.36 at the 0.05 level of significance. The obtained value of 1.07 (ignoring the sign) is smaller than this critical value. So we accept the null hypothesis, H0: ρP = 0.
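If you wish to verify this by machine, here is a minimal Python sketch; the function name partial_r_t is our own illustration, not a routine from any statistics package. It simply reproduces eq. 3.3 for the values above.

```python
import math

def partial_r_t(r_p, n, v):
    """t statistic for a sample partial correlation (eq. 3.3)."""
    return (r_p * math.sqrt(n - v)) / math.sqrt(1 - r_p ** 2)

# Values from the example above: r_AB.C = -0.375, n = 10, v = 3
t = partial_r_t(-0.375, 10, 3)
print(round(t, 2))  # -1.07; |t| < 2.36 at df = 7, so H0 is retained
```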
Large sample example:
Now let us take a relatively large sample. A counselling psychologist is interested in understanding the relationship between practice of study skills and marks obtained. But she is skeptical about the effectiveness of the study skills. She believes they may be effective because they are good cognitive techniques, or they may be effective simply because the subjects believe that the study skills are going to help them. The first is an attribute of the skills, while the second is a placebo effect. She wanted to test this hypothesis. So, along with measuring the hours spent practicing the study skills and the marks obtained, she also took a measure of belief that study-skill training is useful. She collected data on 100 students. The obtained correlations are as follows:

The correlation between practice of study skills (A) and unit test marks (B) is 0.69.
The correlation between practice of study skills (A) and belief about usefulness of study skills (C) is 0.46.
The correlation between marks in the unit test (B) and belief about usefulness of study skills (C) is 0.39.

The partial correlation between practice of study skills (A) and unit test marks (B), controlling for belief (C), is 0.625. Let's test the null hypothesis H0: ρP = 0 for this partial correlation:

t = \frac{0.625 \sqrt{100 - 3}}{\sqrt{1 - 0.625^2}} = \frac{6.16}{0.78} = 7.89

The t-value is significant at the 0.05 level (df = 97). So we reject the null hypothesis and accept that there is a partial correlation between A and B. This means that the partial correlation between practice of study skills (A) and unit test marks (B) is non-zero in the population. We can conclude that the correlation between practice of study skills (A) and unit test marks (B) still exists even after controlling for the belief in the usefulness of the study skills. So the skepticism of our researcher is unwarranted.
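The whole computation can be replicated in a few lines of Python. This is an illustrative sketch with our own helper name, not a library routine; small rounding differences from the hand computation are expected.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation r_AB.C from three pairwise correlations."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

r_p = partial_corr(0.69, 0.46, 0.39)                    # about 0.625
t = (r_p * math.sqrt(100 - 3)) / math.sqrt(1 - r_p**2)
print(round(r_p, 3), round(t, 2))                       # 0.625 and about 7.88
```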
3.2.2 Alternative Use of Partial Correlation

Suppose you have one variable which is dichotomous, that is, it takes only two values. Some examples are male and female, experimental and control group, patients and normals, Indians and Americans, and so on. Now suppose two such groups were measured on two variables, X and Y, and you want to correlate these two variables. But you are also interested in testing whether group membership influences the correlation between the two variables. This can be done by using partial correlation. Look at the following data for male and female subjects on two variables, neuroticism (N) and intolerance of ambiguity (IOA).
Table 3.2: Gender-wise data for intolerance of ambiguity (IOA) and neuroticism (N)
Male Female
IOA N IOA N
12 22 27 20
17 28 25 15
7 24 20 18
12 32 19 12
14 30 26 18
11 27 23 13
13 29 24 20
10 17 22 9
21 34 21 19
If you compute the correlation between intolerance of ambiguity and neuroticism for the entire sample of 18 male and female subjects, it is −0.462. This is against expectation. It is a surprising finding, stating that as neuroticism increases, intolerance of ambiguous situations decreases. What might be the reason for such a correlation? If we examine the means of these two variables across gender, we realise that the trend of the means is reversed: males are lower on IOA but higher on N, while females are higher on IOA but lower on N. If you calculate the Pearson's correlations separately for each gender, they are well in the expected line (0.64 for males and 0.41 for females).
Partial correlation can help us solve this problem. Here, we calculate the Pearson's product-moment correlation between IOA and N with sex partialled out. This will be the correlation between neuroticism and intolerance of ambiguity from which the influence of sex is removed. In the formula below, 0.837 is the correlation of IOA with sex and −0.782 is the correlation of N with sex (sex being entered as a dichotomous variable):
r_{AB.C} = \frac{r_{AB} - r_{AC} r_{BC}}{\sqrt{(1 - r_{AC}^2)(1 - r_{BC}^2)}} = \frac{-0.462 - (0.837 \times -0.782)}{\sqrt{(1 - 0.837^2)(1 - (-0.782)^2)}} = \frac{0.193}{0.341} = 0.566
The correlation partialled out for sex is 0.57. Let's test the significance of this correlation:

t = \frac{0.566 \sqrt{18 - 3}}{\sqrt{1 - 0.566^2}} = \frac{2.194}{0.824} = 2.66

The tabled value from the appendix at df = 15 is 2.13 at the 0.05 level and 2.95 at the 0.01 level. The obtained t-value is significant at the 0.05 level. So we reject the null hypothesis which stated that the population partial correlation between IOA and N, partialled out for sex, is zero.
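If you want to reproduce this analysis from the raw scores in Table 3.2, the sketch below may help; all function and variable names are our own, sex is coded 0/1, and small rounding differences from the hand-worked values are to be expected.

```python
import statistics as st

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    mx, my = st.mean(x), st.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Table 3.2: nine males followed by nine females
ioa = [12, 17, 7, 12, 14, 11, 13, 10, 21, 27, 25, 20, 19, 26, 23, 24, 22, 21]
neu = [22, 28, 24, 32, 30, 27, 29, 17, 34, 20, 15, 18, 12, 18, 13, 20, 9, 19]
sex = [0] * 9 + [1] * 9                      # 0 = male, 1 = female

r_ab, r_ac, r_bc = pearson(ioa, neu), pearson(ioa, sex), pearson(neu, sex)
r_p = (r_ab - r_ac * r_bc) / ((1 - r_ac**2) * (1 - r_bc**2)) ** 0.5
print(round(r_ab, 3), round(r_p, 3))         # about -0.462 and 0.566
```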
Partial Correlation as Pearson's Correlation between Errors

Partial correlation can also be understood as a Pearson's correlation between two errors (residuals). Before you proceed, you need to know what a regression equation is.

[Figure: a scatterplot with its regression line, relating % of votes in 1980 (X) to % of votes in 1984 (Y). Source: http://janda.org/c10/Lectures/topic04/L25-Modeling.htm]

From this line you can predict X from Y: if the % of votes in 1984 is known, you can find the % of votes in 1980. Similarly, if you know the % of votes in 1980, you can find the % of votes in 1984.
The regression line seen in the above diagram is close to the points of the scatterplot. That is, the predicted values need to be as close as possible to the data. Such a line is called the best-fitting line, or regression line. There are certain guidelines for regression lines:
1) Use regression lines to predict values only when there is a significant correlation.
2) Do not use them if there is no significant correlation.
3) Stay within the range of the data. For example, if the data run from 10 to 60, do not predict a value for 400.
4) Do not make predictions for a population based on another population's regression line.
The y variable is often termed the criterion variable and the x variable the predictor variable. The slope is often called the regression coefficient and the intercept the regression constant. The slope can also be expressed compactly as β₁ = r × s_y/s_x.
Normally we then predict values for y based on values of x. This still does not mean
that y is caused by x. It is still imperative for the researcher to understand the
variables under study and the context they operate under before making such an
interpretation. Of course, simple algebra also allows one to calculate x values for a
given value of y.
The population regression equation is written as:

Y = α + βX + ε    (eq. 4.8)

Where,
Y = dependent variable or criterion variable
α = the population parameter for the y-intercept of the regression line, or regression constant
β = the population slope of the regression line, or regression coefficient (β = r × σy/σx)
ε = the error in the equation, or residual

The values of α and β are not known, since they are values at the level of the population. A population-level value is called a parameter, and it is virtually impossible to calculate a parameter, so we have to estimate it. The two parameters estimated are α and β. The estimator of α is 'a' and the estimator of β is 'b'. So at the sample level the equation can be written as:

Y = a + bX + e    (eq. 4.9)
Where,
Y = the scores on Y variable
X = scores on X variable
a = the Y-intercept of the regression line for the sample or regression constant in
sample
b = the slope of the regression line or regression coefficient in sample
e = error in prediction of the scores on Y variable, or residual
Let us take an example and demonstrate. From the sums of the data, the slope and intercept are computed as:

b = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y \sum x^2 - \sum x \sum xy}{N \sum x^2 - (\sum x)^2}

Example: Write the regression line for the following points:
x y
1 4
3 2
4 1
5 0
8 0
Here N = 5, ∑x = 21, ∑y = 7, ∑x² = 115 and ∑xy = 14. Thus a = [7 × 115 − 21 × 14] ÷ [5 × 115 − 21²] = 511 ÷ 134 = 3.81, and b = [5 × 14 − 21 × 7] ÷ [5 × 115 − 21²] = −77 ÷ 134 = −0.575.

Thus the regression equation for this example is y = −0.575x + 3.81. If you have x, you can predict y; and, by rearranging the equation, if you have y you can predict x.
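A short Python sketch of the same computation follows; the helper name fit_line is our own illustration of the sum formulas above.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)    # slope
    a = (sy * sxx - sx * sxy) / (n * sxx - sx ** 2)  # intercept
    return a, b

a, b = fit_line([1, 3, 4, 5, 8], [4, 2, 1, 0, 0])
print(round(a, 2), round(b, 3))   # 3.81 and -0.575, i.e. y = -0.575x + 3.81
```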
Let's continue with the first example: the relationship between anxiety and academic achievement, controlled for (partialled out for) intelligence. In this case we can write two linear regression equations and solve them by using ordinary least squares (OLS). They are as follows:

Academic Achievement = a1 + b1 × Intelligence + e1

Where, 'a1' is the y-intercept of the regression line;
'b1' is the slope of the line;
'e1' is the error in the prediction of academic achievement using intelligence.

Anxiety = a2 + b2 × Intelligence + e2

Where, 'a2' is the y-intercept of the regression line;
'b2' is the slope of the line;
'e2' is the error in the prediction of anxiety using intelligence.
Now we have e1 and e2. They are the residuals of each of the variables after intelligence explains variation in them. That is, e1 is the variance remaining in academic achievement once the variance accounted for by intelligence is removed. Similarly, e2 is the variance left in anxiety once the variance accounted for by intelligence is removed.

Now, the partial correlation can be defined as the Pearson's correlation between e1 and e2:

r_{AB.C} = r_{e_1 e_2}    (eq. 3.4)

You will realise that this correlation is the correlation of academic achievement and anxiety from which the linear influence of intelligence has been removed. This is called the partial correlation.
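This residual definition can be checked directly on the Table 3.1 data. In the illustrative Python sketch below (all helper names are our own), both variables are regressed on intelligence, and the residuals e1 and e2 are then correlated; the result agrees, up to rounding, with the partial correlation obtained earlier from the pairwise correlations.

```python
import statistics as st

def residuals(y, x):
    """Residuals of y after a least-squares regression of y on x."""
    n = len(x)
    b = (n * sum(a * c for a, c in zip(x, y)) - sum(x) * sum(y)) / \
        (n * sum(a * a for a in x) - sum(x) ** 2)
    a0 = st.mean(y) - b * st.mean(x)
    return [yi - (a0 + b * xi) for xi, yi in zip(x, y)]

def pearson(x, y):
    mx, my = st.mean(x), st.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

achieve = [15, 18, 13, 14, 19, 11, 17, 20, 10, 16]
anxiety = [6, 3, 8, 6, 2, 3, 4, 4, 5, 7]
intell  = [25, 29, 27, 24, 30, 21, 26, 31, 20, 25]

e1 = residuals(achieve, intell)   # achievement with intelligence removed
e2 = residuals(anxiety, intell)   # anxiety with intelligence removed
print(round(pearson(e1, e2), 3))  # the partial correlation, about -0.375
```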
3.4 PART CORRELATION (SEMIPARTIAL CORRELATION) rSP

Part correlation is also known as semipartial correlation (rSP). A semipartial (part) correlation is a correlation between two variables, one of which has been partialled for a third variable.

In partial correlation (rP = rAB.C) the effect of the third variable (C) is partialled out from BOTH the variables (A and B).

In semipartial correlation (rSP = rA(B.C)), as the name suggests, the effect of the third variable (C) is partialled out from only one variable (B) and NOT from both the variables.
Let's continue with the earlier example, about the correlation between anxiety (A) and academic achievement (B). In the earlier example of partial correlation, we partialled the effect of intelligence (C) out of both academic achievement and anxiety. One may argue that academic achievement is the only variable that relates to intelligence, so we need to partial out the effect of intelligence only from academic achievement and not from anxiety.

Now we correlate anxiety (A) as one variable and academic achievement partialled for intelligence (B.C) as the other variable. The correlation of anxiety (A) with academic achievement partialled for intelligence (B.C) is called the semipartial correlation, rA(B.C). In fact, if there are three variables, then a total of six semipartial correlations can be computed. They are rA(B.C), rA(C.B), rB(A.C), rB(C.A), rC(A.B), and rC(B.A).
Formula:
In order to compute the semipartial correlation coefficient, the following formula can be used:

r_{SP} = r_{A(B.C)} = \frac{r_{AB} - r_{AC} r_{BC}}{\sqrt{1 - r_{BC}^2}}    (eq. 3.5)
Where,
rA(B.C) = the semipartial correlation of A with B, after the linear relationship that C has with B is removed
rAB = Pearson's product-moment correlation between A and B
rAC = Pearson's product-moment correlation between A and C
rBC = Pearson's product-moment correlation between B and C
Example:
Let's take the data from the earlier example of academic achievement, anxiety and intelligence. The data of table 3.1 are as follows:

Subject  Academic Achievement  Anxiety  Intelligence
1 15 6 25
2 18 3 29
3 13 8 27
4 14 6 24
5 19 2 30
6 11 3 21
7 17 4 26
8 20 4 31
9 10 5 20
10 16 7 25
The correlation between anxiety (A) and academic achievement (B) is – 0.369.
The correlation between intelligence (C) and academic achievement (B) is 0.918.
The correlation between anxiety (A) and intelligence (C) is – 0.245.
Given these correlations, we can now calculate the semipartial correlation (rSP). (We are not recomputing the pairwise correlation coefficients, simply because you have already learned to compute them earlier.) Substituting in the formula:

r_{SP} = \frac{-0.369 - (-0.245 \times 0.918)}{\sqrt{1 - 0.918^2}} = \frac{-0.144}{0.397} = -0.363

The semipartial correlation between anxiety and academic achievement, after the linear relationship between academic achievement and intelligence is removed, is −0.363.
The significance of the semipartial correlation can be tested by using the t-distribution. The null hypothesis and the alternative hypothesis are as follows:

H0: ρSP = 0
HA: ρSP ≠ 0

Where ρSP is the semipartial correlation in the population. We test the null hypothesis that the semipartial correlation in the population is zero. This can be done by using the following formula:

t = \frac{r_{SP} \sqrt{n - v}}{\sqrt{1 - r_{SP}^2}}    (eq. 3.6)

Where,
t = Student's t-value
rSP = the semipartial correlation computed on the sample
n = sample size
v = the number of variables used in the analysis

The significance of this t-value is tested at df = n − v. When three variables are involved, the df is n − 3.
For our example, the t-value can be computed as follows:

t = \frac{-0.363 \sqrt{10 - 3}}{\sqrt{1 - (-0.363)^2}} = -1.032

The obtained t-value is tested at df = n − v = 10 − 3 = 7. The critical t-value at the .05 level is 2.365. The obtained value is smaller than that in absolute terms, so we accept the null hypothesis that the population semipartial correlation is zero.
This has an interesting implication for our data: in the population, the correlation between anxiety and academic achievement is zero once the linear relationship between academic achievement and intelligence is removed.
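As before, a small Python sketch (the helper name semipartial is our own) reproduces eq. 3.5 and the t-test on it:

```python
import math

def semipartial(r_ab, r_ac, r_bc):
    """Semipartial correlation r_A(B.C) (eq. 3.5): C removed from B only."""
    return (r_ab - r_ac * r_bc) / math.sqrt(1 - r_bc ** 2)

# A = anxiety, B = academic achievement, C = intelligence
r_sp = semipartial(-0.369, -0.245, 0.918)               # about -0.363
t = (r_sp * math.sqrt(10 - 3)) / math.sqrt(1 - r_sp ** 2)
print(round(r_sp, 3), round(t, 3))                      # -0.363 and -1.032
```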
The multiple correlation between one variable (A) and the linear combination of two others (B and C) can be computed from the pairwise correlations using the following formula:

R_{A.BC} = \sqrt{\frac{r_{AB}^2 + r_{AC}^2 - 2 r_{AB} r_{AC} r_{BC}}{1 - r_{BC}^2}}    (eq. 3.7)
Where,
RA.BC = the multiple correlation between A and the linear combination of B and C
rAB = the correlation between A and B
rAC = the correlation between A and C
rBC = the correlation between B and C
Example

We shall continue with the earlier data of table 3.1. Taking A as academic achievement, B as anxiety and C as intelligence, we have rAB = −0.369, rAC = 0.918 and rBC = −0.245. Substituting in eq. 3.7:

R_{A.BC} = \sqrt{\frac{(-0.369)^2 + 0.918^2 - 2(-0.369)(0.918)(-0.245)}{1 - (-0.245)^2}} = \sqrt{\frac{0.813}{0.94}} = \sqrt{0.865} = 0.929
This means that the multiple correlation between academic achievement and the linear combination of intelligence and anxiety is 0.929, or 0.93. We have learned earlier that the square of a correlation coefficient can be understood as the percentage of variance explained. R² is then the percentage of variance in academic achievement explained by the linear combination of intelligence and anxiety. In this example R² is 0.929², which is 0.865. The linear combination of intelligence and anxiety explains 86.5 percent of the variance in academic achievement.
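A minimal Python sketch of eq. 3.7 follows; the function name multiple_r is our own illustration.

```python
import math

def multiple_r(r_ab, r_ac, r_bc):
    """Multiple correlation R_A.BC from the pairwise correlations (eq. 3.7)."""
    num = r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc
    return math.sqrt(num / (1 - r_bc ** 2))

# A = academic achievement, B = anxiety, C = intelligence
R = multiple_r(-0.369, 0.918, -0.245)
print(round(R, 3), round(R ** 2, 3))   # about 0.93 and 0.865
```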
We have already converted R into the R² value. The R² is the value obtained on a sample. The population value of R² is denoted as ρ², and R² is an estimator of ρ². But there is a problem in estimating ρ² from R²: R² is not an unbiased estimator of ρ². So we need to adjust the value of R² in order to make it an unbiased estimator. The following formula is used for this purpose.
\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}    (eq. 3.8)

Where,
R̄² = the adjusted value of R²
k = the number of predictor variables (the variables from which the linear combination is created)
n = sample size
For our example the R̄² value can be computed as:

\bar{R}^2 = 1 - \frac{(1 - 0.865)(10 - 1)}{10 - 2 - 1} = 1 - \frac{1.217}{7} = 1 - 0.174 = 0.826

So the unbiased estimator of R², the adjusted value R̄², is 0.826, which is smaller than the value of R². It is usual to get a smaller adjusted value.
The significance testing of R:

The F-distribution is used for testing the significance of R². The null hypothesis and the alternative hypothesis employed for this purpose are:

H0: ρ² = 0
HA: ρ² ≠ 0

The null hypothesis states that the population R² is zero, whereas the alternative hypothesis states that it is not zero. The F-statistic is computed as follows:

F = \frac{(n - k - 1) R^2}{k (1 - R^2)}    (eq. 3.9)
When the sample size is small, it is recommended that the R̄² value be used; as the sample size increases, the difference between the resulting F-values reduces considerably. Since our sample is obviously small, we will use the unbiased estimator:

F = \frac{(10 - 2 - 1) \times 0.826}{2 \times (1 - 0.826)} = \frac{5.78}{0.35} = 16.61

This F-value exceeds the critical value of F(2, 7) at the .01 level (9.55), so we reject the null hypothesis that the population R² is zero. It is the judgment of the researcher to use either of them. In the same example, if the R² value is substituted for the adjusted R² (R̄²), then F is 22.387, which is also significant at the .01 level.
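Both eq. 3.8 and eq. 3.9 are one-liners in code. The sketch below uses our own helper names; the small discrepancies from the hand-worked numbers come from rounding R² to 0.865.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared (eq. 3.8)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_for_r2(r2, n, k):
    """F statistic for testing R-squared (eq. 3.9), df = (k, n - k - 1)."""
    return (n - k - 1) * r2 / (k * (1 - r2))

r2_adj = adjusted_r2(0.865, 10, 2)
print(round(r2_adj, 3))                   # about 0.826
print(round(f_for_r2(r2_adj, 10, 2), 2))  # about 16.66 with the adjusted value
print(round(f_for_r2(0.865, 10, 2), 2))   # about 22.43 with the unadjusted R2
```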
ATW OV Edu
2 7 14
4 10 13
8 14 11
7 13 9
8 9 5
9 10 14
1 6 5
0 9 6
6 12 11
5 10 12