Unit 3
INTRODUCTION
So far, we have studied problems relating to one variable only. In practice, we come across
many problems involving two or more variables.
If two quantities vary in such a way that movements in one are accompanied by
movements in the other, these quantities are said to be correlated.
For example, there exists some relationship between:
1. the age of husband and the age of wife,
2. the price of a commodity and the amount demanded,
3. an increase in rainfall, up to a point, and the production of rice,
4. an increase in the number of television licences and the number of cinema-goers, etc.
Correlation analysis refers to the techniques used in measuring the closeness of the
relationship between the variables.
Some important definitions of correlation are given below:
1. "Correlation analysis deals with the association between two or more variables."
Simpson & Kafka
2. "When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is known as
correlation." Croxton & Cowden
3. "Correlation analysis attempts to determine the 'degree of relationship' between
variables." Ya Lun Chou
4. "Correlation is an analysis of the covariation between two or more variables." A. M.
Tuttle
Thus, correlation is a statistical device which helps us in analyzing the covariation of two or
more variables.
The problem of analyzing the relation between different series should be broken into
three steps:
1. Determining whether a relation exists and, if it does, measuring it.
2. Testing whether it is significant.
3. Establishing the cause-and-effect relation, if any.
TYPES OF CORRELATION
Correlation is described or classified in several different ways. Three of the most important
ways of classifying correlation are:
1. Positive or negative.
2. Simple, partial, and multiple.
3. Linear and non-linear.
EXAMPLES
1. POSITIVE CORRELATION
X: 10 12 15 18 20
Y: 15 20 22 25 37
2. POSITIVE CORRELATION
X: 80 70 60 40 30
Y: 50 44 30 20 10
1. NEGATIVE CORRELATION
X: 20 30 40 60 80
Y: 40 30 22 15 10
2. NEGATIVE CORRELATION
X: 100 90 60 40 30
Y: 10 20 30 40 50
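The direction of covariation in the series above can be checked numerically. A minimal sketch (the helper `covariation` is illustrative, not part of the text): a positive sum of products of mean deviations indicates positive correlation, a negative sum indicates negative correlation.

```python
# Sketch: the sign of the summed products of mean deviations
# gives the direction of correlation between two series.

def covariation(X, Y):
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# First positive-correlation series: both variables rise together.
print(covariation([10, 12, 15, 18, 20], [15, 20, 22, 25, 37]) > 0)  # True

# First negative-correlation series: one rises while the other falls.
print(covariation([20, 30, 40, 60, 80], [40, 30, 22, 15, 10]) < 0)  # True
```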
SIMPLE, PARTIAL, AND MULTIPLE CORRELATION
The distinction between simple, partial, and multiple correlation is based upon the number
of variables studied. When only two variables are studied, it is a problem of simple
correlation; when three or more variables are studied, it is a problem of either multiple or
partial correlation.
For example, when we study the relationship between the yield of rice per acre and both the
amount of rainfall and the amount of fertilizers used, it is a problem of multiple correlation.
On the other hand, in partial correlation we recognize more than two variables but consider
only two variables to be influencing each other, the effect of the other influencing variables
being kept constant.
EXAMPLE
In the rice problem taken above, if we limit our correlation analysis of yield and rainfall to
periods when a certain average daily temperature existed, it becomes a problem of partial
correlation.
LINEAR AND NON-LINEAR (CURVILINEAR) CORRELATION
The distinction between linear and non-linear correlation is based upon the constancy of the
ratio of change between the variables. If the amount of change in one variable tends to bear
constant ratio to the amount of change in the other variable, then the correlation is said to be
linear.
EXAMPLE
Observe the following two variables X and Y:
X: 10 20 30 40 50
Y: 70 140 210 280 350
The ratio of change between the two variables is the same. If such variables are plotted on
graph paper, the plotted points will fall on a straight line.
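The constant ratio in the series above can be verified directly; here Y is always seven times X, so the ratio of Y to X never varies:

```python
# Sketch: a constant ratio between the two series indicates linear correlation.
X = [10, 20, 30, 40, 50]
Y = [70, 140, 210, 280, 350]

ratios = [y / x for x, y in zip(X, Y)]
print(ratios)  # [7.0, 7.0, 7.0, 7.0, 7.0] -- constant, hence linear
```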
Correlation would be called non-linear or curvilinear if the amount of change in one variable
does not bear a constant ratio to the amount of change in the other variable.
EXAMPLE
If we double the amount of rainfall, the production of rice or wheat, etc., would not necessarily
be doubled. It may be pointed out that in most practical situations we find a non-linear
relationship between the variables. However, since the techniques of analysis for measuring
non-linear correlation are far more complicated than those for linear correlation, we generally
assume that the relationship between the variables is of the linear type.
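The rainfall illustration can be sketched with hypothetical figures (the numbers below are invented for illustration, not data from the text): yield rises with rainfall, but the ratio of change is not constant, so the relation is curvilinear.

```python
# Sketch: hypothetical rainfall/yield figures with diminishing returns.
rainfall = [10, 20, 30, 40]
yield_per_acre = [60, 100, 120, 120]  # doubling rainfall does not double yield

# Ratio of change between consecutive pairs of observations:
changes = [
    (yield_per_acre[i + 1] - yield_per_acre[i]) / (rainfall[i + 1] - rainfall[i])
    for i in range(len(rainfall) - 1)
]
print(changes)  # the ratios differ, so the correlation is non-linear
```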
METHODS OF STUDYING CORRELATION
The various methods of ascertaining whether two variables are correlated or not are:
1. Scatter Diagram Method
2. Graphic Method
3. Karl Pearson's Coefficient of Correlation
4. Concurrent Deviation Method
5. Method of Least Squares.
Of these, the first two are based on the knowledge of diagrams and graphs, whereas the
others are mathematical methods. Each of these methods shall be discussed in detail in the
following pages.
SCATTER DIAGRAM METHOD
When this method is used, the given data are plotted on graph paper in the form of dots,
i.e., for each pair of X and Y values we put a dot, and thus obtain as many points as the
number of observations. By looking at the scatter of the various points we can form an idea as to
whether the variables are related or not.
If all the points lie on a straight line rising from the lower left-hand corner to the
upper right-hand corner, correlation is said to be perfectly positive (i.e., r = +1) (diagram I).
On the other hand, if all the points lie on a straight line falling from the upper left-hand corner
to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = −1)
(diagram II).
If the plotted points fall in a narrow band, there would be a high degree of correlation between the
variables. The correlation shall be positive if the points show a rising tendency from the lower
left-hand corner to the upper right-hand corner (diagram III), and negative if the points show a
declining tendency from the upper left-hand corner to the lower right-hand corner of the
diagram (diagram IV).
On the other hand, if the points are widely scattered over the diagram, it indicates very little
relationship between the variables. The correlation shall be positive if the points rise from the
lower left-hand corner to the upper right-hand corner (diagram V), and negative if the points
run from the upper left-hand side to the lower right-hand side of the diagram (diagram VI).
If the plotted points lie on a straight line parallel to the X-axis, or in a haphazard manner, it
shows the absence of any relationship between the variables (r = 0), as in diagram VII.
This method is to be applied only where deviations of items are taken from the actual mean and
not from an assumed mean.
The value of the coefficient of correlation as obtained by the above formula shall always lie
between −1 and +1. When r = +1, it means there is perfect positive correlation between the
variables. When r = −1, it means there is perfect negative correlation between the variables.
When r = 0, it means there is no relationship between the two variables.
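The formula referred to above is not reproduced in this extract; in its standard deviation-from-mean form it is r = Σxy / √(Σx² · Σy²), where x and y are deviations of X and Y from their actual means. A minimal sketch:

```python
# Sketch of Karl Pearson's coefficient using deviations from the actual means:
#   r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)),
# where x = X - mean(X) and y = Y - mean(Y).

def karl_pearson_r(X, Y):
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [xi - mean_x for xi in X]  # deviations from the actual mean of X
    y = [yi - mean_y for yi in Y]  # deviations from the actual mean of Y
    return sum(a * b for a, b in zip(x, y)) / (
        sum(a * a for a in x) * sum(b * b for b in y)
    ) ** 0.5

# For the perfectly linear series Y = 7X used earlier, r comes out as +1.
print(karl_pearson_r([10, 20, 30, 40, 50], [70, 140, 210, 280, 350]))
```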
Example 3.
Calculate Karl Pearson's coefficient of correlation from the following data and interpret its value:
Example:
Given data on academic achievement, anxiety, and intelligence for 10 subjects, compute the
partial correlation between academic achievement and anxiety, with intelligence partialled
out.
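The data for the ten subjects are not reproduced in this extract. As a sketch, the first-order partial correlation between achievement (1) and anxiety (2) with intelligence (3) held constant can be obtained from the three simple correlations; the r values used below are illustrative, not taken from the example:

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3: the correlation between
    variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Illustrative simple correlations (hypothetical values):
r12, r13, r23 = 0.40, 0.60, 0.50
print(round(partial_r(r12, r13, r23), 4))
```

Note that when the third variable is uncorrelated with both others (r13 = r23 = 0), the partial correlation reduces to the simple correlation r12.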
Assumptions of Pearson's Coefficient:
Karl Pearson's coefficient of correlation is based on the following assumptions:
• There is a linear relationship between the variables, i.e., when the two variables are plotted on a
scatter diagram, a straight line will be formed by the points.
• The two variables under study are affected by many independent causes so as to form a normal
distribution. Variables like height, weight, price, demand, supply, etc., are affected by such forces
that a normal distribution is formed.
• There is a cause-and-effect relationship between the forces affecting the distribution of the items
in the two series. If such a relationship is not formed between the variables, i.e., if the variables
are independent, there cannot be any correlation. For example, there is no relationship between
income and height because the forces that affect these variables are not common.
PROBABLE ERROR
The probable error of the coefficient of correlation helps in interpreting its value.
With the help of probable error, it is possible to determine the reliability of the value of the
coefficient in so far as it depends on the conditions of random sampling.
The probable error of the coefficient of correlation is obtained as follows:
P.E.r = 0.6745 × (1 − r²) / √N
where r is the coefficient of correlation and N the number of pairs of observations.
• If the value of r is less than the probable error there is no evidence of correlation, i.e., the value of
r is not at all significant.
• If the value of r is more than six times the probable error, the coefficient of correlation is
practically certain, i.e., the value of r is significant.
• By adding the value of the probable error to, and subtracting it from, the coefficient of
correlation, we get respectively the upper and lower limits within which the coefficient of
correlation in the population can be expected to lie.
Symbolically,
ρ = r ± P.E.r
where ρ (rho) denotes the correlation in the population.
Let us compute the probable error, assuming a coefficient of correlation of 0.80 and a sample of 16
pairs of items:
P.E.r = 0.6745 × (1 − r²) / √N = 0.6745 × (1 − (0.8)²) / √16 = 0.6745 × 0.36 / 4 ≈ 0.06
The limits of the correlation in the population would be ρ = r ± P.E.r = 0.8 ± 0.06.
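The computation above can be sketched in code, together with the six-times-P.E. test for significance described earlier:

```python
# Sketch: probable error of r and the resulting limits for the
# population correlation, for r = 0.80 and N = 16 pairs of items.

def probable_error(r, n):
    # P.E.r = 0.6745 * (1 - r^2) / sqrt(N)
    return 0.6745 * (1 - r ** 2) / n ** 0.5

r, n = 0.80, 16
pe = probable_error(r, n)
print(round(pe, 4))            # about 0.06, as in the worked example
print(round(r - pe, 4), round(r + pe, 4))  # limits for the population correlation
print(r > 6 * pe)              # r exceeds six times P.E., so r is significant
```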
Illustration 15.
If r= 0.6 and N= 64, find out the probable error of the coefficient of correlation and determine the
limits for population r.
Properties of the correlation coefficient
1. The coefficient of correlation always lies between −1 and +1.
2. The coefficient of correlation is independent of the change of origin and scale of the variables.
3. The coefficient of correlation is symmetric, i.e., rXY = rYX.
4. The coefficient of correlation is the geometric mean of the two regression coefficients.