Unit 3


CORRELATION

INTRODUCTION
So far, we have studied problems relating to one variable only. In practice, we come across
many problems involving the use of two or more variables.

If two quantities vary in such a way that movements in one are accompanied by
movements in the other, these quantities are correlated.
For example,
1. there exists some relationship between age of husband and age of wife,
2. price of commodity and amount demanded,
3. increase in rainfall up to a point and production of rice,
4. an increase in the number of television licenses and number of cinemagoers, etc.

The degree of relationship between the variables under consideration is measured
through correlation analysis.
The measure of correlation, called the correlation coefficient or correlation index,
summarizes in one figure the direction and degree of correlation.

The correlation analysis refers to the techniques used in measuring the closeness of the
relationship between the variables.
Some important definitions of correlation are given below:
1. "Correlation analysis deals with the association between two or more variables."
Simpson & Kafka
2. "When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is known as
correlation." Croxton & Cowden
3. "Correlation analysis attempts to determine the 'degree of relationship' between
variables." Ya Lun Chou
4. "Correlation is an analysis of the covariation between two or more variables." A.M.
Tuttle

Thus, correlation is a statistical device which helps us in analyzing the covariation of two or
more variables.
The problem of analyzing the relation between different series should be broken into
three steps:
1. Determining whether a relation exists and, if it does, measuring it.
2. Testing whether it is significant.
3. Establishing the cause-and-effect relation, if any.

TYPES OF CORRELATION
Correlation is described or classified in several different ways. Three of the most important
ways of classifying correlation are:
1. Positive or negative.
2. Simple, partial, and multiple.
3. Linear and non-linear.

Positive and Negative Correlation


Whether correlation is positive (direct) or negative (inverse) depends upon the
direction of change of the variables. If both variables vary in the same direction,
i.e., if as one variable increases the other, on average, also increases, or if as one
variable decreases the other, on average, also decreases, correlation is said to be
positive. If, on the other hand, the variables vary in opposite directions, i.e., if as one
variable increases the other, on average, decreases, or if as one variable decreases the
other, on average, increases, correlation is said to be negative.

EXAMPLES
1. POSITIVE CORRELATION

X: 10 12 15 18 20
Y: 15 20 22 25 37

2. POSITIVE CORRELATION
X: 80 70 60 40 30
Y: 50 44 30 20 10

1. NEGATIVE CORRELATION

X: 20 30 40 60 80
Y: 40 30 22 15 10

2. NEGATIVE CORRELATION
X: 100 90 60 40 30
Y: 10 20 30 40 50
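As a numerical check on the four series above (not part of the original text), Pearson's coefficient can be computed directly: a positive value confirms direct correlation, a negative value inverse correlation. A minimal sketch using only the standard library:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]          # deviations from the mean of X
    dy = [y - my for y in ys]          # deviations from the mean of Y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / sqrt(sxx * syy)

# Series from the examples above
pos1 = pearson_r([10, 12, 15, 18, 20], [15, 20, 22, 25, 37])
pos2 = pearson_r([80, 70, 60, 40, 30], [50, 44, 30, 20, 10])
neg1 = pearson_r([20, 30, 40, 60, 80], [40, 30, 22, 15, 10])
neg2 = pearson_r([100, 90, 60, 40, 30], [10, 20, 30, 40, 50])
print(pos1 > 0, pos2 > 0, neg1 < 0, neg2 < 0)  # True True True True
```

Note that in the second positive example both series decrease together, yet r is still positive: the sign depends on whether the variables move together, not on whether they rise or fall.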
SIMPLE, PARTIAL, AND MULTIPLE CORRELATION

The distinction between simple, partial, and multiple correlation is based upon the number
of variables studied.

When only two variables are studied it is a problem of simple correlation.


When three or more variables are studied, it is a problem of either multiple or partial correlation.

In multiple correlation three or more variables are studied simultaneously.

For example, when we study the relationship between the yield of rice per acre and both the
amount of rainfall and the amount of fertilizers used, it is a problem of multiple correlation.

On the other hand, in partial correlation we recognize more than two variables, but consider
only two variables to be influencing each other, the effect of the other influencing variables
being kept constant.

EXAMPLE
In the rice problem taken above, if we limit our correlation analysis of yield and rainfall to
periods when a certain average daily temperature existed, it becomes a problem of
partial correlation.
LINEAR AND NON-LINEAR (CURVILINEAR) CORRELATION

The distinction between linear and non-linear correlation is based upon the constancy of the
ratio of change between the variables. If the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other variable, the correlation is said to be
linear.

EXAMPLE
Observe the following two variables X and Y:

X: 10 20 30 40 50
Y: 70 140 210 280 350
The ratio of change between the two variables is the same (Y = 7X). If such variables are plotted on
graph paper, the plotted points will fall on a straight line.

Correlation would be called non-linear or curvilinear if the amount of change in one variable
does not bear a constant ratio to the amount of change in the other variable.

EXAMPLE
If we double the amount of rainfall, the production of rice or wheat, etc., would not necessarily
be doubled. It may be pointed out that in most practical situations we find a non-linear
relationship between the variables. However, since the techniques of analysis for measuring non-
linear correlation are far more complicated than those for linear correlation, we generally
assume that the relationship between the variables is of the linear type.
METHODS OF STUDYING CORRELATION
The various methods of ascertaining whether two variables are correlated or not are:
1. Scatter Diagram Method
2. Graphic Method
3. Karl Pearson's Coefficient of Correlation
4. Concurrent Deviation Method
5. Method of Least Squares.

Of these, the first two are based on the knowledge of diagrams and graphs, whereas the
others are mathematical methods. Each of these methods is discussed in detail in the
following pages.

SCATTER DIAGRAM METHOD


The simplest device for ascertaining whether two variables are related is to prepare a dot chart
called a scatter diagram.

When this method is used, the given data are plotted on graph paper in the form of dots,
i.e., for each pair of X and Y values we put a dot, and thus obtain as many points as the
number of observations. By looking at the scatter of the various points we can form an idea as to
whether the variables are related or not.

If all the points lie on a straight line rising from the lower left-hand corner to the
upper right-hand corner, correlation is said to be perfectly positive (i.e., r = +1) (diagram I).
On the other hand, if all the points lie on a straight line falling from the upper left-hand corner
to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = −1)
(diagram II).

If the plotted points fall in a narrow band, there would be a high degree of correlation between the
variables:
correlation will be positive if the points show a rising tendency from the lower left-hand corner to
the upper right-hand corner (diagram III),
and negative if the points show a declining tendency from the upper left-hand corner to the lower
right-hand corner of the diagram (diagram IV).

On the other hand, if the points are widely scattered over the diagram, it indicates very little
relationship between the variables:
correlation will be positive if the points are rising from the lower left-hand corner to the upper
right-hand corner (diagram V),
and negative if the points are running from the upper left-hand side to the lower right-hand side of
the diagram (diagram VI).

If the plotted points lie on a straight line parallel to the X-axis, or are scattered in a haphazard
manner, it shows the absence of any relationship between the variables (r = 0) (diagram VII).

Illustration 1. Given the following pairs of values:


X: 2 3 5 6 8 9
Y: 6 5 7 8 12 11
(a) Make a scatter diagram.
(b) Is there any correlation between the variables X and Y?
(c) By graphic inspection, draw an estimating line.
Solution.
By looking at the scatter diagram we can say that the variables X and Y are correlated.
Further, the correlation is positive because the trend of the points is upward, rising from the lower
left-hand corner to the upper right-hand corner of the diagram.
The diagram also shows that the degree of relationship is high, as the points lie close to a straight line.

Merits and Limitations of the Method:


Merits:
Following are the merits of the scatter diagram method:
• It is a simple and non-mathematical method of studying correlation between the variables.
As such it can be easily understood and a rough idea can very quickly be formed as to whether or not
the variables are related.
• It is not influenced by the size of extreme items whereas most of the mathematical methods of
finding correlation are influenced by extreme items.
• Making a scatter diagram usually is the first step in investigating the relationship between two
variables.
Limitations:
By applying this method, we can get an idea about the direction of correlation and also whether it is
high or low. But we cannot establish the exact degree of correlation between the variables as is
possible by applying the mathematical methods.
GRAPHIC METHOD
When this method is used, the individual values of the two variables are plotted on graph paper.
We thus obtain two curves, one for the X variable and another for the Y variable.
By examining the direction and closeness of the two curves so drawn, we can conclude whether or
not the variables are related.
If both the curves drawn on the graph are moving in the same direction (either upward or
downward), correlation is said to be positive.
On the other hand, if the curves are moving in opposite directions, correlation is said to be
negative.
The following example shall illustrate the method.
Illustration 2. From the following data ascertain whether the income and expenditure of the 100
workers of a factory are correlated:
KARL PEARSON'S COEFFICIENT OF CORRELATION
Of the several mathematical methods of measuring correlation, the Karl Pearson method, popularly
known as Pearson's coefficient of correlation, is most widely used in practice.
The Pearson coefficient of correlation is denoted by the symbol r.

The formula, with deviations taken from the actual means, is:
r = Σxy / √(Σx² × Σy²), where x = X − X̄ and y = Y − Ȳ.
This method is to be applied only where deviations of items are taken from the actual mean and not
from an assumed mean.
The value of the coefficient of correlation as obtained by the above formula always lies between
−1 and +1.
When r = +1, it means there is perfect positive correlation between the variables. When r = −1,
it means there is perfect negative correlation between the variables. When r = 0, it means there is
no relationship between the two variables.
Example 3.
Calculate Karl Pearson's coefficient of correlation from the following data and interpret its value:
Example:
Data of academic achievement, anxiety, and intelligence are given for
10 subjects. Compute the partial correlation between
academic achievement and anxiety, with
intelligence partialled out.
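The subjects' data table is not reproduced here, but once the three pairwise coefficients are known, the first-order partial correlation follows from the standard formula r12.3 = (r12 − r13·r23) / √((1 − r13²)(1 − r23²)). The numeric coefficients below are hypothetical placeholders, not the original data:

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3: correlation of
    variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))

# Hypothetical pairwise coefficients (illustrative only):
# 1 = achievement, 2 = anxiety, 3 = intelligence
r12, r13, r23 = -0.40, 0.60, -0.30
print(round(partial_r(r12, r13, r23), 4))  # ≈ -0.2883
```

Note that when variable 3 is uncorrelated with both others (r13 = r23 = 0), the partial correlation reduces to the simple correlation r12, as expected.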
Assumptions of Pearson's Coefficient:
Karl Pearson's coefficient of correlation is based on the following assumptions:
• There is a linear relationship between the variables, i.e., when the two variables are plotted on a
scatter diagram, a straight line will be formed by the points.
• The two variables under study are affected by many independent causes so as to form a normal
distribution. Variables like height, weight, price, demand, supply, etc., are affected by such forces
that a normal distribution is formed.
• There is a cause-and-effect relationship between the forces affecting the distribution of the items
in the two series. If such a relationship does not exist between the variables, i.e., if the variables
are independent, there cannot be any correlation. For example, there is no relationship between
income and height because the forces that affect these variables are not common.

Merits of the Pearsonian Coefficient


1. Among the mathematical methods used for measuring the degree of relationship.
Karl Pearson's method is most popular.
2. The correlation coefficient summarizes not only the degree of correlation but also the
direction. i.e., whether correlation is positive or negative.
3. This method can be used to make predictions about the variables under study.
4. This can be used when actual data or ranks are given.
5. This method has many algebraic properties for which the calculation of coefficient of
correlation, and other related factors, are made easy.

Limitations of the Pearsonian Coefficient


1. The correlation coefficient always assumes a linear relationship, regardless of whether
that assumption is correct or not.
2. The value of the coefficient is very much affected by the extreme items.
3. As compared with other methods this method takes more time to compute the value of
correlation coefficient.
4. It is comparatively difficult to calculate as its calculation involves complicated algebraic
methods of calculations.
5. It is based on a large number of assumptions viz. linear relationship, cause and effect
relationship etc. which may not always hold well.
6. In comparison to the other methods, it takes much time to arrive at the results.
7. It is always advisable to compute its probable error while interpreting its results.

COEFFICIENT OF CORRELATION AND PROBABLE ERROR

The probable error of the coefficient of correlation helps in interpreting its value.
With the help of probable error, it is possible to determine the reliability of the value of the
coefficient in so far as it depends on the conditions of random sampling.
The probable error of the coefficient of correlation is obtained as follows:
P.E.(r) = 0.6745 × (1 − r²) / √N
where r is the coefficient of correlation and N the number of pairs of observations.
• If the value of r is less than the probable error there is no evidence of correlation, i.e., the value of
r is not at all significant.
• If the value of r is more than six times the probable error, the coefficient of correlation is
practically certain, i.e., the value of r is significant.
• By adding and subtracting the value of probable error from the coefficient of correlation we get
respectively the upper and lower limits within which coefficient of correlation in the population can
be expected to lie.
Symbolically,
ρ = r ± P.E.(r)
where ρ (rho) denotes correlation in the population.
Let us compute the probable error, assuming a coefficient of correlation of 0.80 and a sample of 16
pairs of items. We will have
P.E.(r) = 0.6745 × (1 − (0.8)²) / √16 = 0.6745 × 0.36 / 4 ≈ 0.06
The limits of the correlation in the population would be ρ = r ± P.E.(r) = 0.80 ± 0.06.
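The worked example above can be reproduced in a few lines (a direct sketch of the formula, not a library routine):

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of the coefficient of correlation:
    P.E.(r) = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r**2) / sqrt(n)

r, n = 0.80, 16
pe = probable_error(r, n)
lower, upper = r - pe, r + pe          # limits for the population rho
print(round(pe, 2), round(lower, 2), round(upper, 2))  # 0.06 0.74 0.86
```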

Conditions for the Use of Probable Error


The measure of probable error can be properly used only when the following three conditions exist:
1. The data must approximate a normal frequency curve (bell-shaped curve).
2. The statistical measure for which the P.E. is computed must have been calculated from a
sample.
3. The sample must have been selected in an unbiased manner and the individual items must
be independent.

Illustration 15.
If r= 0.6 and N= 64, find out the probable error of the coefficient of correlation and determine the
limits for population r.
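No solution is given for Illustration 15 in the text; as a worked sketch using the probable error formula given earlier:

```python
from math import sqrt

r, n = 0.6, 64
pe = 0.6745 * (1 - r**2) / sqrt(n)   # 0.6745 * 0.64 / 8
lower, upper = r - pe, r + pe

print(round(pe, 4))                      # ≈ 0.054
print(round(lower, 3), round(upper, 3))  # limits for population r: 0.546 0.654
print(r > 6 * pe)  # True: r exceeds six times its probable error, so it is significant
```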
Properties of the correlation coefficient
1. The value of r always lies between −1 and +1.
2. r is a pure number, independent of the units in which the variables are measured.
3. r is independent of the change of origin and scale of the variables.
4. r is symmetric, i.e., the correlation of X with Y is the same as that of Y with X (rxy = ryx).
