Correlation and Regression Analysis


UNIT - 1

CORRELATION AND REGRESSION ANALYSIS


Introduction:
So far, we have studied the analysis of univariate distributions, which
involves only one variable. When data on two or more variables are
available, we may need to study how these variables vary relative to one another. If the variables show
related variation, they are said to be correlated.

Meaning and definition of Correlation:


Correlation is a statistical device which helps in analyzing the co-variation of two or more
variables.
Or
Correlation is a technique used for measuring the closeness of the relationship
between variables.
Or
In simple words, correlation means the relationship between the values of two or more
variables.
Examples: an increase in rainfall leads to an increase in agricultural production;
smoking and lung cancer; etc.

Uses of Correlation
1. Correlation helps in measuring the relationship between the variables.
2. Correlation analysis helps in finding the economic behaviour of a country.
3. Correlation helps in reducing the uncertainty of business.
4. Correlation helps to measure the effect of one variable on another variable.

Types of correlation
1. Positive correlation
2. Negative correlation
3. Non-correlation
4. Perfect correlation

1. Positive correlation:
If both variables vary or move in the same direction, the correlation is said to be positive or
direct (that is, both variables increase together or decrease together).
Example:
A. Income and Expenditure of different families.
B. Increase in rainfall leads to increase in production of food grains.

2. Negative correlation:
If the variables vary in opposite directions, the correlation is said to be negative,
inverse or indirect (that is, when one variable increases, the other decreases).
Example:
A. When the price of a commodity increases, the demand for the commodity decreases.
B. Sale of woolen items and atmospheric temperature (sales fall as the temperature rises).
3. Non-correlation:
If two variables do not show any associated variation, they are said to be non-correlated.
Example:
A. Height of a person and the height of a tree.
B. Sales of umbrellas and sales of books.

4. Perfect correlation:
If both variables vary in the same proportion or ratio, the correlation is said to be perfect.
Perfect correlation may be either positive or negative.
A. Positive perfect correlation (linear correlation)
A change in one variable leads to a change in the other variable in the same proportion
or ratio and in the same direction.
Example:
Land (in acres):             1   2   3   4   5
Rice production (in bags):  10  20  30  40  50
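B. Negative perfect correlation
A change in one variable leads to a change in the other variable in the same proportion
or ratio, but in the opposite direction.
When a change in one variable does not lead to a change in the other in the same
proportion or ratio, the correlation is said to be non-linear.
Example:
If the amount of rainfall is doubled, the production of food grains will not necessarily
double.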
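
The idea of changes occurring "in the same proportion or ratio" can be checked numerically. The sketch below uses the land and rice figures from the example above and compares the ratio of successive changes in the two series. The function name common_change_ratio and the price/demand and rainfall/grain figures are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def common_change_ratio(xs, ys):
    """Return (change in y) / (change in x) if it is identical for every
    pair of successive observations, otherwise None.
    Assumes integer data and that successive x values are distinct."""
    pairs = list(zip(xs, ys))
    ratios = set()
    for (x1, y1), (x2, y2) in zip(pairs, pairs[1:]):
        ratios.add(Fraction(y2 - y1, x2 - x1))
    return ratios.pop() if len(ratios) == 1 else None

# Figures from the land / rice example in the text.
land_acres = [1, 2, 3, 4, 5]
rice_bags = [10, 20, 30, 40, 50]

# Hypothetical figures, not from the source.
price = [1, 2, 3, 4]
demand = [40, 30, 20, 10]
rainfall = [10, 20, 40]
grain = [100, 150, 180]

print(common_change_ratio(land_acres, rice_bags))  # 10   (constant, positive)
print(common_change_ratio(price, demand))          # -10  (constant, negative)
print(common_change_ratio(rainfall, grain))        # None (ratio is not constant)
```

A constant positive ratio corresponds to positive perfect correlation, a constant negative ratio to negative perfect correlation, and a varying ratio to a relationship that is not perfect.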

Methods of studying correlation:


The various methods used for ascertaining correlation are as follows:
1. Scatter diagram method.
2. Graphical method.
3. Karl Pearson’s coefficient of correlation method.
4. Spearman's coefficient of rank correlation method.
5. Concurrent deviation method.
6. Method of least squares.

Karl Pearson’s coefficient of correlation:


Karl Pearson, a British statistician, developed the formula for calculating
the coefficient of correlation. It is denoted by "r" and is popularly known as Pearson’s
coefficient of correlation.
The coefficient of correlation expresses the degree of correlation that exists between
two variables as a single figure. It indicates both the magnitude of the correlation and
its direction, and its value always lies between -1 and +1.

Karl Pearson’s coefficient of correlation can be computed by using any one of the
following methods:
A. By taking Actual mean
B. By taking Assumed mean
A. By taking Actual Mean:

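\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} \]

Where:
r = coefficient of correlation
Σ = sum
x = \( X - \bar{X} \) (deviation of each X value from the actual mean of X)
y = \( Y - \bar{Y} \) (deviation of each Y value from the actual mean of Y)
xy = product of x and y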
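
As a rough illustration of the actual-mean method, the sketch below takes deviations from the actual means, sums their products and squares, and applies the formula above. The function name pearson_r_actual_mean and the sample X and Y values are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
import math

def pearson_r_actual_mean(xs, ys):
    """Pearson's r using deviations from the actual means of each series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]   # x = X - mean of X
    dev_y = [y - mean_y for y in ys]   # y = Y - mean of Y
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_x2 = sum(dx * dx for dx in dev_x)
    sum_y2 = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data: means are 3 and 4, so the deviations are whole numbers.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r = pearson_r_actual_mean(X, Y)
print(round(r, 4))  # 0.7746, i.e. 6 / sqrt(10 * 6)
```

Here the sum of xy is 6, the sum of x squared is 10 and the sum of y squared is 6, so r = 6 / √60.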

B. By taking Assumed mean


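\[ r = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{n}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{n}} \times \sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{n}}} \]

Where:
dx = X - A (deviation of each X value from A, the assumed mean of X)
dy = Y - B (deviation of each Y value from B, the assumed mean of Y)
dxdy = product of dx and dy
n = number of pairs of observations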
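
The assumed-mean method can be sketched in the same way. In the example below, the function name pearson_r_assumed_mean, the parameter names a and b for the two assumed means, and the sample data are all assumptions made for illustration. The result should not depend on which assumed means are chosen, which the second call is meant to show.

```python
import math

def pearson_r_assumed_mean(xs, ys, a, b):
    """Pearson's r using deviations from assumed means a (for X) and b (for Y)."""
    n = len(xs)                      # number of pairs of observations
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sum_dx = sum(dx)
    sum_dy = sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    numerator = sum_dxdy - (sum_dx * sum_dy) / n
    denominator = (math.sqrt(sum_dx2 - sum_dx ** 2 / n)
                   * math.sqrt(sum_dy2 - sum_dy ** 2 / n))
    return numerator / denominator

# Hypothetical data, not from the source.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r_first = pearson_r_assumed_mean(X, Y, a=2, b=3)
r_second = pearson_r_assumed_mean(X, Y, a=3, b=4)  # different assumed means
print(round(r_first, 4), round(r_second, 4))
```

With a = 2 and b = 3, the sums are Σdx = 5, Σdy = 5, Σdxdy = 11, Σdx² = 15 and Σdy² = 11, giving a numerator of 11 - 25/5 = 6 and a denominator of √10 × √6.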

Probable error (PE):


Probable error of the coefficient of correlation helps in interpreting the values of “r”.
Therefore, with the help of probable error it is possible to determine the reliability of the
value of coefficient of correlation.
Probable error of coefficient of correlation can be obtained by using the following
formula:
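\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} \]

Where:
0.6745 = a constant used in all problems; it multiplies the standard error of r, \( (1 - r^2)/\sqrt{n} \)
r = coefficient of correlation
n = number of pairs of observations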
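
A minimal sketch of the probable error calculation follows. The function name probable_error and the sample values r = 0.8 and n = 16 are hypothetical.

```python
import math

def probable_error(r, n):
    """Probable error of r: 0.6745 times the standard error (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Hypothetical values: r = 0.8 computed from n = 16 pairs of observations.
pe = probable_error(0.8, 16)
print(round(pe, 4))  # 0.0607, i.e. 0.6745 * 0.36 / 4
```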

Interpretation of Probable Error:


1. If the value of “r” is less than the PE, there is no evidence of correlation, i.e., the value
of “r” is not at all significant (r < PE).
2. If the value of “r” is more than six times the PE, the existence of correlation is
practically certain, i.e., the value of “r” is significant (r > 6 × PE).
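
The two rules above could be applied mechanically as in the sketch below. Two points are assumptions that go beyond the text: the magnitude of r is used so that negative coefficients are handled, and values falling between PE and 6 × PE, which the rules do not cover, are labelled "inconclusive". The function name interpret_r and the sample figures are hypothetical.

```python
def interpret_r(r, pe):
    """Classify r against its probable error using the two rules in the text."""
    magnitude = abs(r)          # assumption: compare the size of r, ignoring sign
    if magnitude < pe:
        return "not significant"
    if magnitude > 6 * pe:
        return "significant"
    return "inconclusive"       # assumption: the text gives no rule for this range

# Hypothetical figures, not from the source.
print(interpret_r(0.8, 0.0607))   # significant      (0.8 > 6 * 0.0607)
print(interpret_r(0.05, 0.15))    # not significant  (0.05 < 0.15)
print(interpret_r(0.3, 0.1))      # inconclusive     (0.1 <= 0.3 <= 0.6)
```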
