Linear Regression and Correlation


Simple Linear Regression and Correlation

What is Correlation Analysis?


Correlation analysis is the study of the relationship between variables.

CORRELATION ANALYSIS A group of techniques to measure the strength of the association between two variables.

The basic idea of correlation analysis is to report the strength of the association between two variables.
The usual first step is to plot the data in a scatter diagram.

SCATTER DIAGRAM A chart that portrays the relationship between two variables.

DEPENDENT VARIABLE The variable that is being predicted or estimated.

INDEPENDENT VARIABLE A variable that provides the basis for estimation. It is the
predictor variable.

It is common practice to scale the dependent variable on the vertical or Y-axis and the independent
variable on the horizontal or X-axis.

The Coefficient of Correlation


Originated by Karl Pearson about 1900, the coefficient of correlation describes the strength of the
relationship between two sets of interval-scaled or ratio-scaled variables. Designated r, it is often
referred to as Pearson's r and as the Pearson product-moment correlation coefficient. It can assume any
value from -1.00 to +1.00 inclusive. A correlation coefficient of -1.00 or +1.00 indicates perfect
correlation.

COEFFICIENT OF CORRELATION A measure of the strength of the linear relationship between two variables.

How is the value of the coefficient of correlation determined?


The coefficient of correlation can be computed from a formula based on the actual values of X and Y:

COEFFICIENT OF CORRELATION

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y} $$

where n is the number of paired observations, \bar{X} and \bar{Y} are the sample means, and s_x and s_y are the sample standard deviations of X and Y.

Example: The manager of a company selects a random sample of 15 representatives and determines
the number of sales calls each representative made last month and the number of copiers sold.

Rep      Sales calls, X   Copiers sold, Y   X - X̄    Y - Ȳ    (X - X̄)(Y - Ȳ)
A              96               41             0       -4             0
B              40               41           -56       -4           224
C             104               51             8        6            48
D             128               60            32       15           480
E             164               61            68       16          1088
F              76               29           -20      -16           320
G              72               39           -24       -6           144
H              80               50           -16        5           -80
I              36               28           -60      -17          1020
J              84               43           -12       -2            24
K             180               70            84       25          2100
L             132               56            36       11           396
M             120               45            24        0             0
N              44               31           -52      -14           728
O              84               30           -12      -15           180
Totals       1440              675             0        0          6672
Mean           96               45

Using the above formula:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y} = \frac{6672}{(15 - 1)(42.76)(12.89)} = 0.865 $$

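
The hand calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the names `calls`, `copiers`, `sample_std`, and `pearson_r` are illustrative choices, not part of the original notes.

```python
import math

# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    """Pearson's r: sum of cross-deviations over (n - 1) * s_x * s_y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_std(x) * sample_std(y))


r = pearson_r(calls, copiers)
print(round(sample_std(calls), 2))    # 42.76
print(round(sample_std(copiers), 2))  # 12.89
print(round(r, 3))                    # 0.865
```

The script follows the same formula as the text: it sums the products of the deviations and divides by (n - 1) times the two sample standard deviations.
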
How do we interpret a coefficient of 0.865? [A chart is written on the board]
First, it is positive, so we see there is a direct relationship between the number of sales calls and the
number of copiers sold. The value of 0.865 is fairly close to 1.00, so we conclude that the association is
strong.
Interpretation: There is a strong positive relationship between
the number of sales calls (X) and the number of copiers sold (Y).
THE COEFFICIENT OF DETERMINATION It is computed by squaring the
coefficient of correlation.

$$ r^2 = (0.865)^2 = 0.748 $$

This is a proportion or a percent: we can say that 74.8 percent
of the total variation in the dependent variable Y (number of
copiers sold) is explained, or accounted for, by the
variation in the independent variable X (number of sales calls).

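
One way to see what "variation explained" means is to compare the squared correlation with the share of the variation in Y that a least-squares line accounts for. The sketch below works from the unrounded data; the variable names are illustrative, and the least-squares slope formula (cross-deviations over squared X deviations) is assumed here rather than taken from the notes.

```python
# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

n = len(calls)
mean_x = sum(calls) / n
mean_y = sum(copiers) / n

# Sums of squares and cross-products of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, copiers))
sxx = sum((x - mean_x) ** 2 for x in calls)
syy = sum((y - mean_y) ** 2 for y in copiers)  # total variation in Y

# Least-squares line, then the variation in Y that it leaves unexplained.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
unexplained = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(calls, copiers)
)

r_squared = sxy ** 2 / (sxx * syy)       # square of Pearson's r
explained_share = 1 - unexplained / syy  # share of Y's variation explained

print(f"{r_squared:.3f}")        # 0.748
print(f"{explained_share:.3f}")  # 0.748
```

Both routes give about 0.748, which is the sense in which 74.8 percent of the variation in copiers sold is accounted for by sales calls.
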


Questions:
a) Compute the coefficient of correlation and interpret.
b) Determine the coefficient of determination and interpret.
c) Determine the regression equation and interpret.

Regression equation:

$$ Y' = a + bX $$

where Y' is the predicted value of the dependent variable Y, X is the independent variable,
a is the Y-intercept, and b is the regression coefficient (the slope of the line).

$$ b = r\,\frac{s_y}{s_x} = (0.865)\,\frac{12.89}{42.76} = 0.2608 $$

$$ a = \bar{Y} - b\bar{X} = 45 - (0.2608)(96) = 19.9632 $$

Linear equation: Y' = 19.9632 + 0.2608X


Equation in words:
Number of Copiers sold = 19.9632 + .2608(Sales Calls).
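
The slope and intercept can be recomputed from the raw data as a check. The sketch below is illustrative only (the variable names are assumptions). It also shows that the values in the text come from rounded inputs: working from unrounded figures gives a slope of about 0.2606 and an intercept of about 19.98, a small difference caused purely by rounding r, s_x, and s_y before the division.

```python
import math

# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

n = len(calls)
mean_x = sum(calls) / n
mean_y = sum(copiers) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, copiers))
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in calls) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in copiers) / (n - 1))
r = sxy / ((n - 1) * s_x * s_y)

# b = r * (s_y / s_x) and a = mean(Y) - b * mean(X), without rounding.
b = r * s_y / s_x
a = mean_y - b * mean_x
print(round(b, 4), round(a, 2))  # 0.2606 19.98

# The text's hand calculation, reproduced from its rounded inputs.
b_rounded = round(0.865 * 12.89 / 42.76, 4)
a_rounded = 45 - b_rounded * 96
print(b_rounded, round(a_rounded, 4))  # 0.2608 19.9632
```
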

Interpretation:
a = 19.9632.
i) 19.9632 is the point at which the regression line crosses the Y-axis.
ii) If X = 0, i.e. no sales calls are made, we can expect to sell almost 20 copiers.
b = 0.2608. For each additional sales call, the expected number of copiers sold
increases by 0.2608.
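d) If somebody makes 15 calls, how many copiers can we expect
him or her to sell?
Ans: We use our regression equation to estimate Y, the dependent
variable.
Y' = 19.9632 + 0.2608X
   = 19.9632 + 0.2608(15)
   = 23.8752
   ≈ 24 copiers
Note that 15 calls is below the smallest number of calls in the sample (36), so this
estimate extrapolates beyond the observed data and should be treated with caution.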
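
The estimate can be reproduced with a small helper. This is a sketch under the assumption that the rounded coefficients from the text (19.9632 and 0.2608) are used; the function name `predict_copiers` and the range check are hypothetical additions for illustration.

```python
# Observed range of sales calls in the sample (representatives I and K).
MIN_CALLS, MAX_CALLS = 36, 180


def predict_copiers(calls, intercept=19.9632, slope=0.2608):
    """Estimate copiers sold from sales calls using Y' = a + bX."""
    return intercept + slope * calls


estimate = predict_copiers(15)
within_sample_range = MIN_CALLS <= 15 <= MAX_CALLS

print(round(estimate, 4))   # 23.8752
print(round(estimate))      # 24
print(within_sample_range)  # False
```

The last line prints `False` because 15 calls falls outside the range of X values used to fit the line.
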
e) Determine the standard error of the estimate.

Formula:

$$ s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} $$

Rep    Sales calls, X   Copiers sold, Y   Y' = 19.9632 + 0.2608X    Y - Y'      (Y - Y')^2
A            96               41                45.0000              -4.0000     16
B            40               41                30.3952              10.6048    112.461783
C           104               51                47.0864               3.9136     15.31626496
D           128               60                53.3456               6.6544     44.28103936
E           164               61                62.7344              -1.7344      3.00814336
F            76               29                39.784              -10.784     116.294656
G            72               39                38.7408               0.2592      0.06718464
H            80               50                40.8272               9.1728     84.14025984
I            36               28                29.352               -1.352       1.827904
J            84               43                41.8704               1.1296      1.27599616
K           180               70                66.9072               3.0928      9.56541184
L           132               56                54.3888               1.6112      2.59596544
M           120               45                51.2592              -6.2592     39.17758464
N            44               31                31.4384              -0.4384      0.19219456
O            84               30                41.8704             -11.8704    140.9063962
Total                                                                 0         587.110784

$$ s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\frac{587.1108}{15 - 2}} = 6.720 $$

6.720 is the typical error we make when we use the regression
equation to estimate the dependent variable Y (copiers sold).
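
The residual table and the standard error can be verified in a few lines. The sketch below assumes the rounded coefficients from the text (a = 19.9632, b = 0.2608); the variable names are illustrative.

```python
import math

# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
copiers = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

a, b = 19.9632, 0.2608  # rounded intercept and slope from the text

# Residuals Y - Y', where Y' = a + bX is the estimate for each representative.
residuals = [y - (a + b * x) for x, y in zip(calls, copiers)]
sum_sq = sum(e ** 2 for e in residuals)

# Standard error of estimate: square root of sum of squares over (n - 2).
s_yx = math.sqrt(sum_sq / (len(calls) - 2))

print(round(sum_sq, 4))  # 587.1108
print(f"{s_yx:.3f}")     # 6.720
```
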
