Linear Regression and Correlation
The basic idea of correlation analysis is to report the strength of the association between two variables.
The usual first step is to plot the data in a scatter diagram.
SCATTER DIAGRAM A chart that portrays the relationship between two variables.
INDEPENDENT VARIABLE A variable that provides the basis for estimation. It is the
predictor variable.
It is common practice to scale the dependent variable on the vertical or Y-axis and the independent
variable on the horizontal or X-axis.
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y}$$
Example: The manager of a company selects a random sample of 15 representatives and determines
the number of sales calls each representative made last month and the number of copiers sold.
Rep      Sales calls, X   Copiers sold, Y   X - X̄    Y - Ȳ    (X - X̄)(Y - Ȳ)
A              96               41             0       -4            0
B              40               41           -56       -4          224
C             104               51             8        6           48
D             128               60            32       15          480
E             164               61            68       16         1088
F              76               29           -20      -16          320
G              72               39           -24       -6          144
H              80               50           -16        5          -80
I              36               28           -60      -17         1020
J              84               43           -12       -2           24
K             180               70            84       25         2100
L             132               56            36       11          396
M             120               45            24        0            0
N              44               31           -52      -14          728
O              84               30           -12      -15          180
Totals       1440              675             0        0         6672
Mean           96               45

Using the above formula:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_x s_y} = \frac{6672}{(15 - 1)(42.76)(12.89)} = 0.865$$
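As a cross-check on the hand calculation, the following is a minimal Python sketch that recomputes r from the 15 observations in the table. The names `calls`, `sold`, `sample_std`, and `correlation` are illustrative choices, not part of the original notes; the sample standard deviations use the n - 1 divisor, matching the formula above.

```python
import math

# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Coefficient of correlation: sum of cross-products over (n - 1) * sx * sy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * sample_std(x) * sample_std(y))


r = correlation(calls, sold)
print(f"r = {r:.3f}")  # prints "r = 0.865"
```

The sum of cross-products inside `correlation` is the 6672 total from the last column of the table.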
How do we interpret a coefficient of 0.865? [A chart is written on the board]
First, it is positive, so we see there is a direct relationship between the number of sales calls and the
number of copiers sold. The value of 0.865 is fairly close to 1.00, so we conclude that the association is
strong.
Interpretation: There is a strong positive relationship between
the number of sales calls (X) and the number of copiers sold (Y).
THE COEFFICIENT OF DETERMINATION It is computed by squaring the
coefficient of correlation.
Here $r^2 = (0.865)^2 = 0.748$. This is a proportion or a percent; we can say that 74.8 percent
of the total variation in the dependent variable Y (number of
copiers sold) is explained, or accounted for, by the
variation in the independent variable X (number of sales calls).
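The squaring step can be sketched in Python as follows. This is only an illustration: the function name `coefficient_of_determination` is hypothetical, and the block computes r squared directly from sums of squared deviations (an algebraically equivalent form of squaring r) rather than from the rounded value 0.865.

```python
# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def coefficient_of_determination(x, y):
    """r squared, computed as Sxy^2 / (Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


r_squared = coefficient_of_determination(calls, sold)
print(f"r^2 = {r_squared:.3f}")  # prints "r^2 = 0.748"
```

Multiplying the result by 100 gives the 74.8 percent quoted in the interpretation.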
REGRESSION EQUATION $Y' = a + bX$, where $Y'$ is the estimated value of Y for a given X, a is the Y-intercept, and b is the slope.
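The notes do not show how a and b are obtained. One common approach, assumed here, is the least squares fit: b is the sum of cross-products divided by the sum of squared X deviations, and a is the mean of Y minus b times the mean of X. The sketch below applies it to the sample; the function name `least_squares_line` is illustrative.

```python
# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]


def least_squares_line(x, y):
    """Return (a, b) for the line Y' = a + bX fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


a, b = least_squares_line(calls, sold)
print(f"Y' = {a:.4f} + {b:.4f} X")
```

With unrounded arithmetic the intercept comes out near 19.98 and the slope near 0.2606. The values 19.9632 and 0.2608 used in the worked table are slightly different, which is consistent with the slope having been rounded before the intercept was computed.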
Formula:
$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$
Rep     Sales calls, X   Copiers sold, Y   Y' = 19.9632 + 0.2608X     Y - Y'       (Y - Y')^2
A             96               41                 45.0000             -4.0000      16
B             40               41                 30.3952             10.6048     112.461783
C            104               51                 47.0864              3.9136      15.31626496
D            128               60                 53.3456              6.6544      44.28103936
E            164               61                 62.7344             -1.7344       3.00814336
F             76               29                 39.7840            -10.7840     116.294656
G             72               39                 38.7408              0.2592       0.06718464
H             80               50                 40.8272              9.1728      84.14025984
I             36               28                 29.3520             -1.3520       1.827904
J             84               43                 41.8704              1.1296       1.27599616
K            180               70                 66.9072              3.0928       9.56541184
L            132               56                 54.3888              1.6112       2.59596544
M            120               45                 51.2592             -6.2592      39.17758464
N             44               31                 31.4384             -0.4384       0.19219456
O             84               30                 41.8704            -11.8704     140.9063962
Total                                                                  0          587.110784
$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\frac{587.1108}{15 - 2}} = 6.720$$
6.720 is the typical error we make when we use the regression
equation to estimate the dependent variable Y (copiers sold).
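The residual table and the final figure can be reproduced with a short Python sketch. It takes the rounded coefficients from the notes (a = 19.9632, b = 0.2608) as given; the names `predict`, `residuals`, `sse`, and `std_error` are illustrative. The quantity computed is usually called the standard error of estimate.

```python
import math

# Sales calls (X) and copiers sold (Y) for representatives A through O.
calls = [96, 40, 104, 128, 164, 76, 72, 80, 36, 84, 180, 132, 120, 44, 84]
sold = [41, 41, 51, 60, 61, 29, 39, 50, 28, 43, 70, 56, 45, 31, 30]

# Rounded regression coefficients as used in the worked table.
A_COEF = 19.9632
B_COEF = 0.2608


def predict(x):
    """Estimated copiers sold, Y' = a + bX."""
    return A_COEF + B_COEF * x


# Residuals Y - Y' for each representative.
residuals = [y - predict(x) for x, y in zip(calls, sold)]

# Sum of squared residuals, then divide by n - 2 and take the square root.
sse = sum(e ** 2 for e in residuals)
std_error = math.sqrt(sse / (len(calls) - 2))

print(f"sum of squared residuals = {sse:.4f}")  # prints 587.1108
print(f"s_y.x = {std_error:.3f}")               # prints 6.720
```

The divisor is n - 2 rather than n - 1 because two quantities, a and b, are estimated from the sample.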