Business Statistics - Session 9
For samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
For populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
Correlation Coefficient
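For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
where:
$s_{xy}$ is the sample covariance, and $s_x$, $s_y$ are the sample standard deviations
$\sigma_{xy}$ is the population covariance, and $\sigma_x$, $\sigma_y$ are the population standard deviations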
Correlation Coefficient
The coefficient can take on values between -1 and +1.
              x        y      (x_i - x̄)    (y_i - ȳ)    (x_i - x̄)(y_i - ȳ)
            277.6     69        10.65        -1.0           -10.65
            259.5     71        -7.45         1.0            -7.45
            269.1     70         2.15         0               0
            267.0     70         0.05         0               0
            255.6     71       -11.35         1.0           -11.35
            272.9     69         5.95        -1.0            -5.95
Average     267.0     70.0                   Total          -35.40
Std. Dev.   8.2192    .8944
Covariance and Correlation Coefficient
Example: Golfing Study
Sample Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$
Sample Correlation Coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
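The golfing calculation above can be checked with a few lines of plain Python. This is only a sketch: the helper names (`mean`, `sample_covariance`, `sample_std`) are made up for illustration, and the data are the six (x, y) pairs from the table. The results should come out close to the slide values of -7.08 and -.9631.

```python
import math

# Golfing study data: the six (x, y) pairs from the table.
x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sum of cross-deviations divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return cross / (len(xs) - 1)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    v_bar = mean(values)
    return math.sqrt(sum((v - v_bar) ** 2 for v in values) / (len(values) - 1))


s_xy = sample_covariance(x, y)
r_xy = s_xy / (sample_std(x) * sample_std(y))
print(round(s_xy, 2), round(r_xy, 4))
```

Note that the deviations use the unrounded mean of x (266.95), which is why the table's 10.65 differs slightly from 277.6 minus the displayed average of 267.0.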
Simple Linear Regression Model
The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\varepsilon$ is a random variable called the error term.
Simple Linear Regression Equation
[Figure: Positive linear relationship. Regression line of E(y) against x, with intercept β0; slope β1 is positive.]
[Figure: Negative linear relationship. E(y) against x; slope β1 is negative.]
[Figure: No relationship. Regression line of E(y) against x, with intercept β0; slope β1 is 0.]
Types of Relationships
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships.]
Estimated Simple Linear Regression Equation
$$\hat{y} = b_0 + b_1 x$$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
Least Squares Method
• Slope for the Estimated Regression Equation
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
Least Squares Method
$$b_0 = \bar{y} - b_1 \bar{x}$$
where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
n = total number of observations
Simple Linear Regression
Number of TV Ads    Number of Cars Sold
       1                    14
       3                    24
       2                    18
       1                    17
       3                    27
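Estimated Regression Equation
Slope for the Estimated Regression Equation:
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{20}{4} = 5$$
y-Intercept, using the sample means $\bar{x} = 2$ and $\bar{y} = 20$:
$$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$$
Estimated regression equation:
$$\hat{y} = 10 + 5x$$
[Figure: Cars Sold against TV Ads, with the fitted line y = 5x + 10.]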
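The least squares formulas for b1 and b0 can be sketched in plain Python and applied to the TV ads data. The function name `least_squares` is an illustrative choice, not something from the slides.

```python
# TV ads (x) and cars sold (y) from the table.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def least_squares(xs, ys):
    """Return (b0, b1) for the estimated equation y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(ads, cars)
print(f"y_hat = {b0:g} + {b1:g}x")
```

For this data the numerator is 20 and the denominator is 4, so the sketch should reproduce the slope of 5 and the intercept of 10 shown on the chart.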
Simple Linear Regression Model (continued)
[Figure: Y plotted against X, marking the observed value of Y for Xi, its error term εi, the slope β, and the intercept α.]
Coefficient of Determination
• Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error (the difference
between the real values, and what we
predict from the line of best fit)
Coefficient of Determination
$r^2 = \text{SSR}/\text{SST}$
where:
SSR = sum of squares due to regression
SST = total sum of squares
Sample Correlation Coefficient
$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
For the TV ads example, the slope in $\hat{y} = 10 + 5x$ is positive, so:
$$r_{xy} = +\sqrt{.8772} = +.9366$$
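The sums of squares behind these values can be worked through in code. The sketch below assumes the TV ads data and the fitted line ŷ = 10 + 5x from the earlier example. The variable names are illustrative.

```python
import math

# TV ads (x) and cars sold (y), with the fitted line y_hat = 10 + 5x.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(cars) / len(cars)
y_hat = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(cars, y_hat))  # due to error

r_squared = ssr / sst
# The correlation takes the sign of the slope b1.
r_xy = math.copysign(math.sqrt(r_squared), b1)

print(sst, ssr, sse, round(r_squared, 4), round(r_xy, 4))
```

Here SST = 114, SSR = 100 and SSE = 14, so SST = SSR + SSE holds and r² = 100/114.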
Simple Linear Regression Example:
Data
House Price in $1000s (Y)    Square Feet (X)
         245                     1400
         312                     1600
         279                     1700
         308                     1875
         199                     1100
         219                     1550
         405                     2350
         324                     2450
         319                     1425
         255                     1700
Simple Linear Regression Example: Scatter Plot
[Figure: scatter plot of house price ($1000s) against square feet.]
ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000
$$\widehat{\text{house price}} = 98.25 + 0.1098(2000) = 317.85$$
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.
Simple Linear Regression Example: Making Predictions
• When using a regression model for prediction, only predict within the relevant range of data.
[Figure: scatter plot of house price against square feet, with the relevant range for interpolation marked.]
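One way to make the relevant-range rule concrete is to have the prediction report whether the input lies inside the observed square footage. This is a sketch under assumptions: the coefficients 98.25 and 0.1098 are the rounded values used in the prediction above, `predict_price` is a hypothetical helper, and the 5000-square-foot house is a made-up input chosen to fall outside the data.

```python
# Observed square footage from the house price data.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

# Rounded coefficients of the estimated regression equation.
B0, B1 = 98.25, 0.1098


def predict_price(sqft, observed=square_feet):
    """Return (predicted price in $1000s, True if sqft is within the observed range)."""
    in_range = min(observed) <= sqft <= max(observed)
    return B0 + B1 * sqft, in_range


price_2000, ok_2000 = predict_price(2000)  # inside 1100..2450: interpolation
price_5000, ok_5000 = predict_price(5000)  # outside the range: extrapolation

print(round(price_2000, 2), ok_2000)
print(round(price_5000, 2), ok_5000)
```

The function still returns a number for the out-of-range input; the flag is only a reminder that the fitted line has no data supporting it there.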
Note: $0 \le r^2 \le 1$
Simple Linear Regression Example: Coefficient of Determination, $r^2$ in Excel

$$r^2 = \frac{\text{SSR}}{\text{SST}} = \frac{18934.9348}{32600.5000} = 0.58082$$

58.08% of the variation in house prices is explained by variation in square feet.

Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000
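Most of the Excel figures above can be recomputed from the ten house records. The sketch below fits the line by least squares and then derives the ANOVA quantities. It assumes the usual definitions MS = SS/df, F = MSR/MSE and standard error = √MSE, which the slides do not spell out, and it does not compute Significance F.

```python
import math

# House data: square feet (x) and price in $1000s (y).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
n, p = len(x), 1  # observations and number of independent variables

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / p            # regression mean square, df = p
mse = sse / (n - p - 1)  # residual mean square, df = n - p - 1
f_stat = msr / mse
std_error = math.sqrt(mse)
r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)

print(round(ssr, 4), round(sse, 4), round(sst, 4))
print(round(f_stat, 4), round(std_error, 5), round(r_squared, 5))
```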
2. Suppose the estimating equation Y = 5 – 2X has been calculated for a set of data. Which of the
following is true for this situation?
(a) The Y-intercept of the line is 2.
(b) The slope of the line is negative.
(c) The line represents an inverse relationship.
(d) All of these.
(e) (b) and (c) but not (a).
Multiple Regression Model
The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . xp and an error term is called the multiple regression model.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$
where:
$\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters, and
$\varepsilon$ is a random variable called the error term
Multiple Regression Model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
where
y = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
Solving for the Estimates of β0, β1, β2
Input Data:
x1     x2     y
4      78     24
7      100    43
.      .      .
.      .      .
3      89     30
The input data go into a computer package for solving multiple regression problems, which returns the least squares output: b0, b1, b2, R2, etc.
Solving for the Estimates of β0, β1, β2
              Coeffic.    Std. Err.    t Stat    P-value
Intercept     3.17394     6.15607      0.5156    0.61279
Experience    1.4039      0.19857      7.0702    1.9E-06
Test Score    0.25089     0.07735      3.2433    0.00478

Estimated Regression Equation
b0 = 3.174
b1 = 1.404
b2 = 0.251
$$\hat{y} = 3.174 + 1.404x_1 + 0.251x_2$$
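To show how the estimated equation is used, the sketch below plugs in the first record from the input data (4 years of experience, test score 78, observed salary 24). The function name `predict_salary` is illustrative, and the coefficients are the unrounded values from the Excel output.

```python
# Coefficients from the Excel output (intercept, experience, test score).
B0, B1, B2 = 3.17394, 1.4039, 0.25089


def predict_salary(experience_years, test_score):
    """Estimated annual salary in $1000s from the fitted equation."""
    return B0 + B1 * experience_years + B2 * test_score


# First record of the input data: x1 = 4, x2 = 78, observed y = 24.
estimate = predict_salary(4, 78)
residual = 24 - estimate  # observed minus estimated

print(round(estimate, 3), round(residual, 3))
```

The gap between the observed salary and the estimate is that record's residual, the quantity squared and summed to give SSE.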
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Multiple Coefficient of Determination
ANOVA
              df    SS          MS          F           Significance F
Regression     2    500.3285    250.1643    42.76013    2.32774E-07
Residual      17    99.45697    5.85041
Total         19    599.7855
SSR is the Regression sum of squares (500.3285) and SST is the Total sum of squares (599.7855).
Multiple Coefficient of Determination
$R^2 = \text{SSR}/\text{SST}$
$R^2 = 500.3285/599.7855 = .83418$
Adjusted Multiple Coefficient of Determination
$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$
$$R_a^2 = 1 - (1 - .834179)\,\frac{20 - 1}{20 - 2 - 1} = .814671$$
Where,
n is number of observations.
p is number of independent variables.
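The adjustment is easy to check numerically. The sketch below takes SSR and SST from the ANOVA output for the salary example, with n = 20 and p = 2. The function name `adjusted_r_squared` is an illustrative choice.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjust R^2 for n observations and p independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Values from the ANOVA output of the salary example.
ssr, sst = 500.3285, 599.7855
n, p = 20, 2

r2 = ssr / sst
r2_adj = adjusted_r_squared(r2, n, p)

print(round(r2, 6), round(r2_adj, 6))
```

Because (n - 1)/(n - p - 1) exceeds 1 whenever p is at least 1, the adjusted value is never larger than R², and adding a variable only raises it if R² improves enough to offset the lost degree of freedom.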
Adjusted Multiple Coefficient of Determination
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.913334059
R Square              0.834179103
Adjusted R Square     0.814670762
Standard Error        2.418762076
Observations          20