Chapter 13
SIMPLE LINEAR REGRESSION
Simple Regression
Linear Regression
Definition
A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.
Scatter Diagram
Least Squares Line
Interpretation of a and b
Assumptions of the Regression Model
Definition
A plot of paired observations is called a scatter diagram.
Definition
In the regression model y = A + Bx + ε,
A is called the y-intercept or constant term,
B is the slope, and
ε is the random error term.
The dependent and independent variables are y and x, respectively.
Definition
In the model ŷ = a + bx, a and b, which are calculated using sample data, are called the estimates of A and B, respectively.
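The least squares line is the line that minimizes the sum of squared errors,

$$SSE = \sum e^2 = \sum (y - \hat{y})^2$$

For the least squares line ŷ = a + bx,

$$b = \frac{SS_{xy}}{SS_{xx}} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \quad \text{and} \quad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$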
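A minimal Python sketch of these computational formulas, assuming only the summary sums (n, Σx, Σy, Σxy, Σx²) are available rather than the raw observations. The function name `least_squares_from_sums` is hypothetical, chosen for illustration.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (b, a) for the least squares line y-hat = a + b*x.

    Uses SSxy = sum(xy) - sum(x)*sum(y)/n and
    SSxx = sum(x^2) - sum(x)^2/n, then b = SSxy/SSxx and a = y-bar - b*x-bar.
    """
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * (sum_x / n)
    return b, a


# Summary sums for the seven-household income / food expenditure example
b, a = least_squares_from_sums(n=7, sum_x=386, sum_y=108, sum_xy=6403, sum_x2=23058)
print(f"y-hat = {a:.4f} + {b:.4f}x")
```

Run on the sums from this section, the sketch prints y-hat = 1.5073 + 0.2525x. The intercept differs slightly from 1.5050 because the hand calculation rounds b to .2525 before computing a, while the sketch keeps b at full precision.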
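$$\sum x = 386 \qquad \sum y = 108$$

$$\bar{x} = \frac{\sum x}{n} = \frac{386}{7} = 55.1429 \qquad \bar{y} = \frac{\sum y}{n} = \frac{108}{7} = 15.4286$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 6403 - \frac{(386)(108)}{7} = 447.5714$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 23{,}058 - \frac{(386)^2}{7} = 1772.8571$$

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = .2525$$

$$a = \bar{y} - b\bar{x} = 15.4286 - (.2525)(55.1429) = 1.5050$$

$$\hat{y} = 1.5050 + .2525x$$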
Interpretation of b
The value of b in the regression model gives the change in y (the dependent variable) due to a change of one unit in x (the independent variable).
We can state that, on average, a $100 (or $1) increase in the income of a household will increase its food expenditure by $25.25 (or $.2525).
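To see the interpretation of b numerically, the sketch below evaluates the fitted line ŷ = 1.5050 + .2525x at two incomes one unit apart. It assumes, as the $100 and $25.25 statement implies, that both variables are measured in hundreds of dollars; the names `A_HAT`, `B_HAT`, and `predict` are hypothetical.

```python
# Coefficients of the fitted line from this section: y-hat = 1.5050 + .2525x
A_HAT = 1.5050
B_HAT = 0.2525


def predict(x):
    """Predicted food expenditure at income x (both assumed in hundreds of dollars)."""
    return A_HAT + B_HAT * x


# Raising income by one unit ($100) raises predicted food expenditure by b
change = predict(56) - predict(55)
print(f"{change:.4f} hundred dollars = ${100 * change:.2f}")
```

Because the line is straight, the difference equals b (up to floating-point rounding) for any pair of x values one unit apart.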
Prem Mann, Introductory Statistics, 7/E
Copyright © 2010 John Wiley & Sons. All right reserved
Figure 13.8 Positive and negative linear relationships between x and y.
Assumption 1: The random error term ε has a mean equal to zero for each x.
Assumption 2: The errors associated with different observations are independent.
Assumption 3: For any given x, the distribution of errors is normal.
Assumption 4: The distribution of population errors for each x has the same (constant) standard deviation, which is denoted by $\sigma_\epsilon$.
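The four assumptions describe how the errors ε are generated. A minimal simulation sketch, assuming hypothetical population values A, B, and σ_ε chosen only for illustration, draws each error independently from a normal distribution with mean 0 and the same standard deviation at every x. The function `simulate_responses` is hypothetical.

```python
import random
import statistics


def simulate_responses(xs, A, B, sigma_e, seed=0):
    """Draw y = A + B*x + e for each x in xs.

    Each error e is normal (Assumption 3) with mean 0 (Assumption 1) and
    the same standard deviation sigma_e at every x (Assumption 4); separate
    draws are independent (Assumption 2).
    """
    rng = random.Random(seed)
    return [A + B * x + rng.gauss(0.0, sigma_e) for x in xs]


# Hypothetical population line and error spread, for illustration only
A, B, SIGMA_E = 1.5, 0.25, 1.6
xs = [50] * 5000 + [80] * 5000
ys = simulate_responses(xs, A, B, SIGMA_E)

# Recover the errors: mean should be near 0 and spread near sigma_e at both x values
errors_50 = [y - (A + B * x) for x, y in zip(xs, ys) if x == 50]
errors_80 = [y - (A + B * x) for x, y in zip(xs, ys) if x == 80]
print(round(statistics.mean(errors_50), 3), round(statistics.stdev(errors_50), 3))
print(round(statistics.mean(errors_80), 3), round(statistics.stdev(errors_80), 3))
```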
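The standard deviation of errors, $\sigma_\epsilon$, is estimated by $s_e$, which uses

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

For the income and food expenditure data,

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1792 - \frac{(108)^2}{7} = 125.7143$$

$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{125.7143 - .2525(447.5714)}{7 - 2}} = 1.5939$$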
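A sketch of the s_e computation from the summary sums (including Σy² = 1792); the function name `std_dev_of_errors` is hypothetical. The comment on `sse` relies on the identity SSE = SS_yy − b·SS_xy, which holds for the least squares line.

```python
import math


def std_dev_of_errors(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Estimate sigma_e by s_e = sqrt((SSyy - b*SSxy) / (n - 2))."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    b = ss_xy / ss_xx
    sse = ss_yy - b * ss_xy  # sum of squared errors for the least squares line
    return math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom


s_e = std_dev_of_errors(7, 386, 108, 6403, 23058, 1792)
print(f"s_e = {s_e:.4f}")
```

With b kept at full precision the result is about 1.595 rather than 1.5939, which uses b rounded to .2525.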
The total sum of squares is

$$SST = \sum y^2 - \frac{(\sum y)^2}{n}$$

Note that this is the same formula that we used to calculate $SS_{yy}$.
Figure 13.15 Total errors.
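The coefficient of determination, denoted $r^2$, is

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} \quad \text{and} \quad 0 \le r^2 \le 1$$

For the income and food expenditure data,

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(.2525)(447.5714)}{125.7143} = .90$$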
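The link between r², SST, and SSE can be checked numerically. The sketch below, with a hypothetical function name `coefficient_of_determination`, computes r² = b·SS_xy/SS_yy; because SST = SS_yy and SSE = SS_yy − b·SS_xy for the least squares line, the same value equals 1 − SSE/SST.

```python
def coefficient_of_determination(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (r2, sse, sst) where r2 = b*SSxy / SSyy."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n  # same formula as SST
    b = ss_xy / ss_xx
    sse = ss_yy - b * ss_xy
    r2 = b * ss_xy / ss_yy
    return r2, sse, ss_yy


r2, sse, sst = coefficient_of_determination(7, 386, 108, 6403, 23058, 1792)
print(f"r^2 = {r2:.2f}, 1 - SSE/SST = {1 - sse / sst:.2f}")
```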
Sampling Distribution of b
Estimation of B
Hypothesis Testing About B
Step 2:
Because $\sigma_\epsilon$ is not known, we use the t distribution to make the test about B.
$$t = \frac{b - B}{s_b} = \frac{.2525 - 0}{.0379} = 6.662$$
Step 5:
The value of the test statistic t = 6.662
It is greater than the critical value of t = 3.365
It falls in the rejection region
Hence, we reject the null hypothesis
We conclude that the slope B is positive: food expenditure (y) tends to increase as income (x) increases.
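A sketch of the test about B, assuming the standard estimate s_b = s_e/√SS_xx (which reproduces s_b = .0379). Python's standard library has no t-distribution quantile function, so the critical value 3.365 is copied from the t table as in Step 5; the function name `slope_t_statistic` is hypothetical.

```python
import math


def slope_t_statistic(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2, B0=0.0):
    """Return (t, s_b) for testing H0: B = B0, where t = (b - B0) / s_b."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    b = ss_xy / ss_xx
    s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))
    s_b = s_e / math.sqrt(ss_xx)
    return (b - B0) / s_b, s_b


t, s_b = slope_t_statistic(7, 386, 108, 6403, 23058, 1792)
T_CRITICAL = 3.365  # t table: area .01 in the right tail, df = 7 - 2 = 5
decision = "reject H0" if t > T_CRITICAL else "do not reject H0"
print(f"s_b = {s_b:.4f}, t = {t:.3f}: {decision}")
```

Because b, s_e, and s_b are not rounded at intermediate steps, t comes out near 6.66 rather than exactly 6.662; the decision is the same.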
Figure 13.18 Linear correlation between two variables: (b) perfect negative linear correlation, r = -1; (c) no linear correlation, r ≈ 0.
Figure 13.19 Linear correlation between variables.
The linear correlation coefficient is

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{447.5714}{\sqrt{(1772.8571)(125.7143)}} = .95$$
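The same summary sums give r directly. In the sketch below (the function name `correlation_from_sums` is hypothetical), r takes the sign of SS_xy, the same sign as b, so the insurance example's negative SS_xy would give a negative r.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return r = SSxy / sqrt(SSxx * SSyy)."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = correlation_from_sums(7, 386, 108, 6403, 23058, 1792)
print(f"r = {r:.2f}")
```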
Step 3:
Area in the right tail = .01
df = n – 2 = 7 – 2 = 5
The critical value of t = 3.365
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = .95\sqrt{\frac{7 - 2}{1 - (.95)^2}} = 6.803$$
Step 5:
The value of the test statistic t = 6.803
It is greater than the critical value of t = 3.365
It falls in the rejection region
Hence, we reject the null hypothesis
We conclude that there is a positive linear correlation between incomes and food expenditures.
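A sketch of the test statistic for ρ, using the critical value 3.365 from Step 3; the function name `correlation_t_statistic` is hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with df = n - 2, for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


T_CRITICAL = 3.365  # Step 3: right-tail area .01, df = 5
t = correlation_t_statistic(0.95, 7)  # r rounded to .95, as in the calculation
print(f"t = {t:.3f}", "reject H0" if t > T_CRITICAL else "do not reject H0")
```

If r is carried unrounded (about .948), t drops to about 6.66, the same value as the t statistic for the slope. In simple linear regression the test of ρ = 0 and the test of B = 0 are algebraically equivalent, so 6.803 and 6.662 differ only because of rounding.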
Scatter diagram: Monthly Auto Insurance Premium ($) versus Driving Experience (years).
Table 13.5
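b)

$$\bar{x} = \frac{\sum x}{n} = \frac{90}{8} = 11.25 \qquad \bar{y} = \frac{\sum y}{n} = \frac{474}{8} = 59.25$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 4739 - \frac{(90)(474)}{8} = -593.5000$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1396 - \frac{(90)^2}{8} = 383.5000$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 29{,}642 - \frac{(474)^2}{8} = 1557.5000$$

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{-593.5000}{383.5000} = -1.5476$$

$$a = \bar{y} - b\bar{x} = 59.25 - (-1.5476)(11.25) = 76.6605$$

$$\hat{y} = 76.6605 - 1.5476x$$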
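A sketch that reproduces part b) from the column sums of Table 13.5 (n = 8, Σx = 90, Σy = 474, Σxy = 4739, Σx² = 1396, Σy² = 29,642). The function name `regression_summary` is hypothetical. SS_xy comes out negative, which is what makes the slope negative.

```python
def regression_summary(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return the means, sums of squares, slope b, and intercept a."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return {"x_bar": x_bar, "y_bar": y_bar, "SSxy": ss_xy,
            "SSxx": ss_xx, "SSyy": ss_yy, "b": b, "a": a}


summary = regression_summary(8, 90, 474, 4739, 1396, 29642)
for name, value in summary.items():
    print(f"{name:>5} = {value:10.4f}")
```

Keeping b at full precision gives a = 76.6604 rather than 76.6605; the difference is rounding only.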
f)

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{-593.5000}{\sqrt{(383.5000)(1557.5000)}} = -.77$$

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(-1.5476)(-593.5000)}{1557.5000} = .59$$
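A short check of part f), using only the sums of squares from part b). It confirms that b·SS_xy/SS_yy is the square of r, so r² is always nonnegative while r keeps the sign of SS_xy.

```python
import math

# Sums of squares from part b)
SS_XY, SS_XX, SS_YY = -593.5, 383.5, 1557.5

b = SS_XY / SS_XX
r = SS_XY / math.sqrt(SS_XX * SS_YY)
r_squared = b * SS_XY / SS_YY

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")  # r = -0.77, r^2 = 0.59
```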
h)

$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{1557.5000 - (-1.5476)(-593.5000)}{8 - 2}} = 10.3199$$
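i)

$$s_b = \frac{s_e}{\sqrt{SS_{xx}}} = \frac{10.3199}{\sqrt{383.5000}} = .5270$$

For a 90% confidence interval,

$$\alpha/2 = .5 - (.90/2) = .05 \qquad df = n - 2 = 8 - 2 = 6 \qquad t = 1.943$$

$$b \pm t s_b = -1.5476 \pm 1.943(.5270) = -1.5476 \pm 1.0240 = -2.57 \text{ to } -.52$$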
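A sketch of parts h) and i) together: s_e, s_b, and the 90% confidence interval for B. The t value 1.943 (df = 6, area .05 in each tail) is copied from the t table because Python's standard library has no t quantile function; the function name `slope_interval` is hypothetical.

```python
import math


def slope_interval(n, ss_xy, ss_xx, ss_yy, t_value):
    """Return (b, s_e, s_b, lower, upper) for the interval b +/- t*s_b."""
    b = ss_xy / ss_xx
    s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))
    s_b = s_e / math.sqrt(ss_xx)
    margin = t_value * s_b
    return b, s_e, s_b, b - margin, b + margin


# SSxy, SSxx, SSyy from part b); t from the table for df = 6 and alpha/2 = .05
b, s_e, s_b, lower, upper = slope_interval(8, -593.5, 383.5, 1557.5, t_value=1.943)
print(f"s_e = {s_e:.4f}, s_b = {s_b:.4f}, interval: {lower:.2f} to {upper:.2f}")
```

It prints s_e = 10.3199, s_b = 0.5270 and an interval of -2.57 to -0.52, agreeing with the hand calculation.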
j)
Step 1:
H0: B = 0 (B is not negative)
H1: B < 0 (B is negative)
Step 3:
Area in the left tail = α = .05
df = n – 2 = 8 – 2 = 6
The critical value of t is -1.943
$$t = \frac{b - B}{s_b} = \frac{-1.5476 - 0}{.5270} = -2.937$$
Step 5:
The value of the test statistic t = -2.937
It falls in the rejection region
Hence, we reject the null hypothesis and
conclude that B is negative
The monthly auto insurance premium
decreases with an increase in years of
driving experience.
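A sketch of the test-statistic and decision arithmetic for this left-tailed test, using the rounded b and s_b from the earlier parts and the critical value -1.943 from Step 3; the function name `slope_test_left_tail` is hypothetical.

```python
def slope_test_left_tail(b, s_b, t_critical, B0=0.0):
    """Left-tailed test of H0: B = B0 vs H1: B < B0.

    Returns the t statistic and whether it falls in the rejection region,
    i.e. to the left of the (negative) critical value.
    """
    t = (b - B0) / s_b
    return t, t < t_critical


t, reject = slope_test_left_tail(b=-1.5476, s_b=0.5270, t_critical=-1.943)
print(f"t = {t:.3f}; reject H0: {reject}")
```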
k)
Step 1:
H0: ρ = 0 (The linear correlation coefficient
is zero)
H1: ρ ≠ 0 (The linear correlation coefficient
is different from zero)
Step 3:
Area in each tail = .05/2 = .025
df = n – 2 = 8 – 2 = 6
The critical values of t are -2.447 and 2.447
Figure 13.23
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = -.77\sqrt{\frac{8 - 2}{1 - (-.77)^2}} = -2.956$$
Step 5:
The value of the test statistic t = -2.956
It falls in the rejection region
Hence, we reject the null hypothesis
We conclude that the linear correlation
coefficient between driving experience and
auto insurance premium is different from
zero.
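A sketch of the two-tailed test in part k), with r = -.77 from part f) and the critical values ±2.447 from Step 3. The helper name `correlation_test_two_tailed` is hypothetical.

```python
import math


def correlation_test_two_tailed(r, n, t_critical):
    """Test H0: rho = 0 vs H1: rho != 0.

    t = r * sqrt((n - 2) / (1 - r^2)); reject H0 when |t| exceeds the
    positive critical value (area alpha/2 in each tail).
    """
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, abs(t) > t_critical


t, reject = correlation_test_two_tailed(r=-0.77, n=8, t_critical=2.447)
print(f"t = {t:.3f}; reject H0: {reject}")
```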