L05 ECO220 Print
Lecture 5
10 The WHO standard describes how children should grow if they receive proper nutrition and health
care. It is premised on the fact that the height distribution among children under age five who receive
adequate nutrition and health care has been shown to be similar in most ethnic groups (de Onis et al.
2006; WHO Multicentre Growth Reference Study Group 2006a).
i     Dosage x (mg)   Sleep y (hrs)
1     5.9             4.6
2     3.5             5.8
3     7.2             6.9
4     3.6             5.8
…     …               …
25    8.2             7.6
$\bar{x} = 4.61$, $\bar{y} = 5.66$, $s_x = 1.51$, $s_y = 1.18$, $s_{xy} = 1.09$
[Figure: scatter plot of Hours of Sleep against Dosage (mg), with the OLS line in red.]
Next, discuss how to find the red OLS line.
• OLS line (y-hat): $\hat{y} = a + bx$
• Residual: $e = y - \hat{y}$
• OLS minimizes the SSE:
• $SSE = \sum e_i^2$
• $SSE = \sum (y_i - \hat{y}_i)^2$
• $SSE = \sum (y_i - a - b x_i)^2$
$b = \frac{s_{xy}}{s_x^2} = \frac{1.09}{1.51^2} = 0.48$
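As a rough check on the slope arithmetic, the sketch below recomputes the OLS coefficients from the summary statistics quoted for the dosage and sleep data. It assumes the standard identities $b = s_{xy}/s_x^2$ and $a = \bar{y} - b\bar{x}$; the function name `ols_from_summary` is made up for illustration.

```python
# Summary statistics quoted in the lecture for the dosage/sleep example.
x_bar, y_bar = 4.61, 5.66   # sample means of dosage (mg) and sleep (hrs)
s_x, s_xy = 1.51, 1.09      # s.d. of x and sample covariance of x and y


def ols_from_summary(x_bar, y_bar, s_x, s_xy):
    """Return (intercept a, slope b) of the OLS line y-hat = a + b*x."""
    b = s_xy / s_x ** 2      # slope: covariance over the variance of x
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b


a, b = ols_from_summary(x_bar, y_bar, s_x, s_xy)
print(f"slope b = {b:.2f}")
print(f"intercept a = {a:.2f}")
```

Because the inputs are rounded to two decimals, an intercept computed this way can differ in the second decimal from one computed on the full data.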
Descriptive Interpretations
OLS line (solid line): $\hat{y} = 62.83 + 4.73x$
Interpretation in 2 – 3 sentences?
For U.S. retail firms from 2008 to
2018 tracked by Glassdoor, those
with employee satisfaction that is
1 point higher on a five-point
scale have customer satisfaction
that is 4.7 points higher on a 100
point index on average.
This is a modest difference because a 1-point difference in employee
satisfaction is huge (e.g. going from the 10th to the 90th percentile!).
We cannot infer causality: we simply describe the pattern.
[Figure: scatter plot of Quantity Supplied against Price ($/tonne), alongside a supply-and-demand diagram (axes P and Q; curves D, D', S, S') and a diagram relating Price (P), Quantity Supplied (QS), Demand Shifters, and Supply Shifters.]
[Figure: two scatter plots of the sleep data with the OLS line. Left: Sleep, hours against Dosage, mg. Right: Sleep, Standardized against Dosage, Standardized.]
With both variables standardized, $b = r$.
Increasing the dosage by one standard deviation results in patients sleeping about 0.6 standard deviation more on average.
See pp. 177 – 179 of the textbook.
When both variables are standardized, the slope ($b$) equals the coefficient of correlation ($r$), and $-1 \le r \le 1$. Hence, a 1 s.d. increase in $x$ cannot be associated with more than a 1 s.d. change in $y$.
[Figure: scatter plot with Father's height (standardized) on the horizontal axis.]
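The claim that the standardized slope equals $r$ can be checked numerically. The sketch below uses only the five rows visible in the lecture's dosage and sleep table (the other 20 are elided), so its numbers will not match the full-sample values; the helper names are illustrative.

```python
from statistics import mean, stdev

# Only the five rows visible in the lecture's data table (the other 20 are
# elided), so the numbers will NOT match the full-sample values.
dosage = [5.9, 3.5, 7.2, 3.6, 8.2]   # x, mg
sleep = [4.6, 5.8, 6.9, 5.8, 7.6]    # y, hours


def standardize(values):
    """Convert values to z-scores using the sample s.d. (n - 1 divisor)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols_slope(x, y):
    """OLS slope b = s_xy / s_x^2 (the n - 1 divisors cancel)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


r = correlation(dosage, sleep)
b_raw = ols_slope(dosage, sleep)
b_std = ols_slope(standardize(dosage), standardize(sleep))
print(f"raw slope = {b_raw:.3f}, standardized slope = {b_std:.3f}, r = {r:.3f}")
```

The raw slope depends on the units of $x$ and $y$, while the standardized slope should agree with $r$ up to floating-point error.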
Residuals (error): $e_i = y_i - \hat{y}_i$
OLS line: $\hat{y} = 3.44 + 0.48x$
• With a constant term, the mean of the residuals is 0
• Estimated s.d. of residuals; Root MSE (Mean Square Error), the “standard error of estimate”:
$s_e = \sqrt{\frac{\sum (e_i - 0)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}$
i     x (mg)   y (hrs)   y-hat (hrs)   e (hrs)
1     5.9      4.6       6.3           -1.7
2     3.5      5.8       5.1           0.7
3     7.2      6.9       6.9           0.0
4     3.6      5.8       5.2           0.6
…     …        …         …             …
25    8.2      7.6       7.4           0.2
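The fitted values and residuals in the table can be reproduced from the line $\hat{y} = 3.44 + 0.48x$. The sketch below does this for the five rows the table shows; `root_mse` is an illustrative helper for the Root MSE formula and is meant for the full set of residuals, which the slides do not list.

```python
import math

# The five rows visible in the lecture's table; rows 5-24 are elided there.
dosage = [5.9, 3.5, 7.2, 3.6, 8.2]   # x, mg
sleep = [4.6, 5.8, 6.9, 5.8, 7.6]    # y, hours

A, B = 3.44, 0.48   # intercept and slope of the lecture's OLS line


def residuals(x, y, a, b):
    """Return e_i = y_i - (a + b * x_i) for each observation."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def root_mse(resids):
    """s_e = sqrt(SSE / (n - 2)); meant for the full set of residuals."""
    sse = sum(e ** 2 for e in resids)
    return math.sqrt(sse / (len(resids) - 2))


e = residuals(dosage, sleep, A, B)
for xi, yi, ei in zip(dosage, sleep, e):
    print(f"x = {xi:.1f}  y = {yi:.1f}  y-hat = {yi - ei:.1f}  e = {ei:+.1f}")
```

The divisor is $n - 2$ rather than $n - 1$ because two coefficients, $a$ and $b$, are estimated before the residuals are computed.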
[Figure: scatter plot of Sleep, hours against Dosage, mg with the OLS line; density plot of the Residual (e); plot of Residual (hours) against Predicted Sleep (hours).]
Homoscedasticity: The variance of the residual is constant.
Homoscedasticity means that it makes sense to talk about the standard deviation of the residual.
[Figure: Homoscedasticity (Y1 against X1) versus Heteroscedasticity (Y2 against X2): scatter plots in the top row, with the corresponding Residual plots below.]
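One informal way to see the difference between the two cases is to compare the spread of the residuals at low and high values of $x$. The sketch below does that on hypothetical, deterministic data (not from the lecture); `spread_ratio` is an illustrative helper, not a formal test for heteroscedasticity.

```python
from statistics import mean, stdev


def ols_residuals(x, y):
    """Fit y-hat = a + b*x by OLS and return the residuals y - y-hat."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, y):
    """s.d. of residuals in the upper half of x over that in the lower half."""
    pairs = sorted(zip(x, ols_residuals(x, y)))   # order residuals by x
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[half:]]
    return stdev(upper) / stdev(lower)


# Hypothetical, deterministic data (not from the lecture).
x = [1, 2, 3, 4, 5, 6, 7, 8]
signs = [1, -1, 1, -1, 1, -1, 1, -1]
y_homo = [2 + 0.5 * xi + 0.5 * s for xi, s in zip(x, signs)]          # constant spread
y_hetero = [2 + 0.5 * xi + 0.2 * xi * s for xi, s in zip(x, signs)]   # spread grows with x

homo_ratio = spread_ratio(x, y_homo)
hetero_ratio = spread_ratio(x, y_hetero)
print(f"homoscedastic ratio:   {homo_ratio:.2f}")
print(f"heteroscedastic ratio: {hetero_ratio:.2f}")
```

A ratio near 1 is consistent with constant residual variance; a ratio well above 1 suggests the spread grows with $x$, in which case a single residual standard deviation is a poor summary.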
versus everything else
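• Total sum of squares:
$SST = \sum (y_i - \bar{y})^2$, with $s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$
• Regression sum of squares:
$SSR = \sum (\hat{y}_i - \bar{y})^2$
• Sum of squared errors:
$SSE = \sum (y_i - \hat{y}_i)^2$
[Figure: scatter plot of Hours of Sleep against Dosage (mg).]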
$SST = \sum (y_i - \bar{y})^2$
$SSR = \sum (\hat{y}_i - \bar{y})^2$
$SSE = \sum (y_i - \hat{y}_i)^2$
If $x$ is unrelated to $y$? If $x$ explains $y$ perfectly?
[Figure: points labeled (y2, y2-hat), (y3, y3-hat), (y4, y4-hat).]
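The two extreme cases can be explored numerically. The sketch below fits an OLS line and returns the three sums of squares; the data are hypothetical, chosen so that one $y$ is an exact linear function of $x$ and the other has zero sample covariance with $x$. The function name `anova` is illustrative.

```python
from statistics import mean


def anova(x, y):
    """Fit OLS and return (SST, SSR, SSE) for the fitted line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - my) ** 2 for yi in y)                    # total variation
    ssr = sum((yh - my) ** 2 for yh in y_hat)                # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # left in the residuals
    return sst, ssr, sse


# Hypothetical data, chosen to show the two extreme cases.
x = [1, 2, 3, 4]
y_perfect = [3, 5, 7, 9]      # exactly y = 1 + 2x
y_unrelated = [5, 3, 3, 5]    # zero sample covariance with x

for label, y in [("perfect", y_perfect), ("unrelated", y_unrelated)]:
    sst, ssr, sse = anova(x, y)
    print(f"{label}: SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```

In both cases $SST = SSR + SSE$: a perfect fit puts all of SST into SSR, and an unrelated $x$ puts all of it into SSE.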
[Figure: scatter plots of Females, 2012 against Females, 2006 (two panels, Turkey labeled) and of Females, 2012 against Males, 2012 (two panels, Turkey and Mexico labeled).]
as a quantitative x
variable in OLS valid?
Note: Cumulative returns between the start of trading and 140 days (5 months)
later. Right panel excludes observations with returns > 30 [outliers].
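[Figure: scatter plot with Lagged Target Rate on the horizontal axis.]
• $s_y = 0.497$, where $s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1}$
• $\bar{x} = 2.453$
• $r = 0.309$
• $s_x = 1.395$
• $s_{xy} = 0.215$, where $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum z_x z_y}{n-1}$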
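[Figure: scatter plot of CPIX against Lagged Target Rate.]
• Residuals: $e_i = y_i - \hat{y}_i$
– E.g. Jan. 2013: $x_i = 1$, $y_i = 0.5$, $\hat{y}_i = 1.27$, $e_i = -0.77$ (= 0.5 − 1.27)
• s.d. of resids: $s_e = 0.4748$, where $s_e = \sqrt{\frac{\sum e_i^2}{n-2}}$
• $SST = \sum (y_i - \bar{y})^2 = 24.7$
• $SSR = \sum (\hat{y}_i - \bar{y})^2 = 2.4$
• $SSE = \sum (y_i - \hat{y}_i)^2 = 22.3$
• $R^2 = SSR/SST = 0.10$
If both variables were standardized, what would the slope of the regression line be?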
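The quoted figures for this example can be cross-checked against each other. The sketch below uses only numbers stated on the slides and assumes the usual simple-regression identity $R^2 = r^2$, so small gaps are rounding in the reported sums of squares.

```python
# Figures quoted in the lecture for the CPIX / lagged-target-rate example.
SST, SSR, SSE = 24.7, 2.4, 22.3
r = 0.309
y_obs, y_fit = 0.5, 1.27     # Jan. 2013 observation and fitted value

residual = y_obs - y_fit      # e = y - y-hat
r_squared = SSR / SST         # share of the variation in y explained by x
gap = SST - (SSR + SSE)       # should be about 0, up to rounding

print(f"residual = {residual:.2f}")
print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
print(f"|SST - (SSR + SSE)| < 0.05: {abs(gap) < 0.05}")
```

Both $SSR/SST$ and $r^2$ round to 0.10, which is the consistency one would expect between the correlation and the ANOVA figures.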
Recap
• Simple regression is an important tool, even in
cutting-edge empirical research
• Finding an OLS line is easy; interpreting it is hard
• Spent a lot of time on scatter and residuals
– The $s_e$, $r$, or $R^2$ quantify scatter about an OLS line
(fit is opposite of scatter): all derive from ANOVA
– Addressed outliers and summary values
• An OLS line is not an ordinary line