EC220/221 Introduction To Econometrics: Canh Thien Dang
Michael Gmeiner
[email protected]
This lecture
• Application: Does class size affect student outcomes?
• Using regression as a data descriptive tool.
• The Conditional Expectation Function (CEF).
• Mechanics of regression (some mathematical derivations).
• Interpreting regression results.
1
Class-Size Effects: Small is Good, Big is Bad?
• What if we compare scores of students in countries with large and small average class sizes?
• Suppose we obtain data on average class size in several countries, together with scores on a standardised test, e.g., the PISA test (Programme for International Student Assessment).
• Countries with larger (or smaller) average class size might have other educational norms that harm (or
improve) test score averages.
3
Class-Size Effects: A Study in California
• Data: Primary school students in California school
districts (n = 420) for 1999.
• Variables:
• The outcome, $Y$, is 5th-grade test scores (Stanford-9 achievement test). Specifically, we observe the district average of the sum of math and reading scores.
• The regressor, $X$, is the district's student-teacher ratio.
4
Possible Confounder
• Do smaller classes result in better outcomes for students?
• First, why do some districts have small classes and others large classes?
• Primarily due to income of local families (local taxes fund schools).
• Bias will result due to residential sorting of high-income families whose children may have
different test scores anyway.
[Diagram: Treatment (Class Size) → Outcome of Interest (Test Scores), with the causal arrow marked “?”; the Confounder (District Incomes) affects both.]
5
Scatterplot
Stata command:
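A minimal sketch (assuming the variables testscr and str, which appear in the regression output on slide 23):

    scatter testscr str, xtitle("Student-teacher ratio") ytitle("Test score")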
6
Some Observations About the Data
• Data are all primary school districts in California in 1999.
• We observe the whole population of interest, so there are none of the sampling issues from Lectures 4 and 5, where we discussed standard errors of sample estimates.
• The outcome has high variance, even for the same class size.
• This means there are a lot of other factors causing different test
scores across districts.
• There is often a lot of randomness in relationships we study in
economics.
7
The Conditional Expectation Function
• We want to know the value of the test score “on average” across districts for a
given class size.
• Let $Y_i$ be the test score in school district $i$, and $X_i$ the student-teacher ratio. Then
$E(Y_i \mid X_i)$
is the conditional expectation function (CEF), and
$E(Y_i \mid X_i = x)$
is the value of the CEF at a particular value of $X_i$, say $x = 18$.
8
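As a concrete illustration, the CEF value at $x = 18$ can be approximated in Stata by averaging testscr over districts whose str is near 18; a minimal sketch (str_bin is a hypothetical variable name):

    * average test score within 1-unit bins of the student-teacher ratio
    generate str_bin = round(str)
    tabstat testscr, by(str_bin) statistics(mean n)

The row for str_bin = 18 approximates $E(Y_i \mid X_i = 18)$.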
The Conditional Expectation Function - Example
9
The Conditional Expectation Function
10
The Data and the CEF
11
Simplifying the CEF
• Goal: summarize the relationship of 𝑋 and 𝑌 by creating a straight line that “best”
approximates the CEF.
• Choose 𝑎, 𝑏 such that 𝑎 + 𝑏𝑥 is “really close” to the CEF.
• How do we choose “the best” line that approximates the CEF?
[Figure: scatterplot of test scores (600 to 760) against student-teacher ratio (0 to 25), with a straight line of intercept $a$ and slope $b$.]
12
Least Squares Minimization
The “best” line minimises the expectation of the squared distance from the line to the CEF:
$E\big[\big(E[Y_i \mid X_i] - a - bX_i\big)^2\big]$
• The $a$ and $b$ that minimise the above give the “best linear predictor” (BLP) of $Y$ given $X$.
• The BLP of the CEF and the BLP of $Y$ happen to be the same, i.e., the same $a$ and $b$ minimise both
$E\big[\big(E[Y_i \mid X_i] - a - bX_i\big)^2\big]$ and $E\big[(Y_i - a - bX_i)^2\big]$.
In practice, we don’t observe $E[Y_i \mid X_i]$, but we do observe $Y_i$. The line that approximates the CEF is obtained by solving the second of these minimisation problems, over $E[(Y_i - a - bX_i)^2]$. This is the OLS regression line.
13
Data, the CEF, and the Regression Line
14
Residuals
$\min_{a,b}\; E\big[(Y_i - a - bX_i)^2\big]$
• For any $a'$ and $b'$, not necessarily the minimisers, the residual is the difference between the point and the line.
• If $X_i = 24$, $Y_i = 679$, $a' = 760$, and $b' = -5$:
$\text{Residual} = Y_i - a' - b'X_i = 679 - 760 - (24)(-5) = 39$
• If instead $X_i = 20$ and $Y_i = 610$:
$\text{Residual} = Y_i - a' - b'X_i = 610 - 760 - (20)(-5) = -50$
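The arithmetic of both residuals can be checked using Stata as a calculator:

    display 679 - 760 - 24*(-5)    //  39
    display 610 - 760 - 20*(-5)    // -50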
Divide by $-2n$:
$\dfrac{\partial RSS}{\partial a} = E[Y_i - \alpha - \beta X_i] = 0 \qquad \dfrac{\partial RSS}{\partial b} = E[(Y_i - \alpha - \beta X_i)X_i] = 0$
After taking the derivatives, we use $\alpha$ and $\beta$, rather than $a$ and $b$, to denote the population parameters that solve the minimisation (which is at the point where the derivatives are 0).
16
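A sketch of the differentiation step behind the “divide by $-2n$” remark, writing $RSS = \sum_{i=1}^{n}(Y_i - a - bX_i)^2$ (an assumption consistent with the slide's notation, where $E[\cdot]$ acts as the sample average): $\frac{\partial RSS}{\partial a} = -2\sum_{i=1}^{n}(Y_i - a - bX_i)$, so dividing by $-2n$ gives $\frac{1}{n}\sum_{i=1}^{n}(Y_i - a - bX_i) = E[Y_i - a - bX_i] = 0$; the condition for $b$ follows in the same way, with an extra factor of $X_i$ from the chain rule.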
Review?
17
Solving for Estimates
Necessary conditions of the minimisation problem are:
(1) $E[Y_i - \alpha - \beta X_i] = 0$  (2) $E[(Y_i - \alpha - \beta X_i)X_i] = 0$
Condition (1) gives $\alpha = E[Y_i] - \beta E[X_i]$; substituting into condition (2) and rearranging:
$E[Y_i X_i] - \beta E[X_i^2] = \big(E[Y_i] - \beta E[X_i]\big)E[X_i]$
$E[Y_i X_i] - \beta E[X_i^2] = E[Y_i]E[X_i] - \beta\big(E[X_i]\big)^2$
$E[Y_i X_i] - E[Y_i]E[X_i] = \beta\big[E[X_i^2] - \big(E[X_i]\big)^2\big]$
$Cov(Y_i, X_i) = \beta\,Var(X_i)$
$\beta = \dfrac{Cov(Y_i, X_i)}{Var(X_i)}$
18
Solution
The regression line is the line $\alpha + \beta X$, where $\alpha$ and $\beta$ minimise
$E\big[(Y_i - a - bX_i)^2\big]$.
The solution is given by:
$\beta = \dfrac{Cov(Y_i, X_i)}{Var(X_i)} \qquad \alpha = E[Y_i] - \beta E[X_i]$
More formal derivations are discussed in Wooldridge Section 2.2 and in LT.
This estimator is called Ordinary Least Squares or OLS.
19
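The formulas can be verified numerically in Stata; a minimal sketch (variable names testscr and str as in the output on slide 23; beta, ybar, and alpha are hypothetical scalar names):

    quietly correlate testscr str, covariance
    matrix C = r(C)                    // 2x2 covariance matrix: testscr first, str second
    scalar beta = C[2,1]/C[2,2]        // Cov(Y,X) / Var(X)
    quietly summarize testscr
    scalar ybar = r(mean)
    quietly summarize str
    scalar alpha = ybar - beta*r(mean) // E(Y) - beta*E(X)
    display "beta = " beta "  alpha = " alpha   // compare with: regress testscr str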
Why Squared and not Absolute Value?
• This is not examinable.
• Squaring makes the line fit outliers more closely: an outlier is far from any candidate line, and its squared deviation penalises the fit much more than its absolute deviation would, so the line that minimises the sum of squared residuals does not tolerate residuals as large at outliers.
• Squared deviations are also easier to minimise, because the absolute value function is not differentiable everywhere.
• The least squares estimator has nice properties that we will continue to learn
throughout studying metrics.
• Nevertheless, there is an estimator that minimises the sum of absolute residuals, called LAD (least absolute deviations). It is beyond the scope of EC220; a Stata sketch follows below.
20
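Though beyond the scope of EC220, the LAD line can be fitted in Stata via median regression, which minimises the sum of absolute residuals; a one-line sketch (again assuming variables testscr and str):

    qreg testscr str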
Interpretation – 𝛼
$Y_i = \alpha + \beta X_i + e_i$
In the California class size data,
$\beta = \dfrac{Cov(X,Y)}{Var(X)} = \dfrac{-8.15932}{3.579} = -2.28$
and
$\alpha = E[Y] - (-2.28)E[X] = 654.16 - (-2.28)(19.64) = 698.9$.
The regression line is therefore $\hat{Y}_i = 698.9 - 2.28 X_i$.
21
Interpretation – 𝛽 and 𝑒
$Y_i = \alpha + \beta X_i + e_i$
In the California class size data, the regression line is:
$\hat{Y}_i = 698.9 - 2.28 X_i$
• $\alpha = 698.9$ is the vertical-axis intercept.
• $\beta = -2.28$ is the slope coefficient of the line: the change in $Y$ when $X$ increases by 1. For example, a district whose student-teacher ratio is higher by 1 scores, on average, 2.28 points lower.
22
Stata Output
$Y_i = \alpha + \beta X_i + e_i$
$\hat{Y}_i = 698.9 - 2.28 X_i$
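The output below is produced by Stata's regress command; a minimal sketch (assuming the California data are loaded):

    regress testscr str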
------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .4798256 -4.75 0.000 -3.22298 -1.336637
_cons | 698.933 9.467491 73.82 0.000 680.3231 717.5428
------------------------------------------------------------------------------
Since the dataset is also our population of interest, we can take these coefficients as the population parameters, so we do not need to pay attention to the standard errors reported here. We will learn how to use standard errors when estimating from samples next week.
23
Overview
The estimation equation (regression model)
$Y_i = \alpha + \beta X_i + e_i$
• $Y_i$ is the “dependent variable” or “outcome”.
• $X_i$ is the “independent variable”, a “regressor”, or a “covariate”.
• $\beta = \dfrac{Cov(X,Y)}{Var(X)}$ is the slope coefficient (often just “coefficient”).
• Analogously, because we have population data, the estimates of $\alpha$ and $\beta$ coincide with the population parameters.
• If we had a sample rather than a population of data, we would use $\hat{\alpha}$ and $\hat{\beta}$ to denote the estimates, and $\hat{e}$ to denote the residual. “Hats” denote estimates that may differ from population values.
24
Data Points and the Regression Equation
$Y_i = \alpha + \beta X_i + e_i$
25
Predicted Values and Residuals
$Y_i = \alpha + \beta X_i + e_i$
$\hat{Y}_i = \alpha + \beta X_i$ is the predicted value: the point on the regression line at $X_i$. The residual is the gap $e_i = Y_i - \hat{Y}_i$.
If $X_i = 9$, $\alpha = 4$, and $\beta = 0.2$:
$\hat{Y}_i = \alpha + \beta X_i = 4 + 0.2(9) = 5.8$
[Figure: scatterplot with regression line; two residuals, $e_1$ and $e_2$, marked as vertical distances to the line.]
27
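After running the regression, fitted values and residuals can be recovered in Stata; a minimal sketch (yhat and ehat are hypothetical variable names):

    quietly regress testscr str
    predict yhat              // fitted values: the point on the regression line
    predict ehat, residuals   // residuals: Y minus the fitted value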
Predicted Values are Defined by the Line
[Figure: at $X_i = 21$, the regression line gives the predicted value $\hat{Y}_i = 698.9 - 2.28(21) = 651.02$.]
28
Bivariate Regressions: Summary
• Bivariate relationships can be graphed in a scatterplot.
• This provides useful but sometimes overwhelming information.
• Linear regression is the best linear approximation to the data and to the CEF. It
neatly summarises the relationship between two variables in the regression slope
coefficient.
29