
ARE 107 Lecture Notes

Lecture 3

Dalia Ghanem, UC Davis

1
RECAP

2
Lecture 3 - Roadmap
So far...
▶ Predictive vs. Causal Modeling
▶ Randomized Control Trials
Causal Effect: E[Y1i − Y0i | Di = 1]
The conditional expectation given Di = 1 means we are taking the average
over individuals that received the treatment.

Now we want to turn to the linear regression model with cross-sectional
data and ask when the coefficients in the linear regression model have a
causal interpretation.
▶ Causal Effects and Selection Bias
▶ Causality and Randomized Control Trials (RCTs)
▶ Observational Data and Endogeneity Bias
Reading: Chapter 2 in Mastering Metrics (especially 2.3 onwards)

Note: (1) Recall that cross-sectional data means we have a random sample of
observations. (2) By "when" here we mean under what assumptions.

3
Linear Regression Model: Causal Effects and Selection Bias
We will start with a binary regressor, Di

Yi = α + βDi + ui

- Yi and Di are an outcome and a treatment variable (observable).
- ui is the residual or error term (unobservable).
Exercise 1. Define what the potential outcomes Y1i and Y0i are for
individual i in this model.

Using the definitions of Y1i and Y0i , define the causal effect.

Note: Can you see the key problem in identifying the causal effect in the linear model?

4
Linear Regression Model: Causal Effects and Selection Bias
In the linear model, β is the causal effect for an individual.
Exercise 2. Now decompose the difference in group means,
E [Yi |Di = 1] − E [Yi |Di = 0], into the average causal effect and selection bias
for the linear regression model.

5
Linear Regression Model: Causal Effects and Selection Bias
In the linear model Yi = α + βDi + ui , the difference in group means can be
decomposed into the following:

E[Yi | Di = 1] − E[Yi | Di = 0] = β + (E[ui | Di = 1] − E[ui | Di = 0])

where β is the Average Causal Effect and E[ui | Di = 1] − E[ui | Di = 0] is
the Selection Bias.

REMARKS:
▶ β is the Causal Effect: In the linear regression model, β is the causal
effect we are after.
▶ β is the be-all end-all: Note that β is not only the average causal
effect, but also the causal effect of D on Y for each individual i, i.e. the
causal effect is the same for everyone. This is a byproduct of the linear
model.

6
Linear Regression Model: Causal Effects and Selection Bias
In the linear model Yi = α + βDi + ui , the difference in group means can be
decomposed into the following:

E[Yi | Di = 1] − E[Yi | Di = 0] = β + (E[ui | Di = 1] − E[ui | Di = 0])

where β is the Average Causal Effect and E[ui | Di = 1] − E[ui | Di = 0] is
the Selection Bias.

REMARKS:
▶ Selection Bias: Consider the example where Y is income and D is
whether one has a college degree or not. We can think of ui as innate
ability, which is unobservable. Hence, selection bias here is the
difference in average innate ability between individuals that earn a
college degree and those who do not.
▶ Linear Regression of Y on D: Instead of comparing group means as
we did before, we can just run a regression of the outcome variable Y
on treatment status D to estimate the causal effect from an RCT.
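The selection-bias term is easiest to see in a small simulation. This is an illustrative sketch, not part of the lecture: the data-generating process (ui as innate ability, treatment more likely for high-ability individuals, and all parameter values) is assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
alpha, beta = 1.0, 2.0          # assumed true intercept and causal effect

u = rng.normal(size=n)          # "innate ability" (unobservable error term)
# Selection on ability: high-u individuals are more likely to be treated.
d = (u + rng.normal(size=n) > 0).astype(float)
y = alpha + beta * d + u

diff_means = y[d == 1].mean() - y[d == 0].mean()
selection_bias = u[d == 1].mean() - u[d == 0].mean()

# Difference in group means = beta + selection bias, not beta itself.
print(diff_means)               # noticeably larger than beta = 2.0
print(beta + selection_bias)    # matches diff_means
```

The two printed numbers coincide because the decomposition is an exact algebraic identity in the sample, while both exceed the true β: the treated group has higher average ability.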

7
Linear Regression Model: Causal Effects and Selection Bias

Random Assignment and Exogeneity

Recall that random assignment eliminates selection bias, i.e.

E[ui | Di = 1] − E[ui | Di = 0] = 0, i.e.

E[ui | Di = 1] = E[ui | Di = 0],

which is implied by the exogeneity assumption,

E[ui | Di ] = 0.

Gist: Random assignment of a treatment is exogeneity in regression.
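By contrast, assigning D by a coin flip makes it independent of ui. A minimal sketch of this point, again with simulated data and illustrative parameter values (both assumed, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
alpha, beta = 1.0, 2.0                        # assumed true parameters

u = rng.normal(size=n)                        # ability, unobservable
d = rng.integers(0, 2, size=n).astype(float)  # coin-flip (random) assignment
y = alpha + beta * d + u

# Random assignment: the two group means of u coincide (no selection bias),
# so the difference in group means recovers beta.
print(u[d == 1].mean() - u[d == 0].mean())    # approximately 0
print(y[d == 1].mean() - y[d == 0].mean())    # approximately beta = 2.0
```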

8
Linear Regression Model: Causal Effects and Selection Bias
Now we are ready to move beyond a binary regressor.
Consider the example of earnings (Y ) and years of schooling (X )

Yi = α + βXi + ui

β is the causal effect of a unit change in X .


Exercise 3. What is the causal effect for individual i of changing schooling
from 11 to 12, 15 to 16, and 17 to 18?

9
Linear Regression Model: Causal Effects and Selection Bias

Causal Effect and Selection Bias

Exercise 4. You are interested in learning the causal effect of completing
high school. Decompose the difference between the group means of those who
have a high school degree and those who only have 11 years of schooling,
E[Yi | Xi = 12] − E[Yi | Xi = 11], into the causal effect and selection bias.

10
Linear Regression Model: Causal Effects and Selection Bias

Selection Bias = Endogeneity Bias

E[Yi | Xi = 12] − E[Yi | Xi = 11] = β + (E[ui | Xi = 12] − E[ui | Xi = 11])

where β is the Causal Effect and E[ui | Xi = 12] − E[ui | Xi = 11] is the
Selection Bias.

Recall that exogeneity is E[ui | Xi ] = 0.

- For our example, average ability for individuals is the same regardless of
schooling, which implies that E[ui | Xi = 12] − E[ui | Xi = 11] = 0.
- In general: exogeneity implies that there is no selection bias.

The opposite of exogeneity is endogeneity, i.e. E[ui | Xi ] ≠ 0.

- For our example, endogeneity means that average ability depends on
years of schooling, which implies that E[ui | Xi = 12] ≠ E[ui | Xi = 11], i.e.
there is selection bias.
- In general: endogeneity bias is just another name for selection bias!

11
Linear Regression Model: Causal Effects and Selection Bias

Exogeneity and Ceteris Paribus


Consider the linear regression model

Yi = α + βXi + ui

Question: How does exogeneity relate to ceteris paribus in this context?
Using the linear model, the conditional expectation of Yi given Xi is given
by the following:

E[Yi | Xi ] = E[α + βXi + ui | Xi ] = α + βXi + E[ui | Xi ]     (1)

E[ui | Xi ] may change as we change Xi in the above! And if this happens,
then E[Yi | Xi ] is changing partly because of X directly and partly
indirectly through E[ui | Xi ].
But to obtain β, the causal effect, we have to change Xi only while holding
E[ui | Xi ] constant (ceteris paribus).

Exogeneity, E[ui | Xi ] = 0, implies that E[ui | Xi ] does not change as Xi
changes, so

E[Yi | Xi ] = α + βXi + E[ui | Xi ] = α + βXi

where the last term equals 0 and is thus held constant!

12
Linear Regression Model: Causal Effects and Selection Bias

SUMMING UP AND PRACTICAL IMPLICATIONS

▶ Randomized Control Trial: In this case, exogeneity E[ui | Xi ] = 0 holds
due to random assignment.
⇒ No selection bias. A regression of Y on X estimates β, the causal effect.

▶ Observational Data: In this case, endogeneity E[ui | Xi ] ≠ 0 is quite
likely.
⇒ Selection/endogeneity bias exists, so a regression of Y on X does
not estimate β, the causal effect.
E.g. the earnings and schooling example: how can average ability be the
same between college graduates and non-college graduates?

Next we want to get a better understanding of this bias...

13
ARE 107 Lecture Notes
Lecture 4

Dalia Ghanem, UC Davis

14
Lecture 4 - Roadmap
Last Lecture, we talked about the linear regression model with
cross-sectional data and asked when the coefficients in the linear
regression model have a causal interpretation.
▶ Causal Effects and Selection Bias ✓
▶ Causality and Randomized Control Trials (RCTs) ✓
▶ Observational Data and Endogeneity Bias
Question: How does this reflect on the OLS estimates?
- Formula for the OLS Estimator
- Omitted Variable Bias
- Application: Sales and Advertising
Reading: Chapter 2 in Mastering Metrics (especially 2.3 onwards)

15
Linear Regression Model: Observational Data

RECAP: What have we covered so far on causal modeling?

▶ Causal Effects and Randomized Control Trials
- Outcome Variable Y , Treatment Variable D
- Potential Outcomes Y0i and Y1i
- Causal Effect for Individual i: Y1i − Y0i
- Average Causal Effect and Selection Bias:

E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i − Y0i | Di = 1] + E[Y0i | Di = 1] − E[Y0i | Di = 0]

(Difference in Group Means = Average Causal Effect + Selection Bias)

⇒ Naive comparisons of treated and untreated groups will generally not
equal the average causal effect, e.g. comparing Americans with and
without health insurance.
- Random Assignment Eliminates Selection Bias:

E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i − Y0i | Di = 1]

(Difference in Group Means = Average Causal Effect)

⇒ Randomized Control Trials allow us to identify average causal
effects, e.g. the RAND experiment and the Oregon Lottery.

16
Linear Regression Model: Observational Data

RECAP: What have we covered so far on causal modeling?

▶ Linear Model, Causal Effects and Exogeneity
- Outcome Variable Y and Regressor X have a linear relationship:

Yi = α + βXi + ui

- Causal Effect of a Unit Change for an Individual i: β
- Average Causal Effect and Selection Bias:

E[Yi | Xi = 12] − E[Yi | Xi = 11] = β + E[ui | Xi = 12] − E[ui | Xi = 11]

(Difference in Group Means = Av. Causal Effect + Selection Bias)

- Exogeneity Eliminates Selection Bias:

E[Yi | Xi = 12] − E[Yi | Xi = 11] = β

(Difference in Group Means = Av. Causal Effect)

In practice, when we have a linear model, we just run a regression,
so now we want to know when the OLS coefficient is equal to β.

17
Linear Regression Model: Observational Data

A Tale of Two β’s

1. β

Yi = α + βXi + ui

If we can hold ui constant (ceteris paribus) and just change Xi by 1
unit, β will be the change in Yi caused by the change in Xi .

2. βOLS
The β that minimizes the sum of squared residuals. From ARE 106, you
learned that

βOLS = Cov(Xi , Yi ) / Var(Xi )

βOLS is a measure of correlation between Y and X !
Note: Cov(Yi , Xi ) is the covariance of Yi and Xi . The mathematical definition of covariance is
Cov(Yi , Xi ) = E[(Yi − E[Yi ])(Xi − E[Xi ])].
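The identity βOLS = Cov(X, Y)/Var(X) is easy to verify numerically. A minimal sketch, assuming simulated data; np.polyfit is used here simply as a stand-in for an OLS routine:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = 1.0 + 2.0 * x + rng.normal(size=1000)   # assumed linear model

# Slope via the covariance formula (population-style, ddof=0 throughout).
slope_formula = np.cov(x, y, ddof=0)[0, 1] / np.var(x)

# Slope via a least-squares fit of y on x; polyfit returns [slope, intercept].
slope_lstsq = np.polyfit(x, y, deg=1)[0]

print(slope_formula, slope_lstsq)           # identical up to rounding
```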

18
Linear Regression Model: Observational Data
Exercise 5. Decompose βOLS into β, the causal effect, and selection bias.

19
Linear Regression Model: Observational Data
Now we have shown that the OLS coefficient on X consists of the average
causal effect as well as a selection bias term as follows:

βOLS = β + Cov(Xi , ui ) / Var(Xi )

where β is the Average Causal Effect and Cov(Xi , ui )/Var(Xi ) is the
Selection Bias.

▶ If exogeneity holds, i.e. X is exogenous (e.g. randomly assigned), then

E[ui | Xi ] = 0 ⇒ Cov(Xi , ui ) = 0.

Hence, βOLS = β. The OLS coefficient equals the average causal effect.

▶ If exogeneity does NOT hold, i.e. X is endogenous, then

Cov(Xi , ui ) ≠ 0.

Hence, βOLS ≠ β. The OLS coefficient only captures a correlation and
does not have a causal interpretation.
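This decomposition can be checked in a simulation with an endogenous regressor. The schooling/ability data-generating process below is an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta = 2.0                                   # assumed true causal effect

ability = rng.normal(size=n)
x = 12 + 2 * ability + rng.normal(size=n)    # schooling rises with ability (assumed DGP)
u = ability                                  # error term contains ability -> X endogenous
y = 1.0 + beta * x + u

# Sample OLS slope via the covariance formula (ddof=0 throughout).
beta_ols = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
bias = np.cov(x, u, ddof=0)[0, 1] / np.var(x)

print(beta_ols)          # biased away from beta = 2.0
print(beta + bias)       # identical: beta_OLS = beta + Cov(X,u)/Var(X)
```

In the sample the identity holds exactly (covariance is linear in its arguments), so the two printed numbers agree to machine precision while both differ from the true β.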

20
Linear Regression Model: Observational Data

Why Distinguish Between Causation and Correlation?

Example
A manager has data on a product’s sales (1,000 units) and spending on TV,
newspaper, and radio advertising (1,000 dollars) in 200 markets where the
product is sold. The manager wants to decide whether to increase spending on
newspaper advertising or not. He/she regresses sales on newspaper advertising
and finds the following relationship.

Salesi = 12.351 + 0.055 Newspaperi + errori
S.E.       (0.621)  (0.017)

Question: Is the coefficient on Newspaper significant?

21
Linear Regression Model: Observational Data

Why Distinguish between Causation and Correlation?

Exercise 6. What does the magnitude of the coefficient on Newspaper
mean? Based on this magnitude, would you recommend that the manager
increase spending on newspaper advertising? Why or why not?

22
Linear Regression Model: Observational Data

Omitted Variable Bias

Consider a situation where sales (Y ) depend not only on newspaper
advertising (X ) but also on radio advertising (W ), such that

Yi = α + βXi + γWi + εi

Exercise 7. Instead of running a regression of Y on X and W , you only
run a short regression of Y on X as above; we will refer to the coefficient
from this regression as βSR . Decompose βSR into β and other terms.

23
Linear Regression Model: Observational Data

Omitted Variable Bias

Now we have decomposed βSR as follows:

βSR = β + γ · Cov(Wi , Xi )/Var(Xi ) + Cov(εi , Xi )/Var(Xi )

where β is the Av. Causal Effect, γ · Cov(Wi , Xi )/Var(Xi ) is the Omitted
Variable Bias, and the last term is 0 if Cov(εi , Xi ) = 0.

Let us take a closer look at the omitted variable bias (OVB) formula:

OVB = γ × Cov(Wi , Xi )/Var(Xi )

where γ is the coefficient on W in the Y equation and the second factor is
yet to be interpreted.

Question: What does the second term in OVB look like?
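Before interpreting the terms, the decomposition itself can be verified on simulated data. Everything below (parameter values and the data-generating process) is assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
beta, gamma = 0.5, 2.0               # assumed true coefficients

x = rng.normal(size=n)
w = 0.3 * x + rng.normal(size=n)     # omitted variable correlated with X
eps = rng.normal(size=n)             # independent of X, so Cov(eps, X) ~ 0
y = 1.0 + beta * x + gamma * w + eps

# Short regression of Y on X only (slope via the covariance formula).
beta_sr = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
# OVB formula: gamma times the slope of W on X.
ovb = gamma * np.cov(w, x, ddof=0)[0, 1] / np.var(x)

print(beta_sr, beta + ovb)   # differ only by the small Cov(eps, X) term
```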

24
Linear Regression Model: Observational Data

Omitted Variable Bias

OVB = γ × Cov(Wi , Xi )/Var(Xi )

where γ is the coefficient on W in the Y equation and Cov(Wi , Xi )/Var(Xi )
is the OLS coefficient from regressing W on X .

The sign of the OVB is...

▶ positive if:
(1)
(2)

▶ negative if:
(1)
(2)

▶ zero if:

25
Linear Regression Model: Observational Data

Omitted Variable Bias

Now back to our example, recall the short regression of sales on newspaper
advertising

Salesi = 12.351 + 0.055 Newspaperi + errori
S.E.       (0.621)  (0.017)

Now let us also include radio advertising, the long regression:

Salesi = 9.189 + 0.007 Newspaperi + 0.199 Radioi + errori
S.E.       (0.628)  (0.015)            (0.022)

Question:
▶ Is the coefficient on Newspaper significant in the long regression?
▶ Is the coefficient on Radio significant in the long regression?

26
Linear Regression Model: Observational Data

Omitted Variable Bias

Salesi = 12.351 + 0.055Newspaperi + errori

Salesi = 9.189 + 0.007Newspaperi + 0.199Radioi + errori

Radioi = 15.888 + 0.241Newspaperi + errori

Exercise 8. Using the OVB formula, can you calculate the bias due to
omitting Radio from the short regression?

What is the difference between the coefficient on Newspaper from the short
vs. the long regression?
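As a check on the arithmetic (this only plugs the coefficients reported above into the OVB formula; it does not re-estimate anything):

```python
# Reported coefficients:
#   short regression:  Sales = 12.351 + 0.055*Newspaper
#   long regression:   Sales =  9.189 + 0.007*Newspaper + 0.199*Radio
#   auxiliary:         Radio = 15.888 + 0.241*Newspaper
gamma = 0.199            # coefficient on Radio in the long regression
delta = 0.241            # coefficient on Newspaper in the Radio regression

ovb = gamma * delta
print(round(ovb, 3))             # 0.048
print(round(0.055 - 0.007, 3))   # 0.048 = beta_SR - beta_LR
```

The OVB formula reproduces the gap between the short- and long-regression coefficients on Newspaper almost exactly (up to rounding of the reported estimates).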

27
Linear Regression Model: Observational Data

Omitted Variable Bias

Yi = α + βXi + γWi + εi

OVB = βSR − βLR = γ · Cov(Wi , Xi )/Var(Xi )

where βSR is the OLS coefficient on X in the short regression of Y on X ,
and βLR is the OLS coefficient on X from the long regression of Y on X
and W .

REMARKS:
▶ The above shows that the OVB formula exactly measures the bias due to
omitting a variable.
▶ The omitted variable bias formula is particularly powerful when we
cannot observe or measure W . In this case, we can use the formula to
figure out the bias from omitting W .

28
Table: Sales and Advertising: Short and Long Regressions

Outcome variable: Sales
                 (1)         (2)         (3)         (4)
Newspaper      0.055∗∗∗    0.044∗∗∗    0.007      −0.001
               (0.017)     (0.010)     (0.015)     (0.006)
TV                         0.047∗∗∗                0.046∗∗∗
                           (0.003)                 (0.001)
Radio                                  0.199∗∗∗    0.189∗∗∗
                                       (0.022)     (0.009)
Constant      12.351∗∗∗    5.775∗∗∗    9.189∗∗∗    2.939∗∗∗
               (0.621)     (0.525)     (0.628)     (0.312)
Observations     200         200         200         200
R2              0.052       0.646       0.333       0.897

29
