
ARE 107 Lecture Notes

Lecture 3

Dalia Ghanem, UC Davis

1
RECAP

2
Lecture 3 - Roadmap
So far...
▶ Predictive vs. Causal Modeling
▶ Randomized Control Trials
Causal Effect: E[Y1i − Y0i | Di = 1]
The conditional expectation given Di = 1 means we are taking the average
over individuals that received the treatment.

Now we want to turn to the linear regression model with cross-sectional
data and ask when the coefficients in the linear regression model have a
causal interpretation.
▶ Causal Effects and Selection Bias
▶ Causality and Randomized Control Trials (RCTs)
▶ Observational Data and Endogeneity Bias
Reading: Chapter 2 in Mastering Metrics (especially 2.3 onwards)

Note: (1) Recall that cross-sectional data means we have a random sample of
observations. (2) By "when" here we mean under what assumptions.

3
Linear Regression Model: Causal Effects and Selection Bias
We will start with a binary regressor, Di

Yi = α + βDi + ui

- Yi and Di are an outcome and a treatment variable (observable).
- ui is the residual or error term (unobservable).
Exercise 1. Define what the potential outcomes Y1i and Y0i are for
individual i in this model.

Using the definitions of Y1i and Y0i , define the causal effect.

Note: Can you see the key problem in identifying the causal effect in the linear model?

4
Linear Regression Model: Causal Effects and Selection Bias
In the linear model, β is the causal effect for an individual.
Exercise 2. Now decompose the difference in group means,
E [Yi |Di = 1] − E [Yi |Di = 0], into the average causal effect and selection bias
for the linear regression model.

5
Linear Regression Model: Causal Effects and Selection Bias
In the linear model Yi = α + βDi + ui , the difference in group means can be
decomposed into the following:

E[Yi | Di = 1] − E[Yi | Di = 0] = β + (E[ui | Di = 1] − E[ui | Di = 0])

where β is the Average Causal Effect and E[ui | Di = 1] − E[ui | Di = 0] is
the Selection Bias.

REMARKS:
▶ β is the Causal Effect: In the linear regression model, β is the causal
effect we are after.
▶ β is the be-all end-all: Note that β is not only the average causal
effect, but also the causal effect of D on Y for each individual i, i.e. the
causal effect is the same for everyone. This is a byproduct of the linear
model.

6
Linear Regression Model: Causal Effects and Selection Bias
In the linear model Yi = α + βDi + ui , the difference in group means can be
decomposed into the following:

E[Yi | Di = 1] − E[Yi | Di = 0] = β + (E[ui | Di = 1] − E[ui | Di = 0])

where β is the Average Causal Effect and E[ui | Di = 1] − E[ui | Di = 0] is
the Selection Bias.

REMARKS:
▶ Selection Bias: Consider the example where Y is income and D is
whether one has a college degree or not. We can think of ui as innate
ability, which is unobservable. Hence, selection bias here is the
difference in average innate ability between individuals that earn a
college degree and those who do not.
▶ Linear Regression of Y on D: Instead of comparing group means as
we did before, we can just run a regression of the outcome variable Y
on treatment status D to estimate the causal effect from an RCT.
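The selection-bias term is easiest to see in a small simulation. This is an illustrative sketch, not part of the lecture: the data-generating process (ui as innate ability, treatment more likely for high-ability individuals, and all parameter values) is assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
alpha, beta = 1.0, 2.0          # assumed true intercept and causal effect

u = rng.normal(size=n)          # "innate ability" (unobservable error term)
# Selection on ability: high-u individuals are more likely to be treated.
d = (u + rng.normal(size=n) > 0).astype(float)
y = alpha + beta * d + u

diff_means = y[d == 1].mean() - y[d == 0].mean()
selection_bias = u[d == 1].mean() - u[d == 0].mean()

# Difference in group means = beta + selection bias, not beta itself.
print(diff_means)               # noticeably larger than beta = 2.0
print(beta + selection_bias)    # matches diff_means
```

The two printed numbers coincide because the decomposition is an exact algebraic identity in the sample, while both exceed the true β: the treated group has higher average ability.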

7
Linear Regression Model: Causal Effects and Selection Bias

Random Assignment and Exogeneity

Recall that random assignment eliminates selection bias, i.e.

E[ui | Di = 1] − E[ui | Di = 0] = 0, i.e.

E[ui | Di = 1] = E[ui | Di = 0],

which is implied by the exogeneity assumption,

E[ui | Di ] = 0.

Gist: Random assignment of a treatment is exogeneity in regression.
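By contrast, assigning D by a coin flip makes it independent of ui. A minimal sketch of this point, again with simulated data and illustrative parameter values (both assumed, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
alpha, beta = 1.0, 2.0                        # assumed true parameters

u = rng.normal(size=n)                        # ability, unobservable
d = rng.integers(0, 2, size=n).astype(float)  # coin-flip (random) assignment
y = alpha + beta * d + u

# Random assignment: the two group means of u coincide (no selection bias),
# so the difference in group means recovers beta.
print(u[d == 1].mean() - u[d == 0].mean())    # approximately 0
print(y[d == 1].mean() - y[d == 0].mean())    # approximately beta = 2.0
```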

8
Linear Regression Model: Causal Effects and Selection Bias
Now we are ready to move beyond a binary regressor.
Consider the example of earnings (Y ) and years of schooling (X )

Yi = α + βXi + ui

β is the causal effect of a unit change in X .


Exercise 3. What is the causal effect for individual i of changing schooling
from 11 to 12, 15 to 16, and 17 to 18?

9
Linear Regression Model: Causal Effects and Selection Bias

Causal Effect and Selection Bias

Exercise 4. You are interested in learning the causal effect of completing
high school. Decompose the difference between the group means of those who
have a high school degree and those who only have 11 years of schooling,
E[Yi | Xi = 12] − E[Yi | Xi = 11], into the causal effect and selection bias.

10
Linear Regression Model: Causal Effects and Selection Bias

Selection Bias = Endogeneity Bias

E[Yi | Xi = 12] − E[Yi | Xi = 11] = β + (E[ui | Xi = 12] − E[ui | Xi = 11])

where β is the Causal Effect and E[ui | Xi = 12] − E[ui | Xi = 11] is the
Selection Bias.

Recall that exogeneity is E[ui | Xi ] = 0.

- For our example, average ability for individuals is the same regardless of
schooling, which implies that E[ui | Xi = 12] − E[ui | Xi = 11] = 0.
- In general: exogeneity implies that there is no selection bias.

The opposite of exogeneity is endogeneity, i.e. E[ui | Xi ] ≠ 0.

- For our example, endogeneity means that average ability depends on
years of schooling, which implies that E[ui | Xi = 12] ≠ E[ui | Xi = 11], i.e.
there is selection bias.
- In general: endogeneity bias is just another name for selection bias!

11
Linear Regression Model: Causal Effects and Selection Bias

Exogeneity and Ceteris Paribus


Consider the linear regression model

Yi = α + βXi + ui

Question: How does exogeneity relate to ceteris paribus in this context?
Using the linear model, the conditional expectation of Yi given Xi is given
by the following:

E[Yi | Xi ] = E[α + βXi + ui | Xi ] = α + βXi + E[ui | Xi ]     (1)

E[ui | Xi ] may change as we change Xi in the above! And if this happens,
then E[Yi | Xi ] is changing partly because of X directly and partly
indirectly through E[ui | Xi ].
But to obtain β, the causal effect, we have to change Xi only while holding
E[ui | Xi ] constant (ceteris paribus).

Exogeneity, E[ui | Xi ] = 0, implies that E[ui | Xi ] does not change as Xi
changes, so

E[Yi | Xi ] = α + βXi + E[ui | Xi ] = α + βXi

where the last term equals 0 and is thus held constant!

12
Linear Regression Model: Causal Effects and Selection Bias

SUMMING UP AND PRACTICAL IMPLICATIONS

▶ Randomized Control Trial: In this case, exogeneity E[ui | Xi ] = 0 holds
due to random assignment.
⇒ No selection bias. A regression of Y on X estimates β, the causal effect.

▶ Observational Data: In this case, endogeneity E[ui | Xi ] ≠ 0 is quite
likely.
⇒ Selection/endogeneity bias exists, so a regression of Y on X does
not estimate β, the causal effect.
E.g. the earnings and schooling example: how can average ability be the
same between college graduates and non-college graduates?

Next we want to get a better understanding of this bias...

13
ARE 107 Lecture Notes
Lecture 4

Dalia Ghanem, UC Davis

14
Lecture 4 - Roadmap
Last Lecture, we talked about the linear regression model with
cross-sectional data and asked when the coefficients in the linear
regression model have a causal interpretation.
▶ Causal Effects and Selection Bias ✓
▶ Causality and Randomized Control Trials (RCTs) ✓
▶ Observational Data and Endogeneity Bias
Question: How does this reflect on the OLS estimates?
- Formula for the OLS Estimator
- Omitted Variable Bias
- Application: Sales and Advertising
Reading: Chapter 2 in Mastering Metrics (especially 2.3 onwards)

15
Linear Regression Model: Observational Data

RECAP: What have we covered so far on causal modeling?

▶ Causal Effects and Randomized Control Trials
- Outcome Variable Y , Treatment Variable D
- Potential Outcomes Y0i and Y1i
- Causal Effect for Individual i: Y1i − Y0i
- Average Causal Effect and Selection Bias:

E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i − Y0i | Di = 1] + E[Y0i | Di = 1] − E[Y0i | Di = 0]

(Difference in Group Means = Average Causal Effect + Selection Bias)

⇒ Naive comparisons of treated and untreated groups will generally not
equal the average causal effect, e.g. comparing Americans with and
without health insurance.
- Random Assignment Eliminates Selection Bias:

E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i − Y0i | Di = 1]

(Difference in Group Means = Average Causal Effect)

⇒ Randomized Control Trials allow us to identify average causal
effects, e.g. the RAND experiment and the Oregon Lottery.

16
Linear Regression Model: Observational Data

RECAP: What have we covered so far on causal modeling?

▶ Linear Model, Causal Effects and Exogeneity
- Outcome Variable Y and Regressor X have a linear relationship:

Yi = α + βXi + ui

- Causal Effect of a Unit Change for an Individual i: β
- Average Causal Effect and Selection Bias:

E[Yi | Xi = 12] − E[Yi | Xi = 11] = β + E[ui | Xi = 12] − E[ui | Xi = 11]

(Difference in Group Means = Av. Causal Effect + Selection Bias)

- Exogeneity Eliminates Selection Bias:

E[Yi | Xi = 12] − E[Yi | Xi = 11] = β

(Difference in Group Means = Av. Causal Effect)

In practice, when we have a linear model, we just run a regression,
so now we want to know when the OLS coefficient is equal to β.

17
Linear Regression Model: Observational Data

A Tale of Two β’s

1. β

Yi = α + βXi + ui

If we can hold ui constant (ceteris paribus) and just change Xi by 1
unit, β will be the change in Yi caused by the change in Xi .

2. βOLS
The β that minimizes the sum of squared residuals. From ARE 106, you
learned that

βOLS = Cov(Xi , Yi ) / Var(Xi )

βOLS is a measure of correlation between Y and X !
Note: Cov(Yi , Xi ) is the covariance of Yi and Xi . The mathematical definition of covariance is
Cov(Yi , Xi ) = E[(Yi − E[Yi ])(Xi − E[Xi ])].
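The identity βOLS = Cov(X, Y)/Var(X) is easy to verify numerically. A minimal sketch, assuming simulated data; np.polyfit is used here simply as a stand-in for an OLS routine:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = 1.0 + 2.0 * x + rng.normal(size=1000)   # assumed linear model

# Slope via the covariance formula (population-style, ddof=0 throughout).
slope_formula = np.cov(x, y, ddof=0)[0, 1] / np.var(x)

# Slope via a least-squares fit of y on x; polyfit returns [slope, intercept].
slope_lstsq = np.polyfit(x, y, deg=1)[0]

print(slope_formula, slope_lstsq)           # identical up to rounding
```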

18
Linear Regression Model: Observational Data
Exercise 5. Decompose βOLS into β, the causal effect, and selection bias.

19
Linear Regression Model: Observational Data
Now we have shown that the OLS coefficient on X consists of the average
causal effect as well as a selection bias term as follows:

βOLS = β + Cov(Xi , ui ) / Var(Xi )

where β is the Average Causal Effect and Cov(Xi , ui )/Var(Xi ) is the
Selection Bias.

▶ If exogeneity holds, i.e. X is exogenous (e.g. randomly assigned), then

E[ui | Xi ] = 0 ⇒ Cov(Xi , ui ) = 0.

Hence, βOLS = β. The OLS coefficient equals the average causal effect.

▶ If exogeneity does NOT hold, i.e. X is endogenous, then

Cov(Xi , ui ) ≠ 0.

Hence, βOLS ≠ β. The OLS coefficient only captures a correlation and
does not have a causal interpretation.
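This decomposition can be checked in a simulation with an endogenous regressor. The schooling/ability data-generating process below is an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta = 2.0                                   # assumed true causal effect

ability = rng.normal(size=n)
x = 12 + 2 * ability + rng.normal(size=n)    # schooling rises with ability (assumed DGP)
u = ability                                  # error term contains ability -> X endogenous
y = 1.0 + beta * x + u

# Sample OLS slope via the covariance formula (ddof=0 throughout).
beta_ols = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
bias = np.cov(x, u, ddof=0)[0, 1] / np.var(x)

print(beta_ols)          # biased away from beta = 2.0
print(beta + bias)       # identical: beta_OLS = beta + Cov(X,u)/Var(X)
```

In the sample the identity holds exactly (covariance is linear in its arguments), so the two printed numbers agree to machine precision while both differ from the true β.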

20
Linear Regression Model: Observational Data

Why Distinguish Between Causation and Correlation?

Example
A manager has data on a product’s sales (1,000 units) and spending on TV,
newspaper, and radio advertising (1,000 dollars) in 200 markets where the
product is sold. The manager wants to decide whether to increase spending on
newspaper advertising or not. He/she regresses sales on newspaper advertising
and finds the following relationship.

Salesi = 12.351 + 0.055 Newspaperi + errori
S.E.       (0.621)  (0.017)

Question: Is the coefficient on Newspaper significant?

21
Linear Regression Model: Observational Data

Why Distinguish between Causation and Correlation?

Exercise 6. What does the magnitude of the coefficient on Newspaper
mean? Based on this magnitude, would you recommend that the manager
increase spending on newspaper advertising? Why or why not?

22
Linear Regression Model: Observational Data

Omitted Variable Bias

Consider a situation where sales (Y ) depend not only on newspaper
advertising (X ) but also on radio advertising (W ), such that

Yi = α + βXi + γWi + εi

Exercise 7. Instead of running a regression of Y on X and W , you only
run a short regression of Y on X as above; we will refer to the coefficient
from this regression as βSR . Decompose βSR into β and other terms.

23
Linear Regression Model: Observational Data

Omitted Variable Bias

Now we have decomposed βSR as follows:

βSR = β + γ · Cov(Wi , Xi )/Var(Xi ) + Cov(εi , Xi )/Var(Xi )

where β is the Av. Causal Effect, γ · Cov(Wi , Xi )/Var(Xi ) is the Omitted
Variable Bias, and the last term is 0 if Cov(εi , Xi ) = 0.

Let us take a closer look at the omitted variable bias (OVB) formula:

OVB = γ × Cov(Wi , Xi )/Var(Xi )

where γ is the coefficient on W in the Y equation and the second factor is
yet to be interpreted.

Question: What does the second term in OVB look like?
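Before interpreting the terms, the decomposition itself can be verified on simulated data. Everything below (parameter values and the data-generating process) is assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000
beta, gamma = 0.5, 2.0               # assumed true coefficients

x = rng.normal(size=n)
w = 0.3 * x + rng.normal(size=n)     # omitted variable correlated with X
eps = rng.normal(size=n)             # independent of X, so Cov(eps, X) ~ 0
y = 1.0 + beta * x + gamma * w + eps

# Short regression of Y on X only (slope via the covariance formula).
beta_sr = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
# OVB formula: gamma times the slope of W on X.
ovb = gamma * np.cov(w, x, ddof=0)[0, 1] / np.var(x)

print(beta_sr, beta + ovb)   # differ only by the small Cov(eps, X) term
```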

24
Linear Regression Model: Observational Data

Omitted Variable Bias

OVB = γ × Cov(Wi , Xi )/Var(Xi )

where γ is the coefficient on W in the Y equation and Cov(Wi , Xi )/Var(Xi )
is the OLS coefficient from regressing W on X .

The sign of the OVB is...

▶ positive if:
(1)
(2)

▶ negative if:
(1)
(2)

▶ zero if:

25
Linear Regression Model: Observational Data

Omitted Variable Bias

Now back to our example, recall the short regression of sales on newspaper
advertising

Salesi = 12.351 + 0.055 Newspaperi + errori
S.E.       (0.621)  (0.017)

Now let us also include radio advertising, the long regression:

Salesi = 9.189 + 0.007 Newspaperi + 0.199 Radioi + errori
S.E.       (0.628)  (0.015)            (0.022)

Question:
▶ Is the coefficient on Newspaper significant in the long regression?
▶ Is the coefficient on Radio significant in the long regression?

26
Linear Regression Model: Observational Data

Omitted Variable Bias

Salesi = 12.351 + 0.055Newspaperi + errori

Salesi = 9.189 + 0.007Newspaperi + 0.199Radioi + errori

Radioi = 15.888 + 0.241Newspaperi + errori

Exercise 8. Using the OVB formula, can you calculate the bias due to
omitting Radio from the short regression?

What is the difference between the coefficient on Newspaper from the short
vs. the long regression?
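As a check on the arithmetic (this only plugs the coefficients reported above into the OVB formula; it does not re-estimate anything):

```python
# Reported coefficients:
#   short regression:  Sales = 12.351 + 0.055*Newspaper
#   long regression:   Sales =  9.189 + 0.007*Newspaper + 0.199*Radio
#   auxiliary:         Radio = 15.888 + 0.241*Newspaper
gamma = 0.199            # coefficient on Radio in the long regression
delta = 0.241            # coefficient on Newspaper in the Radio regression

ovb = gamma * delta
print(round(ovb, 3))             # 0.048
print(round(0.055 - 0.007, 3))   # 0.048 = beta_SR - beta_LR
```

The OVB formula reproduces the gap between the short- and long-regression coefficients on Newspaper almost exactly (up to rounding of the reported estimates).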

27
Linear Regression Model: Observational Data

Omitted Variable Bias

Yi = α + βXi + γWi + εi

OVB = βSR − βLR = γ · Cov(Wi , Xi )/Var(Xi )

where βSR is the OLS coefficient on X in the short regression of Y on X ,
and βLR is the OLS coefficient on X from the long regression of Y on X
and W .

REMARKS:
▶ The above shows that the OVB formula exactly measures the bias due to
omitting a variable.
▶ The omitted variable bias formula is particularly powerful when we
cannot observe or measure W . In this case, we can use the formula to
figure out the bias from omitting W .

28
Table: Sales and Advertising: Short and Long Regressions

Outcome variable: Sales
                 (1)         (2)         (3)         (4)
Newspaper      0.055∗∗∗    0.044∗∗∗    0.007      −0.001
               (0.017)     (0.010)     (0.015)     (0.006)
TV                         0.047∗∗∗                0.046∗∗∗
                           (0.003)                 (0.001)
Radio                                  0.199∗∗∗    0.189∗∗∗
                                       (0.022)     (0.009)
Constant      12.351∗∗∗    5.775∗∗∗    9.189∗∗∗    2.939∗∗∗
               (0.621)     (0.525)     (0.628)     (0.312)
Observations     200         200         200         200
R2              0.052       0.646       0.333       0.897

29
