Regression Discontinuity Designs

Kosuke Imai

Harvard University


Fall 2019

Observational Studies
In many cases, we cannot randomize the treatment assignment
ethical constraints
logistical constraints
But, some important questions demand empirical evidence even
though we cannot conduct randomized experiments!
Designing observational studies → find a setting where credible
causal inference is possible
Key = Knowledge of treatment assignment mechanism
Regression discontinuity design (RD Design):
1 Sharp RD Design: treatment assignment is based on a
deterministic rule
2 Fuzzy RD Design: encouragement to receive treatment is based on
a deterministic rule
Originates from a study of the effect of scholarships on students’
career plans (Thistlethwaite and Campbell. 1960. J. of Educ. Psychol)
Regression Discontinuity Design
Idea: Find an arbitrary cutpoint c which determines the treatment
assignment such that Ti = 1{Xi ≥ c}
Close elections as RD design (Lee et al. 2004. Q. J. Econ):

[Figure (Lee et al. 2004): Total Effect of Initial Win on Future ADA Scores: γ. The figure plots ADA scores after the election at time t + 1 against the Democrat vote share at time t. Each circle is the average ADA score within intervals of 0.01 in Democrat vote share.]
E(Yi (1) − Yi (0) | Xi = c)
Assumption: E(Yi (t) | Xi = x) is continuous in x for t = 0, 1
deterministic rather than stochastic treatment assignment
violation of the overlap assumption: 0 < Pr(Ti = 1 | Xi = x) < 1 for all x
RD design is all about extrapolation

Regression modeling:

\[
E(Y_i(1) \mid X_i = c) = \lim_{x \downarrow c} E(Y_i(1) \mid X_i = x) = \lim_{x \downarrow c} E(Y_i \mid X_i = x)
\]
\[
E(Y_i(0) \mid X_i = c) = \lim_{x \uparrow c} E(Y_i(0) \mid X_i = x) = \lim_{x \uparrow c} E(Y_i \mid X_i = x)
\]

Advantage: internal validity

Disadvantage: external validity
Make sure nothing else is going on at Xi = c
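As a worked step, the two limits on this slide combine into a single expression for the estimand in terms of observable quantities. The label \tau_{\mathrm{RD}} is introduced here only for illustration and does not appear in the original slides.

```latex
\begin{align*}
\tau_{\mathrm{RD}}
  &= E(Y_i(1) - Y_i(0) \mid X_i = c) \\
  &= \lim_{x \downarrow c} E(Y_i(1) \mid X_i = x)
     - \lim_{x \uparrow c} E(Y_i(0) \mid X_i = x)
     && \text{(continuity assumption)} \\
  &= \lim_{x \downarrow c} E(Y_i \mid X_i = x)
     - \lim_{x \uparrow c} E(Y_i \mid X_i = x)
     && \text{(since $T_i = 1\{X_i \ge c\}$)}
\end{align*}
```

The last line involves only the observed outcome, which is why the problem reduces to estimating two regression functions at a boundary point.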
Analysis Methods under the RD Design
Simple linear regression within a window
How should we choose a window in a principled manner?
How should we relax the functional form assumption?
higher-order polynomial regression not robust to outliers

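Local linear regression (same h for both sides): better behavior at
the boundary than other nonparametric regressions

\[
(\hat\alpha_+, \hat\beta_+) = \operatorname*{argmin}_{\alpha, \beta} \sum_{i=1}^n 1\{X_i > c\} \{Y_i - \alpha - (X_i - c)\beta\}^2 \cdot K\left(\frac{X_i - c}{h}\right)
\]
\[
(\hat\alpha_-, \hat\beta_-) = \operatorname*{argmin}_{\alpha, \beta} \sum_{i=1}^n 1\{X_i < c\} \{Y_i - \alpha - (X_i - c)\beta\}^2 \cdot K\left(\frac{X_i - c}{h}\right)
\]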

Weighted regression with a kernel function of one’s choice:

uniform kernel: K (u) = (1/2) · 1{|u| < 1}
triangular kernel: K (u) = (1 − |u|)1{|u| < 1}
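The kernel-weighted least squares problems on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names (`boundary_intercept`, `sharp_rd_estimate`) and the simulated data are assumptions made for the example, and no standard errors or bandwidth selection are included.

```python
import numpy as np


def uniform_kernel(u):
    """K(u) = 1/2 * 1{|u| < 1}."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1, 0.5, 0.0)


def triangular_kernel(u):
    """K(u) = (1 - |u|) * 1{|u| < 1}."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1, 1.0 - np.abs(u), 0.0)


def boundary_intercept(x, y, c, h, above, kernel):
    """Kernel-weighted least squares of y on (1, x - c), using one side of c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    side = (x > c) if above else (x < c)
    w = kernel((x - c) / h) * side
    keep = w > 0
    if keep.sum() < 2:
        raise ValueError("need at least two observations inside the bandwidth")
    design = np.column_stack([np.ones(keep.sum()), x[keep] - c])
    # Weighted least squares via rescaling rows by sqrt(weight).
    root_w = np.sqrt(w[keep])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y[keep] * root_w, rcond=None)
    return coef[0]  # intercept = fitted value at the cutoff


def sharp_rd_estimate(x, y, c, h, kernel=triangular_kernel):
    """Difference of the two boundary intercepts (same h on both sides)."""
    return (boundary_intercept(x, y, c, h, True, kernel)
            - boundary_intercept(x, y, c, h, False, kernel))


# Hypothetical noiseless data with a jump of 2 at the cutoff c = 0.
x = np.linspace(-1, 1, 400)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0)
estimate_triangular = sharp_rd_estimate(x, y, c=0.0, h=0.3)
estimate_uniform = sharp_rd_estimate(x, y, c=0.0, h=0.3, kernel=uniform_kernel)
```

With the uniform kernel the estimator reduces to a simple linear regression within the window [c − h, c + h], fitted separately on each side.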

Optimal Bandwidth (Imbens and Kalyanaraman. 2012. Rev. Econ. Stud.)
Choose the bandwidth by minimizing the MSE:
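\[
\begin{aligned}
\mathrm{MSE} &= E[\{(\hat\alpha_+ - \hat\alpha_-) - (\alpha_+ - \alpha_-)\}^2 \mid X] \\
&= E\{(\hat\alpha_+ - \alpha_+)^2 \mid X\} + E\{(\hat\alpha_- - \alpha_-)^2 \mid X\} \\
&\quad - 2 \cdot E(\hat\alpha_+ - \alpha_+ \mid X) \cdot E(\hat\alpha_- - \alpha_- \mid X) \\
&= (\mathrm{Bias}_+ - \mathrm{Bias}_-)^2 + \mathrm{Variance}_+ + \mathrm{Variance}_-
\end{aligned}
\]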

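Bias and variance of local linear regression estimator at the boundary:

Bias = E(m̂(0) | X) − m(0), Variance = V(m̂(0) | X)
where m(x) = E(Yi | Xi = x), m̂(x) = α̂(x), and
\[
(\hat\alpha(x), \hat\beta(x)) = \operatorname*{argmin}_{\alpha, \beta} \sum_{i=1}^n \{Y_i - \alpha - \beta(X_i - x)\}^2 \cdot K\left(\frac{X_i - x}{h}\right)
\]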

Refinements, e.g., bias correction (Calonico et al. 2014. Econometrica)
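The steps behind the MSE decomposition on this slide can be spelled out as follows. The shorthand A and B is introduced here for illustration, and the factorization of the cross term assumes independent observations, so that the estimates from the two sides of the cutoff are independent given X.

```latex
\begin{align*}
A &= \hat\alpha_+ - \alpha_+, \qquad B = \hat\alpha_- - \alpha_- \\
E\{(A - B)^2 \mid X\}
  &= E(A^2 \mid X) + E(B^2 \mid X) - 2\, E(AB \mid X) \\
  &= E(A^2 \mid X) + E(B^2 \mid X) - 2\, E(A \mid X)\, E(B \mid X) \\
  &= (\mathrm{Bias}_+^2 + \mathrm{Variance}_+)
     + (\mathrm{Bias}_-^2 + \mathrm{Variance}_-)
     - 2\, \mathrm{Bias}_+ \mathrm{Bias}_- \\
  &= (\mathrm{Bias}_+ - \mathrm{Bias}_-)^2 + \mathrm{Variance}_+ + \mathrm{Variance}_-
\end{align*}
```

A larger h lowers both variance terms but typically raises the bias terms, which is the trade-off the bandwidth choice balances.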

The “As-if Random” Assumption
RD design does NOT require the local randomization or “as-if
random” assumption within a window:
{Yi (1), Yi (0)} ⊥⊥ Ti | c0 ≤ Xi ≤ c1
Close Elections Controversy (de la Cuesta and Imai. 2016. Annu. Rev. Political Sci)
Density Test of Sorting (McCrary. 2008. J. Econom.)

[Figure (J. McCrary, Journal of Econometrics 142 (2008) 698–714): frequency count and density estimate plotted against the Democratic margin. Democratic vote share relative to cutoff: popular elections to the House of Representatives, 1900–1990.]

1 Create histogram with a selected bin size
2 Fit local linear regression to bin midpoints to smooth the histogram
3 Estimate the difference in the logged histogram height at the
threshold
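The three steps above can be sketched as follows. This is a simplified illustration of the idea, not McCrary's full procedure: it omits the standard error and any data-driven choice of bin size or bandwidth, and the function name `density_log_difference` and the example data are assumptions made for the sketch. It assumes the running variable has observations on both sides of the cutoff.

```python
import numpy as np


def density_log_difference(x, c, bin_width, h):
    """Log difference in smoothed histogram height just above vs. below c."""
    x = np.asarray(x, dtype=float)

    # Step 1: histogram with bin edges aligned so that c is itself an edge.
    k_left = int(np.ceil((c - x.min()) / bin_width))
    k_right = int(np.ceil((x.max() - c) / bin_width))
    edges = c + bin_width * np.arange(-k_left, k_right + 1)
    counts, _ = np.histogram(x, bins=edges)
    heights = counts / (x.size * bin_width)
    midpoints = (edges[:-1] + edges[1:]) / 2.0

    def fitted_height_at_cutoff(side):
        # Step 2: triangular-kernel local linear fit to the bin midpoints.
        z = midpoints[side] - c
        w = np.clip(1.0 - np.abs(z) / h, 0.0, None)
        keep = w > 0
        if keep.sum() < 2:
            raise ValueError("bandwidth must cover at least two bins per side")
        design = np.column_stack([np.ones(keep.sum()), z[keep]])
        root_w = np.sqrt(w[keep])
        coef, *_ = np.linalg.lstsq(
            design * root_w[:, None], heights[side][keep] * root_w, rcond=None
        )
        return coef[0]

    f_above = fitted_height_at_cutoff(midpoints > c)
    f_below = fitted_height_at_cutoff(midpoints < c)

    # Step 3: difference in logged heights at the threshold.
    return np.log(f_above) - np.log(f_below)


# Hypothetical data: an evenly spread running variable (no sorting) ...
balanced = (np.arange(2000) + 0.5) / 1000.0 - 1.0
# ... and the same data with 50 extra units placed just above the cutoff.
sorted_above = np.concatenate([balanced, (np.arange(50) + 0.25) / 1000.0])

theta_balanced = density_log_difference(balanced, c=0.0, bin_width=0.05, h=0.3)
theta_sorted = density_log_difference(sorted_above, c=0.0, bin_width=0.05, h=0.3)
```

A log difference near zero is consistent with no sorting, while a clearly positive value indicates excess mass just above the threshold.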
Placebo Test

[Figure: Specification Test: Similarity of Historical Voting Patterns between Bare Democrat and Republican Districts. The panel plots one time lagged ADA scores against the Democrat vote share. Time t and t − 1 refer to congressional sessions. Each point is the average lagged ADA score within intervals of 0.01 in Democrat vote share. The continuous line is from a fourth-order polynomial in vote share fitted separately for points above and below the 50 percent threshold. The dotted line is the 95 percent confidence interval.]

What is a good placebo?
1 expected not to have any effect
2 closely related to outcome of interest
Lagged outcome → future cannot affect past
Interpretation: failure to reject a null ≠ the null is correct

Kosuke Imai (Harvard) Regression Discontinuity Designs Stat186/Gov2002
Fuzzy RD Design (Hahn et al. 2001. Econometrica)
Sharp regression discontinuity design: Ti = 1{Xi ≥ c}
What happens if we have noncompliance?
Forcing variable as an instrument: Zi = 1{Xi ≥ c}
Potential outcomes: Ti (z) and Yi (z, t)
1 Monotonicity: Ti (1) ≥ Ti (0)
2 Exclusion restriction: Yi (0, t) = Yi (1, t)
3 E(Ti (z) | Xi = x) and E(Yi (z, Ti (z)) | Xi = x) are continuous in x

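\[
E(Y_i(1, T_i(1)) - Y_i(0, T_i(0)) \mid \text{Complier}, X_i = c)
= \frac{\lim_{x \downarrow c} E(Y_i \mid X_i = x) - \lim_{x \uparrow c} E(Y_i \mid X_i = x)}{\lim_{x \downarrow c} E(T_i \mid X_i = x) - \lim_{x \uparrow c} E(T_i \mid X_i = x)}
\]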
Disadvantage: external validity
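The ratio of discontinuities on this slide can be sketched by estimating each jump with a local linear regression and dividing. This is a minimal illustration under assumptions: the helper names (`jump_at_cutoff`, `fuzzy_rd_estimate`), the triangular kernel, and the simulated data are choices made for the example, and no standard errors are computed.

```python
import numpy as np


def jump_at_cutoff(x, v, c, h):
    """Difference in local linear intercepts of v at c (triangular kernel)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    intercepts = []
    for side in (x >= c, x < c):  # above first, then below
        w = np.clip(1.0 - np.abs(x - c) / h, 0.0, None) * side
        keep = w > 0
        if keep.sum() < 2:
            raise ValueError("need at least two observations inside the bandwidth")
        design = np.column_stack([np.ones(keep.sum()), x[keep] - c])
        root_w = np.sqrt(w[keep])
        coef, *_ = np.linalg.lstsq(
            design * root_w[:, None], v[keep] * root_w, rcond=None
        )
        intercepts.append(coef[0])
    return intercepts[0] - intercepts[1]


def fuzzy_rd_estimate(x, y, t, c, h):
    """Jump in the outcome divided by the jump in treatment received."""
    reduced_form = jump_at_cutoff(x, y, c, h)
    first_stage = jump_at_cutoff(x, t, c, h)
    return reduced_form / first_stage


# Hypothetical noiseless data: crossing the cutoff lowers the treatment by 8,
# and each unit of treatment raises the outcome by 0.5.
x = np.linspace(-1, 1, 400)
z = (x >= 0).astype(float)
t = 30.0 - 8.0 * z + 2.0 * x
y = 60.0 + 0.5 * t + 3.0 * x
estimate = fuzzy_rd_estimate(x, y, t, c=0.0, h=0.3)
```

When the first-stage jump is close to zero the ratio becomes unstable, so the size of the denominator is worth inspecting before interpreting the estimate.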
Class Size Effect (Angrist and Lavy. 1999. Q. J. Econ)
Effect of class-size on student test scores
Maimonides’ Rule: Maximum class size = 40
\[
f(z) = \frac{z}{\lfloor (z - 1)/40 \rfloor + 1}
\]
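A short sketch of the rule makes the discontinuities concrete: enrollment is split into the smallest number of equally sized classes that keeps each class at or below the cap. The function name `maimonides_class_size` and the `cap` parameter are hypothetical names chosen for this example.

```python
def maimonides_class_size(enrollment, cap=40):
    """Predicted class size when classes are capped at `cap` students."""
    if enrollment < 1:
        raise ValueError("enrollment must be a positive integer")
    n_classes = (enrollment - 1) // cap + 1  # floor((z - 1) / cap) + 1
    return enrollment / n_classes


# Predicted class size around the first two thresholds.
sizes = {z: maimonides_class_size(z) for z in (40, 41, 80, 81)}
```

Predicted class size drops sharply when enrollment moves from 40 to 41 and again from 80 to 81, which is what generates the sawtooth pattern in enrollment.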
[Figure: class size plotted against enrollment count (0 to 200), comparing Maimonides' Rule with actual class size.]
Empirical Analysis
Yi : class average verbal test score
Ti : class size
Window size: w
Construction of forcing variable:

\[
X_i = \begin{cases}
40 - Z_i & \text{if } 40 - w/2 \le Z_i \le 40 + w/2 \\
80 - Z_i & \text{if } 80 - w/2 \le Z_i \le 80 + w/2 \\
\vdots & \vdots
\end{cases}
\]

Linear models (cluster standard errors by schools):

Ti = δ1 + α1 × I{Xi ≥ 0} + β1 Xi + γ1 Xi × I{Xi ≥ 0} + ε1i

Yi = δ2 + α2 × I{Xi ≥ 0} + β2 Xi + γ2 Xi × I{Xi ≥ 0} + ε2i

where α̂1 = −7.90 (s.e. = 1.90) and α̂2 = −0.056 (s.e. = 2.08)
Two-stage least squares estimate: est. = 0.007 (s.e. = 0.261)
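As an arithmetic check, the reported two-stage least squares estimate matches the ratio of the two discontinuity coefficients. This assumes I{Xi ≥ 0} serves as the single instrument with the same controls in both equations, in which case two-stage least squares coincides with the ratio of the reduced-form to the first-stage coefficient.

```latex
\[
\frac{\hat\alpha_2}{\hat\alpha_1} = \frac{-0.056}{-7.90} \approx 0.007
\]
```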
Interrupted Time Series Design

Time as the forcing variable

Possibility of multiple events at the same time
Must model time trend: seasonality, etc.
Effect of the “stand your ground” bill in Florida on homicide
(Humphreys, Gasparrini, and Wiebe. 2017. JAMA)

Use of other states → difference-in-differences designs
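A common way to model the time trend in an interrupted time series is a segmented regression with a level change and a slope change at the event date, plus optional harmonic terms for seasonality. The sketch below uses ordinary least squares on simulated data; it is a generic illustration under assumptions, not the model used in the cited study, and the function name `segmented_regression` and its parameters are hypothetical.

```python
import numpy as np


def segmented_regression(y, t0, period=None):
    """OLS fit of a pre-trend plus level and slope changes from time t0 on."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    post = (t >= t0).astype(float)
    columns = [np.ones_like(t), t, post, post * (t - t0)]
    if period is not None:
        # One pair of harmonic terms to absorb a seasonal cycle.
        columns.append(np.sin(2 * np.pi * t / period))
        columns.append(np.cos(2 * np.pi * t / period))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"level_change": coef[2], "slope_change": coef[3]}


# Hypothetical monthly series: 120 months, event at month 60,
# level shift of 3, slope change of 0.05, and a 12-month seasonal cycle.
months = np.arange(120, dtype=float)
after = (months >= 60).astype(float)
series = (10.0 + 0.1 * months + 3.0 * after + 0.05 * after * (months - 60)
          + 2.0 * np.sin(2 * np.pi * months / 12))
fit = segmented_regression(series, t0=60, period=12)
```

The level change is interpretable as an effect only if no other event occurs at the same time, which is the concern about multiple simultaneous events raised on this slide.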

Geographical RD Design (Keele and Titiunik. 2015. Political Anal.)

RD in two dimensions
don’t use distance as forcing variable
Example: media markets in Princeton, New Jersey

[Excerpt from Keele and Titiunik (2015): the media market boundary in this area follows county boundaries and, in many segments, overlaps with the boundaries of congressional districts, state legislative districts, or school districts. Compound treatments therefore cannot be escaped entirely, but they can be minimized by restricting the analysis to selected boundary segments.]

[Figure 5 (Keele and Titiunik 2015, p. 144): Detail of the boundary between Philadelphia and New York City media markets, showing the surrounding school districts and the treated and control areas of analysis. Area marked with gray hash indicates the West Windsor-Plainsboro school district, which straddles the media market boundary. Empirical analysis is confined to the West Windsor-Plainsboro school district.]

Observational studies → treatment assignment is not random

“Design” observational studies for credible causal inference
Sharp regression discontinuity (RD) designs:
deterministic (rather than stochastic) treatment assignment rule
continuity assumption → no sorting
does not require “as-if random” assumption
limited external validity → extrapolation required for generalization
incumbency effects controversy (Eggers et al. 2015. Am. J. Political Sci)
Importance of placebo tests
Fuzzy RD designs: noncompliance
Other RD designs: interrupted time series, geographical boundary
Suggested readings:
Imbens and Lemieux. (2008). J. of Econom

