PSM Inès
PSM Inès
PSM Inès
University of Colorado
Anschutz Medical Campus
1 Stratification
2 Matching
3 Inverse probability weighting (IPW)
Stratification, matching, and IPW under strong ignorability: alternatives to
estimating treatment effects
2
Important
3
Defining treatment effects
We defined causal effects as a comparison of potential outcomes for unit i
and for a group of N units, which we could measure in terms of expected
values, although we saw that this is more general: we could compare other
quantities (median, odds ratios, etc):
1 Average treatment effectP (ATE):1 PN
E [Y1i ] − E [Y0i ] = N1 N 1
PN
i=1 Y1i − N i=1 Y0i = N i=1 (Y1i − Y0i ) = E [Y1i − Y0i ]
2 Average treatment effect on the treated (ATET):PN PN
E [Y1i |Di = 1] − E [Y0i |Di = 1] = PN1 D i=1 Di Y1i −
PN
1
D i=1 Di Y0i =
i=1 i i=1 i
1
PN
i=1 Di (Y1i − Y0i ) = E [Y1i − Y0i |Di = 1]
PN
i=1
D i
3 Average treatment effect on the control (ATEC):
E [Y1i |Di =P
0] − E [Y0i |Di = 0] = E [Y1i − Y0i |Di = 0]
1 N
4 i=1 (1 − Di )Y1i
PN
i=1
(1−D ) i
The above
PN expressions look esoteric but it’s quite simple when you realize
that i=1 Di is the number of treated units (could denote it by N T instead)
and Di Y1i is Y1i for treated and zero for controls (same for Y0i )
For ATEC, all would be (1 − Di ) since we only want to include controls. So
PN C
i=1 (1 − Di ) = N and (1 − Di )Yi is Yi for controls and zero for treated
4
Defining treatment effects
We can also define the average treatment effect as a function of ATET and
ATEC:
NT NC
ATE = N ATET + N ATEC
We also saw that with randomization ATE = ATET = ATEC
We can define treatment effects conditioning for covariates. We saw that this
would give us estimates of causal effects in cases like conditional (block)
randomization; randomization is based on the value of a covariate (or more
than one)
In that case, equations are similar, but we need to condition for the vector X:
ATE = E [Y1i − Y0i |Xi ] = E [Y1i |Xi ] − E [Y0i |Xi ]
ATET = E [Y1i − Y0i |Xi , Di = 1] = E [Y1i |Xi , Di = 1] − E [Y0i |Xi , Di = 1]
ATEC = E [Y1i − Y0i |Xi , Di = 0] = E [Y1i |Xi , Di = 0] − E [Y0i |Xi , Di = 0]
5
Estimating treatment effects
6
Estimating treatment effects
Under which circumstances a simple comparison of observed outcomes could
give us estimates of treatment effects?
We can decompose the observed conditional difference
E [Yi |Xi , D1 = 1] − E [Yi |Xi , D1 = 0] into two pieces:
E [Y1i |Xi , Di = 1] − E [Y0i |Xi , Di = 1] + E [Y0i |Xi , Di = 1] − E [Y0i |Xi , Di = 0]
The first difference is the definition of ATET , the second one is the part we
called the selection bias
If the selection bias is zero, then a comparison of observed expected values is
an estimate of treatment effects since E [Yi |Xi , Di = 0] = E [Y0i |Xi , Di = 1]
The left-hand side is observed, the right-hand side is a potential outcome:
the outcome for the treated group had they not been treated, which we don’t
observe. But if the observed outcome in the control group is the same as the
unobserved outcome for the treated group had they not been treated, then
the selection bias is zero
In other words, the selection bias is zero when the control group provides a
good prediction of what would have happened to the treated had they not
been treated (and the treated is a counterfactual for the control), conditional
on X 7
Estimating treatment effects
We called the main assumption relating selection bias being zero as
ignorability of treatment assignment (or the conditional independence
assumption, CIA, selection on observables, no unmeasured confounder):
(Y0i , Y1i ) ⊥ Di |Xi
That is, treatment assignment, conditional on a vector of covariates Xi , is
independent of potential outcomes
Now, this leaves us in a good place. If we can argue that the selection bias is
zero, which is equivalent as saying that ignorability holds, all we have to do is
find a statistical method to find two conditional expectation functions using
observed data:
E [Yi |Xi , D1 = 1] and E [Yi |Xi , D1 = 0]
Whether we need to condition on the vector X depends on the data
generating process. Under simple randomization, D and X are independent,
which makes them mean independent as well. So we could find treatment
effects without having to condition on X. Under conditional randomization,
we do need to condition on X
And, of course, we need SUTVA
8
Big picture
9
Regression adjustment, parametric
This is the old fashioned, vanilla linear/OLS regression model:
Yi = β0 + β1 Di + X0i β + i or E [Yi |Xi , Di ] = β0 + β1 Di + X0i β
Can this model estimate causal effects even if they are identified?
Well, it depends
First, the model must be correctly specific. That includes the assumption of
homogeneous treatment effects. We could add interactions and try other
model fits, but we never have certainty that the model is correctly specified
(should we add quadratic terms, multiple interactions? Are effects additive
and separable?)
Second, the assumption that Yi ∼ N(0, σ 2 ) could be wrong. So could be
other assumptions about the model, like iid errors and homoskedasticity. We
saw a bunch of alternatives: logit, probit, and GLMs in general. The
regression model needs to consider characteristics of the data generating
process, which is very important for inference (standard errors)
Third, when we use observational data, we need to worry about the
assumption we haven’t mentioned yet: overlap. Implicitly, we extrapolate
information from controls to treated and vice versa
10
Regression adjustment, semiparametric
11
Observational data and overlap
12
The propensity score and overlap
We already saw that we can use the propensity score to diagnose overlap
problems since we define overlap using the propensity score
The propensity score is a summary score: if a group of control and a group of
treated units have the same propensity score, then they have the same
distribution of X, where X are the variables used to estimate the propensity
score (we will see more formally that it’s also a balancing score)
Once we diagnose the problem, we can use the propensity score to find a
solution for the overlap problem. All are versions of the same idea: restrict
estimation to the region where there is good overlap:
1 Stratification by the propensity score
2 Matching using the propensity score
3 Inverse probability weighting (IPW) – with some restrictions
IPW requires a bit more thought. If overlap doesn’t hold, IPW wouldn’t be
1
defined for some observations. Recall that IPWi = P(Di =1|X i)
if Di = 1 and
1
IPWi = 1−P(Di =1|Xi ) if Di = 0. So P(Di = 1|Xi ) can be 1 or 0 in the
denominator (or very close to 1 or 0)
13
Data
We will use a dataset from Gelman, Hill, and Vihtari (2020) (it’s a fantastic
book)
Data for children born in the 80s, 290 received special services early in life;
4091 are controls. Children were targeted because they were born
prematurely or had low birth weight (≤ 2500) and lived in an intervention city
Outcome is a cognitive score (ppvtr36)
desc ppvtr36 bwg hispanic black bmarr lths hs ltcoll workdur prenatal male ///
> first bw preterm momage dayskidh
14
Checking balance
We could compare means and standard deviations or any other metric as a
typical Table 1 of any paper
A convenient summary is to use standardized differences (normalized
differences) and variance ratios
¯ X¯0
Standardized difference: ∆X = √X1 −
2 2S0 +S1
S12
Variance ratio: S02
Rule of thumb is that a standardized difference greater than 0.25 means that
a regression model adjusting for covariates would be sensitive to model
specification (because of lack of balance, overlap)
No rule for variance ratios (ideally, close to 1). Differences in variances but
good balance is not a major problem
Note that the standardized difference is similar to the two-sample t-test:
¯ ¯
T = √ 2 X1 −X0 2 . Larger sample sizes would decrease T , but larger sample
S0 /N0 +S1 /N1
sizes would not make a difference in terms of model specification problems
15
Checking balance: -teffects-
In Stata, we can use the -teffects- command to check for balance, but it’s
unfortunate that the only way to use it is to actually estimate a propensity
score type of model or some other tool like matching
We don’t want to see the outcome when we try different models to check for
balance
There are some user-written commands out there but we will stick with
-teffects- but will run it quietly
We will also use a user-written command -coefplot- (type “findit coefplot” to
install it) to display standardized differences and variance ratio plots
16
Balance
Just once so you see what I mean. Note the pstolerance option. We are doing
inverse probability weighting, but just because we want to check balance; we
would get an error term because some propensity scores are close to zero
teffects ipw (ppvtr36) (treat bwg hispanic black bmarr lths hs ltcoll workdur prenatal ///
male first bw preterm momage dayskidh), pstolerance(1e-50)
tebalance summarize
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Weighted Raw Weighted
----------------+------------------------------------------------
bw | -2.983154 -1.864095 .2545874 .1523872
hispanic | -.3384636 -.1374384 .5046336 .7920301
black | .4620816 .1761349 1.235165 1.111173
bmarr | -.5327536 -.2559087 1.143919 1.113478
lths | .2789109 .1534797 1.171242 1.106074
hs | -.3002187 -.0130743 .8326499 .9966036
ltcoll | -.0712848 -.15487 .8901424 .7567162
workdur | -.0638442 -.0016862 1.031228 1.002031
prenatal | -.1927647 -.0887688 3.422968 1.948011
male | .0228178 .1038601 1.003106 .9970652
first | .1238746 .0217956 1.027565 1.009207
preterm | 2.48568 1.312396 .9106964 .3954169
momage | .1467263 .1121158 3.477228 2.829429
dayskidh | 1.18667 .867865 4.272903 2.632112
-----------------------------------------------------------------
mat M = r(table)
coefplot matrix(M[,1]), noci xline(0) xline(-0.25 0.25, lpattern(dash)) title("Standardized differences")
graph export stdif.png, replace
coefplot matrix(M[,3]), noci xline(1) title("Variance ratios")
graph export var.png, replace
17
Balance
Standardized differences
18
Balance
Variance ratios
19
Balance vs overlap
We have some balance problems between treated and controls in this dataset
that would suggest regression adjustment would rely on extrapolation
This likely translates into overlap problems, which can be due to one or more
variables
Next step is to check overlap using the propensity score since it’s the
definition of overlap
20
Overlap
Using the propensity score to check overlap
. logit treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
> first preterm momage dayskidh, nolog
------------------------------------------------------------------------------
treat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
bw | -.0044176 .0003203 -13.79 0.000 -.0050455 -.0037898
hispanic | -1.008019 .3283637 -3.07 0.002 -1.6516 -.3644382
black | .3852354 .2395514 1.61 0.108 -.0842768 .8547475
bmarr | -.6796435 .2310288 -2.94 0.003 -1.132452 -.2268353
lths | -.2753011 .4288705 -0.64 0.521 -1.115872 .5652696
hs | -1.233805 .4032258 -3.06 0.002 -2.024113 -.4434971
ltcoll | -1.008354 .4227533 -2.39 0.017 -1.836936 -.1797731
workdur | .2018587 .2149724 0.94 0.348 -.2194794 .6231969
prenatal | -.6206795 .5963921 -1.04 0.298 -1.789587 .5482276
male | -.0599874 .1943709 -0.31 0.758 -.4409474 .3209725
first | .5414952 .2146661 2.52 0.012 .1207574 .9622329
preterm | .3745637 .0495365 7.56 0.000 .2774739 .4716535
momage | .1053551 .0278844 3.78 0.000 .0507026 .1600076
dayskidh | -.0527067 .0101636 -5.19 0.000 -.0726271 -.0327864
_cons | 6.591678 1.424636 4.63 0.000 3.799443 9.383913
------------------------------------------------------------------------------
Note: 4 failures and 0 successes completely determined.
21
Overlap
22
Distribution of propensity scores
Clearly, some controls have small changes of being treated
23
Check birth weight
Check the distribution in birth weight for treated and control, but standardize
difference suggest other variables are problematic, like not having prenatal
care
kdensity bw if treat ==1, saving(tkden.gph, replace)
kdensity bw if treat ==0, saving(ckden.gph, replace)
graph combine tkden.gph ckden.gph, col(1) xcommon xsize(10) ysize(10)
graph export den.png, replace
24
Balance versus overlap
25
Balance versus overlap
Standardized differences
26
Regression
To make things more concrete. The issue is that the model below is probably
not the best, and we haven’t even dealt with model specification or residual
analysis
. reg ppvtr36 treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
> first preterm momage dayskidh, robust
------------------------------------------------------------------------------
| Robust
ppvtr36 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | 11.59116 1.227529 9.44 0.000 9.184584 13.99775
bw | .0005624 .0005151 1.09 0.275 -.0004475 .0015723
hispanic | -13.74123 .729361 -18.84 0.000 -15.17115 -12.31131
black | -17.2159 .640063 -26.90 0.000 -18.47075 -15.96106
bmarr | 3.00947 .615206 4.89 0.000 1.803354 4.215586
lths | -14.59204 1.043248 -13.99 0.000 -16.63733 -12.54674
hs | -8.47883 .9122019 -9.29 0.000 -10.26721 -6.690451
ltcoll | -6.393914 .9666583 -6.61 0.000 -8.289055 -4.498773
workdur | 2.820732 .5621512 5.02 0.000 1.718631 3.922834
prenatal | 4.357118 2.219644 1.96 0.050 .0054882 8.708748
male | 1.170581 .5042633 2.32 0.020 .1819689 2.159193
first | 4.604955 .5528963 8.33 0.000 3.520998 5.688913
preterm | .0102207 .1408463 0.07 0.942 -.2659095 .2863509
momage | .167805 .0886327 1.89 0.058 -.0059601 .3415701
dayskidh | -.1446362 .0513661 -2.82 0.005 -.2453397 -.0439326
_cons | 87.15683 3.8615 22.57 0.000 79.58633 94.72733
------------------------------------------------------------------------------
27
Matching
One way to restrict the estimation to the region where there is overlap would
be to find, for each treated unit, control units that are “similar” in their
covariates
If we do something like that, then the resulting sample would have good
balance and overlap. The target of estimation will then be ATET. We are
finding control units that are similar to the treated units to predict (impute)
the counterfactual Y0i for each unit i with Di = 1
We just need to find a way to measure similar using multiple variables (easier
for few variables, like age and sex). It would make sense to use the propensity
score as a measure of similarity
Remember the main result of propensity scores: if a group of treated and
control observations have the same propensity score then they have the same
distribution of the covariates that entered into the estimation of the
propensity score
(We will see other ways of matching. The propensity score may not be the
best, actually)
28
Many ways of matching, many ways of getting
confused
Over the years, many variants of matching have been proposed. And there
are many decisions one can make with matching. Main issues:
1 Measure of similarity: propensity score, Malahanobis, other metrics based on
variance (“exact matching” could fit here)
2 Replacement or not: Once a treated unit is matched with a control unit, can
the control unit be a match for another treated unit? If no, then without
replacement. If yes, with replacement
3 Number of matches: 1 to 1, 1 to N or variable? If 1 to 1, usually called pair
matching. Nearest-neighbor matching with replacement is common: For each
treated unit, find the k closest observations (we define k a priori) in the
control group. A control can be used multiple times. In case of ties, use all
ties as matches
4 Caliper matching (radius): Use only controls with a distance smaller than a
number c, the “caliper” (tries to avoid bad matches)
Even more, we could also use different algorithms to perform the match:
greedy, optimal, “genetic” algorithms
We will focus on common ones and the ones that Stata implemented:
commands -teffects psmatch- and -teffects nnmatch-
29
Classic: 1 to 1 matching without replacement,
nearest neighbor (pair matching)
30
Matching
31
1:1 Matching, no replacement
32
1:1 Matching, no replacement
. psmatch2 treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
> first preterm momage dayskidh, n(1) logit out(ppvtr36) noreplacement
------------------------------------------------------------------------------
treat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
bw | -.0044176 .0003203 -13.79 0.000 -.0050455 -.0037898
hispanic | -1.008019 .3283637 -3.07 0.002 -1.6516 -.3644382
black | .3852354 .2395514 1.61 0.108 -.0842768 .8547475
...
...
-----------------------------------------------------------------------------
Note: 4 failures and 0 successes completely determined.
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
ppvtr36 Unmatched | 92.1137901 86.0280498 6.08574029 1.21935202 4.99
ATT | 92.1137901 81.6837432 10.4300469 1.6297464 6.40
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
tab treat _weight
| psmatch2:
| weight of
| matched
| controls
treat | 1 | Total
-----------+-----------+----------
0 | 290 | 290
1 | 290 | 290
-----------+-----------+----------
Total | 580 | 580
33
1:1 Matching, no replacement
* Replicate
* Raw, unmatched
reg ppvtr36 treat
Source | SS df MS Number of obs = 4,381
-------------+---------------------------------- F(1, 4379) = 24.91
Model | 10029.5409 1 10029.5409 Prob > F = 0.0000
Residual | 1763142.34 4,379 402.63584 R-squared = 0.0057
-------------+---------------------------------- Adj R-squared = 0.0054
Total | 1773171.88 4,380 404.833763 Root MSE = 20.066
------------------------------------------------------------------------------
ppvtr36 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | 6.08574 1.219352 4.99 0.000 3.695193 8.476287
_cons | 86.02805 .3137195 274.22 0.000 85.413 86.6431
------------------------------------------------------------------------------
* ATE, matched
reg ppvtr36 treat if _weight== 1
------------------------------------------------------------------------------
ppvtr36 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | 10.43005 1.629746 6.40 0.000 7.2291 13.63099
_cons | 81.68374 1.152405 70.88 0.000 79.42033 83.94715
------------------------------------------------------------------------------
34
Check balance
* Check balance (make sure you understand this; just using teffects to calculate
* balance statistics)
qui teffects psmatch (ppvtr36) (treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh) if _weight ==1, nneighbor(1)
tebalance summarize
mat M = r(table)
coefplot matrix(M[,2]), noci xline(0) xline(-0.25 0.25, lpattern(dash)) title("Standardized differences after 1:1 matching")
graph export stdif_m.png, replace
35
Check balance
. sum bw hispanic black bmarr momage dayskidh if treat ==0 & _weight ==1
36
Matches are not identical
+------------------------------------------------------------------------------+
2. | bw | hispanic | black | bmarr | lths | hs | ltcoll | workdur | prenatal |
| 2240 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
|---------+----------+---------------------------------------------------------|
| male | first | preterm | momage | dayskidh | _pdif |
| 1 | 0 | 3 | 22 | 4 | .00027962 |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
3659. | bw | hispanic | black | bmarr | lths | hs | ltcoll | workdur | prenatal |
| 2182.95 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
|---------+----------+---------------------------------------------------------|
| male | first | preterm | momage | dayskidh | _pdif |
| 0 | 0 | 7 | 18 | 14 | . |
+------------------------------------------------------------------------------+
37
Caveats
There are probably thousands of studies that have used some version of the
above analysis, but there are many problems with this strategy
1 Standard errors do not take into account that the propensity score has been
estimated (bootstrapping was the usual solution, but turns out that it doesn’t
quite work)
2 Not very efficient since we discard thousand of potential controls (could do
1:N matching instead)
The above issues are important and remember that this is an iterative
process. Try different models, check balance. Choose the best approach that
balances data
Other important issues:
1 As we saw, the propensity score balances on average, but other distance
metrics could be better (i.e. Malahanobis)
2 We could for example mimic conditional randomization by using other
covariates to block
There are strong arguments against PSM. For example, subtle papers like
King and Nielsen (2019) “Why PSM Should Not Be Used for Matching”
38
The propensity score model
39
Ways to restrict to overlap region
40
Stata’s -teffects psmatch-
41
-teffects psmatch-
As we saw before, some propensity scores are too low. We can force teffects
to continue by increasing the tolerance value (pstolerance option)
* Error
teffects psmatch (ppvtr36) (treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh) , nneighbor(1)
there are 232 propensity scores less than 1.00e-05
treatment overlap assumption has been violated; use the osample() option to identify the
observations
r(459);
teffects psmatch (ppvtr36) (treat bw hispanic black bmarr lths hs ltcoll workdur prenatal
> male ///
> first preterm momage dayskidh) , nneighbor(1) pstolerance(1e-50)
42
-teffects psmatch-
Note that the balance is not good at all. The lack of overlap problem is
severe in this dataset
tebalance summarize
note: refitting the model using the generate() option
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Matched Raw Matched
----------------+------------------------------------------------
bw | -2.983154 -1.827003 .2545874 .054266
hispanic | -.3384636 -.6073668 .5046336 .1371829
black | .4620816 -.6451967 1.235165 .2753288
bmarr | -.5327536 .6834454 1.143919 .2889577
lths | .2789109 -.6155638 1.171242 .3297026
hs | -.3002187 1.148684 .8326499 .3932727
ltcoll | -.0712848 -.5759349 .8901424 .1280602
workdur | -.0638442 .7952702 1.031228 .291495
prenatal | -.1927647 .0930047 3.422968 .3300719
male | .0228178 1.035419 1.003106 .3176788
first | .1238746 -.926912 1.027565 .23562
preterm | 2.48568 .9517999 .9106964 .2013297
momage | .1467263 1.478464 3.477228 .7966889
dayskidh | 1.18667 3.104121 4.272903 1.269045
-----------------------------------------------------------------
43
-teffects psmatch-
Need to restrict to a region of overlap. Could use the propensity score,
although we could use other strategies, including trimming based on bw for
example
We will use the propensity score for now. Note that we drop some treated
units
capture drop ps
qui logit treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh, nolog
predict double ps if e(sample)
tabstat ps, by(treat) stats (N min max)
| keep
treat | 1 | Total
-----------+-----------+----------
0 | 572 | 572
1 | 283 | 283
-----------+-----------+----------
Total | 855 | 855
44
-teffects psmatch-
| keep
treat | 1 | Total
-----------+-----------+----------
0 | 572 | 572
1 | 283 | 283
-----------+-----------+----------
Total | 855 | 855
45
-teffects psmatch-
teffects psmatch (ppvtr36) (treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh) if keep ==1, nneighbor(1)
Treatment-effects estimation Number of obs = 855
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 1
------------------------------------------------------------------------------
| AI Robust
ppvtr36 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE |
treat |
(1 vs 0) | 8.045777 1.259327 6.39 0.000 5.577541 10.51401
------------------------------------------------------------------------------
tebalance summarize
...
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Matched Raw Matched
----------------+------------------------------------------------
bw | -1.394722 -.2597216 .6006019 .4301904
hispanic | -.0650134 -.0179936 .8470438 .9587302
black | .0698415 -.3134383 1.008199 .8622675
bmarr | -.141656 .3399337 .9847759 .9090142
lths | .0908845 -.0310145 1.033483 .9874854
hs | -.1289964 .2140523 .904266 1.097288
ltcoll | -.0278215 -.1618443 .9542544 .7026459
workdur | .005303 .0905067 1.00003 .9662639
prenatal | -.1223482 .0082554 1.980514 .9455728
male | -.076909 .0820815 1.0083 .9888843
first | .0311325 -.2686938 1.004758 .8823296
preterm | 1.061327 .0654548 .5034276 .3864448
momage | .100821 .2402564 2.852832 2.220045
dayskidh | .6435155 .483438 1.107973 .8175438
-----------------------------------------------------------------
. mat M = r(table)
46
-teffects psmatch-
mat M = r(table)
coefplot matrix(M[,2]), noci xline(0) xline(-0.25 0.25, lpattern(dash)) title("Standardiz ed differences - k neighbor matching")
graph export stdifk.png, replace
47
-teffects psmatch-
48
Different propensity score model
teffects psmatch (ppvtr36) (treat c.bw##(i.hispanic i.black c.momage c.dayskidh ) bmarr lths ///
hs ltcoll workdur prenatal male first preterm ) if keep ==1, nneighbor(1)
49
Different propensity score model
Much better balance. We should explore other models to detect overlap
region as well
As I said, this is an iterative process
50
Malahanobis
51
Malahanobis
teffects nnmatch (ppvtr36 bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh ) (treat), nneighbor(1)
tebalance summarize
tebalance summarize
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Matched Raw Matched
----------------+------------------------------------------------
bw | -2.983154 -2.368282 .2545874 .1437058
hispanic | -.3384636 -.2499663 .5046336 .6221699
black | .4620816 .192254 1.235165 1.136648
bmarr | -.5327536 -.1851049 1.143919 1.100719
lths | .2789109 .0854561 1.171242 1.063502
hs | -.3002187 -.0685629 .8326499 .9719941
ltcoll | -.0712848 -.019315 .8901424 .9694677
workdur | -.0638442 .018842 1.031228 .9904492
prenatal | -.1927647 .001895 3.422968 .9848435
male | .0228178 .0744934 1.003106 .995244
first | .1238746 .0809175 1.027565 1.018515
preterm | 2.48568 1.724164 .9106964 .5250509
momage | .1467263 .0970095 3.477228 1.684246
dayskidh | 1.18667 .6236645 4.272903 1.185832
-----------------------------------------------------------------
52
Malahanobis
| keep1
treat | 1 | Total
-----------+-----------+----------
0 | 1,030 | 1,030
1 | 290 | 290
-----------+-----------+----------
Total | 1,320 | 1,320
53
Malahanobis
teffects nnmatch (ppvtr36 bw hispanic black bmarr lths hs ltcoll workdur prenatal male first ///
preterm momage dayskidh ) (treat) if keep1 ==1, nneighbor(1)
tebalance summarize
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Matched Raw Matched
----------------+------------------------------------------------
bw | -2.008488 -1.272001 .724186 .5429634
hispanic | -.2688362 -.1046097 .5603778 .8118967
black | .257156 .0430653 1.067346 1.015228
bmarr | -.3324808 -.1080212 1.020362 1.017085
lths | .1930087 .0563812 1.096464 1.029957
hs | -.2949933 -.0561583 .8333505 .9728451
ltcoll | -.0573106 -.0039269 .909349 .9935394
workdur | .0242883 .0092262 .9942454 .9967681
prenatal | -.112584 .0045582 1.812482 .9744437
male | -.0736482 -.0424846 1.010007 1.004785
first | .0705086 .0091229 1.012425 1.001612
preterm | 1.649226 1.112493 .5965296 .465296
momage | .1905657 .0955412 3.280781 1.815981
dayskidh | .8504919 .4556944 1.585644 .9601355
-----------------------------------------------------------------
54
Where are we?
Many different ways of matching, many decisions that can affect results. No
clear answers on the best strategy
If you this about it, we could have restricted the estimation to the region of
overlap and then run a regression model
Knowledge about the subject is important when deciding what should be
carefully balanced. And all depends on the dataset. In this dataset, there is a
severe overlap problem, mostly birth weight
Many approaches are reasonable – we found similar results
With -teffects nnmatch- we could force an exact match with the
ematch(varlist) option on some variables (or by creating categorical variables)
But don’t lose track of big picture: the goal is to restrict estimation to a
region where comparisons are possible, and then make those
comparisons
Careful that R, SAS, Stata implement different versions of matching
55
Matching as an imputation and weighting scheme
56
Inverse Probability Weighting
We saw this in the intro class and homework. We use the propensity score as
an inverse weight
1
Assuming p̂(xi ) is the predicted propensity score, for ATE ipwi = p̂(xi ) if
1
Di = 1 and ipwi = 1−p̂(x i)
if Di = 0
We can also define weights to get ATET and ATEC
p̂(xi )
ATET: ipwi = 1 if Di = 1 and ipwi = 1−p̂(xi ) if Di = 0
1−p̂(xi )
ATEC: ipwi = p̂(xi ) if Di = 1 and ipwi = 1 if Di = 0
We did IPW by hand, but we can use -teffects ipw- or -teffects ipwra-,
although teffects runs stratified models
Again, we need to restrict the region of overlap somehow. For simplicity, we
will restrict weights between 1500 and 3000, although we may get large IPW
weights
57
IPW
qui logit treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh if keep1==1, nolog
predict double ps1 if e(sample)
gen ipw = 1/ps1 if treat==1
replace ipw = 1/(1-ps1) if treat==0
------------------------------------------------------------------------------
| Robust
ppvtr36 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | 8.996997 1.483815 6.06 0.000 6.086099 11.90789
_cons | 82.55631 1.013474 81.46 0.000 80.56811 84.54451
------------------------------------------------------------------------------
58
IPW - teffects
teffects ipw (ppvtr36) (treat bw hispanic black bmarr lths hs ltcoll ///
workdur prenatal male first preterm momage dayskidh) if keep1==1
59
IPW - teffects
Still not optimal, we could further restrict to overlap region based on ps or
try other models
. tebalance summarize
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Weighted Raw Weighted
----------------+------------------------------------------------
bw | -2.008488 -.6226793 .724186 .3448521
hispanic | -.2688362 -.0461161 .5603778 .9176752
black | .257156 -.0656871 1.067346 .9807627
bmarr | -.3324808 -.0437763 1.020362 1.009212
lths | .1930087 .0730392 1.096464 1.041372
hs | -.2949933 .0096082 .8333505 1.005327
ltcoll | -.0573106 -.1439304 .909349 .7750979
workdur | .0242883 .117083 .9942454 .9718457
prenatal | -.112584 -.0381501 1.812482 1.281361
male | -.0736482 .071718 1.010007 .9965046
first | .0705086 .0366211 1.012425 1.014521
preterm | 1.649226 .4140395 .5965296 .2586534
momage | .1905657 .1157435 3.280781 2.578142
dayskidh | .8504919 .4614788 1.585644 1.137282
-----------------------------------------------------------------
60
IPW - teffects - ATET
61
IPW - teffects - ATET
-----------------------------------------------------------------
|Standardized differences Variance ratio
| Raw Weighted Raw Weighted
----------------+------------------------------------------------
bw | -1.394722 .3778824 .6006019 .7275763
hispanic | -.0650134 -.0841342 .8470438 .8086912
black | .0698415 -.119931 1.008199 1.012672
bmarr | -.141656 -.055692 .9847759 .9892314
lths | .0908845 .0970404 1.033483 1.035198
hs | -.1289964 -.0933433 .904266 .9256661
ltcoll | -.0278215 -.079925 .9542544 .8764902
workdur | .005303 .246544 1.00003 .9791861
prenatal | -.1223482 -.1930732 1.980514 3.637795
male | -.076909 .1539902 1.0083 1.023679
first | .0311325 .3258814 1.004758 1.137434
preterm | 1.061327 -.4482802 .5034276 .2973894
momage | .100821 -.0030976 2.852832 2.613683
dayskidh | .6435155 .0642314 1.107973 .76524
-----------------------------------------------------------------
62
-teffects ipwra-
We didn’t control for covariates in the outcome model but we could, and
that would help us deal with the remaining imbalance (!)
That is, the outcome model would be
. reg ppvtr36 treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
> first preterm momage dayskidh [pw=ipw], robust
(sum of wgt is 2,191.88404154778)
------------------------------------------------------------------------------
| Robust
ppvtr36 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | 8.380397 1.870968 4.48 0.000 4.709961 12.05083
bw | -.0019433 .003049 -0.64 0.524 -.0079248 .0040381
..
This won’t exactly match -teffects ipwra- since Stata runs stratified models
Please check the code using the link below to see how you can match teffects
by hand. teffects has the correct SEs, though (estimates the propensity score
models and the oucome model simultaneously with GMM)
https://clas.ucdenver.edu/marcelo-perraillon/sites/default/
files/attached-files/matching_teffects_code_perraillon_0.do
63
-teffects ipwra-
We could and should try different model specifications as well (interactions,
etc)
Note that with IPWRA we could tolerate some imbalance because we also
control for covariates in the outcome model
. teffects ipwra (ppvtr36 bw hispanic black bmarr lths hs ltcoll ///
> workdur prenatal male first preterm momage dayskidh) ///
> (treat bw hispanic black bmarr lths hs ltcoll workdur prenatal
> ///
> male first preterm momage dayskidh) if keep1 ==1, ate
64
Strong ignorability
If ignorability and overlap hold (strong ignorability), it turns out that IPW is
just another way of estimating E [Yi |Di = 1, Xi ] and E [Yi |Di = 0, Xi ]
One can show that:
Yi Di
E [ p̂(X i)
] = E [Yi |Di = 1, Xi ] and
E [ Y1−
i (1−Di )
p̂(Xi ) ] = E [Yi |Di = 0, Xi ]
IPW is equivalent to the Horvitz and Thompson (1952) estimator for
handling nonrandom sampling in surveys, in which the weight is the inverse
probability of being in the sample
Note that for the above to work, p̂(Xi ) cannot be 0 or 1, which means that
overlap must hold
A similar approach can be used for ATET
So when overlap holds and assuming that model specification in regression
adjustment is correct, we shouldn’t expect to find much different between
IPW and regression adjustment, with bonus that IPW is can be doubly robust
Stata also has the command -teffects aipw- for “augmented” IPW that has
the doubly robust property (-teffects aipw-)
65
Stratification
66
Rubin (1977)
67
Digression
Stratification same as interacted model
bcuse bwght, clear
gen smoked = 0
replace smoked = 1 if cigs ==0
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.smoked | 8.889065 1.488571 5.97 0.000 5.968966 11.80917
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
quietly {
reg bwght i.smoked if white ==1
scalar beta1 = _b[1.smoked]
scalar N1 = e(N)
68
Stratification
Overlap only exists in one region, the same we found before!
capture drop ps
qui logit treat bw hispanic black bmarr lths hs ltcoll workdur prenatal male ///
first preterm momage dayskidh, nolog
predict double ps if e(sample)
------------------------------------------------------------------------------
| Robust
ppvtr36 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | 10.45539 1.477092 7.08 0.000 7.556261 13.35452
bw | -.0012955 .002095 -0.62 0.537 -.0054075 .0028165
hispanic | -15.29031 2.348434 -6.51 0.000 -19.89965 -10.68098
black | -18.06875 1.321689 -13.67 0.000 -20.66286 -15.47463
bmarr | 1.456162 1.353218 1.08 0.282 -1.199835 4.112159
lths | -12.13782 2.308979 -5.26 0.000 -16.66971 -7.605922
hs | -7.425949 2.09373 -3.55 0.000 -11.53537 -3.316531
ltcoll | -5.144063 2.107533 -2.44 0.015 -9.280573 -1.007553
workdur | 4.13442 1.225533 3.37 0.001 1.729034 6.539806
prenatal | .1939434 3.359773 0.06 0.954 -6.400372 6.788259
male | 1.24003 1.122819 1.10 0.270 -.9637574 3.443816
first | 4.195408 1.207222 3.48 0.001 1.825962 6.564853
preterm | .5278955 .2720722 1.94 0.053 -.0061078 1.061899
momage | -.119995 .1597463 -0.75 0.453 -.4335333 .1935433
dayskidh | -.2562127 .0816998 -3.14 0.002 -.4165671 -.0958583
_cons | 100.5914 8.450377 11.90 0.000 84.0056 117.1772
------------------------------------------------------------------------------
70
Final comment
Perhaps the most important question when thinking about matching, IPW,
stratification is: why not regression adjustment? What’s the problem with it?
Lack of overlap is the problem if in fact it exists - so check for it with the
tools you learned in these lectures
If strong ignorability holds (ignorability plus overlap or selection on
observables plus overlap), then think about matching, IPW, stratification as
alternatives to regression adjustment
Alternatives that help you understand your data better, and alternatives that
have interesting properties – like more robust to model mispecification
71