Regression With A Binary Dependent Variable: Michael Ash


Regression with a Binary Dependent Variable

Chapter 9

Michael Ash

CPPA

Lecture 22
Course Notes

- Endgame
  - Take-home final
    - Distributed Friday 19 May
    - Due Tuesday 23 May (paper or emailed PDF OK; no Word, Excel, etc.)
  - Problem Set 7
    - Optional; worth up to 2 percentage points of extra credit
    - Due Friday 19 May
  - Regression with a Binary Dependent Variable
Binary Dependent Variables

- Outcome can be coded 1 or 0 (yes or no, approved or denied, success or failure). Examples?
- Interpret the regression as modeling the probability that the dependent variable equals one (Y = 1).
- Recall that for a binary variable, E(Y) = Pr(Y = 1) (see the sketch below).
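
A quick numerical check of this identity on simulated 0/1 data (a Python sketch; the example is illustrative, not from the lecture):

```python
# For a binary variable, the sample mean equals the sample proportion of
# ones -- the sample analogue of E(Y) = Pr(Y = 1).
import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=10_000)  # simulated binary outcome, Pr(Y = 1) = 0.3

print(y.mean())         # sample E(Y): close to 0.3
print((y == 1).mean())  # sample Pr(Y = 1): identical by construction
```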
HMDA example

- Outcome: loan denial is coded 1, loan approval 0.
- Key explanatory variable: black.
- Other explanatory variables: P/I (the payment-to-income ratio), credit history, LTV (the loan-to-value ratio), etc.
Linear Probability Model (LPM)

Yi = β0 + β1 X1i + β2 X2i + · · · + βk Xki + ui


Simply run the OLS regression with binary Y.

- β1 expresses the change in the probability that Y = 1 associated with a unit change in X1.
- Ŷi expresses the probability that Yi = 1:

Pr(Y = 1|X1, X2, . . . , Xk) = β0 + β1 X1 + β2 X2 + · · · + βk Xk = Ŷ
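
A minimal sketch of the LPM in Python with statsmodels, using simulated data; the variable names (deny, pi_ratio) are illustrative, not the actual HMDA file. The robust standard errors anticipate the heteroskedasticity point below.

```python
# LPM: plain OLS on a binary outcome, with heteroskedasticity-robust SEs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
pi_ratio = rng.uniform(0.1, 0.6, size=n)                     # hypothetical P/I ratio
deny = (rng.uniform(size=n) < 0.6 * pi_ratio).astype(float)  # hypothetical denials

X = sm.add_constant(pi_ratio)
lpm = sm.OLS(deny, X).fit(cov_type="HC1")  # robust SEs, as the notes advise
print(lpm.params)  # slope = change in Pr(deny = 1) per unit change in P/I
```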


Shortcomings of the LPM

- “Nonconforming predicted probabilities”: probabilities must logically lie between 0 and 1, but the LPM can predict probabilities outside this range.
- Heteroskedastic by construction (always use robust standard errors).
Probit and Logit Regression

- Addresses nonconforming predicted probabilities in the LPM.
- Basic strategy: bound predicted values between 0 and 1 by transforming a linear index, β0 + β1 X1 + β2 X2 + · · · + βk Xk, which can range over (−∞, ∞), into something that ranges over [0, 1].
- When the index is big and positive, Pr(Y = 1) → 1.
- When the index is big and negative, Pr(Y = 1) → 0.
- How to transform? Use a cumulative distribution function (CDF).
Probit Regression

The CDF is the cumulative standard normal distribution, Φ. The index β0 + β1 X1 + β2 X2 + · · · + βk Xk is treated as a z-score.

Pr(Y = 1|X1, X2, . . . , Xk) = Φ(β0 + β1 X1 + β2 X2 + · · · + βk Xk)
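
The course uses Stata's probit command; as an illustrative alternative, a probit fit in Python with statsmodels on simulated data might look like this (the coefficient values are made up for the simulation):

```python
# Probit: fit index coefficients by maximum likelihood, then predict
# probabilities by passing the index through the standard normal CDF.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-1.0 + 0.8 * x)).astype(int)  # true index: -1.0 + 0.8x

probit = sm.Probit(y, sm.add_constant(x)).fit()
print(probit.params)         # estimated index coefficients, near (-1.0, 0.8)
print(probit.predict()[:5])  # fitted probabilities, bounded in [0, 1]
```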


Interpreting the results

Pr(Y = 1|X1, X2, . . . , Xk) = Φ(β0 + β1 X1 + β2 X2 + · · · + βk Xk)

- βj positive (negative) means that an increase in Xj increases (decreases) the probability that Y = 1.
- βj reports how the index changes with a change in Xj, but the index is only an input to the CDF.
- The size of βj is hard to interpret because the change in probability for a change in Xj is nonlinear and depends on all of X1, X2, . . . , Xk.
- The easiest approach to interpretation is computing the predicted probability Ŷ for alternative values of X.
- Standard errors, hypothesis tests, and confidence intervals have the same interpretation as with OLS.
HMDA example

See Figure 9.2


Pr(deny = 1|P/I, black) = Φ(−2.26 + 2.74 P/I + 0.71 black)

(standard errors: 0.16 on the intercept, 0.44 on P/I, 0.083 on black)

- White applicant with P/I = 0.3: Pr(deny = 1|P/I, black) = Φ(−2.26 + 2.74 × 0.3 + 0.71 × 0) = Φ(−1.44) = 7.5%
- Black applicant with P/I = 0.3: Pr(deny = 1|P/I, black) = Φ(−2.26 + 2.74 × 0.3 + 0.71 × 1) = Φ(−0.73) = 23.3%
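
These two calculations can be verified with any standard normal CDF, for instance in Python (purely a check of the arithmetic above):

```python
# Verify the two predicted denial probabilities from the fitted probit.
from scipy.stats import norm

index_white = -2.26 + 2.74 * 0.3 + 0.71 * 0   # = -1.44
index_black = -2.26 + 2.74 * 0.3 + 0.71 * 1   # = -0.73
print(norm.cdf(index_white))  # ~0.075 -> 7.5%
print(norm.cdf(index_black))  # ~0.233 -> 23.3%
```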
Logit or Logistic Regression

Logit, or logistic, regression uses a slightly different functional form: the logistic CDF instead of the standard normal CDF.

The coefficients of the index can look different, but the probability results are usually very similar to the results from probit and from the LPM. Aside from the problem of nonconforming probabilities in the LPM, the three models generate similar predicted probabilities (compare the two link functions in the sketch below).
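
A small illustrative comparison of the two links: the logistic CDF has the same S-shape as the standard normal CDF, which is why predicted probabilities end up so close even though fitted logit coefficients are larger in scale (commonly by a factor of roughly 1.6).

```python
# Compare the probit link (standard normal CDF) and logit link (logistic CDF).
import numpy as np
from scipy.stats import logistic, norm

z = np.linspace(-3, 3, 7)
print(norm.cdf(z))      # probit link
print(logistic.cdf(z))  # logit link: same S-shape, slightly fatter tails
```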
Estimation of Logit and Probit Models

- OLS (and the LPM, which is an application of OLS) has a closed-form formula for β̂.
- Logit and probit require numerical methods to find the β̂'s that best fit the data.
Nonlinear Least Squares

One approach is to choose coefficients b0, b1, . . . , bk that minimize the sum of squares of how far the actual outcome, Yi, is from the prediction, Φ(b0 + b1 X1i + · · · + bk Xki):

Σᵢ₌₁ⁿ [Yi − Φ(b0 + b1 X1i + · · · + bk Xki)]²
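
A sketch of this objective minimized numerically in Python over simulated data (coefficient values and variable names are illustrative):

```python
# Nonlinear least squares for a one-regressor probit index.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2_000
x = rng.uniform(0, 1, size=n)
y = (rng.uniform(size=n) < norm.cdf(-1.0 + 2.0 * x)).astype(float)  # true index: -1.0 + 2.0x

def ssr(b):
    """Sum of squared gaps between outcomes and probit predictions."""
    return np.sum((y - norm.cdf(b[0] + b[1] * x)) ** 2)

b_nls = minimize(ssr, x0=np.zeros(2)).x
print(b_nls)  # should land near the true coefficients (-1.0, 2.0)
```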
Maximum Likelihood Estimation

- An alternative approach is to choose coefficients b0, b1, . . . , bk that make the observed sample, Y1, . . . , Yn, as likely as possible to have occurred.
- For example, if you observe data {4, 6, 8}, the predicted mean that would make this sample most likely to occur is μ̂MLE = 6 (verified numerically in the sketch below).
- Stata's probit and logistic regression (logit) commands are under Statistics → Binary Outcomes.
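
The {4, 6, 8} example can be checked numerically; a minimal sketch, assuming a normal likelihood with fixed variance:

```python
# MLE illustration: for data {4, 6, 8}, the mean that makes the sample
# most likely under a normal model is the sample mean, 6.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

data = np.array([4.0, 6.0, 8.0])

def neg_loglik(mu):
    # Negative log-likelihood of the sample at candidate mean mu.
    return -np.sum(norm.logpdf(data, loc=mu, scale=1.0))

print(minimize_scalar(neg_loglik).x)  # ~6.0
```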
Inference and Measures of Fit
- Standard errors, hypothesis tests, and confidence intervals work exactly as in OLS, but they refer to the index coefficients and must be translated into probabilities by applying the appropriate CDF.
- Fraction correctly predicted: choose one probability cutoff, e.g., 0.50, and compute the fraction of outcomes the model classifies correctly, but. . .
- Sensitivity/specificity: choose a cutoff. Sensitivity is the fraction of observed positive outcomes that are correctly classified; specificity is the fraction of observed negative outcomes that are correctly classified (see the sketch below).
- Pseudo-R² is analogous to R²:
  - Expresses the predictive quality of the model with explanatory variables relative to the predictive quality of the sample proportion p of cases where Yi = 1.
  - Adjusts for adding extra regressors.
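
A sketch of these three measures at a single cutoff (function and variable names are illustrative):

```python
# Classification measures for a binary model at one probability cutoff.
import numpy as np

def classification_measures(y, p_hat, cutoff=0.50):
    pred = (p_hat >= cutoff).astype(int)
    fraction_correct = (pred == y).mean()      # fraction correctly predicted
    sensitivity = pred[y == 1].mean()          # observed 1s classified as 1
    specificity = (pred[y == 0] == 0).mean()   # observed 0s classified as 0
    return fraction_correct, sensitivity, specificity

# Hypothetical outcomes and fitted probabilities:
y = np.array([1, 1, 0, 0, 1, 0])
p_hat = np.array([0.8, 0.4, 0.2, 0.6, 0.9, 0.1])
print(classification_measures(y, p_hat))  # (0.667, 0.667, 0.667)
```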
Sensitivity and Specificity
[Figure: sensitivity and specificity plotted against the probability cutoff; both measures and the cutoff range from 0 to 1.]
Reviewing the HMDA results (Table 9.2)

- LPM, logit, probit (minor differences)
- Four probit specifications
- Highly robust result: a 6.0 to 8.4 percentage-point gap between black and white denial rates, controlling for a wide range of other explanatory variables.
- Internal Validity
- External Validity
Other LDV Models

Limited Dependent Variable (LDV) models:

- Count data (discrete non-negative integers), Y ∈ {0, 1, 2, . . . , k} with k small: Poisson or negative binomial regression (see the sketch below).
- Ordered responses, e.g., completed educational credentials: ordered logit or probit.
- Discrete choice data, e.g., mode of travel, with characteristics of the choice, the chooser, and their interaction: multinomial logit or probit.
  - Can sometimes be converted to several binary problems.
- Censored and truncated regression models: Tobit or sample-selection models.
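As one illustration, a Poisson regression for count data in Python with statsmodels, on simulated data (the coefficient values are made up):

```python
# Poisson regression: models a non-negative integer outcome with a
# log-linear rate, E(Y|X) = exp(b0 + b1*X).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
counts = rng.poisson(np.exp(0.2 + 0.5 * x))  # simulated count outcome

poisson = sm.Poisson(counts, sm.add_constant(x)).fit()
print(poisson.params)  # log-rate coefficients, near (0.2, 0.5)
```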
