The Stata Journal (2013) 13, Number 4, pp. 776–794

Implementation of a double-hurdle model
Abstract. Corner solution responses are frequently observed in the social sciences. One common approach to model phenomena that give rise to corner solution responses is to use the tobit model. If the decision to participate in the market is decoupled from the consumption amount decision, then the tobit model is inappropriate. In these cases, the double-hurdle model presented in Cragg (1971, Econometrica 39: 829–844) is an appropriate alternative to the tobit model. In this article, I introduce a command, dblhurdle, that fits the double-hurdle model. The implementation allows the errors of the participation decision and the amount decision to be correlated. The capabilities of predict after dblhurdle are also discussed.
Keywords: st0317, dblhurdle, tobit, Heckman, Cragg, double hurdle, hurdle
1 Introduction
Double-hurdle models are used with dependent variables that take on the endpoints
of an interval with positive probability and that are continuously distributed over the
interior of the interval. For example, suppose you observe the amount of alcohol individuals consume over a fixed period of time. The distribution of the amounts will be roughly continuous over positive values, but there will be a "pile up" at zero, which is the corner solution to the consumption problem the individuals face; no individual can consume a negative amount of alcohol.
One common approach to modeling such situations is to use the tobit model. Sup-
pose the dependent variable y is continuous over positive values, but Pr(y = 0) > 0 and
Pr(y < 0) = 0. Letting Φ () denote a standard normal cumulative distribution function
(CDF) and φ () denote a standard normal density function, recall that the log-likelihood
function for the tobit model is
\[
\log(L) = \sum_{y_i = 0} \log\left\{1 - \Phi\left(\frac{x_i\beta}{\sigma}\right)\right\} + \sum_{y_i > 0}\left[\log\left\{\phi\left(\frac{y_i - x_i\beta}{\sigma}\right)\right\} - \log(\sigma)\right]
\]
The functional form of the tobit model imposes a restriction on the underlying
stochastic process: xi β parameterizes both the conditional probability that yi = 0 and
the conditional density associated with the magnitude of yi whenever yi > 0. Thus
the tobit model cannot properly handle the situation where the effect of a covariate on
the probability of participation Pr(yi > 0) and the effect of the same covariate on the
amount of participation have different signs. For example, it might be the case that education lowers the probability that an individual smokes while raising the number of cigarettes smoked among those who do smoke; the tobit model cannot accommodate such a pattern.
2 The double-hurdle model

Letting Ψ(x, y, ρ) denote the CDF of a bivariate normal distribution with correlation ρ, the log-likelihood function for the double-hurdle model is
\[
\log(L) = \sum_{y_i = 0} \log\left\{1 - \Psi\left(z_i\gamma,\ \frac{x_i\beta}{\sigma},\ \rho\right)\right\} + \sum_{y_i > 0}\left[\log\Phi\left\{\frac{z_i\gamma + \rho\,(y_i - x_i\beta)/\sigma}{\sqrt{1-\rho^2}}\right\} - \log(\sigma) + \log\phi\left(\frac{y_i - x_i\beta}{\sigma}\right)\right]
\]
The double-hurdle model can be reduced to the tobit model by setting ρ = 0 and taking
the limit zi γ → +∞.
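To see this reduction explicitly, note that with ρ = 0 the bivariate CDF factors into a product of univariate CDFs, and as zi γ → +∞ the participation hurdle is always cleared. Using the notation above,

\[
\lim_{z_i\gamma \to +\infty} \Psi\!\left(z_i\gamma,\ \frac{x_i\beta}{\sigma},\ 0\right) = \Phi\!\left(\frac{x_i\beta}{\sigma}\right),
\qquad
\lim_{z_i\gamma \to +\infty} \Phi\!\left\{\frac{z_i\gamma + 0\cdot(y_i - x_i\beta)/\sigma}{\sqrt{1-0}}\right\} = 1
\]

Substituting these limits into the double-hurdle log likelihood reproduces the tobit log likelihood given in section 1.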
3 The dblhurdle command

3.1 Syntax
dblhurdle depvar indepvars [if] [in] [weight], {ll(#) | ul(#)}
    [peq(varlist[, noconstant]) ptobit noconstant constraints(numlist)
    vce(vcetype) level(#) correlation display options maximize options]
indepvars and peq() may contain factor variables; see [U] 11.4.3 Factor variables.
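For instance, a minimal call with a lower corner at zero, hypothetical regressors x1 and x2 in the quantity equation, and z1 and z2 in the participation equation would be

. dblhurdle y x1 x2, ll(0) peq(z1 z2)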
3.2 Options
ll(#) indicates a lower corner. Observations with depvar ≤ # are considered at the
corner. One of ul(#) or ll(#) must be specified.
ul(#) indicates an upper corner. Observations with depvar ≥ # are considered at the
corner. One of ul(#) or ll(#) must be specified.
peq(varlist[, noconstant]) specifies the set of regressors for the participation equation if these differ from those of the quantity equation.
ptobit specifies that the participation equation should consist of a constant only. This
option cannot be specified with the peq() option.
noconstant; see [R] estimation options.
constraints(numlist) is used to specify any constraints the researcher may want to
impose on the model.
vce(vcetype) specifies the type of standard error reported. vcetype may be oim (default),
robust, or cluster clustvar.
level(#); see [R] estimation options.
correlation displays the correlation between the error terms of the quantity equation
and the participation equation. The covariance is not shown when this option is
specified.
display options; see Reporting under [R] estimation options.
maximize options: technique(algorithm spec), iterate(#), [no]log, tolerance(#), ltolerance(#), nrtolerance(#), and from(init specs); see [R] maximize. These options are seldom used.
4 Postestimation: predict
4.1 Syntax
predict [type] newvarname [if] [in] [, xb zb xbstdp zbstdp ppar ycond
    yexpected stepnum(#)]
4.2 Options
xb calculates the linear prediction for the quantity equation. This is the default option
when no options are specified in addition to stepnum().
zb calculates the linear prediction for the participation equation.
xbstdp calculates the standard error of the linear prediction of the quantity equation,
xb.
zbstdp calculates the standard error of the linear prediction of the participation equa-
tion, zb.
ppar calculates the probability of being away from the corner conditional on the covariates.
ycond calculates the expectation of the dependent variable conditional on the covariates and on the dependent variable being away from the corner.
yexpected calculates the expectation of the dependent variable conditional on the covariates.
stepnum(#) controls the number of steps to be taken for predictions that require integration (yexpected and ycond). More specifically, # is the number of steps taken per unit of the smallest standard deviation of the normal distributions used in the prediction. The default is stepnum(10). You can fine-tune the value of this parameter by trial and error until increasing it results in little or no change in the predicted value.
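For example, after fitting a model, the predictions discussed above could be obtained as follows (the new variable names are arbitrary):

. predict phat, ppar
. predict ychat, ycond
. predict yehat, yexpected stepnum(20)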
5 Example
We illustrate the use of the dblhurdle command using smoke.dta from Wooldridge (2010).
We begin our example by describing the dataset:
. use smoke
. describe

Contains data from smoke.dta
  obs:           807
 vars:            10                          15 Aug 2012 19:00
 size:        19,368

Sorted by:

. misstable summarize
        (variables nonmissing or string)
We will model the number of cigarettes smoked per day, so the dependent variable will be cigs. The explanatory variables we use are educ (number of years of schooling); the log of income; the log of the price of cigarettes in the individual's state; restaurn, which takes the value 1 if the individual's state restricts smoking in restaurants and 0 otherwise; and the individual's age and age squared. Not all variables will be included in both equations.
The fact that cigs (the dependent variable) is stored as a byte should remind us that we are implicitly relaxing an assumption of the double-hurdle model: the hypothesized data-generating process produces values over a continuous range, but all the observed numbers of cigarettes are integers.
It is always good to check for missing values; because we have no string variables, the output of misstable summarize confirms that there are no missing values.
The dependent variable should have a “corner” at zero because all nonsmokers will
report smoking zero cigarettes per day. We verify this point by tabulating the dependent
variable. This simple check is important because it might be the case that our data
contain only smokers with positive entries in the variable cigs, in which case a truncated
regression model would be more appropriate. We perform the simple check:
. tabulate cigs

 cigs. smoked |
      per day |      Freq.     Percent        Cum.
--------------+-----------------------------------
  (output omitted)
The tabulation of the dependent variable reveals that about 60% of the individuals
in the sample smoked 0 cigarettes. Strangely, we also see that individuals seem to smoke
cigarettes in multiples of five—in part, this may be due to a reporting heuristic used
by individuals.
------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs         |
        educ |   4.373058   .8969167     4.88   0.000     2.615134    6.130983
    restaurn |  -6.629484   2.630784    -2.52   0.012    -11.78573   -1.473241
     lincome |   3.236915   1.534674     2.11   0.035     .2290102     6.24482
    lcigpric |  -2.376598   12.02945    -0.20   0.843    -25.95388    21.20068
       _cons |  -44.41139    50.5775    -0.88   0.380    -143.5415    54.71869
-------------+----------------------------------------------------------------
peq          |
        educ |  -.2053851   .0324439    -6.33   0.000    -.2689739   -.1417963
         age |   .0867284    .015593     5.56   0.000     .0561666    .1172901
  (output omitted)
------------------------------------------------------------------------------
The command showcases some of the implemented features. We used factor variables to include both age and age squared.
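A call consistent with the estimation table and the surrounding discussion (a reconstruction; corner at zero, factor-variable notation for the age terms) is

. dblhurdle cigs educ restaurn lincome lcigpric, ll(0) peq(educ c.age##c.age)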
The command displays the number of observations in the sample but does not report a test against a benchmark model. Most estimation commands implement a test against a constant-only model; for the double-hurdle model, the choice of benchmark has been left to the user. This test can be carried out with standard postestimation tools, as shown below.
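For example, one could compare the full model against a restricted model whose participation equation contains only a constant (a sketch using the reconstructed call above):

. dblhurdle cigs educ restaurn lincome lcigpric, ll(0) peq(educ c.age##c.age)
. estimates store full
. dblhurdle cigs educ restaurn lincome lcigpric, ll(0) ptobit
. estimates store restricted
. lrtest full restricted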
The estimation table shows results for four equations. In the econometric sense, we
estimated the parameters from two equations and two dependence parameters. The first
equation displays the coefficients of the quantity equation, which is titled cigs after the
dependent variable. The second equation, titled peq, which is short for participation
equation, displays the coefficients of the participation equation. The third equation,
titled /sigma, displays the estimated value of the standard deviation of the error term
of the quantity equation. As mentioned, the analogous parameter of the participation
equation is set to 1; otherwise, the model is not identified. The fourth equation, titled
/covariance, displays the estimated value of the covariance between the error terms
of the quantity equation and the participation equation. If the correlation option is
specified, the correlation is displayed instead, and the equation title changes to /rho.
The results allow us to appreciate the strengths of the double-hurdle model. For
example, the coefficient of educ has a positive value on the quantity equation, while the
analogous coefficient in the participation equation has a negative value. This implies
that more educated individuals will be less likely to smoke, but if they smoke, they will
tend to smoke more than less educated individuals.
B. Garcı́a 783
So a small increment in the number of years of schooling will positively affect the
number of daily cigarettes smoked given that an individual is a smoker but negatively
affect the probability that the individual is a smoker. Naturally, we may want to
know which effect, if any, dominates. For nonlinear problems like this one, which effect
dominates depends on the other characteristics of the individual. In these situations,
researchers often calculate marginal effects. In our example, we illustrate how to compute the average marginal effect of the number of years of schooling (educ) on three quantities of interest:

• the probability that the individual is a smoker (the ppar prediction);
• the expected number of cigarettes smoked conditional on being a smoker (the ycond prediction);
• the expected number of cigarettes smoked regardless of smoker status (the yexpected prediction).
Given the signs of the coefficients, we know that the average marginal effect of educ
on the probability of smoking will be negative. We also expect the average marginal
effect of educ on the number of cigarettes smoked given that you are a smoker will be
positive. The sign of the final quantity, the marginal effect of educ on the number of cigarettes smoked regardless of smoker status, is ambiguous a priori.
To estimate these quantities, we use the predict() option in conjunction with the
margins command. First, we calculate the average marginal effect of educ on the
probability that the individual is a smoker by using the ppar option:
. margins, dydx(educ) predict(ppar)

Average marginal effects                          Number of obs   =        807
Model VCE    : OIM

Expression   : predict(ppar)
dy/dx w.r.t. : educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  (output omitted)
------------------------------------------------------------------------------
. set r on
r; t=0.00 11:43:17

. margins, dydx(educ) predict(ycond)

Average marginal effects                          Number of obs   =        807
Model VCE    : OIM

Expression   : predict(ycond)
dy/dx w.r.t. : educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  (output omitted)
------------------------------------------------------------------------------
r; t=120.64 11:45:17

. margins, dydx(educ) predict(ycond stepnum(100))

Average marginal effects                          Number of obs   =        807
Model VCE    : OIM

Expression   : predict(ycond stepnum(100))
dy/dx w.r.t. : educ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  (output omitted)
------------------------------------------------------------------------------
r; t=1153.86 12:04:31

. set r off
First, we calculate the average marginal effect with the default value of stepnum(), which is 10. We note that the effect is positive, as expected, and significant. When the calculation is repeated with a stepnum() of 100, the estimate changes only in the sixth decimal place, which in this context is meaningless, but the extra precision comes at the expense of a roughly tenfold increase in run time. Hence, stepnum() should be used with caution. My advice is to tune it by using predict with the ycond option until the predicted values show little or no sensitivity to further increases in stepnum().
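For example, one could compare predictions at two values of stepnum() and inspect the discrepancy (variable names arbitrary):

. predict yc10, ycond stepnum(10)
. predict yc20, ycond stepnum(20)
. generate diff = abs(yc20 - yc10)
. summarize diff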
Finally, we use the yexpected option of predict in margins to calculate the average
marginal effect educ has on the number of cigarettes smoked per day regardless of the
individual’s smoker status:
. margins, dydx(educ) predict(yexpected)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  (output omitted)
------------------------------------------------------------------------------
We note that the effect is negative and statistically significant. Hence, on average, more education lowers the expected number of cigarettes an individual smokes.
6 Simulation

A Monte Carlo simulation was used to check the implementation. If the command works as intended, we expect the following:

• The mean of the estimated parameters should be close to their true values.
• The mean standard error of the estimated parameters over the repetitions should
be close to the standard deviation of the point estimates.
• The rejection rate of hypothesis tests should be close to the nominal size of the
test.
The first step consists of choosing the parameters of the model. The quantity equa-
tion was chosen to have one continuous covariate, one indicator variable, and an in-
tercept. The variance of the error associated with this equation is equal to 1. The
participation equation consists of a different continuous variable, indicator variable,
and intercept. The error terms will be drawn so that they are independent. Thus
the correlation between the error terms will be 0. We set an upper corner at 0. The
data-generating process can be summarized as follows:
\[
y = \begin{cases} \min(0,\; 2x_1 - d_1 + 0.5 + \epsilon) & \text{if } x_2 - 2d_2 + 1 + u < 0 \\ 0 & \text{otherwise} \end{cases}
\]

\[
\begin{pmatrix} \epsilon \\ u \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
A dataset of 2,000 observations was created containing the covariates. The x’s were
drawn from a standard normal distribution, and the d’s were drawn from a Bernoulli
with p = 1/2. In the pseudocode below, we refer to this dataset as “base”.
Now we describe an iteration of the simulation:

1. Use "base".
2. Draw ε and u from the bivariate normal distribution above.
3. Generate y according to the data-generating process.
4. Fit the model with dblhurdle.
5. Record the point estimates, their standard errors, and whether each 95% confidence interval excludes the true parameter value.
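In Stata, a single iteration might look like the following sketch; the covariate names match the data-generating process above, and everything else is an assumption of the sketch:

. use base, clear
. matrix S = (1, 0 \ 0, 1)
. drawnorm e u, cov(S)
. generate y = cond(x2 - 2*d2 + 1 + u < 0, min(0, 2*x1 - d1 + 0.5 + e), 0)
. dblhurdle y x1 d1, ul(0) peq(x2 d2)

Repeating these steps (for example, with simulate) and collecting the results yields the dataset of realizations described next.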
The values of interest during each iteration are the point estimates of the parame-
ters; the standard errors of the parameters; and, for each parameter, whether the 95%
confidence interval around the estimated parameter excluded the true value of the pa-
rameter. At the conclusion of the simulation, we have a dataset of 10,000 observations,
where each observation is a realization of the values of interest.
The following table summarizes the results. It shows the mean estimated coefficient,
or “mean”; the standard deviation of the sample of estimated coefficients, or “std. dev.”;
the mean estimated standard error, or “std. err.”; and the proportion of the time a test
of size 0.05 rejected the true null hypothesis, denoted by “rej. rate”.
parameter      true value       mean     std. dev.   std. err.   rej. rate
βx1                  2         2.0007      0.0563      0.0561      0.0524
βd1                 −1        −1.0001      0.0860      0.0856      0.0507
βcons1             0.5         0.5007      0.0881      0.0885      0.0497
γx2                  1         1.0095      0.0823      0.0811      0.0520
γd2                 −2        −2.0156      0.1424      0.1426      0.0486
γcons2               1         1.0068      0.0862      0.0863      0.0507
sigma                1         0.9979      0.0364      0.0364      0.0542
covariance           0         0.0016      0.1046      0.1036      0.0532
The results show that the statistical properties of the estimates are as desired. Other
simulations were done to see how these results would change under extreme circum-
stances, such as correlations close to the extremes of −1 or 1. The results were qualita-
tively similar to those above for correlations as high as 0.95 and as low as −0.95. There
were instances where the tests did not achieve their nominal size. Rather than being
driven by the extreme values of the input parameters, these issues seem to be driven
primarily by the proportion of observations at the corner. As this proportion gets close to either extreme (0 or 1), the actual size of a test of the covariance deviates from its nominal size. This becomes an issue once the proportion of observations at the corner is above 95% or below 5%.
The other parameters can also be affected by this, but for those parameters, this is
more intuitive because it can be viewed through the lens of a small-sample problem. For
example, if most of your observations are at the corner, you will have very little data to
estimate the parameters associated with the quantity equation. Because the confidence intervals produced by maximum likelihood rely on asymptotic normality, we cannot expect them to achieve their nominal coverage in small samples.
Figure 1 summarizes this information. Each scatterplot contains the observed re-
jection rate of a test of nominal size 0.05 on the vertical axis, and the proportion of
observations at the corner on the horizontal axis. Each point on the scatterplot rep-
resents a variation on the parameterization of the data-generating process presented
above. I held the coefficients of the quantity and participation equations fixed, and I tried every combination of upper or lower corner; corner at −2, 0, or 2; σ ∈ {0.2, 1, 10}; and ρ ∈ {−0.95, 0, 0.95}.
[Figure 1: one scatterplot panel per parameter (for example, γx2), with separate panels for σ ≠ 10 and σ = 10; vertical axis: rejection rate (0.05 to 0.95); horizontal axis: proportion of observations at the corner (0.05 to 0.95)]
Figure 1. Scatterplots showing rejection rate of a test of nominal size 0.05 and proportion
of observations at the corner
A less intuitive issue occurs when the set of regressors in the participation equation is equal to the set of regressors in the quantity equation. In this case, the model is weakly identified, and the actual sizes of tests will differ from their nominal sizes. To illustrate,
we attempt to recover the parameters of the following data-generating process:
\[
y = \begin{cases} \min(0,\; 2x_1 - d_1 + 0.5 + \epsilon) & \text{if } 2x_1 - d_1 + 0.5 + u < 0 \\ 0 & \text{otherwise} \end{cases}
\]

\[
\begin{pmatrix} \epsilon \\ u \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
The results, summarized in the following table, suggest that the point estimates can
be trusted but that the size of the tests may deviate from the advertised values.
parameter      true value       mean     std. dev.   std. err.   rej. rate
βx1                  2         2.0043      0.0907      0.0877      0.0711
βd1                 −1        −1.0029      0.0925      0.0925      0.0535
βcons1             0.5         0.5077      0.1618      0.1569      0.0762
γx1                  2         2.0625      0.2898      0.2699      0.0846
γd1                 −1        −1.0270      0.2216      0.2114      0.0560
γcons1             0.5         0.5417      0.2548      0.2447      0.0777
sigma                1         1.0009      0.0331      0.0328      0.0534
covariance           0         0.0374      0.2754      0.2541      0.1118
Figure 2 is analogous to figure 1. We note that tests on the covariance are particularly unreliable, that the distinction between the cases where σ ≠ 10 and σ = 10 seems not to matter, and that the rejection rate exceeds the nominal size of the test when the proportion of observations at the corner is around 0.9. However, when the proportion of observations at the corner is between 0.3 and 0.8, the sizes are mostly reliable, with the notable exception of tests of the covariance.
[Figure 2: scatterplot panels analogous to figure 1 (for example, γx1 and σ), with separate panels for σ ≠ 10 and σ = 10; vertical axis: rejection rate; horizontal axis: proportion of observations at the corner]
Figure 2. Scatterplots showing rejection rate of a test of nominal size 0.05 and proportion
of observations at the corner
7.3 Derivatives

For a lower corner at c, let b = (xβ − c)/σ and a = {zγ − c + ρ(y − xβ)/σ}/√(1 − ρ²), and define

\[
\Psi = \Psi(z\gamma - c,\ b,\ \rho) \qquad \psi = \psi(z\gamma - c,\ b,\ \rho)
\]

\[
\Phi_{12} = \Phi\left\{\frac{z\gamma - c - b\rho}{\sqrt{1-\rho^2}}\right\} \qquad
\phi_{12} = \phi\left\{\frac{z\gamma - c - b\rho}{\sqrt{1-\rho^2}}\right\} \qquad
\Phi_{21} = \Phi\left\{\frac{b - \rho(z\gamma - c)}{\sqrt{1-\rho^2}}\right\}
\]

where ψ(·, ·, ρ) denotes the corresponding bivariate normal density. The derivatives of the log likelihood are

\[
\frac{\partial \log(L)}{\partial \beta_j} = \frac{x_j}{\sigma}\left[\sum_{y=c} \frac{\phi(b)\,\Phi_{12}}{\Psi - 1} + \sum_{y>c}\left\{\frac{-\rho\,\phi(a)}{\sqrt{1-\rho^2}\,\Phi(a)} + \frac{y - x\beta}{\sigma}\right\}\right]
\]

\[
\frac{\partial \log(L)}{\partial \gamma_j} = z_j\left\{\sum_{y=c} \frac{\phi(z\gamma - c)\,\Phi_{21}}{\Psi - 1} + \sum_{y>c} \frac{\phi(a)}{\sqrt{1-\rho^2}\,\Phi(a)}\right\}
\]

\[
\frac{\partial \log(L)}{\partial \sigma_{12}} = \frac{1}{\sigma}\left(\sum_{y=c} \frac{\psi}{\Psi - 1} + \sum_{y>c} \frac{\phi(a)}{\Phi(a)}\left[\frac{y - x\beta}{\sigma\sqrt{1-\rho^2}} + \frac{a\rho}{1-\rho^2}\right]\right)
\]

\[
\frac{\partial \log(L)}{\partial \sigma} = \frac{1}{\sigma}\sum_{y=c} \frac{b\,\phi(b)\,\Phi_{12} + \rho\,\psi}{1-\Psi} + \frac{1}{\sigma}\sum_{y>c}\left[\left(\frac{y - x\beta}{\sigma}\right)^{2} - 1 - \frac{\rho\,\phi(a)}{\Phi(a)\sqrt{1-\rho^2}}\left\{\frac{2(y - x\beta)}{\sigma} + \frac{a\rho}{\sqrt{1-\rho^2}}\right\}\right]
\]
The implementation of the derivatives for an upper corner at c requires a few minor changes. First, the derivatives with respect to βj and γj should be multiplied by −1. Second, a, b, zγ − c, and (y − xβ)/σ should be multiplied by −1.
7.4 Weights
The weighting schemes implemented for dblhurdle are frequency weights (fw), sampling weights (pw), and importance weights (iw). Recall that the log likelihood is a sum over observations; to implement weights, the ith term of that sum is multiplied by the weight of the ith observation. Frequency weights must be positive integers.
When frequency weights are specified, the sample size is adjusted so that it is equal
to the sum of the weights. The importance weights are allowed to be any real number.
No sample-size adjustments are made when importance weights are specified. The
sampling weights are like the importance weights, but a robust estimator of the variance
is computed instead of the default oim estimator. No sample-size adjustment is made
when sampling weights are specified, and the weights are not allowed to be negative.
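For example, a frequency-weighted fit of the model from section 5, with wt a hypothetical weight variable, would be specified as

. dblhurdle cigs educ restaurn lincome lcigpric [fw=wt], ll(0) peq(educ c.age##c.age)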
Finally, analytic weights (aw) are not allowed. This command was written with the
tobit command in mind. In that case, the aweights (normalized) divide the variance
of the error term. In the case of dblhurdle, the rationale for dividing the variance by
the normalized weights does not carry over well because we also have to estimate the
covariance between the error terms.
7.5 Prediction
There are three options in the prediction program that require some explanation. The
ppar option computes the probability of being away from the corner conditional on the
covariates. Thus this option computes
\[
\Pr(y > c \mid x, z) = \Psi\left(z\gamma - c,\ \frac{x\beta - c}{\sigma},\ \rho\right)
\]
Note that the options that involve integration are time consuming. Thus the option
stepnum() was added to the prediction program to allow the user some control of the
execution time for the integration. Letting ns denote the stepnum(), the step size is
chosen to be
\[
\frac{\min\left(\sigma,\ \sqrt{1-\rho^2}\right)}{n_s}
\]
Execution is faster when the stepnum() is smaller, but the improved run time comes
at a cost to accuracy. The default is stepnum(10).
8 Conclusion
The double-hurdle model was an important contribution to the econometric toolkit
used by researchers. I hope that readers will consider this model and, in particular, the
dblhurdle command when their first instinct is to use the tobit model. The example
presented in section 5 illustrates the flexibility of the model. It allows the researcher to
break down the modeled quantity along two useful dimensions, the “quantity” dimension
and the “participation” dimension.
The command presented in this article allows for only a single corner in the data. One desirable feature to add is the capability to handle dependent variables with two corners. Such variables are common (for example, 401(k) contributions), so this feature would add considerable value for users.
9 Acknowledgments
I wrote this article and the command described therein during a summer internship at
StataCorp. It was exciting to meet the individuals behind Stata. I thank David Drukker
for his support and for the time he spent going over the intricate details of the models.
I also thank Rafal Raciborski for all of his comments, suggestions, and tips. Any errors
in my work are my own.
10 References
Cragg, J. G. 1971. Some statistical models for limited dependent variables with appli-
cation to the demand for durable goods. Econometrica 39: 829–844.
Jones, A. M. 1992. A note on computation of the double-hurdle model with dependence
with an application to tobacco expenditure. Bulletin of Economic Research 44: 67–74.
Jones, A. M., and S. T. Yen. 2000. A Box-Cox double-hurdle model. Manchester School
68: 203–221.
Martínez-Espiñeira, R. 2006. A Box-Cox double-hurdle model of wildlife valuation: The citizen's perspective. Ecological Economics 58: 192–208.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd
ed. Cambridge, MA: MIT Press.