Berry and Haile (2021) WP - Foundations of Demand Estimation
Steven T. Berry
Philip A. Haile
We benefited from detailed comments from Chris Conlon, Jean-Pierre Dubé, Aviv Nevo, and
Frank Verboven. Jaewon Lee provided capable research assistance. The views expressed herein
are those of the authors and do not necessarily reflect the views of the National Bureau of
Economic Research.
NBER working papers are circulated for discussion and comment purposes. They have not been
peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies
official NBER publications.
© 2021 by Steven T. Berry and Philip A. Haile. All rights reserved. Short sections of text, not to
exceed two paragraphs, may be quoted without explicit permission provided that full credit,
including © notice, is given to the source.
Foundations of Demand Estimation
Steven T. Berry and Philip A. Haile
NBER Working Paper No. 29305
September 2021
JEL No. C36,D12,L20
ABSTRACT
Demand elasticities and other features of demand are critical determinants of the answers to most
positive and normative questions about market power or the functioning of markets in practice.
As a result, reliable demand estimation is an essential input to many types of research in
Industrial Organization and other fields of economics. This chapter presents a discussion of some
foundational issues in demand estimation. We focus on the distinctive challenges of demand
estimation and strategies one can use to overcome them. We cover core models, alternative data
settings, common estimation approaches, the role and choice of instruments, and nonparametric
identification.
Steven T. Berry
Department of Economics
Yale University
Box 208264
37 Hillhouse Avenue
New Haven, CT 06520-8264
and NBER
Philip A. Haile
Department of Economics
Yale University
37 Hillhouse Avenue
P.O. Box 208264
New Haven, CT 06520
and NBER
Contents
1 Introduction
1.1 Why Estimate Demand?
1.2 Our Focus
4 Market-Level Data
4.1 The BLP Estimator
4.2 Instruments
4.3 Using a Supply Side
4.4 Computing the BLP Estimator and Standard Errors
1 Introduction
1.1 Why Estimate Demand?
Little can be said about the functioning of a market without a quantitative assessment of
demand. In the modern Industrial Organization (“IO”) literature, for example, estima-
tion of demand elasticities is essential for measuring markups and quantifying sources of
market power. And almost any counterfactual question about a market requires quanti-
tative measures of how choices respond to ceteris paribus changes in the prices or other
characteristics of the available options. Such measures seldom suffice on their own to
provide full answers to the economic questions of interest concerning market outcomes,
but they are often necessary. Examples include assessments of
Of course, the importance of measuring demand is not limited to the realm of IO. As these
examples suggest, demand is central to substantive policy questions in public economics,
trade, health, education, and other fields.
The problem of demand estimation is not new. However, it is both challenging and
subject to some common misconceptions. It has also been the focus of substantial atten-
tion in recent years. Critical contributions have come from scholars in a range of fields.
But IO economists have often led the way in the recent progress in demand estimation.
One reason is the essential role that demand plays in questions of competition, market
power, and market outcomes—questions at the heart of the field. The tradition in modern
IO has also been to insist on trying to measure what is important to its core questions,
rather than focusing on what is easy to measure given current data and estimation tech-
niques. Determining what to measure and how to do so often requires guidance from an
economic model, particularly in the context of imperfect competition and market equi-
librium. This has led to a strong connection between theory and empirical work in IO.
All of this has forced IO economists to face some difficult issues associated with demand
estimation and to press for new solutions.
1.2 Our Focus
Our goal in this chapter is to provide a unified and (reasonably) compact treatment of
some key ideas and practices developed in several literatures. Our emphasis naturally
reflects our own perspectives. We limit our focus to a set of foundational issues: the dis-
tinctive challenges of demand estimation; the most common empirical models of demand
in IO; the role and choice of instrumental variables; different data types (market-level
data, micro data, panel data); and nonparametric identification. This focus implies that
we neglect many other important issues, variations of the standard models, and appli-
cations. The chapter of Gandhi and Nevo in this Handbook has a different and highly
complementary focus.
We begin with a discussion of the special challenges posed by the problem of demand
estimation. We then review the classic discrete choice model of demand and move on
to estimation and identification using market-level data. We next turn to various forms
of “micro” (consumer-level) data and the advantages such data can offer. We conclude
with a discussion of directions for future research.
imagine a perfectly competitive market with a single good (no complements or substi-
tutes), where demand and supply are characterized by the pair of simultaneous equations
Q = D(X, P, U ) (2.1)
P = C(W, Q, V ). (2.2)
Here (2.1) represents demand while (2.2) represents supply. Q and P denote quantity
and price; X and W denote observable demand shifters and cost shifters, respectively;
and the “error terms” U ∈ R and V ∈ R denote unobserved demand shifters and cost
shifters (shocks to demand and marginal cost), respectively. Assume that only prices and
quantities are endogenous, i.e., that (X, W) ⊥⊥ (U, V).
The presence of the latent demand shifters and cost shifters (U, V ) is, of course, the
reason there is a simultaneity/endogeneity “problem” in the identification/estimation
of demand.1 If there were no unobservable U affecting demand, identification would
be trivial: the directly observed relationship between (X, P ) and Q would itself be the
demand relationship. In that case, the notion that the price P could be econometrically
endogenous would be nonsense: there would be no unobservable whose variation could
be confounded with that of P .
Of course, even when one acknowledges the existence of latent demand shocks U , the
solution to the endogeneity problem in this special setting is well understood. Identi-
fication of demand can then be obtained through cost shifters that are excluded from
the function D (i.e., elements of W ⊄ X) satisfying an appropriate independence condition
with respect to U. In fact, nonparametric identification of demand in this case
follows immediately from the same instrumental variables (“IV”) conditions that yield
identification in the case of a regression model with endogeneity.2
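To make the role of an excluded cost shifter concrete, the following is a minimal simulation sketch. It assumes linear versions of (2.1) and (2.2) with hypothetical parameter values (`a`, `b`, `c`, `d`, `g`) and independent standard normal shocks; none of these choices come from the chapter. It compares a naive regression of quantity on price with an instrumental variables estimate that uses W as the instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical linear versions of (2.1) and (2.2); all parameter values are
# illustrative assumptions, not taken from the chapter.
#   demand:         Q = a - b*P + U
#   inverse supply: P = c + d*Q + g*W + V
a, b, c, d, g = 10.0, 1.0, 1.0, 0.5, 1.0

W = rng.normal(size=n)  # observed cost shifter, excluded from demand
U = rng.normal(size=n)  # unobserved demand shock
V = rng.normal(size=n)  # unobserved cost shock

# Equilibrium price: substitute demand into inverse supply and solve for P.
P = (c + d * a + g * W + d * U + V) / (1.0 + d * b)
Q = a - b * P + U

def cov(x, y):
    """Sample covariance between two 1-D arrays."""
    return np.cov(x, y)[0, 1]

ols_slope = cov(P, Q) / cov(P, P)  # regression of Q on P, ignoring endogeneity
iv_slope = cov(W, Q) / cov(W, P)   # IV estimate using W as the instrument

print(f"true slope: {-b:.3f}, OLS: {ols_slope:.3f}, IV: {iv_slope:.3f}")
```

In this parameterization the naive slope is pulled toward zero because upward-sloping marginal cost makes P positively related to U, while the IV estimate should be close to the assumed demand slope.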
There is at least one important sense in which the optimistic message from this
special case is correct: as we will see below, the essential requirements for identification
of demand are indeed standard types of “clean” variation, as from instrumental variables.3
However, this special case can also be highly misleading. It may suggest that the challenge
to identification of demand is “merely” the endogeneity of price. And given the wide
1
Unless marginal cost (inverse supply) is constant with respect to Q, dependence between P and U
would arise even if U and V were assumed independent. Typically one will expect correlation between
latent factors affecting firm costs (V ) and those affecting consumer demand (U ). And outside the
realm of perfect competition, prices will generally depend on demand elasticities, yielding (functional)
dependence between prices and demand shocks (the latter affecting elasticities) even in the special case
of constant marginal cost and independence between cost shocks and demand shocks.
2
See Newey and Powell (2003) in the case of nonparametric D that is additively separable in U ,
or Chernozhukov and Hansen (2005) (Theorem 4) when additive separability is replaced with strict
monotonicity in U .
3
Related to the instrumental variables literature is the literature on restrictions on the matrix of
covariances across equations. For example, one might be willing in some contexts to assume that the
demand and supply unobservables are uncorrelated. Full or partial identification using such restrictions
dates back to Koopmans (1949). A recent example in the oligopoly context is MacKay and Miller (2021).
attention to endogeneity in empirical economics, one might conclude that estimation of
demand could proceed using any number of common tools for measuring the (causal)
effects of endogenous covariates on outcomes of interest. As we explain below, both of
these conclusions would be incorrect.
Qj = Dj (X, P, U ) , (2.3)
4
A regression equation is most often specified in the separable form Y = f (X) + E (e.g., Newey and
Powell (2003)), leading to mean regression. Alternatively, a quantile regression model takes the form
Y = f (X, E), with f strictly increasing in the scalar E (e.g., Chernozhukov and Hansen (2005)).
formally in later sections of this chapter; but it is clear that results for identification of
a regression function with a scalar structural error (see, e.g., footnote 2) are not directly
applicable.
In general, econometric models with multiple structural errors in each equation are
much more challenging than regression models. However, such models are not foreign
to econometrics. For example, they arise in standard models of treatment effects (e.g.,
Angrist and Imbens (1995)) and in particular representations—typically the reduced
forms—of standard simultaneous equations models (e.g., Brown (1983) and Benkard and
Berry (2006)). These examples, in fact, hint at both the problems created by multiple
structural errors and how one might make progress.
In empirical settings with endogeneity and multiple unobservables, economists of-
ten settle for estimation of particular weighted average responses (e.g., a local average
treatment effect); but this is a compromise poorly suited to the economic questions that
motivate demand estimation, as these typically require the levels and slopes of demand
at specific points. To make progress, the IO literature on demand estimation has used
tools more familiar in the literature on simultaneous equations models: first “inverting”
the system of equations, then relying on instrumental variables. Of course, in the case
of simultaneous equations models,5 one typically expects to need instruments (or other
sources of exogenous variation) for all endogenous variables: here, all J prices and all J
quantities. Indeed, although this is sometimes not fully appreciated, the IO literature has
developed strategies exploiting sources of independent variation in prices and quantities
even when estimating demand alone. We return to each of these points below.
veal certain averages of demand responses—integrating over the vector of demand shocks
(U1 , . . . , UJ ) to reveal a type of local average treatment effect. But in general, such aver-
ages are of very limited value in the case of demand, as they do not reveal any elasticity
of demand (or other standard notion of a demand response)—not at the observed prices
and quantities or any other known point. They therefore do not allow one to quantify
demand responses to counterfactuals of interest (e.g., outcomes following a merger), to
predict pass-through of a tax, or to infer firm markups through equilibrium pricing con-
ditions. We return to this issue in section 2.5.3. In short, however, such averages offer at
most a descriptive feature of demand, not the primary quantities of economic interest.
Of course, if experimental variation in prices does not allow identification of demand,
it should be clear that “quasi-experimental” variation cannot suffice. One important
implication is that instruments for prices (or other quasi-experimental variation in prices)
generally cannot by themselves deliver identification of demand. This point is often not
fully appreciated, but is essential to motivating the strategies relied on in the leading
work on demand estimation.
One possible path to resolving this challenge is to impose functional form restric-
tions. For example, suppose that the demand function on the right-hand side of (2.3) is
restricted to take the form
when P is experimentally controlled by the researcher. Similarly, instruments for (all)
prices P could allow identification.
Of course, restricting the vector of demand shocks to affect demand for good j mono-
tonically through a scalar Ej (U ) involves a strong functional form restriction—one ruled
out even by common parametric demand specifications like the multinomial probit, multi-
nomial logit, or CES models.7 And without this index restriction an experiment (or in-
strument for the prices P ) generally will not suffice to allow identification of demand. As
we will see in later sections of this chapter, identification of demand can still be obtained
using additional sources of variation.
7
The notable exception of demand that is linear in all demand shocks (own and other) points out a
potentially unappreciated implication of a linear specification: a reduction in the dimension of exogenous
variation required for identification.
2.5 Many Common Tools Fall Short
Our discussion has already suggested that a solution to the challenge of demand esti-
mation can be obtained using instrumental variables, potentially with other sources of
independent variation in prices or quantities. Most of our chapter focuses on specific
parametric and nonparametric instrumental variables approaches to identification and
estimation, using insights from several literatures in IO and econometrics. One may
wonder whether the complexity of these strategies is necessary—specifically, whether
simpler or more familiar tools of empirical economics might offer viable alternatives. In
fact, many common empirical tools in economics—including various “research designs”
useful for measuring (causal) effects of endogenous variables in other contexts—are not
applicable to demand, at least without significant compromises or developments beyond
the current methodological frontier.
We briefly discuss some of these alternative empirical approaches below. Because
critical shortcomings are evident even in the idealized single-good supply and demand
model given by (2.1) and (2.2), we focus primarily on this case to illustrate.
Yet if the price of a given good is the same for all consumers in a given market (often
this is the definition of “market”), a fixed effect for each product×market will leave no
variation in price, making it impossible to measure demand elasticities or to connect
demand to standard notions of aggregate welfare.
In a panel data set covering products within geographic markets across time, it is
feasible to consider fixed effects for products across markets (held fixed over time) plus
time effects (held fixed across product/markets). But for this to serve as a valid approach
to the problem of endogenous prices, the resulting model must fit the data perfectly, up
to measurement error in the observed quantities. And, again, the same set of fixed effects
could not be used to estimate supply, because we require additional sources of supply-side
price variation (not just measurement error).
T =F (Z, E1 ) (2.5)
Y =G(T, E2 ), (2.6)
where (2.6) represents an outcome equation of interest and (2.5) is a reduced form for the
endogenous (“treatment”) variable T appearing on the right-hand side of (2.6).10 Here
E1 is assumed to be a scalar, with F strictly increasing in E1 . In such contexts, a control
function can be used to treat the endogeneity of T in the outcome equation, allowing
identification of G if E2 is also a scalar (or, otherwise, identification of certain average
effects).
One may be tempted to view demand as fitting this triangular structure, with price
being the endogenous variable T affecting the quantity demanded, Y . But demand
generally cannot be represented in this form. Even in the perfectly competitive single-
good economy characterized by (2.1) and (2.2), the reduced form for the equilibrium
price takes the form
P = R(X, W, U, V ). (2.7)
This does not take the form of (2.5). Critically, the right-hand side of (2.7) depends
on all structural errors—here, the scalar demand shock U and scalar cost shock V . As
demonstrated formally by Blundell and Matzkin (2014), only in very special cases (a
linear model being one example) will the errors U and V enter the reduced form through
a scalar index (not itself dependent on (X, W )), allowing for valid application of control
9
Imbens and Newey (2009) (see also references therein) discuss nonparametric control function ap-
proaches. They note the typical failure of control functions in non-triangular systems, including in many
classic simultaneous equations models.
10
Throughout we use the term “reduced form” as it is defined in econometrics: a relationship in which
an endogenous variable is expressed as a function of exogenous variables and structural errors.
function approaches. Without such functional form restrictions, however, the control
function approach will not allow identification of demand.11 The key problem is clear:
in general a single control variate cannot eliminate the confounding effects of multiple
unobservables, much less hold them fixed at particular values. Of course, this problem
only becomes more severe when one acknowledges the presence of other related goods; in
that case, each outcome (demand) equation depends on multiple endogenous prices, each
of which generally depends on the demand shocks and supply shocks associated with all
related goods.
Q = S(W, P, V ),
we teach (by graph and equation) that the change in equilibrium price
resulting from a change in the excise tax depends on the relative slopes of supply and demand,
as measured by
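\[
\frac{\partial P^{*}(X, W, U, V, \tau)}{\partial \tau}
= \frac{\dfrac{\partial S(W, P, V)}{\partial P}}
       {\dfrac{\partial S(W, P, V)}{\partial P} + \dfrac{\partial D(X, P, U)}{\partial P}}.
\tag{2.8}
\]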
This ratio is not a LATE. By definition, a LATE averages over the latent variables; this
is not the same thing as holding them fixed. Thus, a LATE approach cannot produce
the ceteris paribus counterfactual quantity of interest: the causal effect of the tax change.
But the limitations of a LATE in this example are typically much more severe than
the distinction between the effect at a point and an average effect. Indeed, in some
11
Petrin and Train (2010) and Kim and Petrin (forthcoming) describe special cases allowing use of a
control function approach.
12
For simplicity, here we assume upward sloping marginal cost and differentiability of the functions D
and S. Note that elsewhere in this chapter we use the notation s to represent a vector of market shares.
We trust our use of S for “supply” here will not cause confusion.
cases one might be interested in an average effect like a LATE for the treatment of
a tax change. This would be equal to the left-hand side of (2.8) averaged over some
distribution of (U, V ). However, this cannot be determined from LATE estimates of
demand (and supply). One could estimate separate local average derivatives of demand
and supply with the LATE approach in this simple setting. However, a ratio of averages
is not equal to the average ratio. Furthermore, because the weights for each local average
depend on the instruments (see, e.g., Angrist and Imbens (1995) and Angrist, Graddy,
and Imbens (2000)), the necessary use of different instruments to identify demand and
supply averages will also imply different (and unknown) weights for each average. Thus,
a LATE approach to demand (and supply) cannot identify even the average numerator
and average denominator under a common measure.13
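The point that a ratio of averages is not the average ratio can be checked numerically. The sketch below is only an illustration: the two arrays are hypothetical positive stand-ins for the two slope terms appearing in (2.8), drawn from assumed distributions, with one pair per realization of the latent (U, V). It is also generous to the averaged approach, because both averages are taken under the same measure, which the discussion above notes will generally not be the case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical positive draws standing in for the two slope terms in (2.8),
# one pair per realization of the latent (U, V). The distributions are
# illustrative assumptions only.
supply_term = rng.lognormal(mean=0.0, sigma=1.0, size=n)
demand_term = rng.uniform(0.5, 1.5, size=n)

# Average of the pointwise ratio (the averaged version of (2.8)).
average_of_ratios = np.mean(supply_term / (supply_term + demand_term))

# Ratio built from separately averaged slope terms.
ratio_of_averages = supply_term.mean() / (supply_term.mean() + demand_term.mean())

print(f"average of ratios: {average_of_ratios:.3f}")
print(f"ratio of averages: {ratio_of_averages:.3f}")
```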
We emphasize that this example was chosen because it is perhaps the most elementary
example of the kind of economic question that motivates demand estimation. It is not
special in terms of the limitations of LATE. Almost any equilibrium counterfactual will
involve interactions between demand and supply, leading to similar issues. Thus, it is not
merely that a LATE approach to demand estimation fails to allow measurement of the
quantity of primary economic interest; rather, it typically does not allow one to measure
any well-defined average equilibrium counterfactual quantity.
Notably, the problems discussed in the preceding paragraphs are those arising in the
simplest case—that of demand and supply for a single good with no complements or
substitutes. With multiple goods, the shortcomings of a LATE approach also multiply.
One then faces a system of equations characterizing the responses of demand to prices
(recall section 2.3), even before turning to more complex counterfactuals. Multiple prices
require multiple instruments, and the problems of LATE in handling different averages
associated with different instruments become even more important. This only adds to
the limitations of LATE for equilibrium counterfactuals. And in such cases (i.e., in the
most common empirical setting in practice), it appears that LATE demand estimates
could not produce even a local average own-price elasticity of demand.
Thus, although a LATE approach might embody the right set of compromises for
many empirical settings, it is poorly suited to the economic questions that motivate
demand estimation. Luckily, one can use different empirical tools depending on the
empirical questions of interest. Much of our focus in what follows involves empirical
approaches that allow one to model substantial unobserved heterogeneity (at the level of
individuals, goods, and markets) while still permitting identification and estimation of
the objects of economic interest in contexts involving demand.
13
Of course, if the data offer adequate exogenous variation in the tax τ , one could estimate an average
ex post effect without estimating demand—either from averages of equilibrium prices P ∗ (X, W, U, V, τ ) or
an instrumental variables LATE estimate. This serves as a reminder that demand estimation is typically
motivated by a desire to answer questions—e.g., to infer oligopoly markups or provide policy advice on
the implications of a proposed carbon tax—that require either an ex ante analysis or a counterfactual
quantity that cannot be characterized by an average response of one scalar observable to an exogenous
change in another.
2.6 Balancing Flexibility and Practicality
Although demand presents challenges that are absent in many empirical settings, all the
“usual” challenges remain. One such challenge is finding empirical specifications that are
both (a) sufficiently flexible to avoid strong a priori restrictions on the results and (b)
sufficiently parsimonious to permit practical application. In some markets the number of
closely related goods can be large—consider, for example, the set of all new automobile
models, all computer models, all mutual funds, or all residential neighborhoods in a given
city. Because the demand for a given good depends on the characteristics and prices of
related goods, a demand system with J goods has J^2 price elasticities at each point. In
many contexts, this will rule out even a linear specification of the demand equation (2.3).
Thus, even in cases where nonparametric estimation would be possible in principle,
in practice it will often be necessary to impose restrictions in order to obtain an em-
pirical model that is practical for the data available. Unsurprisingly, one can go too
far in the pursuit of parsimony. Some of the simplest demand specifications (e.g., the
CES, multinomial logit, multinomial probit) impose strong a priori restrictions on de-
mand elasticities—and, therefore, on markups, pass-through, and other key quantities of
interest—that are at odds with common sense and standard economic models.14 Below
we will discuss some of the strategies used in practice to strike a more attractive balance
between parsimony and flexibility. One common strategy is to derive demand from a
specification of consumer utility functions.
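As an illustration of the a priori restrictions imposed by the simplest specifications, the sketch below computes finite-difference price elasticities for a plain multinomial logit with an outside good. The utility index (`delta - alpha * p`) and all numerical values are assumptions chosen for illustration. In this model the elasticity of every good j with respect to the price of good k (for j different from k) equals alpha * p_k * s_k, so each column of the elasticity matrix has identical off-diagonal entries regardless of how similar the goods are.

```python
import numpy as np

def logit_shares(delta, p, alpha):
    """Multinomial logit shares of the inside goods, with an outside good
    whose utility index is normalized to zero."""
    v = np.concatenate(([0.0], delta - alpha * p))
    ev = np.exp(v - v.max())  # subtract the max for numerical stability
    return (ev / ev.sum())[1:]

def elasticity_matrix(delta, p, alpha, h=1e-6):
    """Finite-difference elasticities: entry (j, k) approximates
    d ln s_j / d ln p_k."""
    s = logit_shares(delta, p, alpha)
    elas = np.empty((len(p), len(p)))
    for k in range(len(p)):
        p_up = p.copy()
        p_up[k] += h
        ds_dpk = (logit_shares(delta, p_up, alpha) - s) / h
        elas[:, k] = ds_dpk * p[k] / s
    return elas

# Hypothetical mean utilities, prices, and price coefficient.
delta = np.array([2.0, 1.0, 0.5])
p = np.array([1.0, 1.5, 0.8])
alpha = 1.2

elas = elasticity_matrix(delta, p, alpha)
print(np.round(elas, 4))
```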
14
See, e.g., the discussion in Berry, Levinsohn, and Pakes (1995) and, for the CES, in Adao, Costinot,
and Donaldson (2017).
15
In terms of consumer theory, one may view utilities as primitives and individual/aggregate demand
as a derived object, or view individual choice rules (individual demand) as primitives, with utilities (and
optimization) as a derived representation (see, e.g., Mas-Colell, Whinston, and Green (1995)).
elasticities) with a relatively small number of parameters (see, e.g., Berry (1994)). Typ-
ically, some of this parsimony comes from imposition of economically motivated restric-
tions. For example, researchers will often prefer to require that individual demand satisfy
standard rationality conditions, which hold automatically when demand is derived from
utility maximization. And even when focusing on market-level demand, an explicit con-
nection to individual-level demand can often be usefully exploited in empirical work.
At the consumer level, researchers often wish to impose certain economic restrictions
or symmetry conditions that may be more easily formulated through utility functions.16
Examples of such restrictions include:
• an assumption that, all else equal, each consumer has a single marginal rate of
substitution between any pair of product characteristics—say, price and quality—
regardless of the name of the product.
Restrictions of these types are not without loss, but they can lead to specifications offering
an attractive balance of parsimony and flexibility.18
option (in the same demand system or, more likely, another) could be to purchase four
boxes of cookies and a gallon of milk.19 For simplicity, however, we will refer to each of
the options as a good or product. Typically each consumer’s choice set should include
an option of the form “none of the above”—what we will call the “outside good.” This is
important. In a discrete choice model without an outside good, the market demand elas-
ticity would always be zero; for example, when estimating demand for health insurance,
a model with no outside good would imply that doubling all premiums would have no
effect on the number of households with insurance. Note also that the choice probabilities
implied by a discrete choice model can often also be interpreted as a demand system gen-
erated by continuous choices, as from a representative consumer with a taste for variety
(see for example the review provided by Anderson, DePalma, and Thisse (1992)).
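The role of the outside good in the health insurance example can be sketched with a simple multinomial logit. The functional form, the plan qualities, the premiums, and the price coefficient below are all hypothetical. Without an outside good the inside shares sum to one by construction, so doubling every premium cannot change the insured share; with an outside good normalized to zero utility, the insured share falls.

```python
import numpy as np

def inside_share(delta, premiums, alpha, outside_good):
    """Total logit share of the inside goods for utilities
    delta_j - alpha * premium_j."""
    v = delta - alpha * premiums
    if outside_good:
        # "None of the above", with utility normalized to zero.
        v = np.concatenate(([0.0], v))
    ev = np.exp(v - v.max())  # subtract the max for numerical stability
    shares = ev / ev.sum()
    return shares[1:].sum() if outside_good else shares.sum()

delta = np.array([3.0, 2.5, 2.0])     # hypothetical plan qualities
premiums = np.array([1.0, 0.8, 0.6])  # hypothetical premiums
alpha = 1.5                           # hypothetical price coefficient

for outside_good in (False, True):
    before = inside_share(delta, premiums, alpha, outside_good)
    after = inside_share(delta, 2 * premiums, alpha, outside_good)
    print(f"outside good={outside_good}: insured share {before:.3f} -> {after:.3f}")
```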
19
See for example Gentzkow (2007), who notes that this approach can generate complementarities
across the “primitive” products. There are practical limits to this flexibility, and one may want to
impose cross-option restrictions when the distinct options in the choice set involve partially overlapping
bundles.
20
As a characterization of demand, the modeled randomness in utilities may be interpreted as reflecting
heterogeneity in consumers’ decision making, allowing for example mis-perception, inconsistencies, or
non-optimizing behavior. See, e.g., the discussion and references in chapter 2 of Anderson, DePalma, and
Thisse (1992). The choice of interpretation becomes important if one wishes to make welfare statements
or counterfactual predictions associated with interventions that might alter consumers’ decision rules.
21
Normalizations of the location and scale of each consumer’s utilities are without loss of generality
with respect to behavior. However, different normalizations can have different implications for the
interpretation of additional assumptions, including those used to justify certain welfare statements. See
Bhattacharya (2018) for important results on standard aggregate welfare measures in random utility
discrete choice models.
“ties” (uij = uik for j ≠ k) occur with probability zero.
We may then represent consumer i’s choice with the vector (qi1 , . . . , qiJi ), where
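\[
s_{ij} = E[q_{ij} \mid J_i, \chi_i]
       = \int_{A_{ij}} dF_u(u_{i0}, u_{i1}, \ldots, u_{iJ_i} \mid J_i, \chi_i),
\]
where
\[
A_{ij} = \left\{ (u_{i0}, u_{i1}, \ldots, u_{iJ_i}) \in \mathbb{R}^{J_i + 1} : u_{ij} \ge u_{ik} \ \forall k \right\}.
\]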
To illustrate, consider an example with Ji = 2. Let pj denote the price of good j and
let
uij = µij − pj
for j > 0, where (µi1 , µi2 ) are drawn from a joint distribution Fµ (·). Set ui0 = 0,
normalizing the location of utilities. Figure 1 then illustrates the regions in (µi1 , µi2 )-
space leading consumer i to choose goods 0, 1, and 2. For example, only consumers for
whom µi2 − p2 > 0 prefer good 2 to the outside option. The dark grey region is the set
of (µi1 , µi2 ) combinations such that this holds and µi2 − p2 > µi1 − p1 , i.e., the set Ai2 .
Similarly, the light grey region corresponds to Ai1 . The choice probabilities for consumer
i then correspond to the probability measure assigned to each region by Fµ (·).
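[Figure 1: regions Ai0, Ai1 (light grey), and Ai2 (dark grey) in (µi1, µi2)-space, with origin (0, 0), p1 marked on the µi1 axis, p2 marked on the µi2 axis, and a 45° line.]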
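The two-good example can also be explored by simulation. The sketch below assumes a particular Fµ (independent normals) and particular prices, both hypothetical, and approximates the choice probabilities by the fraction of simulated consumers whose draw of (µi1, µi2) falls in each region.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

p = np.array([1.0, 1.5])  # hypothetical prices (p1, p2)

# Hypothetical F_mu: independent normal draws of (mu_i1, mu_i2).
mu = rng.normal(loc=1.0, scale=1.0, size=(n, 2))

# Utilities: u_i0 = 0 (location normalization) and u_ij = mu_ij - p_j.
u = np.column_stack([np.zeros(n), mu - p])

# Each consumer chooses the option with the highest utility; with continuous
# draws, ties occur with probability zero.
choice = u.argmax(axis=1)
shares = np.bincount(choice, minlength=3) / n

# Region A_i2 written out directly, as described in the text.
in_A2 = (mu[:, 1] - p[1] > 0) & (mu[:, 1] - p[1] > mu[:, 0] - p[0])

print("simulated choice probabilities (goods 0, 1, 2):", np.round(shares, 4))
print("probability of region A_i2:", round(in_A2.mean(), 4))
```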
3.2 The Canonical Model
Discrete choice demand models are frequently formulated using a parametric random
utility specification such as22
more demanding instrumental variables requirements.24
The additive εijt in (3.1) is most often specified as an i.i.d. draw from a standard type-1
extreme value distribution, yielding a mixed multinomial logit model. Alternatively, a
normal distribution will yield a mixed multinomial probit.25 The term “mixed” reflects
the heterogeneity across consumers in the parameters αit and βit that characterize their
marginal rates of substitution between the various observed and unobserved characteris-
tics. Choice probabilities in the population reflect a mixture of the choice probabilities
conditional on each possible combination of (αit , βit ). For example, in the case of mixed
logit we have choice probabilities in the population (i.e., market shares) given by
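\[
s_{jt} = \int \frac{e^{x_{jt}\beta_{it} - \alpha_{it} p_{jt} + \xi_{jt}}}
                   {\sum_{k=0}^{J_t} e^{x_{kt}\beta_{it} - \alpha_{it} p_{kt} + \xi_{kt}}}
         \, dF(\alpha_{it}, \beta_{it}; t),
\tag{3.3}
\]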
where F (·; t) denotes the joint distribution of (αit , βit ) in market t. The latent taste
parameters (αit , βit ) are often referred to as “random coefficients.”26
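In practice the integral in (3.3) has no closed form and is commonly approximated by averaging over simulated draws of the random coefficients. The following is a minimal sketch of that idea, not the estimator discussed later in the chapter. The function name `mixed_logit_shares`, the distributions of the draws, and all numerical values are hypothetical, and the outside good (k = 0) is assumed to have a utility index of zero.

```python
import numpy as np

def mixed_logit_shares(x, p, xi, alpha_draws, beta_draws):
    """Simulated version of (3.3): average logit choice probabilities over
    R draws of (alpha_it, beta_it).

    x: (J, K) characteristics, p: (J,) prices, xi: (J,) demand shocks,
    alpha_draws: (R,), beta_draws: (R, K).
    Assumption: the outside good (k = 0) has utility index zero.
    Returns shares of length J + 1, with the outside good first.
    """
    v = beta_draws @ x.T - alpha_draws[:, None] * p[None, :] + xi[None, :]
    v = np.column_stack([np.zeros(v.shape[0]), v])
    v = v - v.max(axis=1, keepdims=True)  # numerical stability
    ev = np.exp(v)
    probs = ev / ev.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)

rng = np.random.default_rng(3)
J, K, R = 3, 2, 5_000

x = rng.normal(size=(J, K))          # hypothetical product characteristics
p = np.array([1.0, 1.5, 2.0])        # hypothetical prices
xi = np.array([0.2, -0.1, 0.0])      # hypothetical demand shocks

# Hypothetical F(.; t): log-normal price sensitivities, normal tastes.
alpha_draws = np.exp(rng.normal(loc=0.0, scale=0.5, size=R))
beta_draws = np.array([1.0, 0.5]) + 0.8 * rng.normal(size=(R, K))

shares = mixed_logit_shares(x, p, xi, alpha_draws, beta_draws)
print(np.round(shares, 4), shares.sum())
```

When every draw is identical, the average collapses to the plain multinomial logit formula, which offers a simple check on the implementation.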
To specify the joint distribution F (·; t), each component k of the random coefficient
vector βit is commonly specified as taking the form
\[
\beta_{it}^{(k)} = \beta_0^{(k)} + \beta_\nu^{(k)} \nu_{it}^{(k)} + \sum_{\ell=1}^{L} \beta_d^{(\ell,k)} d_{i\ell t}.
\tag{3.4}
\]
Here $\beta_0^{(k)}$ is a parameter shifting all consumers’ tastes for $x_{jt}^{(k)}$. Each $d_{i\ell t}$ represents a
characteristic (e.g., demographic measure) of individual i, and each $\nu_{it}^{(k)}$ is a random
variable with a pre-specified distribution (e.g., a standard normal). The parameters
$\beta_d^{(\ell,k)}$ and $\beta_\nu^{(k)}$ govern the extent of variation in tastes for $x_{jt}^{(k)}$ across consumers with
different demographic characteristics $d_{it}$ or different taste shocks $\nu_{it}^{(k)}$. The distinction
between $d_{i\ell t}$ and $\nu_{it}^{(k)}$ reflects the fact that each $d_{i\ell t}$ (or at least its distribution in the
population) is assumed to be known. For example, in the case of demand for cars, one
might specify that family size affects preference for large cars, in which case the actual
distribution of family size in each market would allow the model to capture this source
of latent preference heterogeneity in the population of consumers.27 On the other hand,
although we may also expect preference for fuel efficiency to vary in the population, there
24
For example, given sufficient instruments, one may simply let pt include all endogenous observables,
as in Fan (2013). See also the discussion in Berry and Haile (2020).
25
Independence of εijt across goods j is not essential, and is often relaxed in the case of mixed probit.
However, given the presence of the market-level demand shocks ξt and the random coefficients (αit , βit ),
for the usual case in which prices are set at the market level (no individual-specific prices) it is standard
to assume that each εijt is independent of xt , pt , ξt .
26
Early examples using random coefficients to generate random utility discrete choice models can be
found in Quandt (1966, 1968). See also Quandt (1956).
27
See, e.g., Goldberg (1995) and Petrin (2002).
may be no demographic measure whose distribution captures this heterogeneity. Such
latent heterogeneity in preference for a product characteristic $x_{jt}^{(k)}$ can be captured by
the random taste shocks $\nu_{it}^{(k)}$.
The treatment of the coefficient on price, αit , is similar. A typical specification of αit
takes the form
$$\ln(\alpha_{it}) = \alpha_0 + \alpha_y y_{it} + \alpha_\nu \nu_{it}^{(0)},$$
where yit represents consumer-specific measures such as income that are posited to affect
price sensitivity.28 The variables included in yit might overlap partially or entirely with
dit .
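To make (3.4) and the log specification for αit concrete, the sketch below draws one consumer's coefficients. It is illustrative only: the function and argument names are hypothetical, the taste shocks are assumed to be independent standard normals, and `y_i` is treated as a single scalar (e.g., log income).

```python
import math
import random


def draw_coefficients(beta0, beta_nu, beta_d, d_i, alpha0, alpha_y, alpha_nu, y_i, rng):
    """Draw one consumer's (alpha_i, beta_i) following (3.4) and the ln(alpha) form.

    beta0[k], beta_nu[k]: mean taste and taste-shock loading for characteristic k.
    beta_d[l][k]: effect of demographic l on the taste for characteristic k.
    d_i[l]: the consumer's demographics; y_i: scalar shifter of price sensitivity.
    """
    beta_i = []
    for k in range(len(beta0)):
        nu_k = rng.gauss(0.0, 1.0)  # assumed standard normal taste shock
        demographic_shift = sum(beta_d[l][k] * d_i[l] for l in range(len(d_i)))
        beta_i.append(beta0[k] + beta_nu[k] * nu_k + demographic_shift)
    nu_0 = rng.gauss(0.0, 1.0)
    # Exponentiating the log specification keeps alpha_i strictly positive.
    alpha_i = math.exp(alpha0 + alpha_y * y_i + alpha_nu * nu_0)
    return alpha_i, beta_i


rng = random.Random(1)
# Hypothetical example: one demographic (family size) shifting taste for car size.
print(draw_coefficients([1.0, -0.5], [0.4, 0.2], [[0.2, 0.0]], [4.0], 0.1, -0.3, 0.5, 1.0, rng))
```

Modeling ln(αit) rather than αit itself guarantees that every simulated consumer dislikes higher prices, whatever the draw of the taste shock.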
predictions. This is a bug, not a feature. These restrictions do not come from economics
but from assumptions chosen for simplicity or analytical convenience. Models must, of
course, abstract from reality, and finite samples require appropriate parsimony. But good
modeling and approximation methods should aim to avoid strong a priori restrictions
on the very quantities of interest unless those restrictions can be defended as natural
economic assumptions.
Random coefficients are not the only way to avoid these restrictions. For example, the
random terms (εi1t , . . . , εiJt t ) need not be specified as mutually independent. In the case
with just a few products whose identity is constant across markets, a good alternative
to random coefficients might specify an unrestricted covariance matrix for the εit vector.
But in cases with more than a few products per market, or with products whose charac-
teristics change across markets, random coefficients are attractive because they balance
flexibility in key dimensions with tractability. Random coefficient specifications can be
formulated using economics, building on the observations that real goods differ in multi-
ple dimensions, and real consumers have heterogeneous preferences over these differences.
Taking the case of automobiles, random coefficients on indicators for pickup trucks and
for minivans enable the model to predict that different models of pickup trucks will be
close substitutes, precisely because a consumer who likes one pickup truck will tend to be
one with strong idiosyncratic taste for all pickup trucks. And even if the leading minivan
and leading pickup truck have very similar market shares, the model can predict very dif-
ferent cross elasticities with respect to a third vehicle—say, an SUV targeted at families.
Thus, as a matter of theory, random coefficients can introduce consumer heterogeneity
along key dimensions of product differentiation. And a substantial empirical literature
has demonstrated that in practice random coefficients can play a critical role in giving
the demand specification sufficient flexibility to produce natural consumer substitution
patterns.
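The pickup-truck example can be checked numerically. The sketch below compares diversion ratios from one pickup to another pickup and to a minivan, with and without a random coefficient on a pickup indicator. It is a hedged illustration: it assumes a price coefficient common to all consumers (so that it cancels from the ratio), a normal taste shock drawn in antithetic pairs, and hypothetical names throughout.

```python
import math
import random


def diversion(delta, is_truck, sigma, j, n_draws=2000, seed=0):
    """Diversion ratios from good j: D_jk = -(ds_k/dp_j) / (ds_j/dp_j).

    With a common price coefficient this equals
    E[s_ij * s_ik] / E[s_ij * (1 - s_ij)], averaging over consumers i.
    sigma scales a random taste for the truck indicator.
    """
    rng = random.Random(seed)
    n_goods = len(delta)
    own = 0.0
    cross = [0.0] * n_goods
    for _ in range(n_draws):
        nu = rng.gauss(0.0, 1.0)
        for taste in (nu, -nu):  # antithetic pair of taste shocks
            v = [delta[k] + sigma * taste * is_truck[k] for k in range(n_goods)]
            m = max(0.0, max(v))
            e = [math.exp(vk - m) for vk in v]
            denom = math.exp(-m) + sum(e)
            s = [ek / denom for ek in e]
            own += s[j] * (1.0 - s[j])
            for k in range(n_goods):
                cross[k] += s[j] * s[k]
    return [cross[k] / own if k != j else 0.0 for k in range(n_goods)]


delta = [0.0, 0.0, 0.0]   # pickup A, pickup B, minivan: identical mean utilities
is_truck = [1, 1, 0]
print(diversion(delta, is_truck, sigma=0.0, j=0))  # plain logit
print(diversion(delta, is_truck, sigma=3.0, j=0))  # random taste for pickups
```

With `sigma=0.0` the three products have equal shares and pickup A loses customers equally to pickup B and to the minivan. With a large `sigma`, diversion to the other pickup dominates even though mean utilities are unchanged.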
As a practical matter, important questions include the measures included in xt and
the extent of heterogeneity modeled through random coefficients. In some cases, practical
considerations may dictate selecting a set of observable characteristics viewed as most
important or most subject to heterogeneity in preferences.30 Depending on the data
set, a specification with a very large number of random coefficients may yield imprecise
estimates (particularly, of the parameters associated with the distribution of random
coefficients),31 or even numerical problems in estimation. A researcher then often faces
the practical question of what product characteristics are modeled as having random
coefficients. Should one choose just an index of “quality”? Just price? Dummy variables
30
Gillen, Moon, and Shum (2014) and Gillen, Moon, Montero, and Shum (2019) propose a data-driven
approach to selecting from a large set of observed characteristics assumed to affect only mean utilities.
31
Of course, we often care more about the estimation error in our eventual counterfactual analysis
than the statistical significance of the estimated random coefficient parameters per se. Furthermore,
an imprecise estimate of the variance of a random coefficient, as may arise when instruments produce
insufficient exogenous variation, should not be confused for evidence in favor of a degenerate coefficient.
indicating subsets (e.g., nests) of products?32 Multiple observed characteristics (parts
of xt , pt )? In practice, the choice must reflect the application and the available data.
Economic considerations often suggest dimensions along which preference heterogeneity
is likely to be most important for determining the consumer substitution patterns that
drive own- and cross-price elasticities. But practical considerations such as sample size
and available sources of exogenous variation may play a role as well. In many cases it
may not be desirable to specify random coefficients on all components of (xt , pt ). We will
return to this issue below when discussing instrumental variables and identification.
4 Market-Level Data
In many applications the key data are observed at the market level. In such cases, one
typically observes
• the number of goods Jt available to consumers in each market t;
• their prices and other observable characteristics pt , xt ;
• their observed market shares, s̃jt , typically measured as the total quantity of good
j sold in market t divided by the number of consumers (e.g., households) in that
market;
• the distribution of consumer characteristics (dit , yit ) in each market; and
• possibly, additional variables wt (e.g., cost shifters) that might serve as appropriate
instruments.
The standard approach to estimation of discrete choice demand from market-level
data was developed in Berry, Levinsohn, and Pakes (1995), with many subsequent vari-
ations and extensions. Here we consider a slightly simplified version of their model with
a non-random coefficient on price.33 Thus, the random utility specification becomes
$$u_{ijt} = x_{jt}\beta_{it} - \alpha_0 p_{jt} + \xi_{jt} + \epsilon_{ijt} \tag{4.1}$$
for j > 0, with ui0t = εi0t . We follow Berry, Levinsohn, and Pakes (1995) in assuming that
each εijt is an i.i.d. draw from a standard type-1 extreme value (Gumbel) distribution,34
32
This case covers the nested logit as a special case. See Ben-Akiva (1973), McFadden (1978) and, for
the market-level IO context, Berry (1994).
33
In practice, it is often important to allow for heterogeneity in price sensitivity. The variation of
this model presented in section 6 illustrates a type of specification commonly used in practice, even in
the case of market-level data. We present the more restrictive quasi-linear specification here to simplify
exposition and make clearer the sources of key identification requirements.
34
Setting the scale parameter of the Gumbel distribution to one normalizes the scale of utilities.
Setting the location parameter to zero is also without loss due to the fact that adding the same constant
to all utilities yields an equivalent representation of preferences.
and that each $\nu_{it}^{(k)}$ in (3.4) is an i.i.d. draw from a standard normal distribution.
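Observe that (4.1) can be rewritten as
$$u_{ijt} = \delta_{jt} + \mu_{ijt} + \epsilon_{ijt}, \tag{4.2}$$
where
$$\delta_{jt} = x_{jt}\beta_0 - \alpha_0 p_{jt} + \xi_{jt} \tag{4.3}$$
and
$$\mu_{ijt} = \sum_{k} x_{jt}^{(k)} \Big( \beta_\nu^{(k)} \nu_{it}^{(k)} + \sum_{\ell=1}^{L} \beta_d^{(\ell,k)} d_{i\ell t} \Big). \tag{4.4}$$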
Let Fµ (· | xt , βd , βν ) denote the joint distribution of the stochastic terms (µi1t , . . . , µiJt t )
given (xt , βd , βν ). Given our assumptions above, this distribution is known up to the
parameters (βd , βν ).
Letting
δt = (δ1t , . . . , δJt t ),
the market shares implied by the model take the form
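$$\sigma_j(\delta_t, x_t, \beta_d, \beta_\nu, J_t) = \int \frac{e^{\delta_{jt} + \mu_{ijt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt} + \mu_{ikt}}}\, dF_\mu(\mu_{it} \mid x_t, \beta_d, \beta_\nu, J_t) \tag{4.5}$$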
for each good j. An important fact, demonstrated in Berry (1994), is that the demand
system
σ(δt , xt , βd , βν , Jt ) = (σ1 (δt , xt , βd , βν , Jt ), . . . , σJt (δt , xt , βd , βν , Jt ))
is invertible: given xt , βd , βν and any vector of nonzero market shares s = (s1 , . . . , sJt ) in
market t such that $1 - \sum_{j>0} s_{jt} > 0$, there is a unique vector δ for market t such that
σ(δ, xt , βd , βν , Jt ) = s.
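In practice the inverse is computed numerically. A common choice, following Berry, Levinsohn, and Pakes (1995), is the fixed-point iteration δ ← δ + ln(s) − ln(σ(δ)). The sketch below illustrates that iteration under stated assumptions: the integral in (4.5) is replaced by an average over a fixed set of simulated taste deviations `mu[i][j]`, and all function names are hypothetical.

```python
import math
import random


def predicted_shares(delta, mu):
    """Simulated analog of (4.5): average logit probabilities over draws mu[i][j].

    The outside good's index is normalized to zero.
    """
    n_goods = len(delta)
    out = [0.0] * n_goods
    for mu_i in mu:
        v = [delta[j] + mu_i[j] for j in range(n_goods)]
        m = max(0.0, max(v))
        e = [math.exp(vj - m) for vj in v]
        denom = math.exp(-m) + sum(e)
        for j in range(n_goods):
            out[j] += e[j] / denom / len(mu)
    return out


def invert_shares(s, mu, tol=1e-12, max_iter=10000):
    """Solve sigma(delta) = s by iterating delta <- delta + ln(s) - ln(sigma(delta))."""
    s0 = 1.0 - sum(s)
    delta = [math.log(sj) - math.log(s0) for sj in s]  # plain-logit starting value
    for _ in range(max_iter):
        sigma = predicted_shares(delta, mu)
        step = [math.log(s[j]) - math.log(sigma[j]) for j in range(len(s))]
        delta = [delta[j] + step[j] for j in range(len(s))]
        if max(abs(t) for t in step) < tol:
            break
    return delta


rng = random.Random(0)
x = [1.0, 2.0, 0.5]
mu = [[0.8 * xj * nu for xj in x] for nu in (rng.gauss(0.0, 1.0) for _ in range(200))]
true_delta = [1.0, -0.5, 0.2]
shares = predicted_shares(true_delta, mu)
print(invert_shares(shares, mu))  # should be close to true_delta
```

Generating shares from a known δ and then recovering it, as in the usage lines, is a simple way to validate an inversion routine before using it inside an estimator.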
Let θ ≡ (α0 , β0 , βd , βν )
represent all the parameters of the model. It will be useful to partition these as
θ1 = (α0 , β0 )
θ2 = (βd , βν ).
In the literature, the elements of θ1 are often referred to as the “linear parameters”
and those of θ2 as the “nonlinear parameters.”35 Note that we can then rewrite the
model’s prediction of market shares (4.5) as
sjt = σj (δt , xt , θ2 , Jt ).
E[ξjt (θ)zjt ] = 0,
More formally, let T denote the number of markets in the sample and let $N = \sum_{t=1}^{T} J_t$.
The BLP estimator θ̂ is defined as the solution to a mathematical program:
35
Both sets of parameters alter demand nonlinearly, but the mean utilities δjt are linear in θ1 —a fact
that can be exploited in computation of the estimator.
23
where Ω denotes the standard GMM weight matrix and fβ̃ (·|θ2 ) denotes the joint density
of β̃it ≡ βit − β0 , i.e., the consumer-specific components of the coefficients βit . Computa-
tion and inference are discussed in section 4.4.
4.2 Instruments
Broadly speaking, estimation of demand requires observables that provide exogenous
sources of independent variation in prices and quantities. In the case of market-level
data, such variation must come from instrumental variables that are excluded from the
relevant demand equations in an appropriate sense. The need to instrument for both
prices and quantities may be counterintuitive: to estimate demand, we might think
instruments for prices would suffice. As suggested in section 2, however, this is not the
case.
The need for excluded instrumental variables beyond those for prices will be explained
more formally in section 5. But this is easily seen in the BLP model by considering the
hypothetical case of exogenous prices. Even in this case it is clear that the model cannot
be estimated using only moments interacting the demand shocks ξjt with xjt and pjt :
in a parametric model, identification requires at least as many moment conditions as
parameters, and the parameters of the model include not only the coefficients θ1 =
(α0 , β0 ) on xjt and pjt in (4.3), but also the parameters θ2 governing the variation in the
random coefficients. Thus, additional moment conditions would be required.
Below we discuss several types of variables that can provide the necessary sources
of exogenous variation in prices and quantities. That is, what types of observables can
satisfy the requisite relevance and exclusion (conditional moment) conditions? An addi-
tional question is what functions of these observables most usefully play the roles of zjt in
the unconditional moment conditions whose sample analogs are given by (4.7). Thus, in
this section we also discuss the approximation of “optimal instruments.” One important
lesson from the literature is that the use of (approximately) optimal instruments can
greatly improve estimation precision.
associated with one market will be shifted by contemporaneous demand shifters in other
markets.
Noisy measures of a producer’s actual cost shifters can also serve as instruments. For
example, the average wage level in a producer’s labor market may not perfectly track the
producer’s labor costs but is nonetheless likely to be highly correlated with those costs.
Thus, such wage measures can serve as instruments as long as they are uncorrelated with
demand shocks conditional on the exogenous variables and consumer-specific measures
(e.g., income and education) included in the demand model.
A less obvious type of proxy that can sometimes serve as an instrument for pjt is
the contemporaneous price of the same good in another geographic market (see, e.g.,
Hausman, Leonard, and Zona (1994), Hausman (1996), and Nevo (2001)). This is often
referred to as a “Hausman instrument.” The logic of this instrument is that even if
we do not observe producer-specific cost shifters, variation in costs is likely present and
at least partially responsible for variation in the prices a producer sets in all markets
it serves. Thus, an observed price increase in market t′ can signal a change in the
producer’s costs that also shifts its equilibrium price in market t. Although the logic
of a Hausman instrument builds on that for a cost shifter, an important difference is
that, outside a perfectly competitive model, prices reflect not just firm costs but also
demand elasticities—something that depends on demand shocks. The excludability of
Hausman instruments, therefore, requires close scrutiny. Taking the example above, the
key assumption is that the price in market t′ is (mean) independent of ξjt conditional on
the exogenous xjt . This would fail if the demand shocks ξjt and ξjt′ are correlated, for
example through seasonal variation in demand that is not captured by the observable
product characteristics. More generally, to use any proxy for an exogenous change in
firm costs as an instrument, the proxy error must also be exogenous.
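As an illustration of how a Hausman instrument is typically constructed, the sketch below computes, for each product-market observation, the average price of the same product in all other markets. This leave-one-out average is one common variant and is an assumption here, not something the text prescribes. The record layout and function name are hypothetical, and each product is assumed to appear at most once per market.

```python
from collections import defaultdict


def hausman_instruments(rows):
    """Leave-one-out mean price of the same product across the other markets.

    rows: list of dicts with keys 'product', 'market', 'price'.
    Returns one value per row, or None when the product is sold in only one
    market (no other-market price exists).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for r in rows:
        total[r["product"]] += r["price"]
        count[r["product"]] += 1
    out = []
    for r in rows:
        n_other = count[r["product"]] - 1
        if n_other > 0:
            out.append((total[r["product"]] - r["price"]) / n_other)
        else:
            out.append(None)
    return out


rows = [
    {"product": "A", "market": 1, "price": 1.0},
    {"product": "A", "market": 2, "price": 2.0},
    {"product": "A", "market": 3, "price": 3.0},
    {"product": "B", "market": 1, "price": 5.0},
]
print(hausman_instruments(rows))
```

The construction is mechanical. Whether the resulting variable is excludable depends entirely on the absence of cross-market correlation in demand shocks discussed above.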
of both quantities and markups, particularly when used in good approximations to their
optimal (efficiency maximizing) form (see section 4.2.5). We provide additional discussion
of these instruments in section 5.4. Obviously, the validity of BLP instruments depends
on their exogeneity (mean independence from the demand shocks). In some cases, it may
be more natural to assume that at least some components of xjt are chosen by firm j with
knowledge of the demand shocks (or other shocks correlated with the demand shocks).
In that case, it is clear that these product characteristics cannot be used as instruments,
and an appropriate alternative strategy will be necessary.
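For concreteness, BLP instruments are often built as sums of a characteristic over the other products in the same market, split into products of the same firm and products of rival firms. The sketch below shows that standard construction for a single scalar characteristic. The record layout and function name are hypothetical, and the choice of simple sums (rather than other functions of rival characteristics) is an assumption.

```python
from collections import defaultdict


def blp_instruments(products):
    """Sums of characteristic x over other same-firm products and over rival products.

    products: list of dicts with keys 'market', 'firm', 'x'.
    Returns a list of (own_firm_others_sum, rivals_sum), one pair per product.
    """
    market_total = defaultdict(float)
    firm_total = defaultdict(float)
    for r in products:
        market_total[r["market"]] += r["x"]
        firm_total[(r["market"], r["firm"])] += r["x"]
    out = []
    for r in products:
        own_firm = firm_total[(r["market"], r["firm"])]
        own_firm_others = own_firm - r["x"]           # same firm, excluding product itself
        rivals = market_total[r["market"]] - own_firm  # all products of other firms
        out.append((own_firm_others, rivals))
    return out


products = [
    {"market": 1, "firm": "F", "x": 1.0},
    {"market": 1, "firm": "F", "x": 2.0},
    {"market": 1, "firm": "G", "x": 4.0},
]
print(blp_instruments(products))
```

These sums vary across products and markets whenever the set of competing products varies, which is the source of exogenous variation the instruments are meant to capture.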
the market(s) served by firm j.38
38
With chains of partially overlapping service areas, demographics in the service areas of competitors
to competitors could also serve as instruments. In practice, the power of such instruments would need
to be checked.
39
The extent to which variation in common partial ownership affects equilibrium pricing in practice is
an area of active current research, relevant on its own but also to the potential use of common ownership
measures as instruments.
40
See, for example, Miller and Weinberg (2017).
41
This question is also separate from the optimal linear weighting of a given set of moment conditions,
which is already incorporated in the GMM objective function through the weighting matrix. When
estimating demand alone, use of optimal instruments yields a just-identified model, making the GMM
weight matrix irrelevant. This is not the case when estimating demand and supply jointly (see Conlon
and Gortmaker (2020)).
considered the problem of optimal instruments under an assumed conditional moment
restriction of the form E[ξjt (θ0 )|zjt ] = 0. He showed, under certain assumptions, that the
optimal instruments $z_{jt}^*$ to use in an unconditional moment of the form $E[\xi_{jt}(\theta_0) z_{jt}^*] = 0$
are
$$z_{jt}^* = \Psi_{jt}^{-1}\, E\left[ \frac{\partial \xi_{jt}(\theta_0)}{\partial \theta} \,\Big|\, z_{jt} \right], \tag{4.11}$$
where $\Psi_{jt} = E[\xi_{jt}(\theta_0)^2 \mid z_{jt}]$.
As in many nonlinear models, the optimal instruments are infeasible to compute, since
they depend on both the unknown value θ0 and the unknown distribution of the demand
shocks that is implicit in the expectation operators. Several approaches to approximating
the optimal instruments have been developed. While Berry, Levinsohn, and Pakes (1995)
proposed an initial approach using low-order terms of a polynomial basis, improved
options (better basis functions or direct approximations of the expectations in (4.11)) have
been proposed by Berry, Levinsohn, and Pakes (1999), Reynaert and Verboven (2014),
Gandhi and Houde (2020), and Conlon and Gortmaker (2020).
The consistent message from this literature is that use of (approximately) optimal
instruments can substantially improve the precision of estimates. These alternative ap-
proximation approaches are discussed in detail by Conlon and Gortmaker (2020). Their
associated PyBLP software incorporates many of these as options, along with the appro-
priate extensions for the case of joint estimation of demand and supply.
may be found in Andrews and Guggenberger (2017), Andrews (2018), and Andrews and
Mikusheva (2020). The separate topic of the local sensitivity of parameter estimates to
IV assumptions is considered in Andrews, Gentzkow, and Shapiro (2017).
where wjt and ωjt represent, respectively, observed and unobserved cost shifters associated
with good j. Here we have introduced the new parameters γ = (γ0 , γ1 ) that govern the
effects of cost shifters and quantity on marginal cost. Given any value of γ and the
demand parameters θ, the inverted first-order conditions together with equation (4.12)
42
Berry and Haile (2014) demonstrate nonparametric identification of marginal costs and cost functions
for a large class of supply models. The approach here generalizes the use of imperfectly competitive
first order conditions going back to Rosse (1970) and is inspired by the (somewhat different) use of
multiproduct first-order conditions in Bresnahan (1987).
imply a unique value of ωjt , so that we can write ωjt (σ, α, β, γ). Given any observables
z̃jt that are assumed to be mean-independent of ωjt , we now have additional moment
conditions of the form
E [ωjt (σ, α, β, γ) z̃jt ] = 0. (4.13)
A typical assumption is that both wt and xt (the two may overlap) are mean in-
dependent of each ωjt , yielding a large number of instruments that can be included in
the supply-side instruments z̃jt . Furthermore, (4.12) naturally models marginal cost as
depending only on own-cost shifters and own-quantity. This implies that additional pos-
sible excluded instruments include the exogenous cost shifters and demand shifters for
rival products. Thus, adding the supply side will often introduce few new parameters
relative to the number of new moment conditions. Importantly, these supply moments
depend not only on the cost parameters γ but also on the demand parameters. Thus,
except in the case of just-identification, the supply moments (4.13) will provide infor-
mation about demand parameters as well. In practice, this will often manifest through
substantial improvements in the precision of the demand estimates—the parameters βν
governing the heterogeneity in random coefficients, in particular—when one incorporates
supply moments (see, e.g., Berry, Levinsohn, and Pakes (1995) and Conlon and Gort-
maker (2020)).
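The mapping from data and parameters to the cost shock ωjt can be illustrated in the simplest possible case. The sketch below assumes single-product firms, Bertrand pricing, and plain multinomial logit demand, for which the first-order condition gives the markup 1/(α(1 − sjt)). It also assumes that marginal cost is linear in a scalar cost shifter and in quantity, which is one possible reading of the role of γ = (γ0, γ1). None of these simplifications comes from the text, and the function name is hypothetical.

```python
def implied_cost_shock(price, share, alpha, w, q, gamma0, gamma1):
    """Back out marginal cost and the cost shock from a pricing first-order condition.

    Assumptions of this sketch (not the general model):
      * single-product firms setting prices under plain logit demand, so the
        first-order condition implies price - mc = 1 / (alpha * (1 - share));
      * marginal cost is linear: mc = gamma0 * w + gamma1 * q + omega.
    """
    markup = 1.0 / (alpha * (1.0 - share))
    marginal_cost = price - markup
    omega = marginal_cost - gamma0 * w - gamma1 * q
    return marginal_cost, omega


mc, omega = implied_cost_shock(price=3.0, share=0.5, alpha=2.0, w=1.0, q=0.5,
                               gamma0=1.2, gamma1=0.4)
print(mc, omega)
```

Because the implied markup depends on the demand parameter `alpha`, any moment built from `omega` carries information about demand as well as cost, which is the point made in the paragraph above.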
When a model of supply leads to such overidentifying restrictions, it may be possible
to use the satisfaction or failure of these restrictions to discriminate between alternative
models of supply. This can be important in multiple ways. Hypotheses about firm “con-
duct” are often of direct interest. And to the extent that one is relying on the supply
model for precise demand estimates, it would be valuable to have a way of evaluating
the hypothesized model of firm behavior. This idea of discriminating between alternative
models of firm conduct has its roots in the pioneering work of Bresnahan (1981, 1987).
Although it is not possible to represent firm conduct as simply a parameter in a conjec-
tural variations model, Berry and Haile (2014) have generalized key insights from that
early literature to show that positing a model of firm conduct indeed provides falsifiable
restrictions that can discriminate between alternative models of conduct, even without
parametric specifications of demand or cost functions. The essence of their results is that
there are many observable (or estimable) sources of variation in market conditions that
alter equilibrium markups—differentially across different models of firm conduct. The
comparative statics predictions of a given model of firm conduct typically will not align
with the price variation observed in the data unless the hypothesized model is correct.
We refer readers to Berry and Haile (2014) for a more formal discussion. Statistical
testing procedures have recently been developed and applied by Backus, Conlon, and
Sinkinson (2020) and Duarte, Magnolfi, Sølvsten, and Sullivan (2021).
approximation of the integrals defining market shares and demand elasticities with an
algorithm similar to the sketch provided on page 23. This is an example of a nested-fixed-
point algorithm, using a contraction mapping to solve a set of fixed-point equations for
the demand shocks ξt (θ) that equate predicted and actual market shares in every market
at each trial value of θ.
Whether estimating demand in isolation or jointly with supply, proper computation
of the BLP estimator can be challenging. While many authors have succeeded in
implementing and customizing the BLP algorithm, naïve implementations can easily fail.43
Over the last decade, several authors—notably Dubé, Fox, and Su (2012) and Conlon
and Gortmaker (2020)—have aimed to modernize the approach to computing the BLP
estimator by combining modern computing capabilities, new computational methods,
and a set of “best practices” tailored specifically to computation of the BLP estimator.
Important advances include computational power allowing improvement in procedures
for approximation of the integrals defining market shares; Monte Carlo evidence yielding
critical guidance on convergence tolerances; management of potential rounding, overflow,
and underflow errors; new techniques for approximating optimal instruments; improved
methods for computing pricing equilibria; and the use of modern solvers, often exploiting
gradient-based optimization.
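One of the numerical issues listed above, overflow in the exponentials that define logit choice probabilities, has a simple and widely used remedy: subtract the largest utility index before exponentiating. The sketch below contrasts a direct implementation with this stabilized version. The function names are hypothetical and the example is not drawn from any particular package.

```python
import math


def naive_shares(v):
    """Logit shares computed directly; math.exp overflows for large indices."""
    expv = [math.exp(vj) for vj in v]
    return [e / (1.0 + sum(expv)) for e in expv]


def stable_shares(v):
    """Same shares, with the maximum index subtracted before exponentiating."""
    m = max(0.0, max(v))              # include the outside good's index of 0
    expv = [math.exp(vj - m) for vj in v]
    denom = math.exp(-m) + sum(expv)  # outside-good term may underflow to 0.0
    return [e / denom for e in expv]


print(stable_shares([800.0, 799.0]))
try:
    naive_shares([800.0, 799.0])
except OverflowError as err:
    print("naive version failed:", err)
```

The two functions agree whenever the direct computation is feasible. The stabilized version simply multiplies numerator and denominator by the same constant.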
Conlon and Gortmaker (2020) discuss these and other important advances in de-
tail and propose a modern version of the BLP nested-fixed-point algorithm.44 Their
paper also serves as an introduction to open-source software (“PyBLP”) for implement-
ing this approach, either for estimating demand in isolation or simultaneous estimation
of demand and supply. Simulations in Conlon and Gortmaker (2020) illustrate relative
advantages of alternative techniques available among the many options offered in the Py-
BLP software. We refer readers to Conlon and Gortmaker (2020) for details, extensive
references, and advice on current best practices in computation of the BLP estimator,
optimal instruments, standard errors, and equilibrium counterfactuals.
Regarding standard errors, note that the program on page 23 intentionally defines a
set of moment conditions that permit the use of GMM inference techniques. There are
four potential sources of variance in the moment conditions: (a) the data variance across
products within markets, (b) the cross-market variance in data, (c) variance due to the
finite sample of consumers used to construct the market shares s̃jt and (d) variance due
to simulation draws (if any).
There is a small literature examining asymptotic issues that arise with the BLP es-
43
Knittel and Metaxoglou (2014) document such possibilities.
44
In an essential earlier contribution, Dubé, Fox, and Su (2012) examined potential problems with
poorly formulated versions of the nested fixed point algorithm and proposed the alternative approach
of applying standard specialized constrained optimization solvers to the program (4.6) defining the BLP
estimator. Dubé, Fox, and Su (2012) showed that this program often can be reformulated to yield a
form amenable to their “MPEC” (mathematical programming with equilibrium constraints) approach,
particularly when first and second derivatives of the Lagrangian can be supplied. The authors have
made code for this approach publicly available.
timator. Berry, Linton, and Pakes (2004b) discuss two related issues that arise with
asymptotic approximations treating the number of products as growing large: (a) simulation
error in the approximation of choice probabilities implied by the model; and (b)
sampling error in the empirical market shares—sample means that are interpreted as
approximations to the population means implied by the model. These issues are closely
related, as some market shares must become small as the number of products per mar-
ket grows. Berry, Linton, and Pakes (2004b) note that the nonlinear inversion for the
mean utilities δt can cause the simulation and sampling errors to “blow up” as market
shares become small. It is therefore important to control the simulation error as much
as possible—by using a large number of simulation draws and/or importance sampling
techniques—or else to avoid simulation entirely by using accurate numerical integration.
Berry, Linton, and Pakes (2004b) discuss the need to account for simulation and sampling
error when reporting estimation results, and provide formulas for doing so. To focus on
the troublesome issue of small market shares, they present asymptotic variance results
as the number of products, J, grows large.45 Freyberger (2015) and Hong, Li, and Li
(2020) provide general treatments of the asymptotics of the BLP-style estimators as the
number of markets (and, if applicable, the number of simulation and sampling draws)
grows large.
Aside from standard errors for parameter estimates, one will typically be interested
in standard errors on counterfactual quantities of interest—e.g., a price change or wel-
fare change under a hypothetical merger or counterfactual policy. Current practice is
to construct such standard errors using either a parametric bootstrap or nonparametric
bootstrap. In the former case, one simulates parameter draws from their normal asymp-
totic approximation and recomputes the implied quantity of interest for each draw (see,
e.g., Nevo (2001)). In the latter case, one re-samples the data, re-runs the estimation
procedure on each bootstrap sample, and computes the implied quantity of interest for
each such bootstrap replication.
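The parametric bootstrap described above can be sketched in a few lines. The code draws parameter vectors from a normal distribution centered at the estimate, evaluates a user-supplied quantity at each draw, and reports the standard deviation. All names are hypothetical, and the example quantity (a plain-logit own-price elasticity at a fixed price and share) is chosen only because its standard error is known in closed form.

```python
import math
import random
import statistics


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - acc)
            else:
                low[i][j] = (cov[i][j] - acc) / low[j][j]
    return low


def parametric_bootstrap_se(theta_hat, cov, quantity, n_draws=5000, seed=0):
    """Std. dev. of quantity(theta) over draws theta ~ N(theta_hat, cov)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(theta_hat)
    values = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        theta = [theta_hat[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                 for i in range(n)]
        values.append(quantity(theta))
    return statistics.stdev(values)


# Hypothetical example: own-price elasticity -alpha * p * (1 - s) under plain logit,
# holding price and share fixed at p = 1.5 and s = 0.2.
se = parametric_bootstrap_se([2.0], [[0.01]], lambda th: -th[0] * 1.5 * (1 - 0.2))
print(se)  # should be near 0.1 * 1.5 * 0.8 = 0.12
```

In a real application `quantity` would re-solve the counterfactual (for example, post-merger equilibrium prices) at each parameter draw, which is where most of the computational cost lies.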
As a final issue, with sufficiently small consumer samples (relative to the number of
products), one may observe market shares s̃jt for some goods that are equal to zero,
even though the expected shares (the choice probabilities from the model) are strictly
positive. This creates a problem for any estimation strategy relying on inversion of
demand. Gandhi, Lu, and Shi (2019a) show that zero market shares nonetheless imply
bounds on mean utilities, yielding an estimation approach that is valid in the presence
45
Armstrong (2016) points out a number of pathologies that occur under particular assumptions as J
grows large. For example, many models of competition will imply constant (or even zero) markups in the
limit, hindering any IV strategy that relies on the induced variation in oligopoly markups. These results
might be taken as a warning about how to develop useful asymptotic approximations when taking limits
in the number of products; asymptotic results that assume perfect competition or constant markups will
provide poor approximations for actual differentiated-products oligopoly markets in which markups are
substantial and highly sensitive to the intensity of competition faced by each product. Note also that,
although some types of instruments act only through their effects on markups, the essential role of BLP
instruments from the perspective of identification (see section 5) is to provide exogenous variation in
quantities conditional on prices.
of zero shares. Quan and Williams (2018) provide a more specialized solution to this
problem when products appear in multiple markets.
46
Here we focus on identification of demand. Berry and Haile (2014) also demonstrate nonparametric
identification of marginal costs, cost shocks, and marginal cost functions, as well as discrimination
between alternative models of supply.
5.1.1 Multinomial Logit
Consider a multinomial logit specification in which consumer i’s utility from good j in
market t takes the form
uijt = xjt β − αpjt + ξjt + εijt .
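A key feature of this model is the linear index
$$\delta_{jt} = x_{jt}\beta - \alpha p_{jt} + \xi_{jt}. \tag{5.1}$$
Each market share
$$s_{jt} = \frac{e^{\delta_{jt}}}{1 + \sum_{k} e^{\delta_{kt}}} \tag{5.2}$$
is a nonlinear function of the indices δ1t , . . . , δJt . A key observation is that this map from
indices to market shares is easily inverted; in particular,
$$\delta_{jt} = \ln(s_{jt}) - \ln(s_{0t}). \tag{5.3}$$
Substituting (5.3) into (5.1) yields an estimable equation for each good j of the form
$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta - \alpha p_{jt} + \xi_{jt}. \tag{5.4}$$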
The inverted demand equations, after dividing through by $\beta^{(1)}$, take the form
$$x_{jt}^{(1)} + \tilde{\xi}_{jt} = \frac{1}{\beta^{(1)}} \left( \ln(s_{jt}) - \ln(s_{0t}) \right) + \frac{\alpha}{\beta^{(1)}} p_{jt}, \tag{5.5}$$
where $\tilde{\xi}_{jt} = \xi_{jt} / \beta^{(1)}$. Compared to (5.3), we still have a type of index (here $x_{jt}^{(1)} + \tilde{\xi}_{jt}$) on the
left-hand side; and this index is still equal to a tightly parameterized function of market
shares and the price of good j. If we re-arrange one more time to write
$$x_{jt}^{(1)} = \frac{1}{\beta^{(1)}} \left( \ln(s_{jt}) - \ln(s_{0t}) \right) + \frac{\alpha}{\beta^{(1)}} p_{jt} - \tilde{\xi}_{jt}, \tag{5.6}$$
we get something that now resembles a regression equation, with an additively separable
error $\tilde{\xi}_{jt}$ on the right-hand side. This differs from a regression equation in one important
way, however: the variable on the left-hand side is an exogenous product characteristic;
i.e., it is mean independent of the “error term” $\tilde{\xi}_{jt}$. Writing the equation in this
unusual way forms a connection to the more complicated models we discuss below. An
implication of (5.6) is that, just as in the original fully linear multinomial logit model,
only one excluded instrument zjt is needed to identify this equation despite the presence
of two right-hand-side endogenous variables, (ln(sjt ) − ln(s0t )) and pjt . In particular,
with one such excluded instrument, we have the bivariate conditional moment restriction
$E[\xi_{jt} \mid x_{jt}^{(1)}, z_{jt}] = 0$, which can identify the two parameters appearing on the right-hand
side of (5.6).
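The claim that one excluded instrument suffices can be illustrated by simulation. The sketch below generates logit data with an endogenous price, inverts shares, and then solves the two sample moment conditions implied by (5.6), using the exogenous characteristic and the excluded instrument as the two instruments. The data-generating process, the absence of a constant term, and all names are assumptions made for this illustration.

```python
import math
import random


def simulate_and_estimate(n=20000, beta=1.0, alpha=2.0, seed=0):
    """Estimate a = 1/beta and b = alpha/beta in x = a*y + b*p - xi_tilde.

    y = ln(s) - ln(s0) is the inverted share. Instruments are the exogenous
    characteristic x itself and one excluded instrument z.
    """
    rng = random.Random(seed)
    sxx = sxy = sxp = szx = szy = szp = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)    # exogenous characteristic
        z = rng.gauss(0.0, 1.0)    # excluded instrument, e.g. a cost shifter
        xi = rng.gauss(0.0, 0.5)   # demand shock
        price = 2.0 + z + 0.3 * x + 0.25 * xi   # endogenous: responds to xi
        delta = beta * x - alpha * price + xi
        share = 1.0 / (1.0 + math.exp(-delta))  # one inside good per market
        y = math.log(share) - math.log(1.0 - share)  # inversion recovers delta
        sxx += x * x
        sxy += x * y
        sxp += x * price
        szx += z * x
        szy += z * y
        szp += z * price
    # Sample analogs of E[x (x - a y - b p)] = 0 and E[z (x - a y - b p)] = 0,
    # a 2x2 linear system in (a, b) solved by Cramer's rule.
    det = sxy * szp - sxp * szy
    a = (sxx * szp - sxp * szx) / det
    b = (sxy * szx - sxx * szy) / det
    return a, b


a_hat, b_hat = simulate_and_estimate()
print(a_hat, b_hat)  # true values are 1/beta = 1.0 and alpha/beta = 2.0
```

Both the inverted share and the price are endogenous here, yet the two parameters are recovered because the left-hand-side characteristic is itself exogenous and supplies the second moment condition.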
Here the subscript g denotes the nest (or “group”) to which product j belongs. Compared
to the multinomial logit, flexibility is added through the new coefficient λ on the within-
group share sj/g,t of good j. Each equation of this inverted demand system again looks
like a regression equation. However, we now need an instrument for the endogenous
variable ln(sj/g,t ) as well as for price. Note that ln(sj/g,t ) is a particular function of the
full share vector (s1 , . . . , sJ ) implied by the parametric functional form.
Again, if we condition on $x_t^{(2)}$ and drop it from the notation, we can rewrite each
equation of this inverted demand system as
$$x_{jt}^{(1)} + \tilde{\xi}_{jt} = \frac{1}{\beta^{(1)}} \left[ \ln(s_{jt}) - \ln(s_{0t}) - (1 - \lambda) \ln(s_{j/g,t}) \right] + \frac{\alpha}{\beta^{(1)}} p_{jt}. \tag{5.7}$$
On the left-hand side we have the same index derived under the multinomial logit; but
on the right-hand side we have a more complicated function of market shares and price.
Again, the rearrangement has no effect on the set of instruments needed.
• the presence of a one-to-one mapping between the indices and market shares, al-
lowing inversion of the demand system;
We will see below that these same ideas allow one to demonstrate nonparametric
identification of demand from market-level data. Following Berry and Haile (2014), we
introduce a nonparametric index restriction to an otherwise very general demand model;
the demand system is then inverted, yielding a set of inverse demand equations with
one demand shock per equation. Standard IV conditions yield identification of these
equations and, therefore, of the realized demand shocks. Identification of demand then
becomes trivial, as the values of all variables in the demand system are known.
5.2 Nonparametric Demand Model
Without loss, we condition on a fixed number of inside goods J. Let demand for each
good j in market t be given by
$$s_{jt} = \sigma_j(x_t, p_t, \xi_t), \qquad j = 1, \dots, J. \tag{5.9}$$
Although we write sjt on the left-hand side of (5.9), this measure of demand at the market
level may be measured in quantities, market shares, or other one-to-one transformations
of the demand vector (i.e., the quantities demanded).47 As above, xt represents all
observed exogenous characteristics of the market and goods,48 pt represents the prices of
all goods, and ξt represents the J-vector of demand shocks.
This representation of demand may be derived from a random utility discrete choice
model in which the utilities (ui1t , . . . , uiJt ) are drawn from some unknown joint distri-
bution FU (·|xt , pt , ξt ). However, the demand system (5.9) need not arise from a random
utility specification, or even a discrete choice model. Indeed, thus far, the demand model
is very general, with the only significant restriction being that the number of scalar de-
mand shocks ξjt is J. To demonstrate identification, Berry and Haile (2014) require three
main assumptions closely tied to our insights from the parametric examples.
Assumption 5.1 requires that $x_{jt}^{(1)}$ and $\xi_{jt}$ enter the nonparametric function σ only
through the index δjt . This index requirement is an important nonparametric functional
form restriction. As our examples above suggest, this type of restriction is implicit in
parametric models widely used in the practice of demand estimation. Here, of course,
the indices (δ1t , . . . , δJt ) are permitted to alter demand for each good j through a fully
47
Berry, Gandhi, and Haile (2013) point out that transformations of quantities demanded to market
shares, expenditure shares, or even purely artificial notions of “shares” can be useful for verifying con-
ditions ensuring invertibility of the demand system (see section 5.2.2 below). However, if demand is
invertible (or is identified) under one known injective transformation of quantities, it is under all such
transformations as well.
48
Any non-price market-level observables—e.g., average demographic measures—may be included in
xt .
nonparametric function σj . We will see below why this index structure is valuable,49 and
how it may be relaxed when one has access to “micro data” on individual choices rather
than market-level data.
As in the parametric examples, the fact that demand shocks have no natural location
or scale means that we may assume without loss of generality that E[ξjt ] = 0 and |βj | = 1
for all j. Thus, henceforth we work with indices of the form50
$$\delta_{jt} = x_{jt}^{(1)} + \xi_{jt}. \tag{5.11}$$
Furthermore, because the exogenous variables $x_t^{(2)}$ play no role in the identification argument,
we will henceforth condition on an arbitrary value of $x_t^{(2)}$ without loss of generality
and suppress $x_t^{(2)}$ in the notation.
Part (i) of Assumption 5.2 requires that goods be weak substitutes with respect to
the indices: an improvement in the index δjt must weakly reduce the demand for other
goods. This is automatic in a discrete choice setting in which δjt can be interpreted as
an index altering good j’s quality. While part (i) requires only weak substitution, part
(ii) requires at least some strict substitution among goods j = 0, 1, . . . , J—essentially,
enough that there is no strict subset of goods that substitute only among themselves.
Berry, Gandhi, and Haile (2013) further show that part (ii) is equivalent to a certain
notion of connectedness in the graph of the substitution matrix among goods. In par-
ticular, suppose we represent each good by a vertex and construct a directed edge from
good j to good k if j strictly substitutes to k—i.e., if a reduction in δjt would lead to
an increase in the demand for good k. Part (ii) requires (with appropriate quantifiers)
that this directed graph exhibit a path from each node j to the node 0 associated with
the outside good. Figure 2 illustrates this graph for 2 classes of standard discrete choice
models in which δjt shifts the utility for good j without affecting the utility from other
goods; panel (a) shows the graph for standard random utility models with horizontal
49
Berry and Haile (2014) show that the additive separability of (5.1) is not essential, although relaxing
separability requires strengthening the “relevance” condition on instrumental variables, just as in the
case of regression models (Chernozhukov and Hansen (2005)).
50
If the sign of βj is not known a priori, it is easily determined under our assumptions.
differentiation (e.g., multinomial logit or probit, nested logit, mixed logit); panel (b)
shows that for models of vertical differentiation (e.g., Bresnahan (1981)). In both cases
we see that from the vertex associated with any good j > 0, there is a directed path
to the vertex associated with the outside good, as required by the connected substitutes
conditions.
[Figure 2: two directed graphs on the vertices 0, 1, 2, 3, 4, shown as panels (a) and (b); image not reproduced.]
Directed graphs of the substitution matrix for standard discrete choice models, with J = 4
inside goods. Panel (a): standard random utility models of horizontal differentiation,
such as the multinomial logit, multinomial probit, nested logit, mixed logit/probit. Panel
(b): the pure vertical model with an outside good. From each vertex associated with an
inside good there is a directed path to the vertex associated with the outside good.
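The path requirement described for the substitution graph can be checked mechanically. The sketch below is a minimal illustration: the edge lists are our own reading of the two panels (all goods mutually substituting in panel (a); substitution only between quality-adjacent goods in panel (b)), and the third edge list is a hypothetical failure case.

```python
from collections import deque

def all_reach_outside_good(edges, n_inside):
    """Check that every inside good 1..n_inside has a directed path to node 0."""
    # Search backwards from node 0 along reversed edges.
    reverse = {v: [] for v in range(n_inside + 1)}
    for j, k in edges:
        reverse[k].append(j)
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for u in reverse[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == n_inside + 1

J = 4
# Panel (a), as we read it: every good strictly substitutes to every other good.
horizontal = [(j, k) for j in range(J + 1) for k in range(J + 1) if j != k]
# Panel (b), as we read it: goods ordered by quality substitute only to neighbours.
vertical = [(j, j + 1) for j in range(J)] + [(j + 1, j) for j in range(J)]
# Hypothetical failure: goods 3 and 4 substitute only with each other.
isolated = [(1, 0), (2, 1), (3, 4), (4, 3)]

results = {name: all_reach_outside_good(edges, J)
           for name, edges in [("horizontal", horizontal),
                               ("vertical", vertical),
                               ("isolated", isolated)]}
```

The check covers only the path property of part (ii); weak substitution in part (i) is a separate requirement.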
Berry, Gandhi, and Haile (2013) demonstrate satisfaction of this condition in a wide
range of demand models and demonstrate that, because invertibility of demand is ensured
whenever the connected substitutes conditions hold for some injective transformation of
the demand system, invertibility can be demonstrated by these conditions even in some
cases in which goods are complements. We refer readers to Berry, Gandhi, and Haile
(2013) for additional examples and discussion. For our purposes, the key implication is
that for all demand vectors st such that sjt > 0 for all j, there exists an inverse demand
system taking the form
$$\delta_{jt} = \sigma_j^{-1}(s_t; p_t), \qquad j = 1, \ldots, J. \tag{5.12}$$
5.3 Identification via Instruments
We will see that with equation (5.12), identification of demand will follow from availability
of instruments satisfying the same conditions required of instruments for identification
of regression models. In the case of nonparametric regression we are interested in an
equation of the form51
$$y = \Gamma(x) + \epsilon, \tag{5.13}$$
where $x \in \mathbb{R}^K$. Newey and Powell (2003) showed that given instruments z satisfying
the mean independence condition $E[\epsilon \mid z] = 0$, a necessary and sufficient condition for
identification of the regression function Γ is a standard “completeness” condition: that
in the class of functions B(·) on RK such that E[B(x)|z] is finite, the only function B
such that E[B(x)|z] = 0 almost surely is a function that maps to zero almost surely
on its domain. This “completeness” condition is, thus, the nonparametric analog of the
standard rank condition for linear regression. It is the formal “relevance” requirement on
the instruments, defining precisely what it means for them to provide sufficient exogenous
variation in the regressors x.
To connect this to demand, observe that we may re-arrange each equation of (5.12)
as
$$x_{jt}^{(1)} = \sigma_j^{-1}(s_t; p_t) - \xi_{jt}, \tag{5.14}$$
yielding a form similar to (5.13). Unlike the regression equation, here we have an exogenous
variable $x_{jt}^{(1)}$ (indeed, a variable that is an essential instrument) on the left and all
endogenous variables, $(s_t, p_t)$, on the right. Nonetheless, a very similar argument demonstrates
that with excluded instruments $z_t$, identification of the inverse demand function
$\sigma_j^{-1}$ holds on the same pair of conditions ensuring identification in the case of regression.
Assumption 5.3 (Instruments). (i) For all $j = 1, \ldots, J$, $E[\xi_{jt} \mid z_t, x_t^{(1)}] = 0$ almost surely.
(ii) For all functions $B(s_t, p_t)$ with finite expectation, if $E[B(s_t, p_t) \mid z_t, x_t^{(1)}] = 0$ almost
surely then $B(s_t, p_t) = 0$ almost surely.
We refer readers to Berry and Haile (2014) for the proof, which follows that of Newey
and Powell (2003) closely. Observe that when each function $\sigma_j^{-1}$ is known, each $\xi_{jt}$ can
be inferred immediately from prices, market shares, and (5.14). With each ξjt known,
identification of demand follows directly from equation (5.9).
Theorem 5.1 (Berry and Haile (2014)). Suppose (st , xt , pt , zt ) are observable and that
Assumptions 5.1–5.3 hold. Then for all j, the demand function σj is identified.
51
We abuse notation here to follow standard convention, letting y, x, and ε denote the components
of a regression model (with z denoting an excluded instrument). However, none of these components
should be confused with similarly denoted elements of the demand models we discuss.
5.4 Discussion
A broad interpretation of this identification result is that, despite the substantial chal-
lenges discussed in section 2, the main requirement for identification of demand is a
completely standard one: availability of suitable instruments for the endogenous vari-
ables. On one hand, this is comforting. On the other hand, the result raises several
natural questions about the precise instrumental variables requirements and whether
further structure on the demand model might relax these requirements.
each with one demand shock, as needed for application of standard instrumental variables
arguments. However, each of these inverted demand equations features endogenous variables
$(s_t, p_t) \in \mathbb{R}^{2J}$ on the right-hand side. More intuitively, observing that the function
$\sigma_j^{-1}$ above must be strictly increasing in $s_{jt}$, we can rewrite (5.15) as
$$s_{jt} = h_j\left(s_{-jt}, p_t, x_{jt}^{(1)}, \xi_{jt}\right)$$
for some function $h_j$. This formulation suggests that $x_{jt}^{(1)}$ can act as an “instrument for
itself,” but that we still need J instruments for prices and J − 1 instruments for the
endogenous quantities $s_{-jt}$.
Putting this differently, we know that to quantify demand, we must pin down how the
quantity demanded of each good j changes when one price varies and all others are held
fixed. This indicates a need for exogenous shifters of prices. But we know (recall section
2.1) that exogenous variation in prices is not sufficient. We must also hold the demand
indices δt fixed. This is not obviously feasible because the demand shocks entering δt
are not observed. However, by exploiting the bijection between the vector δt and the
vector of market shares st (all else fixed), we can indirectly control the value of δt by
controlling the share vector st . This requires additional instruments for the J-vector of
market shares.
The need to instrument for market shares can be seen in our parametric examples
as well. There we saw that as we increased the flexibility of the demand model, the
inverse demand equation used for estimation involved not only prices but also the market
share vector: market shares entered with no unknown parameters in the case of the
multinomial logit, but as we added flexibility to the parametric model, additional moment
conditions were required to estimate the inverse demand functions. In the limiting case of
a fully nonparametric model, we require instruments for all prices and quantities (market
shares).
Identification of $h_j$ or $\sigma_j^{-1}$ will, intuitively, still require J − 1 instruments for $s_{-jt}$. Cost
shifters and markup shifters are of no help in this case: because they affect market shares
only through prices, they provide no variation in demand beyond that already accounted
for directly through $p_t$. And we have already conditioned on (held fixed) $x_t^{(2)}$, leaving
no variation to exploit. Thus, the demand shifters $x_{-jt}^{(1)}$ are the only possibility in the
“menu” of candidate instruments discussed above.52
More simply, we have seen that in the most general model we need independent
variation in all J shares and all J prices. So no matter how much variation we can
generate in prices, we need something else that moves all shares at any given price
vector. The BLP instruments are the only candidates in our setting of market-level data.
52
Additional types of observables or additional assumptions could offer alternative strategies. See, for
example, sections 5.4.4, 6, and 7.
5.4.3 Why the Index?
Although the preceding discussion explains why the variation provided by BLP instru-
ments is critical, it may be less clear why these instruments are valid, even given the
exogeneity required by Assumption 5.3. The power of these instruments to shift market
shares is clear. But how is it that observables entering the demand system can end up
satisfying the relevant exclusion restriction? In what sense are these observables properly
excluded? The answer to this question emphasizes one role that the index restriction is
playing in the identification argument.
Recall that we conditioned on $x_t^{(2)}$, so the inverse demand function associated with
the index $\delta_{jt}$ is really
$$\sigma_j^{-1}\left(s_t, p_t; x_t^{(2)}\right).$$
In terms of the identification argument, conditioning on $x_t^{(2)}$ really means that $x_t^{(2)}$
“instruments for itself.” The index restriction is what leaves $x_t^{(1)}$ out of the function $\sigma_j^{-1}$,
making $x_t^{(1)}$ available as instruments for shares; in particular, it allows $x_{-jt}^{(1)}$ to serve as
instruments for $s_{-jt}$.
This feature of the index is closely connected to a restriction that is often so natural
that it is assumed without comment: that characteristics of a good alter its utility but
not the utility from other goods. In the canonical random utility model, this applies to
all components of xjt , pjt , ξjt , leaving x−jt out of the inverted demand equation expressing
ξjt as a function of market shares, prices, and the exogenous observables xjt . Although
we do not require a specification of utilities, the index structure leaves the role of $x_t^{(2)}$
entirely free while preserving this same natural feature with respect to the components
of $x_t^{(1)}$.
or
$$p_{jt} = -\sigma_j^{-1}(s_t) + \xi_{jt}.$$
In this case we would require only J instruments, providing exogenous variation in st
that need not be independent of prices. In this case, cost shifters, Hausman instruments,
Waldfogel instruments, etc. are all candidates.
As a second example, suppose that both $x_t^{(1)}$ and prices enter only through the indices,
as
$$\delta_{jt} = \xi_{jt} + x_{jt}^{(1)}\beta - \alpha p_{jt},$$
where we may again set α = 1 without loss. The inverse demand equations then have
the form (suppressing $x_t^{(2)}$)
or
$$p_{jt} = -\sigma_j^{-1}(s_t) + x_{jt}\beta + \xi_{jt}$$
Again, instruments are required only for the market share vector st , and the BLP instru-
ments x−jt are now available options in addition to those discussed above.
These are just two examples illustrating trade-offs between functional form restric-
tions and IV requirements. This points out one role that even nonparametric functional
form restrictions can play in practice: filling in the gap between the exogenous variation
that may be available in a given data set and that which would be needed to discriminate
between all nonparametric demand systems. Of course, some types of restrictions—e.g.,
adding symmetry, exchangeability, nesting, etc.—can arise as natural economic restric-
tions. And in some cases, better data will provide an avenue for relaxing the IV require-
ments. We turn next to a leading example of this: consumer-level data.
product×market-level demand shocks are fixed.
McFadden’s classic work on demand for transportation modes (McFadden, Talvitie,
and Associates (1977)) is an example of estimation from micro data. Other prominent ex-
amples involve demand for hospitals (e.g., Capps, Dranove, and Satterthwaite (2003), Ho
(2009)), retail outlets (e.g., Burda, Harding, and Hausman (2015)), residential locations
(e.g., Bayer, Ferreira, and McMillan (2007), Diamond (2016)), automobiles (e.g., Gold-
berg (1995), Petrin (2002)) and schools (e.g., Neilson (2020)). In these cases, observed
demographic measures and geographic locations often play important roles. Other exam-
ples of consumer-level observables include product-specific advertising exposure (Acker-
berg (2003)), consumer-newspaper ideological match (Gentzkow and Shapiro (2010)),
or the match between household demographics and those of a school or neighborhood
(Bayer, Ferreira, and McMillan (2007), Hom (2018)).
As we will see below, one significant advantage of micro data is the potential for
within-market variation to lessen (but not eliminate) reliance on instrumental variables.
Indeed, micro data can both reduce the number of instrumental variables necessary for
identification and make new kinds of instrumental variables available.
To consider parametric models, we again focus on a mixed logit specification, with
conditional indirect utilities of the form
where
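$$\beta_{it}^{(k)} = \beta_0^{(k)} + \sum_{\ell=1}^{L} \beta_d^{(\ell,k)} d_{i\ell t} + \beta_\nu^{(k)} \nu_{it}^{(k)}.$$
and
$$\delta_{jt} = x_{jt}\beta_0 - \alpha_0 p_{jt} + \xi_{jt}. \tag{6.3}$$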
McFadden, Talvitie, and Associates (1977) referred to δjt as the “alternative-specific”
constant, which was held fixed in certain policy counterfactuals. The modern IO litera-
ture emphasizes the dependence of these “constants” on prices and other observed and
unobserved factors. Of course, demand elasticities (and other key aspects of demand) are
defined by responses to variation in one determinant of δjt while all other determinants
are held fixed.53
As discussed in section 2, the latent demand shocks in these constants introduce the
challenges of simultaneity/endogeneity. There was at one point perhaps some confusion
as to whether the simultaneity problem arises in the micro-data context, reflecting the
observation that the individual consumer does not “cause” the price. But this observation
has no bearing on the simultaneity problem that arises from unobservables at the level of
the product or market. We will see below that micro data can provide partial solutions to
the simultaneity challenges. However, as our discussion below makes clear, the problems
of simultaneity/endogeneity stem not from the level of aggregation of the data but from
the presence of market×product-level demand shocks whose effects are confounded with
those of market×product-level prices. Addressing this endogeneity problem generally
will still require cross-market variation and instruments for prices.
With the specification above, choice probabilities for each consumer i take the form
$$s_{ijt} = \int \frac{\exp\{\delta_{jt} + \mu_{ijt}(\nu_{it}; \beta_d, \beta_\nu)\}}{\sum_{k=0}^{J_t} \exp\{\delta_{kt} + \mu_{ikt}(\nu_{it}; \beta_d, \beta_\nu)\}} \, dF_\nu(\nu_{it}) \tag{6.4}$$
for each good j = 0, 1, . . . , Jt . Let j(i) denote the good selected by consumer i in market
t. Substituting j(i) for j in (6.4) gives the likelihood contribution of consumer i’s choice
as a function of parameters (δ, αy, βd, βν), where we have now defined $\delta = \{\delta_t\}_{t=1}^{T}$ and
treated it as a parameter vector. Although this likelihood would need to be approximated
by simulating from the distribution Fν , this immediately suggests an estimation approach
for these parameters, at least when the number of observed consumers per good is large
in each market.54 In particular, one could estimate (δ, βd , βν ) by maximizing the product
of these (simulated) likelihoods over all consumers,
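$$L(\delta, \beta_d, \beta_\nu) = \prod_{i,t} \int \frac{\exp\{\delta_{j(i)t} + \mu_{ij(i)t}(\nu_{it}; \beta_d, \beta_\nu)\}}{\sum_{k=0}^{J_t} \exp\{\delta_{kt} + \mu_{ikt}(\nu_{it}; \beta_d, \beta_\nu)\}} \, dF_\nu(\nu_{it}). \tag{6.5}$$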
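To make the simulation step concrete, the following sketch approximates the choice probabilities in (6.4) by averaging logit probabilities over draws of ν, and then evaluates the log of (6.5) for one market. It is only an illustration under stated assumptions: the form of µ (a scalar product characteristic interacted with a scalar consumer observable and a scalar taste draw), the normalization of the outside good's utility to zero, the reuse of one set of draws for all consumers, and all names and numbers are ours.

```python
import numpy as np

def simulated_choice_probs(delta, x, d, beta_d, beta_nu, nu_draws):
    """Simulated choice probabilities, shape (N, J + 1); column 0 is the outside good.

    Assumed specification: mu_ij = x_j * (beta_d * d_i + beta_nu * nu_i).
    """
    taste = beta_d * d[None, :] + beta_nu * nu_draws[:, None]        # (R, N)
    v = delta[None, None, :] + taste[:, :, None] * x[None, None, :]   # (R, N, J)
    v = np.concatenate([np.zeros(v.shape[:2] + (1,)), v], axis=2)     # outside utility 0
    v -= v.max(axis=2, keepdims=True)                                  # numerical stability
    p = np.exp(v)
    p /= p.sum(axis=2, keepdims=True)
    return p.mean(axis=0)                                              # average over draws

def simulated_log_likelihood(choices, delta, x, d, beta_d, beta_nu, nu_draws):
    """Log of the simulated likelihood for observed choices (0 = outside good)."""
    probs = simulated_choice_probs(delta, x, d, beta_d, beta_nu, nu_draws)
    return np.log(probs[np.arange(len(choices)), choices]).sum()

# Hypothetical single-market example with J = 3 inside goods and N = 200 consumers.
rng = np.random.default_rng(0)
delta = np.array([0.2, -0.4, 0.1])
x = np.array([1.0, 0.5, -1.0])
d = rng.normal(size=200)
nu_draws = rng.normal(size=500)
probs = simulated_choice_probs(delta, x, d, 0.8, 0.5, nu_draws)
choices = np.array([rng.choice(4, p=row) for row in probs])
loglik = simulated_log_likelihood(choices, delta, x, d, 0.8, 0.5, nu_draws)
```

An estimator would maximize this objective over (δ, βd, βν); with many markets, δ contains one entry per product and market.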
Of course, to answer most economic questions one will also need to estimate the
parameters α0 and β0 in (6.3). The simplest approach is to run a second-step linear
IV regression of the estimated δjt on xjt and pjt .55 As an example, Bayer, Ferreira,
53
A similar observation limits the applicability of random utility models that represent all latent factors
with a single (typically additive) random shock. For example, the random utility model in (6.1)–(6.3)
could be rewritten more compactly as uijt = φ(xjt , pjt ) + eijt , allowing correlation between eijt and
(xjt , pjt ). But this representation alone is not adequate as a model of demand in such a setting. To
measure an own-price demand elasticity, for example, one must hold ξjt (among other things) fixed while
letting both the mean utility for good j and the distribution of the stochastic term µijt (νit ; βd , βν ) adjust
to a change in pjt. By specifying only a composite stochastic term eijt, the more compact representation
does not allow one to even define such ceteris paribus changes.
54
Otherwise, the parameters δ may be poorly estimated.
55
It will be necessary here to account for estimation error in the estimated δjt , so the correct standard
errors for estimates constructed this way are not those of two-stage least squares.
and McMillan (2007) and Bayer and Timmins (2007) apply this two-step method to the
demand for residential locations. Note that, in this parametric context, we now require
just one excluded instrument, for the endogenous pjt . In addition to cost shifters and
cost proxies, candidate instruments include BLP instruments and Waldfogel instruments.
In fact, unlike the case of market-level data, here one can use own-market forms of the
Waldfogel instruments, such as the market-level means of dit , as long as these are not
elements of xjt and not correlated with ξjt . Thus, the parametric micro data model
requires as few as one excluded instrument (for price) to learn all parameters and, as
compared to the market-level case, can allow the use of an additional class of instruments.
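A minimal sketch of the second step, on simulated data, may help fix ideas. Here the "estimated" δjt are generated directly from (6.3) rather than taken from a first-step likelihood, the single excluded instrument z is a hypothetical cost shifter, and all parameter values are invented for illustration. The code computes point estimates only; as the footnote above notes, correct standard errors must account for first-step estimation error.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS coefficients: project X on the instruments Z, then regress y on the fit."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 20_000                                   # product-market observations
x = rng.normal(size=n)                       # exogenous characteristic
z = rng.normal(size=n)                       # hypothetical cost shifter
xi = rng.normal(size=n)                      # demand shock
p = 1.0 + 0.5 * x + z + 0.8 * xi + 0.5 * rng.normal(size=n)   # price depends on xi
beta0, alpha0 = 2.0, 1.5
delta_hat = 1.0 + beta0 * x - alpha0 * p + xi   # stands in for first-step estimates

const = np.ones(n)
X = np.column_stack([const, x, p])           # regressors, price endogenous
Z = np.column_stack([const, x, z])           # exogenous x plus excluded instrument
coef_iv = two_stage_least_squares(delta_hat, X, Z)
coef_ols = np.linalg.lstsq(X, delta_hat, rcond=None)[0]
alpha_iv, alpha_ols = -coef_iv[2], -coef_ols[2]
```

In this design the OLS estimate of α0 is pulled toward zero because price is positively correlated with the demand shock, while the IV estimate is not.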
Although this two-step approach is instructive, it will often be preferable to estimate
all parameters at once, exploiting both within-market and cross-market variation. At a
minimum, this will typically aid efficiency. One-step estimation also avoids estimating
the extra “parameters” δ—potentially a large number of them—when in fact these are
defined in (6.3) as functions involving only α0 and β0 as unknown parameters.
A more subtle issue concerns identification. In the fully parametric model, it may in-
deed be possible to estimate the parameters (βd , βν ) using only within-market variation—
even using data from only a single market.56 But, the available results on nonparametric
identification with micro data (see section 7) suggest that this possibility is dependent
on the parametric structure. Indeed, the results there require both within-market and
cross-market variation even to learn the relative effects of consumer observables dit (the
nonparametric analog of βd here).
Exploiting all the variation in the data needed for nonparametric identification can
often lead to much more precise estimates of parametric models. For example, Berry,
Levinsohn, and Pakes (2004a) report that they tried to estimate a related random co-
efficients discrete choice model on micro data from a single market, but found that the
estimates were very noisy. When they added choice-set variation (in the form of “second-
choice” data—see section 6.3), the results became much more precise. This is consistent
with the idea that some form of cross-market data may be important for learning about
the parameters βd and βν .
To estimate all parameters jointly, one could estimate using moment conditions reflect-
ing the score of the likelihood (6.5) with respect to (δ, βd , βν ) together with orthogonality
conditions of the form
$$E\left[(\delta_{jt} - x_{jt}\beta_0 + \alpha_0 p_{jt})\, z_{jt}\right] = 0,$$
where zjt represents the exogenous xjt combined with excluded instruments for pjt . The
latter take the same form as the orthogonality conditions used to estimate from market-
level data, but without excluded instruments beyond those for the price pjt. Again,
this illustrates a substantial advantage of micro data.
In practice estimation using a simulated likelihood function or simulated score func-
tion is sometimes unattractive. One reason is that simulating the log-likelihood (or its
56
We are not aware of results characterizing the essential sources of variation for identification of the
parametric model.
score) with sufficient precision can be computationally demanding, particularly when
some true choice probabilities are close to zero. Train (2009) provides useful discussion
and offers a number of computational tricks. Modern computational tools may expand
the applicability of such approaches. However, a common alternative is to avoid the like-
lihood and rely instead on moment conditions capturing key variation across consumers
and their choices.
Following Berry, Levinsohn, and Pakes (2004a), for example, one can combine mo-
ments reflecting market shares (typically fitting these exactly, as is usually done in the
case of market-level data) with “micro moments” characterizing key features of the joint
distribution of consumer i’s characteristics and the characteristics of her choice j(i).
Typical micro moments include covariances, or conditional expectations of consumer
characteristics given characteristics of the chosen product (or vice versa). As an ex-
ample of the latter, in the case of autos one might use the conditional expectations
of family size, age, and income conditional on the class of automobile (e.g., minivan,
compact, luxury, pickup). This type of one-step method of moments approach again il-
lustrates the more limited reliance on orthogonality conditions when one has micro data:
in essence, aggregate moment conditions that pin down the nonlinear parameters in the
case of market-level data can be replaced by micro moments that are sufficient to identify
(δ, βd , βν ).
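One way to picture a micro moment of the conditional-expectation type is as the gap between a sample average and its model-implied counterpart. The sketch below assumes a matrix of model choice probabilities is already available (for example, from simulation) and uses invented data; the function names and the product classes are hypothetical.

```python
import numpy as np

def observed_conditional_mean(d, choices, product_class, c):
    """Sample mean of consumer characteristic d among consumers choosing class c."""
    return d[product_class[choices] == c].mean()

def predicted_conditional_mean(d, probs, product_class, c):
    """Model-implied E[d | chosen product is in class c], given choice probabilities."""
    p_class = probs[:, product_class == c].sum(axis=1)   # P(consumer i picks class c)
    return (d * p_class).sum() / p_class.sum()

def micro_moment(d, choices, probs, product_class, c):
    """Observed minus predicted conditional mean; zero at a perfect fit."""
    return (observed_conditional_mean(d, choices, product_class, c)
            - predicted_conditional_mean(d, probs, product_class, c))

# Hypothetical data: 4 consumers, 3 products, products 1 and 2 share a class.
d = np.array([1.0, 2.0, 3.0, 4.0])            # e.g., family size
product_class = np.array([0, 1, 1])
choices = np.array([1, 2, 0, 2])
probs_uniform = np.full((4, 3), 1.0 / 3.0)    # a model with no sorting on d
gap = micro_moment(d, choices, probs_uniform, product_class, c=1)
```

A method of moments estimator would stack gaps of this kind with the share-matching and orthogonality conditions and choose parameters to make them small.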
The PyBLP software discussed previously provides code for estimating all parame-
ters from micro data using certain combinations of market-level moments and “micro
moments” like those discussed above. This open source code also provides a template
for adapting the computation to incorporate estimation with other combinations of mo-
ments.57
market into a geographically-defined market m and a time period t. We focus on the
case in which one observes many consumers (“large N ”) on a small number of choice
occasions (“small T ”) with many geographically-defined markets (“large M ”). For sim-
plicity, consider a two-period panel, so that t ∈ {0, 1}. We also focus on the case in
which one views a consumer’s random coefficients as reflecting stable preferences (thus,
fixed across time periods), with only their product-specific shocks $\epsilon_{ijmt}$ drawn anew (and
independently) each time period.
The model’s prediction for the probability that a given consumer i in market m
chooses good j in period 0 and good k in period 1 then takes the form (adapting the
prior notation to the refined notion of markets)
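$$s_{ijkm} = \int \frac{e^{\delta_{jm0} + \mu_{ijm0}(\nu_{im}; \beta_d, \beta_\nu)}}{1 + \sum_{\ell} e^{\delta_{\ell m0} + \mu_{i\ell m0}(\nu_{im}; \beta_d, \beta_\nu)}} \cdot \frac{e^{\delta_{km1} + \mu_{ikm1}(\nu_{im}; \beta_d, \beta_\nu)}}{1 + \sum_{\ell} e^{\delta_{\ell m1} + \mu_{i\ell m1}(\nu_{im}; \beta_d, \beta_\nu)}} \, dF_\nu(\nu_{im}). \tag{6.6}$$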
Replacing j and k with the choices actually made by consumer i then yields the likelihood
contribution for consumer i, as a function of the parameters (δ, βd , βν ).58 Just as in the
case of micro data, the score of this likelihood function could be combined with additional
moment conditions to yield an estimation strategy for all model parameters using the
method of simulated moments and instruments for prices.59
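The joint probability in (6.6) can be simulated by holding each taste draw fixed across the two periods and multiplying the period-specific logit probabilities before averaging. The sketch below does this for a single consumer; the form of µ, the zero utility of the outside good, and all names and values are illustrative assumptions rather than the specification used in any particular application.

```python
import numpy as np

def logit_probs(v_inside):
    """Rows of logit probabilities over (outside good, inside goods); outside utility 0."""
    v = np.concatenate([np.zeros((v_inside.shape[0], 1)), v_inside], axis=1)
    v -= v.max(axis=1, keepdims=True)
    p = np.exp(v)
    return p / p.sum(axis=1, keepdims=True)

def joint_two_period_probs(delta0, delta1, x0, x1, d_i, beta_d, beta_nu, nu_draws):
    """(J+1) x (J+1) simulated probabilities of choosing j at t=0 and k at t=1."""
    taste = beta_d * d_i + beta_nu * nu_draws     # same draw in both periods, shape (R,)
    p0 = logit_probs(delta0[None, :] + taste[:, None] * x0[None, :])
    p1 = logit_probs(delta1[None, :] + taste[:, None] * x1[None, :])
    return np.einsum("rj,rk->jk", p0, p1) / len(nu_draws)

# Hypothetical example with J = 2 inside goods.
rng = np.random.default_rng(0)
nu_draws = rng.normal(size=2000)
delta0, delta1 = np.array([0.3, -0.2]), np.array([0.1, 0.4])
x0 = x1 = np.array([1.0, -1.0])
joint = joint_two_period_probs(delta0, delta1, x0, x1, 0.5, 0.8, 2.0, nu_draws)
no_heterogeneity = joint_two_period_probs(delta0, delta1, x0, x1, 0.5, 0.8, 0.0, nu_draws)
```

With no unobserved taste heterogeneity the joint probabilities equal the product of the period marginals; persistent tastes break this factorization, which is the extra information a panel provides.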
The practical concerns associated with relying on a simulated log-likelihood with mi-
cro data can become more severe with panel data. Even with just two time periods,
the choice probabilities above involve $(J + 1)^2$ combinations of potential choices, and the
true likelihood of some jk combinations observed in the data may, in certain contexts,
be extremely small. Again, one could avoid such problems by instead using micro mo-
ments that incorporate some degree of aggregation. There are many possibilities, but a
typical approach would start from the types of aggregate moments and micro moments
discussed for estimation with micro data, adding moments capturing the extra informa-
tion provided by the consumer panel: relationships between choices of the same consumer
across choice occasions. For example, in the case of data on grocery purchases in a given
product category, one might include as moments the covariances between the measured
characteristics of products selected across shopping trips.
(e.g., within a strategy-proof school choice mechanism).60 In principle, one could see
a consumer’s ranking of all products, although typically one observes only a shortlist of
top choices.
This type of ranked choice data can be thought of as an ideal form of a consumer
panel. One observes, for each consumer, responses to the questions (a) what is your
preferred product among all options? (b) what is your preferred product among all
options excluding your favorite? (c) what is your preferred product among all those
excluding your top two choices, . . . . This is very similar to a consumer panel offering
observation of the same consumer’s choice from multiple choice sets.61 But there are at
least two advantages to ranked choice data. First, the absence of temporal separation can
avoid any question about which stochastic components of the model should be viewed as
fixed across “choice occasions.” Second, the type of “variation in the choice set” provided
by ranked choice data is ideal for assessing which products are closest substitutes. Indeed,
the substitution patterns that we have focused on as a primary challenge of demand
estimation are closely connected to the relationships between first and second choices.
Observing the first and second choices directly is therefore very powerful.62 And, of
course, the relationships among a longer list of ranked choices are driven by the same
components of the model.
Estimation in the case of ranked choices can proceed along the lines suggested for
a consumer panel. A likelihood approach could again be used, although with the same
types of caveats.63 The method of simulated moments again offers alternatives. Here, for
example, one might combine market shares (average choice probabilities for first choices)
with moments characterizing covariance between components of consumer characteristics,
characteristics of first-choice products, and characteristics of first- and second-choice
products (see Berry, Levinsohn, and Pakes (2004a)).
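For the likelihood approach, the probability of an observed ranking has a simple sequential form once one conditions on the consumer's systematic utilities: a logit choice from the full set, then from the set with the first choice removed, and so on. The sketch below illustrates that conditional probability only; the utilities are invented, and in a mixed logit one would average this quantity over draws of the random coefficients.

```python
import numpy as np
from itertools import permutations

def ranking_probability(v, ranking):
    """Probability of an ordered list of top choices, conditional on systematic
    utilities v, assuming i.i.d. type-1 extreme value shocks."""
    v = np.asarray(v, dtype=float)
    remaining = list(range(len(v)))
    prob = 1.0
    for j in ranking:
        expv = np.exp(v[remaining] - v[remaining].max())
        prob *= expv[remaining.index(j)] / expv.sum()   # logit over what is left
        remaining.remove(j)                              # shrink the choice set
    return prob

# Hypothetical utilities for four alternatives (index 0 could be the outside good).
v = np.array([0.0, 0.5, -0.3, 1.2])
total = sum(ranking_probability(v, r) for r in permutations(range(4)))
first_then_second = ranking_probability(v, [3, 1])
```

Passing only the first two ranked alternatives, as in the last line, corresponds to data with first and second choices.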
6.4 Hybrids
A common situation in practice is that one has a combination of multi-market market-
level data and a limited set of micro data or ranked choice data. One prominent example
60
Allenby, Hardt, and Rossi (2019) provide an overview of conjoint analysis. Hastings, Kane, and
Staiger (2010) consider demand for ranked school choices and are followed by a large literature, including
Agarwal and Somaini (2018).
61
For at least some purposes, one will need to assume that the data provide correctly ranked pref-
erences. In conjoint analysis, rankings are typically based on self-reports of hypothetical behavior,
potentially introducing noise in the measured rankings. And in the case of school choice data, even with
a strategy-proof school assignment mechanism, parents might not understand or trust the strategy-proof
nature of the problem they face.
62
See also the discussion in Conlon and Mortimer (2020).
63
In the mixed logit model considered in this section, the joint probability of a consumer’s rankings
factors as a product of standard logit choice probabilities for appropriately defined choice sets, conditional
on (dit , yit , νit ). See, e.g., Train (2009).
in the literature is Petrin’s (2002) study of welfare gains from introduction of a new
product, which combined aggregate market shares for automobiles with a small sample
of micro-data from the Consumer Expenditure Survey (CEX). Another is Goeree’s (2008)
study of advertising and personal computer demand, which combined market shares for
individual models with a limited micro-data sample linking individual characteristics to
computer purchases by brand.64 This kind of data configuration lends itself to the type
of simulated method of moments estimation approach discussed above, combining (a)
BLP-style moments involving aggregate market shares and exogeneity of instruments
and (b) moments defined as the difference between predicted and actual (covariance or
conditional mean) relationships between individual characteristics and characteristics of
the product chosen.
by equations
sijt = σj (dit , yit , xt , pt , ξt ) j = 1, . . . , J. (7.1)
We will interpret sijt as the probability consumer i in market t chooses good j in a
discrete choice demand model, although the arguments generalize to other settings (e.g.,
continuous demand, mixed discrete-continuous demand). As in our previous discussion
of identification, we have conditioned on a given number of goods J. Compared to
the model considered in the case of market-level data, here we have added observed
individual-specific measures (dit , yit ) as determinants of demand. The notation for the
other arguments of σj is as in the preceding sections, with all goods’ prices, product
characteristics, and demand shocks entering the demand for each good j.
Here we will partition (dit, yit) so that dit ∈ ℝ^J and yit ∈ ℝ^H, with H ≥ 0. Thus,
although there is no upper limit on the number of consumer-specific observables, we re-
quire at least as many consumer observables as goods. Furthermore, although we will
require a nonparametric index restriction on the way dit enters the demand model, we can
accommodate other consumer observables yit in an unrestricted way. The observables,
for purposes of considering identification, consist of dit , yit , xt , pt , and choice probabilities
conditional on (dit , yit ) in each market t. In addition, there are observed excluded instru-
ments that we discuss below. We treat the characteristics xt as exogenous, as we have
done in prior sections. To discuss identification we can then condition on an arbitrary
value of xt and suppress xt in the notation below.65
In addition to the required degree of variation in dit , choice sets and price instruments,
the identification results in Berry and Haile (2020) rely on a set of core assumptions on
demand. These play some of the roles of the “index and inversion” assumptions we
discussed in the case of market-level data, although they take a different form. The four
main assumptions are:
(i) For all j, σj(dit, yit, pt, ξt) = σj(γ(dit, yit, ξt), yit, pt), with γ(dit, yit, ξt) ∈ ℝ^J.
(ii) σ (·, yit , pt ) is injective on the support of γ(dit , yit , ξt ) conditional on (yit , pt ).
65 Identification of demand conditional on endogenous xt is an ongoing area of research, related to
Berry and Haile (2020). Some models of endogenous xt may allow one to condition on endogenous xt
and still identify demand responses to ceteris paribus changes in prices.
(iii) requires that the index vector itself be invertible with respect to the vector dit .
This is satisfied automatically in some standard specifications, such as linear random
utility models in which each component dijt of dit affects only the utility of good j.66
Assumption (iii) avoids this requirement but maintains the requirement, given any value
of the demand shocks ξt , of a one-to-one mapping between the vector dit and the index
vector. Finally, Assumption (iv) requires that the index function be additively separable.
As with the identification results of section 5, a key motivation for this restriction is to
allow use of the same instrument relevance condition that is required for identification in
additively separable regression models.
To connect these assumptions to familiar parametric models, consider the mixed-logit
random utility specification67
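and
\[
\mu_{ijt} = \sum_k x_{jt}^{(k)}\left(\beta_{0j}^{(k)} + \beta_{\nu j}^{(k)}\,\nu_{it}^{(k)}\right) - p_{jt}\exp\!\left(\alpha_0 + \alpha_y y_{it} + \alpha_\nu \nu_{it}^{(0)}\right) + \epsilon_{ijt}.
\]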
Thus, recalling that L = J,68 our key assumptions hold (recall that those assumptions
are stated conditional on the suppressed xt, treating xt fully flexibly) as long as the J × J
matrix of coefficients on dit (whose elements are $\sum_k x_{jt}^{(k)} \beta_{dj}^{(\ell,k)}$) is full rank.
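The full-rank requirement can be checked numerically for a given set of coefficients. The sketch below is illustrative only: the nested-list layout `beta_d[j][l][k]` (good j, consumer characteristic l, product characteristic k) and the function names are assumptions made for this example, and the numbers are made up.

```python
def index_coefficient_matrix(x, beta_d):
    """Build the J x J matrix with entries M[j][l] = sum_k x[j][k] * beta_d[j][l][k]."""
    J = len(x)
    return [[sum(xk * bk for xk, bk in zip(x[j], beta_d[j][l]))
             for l in range(J)] for j in range(J)]

def rank(matrix, tol=1e-10):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue  # no usable pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            f = a[i][c] / a[r][c]
            for k in range(c, n_cols):
                a[i][k] -= f * a[r][k]
        r += 1
    return r

# Two goods, two product characteristics (the first is a constant equal to one).
x = [[1.0, 0.7], [1.0, 1.3]]
# "Exclusive" case: d_ij loads only on the constant term of good j's utility.
beta_exclusive = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
# Degenerate case: both consumer characteristics enter every good identically.
beta_degenerate = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]

M_exclusive = index_coefficient_matrix(x, beta_exclusive)
M_degenerate = index_coefficient_matrix(x, beta_degenerate)
```

Under these assumed values, the exclusive case yields the identity matrix (full rank), while the degenerate case yields a rank-one matrix, so the J components of dit would not move the J indices independently.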
66 Such an exclusivity condition is often combined with assumptions of independence and “large support”
to demonstrate identification through a “special regressor” argument (see, for example, Lewbel
(2014)). The results discussed here will not require exclusivity, independence, or large support.
67 This example generalizes the canonical model of section 3.2 by allowing random coefficients on
xjt to vary with j. With L = J, the common special case in which dijt enters only the utility for
good j is then obtained by setting $\beta_{dj}^{(k,\ell)}$ to zero except when k = 1 (recall that the first component
of each xjt is a one) and ℓ = j. Logit specifications allowing choice-specific coefficients on consumer
characteristics are sometimes distinguished from models with choice-specific characteristics by labeling
the former “multinomial logit” and the latter “conditional logit.”
68 Any additional observables are easily incorporated outside the mapping g without further restriction.
The essential requirement is that consumer-level observables have dimension at least J.
7.2 Identification
Here we sketch the identification arguments, proceeding in three steps. First, a combination
of within-market and cross-market variation is exploited to uncover the index function
g : ℝ^J → ℝ^J. Then cross-market variation, including that produced by excluded
instruments for prices, allows identification of the demand shocks ξjt for all goods and
markets in the same way that residuals in a nonparametric regression model are iden-
tified. Finally, with the demand shocks known, identification of demand is immediate
from the definition of demand in (7.1). This argument does not require variation in yit ,
so we henceforth fix yit at an arbitrary value and suppress it in the notation.69
σ (g (d∗ ) + ξ, p) = s.
This d∗ is the vector of consumer characteristics that generate the choice probability
vector s (given (ξt , pt ) = (ξ, p)). So we may write
d∗ (s; ξ, p) .
Note that, because choice probabilities conditional on dit in each market t are observed,
d∗ (s; ξt , pt ) is observed for all t and s ∈ S (ξt , pt ) even though no ξt is observed or known
at this point.
If we differentiate (7.3) within a market t where pt = p and d∗ (s; ξt , p) = d, we obtain
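equal and letting $d' = d^*(s; \xi_{t'}, p)$, we see that
\[
\frac{\partial g(d')}{\partial d} = \frac{\partial g(d)}{\partial d}\,\frac{\partial d^*(s;\xi_t,p)}{\partial s}\left[\frac{\partial d^*(s;\xi_{t'},p)}{\partial s}\right]^{-1}. \tag{7.5}
\]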
The only unknowns in (7.5) are the matrices $\partial g(d')/\partial d$ and $\partial g(d)/\partial d$. Without loss, one may
choose an arbitrary point $d^0$ and set both $g(d^0)$ and $\partial g(d^0)/\partial d$ to arbitrary values.70
Thus, by starting at $d^0$ and stringing together relationships of the form (7.5) covering all
points in the support of dit, one can determine the derivatives of g on the entire support
and integrate these derivatives up to the value of g at all such points.71
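The chain of reasoning behind (7.5) can be sketched in four lines. This is a reconstruction from the index structure σ(g(d) + ξ, p) = s and the injectivity of σ, not a reproduction of the displayed equations; reading the second line as the content of (7.4) is an assumption based on how (7.4) is described in the text, and the Jacobian of d∗ with respect to s is assumed to be invertible.

```latex
\begin{align*}
g\big(d^*(s;\xi_t,p)\big) + \xi_t &= \sigma^{-1}(s;p)
  && \text{(invert } \sigma \text{ at the index)} \\
\frac{\partial g(d)}{\partial d}\,\frac{\partial d^*(s;\xi_t,p)}{\partial s}
  &= \frac{\partial \sigma^{-1}(s;p)}{\partial s}
  && \text{(differentiate in } s \text{ within market } t\text{)} \\
\frac{\partial g(d')}{\partial d}\,\frac{\partial d^*(s;\xi_{t'},p)}{\partial s}
  &= \frac{\partial \sigma^{-1}(s;p)}{\partial s}
  && \text{(same } s \text{ and } p \text{ in market } t'\text{)} \\
\frac{\partial g(d')}{\partial d}
  &= \frac{\partial g(d)}{\partial d}\,
     \frac{\partial d^*(s;\xi_t,p)}{\partial s}
     \left[\frac{\partial d^*(s;\xi_{t'},p)}{\partial s}\right]^{-1}
  && \text{(equate the right-hand sides)}
\end{align*}
```

The key point is that the right-hand side of the second and third lines does not depend on the demand shocks, so it cancels when two markets with the same prices are compared at the same choice probability vector.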
Observe that the function g determines the effects of dit on demand relative to those
of the demand shocks ξjt . Thus, identification of g is a key step toward identification of
effects of dit on demand. Because the latent ξt is fixed within a market and alters the
demand of all consumers in that market, it may be unsurprising that identification of
g would require both within-market and cross-market variation. Our proof in this step,
indeed, uses both types of variation. Equations (7.3) and (7.4) make use of within-market
variation, where ξt is fixed. Equation (7.5) then makes use of cross-market variation.
Although $\partial d^*(s;\xi_t,p)/\partial s$ is observed in each market, equation (7.4) does not, by itself, allow
one to distinguish $\partial g(d)/\partial d$ from $\partial \sigma^{-1}(s,p)/\partial s$. This indeterminacy is resolved by the cross-
market variation used in equation (7.5).
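The relationship in (7.5) can be verified numerically in a toy example. The sketch below assumes J = 2, a plain logit σ with prices held fixed and suppressed, and a hypothetical index function g(d) = (sinh(d1), d2 + d1²) chosen only because its inverse and Jacobian have closed forms. In an actual application g is unknown and the Jacobians of d∗ are the observed objects; here g is known so that both sides of (7.5) can be compared.

```python
import math

def inv_share(s):
    """Logit inversion: sigma_j^{-1}(s) = ln(s_j) - ln(s_0), prices held fixed."""
    s0 = 1.0 - sum(s)
    return [math.log(sj) - math.log(s0) for sj in s]

def g_jac(d):
    """Jacobian of the hypothetical index g(d) = (sinh(d1), d2 + d1**2)."""
    return [[math.cosh(d[0]), 0.0], [2.0 * d[0], 1.0]]

def g_inv(v):
    """Closed-form inverse of the hypothetical g."""
    d1 = math.asinh(v[0])
    return [d1, v[1] - d1 ** 2]

def d_star(s, xi):
    """Consumer characteristics d solving sigma(g(d) + xi) = s."""
    return g_inv([a - b for a, b in zip(inv_share(s), xi)])

def jac_d_star(s, xi, h=1e-6):
    """Central-difference Jacobian of d_star in s (rows: d_i, columns: s_k)."""
    cols = []
    for k in range(2):
        up, dn = list(s), list(s)
        up[k] += h
        dn[k] -= h
        cols.append([(a - b) / (2 * h)
                     for a, b in zip(d_star(up, xi), d_star(dn, xi))])
    return [[cols[k][i] for k in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

s = [0.3, 0.2]                               # common choice probability vector
xi_t, xi_t2 = [0.1, -0.4], [-0.3, 0.5]       # demand shocks in markets t and t'
d, d_prime = d_star(s, xi_t), d_star(s, xi_t2)
rhs = matmul(matmul(g_jac(d), jac_d_star(s, xi_t)), inv2(jac_d_star(s, xi_t2)))
lhs = g_jac(d_prime)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Under these assumptions `max_gap` is zero up to finite-difference error, which shows how knowledge of the derivative of g at one point, together with observed Jacobians of d∗ in two markets, pins down the derivative of g at another point.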
A helpful fact is that higher values of pjt and ξjt typically have opposite effects, whereas
equilibrium pricing typically implies positive dependence between pjt and ξjt . Thus, the
effects of even large variation in the demand shocks may be substantially damped by
the accompanying variation in prices. In any case, this assumption strictly relaxes the
“large support” assumption sometimes relied upon to ensure nonparametric identification
in discrete choice models. A large support assumption on dit would require sufficient
variation to move choice probabilities to every point in the simplex in every market; i.e.,
large support would imply that every vector of nonzero choice probabilities is a common
choice probability. Here only one such vector is required.
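With a common choice probability vector s∗, in every market t we have J inverse
demand equations of the form
\[
g_j\big(d^*(s^*; \xi_t, p_t)\big) = \sigma_j^{-1}(s^*; p_t) - \xi_{jt}, \qquad j = 1, \dots, J. \tag{7.6}
\]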
In each equation, the left-hand side is now known. On the right-hand side, s∗ is fixed across
markets. Thus, each equation (7.6) takes the form of a nonparametric regression equation
with a separable structural error. Identification of the “regression function” $\sigma_j^{-1}(s^*; p_t)$
then follows immediately from the identification result of Newey and Powell (2003) given
instruments for the endogenous variables—pt here—satisfying the standard mean inde-
pendence (exclusion) and completeness (relevance) conditions.
This immediately implies identification of each ξjt as well. With the demand shocks
ξjt known, identification of demand follows immediately from equation (7.1), since choice
probabilities sijt are observed and all arguments of the functions σj (·) are now known.
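The logic of the second step can be illustrated with a deliberately simplified simulation. The sketch below replaces the nonparametric regression function with a linear one, so it shows ordinary linear instrumental variables rather than the Newey and Powell (2003) result itself. The data-generating process, the parameter values, and all names are hypothetical.

```python
import random

def simulate_markets(n, a=1.0, b=-2.0, seed=0):
    """Hypothetical linear design: lhs_t = a + b * p_t - xi_t, where the price
    p_t depends on an excluded instrument z_t and on the demand shock xi_t."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z, xi = rng.gauss(0, 1), rng.gauss(0, 1)
        p = z + 0.8 * xi + rng.gauss(0, 0.5)  # positive dependence on xi
        rows.append((z, p, a + b * p - xi, xi))
    return rows

def mean(u):
    return sum(u) / len(u)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / len(u)

rows = simulate_markets(5000)
z, p, lhs, xi = (list(col) for col in zip(*rows))

b_ols = cov(p, lhs) / cov(p, p)   # biased: p is correlated with xi
b_iv = cov(z, lhs) / cov(z, p)    # uses only instrument-driven price variation
a_iv = mean(lhs) - b_iv * mean(p)

# With the regression function estimated, the demand shocks are the residuals.
xi_hat = [a_iv + b_iv * pt - yt for pt, yt in zip(p, lhs)]
mean_abs_err = mean([abs(e - t) for e, t in zip(xi_hat, xi)])
```

In this design the instrumented slope is close to the true value of -2, the least squares slope is not, and the residuals from the instrumented fit track the simulated demand shocks. This mirrors the argument that identifying the regression function also identifies each ξjt.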
7.3 Discussion
These results provide formal confirmation of a key benefit of micro data suggested in our
discussion of the parametric models: micro data with sufficient variation, combined with
choice sets that vary appropriately across markets, allow us to replace instruments for
quantities with micro-data variation via dit . The need for J-dimensional variation in dit is
directly connected to the J endogenous quantities. Note, however, that the “exogeneity”
of the micro-data variation arises not from an exclusion restriction but from the fact
that within a single market, market-level demand shocks simply do not vary. Thus, with
micro data, one has many of the same advantages that allow “within estimation” of slope
parameters in other types of panel data models.
These results imply that micro data can cut the number of required instruments by
half. Related but distinct is the fact that the BLP instruments are no longer required.
A benefit of the latter is our ability to treat all components of xt in a fully flexible way,
considering a strictly more general model than that considered in section 5 for the case
of market-level data. Of course, as discussed in section 5.4.3, this fully general treatment
of xt also implies that BLP instruments are not even available as candidate instruments.
BLP instruments can be made available by requiring some components of xt to enter
through the indices γt (see Berry and Haile (2020) for details). And the other types
of instruments discussed in section 4.2 remain valid candidates to instrument for prices
here.
Finally, although the formal results address only the case in which micro data variation
fully replaces J instruments, we expect that micro data with more limited variation will
still be useful in practice. There are many applied cases where one has only “partial”
micro data, or micro data with more limited dimension. In these cases, an applied
researcher might employ a combination of BLP-style instruments and moment conditions
reflecting the available micro data, as discussed in section 6.
A third topic of interest is invertibility of demand. Inversion of the demand system
arises repeatedly in our discussions of both identification and estimation. This reflects
a reliance on inversion to address the fundamental challenges of simultaneity and the
appearance of many structural errors (demand shocks) in each demand equation. But
invertibility of a demand system is not automatic, and demand systems often violate
standard conditions used to establish invertibility (i.e., injectivity or “univalence”) of
multivariate maps. Berry, Gandhi, and Haile (2013) discuss this issue and offer invert-
ibility conditions that can accommodate some forms of complementarity. However, in-
vertibility (or alternatives) in the case of complements has not been fully explored.73 The
same is true of invertibility in models in which consumers make multiple purchases. A
related question is the extent to which there are helpful estimation approaches—perhaps
involving partial identification—for settings in which invertibility fails.
The necessity of further progress should not obscure progress made, however. Com-
pared to the situation 10 or so years ago, we now have access to a set of useful identifi-
cation results, greater clarity about the role of instruments and the types of instruments
available for estimating demand, more fully developed asymptotic theory, and a set of
now well-proven computational tools for constructing standard estimators. This progress
suggests that much of the best future research will involve serious application of these
existing tools.
73 For a simple approach allowing a particular kind of complementarity, see also Fosgerau, Monardo,
and De Palma (2020).
References
Ackerberg, D. (2003): “Advertising Learning and Customer Choice in Experience
Good Markets: A Structural Empirical Examination,” International Economic Review,
44, 1007–1040.
Andrews, I. and A. Mikusheva (2020): “Optimal Decision Rules for Weak GMM,”
Working paper, Harvard.
Backus, M., C. Conlon, and M. Sinkinson (2020): “Common Ownership and
Competition in the Ready-to-Eat Cereal Industry,” Working paper, New York Univer-
sity.
Bayer, P., F. Ferreira, and R. McMillan (2007): “A Unified Framework for
Measuring Preferences for Schools and Neighborhoods,” Journal of Political Economy,
115, 588–638.
Bayer, P. and C. Timmins (2007): “Estimating Equilibrium Models of Sorting Across
Locations,” The Economic Journal, 117, 353–374.
Ben-Akiva, M. E. (1973): “Structure of Passenger Travel Demand Models,” Ph.D.
thesis, MIT Department of Civil Engineering.
Benkard, L. and S. T. Berry (2006): “On the Nonparametric Identification of
Nonlinear Simultaneous Equations Models: Comment on Brown (1983) and Roehrig
(1988),” Econometrica, 74, 1429–1440.
Berry, S. (1994): “Estimating Discrete Choice Models of Product Differentiation,”
RAND Journal of Economics, 25, 242–262.
Berry, S., M. Carnall, and P. Spiller (1996): “Airline Hubs: Costs, Markups
and the Implications of Consumer Heterogeneity,” Working paper no. 5561, NBER.
Berry, S., J. Levinsohn, and A. Pakes (1995): “Automobile Prices in Market
Equilibrium,” Econometrica, 63, 841–890.
——— (1999): “Voluntary Export Restraints on Automobiles: Evaluating a Strategic
Trade Policy,” American Economic Review, 89, 189–211.
——— (2004a): “Differentiated Products Demand Systems from a Combination of Micro
and Macro Data: The New Vehicle Market,” Journal of Political Economy, 112, 68–
105.
Berry, S., O. Linton, and A. Pakes (2004b): “Limit Theorems for Differentiated
Product Demand Systems,” Review of Economic Studies, 71, 613–654.
Berry, S. T., A. Gandhi, and P. A. Haile (2013): “Connected Substitutes and
Invertibility of Demand,” Econometrica, 81, 2087–2111.
Berry, S. T. and P. A. Haile (2014): “Identification in Differentiated Products
Markets Using Market Level Data,” Econometrica, 82, 1749–1797.
——— (2018): “Nonparametric Identification of Simultaneous Equations Models with a
Residual Index Structure,” Econometrica, 86, 289–315.
——— (2020): “Nonparametric Identification of Differentiated Products Demand Using
Micro Data,” Working paper no. 27704, National Bureau of Economic Research.
Bhattacharya, D. (2018): “Empirical Welfare Analysis for Discrete Choice: Some
General Results,” Quantitative Economics, 9, 571–615.
Bresnahan, T. (1981): “Departures from Marginal Cost Pricing in the American Au-
tomobile Industry,” Journal of Econometrics, 17, 201–227.
——— (1987): “Competition and Collusion in the American Automobile Oligopoly: The
1955 Price War,” Journal of Industrial Economics, 35, 457–482.
DellaVigna, S. and M. Gentzkow (2019): “Uniform Pricing in U.S. Retail
Chains,” The Quarterly Journal of Economics, 134, 2011–2084.
Dubé, J.-P., J. T. Fox, and C.-L. Su (2012): “Improving the Numerical Performance
of Static and Dynamic Aggregate Discrete Choice Random Coefficients Demand Estimation,”
Econometrica, 80, 2231–2267.
Dubé, J.-P., G. Hitsch, and P. Rossi (2010): “State Dependence and Alternative
Explanations for Consumer Inertia,” RAND Journal of Economics, 41, 417–445.
Eizenberg, A. (2014): “Upstream Innovation and Product Variety in the United States
Home PC Market,” Review of Economic Studies, 81, 1003–1045.
Gandhi, A., Z. Lu, and X. Shi (2019a): “Estimating Demand for Differentiated
Products with Zeroes in Market Share Data,” Working paper, University of Wisconsin-
Madison.
Gentzkow, M. and J. Shapiro (2010): “What Drives Media Slant? Evidence from
U.S. Newspapers,” Econometrica, 78, 35–71.
Gillen, B. J., H. R. Moon, S. Montero, and M. Shum (2019): “BLP2-Lasso for
Aggregate Discrete Choice Models with Rich Covariates,” The Econometrics Journal,
22, 262–281.
Gillen, B. J., H. R. Moon, and M. Shum (2014): “Demand Estimation with High-
dimensional Product Characteristics,” in Advances in Econometrics: Bayesian Model
Comparison, ed. by I. Jeliazkov and D. Poirier, Emerald Publishing, vol. 34.
Goeree, M. S. (2008): “Limited Information and Advertising in the US Personal Com-
puter Industry,” Econometrica, 76, 1017–1074.
Goldberg, P. K. (1995): “Product Differentiation and Oligopoly in International Mar-
kets: The Case of the U.S. Automobile Industry,” Econometrica, 63, 891–951.
Gowrisankaran, G. and M. Rysman (2012): “Dynamics of Consumer Demand for
New Durable Goods,” Journal of Political Economy, 120, 1173–1219.
Handel, B. (2013): “Adverse Selection and Inertia in Health Insurance Markets: When
Nudging Hurts,” American Economic Review.
Hastings, J., T. Kane, and D. Staiger (2010): “Heterogeneous Preferences and
the Efficacy of Public School Choice,” Tech. rep., Brown University.
Hausman, J., G. Leonard, and J. Zona (1994): “Competitive Analysis with Dif-
ferentiated Products,” Annales d’Economie et de Statistique, 34, 159–180.
Hausman, J. and D. Wise (1978): “A Conditional Probit Model for Qualitative
Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Prefer-
ences,” Econometrica, 46, 403–426.
Hausman, J. A. (1996): “Valuation of New Goods under Perfect and Imperfect Com-
petition,” in The Economics of New Goods, ed. by T. F. Bresnahan and R. J. Gordon,
Chicago: University of Chicago Press, chap. 5, 209–248.
Heckman, J. J. (1981): “Heterogeneity and State Dependence,” in Studies in Labor
Markets, ed. by S. Rosen, University of Chicago Press.
Hendel, I. and A. Nevo (2006): “Sales and consumer inventory,” The RAND Journal
of Economics, 37, 543–561.
Ho, K. (2009): “Insurer-Provider Networks in the Medical Care Market,” American
Economic Review, 99, 393–430.
Hom, M. (2018): “School Choice, Segregation and Access to Quality Schools: Evidence
from Arizona,” Working paper, Yale University.
Hong, H., H. Li, and J. Li (2020): “BLP estimation using Laplace transformation
and overlapping simulation draws,” Journal of Econometrics.
Imbens, G. W. and W. K. Newey (2009): “Identification and Estimation of Trian-
gular Simultaneous Equations Models Without Additivity,” Econometrica, 77, 1481–
1512.
Kim, K. and A. Petrin (forthcoming): “Control Function Corrections for Unobserved
Factors in Differentiated Products Models,” Quantitative Marketing and Economics.
Knittel, C. R. and K. Metaxoglou (2014): “Estimation of Random-Coefficient
Demand Models: Two Empiricists’ Perspective,” Review of Economics and Statistics,
96, 34–59.
Koopmans, T. C. (1949): “Identification Problems in Economic Model Construction,”
Econometrica, 17, 125–144.
Lewbel, A. (2014): “An Overview of the Special Regressor Method,” in The Oxford
Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics,
ed. by J. S. Racine, L. Su, and A. Ullah, Oxford University Press, 38–62.
MacKay, A. and N. H. Miller (2021): “Estimating Models of Supply and Demand:
Instruments and Covariance Restrictions,” Tech. rep., Harvard.
Mas-Colell, A., M. D. Whinston, and J. R. Green (1995): Microeconomic The-
ory, Oxford University Press.
Matzkin, R. L. (2003): “Nonparametric Estimation of Nonadditive Random Func-
tions,” Econometrica, 71, 1339–1375.
——— (2008): “Identification in Nonparametric Simultaneous Equations,” Economet-
rica, 76, 945–978.
——— (2015): “Estimation of Nonparametric Models with Simultaneity,” Econometrica,
83, 1–66.
McFadden, D. (1974): “Conditional Logit Analysis of Qualitative Choice Behavior,”
in Frontiers of Econometrics, ed. by P. Zarembka, New York: Academic Press.
——— (1978): “Modelling the Choice of Residential Location,” in Spatial Interaction
Theory and Planning Models, ed. by A. Karlqvist, Amsterdam: North Holland, 75–96.
——— (1981): “Econometric Models of Probabilistic Choice,” in Structural Analysis of
Discrete Data with Econometric Applications, ed. by C. Manski and D. McFadden,
Cambridge, MA: MIT Press.
McFadden, D., A. Talvitie, and Associates (1977): Demand Model Estimation
and Validation, Berkeley CA: Institute of Transportation Studies.
Miller, N. H. and M. C. Weinberg (2017): “Understanding the Price Effects of the
MillerCoors Joint Venture,” Econometrica, 85, 1763–1791.
Neilson, C. (2020): “Targeted Vouchers, Competition Among Schools, and the Aca-
demic Achievement of Poor Students,” Working paper, Princeton University.
Petrin, A. (2002): “Quantifying the Benefits of New Products: The Case of the Mini-
van,” Journal of Political Economy, 110, 705–729.
Rosse, J. N. (1970): “Estimating Cost Function Parameters without using Cost Func-
tion Data: An Illustrated Methodology,” Econometrica, 38, 256–275.
Train, K. E. (2009): Discrete Choice Methods with Simulation, Cambridge University Press,
2nd ed.