Berry and Haile (2021) WP - Foundations of Demand Estimation

NBER WORKING PAPER SERIES
FOUNDATIONS OF DEMAND ESTIMATION
Steven T. Berry
Philip A. Haile
Working Paper 29305

http://www.nber.org/papers/w29305
NATIONAL BUREAU OF ECONOMIC RESEARCH

1050 Massachusetts Avenue
Cambridge, MA 02138
September 2021
We benefited from detailed comments from Chris Conlon, Jean-Pierre Dubé, Aviv Nevo, and
Frank Verboven. Jaewon Lee provided capable research assistance. The views expressed herein
are those of the authors and do not necessarily reflect the views of the National Bureau of
Economic Research.
NBER working papers are circulated for discussion and comment purposes. They have not been
peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies
official NBER publications.
© 2021 by Steven T. Berry and Philip A. Haile. All rights reserved. Short sections of text, not to
exceed two paragraphs, may be quoted without explicit permission provided that full credit,
including © notice, is given to the source.
JEL No. C36,D12,L20
ABSTRACT
Demand elasticities and other features of demand are critical determinants of the answers to most
positive and normative questions about market power or the functioning of markets in practice.
As a result, reliable demand estimation is an essential input to many types of research in
Industrial Organization and other fields of economics. This chapter presents a discussion of some
foundational issues in demand estimation. We focus on the distinctive challenges of demand
estimation and strategies one can use to overcome them. We cover core models, alternative data
settings, common estimation approaches, the role and choice of instruments, and nonparametric
identification.
Steven T. Berry
Department of Economics
Yale University
Box 208264
37 Hillhouse Avenue
New Haven, CT 06520-8264
and NBER
Philip A. Haile
Department of Economics
Yale University
37 Hillhouse Avenue
P.O. Box 208264
New Haven, CT 06520
and NBER
Contents
1 Introduction
1.1 Why Estimate Demand?
1.2 Our Focus
2 The Challenges of Demand Estimation
2.1 The First Fundamental Challenge
2.2 The Second Fundamental Challenge
2.3 Demand Is Not Regression
2.4 A Surprisingly Difficult Case: Exogenous Prices
2.5 Many Common Tools Fall Short
2.6 Balancing Flexibility and Practicality
2.7 Demand or Utilities?
3 Discrete Choice Demand
3.1 Random Utility Models
3.2 The Canonical Model
3.3 Why Random Coefficients?
4 Market-Level Data
4.1 The BLP Estimator
4.2 Instruments
4.3 Using a Supply Side
4.4 Computing the BLP Estimator and Standard Errors
5 Nonparametric Identification: Market-Level Data
5.1 Insights from Parametric Models
5.2 Nonparametric Demand Model
5.3 Identification via Instruments
5.4 Discussion
6 Micro Data, Panels, and Ranked Choices
6.1 Micro Data
6.2 Consumer Panels
6.3 Ranked Choice Data
6.4 Hybrids
7 Nonparametric Identification with Micro Data
7.1 Nonparametric Demand Model
7.2 Identification
7.3 Discussion
8 Some Directions for Future Work
1 Introduction
1.1 Why Estimate Demand?
Little can be said about the functioning of a market without a quantitative assessment of
demand. In the modern Industrial Organization (“IO”) literature, for example, estimation
of demand elasticities is essential for measuring markups and quantifying sources of
market power. And almost any counterfactual question about a market requires quantitative
measures of how choices respond to ceteris paribus changes in the prices or other
characteristics of the available options. Such measures seldom suffice on their own to
provide full answers to the economic questions of interest concerning market outcomes,
but they are often necessary. Examples include assessments of
• the impact of a tax or subsidy;
• the social value of a new good;
• the effect of a tariff on prices;
• the effect of a merger on consumer prices;
• outcomes under alternative school choice regimes;
• the impact of adverse selection on insurance markets; and
• any quantitative question (and many qualitative questions) concerning consumer
welfare.
Of course, the importance of measuring demand is not limited to the realm of IO. As these
examples suggest, demand is central to substantive policy questions in public economics,
trade, health, education, and other fields.
The problem of demand estimation is not new. However, it is both challenging and
subject to some common misconceptions. It has also been the focus of substantial attention
in recent years. Critical contributions have come from scholars in a range of fields.
But IO economists often have led the way to the recent progress in demand estimation.
One reason is the essential role that demand plays in questions of competition, market
power, and market outcomes—questions at the heart of the field. The tradition in modern
IO has also been to insist on trying to measure what is important to its core questions,
rather than focusing on what is easy to measure given current data and estimation techniques.
Determining what to measure and how to do so often requires guidance from an
economic model, particularly in the context of imperfect competition and market equilibrium.
This has led to a strong connection between theory and empirical work in IO.
All of this has forced IO economists to face some difficult issues associated with demand
estimation and to press for new solutions.
1.2 Our Focus
Our goal in this chapter is to provide a unified and (reasonably) compact treatment of
some key ideas and practices developed in several literatures. Our emphasis naturally
reflects our own perspectives. We limit our focus to a set of foundational issues: the distinctive
challenges of demand estimation; the most common empirical models of demand
in IO; the role and choice of instrumental variables; different data types (market-level
data, micro data, panel data); and nonparametric identification. This focus implies that
we neglect many other important issues, variations of the standard models, and applications.
The chapter of Gandhi and Nevo in this Handbook has a different and highly
complementary focus.
We begin with a discussion of the special challenges posed by the problem of demand
estimation. We then review the classic discrete choice model of demand and move on
to estimation and identification using market-level data. We next turn to various forms
of “micro” (consumer-level) data and the advantages such data can offer. We conclude
with a discussion of directions for future research.
2 The Challenges of Demand Estimation

Demand estimation is surprisingly difficult. At the heart of the challenge is the need to
measure responses of quantities demanded to ceteris paribus changes in prices or other
factors. The most basic requirements for credible estimation of such effects are standard
in empirical economics; for example, one needs sufficiently flexible functional forms, valid
sources of exogenous variation, and sufficient account for unobserved individual-level
heterogeneity. But some distinctive challenges of demand estimation appear when one
acknowledges the presence of unobservables that affect demand even when aggregating
to the level of the product and market. The effects of such unobserved “demand shocks”
arise through elementary economics and lead to the well-recognized problem of price
endogeneity. Critically, the relevant demand shocks generally must be held fixed to
measure the demand elasticities and other demand responses of economic interest. This
alone rules out some common empirical approaches to estimation with endogeneity, even
in the hypothetical case of a single-good economy. And the challenges become more
severe in the presence of complements or substitutes with their own prices and demand
shocks—i.e., in most real applications. In such settings, each good’s demand depends
on multiple endogenous prices and multiple demand shocks. As we discuss below, even
when one’s interest is limited to demand for a single good, these demand shocks introduce
complications that are outside the realm of everyday empirical economics.
2.1 The First Fundamental Challenge

A primary challenge in demand estimation is the endogeneity of prices; i.e., statistical
dependence between prices and unobservables that also affect demand. To illustrate,
imagine a perfectly competitive market with a single good (no complements or substitutes),
where demand and supply are characterized by the pair of simultaneous equations
Q = D(X, P, U ) (2.1)
P = C(W, Q, V ). (2.2)
Here (2.1) represents demand while (2.2) represents supply. Q and P denote quantity
and price; X and W denote observable demand shifters and cost shifters, respectively;
and the “error terms” U ∈ R and V ∈ R denote unobserved demand shifters and cost
shifters (shocks to demand and marginal cost), respectively. Assume that only prices and
quantities are endogenous, i.e., that (X, W ) ⊥⊥ (U, V ).
The presence of the latent demand shifters and cost shifters (U, V ) is, of course, the
reason there is a simultaneity/endogeneity “problem” in the identification/estimation
of demand.1 If there were no unobservable U affecting demand, identification would
be trivial: the directly observed relationship between (X, P ) and Q would itself be the
demand relationship. In that case, the notion that the price P could be econometrically
endogenous would be nonsense: there would be no unobservable whose variation could
be confounded with that of P .
Of course, even when one acknowledges the existence of latent demand shocks U , the
solution to the endogeneity problem in this special setting is well understood. Identification
of demand can then be obtained through cost shifters that are excluded from
the function D (i.e., elements of W ⊄ X) satisfying an appropriate independence condition
with respect to U . In fact, nonparametric identification of demand in this case
follows immediately from the same instrumental variables (“IV”) conditions that yield
identification in the case of a regression model with endogeneity.2
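
As a hedged illustration of this special case, the sketch below simulates a linear version of (2.1) and (2.2) and compares a least squares slope with an instrumental variables slope that uses the cost shifter W as the excluded instrument. The linear functional forms, the coefficient values, and the function names are assumptions made for this example only; they are not taken from the chapter.

```python
import numpy as np


def simulate_market(n, seed=0):
    """Draw equilibrium (Q, P) from an assumed linear supply and demand system.

    Assumed demand:         Q = 10 - 1.0 * P + U
    Assumed inverse supply: P = 2 + 0.5 * Q + W + V
    W is an observed cost shifter excluded from demand; (U, V, W) are independent.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)  # unobserved demand shock
    V = rng.normal(size=n)  # unobserved cost shock
    W = rng.normal(size=n)  # observed cost shifter (the instrument)
    # Solve the two equations for the equilibrium price, then read off quantity.
    P = (7.0 + 0.5 * U + W + V) / 1.5
    Q = 10.0 - P + U
    return Q, P, W


def ols_slope(y, x):
    """Slope from a least squares regression of y on x (with intercept)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)


def iv_slope(y, x, z):
    """Simple IV slope using a single instrument z for x."""
    return np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]


Q, P, W = simulate_market(200_000)
ols = ols_slope(Q, P)
iv = iv_slope(Q, P, W)
print(f"assumed demand slope: -1.000, OLS: {ols:.3f}, IV: {iv:.3f}")
```

In this assumed design the equilibrium price is positively related to U, so the least squares slope is pulled toward zero, while the IV slope is close to the assumed value of -1.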
There is at least one important sense in which the optimistic message from this
special case is correct: as we will see below, the essential requirements for identification
of demand are indeed standard types of “clean” variation, as from instrumental variables.3
However, this special case can also be highly misleading. It may suggest that the challenge
to identification of demand is “merely” the endogeneity of price. And given the wide
attention to endogeneity in empirical economics, one might conclude that estimation of
demand could proceed using any number of common tools for measuring the (causal)
effects of endogenous covariates on outcomes of interest. As we explain below, both of
these conclusions would be incorrect.

1 Unless marginal cost (inverse supply) is constant with respect to Q, dependence between P and U
would arise even if U and V were assumed independent. Typically one will expect correlation between
latent factors affecting firm costs (V ) and those affecting consumer demand (U ). And outside the
realm of perfect competition, prices will generally depend on demand elasticities, yielding (functional)
dependence between prices and demand shocks (the latter affecting elasticities) even in the special case
of constant marginal cost and independence between cost shocks and demand shocks.
2 See Newey and Powell (2003) in the case of nonparametric D that is additively separable in U ,
or Chernozhukov and Hansen (2005) (Theorem 4) when additive separability is replaced with strict
monotonicity in U .
3 Related to the instrumental variables literature is the literature on restrictions on the matrix of
covariances across equations. For example, one might be willing in some contexts to assume that the
demand and supply unobservables are uncorrelated. Full or partial identification using such restrictions
dates back to Koopmans (1949). A recent example in the oligopoly context is MacKay and Miller (2021).
2.2 The Second Fundamental Challenge

A second (and less familiar) fundamental challenge is the fact that demand for any one
good generally depends on more than one latent demand shock.
Demand for a given good, of course, typically depends on the prices and characteristics
of all related goods. Indeed, when students first learn the elementary supply and demand
model, they are taught that demand for one good cannot be considered in isolation. For
example, a change in the price (or quality) of a substitute or complement will cause
demand to shift. But in this regard there is nothing special about the observability (to
us) of the prices or characteristics of related goods: demand for the good of interest also
shifts in response to variation in the unobserved demand shifters associated with other
goods. This is true regardless of whether supply is assumed to be perfectly competitive.
Thus, elementary economics tells us that the demand for any one good will in general
depend on the prices, observed characteristics, and demand shocks of that good and all
related goods. None of these factors can, in general, be excluded from the demand for
a given good. Such factors are, therefore, among the “all else” that is held fixed when
defining a ceteris paribus effect, such as that of a change in one price. This creates
further challenges for demand estimation. It implies that many familiar econometric
tools cannot be used to estimate demand unless one is willing to rely on strong functional
form assumptions for identification. For example, demand estimation typically cannot
be treated as standard regression analysis.
2.3 Demand Is Not Regression

Suppose there are J interrelated goods in a market of interest. Then demand for each
good j = 1, . . . , J takes the form
Qj = Dj (X, P, U ) , (2.3)
where now X = (X1 , . . . , XJ ) , P = (P1 , . . . , PJ ) , U = (U1 , . . . , UJ ).

Notice that J structural errors—the demand shocks (U1 , . . . , UJ ) associated with all
goods—enter on the right-hand side of (2.3). In general, this is not a regression equation.4
For example, having valid instruments for all J prices will not generally suffice
for identification of Dj or the ceteris paribus effects of price changes. We will see this
formally in later sections of this chapter; but it is clear that results for identification of
a regression function with a scalar structural error (see, e.g., footnote 2) are not directly
applicable.

4 A regression equation is most often specified in the separable form Y = f (X) + E (e.g., Newey and
Powell (2003)), leading to mean regression. Alternatively, a quantile regression model takes the form
Y = f (X, E), with f strictly increasing in the scalar E (e.g., Chernozhukov and Hansen (2005)).
In general, econometric models with multiple structural errors in each equation are
much more challenging than regression models. However, such models are not foreign
to econometrics. For example, they arise in standard models of treatment effects (e.g.,
Angrist and Imbens (1995)) and in particular representations—typically the reduced
forms—of standard simultaneous equations models (e.g., Brown (1983) and Benkard and
Berry (2006)). These examples, in fact, hint at both the problems created by multiple
structural errors and how one might make progress.
In empirical settings with endogeneity and multiple unobservables, economists often
settle for estimation of particular weighted average responses (e.g., a local average
treatment effect); but this is a compromise poorly suited to the economic questions that
motivate demand estimation, as these typically require the levels and slopes of demand
at specific points. To make progress, the IO literature on demand estimation has used
tools more familiar in the literature on simultaneous equations models: first “inverting”
the system of equations, then relying on instrumental variables. Of course, in the case
of simultaneous equations models,5 one typically expects to need instruments (or other
sources of exogenous variation) for all endogenous variables: here, all J prices and all J
quantities. Indeed, although this is sometimes not fully appreciated, the IO literature has
developed strategies exploiting sources of independent variation in prices and quantities
even when estimating demand alone. We return to each of these points below.
2.4 A Surprisingly Difficult Case: Exogenous Prices

To appreciate the distinctive challenge created by the structural errors on the right-
hand side of (2.3) it is useful to consider a hypothetical experimental setting. Suppose a
researcher is able to randomly assign price vectors (p1t , . . . , pJt ) in many large markets t
and observe the resulting quantities (q1t , . . . , qJt ) demanded in each market.6 It may be
natural to imagine that identification of demand would be trivial in this case. It is not.
The problem is that the economic quantities of interest—demand elasticities being
the leading case—require that prices be varied while holding all else fixed. Among the
things that must be held fixed are the demand shocks. Assigning prices for each market
at random can avoid dependence between the demand shocks and prices, but it will not
hold the demand shocks fixed.
The observed variation in quantities with the randomized variation in prices will reveal
certain averages of demand responses—integrating over the vector of demand shocks
(U1 , . . . , UJ ) to reveal a type of local average treatment effect. But in general, such averages
are of very limited value in the case of demand, as they do not reveal any elasticity
of demand (or other standard notion of a demand response)—not at the observed prices
and quantities or any other known point. They therefore do not allow one to quantify
demand responses to counterfactuals of interest (e.g., outcomes following a merger), to
predict pass-through of a tax, or to infer firm markups through equilibrium pricing conditions.
We return to this issue in section 2.5.3. In short, however, such averages offer at
most a descriptive feature of demand, not the primary quantities of economic interest.

5 See, e.g., Matzkin (2008, 2015), Blundell, Kristensen, and Matzkin (2017), and Berry and Haile
(2018).
6 Note that we assume the quantity observed is the quantity demanded rather than the quantity
supplied at the exogenously set price. This need not be the case when a price is imposed exogenously—
say, by random variation around a regulatory threshold—but could arise, e.g., through a marketing
experiment.
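
The averaging described above can be illustrated with a small simulation. This is only a sketch under assumed functional forms: the demand equation, the distributions of the two shocks, and all variable names are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two demand shocks: U1 shifts the level of demand for good 1, while U2
# (for instance, a shock tied to a related good) changes its price sensitivity.
U1 = rng.normal(size=n)
U2 = rng.uniform(0.0, 2.0, size=n)

# Prices are randomly assigned, so they are independent of (U1, U2).
P1 = rng.uniform(1.0, 3.0, size=n)

# Assumed demand for good 1; its ceteris paribus price slope is -(1 + U2).
Q1 = 10.0 + U1 - (1.0 + U2) * P1
true_slopes = -(1.0 + U2)

# Regressing Q1 on the randomized price recovers only the average slope.
avg_slope_hat = np.cov(Q1, P1)[0, 1] / np.var(P1, ddof=1)

print(f"estimated slope: {avg_slope_hat:.3f}")
print(f"range of market-level slopes: [{true_slopes.min():.2f}, {true_slopes.max():.2f}]")
```

In this assumed design the estimated slope is close to -2, the mean of the market-level slopes, even though the slope in any particular market lies anywhere between -3 and -1. Randomization removes the dependence between price and the shocks, but it does not tell us the value of the shocks in a given market, so the slope at a given observed point is not revealed.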
Of course, if experimental variation in prices does not allow identification of demand,
it should be clear that “quasi-experimental” variation cannot suffice. One important
implication is that instruments for prices (or other quasi-experimental variation in prices)
generally cannot by themselves deliver identification of demand. This point is often not
fully appreciated, but is essential to motivating the strategies relied on in the leading
work on demand estimation.
One possible path to resolving this challenge is to impose functional form restrictions.
For example, suppose that the demand function on the right-hand side of (2.3) is
restricted to take the form
Dj (X, P, U ) = Dj (X, P, Ej (U )) (2.4)
where the index Ej (U ) is a scalar and the function Dj is strictly increasing in Ej (U ).

This structure would arise, for example, if demand for good j is linear in each of the
demand shocks U1 , . . . , UJ . In this case, as in Matzkin (2003), the τ quantile of the
distribution of Qj |X, P reveals Dj (X, P, Ej (U )) for Ej (U ) fixed at its τ quantile. Thus,
in this special case, identification of demand can indeed be obtained when P ⊥⊥ U , as
when P is experimentally controlled by the researcher. Similarly, instruments for (all)
prices P could allow identification.
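
The quantile argument under the index restriction (2.4) can be checked numerically. The sketch below assumes a particular demand function that is strictly increasing in a scalar index of two shocks, with prices assigned at random; the functional form, parameter values, and names are illustrative assumptions, not the chapter's specification.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
tau = 0.75
b = 0.5  # assumed price coefficient


def demand(p, e):
    """Assumed demand, strictly increasing in the scalar index e."""
    return np.exp(2.0 - b * p + e)


# Two demand shocks enter only through the scalar index E = U1 + 0.5 * U2.
U1 = rng.normal(size=n)
U2 = rng.normal(size=n)
E = U1 + 0.5 * U2

# Prices are assigned at random from two values, independently of the shocks.
P = rng.choice([1.0, 2.0], size=n)
Q = demand(P, E)

# The tau quantile of Q given P = p should match demand at the tau quantile of E.
e_tau = np.quantile(E, tau)
q_tau = {p: np.quantile(Q[P == p], tau) for p in (1.0, 2.0)}
for p in (1.0, 2.0):
    print(f"p = {p}: conditional quantile {q_tau[p]:.3f}, demand at e_tau {demand(p, e_tau):.3f}")

# Price response holding the index fixed at its tau quantile.
implied_b = -np.log(q_tau[2.0] / q_tau[1.0])
print(f"implied price coefficient: {implied_b:.3f}")
```

Comparing conditional quantiles across the two price levels recovers the assumed price coefficient because the index is held at the same quantile in both groups. This relies entirely on the shocks entering through a single monotone index.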
Of course, restricting the vector of demand shocks to affect demand for good j monotonically
through a scalar Ej (U ) involves a strong functional form restriction—one ruled
out even by common parametric demand specifications like the multinomial probit,
multinomial logit, or CES models.7 And without this index restriction an experiment (or
instrument for the prices P ) generally will not suffice to allow identification of demand. As
we will see in later sections of this chapter, identification of demand can still be obtained
using additional sources of variation.
7
The notable exception of demand that is linear in all demand shocks (own and other) points out a
potentially unappreciated implication of a linear specification: a reduction in the dimension of exogenous
variation required for identification.
2.5 Many Common Tools Fall Short
Our discussion has already suggested that a solution to the challenge of demand estimation
can be obtained using instrumental variables, potentially with other sources of
independent variation in prices or quantities. Most of our chapter focuses on specific
parametric and nonparametric instrumental variables approaches to identification and
estimation, using insights from several literatures in IO and econometrics. One may
wonder whether the complexity of these strategies is necessary—specifically, whether
simpler or more familiar tools of empirical economics might offer viable alternatives. In
fact, many common empirical tools in economics—including various “research designs”
useful for measuring (causal) effects of endogenous variables in other contexts—are not
applicable to demand, at least without significant compromises or developments beyond
the current methodological frontier.
We briefly discuss some of these alternative empirical approaches below. Because
critical shortcomings are evident even in the idealized single-good supply and demand
model given by (2.1) and (2.2), we focus primarily on this case to illustrate.
2.5.1 Controls, Including Fixed Effects

Because the problem of endogeneity arises from the presence of unobservables, a natural
approach is to look for “controls” that will eliminate this problem. This can work, but
the requirements are strong. First, the controls must absorb all effects of the unobservables
on demand. Second, the controls cannot absorb all the variation in prices. This
second requirement is a kind of exclusion restriction: there must be sufficient sources
of price variation that are not included in the demand equation controls. Among other
consequences is that the same set of controls cannot be used for both demand and supply
estimation.
Note that if all demand shifters are observed and included in X in the demand
equation (2.1), then we must observe a perfect fit/prediction of demand at all (X, P ) (up
to any measurement error in the quantity demanded). This is a strong requirement; but
without it we are left with unobserved demand shifters and the problem of simultaneously
determined (endogenous) prices.
Fixed effects are sometimes considered to be an attractive set of controls for precisely
the reason that they might “control for everything.” However, the requirements for a
fixed effects strategy to solve the endogeneity problem are no different. It remains critical
that while fixed effects control for everything affecting demand, they do not control for
everything that affects price. This is a point worth emphasizing because the presence of
latent demand shocks at the level of the product or market might suggest a fixed effects
approach to controlling for these unobservables. However, a product-level fixed effect will
not suffice if demand shocks vary by both product and market, as typically assumed.8
8
Thus, for example, Nevo (2001) incorporates product fixed effects but still has demand shocks for
each product×market combination, each representing the deviations of the product-level unobservable
in that market from its overall mean.
Yet if the price of a given good is the same for all consumers in a given market (often
this is the definition of “market”), a fixed effect for each product×market will leave no
variation in price, making it impossible to measure demand elasticities or to connect
demand to standard notions of aggregate welfare.
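
The loss of price variation under product×market fixed effects can be seen in a few lines. The sketch below is a toy construction with hypothetical dimensions and names; it treats fixed effects as within-group demeaning, which is the standard equivalence for a saturated set of group dummies.

```python
import numpy as np

rng = np.random.default_rng(3)
n_products, n_markets, n_consumers = 3, 4, 50

# One price per product-market pair, shared by every consumer in that market.
price = rng.uniform(1.0, 5.0, size=(n_products, n_markets))
price_obs = np.repeat(price[:, :, None], n_consumers, axis=2)

# Product-by-market fixed effects: demean within each product-market cell.
within_cell = price_obs - price_obs.mean(axis=2, keepdims=True)

# Product fixed effects only: demean each product's price across markets.
within_product = price - price.mean(axis=1, keepdims=True)

print("max |price| left after product x market effects:", np.abs(within_cell).max())
print("price variance left after product effects only:", within_product.var())
```

With a fixed effect for every product×market cell, the residual price is zero up to floating point error, so no price coefficient can be estimated. Product effects alone leave cross-market price variation, but then product×market demand shocks remain uncontrolled.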
In a panel data set covering products within geographic markets across time, it is
feasible to consider fixed effects for products across markets (held fixed over time) plus
time effects (held fixed across product/markets). But for this to serve as a valid approach
to the problem of endogenous prices, the resulting model must fit the data perfectly, up
to measurement error in the observed quantities. And, again, the same set of fixed effects
could not be used to estimate supply, because we require additional sources of supply-side
price variation (not just measurement error).
2.5.2 Control Function

“Control function” approaches are popular and have close ties to IV approaches.9 These
approaches begin with “triangular” models of the form
T = F (Z, E1 ) (2.5)
Y = G(T, E2 ), (2.6)
where (2.6) represents an outcome equation of interest and (2.5) is a reduced form for the
endogenous (“treatment”) variable T appearing on the right-hand side of (2.6).10 Here
E1 is assumed to be a scalar, with F strictly increasing in E1 . In such contexts, a control
function can be used to treat the endogeneity of T in the outcome equation, allowing
identification of G if E2 is also a scalar (or, otherwise, identification of certain average
effects).
One may be tempted to view demand as fitting this triangular structure, with price
being the endogenous variable T affecting the quantity demanded, Y . But demand
generally cannot be represented in this form. Even in the perfectly competitive
single-good economy characterized by (2.1) and (2.2), the reduced form for the equilibrium
price takes the form
P = R(X, W, U, V ). (2.7)
This does not take the form of (2.5). Critically, the right-hand side of (2.7) depends
on all structural errors—here, the scalar demand shock U and scalar cost shock V . As
demonstrated formally by Blundell and Matzkin (2014), only in very special cases (a
linear model being one example) will the errors U and V enter the reduced form through
a scalar index (not itself dependent on (X, W )), allowing for valid application of control
function approaches. Without such functional form restrictions, however, the control
function approach will not allow identification of demand.11 The key problem is clear:
in general a single control variate cannot eliminate the confounding effects of multiple
unobservables, much less hold them fixed at particular values. Of course, this problem
only becomes more severe when one acknowledges the presence of other related goods; in
that case, each outcome (demand) equation depends on multiple endogenous prices, each
of which generally depends on the demand shocks and supply shocks associated with all
related goods.

9 Imbens and Newey (2009) (see also references therein) discuss nonparametric control function approaches.
They note the typical failure of control functions in non-triangular systems, including in many
classic simultaneous equations models.
10 Throughout we use the term “reduced form” as it is defined in econometrics: a relationship in which
an endogenous variable is expressed as a function of exogenous variables and structural errors.
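
The linear special case mentioned above can be worked out directly. The following is a sketch that assumes linear forms for (2.1) and (2.2) with scalar X and W; the coefficient symbols are introduced here purely for illustration.

```latex
% Assumed linear demand and inverse supply (coefficients are hypothetical):
Q = \alpha_0 + \alpha_1 X - \beta P + U, \qquad
P = \gamma_0 + \gamma_1 W + \delta Q + V, \qquad \beta, \delta \ge 0 .

% Substituting demand into inverse supply and solving for P gives the reduced form:
P = \frac{\gamma_0 + \delta \alpha_0 + \delta \alpha_1 X + \gamma_1 W + (\delta U + V)}{1 + \delta \beta} .
```

Under these assumptions U and V enter the reduced form only through the scalar index δU + V, which does not depend on (X, W), so this case has the form of (2.5) with Z = (X, W). Once D or C is nonlinear, the two shocks generally cannot be collapsed into a single index in this way.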
2.5.3 Average Treatment Effects

The measurement of treatment effects is a major area of empirical economics that regularly
deals with multiple unobservables and endogeneity, often focusing on estimation of
average responses like a local average treatment effect (“LATE”). Such measures characterize
responses averaged over certain values of the unobservables (Angrist and Imbens
(1995)). A natural question is whether measurement of demand can be approached the
same way (see, e.g., Angrist, Graddy, and Imbens (2000)).
In general, the answer is no. As we have suggested already, average responses of
demand to price changes are of very limited economic interest. To illustrate this point
as clearly as possible, consider one of the first ceteris paribus counterfactuals taught to
undergraduates: the effect of an excise tax, τ , on the equilibrium price P ∗ (X, W, U, V, τ )
in the single-good supply and demand model of equations (2.1) and (2.2). Rewriting the
marginal cost function in (2.2) in inverse form as “supply,”12
Q = S(W, P, V ),
we teach (by graph and equation) that the change in equilibrium price
resulting from a change in the excise tax depends on the relative slopes of supply and demand,
as measured by
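$$
\frac{\partial P^{*}(X, W, U, V, \tau)}{\partial \tau}
= \frac{\dfrac{\partial S(W, P, V)}{\partial P}}
       {\dfrac{\partial S(W, P, V)}{\partial P} + \dfrac{\partial D(X, P, U)}{\partial P}}. \tag{2.8}
$$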
This ratio is not a LATE. By definition, a LATE averages over the latent variables; this
is not the same thing as holding them fixed. Thus, a LATE approach cannot produce
the ceteris paribus counterfactual quantity of interest: the causal effect of the tax change.
But the limitations of a LATE in this example are typically much more severe than
the distinction between the effect at a point and an average effect. Indeed, in some
cases one might be interested in an average effect like a LATE for the treatment of
a tax change. This would be equal to the left-hand side of (2.8) averaged over some
distribution of (U, V ). However, this cannot be determined from LATE estimates of
demand (and supply). One could estimate separate local average derivatives of demand
and supply with the LATE approach in this simple setting. However, a ratio of averages
is not equal to the average ratio. Furthermore, because the weights for each local average
depend on the instruments (see, e.g., Angrist and Imbens (1995) and Angrist, Graddy,
and Imbens (2000)), the necessary use of different instruments to identify demand and
supply averages will also imply different (and unknown) weights for each average. Thus,
a LATE approach to demand (and supply) cannot identify even the average numerator
and average denominator under a common measure.13

11 Petrin and Train (2010) and Kim and Petrin (forthcoming) describe special cases allowing use of a
control function approach.
12 For simplicity, here we assume upward sloping marginal cost and differentiability of the functions D
and S. Note that elsewhere in this chapter we use the notation s to represent a vector of market shares.
We trust our use of S for “supply” here will not cause confusion.
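
The gap between a ratio of averages and an average ratio is easy to see numerically. In the sketch below, the distributions of the supply and demand slopes across values of (U, V ) are hypothetical, and the demand term is entered as a positive magnitude so that the denominator stays positive; both are assumptions of the example rather than features of (2.8) itself.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical slopes across realizations of the unobservables (U, V).
supply_slope = rng.uniform(0.5, 1.5, size=n)      # slope of supply in price
demand_slope_mag = rng.uniform(0.1, 3.0, size=n)  # magnitude of the demand slope

# Market-by-market ratio of the kind appearing in a pass-through formula.
ratio = supply_slope / (supply_slope + demand_slope_mag)

# Averaging the ratio over the unobservables ...
average_ratio = ratio.mean()

# ... differs from plugging average slopes into the same formula.
ratio_of_averages = supply_slope.mean() / (
    supply_slope.mean() + demand_slope_mag.mean()
)

print(f"average ratio:     {average_ratio:.3f}")
print(f"ratio of averages: {ratio_of_averages:.3f}")
```

Even when both averages are taken under the same distribution, as here, the two numbers differ because the ratio is nonlinear in the slopes. The text's further point is that separately estimated local averages would not even share a common weighting.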
We emphasize that this example was chosen because it is perhaps the most elementary
example of the kind of economic question that motivates demand estimation. It is not
special in terms of the limitations of LATE. Almost any equilibrium counterfactual will
involve interactions between demand and supply, leading to similar issues. Thus, it is not
merely that a LATE approach to demand estimation fails to allow measurement of the
quantity of primary economic interest; rather, it typically does not allow one to measure
any well-defined average equilibrium counterfactual quantity.
Notably, the problems discussed in the preceding paragraphs are those arising in the
simplest case—that of demand and supply for a single good with no complements or
substitutes. With multiple goods, the shortcomings of a LATE approach also multiply.
One then faces a system of equations characterizing the responses of demand to prices
(recall section 2.3), even before turning to more complex counterfactuals. Multiple prices
require multiple instruments, and the problems of LATE in handling different averages
associated with different instruments become even more important. This only adds to
the limitations of LATE for equilibrium counterfactuals. And in such cases (i.e., in the
most common empirical setting in practice), it appears that LATE demand estimates
could not produce even a local average own-price elasticity of demand.
Thus, although a LATE approach might embody the right set of compromises for
many empirical settings, it is poorly suited to the economic questions that motivate
demand estimation. Luckily, one can use different empirical tools depending on the
empirical questions of interest. Much of our focus in what follows involves empirical
approaches that allow one to model substantial unobserved heterogeneity (at the level of
individuals, goods, and markets) while still permitting identification and estimation of
the objects of economic interest in contexts involving demand.
13
Of course, if the data offer adequate exogenous variation in the tax τ , one could estimate an average
ex post effect without estimating demand—either from averages of equilibrium prices P ∗ (X, W, U, V, τ ) or
an instrumental variables LATE estimate. This serves as a reminder that demand estimation is typically
motivated by a desire to answer questions—e.g., to infer oligopoly markups or provide policy advice on
the implications of a proposed carbon tax—that require either an ex ante analysis or a counterfactual
quantity that cannot be characterized by an average response of one scalar observable to an exogenous
change in another.
2.6 Balancing Flexibility and Practicality
Although demand presents challenges that are absent in many empirical settings, all the
“usual” challenges remain. One such challenge is finding empirical specifications that are
both (a) sufficiently flexible to avoid strong a priori restrictions on the results and (b)
sufficiently parsimonious to permit practical application. In some markets the number of
closely related goods can be large—consider, for example, the set of all new automobile
models, all computer models, all mutual funds, or all residential neighborhoods in a given
city. Because the demand for a given good depends on the characteristics and prices of
related goods, a demand system with J goods has J² price elasticities at each point. In
many contexts, this will rule out even a linear specification of the demand equation (2.3).
Thus, even in cases where nonparametric estimation would be possible in principle,
in practice it will often be necessary to impose restrictions in order to obtain an em-
pirical model that is practical for the data available. Unsurprisingly, one can go too
far in the pursuit of parsimony. Some of the simplest demand specifications (e.g., the
CES, multinomial logit, multinomial probit) impose strong a priori restrictions on de-
mand elasticities—and, therefore, on markups, pass-through, and other key quantities of
interest—that are at odds with common sense and standard economic models.14 Below
we will discuss some of the strategies used in practice to strike a more attractive balance
between parsimony and flexibility. One common strategy is to derive demand from a
specification of consumer utility functions.
2.7 Demand or Utilities?

The most common approach to modeling demand in the IO literature starts
from a specification of consumer utilities. This is a matter of convenience rather than
necessity. The primary goal in demand estimation is to obtain a quantitative representa-
tion of how quantities demanded respond to ceteris paribus variation in prices and other
observables. This does not require a specification of utilities.15 Indeed, a representation
of consumer demand in terms of utility maximization exists only under certain restric-
tions on demand. And when one is willing to impose the conditions on demand that
allow one to make valid welfare statements (see, e.g., Bhattacharya (2018) in the case of
discrete choice demand), these may be assumed directly and utilized to construct welfare
measures without actually specifying utility functions.
However, deriving demand from a specification of utilities can have significant practi-
cal advantages. A widely recognized (and often decisive) benefit is that such an approach
can represent a demand system for many goods (and, thus, many own- and cross-price
14
See, e.g., the discussion in Berry, Levinsohn, and Pakes (1995) and, for the CES, in Adao, Costinot,
and Donaldson (2017).
15
In terms of consumer theory, one may view utilities as primitives and individual/aggregate demand
as a derived object, or view individual choice rules (individual demand) as primitives, with utilities (and
optimization) as a derived representation (see, e.g., Mas-Colell, Whinston, and Green (1995)).
elasticities) with a relatively small number of parameters (see, e.g., Berry (1994)). Typ-
ically, some of this parsimony comes from imposition of economically motivated restric-
tions. For example, researchers will often prefer to require that individual demand satisfy
standard rationality conditions, which hold automatically when demand is derived from
utility maximization. And even when focusing on market-level demand, an explicit con-
nection to individual-level demand can often be usefully exploited in empirical work.
At the consumer level, researchers often wish to impose certain economic restrictions
or symmetry conditions that may be more easily formulated through utility functions.16
Examples of such restrictions include:
• an assumption that heterogeneity in preferences over a given set of goods arises in
part from consumer heterogeneity in tastes for the characteristics of the goods;
• an assumption that variation in the characteristics of good j alters the attractiveness
of good j relative to others, but not the relative attractiveness of other pairs
of goods;17
• an assumption that, all else equal, each consumer has a single marginal rate of
substitution between any pair of product characteristics—say, price and quality—
regardless of the name of the product.
Restrictions of these types are not without loss, but they can lead to specifications offering
an attractive balance of parsimony and flexibility.18
3 Discrete Choice Demand

Building on the pioneering work of McFadden (1974, 1977), demand in many IO appli-
cations is formulated with a discrete choice model. Thus, although most key insights
apply more broadly, much of our discussion will focus on discrete choice. In a discrete
choice model each consumer selects exactly one of the options available to her. In most
applications to demand, the options in the choice set are individual products. However,
the class of discrete choice models is more general than it may seem. For example, one of
the options could be to purchase both a Dodge Caravan and a Porsche 911, and another
16
Compiani (2020) explores imposition of such restrictions without specifying utility functions.
17
This assumption has similarities to Luce’s independence of irrelevant alternatives (“IIA”) axiom.
However, the IIA assumption replaces “relative attractiveness” with “relative choice probabilities.”
Luce’s IIA is a highly unnatural restriction (see, e.g., Debreu (1960), McFadden (1974), and Berry,
Levinsohn, and Pakes (1995)), whereas the “relative attractiveness” assumption has intuitive appeal in
the discrete choice context.
18
Similarly, specifying utility functions can facilitate the use of additional assumptions to answer
questions like (a) whether larger values of an observed consumer characteristic make one good more
attractive or (b) whether observed consumer characteristics like income or education alter preferences
or serve as proxies for latent preference variation.
option (in the same demand system or, more likely, another) could be to purchase four
boxes of cookies and a gallon of milk.19 For simplicity, however, we will refer to each of
the options as a good or product. Typically each consumer’s choice set should include
an option of the form “none of the above”—what we will call the “outside good.” This is
important. In a discrete choice model without an outside good, the market demand elas-
ticity would always be zero; for example, when estimating demand for health insurance,
a model with no outside good would imply that doubling all premiums would have no
effect on the number of households with insurance. Note also that the choice probabilities
implied by a discrete choice model can often also be interpreted as a demand system gen-
erated by continuous choices, as from a representative consumer with a taste for variety
(see for example the review provided by Anderson, DePalma, and Thisse (1992)).
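The role of the outside good can be checked in a minimal sketch. It assumes a plain multinomial logit with invented plan qualities and premiums; the function name `inside_share_total` and all numbers are hypothetical and serve only to illustrate the point about market-level elasticity.

```python
import math


def inside_share_total(qualities, prices, alpha, outside_good):
    """Total share of the inside goods in a simple multinomial logit."""
    weights = [math.exp(q - alpha * p) for q, p in zip(qualities, prices)]
    # The outside good has utility normalized to zero, so its weight is exp(0) = 1.
    denominator = sum(weights) + (1.0 if outside_good else 0.0)
    return sum(weights) / denominator


qualities = [2.0, 1.5]                  # hypothetical plan qualities
premiums = [1.0, 0.8]                   # hypothetical premiums
doubled = [2 * p for p in premiums]     # double every premium

with_before = inside_share_total(qualities, premiums, 1.0, outside_good=True)
with_after = inside_share_total(qualities, doubled, 1.0, outside_good=True)
without_before = inside_share_total(qualities, premiums, 1.0, outside_good=False)
without_after = inside_share_total(qualities, doubled, 1.0, outside_good=False)

print(with_before, with_after)          # insured share falls when premiums double
print(without_before, without_after)    # insured share is pinned at one
```

Without the outside option, the inside shares sum to one by construction, so no change in the level of premiums can move the number of insured households.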
3.1 Random Utility Models

In most of the literature, discrete choice demand is represented with a “random utility”
model. Let j = 1, . . . , Ji index the “inside goods” available to consumer i while j = 0
denotes the outside good. A consumer’s choice set is characterized by Ji and a set χi ,
which may include observed characteristics of consumer i, observed characteristics (in-
cluding prices) of the available goods, observed characteristics of the local market, and
characteristics of the market or goods that are unobserved to the researcher. Each con-
sumer i has a conditional indirect utility (henceforth, “utility”) uij for good j. Consumer
i knows her utilities for all goods and chooses the good yielding her the highest utility.
Consumer preferences are permitted to be heterogeneous, even when conditioning on
any consumer characteristics included in χi . This heterogeneity is modeled by treating
utilities as varying at random across consumers: given the choice set (Ji , χi ), each con-
sumer’s utility vector (ui0 , ui1 , . . . , uiJi ) is an independent draw from a joint distribution
Fu (· | Ji , χi ).20 Because a consumer’s behavior depends only on her ordinal ranking of
goods, below we will normalize the location and scale of each consumer’s utility vector
without loss of generality.21 We assume that the distribution Fu (· | Ji , χi ) is such that
19
See for example Gentzkow (2007), who notes that this approach can generate complementarities
across the “primitive” products. There are practical limits to this flexibility, and one may want to
impose cross-option restrictions when the distinct options in the choice set involve partially overlapping
bundles.
20
As a characterization of demand, the modeled randomness in utilities may be interpreted as reflecting
heterogeneity in consumers’ decision making, allowing for example mis-perception, inconsistencies, or
non-optimizing behavior. See, e.g., the discussion and references in chapter 2 of Anderson, DePalma, and
Thisse (1992). The choice of interpretation becomes important if one wishes to make welfare statements
or counterfactual predictions associated with interventions that might alter consumers’ decision rules.
21
Normalizations of the location and scale of each consumer’s utilities are without loss of generality
with respect to behavior. However, different normalizations can have different implications for the
interpretation of additional assumptions, including those used to justify certain welfare statements. See
Bhattacharya (2018) for important results on standard aggregate welfare measures in random utility
discrete choice models.
“ties” (uij = uik for j ≠ k) occur with probability zero.
We may then represent consumer i’s choice with the vector (qi1 , . . . , qiJi ), where
\[
q_{ij} = \mathbf{1}\{u_{ij} \ge u_{ik} \ \forall k \in \{0, 1, \ldots, J_i\}\}.
\]
Consumer-specific choice probabilities are then given by
\[
s_{ij} = E[q_{ij} \mid J_i, \chi_i] = \int_{A_{ij}} dF_u(u_{i0}, u_{i1}, \ldots, u_{iJ_i} \mid J_i, \chi_i),
\]
where
\[
A_{ij} = \{(u_{i0}, u_{i1}, \ldots, u_{iJ_i}) \in \mathbb{R}^{J_i+1} : u_{ij} \ge u_{ik} \ \forall k\}.
\]

To illustrate, consider an example with Ji = 2. Let pj denote the price of good j and
let
uij = µij − pj
for j > 0, where (µi1 , µi2 ) are drawn from a joint distribution Fµ (·). Set ui0 = 0,
normalizing the location of utilities. Figure 1 then illustrates the regions in (µi1 , µi2 )-
space leading consumer i to choose goods 0, 1, and 2. For example, only consumers for
whom µi2 − p2 > 0 prefer good 2 to the outside option. The dark grey region is the set
of (µi1 , µi2 ) combinations such that this holds and µi2 − p2 > µi1 − p1 , i.e., the set Ai2 .
Similarly, the light grey region corresponds to Ai1 . The choice probabilities for consumer
i then correspond to the probability measure assigned to each region by Fµ (·).
[Figure: the (µi1 , µi2 )-plane, with thresholds p1 and p2 and a 45° line separating the regions Ai0 , Ai1 , and Ai2 .]
Figure 1: Choice regions for goods 0, 1, and 2.
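The choice regions in this example can also be approximated by simulation. The sketch below assumes, purely for illustration, that (µi1 , µi2 ) are independent standard normals; the function name `simulate_shares` and the price values are hypothetical.

```python
import random


def simulate_shares(p1, p2, n_draws=20000, seed=0):
    """Monte Carlo choice probabilities for goods 0, 1, 2 when u_ij = mu_ij - p_j."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_draws):
        # Assumed F_mu: independent standard normals (illustrative only).
        mu1, mu2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        utilities = [0.0, mu1 - p1, mu2 - p2]    # u_i0 = 0 is the location normalization
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


base = simulate_shares(p1=0.5, p2=0.5)
higher_p2 = simulate_shares(p1=0.5, p2=1.0)
print(base)
print(higher_p2)
```

Because both calls reuse the same seed, raising p2 can only move draws out of the region Ai2 and into Ai0 or Ai1 , mirroring the shift of the boundaries in Figure 1.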
3.2 The Canonical Model
Discrete choice demand models are frequently formulated using a parametric random
utility specification such as22
\[ u_{ijt} = x_{jt}\beta_{it} - \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt} \tag{3.1} \]
for j > 0, with
\[ u_{i0t} = \epsilon_{i0t}. \tag{3.2} \]
This formulation has many important components, which we discuss here in detail.
The notion of a “market” t is central to this formulation and will allow a precise
characterization of the endogeneity problems inherent to demand estimation. In practice,
markets are typically defined by natural combinations of time and geography—e.g., a
given year in a given metropolitan area. Let $\mathcal{J}_t$ denote the set of products (inside goods)
available to consumers in market t, and let $J_t = |\mathcal{J}_t|$. On the right-hand side of (3.1),
pjt represents the price of good j in market t, while $x_{jt} \in \mathbb{R}^K$ represents other observable
characteristics of good j in this market.23 The term ξjt is an unobserved factor—a demand
shock—associated with good j and market t. The demand shock ξjt is often described
as a measure of good j’s unobserved characteristics. But this is more restrictive than
necessary; ξjt can represent any combination of latent taste variation and latent product
characteristics common to consumers in market t. For example, a high value of ξjt may
simply indicate that consumers in market t have a high mean taste for good j. We let
xt = (x1t , . . . , xJt ,t ), pt = (p1t , . . . , pJt ,t ), ξt = (ξ1t , . . . , ξJt ,t ), and χt = (xt , pt , ξt ).
Typically, one allows prices pt to be correlated with ξt . One reason is that standard
models of oligopoly competition imply that prices are endogenous; in particular, the
equilibrium price of any good j in market t will depend on all components of xt and
ξt , as these alter the residual demand for good j. In addition, equilibrium prices are
affected by latent shocks to marginal costs, which we typically expect to be correlated
with demand unobservables. And when marginal costs are upward-sloping, this will
imply dependence of equilibrium marginal costs (and, thus, prices) on demand shocks.
Exogeneity of the remaining product characteristics xt is often assumed, and we will
do so in what follows. However, this is not essential. On the demand side, allowing
endogeneity of additional characteristics is conceptually straightforward but will lead to
22
See McFadden, Talvitie, and Associates (1977), Hausman and Wise (1978), McFadden (1981), Berry
(1994), Berry, Levinsohn, and Pakes (1995), and a large literature that has followed. Observe that (3.2)
provides the necessary location normalization of utilities. We emphasize that this normalization is
without loss of generality with respect to the implied demand under the maintained assumption that
the additive scalars ξjt fully capture the effects of unobservables at the product×market level. Such an
assumption will sometimes be more plausible when controlling for systematic variation in the outside
option as, for example, in Eizenberg (2014).
23
We assume the first component of each xjt is a one, absorbing the mean of ξjt . In some applications
in which the same products appear in many markets, product dummies may be included. See, e.g., Nevo
(2001).
more demanding instrumental variables requirements.24
The additive εijt in (3.1) is most often specified as an i.i.d. draw from a standard type-1
extreme value distribution, yielding a mixed multinomial logit model. Alternatively, a
normal distribution will yield a mixed multinomial probit.25 The term “mixed” reflects
the heterogeneity across consumers in the parameters αit and βit that characterize their
marginal rates of substitution between the various observed and unobserved characteris-
tics. Choice probabilities in the population reflect a mixture of the choice probabilities
conditional on each possible combination of (αit , βit ). For example, in the case of mixed
logit we have choice probabilities in the population (i.e., market shares) given by
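\[
s_{jt} = \int \frac{e^{x_{jt}\beta_{it} - \alpha_{it}p_{jt} + \xi_{jt}}}{\sum_{k=0}^{J_t} e^{x_{kt}\beta_{it} - \alpha_{it}p_{kt} + \xi_{kt}}} \, dF(\alpha_{it}, \beta_{it}; t), \tag{3.3}
\]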
where F (·; t) denotes the joint distribution of (αit , βit ) in market t. The latent taste
parameters (αit , βit ) are often referred to as “random coefficients.”26
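In practice the integral in (3.3) is typically approximated by averaging logit probabilities over draws of the random coefficients. The following is a minimal sketch under stated assumptions: one observed characteristic per good, a normal βit and a log-normal αit (illustrative choices for F), and invented values for x, p, and ξ. The function name `mixed_logit_shares` is hypothetical.

```python
import math
import random


def mixed_logit_shares(x, p, xi, draws):
    """Average logit choice probabilities over draws of (alpha_i, beta_i), as in (3.3)."""
    n_goods = len(x)
    shares = [0.0] * n_goods
    for alpha_i, beta_i in draws:
        weights = [math.exp(x[j] * beta_i - alpha_i * p[j] + xi[j]) for j in range(n_goods)]
        denominator = 1.0 + sum(weights)     # the 1 is exp(0) for the outside good
        for j in range(n_goods):
            shares[j] += weights[j] / denominator / len(draws)
    return shares


rng = random.Random(0)
# Assumed F: alpha_i log-normal, beta_i normal (illustrative choices only).
draws = [(math.exp(rng.gauss(0.0, 0.5)), rng.gauss(1.0, 1.0)) for _ in range(5000)]

x = [1.0, 2.0, 3.0]       # hypothetical characteristic of each inside good
p = [1.0, 1.5, 2.5]       # hypothetical prices
xi = [0.0, 0.2, -0.1]     # hypothetical demand shocks
shares = mixed_logit_shares(x, p, xi, draws)
outside_share = 1.0 - sum(shares)
print(shares, outside_share)
```

The k = 0 term in the denominator of (3.3) equals one here because the systematic part of the outside good's utility is normalized to zero in (3.2).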
To specify the joint distribution F (·; t), each component k of the random coefficient
vector βit is commonly specified as taking the form
\[
\beta_{it}^{(k)} = \beta_0^{(k)} + \beta_\nu^{(k)} \nu_{it}^{(k)} + \sum_{\ell=1}^{L} \beta_d^{(\ell,k)} d_{i\ell t}. \tag{3.4}
\]
Here $\beta_0^{(k)}$ is a parameter shifting all consumers’ tastes for $x_{jt}^{(k)}$. Each $d_{i\ell t}$ represents a
characteristic (e.g., demographic measure) of individual i, and each $\nu_{it}^{(k)}$ is a random
variable with a pre-specified distribution (e.g., a standard normal). The parameters
$\beta_d^{(\ell,k)}$ and $\beta_\nu^{(k)}$ govern the extent of variation in tastes for $x_{jt}^{(k)}$ across consumers with
different demographic characteristics $d_{it}$ or different taste shocks $\nu_{it}^{(k)}$. The distinction
between $d_{i\ell t}$ and $\nu_{it}^{(k)}$ reflects the fact that each $d_{i\ell t}$ (or at least its distribution in the
population) is assumed to be known. For example, in the case of demand for cars, one
might specify that family size affects preference for large cars, in which case the actual
distribution of family size in each market would allow the model to capture this source
of latent preference heterogeneity in the population of consumers.27 On the other hand,
although we may also expect preference for fuel efficiency to vary in the population, there
24
For example, given sufficient instruments, one may simply let pt include all endogenous observables,
as in Fan (2013). See also the discussion in Berry and Haile (2020).
25
Independence of εijt across goods j is not essential, and is often relaxed in the case of mixed probit.
However, given the presence of the market-level demand shocks ξt and the random coefficients (αit , βit ),
for the usual case in which prices are set at the market level (no individual-specific prices) it is standard
to assume that each εijt is independent of xt , pt , ξt .
26
Early examples using random coefficients to generate random utility discrete choice models can be
found in Quandt (1966, 1968). See also Quandt (1956).
27
See, e.g., Goldberg (1995) and Petrin (2002).
may be no demographic measure whose distribution captures this heterogeneity. Such
latent heterogeneity in preference for a product characteristic $x_{jt}^{(k)}$ can be captured by
the random taste shocks $\nu_{it}^{(k)}$.
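A short sketch may help fix ideas about how (3.4) combines observed demographics with latent taste shocks. It borrows the car example, with family size shifting the taste for vehicle size and a latent shock shifting the taste for fuel efficiency. The function name `draw_beta` and every parameter value are hypothetical.

```python
import random


def draw_beta(beta0, beta_nu, beta_d, demographics, rng):
    """One consumer's coefficient vector following (3.4).

    beta0[k], beta_nu[k]: mean taste and taste-shock loading for characteristic k.
    beta_d[k][l]: loading of demographic l on the taste for characteristic k.
    demographics[l]: the consumer's observed demographic d_ilt.
    """
    coefficients = []
    for k in range(len(beta0)):
        nu = rng.gauss(0.0, 1.0)     # pre-specified distribution: standard normal
        demographic_part = sum(b * d for b, d in zip(beta_d[k], demographics))
        coefficients.append(beta0[k] + beta_nu[k] * nu + demographic_part)
    return coefficients


rng = random.Random(1)
# Hypothetical setup: K = 2 characteristics (size, fuel efficiency), L = 1 demographic (family size).
beta0 = [1.0, 0.5]
beta_nu = [0.0, 0.8]        # only fuel efficiency carries a latent taste shock
beta_d = [[0.3], [0.0]]     # family size shifts the taste for size only
small_family = draw_beta(beta0, beta_nu, beta_d, [1.0], rng)
large_family = draw_beta(beta0, beta_nu, beta_d, [5.0], rng)
print(small_family, large_family)
```

In this sketch the taste for size varies only through the observed family size, while the taste for fuel efficiency varies only through the simulated shock.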
The treatment of the coefficient on price, αit , is similar. A typical specification of αit
takes the form
\[
\ln(\alpha_{it}) = \alpha_0 + \alpha_y y_{it} + \alpha_\nu \nu_{it}^{(0)},
\]
where yit represents consumer-specific measures such as income that are posited to affect
price sensitivity.28 The variables included in yit might overlap partially or entirely with
dit .
3.3 Why Random Coefficients?

In the canonical model, randomness in utilities reflects both the idiosyncratic “tastes
for products” εijt and the random coefficients (“tastes for characteristics”) (αit , βit ). One
motivation for the latter can be illustrated by considering the same model without random
coefficients:
\[ u_{ijt} = x_{jt}\beta_0 - \alpha p_{jt} + \xi_{jt} + \epsilon_{ijt}. \tag{3.5} \]
Letting
\[ \delta_{jt} = x_{jt}\beta_0 - \alpha p_{jt} + \xi_{jt}, \tag{3.6} \]
we can write (3.5) as
\[ u_{ijt} = \delta_{jt} + \epsilon_{ijt}. \tag{3.7} \]
If the remaining stochastic terms εijt are i.i.d. and independent of (x, p), products differ
(up to realizations of εijt ) only in their “mean utilities” δjt .29 This implies that choice
probabilities depend only on these mean utilities. Likewise, price elasticities (own and
cross) depend only on mean utilities. This is true not just for the multinomial logit
model, but any additive random utility model of the form (3.7) with i.i.d. εijt .
These are very restrictive implications. For example, they imply that any two goods
with the same (or similar) market shares—no matter how they differ in other respects—
will have the same (or similar) own-price elasticities, equilibrium markups, and cross-price
elasticities with respect to any third good. These are not only strong restrictions, but
properties that are contrary to economic models of differentiated products, where, for
example, goods that are more similar tend to have larger cross-price elasticities.
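For the multinomial logit case these restrictions can be verified directly, since the price derivatives have the well-known closed forms ∂sj/∂pj = −α sj (1 − sj ) and ∂sj/∂pk = α sj sk . The sketch below uses invented mean utilities; the function names are hypothetical, and list index 0 refers to the first inside good rather than the outside good.

```python
import math


def logit_shares(delta):
    """Multinomial logit shares of the inside goods given mean utilities delta."""
    weights = [math.exp(d) for d in delta]
    denominator = 1.0 + sum(weights)     # outside good contributes exp(0) = 1
    return [w / denominator for w in weights]


def logit_price_derivatives(delta, alpha):
    """Matrix of ds_j/dp_k for the logit model with a common price coefficient alpha."""
    s = logit_shares(delta)
    n = len(s)
    return [[-alpha * s[j] * (1 - s[j]) if j == k else alpha * s[j] * s[k]
             for k in range(n)] for j in range(n)]


# Hypothetical mean utilities: the first two goods have equal delta (hence equal
# shares), even though they could differ arbitrarily in characteristics and price.
delta = [0.4, 0.4, -0.2]
derivs = logit_price_derivatives(delta, alpha=1.5)
print(derivs[0][0], derivs[1][1])   # identical own-price derivatives
print(derivs[0][2], derivs[1][2])   # identical responses to the third good's price
```

Nothing about the two goods other than their mean utilities enters the calculation, which is exactly the restriction at issue.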
To be clear, the problem is not just a lack of “realism,” but the a priori restriction
on key features like own and cross-price elasticities that motivate estimation of demand.
Models of the form (3.7) impose very restrictive relationships between the levels of market
shares and the matrix of own and cross-price derivatives and, therefore, on counterfactual
28
Other functional forms are common in the literature. For example, it is common to specify price as
entering in the form α ln(yit − pjt ), following Berry, Levinsohn, and Pakes (1995).
29
The term “mean utility” is standard but loose. Here the mean of uijt is equal to δjt plus the mean
of εijt , which need not be zero.
predictions. This is a bug, not a feature. These restrictions do not come from economics
but from assumptions chosen for simplicity or analytical convenience. Models must, of
course, abstract from reality, and finite samples require appropriate parsimony. But good
modeling and approximation methods should aim to avoid strong a priori restrictions
on the very quantities of interest unless those restrictions can be defended as natural
economic assumptions.
Random coefficients are not the only way to avoid these restrictions. For example, the
random terms (εi1t , . . . , εiJt t ) need not be specified as mutually independent. In the case
with just a few products whose identity is constant across markets, a good alternative
to random coefficients might specify an unrestricted covariance matrix for the εit vector.
But in cases with more than a few products per market, or with products whose charac-
teristics change across markets, random coefficients are attractive because they balance
flexibility in key dimensions with tractability. Random coefficient specifications can be
formulated using economics, building on the observations that real goods differ in multi-
ple dimensions, and real consumers have heterogeneous preferences over these differences.
Taking the case of automobiles, random coefficients on indicators for pickup trucks and
for minivans enables the model to predict that different models of pickup trucks will be
close substitutes, precisely because a consumer who likes one pickup truck will tend to be
one with strong idiosyncratic taste for all pickup trucks. And even if the leading minivan
and leading pickup truck have very similar market shares, the model can predict very dif-
ferent cross elasticities with respect to a third vehicle—say, an SUV targeted at families.
Thus, as a matter of theory, random coefficients can introduce consumer heterogeneity
along key dimensions of product differentiation. And a substantial empirical literature
has demonstrated that in practice random coefficients can play a critical role in giving
the demand specification sufficient flexibility to produce natural consumer substitution
patterns.
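The pickup truck example can be made concrete in a stylized sketch. It assumes three inside goods (truck A, truck B, and a minivan) with identical mean utilities, a random coefficient on a truck indicator, and a two-point taste distribution chosen only because it makes truck B and the minivan have exactly equal market shares. All names and values are hypothetical.

```python
import math


def type_shares(truck_taste):
    """Logit shares (truck A, truck B, minivan) for one consumer type.

    All goods have mean utility zero; trucks receive an extra truck_taste.
    """
    weights = [math.exp(truck_taste), math.exp(truck_taste), 1.0]
    denominator = 1.0 + sum(weights)     # outside good: exp(0) = 1
    return [w / denominator for w in weights]


def market_outcomes(sigma, alpha=1.0):
    """Market shares and cross-price derivatives with respect to truck A's price."""
    # Assumed two-point taste distribution: nu = -1 or +1 with equal probability.
    types = [type_shares(sigma * nu) for nu in (-1.0, 1.0)]
    shares = [sum(t[j] for t in types) / len(types) for j in range(3)]
    # In a mixed logit with common alpha, ds_k/dp_A = alpha * E[s_iA * s_ik] for k != A.
    cross = [alpha * sum(t[0] * t[k] for t in types) / len(types) for k in (1, 2)]
    return shares, cross


shares_plain, cross_plain = market_outcomes(sigma=0.0)   # no random coefficient
shares_rc, cross_rc = market_outcomes(sigma=2.0)         # random truck taste
print(shares_rc)    # truck B and the minivan have equal shares
print(cross_rc)     # but truck B responds more strongly to truck A's price
```

With sigma set to zero the two cross derivatives coincide, as the plain logit requires. With sigma positive, truck B gains more than the minivan from an increase in truck A's price even though the two have the same market share.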
As a practical matter, important questions include the measures included in xt and
the extent of heterogeneity modeled through random coefficients. In some cases, practical
considerations may dictate selecting a set of observable characteristics viewed as most
important or most subject to heterogeneity in preferences.30 Depending on the data
set, a specification with a very large number of random coefficients may yield imprecise
estimates (particularly, of the parameters associated with the distribution of random
coefficients),31 or even numerical problems in estimation. A researcher then often faces
the practical question of what product characteristics are modeled as having random
coefficients. Should one choose just an index of “quality”? Just price? Dummy variables
30
Gillen, Moon, and Shum (2014) and Gillen, Moon, Montero, and Shum (2019) propose a data-driven
approach to selecting from a large set of observed characteristics assumed to affect only mean utilities.
31
Of course, we often care more about the estimation error in our eventual counterfactual analysis
than the statistical significance of the estimated random coefficient parameters per se. Furthermore,
an imprecise estimate of the variance of a random coefficient, as may arise when instruments produce
insufficient exogenous variation, should not be confused for evidence in favor of a degenerate coefficient.
indicating subsets (e.g., nests) of products?32 Multiple observed characteristics (parts
of xt , pt )? In practice, the choice must reflect the application and the available data.
Economic considerations often suggest dimensions along which preference heterogeneity
is likely to be most important for determining the consumer substitution patterns that
drive own- and cross-price elasticities. But practical considerations such as sample size
and available sources of exogenous variation may play a role as well. In many cases it
may not be desirable to specify random coefficients on all components of (xt , pt ). We will
return to this issue below when discussing instrumental variables and identification.
4 Market-Level Data
In many applications the key data are observed at the market level. In such cases, one
typically observes
• the number of goods Jt available to consumers in each market t;
• their prices and other observable characteristics pt , xt ;
• their observed market shares, s̃jt , typically measured as the total quantity of good
j sold in market t divided by the number of consumers (e.g., households) in that
market;
• the distribution of consumer characteristics (dit , yit ) in each market; and
• possibly, additional variables wt (e.g., cost shifters) that might serve as appropriate
instruments.
The standard approach to estimation of discrete choice demand from market-level
data was developed in Berry, Levinsohn, and Pakes (1995), with many subsequent vari-
ations and extensions. Here we consider a slightly simplified version of their model with
a non-random coefficient on price.33 Thus, the random utility specification becomes
\[ u_{ijt} = x_{jt}\beta_{it} - \alpha_0 p_{jt} + \xi_{jt} + \epsilon_{ijt} \tag{4.1} \]
for j > 0, with ui0t = εi0t . We follow Berry, Levinsohn, and Pakes (1995) in assuming that
each εijt is an i.i.d. draw from a standard type-1 extreme value (Gumbel) distribution,34
32
This case covers the nested logit as a special case. See Ben-Akiva (1973), McFadden (1978) and, for
the market-level IO context, Berry (1994).
33
In practice, it is often important to allow for heterogeneity in price sensitivity. The variation of
this model presented in section 6 illustrates a type of specification commonly used in practice, even in
the case of market-level data. We present the more restrictive quasi-linear specification here to simplify
exposition and make clearer the sources of key identification requirements.
34
Setting the scale parameter of the Gumbel distribution to one normalizes the scale of utilities.
Setting the location parameter to zero is also without loss due to the fact that adding the same constant
to all utilities yields an equivalent representation of preferences.
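and that each $\nu_{it}^{(k)}$ in (3.4) is an i.i.d. draw from a standard normal distribution.
Observe that (4.1) can be rewritten as
\[ u_{ijt} = \delta_{jt} + \mu_{ijt} + \epsilon_{ijt}, \tag{4.2} \]
where we have defined
\[ \delta_{jt} = x_{jt}\beta_0 - \alpha_0 p_{jt} + \xi_{jt} \tag{4.3} \]
and
\[ \mu_{ijt} = \sum_{k=1}^{K} x_{jt}^{(k)} \left( \sum_{\ell=1}^{L} \beta_d^{(\ell,k)} d_{i\ell t} + \beta_\nu^{(k)} \nu_{it}^{(k)} \right). \tag{4.4} \]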
Let Fµ (· | xt , βd , βν ) denote the joint distribution of the stochastic terms (µi1t , . . . , µiJt t )
given (xt , βd , βν ). Given our assumptions above, this distribution is known up to the
parameters (βd , βν ).
Letting
δt = (δ1t , . . . , δJt t ),
the market shares implied by the model take the form
\[
\sigma_j(\delta_t, x_t, \beta_d, \beta_\nu, J_t) = \int \frac{e^{\delta_{jt} + \mu_{ijt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt} + \mu_{ikt}}} \, dF_\mu(\mu_{it} \mid x_t, \beta_d, \beta_\nu, J_t) \tag{4.5}
\]
for each good j. An important fact, demonstrated in Berry (1994), is that the demand
system
\[
\sigma(\delta_t, x_t, \beta_d, \beta_\nu, J_t) = (\sigma_1(\delta_t, x_t, \beta_d, \beta_\nu, J_t), \ldots, \sigma_{J_t}(\delta_t, x_t, \beta_d, \beta_\nu, J_t))
\]
is invertible: given xt , βd , βν and any vector of nonzero market shares s = (s1 , . . . , sJt ) in
market t such that $1 - \sum_{j>0} s_{jt} > 0$, there is a unique vector δ for market t such that
σ(δ, xt , βd , βν , Jt ) = s.
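One well-known way to compute this inverse numerically is the fixed-point iteration δ ← δ + log(s) − log(σ(δ)) proposed in Berry, Levinsohn, and Pakes (1995). The sketch below is a minimal illustration under stated assumptions: a single characteristic with a normally distributed random coefficient, a simulated analog of (4.5), and invented values for x, βν , and δ. The function names are hypothetical.

```python
import math
import random


def predicted_shares(delta, mu_draws):
    """Simulated analog of (4.5): average logit probabilities over draws of mu_i."""
    n_goods = len(delta)
    shares = [0.0] * n_goods
    for mu in mu_draws:
        weights = [math.exp(delta[j] + mu[j]) for j in range(n_goods)]
        denominator = 1.0 + sum(weights)     # outside good normalized to zero
        for j in range(n_goods):
            shares[j] += weights[j] / denominator / len(mu_draws)
    return shares


def invert_shares(observed, mu_draws, tol=1e-12, max_iter=5000):
    """Solve sigma(delta) = observed by the fixed-point iteration of BLP (1995)."""
    outside = 1.0 - sum(observed)
    delta = [math.log(s) - math.log(outside) for s in observed]   # plain-logit start
    for _ in range(max_iter):
        predicted = predicted_shares(delta, mu_draws)
        step = [math.log(s) - math.log(q) for s, q in zip(observed, predicted)]
        delta = [d + h for d, h in zip(delta, step)]
        if max(abs(h) for h in step) < tol:
            break
    return delta


rng = random.Random(0)
x = [1.0, 2.0, 3.0]                                   # hypothetical characteristic
nus = [rng.gauss(0.0, 1.0) for _ in range(200)]       # simulated consumers
# mu_ij = x_j * beta_nu * nu_i with an assumed beta_nu = 0.5
mu_draws = [[xj * 0.5 * nu for xj in x] for nu in nus]
true_delta = [0.0, -0.5, -1.0]
observed = predicted_shares(true_delta, mu_draws)
recovered = invert_shares(observed, mu_draws)
print(recovered)
```

Shares are generated from a known δ and then inverted, so the recovered vector can be compared directly with `true_delta`.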
4.1 The BLP Estimator

At the broadest level, an estimation strategy involves searching (or solving) for the pa-
rameters of the model that allow it to best fit the data. Let
θ ≡ (α0 , β0 , βd , βν )
represent all the parameters of the model. It will be useful to partition these as
θ1 = (α0 , β0 )
θ2 = (βd , βν ).
In the literature, the elements of θ1 are often referred to as the “linear parameters,”
with the elements of θ2 referred to as the “nonlinear parameters.”35 Note that we can then rewrite the
model’s prediction of market shares (4.5) as
sjt = σj (δt , xt , θ2 , Jt ).
Because identification of the model will rely on instrumental variables, it is natural

to formulate an estimator using moment conditions. Berry, Levinsohn, and Pakes (1995)
proposed a generalized method of moments (GMM) estimation approach that can be
sketched as follows:
1. take a trial value of the parameters θ;
2. for each market t, “invert” the demand model at the observed market
shares s̃t to find the unique vector $\xi_t \in \mathbb{R}^{J_t}$ such that, given the definition
(4.3), σj (δt , xt , θ2 , Jt ) = s̃jt for all j;
3. evaluate the trial value θ using a GMM criterion function based on moment
conditions of the form
E[ξjt (θ)zjt ] = 0,
where zjt ⊃ xjt is a vector of appropriate instrumental variables;
4. repeat from step 1 until a minimum is found.
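More formally, let T denote the number of markets in the sample and let $N = \sum_{t=1}^{T} J_t$.
The BLP estimator θ̂ is defined as the solution to a mathematical program:
\[ \min_{\theta} \; g(\xi(\theta))' \, \Omega \, g(\xi(\theta)) \tag{4.6} \]
subject to
\begin{align}
g(\xi(\theta)) &= \frac{1}{N} \sum_{\forall j,t} \xi_{jt}(\theta) z_{jt} \tag{4.7} \\
\xi_{jt}(\theta) &= \delta_{jt}(\theta_2) - x_{jt}\beta_0 + \alpha_0 p_{jt} \tag{4.8} \\
\log(\tilde{s}_{jt}) &= \log(\sigma_j(\delta_t, x_t, \theta_2, J_t)) \tag{4.9} \\
\sigma_j(\delta_t, x_t, \theta_2, J_t) &= \int \frac{\exp[\delta_{jt}(\theta_2) + x_{jt}\tilde{\beta}]}{1 + \sum_k \exp[\delta_{kt}(\theta_2) + x_{kt}\tilde{\beta}]} f_{\tilde{\beta}}(\tilde{\beta} \mid \theta_2) \, d\tilde{\beta}, \tag{4.10}
\end{align}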
35
Both sets of parameters alter demand nonlinearly, but the mean utilities δjt are linear in θ1 —a fact
that can be exploited in computation of the estimator.
where Ω denotes the standard GMM weight matrix and fβ̃ (·|θ2 ) denotes the joint density
of β̃it ≡ βit − β0 , i.e., the consumer-specific components of the coefficients βit . Computa-
tion and inference are discussed in section 4.4.
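The criterion in (4.6) through (4.8) can be sketched in a few lines once the mean utilities δjt (θ2 ) are in hand. The example below takes δ as given, so the inversion step (4.9)-(4.10) is not repeated, and makes several simplifying assumptions that are not part of the original formulation: a scalar characteristic, an identity weight matrix in place of Ω, and invented data including a hypothetical cost shifter w. The function name `gmm_objective` is likewise hypothetical.

```python
def gmm_objective(alpha0, beta0, delta, x, p, z, weight):
    """Evaluate g' W g from (4.6)-(4.8) for given mean utilities delta.

    delta, x, p: one entry per product-market observation (scalar x for brevity).
    z: list of instrument vectors, one per observation.
    weight: square weight matrix as nested lists.
    """
    n_obs = len(delta)
    # (4.8): demand shocks implied by the trial linear parameters.
    xi = [delta[n] - x[n] * beta0 + alpha0 * p[n] for n in range(n_obs)]
    # (4.7): sample moments, averaging xi * z over all observations.
    n_moments = len(z[0])
    g = [sum(xi[n] * z[n][m] for n in range(n_obs)) / n_obs for m in range(n_moments)]
    # (4.6): quadratic form in the sample moments.
    return sum(g[a] * weight[a][b] * g[b]
               for a in range(n_moments) for b in range(n_moments))


# Hypothetical data for four product-market observations.
x = [1.0, 2.0, 3.0, 4.0]
p = [1.0, 1.5, 1.2, 2.0]
w = [0.5, 0.9, 0.6, 1.3]                          # assumed excluded cost shifter
z = [[1.0, x[n], w[n]] for n in range(4)]         # instruments include x
identity = [[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]

# Mean utilities built with xi = 0 at (alpha0, beta0) = (2, 1), so the objective
# should vanish at those values and be positive away from them.
delta = [x[n] * 1.0 - 2.0 * p[n] for n in range(4)]
at_truth = gmm_objective(2.0, 1.0, delta, x, p, z, identity)
away = gmm_objective(1.5, 1.0, delta, x, p, z, identity)
print(at_truth, away)
```

In a full implementation, δ would be recomputed by inverting the share equations at each trial value of θ2 , and the objective would be minimized over θ with an appropriate weight matrix.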
4.2 Instruments
Broadly speaking, estimation of demand requires observables that provide exogenous
sources of independent variation in prices and quantities. In the case of market-level
data, such variation must come from instrumental variables that are excluded from the
relevant demand equations in an appropriate sense. The need to instrument for both
prices and quantities may be counterintuitive: to estimate demand, we might think
instruments for prices would suffice. As suggested in section 2, however, this is not the
case.
The need for excluded instrumental variables beyond those for prices will be explained
more formally in section 5. But this is easily seen in the BLP model by considering the
hypothetical case of exogenous prices. Even in this case it is clear that the model cannot
be estimated using only moments interacting the demand shocks ξjt with xjt and pjt :
in a parametric model, identification requires at least as many moment conditions as
parameters, and the parameters of the model include not only the coefficients θ1 =
(α0 , β0 ) on xjt and pjt in (4.3), but also the parameters θ2 governing the variation in the
random coefficients. Thus, additional moment conditions would be required.
Below we discuss several types of variables that can provide the necessary sources
of exogenous variation in prices and quantities. That is, what types of observables can
satisfy the requisite relevance and exclusion (conditional moment) conditions? An addi-
tional question is what functions of these observables most usefully play the roles of zjt in
the unconditional moment conditions whose sample analogs are given by (4.7). Thus, in
this section we also discuss the approximation of “optimal instruments.” One important
lesson from the literature is that the use of (approximately) optimal instruments can
greatly improve estimation precision.
4.2.1 Cost Shifters and their Proxies

A classic type of (excluded) instrument for estimating demand is an exogenous shifter
of marginal cost, such as exogenous material costs, a tax, or tariff. In most models of
supply, variation in marginal costs will be “passed through” to some extent. As long as
these cost shifters are (mean) independent of latent demand shocks ξt , they can serve as
appropriate instruments. Many natural cost shifters will vary across time and operate at
the firm level; these will be most useful in applications where firms operate in few markets
or when variation in “markets” has a substantial temporal component. Other measures
such as location-specific distribution costs can provide variation even with global firms
and no temporal variation. Similarly, if a firm producing for geographically distinct
markets faces upward sloping marginal cost at the firm (product) level, the marginal cost
associated with one market will be shifted by contemporaneous demand shifters in other
markets.
Noisy measures of a producer’s actual cost shifters can also serve as instruments. For
example, the average wage level in a producer’s labor market may not perfectly track the
producer’s labor costs but is nonetheless likely to be highly correlated with those costs.
Thus, such wage measures can serve as instruments as long as they are uncorrelated with
demand shocks conditional on the exogenous variables and consumer-specific measures
(e.g., income and education) included in the demand model.
A less obvious type of proxy that can sometimes serve as an instrument for pjt is
the contemporaneous price of the same good in another geographic market (see, e.g.,
Hausman, Leonard, and Zona (1994), Hausman (1996), and Nevo (2001)). This is often
referred to as a “Hausman instrument.” The logic of this instrument is that even if
we do not observe producer-specific cost shifters, variation in costs is likely present and
at least partially responsible for variation in the prices a producer sets in all markets
it serves. Thus, an observed price increase in market t′ can signal a change in the
producer’s costs that also shifts its equilibrium price in market t. Although the logic
of a Hausman instrument builds on that for a cost shifter, an important difference is
that, outside a perfectly competitive model, prices reflect not just firm costs but also
demand elasticities—something that depends on demand shocks. The excludability of
Hausman instruments, therefore, requires close scrutiny. Taking the example above, the
key assumption is that the price in market t′ is (mean) independent of ξjt conditional on
the exogenous xjt . This would fail if the demand shocks ξjt and ξjt′ are correlated, for
example through seasonal variation in demand that is not captured by the observable
product characteristics. More generally, to use any proxy for an exogenous change in
firm costs as an instrument, the proxy error must also be exogenous.
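As an illustration of how a Hausman instrument might be constructed, the sketch below computes, for each product and market, the average price of the same product in all other markets. The balanced-panel layout (every product observed in every market) and the function name are assumptions made for brevity.

```python
import numpy as np

def hausman_instruments(prices):
    """Leave-one-out mean of each product's price across other markets.

    prices: (T, J) array with prices[t, j] the price of product j in market t
    (assumes a balanced panel). Returns a (T, J) array.
    """
    prices = np.asarray(prices, dtype=float)
    n_markets = prices.shape[0]
    if n_markets < 2:
        raise ValueError("need at least two markets")
    # Sum over all markets, then remove the market's own price.
    return (prices.sum(axis=0, keepdims=True) - prices) / (n_markets - 1)
```

Excluding the market's own price is what distinguishes this from a simple product-level average. Whether the result is a valid instrument still rests on the exclusion assumption discussed above.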
4.2.2 BLP Instruments

In addition to cost shifters and Hausman instruments, a third class of instruments involves
the exogenous characteristics of competing products. In section 2 we explained that a
fundamental challenge to estimation of demand is the fact that the demand for any one
good j depends on the characteristics of all related goods. But while this elementary
observation reveals a challenge, it can also provide a solution. In particular, as long as
we can maintain the assumption of mean independence between xt and ξt , the entire set
x−jt of competitors’ product characteristics can serve as instruments (components of zjt )
creating exogenous variation in good j’s market share. As just noted, these characteristics
affect all quantities through the demand system. They also affect prices: the equilibrium
markup for good j depends on the elasticity of the residual demand for good j, which
again depends on the characteristics of all goods. Thus, exogenous characteristics of
competing goods can provide exogenous variation in both prices and quantities.
These instruments—exogenous characteristics of competing goods—are often called
“BLP instruments,” following their use (along with other types of instruments) in Berry,
Levinsohn, and Pakes (1995). In practice the BLP instruments are often strong shifters
of both quantities and markups, particularly when used in good approximations to their
optimal (efficiency maximizing) form (see section 4.2.5). We provide additional discussion
of these instruments in section 5.4. Obviously, the validity of BLP instruments depends
on their exogeneity (mean independence from the demand shocks). In some cases, it may
be more natural to assume that at least some components of xjt are chosen by firm j with
knowledge of the demand shocks (or other shocks correlated with the demand shocks).
In that case, it is clear that these product characteristics cannot be used as instruments,
and an appropriate alternative strategy will be necessary.
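One common low-dimensional summary of the competitors' characteristics x−jt is a pair of sums: characteristics of the other products sold by the same firm, and characteristics of products sold by rival firms. The sketch below builds these sums for a single market. The function name and the choice of sums (rather than other functions of x−jt) are illustrative assumptions.

```python
import numpy as np

def blp_instruments(x, firm_ids):
    """Sums of other products' exogenous characteristics within one market.

    x: (J, K) exogenous characteristics; firm_ids: (J,) owner labels.
    Returns (own_firm_sums, rival_sums), each of shape (J, K).
    """
    x = np.asarray(x, dtype=float)
    firm_ids = np.asarray(firm_ids)
    same_firm = (firm_ids[:, None] == firm_ids[None, :]).astype(float)  # (J, J)
    own_firm_sums = same_firm @ x - x       # same firm, excluding product j itself
    rival_sums = (1.0 - same_firm) @ x      # products of competing firms
    return own_firm_sums, rival_sums
```

These sums vary across products only through the composition of the choice set and ownership, so they are uninformative when every market offers the same products under the same owners.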
4.2.3 Waldfogel-Fan Instruments

A fourth class of candidate instruments involves characteristics—e.g., average demo-
graphic measures—of nearby markets. In some applications these will act as exogenous
shifters of equilibrium markups. Despite the reference to “other markets,” the logic of
these instruments is fundamentally different from that for Hausman instruments. In
many applications, prices are set at a regional or “zone” level, with each zone covering
more than one market.36 In standard models, equilibrium prices for a zone will then
depend on all factors affecting demand in the zone. For example, a market with a given
distribution of income will be more likely (under zone pricing) to have high prices if it
is adjacent to (in the same zone as) a high-income market than if it is surrounded by
low-income markets. This will be true even if only some of the firms in the target market
also operate in the nearby market (Fan (2013)). Thus, income in markets adjacent to
market t will affect equilibrium markups in market t and can serve as an instrument for
pt , as long as the income measure is uncorrelated with the demand shocks ξt in market
t. This will be most plausible when income in market t is already among the market
observables xt . Of course, other demographic measures may affect equilibrium pricing as
well.
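A simple version of such an instrument averages a demographic measure over the markets adjacent to each market. The sketch below assumes the researcher has already coded adjacency (or common pricing-zone membership) as a 0/1 matrix; that matrix and the function name are hypothetical inputs.

```python
import numpy as np

def neighbor_mean(demographic, adjacency):
    """Average of a market-level measure over each market's neighbors.

    demographic: (T,) measure such as mean income;
    adjacency: (T, T) 0/1 matrix with adjacency[t, s] = 1 if market s is
    adjacent to market t (zero diagonal). Markets with no neighbors get nan.
    """
    demographic = np.asarray(demographic, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    counts = adjacency.sum(axis=1)
    totals = adjacency @ demographic
    out = np.full_like(totals, np.nan)
    return np.divide(totals, counts, out=out, where=counts > 0)
```

The market's own value is excluded by the zero diagonal, in line with the suggestion above that own-market income enter the demand model directly through xt.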
We refer to instruments relying on the distribution of consumer demographics as
“Waldfogel instruments,” since the logic follows the key insight emphasized by Waldfogel
(2003): one’s neighbors influence the types of products and prices one is offered.37
A related strategy can become available when firms compete in partially overlap-
ping service areas. Following Fan (2013), who studied competition among newspapers,
consider firms j and k whose service areas intersect in market t. If each firm sets a
single price for its service area, then demographic characteristics throughout a firm’s
service area will affect its equilibrium pricing strategy. Consequently, demographics of
all markets t′ within firm k’s service area will affect firm j’s markup. Thus, demographic
characteristics anywhere within a competitor’s service area can instrument for prices in
the market(s) served by firm j.38
36
See, e.g., Williams and Adams (2019) and DellaVigna and Gentzkow (2019).
37
Of course income (or other market characteristics) may also shift the level of demand and therefore
alter firms’ marginal costs if those are upward sloping. Because the Waldfogel instruments affect markups
even with constant marginal costs, one need not take a stand on whether marginal costs are upward
sloping to support this IV strategy.
4.2.4 Exogenous Measures of Market Structure

In some applications exogenous changes in market structure may occur over time. For
example entry or exit of products will alter market shares directly and will alter equilib-
rium markups through the resulting changes in the intensity of competition (overall and
locally in product space). When such entry and exit is exogenous, it can provide useful
variation in both price and quantities. Measures of such variation are, in fact, just one
form of the BLP instruments discussed above.
Another possible change in market structure is a change in firm ownership—a merger,
spinoff, or possibly a change in the extent of common partial ownership. Profit maxi-
mization implies that such changes in ownership will alter the internalization of pricing
externalities and, therefore, equilibrium markups.39 When measures of such changes are
independent of latent demand shocks (conditional on the other exogenous variables in
the model), these can serve as instruments for prices.40
4.2.5 Optimal Instruments

Entirely separate from the question of which observables zjt serve as exogenous instru-
ments is the question of the optimal functions of these variables to use when transforming
conditional moment restrictions like E[ξjt |zjt ] = 0 into unconditional moments like those
defining the BLP estimator.41 Intuitively, this is a question of what transformations of
the exogenous variables yield the most useful variation for pinning down the parame-
ters of the model. This can be particularly important when, as in many applications to
differentiated products markets, there is a large number of excluded instruments which
individually may have limited strength but which together can have strong effects on the
relevant endogenous variables.
Formally, the question is what form of the instruments leads to asymptotic efficiency
of the estimates. This is a standard problem in econometrics. For clarity, here we
will write θ0 to denote the true value of the parameter vector θ. Chamberlain (1986)
considered the problem of optimal instruments under an assumed conditional moment
restriction of the form E[ξjt (θ0 )|zjt ] = 0. He showed, under certain assumptions, that the
optimal instruments $z^*_{jt}$ to use in an unconditional moment of the form $E[\xi_{jt}(\theta_0) z^*_{jt}] = 0$
are
\[
z^*_{jt} = \Psi_{jt}^{-1} \, E\left[ \frac{\partial \xi_{jt}(\theta_0)}{\partial \theta} \,\Big|\, z_{jt} \right], \tag{4.11}
\]
where $\Psi_{jt} = E[\xi_{jt}(\theta_0)^2 \mid z_{jt}]$.
38
With chains of partially overlapping service areas, demographics in the service areas of competitors
to competitors could also serve as instruments. In practice, the power of such instruments would need
to be checked.
39
The extent to which variation in common partial ownership affects equilibrium pricing in practice is
an area of active current research, relevant on its own but also to the potential use of common ownership
measures as instruments.
40
See, for example, Miller and Weinberg (2017).
41
This question is also separate from the optimal linear weighting of a given set of moment conditions,
which is already incorporated in the GMM objective function through the weighting matrix. When
estimating demand alone, use of optimal instruments yields a just-identified model, making the GMM
weight matrix irrelevant. This is not the case when estimating demand and supply jointly (see Conlon
and Gortmaker (2020)).
As in many nonlinear models, the optimal instruments are infeasible to compute, since
they depend on both the unknown value θ0 and the unknown distribution of the demand
shocks that is implicit in the expectation operators. Several approaches to approximating
the optimal instruments have been developed. While Berry, Levinsohn, and Pakes (1995)
proposed an initial approach using low-order terms of a polynomial basis, improved op-
tions (improved basis functions or direct approximations of the expectations (4.11)) have
been proposed by Berry, Levinsohn, and Pakes (1999), Reynaert and Verboven (2014),
Gandhi and Houde (2020), and Conlon and Gortmaker (2020).
The consistent message from this literature is that use of (approximately) optimal
instruments can substantially improve the precision of estimates. These alternative ap-
proximation approaches are discussed in detail by Conlon and Gortmaker (2020). Their
associated PyBLP software incorporates many of these as options, along with the appro-
priate extensions for the case of joint estimation of demand and supply.
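To build intuition for (4.11), it may help to work through a special case in which the optimal instruments have a familiar form. Suppose, purely for illustration, that there are no random coefficients, so that the mean utilities $\delta_{jt} = \ln(s_{jt}) - \ln(s_{0t})$ are data, and that the demand shocks are conditionally homoskedastic with $E[\xi_{jt}(\theta_0)^2 \mid z_{jt}] = \sigma^2$. Both restrictions are assumptions of this example. Using (4.8),

\[
\xi_{jt}(\theta) = \delta_{jt} - x_{jt}\beta + \alpha p_{jt},
\qquad
\frac{\partial \xi_{jt}(\theta)}{\partial \beta} = -x_{jt}',
\qquad
\frac{\partial \xi_{jt}(\theta)}{\partial \alpha} = p_{jt},
\]

so that, because $x_{jt}$ is contained in $z_{jt}$,

\[
z^*_{jt} = \frac{1}{\sigma^2}
\begin{pmatrix} -x_{jt}' \\ E[p_{jt} \mid z_{jt}] \end{pmatrix}.
\]

Scale and sign are irrelevant for a moment condition, so in this special case the optimal instruments are simply the exogenous characteristics together with the conditional mean of price given the instruments, i.e., an ideal (possibly nonlinear) first-stage prediction of price. With random coefficients, the derivatives with respect to θ2 depend on θ0 and on the whole demand system, which is why the approximations cited above are needed.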
4.2.6 Evaluating Instruments

In the literature on linear IV models, it is nearly universal practice to report a “first-
stage” regression of the endogenous variables on the instruments. This regression can
produce various diagnostic statistics to evaluate whether the instruments are doing a
“good job” of predicting the endogenous variables. Importantly, these diagnostics can
provide tests for weak instruments. The literature on nonlinear IV does not always
provide clear guidance on an appropriate analog to this important diagnostic exercise,
but various authors have proposed some ideas to consider.
Some discrete choice models, like the logit and nested logit, produce expressions for
δjt that are linear in all parameters, allowing for traditional first-stage regressions. For
example, in the nested logit model we need instruments for price and the “within nest”
share of a product (Berry (1994)), and we can run diagnostics on a traditional first stage.
Salanie and Wolak (2019) derive a linear-in-parameters approximation to the BLP inverse
share function, which could perhaps be used in a similar first-stage exercise. Some papers
present other quasi-first stage exercises, for example regressions of prices and quantities
on the exogenous variables. Motivated by the optimal instruments expression in (4.11),
Berry, Carnall, and Spiller (1996) report an “ex-post” quasi first-stage by regressing
∂ξjt (θ̂)/∂θ on exogenous variables (with θ̂ denoting the estimated parameters).
Further progress may be needed here, especially to consider weak instruments and
related issues in the context of nonlinear models with multiple endogenous variables,
multiple structural errors in each reduced form, and multiple instruments. Useful ideas
may be found in Andrews and Guggenberger (2017), Andrews (2018), and Andrews and
Mikusheva (2020). The separate topic of the local sensitivity of parameter estimates to
IV assumptions is considered in Andrews, Gentzkow, and Shapiro (2017).
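For the linear cases mentioned above (e.g., the logit model, where the inverted demand equation is linear in parameters), a traditional first-stage diagnostic can be sketched as follows. The function computes the standard homoskedastic F statistic for the joint significance of the excluded instruments. The function name is hypothetical, and a robust or cluster-adjusted statistic would often be preferable in practice.

```python
import numpy as np

def first_stage_f(endog, exog_included, excluded):
    """F statistic for the excluded instruments in a first-stage regression.

    endog: (n,) endogenous variable (e.g., price);
    exog_included: (n, k) exogenous regressors from the demand equation;
    excluded: (n, q) excluded instruments. A constant is added internally.
    """
    n = len(endog)
    restricted = np.hstack([np.ones((n, 1)), exog_included])
    unrestricted = np.hstack([restricted, excluded])

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, endog, rcond=None)
        resid = endog - design @ coef
        return resid @ resid

    q = excluded.shape[1]
    dof = n - unrestricted.shape[1]
    return ((ssr(restricted) - ssr(unrestricted)) / q) / (ssr(unrestricted) / dof)
```

With several endogenous variables, as in the nested logit, one such statistic per endogenous variable does not fully characterize instrument strength, which is part of the motivation for the further work noted above.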
4.3 Using a Supply Side

In many cases, estimation of demand is motivated by questions defined by market coun-
terfactuals involving both supply and demand. And even when one is focused exclusively
on questions about demand, there can be substantial gains in precision from exploiting
the additional restrictions that come from the equilibrium conditions of a supply model.
Adding the supply side is relatively straightforward. Although almost any model of
oligopoly supply can be considered, the most common is that implied by Nash equilib-
rium in a simultaneous price-setting game under complete information.
A key insight from BLP is that the system of multiproduct Nash price-setting first-
order conditions can be inverted to solve for the equilibrium level of marginal costs as a
function of (i) observed data and (ii) slopes of demand that are known once demand is
identified. This insight extends easily to other static oligopoly models (e.g., quantity
setting) and to nonparametric models.42 The inverted first-order condition for each good j
in these cases takes the form
mcjt = ψj (st , pt ),
where the function ψj is known when the oligopoly game is specified and demand is
identified. Thus, when demand is identified, identification of the equilibrium level of
marginal costs (and therefore markups) follows immediately.
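As a concrete instance of the function ψj, consider multiproduct Nash price setting with plain multinomial logit demand. The choice of logit demand here is an assumption made only to keep the demand slopes in closed form; the same inversion applies with any demand system whose price derivatives are known. The first-order condition for product j is s_j + Σ_k O_jk (∂s_k/∂p_j)(p_k − mc_k) = 0, where O_jk = 1 if products j and k have the same owner. Function names are hypothetical.

```python
import numpy as np

def logit_share_jacobian(shares, alpha):
    """D[j, k] = d s_k / d p_j under logit utility x*beta - alpha*p + xi."""
    shares = np.asarray(shares, dtype=float)
    jac = alpha * np.outer(shares, shares)                   # cross-price terms
    jac[np.diag_indices_from(jac)] = -alpha * shares * (1.0 - shares)
    return jac

def implied_marginal_costs(prices, shares, alpha, firm_ids):
    """Invert the price-setting first-order conditions for marginal costs."""
    prices = np.asarray(prices, dtype=float)
    shares = np.asarray(shares, dtype=float)
    firm_ids = np.asarray(firm_ids)
    ownership = (firm_ids[:, None] == firm_ids[None, :]).astype(float)
    slopes = ownership * logit_share_jacobian(shares, alpha)
    # s + slopes @ (p - mc) = 0  =>  p - mc = -slopes^{-1} s
    markups = -np.linalg.solve(slopes, shares)
    return prices - markups
```

With single-product firms this reduces to the familiar logit markup 1/(α(1 − s_j)). Changing `firm_ids` shows how the same prices and shares imply different marginal costs under different ownership structures.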
Since marginal costs can be recovered (up to sampling error) by inverting the equilib-
rium first-order conditions, standard arguments can allow identification and estimation
of marginal cost functions as well. These functions can be of direct interest and they
are essential for the identification of counterfactuals that alter the equilibrium quantities
produced. But even if one’s interest is limited to demand, one can add precision to the
estimates of demand parameters by exploiting exogeneity assumptions involving shocks
to marginal costs.
Suppose, for example, that we specify
mcjt = cjt (wjt , qjt , ωjt , γ) = wjt γ0 + γ1 qjt + ωjt , (4.12)
where wjt and ωjt represent, respectively, observed and unobserved cost shifters associated
with good j. Here we have introduced the new parameters γ = (γ0 , γ1 ) that govern the
effects of cost shifters and quantity on marginal cost. Given any value of γ and the
demand parameters θ, the inverted first-order conditions together with equation (4.12)
imply a unique value of ωjt , so that we can write ωjt (σ, α, β, γ). Given any observables
z̃jt that are assumed to be mean-independent of ωjt , we now have additional moment
conditions of the form
E [ωjt (σ, α, β, γ) z̃jt ] = 0. (4.13)
42
Berry and Haile (2014) demonstrate nonparametric identification of marginal costs and cost functions
for a large class of supply models. The approach here generalizes the use of imperfectly competitive
first order conditions going back to Rosse (1970) and is inspired by the (somewhat different) use of
multiproduct first-order conditions in Bresnahan (1987).
A typical assumption is that both wt and xt (the two may overlap) are mean in-
dependent of each ωjt , yielding a large number of instruments that can be included in
the supply-side instruments z̃jt . Furthermore, (4.12) naturally models marginal cost as
depending only on own-cost shifters and own-quantity. This implies that additional pos-
sible excluded instruments include the exogenous cost shifters and demand shifters for
rival products. Thus, adding the supply side will often introduce few new parameters
relative to the number of new moment conditions. Importantly, these supply moments
depend not only on the cost parameters γ but also on the demand parameters. Thus,
except in the case of just-identification, the supply moments (4.13) will provide infor-
mation about demand parameters as well. In practice, this will often manifest through
substantial improvements in the precision of the demand estimates—the parameters βν
governing the heterogeneity in random coefficients, in particular—when one incorporates
supply moments (see, e.g., Berry, Levinsohn, and Pakes (1995) and Conlon and Gort-
maker (2020)).
When a model of supply leads to such overidentifying restrictions, it may be possible
to use the satisfaction or failure of these restrictions to discriminate between alternative
models of supply. This can be important in multiple ways. Hypotheses about firm “con-
duct” are often of direct interest. And to the extent that one is relying on the supply
model for precise demand estimates, it would be valuable to have a way of evaluating
the hypothesized model of firm behavior. This idea of discriminating between alternative
models of firm conduct has its roots in the pioneering work of Bresnahan (1981, 1987).
Although it is not possible to represent firm conduct as simply a parameter in a conjec-
tural variations model, Berry and Haile (2014) have generalized key insights from that
early literature to show that positing a model of firm conduct indeed provides falsifiable
restrictions that can discriminate between alternative models of conduct, even without
parametric specifications of demand or cost functions. The essence of their results is that
there are many observable (or estimable) sources of variation in market conditions that
alter equilibrium markups—differentially across different models of firm conduct. The
comparative statics predictions of a given model of firm conduct typically will not align
with the price variation observed in the data unless the hypothesized model is correct.
We refer readers to Berry and Haile (2014) for a more formal discussion. Statistical
testing procedures have recently been developed and applied by Backus, Conlon, and
Sinkinson (2020) and Duarte, Magnolfi, Sølvsten, and Sullivan (2021).
4.4 Computing the BLP Estimator and Standard Errors

Berry, Levinsohn, and Pakes (1995) provided a computational algorithm for their esti-
mator and for the associated standard errors. Their approach combined Monte Carlo
approximation of the integrals defining market shares and demand elasticities with an
algorithm similar to the sketch provided on page 23. This is an example of a nested-fixed-
point algorithm, using a contraction mapping to solve a set of fixed-point equations for
the demand shocks ξt (θ) that equate predicted and actual market shares in every market
at each trial value of θ.
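For reference, the contraction mapping of Berry, Levinsohn, and Pakes (1995) can be written, in the notation of (4.9), as the iteration

\[
\delta_t^{(h+1)} = \delta_t^{(h)} + \log(\tilde{s}_t) - \log\!\left(\sigma(\delta_t^{(h)}, x_t, \theta_2, J_t)\right),
\qquad h = 0, 1, 2, \ldots
\]

applied market by market until successive iterates differ by less than a chosen tolerance. The fixed point is the vector $\delta_t(\theta_2)$ that satisfies (4.9), and the demand shocks $\xi_t(\theta)$ then follow from (4.8). The superscript $(h)$ indexing iterations is notation introduced here.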
Whether estimating demand in isolation or jointly with supply, proper computation
of the BLP estimator can be challenging. While many authors succeeded in implementing
and customizing the BLP algorithm, naïve implementations can easily fail.43
Over the last decade, several authors—notably Dubé, Fox, and Su (2012) and Conlon
and Gortmaker (2020)—have aimed to modernize the approach to computing the BLP
estimator by combining modern computing capabilities, new computational methods,
and a set of “best practices” tailored specifically to computation of the BLP estimator.
Important advances include computational power allowing improvement in procedures
for approximation of the integrals defining market shares; Monte Carlo evidence yielding
critical guidance on convergence tolerances; management of potential rounding, overflow,
and underflow errors; new techniques for approximating optimal instruments; improved
methods for computing pricing equilibria; and the use of modern solvers, often exploiting
gradient-based optimization.
Conlon and Gortmaker (2020) discuss these and other important advances in de-
tail and propose a modern version of the BLP nested-fixed-point algorithm.44 Their
paper also serves as an introduction to open-source software (“PyBLP”) for implement-
ing this approach, either for estimating demand in isolation or simultaneous estimation
of demand and supply. Simulations in Conlon and Gortmaker (2020) illustrate relative
advantages of alternative techniques available among the many options offered in the Py-
BLP software. We refer readers to Conlon and Gortmaker (2020) for details, extensive
references, and advice on current best practices in computation of the BLP estimator,
optimal instruments, standard errors, and equilibrium counterfactuals.
Regarding standard errors, note that the program on page 23 intentionally defines a
set of moment conditions that permit the use of GMM inference techniques. There are
four potential sources of variance in the moment conditions: (a) the data variance across
products within markets, (b) the cross-market variance in data, (c) variance due to the
finite sample of consumers used to construct the market shares s̃jt and (d) variance due
to simulation draws (if any).
43
Knittel and Metaxoglou (2014) document such possibilities.
44
In an essential earlier contribution, Dubé, Fox, and Su (2012) examined potential problems with
poorly formulated versions of the nested fixed point algorithm and proposed the alternative approach
of applying standard specialized constrained optimization solvers to the program (4.6) defining the BLP
estimator. Dubé, Fox, and Su (2012) showed that this program often can be reformulated to yield a
form amenable to their “MPEC” (mathematical programming with equilibrium constraints) approach,
particularly when first and second derivatives of the Lagrangian can be supplied. The authors have
made code for this approach publicly available.
There is a small literature examining asymptotic issues that arise with the BLP
estimator. Berry, Linton, and Pakes (2004b) discuss two related issues that arise with
asymptotic approximations treating the number of products as growing large: (a)
simulation error in the approximation of choice probabilities implied by the model; and (b)
sampling error in the empirical market shares—sample means that are interpreted as
approximations to the population means implied by the model. These issues are closely
related, as some market shares must become small as the number of products per mar-
ket grows. Berry, Linton, and Pakes (2004b) note that the nonlinear inversion for the
mean utilities δt can cause the simulation and sampling errors to “blow up” as market
shares become small. It is therefore important to control the simulation error as much
as possible—by using a large number of simulation draws and/or importance sampling
techniques—or else to avoid simulation entirely by using accurate numerical integration.
Berry, Linton, and Pakes (2004b) discuss the need to account for simulation and sampling
error when reporting estimation results, and provide formulas for doing so. To focus on
the troublesome issue of small market shares, they present asymptotic variance results
as the number of products, J, grows large.45 Freyberger (2015) and Hong, Li, and Li
(2020) provide general treatments of the asymptotics of the BLP-style estimators as the
number of markets (and, if applicable, the number of simulation and sampling draws)
grows large.
Aside from standard errors for parameter estimates, one will typically be interested
in standard errors on counterfactual quantities of interest—e.g., a price change or wel-
fare change under a hypothetical merger or counterfactual policy. Current practice is
to construct such standard errors using either a parametric bootstrap or nonparametric
bootstrap. In the former case, one simulates parameter draws from their normal asymp-
totic approximation and recomputes the implied quantity of interest for each draw (see,
e.g., Nevo (2001)). In the latter case, one re-samples the data, re-runs the estimation
procedure on each bootstrap sample, and computes the implied quantity of interest for
each such bootstrap replication.
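The parametric bootstrap described above can be sketched as follows. The argument `quantity` is a hypothetical user-supplied function mapping a parameter vector to the counterfactual quantity of interest; in an application it would re-solve the relevant equilibrium at each draw. The parameter estimate and its estimated asymptotic covariance matrix are taken as given.

```python
import numpy as np

def parametric_bootstrap_se(theta_hat, vcov, quantity, n_draws=1000, seed=0):
    """Standard error of quantity(theta) from normal draws of the parameters.

    theta_hat: (d,) point estimate; vcov: (d, d) estimated covariance matrix;
    quantity: callable returning a scalar (or array) for a parameter vector.
    Returns (standard_error, simulated_values).
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(theta_hat, vcov, size=n_draws)
    values = np.array([quantity(draw) for draw in draws])
    return values.std(axis=0, ddof=1), values
```

Returning the simulated values as well makes it easy to report percentile intervals, which can be more informative than a standard error when the quantity of interest is a highly nonlinear function of the parameters.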
As a final issue, with sufficiently small consumer samples (relative to the number of
products), one may observe market shares s̃jt for some goods that are equal to zero,
even though the expected shares (the choice probabilities from the model) are strictly
positive. This creates a problem for any estimation strategy relying on inversion of
demand. Gandhi, Lu, and Shi (2019a) show that zero market shares nonetheless imply
bounds on mean utilities, yielding an estimation approach that is valid in the presence
of zero shares. Quan and Williams (2018) provide a more specialized solution to this
problem when products appear in multiple markets.
45
Armstrong (2016) points out a number of pathologies that occur under particular assumptions as J
grows large. For example, many models of competition will imply constant (or even zero) markups in the
limit, hindering any IV strategy that relies on the induced variation in oligopoly markups. These results
might be taken as a warning about how to develop useful asymptotic approximations when taking limits
in the number of products; asymptotic results that assume perfect competition or constant markups will
provide poor approximations for actual differentiated-products oligopoly markets in which markups are
substantial and highly sensitive to the intensity of competition faced by each product. Note also that,
although some types of instruments act only through their effects on markups, the essential role of BLP
instruments from the perspective of identification (see section 5) is to provide exogenous variation in
quantities conditional on prices.
5 Nonparametric Identification: Market-Level Data

We have presented a standard specification of demand derived from a random utility dis-
crete choice model that incorporates a combination of parametric assumptions—on the
functional form of utilities and on the distributions of latent heterogeneity across con-
sumers. These specific choices are not essential: while some of the presented parametric
choices lead to computational simplifications, the same approach generalizes easily to
other parameterizations. For example, a researcher could relax the linearity of the mean
utilities or replace one parametric distribution with another.
Because in practice all samples are finite, most empirical work relies to some degree
on choices of functional form. This is true even of “nonparametric” estimation meth-
ods, which depend on the specification of, e.g., a kernel function or a finite-dimensional
parametrization within a sieve sequence. There remains, however, the important question
of whether parametric assumptions can be correctly viewed as finite-sample approxima-
tions or are in fact essential maintained assumptions.
In many simple empirical settings in economics, it is straightforward to show that
parametric assumptions are not essential. That is, the quantities of interest are non-
parametrically identified. For example, in the case of IV regression, it is not essential to
assume the usual linear-in-parameters form. Rather, identification holds nonparametri-
cally given appropriate instruments for any endogenous variables (see, e.g., Newey and
Powell (2003)). Berry and Haile (2014) have demonstrated that the same is true for a
flexible nonparametric model of demand for differentiated products, even when one has
access only to market-level data.46 Indeed, the essential requirements are precisely the
instrumental variables conditions necessary for identification in the case of regression.
5.1 Insights from Parametric Models

Some useful intuition and insights about identification of demand can be gleaned by
reviewing some familiar parametric models of discrete choice demand. In particular,
these examples illustrate (i) the role of an index structure linking the unobservables ξjt to
observables; (ii) the role that inversion of the demand system plays in yielding equations
amenable to standard econometric tools; and (iii) the need for excluded instruments for
all endogenous variables (prices and quantities) appearing in those equations.
46
Here we focus on identification of demand. Berry and Haile (2014) also demonstrate nonparametric
identification of marginal costs, cost shocks, and marginal cost functions, as well as discrimination
between alternative models of supply.
5.1.1 Multinomial Logit
Consider a multinomial logit specification in which consumer i’s utility from good j in
market t takes the form
uijt = xjt β − αpjt + ξjt + εijt .
A key feature of this model is the linear index
δjt = xjt β − αpjt + ξjt . (5.1)
As is well known, each market share
eδjt
sjt = P (5.2)
1+ k eδkt
is a nonlinear function of the indices δ1t , . . . , δJt . A key observation is that this map from
indices to market shares is easily inverted; in particular,
δjt = ln(sjt ) − ln(s0t ). (5.3)
Substituting (5.3) into (5.1) yields an estimable equation for each good j of the form
ln(sjt ) − ln(s0t ) = xjt β − αpjt + ξjt . (5.4)
Estimation of this equation is straightforward (Berry (1994)). Indeed, although none of
the demand (share) equations (5.2) is a regression equation, (5.4) is. Its identification
requires one instrument for the one endogenous variable on the right-hand side. Thus,
even a single market-level instrument for prices can suffice.
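As an illustration of the steps leading to (5.4), the following minimal sketch simulates logit market shares, applies the inversion (5.3), and estimates (5.4) by two-stage least squares. The data-generating process (sample sizes, parameter values, the cost shifter `z`, and the pricing rule) and the helper `two_sls` are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: T markets, J inside goods, one characteristic, one cost shifter.
T, J = 2000, 3
beta, alpha = 1.0, 2.0                      # assumed "true" parameters

x = rng.normal(size=(T, J))                 # exogenous characteristic x_jt
z = rng.normal(size=(T, J))                 # excluded cost shifter z_jt
xi = 0.5 * rng.normal(size=(T, J))          # demand shock xi_jt
# Price responds to the cost shifter and to xi, so it is endogenous.
p = 1.0 + 0.5 * z + 0.5 * xi + 0.1 * rng.normal(size=(T, J))

# Shares from (5.2), with the outside good's index normalized to zero.
delta = x * beta - alpha * p + xi
expd = np.exp(delta)
denom = 1.0 + expd.sum(axis=1, keepdims=True)
s = expd / denom
s0 = 1.0 / denom

# Inversion (5.3): recover the index from observed shares alone.
y = (np.log(s) - np.log(s0)).ravel()

def two_sls(y, X, Z):
    """Two-stage least squares: project X on Z, then regress y on the projection."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

const = np.ones(T * J)
X = np.column_stack([const, x.ravel(), p.ravel()])   # regressors in (5.4)
Z = np.column_stack([const, x.ravel(), z.ravel()])   # instruments: x and the cost shifter

ols = np.linalg.lstsq(X, y, rcond=None)[0]           # ignores endogeneity of price
iv = two_sls(y, X, Z)
beta_hat, alpha_hat = iv[1], -iv[2]
```

In this simulated design, OLS applied to (5.4) is biased because price is correlated with the demand shock, while the IV estimates should lie close to the assumed values of β and α.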
This basic “recipe” of “index-inversion-instruments” is transparent in the multinomial
logit model. As we’ll see below, it generalizes to other parametric and nonparametric
models. To foreshadow elements of the fully nonparametric case, consider a more flexible
semi-parametric model in which we (a) partition $x_{jt}$ for each $j$ as $(x^{(1)}_{jt}, x^{(2)}_{jt})$, with
$x^{(1)}_{jt} \in \mathbb{R}$, and (b) require that the multinomial logit structure hold only after conditioning
on the remaining exogenous observables $x^{(2)}_t$. This allows a much more flexible treatment
of $x^{(2)}_t$ in the demand model. For example, the mean utility of good $j$ could be affected
by a fully nonparametric function of $x^{(2)}_{jt}$, or could even depend on $x^{(2)}_{kt}$ for $k \neq j$. To
consider identification in this more flexible model, we can simply fix $x^{(2)}_t$ at an arbitrary
value and drop it from the notation, so that we have the random utility specification
$$u_{ijt} = x^{(1)}_{jt}\beta^{(1)} - \alpha p_{jt} + \xi_{jt} + \epsilon_{ijt}, \qquad j = 1, \ldots, J.$$
The inverted demand equations, after dividing through by $\beta^{(1)}$, take the form
$$x^{(1)}_{jt} + \tilde{\xi}_{jt} = \frac{1}{\beta^{(1)}}\left(\ln(s_{jt}) - \ln(s_{0t})\right) + \frac{\alpha}{\beta^{(1)}}\, p_{jt}, \tag{5.5}$$
where $\tilde{\xi}_{jt} = \xi_{jt}/\beta^{(1)}$. Compared to (5.3), we still have a type of index (here $x^{(1)}_{jt} + \tilde{\xi}_{jt}$) on the
left-hand side; and this index is still equal to a tightly parameterized function of market
shares and the price of good $j$. If we re-arrange one more time to write
$$x^{(1)}_{jt} = \frac{1}{\beta^{(1)}}\left(\ln(s_{jt}) - \ln(s_{0t})\right) + \frac{\alpha}{\beta^{(1)}}\, p_{jt} - \tilde{\xi}_{jt}, \tag{5.6}$$
we get something that now resembles a regression equation, with an additively separable
error $\tilde{\xi}_{jt}$ on the right-hand side. This differs from a regression equation in one important
way, however: the variable on the left-hand side is an exogenous product characteristic;
i.e., it is mean independent of the “error term” $\tilde{\xi}_{jt}$. Writing the equation in this unusual
way forms a connection to the more complicated models we discuss below. An
implication of (5.6) is that, just as in the original fully linear multinomial logit model,
only one excluded instrument $z_{jt}$ is needed to identify this equation despite the presence
of two right-hand-side endogenous variables, $(\ln(s_{jt}) - \ln(s_{0t}))$ and $p_{jt}$. In particular,
with one such excluded instrument, we have the bivariate conditional moment restriction
$E[\xi_{jt} \mid x^{(1)}_{jt}, z_{jt}] = 0$, which can identify the two parameters appearing on the right-hand
side of (5.6).
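To make the counting argument concrete, the sketch below estimates the two coefficients in (5.6) from the two unconditional moments implied by the restriction above, using $x^{(1)}_{jt}$ itself and a single excluded instrument. The simulated design, the parameter values, and all variable names are hypothetical; the log share ratio is generated directly from the index, which is what (5.3) delivers under the logit structure.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6000                                   # product-market observations (hypothetical)
beta1, alpha = 2.0, 1.5                    # assumed "true" values of beta^(1) and alpha

x1 = rng.normal(size=N)                    # exogenous characteristic x^(1)_jt
z = rng.normal(size=N)                     # excluded instrument z_jt
xi = 0.5 * rng.normal(size=N)              # demand shock xi_jt
p = 1.0 + 0.5 * z + 0.5 * xi + 0.1 * rng.normal(size=N)   # endogenous price

# Under the logit structure, ln(s_jt) - ln(s_0t) equals the index by (5.3),
# so the log share ratio is generated directly.
y = x1 * beta1 - alpha * p + xi

# Equation (5.6): x1 = a*y + b*p - xi_tilde, with a = 1/beta1 and b = alpha/beta1.
W = np.column_stack([y, p])                # two endogenous right-hand-side variables
Zm = np.column_stack([x1, z])              # instruments: x^(1) itself and z

# Sample analog of E[(x1 - W @ theta) * Zm] = 0, solved for theta = (a, b).
theta = np.linalg.solve(Zm.T @ W, Zm.T @ x1)
a_hat, b_hat = theta
beta1_hat, alpha_hat = 1.0 / a_hat, b_hat / a_hat
```

The system is exactly identified: two moments, one from the "left-hand-side" characteristic and one from the excluded instrument, pin down the two coefficients.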
5.1.2 Nested Logit

Consider now the more general nested logit model where the inverted demand system
takes the form
$$\ln(s_{jt}) - \ln(s_{0t}) = x_{jt}\beta - \alpha p_{jt} + (1-\lambda)\ln(s_{j/g,t}) + \xi_{jt}, \qquad j = 1, \ldots, J.$$
Here the subscript $g$ denotes the nest (or “group”) to which product $j$ belongs. Compared
to the multinomial logit, flexibility is added through the new coefficient $\lambda$ on the within-group
share $s_{j/g,t}$ of good $j$. Each equation of this inverted demand system again looks
like a regression equation. However, we now need an instrument for the endogenous
variable $\ln(s_{j/g,t})$ as well as for price. Note that $\ln(s_{j/g,t})$ is a particular function of the
full share vector $(s_1, \ldots, s_J)$ implied by the parametric functional form.
Again, if we condition on $x^{(2)}_t$ and drop it from the notation, we can rewrite each
equation of this inverted demand system as
$$x^{(1)}_{jt} + \tilde{\xi}_{jt} = \frac{1}{\beta^{(1)}}\left[\ln(s_{jt}) - \ln(s_{0t}) - (1-\lambda)\ln(s_{j/g,t})\right] + \frac{\alpha}{\beta^{(1)}}\, p_{jt}. \tag{5.7}$$
On the left-hand side we have the same index derived under the multinomial logit; but
on the right-hand side we have a more complicated function of market shares and price.
Again, the rearrangement has no effect on the set of instruments needed.
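The nested logit inversion can be checked numerically. The sketch below computes shares under one standard parameterization of the nested logit (an assumption here, since the text states only the inverted form) and verifies that $\ln(s_{jt}) - \ln(s_{0t}) - (1-\lambda)\ln(s_{j/g,t})$ returns the mean utility of each good. The function name, nest assignment, and numerical values are hypothetical.

```python
import numpy as np

def nested_logit_shares(delta, nests, lam):
    """Nested logit shares for one market.

    delta: (J,) mean utilities; nests: (J,) integer nest labels; lam in (0, 1].
    Returns (shares, within-nest shares, outside share).
    """
    delta = np.asarray(delta, dtype=float)
    nests = np.asarray(nests)
    labels = np.unique(nests)                       # sorted nest labels
    scaled = np.exp(delta / lam)
    D = np.array([scaled[nests == g].sum() for g in labels])   # nest-level sums
    D_j = D[np.searchsorted(labels, nests)]         # each good's own nest sum
    s_within = scaled / D_j                         # s_{j/g}
    denom = 1.0 + (D ** lam).sum()
    s_group = (D_j ** lam) / denom                  # share of the nest
    s0 = 1.0 / denom                                # outside good
    return s_within * s_group, s_within, s0

delta = np.array([0.3, -0.2, 0.5, 0.1])    # hypothetical mean utilities
nests = np.array([0, 0, 1, 1])             # two nests of two goods each
lam = 0.6

s, s_within, s0 = nested_logit_shares(delta, nests, lam)
# Inverted demand: the right-hand side should reproduce delta.
delta_rec = np.log(s) - np.log(s0) - (1.0 - lam) * np.log(s_within)
```

With `lam = 1` the within-nest term drops out and the expression collapses to the multinomial logit inversion (5.3).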
5.1.3 The BLP Model

Now consider the BLP mixed logit model discussed previously and suppose that there is
at least one component $x^{(1)}_{jt}$ of the observed product characteristics $x_{jt}$ that is specified
as entering the model without a random coefficient. This is a restriction, and one we’ll
discuss further below; but in practice it is satisfied in almost all applications. As discussed
earlier, the demand system has an inverse, usually written in terms of the mean utility
vector $\delta_t$ or demand shocks $\xi_t$. However, as above we can also write the inverse for each
good $j$ in the form
$$x^{(1)}_{jt} + \tilde{\xi}_{jt} = \frac{1}{\beta^{(1)}}\, \tilde{\delta}_j\left(s_t, p_t, x^{(2)}_t, \theta\right). \tag{5.8}$$
On the left-hand side we have the same linear index as in the previous examples. On
the right-hand side, we have an inverse market share function $\tilde{\delta}_j$, which we know to
exist even though it lacks a closed form. This function depends on all of the model
parameters and is a more complicated function of prices and market shares, all of which
are correlated with the demand shock $\tilde{\xi}_{jt}$ in this equation. We explained earlier that with
instruments $z_{jt}$ comprising only the exogenous $x_{jt}$ and an excluded instrument for $p_{jt}$,
moment conditions of the form $E[\xi_{jt} z_{jt}] = 0$ will not suffice for identification. Equation
(5.8) suggests why: we need instruments generating sufficient variation in the endogenous
right-hand side variables $s_t$ and $p_t$.
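Although the inverse share function has no closed form, it can be computed numerically. The sketch below uses the familiar fixed-point iteration $\delta \leftarrow \delta + \ln s - \ln \sigma(\delta)$ in a deliberately stripped-down mixed logit with a single random coefficient; the function names, the simulation draws, and all numerical values are hypothetical, and prices and $x^{(2)}_t$ are absorbed into the mean utilities for brevity.

```python
import numpy as np

def simulated_shares(delta, x_rc, sigma, nu):
    """Mixed logit shares, averaging logit probabilities over draws nu."""
    v = delta[None, :] + sigma * nu[:, None] * x_rc[None, :]   # (R, J) utilities
    ev = np.exp(v)
    return (ev / (1.0 + ev.sum(axis=1, keepdims=True))).mean(axis=0)

def invert_shares(s_obs, x_rc, sigma, nu, tol=1e-12, max_iter=5000):
    """Solve simulated_shares(delta) = s_obs for delta by fixed-point iteration."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())   # logit inverse as a start
    for _ in range(max_iter):
        step = np.log(s_obs) - np.log(simulated_shares(delta, x_rc, sigma, nu))
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            break
    return delta

rng = np.random.default_rng(2)
nu = rng.normal(size=500)                        # draws of the random taste shock
x_rc = np.array([0.5, 1.0, 1.5, 2.0])            # characteristic with a random coefficient
delta_true = np.array([-1.0, -0.5, 0.0, -0.2])   # hypothetical mean utilities
sigma = 0.5                                      # assumed random-coefficient scale

s_obs = simulated_shares(delta_true, x_rc, sigma, nu)
delta_hat = invert_shares(s_obs, x_rc, sigma, nu)
```

The inversion is carried out at a given value of the nonlinear parameter `sigma`; in estimation it would be repeated at each trial value, which is one way to see why the right-hand side of (5.8) depends on all model parameters.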
5.1.4 Index, Inversion, and Instruments

This review of how standard parametric models are identified brings out three recurring
themes:
• demand shocks that enter through indices for each good;
• the presence of a one-to-one mapping between the indices and market shares, al-
lowing inversion of the demand system;
• the application of instrumental variables to identify the components of the inverse
demand.
We will see below that these same ideas allow one to demonstrate nonparametric
identification of demand from market-level data. Following Berry and Haile (2014), we
introduce a nonparametric index restriction to an otherwise very general demand model;
the demand system is then inverted, yielding a set of inverse demand equations with
one demand shock per equation. Standard IV conditions yield identification of these
equations and, therefore, of the realized demand shocks. Identification of demand then
becomes trivial, as the values of all variables in the demand system are known.
5.2 Nonparametric Demand Model
Without loss, we condition on a fixed number of inside goods J. Let demand for each
good j in market t be given by
$$s_{jt} = \sigma_j(x_t, p_t, \xi_t), \qquad j = 1, \ldots, J. \tag{5.9}$$
Although we write sjt on the left-hand side of (5.9), this measure of demand at the market
level may be measured in quantities, market shares, or other one-to-one transformations
of the demand vector (i.e., the quantities demanded).47 As above, xt represents all
observed exogenous characteristics of the market and goods,48 pt represents the prices of
all goods, and ξt represents the J-vector of demand shocks.
This representation of demand may be derived from a random utility discrete choice
model in which the utilities (ui1t , . . . , uiJt ) are drawn from some unknown joint distri-
bution FU (·|xt , pt , ξt ). However, the demand system (5.9) need not arise from a random
utility specification, or even a discrete choice model. Indeed, thus far, the demand model
is very general, with the only significant restriction being that the number of scalar de-
mand shocks ξjt is J. To demonstrate identification, Berry and Haile (2014) require three
main assumptions closely tied to our insights from the parametric examples.
5.2.1 A Nonparametric Index

Partition $x_t$ as $(x^{(1)}_t, x^{(2)}_t)$ where $x^{(1)}_t = (x^{(1)}_{1t}, \ldots, x^{(1)}_{Jt}) \in \mathbb{R}^J$. For each market $t$, define a
vector of indices $\delta_t = (\delta_{1t}, \ldots, \delta_{Jt})$ where
$$\delta_{jt} = x^{(1)}_{jt}\beta_j + \xi_{jt}. \tag{5.10}$$

Assumption 5.1 (Index). For all $j$, $\sigma_j(x_t, p_t, \xi_t) = \sigma_j\left(x^{(2)}_t, \delta_t, p_t\right)$.

Assumption 5.1 requires that $x^{(1)}_{jt}$ and $\xi_{jt}$ enter the nonparametric function $\sigma$ only
through the index $\delta_{jt}$. This index requirement is an important nonparametric functional
form restriction. As our examples above suggest, this type of restriction is implicit in
parametric models widely used in the practice of demand estimation. Here, of course,
the indices (δ1t , . . . , δJt ) are permitted to alter demand for each good j through a fully
47
Berry, Gandhi, and Haile (2013) point out that transformations of quantities demanded to market
shares, expenditure shares, or even purely artificial notions of “shares” can be useful for verifying con-
ditions ensuring invertibility of the demand system (see section 5.2.2 below). However, if demand is
invertible (or is identified) under one known injective transformation of quantities, it is under all such
transformations as well.
48
Any non-price market-level observables—e.g., average demographic measures—may be included in
xt .
nonparametric function σj . We will see below why this index structure is valuable,49 and
how it may be relaxed when one has access to “micro data” on individual choices rather
than market-level data.
As in the parametric examples, the fact that demand shocks have no natural location
or scale means that we may assume without loss of generality that $E[\xi_{jt}] = 0$ and $|\beta_j| = 1$
for all $j$. Thus, henceforth we work with indices of the form50
$$\delta_{jt} = x^{(1)}_{jt} + \xi_{jt}. \tag{5.11}$$
Furthermore, because the exogenous variables $x^{(2)}_t$ play no role in the identification argument,
we will henceforth condition on an arbitrary value of $x^{(2)}_t$ without loss of generality
and suppress $x^{(2)}_t$ in the notation.
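One way to see why the scale normalization is without loss is the following sketch, which assumes $\beta_j \neq 0$ and introduces the rescaled objects $\tilde{\delta}_{jt}$, $\tilde{\xi}_{jt}$, and $\tilde{\sigma}_j$ purely for illustration (with $x^{(2)}_t$ suppressed):

```latex
\begin{aligned}
\tilde{\delta}_{jt} &\equiv \frac{\delta_{jt}}{|\beta_j|}
  = \operatorname{sign}(\beta_j)\, x^{(1)}_{jt} + \tilde{\xi}_{jt},
  \qquad \tilde{\xi}_{jt} \equiv \frac{\xi_{jt}}{|\beta_j|}, \\
\tilde{\sigma}_j\big(\tilde{\delta}_t, p_t\big) &\equiv
  \sigma_j\big(|\beta_1|\tilde{\delta}_{1t}, \ldots, |\beta_J|\tilde{\delta}_{Jt},\, p_t\big)
  = \sigma_j(\delta_t, p_t).
\end{aligned}
```

Because $\sigma_j$ is an unrestricted function of the indices, the pair $(\tilde{\sigma}_j, \tilde{\xi}_{jt})$ generates the same demand as $(\sigma_j, \xi_{jt})$, and $E[\tilde{\xi}_{jt}] = 0$ whenever $E[\xi_{jt}] = 0$. Once the sign of $\beta_j$ is known, replacing $x^{(1)}_{jt}$ by $-x^{(1)}_{jt}$ where needed delivers an index with unit coefficient. A nonzero mean of $\xi_{jt}$ can be absorbed into $\sigma_j$ in the same way.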
5.2.2 Inverting Demand

In the parametric examples, a key step was inverting the demand system. We can use
that strategy in the nonparametric model under a “connected substitutes” condition
introduced by Berry, Gandhi, and Haile (2013).
Assumption 5.2 (Connected Substitutes).

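(i) $\sigma_k(\delta_t, p_t)$ is nonincreasing in $\delta_{jt}$ for all $j > 0$, $k \neq j$, and any $(\delta_t, p_t) \in \mathbb{R}^{2J}$;
(ii) for each $(\delta_t, p_t) \in \mathrm{supp}(\delta_t, p_t)$ and any nonempty $\mathcal{K} \subseteq \{1, \ldots, J\}$, there exist $k \in \mathcal{K}$
and $\ell \notin \mathcal{K}$ such that $\sigma_\ell(\delta_t, p_t)$ is strictly decreasing in $\delta_{kt}$.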
Part (i) of Assumption 5.2 requires that goods be weak substitutes with respect to
the indices: an improvement in the index δjt must weakly reduce the demand for other
goods. This is automatic in a discrete choice setting in which δjt can be interpreted as
an index altering good j’s quality. While part (i) requires only weak substitution, part
(ii) requires at least some strict substitution among goods j = 0, 1, . . . , J—essentially,
enough that there is no strict subset of goods that substitute only among themselves.
Berry, Gandhi, and Haile (2013) further show that part (ii) is equivalent to a certain
notion of connectedness in the graph of the substitution matrix among goods. In par-
ticular, suppose we represent each good by a vertex and construct a directed edge from
good j to good k if j strictly substitutes to k—i.e., if a reduction in δjt would lead to
an increase in the demand for good k. Part (ii) requires (with appropriate quantifiers)
that this directed graph exhibit a path from each node j to the node 0 associated with
the outside good. Figure 2 illustrates this graph for 2 classes of standard discrete choice
models in which δjt shifts the utility for good j without affecting the utility from other
goods; panel (a) shows the graph for standard random utility models with horizontal
49
Berry and Haile (2014) show that the additive separability of (5.1) is not essential, although relaxing
separability requires strengthening the “relevance” condition on instrumental variables, just as in the
case of regression models (Chernozhukov and Hansen (2005)).
50
If the sign of βj is not known a priori, it is easily determined under our assumptions.
differentiation (e.g., multinomial logit or probit, nested logit, mixed logit); panel (b)
shows that for models of vertical differentiation (e.g., Bresnahan (1981)). In both cases
we see that from the vertex associated with any good j > 0, there is a directed path
to the vertex associated with the outside good, as required by the connected substitutes
conditions.
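The graph characterization lends itself to a simple mechanical check. The sketch below represents strict substitution as a directed graph and tests whether every inside good has a directed path to the outside good (node 0). The function name and the three example graphs are hypothetical; in particular, the vertical example assumes that goods are ordered by quality and substitute only with their immediate neighbors.

```python
from collections import deque

def all_reach_outside_good(edges, n_inside):
    """Return True if every inside good 1..n_inside has a directed path to node 0.

    edges[j] is the set of goods k that good j strictly substitutes to.
    """
    # Walk backwards from the outside good along reversed edges.
    reverse = {k: set() for k in range(n_inside + 1)}
    for j, targets in edges.items():
        for k in targets:
            reverse[k].add(j)
    seen, queue = {0}, deque([0])
    while queue:
        k = queue.popleft()
        for j in reverse[k] - seen:
            seen.add(j)
            queue.append(j)
    return all(j in seen for j in range(1, n_inside + 1))

J = 4
# Horizontal differentiation: every good substitutes to every other good.
horizontal = {j: set(range(J + 1)) - {j} for j in range(J + 1)}
# Pure vertical model (assumed chain 0 - 1 - 2 - 3 - 4 of adjacent substitutes).
vertical = {j: {k for k in (j - 1, j + 1) if k in range(J + 1)} for j in range(J + 1)}
# A failing case: goods 3 and 4 substitute only with each other.
isolated = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}

results = {
    "horizontal": all_reach_outside_good(horizontal, J),
    "vertical": all_reach_outside_good(vertical, J),
    "isolated": all_reach_outside_good(isolated, J),
}
```

The third graph fails because the subset {3, 4} substitutes only within itself, which is the situation part (ii) of Assumption 5.2 rules out.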
Figure 2: Substitution in Standard Discrete Choice Models
[Figure: panels (a) and (b), each a directed graph on the vertices 0, 1, 2, 3, 4.]
Directed graphs of the substitution matrix for standard discrete choice models, with J = 4
inside goods. Panel (a): standard random utility models of horizontal differentiation,
such as the multinomial logit, multinomial probit, nested logit, mixed logit/probit. Panel
(b): the pure vertical model with an outside good. From each vertex associated with an
inside good there is a directed path to the vertex associated with the outside good.
Berry, Gandhi, and Haile (2013) demonstrate satisfaction of this condition in a wide
range of demand models and demonstrate that, because invertibility of demand is ensured
whenever the connected substitutes conditions hold for some injective transformation of
the demand system, invertibility can be demonstrated by these conditions even in some
cases in which goods are complements. We refer readers to Berry, Gandhi, and Haile
(2013) for additional examples and discussion. For our purposes, the key implication is
that for all demand vectors st such that sjt > 0 for all j, there exists an inverse demand
system taking the form
$$\delta_{jt} = \sigma_j^{-1}(s_t; p_t), \qquad j = 1, \ldots, J. \tag{5.12}$$
5.3 Identification via Instruments
We will see that with equation (5.12), identification of demand will follow from availability
of instruments satisfying the same conditions required of instruments for identification
of regression models. In the case of nonparametric regression we are interested in an
equation of the form51
$$y = \Gamma(x) + \epsilon, \tag{5.13}$$
where $x \in \mathbb{R}^K$. Newey and Powell (2003) showed that given instruments $z$ satisfying
the mean independence condition $E[\epsilon \mid z] = 0$, a necessary and sufficient condition for
identification of the regression function $\Gamma$ is a standard “completeness” condition: that
in the class of functions $B(\cdot)$ on $\mathbb{R}^K$ such that $E[B(x) \mid z]$ is finite, the only function $B$
such that $E[B(x) \mid z] = 0$ almost surely is a function that maps to zero almost surely
on its domain. This “completeness” condition is, thus, the nonparametric analog of the
standard rank condition for linear regression. It is the formal “relevance” requirement on
the instruments, defining precisely what it means for them to provide sufficient exogenous
variation in the regressors x.
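When $x$ and $z$ each take finitely many values, the completeness condition reduces to a rank condition on the matrix of conditional probabilities, which makes the analogy with the linear rank condition concrete. The sketch below checks this discrete analog; the function name and the example matrices are hypothetical.

```python
import numpy as np

def is_complete_discrete(P, tol=1e-10):
    """Check completeness when x and z are both discrete.

    P[l, k] = Pr(x = x_k | z = z_l). Since E[B(x) | z = z_l] = sum_k P[l, k] * B(x_k),
    the only B with P @ B = 0 is B = 0 exactly when P has full column rank.
    """
    P = np.asarray(P, dtype=float)
    return bool(np.linalg.matrix_rank(P, tol=tol) == P.shape[1])

# Three instrument values shifting the distribution of a three-valued regressor.
P_relevant = np.array([[0.7, 0.2, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]])

# Only two instrument values: too little variation for a three-valued regressor.
P_too_few = P_relevant[:2, :]

# Three instrument values that do not shift the distribution of x at all.
P_irrelevant = np.tile([0.3, 0.4, 0.3], (3, 1))

checks = [is_complete_discrete(P) for P in (P_relevant, P_too_few, P_irrelevant)]
```

The second and third cases fail for different reasons: too few instrument values in one, and no dependence of $x$ on $z$ in the other.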
To connect this to demand, observe that we may re-arrange each equation of (5.12)
as
$$x^{(1)}_{jt} = \sigma_j^{-1}(s_t; p_t) - \xi_{jt}, \tag{5.14}$$
yielding a form similar to (5.13). Unlike the regression equation, here we have an exogenous
variable $x^{(1)}_{jt}$ (indeed, a variable that is an essential instrument) on the left and all
endogenous variables, $(s_t, p_t)$, on the right. Nonetheless, a very similar argument demonstrates
that with excluded instruments $z_t$, identification of the inverse demand function
$\sigma_j^{-1}$ holds under the same pair of conditions ensuring identification in the case of regression.

Assumption 5.3 (Instruments). (i) For all $j = 1, \ldots, J$, $E[\xi_{jt} \mid z_t, x^{(1)}_t] = 0$ almost surely.
(ii) For all functions $B(s_t, p_t)$ with finite expectation, if $E[B(s_t, p_t) \mid z_t, x^{(1)}_t] = 0$ almost
surely then $B(s_t, p_t) = 0$ almost surely.
Lemma 1. Under Assumptions 5.1–5.3, for all $j = 1, \ldots, J$, $\sigma_j^{-1}$ is identified on the
support of $(s_t, p_t)$.
We refer readers to Berry and Haile (2014) for the proof, which follows that of Newey
and Powell (2003) closely. Observe that when each function σj−1 is known, each ξjt can
be inferred immediately from prices, market shares, and (5.14). With each ξjt known,
identification of demand follows directly from equation (5.9).
Theorem 5.1 (Berry and Haile (2014)). Suppose (st , xt , pt , zt ) are observable and that
Assumptions 5.1–5.3 hold. Then for all j, the demand function σj is identified.
51
We abuse notation here to follow standard convention, letting $y$, $x$, and $\epsilon$ denote the components
of a regression model (with z denoting an excluded instrument). However, none of these components
should be confused with similarly denoted elements of the demand models we discuss.
5.4 Discussion
A broad interpretation of this identification result is that, despite the substantial chal-
lenges discussed in section 2, the main requirement for identification of demand is a
completely standard one: availability of suitable instruments for the endogenous vari-
ables. On one hand, this is comforting. On the other hand, the result raises several
natural questions about the precise instrumental variables requirements and whether
further structure on the demand model might relax these requirements.
5.4.1 Why 2J Instruments?

Theorem 5.1 required instrumental variables conditions on both the exogenous product
characteristics $x^{(1)}_t$ and excluded instruments $z_t$. The product characteristics $x^{(1)}_t$ have
dimension $J$ by construction. And a necessary condition for satisfaction of the completeness
condition is that $z_t$ have dimension of at least $J$. Candidates for $z_t$ include those
discussed in section 4.2: cost shifters, proxies for cost shifters (e.g., “Hausman instruments”),
and markup shifters such as “Waldfogel instruments” or exogenous measures of
competition. The need for these instruments is quite intuitive: to learn about demand,
we need exogenous variation in the $J$-dimensional price vector $p_t$. Less intuitive is the
need for the $J$-dimensional exogenous product characteristics $x^{(1)}_t$ as well.
This requirement ties directly to our discussion in section 2 of the fact that a J-
dimensional unobserved vector ξt appears as a determinant of the demand for each good.
Inversion of the demand system is a “trick” that yields a set of equations
$$x^{(1)}_{jt} = \sigma_j^{-1}(s_t; p_t) - \xi_{jt}, \tag{5.15}$$
each with one demand shock, as needed for application of standard instrumental variables
arguments. However, each of these inverted demand equations features endogenous variables
$(s_t, p_t) \in \mathbb{R}^{2J}$ on the right-hand side. More intuitively, observing that the function
$\sigma_j^{-1}$ above must be strictly increasing in $s_{jt}$, we can rewrite (5.15) as
$$s_{jt} = h_j\left(s_{-jt}, p_t, x^{(1)}_{jt}, \xi_{jt}\right)$$
for some function $h_j$. This formulation suggests that $x^{(1)}_{jt}$ can act as an “instrument for
itself,” but that we still need $J$ instruments for prices and $J - 1$ instruments for the
endogenous quantities $s_{-jt}$.
Putting this differently, we know that to quantify demand, we must pin down how the
quantity demanded of each good j changes when one price varies and all others are held
fixed. This indicates a need for exogenous shifters of prices. But we know (recall section
2.1) that exogenous variation in prices is not sufficient. We must also hold the demand
indices δt fixed. This is not obviously feasible because the demand shocks entering δt
are not observed. However, by exploiting the bijection between the vector δt and the
vector of market shares st (all else fixed), we can indirectly control the value of δt by
controlling the share vector st . This requires additional instruments for the J-vector of
market shares.
The need to instrument for market shares can be seen in our parametric examples
as well. There we saw that as we increased the flexibility of the demand model, the
inverse demand equation used for estimation involved not only prices but also the market
share vector: market shares entered with no unknown parameters in the case of the
multinomial logit, but as we added flexibility to the parametric model, additional moment
conditions were required to estimate the inverse demand functions. In the limiting case of
a fully nonparametric model, we require instruments for all prices and quantities (market
shares).
5.4.2 Why BLP Instruments?

The elements of the $J$-vector $x^{(1)}_t$ are examples of the so-called BLP instruments. Collectively,
they have dimension $J$ by construction. They shift prices through the supply
equilibrium. But at any given price vector, the observables $x^{(1)}_t$ also directly affect demand.
Thus, these may serve the role of the required instruments for market shares. In
fact, these instruments are essential to identification of the nonparametric demand model
in the case of market-level data. To see this, let us suppose that prices in each market
were set exogenously at levels independent of the demand shocks $\xi_t$. We still have the
fundamental problem that demand for each good depends on the $J$-vector $\xi_t$. Inverse
demand still takes the form
$$\xi_{jt} = \sigma_j^{-1}(s_t; p_t) - x^{(1)}_{jt}$$
or, equivalently,
$$s_{jt} = h_j\left(s_{-jt}, p_t, x^{(1)}_{jt}, \xi_{jt}\right).$$
Identification of $h_j$ or $\sigma_j^{-1}$ will, intuitively, still require $J - 1$ instruments for $s_{-jt}$. Cost
shifters and markup shifters are of no help in this case: because they affect market shares
only through prices, they provide no variation in demand beyond that already accounted
for directly through $p_t$. And we have already conditioned on (held fixed) $x^{(2)}_t$, leaving
no variation to exploit. Thus, the demand shifters $x^{(1)}_{-jt}$ are the only possibility in the
“menu” of candidate instruments discussed above.52
More simply, we have seen that in the most general model we need independent
variation in all J shares and all J prices. So no matter how much variation we can
generate in prices, we need something else that moves all shares at any given price
vector. The BLP instruments are the only candidates in our setting of market-level data.
52
Additional types of observables or additional assumptions could offer alternative strategies. See, for
example, sections 5.4.4, 6, and 7.
5.4.3 Why the Index?
Although the preceding discussion explains why the variation provided by BLP instru-
ments is critical, it may be less clear why these instruments are valid, even given the
exogeneity required by Assumption 5.3. The power of these instruments to shift market
shares is clear. But how is it that observables entering the demand system can end up
satisfying the relevant exclusion restriction? In what sense are these observables properly
excluded? The answer to this question emphasizes one role that the index restriction is
playing in the identification argument.
Recall that we conditioned on $x^{(2)}_t$, so the inverse demand function associated with
the index $\delta_{jt}$ is really
$$\sigma_j^{-1}\left(s_t, p_t; x^{(2)}_t\right).$$
In terms of the identification argument, conditioning on $x^{(2)}_t$ really means that $x^{(2)}_t$ “instruments
for itself.” The index restriction is what leaves $x^{(1)}_t$ out of the function $\sigma_j^{-1}$,
making $x^{(1)}_t$ available as instruments for shares, and in particular allowing $x^{(1)}_{-jt}$ to serve as
instruments for $s_{-jt}$.
This feature of the index is closely connected to a restriction that is often so natural
that it is assumed without comment: that characteristics of a good alter its utility but
not the utility from other goods. In the canonical random utility model, this applies to
all components of $(x_{jt}, p_{jt}, \xi_{jt})$, leaving $x_{-jt}$ out of the inverted demand equation expressing
$\xi_{jt}$ as a function of market shares, prices, and the exogenous observables $x_{jt}$. Although
we do not require a specification of utilities, the index structure leaves the role of $x^{(2)}_t$
entirely free while preserving this same natural feature with respect to the components
of $x^{(1)}_t$.
5.4.4 Further Restrictions and Tradeoffs

The prior discussion makes clear the essential roles of the index restriction, BLP instru-
ments, and additional instruments for prices—at least in the absence of further restric-
tions on the model or additional observables. Here we give just two examples illustrating
how additional structure can soften the IV requirements.
Suppose that, instead of $x^{(1)}_{jt}$, it is the price of good $j$ that enters only through good
$j$’s index. Then
$$\delta_{jt} = \xi_{jt} - \alpha p_{jt},$$
where we now set $\alpha = 1$ without loss of generality. Treating $x_t$ fully flexibly and conditioning
on it, the inverse demand equations would then take the form
$$\xi_{jt} = \sigma_j^{-1}(s_t) + p_{jt}$$
or
$$p_{jt} = -\sigma_j^{-1}(s_t) + \xi_{jt}.$$
In this case we would require only $J$ instruments, providing exogenous variation in $s_t$
that need not be independent of prices. Cost shifters, Hausman instruments,
Waldfogel instruments, etc. are all candidates.
As a second example, suppose that both $x^{(1)}_t$ and prices enter only through the indices,
as
$$\delta_{jt} = \xi_{jt} + x^{(1)}_{jt}\beta - \alpha p_{jt},$$
where we may again set $\alpha = 1$ without loss. The inverse demand equations then have
the form (suppressing $x^{(2)}_t$)
$$\xi_{jt} = \sigma_j^{-1}(s_t) - x^{(1)}_{jt}\beta + p_{jt}$$
or
$$p_{jt} = -\sigma_j^{-1}(s_t) + x^{(1)}_{jt}\beta + \xi_{jt}.$$
Again, instruments are required only for the market share vector $s_t$, and the BLP instruments
$x^{(1)}_{-jt}$ are now available options in addition to those discussed above.
These are just two examples illustrating trade-offs between functional form restric-
tions and IV requirements. This points out one role that even nonparametric functional
form restrictions can play in practice: filling in the gap between the exogenous variation
that may be available in a given data set and that which would be needed to discriminate
between all nonparametric demand systems. Of course, some types of restrictions—e.g.,
adding symmetry, exchangeability, nesting, etc.—can arise as natural economic restric-
tions. And in some cases, better data will provide an avenue for relaxing the IV require-
ments. We turn next to a leading example of this: consumer-level data.
6 Micro Data, Panels, and Ranked Choices

6.1 Micro Data
In the context of demand estimation, the term “micro data” typically refers to a setting
where a researcher observes individual consumer characteristics dit matched to the choices
qit (e.g., vectors of quantities purchased) of each consumer. With market-level data,
the observables can reveal only the marginal distributions Fd (dit ) , Fq (qit ) of consumer-
specific observables (e.g., demographics) dit and consumer-level choices qit (conditional on
market-level observables) in each market. Micro data can reveal their joint distribution
Fdq (dit , qit ). This clearly provides useful information about how the observables dit alter
individual choices. For example, this could allow assignment of each consumer type—
each value of dit —to its own market, making the model completely flexible with respect
to the effects of dit . However, if one retains a more standard notion of market (e.g.,
based on a combination of time and geography), micro data provide a panel structure:
observed outcomes for many individual consumers within each market. A key benefit—
even if one is ultimately interested only in the market-level demand faced by firms—is
that one can then exploit variation across consumers within each market, where the
product×market-level demand shocks are fixed.
McFadden’s classic work on demand for transportation modes (McFadden, Talvitie,
and Associates (1977)) is an example of estimation from micro data. Other prominent ex-
amples involve demand for hospitals (e.g., Capps, Dranove, and Satterthwaite (2003), Ho
(2009)), retail outlets (e.g., Burda, Harding, and Hausman (2015)), residential locations
(e.g., Bayer, Ferreira, and McMillan (2007), Diamond (2016)), automobiles (e.g., Gold-
berg (1995), Petrin (2002)) and schools (e.g., Neilson (2020)). In these cases, observed
demographic measures and geographic locations often play important roles. Other exam-
ples of consumer-level observables include product-specific advertising exposure (Acker-
berg (2003)), consumer-newspaper ideological match (Gentzkow and Shapiro (2010)),
or the match between household demographics and those of a school or neighborhood
(Bayer, Ferreira, and McMillan (2007), Hom (2018)).
As we will see below, one significant advantage of micro data is the potential for
within-market variation to lessen (but not eliminate) reliance on instrumental variables.
Indeed, micro data can both reduce the number of instrumental variables necessary for
identification and make new kinds of instrumental variables available.
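To consider parametric models, we again focus on a mixed logit specification, with
conditional indirect utilities of the form
$$u_{ijt} = x_{jt}\beta_{it} - \alpha_0 p_{jt} + \xi_{jt} + \epsilon_{ijt},$$
where
$$\beta^{(k)}_{it} = \beta^{(k)}_0 + \sum_{\ell=1}^{L} \beta^{(\ell,k)}_d d_{i\ell t} + \beta^{(k)}_\nu \nu^{(k)}_{it}.$$
We can rewrite this as
$$u_{ijt} = \delta_{jt} + \mu_{ijt}(\nu_{it}; \beta_d, \beta_\nu) + \epsilon_{ijt}, \tag{6.1}$$
with
$$\mu_{ijt}(\nu_{it}; \beta_d, \beta_\nu) = \sum_{k=1}^{K} x^{(k)}_{jt}\left(\beta^{(k)}_\nu \nu^{(k)}_{it} + \sum_{\ell=1}^{L} \beta^{(\ell,k)}_d d_{i\ell t}\right) \tag{6.2}$$
and
$$\delta_{jt} = x_{jt}\beta_0 - \alpha_0 p_{jt} + \xi_{jt}. \tag{6.3}$$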
McFadden, Talvitie, and Associates (1977) referred to δjt as the “alternative-specific”
constant, which was held fixed in certain policy counterfactuals. The modern IO litera-
ture emphasizes the dependence of these “constants” on prices and other observed and
unobserved factors. Of course, demand elasticities (and other key aspects of demand) are
defined by responses to variation in one determinant of δjt while all other determinants
are held fixed.53
As discussed in section 2, the latent demand shocks in these constants introduce the
challenges of simultaneity/endogeneity. There was at one point perhaps some confusion
as to whether the simultaneity problem arises in the micro-data context, reflecting the
observation that the individual consumer does not “cause” the price. But this observation
has no bearing on the simultaneity problem that arises from unobservables at the level of
the product or market. We will see below that micro data can provide partial solutions to
the simultaneity challenges. However, as our discussion below makes clear, the problems
of simultaneity/endogeneity stem not from the level of aggregation of the data but from
the presence of market×product-level demand shocks whose effects are confounded with
those of market×product-level prices. Addressing this endogeneity problem generally
will still require cross-market variation and instruments for prices.
With the specification above, choice probabilities for each consumer $i$ take the form
$$s_{ijt} = \int \frac{\exp\{\delta_{jt} + \mu_{ijt}(\nu_{it}; \beta_d, \beta_\nu)\}}{\sum_{k=0}^{J_t} \exp\{\delta_{kt} + \mu_{ikt}(\nu_{it}; \beta_d, \beta_\nu)\}}\, dF_\nu(\nu_{it}) \tag{6.4}$$
for each good $j = 0, 1, \ldots, J_t$. Let $j(i)$ denote the good selected by consumer $i$ in market
$t$. Substituting $j(i)$ for $j$ in (6.4) gives the likelihood contribution of consumer $i$’s choice
as a function of parameters $(\delta, \beta_d, \beta_\nu)$, where we have now defined $\delta = \{\delta_t\}_{t=1}^{T}$ and
treated it as a parameter vector. Although this likelihood would need to be approximated
by simulating from the distribution $F_\nu$, this immediately suggests an estimation approach
for these parameters, at least when the number of observed consumers per good is large
in each market.54 In particular, one could estimate $(\delta, \beta_d, \beta_\nu)$ by maximizing the product
of these (simulated) likelihoods over all consumers,
$$L(\delta, \beta_d, \beta_\nu) = \prod_{i,t} \int \frac{\exp\{\delta_{j(i)t} + \mu_{ij(i)t}(\nu_{it}; \beta_d, \beta_\nu)\}}{\sum_{k=0}^{J_t} \exp\{\delta_{kt} + \mu_{ikt}(\nu_{it}; \beta_d, \beta_\nu)\}}\, dF_\nu(\nu_{it}). \tag{6.5}$$
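A minimal sketch of the simulated choice probabilities (6.4) and the corresponding log-likelihood for a single market follows. It assumes that the outside good's utility index is normalized to zero and that the same simulation draws are shared across consumers; the function names, array layouts, and these conventions are illustrative choices rather than part of the model. Summing the log-likelihood across markets gives the logarithm of (6.5).

```python
import numpy as np

def choice_probs(delta, x, d, nu, beta_d, beta_nu):
    """Simulated analog of (6.4) for one market.

    delta: (J,) mean utilities of inside goods; x: (J, K) product characteristics;
    d: (N, L) consumer observables; nu: (R, K) draws from F_nu;
    beta_d: (L, K) with beta_d[l, k] = beta_d^(l,k); beta_nu: (K,).
    Returns an (N, J+1) array; column 0 is the outside good.
    """
    # Consumer- and draw-specific taste deviations, shape (N, R, K).
    dev = (d @ beta_d)[:, None, :] + (nu * beta_nu)[None, :, :]
    mu = dev @ x.T                                   # (N, R, J), as in (6.2)
    v = delta[None, None, :] + mu
    v = np.concatenate([np.zeros(v.shape[:2] + (1,)), v], axis=2)  # outside good
    v -= v.max(axis=2, keepdims=True)                # guard against overflow
    ev = np.exp(v)
    probs = ev / ev.sum(axis=2, keepdims=True)
    return probs.mean(axis=1)                        # average over simulation draws

def log_likelihood(choices, delta, x, d, nu, beta_d, beta_nu):
    """Sum of log simulated probabilities of the observed choices j(i)."""
    probs = choice_probs(delta, x, d, nu, beta_d, beta_nu)
    return float(np.log(probs[np.arange(len(choices)), choices]).sum())
```

In practice this objective would be passed to a numerical optimizer over $(\delta, \beta_d, \beta_\nu)$, with the simulation draws held fixed across evaluations.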
Of course, to answer most economic questions one will also need to estimate the
parameters α0 and β0 in (6.3). The simplest approach is to run a second-step linear
IV regression of the estimated δjt on xjt and pjt .55 As an example, Bayer, Ferreira,
53
A similar observation limits the applicability of random utility models that represent all latent factors
with a single (typically additive) random shock. For example, the random utility model in (6.1)–(6.3)
could be rewritten more compactly as uijt = φ(xjt , pjt ) + eijt , allowing correlation between eijt and
(xjt , pjt ). But this representation alone is not adequate as a model of demand in such a setting. To
measure an own-price demand elasticity, for example, one must hold ξjt (among other things) fixed while
letting both the mean utility for good j and the distribution of the stochastic term µijt (νit ; βd , βν ) adjust
to a change in pjt . By specifying only a composite stochastic term eijt , the more compact representation
does not allow one to even define such ceteris paribus changes.
54
Otherwise, the parameters δ may be poorly estimated.
55
It will be necessary here to account for estimation error in the estimated δjt , so the correct standard
errors for estimates constructed this way are not those of two-stage least squares.
and McMillan (2007) and Bayer and Timmins (2007) apply this two-step method to the
demand for residential locations. Note that, in this parametric context, we now require
just one excluded instrument, for the endogenous pjt . In addition to cost shifters and
cost proxies, candidate instruments include BLP instruments and Waldfogel instruments.
In fact, unlike the case of market-level data, here one can use own-market forms of the
Waldfogel instruments, such as the market-level means of dit , as long as these are not
elements of xjt and not correlated with ξjt . Thus, the parametric micro data model
requires as few as one excluded instrument (for price) to learn all parameters and, as
compared to the market-level case, can allow the use of an additional class of instruments.
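The second step described above can be sketched as a standard two-stage least squares regression. In the Python sketch below, the data are simulated, the variable names are hypothetical, and `delta_hat` stands in for first-step estimates (generated here without estimation error). The sketch returns point estimates only; as noted in footnote 55, valid standard errors must account for the estimation error in δjt.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS coefficients for y = X b + e using the instrument matrix Z."""
    # First stage: project the regressors on the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the projected regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Simulated example (all values hypothetical).
rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)                  # exogenous characteristic
cost = rng.standard_normal(n)               # excluded cost shifter
xi = rng.standard_normal(n)                 # demand shock
p = 1.0 + 0.5 * x + cost + 0.8 * xi         # price depends on xi (endogenous)

beta0, alpha0 = 1.5, 2.0
delta_hat = 0.3 + beta0 * x - alpha0 * p + xi   # stand-in for estimated delta_jt

X = np.column_stack([np.ones(n), x, p])
Z = np.column_stack([np.ones(n), x, cost])  # one excluded instrument, for price

coef_iv = two_stage_least_squares(delta_hat, X, Z)
coef_ols = np.linalg.lstsq(X, delta_hat, rcond=None)[0]
```

In this simulated design the IV price coefficient should be close to −α0, while the OLS coefficient is biased toward zero because price is positively correlated with the demand shock.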
Although this two-step approach is instructive, it will often be preferable to estimate
all parameters at once, exploiting both within-market and cross-market variation. At a
minimum, this will typically aid efficiency. One-step estimation also avoids estimating
the extra “parameters” δ—potentially a large number of them—when in fact these are
defined in (6.3) as functions involving only α0 and β0 as unknown parameters.
A more subtle issue concerns identification. In the fully parametric model, it may in-
deed be possible to estimate the parameters (βd , βν ) using only within-market variation—
even using data from only a single market.56 But, the available results on nonparametric
identification with micro data (see section 7) suggest that this possibility is dependent
on the parametric structure. Indeed, the results there require both within-market and
cross-market variation even to learn the relative effects of consumer observables dit (the
nonparametric analog of βd here).
Exploiting all the variation in the data needed for nonparametric identification can
often lead to much more precise estimates of parametric models. For example, Berry,
Levinsohn, and Pakes (2004a) report that they tried to estimate a related random co-
efficients discrete choice model on micro data from a single market, but found that the
estimates were very noisy. When they added choice-set variation (in the form of “second-
choice” data—see section 6.3), the results became much more precise. This is consistent
with the idea that some form of cross-market data may be important for learning about
the parameters βd and βν .
To estimate all parameters jointly, one could estimate using moment conditions reflect-
ing the score of the likelihood (6.5) with respect to (δ, βd , βν ) together with orthogonality
conditions of the form
E [(δjt − xjt β0 − α0 pjt ) zjt ] = 0,
where zjt represents the exogenous xjt combined with excluded instruments for pjt . The
latter take the same form as the orthogonality conditions used to estimate from market-level data, but without excluded instruments beyond those for the price pjt . Again,
this illustrates a substantial advantage of micro data.
In practice estimation using a simulated likelihood function or simulated score func-
tion is sometimes unattractive. One reason is that simulating the log-likelihood (or its
56
We are not aware of results characterizing the essential sources of variation for identification of the
parametric model.
score) with sufficient precision can be computationally demanding, particularly when
some true choice probabilities are close to zero. Train (2009) provides useful discussion
and offers a number of computational tricks. Modern computational tools may expand
the applicability of such approaches. However, a common alternative is to avoid the like-
lihood and rely instead on moment conditions capturing key variation across consumers
and their choices.
Following Berry, Levinsohn, and Pakes (2004a), for example, one can combine mo-
ments reflecting market shares (typically fitting these exactly, as is usually done in the
case of market-level data) with “micro moments” characterizing key features of the joint
distribution of consumer i’s characteristics and the characteristics of her choice j(i).
Typical micro moments include covariances, or conditional expectations of consumer
characteristics given characteristics of the chosen product (or vice versa). As an ex-
ample of the latter, in the case of autos one might use the conditional expectations
of family size, age, and income conditional on the class of automobile (e.g., minivan,
compact, luxury, pickup). This type of one-step method of moments approach again il-
lustrates the more limited reliance on orthogonality conditions when one has micro data:
in essence, aggregate moment conditions that pin down the nonlinear parameters in the
case of market-level data can be replaced by micro moments that are sufficient to identify
(δ, βd , βν ).
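To make the idea of a micro moment concrete, the Python sketch below compares the observed mean of a consumer characteristic among buyers of a product class with the model's prediction of that conditional mean. The simple logit specification, the function names, and all numerical values are assumptions made for illustration; this is not the construction used in any particular paper or package.

```python
import numpy as np

def predicted_conditional_mean(d, probs, in_class):
    """Model-implied E[d_i | chosen product belongs to the class].

    d:        shape (N,), a consumer characteristic (e.g., family size).
    probs:    shape (N, J+1), model choice probabilities for each consumer.
    in_class: shape (J+1,), boolean mask of the products in the class.
    """
    p_class = probs[:, in_class].sum(axis=1)
    return (d * p_class).sum() / p_class.sum()

def micro_moment(d, probs, choices, in_class):
    """Observed minus predicted conditional mean of d given the class."""
    observed = d[in_class[choices]].mean()
    return observed - predicted_conditional_mean(d, probs, in_class)

def logit_probs(d, delta, x, beta_d):
    v = delta[None, :] + beta_d * d[:, None] * x[None, :]
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Simulated example (all values hypothetical).
rng = np.random.default_rng(2)
N = 20000
d = rng.integers(1, 6, size=N).astype(float)    # family size from 1 to 5
delta = np.array([0.0, 0.5, 0.2])               # outside good, minivan, compact
x = np.array([0.0, 1.0, -0.5])                  # a "size" characteristic
probs_true = logit_probs(d, delta, x, beta_d=0.4)

# Draw each consumer's choice from the true probabilities.
u = rng.random(N)
choices = np.minimum((u[:, None] > probs_true.cumsum(axis=1)).sum(axis=1), 2)

minivan = np.array([False, True, False])
m_at_truth = micro_moment(d, probs_true, choices, minivan)
m_at_zero = micro_moment(d, logit_probs(d, delta, x, beta_d=0.0), choices, minivan)
```

The moment is close to zero at the parameter value that generated the data and clearly away from zero when βd is set to zero, which is the sense in which such moments help pin down the parameters governing the role of consumer characteristics.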
The PyBLP software discussed previously provides code for estimating all parame-
ters from micro data using certain combinations of market-level moments and “micro
moments” like those discussed above. This open source code also provides a template
for adapting the computation to incorporate estimation with other combinations of mo-
ments.57
6.2 Consumer Panels

Although what we call micro data is a form of panel data, in the case of consumer demand
the term “panel data” is typically reserved for something else: observation of each con-
sumer on multiple choice occasions. For clarity, we refer to this as a “consumer panel.”
Examples include data on different grocery shopping trips for the same consumer, car
purchases by the same family across different years, or health insurance selections by the
same employee in the open enrollment period of different years. The advantage of a con-
sumer panel is that observations on the same consumer on different choice occasions can
provide even more information about the role of individual characteristics in determining
substitution patterns. Intuitively, for example, one may directly observe which product
a given consumer substitutes to in response to an exogenous price increase sufficient to
induce a switch.
The estimation approaches discussed above are easily adapted to this setting. To
illustrate, consider the same mixed logit model, but now splitting our usual notion of
57
Although Conlon and Gortmaker (2020) does not cover micro data, documentation can be found at
https://pyblp.readthedocs.io/en/stable/api.html#micro-moment-classes.
market into a geographically-defined market m and a time period t. We focus on the
case in which one observes many consumers (“large N ”) on a small number of choice
occasions (“small T ”) with many geographically-defined markets (“large M ”). For sim-
plicity, consider a two-period panel, so that t ∈ {0, 1}. We also focus on the case in
which one views a consumer’s random coefficients as reflecting stable preferences (thus,
fixed across time periods), with only their product-specific shocks εijmt drawn anew (and
independently) each time period.
The model’s prediction for the probability that a given consumer i in market m
chooses good j in period 0 and good k in period 1 then takes the form (adapting the
prior notation to the refined notion of markets)
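$$ s_{ijkm} = \int \frac{e^{\delta_{jm0} + \mu_{ijm0}(\nu_{im}; \beta_d, \beta_\nu)}}{1 + \sum_{\ell} e^{\delta_{\ell m0} + \mu_{i\ell m0}(\nu_{im}; \beta_d, \beta_\nu)}} \cdot \frac{e^{\delta_{km1} + \mu_{ikm1}(\nu_{im}; \beta_d, \beta_\nu)}}{1 + \sum_{\ell} e^{\delta_{\ell m1} + \mu_{i\ell m1}(\nu_{im}; \beta_d, \beta_\nu)}} \, dF_\nu(\nu_{im}). \tag{6.6} $$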
Replacing j and k with the choices actually made by consumer i then yields the likelihood
contribution for consumer i, as a function of the parameters (δ, βd , βν ).58 Just as in the
case of micro data, the score of this likelihood function could be combined with additional
moment conditions to yield an estimation strategy for all model parameters using the
method of simulated moments and instruments for prices.59
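A minimal Python sketch of how (6.6) might be simulated follows. The key feature is that the same draws of νim are used in both periods, so the two logit probabilities are multiplied before averaging. The function names and toy values are hypothetical.

```python
import numpy as np

def logit_probs(v):
    """Row-wise logit probabilities; column 0 is the outside good."""
    e = np.exp(v - v.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def panel_choice_probs(delta0, delta1, mu0_draws, mu1_draws):
    """Simulate (6.6): entry (j, k) is P(choose j in period 0, k in period 1).

    delta0, delta1:       shape (J+1,), with entry 0 equal to 0 (outside good).
    mu0_draws, mu1_draws: shape (R, J+1), mu evaluated at the SAME R draws of
                          nu_im in each period (column 0 equal to 0).
    """
    p0 = logit_probs(delta0[None, :] + mu0_draws)
    p1 = logit_probs(delta1[None, :] + mu1_draws)
    # Conditional on nu the two choices are independent: multiply, then average.
    return np.einsum("rj,rk->jk", p0, p1) / p0.shape[0]

# Toy example (all values hypothetical).
rng = np.random.default_rng(5)
R = 2000
nu = rng.standard_normal((R, 1))
x = np.array([0.0, 1.0, -1.0])
mu = 2.0 * nu * x[None, :]                 # stable random coefficient on x
delta0 = np.array([0.0, 0.3, 0.1])
delta1 = np.array([0.0, -0.2, 0.4])        # e.g., prices changed across periods

joint = panel_choice_probs(delta0, delta1, mu, mu)
marg0, marg1 = joint.sum(axis=1), joint.sum(axis=0)
```

Because tastes persist across periods, the simulated probability of choosing good 1 twice exceeds the product of the two marginal probabilities. This dependence across choice occasions is the additional information that a consumer panel provides.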
The practical concerns associated with relying on a simulated log-likelihood with mi-
cro data can become more severe with panel data. Even with just two time periods,
the choice probabilities above involve (J + 1)^2 combinations of potential choices, and the
true likelihood of some jk combinations observed in the data may, in certain contexts,
be extremely small. Again, one could avoid such problems by instead using micro mo-
ments that incorporate some degree of aggregation. There are many possibilities, but a
typical approach would start from the types of aggregate moments and micro moments
discussed for estimation with micro data, adding moments capturing the extra informa-
tion provided by the consumer panel: relationships between choices of the same consumer
across choice occasions. For example, in the case of data on grocery purchases in a given
product category, one might include as moments the covariances between the measured
characteristics of products selected across shopping trips.
6.3 Ranked Choice Data

Some applications offer data on each consumer’s rank ordering of products. Examples
include certain types of conjoint analysis (common in marketing) or ranked school choices
58
Implicit is an assumption of no state dependence—e.g., switching costs, rational inattention, or
development of brand loyalty through past purchase. Allowing state dependence would require a change
to the model, a choice of whether to treat consumers as forward-looking, and, typically, dealing with an
“initial conditions” problem. See, e.g., Dubé, Hitsch, and Rossi (2010) and Handel (2013).
59
Chintagunta and Dubé (2005) apply a similar strategy to a consumer panel of grocery store pur-
chases. Similar to our discussion of the likelihood in (6.5), they estimate the mean utilities δjt and
nonlinear parameters of the utility specification via maximum likelihood, then estimate the parameters
of (6.3) with a second-step IV regression.
(e.g., within a strategy-proof school choice mechanism).60 In principle, one could see
a consumer’s ranking of all products, although typically one observes only a shortlist of
top choices.
This type of ranked choice data can be thought of as an ideal form of a consumer
panel. One observes, for each consumer, responses to the questions (a) what is your
preferred product among all options? (b) what is your preferred product among all
options excluding your favorite? (c) what is your preferred product among all those
excluding your top two choices, . . . . This is very similar to a consumer panel offering
observation of the same consumer’s choice from multiple choice sets.61 But there are at
least two advantages to ranked choice data. First, the absence of temporal separation can
avoid any question about which stochastic components of the model should be viewed as
fixed across “choice occasions.” Second, the type of “variation in the choice set” provided
by ranked choice data is ideal for assessing which products are closest substitutes. Indeed,
the substitution patterns that we have focused on as a primary challenge of demand
estimation are closely connected to the relationships between first and second choices.
Observing the first and second choices directly is therefore very powerful.62 And, of
course, the relationships among a longer list of ranked choices are driven by the same
components of the model.
Estimation in the case of ranked choices can proceed along the lines suggested for
a consumer panel. A likelihood approach could again be used, although with the same
types of caveats.63 The method of simulated moments again offers alternatives. Here, for
example, one might combine market shares (average choice probabilities for first choices)
with moments characterizing covariance between components of consumer characteristics,
characteristics of first-choice products, and characteristics of first- and second-choice
products (see Berry, Levinsohn, and Pakes (2004a)).
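For the likelihood approach, footnote 63 notes that in the mixed logit model the probability of a ranking factors, conditional on the consumer's random coefficients, into a product of logit probabilities over successively smaller choice sets. The Python sketch below illustrates that factorization; the function names and toy values are hypothetical.

```python
import itertools
import numpy as np

def ranking_prob_given_nu(v, ranking):
    """Probability of a ranking, conditional on the utility indices v.

    v:       shape (J+1,), delta_jt + mu_ijt at one value of nu_it.
    ranking: products listed from most to least preferred; this may be a
             partial shortlist of top choices.
    """
    remaining = np.ones(v.shape[0], dtype=bool)
    prob = 1.0
    for j in ranking:
        vmax = v[remaining].max()
        # Standard logit probability of j within the remaining choice set.
        prob *= np.exp(v[j] - vmax) / np.exp(v[remaining] - vmax).sum()
        remaining[j] = False                # remove j before the next stage
    return prob

def simulated_ranking_prob(delta, mu_draws, ranking):
    """Average the conditional ranking probability over draws of nu_it."""
    return np.mean([ranking_prob_given_nu(delta + mu, ranking) for mu in mu_draws])

# Toy example (all values hypothetical).
rng = np.random.default_rng(4)
delta = np.array([0.0, 1.0, 0.4])
x = np.array([0.0, 1.0, 0.8])
mu_draws = 1.5 * rng.standard_normal((300, 1)) * x[None, :]

first_second = simulated_ranking_prob(delta, mu_draws, (1, 2))
total = sum(
    simulated_ranking_prob(delta, mu_draws, r)
    for r in itertools.permutations(range(3))
)
```

Passing a shortlist such as `(1, 2)` gives the probability that good 1 is ranked first and good 2 second, which is the object underlying moments based on first and second choices.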
6.4 Hybrids
A common situation in practice is that one has a combination of multi-market market-
level data and a limited set of micro data or ranked choice data. One prominent example
60
Allenby, Hardt, and Rossi (2019) provides an overview of conjoint analysis. Hastings, Kane, and
Staiger (2010) consider demand for ranked school choices and are followed by a large literature, including
Agarwal and Somaini (2018).
61
For at least some purposes, one will need to assume that the data provide correctly ranked pref-
erences. In conjoint analysis, rankings are typically based on self-reports of hypothetical behavior,
potentially introducing noise in the measured rankings. And in the case of school choice data, even with
a strategy-proof school assignment mechanism, parents might not understand or trust the strategy-proof
nature of the problem they face.
62
See also the discussion in Conlon and Mortimer (2020).
63
In the mixed logit model considered in this section, the joint probability of a consumer’s rankings
factors as a product of standard logit choice probabilities for appropriately defined choice sets, conditional
on (dit , yit , νit ). See, e.g., Train (2009).
in the literature is Petrin’s (2002) study of welfare gains from introduction of a new
product, which combined aggregate market shares for automobiles with a small sample
of micro-data from the Consumer Expenditure Survey (CEX). Another is Goeree’s (2008)
study of advertising and personal computer demand, which combined market shares for
individual models with a limited micro-data sample linking individual characteristics to
computer purchases by brand.64 This kind of data configuration lends itself to the type
of simulated method of moments estimation approach discussed above, combining (a)
BLP-style moments involving aggregate market shares and exogeneity of instruments
and (b) moments defined as the difference between predicted and actual (covariance or
conditional mean) relationships between individual characteristics and characteristics of
the product chosen.
7 Nonparametric Identification with Micro Data

Our discussion in the previous section suggests that micro data can be valuable both for
allowing richer demand specifications and for estimation of the “nonlinear parameters”
governing substitution patterns—adding precision and softening the reliance on instru-
mental variables. By returning to the question of nonparametric identification, we can
demonstrate a strong formal confirmation of this message. In this section, we discuss
recent results from Berry and Haile (2020) showing how the addition of micro data can
allow a more flexible demand model and a reduction in both the number and types of
instrumental variables required, as compared to the case of market-level data. Unsur-
prisingly, instruments for the endogenous prices will still be needed. However, micro
data can eliminate the necessity of instruments that shift market shares independently
of prices (cf. section 5.4.1). One can also allow for very flexible effects of consumer-level
observables on demand—for example, avoiding any requirement that certain consumer
observables alter preferences only for certain products.
Of course, replacing variation through relevant instruments with micro-data variation
will require a notion of “sufficient” variation through the consumer observables, explained
below. Furthermore, the identification of highly flexible effects of consumer observables
on demand relies on a combination of within-market and cross-market variation: observed
variation across consumers within a market does not suffice.
7.1 Nonparametric Demand Model

We will proceed somewhat less formally than in our discussion of nonparametric identifi-
cation with market-level data (section 5), referring readers to Berry and Haile (2020) for
a more complete treatment. Consider a nonparametric model of demand characterized
64
Goeree (2008) also extends the BLP model to allow for incomplete “consideration sets,” i.e., for
consumers who are aware of only a strict subset of the available goods.
by equations
$$ s_{ijt} = \sigma_j(d_{it}, y_{it}, x_t, p_t, \xi_t), \qquad j = 1, \dots, J. \tag{7.1} $$
We will interpret sijt as the probability consumer i in market t chooses good j in a
discrete choice demand model, although the arguments generalize to other settings (e.g.,
continuous demand, mixed discrete-continuous demand). As in our previous discussion
of identification, we have conditioned on a given number of goods J. Compared to
the model considered in the case of market-level data, here we have added observed
individual-specific measures (dit , yit ) as determinants of demand. The notation for the
other arguments of σj is as in the preceding sections, with all goods’ prices, product
characteristics, and demand shocks entering the demand for each good j.
Here we will partition (dit , yit ) so that dit ∈ ℝ^J and yit ∈ ℝ^H , with H ≥ 0. Thus,
although there is no upper limit on the number of consumer-specific observables, we re-
quire at least as many consumer observables as goods. Furthermore, although we will
require a nonparametric index restriction on the way dit enters the demand model, we can
accommodate other consumer observables yit in an unrestricted way. The observables,
for purposes of considering identification, consist of dit , yit , xt , pt , and choice probabilities
conditional on (dit , yit ) in each market t. In addition, there are observed excluded instru-
ments that we discuss below. We treat the characteristics xt as exogenous, as we have
done in prior sections. To discuss identification we can then condition on an arbitrary
value of xt and suppress xt in the notation below.65
In addition to the required degree of variation in dit , choice sets and price instruments,
the identification results in Berry and Haile (2020) rely on a set of core assumptions on
demand. These play some of the roles of the “index and inversion” assumptions we
discussed in the case of market-level data, although they take a different form. The four
main assumptions are:
(i) For all j, σj (dit , yit , pt , ξt ) = σj (γ(dit , yit , ξt ), yit , pt ), with γ(dit , yit , ξt ) ∈ ℝ^J.
(ii) σ(·, yit , pt ) is injective on the support of γ(dit , yit , ξt ) conditional on (yit , pt ).
(iii) γ(·, yit , ξt ) is injective on the support of dit | yit .
(iv) For all j, γj (dit , yit , ξt ) = gj (dit , yit ) + ξjt .
Assumption (i) is a nonparametric index restriction. Like the index restriction in

section 5, it limits the way that the demand shocks enter the model. We will connect
this condition to standard specifications below. With the invertibility requirement of
assumption (ii), this index restriction will ensure that we can construct an inverted
demand system with only one structural error in each equation. A sufficient condition for
assumption (ii) is the connected substitutes property discussed in section 5. Assumption
65
Identification of demand conditional on endogenous xt is an ongoing area of research, related to
Berry and Haile (2020). Some models of endogenous xt may allow one to condition on endogenous xt
and still identify demand responses to ceteris paribus changes in prices.
(iii) requires that the index vector itself be invertible with respect to the vector dit .
This is satisfied automatically in some standard specifications, such as linear random
utility models in which each component dijt of dit affects only the utility of good j.66
Assumption (iii) avoids this exclusivity requirement but maintains the requirement, given any value
of the demand shocks ξt , of a one-to-one mapping between the vector dit and the index
vector. Finally, Assumption (iv) requires that the index function be additively separable.
As with the identification results of section 5, a key motivation for this restriction is to
allow use of the same instrument relevance condition that is required for identification in
additively separable regression models.
To connect these assumptions to familiar parametric models, consider the mixed-logit
random utility specification67
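$$ u_{ijt} = x_{jt}\beta_{ijt} - \alpha_{it} p_{jt} + \xi_{jt} + \epsilon_{ijt}, \tag{7.2} $$
where $\beta_{ijt}^{(k)} = \beta_{0j}^{(k)} + \sum_{\ell=1}^{L} \beta_{dj}^{(\ell,k)} d_{i\ell t} + \beta_{\nu j}^{(k)} \nu_{it}^{(k)}$ and $\ln(\alpha_{it}) = \alpha_0 + \alpha_y y_{it} + \alpha_\nu \nu_{it}^{(0)}$. We can
rewrite (7.2) as
$$ u_{ijt} = g_j(d_{it}, x_t) + \xi_{jt} + \mu_{ijt}, $$
where
$$ g_j(d_{it}, x_t) = \sum_{k} x_{jt}^{(k)} \sum_{\ell=1}^{L} \beta_{dj}^{(\ell,k)} d_{i\ell t} = \sum_{\ell=1}^{L} d_{i\ell t} \sum_{k} x_{jt}^{(k)} \beta_{dj}^{(\ell,k)} $$
and
$$ \mu_{ijt} = \sum_{k} x_{jt}^{(k)} \left( \beta_{0j}^{(k)} + \beta_{\nu j}^{(k)} \nu_{it}^{(k)} \right) - p_{jt} \exp\left( \alpha_0 + \alpha_y y_{it} + \alpha_\nu \nu_{it}^{(0)} \right) + \epsilon_{ijt}. $$
Thus, recalling that L = J,68 our key assumptions hold (recall that those assumptions
are stated conditional on the suppressed xt , treating xt fully flexibly) as long as the J × J
matrix of coefficients on dit (whose elements are $\sum_{k} x_{jt}^{(k)} \beta_{dj}^{(\ell,k)}$) is full rank.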
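The full-rank condition can be checked numerically for a given parameterization. The Python sketch below builds the J × J coefficient matrix from hypothetical values of xjt and βdj and computes its rank in two cases: the special case in which each dijt affects only the utility of good j, and a degenerate case in which every component of dit enters all indices with the same weights. The array layout and all values are assumptions made for this example.

```python
import numpy as np

def index_coefficient_matrix(x, beta_d):
    """J x J matrix whose (j, l) element is sum_k x[j, k] * beta_d[j, l, k].

    x:      shape (J, K), product characteristics x_jt^(k).
    beta_d: shape (J, L, K) with L = J, coefficients beta_dj^(l,k).
    """
    return np.einsum("jk,jlk->jl", x, beta_d)

J, K = 3, 2
# The first component of each x_jt is a one (index 0 here, k = 1 in the text).
x = np.column_stack([np.ones(J), np.array([0.5, 1.0, 1.5])])

# Special case: d_ijt shifts only the utility of good j.
beta_excl = np.zeros((J, J, K))
for j in range(J):
    beta_excl[j, j, 0] = 1.0
B_excl = index_coefficient_matrix(x, beta_excl)

# Degenerate case: all components of d_it enter every index identically,
# so g depends on d_it only through the sum of its components.
beta_common = np.ones((J, J, K))
B_common = index_coefficient_matrix(x, beta_common)

rank_excl = np.linalg.matrix_rank(B_excl)
rank_common = np.linalg.matrix_rank(B_common)
```

In the first case the matrix is the identity and has rank J. In the second it has rank one, so the linear map from dit to the index vector is not injective and assumption (iii) fails.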
66
Such an exclusivity condition is often combined with assumptions of independence and “large sup-
port” to demonstrate identification through a “special regressor” argument (see, for example, Lewbel
(2014)). The results discussed here will not require exclusivity, independence, or large support.
67
This example generalizes the canonical model of section 3.2 by allowing random coefficients on
xjt to vary with j. With L = J, the common special case in which dijt enters only the utility for
good j is then obtained by setting $\beta_{dj}^{(\ell,k)}$ to zero except when k = 1 (recall that the first component
of each xjt is a one) and ℓ = j. Logit specifications allowing choice-specific coefficients on consumer
characteristics are sometimes distinguished from models with choice-specific characteristics by labeling
the former “multinomial logit” and the latter “conditional logit.”
68
Any additional observables are easily incorporated outside the mapping g without further restriction.
The essential requirement is that consumer-level observables have dimension at least J.
7.2 Identification
Here we sketch the identification arguments, proceeding in three steps. First, a combination
of within-market and cross-market variation is exploited to uncover the index function
g : ℝ^J → ℝ^J. Then cross-market variation, including that produced by excluded
instruments for prices, allows identification of the demand shocks ξjt for all goods and
markets in the same way that residuals in a nonparametric regression model are iden-
tified. Finally, with the demand shocks known, identification of demand is immediate
from the definition of demand in (7.1). This argument does not require variation in yit ,
so we henceforth fix yit at an arbitrary value and suppress it in the notation.69
7.2.1 Identification of the Index Function

Let S (ξ, p) denote the support of the share vector when the random variables (ξt , pt )
take the values (ξ, p). Because dit varies within each market, the set S (ξ, p) is not a
singleton: each dit in market t is associated with a different observed conditional choice
probability vector sit .
Given the assumptions on demand, for each vector of market shares s ∈ S (ξ, p) there
will be a unique d∗ in the support of dit such that
σ (g (d∗ ) + ξ, p) = s.
This d∗ is the vector of consumer characteristics that generate the choice probability
vector s (given (ξt , pt ) = (ξ, p)). So we may write
d∗ (s; ξ, p) .
Furthermore, the inverted demand system at this point is
$$ g(d^*(s; \xi, p)) + \xi = \sigma^{-1}(s; p). \tag{7.3} $$
Note that, because choice probabilities conditional on dit in each market t are observed,
d∗ (s; ξt , pt ) is observed for all t and s ∈ S (ξt , pt ) even though no ξt is observed or known
at this point.
If we differentiate (7.3) within a market t where pt = p and d∗(s; ξt , p) = d, we obtain
$$ \frac{\partial g(d)}{\partial d} \, \frac{\partial d^*(s; \xi_t, p)}{\partial s} = \frac{\partial \sigma^{-1}(s; p)}{\partial s}. \tag{7.4} $$
If we do the same within another market t′ with the same p and same s ∈ S(ξt′ , p), we
get a similar expression with an identical right-hand side. Setting the two left-hand sides
equal and letting d′ = d∗(s; ξt′ , p), we see that
$$ \frac{\partial g(d')}{\partial d} = \frac{\partial g(d)}{\partial d} \, \frac{\partial d^*(s; \xi_t, p)}{\partial s} \left[ \frac{\partial d^*(s; \xi_{t'}, p)}{\partial s} \right]^{-1}. \tag{7.5} $$
The only unknowns in (7.5) are the matrices ∂g(d′)/∂d and ∂g(d)/∂d. Without loss, one may
choose an arbitrary point d^0 and set both g(d^0) and ∂g(d^0)/∂d to arbitrary values.70
Thus, by starting at d^0 and stringing together relationships of the form (7.5) covering all
points in the support of dit , one can determine the derivatives of g on the entire support
and integrate these derivatives up to the value of g at all such points.71
69
When yit does vary, it can provide a source of overidentifying restrictions.
Observe that the function g determines the effects of dit on demand relative to those
of the demand shocks ξjt . Thus, identification of g is a key step toward identification of
effects of dit on demand. Because the latent ξt is fixed within a market and alters the
demand of all consumers in that market, it may be unsurprising that identification of
g would require both within-market and cross-market variation. Our proof in this step,
indeed, uses both types of variation. Equations (7.3) and (7.4) make use of within-market
variation, where ξt is fixed. Equation (7.5) then makes use of cross-market variation.
Although ∂d∗(s; ξt , p)/∂s is observed in each market, equation (7.4) does not, by itself, allow
one to distinguish ∂g(d)/∂d from ∂σ^{-1}(s; p)/∂s. This indeterminacy is resolved by the
cross-market variation used in equation (7.5).
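The relationship in (7.5) can be checked numerically in a simple example. The Python sketch below assumes, purely for illustration, that g(d) = sinh(d) elementwise and that demand is a simple logit in the index g(d) + ξ with a hypothetical price coefficient `ALPHA`. The finite-difference Jacobians of d∗ stand in for the derivatives that would be observed in the data.

```python
import numpy as np

ALPHA = 1.0   # hypothetical price coefficient

def inverse_demand(s, p):
    """sigma^{-1}(s; p) for a simple logit: the index vector generating s."""
    s0 = 1.0 - s.sum()
    return np.log(s / s0) + ALPHA * p

def d_star(s, xi, p):
    """Characteristics generating s in a market with shocks xi, from (7.3),
    under the illustrative assumption g(d) = sinh(d) elementwise."""
    return np.arcsinh(inverse_demand(s, p) - xi)

def jacobian(f, s, h=1e-6):
    """Central finite-difference Jacobian of f at s."""
    n = s.shape[0]
    out = np.empty((n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        out[:, k] = (f(s + step) - f(s - step)) / (2 * h)
    return out

s = np.array([0.3, 0.2])            # common share vector in the two markets
p = np.array([1.0, 1.5])            # common price vector
xi_t = np.array([0.2, -0.1])        # demand shocks in market t
xi_tp = np.array([-0.5, 0.4])       # demand shocks in market t'

d = d_star(s, xi_t, p)
d_prime = d_star(s, xi_tp, p)
A_t = jacobian(lambda u: d_star(u, xi_t, p), s)
A_tp = jacobian(lambda u: d_star(u, xi_tp, p), s)

Dg_d = np.diag(np.cosh(d))                           # derivative of g at d
Dg_implied = Dg_d @ A_t @ np.linalg.inv(A_tp)        # right-hand side of (7.5)
Dg_true = np.diag(np.cosh(d_prime))                  # derivative of g at d'
```

Given the derivative of g at d, the implied derivative at d′ matches the true one up to finite-difference error, even though neither ξt nor the derivative of the inverse demand function is used in the calculation.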
7.2.2 Identification of Demand

If one hopes that micro data variation can substitute for some variation through in-
strumental variables, some requirement on the extent of variation in dit will of course
be required. Berry and Haile (2020) require that there exist some “common choice
probability” vector s∗ that is reached in every market by a consumer with the “right”
characteristics dit for that market. Specifically, to our earlier assumptions (i)–(iv) we add
(v) There exists s∗ such that s∗ ∈ S (ξ, p) for all (ξ, p) ∈ supp (ξt , pt ).
As an example, with two inside goods and an outside good this assumption requires
existence of at least one vector of choice probabilities for the inside goods—perhaps
0.4 for each—that is reached in every market t by conditioning on the right type of
consumer (i.e., the right vector of observables (d1t , d2t )) for that market. We need not
know in advance what this common choice probability vector s∗ is. Indeed, whether such
a vector s∗ exists is observable—formally, this assumption is “verifiable” (see, e.g., Berry
and Haile (2018)).
The strength of the common choice probability requirement in a given application will
depend on the supports of pt and ξt and the effects that each of these has on demand.
70
As discussed by Berry and Haile (2020), these are proper normalizations imposing no restriction on
the demand system (7.1).
71
Berry and Haile (2020) provide technical conditions ensuring that this is possible.
A helpful fact is that higher values of pjt and ξjt typically have opposite effects, whereas
equilibrium pricing typically implies positive dependence between pjt and ξjt . Thus, the
effects of even large variation in the demand shocks may be substantially damped by
the accompanying variation in prices. In any case, this assumption strictly relaxes the
“large support” assumption sometimes relied upon to ensure nonparametric identification
in discrete choice models. A large support assumption on dit would require sufficient
variation to move choice probabilities to every point in the simplex in every market; i.e.,
large support would imply that every vector of nonzero choice probabilities is a common
choice probability. Here only one such vector is required.
With a common choice probability vector s∗ , in every market t we have J inverse
demand equations of the form
$$ g_j(d^*(s^*; \xi_t, p_t)) = \sigma_j^{-1}(s^*; p_t) - \xi_{jt}. \tag{7.6} $$
In each equation, the left-hand side is now known. On the right-hand side, s∗ is fixed across
markets. Thus, each equation (7.6) takes the form of a nonparametric regression equation
with a separable structural error. Identification of the “regression function” σj^{-1}(s∗; pt )
then follows immediately from the identification result of Newey and Powell (2003) given
instruments for the endogenous variables—pt here—satisfying the standard mean inde-
pendence (exclusion) and completeness (relevance) conditions.
This immediately implies identification of each ξjt as well. With the demand shocks
ξjt known, identification of demand follows immediately from equation (7.1), since choice
probabilities sijt are observed and all arguments of the functions σj (·) are now known.
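As a stylized parametric illustration of this last step, the Python sketch below simulates markets with a single inside good and a logit inverse demand that is linear in price, then applies linear IV to (7.6). The identification result itself is nonparametric; the linear form, the variable names, and all values here are simplifying assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 4000                          # number of markets
alpha = 1.5                       # hypothetical price coefficient
s_star = 0.4                      # common choice probability (one inside good)

z = rng.standard_normal(T)        # excluded cost shifter
xi = rng.standard_normal(T)       # demand shocks, mean zero
p = 2.0 + z + 0.6 * xi + 0.3 * rng.standard_normal(T)   # price rises with xi

# With s = exp(g + xi - alpha p) / (1 + exp(g + xi - alpha p)), inverse demand
# is sigma^{-1}(s; p) = log(s / (1 - s)) + alpha p, so (7.6) gives the value of
# g at the consumer type reaching s_star in each market.
g_at_d_star = np.log(s_star / (1 - s_star)) + alpha * p - xi

# Just-identified linear IV of the known left-hand side on price.
X = np.column_stack([np.ones(T), p])
Z = np.column_stack([np.ones(T), z])
coef = np.linalg.solve(Z.T @ X, Z.T @ g_at_d_star)

# The structural error in (7.6) is -xi, so the shocks are fitted minus actual.
xi_hat = X @ coef - g_at_d_star
```

With the regression function estimated, the recovered residuals track the true demand shocks closely, mirroring the step in which identification of the inverse demand function implies identification of each ξjt .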
7.3 Discussion
These results provide formal confirmation of a key benefit of micro data suggested in our
discussion of the parametric models: micro data with sufficient variation, combined with
choice sets that vary appropriately across markets, allow us to replace instruments for
quantities with micro-data variation via dit . The need for J-dimensional variation in dit is
directly connected to the J endogenous quantities. Note, however, that the “exogeneity”
of the micro-data variation arises not from an exclusion restriction but from the fact
that within a single market, market-level demand shocks simply do not vary. Thus, with
micro data, one has many of the same advantages that allow “within estimation” of slope
parameters in other types of panel data models.
These results imply that micro data can cut the number of required instruments by
half. Related but distinct is the fact that the BLP instruments are no longer required.
A benefit of the latter is our ability to treat all components of xt in a fully flexible way,
considering a strictly more general model than that considered in section 5 for the case
of market-level data. Of course, as discussed in section 5.4.3, this fully general treatment
of xt also implies that BLP instruments are not even available as candidate instruments.
BLP instruments can be made available by requiring some components of xt to enter
through the indices γt (see Berry and Haile (2020) for details). And the other types
of instruments discussed in section 4.2 remain valid candidates to instrument for prices
here.
Finally, although the formal results address only the case in which micro data variation
fully replaces J instruments, we expect that micro data with more limited variation will
still be useful in practice. There are many applied cases where one has only “partial”
micro data, or micro data with more limited dimension. In these cases, an applied
researcher might employ a combination of BLP-style instruments and moment conditions
reflecting the available micro data, as discussed in section 6.
8 Some Directions for Future Work

We have focused on “foundations” of demand estimation—the key challenges, core mod-
els, standard methods, and sources of identification. This focus necessarily leaves many
topics neglected or considered only very lightly. A number of these topics offer fruitful
directions for future research.
One is dynamics. Many demand decisions are dynamic, leading to potentially im-
portant roles for various kinds of sunk costs and adjustment costs that interact with
tastes that are persistent through time. Models incorporating such factors raise difficult
computational issues, and also difficult questions about how to discriminate between (or
separately identify) various forms of state dependence and persistent tastes. Interesting
reading here includes Hendel and Nevo (2006), Gowrisankaran and Rysman (2012), and
Handel (2013), as well as the enormous literature on dynamic models more generally,
where distinguishing between unobserved heterogeneity and state dependence has long
been a focus (see, e.g., Heckman (1981)).
A second topic concerns functional forms used for estimation. We noted that the
widely-used mixed logit model offers a compromise between flexibility and the parsi-
mony necessary for real-world estimation. This compromise was designed in particular
to provide flexibility (at least as compared to a multinomial logit or CES model) in the
cross-product derivatives with respect to prices and product characteristics. These sub-
stitution patterns drive answers to many questions of interest—e.g., the sizes of markups
or outcomes under a counterfactual merger. However, other kinds of counterfactuals can
require flexibility in other dimensions. For example, “pass-through” (e.g., of a tariff,
tax, or technologically driven reduction in marginal cost) depends critically on second-
derivatives of demand. It is not clear that a mixed-logit model is very flexible in this
dimension. An alternative is nonparametric demand estimation, as in Compiani (2020),
although many off-the-shelf nonparametric approaches lack the parsimony necessary to
estimate demand systems with a large number of products or product characteristics.
An interesting question is whether alternative (parametric, semi-parametric, or non-
parametric) specifications of demand or the distribution of utilities can offer attractive
alternatives.72
72 See, e.g., Gandhi, Nevo, and Tao (2019b).
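The link between pass-through and second derivatives of demand can be seen in a deliberately simple case. The derivation below is only an illustrative sketch: it assumes a single-product monopolist with constant marginal cost c and a twice-differentiable demand function D(p) with D'(p) < 0, and it assumes the second-order condition holds. None of these simplifications comes from the discussion above, which concerns multi-product oligopoly settings.

```latex
% Illustrative assumption: single-product monopolist, constant marginal cost c.
% First-order condition for the profit-maximizing price p:
\[
  D(p) + (p - c)\,D'(p) = 0 .
\]
% Differentiate the first-order condition with respect to c:
\[
  \bigl[\,2D'(p) + (p - c)\,D''(p)\,\bigr]\frac{dp}{dc} - D'(p) = 0 .
\]
% Substitute p - c = -D(p)/D'(p) from the first-order condition:
\[
  \frac{dp}{dc} \;=\; \frac{1}{\,2 - \dfrac{D(p)\,D''(p)}{D'(p)^{2}}\,} .
\]
```

With linear demand, D''(p) = 0 and pass-through equals one half. With constant-elasticity demand D(p) = p^{-ε}, ε > 1, the curvature term equals (ε + 1)/ε and pass-through is ε/(ε − 1), which exceeds one. A specification that restricts D'' therefore restricts the pass-through rates it can produce, whatever its flexibility in first derivatives.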
A third topic of interest is invertibility of demand. Inversion of the demand
system arises repeatedly in our discussions of both identification and estimation. This reflects
a reliance on inversion to address the fundamental challenges of simultaneity and the
appearance of many structural errors (demand shocks) in each demand equation. But
invertibility of a demand system is not automatic, and demand systems often violate
standard conditions used to establish invertibility (i.e., injectivity or “univalence”) of
multivariate maps. Berry, Gandhi, and Haile (2013) discuss this issue and offer invert-
ibility conditions that can accommodate some forms of complementarity. However, in-
vertibility (or alternatives) in the case of complements has not been fully explored.73 The
same is true of invertibility in models in which consumers make multiple purchases. A
related question is the extent to which there are helpful estimation approaches—perhaps
involving partial identification—for settings in which invertibility fails.
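To make concrete what "inversion" means, the following minimal sketch uses the simplest special case, a plain multinomial logit with the outside good's mean utility normalized to zero, where the inverse has a closed form: delta_j = ln(s_j) - ln(s_0). The function names and the numerical values are hypothetical and chosen only for illustration. The sketch does not address the harder cases raised above (complements, multiple purchases, or models in which invertibility fails).

```python
import math


def logit_shares(delta):
    """Map mean utilities of inside goods to market shares.

    Assumes a plain multinomial logit with the outside good's
    mean utility normalized to zero (an illustrative assumption).
    """
    exp_delta = [math.exp(d) for d in delta]
    denom = 1.0 + sum(exp_delta)
    return [e / denom for e in exp_delta]


def invert_logit_shares(shares):
    """Recover mean utilities from inside-good shares.

    Uses the closed-form logit inverse delta_j = ln(s_j) - ln(s_0),
    where s_0 = 1 - sum_j s_j is the outside-good share.
    """
    s0 = 1.0 - sum(shares)
    return [math.log(s) - math.log(s0) for s in shares]


# Hypothetical mean utilities for three inside goods.
delta_true = [0.5, -1.0, 2.0]
shares = logit_shares(delta_true)
delta_hat = invert_logit_shares(shares)
```

Here each vector of mean utilities maps to exactly one share vector and back, so `delta_hat` reproduces `delta_true` up to floating-point error. In richer models such as the mixed logit, no closed form exists and the inverse must be computed numerically.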
The necessity of further progress should not obscure progress made, however. Com-
pared to the situation 10 or so years ago, we now have access to a set of useful identifi-
cation results, greater clarity about the role of instruments and the types of instruments
available for estimating demand, more fully developed asymptotic theory, and a set of
now well-proven computational tools for constructing standard estimators. This progress
suggests that much of the best future research will involve serious application of these
existing tools.
73 For a simple approach allowing a particular kind of complementarity, see also Fosgerau,
Monardo, and De Palma (2020).
References
Ackerberg, D. (2003): “Advertising Learning and Customer Choice in Experience
Good Markets: A Structural Empirical Examination,” International Economic Review,
44, 1007–1040.
Adao, R., A. Costinot, and D. Donaldson (2017): “Nonparametric Counterfactual
Predictions in Neoclassical Models of International Trade,” American Economic
Review, 107, 633–89.
Agarwal, N. and P. Somaini (2018): “Demand Analysis Using Strategic Reports:
An Application to a School Choice Mechanism,” Econometrica, 86, 391–444.
Allenby, G. M., N. Hardt, and P. E. Rossi (2019): “Chapter 3 - Economic
foundations of conjoint analysis,” in Handbook of the Economics of Marketing, Volume
1, ed. by J.-P. Dubé and P. E. Rossi, North-Holland, vol. 1 of Handbook of the Economics
of Marketing, 151–192.
Anderson, S., A. DePalma, and F. Thisse (1992): Discrete Choice Theory of
Product Differentiation, Cambridge MA: MIT Press.
Andrews, D. W. K. and P. Guggenberger (2017): “Asymptotic Size of Kleibergen’s
LM and conditional LR Tests for Moment Condition Models,” Econometric Theory,
33, 1046–1080.
Andrews, I. (2018): “Valid Two-Step Identification-Robust Confidence Sets for GMM,”
Review of Economics and Statistics, 100, 337–348.
Andrews, I., M. Gentzkow, and J. M. Shapiro (2017): “Measuring the Sensitivity
of Parameter Estimates to Estimation Moments,” Quarterly Journal of Economics,
132, 1553–1592.
Andrews, I. and A. Mikusheva (2020): “Optimal Decision Rules for Weak GMM,”
Working paper, Harvard.
Angrist, J. D., K. Graddy, and G. W. Imbens (2000): “The Interpretation of
Instrumental Variables Estimators in Simultaneous Equations Models with an
Application to the Demand for Fish,” The Review of Economic Studies, 67, 499–527.
Angrist, J. D. and G. W. Imbens (1995): “Two-Stage Least Squares Estimation of
Average Causal Effects in Models with Variable Treatment Intensity,” Journal of the
American Statistical Association, 90, 431–442.
Armstrong, T. B. (2016): “Large Market Asymptotics for Differentiated Product
Demand Estimators with Economic Models of Supply,” Econometrica, 84, 1961–1980.
Backus, M., C. Conlon, and M. Sinkinson (2020): “Common Ownership and
Competition in the Ready-to-Eat Cereal Industry,” Working paper, New York Univer-
sity.
Bayer, P., F. Ferreira, and R. McMillan (2007): “A Unified Framework for
Measuring Preferences for Schools and Neighborhoods,” Journal of Political Economy,
115, 588–638.
Bayer, P. and C. Timmins (2007): “Estimating Equilibrium Models of Sorting Across
Locations,” The Economic Journal, 117, 353–374.
Ben-Akiva, M. E. (1973): “Structure of Passenger Travel Demand Models,” Ph.D.
thesis, MIT Department of Civil Engineering.
Benkard, L. and S. T. Berry (2006): “On the Nonparametric Identification of
Nonlinear Simultaneous Equations Models: Comment on Brown (1983) and Roehrig
(1988),” Econometrica, 74, 1429–1440.
Berry, S. (1994): “Estimating Discrete Choice Models of Product Differentiation,”
RAND Journal of Economics, 25, 242–262.
Berry, S., M. Carnall, and P. Spiller (1996): “Airline Hubs: Costs, Markups
and the Implications of Consumer Heterogeneity,” Working paper no. 5561, NBER.
Berry, S., J. Levinsohn, and A. Pakes (1995): “Automobile Prices in Market
Equilibrium,” Econometrica, 63, 841–890.
——— (1999): “Voluntary Export Restraints on Automobiles: Evaluating a Strategic
Trade Policy,” American Economic Review, 89, 189–211.
——— (2004a): “Differentiated Products Demand Systems from a Combination of Micro
and Macro Data: The New Vehicle Market,” Journal of Political Economy, 112, 68–
105.
Berry, S., O. Linton, and A. Pakes (2004b): “Limit Theorems for Differentiated
Product Demand Systems,” Review of Economic Studies, 71, 613–654.
Berry, S. T., A. Gandhi, and P. A. Haile (2013): “Connected Substitutes and
Invertibility of Demand,” Econometrica, 81, 2087–2111.
Berry, S. T. and P. A. Haile (2014): “Identification in Differentiated Products
Markets Using Market Level Data,” Econometrica, 82, 1749–1797.
——— (2018): “Nonparametric Identification of Simultaneous Equations Models with a
Residual Index Structure,” Econometrica, 86, 289–315.
——— (2020): “Nonparametric Identification of Differentiated Products Demand Using
Micro Data,” Working paper no. 27704, National Bureau of Economic Research.
Bhattacharya, D. (2018): “Empirical Welfare Analysis for Discrete Choice: Some
General Results,” Quantitative Economics, 9, 571–615.
Blundell, R., D. Kristensen, and R. L. Matzkin (2017): “Individual
Counterfactuals with Multidimensional Unobserved Heterogeneity,” Tech. rep., CEMMaP
Working Paper CWP60/17.
Blundell, R. and R. L. Matzkin (2014): “Control Functions in Nonseparable
Simultaneous Equations Models,” Quantitative Economics, 5, 271–295.
Bresnahan, T. (1981): “Departures from Marginal Cost Pricing in the American Au-
tomobile Industry,” Journal of Econometrics, 17, 201–227.
——— (1987): “Competition and Collusion in the American Automobile Oligopoly: The
1955 Price War,” Journal of Industrial Economics, 35, 457–482.
Brown, B. (1983): “The Identification Problem in Systems Nonlinear in the Variables,”
Econometrica, 51, 175–96.
Burda, M., M. Harding, and J. Hausman (2015): “A Bayesian Mixed Logit-Probit
Model for Multinomial Choice,” Journal of Applied Econometrics, 30, 353–376.
Capps, C., D. Dranove, and M. Satterthwaite (2003): “Competition and Market
Power in Option Demand Markets,” RAND Journal of Economics, 34, 737–763.
Chamberlain, G. (1986): “Asymptotic Efficiency in Estimation with Conditional
Moment Restrictions,” Journal of Econometrics, 305–334.
Chernozhukov, V. and C. Hansen (2005): “An IV Model of Quantile Treatment
Effects,” Econometrica, 73, 245–261.
Chintagunta, P. K. and J.-P. Dubé (2005): “Estimating a Stockkeeping-Unit-Level
Brand Choice Model that Combines Household Panel Data and Store Data,” Journal
of Marketing Research, 42, 368–379.
Compiani, G. (2020): “Market Counterfactuals and the Specification of Multi-Product
Demand: A Nonparametric Approach,” Working paper, University of Chicago.
Conlon, C. and J. Gortmaker (2020): “Best Practices for Differentiated Products
Demand Estimation with pyBLP,” RAND Journal of Economics, 51, 1108–1161.
Conlon, C. and J. Mortimer (2020): “Empirical Properties of Diversion Ratios,”
working paper, NYU.
Debreu, G. (1960): “Review of Individual Choice Behavior: A Theoretical Analysis by
R. Duncan Luce,” American Economic Review, 50, 186–188.
DellaVigna, S. and M. Gentzkow (2019): “Uniform Pricing in U.S. Retail
Chains,” The Quarterly Journal of Economics, 134, 2011–2084.
Diamond, R. (2016): “The Determinants and Welfare Implications of US Workers’
Diverging Location Choices by Skill: 1980–2000,” The American Economic Review,
106, 479–524.
Duarte, M., L. Magnolfi, M. Sølvsten, and C. Sullivan (2021): “Testing Firm
Conduct,” working paper, University of Wisconsin-Madison.
Dubé, J.-P., J. T. Fox, and C.-L. Su (2012): “Improving the Numerical Performance
of Static and Dynamic Aggregate Discrete Choice Random Coefficients Demand
Estimation,” Econometrica, 80, 2231–2267.
Dubé, J.-P., G. Hitsch, and P. Rossi (2010): “State Dependence and Alternative
Explanations for Consumer Inertia,” RAND Journal of Economics, 41, 417–445.
Eizenberg, A. (2014): “Upstream Innovation and Product Variety in the United States
Home PC Market,” Review of Economic Studies, 81, 1003–1045.
Fan, Y. (2013): “Ownership Consolidation and Product Characteristics: A Study of the
U.S. Daily Newspaper Market,” American Economic Review, 103, 1598–1628.
Fosgerau, M., J. Monardo, and A. De Palma (2020): “The Inverse Product
Differentiation Logit Model,” Working paper, CREST.
Freyberger, J. (2015): “Asymptotic theory for differentiated products demand models
with many markets,” Journal of Econometrics, 185, 162–181.
Gandhi, A. and J.-F. Houde (2020): “Measuring Substitution Patterns in
Differentiated Products Industries,” Working paper, Univ of Wisconsin.
Gandhi, A., Z. Lu, and X. Shi (2019a): “Estimating Demand for Differentiated
Products with Zeroes in Market Share Data,” Working paper, University of Wisconsin-
Madison.
Gandhi, A., A. Nevo, and J. Tao (2019b): “Flexible Estimation of Differentiated
Products Demand Models Using Aggregate Data,” Tech. rep., University of
Pennsylvania.
Gentzkow, M. (2007): “Valuing New Goods in a Model with Complementarities:
Online Newspapers,” American Economic Review, 97, 713–744.
Gentzkow, M. and J. Shapiro (2010): “What Drives Media Slant? Evidence from
U.S. Newspapers,” Econometrica, 78, 35–71.
Gillen, B. J., H. R. Moon, S. Montero, and M. Shum (2019): “BLP2-Lasso for
Aggregate Discrete Choice Models with Rich Covariates,” The Econometrics Journal,
22, 262–281.
Gillen, B. J., H. R. Moon, and M. Shum (2014): “Demand Estimation with High-
dimensional Product Characteristics,” in Advances in Econometrics: Bayesian Model
Comparison, ed. by I. Jeliazkov and D. Poirier, Emerald Publishing, vol. 34.
Goeree, M. S. (2008): “Limited Information and Advertising in the US Personal Com-
puter Industry,” Econometrica, 76, 1017–1074.
Goldberg, P. K. (1995): “Product Differentiation and Oligopoly in International Mar-
kets: The Case of the U.S. Automobile Industry,” Econometrica, 63, 891–951.
Gowrisankaran, G. and M. Rysman (2012): “Dynamics of Consumer Demand for
New Durable Goods,” Journal of Political Economy, 120, 1173–1219.
Handel, B. (2013): “Adverse Selection and Inertia in Health Insurance Markets: When
Nudging Hurts,” American Economic Review.
Hastings, J., T. Kane, and D. Staiger (2010): “Heterogeneous Preferences and
the Efficacy of Public School Choice,” Tech. rep., Brown University.
Hausman, J., G. Leonard, and J. Zona (1994): “Competitive Analysis with Dif-
ferentiated Products,” Annales d’Economie et de Statistique, 34, 159–180.
Hausman, J. and D. Wise (1978): “A Conditional Probit Model for Qualitative
Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Prefer-
ences,” Econometrica, 46, 403–426.
Hausman, J. A. (1996): “Valuation of New Goods under Perfect and Imperfect Com-
petition,” in The Economics of New Goods, ed. by T. F. Bresnahan and R. J. Gordon,
Chicago: University of Chicago Press, chap. 5, 209–248.
Heckman, J. J. (1981): “Heterogeneity and State Dependence,” in Studies in Labor
Markets, ed. by S. Rosen, University of Chicago Press.
Hendel, I. and A. Nevo (2006): “Sales and consumer inventory,” The RAND Journal
of Economics, 37, 543–561.
Ho, K. (2009): “Insurer-Provider Networks in the Medical Care Market,” American
Economic Review, 99, 393–430.
Hom, M. (2018): “School Choice, Segregation and Access to Quality Schools: Evidence
from Arizona,” Working paper, Yale University.
Hong, H., H. Li, and J. Li (2020): “BLP estimation using Laplace transformation
and overlapping simulation draws,” Journal of Econometrics.
Imbens, G. W. and W. K. Newey (2009): “Identification and Estimation of Trian-
gular Simultaneous Equations Models Without Additivity,” Econometrica, 77, 1481–
1512.
Kim, K. and A. Petrin (forthcoming): “Control Function Corrections for Unobserved
Factors in Differentiated Products Models,” Quantitative Marketing and Economics.
Knittel, C. R. and K. Metaxoglou (2014): “Estimation of Random-Coefficient
Demand Models: Two Empiricists’ Perspective,” Review of Economics and Statistics,
96, 34–59.
Koopmans, T. C. (1949): “Identification Problems in Economic Model Construction,”
Lewbel, A. (2014): “An Overview of the Special Regressor Method,” in The Oxford
Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics,
ed. by J. S. Racine, L. Su, and A. Ullah, Oxford University Press, 38–62.
MacKay, A. and N. H. Miller (2021): “Estimating Models of Supply and Demand:
Instruments and Covariance Restrictions,” Tech. rep., Harvard.
Mas-Colell, A., M. D. Whinston, and J. R. Green (1995): Microeconomic The-
ory, Oxford University Press.
Matzkin, R. L. (2003): “Nonparametric Estimation of Nonadditive Random Func-
tions,” Econometrica, 71, 1339–1375.
——— (2008): “Identification in Nonparametric Simultaneous Equations,” Economet-
rica, 76, 945–978.
——— (2015): “Estimation of Nonparametric Models with Simultaneity,” Econometrica,
83, 1–66.
McFadden, D. (1974): “Conditional Logit Analysis of Qualitative Choice Behavior,”
in Frontiers of Econometrics, ed. by P. Zarembka, New York: Academic Press.
——— (1978): “Modelling the Choice of Residential Location,” in Spatial Interaction
Theory and Planning Models, ed. by A. Karlvist, Amsterdam: North Holland, 75–96.
——— (1981): “Econometric Models of Probabilistic Choice,” in Structural Analysis of
Discrete Data with Econometric Applications, ed. by C. Manski and D. McFadden,
Cambridge, MA: MIT Press.
McFadden, D., A. Talvitie, and Associates (1977): Demand Model Estimation
and Validation, Berkeley CA: Institute of Transportation Studies.
Miller, N. H. and M. C. Weinberg (2017): “Understanding the Price Effects of the
MillerCoors Joint Venture,” Econometrica, 85, 1763–1791.
Neilson, C. (2020): “Targeted Vouchers, Competition Among Schools, and the Aca-
demic Achievement of Poor Students,” Working paper, Princeton University.
Nevo, A. (2001): “Measuring Market Power in the Ready-to-Eat Cereal Industry,”

Newey, W. K. and J. L. Powell (2003): “Instrumental Variable Estimation in
Nonparametric Models,” Econometrica, 71, 1565–1578.
Petrin, A. (2002): “Quantifying the Benefits of New Products: The Case of the Mini-
van,” Journal of Political Economy, 110, 705–729.
Petrin, A. and K. Train (2010): “A Control Function Approach to Endogeneity in
Consumer Choice Models,” Journal of Marketing Research, 47, 3–13.
Quan, T. W. and K. R. Williams (2018): “Product Variety, Across Market Demand
Heterogeneity, and the Value of Online Retail,” The RAND Journal of Economics, 49,
877–913.
Quandt, R. E. (1956): “A Probabilistic Theory of Consumer Behavior,” Quarterly
Journal of Economics, 70, 507–536.
——— (1966): “A Probabilistic Abstract Mode Model,” in Studies in Travel Demand,
Volume II, Mathematica, Princeton, N.J., 90–113.
——— (1968): “Estimation of Modal Splits,” Transportation Research, 2, 41–50.
Reynaert, M. and F. Verboven (2014): “Improving the performance of random
coefficients demand models: The role of optimal instruments,” Journal of Econometrics,
179, 83–98.
Rosse, J. N. (1970): “Estimating Cost Function Parameters without using Cost Func-
tion Data: An Illustrated Methodology,” Econometrica, 38, 256–275.
Salanie, B. and F. A. Wolak (2019): “Fast, ‘Robust’, and Approximately Correct:
Estimating Mixed Demand Systems,” Working Paper 25726, National Bureau of
Economic Research.
Train, K. E. (2009): Discrete Choice Methods with Simulation, Cambridge University
Press, 2nd ed.
Waldfogel, J. (2003): “Preference Externalities: An Empirical Study of Who Benefits
Whom in Differentiated-Product Markets,” RAND Journal of Economics, 34, 557–568.
Williams, K. R. and B. M. Adams (2019): “Zone Pricing in Retail Oligopoly,”
American Economic Journal: Microeconomics, 11, 124–156.